--- installation_est2uni_perl.txt.ori 2011-02-21 11:52:15.000000000 +0100 +++ installation_est2uni_perl.txt 2011-02-21 22:58:23.000000000 +0100 @@ -84,8 +84,12 @@ .qual. There are some library examples on the 'test_data' directory of the EST2uni -distribution. You could just copy that libraries to check the software. Be -aware that the sgn chromatograms could need a modification in the phredpar.dat file +distribution. You could just copy that libraries to check the software. + + > cp /usr/local/est2uni/test_data/libraries/* ./libraries + > cp -r /usr/local/est2uni/test_data/data . + +Be aware that the sgn chromatograms could need a modification in the phredpar.dat file (see installation instructions for phred in docs/installation_external_software.txt). 4.- Copy the estpipe.conf.example file from the installation directory into @@ -205,7 +209,7 @@ Although the clustering software could consider that two unigenes are not similar enough to be merged, it is useful to know which are these similar clusters of unigenes. Accordingly, EST2uni does a clustering of the unigenes, and these clusters are named -'superunigenes'. This analysis is done thorough a BLAST search, followed by +'superunigenes'. This analysis is done through a BLAST search, followed by parsing the results. Unigenes with a similarity high enough (above user-defined threshold parameters) are then clustered in a superunigene. @@ -219,15 +223,17 @@ In EST2uni a database can have several aspects like the blast indexed files, the fasta sequence files or the related GO files. This information should be placed in the file -referenced by the parameter databases_dat. This file should be uploaded to the database -setting the parameters populate_database_table and do_prior_data_population to 1. +referenced by the parameter databases_dat, usually named databases.csv. +Contents of this file will be uploaded into the database when the parameters +populate_database_table and do_prior_data_population are set to 1. In that file, the name of the database is defined in the field "name" and could be different than the BLAST name. For instance the database name could be "arabidopsis" and the local -BLAST name could be "tair6". When a BLAST against database "arabidopsis" is -asked, EST2uni will look for the local BLAST name in the database. In this example the blast should -be asked for the database "arabidopsis", not against "tair6", despite the fact that the -local blast database is named "tair6". +BLAST filename could be "tair6". When a BLAST against database "arabidopsis" is +asked, EST2uni will look for the local BLAST-formatted filenames tair6.* in the BLASTDB +of $blast_dir directory. In this example the blast should be asked for the database +"arabidopsis", not against "tair6", despite the fact that the local blast database is +named "tair6". BLAST ANNOTATION @@ -243,7 +249,8 @@ this example the arabidopsis database should be defined in the databases_dat file. On that file the field local_blast_name should be the name given to that database on the local blast installation. For instance it could be named tair6. That means that the name in the blast_dbs -parameter and the name of the database in the local blast installation could be different. +parameter and the name of the database in the local blast installation could be different +(they are mapped via the databases.csv trick). Other important field in the databases file is "kind", which should be set to dna or pep. @@ -357,7 +364,7 @@ same database. The paths for these files should be specified in the local_go_assoc_file and local_go_slim_obo_file of the databases information file. The GO association file for the arabidopsis database can be downloaded from the www.arabidopsis.org site. At the time of - this writting the file is located in: + this writing the file is located in: ftp://ftp.arabidopsis.org/home/tair/Ontologies/Gene_Ontology/ATH_GO_GOSLIM.20061021.txt The GO slim obo file for arabidopsis and goa is on the Gene Ontology ftp: ftp://ftp.geneontology.org/pub/go/GO_slims/go_slim_plant.obo @@ -376,7 +383,7 @@ ---------------- You just need to set the parameter do_hmmer_annot to 1 and to set the name -of the HMMER database on the parameter pfam_db. +of the HMMER database on the parameter pfam_db with a full path. HMMER uses the ORFs as a starting point, so it is necessary to have the annotated ORF already in the database or to ask for the do_estscan_annot. @@ -400,5 +407,5 @@ --------------------------- Every researcher can be associated with several ESTs using the csv file defined in the working_on_dat -parameter. Those relationships will get loaded when the parameters populate_workin_on_table and +parameter. Those relationships will get loaded when the parameters populate_working_on_table and do_prior_data_population are set to 1.