Step 1, Phase 2, Soybean Genome


 
 

Step 1, Phase 2: soybean.gff Version 0.08, 01.soybean.conf Version 0.12, June 14, 2003

Phase 2 builds on Step 1, Phase 1 with three changes to the database: (1) the lengths of the QTLs were changed as described below; (2) the locations of the loci were slightly adjusted as described below; and (3) the total base size was changed from 1,100 million bases to 1,115 million bases (Arumuganathan). Phase 2 also improves the Perl scripts so that changes like these are easier to make.

Dr. Lightfoot stated that the lengths of the QTL indicators in GBrowse should represent the likelihood of each QTL appearing within a range, and that each range should be 5-10 centimorgans. The midpoint of that range, 7.5 centimorgans, was chosen until more accuracy could be obtained. Since the input file gives a single point as the location of a QTL, the start of a QTL indicator was set to the point minus 3.75 centimorgans and the end to the point plus 3.75 centimorgans. This raised the question of what to do when the right half of a QTL indicator extends past the end of an MLG. For this phase, any part of the right half that would go past the end of an MLG was clipped.
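The QTL indicator rule can be sketched in a few lines. This is an illustration in Python, not the project's Perl code; the function name qtl_interval and the example MLG length of 120 centimorgans are made up for the example. The text only describes clipping on the right, so the sketch leaves the left end alone.

```python
HALF_WIDTH_CM = 3.75  # half of the 7.5 cM range chosen in the text


def qtl_interval(point_cm, mlg_length_cm):
    """Return (start, end) in centimorgans for a QTL indicator centred on point_cm.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    start = point_cm - HALF_WIDTH_CM
    # Clip the right half if it would run past the end of the MLG.
    end = min(point_cm + HALF_WIDTH_CM, mlg_length_cm)
    return start, end


# Example values are invented: an MLG that is 120 cM long.
print(qtl_interval(50.0, 120.0))   # QTL well inside the MLG
print(qtl_interval(118.0, 120.0))  # right half clipped at the MLG end
```

A QTL at 118 centimorgans on this example MLG would otherwise end at 121.75, so its indicator stops at 120 instead.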

In Phase 1, the loci indicators were calculated as follows: the start of an indicator was the given point, and the end was the point plus 200 bases. In Phase 2, the start of an indicator is the given point minus 100 bases, and the end is the given point plus 100 bases. Each indicator is still 200 bases long, but it is now centered on the given point.
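For comparison, here are the two locus rules side by side. This is a Python illustration only; the function names are invented, and the position 5000 is an arbitrary example value.

```python
def locus_interval_phase1(point):
    """Phase 1 rule: indicator starts at the point and runs 200 bases to the right."""
    return point, point + 200


def locus_interval_phase2(point):
    """Phase 2 rule: indicator is centred on the point, 100 bases on each side."""
    return point - 100, point + 100


print(locus_interval_phase1(5000))  # (5000, 5200)
print(locus_interval_phase2(5000))  # (4900, 5100)
```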

 

When the total base size was changed from 1,100 million bases to 1,115 million bases, every calculation that depends on it changed as well. The Perl scripts were changed to make this kind of transition easier. The input file was renamed input.txt; it is the same as rawdata3.txt, below, but with a different name. Starting from this file meant that the duplicates and poorly named features had already been removed.

A file, mlg.txt, was created that simply lists each MLG and its size in centimorgans. get_factor.pl used this file to calculate the centimorgan-to-bases conversion factor, which it wrote to factor.txt. get_factor.pl also created a new file, mlg2.txt, which is like mlg.txt but also lists the length of each MLG in bases, calculated with that factor.
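The page does not give the formula that get_factor.pl uses. One plausible reading, sketched here in Python rather than Perl, is that the factor is the total genome size in bases divided by the sum of all MLG sizes in centimorgans. That formula, the function names, and the MLG names and sizes in the example are all assumptions made for illustration.

```python
TOTAL_BASES = 1_115_000_000  # total base size stated for Phase 2


def get_factor(mlg_cm, total_bases=TOTAL_BASES):
    """Bases per centimorgan, ASSUMING factor = total bases / total centimorgans."""
    return total_bases / sum(mlg_cm.values())


def mlg_lengths_in_bases(mlg_cm, factor):
    """Length of each MLG in bases, rounded to a whole base (like mlg2.txt)."""
    return {name: round(cm * factor) for name, cm in mlg_cm.items()}


# Invented example: two MLGs and a small total so the arithmetic is easy to follow.
example_mlgs = {"X": 100.0, "Y": 150.0}
factor = get_factor(example_mlgs, total_bases=2_500_000)
print(factor)
print(mlg_lengths_in_bases(example_mlgs, factor))
```

Under this reading, changing the total base size only means changing one number and rerunning the script, which fits the stated goal of making such transitions easier.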

Since the QTLs and loci are Version 2, but the MLGs are Version 3, the QTLs and loci do not fit properly inside the MLGs. Since it could be several weeks (from June 2003) until Version 3 QTLs and loci are available, the locations of the QTLs and loci were adjusted to fit inside the MLGs. In Step 1, Phase 1 this was done manually. In Phase 2, Perl scripts were written to assist in these calculations.

First, a proportion file, proportion.txt, was created with all of the proportions set to 1 (as if the Version 2 data matched exactly with the Version 3 data). The zeroth column of proportion.txt was the MLG; the first column was the calculated length of the MLG in bases; the second column was the raw location of the end of the feature furthest to the right on the MLG; and the third column was the proportion. convert.pl was rewritten to get the proportions from proportion.txt (previously, these proportions were called fudge factors and were hard-coded into convert.pl). The data based on proportions of 1 was written to soybean.gff; however, this data mixes Version 2 with Version 3 and needs to be adjusted.

check_lengths.pl was written to read this temporary soybean.gff file and to calculate a new proportion for each MLG. The result is a new proportion.txt file with the correct proportions. convert.pl was run again with the correct proportions, creating the final soybean.gff file. For double-checking, check_lengths2.pl was written to see whether anything falls off the end of an MLG. check_lengths2.pl is like check_lengths.pl, but it does not change any files; it just displays its results on the screen. Here are the results. The fact that all of the proportions are slightly over 1 means that the last feature on each MLG is slightly inside the right end of that MLG.
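The two-pass adjustment can be sketched as follows. This is a Python illustration of the idea, not the actual convert.pl or check_lengths.pl; the page does not show how the proportion is computed, so the formula (MLG length divided by the end of the rightmost feature), the truncation to whole bases, and all names and numbers below are assumptions.

```python
def new_proportion(mlg_length_bases, furthest_end_bases):
    """ASSUMED rule: scale so the rightmost feature end lands on the MLG end."""
    return mlg_length_bases / furthest_end_bases


def rescale(position_bases, proportion):
    """Apply a proportion to a position, truncating to a whole base."""
    return int(position_bases * proportion)


def fits(mlg_length_bases, features, proportion):
    """Double-check, in the spirit of check_lengths2.pl: nothing passes the MLG end."""
    return all(rescale(end, proportion) <= mlg_length_bases for _, end in features)


# Invented example: an MLG of 1,000,000 bases whose features were placed
# with a proportion of 1 and overshoot the end.
mlg_length = 1_000_000
features = [(100_000, 100_200), (1_249_800, 1_250_000)]  # (start, end) in bases

furthest_end = max(end for _, end in features)
proportion = new_proportion(mlg_length, furthest_end)

print(fits(mlg_length, features, 1.0))         # first pass, proportions of 1
print(fits(mlg_length, features, proportion))  # second pass, adjusted proportion
```

The first pass corresponds to running the conversion with every proportion set to 1, and the second to rerunning it after the proportions have been recalculated.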

The resulting soybean.gff file was entered into GBrowse.

 


              Deepak
http://soybeangenome.siu.edu
Last update: July 31, 2005.