Step 1, Phase 2: soybean.gff Version 0.08,
01.soybean.conf Version 0.12, June 14, 2003
Phase 2 builds on
Step 1, Phase 1 with three changes to the
database: (1) the lengths of the QTL's were changed
as described below; (2) the locations of the loci
were slightly adjusted as described below; and (3)
the total base size was changed from 1,100 million
bases to 1,115 million bases (Arumuganathan).
Phase 2 also improves the Perl scripts so that
changes like these are easier to make.
Dr. Lightfoot stated that the lengths of the QTL
indicators in GBrowse should represent the
likelihoods of the QTL's appearing in a range, and
that each range should be 5-10 centimorgans. The
midpoint of that range, 7.5 centimorgans, was chosen
until more accurate data could be obtained. Since the input
file gives a point as a location for a QTL, the
start of a QTL indicator was set as the point minus
3.75 centimorgans and the end of a QTL indicator was
set as the point plus 3.75 centimorgans. This raised
the question of what to do when the right half of a
QTL indicator extends past the end of an MLG. For
this phase, any part of the right half of a QTL
indicator that would extend past the end of an MLG
was clipped.
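The range rule above can be sketched as follows. This is a minimal illustration in Python, not the project's Perl code; the function name and the sample values are hypothetical. Because the text only describes clipping on the right side, the start of the range is left unclipped here.

```python
HALF_WIDTH_CM = 3.75  # half of the 7.5 centimorgan range chosen in the text


def qtl_range_cm(point_cm, mlg_length_cm, half_width_cm=HALF_WIDTH_CM):
    """Return (start, end) of a QTL indicator in centimorgans.

    Assumption: only the right end is clipped at the end of the MLG,
    since that is the only case the text describes.
    """
    start = point_cm - half_width_cm
    end = min(point_cm + half_width_cm, mlg_length_cm)
    return start, end


# Hypothetical values: an MLG that is 120 cM long.
print(qtl_range_cm(50.0, 120.0))   # QTL well inside the MLG
print(qtl_range_cm(118.0, 120.0))  # right half clipped at the MLG end
```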
The Loci indicators were calculated in Phase 1 as
follows: The start of an indicator was the given
point; and, the end of an indicator was the point
plus 200 bases. This was changed in Phase 2 as
follows: The start of an indicator is the given
point minus 100 bases; and, the end of an indicator
is the given point plus 100 bases.
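The difference between the two phases can be shown with a short sketch. It is written in Python for illustration only (the project's scripts are Perl), and the function names are hypothetical. Both versions produce an indicator 200 bases wide; Phase 2 simply centers it on the given point.

```python
def locus_range_phase1(point):
    """Phase 1: indicator starts at the point and runs 200 bases to the right."""
    return point, point + 200


def locus_range_phase2(point):
    """Phase 2: indicator is centered on the point, 100 bases on each side."""
    return point - 100, point + 100


# Hypothetical locus position, in bases.
print(locus_range_phase1(5_000_000))
print(locus_range_phase2(5_000_000))
```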
When the total base size was changed from 1,100
million bases to 1,115 million bases, that changed
all of the other calculations. The Perl scripts were
changed to make this kind of transition easier. The
input file was appropriately renamed
input.txt.
input.txt is the same as
rawdata3.txt, below, but with a different
name. This started us with the duplicates and
poorly-named features already removed.
A file was created,
mlg.txt,
which simply lists the MLG's and each one's size in
centimorgans. This file was used by
get_factor.pl to calculate the
centimorgan-to-bases conversion factor, which was
written to
factor.txt. get_factor.pl also
created a new file,
mlg2.txt,
which is like mlg.txt but also
lists the calculated lengths of all of the MLG's in
bases, determined by the just-calculated factor.
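One plausible way to compute the factor is sketched below in Python (the real get_factor.pl is a Perl script and is not reproduced here). The sketch assumes that the factor is the total base size divided by the sum of all MLG sizes in centimorgans, and that mlg.txt holds one whitespace-separated "name size" pair per line; neither detail is stated in the text. The MLG names and sizes are made up.

```python
TOTAL_BASES = 1_115_000_000  # total base size used in Phase 2


def parse_mlg(text):
    """Parse lines of 'MLG size_in_cM' (assumed layout) into a dict."""
    sizes = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, size_cm = line.split()
        sizes[name] = float(size_cm)
    return sizes


def get_factor(sizes_cm, total_bases=TOTAL_BASES):
    """Bases per centimorgan, assuming the MLGs together span the total size."""
    return total_bases / sum(sizes_cm.values())


def lengths_in_bases(sizes_cm, factor):
    """Length of each MLG in bases, as mlg2.txt is described as listing."""
    return {name: round(cm * factor) for name, cm in sizes_cm.items()}


# Made-up sample data standing in for mlg.txt.
sizes = parse_mlg("A1 100.0\nA2 150.0\n")
factor = get_factor(sizes)
print(factor)
print(lengths_in_bases(sizes, factor))
```

Keeping the total base size in one named constant is what makes a change such as 1,100 million to 1,115 million bases a one-line edit.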
Since the QTL's and Loci are Version 2, but the
MLG's are Version 3, the QTL's and Loci do not fit
properly inside the MLG's. Since it could be several
weeks (from June 2003) until Version 3 QTL's and
Loci are available, the locations of the QTL's and
Loci were adjusted to fit inside the MLG's. In Step
1, Phase 1 this was done manually. In Phase 2, Perl
scripts were written to assist in these calculations.
First, a proportion file,
proportion.txt, was created with all of the
proportions set to 1 (as if the Version 2 data
matched exactly with the Version 3 data). The zeroth
column of proportion.txt was the
MLG; the first column was the calculated length in
bases of the MLG; the second column was the raw
location of the end of the feature which was at the
furthest right end of the MLG; and, the third column
was the proportion.
convert.pl was rewritten to get the proportions
from proportion.txt (previously,
these proportions were called fudge factors and were
hard-coded into convert.pl). The
output, based on proportions of 1, was written to
soybean.gff; however, this data
mixes Version 2 with Version 3 and needs to be
adjusted.
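The conversion step might look roughly like the sketch below. It is a Python illustration, not the actual convert.pl. It assumes that a position in bases is the centimorgan position multiplied by the conversion factor and by the MLG's proportion, and it writes a generic nine-column, tab-separated GFF line. The source name, feature name, and attribute layout are hypothetical.

```python
def cm_to_bases(position_cm, factor, proportion=1.0):
    """Convert a centimorgan position to bases (assumed formula)."""
    return round(position_cm * factor * proportion)


def gff_line(mlg, feature_type, name, start, end, source="soybean"):
    """Build one nine-column GFF line; score, strand and frame are left as '.'."""
    columns = [
        mlg, source, feature_type,
        str(start), str(end),
        ".", ".", ".",
        f'{feature_type} "{name}"',  # illustrative attribute column
    ]
    return "\t".join(columns)


# Hypothetical factor (bases per cM) and a QTL spanning 46.25-53.75 cM.
factor = 4_460_000.0
start = cm_to_bases(46.25, factor)
end = cm_to_bases(53.75, factor)
print(gff_line("A1", "QTL", "example_qtl", start, end))
```

With every proportion set to 1, this reproduces the first, unadjusted pass described above; the second pass would only change the proportion argument.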
check_lengths.pl was written to read this
temporary soybean.gff file and to
calculate new proportions for each MLG. The result
is a new
proportion.txt file with the correct
proportions. convert.pl was run
again with the correct proportions, creating the
soybean.gff file. For double-checking,
check_lengths2.pl was written to see if anything
falls off of the end of an MLG.
check_lengths2.pl is like
check_lengths.pl, but does not change any
files; it just displays its results on the screen.
Here are
the results. The fact that all of the
proportions are slightly over 1 means that the last
feature on each MLG is slightly inside the right end
of that MLG.
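The proportion and the final check can be illustrated with a small Python sketch (the real check_lengths.pl and check_lengths2.pl are Perl scripts and may differ). It assumes that a proportion is the MLG's calculated length in bases divided by the raw end position of its rightmost feature, which is consistent with a value slightly over 1 meaning that the last feature ends just inside the MLG. All numbers are made up.

```python
def new_proportion(mlg_length_bases, rightmost_end_bases):
    """Assumed definition: MLG length divided by the end of its last feature."""
    return mlg_length_bases / rightmost_end_bases


def falls_off(mlg_length_bases, feature_end_bases):
    """True if a feature ends past the right end of its MLG."""
    return feature_end_bases > mlg_length_bases


# Hypothetical MLG of 446,000,000 bases whose last raw feature ends too far right.
mlg_length = 446_000_000
raw_end = 460_000_000

proportion = new_proportion(mlg_length, raw_end)  # below 1: features must shrink
adjusted_end = round(raw_end * proportion)        # rescaled end of the last feature

print(proportion)
print(falls_off(mlg_length, raw_end))       # the raw position overshoots the MLG
print(falls_off(mlg_length, adjusted_end))  # the rescaled position fits
```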
The resulting soybean.gff file
was entered into GBrowse.