| SoyV4R4 prepares Version 4 data with Round 4
programming. The documentation is below. Additional
documentation may be added from time to time.
This page documents Round 4
of converting soybean genomic data so that it can be
displayed and accessed in GBrowse. Round 4
refers to the programming method and not to the
data.
Distinguishing
characteristics of Round 4:
- Object-oriented Perl is used. All related
data can be automatically adjusted by Perl code
when one piece of data changes.
- Over 20 tracks, some of which separate
features into their alpha, beta, and gamma
categories.
See ontology.
- An SQL master database is used to contain
relationships between all of the features.
- The database processing was designed with
audit trails, so that the programmer can
determine why a particular feature was processed
as it was.
- Locally produced web pages search the
database and provide detailed information when
clicking on certain features.
- Feature labels designate relationships
between features.
- MLGs are referred to as such with an
mlg prefix. (An MLG is similar to a
chromosome. Not all soybean chromosomes have
been designated yet.) For example, MLG A1 is
referred to as mlgA1 instead of A1.
This allows feature names to be
distinguished from MLG names when the two
are identical. This is also
consistent with other GBrowse sites that use the
prefix chr for chromosomes, such as
chr1.
- Contigs are displayed in separate tracks for
their forward, reverse, and spread locations.
Click on a contig for detailed information,
including the end matches for that contig. See
reverse contigs and
3 kinds of contigs.
- Beta 2 clones are also displayed in separate
tracks relative to their forward, reverse, and
spread contigs. Click on a clone for detailed
information about that clone.
- Loci are displayed in separate tracks for
alpha, beta, and gamma types. Iowa State loci
are labelled to distinguish them from USDA loci.
Click on a locus for detailed information about
that locus.
- QTL candidate genes are displayed in a
separate track below the QTLs. Click on a
candidate gene to go to its entry in SoyBase.
- Soybean related genes are displayed in a
separate track from other related genes.
- Confirmed data is labelled as "match" if it
matches a clone in our database. Click on a
confirmed feature for detailed information.
- perldoc documentation is included
in the program files.
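The mlg naming rule in the list above can be illustrated with a short sketch. It is written in Python for brevity rather than in Extropy's Perl, and the function names to_mlg_name and is_mlg_name are hypothetical, not taken from the Extropy source.

```python
MLG_PREFIX = "mlg"  # prefix described above; other GBrowse sites use chr similarly


def to_mlg_name(group: str) -> str:
    """Return the prefixed name for a linkage group, e.g. 'A1' -> 'mlgA1'."""
    group = group.strip()
    if group.startswith(MLG_PREFIX):
        return group  # already prefixed, leave unchanged
    return MLG_PREFIX + group


def is_mlg_name(name: str) -> bool:
    """True if the name carries the mlg prefix followed by a group label."""
    return name.startswith(MLG_PREFIX) and len(name) > len(MLG_PREFIX)


if __name__ == "__main__":
    print(to_mlg_name("A1"))
    print(is_mlg_name("A1"), is_mlg_name("mlgA1"))
```

With a rule like this, a feature that happens to be named A1 can never be confused with the linkage group mlgA1.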
Given data:
- The MLG data was prepared in
mlg.txt
- The loci data was prepared in
loci.txt. Some of the loci data was obtained
from
usda.txt, a tab-delimited file created from
a March 2003 USDA spreadsheet. The remaining
loci data was obtained from SoyBase at Iowa
State University.
- The FPC data was obtained from
soya.fpc, built by Jeff Shultz at SIU.
soya.fpc contains Version 4 data.
- The QTL data was obtained from SoyBase at
Iowa State University and put in
qtl.txt.
- The end matches data was obtained from Jeff
Shultz at SIU and put in
ends.txt.
- The MTP data was obtained from Jeff Shultz
at SIU and put in
mtp.txt.
- The EST data was obtained from Kay Shopinski
at SIU and put in
est.txt.
- The sequence data was obtained from Jeff
Shultz at SIU in the form of
sequence.gff.
- The related genes data was obtained by
Nagajyothi Lavu at SIU from GenBank and entered
into
related.gff by Jeff Shultz at SIU.
- The confirmed data was provided by Iowa
State University via Jeff Shultz at SIU and put
into
confirmed.txt.
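Several of the given files, such as usda.txt, are tab-delimited text. As a rough illustration of reading such a file, here is a Python sketch (not Extropy's Perl code). The column layout of the real files is not documented on this page, so the header names and sample values below are assumptions made only for the example.

```python
import csv
import io


def read_tab_delimited(handle):
    """Read a tab-delimited file with a header row into a list of dicts."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = []
    for row in reader:
        # Strip stray whitespace, which is common in spreadsheet exports.
        # Extra unnamed columns (key None) are ignored.
        rows.append({key: (value or "").strip()
                     for key, value in row.items() if key is not None})
    return rows


# Hypothetical sample; the real usda.txt columns are not documented here.
SAMPLE = "locus\tmlg\tposition\nLoc1\tA1\t12.5\nLoc2 \tA1\t30.1\n"

if __name__ == "__main__":
    for record in read_tab_delimited(io.StringIO(SAMPLE)):
        print(record["locus"], record["mlg"], record["position"])
```

For a real file, the same function could be called with a handle from open(path, newline="").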
An object-oriented Perl program
named Extropy was written with 30 modules and 9,466
lines of code to load the above data into a central
MySQL database, process this data, and write a GFF
file appropriate for use by GBrowse. The Extropy
system was placed in a directory structure beginning
with a gbrowse directory:
- gbrowse
- given_data
- round4
- working_data
Given data, listed above, was
placed in the gbrowse/given_data
subdirectory. Intermediate working data was placed
in the gbrowse/working_data subdirectory.
The main Extropy file, extropy,
configuration files, and the resulting GFF file were
all put in the gbrowse/round4 subdirectory.
Extropy Perl modules for constants and utilities
were placed in the gbrowse/round4/Extropy
subdirectory. All other modules were placed in the
gbrowse/round4/Extropy/Extropy
subdirectory.
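The directory layout described above can be written down as path constants. This is only a Python sketch of the layout; the constant and function names are hypothetical, and the root is taken to be a relative gbrowse directory.

```python
from pathlib import Path

ROOT = Path("gbrowse")

GIVEN_DATA = ROOT / "given_data"      # input files such as mlg.txt and loci.txt
WORKING_DATA = ROOT / "working_data"  # intermediate working data
ROUND4 = ROOT / "round4"              # main program, configuration, GFF output
CONSTANTS_AND_UTILITIES = ROUND4 / "Extropy"    # constants and utilities modules
OTHER_MODULES = ROUND4 / "Extropy" / "Extropy"  # all other modules


def given_file(name: str) -> Path:
    """Return the path of a given-data file, e.g. given_file('mlg.txt')."""
    return GIVEN_DATA / name


if __name__ == "__main__":
    print(given_file("mlg.txt").as_posix())
    print(OTHER_MODULES.as_posix())
```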
Extropy was designed to make it
easy to reprocess the output if any data was changed
at any stage in the processing. To that end, the
Extropy Main Menu was designed to be used from the
top down. If any Main Menu item changes data, the
user knows that all Main Menu items below that
item also need to be reprocessed. Here is the
Main Menu:
**************************************
***********  extropy menu  ***********
**************************************
*                                    *
*  1   (A)ctivate a project          *
*  2 - Read (M)LG File               *
*  3 - Read (L)ocus File             *
*  4 - Read (F)PC File               *
*  5 - Read Q(T)L File               *
*  6 - Read (E)nd Matches File       *
*  7 - Read MT(P) File               *
*  8 - Read EST File                 *
*  9 - Read Sequence File            *
* 10 - Read Related Genes File       *
* 11 - Read Confirmed File           *
* 12 - (C)rosscheck Locus Names      *
* 13 - (U)pdate Locus Names          *
* 14 - C(o)unt Clone Anchors         *
* 15 - Up(d)ate Clone Anchors        *
* 16 - Cou(n)t Contig Anchors        *
* 17 - Update Cont(i)g Anchors       *
* 18 - (G)FF File Menu               *
* 19   (S)tatus                      *
* 20 - (R)eset Project               *
* 21   (H)elp                        *
* 22   (Q)uit                        *
*                                    *
**************************************
**************************************
Enter a menu item...
The minus signs (-) in front of
some of the selections indicate that those
selections cannot be made yet. In the above case, a
project must be activated before most of the other
menu selections can be made. Nearly every menu
item calls a dedicated Perl module to carry out
that item.
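The two menu behaviours described here, minus signs on unavailable items and top-down reprocessing, can be sketched as follows. This is a Python illustration under stated assumptions, not the Extropy::MenuMain code: the class names, the needs_project and is_step flags, and the reduced item list are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MenuItem:
    number: int
    label: str
    needs_project: bool = True  # unavailable until a project is activated
    is_step: bool = True        # a processing step that reads or changes data


class Menu:
    def __init__(self, items):
        self.items = list(items)
        self.project_active = False

    def is_enabled(self, item):
        return self.project_active or not item.needs_project

    def render(self):
        """Return menu lines, with '-' marking items that cannot be selected yet."""
        lines = []
        for item in self.items:
            marker = " " if self.is_enabled(item) else "-"
            lines.append(f"{item.number:2d} {marker} {item.label}")
        return lines

    def items_to_reprocess(self, number):
        """Numbers of the processing steps listed below the given item."""
        return [item.number for item in self.items
                if item.is_step and item.number > number]


# A reduced, illustrative subset of the Main Menu.
MENU = Menu([
    MenuItem(1, "(A)ctivate a project", needs_project=False, is_step=False),
    MenuItem(2, "Read (M)LG File"),
    MenuItem(3, "Read (L)ocus File"),
    MenuItem(12, "(C)rosscheck Locus Names"),
    MenuItem(13, "(U)pdate Locus Names"),
    MenuItem(18, "(G)FF File Menu"),
    MenuItem(22, "(Q)uit", needs_project=False, is_step=False),
])

if __name__ == "__main__":
    print("\n".join(MENU.render()))
    print(MENU.items_to_reprocess(3))
```

In this sketch, rereading the locus file (item 3) reports that items 12, 13, and 18 should be run again, which mirrors the top-down rule.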
Extropy requires that MySQL be
accessible. Also, due to the wide variety of ways
that given data is provided, programming expertise
would be required to adapt Extropy to another
location.
The main Extropy program file
is
extropy. All this main file does is call the
Main Menu module, in
Extropy::MenuMain.pm. These general-purpose
modules are used in the Extropy system:
These modules read the given data:
These modules process the data:
This module produces the GFF file:
The resulting GFF file is
here. The accompanying configuration file is
here. The detail CGI code that
displays additional information about a feature from
GBrowse is
here.
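For readers unfamiliar with GFF, each feature is one line of nine tab-separated columns. The following Python sketch formats such a line. It is not the Extropy GFF writer, and the source name, method name, coordinates, and group value in the usage example are hypothetical placeholders.

```python
def gff_line(reference, source, method, start, stop,
             score=".", strand=".", phase=".", group=""):
    """Format one feature as a nine-column, tab-separated GFF line."""
    if start > stop:
        raise ValueError("start must not exceed stop")
    fields = [reference, source, method, str(start), str(stop),
              str(score), strand, phase, group]
    return "\t".join(fields)


if __name__ == "__main__":
    # Hypothetical feature on mlgA1; all values are illustrative only.
    print(gff_line("mlgA1", "example", "contig", 1000, 2500,
                   group='Contig "ctg1"'))
```

A period is the conventional placeholder for a column with no value, which is why score, strand, and phase default to it here.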
These are examples of interim files used by the
above modules:
Here are images produced
with Microsoft Access describing relationships
between the tables in the database:
A mysqldump file of the database is
available upon request. |