Southern Illinois University Carbondale
Round 4, The Soybean GBrowse Database


 
 
SoyV4R4 prepares Version 4 data with Round 4 programming. The documentation is below. Additional documentation may be added from time to time.

Methodology

This page documents Round 4 of converting soybean genomic data so that it can be displayed and accessed in GBrowse. Round 4 refers to the programming method and not to the data.

Distinguishing characteristics of Round 4:

  • Object oriented Perl is used. All related data can be automatically adjusted by Perl code when one piece of data changes.
  • Over 20 tracks, some of which separate features into their alpha, beta, and gamma categories. See ontology.
  • An SQL master database is used to contain relationships between all of the features.
  • The database processing was designed with audit trails, so that the programmer can determine why a particular feature was processed as it was.
  • Locally produced web pages search the database and provide detailed information when the user clicks on certain features.
  • Feature labels designate relationships between features.
  • MLGs are named with an mlg prefix. (An MLG, or molecular linkage group, is similar to a chromosome; not all soybean chromosomes have been designated yet.) For example, MLG A1 is referred to as mlgA1 instead of A1. This distinguishes MLG names from feature names when the two would otherwise be identical. It is also consistent with other GBrowse sites that use the prefix chr for chromosomes, such as chr1.
  • Contigs are displayed in separate tracks for their forward, reverse, and spread locations. Click on a contig for detailed information, including the end matches for that contig. See reverse contigs and 3 kinds of contigs.
  • Beta 2 clones are also displayed in separate tracks relative to their forward, reverse, and spread contigs. Click on a clone for detailed information about that clone.
  • Loci are displayed in separate tracks for alpha, beta, and gamma types. Iowa State loci are labelled to distinguish them from USDA loci. Click on a locus for detailed information about that locus.
  • QTL candidate genes are displayed in a separate track below the QTLs. Click on a candidate gene to go to its entry in SoyBase.
  • Soybean related genes are displayed in a separate track from other related genes.
  • Confirmed data is labelled as "match" if it matches a clone in our database. Click on a confirmed feature for detailed information.
  • perldoc documentation is included in the program files.
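The mlg prefix and the separate alpha, beta, and gamma locus tracks described above can be illustrated with a short sketch. It is written in Python rather than Extropy's Perl and emits a GFF3-style line (Extropy may have written a different GFF version). The source name SoyV4R4 as a GFF column value, the locus_alpha/locus_beta/locus_gamma type names, and the helper functions are hypothetical.

```python
# Hypothetical sketch in Python (Extropy itself is Perl). It writes a
# GFF3-style line; the source "SoyV4R4" and the locus_* type names are
# illustrative assumptions, not Extropy's actual output.

def mlg_seqid(mlg):
    """Prefix an MLG name with 'mlg' (A1 -> mlgA1) so it cannot collide
    with a feature that happens to share the same name."""
    return mlg if mlg.startswith("mlg") else "mlg" + mlg


def locus_type(category):
    """Give each ontology category its own feature type, so alpha,
    beta, and gamma loci can be configured as separate tracks."""
    if category not in ("alpha", "beta", "gamma"):
        raise ValueError(f"unknown category: {category!r}")
    return "locus_" + category


def gff_line(mlg, category, name, start, end, strand="."):
    """Return one tab-separated GFF3 line for a locus on an MLG."""
    start, end = min(start, end), max(start, end)  # GFF requires start <= end
    fields = [
        mlg_seqid(mlg),        # seqid
        "SoyV4R4",             # source (assumed)
        locus_type(category),  # type
        str(start),
        str(end),
        ".",                   # score: unused
        strand,
        ".",                   # phase: only meaningful for CDS features
        "Name=" + name,        # attributes
    ]
    return "\t".join(fields)


print(gff_line("A1", "beta", "example_locus", 1300, 1200))
```

In a GBrowse configuration, each such feature type could then be given its own track, which is what lets the three locus categories appear separately.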

Given data:

  1. The MLG data was prepared in mlg.txt.
  2. The loci data was prepared in loci.txt. Some of the loci data was obtained from usda.txt, a tab-delimited file created from a March 2003 USDA spreadsheet. The remaining loci data was obtained from SoyBase at Iowa State University.
  3. The FPC data was obtained from soya.fpc, built by Jeff Shultz at SIU. soya.fpc contains Version 4 data.
  4. The QTL data was obtained from SoyBase at Iowa State University and put in qtl.txt.
  5. The end matches data was obtained from Jeff Shultz at SIU and put in ends.txt.
  6. The MTP data was obtained from Jeff Shultz at SIU and put in mtp.txt.
  7. The EST data was obtained from Kay Shopinski at SIU and put in est.txt.
  8. The sequence data was obtained from Jeff Shultz at SIU in the form of sequence.gff.
  9. The related genes data was obtained by Nagajyothi Lavu at SIU from GenBank and entered into related.gff by Jeff Shultz at SIU.
  10. The confirmed data was provided by Iowa State University via Jeff Shultz at SIU and put into confirmed.txt.
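Several of the given files, such as usda.txt, are tab-delimited text. The sketch below shows one way to read such a file and tag each locus with its origin, since USDA and Iowa State loci are labelled differently in the display. It is Python rather than Extropy's Perl, and the header names (locus, mlg, position) and the source tag are assumptions; the real file layout is not documented on this page.

```python
import csv
import io

# Hypothetical sketch: read a tab-delimited loci file such as usda.txt.
# The header names ("locus", "mlg", "position") are assumptions.

def read_loci(handle, source):
    """Yield one dict per locus row, tagged with its data source.
    DictReader already skips fully blank lines; rows whose locus
    column is empty are skipped here as well."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        if not row.get("locus"):
            continue
        row["position"] = float(row["position"])
        row["source"] = source  # e.g. "USDA" vs. "SoyBase"
        yield row


sample = "locus\tmlg\tposition\nlocus_1\tA1\t12.5\nlocus_2\tB2\t40.0\n"
loci = list(read_loci(io.StringIO(sample), "USDA"))
print([(r["locus"], r["mlg"], r["position"]) for r in loci])
```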

An object-oriented Perl program named Extropy was written with 30 modules and 9,466 lines of code to load the above data into a central MySQL database, process this data, and write a GFF file appropriate for use by GBrowse. The Extropy system was placed in a directory structure beginning with a gbrowse directory:

  • gbrowse
    • given_data
    • round4
      • Extropy
        • Extropy
    • working_data

Given data, listed above, was placed in the gbrowse/given_data subdirectory. Intermediate working data was placed in the gbrowse/working_data subdirectory. The main Extropy file, extropy, configuration files, and the resulting GFF file were all put in the gbrowse/round4 subdirectory. Extropy Perl modules for constants and utilities were placed in the gbrowse/round4/Extropy subdirectory. All other modules were placed in the gbrowse/round4/Extropy/Extropy subdirectory.
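Extropy loads all given data into one central MySQL database that records the relationships between features. The sketch below illustrates that idea under stated assumptions: Python's built-in sqlite3 stands in for MySQL, and the two-table schema (a feature table plus a parent/child relation table) and the sample contig and clone names are hypothetical, not Extropy's actual schema.

```python
import sqlite3

# Sketch only: sqlite3 stands in for MySQL, and this schema is a
# hypothetical illustration of features plus their relationships.
SCHEMA = """
CREATE TABLE feature (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    kind TEXT NOT NULL            -- e.g. 'contig', 'clone', 'locus'
);
CREATE TABLE relation (
    parent_id INTEGER NOT NULL REFERENCES feature(id),
    child_id  INTEGER NOT NULL REFERENCES feature(id),
    PRIMARY KEY (parent_id, child_id)
);
"""


def build(conn, features, relations):
    """Create the tables, insert (name, kind) features, and link
    (parent_name, child_name) pairs by looking up their ids."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO feature (name, kind) VALUES (?, ?)", features)
    conn.executemany(
        """INSERT INTO relation (parent_id, child_id)
           SELECT p.id, c.id FROM feature p, feature c
           WHERE p.name = ? AND c.name = ?""",
        relations,
    )
    conn.commit()


def children(conn, parent_name):
    """Return the names of features related to parent_name, sorted."""
    rows = conn.execute(
        """SELECT c.name FROM relation r
           JOIN feature p ON p.id = r.parent_id
           JOIN feature c ON c.id = r.child_id
           WHERE p.name = ? ORDER BY c.name""",
        (parent_name,),
    )
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
build(conn,
      [("ctg1", "contig"), ("cloneA", "clone"), ("cloneB", "clone")],
      [("ctg1", "cloneA"), ("ctg1", "cloneB")])
print(children(conn, "ctg1"))  # ['cloneA', 'cloneB']
```

Keeping relationships in their own table is what lets a change to one feature be traced to every feature that depends on it.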

Extropy was designed to make it easy to reprocess the output if any data changes at any stage of processing. To that end, the Extropy Main Menu is meant to be used from the top down: if a Main Menu item changes any data, every item below it must also be reprocessed. Here is the Main Menu:

    **************************************
    ***********  extropy menu  ***********
    **************************************
    *                                    *
    *   1  (A)ctivate a project          *
    *   2  - Read (M)LG File             *
    *   3  - Read (L)ocus File           *
    *   4  - Read (F)PC File             *
    *   5  - Read Q(T)L File             *
    *   6  - Read (E)nd Matches File     *
    *   7  - Read MT(P) File             *
    *   8  - Read EST File               *
    *   9  - Read Sequence File          *
    *  10  - Read Related Genes File     *
    *  11  - Read Confirmed File         *
    *  12  - (C)rosscheck Locus Names    *
    *  13  - (U)pdate Locus Names        *
    *  14  - C(o)unt Clone Anchors       *
    *  15  - Up(d)ate Clone Anchors      *
    *  16  - Cou(n)t Contig Anchors      *
    *  17  - Update Cont(i)g Anchors     *
    *  18  - (G)FF File Menu             *
    *  19  (S)tatus                      *
    *  20  - (R)eset Project             *
    *  21  (H)elp                        *
    *  22  (Q)uit                        *
    *                                    *
    **************************************
    **************************************

    Enter a menu item...

The minus signs (-) in front of some selections indicate that those selections cannot be made yet. In the above case, a project must be activated before most of the other menu selections can be made. Nearly every menu item calls a dedicated Perl module to carry out that item.
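The two Main Menu rules (items are unavailable until a project is activated, and reprocessing one item makes every item below it out of date) can be modelled in a few lines. This is a hypothetical Python sketch, not Extropy's Perl code; the Menu class and the abbreviated step list are illustrative assumptions.

```python
# Hypothetical sketch (Python, not Extropy's Perl) of two Main Menu rules:
# items are shown with a "-" and refused until a project is activated,
# and re-running one step marks every step below it as needing a redo.

# Abbreviated, illustrative subset of the real menu items.
STEPS = [
    "Read MLG File",
    "Read Locus File",
    "Read FPC File",
    "Crosscheck Locus Names",
    "GFF File Menu",
]


class Menu:
    def __init__(self, steps):
        self.steps = list(steps)
        self.active = False
        # Nothing has been processed yet, so every step is stale.
        self.stale = set(range(len(self.steps)))

    def activate(self):
        self.active = True

    def label(self, i):
        """Prefix the item with '- ' while it cannot be selected."""
        prefix = "" if self.active else "- "
        return prefix + self.steps[i]

    def run(self, i):
        if not self.active:
            raise RuntimeError("activate a project first")
        # Step i is now up to date, but its output may have changed,
        # so every step below it must be reprocessed.
        self.stale.discard(i)
        self.stale.update(range(i + 1, len(self.steps)))

    def to_redo(self):
        """Names of stale steps, in menu order."""
        return [self.steps[i] for i in sorted(self.stale)]


menu = Menu(STEPS)
menu.activate()
for i in range(len(STEPS)):
    menu.run(i)
menu.run(1)  # the locus file changed and was read again
print(menu.to_redo())  # ['Read FPC File', 'Crosscheck Locus Names', 'GFF File Menu']
```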

Extropy requires that MySQL be accessible. Also, due to the wide variety of ways that given data is provided, programming expertise would be required to adapt Extropy to another location.

The main Extropy program file is extropy. All this main file does is call the Main Menu module, in Extropy::MenuMain.pm. These general purpose modules are used in the Extropy system:

These modules read the given data:

These modules process the data:

This module produces the GFF file:

The resulting GFF file is here. The accompanying configuration file is here. The detail CGI code, which displays additional information about a feature selected in GBrowse, is here.

These are examples of interim files used by the above modules:

Here are images produced with Microsoft Access describing relationships between the tables in the database:

A mysqldump file of the database is available upon request.


              Deepak
http://soybeangenome.siu.edu
Last update: July 31, 2005.