Round 3, SIU Soybean Genome


 
 

Methodology

This page documents Round 3 of converting soybean genomic data so that it can be displayed in GBrowse. Round 3 refers to the programming method and not to the data.

Round 3 Update.

Distinguishing characteristics of Round 3:

  • The programming consolidates and streamlines Round 2.
  • The data is divided into sets for description and manipulation. See ontology.

There are two types of data sources: FPC results and loci locations. The FPC results are used to establish relationships between loci, clones, and contigs. These relationships are then used together with the loci locations to place the features on a GBrowse physical map.

Putting the Loci Locations into a Standard Format

Three sources of loci locations were available:

  1. input.txt is the original loci data used in Round 1.
  2. raw_loci.txt comes from a file supplied by Soybase at Iowa State University.
  3. usda.txt is a tab-delimited file created from a March 2003 USDA spreadsheet.

These three files were in different formats, and future loci data files will likely differ as well. For consistency, the loci locations were converted to the format used by sorted_locus_anchors.txt in Round 2. Data from any one of the three files, or from a combination of them, could be used at a time: whatever was written to sorted_locus_anchors.txt was the data used in the later steps.

input.txt was converted to sorted_locus_anchors.txt as described in Round 2. extract_loci.plx was used to convert the data from raw_loci.txt, and extract_loci2.plx was used to convert the data from usda.txt. Note the new use of the .plx file extension, which stands for PerL eXecutable and was adopted to conform with the style preferred by the Perl community.
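The conversion step can be illustrated with a minimal sketch. It is written in Python rather than Perl and is not the code of extract_loci.plx or extract_loci2.plx. The input column order (locus, linkage group, start, end), the output column order, and all sample names and positions are assumptions made up for the example; the only detail taken from this page is that the anchor file carries start, end, and midpoint values.

```python
import csv
import io


def loci_to_anchors(tsv_text):
    """Convert tab-delimited loci rows into sorted anchor rows.

    Assumed (hypothetical) input columns: locus, linkage group, start, end.
    Returns tuples of (linkage group, locus, start, end, midpoint),
    sorted by linkage group and then by start position.
    """
    anchors = []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 4 or row[0].startswith("#"):
            continue  # skip blank, short, or comment lines
        locus, group = row[0].strip(), row[1].strip()
        # Tolerate rows whose start and end are given in reverse order.
        start, end = sorted((float(row[2]), float(row[3])))
        anchors.append((group, locus, start, end, (start + end) / 2))
    anchors.sort(key=lambda a: (a[0], a[2], a[1]))
    return anchors


def format_anchors(anchors):
    """Render anchor tuples as tab-delimited text, one locus per line."""
    return "\n".join(
        f"{group}\t{locus}\t{start:.2f}\t{end:.2f}\t{mid:.2f}"
        for group, locus, start, end, mid in anchors
    )


# Invented sample rows; real files would need their own column mapping.
sample = "LocusB\tA1\t30.9\t30.9\nLocusA\tA1\t25.0\t27.0\nLocusC\tB2\t9.0\t5.0\n"
print(format_anchors(loci_to_anchors(sample)))
```

Because each source file had its own layout, a real converter would need one small reader per input format, all producing the same anchor rows. A header row, if present, would also have to be skipped before the numeric conversion.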

Each of the datasets was tried, as well as combinations of them. The USDA dataset was then chosen as the one to continue with.

Putting the FPC Results into Usable Formats

Three sources of FPC results were available:

  1. The original fpc.txt file used in Round 1 (Version 3 data).
  2. The fpc2.txt file used in Round 2 (Version 4 data).
  3. webdata.fpc, an alternative build of Version 3 data.

examine_fpc_file.plx was updated from examine_fpc_file.pl to handle the different naming conventions of webdata.fpc (such as putting question marks after MLG names).

 

Whichever FPC file is given to examine_fpc_file.plx, the same three output files are created: sorted_clone2locus_relations.txt, sorted_locus2clone_relations.txt, and sorted_contig2clone_relations.txt.
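The relationship between the three output files can be sketched as follows. This is an illustrative Python example, not the logic of examine_fpc_file.plx: it assumes the FPC file has already been parsed into one record per clone, and the record fields (clone, contig, left, right, loci) and sample values are hypothetical.

```python
def build_relations(clone_records):
    """Build three sorted relation lists from parsed clone records.

    Each record is a hypothetical dict holding a clone name, its contig,
    the left and right band positions in that contig, and the loci
    associated with the clone.
    """
    clone2locus = set()
    contig2clone = set()
    for rec in clone_records:
        contig2clone.add((rec["contig"], rec["clone"], rec["left"], rec["right"]))
        for locus in rec["loci"]:
            clone2locus.add((rec["clone"], locus))
    # Same pairs as clone2locus, keyed and sorted by locus instead of clone.
    locus2clone = sorted((locus, clone) for clone, locus in clone2locus)
    return sorted(clone2locus), locus2clone, sorted(contig2clone)


# Invented records for illustration only.
records = [
    {"clone": "cloneB", "contig": "ctg2", "left": 10, "right": 40,
     "loci": ["locus1", "locus2"]},
    {"clone": "cloneA", "contig": "ctg1", "left": 0, "right": 25,
     "loci": ["locus2"]},
]
clone2locus, locus2clone, contig2clone = build_relations(records)
for pair in locus2clone:
    print("\t".join(pair))
```

Keeping both a clone-sorted and a locus-sorted copy of the same pairs trades a little disk space for simple sequential lookups in either direction.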

Putting it all Together

After preparing the chosen loci location data and the chosen FPC output data, these were the files ready to be put together into a GFF file:

  • mlg.gff contained the MLG data from Round 2. It was put into the new GFF file without any changes.
  • qtl.gff contained the QTL data from Round 2. It was also put into the new GFF file without any changes.
  • sorted_locus_anchors.txt contained the locus anchor positions from above. This is equal to the set of Σ Loci. These positions were the basis for the locations of the clones and the contigs. The numeric fields in this file are the start, end, and midpoint of each anchor location.
  • sorted_clone2locus_relations.txt contained the relationships between clones and loci obtained from the chosen FPC file. This is equal to the sets of Σβ Loci, γ Loci, Σβ1 Clones, and γ Clones.
  • sorted_locus2clone_relations.txt contained the relationships between clones and loci obtained from the chosen FPC file. This is the same data as the previous file, but it was sorted by loci instead of by clones.
  • sorted_contig2clone_relations.txt contained the relationships between contigs and clones obtained from the chosen FPC file. This is equal to the sets of Σβ2 Clones, γ Clones, and Σ Contigs. The numeric fields refer to starting and ending bands.

place_features.plx took the above files as input and created a new GFF file.
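As a rough illustration of this step, the sketch below joins anchored loci with locus-to-clone pairs and writes nine-column GFF lines. It is a Python sketch under stated assumptions, not the method of place_features.plx: the rule that a clone spans all of its anchored loci, the source label "sketch", the attribute layout, and the sample values are all hypothetical.

```python
def gff_line(ref, source, feature, start, end, name):
    """Format one nine-column GFF line; score, strand, and frame are unset."""
    attrs = f'{feature} "{name}"'
    return "\t".join(
        [ref, source, feature, str(start), str(end), ".", ".", ".", attrs]
    )


def place_clones(anchors, locus2clone):
    """Place each clone over the span of its anchored loci.

    anchors: {locus: (reference, start, end)}
    locus2clone: iterable of (locus, clone) pairs
    """
    spans = {}
    for locus, clone in locus2clone:
        if locus not in anchors:
            continue  # no known position for this locus, so skip the pair
        ref, start, end = anchors[locus]
        key = (ref, clone)
        old_start, old_end = spans.get(key, (start, end))
        spans[key] = (min(old_start, start), max(old_end, end))
    return [
        gff_line(ref, "sketch", "clone", start, end, clone)
        for (ref, clone), (start, end) in sorted(spans.items())
    ]


# Invented positions and names for illustration only.
anchors = {"locus1": ("A1", 100, 120), "locus2": ("A1", 300, 310)}
pairs = [("locus1", "cloneB"), ("locus2", "cloneA"),
         ("locus2", "cloneB"), ("locus9", "cloneC")]
print("\n".join(place_clones(anchors, pairs)))
```

Contigs could be placed the same way from the contig-to-clone relations once the clones have positions. Features that end up overlapping would still need a separate pass to spread them out.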

Contig conflicts were spread out using the Boats-on-a-Lake Algorithm.

Summary

Round 3 put the loci locations into a standard format, put the relevant FPC data into a usable format, and combined all of this into a GFF file.


Deepak
http://soybeangenome.siu.edu
Last update: July 31, 2005.