The following notes describe the processing of EMBL release 82, and supplement the general details of the process in BrassicaDB Nucleotide Sequence Process.
Extracted Brassica accessions from the EMBL flat files (excluding the CON and WGS divisions):
| Division | Brassica | EMBL |
|---|---|---|
| EST | 59,838 | 25,887,122 |
| HTG | 5 | 70,881 |
| GSS | 614,800 | 11,146,857 |
| Other sequences | 2,857 | 5,183,841 |
| Total | 677,500 | 42,288,701 |
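The extraction step can be sketched roughly as follows. This is a minimal illustration, not the script actually used for BrassicaDB: it assumes the standard EMBL flat-file layout (two-letter line codes, data starting at column 6, entries terminated by `//`), and it assumes the CON and WGS divisions are excluded simply by not reading those files. The function name `brassica_accessions` is hypothetical.

```python
from typing import Iterable, Iterator


def brassica_accessions(lines: Iterable[str]) -> Iterator[str]:
    """Yield the primary accession of each entry whose OS line names a Brassica species.

    Assumes EMBL flat-file conventions: 'AC' lines hold ';'-separated
    accessions, 'OS' lines hold the organism name, '//' ends an entry.
    """
    accession = None
    is_brassica = False
    for line in lines:
        code = line[:2]
        if code == "AC" and accession is None:
            # Only the first accession on the first AC line is the primary one.
            accession = line[5:].split(";")[0].strip()
        elif code == "OS" and line[5:].split()[:1] == ["Brassica"]:
            # Genus is the first word of the organism name.
            is_brassica = True
        elif code == "//":
            if is_brassica and accession:
                yield accession
            # Reset state for the next entry.
            accession, is_brassica = None, False
```

Counting the yielded accessions per division file would give figures of the kind shown in the Brassica column above.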
Downloaded and parsed UniProt Release 4.3 (Swiss-Prot Release 46.3, TrEMBL Release 29.3), released 15/3/2005. Built the SPTrEMBL database for the BrassicaDB BLAST compute.
Loading the ace files generated from EMBL r82 into an empty database gave ...
| Class | # objects | % change (cf r81) |
|---|---|---|
| DNA | 677,499 | +2.9% |
| Paper | 1,118 | +2.1% |
| Protein | 0 | 0 |
| Sequence | 680,352 | +2.9% |
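The "% change" column is presumably the relative difference between the r81 and r82 object counts. A small sketch of that calculation is given below; the function name `percent_change` is hypothetical, and treating an empty class in both releases as 0 (as in the Protein row) is an assumption about how the table was filled in.

```python
def percent_change(previous: int, current: int) -> float:
    """Return the percentage change from `previous` to `current`.

    Assumption: a class that is empty in both releases is reported as 0,
    matching the Protein row of the table.
    """
    if previous == 0:
        if current == 0:
            return 0.0
        # No meaningful percentage exists when growing from nothing.
        raise ValueError("percent change is undefined when the previous count is 0")
    return (current - previous) / previous * 100.0
```

Formatting the result with `f"{value:+.1f}%"` gives the signed one-decimal style used in the table.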
Building the database with EMBL r82, post-r82 Brassica updates to r82u040 (13/04/05), the BBSRC SSR data, UniProt 4.3 and the BrassicaDB legacy dataset gives ...
| Class | # objects |
|---|---|
| Author | 10,194 |
| DNA | 677,947 |
| Gene_Product | 1,191 |
| Paper | 7,365 |
| Journal | 1,210 |
| Peptide | 1,981 |
| Protein | 1,981 |
| Sequence | 680,880 |
| Species | 26 |
Started the BrassicaDB BLAST compute at 1730, this comprising:
Processed the BLAST output into ace files and applied them to the database. Included parsed in silico mappings of Brassica GSSs, ESTs and SSR flanking sequences to the Arabidopsis genome database. Forged the intra-database links, but not the links to the deprecated Mendel-GFDb database.
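As an illustration of the BLAST-to-ace conversion, the sketch below turns tabular BLAST hits into ace-style text. Several things here are assumptions rather than the BrassicaDB method: that the BLAST output is in the 12-column tab-separated format, that hits are attached to `Sequence` objects, the tag name `Blast_hit`, the E-value cutoff, and the function name `blast_tab_to_ace`. The real tags depend on the BrassicaDB model, which is not described in these notes.

```python
from collections import defaultdict
from typing import Iterable


def blast_tab_to_ace(rows: Iterable[str], max_evalue: float = 1e-10) -> str:
    """Convert 12-column tabular BLAST rows into ace-format text.

    Columns used: 0 = query id, 1 = subject id, 10 = E-value, 11 = bit score.
    'Blast_hit' is a hypothetical tag, not taken from the BrassicaDB model.
    """
    hits = defaultdict(list)
    for row in rows:
        if not row.strip() or row.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = row.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bits = float(fields[10]), float(fields[11])
        if evalue <= max_evalue:
            hits[query].append((subject, bits, evalue))

    blocks = []
    for query, found in hits.items():
        # One ace object per query; tag lines follow the object header.
        lines = [f'Sequence : "{query}"']
        for subject, bits, evalue in found:
            lines.append(f'Blast_hit "{subject}" {bits:g} {evalue:g}')
        blocks.append("\n".join(lines))
    # Ace objects are separated by blank lines.
    return "\n\n".join(blocks) + "\n" if blocks else ""
```

Grouping hits by query keeps each object in a single block, so the ace file can be applied to the database in one pass.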
Mirrored the binary DB files to both the jicbio and UK CropNet servers.
The rebuilt database is available from both servers. Outlinks to UniProt records for the Other_Protein class are not yet in place because of a change in the UniProt database structure; they will be added soon.
Last modified: Wed Aug 24 13:23:36 BST 2005