The UK Crop Plant Bioinformatics Network

BrassicaDB Process

EMBL Release 82

( ... and on )


The following notes describe the processing of EMBL release 82, and supplement the general details of the process in BrassicaDB Nucleotide Sequence Process.


Monday 21st March 2005

Extracted Brassica accessions from the EMBL flat files (excluding the CON and WGS divisions).
Division           Brassica         EMBL
EST                  59,838   25,887,122
HTG                       5       70,881
GSS                 614,800   11,146,857
Other sequences       2,857    5,183,841
Total               677,500   42,288,701
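
The extraction step above can be sketched as a filter over flat-file entries. This is only an illustration, not the script used for BrassicaDB: it assumes entries are terminated by a "//" line and that Brassica entries are recognised by an OS (organism species) line beginning with the genus name. The function names and the sample records are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple


def iter_entries(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield each flat-file entry as a list of lines (entries end with '//')."""
    entry: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("//"):
            if entry:
                yield entry
            entry = []
        else:
            entry.append(line)
    if entry:  # tolerate a final entry that lacks a terminator
        yield entry


def is_brassica(entry: List[str]) -> bool:
    """Assumption: an entry is kept if an OS line starts with the genus Brassica."""
    for line in entry:
        if line.startswith("OS") and line[2:].strip().startswith("Brassica "):
            return True
    return False


def count_brassica(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (Brassica entries, all entries) for one flat file."""
    kept = total = 0
    for entry in iter_entries(lines):
        total += 1
        if is_brassica(entry):
            kept += 1
    return kept, total


if __name__ == "__main__":
    # Made-up records for illustration only; not real EMBL entries.
    sample = [
        "ID   XX000001   standard; RNA; PLN; 120 BP.",
        "OS   Brassica napus (rape)",
        "//",
        "ID   XX000002   standard; DNA; PLN; 300 BP.",
        "OS   Arabidopsis thaliana (thale cress)",
        "//",
    ]
    print(count_brassica(sample))
```

Run once per division file, counts of this kind would give the two columns of the table above.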


Monday 11th April 2005

Downloaded and parsed UniProt Release 4.3 (Swiss-Prot Release 46.3, TrEMBL Release 29.3), released 15/3/2005 (results). Built the SPTrEMBL database for the BrassicaDB BLAST compute.


Tuesday 12th April 2005

Loading the ace files generated from EMBL r82 into an empty database gave ...

Class       # objects   % change (cf r81)
DNA           677,499   +2.9%
Paper           1,118   +2.1%
Protein             0   0
Sequence      680,352   +2.9%

Building the database with EMBL r82, post-r82 Brassica updates to r82u040 (13/04/05), the BBSRC SSR data, UniProt 4.3 and the BrassicaDB legacy dataset gives ...

Class           # objects
Author             10,194
DNA               677,947
Gene_Product        1,191
Paper               7,365
Journal             1,210
Peptide             1,981
Protein             1,981
Sequence          680,880
Species                26


Wednesday 13th - Tuesday 19th April 2005

Started the BrassicaDB BLAST compute at 1730, this comprising:


Wednesday 20th April 2005

Processed the BLAST output into ace files and applied them to the database. Included the parsed in silico mappings of Brassica GSSs, ESTs and SSR flanking sequences to the Arabidopsis genome database. Forged the intra-database links, but not the links to the deprecated Mendel-GFDb database.
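
As a rough illustration of turning BLAST output into ace files, the sketch below converts tab-separated hits into .ace paragraphs, one per query sequence. It rests on several assumptions not stated in these notes: that the hits are in the 12-column tabular BLAST format, that protein hits are recorded under a Pep_homol tag on the Sequence object (as in some standard AceDB models), and that the method name is "BLASTX". The function name is hypothetical, and the actual BrassicaDB schema and scripts may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def blast_tab_to_ace(rows: Iterable[str], method: str = "BLASTX") -> str:
    """Convert 12-column tabular BLAST hits to .ace text.

    Assumed columns: query, subject, %identity, length, mismatches,
    gap opens, q.start, q.end, s.start, s.end, e-value, bit score.
    """
    hits: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        row = row.strip()
        if not row or row.startswith("#"):
            continue  # skip blank and comment lines
        fields = row.split("\t")
        if len(fields) < 12:
            continue  # skip malformed rows
        query, subject = fields[0], fields[1]
        q_start, q_end, s_start, s_end = fields[6:10]
        bits = fields[11]
        # Hypothetical tag layout: protein, method, score, query span, subject span.
        hits[query].append(
            f'Pep_homol "{subject}" "{method}" {bits} '
            f"{q_start} {q_end} {s_start} {s_end}"
        )
    # One paragraph per object; joining with a newline leaves a blank line between them.
    paragraphs = [
        f'Sequence : "{query}"\n' + "\n".join(lines) + "\n"
        for query, lines in hits.items()
    ]
    return "\n".join(paragraphs)
```

Grouping the hits by query means each Sequence object is written once, with all of its hits in a single paragraph.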

Mirrored the binary DB files to both jicbio and UK CropNet servers.


Thursday 21st April 2005

The rebuilt database is available from both servers. Outlinks to UniProt records for the Other_Protein class are not yet in place because of a change in the UniProt database structure; they will be added soon.