The UK Crop Plant Bioinformatics Network

BLAST Process


For BrassicaDB we perform a range of BLAST analyses:

For each set of sequences to be BLASTed we dump them out of the database in FASTA format. These files are transferred to the cluster along with a copy of the BLAST databases (EMBL, SPTrEMBL, BrassicaDB protein).
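
As an illustration of the FASTA format used for the dump, here is a minimal Python sketch. The tool actually used to export sequences from the database is not described here, so the function name `write_fasta`, the record layout, and the 60-character line width are all assumptions made for the example.

```python
import io


def write_fasta(records, handle, width=60):
    """Write (identifier, description, sequence) tuples to handle as FASTA."""
    for identifier, description, sequence in records:
        # Header line: '>' followed by the identifier and optional description.
        header = f">{identifier} {description}".rstrip()
        handle.write(header + "\n")
        # Wrap the sequence so no line is longer than `width` characters.
        for start in range(0, len(sequence), width):
            handle.write(sequence[start:start + width] + "\n")


# Hypothetical records standing in for sequences pulled from the database.
example_records = [
    ("EST0001", "hypothetical Brassica EST", "ACGT" * 20),
    ("EST0002", "", "TTGACC"),
]

buffer = io.StringIO()
write_fasta(example_records, buffer)
fasta_text = buffer.getvalue()
print(fasta_text, end="")
```

In practice the handle would be an open file rather than an in-memory buffer, with one file written per data set.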

The data sets are generated using the following queries:

All the BLASTs are performed using WU-BLAST 2.0.

Currently we use a small Linux cluster to perform the BLASTs.

The BLASTs are performed in two runs: one for EMBL and SPTrEMBL and one for the BrassicaDB BLASTs.

A small script (rebuild.sh) is used to automate each run.

% ./rebuild.sh

Each output file is simply a long list of concatenated BLAST reports. Before these can be parsed into ace files, a simple script (lc2blasts.pl) is used to separate them into individual BLAST reports (one to a file).

% ~/perl/seq2ace/blast2ace/lc2blasts.pl ../archive/Brassica_ESTs.blast SPTrEMBL
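
The splitting step can be sketched in Python as follows. This is not the code of lc2blasts.pl, whose contents are not shown here. It assumes that every report begins with a line naming the BLAST program (for example BLASTX), and the function names and output file naming scheme are hypothetical.

```python
import re
from pathlib import Path

# Assumption: each report starts with a line naming the BLAST program.
REPORT_START = re.compile(r"^(T?BLAST[NPX])\b")


def split_reports(text):
    """Return a list of individual report strings from concatenated output."""
    reports = []
    current = []
    for line in text.splitlines(keepends=True):
        if REPORT_START.match(line):
            # A new report begins: store the one collected so far.
            if current:
                reports.append("".join(current))
            current = [line]
        elif current:
            # Lines before the first report header are dropped.
            current.append(line)
    if current:
        reports.append("".join(current))
    return reports


def write_reports(reports, out_dir, prefix):
    """Write each report to its own file (hypothetical naming scheme)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, report in enumerate(reports, start=1):
        path = out_dir / f"{prefix}_{index:05d}.blast"
        path.write_text(report)
        paths.append(path)
    return paths


# Tiny made-up input with two report headers.
combined = (
    "BLASTX 2.0 (example header)\n"
    "Query=  EST0001\n"
    "\n"
    "BLASTX 2.0 (example header)\n"
    "Query=  EST0002\n"
)
reports = split_reports(combined)
print(len(reports))
```

If a report body could itself contain a line starting with a program name, a stricter start-of-report test would be needed.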

The BLAST parsing script is then run over the BLAST reports to produce an ace file. This ace file is then read into the database.

% ~/perl/seq2ace/blast2ace/blast2ace.pl BLASTX_SPTREMBL SPTrEMBL "*" EST_SPTrEMBL_BLASTX-20000407.ace similar protein 1 "" 0

Note: remember to check the ace file is correct before reading it into the database.
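
What counts as "correct" depends on the database models, which are not described here, so the following Python sketch only performs a rough structural check. It assumes the usual ace layout: objects separated by blank lines, each starting with a `Class : "Name"` line, with `//` marking comments. The function name `check_ace` and the tag names in the sample are hypothetical.

```python
import re

# An object header looks like: Class : "Name"  (colon and quotes optional).
HEADER = re.compile(r'^\w+(?:\s*:\s*|\s+)(?:"[^"]+"|\S+)\s*$')
# A double quote that is not escaped with a backslash.
UNESCAPED_QUOTE = re.compile(r'(?<!\\)"')


def check_ace(text):
    """Return (object_count, problems) for ace-formatted text."""
    problems = []
    object_count = 0
    in_object = False
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("//"):
            continue  # comment line
        if not stripped:
            in_object = False  # a blank line ends the current object
            continue
        if not in_object:
            # First line of a new object must be a header.
            in_object = True
            object_count += 1
            if not HEADER.match(stripped):
                problems.append((number, "unexpected object header"))
        # An odd number of unescaped quotes suggests a truncated string.
        if len(UNESCAPED_QUOTE.findall(line)) % 2:
            problems.append((number, "unbalanced double quote"))
    return object_count, problems


# Made-up sample: the second object has a missing closing quote.
sample = (
    'Sequence : "EST0001"\n'
    'Pep_homol "SPTREMBL:Q00001" BLASTX_SPTREMBL 120.0 1 300 10 110\n'
    '\n'
    'Sequence : "EST0002\n'
    'Pep_homol "SPTREMBL:Q00002" BLASTX_SPTREMBL 85.5 4 201 50 115\n'
)
count, problems = check_ace(sample)
print(count, problems)
```

A check like this catches only gross formatting damage. Comparing the object count with the number of BLAST reports parsed, and reading a few entries by eye, is still worthwhile.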