For BrassicaDB we perform a range of BLAST analyses.
For each set of sequences to be BLASTed, we dump the sequences out of the database in FASTA format. These files are transferred to the cluster along with a copy of the BLAST databases (EMBL, SPTrEMBL, BrassicaDB protein).
The data sets are generated using the following queries:
find Sequence_EST *; follow DNA
find Sequence * Probe; follow DNA
find Peptide *
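Before the dumped files are copied to the cluster, it can be worth confirming that each one looks like sensible FASTA. The following is a minimal sketch in Python, not part of the BrassicaDB scripts: the function name `fasta_summary` is hypothetical, and it assumes that the sequence identifier is the first word after `>` on each header line.

```python
from collections import Counter


def fasta_summary(lines):
    """Return (record_count, duplicate_ids, empty_ids) for FASTA text lines.

    Hypothetical helper: assumes the identifier is the first word after '>'.
    """
    ids = []
    lengths = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: remember the identifier of the record being read.
            parts = line[1:].split()
            current = parts[0] if parts else ""
            ids.append(current)
            lengths.setdefault(current, 0)
        elif current is not None:
            # Sequence line: add its residues to the current record.
            lengths[current] += len(line)
    counts = Counter(ids)
    duplicates = sorted(i for i, n in counts.items() if n > 1)
    empty = sorted(i for i, n in lengths.items() if n == 0)
    return len(ids), duplicates, empty
```

Calling `fasta_summary(open(path))` on a dumped file gives the number of records, any identifiers that occur more than once, and any identifiers with no sequence, which can be compared against the number of objects the query returned.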
All the BLASTs are performed using WU-BLAST 2.0.
Currently we use a small Linux cluster to perform the BLASTs.
Used for all the normal BLASTNs.
Used for all the BLASTXes.
The BLASTs are performed in two runs: one for EMBL and SPTrEMBL and one for the BrassicaDB BLASTs.
A small script (rebuild.sh) is used to automate each run.
% ./rebuild.sh
The files produced are just a long list of BLAST reports. Before these can be parsed into ace files, a simple script (lc2blasts.pl) is used to separate them into individual BLAST reports (one per file).
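The idea behind the splitting step can be sketched in Python as below. This is an illustration only, not the code of lc2blasts.pl: the function name `split_reports` is hypothetical, and it assumes that every report begins with a line starting with the program name in capitals (for example `BLASTX` or `BLASTN`), which should be checked against the actual output files.

```python
import re

# Assumption: a new report starts at a line beginning with the program name.
REPORT_START = re.compile(r"^T?BLAST[NPX]\b")


def split_reports(lines):
    """Group the lines of a concatenated BLAST output into separate reports.

    Lines that appear before the first report header are discarded.
    """
    reports = []
    current = None
    for line in lines:
        if REPORT_START.match(line):
            # A header line closes the previous report and opens a new one.
            if current is not None:
                reports.append(current)
            current = []
        if current is not None:
            current.append(line)
    if current is not None:
        reports.append(current)
    return reports
```

Each element of the returned list holds the lines of one report, so writing one report per file is a matter of looping over the list and writing each element to its own file.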
% ~/perl/seq2ace/blast2ace/lc2blasts.pl ../archive/Brassica_ESTs.blast SPTrEMBL
The BLAST parsing script is then run over the individual BLAST reports to produce an ace file. This ace file is then read into the database.
% ~/perl/seq2ace/blast2ace/blast2ace.pl BLASTX_SPTREMBL SPTrEMBL "*" EST_SPTrEMBL_BLASTX-20000407.ace similar protein 1 "" 0
Note: remember to check the ace file is correct before reading it into the database.
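Part of this check can be automated. The sketch below is one possible sanity check in Python, not an existing BrassicaDB tool: the function name `check_ace` is hypothetical, and it assumes the usual ace layout in which objects are separated by blank lines, the first line of each object starts with the class name, and lines beginning with `//` are comments. It does not replace reading through the file by eye.

```python
from collections import Counter


def check_ace(lines):
    """Return (objects_per_class, problems) for the lines of an ace file.

    Hypothetical helper: counts objects by the first word of each paragraph
    and reports lines with an odd number of unescaped double quotes.
    """
    per_class = Counter()
    problems = []
    at_object_start = True
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line ends the current object.
            at_object_start = True
            continue
        if line.lstrip().startswith("//"):
            continue  # comment line
        # Ignore escaped quotes, then require the remaining quotes to pair up.
        if line.replace('\\"', "").count('"') % 2:
            problems.append((number, "unbalanced quote"))
        if at_object_start:
            parts = line.replace(":", " ").split()
            if parts:
                per_class[parts[0]] += 1
            at_object_start = False
    return per_class, problems
```

The per-class counts can be compared with the number of BLAST reports that were parsed, and any entry in `problems` gives the line number to inspect before the file is read into the database.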
Last modified: Wed Jan 5 16:42:39 GMT 2005