From SNIC Documentation
BLAST (Basic Local Alignment Search Tool) is a software package for aligning nucleotide or amino acid sequences. Its primary use is to search databases for sequences that are similar to a given query sequence.
|Resource||Centre||Description|
|Beda||C3SE||throughput cluster resource|
|Kalkyl||UPPMAX||cluster resource of about 21 TFLOPS|
|Kappa||NSC||throughput cluster resource of 26 TFLOPS|
|Matter||NSC||cluster resource of 37 TFLOPS dedicated to materials science|
|Triolith||NSC||capability cluster with 338 TFLOPS peak and 1:2 InfiniBand fat-tree|
Two BLAST versions are in widespread use: the legacy NCBI BLAST and BLAST+, a newer rewrite.
BLAST+ was written to improve performance and maintainability and to make it easier to introduce new features. It is similar in most respects, and it has been made almost completely backwards compatible through a wrapper script called ./legacy_blast.pl. New projects are encouraged to use BLAST+ if at all possible.
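The following sketch illustrates what the compatibility wrapper does by translating a handful of common legacy blastall options into their BLAST+ equivalents. It is a hypothetical and deliberately partial helper. The names translate_blastall and OPTION_MAP are illustrative only, and this is not the actual logic of ./legacy_blast.pl. Any option it does not recognise is rejected instead of guessed.

```python
import shlex
from typing import List

# Hypothetical, partial mapping of common legacy blastall options to BLAST+.
# The real ./legacy_blast.pl wrapper handles many more options than this.
OPTION_MAP = {
    "-d": "-db",           # database name
    "-i": "-query",        # query FASTA file
    "-o": "-out",          # output file
    "-e": "-evalue",       # E-value threshold
    "-a": "-num_threads",  # number of processor cores
}

def translate_blastall(cmdline: str) -> List[str]:
    """Translate a simple 'blastall -p PROGRAM ...' command line to BLAST+ form."""
    args = shlex.split(cmdline)
    if not args or args[0] != "blastall":
        raise ValueError("expected a blastall command line")
    program = None
    translated: List[str] = []
    it = iter(args[1:])
    for flag in it:
        value = next(it, None)  # every option handled here takes one value
        if value is None:
            raise ValueError(f"option {flag} is missing a value")
        if flag == "-p":
            program = value  # e.g. blastn or blastp; becomes the BLAST+ binary name
        elif flag == "-m" and value == "8":
            translated += ["-outfmt", "6"]  # tabular output
        elif flag in OPTION_MAP:
            translated += [OPTION_MAP[flag], value]
        else:
            raise ValueError(f"option {flag} is not handled by this sketch")
    if program is None:
        raise ValueError("blastall requires -p PROGRAM")
    return [program] + translated
```

For example, translate_blastall("blastall -p blastn -d nt -i query.fa -o hits.txt -e 1e-5 -a 4 -m 8") returns ['blastn', '-db', 'nt', '-query', 'query.fa', '-out', 'hits.txt', '-evalue', '1e-5', '-num_threads', '4', '-outfmt', '6'].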
Many BLAST features require access to database flat files. Standard practice on a compute cluster is to copy all necessary files to a node-local directory before doing any work with them. This is strongly encouraged on most resources, because many simultaneous accesses to the same large files on a shared disk are likely to cause problems for every computation running on the resource, not only for the owner of the badly behaving jobs. For this reason, most SNIC resources provide amenities that help you run your BLAST jobs optimally (for example the $BLASTDB environment variable, which tells BLAST where to find its databases).
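The staging step can be sketched in Python as follows. The sketch assumes that the node-local scratch directory is named by the $SNIC_TMP environment variable (check the documentation for your resource). It also assumes that all files of a database share the database name as a prefix, as in nt.00.nhr or nt.nal. The name stage_blast_db is a hypothetical helper, not part of BLAST. Setting BLASTDB through os.environ only affects BLAST processes started from this Python process.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Optional

def stage_blast_db(shared_dir: str, db_name: str, local_root: Optional[str] = None) -> Path:
    """Copy every file of a BLAST database from shared storage to node-local disk.

    local_root defaults to $SNIC_TMP (assumed to be the node-local scratch
    directory) and falls back to the system temporary directory.
    """
    if local_root is None:
        local_root = os.environ.get("SNIC_TMP", tempfile.gettempdir())
    local_dir = Path(local_root) / "blastdb"
    local_dir.mkdir(parents=True, exist_ok=True)

    # A BLAST database consists of several files sharing the database name as prefix.
    files = sorted(Path(shared_dir).glob(f"{db_name}.*"))
    if not files:
        raise FileNotFoundError(f"no files for database {db_name!r} in {shared_dir}")
    for src in files:
        shutil.copy2(src, local_dir / src.name)  # one sequential read of each shared file

    # BLAST processes started from here will look for databases in the local copy.
    os.environ["BLASTDB"] = str(local_dir)
    return local_dir
```

Copying once per job turns many scattered reads of the shared file system into one sequential read. All later searches in the job then hit the local disk and the OS file cache.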
Use all your processors
BLAST uses only one processor core by default, but you can increase this number with the -a command line option (-num_threads for BLAST+), which often gives a significant increase in speed. If you are using a preinstalled BLAST version on a SNIC resource, the recommended number of cores is given by the $BLAST_NUM_CPUS environment variable (used, for example, as blastall -a $BLAST_NUM_CPUS ...). However, in some situations you may want to decrease this number. This applies particularly when your searches generate enough hits to exhaust RAM: the OS then starts swapping data and results to disk, which can slow your job nearly to a halt (see below).
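Here is a minimal sketch of choosing the thread count. It uses $BLAST_NUM_CPUS when that variable is set, as described above, and otherwise falls back to all visible cores. The max_threads cap is a hypothetical parameter for the case where you deliberately use fewer cores to avoid exhausting RAM. The functions blast_threads and blastn_command are illustrative names. The returned list is intended to be passed to subprocess.run.

```python
import os
from typing import List, Optional

def blast_threads(max_threads: Optional[int] = None) -> int:
    """Return $BLAST_NUM_CPUS if set, else the number of visible cores.

    max_threads optionally caps the result, e.g. when many hits would
    otherwise exhaust RAM and make the OS start swapping.
    """
    value = os.environ.get("BLAST_NUM_CPUS")
    n = int(value) if value else (os.cpu_count() or 1)
    if max_threads is not None:
        n = min(n, max_threads)
    return max(1, n)

def blastn_command(query: str, db: str, out: str,
                   max_threads: Optional[int] = None) -> List[str]:
    """Build a BLAST+ blastn command line; the legacy equivalent uses blastall -a N."""
    return ["blastn", "-query", query, "-db", db, "-out", out,
            "-num_threads", str(blast_threads(max_threads))]
```

With BLAST_NUM_CPUS=8, blastn_command("q.fa", "nt", "hits.txt", max_threads=4) ends with "-num_threads", "4". The cap only ever lowers the recommended value and never raises it.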
Do not run out of memory
If possible, you should ensure that you have enough RAM to hold the database as well as the results and still have some headroom. This ensures that BLAST will not need to read data from disk unnecessarily, which otherwise would cause significant slowdown. This can be done for example by:
- Choosing a system with enough RAM
Multiprocessor systems generally have more memory than single-processor systems, and the database also needs proportionally less memory per processor, since only one copy of it is needed in the OS file cache regardless of how many processors use it.
- Partitioning the search space
For huge databases or very limited amounts of available memory, it may be necessary to split the database into manageable chunks and process them as separate jobs.
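The two approaches above can be sketched as follows. The first part is a rough check of whether a database fits in RAM with some headroom to spare. The second part is a first-fit-decreasing grouping of database volumes (such as nt.00, nt.01, ...) into chunks that each stay within a memory budget. The 25 % default headroom and the helper names are assumptions for illustration. On-disk size is only an approximation of the memory BLAST actually needs.

```python
from pathlib import Path
from typing import Dict, List

def db_size_bytes(db_dir: str, db_name: str) -> int:
    """Total on-disk size of all files belonging to a BLAST database."""
    return sum(p.stat().st_size for p in Path(db_dir).glob(f"{db_name}.*"))

def fits_in_ram(db_bytes: int, ram_bytes: int, headroom: float = 0.25) -> bool:
    """True if the database fits while leaving a fraction of RAM for results and the OS."""
    return db_bytes <= ram_bytes * (1.0 - headroom)

def partition_volumes(volume_sizes: Dict[str, int], budget: int) -> List[List[str]]:
    """Group database volumes into chunks whose total size stays within budget.

    First-fit decreasing: place the largest volumes first, each into the first
    chunk with enough room left, opening a new chunk when none fits.
    """
    chunks: List[List[str]] = []
    totals: List[int] = []
    for name, size in sorted(volume_sizes.items(), key=lambda kv: kv[1], reverse=True):
        if size > budget:
            raise ValueError(f"volume {name} ({size} bytes) exceeds the budget on its own")
        for i, total in enumerate(totals):
            if total + size <= budget:
                chunks[i].append(name)
                totals[i] += size
                break
        else:
            chunks.append([name])
            totals.append(size)
    return chunks
```

Each resulting chunk would then be searched as a separate job. With BLAST+, a chunk can typically be given to -db as a quoted, space-separated list of volume names. The -dbsize option can set the effective database length so that E-values from different chunks stay comparable. Check the BLAST+ user manual for the details that apply to your version.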
Experts
No experts have currently registered expertise on this specific subject. List of registered field experts:
|Name||Centre||Field||AE FTE||General activities|
|Henric Zazzi (PDC)||PDC||Bioinformatics||100||Bioinformatics Application support|
|Joel Hedlund (NSC)||NSC||Bioinformatics||0|
|Martin Dahlö (UPPMAX)||UPPMAX||Bioinformatics||10||Bioinformatic support|
|Sebastian DiLorenzo (UPPMAX)||UPPMAX||Bioinformatics||50||National bioinformatic support, NGS tumor data|