Difference between revisions of "BLAST"
Revision as of 15:15, 7 September 2011
BLAST (Basic Local Alignment Search Tool) is a software package for aligning nucleotide or amino acid sequences. Its primary use is to search databases for sequences that are similar to a given candidate sequence.
Experts
No experts have currently registered expertise on this specific subject. List of registered field experts:
Name | Centre | Field | AE FTE | General activities |
---|---|---|---|---|
Anders Hast (UPPMAX) | UPPMAX | Visualisation, Digital Humanities | 30 | Software and usability for projects in digital humanities |
Anders Sjölander (UPPMAX) | UPPMAX | Bioinformatics | 100 | Bioinformatics support and training, job efficiency monitoring, project management |
Anders Sjöström (LUNARC) | LUNARC | GPU computing MATLAB General programming Technical acoustics | 50 | Helps users with MATLAB, General programming, Image processing, Usage of clusters |
Birgitte Brydsö (HPC2N) | HPC2N | Parallel programming HPC | | Training, general support |
Björn Claremar (UPPMAX) | UPPMAX | Meteorology, Geoscience | 100 | Support for geosciences, Matlab |
Björn Viklund (UPPMAX) | UPPMAX | Bioinformatics Containers | 100 | Bioinformatics, containers, software installs at UPPMAX |
Chandan Basu (NSC) | NSC | Computational science | 100 | EU projects IS-ENES and PRACE. Working on climate and weather codes |
Diana Iusan (UPPMAX) | UPPMAX | Computational materials science Performance tuning | 50 | Compilation, performance optimization, and best practice usage of electronic structure codes. |
Frank Bramkamp (NSC) | NSC | Computational fluid dynamics | 100 | Installation and support of computational fluid dynamics software. |
Hamish Struthers (NSC) | NSC | Climate research | 80 | Users support focused on weather and climate codes. |
Henric Zazzi (PDC) | PDC | Bioinformatics | 100 | Bioinformatics Application support |
Jens Larsson (NSC) | NSC | Swestore | ||
Jerry Eriksson (HPC2N) | HPC2N | Parallel programming HPC | | HPC, Parallel programming |
Joachim Hein (LUNARC) | LUNARC | Parallel programming Performance optimisation | 85 | HPC training Parallel programming support Performance optimisation |
Johan Hellsvik | PDC | Materials science | 30 | Materials theory, modeling of organic magnetic materials |
Johan Raber (NSC) | NSC | Computational chemistry | 50 | |
Jonas Lindemann (LUNARC) | LUNARC | Grid computing Desktop environments | 20 | Coordinating SNIC Emerging Technologies Developer of ARC Job Submission Tool Grid user documentation Leading the development of ARC Storage UI Lunarc Box Lunarc HPC Desktop |
Krishnaveni Chitrapu (NSC) | NSC | Software development | ||
Lars Eklund (UPPMAX) | UPPMAX | Chemistry Data management FAIR Sensitive data | 100 | Chemistry codes, databases at UPPMAX, sensitive data, PUBA agreements |
Lars Viklund (HPC2N) | HPC2N | General programming HPC | | HPC, General programming, installation of software, support, containers |
Lilit Axner (PDC) | PDC | Computational fluid dynamics | 50 | |
Marcus Lundberg (UPPMAX) | UPPMAX | Computational science Parallel programming Performance tuning Sensitive data | 100 | I help users with productivity, program performance, and parallelisation. I also work with allocations and with sensitive data questions |
Martin Dahlö (UPPMAX) | UPPMAX | Bioinformatics | 10 | Bioinformatic support |
Matias Piqueras (UPPMAX) | UPPMAX | Humanities, Social sciences | 70 | Support for humanities and social sciences, machine learning |
Mikael Djurfeldt (PDC) | PDC | Neuroinformatics | 100 | |
Mirko Myllykoski (HPC2N) | HPC2N | Parallel programming GPU computing | | Parallel programming, HPC, GPU programming, advanced support |
Pavlin Mitev (UPPMAX) | UPPMAX | Computational materials science | 100 | |
Pedro Ojeda-May (HPC2N) | HPC2N | Molecular dynamics Machine learning Quantum Chemistry | | Training, HPC, Quantum Chemistry, Molecular dynamics, R, advanced support |
Peter Kjellström (NSC) | NSC | Computational science | 100 | All types of HPC Support. |
Peter Münger (NSC) | NSC | Computational science | 60 | Installation and support of MATLAB, Comsol, and Julia. |
Rickard Armiento (NSC) | NSC | Computational materials science | 40 | Maintainer of the scientific software environment at NSC. |
Szilard Pall | PDC | Molecular dynamics | 55 | Algorithms & methods for accelerating molecular dynamics, Parallelization and acceleration of molecular dynamics on modern high performance computing architectures, High performance computing, manycore and heterogeneous architectures, GPU computing |
Thomas Svedberg (C3SE) | C3SE | Solid mechanics | ||
Torben Rasmussen (NSC) | NSC | Computational chemistry | 100 | Installation and support of computational chemistry software. |
Wei Zhang (NSC) | NSC | Computational science Parallel programming Performance optimisation | | Code optimization, parallelization |
Weine Olovsson (NSC) | NSC | Computational materials science | 90 | Application support, installation and help |
Åke Sandgren (HPC2N) | HPC2N | Computational science | 50 | SGUSI |
Versions
There are two BLAST versions in widespread use: the legacy NCBI BLAST and its rewrite, BLAST+.
BLAST+ was written to improve performance and maintainability, and to facilitate the introduction of new features. It is similar to legacy BLAST in most respects and has been made almost completely backwards compatible by way of a wrapper script called ./legacy_blast.pl. New projects are encouraged to use BLAST+ if at all possible.
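As a rough illustration of how the two command-line styles correspond, the sketch below prints a legacy blastall protein search next to its approximate BLAST+ counterpart. The file names query.fa and results.txt and the database name nr are placeholders chosen for this example, not values taken from this page. Nothing is executed; the commands are only printed.

```shell
#!/bin/sh
# Placeholder inputs (assumptions for illustration only).
QUERY=query.fa
DB=nr
OUT=results.txt

# Legacy NCBI BLAST: a single driver program, search type selected with -p.
legacy_cmd() {
    echo "blastall -p blastp -i $QUERY -d $DB -o $OUT"
}

# BLAST+: one program per search type, with long option names.
plus_cmd() {
    echo "blastp -query $QUERY -db $DB -out $OUT"
}

legacy_cmd
plus_cmd
```

The BLAST+ user manual shows legacy_blast.pl accepting a legacy command line followed by --print_only, which prints the translated BLAST+ command instead of running it. This can be a convenient way to check a conversion, assuming the installed version of the wrapper supports it.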
Computational considerations
Work locally
Many of the features in BLAST require access to database flatfiles, and standard practice when running on a compute cluster is to copy all necessary files to a node-local directory before any work is done with them. This behaviour is highly encouraged on most resources, since multiple simultaneous accesses to the same large files on a shared disk are likely to cause problems for all computations currently running on the resource, and not only for the owner of the badly behaving jobs. For this reason, most SNIC resources have facilities in place to help you run your BLAST jobs in an optimal manner (such as prepare_db and $BLASTDB, described for example here).
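Where a resource provides no helper such as prepare_db, the staging step can be done by hand. The following is a minimal sketch, not the method used by any particular SNIC resource: stage_db is a hypothetical helper defined here, and the shared path /shared/blast/db and the use of $TMPDIR as the node-local directory are assumptions that differ between clusters.

```shell
#!/bin/sh
# stage_db SRC_DIR DB_NAME LOCAL_DIR   (hypothetical helper)
# Copies every file belonging to database DB_NAME (DB_NAME.*) from the shared
# directory SRC_DIR to the node-local directory LOCAL_DIR, then prints LOCAL_DIR.
stage_db() {
    src_dir=$1
    db_name=$2
    local_dir=$3
    mkdir -p "$local_dir" || return 1
    found=0
    for f in "$src_dir/$db_name".*; do
        [ -e "$f" ] || continue          # the glob matched nothing
        cp "$f" "$local_dir/" || return 1
        found=1
    done
    if [ "$found" -ne 1 ]; then
        echo "no files for database $db_name in $src_dir" >&2
        return 1
    fi
    echo "$local_dir"
}

# Example use inside a job script (all paths are assumptions):
#   BLASTDB=$(stage_db /shared/blast/db nr "${TMPDIR:-/tmp}/blastdb") && export BLASTDB
#   blastp -query query.fa -db nr -out results.txt
```

With $BLASTDB pointing at the local copy, BLAST can find the database by name, so every read during the search hits the node's own disk rather than the shared file system.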
Use all your processors
BLAST uses only one processor core by default, but you can increase this number using the -a command line option (-num_threads for BLAST+), which can often provide a significant increase in speed. If you are using a preinstalled BLAST version on a SNIC resource, the recommended number of cores to use is given by the $BLAST_NUM_CPUS environment variable (e.g. used like blastall -a $BLAST_NUM_CPUS ...). However, in some situations you may want to consider decreasing this number, particularly if your searches generate enough hits to deplete RAM, causing the OS to start swapping data and results to disk, which will slow your job almost to a stop (see below).
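A job script that has to work both on resources that set $BLAST_NUM_CPUS and on systems that do not can fall back to a single thread. The sketch below assumes nothing beyond the variable named above. blast_threads is a hypothetical helper, and the query, database and output names are placeholders. The commands are printed rather than run.

```shell
#!/bin/sh
# blast_threads   (hypothetical helper)
# Prints $BLAST_NUM_CPUS when it is set to a positive integer,
# otherwise prints 1, which matches BLAST's single-core default.
blast_threads() {
    case ${BLAST_NUM_CPUS:-} in
        ''|*[!0-9]*)
            echo 1 ;;                    # unset, empty, or not a number
        *)
            if [ "$BLAST_NUM_CPUS" -gt 0 ]; then
                echo "$BLAST_NUM_CPUS"
            else
                echo 1
            fi ;;
    esac
}

# Legacy BLAST takes the thread count with -a, BLAST+ with -num_threads.
echo "blastall -p blastp -a $(blast_threads) -i query.fa -d nr -o results.txt"
echo "blastp -num_threads $(blast_threads) -query query.fa -db nr -out results.txt"
```

If a job starts swapping, the same helper gives a single place to cap the value, for example by setting BLAST_NUM_CPUS to a smaller number at the top of the job script.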
Do not run out of memory
If possible, you should ensure that you have enough RAM to hold the database as well as the results and still have some headroom. This ensures that BLAST will not need to read data from disk unnecessarily, which would otherwise cause significant slowdown. Ways to achieve this include:
- Choose a system with enough RAM
Multiprocessor systems generally have more memory than single processor systems, and the database will also require proportionally less memory, since only one copy is needed in the OS file cache regardless of the number of processors using it.
- Partition the search space
For huge databases or very restricted amounts of available memory, it may be necessary to split the database into manageable chunks and process them as separate jobs.
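One possible way to partition the search space, when the database is available as a FASTA file, is to split that file into chunks with a fixed number of sequences and build a separate database from each chunk. The sketch below uses only awk. split_fasta is a hypothetical helper, and the chunk naming scheme (PREFIX.1.fa, PREFIX.2.fa, ...) is an arbitrary choice made for this example.

```shell
#!/bin/sh
# split_fasta INPUT PER_CHUNK PREFIX   (hypothetical helper)
# Writes PREFIX.1.fa, PREFIX.2.fa, ... each holding at most PER_CHUNK sequences.
split_fasta() {
    awk -v n="$2" -v prefix="$3" '
        /^>/ {
            # Start a new chunk file every n header lines.
            if (count % n == 0) {
                if (out != "") close(out)
                chunk++
                out = prefix "." chunk ".fa"
            }
            count++
        }
        # Copy each line to the current chunk; lines before the first header are skipped.
        out != "" { print > out }
    ' "$1"
}

# Example (file names are placeholders):
#   split_fasta proteins.fa 100000 proteins_part
# Each chunk would then be formatted (formatdb for legacy BLAST, makeblastdb
# for BLAST+) and searched as a separate job.
```

Note that E-values depend on the size of the database searched, so results from separate chunks are not directly comparable with those from a single search over the whole database. BLAST+ provides a -dbsize option for setting the effective database size, which may help here, although whether it suits a given analysis needs to be checked case by case.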
Availability
Resource | Centre | Description |
---|---|---|
Beda | C3SE | throughput cluster resource |
Kalkyl | UPPMAX | cluster resource of about 21 TFLOPS |
Kappa | NSC | throughput cluster resource of 26 TFLOPS |
Matter | NSC | cluster resource of 37 TFLOPS dedicated to materials science |
Triolith | NSC | Capability cluster with 338 TFLOPS peak and 1:2 Infiniband fat-tree |
License
License: Free.