HMMER

From SNIC Documentation

Revision as of 08:51, 25 February 2011

General info

HMMER is a software package for working with profile hidden Markov models (HMMs) of known regions in proteins.

An HMM is a statistical model that describes the known sequence variation within a specific group of proteins of special interest, for example a protein family with a known function, or a domain containing a well-studied interaction surface or an active site. HMMs are a machine learning technique [1]: a model is built from training examples that are known, trusted members of the group, and the finished model can then be used to reliably classify and annotate new or poorly understood protein sequences in an automated fashion. Large libraries of trusted HMMs (such as Pfam) are therefore immensely valuable, since they can be used to automatically classify large portions of newly sequenced genomes as soon as they become available.
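To make the idea of scoring a sequence against a trained model concrete, here is a minimal Python sketch of the forward algorithm on a toy two-state HMM, with the result reported in bits against a background (null) model. It is an illustration only, not how HMMER is implemented: real profile HMMs have match, insert and delete states for every alignment column, whereas the states, the two-letter alphabet ("H" hydrophobic, "P" polar) and all probabilities below are hypothetical.

```python
import math


def forward_probability(seq, states, start, trans, emit):
    """P(seq | model), summed over all hidden state paths (forward algorithm)."""
    # alpha[s]: probability of emitting the prefix read so far and ending in state s
    alpha = {s: start[s] * emit[s][seq[0]] for s in states}
    for symbol in seq[1:]:
        alpha = {
            s: sum(alpha[r] * trans[r][s] for r in states) * emit[s][symbol]
            for s in states
        }
    return sum(alpha.values())


def log_odds_bits(seq, model, background):
    """Log-odds score in bits: log2 of P(seq | model) / P(seq | background)."""
    p_model = forward_probability(seq, *model)
    p_null = math.prod(background[c] for c in seq)
    return math.log2(p_model / p_null)


# Hypothetical toy model: a hydrophobic "core" state and a polar "loop" state.
states = ("core", "loop")
start = {"core": 0.5, "loop": 0.5}
trans = {"core": {"core": 0.8, "loop": 0.2},
         "loop": {"core": 0.3, "loop": 0.7}}
emit = {"core": {"H": 0.9, "P": 0.1},
        "loop": {"H": 0.2, "P": 0.8}}
model = (states, start, trans, emit)
background = {"H": 0.5, "P": 0.5}

print(f"{log_odds_bits('HHHHPP', model, background):.2f} bits")
```

A positive score means the sequence is better explained by the model than by the background composition; a classifier would accept sequences scoring above some chosen threshold. HMMER likewise reports bit scores, together with E-values.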

The HMMER package contains applications for working with HMMs, for example for:

  • Building and calibrating HMMs.
  • Matching an HMM against a sequence database (for finding new members).
  • Matching a sequence against an HMM database (for finding new sequence features).
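As a rough orientation, the tasks listed above map onto HMMER programs differently in the two versions: in HMMER-2.3.2 calibration is a separate hmmcalibrate step and a sequence is matched against an HMM database with hmmpfam, while in HMMER-3.0 hmmbuild calibrates the model itself and an HMM database must be indexed with hmmpress before hmmscan can use it. The Python sketch below only builds argument lists and runs nothing; the file names are hypothetical placeholders and no options are passed, so check the `-h` output of the installed version before relying on it.

```python
def hmmer_commands(version, alignment, hmm_file, seq_db, query):
    """Map the common HMMER tasks to argument lists for HMMER 2 or 3.

    All file names are caller-supplied placeholders; no options are set.
    """
    if version.startswith("2"):
        return {
            # Build the model, then calibrate its E-value statistics separately.
            "build": [["hmmbuild", hmm_file, alignment],
                      ["hmmcalibrate", hmm_file]],
            "hmm_vs_seqdb": [["hmmsearch", hmm_file, seq_db]],
            "seq_vs_hmmdb": [["hmmpfam", hmm_file, query]],
        }
    if version.startswith("3"):
        return {
            # hmmbuild also calibrates; HMMER 3 has no hmmcalibrate.
            "build": [["hmmbuild", hmm_file, alignment]],
            "hmm_vs_seqdb": [["hmmsearch", hmm_file, seq_db]],
            # hmmscan needs the HMM database indexed by hmmpress first.
            "seq_vs_hmmdb": [["hmmpress", hmm_file],
                             ["hmmscan", hmm_file, query]],
        }
    raise ValueError(f"unsupported HMMER version: {version!r}")


for step in hmmer_commands("3.0", "family.sto", "family.hmm",
                           "uniprot.fasta", "query.fasta")["seq_vs_hmmdb"]:
    print(" ".join(step))
```

Each argument list could be passed to `subprocess.run(step, check=True)` inside a batch job.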

Versions

There are two versions of HMMER that can conceivably be useful:

  • HMMER-2.3.2: Old stable version.
  • HMMER-3.0: Fast, but backwards incompatible with HMMER-2 and not feature complete.

The two versions differ greatly in implementation and output (and potentially also in the actual results), so switching between them in the middle of an ongoing project is not recommended. For new projects, it is well worth spending some time to determine which version is most suitable.

HMMER-3.0 may seem like the obvious choice: it is much faster than its predecessor, it is currently used in large-scale production (e.g. by Pfam), and it is promoted as the official main HMMER version. However, HMMER-3.0 is not feature complete. In particular, the old default alignment behavior (glocal, i.e. global with respect to the model and local with respect to the sequence, as in hmm_ls models) is missing, so if you need this feature, choose HMMER-2.3.2.
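Since the two versions should not be mixed within a project, it can help to check which generation of HMMER wrote a given model file. A minimal Python sketch, assuming that HMMER-2 model files start with a `HMMER2.0` tag and HMMER-3 text model files with `HMMER3/` followed by a format letter (verify this against your own files; the binary index files written by hmmpress are not handled):

```python
def hmm_generation(first_line):
    """Return 2 or 3 from the format tag on the first line of a text HMM file."""
    if first_line.startswith("HMMER2"):
        return 2
    if first_line.startswith("HMMER3"):
        return 3
    raise ValueError(f"unrecognised HMM file header: {first_line[:20]!r}")


def hmm_file_generation(path):
    """Read only the first line of the HMM file at path and classify it."""
    with open(path, encoding="ascii", errors="replace") as handle:
        return hmm_generation(handle.readline())
```

A project could, for example, check that every model file it uses returns the same generation before starting a large run.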


Computational considerations

Many HMMER features require access to database flatfiles, and standard practice on a compute cluster is to copy all necessary files to a node-local directory before doing any work with them. This is strongly encouraged on most resources, because many simultaneous accesses to the same large files on a shared disk are likely to cause problems for every computation running on the resource, not only for the owner of the badly behaving jobs. For this reason, most SNIC resources provide tools to help you run your HMMER jobs optimally (for example prepare_db and $HMMER_DB_DIR).
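The copy-to-local-disk pattern described above can be sketched in Python as follows. The sketch does not use prepare_db or $HMMER_DB_DIR, whose interfaces are not described on this page; consult the documentation of your resource for those. It assumes that the node-local scratch directory is named by an environment variable called SNIC_TMP (an assumption; the name may differ on your resource) and otherwise falls back to the system temporary directory.

```python
import os
import shutil
import tempfile
from pathlib import Path


def stage_to_local(db_path, local_dir=None):
    """Copy a database flatfile to node-local storage once; return the local path.

    local_dir defaults to $SNIC_TMP (assumed name of the node-local scratch
    directory), falling back to the system temporary directory.
    """
    src = Path(db_path)
    if local_dir is None:
        local_dir = os.environ.get("SNIC_TMP", tempfile.gettempdir())
    dest_dir = Path(local_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    # Skip the copy if a same-size copy at least as new is already present;
    # copy2 preserves the modification time, so a second call is a no-op.
    if not (dest.exists()
            and dest.stat().st_size == src.stat().st_size
            and dest.stat().st_mtime >= src.stat().st_mtime):
        shutil.copy2(src, dest)
    return dest
```

Point hmmsearch or hmmscan at the returned local path instead of the shared-disk original. For HMMER-3.0 HMM databases, the auxiliary files written by hmmpress (.h3f, .h3i, .h3m, .h3p) would need to be staged as well.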