Author
Bayesian inference in phylogeny generates a posterior distribution for a parameter, composed of a phylogenetic tree and a model of evolution, based on the prior for that parameter and the likelihood of the data, generated by a multiple alignment.
MP, TD
(http://www.sciencemag.org/content/294/5550/2310.full)
Bayesian_Interence
Example:
CONSENSUS TREE:
The numbers at the forks indicate the number of times the group consisting of the species which are to the right of that fork occurred among the trees, out of 1000 trees.
+--------------Taxon 4
!
! +---------Taxon 1
+1000.0
! +----Taxon 2
+572.7
· +----Taxon 3
MP, HY
Bootstrapping: A method of attempting to estimate confidence levels of inferred relationships. The bootstrap proceeds by resampling the characters (columns) of the original data matrix with replacement. In the parametric bootstrap, the data sets are instead obtained by simulation on the best estimate of the tree rather than by resampling columns of the original data matrix.
Usage:
Programs such as DAMBE, FastTree, GARLI, PHYLIP, RAxML.
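The column resampling described above can be sketched as follows. This is a minimal illustration of the nonparametric bootstrap only, not the implementation used by any of the listed programs; the function name `bootstrap_alignment` and the toy alignment are assumptions made for the example.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Build one bootstrap replicate by sampling alignment columns with replacement."""
    n_sites = len(next(iter(alignment.values())))
    # Draw n_sites column indices with replacement; some columns repeat, some drop out.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[c] for c in columns)
            for taxon, seq in alignment.items()}

# Hypothetical toy alignment (taxon name -> aligned sequence).
alignment = {"Taxon1": "ACGTAC", "Taxon2": "ACGTTC", "Taxon3": "ACCTAC"}
replicate = bootstrap_alignment(alignment, random.Random(0))
```

A tree would then be inferred from each replicate, and the support for a group is the fraction of replicate trees that contain it.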
Bootstrapping
Branch_Swampling
MP, HY
Combinatorial extension uses local geometry to align short fragments of the two proteins being analysed and then assembles these fragments into a larger alignment.
Combinatorial_Extension
Conference_Paper
Example: Using the expression levels of 20 proteins to predict whether a cancer patient will respond to a drug. Cross-validation uses the best fit and will generally include only a subset of the features that are deemed truly informative.
MP, HY
Cross-validation (rotation estimation) is a technique for assessing how the results of a statistical analysis will generalize to an independent data set.
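A minimal sketch of how the data can be rotated through training and test subsets in k-fold cross-validation. The function name `k_fold_indices` is an assumption for illustration, and shuffling of the samples is left out for brevity.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists so that every sample is tested exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(10, 3))
```

The model is fitted on each `train` subset and evaluated on the matching `test` subset; the average test performance estimates how well the analysis generalizes.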
Cross_Validation
Example: construction of protein data bank.
MP, HY
Distance matrix alignment (DALI) is a fragment-based method for constructing structural alignments based on similarity patterns between successive hexapeptides in the query sequences.
DALI
MP,HY
(Data Analysis in Molecular Biology and Evolution)
It is a general-purpose package that reads and converts a number of file formats, and has many features for descriptive statistics in computing.
Xuhua Xia, Department of Biology
and the Center for Advanced Research in Environmental Genomics
(CAREG), University of Ottawa, Ontario, Canada.
Details: http://dambe.bio.uottawa.ca/dambe.asp.
Usage: Parsimony, distance, or likelihood methods, including
bootstrapping and jackknifing.
Analyze: DNA and protein sequences and gene frequencies.
Advantages: It allows sequences to be fetched over the web while
running DAMBE, using a simple web browser.
DAMBE
DOI
Date
MP,SV
The distance-Wagner method is intended as an approximation to construction of a most parsimonious tree. Species are added to a tree, each in the best possible place. This is judged by computation of the increase in the length of the tree caused by each possible placement of that species.
Distance_Wangner
Example: identify insertions, deletions or inverted repeats.
MP, HY
The dot-matrix method is a primary method of producing pairwise alignment.
It is qualitative, simple, and time-consuming.
Problems: noise, lack of clarity, non-intuitiveness, difficulty extracting match summary and match positions on the two sequences
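A minimal sketch of a dot matrix, assuming the simplest variant in which a cell is marked whenever the two residues are identical (no window or threshold filtering). The function names are hypothetical.

```python
def dot_matrix(seq_a, seq_b):
    """Return a matrix with 1 where seq_a[i] equals seq_b[j], else 0."""
    return [[1 if a == b else 0 for b in seq_b] for a in seq_a]

def render_dot_plot(seq_a, seq_b):
    """Render the matrix as text: '*' for a match, '.' otherwise."""
    rows = ["  " + seq_b]
    for a, row in zip(seq_a, dot_matrix(seq_a, seq_b)):
        rows.append(a + " " + "".join("*" if hit else "." for hit in row))
    return "\n".join(rows)

print(render_dot_plot("ATTGAG", "ATTGATG"))
```

Similar regions show up as diagonal runs of marks; isolated marks off the diagonals are the noise mentioned above.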
Dot_Metrix_Method
MP, HY
Dynamic programming is a primary method of producing pairwise alignment.
It can produce global alignments via the Needleman-Wunsch algorithm and local alignments via the Smith-Waterman algorithm.
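A minimal sketch of the Needleman-Wunsch recurrence that computes only the optimal global alignment score. The scoring scheme (match +1, mismatch -1, linear gap -1) is an assumption chosen for illustration; real programs use substitution matrices and often affine gap penalties.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Best of: align the two residues, gap in b, gap in a.
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

score = needleman_wunsch_score("ACGT", "ACT")
```

Keeping the full table instead of two rows would allow a traceback that recovers the alignment itself.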
Dynamic_Programming
MP,HY
This model resulted in the development of a set of widely used replacement matrices. In this approach, replacement rates are derived from alignments of protein sequences that are at least 85% identical such that the likelihood of a particular mutation being the result of a set of successive mutations is low.
Ref: Dayhoff et al. 1978.
Empirical_Substitution_Models
End_Page
Epub_Date
Experimental_Data
External_Data
MP
FastTree: Computing Large Minimum Evolution Trees with
Profiles instead of a Distance Matrix, Molecular Biology and Evolution
MP,HY
Written by/released/produced by: Morgan Price of Adam Arkin's
group, Physical Biosciences Division of Lawrence Berkeley National Laboratory, Berkeley, California
Usage: Bootstrapping, Shimodaira-Hasegawa (SH), maximum
likelihood, GTR.
Analyze: Nucleotide or protein sequences
Advantages: Fast maximum likelihood program
Fast_Tree
MD,TD
GARLI (Genetic Algorithm for Rapid Likelihood Inference) performs phylogenetic searches on aligned nucleotide, codon and amino acid data sets using the maximum likelihood criterion.
(http://molecularevolution.org/software/phylogenetics/GARLI)
(http://molecularevolution.org/resources/activities/garli_activity)
Usage:
The main steps of a GARLI analysis of a nucleotide data set (with both codon and partitioned models) include setting up and monitoring a run, using features such as constrained searches and bootstrap analyses if needed, and inspecting the output from a run.
Garli
MP, HY
Global alignment is a computational approach to sequence alignment.
Mostly used to align every residue in every sequence, where the sequences in the query set are similar and of roughly equal size.
Global_Alignmnet
MP, HY
A hidden Markov model is used to produce probability scores for a family of multiple sequence alignments.
Hidden_Markov_Model
ISSN
Issue
MP, HY
Iterative methods are an extension of progressive methods that attempt to reduce the heavy dependence on the accuracy of the initial pairwise alignments.
Iterative_Method
MP, TD
The Jukes-Cantor (JC69) model is the simplest DNA substitution model because it assumes equal base frequencies (25%) and equal nucleotide substitution rates for all pairs of the four nucleotides A, T, C, and G. It also does not correct for the higher rate of transitional substitutions in comparison to transversional substitutions.
(http://www.megasoftware.net/3.1/WebHelp/distancemethods_hc/hc_jukes_cantor_distance.htm)
Usage: It is a special case of the F81 model and thus is only used for the original likelihood test.
(http://www.annualreviews.org/doi/pdf/10.1146/annurev.ecolsys.28.1.437)
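Under the equal-rate assumptions of JC69, the observed proportion of differing sites p between two aligned sequences is commonly corrected to an evolutionary distance d = -(3/4) ln(1 - (4/3) p). The sketch below applies that standard formula; the function name is hypothetical and gaps or ambiguous bases are not handled.

```python
import math

def jc69_distance(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned, equal-length nucleotide sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
    p = differences / len(seq_a)
    if p >= 0.75:
        # The correction is undefined once sequences look saturated.
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)

d = jc69_distance("ACGT", "ACGA")  # one difference in four sites, p = 0.25
```

The corrected distance is always at least as large as p because it accounts for multiple substitutions at the same site.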
JC
MP, HY
Jackknifing is used in statistical inference to estimate the bias and standard error (variance) of a statistic when a random sample of observations is used to calculate it. In phylogenetics it is a method of resampling data in an effort to assess confidence in the hypothesized relationships between taxa.
Usage: Programs such as DAMBE
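A minimal sketch of the delete-one jackknife estimate of a standard error, shown on a plain list of numbers rather than on alignment characters. The function names are assumptions for illustration.

```python
import math

def mean(values):
    return sum(values) / len(values)

def jackknife_se(values, statistic):
    """Delete-one jackknife estimate of the standard error of a statistic."""
    n = len(values)
    # Recompute the statistic n times, each time leaving out one observation.
    leave_one_out = [statistic(values[:i] + values[i + 1:]) for i in range(n)]
    center = mean(leave_one_out)
    variance = (n - 1) / n * sum((x - center) ** 2 for x in leave_one_out)
    return math.sqrt(variance)

se = jackknife_se([1, 2, 3, 4, 5], mean)
```

For the sample mean this reproduces the usual standard error s / sqrt(n), which is a convenient sanity check.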
Jack_Knifing
Journal
Journal_Article
Literature
Example:
S1= GCGCATGGATTGAGCGA
S2= TGCGCCATTGATGACC
possible alignment:
S’1= ATTGA-G
S’2= ATTGATG
MP, HY
Local alignment is a computational approach to sequence alignment.
Used for aligning dissimilar sequences.
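A minimal sketch of the Smith-Waterman local alignment score, applied to the S1 and S2 sequences of the example. The scoring scheme (match +2, mismatch -1, linear gap -1) is an assumption made for illustration only.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores never drop below zero, so an alignment can restart anywhere.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best

s1 = "GCGCATGGATTGAGCGA"
s2 = "TGCGCCATTGATGACC"
local_score = smith_waterman_score(s1, s2)
```

The zero floor is what distinguishes the local from the global recurrence: poorly matching flanks are simply left out of the alignment.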
Local_Alignmnet
MP, HY
Motif finding constructs global multiple sequence alignments that attempt to align short conserved sequence motifs.
Motif_Finding
MP,HY
MrBayes assumes a prior distribution of tree topologies and uses MCMC (Markov Chain Monte Carlo) methods to search tree space and infer the posterior distribution of topologies. It reads sequence data in the NEXUS file format, and outputs posterior distribution estimates of trees and parameters.
Written by/released/produced by: John Huelsenbeck and Fredrik
Ronquist
Details: http://mrbayes.net
Usage: Bayesian inference
Analyze: Nucleic acid sequences, protein sequences and
morphological characters
MrBayes
Example:
S1=AGGTC
S2=GTTCG
S3=TGAAC
Possible alignment:
AGGT-C-
-G-TTCG
TG-AAC-
Possible alignment:
AGGT-C-
GTT--CG
-TGAAAC
MP, HY
It is an extension of pairwise alignment to incorporate more than two sequences at a time.
Multiple_Sequence_Alignment
MP, SV
Nearest Neighbor Interchange (NNI): This is a heuristic algorithm for searching through treespace. It proceeds by swapping the positions of neighbors on a phylogenetic tree. If the resulting tree is better, then it is retained. This algorithm is quite a gentle perturbation of the tree and is inferior to either SPR or TBR in terms of completeness of the search. On average it will be quicker than SPR or TBR.
http://www.dbbm.fiocruz.br/james/GlossaryN.html
Programs used:
PAUP, PhyML 3.0
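A minimal sketch of a single NNI step. It assumes an internal branch that separates subtrees a and b on one side from c and d on the other, with trees written as nested tuples; the function name is hypothetical and this is not how PAUP or PhyML represent trees.

```python
def nni_neighbors(a, b, c, d):
    """Return the two rearrangements reachable by NNI across the branch ((a, b), (c, d))."""
    # Swap b with c, then swap b with d; any other swap gives one of these two trees again.
    return [((a, c), (b, d)), ((a, d), (b, c))]

current = (("A", "B"), ("C", "D"))
candidates = nni_neighbors("A", "B", "C", "D")
```

A search would score each candidate under the chosen optimality criterion and keep it only if it improves on `current`.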
NNI
Analyze: DNA sequence data or discrete characters
MP
It searches for most parsimonious trees according to character weights defined by the user a priori.
Written by/released/produced by: Pablo Goloboff, Instituto Miguel
Lillo, Tucumán, Argentina
Details: http://www.cladistics.com/aboutNona.htm.
MP,HY
Usage: Parsimony
Advantages: Much faster than other parsimony programs.
NONA
MP, TD
An optimality criterion provides a measure of the fit of the data to a given hypothesis. The selection process is determined by the solution that optimizes the criterion used to evaluate the alternative hypotheses. The term has been used to identify the different criteria that are used to infer a phylogenetic tree, including maximum likelihood, Bayesian inference, and maximum parsimony.
Optimaly
MP, TD
PHYLIP (PHYLogeny Inference Package) is a package of 35 programs for inferring phylogenies. It is distributed as source code, documentation files, and a number of different types of executables.
(http://evolution.gs.washington.edu/phylip.html)
(http://evolution.gs.washington.edu/phylip/general.html)
Usage: Methods available in each program include parsimony, distance matrix, and likelihood methods, including bootstrapping and consensus trees. Data types that can be handled include molecular sequences, gene frequencies, restriction sites and fragments, distance matrices, and discrete characters.
PHYLIP
MP, HY
Pairwise alignment methods are used to find the local or global alignments of two query sequences.
Used for methods that do not require extreme precision.
Pairwise_Alignment
MP, HY
Permutation tests are standard nonparametric methods. In phylogenetics the test is often called the permutation tail probability (PTP) test. Here the columns of the matrix are randomised so that the consensus sequence and the composition of any particular column are maintained, but any signal is lost.
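A minimal sketch of the column randomisation step: the states within each column are shuffled among taxa, so every column keeps its composition while the association between columns is broken. The function name and the toy matrix are assumptions for illustration.

```python
import random

def permute_columns(alignment, rng):
    """Shuffle the states of each column among taxa, independently per column."""
    taxa = list(alignment)
    columns = [list(col) for col in zip(*(alignment[t] for t in taxa))]
    for col in columns:
        rng.shuffle(col)
    return {taxon: "".join(col[i] for col in columns)
            for i, taxon in enumerate(taxa)}

matrix = {"t1": "AC", "t2": "AG", "t3": "TG"}
shuffled = permute_columns(matrix, random.Random(1))
```

The test statistic (for example tree length) of the real data is then compared with its distribution over many such permuted matrices.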
Permutation_Test
MP,HY
This implements a Poisson distribution that accurately estimates the number of amino acid replacements when species are closely related.
Ref: Nei et al. (1987)
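The Poisson correction is usually written d = -ln(1 - p), where p is the observed proportion of differing amino acid sites. A minimal sketch under that assumption, with a hypothetical function name and no handling of gaps:

```python
import math

def poisson_distance(seq_a, seq_b):
    """Poisson-corrected number of amino acid replacements per site."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    p = sum(1 for x, y in zip(seq_a, seq_b) if x != y) / len(seq_a)
    if p >= 1.0:
        return math.inf  # every site differs; the correction is undefined
    return -math.log(1 - p)

d = poisson_distance("MKVL", "MKAA")  # two of four sites differ, p = 0.5
```

For small p the corrected distance is close to p itself, which is why the model works best for closely related species.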
Poisson_Model
MP, HY
Progressive, hierarchical or tree methods generate multiple sequence alignment by aligning the most similar sequences first and then adding successively less related sequences until the entire query has been incorporated.
Progressive_Method
Provenance
MP, HY
Resampling techniques provide an estimate of the dispersion of statistics whose distribution is unknown or poorly known.
Resampling
MP,SV
Subtree Pruning Regrafting (SPR): This is a heuristic search algorithm for searching through treespace. It proceeds by breaking off part of the tree and attaching it to another part of the tree. If it finds a better tree, then the new tree is used as a starting tree for another round of SPR. This is a more rigorous algorithm than NNI, but not as robust as TBR. Another name for SPR is cut-and-paste.
http://www.dbbm.fiocruz.br/james/GlossaryS.html
Programs used:
PAUP, PhyML 3.0
SPR
MP, HY
The sequential structure alignment program (SSAP) is a structural alignment method that uses atom-to-atom vectors in structure space.
SSAP
MP, HY
It is a way of arranging the specific sequences (DNA, RNA, protein) to identify regions of similarity that may be a consequence of functional, structural or evolutionary relationships between the sequences.
Sequence_Alignmnet
Short_Title
Start_Page
MP, HY
Uses information about the secondary and tertiary structure of the specific molecule in aligning sequences.
Structure_Alignment
MP, SV
Tree-Bisection-Reconnection (TBR): This is a heuristic algorithm for searching through treespace. It proceeds by breaking a phylogenetic tree into two parts and then reconnecting the two subtrees at all possible branches. If a better tree is found, it is retained and another round of TBR is initiated. This is quite a rigorous method of searching treespace. It is not guaranteed to find the optimal tree, but it is more robust than SPR or NNI.
http://www.dbbm.fiocruz.br/james/GlossaryT.html
Programs used:
PAUP
TBR
Title
MP
Tree traversal refers to the process of visiting (examining and/or updating) each node in a tree data structure exactly once, in a systematic way.
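A minimal sketch of one systematic traversal order (post-order: children before their parent), assuming a tree written as nested tuples with strings as leaves. The representation and the function name are assumptions for illustration.

```python
def postorder(tree):
    """Yield every node of a nested-tuple tree exactly once, children first."""
    if isinstance(tree, str):
        yield tree  # a leaf
        return
    for child in tree:
        yield from postorder(child)
    yield tree  # the internal node, after all of its descendants

visited = list(postorder((("A", "B"), "C")))
```

Post-order is the order needed by methods that combine information from the tips toward the root, such as parsimony or likelihood calculations on a tree.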
Traversing_Tree_Space
Type_of_Article
Volume
Year
Data_Set_Type
Pages
Analyze: Molecular sequences, morphological data and other data
types.
MP,HY
(Phylogenetic Analysis Using Parsimony)
PAUP is a comprehensive program and competes with PHYLIP to be responsible for the most trees published.
Written by/released/produced by: David Swofford and distributed by
Sinauer Associates of Sunderland, Massachusetts.
Details: http://www.sinauer.com/detail.php?id=8060.
Swofford et al. (1998)
Usage: Parsimony, Maximum likelihood and distance matrix methods
PAUP
MP, HY
The word method (k-tuple method) is a primary method of producing pairwise alignment. It is particularly used in large database searches.
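A minimal sketch of the first stage of a word method: index every word of length k in one sequence, then look up the words of the other sequence to find exact seed matches. The function name is hypothetical, and the later stages used by database search tools (extending and scoring the seeds) are omitted.

```python
from collections import defaultdict

def shared_words(query, target, k):
    """Return (query_pos, target_pos) pairs where the same k-letter word occurs."""
    index = defaultdict(list)
    for j in range(len(target) - k + 1):
        index[target[j:j + k]].append(j)
    hits = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            hits.append((i, j))
    return hits

hits = shared_words("ACGT", "TACG", 3)
```

Because only exact word lookups are needed, the search avoids filling a full dynamic programming table, which is what makes the approach practical for large databases.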
Word_Method/k_tuple_Method
Amino_Acid_Model
MP, TD
Whereas the distance-based methods compress all sequence information into a single number, the character-based methods attempt to infer the phylogeny based on all the individual characters (nucleotides or amino acids).
Character_Based
MP,TD
A codon-based model describes the evolution of protein-coding DNA sequences using substitutions between codons. Such models are used in phylogenetic estimation within the ML framework.
(Goldman and Yang 1994)
- 3 types of codon based models:
(http://e-collection.library.ethz.ch/eserv/eth:3/eth-3-02.pdf)
1/ Parametric model/Muse and Gaut (1994)
The model uses six parameters, α and β for the synonymous and nonsynonymous
substitution rates and the four nucleotide frequencies πx (x =A,G,T,C). The instantaneous rate
between codons i and j is only positive if they differ by exactly one nucleotide.
2/ Empirical model/ Yang and others (1994)
This model has parameters for transversion/transition bias, codon usage bias, physicochemical distances between amino acids (coded by the codons) and “the variability of the gene or its tendency to undergo nonsynonymous substitution.” Like the first model, it only takes into account codons that differ by one nucleotide.
3/Combined codon model/Whelan and Goldman (2004)
The Whelan and Goldman model considers substitutions of two or three consecutive nucleotides as one possible evolutionary event.
Codon_Based_Model
DNA_Model
DataFormat
DataType
MP,SV
It requires distance measures between the sequences of the data.
It calculates a measure of the distance between each pair of species and then finds a tree that predicts the observed set of distances as closely as possible or that minimizes discrepancies among pair-wise distances.
Distances are considered as the estimates of the branch length separating that pair of species.
Ref: Inferring Phylogenies by Joseph Felsenstein
Programs used:
FITCH, KITSCH, NEIGHBOR, DNADIST, RESTDIST
Ref: http://www.mbio.ncsu.edu/bioedit/appinstall.html
Distance_Based
Example: We assume that the probability of change from state ‘i’ to state ‘j’ is proportional to the frequency of state ‘j’.
MP, HY
Felsenstein 1981 (F81, nst=1):
A model in which the probabilities of nucleotide changes are determined by the equilibrium nucleotide frequencies. This is an extension of the JC model: base frequencies are variable and all substitutions are equally likely (PAUP, PAML)
Ref:D.H. Bos, D. Posada, Developmental and Comparative Immunology 29 (2005) 211–227.
F81
FASTA
MP,SV
GTR (General Time-Reversible):
It assumes a symmetric substitution matrix, so the process is time-reversible. The nucleotides can occur at different frequencies. The distance is obtained by estimating the base frequencies and the rates and finding the ones that exactly predict the observed net transition matrix.
http://www.life.umd.edu/labs/delwiche/bsci348s/lec/NTSeqEvol.html
GTR
MP,TD
Unlike JC69, “the HKY (Hasegawa, Kishino, Yano, 1985) model assumes a time-reversible process, a non-uniform distribution of nucleotides and different rates for transitions and transversions.”
(http://lectures.molgen.mpg.de/phylogeny_ws05/exercises/exercises_07.pdf)
(http://www.annualreviews.org/doi/pdf/10.1146/annurev.ecolsys.28.1.437)
Usage: HKY85 and F81 are similar and can be used as appropriate models for the alternative hypothesis of likelihood. (The alternative hypothesis answers the question “Does the addition of a substitution parameter provide a significant increase in the likelihood?”)
HKY
MP,TD
MEGA is an integrated tool for conducting automatic and manual sequence alignment, inferring phylogenetic trees, mining web-based databases, estimating rates of molecular evolution, inferring ancestral sequences, and testing evolutionary hypotheses. (http://www.megasoftware.net/)
(http://www.megasoftware.net/tutorial.php)
Usage:
MEGA can re-construct a phylogeny using Maximum Likelihood, Minimum Evolution, UPGMA, and Maximum Parsimony methods in addition to Neighbor-Joining. For example, MEGA can re-construct the MP phylogeny using Branch and Bound search method.
Mega
MP, TD
MOLPHY is a free package of programs for molecular phylogenetics based on the Maximum Likelihood (ML) method. The main program of MOLPHY is ProtML, which infers evolutionary trees from protein (amino acid) sequences using the ML criterion.
(http://www.is.titech.ac.jp/~shimo/class/doc/csm96.pdf)
Molphy
Example:
The likelihood generally depends upon the phylogeny sites, branch lengths, and other substitution parameters. The method of maximum likelihood estimates the mutational probabilities and finds the values of the parameters which maximize the likelihood function.
MP, TD
Maximum Likelihood Estimation (MLE) is a method for the inference of phylogeny. Using statistical techniques to assign probabilities to the proposed tree models, the method searches for the tree with the highest probability or likelihood.
(http://bioinf.ncl.ac.uk/molsys/data/like.pdf)
(http://www.sciencemag.org/site/feature/data/1050262.pdf)
Usage:
The method is used in phylogeny programs such as PAUP*, PHYLIP, and PAML. MLE works well with many nucleotide-based models such as JC, GTR, HKY, F81, K2P.
Maximum_Likelihood
Example:
Given a set of aligned sequences or taxa, parsimony analysis determines the number of steps of each character on a given tree. The sum over all characters is called the tree length, and the most parsimonious trees have the minimum tree length needed to explain the observed distribution of all the characters.
MP, TD
Maximum parsimony is a character-based method that infers a phylogenetic tree by minimizing the total number of evolutionary steps required to explain a given set of data, or in other words by minimizing the total tree length.
(http://bioinf.ncl.ac.uk/molsys/data/characters.pdf)
Usage:
The MALIGN and POY programs use maximum parsimony analysis to optimize multiple sequence alignments and the cladogram score of the corresponding tree.
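Counting the steps of each character on a given tree can be sketched with Fitch's algorithm for unordered characters. The sketch assumes a rooted binary tree written as nested tuples and equal weights for all changes; the function names and the toy data are hypothetical, and this is not the procedure used by MALIGN or POY.

```python
def fitch(tree, states):
    """Return (possible_states, steps) for one character on a rooted binary tree."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch(left, states)
    right_set, right_steps = fitch(right, states)
    common = left_set & right_set
    if common:
        return common, left_steps + right_steps
    # No shared state: one change is required on this part of the tree.
    return left_set | right_set, left_steps + right_steps + 1

def tree_length(tree, alignment):
    """Sum the minimum number of steps over all characters (columns)."""
    n_sites = len(next(iter(alignment.values())))
    return sum(fitch(tree, {t: s[i] for t, s in alignment.items()})[1]
               for i in range(n_sites))

data = {"A": "AA", "B": "AC", "C": "GC", "D": "GA"}
length = tree_length((("A", "B"), ("C", "D")), data)
```

A parsimony search evaluates `tree_length` for many candidate trees and keeps those with the smallest value.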
Maximum_Parsimony
Method
MP,HY
A model represents the footprint of the evolutionary phenomena that generated the data (such as mutation and selection) and provides a framework through which the phylogenetic construction method estimates parameters to find the preferred tree.
The specific model selected for a data set depends on features of the data such as level of variation and frequency.
Benefits:
•Overcome most of the phylogenetic scenarios.
•Provide simplifications, summarizing many evolutionary forces and incorporation of these models leads to improvement in phylogenetic analysis.
Example: Maximum likelihood, neighbor joining and Bayesian methods use models and benefit from them but maximum parsimony does not use models.
Ref:
•Nei M. Phylogenetic analysis in molecular evolutionary genetics. Ann Rev Genet 1996;30:371–403.
•Kuhner MK, Felsenstein J. A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates. Mol Biol Evol 1994;11:459–68.
•Huelsenbeck JP, Hillis DM. Success of phylogenetic methods in the four-taxon case. Syst Biol 1993;42:247–64.
•Felsenstein J. Cases in which parsimony or compatibility methods will be positively misleading. Syst Zool 1978;27: 401–10.
Model
Model_of_Protein_Evolution
NEXUX
MP,TD
NINJA is software for large-scale neighbor-joining phylogeny inference.
http://nimbletwist.com/software/ninja/docs.html
Usage: It expects inputs to be either alignments (in fasta format) or pairwise distance matrices (in phylip format), and can produce both a distance matrix (phylip) and a tree file (newick format).
Ninja
NeXML
MP,SV
The neighbor-joining method is a bottom-up clustering method for the creation of phenetic trees. This method also applies an algorithm for clustering.
Programs used: Neighbor
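A minimal sketch of how neighbor joining chooses the pair to cluster at each step, using the standard criterion Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k). The function name and the five-taxon distances are assumptions for illustration; branch-length estimation and the update of the distance matrix are omitted.

```python
from itertools import combinations

def nj_pick_pair(names, d):
    """Return the pair with the smallest Q value and that value."""
    n = len(names)
    row_sum = {i: sum(d[frozenset((i, k))] for k in names if k != i) for i in names}

    def q(pair):
        i, j = pair
        return (n - 2) * d[frozenset(pair)] - row_sum[i] - row_sum[j]

    best = min(combinations(names, 2), key=q)
    return best, q(best)

names = ["a", "b", "c", "d", "e"]
raw = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
       ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
       ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
distances = {frozenset(pair): value for pair, value in raw.items()}
pair, q_value = nj_pick_pair(names, distances)
```

Note that the chosen pair (a, b) is not the closest pair in raw distance (that would be d and e): the criterion also rewards pairs that are far from everything else.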
Neighbor_Joining
- Nucleotide Substitution models describe the evolutionary rates at which one nucleotide replaces another. “These models assume that only the instantaneous state of a character is important and the probability of change from state i to state j depends upon the amount of time that has passed and the substitution rate.”
(http://www.life.umd.edu/labs/delwiche/bsci348s/lec/NTSeqEvol.html)
Usage: They are used in molecular phylogenetic analyses and tree likelihood calculation (mostly in Bayesian and maximum likelihood approaches of tree estimation)
Nucleotide_Substitution_Model
Analyze: DNA sequence data.
MP,SV
POY is a program for phylogenetic analysis of DNA sequence data based on the principle of parsimony. POY uses concepts of homology of DNA sequence data other than those of static DNA sequence alignments. This program implements two methods of DNA analysis: “optimization alignment” and “fixed-states optimization”.
Exploring the Behavior of POY, a Program for Direct Optimization of Molecular Data, Cladistics 17, S60–S70 (2001)
Usage: Parsimony
Poy
PhylogeneticTree
These are the tools for interpreting and inferring phylogenetic trees.
Program
Protein_Structure_and_Correlated_Change
Analyze: Molecular sequence data.
MP,SV
It is a fast program for sequential and parallel phylogenetic tree calculations based on the maximum likelihood method. It provides faster heuristic search, use of parallel processing, and a simulated annealing algorithm.
RAxML-II: a program for sequential, parallel and distributed inference of large phylogenetic trees, Concurrency Computat.: Pract. Exper. 2005; 17:1705–1723.
Usage: Maximum likelihood method, parsimony, bootstrapping, and consensus tree methods.
Raxml
Example: The probability of having a ‘i’ which mutates to a ‘j’ is the same as starting with a ‘j’ which mutates into an ‘i’.
Sites are divided into two classes, variable and invariable, and these are incorporated into the nucleotide models as below:
I: Proportion of invariable sites within the model;
G (Γ): Gamma distribution of rates among sites: the rate of change varies continuously among nucleotide sites, and the shape parameter of the gamma distribution determines how strongly. A small shape parameter gives a strongly right-skewed distribution (most sites change slowly and a few change fast), whereas a large shape parameter gives nearly equal rates among sites.
MP,HY
Symmetric Model: Estimation of evolutionary distances between nucleotide sequences of equal base frequencies, symmetrical substitution matrix (A to T = T to A) (PAUP, PAML)
•D.H. Bos, D. Posada / Developmental and Comparative Immunology 29 (2005) 211–227
SYM
Sequence
SequenceAlignment
SourceTaxon
Tree Estimation
Analyze: molecular sequence data
MP,SV
This is a program for tree estimation and to reconstruct phylogenetic trees. It implements a fast tree search algorithm by the strategy of “quartet puzzling” which allows large data sets. It also computes pair-wise maximum likelihood distances as well as branch lengths for user specified trees. It also conducts a number of statistical tests on the data sets.
http://www.tree-puzzle.de/
Usage: Maximum likelihood method
Tree_Puzzle
Example: If a node leads to two branches, one of which leads on upwards to all mammals and the other on upwards to all birds, the estimate of the total branch length down to the node is half the average of the distances between all (bird, mammal) pairs.
MP,SV
Unweighted Pair Group Method with Arithmetic Mean
This method applies a particular algorithm to a distance matrix to come up with a tree directly. At each step, the nearest two clusters are combined into a higher-level cluster. The distance between any two clusters A and B is taken to be the average of all distances between pairs of objects "x" in A and "y" in B, that is, the mean distance between elements of each cluster.
Programs used:
neighbor
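The clustering steps above can be sketched as follows. Clusters are written as nested tuples, distances are keyed by `frozenset` pairs, and each merge is reported with its node height (half the distance between the merged clusters). The function name and data layout are assumptions for illustration, not the implementation in the neighbor program.

```python
from itertools import combinations

def upgma(names, dist):
    """Return the list of (merged_cluster, node_height) in the order of merging."""
    d = dict(dist)
    sizes = {name: 1 for name in names}  # cluster -> number of leaves
    merges = []
    while len(sizes) > 1:
        # Combine the two nearest clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        new = (a, b)
        height = d[frozenset((a, b))] / 2
        for c in sizes:
            if c not in (a, b):
                # Size-weighted mean equals the average over all leaf pairs.
                d[frozenset((new, c))] = (
                    sizes[a] * d[frozenset((a, c))] + sizes[b] * d[frozenset((b, c))]
                ) / (sizes[a] + sizes[b])
        sizes[new] = sizes.pop(a) + sizes.pop(b)
        merges.append((new, height))
    return merges

dist = {frozenset(("A", "B")): 2, frozenset(("A", "C")): 6, frozenset(("B", "C")): 6}
merges = upgma(["A", "B", "C"], dist)
```

Because every node is placed at half the average distance between its two clusters, the resulting tree is ultrametric: all tips are equally far from the root.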
UPGMA
MP,SV
K80 (Kimura):
This model uses two rates, ALPHA (transitions) and BETA (transversions), as parameters along with time. Transitions are substitutions between nucleotides of similar molecular structure, that is, between two purines or between two pyrimidines. All other substitutions are transversions and are known to occur less frequently than transitions.
http://www.stats.ox.ac.uk/__data/assets/pdf_file/0003/4296/Basic_Models_of_Nucleotide_Evolution_2.pdf
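Under the two-rate model, the distance between two aligned sequences is commonly estimated as d = -(1/2) ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the observed proportions of transitional and transversional differences. A minimal sketch of that standard formula; the function name is hypothetical, and gaps, ambiguous bases and saturated data (where the logarithm is undefined) are not handled.

```python
import math

PURINES = {"A", "G"}

def k80_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned nucleotide sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for x, y in zip(seq_a, seq_b):
        if x == y:
            continue
        # Same chemical class (purine-purine or pyrimidine-pyrimidine) is a transition.
        if (x in PURINES) == (y in PURINES):
            transitions += 1
        else:
            transversions += 1
    p = transitions / len(seq_a)
    q = transversions / len(seq_a)
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))

d = k80_distance("AAAAAAAAAA", "GAAAAAAAAC")  # one transition, one transversion
```

When transitions and transversions are equally frequent per substitution type, the estimate approaches the Jukes-Cantor distance.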
K80
K80+1
K80+∏
K80+∏+1