T-Coffee: A Novel Method
for Accurate Multiple Sequence Alignment
Developed by Cédric Notredame et al., CNRS
Information Génomique et Structurale
What is T-Coffee ???
• Tree-based Consistency Objective
Function for alignment Evaluation.
• A multiple sequence alignment program
that uses a progressive approach.
• It was developed as an attempt to improve
How does it work ???
• Generates a library of pairwise alignments.
• Combines multiple/pairwise and global/local
alignments from different sources into a single alignment.
• Estimates the level of consistency between the alignments in the library.
• Uses an optimization method to find the MSA that
best fits the pairwise alignments in the library.
• Consistency is an indicator of alignment accuracy.
• Global alignments try to align full-length sequences.
• Local alignment algorithms provide sets of
non-overlapping alignments from the comparison of two sequences.
– These perform well when the sequences share clear blocks of
ungapped alignment.
• T-Coffee combines the best properties of each.
• A simple, flexible and accurate solution using a
combination of all the information.
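• Primary library: pairwise alignments between all pairs of sequences
• Two sources: ClustalW (global) and Lalign (local)
• Weight assigned to each aligned residue pair (the percent identity of its alignment)
• Pooling of the libraries (pairs found in both sources have their weights added)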
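A minimal Python sketch of the primary-library steps above, assuming each ClustalW or Lalign pairwise alignment is already available as two gapped strings of equal length. Following the original T-Coffee paper, each aligned residue pair is weighted by the percent identity of its alignment, and pooling adds the weights of pairs found in more than one source. The function names are hypothetical, not T-Coffee's internals.

```python
from collections import defaultdict


def percent_identity(row_a, row_b):
    """Percent identity over the columns where neither row has a gap."""
    columns = [(x, y) for x, y in zip(row_a, row_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


def library_from_pairwise(name_a, row_a, name_b, row_b):
    """Turn one pairwise alignment into {((seq, pos), (seq, pos)): weight}.

    Positions are 1-based residue indices (gaps are not counted). Keys always
    list sequence A first, so libraries built the same way can be pooled.
    """
    weight = percent_identity(row_a, row_b)
    library = {}
    pos_a = pos_b = 0
    for x, y in zip(row_a, row_b):
        if x != "-":
            pos_a += 1
        if y != "-":
            pos_b += 1
        if x != "-" and y != "-":
            library[((name_a, pos_a), (name_b, pos_b))] = weight
    return library


def pool(*libraries):
    """Merge libraries; a residue pair present in several of them gets the sum of its weights."""
    pooled = defaultdict(float)
    for library in libraries:
        for pair, weight in library.items():
            pooled[pair] += weight
    return dict(pooled)
```

For example, `library_from_pairwise("A", "AC-GT", "B", "ACTGA")` yields four residue pairs, each weighted 75.0 (3 identities over 4 gap-free columns).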
• For a higher level of consistency (library extension)
• Pairwise alignments are combined
through intermediate sequences
• Removal of mismatches
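The extension through intermediate sequences can be sketched as follows, using the rule from the T-Coffee paper: the extended weight of a residue pair (a, b) is its direct weight plus, for every residue c in a third sequence aligned to both, the smaller of W(a, c) and W(c, b). Pairs that no intermediate sequence supports gain nothing, which is how inconsistent matches lose ground. This is an unoptimized illustration, assuming each pair appears only once in the input library; the function name is hypothetical.

```python
from collections import defaultdict


def extend(library):
    """Extend a library of {(residue, residue): weight} through intermediate residues.

    A residue is (sequence name, position). Each pair is assumed to be listed
    once, in either orientation. The result is stored in both orientations.
    """
    # Index partners in both directions so lookups ignore pair orientation.
    partners = defaultdict(dict)
    for (r1, r2), weight in library.items():
        partners[r1][r2] = weight
        partners[r2][r1] = weight

    extended = defaultdict(float)
    # Direct support.
    for (r1, r2), weight in library.items():
        extended[(r1, r2)] += weight
        extended[(r2, r1)] += weight
    # Indirect support: triplets a-c-b through each intermediate residue c.
    for c, neighbours in partners.items():
        for a, w_ac in neighbours.items():
            for b, w_cb in neighbours.items():
                # a, b and c must come from three different sequences.
                if len({a[0], b[0], c[0]}) == 3:
                    extended[(a, b)] += min(w_ac, w_cb)
    return dict(extended)
```

A pair supported directly (weight 50) and through one intermediate residue (weights 80 and 60) ends up with 50 + min(80, 60) = 110.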
• Distance Matrix
• Guide Tree using neighbor-joining
• The two closest sequences are aligned first
• Dynamic Programming
• The next closest pair or sequence is added
• Repeated until all sequences are aligned
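The guide-tree step can be illustrated with a textbook neighbor-joining sketch. It assumes a symmetric distance matrix is already available (for instance derived from the pairwise alignments; how T-Coffee computes its distances is not shown here) and returns only the topology as nested tuples, since the guide tree only fixes the order in which sequences and groups are aligned. This is the standard Saitou and Nei procedure, not T-Coffee's own code.

```python
def neighbor_joining(names, dist):
    """Textbook neighbor-joining; returns the topology as nested tuples.

    names: leaf labels; dist: symmetric distance matrix (list of lists) in the
    same order. Branch lengths are not kept.
    """
    nodes = list(names)
    d = [list(row) for row in dist]
    while len(nodes) > 2:
        n = len(nodes)
        totals = [sum(row) for row in d]
        # Join the pair minimising Q(i, j) = (n - 2) * d(i, j) - total(i) - total(j).
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - totals[i] - totals[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from the new internal node to every remaining node.
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] for a in keep]
        for row, value in zip(d, new_row):
            row.append(value)
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [(nodes[i], nodes[j])]
    return tuple(nodes)
```

On four sequences whose distances fit the tree ((A,B),(C,D)), the sketch recovers that grouping; progressive alignment then follows the tree from the innermost pairs outward.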
• Related Topics
– COFFEE: A New Objective Function For Multiple Sequence Alignment.
– T-Coffee: A novel method for multiple sequence alignments.
– Using Multiple Alignment Methods to Assess the Quality of Genomic
Data Analysis, in Bioinformatics and Genomes
– 3DCoffee: Combining Protein Sequences and Structures within Multiple Sequence Alignments
• Related Tools
Download, Installation, Running…
• Latest Version 5.03
• Download from www.tcoffee.org
• Or use Online Server at www.tcoffee.org
• Runs on Unix / Linux / Mac OS X
• Runs on Windows through Cygwin
– Cygwin is a Linux-like environment for Windows. It consists of:
– A DLL (cygwin1.dll) which acts as a Linux API
emulation layer providing substantial Linux API functionality.
– A collection of tools which provide a Linux look and feel.
Download, Installation and Run…
• Unix, Mac OS X, Linux
– gunzip t_coffee.tar.gz
– tar -xvf t_coffee.tar
– cd t_coffee
– Go into the <distrib> folder that contains your input files
– ./t_coffee xxx.yyy
• Windows
– Install Cygwin
– Follow the Linux procedure
What Can it do ???
• Align nucleic and protein sequences.
• Use structural information for protein
sequences with a known structure.
• Compare alignments
• Reformat files
• Evaluate alignments using structural information
• Simplest Usage
– t_coffee xxx.yyy
• Combining Alignments
– t_coffee -aln=a_cw.msf,a_mus.msf,a_tc.msf -
• Evaluating Alignments
– t_coffee -infile=xxx.yyy -special_mode=evaluate
– Creates xxx.score_ascii and xxx.html.
• The color scheme of T-Coffee is an
indicator of the reliability of the alignment.
• Red regions are the most consistent and
therefore the most likely to be correctly aligned.
• Blue regions are the least trustworthy.
• Combining Sequences and Structures
– t_coffee 3d.fasta -special_mode=3dcoffee
– T-Coffee automatically identifies the structure (target) corresponding to
your sequence, as indicated by an NCBI BLAST search.
– PDB structures come from the RCSB (Research Collaboratory for
Structural Bioinformatics).
• Identifying occurrences of a motif
– Uses the special mode Mocca
– t_coffee -other_pg mocca sample_seq1.fasta
• t_coffee -other_pg seq_reformat
• Removing the gaps from an alignment
– t_coffee -other_pg seq_reformat -in abc.aln -output fasta_seq >
• Changing file formats
– t_coffee -other_pg seq_reformat -in abc.aln -output msf > abc.msf
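seq_reformat performs these conversions directly. To show what gap removal amounts to, here is a small sketch assuming a simple ClustalW .aln layout: a header line, then blocks of `name  residues` lines, with conservation lines starting with whitespace. It is an illustration only, not a replacement for seq_reformat's parsers, and the function names are hypothetical.

```python
def parse_clustal(text):
    """Read a simple ClustalW .aln file into {name: gapped row}, keeping file order."""
    rows = {}
    for line in text.splitlines()[1:]:          # line 1 is the CLUSTAL header
        if not line.strip() or line[0].isspace():
            continue                            # blank lines and conservation lines
        fields = line.split()
        rows[fields[0]] = rows.get(fields[0], "") + fields[1]
    return rows


def remove_gaps(rows, gaps="-."):
    """Strip gap characters, turning aligned rows back into plain sequences."""
    return {name: "".join(c for c in row if c not in gaps) for name, row in rows.items()}


def to_fasta(rows, width=60):
    """Write {name: sequence} as FASTA text with lines of at most `width` residues."""
    lines = []
    for name, seq in rows.items():
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

`to_fasta(remove_gaps(parse_clustal(text)))` plays the role of `-output fasta_seq`; writing `parse_clustal(text)` without `remove_gaps` would instead keep the alignment.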
• Colouring residues in an Alignment
– seq_reformat -in=abc.aln -struc_in=cache.seq -struc_in_f number_fasta
• Selectively modifying residues
– seq_reformat -in sample_aln7.aln -struc_in sample_aln7.cache_aln -struc_in_f number_aln -action +lower '[1-2]'
– Available actions: upper, lower, keep, switchcase, remove, convert
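A sketch of what `+lower '[1-2]'` does, under the assumption (suggested by the `number_aln` format name) that the cache is a copy of the alignment in which every residue is replaced by a digit, so each residue can be looked up by sequence and column. Residues whose cache digit falls in the range are lowercased; everything else is left unchanged. Both helpers are hypothetical.

```python
def parse_range(spec):
    """'[1-2]' -> {'1', '2'}: the cache digits selected by the action."""
    low, high = spec.strip("[]").split("-")
    return {str(v) for v in range(int(low), int(high) + 1)}


def apply_lower(alignment, cache, selected):
    """Lowercase residues whose cache digit (same sequence, same column) is in `selected`.

    alignment, cache: {name: row}, with identical names and row lengths.
    """
    result = {}
    for name, row in alignment.items():
        result[name] = "".join(
            residue.lower() if residue != "-" and digit in selected else residue
            for residue, digit in zip(row, cache[name])
        )
    return result
```

Swapping `str.lower` for `str.upper` or `str.swapcase` would give the `upper` and `switchcase` variants in the same way.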
• Extracting Sequences
– t_coffee -other_pg seq_reformat -in sproteases_small.aln -action +grep NAME REMOVE HUMAN -output clustalw
– t_coffee -other_pg seq_reformat -in sproteases_small.aln -action +extract_block cons 100 120 > block1.aln
– Extracting most informative sequences
– Identifying and removing outliers.
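The two extraction commands above boil down to filtering rows by name and slicing columns. The sketch works on an alignment held as `{name: gapped row}`. It assumes a plain substring match for the name filter (seq_reformat's own matching rules may differ) and treats the `cons 100 120` coordinates as 1-based, inclusive alignment columns, which is our reading of `cons`. Both function names are hypothetical.

```python
def remove_by_name(alignment, pattern):
    """Drop every sequence whose name contains `pattern` (plain substring match)."""
    return {name: row for name, row in alignment.items() if pattern not in name}


def extract_block(alignment, start, end):
    """Keep alignment columns start..end, 1-based and inclusive."""
    return {name: row[start - 1:end] for name, row in alignment.items()}
```

With these, `extract_block(aln, 100, 120)` corresponds to the block1.aln example and `remove_by_name(aln, "HUMAN")` to the grep example.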
• Concatenating Alignments
– t_coffee -other_pg seq_reformat -in block1.aln -in2 block2.aln -action
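Concatenating two alignment blocks means joining rows that share a sequence name. A minimal sketch, assuming alignments held as `{name: gapped row}`; padding a sequence that is missing from one block with gaps is a choice made for this sketch, not documented seq_reformat behaviour.

```python
def concatenate(first, second, gap="-"):
    """Join two alignments side by side by sequence name.

    A sequence missing from one block is padded with gaps to that block's width
    (an assumption of this sketch).
    """
    width1 = len(next(iter(first.values()))) if first else 0
    width2 = len(next(iter(second.values()))) if second else 0
    names = list(first) + [name for name in second if name not in first]
    return {
        name: first.get(name, gap * width1) + second.get(name, gap * width2)
        for name in names
    }
```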
• Manipulating DNA sequences
– t_coffee -other_pg seq_reformat -in sproteases_small_dna.fasta -action +translate -output fasta_seq
– T-Coffee works better with proteins
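Because T-Coffee works better on proteins, DNA is often translated before alignment. The sketch below translates the first reading frame with the standard genetic code (NCBI table 1), writing stops as `*` and unknown codons as `X`. These conventions, and the handling of gaps and partial codons, are assumptions of the sketch; seq_reformat's own choices may differ.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate the first reading frame; stops become '*', unknown codons 'X'.

    Gaps are removed first and U is read as T; a trailing partial codon is dropped.
    """
    seq = dna.upper().replace("U", "T").replace("-", "")
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))
```

For example, `translate("ATGGCCTAA")` gives `"MA*"`.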
• Fetching Structures
– t_coffee -other_pg extract_from_pdb -infile 1PPGE
• Dealing with Phylogenetic Trees
– Comparing two phylogenetic trees
– seq_reformat -in sample_tree2.dnd -in2 sample_tree3.dnd -action +tree_cmp -output newick
– Pruning phylogenetic trees
– seq_reformat -in sample_tree2.dnd -in2 sample_seq8.seq -action +tree_prune -output newick
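Pruning keeps only the leaves named in a second file and collapses internal nodes left with a single child. A minimal sketch, assuming the tree is held as nested tuples of leaf names (Newick parsing and branch lengths are omitted) and that the names to keep are already collected in a set. The function names are hypothetical.

```python
def prune(tree, keep):
    """Restrict `tree` to the leaves in `keep`; return None if no leaf survives.

    Internal nodes left with a single child are collapsed into that child.
    """
    if isinstance(tree, str):
        return tree if tree in keep else None
    children = [c for c in (prune(child, keep) for child in tree) if c is not None]
    if not children:
        return None
    if len(children) == 1:
        return children[0]
    return tuple(children)


def to_newick(tree):
    """Format nested tuples as a Newick string (topology only)."""
    def fmt(node):
        if isinstance(node, str):
            return node
        return "(" + ",".join(fmt(child) for child in node) + ")"
    return fmt(tree) + ";"
```

Pruning `((("A","B"),"C"),("D","E"))` to {A, C, D} gives `((A,C),D);`.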
• Aligning Large datasets
– t_coffee sproteases_large.fasta -special_mode quickaln
– Faster alignment mode, at the cost of reduced accuracy
• Changing the Substitution Matrix
• Comparing Two Alternative Alignments
• Changing gap Penalty
– Meta Coffee computes a multiple sequence
alignment by combining the alignments produced by various
specified and installed MSA packages.
– Latest mode
– Finds a similar structure to use as a template
Evaluating using iRMSD
• intra-catener Root Mean Square Deviation
1- Make sure you include two structures whose sequences are so distantly
related that most of the other sequences are intermediates.
2- Align your sequences without using the structural information (i.e.
3- Evaluate your alignment with irmsd (see later in this section). Call this
score S1.
4- Realign your sequences, this time using structural information.
5- Measure the score of that alignment (call it S2).
• If S1 and S2 are almost the same, your distantly
related structures were well aligned even without structural information.
• Expresso claims to have the best results in this test
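To make the S1/S2 comparison concrete, here is a simplified distance RMSD computed over aligned residue pairs, in the spirit of iRMSD: intra-molecular distances in one structure are compared with the corresponding distances in the other, so no superposition is needed and lower is better. It is not the irmsd implementation. The optional neighbourhood radius is an assumption, and C-alpha coordinates are assumed to be already extracted.

```python
import math


def drmsd(coords_a, coords_b, aligned_pairs, radius=None):
    """Distance RMSD between two structures over aligned residue pairs.

    coords_a, coords_b: lists of (x, y, z) C-alpha coordinates.
    aligned_pairs: list of (i, j), meaning residue i of A is aligned to residue j of B.
    radius: if given, only residue pairs closer than `radius` in both structures count
            (a local neighbourhood; the value to use is an assumption).
    """
    total, count = 0.0, 0
    for x in range(len(aligned_pairs)):
        for y in range(x + 1, len(aligned_pairs)):
            (i1, j1), (i2, j2) = aligned_pairs[x], aligned_pairs[y]
            da = math.dist(coords_a[i1], coords_a[i2])
            db = math.dist(coords_b[j1], coords_b[j2])
            if radius is not None and (da > radius or db > radius):
                continue
            total += (da - db) ** 2
            count += 1
    return math.sqrt(total / count) if count else float("nan")
```

Scoring the sequence-only alignment gives S1 and the structure-based alignment gives S2; because only internal distances are compared, a rigid translation of one structure leaves the score unchanged.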
Aligning Reference Sequences
• Download the Balibase reference sequences
• Unzip Files
• Run t_coffee on the fasta files
• C program for aligning files
• System Calls to t_coffee
• Use the bali_score C program that comes with the
Balibase reference package
• Install the expat XML parser and set its path
• Build with make
– bali_score ref_aln test_aln
• Use a C program with system calls to compare:
– t_coffee aligned files
– balibase reference aligned files
– muscle aligned files (for comparison with t_coffee)
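The slides drove the comparison from a C program making system calls; the same orchestration is sketched here in Python with `subprocess`, a swapped-in language rather than the original program. The command forms follow the slides (`t_coffee file`, `seq_reformat ... -output msf`, `bali_score ref_aln test_aln`). The rest is assumed: the file names, T-Coffee writing `<case>.aln` by default, and MUSCLE's v3-style `-in`/`-out` flags. Steps are returned as data, so they can be inspected before anything runs.

```python
import subprocess
import time


def plan(case, reference):
    """(command, stdout file or None) steps for one Balibase case such as 'BB50001'.

    Assumptions: the input is '<case>.tfa'; t_coffee writes '<case>.aln' by default;
    MUSCLE takes v3-style '-in'/'-out' flags; output names follow the slides.
    """
    fasta = f"{case}.tfa"

    def to_msf(source):
        return ["t_coffee", "-other_pg", "seq_reformat", "-in", source, "-output", "msf"]

    return [
        (["t_coffee", fasta], None),
        (to_msf(f"{case}.aln"), f"tc{case}.msf"),
        (["muscle", "-in", fasta, "-out", f"mus{case}.afa"], None),
        (to_msf(f"mus{case}.afa"), f"mus{case}.msf"),
        (["bali_score", reference, f"tc{case}.msf"], None),
        (["bali_score", reference, f"mus{case}.msf"], None),
    ]


def run(steps):
    """Run each step (stdout redirected to a file when one is given); return timings."""
    timings = []
    for command, out_path in steps:
        start = time.perf_counter()
        if out_path is None:
            subprocess.run(command, check=True)
        else:
            with open(out_path, "w") as handle:
                subprocess.run(command, check=True, stdout=handle)
        timings.append((command[0], time.perf_counter() - start))
    return timings
```

Recording wall-clock time per step makes the minutes-versus-seconds comparison between T-Coffee and MUSCLE explicit.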
• Aligned the BB50001.tfa FASTA input file
– Using T-Coffee: tcBB50001.msf (very slow, took minutes)
– Using Muscle: musBB50001.msf (very fast, took seconds)
• Compared both alignments with the Balibase reference (bali_score)
– SP (Sum of Pairs) 0.736, TC (Total Column) 0.240
– SP (Sum of Pairs) 0.757, TC (Total Column) 0.400
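The SP and TC figures above are bali_score's sum-of-pairs and total-column scores. Under their usual definitions, SP is the fraction of residue pairs aligned in the reference that the test alignment also aligns, and TC is the fraction of reference columns reproduced exactly. A minimal sketch follows, assuming alignments held as `{name: gapped row}` with matching names. Balibase can restrict scoring to reference core blocks; this sketch scores every column.

```python
from itertools import combinations


def column_sets(alignment, gaps="-."):
    """Each column as a frozenset of (sequence name, residue index); gaps are skipped."""
    counters = {name: 0 for name in alignment}
    length = len(next(iter(alignment.values())))
    columns = []
    for col in range(length):
        residues = set()
        for name, row in alignment.items():
            if row[col] not in gaps:
                residues.add((name, counters[name]))
                counters[name] += 1
        columns.append(frozenset(residues))
    return columns


def aligned_pairs(alignment):
    """All residue pairs that share a column."""
    pairs = set()
    for column in column_sets(alignment):
        pairs.update(combinations(sorted(column), 2))
    return pairs


def sp_score(reference, test):
    """Fraction of the reference's aligned residue pairs reproduced by the test."""
    ref_pairs = aligned_pairs(reference)
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)


def tc_score(reference, test):
    """Fraction of reference columns found, with exactly the same residues, in the test."""
    ref_columns = column_sets(reference)
    test_columns = set(column_sets(test))
    return sum(column in test_columns for column in ref_columns) / len(ref_columns)
```

TC is stricter than SP: one misplaced residue ruins a whole column, which is why TC values run well below SP values.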
Conclusions (so far)
• T-Coffee is slow
• No considerable increase in accuracy
• Newer modes of T-Coffee however seem
• Faster MSA packages like Muscle should
• Website http://www.tcoffee.org/
• Link for the Journal and source code
• General References www.wikipedia.org