Phylome Analysis Tool

Tancred Frickey and Andrei Lupas
Max Planck Institut fuer Entwicklungsbiologie
Spemannstr. 35; 72076 Tuebingen, Germany

PHAT is part of the PhyloGenie package, which is available for download.


Automated analysis of phylogenetic trees is far more complex than analysis of, for example, pairwise similarities such as those calculated by BLAST or PSI-BLAST. For the phylome inference package "PhyloGenie" to be useful, there has to be a way to separate relevant from irrelevant trees for the question at hand. PHAT (Phylome Analysis Tool) filters a set of Newick trees (New Hampshire Bracket Format) for those that match topological selection criteria. Careful choice of the selection criteria can greatly reduce the number of trees that have to be examined manually.


[Figure: screenshot of the PHAT graphical user interface]

Short info:

  • 'Config File' lets you load a default configuration.
  • 'Input Directory' specifies the directory where the tree files can be found.
  • 'Taxonomy Directory' denotes where to find the NCBI taxonomy files 'names.dmp' and 'nodes.dmp'.
  • 'Outfile' specifies where the program output should be stored.
  • 'Exclude the following from analysis': taxonomic descriptions entered here are ignored and do not influence the analysis.
  • 'Selection String': the node topology to select for (here: a node with Thermoplasmata and Sulfolobus sequences and no other Archaea).
  • 'Input File Extension': the file extension of the trees in the input directory.
  • 'Minimum Bootstrap Value': all nodes with bootstrap support below this value are collapsed prior to analysis.
  • 'Output to File / Directory': determines whether to save the names of positive trees to a file or to save the trees themselves to a directory.
  • 'Redo all / Only do new files': repeat the analysis for all files, or run it only for files not previously examined.
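
The effect of a minimum bootstrap value can be pictured with a small Python sketch: every internal node whose support falls below the threshold is dissolved, and its children are attached directly to its parent. This is a generic way of collapsing poorly supported nodes, not PHAT's own code; the `Node` class, the function names and the example tree are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str = ""                    # leaf label; empty for internal nodes
    support: Optional[float] = None   # bootstrap support; None if unknown
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def collapse_low_support(node: Node, min_support: float) -> Node:
    """Return a copy in which internal nodes below min_support are dissolved."""
    if node.is_leaf():
        return Node(node.name, node.support)
    new_children = []
    for child in node.children:
        kept = collapse_low_support(child, min_support)
        weak = (
            not kept.is_leaf()
            and kept.support is not None
            and kept.support < min_support
        )
        if weak:
            # Splice the grandchildren into this node, forming a polytomy.
            new_children.extend(kept.children)
        else:
            new_children.append(kept)
    return Node(node.name, node.support, new_children)


def leaf_names(node: Node) -> List[str]:
    """Return the leaf labels below a node, left to right."""
    if node.is_leaf():
        return [node.name]
    names = []
    for child in node.children:
        names.extend(leaf_names(child))
    return names


def to_newick(node: Node) -> str:
    """Serialise a tree to Newick without branch lengths or trailing ';'."""
    if node.is_leaf():
        return node.name
    inner = ",".join(to_newick(child) for child in node.children)
    label = "" if node.support is None else f"{node.support:g}"
    return f"({inner}){label}"


# Example tree ((A,B)95,(C,D)40) collapsed at a threshold of 50.
tree = Node(children=[
    Node(support=95, children=[Node("A"), Node("B")]),
    Node(support=40, children=[Node("C"), Node("D")]),
])
collapsed = collapse_low_support(tree, 50)
print(to_newick(collapsed) + ";")
```

With the threshold set to 50, the node grouping C and D (support 40) disappears and the example prints `((A,B)95,C,D);`, so a selection can no longer match on that weakly supported grouping.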

Selection Syntax

The following elements can be combined to form queries:

Example selection Strings


Rooting Scheme

Rooting a tree is done in 4 steps:

Figure a: unrooted tree.

  1. The unrooted tree is rooted at the seed sequence (here: Man) and a taxonomic description is assigned to all descendant nodes (Figure b).
  2. Next, the tree is rerooted at the tip node that is least related to the seed sequence in terms of taxonomy and most distant from it in number of nodes (here: rerooted at E. coli K12).
  3. Taxonomic levels are then reassigned to each node. Combining the taxonomic assignments from both trees (rooted at Man vs. E. coli K12) and retaining, for each node, the assignment closest to 'species' level (or most distant from 'root') yields Figure c.
  4. The tree used for analysis is the one rooted at the most basal node (closest to 'root') that is most distant from the 'seed' sequence. This guarantees correct directionality for at least the branch containing the 'seed' sequence.
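
The choice of the rerooting tip and the merging of taxonomic assignments can be sketched in Python as follows. This is only one possible reading of the steps above, not PHAT's implementation: it assumes that taxonomic relatedness is measured as the number of leading lineage entries shared with the seed, that node distance is used as a tie-breaker, and that the more specific of two assignments is simply the longer lineage. The adjacency-list tree, the simplified lineages and all function names are hypothetical.

```python
from collections import deque


def shared_rank_count(lineage_a, lineage_b):
    """Count the leading lineage entries that two lineages have in common."""
    count = 0
    for a, b in zip(lineage_a, lineage_b):
        if a != b:
            break
        count += 1
    return count


def node_distances(adjacency, start):
    """Breadth-first search: number of edges from start to every node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if neighbour not in dist:
                dist[neighbour] = dist[current] + 1
                queue.append(neighbour)
    return dist


def pick_reroot_tip(adjacency, lineages, seed):
    """Tip least related to the seed; ties go to the tip most nodes away."""
    dist = node_distances(adjacency, seed)
    tips = [tip for tip in lineages if tip != seed]
    return min(
        tips,
        key=lambda tip: (
            shared_rank_count(lineages[tip], lineages[seed]),
            -dist[tip],
        ),
    )


def common_lineage(leaf_lineages):
    """Taxonomic description of a node: lineage shared by all its leaves."""
    prefix = list(leaf_lineages[0])
    for lineage in leaf_lineages[1:]:
        prefix = prefix[:shared_rank_count(prefix, lineage)]
    return prefix


def most_specific(assignment_a, assignment_b):
    """Keep the assignment further from 'root' (assumed: the longer one)."""
    return assignment_a if len(assignment_a) >= len(assignment_b) else assignment_b


# Illustrative unrooted tree; n1 and n2 are unnamed internal nodes.
adjacency = {
    "Man": ["n1"],
    "Mouse": ["n1"],
    "n1": ["Man", "Mouse", "n2"],
    "n2": ["n1", "Yeast", "E. coli K12"],
    "Yeast": ["n2"],
    "E. coli K12": ["n2"],
}
# Heavily simplified lineages, listed from 'root' towards 'species'.
lineages = {
    "Man": ["root", "Eukaryota", "Metazoa", "Mammalia", "Homo sapiens"],
    "Mouse": ["root", "Eukaryota", "Metazoa", "Mammalia", "Mus musculus"],
    "Yeast": ["root", "Eukaryota", "Fungi", "Saccharomyces cerevisiae"],
    "E. coli K12": ["root", "Bacteria", "Escherichia coli K12"],
}

print(pick_reroot_tip(adjacency, lineages, "Man"))
print(common_lineage([lineages["Man"], lineages["Mouse"]]))
print(most_specific(["root"], ["root", "Eukaryota"]))
```

With Man as the seed, the sketch selects E. coli K12 as the tip to reroot at, because it shares only the 'root' entry with the seed lineage. A node assigned 'root' under one rooting and 'Eukaryota' under the other keeps the 'Eukaryota' assignment.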
