Plant Phylogenomics: Defining Synergies Between Plant Systematics and Genome Biology
Sjolander, Kimmen.
Improving plant gene functional classification using structural phylogenomic analysis.
The standard protocol for functional annotation (transfer of annotation from the top database hit) is known to be prone to systematic error due to gene duplication, domain shuffling, speciation, and mutations. Experts estimate the frequency of annotation error to range from 8% to 25%; existing errors can also be propagated when the same protocol is applied to sequences that are already misannotated. Phylogenomic analysis, which combines phylogenetic tree construction, integration of experimental data, and differentiation of orthologs and paralogs, has been shown to significantly improve annotation accuracy. Structural phylogenomics expands on this protocol by explicitly including protein structure considerations at different stages, to improve the sensitivity and specificity of classification. However, phylogenomic inference of gene function has seen limited application because the approach is computationally complex and depends on expertise in numerous tasks.
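The difference between top-hit transfer and orthology-aware transfer can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the method presented in the talk: the `Hit` record, the `is_ortholog` flag (which a real analysis would have to infer from a phylogenetic tree), the scores, and the placeholder annotations are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    """One database hit for a query sequence (hypothetical record)."""
    seq_id: str
    score: float       # similarity score; higher means more similar
    is_ortholog: bool  # assumed to be supplied by a separate tree-based analysis
    annotation: str


def top_hit_annotation(hits: List[Hit]) -> Optional[str]:
    """Standard protocol: copy the annotation of the best-scoring hit."""
    if not hits:
        return None
    return max(hits, key=lambda h: h.score).annotation


def ortholog_annotation(hits: List[Hit]) -> Optional[str]:
    """Orthology-aware variant: consider only hits flagged as orthologs."""
    orthologs = [h for h in hits if h.is_ortholog]
    if not orthologs:
        return None  # decline to annotate rather than borrow from a paralog
    return max(orthologs, key=lambda h: h.score).annotation


# Toy data: after a gene duplication, the best-scoring hit is a paralog
# whose function differs from that of the query's ortholog.
hits = [
    Hit("paralog_A", 310.0, False, "function X"),
    Hit("ortholog_B", 295.0, True, "function Y"),
]

print(top_hit_annotation(hits))   # annotation taken from the paralog
print(ortholog_annotation(hits))  # annotation taken from the ortholog
```

In this toy case the two protocols disagree, which is the kind of systematic error that gene duplication can introduce into top-hit annotation.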
To address this need, the Berkeley Phylogenomics Group has developed the PhyloFacts Phylogenomic Encyclopedia for protein families across the Tree of Life (http://phylogenomics.berkeley.edu/phylofacts). As of April 1, 2007, the PhyloFacts Encyclopedia contains over 27,000 “books” for protein families and domains and over 988,000 hidden Markov models (HMMs). Each book contains a multiple sequence alignment of homologs, one or more phylogenetic trees, predicted subfamilies, Gene Ontology (GO) annotations and evidence codes, Pfam domains, cellular localization, predicted (or known) 3D structures, and predicted critical residues. Each book also contains an HMM for the family as a whole and one for each individual subfamily; these are used to classify novel sequences, whether submitted by users or added in periodic updates to the resource. Biologists can submit DNA or protein sequences for classification to families and subfamilies, or for prediction of protein 3D structure. All data can be downloaded from the website.
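The family-then-subfamily classification described above might be sketched as follows. This is a hedged illustration only: it assumes that HMM scores have already been computed by some external tool, and the function name `classify`, the score dictionaries, the single `cutoff` parameter, and the example names are hypothetical rather than part of PhyloFacts.

```python
from typing import Dict, Optional, Tuple


def classify(
    family_scores: Dict[str, float],
    subfamily_scores: Dict[str, Dict[str, float]],
    cutoff: float,
) -> Tuple[Optional[str], Optional[str]]:
    """Assign a query to a family, then to a subfamily within that family.

    family_scores maps family name -> family HMM score for the query.
    subfamily_scores maps family name -> {subfamily name -> HMM score}.
    Scores are assumed to be "higher is better"; cutoff is a hypothetical
    minimum score required for any assignment.
    """
    if not family_scores:
        return None, None

    # Stage 1: pick the best-scoring family model.
    family = max(family_scores, key=family_scores.get)
    if family_scores[family] < cutoff:
        return None, None

    # Stage 2: pick the best-scoring subfamily model within that family.
    candidates = subfamily_scores.get(family, {})
    if not candidates:
        return family, None
    subfamily = max(candidates, key=candidates.get)
    if candidates[subfamily] < cutoff:
        return family, None  # family-level assignment only
    return family, subfamily


# Toy scores for one query sequence (all values invented for illustration).
family_scores = {"family_1": 120.5, "family_2": 15.0}
subfamily_scores = {
    "family_1": {"subfam_a": 98.0, "subfam_b": 131.2},
    "family_2": {"subfam_c": 10.0},
}

print(classify(family_scores, subfamily_scores, cutoff=50.0))
```

Returning a family without a subfamily when no subfamily model scores above the cutoff keeps the sketch conservative: a query is placed only as specifically as the scores support.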
In this talk, I will present new methods developed by my group for key tasks in a phylogenomic pipeline, and results of these analyses on selected protein families.
PhyloFacts Phylogenomic Encyclopedia
1 - University of California Berkeley, Bioengineering, Berkeley, CA, 94720, USA
protein structure prediction.
Presentation Type: Symposium or Colloquium Presentation
Location: Stevens 2/Hilton
Date: Monday, July 9th, 2007
Time: 9:30 AM