Modeling and Computational Biology
Liang, Chun; Wang, Gang; Lin, Liu.
ConiferEST: an integrated bioinformatics system for data processing and mining of conifer expressed sequence tags (ESTs).
With the advent of low-cost, high-throughput sequencing, the amount of public-domain Expressed Sequence Tag (EST) sequence data available for both model and non-model organisms is growing exponentially. These data are by far the most abundant resource for the expressed protein-coding portions of many genomes. However, the inherent deficiencies of EST sequences make data verification and quality control a serious challenge, particularly for species without genome sequences.
ConiferEST is a generic bioinformatics system for data processing, integration and mining of conifer expressed sequence tags (ESTs). It consists of a MySQL relational database and a PHP web application that communicates with the database. Its current release, version 1.0, houses 172,229 loblolly pine EST sequence reads, obtained by processing raw DNA sequencer trace files downloaded from the NCBI Trace Archive with our newly developed in-house software, WebTraceMiner. In contrast to all other existing public EST resources, ConiferEST provides biologists with unique, easy-to-use data visualization and mining tools for a variety of putative sequence features, including the fidelity, location, orientation, order and number of cloning vector segments, adapter sequences, restriction enzyme sites, poly(A) and poly(T) runs, and their corresponding Phred quality values. These features support data verification and quality control of error-prone EST sequences, and they also make it possible to identify features with potential biological function, such as 3’ poly(A) tails, relevant polyadenylation signals, and other sequence motifs embedded in large EST datasets. Interestingly, only about 30% of the designated 3’ EST sequences were found to have authenticated poly(T) tails at their 3’ ends, while fewer than 5% of the designated 5’ EST sequences had in-silico verified poly(A) tails at their 3’ ends.
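The abstract does not describe how WebTraceMiner decides that a poly(A) or poly(T) run is "in-silico verified." As a rough illustration only, the sketch below accepts a homopolymer run when it is long enough, lies close to the expected end of the read, and has a sufficiently high mean Phred quality (Phred Q = -10·log10(p), so Q20 corresponds to an estimated 1% per-base error rate). The function name `find_terminal_run`, the `TailCall` record, and the thresholds (`min_len=10`, `min_mean_q=20`, `max_offset=5`) are hypothetical choices for this example, not parameters taken from ConiferEST.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TailCall:
    base: str            # 'A' for a poly(A) run, 'T' for a poly(T) run
    start: int           # 0-based start position of the run in the read
    length: int          # number of bases in the run
    mean_quality: float  # mean Phred score over the run


def find_terminal_run(seq: str, quals: List[int], base: str,
                      at_3prime: bool = True, min_len: int = 10,
                      min_mean_q: float = 20, max_offset: int = 5) -> Optional[TailCall]:
    """Return the longest qualifying run of `base` near one end of the read, or None.

    Hypothetical rule: a run counts only if it has at least `min_len` bases,
    ends (or starts) within `max_offset` bases of the chosen read end, and
    its mean Phred quality is at least `min_mean_q`.
    """
    if base not in ("A", "T"):
        raise ValueError("base must be 'A' or 'T'")
    if len(seq) != len(quals):
        raise ValueError("sequence and quality arrays must have equal length")

    seq = seq.upper()
    pattern = re.compile(f"{base}{{{min_len},}}")  # e.g. "A{10,}": maximal runs
    best = None
    for m in pattern.finditer(seq):
        start, end = m.start(), m.end()
        # Distance from the run to the read end we are checking.
        offset = len(seq) - end if at_3prime else start
        if offset > max_offset:
            continue
        run_q = quals[start:end]
        mean_q = sum(run_q) / len(run_q)
        if mean_q < min_mean_q:
            continue  # run exists but its base calls are not trustworthy
        call = TailCall(base, start, end - start, mean_q)
        if best is None or call.length > best.length:
            best = call
    return best


if __name__ == "__main__":
    read = "GATTACAGGC" + "A" * 15 + "CG"
    print(find_terminal_run(read, [30] * len(read), "A"))
```

Requiring both position and quality reflects the distinction the abstract draws between putative features and verified ones: a long run of A's in the middle of a read, or one built from low-quality base calls, would be reported as a putative run but not accepted as a tail.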
ConiferEST is a unique and complementary public resource for EST data integration and mining in conifers. It serves as a bridge between two NCBI resources, dbEST and the Trace Archive, by processing raw DNA traces, identifying all putative sequence features and determining which of them are in-silico verified. Because of its generic and modular design, ConiferEST can easily be scaled to incorporate large volumes of additional EST traces from conifers, and it can readily be adapted for any other species.
1 - Miami University, Botany, 316 Pearson Hall, Oxford, Ohio, 45056, USA
2 - Miami University, Department of Botany, Oxford, Ohio, 45056, USA
3 - Miami University, Botany
Expressed Sequence Tags.
Presentation Type: Plant Biology Abstract
Location: Exhibit Hall (Northeast, Southwest & Southeast)/Hilton
Date: Sunday, July 8th, 2007
Time: 8:00 AM