Genomics / Proteomics
Moustafa, Ahmed, Bhattacharya, Debashish.
ESTeasy: a user-friendly tool for processing EST data and its application in a comparative analysis of four microalgal species.
We test two hypotheses in this work: 1) across different algal species, the same set of genes tends to be duplicated, and 2) SNP analyses can provide an estimate of the number of gene duplication events and help us distinguish ancient from more recent duplications. To address these hypotheses, we have developed ESTeasy, a user-friendly and interactive pipeline for processing expressed sequence tag (EST) data. The pipeline starts by assessing the quality of the reads. It reports the distribution of read lengths, the ratio of deterministic (unambiguous) nucleotides to all nucleotides within each read, and the average of that ratio across all reads. Based on the quality assessment, the user can combine trimming and filtering to generate a uniform and acceptable distribution of reads. The next step is to identify polymorphic sites (SNPs) in the acceptable reads using PolyPhred. ESTeasy then reads the SNP report from PolyPhred, performs statistical analyses, and produces a graphical representation of the distribution of the SNPs within all UniGene clusters. This output is useful for understanding the relationship between cluster size (the number of reads within a cluster), the frequency of each cluster size, and the number of SNPs. Using ESTeasy to perform a comparative analysis of EST data sets from four microalgal species, Alexandrium tamarense, Cyanophora paradoxa, Emiliania huxleyi, and Karenia brevis, we provide preliminary insights into the evolutionary history of these four genomes. We are currently working on improving the accuracy of SNP identification to reduce the false positive rate introduced by sequencing errors and/or computational artifacts.
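The quality-assessment and filtering steps described above can be illustrated with a minimal Python sketch. This is not ESTeasy's actual code: the function names, the trimming rule (stripping ambiguous bases from both ends), and the default thresholds `min_length` and `min_ratio` are all assumptions chosen for illustration, and "unambiguous" is taken to mean A, C, G, or T.

```python
from statistics import mean

# Assumption: only A, C, G, T count as deterministic (unambiguous) bases.
UNAMBIGUOUS = set("ACGT")


def unambiguous_ratio(read):
    """Fraction of bases in a read that are unambiguous (0.0 for an empty read)."""
    if not read:
        return 0.0
    return sum(base in UNAMBIGUOUS for base in read.upper()) / len(read)


def assess_reads(reads):
    """Summarize read lengths, per-read ratios, and the average ratio."""
    ratios = [unambiguous_ratio(r) for r in reads]
    return {
        "lengths": [len(r) for r in reads],
        "ratios": ratios,
        "mean_ratio": mean(ratios) if ratios else 0.0,
    }


def trim_ambiguous_ends(read):
    """Hypothetical trimming rule: drop ambiguous bases from both ends."""
    upper = read.upper()
    start, end = 0, len(upper)
    while start < end and upper[start] not in UNAMBIGUOUS:
        start += 1
    while end > start and upper[end - 1] not in UNAMBIGUOUS:
        end -= 1
    return read[start:end]


def filter_reads(reads, min_length=100, min_ratio=0.95):
    """Trim each read, then keep it only if it passes both thresholds."""
    kept = []
    for read in reads:
        trimmed = trim_ambiguous_ends(read)
        if len(trimmed) >= min_length and unambiguous_ratio(trimmed) >= min_ratio:
            kept.append(trimmed)
    return kept
```

In a sketch like this, a user would inspect the output of `assess_reads` first and then pick the thresholds passed to `filter_reads`, mirroring the interactive choice of trimming and filtering described in the abstract.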
1 - University of Iowa, Genetics, 456 Biology Building, Iowa City, Iowa, 52242, USA
2 - University of Iowa, Biology, Roy J. Carver Center for Comparative Genomics, 446 Biology Building, Iowa City, Iowa, 52242, USA
Presentation Type: Oral Paper: Papers for Topics
Location: Boulevard A/Hilton
Date: Wednesday, July 11th, 2007
Time: 4:45 PM