est overall assembly size and highest average ortholog hit ratio, a measure of assembly quality. The size of this final assembled E. propertius transcriptome is similar to that previously produced for the related butterfly species M. cinxia. While the final P. zelicaon assembly is somewhat larger, differences in assembly size between assemblers and parameter sets were similar to those seen for E. propertius. The custom Celera assembly for E. propertius resulted in 17,110 contigs and 10,934 singletons, for a total of 28,044 unigenes. Both the average contig length (753 bp) and the average singleton length (324 bp) are noticeably larger than those reported in previous studies. Cleaned P. zelicaon ESTs assembled into 19,110 contigs and 18,847 singletons, for a total of 37,957 unigenes. The larger number of unassembled singletons for P. zelicaon may be due to mitochondrial rRNA sequences.

Figure 1 shows the distributions of contig and singleton lengths for both species; other detailed assembly statistics are given in Table 2. Average contig coverage was 10× for E. propertius and 9.6× for P. zelicaon. Figure 2 shows the contig coverage distributions for the two transcriptomes and, on a log scale, the average sequence length of contigs within each coverage bin. As expected, and as found in previous studies, contig length was positively correlated with the number of reads incorporated. Figure 2 also shows that contigs with very high coverage tend to be shorter.

Annotation

Gene Assembly Completeness

We compared the unigene sets to the predicted protein database for Bombyx mori, the silkworm, for which full genome data are available. This reference dataset contains 14,623 predicted B. mori proteins. Of the 28,044 E. propertius unigenes, 9,393 had BLASTX hits to 7,866 unique B. mori predicted proteins; 5,289 unigenes hit more than one B. mori protein, and 5,449 B. mori proteins were hit by more than one unigene. Of the 37,957 P. zelicaon unigenes, 12,485 hit 8,359 unique B. mori predicted proteins; 6,518 hit more than one protein, and 5,883 proteins were hit by more than one unigene. Figure 3 shows the distribution of gene ontology terms associated with the unigenes and with the B. mori dataset across 24 categories, each grouped under one of three higher-level categories.

For the purposes of this study, we consider each unigene and its best B. mori BLASTX hit to be orthologs, and we consider the hit region in the unigene to be a conservative estimator of the putative coding region. Thus, we can compute the fraction of each ortholog represented in a unigene by dividing the length of the putative coding region by the total length of the ortholog. This ratio, which we call the ortholog hit ratio, is described in Figure 4. The assumption is that the unigene and its best hit are orthologs, not paralogs or some other misassociation. Using conservative, BLAST-based annotation to find putative coding regions, as opposed to non-comparative methods such as ESTScan, ensures that hit ratios are not overestimated. The ortholog hit ratio therefore estimates how much of a transcript is contained in each unigene. Relative insertions in the best-hit B. mori proteins will tend to lower ortholog hit ratios, whereas relative insertions in unigenes will artificially inflate them; ortholog hit ratios greater than 1.0 likely indicate large insertions in unigenes.

Figures 5 and 5 show ortholog hit ratio in terms of unigene assembly coverage. For E. propertius contigs with less than the median assembly coverage of 3.3×, the average ortholog hit ratio was 0.35; for those with greater than median coverage, the average ratio was 0.56. The corresponding averages for P. zelicaon were 0.34 and 0.55, respectively. Thus, as expected, completeness of unigene assembly is partly governed by assembly coverage. Figures 5 and 5 relate ortholog hit ratio to the length of the B. mori ortholog. As found in other studies, completeness of gene disco
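The ortholog hit ratio and the median-coverage comparison described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the record fields (`qstart`, `qend`, `subject_len`, `coverage`), the conversion of the nucleotide hit span on the unigene into codons by dividing by 3, and the toy values are all hypothetical, chosen only to show the arithmetic.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class Unigene:
    """Hypothetical record: one unigene and its best BLASTX hit."""
    name: str
    qstart: int        # hit start on the unigene (nt, 1-based)
    qend: int          # hit end on the unigene (nt, inclusive)
    subject_len: int   # length of the best-hit B. mori protein (aa)
    coverage: float    # assembly coverage, e.g. 3.3 for 3.3x


def ortholog_hit_ratio(u: Unigene) -> float:
    """Putative coding length (codons) divided by ortholog length (aa).

    Assumes the BLASTX hit span approximates the coding region.
    Reverse-strand hits have qstart > qend, hence abs().
    Values above 1.0 suggest insertions in the unigene.
    """
    coding_nt = abs(u.qend - u.qstart) + 1
    return (coding_nt / 3) / u.subject_len


def ratio_by_median_coverage(unigenes: list[Unigene]) -> tuple[float, float]:
    """Mean hit ratio for unigenes below and above the median coverage.

    Unigenes exactly at the median are excluded from both groups.
    Raises statistics.StatisticsError if either group is empty.
    """
    cut = median(u.coverage for u in unigenes)
    low = [ortholog_hit_ratio(u) for u in unigenes if u.coverage < cut]
    high = [ortholog_hit_ratio(u) for u in unigenes if u.coverage > cut]
    return mean(low), mean(high)


# Toy data (hypothetical, not from the study).
toy = [
    Unigene("c1", 1, 300, 200, 2.0),   # 100 codons / 200 aa = 0.5
    Unigene("c2", 600, 1, 400, 3.0),   # reverse strand: 200 / 400 = 0.5
    Unigene("c3", 1, 900, 300, 5.0),   # 300 / 300 = 1.0
    Unigene("c4", 1, 150, 100, 1.0),   # 50 / 100 = 0.5
    Unigene("c5", 1, 450, 200, 8.0),   # 150 / 200 = 0.75
]

print(ratio_by_median_coverage(toy))  # (0.5, 0.875)
```

Here the median coverage is 3.0, so `c2` falls in neither group; the higher-coverage group has the larger mean ratio, mirroring the direction of the trend reported above. Whether ties at the median were excluded or assigned to a group in the original analysis is not stated, so that choice is an assumption.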
Friday, April 25, 2014