Iterative Genome Correction Strategy


Background

For nonmodel species, de novo assembly faces a series of problems: it is expensive and time-consuming, has a high assembly error rate, and makes annotation difficult. Iterative correction of a closely related genome is a comparatively economical, accurate, and reliable alternative. Traditional mapping algorithms tolerate few errors, so a reference that differs substantially from the target genome cannot be corrected with them. Because FANSe combines high accuracy with high error tolerance, it can largely solve the reference problem for nonmodel species.

Principle
Three steps, namely read mapping, SNV calling, and genome correction, are performed iteratively. Each iteration corrects a fraction of the SNVs and brings the sequence closer to the actual genome. In the next round, more reads map to the corrected sequence, so more SNVs can be corrected. The iterations terminate after several rounds, once the genome sequence is sufficiently corrected.
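The three-step loop can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not FANSe: differences are substitutions only (no indels), reads are mapped ungapped, and the brute-force `map_reads` is a hypothetical stand-in for an error-tolerant mapper. The thresholds (`max_mismatches`, `min_depth`, `min_fraction`) and the stopping rule (stop when a round calls no SNVs) are illustrative choices.

```python
from collections import Counter


def map_reads(reference, reads, max_mismatches):
    """Hypothetical stand-in for an error-tolerant mapper: place each read at the
    ungapped offset with the fewest mismatches; drop reads above max_mismatches."""
    hits = []
    for read in reads:
        best_pos, best_mm = None, max_mismatches + 1
        for pos in range(len(reference) - len(read) + 1):
            window = reference[pos:pos + len(read)]
            mm = sum(1 for a, b in zip(read, window) if a != b)
            if mm < best_mm:  # first offset with the lowest count wins
                best_pos, best_mm = pos, mm
        if best_pos is not None:
            hits.append((best_pos, read))
    return hits


def call_snvs(reference, hits, min_depth=2, min_fraction=0.8):
    """Pile up mapped reads and report positions where a well-supported
    majority base differs from the current reference."""
    pileup = [Counter() for _ in reference]
    for pos, read in hits:
        for i, base in enumerate(read):
            pileup[pos + i][base] += 1
    snvs = {}
    for i, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth == 0 or depth < min_depth:
            continue
        base, n = counts.most_common(1)[0]
        if base != reference[i] and n / depth >= min_fraction:
            snvs[i] = base
    return snvs


def correct(reference, snvs):
    """Apply substitution-only corrections to the reference."""
    seq = list(reference)
    for i, base in snvs.items():
        seq[i] = base
    return "".join(seq)


def iterative_correction(reference, reads, max_mismatches=1, max_rounds=10):
    """Repeat mapping -> SNV calling -> correction until a round changes nothing.
    Returns the corrected sequence and the number of rounds that applied changes."""
    for round_no in range(1, max_rounds + 1):
        hits = map_reads(reference, reads, max_mismatches)
        snvs = call_snvs(reference, hits)
        if not snvs:
            return reference, round_no - 1
        reference = correct(reference, snvs)
    return reference, max_rounds


# Toy data: the reference differs from the true genome at positions 3, 4 and 5.
true_genome = "CAGTACGGA"
reference = "CAGCGTGGA"
reads = [true_genome[i:i + 4] for i in range(len(true_genome) - 3)] * 2
print(iterative_correction(reference, reads))  # ('CAGTACGGA', 2)
```

On this toy input, round 1 can only map the reads that overlap a single difference, so it corrects positions 3 and 5. Position 4 becomes reachable only in round 2, after those corrections let the overlapping reads map. This is the effect the iterative strategy relies on.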

Applications

1. RNA de novo assembly
2. Protein identification from MS/MS spectra
3. Population genotyping

Case study:
Iterative Genome Correction Largely Improves Proteomic Analysis of Nonmodel Organisms


Background: A Bacillus pumilus (B. pumilus) strain was picked from a single colony on a freshly streaked plate, and paired-end sequencing was performed on an Illumina HiSeq 2000 for 2 × 100 cycles. In this study, we demonstrated a new strategy that corrects a genome sequence iteratively, based on our stable, accurate, and error-tolerant FANSe mapping algorithm. The strategy can handle a genome that deviates from the reference genome by ~5% and outputs corrected reference protein sequences. Using the corrected protein database for MS/MS spectra searching significantly improved protein and peptide identification. This strategy largely facilitates functional proteomic analysis of nonmodel organisms.
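Exporting corrected protein sequences for MS/MS database searching could look roughly like the Python sketch below. It is not the authors' pipeline. The function names and the `(protein_id, start, end, strand)` annotation format are hypothetical, and the sketch assumes the CDS coordinates of the original annotation still apply, which holds only when corrections are substitutions (indels shift coordinates). It uses the standard genetic code (NCBI table 1). Bacterial genes are normally translated with table 11, which assigns the same amino acid to every codon but also allows alternative start codons such as GTG and TTG; this sketch would render those as V or L rather than M.

```python
# Standard genetic code (NCBI table 1); codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate(cds):
    """Translate complete codons; unknown codons become 'X' and a trailing stop is dropped."""
    protein = "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))
    return protein.rstrip("*")


def protein_database(genome, cds_annotations):
    """Build a FASTA protein database from a corrected genome.

    cds_annotations: iterable of (protein_id, start, end, strand) with 0-based,
    end-exclusive coordinates, assumed unchanged by substitution-only correction.
    """
    records = []
    for protein_id, start, end, strand in cds_annotations:
        cds = genome[start:end]
        if strand == "-":
            cds = reverse_complement(cds)
        records.append(f">{protein_id}\n{translate(cds)}")
    return "\n".join(records) + "\n"


genome = "ATGAAATTTGGCTAA" + "TTACCACAT"  # toy genome: one gene per strand
annotations = [("gene1", 0, 15, "+"), ("gene2", 15, 24, "-")]
print(protein_database(genome, annotations), end="")
# >gene1
# MKFG
# >gene2
# MW
```

The resulting FASTA text can be written to a file and supplied to an MS/MS search engine as the protein database.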

Result: To examine the genome sequence correction, we amplified four randomly chosen fragments of our B. pumilus genome (within the coding sequences of the BPUM_1139, hisC, rpsC, and yceD genes) and sequenced them by capillary electrophoresis. All chromatograms showed clear peaks. Our method fully corrected these fragments, in 100% agreement with the capillary sequencing results. Bowtie2 corrected almost none of the nucleotides in these fragments, whereas Stampy erroneously called a six-nucleotide deletion in BPUM_1139 and failed to correct 19 mismatches in BPUM_1139, hisC, and yceD. These results experimentally validated our method and exposed the false positives and false negatives of Bowtie2 and Stampy.

References
[1] X Wu, L Xu, W Gu, Q Xu, QY He*, X Sun*, G Zhang*. Iterative Genome Correction Largely Improves Proteomic Analysis of Nonmodel Organisms. Journal of Proteome Research (2014), 13(6), 2724–2734.