About

Assembling the full-length protein sequence

Full-length protein sequencing by mass spectrometry remains challenging. A protein is first digested into peptides by proteases, and the peptides are then analyzed in a mass spectrometer. The resulting mass spectra are interpreted by de novo sequencing to yield amino acid sequences. Finally, an assembler joins the peptide sequences into a long, complete protein sequence.

However, several problems often lead to incomplete assembly or misassembly of the protein sequence.

  • Instrumental errors, signal loss, and noise often mislead de novo sequencing algorithms, resulting in a high error rate.
  • Impurities add noise to the signal peaks, leading to incorrect amino acid identification.
  • Insufficient fragmentation efficiency leads to signal loss.
  • A single specific protease can hardly produce overlapping peptides.
  • The hydrolysis procedure (e.g. microwave-assisted acid hydrolysis, MAAH) introduces additional errors.

We developed a new strategy to assemble peptide sequences into a complete and accurate full-length protein sequence. The strategy uses multiple MS experiments, each hydrolyzing the protein with a different unspecific enzyme, and assembles the protein sequence with a stepwise contig-scaffolding scheme.

  • Experimentally, multiple MS experiments are performed, each with a different unspecific enzymatic hydrolysis.
  • The MS spectra are identified by de novo algorithms, e.g. pNovo or PEAKS.
  • Our algorithm, MuCS, then assembles these peptide sequences into a complete protein sequence.
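
To illustrate why overlapping peptides matter for assembly, here is a minimal Python sketch of joining two peptides that share an overlap of at least k residues. It is not the MuCS algorithm, whose contig-scaffolding details are not described on this page. The function name `merge_by_overlap` and the peptide strings are hypothetical, chosen only for illustration.

```python
from typing import Optional


def merge_by_overlap(left: str, right: str, k: int = 7) -> Optional[str]:
    """Join two peptides if a suffix of `left` equals a prefix of `right`.

    The overlap must be at least `k` residues long. The longest such
    overlap is used. Returns None when no sufficient overlap exists.
    Illustrative only; this is not how MuCS itself is implemented.
    """
    longest_possible = min(len(left), len(right))
    # Try the longest candidate overlap first, down to the minimum k.
    for size in range(longest_possible, k - 1, -1):
        if left.endswith(right[:size]):
            return left + right[size:]
    return None


if __name__ == "__main__":
    # Hypothetical peptides, e.g. from two different hydrolysis experiments.
    peptide_a = "MKTAYIAKQRQISFVK"
    peptide_b = "QRQISFVKSHFSRQ"
    print(merge_by_overlap(peptide_a, peptide_b, k=7))
```

Peptides produced by one specific protease tend to end at the same cleavage sites and so rarely overlap, which is why the strategy above combines several unspecific hydrolysis experiments.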

Software tool

Download and user manual

System requirements

Windows 7 or later, 64-bit operating system

1GB usable memory

Free Licence

CC-BY-SA licence. You can freely use it.

Usage

  • This version of MuCS is run from the command line, with the MuCS folder as the current directory (cd ../MuCS).
  • De novo peptide sequencing is performed in a separate step; MuCS does not perform it. We recommend pNovo or PEAKS. Please download and run either one to get the top 10 peptide sequences for each spectrum.
  • The result files are then converted to fasta format. In each record, the first line is the mass spectrum name joined with the number of the sequence (e.g. >spec.dta@P1, >spec.dta@P2), and the second line is the peptide sequence. We provide two programs (PEAKS2fasta.exe and pnovocmd2fasta.exe) for this conversion:

    For PEAKS: PEAKS2fasta.exe -p PEAKS_result.csv -k kmer -o output.fa
    -p. The result file from PEAKS
    -k. Length of kmer
    -o. Output fasta file for MuCS process

    For pNovo: pnovocmd2fasta.exe pNovo.res
    pNovo.res is the result file from pNovo; the output is a fasta file for MuCS.
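
The record layout described above can be sketched in a few lines of Python. This only illustrates the header convention (spectrum name, "@P", candidate number) and does not replace PEAKS2fasta.exe or pnovocmd2fasta.exe, whose exact output may differ. The function name `candidates_to_fasta` and the example peptides are hypothetical.

```python
from typing import Dict, List


def candidates_to_fasta(candidates: Dict[str, List[str]]) -> str:
    """Format de novo candidates as fasta records.

    `candidates` maps a spectrum name to its candidate peptides, best first.
    Each record is a header line ">name@P<number>" followed by the peptide.
    """
    lines = []
    for spectrum, peptides in candidates.items():
        for number, peptide in enumerate(peptides, start=1):
            lines.append(f">{spectrum}@P{number}")
            lines.append(peptide)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical candidates for one spectrum.
    print(candidates_to_fasta({"spec.dta": ["PEPTIDEK", "PEPTLDEK"]}), end="")
```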

  • Run the MuCS main program MuCS_assembly.exe to assemble the de novo sequenced peptides into full-length protein sequences.

Command-line:
MuCS_assembly.exe -1 peptide1.fa -2 peptide2.fa -3 peptide3.fa -d denovotype -k kmer -o assembly_prefix -w workfold

Parameter explanation:

  • -1. De novo peptide sequencing result of hydrolysis strategy 1. fasta format.
  • -2. De novo peptide sequencing result of hydrolysis strategy 2. fasta format.
  • -3. De novo peptide sequencing result of hydrolysis strategy 3. fasta format.
  • -d. De novo sequencing software (PEAKS: 0; pNovo: 1)
  • -k. Length of kmer (default: 7)
  • -o. Prefix of assembly result file.
  • -w. Working and output folder.
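
For scripted runs, the command line above can be built programmatically. The Python sketch below assembles the argument list from the documented flags only. The helper name `build_mucs_command`, the default prefix and folder names, and the requirement of exactly three input files are assumptions made for illustration; this page does not say whether fewer inputs are accepted.

```python
from typing import List, Sequence


def build_mucs_command(
    peptide_fastas: Sequence[str],
    denovo_type: int,
    kmer: int = 7,               # documented default k-mer length
    prefix: str = "assembly",    # hypothetical result prefix
    workdir: str = "work",       # hypothetical working/output folder
) -> List[str]:
    """Return the MuCS_assembly.exe argument list for three fasta inputs."""
    if len(peptide_fastas) != 3:
        # Assumption: one file per hydrolysis strategy, as in the usage line.
        raise ValueError("expected three fasta files (-1, -2, -3)")
    if denovo_type not in (0, 1):
        raise ValueError("denovo_type must be 0 (PEAKS) or 1 (pNovo)")
    cmd = ["MuCS_assembly.exe"]
    for flag, path in zip(("-1", "-2", "-3"), peptide_fastas):
        cmd += [flag, path]
    cmd += ["-d", str(denovo_type), "-k", str(kmer), "-o", prefix, "-w", workdir]
    return cmd


if __name__ == "__main__":
    cmd = build_mucs_command(
        ["peptide1.fa", "peptide2.fa", "peptide3.fa"], denovo_type=1
    )
    print(" ".join(cmd))
    # To launch it on Windows from the MuCS folder, one could use:
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```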

F.A.Q

Frequently Asked Questions

Under construction.

Contact

Contact Us

Please send an email to zhanggong-uni [at] qq.com