RenameRefSeq

Download RenameRefSeq

This tool removes spaces from the FASTA names of reference sequences by cutting each name at its first space. It was originally designed to rename the RefSeq-RNA database, but it can also process any other FASTA-format reference sequence file whose names contain spaces.

For example, a record in the RefSeq-RNA database for mouse (mm10) looks like this:

>NR_037984 1
tgaagtggctgtaagcaagagggacaattaccacaccctatctccccttc
gattccacctttgtgataacaaaattaccacagggcaggaggagttggtc
ccctaaacaggaccatctcaaacccagcttcactactgagaagctggccc
tacgccttcctcaagaggaaacacctgagcccctatccacggcatgcagg
...

Note that there is a suffix " 1" after the real NR id "NR_037984". This tool removes the space and everything after it:

>NR_037984
tgaagtggctgtaagcaagagggacaattaccacaccctatctccccttc
gattccacctttgtgataacaaaattaccacagggcaggaggagttggtc
ccctaaacaggaccatctcaaacccagcttcactactgagaagctggccc
tacgccttcctcaagaggaaacacctgagcccctatccacggcatgcagg
...

As another example, the name of the HIV genome downloaded from the NCBI database looks like this:

>gi|9629357|ref|NC_001802.1| Human immunodeficiency virus 1, complete genome

After processing with this tool, the name looks like this, which still contains enough information to identify the sequence:

>gi|9629357|ref|NC_001802.1|
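
The renaming rule shown in the two examples above can be sketched in a few lines of Python. This is only an illustration of the rule, not the tool's actual source code (the tool itself is a .NET program). The function names trim_header and rename_fasta are hypothetical, and the sketch assumes that only the space character marks the cut point, as described above.

```python
def trim_header(line: str) -> str:
    """Cut a FASTA header line at its first space.

    Sequence lines (those not starting with '>') are returned unchanged.
    Assumption: only the space character marks the cut point.
    """
    if line.startswith(">"):
        return line.split(" ", 1)[0]
    return line


def rename_fasta(lines):
    """Yield each input line with line endings stripped and headers trimmed."""
    for line in lines:
        yield trim_header(line.rstrip("\r\n"))


if __name__ == "__main__":
    sample = [
        ">NR_037984 1",
        "tgaagtggctgtaagcaagagggacaattaccacaccctatctccccttc",
        ">gi|9629357|ref|NC_001802.1| Human immunodeficiency virus 1, complete genome",
    ]
    for out in rename_fasta(sample):
        print(out)
```

Run on the two sample headers, the sketch keeps ">NR_037984" and ">gi|9629357|ref|NC_001802.1|" and leaves the sequence line untouched.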

This tool requires .NET Framework 4.0 to run.

 

How to use:

There is only one button. Click it and choose the FASTA file you want to process. The program leaves the input file unchanged and writes the result to a new file whose name is the original file name with an "a" added.