Saturday, July 5, 2014

Yeast ORF Name table from FASTA headers



$ grep ">" sce.orf.faa > /tmp/sce.orf.name.txt

Byte:tmp hqin$ cut -f1-2 -d ' ' sce.orf.name.txt > tmp.txt 
Byte:tmp hqin$ head tmp.txt 
>YAL001C TFC3
>YAL002W VPS8
>YAL003W EFB1
>YAL005C SSA1
>YAL007C ERP2
>YAL008W FUN14
>YAL009W SPO7
>YAL010C MDM10
>YAL011W SWC3
>YAL012W CYS3

Byte:tmp hqin$ cut -f2 -d '>' tmp.txt > ORF_name.txt
Byte:tmp hqin$ head ORF_name.txt 
YAL001C TFC3
YAL002W VPS8
YAL003W EFB1
YAL005C SSA1
YAL007C ERP2
YAL008W FUN14
YAL009W SPO7
YAL010C MDM10
YAL011W SWC3
YAL012W CYS3
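The three commands above can also be chained into one pipeline, which avoids the intermediate files. This is a sketch, not what I actually ran: `fasta_header_pairs` is a hypothetical name, and it anchors the grep pattern as '^>' so that only lines starting with ">" (the FASTA headers) are kept.

```shell
# fasta_header_pairs (hypothetical helper): reads a FASTA file on stdin and
# prints "ORF NAME" for each header line.
#   grep '^>'          keeps header lines only
#   cut -f1-2 -d ' '   keeps the first two space-separated fields
#   cut -f2 -d '>'     drops the leading '>'
fasta_header_pairs() {
  grep '^>' | cut -f1-2 -d ' ' | cut -f2 -d '>'
}

# Usage (file names from this post):
# fasta_header_pairs < sce.orf.faa > ORF_name.txt
```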

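I then used TextWrangler to remove two single quotes ('), and ran a Grep (regular-expression) find-and-replace to change the space between the ORF and the gene name into a tab:

Find: ([\w-]+)\s([\w-]+)
Replace: \1\t\2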
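The TextWrangler edits can also be done on the command line. The sketch below swaps in `tr` and `awk` rather than reproducing the regex: `tr -d "'"` deletes every single quote (assuming the only quotes in the file are the ones that should go), and awk prints the first two whitespace-separated fields joined by a tab. `to_tab_table` is a hypothetical name.

```shell
# to_tab_table (hypothetical helper): reads "ORF NAME" lines on stdin and
# prints "ORF<TAB>NAME" with all single quotes removed; blank lines are skipped.
to_tab_table() {
  tr -d "'" | awk 'NF { print $1 "\t" $2 }'
}

# Usage:
# to_tab_table < ORF_name.txt > SceORF_name.csv
```

Because awk splits on whitespace only, hyphenated names stay in one field, as they do with the [\w-]+ pattern.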




The resulting tab-delimited file was saved as "SceORF_name.csv".
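Since the file is tab-delimited despite the .csv extension, a quick sanity check that every row has exactly two tab-separated columns could look like this (a sketch; `check_two_columns` is a hypothetical name):

```shell
# check_two_columns (hypothetical helper): reads a tab-delimited table on stdin,
# prints "line_number: line" for every row without exactly two columns,
# and returns non-zero if any such row was found.
check_two_columns() {
  awk -F '\t' 'BEGIN { bad = 0 } NF != 2 { print NR ": " $0; bad = 1 } END { exit bad }'
}

# Usage:
# check_two_columns < SceORF_name.csv && echo "all rows have two columns"
```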