Wednesday, September 23, 2015

Use the following command to fetch the protein sequences and gene names linked to an NCBI gene ID (here, 3119):

echo -e "3119" | while read G; do
  curl -s "${G}" | grep -A 1 "<Link>" | grep "<Id>" | cut -d '>' -f 2 | cut -d '<' -f 1 |
  while read S; do
    curl -s "${S}&retmode=text&rettype=fasta"
  done
done > _out3119.txt


We decided to run this command separately for each gene ID, because each NCBI gene ID returns multiple protein sequences and a separate run keeps each gene's sequences in their own output file.
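The per-ID runs could be wrapped in a small script along the following lines. This is only a sketch: the NCBI E-utilities endpoints (elink.fcgi and efetch.fcgi) and their query parameters are an assumption about which URLs the command was pointed at, and the function names are made up for illustration. The grep/cut pipeline and the _out<ID>.txt file naming are taken from the command above.

```shell
#!/usr/bin/env bash
# Sketch: one output file per gene ID.
# ASSUMPTION: the NCBI E-utilities base URL and query strings below are
# a guess at the endpoints; they are not taken from the original command.
EUTILS="https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Read link XML on stdin and print the linked IDs, one per line.
# Same grep/cut pipeline as the original one-liner.
extract_linked_ids() {
  grep -A 1 "<Link>" | grep "<Id>" | cut -d '>' -f 2 | cut -d '<' -f 1
}

# Name the output file after the gene ID, e.g. _out3119.txt.
out_file_for() {
  echo "_out${1}.txt"
}

# Print every protein sequence linked to one gene ID as FASTA text.
fetch_gene() {
  local gene_id="$1"
  curl -s "${EUTILS}/elink.fcgi?dbfrom=gene&db=protein&id=${gene_id}" |
    extract_linked_ids |
    while read -r protein_id; do
      curl -s "${EUTILS}/efetch.fcgi?db=protein&id=${protein_id}&retmode=text&rettype=fasta"
    done
}

# Run each gene ID separately so its sequences land in their own file.
fetch_all() {
  local gene_id
  for gene_id in "$@"; do
    fetch_gene "$gene_id" > "$(out_file_for "$gene_id")"
  done
}
```

After sourcing the script, calling fetch_all 3119 would write _out3119.txt; further gene IDs can be passed as extra arguments, and each one gets its own file.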

