Wednesday, September 23, 2015

MBryant 20150923Wed


Use the following code to get the protein sequences and gene names for an NCBI gene ID (here 3119):

echo -e "3119" | while read G; do curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=gene&db=protein&id=${G}" | grep -A 1 "<Link>" | grep "<Id>" | cut -d '>' -f 2 | cut -d '<' -f 1 | while read S ; do curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=${S}&retmode=text&rettype=fasta" ; done; done > _out3119.txt

See https://www.biostars.org/p/52652/

We decided to run this code separately for each gene ID, because each NCBI gene ID returns multiple protein sequences and a separate run keeps the sequences for each gene in their own output file.
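
The per-ID runs described above could be wrapped in two small shell functions, as in the rough sketch below. This is not the exact code we ran: the function names extract_protein_ids and fetch_gene_proteins and the EUTILS variable are made up for illustration, the one-second pause is an assumption meant to stay under NCBI's limit of about three requests per second without an API key, and the https URLs assume the current E-utilities address.

```shell
#!/usr/bin/env bash
# Sketch only: one output file per NCBI gene ID, using the same
# elink/efetch calls as the one-liner above.
# Function and variable names here are hypothetical.

EUTILS="https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Read elink XML on stdin and print the linked protein IDs, one per line.
# Same grep/cut chain as the one-liner: keep the <Id> line that follows
# each <Link> line, then strip the surrounding tags.
extract_protein_ids() {
    grep -A 1 "<Link>" | grep "<Id>" | cut -d '>' -f 2 | cut -d '<' -f 1
}

# Fetch every protein FASTA record linked to one gene ID and
# write them all to _out<ID>.txt.
fetch_gene_proteins() {
    local gene_id="$1"
    curl -s "${EUTILS}/elink.fcgi?dbfrom=gene&db=protein&id=${gene_id}" \
        | extract_protein_ids \
        | while read -r protein_id; do
            curl -s "${EUTILS}/efetch.fcgi?db=protein&id=${protein_id}&retmode=text&rettype=fasta"
            sleep 1   # assumed pause to stay under NCBI's request rate limit
        done > "_out${gene_id}.txt"
}

# Example call (needs network access), one gene ID at a time:
#   fetch_gene_proteins 3119
```

Calling fetch_gene_proteins once per gene ID gives the same result as editing the ID in the one-liner by hand, with each gene's sequences kept in its own _out file.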



