Use the following command to get protein sequences and gene names from NCBI for a given gene ID (here 3119):
echo -e "3119" | while read G; do curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=gene&db=protein&id=${G}" | grep -A 1 "<Link>" | grep "<Id>" | cut -d '>' -f 2 | cut -d '<' -f 1 | while read S ; do curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=${S}&retmode=text&rettype=fasta" ; done; done > _out3119.txt
See https://www.biostars.org/p/52652/
We decided to run this code separately for each ID, because each NCBI gene ID returns multiple protein sequences.
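A minimal sketch of that per-ID approach, assuming bash and curl are available. The function names (extract_link_ids, out_file_for, fetch_gene_proteins) are hypothetical and only wrap the same pipeline as above; the one-second pause between requests is an added assumption to stay well under NCBI's request limits, not part of the original command.

```shell
#!/usr/bin/env bash
# Sketch only: same elink/efetch pipeline as the one-liner, one output file per gene ID.

EUTILS="https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Hypothetical helper: read elink XML on stdin, print one linked protein ID per line.
extract_link_ids() {
  grep -A 1 "<Link>" | grep "<Id>" | cut -d '>' -f 2 | cut -d '<' -f 1
}

# Hypothetical helper: output file name for a gene ID, e.g. 3119 -> _out3119.txt
out_file_for() {
  printf '_out%s.txt\n' "$1"
}

# Hypothetical helper: fetch every protein linked to one gene ID as FASTA.
fetch_gene_proteins() {
  local gene="$1"
  curl -s "${EUTILS}/elink.fcgi?dbfrom=gene&db=protein&id=${gene}" \
    | extract_link_ids \
    | while read -r S; do
        curl -s "${EUTILS}/efetch.fcgi?db=protein&id=${S}&retmode=text&rettype=fasta"
        sleep 1   # assumption: pause between requests to be polite to NCBI
      done > "$(out_file_for "$gene")"
}

# Run once per gene ID given on the command line; does nothing without arguments.
for G in "$@"; do
  fetch_gene_proteins "$G"
done
```

Saved as, say, fetch_proteins.sh (a placeholder name), it could be run as `bash fetch_proteins.sh 3119 3120`, which would write _out3119.txt and _out3120.txt so the sequences for each gene stay in separate files.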