Wednesday, November 6, 2013

Student's and my notes, dbSNP project


I think we are trying to retrieve, from the dbSNP XML files, those SNPs that have clinical significance, i.e. disease-associated SNPs. However, I could not find any attribute for "clinical significance", or any OMIM info, in the dbSNP XML files.

Here is what I did:
1. On http://www.ncbi.nlm.nih.gov/snp/limits, choose "22" under "Chromosomes" and "OMIM" under "Annotation", then search. This returns 266 results. I take RS_4633 as the example.
2. I found a program, "010 Editor" (http://www.sweetscape.com/010editor/), which can open the >3GB XML file for Chromosome 22. I extracted the entry for RS_4633, which is in the attachment.
I could not find anything related to OMIM or clinical significance in the extracted entry.
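
As an alternative to opening the >3GB file in an editor, the same extraction could be scripted. The following is only a rough sketch using Python's standard-library streaming parser. It assumes each SNP entry is an element named `Rs` carrying an `rsId` attribute and sitting directly under the root element; I have not checked these names against the schema, so both are parameters that can be changed. The function name `find_rs_entry` is my own.

```python
import xml.etree.ElementTree as ET


def find_rs_entry(source, rs_id, tag="Rs", attr="rsId"):
    """Stream through a large XML file and return one entry as a string.

    `source` is a file path or a binary file object. `tag` and `attr` are
    ASSUMED names for the per-SNP element and its ID attribute.
    Returns None if no matching entry is found.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event != "end":
            continue
        # Drop any "{namespace}" prefix so only the local tag name is compared.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name != tag:
            continue
        if elem.get(attr) == str(rs_id):
            return ET.tostring(elem, encoding="unicode")
        # Release entries that were already checked so memory use stays small.
        root.clear()
    return None
```

Calling `find_rs_entry("chr22.xml", 4633)` would return the matching entry as text, where `chr22.xml` is a placeholder for wherever the chromosome 22 file was saved. The returned text can then be searched for "OMIM" or "clinical" the same way I did by eye.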

On http://www.ncbi.nlm.nih.gov/books/NBK44379/#Search.Finding_Records_with_OMIM_DataLin, it says:

"
What dbSNP report format will provide both a SNP and its specific OMIM ID number?
The easiest way you can get all the SNPs with OMIM links is from Entrez SNP:
1. Go to the Entrez SNP site.
2. Click on the grey “Limits” tab near the top of the page (just beneath the text search boxes).
3. Select the organism you are interested in from the organism list located at the top left of the page.
4. Scroll down the page almost to the bottom, until you find the list of “Annotation” limits. Select “OMIM”
5. Press the “Go” button located at the top of the page next to the empty text search box, and you will receive a list of your organism’s SNPs with OMIM annotation.
You can also get those SNPs with an OMIM ID number by downloading from the dbSNP FTP site: the OmimVarLocusIdSNP table contains the information you need for your organism of interest (human, in this case). This table is located in your organism’s organism_data directory on the dbSNP FTP site."

Column definitions for this table are as follows:
Column  Description
1       omim_id
2       The locus id the SNP is on
3       omim variation id
4       locus symbol
5       Amino acid using the contig reference allele
6       Amino acid position in the protein
7       Amino acid of the snp variance
8       var class (used for internal dbSNP processing)
9       snp_id (rs#)

The last column (column 9) should be the SNP rs ID. The 4th column appears to be the gene symbol, which could be mapped directly onto the human protein network.
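
To make the mapping idea concrete, here is a minimal sketch of reading the table and grouping rs IDs by the 4th column. It assumes the .bcp file is tab-delimited with no header row and that the nine columns come in the order listed above; I have not verified either against the actual download. The field names in `COLUMNS` and the function names are my own labels, not names defined by dbSNP.

```python
import csv
import gzip
from collections import defaultdict

# My own labels for the nine columns, in the order given in the column list.
COLUMNS = [
    "omim_id", "locus_id", "omim_var_id", "locus_symbol",
    "aa_ref", "aa_position", "aa_variant", "var_class", "snp_id",
]


def read_omim_snps(lines):
    """Yield one dict per table row from an iterable of text lines.

    ASSUMES tab-delimited rows with no header; rows with fewer than nine
    fields are skipped rather than guessed at.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < len(COLUMNS):
            continue
        yield dict(zip(COLUMNS, (field.strip() for field in row)))


def snps_by_gene(records):
    """Group rs IDs by locus symbol (column 4)."""
    genes = defaultdict(set)
    for rec in records:
        genes[rec["locus_symbol"]].add("rs" + rec["snp_id"])
    return dict(genes)


def load_table(path):
    """Read a gzipped copy of the table and return {locus_symbol: {rs IDs}}."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return snps_by_gene(read_omim_snps(handle))
```

With the file downloaded, `load_table("OmimVarLocusIdSNP.bcp.gz")` would give a dictionary keyed by locus symbol, which is the form needed for looking up each gene in the protein network.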


URL:
http://www.ncbi.nlm.nih.gov/books/NBK44379/#Search.Finding_Records_with_OMIM_DataLin

ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/database/organism_data/OmimVarLocusIdSNP.bcp.gz

http://www.ncbi.nlm.nih.gov/projects/SNP/docs/rs_attributes.html#clinical
