Showing posts with label dbSNP. Show all posts

Friday, January 8, 2016

COSMIC genotype


v75
~/genotype/

gunzip 1240121_complexGenotypes.csv.gz

Byte-4:genotypes hqin$ wc -l 1240121_complexGenotypes.csv

  884149 1240121_complexGenotypes.csv
There are 884K rows of SNPs in this file. 


Genotypes
---------

Files listing the SNP calls for each cell line identified by PICNIC analysis of
Affymetrix SNP6.0 array data. Both a simple genotype (AA, BB – homozygous or AB
– heterozygous) and a complex interpretation of the genotype are given (for
example, in a triploid region of the genome the genotype may be AAB).

Download from genotypes directory.

File Description

Chr - Chromosome GRCh38/hg38

pos - Genome Position GRCh38/hg38
ncopies.A - Number of copies of allele A
ncopies.B - Number of copies of allele B
Probe.Set.ID - SNP6.0 probe ID
dbSNP.RS.ID - dbSNP reference ID
Allele.A - genotype 'A' nucleotide
Allele.B - genotype 'B' nucleotide
chr_b36 - Chromosome NCBI36/hg18
pos_b36 - Genome Position NCBI36/hg18
chr_b37 - Chromosome GRCh37/hg19
pos_b37 - Genome Position GRCh37/hg19
complexGenotype - a complex interpretation of the genotype, e.g. in a triploid
region the genotype may be AAB
simpleGenotype - a simple genotype, e.g. AA, BB – homozygous or AB –
heterozygous
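
As a quick check on the file described above, the simple genotype calls can be tallied with Python. This is only a sketch: it assumes the file is comma-separated with a header row that uses the column names listed above, and the rows in SAMPLE (including the rs IDs) are made up for illustration.

```python
import csv
import io
from collections import Counter


def tally_simple_genotypes(handle):
    """Count how often each simpleGenotype value (AA, AB, BB, ...) occurs."""
    reader = csv.DictReader(handle)  # assumes the first row is a header
    counts = Counter()
    for row in reader:
        counts[row["simpleGenotype"]] += 1
    return counts


# Made-up rows using a subset of the documented column names.
SAMPLE = """Chr,pos,ncopies.A,ncopies.B,dbSNP.RS.ID,complexGenotype,simpleGenotype
1,1000,2,0,rs0000001,AA,AA
1,2000,2,1,rs0000002,AAB,AB
2,3000,0,2,rs0000003,BB,BB
2,4000,1,1,rs0000004,AB,AB
"""

counts = tally_simple_genotypes(io.StringIO(SAMPLE))
print(counts)
```

For the real file, the same function should work on an open file handle, e.g. open("1240121_complexGenotypes.csv", newline=""), provided the header assumption holds.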

Thursday, May 29, 2014

network hSNP, CNV note

CNV genes show no SNP with frequency from Bio-Q and Ensembl.

http://bioq.saclab.net/query/submit.php?db=bioq_dbsnp_human_138


Tuesday, May 20, 2014

dbSNP ftp site


ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/database/organism_data/

ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/database/organism_data/OmimVarLocusIdSNP.bcp.gz

Schema
ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/database/organism_schema/

Friday, February 1, 2013

How to obtain SNP genotype data? (to be continued)

How to obtain SNP genotype data? (A work in progress)

From dbSNP
I want to use either R or Python to analyze the dbSNP data for different populations.  We are especially interested in the clinically associated human variants. Ensembl imports dbSNP data, and it shows clinically associated SNPs and their frequencies in different populations. An example is rs1333049.

The FAQ for dbSNP offers some tips on downloading flat files. Some are related to my purpose here.
Q: I would like to use a script to fetch average allele frequency data for each human SNP from every web page, but I’m afraid that my IP will be blocked by the server for the heavy usage.
A: In general, large amounts of data can be obtained using our ftp, efetch or batch query services. Specifically, if you only need SNP allele frequency data, then use SNPAlleleFreq.bcp.gz, which is located on our ftp site. The Allele.bcp.gz file, also available on the ftp site, has Allele_id to allele_string mapping.
 At the SNP Eutility website, some parameters are listed for batch fetching of various types of data, including Genotype XML.
 eFetch params for EntrezSNP:
# (id=NNNNNN[,NNNN,etc]) or (query_key=NNN, where NNN - number in the history, 0 - clipboard content for current database)
# db=snp (mandatory)
# report= (listed below)
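
A rough sketch of how such a batch request could be assembled in Python. It only builds the URL and does not contact NCBI. The base URL is my assumption of the usual E-utilities efetch endpoint, the helper name is made up, and the report value is left unset because the valid report types are not listed here.

```python
from urllib.parse import urlencode

# Assumption: the standard NCBI E-utilities efetch endpoint.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_snp_efetch_url(rs_numbers, report=None):
    """Build an efetch URL for a batch of SNP ids (hypothetical helper)."""
    ids = ",".join(str(n) for n in rs_numbers)  # id=NNNNNN[,NNNN,etc]
    params = {"db": "snp", "id": ids}           # db=snp is mandatory
    if report is not None:
        params["report"] = report               # report type, if known
    # safe="," keeps the comma-separated id list readable in the URL
    return EFETCH_BASE + "?" + urlencode(params, safe=",")


# The two rs numbers are the ones mentioned in this post.
url = build_snp_efetch_url([1333049, 35568883])
print(url)
```

Requesting many ids in one call, rather than one web page per SNP, is the point of the FAQ answer above.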

A BioStar discussion on obtaining SNP information covered UCSC, Ensembl, and dbSNP.  Apparently, SNP XML data need different Python parsers than other Entrez XML data. Some Python parsers for SNP XML were discussed on BioStar.

From Ensembl,
A Perl example can be found at Biostar.

From Bioconductor,
Bioconductor provides some dbSNP builds, as recent as build 137.

From UCSC MySQL
At the UCSC website, a discussion on SNPs suggests downloading the XML files from dbSNP. These files are what UCSC used to integrate dbSNP into their annotations. One user commented that dbSNP merges frequencies reported by different labs, which can lead to biases.
The help page is:  http://genome.ucsc.edu/goldenPath/help/mysql.html
However, "Bot access and excessive program-driven use are not permitted" by UCSC.

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg18 -e 'select name,chrom,chromStart,chromEnd,observed from snp130 where name="rs35568883"'
+------------+-------+------------+----------+----------+
| name       | chrom | chromStart | chromEnd | observed |
+------------+-------+------------+----------+----------+
| rs35568883 | chr21 |   38782125 | 38782126 | A/G      |
+------------+-------+------------+----------+----------+
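
To reuse this result in a script, the boxed table printed by the mysql client can be parsed in Python. This is a small sketch rather than a robust parser, and the function name is made up. It also converts chromStart to a 1-based position, since UCSC tables store 0-based start coordinates.

```python
def parse_mysql_table(text):
    """Turn the boxed table printed by the mysql client into a list of dicts."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +-----+ border lines and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first data line holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


# The query output shown above, pasted in as text.
OUTPUT = """
+------------+-------+------------+----------+----------+
| name       | chrom | chromStart | chromEnd | observed |
+------------+-------+------------+----------+----------+
| rs35568883 | chr21 |   38782125 | 38782126 | A/G      |
+------------+-------+------------+----------+----------+
"""

snp = parse_mysql_table(OUTPUT)[0]
one_based_pos = int(snp["chromStart"]) + 1  # 0-based start -> 1-based position
print(snp["name"], snp["chrom"], one_based_pos, snp["observed"])
```

For this single-base SNP the 1-based position equals chromEnd.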
 
 
 
From HapMap
 
 
http://hapmap.ncbi.nlm.nih.gov/



From 1000 Genomes

  FTP site: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data 
 
http://www.1000genomes.org/data
 
  Links and References:
  • https://cgsmd.isi.edu/dbsnpq/
  • XML file for genotype data: ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/genotype/
  • SNP Eutility, http://www.ncbi.nlm.nih.gov/SNP/SNPeutils.htm 
  •  http://www.1000genomes.org/data
  • http://hapmap.ncbi.nlm.nih.gov/ 
     
    human disease database 
    http://www.genecards.org/cgi-bin/listdiseasecards.pl?type=full