Showing posts with label SNPs. Show all posts

Friday, January 8, 2016

COSMIC genotype


COSMIC release v75, working directory ~/genotype/

gunzip 1240121_complexGenotypes.csv.gz

wc -l 1240121_complexGenotypes.csv

  884149 1240121_complexGenotypes.csv
The file has 884,149 lines, i.e. about 884K rows of SNP calls (wc -l also counts a header line, if there is one).
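Python's standard gzip module can also stream the compressed file, so the count can be taken without unpacking it first. A minimal sketch; count_lines_gz is a hypothetical helper, not part of COSMIC's tooling:

```python
import gzip


def count_lines_gz(path: str) -> int:
    """Count lines in a gzip-compressed text file (like `gunzip -c path | wc -l`)."""
    n = 0
    # Iterate line by line in binary mode so the whole file is never held in memory.
    with gzip.open(path, "rb") as fh:
        for _ in fh:
            n += 1
    return n
```

For example, count_lines_gz("1240121_complexGenotypes.csv.gz") should agree with the wc -l count above as long as the file ends with a newline (wc -l counts newline characters, while this loop also counts a final unterminated line).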


Genotypes
---------

Files listing the SNP calls for each cell line, as identified by PICNIC analysis of
Affymetrix SNP6.0 array data. Both a simple genotype (AA or BB for homozygous, AB
for heterozygous) and a complex interpretation of the genotype are given (for
example, in a triploid region of the genome the genotype may be AAB).
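The complex genotype carries copy number while the simple one only records which alleles are present, so the simple call can be recovered from the complex one by ignoring copy number. A minimal sketch under that assumption; simple_from_complex is a hypothetical helper, and treating a single-copy call such as "A" as "AA" is also an assumption, not a documented COSMIC rule:

```python
def simple_from_complex(complex_gt: str) -> str:
    """Collapse a copy-number-aware genotype (e.g. 'AAB') to a simple one ('AB')."""
    alleles = set(complex_gt.upper())
    # Only the A and B alleles are expected; anything else is rejected.
    if not alleles or not alleles <= {"A", "B"}:
        raise ValueError(f"unexpected genotype: {complex_gt!r}")
    if alleles == {"A", "B"}:
        return "AB"  # heterozygous: both alleles present, at any copy number
    allele = alleles.pop()
    return allele * 2  # homozygous (assumed to include single-copy calls): 'AA' or 'BB'
```

For example, the triploid call AAB collapses to AB, and AAA collapses to AA.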

Download from the genotypes directory.

File Description

Chr - Chromosome GRCh38/hg38
pos - Genome Position GRCh38/hg38
ncopies.A - Number of copies of allele A
ncopies.B - Number of copies of allele B
Probe.Set.ID - SNP6.0 probe ID
dbSNP.RS.ID - dbSNP reference ID
Allele.A - genotype 'A' nucleotide
Allele.B - genotype 'B' nucleotide
chr_b36 - Chromosome NCBI36/hg18
pos_b36 - Genome Position NCBI36/hg18
chr_b37 - Chromosome GRCh37/hg19
pos_b37 - Genome Position GRCh37/hg19
complexGenotype - a complex interpretation of the genotype, e.g. in a triploid region the genotype may be AAB
simpleGenotype - a simple genotype, e.g. AA or BB (homozygous) or AB (heterozygous)
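A minimal sketch for reading the file with Python's csv module, assuming the header uses exactly the column names listed above and commas as separators (both unverified). It picks the coordinate columns for one assembly and tallies the simple genotypes; open_text, iter_calls and genotype_summary are hypothetical helper names:

```python
import csv
import gzip
from collections import Counter
from typing import Iterator, Optional, TextIO, Tuple

# Column names copied from the file description; assumed (not verified)
# to match the CSV header exactly.
COORD_COLUMNS = {
    "GRCh38": ("Chr", "pos"),
    "GRCh37": ("chr_b37", "pos_b37"),
    "NCBI36": ("chr_b36", "pos_b36"),
}

# (chrom, pos, rsid, simpleGenotype, complexGenotype)
Call = Tuple[str, Optional[int], str, str, str]


def open_text(path: str) -> TextIO:
    """Open a plain or gzip-compressed CSV file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", newline="")
    return open(path, newline="")


def _to_int(value: Optional[str]) -> Optional[int]:
    """Parse a position; return None for blank, missing or non-numeric entries."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def iter_calls(handle: TextIO, build: str = "GRCh38") -> Iterator[Call]:
    """Yield one Call per SNP row, using the coordinates of the chosen assembly."""
    chr_col, pos_col = COORD_COLUMNS[build]
    for row in csv.DictReader(handle):
        yield (row[chr_col], _to_int(row[pos_col]), row["dbSNP.RS.ID"],
               row["simpleGenotype"], row["complexGenotype"])


def genotype_summary(handle: TextIO) -> Counter:
    """Count SNPs per simple genotype class (e.g. AA, AB, BB)."""
    return Counter(call[3] for call in iter_calls(handle))
```

For example, opening "1240121_complexGenotypes.csv.gz" with open_text in a with block and passing the handle to genotype_summary gives the per-genotype counts for that file. Positions that are blank in the chosen build, if any, come back as None rather than raising.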

Wednesday, April 30, 2014

Li Ma's defense: haplotyping, Qing Song lab, Morehouse School of Medicine


Fu and Ma in prep


Teri Manolio

Schizophrenia, Lee 2012, GWAS
http://www.nature.com/ng/journal/v44/n3/abs/ng.1108.html

Smemo et al., Nature 2014: long-range functional connections between obesity-associated FTO variants and IRX3
http://www.nature.com/nature/journal/v507/n7492/full/nature13138.html


haplotyping methods
GMP 2001, Nat Genet
Qiagen 2005
Polony 2006, Nat Genet
Barcode 2009, Nat Methods
Fosmid 2011, Nat Biotech
Hi-C 2013, Nat Biotech
Illumina 2014, Nat Biotech


Laser microdissection of chromosomes: take about half of the 46 chromosomes; by chance, some chromosomes will be captured as a single copy, and these can be used for haplotyping.
23 chromosomes (one haploid set) are about 3.5 pg of DNA, amplified to 5-8 µg for high-throughput sequencing (roughly a 1.4- to 2.3-million-fold amplification).
Use heterozygosity to distinguish diploid from haploid chromosomes.
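The heterozygosity idea can be sketched as follows: a chromosome captured as a single copy carries only one allele at every SNP, so its calls should be (almost) all homozygous, while a sample containing both homologs shows heterozygous calls wherever the two copies differ. This is a simplification, not the lab's actual pipeline; the helper names and the HAPLOID_MAX_HET cutoff are hypothetical:

```python
from collections import defaultdict

# Hypothetical cutoff: the largest fraction of heterozygous calls still treated
# as a single (haploid) copy. Genotyping errors keep the observed value above
# zero even for one copy, so this must be tuned on real data.
HAPLOID_MAX_HET = 0.02


def het_fraction_by_chrom(calls):
    """calls: iterable of (chrom, genotype) pairs, genotype like 'AA', 'AB', 'BB'."""
    het = defaultdict(int)
    total = defaultdict(int)
    for chrom, gt in calls:
        total[chrom] += 1
        if len(set(gt)) > 1:  # two different alleles -> heterozygous call
            het[chrom] += 1
    return {c: het[c] / total[c] for c in total}


def classify_ploidy(calls, cutoff=HAPLOID_MAX_HET):
    """Label each chromosome 'haploid' or 'diploid' from its heterozygosity."""
    return {c: ("haploid" if frac <= cutoff else "diploid")
            for c, frac in het_fraction_by_chrom(calls).items()}
```

One caveat: a diploid chromosome with long runs of homozygosity can also look haploid over that stretch, so a per-chromosome summary like this is only a first screen.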

HiFi software
http://www.cs.gsu.edu/?q=node/536

Quake Dataset
http://www.cbcb.umd.edu/software/quake/


Friday, October 11, 2013

notes on dbSNP, in progress



A single SNP can map to multiple genome locations. One reason is alignment ambiguity, e.g. when the SNP's flanking sequence lies in a repeated or duplicated region.
http://www.biostars.org/p/2323/
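To see how often this happens in a given table, group rows by rsID and look for IDs that occur at more than one (chromosome, position) pair. A minimal sketch; the input could be, for instance, the dbSNP.RS.ID, Chr and pos columns of a genotype file, and multi_mapped_rsids is a hypothetical helper:

```python
from collections import defaultdict


def multi_mapped_rsids(records):
    """records: iterable of (rsid, chrom, pos); return rsIDs seen at >1 location."""
    locations = defaultdict(set)
    for rsid, chrom, pos in records:
        if not rsid:  # skip rows without an rsID
            continue
        locations[rsid].add((chrom, pos))
    # Keep only IDs with at least two distinct locations, sorted for stable output.
    return {rsid: sorted(locs) for rsid, locs in locations.items() if len(locs) > 1}
```

Exact duplicate rows for the same rsID collapse into one location, so only genuinely different placements are reported.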


https://cgsmd.isi.edu/dbsnpq/downloads.php


Wednesday, August 21, 2013

Biopython and SNP


References:
http://comments.gmane.org/gmane.comp.python.bio.devel/8928

https://github.com/ngopal/23andMe


http://biopython.org/pipermail/biopython/2010-April/006416.html
2010/4/13 Tiago Antão <tiagoantao at gmail.com>:
> Hi,
>
> Just a simple question:
> Entrez SNP seems to return ASN.1 format only.
> Is there any way to parse this in biopython? I've looked at SeqIO and
> found nothing...
> I can think of tools to process this outside, but I am just curious if
> this is processed natively with Biopython (being an exposed NCBI
> format...)
>
> Many thanks,
> Tiago
> PS - You can easily try this with:
> hdl = Entrez.efetch(db="snp", id="3739022")
> print hdl.read()

Hi Tiago,

No, we don't support ASN.1, and I don't see any good reason to - I
think it would only be NCBI ASN.1 we'd be interested in, and I think
that all their resources are available in other easier to use formats
like XML these days.

See also http://en.wikipedia.org/wiki/Abstract_Syntax_Notation_One

Instead ask Entrez to give you the SNP data as XML:

Entrez.efetch(db="snp", id="3739022", retmode="xml")

Hopefully the SNP XML file has everything in it.

You have a choice of Python XML parsers to use. However, the
Bio.Entrez parser doesn't like this XML. This appears to be related to
(or caused by) a known NCBI bug. See
http://bugzilla.open-bio.org/show_bug.cgi?id=2771

Peter
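A present-day Python 3 version of the exchange above might look like the sketch below. It assumes Biopython is installed and NCBI is reachable; because the dbSNP XML format has changed since 2010, it does not hard-code element names but offers a namespace-agnostic lookup with the standard library's ElementTree, sidestepping the Bio.Entrez parser problem Peter mentions. fetch_snp_xml and texts_by_local_name are hypothetical helper names:

```python
import xml.etree.ElementTree as ET


def fetch_snp_xml(rsid_number: str, email: str) -> bytes:
    """Fetch a dbSNP record as XML via NCBI E-utilities (needs network and Biopython)."""
    from Bio import Entrez  # imported here so the parser below works without Biopython

    Entrez.email = email  # NCBI asks E-utilities users to identify themselves
    handle = Entrez.efetch(db="snp", id=rsid_number, retmode="xml")
    try:
        data = handle.read()
    finally:
        handle.close()
    # Depending on the Biopython version, the handle may return str or bytes.
    return data.encode("utf-8") if isinstance(data, str) else data


def texts_by_local_name(xml_data, local_name: str) -> list:
    """Return the stripped text of every element whose tag, ignoring any XML
    namespace, equals local_name."""
    root = ET.fromstring(xml_data)
    out = []
    for elem in root.iter():
        tag = elem.tag.rsplit("}", 1)[-1]  # drop a '{namespace}' prefix if present
        if tag == local_name and elem.text is not None:
            out.append(elem.text.strip())
    return out
```

For example, fetch_snp_xml("3739022", "you@example.org") returns the raw XML for rs3739022; inspect it first to decide which element name to pass to texts_by_local_name.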