Thursday, August 22, 2013

hSNP project, 0822 notes

Current goal: 
 Parse the dbSNP XML data files into CSV files, map SNPs to genes, and then perform network permutation analysis. Does the network clustering of disease-associated genes differ between races?

 dbSNP XML -> parse coordinates -> genes -> map to gene network -> association patterns?
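
The first step of this pipeline (XML -> CSV) might look roughly like the sketch below. It uses only the Python standard library and streams the file so that large dumps do not have to fit in memory. The element and attribute names (SnpInfo, rsId, SnpLoc, chrom, start) are assumptions for illustration and need to be checked against the actual dbSNP schema; snp_xml_to_csv and local_name are hypothetical helper names.

```python
import csv
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop an XML namespace prefix such as '{http://...}SnpInfo'."""
    return tag.rsplit("}", 1)[-1]


def snp_xml_to_csv(xml_source, csv_out):
    """Stream SNP records from xml_source; write one CSV row per location.

    Assumed (unverified) layout: <SnpInfo rsId="..."> elements containing
    <SnpLoc chrom="..." start="..."/> children.
    Returns the number of data rows written.
    """
    writer = csv.writer(csv_out)
    writer.writerow(["rs_id", "chrom", "start"])
    n_rows = 0
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if local_name(elem.tag) != "SnpInfo":
            continue
        for child in elem:
            if local_name(child.tag) == "SnpLoc":
                writer.writerow(
                    [elem.get("rsId"), child.get("chrom"), child.get("start")]
                )
                n_rows += 1
        # Release this record's children and attributes once it is written.
        elem.clear()
    return n_rows
```

For a real file, xml_source would be a path or a binary file object, and csv_out a file opened in text mode with newline="".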



Tasks
0) Figure out how to collaborate on the project using GitHub.

1a) Download dbSNP data from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/genotype/

1b) Download the human reference genome. http://www.ncbi.nlm.nih.gov/genome/guide/human/
(There are probably better sources for this. Hong needs to check which version is the right one for dbSNP.)

1c) Register for and download the OMIM database: http://omim.org/downloads. OMIM is another database of human disease information. OMIM data cannot be shared with third parties.

2a) Parse the dbSNP XML files into CSV or tab-delimited format. This can probably be done using Python and Biopython.

2b) Map SNP coordinates and disease associations to genes using the human reference genome.

2c) Cross-validate the dbSNP disease associations against OMIM.

3) Network permutation test of disease clustering by race. Are there racial differences from a network perspective?
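
For task 2b, one possible way to look up which genes contain a SNP position is a sorted interval index per chromosome. This is only a sketch: it assumes gene records are available as (chrom, start, end, name) tuples with inclusive coordinates on the same genome assembly as the SNP positions, and build_gene_index and genes_at are hypothetical helper names.

```python
from bisect import bisect_right
from collections import defaultdict


def build_gene_index(genes):
    """Index (chrom, start, end, name) records by chromosome, sorted by start."""
    by_chrom = defaultdict(list)
    for chrom, start, end, name in genes:
        by_chrom[chrom].append((start, end, name))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts = [interval[0] for interval in intervals]
        index[chrom] = (starts, intervals)
    return index


def genes_at(index, chrom, pos):
    """Return names of all genes whose inclusive [start, end] contains pos."""
    if chrom not in index:
        return []
    starts, intervals = index[chrom]
    # Only genes that start at or before pos can contain it.
    upper = bisect_right(starts, pos)
    # Genes may overlap, so every candidate's end must still be checked.
    return [name for _start, end, name in intervals[:upper] if end >= pos]
```

The final filter is linear in the number of candidate genes, which should be fine for a first pass; an interval tree would be the natural replacement if this turns out to be too slow.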

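For task 3, a minimal sketch of one kind of network permutation test: count the edges among a set of disease genes and compare that count with random gene sets of the same size. The choice of statistic (edges within the set), the uniform random null model, and the names edges_within and permutation_p_value are all assumptions for illustration, not a settled design. The network is assumed to be an undirected adjacency dict (gene -> set of neighbours) with no self-loops.

```python
import random


def edges_within(adjacency, genes):
    """Count undirected edges whose two endpoints are both in genes."""
    gene_set = set(genes)
    seen = sum(
        1
        for gene in gene_set
        for neighbour in adjacency.get(gene, ())
        if neighbour in gene_set
    )
    return seen // 2  # each edge was counted once from each end


def permutation_p_value(adjacency, disease_genes, n_perm=1000, seed=0):
    """Return (observed edge count, one-sided permutation p-value)."""
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    size = len(set(disease_genes))
    observed = edges_within(adjacency, disease_genes)
    hits = sum(
        1
        for _ in range(n_perm)
        if edges_within(adjacency, rng.sample(nodes, size)) >= observed
    )
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

To compare groups, the same test could be run separately on the disease gene set derived from each group's SNPs, though a null model that preserves node degree may be more appropriate than uniform sampling.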


Hong needs to check:
dbGaP, Genetic Association Database

Which version of the human reference genome should we use for dbSNP? Or does it not matter?


