Thursday, May 18, 2017

Day 4, Jackson Lab

=> Krish Karuturi
Big data genomics: computational and informatics challenges

https://www.jax.org/research-and-faculty/tools/scientific-research-services/computational-sciences/staff/krishna-karuturi

https://github.com/TheJacksonLaboratory/civet
TORQUE resource manager

https://en.wikipedia.org/wiki/Comparison_of_cluster_software

benchmarking pipelines

https://en.wikipedia.org/wiki/SNV_calling_from_NGS_data#List_of_available_software

https://www.nature.com/articles/srep43169

GSEA
GSA (gene set analysis), Efron & Tibshirani

XENOME

etherpad
https://public.etherpad-mozilla.org/p/2017-05-18-bigData-grad-prof


===============================
Peter Robinson, Ph.D., The Jackson Laboratory for Genomic Medicine
Phenotype driven genome analysis

https://scholar.google.com/citations?user=TPOD_XUAAAAJ&hl=en

Ontology: unambiguous terms.

Human Phenotype Ontology (HPO)

Information content (IC) of a concept.

Semantic similarity scores between diseases
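The IC and similarity ideas above can be written down as follows. This is a sketch that assumes the Resnik-style measure used by Human Phenotype Ontology tools such as the Phenomizer; the notes do not record which variant the talk presented. D is the set of annotated diseases, anc(t) is the set of ancestors of term t (including t itself), T(d) is the set of terms annotated to disease d, and Q is a query set of patient phenotype terms.

$$ p(t) = \frac{\lvert\{\, d \in D : d \text{ is annotated to } t \text{ or a descendant of } t \,\}\rvert}{\lvert D \rvert}, \qquad \mathrm{IC}(t) = -\ln p(t) $$

$$ \mathrm{sim}(t_1, t_2) = \max_{a \in \mathrm{anc}(t_1) \cap \mathrm{anc}(t_2)} \mathrm{IC}(a) $$

$$ \mathrm{sim}(Q \rightarrow d) = \frac{1}{\lvert Q \rvert} \sum_{q \in Q} \max_{t \in T(d)} \mathrm{sim}(q, t) $$

For example, a term annotated to 10 of 1,000 diseases has IC = -ln(0.01) ≈ 4.61. The root term covers every disease and has IC = 0. Rare, specific phenotypes therefore contribute most to a match.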

PhenoBLAST
Washington NL et al., 2009, PLoS Biology

======================================
Y Ada Zhan, ChIP-seq

https://en.wikipedia.org/wiki/ChIP-sequencing

encodeproject.org
https://academic.oup.com/bib/article/17/6/953/2453197/A-comprehensive-comparison-of-tools-for

bd2kuser@ip-172-31-73-47:~/ChIPseq$ cat readme.txt
###################
# ChIP-seq module #
###################

# ChIP-seq data
In the directory ChIPseq/
 GM12878_control_chr1.fastq
 GM12878_CTCF_chr1.fastq

# Genome
In the directory ChIPseq/hg38/
 GRCh38.chr1.fa
 GRCh38.chr1.size

# Tools
 fastqc (quality check)
 bowtie (sequence mapping or alignments)
 samtools (manipulating alignments in SAM format. BAM format is a compressed version of SAM file)
 macs2 (peak calling)
 bedtools (to handle sequence coordinate files in BED format)
bd2kuser@ip-172-31-73-47:~/ChIPseq$




bd2kuser@ip-172-31-73-47:~/ChIPseq$ cat workflow.sh
# quality check
fastqc GM12878_control_chr1.fastq
fastqc GM12878_CTCF_chr1.fastq

# Prepare genome
bowtie-build hg38/GRCh38.chr1.fa hg38/GRCh38.chr1

# Mapping
bowtie -m 1 -S ./hg38/GRCh38.chr1 GM12878_control_chr1.fastq > GM12878_control_chr1.sam
bowtie -m 1 -S ./hg38/GRCh38.chr1 GM12878_CTCF_chr1.fastq > GM12878_CTCF_chr1.sam

# Further processing
## compress to BAM
samtools view -bSo GM12878_control_chr1.bam GM12878_control_chr1.sam
samtools view -bSo GM12878_CTCF_chr1.bam GM12878_CTCF_chr1.sam
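## sort (newer samtools writes to the file given by -o; old 0.1.x releases took an output prefix instead)
samtools sort -o GM12878_control_chr1.sorted.bam GM12878_control_chr1.bam
samtools sort -o GM12878_CTCF_chr1.sorted.bam GM12878_CTCF_chr1.bam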
## index
samtools index GM12878_control_chr1.sorted.bam
samtools index GM12878_CTCF_chr1.sorted.bam

# Peak calling
macs2 callpeak -t GM12878_CTCF_chr1.sorted.bam -c GM12878_control_chr1.sorted.bam -f BAM -g 175000000 -n GM12878_CTCF_chr1 -B -q 0.01

# Check the peak model
Rscript GM12878_CTCF_chr1_model.r

# Motif analysis
## extend summits 100bp on both directions
bedtools slop -i GM12878_CTCF_chr1_summits.bed -g hg38/GRCh38.chr1.size -b 100 > GM12878_CTCF_chr1_summits_ext.bed
## get sequence file (i.e. fasta)
bedtools getfasta -fi hg38/GRCh38.chr1.fa -bed  GM12878_CTCF_chr1_summits_ext.bed -fo GM12878_CTCF_chr1_summits_ext.fa
## The .fa file will be uploaded to MEME online server for motif discovery (http://meme-suite.org/tools/meme)
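The `bedtools slop -b 100` step can be illustrated with plain awk. Each interval is widened by 100 bp on both sides, and the result is clipped at position 0 and at the chromosome length listed in the size file. This sketch assumes a two-column size file (chromosome name, length), like hg38/GRCh38.chr1.size, and a BED file whose first three columns are chrom, start and end. The function name `slop_sketch` is hypothetical. Use it to see what slop does, not as a replacement for bedtools.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mimic `bedtools slop -b N` on a BED file.
# Usage: slop_sketch GENOME_SIZE_FILE BED_FILE N
slop_sketch() {
  local sizes="$1" bed="$2" n="$3"
  awk -v OFS='\t' -v n="$n" '
    # First file: remember each chromosome length.
    NR == FNR { len[$1] = $2; next }
    {
      start = $2 - n               # extend upstream
      end   = $3 + n               # extend downstream
      if (start < 0) start = 0     # clip at the chromosome start
      # clip at the chromosome end (only if the chromosome is in the size file)
      if (($1 in len) && end > len[$1]) end = len[$1]
      $2 = start; $3 = end         # remaining columns are kept unchanged
      print
    }' "$sizes" "$bed"
}
```

A 1-bp summit at 500 to 501 becomes 400 to 601, which is 201 bp. Summits near either chromosome end come out shorter because of the clipping.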


BED file format
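BED coordinates are 0-based, and the end is exclusive. An interval's length is therefore end minus start, and the equivalent 1-based inclusive region, as used by samtools region strings, runs from start+1 to end. The sketch below prints both values for each line and skips track, browser and comment lines. The helper name `bed_regions` is hypothetical. A MACS2 summit line such as `chr1 999 1000 ...` should come out as `chr1:1000-1000` with length 1.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a 1-based region string and the length of each BED interval.
# BED start is 0-based inclusive and BED end is exclusive, so:
#   length = end - start, 1-based inclusive region = (start+1)-end
bed_regions() {
  awk -v OFS='\t' '
    /^(track|browser|#)/ { next }          # skip header/comment lines
    { print $1 ":" ($2 + 1) "-" $3, $3 - $2 }' "$1"
}
```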


MEME motif discovery
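It helps to check the FASTA file before uploading it to MEME. The summits were extended with slop -b 100 (a 1-bp summit plus 100 bp on each side), so most sequences should be 201 bp. Shorter ones come from clipping at a chromosome end. The small awk summary below reports the sequence count and the minimum and maximum lengths. The function name `fasta_summary` is hypothetical, and the sketch assumes plain FASTA without Windows line endings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count FASTA records and report min/max sequence length.
fasta_summary() {
  awk '
    # Record the length of the sequence collected so far.
    function record(  l) {
      l = length(seq)
      n++
      if (n == 1 || l < min) min = l
      if (l > max) max = l
    }
    /^>/ { if (have) record(); have = 1; seq = ""; next }  # new header
    { seq = seq $0 }                                       # join wrapped sequence lines
    END {
      if (have) record()
      printf "n=%d min=%d max=%d\n", n, min, max
    }' "$1"
}
```

Usage: `fasta_summary GM12878_CTCF_chr1_summits_ext.fa`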

ChIPseek: web server for interactive ChIP-seq data analysis.



