=> Krish Karuturi
big data genomics, computational and informatics challenges
https://www.jax.org/research-and-faculty/tools/scientific-research-services/computational-sciences/staff/krishna-karuturi
https://github.com/TheJacksonLaboratory/civet
TORQUE resource manager
https://en.wikipedia.org/wiki/Comparison_of_cluster_software
benchmarking pipelines
https://en.wikipedia.org/wiki/SNV_calling_from_NGS_data#List_of_available_software
https://www.nature.com/articles/srep43169
GSEA
GSA (gene set analysis), Efron & Tibshirani
Xenome (classifying xenograft reads as host or graft)
etherpad
https://public.etherpad-mozilla.org/p/2017-05-18-bigData-grad-prof
===============================
Peter Robinson, Ph.D., The Jackson Laboratory for Genomic Medicine
Phenotype driven genome analysis
https://scholar.google.com/citations?user=TPOD_XUAAAAJ&hl=en
Ontologies: unambiguous, well-defined terms.
Human Phenotype Ontology (HPO)
Information content (IC) of a concept.
Semantic similarity scores between diseases
PhenoBLAST
Washington NL et al. 2009, PLoS Biology
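The IC and disease-similarity notes above can be made concrete with the formulation commonly used with the HPO (for example, Phenomizer-style Resnik similarity with best-match averaging). The exact scoring presented in the talk is not recorded in these notes, so treat this as an assumed, illustrative version. Here p(t) is the fraction of annotated diseases annotated with term t or any descendant of t, Anc(t) is the set of ancestors of t including t itself, Q is a patient's query terms and D is a disease's annotated terms.

```latex
\begin{aligned}
\mathrm{IC}(t) &= -\log p(t), \qquad
p(t) = \frac{\#\{\text{diseases annotated to } t \text{ or a descendant of } t\}}{\#\{\text{annotated diseases}\}} \\
\mathrm{sim}(t_1, t_2) &= \max_{a \,\in\, \mathrm{Anc}(t_1) \cap \mathrm{Anc}(t_2)} \mathrm{IC}(a) \\
\mathrm{sim}(Q \to D) &= \frac{1}{|Q|} \sum_{q \in Q} \max_{d \in D} \mathrm{sim}(q, d)
\end{aligned}
```

Rare, specific terms get a high IC, so a match on them contributes more to the disease score than a match on a generic term such as the ontology root (IC = 0).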
======================================
Y Ada Zhan, ChIP-seq
https://en.wikipedia.org/wiki/ChIP-sequencing
encodeproject.org
https://academic.oup.com/bib/article/17/6/953/2453197/A-comprehensive-comparison-of-tools-for
bd2kuser@ip-172-31-73-47:~/ChIPseq$ cat readme.txt
###################
# ChIP-seq module #
###################
# ChIP-seq data
In the directory ChIPseq/
GM12878_control_chr1.fastq
GM12878_CTCF_chr1.fastq
# Genome
In the directory ChIPseq/hg38/
GRCh38.chr1.fa
GRCh38.chr1.size
# Tools
fastqc (quality check)
bowtie (short-read mapping/alignment to the reference genome)
samtools (manipulating alignments in SAM format; BAM is the compressed binary version of SAM)
macs2 (peak calling)
bedtools (to handle sequence coordinate files in BED format)
bd2kuser@ip-172-31-73-47:~/ChIPseq$
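bedtools slop (used in the motif step of the workflow) reads chromosome lengths from a two-column, tab-separated genome file such as GRCh38.chr1.size. If that file has to be built from a FASTA, here is a minimal awk sketch; the function name fasta_to_sizes is hypothetical, not part of any of the listed tools.

```shell
# Hypothetical helper: print "name<TAB>length" for every sequence in a FASTA,
# the genome-file format that bedtools slop -g expects.
fasta_to_sizes() {
  awk '
    /^>/ {                                  # header line starts a new sequence
      if (name != "") print name "\t" len   # emit the previous sequence
      name = substr($1, 2)                  # first word of the header, minus the leading >
      len = 0
      next
    }
    { gsub(/[ \t\r]/, ""); len += length($0) }   # count bases, ignoring stray whitespace
    END { if (name != "") print name "\t" len }  # emit the last sequence
  ' "$1"
}
```

Usage: `fasta_to_sizes hg38/GRCh38.chr1.fa > hg38/GRCh38.chr1.size`. With the tools already installed, `samtools faidx hg38/GRCh38.chr1.fa` followed by `cut -f1,2 hg38/GRCh38.chr1.fa.fai` gives the same two columns.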
bd2kuser@ip-172-31-73-47:~/ChIPseq$ cat workflow.sh
# quality check
fastqc GM12878_control_chr1.fastq
fastqc GM12878_CTCF_chr1.fastq
# Prepare genome
bowtie-build hg38/GRCh38.chr1.fa hg38/GRCh38.chr1
# Mapping
bowtie -m 1 -S ./hg38/GRCh38.chr1 GM12878_control_chr1.fastq > GM12878_control_chr1.sam
bowtie -m 1 -S ./hg38/GRCh38.chr1 GM12878_CTCF_chr1.fastq > GM12878_CTCF_chr1.sam
# Further processing
## compress to BAM
samtools view -bSo GM12878_control_chr1.bam GM12878_control_chr1.sam
samtools view -bSo GM12878_CTCF_chr1.bam GM12878_CTCF_chr1.sam
## sort by coordinate (current samtools syntax: -o names the output BAM)
samtools sort -o GM12878_control_chr1.sorted.bam GM12878_control_chr1.bam
samtools sort -o GM12878_CTCF_chr1.sorted.bam GM12878_CTCF_chr1.bam
## index
samtools index GM12878_control_chr1.sorted.bam
samtools index GM12878_CTCF_chr1.sorted.bam
# Peak calling
macs2 callpeak -t GM12878_CTCF_chr1.sorted.bam -c GM12878_control_chr1.sorted.bam -f BAM -g 175000000 -n GM12878_CTCF_chr1 -B -q 0.01
# Check the peak model
Rscript GM12878_CTCF_chr1_model.r
# Motif analysis
## extend summits by 100 bp in both directions
bedtools slop -i GM12878_CTCF_chr1_summits.bed -g hg38/GRCh38.chr1.size -b 100 > GM12878_CTCF_chr1_summits_ext.bed
## get sequence file (i.e. fasta)
bedtools getfasta -fi hg38/GRCh38.chr1.fa -bed GM12878_CTCF_chr1_summits_ext.bed -fo GM12878_CTCF_chr1_summits_ext.fa
## Upload the .fa file to the MEME online server for motif discovery (http://meme-suite.org/tools/meme)
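The slop step widens each MACS2 summit by 100 bp on each side, clamping at position 0 and at the chromosome end from the genome file. A minimal awk sketch of that behaviour, assuming the summit file is BED with 1-bp summit intervals (so unclamped results are 201 bp wide); slop_both is a hypothetical name, and in the workflow bedtools slop itself should be used.

```shell
# Hypothetical sketch of "bedtools slop -b N": widen each BED interval by N bp
# on both sides, clamping start at 0 and end at the chromosome length.
slop_both() {
  local bed="$1" genome="$2" n="$3"
  awk -v n="$n" 'BEGIN { FS = OFS = "\t" }
    NR == FNR { size[$1] = $2; next }           # first file: chrom<TAB>length
    {
      start = $2 - n; if (start < 0) start = 0  # clamp at chromosome start
      end = $3 + n
      if (($1 in size) && end > size[$1]) end = size[$1]   # clamp at chromosome end
      $2 = start; $3 = end                      # extra BED columns are kept as-is
      print
    }' "$genome" "$bed"
}
```

Usage: `slop_both GM12878_CTCF_chr1_summits.bed hg38/GRCh38.chr1.size 100`. Clamping means summits near a chromosome end yield shorter sequences for MEME.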
BED file format
MEME motif discovery
ChIPseek website for interactive data analysis
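BED coordinates are 0-based and half-open, so an interval's length is end - start, and the 1-based, inclusive position shown by genome browsers runs from start+1 to end. A small awk sketch of that conversion; bed_to_region is a hypothetical helper name.

```shell
# Hypothetical helper: convert BED intervals (0-based, half-open) to
# "chrom:start+1-end" browser-style regions plus the interval length.
bed_to_region() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^(track|browser|#)/ { next }               # skip header/comment lines
    { print $1 ":" ($2 + 1) "-" $3, $3 - $2 }   # region, length in bp
  ' "$1"
}
```

For example, a MACS2 summit line `chr1 100 101` is the single base shown as chr1:101-101 in a browser.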