This site is to serve as my note-book and to effectively communicate with my students and collaborators. Every now and then, a blog may be of interest to other researchers or teachers. Views in this blog are my own. All rights of research results and findings on this blog are reserved. See also http://youtube.com/c/hongqin @hongqin
Wednesday, October 28, 2015
Unix/Linux exercises
http://www.doc.ic.ac.uk/~wjk/UnixIntro/Exercise1.html
http://www.doc.ic.ac.uk/~wjk/UnixIntro/index.html
Protocol for using MEGA to generate a 16s rDNA tree for giant panda evolution
Overview:
We will first align the 16s rDNA sequences and then generate a neighor-joining tree.
The steps are illustrated below:
Start MEGA6 software
Choose "Align"
Choose "No" because 16s rDNA is not protein-coding gene.
Choose the appropriate alignment file, and then "Open"
Generated tree (only partial picture is shown here)
We will first align the 16s rDNA sequences and then generate a neighor-joining tree.
The steps are illustrated below:
Start MEGA6 software
Click "File" --> "Open A File/Session", Choose "panda_16srDNA.fas".
Click "Alignment"-->"Align by ClustalW"
Click "OK" to select all sequences,
Click "OK" to accept the default setting for alignment
Click "No", because 16s rDNA is an RNA coding sequence
Your sequences are now aligned.
Click "Data" -->"Export Alignment"-->"MEGA Format"
Input an appropriate file name.
Input an appropriate title:
Choose "No" because 16s rDNA is not protein-coding gene.
Now, we can start to generate a phylogeny.
At the main MEGA window, click "Phylogeny" -->"Construct/Test Neighbor-Joining Tree":
Choose the appropriate alignment file, and then "Open"
Click "Computing" to use the default setting
Generated tree (only partial picture is shown here)
Save your image to use in your lab report.
Tuesday, October 27, 2015
bio386 20151027Tue
Research topic update.
Gordon login, and Unix shell practice.
Problems:
1. There is a delay for Gordon to update user profiles. Some students cannot log into Gordon after I add them to Gordon.
2. My login attempt to "login.xsdede.org" led to "access denied".
Gordon login, and Unix shell practice.
Problems:
1. There is a delay for Gordon to update user profiles. Some students cannot log into Gordon after I add them to Gordon.
2. My login attempt to "login.xsdede.org" led to "access denied".
Monday, October 26, 2015
use nutrient plates for yeast
Oct 26-Nov3. 2015
Use nutrient plates for yeast. The growth is extremely slow. Small colonies can be seen after 1 week.
Use nutrient plates for yeast. The growth is extremely slow. Small colonies can be seen after 1 week.
Tuesday, October 20, 2015
bio233, review for exam 2
Course evaluation
learning outcomes
key concepts
mentioned that cAMP and glucose are in a seesaw.
learning outcomes
key concepts
mentioned that cAMP and glucose are in a seesaw.
bio386, 20151020Tue, go over exam 1, key
Screen cast of exam 1 key codes.
Use some of student video.
Use some of student video.
Monday, October 19, 2015
bio233, 20151019Mon
_ submission of oral presentation topic.
2:20pm, _ distribute midterm grades,
2:300-3pm -> count colonies
_ tally, marker, counting colonies
_ input data on googleSheet
3:20pm_ practical test on aseptic streak for single colonies.
_ microscrope exam
2:20pm, _ distribute midterm grades,
2:300-3pm -> count colonies
_ tally, marker, counting colonies
_ input data on googleSheet
_ microscrope exam
Sunday, October 18, 2015
bio233 midterm grades
byte:midterm,Fall15 hqin$ pwd
/Users/hqin/Dropbox/courses.student.research.dp/bio233.Fall2015_dpbx/course.management/grades/midterm,Fall15
/Users/hqin/Dropbox/courses.student.research.dp/bio233.Fall2015_dpbx/course.management/grades/exam1,fall15
Friday, October 16, 2015
reorganize mactower network aging directories
The directory is messy due to the increasing complexity of the project. I decided to make separate folders for the grid-network simulation part and original network part.
byte:mactower-network-failure-simulation hqin$ ll
drwxr-xr-x 7 hqin staff 238B Oct 16 09:45 1.grid.network.2015Oct
drwxr-xr-x 7 hqin staff 238B Oct 15 11:28 1.ori.ginppit.2015Oct
I copy-pasted some R code into 1.grid.network.2015Oct, but was not certain how to work on it.
Thursday, October 15, 2015
bio386, exam 1, due, Q&A sessions.
Ask students to work on their research project due.
Pick a primary research paper for presentation.
Many students submitted screencast on their exam problems. I realized that I can use student video as tutorials.
Pick a primary research paper for presentation.
Many students submitted screencast on their exam problems. I realized that I can use student video as tutorials.
bio233, metabolic regulation
Review questions in Chapter 6.
Review questions in Chapter 7.
Lac operon
video recording
https://youtu.be/90qP3OdPT0U
Reference
http://hongqinlab.blogspot.com/2014/03/bio233-chapter-8-gene-regulation-lac.html
https://www.youtube.com/watch?v=YzmGJgmrSQU
Review questions in Chapter 7.
Lac operon
video recording
https://youtu.be/90qP3OdPT0U
Reference
http://hongqinlab.blogspot.com/2014/03/bio233-chapter-8-gene-regulation-lac.html
https://www.youtube.com/watch?v=YzmGJgmrSQU
Wednesday, October 14, 2015
20151014_generate_grid_networks.R
Purpose: generate grid network to my simulation code, to see whether simulation can reproduce analytic results.
file: 20151014_generate_grid_networks.R
helen:mactower-network-failure-simulation hqin$ pwd
/Users/hqin/github/mactower-network-failure-simulation
helen:mactower-network-failure-simulation hqin$ ll -t | head
total 201880
-rw-r--r-- 1 hqin staff 35K Oct 14 22:05 _Degree4N1000_network.csv
-rw-r--r-- 1 hqin staff 9.8K Oct 14 22:05 _Degree4N100_EssenLookupTb.csv
-rw-r--r-- 1 hqin staff 1.3K Oct 14 22:05 20151014_generate_grid_networks.R
##########################################
# generate gride network for simulation and quality control
rm(list=ls())
debug = 0;
# R -f file --args Degree numOfEssenNode outNetworkFile outEssenLookupTbFile
# R -f 20151014_generate_grid_networks.R --args 4 1000 _Degree4N1000_network.csv _Degree4N100_EssenLookupTb.csv
options(echo=TRUE) # if you want see commands in output file
args <- commandArgs(trailingOnly = TRUE)
print(args)
degree = as.numeric(args[1]); degree;
numOfEssenNodes = as.numeric(args[2]); numOfEssenNodes;
outNetworkFile = args[3]
outEssenLookupTbFile = args[4]
#for debug
# degree=4; numOfEssenNodes=1000; outNetworkFile = "__test.csv"; outEssenLookupTb="__EssenLookupTb.csv"
pairs = data.frame(integer(numOfEssenNodes*degree), integer(numOfEssenNodes*degree))
names(pairs) = c("No1","No2")
for( i in 1:numOfEssenNodes) {
current_i_row = (i-1)*degree + 1
for( j in 1:degree){
pairs$No1[current_i_row +j-1] = i;
pairs$No2[current_i_row +j-1] = current_i_row +j-1 + numOfEssenNodes
}
}
# check
table( pairs$No1)[1:10]
table( pairs$No2)[1:10]
length(unique(pairs[,1]))
length(unique(pairs[,2]))
write.csv(pairs, outNetworkFile, row.names = F)
outEssenLookupTb = c(rep(1, numOfEssenNodes), rep(0, numOfEssenNodes*degree))
write.csv(outEssenLookupTb, outEssenLookupTbFile, row.names = F)
MJones RISE
X install R
X install Rstudio
X sign up install Github
X install Github
X clone mactower simulation project
I mentioned bigdata, high performace computing, precision medicine.
Her summer mentior is Dr. Shawnis Patel at Emory.
Assignments:
EdX intro to R course
X install Rstudio
X sign up install Github
X install Github
X clone mactower simulation project
I mentioned bigdata, high performace computing, precision medicine.
Her summer mentior is Dr. Shawnis Patel at Emory.
Assignments:
EdX intro to R course
Read Qin13 proposal
Watch qin research talk videos.
Tuesday, October 13, 2015
simulate network aging with two lambda, results need to be verified.
File: "20151013-net-sim-ginppi.R".
$ pwd
/Users/hqin/github/mactower-network-failure-simulation
##########################################
#2015Oct13, use numeric lookup table for essential genes.
#2014March3, redo ginppi simulation witht same parameter for ms02networks.
# 2014 Feb 17, change name "20131221.DIPandGIN.sim.aging_v2.R" to "net-aging-sim-2014Feb17.R"
# 2013 Dec 20, merge DIP PPI and Genetic Inxt Net -> Multi-net approach
#rm(list=ls())
single_network_failure_v2 = function(lambda1, lambda2=lambda1/10, threshold=4, p, pairs, essenLookupTb ) {
# single network failure simulation, 20151013Tue
# lambda1: First exponential constant failure rate for edges with degree > threshold
# lambda2: Second exponential constant failure rate for edges with degree <= threshold
# threshold: degree threshold for lambda1 and lambda2
# pairs: network in pairwide format, using numeric NOs 20151013
# essenLookupTb: lookup table for essential and nonessential genes, numeric values
## for debug: lambda1 = 1/50; lambda2= lambda1/10; threshold=4; p=0.8
inpairs = pairs[,3:4] #bookkeeping
names(inpairs) = c('No1','No2')
#get connectivities per node
degreeTb = data.frame( table(c(inpairs$No1, inpairs$No2)))
names(degreeTb) = c("No", "degree")
degreeTb$moduleAge = NA;
for( i in 1:length(degreeTb[,1])){
if ( essenLookupTb[ degreeTb$No[i] ]) { #essential node
lambda = ifelse( degreeTb$degree[i] >= threshold, lambda1, lambda2)
age = rexp( degreeTb$degree[i], rate=lambda ) #exponential age
if(degreeTb$degree[i] >= threshold){
active = runif(degreeTb$degree[i]) #uniform interaction stochasticity
active = ifelse( active<=p, 1, NA ) #pick active interactions
if( sum(active, na.rm=T) > 0 ){ #there should be at least 1 active intxn
age = age * active # only active interactions for modular age estimation
degreeTb$moduleAge[i] = max(age, na.rm=T) #maximum intxn age is the module age
} else {# when no active intxn is available
degreeTb$moduleAge[i] = 0; #this module is born dead.
}
} else { # for degree < threshold, no stochasticity is applied.
degreeTb$moduleAge[i] = max(age, na.rm=T) #maximum intxn age is the module age
}
} else {# non-essential node
degreeTb$moduleAge[i] = NA
}
}
summary(degreeTb)
currentNetworkAge = min(degreeTb$moduleAge, na.rm=T)
}
#source("network.r")
# R -f file --args lambda1 lambda2 degreeThreshold p popSize
# R -f 20151013-net-sim-ginppi.R --args 0.002 0.0002 4 0.9 5
options(echo=TRUE) # if you want see commands in output file
args <- commandArgs(trailingOnly = TRUE)
print(args)
lambda1 = as.numeric(args[1]); lambda1;
lambda2 = as.numeric(args[2]); lambda2;
degreeThreshold = as.integer((args[3])); degreeThreshold
p = as.numeric(args[4]); p;
popSize = as.numeric(args[5]); popSize;
list.files(path='data', )
debug = 0;
#essential gene info
essenTb = read.csv("data/SummaryRegressionHetHomFactorized2015Oct13.csv", colClasses=rep('character', 9))
essenLookupTb = read.csv("data/essntialGeneLookupTable_20151013.csv", row.names=1)
essenLookupTb = essenLookupTb[,1]
infile = "data/merged_PPIGIN_Factorized2015Oct13.csv"
pairs = read.csv(infile)
names(pairs) = c("id1",'id2', "No1", "No2")
print(head(pairs))
if(debug==9) { pairs = pairs[1:1000,] }
pairs = pairs[ pairs$No1 != pairs$No2, ]
# label essential nodes, remove nonesse-nonessen pairs
pairs$essen1 = essenLookupTb[pairs$No1]
pairs$essen2 = essenLookupTb[pairs$No2]
#remove nonessen <-> nonessen intxn because they do not affect aging.
pairs$remove = ifelse( pairs$essen1==F & pairs$essen2==F, T, F )
pairs= pairs[! pairs$remove, ]
# 31394 for one ms02 network
#get connectivities per node
degreeTb = data.frame( table(c(pairs$No1, pairs$No2)))
summary(degreeTb);
degreeTb[1:10,]
#median degree =5, mean=12
#for one ms02, media =6, mean=13.68, so orginal network is power-law like, skew at two ends.
full_age_dir = "ori.ginppit.2015Oct"
popAges = numeric(popSize)
time1 = date()
j=1; count = 0;
while ((j <= popSize) && ( count < popSize*30)) {
count = count + 1;
print(paste("count=",count))
currentNetworkAge = single_network_failure_v2(lambda1, lambda2, degreeCutoff, p, pairs, essenLookupTb)
if (currentNetworkAge > 0) {
popAges[j] = currentNetworkAge
j = j+1
}
}# end of j while-loop, population loop
timestamp = format(Sys.time(), "%Y%b%d_%H%M%S")
age.file.name=paste("threshold", degreeCutoff, "p", p, "lambda1", lambda1,
"lambda2", lambda2,'popsize',popSize, "time",timestamp, "txt", sep="." )
full_age_file = paste( full_age_dir,'/', age.file.name, sep='')
write.csv( popAges, full_age_file, row.names=F)
Monday, October 12, 2015
convert string to numeric for essential gene lookup, 20151012-factorize_networks.R
To speedup network aging simulation, use numeric values to replace string orf values.
file: "20151012-factorize_networks.R"
pwd: /Users/hqin/github/mactower-network-failure-simulation
# input a gene network and essential gene list.
# remove self-interactions
# assign smaller number to essential nodes, and large numbers to non-essential nodes
# identify essential and non-essential nodes in gene networks
# remove non-essential gene -noessential gene interactions.
rm(list=ls())
source("lifespan.r")
source("network.r")
setwd("~/github/mactower-network-failure-simulation")
list.files(path='data', )
debug = 0;
#essential gene info
#essenTb = read.csv('SummaryRegressionHetHom2013Oct29.csv', colClasses=rep('character', 9))
#20151012, fixed a bug due to comma in orf.
essenTb = read.csv("SummaryRegressionHetHom2015Oct12.csv", colClasses=rep('character', 9))
length(unique(essenTb$orf)) #5772,
infile = "data/merged_PPIGIN_2014Jan20.tab";
pairs = read.table( infile, header=F, sep="\t", colClass = c("character", "character") )
names(pairs) = c("id1",'id2')
print(head(pairs))
if(debug==9) { pairs = pairs[1:1000,] }
pairs = pairs[ pairs$id1 != pairs$id2, ]
# How do the two data set overlap? DIP seems to contain some questionable orfs
uniq.orf.from.pairs = unique(c(pairs$id1, pairs$id2)) #5496 ORF
matches = uniq.orf.from.pairs %in% unique(essenTb$orf)
table(matches)
#unmatchedORF = uniq.orf.from.pairs[! matches]
# Generate geneNOs (numeric IDs) using the essential, nonessential gene file.
essentialORFs = essenTb$orf[essenTb$essenflag == 'essential']
essentialORFs = essentialORFs[order(essentialORFs)]
nonessentialORFs = essenTb$orf[essenTb$essenflag != 'essential']
mergedORFs = c(essentialORFs, nonessentialORFs)
mergedNOs = 1: length(mergedORFs)
names(mergedNOs) = mergedORFs # This is the numbers assigned to ORFs/Nodes !!!
mergedNOs[c('YAL016W', "YPR116W")] #check, passed 20151012Mon
essenTb$geneNO = mergedNOs[essenTb$orf]
essenTb$essenflagNumeric = ifelse( essenTb$essenflag=='essential', 1, 0)
rownames(essenTb) = essenTb$orf
head(essenTb)
essenTb2 = essenTb[, c("geneNO", "essenflagNumeric")]
essenLookupTb = essenTb2$essenflagNumeric[order(essenTb2$geneNO)]
#manual check
essenLookupTb[c(1152,1153)]
### convert pairs ORF into geneNOs
essenTb[pairs$id1[1:10], 'geneNO']
pairs$No1 = essenTb$geneNO[match(pairs$id1, essenTb$orf)]
pairs$No2 = essenTb$geneNO[match(pairs$id2, essenTb$orf)]
#manually check a few
pairs[100, ]
# id1 id2 No1 No2
# 100 YDL043C YGL049C 131 2623
essenTb["YDL043C", ]
essenTb["YGL049C", ]
##output
write.csv(essenTb,"SummaryRegressionHetHomFactorized2015Oct13.csv")
write.csv(pairs, "data/merged_PPIGIN_Factorized2015Oct13.csv")
fix a bug in orf parsing for 20131028.growth.fitness.R
new output file is "SummaryRegressionHetHom2015Oct12.csv".
The bug is caused by comma in 1 or 2 orfs, such as "DUR1,2".
# This bug is caused by write.csv(, quote=F, row.names=F)
# The problem is caused by "DUR1,2" an ORF that contains a comma.
The bug is caused by comma in 1 or 2 orfs, such as "DUR1,2".
# This bug is caused by write.csv(, quote=F, row.names=F)
# The problem is caused by "DUR1,2" an ORF that contains a comma.
notes and speeding up network aging simulation.
The old simulation method is slow due to single_network_failure()
inpairs$age = rexp( length(inpairs[,1]), rate=lambda ) #exponential ages for pairs
inpairs$age = ifelse(inpairs$active > (1-p), inpairs$age, NA ) #if not active, intxn is excluded.
ModuleTb = data.frame(runningORFs) #buffer for module ages
#loop every essential genes to identify the module age
for (i in 1:length(runningORFs)) {
myORF = runningORFs[i]
pos1 = grep(myORF, inpairs$id1)
pos2 = grep(myORF, inpairs$id2) #id1,2 to ORF1,2 is a really bad choice.
if( length( c(pos1,pos2))>=1 ) {
ModuleTb$age.m[i] = max( inpairs$age[c(pos1,pos2)], na.rm=T ) #maximal intxn age -> module age
} else {
ModuleTb$age.m[i] = NA;
}
}
Basically, there were too much ifelse, matching, grep(). The grep() probably is the most expensive step.
I propose a new algorithm to calculate by Degree for each essential gene, thereby eliminating the grep() process.
inpairs$age = rexp( length(inpairs[,1]), rate=lambda ) #exponential ages for pairs
inpairs$age = ifelse(inpairs$active > (1-p), inpairs$age, NA ) #if not active, intxn is excluded.
ModuleTb = data.frame(runningORFs) #buffer for module ages
#loop every essential genes to identify the module age
for (i in 1:length(runningORFs)) {
myORF = runningORFs[i]
pos1 = grep(myORF, inpairs$id1)
pos2 = grep(myORF, inpairs$id2) #id1,2 to ORF1,2 is a really bad choice.
if( length( c(pos1,pos2))>=1 ) {
ModuleTb$age.m[i] = max( inpairs$age[c(pos1,pos2)], na.rm=T ) #maximal intxn age -> module age
} else {
ModuleTb$age.m[i] = NA;
}
}
Basically, there were too much ifelse, matching, grep(). The grep() probably is the most expensive step.
I propose a new algorithm to calculate by Degree for each essential gene, thereby eliminating the grep() process.
minority in STEM references
Ref1: National Science Foundation, National Center for Science and Engineering Statistics. 2013. Science and Engineering Doctorate Awards: 2009–10. Detailed Statistical Tables NSF 13-322. Arlington, VA. Available at http://www.nsf.gov/statistics/nsf13322.
Ref 2National Science Foundation | National Center for Science and Engineering Statistics (NCSES) Science and Engineering Indicators 2014 Arlington, VA (NSB 14-01) | February 2014
Ref 3 National Science Foundation, National Center for Science and Engineering Statistics. 2015. Women, Minorities, and Persons with Disabilities in Science and Engineering: 2015. Special Report NSF 15-311. Arlington, VA. Available athttp://www.nsf.gov/statistics/wmpd/.
Ref 4. Fiegener, M.K; Proudfoot, S.L. Baccalaureate Origins of U.S.-trained S and E Doctorate Recipients. National Center for Science and Engineering Statics (NCSES); NSF-13-123; National Science Foundation: Arlington, VA, 2013.
Sunday, October 11, 2015
statistical power, GWAS, quantitative genetics
Genet Epidemiol. 2015 Oct 10. doi: 10.1002/gepi.21935. Evaluating the Calibration and Power of Three Gene-Based Association Tests of Rare Variants for the X Chromosome. Ma C1, Boehnke M1, Lee S1; GoT2D Investigators.
Family-based association tests for sequence data, and comparisons with population-based association tests, Iuliana Ionita-Laza1, Seunggeun Lee, Vladimir Makarov, Joseph D. Buxbaum, and Xihong Lin
Saturday, October 10, 2015
computational biology, genomics syllabus and books
Githubs
Books
Durbin, Eddy, etc, Biological sequence analysis
Ewens, Grant, Statistical methods in bioinformatics
https://github.com/LiLabAtVT/ComputationalBiologyBooks
Syllabus
http://www.cs.cmu.edu/~durand/03-711/2011/syllabus.html
http://www.cs.cmu.edu/~sssykim/teaching/s13/s13.html
https://www.cs.umd.edu/class/fall2011/cmsc858s/
Dissertation on computational biology
http://www.ml.cmu.edu/research/phd-dissertations.html
Books
Durbin, Eddy, etc, Biological sequence analysis
Ewens, Grant, Statistical methods in bioinformatics
https://github.com/LiLabAtVT/ComputationalBiologyBooks
Syllabus
http://www.cs.cmu.edu/~durand/03-711/2011/syllabus.html
http://www.cs.cmu.edu/~sssykim/teaching/s13/s13.html
https://www.cs.umd.edu/class/fall2011/cmsc858s/
Dissertation on computational biology
http://www.ml.cmu.edu/research/phd-dissertations.html
Friday, October 9, 2015
*** h2, heritability
http://content.csbs.utah.edu/~rogers/ant5221/lecture/QTs2.pdf
See also
My R code on h2
set.seed(2015)
N=1E3
genotype_elements = c(-1,0,1)
#genotype_elements = rnorm(N)
error = rnorm(N)
tb = data.frame(error)
tb$g1 = sample(genotype_elements, N, replace=T)
h2 = 0.5
tb$phynotype = sqrt(h2)*tb$g1 + sqrt(1-h2)*tb$error
summary(lm(tb$phynotype ~ tb$g1))
tb$g2 = sample(genotype_elements, N, replace=T)
h2 = 1/10
tb$phynotype = sqrt(h2)*tb$g1 + sqrt(h2)*tb$g2+ sqrt(1-2*h2)*tb$error
summary(lm(tb$phynotype ~ tb$g1 + tb$g2))
h2 = 1/10
tb$g3 = sample(genotype_elements, N, replace=T)
tb$phynotype = sqrt(h2)*tb$g1 + sqrt(h2)*tb$g2 + sqrt(h2)*tb$g3 + sqrt(1-2*h2)*tb$error
summary(lm(tb$phynotype ~ tb$g1 + tb$g2 + tb$g3))
summary(lm(tb$phynotype ~ tb$g1 ))
summary(lm(tb$phynotype ~ tb$g2 ))
summary(lm(tb$phynotype ~ tb$g3 ))
NSF proposal guideline
Use one of the following typefaces identified below:
- Arial10, Courier New, or Palatino Linotype at a font size of 10 points or larger;
- Times New Roman at a font size of 11 points or larger; or
- Computer Modern family of fonts at a font size of 11 points or larger.
Thursday, October 8, 2015
bio233, 20151008Thu Ecoli genome investigation in R
2:30-3:20pm. short class. Using R code to analyze E coli K12 genome.
Ask students to work in pairs. Several students said, "We like this exercise."
Youtube: https://youtu.be/UeUVSxnK8qo
R Code:
source("http://bioconductor.org/biocLite.R")
biocLite('seqinr')
require(seqinr)
# Change working direcotry to the current folder
list.files()
seqs = read.fasta(file="EcoliK12.fna")
# seqs = seqs[[1:3]]
str(seqs);
# look at the first sequence
seq1 = seqs[[789]]
getName(seqs[[789]])
table( seq1 ); #nucleotide composition
GC(seq1); # GC content
# now, try to find name and GC content for 865th gene.
# a loop for all sequences
num = 1:length(seqs); #these are storages for later use
gc = 1:length(seqs); # gc = num;
out = data.frame( cbind( num, gc ) );
for( i in 1:length(seqs) ) {
out$gene_name = getName( seqs[[i]] )
out$gc[i] = GC( seqs[[i]] );
}
head(out)
write.csv(out, "gc.csv", row.names=F) # output the results
#### now, calculate the length of each gene
num = 1:length(seqs);
out2 = data.frame( cbind( num) );
for( i in 1:length(seqs) ) {
out2$name[i] = getName( seqs[[i]] )
out2$len[i] = length( seqs[[i]] );
out2$gc[i] = GC( seqs[[i]] );
}
out2;
write.csv(out2, "gc2.csv", row.names=F) # output the results
prot = translate( seqs[[999 ]])
length(prot)
prot
table(prot)
# now, find the protein length for 1005th gene
#### now, translate the ORF into proteins
num = 1:length(seqs);
out3 = data.frame( cbind( num) );
for( i in 1:length(seqs) ) {
out3$name[i] = getName( seqs[[i]] )
out3$ORFLen[i] = length( seqs[[i]] );
prot = translate( seqs[[i ]])
length(prot)
out3$ProtLen[i] = length( prot)
}
out3;
write.csv( out3, "gc-prot.csv")
Ask students to work in pairs. Several students said, "We like this exercise."
Youtube: https://youtu.be/UeUVSxnK8qo
R Code:
source("http://bioconductor.org/biocLite.R")
biocLite('seqinr')
require(seqinr)
# Change working direcotry to the current folder
list.files()
seqs = read.fasta(file="EcoliK12.fna")
# seqs = seqs[[1:3]]
str(seqs);
# look at the first sequence
seq1 = seqs[[789]]
getName(seqs[[789]])
table( seq1 ); #nucleotide composition
GC(seq1); # GC content
# now, try to find name and GC content for 865th gene.
# a loop for all sequences
num = 1:length(seqs); #these are storages for later use
gc = 1:length(seqs); # gc = num;
out = data.frame( cbind( num, gc ) );
for( i in 1:length(seqs) ) {
out$gene_name = getName( seqs[[i]] )
out$gc[i] = GC( seqs[[i]] );
}
head(out)
write.csv(out, "gc.csv", row.names=F) # output the results
#### now, calculate the length of each gene
num = 1:length(seqs);
out2 = data.frame( cbind( num) );
for( i in 1:length(seqs) ) {
out2$name[i] = getName( seqs[[i]] )
out2$len[i] = length( seqs[[i]] );
out2$gc[i] = GC( seqs[[i]] );
}
out2;
write.csv(out2, "gc2.csv", row.names=F) # output the results
prot = translate( seqs[[999 ]])
length(prot)
prot
table(prot)
# now, find the protein length for 1005th gene
#### now, translate the ORF into proteins
num = 1:length(seqs);
out3 = data.frame( cbind( num) );
for( i in 1:length(seqs) ) {
out3$name[i] = getName( seqs[[i]] )
out3$ORFLen[i] = length( seqs[[i]] );
prot = translate( seqs[[i ]])
length(prot)
out3$ProtLen[i] = length( prot)
}
out3;
write.csv( out3, "gc-prot.csv")
Wednesday, October 7, 2015
todo: SurvCurv database, matthias Ziehm, EBL
SurvCurv database, matthias Ziehm
SurvCurn does not cover yeast lifespan data?
https://www.ebi.ac.uk/thornton-srv/databases/SurvCurv/
https://www.ebi.ac.uk/thornton-srv/databases/SurvCurv/faq.php
SurvCurn does not cover yeast lifespan data?
https://www.ebi.ac.uk/thornton-srv/databases/SurvCurv/
https://www.ebi.ac.uk/thornton-srv/databases/SurvCurv/faq.php
note taking for computational experiments
Many students did not to take notes during computational experiments.
Tuesday, October 6, 2015
bio233, Tue, serial dilution data analysis for hemocytometer
Led 20 students download and install R and Rstudio.
Then run sample R code to analyze the class data on hemocytometer.
=> Problems:
-> Many students did not ZIP file and how to extract them. On windows computer, the R code can be openned directly using Rstudio, but now xlsx or csv files.
-> R package 'xlsx' lead to runtime error on Yosemite Rstudio. I had to convert xlsx to csv files for these students to run the codes. Lessons: Simple methods are always safer than fancier ways in classrooms.
=> Solutions
-> I went step-by-step on screen, and make sure most students caught up.
-> I ask students who finished the steps to help students that were stuck.
Then run sample R code to analyze the class data on hemocytometer.
=> Problems:
-> Many students did not ZIP file and how to extract them. On windows computer, the R code can be openned directly using Rstudio, but now xlsx or csv files.
-> R package 'xlsx' lead to runtime error on Yosemite Rstudio. I had to convert xlsx to csv files for these students to run the codes. Lessons: Simple methods are always safer than fancier ways in classrooms.
=> Solutions
-> I went step-by-step on screen, and make sure most students caught up.
-> I ask students who finished the steps to help students that were stuck.
JC HHMI project,
30 mintues to save documents and turn off applications on student's frezon laptop
15 minutes on Github account
todo Github clone https://github.com/hongqin/mactower-network-failure-simulation
R code and PPT of past materials.
15 minutes on Github account
todo Github clone https://github.com/hongqin/mactower-network-failure-simulation
R code and PPT of past materials.
Monday, October 5, 2015
*** (in progress) bio233, serial dilution lab.
20150928Monday, streak DBY1394, box 1, I1 to YPD plate. 30C O/N. DBY1394 grew relatively slower than wild isolates.
20150930Wed, Pick colonies for growth in 5ml YPD, 30C O/N in 10ml falcon tube. (Cell precipated at the bottom).
Made fresh YPD plates, 20% glucose.
20151001Thu. 5pm. Added 1ml of overnight grown culture into 10 ml of YPD in large glass tube. 30C shaker. A total of 4 tubes were prepared.
Friday, expand to 4 more large glass tubes
20151005 Monday
1pm: mix all tubes and do serial dilutions. I prepared 9 groups, but only 6 group actually formed.
2pm, bio233 lab. I did not do a demo for the class, and many mistakes happened.
1) many student put pipetman deep into glass culture tubes, and bunsen burners were not turned on.
2) many students did not know which pipetman to use for what volumes.
3) some students forgot to put cover skips on hemocytomers.
4) some students put glassbeads onto their gloved hands and then to plates.
5) many students did vortex their samples before put them on YPD plates.
3:45pm. Students put their number into master googleDoc sheet.
Items:
10 sets of pipettman. P1000, P200, P20. tips. YPD plates. Nutrient plates. Glass beads. Hemocytometer. Tally counters.
Reference:
http://hongqinlab.blogspot.com/2014/02/bio233-lab-serial-dilution-feb-26-2014.html
http://hongqinlab.blogspot.com/2014/10/bio233-20131006-serial-dilution-loh.html
20150930Wed, Pick colonies for growth in 5ml YPD, 30C O/N in 10ml falcon tube. (Cell precipated at the bottom).
Made fresh YPD plates, 20% glucose.
20151001Thu. 5pm. Added 1ml of overnight grown culture into 10 ml of YPD in large glass tube. 30C shaker. A total of 4 tubes were prepared.
Friday, expand to 4 more large glass tubes
20151005 Monday
1pm: mix all tubes and do serial dilutions. I prepared 9 groups, but only 6 group actually formed.
2pm, bio233 lab. I did not do a demo for the class, and many mistakes happened.
1) many student put pipetman deep into glass culture tubes, and bunsen burners were not turned on.
2) many students did not know which pipetman to use for what volumes.
3) some students forgot to put cover skips on hemocytomers.
4) some students put glassbeads onto their gloved hands and then to plates.
5) many students did vortex their samples before put them on YPD plates.
3:45pm. Students put their number into master googleDoc sheet.
Items:
10 sets of pipettman. P1000, P200, P20. tips. YPD plates. Nutrient plates. Glass beads. Hemocytometer. Tally counters.
Reference:
http://hongqinlab.blogspot.com/2014/02/bio233-lab-serial-dilution-feb-26-2014.html
http://hongqinlab.blogspot.com/2014/10/bio233-20131006-serial-dilution-loh.html
references for 2-credits bioinformatics course
Genomics Medicine Gets Personal, EdX
https://courses.edx.org/courses/course-v1:GeorgetownX+MEDX202-01+2015_3T/courseware/83df4b53f2d444a38a90558b5c639711/aa544225cd604b689f7e421c90a624fe/
https://www.coursera.org/courses/?query=bioinformatics
https://www.coursera.org/specializations/bioinformatics
Introduction to Bioinformatics 4th Edition, Arthur Lesk
Bioinformatics for biologists,
http://www.amazon.com/Bioinformatics-Biologists-Pavel-Pevzner/dp/1107648874/ref=sr_1_8?s=books&ie=UTF8&qid=1443631856&sr=1-8&keywords=bioinformatics&refinements=p_n_feature_nine_browse-bin%3A3291437011%2Cp_72%3A1250221011
Thursday, October 1, 2015
HU experiment changes, bio125
* HU+ cell left in water after day 1. HU- cells left in SD media to avoid water induced G1 arrest.
* Add antibiotics to the media to avoid bacteria growth.
* Add antibiotics to the media to avoid bacteria growth.
bio233, chapter 5, growth (part 1)
Class started 10 mintues later.
Students worked on two problems.
1) Growth curve problem.
For the slope and intercept, one student volunteered to solve the problem on the board.
For generation calculation, I asked two students to work on the board. Most students can reach
N = N0 * 2^n. However, most did not know how to solve for "n".
2) Hemocytometer problem.
I let students read the manual of Hausser bright line hemocytomer counter.
Students worked on two problems.
1) Growth curve problem.
For the slope and intercept, one student volunteered to solve the problem on the board.
For generation calculation, I asked two students to work on the board. Most students can reach
N = N0 * 2^n. However, most did not know how to solve for "n".
2) Hemocytometer problem.
I let students read the manual of Hausser bright line hemocytomer counter.
bio386 20151001Thu midterm exam released
Released midterm exam. Found the keys were somehow released as well, luckily I correct the error before most students downloaded the files.
I then let students worked on practical problems.
I then let students worked on practical problems.
Subscribe to:
Posts (Atom)