Open Notebook: January 2014

Friday, January 31, 2014

R, permute network without selfpairing, bug fix

#permute merged yeast PPI+GIN, fix a bug that insert "NA" into new networks.

#require(igraph)
rm(list=ls())
debug = 0
set.seed(2014)
permute.pairs.wo.selfpairs = function( inpairs, ncycles=10, debug=1 ) {#bug, id1-id2 and id2-id1 are not considered.
if (ncycles >= 1 ) {
if(debug) {
print(paste('ncycles=', ncycles))
}
longids = c(as.character(inpairs[,1]), as.character(inpairs[,2]) )
longids = sample(longids)
len = length(inpairs[,1])
newpairs = data.frame( cbind( longids[1:len], longids[(len+1): (2*len)]) )
names(newpairs) = c('id1', 'id2')
newpairs$id1 = as.character( newpairs$id1)
newpairs$id2 = as.character( newpairs$id2)
newpairs$selfpairs = ifelse( newpairs$id1 == newpairs$id2, 1, 0 )
self.tb = newpairs[ newpairs$selfpairs==1, ]
nonself.tb = newpairs[newpairs$selfpairs==0, ]
if(debug) {
print(paste("===selfpairs===="),NULL)
print(self.tb)
print(paste("================="),NULL)
}
if( length(self.tb[,1])>=1 ) {
if ( ncycles == 0) {
#return (c(NA,NA, NA) );
print(paste("ncycles reached zero, ncycles"),ncycles)
print(paste("Abort!"),NULL)
stop;
} else {
ncycles = ncycles - 1
splitPos = round( length(self.tb[,1]) * sqrt(ncycles) ) + 5 #2014 Jan 31 changes
selectedpairs = rbind(self.tb, nonself.tb[1: splitPos, ] )
restpairs = nonself.tb[ (splitPos + 1): length(nonself.tb[,1]), ]
return( rbind(restpairs, permute.pairs.wo.selfpairs(selectedpairs, ncycles)))
}
} else {
return (newpairs)
}
} else {
return( c(NA,NA,NA ))
}
}

#write.table(pairs, "merged_PPIGIN_2014Jan20.tab", quote=F, row.names=F, col.names=F, sep='\t')
net = read.table( "merged_PPIGIN_2014Jan20.tab", header=F, sep="\t", colClass = c("character", "character") )
head(net)
if(debug==9) {
#net = read.table('pair.tab',header=F)
net = net[1:90000,]
}

net2 = permute.pairs.wo.selfpairs( net )
write.csv(net2, "/tmp/net2.csv")
write.csv(net,"/tmp/net.csv")

#do they have the same degree?
t1 = table(c(net[,1],net[,2]))
t2 = table(c(net2[,1],net2[,2]))
comp <- t1 == t2
table(comp)
tf = comp[comp==F]; tf
t1[names(tf)[1]]
t1[names(tf)]
t2[names(tf)]

#note, this R code runs even faster than ms02, presumably because ms02 spent time on network configuration.

#2014 April 7. Bug found. id1-id2, id2-id1 should be considered.

ms02 run, ubuntu server

run.sh
ms02 -i merged_PPIGIN_2014Jan20.tab -o output/_ms02.merged.network.tab -d 2

nohup sh run.sh &

2014-1-31, 12:42 1.8% MEM
System clearly responded slower as soon as the job started.

2014-1-31, 12:46, 'top' shows that 'free' Mem is declining rapidly, suggesting memory leaking of 'ms02'.

Thursday, January 30, 2014

Hill coefficient and coperativity, Hill equation

Hill equation
fraction of occupied sites = 1 /((K/L)^n+1) for positive n

When n -> -n,

the Hill equation become:
fraction of occupied sites = 1/ ((L/K)^n + 1)
which can be used to model inhibition.

http://en.wikipedia.org/wiki/Hill_equation_%28biochemistry%29

Wednesday, January 29, 2014

Resarch.gov and NSF project report guideline

https://www.research.gov/common/robohelp/public/WebHelp/Research.htm#Project_Outcomes_Report.htm

http://www.udel.edu/research/pdf/NSF_Project_Report.pdf

Sample variance, n-1

http://www.youtube.com/watch?v=D1hgiAla3KI

Tuesday, January 28, 2014

Lotus Notes, archive emails to release disk quota.

Archive emails.

After archiving, check the email quota.

The archived email can be found at:

Moments and their generating functions

Moments

http://www.youtube.com/watch?v=peIWy4N6HTA

moments generating functions
http://www.youtube.com/watch?v=kXIdNfvOWak

BIO233, asceptic technique, streak plates, Jan 28, 2014

Eight tubes of Bacillus, 30C incubation.

All students have their own nutrient plates. Students were asked to write their full-last names on the back of the plates.

I did a quick demo, open the tube and tuck the cap with my right small finger, handling the loop with right hand, and the sample tube with left hand.

Streak yeast frozen stocks

Some yeast can precipitate to the bottom of the glycerol stock, especially if they have been thrawed before. Sometime, the white pellet of cells can even be seen at the bottom. In this case, sterile wood sticks work better can tips.

Monday, January 27, 2014

ABRCMS document directory

http://www.abrcms.org/documents/?C=M;O=D

Sort by modification time can be used to search document by years.

Out of application disk space caused by 'ms02'

helen:ms02GINPPI hqin$ cat run.sh
ms02 -i merged_PPIGIN_2014Jan20.tab -o output/_m202.merged.network.tab -d 2

'ms02' may have memory leak and ran out memory.

I remotely login to 'helen.spelman.edu', and found that the shell still works.

I should try R script for comparison.

Friday, January 24, 2014

BIO125, Jan 23, 2014, quiz1, serial dilution, ApE

I spent 20 minutes on Quiz1, which covers unit conversion, how to make solutions. Among the 12 students, about 4 students did very well. About 3 students did not get most questions.

I then demonstrated the correct usage of micropipette. I gathered 5 balances for the lab practice. Common problems:
Most students are not sure how to read the volumes from the various models of micropipette.
Many students are not sure what kind of tips to choose for different micropipette.
Some students use hands to pick tips and then put them back after usage.

Rare problems:
One student used micropipette without tips.

After about 1 hour of practice, the serial dilution started. I use the white board to explain the procedure and standard curve plot. The actual lab went through in about 0.5 hour.

I then started the ApE exercise, how to find gene sequence from GenBank, download it, and load into ApE, find ORF, and translate.

Common problem for the gene sequence analysis:
Many students are unclear about exons and ORF.
Several students submitted the nucleotide sequences of ORF as protein sequences.

Thursday, January 23, 2014

BIO233, Jan 23, 2014, Thu,

Use item analysis to see student performance in quiz 1 and 2.

The Proscope HR in the podium Windows computer is a Pro-version, and offers different versions of filters. I could adjust filters to tune the color of images to show 'purple' and 'pink'. One problem is that there were too much dust on the len. The dust appears as round shaped cells and look confusing. I showed Bacillus and cocci sample using the 40X objective. I was unable to focus spirillum sample using Proscope on screen even using the oiled 100x objective. Maybe the Proscope lens really need to be cleaned.

Wednesday, January 22, 2014

graduate school interview

Common questions
http://gradschool.about.com/od/interviews/a/intquest.htm
http://www.petersons.com/graduate-schools/graduate-admission-tips-interview.aspx

BIO233 Lab, microscope and scale of microbial world,

There is another class before BIO233L, which left me only 10 minutes to prepare the microbe lab.

Use youtube tutorial for microscope usage.

Many students chose to work as group, even though I emphasized that microscope practical test require independent effort.

Problems that students had during the lab: Some used 4x objective to identify bacterial cell shape. Some mistook dye debris for cells. Pink can be mistaken as purple. Because of fission growth, bacillus tend to form long stretch of line. These lines were mistaken as spirillum.

In the future, I should spent more time on Proscope demo show students what cocci, rod, and spirillum cells look like. One problem is that most students could not wait to get their hands on the microscope before I even finished the demo. I probably can ask student volunteer to demonstrate the different shapes of bacteria.

One problem is that 'purple' comes out as 'pink' in Proscope images. Another faculty told me that I should be able to adjust the 'filter' parameter in ProscopHR software to tweak the color of the images.

For the calculation exercises from the scale of microbial world, many students do not know what numbers go to numerator or denominator.

Proscope HR setup with Nikon YS2-T, OS 10.6 laptop.

Download software from http://www.bodelin.com/proscope/software
Pick Mac version.

Make sure that shutter for the vertical path is open before starting Proscope software.

Tuesday, January 21, 2014

BIO233, Jan 21, 2014, Chapter 1, review of microbiology

17 students finished pre-lecture assignments. Many students are not familiar with the 'adaptive mode' for Moodle quizzes.

13 students are on time. I played a youtube video during the waiting time. The video is microbiology overview. I then went through chapter 1, with clicker questions focused on comparing the 1900 and today's major cause of deaths in human populations. The goal is to teach students how to read figures. I started with figure legends, title, axes. Then compare the changes of colored bars in some categories.

I then briefly explain compound microscope. I used two plastic sheets to demonstrate the concept of 'contrast'.

Contrast demo using plastic sheets. Use color to demonstrate contrast.

Announced that there will be a Lab on Wed on microscope and scale of microbes.

BIO125, Jan 21, 2014, MSH2, serial dilution with graphing paper and playdough

1.5 hours spent on going over reading assignments, the abstract of Gammie Genetic 2007.

PubMed usage

MSH2

HNPCC

I used beads again to explain microsatellite and replication errors.

I spent about 40 minutes on sequence alignment, conservation to explain 'cognant' sites.

I use different models of cars to explain that 'wheels' and 'driver seat' are conserved, but the 'back seat' are not.

>1hour on serial dilution with graphing paper and playdough. 10X, 1X, 0.1X

Some students have trouble to understand 10X and 1X and how are connection to absolute concentrations.
C1V1 = C2V2
When applying the mass conservation formula, some students have trouble to decide which volume should the unknown.

Annoucement: read protocol 3.3, mathbench assignment 3.1 for quiz on Thursday.

Monday, January 20, 2014

Smi-log plot, GINPPI mortality rate

The amazingly linear log(m)~t indicate m ~ exp(t), Gompertz model. Lifespan of 2000 individuals were simulated using merged GINPPI, p=0.7, lambda=0.005

I can also identify the best linear section based on R^2. The best linear section is informative on 'initial virtual age'.

Todo: overlay simulated lifespan with different lambda in p=0.7 figures. Esimate G and R and Makeham constant from the linear plot.

When lamba is small and p is high, the initial bin can become too small, as little as 1 per bin. This would lead to highly inaccurate mortality rate estimations. This numerical problem can be mitigated by increase simulation sample size.

The above simulation shows that when bin-size < 5 (in both left and right tails), the mortality rate estimation are very noises.

*** Output merged yeast GIN+PPI data merged_PPIGIN_2014Jan20.tab

20171026: bug found. ORF1 and ORF2 order are not checked, so pairs may not be unique.

In 20131221.DIPGIN.sim.aging_v2.R

Move 'merged_PPIGIN_2014Jan20.tab" to "~/projects/0.ginppi.reliability.simulation/data".

# todo,
# permutation effect on aging?
# lambda ~ 1/connectivity of nodes

# 2013 Dec 20, merge DIP PPI and Genetic Inxt Net -> Multi-net approach
rm(list=ls())

#require(date)
require('flexsurv')
#source("/Users/hongqin/lib/R/lifespan.r")
source("lifespan.r")

#setwd("~/projects/0.network.aging.prj/0.ppi.reliability.simulation")
list.files(path='data', )

debug = 0;

#yeast PPI
#pairs = read.csv('data/pairs.csv', colClasses=c('character','character'))
#this yeast ppi dataset is consistent with Taiwan group's report.
dip = read.csv("data/yeastDIP.csv")
pairsPPI = dip[,c(1,2)]
pairsPPI$ORF1 = as.character(pairsPPI$ORF1)
pairsPPI$ORF2 = as.character(pairsPPI$ORF2)
pairsPPI = pairsPPI[ pairsPPI$ORF1 != pairsPPI$ORF2, ]#remove self-intxns

# yeast genetic network
#gPairs = read.csv("data/sgadata_costanzo2009_stringentCutoff_101120.csv", header=F)
gPairs = read.csv("data/sgadata_costanzo2009_lenientCutoff_101120.csv", header=F)
names(gPairs) = c("ORF1", "Name1", "ORF2", "Name2", NA, NA, NA)
gPairs$ORF1 = as.character( gPairs$ORF1 )
gPairs$ORF2 = as.character( gPairs$ORF2 )

#merge PPI and GIN
pairs = rbind(pairsPPI[,c("ORF1","ORF2")], gPairs[,c("ORF1","ORF2")])

# remove self-intxns
pairs = pairs[ pairs$ORF1 != pairs$ORF2, ]
# 96851 pairs for DIP+GIN.strigentCutoff
# 786118 for DIP+GIN.lenientCutoff

#essential gene info
essenTb = read.csv('data/SummaryRegressionHetHom2013Oct29.csv', colClasses=rep('character', 9))

#######################
# How do the two data set overlap? DIP seems to contain some questionable orfs
uniq.orf.from.pairs = unique(c(pairs$ORF1, pairs$ORF2)) #4207 ORF
matches = uniq.orf.from.pairs %in% unique(essenTb$orf)
table(matches)
#FALSE TRUE
# 720 5507

unmatchedORF = uniq.orf.from.pairs[! matches]

matches = uniq.orf.from.pairs %in% unique(essenTb$orf[essenTb$essenflag=='essential'])
table(matches)
#FALSE TRUE
#5171 1056
#This is a good coverage

matches = uniq.orf.from.pairs %in% unique(essenTb$orf[essenTb$essenflag=='nonessential'])
table(matches)
# FALSE TRUE
# 1839 4388 #this is amazingly consistent with Taiwan group's report.

#remove unmatched orfs from pairs
pairs$Removeflag = ifelse( pairs$ORF1 %in%unmatchedORF | pairs$ORF2 %in%unmatchedORF, T,F )
table(pairs$Removeflag)
# FALSE TRUE
# 572221 213897

#So, the updated DIP has 19770 intxn
#So, the DIP+GIN.lenior lead to 572221 intxns

pairs = pairs[! pairs$Removeflag, ]
table(pairs$Removeflag)
pairs = pairs[,1:2] ##This set of pairs is read for analysis

#write.table(pairs, "merged_PPIGIN_2014Jan20.tab", quote=F, row.names=F, col.names=F, sep='\t')

Todo, teaching, spring 2014

HP laptops.
Respondus lock, ApE installation to
Rstudio and R installation (update)

GINPPI reliability simulation, cutoff=4, popSize=2000

directory: ~/run27.2K.cutoff4/

directory:~/run27.2K.cutoff4.2/

Because I forgot to change the random seed, the simulations in the two runs are exactly the same.

Saturday, January 18, 2014

bio233, day 1, spring 2014. overview

Go over syllabus, grading policies, past students presentation videos, grading policies, etc. I show an A+ example of past student oral presentation. Explain the practice tests.

Discuss of the over-registration problem due to separated lectures and lab listings.

I used name tags to remember student names. I went over 3 round to call each student name.

IRB, academic integrity form, photo and video release forms.

Video on cell phone microbe, white blood celll chasing E coli cells.

Went over first assignment: due before class. I found that Respondus puts 'answers' as part of the last choice, and it is a terrible bug.

BIO125 day 1, Jan 16 Thu, spring 2014, Overview and team building.

Go over syllabus, learning objectives.

Use name tags to introduce everyone.

Lab safety guidelines.

IRB, academic integrity, backgrounds, photo and video release forms.

Tested clickers using questions on some simple yeast facts.

Test HP laptop logins, wireless connections.

Students finished the surveys and self-exercises on DNA structures in class.

Use play-dough to make right-handed DNA double-helices in class, a team building activity. Two groups finished smoothly, The third group made a left-handed helix twice. I made a right-hand snake and a right-handed snake as demo. I show the students they should first make a pair of parallel strands, make a right-hand twist, and then lay flat on the counter surface. All groups made right AT and GC pairs. Most students googled on the number of hydrogen bonds and the number of bps in each DNA helix loop.

Three HP laptops in the rear corner on the left-hand side have wireless connection problems. Ethernet were used.

Friday, January 17, 2014

Fisher exact test, demo

Fisher test and hypergeometric tests are often used for enrichment test.

cluster = c(10, 90)
network= c(100,9000)
tb = rbind(cluster, network)
fisher.test(tb)
ft = fisher.test(tb)

Reference:
https://mail.google.com/mail/u/0/?shva=1#circle/students/1437f7c3a77a24bf

See also hypergeometric test
http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.0030039#s4

Thursday, January 16, 2014

ApE on pMSH2, and mutant identification

Tools-> Align
Align yeastMSH2 CDS (remove last codon of TAA) to plasmid pMSH2 sequence.
There is a gap in the alignment, which was suggested to be the HA tag by Dr. Wang.

http://en.wikipedia.org/wiki/HA-tag

Edit -> Select
Select from 3378 to 6302 MSH2 CDS for 975AA
New Feature, feature type -> CDS

enzymes, Graphic map

The same method can be used to identify mutant msh2 ORF. Translated mutant ORF can be used to align with WT and identify mutations.

Resources on apE (a plasmid editor)

http://bcrc.bio.umass.edu/intro/content/guide-ape-plasmid-editor-application

Tuesday, January 14, 2014

writing notes

apply something to doing something

compare to and with

vowels and consonants

adjectives, adverbs, articles, clauses, pronouns

seek help (not helps) from the instructor.

Appropriate

Tense

weighted vs waited

Students have problems with spelling

Shimon Gibson, an American archaeologist who was not involved in the dig, said the find was truly amazing, less because of its Roman origins than for its precious nature.

Introduction introductory

veracity

More than one method IS (not are) used.

Different kind of feedback (no plural form)

commensurate with: equal in measure or extent with sth

Inte(:i)grate inte(:e)grity

Sludge

attenuate, mitigate, weaken, decrease,

calibrate

spread, spread, spread

esoteric--understood by only a group of people

illude: to deceive

allude: to mention indirectly

undecamic complex ??

facetious

formal members -- formally speaking

this is the same as somthing (a noun); this is same as we did last time (a sentence)

misleading, not "miss leading"

lose loose

cotton (catn, not corton)

a smattering of Americans and swarms of executives from China,

jury vs jewelry

parabolic

atrium

devin - david

spun down

assist in doing sth

impose on you

lantern

to reckon with

network

recite recitation

stultify

constipation

ballute

connotation

labatory not labatorial

cash == cache

produce product

analyze analysis

subsequent 's&b-si-kw&nt <-> sequent

aerobic, anaerobic

senescence

philadelphia

red, retrograde

conducive work environment

network

phone [' fOn]

nonsignificant

principal investigator, school principal

inform somebody of doing something

consider of doing something

dissension

WORD format for Respondus

Questions are numbered 1, 2, 3, ...
Answers are numbered A, B, C, D, ...

Tips: It appears format such as tab, parenthesis, space should be as consistent as possible for Respondus to work properly.

20140921. Format with extra line space in DOCX seemed to cause problem.

https://php.radford.edu/~knowledge/lore/attachment.php?id=60‎

Paper activity for serial dilutions

See also youtube demo at
http://www.youtube.com/watch?v=_araJt0GOvU

Make colored copies of graphing papers: red, blue, and white.

Basic idea: 1 cm2 of paper = 1 part of Things or water.
Red papers represent Things
Blue papers represent water.
White papers are containers.

For 10X red stock:
Draw a 20 parts rectangle in a white graphing paper. This is volume of the 10X stock solution.
Cut 10 part red and 10 part blue paper to fill the space.
The concentration of this solution is 50% red.

To make 1X 20 parts solution, we need 2 parts of 10X. The 2 parts of 10X contain 1 part of red and 1 part of blue (water).
Draw a 20 parts rectangle, cut 1 part of red and 1 part of blue from the 10X stock area, and move to the 1X area. The remaining area of the 1X stock is 18 parts and should be filled with water (blue papers).
The concentration of this 1X solution is 5% red.

To make 0.1X 20 parts solution, we need 2 parts of 1X which contains 2x5%=0.1 part of red and 1.9 parts of blue (water). The concentration of this solution is 0.5% red.

-----------------------The older version, v1---------------
Make colored copies of graph papers: red, blue, and white.

Basic idea: 1 cm2 of paper = 1 part of things or water

10X red stock: 10 cm2 red paper + 10 cm2 white paper.
The concentration of this solution is 50% red, and its total volume is 20 parts.

5X blue stock: 5 cm2 blue paper + 15 cm2 white paper.
The concentration of this solution is 25% blue, and its total volume is 20 parts.

Exercise 1: To make a volume of 20 parts solution of 1X red and 1X blue, we need 1 part of red and 1 part of blue:
1. Use scissor to cut 1 part (1cm2) of red and 1 part (1cm2) of water from 10X red stock, add to an empty beaker labelled with "1X red, 1X blue".
2. Use scissor to cut 1 part (1cm2) of blue and 3 part of (3 cm2) of water from 5X blue stock, add to the '1X red 1Xblue' beaker.
3. How much water should we add?
20 - 1(red) - 1(water) - 1 (blue) - 3 (water) = 14 parts of water.
This 1X solution should contain 1cm2 red, 1cm2 blue, and 18 cm2 water.

Exercise 2: To make a 1/10 X solution in a volume of 50 parts:
1. Take 10 part of 1X: Cut 0.5 part red, 0.5 part blue, 9 part of water from 1X stocks
2. Add 50 - 0.5 - 0.5 - 9 = 40 part of water.

Materials needed: Three beakers; 1 piece of red graph paper, 1 piece of blue graph paper, 3 pieces of white graph papers; 1 Scissor.

H2O2-LOH, strain-averaged data, distribution of Cv and Cb.

Distribution of Cv and Cb in yeast natural isolates

Stain-averaged data are in file "LOHH2O2_averaged20131210_v1.csv". These data were obtained by supervised gnls fitting.

Distribution of H2O2-viability midpoints (Cv) :

Largest change is 0.17/0.0065 = 26 fold
Histogram of H2O2-viability is:

Distribution of H2O2-LOH midpoints (Cb) :

Largest change is 0.086/0.0065 = 13 fold.
Histogram H2O2-LOH is:

Working directory is ~/projects/0.LOH_H2O2_2012-master/analysis
Script is _3b.2013Dec10.summarizeFiting.H2O2LOH.R

*** Useful R tips

Read csv is much faster than xlsx.

tb = read.csv(fullFileName, colClasses=c("character",NA, NA, "character", rep("numeric",8 ), NA));

options(echo=TRUE) # if you want see commands in output file
args <- commandArgs(trailingOnly = TRUE)print(args)
# trailingOnly=TRUE means that only your arguments are returned, check:
# print(commandsArgs(trailingOnly=FALSE))i = as.integer(args[1])
j = as.integer(args[2])
x = seq(i, j)
print(x)

R -f R-args.R --args 2 5
Rscript file

#from lower case to upper case

chartr(old, new, x)

tolower(x)

toupper(x)

casefold(x, upper = FALSE)

require(stringr)
conditions$media[r] = str_replace( conditions$media[r], "\\/", "")

tb$AssignmentTotal= apply(tb[, assignments], 1, FUN=function(x){sum(x,na.rm=T)} )

http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-can-I-create-rotated-axis-labels_003f

cumsum()
with()

R -f filename

axis( 2, at=pretty(tbf$s), tcl=0.2, las=2 ) #rotate axis labels

text( tb$G + 0.01*nchar(tb$strain)/4, log10(tb$R0)-0.1*nchar(tb$strain)/4, tb$strain, pos=3)

layout(mat, heights= c( 1.15, rep(1, nrow(mat)-2), 1.2) );
par(mar=c(5.1,4.1,4.1,2.1)
http://www.r-bloggers.com/setting-graph-margins-in-r-using-the-par-function-and-lots-of-cow-milk/

text ( aa, bb, t, cex=0.8);

####aplha
names(fit)[ grep("alpha", names(fit))]
fit_alpha_tb = data.frame( t( fit[, grep("alpha", names(fit)) ]))
rownames(fit_alpha_tb) = names(fit)[grep("alpha", names(fit))]
fit_alpha_tb$names = gsub("_.*", "", rownames(fit_alpha_tb))

library(RColorBrewer);
#hmcol = colorRampPalette(brewer.pal(5,"RdBu"))(8);
hmcol = colorRampPalette(brewer.pal(3,"Blues"))(8);

format(Sys.time(), "%a %b %d %H:%M:%S %Y")

format(Sys.time(), "%Y%b%d_%H%M%S")

#regular expression

require(org.Sc.sgd.db)
x <- org.Sc.sgdALIAS
ls(x)[grep("^Y..\\d{3}", ls(x))]

http://www.regular-expressions.info/rlanguage.html
http://www.r-bloggers.com/regular-expressions-in-r-vs-rstudio/

list.files for the contents of a directory.

normalizePath for a ‘canonical’ path name.

(WD <- getwd())

if (!is.null(WD)) setwd(WD)

require(xlsx) # read Excel in R.

Usage

! x
x & y
x && y
x | y
x || y
xor(x, y)

rm(list=ls() );



unlist(strsplit("a.b.c", "\\."))
-----
str(x)
attributes(x)
--------------
outer( month.abb, 1999:2003, FUn="paster");

Letters <- c( LETTERS, letters);
Letters[ ! sapply(Letters, function(xx) exists(xx) ) ]; # anonymous function as a wrapper for a primitive function

------------
 legend(100,60, seq(100,200,1), lty=1) # line legends

  Library(MASS); example(Skye); #tenary plot

 library(help = survivial)   

ColorBrewer.org

useful comnds:
x11; factor; relevel; class; loess; contour; is.element; math %in%; grep; sample; nrow; 
grepmisc: hist2d
url()
---
class and object
CA@a[1]

----

test1 <- list( time= c(4, 3,1,1,2,2,3),
  status=c(1,NA,1,0,1,1,0),
  x= c(0, 2,1,1,1,0,0),
  sex= c(0, 0,0,0,1,1,1))
coxph( Surv(time, status) ~ x + strata(sex), test1) #stratified model

----
delete NA form matrix

 > x<-matrix(1:16,4,4)
 > x[col(x)>=row(x)]<-NA
 > x[,! apply(x,2,function(x) all(is.na(x))) ]
     [,1] [,2] [,3]
[1,]   NA   NA   NA
[2,]    2   NA   NA
[3,]    3    7   NA
[4,]    4    8   12

----
? R/Splus Perl interface   RSperl
? R Pythong interface Rpy Rpython
not in CRAN
----
date.grouping <- function(d) {
  # for ea date in d calculate date beginning 6 month period which contains it
  mat <- matrix(as.numeric(unlist(strsplit(as.character(d),"-"))),nr=2)
  f <- function(x) do.call( "ISOdate", as.list(x) )
  POSIXct.dates <- apply(rbind(mat,1),2,f) + ISOdate(1970,1,1)
  breaks <- c(seq(from=min(POSIXct.dates), to=max(POSIXct.dates), by="6 mo"), Inf)
  format( as.POSIXct( cut( POSIXct.dates, breaks, include.lowest=T )), "%Y-%m" ) }

----
nonlinear regression
library(nls)
----
http://www.bioconductor.org/
----
library(lattice)
----
persp()
----
las=1 or 2
You can use the graphics parameter "srt" to rotate displayed text by a specified number of degrees,
 e.g. srt=45 to put it on an angle, srt=90 to put it vertical.  

----
cnams = dimnames(aa)[[2]]

cnams[which(cnams == 'blah3.Mg')] = 'Mg (%)'
...
dimnames(aa)[[2]] = cnams
----
eval(substitute(lf <- locfit(~s, data=age), list(s=s)))
------
  sub=sort(sample(x,200, replace=F))
  postscript("try.ps")
  matplot(x[sub],y[sub,],type="l",lwd=5)     
  dev.off()

-----

>Does anyone know if R has the functionality to calculate a simple 
>moving average. I cant seem to find it in the help menu.

filter in library ts. does filter() do what you need?
Or look at the 'running' function in the gregmisc package.


moving.average <- 
function(x, k) { 
 n <- length(x) 
 y <- rep(0, n) 
 for (i in (1+k):n) 
   y[i] <- mean(x[(i-k):i]) 
 return(y)
 } 

----
tree packages
----

# Create an Example Data Frame Containing Car x Color data, with long car names 
carnames <- c("BMW: High End, German",
              "Renault: Medium End, French",
              "Mercedes: High End, German", 
              "Seat: Imaginary, Unknown Producer")
carcolors <- c("red","white","silver","green")
datavals <- round(rnorm(16, mean=100, sd=60),1)
data <- data.frame(Car=rep(carnames,4),
                   Color=rep(carcolors, c(4,4,4,4) ),
                   Value=datavals )

# generate balloon plot with default scaling, the column labels will overlap 
# balloonplot( data$Color, data$Car, data$Value)


# try again, with column labels rodated 90 degrees, and given more space 
balloonplot( data$Car, data$Color, data$Value, colmar=3, colsrt=90)



----

Here is a very rough addlogo() using pixmap:

"addlogo" <- function(x, y, pixmap) {
    if (is.list(x)) {
        y <- x$y
        x <- x$x
    }
    else if (missing(y)) 
        stop("missing y")
    if (!is.numeric(x) || !is.numeric(y)) 
        stop("non-numeric coordinates")
    if ((nx <- length(x)) <= 1 || nx != length(y) || nx > 2) 
        stop("invalid coordinate lengths")
    pixmap@bbox[1] <- x[1]
    pixmap@bbox[2] <- y[1]
    pixmap@bbox[3] <- x[2]
    pixmap@bbox[4] <- y[2]
    pixmap@cellres[1] <- (pixmap@bbox[3] - pixmap@bbox[1]) / pixmap@size[2]
    pixmap@cellres[2] <- (pixmap@bbox[4] - pixmap@bbox[2]) / pixmap@size[1]
    plot(pixmap, add=TRUE)
    invisible(pixmap)
}

which will work with locator() too. To maintain aspect, it shouldn't alter 
the relative cell resolutions, and should just use the new x or y, bur 
this is the general case. The handling of the location of the logo is 
copied & pasted from legend().



----
x <- readLines(myfile)
strsplit(substring(x,8),split="")
----

Monday, January 13, 2014

Sharing ipad screen on an Apple laptop

Air server

On ipad, look for 'Airplay' icon in the application panel by double-clicking home button

Konical Minotla c745 osX 10.6.8 install, failed.

http://onyxweb.mykonicaminolta.com/OneStopProductSupport?productId=1487

The install basically hangs in the above window..., ....

How to write lab notes

Key elements:
Date
Title of experiments
Names of all members
Aims,
Materials, methods and procedures, such as sequences of primers.
Results and analysis, such as pictures of gels, plate, colony counts, etc. Lanes in gels should be labeled.
Summary/Conclusions
References

Relevant links:
http://ecee.colorado.edu/~mcleod/teaching/ugol/references/Good%20Lab%20Notebooks%20by%20Carter.pdf

http://misterguch.brinkster.net/sept2000.pdf

Sample notebooks
http://hongqinlab.blogspot.com/2012_02_01_archive.html
http://glanglab.com/notebooks.html
http://en.wikipedia.org/wiki/Open_notebook_science#cite_note-usefulchem.wikispaces-8

Gene Ontology dataset

ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/

Friday, January 10, 2014

Lotusnote traveler, iPad/iPhone

Pearson Chi-square test with specified probability using chisq.test() in R

chisq.test function in R

> chisq.test
function (x, y = NULL, correct = TRUE, p = rep(1/length(x), length(x)),
    rescale.p = FALSE, simulate.p.value = FALSE, B = 2000)
{
    DNAME <- deparse(substitute(x))
    if (is.data.frame(x))
        x <- as.matrix(x)
    if (is.matrix(x)) {
        if (min(dim(x)) == 1L)
            x <- as.vector(x)
    }
    if (!is.matrix(x) && !is.null(y)) {
        if (length(x) != length(y))
            stop("'x' and 'y' must have the same length")
        DNAME2 <- deparse(substitute(y))
        xname <- if (length(DNAME) > 1L || nchar(DNAME, "w") >
            30)
            ""
        else DNAME
        yname <- if (length(DNAME2) > 1L || nchar(DNAME2, "w") >
            30)
            ""
        else DNAME2
        OK <- complete.cases(x, y)
        x <- factor(x[OK])
        y <- factor(y[OK])
        if ((nlevels(x) < 2L) || (nlevels(y) < 2L))
            stop("'x' and 'y' must have at least 2 levels")
        x <- table(x, y)
        names(dimnames(x)) <- c(xname, yname)
        DNAME <- paste(paste(DNAME, collapse = "\n"), "and",
            paste(DNAME2, collapse = "\n"))
    }
    if (any(x < 0) || any(is.na(x)))
        stop("all entries of 'x' must be nonnegative and finite")
    if ((n <- sum(x)) == 0)
        stop("at least one entry of 'x' must be positive")
    if (simulate.p.value) {
        setMETH <- function() METHOD <<- paste(METHOD, "with simulated p-value\n\t (based on",
            B, "replicates)")
        almost.1 <- 1 - 64 * .Machine$double.eps
    }
    if (is.matrix(x)) {
        METHOD <- "Pearson's Chi-squared test"
        nr <- nrow(x)
        nc <- ncol(x)
        sr <- rowSums(x)
        sc <- colSums(x)
        E <- outer(sr, sc, "*")/n
        v <- function(r, c, n) c * r * (n - r) * (n - c)/n^3
        V <- outer(sr, sc, v, n)
        dimnames(E) <- dimnames(x)
        if (simulate.p.value && all(sr > 0) && all(sc > 0)) {
            setMETH()
            tmp <- .C(C_chisqsim, as.integer(nr), as.integer(nc),
                as.integer(sr), as.integer(sc), as.integer(n),
                as.integer(B), as.double(E), integer(nr * nc),
                double(n + 1), integer(nc), results = double(B))
            STATISTIC <- sum(sort((x - E)^2/E, decreasing = TRUE)) #Pearson chisq
            PARAMETER <- NA
            PVAL <- (1 + sum(tmp$results >= almost.1 * STATISTIC))/(B +
                1)
        }
        else {
            if (simulate.p.value)
                warning("cannot compute simulated p-value with zero marginals")
            if (correct && nrow(x) == 2 && ncol(x) == 2) {
                YATES <- min(0.5, abs(x - E))
                if (YATES > 0)
                  METHOD <- paste(METHOD, "with Yates' continuity correction")
            }
            else YATES <- 0
            STATISTIC <- sum((abs(x - E) - YATES)^2/E)
            PARAMETER <- (nr - 1) * (nc - 1)
            PVAL <- pchisq(STATISTIC, PARAMETER, lower.tail = FALSE)
        }
    }
    else {
        if (length(x) == 1L)
            stop("'x' must at least have 2 elements")
        if (length(x) != length(p))
            stop("'x' and 'p' must have the same number of elements")
        if (any(p < 0))
            stop("probabilities must be non-negative.")
        if (abs(sum(p) - 1) > sqrt(.Machine$double.eps)) {
            if (rescale.p)
                p <- p/sum(p)
            else stop("probabilities must sum to 1.")
        }
        METHOD <- "Chi-squared test for given probabilities"
        E <- n * p
        V <- n * p * (1 - p)
        names(E) <- names(x)
        STATISTIC <- sum((x - E)^2/E)#Pearson chi-sq with input p
        if (simulate.p.value) {
            setMETH()
            nx <- length(x)
            sm <- matrix(sample.int(nx, B * n, TRUE, prob = p),
                nrow = n)
            ss <- apply(sm, 2L, function(x, E, k) {
                sum((table(factor(x, levels = 1L:k)) - E)^2/E)
            }, E = E, k = nx)
            PARAMETER <- NA
            PVAL <- (1 + sum(ss >= almost.1 * STATISTIC))/(B +
                1)
        }
        else {
            PARAMETER <- length(x) - 1
            PVAL <- pchisq(STATISTIC, PARAMETER, lower.tail = FALSE)
        }
    }
    names(STATISTIC) <- "X-squared"
    names(PARAMETER) <- "df"
    if (any(E < 5) && is.finite(PARAMETER))
        warning("Chi-squared approximation may be incorrect")
    structure(list(statistic = STATISTIC, parameter = PARAMETER,
        p.value = PVAL, method = METHOD, data.name = DNAME, observed = x,
        expected = E, residuals = (x - E)/sqrt(E), stdres = (x -
            E)/sqrt(V)), class = "htest")
}

Thursday, January 9, 2014

Spelman portal, IE setting

https://my.spelman.edu/group/mycampus/home