Friday, January 31, 2014

R, permute network without selfpairing, bug fix

#permute merged yeast PPI+GIN, fix a bug that insert "NA" into new networks. 

debug = 0
permute.pairs.wo.selfpairs = function( inpairs,  ncycles=10, debug=1 ) {#bug, id1-id2 and id2-id1 are not considered. 
  if (ncycles >= 1 ) {
    if(debug) {
      print(paste('ncycles=', ncycles))
    longids = c(as.character(inpairs[,1]), as.character(inpairs[,2]) )
    longids = sample(longids)
    len = length(inpairs[,1])
    newpairs = data.frame( cbind( longids[1:len], longids[(len+1): (2*len)]) )
    names(newpairs) = c('id1', 'id2')
    newpairs$id1 = as.character( newpairs$id1)
    newpairs$id2 = as.character( newpairs$id2)    
    newpairs$selfpairs = ifelse( newpairs$id1 == newpairs$id2, 1, 0 )
    self.tb = newpairs[ newpairs$selfpairs==1, ]
    nonself.tb = newpairs[newpairs$selfpairs==0, ]
    if(debug) {
    if( length(self.tb[,1])>=1 ) {
      if ( ncycles == 0) { 
        #return (c(NA,NA, NA) );
        print(paste("ncycles reached zero, ncycles"),ncycles)
      } else {
        ncycles = ncycles - 1
        splitPos = round( length(self.tb[,1]) * sqrt(ncycles) ) + 5  #2014 Jan 31 changes
        selectedpairs = rbind(self.tb,  nonself.tb[1: splitPos, ] )
        restpairs = nonself.tb[ (splitPos + 1): length(nonself.tb[,1]), ]
        return( rbind(restpairs, permute.pairs.wo.selfpairs(selectedpairs, ncycles)))
    } else {  
      return (newpairs)
  } else {
    return( c(NA,NA,NA )) 

#write.table(pairs, "", quote=F, row.names=F, col.names=F, sep='\t')
net = read.table( "", header=F, sep="\t", colClass = c("character", "character") )
if(debug==9) { 
  #net = read.table('',header=F) 
 net = net[1:90000,]

net2 = permute.pairs.wo.selfpairs( net  )
write.csv(net2, "/tmp/net2.csv")

#do they have the same degree?
t1 = table(c(net[,1],net[,2]))
t2 = table(c(net2[,1],net2[,2]))
comp <- t1 == t2
tf = comp[comp==F]; tf

#note, this R code runs even faster than ms02, presumably because ms02 spent time on network configuration. 

#2014 April 7. Bug found. id1-id2, id2-id1 should be considered. 

ms02 run, ubuntu server
ms02 -i -o output/ -d 2

nohup sh &

2014-1-31, 12:42 1.8% MEM
System clearly responded slower as soon as the job started.

2014-1-31, 12:46, 'top' shows that 'free' Mem is declining rapidly, suggesting memory leaking of 'ms02'.

Thursday, January 30, 2014

Hill coefficient and coperativity, Hill equation

Hill equation
fraction of occupied sites = 1 /((K/L)^n+1)  for positive n

When n -> -n,

the Hill equation become:
fraction of occupied sites = 1/ ((L/K)^n + 1)
which can be used to model inhibition.

Wednesday, January 29, 2014 and NSF project report guideline

Sample variance, n-1

Tuesday, January 28, 2014

Lotus Notes, archive emails to release disk quota.

Archive emails.

After archiving, check the email quota.

The archived email can be found at:

Moments and their generating functions


moments generating functions

BIO233, asceptic technique, streak plates, Jan 28, 2014

Eight tubes of Bacillus, 30C incubation.

All students have their own nutrient plates. Students were asked to write their full-last names on the back of the plates.

I did a quick demo, open the tube and tuck the cap with my right small finger, handling the loop with right hand, and the sample tube with left hand.

Streak yeast frozen stocks

Some yeast can precipitate to the bottom of the glycerol stock, especially if they have been thrawed before. Sometime, the white pellet of cells can even be seen at the bottom. In this case, sterile wood sticks work better can tips.

Monday, January 27, 2014

ABRCMS document directory;O=D

Sort by modification time can be used to search document by years.

Out of application disk space caused by 'ms02'

helen:ms02GINPPI hqin$ cat
ms02 -i -o output/ -d 2

'ms02' may have memory leak and ran out memory.

I remotely login to '', and found that the shell still works.

I should try R script for comparison.

Friday, January 24, 2014

BIO125, Jan 23, 2014, quiz1, serial dilution, ApE

I spent 20 minutes on Quiz1, which covers unit conversion, how to make solutions.  Among the 12 students, about 4 students did very well.  About 3 students did not get most questions.

I then demonstrated the correct usage of micropipette.  I gathered 5 balances for the lab practice. Common problems:
Most students are not sure how to read the volumes from the various models of micropipette. 
Many students are not sure what kind of tips to choose for different micropipette.
Some students use hands to pick tips and then put them back after usage.

Rare problems:
One student used micropipette without tips.

After about 1 hour of practice, the serial dilution started.  I use the white board to explain the procedure and standard curve plot. The actual lab went through in about 0.5 hour.

I then started the ApE exercise, how to find gene sequence from GenBank, download it, and load into ApE, find ORF, and translate.

Common problem for the gene sequence analysis:
Many students are unclear about exons and ORF.
Several students submitted the nucleotide sequences of ORF as protein sequences.

Thursday, January 23, 2014

BIO233, Jan 23, 2014, Thu,

Use item analysis to see student performance in quiz 1 and 2.

 The Proscope HR in the podium Windows computer is a Pro-version, and offers different versions of filters. I could adjust filters to tune the color of images to show 'purple' and 'pink'.  One problem is that there were too much dust on the len.  The dust appears as round shaped cells and look confusing. I showed Bacillus and cocci sample using the 40X objective. I was unable to focus spirillum sample using Proscope on screen even using the oiled 100x objective. Maybe the Proscope lens really need to be cleaned.

Wednesday, January 22, 2014

graduate school interview

Common questions

BIO233 Lab, microscope and scale of microbial world,

There is another class before BIO233L, which left me only 10 minutes to prepare the microbe lab. 

Use youtube tutorial for microscope usage.

Many students chose to work as group, even though I emphasized that microscope practical test require independent effort.

Problems that students had during the lab: Some used 4x objective to identify bacterial cell shape. Some mistook dye debris for cells.  Pink can be mistaken as purple. Because of fission growth, bacillus tend to form long stretch of line.  These lines were mistaken as spirillum.

In the future, I should spent more time on  Proscope demo show students what cocci, rod, and spirillum cells look like.  One problem is that most students could not wait to get their hands on the microscope before I even finished the demo. I probably can ask student volunteer to demonstrate the different shapes of bacteria.
One problem is that 'purple' comes out as 'pink' in Proscope images.  Another faculty told me that I should be able to adjust the 'filter' parameter in ProscopHR software to tweak the color of the images.

For the calculation exercises from the scale of microbial world, many students do not know what numbers go to numerator or denominator.

Proscope HR setup with Nikon YS2-T, OS 10.6 laptop.

Download software from
Pick Mac version.

Make sure that shutter for the vertical path is open before starting Proscope software.

Tuesday, January 21, 2014

BIO233, Jan 21, 2014, Chapter 1, review of microbiology

17 students finished pre-lecture assignments. Many students are not familiar with the 'adaptive mode' for Moodle quizzes.

13 students are on time. I played a youtube video during the waiting time. The video is microbiology overview. I then went through chapter 1, with clicker questions focused on comparing the 1900 and today's major cause of deaths in human populations. The goal is to teach students how to read figures. I started with figure legends, title, axes. Then compare the changes of colored bars in some categories.
I then briefly explain compound microscope. I used two plastic sheets to demonstrate the concept of 'contrast'. 

Contrast demo using plastic sheets. Use color to demonstrate contrast.

Announced that there will be a Lab on Wed on microscope and scale of microbes. 

BIO125, Jan 21, 2014, MSH2, serial dilution with graphing paper and playdough

1.5 hours spent on going over reading assignments, the abstract of Gammie Genetic 2007. 
PubMed usage

I used beads again to explain microsatellite and replication errors.

I spent about 40 minutes on sequence alignment, conservation to explain 'cognant' sites.
I use different models of cars to explain that 'wheels' and 'driver seat' are conserved, but the 'back seat' are not.

>1hour on serial dilution with graphing paper and playdough. 10X, 1X, 0.1X
Some students have trouble to understand 10X and 1X and how are connection to absolute concentrations.
C1V1 = C2V2
When applying the mass conservation formula, some students have trouble to decide which volume should the unknown.
Annoucement: read protocol 3.3, mathbench assignment 3.1 for quiz on Thursday. 

Monday, January 20, 2014

Smi-log plot, GINPPI mortality rate

The amazingly linear log(m)~t indicate m ~ exp(t), Gompertz model. Lifespan of 2000 individuals were simulated using merged GINPPI, p=0.7, lambda=0.005

I can also identify the best linear section based on R^2. The best linear section is informative on 'initial virtual age'.

Todo: overlay simulated lifespan with different lambda in p=0.7 figures. Esimate G and R and Makeham constant from the linear plot.

When lamba is small and p is high, the initial bin can become too small, as little as 1 per bin. This would lead to highly inaccurate mortality rate estimations. This numerical problem can be mitigated by increase simulation sample size.

The above simulation shows that when bin-size < 5 (in both left and right tails), the mortality rate estimation are very noises.

*** Output merged yeast GIN+PPI data

20171026: bug found. ORF1 and ORF2 order are not checked, so pairs may not be unique. 

In 20131221.DIPGIN.sim.aging_v2.R

Move '" to  "~/projects/0.ginppi.reliability.simulation/data".

# todo,
# permutation effect on aging?
# lambda ~ 1/connectivity of nodes

# 2013 Dec 20, merge DIP PPI and Genetic Inxt Net -> Multi-net approach


list.files(path='data', )

debug = 0;

#yeast PPI
#pairs = read.csv('data/pairs.csv', colClasses=c('character','character'))
#this yeast ppi dataset is consistent with Taiwan group's report.
dip = read.csv("data/yeastDIP.csv")
pairsPPI = dip[,c(1,2)]
pairsPPI$ORF1 = as.character(pairsPPI$ORF1)
pairsPPI$ORF2 = as.character(pairsPPI$ORF2)
pairsPPI = pairsPPI[ pairsPPI$ORF1 != pairsPPI$ORF2, ]#remove self-intxns

# yeast genetic network
#gPairs = read.csv("data/sgadata_costanzo2009_stringentCutoff_101120.csv", header=F)
gPairs = read.csv("data/sgadata_costanzo2009_lenientCutoff_101120.csv", header=F)
names(gPairs) = c("ORF1", "Name1", "ORF2", "Name2", NA, NA, NA)
gPairs$ORF1 = as.character( gPairs$ORF1 )
gPairs$ORF2 = as.character( gPairs$ORF2 )

#merge PPI and GIN
pairs = rbind(pairsPPI[,c("ORF1","ORF2")], gPairs[,c("ORF1","ORF2")])

# remove self-intxns
pairs = pairs[ pairs$ORF1 != pairs$ORF2, ]
# 96851 pairs for DIP+GIN.strigentCutoff
# 786118 for DIP+GIN.lenientCutoff

#essential gene info
essenTb = read.csv('data/SummaryRegressionHetHom2013Oct29.csv', colClasses=rep('character', 9))

# How do the two data set overlap? DIP seems to contain some questionable orfs
uniq.orf.from.pairs = unique(c(pairs$ORF1, pairs$ORF2)) #4207 ORF
matches = uniq.orf.from.pairs %in% unique(essenTb$orf)
# 720  5507

unmatchedORF = uniq.orf.from.pairs[! matches]

matches = uniq.orf.from.pairs %in% unique(essenTb$orf[essenTb$essenflag=='essential'])
#5171   1056
#This is a good coverage

matches = uniq.orf.from.pairs %in% unique(essenTb$orf[essenTb$essenflag=='nonessential'])
# 1839  4388  #this is amazingly consistent with Taiwan group's report.

#remove unmatched orfs from pairs
pairs$Removeflag = ifelse( pairs$ORF1 %in%unmatchedORF | pairs$ORF2 %in%unmatchedORF, T,F   )
# 572221  213897

#So, the updated DIP has         19770 intxn
#So, the DIP+GIN.lenior lead to 572221 intxns

pairs = pairs[! pairs$Removeflag, ]
pairs = pairs[,1:2]  ##This set of pairs is read for analysis

#write.table(pairs, "", quote=F, row.names=F, col.names=F, sep='\t')

Todo, teaching, spring 2014

HP laptops.
Respondus lock, ApE installation to
Rstudio and R installation (update)

GINPPI reliability simulation, cutoff=4, popSize=2000

directory: ~/run27.2K.cutoff4/

Because I forgot to change the random seed, the simulations in the two runs are exactly the same. 

Saturday, January 18, 2014

bio233, day 1, spring 2014. overview

Go over syllabus, grading policies, past students presentation videos, grading policies, etc. I show an A+ example of past student oral presentation. Explain the practice tests.

Discuss of the over-registration problem due to separated lectures and lab listings.

I used name tags to remember student names.  I went over 3 round to call each student name.

IRB, academic integrity form, photo and video release forms.

Video on cell phone microbe, white blood celll chasing E coli cells.

Went over first assignment: due before class. I found that Respondus puts 'answers' as part of the last choice, and it is a terrible bug.

BIO125 day 1, Jan 16 Thu, spring 2014, Overview and team building.

Go over syllabus, learning objectives.

Use name tags to introduce everyone.

Lab safety guidelines.

IRB, academic integrity, backgrounds, photo and video release forms.

Tested clickers using questions on some simple yeast facts.

Test HP laptop logins, wireless connections.

Students finished the surveys and self-exercises on DNA structures in class.

Use  play-dough to make right-handed DNA double-helices in class, a team building activity.  Two groups finished smoothly, The third group made a left-handed helix twice. I made a right-hand snake and a right-handed snake as demo. I show the students they should first make a pair of parallel strands,  make a right-hand twist, and then lay flat on the counter surface.  All groups made right AT and GC pairs. Most students googled on the number of hydrogen bonds and the number of bps in each DNA helix loop.

Three HP laptops in the rear corner on the left-hand side have wireless connection problems. Ethernet were used.

Friday, January 17, 2014

Fisher exact test, demo

Fisher test and hypergeometric tests are often used for enrichment test.

cluster = c(10, 90)
network= c(100,9000)
tb = rbind(cluster, network)
ft = fisher.test(tb)


See also hypergeometric test

Thursday, January 16, 2014

ApE on pMSH2, and mutant identification

Tools-> Align
Align yeastMSH2 CDS (remove last codon of TAA) to plasmid pMSH2 sequence.
There is a gap in the alignment, which was suggested to be the HA tag by Dr. Wang.

Edit -> Select
Select from 3378 to 6302 MSH2 CDS for 975AA
New Feature, feature type -> CDS

enzymes, Graphic map

The same method can be used to identify mutant msh2 ORF.  Translated mutant ORF can be used to align with WT and identify mutations.

Resources on apE (a plasmid editor)

Tuesday, January 14, 2014

writing notes

apply something to doing something

compare to and with

vowels and consonants
adjectives, adverbs, articles, clauses, pronouns

seek help (not helps) from the instructor. 



weighted vs waited

Students have problems with spelling

Shimon Gibson, an American archaeologist who was not involved in the dig, said the find was truly amazing, less because of its Roman origins than for its precious nature.

Introduction introductory

More than one method IS (not are) used.  

Different kind of feedback (no plural form)

commensurate with: equal in measure or extent with sth

Inte(:i)grate inte(:e)grity


attenuate, mitigate, weaken, decrease, 


spread, spread, spread

esoteric--understood by only a group of people

illude: to deceive
allude: to mention indirectly

undecamic complex ??

formal members -- formally speaking
this is the same as somthing (a noun);   this is same as we did last time (a sentence)
misleading, not "miss leading"
lose loose
cotton (catn, not corton)
a smattering of Americans and swarms of executives from China,
jury vs jewelry
devin - david
spun down
assist in doing sth
impose on you
to reckon with
recite recitation

labatory not labatorial

cash == cache

produce product
analyze analysis
subsequent  's&b-si-kw&nt <-> sequent
aerobic, anaerobic
red, retrograde
conducive work environment
phone [' fOn]

principal investigator, school principal
inform somebody of doing something
consider of doing something


WORD format for Respondus

Questions are numbered 1, 2, 3, ...
Answers are numbered A, B, C, D, ...

Tips: It appears format such as tab, parenthesis, space should be as consistent as possible for Respondus to work properly.

20140921. Format with extra line space in DOCX seemed to cause problem.

Paper activity for serial dilutions

See also youtube demo at

Make colored copies of graphing papers: red, blue, and white.

Basic idea: 1 cm2 of paper = 1 part of Things or water.
Red papers represent Things
Blue papers represent water.
White papers are containers.

For 10X red stock:
Draw a 20 parts rectangle in a white graphing paper. This is volume of the 10X stock solution.
Cut 10 part red and 10 part blue paper to fill the space.
The concentration of this solution is 50% red.

To make 1X 20 parts solution, we need 2 parts of 10X. The 2 parts of 10X contain 1 part of red and 1 part of blue (water).
Draw a 20 parts rectangle, cut 1 part of red and 1 part of blue from the 10X stock area, and move to the 1X area. The remaining area of the 1X stock is 18 parts and should be filled with water (blue papers).
The concentration of this 1X solution is 5% red.

To make 0.1X 20 parts solution, we need 2 parts of 1X which contains 2x5%=0.1 part of red and 1.9 parts of blue (water). The concentration of this solution is 0.5% red.

-----------------------The older version,  v1---------------
Make colored copies of graph papers: red, blue, and white.

Basic idea: 1 cm2 of paper = 1 part of things or water

10X red stock: 10 cm2 red paper + 10 cm2 white paper.
The concentration of this solution is 50% red, and its total volume is 20 parts.

5X blue stock: 5 cm2 blue paper + 15 cm2 white paper.
The concentration of this solution is 25% blue, and its total volume is 20 parts.

Exercise 1: To make a volume of 20 parts solution of 1X red and 1X blue, we need 1 part of red and 1 part of blue: 
 1. Use scissor to cut 1 part (1cm2) of red and 1 part (1cm2) of water  from 10X red stock, add to an empty beaker labelled with "1X red, 1X blue".
  2. Use scissor to cut 1 part (1cm2) of blue and 3 part of (3 cm2) of water from 5X blue stock, add to the '1X red 1Xblue' beaker.
  3. How much water should we add?
  20 - 1(red) - 1(water) - 1 (blue) - 3 (water) = 14 parts of water.
This 1X solution should contain 1cm2 red, 1cm2 blue, and 18 cm2 water. 

Exercise 2: To make a 1/10 X solution in a volume of 50 parts: 
1. Take 10 part of 1X: Cut 0.5 part red, 0.5 part blue, 9 part of water from 1X stocks
2. Add 50 - 0.5 - 0.5 - 9 = 40 part of water.

Materials needed: Three beakers; 1 piece of red graph paper, 1 piece of blue graph paper, 3 pieces of white graph papers; 1 Scissor.

H2O2-LOH, strain-averaged data, distribution of Cv and Cb.

Distribution of Cv and Cb in yeast natural isolates

Stain-averaged data are in file "LOHH2O2_averaged20131210_v1.csv".  These data were obtained by supervised gnls fitting.

Distribution of H2O2-viability midpoints (Cv) :
Largest change is 0.17/0.0065 = 26 fold
Histogram of H2O2-viability is:

Distribution of H2O2-LOH midpoints (Cb) :
Largest change is 0.086/0.0065 = 13 fold.
Histogram H2O2-LOH is:

Working directory is   ~/projects/0.LOH_H2O2_2012-master/analysis
Script is _3b.2013Dec10.summarizeFiting.H2O2LOH.R

*** Useful R tips

Read csv is much faster than xlsx.

tb = read.csv(fullFileName, colClasses=c("character",NA, NA, "character", rep("numeric",8 ), NA));

options(echo=TRUE) # if you want see commands in output file
args <- commandArgs(trailingOnly = TRUE)print(args)
# trailingOnly=TRUE means that only your arguments are returned, check:
# print(commandsArgs(trailingOnly=FALSE))i = as.integer(args[1])
j = as.integer(args[2])
x = seq(i, j)

R -f R-args.R --args 2 5
Rscript file 

#from lower case to upper case
chartr(old, new, x)

casefold(x, upper = FALSE)

    conditions$media[r] = str_replace( conditions$media[r], "\\/", "")

tb$AssignmentTotal= apply(tb[, assignments], 1, FUN=function(x){sum(x,na.rm=T)} )


R -f filename

axis( 2, at=pretty(tbf$s), tcl=0.2, las=2 )  #rotate axis labels

text( tb$G + 0.01*nchar(tb$strain)/4, log10(tb$R0)-0.1*nchar(tb$strain)/4, tb$strain, pos=3)

layout(mat, heights= c( 1.15, rep(1, nrow(mat)-2), 1.2) );

 text ( aa, bb, t, cex=0.8);

names(fit)[ grep("alpha", names(fit))]
fit_alpha_tb = data.frame( t( fit[, grep("alpha", names(fit)) ]))
rownames(fit_alpha_tb) = names(fit)[grep("alpha", names(fit))]
fit_alpha_tb$names = gsub("_.*", "", rownames(fit_alpha_tb))

#hmcol = colorRampPalette(brewer.pal(5,"RdBu"))(8);
hmcol = colorRampPalette(brewer.pal(3,"Blues"))(8);

format(Sys.time(), "%a %b %d %H:%M:%S %Y")
format(Sys.time(), "%Y%b%d_%H%M%S")

#regular expression 
x <- org.Sc.sgdALIAS
ls(x)[grep("^Y..\\d{3}", ls(x))]

list.files for the contents of a directory.
normalizePath for a ‘canonical’ path name.

(WD <- getwd())
if (!is.null(WD)) setwd(WD)

require(xlsx) # read Excel in R.


! x
x & y
x && y
x | y
x || y
xor(x, y)

rm(list=ls() );

unlist(strsplit("a.b.c", "\\."))
outer(, 1999:2003, FUn="paster");

Letters <- c( LETTERS, letters);
Letters[ ! sapply(Letters, function(xx) exists(xx) ) ]; # anonymous function as a wrapper for a primitive function

 legend(100,60, seq(100,200,1), lty=1) # line legends

  Library(MASS); example(Skye); #tenary plot

 library(help = survivial)

useful comnds:
x11; factor; relevel; class; loess; contour; is.element; math %in%; grep; sample; nrow; 
grepmisc: hist2d
class and object


test1 <- list( time= c(4, 3,1,1,2,2,3),
  x= c(0, 2,1,1,1,0,0),
  sex= c(0, 0,0,0,1,1,1))
coxph( Surv(time, status) ~ x + strata(sex), test1) #stratified model

delete NA form matrix

 > x<-matrix(1:16,4,4)
 > x[col(x)>=row(x)]<-NA
 > x[,! apply(x,2,function(x) all( ]
     [,1] [,2] [,3]
[1,]   NA   NA   NA
[2,]    2   NA   NA
[3,]    3    7   NA
[4,]    4    8   12

? R/Splus Perl interface   RSperl
? R Pythong interface Rpy Rpython
not in CRAN
date.grouping <- function(d) {
  # for ea date in d calculate date beginning 6 month period which contains it
  mat <- matrix(as.numeric(unlist(strsplit(as.character(d),"-"))),nr=2)
  f <- function(x) "ISOdate", as.list(x) )
  POSIXct.dates <- apply(rbind(mat,1),2,f) + ISOdate(1970,1,1)
  breaks <- c(seq(from=min(POSIXct.dates), to=max(POSIXct.dates), by="6 mo"), Inf)
  format( as.POSIXct( cut( POSIXct.dates, breaks, include.lowest=T )), "%Y-%m" ) }

nonlinear regression
las=1 or 2
You can use the graphics parameter "srt" to rotate displayed text by a specified number of degrees,
 e.g. srt=45 to put it on an angle, srt=90 to put it vertical.  

cnams = dimnames(aa)[[2]]

cnams[which(cnams == 'blah3.Mg')] = 'Mg (%)'
dimnames(aa)[[2]] = cnams
eval(substitute(lf <- locfit(~s, data=age), list(s=s)))
  sub=sort(sample(x,200, replace=F))

>Does anyone know if R has the functionality to calculate a simple 
>moving average. I cant seem to find it in the help menu.

filter in library ts. does filter() do what you need?
Or look at the 'running' function in the gregmisc package.

moving.average <- 
function(x, k) { 
 n <- length(x) 
 y <- rep(0, n) 
 for (i in (1+k):n) 
   y[i] <- mean(x[(i-k):i]) 

tree packages

# Create an Example Data Frame Containing Car x Color data, with long car names 
carnames <- c("BMW: High End, German",
              "Renault: Medium End, French",
              "Mercedes: High End, German", 
              "Seat: Imaginary, Unknown Producer")
carcolors <- c("red","white","silver","green")
datavals <- round(rnorm(16, mean=100, sd=60),1)
data <- data.frame(Car=rep(carnames,4),
                   Color=rep(carcolors, c(4,4,4,4) ),
                   Value=datavals )

# generate balloon plot with default scaling, the column labels will overlap 
# balloonplot( data$Color, data$Car, data$Value)

# try again, with column labels rodated 90 degrees, and given more space 
balloonplot( data$Car, data$Color, data$Value, colmar=3, colsrt=90)


Here is a very rough addlogo() using pixmap:

"addlogo" <- function(x, y, pixmap) {
    if (is.list(x)) {
        y <- x$y
        x <- x$x
    else if (missing(y)) 
        stop("missing y")
    if (!is.numeric(x) || !is.numeric(y)) 
        stop("non-numeric coordinates")
    if ((nx <- length(x)) <= 1 || nx != length(y) || nx > 2) 
        stop("invalid coordinate lengths")
    pixmap@bbox[1] <- x[1]
    pixmap@bbox[2] <- y[1]
    pixmap@bbox[3] <- x[2]
    pixmap@bbox[4] <- y[2]
    pixmap@cellres[1] <- (pixmap@bbox[3] - pixmap@bbox[1]) / pixmap@size[2]
    pixmap@cellres[2] <- (pixmap@bbox[4] - pixmap@bbox[2]) / pixmap@size[1]
    plot(pixmap, add=TRUE)

which will work with locator() too. To maintain aspect, it shouldn't alter 
the relative cell resolutions, and should just use the new x or y, bur 
this is the general case. The handling of the location of the logo is 
copied & pasted from legend().

x <- readLines(myfile)

Monday, January 13, 2014

Sharing ipad screen on an Apple laptop

Air server

On ipad, look for 'Airplay' icon in the application panel by double-clicking home button

Konical Minotla c745 osX 10.6.8 install, failed.

The install basically hangs in the above window..., ....

How to write lab notes

Key elements:
Title of experiments
Names of all members
Materials, methods and procedures, such as sequences of primers.
Results and analysis, such as pictures of gels, plate, colony counts, etc. Lanes in gels should be labeled.

Relevant links:

Sample notebooks

Gene Ontology dataset

Friday, January 10, 2014

Lotusnote traveler, iPad/iPhone

Pearson Chi-square test with specified probability using chisq.test() in R

chisq.test function in R

> chisq.test
function (x, y = NULL, correct = TRUE, p = rep(1/length(x), length(x)),
    rescale.p = FALSE, simulate.p.value = FALSE, B = 2000)
    DNAME <- deparse(substitute(x))
    if (
        x <- as.matrix(x)
    if (is.matrix(x)) {
        if (min(dim(x)) == 1L)
            x <- as.vector(x)
    if (!is.matrix(x) && !is.null(y)) {
        if (length(x) != length(y))
            stop("'x' and 'y' must have the same length")
        DNAME2 <- deparse(substitute(y))
        xname <- if (length(DNAME) > 1L || nchar(DNAME, "w") >
        else DNAME
        yname <- if (length(DNAME2) > 1L || nchar(DNAME2, "w") >
        else DNAME2
        OK <- complete.cases(x, y)
        x <- factor(x[OK])
        y <- factor(y[OK])
        if ((nlevels(x) < 2L) || (nlevels(y) < 2L))
            stop("'x' and 'y' must have at least 2 levels")
        x <- table(x, y)
        names(dimnames(x)) <- c(xname, yname)
        DNAME <- paste(paste(DNAME, collapse = "\n"), "and",
            paste(DNAME2, collapse = "\n"))
    if (any(x < 0) || any(
        stop("all entries of 'x' must be nonnegative and finite")
    if ((n <- sum(x)) == 0)
        stop("at least one entry of 'x' must be positive")
    if (simulate.p.value) {
        setMETH <- function() METHOD <<- paste(METHOD, "with simulated p-value\n\t (based on",
            B, "replicates)")
        almost.1 <- 1 - 64 * .Machine$double.eps
    if (is.matrix(x)) {
        METHOD <- "Pearson's Chi-squared test"
        nr <- nrow(x)
        nc <- ncol(x)
        sr <- rowSums(x)
        sc <- colSums(x)
        E <- outer(sr, sc, "*")/n
        v <- function(r, c, n) c * r * (n - r) * (n - c)/n^3
        V <- outer(sr, sc, v, n)
        dimnames(E) <- dimnames(x)
        if (simulate.p.value && all(sr > 0) && all(sc > 0)) {
            tmp <- .C(C_chisqsim, as.integer(nr), as.integer(nc),
                as.integer(sr), as.integer(sc), as.integer(n),
                as.integer(B), as.double(E), integer(nr * nc),
                double(n + 1), integer(nc), results = double(B))
            STATISTIC <- sum(sort((x - E)^2/E, decreasing = TRUE)) #Pearson chisq
            PARAMETER <- NA
            PVAL <- (1 + sum(tmp$results >= almost.1 * STATISTIC))/(B +
        else {
            if (simulate.p.value)
                warning("cannot compute simulated p-value with zero marginals")
            if (correct && nrow(x) == 2 && ncol(x) == 2) {
                YATES <- min(0.5, abs(x - E))
                if (YATES > 0)
                  METHOD <- paste(METHOD, "with Yates' continuity correction")
            else YATES <- 0
            STATISTIC <- sum((abs(x - E) - YATES)^2/E)
            PARAMETER <- (nr - 1) * (nc - 1)
            PVAL <- pchisq(STATISTIC, PARAMETER, lower.tail = FALSE)
    else {
        if (length(x) == 1L)
            stop("'x' must at least have 2 elements")
        if (length(x) != length(p))
            stop("'x' and 'p' must have the same number of elements")
        if (any(p < 0))
            stop("probabilities must be non-negative.")
        if (abs(sum(p) - 1) > sqrt(.Machine$double.eps)) {
            if (rescale.p)
                p <- p/sum(p)
            else stop("probabilities must sum to 1.")
        METHOD <- "Chi-squared test for given probabilities"
        E <- n * p
        V <- n * p * (1 - p)
        names(E) <- names(x)
        STATISTIC <- sum((x - E)^2/E)#Pearson chi-sq with input p
        if (simulate.p.value) {
            nx <- length(x)
            sm <- matrix(, B * n, TRUE, prob = p),
                nrow = n)
            ss <- apply(sm, 2L, function(x, E, k) {
                sum((table(factor(x, levels = 1L:k)) - E)^2/E)
            }, E = E, k = nx)
            PARAMETER <- NA
            PVAL <- (1 + sum(ss >= almost.1 * STATISTIC))/(B +
        else {
            PARAMETER <- length(x) - 1
            PVAL <- pchisq(STATISTIC, PARAMETER, lower.tail = FALSE)
    names(STATISTIC) <- "X-squared"
    names(PARAMETER) <- "df"
    if (any(E < 5) && is.finite(PARAMETER))
        warning("Chi-squared approximation may be incorrect")

    structure(list(statistic = STATISTIC, parameter = PARAMETER,
        p.value = PVAL, method = METHOD, = DNAME, observed = x,
        expected = E, residuals = (x - E)/sqrt(E), stdres = (x -
            E)/sqrt(V)), class = "htest")

Thursday, January 9, 2014