Monday, March 31, 2014

Virtual lab, McGraw-Hill

Cancer and cell cycle

archiving emails on lotus notes, 'byte'

Thursday, March 27, 2014

bio125, FOA assay of MSH2 mutants

Spin down 5 ml of culture (SD+LEU), wash with 1 ml water, resuspend in 100 ul water, make 10X and 100X dilutions, then spot 90 ul on the FOA plate.

The FOA plate is -histidine -tryptophan -leucine -threonine +FOA.

Several groups did not draw the 3x3 square correctly. We found that the marking can be wiped off with ethanol and redrawn.

Three of the five groups did not pipette the small volumes correctly or did not mix the cells well before loading them onto the FOA plates.

Two groups did not label the FOA plates and had to label the covers after the samples were spotted.

bio233, chapter 10, bacterial genetics

I was not clear on the differences among transformation, transfection and transduction. I need to summarize them in a table.
The table is:
Transformation: Transfer DNA fragments into cells.
Transfection: Transfer virus DNA into cells
Transduction: Transfer DNA via phages into cells

R factor

Cis-trans complementation.

Wednesday, March 26, 2014

GAS 2014 poster, R based computing into biology curriculum.

(todo) play dough demo on topology of phylogeny

bio233, phylogeny lab on west nile virus

MEGA6 is over 350 MB and took an extremely long time to download over the Spelman wireless network. I had to put the installer on flash drives and pass them around.

Many students worked out the phylogenetic tree on their own.  Some even high-fived afterwards.

Once students start working on the computers in the lab, it is hard to get their attention back.

Bio233, reflection on microbes in environments.

This lab should be done at the start of the semester. Students can then use these microbes for the Gram stain and scale-of-microbes exercises.

Student organizations at Spelman College

Phylogeny lab, west nile virus using envelope glycoprotein sequences

Download MEGA from
This manual is written for a Windows computer.  Toolbar features are slightly different on a Mac.

Open file "WNV.mas" with MEGA.  (On a Mac, you need to start MEGA first, and then look for this file.)

In "Alignment Explorer", click "Translated Protein Sequences"

 Click "Yes" for selection and genetics tables.

 Translated proteins should look like this

Choose "Alignment" by "Align by ClustalW"

"OK" to use the default alignment setting.

Click "Data" -> "Export Alignment" -> "MEGA format".

Input title for the alignment

This alignment is for "protein coding sequences".

In the main window of MEGA, choose "Phylogeny" -> "Construct/Test Neighbor-Joining Trees"

Choose default parameters.

We should be able to see the generated tree.

In order to save the image, we can "Image"->"Copy to Clipboard".

And then "Paste" to a WORD document. 

Note: Please feel free to explore many other features of MEGA. 

Online MEGA help file,

GIMP usage

pen, write,

Image -> flatten image

File -> export -> png.

Tuesday, March 25, 2014

bio125, lac operon, human genome, review transformation results, prepare FOA assay,

Go over midterm grades as of March 22.

Discussed the lac operon for 1 hour. Asked students to draw lac operons on the board, and worked through a problem set on unknown mutations in the lac operon.

Several students asked why glucose leads to a low level of cAMP. We searched on Google and did not find a short and clear answer. I used a balance as an analogy to explain the inverse correlation between glucose and cAMP.

5 min break.

1.5 hours on the MSH2 locus in the UCSC human genome browser.
Problem: Two students got a "page not found" error from the UCSC browser. I put a direct link on Moodle to solve this problem.

I picked a SNP that is a nonsense mutation.

When students downloaded the SNP text file and tried to import it into Excel, quite a few had problems. I showed that copy-paste would bring the text file into Excel.

I went to every student to check that they got the Excel output of hMSH2 SNPs for participation grades.

At the end of the class, students volunteered to say one thing that they learned in today's class. This is a good conclusion exercise. 

I then quickly went over 5FOA assay on URA3 and MSH2 function.

I asked students to take pictures of their transformation plates, upload to GoogleDoc for lab reports.

I did not have a chance to go over Poster preparation assignments.

Lab manual on UCSC human MSH2 locus

bio233, chapter 8, gene regulation, lac operon, attenuation, quorum sensing

Focused on the lac operon. Questions can be designed around mutations, such as frameshifts, in the CAP binding site, CAP, or the lacZYA genes. A frameshift in one gene should not cause frameshift mutations in the other genes of the same operon, because each gene is translated from its own start codon.

Video recording:

Monday, March 24, 2014

Network configuration on gompertz and weibull fitting (need follow up)

When p=0.6 and pop=2000, the original GINDIP network fits Gompertz much better than Weibull, compared with the ms02 permuted network.

I need to generate a distribution using 100 ms02 networks.
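A minimal R sketch of how such a distribution could be collected, assuming lifespans are available as a numeric vector per network. The simulated Gompertz lifespans, the helper names (rgompertz_sketch, nll_gompertz, nll_weibull, delta_aic) and all parameter values are placeholders for illustration, not the actual GINDIP or ms02 data.

```r
# Hypothetical Gompertz lifespans by inverse-transform sampling.
# I = initial mortality, G = Gompertz coefficient (illustrative values only).
rgompertz_sketch = function(n, I, G) log(1 - G * log(runif(n)) / I) / G

# Negative log-likelihoods; parameters are on the log scale to keep them positive.
nll_gompertz = function(par, lifespans) {
  I = exp(par[1]); G = exp(par[2])
  -sum(log(I) + G * lifespans - (I / G) * (exp(G * lifespans) - 1))
}
nll_weibull = function(par, lifespans) {
  -sum(dweibull(lifespans, shape = exp(par[1]), scale = exp(par[2]), log = TRUE))
}

# AIC(Weibull) - AIC(Gompertz); both models have 2 parameters,
# so the difference is twice the difference in negative log-likelihood.
# Positive values favor Gompertz.
delta_aic = function(lifespans) {
  fit_g = optim(c(log(0.01), log(0.05)), nll_gompertz, lifespans = lifespans)
  fit_w = optim(c(log(2), log(mean(lifespans))), nll_weibull, lifespans = lifespans)
  2 * (fit_w$value - fit_g$value)
}

set.seed(2014)
# Stand-in for a set of permuted networks: 20 simulated populations of 2000.
sims = replicate(20, delta_aic(rgompertz_sketch(2000, I = 0.005, G = 0.1)))
print(summary(sims))
```

With real data, the call inside replicate() would be replaced by the lifespans from each ms02 network, and the GINDIP value could then be placed on the resulting distribution.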

Moodle, reassign questions to new category

Create new category in "category" tab.

Edit the questions and resave them as new questions in a different category.

This method can be used to select questions from assignments and generate exams. The questions can be sorted by item analysis. 

bio125, plans

human mutation to yeast mutation,

Yeast preferred codon, preferred codons

cloning strategy, mutagenic primer design

image of mutants cells, G1/S arrest?

Friday, March 21, 2014

mortality rate change at the first time point, bug fix

When I estimated mortality rates from simulated network aging data, the first time point always had a very low value.  This abnormality bugged me because it contradicts the theoretical assumption about the initial stage, but I thought it was due to a boundary effect and did not investigate it for several weeks. I also thought there might be some differences between theoretical predictions and empirical data. I went over the code today and found that it was due to a typo in the code.  After the correction, the initial stage of the mortality rate change fits the network aging model extremely well.
Lesson learned: I should be more confident in my theoretical models and pay more attention to abnormalities during numerical work.
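For this kind of check, a small R sketch of an age-specific mortality calculation with built-in sanity checks may help. It is not the simulation code from this project; the function name mortality_by_age and the toy lifespans are made up for illustration, and lifespans are assumed to be whole numbers.

```r
# Age-specific mortality rate: deaths at age a divided by the number alive at age a.
mortality_by_age = function(lifespans) {
  ages = 0:max(lifespans)
  alive = sapply(ages, function(a) sum(lifespans >= a))
  deaths = sapply(ages, function(a) sum(lifespans == a))
  data.frame(age = ages, alive = alive, deaths = deaths, rate = deaths / alive)
}

toy_lifespans = c(1, 2, 2, 3, 3, 3, 4, 4, 5, 5)  # hypothetical data
m = mortality_by_age(toy_lifespans)
print(m)

# Sanity checks that catch boundary mistakes at the first and last time points:
stopifnot(m$alive[1] == length(toy_lifespans))      # everyone is alive at age 0
stopifnot(sum(m$deaths) == length(toy_lifespans))   # every individual dies exactly once
```

Checking the first row against the full population size is the kind of test that flags a wrong denominator or an off-by-one index at the initial time point.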

Video tutorial views peaked around midterm, spring semester, 2014

Thursday, March 20, 2014

bio233, ch7, gene expression in eukarya,

In the pre-class assignment, the most difficult question had an 83% correct rate. So, this seems to be an easy chapter for students. I went through key concepts to refresh students' memory. I asked several clicker questions during the lecture.

During the demo of the yeast genome, I misremembered the number of genes as 5500. It should be 6500.

When I used the UCSC genome browser to show the exons, many students thought they were introns, as shown by the clicker responses.

The questions are on anticodon, transcription, hairpin, alternative splicing.

video @youtube

Bio125, yeast transformation

Forty minutes on the transformation protocol and media selection, SD-his-trp.  One student volunteered to explain the transformation and selection principle on the board.

During the discussion of ampR on pMSH2, I was not sure whether it would be expressed in yeast, because it may be under a bacterial promoter.

Some students were slow to grasp the selection media. I used a food analogy: I am lactose intolerant, so if the Spelman cafeteria only had whole milk as food, I would not survive on the Spelman campus.

Students heated carrier DNA at 99C for 2 min, then immediately chilled it on ice.

During transformation, after cells were spun down, students were shown how to pipette the cells up and down with water to wash them.

Only the mutant plasmid was used for transformation.

When adding 3 ul of plasmid with the 10 ul pipette, some students mistook 0.3 ul for 3 ul.

Students used glass beads to spread transformed yeast on SD-his-trp plates. Most groups did not know what to do about resuspending the cells in water. One group spread yeast cells on the cover.

To make sure that students paid attention, I told them that notebooks would be checked. At the end of the class, I went over every student's notes and gave a grade.

I made a mistake on the role of lithium acetate: Wikipedia states that lithium acetate is used to permeabilize the cell wall.
"Lithium acetate is also used to permeabilize the cell wall of yeast for use in DNA transformation. It is believed that the beneficial effect of LiAc is caused by its chaotropic effect; denaturing DNA, RNA and proteins."

Wednesday, March 19, 2014

bio125 potential new modules

UCSC genome browser

microarray on cell cycle

yeast two-hybrid.


human msh2-dna structure

cell image and ImageJ analysis. counting budded cells, cell cycle effect of msh2 mutants.

flow cytometer of msh2 mutants

BIO125, reminder list

Streak yeast cells at the start of the semester. Some mutants, especially the report strain, grow extremely slowly due to lack of MSH2. Mutant M707I also grows slowly due to low expression.

2015Jan: Gammie said that the slow growth is due to secondary mutations that accumulate in the absence of MSH2.

bio233, lab, microbes in environments

Ask students to read CB kit. Focus on 'surface'. Emphasize 'sterile' procedure.

Ask one student to do a demo. Make sure to open the package at the wooden-stick end (not the cotton-head end). Use the same side of the swab for swabbing and streaking on plates.

Quite a few students wore gloves outside of the lab.

Used a Google Doc to assign sampling locations and student groups.

(BUILD) human gene network reliability and pathogenic association of genetic variations in human populations

Reliability of human gene networks and their pathogenic implications.

Reliability of gene network in different human tissues and cell types -> robustness, cancer incidence?

health disparity
aging associated genes

expression profiling, ngs to infer tissue specific gene networks.

human twin aging expression

gwa aging

age of puberty in japanese

CR effect on cell lines SNPS
flow cytometers
fluorescence microscope

yeast model?

Gompertz and Weibull fitting, yGINDIP vs ms02, preliminary

lapbio-hq:mathramp hqin$ pwd


Tuesday, March 18, 2014

bio125, central dogma review


I spent two hours on central dogma.

Using the arginine pathway to explain how mutations and selection on plates can be used to infer genetic pathways.

Used the central dogma on the white board as the main theme. Asked students to form 3 groups to work on replication, transcription, and translation.  I asked students to find the "start", "players", and "stop" signals in the three processes.

For the Shine-Dalgarno signal, I gave a hint to students, "It shines".

Animations on transcription and translation were used.

At the end of the class, students were asked to paraphrase my concept map on the board and submit it in handwriting.

Reviewed midterm grades and gave the sheets back to students.

Things that I can improve:
1) It is not clear how anticodons are defined on the tRNA in Brooker, Madigan and other online tools. I told the students that the 5' and 3' ends will be given if anticodons are put into quizzes or exams.
2) Transcription differences between bacteria and eukarya include promoters, protein factors, introns and exons, and maturation of mRNA.
3) Translation differences between bacteria and eukarya include ribosomes and initiation signals.
4) Replication differences between bacteria and eukarya can emphasize telomerase.

I did not have time to go over 'error correction' mechanism or 'quality control' steps.
DNA polymerase has a proofreading 3'-5' exonuclease activity. DNA repair mechanisms also fix mistakes or damage to DNA.

For transcription in eukarya, RNA polymerase (polymerase II) also backtracks.

For translation, ribosomal proofreading is found in E. coli.  Folding chaperones work in the post-translation steps.

Note: Although students are supposed to know the central dogma from bio120, some complained that I did not explain what 30S and 50S stand for.

bio233, gene expression, chapter 6, continued. March 18, 2014 Tue


Finished chapter 6. Added some in-slide quizzes and collected student responses with clickers.

E. coli FASTA files were used to explain genome size, gene length, etc.  The default font size of 11 in the shell terminal window was too small, and I had to increase it to 14.
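As a possible companion to the shell demo, here is a base-R sketch that reports sequence lengths from FASTA-formatted text. The function name fasta_lengths and the two toy records are invented for illustration; the sketch assumes every header line is followed by at least one sequence line.

```r
# Sum the sequence characters under each FASTA header line.
fasta_lengths = function(lines) {
  is_header = grepl("^>", lines)
  record = cumsum(is_header)   # record index for every line
  lens = tapply(nchar(lines[!is_header]), record[!is_header], sum)
  names(lens) = sub("^>", "", lines[is_header])
  lens
}

# Toy input (hypothetical records, not E. coli genes).
toy_fasta = c(">geneA", "ATGC", "AT", ">geneB", "ATGAAATAG")
len = fasta_lengths(toy_fasta)
print(len)
```

For a real file, readLines() on the FASTA path would supply the lines argument.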

Polymerase speed was used to calculate the time needed to finish one round of replication. With bidirectional replication (two forks), it should be 5M/(rate*2).
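The arithmetic can be written out in a few lines of R. The 5M genome size is the rounded figure from class; the fork rate of 1000 bp per second is an assumed round number for illustration.

```r
genome_size = 5e6   # bp, rounded E. coli genome size used in class
fork_rate = 1000    # bp per second per fork (assumed value)
forks = 2           # bidirectional replication from one origin

seconds = genome_size / (fork_rate * forks)
minutes = seconds / 60
print(c(seconds = seconds, minutes = round(minutes, 1)))
```

Under these assumptions one round takes 2500 seconds, or a little under 42 minutes.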

Summarized the central dogma on the white board at the end.

microarray online tools, an incomplete list

GenePattern (Broad, MIT)
This seems to be a popular choice.

Galaxy: good at NGS.


Monday, March 17, 2014

ms02 network aging, preliminary

I ran a preliminary analysis on network aging, and found that ms02 networks tend to have a shorter maximum network lifespan, even though their average lifespan is more or less the same as that of the yeast GIN/PPI. This seems to suggest that ms02 networks have a more homogeneous aging process. It makes sense that the power law increases heterogeneity in the yeast network.

A population size of 500 did not give a good p-value using wilcox.test().
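One thing worth checking here, sketched in R below: wilcox.test() compares locations, so two populations with similar centers but different spreads may not separate well at any population size, whereas a distribution-wide test such as ks.test() responds to spread too. The Weibull parameters and the helper compare_lifespans are hypothetical stand-ins, not the network simulation output.

```r
set.seed(17)

# p-values from a location test and a distribution test for population size n.
compare_lifespans = function(n) {
  # Hypothetical lifespans: similar centers, different spreads.
  a = rweibull(n, shape = 3, scale = 30)
  b = rweibull(n, shape = 5, scale = 29.2)
  c(wilcox = wilcox.test(a, b)$p.value,
    ks = ks.test(a, b)$p.value)
}

p500 = compare_lifespans(500)
p5000 = compare_lifespans(5000)
print(rbind(n500 = p500, n5000 = p5000))
```

Rerunning with the actual simulated lifespans at several population sizes would show whether the weak p-value comes from sample size or from the choice of test.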

Github contribution activity not shown

I had to make sure the email in GitHub-GUI is the same as the one on my GitHub account.


Sunday, March 16, 2014

Align two vectors, first try, not perfect results

See "align_curves.20140316.R" at github/project.H2O2.tolerance/Feb14,

#20140316 align growth curves by minimizing sum of errors 

debug = 9; 

pairwise_sum_of_errors = function( X1, X2, Start1, End1, Start2, End2 ) {
  # sum of squared differences between two equal-length windows of X1 and X2
  if( (End1 - Start1) == (End2 - Start2) ) {
    return( sum( ( X1[Start1:End1] - X2[Start2:End2] )^2 ) )
  } else {
    print("Error: X1 and X2 should have the same lengths")
    return( NA )
  }
}

x1 = 1:100;  x2 = 5:104
pairwise_sum_of_errors( x1, x2, 5, 100, 1,96)
pairwise_sum_of_errors( x1, x2, 5, 100, 1,100)

#align two vectors with equal lengths
#Let's assume start2 lags behind start1 
#start2 = start1 + delta # 2nd start lags behind 1st start by delta
pairwise_cost_of_shifted_vectors = function( delta, X1, X2,  low.threshold=0.01, debug=10 ){
  # low.threshold is not used yet
  delta = floor( (delta+0.5) ) #delta must be an integer, so round to the nearest one
  Start1 = 1;
  Start2 = Start1 + delta;  #left truncation at delta
  End2 = length(X2)
  End1 = length(X1) - delta; #right truncation at delta
  if(debug){
    print(paste('Start1', Start1, "Start2", Start2, "End1", End1, "End2", End2))
  }
  if( (End1 - Start1) == (End2 - Start2) ) {
    return( sum( ( X1[Start1:End1]-X2[Start2:End2] )^2))
  } else {
    print("Error: X1 and X2 should have the same lengths")
    return( NA )
  }
}

#testing 1
x1 = 5:104; x2 = 1:100;  
pairwise_sum_of_errors( x1, x2, 1,96, 5,100 )
pairwise_cost_of_shifted_vectors( 4, x1, x2)

delta = 4
res1 = optim(c(delta), fn=pairwise_cost_of_shifted_vectors, X1=x1, X2=x2, method="Brent", lower=c(1), upper=c(100))
#return 3 not 4? 

#testing 2
x1 = 50:149; x2 = 1:100;  
pairwise_sum_of_errors( x1, x2, 1,50, 50,99 )
pairwise_cost_of_shifted_vectors( 49, x1, x2)

delta = 50
res2 = optim(c(delta), fn=pairwise_cost_of_shifted_vectors, X1=x1, X2=x2, method="Brent", lower=c(1), upper=c(100))
#return 50 not 49?
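A possible explanation for the off-by-one answers, offered as a guess: because delta is rounded inside the cost function, the cost is flat between integers, so optim() with method="Brent" can stop at any real value within the flat region around the true shift. Since the shift is an integer anyway, an exhaustive search over integer shifts avoids the issue. The sketch below assumes equal-length vectors; the names shift_cost and best_shift are new here, and using the mean (rather than the sum) of squared errors is my own choice so that shorter overlaps are not favored.

```r
# Mean squared error over the overlap when X2 is shifted by an integer delta.
shift_cost = function(delta, X1, X2) {
  n = length(X1)
  stopifnot(length(X2) == n, delta >= 0, delta < n)
  a = X1[1:(n - delta)]     # right truncation of X1
  b = X2[(1 + delta):n]     # left truncation of X2
  mean((a - b)^2)
}

# Try every integer shift and return the one with the lowest cost.
best_shift = function(X1, X2, max.delta = length(X1) - 1) {
  deltas = 0:max.delta
  costs = sapply(deltas, shift_cost, X1 = X1, X2 = X2)
  deltas[which.min(costs)]
}

shift1 = best_shift(5:104, 1:100)    # same vectors as testing 1
shift2 = best_shift(50:149, 1:100)   # same vectors as testing 2
print(c(shift1, shift2))
```

For growth curves of a few hundred points, the exhaustive search costs little and always returns an integer shift.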

R pairwise alignment

17 equations that changed the world

Ian Stewart published a book on 17 Equations That Changed The World.

Saturday, March 15, 2014

Calculating the lag phases in growth curves by minimizing sum of squared errors

I studied '1471-2180-11-140-s1.m', a Matlab script that can estimate lag phases from growth curves. This program can 'align' growth curves using the Matlab optimization procedures.  I used Octave 3.8.0 with the GUI and found it useful.

My plan is to use R optim() to minimize the sum of errors.  The differences in lag phases are index offsets between the vectors, so somehow I need to re-index the vectors.  Maybe I can use a small threshold and set very small values to NA.  I will then have two vectors, X1 and X2.  The sum of errors can be defined as
  SumOfErrors = sum( (X1[1:lengthOfX2] - X2[delta:END2] )^2 )

This is basically the global alignment problem in sequence alignment.

1471-2180-11-140-s1.m  from the Xavier group.

install octave on a Mountain Lion laptop,

Try to install octave for "Byte" with os 10.9.2.

$ sudo port install octave @3.6.4_12

... ... ... 
--->  Building clang-3.3
Error: for port clang-3.3 returned: command execution failed
Error: Failed to install clang-3.3
Please see the log file for port clang-3.3 for details:
Error: The following dependencies were not installed: atlas clang-3.3 epstool fftw-3-single gawk glpk gnuplot aquaterm wxWidgets-3.0 wxWidgets_select gperf grep pcre gsed hdf5-18 less pstoedit ImageMagick djvulibre urw-fonts plotutils qhull qrupdate transfig netpbm
To report a bug, follow the instructions in the guide:
Error: Processing of port octave failed

Then I tried installing the binary "GNU_Octave_3.8.0-6.dmg" from SourceForge. This worked.

SOD1 ORF phylogeny

Sequences of different species were downloaded from the SGD fungal alignment.

Tried MEGA6. It can now generate ML trees and compute dN and dS.

Tuesday, March 11, 2014

RStudio workspace size problem, LDKEY is too small for this problem. FEXACT error 6.

This error seems to indicate that the exact test is unnecessary, based on
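Two options that fisher.test() itself provides for large tables are a bigger workspace and a Monte Carlo p-value. A short sketch, using a made-up 3x3 table rather than the data that triggered the error:

```r
# Hypothetical 3x3 contingency table (not the original data).
tab = matrix(c(12, 5, 7,
               9, 14, 6,
               8, 11, 10), nrow = 3, byrow = TRUE)

# Option 1: enlarge the workspace used by the network algorithm (FEXACT).
p_exact = fisher.test(tab, workspace = 2e6)$p.value

# Option 2: estimate the p-value by Monte Carlo simulation instead.
set.seed(1)
p_sim = fisher.test(tab, simulate.p.value = TRUE, B = 10000)$p.value

print(c(exact = p_exact, simulated = p_sim))
```

When counts are large enough that the exact test struggles, chisq.test() on the same table is usually a reasonable alternative.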

Monday, March 10, 2014

Change file associations for all file types, OS X

Choose "Get Info", --> "Open with" --> "Change all". 

Sync Papers2 libraries on two laptops

Papers3 is available for OS 10.7.x for 29 euros. I decided to stay with Papers2.

I turned on the 'file sharing' on 'ace'.

On 'byte', I zipped the Library.paper2 folder. I then copy-pasted the ace/hongqin/Library/Application Support/Papers2/* to the byte/hqin/Library/Application Support/Papers2/

The 15.26G transfer was estimated to take an hour.  11:40am -->13:05.
After the sync, I needed to reactivate Papers2, probably because the previous registration info was erased when the entire Papers/ folder was replaced.

After reactivating Papers2 on 'byte', all materials seem to have been restored.


Sunday, March 9, 2014

NGS and R

yeast MSH2 gene expression peaks in G1 phase (protein peak in S phase?)

MSH2 Expression and yeast cell cycle

The yeast MSH2 mRNA peaks in G1 phase.  If the MSH2 protein has a half-life longer than G1 phase, the MSH2 protein level can be high in both G1 and S phases.


The original paper, Cho 1998

cell cycle and gene expression:

At GEO, GSE3635, two cell cycle data sets of wildtype W303
The GEO Dataset Browser allows users to look up a gene name and its expression profile.  I did not find a way to generate a heatmap?!
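If the expression values can be exported as a genes-by-time-points table, base R can draw the heatmap locally. The sketch below uses a simulated matrix with periodic profiles as a stand-in; the gene names, time points and values are hypothetical and are not GSE3635 data.

```r
set.seed(3635)
time_points = seq(0, 110, by = 10)   # minutes, hypothetical sampling times

# 6 hypothetical genes with periodic expression and different phases.
expr = t(sapply(1:6, function(g) {
  sin(2 * pi * time_points / 60 + g) + rnorm(length(time_points), sd = 0.2)
}))
rownames(expr) = paste0("gene", 1:6)
colnames(expr) = paste0("t", time_points)

# Colv = NA keeps the time points in order; rows are clustered and scaled.
hm = heatmap(expr, Colv = NA, scale = "row", xlab = "time (min)", ylab = "gene")
```

Keeping the columns unclustered preserves the time order, so genes that peak in the same cell cycle phase end up next to each other.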

Saturday, March 8, 2014

human MSH2 using the UCSC genome browser

Go to UCSC genome browser:
Click "genomes"

Input "MSH2" to the "search term" and press "submit".

You will see 6 entries of human MSH2. You can tell which one is the longest form based on their coordinates. Clicking any one of them will take you to the genomic region of hMSH2.

The region of MSH2 in the human genome looks like: 

Scroll down, and we can see options for adjustment. 
For "Genes and Gene Predictions",  we can 'hide' UCSC Genes and 'pack' RefSeq Genes. Hit 'refresh' and we should see the updated genome view.

For 'Phenotype and Literature', we can 'hide' Publications and show 'full' 'pack'ed ClinVar Variants. 

After hitting 'refresh', we can see many 'mutations' in the human MSH2 region. 

Go to 'Tools' on the menu toolbar, and select 'Table Browser' 

Make sure that we are still at the MSH2 genomic region, chr2:47630206-47710367. 
We can set 'group' to 'Phenotype and Literature', 'track' to 'ClinVar Variants', and 'table' to 'ClinVar Main (clinvarMain)'. For 'region', select 'position' with chr2:47630206-47710367,
and enter "human_msh2_clinvariant" as the output file.  Then click "get output".

The downloaded file is a text file, and we can open it with any text editor. 

At this stage, we can run Excel and generate a new spreadsheet. We can then copy-paste the downloaded clinical variants of human MSH2 into the Excel sheet.

The file now looks like

In the column 'clinSign', we can see the clinical relevance of the mutations.  Let's pick a 'pathogenic' variant that starts at 47702268.  From column 'hgvsProt', we can see this is a mutation Pro622Leu (or P622L).
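For students who have trouble with the Excel import, the same filtering can be sketched in R. The inline table below is a reduced mock-up: only the column names 'clinSign' and 'hgvsProt' and the P622L row come from the steps above, while the 'chromStart' column name, the other two rows and the exact value formats are assumptions for illustration.

```r
# Mock tab-delimited text standing in for the downloaded file.
txt = paste(
  "chromStart\tclinSign\thgvsProt",
  "47630300\tBenign\tNA",
  "47702268\tPathogenic\tp.Pro622Leu",
  "47705400\tUncertain significance\tNA",
  sep = "\n")

variants = read.delim(text = txt, stringsAsFactors = FALSE)

# Keep rows whose clinical significance mentions "pathogenic".
pathogenic = subset(variants, grepl("pathogenic", clinSign, ignore.case = TRUE))
print(pathogenic)
```

With the real download, read.delim() would be given the path of the saved text file instead of the text argument.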

To double-check the SNP rs2303426, we can look at the NCBI dbSNP database:

Now, we can design a mutagenic primer that can introduce this human mutation P622L into the cognate site in yeast.   The mutagenic primer should also consider the preferred codons in yeast.
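The codon swap behind such a primer can be sketched in R as follows. The toy ORF is hypothetical and is not the yeast MSH2 sequence, the function name mutate_codon and the flank length are made up for illustration, and TTG is used as a commonly preferred leucine codon in yeast (to be confirmed against the codon usage table).

```r
# Replace one codon in an ORF and cut out a primer centered on the change.
mutate_codon = function(orf, codon_index, new_codon, flank = 15) {
  start = (codon_index - 1) * 3 + 1
  stopifnot(nchar(new_codon) == 3, start + 2 <= nchar(orf))
  mutant = paste0(substr(orf, 1, start - 1), new_codon,
                  substr(orf, start + 3, nchar(orf)))
  primer = substr(mutant, max(1, start - flank),
                  min(nchar(mutant), start + 2 + flank))
  list(old_codon = substr(orf, start, start + 2), mutant = mutant, primer = primer)
}

toy_orf = "ATGGCTCCAGAAGGTTTGACTTAA"   # hypothetical 8-codon ORF; codon 3 is CCA (Pro)
res = mutate_codon(toy_orf, codon_index = 3, new_codon = "TTG", flank = 6)
print(res)
```

A real design would also weigh how many bases change (for example CCA to CTA needs a single substitution) against codon preference, and would check primer length and melting temperature.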

We should also be able to design PCR primers that amplify a fragment covering this mutation in yeast MSH2, and identify a restriction enzyme that can distinguish the wildtype yeast MSH2 from the mutant MSH2.

Yeast codon usage can be found at:


Human MSH2 protein: NP_000242