Saturday, November 29, 2014

Friday, November 28, 2014

Lotus Notes, Apple iPhone/iPad

1. Direct the Safari browser (on your Apple iPhone/iPad) to the Traveler User Home Page (http://spelbes.spelman.edu/servlet/traveler). When prompted, enter your Lotus Notes webmail username and password.

2. A User Status section at the top of the home page shows the status of the user and of any of the user's devices. Make sure that this section shows no error messages (they would be highlighted in red). If errors exist, they probably need to be addressed before synchronization can succeed.

3. Select Configure your Apple iPhone/iPod Touch.

4. Select Generate.

5. Select Install to begin the profile installation process.

6. When prompted about the authenticity of the profile, select Install Now to continue to install the profile.

7. When prompted, enter your Lotus Notes webmail password and select Next.

8. When the profile has been installed, select Done to return to the previous application (e.g., Safari). Your new Lotus Notes ActiveSync account is now listed under Mail, Contacts, and Calendars in the Settings application. Registration with the server begins immediately, and mail, calendar entries, and contacts should begin to show up soon.

Wednesday, November 26, 2014

Elements of Statistical Learning, video, pdf, Hastie, Tibshirani, 2014

Cloned from R-bloggers.

In January 2014, Stanford University professors Trevor Hastie and Rob Tibshirani (authors of the legendary Elements of Statistical Learning textbook) taught an online course based on their newest textbook, An Introduction to Statistical Learning with Applications in R (ISLR). I found it to be an excellent course in statistical learning (also known as “machine learning”), largely due to the high quality of both the textbook and the video lectures. And as an R user, it was extremely helpful that they included R code to demonstrate most of the techniques described in the book.

If you are new to machine learning (and even if you are not an R user), I highly recommend reading ISLR from cover to cover to gain both a theoretical and practical understanding of many important methods for regression and classification. It is available as a free PDF download from the authors’ website.

If you decide to attempt the exercises at the end of each chapter, there is a GitHub repository of student-provided solutions that you can use to check your work.

As a supplement to the textbook, you may also want to watch the excellent course lecture videos (linked below), in which Dr. Hastie and Dr. Tibshirani discuss much of the material. In case you want to browse the lecture content, I’ve also linked to the PDF slides used in the videos.

Tuesday, November 25, 2014

Algorithms and tools for protein–protein interaction networks clustering, with a special focus on population-based stochastic methods, Clara Pizzuti and Simona E. Rombo

Clara Pizzuti and Simona E. Rombo, Bioinformatics 2014 [PR14].

http://bioinformatics.oxfordjournals.org/content/30/10/1343.short

PR14 used three yeast PPI datasets to compare MCL with other algorithms. The MCL parameter was taken from Brohée and van Helden 2006 [BH06]. PR14 used known protein complexes as the 'gold standard'. When the overlap score is > 20%, MCL is the best algorithm. Bader's MCODE also performs well for certain parameter settings.
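The notes do not define the overlap score. A minimal sketch, assuming the definition commonly used in PPI complex-detection benchmarks, OS(A, B) = |A ∩ B|^2 / (|A| * |B|), where a predicted cluster counts as matching a known complex when OS exceeds a threshold such as 0.2. Whether PR14 uses exactly this formula is an assumption; the function names and the toy protein IDs P1 to P7 are hypothetical.

```python
def overlap_score(predicted, known):
    """Overlap score |A & B|^2 / (|A| * |B|) between two protein sets (assumed definition)."""
    a, b = set(predicted), set(known)
    if not a or not b:
        return 0.0
    shared = len(a & b)
    return shared * shared / (len(a) * len(b))


def count_matched_complexes(clusters, complexes, threshold=0.2):
    """Count known complexes matched by at least one cluster with score above `threshold`."""
    return sum(
        1
        for known in complexes
        if any(overlap_score(cluster, known) > threshold for cluster in clusters)
    )


# Toy example with hypothetical protein IDs.
clusters = [{"P1", "P2", "P3"}, {"P4", "P5"}]
complexes = [{"P1", "P2", "P3", "P4"}, {"P6", "P7"}]
print(overlap_score(clusters[0], complexes[0]))      # 9 / 12 = 0.75
print(count_matched_complexes(clusters, complexes))  # 1
```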

useful references on teaching

Active learning increases student performance in science, engineering, and mathematics
Scott Freeman, Sarah L. Eddy, Miles McDonough, Michelle K. Smith, Nnadozie Okoroafor, Hannah Jordt, and Mary Pat Wenderoth, April 15, 2014
http://www.pnas.org/content/111/23/8410.full.pdf+html

Research-Based Learning Principles

http://www.josephjaywilliams.com/education

http://www.josephjaywilliams.com/education#TOC-Comparison:-Help-learners-grasp-or-construct-new-abstract-principles-by-comparison-of-specific-examples-of-the-generalization.

Monday, November 24, 2014

E. coli, flow cytometer, PI staining

Hawley & Hawley
PI stain

Wang, Li, Deng, Pan, BMC review on clustering methods for protein interaction networks.

Recent advances in clustering methods for protein interaction networks

Jianxin Wang, Min Li, Youping Deng, Yi Pan

From The ISIBM International Joint Conference on Bioinformatics, Systems Biology and Intelligent Computing (IJCBS), Shanghai, China, 3-8 August 2009

http://www.biomedcentral.com/content/pdf/1471-2164-11-S3-S10.pdf

cited by
http://scholar.google.com/scholar?cites=16432683922097612422&as_sdt=5,43&sciodt=0,43&hl=en

The review covered 20 clustering methods, including MCL, and describes MCL as highly successful.

10. Brohée S, van Helden J: Evaluation of clustering algorithms for protein-protein interaction networks. BMC Bioinformatics 2006, 7:48.

63. Vlasblom J, Wodak SJ: Markov clustering versus affinity propagation for the partitioning of protein interaction graphs. BMC Bioinformatics 2009, 10:99.

References on teaching math biology/systems biology

Learning Biology by Recreating and Extending Mathematical Models

Hillel J. Chiel, Jeffrey P. Gill, Jeffrey M. McManus, Kendrick M. Shaw

Vision and Change

Lin C, Cho Y-R, Hwang W-C, Pei P, and Zhang A. 2007. Clustering Methods in a Protein–Protein Interaction Network. In: Hu X, and Pan Y, eds. Knowledge Discovery in Bioinformatics: John Wiley & Sons, Inc., 319-355.

CLUSTERING METHODS IN PROTEIN-PROTEIN INTERACTION NETWORK
Chuan Lin, Young-rae Cho, Woo-chang Hwang, Pengjun Pei, Aidong Zhang
Department of Computer Science and Engineering
State University of New York at Buffalo


This review article did not provide enough details on validation and comparison of different algorithms.

Sunday, November 23, 2014

minimal version, R package

For my to-do list: write a personalized R package.

http://hilaryparker.com/2014/04/29/writing-an-r-package-from-scratch/

Friday, November 21, 2014

MCL algorithm comparison

Brohee and van Helden 2006, BMC [BH06]

BH06 compared Markov Clustering (MCL), Restricted Neighborhood Search Clustering (RNSC), Super Paramagnetic Clustering (SPC), and Molecular Complex Detection (MCODE), using annotated protein complexes as a benchmark. Random noise was introduced by randomly adding and deleting edges. MCL was the most reliable and robust method.
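For reference, MCL alternates expansion (matrix powers of a column-stochastic matrix, which simulate longer random walks) and inflation (element-wise powers followed by column re-normalization, which strengthen strong flows and weaken weak ones). The sketch below is a minimal NumPy version of that idea, not the `mcl` program used in BH06; the inflation value 2.0, the added self-loops, and the attractor-based cluster read-out are assumptions chosen for illustration.

```python
import numpy as np


def mcl(adjacency, expansion=2, inflation=2.0, max_iter=100, tol=1e-9):
    """Minimal Markov Clustering (MCL) sketch for an undirected, unweighted graph."""
    a = np.asarray(adjacency, dtype=float) + np.eye(len(adjacency))  # add self-loops
    m = a / a.sum(axis=0, keepdims=True)  # column-stochastic random-walk matrix
    for _ in range(max_iter):
        previous = m
        m = np.linalg.matrix_power(m, expansion)  # expansion: flow along longer walks
        m = m ** inflation                        # inflation: boost strong flows
        m = m / m.sum(axis=0, keepdims=True)      # re-normalize each column
        if np.abs(m - previous).max() < tol:
            break
    # Each row that keeps mass is an attractor; its nonzero columns form one cluster.
    clusters = []
    for row in m:
        members = sorted(np.flatnonzero(row > 1e-6).tolist())
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
adjacency = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adjacency[i, j] = adjacency[j, i] = 1
print(mcl(adjacency))
```

Raising `inflation` generally yields more, smaller clusters; BH06's noise experiments could be mimicked by randomly adding and deleting entries of `adjacency` before calling `mcl`.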

Thursday, November 20, 2014

Atlanta-qbio google group (AQBIO)

https://groups.google.com/forum/#!forum/atlanta-qbio

http://groups.google.com/d/groupsettings/atlanta-qbio/information

atlanta-qbio@googlegroups.com

http://groups.google.com/d/managemembers/atlanta-qbio/invite

conference call on comp and systems biology

Lessons:
I cannot see all the participants, so it is sometimes hard to know who is talking.
When people are not talking, their microphones should be muted to reduce background noise.

Wednesday, November 19, 2014

bio386, proofreading of CNV manuscript

The students proofread the CNV manuscript.

The final take-home exam was given.

Tuesday, November 18, 2014

bio233, epidemiology,

I used an R-based simulation in class. The pace was fast, and about 2/3 of the students were not following. However, a few students clearly paid attention.

Problems:
1) Plots did not show in RStudio because of the screen resolution. I used PDF output to work around the problem.
2) I did not give students enough time to run the R code themselves, partly because I only required this for bonus points.

Monday, November 17, 2014

bio233, lab, analysis of flow cytometer data, Australian rabbit virus

Went over oral presentation order, final project report
Exam schedule

30 minutes, analysis of flow cytometer data,
http://youtu.be/BN5Ldu1AFgk

30 minutes, Australian rabbit virus
http://youtu.be/FqSDxGYu3K0

30 minutes, streak single colonies. I used my streak plate as an example of "A".

Thursday, November 13, 2014

bio233, phylogeny

I had bio233 students work through PowerPoint slides.

I stumbled on the comparison between the canonical endosymbiosis hypothesis and the hydrogen hypothesis for the origin of mitochondria, after a student asked about it.

Wednesday, November 12, 2014

bio386, R coursera

I led students through the R Programming course offered by Coursera.

Tuesday, November 11, 2014

bio233, virus, problem sets

I had students work through two sets of MCAT-style problems. It was the first time that we finished two reading passages in a 75-minute class.

Lotus Notes, initiate a new proposal for internal routing

http://spelmanosp.wordpress.com/2014/09/22/obtaining-approval-to-submit-a-grant-proposal/

Monday, November 10, 2014

bio233, streaking for single colonies

Six-streak procedure
1: short
2: twice
3: three times
4: 4 times.
5: many times
6: many times to evenly spread out the cells.

Usually the 4th streak should be thinned out.

Many students went back to the cell culture for every streak, even after 2-3 trials.

bio233, flow cytometer lab on DHE-labelled yeast cells.

The class spent 2 hours on DHE staining. I could save time by giving an assignment on the protocol itself.

When I had students re-streak their plates, some groups stopped working on their DHE staining procedure.

At 4 pm, I started CellQuest, but no signal could be read from the Calibur. This is the third time this machine has malfunctioned. Really bad timing.

Thursday, November 6, 2014

Flow cytometry, flow cytometer teaching resource

YouTube learning material on flow cytometry
Hand-drawing introduction.
Animated introduction
UW's tutorial on flow cytometry
Hong Qin's tutorial on BD FACS Calibur usage. Useful for understanding experimental procedure.

Reading materials on flow cytometry
Flow cytometry from Wikipedia

DHE, superoxide indicator
http://www.lifetechnologies.com/order/catalog/product/D1168

NIH, Flow Cytometry Interest Group

http://sigs.nih.gov/FCIG/Pages/default.aspx

Wednesday, November 5, 2014

BIO125, spring 2015 strain and data request,

AGY 75, yeast strain with pSH44 reporter plasmid
AGY125, yeast strain with the wild type pMSH2 and pSH44 (This is the wildtype MSH2 control)
AGY124, yeast strain with pRS413 and pSH44 (This is the plasmid control)

E. coli strains with plasmids
AG372 pmsh2-H658R
AG421 pmsh2-A618V

Read Gammie's recent papers.

Small NGS data of wildtype MSH2 and mutant msh2 for students to analyze using Galaxy

Tuesday, November 4, 2014

bio233, guest lecture, circulating tumor DNA

bio233 guest lecture

CAPP-seq

CT and biopsy are common methods for tumor diagnosis.

How does ctDNA come from the tumor?

There is a lot of cell-free DNA in human circulation, typically 5 ng/ml of plasma in healthy adults, primarily from hematopoietic cells. Cell-free DNA typically has a half-life of 0.5 to 2 hours.
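A quick worked example of what a 0.5 to 2 hour half-life implies, assuming simple first-order (exponential) clearance with no new release. This is back-of-the-envelope arithmetic, not a clearance model from the lecture; `fraction_remaining` is a hypothetical helper.

```python
def fraction_remaining(hours, half_life_hours):
    """Fraction of cfDNA left after `hours`, assuming first-order (exponential) clearance."""
    return 0.5 ** (hours / half_life_hours)


baseline_ng_per_ml = 5.0  # typical healthy-adult level quoted above
for half_life in (0.5, 2.0):
    left = fraction_remaining(2.0, half_life)
    print(f"half-life {half_life} h: {left:.4f} remains after 2 h "
          f"({baseline_ng_per_ml * left:.4f} of an initial 5 ng/ml)")
```

With a 0.5 h half-life only 1/16 of a pulse of cfDNA is left after 2 hours; with a 2 h half-life, half of it is left.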

Hybrid selection (NimbleGen), target enrichment.

10,000X sequencing is required?

Monday, November 3, 2014

bio233 phylogeny and lab, practical exam on streaking single colonies

I spent 1 hour introducing phylogeny using my own slides.

Many students did not bring laptops.

For the lab, MEGA6 on Mac runs very slowly.

For practical exam, some students did not see the previous streaking example clearly.

Sunday, November 2, 2014

SVM, reading notes

See http://hongqinlab.blogspot.com/2014/11/elements-of-statistical-learning-video.html

SVM kernel trick

trial and error to separate data in high-dimensional space

cross validation
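Cross-validation replaces trial and error when choosing SVM settings such as the cost and kernel parameters: each candidate setting is scored on held-out folds. A minimal sketch of the k-fold split itself in plain Python; `k_fold_indices` is a hypothetical helper, and fitting the SVM on each split is left out.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]   # k roughly equal folds
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


for train, test in k_fold_indices(10, k=5):
    print(len(train), len(test), sorted(test))
```

Each sample lands in exactly one test fold, so the averaged test error uses every observation once.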

predict True Negative?

Matthews correlation coefficient (MCC) (for binary classification)
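MCC uses all four cells of the confusion matrix, including true negatives, unlike precision or recall: MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), ranging from -1 to 1. A minimal sketch; returning 0 when a row or column of the confusion matrix is empty is a common convention, assumed here.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # undefined when a row or column of the confusion matrix is empty
    return (tp * tn - fp * fn) / denominator


print(matthews_corrcoef(tp=40, tn=45, fp=5, fn=10))
```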

In general the equation for a hyperplane in p dimensions has the form $\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p = 0$.

The support vector classifier maximizes a soft margin.

Data should be standardized for SVM analysis, because SVM treats every column the same way; otherwise, features measured on larger scales dominate the distances and the margin.
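A minimal NumPy sketch of the two normalizations mentioned here and in the ResearchGate comment: z-score (mean 0, standard deviation 1 per column) and min-max (range [0, 1] per column). The helper names and the toy data are hypothetical; rows are samples and columns are features.

```python
import numpy as np


def z_score(x):
    """Scale each column (feature) to mean 0 and standard deviation 1."""
    x = np.asarray(x, dtype=float)
    std = x.std(axis=0, ddof=1)
    std = np.where(std == 0, 1.0, std)  # leave constant columns unscaled
    return (x - x.mean(axis=0)) / std


def min_max(x):
    """Scale each column (feature) to the range [0, 1]."""
    x = np.asarray(x, dtype=float)
    low = x.min(axis=0)
    span = x.max(axis=0) - low
    span = np.where(span == 0, 1.0, span)  # leave constant columns unscaled
    return (x - low) / span


# The second column is on a roughly 100x larger scale than the first.
data = np.array([[1.0, 100.0], [2.0, 250.0], [3.0, 300.0], [4.0, 500.0]])
print(z_score(data))
print(min_max(data))
```

In a real analysis the column statistics should be computed on the training data and reused unchanged on the test data.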

On ResearchGate, someone suggests performing a normalization such as z-score or min-max before using PCA, and argues that z-score normalization before PCA might be beneficial.

For principal component analysis (PCA) and SVM:
http://www.softcomputing.net/isda2010_2.pdf
On ResearchGate: Principal components are linear combinations of the original variables x1, x2, etc. So when you run SVM on a PCA decomposition, you work with these combinations instead of the original variables.
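The ResearchGate point can be made concrete: each principal component score is a fixed linear combination of the centered original columns, and those scores are what the SVM would see. A minimal NumPy sketch via the singular value decomposition; `pca_scores` is a hypothetical helper, and the input is assumed to be standardized already if its columns have different units.

```python
import numpy as np


def pca_scores(x, n_components):
    """Return PC scores and loadings; each score column = centered data @ one loading vector."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    loadings = vt[:n_components]      # row k holds the weights on x1, x2, ... for PC k
    return centered @ loadings.T, loadings


rng = np.random.default_rng(0)
x = rng.normal(size=(50, 4))
x[:, 1] += 2 * x[:, 0]                # make two columns correlated
scores, loadings = pca_scores(x, n_components=2)
print(loadings.round(3))              # weights of the linear combinations
print(scores.shape)                   # (50, 2): SVM inputs instead of the 4 original columns
```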

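A support vector classifier in the enlarged space solves the separation problem in the lower-dimensional space: a linear boundary in the enlarged space corresponds to a nonlinear boundary in the original space.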

Question: A kernel is used to compute inner products of vectors. Why are there different types of kernels for computing the same thing (inner products)?
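One way to read the question: each kernel computes an inner product in a different enlarged feature space, so different kernels give different decision boundaries. A small check for the degree-2 polynomial kernel K(x, y) = (1 + x·y)^2, which equals the ordinary inner product of an explicit 6-dimensional feature map phi(x) for 2-D inputs; the SVM only ever needs K, never phi. The helper names are hypothetical.

```python
import math


def poly2_kernel(x, y):
    """Degree-2 polynomial kernel (1 + <x, y>)^2 for 2-D vectors."""
    return (1 + x[0] * y[0] + x[1] * y[1]) ** 2


def poly2_features(x):
    """Explicit feature map phi(x) whose inner product reproduces poly2_kernel."""
    r2 = math.sqrt(2)
    x1, x2 = x
    return [1.0, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


x, y = (1.0, 2.0), (3.0, -1.0)
print(poly2_kernel(x, y))                          # (1 + 1)^2 = 4.0
print(dot(poly2_features(x), poly2_features(y)))   # same value, up to rounding
```

A radial kernel corresponds to an infinite-dimensional feature space, which is why the kernel shortcut matters there.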

SVM for more than 2 classes:

MATLAB ODE solver

ode15s

fmincon

http://laser.cheng.cam.ac.uk/wiki/images/e/e5/NumMeth_Handout_7.pdf
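For comparison outside MATLAB, a rough Python counterpart of ode15s (a variable-order stiff solver) is SciPy's `solve_ivp` with `method="BDF"`, and a rough counterpart of fmincon for smooth constrained problems is `scipy.optimize.minimize` with `method="SLSQP"`. These are analogous methods, not the same algorithms; the stiff test equation and the toy optimization problem are assumptions chosen for illustration.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize


# Stiff ODE: y' = -1000 (y - cos t), y(0) = 0. After a fast transient, y follows cos(t).
def rhs(t, y):
    return -1000.0 * (y - np.cos(t))


sol = solve_ivp(rhs, (0.0, 1.0), [0.0], method="BDF", rtol=1e-6, atol=1e-9)
print(sol.success, sol.y[0, -1], np.cos(1.0))

# Constrained minimization: minimize (x - 1)^2 + (y - 2)^2 subject to x + y <= 1.
# SLSQP expects inequality constraints written as fun(v) >= 0.
res = minimize(
    lambda v: (v[0] - 1) ** 2 + (v[1] - 2) ** 2,
    x0=[0.0, 0.0],
    method="SLSQP",
    constraints=[{"type": "ineq", "fun": lambda v: 1 - v[0] - v[1]}],
)
print(res.x)  # close to [0, 1], the projection of (1, 2) onto x + y = 1
```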

PDE

Finite (numerical) differences to approximate PDEs
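A minimal finite-difference sketch, assuming the 1-D heat equation u_t = D u_xx on [0, 1] with u = 0 at both ends as the test PDE: the second derivative is replaced by the centered difference (u[i+1] - 2u[i] + u[i-1]) / dx^2, and time is stepped with forward Euler (the explicit FTCS scheme). It is only stable when D*dt/dx^2 <= 1/2; `heat_ftcs` is a hypothetical helper.

```python
import numpy as np


def heat_ftcs(u0, diffusivity, dx, dt, steps):
    """Explicit FTCS scheme for u_t = D u_xx with fixed (Dirichlet) end values."""
    r = diffusivity * dt / dx**2
    if r > 0.5:
        raise ValueError("unstable: need D * dt / dx**2 <= 0.5")
    u = np.array(u0, dtype=float)
    for _ in range(steps):
        # The right-hand side is evaluated from the old values before assignment.
        u[1:-1] = u[1:-1] + r * (u[2:] - 2 * u[1:-1] + u[:-2])
    return u


x = np.linspace(0.0, 1.0, 51)           # dx = 0.02
dt, steps = 1e-4, 1000                  # r = 0.25, final time t = 0.1
u = heat_ftcs(np.sin(np.pi * x), 1.0, x[1] - x[0], dt, steps)
exact = np.exp(-np.pi**2 * dt * steps) * np.sin(np.pi * x)
print(np.abs(u - exact).max())          # small discretization error
```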

polytopes and phylogeny

A (convex) polytope is the convex hull of a finite set of points.

http://en.wikipedia.org/wiki/Polytope
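A small illustration of the convex hull idea using SciPy's `ConvexHull` (a Qhull wrapper): interior points drop out, and only the extreme points remain as vertices of the polytope. The point set is a made-up example.

```python
import numpy as np
from scipy.spatial import ConvexHull

# Corners of the unit square plus two interior points.
points = np.array([[0, 0], [1, 0], [1, 1], [0, 1], [0.5, 0.5], [0.2, 0.7]])
hull = ConvexHull(points)
print(sorted(hull.vertices.tolist()))  # [0, 1, 2, 3]: the interior points are not vertices
print(hull.volume)                     # in 2-D, `volume` is the enclosed area
```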

tree -> matrix as markov process OR polytope

Hamiltonian system

http://en.wikipedia.org/wiki/Hamiltonian_system
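A minimal sketch of a Hamiltonian system, assuming the harmonic oscillator H(q, p) = (p^2 + q^2)/2 as the example: Hamilton's equations give dq/dt = dH/dp = p and dp/dt = -dH/dq = -q. A symplectic integrator keeps the energy H nearly constant over long runs, while plain explicit Euler lets it drift upward.

```python
def energy(q, p):
    """Hamiltonian H(q, p) = (p^2 + q^2) / 2 of the harmonic oscillator."""
    return 0.5 * (p * p + q * q)


def explicit_euler(q, p, dt, steps):
    for _ in range(steps):
        q, p = q + dt * p, p - dt * q   # both updates use the old (q, p)
    return q, p


def symplectic_euler(q, p, dt, steps):
    for _ in range(steps):
        p = p - dt * q                  # dp/dt = -dH/dq
        q = q + dt * p                  # dq/dt = dH/dp, using the updated p
    return q, p


q0, p0, dt, steps = 1.0, 0.0, 0.01, 10_000
print(energy(*explicit_euler(q0, p0, dt, steps)))    # drifts well above 0.5
print(energy(*symplectic_euler(q0, p0, dt, steps)))  # stays close to 0.5
```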

Saturday, November 1, 2014

funding 2014

PD 14-7513, due Feb 4, 2014
http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=504976&org=EHR&from=home#.Uo-EwOFRdH8.facebook

NIH big data
http://grants.nih.gov/grants/guide/rfa-files/RFA-HG-14-009.html

http://bd2k.nih.gov/funding_opportunities.html#sthash.hFOQ3jpE.dpbs

NIH diversity RFP
http://grants.nih.gov/grants/guide/pa-files/PAR-12-016.html expires in Jan 2015

Tajima's D, D_non, D_syn

Tajima's D = D_non + D_syn
additive

Seems to be used by Austin Hughes.
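For reference, a minimal sketch of the standard Tajima's D from the number of sequences n, the number of segregating sites S, and the mean number of pairwise differences pi, using Tajima's (1989) constants. Running it separately on nonsynonymous and synonymous sites is presumably how D_non and D_syn are obtained (an assumption); the additivity in the note above is not checked here. The constants make the denominator vanish for n < 4, hence the guard.

```python
import math


def tajimas_d(n, segregating_sites, mean_pairwise_diff):
    """Tajima's D for n sequences, S segregating sites and mean pairwise differences pi."""
    if n < 4 or segregating_sites == 0:
        raise ValueError("Tajima's D needs n >= 4 sequences and S > 0")
    s = segregating_sites
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (mean_pairwise_diff - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


print(tajimas_d(n=10, segregating_sites=20, mean_pairwise_diff=5.0))  # negative: pi < S / a1
```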

Cholera

http://cph.osu.edu/people/jtien

Cholera SIWR model

Seasonal variation will be used to further improve the model, modeled directly in the force of infection.
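A minimal SIWR sketch in the spirit of the model discussed (susceptible S, infected I, water W, recovered R, as fractions of the population), with the seasonal variation placed directly in the force of infection through the water-borne transmission rate. The exact equations of Tien's model, the cosine forcing, and all parameter values are assumptions for illustration, not fitted values.

```python
import numpy as np
from scipy.integrate import solve_ivp


def siwr_rhs(t, y, beta_i, beta_w0, amp, gamma, mu, alpha, xi, period=365.0):
    """SIWR right-hand side with seasonally forced water-borne transmission (assumed form)."""
    s, i, w, r = y
    beta_w = beta_w0 * (1.0 + amp * np.cos(2.0 * np.pi * t / period))  # seasonal forcing
    force = beta_i * i + beta_w * w     # force of infection: direct + water-borne
    ds = mu - force * s - mu * s        # births balance deaths (population fraction 1)
    di = force * s - (gamma + mu) * i
    dw = alpha * i - xi * w             # shedding into water, pathogen decay in water
    dr = gamma * i - mu * r
    return [ds, di, dw, dr]


params = dict(beta_i=0.25, beta_w0=0.01, amp=0.3, gamma=0.25,
              mu=1.0 / (50 * 365), alpha=1.0, xi=1.0 / 30)  # per-day rates, hypothetical
sol = solve_ivp(lambda t, y: siwr_rhs(t, y, **params), (0.0, 365.0),
                [0.999, 0.001, 0.0, 0.0], max_step=1.0)
print(sol.success, sol.y[1].max())  # peak infected fraction over one year
```

Because births equal deaths, S + I + R stays at 1, which is a useful sanity check on any implementation.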

19th-century sample
http://muttermuseum.org/

Haiti: no recorded history of cholera infection, so no pre-existing immune response.

Cholera spatial spread, waterways, human movement, cell phone movement (Digicel, Flowminder)

Moran's I to compare cell phone movement and cholera spread. Local movement versus waterways.
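Moran's I measures spatial autocorrelation of a quantity x across N locations with spatial weights w_ij: I = (N / W) * sum_ij w_ij (x_i - xbar)(x_j - xbar) / sum_i (x_i - xbar)^2, where W is the sum of all weights. A minimal NumPy sketch; in the cholera setting the weights could come from cell phone movement or from waterway connections (a hypothetical setup, not the analysis presented).

```python
import numpy as np


def morans_i(values, weights):
    """Global Moran's I for values at N locations and an N x N spatial weight matrix."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()
    n, total_weight = len(x), w.sum()
    return (n / total_weight) * (z @ w @ z) / (z @ z)


# Four locations along a line; neighboring locations share weight 1.
path = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
print(morans_i([1, 2, 3, 4], path))    # about 1/3: neighbors have similar values
print(morans_i([1, -1, 1, -1], path))  # about -1: neighbors alternate
```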

Community networks with environmental pathogen movement
patch heterogeneity
weighted directed edges in networks

When can disease invade the network? R0 of the network,

Coupled locations

Next generation matrix (second generation matrix; Diekmann, Heesterbeek and Metz 1990; van den Driessche and Watmough 2002).

Transfer matrix: V = transfers out - transfers in + decay.
A Laplacian matrix (from graph theory) can be used to model the transfers out and in,

i.e., $V = L + D$,

where $D = \mathrm{diag}(\delta_i)$.
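A minimal sketch of the next-generation-matrix calculation, assuming the simplest patch model: one infected class per patch, new infections F = diag(beta_i), movement rates m_ij from patch j to patch i, and V = L + D with L the column-sum Laplacian of the movement network. R0 is the spectral radius of F V^-1 (van den Driessche and Watmough 2002). The environmental (water) compartments of the cholera model are left out, and `network_r0` is a hypothetical helper.

```python
import numpy as np


def network_r0(beta, delta, movement):
    """R0 = spectral radius of F V^-1, with V = L + D (single infected class per patch)."""
    m = np.array(movement, dtype=float)
    np.fill_diagonal(m, 0.0)                 # m[i, j]: movement rate from patch j to patch i
    laplacian = np.diag(m.sum(axis=0)) - m   # transfers out (diagonal) minus transfers in
    v = laplacian + np.diag(delta)           # V = L + D, with D = diag(delta_i)
    f = np.diag(beta)                        # new infections arise within each patch
    next_gen = f @ np.linalg.inv(v)
    return float(max(abs(np.linalg.eigvals(next_gen))))


# Without movement, R0 is the largest patch-level beta_i / delta_i (approximately 2 here).
print(network_r0([2.0, 3.0], [1.0, 2.0], [[0, 0], [0, 0]]))
# Two identical patches coupled symmetrically: movement leaves R0 at approximately 2.
print(network_r0([2.0, 2.0], [1.0, 1.0], [[0, 0.5], [0.5, 0]]))
```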

Time scales have to be right.

$V^{-1}$ as a perturbation problem

Langenhop 1971, Laurent series for perturbed singular matrices

According to Tien, the lifespan of cholera bacteria was fitted with an exponential model in the lab. Later, Tien explained that fresh cholera bacteria have a high infection rate, so the fitness of cholera bacteria shows a characteristic of aging.

Wednesday, November 26, 2014

Chapter 1: Introduction (slides, playlist)

Chapter 2: Statistical Learning (slides, playlist)

Chapter 3: Linear Regression (slides, playlist)

Chapter 4: Classification (slides, playlist)

Chapter 5: Resampling Methods (slides, playlist)

Chapter 6: Linear Model Selection and Regularization (slides, playlist)

Chapter 7: Moving Beyond Linearity (slides, playlist)

Chapter 8: Tree-Based Methods (slides, playlist)

Chapter 9: Support Vector Machines (slides, playlist)

Chapter 10: Unsupervised Learning (slides, playlist)

Interviews (playlist)