Friday, November 28, 2014

LotusNotes Apple iPhone/iPad

2. A User Status section at the top of the home page shows the status of the user and any of the user's devices. Make sure that there are no error messages, which would be highlighted in red, in this section. If errors exist, they probably need to be addressed before synchronization will be successful.

3. Select Configure your Apple iPhone/iPod Touch

4. Select Generate.

5. Select Install to begin the profile installation process.

6. When prompted about the authenticity of the profile, select Install Now to continue to install the profile.

7. When prompted, enter your Lotus Notes webmail password and select Next.

8. When the profile has been installed, select Done to return to the previous application (e.g., Safari). Your new Lotus Notes ActiveSync account will have been created under Mail, Contacts, and Calendars in the Settings Application. Registration with the server begins immediately and mail, calendar, and contacts should begin to show up soon.

Wednesday, November 26, 2014

Elements of Statistical Learning, video, pdf, Hastie, Tibshirani, 2014

Copied from R-bloggers.

In January 2014, Stanford University professors Trevor Hastie and Rob Tibshirani (authors of the legendary Elements of Statistical Learning textbook) taught an online course based on their newest textbook, An Introduction to Statistical Learning with Applications in R (ISLR). I found it to be an excellent course in statistical learning (also known as “machine learning”), largely due to the high quality of both the textbook and the video lectures. And as an R user, it was extremely helpful that they included R code to demonstrate most of the techniques described in the book.
If you are new to machine learning (and even if you are not an R user), I highly recommend reading ISLR from cover-to-cover to gain both a theoretical and practical understanding of many important methods for regression and classification. It is available as a free PDF download from the authors’ website.
If you decide to attempt the exercises at the end of each chapter, there is a GitHub repository of solutions provided by students you can use to check your work.
As a supplement to the textbook, you may also want to watch the excellent course lecture videos (linked below), in which Dr. Hastie and Dr. Tibshirani discuss much of the material. In case you want to browse the lecture content, I’ve also linked to the PDF slides used in the videos.

Chapter 1: Introduction (slides, playlist)

Chapter 2: Statistical Learning (slides, playlist)

Chapter 3: Linear Regression (slides, playlist)

Chapter 4: Classification (slides, playlist)

Chapter 5: Resampling Methods (slides, playlist)

Chapter 6: Linear Model Selection and Regularization (slides, playlist)

Chapter 7: Moving Beyond Linearity (slides, playlist)

Chapter 8: Tree-Based Methods (slides, playlist)

Chapter 9: Support Vector Machines (slides, playlist)

Chapter 10: Unsupervised Learning (slides, playlist)

Interviews (playlist)


Tuesday, November 25, 2014

Algorithms and tools for protein–protein interaction networks clustering, with a special focus on population-based stochastic methods, Clara Pizzuti and Simona E. Rombo

  • Clara Pizzuti and Simona E. Rombo, Bioinformatics 2014.


    PR14 used three yeast PPI data sets to compare MCL with other methods. The MCL parameter was taken from Brohée and van Helden (2006). PR14 used protein complexes as the 'gold standard'. When the overlap score is > 20%, MCL is the best algorithm. Bader's MCODE is also a good method for certain parameter settings.

    useful references on teaching

    Active learning increases student performance in science, engineering, and mathematics
    Scott Freeman, Sarah L. Eddy, Miles McDonough, Michelle K. Smith, Nnadozie Okoroafor, Hannah Jordt, and Mary Pat Wenderoth, April 15, 2014

    Research-Based Learning Principles

    Monday, November 24, 2014

    E coli flow cytometer, PI staining

    Hawley & Hawley
    PI stain

    Wang, Li, Deng, Pan, BMC review on clustering methods for protein interaction networks.

    Recent advances in clustering methods for protein interaction networks
    Jianxin Wang, Min Li, Youping Deng, Yi Pan
    From The ISIBM International Joint Conference on Bioinformatics, Systems Biology and Intelligent
    Computing (IJCBS), Shanghai, China. 3-8 August 2009

    The review covers 20 clustering methods, including MCL, which it describes as highly successful.

    10. Brohée S, van Helden J: Evaluation of clustering algorithms for protein-protein interaction networks. BMC Bioinformatics 2006, 7:48.

    63. Vlasblom J, Wodak SJ: Markov clustering versus affinity propagation for
    the partitioning of protein interaction graphs. BMC Bioinformatics 2009,10:99.

    References on teaching math biology /systems biology

    Learning Biology by Recreating and Extending Mathematical Models
    Hillel J. Chiel, Jeffrey P. Gill, Jeffrey M. McManus, Kendrick M. Shaw

    Vision and Change

    Lin C, Cho Y-R, Hwang W-C, Pei P, and Zhang A. 2007. Clustering Methods in a Protein–Protein Interaction Network. In: Hu X, and Pan Y, eds. Knowledge Discovery in Bioinformatics: John Wiley & Sons, Inc., 319-355.

    Chuan Lin, Young-rae Cho, Woo-chang Hwang, Pengjun Pei, Aidong Zhang
    Department of Computer Science and Engineering
    State University of New York at Buffalo

    This review article did not provide enough details on validation and comparison of different algorithms.

    Sunday, November 23, 2014

    minimal version, R package

    For my todo list, write a personalized R package

    Friday, November 21, 2014

    MCL algorithm comparison

    Brohée and van Helden 2006, BMC Bioinformatics [BH06]
    BH06 compared Markov Clustering (MCL), Restricted Neighborhood Search Clustering (RNSC), Super Paramagnetic Clustering (SPC), and Molecular Complex Detection (MCODE), using annotated protein complexes as the benchmark. Random noise was introduced by randomly adding and deleting edges. MCL was the most reliable and robust method.
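For my own reference, here is a minimal sketch of the basic MCL iteration (expansion by matrix squaring, then inflation by an element-wise power with column re-normalization). It is a toy, dense, pure-Python version, not the implementation benchmarked in BH06; the function names, the default inflation of 2.0, the self-loops, and the small cutoff used to read out clusters are all my assumptions.

```python
def normalize_columns(m):
    """Scale each column so it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Toy Markov clustering on a small, dense adjacency matrix."""
    n = len(adjacency)
    # Add self-loops (an assumed, commonly used convention), then normalize.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        # Expansion: square the matrix (two-step random walk).
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
                    for i in range(n)]
        # Inflation: element-wise power, then re-normalize the columns.
        inflated = normalize_columns([[v ** inflation for v in row]
                                      for row in expanded])
        change = max(abs(inflated[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    # Read out clusters: the columns with non-negligible weight in each row.
    clusters = {frozenset(j for j in range(n) if m[i][j] > 1e-6)
                for i in range(n)}
    return sorted(sorted(c) for c in clusters if c)


# Two disconnected triangles: each triangle is its own cluster.
two_triangles = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(two_triangles))  # [[0, 1, 2], [3, 4, 5]]

# The same triangles joined by one edge (2-3); the result depends on inflation.
bridged = [row[:] for row in two_triangles]
bridged[2][3] = bridged[3][2] = 1
print(mcl(bridged))
```

Raising the inflation value generally gives finer clusters, which is why the choice of this parameter matters when comparing MCL against other methods.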

    Thursday, November 20, 2014

    Atlanta-qbio Google group (AQBIO)

    conference call on comp and systems biology

    I could not see all the participants, and it was sometimes hard to know who was talking.
    When people are not talking, their microphones should be muted to reduce background noise.

    Wednesday, November 19, 2014

    Tuesday, November 18, 2014

    bio233, epidemiology,

    I used an R-based simulation in class. The pace was fast, and about two-thirds of the students were not following. However, a few students clearly paid attention.

    1) The plot did not show in RStudio due to the screen resolution. I used PDF output to circumvent the problem.
    2) I did not give students enough time to run the R code themselves, partially because I only required this for bonus points.

    Monday, November 17, 2014

    bio233, lab, analysis of flow cytometer data, Australian rabbit virus

    Went over oral presentation order, final project report
    Exam schedule

    30 minutes, analysis of flow cytometer data,

    30 minutes, Australian rabbit virus

    30 minutes, streak single colonies. I used my streak plate as an example of "A".

    Thursday, November 13, 2014

    bio233, phylogeny

    I let the bio233 students work through the PowerPoint slides.

    I stumbled on the comparison between the canonical endosymbiosis hypothesis and the hydrogen hypothesis for mitochondria, after a student asked the question.

    Wednesday, November 12, 2014

    Tuesday, November 11, 2014

    bio233, virus, problem sets

    I let students work through 2 sets of MCAT-style problems. It was the first time that we finished two reading paragraphs in a 75-minute class.

    Lotus Notes, initiate a new proposal for internal routing

    Monday, November 10, 2014

    bio233, streaking for single colonies

    Six-streak procedure
    1: short
    2: twice
    3: three times
    4: four times
    5: many times
    6: many times, to spread the cells out evenly

    Usually the 4th streak should be thinned out.

    Many students went back to the cell cultures for every streak, and this even happened after 2-3 trials.

    bio233, flow cytometer lab on DHE-labelled yeast cells.

    The class spent 2 hours on DHE staining. I could save time by giving an assignment on the protocol itself.

    When I let students re-streak their plates, some groups stopped working on their DHE staining procedure.

    At 4 pm, I started CellQuest, but no signal could be read from the Calibur. This is the third time that this machine has malfunctioned. Really bad timing.

    Wednesday, November 5, 2014

    BIO125, spring 2015 strain and data request,

    AGY 75, yeast strain with pSH44 reporter plasmid
    AGY125, yeast strain with the wild type pMSH2 and pSH44 (This is the wildtype MSH2 control)
    AGY124, yeast strain with pRS413 and pSH44 (This is the plasmid control)

    E. coli strains with plasmids
    AG372  pmsh2-H658R
    AG421  pmsh2-A618V

    Read Gammie's recent papers.

    Small NGS data of wildtype MSH2 and mutant msh2 for students to analyze using Galaxy

    Tuesday, November 4, 2014

    bio233, guest lecture, circulating tumor DNA

    bio233 guest lecture


    CT and biopsy are common methods for tumor diagnosis.

    How does ctDNA come from the tumor?

    There is a lot of cell-free DNA in human circulation, typically 5 ng/ml of plasma in healthy adults, primarily from hematopoietic cells. Cell-free DNA often has a half-life of 0.5 to 2 hours.

    Hybrid selection (NimbleGen), target enrichment.

    Is 10,000X sequencing required?

    Monday, November 3, 2014

    bio233 phylogeny and lab, practical exam on streaking single colonies

    I spent 1 hour on introduction of phylogeny using my own slides.

    Many students did not bring laptops.

    For the lab, MEGA6 runs very slowly on Mac.

    For practical exam, some students did not see the previous streaking example clearly.

    Sunday, November 2, 2014

    SVM, reading notes


    SVM kernel trick

    trial and error to separate data in high-dimensional space

    cross validation

    predict True Negative?

    Matthews correlation coefficient (MCC) (for binary classification)
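A small sketch of MCC computed from the four cells of a binary confusion matrix, using the standard formula. The function name and the choice to return 0.0 when the denominator is zero are my own conventions. The example shows why true negatives matter: a classifier that never predicts the negative class can have high accuracy but an MCC of zero.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from true/false positive/negative counts of a binary classifier."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: report 0.0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


# 95 positives and 5 negatives, everything predicted positive:
# accuracy is 0.95, but no true negative is ever predicted.
accuracy = (95 + 0) / 100
print(accuracy, matthews_corrcoef(tp=95, tn=0, fp=5, fn=0))  # 0.95 0.0
```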

    In general the equation for a hyperplane has the form \beta_0 + \beta_1 X_1 + \beta_2 X_2 + ... + \beta_p X_p = 0

    SVM maximizes the soft margin.

    Data should be standardized for SVM analysis, because SVM treats every column the same.
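A minimal sketch of column-wise z-score standardization, so that a feature measured in large units does not dominate the distances an SVM relies on. The function name is mine, and using the sample standard deviation is an assumption; in practice the means and standard deviations should be estimated on the training data only.

```python
import statistics


def standardize_columns(rows):
    """Return rows with each column shifted to mean 0 and scaled to SD 1."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]  # sample SD (assumption)
    # A constant column (SD of 0) is mapped to 0.0 rather than dividing by zero.
    return [[(v - mu) / sd if sd else 0.0 for v, mu, sd in zip(row, means, sds)]
            for row in rows]


# Two features on very different scales end up with identical z-scores.
data = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
for row in standardize_columns(data):
    print(row)
```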

    On ResearchGate, someone argues: Perform a normalization such as Z-score or min-max before using PCA. Z-score normalization before using PCA might be beneficial.

    On principal component analysis (PCA) and SVM,
    On ResearchGate: Principal components are linear combinations of original variables x1, x2, etc. So when you do SVM on PCA decomposition you work with these combinations instead of original variables.

    A support vector classifier in the enlarged space solves the separation problem in the lower-dimensional space.

    Question: A kernel is used to compute inner products of vectors. Why are there different types of kernels for computing the same thing (inner products)?
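One way to look at the question above: different kernels compute inner products in different feature spaces, not the same inner product. The sketch below checks this for a homogeneous quadratic kernel on 2-D inputs, whose value equals the ordinary inner product after an explicit feature map. The function names and the example vectors are mine.

```python
import math


def dot(u, v):
    """Ordinary inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(x, z):
    """Inner product in the original space."""
    return dot(x, z)


def quadratic_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2."""
    return dot(x, z) ** 2


def quadratic_features(x):
    """Explicit feature map whose inner product equals quadratic_kernel (2-D input)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


x, z = (1.0, 2.0), (3.0, 4.0)
print(linear_kernel(x, z))                               # inner product in 2-D
print(quadratic_kernel(x, z))                            # computed without the feature map
print(dot(quadratic_features(x), quadratic_features(z))) # same value, in 3-D feature space
```

The kernel gives the enlarged-space inner product without ever building the enlarged vectors, so choosing a kernel amounts to choosing the enlarged space.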

    SVM for more than 2 classes:

    MATLAB ODE solver




    Numerical difference, approximate PDE

    polytopes and phylogeny

    a polytope is the convex hull of a finite set of points

    tree -> matrix as markov process  OR polytope

    Hamiltonian system

    Saturday, November 1, 2014

    funding 2014

    PD 14-7513, due Feb 4, 2014

    NIH big data

    NIH diversity RFP  expires in Jan 2015

    Tajima's D, D_non, D_syn

    Tajima's D = D_non + D_syn

    seems to be used by Austin Hughes


    Cholera SIWR model

    Seasonal variation will be used to further improve the model, built directly into the force of infection.

    19th-century sample

    Haiti: no recorded history of cholera infection; no immune responses.

    Cholera spatial spread, waterways, human movement, cell phone movement (Digicel, Flowminder)

    Moran's I to compare cell phone movement and cholera spread. Local movement versus waterways.
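A minimal sketch of global Moran's I using the standard formula, where `values` would be something like case counts per location and `weights` a spatial weight matrix (for example, derived from waterway connections or from cell phone movement). The function name and the toy chain of four locations are my assumptions, not data from the talk.

```python
def morans_i(values, weights):
    """Global Moran's I for one value per location and a weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations over all ordered pairs of locations.
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    variance_sum = sum(d * d for d in dev)
    return (n / total_weight) * (cross / variance_sum)


# Four locations in a chain; neighbors have weight 1.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(morans_i([1, 2, 3, 4], chain))    # smooth gradient: positive autocorrelation
print(morans_i([1, -1, 1, -1], chain))  # alternating values: negative autocorrelation
```

Computing I once with waterway-based weights and once with movement-based weights is one way to compare the two as explanations for the spatial pattern.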

    Community networks with environmental pathogen movement
     patch heterogeneity
     weighted directed edges in networks

    When can disease invade the network? R0 of the network

    Coupled locations

    Next generation matrix (second generation matrix): Diekmann, Heesterbeek, and Metz 1990; van den Driessche and Watmough 2002.

    Transfer matrix V = Transfers out - Transfers in + Decay
      The Laplacian matrix (from graph theory) can be used to model transfers out and in.

       i.e. V = L + D

    D = diag{\delta_i}
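A small sketch of how V = L + D could be assembled for a patch network with weighted directed edges. The orientation is my assumption (column j describes what leaves or decays in patch j), as are the function name and the 3-patch example rates; the talk's own convention may differ. With this orientation every column of L sums to zero, so the column sums of V are exactly the decay rates.

```python
def transfer_matrix(rates, decay):
    """Build V = L + D, where rates[j][i] is the transfer rate from patch j to patch i."""
    n = len(decay)
    v = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(n):
            if i != j:
                v[i][j] -= rates[j][i]  # transfer into patch i from patch j
                v[j][j] += rates[j][i]  # transfer out of patch j
        v[j][j] += decay[j]             # D = diag(delta_j)
    return v


# Hypothetical 3-patch cycle: 0 -> 1 -> 2 -> 0, with equal decay in each patch.
rates = [
    [0.0, 0.2, 0.0],
    [0.0, 0.0, 0.5],
    [0.1, 0.0, 0.0],
]
decay = [0.3, 0.3, 0.3]
V = transfer_matrix(rates, decay)
for row in V:
    print(row)

# Column sums of V recover the decay rates because the Laplacian part cancels.
print([sum(V[i][j] for i in range(3)) for j in range(3)])
```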

    The time scales have to be right

    V^-1 as a perturbation problem

     Langenhop 1971, Laurent series for perturbed singular matrices

    According to Tien, the lifespan of cholera bacteria was fitted with an exponential model in the lab. Later, Tien explained that fresh cholera bacteria have a high infection rate, so the fitness of cholera bacteria shows a characteristic of aging.