Monday, July 22, 2013

R tutorial for Oak Ridge Science Education


Please note that the entire workshop material can be downloaded from its GitHub repository, ORAU-R.

1. What is R?

Wikipedia entry on R

Why R by Courtney Brown at Emory. 

Why R and beyond.

R-bloggers, which provides recent and often interesting developments about R.

What is R video (Added after the class).

2. Install R on your own computer.

Instructions to download R.

Install RStudio. RStudio provides a nice GUI for R.

Install packages in R: video for the Windows version.

3. Introduction to R.


Hong Qin's slides: Overview of R;   Basic programming in R; Input & Output in R;

Lyndon Walker, Getting started with R: an accelerated primer


4. Simple exercises in R.

Qin's simple.R exercises. Please select 'Raw' and 'save-as' in text format. Unfortunately, web browsers seem to automatically add 'txt' to the file name, and RStudio does not run the 'txt' format. To solve this problem, we can create a new R script and copy and paste the code from the text file.

(The workshop audience mostly reached this step).

YouTube tutorial on converting Excel data to CSV and loading it into R.
    The sunflower seed Excel file is here.
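For readers who prefer text to video, here is a minimal sketch of the loading step, assuming the spreadsheet has already been saved from Excel as CSV. The table, its column names (`sample`, `length_mm`), and its values are placeholders I made up for illustration; they are not the sunflower seed data. The sketch writes the toy table to a temporary CSV first so that it runs on its own.

```r
# Toy table standing in for a sheet exported from Excel as CSV.
# Column names and values are placeholders, not the workshop data.
seeds <- data.frame(
  sample    = c("a", "b", "c"),
  length_mm = c(10.1, 11.4, 9.8)
)

# Write it to a temporary CSV file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
write.csv(seeds, csv_path, row.names = FALSE)

# Load the CSV into R, as you would with a file saved from Excel.
loaded <- read.csv(csv_path, stringsAsFactors = FALSE)

str(loaded)             # inspect column names and types
mean(loaded$length_mm)  # a first summary of one numeric column
```

With a real file, replace `csv_path` with the path to your own CSV, for example one chosen interactively with `file.choose()`.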

5. Make-solution exercise.

Write an R function to calculate how much NaCl is needed for X ml of a Y mM NaCl solution.
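One possible solution is sketched below; it is not necessarily the version used in the workshop. It assumes the answer is wanted in grams and uses a molar mass of 58.44 g/mol for NaCl. The function name `nacl_grams` and its argument names are my own choices for illustration.

```r
# Grams of NaCl needed to make `volume_ml` millilitres of a `conc_mM` millimolar solution.
# Assumes a molar mass of 58.44 g/mol for NaCl (override via `molar_mass` for other solutes).
nacl_grams <- function(volume_ml, conc_mM, molar_mass = 58.44) {
  liters <- volume_ml / 1000   # ml -> L
  molar  <- conc_mM / 1000     # mM -> mol/L
  liters * molar * molar_mass  # L * mol/L * g/mol = g
}

# Example: 500 ml of 150 mM NaCl
nacl_grams(500, 150)
```

The example works out to 0.5 L x 0.15 mol/L x 58.44 g/mol = 4.383 g. Because the arithmetic is vectorized, passing a vector of volumes or concentrations returns one answer per element.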

6. Simple statistical analysis in R.

Simple regression exercise.
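As a rough idea of what a simple regression looks like in R, here is a minimal sketch. It uses the built-in `cars` data set (stopping distance versus speed) as a stand-in; the workshop exercise itself may use different data.

```r
# Fit a simple linear regression: stopping distance as a function of speed.
# `cars` ships with base R; it is a stand-in, not the workshop data.
fit <- lm(dist ~ speed, data = cars)

# Intercept and slope of the fitted line.
coefs <- coef(fit)
print(coefs)

# Standard errors, p-values, and R-squared.
print(summary(fit))

# Predicted stopping distance at a speed of 20 (same units as the data).
pred <- predict(fit, newdata = data.frame(speed = 20))
print(pred)
```

Calling `plot(cars)` followed by `abline(fit)` draws the data with the fitted line on top.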

7. Advanced training materials.

Multiple regression demo

Hierarchical clustering using cities. Code, Video.

Lady Gaga and clustering analysis. Code, Video.

Bioconductor workshop materials.
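For the hierarchical clustering demos listed above, the core steps can be sketched as follows. This is not the linked code; it assumes the built-in `UScitiesD` distance matrix (distances between 10 US cities) as example data, and the choice of average linkage and three groups is arbitrary.

```r
# `UScitiesD` ships with base R as a "dist" object of distances between 10 US cities.
# It is used here as an assumed stand-in for the cities data in the linked demo.
hc <- hclust(UScitiesD, method = "average")

# Cut the tree into 3 groups (an arbitrary choice for illustration).
groups <- cutree(hc, k = 3)
print(groups)

# Cities listed by group.
print(split(names(groups), groups))
```

Calling `plot(hc)` draws the dendrogram, which is usually the easiest way to decide how many groups make sense.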




http://cran.r-project.org/doc/contrib/Seefeld_StatsRBio.pdf



9. Discussions on programming workshops.

Titus Brown's blog on teaching programming.

10. My reflections

The main workshop was led by Jame Ferguson and Bob Panoff. There were 9 college biology and 2 high-school biology faculty in the workshop.

I was given 90 minutes. My goal was for the audience to be able to install R and RStudio and run R scripts on their own after the workshop.

I spent 30 minutes introducing R to the audience and describing my experience of teaching R and computational genomics to undergraduates. I showed them the GEO database and its R interface. I used the examples of make-solution, hclust on cities, and Lady Gaga. For the next 50 minutes, I let the audience download and install RStudio. A few of them needed my help to download and install R and RStudio. Most of them were able to run the simple.R exercise (step 4.a). I ran out of time after step 6.

I was somewhat stunned by how cumbersome it is to download R code directly from a GitHub repository.

During the exercise time, a few people were clearly ahead and poked around. Some were especially interested in the GEO2R portal.

Bob mentioned that R has been used in a few other liberal arts colleges, including Davidson and Pomona.

At the end of the workshop, I was asked "why do 'we' have to teach R to biology students?". I used my own experiences and argued that R is the state-of-the-art tool for data analysis in biology.


Preparation for the workshop:
Wireless connection, laptops, and power outlets are recommended.
Install packages and data on flash drives in case the internet connection is slow. (This can be a problem when all participants are downloading at the same time.)
A flowchart on an easel can be used for clarity.

