Saturday, January 31, 2015

REU comparisons

Wustl, $1200 for housing or $500 for travel, on-campus housing $1300 for 11 weeks

http://reu.cse.wustl.edu/reu/FAQ.html

Friday, January 30, 2015

todo, SVM project. find out the support vectors. verify the predictions in ken rls database.

SVM project. find out the support vectors. verify the predictions in ken rls database.

This project could take a long time

Sequence exercise in R, occurrence of DNA words

R code exercise on occurrence of DNA words.

Learning outcomes:
Longer words should occur less often in DNA.
Longer restriction enzyme sites should occur less frequently in DNA.

Reference
http://a-little-book-of-r-for-bioinformatics.readthedocs.org/en/latest/src/chapter1.html
http://www.bioconductor.org/packages/release/bioc/html/REDseq.html

# Exercise to study how the occurrence of DNA words is influenced by their length.
# What are the occurrences of 1-letter, 2-letter, 3-letter, ... 8-letter DNA words?
# Learning outcome: longer words should occur less often in DNA
# by Hong Qin, Jan 30, 2015, for Bio125 @ Spelman College

library("seqinr");

# read in some bacterial 16s rDNA sequences
seqs = read.fasta( "http://www.bioinformatics.org/ctls/download/data/16srDNA.fasta",seqtype="DNA");

# look at the first sequence
seq1 = seqs[[1]]
count(seq1, 1) #nucleotide composition
mean( count(seq1, 1) )

count(seq1, 2) # occurrence of 2-letter DNA words
mean( count(seq1, 2) )

count(seq1, 3) # occurrence of 3-letter DNA words
mean( count(seq1, 3) )
results = count(seq1, 3)
results['agc']

# ?? # occurrence of 4-letter DNA words?
# ? # occurrence of 5-letter DNA words
# ? # occurrence of 6-letter DNA words

count(seq1, 8) # occurrence of 8-letter DNA words
mean( count(seq1, 8) )
median( count(seq1, 8) )
max( count(seq1, 8) )
hist(count(seq1, 8), breaks=30)

results = count(seq1, 8)
results['agccgacc']
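
To see the trend across all word lengths at once, the counts can be computed in a loop. The following is a minimal sketch in base R, assuming a simulated random sequence in place of the 16S rDNA data so that it runs without seqinr or a network connection. `count_words` is a hypothetical helper, not a seqinr function; with seqinr loaded, `count(seq1, k)` could be used inside the loop instead.

```r
# Sketch: how the mean occurrence of DNA words changes with word length.
# Assumption: a random 1500-nt sequence stands in for a real 16S rDNA sequence.
set.seed(2015)
dna <- sample(c("a", "c", "g", "t"), size = 1500, replace = TRUE)

# Hypothetical helper: summarize overlapping words of length k.
# The mean is taken over all 4^k possible words, including those with count 0.
count_words <- function(s, k) {
  n_words <- length(s) - k + 1
  words <- vapply(seq_len(n_words),
                  function(i) paste(s[i:(i + k - 1)], collapse = ""),
                  character(1))
  observed <- table(words)
  c(mean = n_words / 4^k, max = max(observed), distinct = length(observed))
}

word_lengths <- 1:8
summary_table <- t(sapply(word_lengths, function(k) count_words(dna, k)))
rownames(summary_table) <- paste0(word_lengths, "-letter")
print(summary_table)
```

For a sequence of n letters, the mean count of a k-letter word is about n / 4^k, which is why an 8-letter site is expected far less often than a 4-letter one.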

Instructional Technology request on Lotus Notes

To request adding a student TA to a Moodle course.

The following steps only worked on Lotus Notes 9 on Windows. (They did not work on Lotus Notes 8 on my Apple laptop.)

To Submit an Instructional Technology Request:

1. From the Lotus Notes Dashboard select MIT Requests in Category section
2. Select MIT Requests in the Applications section in the right column
3. Click on the Submit Requests folder
4. Click on Service Request
5. Complete the request
6. Click Submit

You can also email your support request to the Service Desk at help@spelman.edu.

For Moodle course request:

Locate the MIT Request category
Select Instructional Technology Request form
Click Open Selected App button
Open Submit Request folder
Click Moodle Course link
Complete form as instructed

For Moodle User access

Thursday, January 29, 2015

Bizhub c754 to osX 10.9.1 laptop

http://onyxweb.mykonicaminolta.com/OneStopProductSupport/SearchResults?products=1603&fileTypes=0&OSs=39

Gram stain lab

Preparation:

To prepare fresh bacteria, I only need to focus on the Gram-positive ones, because the age of Gram-negative bacteria does not influence the Gram stain outcome.

Note that Bacillus subtilis can take 2 days of 30C incubation to form colonies.

Materials:
Crystal violet, 95% EtOH, Gram Iodine, Safranin.

Problems: Some of the 95% EtOH was contaminated by iodine, and the rubber was oxidized and cracked.

Georgia Academy of Science

James A Nienow, Treasurer.

todo: PCA analysis to virus data

Data driven investigator, Moore

http://ged.msu.edu/downloads/2014-moore-ddd-preapp.pdf

http://www.moore.org/programs/science/data-driven-discovery/ddd-investigators

Eventbrite

"release tickets" on waiting list:

bio233, epidemiology,

print london maps
students scatter play dough on the map

Clicker presenter card usage

Ref
www.turningtechnologies.com/pdf/UserGuides/PresenterCard_1.1.pdf

2015: Skipped clickers; adopted Socrative and Moodle online tests.

The application of statistical physics to evolutionary biology, Sella and Hirsh, 2005 PNAS


This is quite an influential paper in many respects.

It seems to provide a theoretical explanation for the multiplicative and additive fitness measures.

Some of the words that I found interesting are:
Energy is an additive quantity in physical systems, so it is not surprising that its counterpart in the evolutionary process should be the additive fitness.

Maximization of free fitness is precisely analogous to the second law of thermodynamics.

Organismal complexity measure, Tenaillon et al 2007

Quantifying Organismal Complexity using a Population Genetic Approach
PLoS ONE 2007
Olivier Tenaillon, Olin K. Silander, Jean-Philippe Uzan, Lin Chao

College Learning Assessment

http://en.wikipedia.org/wiki/Collegiate_Learning_Assessment#Criticisms

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.134.383&rep=rep1&type=pdf&utm_source=buffer&utm_campaign=Buffer&utm_content=buffer5a240&utm_medium=twitter

bio125, Thu, 20150129, mini prep, plasmid map, ApE

Section 1

8-8:20am. review mutation assignment
DNA mutation assignment, Wikipedia is wrong.

review mini prep protocol ApE

absorption spectrum of DNA and protein

Endo wash is to improve transformation efficiency (it removes endotoxin and endonuclease. Though this is probably not a problem for yeast, Kioko found it improves the 260/280 ratio).

Student asked why the elution buffer is used as blank control to measure DNA concentration.

9am. mini prep lab started.
by 10am. Students finished the first round of centrifugation (binding DNA to Zyppy columns)

11am. Some students came back from NanoDrop measurement. DNA yields and 260/280 ratios were good.
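
The role of the blank and of the 260/280 ratio can be shown with a short calculation. This is a minimal sketch, assuming a 1 cm path length and the common rule of thumb that one A260 unit of double-stranded DNA corresponds to about 50 ng/ul; the readings are made-up numbers, not class data, and `dna_summary` is a hypothetical helper. The blank is subtracted because the elution buffer itself may absorb slightly, and only the DNA contribution should be counted.

```r
# Sketch: dsDNA concentration and purity from absorbance readings.
# Assumptions: 1 cm path length; 1 A260 unit of dsDNA ~ 50 ng/ul;
# the readings below are made-up numbers, not class data.
dna_summary <- function(a260, a280, blank260 = 0, blank280 = 0, dilution = 1) {
  a260_net <- a260 - blank260   # subtract the elution-buffer blank
  a280_net <- a280 - blank280
  list(
    conc_ng_per_ul = a260_net * 50 * dilution,
    ratio_260_280  = a260_net / a280_net   # ~1.8 is usually taken as clean DNA
  )
}

res <- dna_summary(a260 = 1.05, a280 = 0.60, blank260 = 0.05, blank280 = 0.05)
print(res)
```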

Problems:
Students were not sure whether they should change to new tips every time!
One student held the micropipette with the tip upside down.
Most students do not know how to resuspend properly. In large Falcon tubes, Kioko prefers to pipette up and down.
Many students are not clear when lysis is done.
Many students did not use racks when they picked tubes from the centrifuge.
Students did not realize there were two microcentrifuges in room 351.

Tips: Kioko gave TE, lysis, neutralization buffer, wash buffer, elution buffer separately but in order, to minimize tube mixups. (This seems to have slowed things down, but avoided chaos.)
Kioko asked students to do one additional spin to dry the columns.

Section 2:
15 min review preclass lab assignment
30 min, students went through protocol
10 minutes, show video from morning section

by 1:55 pm, miniprep started.

by 3:09 pm, most groups finished eluting plasmid DNA. Students went to 2nd floor core facility for nanodrop DNA concentration measurement.

by 4pm. All groups finished nanodrop measurement. So, nanodrop took 1 hour.

Problems:
Similar problems to section 1.
A group arranged 15ml tubes in unbalanced positions.

Concerns:
Kioko thinks the elution buffer should be increased to 50 ul, given that we used 5 ml of E. coli culture.

Tuesday, January 27, 2015

bio125, Jan 27, Tue, central dogma, dna repair, msh2 overview

Section 1:

Go over assignment and Student presentations

=>Repair problem #4 not right. #5 not right?

8:38am replication 3 in class, drawing

bottom 2, helicase or topoisomerase?

9:15am, mutation, HSV paper

http://www.ncbi.nlm.nih.gov/pubmed/20026654 ATM S1981 reference

9:15-10am, student demo

ApE on gene X preclass assignment

10-10:10am Rstudio usage on standard curves, led by a student

math4

10:15, msh2 video presentation of spring 2014.

Section 2:

Go over assignment and Student presentations

=>mutation, #4 #5

by 1:50 pm, finished NSV1 problem set in class

by 2pm, finished replication picture quiz, (This seems really helpful).

2:15pm, standard curve

2:43pm, ApE

by 3pm, reviewed math 4 assignment

3pm, show msh2 student presentation video:

Ask students to identify gene, cancer, subtype, mechanism, model organism.

===============

pre-class: central dogma, DNA repair, MSH2 overview (abstract reading)

In-class:
(1) Replication (in class, drawings-mutation),
(2) mutation, HSV paper

(3) Math problem 5, standard curve

(4) MCAT

Running R code on linear regression, generating plot and save figures

ApE usage

replication assignment 3

mutation&repair assignment 1

MCAT problem set on DNA replication

Review lab report

Math problem review

MSH2 project overview, Socrative quiz using old problems

http://highered.mheducation.com/sites/007353224x/student_view0/chapter11/index.html

Sunday, January 25, 2015

toread, KEGG GO analysis

http://www.ncbi.nlm.nih.gov/pubmed/25207935

NCBI GEO resources

http://bcb.io/2010/01/02/automated-retrieval-of-expression-data-with-python-and-r/

http://www.ncbi.nlm.nih.gov/geo/info/geo_paccess.html

Saturday, January 24, 2015

toread, TOR and essential amino acid

http://www.ncbi.nlm.nih.gov/pubmed/24861087

Thursday, January 22, 2015

bio125, Thu, bradford protein concentration determination

Section 1:
8am. Announcement:
office hours, wed 1-5pm.
Books on SpelElearn common site
Student demo, recording
How to use pipette
Go over serial dilution protocol.

Students were asked to figure out a way to perform the experiment on their own, only asking for help when they are 'challenged'.

by 8:30, students finished the protocol presentation and demonstrated pipette usage.
I then went over R code on standard curve preparation and analysis. I explained that sample code will always be provided to students in bio125.
R and Rstudio demo for data analysis

I explained how to save pictures in Rstudio
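
Besides the Export menu in the RStudio plot pane, a figure can be saved from code by opening a graphics device, plotting, and closing the device. This is a minimal sketch with made-up data and a temporary file path; `pdf()` is used here, and `png()` works the same way with sizes given in pixels.

```r
# Sketch: save a plot to a file from code instead of the RStudio Export menu.
# Assumption: the data are made up; the file goes to a temporary directory.
conc <- c(0, 0.25, 0.5, 0.75, 1.0)        # hypothetical BSA standards, mg/ml
a595 <- c(0.05, 0.18, 0.30, 0.43, 0.55)   # hypothetical absorbance readings

fig_file <- file.path(tempdir(), "standard_curve.pdf")
pdf(fig_file, width = 6, height = 4.5)    # open the file as a graphics device
plot(conc, a595, xlab = "BSA (mg/ml)", ylab = "A595", main = "Standard curve")
abline(lm(a595 ~ conc))                   # add the fitted line
dev.off()                                 # close the device to write the file
print(fig_file)
```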

Serial dilution
Bradford protein concentration determination

9:10am. Lab instructor started the lab. BSA stock in Eppendorf tubes. Unknown samples are labeled by numbers.

Some students were still working at 10:45am.

Problems: Students are not clear about 5X and 1X. This convention was not explained in the protocol. (Wang said 5X is in Math4)
Students were not sure how to label cuvettes.
The wrong wavelength was used by one student.
Wrong orientation of cuvette in spec: A group measured OD by putting the cuvette sideways.
Some BSA stock does not have glycerol. Kioko thought protein may fall out of solution and lead to low concentration.
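
The 5X and 1X convention in the first problem above comes down to C1 * V1 = C2 * V2: a 5X stock is five times more concentrated than the 1X working solution. A minimal sketch, where `dilute` is a hypothetical helper and the volumes are in whatever unit is passed in:

```r
# Sketch: volumes needed to dilute an "NX" stock to a working concentration.
# C1 * V1 = C2 * V2, so V1 = C2 * V2 / C1.
dilute <- function(stock_x, final_x, final_volume) {
  stopifnot(stock_x >= final_x, final_x > 0)
  stock_volume <- final_x * final_volume / stock_x
  c(stock = stock_volume, diluent = final_volume - stock_volume)
}

# 10 ml of 1X from a 5X stock: 2 ml stock plus 8 ml diluent
print(dilute(stock_x = 5, final_x = 1, final_volume = 10))
```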

Many students had trouble figuring out the concentration of the original Unknown solution. I used the following figure to explain it to them.
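
The same back-calculation can be sketched in R: fit the standard curve, read the diluted unknown off the line, then multiply by the dilution factor. All numbers below are made up and exactly linear for clarity; they are not the class data.

```r
# Sketch: estimate an unknown protein concentration from a Bradford standard curve.
# Assumption: all numbers are made up and exactly linear for clarity.
std_conc <- c(0, 0.25, 0.5, 0.75, 1.0)   # BSA standards, mg/ml
std_a595 <- 0.05 + 0.5 * std_conc        # idealized absorbance readings

fit <- lm(std_a595 ~ std_conc)
intercept <- unname(coef(fit)[1])
slope     <- unname(coef(fit)[2])

unknown_a595    <- 0.35   # reading of the diluted unknown
dilution_factor <- 10     # the unknown was diluted 1:10 before staining

conc_in_cuvette <- (unknown_a595 - intercept) / slope   # diluted sample
conc_original   <- conc_in_cuvette * dilution_factor    # original stock
print(c(in_cuvette = conc_in_cuvette, original = conc_original))
```

The reading is only trustworthy if it falls within the range of the standards (here 0.05 to 0.55).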

Kioko: Bradford staining is irreversible. In other words, stained over-concentrated proteins in the cuvette cannot be diluted. So, if the measurement overshoots the standard curve, students have to dilute from the concentrated protein stock again to make it land within the range.

Kioko: Add Bradford stock as the last solution to the tube, ensuring the staining reaction occurs to the same extent.

Kioko also said the students did the serial dilution experiment in bio120.

skipped: Linear fitting, R2 and p-value

Review homework and assignment. No time, leave for next class.

Section 2:
1/3 students did not read protocol or finish quiz for the lab.

Spelman Factbook

http://www.spelman.edu/docs/FactBook/facts-figsbook_103114.pdf?sfvrsn=2

Tuesday, January 20, 2015

toread, single cell genome and transcriptome sequencing

http://www.nature.com/nbt/journal/vaop/ncurrent/full/nbt.3129.html

Target of rapamycin signalling mediates the lifespan-extending effects of dietary restriction by essential amino acid alteration.

Emran S, Yang M, He X, Zandveld J, Piper MD.

Abstract

Dietary restriction (DR), defined as a moderate reduction in food intake short of malnutrition, has been shown to extend healthy lifespan in a diverse range of organisms, from yeast to primates. Reduced signalling through the insulin/IGF-like (IIS) and Target of Rapamycin (TOR) signalling pathways also extend lifespan. In Drosophila melanogaster the lifespan benefits of DR can be reproduced by modulating only the essential amino acids in yeast based food. Here, we show that pharmacological downregulation of TOR signalling, but not reduced IIS, modulates the lifespan response to DR by amino acid alteration. Of the physiological responses flies exhibit upon DR, only increased body fat and decreased heat stress resistance phenotypes correlated with longevity via reduced TOR signalling. These data indicate that lowered dietary amino acids promote longevity via TOR, not by enhanced resistance to molecular damage, but through modified physiological conditions that favour fat accumulation.

Ubuntu LTS 14.04 installation and basic usage

Install Ubuntu LTS 14.04 on an old Dell laptop running LTS 12.

Ubuntu LTS 14.04 was written to a DVD.

The installation asked for an internet connection, but gave the option to upgrade LTS v12. I was asked to enter a username and password. Presumably the old directory will be overwritten? (No, this was not the case. The old directory was kept.)

Near the end of the installation, a warning said that some applications need to be reinstalled. Then it asked for a restart.

Somehow, the language was set to Chinese. I figured out that "shift" can switch between language input modes.

Georgia Tech Campus map

Guest parking areas are labeled as Area 2, 3, and 4.

bio125, Tue, Jan 20, 2015, DNA structure, replication

_Camcorder to record student performance.

Section 1:
8-8:20am

_Go over student assignments

8:20-9am, do exercise in class on nucleotide structure, chromosome, and replication.

One student asked the difference between nucleosome and chromosome. I used a cable and some balls made from playdough to illustrate the nucleosome made of histone complex wrapped by DNA strands.

section 1, 9-9:30
_NCBI nucleotide database, video capture

mRNA (Kozak sequence, translation initiation)

section 1: 9:30-10:00

_ApE

_ApE, CDS, reverse complementation
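
Reverse complementation, which ApE does from its menu, can also be checked in a few lines of base R. This is a minimal sketch; `reverse_complement` is a hypothetical helper that handles only the letters A, C, G, T in either case.

```r
# Sketch: reverse complement of a DNA string in base R.
reverse_complement <- function(dna) {
  complemented <- chartr("ACGTacgt", "TGCAtgca", dna)       # swap each base
  paste(rev(strsplit(complemented, "")[[1]]), collapse = "") # reverse the order
}

print(reverse_complement("ATGGCC"))
```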

section 1, 10am
_R and Rstudio for the assignment,
_simple R demo code

_make solution demo code

Lessons:
Youtube video was cropped off the top section during iMovie editing.

What worked:
I screencast many lectures and demos and uploaded them in time for section 2.

Section 2:
30 min:
_Go over student assignments. Several groups in this section had problems with math problems.

_NCBI nucleotide database

1:45-2:13, let students help each other.
_ApE
2:13pm-2:20

_ApE, CDS, reverse complementation

by 2:30pm

_ build nucleotide exercise

2:30-3:20pm
_R and Rstudio for the assignment,
_simple R demo code

_make solution demo code

One student asked about "#" in the R code. Some students asked about the parentheses.

3:20, ask students to summarize the class.

For homework assignment: nucleotide structure, chromosome, and replication.

Lessons:

Some students had trouble downloading RStudio.

Not used:
_Central dogma review (concept map, group activity)
_Prepare R and Rstudio on flashdrives

_DNA structure and replication, with MSH2's role mentioned briefly

human genome: 3 billion DNA bases
yeast, how big?

Brooker chapter 11 slides

Go over Brooker questions in class

Past student presentations on MSH2 project
Past project poster

Note, in 2015 spring, students are said to have learned micro pipette usage in BIO120.

Optional: Serial dilution exercise using colored papers and petri dishes

skipped: DNA double struck in SPDB, based bio233 materials on DNA.

Monday, January 19, 2015

Emerging researcher meeting

http://www.emerging-researchers.org/
Good meeting for students.

I did not see specific requirements for the poster.

Todo: Need to add an introduction of yeast aging. SVM diagram.

R/Rstudio tutorial page for BIO125, Spring 2015

This is a dynamic page and will change frequently during Spring semester of 2015.

1. What is R?

Wikipedia entry on R

Why R by Courtney Brown at Emory.

Why R and beyond.

R blogger that provides recent and often interesting development about R.

What is R video (Added after the class).

2. Install R to your own computers.

Instructions to download R.

Install R studio. RStudio provides a nice GUI to R.

Install packages to R: Video for Windows Version.

3. Introduction to R.

Stephen J. Eglen's short intro slides.

Hong Qin's slides: Overview of R; Basic programming in R; Input & Output in R;

Lydon Walker, getting started with R, an accelerated primer

4. Simple exercises in R.

Youtube tutorial converting Excel data to CSV and load into R.
The sunflower seed Excel file is here.

5. Make solution exercise.

Write an R function to calculate how much NaCl is needed for X ml of a Y mM NaCl solution.
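
One possible answer is sketched below, assuming a molar mass of 58.44 g/mol for NaCl; `nacl_grams` and its argument names are illustrative choices, not a required solution.

```r
# Sketch: grams of NaCl needed for X ml of a Y mM solution.
# Assumption: molar mass of NaCl taken as 58.44 g/mol.
nacl_grams <- function(volume_ml, conc_mM, molar_mass = 58.44) {
  liters <- volume_ml / 1000           # ml -> L
  moles_per_liter <- conc_mM / 1000    # mM -> M
  liters * moles_per_liter * molar_mass
}

# 500 ml of 150 mM NaCl
print(nacl_grams(volume_ml = 500, conc_mM = 150))
```

Passing a different `molar_mass` reuses the same function for other solutes.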

6. Simple statistical analysis in R.

Simple regression exercise.

7. Advanced training materials.

Multiple regression demo

Hierarchical clustering using cities. Code, Video.

Lady Gaga and clustering analysis. Code. Video.

Bioconductor workshop materials.

8. Useful R materials.

Hong Qin's R and computational biology GitHub site

Stephen J. Eglen's PLOS article. A quick guide to teaching R programming to computational biology students.

http://cran.r-project.org/doc/contrib/Seefeld_StatsRBio.pdf

http://www.rseek.org.

http://addictedtor.free.fr/graphiques/.

Newly launched interactive cloud-based R for teaching: http://www.datamind.org/

Google-developers R programming videos

Avril Coghlan's little book of R for bioinformatics.

Github's collection of free R books

R by examples

Saturday, January 17, 2015

NIH glossary

http://grants.nih.gov/grants/glossary.htm#F

Federal Pell Grant

The Federal Pell Grant Program provides need-based grants to low-income undergraduate and certain postbaccalaureate students to promote access to postsecondary education. Students may use their grants at any one of approximately 5,400 participating postsecondary institutions. Grant amounts are dependent on: the student's expected family contribution (EFC) (see below); the cost of attendance (as determined by the institution); the student's enrollment status (full-time or part-time); and whether the student attends for a full academic year or less.

http://www2.ed.gov/programs/fpg/index.html

Friday, January 16, 2015

file synchronization

Wikipedia has a very long list of file synchronization software.

http://en.wikipedia.org/wiki/Comparison_of_file_synchronization_software

Coursesmart link

Online textbook link, coursesmart

http://instructors.coursesmart.com/mycoursesmart

http://instructors.coursesmart.com/offlinebookshelf

Thursday, January 15, 2015

Socrative, report student results

It seems Socrative can require student names only in pre-defined quizzes.

I solved this problem by writing a generic quiz on Socrative. I then started Socrative in teacher mode on the Mac tower and tried student logins on two other computers. I logged into Socrative through Gmail directly. After finishing the quiz, I saved the quiz results to my Google Drive directly as a ZIP file that contains the students' names and their answers in an Excel file.

Apparently, Socrative does not allow students to change their answers after their initial submission.

Bio125, day 1, week 1,

Name tags,
Syllabus, flipped classroom, assignments are mostly due before class,
Learning objectives
Use a seat map to connect student names, faces and seatings.

Signatures, notepad and pens
lab safety form (I added the students names to the list).
IRB, academic integrity, photo video release form (Need to add Yes or No on the form)

pre-assessment
bring incentives

HP laptop log, wireless connections
Bring ethernet cables

Socrative login. Go over slides to test Socrative. (Students had trouble creating accounts. Later, I figured out that student names are only available in pre-defined quizzes.)

ApE installation. On Yosemite, when the security setting is not changed, OS X gave a warning as if the downloaded software is damaged.

Presentation orders. natural group order

Did Group building on cancer.
PubMed search
Primary literature: original work versus reviews and commentaries.

group 1, 2: colorectal cancer, MSH2
group 3: breast cancer, BRCA1
group 4: leukemia
group 5: oral cancer, smoking, alcohol PMID 25564114
group 6: ovarian cancer: BRCA1 and 2
group 7: prostate cancer:
group 8: stomach cancer
group 9: pancreatic cancer
group 10: liver cancer

Summary: presentation groups of next class on homework assignments.

Wednesday, January 14, 2015

Snowball microphone test, level setting for lecture capturing

Level 3 gave the highest volume recorded in QuickTime. Level 1 is second. Level 2 somehow gave the weakest voice.

I used OS X Yosemite, laptop 'ace'.

NIH Big Data to Knowledge (BD2K) Enhancing Diversity in Biomedical Data Science (R25)

RFA
http://grants.nih.gov/grants/guide/rfa-files/RFA-MD-15-005.html

On January 13, 2015, a new funding opportunity announcement was released entitled NIH Big Data to Knowledge (BD2K) Enhancing Diversity in Biomedical Data Science (R25). The over-arching goal of this BD2K R25 program is to support educational activities that enhance the diversity of the biomedical, behavioral, and clinical research workforce. To accomplish the stated over-arching goal, this FOA will support creative educational activities with a primary focus on research experiences for students and faculty, and for curriculum development.

The primary purpose of the NIH BD2K Enhancing Diversity in Biomedical Data Science program is to provide resources for eligible institutions to implement innovative approaches to research education for diverse students in Big Data science, including those from underrepresented backgrounds in biomedical research. Higher education institutions listed in the FOA are eligible to apply. Some institutions provide unique opportunities for access to students from diverse backgrounds underrepresented in biomedical and behavioral research. Accordingly, the NIH Big Data to Knowledge (BD2K) program strongly encourages applications from the following institutions: Historically Black Colleges and Universities (HBCUs), Tribally Controlled Colleges and Universities (TCCUs), Hispanic-Serving Institutions (HSIs), Alaska Native and Native Hawaiian-Serving Institutions, and institutions serving individuals living with disabilities. Applicants must collaborate with at least one NIH BD2K Center across the nation to develop the BD2K R25 program at the applicant institution. Refer to RFA-MD-15-005 for details.

"Collaborative activities with the NIH BD2K Centers may include, but are not limited to: short-term research experiences for students and faculty at the NIH BD2K Centers, and hands-on projects; developing and/or disseminating curriculum materials that will be used at the applicant institution, and/or in a joint-instructional capacity with BD2K faculty."

List of BD2K centers:
http://bd2k.nih.gov/FY14/COE/COE.html#sthash.NGUHPDVC.nZubDRF2.dpbs

UNIVERSITY OF PITTSBURGH (Super computing center?)
http://projectreporter.nih.gov/project_info_description.cfm?aid=8932078&icde=22003109
http://www.dbmi.pitt.edu/person/gregory-cooper-md-phd

UW Madison (past Spelman REU program)
http://projectreporter.nih.gov/project_info_description.cfm?aid=8921373&icde=22003161

References:
http://bd2k.nih.gov/FY14/Ed/Ed.html#sthash.5HdqQJNh.dpbs

Report of a workshop, very informative
http://bd2k.nih.gov/pdf/bd2k_training_workshop_report.pdf

Who to Train: The BD2K workforce will need both quantitative (statistical and computational) expertise and biomedical domain expertise, taken together as “data science” expertise. Examples of biomedical fields that already incorporate varying amounts and mixtures of quantitative expertise are bioinformatics, computational biology, biomedical informatics, biostatistics, and quantitative biology. Both basic and clinical researchers at all career levels need to receive training.

When to Train: Training is needed at all career stages: exposure courses for undergraduates, cross-training for graduate students and postdoctoral fellows, training as needed for researchers at all levels to facilitate their work, refresher courses or certificates in specific competencies for mid-level researchers, and relevant continuing medical education courses for clinical professionals.

What to Train: Both long- and short-term training is needed, and efforts should be guided by the competency level required for the technical knowledge and skills to be gained. The technical knowledge and skills needed include: (1) computational and informatics skills; (2) mathematics and statistics expertise; and (3) domain science knowledge.

How to Train: Several ways to cross-train biomedical and quantitative scientists were suggested, including through (1) new or expansion of existing long-term research training programs (which can incorporate activities such as boot camps, joint and team coursework, delayed laboratory rotations, dual or team mentoring, clinical and industrial externships, and team challenges); (2) short-term courses and hands-on immersive experiments (which can span short courses, certificate programs, immersive workshops, summer institutes, clinical immersion and shadowing, and continuing medical education opportunities); (3) curricula for biomedical Big Data; (4) technology-enabled learning systems and environments (e.g., web-based courses and Massive Open Online Courses (MOOCs)) to offer training to a much larger audience; and (5) a training laboratory that has tools and resources for self-directed learning and exploration.

Moodle 2.5 customization, dock panel

The Dock panel is a little triangle sign on the left.

Bamboo Wacom driver installation, OS X, Yosemite

The older driver on the CD in the shipment does not work for Yosemite anymore. I found a legacy support page on the Wacom site
http://us.wacom.com/en/support/legacy-drivers/

My Bamboo Pen tablet is Model CTL-470.

I connected my Bamboo tablet and it worked.

Tuesday, January 13, 2015

Useful teaching/coaching strategies

From
http://www.scientificamerican.com/article/the-secret-to-raising-smart-kids1/?WT.mc_id=SA_Twitter

You did a good job drawing. I like the detail you added to the people's faces.

You really studied for your social studies test. You read the material over several times, outlined it and tested yourself on it. It really worked!

I like the way you tried a lot of different strategies on that math problem until you finally got it.

That was a hard English assignment, but you stuck with it until you got it done. You stayed at your desk and kept your concentration. That's great!

I like that you took on that challenging project for your science class. It will take a lot of work—doing the research, designing the apparatus, making the parts and building it. You are going to learn a lot of great things.

Oh, sorry, that was too easy—no fun. Let's do something more challenging that you can learn from.

Let's all talk about what we struggled with today and learned from. I'll go first.

Mistakes are so interesting. Here's a wonderful mistake.

Let's see what we can learn from it.

PSC blacklight trial, 20150113

Instructions:

"Once you login you will be in your $HOME directory (/usr/users/1/hqin2) which is backed up but has a quota of 5 Gbytes. You also have access to a $SCRATCH directory (/brashear/hqin2) which has essentially unlimited storage and is not backed up. Files in $SCRATCH may be removed, oldest first, to make room when needed, though we try to keep them for 2-weeks at least.

There is a file archiver, you can access it as the directory /arc/users/hqin2/ from the login node, where you can store whatever you need to keep long-term (while your allocation is active, of course). You can also connect to the archiver via sftp, at data.psc.edu. You can use Fugu or any other graphical user interface if you prefer. This is the simplest way to transfer files to PSC, you can see them in the /arc directory from the login node and copy them to/from the $HOME or $SCRATCH directory as needed.

When you run and write data, we prefer that you write to $SCRATCH, which is a distributed file system and can handle the load, and not to $HOME."

hqin2@tg-login1:~> echo $HOME
/usr/users/1/hqin2
hqin2@tg-login1:~> echo $SCRATCH
/brashear/hqin2
hqin2@tg-login1:~> du /arc/users/hqin2
2 /arc/users/hqin2
hqin2@tg-login1:~> df /arc/users/hqin2
Filesystem 1K-blocks Used Available Use% Mounted on
/arc 3656882477312 2021505932032 1635376545280 56% /arc
hqin2@tg-login1:~> df -h /arc/users/hqin2
Filesystem Size Used Avail Use% Mounted on
/arc 3.4P 1.9P 1.5P 56% /arc

Instructions:

"Look at this webpage:
http://www.psc.edu/index.php/computing-resources/blacklight

it has examples of scripts for running batch jobs, in particular I think you will want to run an 'interactive batch job' to check that your code works.

    qsub -I -l ncpus=16 -l walltime=0:30:00 -q debug

once you get a prompt, you are on the 'backend', or 'compute node', i.e. Blacklight proper, and everything runs there, not on the login node.

Let's say I have a trivial R example:

y <- rnorm(10)
print(y)

this is saved in a file (example.R), and I want to run it. So I type the 'qsub ....' command above, and after I got an interactive prompt, enter the following;

source /usr/share/modules/init/bash
module load R

R --slave CMD BATCH ./example.R

and the output appears in 'example.Rout'. OK, so I'm done. To get out of the 'compute node', I type 'exit' and press enter.

The first two lines (source ... ; module ...) load the definition of the 'module' command, the second uses the module command to put (a version of) R in my path, and the last executes the R script in batch mode.

Once I have figured out that everything is working, I can run the script in full batch mode (non-interactively) by putting this into a PBS script, i.e. a file, let's call it 'R.pbs':

#!/bin/bash
#PBS -q batch
#PBS -l ncpus=16
#PBS -l walltime=0:03:00

source /usr/share/modules/init/bash
module load R
cd $PBS_O_WORKDIR

ja
R --slave CMD BATCH ./example.R
ja -chlst

So you are just entering the commands you typed interactively, after a line that indicates what 'shell' you want to run under, and some options to the batch scheduler (the number of cores, and the minutes, which you had entered on the command line before).   What is new is the "cd $PBS_O_WORKDIR" which makes the script start on whatever directory you were when you submitted the command. Also, the couple of lines "ja" and "ja -chlst" surrounding the call to R. They are not essential, but collect useful information on the job (maximum amount of memory, time spent, cpu time used, etc.)

So you have this script called 'R.pbs', and you can submit it to the scheduler with the command

    qsub R.pbs

The scheduler will reply with something like:
394363.tg-login1.blacklight.psc.teragrid.org

the number is the 'job ID' of your PBS job, which you can use to ask for more information from the scheduler. You can always ask it 'what jobs do I have in the queue' like this:

    qstat -u hqin2

and it will list them all, together with the state (R means running, Q means it still in the queue). If it lists nothing, it means all your jobs completed. After the job completed, there should appear a couple of files in the directory where you put the script. Since I didn't use any option to give the job a name, the files would be named {script name}.e#### and {script name}.o####, in the example that would be R.pbs.o########## and R.pbs.e#######. The 'o' file has any output that the job would write to the standard output, the 'e' file anything that would normally go to the standard error file.   You can also redirect output from any command in the job script to a file. "

source /usr/share/modules/init/bash
module load R

R --slave CMD BATCH ./example.R

hqin2@tg-login1:~> ll example.R* #output is example.Rout

-rw-r--r-- 1 hqin2 mc48o9p 24 2015-01-13 20:47 example.R

-rw-r--r-- 1 hqin2 mc48o9p 942 2015-01-13 20:48 example.Rout

hqin2@tg-login1:~> nano -w R.pbs

hqin2@tg-login1:~> pwd

/usr/users/1/hqin2

hqin2@tg-login1:~> qsub R.pbs

418673.tg-login1.blacklight.psc.teragrid.org

hqin2@tg-login1:~> qstat -u hqin2

tg-login1.blacklight.psc.teragrid.org:

                                                                    Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID  NDS  TSK  Memory Time  S Time
-------------------- -------- -------- ---------- ------- ---- ---- ------ ----- - -----
418673.tg-login1     hqin2    batch_r  R.pbs      --      --   16   --     00:03 Q --

hqin2@tg-login1:~>

Nothing was in the output file, so I changed the run line to "R -f example.R".
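One likely reason for the empty file (an assumption on my part, not something the logs here prove): R CMD BATCH writes the session to example.Rout by default, so nothing reaches the standard output that PBS captures, whereas R -f prints to standard output. A minimal sketch to compare the two outside the scheduler, using a scratch directory and made-up file names (workdir, r_f_stdout.txt):

```shell
# Recreate the two-line example.R from the transcript in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir"
printf 'y = rnorm(10)\nprint (y)\n' > example.R

if command -v R >/dev/null 2>&1; then
    # R CMD BATCH sends the session to example.Rout, not to standard output.
    R CMD BATCH ./example.R
    # R -f prints to standard output; here it is redirected to a file to inspect.
    R -f example.R > r_f_stdout.txt
    ls -l example.Rout r_f_stdout.txt
else
    echo "R not found on this machine; skipping the comparison" >&2
fi
```

On the cluster, "module load R" would have to come first so that R is on the path.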

hqin2@tg-login1:~/test> ls

example.R R2.pbs

hqin2@tg-login1:~/test> ll

total 8

-rw-r--r-- 1 hqin2 mc48o9p 24 2015-01-13 22:33 example.R

-rw-r--r-- 1 hqin2 mc48o9p 199 2015-01-13 22:33 R2.pbs

hqin2@tg-login1:~/test> qsub R2.pbs

418692.tg-login1.blacklight.psc.teragrid.org

hqin2@tg-login1:~/test> qstat -u hqin2

tg-login1.blacklight.psc.teragrid.org:

                                                                    Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID  NDS  TSK  Memory Time  S Time
-------------------- -------- -------- ---------- ------- ---- ---- ------ ----- - -----
418692.tg-login1     hqin2    batch_r  R2.pbs     --      --   16   --     00:03 Q --

hqin2@tg-login1:~/test> cat R2.pbs

#!/bin/bash

#PBS -q batch

#PBS -l ncpus=16

#PBS -l walltime=0:03:00

source /usr/share/modules/init/bash

module load R

cd $PBS_O_WORKDIR

#R --slave CMD BATCH ./example.R

R -f example.R

ja -chlst

hqin2@tg-login1:~/test> ll

total 16

-rw-r--r-- 1 hqin2 mc48o9p 24 2015-01-13 22:33 example.R

-rw-r--r-- 1 hqin2 mc48o9p 199 2015-01-13 22:33 R2.pbs

-rw------- 1 hqin2 mc48o9p 0 2015-01-13 23:13 R2.pbs.e418692

-rw------- 1 hqin2 mc48o9p 4905 2015-01-13 23:13 R2.pbs.o418692

hqin2@tg-login1:~/test> cat R2.pbs.o418692

R version 2.15.3 (2013-03-01) -- "Security Blanket"

ISBN 3-900051-07-0

Platform: x86_64-unknown-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.

You are welcome to redistribute it under certain conditions.

Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.

Type 'contributors()' for more information and

'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or

'help.start()' for an HTML browser interface to help.

Type 'q()' to quit R.

> y = rnorm(10)

> print (y)

[1] -0.46271891 0.34547494 -0.97556883 -0.64659599 0.01052027 0.06472313

[7] 0.43858725 0.83961732 -0.74945123 0.15012829

Job Accounting - Command Report

===============================

Command Started Elapsed User CPU Sys CPU CPU Block I/O Swap In CPU MEM Characters Logical I/O CoreMem VirtMem Ex

Name At Seconds Seconds Seconds Delay Secs Delay Secs Delay Secs Avg Mbytes Read Written Read Write HiValue HiValue St Ni Fl SBU's

=============== ======== ========== ========== ========== ========== ========== ========== ========== ========= ========= ======== ======== ======== ======== === === == =======

# CFG ON( 1) ( 7) 23:13:32 01/13/2015 System: Linux bl0.psc.teragrid.org 2.6.32.49-0.3-default #1 SMP 2011-12-02 11:28:04 +0100 x86_64

ja 23:13:32 0.31 0.00 0.00 0.00 0.00 0.00 0.85 0.019 0.000 19 3 1064 23780 0 0 0.00

uname 23:13:32 0.00 0.00 0.00 0.00 0.00 0.00 12.64 0.004 0.000 8 1 664 5316 0 0 0.00

R 23:13:32 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.000 0.000 0 1 884 12616 0 0 F 0.00

sed 23:13:32 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.004 0.000 10 1 816 5396 0 0 0.00

R 23:13:32 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.000 0.000 0 1 888 12616 0 0 F 0.00

sed 23:13:32 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.004 0.000 10 1 812 5396 0 0 0.00

R 23:13:32 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.000 0.000 0 0 856 12612 0 0 F 0.00

rm 23:13:33 0.01 0.00 0.00 0.00 0.00 0.00 0.96 0.012 0.000 20 0 712 5336 0 0 0.00

R 23:13:33 0.35 0.22 0.08 0.00 0.00 0.00 70.16 4.166 0.001 190 25 32412 75240 0 0 0.00

Job CSA Accounting - Summary Report

====================================

Job Accounting File Name : /dev/tmpfs/418692/.jacct65df3

Operating System : Linux bl0.psc.teragrid.org 2.6.32.49-0.3-default #1 SMP 2011-12-02 11:28:04 +0100 x86_64

User Name (ID) : hqin2 (51231)

Group Name (ID) : mc48o9p (15132)

Project Name (ID) : ? (0)

Job ID : 0x65df3

Report Starts : 01/13/15 23:13:32

Report Ends : 01/13/15 23:13:33

Elapsed Time : 1 Seconds

User CPU Time : 0.2200 Seconds

System CPU Time : 0.1090 Seconds

CPU Time Core Memory Integral : 5.2741 Mbyte-seconds

CPU Time Virtual Memory Integral : 15.2699 Mbyte-seconds

Maximum Core Memory Used : 31.6523 Mbytes

Maximum Virtual Memory Used : 73.4766 Mbytes

Characters Read : 4.2103 Mbytes

Characters Written : 0.0012 Mbytes

Logical I/O Read Requests : 257

Logical I/O Write Requests : 33

CPU Delay : 0.0030 Seconds

Block I/O Delay : 0.0002 Seconds

Swap In Delay : 0.0000 Seconds

Number of Commands : 9

System Billing Units : 0.0000

hqin2@tg-login1:~/test>

Note: I compared today's R.pbs with job1.sh from 20150112. The line "source /usr/share/modules/init/bash" seems to be critical: it makes sure that the "module" command is recognized.
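A defensive version of that step could look like the sketch below. It is only a sketch: the variable names MODULES_INIT and module_status are made up for illustration, and the init path is the one from the job script above, which may differ on other clusters.

```shell
# Path taken from the job script above; other systems may install it elsewhere.
MODULES_INIT=/usr/share/modules/init/bash

# In a batch job the "module" command may not be defined yet.
if ! type module >/dev/null 2>&1; then
    if [ -r "$MODULES_INIT" ]; then
        . "$MODULES_INIT"
    else
        echo "cannot read $MODULES_INIT" >&2
    fi
fi

# Only try to load R once "module" is actually available.
if type module >/dev/null 2>&1; then
    if module load R; then
        module_status=loaded
    else
        module_status=load_failed
    fi
else
    module_status=missing
fi
echo "module status: $module_status"
```

The check with "type" makes the script report the problem up front instead of failing later when R cannot be found.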

See: XSEDE note on 20150111 http://hongqinlab.blogspot.com/2015/01/qsub-usage.html