## Tuesday, December 30, 2014

### Draw to look for latex symbol syntax

http://detexify.kirelabs.org/classify.html

### least squares method, matrix view

https://youtu.be/MC7l96tW8V8

### Matrix diagonalization

The approach of this proof might be useful in the matrix approach for network aging study.

## Monday, December 29, 2014

### orthogonal projection, orthogonality

Orthogonal projection of vector $y$ onto $u$:
$\hat{y} = \frac{y \cdot u}{u \cdot u} u$

$U$ has orthonormal columns if and only if $U^T U = I$.

### inner product of vectors, dot products, orthogonality

$u \cdot v = u^T v$

$u \cdot v = v \cdot u$

The length (or norm) of a vector $v$ is the square root of its inner product with itself, $||v|| = \sqrt{v \cdot v}$. This can be seen from $v = [a, b]$, whose length (norm) is $\sqrt{a^2 + b^2}$.

$u \cdot v = ||u|| \, ||v|| \cos\theta$

$||u-v||^2 = ||u||^2 + ||v||^2 - 2 ||u|| \, ||v|| \cos\theta$

Two vectors $u$ and $v$ are orthogonal if and only if $u \cdot v = 0$.

$U$ has orthonormal columns if and only if $U^T U = I$.

Orthogonal projection of vector $y$ onto $u$:
$\hat{y} = \frac{y \cdot u}{u \cdot u} u$

The orthogonal projection of a point $y$ onto a subspace $W$ with an orthogonal basis $\{u_1, u_2, \ldots, u_p\}$ is the sum of the orthogonal projections of $y$ onto each basis vector $u_1, u_2, \ldots, u_p$.
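A minimal Python sketch of the projection formulas above, using plain lists. The function names and the example vectors are made up for illustration, and the basis is assumed to be mutually orthogonal (otherwise summing the single-vector projections does not give the projection onto $W$).

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_onto(y, u):
    """Orthogonal projection of y onto a single vector u: (y.u / u.u) u."""
    scale = dot(y, u) / dot(u, u)
    return [scale * x for x in u]


def project_onto_subspace(y, basis):
    """Sum of the projections of y onto each vector of an ORTHOGONAL basis."""
    y_hat = [0.0] * len(y)
    for u in basis:
        p = project_onto(y, u)
        y_hat = [a + b for a, b in zip(y_hat, p)]
    return y_hat


# Illustrative vectors: u1 and u2 are orthogonal and span the x-y plane.
y = [2.0, 3.0, 4.0]
u1 = [1.0, 1.0, 0.0]
u2 = [1.0, -1.0, 0.0]

y_hat = project_onto_subspace(y, [u1, u2])
residual = [a - b for a, b in zip(y, y_hat)]

print("projection:", y_hat)
print("residual:", residual)
print("residual . u1 =", dot(residual, u1), " residual . u2 =", dot(residual, u2))
```

The residual $y - \hat{y}$ should be orthogonal to every basis vector, which is a quick way to check a projection.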

### diagonal matrix of essential genes in network aging model

$A = P D P^{-1}$, so $A^k = P D^k P^{-1}$.
If the diagonal matrix contains only the number of links of essential genes, its decay might be easily computed numerically.
The diagonalization of $A$ can be found through its eigenvalues and eigenvectors.
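A short worked derivation of why powers of a diagonalizable matrix are cheap to compute. It assumes $A$ has a full set of eigenvectors (the columns of $P$) with eigenvalues $\lambda_1, \ldots, \lambda_n$ on the diagonal of $D$; whether the network aging matrix satisfies this is not established in these notes.

```latex
\begin{aligned}
A   &= P D P^{-1} \\
A^2 &= (P D P^{-1})(P D P^{-1}) = P D (P^{-1} P) D P^{-1} = P D^2 P^{-1} \\
A^k &= P D^k P^{-1}, \qquad
D^k = \operatorname{diag}(\lambda_1^k, \lambda_2^k, \ldots, \lambda_n^k)
\end{aligned}
```

Only the diagonal entries are raised to the $k$-th power, so the long-run behavior is governed by the eigenvalues.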

### PCA notes

The eigenvectors of the covariance matrix give the principal component directions.

http://youtu.be/5zk93CpKYhg

## Sunday, December 28, 2014

### toread, interaction based discovery of cancer genes

2014 Feb;42(3):e18. doi: 10.1093/nar/gkt1305. Epub 2013 Dec 19.

Interaction-based discovery of functionally important genes in cancers.

http://www.ncbi.nlm.nih.gov/pubmed/24362839

## Saturday, December 27, 2014

### SQLite3 code on rls.db

file 'test_rls.sql'

.open rls.db
.databases
.tables
.separator ::
.mode column
select distinct experiment from result_experiment limit 20;
.indices
.indices set
.width 5
select * from result  limit 1;

/* The following select can take rls and its reference rls */
select experiments,set_name,set_strain,set_background,set_genotype,
set_lifespan_mean,ref_genotype,ref_lifespan_mean
from result  limit 2;

/* The fields of set_name and set_genotype sometimes provide the ORF-name pair, but there are many exceptions. */
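A minimal sketch of running the same select from Python's standard `sqlite3` module. It builds a small in-memory `result` table instead of opening rls.db; the two rows are copied from the sample output recorded in these notes, while the column types and the reduced column set are assumptions, not the real schema.

```python
import sqlite3

# In-memory stand-in for rls.db; only the columns used by the query are created.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE result (
           id INTEGER PRIMARY KEY,
           experiments TEXT, set_name TEXT, set_strain TEXT,
           set_background TEXT, set_genotype TEXT, set_lifespan_mean REAL,
           ref_genotype TEXT, ref_lifespan_mean REAL)"""
)

# Values taken from the two example rows of the 'result' table in these notes.
rows = [
    (1, "127", "BY4741", "KK19", "BY4741", "BY4741", 30.3, "BY4742", 29.625),
    (2, "127", "ymr226c", "DC:4G4", "BY4741", "tma29", 27.1, "BY4741", 30.3),
]
conn.executemany("INSERT INTO result VALUES (?,?,?,?,?,?,?,?,?)", rows)

query = """
    SELECT experiments, set_name, set_strain, set_background, set_genotype,
           set_lifespan_mean, ref_genotype, ref_lifespan_mean
    FROM result
    ORDER BY id
    LIMIT 2
"""
selected = conn.execute(query).fetchall()
for row in selected:
    print(row)

conn.close()
```

With the real file, `sqlite3.connect("rls.db")` would replace the in-memory connection and the table creation would be dropped.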

### mysql tips

MYSQL tips "mysql.txt" file

show tables like "h%";

select * from someTable into outfile "/tmp/tmpfile";

create temporary table tmptab select distinct id1 from sampleTab1 UNION ALL
select distinct id2 from sampleTab2;

grant ALL on homo_sapiens_core_17_33.* to hqin@localhost;

SUBSTRING(str,pos,len)
SUBSTRING(str FROM pos FOR len)
MID(str,pos,len)
#Returns a substring len characters long from string str, starting at position pos.
#The variant form that uses FROM is SQL-92 syntax.

mysql> SELECT SUBSTRING('Quadratically',5,6);
-> 'ratica'

mysqldump test name --no-data --no-create-db > tmp.dump
mysqlimport -u root -h shanghai hong_database *.txt.table

/* try left, inner, outer join to see what's missing */
->  select orf, Name1
->  from   curagenOrf2name left join Ks_Ka_Yeast_Ca
->         on curagenOrf2name.orf = Ks_Ka_Yeast_Ca.Name1;
Query OK, 6268 rows affected (54.27 sec)
Records: 6268  Duplicates: 0  Warnings: 0

mysql>  select * from bader2gu where Name1 is NULL;
/* return 4313 rows */

mysql>  select * from bader2gu where Name1 is not NULL;
/* returns 1955 rows.  Note: one record is missing, probably
due to different annotations between Curagen and the public release from SGD
*/
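A small sketch of the left-join trick for spotting unmatched records, written against an in-memory SQLite database rather than MySQL. The table names `orf2name` and `ks_ka` and all rows are toy stand-ins invented for illustration; they are not the Curagen or Ks/Ka tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Toy stand-ins for the two tables being compared (made-up rows).
cur.execute("CREATE TABLE orf2name (orf TEXT)")
cur.execute("CREATE TABLE ks_ka (Name1 TEXT, ka REAL)")
cur.executemany("INSERT INTO orf2name VALUES (?)",
                [("YAL001C",), ("YAL002W",), ("YAL003W",)])
cur.executemany("INSERT INTO ks_ka VALUES (?, ?)",
                [("YAL001C", 0.12), ("YAL003W", 0.03)])

# LEFT JOIN keeps every row of the left table; unmatched rows get NULL on the right.
joined = cur.execute(
    """SELECT orf, Name1
       FROM orf2name LEFT JOIN ks_ka ON orf2name.orf = ks_ka.Name1
       ORDER BY orf"""
).fetchall()

missing = [orf for orf, name1 in joined if name1 is None]
matched = [orf for orf, name1 in joined if name1 is not None]

print("all rows:", joined)
print("missing in ks_ka:", missing)
print("matched:", matched)

conn.close()
```

The counts of NULL and non-NULL rows should add up to the size of the left table, which is the same sanity check as comparing the two row counts against the total reported by the join.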

### SQLite 3, osX, byte, rls.db

Reference: http://www.sqlite.org/cli.html

#I want to install SQLite to load 'rls.db'.

$ sudo port install sqlite3  #OK
#how to load 'rls.db' ?
$ sqlite3
SQLite version 3.8.7.4 2014-12-09 01:34:36
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .open rls.db
sqlite> .databases
seq  name             file
---  ---------------  ----------------------------------------------------------
0    main             /Users/hqin/projects/0.network.aging.prj/4.svm/rls.db
sqlite> .tables
build_log           genotype_pubmed_id  result_experiment   set
cross_mating_type   meta                result_ref          yeast_strain
cross_media         result              result_set

sqlite> .indices
build_log_filename
cross_mating_type_background
cross_mating_type_genotype
cross_mating_type_locus_tag
cross_mating_type_media
cross_mating_type_temperature
cross_media_background
cross_media_genotype
cross_media_locus_tag
cross_media_mating_type
cross_media_temperature
genotype_pubmed_id_genotype
genotype_pubmed_id_pubmed_id
meta_name
result_experiment_experiment
result_experiment_result_id
result_percent_change
result_pooled_by
result_ranksum_p
result_ref_background
result_ref_genotype
result_ref_locus_tag
result_ref_mating_type
result_ref_media
result_ref_name
result_ref_result_id
result_ref_set_id
result_ref_strain
result_ref_temperature
result_set_background
result_set_genotype
result_set_lifespan_mean
result_set_locus_tag
result_set_mating_type
result_set_media
result_set_name
result_set_result_id
result_set_set_id
result_set_strain
result_set_temperature
set_experiment
set_media
set_name
set_strain
set_temperature
yeast_strain_background
yeast_strain_genotype_short
yeast_strain_genotype_unique
yeast_strain_mating_type
yeast_strain_name
yeast_strain_owner

sqlite> select distinct experiment from result_experiment limit 20;
experiment
1
100
101
102_plate115
103
104
105
106_plate116
107
108_plate117
...

sqlite> .separator :::
sqlite> select * from result limit 2;
id:::experiments:::set_name:::set_strain:::set_background:::set_mating_type:::set_locus_tag:::set_genotype:::set_media:::set_temperature:::set_lifespan_start_count:::set_lifespan_count:::set_lifespan_mean:::set_lifespan_stdev:::set_lifespans:::ref_name:::ref_strain:::ref_background:::ref_mating_type:::ref_locus_tag:::ref_genotype:::ref_media:::ref_temperature:::ref_lifespan_start_count:::ref_lifespan_count:::ref_lifespan_mean:::ref_lifespan_stdev:::ref_lifespans:::percent_change:::ranksum_u:::ranksum_p:::pooled_by
1:::127:::BY4741:::KK19:::BY4741:::MATa::::::BY4741:::YPD:::30.0:::20:::20:::30.3:::7.526095:::23,26,34,31,22,37,26,39,22,36,38,24,36,40,26,38,38,17,34,19:::BY4742:::DH502:::BY4742:::MATalpha::::::BY4742:::YPD:::30.0:::40:::40:::29.625:::8.279377:::36,26,15,28,16,44,40,28,25,32,24,29,39,37,30,31,14,17,29,28,44,27,38,29,26,39,38,32,34,33,32,38,16,28,31,11,20,39,30,32:::2.278481:::409.0:::0.8916505557143:::file
2:::127:::ymr226c:::DC:4G4:::BY4741:::MATa::::::tma29:::YPD:::30.0:::20:::20:::27.1:::11.702:::24,11,37,32,41,38,12,11,31,23,39,36,22,19,28,36,24,49,24,5:::BY4741:::KK19:::BY4741:::MATa::::::BY4741:::YPD:::30.0:::20:::20:::30.3:::7.526095:::23,26,34,31,22,37,26,39,22,36,38,24,36,40,26,38,38,17,34,19:::-10.56106:::169.5:::0.4163969339623:::file
#Notes: field 'experiments' in 'result' may be used to find the in-experiment wildtype controls.
# Ken once suggested that "pooled by" column?? file, genotype, mixed
# set lifespan
# ref lifespan

select experiments,set_name,set_strain,set_background,set_genotype,
set_lifespan_mean,ref_genotype,ref_lifespan_mean
from result  limit 2;
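A quick arithmetic check on the two example rows shown in these notes. It assumes that `percent_change` is the relative difference of the set mean from the reference mean, in percent; that definition is inferred from the numbers, not taken from the database documentation.

```python
def percent_change(set_mean, ref_mean):
    """Assumed definition: 100 * (set - ref) / ref."""
    return 100.0 * (set_mean - ref_mean) / ref_mean


# (set_name, set_lifespan_mean, ref_lifespan_mean, reported percent_change)
# copied from the two example rows of the 'result' table.
rows = [
    ("BY4741", 30.3, 29.625, 2.278481),
    ("ymr226c", 27.1, 30.3, -10.56106),
]

checks = []
for name, set_mean, ref_mean, reported in rows:
    computed = percent_change(set_mean, ref_mean)
    checks.append(abs(computed - reported) < 1e-4)
    print(name, "computed:", round(computed, 5), "reported:", reported)
```

Both rows agree with the assumed formula to the precision stored in the table.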

## Wednesday, December 24, 2014

### toread, quality control of inner nuclear membrane proteins by the Asi complex

2014 Nov 7;346(6210):751-5. doi: 10.1126/science.1255638. Epub 2014 Sep 18.

Quality control of inner nuclear membrane proteins by the Asi complex.

http://www.ncbi.nlm.nih.gov/pubmed/25236469

http://www.ncbi.nlm.nih.gov/pubmed/25315269
http://www.ncbi.nlm.nih.gov/pubmed/25378608

## Tuesday, December 23, 2014

### toread, power law network paper

http://www.ncbi.nlm.nih.gov/pubmed/25520244

### big data sets, free

http://www.datasciencecentral.com/profiles/blogs/big-data-sets-available-for-free

### 125 years of public health records

http://www.tycho.pitt.edu/

### toread, Distinguishing cause from effect using observational data: methods and benchmarks Joris M. Mooij, Jonas Peters, Dominik Janzing, Jakob Zscheischler, Bernhard Schölkopf

http://arxiv.org/abs/1412.3773

## Monday, December 22, 2014

### CITI training

I spent 70 minutes (9:30-10:40) on CITI training, with 93% final score (1 wrong due to mis-clicking)

### CITI refresher course reading materials, URLs

• If you have not read the Belmont Report yet, please review this document and/or copy it for future reference.
Links to Ethical Codes and Regulations of Human Subjects in Research.
• Title 21, CFR Part 50 and CFR 56 of the Code of Federal Regulations.
• CLIA - Clinical Laboratory Improvement Amendments
• Title 21 Code of Federal Regulations (21 CFR Part 11) Electronic Records; Electronic Signatures.

### svm project, pca() replaced by princomp() 20141222

updated file '040610.scmd.Ka.fitness.R'

The 'pca' package used in the old code 040610.scmd.Ka.fitness.R no longer exists in R 3.x. I switched to princomp() in the 'stats' package that ships with base R.

TODO: check the predicted long-lived strains in the Kaeberlein database.

### study notes on PCA, principal components, with R testing codes, princomp()

ResearchGate: Principal components are linear combinations of original variables x1, x2, etc. So when you do SVM on PCA decomposition you work with these combinations instead of original variables.

37:50 in Ng's video
Ng shows how to use PCA (linear combinations of raw data) to reduce the dimension of the data.

PCA is basically an orthogonal transformation
http://en.wikipedia.org/wiki/Principal_component_analysis
"Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components."
"PCA can be done by eigenvalue decomposition of a data covariance (or correlation) matrix or singular value decomposition of a data matrix, usually after mean centering (and normalizing or using Z-scores) the data matrix for each attribute. The results of a PCA are usually discussed in terms of component scores, sometimes called factor scores (the transformed variable values corresponding to a particular data point), and loadings (the weight by which each standardized original variable should be multiplied to get the component score)."

R:  princomp( )
pc.cr <- princomp(~ Murder + Assault + UrbanPop,
data = USArrests, na.action = na.exclude, cor = TRUE)
pc.cr$scores[1:5, ]
#scores probably are PCA results, based on wikipedia entry
#I can use examples of linear combination to verify my guess.

#######start of the R testing code and results #########
x1 = rnorm(100)
x2 = rnorm(100)
x3 = x1 + x2 + rnorm(100)/20
x4 = 2*x1 + rnorm(100)/20
X = data.frame(cbind(x1,x2,x3,x4))
pc <- princomp(X)
plot(pc) #only two major components, consistent
head(pc)
#######End of the R testing code and results #########

#######start of 2nd R testing code and results #########
set.seed(2014)
x1 = rnorm(100)
x2 = x1 + rnorm(100)/20
X = data.frame(cbind(x1,x2))
pc <- princomp(X, cor = TRUE)
head(pc)
pc$scores[,1] - (0.707*x1 + 0.707*x2) #does this approach zero?
summary( pc$scores[,1] - (0.707*x1 + 0.707*x2) + mean( 0.707*x1 + 0.707*x2 ) ) #good, it approaches zero
summary(lm( pc$scores[,1] ~ x1 ))

#######End of 2nd R testing code and results #########
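A stdlib-only Python sketch of where the 0.707 loadings in the second test come from. For two variables, the correlation matrix is $[[1, r], [r, 1]]$, whose eigenvectors are $(1/\sqrt{2}, 1/\sqrt{2})$ and $(1/\sqrt{2}, -1/\sqrt{2})$ with eigenvalues $1 + r$ and $1 - r$. The data generation mirrors the R test but will not reproduce R's random numbers, and the helper name `pearson` is my own.

```python
import math
import random

random.seed(2014)
n = 100
x1 = [random.gauss(0.0, 1.0) for _ in range(n)]
x2 = [a + random.gauss(0.0, 1.0) / 20 for a in x1]  # x2 is x1 plus small noise


def pearson(a, b):
    """Sample Pearson correlation of two equal-length lists."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


r = pearson(x1, x2)

# Closed-form eigen-decomposition of the 2x2 correlation matrix [[1, r], [r, 1]].
s = 1.0 / math.sqrt(2.0)
pc1, pc2 = (s, s), (s, -s)
lambda1, lambda2 = 1.0 + r, 1.0 - r

# Verify R v = lambda v for the first component.
lhs = (pc1[0] + r * pc1[1], r * pc1[0] + pc1[1])
rhs = (lambda1 * pc1[0], lambda1 * pc1[1])

explained = lambda1 / (lambda1 + lambda2)  # share of variance on the first component
print("r =", round(r, 4))
print("first loading vector:", tuple(round(v, 3) for v in pc1))
print("variance explained by PC1:", round(explained, 4))
```

Because `cor = TRUE` works on standardized variables, the first score is a combination of the standardized x1 and x2, which is why the raw difference in the R test only approaches zero rather than vanishing.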

### Useful Unix / Linux shell commands

cat tmp.txt | sed s/CREATE/DROP/

who | cut -c1-8 | sort | uniq | nl
cat /usr/local/apache2/logs/access_log | grep 128\.135 | cut -c1-16 | uniq
ps -ef | grep nohup | cut -c53-57 | sort | uniq | nl

/sbin/shutdown -r now ?
lsof

/etc/rc.local  # system startup configuration

grep CREATE ensembl_mart_16_1.sql | sed s/CREATE/DROP/ | sed s/\(/\;/ > $HOME/trim_mart.sql
ls enc.* | sed "s/^/\"/" | sed "s/$/\"\,/"

### health disparity gene expression datasets, collection, data resources

Differential endothelial cell gene expression by African Americans versus Caucasian Americans: A possible contribution to health disparity in vascular disease and cancer
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE22688
http://www.ncbi.nlm.nih.gov/pubmed/21223544

PLoS One. 2008 Aug 6;3(8):e2847. doi: 10.1371/journal.pone.0002847.
Gene expression and functional studies of the optic nerve head astrocyte transcriptome from normal African Americans and Caucasian Americans donors.
Miao H, Chen L, Riordan SM, Li W, Juarez S, Crabb AM, Lukas TJ, Du P, Lin SM, Wise A, Agapova OA, Yang P, Gu CC, Hernandez MR.
http://www.ncbi.nlm.nih.gov/pubmed/18716680

Genome Biol. 2008;9(7):R111. doi: 10.1186/gb-2008-9-7-r111. Epub 2008 Jul 9.
Susceptibility to glaucoma: differential comparison of the astrocyte transcriptome from glaucomatous African American and Caucasian American donors.
http://www.ncbi.nlm.nih.gov/pubmed/18613964

Physiol Genomics. 2011 Jul 14;43(13):836-43. doi: 10.1152/physiolgenomics.00243.2010. Epub 2011 Apr 26.
Gene expression variation between African Americans and whites is associated with coronary artery calcification: the multiethnic study of atherosclerosis.
http://www.ncbi.nlm.nih.gov/pubmed/21521779

review, subclinical coronary atherosclerosis, racial profiling is necessary
http://www.ncbi.nlm.nih.gov/pubmed/17070140

General Cardiovascular Risk Profile identifies advanced coronary artery calcium and is improved by family history: the multiethnic study of atherosclerosis.
http://www.ncbi.nlm.nih.gov/pubmed/20160201

J Transl Med. 2013 Oct 1;11:239. doi: 10.1186/1479-5876-11-239.
Quantitative proteomic analysis in HCV-induced HCC reveals sets of proteins with potential significance for racial disparity.
Dillon ST, Bhasin MK, Feng X, Koh DW, Daoud SS.
http://www.ncbi.nlm.nih.gov/pubmed/24283668

Exerc Sport Sci Rev. 2013 Jan;41(1):44-54. doi: 10.1097/JES.0b013e318279cbbd.
Are there race-dependent endothelial cell responses to exercise?
Brown MD, Feairheller DL.
http://www.ncbi.nlm.nih.gov/pubmed/23262464

## Sunday, December 21, 2014

### Braunewell Bornholdt, 2007, Superstability of the yeast cell-cycle dynamics

PB07JTB, 2007 Apr 21;245(4):638-43. Epub 2006 Nov 21. Superstability of the yeast cell-cycle dynamics: ensuring causality in the presence of biochemical stochasticity

In their 2009 JTB paper, the authors cited a measure of reliability from this 07JTB paper. I searched the entire paper for "reliability" but found only one hit, in the abstract. In the main text, the authors mention "stability of the systems under strong noise", termed the "stability criterion" (basically robustness or reliability). Based on its explanation below, this is a rather context-specific criterion.

It seems that PB07 and PB09 are based on the Li04PNAS paper, a Boolean network model of the yeast cell cycle.

### Braunewell and Bornholdt, 2009, reliability of network

PB09JTB
investigate the interplay of topological structure and dynamical robustness.

reliability of attractors

boolean network dynamics

The reliability criterion was used to show the robustness of the yeast cell-cycle dynamics against timing perturbations (Braunewell and Bornholdt, 2007).

### HBCU-PRIDE

Help us get the word out about PRIDE by forwarding the below to your institution, colleagues, & organizations you may be a member of and posting on Linked In & Facebook.  We appreciate your assistance!

The PRIDE Summer Institute Programs to Increase Diversity Among Individuals Engaged in Health-Related Research are now accepting applications. Space is limited for the 2015 mentored summer training programs so Apply early!
Who: Eligible applicants are junior-level faculty or scientists from minority groups that are under-represented in the biomedical or health sciences, and are United States Citizens or Permanent Residents. Research interests should be compatible with those of the National Heart, Lung, and Blood Institute (NHLBI) in the prevention and treatment of heart, lung, blood, and sleep (HLBS) disorders.
What: Seven unique Summer Institute programs with intensive mentored training opportunities to enhance the research skills and to promote the scientific and career development of trainees. Trainees will learn effective strategies for preparing, submitting and obtaining external funding for research purposes, including extensive tips on best practices. Research emphasis varies by program.
Where/When (Dates subject to change.  Verify on website):
• Location: Arizona Health Sciences Center, University of Arizona, Tucson, Arizona
• PI: Joe G.N. “Skip” Garcia, MD; Francisco Moreno, MD
Behavioral and Sleep Medicine (BSM) (July 19 – August 1, 2015)
• Location: NYU Langone Medical Center, New York, New York
• PI: Girardin Jean-Louis, PhD

• Location: Washington University in St. Louis, St. Louis, Missouri
• PI: D.C. Rao, PhD; Victor Davila-Roman, MD

Cardiovascular Health-Related Research (CVD) (July 19 – August 1, 2015)
• Location: SUNY Downstate Medical College, New York, New York
• PI: Mohamed Boutjdir, PhD

• Location: Georgia Regents University, Augusta, Georgia
• PI: Betty Pace, MD

HBCU-PRIDE (June 21 – July 1, 2015)
• Location: University of Mississippi Medical Center, Jackson, Mississippi
• PI: Bettina M. Beech, DrPH, MPH; Keith C. Norris, MD, PhD

• Location: The UCSF Center for Vulnerable Populations at San Francisco General Hospital, San Francisco, California
• PI: Kirsten Bibbins-Domingo, PhD, MD, MAS; Alicia Fernandez, MD; Margaret Handley, PhD, MPH

Programs typically are all expenses paid including travel, meals, housing, and tuition. Contact the program of interest for details. Mentees can apply to more than one program, but may accept only one.

If you know of colleagues or program alumni at the junior faculty level who would benefit from this innovative research training and mentorship opportunity, we urge you to encourage them to Apply.
We would appreciate your help in getting the word out …
• Forward this message to appropriate faculty advisors and colleagues.
• Print and post the program flyer in a common location.
• Encourage eligible junior faculty to consider this

## Thursday, December 18, 2014

### NGS method, RNA seq

Method in Lei 2013, Gene, Diminishing returns in next-generation sequencing (NGS)
transcriptome data.

converted from SRA format to FASTQ format using SRA toolkit (http://www.ncbi.nlm.nih.gov/Traces/sra/?view=software). Then, the raw data were filtered using the following criteria: (1) the number of unknown bases (N) was no more than two for each read; and (2) the fraction of low quality sites (Q < 5) was no more than 50% for each read. The data that passed this quality control were then used to map back to their respective genome sequences using bowtie2 (Langmead and Salzberg, 2012). Only uniquely mapped reads with no more than two mismatches were retained for further analysis. After mapping, the counts for each gene were summarized using HTSeq (http://www-huber.embl.de/users/anders/HTSeq/doc/overview.html). In the simulation, a predetermined-sized subset of reads was randomly selected from the original file. Using the same mapping procedure as mentioned above, the RPKM for each gene and depth of coverage were calculated and compared with those from original data. In-house Perl and R scripts were developed for data analysis and graphing (available upon request).
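A hedged Python sketch of the two read-level filters and the random subsampling described in the quoted method. It is not the authors' script: the function names are mine, quality strings are assumed to be Phred+33 encoded, "low quality" is read as Q below 5, and the example reads are invented.

```python
import random


def passes_qc(seq, qual, max_n=2, low_q=5, max_low_fraction=0.5, phred_offset=33):
    """Keep a read if it has at most max_n unknown bases and at most
    max_low_fraction of its sites with quality below low_q."""
    if not qual:
        return False
    if seq.upper().count("N") > max_n:
        return False
    scores = [ord(c) - phred_offset for c in qual]  # assumes Phred+33 encoding
    low_sites = sum(1 for q in scores if q < low_q)
    return low_sites / len(scores) <= max_low_fraction


def subsample(reads, size, seed=0):
    """Randomly select a fixed-size subset of reads (without replacement)."""
    return random.Random(seed).sample(reads, size)


# Invented (sequence, quality) pairs; 'I' is Q40 and '!' is Q0 under Phred+33.
reads = [
    ("ACGTACGT", "IIIIIIII"),  # clean read
    ("ANNNCGTA", "IIIIIIII"),  # three N bases: rejected
    ("ACGTACGT", "!!!!!III"),  # 5 of 8 sites low quality: rejected
    ("ACGTACGT", "!!!!IIII"),  # exactly 50% low quality: kept
]

kept = [read for read in reads if passes_qc(*read)]
subset = subsample(kept, 1, seed=2014)
print(len(kept), "of", len(reads), "reads pass QC")
```

Fixing the seed in `subsample` makes a simulated depth reproducible, which helps when comparing RPKM values across subset sizes.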

## Wednesday, December 17, 2014

### convert read-only pdf to modifiable pdf

I saved the PDF/A file as PostScript.

I then ran "ps2pdf" to generate a new PDF, which can be annotated.

### toread, pan-cancer network, somatic mutations

Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes, Nature Genetics, 2014

### reciprocity and power-law network, TOREAD

http://www.nature.com/srep/2014/141212/srep07460/pdf/srep07460.pdf

This paper is related to my work on network aging and network configuration.

### Proteins drive cancer cells to change states

When RNA-binding proteins are turned on, cancer cells get locked in a proliferative state.

## Monday, December 15, 2014

### How children learn math

http://www.dailymail.co.uk/sciencetech/article-2727268/Peek-brain-shows-kids-learn-math-skills.html

http://www.education.com/reference/article/how-children-learn-mathematics/

### Liu & Chen, 2012, Protein Cell, Proteome-wide prediction of protein-protein interactions from high-throughput data.

2012 Jul;3(7):508-20. doi: 10.1007/s13238-012-2945-1. Epub 2012 Jun 22.

Proteome-wide prediction of protein-protein interactions from high-throughput data.

http://www.ncbi.nlm.nih.gov/pubmed/22729399

Good review of protein/gene network studies.

## Sunday, December 14, 2014

require(xlsx)
rm(list=ls())

list.files()
#The - signs have to be replaced with zeros in textwrangler

empty.columns= NULL
for (j in 8:length(tb[1,])){
#for( i in 1:length(tb[,1])){
#  if( tb[i,j]=='-') {tb[i,j]=NA }
#}
tb[,j] = as.numeric( tb[,j])
tb[is.na(tb[,j]),j] = 0
if( max(tb[,j])==0 ) { empty.columns = c(empty.columns, j)}
}
str(tb)
tb2 = tb[, - empty.columns]
#tb2 = tb2[, -"Course.total"]
#tb2 = tb2[, - grep('Spring', names(tb2))]
#tb2 = tb2[, - grep('spring', names(tb2))]
tb2 = tb2[, -grep("Quiz.Retake..Fall.2014.Exam.1..part.2..online.part", names(tb2))]
names(tb2)

examColumns = names(tb)[grep("xam", names(tb))]
exam1 = c(         "Quiz.Exam1.Part1..Fall.2014"      ,
"Assignment.Exam1..part2..calculation.questions..Fall.2014"     ,
"Quiz.Fall.2014.Exam.1..part.2..online.part"
)
exam2= c("Quiz.Exam.2..closed.book.section..Fall.2014..Thursday",
"Quiz.Exam2..open.book.section..Fall.2014..Tuesday")
exam3=c("Quiz.Exam3..closed.book.section..Nov.20..2014",
"Quiz.Exam.3..open.book.section..Fall.2014"   )
final = c( "Quiz.Final.Exam..Open.book.section..Fall2014..Dec.9..11am.13.00",
"Quiz.Closed.book.section.of.final.exam..Dec.9..2014..10.30am.12.30pm")

tb2[,final]
names(tb)[grep("inal", names(tb))]

report= tb2[,1:2]
report$Exam1 = apply( tb2[,exam1], 1, sum)
report$Exam2 = apply( tb2[,exam2], 1, sum)
report$Exam3 = apply( tb2[,exam3], 1, sum)
report$Final = apply( tb2[,final], 1, sum)

practical = names(tb2)[grep("ractical", names(tb2))]
report$ToTpractical = (tb2[,"Assignment.Practical.Exam..microscope.and.morphology..Sep.29..2014"]/10 + tb2[,"Assignment.Streak.plate..practical.exam"])/4

#find out assignments and chapter quizzes
#the scale lab report was posted twice
scale = c("Quiz.Lab.assignment..Scale.of.Microbes", "Quiz.scale.of.microbes..lab.report")
report$scale= apply( tb2[, scale],1, max)

#chapter homework can be found with "Quiz" or "Chapter". The names should be consistent!!!
names(tb2)[grep("Chapter", names(tb2))]

report$ch1 = tb2[, grep("Chapter.1",names(tb2))]
report$ch2 = tb2[, grep("Chapter\\.2",names(tb2))] ##.2 can match 32
report$ch3= apply( tb2[, grep("Chapter.3",names(tb2))],1, max)
report$ch4= apply( tb2[, grep("Chapter.4",names(tb2))],1, max)
report$ch5= apply( tb2[, grep("Chapter.5",names(tb2))],1, max)
report$ch6= apply( tb2[, grep("Chapter.6",names(tb2))],1, max) #was assigned to ch5, overwriting chapter 5
report$ch7= apply( tb2[, grep("Chapter.7",names(tb2))],1, max)
report$ch8= apply( tb2[, grep("Chapter.8",names(tb2))],1, max)
report$ch9= apply( tb2[, grep("Chapter9",names(tb2))],1, max)
report$ch10= apply( tb2[, grep("Chapter10",names(tb2))],1, max)
report$ch16= apply( tb2[, grep("Chapter16",names(tb2))],1, max)
report$ch32= apply( tb2[, grep("Chapter32",names(tb2))],1, max)

#misc assignment and lab reports, which can be quiz or assignments
names(tb2)[grep("ment", names(tb2))]

misc= c( "Assignment.Serial.dilution.lab.group.report"      ,
"Assignment.Pictures.for.microbes.on.campus.by.groups"     ,
"Quiz.Lab.assignment..Scale.of.Microbes"                 ,
"Quiz.Lab.assignment..E.coli.genome.studies"       ,
"Assignment.Report.for.Gram.stain.lab..individual.report."  ,
"Assignment.Homework.for.Dr..Wenzhi.Li.s.lecture..Individual.effort.",
"Assignment.homework.on.circulating.tumor.DNA"   )

tb2[, "Assignment.homework.on.circulating.tumor.DNA" ] =  tb2[,  "Assignment.homework.on.circulating.tumor.DNA" ]/10
tb2[, "Assignment.Report.for.Gram.stain.lab..individual.report."] =tb2[, "Assignment.Report.for.Gram.stain.lab..individual.report."]/10
tb2[1:5, misc]

report$misc= apply( tb2[, misc],1, sum)
assignAndLab =c("scale","misc","ch1","ch2","ch3","ch4","ch5","ch6","ch7","ch8", "ch9","ch10","ch16","ch32")
report$ToTassignAndLab = apply( report[,assignAndLab], 1, sum)
maxS = apply( report[, assignAndLab], 2, max)
report$ToTassignAndLab = 15*report$ToTassignAndLab / sum(maxS)
## end of assignment and lab reports

#attendance
list.files()
att.tb$ToTAttendence = apply( att.tb[, 6:33], 1, sum)
str(att.tb)
hist(att.tb$ToTAttendence, br=20)
report$ToTAttendence = att.tb$ToTAttendence[match(report$Last.name, att.tb$Last.name)]
report$ToTAttendence = report$ToTAttendence*5/ max(report$ToTAttendence)

#take the best 2 regular exams and the final
report$badExam = apply(report[,c("Exam1","Exam2", "Exam3")], 1, min)
report$ExamTot = (report$Exam1 + report$Exam2 + report$Exam3 + report$Final - report$badExam) / 3

# bonus points, need to add R bonus points
names(tb2)[grep("onus", names(tb2))]
bonus = c("Assignment.Bonus.points.of.paper.presentations.and.volunteering" ,
"Assignment.Bonus.Problem.1..Flow.cytometer.data.analysis.1"     ,
"Assignment.Bonus.problem.2..Cholera.data.simulation.in.R.1"    )
report$bonus = apply( tb2[,bonus], 1, sum)

#oral
report$oral = tb2[,"Assignment.Oral.presentation.grades..fall.2014"]

#written report
report$written = tb2$WrittenReport

FinalGrades= c("ExamTot","ToTpractical","ToTassignAndLab", "ToTAttendence", 'bonus', 'oral', "written")

report$FinalGrade= apply(report[,FinalGrades], 1, sum)
hist(report$FinalGrade, br=20)

#convert a numeric final grade to a letter grade
grade2letter = function(x){
if(x>94){    ret='A'
}else if (x >90) {    ret='A-'
}else if (x >87 ){    ret = 'B+'
}else if (x > 84){    ret = 'B'
}else if (x >80){    ret = 'B-'
}else if (x > 76){  ret = 'C+'
}else if (x > 70){ ret = 'C'
}else if (x > 67){ ret = 'C-'
}else if (x > 64){ ret = 'D+'
}else if (x > 60){ ret = 'D'
}else {   ret = 'F'
}
return (ret)
}
report$letter = sapply(report$FinalGrade,  grade2letter)
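A compact Python sketch of the two grading rules used in the script above: dropping the worst of the three regular exams before averaging with the final, and mapping a numeric total to a letter with strict "greater than" cutoffs. The function names and the example scores are invented for illustration; the cutoffs are copied from the script.

```python
def exam_total(exam1, exam2, exam3, final):
    """Average of the best two regular exams and the final (worst regular exam dropped)."""
    worst = min(exam1, exam2, exam3)
    return (exam1 + exam2 + exam3 + final - worst) / 3


# (cutoff, letter) pairs in descending order; a score must EXCEED the cutoff.
CUTOFFS = [
    (94, "A"), (90, "A-"), (87, "B+"), (84, "B"), (80, "B-"),
    (76, "C+"), (70, "C"), (67, "C-"), (64, "D+"), (60, "D"),
]


def grade_to_letter(score):
    """Return the first letter whose cutoff the score strictly exceeds, else 'F'."""
    for cutoff, letter in CUTOFFS:
        if score > cutoff:
            return letter
    return "F"


# Made-up scores for one student.
total = exam_total(80, 60, 90, 85)
print(total, grade_to_letter(total))
```

Keeping the cutoffs in a table makes boundary behavior easy to inspect: a score of exactly 94 maps to A-, not A, because the comparison is strict.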