Tuesday, December 30, 2014

Draw to look for latex symbol syntax

http://detexify.kirelabs.org/classify.html

least square method, matrix view

Khan academy
https://youtu.be/MC7l96tW8V8

Matrix diagonalization

The approach of this proof might be useful in the matrix approach for network aging study.

Monday, December 29, 2014

orthogonal projection, orthogonality

orthogonal projection of vector y to u:
y^hat = \frac{y \dot u}{u \dot u } u

U has orthonormal columns if and only if U^T U = I

inner product of vectors, dot products, orthogonality

$u \dot v = u^T v$

$u \dot v = v \dot u $

length (or norm) of vector $v$ is the square root of its inner product. This can be seen from the v [a,b], whose length(norm) is sqrt(a^2 + b^2)

u \dot v = ||u|| ||v|| cos \theta
||u-v|| = ||u||^2 + ||v||^2 - 2||u|| ||v|| cos\theta

Two vector $u$ and $v$ are orthogonal if and only if $u \dot v = 0$.

U has orthonormal columns if and only if U^T U = I

orthogonal projection of vector y to u:
y^hat = \frac{y \dot u}{u \dot u } u

Orthogonal projection of a point y to W space with {u1, u2, ... up} basis can be found by orthogonal projections on each base vector, u1, u2, ..., u_p.

diagonal matrix of essential genes in network aging model

diagonal matrix of essential genes in network aging model
A^k = P D P^-1
If the diagonal matrix contains only the number of links of essential genes, its decaying might be easily computed numerically.
Diagonalization of A can be found through eigen values and eigen vectors.

PCA notes

From covariance matrix, the eigen vector is the PCA.

http://youtu.be/5zk93CpKYhg

Sunday, December 28, 2014

toread, interaction based discovery of cancer genes

2014 Feb;42(3):e18. doi: 10.1093/nar/gkt1305. Epub 2013 Dec 19.

Interaction-based discovery of functionally important genes in cancers.

http://www.ncbi.nlm.nih.gov/pubmed/24362839

Saturday, December 27, 2014

SQLite3 code on rls.db

file 'test_rls.sql'

.open rls.db
.databases
.tables
.separator ::
.headers on
.mode column
select distinct experiment from result_experiment limit 20;
.indices
.indices set
.width 5
select * from result limit 1;

/* The following select can take rls and its reference rls */
select experiments,set_name,set_strain,set_background,set_genotype,
set_lifespan_mean,ref_genotype,ref_lifespan_mean
from result limit 2;

/* The fields of set_name and set_genotype sometimes provide the ORF-name pair, but there are many exceptions. */

mysql tips

MYSQL tips "mysql.txt" file

show tables like "h%";

select * form someTable into outfile "/tmp/tmpfile";

create temporary table tmptab select distinct id1 from sampleTab1 UNION ALL
select distinct id2 from sampleTab2;

grant ALL on homo_sapiens_core_17_33.* to hqin@localhost;

SUBSTRING(str,pos,len)
SUBSTRING(str FROM pos FOR len)

MID(str,pos,len)
#Returns a substring len characters long from string str, starting at position pos.
#The variant form that uses FROM is SQL-92 syntax:

mysql> SELECT SUBSTRING('Quadratically',5,6);
-> 'ratica'

mysqldump test name --no-data --no-create-db > tmp.dump
mysqlimport -u root -h shanghai hong_database *.txt.table

/* try left, inner, outer join to see what's missing */
mysql> create temporary table bader2gu
-> select orf, Name1
-> from curagenOrf2name left join Ks_Ka_Yeast_Ca
-> on curagenOrf2name.orf = Ks_Ka_Yeast_Ca.Name1;
Query OK, 6268 rows affected (54.27 sec)
Records: 6268 Duplicates: 0 Warnings: 0

mysql> select * from bader2gu where Name1 is NULL;
/* return 4313 rows */

mysql> select * from bader2gu where Name1 is not NULL;
/* return 1955 rows. Note, one record is missing probably
due to different annotations bw curagen and the public release from SGD
*/

SQLite 3, osX, byte, rls.db

Reference: http://www.sqlite.org/cli.html

#I want to install SQLite to load 'rls.db'.

$ sudo port install sqlite3

#OK

#how to load 'rls.db' ?

$ sqlite3

SQLite version 3.8.7.4 2014-12-09 01:34:36

Enter ".help" for usage hints.

Connected to a transient in-memory database.

Use ".open FILENAME" to reopen on a persistent database.

sqlite> .open rls.db

sqlite> .databases

seq name file

--- --------------- ----------------------------------------------------------

0 main /Users/hqin/projects/0.network.aging.prj/4.svm/rls.db

sqlite> .tables

build_log genotype_pubmed_id result_experiment set

cross_mating_type meta result_ref yeast_strain

cross_media result result_set

sqlite> .indices

build_log_filename

cross_mating_type_background

cross_mating_type_genotype

cross_mating_type_locus_tag

cross_mating_type_media

cross_mating_type_temperature

cross_media_background

cross_media_genotype

cross_media_locus_tag

cross_media_mating_type

cross_media_temperature

genotype_pubmed_id_genotype

genotype_pubmed_id_pubmed_id

meta_name

result_experiment_experiment

result_experiment_result_id

result_percent_change

result_pooled_by

result_ranksum_p

result_ref_background

result_ref_genotype

result_ref_locus_tag

result_ref_mating_type

result_ref_media

result_ref_name

result_ref_result_id

result_ref_set_id

result_ref_strain

result_ref_temperature

result_set_background

result_set_genotype

result_set_lifespan_mean

result_set_locus_tag

result_set_mating_type

result_set_media

result_set_name

result_set_result_id

result_set_set_id

result_set_strain

result_set_temperature

set_experiment

set_media

set_name

set_strain

set_temperature

yeast_strain_background

yeast_strain_genotype_short

yeast_strain_genotype_unique

yeast_strain_mating_type

yeast_strain_name

yeast_strain_owner

sqlite> select distinct experiment from result_experiment limit 20;

experiment

100

101

102_plate115

103

104

105

106_plate116

107

108_plate117

...

sqlite> .separator :::

sqlite> select * from result limit 2;

id:::experiments:::set_name:::set_strain:::set_background:::set_mating_type:::set_locus_tag:::set_genotype:::set_media:::set_temperature:::set_lifespan_start_count:::set_lifespan_count:::set_lifespan_mean:::set_lifespan_stdev:::set_lifespans:::ref_name:::ref_strain:::ref_background:::ref_mating_type:::ref_locus_tag:::ref_genotype:::ref_media:::ref_temperature:::ref_lifespan_start_count:::ref_lifespan_count:::ref_lifespan_mean:::ref_lifespan_stdev:::ref_lifespans:::percent_change:::ranksum_u:::ranksum_p:::pooled_by

1:::127:::BY4741:::KK19:::BY4741:::MATa::::::BY4741:::YPD:::30.0:::20:::20:::30.3:::7.526095:::23,26,34,31,22,37,26,39,22,36,38,24,36,40,26,38,38,17,34,19:::BY4742:::DH502:::BY4742:::MATalpha::::::BY4742:::YPD:::30.0:::40:::40:::29.625:::8.279377:::36,26,15,28,16,44,40,28,25,32,24,29,39,37,30,31,14,17,29,28,44,27,38,29,26,39,38,32,34,33,32,38,16,28,31,11,20,39,30,32:::2.278481:::409.0:::0.8916505557143:::file

2:::127:::ymr226c:::DC:4G4:::BY4741:::MATa::::::tma29:::YPD:::30.0:::20:::20:::27.1:::11.702:::24,11,37,32,41,38,12,11,31,23,39,36,22,19,28,36,24,49,24,5:::BY4741:::KK19:::BY4741:::MATa::::::BY4741:::YPD:::30.0:::20:::20:::30.3:::7.526095:::23,26,34,31,22,37,26,39,22,36,38,24,36,40,26,38,38,17,34,19:::-10.56106:::169.5:::0.4163969339623:::file
#Notes, field 'experiments' in 'result' maybe used to find the in-experiment wildtype controls.
# Ken once suggested that "pooled by" column?? file, genotype, mixed
# set lifespan
# ref lifespan

select experiments,set_name,set_strain,set_background,set_genotype,
set_lifespan_mean,ref_genotype,ref_lifespan_mean

from result limit 2;

Wednesday, December 24, 2014

toread, quality control of inner nuclear membrane proteins by the Asi complex

2014 Nov 7;346(6210):751-5. doi: 10.1126/science.1255638. Epub 2014 Sep 18.

Quality control of inner nuclear membrane proteins by the Asi complex.

Foresti O1, Rodriguez-Vaello V1, Funaya C2, Carvalho P3.

Author information

http://www.ncbi.nlm.nih.gov/pubmed/25236469

comments
http://www.ncbi.nlm.nih.gov/pubmed/25315269
http://www.ncbi.nlm.nih.gov/pubmed/25378608

Tuesday, December 23, 2014

toread, power law network paper

http://www.ncbi.nlm.nih.gov/pubmed/25520244

big data sets, free

http://www.datasciencecentral.com/profiles/blogs/big-data-sets-available-for-free

125 years of public health records

http://www.tycho.pitt.edu/

toread, Distinguishing cause from effect using observational data: methods and benchmarks Joris M. Mooij, Jonas Peters, Dominik Janzing, Jakob Zscheischler, Bernhard Schölkopf

http://arxiv.org/abs/1412.3773

Monday, December 22, 2014

CITI training

I spent 70 minutes (9:30-10:40) on CITI training, with 93% final score (1 wrong due to mis-clicking)

CITI refresher course reading materials, URLs

If you have not read the Belmont Report yet, please review this document and/or copy it for future reference. (Close the new browser window to return here.)

Links to Ethical Codes and Regulations of Human Subjects in Research.

Declaration of Helsinki

Title 45, Part 46 of the Code of Federal Regulations (45 CFR 46)

Title 21, CFR Part 50 and CFR 56 of the Code of Federal Regulations.

CLIA - Clinical Laboratory Improvement Amendments

Title 21 Code of Federal Regulations (21 CFR Part 11) Electronic Records; Electronic Signatures.

OHRP Human Subjects Document Library

CITI training, Belmont report

Belmont report
http://www.hhs.gov/ohrp/humansubjects/guidance/belmont.html

svm project, pca() repalaced by princomp() 20141222

updated file '040610.scmd.Ka.fitness.R'

'pca' package from old code 040610.scmd.Ka.fitness.R does not exist in R 3.x anymore. I switched to princomp() in the base package.

TODO: check the predicted long-lived strains in the Kaeberlein database.

study notes on PCA, principal components, with R testing codes, princomp()

ResearchGate: Principal components are linear combinations of original variables x1, x2, etc. So when you do SVM on PCA decomposition you work with these combinations instead of original variables.

37:50 in Ng's video
https://www.youtube.com/watch?v=ey2PE5xi9-A
Ng showes to PCA (linear combinations of raw data) to reduce dimension of data.

PCA is basically orthogonal transformation
http://en.wikipedia.org/wiki/Principal_component_analysis
"Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components."
"PCA can be done by eigenvalue decomposition of a data covariance (or correlation) matrix or singular value decomposition of a data matrix, usually after mean centering (and normalizing or using Z-scores) the data matrix for each attribute. The results of a PCA are usually discussed in terms of component scores, sometimes called factor scores (the transformed variable values corresponding to a particular data point), and loadings (the weight by which each standardized original variable should be multiplied to get the component score)."

R: princomp( )
pc.cr <- princomp(~ Murder + Assault + UrbanPop,
data = USArrests, na.action = na.exclude, cor = TRUE)
pc.cr$scores[1:5, ] #scores probably are PCA results, based on wikipedia entry

I can use examples of linear combination to verify my guess.

#######start of the R testing code and results #########
x1 = rnorm(100)
x2 = rnorm(100)
x3 = x1 + x2 + rnorm(100)/20
x4 = 2*x1 + rnorm(100)/20
X = data.frame(cbind(x1,x2,x3,x4))
pc <- princomp(X)
plot(pc)#only two major components, consistent

head(pc)

#######start of the R testing code and results #########

#######start of 2nd R testing code and results #########

set.seed(2014)

x1 = rnorm(100)

x2 = x1 + rnorm(100)/20

X = data.frame(cbind(x1,x2))

pc <- princomp(X, cor = TRUE)

head(pc)

pc$score[,1] - (0.707*x1 + 0.707*x2) #does this approach zero?

summary( pc$score[,1] - (0.707*x1 + 0.707*x2) + mean( 0.707*x1 + 0.707*x2 ) )

#good, it approaches zero

summary(lm( pc$score[,1] ~ x1 ))

#######End of 2nd R testing code and results #########

Useful Unix / Linux shell commands

health disparity gene expression datasets, collection, data resources

Differential endothelial cell gene expression by African Americans versus Caucasian Americans: A possible contribution to health disparity in vascular disease and cancer
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE22688
http://www.ncbi.nlm.nih.gov/pubmed/21223544

PLoS One. 2008 Aug 6;3(8):e2847. doi: 10.1371/journal.pone.0002847.
Gene expression and functional studies of the optic nerve head astrocyte transcriptome from normal African Americans and Caucasian Americans donors.
Miao H1, Chen L, Riordan SM, Li W, Juarez S, Crabb AM, Lukas TJ, Du P, Lin SM, Wise A, Agapova OA, Yang P, Gu CC, Hernandez MR.
http://www.ncbi.nlm.nih.gov/pubmed/18716680

Genome Biol. 2008;9(7):R111. doi: 10.1186/gb-2008-9-7-r111. Epub 2008 Jul 9.
Susceptibility to glaucoma: differential comparison of the astrocyte transcriptome from glaucomatous African American and Caucasian American donors.
http://www.ncbi.nlm.nih.gov/pubmed/18613964

Physiol Genomics. 2011 Jul 14;43(13):836-43. doi: 10.1152/physiolgenomics.00243.2010. Epub 2011 Apr 26.
Gene expression variation between African Americans and whites is associated with coronary artery calcification: the multiethnic study of atherosclerosis.
http://www.ncbi.nlm.nih.gov/pubmed/21521779

review, subclinical coronalry atherosclerosis, racial profiling is necessary
http://www.ncbi.nlm.nih.gov/pubmed/17070140

General Cardiovascular Risk Profile identifies advanced coronary artery calcium and is improved by family history: the multiethnic study of atherosclerosis.
http://www.ncbi.nlm.nih.gov/pubmed/20160201

J Transl Med. 2013 Oct 1;11:239. doi: 10.1186/1479-5876-11-239.
Quantitative proteomic analysis in HCV-induced HCC reveals sets of proteins with potential significance for racial disparity.
Dillon ST, Bhasin MK, Feng X, Koh DW, Daoud SS.

http://www.ncbi.nlm.nih.gov/pubmed/24283668

Exerc Sport Sci Rev. 2013 Jan;41(1):44-54. doi: 10.1097/JES.0b013e318279cbbd.
Are there race-dependent endothelial cell responses to exercise?
Brown MD1, Feairheller DL.
http://www.ncbi.nlm.nih.gov/pubmed/23262464

Sunday, December 21, 2014

NIH grants searchable database

http://projectreporter.nih.gov/reporter.cfm

Braunewell Bornholdt, 2007, Superstability of the yeast cell-cycle dynamics

[PB07JTB] 2007 Apr 21;245(4):638-43. Epub 2006 Nov 21. Superstability of the yeast cell-cycle dynamics: ensuring causality in the presence of biochemical stochasticity

Braunewell S, Bornholdt S.

In their 2009 JTB paper, the author cited a measure of reliability in this 07JTB paper. I searched the entire paper for reliability, but did find one hit in the abstract. In the main text, the author mentioned "stability of the systems under strong noise", termed "stability criterion" (basically robustness or reliability. Based on its explanation below, this is a rather context-specific criterion.

It seems that PB07 and PB09 are based on the Li04PNAS paper, a boolean network model on yeast cell cycle.

Braunewell and Bornholdt, 2009, reliability of network

PB09JTB

investigate the interplay of topological structure and dynamical robustness.

reliability of attractors

boolean network dynamics

The reliability criteriont was used to show the robustness of the yeast cell-

cycle dynamics against timing perturbations (Braunewell and Bornholdt, 2007)

[PB07JTB] Braunewell, S., Bornholdt, S., 2007. Superstability of the yeast cell-cycle dynamics: ensuring causality in the presence of biochemical stochasticity. J. Theor. Biol. 245 (4), 638–643.

HBCU-PRIDE,

Help us get the word out about PRIDE by forwarding the below to your institution, colleagues, & organizations you may be a member of and posting on Linked In & Facebook. We appreciate your assistance!

The PRIDE Summer Institute Programs to Increase Diversity Among Individuals Engaged in Health-Related Research are now accepting applications. Space is limited for the 2015 mentored summer training programs so Apply early!

Who: Eligible applicants are junior-level faculty or scientists from minority groups that are under-represented in the biomedical or health sciences, and are United States Citizens or Permanent Residents. Research interests should be compatible with those of the National Heart, Lung, and Blood Institute (NHLBI) in the prevention and treatment of heart, lung, blood, and sleep (HLBS) disorders.

What: Seven unique Summer Institute programs with intensive mentored training opportunities to enhance the research skills and to promote the scientific and career development of trainees. Trainees will learn effective strategies for preparing, submitting and obtaining external funding for research purposes, including extensive tips on best practices. Research emphasis varies by program.

Where/When (Dates subject to change. Verify on website):

Advanced Health Disparities Training (AHD) (June 8-19, 2015)

Location:Arizona Health Sciences Center, University of Arizona, Tucson, Arizona
PI: Joe G.N. “Skip” Garcia, MD; Francisco Moreno, MD

Behavioral and Sleep Medicine (BSM) (July 19 – August 1, 2015)

Location: NYU Langone Medical Center, New York, New York
PI: Girardin Jean-Louis, PhD

Cardiovascular Genetic Epidemiology (CGE) (July 12 – 31, 2015)

Location: Washington University in St. Louis, St. Louis, Missouri
PI: D.C. Rao, PhD; Victor Davila-Roman, MD

Cardiovascular Health-Related Research (CVD) (July 19 – August 1, 2015)

Location: SUNY Downstate Medical College, New York, New York
PI: Mohamed Boutjdir, PhD

Functional and Translational Genomics of Blood Disorders (FTG) (July 6 – 21, 2015)

Location: Georgia Regents University, Augusta, Georgia
PI: Betty Pace, MD

HBCU-PRIDE (June 21 – July 1, 2015)

Location:University of Mississippi Medical Center, Jackson, Mississippi
PI: Bettina M. Beech, DrPH, MPH; Keith C. Norris, MD, PhD

Research in Implementation Science for Equity (RISE) (July 27 – August 7, 2015)

Location: The UCSF Center for Vulnerable Populations at San Francisco General Hospital, San Francisco, California

PI: Kirsten Bibbins-Domingo, PhD, MD, MAS; Alicia Fernandez, MD; Margaret Handley, PhD, MPH

Programs typically are all expenses paid including travel, meals, housing, and tuition. Contact the program of interest for details. Mentees can apply to more than one program, but may accept only one.

If you know of colleagues or program alumni at the junior faculty level who would benefit from this innovative research training and mentorship opportunity, we urge you to encourage them to Apply.

We would appreciate your help in getting the word out …

· Forward this message to appropriate faculty advisors and colleagues.

· Print and post the program flyer in a common location.

· Encourage eligible junior faculty to consider this

Thursday, December 18, 2014

NGS method, RNA seq

Method in Lei 2013, Gene, Diminishing returns in next-generation sequencing (NGS)

transcriptome data.

The sequencing files downloaded from NCBI SRA database were initially

converted from SRA format to FASTQ format using SRA toolkit

(http://www.ncbi.nlm.nih.gov/Traces/sra/?view=software). Then, the

raw data were filtered using the following criteria: (1) the number of

unknown bases (N) was no more than two for each read; and (2) the

fraction of low quality sites (Q b 5) was no more than 50% for each

read. The data that passed this quality control were then used to map

back to their respective genome sequences using bowtie2 (Langmead

and Salzberg, 2012). Only uniquely mapped reads with no more than

two mismatches were retained for further analysis. After mapping, the

counts for each gene were summarized using HTSeq (http://wwwhuber.

embl.de/users/anders/HTSeq/doc/overview.html). In the simulation,

a predetermined-sized subset of reads was randomly selected

from the original file. Using the samemapping procedure as mentioned

above, the RPKM for each gene and depth of coverage were calculated

and comparedwith those fromoriginal data. In-house Perl and R scripts

were developed for data analysis and graphing (available upon request).

Wednesday, December 17, 2014

convert read-only pdf to modifiable pdf

I saved pdf/A file to postscript.

I then "ps2pdf" and generated a new pdf, which can be annotated.

List of student summer program, 2015 summer

CDC,
http://www.kennedykrieger.org/professional-training/professional-training-programs/rise-programs/mchc-rise-up

HBCU PRIDE
http://hongqinlab.blogspot.com/2014/12/hbcu-pride.html

FHCRC
http://www.fhcrc.org/en/education-training/undergraduate-students.html

toread, pan-cancer network, somatic mutaitons

Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes, nature genetics, 2014

reciprocity and power-law network, TOREAD

To read.

http://www.nature.com/srep/2014/141212/srep07460/pdf/srep07460.pdf

This is paper is related to my network aging and network configuration.

Tuesday, December 16, 2014

Cancer, RNA binding protein, proliferation state

Proteins drive cancer cells to change states

When RNA-binding proteins are turned on, cancer cells get locked in a proliferative state.

http://newsoffice.mit.edu/2014/proteins-make-cancer-cells-change-states-1215

Spelman College GPA calculation

Monday, December 15, 2014

How children learn math

How children learn math

http://www.dailymail.co.uk/sciencetech/article-2727268/Peek-brain-shows-kids-learn-math-skills.html

http://www.education.com/reference/article/how-children-learn-mathematics/

Liu & Chen, 2012, Protein Cell, Proteome-wide prediction of protein-protein interactions from high-throughput data.

2012 Jul;3(7):508-20. doi: 10.1007/s13238-012-2945-1. Epub 2012 Jun 22.

Proteome-wide prediction of protein-protein interactions from high-throughput data.

Liu ZP¹, Chen L.

http://www.ncbi.nlm.nih.gov/pubmed/22729399

Good Review on protein/gene network study

Sunday, December 14, 2014

BIO233 final grade calculation

#This is file "gradebio233,20141214.R"

require(xlsx)
rm(list=ls())

list.files()
# tb = read.csv("201409-62376-01BIO233 Grades 20141209c.csv")
tb = read.csv("201409-62376-01BIO233 Grades 20141214-a.csv")
#The - signs have to be replaced with zeros in textwrangler

empty.columns= NULL
for (j in 8:length(tb[1,])){
#for( i in 1:length(tb[,1])){
# if( tb[i,j]=='-') {tb[i,j]=NA }
#}
tb[,j] = as.numeric( tb[,j])
tb[is.na(tb[,j]),j] = 0
if( max(tb[,j])==0 ) { empty.columns = c(empty.columns, j)}
}
str(tb)
tb2 = tb[, - empty.columns]
#tb2 = tb2[, -"Course.total"]
#tb2 = tb2[, - grep('Spring', names(tb2))]
#tb2 = tb2[, - grep('spring', names(tb2))]
tb2 = tb2[, -grep("Quiz.Retake..Fall.2014.Exam.1..part.2..online.part", names(tb2))]
names(tb2)

examColumns = names(tb)[grep("xam", names(tb))]
exam1 = c( "Quiz.Exam1.Part1..Fall.2014" ,
"Assignment.Exam1..part2..calculation.questions..Fall.2014" ,
"Quiz.Fall.2014.Exam.1..part.2..online.part"
)
exam2= c("Quiz.Exam.2..closed.book.section..Fall.2014..Thursday",
"Quiz.Exam2..open.book.section..Fall.2014..Tuesday")
exam3=c("Quiz.Exam3..closed.book.section..Nov.20..2014",
"Quiz.Exam.3..open.book.section..Fall.2014" )
final = c( "Quiz.Final.Exam..Open.book.section..Fall2014..Dec.9..11am.13.00",
"Quiz.Closed.book.section.of.final.exam..Dec.9..2014..10.30am.12.30pm")

tb2[,final]
names(tb)[grep("inal", names(tb))]

report= tb2[,1:2]
report$Exam1 = apply( tb2[,exam1], 1, sum)
report$Exam2 = apply( tb2[,exam2], 1, sum)
report$Exam3 = apply( tb2[,exam3], 1, sum)
report$Final = apply( tb2[,final], 1, sum)

practical = names(tb2)[grep("ractical", names(tb2))]
report$ToTpractical = (tb2[,"Assignment.Practical.Exam..microscope.and.morphology..Sep.29..2014"]/10
+ tb2[,"Assignment.Streak.plate..practical.exam"])/4

### do find out assignments and chapter quiz
#scale lap report were posted twice
scale = c("Quiz.Lab.assignment..Scale.of.Microbes",
"Quiz.scale.of.microbes..lab.report")
report$scale= apply( tb2[, scale],1, max)

# chapter homework can be found with "Quiz" or "Chapter". The names should be consisteny!!!
names(tb2)[grep("Chapter", names(tb2))]

report$ch1 = tb2[, grep("Chapter.1",names(tb2))]
report$ch2 = tb2[, grep("Chapter\\.2",names(tb2))] ##.2 can match 32
report$ch3= apply( tb2[, grep("Chapter.3",names(tb2))],1, max)
report$ch4= apply( tb2[, grep("Chapter.4",names(tb2))],1, max)
report$ch5= apply( tb2[, grep("Chapter.5",names(tb2))],1, max)
report$ch5= apply( tb2[, grep("Chapter.6",names(tb2))],1, max)
report$ch7= apply( tb2[, grep("Chapter.7",names(tb2))],1, max)
report$ch8= apply( tb2[, grep("Chapter.8",names(tb2))],1, max)
report$ch9= apply( tb2[, grep("Chapter9",names(tb2))],1, max)
report$ch10= apply( tb2[, grep("Chapter10",names(tb2))],1, max)
report$ch16= apply( tb2[, grep("Chapter16",names(tb2))],1, max)
report$ch32= apply( tb2[, grep("Chapter32",names(tb2))],1, max)

#misc assignment and lab reports, which can be quiz or assignments
names(tb2)[grep("ment", names(tb2))]

misc= c( "Assignment.Serial.dilution.lab.group.report" ,
"Quiz.DePaepeTaddei.Reading.Assignment" ,
"Assignment.Pictures.for.microbes.on.campus.by.groups" ,
"Quiz.Lab.assignment..Scale.of.Microbes" ,
"Quiz.Lab.assignment..E.coli.genome.studies" ,
"Assignment.Report.for.Gram.stain.lab..individual.report." ,
"Assignment.Homework.for.Dr..Wenzhi.Li.s.lecture..Individual.effort.",
"Assignment.homework.on.circulating.tumor.DNA" )

tb2[, "Assignment.homework.on.circulating.tumor.DNA" ] = tb2[, "Assignment.homework.on.circulating.tumor.DNA" ]/10
tb2[, "Assignment.Report.for.Gram.stain.lab..individual.report."] =tb2[, "Assignment.Report.for.Gram.stain.lab..individual.report."]/10
tb2[1:5, misc]

report$misc= apply( tb2[, misc],1, sum)

assignAndLab =c("scale","misc","ch1","ch2","ch3","ch4","ch5","ch7","ch8", "ch9","ch10","ch16","ch32")
report$ToTassignAndLab = apply( report[,assignAndLab], 1, sum)
maxS = apply( report[, assignAndLab], 2, max)
report$ToTassignAndLab = 15*report$ToTassignAndLab / sum(maxS)
## end of assignment and lab reports

#attendence
list.files()
att.tb= read.csv( "201409-62376-01BIO233_Attendances_2014129-1734.csv")
att.tb$ToTAttendence = apply( att.tb[, 6:33], 1, sum)
str(att.tb)
hist(att.tb$ToTAttendence, br=20)
report$ToTAttendence = att.tb$ToTAttendence[match(report$Last.name, att.tb$Last.name)]
report$ToTAttendence = report$ToTAttendence*5/ max(report$ToTAttendence)

# take best 2 regular exam and the final
report$badExam = apply(report[,c("Exam1","Exam2", "Exam3")], 1, min)
report$ExamTot = (report$Exam1 + report$Exam2 + report$Exam3 + report$Final - report$badExam) / 3

head(report)

# bonus points, need to add R bonus points
names(tb2)[grep("onus", names(tb2))]
bonus = c("Assignment.Bonus.points.of.paper.presentations.and.volunteering" ,
"Assignment.Bonus.Problem.1..Flow.cytometer.data.analysis.1" ,
"Assignment.Bonus.problem.2..Cholera.data.simulation.in.R.1" )
report$bonus = apply( tb2[,bonus], 1, sum)

# oral
report$oral = tb2[,"Assignment.Oral.presentation.grades..fall.2014"]

#written report
report$written = tb2$WrittenReport

FinalGrades= c("ExamTot","ToTpractical","ToTassignAndLab", "ToTAttendence", 'bonus', 'oral', "written")
report[,FinalGrades]

report$FinalGrade= apply(report[,FinalGrades], 1, sum)
hist(report$FinalGrade, br=20)

grade2letter = function(x){
if(x>94){ ret='A'
}else if (x >90) { ret='A-'
}else if (x >87 ){ ret = 'B+'
}else if (x > 84){ ret = 'B'
}else if (x >80){ ret = 'B-'
}else if (x > 76){ ret = 'C+'
}else if (x > 70){ ret = 'C'
}else if (x > 67){ ret = 'C-'
}else if (x > 64){ ret = 'D+'
}else if (x > 60){ ret = 'D'
}else { ret = 'F'
}
return (ret)
}
grade2letter(70); grade2letter(88)
report$letter = lapply(report$FinalGrade, grade2letter)

write.xlsx(report, "bio233FinalGradesFall20141214-a.xlsx")

#generate a sorted report
report.sorted = report[order(report$FinalGrade),]
write.xlsx(report.sorted, "bio233FinalGradesFall20141214-a-sorted.xlsx")