http://detexify.kirelabs.org/classify.html
This site is to serve as my note-book and to effectively communicate with my students and collaborators. Every now and then, a blog may be of interest to other researchers or teachers. Views in this blog are my own. All rights of research results and findings on this blog are reserved. See also http://youtube.com/c/hongqin @hongqin
Tuesday, December 30, 2014
Monday, December 29, 2014
orthogonal projection, orthogonality
orthogonal projection of vector $y$ onto $u$:
$\hat{y} = \frac{y \cdot u}{u \cdot u} u$
$U$ has orthonormal columns if and only if $U^T U = I$
inner product of vectors, dot products, orthogonality
$u \cdot v = u^T v$
$u \cdot v = v \cdot u$
The length (or norm) of a vector $v$ is the square root of its inner product with itself, $||v|| = \sqrt{v \cdot v}$. This can be seen from $v = [a, b]$, whose length (norm) is $\sqrt{a^2 + b^2}$.
$u \cdot v = ||u|| \, ||v|| \cos\theta$
$||u-v||^2 = ||u||^2 + ||v||^2 - 2 ||u|| \, ||v|| \cos\theta$
Two vectors $u$ and $v$ are orthogonal if and only if $u \cdot v = 0$.
orthogonal projection of vector $y$ onto $u$:
$\hat{y} = \frac{y \cdot u}{u \cdot u} u$
The orthogonal projection of a point $y$ onto a subspace $W$ with an orthogonal basis $\{u_1, u_2, ..., u_p\}$ can be found by summing the orthogonal projections of $y$ onto each basis vector $u_1, u_2, ..., u_p$.
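The projection formulas can be checked numerically. This is a minimal sketch in plain Python, assuming the basis vectors are mutually orthogonal; the vectors u1, u2 and y are made-up values for illustration.

```python
def dot(u, v):
    """Inner product u . v = u^T v."""
    return sum(a * b for a, b in zip(u, v))

def project_onto_vector(y, u):
    """Orthogonal projection of y onto the line spanned by u."""
    scale = dot(y, u) / dot(u, u)
    return [scale * a for a in u]

def project_onto_subspace(y, basis):
    """Projection of y onto W = span(basis); the basis must be mutually orthogonal."""
    y_hat = [0.0] * len(y)
    for u in basis:
        p = project_onto_vector(y, u)
        y_hat = [a + b for a, b in zip(y_hat, p)]
    return y_hat

# Hypothetical example: W is the plane spanned by two orthogonal vectors in R^3.
u1 = [1.0, 1.0, 0.0]
u2 = [1.0, -1.0, 0.0]
y = [2.0, 3.0, 4.0]

y_hat = project_onto_subspace(y, [u1, u2])
residual = [a - b for a, b in zip(y, y_hat)]

print(y_hat)                                   # projection of y onto W
print(dot(residual, u1), dot(residual, u2))    # both should be zero
```

The residual y - y_hat is orthogonal to every basis vector, which is what makes y_hat the closest point in W to y. If the basis is not orthogonal, the sum of single-vector projections is generally not the projection onto W.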
diagonal matrix of essential genes in network aging model
$A^k = P D^k P^{-1}$
If the diagonal matrix contains only the number of links of essential genes, its decay might be easily computed numerically.
The diagonalization $A = P D P^{-1}$ can be found through the eigenvalues and eigenvectors of $A$.
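To see why diagonalization makes powers cheap, here is a small sketch in plain Python. The 2x2 matrix A is a made-up example (not the network aging matrix), chosen because its eigenvalues (3 and 1) and eigenvectors are known in closed form.

```python
import math

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def power_by_multiplication(A, k):
    """A^k by repeated multiplication, for comparison."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = matmul(result, A)
    return result

# Hypothetical symmetric matrix with eigenvalues 3 and 1.
A = [[2.0, 1.0], [1.0, 2.0]]
s = 1.0 / math.sqrt(2.0)
P = [[s, s], [s, -s]]        # columns are the normalized eigenvectors (1,1) and (1,-1)
P_inv = [[s, s], [s, -s]]    # P is orthogonal and symmetric here, so P^-1 = P^T = P
eigenvalues = [3.0, 1.0]

def power_by_diagonalization(k):
    """A^k = P D^k P^-1; only the diagonal entries are raised to the k-th power."""
    D_k = [[eigenvalues[0] ** k, 0.0], [0.0, eigenvalues[1] ** k]]
    return matmul(matmul(P, D_k), P_inv)

A5_direct = power_by_multiplication(A, 5)
A5_diag = power_by_diagonalization(5)
print(A5_direct)
print(A5_diag)
```

Both routes should agree up to floating-point rounding. For a general matrix the eigenvalues and eigenvectors would come from a numerical routine such as eigen() in R.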
Sunday, December 28, 2014
toread, interaction based discovery of cancer genes
Nucleic Acids Res. 2014 Feb;42(3):e18. doi: 10.1093/nar/gkt1305. Epub  2013 Dec 19.
Interaction-based discovery of functionally important genes in cancers.
http://www.ncbi.nlm.nih.gov/pubmed/24362839
Saturday, December 27, 2014
SQLite3 code on rls.db
file 'test_rls.sql'
.open rls.db
.databases
.tables
.separator ::
.headers on
.mode column
select distinct experiment from result_experiment limit 20;
.indices
.indices set
.width 5
select * from result limit 1;
/* The following select can take rls and its reference rls */
select experiments,set_name,set_strain,set_background,set_genotype,
set_lifespan_mean,ref_genotype,ref_lifespan_mean
from result limit 2;
/* The fields of set_name and set_genotype sometimes provide the ORF-name pair, but there are many exceptions. */
mysql tips
MYSQL tips "mysql.txt" file
show tables like "h%";
select * from someTable into outfile "/tmp/tmpfile";
create temporary table tmptab select distinct id1 from sampleTab1 UNION ALL
select distinct id2 from sampleTab2;
grant ALL on homo_sapiens_core_17_33.* to hqin@localhost;
SUBSTRING(str,pos,len)
SUBSTRING(str FROM pos FOR len)
MID(str,pos,len)
#Returns a substring len characters long from string str, starting at position pos.
#The variant form that uses FROM is SQL-92 syntax:
mysql> SELECT SUBSTRING('Quadratically',5,6);
-> 'ratica'
mysqldump test name --no-data --no-create-db > tmp.dump
mysqlimport -u root -h shanghai hong_database *.txt.table
/* try left, inner, outer join to see what's missing */
mysql> create temporary table bader2gu
-> select orf, Name1
-> from curagenOrf2name left join Ks_Ka_Yeast_Ca
-> on curagenOrf2name.orf = Ks_Ka_Yeast_Ca.Name1;
Query OK, 6268 rows affected (54.27 sec)
Records: 6268 Duplicates: 0 Warnings: 0
mysql> select * from bader2gu where Name1 is NULL;
/* return 4313 rows */
mysql> select * from bader2gu where Name1 is not NULL;
/* return 1955 rows. Note, one record is probably missing
due to different annotations between curagen and the public release from SGD
*/
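The left-join trick for finding unmatched records can be tried without a MySQL server. This is a sketch using Python's built-in sqlite3 module with an in-memory database (SQLite, not MySQL); the table names orf2name and ks_ka and their rows are toy stand-ins for curagenOrf2name and Ks_Ka_Yeast_Ca, not real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Toy stand-ins for the two tables being compared (made-up rows).
cur.execute("CREATE TABLE orf2name (orf TEXT)")
cur.execute("CREATE TABLE ks_ka (Name1 TEXT, ka REAL)")
cur.executemany("INSERT INTO orf2name VALUES (?)",
                [("YAL001C",), ("YAL002W",), ("YAL003W",)])
cur.executemany("INSERT INTO ks_ka VALUES (?, ?)",
                [("YAL001C", 0.12), ("YAL003W", 0.05)])

# A left join keeps every row of the left table; unmatched rows get NULL on the right.
cur.execute("""
    CREATE TEMPORARY TABLE joined AS
    SELECT orf2name.orf AS orf, ks_ka.Name1 AS Name1
    FROM orf2name LEFT JOIN ks_ka ON orf2name.orf = ks_ka.Name1
""")

missing = [row[0] for row in
           cur.execute("SELECT orf FROM joined WHERE Name1 IS NULL ORDER BY orf")]
matched = [row[0] for row in
           cur.execute("SELECT orf FROM joined WHERE Name1 IS NOT NULL ORDER BY orf")]
conn.close()

print("missing:", missing)
print("matched:", matched)
```

The NULL and NOT NULL counts always add up to the number of rows in the joined table, as in the 4313 + 1955 = 6268 rows above.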
SQLite 3, osX, byte, rls.db
Reference: http://www.sqlite.org/cli.html
#I want to install SQLite to load 'rls.db'. 
$ sudo port install sqlite3
#OK
#how to load 'rls.db' ?
$ sqlite3 
SQLite version 3.8.7.4 2014-12-09 01:34:36
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .open rls.db
sqlite> .databases
seq  name             file                                                      
---  ---------------  ----------------------------------------------------------
0    main             /Users/hqin/projects/0.network.aging.prj/4.svm/rls.db     
sqlite> .tables
build_log           genotype_pubmed_id  result_experiment   set               
cross_mating_type   meta                result_ref          yeast_strain      
cross_media         result              result_set 
sqlite> .indices
build_log_filename
cross_mating_type_background
cross_mating_type_genotype
cross_mating_type_locus_tag
cross_mating_type_media
cross_mating_type_temperature
cross_media_background
cross_media_genotype
cross_media_locus_tag
cross_media_mating_type
cross_media_temperature
genotype_pubmed_id_genotype
genotype_pubmed_id_pubmed_id
meta_name
result_experiment_experiment
result_experiment_result_id
result_percent_change
result_pooled_by
result_ranksum_p
result_ref_background
result_ref_genotype
result_ref_locus_tag
result_ref_mating_type
result_ref_media
result_ref_name
result_ref_result_id
result_ref_set_id
result_ref_strain
result_ref_temperature
result_set_background
result_set_genotype
result_set_lifespan_mean
result_set_locus_tag
result_set_mating_type
result_set_media
result_set_name
result_set_result_id
result_set_set_id
result_set_strain
result_set_temperature
set_experiment
set_media
set_name
set_strain
set_temperature
yeast_strain_background
yeast_strain_genotype_short
yeast_strain_genotype_unique
yeast_strain_mating_type
yeast_strain_name
yeast_strain_owner
sqlite> select distinct experiment from result_experiment limit 20;
experiment
1
100
101
102_plate115
103
104
105
106_plate116
107
108_plate117
... 
sqlite> .separator :::
sqlite> select * from result limit 2;
id:::experiments:::set_name:::set_strain:::set_background:::set_mating_type:::set_locus_tag:::set_genotype:::set_media:::set_temperature:::set_lifespan_start_count:::set_lifespan_count:::set_lifespan_mean:::set_lifespan_stdev:::set_lifespans:::ref_name:::ref_strain:::ref_background:::ref_mating_type:::ref_locus_tag:::ref_genotype:::ref_media:::ref_temperature:::ref_lifespan_start_count:::ref_lifespan_count:::ref_lifespan_mean:::ref_lifespan_stdev:::ref_lifespans:::percent_change:::ranksum_u:::ranksum_p:::pooled_by
1:::127:::BY4741:::KK19:::BY4741:::MATa::::::BY4741:::YPD:::30.0:::20:::20:::30.3:::7.526095:::23,26,34,31,22,37,26,39,22,36,38,24,36,40,26,38,38,17,34,19:::BY4742:::DH502:::BY4742:::MATalpha::::::BY4742:::YPD:::30.0:::40:::40:::29.625:::8.279377:::36,26,15,28,16,44,40,28,25,32,24,29,39,37,30,31,14,17,29,28,44,27,38,29,26,39,38,32,34,33,32,38,16,28,31,11,20,39,30,32:::2.278481:::409.0:::0.8916505557143:::file
2:::127:::ymr226c:::DC:4G4:::BY4741:::MATa::::::tma29:::YPD:::30.0:::20:::20:::27.1:::11.702:::24,11,37,32,41,38,12,11,31,23,39,36,22,19,28,36,24,49,24,5:::BY4741:::KK19:::BY4741:::MATa::::::BY4741:::YPD:::30.0:::20:::20:::30.3:::7.526095:::23,26,34,31,22,37,26,39,22,36,38,24,36,40,26,38,38,17,34,19:::-10.56106:::169.5:::0.4163969339623:::file
#Notes, field 'experiments' in 'result' may be used to find the in-experiment wildtype controls.
# Ken once suggested that "pooled by" column?? file, genotype, mixed
# set lifespan
# ref lifespan
select experiments,set_name,set_strain,set_background,set_genotype,
set_lifespan_mean,ref_genotype,ref_lifespan_mean
from result limit 2;
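The same select can be run from a script. The sketch below uses Python's built-in sqlite3 module. Since rls.db is not assumed to be at hand, it rebuilds a cut-down 'result' table in memory from the two rows printed above (only a few columns). It also checks my guess that percent_change is 100 * (set_lifespan_mean - ref_lifespan_mean) / ref_lifespan_mean. That formula is an assumption inferred from these two rows, not from any documentation of the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE result (
    id INTEGER, experiments TEXT, set_name TEXT, set_genotype TEXT,
    set_lifespan_mean REAL, ref_genotype TEXT, ref_lifespan_mean REAL,
    percent_change REAL)""")

# Values copied from the two rows of 'result' shown in the session above.
rows = [
    (1, "127", "BY4741", "BY4741", 30.3, "BY4742", 29.625, 2.278481),
    (2, "127", "ymr226c", "tma29", 27.1, "BY4741", 30.3, -10.56106),
]
conn.executemany("INSERT INTO result VALUES (?,?,?,?,?,?,?,?)", rows)

query = """
    SELECT set_name, set_genotype, ref_genotype, percent_change,
           100.0 * (set_lifespan_mean - ref_lifespan_mean) / ref_lifespan_mean
    FROM result ORDER BY id"""
checks = conn.execute(query).fetchall()
conn.close()

for set_name, set_genotype, ref_genotype, stored, recomputed in checks:
    print(set_name, set_genotype, "vs", ref_genotype, stored, round(recomputed, 5))
```

To run it against the real file, replace ":memory:" with the path to rls.db and drop the CREATE TABLE and INSERT steps.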
Wednesday, December 24, 2014
toread, quality control of inner nuclear membrane proteins by the Asi complex
Science. 2014 Nov 7;346(6210):751-5. doi: 10.1126/science.1255638. Epub 2014 Sep 18.
Quality control of inner nuclear membrane proteins by the Asi complex.
http://www.ncbi.nlm.nih.gov/pubmed/25236469
comments
http://www.ncbi.nlm.nih.gov/pubmed/25315269
http://www.ncbi.nlm.nih.gov/pubmed/25378608
Tuesday, December 23, 2014
toread, power law network paper
http://www.ncbi.nlm.nih.gov/pubmed/25520244
big data sets, free
http://www.datasciencecentral.com/profiles/blogs/big-data-sets-available-for-free
toread, Distinguishing cause from effect using observational data: methods and benchmarks Joris M. Mooij, Jonas Peters, Dominik Janzing, Jakob Zscheischler, Bernhard Schölkopf
http://arxiv.org/abs/1412.3773
Monday, December 22, 2014
CITI training
I spent 70 minutes (9:30-10:40) on CITI training, with 93% final score (1 wrong due to mis-clicking)
CITI refresher course reading materials, URLs
- If you have not read the Belmont Report yet, please review this document and/or copy it for future reference. (Close the new browser window to return here.)
Links to Ethical Codes and Regulations of Human Subjects in Research.
- CLIA - Clinical Laboratory Improvement Amendments
- Title 21 Code of Federal Regulations (21 CFR Part 11) Electronic Records; Electronic Signatures.
svm project, pca() replaced by princomp() 20141222
updated file '040610.scmd.Ka.fitness.R'
The 'pca' package used in the old code 040610.scmd.Ka.fitness.R does not exist in R 3.x anymore. I switched to princomp() from the stats package that ships with base R.
TODO: check the predicted long-lived strains in the Kaeberlein database.
study notes on PCA, principal components, with R testing codes, princomp()
ResearchGate: Principal components are linear combinations of original variables x1, x2, etc. So when you do SVM on PCA decomposition you work with these combinations instead of original variables.
37:50 in Ng's video
https://www.youtube.com/watch?v=ey2PE5xi9-A
Ng shows how to use PCA (linear combinations of raw data) to reduce the dimension of data.
PCA is basically an orthogonal transformation.
http://en.wikipedia.org/wiki/Principal_component_analysis
"Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components."
"PCA can be done by eigenvalue decomposition of a data covariance (or correlation) matrix or singular value decomposition of a data matrix, usually after mean centering (and normalizing or using Z-scores) the data matrix for each attribute. The results of a PCA are usually discussed in terms of component scores, sometimes called factor scores (the transformed variable values corresponding to a particular data point), and loadings (the weight by which each standardized original variable should be multiplied to get the component score)."
R: princomp( )
pc.cr <- princomp(~ Murder + Assault + UrbanPop,
data = USArrests, na.action = na.exclude, cor = TRUE)
pc.cr$scores[1:5, ] #scores probably are PCA results, based on wikipedia entry
I can use examples of linear combination to verify my guess.
#######start of the R testing code and results #########
x1 = rnorm(100)
x2 = rnorm(100)
x3 = x1 + x2 + rnorm(100)/20
x4 = 2*x1 + rnorm(100)/20
X = data.frame(cbind(x1,x2,x3,x4))
pc <- princomp(X)
plot(pc)#only two major components, consistent
#######end of the R testing code and results #########
#######start of 2nd R testing code and results #########
set.seed(2014)
x1 = rnorm(100)
x2 = x1 + rnorm(100)/20
X = data.frame(cbind(x1,x2))
pc <- princomp(X, cor = TRUE)
head(pc)
pc$scores[,1] - (0.707*x1 + 0.707*x2) #does this approach zero?
summary( pc$scores[,1] - (0.707*x1 + 0.707*x2) + mean( 0.707*x1 + 0.707*x2 ) )
#good, it approaches zero
#Note: the match is only approximate, because cor = TRUE rescales each variable to unit variance, and the sign of a component is arbitrary.
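The 0.707 weights are not a coincidence. For two standardized variables with correlation r, the correlation matrix is [[1, r], [r, 1]], whose eigenvectors are (1, 1)/sqrt(2) and (1, -1)/sqrt(2), with eigenvalues 1 + r and 1 - r. Below is a minimal sketch in plain Python that checks this. It mimics the second R test but does not reproduce it exactly, since random.gauss draws different numbers than rnorm. The divisor n follows the note in the princomp help page that the covariance calculation uses divisor N.

```python
import math
import random

random.seed(2014)
n = 100
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [a + random.gauss(0, 1) / 20 for a in x1]   # x2 is x1 plus small noise

def standardize(x):
    """Center x and scale it to unit variance (divisor n)."""
    m = sum(x) / len(x)
    sd = math.sqrt(sum((a - m) ** 2 for a in x) / len(x))
    return [(a - m) / sd for a in x]

z1, z2 = standardize(x1), standardize(x2)
r = sum(a * b for a, b in zip(z1, z2)) / n        # correlation of x1 and x2

w = 1 / math.sqrt(2)                              # about 0.707
pc1 = [w * a + w * b for a, b in zip(z1, z2)]     # scores on component 1
pc2 = [w * a - w * b for a, b in zip(z1, z2)]     # scores on component 2

var_pc1 = sum(s * s for s in pc1) / n             # should equal 1 + r
var_pc2 = sum(s * s for s in pc2) / n             # should equal 1 - r
cov_pc12 = sum(a * b for a, b in zip(pc1, pc2)) / n   # should be zero

print(r, var_pc1, var_pc2, cov_pc12)
```

Because x2 is almost a copy of x1, r is close to 1, so nearly all of the variance falls on the first component and the two components are uncorrelated.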
Useful Unix / Linux shell commands
cat tmp.txt | sed s/CREATE/DROP/
who | cut -c1-8 | sort | uniq | nl
cat /usr/local/apache2/logs/access_log | grep 128\.135 | cut -c1-16 | uniq
ps -ef | grep nohup | cut -c53-57 | sort | uniq | nl
/sbin/shutdown -r now ?
lsof
/etc/rc.local # system startup configuration
grep CREATE ensembl_mart_16_1.sql | sed s/CREATE/DROP/ | sed s/\(/\;/ > $HOME/trim_mart.sql
ls enc.* | sed "s/^/\"/" | sed "s/$/\"\,/"
health disparity gene expression datasets, collection, data resources
Differential endothelial cell gene expression by African Americans versus Caucasian Americans: A possible contribution to health disparity in vascular disease and cancer
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE22688
http://www.ncbi.nlm.nih.gov/pubmed/21223544
PLoS One. 2008 Aug 6;3(8):e2847. doi: 10.1371/journal.pone.0002847.
Gene expression and functional studies of the optic nerve head astrocyte transcriptome from normal African Americans and Caucasian Americans donors.
Miao H1, Chen L, Riordan SM, Li W, Juarez S, Crabb AM, Lukas TJ, Du P, Lin SM, Wise A, Agapova OA, Yang P, Gu CC, Hernandez MR.
http://www.ncbi.nlm.nih.gov/pubmed/18716680
Genome Biol. 2008;9(7):R111. doi: 10.1186/gb-2008-9-7-r111. Epub 2008 Jul 9.
Susceptibility to glaucoma: differential comparison of the astrocyte transcriptome from glaucomatous African American and Caucasian American donors.
http://www.ncbi.nlm.nih.gov/pubmed/18613964
Physiol Genomics. 2011 Jul 14;43(13):836-43. doi: 10.1152/physiolgenomics.00243.2010. Epub 2011 Apr 26.
Gene expression variation between African Americans and whites is associated with coronary artery calcification: the multiethnic study of atherosclerosis.
http://www.ncbi.nlm.nih.gov/pubmed/21521779
review, subclinical coronary atherosclerosis, racial profiling is necessary
http://www.ncbi.nlm.nih.gov/pubmed/17070140
General Cardiovascular Risk Profile identifies advanced coronary artery calcium and is improved by family history: the multiethnic study of atherosclerosis.
http://www.ncbi.nlm.nih.gov/pubmed/20160201
J Transl Med. 2013 Oct 1;11:239. doi: 10.1186/1479-5876-11-239.
Quantitative proteomic analysis in HCV-induced HCC reveals sets of proteins with potential significance for racial disparity.
Dillon ST, Bhasin MK, Feng X, Koh DW, Daoud SS.
http://www.ncbi.nlm.nih.gov/pubmed/24283668
Exerc Sport Sci Rev. 2013 Jan;41(1):44-54. doi: 10.1097/JES.0b013e318279cbbd.
Are there race-dependent endothelial cell responses to exercise?
Brown MD1, Feairheller DL.
http://www.ncbi.nlm.nih.gov/pubmed/23262464
Sunday, December 21, 2014
Braunewell Bornholdt, 2007, Superstability of the yeast cell-cycle dynamics
[PB07JTB] J Theor Biol. 2007 Apr 21;245(4):638-43. Epub  2006 Nov 21. Superstability of the yeast cell-cycle dynamics: ensuring causality in the presence of biochemical stochasticity
In their 2009 JTB paper, the authors cited a measure of reliability from this 07JTB paper. I searched the entire paper for "reliability", but found only one hit, in the abstract. In the main text, the authors mentioned "stability of the systems under strong noise", termed the "stability criterion" (basically robustness or reliability). Based on its explanation, this is a rather context-specific criterion.
It seems that PB07 and PB09 are based on the Li04PNAS paper, a boolean network model on yeast cell cycle. 
Braunewell and Bornholdt, 2009, reliability of network
PB09JTB
reliability of attractors
investigate the interplay of topological structure and dynamical robustness.
boolean network dynamics
The reliability criterion was used to show the robustness of the yeast cell-cycle dynamics against timing perturbations (Braunewell and Bornholdt, 2007).
HBCU-PRIDE,
Help us get the word out about PRIDE by forwarding the below to your institution, colleagues, & organizations you may be a member of and posting on Linked In & Facebook.  We appreciate your assistance!
The PRIDE Summer Institute Programs to Increase Diversity Among Individuals Engaged in Health-Related Research are now accepting applications. Space is limited for the 2015 mentored summer training programs so Apply early!
Who: Eligible applicants are junior-level faculty or scientists from minority groups that are under-represented in the biomedical or health sciences, and are United States Citizens or Permanent Residents. Research interests should be compatible with those of the National Heart, Lung, and Blood Institute (NHLBI) in the prevention and treatment of heart, lung, blood, and sleep (HLBS) disorders. 
What: Seven unique Summer Institute programs with intensive mentored training opportunities to enhance the research skills and to promote the scientific and career development of trainees. Trainees will learn effective strategies for preparing, submitting and obtaining external funding for research purposes, including extensive tips on best practices. Research emphasis varies by program.
Where/When (Dates subject to change.  Verify on website): 
- Location: Arizona Health Sciences Center, University of Arizona, Tucson, Arizona
- PI: Joe G.N. “Skip” Garcia, MD; Francisco Moreno, MD
- Location: NYU Langone Medical Center, New York, New York
- PI: Girardin Jean-Louis, PhD
- Location: Washington University in St. Louis, St. Louis, Missouri
- PI: D.C. Rao, PhD; Victor Davila-Roman, MD
- Location: SUNY Downstate Medical College, New York, New York
- PI: Mohamed Boutjdir, PhD
- Location: Georgia Regents University, Augusta, Georgia
- PI: Betty Pace, MD
HBCU-PRIDE (June 21 – July 1, 2015)
- Location: University of Mississippi Medical Center, Jackson, Mississippi
- PI: Bettina M. Beech, DrPH, MPH; Keith C. Norris, MD, PhD
- Location: The UCSF Center for Vulnerable Populations at San Francisco General Hospital, San Francisco, California
- PI: Kirsten Bibbins-Domingo, PhD, MD, MAS; Alicia Fernandez, MD; Margaret Handley, PhD, MPH
Programs typically are all expenses paid including travel, meals, housing, and tuition. Contact the program of interest for details. Mentees can apply to more than one program, but may accept only one.
If you know of colleagues or program alumni at the junior faculty level who would benefit from this innovative research training and mentorship opportunity, we urge you to encourage them to Apply.
We would appreciate your help in getting the word out …
·         Forward this message to appropriate faculty advisors and colleagues.
·         Print and post the program flyer in a common location.
·         Encourage eligible junior faculty to consider this 
Thursday, December 18, 2014
NGS method, RNA seq
Method in Lei 2013, Gene, Diminishing returns in next-generation sequencing (NGS) transcriptome data.
The sequencing files downloaded from NCBI SRA database were initially converted from SRA format to FASTQ format using SRA toolkit (http://www.ncbi.nlm.nih.gov/Traces/sra/?view=software). Then, the raw data were filtered using the following criteria: (1) the number of unknown bases (N) was no more than two for each read; and (2) the fraction of low quality sites (Q < 5) was no more than 50% for each read. The data that passed this quality control were then used to map back to their respective genome sequences using bowtie2 (Langmead and Salzberg, 2012). Only uniquely mapped reads with no more than two mismatches were retained for further analysis. After mapping, the counts for each gene were summarized using HTSeq (http://www-huber.embl.de/users/anders/HTSeq/doc/overview.html). In the simulation, a predetermined-sized subset of reads was randomly selected from the original file. Using the same mapping procedure as mentioned above, the RPKM for each gene and depth of coverage were calculated and compared with those from original data. In-house Perl and R scripts were developed for data analysis and graphing (available upon request).
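The two read-filtering criteria and the random subsampling step can be sketched in a few lines. This is my own illustration in plain Python, not the authors' in-house Perl and R scripts. The function names, the toy reads, and the Phred+33 quality encoding are all assumptions.

```python
import random

def passes_quality_filter(sequence, quality, max_unknown=2, min_q=5,
                          max_low_fraction=0.5, phred_offset=33):
    """Return True if a read satisfies both filtering criteria.

    (1) at most `max_unknown` unknown bases (N);
    (2) at most `max_low_fraction` of sites with quality below `min_q`.
    phred_offset=33 is an assumption about the FASTQ encoding.
    """
    if sequence.upper().count("N") > max_unknown:
        return False
    if not quality:
        return False
    low = sum(1 for ch in quality if ord(ch) - phred_offset < min_q)
    return low / len(quality) <= max_low_fraction

def subsample(reads, size, seed=0):
    """Randomly select a predetermined-sized subset of reads without replacement."""
    rng = random.Random(seed)
    return rng.sample(reads, size)

# Toy reads as (sequence, quality string) pairs; "I" encodes Q40 and "#" encodes Q2.
reads = [
    ("ACGTACGT", "IIIIIIII"),   # clean read
    ("ACNNNCGT", "IIIIIIII"),   # three unknown bases
    ("ACGTACGT", "#####III"),   # 5 of 8 sites below Q5
    ("ACGTNCGT", "####IIII"),   # one N, exactly half of the sites below Q5
]
kept = [r for r in reads if passes_quality_filter(*r)]
subset = subsample(kept, 1)
print(len(kept), "reads kept;", "subsample:", subset)
```

The mapping (bowtie2) and counting (HTSeq) steps are outside this sketch. The subsample would be passed through the same mapping procedure as the full data.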
Wednesday, December 17, 2014
convert read-only pdf to modifiable pdf
I saved pdf/A file to postscript.
I then "ps2pdf" and generated a new pdf, which can be annotated.
List of student summer program, 2015 summer
CDC,
http://www.kennedykrieger.org/professional-training/professional-training-programs/rise-programs/mchc-rise-up
HBCU PRIDE
http://hongqinlab.blogspot.com/2014/12/hbcu-pride.html
FHCRC
http://www.fhcrc.org/en/
toread, pan-cancer network, somatic mutations
Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes, nature genetics, 2014
reciprocity and power-law network, TOREAD
To read.
http://www.nature.com/srep/2014/141212/srep07460/pdf/srep07460.pdf
This paper is related to my work on network aging and network configuration.
Tuesday, December 16, 2014
Cancer, RNA binding protein, proliferation state
Proteins drive cancer cells to change states
When RNA-binding proteins are turned on, cancer cells get locked in a proliferative state.
Monday, December 15, 2014
How children learn math
http://www.dailymail.co.uk/sciencetech/article-2727268/Peek-brain-shows-kids-learn-math-skills.html
http://www.education.com/reference/article/how-children-learn-mathematics/
Liu & Chen, 2012, Protein Cell, Proteome-wide prediction of protein-protein interactions from high-throughput data.
Protein Cell. 2012 Jul;3(7):508-20. doi: 10.1007/s13238-012-2945-1. Epub  2012 Jun 22.
Proteome-wide prediction of protein-protein interactions from high-throughput data.
Good Review on protein/gene network study
Sunday, December 14, 2014
BIO233 final grade calculation
#This is file "gradebio233,20141214.R"
require(xlsx)
rm(list=ls())
list.files()
# tb = read.csv("201409-62376-01BIO233 Grades 20141209c.csv")
tb = read.csv("201409-62376-01BIO233 Grades 20141214-a.csv")
#The - signs have to be replaced with zeros in textwrangler
empty.columns= NULL
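for (j in 8:ncol(tb)){
#for( i in 1:length(tb[,1])){
# if( tb[i,j]=='-') {tb[i,j]=NA }
#}
tb[,j] = as.numeric( as.character(tb[,j]) ) #as.character() guards against factor columns being converted to level codes
tb[is.na(tb[,j]),j] = 0
if( max(tb[,j])==0 ) { empty.columns = c(empty.columns, j)}
}
str(tb)
#tb[, -NULL] would select no columns, so only drop columns when some are empty
tb2 = if (length(empty.columns) > 0) tb[, - empty.columns] else tb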
#tb2 = tb2[, -"Course.total"]
#tb2 = tb2[, - grep('Spring', names(tb2))]
#tb2 = tb2[, - grep('spring', names(tb2))]
tb2 = tb2[, -grep("Quiz.Retake..Fall.2014.Exam.1..part.2..online.part", names(tb2))]
names(tb2)
examColumns = names(tb)[grep("xam", names(tb))]
exam1 = c( "Quiz.Exam1.Part1..Fall.2014" ,
"Assignment.Exam1..part2..calculation.questions..Fall.2014" ,
"Quiz.Fall.2014.Exam.1..part.2..online.part"
)
exam2= c("Quiz.Exam.2..closed.book.section..Fall.2014..Thursday",
"Quiz.Exam2..open.book.section..Fall.2014..Tuesday")
exam3=c("Quiz.Exam3..closed.book.section..Nov.20..2014",
"Quiz.Exam.3..open.book.section..Fall.2014" )
final = c( "Quiz.Final.Exam..Open.book.section..Fall2014..Dec.9..11am.13.00",
"Quiz.Closed.book.section.of.final.exam..Dec.9..2014..10.30am.12.30pm")
tb2[,final]
names(tb)[grep("inal", names(tb))]
report= tb2[,1:2]
report$Exam1 = apply( tb2[,exam1], 1, sum)
report$Exam2 = apply( tb2[,exam2], 1, sum)
report$Exam3 = apply( tb2[,exam3], 1, sum)
report$Final = apply( tb2[,final], 1, sum)
practical = names(tb2)[grep("ractical", names(tb2))]
report$ToTpractical = (tb2[,"Assignment.Practical.Exam..microscope.and.morphology..Sep.29..2014"]/10
+ tb2[,"Assignment.Streak.plate..practical.exam"])/4
### do find out assignments and chapter quiz
#the scale lab report was posted twice
scale = c("Quiz.Lab.assignment..Scale.of.Microbes",
"Quiz.scale.of.microbes..lab.report")
report$scale= apply( tb2[, scale],1, max)
# chapter homework can be found with "Quiz" or "Chapter". The names should be consistent!!!
names(tb2)[grep("Chapter", names(tb2))]
report$ch1 = tb2[, grep("Chapter.1",names(tb2))]
report$ch2 = tb2[, grep("Chapter\\.2",names(tb2))] ##.2 can match 32
report$ch3= apply( tb2[, grep("Chapter.3",names(tb2))],1, max)
report$ch4= apply( tb2[, grep("Chapter.4",names(tb2))],1, max)
report$ch5= apply( tb2[, grep("Chapter.5",names(tb2))],1, max)
report$ch6= apply( tb2[, grep("Chapter.6",names(tb2))],1, max) #was assigned to ch5 again, which overwrote the Chapter 5 scores
report$ch7= apply( tb2[, grep("Chapter.7",names(tb2))],1, max)
report$ch8= apply( tb2[, grep("Chapter.8",names(tb2))],1, max)
report$ch9= apply( tb2[, grep("Chapter9",names(tb2))],1, max)
report$ch10= apply( tb2[, grep("Chapter10",names(tb2))],1, max)
report$ch16= apply( tb2[, grep("Chapter16",names(tb2))],1, max)
report$ch32= apply( tb2[, grep("Chapter32",names(tb2))],1, max)
#misc assignment and lab reports, which can be quiz or assignments
names(tb2)[grep("ment", names(tb2))]
misc= c( "Assignment.Serial.dilution.lab.group.report" ,
"Quiz.DePaepeTaddei.Reading.Assignment" ,
"Assignment.Pictures.for.microbes.on.campus.by.groups" ,
"Quiz.Lab.assignment..Scale.of.Microbes" ,
"Quiz.Lab.assignment..E.coli.genome.studies" ,
"Assignment.Report.for.Gram.stain.lab..individual.report." ,
"Assignment.Homework.for.Dr..Wenzhi.Li.s.lecture..Individual.effort.",
"Assignment.homework.on.circulating.tumor.DNA" )
tb2[, "Assignment.homework.on.circulating.tumor.DNA" ] = tb2[, "Assignment.homework.on.circulating.tumor.DNA" ]/10
tb2[, "Assignment.Report.for.Gram.stain.lab..individual.report."] =tb2[, "Assignment.Report.for.Gram.stain.lab..individual.report."]/10
tb2[1:5, misc]
report$misc= apply( tb2[, misc],1, sum)
assignAndLab =c("scale","misc","ch1","ch2","ch3","ch4","ch5","ch6","ch7","ch8", "ch9","ch10","ch16","ch32")
report$ToTassignAndLab = apply( report[,assignAndLab], 1, sum)
maxS = apply( report[, assignAndLab], 2, max)
report$ToTassignAndLab = 15*report$ToTassignAndLab / sum(maxS)
## end of assignment and lab reports
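# attendance
list.files()
att.tb= read.csv( "201409-62376-01BIO233_Attendances_2014129-1734.csv")
att.tb$ToTAttendence = apply( att.tb[, 6:33], 1, sum)
str(att.tb)
hist(att.tb$ToTAttendence, breaks=20)
# match() returns the first hit only, so this assumes last names are unique in the class
report$ToTAttendence = att.tb$ToTAttendence[match(report$Last.name, att.tb$Last.name)]
# rescale attendance to 5 points; na.rm guards against students missing from the attendance file
report$ToTAttendence = report$ToTAttendence*5/ max(report$ToTAttendence, na.rm=TRUE)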
# take the best 2 regular exams plus the final: sum all four, drop the lowest regular exam, divide by 3
report$badExam = apply(report[,c("Exam1","Exam2", "Exam3")], 1, min)
report$ExamTot = (report$Exam1 + report$Exam2 + report$Exam3 + report$Final - report$badExam) / 3
head(report)
# bonus points, need to add R bonus points
names(tb2)[grep("onus", names(tb2))]
bonus = c("Assignment.Bonus.points.of.paper.presentations.and.volunteering" ,
"Assignment.Bonus.Problem.1..Flow.cytometer.data.analysis.1" ,
"Assignment.Bonus.problem.2..Cholera.data.simulation.in.R.1" )
report$bonus = apply( tb2[,bonus], 1, sum)
# oral
report$oral = tb2[,"Assignment.Oral.presentation.grades..fall.2014"]
#written report
report$written = tb2$WrittenReport
FinalGrades= c("ExamTot","ToTpractical","ToTassignAndLab", "ToTAttendence", 'bonus', 'oral', "written")
report[,FinalGrades]
report$FinalGrade= apply(report[,FinalGrades], 1, sum)
hist(report$FinalGrade, breaks=20)
# map a single numeric score to a letter grade; each cut point is an exclusive lower bound
grade2letter = function(x){
  if (x > 94) { ret = 'A'
  } else if (x > 90) { ret = 'A-'
  } else if (x > 87) { ret = 'B+'
  } else if (x > 84) { ret = 'B'
  } else if (x > 80) { ret = 'B-'
  } else if (x > 76) { ret = 'C+'
  } else if (x > 70) { ret = 'C'
  } else if (x > 67) { ret = 'C-'
  } else if (x > 64) { ret = 'D+'
  } else if (x > 60) { ret = 'D'
  } else { ret = 'F'
  }
  return (ret)
}
grade2letter(70); grade2letter(88)
# sapply gives a plain character vector; lapply would make report$letter a list column
report$letter = sapply(report$FinalGrade, grade2letter)
write.xlsx(report, "bio233FinalGradesFall20141214-a.xlsx")
#generate a sorted report
report.sorted = report[order(report$FinalGrade),]
write.xlsx(report.sorted, "bio233FinalGradesFall20141214-a-sorted.xlsx")
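The letter-grade step above calls grade2letter once per student. As a minimal sketch, the same cut points can also be applied to the whole column in one call with cut(). The name grade2letter_vec and the scores are made up for illustration and are not part of the grading script; the cut points are copied from grade2letter.

```r
# Hypothetical vectorized version of grade2letter; cut points copied from the function above.
grade2letter_vec <- function(x) {
  cuts <- c(-Inf, 60, 64, 67, 70, 76, 80, 84, 87, 90, 94, Inf)
  labs <- c("F", "D", "D+", "C-", "C", "C+", "B-", "B", "B+", "A-", "A")
  # right = TRUE makes each interval (lower, upper], matching the strict ">" tests
  as.character(cut(x, breaks = cuts, labels = labs, right = TRUE))
}

# Made-up scores, including boundary values and a missing score
scores <- c(59.9, 60, 70, 88, 94, 94.1, NA)
letters_out <- grade2letter_vec(scores)
print(letters_out)
```

Unlike the if/else version, this sketch returns NA for a missing score instead of stopping with an error, so it may be worth checking for NA letters before exporting.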