This site is to serve as my note-book and to effectively communicate with my students and collaborators. Every now and then, a blog may be of interest to other researchers or teachers. Views in this blog are my own. All rights of research results and findings on this blog are reserved. See also http://youtube.com/c/hongqin @hongqin
Wednesday, November 27, 2013
Error propagation
http://www.fas.harvard.edu/~scphys/nsta/error_propagation.pdf
http://www.rit.edu/~w-uphysi/uncertainties/Uncertaintiespart2.html
Monday, November 25, 2013
Cb/Cv H2O2-LOH choices
There are two ways to average Cv/Cb: First is to calculate the ratio in each experiment and then average by experiments. Second is to average Cv and Cb by experiment and then calculate the ratio. The second method seems to generate large variance and is therefore more prone to experimental errors.
Because some strains are extremely sensitive to H2O2 so their Cv is very small, which leads to large fluctuations. So, Cv/Cb is more stable, and this is my best choice from the practical perspective.
Because some strains are extremely sensitive to H2O2 so their Cv is very small, which leads to large fluctuations. So, Cv/Cb is more stable, and this is my best choice from the practical perspective.
Friday, November 22, 2013
Renew network connections on Windows
cmd
ipconfig /renew
ipconfig /renew
Thursday, November 21, 2013
Friday, November 15, 2013
Multicollinearity, correlated predictors in regression
Multicollinearity does not influence the prediction of response (y), but it affect the interpretation of individual predictors.
http://en.wikipedia.org/wiki/Multicollinearity
https://onlinecourses.science.psu.edu/stat501/node/82
*** How to generated correlated random numbers with specifed R-squared
R^2 = 1 - SSres / SStot
SSres = sum of (y_obs - fitted)^2
SStot = sum of (y_obs - y_mean)^2
For y = rho * x1 + sqrt(1-rho^2)*x2
rho^2 = 1- (y - x1*rho)^2 / (x2^2)
For standardized random numbers
x1 = rnorm(100)
x2 = rnorm(100)
rho = 0.5
y = rho * x1 + sqrt(1-rho^2)*x2
http://www.sitmo.com/article/generating-correlated-random-numbers/
> set.seed(2014)
> N=500
> x = rnorm(N)
> error = rnorm(N)
> rho = sqrt(0.5)
> y= rho*x + sqrt(1-rho^2)*error #rho is the slope
> summary(lm(y ~ x ))
Call:
lm(formula = y ~ x)
Residuals:
Min 1Q Median 3Q Max
-1.85255 -0.47818 0.04374 0.46947 2.17727
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.00915 0.03101 -0.295 0.768
x 0.69389 0.03165 21.921 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.6915 on 498 degrees of freedom
Multiple R-squared: 0.4911, Adjusted R-squared: 0.49
F-statistic: 480.5 on 1 and 498 DF, p-value: < 2.2e-16
Generate non-standardized random numbers
> set.seed(2014)
> x=rnorm(1000)*4+2
> error = rnorm(1000)
> rho=sqrt(0.5)
> y = rho*(x-2)/4 + sqrt(1-rho^2)*error
> summary(lm(y~x))
Call:
lm(formula = y ~ x)
Residuals:
Min 1Q Median 3Q Max
-2.1879 -0.4414 -0.0138 0.4289 2.7802
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.37723 0.02428 -15.53 <2e-16 ***
x 0.18475 0.00546 33.83 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.6751 on 998 degrees of freedom
Multiple R-squared: 0.5342, Adjusted R-squared: 0.5338
F-statistic: 1145 on 1 and 998 DF, p-value: < 2.2e-16
>
> y2 = y*4+2*rho
> summary(lm(y2~x))
Call:
lm(formula = y2 ~ x)
Residuals:
Min 1Q Median 3Q Max
-8.7515 -1.7656 -0.0552 1.7155 11.1207
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.09471 0.09713 -0.975 0.33
x 0.73899 0.02184 33.834 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.701 on 998 degrees of freedom
Multiple R-squared: 0.5342, Adjusted R-squared: 0.5338
F-statistic: 1145 on 1 and 998 DF, p-value: < 2.2e-16
Labels:
***,
R,
random numbers,
Rsquared,
star,
statistics
AIC ( preferred model is the one with the lowest AIC value)
Akaike information criterion
AIC = 2k - 2ln(L)
So, the preferred model is the one with the lowest AIC value.
See example at
http://www.r-bloggers.com/linear-regression-with-correlated-data/
From wikipedia,
Correlated regression, Langley example, Faraway
Correlated errors
g <- gls( Employed ~ GNP + Population, correlation=corAR1(form=~Year), data=longley)
Thursday, November 14, 2013
Regression with correlated variance, from R blogger
Regression with correlated variance/error
http://www.r-bloggers.com/linear-regression-with-correlated-data/
http://www.quantumforest.com/wp-content/uploads/2011/10/unemployment.txt
# Loading packages
library(lattice) # Fancy graphics
library(nlme) # Generalized linear mixed models
# Reading data
setwd('~/Dropbox/quantumforest') # Sets default working directory
un = read.csv('nzunemployment.csv', header = TRUE)
# Plotting first youth data and then adding adults
# as time series
with(un,
{
plot(youth ~ q, type = 'l', ylim = c(0,30), col='red',
xlab = 'Quarter', ylab = 'Percentage unemployment')
lines(q, adult, lty=2, col='blue')
legend('topleft', c('Youth', 'Adult'), lty=c(1, 2), col=c('red', 'blue'))
abline(v = 90)
})
# Creating minimum wage policy factor
un$minwage = factor(ifelse(un$q < 90, 'Different', 'Equal'))
# And a scatterplot
xyplot(youth ~ adult, group=minwage, data = un,
type=c('p', 'r'), auto.key = TRUE)
# Linear regression accounting for change of policy
mod1 = lm(youth ~ adult*minwage, data = un)
summary(mod1)
# Centering continuous predictor
un$cadult = with(un, adult - mean(adult))
mod2 = lm(youth ~ cadult*minwage, data = un)
summary(mod2)
plot(mod2) # Plots residuals for the model fit
acf(mod2$res) # Plots autocorrelation of the residuals
# Now we move to use nlme
# gls() is an nlme function when there are no random effects
mod3 = gls(youth ~ cadult*minwage, data = un)
summary(mod3)
# Adding autocorrelation
mod4 = gls(youth ~ cadult*minwage, correlation = corAR1(form=~1), data = un)
summary(mod4)
Student Jiang, 2013Nov 14
Last time we discussed
problems in retrieving genes associated with SNPs that have OMIM IDs.
One of the problem is that some OMIM entries in the OmimVarLocusIdSNP
table do not have genes listed. I just tried searching the genome
database at UCSC using the SNP_id, and got the corresponding gene names.
For example, "188890" is associated with 11 SNPs and the query results are:
rs686 chr5 DRD1
rs4532 chr5 DRD1
rs921451 chr7 DDC
rs1051730 chr15 CHRNA3
rs1451371 chr7 DDC
rs2060762 chr7 DDC
rs3733829 chr19 EGLN2
rs3733829 chr19 RAB4B-EGLN2
rs3735273 chr7 DDC
rs3757472 chr7 DDC
rs4105144 chr19 N/A
rs6474412 chr8 N/A
rs686 chr5 DRD1
rs4532 chr5 DRD1
rs921451 chr7 DDC
rs1051730 chr15 CHRNA3
rs1451371 chr7 DDC
rs2060762 chr7 DDC
rs3733829 chr19 EGLN2
rs3733829 chr19 RAB4B-EGLN2
rs3735273 chr7 DDC
rs3757472 chr7 DDC
rs4105144 chr19 N/A
rs6474412 chr8 N/A
We
got DRD1, DDC, CHRNA3, and EGLN2 (I assume EGLN2 and RAB4B-EGLN2 are
the same), and there are no gene returned for SNP rs4105144 and
rs6474412.
However, on the webpage of "188890" (http://omim.org/entry/188890):
1. The table "Phenotype Gene Relationships" listed several genes/locus, which do not match with the above results.
2.
In the "Mapping" section, rs4105144 and rs6474412 are mentioned, and
are associated with CYP2A6, CYP2B6, CHRNB3 and CHRNA6, but I got no
results when search on UCSC genome database.
"These 2 loci include genes involved in nicotine metabolism, such as CYP2A6 (122720) and CYP2B6 (123930) on 19q13, and nicotinic acetylcholine receptor subunits on 8p11 (CHRNB3; 118508) and CHRNA6 (606888)."
"These 2 loci include genes involved in nicotine metabolism, such as CYP2A6 (122720) and CYP2B6 (123930) on 19q13, and nicotinic acetylcholine receptor subunits on 8p11 (CHRNB3; 118508) and CHRNA6 (606888)."
In this section, EGLN2 gene is also mentioned.
3. In the "Molecular Genetics" section, several genes are described,
Glutamate Transporters: SLC1A2, SLC17A6 and SLC17A7
Dopamine Transporter/Receptors: SLC6A3, DRD2 and DRD1
DOPA Decarboxylase: DDC
Cholinergic Receptors: CHRNA4, CHRNB2, CHRNA2, CHRNB4, NTRK2, CHRNA3, CHRNA5
G Protein-Coupled Receptor-51: GABABR2
Serotonin Transporter: SLC6A4
Taste Receptor Polymorphisms: TAS2R38
Glutamate Transporters: SLC1A2, SLC17A6 and SLC17A7
Dopamine Transporter/Receptors: SLC6A3, DRD2 and DRD1
DOPA Decarboxylase: DDC
Cholinergic Receptors: CHRNA4, CHRNB2, CHRNA2, CHRNB4, NTRK2, CHRNA3, CHRNA5
G Protein-Coupled Receptor-51: GABABR2
Serotonin Transporter: SLC6A4
Taste Receptor Polymorphisms: TAS2R38
I think I am lost on which gene we should choose in our analysis.
Thanks,
Saturday, November 9, 2013
Genolevures, notes
Genolevures and the Saccharomycodtina
Gene orders are apparently randomly shuffled in yeast genomes.
Gene orders are apparently randomly shuffled in yeast genomes.
Friday, November 8, 2013
Sagi, Wolf, Koonin 2012, PlosComBiol, Universal pacemakers of genome evolution
SWK12 focus on 6901 orthologous gene families in 41 archaeal and 59 bacterial genomes, provided in supporting text.
So, if the SWK12 molecular pacemakers occur in yeast, they are very conserved genes. Maybe, I could simulate the aging of the evolutionary core network, and then study how the recent network component change the aging dynamics of the core network. This is a way to see how evolution influence network reliability, i.e., network robustness.
So, if the SWK12 molecular pacemakers occur in yeast, they are very conserved genes. Maybe, I could simulate the aging of the evolutionary core network, and then study how the recent network component change the aging dynamics of the core network. This is a way to see how evolution influence network reliability, i.e., network robustness.
Thursday, November 7, 2013
Wednesday, November 6, 2013
Wilke & Drummond, 2010, protein biophysics and coding sequence evolution
WD10 list a few factors of protein biophysics that can influence dN dS (exluding fuction)
-> Protein structure, solvent exposure, disorder regions (alpha and beta secondary structure are not important)
-> translational selection.
Synonymous codons can alter ribosomal speed and influence protein folding.
-> protein-protein interactions
-> Protein structure, solvent exposure, disorder regions (alpha and beta secondary structure are not important)
-> translational selection.
Synonymous codons can alter ribosomal speed and influence protein folding.
-> protein-protein interactions
Austad 1989, aging of bowl and doily spider
LIFE EXTENSION BY DIETARY RESTRICTION IN THE BOWL AND
DOILY SPIDER, FRONTINELLA PYRAMITELA
Steven N. Austad
1989 Exp Gerontology.
Survival curve of the bowl and doily spider is exponential in the wild, indicating that sth is kiling the spiders regardless of ages. This is commented as non-aging characteristic.
SNPedia, openSNP
http://snpedia.com/index.php/Bulk
A simple table from SNPedia:
http://www.snpedia.com/files/gbrowse/SNPedia.gff
https://opensnp.org/
https://opensnp.org/
OmimVarLocusIdSNP table
Some do not have their gene names listed. For example, the first entry "188890" links to RS_686, but there is no gene name listed. I searched on OMIM. (http://omim.org/entry/188890?search=188890&highlight=188890) Looks like this OMIM ID has 5 phenotypes and links to 4 different genes. So, for those that gene names are not listed, what should we do?
Hmm, it seems OMIM record can be different from other database:
http://snpedia.com/index.php/Rs686
RS686 is clearly linked to DRD1 gene on SNPedia.
Student's and my notes, dbSNP project
I think we are trying to retrieve from dbSNP XML files those SNPs that have clinical significance, i.e. disease-associated SNPs. But I found out that, there are no attributes for "clinical significance", or any OMIM info, in the dbSNP XML files.
Here is what I did:1. In the following site: http://www.ncbi.nlm.nih.gov/snp/limits, choose "22" in "Chromosomes" and "OMIM" in "Annotation", and then search. There are 266 results returned. I take RS_4633 as the example.
I could not found anything related to OMIM or clinical significance in the extracted info.
On http://www.ncbi.nlm.nih.gov/books/NBK44379/#Search.Finding_Records_with_OMIM_DataLin, it says:
"What dbSNP report format will provide both a SNP and its specific OMIM ID number?
Columns of OmimVarLocusIdSNP table are:
ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/database/organism_data/OmimVarLocusIdSNP.bcp.gz
http://www.ncbi.nlm.nih.gov/projects/SNP/docs/rs_attributes.html#clinical
On http://www.ncbi.nlm.nih.gov/books/NBK44379/#Search.Finding_Records_with_OMIM_DataLin, it says:
"What dbSNP report format will provide both a SNP and its specific OMIM ID number?
The easiest way you can get all the SNPs with OMIM links is from Entrez SNP:
- 1.Go to the Entrez SNP site.
- 2.Click on the grey “Limits” tab near the top of the page (just beneath the text search boxes).
- 3.Select the organism you are interested in from the organism list located at the top left of the page.
- 4.Scroll down the page almost to the bottom, until you find the list of “Annotation” limits. Select “OMIM”
- 5.Press the “Go” button located at the top of the page next to the empty text search box, and you will receive a list of your organism’s SNPs with OMIM annotation.
You can also get those SNPs with an OMIM ID number by downloading from the dbSNP FTP site: the OmimVarLocusIdSNP table
contains the information you need for your organisim of interest
(human, in this case). This table is located in your organism’s
organism_data directory on the dbSNP FTP site."
Column definitions for this table are as follows:
ColumnDescription
1omim_id.
2The locus id the SNP is on
3 omim variation id.
4locus symbol
5Amino acid using the contig reference allele.
6Amino acid position in the protein.
7Amino acid of the snp variance.
8var class (used for internal dbSNP processing)
9snp_id (rs#)
Columns of OmimVarLocusIdSNP table are:
1omim_id.
2The locus id the SNP is on
3 omim variation id.
4locus symbol
5Amino acid using the contig reference allele.
6Amino acid position in the protein.
7Amino acid of the snp variance.
8var class (used for internal dbSNP processing)
9snp_id (rs#)
The last column should be the SNP rs ID. The 4th column seems to be gene name which may be mapped
directly into the human protein network.
URL:
http://www.ncbi.nlm.nih.gov/books/NBK44379/#Search.Finding_Records_with_OMIM_DataLin
How to automatically forward emails to other accounts in Lotus Notes
Here are my steps to set up automatic forwarding in LotusNotes. The steps can be slightly different in different versions of Lotus Notes. The basic idea is to set up a Mail Rule that can work with most emails.
In Lotus Notes, go to the Mail window and click "More".
Then click "Mail Rules" in the drop-down menu.
Choose "does not contain" for this rule,
Type some weird letters to make sure they are unlikely to be in most emails, and then click "Add"
For the action part, choose "send copy to",
Enter the email address that you want to forward and then click "add action",
You should now see the rule in the window beneath. Click "OK" to complete this procedure.
Now, you should this active rule in the mail rule window.
You can now send yourself an email to test wether this mail rule works or not.
Note: Every now and then, when Lotus Note are updated, mail rules can stop working because the versions between your account and the server are not the same. You should ask your systems administrator to update your LotusNote configurations.
Monday, November 4, 2013
AUC library useful tips
AUC Robert Woodruff Library
Inter library loan site
https://auctr.illiad.oclc.org/illiad/logon.html
Remote Access
https://accounts.auctr.edu/EZproxy/Main/rwwl.asp
Electronic JSTOR link:
https://login.ezproxy.auctr.edu:2050/login?url=http://www.jstor.org/jstor
Inter library loan site
https://auctr.illiad.oclc.org/illiad/logon.html
Remote Access
https://accounts.auctr.edu/EZproxy/Main/rwwl.asp
Electronic JSTOR link:
https://login.ezproxy.auctr.edu:2050/login?url=http://www.jstor.org/jstor
Subscribe to:
Posts (Atom)