Q: 0 means no link, but a small value means a very close link.
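In igraph, a directed adjacency matrix is read from row to column: a nonzero A[i, j] is an edge from vertex i to vertex j. For example, a matrix whose first column has nonzero entries in rows 2 and 3 is drawn with arrows from the 2nd and 3rd nodes to the 1st.
In Yuan et al.'s exact network controllability paper, a_ij is the link from node j to node i, i.e., from column to row. So their matrix is the transpose of the igraph adjacency matrix.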
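The two conventions differ only by a transpose. Below is a minimal base-R sketch that lists the edges a matrix encodes under each reading, using a made-up 3-node matrix. `edges_from_adjacency` and the convention labels are hypothetical names, not igraph functions. Any nonzero entry counts as a link, so 0 means no link.

```r
# Hypothetical helper: list the directed edges encoded by a weighted
# adjacency matrix A under either reading convention.
#   "row_to_col": A[i, j] != 0 is an edge i -> j
#   "col_to_row": A[i, j] != 0 is an edge j -> i
edges_from_adjacency <- function(A, convention = c("row_to_col", "col_to_row")) {
  convention <- match.arg(convention)
  idx <- which(A != 0, arr.ind = TRUE)   # integer matrix with columns "row", "col"
  if (convention == "row_to_col") {
    from <- idx[, "row"]; to <- idx[, "col"]
  } else {
    from <- idx[, "col"]; to <- idx[, "row"]
  }
  out <- data.frame(from = unname(from), to = unname(to), weight = A[idx])
  out <- out[order(out$from, out$to), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Made-up example: column 1 has nonzero entries in rows 2 and 3.
A <- matrix(0, 3, 3)
A[2, 1] <- 0.5
A[3, 1] <- 2
e1 <- edges_from_adjacency(A, "row_to_col")     # edges 2 -> 1 and 3 -> 1
e2 <- edges_from_adjacency(t(A), "col_to_row")  # the same edges, read from t(A)
identical(e1, e2)
```

Reading A under one convention gives the same edge list as reading t(A) under the other, which is why converting between the two only needs `t()`.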
This site serves as my notebook and as a way to communicate effectively with my students and collaborators. Every now and then, a post may be of interest to other researchers or teachers. Views in this blog are my own. All rights to research results and findings on this blog are reserved. See also http://youtube.com/c/hongqin @hongqin
Showing posts with label network.
Friday, October 11, 2019
Tuesday, November 15, 2016
integrating gene expression and network, a reference collection
Convert the p-values of differential expression into Z-scores using the inverse Gaussian (normal) CDF.
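A one-function sketch in R, assuming one-sided p-values as in Ideker02, where z_i = Φ^-1(1 - p_i). `p_to_z` is a hypothetical name. Using `lower.tail = FALSE` gives the same value as `qnorm(1 - p)` but stays accurate for very small p, where `1 - p` would round to 1.

```r
# One-sided p-value -> Z-score via the inverse standard normal CDF:
# z = qnorm(1 - p), so small p-values map to large positive z.
p_to_z <- function(p) {
  qnorm(p, lower.tail = FALSE)
}

p <- c(0.5, 0.05, 0.001)
p_to_z(p)
```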
Maybe because Ideker02 looks for 'active subnetworks', only positive Z-scores were used? No: both positive and negative Z-scores were calculated.
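My understanding of Ideker02's subnetwork score: gene Z-scores are aggregated as z_A = sum(z_i) / sqrt(k) for a subnetwork of k genes, then corrected against random gene sets of the same size. Below is a minimal base-R sketch under that assumption, with made-up Z-scores. The size correction is a simple Monte Carlo estimate, and the function names are hypothetical. Negative Z-scores enter the sum, so they pull a subnetwork's score down.

```r
# Aggregate Z-score of a subnetwork with k genes: z_A = sum(z_i) / sqrt(k).
subnet_z <- function(z, genes) sum(z[genes]) / sqrt(length(genes))

# Size-corrected score: standardise z_A against random gene sets of the
# same size (Monte Carlo estimate of their mean and sd).
subnet_score <- function(z, genes, n_rand = 1000) {
  k <- length(genes)
  rand <- replicate(n_rand, subnet_z(z, sample(names(z), k)))
  (subnet_z(z, genes) - mean(rand)) / sd(rand)
}

set.seed(1)
z <- setNames(rnorm(200), paste0("g", 1:200))  # made-up gene-level Z-scores
z[c("g1", "g2", "g3")] <- c(3, 2.5, 2)         # a hypothetical "active" module
subnet_z(z, c("g1", "g2", "g3"))               # (3 + 2.5 + 2) / sqrt(3)
s <- subnet_score(z, c("g1", "g2", "g3"))
s
```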
Ideker02 seems to combine K-means and simulated annealing for network clustering.
Tornow,S. and Mewes,H.W. (2003) Functional modules by relating protein interaction networks and gene expression. Nucleic Acids Res., 31, 6283–6289.
Segal,E. et al. (2003) Discovering molecular pathways from protein interaction and gene expression data. Bioinformatics, 19, 264–272.
Morrison,J.L. et al. (2005) GeneRank: using search engine technology for the analysis of microarray experiments. BMC Bioinformatics, 6, 233.
Ma,X. et al. (2007) CGI: a new approach for prioritizing genes by combining gene expression and protein–protein interaction data. Bioinformatics, 23, 215–221.
Integrating gene expression and protein-protein interaction network to prioritize cancer-associated genes, Chao Wu, Jun Zhu and Xuegong Zhang
http://www.biomedcentral.com/1471-2105/13/182
http://scholar.google.com/scholar?cites=14200881095439672925&as_sdt=5,43&sciodt=0,43&hl=en
Li et al. BMC Medical Genomics 2014, 7(Suppl 2):S4 Prediction of disease-related genes based on weighted tissue-specific networks by using DNA methylation
http://www.biomedcentral.com/1755-8794/7/S2/S4
http://hongqinlab.blogspot.com/2014/12/li14-bmc-medical-genomics-predict.html
http://hongqinlab.blogspot.com/2014/12/integrating-gene-expression-data-into.html
http://hongqinlab.blogspot.com/2014/03/build-human-gene-network-reliability.html
From Ma, 2007 Bioinformatics CGI paper:
Gene expression data and protein interaction data have been integrated for gene function prediction. For example, Ideker et al. (2002) used protein interaction data and gene expression data to screen for differentially expressed subnetworks between different conditions. In Tornow and Mewes (2003) and Segal et al. (2003), gene expression data and protein interactions are used to group genes into functional modules. These methods provide insights into the regulatory modules of the whole networks at the systems biology level. However, it is not clear how to adapt their methods to identify genes contributing to the phenotype of interest. Morrison et al. (2005) adapted the Google search engine to prioritize genes for a phenotype by integrating gene expression profiles and protein interaction data. However, the algorithm ignores the information from proteins linked to the target protein through other intermediate proteins, referred to in the rest of this paper as indirect neighbors.
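Morrison et al.'s GeneRank is a PageRank-style score. As I understand their formulation, r = (1 - d) ex + d W^T D^-1 r. Here W is the 0/1 interaction matrix, D is the diagonal matrix of degrees (at least 1), ex is the normalized expression evidence, and d is a damping factor. Below is a minimal base-R sketch that solves this linear system directly. The toy network, ex, and d = 0.5 are made up, and `generank` is a hypothetical name. With d = 0 the ranking reduces to expression alone; a larger d lets genes inherit evidence from the network.

```r
# GeneRank-style scores: solve (I - d * W^T D^-1) r = (1 - d) * ex.
#   W  : symmetric 0/1 interaction matrix with gene names as dimnames
#   ex : non-negative expression evidence (e.g. |log fold change|)
#   d  : damping factor, 0 <= d < 1
generank <- function(W, ex, d = 0.5) {
  deg <- pmax(rowSums(W), 1)                       # avoid 1/0 for isolated genes
  M <- t(W) %*% diag(1 / deg, nrow = length(deg))  # M[i, j] = W[j, i] / deg_j
  r <- solve(diag(nrow(W)) - d * M, (1 - d) * ex)
  setNames(as.vector(r), rownames(W))
}

genes <- c("g1", "g2", "g3", "g4")
W <- matrix(0, 4, 4, dimnames = list(genes, genes))
W["g1", "g2"] <- W["g2", "g1"] <- 1
W["g1", "g3"] <- W["g3", "g1"] <- 1
W["g3", "g4"] <- W["g4", "g3"] <- 1
ex <- c(0, 1, 0, 0)          # made-up: only g2 is differentially expressed
r <- generank(W, ex, d = 0.5)
sort(r, decreasing = TRUE)   # g1, a direct neighbour of g2, ranks second
```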
Qin: Did the previous methods use human pathogenic genes? It seems not, since they did not cite dbSNP or OMIM.
X. Zhou, M.-C. J. Kao, and W. H. Wong. Transitive functional annotation by shortest-path analysis of gene expression data. Proc Natl Acad Sci U S A, 99(20):12783–12788, Oct 2002
WGCNA: an R package for weighted correlation network analysis.
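WGCNA builds a weighted network by soft-thresholding gene-gene correlations. In the unsigned version, the adjacency is a_ij = |cor(x_i, x_j)|^beta. Below is a minimal base-R sketch with made-up data and beta = 6 as an illustrative power; the WGCNA authors recommend choosing beta by the scale-free topology criterion. `soft_adjacency` is a hypothetical name, not the WGCNA package API. Setting the diagonal to 0 is a choice of this sketch, so that row sums give each gene's connectivity.

```r
# Unsigned soft-threshold adjacency: a_ij = |cor(x_i, x_j)|^beta.
# expr: samples in rows, genes in columns.
soft_adjacency <- function(expr, beta = 6) {
  a <- abs(cor(expr))^beta
  diag(a) <- 0          # no self-links, so rowSums() is the connectivity
  a
}

set.seed(42)
expr <- matrix(rnorm(20 * 4), nrow = 20,
               dimnames = list(NULL, c("gA", "gB", "gC", "gD")))
expr[, "gB"] <- 2 * expr[, "gA"] + 1   # gB perfectly correlated with gA
a <- soft_adjacency(expr, beta = 6)
k <- rowSums(a)                        # whole-network connectivity per gene
round(a, 3)
```

Raising |cor| to a power keeps values in [0, 1] but shrinks weak correlations much faster than strong ones, so noise-level links end up close to 0.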
Tuesday, October 25, 2016
Saturday, September 24, 2016
cancer network analysis, 2014 Leiserson et al, Nature Genetics
2014 Nature Genetics
Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes
Mark D M Leiserson, Fabio Vandin, Hsin-Ta Wu, Jason R Dobson, Jonathan V Eldridge, Jacob L Thomas, Alexandra Papoutsaki, Younhun Kim, Beifang Niu, Michael McLellan, Michael S Lawrence, Abel Gonzalez-Perez, David Tamborero, Yuwei Cheng, Gregory A Ryslik, Nuria Lopez-Bigas, Gad Getz, Li Ding & Benjamin J Raphael
"URLs. HI2012 interactome, http://interactome.dfci.harvard.edu/; HotNet2 pan-cancer analysis website, http://compbio.cs.brown.edu/pancancer/hotnet2/; RNA expression data used for the TCGA pan-cancer data set, https://www.synapse.org/#!Synapse:syn1734155; pan-cancer mutations with additional germline variant filtering, https://www.synapse.org/#!Synapse:syn1729383; HotNet2 software release, http://compbio.cs.brown.edu/software."
Saturday, September 17, 2016
synthetic lethal interactions
Byte-5:originals hqin$ grep synthe PPI_221205.tab | wc -l
1062
Byte-5:originals hqin$ pwd
/Users/hqin/data/interaction/mips-yeast/originals
Byte:genetic-interaction hqin$ pwd
/Users/hqin/data/Sce.shanghai/mips/genetic-interaction
Byte:genetic-interaction hqin$ wc -l synthetic.lethals.tab
441 synthetic.lethals.tab
Sunday, July 17, 2016
Wednesday, May 25, 2016
toread, Effects of reciprocity on random walks in weighted networks, Zhang, Li, Scientific Reports
Effects of reciprocity on random walks in weighted networks
Zhang, Li, Scientific Reports
Thursday, July 9, 2015
toread: review on graph theory and network analysis
http://www.biodatamining.org/content/4/1/10
Sunday, December 28, 2014
toread, interaction based discovery of cancer genes
Nucleic Acids Res. 2014 Feb;42(3):e18. doi: 10.1093/nar/gkt1305. Epub 2013 Dec 19.
Interaction-based discovery of functionally important genes in cancers.
http://www.ncbi.nlm.nih.gov/pubmed/24362839
Tuesday, December 23, 2014
toread, power law network paper
http://www.ncbi.nlm.nih.gov/pubmed/25520244
Sunday, December 21, 2014
Braunewell and Bornholdt, 2007, Superstability of the yeast cell-cycle dynamics
[PB07JTB] J Theor Biol. 2007 Apr 21;245(4):638-43. Epub 2006 Nov 21. Superstability of the yeast cell-cycle dynamics: ensuring causality in the presence of biochemical stochasticity
In their 2009 JTB paper, the authors cite a measure of reliability from this 2007 JTB paper. I searched the entire 2007 paper for "reliability" and found only one hit, in the abstract. In the main text, the authors discuss the "stability of the systems under strong noise," termed the "stability criterion" (basically robustness or reliability). Based on their explanation, this is a rather context-specific criterion.
It seems that PB07 and PB09 are based on the Li04 PNAS paper, a Boolean network model of the yeast cell cycle.
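For reference, Li04 updates the network synchronously with a threshold rule. Below is a minimal base-R sketch of that rule on a made-up 3-node loop, iterating until a state repeats to find the attractor. The toy network and the function names are hypothetical. Li04's self-degradation terms are left out, and the timing noise studied in PB07/PB09 is not modelled.

```r
# Synchronous threshold update (Li04-style rule):
#   S_i(t+1) = 1      if sum_j A[i, j] * S_j(t) > 0
#            = 0      if the sum is < 0
#            = S_i(t) if the sum is exactly 0
# A[i, j] = +1 if node j activates node i, -1 if node j inhibits node i.
step_threshold <- function(S, A) {
  h <- as.vector(A %*% S)
  ifelse(h > 0, 1, ifelse(h < 0, 0, S))
}

# Iterate from S0 until a state repeats; return one state on the attractor,
# the number of transient steps, and the attractor's period.
find_attractor <- function(S0, A, max_steps = 1000) {
  seen <- list()                  # state string -> first step it was seen
  S <- S0
  for (step in 0:max_steps) {
    key <- paste(S, collapse = "")
    if (!is.null(seen[[key]])) {
      return(list(state = S, transient = seen[[key]], period = step - seen[[key]]))
    }
    seen[[key]] <- step
    S <- step_threshold(S, A)
  }
  NULL                            # no repeat within max_steps
}

# Hypothetical loop: node 1 activates 2, 2 activates 3, 3 inhibits 1.
A <- matrix(0, 3, 3)
A[2, 1] <- 1
A[3, 2] <- 1
A[1, 3] <- -1
res <- find_attractor(c(1, 0, 0), A)
res$state   # a fixed point (period 1) reached after 3 transient steps
```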
Braunewell and Bornholdt, 2009, reliability of network
PB09JTB
reliability of attractors
boolean network dynamics
See also
investigate the interplay of topological structure and dynamical robustness.
The reliability criterion was used to show the robustness of the yeast cell-cycle dynamics against timing perturbations (Braunewell and Bornholdt, 2007).
Wednesday, December 17, 2014
toread, pan-cancer network, somatic mutations
Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes, Nature Genetics, 2014
reciprocity and power-law network, TOREAD
To read.
http://www.nature.com/srep/2014/141212/srep07460/pdf/srep07460.pdf
This paper is related to my work on network aging and network configuration.
Monday, December 15, 2014
Liu & Chen, 2012, Protein Cell, Proteome-wide prediction of protein-protein interactions from high-throughput data.
Protein Cell. 2012 Jul;3(7):508-20. doi: 10.1007/s13238-012-2945-1. Epub 2012 Jun 22.
Proteome-wide prediction of protein-protein interactions from high-throughput data.
A good review of protein/gene network studies.
Thursday, December 11, 2014
Csermely review on networks, TOREAD
Structure and dynamics of molecular networks: A novel paradigm of drug discovery: A comprehensive review
- Peter Csermely, et al, 2013
integrating gene expression data into protein interaction network
Integrating gene expression and protein-protein interaction network to prioritize cancer-associated genes, Chao Wu, Jun Zhu and Xuegong Zhang
http://www.biomedcentral.com/1471-2105/13/182
http://scholar.google.com/scholar?cites=14200881095439672925&as_sdt=5,43&sciodt=0,43&hl=en
Winterbach et al. BMC Systems Biology 2013, Topology of molecular interaction networks
Topology of molecular interaction networks
http://www.biomedcentral.com/content/pdf/1752-0509-7-90.pdf
Winterbach et al. BMC Systems Biology 2013, 7:90
http://www.biomedcentral.com/1752-0509/7/90
Tuesday, August 26, 2014
GWAS meta-analysis
Gilman et al., Neuron, 2011, pp. 898–907. NetBag on autism
NetBag is a greedy approach. The clustering method starts with one or two genes in a CNV as ‘seeds’.
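Below is a generic greedy 'seed and extend' search, sketched in base R. It starts from the seed genes and keeps adding the neighbouring gene that most improves a module score, stopping when no neighbour helps. This is not NetBag's actual scoring. The score used here (sum of made-up gene weights divided by sqrt(module size)) and all names are assumptions for illustration.

```r
# Greedy seed-and-extend module search (generic sketch).
#   seeds : starting gene(s)
#   adj   : named list; adj[[g]] holds the neighbours of gene g
#   score : function(module) returning a number to maximise (hypothetical)
seed_and_extend <- function(seeds, adj, score, max_size = 20) {
  module <- seeds
  best <- score(module)
  repeat {
    frontier <- setdiff(unname(unlist(adj[module])), module)
    if (length(frontier) == 0 || length(module) >= max_size) break
    gains <- vapply(frontier, function(g) score(c(module, g)), numeric(1))
    if (max(gains) <= best) break          # no neighbour improves the score
    module <- c(module, frontier[which.max(gains)])
    best <- max(gains)
  }
  list(module = module, score = best)
}

adj <- list(a = c("b", "c"), b = c("a", "d"), c = "a", d = c("b", "e"), e = "d")
w <- c(a = 2, b = 1.5, c = -1, d = 1, e = -2)    # made-up gene weights
score <- function(m) sum(w[m]) / sqrt(length(m))
res <- seed_and_extend("a", adj, score)
res$module   # grows a -> a, b -> a, b, d, then stops
```

Because each step only looks one neighbour ahead, the result depends on the seeds. That dependence is the 'prejudice' that seed-based methods carry.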
Gilman11 generated a weighted background human gene network for their study.
Gilman11 compared the raw p-value of each cluster, called the local p-value, with the p-values from random networks. The adjusted p-value is called the global p-value.
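One way to calibrate a local p-value against random networks is sketched below in R. It is an assumption, not necessarily Gilman11's exact procedure. Run the same search on each randomized network and keep its best (smallest) local p-value. Then report the fraction of random runs that are at least as extreme as the observed result. The add-one correction and all numbers are illustrative, and `global_p` is a hypothetical name.

```r
# observed_local : smallest local p-value found on the real network
# random_best    : smallest local p-value found on each randomized network
global_p <- function(observed_local, random_best) {
  # add-one correction so the estimate is never exactly zero
  (1 + sum(random_best <= observed_local)) / (1 + length(random_best))
}

random_best <- c(0.2, 0.01, 0.5, 0.003, 0.08)   # made-up values from 5 random runs
global_p(0.005, random_best)                    # (1 + 1) / (1 + 5)
```

Comparing against the best score of each random run, not against all of its clusters, is what accounts for having searched over many candidate clusters.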
-----------------------------------------------------------------------------------------------------------------------
AIS13 categorizes pathway association methods into canonical and de novo pathway methods.
For de novo pathway discovery, integer linear programming (ILP) is used in Leiserson, Blokh et al., PLoS Comput Biol, "Simultaneous identification of multiple driver pathways in cancer."
Another formulation is the Steiner tree problem, in which one seeks the lowest-cost tree that connects the associated genes. See Liu et al., BMC Syst Biol 2012, "Gene, pathway and network frameworks to identify epistatic interactions of single nucleotide polymorphisms derived from GWAS data."
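Finding a minimum Steiner tree is NP-hard, so heuristics are common in practice. Below is a minimal base-R sketch of the classic shortest-path heuristic (often credited to Takahashi and Matsuyama) on an unweighted network, where "cost" is simply the number of edges. The toy graph and function names are made up, and this is not necessarily how Liu et al. solved the problem.

```r
# BFS from a set of source nodes on an unweighted graph (named adjacency
# list). Returns hop distances and BFS parents for every node.
bfs_from <- function(adj, sources) {
  nodes <- names(adj)
  dist <- setNames(rep(Inf, length(nodes)), nodes)
  parent <- setNames(rep(NA_character_, length(nodes)), nodes)
  dist[sources] <- 0
  queue <- sources
  while (length(queue) > 0) {
    v <- queue[1]
    queue <- queue[-1]
    for (u in adj[[v]]) {
      if (is.infinite(dist[u])) {
        dist[u] <- dist[v] + 1
        parent[u] <- v
        queue <- c(queue, u)
      }
    }
  }
  list(dist = dist, parent = parent)
}

# Shortest-path heuristic: grow a tree from the first terminal, each time
# attaching the remaining terminal closest to the current tree.
steiner_sp <- function(adj, terminals) {
  tree <- terminals[1]
  edges <- character(0)
  remaining <- setdiff(terminals, tree)
  while (length(remaining) > 0) {
    b <- bfs_from(adj, tree)
    d <- b$dist[remaining]
    if (all(is.infinite(d))) stop("some terminals are not connected to the tree")
    v <- remaining[which.min(d)]
    while (!(v %in% tree)) {          # walk back along BFS parents to the tree
      p <- b$parent[[v]]
      edges <- c(edges, paste(p, v, sep = "-"))
      tree <- c(tree, v)
      v <- p
    }
    remaining <- setdiff(remaining, tree)
  }
  list(nodes = tree, edges = edges)
}

adj <- list(a = "b", b = c("a", "c", "e"), c = c("b", "d"),
            d = c("c", "f"), e = c("b", "f"), f = c("e", "d"))
res <- steiner_sp(adj, terminals = c("a", "d", "f"))
res$edges
```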
-----------------------------------------------------------------------------------------------------------------------
LERR13 reviews methods that use protein-protein and protein-DNA networks to identify 'causal' genetic variants. (Their definition of 'causal' is a narrow one.)
LERR13 argues that GWAS mostly find SNPs that are in linkage disequilibrium (LD) with the actual 'causal' gene. This problem is of less concern for CNV analysis. One solution is to use a network to 'rank' the genes in a haplotype block that is known to be associated with the phenotype of interest or with similar phenotypes. (This method is in the spirit of our recent CNV paper.)
The green square represents the 'known' 'causal' gene, so this is largely a method based on traversal measures.
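A toy version of the traversal idea in base R: rank the candidate genes in an associated locus by their shortest-path distance (number of edges) to the nearest known 'causal' gene. Distance is only one simple traversal measure. The network, the gene names, and `rank_by_distance` are all made up.

```r
# Rank candidate genes by hop distance to the nearest known disease gene,
# using a multi-source BFS started from all known genes at once.
rank_by_distance <- function(adj, candidates, known) {
  dist <- setNames(rep(Inf, length(adj)), names(adj))
  dist[known] <- 0
  queue <- known
  while (length(queue) > 0) {
    v <- queue[1]
    queue <- queue[-1]
    for (u in adj[[v]]) {
      if (is.infinite(dist[u])) {
        dist[u] <- dist[v] + 1
        queue <- c(queue, u)
      }
    }
  }
  d <- dist[candidates]
  d[order(d)]                 # closest first; unreachable genes (Inf) last
}

# Made-up network: K is the known gene; x1, x2, x3 lie in the associated locus.
adj <- list(K = "x2", x2 = c("K", "y"), y = c("x2", "x1"),
            x1 = "y", x3 = character(0))
r <- rank_by_distance(adj, candidates = c("x1", "x2", "x3"), known = "K")
r
```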
LERR13 argues that networks contribute to 'missing heritability'.
LERR13 seems to suggest that protein-DNA networks are better suited for expression QTL (eQTL).
LERR13 shows that OMIM is the source of 'causal' gene information for most network-based GWAS studies (Table 1). Only one paper uses GeneCards as an alternative source.
LERR13 cites several pathway enrichment analyses of GWAS data. It argues that interactions are treated equally in these enrichment analyses. (This can be cited in our CNV replies.) The authors then describe several methods that use weighted networks to identify network modules with an iterative 'seed and extend' approach. (For comparison, our CNV paper did not use seeds explicitly, and so avoids some 'prejudice'.)
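For contrast, a plain pathway enrichment test ignores the network entirely: it only counts how many associated genes fall in a pathway. Below is a minimal hypergeometric sketch in R. The counts are made up and `enrichment_p` is a hypothetical name.

```r
# One-sided hypergeometric test for over-representation of a pathway.
#   N: genes in the background     K: genes in the pathway
#   n: associated genes            k: associated genes that are in the pathway
enrichment_p <- function(k, n, K, N) {
  phyper(k - 1, K, N - K, n, lower.tail = FALSE)   # P(X >= k)
}

# Made-up counts: 5 of 50 associated genes hit a 100-gene pathway,
# out of a 20,000-gene background (0.25 hits expected by chance).
enrichment_p(k = 5, n = 50, K = 100, N = 20000)
```

Every pathway gene counts the same here, whether it is a hub or peripheral, and edges play no role. This is the limitation that network-based methods try to address.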
LERR13 also discusses subnetwork modules with mutation hotspots in cancer genomes.
----------------------------------------------------------------------------------------------------------
BGTF12: GWAS use meta-analysis of multiple data sets to reduce false positives and increase statistical power.
A major concern in GWAS meta-analysis is heterogeneity among data sets, such as differences in LD patterns and genotyping chips. However, Lin and Zeng 2010 (Genet Epidemiol) showed, using simulation studies, that heterogeneity is not a significant factor.
Combining results across data sets is the frequentist approach; cumulative studies are the Bayesian approach.
In R, packages for GWAS meta-analysis include metafor, rmeta, and CATMAP.
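Below is a minimal base-R sketch of the frequentist fixed-effect (inverse-variance) combination for one SNP. It reports Cochran's Q and I^2 as heterogeneity measures. The effect sizes are made up and `meta_fixed` is a hypothetical name. Packages such as metafor and rmeta provide fuller versions, including random-effects models.

```r
# Fixed-effect inverse-variance meta-analysis for one SNP.
#   beta: per-study effect sizes (e.g. log odds ratios)
#   se  : their standard errors
meta_fixed <- function(beta, se) {
  w <- 1 / se^2                       # inverse-variance weights
  b <- sum(w * beta) / sum(w)         # pooled effect
  se_b <- sqrt(1 / sum(w))
  z <- b / se_b
  Q <- sum(w * (beta - b)^2)          # Cochran's Q
  df <- length(beta) - 1
  list(beta = b, se = se_b, z = z, p = 2 * pnorm(-abs(z)),
       Q = Q, p_Q = pchisq(Q, df, lower.tail = FALSE),
       I2 = if (Q > 0) max(0, (Q - df) / Q) else 0)
}

m <- meta_fixed(beta = c(0.10, 0.30, 0.15), se = c(0.10, 0.20, 0.05))
str(m)
```

Precise studies dominate the pooled estimate, and a large Q relative to its degrees of freedom flags the between-study heterogeneity discussed above.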
BGTF12 argues that GWAS data should be 'cleaned' and imputed before meta-analysis.
References:
[AIS13] Atias, Istrail, Sharan, 2013, Current Opinion in Genetics and Development. Pathway-based analysis of genomic variation data.
[LERR13] Leiserson, Eldridge, Ramachandran, Raphael, 2013, Current Opinion in Genetics and Development. Network analysis of GWAS data.
[BGTF12] Begum, Ghosh, Tseng, Feingold, 2012, Nucleic Acids Research. Comprehensive literature review and statistical considerations for GWAS meta-analysis.
Wednesday, February 12, 2014
ms02 R, batch run
# ms02-2014Feb12.R
#permute merged yeast PPI+GIN
#2014 Feb 12, re-name function to ms02_singlerun
#2014 Jan 31, fixed a bug that inserted "NA" into new network. The bug seems to be caused by splitting the
# arrays. I rewrote the splitting portion.
#require(igraph)
rm(list=ls())
debug = 0
setwd("~/projects/0.ginppi.reliability.simulation/ms02GINPPI")
#set.seed(2014)

# Degree-preserving permutation of an edge list: pool all endpoints, shuffle
# them, and re-pair them. Self-pairs (id1 == id2) are re-shuffled together
# with a few non-self pairs, recursively, for at most 'ncycles' rounds.
#permute.pairs.wo.selfpairs = function( inpairs, ncycles=10, debug=1 ) {
ms02_singlerun = function( inpairs, ncycles=10, indebug=0 ) { # Renamed, 2014 Feb 12
  if (ncycles < 1) {
    stop("ncycles must be >= 1")
  }
  if (indebug > 0) {
    print(paste('ncycles=', ncycles))
  }
  longids = c(as.character(inpairs[,1]), as.character(inpairs[,2]))
  longids = sample(longids)
  len = length(inpairs[,1])
  newpairs = data.frame(id1 = longids[1:len], id2 = longids[(len+1):(2*len)],
                        stringsAsFactors = FALSE)
  newpairs$selfpairs = ifelse(newpairs$id1 == newpairs$id2, 1, 0)
  self.tb = newpairs[newpairs$selfpairs == 1, ]
  nonself.tb = newpairs[newpairs$selfpairs == 0, ]
  if (indebug > 0) {
    print("===selfpairs====")
    print(self.tb)
    print("=================")
  }
  if (nrow(self.tb) == 0) {
    return(newpairs)                  # no self-pairs: done
  }
  ncycles = ncycles - 1
  if (ncycles == 0) {
    # out of rounds: abort rather than return a network with NA rows or lost edges
    stop(paste("ncycles reached zero with", nrow(self.tb), "self-pairs left. Abort!"))
  }
  splitPos = round(nrow(self.tb) * sqrt(ncycles)) + 5 #2014Jan31 change
  splitPos = min(splitPos, nrow(nonself.tb) - 1)
  selectedpairs = rbind(self.tb, nonself.tb[1:splitPos, ])   # re-shuffled next round
  restpairs = nonself.tb[(splitPos + 1):nrow(nonself.tb), ]  # kept as they are
  #return( rbind(restpairs, permute.pairs.wo.selfpairs(selectedpairs, ncycles)))
  return(rbind(restpairs, ms02_singlerun(selectedpairs, ncycles, indebug))) #2014 Feb 12
}

#net = read.table("repeat.tab")
#write.table(pairs, "merged_PPIGIN_2014Jan20.tab", quote=F, row.names=F, col.names=F, sep='\t')
net = read.table("merged_PPIGIN_2014Jan20.tab", header=FALSE, sep="\t",
                 colClasses = c("character", "character"))
head(net)
if (debug == 9) {
  #net = read.table('pair.tab',header=F)
  net = net[1:90000,]
}
for (i in 1:100) {
  net.ms02 = ms02_singlerun(net, indebug=0)
  outdir = file.path("dipgin.ms02.output", i)
  dir.create(outdir, recursive = TRUE, showWarnings = FALSE)  # also creates the parent directory
  outputname = file.path(outdir, paste0("ms02_", i, ".tab"))
  write.csv(net.ms02, outputname)   # comma-separated, despite the .tab extension
}
#do they have the same degree?
#t1 = table(c(net[,1],net[,2]))
#t2 = table(c(net.ms02[,1],net.ms02[,2]))
#comp <- t1 == t2
#table(comp)
#tf = comp[comp==F]; tf
#t1[names(tf)[1]]
#t1[names(tf)]
#t2[names(tf)]
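The commented-out check above compares degree tables before and after permutation. Below is the same check as a small reusable function, with a toy edge list. It shows that shuffling the pooled endpoints (the core step of ms02_singlerun) keeps every node's degree. `degree_table` and `same_degrees` are hypothetical helper names.

```r
# Degree of every node in an edge list (first two columns hold node IDs).
degree_table <- function(pairs) {
  table(c(as.character(pairs[, 1]), as.character(pairs[, 2])))
}

# TRUE if two edge lists give every node the same degree.
same_degrees <- function(net1, net2) {
  d1 <- degree_table(net1)
  d2 <- degree_table(net2)
  identical(names(d1), names(d2)) && all(d1 == d2[names(d1)])
}

# Toy check: shuffle the pooled endpoints and re-pair them.
set.seed(2014)
net <- data.frame(id1 = c("A", "A", "B", "C"), id2 = c("B", "C", "D", "D"),
                  stringsAsFactors = FALSE)
ids <- sample(c(net$id1, net$id2))
shuffled <- data.frame(id1 = ids[1:4], id2 = ids[5:8], stringsAsFactors = FALSE)
same_degrees(net, shuffled)
```

Endpoint shuffling keeps the multiset of node IDs, so degrees are preserved by construction. It can still create self-pairs (which ms02_singlerun reshuffles) and duplicate edges (which this check does not detect).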