
Friday, October 11, 2019

weighted adjacency matrix

Q: 0 means no link, but a small value means a very close link.

In igraph, the direction is from column to row. The following example shows arrows from the 2nd and 3rd nodes to the 1st.

In Yuan's network exact control paper, the directions are from row to column. So its matrix is the transpose of the igraph adjacency matrix.
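A quick base-R check of the two reading conventions may help here. This is only a sketch: the 3-node matrix and the helper `edges_row_to_col` are made up for illustration, and it does not call igraph itself. One thing to keep in mind is that `matrix()` fills column by column unless `byrow = TRUE`, which is easy to misread when typing an adjacency matrix by hand.

```r
# Hypothetical 3-node example; not taken from igraph or from the Yuan paper.
# matrix() fills column by column, so the first three numbers form column 1.
A <- matrix(c(0, 1, 1,
              0, 0, 0,
              0, 0, 0), nrow = 3)

# List edges under the "row -> column" reading: A[i, j] != 0 means i -> j.
edges_row_to_col <- function(m) {
  idx <- which(m != 0, arr.ind = TRUE)
  idx <- idx[order(idx[, "row"], idx[, "col"]), , drop = FALSE]
  data.frame(from = idx[, "row"], to = idx[, "col"], row.names = NULL)
}

e_row_col <- edges_row_to_col(A)      # reading A as row -> column
e_col_row <- edges_row_to_col(t(A))   # same as reading A as column -> row
print(e_row_col)
print(e_col_row)
```

Whichever convention a tool uses, switching to the other one is just `t(A)`.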

Tuesday, November 15, 2016

integrating gene expression and network, a reference collection

Ideker,T. et al. (2002) Discovering regulatory and signalling circuits in molecular interaction networks. Bioinformatics, 18, S233–S240.

Convert p-values of differential expression into Z-scores using the inverse Gaussian (normal) CDF.

Maybe because Ideker02 is looking for 'active subnetworks', only positive Z-scores were used? No, both positive and negative Z-scores were calculated.
Ideker02 seems to combine K-means and simulated annealing for network clustering.
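The p-value to Z-score step can be sketched in R with `qnorm`. This assumes the convention z = Φ⁻¹(1 − p), which is how I read Ideker02; the p-values below are made up for illustration.

```r
# Made-up p-values for illustration only.
p <- c(0.001, 0.05, 0.5, 0.9)

# z = Phi^{-1}(1 - p): small p-values map to large positive Z-scores,
# and p-values above 0.5 map to negative Z-scores.
z <- qnorm(1 - p)
print(round(z, 3))
```

Under this convention, negative Z-scores simply correspond to genes with p > 0.5, which is consistent with both signs being calculated.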

Tornow,S. and Mewes,H.W. (2003) Functional modules by relating protein interaction networks and gene expression. Nucleic Acids Res., 31, 6283–6289.

Segal,E. et al. (2003) Discovering molecular pathways from protein interaction and gene expression data. Bioinformatics, 19, 264–272.

Morrison,J.L. et al. (2005) GeneRank: using search engine technology for the analysis of microarray experiments. BMC Bioinformatics, 6, 233.

Ma, X., Lee, H., Wang, L., Sun, F.: ‘CGI: a new approach for prioritizing genes by combining gene expression and protein–protein interaction data’, Bioinformatics, 2007, 23, pp. 215–221

Integrating gene expression and protein-protein interaction network to prioritize cancer-associated genes, Chao Wu, Jun Zhu and Xuegong Zhang

http://www.biomedcentral.com/1471-2105/13/182
http://scholar.google.com/scholar?cites=14200881095439672925&as_sdt=5,43&sciodt=0,43&hl=en

Li et al. BMC Medical Genomics 2014, 7(Suppl 2):S4 Prediction of disease-related genes based on weighted tissue-specific networks by using DNA methylation

http://www.biomedcentral.com/1755-8794/7/S2/S4

http://hongqinlab.blogspot.com/2014/12/li14-bmc-medical-genomics-predict.html

http://hongqinlab.blogspot.com/2014/12/integrating-gene-expression-data-into.html

http://hongqinlab.blogspot.com/2014/03/build-human-gene-network-reliability.html

From Ma, 2007 Bioinformatics CGI paper:

Gene expression data and protein interaction data have been
integrated for gene function prediction. For example, Ideker
et al. (2002) used protein interaction data and gene expression
data to screen for differentially expressed subnetworks between
different conditions. In Tornow and Mewes (2003) and Segal
et al. (2003), gene expression data and protein interactions are
used to group genes into functional modules. These methods provide
insights into the regulatory modules of the whole networks at
the systems biology level. However, it is not clear how to adapt their
methods to identify genes contributing to the phenotype of interest.
Morrison et al. (2005) adapted the Google search engine to prioritize
genes for a phenotype by integrating gene expression profiles
and protein interaction data. However, the algorithm ignores the
information from proteins linked to the target protein through other
intermediate proteins, referred to in the rest of this paper as indirect
neighbors.
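For my own reference, here is a small PageRank-style ranking in the spirit of Morrison05's GeneRank. It is a sketch under assumptions, not the authors' code: I assume the update r = (1 − d)·ex + d·W·D⁻¹·r, where `W` is the adjacency matrix, `D` the diagonal degree matrix, `ex` the absolute expression change, and `d` a damping parameter. The toy network and all numbers are made up.

```r
# Toy inputs, made up for illustration.
W <- matrix(0, 4, 4)                   # symmetric adjacency matrix of 4 genes
W[1, 2] <- W[2, 1] <- 1
W[2, 3] <- W[3, 2] <- 1
W[3, 4] <- W[4, 3] <- 1
ex <- c(2.0, 0.1, 0.1, 1.0)            # absolute expression change per gene
d  <- 0.5                              # damping: weight given to the network

deg <- colSums(W)
deg[deg == 0] <- 1                     # avoid division by zero for isolated genes
M <- W %*% diag(1 / deg)               # column j spreads gene j's rank over its neighbors

# Solve r = (1 - d) * ex + d * M r for r.
r <- as.vector(solve(diag(length(ex)) - d * M, (1 - d) * ex))
print(round(r, 3))
```

With d = 0 the ranking reduces to the expression change alone; larger d gives more weight to the network.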

Qin: Did the previous methods use human pathogenic genes? Seems not if they did not cite dbSNP or OMIM.

https://scholar.google.com/scholar?q=disease&btnG=&hl=en&as_sdt=5%2C43&sciodt=0%2C43&cites=5934830469117211620&scipsc=1

X. Zhou, M.-C. J. Kao, and W. H. Wong. Transitive functional annotation by shortest-path analysis of gene expression data. Proc Natl Acad Sci U S A, 99(20):12783–12788, Oct 2002

WGCNA: an R package for weighted correlation network analysis.

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-9-559

Tuesday, October 25, 2016

Yeast genetic interaction database (Cellmap.org)

https://www.quantamagazine.org/20161025-pairwise-gene-removal-reveals-genetic-structure/

http://thecellmap.org/costanzo2016/

Saturday, September 24, 2016

cancer network analysis, 2014 Leiserson et al, Nature genetics

2014 Nature genetics

Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes

Mark D M Leiserson, Fabio Vandin, Hsin-Ta Wu, Jason R Dobson, Jonathan V Eldridge, Jacob L Thomas, Alexandra Papoutsaki, Younhun Kim, Beifang Niu, Michael McLellan, Michael S Lawrence, Abel Gonzalez-Perez, David Tamborero, Yuwei Cheng, Gregory A Ryslik, Nuria Lopez-Bigas, Gad Getz, Li Ding & Benjamin J Raphael

"URLs. HI2012 interactome, http://interactome.dfci.harvard.edu/; HotNet2 pan-cancer analysis website, http://compbio.cs.brown.edu/pancancer/hotnet2/; RNA expression data used for the TCGA pan-cancer data set, https://www.synapse.org/#!Synapse:syn1734155; pan-cancer mutations with additional germline variant filtering, https://www.synapse.org/#!Synapse:syn1729383; HotNet2 software release, http://compbio.cs.brown.edu/software."

Saturday, September 17, 2016

synthetic lethal interactions

Byte-5:originals hqin$ grep synthe PPI_221205.tab | wc -l

1062

Byte-5:originals hqin$ pwd

/Users/hqin/data/interaction/mips-yeast/originals

Byte:genetic-interaction hqin$ pwd

/Users/hqin/data/Sce.shanghai/mips/genetic-interaction

Byte:genetic-interaction hqin$ wc -l synthetic.lethals.tab

441 synthetic.lethals.tab

Sunday, July 17, 2016

Stochastic topology in data analysis

http://www2.stat.duke.edu/~sayan/

http://www2.stat.duke.edu/~sayan/Sta613/2016/Sta613.html

Wednesday, May 25, 2016

toread, Effects of reciprocity on random walks in weighted networks, Zhang, Li, Scientific Reports

Effects of reciprocity on random walks in weighted networks
Zhang, Li, Scientific Reports

Thursday, July 9, 2015

toread: review on graph theory and network analysis

http://www.biodatamining.org/content/4/1/10

Sunday, December 28, 2014

toread, interaction based discovery of cancer genes

Nucleic Acids Res. 2014 Feb;42(3):e18. doi: 10.1093/nar/gkt1305. Epub 2013 Dec 19.

Interaction-based discovery of functionally important genes in cancers.

http://www.ncbi.nlm.nih.gov/pubmed/24362839

Tuesday, December 23, 2014

toread, power law network paper

http://www.ncbi.nlm.nih.gov/pubmed/25520244

Sunday, December 21, 2014

Braunewell Bornholdt, 2007, Superstability of the yeast cell-cycle dynamics

[PB07JTB] J Theor Biol. 2007 Apr 21;245(4):638-43. Epub 2006 Nov 21. Superstability of the yeast cell-cycle dynamics: ensuring causality in the presence of biochemical stochasticity

Braunewell S, Bornholdt S.

In their 2009 JTB paper, the authors cited a measure of reliability from this 07JTB paper. I searched the entire paper for "reliability", but found only one hit, in the abstract. In the main text, the authors mention "stability of the systems under strong noise", termed the "stability criterion" (basically robustness or reliability). Based on its explanation below, this is a rather context-specific criterion.

It seems that PB07 and PB09 are based on the Li04PNAS paper, a boolean network model on yeast cell cycle.

Braunewell and Bornholdt, 2009, reliability of network

PB09JTB

investigate the interplay of topological structure and dynamical robustness.

reliability of attractors

boolean network dynamics

The reliability criterion was used to show the robustness of the yeast cell-cycle dynamics against timing perturbations (Braunewell and Bornholdt, 2007)

[PB07JTB] Braunewell, S., Bornholdt, S., 2007. Superstability of the yeast cell-cycle dynamics: ensuring causality in the presence of biochemical stochasticity. J. Theor. Biol. 245 (4), 638–643.

Wednesday, December 17, 2014

toread, pan-cancer network, somatic mutations

Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes, nature genetics, 2014

reciprocity and power-law network, TOREAD

To read.

http://www.nature.com/srep/2014/141212/srep07460/pdf/srep07460.pdf

This paper is related to my work on network aging and network configuration.

Monday, December 15, 2014

Liu & Chen, 2012, Protein Cell, Proteome-wide prediction of protein-protein interactions from high-throughput data.

Protein Cell. 2012 Jul;3(7):508-20. doi: 10.1007/s13238-012-2945-1. Epub 2012 Jun 22.

Proteome-wide prediction of protein-protein interactions from high-throughput data.

Liu ZP, Chen L.

http://www.ncbi.nlm.nih.gov/pubmed/22729399

Good Review on protein/gene network study

Thursday, December 11, 2014

Csermely review on networks, TOREAD

Structure and dynamics of molecular networks: A novel paradigm of drug discovery: A comprehensive review

Peter Csermely, et al, 2013

http://www.sciencedirect.com/science/article/pii/S0163725813000284

integrating gene expression data into protein interaction network

Integrating gene expression and protein-protein interaction network to prioritize cancer-associated genes, Chao Wu, Jun Zhu and Xuegong Zhang

http://www.biomedcentral.com/1471-2105/13/182
http://scholar.google.com/scholar?cites=14200881095439672925&as_sdt=5,43&sciodt=0,43&hl=en

Winterbach et al. BMC Systems Biology 2013, Topology of molecular interaction networks

Topology of molecular interaction networks

http://www.biomedcentral.com/content/pdf/1752-0509-7-90.pdf

Winterbach et al. BMC Systems Biology 2013, 7:90
http://www.biomedcentral.com/1752-0509/7/90

Tuesday, August 26, 2014

GWAS meta analysis

Gilman et al, Neuron, 2011. p898-907. NetBag on Autism
NetBag is a greedy approach. The clustering method starts with one or two genes in CNVs as 'seeds'.

Gilman11 generated a weighted background human gene network for their study.

Gilman11 compared the raw p-value of a cluster, called the local p-value, to the p-values from random networks. The adjusted p-value is called the global p-value.
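One common way to turn a 'local' score into a 'global' p-value is an empirical comparison against randomized data. The sketch below assumes that setup. It is not the NetBag code, and the variable names and all numbers are hypothetical.

```r
# Hypothetical numbers for illustration; not from Gilman11.
observed_score <- 3.2                         # score of the best observed cluster
null_scores <- c(1.1, 2.5, 3.4, 0.9, 2.8,     # best cluster score found in each
                 1.7, 3.0, 2.2, 3.6, 1.9)     # of 10 randomized runs

# Empirical p-value with the usual +1 correction so it is never exactly zero.
global_p <- (sum(null_scores >= observed_score) + 1) / (length(null_scores) + 1)
print(global_p)
```

In practice many more randomized runs would be needed, since the smallest reachable p-value is 1 / (number of runs + 1).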

-----------------------------------------------------------------------------------------------------------------------

AIS13 categorizes pathway association methods into canonical and de novo pathway methods.

For de novo pathway discovery, an integer linear program (ILP) is used in Leiserson, Blokh, PLoS Comput Biol, Simultaneous identification of multiple driver pathways in cancer.

Another formulation is the Steiner tree problem, where one seeks the lowest-cost pathway that connects the associated genes. See Liu et al, BMC Sys Biol 2012, Gene, pathway and network frameworks to identify epistatic interactions of single nucleotide polymorphisms derived from GWAS data.

-----------------------------------------------------------------------------------------------------------------------
LERR13 reviews methods that use protein-protein and protein-DNA networks to identify 'causal' genetic variants. (Their definition of 'causal' is a narrow one).

LERR13 argues that GWAS mostly find SNPs that are in LD with the actual 'causal' gene. This problem is of less concern in CNV analysis. One solution is to use a network to 'rank' genes in the same haplotype known to be associated with the phenotype of interest or similar phenotypes. (This method is in the spirit of our recent CNV paper).

The green square represents the 'known' 'causal' gene. So, this is largely a traversal-measures based method.

LERR13 argues that networks contribute to 'missing heritability'.

LERR13 seems to suggest that protein-DNA networks are better suited for expression QTL (eQTL).

LERR13 shows that OMIM is the source of 'causal' gene information for most network-based GWAS (table 1). Only one paper uses GeneCards as an alternative source.

LERR13 cites several pathway enrichment analyses of GWAS. It argues that interactions are treated equally in these enrichment analyses. (This can be cited in our CNV replies). The authors then show several methods that use weighted networks to identify network modules with an iterative 'seed and extend' method. (For comparison, our CNV paper did not use seeds explicitly, and avoids some 'prejudice').

LERR13 also discussed subnetwork modules with mutation hotspots in cancer genomes.

----------------------------------------------------------------------------------------------------------

BGTF12: GWAS use meta-analysis of multiple data sets to reduce false positives and increase statistical power.

A major concern of GWAS meta-analysis is the heterogeneity in the data sets, such as LD differences among data sets and chip differences. However, Lin and Zeng 2010 (Genet Epidemiol) show, using simulation studies, that heterogeneity is not a significant factor.

Combination across data sets is the frequentist approach; cumulative studies are the Bayesian approach.

In R, GWAS meta-analysis packages: metafor, rmeta, and catmap.

BGTF12 argues that GWAS data should be 'cleaned' and imputed before meta-analysis.
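To make the 'combination across data sets' idea concrete, here is a minimal base-R sketch of a fixed-effect, inverse-variance meta-analysis for a single SNP, with Cochran's Q as a simple heterogeneity check. It is a generic textbook calculation, not a method taken from BGTF12, and the effect sizes and standard errors are made up. The packages listed above would be the practical choice for real data.

```r
# Made-up per-study effect sizes (e.g. log odds ratios) and standard errors for one SNP.
beta <- c(0.20, 0.10, 0.30)
se   <- c(0.10, 0.20, 0.10)

w         <- 1 / se^2                      # inverse-variance weights
beta_meta <- sum(w * beta) / sum(w)        # pooled effect
se_meta   <- sqrt(1 / sum(w))              # standard error of the pooled effect
z_meta    <- beta_meta / se_meta
p_meta    <- 2 * pnorm(-abs(z_meta))       # two-sided p-value

# Cochran's Q as a simple heterogeneity check (df = number of studies - 1).
Q   <- sum(w * (beta - beta_meta)^2)
p_Q <- pchisq(Q, df = length(beta) - 1, lower.tail = FALSE)
print(c(beta_meta = beta_meta, se_meta = se_meta, p_meta = p_meta, Q = Q, p_Q = p_Q))
```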

Reference:
[AIS13] Atias, Istrail, Sharan 2013, Current Opinion in Genetics and Development. Pathway-based analysis of genomic variation data.

[LERR13] Leiserson, Eldridge, Ramachandran, Raphael, 2013, Current Opinion in Genetics and Development. Network analysis of GWAS data.

[BGTF12] Begum, Ghosh, Tseng, Feingold, 2012, Nucleic Acids Res. Comprehensive literature review and statistical considerations for GWAS meta-analysis.

Wednesday, February 12, 2014

ms02 R, batch run

# ms02-2014Feb12.R
#permute merged yeast PPI+GIN

#2014 Feb 12, re-name function to ms02_singlerun
#2014 Jan 31, fixed a bug that inserted "NA" into new network. The bug seems to be caused by splitting the
# arrays. I rewrote the splitting portion.

#require(igraph)
rm(list=ls())
debug = 0
setwd("~/projects/0.ginppi.reliability.simulation/ms02GINPPI")
#set.seed(2014)

#permute.pairs.wo.selfpairs = function( inpairs, ncycles=10, debug=1 ) {
# Shuffle all endpoint IDs of an edge list (this keeps every node's degree) and
# re-shuffle any self pairs together with some non-self pairs, up to ncycles times.
ms02_singlerun = function( inpairs, ncycles=10, indebug=0 ) { # Renamed, 2014 Feb 12
  if (ncycles < 1) {
    # Previously this branch returned c(NA,NA,NA), which rbind() then inserted as an
    # NA row, and the intended 'stop;' never actually called stop().
    stop("ms02_singlerun: ncycles reached zero with self pairs remaining. Abort!")
  }
  if (indebug > 0) {
    print(paste('ncycles=', ncycles))
  }
  longids = c(as.character(inpairs[,1]), as.character(inpairs[,2]))
  longids = sample(longids)
  len = length(inpairs[,1])
  newpairs = data.frame( id1 = longids[1:len], id2 = longids[(len+1):(2*len)],
                         stringsAsFactors = FALSE )
  newpairs$selfpairs = ifelse( newpairs$id1 == newpairs$id2, 1, 0 )
  self.tb = newpairs[ newpairs$selfpairs==1, ]
  nonself.tb = newpairs[ newpairs$selfpairs==0, ]
  if (indebug > 0) {
    print("===selfpairs====")
    print(self.tb)
    print("=================")
  }
  if ( nrow(self.tb) == 0 ) {
    return (newpairs)   # no self pairs left, done
  }
  ncycles = ncycles - 1
  splitPos = round( nrow(self.tb) * sqrt(ncycles) ) + 5 #2014Jan31 change
  splitPos = min( splitPos, nrow(nonself.tb) - 1 )
  if (splitPos < 1) {
    stop("ms02_singlerun: too few non-self pairs to re-shuffle with the self pairs.")
  }
  selectedpairs = rbind(self.tb, nonself.tb[1:splitPos, ])
  restpairs = nonself.tb[ (splitPos + 1):nrow(nonself.tb), ]
  #return( rbind(restpairs, permute.pairs.wo.selfpairs(selectedpairs, ncycles)))
  return( rbind(restpairs, ms02_singlerun(selectedpairs, ncycles, indebug)) ) #2014 Feb 12
}

#net = read.table("repeat.tab")
#write.table(pairs, "merged_PPIGIN_2014Jan20.tab", quote=F, row.names=F, col.names=F, sep='\t')
net = read.table( "merged_PPIGIN_2014Jan20.tab", header=F, sep="\t", colClasses = c("character", "character") )
head(net)
if(debug==9) {
  #net = read.table('pair.tab',header=F)
  net = net[1:90000,]
}

for( i in 1:100) {
  net.ms02 = ms02_singlerun( net, indebug=0 )
  # dir.create() replaces system("mkdir ..."); it also creates the parent directory
  outdir = file.path( "dipgin.ms02.output", as.character(i) )
  dir.create( outdir, recursive=TRUE, showWarnings=FALSE )
  outputname = file.path( outdir, paste("ms02_", i, ".tab", sep="") )
  write.csv(net.ms02, outputname)  # note: comma-separated, despite the .tab extension
}

#do they have the same degree?
#t1 = table(c(net[,1],net[,2]))
#t2 = table(c(net.ms02[,1],net.ms02[,2]))
#comp <- t1 == t2
#table(comp)
#tf = comp[comp==F]; tf
#t1[names(tf)[1]]
#t1[names(tf)]
#t2[names(tf)]
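
The degree question in the commented lines above can be checked on a toy example. This is a self-contained sketch: `net_toy` is a made-up edge list, and the shuffle below reproduces only the first step of `ms02_singlerun` (one round of endpoint shuffling, without removing self pairs).

```r
# Toy edge list, made up for illustration.
net_toy <- data.frame(id1 = c("a", "a", "b", "c", "d"),
                      id2 = c("b", "c", "c", "d", "e"),
                      stringsAsFactors = FALSE)

# One round of endpoint shuffling (self pairs are not removed here).
set.seed(1)
ids <- sample(c(net_toy$id1, net_toy$id2))
n   <- nrow(net_toy)
shuffled <- data.frame(id1 = ids[1:n], id2 = ids[(n + 1):(2 * n)],
                       stringsAsFactors = FALSE)

# Degree of each node = number of times its ID appears as an endpoint.
deg_before <- table(c(net_toy$id1, net_toy$id2))
deg_after  <- table(c(shuffled$id1, shuffled$id2))
same_degree <- identical(names(deg_before), names(deg_after)) &&
  identical(as.vector(deg_before), as.vector(deg_after))
print(same_degree)
```

Because the shuffle only rearranges the same multiset of endpoint IDs, the degree table is unchanged. Note that a self pair counts twice for its node, and duplicate pairs can appear.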