Showing posts with label ms02. Show all posts

Thursday, November 10, 2022

Monday, September 20, 2021

ms02 randomness verification

 

Use a network whose random permutations are known theoretically.


This direction might be too theoretical and have very little practical importance.
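
A minimal sketch of what such a check could look like, on the smallest case I can think of: 4 nodes, each of degree 1, for which exactly 3 simple networks exist, so an unbiased sampler should return each about one third of the time. `random_matching` is a hypothetical stand-in for the sampler under test (it is not ms02), and `network_key` is just a helper name made up here.

```r
nodes <- c("A", "B", "C", "D")

# Hypothetical stand-in for the sampler under test; swap in the real one.
random_matching <- function(nodes) {
  shuffled <- sample(nodes)
  data.frame(n1 = shuffled[c(1, 3)], n2 = shuffled[c(2, 4)],
             stringsAsFactors = FALSE)
}

# Canonical label, so that A-B/C-D and D-C/B-A count as the same network.
network_key <- function(pairs) {
  edges <- apply(pairs, 1, function(p) paste(sort(p), collapse = "-"))
  paste(sort(edges), collapse = ";")
}

set.seed(2021)
n_draws <- 3000
keys <- replicate(n_draws, network_key(random_matching(nodes)))
counts <- table(keys)

# Goodness-of-fit test; null hypothesis: all 3 networks are equally likely.
uniformity <- chisq.test(counts)
```

A very small `uniformity$p.value` would hint that the sampler favors some networks over others.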


Monday, January 14, 2019

dang 005 GO run log

ridgeside:
col 3 run in the original folder
col 2 run in _col2 folder

ts117:
col4 run in single-core mode on /scr
col4 doMC version run

ts job run, yeastPIN ms02 Dang 0.001 percentile ~ GO


Modified based on Guo's example script for ts.
ts does not have the 'foreach' and 'doMC' libraries, so I had to remove them from the R script.
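
Instead of keeping two copies of the script, one option is to fall back to a serial loop when the packages are missing. This is only a sketch: `run_sims` and `one_sim` are names made up here, and it assumes each simulation returns a single number.

```r
run_sims <- function(n_sims, one_sim, cores = 1) {
  # Use foreach/doMC only if both are installed on this machine.
  have_parallel <- requireNamespace("foreach", quietly = TRUE) &&
    requireNamespace("doMC", quietly = TRUE)
  if (have_parallel && cores > 1) {
    doMC::registerDoMC(cores = cores)
    `%dopar%` <- foreach::`%dopar%`
    out <- foreach::foreach(i = seq_len(n_sims)) %dopar% one_sim(i)
  } else {
    # Serial fallback, e.g. on a cluster without the packages.
    out <- lapply(seq_len(n_sims), one_sim)
  }
  unlist(out)
}

squares <- run_sims(5, function(i) i^2)  # serial path, since cores = 1
```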


-bash-4.2$ cat ts_yeastPIN_job1.pbs  
#!/bin/bash -l
#$ -S /bin/bash
#$ -N yPIN.Dang.001.col4
#$ -cwd

. /etc/profile.d/modules.sh
module load shared
module load R/3.4.3

cmd="R -f yeast_Zscore_GO-DangCR.R"

$cmd
-bash-4.2$



-bash-4.2$ qstatus -a
Running jobs:

job-ID  # name                      owner      submit time        
------------------------------------------------------------------
 8276  1 yPIN.Dang.001.col4        hqin       01/14/2019 10:15:40

Wednesday, November 7, 2018

NetBAS running time problem, yeast PIN, GOBP ~ GOBP 4.2 hours


The following code ran for 4.2 hours on the applejack laptop using a single core. It needs to be parallelized to speed it up.

```{r}
# Buffer of (name1, name2, tag) rows; the initial NA row is dropped by table() below
pairsBuffer = data.frame(matrix(NA, nrow = 1, ncol=3))
names(pairsBuffer) = c("name1", "name2", "tag")
for ( i in seq_len(nrow(pairs)) ){
  print(i)
  #els1 = sort( unlist( strsplit(  pairs$cat1[i], split=",") ))
  #els2 = sort( unlist( strsplit(  pairs$cat2[i], split=",") ))
  sub1 = cats[ cats$id == pairs$name1[i], ]
  sub2 = cats[ cats$id == pairs$name2[i], ]

  tagbuffer = allCombinationsOfTwoVectors ( sub1$GO, sub2$GO)  #all combinations

  # generate a dataframe buffer with ids
  currentBuffer = data.frame( cbind(rep(pairs$name1[i], length(tagbuffer)),
                        rep(pairs$name2[i], length(tagbuffer)),
                        tagbuffer                        ))
  names(currentBuffer) = c("name1", "name2", "tag")

  # rbind() copies the whole buffer on every iteration, which is the slow part
  pairsBuffer = rbind( pairsBuffer, currentBuffer) #combine with dataframe buffer
}

# observed frequency of each tag
F.obs = data.frame( table(pairsBuffer$tag))
names(F.obs) = c("tag", "freq")
F.obs
```
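
Before going parallel, it may be worth removing the `rbind()` inside the loop, since the buffer is copied on every pass. A rough sketch on toy data follows. `tag_combinations` is a hypothetical stand-in for `allCombinationsOfTwoVectors` (I am assuming it returns one tag per combination of the two GO vectors), and the toy `cats` and `pairs` values are made up.

```r
cats <- data.frame(id = c("g1", "g1", "g2", "g3"),
                   GO = c("GO:1", "GO:2", "GO:1", "GO:3"),
                   stringsAsFactors = FALSE)
pairs <- data.frame(name1 = c("g1", "g2"), name2 = c("g2", "g3"),
                    stringsAsFactors = FALSE)

# Hypothetical stand-in: one tag per (GO of gene 1, GO of gene 2) combination.
tag_combinations <- function(v1, v2) {
  grid <- expand.grid(a = v1, b = v2, stringsAsFactors = FALSE)
  paste(grid$a, grid$b, sep = "_")
}

# All tag rows for the i-th interacting pair.
tags_for_pair <- function(i) {
  go1 <- cats$GO[cats$id == pairs$name1[i]]
  go2 <- cats$GO[cats$id == pairs$name2[i]]
  tags <- tag_combinations(go1, go2)
  data.frame(name1 = rep(pairs$name1[i], length(tags)),
             name2 = rep(pairs$name2[i], length(tags)),
             tag = tags, stringsAsFactors = FALSE)
}

# Build every piece first, then bind once at the end.
pieces <- lapply(seq_len(nrow(pairs)), tags_for_pair)
pairsBuffer <- do.call(rbind, pieces)

F.obs <- as.data.frame(table(pairsBuffer$tag), stringsAsFactors = FALSE)
names(F.obs) <- c("tag", "freq")
```

Because each pair is handled by an independent function call, the `lapply()` could later be swapped for `parallel::mclapply()` on a Unix-like machine.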

Thursday, November 1, 2018

yeast PIN ms02 Zscore map


An interesting curve was seen. Only 2 ms02 null networks were used in this plot.


Friday, September 7, 2018

permutation graphs

https://en.wikipedia.org/wiki/Permutation_graph

Algorithmic Graph Theory and Perfect Graphs, Volume 57

2nd Edition

Authors: Martin Golumbic
eBook ISBN: 9780080526966
Hardcover ISBN: 9780444515308
Imprint: North Holland
Published Date: 4th February 2004
Page Count: 340

Monday, January 29, 2018

possible ms02 bug and how to fix it.

Update on 20180202: it turns out that the input network contains redundant pairs. So the input data need to be checked for consistency, controlled by a switch such as "-input-check 1".

In the recursive implementation, it is possible that a recursive call returns pairs that already existed in previous iterations.

This may explain why permutations of large networks give slightly different random networks.

One way to fix this is to do book-keeping on the entire network (instead of on a small chunk, as in the recursive definition). This may be implemented as a separate function call that breaks self-pairs and reassigns partners.
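
Both the input check and the network-wide book-keeping come down to finding self-pairs and redundant pairs over the whole pair table. A minimal sketch, assuming an undirected network stored as a two-column data frame; `find_problem_pairs` is a name made up here and the toy rows are only for illustration.

```r
find_problem_pairs <- function(pairs) {
  a <- as.character(pairs[[1]])
  b <- as.character(pairs[[2]])
  # Order-independent key, so that A-B and B-A are treated as the same pair.
  key <- ifelse(a < b, paste(a, b, sep = "|"), paste(b, a, sep = "|"))
  list(self = which(a == b), redundant = which(duplicated(key)))
}

toy <- data.frame(ORF1 = c("YAL001C", "YAL002W", "YAL003W", "YAL005C"),
                  ORF2 = c("YAL002W", "YAL001C", "YAL003W", "YAL001C"),
                  stringsAsFactors = FALSE)

problems <- find_problem_pairs(toy)
bad_rows <- unique(c(problems$self, problems$redundant))
clean <- toy[setdiff(seq_len(nrow(toy)), bad_rows), ]
```

Running the same check on a permuted network would show whether the permutation itself reintroduced duplicates or self-pairs.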




Wednesday, November 29, 2017

p=0.001 ridgeside, rls pairwise difference in yeast biogrid PPI


rm(list=ls())
#setwd("~/github/0.network.aging.ms02/1.Fraser02")
setwd("/home/hqin/github/network.aging.configuration/1.Fraser02")
source("../network.r")
set.seed(2017)
debug = 1; 
start_time = Sys.time();
list.files(path="../data/")
## [1] "ken-RLS-byORF.csv"                             
## [2] "SummaryRegressionHetHomFactorized2015Oct13.csv"
## [3] "unique_biogrid_ScePPI.csv"
rls = read.csv("../data/ken-RLS-byORF.csv");
biogrid = read.csv("../data/unique_biogrid_ScePPI.csv");
fit = read.csv("../data/SummaryRegressionHetHomFactorized2015Oct13.csv")
ppi = biogrid[, c("Systematic.Name.Interactor.A","Systematic.Name.Interactor.B")];
names(ppi) = c("ORF1", "ORF2" )
#First, define a function to calculate the mean absolute RLS difference in pairs of proteins
 diff.RLS = function( inpairs ) {
   inpairs$rls1 = rls$avgLS[match( inpairs$ORF1, rls$ORF ) ];
   inpairs$rls2 = rls$avgLS[match( inpairs$ORF2, rls$ORF ) ];
   
   inpairs$essen1 = fit$essenflag[match(inpairs$ORF1, fit$orf)];
   inpairs$essen2 = fit$essenflag[match(inpairs$ORF2, fit$orf)];
   
   inpairs$rls1 = ifelse( inpairs$essen1=='essential', 0, inpairs$rls1);
   inpairs$rls2 = ifelse( inpairs$essen2=='essential', 0, inpairs$rls2);
   
   ret = mean( abs( inpairs$rls1 - inpairs$rls2 ), na.rm=T );
 } 
 # calculate the observed difference in RLS
 diff.RLS.obs = diff.RLS ( ppi );
 paste( "Observed deltaRLS = ", diff.RLS.obs); 
## [1] "Observed deltaRLS =  12.759632586095"
#permutation of pairs, and their difference in RLS
 Nsims = 1000; #number of permutations
 permutated.diff.RLS = numeric( Nsims ); #empty vector to store calculations

library(foreach)
library(doMC)
## Loading required package: iterators
## Loading required package: parallel
registerDoMC(cores=8) #Intel i7 has 6 cores, Xeon E5-2603 @ridgeside has 8 cores

permutated.diff.RLS = foreach(i=1:Nsims) %dopar% {
   new.pairs = ms02_singlerun(ppi ) #generate a new MS02 random network
   new.pairs = new.pairs[,1:2] #reformating into two-columns
   names(new.pairs) = c("ORF1", "ORF2")
   diff.RLS( new.pairs ); 
  }
#p-value
permutated.diff.RLS = unlist(permutated.diff.RLS)

summary(permutated.diff.RLS)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   14.09   14.20   14.22   14.22   14.25   14.39
sub = permutated.diff.RLS[ permutated.diff.RLS < diff.RLS.obs ]
paste("pvalue = ", length(sub)/Nsims)
## [1] "pvalue =  0"
hist(permutated.diff.RLS)
stop_time = Sys.time()
paste( "running time = ", stop_time - start_time) 
## [1] "running time =  18.9267802158992"
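
Two small notes on the output above, as a sketch. A permutation p-value of exactly 0 is better reported with the +1 correction, so that with 1000 permutations the smallest possible value is 1/1001. Also, pasting a time difference drops its units, so it helps to fix them explicitly. `empirical_p` is a name made up here, and `toy_null` holds made-up values, not the results of the real run.

```r
# One-sided empirical p-value with the +1 correction, so it is never exactly 0.
empirical_p <- function(null_values, observed) {
  (sum(null_values <= observed) + 1) / (length(null_values) + 1)
}

set.seed(1)
toy_null <- rnorm(1000, mean = 14.2, sd = 0.05)  # made-up null values
p <- empirical_p(toy_null, 12.76)

# Report elapsed time with explicit units.
start_time <- Sys.time()
elapsed <- difftime(Sys.time(), start_time, units = "mins")
label <- paste("running time =", format(elapsed))
```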

Tuesday, November 28, 2017

RLS P-value evaluation using ms02 permutation

rm(list=ls())
setwd("~/github/0.network.aging.ms02/1.Fraser02")
source("../network.r")
set.seed(2017)
debug = 1; 
list.files(path="../data/")
## [1] "ken-RLS-byORF.csv"                             
## [2] "SummaryRegressionHetHomFactorized2015Oct13.csv"
## [3] "unique_biogrid_ScePPI.csv"
rls = read.csv("../data/ken-RLS-byORF.csv");
biogrid = read.csv("../data/unique_biogrid_ScePPI.csv");
fit = read.csv("../data/SummaryRegressionHetHomFactorized2015Oct13.csv")
ppi = biogrid[, c("Systematic.Name.Interactor.A","Systematic.Name.Interactor.B")];
names(ppi) = c("ORF1", "ORF2" )
#First, define a function to calculate the mean absolute RLS difference in pairs of proteins
 diff.RLS = function( inpairs ) {
   inpairs$rls1 = rls$avgLS[match( inpairs$ORF1, rls$ORF ) ];
   inpairs$rls2 = rls$avgLS[match( inpairs$ORF2, rls$ORF ) ];
   
   inpairs$essen1 = fit$essenflag[match(inpairs$ORF1, fit$orf)];
   inpairs$essen2 = fit$essenflag[match(inpairs$ORF2, fit$orf)];
   
   inpairs$rls1 = ifelse( inpairs$essen1=='essential', 0, inpairs$rls1);
   inpairs$rls2 = ifelse( inpairs$essen2=='essential', 0, inpairs$rls2);
   
   ret = mean( abs( inpairs$rls1 - inpairs$rls2 ), na.rm=T );
 } 
 # calculate the observed difference in RLS
 diff.RLS.obs = diff.RLS ( ppi );
 paste( "Observed deltaRLS = ", diff.RLS.obs); 
## [1] "Observed deltaRLS =  12.759632586095"
#permutation of pairs, and their difference in RLS
 Nsims = 100; #number of permutations
 permutated.diff.RLS = numeric( Nsims ); #empty vector to store calculations

library(foreach)
library(doMC)
## Loading required package: iterators
## Loading required package: parallel
registerDoMC(cores=5) #Intel i7 has 6 cores

permutated.diff.RLS = foreach(i=1:Nsims) %dopar% {
   new.pairs = ms02_singlerun(ppi ) #generate a new MS02 random network
   new.pairs = new.pairs[,1:2] #reformating into two-columns
   names(new.pairs) = c("ORF1", "ORF2")
   diff.RLS( new.pairs ); 
  }
#p-value
permutated.diff.RLS = unlist(permutated.diff.RLS)

summary(permutated.diff.RLS)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   14.14   14.20   14.23   14.23   14.25   14.31
sub = permutated.diff.RLS[ permutated.diff.RLS < diff.RLS.obs ]
paste("pvalue = ", length(sub)/Nsims)
## [1] "pvalue =  0"
hist(permutated.diff.RLS)
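
For reference, the whole permutation test can be tried end to end on toy data without `network.r`. This is only a sketch: `shuffle_partners` is a naive stand-in that shuffles the second column and is NOT ms02, `mean_abs_diff` is a simplified version of `diff.RLS` without the essential-gene step, and all the toy values are randomly generated.

```r
set.seed(2017)
genes <- paste0("g", 1:30)
toy_rls <- data.frame(ORF = genes, avgLS = runif(30, 10, 40),
                      stringsAsFactors = FALSE)
toy_ppi <- data.frame(ORF1 = sample(genes, 60, replace = TRUE),
                      ORF2 = sample(genes, 60, replace = TRUE),
                      stringsAsFactors = FALSE)

# Mean absolute lifespan difference between interacting partners.
mean_abs_diff <- function(pairs, rls) {
  v1 <- rls$avgLS[match(pairs$ORF1, rls$ORF)]
  v2 <- rls$avgLS[match(pairs$ORF2, rls$ORF)]
  mean(abs(v1 - v2), na.rm = TRUE)
}

# NOT ms02: a naive stand-in that only shuffles the second column.
shuffle_partners <- function(pairs) {
  pairs$ORF2 <- sample(pairs$ORF2)
  pairs
}

obs <- mean_abs_diff(toy_ppi, toy_rls)
null <- replicate(200, mean_abs_diff(shuffle_partners(toy_ppi), toy_rls))

# Z-score of the observed value against the null distribution.
z <- (obs - mean(null)) / sd(null)
```

Since the toy network is random, `z` should usually land near 0 here; the point is only the shape of the workflow.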