Friday, March 15, 2013

Power law, Zipf, Pareto distribution, and gene network

Power law distribution refers to the general form of y = a * x^ k.

In gene interaction networks with infinite number of genes, the power law distribution of connectivity is
       (Eq 1)
where k is the connectivity, and Z(gamma) is the Riemann zeta function.  This definition implies that when gamma <=3, the mean does not exists because its distribution has a infinite variance.


Zipf law is defined in a finite population with N number of elements with discrete values.
The above is basically the probability mass function. In this form, the variance is always finite. So, the theoretic property is different from Eq 1 definition. 

Zipf law can be generalized as the Zipf–Mandelbrot law.
where k is the rank of data, and q and s are distribution parameters. This generalized Zipf-Mandelbrot law also covers the specialized form for a 'pure' gene/protein network in power-law configuration as in Eq 1.


In R, a package zipfR is available.  See discussion http://pidgin.ucsd.edu/pipermail/r-lang/2007-July/000047.html and http://zipfr.r-forge.r-project.org/.

library(zipfR)
Z <- lnre("zm", alpha=.8, B=1e-3)
hist( as.integer(rlnre(Z, 1000)))

As an alternative, the "rmutil" package by Jim Lindsey provides Pareto distributions.

It seems netmodels  also provide power-law distribution, but this package become orphaned. 

library(igraph)
data(test.net,package="netmodels")
dist <- degree(test.net)
alpha <- calc.alpha(dist)
dist.power.law(alpha,5)


Pareto distribution is defined for continuous variable x. It is seemed to designed to describe the tail distribution in economics. It cumulative distribution function given by wikipedia is:












Its PDF is
in which, alpha and xm^alpha are constants.

Zipf distribution can be viewed as a discrete Pareto distribution. So, I can use 'rpareto{extremevalues}'  or 'rpareto{VGAM}' to generate the randome number and then discretize them. (See http://tuvalu.santafe.edu/~aaronc/powerlaws/plfit.r).

Rich Wash wrote some R scripts to provide dpowerlaw(), ppowerlaw(), rpowerlaw() functions.
http://code.google.com/p/evaltools/source/browse/trunk/lib/powerlaw.r?spec=svn11&r=11

The igraph package provide a power-law fit function that seems to uses mle().



References and links:
http://tuvalu.santafe.edu/~aaronc/powerlaws
http://tuvalu.santafe.edu/~aaronc/powerlaws/plfit.r
http://code.google.com/p/evaltools/source/browse/trunk/lib/powerlaw.r?spec=svn11&r=11


No comments:

Post a Comment