Friday, January 4, 2013

CONSEL result interpretation and reformat using perl

From "program.pdf" in CONSEL package,

2.4 catpv
The p-values calculated by consel are stored in the pv ¯le, and its contents are
shown by the command catpv. For example,
 

catpv foo
 

reads foo.pv and shows the contents like below.
 

# reading foo.pv
# rank item obs au np | bp kh sh wkh wsh |
# 1 1 -2.7 0.789 0.575 | 0.579 0.639 0.944 0.639 0.948 |

# 2 3 2.7 0.516 0.318 | 0.312 0.361 0.799 0.361 0.791 |
# 3 2 7.4 0.114 0.037 | 0.036 0.122 0.575 0.122 0.422 |
# 4 5 17.6 0.076 0.014 | 0.013 0.044 0.178 0.044 0.210 |


'au' The p-value of the approximately unbiased test. This is the main result of
consel, while rest of the p-values in the table are supplementary. It is derived
from the theory of the signed distance and the curvature of the boundary
as explained later. Let us denote it AUi for the item i. We may reject the
possibility that item i has the largest expected value of Yi when AUi < 0.05
at the significance level 0.05. 


 In its foo example, the trees are ranked based on Y_i test statistic. Difference of Y_i, \delta Y_i is shown in the report. I noticed that au p-value are often the highest for the top ranked tree. When a tree's au pvalue is less than 0.05, this tree can be considered as rejected.  Y_i statistic and p-value orders can give different orders, and this can be see in the foo example.

The following perl code can convert CONSEL report to a tab-delimited file for R.

use strict; use warnings;
use lib '/Users/hongqin/lib/perl'; use Util;

my $debug= 2 ;

my $inclusterfile = '_test.csv';
my $sourcedir = '~coat/ortholog.analysis/essetial.mcl.rbh.090208/bacillus.phylip.I3.1/wkdir';

open (IN,"<$inclusterfile"); my @lines = <IN>; close (IN); shift @lines;
my @clusterdirs = ();
foreach my $line (@lines) {
  chomp $line;
  my ($num, $bg, @res) = split( /\t/, $line);
  print "[$num][$bg]------";
  push (@clusterdirs, $bg);
}

open ( OUT, ">_conselreport.34essenGene.2012Jan4.tab" );
print OUT "BG\trank\titem\tobs\tau\tnp\tbp\tpp\tkh\tsh\twkh\twsh\n";

my $count = 0;
foreach my $mydir ( @clusterdirs ) {
  chomp $mydir;
  if ( !( -e "wkdir/$mydir/$mydir.consel.report.txt" ) ) {
    print "Error: $mydir has no consel report";
  } else {
    $count ++; print "Cluster:$count I am now working on [$mydir]::\n";
      open (IN, "<wkdir/$mydir/$mydir.consel.report.txt"); @lines = <IN>; close (IN);
      shift @lines; shift @lines; shift @lines;  #remove the first three lines
      pop @lines; #remove the last empty line
      foreach my $line (@lines) {
         $line =~ s/^#\s+//o;
         $line =~ s/\|//g;
         chomp $line;
         my @els = split( /\s+/, $line);
         print OUT $mydir;
         foreach my $el (@els) { print OUT "\t$el"; }
         print OUT "\n";    
      }  }
}

close (OUT);

exit;

No comments:

Post a Comment