Thursday, November 16, 2017

Cleaning the BioGRID yeast tab2 file

applejack:biogrid hqin$ ll
total 126248
-rw-r--r--@  1 hqin  staff   713B Nov 16 13:36
-rw-r--r--   1 hqin  staff   385B Nov 16 11:44 1analye_biogrid.Rmd
drwx------@ 65 hqin  staff   2.2K Nov 16 11:41 BIOGRID-ORGANISM-3.4.154.tab2
-rw-r--r--@  1 hqin  staff    62M Nov 16 11:24
drwxr-xr-x   5 hqin  staff   170B Nov 16 13:34 test

applejack:biogrid hqin$ perl BIOGRID-ORGANISM-3.4.154.tab2/BIOGRID-ORGANISM-Saccharomyces_cerevisiae_S288c-3.4.154.tab2.txt > BIOGRID-ORGANISM-Saccharomyces_cerevisiae_S288c-3.4.154.tab2.cleaned.txt

When I tried to load the file into R with read.table, only about half of the table was loaded.

I then tried the "Text to Columns" feature in Excel, and it worked: all 688014 rows were converted. I saved the result as a csv file, and this csv file loads into R correctly (688014 obs. of 36 variables).
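
A possible explanation for the partial load, which I have not verified on this file: by default read.table treats both ' and " as quote characters and # as a comment character, so an unmatched apostrophe or a # inside a free-text field can swallow or truncate rows. The sketch below uses a made-up toy file (the column names and values are hypothetical, not taken from BioGRID) to show how turning off quote and comment handling, and checking the row count against the raw line count, might avoid the Excel round trip.

```r
# Hypothetical toy file: tab-separated, with apostrophes and a '#' inside fields,
# the kind of characters that can appear in free-text columns.
toy <- tempfile(fileext = ".txt")
writeLines(c(
  "id\tgene\tnote",
  "1\tYFL039C\t3' end processing",
  "2\tYAL001C\tscore #1",
  "3\tYBR160W\t5' cap binding"
), toy)

# Treat quotes and '#' as ordinary characters instead of special ones.
tb <- read.table(toy, header = TRUE, sep = "\t", quote = "",
                 comment.char = "", stringsAsFactors = FALSE)

# Sanity checks: rows loaded vs. raw lines (minus the header),
# and the number of fields found on each line.
n_lines <- length(readLines(toy)) - 1
n_fields <- count.fields(toy, sep = "\t", quote = "", comment.char = "")
stopifnot(nrow(tb) == n_lines)
print(table(n_fields))
```

If the same settings were applied to the real tab2 file, comparing nrow() with the line count would show right away whether any rows are still missing.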

