Wednesday, January 16, 2013

Some common bugs in bioinformatics programming and lessons in dealing with them

Many ad hoc scripts are written for bioinformatics projects. I intend to list here some common bugs that I have often made. This is going to be a working list. Hopefully, it can lead to better ways of coding and managing projects.
  • Hard-coded file paths that were not updated after the project evolved. This often leads to reading the wrong files or writing results to unintended directories.
  • A hard-coded debugging variable that was not switched back off before production runs.
  • Mix-ups of variable names, file names, etc.
  • Compatibility problems. These can occur after software upgrades. For example, after upgrading to Perl 5.10.0, I had to reinstall BioPerl to get my previous code working.
  • File format problems. With the myriad of data formats, this problem is going to keep bugging us.
  • Typos.
  • Logical mistakes, which often occur in if/else statements.
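The first two bugs in the list can often be caught early by taking paths and the debug switch from the command line instead of hard-coding them. The following is only a minimal sketch: the helper name parse_settings and the options --debug and --outdir are made up for illustration and are not from any of my actual scripts.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: read settings from the command line rather than
# hard-coding them, and refuse to run if the output directory is missing.
sub parse_settings {
    my @args = @_;
    my %opt = (debug => 0, outdir => undef);   # debug is OFF unless requested
    GetOptionsFromArray(
        \@args,
        'debug!'   => \$opt{debug},    # --debug / --nodebug
        'outdir=s' => \$opt{outdir},   # --outdir PATH
    ) or die "Bad command-line options\n";
    die "--outdir is required\n" unless defined $opt{outdir};
    die "Output directory '$opt{outdir}' does not exist\n" unless -d $opt{outdir};
    return \%opt;
}

# Only parse when arguments were actually given on the command line.
if (@ARGV) {
    my $settings = parse_settings(@ARGV);
    print "debug=$settings->{debug} outdir=$settings->{outdir}\n";
}
```

Because debug defaults to off, a production run cannot inherit a forgotten debugging value, and a stale directory stops the script before any results are written.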
Some experiences and lessons on dealing with these problems:
  • Correct the blind spot.
    I spent 3 hours fixing a directory problem in a Perl script for a batch run. I noticed the job was not running in the right directory even though I called chdir $homedir at every step. The problem was that I had pasted the directory path twice into the variable $homedir, so the path did not exist, chdir failed silently, and the script kept running in the current directory. I was so sure that $homedir was correct, because I had copy-pasted it, that I did not check it. I found this out when I copy-pasted the long directory path again and saw that the lengths did not match. I had spent 2 hours on this before 1am, then decided to go to sleep and look at it afresh in the morning. The fresh morning energy helped me spot the error.
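    A bug like the chdir one above surfaces immediately if the return value of chdir is checked. Here is a minimal sketch; the helper name chdir_or_die and the doubled path are hypothetical stand-ins, not the ones from my script.

```perl
use strict;
use warnings;

# Hypothetical helper: change directory, or stop at once and say why.
# chdir returns false on failure and sets $! to the system error.
sub chdir_or_die {
    my ($dir) = @_;
    chdir $dir or die "Cannot chdir to '$dir': $!\n";
    return 1;
}

# A made-up path that was "pasted twice", as in the story above.
my $homedir = '/no/such/project/no/such/project';

# eval is used here only so the demonstration can report the error
# and carry on; in a batch script you would normally let it die.
eval { chdir_or_die($homedir) };
print "Caught: $@" if $@;
```

    With the check in place, the script reports the bad path on the first chdir instead of quietly working in the current directory.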

  • Switching between different syntaxes.
In Perl, qw and comma-separated lists of quoted strings use different syntax. In qw, the items are separated by whitespace and no comma is needed.
 case 1: qw(results.H0.txt results.H1C.Gblocks.model1.txt results.H2C1S1.txt);
 case 2: qw(results.H0.txt, results.H1C.Gblocks.model1.txt, results.H2C1S1.txt);
In case 2, the first file name will actually be treated as "results.H0.txt," with the comma included. This extra comma is an obvious mistake.
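The two cases can be compared directly in a short script. This is only a sketch: the array names @good and @bad are mine. Under use warnings, Perl flags case 2 at compile time with "Possible attempt to separate words with commas", so that warning is switched off in a small block here to keep the demonstration quiet.

```perl
use strict;
use warnings;

# Case 1: whitespace-separated, no commas.
my @good = qw(results.H0.txt results.H1C.Gblocks.model1.txt results.H2C1S1.txt);

# Case 2: the commas become part of the file names.
my @bad;
{
    no warnings 'qw';   # silence the comma warning for this demo only
    @bad = qw(results.H0.txt, results.H1C.Gblocks.model1.txt, results.H2C1S1.txt);
}

print "good: '$good[0]'\n";   # good: 'results.H0.txt'
print "bad:  '$bad[0]'\n";    # bad:  'results.H0.txt,'
```

Running scripts with use warnings enabled is the cheapest way to catch this, since Perl points at the offending qw line before the script even starts.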



