Wednesday, March 18, 2009


Nowadays, I've tried to build my own perl pacakges selections on routine bioinformatics tasks.
Althrough BioPerl is already out there, it usually doesn't have modules for my own purpose.
For example, a package for learning PAML packages from A to Z, not requiring specific formatting to run the BioPerl. Well, so I started to build my own perl packages.

I named the base package name 'BioTool' and added packages whenever I confronted any situation
that I found some tasks might be used repeatly and routinely.

Recenlty, I made 'BioTool::NCBIfetch' package, which can extract sequences and all the related references db information including Gene symbol, Gene description, Chromosome location, Ensembl, Unigene, Uniprot, KEGG and GO for a given gene query.

Since the package fetches the information from result of NCBI eutils query, which results with up-to-date information, the analysis result for a given gene is up-to-dated. So users don't need to worried about whether the reference data is out-dated when they use a program which works the same taks based on localized data. In short, users could free themselves from updating all related indenpent databases day to day.

I'll show some example usage of this package

use BioTool::NCBIfetch;
my $ncbi=BioTool::NCBIfetch->new;
$fetch->set_query(gene,780,xml); # Input : search DB, ID(geneID, gID), result type, result format type)

my $symbol=$fetch->get_symbol;
my @TreEMBL=$fetch->get_uniprotTreEMBL;

print "$symbol\t@TreEMBL\n";

This package might be very helpful for ones who want to cross mapping between major biology databases based on NCBI gene identifier. I actually made it for cross-mapping genes in GEO platform since GEO platform annotation is frequently incomplete and inaccurate.

If anyone interests in this pacakge and want to use, request!