Friday, March 27, 2009

Caveats when using symbolic references


$x=10;
$sym='x';
$$sym=20;

print "$x\n";



If you run the code above, you can see that $x prints 20.
A symbolic reference is applied here: because $sym = 'x',
${$sym} is read as ${x}, that is $x (the undeclared package variable $main::x),
so $$sym=20 is interpreted as $x=20.
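A small check of that chain, as a sketch: taking a reference with \ and comparing references numerically compares their addresses, so if $$sym, $x and the fully qualified $main::x are one and the same scalar, the references are equal. Like the original snippet, it runs without `use strict`, because strict refs would reject $$sym. The variable $same is introduced only for this example.

```perl
# Runs without 'use strict', like the snippet above (strict refs forbids $$sym).
$x    = 10;
$sym  = 'x';
$$sym = 20;    # symbolic reference: the string 'x' names the package variable $main::x

# \ takes a reference; == on plain references compares addresses, so a true
# result means the symbolic lookup, the bare name and the qualified name
# all denote the same scalar.
my $same = (\$$sym == \$x && \$x == \$main::x) ? 1 : 0;

print "same scalar: $same\n";   # same scalar: 1
print "$x\n";                   # 20
```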

So what about the code below?


my $x=10;
my $sym='x';
$$sym=20;

print "$x\n";


Will it print the same value as the first snippet?
The answer is NO!
This code prints 10, the value given to $x on its first line.

The explanation lies in how 'my' and symbolic references work.
Lexical variables declared with my are not stored in the symbol table,
and symbolic references apply only to package variables, never to lexicals.
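One way to see this directly, as a sketch: the symbol table of package main is the hash %main::, and exists on one of its keys reports whether a name has an entry there without creating one. The variable names below are made up for the example.

```perl
use strict;
use warnings;

our $pkg_x = 10;   # package variable: gets an entry (a glob) in %main::
my  $lex_x = 10;   # lexical variable: lives in a scratchpad, not in %main::

# exists on a stash key does not autovivify the entry
my $pkg_in_table = exists $main::{pkg_x} ? 1 : 0;
my $lex_in_table = exists $main::{lex_x} ? 1 : 0;

print "pkg_x in symbol table: $pkg_in_table\n";   # 1
print "lex_x in symbol table: $lex_in_table\n";   # 0
```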


So if you declare the variable in the second snippet with local or our instead of my, you get the same result as the first snippet.
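A sketch of the our variant written under `use strict`, which the original snippets do not use: strict refs forbids symbolic references, so the dereference is wrapped in a block with `no strict 'refs'`.

```perl
use strict;
use warnings;

our $x   = 10;    # package variable $main::x instead of a lexical
my  $sym = 'x';

{
    no strict 'refs';   # symbolic references are rejected under 'use strict'
    $$sym = 20;         # the string 'x' is looked up in the symbol table: $main::x
}

print "$x\n";     # 20
```

In the non-strict style of the original snippets, `local $x=10;` behaves the same way here, since local also operates on the package variable $main::x.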


my $x=10;
$sym='x';
$$sym=20;

print "$x\n";


What about this code?
$$sym is interpreted as ${x} = $x, so the variable the symbolic reference
ultimately targets is $x. But $x is declared with my, so it is
not stored in the symbol table.

Therefore $$sym=20 cannot affect the lexical $x,
and the code ends up printing 10 for $x.

So is $$sym=20 completely meaningless?

No. $$sym did exactly what it always does:
it stored 20 in the package global $x, which lives in the
symbol table.

To confirm this, print the package variable $x as well.


my $x=10;
$sym='x';
$$sym=20;

print "Lexical variable: $x\n"; # prints 10
print "Package variable: $::x\n"; # prints 20
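If the goal is to modify a lexical indirectly, a hard reference created with \ does work, because it points at the variable itself instead of going through the symbol table. A minimal sketch, with $ref as an illustrative name:

```perl
use strict;
use warnings;

my $x   = 10;
my $ref = \$x;    # hard reference to the lexical $x (no symbol-table lookup)
$$ref   = 20;     # writes through the reference into the lexical itself

print "Lexical variable: $x\n";   # 20
```

Unlike $$sym, this also runs unchanged under `use strict`.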

Wednesday, March 18, 2009

BioTool::NCBIfetch

Lately, I've been building my own selection of Perl packages for routine bioinformatics tasks.
Although BioPerl is already out there, it often doesn't have modules for my particular purposes.
For example, I wanted a package that runs PAML from A to Z without requiring the specific input formatting that BioPerl needs. So I started building my own Perl packages.

I named the base package 'BioTool' and add a new package whenever I come across
a task that is likely to be repeated routinely.

Recently, I made the 'BioTool::NCBIfetch' package, which extracts sequences and all related reference database information, including gene symbol, gene description, chromosome location, Ensembl, UniGene, UniProt, KEGG and GO, for a given gene query.

Since the package fetches its information from NCBI E-utilities query results, which are always current, the analysis result for a given gene is up to date. Users therefore don't need to worry about outdated reference data, as they would with a program that does the same task on locally stored data. In short, users are freed from updating every related independent database day to day.
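To illustrate the kind of request involved, here is a minimal sketch, not BioTool::NCBIfetch's actual code, that asks NCBI E-utilities efetch for one Entrez Gene record as XML and pulls out the gene symbol. The helper names efetch_gene_url and symbol_from_gene_xml are hypothetical, and the regex assumes the first Gene-ref_locus element in the record holds the official symbol; real code should use an XML parser and follow NCBI's E-utilities usage guidelines.

```perl
use strict;
use warnings;
use HTTP::Tiny;   # core since Perl 5.14; HTTPS additionally needs IO::Socket::SSL

# Hypothetical helper: builds an efetch URL returning one Entrez Gene record as XML.
sub efetch_gene_url {
    my ($gene_id) = @_;
    return 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi'
         . "?db=gene&id=$gene_id&retmode=xml";
}

# Hypothetical helper: extracts the first <Gene-ref_locus> value (the symbol).
# A regex is enough for a sketch; it is not a general XML parser.
sub symbol_from_gene_xml {
    my ($xml) = @_;
    return $xml =~ m{<Gene-ref_locus>([^<]+)</Gene-ref_locus>} ? $1 : undef;
}

# Only touch the network when a gene ID is given, e.g. `perl efetch_sketch.pl 780`.
if (@ARGV) {
    my $res = HTTP::Tiny->new->get(efetch_gene_url($ARGV[0]));
    die "efetch failed: $res->{status} $res->{reason}\n" unless $res->{success};
    my $symbol = symbol_from_gene_xml($res->{content});
    print defined $symbol ? "$symbol\n" : "symbol not found\n";
}
```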

Here is an example of how to use the package:


use strict;
use warnings;
use BioTool::NCBIfetch;

my $fetch = BioTool::NCBIfetch->new;
$fetch->set_query('gene', 780, 'xml'); # input: search DB, ID (gene ID or GI), result format

$fetch->get_result;
my $symbol  = $fetch->get_symbol;
my @TreEMBL = $fetch->get_uniprotTreEMBL;

print "$symbol\t@TreEMBL\n";



This package might be very helpful for anyone who wants to cross-map between major biology databases based on NCBI gene identifiers. I actually made it to cross-map genes on GEO platforms, since GEO platform annotation is frequently incomplete and inaccurate.

If you are interested in this package and want to use it, just ask!

Tuesday, March 10, 2009

Economic crisis & public data deposition in GEO

Has the stunning global economic crisis that began in late 2007 pushed governments to cut their budgets for scientific research? If you are an independent researcher, you have probably already felt how serious it is. If not, you may not know its actual impact on scientific research.

Out of simple curiosity, I tried to answer this question by analyzing the trend of data deposition to GEO, the largest public gene expression database. Since the number of gene expression datasets deposited has been increasing ever since GEO was created, I hypothesized that if the global economic crisis really led to cuts in science budgets, deposition would stop increasing in 2008.




Look at the table above! As I expected, the number of datasets deposited to GEO decreased slightly in 2008. The number had increased continuously from 2001 through 2007, but this trend stopped in 2008.



The same trend appears in each of the top 5 countries by science budget. By the way, the USA is a truly huge player in basic science. Look at the size of its data! The US alone deposited several times more data than the other four countries combined. Anyway, back to the main point: I think this trend reflects the effect of the economic crisis in shrinking each government's science budget.




Excluding the USA from the chart reveals another interesting point: the UK. Deposition from the UK started decreasing in 2007, not 2008. Its contribution to public data peaked in 2006, but the 2007 number was less than half of that. What happened in the UK at that time? I don't have any evidence on this. My guess is that the UK changed its research plans for microarray-based biology, since microarrays generally lack reproducibility and consistency. Or there may have been a big scientific project that produced microarray data only in 2006.

Among the 5 countries mentioned here, Germany showed the smallest decrease in deposited
data. It deposited only 21 fewer datasets than the previous year (14%), while the other four countries' numbers fell by around 50%.
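The percentages here are year-over-year changes. Written out, with $n_y$ the number of datasets a country deposited in year $y$, and assuming the 14% is measured against the 2007 count, Germany's figures imply roughly 150 datasets in 2007:

```latex
\Delta_{2008} = \frac{n_{2008} - n_{2007}}{n_{2007}} \times 100\%,
\qquad
\text{Germany: } \frac{-21}{n_{2007}} \approx -0.14
\;\Rightarrow\; n_{2007} \approx \frac{21}{0.14} = 150
```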

Among the small contributors with fewer than 50 depositions per year, some countries seem unaffected by the economic crisis. South Korea is one example: its deposition increased from 16 in 2007 to 19 in 2008. As a researcher in Korea, I don't think this is because the Korean government did not cut its science budget; rather, most researchers who produce microarray data either don't know about GEO at all or don't consider depositing their data. Why? Most of them are wet-lab biologists, so they rarely use public data in their own research. Even when they want to contribute, depositing data to GEO easily or automatically requires some computational skill, and without any programming skill the process is very tedious. So three more deposited datasets do not reflect the actual trend of Korea's science budget in the economic crisis. I suspect the other minor contributing countries are in a similar situation.

That's all for today! In one sentence: the global economic crisis does seem to be forcing governments to cut their science budgets.