Wednesday, February 4, 2009

Lookaround in Perl

Lookaround, a regular-expression construct in Perl, can be used as an anchor in a regular expression. By an anchor I mean a point in the string from which Perl continues matching the rest of the regular expression; the anchor itself matches a position, not any characters.

Examples of anchors in Perl:

'great job boy'=~/^g/; # '^' anchors the match at the start of the string
'great job boy'=~/y$/; # '$' anchors the match at the end of the string
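A small sketch to check that an anchor matches a position rather than characters. It is not one of the original examples: it assumes Perl's special array @-, where $-[0] holds the offset at which the last successful match started, and uses $s only as an illustrative variable name.

```perl
use strict;
use warnings;

my $s = 'great job boy';

# Each anchor matches a position, so the matched text $& is empty.
# $-[0] is the offset in $s where the match begins.
if ($s =~ /^/) {
    printf "^ matches at offset %d, text '%s'\n", $-[0], $&;
}
if ($s =~ /$/) {
    printf "\$ matches at offset %d, text '%s'\n", $-[0], $&;
}
```

Both lines report an empty matched text: '^' sits at offset 0 and '$' at offset 13, the end of the 13-character string.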

Lookaround can be used in a similar way, as below:

'great job boy'=~/(?=job).*boy/; # (?=pattern) is positive lookahead, a form of lookaround
'great job boy'=~/(?!job).*boy/; # (?!pattern) is negative lookahead, a form of lookaround
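To see where each of these two matches actually starts, here is a runnable sketch. Printing $-[0] (start offset of the match) and $& (matched text) is my addition for illustration, not part of the original examples.

```perl
use strict;
use warnings;

my $s = 'great job boy';

# Positive lookahead: succeeds only where 'job' comes next,
# so the match cannot start before offset 6 (the 'j').
if ($s =~ /(?=job).*boy/) {
    printf "positive: starts at %d, matched '%s'\n", $-[0], $&;
}

# Negative lookahead: succeeds wherever 'job' does NOT come next,
# which is already true at offset 0 ('great ...').
if ($s =~ /(?!job).*boy/) {
    printf "negative: starts at %d, matched '%s'\n", $-[0], $&;
}
```

The positive version starts at offset 6 and matches 'job boy'; the negative version starts at offset 0 and matches the whole string.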

So how does the lookaround example above work?

(?=job) anchors the match at the position of the 'j' in 'job'.

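In other words, Perl tries each starting position in 'great job boy' from left to right. The lookahead fails at every position until it reaches the 'j', and only from there does the rest of the pattern (.*boy) search the remaining 'job boy'. The lookahead itself consumes no characters.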
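The left-to-right scan can be observed with a //g loop that records every offset where the lookahead succeeds. This is a sketch of my own; it relies on pos(), which returns the offset where the last //g match ended, and on the fact that Perl will not accept two zero-length matches at the same offset in a row, so the loop always advances.

```perl
use strict;
use warnings;

my $s = 'great job boy';

# Try the zero-width lookahead (?=job) across the whole string and
# record every offset where it succeeds.
my @offsets;
while ($s =~ /(?=job)/g) {
    push @offsets, pos($s);
    print "(?=job) succeeds at offset ", pos($s), " and consumes '$&'\n";
}
print "all offsets: @offsets\n";
```

It succeeds only once, at offset 6, and consumes the empty string.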

Does the regular expression below match, then?

'great job boy'=~/(?=job)boy/ ;

The answer is NO!!!

As I mentioned above, lookaround works as an anchor, like '^'.
So (?=job) anchors at the position of the 'j' in 'job'. Because the
lookahead consumes nothing, the first character that must be matched
after anchoring is still the 'j' of 'job', not the 'b' of 'boy' which
follows 'job'. Therefore, it doesn't match!

To let 'boy' match after the (?=job) anchor, the pattern must first skip over 'job ', for example with .* as below:

'great job boy'=~/(?=job).*boy/;
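A quick check of this reasoning, as a sketch: the three patterns are tried against the same string. The middle pattern, (?=job)job, is an extra one I added to show that the character right after the anchor is still the 'j'.

```perl
use strict;
use warnings;

my $s = 'great job boy';

# After (?=job) succeeds at the 'j', nothing has been consumed, so the
# next part of the pattern must match starting at that same 'j'.
for my $pat ('(?=job)boy', '(?=job)job', '(?=job).*boy') {
    if ($s =~ /$pat/) {
        printf "%-14s matches '%s'\n", $pat, $&;
    } else {
        printf "%-14s does not match\n", $pat;
    }
}
```

The first pattern fails; the second matches 'job' and the third matches 'job boy'.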

What about negative lookaround? It is harder to understand.

Guess what happens below.

'YouAre'=~/(?!You)Are/;

Does it match? It matches! Why does it match?
It is tempting to read (?!You)Are as "'Are' not preceded by 'You'",
but (?!You) is a lookahead: it checks the characters that follow
the current position, not the ones before it.

Then where on earth does (?!You) anchor?
At any position where the next characters are not 'You'.
In 'YouAre' that is every position except the one at 'Y': the 'o', the 'u', the 'A', and so on.

Which of those positions is also followed by 'Are'?
Only the position of 'A'! There the text ahead is 'Are', not 'You', so the lookahead succeeds and 'Are' matches. Therefore the regular expression with negative lookaround succeeds.
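The sketch below checks this explanation. It lists every offset where (?!You) succeeds in 'YouAre', shows where (?!You)Are matches, and, for contrast, tries Perl's negative lookbehind (?<!You), which is the construct that really means "not preceded by 'You'". The lookbehind line is my addition and was not in the original examples.

```perl
use strict;
use warnings;

my $s = 'YouAre';

# Offsets where (?!You) succeeds: every offset except 0, because only at
# offset 0 do the next three characters spell 'You'.
my @offsets;
while ($s =~ /(?!You)/g) {
    push @offsets, pos($s);
}
print "(?!You) succeeds at offsets: @offsets\n";    # 1 2 3 4 5 6

# Only offset 3 is followed by 'Are'; the text ahead there is not 'You'.
if ($s =~ /(?!You)Are/) {
    printf "(?!You)Are matches '%s' at offset %d\n", $&, $-[0];
}

# Negative lookbehind checks the text BEFORE the position instead;
# 'Are' is preceded by 'You', so this pattern fails.
print "(?<!You)Are: ", ($s =~ /(?<!You)Are/ ? "matches" : "no match"), "\n";
```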

We can verify where the negative lookahead anchors with the examples below. $` holds the part of the string before the match and $' the part after it.

print "$`\t$'\n" if 'YouAre'=~/(?!You)/;
# Y ouAre

print "$`\t$&\n" if 'YouAre'=~/(?!You)e/; # $& is the matched text
# YouAr e
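Tabs make empty strings hard to spot, so here is a variant (my own sketch, not from the original post) that wraps the prematch, match and postmatch in brackets. A zero-width match then shows up as [], and so does an empty postmatch.

```perl
use strict;
use warnings;

# Print prematch, match and postmatch in brackets so that empty strings
# (a zero-width match, or nothing after the match) are visible.
for my $pat ('(?!You)', '(?!You)e', '(?!You)Are') {
    if ('YouAre' =~ /$pat/) {
        print "$pat: [$`][$&][$']\n";
    }
}
```

This prints [Y][][ouAre], [YouAr][e][] and [You][Are][] for the three patterns.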

# it doesn't match