Old 12-06-2003, 06:20 PM   #12 (permalink)
ratbastid
Darth Papa
 
Location: Yonder
Greediness and multiples

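By default, a regular expression matches "first and most": it takes the leftmost place in the string where it can match, and from there it grabs as much as it can. In other words, regular expressions are greedy. Observe: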

Code:
#!/usr/bin/perl

my $string = 'ab abc abcd abcde';
# .* is greedy: from the first "a" it stretches to the last "c" it can reach
$string =~ /(a.*c)/;
print "$1\n";
We would expect the above code to print out:

ab abc abcd abc

That .* (meaning zero or more of any character) matches greedily: it grabs the longest string it can while still letting the whole expression match.
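
The "first" half of "first and most" is easy to miss, so here's a quick sketch of it. The sample string and the variable names are made up for illustration. The point is that the leftmost possible match wins, even when a longer one is sitting further to the right.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample: a short candidate first, a longer one later.
my $string = 'ac abbbbc';

# b* is greedy, but the leftmost "a" that can produce a match wins.
my ($first) = $string =~ /(ab*c)/;
print "$first\n";    # prints "ac", not the longer "abbbbc"

# With .* the leftmost "a" is allowed to stretch to the last "c".
my ($most) = $string =~ /(a.*c)/;
print "$most\n";     # prints "ac abbbbc"
```

So "greedy" only means "longest from where the match starts", not "longest anywhere in the string".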

Can you control this? Obviously, the potential for runaway matching exists, if you lose your head coding your expression.

Two things to do. One is to use the {x} operator instead of the splat (*) to ask for an exact number of repetitions.

Observe:

Code:
#!/usr/bin/perl

my $string = 'ab abc abcd abcde';
# .{4} means exactly four "any character"s between the "a" and the "c"
$string =~ /(a.{4}c)/;
print "$1\n";
We'd expect that code to print:

ab abc

The {4} in that expression says we want exactly 4 of those "any character" characters. You can follow ANY character with a * or a {x}, by the way. The regex /ab{4}c/ will match the string "abbbbc".
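
Here's a small sketch of that /ab{4}c/ claim, run against a few candidate strings I made up. The {4} only applies to the "b" right in front of it, so three b's is too few and five is too many.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect the candidates that /ab{4}c/ accepts.
my @hits;
for my $candidate ('abbbc', 'abbbbc', 'abbbbbc') {
    push @hits, $candidate if $candidate =~ /ab{4}c/;
}
print "@hits\n";    # prints "abbbbc"
```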

Say you're searching for porn. Easy. Just run the regex /X{3}/.

Here's the next great thing about the {x} operator--it can take two arguments, a minimum and a maximum. It's spelled {x,y} that way.

Code:
#!/usr/bin/perl

my $string = 'ab abc abcd abcde';
# .{6,8} means between six and eight "any character"s, as many as possible
$string =~ /(a.{6,8}c)/;
print "$1\n";
That should print:

ab abc abc

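The {6,8} says "find me between six and eight anythings". It's still greedy within that range: it tries for eight first and only settles for fewer if it has to.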
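
To see how the range changes what gets matched, here's a sketch that runs the same string through three different ranges. The extra ranges and the variable names are mine, just for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = 'ab abc abcd abcde';

# Too tight to work from the first "a", so the match starts at the second one.
my ($short) = $string =~ /(a.{1,3}c)/;
print "$short\n";     # prints "abc"

# Five anythings doesn't land on a "c", so it backs off to four.
my ($medium) = $string =~ /(a.{4,5}c)/;
print "$medium\n";    # prints "ab abc"

# Eight anythings lands right before a "c", so it takes all eight.
my ($long) = $string =~ /(a.{6,8}c)/;
print "$long\n";      # prints "ab abc abc"
```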

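You can leave one of those operands out to say "no limit", too. "{4,}" means "at least four". "{,2}" means "zero, one or two", but only on Perl 5.34 and later; older perls treat "{,2}" as literal text, so "{0,2}" is the safe way to spell it.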
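
A quick sketch of the open-ended forms, using sample strings I made up. I've spelled the upper-bound-only case as {0,2}, which means the same thing as "zero, one or two" and doesn't depend on which Perl you're running.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# {4,} : at least four, and greedy, so it takes all five a's.
my ($run) = 'caaaaat' =~ /(a{4,})/;
print "$run\n";          # prints "aaaaa"

# Only three a's here, so "at least four" can't match.
my $too_short = 'caaat' =~ /a{4,}/ ? 'match' : 'no match';
print "$too_short\n";    # prints "no match"

# {0,2} : zero, one or two, so it stops after two a's.
my ($capped) = 'xaaaa' =~ /(xa{0,2})/;
print "$capped\n";       # prints "xaa"
```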