Monday, March 28, 2011

Regex - difference between reluctant, greedy and possessive quantifiers

I keep forgetting this one, so I thought of writing a post so that I do not forget it again, and so that it benefits others who had a hard time understanding Java's regex tutorial.

There are three types of quantifiers in regex: greedy (the default, so no extra character identifies it), reluctant (a trailing ? character identifies it), and possessive (a trailing + character identifies it).

For example, \w means any word character. Let's take the string "blackdogishere". The regex "\w*dog" finds a match in "blackdogishere". First, let's examine what "\w*dog" represents. "*" means zero or more of the preceding element, so "\w*" means zero or more word characters, and "\w*dog" means zero or more word characters followed by "dog". Note that the letters of "dog" are word characters too. By default the quantifier is greedy, so when "\w*dog" is applied to "blackdogishere", \w* first consumes the entire string "blackdogishere". That leaves nothing for "dog" to match, so the engine backtracks, giving back one character at a time until "dog" can match. \w* ends up holding "black", and the overall match is "blackdog".
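The greedy behaviour can be checked with a small Java sketch. The class name GreedyDemo, the helper findGreedy, and the extra capturing group around \w* are my own additions for illustration; the group is only there to show what \w* still holds once backtracking has finished.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDemo {
    // Hypothetical helper: returns {whole match, text held by \w*},
    // or null when the pattern is not found in the input.
    public static String[] findGreedy(String input) {
        // The parentheses only expose what \w* keeps after backtracking.
        Matcher m = Pattern.compile("(\\w*)dog").matcher(input);
        return m.find() ? new String[] { m.group(), m.group(1) } : null;
    }

    public static void main(String[] args) {
        String[] result = findGreedy("blackdogishere");
        System.out.println(result[0]); // blackdog
        System.out.println(result[1]); // black
    }
}
```

Note that find() is used here, which looks for the pattern anywhere in the input. Matcher.matches() would return false for this string, because it requires the whole input to match and "ishere" is left over.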

Now consider the reluctant quantifier, which is identified by the ? character. "\w*?dog" also finds a match in "blackdogishere", but it gets there in a different way. A reluctant quantifier starts by matching nothing and consumes characters one at a time, only when the rest of the pattern cannot match yet. So "\w*?" grows until it holds "black", and then "\w*?dog" matches "blackdog".
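On "blackdogishere" the greedy and reluctant versions end up with the same matched text, so the difference is easier to see on a string that contains "dog" twice. The string "blackdogbigdog", the class name ReluctantDemo and the helper firstMatch are assumptions of mine, used only for this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReluctantDemo {
    // Hypothetical helper: first substring matched by regex, or null if none.
    public static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String twoDogs = "blackdogbigdog"; // made-up input with two "dog"s

        // Greedy: \w* grabs everything, then backs off to the LAST "dog".
        System.out.println(firstMatch("\\w*dog", twoDogs));   // blackdogbigdog

        // Reluctant: \w*? grows one character at a time and stops at the FIRST "dog".
        System.out.println(firstMatch("\\w*?dog", twoDogs));  // blackdog

        // On the string from the post both variants give the same text.
        System.out.println(firstMatch("\\w*?dog", "blackdogishere")); // blackdog
    }
}
```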

Possessive quantifiers (identified by the + character) consume as many characters as they can in one stretch and never backtrack. "\w*+" consumes all the characters in "blackdogishere" and never gives any of them back, so nothing is left for "dog", and hence "\w*+dog" does not match the string "blackdogishere".
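A short Java sketch of the possessive case, put next to the greedy pattern for comparison. The class name PossessiveDemo and the helper firstMatch are my own names, not something from the Java tutorial.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PossessiveDemo {
    // Hypothetical helper: first substring matched by regex, or null if none.
    public static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "blackdogishere";

        // On its own, \w*+ happily takes the whole string.
        System.out.println(firstMatch("\\w*+", input));    // blackdogishere

        // Followed by "dog" it fails: \w*+ never gives characters back.
        System.out.println(firstMatch("\\w*+dog", input)); // null

        // The greedy version backtracks and succeeds.
        System.out.println(firstMatch("\\w*dog", input));  // blackdog
    }
}
```

Because \w also matches the letters d, o and g, "\w*+dog" can never match any input. A possessive quantifier is only useful when the part after it starts with something the quantified part cannot consume.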

I believe this explains the three types of quantifiers in a simple way.
