Eric Holscher points out a Python gem I never knew about. If you pass in the number 128 (or, as I have a preference for flags in hex, 0x80) as the second arg to re.compile, it prints out the parse tree of the regex:
>>> import re
>>> pattern = re.compile("a+b*\s\w?", 0x80)
max_repeat 1 65535
  literal 97
max_repeat 0 65535
  literal 98
in
  category category_space
max_repeat 0 1
  in
    category category_word
While re.compile is documented as taking an optional flags argument after the pattern, the particular flag 0x80 is not documented as far as I can tell.
I thought I'd dig in further.
Firstly, note that re appears to cache patterns: if you repeat the same re.compile call, it returns the same object and doesn't spit out the parse tree again. There is a re.purge function for purging this cache, but while it is mentioned in help(re), it is not in the main documentation.
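A minimal sketch of the caching behaviour, assuming CPython (object identity of compiled patterns is an implementation detail, not a documented guarantee). It compiles without any flags, since whether patterns compiled with the debug flag are cached may differ between Python versions.

```python
import re

re.purge()  # start from an empty pattern cache

first = re.compile(r"a+b*\s\w?")
second = re.compile(r"a+b*\s\w?")

# With the cache in play, the second call hands back the same object.
same_before_purge = first is second

re.purge()  # drop every cached pattern

# 'first' is still referenced, so a fresh compile must build a new object.
third = re.compile(r"a+b*\s\w?")
same_after_purge = third is first

print(same_before_purge, same_after_purge)
```
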
Secondly, note that the flag 0x80 is actually defined as DEBUG in the re module, so a more robust form would be to pass re.DEBUG rather than the literal 0x80.
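As a rough sketch, here is the same pattern compiled with the named constant instead of the magic number. The stdout capture is my own addition, only there so the dump can be inspected as a string; the exact text of the dump varies between Python versions, so don't rely on its format.

```python
import contextlib
import io
import re

# Capture whatever the debug dump writes to standard output.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    pattern = re.compile(r"a+b*\s\w?", re.DEBUG)

debug_dump = buffer.getvalue()
print(debug_dump)

# The named constant and the hex literal are the same flag value.
same_flag = (re.DEBUG == 0x80)
```

The compiled pattern behaves like any other; the flag only affects what is printed at compile time.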
A source code comment covering DEBUG and another undocumented flag, TEMPLATE (which supposedly disables backtracking), reads:
# sre extensions (experimental, don't rely on these)
which explains why they aren't documented.
In the Python source code, there is also a Scanner class defined with the comment "experimental stuff (see python-dev discussions for details)"
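For the curious, here is a small sketch of how the Scanner class can be driven, assuming it is still exposed as re.Scanner in your Python version. It is undocumented and marked experimental, so treat this as an illustration rather than a supported API. The token names "INT" and "NAME" are made up for the example.

```python
import re

# Each lexicon entry pairs a pattern with an action. The action is called
# with the scanner and the matched text; an action of None skips the match.
scanner = re.Scanner([
    (r"\d+", lambda s, tok: ("INT", int(tok))),
    (r"[a-z]+", lambda s, tok: ("NAME", tok)),
    (r"\s+", None),  # ignore whitespace
])

# scan() returns the list of action results plus any unmatched remainder.
tokens, remainder = scanner.scan("abc 42 def")
print(tokens, repr(remainder))
```
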
A quick search of the python-dev mailing list found nothing. Perhaps a Python core developer could fill us in.