James Tauber's Blog 2008/11/03


blog > 2008 > 11 >


Python's re.DEBUG Flag

Eric Holscher points out a Python gem I never knew about. If you pass in the number 128 (or, as I have a preference for flags in hex, 0x80) as the second arg to re.compile, it prints out the parse tree of the regex:

>>> import re >>> pattern = re.compile("a+b*\s\w?", 0x80) max_repeat 1 65535 literal 97 max_repeat 0 65535 literal 98 in category category_space max_repeat 0 1 in category category_word

While re.compile is documented as having the signature

compile(pattern[, flags])

the particular flag 0x80 is not documented as far as I can tell.

I thought I'd dig in further.

Firstly, note that re appears to cache patterns as if you repeat the same re.compile, it returns the same object and doesn't spit out the parse tree. There is a re.purge function for purging this cache but while this is mentioned in help(re) it is not in the main documentation.

Secondly, note that the flag 0x80 is actually defined as DEBUG in the re module, so a more robust form would be:

re.compile(pattern, re.DEBUG)

A source code comment for DEBUG and another undocumented flag TEMPLATE (which supposedly disables backtracking) mentions:

# sre extensions (experimental, don't rely on these)

which explains why they aren't documented.

In the Python source code, there is also a Scanner class defined with the comment "experimental stuff (see python-dev discussions for details)"

A quick search of the python-dev mailing list found nothing. Perhaps a python core development could fill us in.

by : Created on Nov. 3, 2008 : Last modified Nov. 3, 2008 : (permalink)


Cleese and a New Toolchain

Back in July 2003, I had an idea to "make the Python intepreter a micro-kernel and boot directly to the Python prompt". Thus started Cleese, which I worked on with Dave Long. We made a good deal of progress and I learned a tremendous amount.

In February 2007, I moved Cleese from SourceForge to Google Code Project Hosting in the hope of restarting work on it. In between 2003 and 2007 I'd become a switcher and so I needed to work out how to do on OS X what I'd been doing with a very strange hybrid of Windows command line and Cygwin before. Alas I never got around to that part.

Then about a week ago, inspired by Brian Rosner's interest in the project, I decided to give it another go. I also decided to use it as an opportunity to finally learn Git.

First goal: build a "hello world" kernel (no Python yet). Fortunately I had one from the initial stages of Cleese 2003, but it wouldn't build. In particular ld was barfing on the -T option used to specify a linking script (which OS X's ld doesn't support).

After asking some questions on the #osdev channel on freenode, I discovered I'd need a completely new gcc and binutils toolchain to support i386-elf. This didn't turn out to be difficult at all, though.

Here were my steps:

export PREFIX=/Users/jtauber/Projects/cleese/toolchain export TARGET=i386-elf

cd ~/Projects/cleese curl -O http://ftp.gnu.org/gnu/binutils/binutils-2.19.tar.gz mkdir toolchain tar xvzf binutils-2.19.tar.gz cd binutils-2.19 ./configure --prefix=$PREFIX --target=$TARGET --disable-nls make make install cd .. curl -O http://ftp.gnu.org/gnu/gcc/gcc-4.2.4/gcc-core-4.2.4.tar.bz2 bunzip2 gcc-core-4.2.4.tar.bz2 tar xvf gcc-core-4.2.4.tar cd gcc-4.2.4 ./configure --prefix=$PREFIX --target=$TARGET --disable-nls --enable-languages=c --without-headers make all-gcc make install-gcc

Now my "hello world" kernel builds. Next goal...working out how to programmatically build disk images for VMware Fusion (or, failing that, Qemu)

by : Created on Nov. 3, 2008 : Last modified Nov. 3, 2008 : (permalink)