James Tauber : James Tauber's Blog 2005/05


Paths as homeomorphisms of the closed interval from 0 to 1

Previously, I defined a path in terms of a continuous function from a closed interval on the reals to a set of points in a topological space.

Because the function is continuous, by definition, the resultant image is homeomorphic to the closed interval on the reals. Because any closed interval on the reals is itself homeomorphic to the specific closed interval [0, 1], the image of a path can be said to be homeomorphic to the real interval [0, 1].

UPDATE (2005-06-01): As Michael Hudson points out in a comment, a path will only be homeomorphic to the closed interval [0, 1] if it doesn't cross over itself. Homeomorphisms require the function to be bijective, continuous and have a continuous inverse. A path that crosses over itself doesn't meet these criteria.
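To make the self-crossing case concrete, here is a minimal sketch in Python. The function name alpha_path and the particular curve are assumptions chosen for illustration; they do not come from the original post or the comment. The curve is continuous on [0, 1] but passes through the origin twice, so it is not injective and cannot be a homeomorphism onto its image.

```python
def alpha_path(t):
    """A continuous path on [0, 1] that crosses itself once (an 'alpha' curve)."""
    s = 4 * t - 2  # rescale the parameter from [0, 1] to [-2, 2]
    return (s * s - 1, s * (s * s - 1))

# Two different parameter values land on the same point, so the map is not
# injective and therefore has no inverse, continuous or otherwise.
print(alpha_path(0.25) == alpha_path(0.75))

# The endpoints are distinct, so this is not merely a closed loop.
print(alpha_path(0.0) != alpha_path(1.0))
```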

UPDATE: next post

by : Created on May 28, 2005 : Last modified June 1, 2005 : (permalink)

Syntax by any other name

One challenge doing any kind of cross-disciplinary work is the differences in terminology. It's one thing where the same concept gets two different names—it's a lot harder when two different things get the same name.

I recently got into a confusing discussion on the b-greek mailing list where people (including some notable scholars) were saying things like "word order doesn't always alter syntax in Greek". As a linguist that sounds like utter contradiction but people insisted it was true for some constructions in Ancient Greek and "prominent linguists" had recognized this for decades.

I finally took my own advice and stepped back to look at the terminology being used. Then it struck me.

What gets called "syntax" by Greek scholars is largely what I would describe as mapping grammatical relations (e.g. SUBJECT) to semantic roles (e.g. AGENT). That's why when you open up a book on Greek "syntax" it spends a lot of time talking about what different cases mean semantically.

This isn't what syntax means to a formal linguist or computer scientist. To them, "syntax" has to do with things like constituent structure and word order.

Now, in English, grammatical relations are predominantly determined by word order rather than morphology whereas in Greek, the word order matters less in determining grammatical relations and morphology takes on that role.

And here is the crux of the terminology confusion. Consider the previous paragraph. You could replace "word order" with "syntax" and it would mean (roughly) the same thing to a formal linguist. I suspect you could replace "grammatical relations" with "syntax" and it would mean the same thing to a Greek scholar.

So here's a way I suggested we could avoid confusion on the b-greek mailing list:

A. Whenever I see someone say "syntax", I'll read it as "grammatical relations".

That way "word order doesn't always alter syntax in Greek" reads to me as "word order doesn't always alter grammatical relations" and I'll agree.

B. Whenever one sees me say "syntax", one should read it as "constituent structure, word order, etc"

That way "word order doesn't always alter syntax in Greek" reads as "word order doesn't always alter constituent structure, word order, etc in Greek" and you'll see why I think it's a contradiction.

by : Created on May 28, 2005 : Last modified May 28, 2005 : (permalink)

Finding Dependencies in Tabular Data, Part 2

Yesterday I wrote about code in Python 2.4 to find out if the range of possible values in one column of tabular data is affected by the value of another column.

I posed the question there: What if you want to check the dependency, not between just two columns but two groups of columns?

Here is the original function for reference:

def find_dependencies(col_i, col_j):
    for i_value in possible_values[col_i]:
        j_values = set()
        for row in rows:
            if row[col_i] == i_value:
                j_values.add(row[col_j])
        if j_values < possible_values[col_j]:
            yield i_value, j_values

and here is a modified version that takes two sequences of column indices (rather than two column indices):

def find_dependencies_2(cols_i, cols_j):
    for i_value in cartesian_product(non_contig_slice(possible_values, cols_i)):
        j_values = set()
        for row in rows:
            if non_contig_slice(row, cols_i) == i_value:
                j_values.add(non_contig_slice(row, cols_j))
        if j_values < set(cartesian_product(non_contig_slice(possible_values, cols_j))):
            yield i_value, j_values

So find_dependencies_2((0,1,2), (3,4)) returns which tuples made up of the 0th, 1st and 2nd columns of a row reduce the possible values that can be taken by the tuple made up of the 3rd and 4th column of the row.

What was interesting in writing it is that I merely needed to change

row[col_n]

to

non_contig_slice(row, cols_n)

and

possible_values[col_n]

to

cartesian_product(non_contig_slice(possible_values, cols_n))

Where cartesian_product is defined as:

def cartesian_product(sets, done=()):
    if sets:
        for element in sets[0]:
            for tup in cartesian_product(sets[1:], done + (element,)):
                yield tup
    else:
        yield done

and non_contig_slice is defined as:

def non_contig_slice(seq, indices):
    result = ()
    for i in indices:
        result += (seq[i],)
    return result

Successive applications of find_dependencies_2 with different combinations of column indices can be used to determine what dependencies exist between columns in tabular data.
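For anyone who wants to try this without a data file, here is a self-contained sketch for a current Python 3. It is a variation under stated assumptions rather than the code above: the sample rows are invented, and itertools.product (which did not exist in Python 2.4) is swapped in for the hand-written cartesian_product.

```python
from itertools import product

# Hypothetical sample rows, invented purely for illustration.
rows = {
    ("a", "x", "1"),
    ("a", "y", "1"),
    ("b", "x", "2"),
}
# One set of observed values per column.
possible_values = tuple(set(column) for column in zip(*rows))

def non_contig_slice(seq, indices):
    """Pick the items of seq at the given indices, as a tuple."""
    return tuple(seq[i] for i in indices)

def find_dependencies_2(cols_i, cols_j):
    # Every combination the j columns could take if nothing were fixed.
    all_j = set(product(*non_contig_slice(possible_values, cols_j)))
    for i_value in product(*non_contig_slice(possible_values, cols_i)):
        j_values = set(
            non_contig_slice(row, cols_j)
            for row in rows
            if non_contig_slice(row, cols_i) == i_value
        )
        if j_values < all_j:  # proper subset: fixing i_value restricts j
            yield i_value, j_values

result = dict(find_dependencies_2((0, 1), (2,)))
print(result)
```

One thing this makes visible: a combination of i values that never occurs in the data, such as ("b", "y") here, is reported with an empty set of j values, because the empty set is a proper subset of everything else. Depending on the use, those entries may need filtering out.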

Hanon Exercises

Hanon's exercises entitled "The Virtuoso Pianist" are going well so far.

The first part (which I'm on) consists of 20 exercises to increase finger agility and strength (especially in the naturally weak fourth and fifth fingers).

The second part consists of 23 exercises that further prepare the fingers for the third part which consists of what are called the "Virtuoso Exercises".

The first 20 exercises are to be played starting at 60bpm and working up to 108bpm. Here's how I'm doing them.

Day One: 1 & 2 @ 60bpm
Day Two: 1 & 2 @ 70bpm; 3 & 4 @ 60bpm
Day Three: 1 & 2 @ 80bpm; 3 & 4 @ 70bpm; 5 & 6 @ 60bpm
and so on...

with the tempo of one pair of exercises being the tempo I played the previous pair the day before.

The sequence of tempos I (plan to) progress through is: 60bpm, 70bpm, 80bpm, 85bpm, 90bpm, 95bpm, 100bpm, 105bpm, 108bpm

I'm currently at 95bpm on 1 & 2 down to 60bpm for 11 & 12.
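The schedule can be written down as a small function. This is only a sketch of the pattern described above: the names TEMPOS, PAIRS and schedule are made up for illustration, and it assumes one new pair is added each day and that a pair stays at 108bpm once it gets there.

```python
TEMPOS = [60, 70, 80, 85, 90, 95, 100, 105, 108]
PAIRS = 10  # exercises 1-20, practised two at a time

def schedule(day):
    """Return [((first, second), bpm), ...] for a 1-based practice day."""
    plan = []
    for pair in range(min(day, PAIRS)):
        # Each pair is one step behind the pair before it.
        step = day - 1 - pair
        # Assumption: hold at the final tempo once the sequence runs out.
        bpm = TEMPOS[min(step, len(TEMPOS) - 1)]
        plan.append(((2 * pair + 1, 2 * pair + 2), bpm))
    return plan

for exercises, bpm in schedule(6):
    print(exercises, bpm)
```

Under those assumptions, 95bpm on 1 & 2 with 11 & 12 at 60bpm corresponds to day six.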

by : Created on May 26, 2005 : Last modified May 26, 2005 : (permalink)

Finding Dependencies in Tabular Data

I have a file with tabular data and I want to find out if the range of possible values in one column is affected by the value of another column. Here is how I did it in Python 2.4.

This is related to relational database normalization and the code below could be modified to specifically find functional dependencies for normalization. My goal was a little more general, however. I wanted to know any time the range of possible values for one column given another column's value is a proper subset of the possible values for the column when no other column values are fixed.

I have two data structures. One is a tuple of sets of possible values for each column. The second is a set of the rows (each a tuple).

possible_values = (set(), set(), set(), set(), set())
rows = set()

(Note, I've hard-coded the length of the possible_values tuple to 5 to match my current data.)

Next I load in the data — it's in a whitespace-delimited file, one row per line:

# open() behaves the same as file() here and also works in later Pythons
for line in open("data.txt"):
    row = tuple(line.strip().split())
    for col_i in range(len(row)):
        possible_values[col_i].add(row[col_i])
    rows.add(row)

Now here's my function for finding whether one column's value restricts the possible values of another.

def find_dependencies(col_i, col_j):
    for i_value in possible_values[col_i]:
        j_values = set()
        for row in rows:
            if row[col_i] == i_value:
                j_values.add(row[col_j])
        if j_values < possible_values[col_j]:
            yield i_value, j_values

It goes through each possible value in the i column and finds out if fixing that reduces the possible values in the j column.

Notice that it makes use of < as a set operator for proper subset. It's a generator too. It will yield any value in the i column that restricts the values in the j column, along with what the restriction is.
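Here is a small self-contained run of the same idea, written for a current Python 3. The two-column sample data is invented for illustration and replaces the data file; the inner loop is condensed into a generator expression but does the same thing.

```python
# Hypothetical sample rows (colour, shape), invented for illustration.
rows = {
    ("red", "circle"),
    ("red", "square"),
    ("blue", "circle"),
}
# One set of observed values per column.
possible_values = tuple(set(column) for column in zip(*rows))

def find_dependencies(col_i, col_j):
    for i_value in possible_values[col_i]:
        j_values = set(row[col_j] for row in rows if row[col_i] == i_value)
        if j_values < possible_values[col_j]:  # proper subset
            yield i_value, j_values

print(dict(find_dependencies(0, 1)))
print(dict(find_dependencies(1, 0)))
```

In this data "red" appears with both shapes, so it restricts nothing and is not reported, while "blue" only ever appears with "circle". In the other direction, "square" only ever appears with "red".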

What if you want to check the dependency, not between just two columns but two groups of columns?

The solution turned out to involve a couple of cool modifications which I'll save for a followup post.

by : Created on May 26, 2005 : Last modified May 26, 2005 : (permalink)

Date for O.C. Screening of Alibi Phone Network

The West Coast premiere of Alibi Phone Network will be at 6pm on 3rd June (next Friday). It will be one of nine shorts shown that session of the O.C. Shorts Festival. It's a shame I can't make it but I've heard that at least one of our actors will be there.

by : Created on May 26, 2005 : Last modified May 26, 2005 : (permalink)

Leonardo 0.6 Release Candidate 1

The first release candidate of Leonardo 0.6 is now available at

http://jtauber.com/2005/leonardo/leonardo-0.6rc1.tgz

Leonardo is the Python-based content management system that runs this site.

Assuming no blockers are found, I'll probably release Leonardo 0.6.0 early next week.

Let me know if you encounter any problems at all.

by : Created on May 25, 2005 : Last modified May 25, 2005 : (permalink)

Almost Ready for Leonardo 0.6 Release

Tonight I finished the remaining items I wanted to get done for the release of Leonardo 0.6.

That puts me ahead as I wasn't planning on a release candidate until the weekend and I should get it out tomorrow.

That will give me more time the rest of the week to work on editing the Atlanta reality show pilot.

by : Created on May 24, 2005 : Last modified May 24, 2005 : (permalink)

Checkpoint at 31.5

Just went past the half-way mark between being 31 and 32.

On my 31st birthday, I posted a list of goals for my 32nd year. Let's see how I'm going:

screening Alibi at festivals — DONE!
releasing the first Nelson James EP — made good progress before my US trip but looking doubtful as I will have spent over 80% of the first half of 2005 on the opposite side of the world from Nelson.
successfully running a 5K race — terrible progress! Haven't been training at all since I've been in the US.
sitting music theory exam (and maybe practical too) — haven't done anything theory-wise although piano practice has been going well since I got my keyboard here in the US.
getting back to Go — made a conscious decision a few months ago I was going to have to let this one slide.
completing Pimsleur Italian I, II and III — finished I, haven't made good progress on II yet.

Overall, not looking good but having spent months away from home, I have an excuse for some of them.

What am I most pleased with my progress on? Definitely Leonardo!

by : Created on May 24, 2005 : Last modified May 24, 2005 : (permalink)

Testing For Directories Outside the Tree

In Leonardo, I have a case where I am concatenating a fixed directory x and a relative path y.

I want to avoid the result being outside the directory tree rooted by x.

Any ideas?

root = os.path.abspath(x)
path = os.path.abspath(os.path.join(x, y))
assert path.startswith(root)

a reasonable approach?

Actually, I should clarify: y isn't a relative path as such. y can be '/' which should be taken to mean x. So perhaps what I want is:

root = os.path.abspath(x)
path = os.path.abspath(os.path.normpath(x + os.sep + y))
assert path.startswith(root)

I ruled out

assert os.path.normpath(x + os.sep + y).startswith(x)

because of the case where x is itself relative.
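One caveat with a plain startswith test is that it compares strings, not path components, so a sibling directory whose name merely begins with the root's name slips through. A possible alternative, sketched here on the assumption of a Python new enough to have os.path.commonpath (3.5 or later, so not the Python of this post) and POSIX-style paths, is to compare whole components. The function name is_inside and the example paths are invented for illustration.

```python
import os

def is_inside(x, y):
    """True if joining y onto x stays within the tree rooted at x."""
    root = os.path.abspath(x)
    path = os.path.abspath(os.path.normpath(x + os.sep + y))
    # commonpath works on whole path components, so '/srv/site-private'
    # is not mistaken for something inside '/srv/site'.
    return os.path.commonpath([root, path]) == root

print(is_inside("/srv/site", "/"))                       # y == '/' means x itself
print(is_inside("/srv/site", "pages/about"))
print(is_inside("/srv/site", "../site-private/secret"))  # sibling with a shared prefix
print(is_inside("site", "../other"))                     # relative x
```

This is a sketch, not a complete defence: it does not resolve symbolic links, which would need os.path.realpath or similar.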

by : Created on May 24, 2005 : Last modified May 24, 2005 : (permalink)

Dogbert the Ungrammatical

Today's Dilbert made me laugh but I found the second panel ungrammatical.

The antecedent of the plural "them" is the singular "every part of your body".

Secondly, when I checked with my sister Jenni she pointed out the use of "would" in that panel doesn't sound right either. It implies the existence of the procedure is hypothetical, which is not how it is presented in the first panel.

The "would" might just be a difference between Australian English and American English but it could also be a subtle slip up because the procedure is, in fact, hypothetical.

The "them" is, however, just plain ungrammatical as far as I can tell.

by : Created on May 23, 2005 : Last modified May 23, 2005 : (permalink)

Which Releases Have This Bug

I've talked before about my thoughts on severity and priority in issue tracking systems.

Things seem to be working well so far in how I've customised Roundup for Leonardo.

One thing that is still missing, however, is the ability for me to do queries like "show me all the bugs in 0.6b1 that have now been fixed". This is helpful for generating release notes. A simple list of "all the bugs that have been fixed in this release" isn't sufficient because it tends to include a lot of bugs that were only introduced during development of that release (e.g. bugs in new features).

So a simple "what build was this bug found in?" field is not what I want (although that's still useful, it doesn't solve the problem at hand). What I want is a field that lists which releases were shipped with the particular bug.

I think for most projects, it doesn't need to be a comprehensive list; really it just needs to be whether the bug existed in the last major release and the last minor release.
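As a rough sketch of the kind of field and query I mean, here it is with plain Python dictionaries. The record layout, field names and issue ids are all invented for illustration; this is not Roundup's schema or API.

```python
# Hypothetical issue records; "shipped_in" lists the releases that went out
# with the bug present.
issues = [
    {"id": 101, "status": "fixed", "shipped_in": {"0.5", "0.6b1"}},
    # Introduced and fixed during development, so it never shipped.
    {"id": 102, "status": "fixed", "shipped_in": set()},
    {"id": 103, "status": "open", "shipped_in": {"0.6b1"}},
]

def fixed_bugs_shipped_in(release):
    """Ids of fixed issues that the given release actually shipped with."""
    return [
        issue["id"]
        for issue in issues
        if issue["status"] == "fixed" and release in issue["shipped_in"]
    ]

print(fixed_bugs_shipped_in("0.6b1"))
```

Issue 102 is the kind of bug a simple "fixed in this release" list would wrongly include in the release notes.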

by : Created on May 22, 2005 : Last modified May 22, 2005 : (permalink)

Managing Bibliographies with BibDesk

In preparation for my PhD, I recently started investigating Mac OS X tools for managing BibTeX-based bibliographies.

In the end I settled on BibDesk. I chose it because of its functional merits but it's great that it also turns out to be open source.

Because BibDesk allows me to link from an entry to a file on my local filesystem, I can just put all my PDFs in one directory and use BibDesk as the interface to all the papers.

One thing that I don't believe is supported (yet) but which I would like to use as work on my literature review continues is the ability to express relationships between entries, perhaps along the lines I talked about in Google Scholar and Typed Citations.

Of course, then I'd like to express relationships between other entities such as authors and maybe concepts, terminology, etc.

Actually, a lot of the features I'd like to see in BibDesk are features I'd like to see in any MicroContent browser. After all, that's what BibDesk really is.

by : Created on May 22, 2005 : Last modified May 22, 2005 : (permalink)

Hapland

Via Bob Congdon, found an online puzzle of a very different type than the Python Challenge.

Check out Hapland. Very clever and a lot of fun.

by : Created on May 22, 2005 : Last modified May 22, 2005 : (permalink)

Leonardo 0.6 Beta 1 Released

The first beta of Leonardo 0.6 is now available at

http://jtauber.com/2005/leonardo/leonardo-0.6b1.tgz

Leonardo is the Python-based content management system that runs this site.

I'm still putting together a list of what's new since 0.5 but it's big: comments, trackbacks, file upload, categories and heaps of internal improvements.

Try it out and let me know below or via email how you go.

by : Created on May 21, 2005 : Last modified May 21, 2005 : (permalink)

Comments Welcome

Leonardo 0.6 will include the beginnings of support for trackbacks and comments.

I'm turning them on on this post just to see how things go.

Feel free to comment and/or trackback!

UPDATE (2005-05-21): Turning off trackbacks and comments now. Testing is done :-)

by : Created on May 20, 2005 : Last modified May 21, 2005 : (permalink)

Upgrade Successful

Well, I survived upgrading this site to the latest revision of Leonardo (@278 on the trunk).

I'll release a beta this weekend.

by : Created on May 20, 2005 : Last modified May 20, 2005 : (permalink)

About to Upgrade Leonardo

I'm about to upgrade the software running this site and blog to a pre-release of 0.6 beta 1.

Apologies in advance in case anything goes wrong.

by : Created on May 20, 2005 : Last modified May 20, 2005 : (permalink)

Context-Free Design Grammars

Chris Coyne's Context Free Design Grammars appeal to me on so many levels (linguistically, artistically, mathematically, computationally, ...)

Basically they are a set of production rules where the terminals are geometric shapes (actually just a circle and a square) and each symbol on the right-hand-side of a rule is augmented with a geometric transformation.

So a sentence in the generated language is just a collection of squares and circles at different positions, sizes and orientations.

But the results are stunning.

Of course, immediately after discovering this, I had to write a Python implementation. My first implementation immediately hit the recursion limit so I rewrote it to use a pool of states rather than recurse. Coincidentally, I used exactly the same technique working on level 24 of the Python Challenge and avoided the recursion depth issues others had encountered.
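To give a flavour of the pool-of-states idea, here is a heavily simplified sketch. It is not the implementation mentioned above and not Chris Coyne's system: the grammar, the names RULES, TERMINALS, MIN_SIZE and expand, and the one-dimensional "transformation" (a shift and a scale, with no rotation) are all assumptions made to keep the example short.

```python
# Each right-hand-side symbol carries a transformation: (symbol, shift, scale).
RULES = {
    "CHAIN": [("CIRCLE", 0.0, 1.0), ("CHAIN", 1.0, 0.5)],
}
TERMINALS = {"CIRCLE", "SQUARE"}
MIN_SIZE = 0.1  # stop expanding once shapes get this small

def expand(start):
    """Expand the grammar with an explicit pool of pending states."""
    shapes = []
    pool = [(start, 0.0, 1.0)]  # (symbol, x position, size)
    while pool:
        symbol, x, size = pool.pop()
        if size < MIN_SIZE:
            continue  # too small to matter; drop this state
        if symbol in TERMINALS:
            shapes.append((symbol, x, size))
            continue
        # Non-terminal: push one new state per symbol on the right-hand side.
        for child, shift, scale in RULES[symbol]:
            pool.append((child, x + shift * size, size * scale))
    return shapes

for shape in expand("CHAIN"):
    print(shape)
```

Because pending states live in an ordinary list instead of on the call stack, how deep the expansion goes is limited by memory rather than by the interpreter's recursion limit.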

Once it's cleaned up, I'll make my Python implementation available.

by : Created on May 19, 2005 : Last modified May 19, 2005 : (permalink)

Almost Ready for Next Leonardo Beta

I'm almost ready to release the first beta of Leonardo 0.6.

A lot has been improved since 0.5 and I'm keen to get 0.6 out so everyone can switch to it.

by : Created on May 17, 2005 : Last modified May 17, 2005 : (permalink)

What Planet Am I On?

For reasons unknown to both myself and Ryan Phillips, my blog entries are no longer appearing on Planet Python.

Nothing has changed at this end. Sounds like it could be a problem with 304 Not Modified. The fact my feed gives 304s makes debugging the feed difficult at times.

by : Created on May 17, 2005 : Last modified May 18, 2005 : (permalink)

Film Project Update: Accepted at Another Festival

Just found out that Alibi Phone Network made it in to the official selection of the O.C. Shorts Festival. It seems a cool little festival but unfortunately it's unlikely any of the filmmakers can make it out to California at that time.

Two of our actors are in L.A., though, so hopefully they can make it.

by : Created on May 17, 2005 : Last modified May 17, 2005 : (permalink)

Python Challenge Continues

This week will be so much more productive if I just stay away from http://pythonchallenge.com/ but they've recently put up levels 23-26 and they are calling my name.

must...keep...away....

UPDATE (2005-05-18): Argh! I can't resist it. Fortunately, level 23 was easy. I think it was deliberate to suck me back in.

by : Created on May 17, 2005 : Last modified May 18, 2005 : (permalink)

43 Things and Self-Normalizing Folksonomies

Python Challenge is still sucking up my time but I did take a break and take another look at 43 Things.

43 Things is a site for declaring your goals and matching you up with other people who have the same goals or who have already accomplished them.

They've added some new features since I first checked out the site and one of them really impressed me: how they deal with the issue of distinctions without a difference, i.e. goals that are really the same thing but have been created separately and given different names.

Because goals are identified by the string given in answer to "I want to...", there is a distinction made between say "speak Italian fluently" and "speak fluent Italian" even though they are clearly the same goal.

How does 43 Things solve this?

When someone notices two very similar goals, they can suggest that one is really similar to the other. When they do this, the pages for both goals start showing the other goal under the heading "People have suggested XYZ is really the same as..."

Other people can then, with a single click (hmm, probably a GET), switch their goal from one to the other.

But here is the really clever thing. They say whether the other goal has more or fewer people. This means you can voluntarily switch your choice of goal naming to the one that emerges as more popular.

So the community's folksonomy becomes self-normalizing.

by : Created on May 12, 2005 : Last modified May 12, 2005 : (permalink)

Python Challenge

With the exception of a break to have an excellent dinner with James Marcus, I've spent the last twelve hours working on the Python Challenge. It's like playing Myst but with Python scripts and the Web.

I'm currently stuck on level 17 and just asked my first question on the forum.

I'm happy to give anyone hints up to that level.

UPDATE (2005-05-09): Up to level 20. No hints on the forum yet :-)

by : Created on May 9, 2005 : Last modified May 9, 2005 : (permalink)

Metadata in Mail 2.0

I mentioned earlier that the ability to add metadata to emails and create smart folders based on that metadata is the feature that would secure my continued use of the new Mail 2.0 in Mac OS X 10.4 'Tiger'.

I find it odd that rules can colour messages but you can't manually label an email with a colour. If you could do that then have smart folders based on colour, that would be enough for how I want to organize my email.

But, of course, I'd love arbitrary metadata. And this is where it gets interesting.

Tiger has a command-line tool mdls which lists the metadata for a particular file. It is this metadata that is available to Spotlight.

All my email messages are downloaded by Mail via IMAP and put into ~/Library/Mail and each email message (and attachment) gets its own file.

I just tried mdls on one of those files and it has metadata for things like ItemTitle (subject), ItemAuthors and ItemRecipients. They are actually displayed in the Finder Get Info under More Information too.

If I add a Spotlight Comment in the Finder Get Info window, mdls will show it and I can easily search for it with Spotlight. From the Finder I can set a colour label and that shows up in mdls (and is hence Spotlightable) as well.

Why can't Mail 2.0 make better use of this?

UPDATE (2005-05-07): Joe Weaks suggested AppleScript for setting the background colour on a mail message. A quick Google search revealed the Label Your Mail hack from the O'Reilly Panther Hacks book. I'm still surprised this didn't make it in as a feature in Mail 2.0.

by : Created on May 6, 2005 : Last modified May 7, 2005 : (permalink)

An HTTP Lesson from Google

Hopefully Google's Web Accelerator will teach a whole new generation of Web developers the dangers of using GET when they should be using POST.

by : Created on May 6, 2005 : Last modified May 6, 2005 : (permalink)

More Metadata Adventures in Tiger

I created my first Smart Folder in Finder. I noticed when selecting what metadata fields to search on there were (besides all the photography ones) things like Project. I would love to be able to tag each file with what project it relates to. How do I add this to an arbitrary file, though?

And how do I add this to an email message? Not that it matters because Smart Folders in Finder seem to exclude searching email or vCards.

If I create a Smart Folder with something like Kind = Any and Author = James Tauber as the query and then go to Get Info on the folder, it shows the query as:

(kMDItemAuthors = 'James Tauber'cd) && (kMDItemContentType != com.apple.mail.emlx) && (kMDItemContentType != public.vcard)

So let me get this right: Kind = Any means any but email and vCards.

Interestingly, though, the query found Powerpoint presentations and Word docs even though I don't have either app installed at the moment.

by : Created on May 6, 2005 : Last modified May 7, 2005 : (permalink)

Setting Up Tiger

Not much blogging lately. Tiger arrived on Monday and I got it installed Tuesday evening. I haven't transferred any of my data back yet, although I've downloaded subversion binaries and SubEthaEdit ready to work on Leonardo.

Spotlight and the Dashboard have already been great time savers. I haven't tried Automator yet. The feed-reading capabilities of Safari RSS are actually better than I thought they'd be. I could even see reading a particular class of feeds there rather than in NetNewsWire.

Smart folders in Address Book mean I can fake tags by putting text like @filmmaker in the notes on a person and then creating a smart folder for cards whose notes contain @filmmaker. The birthday field integrated with iCal is very cool.

I'm giving Mail.app another chance. My 2 Gig+ of IMAP mail has pretty much been sync'ed. But without the ability to annotate mail, I can't fake tags with smart folders. Not sure I'll last on Mail.app without something like that.

by : Created on May 4, 2005 : Last modified May 4, 2005 : (permalink)

Watching Feynman

I'm just about finished watching the third of four lectures Richard Feynman gave at the University of Auckland in 1979. Besides being a fascinating overview of quantum electrodynamics for a general audience, it's wonderful to just see Feynman in action.

I'd always heard what a fantastic lecturer Feynman was and so I was keen to see him for myself. He was brilliant but not in the way I expected. He wasn't the most well-spoken person I've listened to; sometimes he would get a little lost in his train of thought, go off on tangents or start to say something only to decide not to proceed down that path; sometimes he'd make mistakes that he'd have to go back and correct.

So despite this, why were his lectures so good? A large part of it was his ability to extract out the key ideas of a theory and present them in a way that was relatively simple but still faithful to the full theory. This is true of his writings too. But what made his lecturing so good?

Four things come to mind:

his authority;
his humility;
his humour; and
his excitement.

His authority and his humility interacted in very interesting ways. Here was a man who was so comfortable with what he did and didn't know that he didn't need to boast. He could say to the audience "I'm not going to explain this because you wouldn't understand it" and not seem arrogant because he would just as often say "I'm not going to explain this because I don't understand it".

His humour was also remarkable; a combination of self-deprecation and genuine wit. When he made a mistake or decided to back-pedal on a topic or example he had started, he'd always recover in a way that made the audience laugh.

But the thing that stood out more than anything else was his excitement about what he was teaching. You can tell, watching the video, that he just loved explaining this stuff to people. If I had to pick one thing that set him apart it would be that.

Even if you are not really that interested in physics, watch at least one of the videos just to see what a truly great teacher is like.

by : Created on May 2, 2005 : Last modified May 2, 2005 : (permalink)

Quicktime 7 for Panther

I haven't upgraded to Tiger yet because it won't arrive from Amazon until tomorrow. But today Software Update on my Panther-running PowerBook informed me that Quicktime 7 was available.

Given that the Quicktime 7 upgrade is available for Panther, it isn't really accurate to say that Quicktime 7 is a feature of Tiger. It just happened to come out at the same time and so is bundled with it.

I wonder if any other so-called Tiger features will be available as free Software Updates for Panther users. I'm guessing Safari 2.0 or Mail might be another contender. Maybe even iChat AV.

by : Created on May 1, 2005 : Last modified May 1, 2005 : (permalink)

Happy Birthday

Happy 50th, Dave.

by : Created on May 1, 2005 : Last modified May 1, 2005 : (permalink)