James Tauber

journeyman of some

James Tauber's Blog

Welcome to my blog. It's a haphazard collection of thoughts on various interests of mine as well as updates on projects. If you're interested in any of blogging, personal information management, Python, Django, XML, RDF, software development, Web 2.0, open source, free-market economics, Mac OS X, web architecture, REST, music theory, record producing, filmmaking, linguistics, the Greek New Testament, pure mathematics or general relativity, there's a chance you may find something of interest.

Open Source Project: gyt

To conclude my week of open source projects I spent time in the hotel lobby and airport today implementing the beginnings of an idea I've had for a while.

gyt is (the start of) an implementation of Git-like ideas in Python (see github).

It's not intended to be a port of Git to Python. It's more designed as an exploration of how Git works and how the concepts might be applied to other tasks. In particular, I'm interested in exploring its use for versioning in-memory data structures rather than blobs on disk.

So in a way, gyt is to Git what Rel is to a relational database. In fact, the two might have some strong tie, in terms of gyt being used to version relations.

gyt will probably look less and less like Git internals over time. I'll likely change the name at some point as the distance between the two increases.
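
To make the idea of versioning in-memory data structures a little more concrete, here is a minimal sketch of a Git-like content-addressed store. It is not gyt's actual code: the names ObjectStore, put, get and commit are hypothetical, and using JSON plus SHA-1 for serialization and hashing is just an assumption to keep the example short.

```python
import hashlib
import json


class ObjectStore:
    """A toy content-addressed store: values are keyed by a hash of their content."""

    def __init__(self):
        self.objects = {}

    def put(self, value):
        # Serialize deterministically so equal values always get the same key.
        data = json.dumps(value, sort_keys=True).encode("utf-8")
        key = hashlib.sha1(data).hexdigest()
        self.objects[key] = data
        return key

    def get(self, key):
        return json.loads(self.objects[key].decode("utf-8"))

    def commit(self, value, parent=None, message=""):
        # A commit is just another stored object that points at a snapshot
        # of the value and (optionally) at the previous commit.
        snapshot = self.put(value)
        return self.put({"snapshot": snapshot, "parent": parent, "message": message})


store = ObjectStore()
first = store.commit({"x": 1}, message="initial")
second = store.commit({"x": 2}, parent=first, message="change x")

# Walk back from the latest commit to recover the earlier version.
previous = store.get(store.get(second)["parent"])
old_value = store.get(previous["snapshot"])
```

Because keys are derived from content, storing the same value twice costs nothing extra, which is part of what makes the Git model attractive for this kind of versioning.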

by James Tauber : Created on July 23, 2010 : Last modified July 23, 2010 : Categories python open_source git : (permalink)

Open Source Project: Rel

For my fourth open source project, I thought I'd get around to starting a repo for my various Relational Python explorations, including functional dependency analysis and some ideas I've been having lately that I haven't yet implemented.

Rel (github) is an exploration of the relational model and data analysis in Python.

I'm starting off just bringing together code I had on my blog from various posts in 2005, initially focusing on implementing relations, a few relational operators and exploring functional dependency analysis.

Still to come is broader support of the relational model, use of namedtuples, use of itertools, importers and exporters (including possible support for Django's fixtures format) and more utility functions I have scattered all over the place in various data analysis scripts I've written over the years.
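
For readers unfamiliar with the terms, here is a small sketch of what a relation, a relational operator and a functional dependency check might look like using namedtuples. This is an illustration only and not Rel's API: the Employee relation and the functions project and holds are hypothetical names made up for the example.

```python
from collections import namedtuple

Employee = namedtuple("Employee", ["name", "dept", "building"])

# A relation is modelled here as a set of tuples, so duplicates cannot occur.
employees = {
    Employee("alice", "sales", "north"),
    Employee("bob", "sales", "north"),
    Employee("carol", "research", "south"),
}


def project(relation, attributes):
    """Relational projection: keep only the named attributes, dropping duplicates."""
    return {tuple(getattr(row, a) for a in attributes) for row in relation}


def holds(relation, determinant, dependent):
    """Return True if the functional dependency determinant -> dependent holds."""
    seen = {}
    for row in relation:
        key = tuple(getattr(row, a) for a in determinant)
        value = tuple(getattr(row, a) for a in dependent)
        # The dependency fails if one key maps to two different values.
        if seen.setdefault(key, value) != value:
            return False
    return True


departments = project(employees, ["dept"])
dept_fixes_building = holds(employees, ["dept"], ["building"])
dept_fixes_name = holds(employees, ["dept"], ["name"])
```

In this sample data, dept determines building (every sales row is in the north building) but does not determine name, since sales has two different people in it.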

by James Tauber : Created on July 22, 2010 : Last modified July 22, 2010 : Categories python relational_python open_source : (permalink)

Open Source Project: FOP

I have to do an impromptu additional blog post because I just found out that my first big open source project, Apache FOP, just had its 1.0 release today.

I started FOP in 1998 and, in 1999, donated it to the Apache Software Foundation. I haven't been involved in its development for a long time but am delighted to see it reach 1.0, and from the articles written about its release, it sounds like it's actually used by a lot of well-known companies.

FOP was the first project I used Python on. It was the first open source project I did involving other contributors. It was the first large Java project I did (and I remember first grokking things like the Visitor Pattern in trying to solve design problems in FOP).

For a blast down memory lane, be sure to look at my archive of the old FOP website.

by James Tauber : Created on July 21, 2010 : Last modified July 21, 2010 : Categories fop open_source : (permalink)

Open Source Project: Pinax

It seems entirely appropriate that today's featured open source project should be Pinax.

I gave a talk on Pinax at OSCON this morning and at the end of the talk announced the availability of the first 0.9 alpha.

Pinax 0.5 was our first release. 0.7 represented a response to Pinax's first contact with actually building real sites and included fairly cutting-edge (at the time) use of virtualenv and pip.

The development work since then has come from a lot more experience building websites with Pinax. But while a lot of work has taken place, a lot of people didn't know about it because we went too long without a release.

Today that was rectified. Pinax 0.9a1 is now out and is available as easily as typing "pip install Pinax" (preferably inside a virtual environment).

Brian Rosner did a great write up on the mailing list about 0.9a1.

by James Tauber : Created on July 21, 2010 : Last modified July 21, 2010 : Categories python django open_source pinax : (permalink)

Open Source Project: parse-helper.js

Continuing my blogging about an open source project of mine each day during OSCON...

parse-helper (github) is a javascript library for building controls for assisting in the entry of parsing codes during linguistic annotation. I only just started it last week as part of a larger project called OXLOS, a Pinax-based platform for collaborative corpus linguistics.

The idea of parse-helper is that the controls could be attached to any text input expecting a parse code to be entered.

It currently includes support for the CCAT parsing codes for Ancient Greek (as used by the MorphGNT project). Other parsing schemes are planned.

The CCAT support includes filtering available attributes based on part-of-speech selected (and choice of verbal mood can further refine the options).

At the moment there is no support for going the other way and taking an existing parse code as a string and correctly showing the individual attribute values selected. This will be coming soon.

This project is at a very early stage and I'm sure the code could be improved a lot.

You can view a demo.

by James Tauber : Created on July 20, 2010 : Last modified July 20, 2010 : Categories linguistics morphgnt open_source javascript : (permalink)

Open Source Project: Czerny

As I'm at OSCON this week, I thought it would be fun to kick-start my blogging by blogging each day about some open source project I've worked on in the last year.

Today I want to introduce Czerny (github).

Czerny, named after Carl Czerny, the Austrian composer and piano teacher, is an early-stage Python project for assessing the performance of piano pieces.

The idea came when I was doing Charles-Louis Hanon's Virtuoso Pianist exercises. My thought was that it would be nice if a program compared my performance with the score and indicated not only mistakes, but deviations in velocity and timing.

The basic idea is:

  • record a performance of the exercise as MIDI (or similar) events
  • align the performed notes with the "score" notes
  • identify errors as well as fluctuations in timing, velocity, etc

The first two items are at a very early prototype stage. The third has not yet been started on.

Czerny includes a pyrex wrapper around OS X's Core MIDI library and a Python script for outputting events coming in from a MIDI keyboard. At some point it could also just read MIDI files (SMF) but for now, it records MIDI input into its own simple file format.

Alignment is currently done via my implementation of the Needleman-Wunsch alignment algorithm. There's a lot more work I plan to do on the note difference function, but I need more data first.
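
As an illustration of the alignment step, here is a compact sketch of Needleman-Wunsch global alignment applied to two sequences of MIDI pitch numbers. It is not Czerny's implementation: the function name align is hypothetical, and the note difference function is reduced to the simplest possible assumption (a fixed reward for identical pitches, a fixed penalty for a mismatch or a gap).

```python
def align(score, performance, match=1, mismatch=-1, gap=-1):
    """Globally align two note sequences.

    Returns a list of (score_note, performed_note) pairs, with None
    standing in for a note missing on one side.
    """
    n, m = len(score), len(performance)
    # table[i][j] is the best total for aligning score[:i] with performance[:j].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        table[i][0] = i * gap
    for j in range(1, m + 1):
        table[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = match if score[i - 1] == performance[j - 1] else mismatch
            table[i][j] = max(
                table[i - 1][j - 1] + step,  # pair the two notes
                table[i - 1][j] + gap,       # score note was dropped
                table[i][j - 1] + gap,       # extra note was played
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = match if score[i - 1] == performance[j - 1] else mismatch
            if table[i][j] == table[i - 1][j - 1] + step:
                pairs.append((score[i - 1], performance[j - 1]))
                i -= 1
                j -= 1
                continue
        if i > 0 and table[i][j] == table[i - 1][j] + gap:
            pairs.append((score[i - 1], None))
            i -= 1
        else:
            pairs.append((None, performance[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


# The performer skipped the E (64) in a five-note run.
dropped = align([60, 62, 64, 65, 67], [60, 62, 65, 67])
# The performer brushed an extra C sharp (61) between two notes.
added = align([60, 62], [60, 61, 62])
```

A pair with None on the right marks a dropped note and a pair with None on the left marks an added one. A richer difference function could also weigh timing and velocity rather than pitch alone.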

I haven't yet started on actually interpreting the differences once an alignment has been made. The obvious feedback I can give is in notes added or dropped. But my goal is also to express variations in velocity and timing. If the "score" has fingering, I could also eventually give stats on the performance of each finger, perhaps indicating which need more work.

A long-term goal might also be studying the performance, not of piano exercises, but real pieces to identify and learn patterns in how scores are mapped to performances.

by James Tauber : Created on July 19, 2010 : Last modified July 19, 2010 : Categories python music_theory open_source : (permalink)

Conference Time

I have four conferences coming up in the next eight weeks.

From 12th-14th February, I'll be attending Kiwi Foo Camp in New Zealand—one of those trips where the travel time is longer than the length of the conference :-)

The day after I get back, I'm off to Atlanta for PyCon. I'm involved in the Pinax tutorial at the start and will be staying all the way through the sprints where we hope to get lots of Pinax done!

Then March 10th-12th I'm in Montréal for ConFoo, the first conference I've been to in a while that's all expenses paid for speakers. I'll be giving a talk on, you guessed it, Pinax. Will be fun to introduce Pinax at a general Web conference.

I'll finish off the month in San Jose for BibleTech March 26th and 27th. I'll be giving two talks there, one on my graded reader project and one on using Pinax for collaborative corpus linguistics (partly talking about Pinax in general and partly talking about some early stage work I'm doing specifically on corpus annotation tools in Pinax).

Hope to see many of you at at least one of them!

by James Tauber : Created on Feb. 5, 2010 : Last modified Feb. 5, 2010 : Categories python linguistics morphgnt travel conferences speaking language_learning pycon greek new_testament_greek read_john pinax graded_reader : (permalink)

Zeno Processing

Say you have a stream of incoming data. Perhaps it's a database table that's monotonically increasing.

You want to do some processing on it that will take a long time because of the size of the data. Say it's one million records.

You take a snapshot and process the million records. Say that takes 4 hours. In the meantime, ten thousand new records have come in. So you take a snapshot and process those. Say that takes two minutes. In the meantime, a hundred new records have come in. So you take a snapshot of those...and so on.

The analogy with the paradoxes of Zeno of Elea is obvious and so Nicholas Tollervey and I have decided "Zeno processing" might be a useful term for this approach.

At some point the processing is quick enough that either no new data comes in or you can take the stream down for enough time to finish off the processing.
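
Here is a small simulation of the idea, as a sketch rather than a real pipeline. The function name zeno_process is hypothetical, and the constant rates are an assumption inferred from the example above: a million records in 4 hours is 250,000 records per hour processed, and ten thousand arrivals in those 4 hours is 2,500 records per hour incoming.

```python
def zeno_process(pending, arrival_rate, processing_rate):
    """Simulate repeated snapshot-and-process passes.

    pending: number of records waiting at the start
    arrival_rate: new records arriving per hour
    processing_rate: records processed per hour
    Returns the size of the batch handled in each pass.
    """
    if arrival_rate >= processing_rate:
        raise ValueError("batches only shrink if processing outpaces arrivals")
    batches = []
    while pending > 0:
        batches.append(pending)
        # Records that arrive while this batch is being processed (rounded down).
        pending = pending * arrival_rate // processing_rate
    return batches


passes = zeno_process(1_000_000, 2_500, 250_000)
```

With these rates each pass is a hundredth the size of the one before it, so the batches shrink geometrically and the loop ends after a handful of passes.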

I'm sure there's an existing name for this technique, but I like "Zeno processing".

by James Tauber : Created on Feb. 1, 2010 : Last modified Feb. 1, 2010 : (permalink)

Good Week for Launches

Last week was a pretty amazing week for the Eldarion team. Amidst a ton of client work, we managed to:

Readers of this blog who remember Potter Predictions will immediately recognize aspects of the second and third sites.

On Friday I managed to squeeze in time to attend a workshop at Harvard on Morphological Complexity where I saw a lot of familiar faces from when I was more active in my PhD.

Over the weekend, I worked on a new website for the Pinax project but didn't get that done so unfortunately missed out on launching a fourth site. I guess I also need to get back to redoing this site at some point too :-)

by James Tauber : Created on Jan. 25, 2010 : Last modified Jan. 25, 2010 : (permalink)

Fake JKM

I thought I'd kick off my blogging in 2010 with this video of my short opening talk at DjangoCon 2009 in Portland.

The talk was inspired, in part, by Rives on 4 AM.

by James Tauber : Created on Jan. 2, 2010 : Last modified Jan. 2, 2010 : Categories django conferences speaking : 366 comments (permalink)

Information Architecture On This Site

It seems such a shame to have done so much musing about this site and its contents over the last 13 years and then not to blog about it here. So here is today's effort for my site sprint:

http://journeymanofsome.com/information_architecture/

by James Tauber : Created on Nov. 17, 2009 : Last modified Nov. 17, 2009 : Categories this_site blogging site_sprint : 420 comments (permalink)

Site Sprint

I've been thinking about re-doing this site for a while, so I've decided to do it as part of SiteSprint II.

My goals are to:

  • freshen up the design
  • rethink the information architecture of my site
  • adapt to the shifting role blogging now has for me
  • explore some new technologies like HTML5 and typekit
  • rewrite the underlying code and make it more reusable

I'll be prototyping the new site over at http://journeymanofsome.com during the sprint and then will move it over to jtauber.com before the end of the year.

I'll be making daily changes there so you might want to check it out often!

In other news: today marks the 3-year anniversary of my decision to switch to using Django for my sites rather than continue to build my own framework (see Quisition Going Django).

by James Tauber : Created on Nov. 15, 2009 : Last modified Nov. 15, 2009 : Categories blogging web this_site leonardo django web_design site_sprint : 443 comments (permalink)

DjangoCon Talks on Pinax

DjangoCon is on next month in Portland, Oregon and the initial talk schedule has just gone up. There are three talks on Pinax I'm directly involved with (and maybe others that will touch on Pinax):

  • an introductory tutorial for Pinax beginners
  • a short talk on how to contribute to Pinax
  • a general State of Pinax talk

If you're new to Pinax, you should consider attending the first talk. If you're already using Pinax you may still learn some stuff but we'll definitely tread some ground that would already be familiar to you.

The second talk is just a brief introduction to how our development process works, how you can get involved and contribute and what sorts of things you might want to work on. If you've already started using Pinax and are looking to learn how to get more involved in the project itself, this short talk is for you.

The third talk is an updated State of Pinax talk that will cover what's in 0.7 and what the plans are for 0.8 and 0.9 and beyond. It should have something of interest for everyone, Pinax beginners and experts alike.

We'll also be sprinting on Pinax following the conference itself. If you're interested in joining the sprint, the How to Contribute to Pinax talk is probably a must.

I look forward to seeing a bunch of you at DjangoCon.

Eldarion is a silver sponsor of DjangoCon.

by James Tauber : Created on Aug. 11, 2009 : Last modified Aug. 11, 2009 : Categories django conferences speaking pinax : (permalink)

Eldarion Launched

With my visa finally processed, my being able to be employed by the company I helped found, and my return to the US, I am delighted to be able to announce that we've launched the Eldarion website and with it, the company.

http://eldarion.com/

Just a single page at the moment, but I think it's a good start. Thanks to Ryan Berg for turning my initial design into something good looking. And of course thanks to Greg Newman for the logo, which I've previously talked about.

by James Tauber : Created on June 29, 2009 : Last modified June 29, 2009 : Categories django announcements pinax : (permalink)

Affianced

Well, it's been a two-and-a-half month blog drought but it's worth breaking for this news.

Two weeks ago, the love of my life, Lisa, and I got engaged in Paris.

Of course if you follow me on Twitter or Facebook, you knew that at the time (or shortly thereafter) but it felt like it should be declared here too!

I first met (and fell for) Lisa in 1992 although I didn't see her again until 1996. People often ask me what took me so long to ask. Truth is, I would have proposed in 1996 (okay, maybe 1997) if I thought she would have said yes. I guess entrepreneurs and lawyers have different risk profiles :-)

My time with her, especially the last four years, has been the best of my life.

I'd always imagined I'd marry my best friend. And now I will be.

by James Tauber : Created on June 25, 2009 : Last modified June 25, 2009 : Categories announcements personal : 5 comments (permalink)

Pinax Video and Move to GitHub

The audio isn't great (they didn't seem to be directly recording the mic I was holding) and it got cut short, but for what it's worth, here's the video of my talk on Pinax at PyCon 2009:

http://blip.tv/file/1952623

Following the conference proper, we had a great sprint that got us within a stone's throw of a 0.7 beta release. One of the first things we did was move from svn at Google Code Project Hosting to git at GitHub. The new home of the Pinax source code is now:

http://github.com/pinax/pinax/

We also moved our wiki and task tracking off Google Code Project Hosting to our own, Pinax-based system:

http://code.pinaxproject.com/

Enjoy!

by James Tauber : Created on April 7, 2009 : Last modified April 7, 2009 : Categories pycon pinax : 4 comments (permalink)

The Long Pay Off: Battlestar Galactica and Lost

NOTE: This may contain SPOILERS for Babylon 5, Battlestar Galactica and Lost.

I'm a big fan of episodic television with long arcs. I love pay-offs that are separated by months or even years from their initial set up, where you can go back and watch earlier episodes and see things you never noticed before.

Back when I was watching the X Files, it was always the so-called "myth arc" episodes I liked and never the "monster of the week". But it eventually became clear that even the myth arc episodes in the X Files were being made up as they went along. I felt duped.

So it was with great anticipation in 2000 that I started watching reruns of Babylon 5, a show famous for having been planned out in advance. Despite numerous flaws, this planning paid off multiple times. There were multiple OMFG moments where you realize how pieces of the puzzle from previous seasons fit perfectly together. In one classic multi-season arc there is a mysterious episode in season 1 involving time travel two years into the future. In season 3 you see the other half. Amazing stuff that would be very hard to pull off without the show being planned in advance.

Of course there are dangers with planning in advance. B5 suffered from story changes due to actors leaving.

This brings me to two of my favourite shows on TV at the moment: Battlestar Galactica and Lost.

Lost has supposedly been planned in advance and BSG largely made up as they went along.

If you'd asked me at the end of last year, I would have said the two shows were counter examples to the need for planning in advance. I didn't feel like Lost was giving me the payoffs and I felt like BSG was doing a great job without the long arcs. This year I've changed my mind. Lost is starting to get some pretty cool payoffs (although it could still screw things up) and BSG is breaking up under the stress of not being worked out in advance (although perhaps the finale will redeem it).

At least BSG had a definite end point planned for a while. There's nothing worse than a show that pretends to be teleological but is being written ad hoc with no real end in sight.

There are also opportunities available to the story teller when things aren't planned too much in advance. With BSG I didn't feel there were any truly amazing payoffs (again, the finale could prove me wrong) but there were certainly moments that were so out of left field and unpredictable it made for awesome story telling. It leads to a completely different kind of OMFG moment. Had BSG been planned in advance, those events might have been easier to see coming.

I guess that's the trade off. Plan in advance and you can get amazing payoffs but you run the risk of leaving too many clues so there are no real surprises. Write ad hoc and you can totally surprise the audience but also expose yourself to too many plot holes. BSG has one more shot to see if it can successfully close some of those holes.

Despite their flaws, I think Battlestar Galactica and Lost are both excellent examples of what works about each approach.

UPDATE: Here's a (spoiler filled) post on 12 Plotholes That Must Be Filled in the Battlestar Finale

by James Tauber : Created on March 20, 2009 : Last modified March 20, 2009 : 2 comments (permalink)

Eldarion Logo

For years, I've had "Eldarion" in the back of my mind as a name I'd like to use for some creative endeavour. It is a reference to Tolkien: Eldarion was the son of Aragorn and Arwen and the Second High King of the Reunited Kingdom.

Much to my surprise, the domain was available when I checked back in 2002, so I registered it immediately.

At first I thought I'd use the name for my film and music endeavours. When I attended SXSW Film and Music in 2005, I actually had business cards printed that said "Eldarion". They were black cards with white writing, a reference to the Gondorian flag.

I planned a logo that would feature the White Tree of Gondor and the seven stars of the House of Elendil but never had the skill to pull it off.

With a need for a logo coming up again recently, I approached my friend Greg Newman. My brief to him was pretty much as follows:

  • use the text "eldarion" in Anivers (the font family I'd chosen a couple of months earlier)
  • feature some reference to the White Tree and/or the Seven Stars
  • make it silver on dark grey rather than white on black to give it more life

He blew me away with the result and the response has been phenomenal.

It may strike some as unusual to choose a fantasy reference for a high-tech startup. But I'm reminded of a wonderful scene in the British comedy Yes Prime Minister where the PM is advised that if his first television broadcast is to say nothing new and exciting then he should wear a modern suit, the background should feature abstract paintings and the opening music should be Stravinsky. On the other hand, if the broadcast is to contain radical new announcements, then he should wear a dark suit, the background should feature oak paneling, leather volumes and 18th century portraits and the opening music should be Bach.

The same idea applies, I think, to choosing ancient symbols from the legendarium of Tolkien for what is intended to be a very modern and forward thinking new company.

Stay tuned for lots more about Eldarion.

by James Tauber : Created on March 2, 2009 : Last modified March 2, 2009 : 6 comments (permalink)

Kindle 2: First Impressions

I was asleep when I heard a knock at the door. I knew it was UPS. I jumped up and threw some clothes on but by the time I got to the door, they had gone. I ran down the hall to the elevator. I had just missed them. Caught the next elevator and caught them in the lobby of my apartment building just as they were leaving. Phew! I had my Kindle 2.

I didn't have the original Kindle and I've never used any kind of electronic reader so this was a new experience for me. I love books and own A LOT of them. But spending time on three continents, at conferences and on airplanes means I'm looking forward to more books at my fingertips and avoiding the agony of "which 0.1% of my books can I take with me on this flight".

Here are my first impressions:

  • they put thought into the unpackaging experience
  • even though I've seen the photos, it still seemed smaller once I held it in my hand than I thought it would be
  • the Amazon leather cover I bought adds a bit of weight and bulk to the device
  • the fact the screen can show stuff while the device is turned off freaked me out at first
  • the device is comfortable to hold and I imagine being able to read for long periods with this
  • as an electronic-ink newbie, I was very impressed by the readability of the screen
  • I didn't like the font when I first looked at it but once I started reading it didn't bother me
  • it worked out of the box, was linked to my Amazon account and had a book I'd bought before the device arrived
  • knowing how far I am through a book as a percentage is a little freaky at times
  • it took me a little while to get the hang of the navigation (beyond basic turn pages, which is easy)
  • text-to-speech is impressive but can't imagine using it
  • I wish trying to go up from the first selection would wrap around to the bottom selection. It's too cumbersome to select a choice towards the end of the screen
  • downloading books is FAST — couple of seconds for each of the two books I bought
  • browsing web pages is like turning CSS off
  • User Agent came through as "Mozilla/4.0 (compatible; Linux 2.6.10) NetFront/3.4 Kindle/1.0 (screen 600x800)"
  • I'm already thinking about web applications for it (especially my graded reader ideas and flashcards)
  • the immediacy and ease at which you can buy books could be dangerous :-)

I'm very happy so far and can imagine buying the majority of my books for the Kindle from now on. The real test will be whether I'll go back and re-buy any of my existing books (especially the ones back in storage in Australia).

UPDATE: just discovered the Web browser has an Advanced Mode that does CSS and Javascript. This site doesn't look too great with it, though :-)

by James Tauber : Created on Feb. 25, 2009 : Last modified Feb. 25, 2009 : Categories books kindle : 1 comment (permalink)

Leaving mValent

People who follow me on twitter or are friends on Facebook already know this, but last week I officially resigned from mValent.

It was over seven years ago that Duane Tharp and Clyde Logue approached me about joining them in a new venture post-Bowstreet. I had already made the decision to move back to Australia from the US and they were still willing to hire me despite being remote. Within a year we were venture funded, had hired a CEO and VP Engineering (who became my boss) and were starting to build the team.

mValent went through distinct phases and so I don't necessarily feel like I've worked in the same place for seven years but rather three or four different companies. I don't think I ever really did the job I thought I would be doing, but I learnt a tremendous amount on the business side of things and made a number of really good friends.

But it's time for me to move on. The technology I helped create and the people I helped hire now have an excellent home at Oracle. I'm ready for something new.

As to what that is: I hope to have more to say very soon...

by James Tauber : Created on Feb. 23, 2009 : Last modified Feb. 23, 2009 : Categories mvalent announcements personal entrepreneurship : 4 comments (permalink)