James Tauber : James Tauber's Blog

Welcome to my blog. It's a haphazard collection of thoughts on various interests of mine as well as updates on projects. If you're interested in any of blogging, personal information management, Python, Django, XML, RDF, software development, Web 2.0, open source, free-market economics, Mac OS X, web architecture, REST, music theory, record producing, filmmaking, linguistics, the Greek New Testament, pure mathematics or general relativity, there's a chance you may find something of interest.

Julython 2012 First Week

This month I'm participating in Julython 2012, mostly just an excuse for me to dust off some old open source projects and release some new ones I've been working on privately. The timing is particularly appropriate as at the end of the month I'm speaking at PyOhio on crazy little open source projects in Python.

Here's what I worked on this week:

pyuca: an implementation of the Unicode Collation Algorithm

Allows proper sorting of non-English words. Wrote this code a while ago for a blog post but now it's on github and PyPI.

ultima4: code for exploring Ultima IV data files

Older code, put on github for the first time. Extracts world and town maps as well as town inhabitant and talk data.

skyrim: code for exploring Skyrim data files

This has been on github for a while but didn't handle PNG textures properly. The problem had been identified via a StackExchange question but I only implemented the solution in my code this week.

lotro: code for exploring LOTRO data files

Very old code and not even all I have, but I decided to start releasing it and work a little more on it again.

pyifs: an iterated function system

Originally written a couple of years ago but never made public. As well as open sourcing it, I changed it from outputting PPM files to PNGs, added a linear transformation, cleaned up the code a little and added some example images.

minilight: a global illumination renderer

This has been open source for a while. This week, however, I changed it from outputting PPM files to PNGs.

cassidy: a CSS library for Python

Started this recently but this week open sourced it and added lexing and parsing of CSS selectors with PLY.

sebastian: symbolic music analysis and composition library

Refactored project layout and did a proper release to PyPI. More planned this month.

czerny: a tool for assessing the performance of piano exercises

Just clarified license. Lots more planned this month.

pinax-project-zero: base project layout for Pinax projects

Mostly just tweaked differences between master and dev. Hoping for a django-user-accounts release soon so I can move on to other Pinax starter projects.

pycon: website for PyCon US

Working specifically on PyCon US 2013 site this week but it's open source and being abstracted out as Pinax Symposion for conferences in general. This work is sponsored by the Python Software Foundation.

gondor-client: official Gondor command line client

Didn't intend to work on this but needed to make a fix. It's in Python and open source so I guess it counts :-)

If you'd like to support my personal open source work, I'm on Gittip. Open source Django work can always be sponsored through Eldarion.

by : Created on July 8, 2012 : Last modified July 8, 2012 : (permalink)

Quisition Redesign

Quisition will be very familiar to long-time readers of this blog. The online learning site, which is currently focused on flashcards, was first talked about here in 2005. I built a Python + PostgreSQL (but not Django) version in 2005/2006 and ported it to Django in 2006, relaunching the new version in 2007.

Quisition was the site that originally gave me the idea for Pinax. It's also the site that made me want to get back to building websites for a living and start Eldarion. So, as you can imagine, it's very dear to my heart.

The last few months, we've been working on fleshing out the topics feature on the site as well as a complete redesign from scratch. That work was completed and launched last week.

http://quisition.com/

I'm very excited about our plans for the site in 2012.

by : Created on Jan. 1, 2012 : Last modified Jan. 1, 2012 : (permalink)

New Site Design

As part of the move of my website to Gondor, I've completely redesigned my website around Skeleton. There's still a lot missing—media, comments, categories—but I can work on those iteratively now that the new site is up. It's still running the Django port of Leonardo but over time I'll update it to use Pinax's biblion app.

by : Created on Nov. 19, 2011 : Last modified Nov. 21, 2011 : (permalink)

ApplePy Now On GitHub

Back in 2001 I started writing an Apple ][ emulator in Python. In 2006 I started a rewrite but today I went back to my original 2001 version, updated it a little and put it up on GitHub.

There are still issues to sort out but you can get the code at: http://jtauber.com/applepy/

by : Created on Aug. 6, 2011 : Last modified Aug. 6, 2011 : (permalink)

Rebasing MorphGNT off SBLGNT

Reposted from http://morphgnt.org/blog/2011/01/18/rebasing-morphgnt-sblgnt/:

The last three months, I've been working on rebasing the MorphGNT database off the SBLGNT text rather than the UBS3.

While I have had permission to work with the CCAT database for over a decade, the fact the UBS3 text can be extracted from it has always been problematic. The existence of the SBLGNT solves the problem of having a critical text with clear licensing and so, in October 2010, I started the process of moving the MorphGNT analysis to the SBLGNT text.

This task is mostly done and the work-in-progress is available on GitHub at https://github.com/morphgnt/sblgnt.

It was a three-step process, done one book at a time.

A Python script was used to do a first-pass alignment. The script allowed for differences in punctuation, accentuation, capitalization and movable-nu.
Any differences were then manually inspected and corrected. In 90% of cases it was a simple re-ordering of words but in the other 10%, a fresh analysis had to be made. These analyses were then checked against various sources such as BDAG, Perseus and the Lexham Reverse Interlinear.
Finally, I wrote another Python script that checked various heuristics.
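The kind of normalization a first-pass alignment like this relies on can be sketched as follows. This is only an illustration, not the actual MorphGNT script: the function name `normalize` is hypothetical, and the movable-nu rule (dropping a final ν after ι or ε) is a deliberately crude assumption that will over-normalize some words.

```python
import unicodedata


def normalize(word):
    """Reduce a Greek word to a form that ignores punctuation,
    accentuation, capitalization and (crudely) movable nu."""
    # Decompose so accents and breathings become separate combining marks.
    decomposed = unicodedata.normalize("NFD", word)
    kept = [
        ch for ch in decomposed
        if not unicodedata.combining(ch)                   # drop accents, breathings
        and not unicodedata.category(ch).startswith("P")   # drop punctuation
    ]
    # casefold() lowercases and also maps final sigma to regular sigma.
    folded = "".join(kept).casefold()
    # Assumption: treat a trailing nu after iota or epsilon as movable.
    if len(folded) > 2 and folded.endswith(("ιν", "εν")):
        folded = folded[:-1]
    return folded


# Two tokens are candidates for automatic alignment if they normalize alike.
same = normalize("ἐστίν") == normalize("ἐστί,")
```

Anything that still differs after normalization would fall through to the manual inspection described in the second step.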

I'm in the process of making a batch of corrections based on the third step and then I'll formally release what will be called MorphGNT 6.0 (although possibly as a beta such as 6.0b1).

The next step (which I've started in parallel) will merge in the Robinson analysis and parse codes on the road to a completely new set of parse codes for MorphGNT 7.0.

by : Created on Jan. 18, 2011 : Last modified Jan. 18, 2011 : (permalink)

Cats or Dogs is Back!

Back in February 2007, I launched Cats or Dogs, my first "little" site in Django (Quisition had already been ported at that stage but was a proper grown-up site).

Cats or Dogs asked you to pick between alternatives and built a set of correlations based on community tendencies.

We've now relaunched it at

http://cats-or-dogs.com

with the main addition for now being OAuth support to use Twitter or Facebook to attach your answers to an account (so they persist across sessions better).

We have a few things planned but for now we just need to build up more answers to give better correlations.

Enjoy!

by : Created on Aug. 7, 2010 : Last modified Aug. 7, 2010 : (permalink)

Open Source Project: gyt

To conclude my week of open source projects I spent time in the hotel lobby and airport today implementing the beginnings of an idea I've had for a while.

gyt is (the start of) an implementation of Git-like ideas in Python (see github).

It's not intended to be a port of Git to Python. It's more designed as an exploration of how Git works and how the concepts might be applied to other tasks. In particular, I'm interested in exploring its use for versioning in-memory data structures rather than blobs on disk.
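To make the idea concrete, here is a minimal sketch of Git-style content addressing applied to in-memory data. It is not gyt's actual API: the names `ObjectStore` and `commit` are hypothetical, and serializing with JSON is an assumption made to keep the example short (Git itself uses its own object format).

```python
import hashlib
import json


class ObjectStore:
    """Content-addressed store: each value is keyed by the SHA-1
    of its serialized form, so identical values share one entry."""

    def __init__(self):
        self._objects = {}

    def put(self, value):
        # sort_keys makes the serialization (and so the hash) deterministic.
        data = json.dumps(value, sort_keys=True).encode("utf-8")
        key = hashlib.sha1(data).hexdigest()
        self._objects[key] = data
        return key

    def get(self, key):
        return json.loads(self._objects[key].decode("utf-8"))


def commit(store, value, parent=None):
    """Store a snapshot of `value` plus a pointer to the parent commit."""
    tree = store.put(value)
    return store.put({"tree": tree, "parent": parent})


store = ObjectStore()
first = commit(store, {"name": "gyt", "tags": ["python"]})
second = commit(store, {"name": "gyt", "tags": ["python", "git"]}, parent=first)
```

Walking `parent` pointers from `second` back to `first` gives the history of the data structure, and each `tree` key retrieves the snapshot at that point.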

So in a way, gyt is to Git what Rel is to relational databases. In fact, the two might have some strong tie, in terms of gyt being used to version relations.

gyt will probably look less and less like Git internals over time. I'll likely change the name at some point as the distance between the two increases.

by : Created on July 23, 2010 : Last modified July 23, 2010 : (permalink)

Open Source Project: Rel

For my fourth open source project, I thought I'd get around to starting a repo for my various Relational Python explorations, including functional dependency analysis and some ideas I've been having lately that I haven't yet implemented.

Rel (github) is an exploration of the relational model and data analysis in Python.

I'm starting off just bringing together code I had on my blog from various posts in 2005, initially focusing on implementing relations, a few relational operators and exploring functional dependency analysis.
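As a rough illustration of what that involves (not Rel's actual code), a relation can be modelled as a list of dicts, with projection as one relational operator and a check for whether a functional dependency holds. The function names `project` and `holds` and the sample data are hypothetical.

```python
def project(relation, attributes):
    """Relational projection: keep only the named attributes.
    Returning a set removes duplicate tuples, as the model requires."""
    return {tuple(row[a] for a in attributes) for row in relation}


def holds(relation, determinant, dependent):
    """Return True if the functional dependency determinant -> dependent
    holds, i.e. equal determinant values always imply equal dependent values."""
    seen = {}
    for row in relation:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        # setdefault records the first value seen for this key.
        if seen.setdefault(key, value) != value:
            return False
    return True


employees = [
    {"name": "alice", "dept": "eng", "building": "A"},
    {"name": "bob", "dept": "eng", "building": "A"},
    {"name": "carol", "dept": "ops", "building": "B"},
]

dept_determines_building = holds(employees, ["dept"], ["building"])
building_determines_name = holds(employees, ["building"], ["name"])
```

Here `dept -> building` holds because every department maps to a single building, while `building -> name` fails because building A houses two people.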

Still to come is broader support of the relational model, use of namedtuples, use of itertools, importers and exporters (including possible support for Django's fixtures format) and more utility functions I have scattered all over the place in various data analysis scripts I've written over the years.

by : Created on July 22, 2010 : Last modified July 22, 2010 : (permalink)

Open Source Project: FOP

I have to do an impromptu additional blog post because I just found out that my first big open source project, Apache FOP, just had its 1.0 release today.

I started FOP in 1998 and, in 1999, donated it to the Apache Software Foundation. I haven't been involved in its development for a long time but am delighted to see it reach 1.0, and from the articles written about its release, it sounds like it's actually used by a lot of well-known companies.

FOP was the first project I used Python on. It was the first open source project I did involving other contributors. It was the first large Java project I did (and I remember first grokking things like the Visitor Pattern in trying to solve design problems in FOP).

For a blast down memory lane, be sure to look at my archive of the old FOP website.

by : Created on July 21, 2010 : Last modified July 21, 2010 : (permalink)

Open Source Project: Pinax

It seems entirely appropriate that today's featured open source project should be Pinax.

I gave a talk on Pinax at OSCON this morning and at the end of the talk announced the availability of the first 0.9 alpha.

Pinax 0.5 was our first release. 0.7 represented a response to Pinax's first contact with actually building real sites and included fairly cutting-edge (at the time) use of virtualenv and pip.

The development work since then has come from a lot more experience building websites with Pinax. But while a lot of work has taken place, a lot of people didn't know about it because we went too long without a release.

Today that was rectified. Pinax 0.9a1 is now out and is available as easily as typing "pip install Pinax" (preferably inside a virtual environment).

Brian Rosner did a great write up on the mailing list about 0.9a1.

by : Created on July 21, 2010 : Last modified July 21, 2010 : (permalink)

Open Source Project: parse-helper.js

Continuing my blogging about an open source project of mine each day during OSCON...

parse-helper (github) is a JavaScript library for building controls for assisting in the entry of parsing codes during linguistic annotation. I only just started it last week as part of a larger project called OXLOS, a Pinax-based platform for collaborative corpus linguistics.

The idea of parse-helper is that the controls could be attached to any text input expecting a parse code to be entered.

It currently includes support for the CCAT parsing codes for Ancient Greek (as used by the MorphGNT project). Other parsing schemes are planned.

The CCAT support includes filtering available attributes based on part-of-speech selected (and choice of verbal mood can further refine the options).

At the moment there is no support for going the other way and taking an existing parse code as a string and correctly showing the individual attribute values selected. This will be coming soon.

This project is at a very early stage and I'm sure the code could be improved a lot.

You can view a demo.

by : Created on July 20, 2010 : Last modified July 20, 2010 : (permalink)

Open Source Project: Czerny

As I'm at OSCON this week, I thought it would be fun to kick-start my blogging by blogging each day about some open source project I've worked on in the last year.

Today I want to introduce Czerny (github).

Czerny, named after Carl Czerny, the Austrian composer and piano teacher, is an early-stage Python project for assessing the performance of piano pieces.

The idea came when I was doing Charles-Louis Hanon's Virtuoso Pianist exercises. My thought was that it would be nice if a program compared my performance with the score and indicated not only mistakes, but deviations in velocity and timing.

The basic idea is:

record a performance of the exercise as MIDI (or similar) events
align the performed notes with the "score" notes
identify errors as well as fluctuations in timing, velocity, etc

The first two items are at a very early prototype stage. The third has not yet been started on.

Czerny includes a pyrex wrapper around OS X's Core MIDI library and a Python script for outputting events coming in from a MIDI keyboard. At some point it could also just read MIDI files (SMF) but for now, it records MIDI input into its own simple file format.

Alignment is currently done via my implementation of the Needleman-Wunsch alignment algorithm. There's a lot more work I plan to do on the note difference function, but I need more data first.
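For readers unfamiliar with the algorithm, a minimal Needleman-Wunsch sketch over note sequences might look like the following. It is not Czerny's implementation: notes are assumed to be plain MIDI pitch numbers, the note difference function is reduced to simple equality, and the `match`, `mismatch` and `gap` scores are arbitrary illustrative values.

```python
def needleman_wunsch(score, performance, match=1, mismatch=-1, gap=-1):
    """Globally align two note sequences.

    Returns a list of (score_note, performed_note) pairs, where None on
    the right marks a dropped note and None on the left an added note.
    """
    rows, cols = len(score) + 1, len(performance) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        table[i][0] = i * gap
    for j in range(1, cols):
        table[0][j] = j * gap

    # Fill the dynamic programming table.
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if score[i - 1] == performance[j - 1] else mismatch
            table[i][j] = max(
                table[i - 1][j - 1] + step,  # align the two notes
                table[i - 1][j] + gap,       # score note was dropped
                table[i][j - 1] + gap,       # extra note was played
            )

    # Trace back from the bottom-right corner to recover the alignment.
    aligned = []
    i, j = len(score), len(performance)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = match if score[i - 1] == performance[j - 1] else mismatch
            if table[i][j] == table[i - 1][j - 1] + step:
                aligned.append((score[i - 1], performance[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and table[i][j] == table[i - 1][j] + gap:
            aligned.append((score[i - 1], None))
            i -= 1
        else:
            aligned.append((None, performance[j - 1]))
            j -= 1
    aligned.reverse()
    return aligned


# The performer skipped the third note (MIDI pitch 64) of a five-note run.
dropped = needleman_wunsch([60, 62, 64, 65, 67], [60, 62, 65, 67])
```

A richer note difference function would replace the equality test, for example by scoring near misses in pitch or timing less harshly than unrelated notes.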

I haven't yet started on actually interpreting the differences once an alignment has been made. The obvious feedback I can give is in notes added or dropped. But my goal is also to express variations in velocity and timing. If the "score" has fingering, I could also eventually give stats on the performance of each finger, perhaps indicating which need more work.

A long-term goal might also be studying the performance, not of piano exercises, but real pieces to identify and learn patterns in how scores are mapped to performances.

by : Created on July 19, 2010 : Last modified July 19, 2010 : (permalink)

Conference Time

I have four conferences coming up in the next eight weeks.

From 12th-14th February, I'll be attending Kiwi Foo Camp in New Zealand—one of those trips where the travel time is longer than the length of the conference :-)

The day after I get back, I'm off to Atlanta for PyCon. I'm involved in the Pinax tutorial at the start and will be staying all the way through the sprints where we hope to get lots of Pinax done!

Then March 10th-12th I'm in Montréal for ConFoo, the first conference I've been to in a while that's all expenses paid for speakers. I'll be giving a talk on, you guessed it, Pinax. Will be fun to introduce Pinax at a general Web conference.

I'll finish off the month in San Jose for BibleTech March 26th and 27th. I'll be giving two talks there, one on my graded reader project and one on using Pinax for collaborative corpus linguistics (partly talking about Pinax in general and partly talking about some early stage work I'm doing specifically on corpus annotation tools in Pinax).

Hope to see many of you at at least one of them!

by : Created on Feb. 5, 2010 : Last modified Feb. 5, 2010 : (permalink)

Zeno Processing

Say you have a stream of incoming data. Perhaps it's a database table that's monotonically increasing.

You want to do some processing on it that will take a long time because of the size of the data. Say it's one million records.

You take a snapshot and process the million records. Say that takes 4 hours. In the meantime, ten thousand new records have come in. So you take a snapshot and process those. Say that takes two minutes. In the meantime, a hundred new records have come in. So you take a snapshot of those...and so on.

The analogy with the paradoxes of Zeno of Elea is obvious and so Nicholas Tollervey and I have decided "Zeno processing" might be a useful term for this approach.

At some point the processing is quick enough that either no new data comes in or you can take the stream down for enough time to finish off the processing.
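The loop can be sketched in a few lines. Everything here is hypothetical scaffolding: `zeno_process`, `FakeStream` and the assumption that one new record arrives for every hundred processed are just there to simulate the shrinking batches described above.

```python
def zeno_process(take_snapshot, process, small_enough=0):
    """Process successive snapshots until one is small enough to stop.

    take_snapshot() must return the records that arrived since the
    previous call. Returns the size of each batch, in order.
    """
    batch_sizes = []
    while True:
        batch = take_snapshot()
        batch_sizes.append(len(batch))
        for record in batch:
            process(record)
        if len(batch) <= small_enough:
            # Few or no records arrived during the last pass: at this point
            # the stream can be left alone or briefly paused to finish off.
            return batch_sizes


class FakeStream:
    """Simulated stream: while a batch is being processed, one new
    record arrives for every `divisor` records in that batch."""

    def __init__(self, initial, divisor):
        self.pending = list(range(initial))
        self.next_id = initial
        self.divisor = divisor

    def take_snapshot(self):
        batch = self.pending
        arrivals = len(batch) // self.divisor
        self.pending = list(range(self.next_id, self.next_id + arrivals))
        self.next_id += arrivals
        return batch


stream = FakeStream(initial=10000, divisor=100)
processed = []
sizes = zeno_process(stream.take_snapshot, processed.append)
```

In this simulation each pass is a hundred times smaller than the one before, so the loop ends after a handful of passes rather than chasing the stream forever.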

I'm sure there's an existing name for this technique, but I like "Zeno processing".

by : Created on Feb. 1, 2010 : Last modified Feb. 1, 2010 : (permalink)

Good Week for Launches

Last week was a pretty amazing week for the Eldarion team. Amidst a ton of client work, we managed to:

Readers of this blog who remember Potter Predictions will immediately recognize aspects of the second and third sites.

On Friday I managed to squeeze in time to attend a workshop at Harvard on Morphological Complexity where I saw a lot of familiar faces from when I was more active in my PhD.

Over the weekend, I worked on a new website for the Pinax project but didn't get that done so unfortunately missed out on launching a fourth site. I guess I also need to get back to redoing this site at some point too :-)

by : Created on Jan. 25, 2010 : Last modified Jan. 25, 2010 : (permalink)

Fake JKM

I thought I'd kick off my blogging in 2010 with this video of my short opening talk at DjangoCon 2009 in Portland.

The talk was inspired, in part, by Rives on 4 AM.

by : Created on Jan. 2, 2010 : Last modified Jan. 2, 2010 : (permalink)

Information Architecture On This Site

It seems such a shame to have done so much musing about this site and its contents over the last 13 years and then not to blog about it here. So here is today's effort for my site sprint:

http://journeymanofsome.com/information_architecture/

by : Created on Nov. 17, 2009 : Last modified Nov. 17, 2009 : (permalink)

Site Sprint

I've been thinking about re-doing this site for a while, so I've decided to do it as part of SiteSprint II.

My goals are to:

freshen up the design
rethink the information architecture of my site
adapt to the shifting role blogging now has for me
explore some new technologies like HTML5 and typekit
rewrite the underlying code and make it more reusable

I'll be prototyping the new site over at http://journeymanofsome.com during the sprint and then will move it over to jtauber.com before the end of the year.

I'll be making daily changes there so you might want to check it out often!

In other news: today marks the 3-year anniversary of my decision to switch to using Django for my sites rather than continue to build my own framework (see Quisition Going Django).

by : Created on Nov. 15, 2009 : Last modified Nov. 15, 2009 : (permalink)

DjangoCon Talks on Pinax

DjangoCon is on next month in Portland, Oregon and the initial talk schedule has just gone up. There are three talks on Pinax I'm directly involved with (and maybe others that will touch on Pinax):

an introductory tutorial for Pinax beginners
a short talk on how to contribute to Pinax
a general State of Pinax talk

If you're new to Pinax, you should consider attending the first talk. If you're already using Pinax you may still learn some stuff but we'll definitely tread some ground that would already be familiar to you.

The second talk is just a brief introduction to how our development process works, how you can get involved and contribute and what sorts of things you might want to work on. If you've already started using Pinax and are looking to learn how to get more involved in the project itself, this short talk is for you.

The third talk is an updated State of Pinax talk that will cover what's in 0.7 and what the plans are for 0.8 and 0.9 and beyond. It should have something of interest for everyone, Pinax beginners and experts alike.

We'll also be sprinting on Pinax following the conference itself. If you're interested in joining the sprint, the How to Contribute to Pinax talk is probably a must.

I look forward to seeing a bunch of you at DjangoCon.

Eldarion is a silver sponsor of DjangoCon.

by : Created on Aug. 11, 2009 : Last modified Aug. 11, 2009 : (permalink)

Eldarion Launched

With my visa finally getting processed, me being able to be employed by the company I helped found and my return to the US, I am delighted to be able to announce that we've launched the Eldarion website and with it, the company.

http://eldarion.com/

Just a single page at the moment, but I think it's a good start. Thanks to Ryan Berg for turning my initial design into something good looking. And of course thanks to Greg Newman for the logo, which I've previously talked about.

by : Created on June 29, 2009 : Last modified June 29, 2009 : (permalink)