James Tauber

journeyman of some

blog > 2007 >

James Tauber's Blog 2007/11

Rubik's Cube

Like many people (I'd guess most people reading this blog), I had a Rubik's Cube in the 80s.

The only way I could solve it was looking up moves in the book You Can Do The Cube by Patrick Bossert. (Patrick was only 12 when he wrote the book—he's now an entrepreneur and management consultant; see his home page).

In high school I had Rubik's Magic which I took detailed notes on and came up with a solution on my own. Later I came across a book with a much shorter solution that I seem to recall memorizing in the bookstore without buying the book.

Also in high school, I had Rubik's Clock which was also easy enough to solve on my own.

Then last year, when I was in Walt Disney World surprising my sister for her 21st, I bought a Rubik's Cube again with the goal of learning how to solve it.

Wikibooks has a nice page on How to solve the Rubik's Cube that has a straightforward algorithm as well as a discussion of various other approaches favoured by speedcubers.

Wikipedia also has a very interesting article on Optimal solutions for Rubik's Cube. It was only August this year that Kunkle and Cooperman proved Twenty-Six Moves Suffice for Rubik’s Cube (pdf of paper).

UPDATE (2008-03-27): Now see Twenty-Five Moves Suffice

by James Tauber : Created on Nov. 26, 2007 : Last modified March 27, 2008 : 3 comments (permalink)

Leopard After Two Weeks

Feature I use a lot that I didn't think I would: Quick Look

Feature I don't use as much as I thought I would: Spaces

Performance improvement that blew me away: Spotlight (it's amazing!)

UI changes that don't bother me: reflecting dock with glowing orbs and translucent menu bar

UI changes that do bother me: menu selection colour (too blue) and non-focused tabs in Safari (the combination of the embossing on the dark grey is horrible)

by James Tauber : Created on Nov. 19, 2007 : Last modified Nov. 19, 2007 : Categories os_x : 0 comments (permalink)

Automatic Categorization of Blog Entries

I just went through a couple of years of posts, tagging some of those written before I introduced categories to Leonardo (in particular, the ones about Alibi Phone Network). It occurred to me that this categorization could be a job for machine learning.

I've talked before about using techniques like Bayesian Classification for identifying posts in an aggregator that one might be more likely to be interested in. But it didn't occur to me until now that similar techniques could be used to suggest which existing categories to place uncategorized entries into.

At some stage in the near future, I'll try a quick implementation in Leonardo and see how it goes.

by James Tauber : Created on Nov. 18, 2007 : Last modified Nov. 18, 2007 : 2 comments (permalink)

The Death Of Email

So Slate has an article entitled The Death of E-Mail about the younger generation in the US abandoning email. My colleague and good friend Jim Hickey made the same observation to me month or two ago. For his sons, email is how you submit papers to your teachers. It's a formal means of communication, much like snail mail is to me.

Actually, Facebook messaging is replacing email even for me for a certain class of communication. And just in the last 24 hours I've received a number of messages (about my MINI purchase) from old friends I haven't seen in years (more than 20 years in one case) who I wouldn't even be in touch with at all if it wasn't for things like Facebook. Even between me and my colleagues, non-work communication is done more over Facebook than email now.

by James Tauber : Created on Nov. 18, 2007 : Last modified Nov. 18, 2007 : Categories facebook : 1 comment (permalink)

Back To The Feature

Way back in February 2006 I blogged about starting my first feature film. Tom had just completed a first draft and I had some tough criticism that led to a number of big rewrites. Today, Tom delivered a draft (see his blog entry about it) that will form the basis of the initial script breakdown, production planning and budgeting.

21 months seems like a long time to get a script to this stage but I would hazard a guess that it's actually fairly quick by Hollywood standards. For a low-budget indie, we've probably done a lot more script revisions than the norm but I hope that shows in the quality of the final film.

So the next step is a script breakdown. That's where I take Tom's script and basically construct a database of all the characters, locations, props, special equipment, etc required. This will give me a much better idea of how long the film will take to make and how much it will cost to make.

I'm hoping to have something done by the end of next weekend.

by James Tauber : Created on Nov. 18, 2007 : Last modified Nov. 18, 2007 : Categories filmmaking in_the_light_of_day : 0 comments (permalink)

My New MINI

Today I picked up my new car: a dark silver MINI Cooper S with the Premium, Sports and Cold-Weather packs. I've honestly never enjoyed driving so much in my life.

After much deliberation I got a 6-speed manual. I was worried about having to shift with my right hand (while I've driven manual in Australia, I've never driven stick in the US). But I'm so glad I went with it. No problem at all switching hands (I guess the real work is in the feet anyway and they don't switch).

I can't push it too hard for the first 1,200 miles but the acceleration in second gear is awesome.

The interior controls are really nicely done but I'm going to need to sit down with the manual to really get a hang of everything (trip computer, iPod integration, etc)

The purchasing experience at MINI of Peabody was excellent. You are really made to feel like you are joining the "cool club". Maybe MINI is the Apple of cars.

I love it! And I'm not into cars at all :-)

by James Tauber : Created on Nov. 16, 2007 : Last modified Nov. 18, 2007 : 5 comments (permalink)

NaNoBloMo and Memphis

Hmm, between being sick and traveling to Memphis on business, I'm not doing a very good job at NaNoBloMo.

I have a backlog of blog posts to write, though, so I might still make 30 posts this month, although likely not the 50 I'd hoped.

At Memphis airport they promote the city in three ways: "Home of the Blues", "Birthplace of Rock'n'Roll" and "America's Distribution Center". More air cargo goes through Memphis than any other airport in the world. I think I saw more FedEx planes at the airport than passengers.

Being sick, though, I didn't get to see anything of Memphis. No Graceland tours or anything. Although it did occur to me as I took NyQuil LiquiCaps one night that popping pills might have been an authentic Elvis experience.

by James Tauber : Created on Nov. 16, 2007 : Last modified Nov. 16, 2007 : Categories travel blogging : 2 comments (permalink)

I Wish NBC Shows Were On iTunes Again

I wish NBC would sell their shows on iTunes again. Watching Heroes on nbc.com is horrible.

by James Tauber : Created on Nov. 11, 2007 : Last modified Nov. 11, 2007 : 0 comments (permalink)

Distance and Checksum Algorithms on Lists

Related to the programming competition, I've started thinking about algorithms for:

  • measuring the distance between two orderings of the same list (I guess number of swaps required to go from one to the other; even better if the algorithm gave you the swaps required)
  • order-sensitive checksums on lists of integers (as above, the lists always contain the same elements but the goal is to have a checksum that distinguishes different orderings with minimal clashes)

Any suggestions for approaches to these? Working Python code would be even better :-)

by James Tauber : Created on Nov. 10, 2007 : Last modified Nov. 10, 2007 : Categories python programming_competition algorithms : 8 comments (permalink)

Genius

I've mentioned before that Malcolm Gladwell is a wonderful speaker. Since then I've also come across his TED talk. His talks have really taught me the power of story telling (something I confess I never did during all my XML and Web Services talks from 1997-2001 :-).

Via Kottke I found a wonderful video about the notion of genius. It's material from Gladwell's forthcoming book (talked about in Kottke's post).

Watch the video. I'm really looking forward to the book.

by James Tauber : Created on Nov. 9, 2007 : Last modified Nov. 9, 2007 : 1 comment (permalink)

False Initials

People often assume that ISO stands for "International Standards Organization". But in English the full name of the organization is "International Organization for Standardization". Oh, it must be the initials of the French name, some clever people suggest. But the French name is "Organisation internationale de normalisation".

ISO isn't an acronym or initialism at all: it's based on the prefix iso- derived from the Greek ἴσος "equal".

Another example of a non-initialism is UTC. In English, this atomic clock based version of GMT is "coordinated universal time". In French it's "temps universel coordonné". So why UTC? Well, it was just a compromise between CUT and TUC to keep both the English-speaking world and French-speaking world happy.

by James Tauber : Created on Nov. 8, 2007 : Last modified Nov. 8, 2007 : 4 comments (permalink)

Explaining Category Theory With Object-Oriented Programming

I recently came across Category Theory for the Java Programmer. At various points in the Poincaré Project, I've considered writing code to illustrate mathematical structures. Recently reading about category theory (and topoi in particular) I have myself been tempted to think of them in terms of OO programming.

My inclination would, of course, be to use Python, but static typing is probably more useful for explicitly showing the structure of mathematical objects (although at the end of the day, mathematics is about duck typing: if it follows the axioms of a linear space, it is a linear space).

So the post is pretty cool. But what I felt got in the way of using Java to explain category theory is that category theory is essentially about relationships between classes (in the OO sense—or least what OOA/OOD thinks of as what classes are supposed to map to in the real world). The objects in a category like Set are sets, not elements. The objects in a category like Vect are vector spaces, not vectors. Now one might argue that category theory doesn't really care about elements and vectors, only sets and vector spaces. So you could just shift your "metaness" one to the left. One might also argue (as the blog post referenced at the start does) that classes in Java are instances of the class Class so calling a class an object is allowed. But I think it just makes things confusing for the average OO programmer (especially if you think OO == Java).

A better language for illustrating category theory would be one that has a more explicit notion of a metaclass. Then it would be easier to explain category theory as essentially about different characteristics of metaclasses.

by James Tauber : Created on Nov. 8, 2007 : Last modified Nov. 8, 2007 : Categories mathematics : 0 comments (permalink)

Is Vocab Ordering Like Page Rank?

The basis for the vocab ordering project was the realization that even more valuable than learning a common word is learning a common word that shares a lot of sentences with a lot of other common words.

Imagine a network of "knowledge states" where arcs represent what you need to learn in order to move from one level of knowledge to another. It was the idea of finding a path through such a network that eventually led me to make the connection with the traveling salesman problem.

It just occurred to me this evening that there may be a connection with Google's Page Rank algorithm too. After all, it's all about finding the links to the pages judged most valuable because they have links to the most valuable pages and so on.

As the wikipedia article says: "the algorithm may be applied to any collection of entities with reciprocal quotations and references".

Incidentally, I was at the conference where the PageRank paper was first presented. It was WWW7 in 1998. I ran the XML tutorial and sat on a panel on hypertext. I can't remember if I attended the Brin/Page talk or not, though. I do remember hanging out with Ted Nelson (and having dinner with him and Roger Clarke) and meeting Rohit Khare and Adam Rifkin for the first time there.

by James Tauber : Created on Nov. 6, 2007 : Last modified Nov. 6, 2007 : Categories algorithms conferences google speaking : 0 comments (permalink)

Zellyn Hunter Takes Lead

Zellyn Hunter has taken the lead in Category IV of my programming competition using simulated annealing.

Congratulations, Zellyn!

Meanwhile, my Category III local search is still running :-)

by James Tauber : Created on Nov. 6, 2007 : Last modified Nov. 6, 2007 : Categories programming_competition : 0 comments (permalink)

Tabu Search

In a year old comment on a three year old post relating to vocabulary ordering using simulated annealing, Chris Huntley suggests tabu search might be a better approach.

Unfortunately, I only noticed the post now (sorry Chris). I hadn't come across tabu search before but started to do some investigation and wrote the beginnings of a Python implementation. My limited understanding so far is that you essentially do a local search amongst all possible swaps but keep a list of recent swaps that are "taboo" (hence the name) so you aren't always picking the local optimum.

I started off just writing a local search version (without the tabu) and found that on my programming competition, I could get the global optimum in Category I and a pretty good score for Category II (almost as good as I could get using SA). The problem is that the local search is damn slow.

Unlike the travelling salesman problem where a swap doesn't require full recalculation of the score, in my vocabulary ordering problem, I haven't yet worked out a way to avoid having to run the entire scorer for every swap. In local search, that's O(n2) in the number of verses (Category III hasn't finished running yet after 12 hours).

I need to work out a way of improving that before I finish implementing the rest of the tabu search.

I'm also thinking ant colony optimization might be another approach.

Of course, people with more experience in these algorithms than me are more than welcome to submit an entry in the programming competition using them.

by James Tauber : Created on Nov. 5, 2007 : Last modified Nov. 5, 2007 : Categories programming_competition algorithms : 2 comments (permalink)

The Wolfram Prize

I've had on my "to blog about" list for months the Wolfram Prize to prove that a certain 2,3 Turning Machine is universal.

Well, in the mean time, Alex Smith has gone and won the prize. I'll still blog about the theory behind the prize, though, and maybe eventually say some things about the proof (pdf).

by James Tauber : Created on Nov. 4, 2007 : Last modified Nov. 4, 2007 : Categories mathematics wolfram_prize : 2 comments (permalink)

GNT Verse Coverage Statistics

It is fairly common, in the context of learning vocabulary for a particular corpus like the Greek New Testament, to talk about what proportion of the text one could read if one learnt the top N words.

I even produced such a table for the GNT back in 1996—see New Testament Vocabulary Count Statistics (via Internet Archive's Wayback Machine).

But these sort of numbers are highly misleading because they don't tell you what proportion of sentences (or as a rough proxy in the GNT case: verses) you could read, only what proportion of words.

Reading theorists have suggested that you need to know 95% of the vocabulary of a sentence to comprehend it. So a more interesting list of statistics would be how many verses can one understand 95% of the vocab of if one know a certain number of words. Of course, there's a lot more to reading comprehension than knowing the vocab. But it was enough for me to decide to write some code yesterday afternoon to run against my MorphGNT database.

To first of all give you a flavour in the specific before moving to the final numbers, consider John 3.16, which is, from a vocabulary point of view, a very easy verse to read.

To be able to read 50% of it, you only need to know the top 28 lexemes in the GNT. To read 75% you only need the top 85 (up to κόσμος). With the top 204 lexemes, you can read 90% of the verse and only a few more: up to 236 (αἰώνιος) gives you the 95%. The only word you would not have come across learning the top 236 words would be μονογενής but even that is in the top 1,200.

This example does highlight some of the shortcomings of this sort of analysis. There's no consideration of necessary knowledge of morphology, syntax, idioms, etc. Nor for the fact that the meaning of something like μονογενής is fairly easy to guess from knowledge of more common words. But I still think it's much more useful than the pure word coverage statistics I linked to earlier.

So let's actually run the numbers on the complete GNT. If you know the top N words, how many verses could you understand 50% of, 75%, 90% or 95% of...

  vocab / coverage    any    50%    75%    90%    95%    100%  
  100    99.9%    91.3%    24.4%    2.1%    0.6%    0.4%  
  200    99.9%    96.9%    51.8%    9.8%    3.4%    2.5%  
  500    99.9%    99.1%    82.3%    36.5%    18.0%    13.9%  
  1,000    100.0%    99.7%    93.6%    62.3%    37.3%    30.1%  
  1,500    100.0%    99.8%    97.2%    76.3%    53.5%    44.8%  
  2,000    100.0%    99.9%    98.4%    85.1%    65.5%    56.5%  
  3,000    100.0%    100.0%    99.4%    93.6%    81.0%    74.1%  
  4,000    100.0%    100.0%    99.7%    97.4%    90.0%    85.5%  
  5,000    100.0%    100.0%    100.0%    99.4%    96.5%    94.5%  
  all    100.0%    100.0%    100.0%    100.0%    100.0%    100.0%  

What this means is purely from a vocabulary point of view if you knew the top 1000 lexemes, then 37.3% of verses in the GNT would be 95% familiar to you.

I should emphasis that learning vocabulary in frequency order isn't necessarily the fastest way to get this proportion of readable verses up. I blogged about this fact three years ago, see Programmed Vocabulary Learning as a Travelling Salesman Problem, for example.

by James Tauber : Created on Nov. 4, 2007 : Last modified Nov. 4, 2007 : Categories morphgnt new_testament_greek : 0 comments (permalink)

Initial Thoughts on Leopard

I've just finished installing Leopard on my laptop and so far, so good.

I started off doing a simple backup (copying folders to an external drive) and then did a full Erase and Install as I always like to do. Installation itself was a breeze.

After installation, the first thing the OS wants to do is build the Spotlight index. I made the mistake of having the external drive plugged in when this started and it was reporting it was going to take 2 hours to build the index. Fortunately, I realised what was going on and told Spotlight not to index the external drive (which it honoured immediately).

After Spotlight indexing, Leopard asked me if I wanted to use the external drive as the Time Machine drive. I said yes and it proceeded to basically make an entire copy of the initial Leopard install. For every snapshot Time Machine takes, a directory tree is created on the external drive. Files that haven't changed are hardlinks shared by previous versions. It's a pretty neat way of doing it—it basically means you can navigate to any snapshot and it looks like the full filesystem. The disadvantage of the hardlink approach is entire files are stored, not deltas, so if you change one byte in a 100MB file, the 100MB gets stored twice.

My Address Book and bookmarks didn't sync via .Mac as I'd hoped so, for the bookmarks, I found the .plist file in the backup and copied it over. The Address Book was a little more involved as it appears they changed the file format. Fortunately, I hadn't upgraded my MacPro yet so was able to copy from the backup to Tiger on my MacPro and export from there in a format that Leopard could read in.

Oddly enough, though, my Mail accounts did sync via .Mac so when I opened Mail.app, it started the lengthy process of downloading 2GB of mail over IMAP. The left pane is ordered differently (my non-Inbox IMAP folders were underneath my smart folders rather than above as before) which threw me at first but I quickly got used to it (and actually prefer it now).

Overall the UI seems crisper. The translucent menu bar isn't really my cup of tea but everything else I've seen so far seems to be an improvement visually. It's certainly nice having consistency between windows of different applications.

Spaces so far is working just like I had hoped. I was worried for a while before the release that different windows from the same app might not be able to live in different spaces (a disaster for something like Safari if you have different spaces for different projects). But fortunately my fears were unfounded.

It's a small thing but configuration of Terminal is much much nicer now.

It's always fun looking at what applications people first download when they install a fresh OS. In my case some of my usual suspects come out of the box. I didn't need to download Python 2.5. Nor Subversion. That basically meant TextMate was the only thing I had to install to be able to get right into my open source projects. TaskPaper is the only other thing I've installed so far.

There's still a lot I haven't tried yet. I can't comment on what I think of Stacks yet. Or Quick Look. And I haven't had a chance to try out iChat yet (neither the fun stuff like effects and backdrops nor the serious stuff like application sharing).

But overall I'm delighted by the upgrade so far.

by James Tauber : Created on Nov. 3, 2007 : Last modified Nov. 3, 2007 : Categories os_x apple mac : 2 comments (permalink)

Blogging Month

I haven't blogged much lately but now that I'm back in Boston I'd like to get back into the swing of it.

So, in the spirit of National Novel Writing Month, where the goal is to write 50,000 words of a novel in the month of November, I'm going to attempt to write 50 blog entries this month (not 1,000 words each, though!)

by James Tauber : Created on Nov. 3, 2007 : Last modified Nov. 3, 2007 : Categories this_site blogging : 2 comments (permalink)