James Tauber

journeyman of some

blog >

James Tauber's Blog 2007

The Future Comes in Green and White

This week I received two significant packages: my "spit kit" to provide a DNA sample for 23andMe to analyze and my One Laptop Per Child XO-1.

I think both of these are landmarks. I'll report on both over the coming months.

Here are some more photos:

by jtauber : Created on Dec. 21, 2007 : Last modified Dec. 21, 2007 : Categories olpc : 0 comments (permalink)

You Me Us We Named Top Song of 2007

Back in May, I wrote about the song You Me Us We which was originally an instrumental piano piece of mine to which Nelson Clemente later added vocals as part of our Nelson James collaboration. For Nelson's solo debut EP, I worked with producer Rob Agostini on an electropop version that was described by one reviewer as "the perfect blend of sparkly pop that will make you feel like the world isn't so bad after all".

The site ElectroQueer has always been a fan of the song but they just released their Top 120 Songs of 2007 and guess which song ended up at #1!

Thank you Nelson and Rob and ElectroQueer for making a simple piano piece I wrote as a 19-year-old into an international number one pop song!

by jtauber : Created on Dec. 21, 2007 : Last modified Dec. 21, 2007 : Categories music_composition nelson_james you_me_us_we : 1 comment (permalink)

Quisition Hits 1000 Users

Yesterday, Quisition had it's 1,000th user registration, which was my target for this year. Admittedly, this was fairly easy to control because most of the new users come via Google Adwords and I could adjust the budget to keep things on track to hit 1,000 before year end.

Comparing the figures from the first 100 users:

  • Only 71% activated their account (down from 89% in the first 100)
  • Only 58% created a deck for testing (down from 72% in the first 100)
  • Only 37% actually learnt any cards (down from 57% in the first 100)

I have a bunch of feature requests from regular users of the site and implementing those will now be my top priority. But I also want to explore why the percentages above are so low. I'm not surprised they are lower than those for the first 100 given most of the traffic now comes from Adwords rather than this blog. But it should be possible to get them higher than they currently are.

by jtauber : Created on Dec. 20, 2007 : Last modified Dec. 20, 2007 : Categories quisition web_marketing : 5 comments (permalink)

Avoiding Recursion

Occasionally I implement an algorithm recursively then hit the recursion limit in Python.

For example, I just wrote this:

def clique(self, node, done=None):
    if done is None:
        done = set()
    done.add(node)
    for other_node in self.edges_by_node[node]:
        if other_node not in done:
            self.clique(other_node, done)
    return done

which hits a limit if the graph is too big.

So I rewrote it like this:

def clique(self, start_node):
    done = set()
    to_do = [start_node]
    while to_do:
        node = to_do.pop()
        done.add(node)
        for other_node in self.edges_by_node[node]:
            if other_node not in done:
                to_do.append(other_node)
    return done

There's a simple relationship between the two, quite independent of the specific task at hand. Which got me thinking about abstracting the specific task out of the recursive versus non-recursive pattern.

I guess I was having an SICP moment :-)

by jtauber : Created on Dec. 15, 2007 : Last modified Dec. 15, 2007 : Categories python algorithms : 1 comment (permalink)

Resizable Text Areas

Is it just me or are Safari 3's resizable text areas really really cool!

by jtauber : Created on Dec. 15, 2007 : Last modified Dec. 15, 2007 : Categories web_design safari : 1 comment (permalink)

Stack-Type Vectors, Part III

Part of the Poincaré Project.

We've already been introduced to linear spaces and three different types of linear space: n-tuples, arrow-type vectors and stack-type vectors.

In the last post, I promised to motivate stack-type vectors from the perspective of physics.

Both arrow-type vectors and stack-type vectors (in contrast to plain n-tuples) exist in relation to a manifold. From the point of view of (at least pre-relativistic) physics, the manifold introduces a notion of position and distance, but not other quantities such as time or mass or electrical charge, which exist in addition to the structure of the manifold.

We've previously stated that displacement, velocity, acceleration, force, etc can all be modeled as arrow-type vectors. The reason is that they are all fundamentally about a change in position. If you do a dimensional analysis—L, LT-1, LT-2, MLT-2—you see that they all have a single length dimension: no length squared; no per unit length.

Arrow-type vectors can't be used for area, energy or power (all of which have L2). Nor can they be used for quantities that are measured per unit area (L-2) or per unit volume (L-3 e.g. density which is ML-3).

Finally, arrow-type vectors cannot be used for quantities that are measured per unit length (L-1) but this is where stack-type vectors come in.

Borrowing the example from Gabriel Weinreich's wonderful book, Geometrical Vectors (the first book to set me straight on this stuff), consider two plates 1cm apart with a potential difference of 30V. We can model the distance between the plates as an arrow-type vector d. The electric field E between the plates is 30V/cm. E is not an arrow-type vector, but a stack-type vector.

One way to see this is the different behaviour of the numerical values of d and E under a change of scale. If we change our measurements from centimetres to metres, the numerical value of d goes from 1 to 0.01 but the numerical value of E goes from 30 to 3000.

This difference can also been seen in the fact that, when stretched, the magnitude of an arrow-type vector increases whereas the magnitude of a stack-type vector decreases.

Arrow-type vectors are about position and rates of change of position. Stack-type vectors, on the other hand, are about rates of change of some other quantity as position changes.

If you are familiar with gradient vectors from vector calculus, it should start to become clear that gradient vectors are not of the arrow-type but rather the stack-type.

In conclusion, while arrow-type vectors and stack-type vectors both follow the axioms for linear spaces, their behaviour in relation to the manifold they get their geometric interpretation from differs. It could almost be said they are opposites. Notice that while the numerical value of d decreased and that of E increased in our scaling example, if you multiple them together, they stay constant.

Exploring this last observation will be topic of the next post in the series.

by jtauber : Created on Dec. 15, 2007 : Last modified Dec. 15, 2007 : Categories poincare_project : 0 comments (permalink)

New Quisition Features

A couple of people requested a while ago the ability to delete decks of cards from my flashcard site, Quisition. This evening I finally got around to implementing that feature. I also got an email yesterday asking for the ability to export cards, so I implemented that while I was at it.

I still have a huge long list of things to implement in Quisition including OpenID, comments and tagging of packs, user rankings, email reminders and lots more.

I'm still on track to hit 1,000 users by the end of the year. Very few of those use the site regularly, though, so one of the things I want to do early next year is survey the users to find out what I need to improve to keep them coming back.

by jtauber : Created on Dec. 14, 2007 : Last modified Dec. 14, 2007 : Categories quisition : 0 comments (permalink)

Numerical Representation of Pitch

A number of years ago, I started working on a Python library for music theory and composition called Sebastian and one of the first implementation decisions I needed to make was how to represent pitch. Assuming one allows for double-flats and double-sharps, there are 35 note names. What I was looking for was a way to represent these 35 note names in a way that was consistent with the relationships between them.

I ended up deciding to use the set of integers with D as 0 and each successor being a 5th higher. So A would be +1, E +2, G -1, Eb -5, F# +4 and so on.

This turns out to work quite nicely. Given one note, you can find the tone above/below by adding/subtracting 2 and the semitone (different letter) above/below by subtracting/adding 5. To augment/diminish (same letter) you can add/subtract 7.

You can determine if two notes are enharmonic by testing

(abs(val1 - val2) % 12) == 0

You generate the textual note name from this integer with:

modifiers = ((val + 3) - ((val + 3) % 7)) / 7
if modifiers == 0:
    m_name = ""
elif modifiers == 1:
    m_name = "#"
elif modifiers > 1:
    m_name = "x" * (modifiers - 1)
else: # m < 0
    m_name = "b" * -modifiers
return "DAEBFCG"[val % 7] + m_name

and the reverse with

letter = name[0]
base = "FCGDAEB".find(letter) - 3
if base == -4:
    raise ValueError
mod = name[1:]
if mod == "":
    m = 0
elif mod == "#":
    m = 1
elif mod == "x" * len(mod):
    m = len(mod) + 1
elif mod == "b" * len(mod):
    m = -len(mod)
else:
    raise ValueError
return base + m * 7

While it's true that moving the origin to F would eliminate some of those -3s and +3s, new ones would need to be introduced, so there's no really clear winner between an origin at D versus an origin at F.

Scales are easy to generate. The pattern for a major scale is [0, 2, 4, -1, 1, 3, 5] and so you can generate the scale for a particular tonic with something like

[tonic + i for i in [0, 2, 4, -1, 1, 3, 5]]

Another advantage of the system is it can be extended to 19-et and other tuning systems. I haven't given much thought to whether it's useful for tunings without a constant frequency ratio for the 5th, though.

I also haven't yet investigated how this system is inferior to Hewlett's base-40 system.

UPDATE: Okay, I've finally grokked the functional difference between Hewlett's system and mine. Mine can incorporate an infinite number of note names (triple flats, quadruple sharps, you name it) but it cannot express intervals of an octave or more. Hewlett's base-40 system can handle intervals of octaves or above but cannot, without change, handle more than the 35 note names of 12-tone chromatic music. My system (and others like it) require two numbers to handle more than an octave. Hewlett traded off infinite note-naming capability for the ability to include octave in a single number.

by jtauber : Created on Dec. 13, 2007 : Last modified Dec. 14, 2007 : Categories python music_theory sebastian : 6 comments (permalink)

Man from Earth

Earlier this week, I read Graham Glass's description of the movie ''Jerome Bixby's Man from Earth'':

The Man from Earth looks like my kind of movie. It's shot entirely in a living room and revolves around a man who reveals to his colleagues that he has lived on Earth for the past 14,000 years. The rest of the movie is about how his colleagues try and disprove his confession via a question and answer session.

Well, it sounded like my kind of movie too so I ordered and, on Wednesday evening, it arrived.

It's a fascinating concept and, for the most part (see below), dealt with very well. It was written by the late Jerome Bixby, of original series Star Trek writing fame.

It was obvious early on that it wasn't shot on film—a lot of low-light noise, even more noticeable than Attack of the Clones, make it clear this was shot on video. IMDb says it was shot on prosumer-grade HDV, possibly an HVX200 judging by the single behind-the-scenes shot of the camera I saw on the film's website. But this was way better done than most films shot on video. Given that it was shot on video and all in one location, it must have been an amazingly cheap film to make.

The acting was generally good. The performances sometimes felt more like a play than a film (and I mean that as a negative—stage acting looks like over-acting on camera) and I wonder how much of that is because it was shot all in one location and probably only over a few days. The story would actually lend itself to a stage performance given that it is all in one location and entirely dialogue-driven.

The pacing was excellent. There was only one time where I started to lose interest and almost immediately, the drama of it picked up and I was hooked again.

I really don't want to say much about the content itself as it's fun just going on the journey. I will, however, raise one complaint.

The film presents a very controversial portrait of Christian origins, which in-and-of-itself doesn't bother me. What disappointed me, however, was that it was done in such a simplistic and naïve way. With just a little more homework, Bixby could have made it a tighter (and no less controversial) argument.

For example, he confuses the uncertainty of certain words of Jesus with the existence of multiple English translations, seemingly ignorant of the fact that the Greek underlying the translations is known and so the uncertainty has nothing to do with the English language translations. He could have expressed the uncertainty in a far more informed way by appealing even just to differences between the Synoptic Gospels let alone issues of textual and higher criticism.

I don't know enough archaeology, paleogeography or biological anthropology to know whether there were glaring errors in those areas, but I suspect not nearly as blatant.

It's a shame because, otherwise it's a nicely done film. I would recommend it to any fans of thought-provoking high concept science fiction. I didn't love it quite as much as Primer (although it's better acted) but it's up there. Certainly if you liked one, you'll like the other.

by jtauber : Created on Dec. 7, 2007 : Last modified Dec. 7, 2007 : Categories filmmaking : 2 comments (permalink)

Dwarves vs Dwarfs

I've long known that Tolkien favoured dwarves to dwarfs as the plural of dwarf but, living post-Tolkien, it's easy to forget that dwarfs was ever the preferred spelling. Then it was pointed out that the Disney film (which came out the same year as The Hobbit) is Snow White and the Seven Dwarfs. I don't think I ever noticed that before.

It was pointed out by Aardy R. DeVarque's Sources of D&D page which is an interesting study of the origins of many terms in the original D&D (hint: they weren't all from Tolkien!)

UPDATE (2007-12-09): Mark Liberman talked about /-fs/ vs /-vz/ plurals back in 2004 in The Theology of Phonology with a followup specifically on Dwarves vs Dwarfs.

by jtauber : Created on Dec. 5, 2007 : Last modified Dec. 9, 2007 : Categories linguistic_observations : 7 comments (permalink)

Rubik's Cube

Like many people (I'd guess most people reading this blog), I had a Rubik's Cube in the 80s.

The only way I could solve it was looking up moves in the book You Can Do The Cube by Patrick Bossert. (Patrick was only 12 when he wrote the book—he's now an entrepreneur and management consultant; see his home page).

In high school I had Rubik's Magic which I took detailed notes on and came up with a solution on my own. Later I came across a book with a much shorter solution that I seem to recall memorizing in the bookstore without buying the book.

Also in high school, I had Rubik's Clock which was also easy enough to solve on my own.

Then last year, when I was in Walt Disney World surprising my sister for her 21st, I bought a Rubik's Cube again with the goal of learning how to solve it.

Wikibooks has a nice page on How to solve the Rubik's Cube that has a straightforward algorithm as well as a discussion of various other approaches favoured by speedcubers.

Wikipedia also has a very interesting article on Optimal solutions for Rubik's Cube. It was only August this year that Kunkle and Cooperman proved Twenty-Six Moves Suffice for Rubik’s Cube (pdf of paper).

UPDATE (2008-03-27): Now see Twenty-Five Moves Suffice

by jtauber : Created on Nov. 26, 2007 : Last modified March 27, 2008 : 3 comments (permalink)

Leopard After Two Weeks

Feature I use a lot that I didn't think I would: Quick Look

Feature I don't use as much as I thought I would: Spaces

Performance improvement that blew me away: Spotlight (it's amazing!)

UI changes that don't bother me: reflecting dock with glowing orbs and translucent menu bar

UI changes that do bother me: menu selection colour (too blue) and non-focused tabs in Safari (the combination of the embossing on the dark grey is horrible)

by jtauber : Created on Nov. 19, 2007 : Last modified Nov. 19, 2007 : Categories os_x : 0 comments (permalink)

Automatic Categorization of Blog Entries

I just went through a couple of years of posts, tagging some of those written before I introduced categories to Leonardo (in particular, the ones about Alibi Phone Network). It occurred to me that this categorization could be a job for machine learning.

I've talked before about using techniques like Bayesian Classification for identifying posts in an aggregator that one might be more likely to be interested in. But it didn't occur to me until now that similar techniques could be used to suggest which existing categories to place uncategorized entries into.

At some stage in the near future, I'll try a quick implementation in Leonardo and see how it goes.

by jtauber : Created on Nov. 18, 2007 : Last modified Nov. 18, 2007 : 2 comments (permalink)

The Death Of Email

So Slate has an article entitled The Death of E-Mail about the younger generation in the US abandoning email. My colleague and good friend Jim Hickey made the same observation to me month or two ago. For his sons, email is how you submit papers to your teachers. It's a formal means of communication, much like snail mail is to me.

Actually, Facebook messaging is replacing email even for me for a certain class of communication. And just in the last 24 hours I've received a number of messages (about my MINI purchase) from old friends I haven't seen in years (more than 20 years in one case) who I wouldn't even be in touch with at all if it wasn't for things like Facebook. Even between me and my colleagues, non-work communication is done more over Facebook than email now.

by jtauber : Created on Nov. 18, 2007 : Last modified Nov. 18, 2007 : Categories facebook : 1 comment (permalink)

Back To The Feature

Way back in February 2006 I blogged about starting my first feature film. Tom had just completed a first draft and I had some tough criticism that led to a number of big rewrites. Today, Tom delivered a draft (see his blog entry about it) that will form the basis of the initial script breakdown, production planning and budgeting.

21 months seems like a long time to get a script to this stage but I would hazard a guess that it's actually fairly quick by Hollywood standards. For a low-budget indie, we've probably done a lot more script revisions than the norm but I hope that shows in the quality of the final film.

So the next step is a script breakdown. That's where I take Tom's script and basically construct a database of all the characters, locations, props, special equipment, etc required. This will give me a much better idea of how long the film will take to make and how much it will cost to make.

I'm hoping to have something done by the end of next weekend.

by jtauber : Created on Nov. 18, 2007 : Last modified Nov. 18, 2007 : Categories filmmaking in_the_light_of_day : 0 comments (permalink)

My New MINI

Today I picked up my new car: a dark silver MINI Cooper S with the Premium, Sports and Cold-Weather packs. I've honestly never enjoyed driving so much in my life.

After much deliberation I got a 6-speed manual. I was worried about having to shift with my right hand (while I've driven manual in Australia, I've never driven stick in the US). But I'm so glad I went with it. No problem at all switching hands (I guess the real work is in the feet anyway and they don't switch).

I can't push it too hard for the first 1,200 miles but the acceleration in second gear is awesome.

The interior controls are really nicely done but I'm going to need to sit down with the manual to really get a hang of everything (trip computer, iPod integration, etc)

The purchasing experience at MINI of Peabody was excellent. You are really made to feel like you are joining the "cool club". Maybe MINI is the Apple of cars.

I love it! And I'm not into cars at all :-)

by jtauber : Created on Nov. 16, 2007 : Last modified Nov. 18, 2007 : 5 comments (permalink)

NaNoBloMo and Memphis

Hmm, between being sick and traveling to Memphis on business, I'm not doing a very good job at NaNoBloMo.

I have a backlog of blog posts to write, though, so I might still make 30 posts this month, although likely not the 50 I'd hoped.

At Memphis airport they promote the city in three ways: "Home of the Blues", "Birthplace of Rock'n'Roll" and "America's Distribution Center". More air cargo goes through Memphis than any other airport in the world. I think I saw more FedEx planes at the airport than passengers.

Being sick, though, I didn't get to see anything of Memphis. No Graceland tours or anything. Although it did occur to me as I took NyQuil LiquiCaps one night that popping pills might have been an authentic Elvis experience.

by jtauber : Created on Nov. 16, 2007 : Last modified Nov. 16, 2007 : Categories travel blogging : 2 comments (permalink)

I Wish NBC Shows Were On iTunes Again

I wish NBC would sell their shows on iTunes again. Watching Heroes on nbc.com is horrible.

by jtauber : Created on Nov. 11, 2007 : Last modified Nov. 11, 2007 : 0 comments (permalink)

Distance and Checksum Algorithms on Lists

Related to the programming competition, I've started thinking about algorithms for:

  • measuring the distance between two orderings of the same list (I guess number of swaps required to go from one to the other; even better if the algorithm gave you the swaps required)
  • order-sensitive checksums on lists of integers (as above, the lists always contain the same elements but the goal is to have a checksum that distinguishes different orderings with minimal clashes)

Any suggestions for approaches to these? Working Python code would be even better :-)

by jtauber : Created on Nov. 10, 2007 : Last modified Nov. 10, 2007 : Categories python programming_competition algorithms : 8 comments (permalink)

Genius

I've mentioned before that Malcolm Gladwell is a wonderful speaker. Since then I've also come across his TED talk. His talks have really taught me the power of story telling (something I confess I never did during all my XML and Web Services talks from 1997-2001 :-).

Via Kottke I found a wonderful video about the notion of genius. It's material from Gladwell's forthcoming book (talked about in Kottke's post).

Watch the video. I'm really looking forward to the book.

by jtauber : Created on Nov. 9, 2007 : Last modified Nov. 9, 2007 : 1 comment (permalink)

False Initials

People often assume that ISO stands for "International Standards Organization". But in English the full name of the organization is "International Organization for Standardization". Oh, it must be the initials of the French name, some clever people suggest. But the French name is "Organisation internationale de normalisation".

ISO isn't an acronym or initialism at all: it's based on the prefix iso- derived from the Greek ἴσος "equal".

Another example of a non-initialism is UTC. In English, this atomic clock based version of GMT is "coordinated universal time". In French it's "temps universel coordonné". So why UTC? Well, it was just a compromise between CUT and TUC to keep both the English-speaking world and French-speaking world happy.

by jtauber : Created on Nov. 8, 2007 : Last modified Nov. 8, 2007 : 4 comments (permalink)

Explaining Category Theory With Object-Oriented Programming

I recently came across Category Theory for the Java Programmer. At various points in the Poincaré Project, I've considered writing code to illustrate mathematical structures. Recently reading about category theory (and topoi in particular) I have myself been tempted to think of them in terms of OO programming.

My inclination would, of course, be to use Python, but static typing is probably more useful for explicitly showing the structure of mathematical objects (although at the end of the day, mathematics is about duck typing: if it follows the axioms of a linear space, it is a linear space).

So the post is pretty cool. But what I felt got in the way of using Java to explain category theory is that category theory is essentially about relationships between classes (in the OO sense—or least what OOA/OOD thinks of as what classes are supposed to map to in the real world). The objects in a category like Set are sets, not elements. The objects in a category like Vect are vector spaces, not vectors. Now one might argue that category theory doesn't really care about elements and vectors, only sets and vector spaces. So you could just shift your "metaness" one to the left. One might also argue (as the blog post referenced at the start does) that classes in Java are instances of the class Class so calling a class an object is allowed. But I think it just makes things confusing for the average OO programmer (especially if you think OO == Java).

A better language for illustrating category theory would be one that has a more explicit notion of a metaclass. Then it would be easier to explain category theory as essentially about different characteristics of metaclasses.

by jtauber : Created on Nov. 8, 2007 : Last modified Nov. 8, 2007 : Categories mathematics : 1 comment (permalink)

Is Vocab Ordering Like Page Rank?

The basis for the vocab ordering project was the realization that even more valuable than learning a common word is learning a common word that shares a lot of sentences with a lot of other common words.

Imagine a network of "knowledge states" where arcs represent what you need to learn in order to move from one level of knowledge to another. It was the idea of finding a path through such a network that eventually led me to make the connection with the traveling salesman problem.

It just occurred to me this evening that there may be a connection with Google's Page Rank algorithm too. After all, it's all about finding the links to the pages judged most valuable because they have links to the most valuable pages and so on.

As the wikipedia article says: "the algorithm may be applied to any collection of entities with reciprocal quotations and references".

Incidentally, I was at the conference where the PageRank paper was first presented. It was WWW7 in 1998. I ran the XML tutorial and sat on a panel on hypertext. I can't remember if I attended the Brin/Page talk or not, though. I do remember hanging out with Ted Nelson (and having dinner with him and Roger Clarke) and meeting Rohit Khare and Adam Rifkin for the first time there.

by jtauber : Created on Nov. 6, 2007 : Last modified Nov. 6, 2007 : Categories algorithms conferences google speaking : 0 comments (permalink)

Zellyn Hunter Takes Lead

Zellyn Hunter has taken the lead in Category IV of my programming competition using simulated annealing.

Congratulations, Zellyn!

Meanwhile, my Category III local search is still running :-)

by jtauber : Created on Nov. 6, 2007 : Last modified Nov. 6, 2007 : Categories programming_competition : 0 comments (permalink)

Tabu Search

In a year old comment on a three year old post relating to vocabulary ordering using simulated annealing, Chris Huntley suggests tabu search might be a better approach.

Unfortunately, I only noticed the post now (sorry Chris). I hadn't come across tabu search before but started to do some investigation and wrote the beginnings of a Python implementation. My limited understanding so far is that you essentially do a local search amongst all possible swaps but keep a list of recent swaps that are "taboo" (hence the name) so you aren't always picking the local optimum.

I started off just writing a local search version (without the tabu) and found that on my programming competition, I could get the global optimum in Category I and a pretty good score for Category II (almost as good as I could get using SA). The problem is that the local search is damn slow.

Unlike the travelling salesman problem where a swap doesn't require full recalculation of the score, in my vocabulary ordering problem, I haven't yet worked out a way to avoid having to run the entire scorer for every swap. In local search, that's O(n2) in the number of verses (Category III hasn't finished running yet after 12 hours).

I need to work out a way of improving that before I finish implementing the rest of the tabu search.

I'm also thinking ant colony optimization might be another approach.

Of course, people with more experience in these algorithms than me are more than welcome to submit an entry in the programming competition using them.

by jtauber : Created on Nov. 5, 2007 : Last modified Nov. 5, 2007 : Categories programming_competition algorithms : 2 comments (permalink)

The Wolfram Prize

I've had on my "to blog about" list for months the Wolfram Prize to prove that a certain 2,3 Turning Machine is universal.

Well, in the mean time, Alex Smith has gone and won the prize. I'll still blog about the theory behind the prize, though, and maybe eventually say some things about the proof (pdf).

by jtauber : Created on Nov. 4, 2007 : Last modified Nov. 4, 2007 : Categories mathematics wolfram_prize : 2 comments (permalink)

GNT Verse Coverage Statistics

It is fairly common, in the context of learning vocabulary for a particular corpus like the Greek New Testament, to talk about what proportion of the text one could read if one learnt the top N words.

I even produced such a table for the GNT back in 1996—see New Testament Vocabulary Count Statistics (via Internet Archive's Wayback Machine).

But these sort of numbers are highly misleading because they don't tell you what proportion of sentences (or as a rough proxy in the GNT case: verses) you could read, only what proportion of words.

Reading theorists have suggested that you need to know 95% of the vocabulary of a sentence to comprehend it. So a more interesting list of statistics would be how many verses can one understand 95% of the vocab of if one know a certain number of words. Of course, there's a lot more to reading comprehension than knowing the vocab. But it was enough for me to decide to write some code yesterday afternoon to run against my MorphGNT database.

To first of all give you a flavour in the specific before moving to the final numbers, consider John 3.16, which is, from a vocabulary point of view, a very easy verse to read.

To be able to read 50% of it, you only need to know the top 28 lexemes in the GNT. To read 75% you only need the top 85 (up to κόσμος). With the top 204 lexemes, you can read 90% of the verse and only a few more: up to 236 (αἰώνιος) gives you the 95%. The only word you would not have come across learning the top 236 words would be μονογενής but even that is in the top 1,200.

This example does highlight some of the shortcomings of this sort of analysis. There's no consideration of necessary knowledge of morphology, syntax, idioms, etc. Nor for the fact that the meaning of something like μονογενής is fairly easy to guess from knowledge of more common words. But I still think it's much more useful than the pure word coverage statistics I linked to earlier.

So let's actually run the numbers on the complete GNT. If you know the top N words, how many verses could you understand 50% of, 75%, 90% or 95% of...

  vocab / coverage    any    50%    75%    90%    95%    100%  
  100    99.9%    91.3%    24.4%    2.1%    0.6%    0.4%  
  200    99.9%    96.9%    51.8%    9.8%    3.4%    2.5%  
  500    99.9%    99.1%    82.3%    36.5%    18.0%    13.9%  
  1,000    100.0%    99.7%    93.6%    62.3%    37.3%    30.1%  
  1,500    100.0%    99.8%    97.2%    76.3%    53.5%    44.8%  
  2,000    100.0%    99.9%    98.4%    85.1%    65.5%    56.5%  
  3,000    100.0%    100.0%    99.4%    93.6%    81.0%    74.1%  
  4,000    100.0%    100.0%    99.7%    97.4%    90.0%    85.5%  
  5,000    100.0%    100.0%    100.0%    99.4%    96.5%    94.5%  
  all    100.0%    100.0%    100.0%    100.0%    100.0%    100.0%  

What this means is purely from a vocabulary point of view if you knew the top 1000 lexemes, then 37.3% of verses in the GNT would be 95% familiar to you.

I should emphasis that learning vocabulary in frequency order isn't necessarily the fastest way to get this proportion of readable verses up. I blogged about this fact three years ago, see Programmed Vocabulary Learning as a Travelling Salesman Problem, for example.

by jtauber : Created on Nov. 4, 2007 : Last modified Nov. 4, 2007 : Categories morphgnt new_testament_greek : 0 comments (permalink)

Initial Thoughts on Leopard

I've just finished installing Leopard on my laptop and so far, so good.

I started off doing a simple backup (copying folders to an external drive) and then did a full Erase and Install as I always like to do. Installation itself was a breeze.

After installation, the first thing the OS wants to do is build the Spotlight index. I made the mistake of having the external drive plugged in when this started and it was reporting it was going to take 2 hours to build the index. Fortunately, I realised what was going on and told Spotlight not to index the external drive (which it honoured immediately).

After Spotlight indexing, Leopard asked me if I wanted to use the external drive as the Time Machine drive. I said yes and it proceeded to basically make an entire copy of the initial Leopard install. For every snapshot Time Machine takes, a directory tree is created on the external drive. Files that haven't changed are hardlinks shared by previous versions. It's a pretty neat way of doing it—it basically means you can navigate to any snapshot and it looks like the full filesystem. The disadvantage of the hardlink approach is entire files are stored, not deltas, so if you change one byte in a 100MB file, the 100MB gets stored twice.

My Address Book and bookmarks didn't sync via .Mac as I'd hoped so, for the bookmarks, I found the .plist file in the backup and copied it over. The Address Book was a little more involved as it appears they changed the file format. Fortunately, I hadn't upgraded my MacPro yet so was able to copy from the backup to Tiger on my MacPro and export from there in a format that Leopard could read in.

Oddly enough, though, my Mail accounts did sync via .Mac so when I opened Mail.app, it started the lengthy process of downloading 2GB of mail over IMAP. The left pane is ordered differently (my non-Inbox IMAP folders were underneath my smart folders rather than above as before) which threw me at first but I quickly got used to it (and actually prefer it now).

Overall the UI seems crisper. The translucent menu bar isn't really my cup of tea but everything else I've seen so far seems to be an improvement visually. It's certainly nice having consistency between windows of different applications.

Spaces so far is working just like I had hoped. I was worried for a while before the release that different windows from the same app might not be able to live in different spaces (a disaster for something like Safari if you have different spaces for different projects). But fortunately my fears were unfounded.

It's a small thing but configuration of Terminal is much much nicer now.

It's always fun looking at what applications people first download when they install a fresh OS. In my case some of my usual suspects come out of the box. I didn't need to download Python 2.5. Nor Subversion. That basically meant TextMate was the only thing I had to install to be able to get right into my open source projects. TaskPaper is the only other thing I've installed so far.

There's still a lot I haven't tried yet. I can't comment on what I think of Stacks yet. Or Quick Look. And I haven't had a chance to try out iChat yet (neither the fun stuff like effects and backdrops nor the serious stuff like application sharing).

But overall I'm delighted by the upgrade so far.

by jtauber : Created on Nov. 3, 2007 : Last modified Nov. 3, 2007 : Categories os_x apple mac : 2 comments (permalink)

Blogging Month

I haven't blogged much lately but now that I'm back in Boston I'd like to get back into the swing of it.

So, in the spirit of National Novel Writing Month, where the goal is to write 50,000 words of a novel in the month of November, I'm going to attempt to write 50 blog entries this month (not 1,000 words each, though!)

by jtauber : Created on Nov. 3, 2007 : Last modified Nov. 3, 2007 : Categories this_site blogging : 2 comments (permalink)

TaskPaper

I've been using a beta of TaskPaper from my favourite micro-ISV for a while now and it is amazing how much it hits the 80-20 point of what I want in a todo list manager.

TaskPaper manages to support projects, tagging, filtering and archiving of done tasks all with a human-readable plain text markup format.

It's just gone 1.0 and I thoroughly recommend checking it out if you run Mac OS X.

UPDATE (2007-10-29): After writing this, Jesse Grosjean offered a free license to people who wrote reviews on their blog. I decided not to take up this offer and mention that fact here in the interest of full disclosure.

by jtauber : Created on Oct. 23, 2007 : Last modified Oct. 29, 2007 : 0 comments (permalink)

Update: Visit to Google and My First Facebook App

I haven't been blogging much the last few weeks—partly because I've been busy at work and partly, I think, because my Facebook status has become the primary way I get telling the world what I'm up to out of my system.

The last week, however, I've been up to some fun stuff. On Saturday, I made my first visit to the Googleplex for the Google Summer of Code Mentor Summit. You can see a photo of us all here. The 'plex felt like a cross between a kindergarten (bright colours, play areas, places to nap) and a top-secret military installation (with security at every corner to make sure you didn't wander off into any restricted areas).

Then from Sunday thru Tuesday, I attended the Graphing Social Patterns conference on the business and technology of Facebook. Had a wonderful time catching up with some longtime friends there (who I hadn't seen for six odd years) as well as getting caught up in the incredible buzz around the Facebook platform. Apparently there's a Facebook conference on every week in the valley now but, not being a local, it was new and exciting for me.

While there, I built my first Facebook app—deployed on Django (of course). I'll say more in subsequent posts—I have a lot more planned.

Now I'm about to hop on a plane back to Australia for a week or two. I'll be working from there but should have some time for some hacking and blogging.

by jtauber : Created on Oct. 10, 2007 : Last modified Oct. 10, 2007 : Categories django summer_of_code travel conferences google facebook cats_or_dogs : 1 comment (permalink)

AtomPub Done

AtomPub is now RFC5023. I guess that means I need to get back to finishing django-atompub :-)

by jtauber : Created on Oct. 9, 2007 : Last modified Oct. 9, 2007 : Categories django atompub demokritos : 2 comments (permalink)

Flashcards Beat Harry Potter

Well, not quite. But the number of users on Quisition, my flashcard site, just exceeded the number of users on Potter Predictions. I'm hoping for 1,000 by the end of the year.

by jtauber : Created on Sept. 23, 2007 : Last modified Sept. 23, 2007 : Categories quisition potter_predictions : 0 comments (permalink)

Simone Dinnerstein and the Goldberg Variations

I've mentioned before that I have over five recordings of the Goldberg Variations in iTunes. I just increased that by one after purchasing Simone Dinnerstein's new recording on the strength of Tyler Cowen's comments.

It's definitely a lot more "modern" than Gould's interpretation but it is, quite simply, one of the most amazing recordings I've ever listened to. Definitely a candidate for a desert island album.

Of course, as a composer, the desert island candidacy is as much to do with the piece itself as the interpretation. The Goldberg Variations would have to rate as one of the greatest pieces of music ever written. There is pretty nice article on the piece on Wikipedia.

And for extra Goldberg Variation goodness, check out the video of Christopher Taylor playing parts of it on the only double-manual Steinway ever made.

by jtauber : Created on Sept. 22, 2007 : Last modified Sept. 22, 2007 : Categories music piano : 0 comments (permalink)

Sneaky Peak at Habitualist

I've already mentioned habitualist without explanation of what it is.

The site now reveals a little more of the concept I'm working on. I should be starting a closed alpha soon.

http://habitualist.com/

by jtauber : Created on Sept. 19, 2007 : Last modified Sept. 19, 2007 : 2 comments (permalink)

Clash of Worlds

At the Workshop on Features at the LAGB 2007 conference in London earlier this month, I met Ron Kaplan in person and he told me about the startup Powerset that he's involved in. It was enough of a clash of worlds that I was talking entrepreneurship at an academic conference, but then this week they were announced as one of the TechCrunch40.

It's not that often that these two facets of my life intersect.

Congrats, Ron!

by jtauber : Created on Sept. 19, 2007 : Last modified Sept. 19, 2007 : 0 comments (permalink)

DIV widths

One of the things that's been most frustrating developing websites like Quisition, habitualist and the new Cats or Dogs is the difference between Firefox and Safari (and I guess IE) in determining the width of a block.

Firefox only makes a div as wide as the content needs to be (plus any padding). Safari makes the div as wide as the parent div (minus the parent's padding and the div's own margins).

In different situations, both approaches are desirable.

Short of fixing the width (which does give the same behaviour on both Firefox and Safari) how does one (with CSS) make Firefox behave like Safari or Safari behave like Firefox?

UPDATE: Okay, debugging 101: find the smallest thing that reproduces the problem. It turns out to have to do with multiple embedded floats. The following behaves differently in Firefox and Safari:

<div style="float: left;">
    <div style="float: left;">
        <div style="width: 400px;">foo</div>
        <div style="background: #CCC;">bar</div>
    </div>
</div>

In Firefox, bar's background is only as big as the text. In Safari it extends 400px.

UPDATE (2007-09-19): Courtesy of William Bardon, now see http://jtauber.com/2007/09/div_test.html

A screen shot of that, with the window narrow enough that the third test isn't on the same line as the second is at http://jtauber.com/2007/09/tauber_bardon_test_safari.png

The same thing in Firefox can be seen at http://jtauber.com/2007/09/tauber_bardon_test_firefox.png

by jtauber : Created on Sept. 16, 2007 : Last modified Sept. 16, 2007 : Categories web_design : 11 comments (permalink)

The Long Commute

mValent recently moved from Burlington (the same town in which I live) to larger offices in Waltham. It's not far distance-wise, but anyone who's driven on that stretch of 128 knows it's a nightmare, more than a 5x increase in commute time for me, which is particularly painful given that I haven't had a commute over 5 minutes in eight years (unless you count the 34 hours it takes to get from Perth to Boston).

I may finally have the time to listen to podcasts which means, of course, I'll be relying on Scouta for recommendations.

by jtauber : Created on Sept. 15, 2007 : Last modified Sept. 15, 2007 : 2 comments (permalink)

Demokritos and django-atompub

I just added the following to the django-atompub page on Google Code:

The approach is to start from the specific and only later make it more generic. In other words, I'm building a specific Atom store implementation with all sorts of simplifying assumptions, such as the fact that a collection's path is always /collection/{id}/. Over time, I will make this more generic and django-atompub will be a library rather than an implementation. I'll still keep the specific implementation going though, it will just be called "Demokritos" again and will not be hosted here. Demokritos will be an Atom store that happens to be written on top of Django. django-atompub will be a contributed library for people wanting to add Atom support to their Django sites.

by jtauber : Created on Sept. 15, 2007 : Last modified Sept. 15, 2007 : Categories python django atompub demokritos : 0 comments (permalink)

Django Sprint

Today is a world-wide Django sprint. I was at work during the day so couldn't participate but now I'm home, I'm planning on spending the evening working exclusively on my open source django apps:

by jtauber : Created on Sept. 14, 2007 : Last modified Sept. 14, 2007 : Categories python django software_craftsmanship atompub demokritos open_source : 4 comments (permalink)

Many Eyes on Greek Nominal Suffixes

IBM's Many Eyes is a wonderful site for creating and sharing visualizations of data. They recently added a "word tree" visualization which shows weighted suffix trees.

Given I've been working on the inference of inflectional morphology in New Testament Greek for my PhD, I thought I'd upload words with nominal inflection (nouns, pronouns, adjectives, participles) along with their morphosyntactic features (case, number, gender).

For example, to see how dative, plural, masculine nominals end, you just search for occurrences ending in "DPM"

See http://services.alphaworks.ibm.com/manyeyes/view/SgoRsIsOtha66N-wO-cwI2-

At some stage I'll probably do my own visualization that, while not as pretty, will be more customized to inflectional morphology.

by jtauber : Created on Sept. 11, 2007 : Last modified Sept. 11, 2007 : Categories linguistics morphgnt phd greek new_testament_greek : 0 comments (permalink)

I'm With John Gruber

Yesterday, Steve Jobs announced a $200 drop in the price of the iPhone. Some people are claiming it's because it wasn't selling. Others are claiming it's unfair to the people that already bought one (although I'm still waiting for someone to claim the original price was "price gouging"!)

I'm with John Gruber:

Apple didn’t cut the price because demand is low — they set the debut price ridiculously high because demand was ridiculously high. I suspect that for the first few weeks, they were selling iPhones as fast as they could make them. Apple’s being aggressive, not defensive. (And for those of you who’ve already bought one and are pissed about the price cut, if you didn’t think the iPhone was worth $599, you shouldn’t have bought it. That’s how supply and demand works.)

Nicely put, John.

UPDATE: So, Steve Jobs has now said he'll refund $100 to existing iPhone owners. His open letter makes an excellent point, though:

There is always change and improvement, and there is always someone who bought a product before a particular cutoff date and misses the new price or the new operating system or the new whatever. This is life in the technology lane. If you always wait for the next price cut or to buy the new improved model, you'll never buy any technology product because there is always something better and less expensive on the horizon.

by jtauber : Created on Sept. 6, 2007 : Last modified Sept. 6, 2007 : Categories apple iphone : 3 comments (permalink)

Quisition Updates

My flashcard site, Quisition has now been moved over to WebFaction and I took the opportunity to clean up some code and make some incremental feature enhancements.

You can read more about them on the Quisition site. One new feature is the ability to reset long overdue cards. So if you tried out Quisition a while ago and would like to come back without a huge backlog of old cards, this feature is for you!

I've also implemented an alpha version of the testing parts of the site specifically styled for the iPhone. If you're interested in trying it out, email me and I'll tell you how to access it.

by jtauber : Created on Aug. 27, 2007 : Last modified Aug. 27, 2007 : Categories quisition iphone : 6 comments (permalink)

Speech Accent Archive

An archive of (at the time of writing) 795 people from around the world, reading the same passage, many entries including a narrow phonetic transcription:

http://accent.gmu.edu/

Oddly enough, I found out about it watching an interview of the lovely Zooey Deschanel where she cites it as the latest website she bookmarked.

by jtauber : Created on Aug. 25, 2007 : Last modified Aug. 25, 2007 : Categories linguistics : 1 comment (permalink)

Stack-Type Vectors, Part II

Previously, we introduced the stack-type vector and showed the identity vector, how inverses are formed and how these stacks are scaled. We haven't yet show how they are added.

Consider the two stack-type vectors marked (a) and (b) in the diagram below:

The geometric way stacks are added is to place them on top of each other and draw lines through each point of intersection, as we have done at (c). The resulting lines, show by themselves in (d) are the resultant addition. (Remember the number of lines drawn doesn't matter, it's how dense they are that determines the "magnitude" of the vector)

With a bit of experimentation, you can see that this definition of addition (in the limit as the lines become closer and closer to parallel) fulfills the requirement that v + v = 2v. In fact, you can establish that all the axioms for linear spaces hold true.

So we have so far seen three different types of linear spaces: one type made up of tuples of numbers, one type made up of arrow-type vectors and one type made up of stack-type vectors. The latter two have the special characteristic that they are geometric: they have some meaning in the context of a manifold.

Because physics models phenomena in the physical world (which can be viewed as a manifold) the vectors used in physics are of this geometric type. Displacement, velocity and acceleration are examples of measurements that can be models as arrow-type vectors. But what about stack-type vectors? We'll get to their use in physics in the next Poincaré Project entry.

by jtauber : Created on Aug. 23, 2007 : Last modified Aug. 23, 2007 : Categories poincare_project : 0 comments (permalink)

Trailing Slashes

The biggest mistake I made in Leonardo was making "foo" and "foo/" mean the same thing. I don't mean directing one to the other with a 301, I mean returning the same content at two different URLs. If you add the fact I listened on both jtauber.com and www.jtauber.com, it meant that every resource had 4 distinct URLs.

That's now fixed: www.jtauber.com redirects to jtauber.com and "foo" is redirected to "foo/".

Unfortunately, most of the del.icio.us bookmark counts I include at the bottom of each page are now effectively reset to 0. While 164 people had bookmarked "poincare_project", none had bookmarked "poincare_project/". Even though the former now redirects to the latter, I've effectively split the vote between past and future bookmarking of that page.

As far as I know, del.icio.us never heeds 301s and updates its database accordingly. Google Reader doesn't seem to either, judging by the continual checking of "/atom/full" in addition to "/atom/full/".

by jtauber : Created on Aug. 23, 2007 : Last modified Aug. 23, 2007 : Categories django leonardo this_site : 12 comments (permalink)

A Very Short Introduction

Via Greg Mankiw, I just found out about Oxford's Very Short Introductions series. Greg has high praise for Mathematics: A Very Short Introduction. When I looked it up on Amazon, I found numerous other books in the series recommended to me: Particle Physics, Cosmology, Music, Economics, Theology, The Brain. There were about 20 that immediately sounded interesting.

As I explored more I found more and more books in the series. Oxford's Very Short Introductions site indicates over 150! Damn, almost every single one of them looks interesting.

As well as individually, they sell them in boxed sets like the "Basics Box" (Philosophy, Maths, History, Politics, Psychology), the "Brain Box" (Evolution, Consciousness, Intelligence, Cosmology, Quantum Theory) and the "Thought Box" (Hegel, Marx, Nietzsche, Schopenhauer, Kierkegaard).

As is generally the case with these sorts of series, I'm guessing the quality varies a lot. Still, it's hard for me to resist buying a large proportion of them right now :-)

by jtauber : Created on Aug. 20, 2007 : Last modified Aug. 20, 2007 : Categories books : 2 comments (permalink)

jtauber.com Switches to Django

I've spent the last couple of days doing a crude port of Leonardo over to Django including a converter from the Leonardo File System.

I apologize in advance for any problems with the new system.

Functionally, it should already be better than the old one. In particular, categories are handled in a better way which, amongst other things, means that you can navigate through all the pages in a particular category with a previous and next link (see the categories in this post for an example) and category pages list entries with a last modification date and number of comments. Also there is a path of links at the top of any page that isn't at the top level.

I still need to clean up some of the content in light of changes like this so excuse the construction work for the next few days.

Lots more new features planned though, which is one of the reasons I did the port in the first place.

by jtauber : Created on Aug. 19, 2007 : Last modified Aug. 19, 2007 : Categories django announcements leonardo this_site : 0 comments (permalink)

Thoughts on the Social Graph, DataLibre and Aggregation versus Hosting

The recent activity stemming from Brad Fitzpatrick's Thoughts on the Social Graph reminds me of something I was thinking a lot about back in 2004. See especially Mark Atwood's post Five billion social network sites, each about one person on the Social Network Portability list.

The idea I had three years ago was the notion of people hosting their own data and the social networking sites merely being aggregators.

Actually, the way it occurred to me first was in the context of DOAP and the next Advogato but I then generalized the idea in Aggregation Versus Hosting.

Steve Mallett started a mailing list and a website about this same idea, calling it DataLibre. Unfortunately the site no longer exists but here is an announcement of the original site: http://www.oreillynet.com/ruby/blog/2004/09/datalibre_update.html

Steve really drove the vision but I posted numerous thoughts on the idea. Because my blog didn't have categories at the time and I haven't gone back and categorized my old stuff here's a list of my relevant posts:

Brad's article (which is a lot more concrete than anything I ever said) seems to be rejuvenating discussion about this sort of thing. See, for example, the post More on social network portability on Plaxo's blog.

by jtauber : Created on Aug. 18, 2007 : Last modified Aug. 18, 2007 : 0 comments (permalink)

Blog Stats

I've posted here every day for the last 10 days straight. I wondered if I'd had such a writing streak before, so I write a quick program that generated a bunch of stats:

(not counting this post...)

  • I've blogged 721 entries
  • I've never missed a month
  • Worst months were August 2006 and April 2007 both with only three posts the entire month
  • Best month was March 2005 with 47 posts
  • Most number of posts on a single day was six on 14th March 2005 (pi day!)
  • Of 477 blogging days, 303 had one post, 118 two posts, 45 three posts and 11 four or more.

and...

by jtauber : Created on Aug. 18, 2007 : Last modified Aug. 18, 2007 : Categories this_site : 0 comments (permalink)

List-Unsubscribe

I tend to swing back and forth with my mailing list subscription habits. The two endpoints of the pendulum's path are:

  • I rarely read this list, but a have a rule that puts it in its own folder so it's always there if I ever do want to read it
  • I rarely read this list and it's cluttering my mailer with 10,000 unread messages so I'll unsubscribe

I'm currently on a swing to the second one.

In one case, I wasn't immediately sure how to unsubscribe from the list, so I looked at the raw headers for the List-Unsubscribe field (described in RFC2369). Sure enough, it was there.

It would be really nice if more mail apps supported this. Okay, it would be really nice if Mail.app supported this :-)

by jtauber : Created on Aug. 18, 2007 : Last modified Aug. 18, 2007 : 3 comments (permalink)

A Lot Many People Talk Like This

I work with a number of people from Maharashtra and I've noticed they all say "a lot many" where I would say "a lot of". I wonder if this is based on a similar construction in Marathi or Hindi or if there's some other reason (analogy with "a good many", for example).

A Google search for "a lot many" (with quotes) shows numerous examples (if you ignore the cases where punctuation separates "a lot" and "many" — I wish you could tell Google not to count those cases).

Anyone have any insight?

UPDATE: I sat down with one of my colleagues and I understand now. In some dialects of Marathi (especially around Pune, for example), the "proper" way to express a large quantity is not just khup but khup sare (खुप सारे). In some regions in Maharashtra, it might be fine to just say khup but the use of both words together is preferred by those that consider the Pune dialect (and those similar) the more "pure". A literal translation to English would thus be something like "a lot many".

by jtauber : Created on Aug. 17, 2007 : Last modified Aug. 17, 2007 : Categories linguistic_observations : 3 comments (permalink)

PGR2 Maps

A few years ago, my dad won an XBox as a door prize at some event and, for his next birthday, we got him Project Gotham Racing 2 (PGR2). I played it a fair bit too and absolutely loved it.

When I was in Florence at the end of last year, I confess I actually recognized some of the roads from PGR2. I went looking online to see if someone had mapped out all the courses on Google maps but couldn't find anything.

There are some screenshots on Flickr from Bizarre Creations showing the course models. For example at http://flickr.com/photos/bizarrecreations/304326348/ but nothing to really tie the course to the real place.

This evening I found http://www.moondookie.org/pgr2tracks.html which has 12 of the courses drawn over satellite images. It would be nice to have a more complete set, though, and in machine readable form.

Actually, what would be really nice is having the actual models (like the one depicted in the first link). I guess it's too much to hope for that :-) Google's 3D Warehouse is probably best bet of having something anywhere close.

by jtauber : Created on Aug. 16, 2007 : Last modified Aug. 16, 2007 : 0 comments (permalink)

Potter Predictions Now Has Comments

I've added comments to Potter Predictions.

So if you've read the 7th book, head over to http://potterpredictions.com/ and you can comment on each prediction (including arguing if you think I got the "answer" wrong).

by jtauber : Created on Aug. 15, 2007 : Last modified Aug. 15, 2007 : Categories potter_predictions : 0 comments (permalink)

Logo Design For Cats or Dogs

I'd like a logo for the upcoming relaunch of Cats or Dogs. I pretty much know what I'd like: the words "Cats or Dogs" with a cartoon cat on the left and a cartoon dog on the right, both leaning on their respective word with a smug look on their face as if to say "of course you're going to pick me".

In return, you'll get a credit (with a link to your site) on the footer of every page.

Email me if you're interested.

by jtauber : Created on Aug. 15, 2007 : Last modified Aug. 15, 2007 : Categories cats_or_dogs : 0 comments (permalink)

Two Google engEDU Videos

I really enjoy the engEDU videos (aka Tech Talks) that Google makes available.

Two interesting ones I recently saw:

both in some senses about the same topic of measurement.

Ironically it took me a while to get used to Pat Naughtlin's broad Australian accent. I wonder if that counts as cultural cringe.

by jtauber : Created on Aug. 15, 2007 : Last modified Aug. 15, 2007 : 1 comment (permalink)

Stack-Type Vectors, Part I

Part of the Poincaré Project...

So I've promised a couple of times to introduce stack-type vectors. This is a type of geometric vector that works a little differently than arrow-type vectors. They still meet the requirements of a linear space but their behaviour on a manifold, in particular with regard to a coordinate system on that manifold is a bit different. We'll get into that difference in subsequent posts (as well as talk about where these stack-type vectors show up in physics).

Here is what a stack-type vector might look like in three-dimensions:

Hopefully you can see why I call these stack-type vectors. From now on, though, we'll mostly draw two-dimensional stack-type vectors, but you should still get the same idea.

Note that the number of lines drawn and how long they are are not important (any more than the thickness of the line and shape of the arrow are important in an arrow-type vector). Instead it's the direction the lines are stacked in and how tightly they are packed that characterizes the stack-type vector.

But for a set of these to form a linear space, we need to say how to add them and how to scale them.

Here's how this vector scales (by 2x and by -1x):

Notice that scaling by 2x means the lines are packed twice as closely.

In part II, I'll show how addition of these vectors is defined.

by jtauber : Created on Aug. 14, 2007 : Last modified Aug. 14, 2007 : Categories poincare_project : 0 comments (permalink)

Blogging By Day of the Week

A colleague of mine observed in his reading trends on Google Reader that he reads more blog entries on a Thursday than any other day. His hypothesis was that by Thursday he's bored of office work and other routine stuff and so is more likely to choose to read blogs rather than undertaking other activity.

I checked my own reading trends and, sure enough, I read, on average almost twice as many entries on Thursday than the second most common day. Saturday is the day I read the least number of entries. The thing is, unless I'm travelling or really swamped at work, I generally clear my queue of blogs to read daily. Which means the Thursday thing could be more indicative of when entries are posted than when I take the time to read them.

My colleague points out, his argument could equally apply to posting as well as reading.

So I did a quick three-line Python script to output which day of the week each blog post of mine was made on and a sort | uniq -c dance gave me the following results:

  • Monday 67
  • Tuesday 106
  • Wednesday 108
  • Thursday 93
  • Friday 115
  • Saturday 125
  • Sunday 93

So my lowest reading day is my highest posting day and my highest reading day is my equal second lowest posting day.

I guess I don't blog when other people do :-)

by jtauber : Created on Aug. 14, 2007 : Last modified Aug. 14, 2007 : Categories blogging : 3 comments (permalink)

49

I'd heard stories of people getting huge AT&T bills (size, not cost) for their iPhone because AT&T lists every single data transfer session as a line item. Sure enough, when I got my bill today, the envelope was thick and inside was a bill 49 pages long.

by jtauber : Created on Aug. 13, 2007 : Last modified Aug. 13, 2007 : Categories iphone : 1 comment (permalink)

Math Doesn't Suck

Speaking of Numbers...

A few years ago I was on a flight from Boston to LA when I saw Danica McKellar (Winnie Cooper from The Wonder Years) sitting in business class. I later looked up IMDb to see what she'd been up to and discovered she'd graduated summa cum laude from UCLA with a BS in mathematics.

Furthermore while there, she published a paper that got her a finite Erdős number, making her one of the few people that has both a finite Erdős number (hers is 4) and finite Bacon number (just 2 in her case). (See Wikipedia's Erdos-Bacon number article for a list of other people with both.)

For a while, Danica's had a feature on her website where she answers maths questions posed by fans.

Then, earlier this month, she published a book entitled Math Doesn't Suck : How to Survive Middle-School Math Without Losing Your Mind or Breaking a Nail. I'm clearly not the target audience but if I had a daughter, she could do a lot worse than have Danica McKellar as a role-model.

Way to go Danica!

by jtauber : Created on Aug. 12, 2007 : Last modified Aug. 12, 2007 : Categories mathematics : 1 comment (permalink)

Installing iWork '08

I just got back to Boston from San Francisco (had a great dinner and chat with Graham Glass last night) and my copy of iWork '08 had arrived.

I'm looking forward to trying out Numbers as I've been eagerly anticipating it for close to two years (I was disappointed '06 didn't have it).

From what I've seen online, I suspect Numbers won't be suitable for some people's high-end number-crunching but for the everyday spreadsheet work I do, it looks like a dramatically better approach than the mainstream spreadsheets of the past.

It will be interesting to see what the "more word processor than page layout" additions to Pages are like as well.

by jtauber : Created on Aug. 12, 2007 : Last modified Aug. 12, 2007 : Categories apple : 2 comments (permalink)

A New and Improved Cats or Dogs Coming Soon

Cats or Dogs turned out to be a great success. It was fun writing and I got a lot of great feedback.

I always meant to go back and add more polish and a bunch more features.

I'm finally starting to do that (reusing some of Potter Predictions too) and, rather than continuing to live off quisition.com, decided to get a dedicated domain: cats-or-dogs.com

Only thing there at the moment is the ability to subscribe to get an email when the site launches. I'll add an Atom feed too, for people who would prefer that means of notification. Both will also help me to gauge interest.

by jtauber : Created on Aug. 11, 2007 : Last modified Aug. 11, 2007 : 2 comments (permalink)

We Like Our Own Stickers Better

Great answer to a dumb question...

audio

by jtauber : Created on Aug. 9, 2007 : Last modified Aug. 9, 2007 : Categories apple : 0 comments (permalink)

Predictive Text

I always used to think it funny on my old Sony Ericsson that the T9 predictive text proposed the name of one of my sisters whenever I was trying to type the name of the other, right up until the last character.

When I try to type my own name on my iPhone, it proposes one of my heroes instead...

If that's predictive text, I guess I have a bright future :-)

by jtauber : Created on Aug. 9, 2007 : Last modified Aug. 9, 2007 : Categories iphone : 1 comment (permalink)

In San Francisco

I'm in San Francisco from today until Sunday.

If anyone is interested in catching up Friday evening or Saturday, let me know — we'll see if we can get a Geek Dinner going.

by jtauber : Created on Aug. 9, 2007 : Last modified Aug. 9, 2007 : 0 comments (permalink)

Everything is Miscellaneous

I've mentioned before that I've long held a fascination with taxonomies, classifications systems and the like.

I've also been a long-time advocate of hierarchy being derivative (via ordering of facets) rather than primitive, and of faceted tagging (back in 2005 I called it tag the tags but see Google Code Project Hosting for a great example of how simple tag structure achieves this).

So I'm having a great time (half way through) with David Weinberger's Everything is Miscellaneous which is a wonderful book about these kinds of topics, not just in the computer age but going back to topics like the alphabetization, library catalogs in Alexandria, the Dewey Decimal System, the Periodic Table (and alternatives) and the Linnaean Taxonomy.

The book doesn't fall into the books that changed my mind category but more the I never thought of putting it that way before category.

by jtauber : Created on Aug. 9, 2007 : Last modified Aug. 9, 2007 : Categories books : 0 comments (permalink)

Linear Spaces: Duck Typing and Tuples

Part of the (very slow moving) Poincaré Project.

We've introduced the general concept of a linear space and shown how it applies to the specific notion of arrow-type vectors.

Recall that a (real) linear space is a set of objects that can be added and scaled. What 'add' means and what 'scale' means are arbitrary as long as the following rules are followed:

  • the space must form a commutative group under addition, in other words:

    • for any two elements u and v, u + v is also an element of the space

    • there is an element 0 called the identity element such that for every element v, v + 0 = v

    • for every element v there is a corresponding inverse (written -v) such that v + -v = 0

    • the addition is associative, i.e u + v + w = (u + v) + w = u + (v + w)

    • the addition is commutative, i.e u + v = v + u

  • for any real number a and any element v, av is also an element of the space (we say v has been scaled by a)

  • for any two real numbers a and b and any element v, (a + b)v = av + bv

  • for any real number a and elements u and v, a(u + v) = au + av

  • for any real numbers a and b and element v, a (bv) = (ab)v

  • for any element v, 1v = v

(other properties fall out naturally from the above requirements. For example, it's easy to show that -v must be v scaled by -1, that v + v = 2v and so on. Because these follow from the rules above, they are true of all line