Programmed Vocabulary Learning as a Travelling Salesman Problem


For a while I've been interested in how you could select the order in which vocabulary is learnt so as to maximise one's ability to read a particular corpus of sentences. More generally, imagine you have a set of things you want to learn, where each item has prerequisites drawn from a large set and the items share many prerequisites in common.

As an abstract example, imagine you want to be able to read the "sentences":

{"a b", "b a", "h a b", "d a b e c", "d a g f"}

where we assume you must first learn each "word". Further assuming that all sentences are equally valuable to learn, how would you order the learning of words to maximise what you know at any given point in time?

One approach would be to learn the prerequisites in order of their frequency. So you might learn in an order like

<a, b, d, c, e, f, g, h>

However, had we put h before d, we would have had a learning programme of the same overall length that, at the half-way mark, let the learner understand three sentences instead of just two.

To investigate this further, I needed a way to score a particular learning programme and decided that one reasonable way to do so would be to sum, across each step, the fraction of the overall set of sentences understandable at that point.
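As a minimal sketch, here is one plausible reading of that scoring rule in Python, applied to the contrived example above (the function and data names are just illustrative, not from our actual script):

    def score(order, sentences):
        """Sum, over each step of a learning programme, the fraction of
        sentences all of whose words have been learnt by that step."""
        total = 0.0
        known = set()
        for word in order:
            known.add(word)
            readable = sum(1 for s in sentences if set(s.split()) <= known)
            total += readable / len(sentences)
        return total

    sentences = ["a b", "b a", "h a b", "d a b e c", "d a g f"]
    print(score("a b d c e f g h".split(), sentences))  # frequency ordering
    print(score("a b h d c e f g".split(), sentences))  # h before d

Under this reading, the ordering with h before d does indeed score higher on the toy example.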

I then needed an algorithm that would find the ordering that would maximise this score.

After the quick realisation that the number of possible learning programmes was factorial in the number of words, it dawned on me that this was essentially a travelling salesman problem.

So my sister Jenni and I wrote a Python script that implements a simulated annealing approach to the TSP. We then applied it to the above contrived example. Sure enough, it found a solution that was better than a straight prerequisite-frequency ordering.
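The script itself isn't reproduced here, but the core of a simulated annealing loop over orderings might look something like the following sketch (it reuses the hypothetical score function above; the anneal name and parameter defaults are mine, not the script's):

    import math
    import random

    def anneal(words, sentences, t_initial=1.0, t_final=0.001, alpha=0.9,
               iters_per_temp=50):
        """Search for a high-scoring ordering by simulated annealing: propose
        a random swap of two positions, always accept improvements, and accept
        worse orderings with a probability that shrinks as the temperature
        cools geometrically (t *= alpha)."""
        order = list(words)
        random.shuffle(order)
        current = score(order, sentences)
        best, best_score = list(order), current
        t = t_initial
        while t > t_final:
            for _ in range(iters_per_temp):
                i, j = random.sample(range(len(order)), 2)
                order[i], order[j] = order[j], order[i]
                proposed = score(order, sentences)
                delta = proposed - current
                if delta >= 0 or random.random() < math.exp(delta / t):
                    current = proposed
                    if current > best_score:
                        best, best_score = list(order), current
                else:
                    order[i], order[j] = order[j], order[i]  # undo the swap
            t *= alpha
        return best, best_score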

I then decided to try applying it to a small extract of the Greek New Testament (which, of course, I have in electronic form, already stemmed). So I ran it on the first chapter of John's Gospel: 198 words and 51 verses. A straight frequency ordering on this text achieves a score of 48, so that was the score to beat.

My first attempt didn't even come close to that. What a disappointment! Jenni and I wondered if it was just the initial parameters of the annealing model. So we increased the number of iterations at a given temperature to 50 and lowered the final temperature to 0.001 (keeping the initial temperature at 1 and the alpha at 0.9).
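In terms of the hypothetical anneal sketch above, that second attempt corresponds to a call like the following, with words and sentences standing in for the stemmed vocabulary and verses of John 1:

    # 50 iterations per temperature, initial temp 1, final temp 0.001, alpha 0.9
    best_order, best_score = anneal(words, sentences, t_initial=1.0,
                                    t_final=0.001, alpha=0.9, iters_per_temp=50)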

Success!! It found a solution that scored 82.94. The first verse readable (after 27 words) was John 1.34. John 1.20 was then readable after just 2 more words and John 1.4 after another 7.

I decided to try different parameters. With 100 iterations per temperature, a final temperature of 0.0001 and a few hours, it achieved a score of 91.59 (and was still increasing at the time). This time the first verse readable was John 1.24, after only 8 words; then John 1.4 after another 9; John 1.10 after 4; both John 1.1 and John 1.6 after another 4; and John 1.2 just 1 word after that.

Overall a very promising approach. I doubt it's anything new but it was fun discovering the approach ourselves rather than just reading about it in some textbook. The example I tested it on was vocabulary learning, but it could apply to anything that can similarly be modelled as items to learn with prerequisites drawn from a large, shared set.

The next step (besides more optimised code and even longer-running parameter settings) would be to try to work out how to model layered prerequisites - i.e. where prerequisites themselves have prerequisites - to any number of levels. I haven't yet thought about how (or even whether) that boils down (no pun intended) to a simulated annealing solution to the TSP.

UPDATE (2005-08-03): Now see Using Simulated Annealing to Order Goal Prerequisites.