James Tauber's Blog 2004
Favourite Posts of 2004
I mentioned in Blog Hits by Age that I would, as others have done recently, list my favourite entries from my blog this year.
Here are the ones that come to mind. Some generated good discussion in the blogosphere at the time; others, disappointingly, didn't generate any response at all.
In no particular order...
Conference Reporting
Questions and Observations
- Why Are There Three Primary Colours?
- Update on the Primary Colours
- 37 is a Psychologically Random Number
Programming Ideas
- Versioned Literate Aspect-Oriented Programming
- Blogs, Annotations, Comments and Trackbacks
- Naked Objects in Sparta
- Wikipedia as a URI Lookup Service
- ReadySET JotSpot
- Programmed Vocabulary Learning as a Travelling Salesman Problem
- Integrating Subversion and Roundup
Little Python Scripts
Typed Citations Meme
Aggregation versus Hosting Meme
- DOAP and the Next Advogato
- Aggregation versus Hosting
- OPML Sharing and Polling Security
- More on Aggregation versus Hosting
- PIMs and DLAs
- Amazon Recommendations and Self-Hosting
XML versus RDF Meme
- XML Infoset and XML Schemas versus RDF and RDF Schemas
- More on XML and RDF
- Maximizing the Differences
by jtauber : Created on Dec. 30, 2004 : Last modified Feb. 8, 2005 : (permalink)
Poincare Project: Open Coverings and Compactness
If you pick a collection of open sets whose union is the space's entire set, then that collection is called an open covering of the space.
For example, consider the set {a, b} with topology { {}, {a}, {b}, {a, b} }. One open covering would be:
{ {a}, {b} }
Another would be
{ {a}, {a, b} }
Clearly it is possible to cover any finite topological space with a finite number of open sets.
It is also possible to cover any infinite topological space with a finite number of open sets. Because X is an open set in any topology on X, a collection consisting of just X itself is an open covering.
If an open covering has a finite subset which still manages to cover the entire set, the covering is said to have a finite subcovering.
Some topological spaces have the property that every open covering has a finite subcovering. Such a space is said to be compact.
Compactness is a topological property. Recall that this means if a topological space is compact, any topological spaces homeomorphic to it will also be compact (and also that a homeomorphism can't exist between a compact topological space and one that is not compact).
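For a finite example like the one above, the definitions can be checked by brute force. The following is a minimal Python sketch; the function name coverings_from and the choice of frozenset to represent open sets are my own assumptions, not anything standard. It lists every collection drawn from a given family of open sets whose union is the whole space, so the same function finds both the open coverings of the space and the subcoverings of a particular covering.

```python
from itertools import combinations

# The example space {a, b} with topology { {}, {a}, {b}, {a, b} }.
space = frozenset({"a", "b"})
topology = [frozenset(), frozenset({"a"}), frozenset({"b"}), frozenset({"a", "b"})]


def covers(collection, space):
    """True if the union of the sets in `collection` is the whole space."""
    return frozenset().union(*collection) == space


def coverings_from(open_sets, space):
    """Every non-empty sub-collection of `open_sets` that covers `space`."""
    open_sets = list(open_sets)
    found = []
    for size in range(1, len(open_sets) + 1):
        for collection in combinations(open_sets, size):
            if covers(collection, space):
                found.append(set(collection))
    return found


# All open coverings of the space.
all_coverings = coverings_from(topology, space)

# Subcoverings of one particular covering (the covering itself is included).
covering = [frozenset({"a"}), frozenset({"b"}), frozenset({"a", "b"})]
subcoverings = coverings_from(covering, space)
```

Because the topology here is finite, every covering is trivially finite; the interesting cases for compactness are infinite coverings, which a brute-force enumeration like this can't reach.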
UPDATE: next post
by jtauber : Created on Dec. 30, 2004 : Last modified Aug. 20, 2005 : Categories poincare_project : 0 comments (permalink)
Film Project Update: The Long Journey Home
Sending the DVDs to festivals has, so far, gone smoothly. The same can't be said for the 20 Tom sent back to me (recall the mastering had been done in Australia but the duplication in the US).
Tom went to the UPS shop on Saturday 18th December. Once the package was in the system, they were claiming an arrival estimate of Wednesday 22nd December. This seemed optimistic at the time. I thought 23rd was a possibility. Working backwards, that would mean it would have to arrive in Sydney on 22nd and hence leave California by late on 20th.
The package, however, was not even picked up from the UPS shop until 6pm on Monday 20th. This meant it didn't fly out of New Hampshire until 10pm that night. At that point I knew it probably wouldn't make it by Christmas. The arrival estimate, however, was still showing 22nd.
It arrived in Ontario, California at 7.10am on 21st and within a few hours, had been seized by customs (or whatever "PKG DELAY-ADD'L SECURITY CHECK BY GOV'T OR OTHER AGENCY- BEYOND UPS CONTROL" means).
Finally at 3.44am on 23rd December, another hub scan was done. The arrival estimate was still showing as 22nd but at least there was a chance it was going to make it on a flight pretty soon and make it into the country by Christmas at least.
But, alas, no new scans. It didn't make it on a flight on 23rd or the 24th. By 27th there was still no new scan. Then on 28th December, ten days after the package had been sent, there was another hub scan done at Ontario, California. It still hadn't left the US!
Then, a few hours later, another dreaded: "PKG DELAY-ADD'L SECURITY CHECK BY GOV'T OR OTHER AGENCY- BEYOND UPS CONTROL"
I can only guess that the customs officials just really like the film. Hey guys, keep a couple of copies, just send the rest on please!
UPS is still showing the arrival estimate as...you guessed it...22nd December.
by jtauber : Created on Dec. 29, 2004 : Last modified Feb. 8, 2005 : Categories alibi_phone_network : (permalink)
More On LinkRanks Ups and Downs
Recently, I observed large jumps in the PubSub LinkRanks for jtauber.com and attributed them to influential sites coming in and out of the 10-day window PubSub uses.
However, in a comment to Trevor Cook's entry on the jumps, the PubSub CEO responded:
The reason for the sudden shift is that we increased the granularity of how we measure linkranks. Specifically, we added individual blogs from the various hosting services for the first time (e.g. livejournal.com/johndoe) - that has suddenly shifted everyone's ranking. Bob Wyman, our CTO, dropped 30,000 places (much to his chagrin). Check out his blog for more details - http://bobwyman.pubsub.com
While it has obviously affected some blogs in the downward direction, I've been sub-50,000 ever since.
by jtauber : Created on Dec. 29, 2004 : Last modified Feb. 8, 2005 : (permalink)
Upgrade Apologies
I've upgraded this site to Leonardo 0.4.0rc3. Apologies to feed readers for the numerous atom entries whose modification dates got changed as a result.
by jtauber : Created on Dec. 29, 2004 : Last modified Feb. 8, 2005 : (permalink)
Blog Hits By Age
I was going to give a Top Ten Blog Entries By Number of Hits listing but I suspected it would not necessarily be that insightful under the hypothesis that hit numbers are partly a function of the age of the entry.
So I took the number of hits for each entry and graphed it against the age of the entry in days:
There definitely appears to be a linear baseline which the entries "rise above". To make this clearer, I graphed the hits per day against age:
Notice that the two entries from 250-300 days ago lower in significance while the entry from 50 days ago rises considerably. Which entries were these?
The older two are Eclipse is the next Emacs and Eclipse GEF. Both those get a lot of their referrals from Google searches.
The entry from 50 days ago is, funnily enough, another Eclipse GEF-related post, Six Snapshots of a Simple Eclipse GEF Application. Note that that entry is linked to from one of the older ones.
So, what effect does using average hits per day instead of just hits have on a Top Ten Blog Entries?
Here is a list of the top 10 just by hits:
- Enumerating the Rationals in Python (2411)
- My New Powerbook (2346)
- Eclipse GEF (1614)
- Eclipse is the Next Emacs (1465)
- 37 is a Psychologically Random Number (1397)
- My First Eclipse RCP Application (1371)
- Digital Life Colophon (1109)
- Naked Objects in Sparta (1063)
- Blogs, Annotations, Comments and Trackbacks (1028)
- More on XML and RDF (983)
And here is a list of the top 10 by hits per day (ignoring the last couple of days):
- My New Powerbook
- Enumerating the Rationals in Python
- Six Snapshots of a Simple Eclipse GEF Application
- 37 is a Psychologically Random Number
- My First Eclipse RCP Application
- The Inverse Law of Bug Complexity
- Aggregation Versus Hosting
- Eclipse is the Next Emacs
- Eclipse GEF
- Great Hackers, Python, Java, Eclipse and Chandler
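Is the second list more representative? I think so. It includes some extra entries (Six Snapshots of a Simple Eclipse GEF Application; The Inverse Law of Bug Complexity; Aggregation Versus Hosting; Great Hackers, Python, Java, Eclipse and Chandler) that were popular (judging by incoming links and del.icio.us citations) but didn't make the first list because they hadn't been around for as long.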
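The two rankings can be sketched in a few lines of Python. This is only an illustration: the entry names, hit counts and ages below are made up (the real ages aren't listed here), and the function names are my own. The min_age_days parameter plays the role of "ignoring the last couple of days".

```python
def hits_per_day(hits, age_days):
    """Average hits per day; an entry less than a day old counts as one day."""
    return hits / max(age_days, 1)


def top_by_hits(entries, n=10):
    """Rank (title, hits, age_days) tuples by raw hit count."""
    return sorted(entries, key=lambda e: e[1], reverse=True)[:n]


def top_by_hits_per_day(entries, n=10, min_age_days=2):
    """Rank by average hits per day, skipping entries younger than min_age_days."""
    eligible = [e for e in entries if e[2] >= min_age_days]
    return sorted(eligible, key=lambda e: hits_per_day(e[1], e[2]), reverse=True)[:n]


# Hypothetical data: (title, hits, age in days).
entries = [
    ("old popular entry", 1500, 300),
    ("steady entry", 900, 200),
    ("recent hit", 600, 50),
    ("posted yesterday", 40, 1),
]

by_hits = [title for title, _, _ in top_by_hits(entries)]
by_rate = [title for title, _, _ in top_by_hits_per_day(entries)]
```

With these made-up numbers the recent entry moves from third place by hits to first place by hits per day, which is the same effect seen with the 50-day-old Eclipse GEF post.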
How does any of this match up with what I consider my own favourite entries? I'll save that for another entry.
by jtauber : Created on Dec. 23, 2004 : Last modified Feb. 8, 2005 : (permalink)
Poincare Project: The Standard Topology for Ordered Sets
One common way of defining a topology is to take a set, add some structure to that set, define a collection of subsets that meet some criteria in that structure and then use that collection as a basis for the open sets.
Although we didn't have the vocabulary to accurately describe it in those terms, that's what we did previously with the topology of a metric space. A metric space, recall, adds to a set the structure of a distance function. From this, we can define the collection of open balls. This collection can then form the basis for the other open sets in a topology.
Here is another example. Take a set X and add to it the structure of a total ordering. A total ordering is a relation < such that
- for any a, b, c in X: a < b and b < c implies a < c
- for any a, b in X: exactly one of a < b, b < a or a = b holds
In other words, a set with a total ordering is a set whose elements can be sorted.
Now define an open interval (a, b) to be the subset of X such that, for each element x, a < x and x < b.
The open intervals form the basis for a topology. So a total ordering on a set defines a particular topology. While other topologies are possible, the one based on the open intervals is referred to as the standard topology for the ordering or the order topology.
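On a finite set, both the ordering rules and the open intervals can be checked directly. Here is a small Python sketch under a couple of assumptions of mine: the ordering is passed in as a function less_than(a, b), and the second rule is read strictly, so that exactly one of a < b, b < a, a = b holds for each pair. The function names are illustrative only.

```python
from itertools import product
from operator import lt


def is_total_ordering(elements, less_than):
    """Check transitivity and the 'exactly one of <, >, =' rule on a finite set."""
    elements = list(elements)
    for a, b in product(elements, repeat=2):
        holds = [bool(less_than(a, b)), bool(less_than(b, a)), a == b]
        if sum(holds) != 1:
            return False
    for a, b, c in product(elements, repeat=3):
        if less_than(a, b) and less_than(b, c) and not less_than(a, c):
            return False
    return True


def open_interval(elements, less_than, a, b):
    """The open interval (a, b): every x in the set with a < x and x < b."""
    return {x for x in elements if less_than(a, x) and less_than(x, b)}


digits = range(10)
digits_are_ordered = is_total_ordering(digits, lt)
interval = open_interval(digits, lt, 2, 6)

# Proper-subset comparison is not total: {1} and {2} are unrelated.
subsets = [frozenset({1}), frozenset({2})]
subsets_are_ordered = is_total_ordering(subsets, lt)
```

The integers 0 to 9 under the usual < pass the check, while sets under proper inclusion don't, since two different sets need not be comparable at all.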
The real numbers, being a totally ordered set, have an order topology. While other topologies can be defined on the real numbers (as long as the rules for open sets are followed), the order topology is the most natural and consistent with one's intuitions about how the real numbers work.
UPDATE: next post
by jtauber : Created on Dec. 23, 2004 : Last modified Feb. 8, 2005 : Categories poincare_project : 0 comments (permalink)
TeX for Leonardo
Looking at Wikitex (via Simon Willison) has convinced me more than ever that I want support for TeX in Leonardo.
Hopefully 0.5 will have the framework (if not the actual implementations) to support a range of underlying document formats including TeX, XHTML and Word.
by jtauber : Created on Dec. 23, 2004 : Last modified Feb. 8, 2005 : Categories leonardo : (permalink)
Happy Birthday Konrad Tauber
Today is my father's 56th birthday.
He opened up both the world of computers and the world of business to me. He gave me endless opportunities while always leaving the path up to me. He also taught me that business is about people.
I love you dad. Happy Birthday!
by jtauber : Created on Dec. 22, 2004 : Last modified Feb. 8, 2005 : (permalink)
Branching in Subversion
I'm just about to release Leonardo 0.4.0 so I thought I'd better learn how to branch in Subversion. Turned out to be embarrassingly easy:
svn copy trunk branches/0.4
assuming you've got the entire tree checked out (otherwise it can be done almost as easily with URLs).
But it did get me thinking. Previously I've talked about replacing the structure recommended by the O'Reilly Subversion book
/branches
/tags
/trunk
with more explicit indications of what I use tags for:
/branches
/checkpoints
/milestones
/releases
/trunk
with further structure possible under the first four directories before getting to the actual source code.
Well, if I understand correctly, there is nothing special about the /trunk directory. I'm not even sure Subversion really has a notion of a trunk. So why not only have branches?
In other words, instead of keeping the latest development under /trunk and maintenance branches under /branches, why not have a branch for the current development version alongside the branches for maintenance. Something like:
/branches/0.4
/branches/0.5
where (in Leonardo's current state), next-version development takes place under /branches/0.5 and maintenance on 0.4 is done under /branches/0.4
Unless I'm missing something, this seems a clean way of organising things that is native Subversion. The original suggestion given by the Subversion book really makes sense only if you're coming from CVS.
Again, unless I'm missing something :-)
UPDATE (2004-12-23): Justin Johnson, in email noted:
The reason for using trunk is so that developers can continue working on the latest release without having to setup a new working copy everytime the project releases. For example, I were working on 0.4 and then 0.4 released and we created a 0.5 branch, I'd have to clobber my working copy and create a new one. But if I were looking at the trunk, I would be guaranteed that it always points to the latest release that is still in development. It may seem like a minor point, but when you have a lot of developers and when the size of the project is significant, it makes a huge difference.
This is a good point. I did consider the issue of "knowing which is the development branch" and that actually made me wonder about having aliases in Subversion.
However, in my own experience, for commercial software development at least, the developers (even on big projects) all know exactly what version is the latest development version and it is an important "event" in the engineering organization when a new branch is made.
I can see that, for distributed open source development, particularly if the cycles are short, a clearly designated trunk becomes more important, though.
by jtauber : Created on Dec. 22, 2004 : Last modified Feb. 8, 2005 : Categories subversion software_craftsmanship : (permalink)
Flickr and DataLibre
Darren Barefoot has come around on Flickr after earlier making the very DataLibre comment "I’ve yet to be convinced that the best place for my online photos isn’t on my own site."
He says it's the convenience that's won him over. Any feature in particular, Darren?
I certainly have found it easier to put photos up on Flickr than on jtauber.com, but that's just because of the current state of Leonardo. There's no reason why, in the future, Leonardo couldn't provide things like Windows Publishing Wizard support and iPhoto integration to make it just as easy to get stuff up on my own website.
But even then, I might still consider using Flickr. As I've mentioned before, I'm interested in separating aggregation and hosting, not eliminating aggregation. I should be able to take advantage of Flickr's aggregation by pointing them to my self-hosted photos.
by jtauber : Created on Dec. 22, 2004 : Last modified Feb. 8, 2005 : (permalink)
LinkRanks Ups and Downs
PubSub LinkRanks seem to be very sensitive to very recent activity, which means one's rank can jump around a lot. I'm guessing this is particularly true at the long tail where just one link can leapfrog you over hundreds of thousands of fellow bloggers.
Yesterday I was 938,610, today I am 89,060. I've been sub-100,000 before but I also spend time around the 1,000,000 mark if I haven't been linked to in the last week or so.
Oddly, Trevor Cook and others are reporting their rank has dropped recently. Perhaps some highly weighted bloggers just dropped out of the time-weighted window of referrers for their sites.
Or maybe PubSub have changed their algorithm. They say they are still refining it.
Incidentally, I'll use the recommended PSI so PubSub know I'm talking about them.
by jtauber : Created on Dec. 22, 2004 : Last modified Feb. 8, 2005 : (permalink)
XML Elements versus Attributes
Ned Batchelder discusses the old question of elements versus attributes in XML. As I've been answering that question for over seven years in various places, I thought I'd put down my viewpoint here.
Firstly, there are distinctions based on performance or API usability. Those distinctions are so implementation-specific, I don't think they are very interesting; certainly not to someone doing schema design.
Secondly, there are distinctions based on a particular schema language. Different schema languages have different levels of expressiveness so it's important to distinguish the characteristics of elements and attributes inherent to XML from those that are true only because of the particular choice of schema language. One important takeaway here is that a schema is only part of the description of a markup language. In my experience there are always constraints placed on a language beyond what the schema (in any schema language) can say.
Thirdly, there are distinctions inherent to the XML syntax itself; things like the lack of attribute order or the inability to have further XML structure within an attribute value.
But when all those three are considered, there is still a fundamental "style" question around attributes and elements and here is where a lot of people really find themselves asking the elements versus attributes question.
My take on that is that the distinction is more meaningful the more markup-oriented your XML is and more fuzzy the more data-oriented your XML is.
If you are using XML to serialise objects, then the distinction is blurry and it largely comes down to convention and things like the third type of distinction above. In such cases, an element-only approach might make perfect sense, especially if you are using a schema language that can express characteristics that, in DTDs, attributes had over elements, like default values or insignificant ordering.
But if you are truly doing markup, in other words annotating text (particularly a pre-existing text) then the distinction between attributes and elements becomes much clearer and the reason why attributes exist in XML (and SGML) is far more obvious. The key is that attribute values are considered part of the markup, rather than part of the content. So the clearer the distinction is between markup and content, the clearer it will be between using attributes or child elements.
Imagine that you want to describe Max as a black cat. From a data structure representation point of view, there's no semantic distinction between:
<cat>
  <name>Max</name>
  <colour>Black</colour>
</cat>
or
<cat name="Max" colour="Black"/>
and so decisions about whether to use elements or attributes tend to boil down to (a) whether order matters; (b) whether values can have internal structure; (c) compactness or whatever.
However, if you are doing document markup, things are a little different. In the document markup case, you have some existing text that you annotate. So you start with a word "Max" in your document and you want to mark that up with a generic identifier and any additional properties you want to give that word (or referent). You might end up with something like:
<cat colour="Black">Max</cat>
Making colour a child element rather than an attribute wouldn't make sense from a document markup perspective. In document markup there is a much clearer distinction between content and markup. "Max" is content. "Black" is markup. If you made "colour" a child element with "Black" as content then "Black" would change from being markup to content. Makes no difference in data structure representation but it does in document markup.
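The difference shows up as soon as you ask two different questions of the XML: "what are this thing's properties?" versus "what is the text?". The sketch below uses Python's standard xml.etree.ElementTree; the helper names and the surrounding sentence "I fed ... today." are my own invention for illustration.

```python
import xml.etree.ElementTree as ET


def as_record(element):
    """Data view: gather properties from attributes and child elements alike."""
    record = dict(element.attrib)
    for child in element:
        record[child.tag] = child.text
    return record


def text_content(element):
    """Document view: the text left over once all markup is stripped away."""
    return "".join(element.itertext())


# Data structure representation: both styles yield the same record.
element_style = ET.fromstring("<cat><name>Max</name><colour>Black</colour></cat>")
attribute_style = ET.fromstring('<cat name="Max" colour="Black"/>')

# Document markup: a hypothetical sentence annotated in two ways.
attr_markup = ET.fromstring('<p>I fed <cat colour="Black">Max</cat> today.</p>')
child_markup = ET.fromstring("<p>I fed <cat><colour>Black</colour>Max</cat> today.</p>")

same_record = as_record(element_style) == as_record(attribute_style)
text_with_attribute = text_content(attr_markup)
text_with_child = text_content(child_markup)
```

With the attribute, stripping the markup gives back the original sentence, "I fed Max today."; with the child element, "Black" leaks into the content and the text becomes "I fed BlackMax today.".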
From a data structure representation point of view, this attribute/element distinction is so blurred that it is entirely possible to do away with attributes in representations (and sometimes less confusing to do so). This is even more the case where you have schema languages that allow expression of the fact that element order (in a particular context) is not significant.
But in pure document markup applications, where attributes are just indicating characteristic qualities of an element's content, they have a clearer role.
by jtauber : Created on Dec. 21, 2004 : Last modified Feb. 8, 2005 : (permalink)
Film Project Update: Ten More Festivals
Just submitted Alibi Phone Network to ten more festivals: Phoenix FF, Palm Beach IFF, Newport Beach FF, Atlanta FF, Beverly Hills FF, San Fernando Valley IFF, Independent FF of Boston, Malibu IFF, Seattle IFF and IFP/Los Angeles FF.
by jtauber : Created on Dec. 21, 2004 : Last modified Feb. 8, 2005 : Categories filmmaking alibi_phone_network : (permalink)
Alexa Does DataLibre Right (Almost)
I was fiddling around with Amazon.com's Alexa and discovered they provide a very DataLibre-style way of updating one's site information:
To update your contact info, you may place an info.txt file containing your contact info in the root of your site for Alexa to fetch. Right-click this link: info.txt. And save it to your computer. Copy the info.txt file from your computer to the root of your site. Verify that the info.txt file is there with your browser. (Go to http://www.jtauber.com/info.txt.) Once you have verified that the file is there, tell us to fetch it by clicking this link: Go Fetch
Well done Amazon! Now if Bloglines did it with OPML, LinkedIn with FOAF, Freshmeat with DOAP, etc...
UPDATE (2004-12-22): Gary Fleming thinks info.txt is a bad idea. I agree with him. While I still like the DataLibre aspect of what Alexa does, Gary's entry persuaded me that requiring a fixed path "/info.txt" is the wrong way to do it. I should have been able to give Alexa my own URI. DataLibre means owning your own URI space too. Thanks Gary for making me realise that!
by jtauber : Created on Dec. 21, 2004 : Last modified Feb. 8, 2005 : (permalink)
New Mac for Audio and Video
For years, I've dreamed of having a computer dedicated to video and audio editing. It's always been hard to do because the moment I get a fast new machine with lots of memory and disk space, I want to move over to using it for everything. But I'm resolved this time to "keep it pure".
I got a PowerMac dual 2.0GHz G5 (on principle, I always buy the second-fastest processor available on the thinking that the state-of-the-art is overpriced for the people who will pay anything to get the best) with 2.5GB RAM, 2x250 HDDs and a GeForce 6800 GT card. I had earlier bought a 23" Cinema HD screen which I was running off my 12" Powerbook but now it belongs to the PowerMac.
(Actually, losing the 23" screen is going to be the toughest part of "staying pure" as I'm now back to 12" for things like Leonardo and MorphGNT. I might have to share the screen - that's not cheating is it? Do they make KVMs that work with Cinema HD screens?)
I spent a good part of today doing OS updates and installing Apple's Production Suite (Final Cut Pro HD, Motion and DVD Studio Pro). The machine came with OS X 10.3.4 which didn't have support for the 6800 card so I had to put a different graphics card in, upgrade to 10.3.7 and then put the 6800 back in.
The Production Suite install went smoothly. When it came to ProTools LE 6.1, things didn't go so well.
Until now, I've been running ProTools off my Windows machine. I'd forgotten just how much of a pain it was getting ProTools to work last time. ProTools is very picky about hardware and OS. I think I finally got it to work on Windows by upgrading my HDD drivers.
Anyway, I wasn't expecting any problems with my new Mac. But lo and behold, when I started up ProTools for the first time on the Mac, I got an error message (actually it was error code 1). A quick Google result on the DigiDesign discussion board indicated that error 1 meant that ProTools didn't like the OS version.
The next major version of ProTools is due soon so I wonder if that will work. Hopefully in the meantime there is a minor release that works on OS X 10.3.7. Going to investigate now...
UPDATE (2004-12-18): Looks like upgrading to ProTools LE 6.4 did the trick.
by jtauber : Created on Dec. 18, 2004 : Last modified Feb. 8, 2005 : (permalink)
Priority, Severity and Roundup
I'm a big fan of roundup as a bug tracking system. It does, however, come with an odd list of default priorities:
- critical
- urgent
- bug
- feature
- wish
One thing I don't like about it is that it conflates priority and severity. I think it's useful in a bug tracking system to distinguish priority and severity. While the two are often related, it is possible to have a high-priority low-severity bug (e.g. embarrassing typo in UI the day before an important customer meeting) and a low-priority high-severity bug (e.g. software crashes on an unsupported OS).
Severity, in my view, is about the impact on what the user is trying to do. Severity is fairly easy for the submitter to judge. Priority, on the other hand, is more of a triaging issue that needs to take into account a number of factors the submitter might not be privy to. So priority is best assigned in some separate review session. That is not to say the submitter can't be involved in that review, just that others need to be involved too, so priority can't generally be judged at the time of submission.
Here is a list I came up with a few years ago for the severity of bugs:
- security or safety issue
- major problem with no known workaround
- major problem with known workaround
- minor inconvenience
- cosmetic
Any alternative lists people have used and found useful?
Note that features aren't included here. I'm not sure that features should be treated as a level of priority or severity. I like the approach of them being a completely different issue type. I also think there's value in having a "task" type which covers things that aren't features or bugs but nevertheless benefit from being tracked. The only problem I see with different types is that, as a developer you really want to see all your issues at once, whether they be features, bugs or tasks. It isn't clear to me how one would do that in roundup.
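To make the separation concrete, here is a rough Python sketch of an issue model with severity, priority and issue type as three independent fields. It is not roundup's schema or API; every class, field and function name here is hypothetical. Severity uses the list above and is set by the submitter; priority is left empty until a review session assigns it.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum
from typing import Optional


class Severity(IntEnum):
    SECURITY_OR_SAFETY = 1
    MAJOR_NO_WORKAROUND = 2
    MAJOR_WITH_WORKAROUND = 3
    MINOR_INCONVENIENCE = 4
    COSMETIC = 5


class IssueKind(Enum):
    BUG = "bug"
    FEATURE = "feature"
    TASK = "task"


@dataclass
class Issue:
    title: str
    kind: IssueKind
    assignee: str
    severity: Optional[Severity] = None  # judged by the submitter; bugs only
    priority: Optional[int] = None       # assigned at triage; 1 is most urgent


def issues_for(issues, assignee):
    """All of one developer's issues, whatever their kind, most urgent first.

    Issues that have not been triaged yet (priority is None) sort last.
    """
    mine = [i for i in issues if i.assignee == assignee]
    return sorted(mine, key=lambda i: (i.priority is None, i.priority or 0))


issues = [
    Issue("Crash on unsupported OS", IssueKind.BUG, "dev",
          severity=Severity.MAJOR_NO_WORKAROUND, priority=3),
    Issue("Add export feature", IssueKind.FEATURE, "dev"),
    Issue("Typo on login screen", IssueKind.BUG, "dev",
          severity=Severity.COSMETIC, priority=1),
    Issue("Renew build server licence", IssueKind.TASK, "ops", priority=2),
]

my_queue = [i.title for i in issues_for(issues, "dev")]
```

In this sketch the cosmetic typo sorts ahead of the crash because triage gave it the higher priority, and the untriaged feature request appears in the same list as the bugs rather than in a separate view.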
UPDATE (2005-01-03) : Now see More on Priority and Severity
by jtauber : Created on Dec. 17, 2004 : Last modified Feb. 8, 2005 : Categories software_craftsmanship : (permalink)
Nominations Open for 2005 Australian Blog Awards
see http://kekoc.com/wp/archives/2004/12/14/2005-australian-blog-awards-nominations/
by jtauber : Created on Dec. 17, 2004 : Last modified Feb. 8, 2005 : (permalink)
Leonardo Release Candidate
The first release candidate for Leonardo 0.4 is available at http://jtauber.com/2004/12/leonardo-0.4.0-rc1.tgz. Let me know if you encounter any problems. If all goes well, Leonardo 0.4 will be out by the end of the year.
by jtauber : Created on Dec. 16, 2004 : Last modified Feb. 8, 2005 : Categories leonardo : (permalink)
Why Couldn't They Have Had Blogs in 1986
I was reminiscing with my parents this evening about my first year of high school, which I did by correspondence because we were living in Brunei at the time. My mum reminded me that the thing I hated most was having to write a journal for English.
My teacher didn't care what I wrote, as long as I wrote something. But I always found it difficult, perhaps because the act of writing something down on paper and posting it off to my teacher in Australia made it all seem so formal.
How much easier it would have been if blogs had existed back in 1986!
by jtauber : Created on Dec. 15, 2004 : Last modified Feb. 8, 2005 : (permalink)
It Took Me A Lot Longer
Scoble mentions that today is the fourth anniversary of his blog and he credits Dave Winer as one of the people that talked him into it.
Thinking back, four years ago was the EDevCon conference in New Orleans that I gave a Web Services keynote at. Scoble was the organizer. I also met Dave Winer there for the first time (and Brent Simmons). Dave has a picture to prove it (that's me with the Slashdot fleece :-)
Whatever Dave said to Scoble to talk him into blogging, he mustn't have said to me, but I got there eventually.
by jtauber : Created on Dec. 15, 2004 : Last modified Feb. 8, 2005 : (permalink)
Architecture of the World Wide Web, Volume One
The Architecture of the World Wide Web, Volume One has become a W3C Recommendation.
Congratulations to the W3C TAG. This is a great piece of work (even if the title does sound like a Mel Brooks movie) and provides an invaluable foundation for the design of Web-based systems.
Where Leonardo has failed to embody the terminology, principles or best practices of this document, I consider that to be a bug in Leonardo.
by jtauber : Created on Dec. 15, 2004 : Last modified Feb. 8, 2005 : (permalink)
Thoughts on GNT-NET Parallel Glossing Project
Zack Hubert mentions that I'm thinking about using the NET Bible for a collaborative parallel glossing project.
Here is how it might work:
The user is presented with the Greek text and the NET text.
Consider Luke 1.1. The Greek reads:
Ἐπειδήπερ πολλοὶ ἐπεχείρησαν ἀνατάξασθαι διήγησιν περὶ τῶν πεπληροφορημένων ἐν ἡμῖν πραγμάτων,
The NET reads
Now many have undertaken to compile an account of the things that have been fulfilled among us,
It should be possible to select any number of words in the Greek and any number of words from the NET and assert that they correspond (or link) to one another. There is no need to link between the entire verse of Greek and the entire verse of the NET because that link has already been made automatically.
Say the user selects Ἐπειδήπερ. They should then be shown the part-of-speech and parse information for the word (in this case C) as well as the lexical form, ἐπειδήπερ. The user should also be shown all previous glosses for ἐπειδήπερ in other contexts.
The user is then instructed to select the word or words that directly translate ἐπειδήπερ. In this case, the user selects Now and submits.
The user need not progress in order. Say the next thing they select is the word πραγμάτων. As before, they are shown the part-of-speech and parse information (N-GPN) and the lexical form, πρᾶγμα. Again the user is shown previous glosses. These glosses should include those specifically for πραγμάτων as well as other forms of πρᾶγμα, perhaps displayed differently.
The user then selects things and submits.
It should be possible to select multiple Greek words and link them to just one word from NET. It should also be possible to select one Greek word and link it to multiple words in the NET. Many-to-many links should also be possible. For example, a user could select περὶ τῶν πεπληροφορημένων ἐν ἡμῖν πραγμάτων and of the things that have been fulfilled among us and submit that linkage.
It is also possible that some words won’t link to anything.
Many-to-many linkages should be encouraged where the particular sense of a word is entirely determined by its use in a sequence (such as an idiom).
Users should be discouraged from doing many-to-many linkages where the sequence isn't a grammatical unit such as a phrase. For example, a user shouldn't submit a link between περὶ τῶν and of the. This clearly can't be enforced.
Users should be required to log in before they can submit linkages. Each linkage will be stored with the email address of the person that made the linkage.
While users may be encouraged to work on particular verses, they should be free to go to whatever verses interest them. Duplicate effort is not a problem and provides redundancy. The data can be checked later for inconsistencies.
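One possible way to store such linkages is as pairs of word positions within a verse, along with who submitted them. The Python sketch below is only a first guess at a data model: the class and function names are hypothetical, the email address is a placeholder, and looking up glosses by lexical form (rather than by the exact selected words) would need the MorphGNT lemma data, which is left out here.

```python
from dataclasses import dataclass

GREEK = "Ἐπειδήπερ πολλοὶ ἐπεχείρησαν ἀνατάξασθαι διήγησιν περὶ τῶν πεπληροφορημένων ἐν ἡμῖν πραγμάτων,"
NET = "Now many have undertaken to compile an account of the things that have been fulfilled among us,"


def words(text):
    """Split a verse into words, dropping trailing punctuation."""
    return [w.strip(",.;") for w in text.split()]


@dataclass(frozen=True)
class Linkage:
    verse: str
    greek: tuple        # positions of the selected Greek words (may be empty)
    net: tuple          # positions of the selected NET words (may be empty)
    submitted_by: str   # email address of the logged-in user


def selected(word_list, indices):
    """The words at the given positions, joined into one string."""
    return " ".join(word_list[i] for i in indices)


def previous_glosses(linkages, texts, greek_selection):
    """NET glosses already submitted for exactly this Greek selection."""
    glosses = []
    for link in linkages:
        greek_words, net_words = texts[link.verse]
        if selected(greek_words, link.greek) == greek_selection:
            glosses.append(selected(net_words, link.net))
    return glosses


texts = {"Luke 1.1": (words(GREEK), words(NET))}
user = "user@example.org"  # placeholder address
linkages = [
    Linkage("Luke 1.1", (0,), (0,), user),     # first Greek word -> "Now"
    Linkage("Luke 1.1", (10,), (10,), user),   # last Greek word -> "things"
    # many-to-many: the final six Greek words -> the final nine NET words
    Linkage("Luke 1.1", tuple(range(5, 11)), tuple(range(8, 17)), user),
]
```

Storing positions rather than the words themselves keeps one-to-one, one-to-many and many-to-many links in the same shape, and duplicate submissions from different users simply become extra rows to compare later.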
by jtauber : Created on Dec. 14, 2004 : Last modified Feb. 8, 2005 : Categories greek new_testament_greek : (permalink)
Best Use of MorphGNT So Far
Zack Hubert has taken my MorphGNT and built a GNT Browser that blew me away! It displays the text in the browser; hover on a word and the lemma and parsing is shown in a pop-up; click on the word and you get a graph of word occurrence by book with the ability to list all occurrences.
I've toyed with web interfaces to the MorphGNT for years but nothing even remotely as slick as this.
by jtauber : Created on Dec. 14, 2004 : Last modified Feb. 8, 2005 : Categories morphgnt greek new_testament_greek : (permalink)
Film Project Update: DVDs and More Festivals
We've just submitted Alibi Phone Network to five more festivals: Newport, Sedona, Vail, OC and Sonoma Valley.
It was our first submission using professionally duplicated DVDs rather than making copies ourselves. We got a batch of 100 done, of which I expect around 50 to be submitted to festivals.
by jtauber : Created on Dec. 14, 2004 : Last modified Feb. 8, 2005 : Categories filmmaking alibi_phone_network : (permalink)
MorphGNT v5.05 Available
- Corrected occurrence of ἐμβάλλω for lemma instead of ἐμβλέπω or ἐμβαίνω (thanks to Ted Blakley via Zack Hubert)
- Denormalized variant spellings of Ναζαρά
- Corrected parse codes of κἀκεῖνος, θρόνοι
- Added comparative parse code for σπουδαιοτέρως
- Changed lemmata for ἀκριβέστερον, περισσότερον, τολμηρότερον
- Changed lemmata for οὕτως, εἵνεκεν, ἑλπίς
- Corrected lemma for ζώνην and ζώνη
by jtauber : Created on Dec. 14, 2004 : Last modified Feb. 8, 2005 : Categories morphgnt greek new_testament_greek : (permalink)
Ground Loop
The last few days I've been reorganising my home office / recording studio (unfortunately, they are still the same thing).
When I plugged my Korg Triton LE into my Digidesign Digi002 I noticed the distinctive hum of a ground loop. I've never had to deal with a ground loop before. Basically they occur when one device's path of least resistance to the ground is through the audio cable. The result is a low hum at AC frequency (50Hz in Australia).
So I hopped on to the excellent home-recording mailing list to ask what I should do.
Rodrigue Amyot came to my rescue with some things to try. The first possible problem we identified was that the Korg's power cable is only a two-pin (what were they thinking!)
Another possibility Rod raised was mixing balanced and unbalanced devices. I don't know what the Korg is (my Roland keyboard definitely has balanced outputs) and I don't know what the Digi002 takes although I would guess balanced. My cabling assumes both are balanced.
Unplugging the power to the Korg still left the hum which suggested it wasn't a power ground loop problem after all.
Still working on the problem. Audio electrics is fun.
by jtauber : Created on Dec. 13, 2004 : Last modified Feb. 8, 2005 : Categories recording_producing_and_engineering : (permalink)
Blog Goals or Lack Thereof
Dorothea Salo in Caveat Lector comments on how odd it seemed being asked how her blog was going. I think I would react the same way.
Ask how my music's going, or my filmmaking, or my morphological analysis of the Greek New Testament and I'd be able to tell you. They are projects, or at least interests manifesting as specific projects. Even the Poincare Project is foremost about me taking notes on my way to understanding the (possible) proof of the Poincare Conjecture. The use of the blog for those notes is largely incidental to that goal.
Blogging in and of itself isn't a project for me. I think that's largely because I don't have goals for it. Sure I track referrer logs and webstats, etc. Sure I get a thrill when Mark Liberman likes an idea of mine or Doc Searls doesn't. But they aren't accomplishments tracked against some schedule. I don't have monthly Scoble linkblogging targets.
Not that there's anything wrong with that. But for me, like Dorothea, blogging is scribbling. Occasionally making announcements, but mostly just scribbling.
by jtauber : Created on Dec. 13, 2004 : Last modified Feb. 8, 2005 : (permalink)
More on Typed Citations
I've written before about the idea of typed citations.
Mark Liberman (who I might have studied under if I'd gone ahead with my PhD application to UPenn) comments on the idea of typed citations with some excellent thoughts. One thing that I realised, reading Mark's post: I probably wasn't clear that I was envisaging a controlled vocabulary, much like XFN has.
The notion of typed citations relates to trackbacks, a topic I've also talked about before. Bryan Lawrence (who has recently become my main sounding board in the development of Leonardo) asks about semantics in trackbacks. He is talking about typing the source object rather than relationship but the two are related. In RDF terms, one is a class, the other is a property. I would love to see both able to be expressed in a trackback.
by jtauber : Created on Dec. 12, 2004 : Last modified Feb. 8, 2005 : (permalink)
On the Red Couch
No, I'm not appearing on Scoble's Red Couch (I wouldn't say no, though) but Nelson James will be on this red couch next Sunday.
That's right, the pop duo I'm in has been invited back (always a good sign) to perform on local chat show, The Couch, for their Christmas special.
UPDATE (2004-12-14): Unfortunately, there is a conflict with a play that Nelson is in and so we've had to cancel our television appearance. However, we should be appearing some time in the new year.
by jtauber : Created on Dec. 12, 2004 : Last modified Feb. 8, 2005 : Categories nelson_james : (permalink)
Poincare Project: A Basis for a Topology
Because of the requirement that unions and finite intersections of open sets must also be open sets, you don't need to specify every open set in order to define a topology. You can characterise a topology by describing a certain class of open sets from which the other open sets can be calculated.
Such a class is called a basis for the topology.
Because members of the basis are themselves open sets, once we have a basis we can generate all the other open sets by taking unions.
A random selection of subsets of X isn't always going to give us a basis for a topology on X any more than it gives us a topology, so what restrictions exist on a basis that ensure it can generate a topology?
Clearly every element in the set X must appear in at least one basis open set. Otherwise that element would miss out on being in any open sets (and we know that, by definition, X itself must be open).
There is one more requirement, however, that must be met. Consider X = {a, b, c}. The open sets {a, b}, {b, c} cannot form a basis because if {a, b} and {b, c} are open then the intersection {b} must be. But {b} cannot be open because it isn't the union of basis open sets.
To avoid this, we have the additional requirement on a basis as follows:
if x is in the intersection of two basis open sets then x must also be in a third basis open set which is a subset of the intersection.
This, along with the requirement that every element must appear in at least one basis open set is sufficient to ensure that one has a basis for a topology.
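For finite sets, both requirements are easy to check mechanically. Here is a minimal sketch in Python; the function name `is_basis` is just something I've made up for illustration, and it only makes sense for finite X.

```python
from itertools import combinations

def is_basis(X, candidate):
    """Check both basis requirements for a collection of subsets of a finite set X."""
    sets = [frozenset(s) for s in candidate]
    # Requirement 1: every element of X appears in at least one basis set.
    if not all(any(x in b for b in sets) for x in X):
        return False
    # Requirement 2: each point in the intersection of two basis sets must lie
    # in some basis set that is a subset of that intersection.
    for b1, b2 in combinations(sets, 2):
        overlap = b1 & b2
        for x in overlap:
            if not any(x in b3 and b3 <= overlap for b3 in sets):
                return False
    return True

X = {"a", "b", "c"}
print(is_basis(X, [{"a", "b"}, {"b", "c"}]))         # False
print(is_basis(X, [{"a", "b"}, {"b", "c"}, {"b"}]))  # True
```

The first call is the failing example from above; adding {b} to the collection repairs it.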
UPDATE: next post
by jtauber : Created on Dec. 10, 2004 : Last modified Feb. 8, 2005 : Categories poincare_project : 0 comments (permalink)
MorphGNT v5.04 and Beyond
I've released a new version of my MorphGNT.
Details of the changes are on the MorphGNT page but they all stem from a simple query performed via a Python script: in cases where there is no parse-code (i.e. the word is essentially uninflected), is the text form the same as the lexical form (other than accentuation)?
In some cases this rule means that new lexical forms need to be provided to allow for spelling variation, rather than the lexical form normalising spelling. This is an editorial decision I've made that makes more sense in the larger picture of where I'm going with the MorphGNT.
The corrections I'm making to the CCAT database are really just a side-effect of my efforts to build an original database of New Testament Greek morphology. I'll say more about it as it develops but the idea is that surface forms, lexical forms, spelling variations, roots, stems, suppletion, morpho-phonological rules, etc. will all be catalogued with relationships between them expressed as a directed labelled graph.
Eventually, the MorphGNT will reference into this graph rather than merely give the lemma. There'll be a partial ordering of nodes in the graph (expressed by a subset of arc types) and so references will be to the node that is as general as can explain the specific surface form.
by jtauber : Created on Dec. 9, 2004 : Last modified Feb. 8, 2005 : Categories morphgnt greek new_testament_greek : (permalink)
Shift to Aggregator Use
I noticed some interesting numbers in my website logs that suggest a significant shift towards aggregator use when reading this blog.
In October, there were 772 unique IP hits to the full-text atom feed. In November, that number was 941. That's a more than 20% increase.
However, October saw 3228 unique IP hits to blog pages compared with only 2600 in November. A just under 20% decrease.
Now this might not have been caused by a shift from people reading in a browser to people reading in an aggregator but it does seem plausible, even likely.
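For anyone wanting to check the percentages, the arithmetic is just a relative change on the numbers quoted above (the helper name `percent_change` is mine):

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, as a percentage."""
    return (after - before) / before * 100

feed = percent_change(772, 941)     # Atom feed hits, October -> November
pages = percent_change(3228, 2600)  # blog page hits, October -> November

print(f"feed:  {feed:+.1f}%")   # feed:  +21.9%
print(f"pages: {pages:+.1f}%")  # pages: -19.5%
```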
by jtauber : Created on Dec. 9, 2004 : Last modified Feb. 8, 2005 : (permalink)
Integrating Subversion and Roundup
I'm using Subversion for Leonardo and have recently started using Roundup for issue tracking.
I'd like to have some level of integration between the two. The sort of thing I was initially thinking of was being able to associate an issue with a revision and vice versa.
The Roundup wiki gives an example of making something like Version:37 in an issue message automatically get turned into a link to the version control system (or something like ViewCVS).
Because Roundup is extensible in the object types it manages, one could presumably go a step further and have a class called "change" and extend Subversion to, every time a commit is done, create a new change object for it in Roundup including the commit message.
References to issues could then be made in commit messages (and the link automatically made). Furthermore, Roundup would facilitate chatting about revisions. Revisions could be classified by topic, assigned to people for review, etc.
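The text-matching half of this is simple enough to sketch. What follows is not the Roundup wiki's example and doesn't touch the Roundup or Subversion APIs at all; it is plain Python, the URL templates are placeholders, and I'm assuming issues are referred to in commit messages with designators of the form issue12.

```python
import re

# Hypothetical URL template; a real setup would point at its own server.
REVISION_URL = "http://example.org/viewcvs?rev={rev}"

REVISION_RE = re.compile(r"\bVersion:(\d+)\b")
ISSUE_RE = re.compile(r"\bissue(\d+)\b")  # assumed designator format

def link_revisions(message):
    """Turn 'Version:37' in an issue message into an HTML link."""
    def to_link(match):
        url = REVISION_URL.format(rev=match.group(1))
        return '<a href="%s">%s</a>' % (url, match.group(0))
    return REVISION_RE.sub(to_link, message)

def issues_mentioned(commit_message):
    """List the issue numbers referred to in a commit message."""
    return [int(n) for n in ISSUE_RE.findall(commit_message)]

print(link_revisions("Fixed in Version:37"))
print(issues_mentioned("Fix feed dates (issue12, see also issue7)"))  # [12, 7]
```

A post-commit hook could call something like `issues_mentioned` on each commit message and then create the "change" object and links on the Roundup side.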
by jtauber : Created on Dec. 8, 2004 : Last modified Feb. 8, 2005 : (permalink)
Film Project Update: Two More Festivals Without a Box
Just completed the submissions for Ann Arbor and Aspen. For these festivals I was able to use the phenomenally useful site WITHOUTABOX.
WITHOUTABOX lets you enter the information about your film once and submit electronically (everything but the film itself but that's coming) to each festival. If you're submitting to more than a couple of festivals, this is an incredible time saver. Not only that but the site provides a calendar showing upcoming festival deadlines filtered by whether your film is eligible for the festival or not.
They have support for submission to hundreds of festivals (including some pretty big ones) and seem to be adding more all the time. They also have a larger database of known festivals that aren't part of the WITHOUTABOX submission system (yet) so you can still track their deadlines too.
by jtauber : Created on Dec. 8, 2004 : Last modified Feb. 8, 2005 : Categories filmmaking alibi_phone_network : (permalink)
Poincare Project: Connectedness, Closed Sets and Topological Properties
Some topological spaces have the property that they can be decomposed into two disjoint non-empty open sets. In other words, there exist two non-empty open sets whose intersection is empty but whose union is the entire space. Take our ball of clay and cut it in half.
Such a topological space is said to be disconnected. Topological spaces for which this is not true are said to be connected.
Another way of defining the same notion of connectedness is via the notion of closed sets. (The existence of open sets suggested there would be something called closed sets, right?)
A closed set of a topological space is simply one whose complement is open. In other words, if you have an open set, then the set of points not in that open set is a closed set. One interesting property of this definition is it allows a set to be both open and closed at the same time. If a set and its complement are both open, then both sets are also closed.
Because, by definition, the empty set and the set of all points in a topological space are open sets, they are also closed sets. And here is where we come to the definition of connectedness based on closed sets.
A topological space is connected if and only if the only two sets that are both open and closed are the empty set and the set of all points. If any other sets are both open and closed then the topological space must be disconnected.
It is fairly easy to see why this is true. If two disjoint non-empty open sets A and B have a union which is the entire space then A and B are each other's complements. Therefore A must be closed (because B is open) and B must be closed (because A is open). Therefore A and B are both open and closed.
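On a finite space the open-and-closed test can be run directly. A small sketch, assuming the topology is given as an explicit list of open sets (the function names are mine):

```python
def clopen_sets(X, topology):
    """Return the sets that are both open and closed."""
    X = frozenset(X)
    opens = {frozenset(s) for s in topology}
    # A set is closed when its complement is open.
    return {s for s in opens if (X - s) in opens}

def is_connected(X, topology):
    """Connected iff the only clopen sets are the empty set and X itself."""
    X = frozenset(X)
    return clopen_sets(X, topology) <= {frozenset(), X}

X = {"a", "b"}
discrete = [set(), {"a"}, {"b"}, {"a", "b"}]
nested = [set(), {"a"}, {"a", "b"}]

print(is_connected(X, discrete))  # False: {a} and {b} are both open and closed
print(is_connected(X, nested))    # True: {a} is open but its complement {b} is not
```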
Connectedness is said to be a topological property because it is based purely on the open sets and no additional structure. Because topological properties are based only on the open sets, they are preserved by a homeomorphism. All homeomorphisms preserve all topological properties. So if a space is connected, then any space homeomorphic to it will also be connected. An important corollary is that you can never find a homeomorphism between a connected space and a disconnected one, or between any two spaces that have differing topological properties.
In the example of cutting our ball of clay in half, the before and after are not homeomorphic because the before is connected and the after is disconnected. Again, we've ripped apart points that were once in lots of open sets together so that now the only open set they share is the topological space as a whole.
UPDATE: next post
by jtauber : Created on Dec. 7, 2004 : Last modified Feb. 8, 2005 : Categories poincare_project : 1 comment (permalink)
MorphGNT v5.03 Available
More corrections now and more coming soon.
Version 5.03 contains a major correction to the lemma PRO; a correction to MYRA; some spelling distinctions ENEKEN/ENEKA, BETHSAIDA(N), GOLGOTHA(N); and case corrections in proper names GERASENOS, STEFANOS, FOROS, TREIS, TABERNE, DIABLOS.
See MorphGNT.
by jtauber : Created on Dec. 7, 2004 : Last modified Feb. 8, 2005 : (permalink)
Next Film After Alibi
After we finished principal photography on Alibi Phone Network, I suggested our next short film should expand in one of the following three dimensions:
- length
- format (i.e. MiniDV to HD)
- cast/crew/prop/location requirements
Tom has been working on a great script that I definitely want to produce—the problem is it expands on Alibi in all three dimensions simultaneously: 40 mins versus 14; really deserves HD rather than MiniDV; massive increase in cast/crew/prop/location requirements. To do well, it would take 5 times as long a shoot and 10-20 times the budget of Alibi and, particularly given my lack of experience on HD, just too much of a risk.
So today I suggested to Tom that we think about an intermediate project. One that is around 20-25 minutes, shot on HD but not requiring much more beyond Alibi in terms of cast/crew size, number of locations, etc.
I have an idea I came up with in 2001 that would probably fit well. Watch this space!
by jtauber : Created on Dec. 7, 2004 : Last modified Feb. 8, 2005 : Categories filmmaking in_the_light_of_day : (permalink)
Film Project Update: First Festival Submission Arrived
The Alibi Phone Network DVD arrived at SXSW. Next up: Ann Arbor and Aspen.
by jtauber : Created on Dec. 6, 2004 : Last modified Feb. 8, 2005 : Categories filmmaking alibi_phone_network : (permalink)
MorphGNT v5.02 Available
Some breathing corrections on rho-initial words.
by jtauber : Created on Dec. 5, 2004 : Last modified Feb. 8, 2005 : Categories morphgnt : (permalink)
Structured Tag Naming in Subversion
I've recently started using Subversion for versioning the Leonardo code base. While I've admired the design of Subversion since before 1.0, I'd never really had an opportunity to use it on a project.
One of the things I've done with the Leonardo repository is followed the suggestion of the O'Reilly Subversion book in having three top-level directories:
- /branches
- /tags
- /trunk
However, it's just occurred to me that, because tags are just copies with their own directory path, I could add some structure to my tags. Because I normally use tags for either checkpoints, milestones or releases, my top-level directories could be:
- /branches
- /checkpoints
- /milestones
- /releases
- /trunk
Even within things like /releases I could have structure such as
- /releases/0.4/beta/2
I'm thinking aloud but it seems like a reasonable practice to follow. Anyone done anything similar?
by jtauber : Created on Dec. 4, 2004 : Last modified Feb. 8, 2005 : (permalink)
Poincare Project: Homeomorphisms
Previously we talked about bijections as a way of pairing up all the elements of two sets. Often this is done to express that one set is equivalent to another.
Once you have structure on the set, it isn't enough to just have a bijection. The elements of the two sets must be paired up in a way that maintains the structure before the two structured sets can be said to be equivalent.
Two topological spaces are equivalent if there is a bijection between them that maintains the open sets. In other words, if the bijection maps open sets to open sets in both directions (so that its inverse also maps open sets to open sets) then our two spaces are topologically equivalent.
Another word for topologically equivalent is homeomorphic (note the 'e') and the topology-preserving mapping is called a homeomorphism.
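For finite spaces you can test a candidate mapping directly. The sketch below is my own illustration: the map is a Python dict, each topology is a list of open sets, and the check is that the map is a bijection under which the open sets of one space correspond exactly to the open sets of the other.

```python
def is_homeomorphism(f, opens_X, opens_Y):
    """f: dict mapping points of X to points of Y; opens_X, opens_Y: the topologies."""
    opens_X = {frozenset(s) for s in opens_X}
    opens_Y = {frozenset(s) for s in opens_Y}
    # The whole space is always open, so the union of the open sets recovers it.
    X = frozenset().union(*opens_X)
    Y = frozenset().union(*opens_Y)
    # f must pair up every point of X with a distinct point of Y, covering Y.
    if not (set(f) == X and set(f.values()) == Y and len(X) == len(Y)):
        return False
    # The images of the open sets of X must be exactly the open sets of Y.
    images = {frozenset(f[x] for x in U) for U in opens_X}
    return images == opens_Y

opens_X = [set(), {"a"}, {"a", "b"}]
opens_Y = [set(), {1}, {1, 2}]

print(is_homeomorphism({"a": 1, "b": 2}, opens_X, opens_Y))  # True
print(is_homeomorphism({"a": 2, "b": 1}, opens_X, opens_Y))  # False: {a} maps to {2}, which isn't open
```

Both mappings are bijections; only the first preserves the structure.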
A topological space is the most general space that has a notion of continuity, so two spaces that differ in terms of other structures (like distance between their points) might still be homeomorphic if continuity is preserved. One way to think about this is moulding a ball of clay...
Imagine taking a ball of clay and squashing it flat. If you think of the clay as a metric space, you've clearly changed the space quite a bit because distances between pairs of points are no longer the same. However, you haven't changed the topology. The open sets are still open sets in the squashed version. Squashing the clay is a homeomorphism. If you'd drawn a continuous line on your ball it would still be continuous after the squashing. Squashing hasn't ripped two points apart from one another.
But, now consider pushing your thumb through the clay to mould it into a doughnut-shape. To make the hole, you had to rip points apart from one another. This has altered the open sets. Two points that might have been very close (and hence in some very small open sets together) might now only share very large open sets in common. Because the topology is not preserved, the mapping from ball to doughnut is not a homeomorphism.
A topologist would say that the ball of clay is not homeomorphic to the doughnut-shaped clay.
We've reached an important milestone because the Poincare Conjecture has to do with whether one particular type of topological space is always homeomorphic to another particular type.
UPDATE: next post
by jtauber : Created on Nov. 29, 2004 : Last modified July 1, 2005 : Categories poincare_project : 0 comments (permalink)
Film Project Update: Mailing the DVD
With a climax worthy of a film, I got the DVD of Alibi Phone Network sent off to Tom for duplication and festival submission.
I had arranged to visit a friend in the afternoon and my original plan was to spend the morning doing the DVD burning, mail it off and then go visit the friend. However, the burning took longer than I planned and so I decided I'd go to the post office after I'd paid the visit.
Somewhere between when I left to go to the friend's house and when I got back home, I misplaced my wallet so I had no money to pay for the shipping. I rang my mum (who lives ten minutes away) and asked if I could come over and borrow some money. (Oh how many times in my thirty-one years my mum has come to my rescue!)
I got the money, rushed to the post office just before closing time and, as I put the parcel on the counter, the lady said "can I see some ID?". I'd forgotten that international shipping requires ID. And my driver's licence was...you guessed it...in my wallet.
So I raced back home, found my passport, raced back to the post office and got the DVDs sent off within minutes of closing.
by jtauber : Created on Nov. 29, 2004 : Last modified Feb. 8, 2005 : Categories filmmaking alibi_phone_network : (permalink)
Leonardo Mailing List Available
As completion of 0.4 nears, I've set up a mailing list for users and potential contributors. You can join it at:
http://mail.pyworks.org/listinfo/leonardo
UPDATE (2004-12-12): I've edited this page to reflect the new address.
by jtauber : Created on Nov. 28, 2004 : Last modified Feb. 8, 2005 : Categories leonardo : (permalink)
Global Warming and Eskimo Words for Robin
If I see another blog entry that spreads the false meme that, due to global warming, Eskimos are now seeing species they don't have words for, I'm going to scream.
It's just bad linguistics.
Geoffrey Pullum does a much better job than I could of debunking this. Pullum was also the guy who debunked the "Eskimos have hundreds of words for snow" meme many years ago.
by jtauber : Created on Nov. 28, 2004 : Last modified Feb. 8, 2005 : (permalink)
Thank You Blog Readers
This blog is nine months old today.
Every couple of days, I find a new person that has added me to their blog roll. I can't tell you what a nice feeling it is knowing that, not only do people read your blog, but they are willing to admit to it publicly :-)
I still worry that my "journeyman of some" lack of focus...err...breadth of topics means that each post is completely irrelevant to 90% of readers: the filmmakers tracking the progress of Alibi Phone Network likely don't care if a school dance pairing is a bijection or not.
But I think I'll still just continue to blog about things that interest me and things that I'm working on. After all, pretty much every single topic I've written on has put me in contact with some interesting person that I've learnt and am continuing to learn new things from.
So thanks for reading!
by jtauber : Created on Nov. 26, 2004 : Last modified Feb. 8, 2005 : (permalink)
Programmed Vocabulary Learning as a Travelling Salesman Problem
For a while I've been interested in how you could select the order in which vocabulary is learnt in order to maximise one's ability to read a particular corpus of sentences. Or more generally, imagine you have a set of things you want to learn and each item has prerequisites drawn from a large set with items sharing a lot of common prerequisites.
As an abstract example, imagine you want to be able to read the "sentences":
{"a b", "b a", "h a b", "d a b e c", "d a g f"}
where we assume you must first learn each "word". Further assuming that all sentences are equally valuable to learn, how would you order the learning of words to maximise what you know at any given point in time?
One approach would be to learn the prerequisites in order of their frequency. So you might learn in an order like
<a, b, d, c, e, f, g, h>
However, had we put h before d, we could have had an overall learning programme that, although equal in length by the end, enabled the learner, at the half-way mark, to understand three sentences instead of just two.
To investigate this further, I needed a way to score a particular learning programme and decided that one reasonable way to do so would be to sum, across each step, the fraction of the overall set of sentences understandable at that point.
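To make the scoring concrete, here is a minimal sketch of it applied to the contrived example. This is a reconstruction from the description above, not the original script, and the name `score` is my own.

```python
from fractions import Fraction

SENTENCES = ["a b", "b a", "h a b", "d a b e c", "d a g f"]

def score(order, sentences=SENTENCES):
    """Sum, over each step of the learning programme, the fraction of
    sentences whose words have all been learnt by that step."""
    needed = [set(s.split()) for s in sentences]
    known = set()
    total = Fraction(0)
    for word in order:
        known.add(word)
        readable = sum(1 for words in needed if words <= known)
        total += Fraction(readable, len(needed))
    return total

print(score("abdcefgh"))  # 21/5, the frequency-based ordering
print(score("abhdcefg"))  # 24/5, with h moved ahead of d
```

Moving h earlier lifts the score from 4.2 to 4.8, which matches the three-sentences-versus-two observation at the half-way mark.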
I then needed an algorithm that would find the ordering that would maximise this score.
After the quick realisation that the number of possible learning programmes was factorial in the number of words, it dawned on me that this was essentially a travelling salesman problem.
So my sister Jenni and I wrote a Python script that implements a simulated annealing approach to the TSP. We then applied it to the above contrived example. Sure enough, it found a solution that was better than a straight prerequisite frequency ordering.
I then decided to try applying it to a small extract of the Greek New Testament (which, of course, I have in electronic form, already stemmed). So I ran it on the first chapter of John's Gospel. 198 words and 51 verses. A straight frequency ordering on this text achieves a score of 48 so that was the score to beat.
My first attempt didn't even come close to that. What a disappointment! Jenni and I wondered if it was just the initial parameters to the annealing model. So we increased the number of iterations at a given temperature to 50 and lowered the final temperature to 0.001 (keeping the initial temperature at 1 and the alpha at 0.9).
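For readers who haven't seen simulated annealing before, here is a rough sketch of the general technique using the parameters just mentioned, run on the contrived example rather than John 1. It is not the script Jenni and I wrote: the swap-two-words move, the acceptance rule and the fixed random seed are all assumptions made for illustration.

```python
import math
import random

SENTENCES = ["a b", "b a", "h a b", "d a b e c", "d a g f"]

def score(order, sentences):
    """Sum over each step of the fraction of sentences readable so far."""
    needed = [set(s.split()) for s in sentences]
    known, total = set(), 0.0
    for word in order:
        known.add(word)
        total += sum(words <= known for words in needed) / len(needed)
    return total

def anneal(order, sentences, t_initial=1.0, t_final=0.001, alpha=0.9,
           iterations=50, seed=0):
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    current = list(order)
    current_score = score(current, sentences)
    best, best_score = current[:], current_score
    t = t_initial
    while t > t_final:
        for _ in range(iterations):
            # Assumed neighbour move: swap two words in the ordering.
            i, j = rng.sample(range(len(current)), 2)
            candidate = current[:]
            candidate[i], candidate[j] = candidate[j], candidate[i]
            candidate_score = score(candidate, sentences)
            delta = candidate_score - current_score
            # Always accept improvements; accept worse orderings with a
            # probability that shrinks as the temperature drops.
            if delta >= 0 or rng.random() < math.exp(delta / t):
                current, current_score = candidate, candidate_score
                if current_score > best_score:
                    best, best_score = current[:], current_score
        t *= alpha  # cool down
    return best, best_score

start = list("abdcefgh")  # the frequency ordering
best, best_score = anneal(start, SENTENCES)
print(best, round(best_score, 2))
```

Because the sketch keeps track of the best ordering seen so far, the result can never score worse than the starting order.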
Success!! It found a solution that scored 82.94. The first verse readable (after 27 words) was John 1.34. John 1.20 was then readable after just 2 more words and John 1.4 after another 7.
I decided to try different parameters. With 100 iterations per temp, a final temp of 0.0001 and a few hours, it achieved a score of 91.59 (and was still increasing at the time). This time the first verse readable was John 1.24, after only 8 words; then John 1.4 after another 9; John 1.10 after 4; and both John 1.1 and John 1.6 after another 4 and John 1.2 just 1 word after that.
Overall a very promising approach. I doubt it's anything new but it was fun discovering the approach ourselves rather than just reading about it in some textbook. The example I tested it on was vocabulary learning, but it could apply to anything that can similarly be modelled as items to learn with prerequisites drawn from a large, shared set.
The next step (besides more optimised code and even more long-running parameters) would be to try to work out how to model layered prerequisites - i.e. where prerequisites themselves have prerequisites - to any number of levels. I haven't thought yet how (or even whether) that boils down (no pun intended) to a simulated annealing solution to the TSP.
UPDATE (2005-08-03): Now see Using Simulated Annealing to Order Goal Prerequisites.
by jtauber : Created on Nov. 26, 2004 : Last modified Aug. 3, 2005 : (permalink)
Film Project Update: Final Cut Done
Okay, I didn't get to it last weekend but today I finally managed to do an edit of Alibi Phone Network that cut around the line we didn't like as well as fix a bunch of other little things.
The latter included some sound level normalization and removing a sigh noise that didn't fit because the audio was from a different take than the video and in the video you couldn't see any sighing.
There are a bunch of places where I used audio from a different take than the visuals. Mostly it's during an over-the-shoulder shot during a dialog. The clearest dialog is usually recorded from the person facing the camera, so when the person with their back to the camera is speaking, it's generally better to try to use the audio from the take when they were facing the camera themselves. Syncing is generally not too difficult because you rarely see their lips so you just have to sync to their general head movement.
Sometimes, though, you mix takes when the person is facing the camera (if the audio is much clearer on a take that is different from the one with the best performance visually) and that's what I did that resulted in the sigh. To fix it, I literally cut out one second and replaced it with a second of "silence" from another part of the take. You have to replace it with something to get the sound of the room.
The whole concept of using audio from one take with visuals from another would never have occurred to me had it not been for a remark Bryan Singer makes in the commentary to The Usual Suspects (the first director commentary I ever owned—and on video, long before I owned a DVD player). The commentary on The Usual Suspects was probably the single best lesson in filmmaking I've ever had.
So, I think the film is pretty much done. Now to send a DVD to Tom who's arranged duplication for festival submission.
by jtauber : Created on Nov. 26, 2004 : Last modified Feb. 8, 2005 : Categories filmmaking alibi_phone_network : (permalink)
Google Scholar and Typed Citations
A couple of days ago I found out about Google Scholar which enables searching of scholarly publications. What would make this even more useful is if they combined it with a more comprehensive citation index.
Thinking about citation indices got me wondering, though: what if citation indices were annotated with the relationship between the newer publication and what it was citing? You could have relationships like "quotes", "summarises", "provides further evidence for", "argues against", "answers question posed by", and so on.
The granularity of many articles might not be right for this to really work given that one might argue for one part of an article and argue against another.
But it's theoretically appealing from the point of view of the richer searches you could do.
Continuing to think aloud: I wonder if it might be more practical in blogs. People could link to this entry with annotations like "agree", "agree with additional ideas", "agree with caveats", "seen something like this already", "really dumb idea with reasons stated".
Kind of an XFN for memes.
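To show the kind of richer search I have in mind, here is a toy sketch of a typed citation index. The vocabulary is just the example relationships listed above, and the names `Citation`, `cite` and `citing` are made up for illustration.

```python
from collections import namedtuple

# Hypothetical controlled vocabulary, taken from the examples in this entry.
RELATIONS = {
    "quotes",
    "summarises",
    "provides further evidence for",
    "argues against",
    "answers question posed by",
}

Citation = namedtuple("Citation", "source relation target")

def cite(source, relation, target):
    """Record a citation, rejecting relationship types outside the vocabulary."""
    if relation not in RELATIONS:
        raise ValueError("unknown citation type: %r" % relation)
    return Citation(source, relation, target)

def citing(index, target, relation):
    """Find everything that cites `target` with the given relationship."""
    return [c.source for c in index
            if c.target == target and c.relation == relation]

index = [
    cite("paper-B", "argues against", "paper-A"),
    cite("paper-C", "provides further evidence for", "paper-A"),
    cite("paper-D", "argues against", "paper-A"),
]
print(citing(index, "paper-A", "argues against"))  # ['paper-B', 'paper-D']
```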
by jtauber : Created on Nov. 21, 2004 : Last modified Feb. 8, 2005 : (permalink)
MorphGNT v5.01 Available
Found an accent and breathing problem in both the text and lemma for ABEL, ANNA and ANNAS which is now corrected.
by jtauber : Created on Nov. 21, 2004 : Last modified Nov. 18, 2007 : Categories morphgnt : (permalink)
Film Project Update: Final Cut Looming
A few people have been asking where I'm at with the film. I'm planning on completing the final cut this weekend ready for festival submission starting in December.
There's a line in the film none of us like and I'm working on trying to cut it out. Not sure if it will work yet. If it does, you'll have to wait until the commentary on the DVD to find out what was changed :-) (unless you're one of a handful of friends and family who've already seen the film and will probably pick it right away).
by jtauber : Created on Nov. 20, 2004 : Last modified Feb. 8, 2005 : Categories filmmaking alibi_phone_network : (permalink)
Poincare Project: Further Thoughts on Topologies and Open Sets
A question raised via email by Dave Long (one of my partners-in-crime on Cleese) has prompted these thoughts.
There is an inherent circularity in thinking of topologies as collections of open sets because it is the topology that defines what an open set is to start with. There's nothing inherent in an open set that makes it "open" apart from the fact it is a member of the topology.
In sets with more structure that enable you to define openness in terms of that additional structure, openness still comes down to the choice of topology that the additional structure is implying.
For example, if you choose a distance function for a metric space, you've implicitly chosen the topology. So while the open sets can be explicitly defined by the distance function in that case, the very choice of the function assumes a particular underlying topology.
UPDATE: next post
by jtauber : Created on Nov. 20, 2004 : Last modified Feb. 8, 2005 : Categories poincare_project : 0 comments (permalink)
Conversation Categories
Don Park writes about an idea he calls "Conversation Categories". The idea is having a discussion on a particular topic with each participant writing in their own blog but categorising their entry as belonging to the particular conversation. An aggregator could then pick up all the pieces of the conversation.
It's discussion datalibre-style and something I'd love to implement in Leonardo.
It actually fits nicely with some of my previous ideas around trackbacks and categories, maybe even using wikipedia for URIs.
del.icio.us has got to fit in somewhere there too!
by jtauber : Created on Nov. 19, 2004 : Last modified Feb. 8, 2005 : (permalink)
Birthday Thoughts
Today is my birthday and I spent some of it thinking about what I've achieved over the last year and what I want to achieve in the next.
I think the two things I'm most pleased about in the last year are how the short film Alibi Phone Network turned out and how this blog is turning out.
Some of the things I'd like to see happen in the next year:
- screening Alibi at festivals
- releasing the first Nelson James EP
- successfully running a 5K race
- sitting music theory exam (and maybe practical too)
- getting back to Go
- completing Pimsleur Italian I, II and III
It will be fun to revisit this list in 365 days' time to see how I've done :-)
by jtauber : Created on Nov. 19, 2004 : Last modified Feb. 8, 2005 : (permalink)
The Road to DataLibre
Steve Mallett has paid me a huge compliment calling my site the "closest DataLibre site I've seen" although I'm somewhat embarrassed because I'm still a long way from where I want to be.
I'm still thrilled Steve likes where I'm going, though. DataLibre is one of the two main drivers (the other being REST) in how I'm implementing Leonardo. In fact, I'm considering describing Leonardo as "a RESTful DataLibre server written in Python".
I received my November copy of HBR today and there was a Forethought article entitled "I Am My Own Database" by Richard T. Watson which is pretty much talking about DataLibre. He describes what is referred to in the article as "customer-managed interaction" or CMI:
Under CMI, when a consumer buys merchandise online, he receives an electronic file that describes his purchases and that can be automatically imported into a database he's installed on his home PC. If he wants to record purchases made earlier or offline, the consumer can obtain an electronic list of common products, like books, and CDs, from the Library of Congress or commercial sources such as the Internet service Gracenote. He also registers an opinion of each purchase by using rating software incorporated into the database. The database remains in the consumer's control at all times, so if he decides that the Led Zeppelin period of his life has irretrievably passed, he can simply change his ratings of Led Zeppelin CDs he's purchased from all sources.
Finally, while writing this entry, it occurred to me that readers of the datalibre-discuss mailing list might be interested in the Forethought article. In true DataLibre fashion, I'll post this entry (along with the permalink) to the list. One feature I want to implement in Leonardo is that kind of "trackback to an email address" feature.
by jtauber : Created on Nov. 17, 2004 : Last modified Feb. 8, 2005 : (permalink)
Poincare Project: Injections, Surjections and Bijections
Imagine a school dance. There is a set of boys and a set of girls. When the music starts, each boy picks a girl to dance with.
Think of this as a mapping from a boy to a girl, or from an element in the set of boys to an element in the set of girls.
The mapping is said to be injective (or one-to-one) if each boy picks a different girl. If two boys try to dance with the same girl, the mapping isn't injective.
The mapping is said to be surjective (or onto) if no girls are left without a partner. If there is a girl not dancing, the mapping isn't surjective.
If the mapping is both injective and surjective it is said to be bijective.
If the mapping is bijective, you can immediately tell that there are the same number of boys and girls: each boy is dancing with one and only one girl and no girls are left without a boy to dance with.
The existence of a bijection can be used to demonstrate that two sets have the same number of elements or, in the case of infinite sets, have the same cardinality.
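For finite sets, the three definitions can be checked mechanically. Here is a minimal sketch, assuming the mapping is represented as a Python dict from boys to girls; the names and the helper functions are made up for illustration.

```python
def is_injective(mapping):
    """True if no two boys picked the same girl."""
    values = list(mapping.values())
    return len(values) == len(set(values))


def is_surjective(mapping, codomain):
    """True if every girl in the codomain was picked by some boy."""
    return set(codomain) <= set(mapping.values())


def is_bijective(mapping, codomain):
    """True if the mapping is both injective and surjective."""
    return is_injective(mapping) and is_surjective(mapping, codomain)


girls = {"Alice", "Beth", "Cara"}

# Each boy picks a different girl and nobody is left out: a bijection.
dance = {"Dan": "Alice", "Ed": "Beth", "Fred": "Cara"}

# Two boys pick Alice, so the mapping is not injective,
# and Cara is left without a partner, so it is not surjective either.
clash = {"Dan": "Alice", "Ed": "Alice", "Fred": "Beth"}

# Only two boys turn up: injective, but one girl must sit out.
short = {"Dan": "Alice", "Ed": "Beth"}

for name, mapping in [("dance", dance), ("clash", clash), ("short", short)]:
    print(name, is_injective(mapping), is_surjective(mapping, girls))
```

Note that the bijective case is the only one where the dict has exactly as many keys as there are girls being danced with, which is the counting argument above.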
Bijections are also very important in establishing the equivalence between two structured sets (for example between two topological spaces) as we shall see in the near future.
UPDATE: next post
by jtauber : Created on Nov. 17, 2004 : Last modified Feb. 8, 2005 : Categories poincare_project : 0 comments (permalink)
Belated Thoughts on Blogs and Wikis
When I read Tim Bray's suggestion that blogs and wikis couldn't be more different in their essential nature, I knew I wanted to say something on the matter. Well, I've finally got around to it.
Bottom line is I agree with Tim. This may surprise some readers given I've talked before about this site being a wiki/blog hybrid and I describe Leonardo as a wiki/blog server. But here's why I don't consider it a contradiction...
Firstly, purely from the perspective of implementing the content management, there can be similarities—that's what I meant when I talked about wiki/blog hybrids. But Tim was talking about essential nature, not implementation details.
There are really a number of facets to the wiki nature. Four that immediately come to mind:
- spirit of collaboration
- in-browser editing
- easy-to-learn and non-intrusive markup
- WikiWords that encapsulate an idea
I'd like to suggest that you can have varying mixes of these and, depending on which mix you have, blogs seem further apart from or closer to wikis.
The important characteristic for something like Wikipedia is the first one. While the rest are still true to varying extents, they aren't what's interesting about Wikipedia. Martin Fowler's bliki, on the other hand, clearly doesn't have the first characteristic. However, it is strongly driven by the fourth and I think it is this facet that really makes his blog wiki-like.
I call my site (and any site served by Leonardo) a "personal wiki" in that it shares characteristics two, three and, to a small degree, four. Plenty of blog software supports in-browser editing. For someone that associates in-browser content editing with wikis, that blog software is wiki-like.
Would two and three alone really be enough to be considered a wiki, though? If not, then wikis and blogs start to diverge. Perhaps people that think blogs and wikis are similar are focusing on two and three. The more you consider four an important characteristic of wikis, the less wiki-like blogs seem—unless they are written like Martin Fowler's. The first characteristic is the one that really sets wikis and blogs apart.
There is no doubt that both wikis and blogs are social. But they are a different kind of social. Blogs are conversations (at least collectively). Wikis (when focusing on characteristic one) are collaborations. Conversations and collaborations are not the same thing. Both are useful—but they are not the same thing.
by jtauber : Created on Nov. 15, 2004 : Last modified Feb. 8, 2005 : (permalink)
MorphGNT v5.00 Available
At wildly varying intensities over the last ten years, I've worked on correcting the UPenn CCAT Morphological Parsed Greek New Testament as a side-effect of larger linguistic analyses I've undertaken. The last big burst of activity was in 2002 when I resumed work on my own morphological analysis (starting with the nouns).
The last couple of weekends, I've been working on preparing a new release of the corrected MorphGNT file, the first in probably seven or so years.
Prompted by a post to the b-greek mailing list, I've now made that release. MorphGNT v5.00 is now available at MorphGNT.
by jtauber : Created on Nov. 14, 2004 : Last modified Feb. 8, 2005 : Categories morphgnt greek new_testament_greek : (permalink)
New Mic, New Song
A couple of days ago, Nelson received the Rode NT-1A microphone he had earlier ordered. Today was the first opportunity we had to record with it.
We spent most of the afternoon recording Noise, which was the song we performed on television a few weeks ago. I got Nelson to record multiple takes with differing tonal qualities and, mixed together subtly, they proved to be very effective. We managed to get some great vocal harmonies tracked too and using a gentler, breathier tone in the harmonies worked very well against the main vocal line.
In the last hour of our session, we laid down the initial tracks of a song Star in Vegas we wrote many months ago (on opposite sides of the globe) but had never recorded. Besides using the new mic, it was the first time we'd recorded Nelson's electro-acoustic guitar. We did the entire song in two single-take passes. The first was me on keyboard bass and Nelson on electro-acoustic guitar (DIed straight into the Digi002). Second was me improvising a simple piano line while Nelson sang.
A few mistakes (especially in my piano improv) but the overall recording had a magical quality that I'm too scared to try to mess with. So I'm going to have to be very careful with re-recording the problem areas and keep corrections to a minimum. Nothing beats the magic you get in a first take.
All I did to the vocals was add a simple 'verb. I hardly think it needs anything else. I doubt I'll EQ it. The Rode is a beautiful sounding mic, perfect for Nelson's voice.
by jtauber : Created on Nov. 14, 2004 : Last modified Feb. 8, 2005 : (permalink)
Dragon Optical Illusion
Doing the rounds in the blogosphere is a cool optical illusion based on the looks-convex-but-is-really-concave trick. I printed it out and made my own, as shown below:
Cheered me up after my Canon disappointment.
by jtauber : Created on Nov. 12, 2004 : Last modified Feb. 8, 2005 : (permalink)
Canon Multi-Unfunctional on OS X
I just bought a Canon MP390 multi-function printer/scanner/copier/fax and stupidly assumed (without checking) that it would work on OS X. Apparently none of Canon's multi-function units support OS X (although oddly Google reveals that Apple had a Hot Deal on them through B&H Photo at one point). All the other Canon products I've used do support OS X and it appears all their standalone printers and scanners do. I'm not sure what it is about their multi-function units.
Not sure yet whether to return it or to just use it on my Windows box hoping that Canon will soon release an OS X driver.
It's the first time since I bought my PowerBook four months ago that something I've wanted to use hasn't worked with OS X.
I'm seriously bummed.
by jtauber : Created on Nov. 12, 2004 : Last modified Feb. 8, 2005 : (permalink)
Poincare Project: Topologies and Topological Spaces
We saw in Open Sets that open subsets of a set X always follow the rules:
- the union of any collection of open sets in X is also an open set in X;
- the intersection of any finite collection of open sets in X is also an open set in X;
- the empty set is open;
- the set X itself is open.
If you pick a collection of subsets of X that follows the four rules above, that collection is said to be a topology on X. Furthermore, a set along with a choice of topology on that set is called a topological space.
The use of the word choice is an important one. A given set will (unless it is a singleton) allow multiple valid topologies. It is the choice of topology that gives a topological space its characteristics rather than the set itself.
Consider a simple set {a, b}. The smallest possible topology would be:
{ {}, {a, b} }
In other words, the empty set and the set itself are the only two open sets. This meets the definition of a topology and, in fact, for any set will be the smallest possible topology.
Another valid topology on {a, b} would be:
{ {}, {a}, {b}, {a, b} }
In other words, all subsets are open. This also meets the definition of a topology. For any set the topology which defines all subsets to be open will be the largest possible topology.
There are two other possible topologies that can be defined on the set {a, b}:
{ {}, {a}, {a, b} }
and
{ {}, {b}, {a, b} }
Step through the four rules to convince yourself that these are valid topologies for {a, b}.
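For a small finite set, stepping through the rules can also be done by brute force. The following is a rough sketch rather than anything definitive; the function names are my own, and it relies on the fact that for a finite collection, being closed under pairwise unions and intersections is enough to be closed under all of them.

```python
from itertools import chain, combinations


def powerset(items):
    """All subsets of items, as frozensets."""
    items = list(items)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]


def is_topology(space, collection):
    """Check the four rules for a collection of subsets of a finite set."""
    space = frozenset(space)
    opens = {frozenset(s) for s in collection}
    # Every member must actually be a subset of the space.
    if any(not s <= space for s in opens):
        return False
    # The empty set and the whole set must both be open.
    if frozenset() not in opens or space not in opens:
        return False
    # Pairwise closure suffices because the collection is finite.
    return all(a | b in opens and a & b in opens for a in opens for b in opens)


def all_topologies(space):
    """Try every collection of subsets; only sensible for tiny sets."""
    subsets = powerset(space)
    return [set(c) for c in powerset(subsets) if is_topology(space, c)]


for topology in all_topologies({"a", "b"}):
    print(sorted(sorted(s) for s in topology))
```

Running this on {a, b} lists exactly the four topologies described above. The number of candidate collections doubles with every extra subset, so this approach stops being practical after three or four elements.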
Note that, although this example has involved a small, finite set, everything here applies to infinite sets too. It is possible to define, for example, different topologies on the set of real numbers. One such topology takes the open sets to be the unions of open intervals. This is by far the most intuitive topology on the reals but by no means the only one.
UPDATE: next post
by jtauber : Created on Nov. 11, 2004 : Last modified Aug. 10, 2007 : Categories poincare_project : 2 comments (permalink)
Delicious Library
Not quite so highly anticipated as Halo 2, but there's been a fair amount of hype around the release of the book/CD/DVD cataloging software for OS X, Delicious Library from Delicious Monster. Yesterday, I downloaded a copy.
It certainly looks cool, presenting your library on a graphic of shelves using cover photos downloaded from Amazon.com. Bar codes can be scanned using an iSight, but I already had a bar code scanner so I can use that. I had a number of text files listing the ISBNs of all my existing books, one per line, and I was able to import them; Delicious Library then went off and downloaded all the catalog information (including front cover photo) from Amazon. It even makes use of Amazon to list similar items when you select a book.
I had already put together a catalog of my own using Mark Pilgrim's PyAmazon library but Delicious Library just looks nicer than anything I could have built. The only issues I've found so far:
- it's a little sluggish with my 1200+ books
- it would be nice if there was a feature for identifying duplicates for easy removal
- deleting a book causes the UI to go back to the start of the shelves (was very annoying during my manual duplicate removal)
But all-in-all, it's worth checking out if you run OS X, even if just to see how cool it looks showing you the covers of all your books.
by jtauber : Created on Nov. 9, 2004 : Last modified Feb. 8, 2005 : (permalink)
The Art of the Dust Jacket
I recently bought a copy of Guy Kawasaki's The Art of the Start. Great book so far, but one of the first things I noticed was the comment on the back inside flap:
The front jacket was created by Adam Tucker, winner of a design contest sponsored by Guy Kawasaki. Please take off the book jacket to see some of the other entries from his fans on the reverse side.
That's right. The inside of the dust jacket features 70-odd submissions for cover designs. Each of them is completely different. Makes you realise not only how much variation there can be in a book cover but also just how different one's perception of a book can be depending on the cover.
They say you shouldn't judge a book by its cover. But have 70 alternative covers suggested to you and you pretty soon decide which make you want to buy the book and which don't.
For what it's worth, I think Guy picked the right cover in the end.
by jtauber : Created on Nov. 5, 2004 : Last modified Feb. 8, 2005 : Categories books :