James Tauber's Blog 2004/04
Google Auction in Python
After reading the possible auction-based share allocation algorithm in Google's S-1 filing, I thought I'd try to implement it in Python.
The result is available at http://jtauber.com/2004/04/30/google_auction.py.
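Here is a minimal sketch of a uniform-price ("Dutch") auction of the general kind the S-1 outlines. Bids are ranked by price, and the clearing price is the highest price at which cumulative demand covers all the shares offered. Every successful bidder pays that one price, and bids exactly at the clearing price are cut back pro rata. This is a simplified reading under stated assumptions, not the code in google_auction.py. It assumes one bid per bidder and ignores refinements such as an issuer choosing to price below the clearing price.

```python
def clear_auction(bids, shares_offered):
    """Uniform-price auction sketch.

    bids: list of (bidder, price, quantity) tuples, one bid per bidder.
    Returns (clearing_price, allocations), where allocations maps
    bidder -> shares; every winning bidder pays clearing_price.
    """
    if not bids:
        return None, {}
    ordered = sorted(bids, key=lambda b: b[1], reverse=True)

    # Walk down from the highest bid until cumulative demand covers the offer.
    demand = 0
    clearing_price = ordered[-1][1]  # undersubscribed: the lowest bid clears
    for _, price, qty in ordered:
        demand += qty
        if demand >= shares_offered:
            clearing_price = price
            break

    above = [b for b in ordered if b[1] > clearing_price]
    at = [b for b in ordered if b[1] == clearing_price]
    # Bids strictly above the clearing price are filled in full.
    allocations = {bidder: qty for bidder, _, qty in above}

    # Bids exactly at the clearing price share what is left, pro rata,
    # rounded down (odd leftover shares are simply not allocated here).
    remaining = shares_offered - sum(qty for _, _, qty in above)
    at_total = sum(qty for _, _, qty in at)
    for bidder, _, qty in at:
        if at_total <= remaining:
            allocations[bidder] = qty
        else:
            allocations[bidder] = qty * remaining // at_total
    return clearing_price, allocations
```

With bids of 50 shares at 100, 30 at 95, 40 at 90 and 100 at 85 for 100 shares on offer, the clearing price lands at 90. The two higher bidders are filled completely and the bidder at 90 receives the remaining 20 shares.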
by jtauber : Created on April 30, 2004 : Last modified Feb. 8, 2005 : (permalink)
Blogs, Annotations, Comments and Trackbacks
Danny Ayers makes the link between blogging and annotation. I've been thinking about this sort of thing from a different (although related) viewpoint.
Lately I've been thinking about implementing comments and/or trackback in Leonardo. I personally think they are essentially the same thing and that there is an obvious relationship with web page annotation.
The way trackback works is you ping a blog with information about a reference that has been made to an entry in that blog. The information can include an excerpt of what was said.
The trackback implementations I've seen tend to give a trackback URI for an entry that is different from the URI for the entry itself. A more RESTful approach (and one I plan to implement in Leonardo) is to have the trackback URI be the URI of the entry. So you POST to the blog entry to trackback.
I had already considered POSTing to the blog entry as the mechanism for comments and that is when it first struck me that comments and trackbacks are really the same thing. The fields that you POST would be slightly different, but the mechanism should be the same.
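The sketch below shows the unification, assuming the entry's handler receives the raw application/x-www-form-urlencoded body of the POST. The trackback fields (url, title, excerpt, blog_name) follow the TrackBack ping format, in which only url is required. The comment fields author and body, and the function name classify_post, are hypothetical and not Leonardo's actual API. Whatever arrives, the result is the same kind of record attached to the entry URI.

```python
from urllib.parse import parse_qs

# Fields of a TrackBack ping; "url" is the only required one.
TRACKBACK_FIELDS = ("url", "title", "excerpt", "blog_name")
# Hypothetical comment form fields, for illustration only.
COMMENT_FIELDS = ("author", "body")

def classify_post(entry_uri, form_body):
    """Turn a POST to an entry URI into a single annotation record.

    Comments and trackbacks become the same kind of record; only the
    fields that are kept differ.
    """
    fields = {name: values[0] for name, values in parse_qs(form_body).items()}
    if "url" in fields:
        kind, names = "trackback", TRACKBACK_FIELDS
    elif "body" in fields:
        kind, names = "comment", COMMENT_FIELDS
    else:
        raise ValueError("POST is neither a trackback nor a comment")
    record = {"target": entry_uri, "kind": kind}
    record.update({name: fields[name] for name in names if name in fields})
    return record
```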
Which leads me to web page (or, more generally, resource) annotation. There is no reason why the resource you post to should be restricted to being a blog entry. In fact, in Leonardo, there is nothing special about a blog entry—implementing trackback/comments for blog entries would enable the same capability for any page on my site.
Finally, in all these cases the mechanism involves POSTing the comment/trackback/annotation to the source itself but there is no reason why such information couldn't also be detached and expressed in RDF for the purposes of annotation servers, Technorati-style sites, etc.
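The following sketch shows the detached form by serializing one comment or trackback as N-Triples, a line-based RDF syntax. The vocabulary namespace http://example.org/annotation# and its property names (annotates, source, excerpt) are placeholders assumed for illustration. A real annotation server would use an agreed vocabulary.

```python
# Placeholder vocabulary, assumed for illustration only.
NS = "http://example.org/annotation#"

def escape_literal(text):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))

def to_ntriples(annotation_uri, target_uri, source_uri, excerpt):
    """Express one comment or trackback as standalone RDF statements."""
    subject = f"<{annotation_uri}>"
    return "\n".join([
        f"{subject} <{NS}annotates> <{target_uri}> .",
        f"{subject} <{NS}source> <{source_uri}> .",
        f'{subject} <{NS}excerpt> "{escape_literal(excerpt)}" .',
    ])
```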
I'd like to see a spec that supports this approach. I don't think it should be Atom's initial goal but I would like Atom to at least be compatible with the kind of unification I'm describing.
UPDATE (2004/05/06): see Joe Gregorio's response.
by jtauber : Created on April 29, 2004 : Last modified Feb. 8, 2005 : (permalink)
Email Stats
For the last week, I've kept all incoming mail, including spam, to get a rough breakdown of how much email I get and what categories it falls into.
Here are the preliminary results:
- 70% is spam/virus/worm (837 per day - most of it caught by the spam filter)
- 17.5% is mailing-lists (211 per day - much of which I end up not reading)
- 10% is work-related alerts (CVS, bugzilla, build, etc - probably a quiet week)
- 1% is opt-in newsletters, announcements, alerts, etc
leaving the remaining 1.5% (124 messages over the week, roughly 18 a day), which is the core email I likely need to act on or reply to, and which I will file in my "Keep" folder (currently close to 10,000 emails) once it's no longer actionable.
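The figures can be cross-checked with some quick arithmetic. If the spam figure is taken as the anchor, 837 messages at 70% implies roughly 1,196 messages a day in total. The 1.5% remainder then comes to about 18 a day, or about 126 over seven days. The 124 figure is therefore consistent with a count for the whole week rather than per day.

```python
# Daily count and shares taken from the breakdown above.
spam_per_day = 837
spam_share = 0.70
remaining_share = 0.015

total_per_day = spam_per_day / spam_share       # about 1,196 messages a day
core_per_day = total_per_day * remaining_share  # about 18 messages a day
core_per_week = core_per_day * 7                # about 126 messages a week

# Mailing lists: 211 a day should come out near the quoted 17.5%.
mailing_list_share = 211 / total_per_day

print(round(total_per_day), round(core_per_day), round(core_per_week))
```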
by jtauber : Created on April 28, 2004 : Last modified Feb. 8, 2005 : (permalink)
New Amazon Purchases
At some stage I might add a "currently reading" box on this site. Until then, here's what just arrived from Amazon:
- The Elements of Typographic Style by Robert Bringhurst — recommended by Bill Hill
- Basic Category Theory for Computer Scientists by Benjamin C. Pierce — to complement my existing books on this subject
- Ready for Anything: 52 Productivity Principles for Work & Life by David Allen — Collection of columns from my favourite productivity guru
- The Firm, The Market, and the Law by R. H. Coase — I used to quote Coase's 1937 masterpiece The Nature of the Firm (included in this book) in my keynotes on Web Services
- Naked Objects by Richard Pawson and Robert Matthews — see Naked Objects
- Dungeons and Dreamers: The Rise of Computer Game Culture From Geek to Chic by Brad King and John Borland — heard about this during the Richard Garriott and Warren Spector Panel that the first author moderated. Someone should make a television series based on this book.
by jtauber : Created on April 27, 2004 : Last modified Feb. 8, 2005 : Categories books : (permalink)
Bubblets After Bray
Following Tim Bray's lead in linking to the Technorati Cosmos for each post, I've done the same. Not that I get nearly the incoming links that Tim does (two orders of magnitude fewer, in fact).
Simon Phipps makes some excellent comments on the limitations of this approach and has the beginnings of a nice taxonomy of comments.
UPDATE (2004/05/05): Decided to remove the bubblets - I just don't get enough incoming links yet :-). It was a good exercise in making Leonardo a little more extensible, though, and other users of Leonardo can easily add them back in.
by jtauber : Created on April 23, 2004 : Last modified Feb. 8, 2005 : (permalink)
Introducing Leonardo
I've had a few requests for the Python code this site runs on so, over the weekend, I cleaned it up a little to get it ready for release. I then had to come up with a name. I'm over my "must include 'Py' in the name of every Python project" phase and wanted something that evoked the notion of a technologist's or scientist's notebook. It didn't take me long to come up with "Leonardo", and there was no sign of a name clash on either Freshmeat or SourceForge. So, "Leonardo" it is.
I'm not sure it's ready for prime-time (although it's been running jtauber.com for almost a year), but if you have a lot of patience, you are more than welcome to give it a try and I'd love your feedback.
Code is available from the Leonardo page. I'll incrementally put documentation there too - likely in response to user questions as they come in.
by jtauber : Created on April 21, 2004 : Last modified Feb. 8, 2005 : Categories leonardo : (permalink)
Elements of Linear Spaces
Since Tim Bray kindly announced my entry into the blogosphere, I've found that questions I pose here get answered by wonderful people I've never met before.
It's worked with technology questions so let's try it with a question of mathematical terminology that has been bothering me recently.
The question is simply: If one wishes to refer to vector spaces by the alternative "linear spaces", what should elements of that structure be referred to if not "vectors"?
I want to avoid using the term "vector" for generic elements of a linear space because, when talking about things like one-forms and bivectors, I'd like to use the term "vector" in its narrower sense.
Most texts I've looked at give "linear space" as an alternative name for "vector space" but none provide an alternative to "vector".
Any ideas?
by jtauber : Created on April 20, 2004 : Last modified Feb. 8, 2005 : (permalink)
More Feeds Wanted
One of the first things I noticed when I started using a news aggregator is how much it changed my web surfing routine. I used to get up in the morning and open my bookmarks for daily reading: news sites like Slashdot and blogs like ongoing. With an aggregator I read a lot more but it's a lot more efficient because I don't need to visit a site that hasn't been updated. I've greatly reduced the number of "information inboxes" I need to routinely check. That's a good thing and David Allen agrees with me. I even read Dilbert via a feed.
There are still a small number of sites that are part of my old web surfing routine. I wish these had RSS/Atom feeds. One such site is my bank. I make a point of logging into online banking regularly to check that nothing funny is going on. Just last week I had the first case of fraudulent use of my credit card. I noticed a bunch of purchases being made in North Hollywood (where I was six weeks ago). It's a real pain to cancel a card and get a new one. But I digress. My point is that I'd like to get a feed of my transactions rather than having to explicitly go to the bank's site. (I remember financial aggregators like OnMoney.com were big a few years ago. I wonder if they'd offer an RSS feed if they were around today).
Another feed I'd like to see: source control check-in logs. Someone must have done a CVS to RSS bridge.
Event Log Monitors with RSS has been done.
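A CVS-to-RSS bridge does not need much code. Here is a minimal sketch that uses Python's standard xml.etree.ElementTree to turn check-in records into an RSS 2.0 document. The record layout (author, date, message, files) and the example.com repository link are assumptions, and the sketch is not based on any existing bridge. Extracting the records from a CVS log is left out.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime

def checkins_to_rss(repo_name, repo_link, checkins):
    """Build an RSS 2.0 document (as a string) from check-in records.

    Each record is a dict with the assumed keys 'author', 'date'
    (a timezone-aware datetime), 'message' and 'files'.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"{repo_name} check-ins"
    ET.SubElement(channel, "link").text = repo_link
    ET.SubElement(channel, "description").text = f"Recent commits to {repo_name}"
    # Newest check-ins first, as an aggregator would expect.
    for rec in sorted(checkins, key=lambda r: r["date"], reverse=True):
        item = ET.SubElement(channel, "item")
        summary = rec["message"].splitlines()[0] if rec["message"] else "(no message)"
        ET.SubElement(item, "title").text = f"{rec['author']}: {summary}"
        ET.SubElement(item, "description").text = (
            rec["message"] + "\n\nFiles: " + ", ".join(rec["files"]))
        # RSS 2.0 dates use the RFC 822 style, e.g. "Fri, 16 Apr 2004 09:30:00 +0000".
        ET.SubElement(item, "pubDate").text = format_datetime(rec["date"])
    return ET.tostring(rss, encoding="unicode")

if __name__ == "__main__":
    print(checkins_to_rss("leonardo", "http://example.com/cvs/leonardo", [
        {"author": "jtauber",
         "date": datetime(2004, 4, 16, 9, 30, tzinfo=timezone.utc),
         "message": "Fix feed dates", "files": ["atom.py"]},
    ]))
```

The same shape would suit a feed of bank transactions if the check-in fields were replaced with date, payee and amount.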
UPDATE (2004/04/17): Tim Bray actually gives the credit card transaction example in his eWeek interview
UPDATE (2004/04/23): Mel Riffe pointed me to Fisheye from the same people that make the excellent coverage tool Clover. Think of Fisheye as ViewCVS on steroids. Looks very cool - and it provides an RSS feed!
UPDATE (2004/04/23): Aaron Straup Cope pointed me to cvs2rss.
UPDATE (2004/04/23): Norm Walsh has written cvslog2atom.
by jtauber : Created on April 16, 2004 : Last modified Feb. 8, 2005 : (permalink)
Amazon and Google
Word has just gotten out in the blogosphere about Amazon.com's A9 search engine. This, and my recent rediscovery of Alexa, has got me thinking a lot about the similarities and differences between Amazon.com and Google and also about synergies, both inter-company and intra-company.
Random Thoughts:
- The A9 toolbar has a "diary" feature for making notes about pages. Is this the beginning of Amazon.com's blogging story?
- Will GMail somehow be tied to Orkut at some stage?
- What if Amazon.com expanded Amazon Friends to be a full-blown social network?
- Why doesn't Amazon.com do a music version of IMDb?
Amazon and Google are two of my favourite companies. It will be fascinating to see what happens over the next few years.
UPDATE (2004/04/16): Found (via Scoble) an interesting Amazon what-if: Amazoning the News.
by jtauber : Created on April 15, 2004 : Last modified Feb. 8, 2005 : (permalink)
Digital Lifestyle Aggregation
If it isn't obvious already, I'm deeply interested in the convergence between email, IM, blogs, calendars, contact lists, music playlists, photo collections, etc. It was the original driver for Redfoot and, in fact, drove a lot of my passion for SGML (and then XML) in the mid-nineties.
I've just come across Marc Canter's use of the term "Digital Lifestyle Aggregation". I like it.
I still think RDF or something RDF-like is core: at least the concepts of URIs, out-of-band relationships and relationship types as first class objects.
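Here is a small sketch of those three ideas. Every term is a URI, and relationships are stored as triples outside the resources they connect. A relationship type is itself a resource that can be described. The example.org URIs and the "symmetric" flag are invented for illustration; only the rdfs:label URI comes from the RDF Schema vocabulary. For simplicity, this sketch stores URIs and literals alike as plain strings.

```python
from collections import namedtuple

Triple = namedtuple("Triple", "subject predicate object")

class Graph:
    """A tiny in-memory triple store; every term is a plain string."""

    def __init__(self):
        self.triples = set()

    def add(self, s, p, o):
        self.triples.add(Triple(s, p, o))

    def objects(self, s, p):
        return {t.object for t in self.triples if t.subject == s and t.predicate == p}

EX = "http://example.org/"  # hypothetical URIs
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

g = Graph()
# A relationship between two resources, stored out-of-band of both.
g.add(EX + "james", EX + "sibling", EX + "jenni")
# The relationship type is a first-class resource that can be described.
g.add(EX + "sibling", RDFS_LABEL, "is a sibling of")
g.add(EX + "sibling", EX + "symmetric", "true")

def related(graph, s, p):
    """Objects of (s, p), plus inverse links when p is declared symmetric."""
    found = graph.objects(s, p)
    if "true" in graph.objects(p, EX + "symmetric"):
        found |= {t.subject for t in graph.triples if t.predicate == p and t.object == s}
    return found
```

Because "sibling" is itself described in the graph, related() can follow it in both directions without either person's record mentioning the other.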
by jtauber : Created on April 15, 2004 : Last modified Feb. 8, 2005 : (permalink)
IM in the Matrix
According to this Gamespot story, communication in the forthcoming MMORPG Matrix Online will use AIM. This means that people in the game will be able to communicate with people outside the game and vice-versa.
What makes this particularly appealing, I think, is that unlike other MMOGs, Matrix Online isn't 100% escapism. When you are in the Matrix, you are still the real you. You aren't playing a character completely separate from the real you. So it's entirely plausible within the game world that some friend outside could IM you and you could reply "I'm in the Matrix at the moment. Wanna join me or should I come out?"
Very cool concept.
by jtauber : Created on April 13, 2004 : Last modified Feb. 8, 2005 : (permalink)
Tree-based Instant Messaging
My sister Jenni and I have a lot of IM sessions with very rich structure: lots of tangents and a real need to maintain a stack so as not to miss anything. Even when I'm IMing with other people, there are frequently multiple threads going on at a time and it is sometimes difficult to follow which response goes with which thread.
For a while, Jenni and I have been talking about writing a tree-based instant messaging client - a real-time threaded discussion client.
This weekend, we were able to come up with a usable prototype using Python, wxPython and Jabber.
Stay tuned for more information as development continues.
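The core data structure of a tree-based client is simple: each message records which earlier message it replies to, and the conversation is displayed as a tree. The sketch below is plain Python with hypothetical class names. It is not the wxPython/Jabber prototype and leaves out networking and UI entirely.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Message:
    msg_id: int
    sender: str
    text: str
    parent: Optional[int] = None  # id of the message being replied to
    children: list = field(default_factory=list)

class Conversation:
    """Threaded conversation: each reply hangs off the message it answers."""

    def __init__(self):
        self.messages = {}
        self.roots = []
        self._next_id = 1

    def post(self, sender, text, reply_to=None):
        msg = Message(self._next_id, sender, text, reply_to)
        self._next_id += 1
        self.messages[msg.msg_id] = msg
        if reply_to is None:
            self.roots.append(msg)  # a new top-level thread
        else:
            self.messages[reply_to].children.append(msg)
        return msg.msg_id

    def render(self):
        """Depth-first rendering, indenting each reply under its parent."""
        lines = []

        def walk(msg, depth):
            lines.append("  " * depth + f"{msg.sender}: {msg.text}")
            for child in msg.children:
                walk(child, depth + 1)

        for root in self.roots:
            walk(root, 0)
        return "\n".join(lines)
```

Two interleaved threads stay readable because each reply is shown under the message it answers, not in arrival order.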
UPDATE (2004/04/16): Michael Lawley has just told me about http://tickertape.org where "there's a whole family of IM clients supporting threaded discussion."
by jtauber : Created on April 11, 2004 : Last modified Feb. 8, 2005 : (permalink)
USENET and Blog Reading Strategies
I was a fairly active USENET reader in the early-to-mid nineties. For a while I used tin as my news reader, but once the number of groups I was regularly reading reached a critical mass, I found the approach of nn more suitable. In the former, I would navigate to a particular newsgroup and, if I saw any articles of interest, I'd navigate into each, one-by-one. In the latter, I'd scan the list of articles across all newsgroups, tag those that looked interesting and only then start to read them.
Recently I've heard seasoned blog readers talking about their blog reading strategies in very similar terms to the way things were done with nn. I would say my current blog reading is more tin-like but I am starting to reach that point where I may have to switch to an nn-like reading strategy.
by jtauber : Created on April 10, 2004 : Last modified Feb. 8, 2005 : (permalink)
Bayesian Classification for Blog Reading Prioritization
Mouthful of a title, I know.
During my reading-USENET-via-nn days, I envisaged a news reader that would learn from what I selected and what I didn't select to read and would sort the articles according to how likely it thought I would want to read them.
I didn't know about Bayesian Classification at the time. Now that I do, it seems the perfect technique to use.
I wonder if a similar technique would be useful in prioritizing the reading of blog entries. Admittedly, the signal-to-noise ratio on the blogs I read is considerably higher than on USENET, but the quantity of blogs I now read makes it potentially useful.
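A multinomial naive Bayes classifier could, for example, be trained on entries that were read versus skipped and then used to sort new entries by their estimated probability of being read. This is a minimal sketch with a deliberately crude tokenizer and add-one (Laplace) smoothing. The class and method names are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Crude tokenizer: lowercase words and numbers."""
    return re.findall(r"[a-z0-9']+", text.lower())

class ReadPredictor:
    """Multinomial naive Bayes over the words of entries read vs. skipped."""

    def __init__(self):
        self.words = {True: Counter(), False: Counter()}
        self.docs = {True: 0, False: 0}

    def train(self, text, was_read):
        self.words[was_read].update(tokenize(text))
        self.docs[was_read] += 1

    def prob_read(self, text):
        vocab = set(self.words[True]) | set(self.words[False])
        total_docs = self.docs[True] + self.docs[False]
        log_score = {}
        for label in (True, False):
            # Add-one smoothing on the prior and on each word likelihood;
            # the extra 1 in the denominator leaves room for unseen words.
            log_p = math.log((self.docs[label] + 1) / (total_docs + 2))
            n_words = sum(self.words[label].values())
            for w in tokenize(text):
                log_p += math.log((self.words[label][w] + 1) / (n_words + len(vocab) + 1))
            log_score[label] = log_p
        # Convert the two log scores back into P(read), avoiding underflow.
        top = max(log_score.values())
        weight = {k: math.exp(v - top) for k, v in log_score.items()}
        return weight[True] / (weight[True] + weight[False])

    def prioritize(self, entries):
        """Most-likely-to-be-read entries first."""
        return sorted(entries, key=self.prob_read, reverse=True)
```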
by jtauber : Created on April 10, 2004 : Last modified March 28, 2005 : (permalink)
Wiki/Blog Hybrid
As much as possible I've tried to make this site a hybrid of blog and (privately-editable) wiki. In fact, blog entries are just wiki pages whose location in the URL space of my site means they get picked up by both my Atom feed generator and the "by day", "by month", "by year" and "all" blog entry listings. The site's current API (as RESTful as the lack of PUT support in browsers allows me to be) doesn't distinguish a wiki page from a blog entry.
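The sketch below illustrates the "location decides" idea, where nothing marks a page as a blog entry except its path. It assumes entries live at /YYYY/MM/DD/name, the layout this site's own URLs use (for example /2004/04/30/google_auction.py). The helper names and sample paths are hypothetical rather than Leonardo's.

```python
import re
from collections import defaultdict

# Assumed layout: blog entries live at /YYYY/MM/DD/name; other pages are plain wiki pages.
ENTRY_PATH = re.compile(r"^/(\d{4})/(\d{2})/(\d{2})/[^/]+$")

def blog_entries(page_paths):
    """Yield (year, month, day, path) for pages that sit in blog-entry URL space."""
    for path in page_paths:
        m = ENTRY_PATH.match(path)
        if m:
            year, month, day = m.groups()
            yield year, month, day, path

def listings(page_paths):
    """Group entry pages into 'by day', 'by month' and 'by year' listings."""
    by_day, by_month, by_year = defaultdict(list), defaultdict(list), defaultdict(list)
    for year, month, day, path in blog_entries(page_paths):
        by_day[(year, month, day)].append(path)
        by_month[(year, month)].append(path)
        by_year[year].append(path)
    return by_day, by_month, by_year
```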
Since a blog entry is a wiki page in my homegrown system, I started wondering to what extent a wiki page is blog-entry-like. This in turn gave me the idea of making an Atom feed consisting of an entry for each page on my site. This "site map" feed isn't a change log; it summarizes the actual site itself.
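A site-map feed can be sketched by emitting one entry per page, keyed by the page's own URI, with the page's last-modified time as the entry's updated time. This sketch targets the later Atom 1.0 format (RFC 4287) rather than the draft Atom of the time. The page tuple layout and example.com URIs are assumptions.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)  # serialize Atom as the default namespace

def site_map_feed(site_uri, author, pages):
    """One Atom entry per page on the site, not per change.

    pages: non-empty iterable of (path, title, last_modified), where
    last_modified is an RFC 3339 UTC string like '2004-04-09T12:00:00Z'.
    """
    def el(parent, tag, text=None, **attrs):
        node = ET.SubElement(parent, f"{{{ATOM}}}{tag}", attrs)
        if text is not None:
            node.text = text
        return node

    pages = list(pages)
    base = site_uri.rstrip("/")
    feed = ET.Element(f"{{{ATOM}}}feed")
    el(feed, "id", site_uri)
    el(feed, "title", "Site map")
    # Same-format UTC timestamps compare correctly as strings.
    el(feed, "updated", max(p[2] for p in pages))
    el(el(feed, "author"), "name", author)
    for path, title, modified in sorted(pages):
        entry = el(feed, "entry")
        el(entry, "id", base + path)
        el(entry, "title", title)
        el(entry, "updated", modified)
        el(entry, "link", href=base + path)
    return ET.tostring(feed, encoding="unicode")
```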
by jtauber : Created on April 9, 2004 : Last modified Feb. 8, 2005 : (permalink)
Channel 9 and Bill Hill
About a month ago Scoble blogged some notes from an interview with Bill Hill, the type guru at Microsoft responsible for, amongst other things, ClearType.
Yesterday, I checked out Microsoft's new Channel 9. I've found two great video interviews with Bill Hill so far. They are definitely worth listening to. He's like a geek Billy Connolly without the swearing.
by jtauber : Created on April 7, 2004 : Last modified Feb. 8, 2005 : (permalink)
Eclipse is the next Emacs
November last year, on the FoRK mailing list, I declared that "Eclipse is the new Emacs" and predicted that "by 2005, there will be people that never leave Eclipse to do their work."
Since then I've been toying with the idea of a full-blown PIM based on Eclipse: email, RSS aggregation, calendar, todo and maybe even instant messaging. The nature of Eclipse is such that these need not all come from the same developers.
Yesterday, I discovered (via this presentation) that the Haystack (RDF-based PIM) project at MIT is moving to Eclipse.
Today I found a wiki-style note-taker (like VoodooPad, I guess) that runs on Eclipse.
I think my prediction is definitely shaping up to come true and it's quite possible I'll be one of those people.
UPDATE (2004/05/06): More evidence: Buying from Amazon within Eclipse
by jtauber : Created on April 5, 2004 : Last modified Feb. 8, 2005 : (permalink)
Amazon.com - Your Store
If "James's Store" is really my store - why can't I just claim everything in it?
by jtauber : Created on April 3, 2004 : Last modified Feb. 8, 2005 : (permalink)
libferris
libferris looks like a very nice project along the lines of Plan X.
by jtauber : Created on April 3, 2004 : Last modified Feb. 8, 2005 : (permalink)
Ant and Little Languages
James Duncan Davidson has a nice article on his choice to use XML for Ant scripts:
http://x180.net/Articles/Java/AntAndXML.html
His comment that "I never intended for the file format to become a scripting language" and "If I knew then what I knew now, I would have tried using a real scripting language" reinforced the argument I've made with friends and colleagues for years that you almost always end up needing a full-blown language in the end so you are much better off just starting with something like Python rather than inventing a domain-specific language.
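To make the argument concrete, here is a sketch of Ant-style targets with dependencies written directly in Python. The target decorator and run function are hypothetical and do not belong to any existing build tool's API. The point is that loops, conditionals and functions come for free instead of being bolted onto a file format later. There is no cycle detection.

```python
TARGETS = {}

def target(*depends):
    """Register a function as a build target with Ant-style dependencies."""
    def register(func):
        TARGETS[func.__name__] = (depends, func)
        return func
    return register

def run(name, done=None, log=None):
    """Run a target after its dependencies, running each target at most once."""
    done = set() if done is None else done
    log = [] if log is None else log
    if name in done:
        return log
    depends, func = TARGETS[name]
    for dep in depends:
        run(dep, done, log)
    func(log)
    done.add(name)
    return log

@target()
def init(log):
    log.append("init")

@target("init")
def build(log):
    # An ordinary loop: no special looping construct has to be invented.
    for module in ["core", "feeds", "wiki"]:
        log.append(f"compile {module}")

@target("init", "build")
def dist(log):
    log.append("dist")
```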
I've seen a similar argument made (and have made it myself) as a counter-counter argument against Tcl: Anti-Tcl says "Tcl isn't a full-blown language". Pro-Tcl says "Tcl isn't intended to be; it's for the little jobs that don't need a full-blown language". Anti-Tcl counters with "but what starts out as a little job almost always grows to a bigger one". I've used that argument against Perl too.
I'm wondering if the notion "you almost always end up needing a full-blown language in the end so you are much better off just starting with an existing full-blown language rather than using a little language or inventing a domain-specific one" has a name? Has someone claimed it as their Law yet?
by jtauber : Created on April 2, 2004 : Last modified Feb. 8, 2005 : (permalink)