James Tauber's Blog 2004/12/21




Alexa Does DataLibre Right (Almost)

I was fiddling around with Amazon.com's Alexa and discovered they provide a very DataLibre-style way of updating one's site information:

To update your contact info, you may place an info.txt file containing your contact info in the root of your site for Alexa to fetch.

Right-click this link: info.txt. And save it to your computer. Copy the info.txt file from your computer to the root of your site. Verify that the info.txt file is there with your browser. (Go to http://www.jtauber.com/info.txt.) Once you have verified that the file is there, tell us to fetch it by clicking this link: Go Fetch

Well done, Amazon! Now if only Bloglines did it with OPML, LinkedIn with FOAF, Freshmeat with DOAP, and so on.

UPDATE (2004-12-22): Gary Fleming thinks info.txt is a bad idea. I agree with him. While I still like the DataLibre aspect of what Alexa does, Gary's entry persuaded me that requiring a fixed path "/info.txt" is the wrong way to do it. I should have been able to give Alexa my own URI. DataLibre means owning your own URI space too. Thanks Gary for making me realise that!
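To make the difference concrete, here is a minimal Python sketch of the two conventions. It is not how Alexa works: the function names, the example path "/about/contact.txt" and the same-host check are all assumptions made for illustration.

```python
from urllib.parse import urljoin, urlparse


def fixed_path_url(site_root: str) -> str:
    """Fixed-path convention: the service dictates where the file lives."""
    return urljoin(site_root, "/info.txt")


def owner_chosen_url(site_root: str, declared_uri: str) -> str:
    """Owner-chosen convention: the site owner says where the file lives.

    Assumption: the service only accepts a URI on the site's own host,
    as a crude stand-in for proving ownership.
    """
    resolved = urljoin(site_root, declared_uri)
    if urlparse(resolved).netloc != urlparse(site_root).netloc:
        raise ValueError("declared URI must be on the site's own host")
    return resolved


fixed = fixed_path_url("http://www.jtauber.com/")
chosen = owner_chosen_url("http://www.jtauber.com/", "/about/contact.txt")
```

In the first function the service owns a piece of my URI space; in the second I keep control of it and merely tell the service where to look.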

by : Created on Dec. 21, 2004 : Last modified Feb. 8, 2005 : (permalink)


Film Project Update: Ten More Festivals

Just submitted Alibi Phone Network to ten more festivals: Phoenix FF, Palm Beach IFF, Newport Beach FF, Atlanta FF, Beverly Hills FF, San Fernando Valley IFF, Independent FF of Boston, Malibu IFF, Seattle IFF and IFP/Los Angeles FF.

by : Created on Dec. 21, 2004 : Last modified Feb. 8, 2005 : (permalink)


XML Elements versus Attributes

Ned Batchelder discusses the old question of elements versus attributes in XML. As I've been answering that question for over seven years in various places, I thought I'd put down my viewpoint here.

Firstly, there are distinctions based on performance or API usability. Those distinctions are so implementation-specific that I don't think they are very interesting, certainly not to someone doing schema design.

Secondly, there are distinctions based on a particular schema language. Different schema languages have different levels of expressiveness, so it's important to distinguish the characteristics of elements and attributes inherent to XML from those that are true only because of the particular choice of schema language. One important takeaway here is that a schema is only part of the description of a markup language. In my experience there are always constraints placed on a language beyond what the schema (in any schema language) can say.

Thirdly, there are distinctions inherent to the XML syntax itself; things like the lack of attribute order or the inability to have further XML structure within an attribute value.
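A short Python sketch using the standard library's xml.etree.ElementTree illustrates both points. The "point" and "label" names are hypothetical and chosen only for the example.

```python
import xml.etree.ElementTree as ET

# Attribute order is not significant: both parse to the same mapping.
a = ET.fromstring('<point x="1" y="2"/>')
b = ET.fromstring('<point y="2" x="1"/>')
same_attributes = a.attrib == b.attrib

# Child element order is preserved, so these two documents differ.
c = ET.fromstring("<point><x>1</x><y>2</y></point>")
d = ET.fromstring("<point><y>2</y><x>1</x></point>")
same_child_order = [e.tag for e in c] == [e.tag for e in d]


def parses(xml_text):
    """Return True if xml_text is well-formed according to the parser."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# An attribute value is a flat string: a literal "<" inside it is an error,
# whereas an element can happily contain further elements.
nested_in_attribute = parses('<point label="<b>origin</b>"/>')
nested_in_element = parses("<point><label><b>origin</b></label></point>")

print(same_attributes, same_child_order, nested_in_attribute, nested_in_element)
```

The two attribute forms compare equal, the two child-element forms do not, and only the element version can carry nested markup.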

But once all three of those have been considered, there is still a fundamental "style" question around attributes and elements, and this is where a lot of people really find themselves asking the elements versus attributes question.

My take is that the distinction is more meaningful the more markup-oriented your XML is, and fuzzier the more data-oriented it is.

If you are using XML to serialise objects, then the distinction is blurry and it largely comes down to convention and things like the third type of distinction above. In such cases, an element-only approach might make perfect sense, especially if you are using a schema language that can express characteristics that, in DTDs, attributes had over elements, like default values or insignificant ordering.

But if you are truly doing markup, in other words annotating text (particularly a pre-existing text), then the distinction between attributes and elements becomes much clearer and the reason why attributes exist in XML (and SGML) is far more obvious. The key is that attribute values are considered part of the markup, rather than part of the content. So the clearer the distinction is between markup and content, the clearer the choice will be between attributes and child elements.

Imagine that you want to describe Max as a black cat. From a data structure representation point of view, there's no semantic distinction between:

<cat> <name>Max</name> <colour>Black</colour> </cat>

or

<cat name="Max" colour="Black"/>

and so decisions about whether to use elements or attributes tend to boil down to (a) whether order matters; (b) whether values can have internal structure; and (c) secondary concerns such as compactness.
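As a rough check of that equivalence, the following Python sketch reads both forms of the cat example into a plain dictionary. The helper names are hypothetical, and the element reader assumes one level of text-only children.

```python
import xml.etree.ElementTree as ET


def record_from_children(xml_text):
    """Read a record whose fields are text-only child elements."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


def record_from_attributes(xml_text):
    """Read a record whose fields are attributes of the root element."""
    return dict(ET.fromstring(xml_text).attrib)


element_form = "<cat> <name>Max</name> <colour>Black</colour> </cat>"
attribute_form = '<cat name="Max" colour="Black"/>'

from_children = record_from_children(element_form)
from_attributes = record_from_attributes(attribute_form)
print(from_children == from_attributes)
```

Both calls produce the same name and colour fields, which is the sense in which the two serialisations carry the same data.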

However, if you are doing document markup, things are a little different. In the document markup case, you have some existing text that you annotate. So you start with the word "Max" in your document and you want to mark it up with a generic identifier and any additional properties you want to give that word (or its referent). You might end up with something like:

<cat colour="Black">Max</cat>

Making colour a child element rather than an attribute wouldn't make sense from a document markup perspective. In document markup there is a much clearer distinction between content and markup. "Max" is content. "Black" is markup. If you made "colour" a child element with "Black" as content, then "Black" would change from being markup to being content. That makes no difference in data structure representation, but it does in document markup.
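One way to see the difference is to strip the markup and look at the text that remains. The Python sketch below does this for a hypothetical sentence wrapped in a "p" element; the sentence and the helper name are invented for illustration.

```python
import xml.etree.ElementTree as ET


def text_content(xml_text):
    """Return the document's character content with all markup removed."""
    return "".join(ET.fromstring(xml_text).itertext())


# Colour as an attribute: "Black" stays in the markup.
as_attribute = '<p>I fed <cat colour="Black">Max</cat> this morning.</p>'

# Colour as a child element: "Black" becomes part of the content.
as_child = "<p>I fed <cat>Max<colour>Black</colour></cat> this morning.</p>"

print(text_content(as_attribute))  # I fed Max this morning.
print(text_content(as_child))      # I fed MaxBlack this morning.
```

With the attribute, removing the markup recovers the original sentence. With the child element, "Black" leaks into the text, which is exactly the shift from markup to content described above.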

From a data structure representation point of view, this attribute/element distinction is so blurred that it is entirely possible to do away with attributes in representations (and sometimes less confusing to do so). This is even more the case where you have schema languages that allow expression of the fact that element order (in a particular context) is not significant.
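For data-oriented XML, doing away with attributes can be a mechanical rewrite. The following is a minimal Python sketch of one such rewrite, not a recommended tool: the function name is hypothetical, and emitting the new child elements in sorted name order is an arbitrary choice made so the output is deterministic.

```python
import xml.etree.ElementTree as ET


def attributes_to_elements(elem):
    """Return a copy of elem with each attribute turned into a child element.

    The new children come first, in sorted name order, followed by
    converted copies of the original children.
    """
    new = ET.Element(elem.tag)
    new.text = elem.text
    new.tail = elem.tail
    for name in sorted(elem.attrib):
        ET.SubElement(new, name).text = elem.attrib[name]
    for child in elem:
        new.append(attributes_to_elements(child))
    return new


original = ET.fromstring('<cat name="Max" colour="Black"/>')
converted = ET.tostring(attributes_to_elements(original), encoding="unicode")
print(converted)  # <cat><colour>Black</colour><name>Max</name></cat>
```

Running the same function over marked-up text would push attribute values into the content, so the rewrite is only safe on the data-oriented side of the line.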

But in pure document markup applications, where attributes are just indicating characteristic qualities of an element's content, they have a clearer role.

by : Created on Dec. 21, 2004 : Last modified Feb. 8, 2005 : (permalink)