XML Elements versus Attributes


Ned Batchelder discusses the old question of elements versus attributes in XML. As I've been answering that question for over seven years in various places, I thought I'd put down my viewpoint here.

Firstly, there are distinctions based on performance or API usability. Those distinctions are so implementation-specific that I don't find them very interesting, and they certainly shouldn't matter much to someone doing schema design.

Secondly, there are distinctions based on a particular schema language. Different schema languages have different levels of expressiveness, so it's important to distinguish the characteristics of elements and attributes that are inherent to XML from those that hold only because of the particular choice of schema language. One important takeaway here is that a schema is only part of the description of a markup language. In my experience there are always constraints placed on a language beyond what the schema (in any schema language) can say.

Thirdly, there are distinctions inherent to the XML syntax itself, such as the lack of attribute order or the inability to have further XML structure within an attribute value.
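
As a rough illustration of those two syntactic points, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The element and attribute names (and the helper is_well_formed) are made up for the example. It only aims to show that a parser reports the same attributes whatever order they were written in, and that markup inside an attribute value is rejected as not well-formed.

```python
import xml.etree.ElementTree as ET

# The same attributes written in two different orders.
first = ET.fromstring('<cat name="Max" colour="Black"/>')
second = ET.fromstring('<cat colour="Black" name="Max"/>')

# Attributes are exposed as a mapping, so the order they were written in
# does not affect the comparison.
same_attributes = first.attrib == second.attrib


def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Child elements can carry further structure...
nested_in_element = is_well_formed(
    "<cat><colour><shade>Jet</shade> Black</colour></cat>"
)
# ...but a literal "<" is not allowed inside an attribute value.
nested_in_attribute = is_well_formed(
    '<cat colour="<shade>Jet</shade> Black"/>'
)

print(same_attributes, nested_in_element, nested_in_attribute)
```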

But even when all three of those are considered, there is still a fundamental "style" question around attributes and elements, and this is where a lot of people really find themselves asking the elements versus attributes question.

My take is that the distinction is more meaningful the more markup-oriented your XML is, and fuzzier the more data-oriented it is.

If you are using XML to serialise objects, then the distinction is blurry and it largely comes down to convention and things like the third type of distinction above. In such cases, an element-only approach might make perfect sense, especially if you are using a schema language that can express characteristics that, in DTDs, attributes had over elements, like default values or insignificant ordering.

But if you are truly doing markup, in other words annotating text (particularly a pre-existing text), then the distinction between attributes and elements becomes much clearer and the reason why attributes exist in XML (and SGML) is far more obvious. The key is that attribute values are considered part of the markup, rather than part of the content. So the clearer the distinction between markup and content, the clearer the choice between attributes and child elements.

Imagine that you want to describe Max as a black cat. From a data structure representation point of view, there's no semantic distinction between:

<cat> <name>Max</name> <colour>Black</colour> </cat>

or

<cat name="Max" colour="Black"/>

and so decisions about whether to use elements or attributes tend to boil down to (a) whether order matters; (b) whether values can have internal structure; and (c) secondary concerns such as compactness.
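
To make the data-oriented case concrete, here is a small Python sketch using xml.etree.ElementTree. The helper name cat_record is made up for illustration, and it assumes each property appears at most once and holds plain text. Under those assumptions, both forms read back as the same record.

```python
import xml.etree.ElementTree as ET

ELEMENT_FORM = "<cat> <name>Max</name> <colour>Black</colour> </cat>"
ATTRIBUTE_FORM = '<cat name="Max" colour="Black"/>'


def cat_record(xml_text):
    """Read a cat into a plain dict, whichever form was used.

    Hypothetical helper: assumes each property occurs at most once and
    holds simple text.
    """
    cat = ET.fromstring(xml_text)
    record = dict(cat.attrib)  # properties given as attributes
    for child in cat:  # properties given as child elements
        record[child.tag] = (child.text or "").strip()
    return record


print(cat_record(ELEMENT_FORM) == cat_record(ATTRIBUTE_FORM))
```

Once the data is in the program, nothing records which syntax it came from, which is why the choice in this setting comes down to convention and the practical points listed above.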

However, if you are doing document markup, things are a little different. In the document markup case, you have some existing text that you annotate. So you start with the word "Max" in your document and you want to mark it up with a generic identifier and any additional properties you want to give that word (or its referent). You might end up with something like:

<cat colour="Black">Max</cat>

Making colour a child element rather than an attribute wouldn't make sense from a document markup perspective. In document markup there is a much clearer distinction between content and markup. "Max" is content. "Black" is markup. If you made "colour" a child element with "Black" as content, then "Black" would change from being markup to being content. That makes no difference in data structure representation, but it does in document markup.
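
One way to see the difference is to strip the markup away and look at the text that is left. The following Python sketch does that with xml.etree.ElementTree. The surrounding sentence, the p element, and the helper text_content are all invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sentence around the word "Max".
ATTRIBUTE_MARKUP = '<p>I fed <cat colour="Black">Max</cat> this morning.</p>'
CHILD_MARKUP = "<p>I fed <cat><colour>Black</colour>Max</cat> this morning.</p>"


def text_content(xml_text):
    """Drop all tags and attributes, keeping only the character data."""
    return "".join(ET.fromstring(xml_text).itertext())


print(text_content(ATTRIBUTE_MARKUP))
print(text_content(CHILD_MARKUP))
```

With the attribute, the original sentence comes back untouched ("I fed Max this morning."). With the child element, "Black" leaks into the text ("I fed BlackMax this morning."), because it has become content rather than markup.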

From a data structure representation point of view, this attribute/element distinction is so blurred that it is entirely possible to do away with attributes in such representations (and sometimes less confusing to do so). This is even more the case where you have schema languages that can express the fact that element order (in a particular context) is not significant.

But in pure document markup applications, where attributes are just indicating characteristic qualities of an element's content, they have a clearer role.