A Proposal for Delegating SGML Open Catalogs

James K. Tauber
Publications Development Manager
The University of Western Australia
jtauber@jtauber.com

1996-3-13 (Revision 4)

Background

The SGML Open Technical Resolution 9401:1995 defines an entity catalog that can be used by SGML systems to map an external entity's public identifier to a system-dependent storage object identifier (SOI). Entity managers such as the one included in James Clark's SP parser suite demonstrate an effective use of this catalog format and, in the case of SP, the ability to use URLs as SOIs makes it possible to map public identifiers to entities stored anywhere on the World Wide Web. This valuable extension is being formalized in the EXCH Internet-Draft of the IETF MIMESGML Working Group along with related extensions to facilitate the exchange of SGML documents via the Web.

The SGML Open catalog, though, provides only a flat mapping. The situation is somewhat akin to hostname resolution before DNS, when each host had to hold a (large) table resolving names to IP addresses. With DNS, the name space became hierarchical and the task of resolving names to IP addresses was delegated to different levels of the hierarchy.

A Proposal

In this note I propose a simple extension to the SGML Open catalog format that allows for "delegating" catalogs. By allowing URLs as SOIs as per the EXCH proposal, this extension should provide a powerful solution to the problem of public identifier resolution over the Internet. The heart of this proposal lies in the hierarchical structure introduced to the owner ID of a public identifier by ISO/IEC 9070.

I propose to add the single keyword DELEGATE (or something similar), taking two arguments: an owner identifier prefix and an SOI referring to another catalog. By "owner identifier prefix" I mean all or some initial part of an owner identifier.

Owner Identifier Prefix

ISO 8879

ISO 8879 defines a formal public identifier by the production:
formal public identifier =
   owner identifier, "//", text identifier

where owner identifier is minimum data prefixed by +// for registered owner identifiers and -// for unregistered owner identifiers.

The character sequence // is what I shall call a separating token. An owner identifier prefix, then, is an initial part of an owner identifier up to (and optionally including) a separating token. The // token separating the owner identifier from the text identifier is considered an allowable part of the owner identifier prefix.

As an example, the owner identifier prefixes for "-//IETF//DTD HTML 2.0//EN" are "-", "-//", "-//IETF" and "-//IETF//".

Each prefix x defines a set of possible FPIs that have x as an owner identifier prefix. I shall call this set the matching set of x.

Following on from ISO/IEC 9070, this proposal also allows for another separating token, "::". So the owner identifier prefixes for "-//IETF::HTML-WG//DTD HTML 2.0//EN" are "-", "-//", "-//IETF", "-//IETF::", "-//IETF::HTML-WG" and "-//IETF::HTML-WG//".

Some of these may be deemed equivalent because their matching sets are congruent: "-" and "-//", for example. Others have subtle differences. "-//IETF//DTD RFC//EN" is in the matching set of "-//IETF" but not that of "-//IETF::".
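The definitions above can be sketched in a few lines of Python. This is only an illustration, not part of the proposal: the function names are hypothetical, and the code assumes that a leading "+//" or "-//" belongs to the owner identifier, so that the owner identifier ends at the next "//" after it.

```python
import re

# Separating tokens named in the proposal: "//" and (after ISO/IEC 9070) "::".
SEPARATING_TOKEN = re.compile(r"//|::")


def owner_part(fpi):
    """Return the owner identifier plus the "//" that follows it."""
    # Assumption: a leading "+//" or "-//" is part of the owner identifier,
    # so the owner identifier ends at the next "//" after that marker.
    start = 3 if fpi.startswith(("+//", "-//")) else 0
    end = fpi.find("//", start)
    if end < 0:
        raise ValueError(f"no text identifier in {fpi!r}")
    return fpi[:end + 2]


def owner_identifier_prefixes(fpi):
    """List the owner identifier prefixes of fpi, shortest first."""
    owner = owner_part(fpi)
    prefixes = []
    for token in SEPARATING_TOKEN.finditer(owner):
        # Each token yields two prefixes: up to it, and up to and including it.
        for cut in (token.start(), token.end()):
            if cut > 0 and owner[:cut] not in prefixes:
                prefixes.append(owner[:cut])
    return prefixes


def in_matching_set(prefix, fpi):
    """True if fpi belongs to the matching set of prefix."""
    return prefix in owner_identifier_prefixes(fpi)


for p in owner_identifier_prefixes("-//IETF::HTML-WG//DTD HTML 2.0//EN"):
    print(p)
```

Note that a plain string-prefix test would not be enough here: "-//IET" is an initial substring of "-//IETF//DTD HTML 2.0//EN" but does not end at a separating token, so it is not an owner identifier prefix.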

ISO/IEC 9070

The definitions of separating token, owner identifier prefix and matching set hold true for public identifiers in the canonical form defined in ISO/IEC 9070. Hence this proposal also provides a solution to ISO/IEC 9070 public identifier resolution over the Internet, should this become important.

ISO/IEC 9070 defines the canonical form of a public identifier by the production:

public identifier =
   owner name, "//", object name

In the case of ISO/IEC 9070 public identifiers, the owner identifier prefix is considered an initial part of the owner name up to (and optionally including) a separating token.

A Resolution Example

As an example of how an entity manager would use the delegating catalogs proposed here, consider a catalog on a local system that contains the line

DELEGATE "-//IETF" "http://www.ietf.org/catalog.txt"

An entity manager coming across "-//IETF::HTML-WG//DTD HTML 2.0//EN" could retrieve the catalog given in the URL SOI above. This second catalog may have the entry

DELEGATE "-//IETF::HTML-WG" "http://www.ietf.org/html-wg/catalog.txt"

and the catalog mentioned in the line above may have the entry

PUBLIC "-//IETF::HTML-WG//DTD HTML 2.0//EN" "http://www.ietf.org/dtd/html-2.0.dtd" 

This third catalog could be retrieved and the original public identifier resolved to a URL for, in this case, a DTD.
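The walk through the three catalogs can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive entity manager: the names resolve, parse_catalog, owner_prefixes and fetch are hypothetical, an in-memory dictionary (with a made-up "local" key) stands in for retrieving catalogs over HTTP, and the sketch assumes that an exact PUBLIC entry is preferred over a DELEGATE entry and that the longest matching prefix is followed. A hop limit is added as a guard against catalogs that delegate in a loop.

```python
import re

# One catalog entry per line: KEYWORD "identifier" "SOI" (sketch only).
ENTRY = re.compile(r'^\s*(PUBLIC|DELEGATE)\s+"([^"]*)"\s+"([^"]*)"\s*$')


def parse_catalog(text):
    """Return (keyword, identifier, soi) tuples for the lines we understand."""
    return [m.groups() for m in map(ENTRY.match, text.splitlines()) if m]


def owner_prefixes(fpi):
    """Set of owner identifier prefixes of fpi (cuts at "//" and "::")."""
    start = 3 if fpi.startswith(("+//", "-//")) else 0
    end = fpi.find("//", start)
    if end < 0:
        return set()
    owner = fpi[:end + 2]
    cuts = set()
    for token in re.finditer(r"//|::", owner):
        cuts.update((token.start(), token.end()))
    return {owner[:c] for c in cuts if c > 0}


def resolve(fpi, catalog_soi, fetch, max_hops=8):
    """Follow DELEGATE entries from catalog_soi until a PUBLIC entry matches."""
    prefixes = owner_prefixes(fpi)
    for _ in range(max_hops):  # assumed guard against delegation loops
        entries = parse_catalog(fetch(catalog_soi))
        for keyword, ident, soi in entries:
            if keyword == "PUBLIC" and ident == fpi:
                return soi
        delegates = [(ident, soi) for keyword, ident, soi in entries
                     if keyword == "DELEGATE" and ident in prefixes]
        if not delegates:
            return None
        # Assumption: the longest (most specific) prefix is followed.
        catalog_soi = max(delegates, key=lambda d: len(d[0]))[1]
    return None


# In-memory stand-in for the three catalogs in the example above.
CATALOGS = {
    "local": 'DELEGATE "-//IETF" "http://www.ietf.org/catalog.txt"',
    "http://www.ietf.org/catalog.txt":
        'DELEGATE "-//IETF::HTML-WG" "http://www.ietf.org/html-wg/catalog.txt"',
    "http://www.ietf.org/html-wg/catalog.txt":
        'PUBLIC "-//IETF::HTML-WG//DTD HTML 2.0//EN" '
        '"http://www.ietf.org/dtd/html-2.0.dtd"',
}

print(resolve("-//IETF::HTML-WG//DTD HTML 2.0//EN", "local", CATALOGS.__getitem__))
```

Passing the retrieval step in as a function keeps the sketch independent of any particular transport; a real entity manager would substitute its own SOI retrieval there.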

Suggested Infrastructure

Although delegating catalogs could, at least in theory, work without it, effective public identifier resolution requires one or more central authorities to provide a root for the resolution hierarchy. This means that entity managers need only be told about the root server(s) initially. It also means that public text owners need only register in one place to make their text "visible" to everyone. Note that registration of this type still leaves the public text unregistered in ISO terms. This could be overcome by the root server(s) themselves registering with ISO.

Conflict between Entries in Catalogs

Within a single catalog file:

Where more than one catalog file makes up the catalog, an entry in an earlier file should always win against one in a later file, regardless of the rules above.