James K. Tauber
1996-3-13 (Revision 4)
The SGML Open Technical Resolution 9401:1995 defines an entity catalog that SGML systems can use to map an external entity's public identifier to a system-dependent storage object identifier (SOI). Entity managers such as the one included in James Clark's SP parser suite demonstrate an effective use of this catalog format and, in the case of SP, the ability to use URLs as SOIs makes it possible to map public identifiers to entities stored anywhere on the World Wide Web. This valuable extension is being formalized in the EXCH Internet-Draft of the IETF MIMESGML Working Group, along with related extensions to facilitate the exchange of SGML documents via the Web.
The SGML Open catalog, though, provides only a flat mapping. The situation is somewhat akin to hostname resolution pre-DNS where each host had to contain a (large) table resolving names to IP addresses. With DNS, the name space became hierarchical and the role of name to IP address resolution was delegated to different levels of the hierarchy.
In this note I propose a simple extension to the SGML Open catalog format that allows for "delegating" catalogs. By allowing URLs as SOIs as per the EXCH proposal, this extension should provide a powerful solution to the problem of public identifier resolution over the Internet. The heart of this proposal lies in the hierarchical structure introduced to the owner ID of a public identifier by ISO/IEC 9070.
I propose to add the single keyword DELEGATE (or something similar), taking two arguments: an owner identifier prefix and an SOI referring to another catalog. By "owner identifier prefix" I mean all or some initial part of an owner identifier.
A formal public identifier has the structure:
formal public identifier = owner identifier, "//", text identifier
where owner identifier is minimum data prefixed by +// for registered owner identifiers and -// for unregistered owner identifiers.
The character sequence // is what I shall call a separating token.
An owner identifier prefix, then, is an initial part of an owner identifier up to, and optionally including, a separating token. The // token separating the owner identifier from the text identifier is considered an allowable part of the owner identifier prefix.
As an example, the owner identifier prefixes for "-//IETF//DTD HTML 2.0//EN" are:
"-"
"-//"
"-//IETF"
"-//IETF//"
Each prefix x defines a set of possible FPIs that have x as an owner identifier prefix. I shall call this set the matching set of x.
Following on from ISO/IEC 9070, this proposal also allows for another separating token, "::". So the owner identifier prefixes for "-//IETF::HTML-WG//DTD HTML 2.0//EN" are:
"-"
"-//"
"-//IETF"
"-//IETF::"
"-//IETF::HTML-WG"
"-//IETF::HTML-WG//"
Some owner identifier prefixes have the same matching set, "-" and "-//", for example. Others have subtle differences. "-//IETF//DTD RFC//EN" is in the matching set of "-//IETF" but not that of "-//IETF::".
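The definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the proposal: the function names owner_identifier_prefixes and in_matching_set are hypothetical, and the sketch assumes the owner identifier begins with +// or -// and ends at the next // token.

```python
import re

def owner_identifier_prefixes(fpi):
    """Return every owner identifier prefix of a formal public identifier.

    Assumption: the owner identifier starts with "+//" or "-//" and ends
    at the first "//" that follows that leading marker.
    """
    if fpi[:3] not in ("+//", "-//"):
        raise ValueError("expected an owner identifier starting with +// or -//")
    end = fpi.find("//", 3)
    if end == -1:
        raise ValueError("no '//' between owner identifier and text identifier")
    # Split on the separating tokens (keeping them) and include the "//"
    # that separates the owner identifier from the text identifier.
    pieces = re.split(r"(//|::)", fpi[:end + 2])
    prefixes, running = [], ""
    for piece in pieces:
        if piece:  # re.split yields an empty string after the final token
            running += piece
            prefixes.append(running)
    return prefixes

def in_matching_set(prefix, fpi):
    """True if fpi belongs to the matching set of prefix."""
    return prefix in owner_identifier_prefixes(fpi)

print(owner_identifier_prefixes("-//IETF::HTML-WG//DTD HTML 2.0//EN"))
print(in_matching_set("-//IETF", "-//IETF//DTD RFC//EN"))
print(in_matching_set("-//IETF::", "-//IETF//DTD RFC//EN"))
```

Note that a plain string-prefix test would not be enough: "-//IE" is a leading substring of "-//IETF//DTD RFC//EN" but does not end at a separating token, so it is not an owner identifier prefix.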
The definitions of separating token, owner identifier prefix and matching set hold true for public identifiers in the canonical form defined in ISO/IEC 9070. Hence this proposal also provides a solution to ISO/IEC 9070 public identifier resolution over the Internet, should this become important.
ISO/IEC 9070 defines the canonical form of a public identifier by the production:
public identifier = owner name, "//", object name
In the case of ISO/IEC 9070 public identifiers, the owner identifier prefix is considered an initial part of the owner name up to, and optionally including, a separating token.
As an example of how an entity manager would use the delegating catalogs proposed here, consider a catalog on a local system that contains the line
DELEGATE "-//IETF" "http://www.ietf.org/catalog.txt"
An entity manager coming across "-//IETF::HTML-WG//DTD HTML 2.0//EN" could retrieve the catalog given in the URL SOI above.
This second catalog may have the entry
DELEGATE "-//IETF::HTML-WG" "http://www.ietf.org/html-wg/catalog.txt"
and the catalog named in that entry (a third catalog) may in turn have the entry
PUBLIC "-//IETF::HTML-WG//DTD HTML 2.0//EN" "http://www.ietf.org/dtd/html-2.0.dtd"
This third catalog could be retrieved and the original public identifier resolved to a URL for, in this case, a DTD.
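The walk through the three catalogs can be sketched as follows. This is a minimal illustration, not a definitive entity manager: the CATALOGS dictionary is a hypothetical in-memory stand-in for fetching catalogs over HTTP, the key "local" and the names parse_catalog, owner_prefixes and resolve are my own, the parser understands only three-field PUBLIC and DELEGATE lines, and the max_hops limit is an added safeguard against delegation loops.

```python
import re
import shlex

# Hypothetical stand-in for catalogs that would be retrieved from the Web.
CATALOGS = {
    "local": 'DELEGATE "-//IETF" "http://www.ietf.org/catalog.txt"\n',
    "http://www.ietf.org/catalog.txt":
        'DELEGATE "-//IETF::HTML-WG" "http://www.ietf.org/html-wg/catalog.txt"\n',
    "http://www.ietf.org/html-wg/catalog.txt":
        'PUBLIC "-//IETF::HTML-WG//DTD HTML 2.0//EN" '
        '"http://www.ietf.org/dtd/html-2.0.dtd"\n',
}

def parse_catalog(text):
    """Parse 'KEYWORD "identifier" "soi"' lines; other lines are ignored."""
    entries = []
    for line in text.splitlines():
        fields = shlex.split(line)
        if len(fields) == 3 and fields[0] in ("PUBLIC", "DELEGATE"):
            entries.append(tuple(fields))
    return entries

def owner_prefixes(fpi):
    """All owner identifier prefixes of fpi (assumes a +// or -// owner)."""
    end = fpi.find("//", 3)
    if end == -1:
        return []
    running, prefixes = "", []
    for piece in re.split(r"(//|::)", fpi[:end + 2]):
        if piece:
            running += piece
            prefixes.append(running)
    return prefixes

def resolve(fpi, catalog_soi, fetch, max_hops=8):
    """Follow DELEGATE entries until a PUBLIC entry resolves fpi."""
    prefixes = owner_prefixes(fpi)
    for _ in range(max_hops):
        entries = parse_catalog(fetch(catalog_soi))
        for keyword, ident, soi in entries:
            if keyword == "PUBLIC" and ident == fpi:
                return soi
        delegates = [(ident, soi) for keyword, ident, soi in entries
                     if keyword == "DELEGATE" and ident in prefixes]
        if not delegates:
            return None
        # Longest prefix wins; max() keeps the earliest entry on a tie.
        _, catalog_soi = max(delegates, key=lambda d: len(d[0]))
    return None

print(resolve("-//IETF::HTML-WG//DTD HTML 2.0//EN", "local", CATALOGS.__getitem__))
```

Passing the fetch function as a parameter keeps the sketch independent of any particular transport; a real entity manager would substitute a function that retrieves the SOI over the network.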
Although delegating catalogs could, at least in theory, work without it, effective public identifier resolution requires one or more central authorities to provide a root for the resolution hierarchy. This means that entity managers need only be told about the root server(s) initially. It also means that public text owners need only register in one place to make their text "visible" to everyone. Note that registration of this type still leaves the public text unregistered in ISO terms. This could be overcome by the root server(s) themselves registering with ISO.
Within a single catalog file:
A DELEGATE entry should lose against a PUBLIC entry but win against other entry types.
Where several DELEGATE entries could be used to resolve a public identifier, a more specific (i.e. longer) DELEGATE entry should win over a more general (i.e. shorter) one.
A DELEGATE entry occurring earlier in a file should win against an equivalent DELEGATE entry occurring later in the same file.
Where more than one catalog file makes up the catalog, an entry in an earlier file should always win against one in a later file, regardless of the rules above.
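These precedence rules might be sketched as below. The sketch rests on several assumptions of mine: each catalog file is already parsed into (keyword, identifier, SOI) tuples, only PUBLIC and DELEGATE entries are modelled (other entry types are left out), "equivalent" DELEGATE entries are approximated as prefixes of equal length, and the names select_entry and owner_prefixes are hypothetical.

```python
import re

def owner_prefixes(fpi):
    """All owner identifier prefixes of fpi (assumes a +// or -// owner)."""
    end = fpi.find("//", 3)
    if end == -1:
        return []
    running, prefixes = "", []
    for piece in re.split(r"(//|::)", fpi[:end + 2]):
        if piece:
            running += piece
            prefixes.append(running)
    return prefixes

def select_entry(fpi, catalog_files):
    """Pick the winning entry for fpi from an ordered list of catalog files."""
    prefixes = owner_prefixes(fpi)
    for entries in catalog_files:  # an earlier file always wins
        best_delegate = None
        for keyword, ident, soi in entries:
            if keyword == "PUBLIC" and ident == fpi:
                # PUBLIC beats any DELEGATE in the same file.
                return (keyword, ident, soi)
            if keyword == "DELEGATE" and ident in prefixes:
                # Strictly longer prefix wins; on equal length the
                # earlier entry is kept.
                if best_delegate is None or len(ident) > len(best_delegate[1]):
                    best_delegate = (keyword, ident, soi)
        if best_delegate is not None:
            return best_delegate
    return None

fpi = "-//IETF::HTML-WG//DTD HTML 2.0//EN"
first_file = [
    ("DELEGATE", "-//IETF", "catalog-a"),
    ("DELEGATE", "-//IETF::HTML-WG", "catalog-b"),
    ("DELEGATE", "-//IETF::HTML-WG", "catalog-c"),
]
second_file = [("PUBLIC", fpi, "html-2.0.dtd")]
print(select_entry(fpi, [first_file, second_file]))
```

In this usage example the DELEGATE entry pointing at catalog-b is selected: it is longer than "-//IETF", it precedes the equal entry pointing at catalog-c, and its file precedes the one holding the PUBLIC entry.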