Resource Cataloging and
Distribution Service (RCDS)

Keith Moore
Shirley Browne
Stan Green
Reed Wade


Netlib Development Group
University of Tennessee


January 16, 1996

Abstract

We describe an architecture for cataloging the characteristics of Internet-accessible resources, for replicating such resources to improve their accessibility, and for cataloging the current locations of the resources so replicated. Message digests and public-key authentication are used to ensure the integrity of the files provided to users. The service is designed to provide increased functionality with only minimal changes to either a client or a server. Resources can be named either by URNs or by existing URLs, and the service is designed to facilitate long-term resolution of resource names.

1. Introduction

Almost any user of the World Wide Web will be familiar with the following problems:

We therefore propose an architecture for a system which attempts to address these problems.

1.1 Design Goals

The goals of our system include:

These goals have certain implications for our design:

1.2 Issues

The following issues must be considered:

We assumed that transition issues would strongly affect the success of the project, and this assumption influenced our design in three ways: we allow ordinary URLs as one kind of resource name, we use existing file servers and file access protocols, and we employ DNS as a component of the system rather than building a new distributed database from the ground up.

The need for reliable authentication and integrity assurances, coupled with the difficulty of providing secure servers, led us to use end-to-end authentication (between information provider and user), consisting of public-key signatures and cryptographically signed certificates, rather than depending on the security of resource catalog servers or file servers. (Reasonable security for these servers is still required to thwart denial-of-service attacks.)

Finally, some of the inherent limitations of DNS, and the desire to separate administration of ``naming authority'' names from administration of the resource names assigned by a particular naming authority, led us to use DNS only to identify one or more resource catalog servers for a given naming authority, rather than to provide actual location or catalog information directly through DNS.

1.3 Non-Goals

The following were deliberately omitted from our design goals:

2. Description of RCDS

The Resource Cataloging and Distribution System (RCDS) consists of the following components:

2.1 Resource names

RCDS uses three kinds of resource names: URLs, URNs, and LIFNs. Web users will already be familiar with the syntax of URLs and how they are used. Readers who are familiar with URNs should note that RCDS assumes a specific URN format, which is described below.

2.1.1 URNs and LIFNs

In RCDS, URNs are used to provide stable names for resources whose characteristics may vary over time. By contrast, a LIFN is used to name a specific instance of a resource, all copies of which must be identical. A URN is associated with a description of the resource it names, while a LIFN is associated with one or more locations of identical copies of that resource.

The description associated with a URN will normally contain one or more LIFNs, which describe particular instances of that resource and the differences between them. For instance, if the resource named by a particular URN exists in several different data formats (e.g. plain text, PostScript, PDF, HTML), the description for that URN will list each of these, along with a LIFN for that specific instance. Similarly, if the resource associated with a URN has changed over time, and multiple versions of the resource are still accessible, the description of that resource might contain a list of the current and previous versions along with the LIFNs for each. Since the LIFN can then be used to find the current locations of a resource, it serves as a ``link'' or ``file handle'' from the description of a resource to the list of its current locations.
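The relationship between a URN's description and the LIFNs it lists can be pictured as a small data structure. The following Python sketch is purely illustrative: the class names, field names, version labels, and example LIFNs are hypothetical and do not reflect the actual description format used by the RCDS prototype.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Instance:
    """One specific instance of a resource, named by a LIFN (hypothetical layout)."""
    lifn: str
    data_format: str  # e.g. "text/plain", "application/postscript"
    version: str      # hypothetical version label


@dataclass
class URNDescription:
    """Catalog description associated with a URN (hypothetical layout)."""
    urn: str
    title: str
    instances: list[Instance] = field(default_factory=list)

    def lifns_for(self, data_format: str) -> list[str]:
        """Return the LIFNs of every listed instance in the requested format."""
        return [i.lifn for i in self.instances if i.data_format == data_format]

    def versions(self) -> list[str]:
        """Return the distinct version labels, in the order they are listed."""
        seen: list[str] = []
        for i in self.instances:
            if i.version not in seen:
                seen.append(i.version)
        return seen


desc = URNDescription("URN://foo.bar/mumblefrotz", "Example report", [
    Instance("LIFN://foo.bar/mumblefrotz-v2-ps", "application/postscript", "2"),
    Instance("LIFN://foo.bar/mumblefrotz-v2-txt", "text/plain", "2"),
    Instance("LIFN://foo.bar/mumblefrotz-v1-txt", "text/plain", "1"),
])
```

A client wanting plain text would call `desc.lifns_for("text/plain")` and then pick among the current and older versions it returns; each chosen LIFN is the handle used to look up that instance's locations.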

The distinction between URNs and LIFNs was crafted for several reasons:

2.1.2 Format of URNs and LIFNs

An RCDS URN or LIFN consists of three parts, separated by the "/" character:
  1. A fixed prefix string, e.g. URN:/ or LIFN:/.
  2. A naming authority name, which is simply an Internet domain name (though the domain name used by a naming authority may be chosen to have certain useful characteristics).
  3. A suffix string, which is an identifier assigned by the naming authority.
So "URN://foo.bar/mumblefrotz" would be a URN that was assigned by the naming authority foo.bar. URNs, at least in the current RCDS prototype, are thus syntactically similar to URLs.
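Before it can consult DNS, a client must extract the naming authority from a URN or LIFN. A minimal Python sketch of that parsing follows. It assumes the prefixes are matched case-sensitively exactly as written above and that the suffix may itself contain "/" characters; neither point is specified here, and the function name `parse_rcds_name` is hypothetical.

```python
from typing import NamedTuple


class ParsedName(NamedTuple):
    kind: str              # "URN" or "LIFN"
    naming_authority: str  # an Internet domain name
    suffix: str            # identifier assigned by the naming authority


# Fixed prefix strings; matched case-sensitively (an assumption).
_PREFIXES = {"URN:/": "URN", "LIFN:/": "LIFN"}


def parse_rcds_name(name: str) -> ParsedName:
    """Split a name such as "URN://foo.bar/mumblefrotz" into its three parts."""
    for prefix, kind in _PREFIXES.items():
        if name.startswith(prefix):
            rest = name[len(prefix):]
            break
    else:
        raise ValueError(f"not an RCDS URN or LIFN: {name!r}")
    # rest looks like "/foo.bar/mumblefrotz": a "/" separator, the authority,
    # another "/" separator, then the suffix (which may contain "/" itself).
    if not rest.startswith("/"):
        raise ValueError(f"missing separator after prefix: {name!r}")
    authority, sep, suffix = rest[1:].partition("/")
    if not authority or not sep or not suffix:
        raise ValueError(f"missing naming authority or suffix: {name!r}")
    # Domain names are case-insensitive, so normalize the authority.
    return ParsedName(kind, authority.lower(), suffix)
```

For example, `parse_rcds_name("URN://foo.bar/mumblefrotz")` yields the kind `"URN"`, the naming authority `"foo.bar"`, and the suffix `"mumblefrotz"`; only the authority is needed for the DNS lookup.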

2.2 Publication and Distribution

Figure 1 illustrates how files are published in RCDS.
  1. An author submits a file to RCDS using a publication tool. If this is a new file, a new description (containing catalog information) of that file is created and a new URN is assigned; otherwise, the description associated with the existing URN is updated to reflect the new version of the file. A LIFN is assigned to the new file, and this LIFN is included in the description of that file. The portion of the description containing the LIFN and the file's fingerprint (and perhaps other portions of it) is cryptographically signed by the author using the publication tool.
  2. The publication tool deposits a copy of the file on a file server, and a copy of the description on a ``master'' resource catalog server. It also sends a copy of the description of the new file to interested parties, which might include file servers and search services.
  3. The ``master'' resource catalog server updates its slave servers with the new description.
  4. The ``master'' file server informs a location server that it has a copy of the file with that particular LIFN.
  5. As other file servers find out about the existence of the new file, their collection managers decide whether to acquire it. When a file server acquires the new file and makes it accessible, it informs a location server about it.
  6. The location servers propagate new file location information to one another.
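Step 1 can be sketched in Python as follows. This is a hypothetical illustration, not the prototype's publication tool: the in-memory `catalog` dictionary, the LIFN naming scheme, and the choice of SHA-256 as the message digest are all assumptions. Public-key signing is left out because it requires a cryptographic library; the function instead returns the deterministically serialized bytes that the author would sign.

```python
import hashlib
import json


def fingerprint(data: bytes) -> str:
    """Message digest of a file's contents (SHA-256 is an assumption here)."""
    return hashlib.sha256(data).hexdigest()


def publish(catalog: dict, urn: str, data: bytes, data_format: str) -> tuple[str, bytes]:
    """Add a new instance of `data` under `urn`; return (LIFN, bytes to sign).

    `catalog` is a hypothetical in-memory stand-in for the master resource
    catalog server, mapping each URN to its description.
    """
    if not urn.startswith("URN:/"):
        raise ValueError(f"not an RCDS URN: {urn!r}")
    digest = fingerprint(data)
    # Hypothetical LIFN scheme: same authority and suffix as the URN, plus
    # part of the digest so that different contents get different LIFNs.
    lifn = "LIFN:/" + urn[len("URN:/"):] + "/" + digest[:16]
    # If the URN is not yet catalogued a new description is created;
    # otherwise the existing description is updated.
    description = catalog.setdefault(urn, {"urn": urn, "instances": []})
    entry = {"lifn": lifn, "format": data_format, "fingerprint": digest}
    description["instances"].append(entry)
    # Serialize the signed portion deterministically so that a client can
    # recompute exactly the same bytes when it verifies the signature.
    to_sign = json.dumps(entry, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return lifn, to_sign
```

Sorting keys and fixing the separators is one simple way to ensure that the signer and every verifying client hash exactly the same bytes, which end-to-end signature checking depends on.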

2.3 Access and Retrieval

Figure 2 illustrates how files are accessed or retrieved in RCDS.
  1. A user acquires a URN of a resource that seems to suit his needs from a search service, hypertext link, or other means. This URN is resolved using DNS (see below) to find the network addresses of one or more resource catalog servers. One of those servers is selected by the client, perhaps based on network proximity estimates.
  2. The resource catalog server is queried for a description of the resource named by the URN. The description may contain multiple LIFNs, each describing a different version of the resource. The client selects a particular LIFN from those available.
  3. The client resolves the LIFN using DNS to find the network addresses of one or more location servers. One of those location servers is then queried for locations of the file named by that LIFN.
  4. The location server returns one or more URLs at which the file can be obtained.
  5. The client chooses one of those file servers (again, perhaps based on network proximity estimates) and fetches the file from that server.
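The five access steps can be traced end to end with in-memory stand-ins for DNS, the resource catalog server, and the location server. In this Python sketch the tables, host names, URLs, and function names are hypothetical; servers are tried in order of their RCS preference value (an assumption borrowed from MX semantics); and the final fetch is not performed, so the function returns the chosen URL.

```python
from typing import Callable

# Hypothetical stand-ins for the network services used in steps 1-5.
RCS_RECORDS = {  # naming authority -> (preference, host) pairs from DNS
    "foo.bar": [(20, "server-2.foo.bar"), (10, "server-1.foo.bar")],
}
CATALOG = {  # URN -> LIFNs listed in its description
    "URN://foo.bar/report": ["LIFN://foo.bar/report-v2", "LIFN://foo.bar/report-v1"],
}
LOCATIONS = {  # LIFN -> URLs reported by a location server
    "LIFN://foo.bar/report-v2": ["ftp://mirror.example/report-v2.ps",
                                 "http://files.example/report-v2.ps"],
}


def authority_of(name: str) -> str:
    """'URN://foo.bar/x' -> 'foo.bar' (assumes the format of Section 2.1.2)."""
    return name.split("/", 3)[2]


def servers_for(name: str) -> list[str]:
    """Steps 1 and 3: RCS lookup, hosts sorted by preference value."""
    hosts = [host for _, host in sorted(RCS_RECORDS.get(authority_of(name), []))]
    if not hosts:
        raise LookupError(f"no servers registered for {name}")
    return hosts


def query_catalog(server: str, urn: str) -> list[str]:
    """Step 2 stand-in: every catalog server answers from the same table."""
    return CATALOG[urn]


def query_locations(server: str, lifn: str) -> list[str]:
    """Step 4 stand-in: every location server answers from the same table."""
    return LOCATIONS[lifn]


def retrieve_url(urn: str,
                 pick_lifn: Callable[[list[str]], str] = lambda lifns: lifns[0],
                 pick_url: Callable[[list[str]], str] = lambda urls: urls[0]) -> str:
    rc_server = servers_for(urn)[0]                   # step 1: preferred RC server
    lifn = pick_lifn(query_catalog(rc_server, urn))   # step 2: choose a LIFN
    loc_server = servers_for(lifn)[0]                 # step 3: preferred location server
    urls = query_locations(loc_server, lifn)          # step 4: candidate URLs
    return pick_url(urls)                             # step 5: a client would fetch this
```

Passing a different `pick_url`, for example one that ranks URLs by an estimate of network proximity, is where a real client would apply the selection policies mentioned in steps 1 and 5.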
The interaction with RCDS may be accomplished either directly by a client, or via a proxy server which communicates with the client via HTTP. This arrangement is shown in Figure 3.

3. Protocols

Because some protocol details are important for judging how well RCDS achieves its goals, this section outlines important aspects of the protocols used by the current prototype.

3.1 URN/LIFN resolution

Resource catalog (RC) servers are registered for a particular naming authority by adding resource records to the DNS. A new record type, RCS, is assumed. It has a format identical to that of an MX record, but instead of designating a mail exchanger host, it designates a host that operates a resource catalog server for that domain.

So the records:

foo.bar         RCS     10      server-1.foo.bar.
                RCS     20      server-2.foo.bar.
say that the resource catalog servers for the naming domain foo.bar can be found at server-1.foo.bar and server-2.foo.bar, with preference values 10 and 20 respectively.

Given a URN or a LIFN, an RC server for that URN or LIFN may be found using DNS as follows:

  1. The naming authority name is extracted from the URN or LIFN.
  2. A DNS lookup is performed on the naming authority name with QTYPE=RCS. The query returns a list of the ``official'' servers for that domain. (If no DNS records were returned, no official servers are available.)
  3. The client chooses one of the available servers.
  4. The client then sends query or update requests to that server.
If the first server chosen fails to respond to the query, the client may choose another of the listed servers. Clients may also be configured to consult ``proxy'' RC servers (which perform queries on behalf of clients and cache results) as well as ``fallback'' (e.g., custodial) servers (which can be consulted when there are no ``official'' servers for a domain or when the ``official'' servers do not respond.)
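Steps 2 through 4 and the failover behavior described above can be sketched as a single Python function. The DNS lookup and the server query are passed in as callables so that no particular resolver library or wire protocol is assumed, and all names here are hypothetical. Trying the official servers in ascending preference order (as MX clients do) is an assumption, since the text says only that the client chooses one; proxy RC servers are not modeled.

```python
from typing import Callable, Iterable


class ServerUnavailable(Exception):
    """Raised by a query function when a server does not respond."""


def query_with_failover(
    authority: str,
    lookup_rcs: Callable[[str], list[tuple[int, str]]],
    query: Callable[[str], str],
    fallback: Iterable[str] = (),
) -> str:
    """Query the official RC servers for `authority`, then any fallback servers.

    `lookup_rcs` returns (preference, host) pairs, as a QTYPE=RCS lookup would;
    `query` sends the request to one host and raises ServerUnavailable on failure.
    """
    # Official servers in ascending preference order; ties are broken by
    # host name here only to keep the sketch deterministic.
    official = [host for _, host in sorted(lookup_rcs(authority))]
    # An empty list means no official servers, so only fallbacks are tried.
    for host in official + list(fallback):
        try:
            return query(host)
        except ServerUnavailable:
            continue  # no response: try the next listed server
    raise ServerUnavailable(f"no RC server for {authority!r} responded")
```

If server-1.foo.bar does not respond, the request goes to server-2.foo.bar; for a domain with no RCS records, only the configured fallback (e.g., custodial) servers are consulted.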
moore@cs.utk.edu