Tangled in the Threads

Jon Udell, August 15, 2001

The power of the URL-line

RPC-style services don't replace the humble URL

There is often more value in getting XML out of a web service than in sending XML into a service. In such cases, the human-friendly URL-line offers distinct advantages.

For years I used the phrase "URL command line" to describe the browser's Location (Netscape) or Address (MSIE) window -- that is, the place where you type or paste URLs. More recently, my friend Rael Dornfest coined a more elegant term: URL-line.

Like the traditional command line, the URL-line is available for use both by humans, who can type it or paste it or click on it, and by programs, which can invoke it. Unlike the traditional command line, the URL-line tends to be self-documenting -- a key point which I think is often overlooked. When you read the man page for a Unix command, the description is given in formal and abstract terms. You might find some examples at the end of the man page, but you might not. The URL-line's syntax, on the other hand, is quite often demonstrated as a consequence of normal interactive use of web pages. This documentation effect occurs because web pages can be, at the same time, documents and software. The blending of these two styles meant that first-generation web APIs were:

URL-line vs web services

When I wrote that article, the idea of web services as a formal discipline, based on something more rigorous than URL-lines and HTML pages, was starting to gain wide currency. One of the key tenets of the web services movement is that first-generation web APIs were fragile. My web mindshare script, for example, depended on HTML screen-scraping and was easily thwarted by even trivial design changes to the HTML pages of the sites it scraped.

In another article written around the same time (fall 1999), I called attention to XML-RPC. Like its big brother SOAP, XML-RPC formalizes the notion of web APIs by defining a rigorous way to structure both the inputs to and outputs from web services. The example used in that article was MailToTheFuture, a web service that Dave Winer continues to offer. You can use MailToTheFuture interactively, by way of HTML and CGI, or programmatically using XML-RPC.

When you use a web form to ask MailToTheFuture to schedule delivery of an email message at some future time, here's the HTTP POST request that's used:

headers:

POST /addMessage HTTP/1.0
Host: www.mailtothefuture.com
Cookie: mailtothefuture=udell@monad.net%09xxxxxxxxxxx
Content-type: application/x-www-form-urlencoded
Content-length: 133

data (shown one field per line; on the wire, the fields are joined by &):

receiverMailAddress=udell%40monad.net
subject=nashua+meeting
messageBody=water+st
dateTime=8%2F16%2F2001%3B+12%3A00%3A00+AM
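The four fields above travel as a single application/x-www-form-urlencoded string. A minimal Python sketch of how a client might produce that body, using the standard library's `urllib.parse.urlencode`; the values are the decoded versions of the ones shown above, and the sketch builds the body without sending it.

```python
from urllib.parse import urlencode

# Decoded field values from the MailToTheFuture form post above.
fields = {
    "receiverMailAddress": "udell@monad.net",
    "subject": "nashua meeting",
    "messageBody": "water st",
    "dateTime": "8/16/2001; 12:00:00 AM",
}

# urlencode() percent-encodes reserved characters ("@" -> %40, "/" -> %2F,
# ";" -> %3B, ":" -> %3A), turns spaces into "+", and joins pairs with "&".
body = urlencode(fields)
print(body)
```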

Here's that same request expressed in XML-RPC:

<?xml version="1.0"?>
<methodCall>
<methodName>mailToTheFuture.addMessage</methodName>
<params>
<param>
<value>udell@monad.net</value>
</param>
<param>
<value>xxxxxxxxxxx</value> 
</param>
<param>
<value>
<struct>
<member>
<name>dateTime</name>
<value>8/16/2001; 12:00:00 AM</value>
</member>
<member>
<name>messageBody</name>
<value>water st</value>
</member>
<member>
<name>receiverMailAddress</name>
<value>udell@monad.net</value>
</member>
<member>
<name>subject</name>
<value>nashua meeting</value>
</member>
</struct>
</value>
</param>
</params>
</methodCall>

Clearly the XML-RPC flavor of the request is more complex. I've biased things slightly here, by not pretty-printing it, but you might well wonder what's gained by encoding a handful of name/value pairs in XML.

It's interesting, in retrospect, that I didn't ask myself that question at the time. It's also interesting that I didn't demonstrate the request in my article, but rather the response:

Consider the response that comes back from MailToTheFuture's XML-RPC interface when you send the wrong password:

<?xml version="1.0"?>
<methodResponse>
  <fault>
    <value>
      <struct>
        <member>
          <name>faultCode</name>
          <value>
            <int>4</int>
          </value>
        </member>
        <member>
          <name>faultString</name>
          <value>
            <string>The password is incorrect.</string>
          </value>
        </member>
      </struct>
    </value>
  </fault>
</methodResponse>

Like the request, this response is well-formed -- and thus automatically parseable -- XML. Every XML-RPC service will use this same pattern. It's true that, for each application, you'll need to decide how to handle faultCode 4. But you won't need to guess that the output is an example of a methodResponse, or that its value is a fault object containing a struct made up of a faultCode and a faultString.
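Because every XML-RPC fault has this shape, a generic client library can turn it into an error without knowing anything about MailToTheFuture. A sketch using Python's `xmlrpc.client.loads`, which raises `xmlrpc.client.Fault` when it parses a fault response; the response text is the one shown above, the `check_response` helper is hypothetical, and deciding what faultCode 4 means is still left to the application.

```python
import xmlrpc.client

# The fault response shown above, as a client would receive it.
response_xml = """<?xml version="1.0"?>
<methodResponse>
  <fault>
    <value>
      <struct>
        <member>
          <name>faultCode</name>
          <value><int>4</int></value>
        </member>
        <member>
          <name>faultString</name>
          <value><string>The password is incorrect.</string></value>
        </member>
      </struct>
    </value>
  </fault>
</methodResponse>"""

def check_response(xml_text):
    """Return the response params, or (code, message) if the server faulted."""
    try:
        params, _ = xmlrpc.client.loads(xml_text)
        return params
    except xmlrpc.client.Fault as fault:
        # The generic structure is parsed for us; the meaning of each
        # faultCode remains application-specific.
        return (fault.faultCode, fault.faultString)

print(check_response(response_xml))  # (4, 'The password is incorrect.')
```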

What I subliminally knew then, but have since become more aware of, is that while XML-RPC and SOAP are inherently symmetric, there is often more value in getting XML out of a service than in sending XML into a service. What's more, the URL-line remains, for lots of reasons, a really useful way to request services that may or may not emit XML, and if they do emit XML, may or may not emit SOAP or XML-RPC packets.
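That symmetry shows up directly in XML-RPC libraries: one marshaller encodes both the call going in and the response coming out. A minimal sketch with Python's standard `xmlrpc.client`, reusing the MailToTheFuture method name and message fields from the example above; the positional parameters are assumed to be the sender's address and a masked password, as the captured request suggests, the `True` result is a hypothetical stand-in, and nothing is sent over the network.

```python
import xmlrpc.client

# Message fields from the MailToTheFuture example, decoded from the form post.
message = {
    "dateTime": "8/16/2001; 12:00:00 AM",
    "messageBody": "water st",
    "receiverMailAddress": "udell@monad.net",
    "subject": "nashua meeting",
}

# Request direction: positional params (address, masked password, struct)
# are wrapped in a <methodCall>.
call_xml = xmlrpc.client.dumps(
    ("udell@monad.net", "xxxxxxxxxxx", message),
    methodname="mailToTheFuture.addMessage",
)

# Response direction: the same marshaller wraps a single result in a
# <methodResponse>. The result value here is a hypothetical stand-in.
reply_xml = xmlrpc.client.dumps((True,), methodresponse=True)

# loads() decodes either document, returning (params, methodname).
params, method = xmlrpc.client.loads(call_xml)
result, no_method = xmlrpc.client.loads(reply_xml)
print(method, params[2]["subject"])  # mailToTheFuture.addMessage nashua meeting
print(result, no_method)             # (True,) None
```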

Here's just one example. A project I'm working on has a web-based reporting system. All of its features are exposed as URL-lines, and as you use the system you discover that these are constructed in a really powerful way. Here's an example:

/stats/report?REPORT=1&DATE_1=08/01/2001&DATE_2=08/31/2001&ACCOUNT_ID=93

This is a thinly-disguised SQL passthrough. A script that wants to gather data about accounts 1 through 92 for this date range, or about account 93 for another date range, can easily re-parameterize the SQL query that's embedded in this URL-line. And while an HTML table can be a quite regular and parseable representation of SQL data, it lacks the self-descriptive qualities of an XML representation. So, in this reporting system, every report page comes with a Download XML button that invokes a parallel URL-line, which returns an XML results package:

/stats/reportXML?REPORT=1&DATE_1=08/01/2001&DATE_2=08/31/2001&ACCOUNT_ID=93
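Re-parameterizing these URL-lines takes only a few lines of code. A minimal sketch, assuming only what the two URL-lines above show: the same query string works against `/stats/report` (HTML) and `/stats/reportXML` (XML). The `reparameterize` helper is hypothetical, the paths are left host-relative, and nothing is fetched.

```python
from urllib.parse import urlencode, urlsplit, parse_qsl, urlunsplit

BASE = "/stats/report?REPORT=1&DATE_1=08/01/2001&DATE_2=08/31/2001&ACCOUNT_ID=93"

def reparameterize(url, xml=False, **overrides):
    """Return url with query parameters replaced; optionally switch to reportXML."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))  # keeps the original parameter order
    query.update({k: str(v) for k, v in overrides.items()})
    path = (parts.path + "XML") if xml else parts.path
    # Keep "/" unescaped so the dates read as they do in the original URL-line.
    return urlunsplit(("", "", path, urlencode(query, safe="/"), ""))

# Account 93 for a different date range, in the XML flavor.
print(reparameterize(BASE, xml=True, DATE_1="09/01/2001", DATE_2="09/30/2001"))

# Accounts 1 through 92 for the original date range.
urls = [reparameterize(BASE, ACCOUNT_ID=n) for n in range(1, 93)]
print(urls[0])
```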

It is certainly possible to XML-ize these queries. But is it useful to do so? The input/output symmetry of XML-RPC and SOAP matters when you are passing around complex nested data structures. In a great many situations, though, people (and programs) prefer to issue simple requests that may yield complex results.

Here's another example that takes us full circle. Rael Dornfest's RSS viewer, Meerkat, offers a rich URL-line API that is also available by way of XML-RPC. Rael's article on Meerkat's XML-RPC API demonstrates a PHP XML-RPC request asking Meerkat for 3 days' worth of XML or Java stories from channel 724:

include("xmlrpc.inc");  // PHP XML-RPC library; defines xmlrpcmsg and xmlrpcval
$f = new xmlrpcmsg("meerkat.getItems",
  array(
    new xmlrpcval(
      array(
        "channel" => new xmlrpcval(724, "int"), 
        "search" => new xmlrpcval("/XML|[Jj]ava/", "string"),
        "time_period" => new xmlrpcval("3DAY", "string"),
        "ids" => new xmlrpcval(0, "int"),
        "descriptions" => new xmlrpcval(200, "int"),
        "categories" => new xmlrpcval(0, "int"),
        "channels" => new xmlrpcval(0, "int"),
        "dates" => new xmlrpcval(0, "int"),
        "num_items" => new xmlrpcval(5, "int"),
      ), 
      "struct"
    )
  )
);

Here's the same request as a URL-line:

http://www.oreillynet.com/meerkat/?_fl=xml&s=XML|[Jj]ava&c=724&t=3DAY&_de=200

If you subtract the _fl=xml from this URL-line, you'll produce a normal interactive HTML page. If you put _fl=xml back, you'll select Meerkat's XML flavor. Results then depend on your browser. In Netscape 4.x, you'll be prompted to download an XML file. In MSIE, you'll see the XML directly. This is exactly equivalent to my reporting example: asymmetric use of a compact URL-line to request a richly-structured response.
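Because of the `_fl` switch, one parameter set yields both flavors. A minimal sketch that builds either URL-line; the parameters are copied from the URL-line above, the `meerkat_url` helper is hypothetical, and nothing is fetched.

```python
from urllib.parse import urlencode

MEERKAT = "http://www.oreillynet.com/meerkat/"

# Query parameters copied from the URL-line above.
query = {"s": "XML|[Jj]ava", "c": 724, "t": "3DAY", "_de": 200}

def meerkat_url(params, xml=False):
    """Build a Meerkat URL-line; adding _fl=xml selects the XML flavor."""
    if xml:
        params = {"_fl": "xml", **params}
    # Leave "|" and "[]" unescaped to match the URL-line as printed above;
    # a stricter client might percent-encode them instead.
    return MEERKAT + "?" + urlencode(params, safe="|[]")

print(meerkat_url(query))            # interactive HTML page
print(meerkat_url(query, xml=True))  # Meerkat's XML flavor
```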

REST vs RPC

I have been aware of this asymmetry for some time but, as I've said, only subliminally. What prompted me to write this column was an entry in Tim O'Reilly's weblog, entitled REST vs RPC. He refers to discussion, on the decentralization and FoRK mailing lists, about how RPC (remote procedure call) technologies like XML-RPC and SOAP relate to REST (Representational State Transfer), which Roy Fielding and Richard Taylor, in their paper Principled Design of the Modern Web Architecture, say is "the unpublished rationale behind the modern Web's architectural design."

Here's some of what Fielding and Taylor say about REST:

A distributed hypermedia architect has only three fundamental options: 1) render the data where it is located and send a fixed-format image to the recipient; 2) encapsulate the data with a rendering engine and send both to the recipient; or, 3) send the raw data to the recipient along with metadata that describes the data type, so that the recipient can choose their own rendering engine.

REST provides a hybrid of all three options by focusing on a shared understanding of data types with metadata, but limiting the scope of what is revealed to a standardized interface. REST components communicate by transferring a representation of the data in a format matching one of an evolving set of standard data types, selected dynamically based on the capabilities or desires of the recipient and the nature of the data.

REST's architectural style, which the paper elaborates in detail, was chosen to satisfy a specific goal: the construction of a distributed hypermedia system -- "a shared information space," as Tim Berners-Lee has said, "through which people and machines could communicate."

What people have begun asking, lately, is whether the RPC-based architectural style at the heart of the web services movement involves too much overhead, and whether it's in fact the wrong approach.

I don't think that's the case. But I do think we should pay attention to the asymmetry I've noted here. HTML screen scraping is a really stupid way to consume web services. There's no question in my mind that most of the hard-wired HTML that represents the current population of web services would be better recast as XML, rendered as HTML either server-side or client-side when needed, or else delivered straight for downstream pipelining. Some of those services should also get the full RPC treatment, either because complex inputs as well as complex outputs are involved, or because communication is more of the machine-to-machine flavor than the machine-to-person or person-to-machine flavor. This isn't, though, an either/or dichotomy. As Meerkat proves, a rich URL-line API is fully compatible with an RPC-style API. The truth is that we need both, and ideally our development tools should make it trivial to support both.
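The recasting described above can be sketched in a few lines: keep one XML representation of the data, and either hand it downstream as-is or render it as HTML for a person. A sketch under stated assumptions: the record fields (`account`, `visits`), the sample row, the `render` helper, and the flavor names are all hypothetical; the point is only that the HTML is derived from the XML rather than hard-wired.

```python
import xml.etree.ElementTree as ET

def to_xml(rows):
    """Build an XML representation of report rows (field names are hypothetical)."""
    report = ET.Element("report")
    for row in rows:
        item = ET.SubElement(report, "row")
        for name, value in row.items():
            ET.SubElement(item, name).text = str(value)
    return report

def to_html(report):
    """Derive an HTML table from the XML representation."""
    table = ET.Element("table")
    for item in report:
        tr = ET.SubElement(table, "tr")
        for field in item:
            ET.SubElement(tr, "td").text = field.text
    return table

def render(rows, flavor="html"):
    """Return XML for downstream programs, or HTML for a browser."""
    report = to_xml(rows)
    out = report if flavor == "xml" else to_html(report)
    return ET.tostring(out, encoding="unicode")

rows = [{"account": 93, "visits": 12}]  # hypothetical sample data
print(render(rows, "xml"))  # <report><row><account>93</account><visits>12</visits></row></report>
print(render(rows))         # <table><tr><td>93</td><td>12</td></tr></table>
```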

You don't have to read a dissertation on web architecture to know this. You just need to send somebody a map URL that pinpoints the location of a meeting. Such fluid interaction among people and machines, when it works right, demonstrates what Berners-Lee meant the web to be.


Jon Udell (http://udell.roninhouse.com/) was BYTE Magazine's executive editor for new media, the architect of the original www.byte.com, and author of BYTE's Web Project column. He is the author of Practical Internet Groupware, from O'Reilly and Associates. Jon now works as an independent Web/Internet consultant. His recent BYTE.com columns are archived at http://www.byte.com/tangled/

This work is licensed under a Creative Commons License.