Introduction to E4X

I recently met with John Schneider, chief technologist at AgileDelta and lead editor of the ECMAScript for XML (E4X) specification. I've mentioned E4X before, in the context of Alchemy. John's demo convinced me to give E4X a try. I'll say more about why I think it's important in an upcoming column, but meanwhile here's a glimpse of how it works. Warning: geekery ahead!

The implementation I'm using is bound into the 1.6R1 version of Rhino, which is the Mozilla project's Java-based JavaScript interpreter. (The C-based version built into the browser is SpiderMonkey.) If you've never used Rhino before -- I hadn't -- you'll find it has a lot in common with Jython.

To get started with Rhino itself, I found this article by Mike Chambers to be very helpful. Once you've got the basic engine running, you'll need to get hold of xbean.jar (not xmlbeans.jar) from the Apache XMLBeans project, and add that to your Java CLASSPATH. Following Mike's example, I've set up an alias so that typing 'js' invokes the Rhino shell.

Here's a taste of what you can do, with input at the js> prompt and output on the lines that follow:

js> people = <people>
<person id="1">
<person id="2">
</people>                     // raw xml input ends here
<people>                      // parsed xml output begins here
<person id="1">
<person id="2">
js> typeof(people)            // it's a native xml datatype
xml
js> people.person.length()    // which is a list
2
js> people.person[0]          // whose elements are indexable
<person id="1">
js> people.person[0].@id='47' // and mutable!
js> people.person[0]
<person id="47">
js> delete people.person.(@id=='47').@id
js> people.person.(name=='Moe')

Cool! Now for something a bit more realistic: my RSS feeds, as exported by the new Bloglines API. Here's a sample of the format, which extends OPML with some Bloglines-specific information:

<?xml version="1.0" encoding="utf-8"?>
<opml version="1.0">
<title>Bloglines Subscriptions</title>
<dateCreated>Wed, 29 Sep 2004 16:11:52 GMT</dateCreated>
<outline title="Subscriptions">
<outline title=".Text Blog" xmlUrl=""
  htmlUrl="" type="rss"
  BloglinesSubId="2849970"  BloglinesUnread="0" BloglinesIgnore="0" />
<outline title="ACM  Queue" htmlUrl="" type="rss"
  xmlUrl=""  BloglinesSubId="2849972"
  BloglinesUnread="0" BloglinesIgnore="0" />

After assigning that chunk of XML to the JavaScript variable opml, here are some things you can do:

1. Count unread items

js> var tot = 0;
js> for each (count in opml..outline.@BloglinesUnread)
      tot += Number(count);
The .. is like // in XPath, so this snippet selects the BloglinesUnread attribute of every <outline> element at any depth. Accumulating the values takes nothing more than an ordinary loop and a counter.
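
If you don't have an E4X-capable engine handy, the same traversal can be approximated in plain JavaScript. This is only a sketch under stated assumptions: the nested-object tree, the descendants() and totalUnread() helpers, and the unread counts are hypothetical stand-ins made up for illustration, not part of E4X or of the Bloglines export. It models what opml..outline.@BloglinesUnread selects, then sums the values.

```javascript
// Plain-object stand-in for the OPML tree. The shape and the unread
// counts are assumptions for illustration, not real Bloglines data.
const opml = {
  name: "opml",
  attrs: { version: "1.0" },
  children: [
    {
      name: "outline",
      attrs: { title: "Subscriptions" },
      children: [
        { name: "outline", attrs: { title: ".Text Blog", BloglinesUnread: "3" }, children: [] },
        { name: "outline", attrs: { title: "ACM  Queue", BloglinesUnread: "4" }, children: [] },
      ],
    },
  ],
};

// Collect every descendant element with the given name, at any depth.
// This is roughly what opml..outline (or //outline in XPath) selects.
function descendants(node, name, found = []) {
  for (const child of node.children) {
    if (child.name === name) found.push(child);
    descendants(child, name, found);
  }
  return found;
}

// Sum the BloglinesUnread attribute wherever it is present.
function totalUnread(root) {
  let tot = 0; // the counter has to start at zero before accumulating
  for (const outline of descendants(root, "outline")) {
    const count = outline.attrs.BloglinesUnread;
    if (count !== undefined) tot += Number(count);
  }
  return tot;
}

console.log(totalUnread(opml)); // 7
```

The attribute values are strings, just as they are in XML, which is why both the shell session and this sketch pass them through Number() before adding.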

2. Search using a regular expression

js> function matches(s, pat)
  {  return pat.test(s)  }
<outline title="Zope Dispatches" htmlUrl=""
   type="rss" xmlUrl=""
   BloglinesSubId="2850222" BloglinesUnread="0" BloglinesIgnore="0"/>
<outline title="" htmlUrl="" type="rss"
   xmlUrl="" BloglinesSubId="2850223"
   BloglinesUnread="0" BloglinesIgnore="0"/>
This snippet shows that you can use arbitrary JavaScript code to write test expressions. In this case, after writing the function matches() to wrap JavaScript's regex engine, I can use it to keep only the <outline> nodes whose title attribute begins with a capital Z.
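
The idea of filtering with an arbitrary predicate can be sketched in plain JavaScript as well. Again this is an approximation rather than E4X: the outlines array is a hypothetical stand-in for the <outline> nodes, and Array.prototype.filter plays the role of the E4X filter expression. Only matches() comes from the session above.

```javascript
// Thin wrapper around the regex engine, as defined in the shell session.
function matches(s, pat) {
  return pat.test(s);
}

// Hypothetical stand-ins for <outline> nodes; only the title matters here.
const outlines = [
  { title: ".Text Blog" },
  { title: "ACM  Queue" },
  { title: "Zope Dispatches" },
];

// Keep the entries whose title begins with a capital Z.
const startsWithZ = outlines.filter((o) => matches(o.title, /^Z/));

console.log(startsWithZ.map((o) => o.title)); // [ 'Zope Dispatches' ]
```

Because the predicate is just a function call, any test you can express in JavaScript can stand in for the regular expression.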

Ever since I first heard Adam Bosworth talking about an XML programming language I've been intrigued by the idea. Until now, I've approximated such a thing using Python together with the XPath capabilities of libxml. E4X provides a more unified experience.

The tradeoffs? First, there's no XPath support, which was a "non-goal" according to the spec. Implementations can include XPath, and I'd eventually want mine to, but the idea is to provide an 80/20 solution that does what most people need in a way that most people can easily understand. On those terms, E4X succeeds.

Second, there's no cross-language capability, except to the extent that JavaScript is embeddable in other languages -- for example, Rhino within Java. But if this style of native XML programming catches on, as I'll bet it will, there's no reason why it shouldn't migrate into runtimes such as the JVM and CLR, and from there to the various languages they host.
