I recently met with John Schneider, chief technologist at AgileDelta and lead editor of the ECMAScript for XML (E4X) specification. I've mentioned E4X before, in the context of Alchemy. John's demo convinced me to give E4X a try. I'll say more about why I think it's important in an upcoming column, but meanwhile here's a glimpse of how it works. Warning: geekery ahead!
E4X is implemented in Rhino, Mozilla's Java-based JavaScript engine. To get started with Rhino itself, I found this article by Mike Chambers to be very helpful. Once you've got the basic engine running, you'll need to get hold of xbean.jar (not xmlbeans.jar) from the Apache XMLBeans project, and add that to your Java CLASSPATH. Following Mike's example, I've set up an alias so that typing 'js' invokes the Rhino shell.
Here's a taste of what you can do. Input follows each js> prompt, and the shell's output follows the input:
    js> people = <people>
      <person id="1">
        <name>Moe</name>
      </person>
      <person id="2">
        <name>Larry</name>
      </person>
    </people>  // raw xml input ends here
    <people>   // parsed xml output begins here
      <person id="1">
        <name>Moe</name>
      </person>
      <person id="2">
        <name>Larry</name>
      </person>
    </people>
    js> typeof(people)  // it's a native xml datatype
    xml
    js> people.person.length()  // which is a list
    2
    js> people.person[0]  // whose elements are indexable
    <person id="1">
      <name>Moe</name>
    </person>
    js> people.person[0].@id='47'  // and mutable!
    47
    js> people.person[0]
    <person id="47">
      <name>Moe</name>
    </person>
    js> delete people.person.(@id=='47').@id
    true
    js> people.person.(name=='Moe')
    <person>
      <name>Moe</name>
    </person>
Cool! Now for something a bit more realistic: my RSS feeds, as exported by the new Bloglines API. Here's a sample of the format, which extends OPML with some Bloglines-specific information:
    <?xml version="1.0" encoding="utf-8"?>
    <opml version="1.0">
      <head>
        <title>Bloglines Subscriptions</title>
        <dateCreated>Wed, 29 Sep 2004 16:11:52 GMT</dateCreated>
        <ownerName>email@example.com</ownerName>
      </head>
      <body>
        <outline title="Subscriptions">
          <outline title=".Text Blog"
                   xmlUrl="http://blog.ziffdavis.com/seltzer/rss.aspx"
                   htmlUrl="http://blog.ziffdavis.com/seltzer/"
                   type="rss"
                   BloglinesSubId="2849970"
                   BloglinesUnread="0"
                   BloglinesIgnore="0" />
          <outline title="ACM Queue"
                   htmlUrl="http://www.acmqueue.com/"
                   type="rss"
                   xmlUrl="http://acmqueue.com/rss.rdf"
                   BloglinesSubId="2849972"
                   BloglinesUnread="0"
                   BloglinesIgnore="0" />
        </outline>
      </body>
    </opml>
1. Count unread items
    js> for each (count in opml..outline.@BloglinesUnread) tot += Number(count);
    120

The .. operator is like // in XPath, so this snippet visits the BloglinesUnread attribute of every <outline> element, at any depth. (It assumes that opml already holds the parsed document and that tot starts at 0.) To accumulate the values in a counter, you use a natural programming idiom: an ordinary loop.
2. Search using a regular expression
Ever since I first heard Adam Bosworth talking about an XML programming language I've been intrigued by the idea. Until now, I've approximated such a thing using Python together with the XPath capabilities of libxml. E4X provides a more unified experience.
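For comparison, here is roughly what the Python approximation of the two tasks above might look like. Treat it as a sketch under assumptions rather than the code I actually used: it swaps in the standard library's xml.etree.ElementTree for libxml, the unread counts are made-up values (the sample above shows 0 for both feeds), and the helper names total_unread and search_titles are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Trimmed copy of the Bloglines OPML sample. The BloglinesUnread values
# (3 and 5) are invented for illustration; the original sample has 0 for both.
OPML = """
<opml version="1.0">
  <head><title>Bloglines Subscriptions</title></head>
  <body>
    <outline title="Subscriptions">
      <outline title=".Text Blog"
               xmlUrl="http://blog.ziffdavis.com/seltzer/rss.aspx"
               type="rss" BloglinesUnread="3" />
      <outline title="ACM Queue"
               xmlUrl="http://acmqueue.com/rss.rdf"
               type="rss" BloglinesUnread="5" />
    </outline>
  </body>
</opml>
"""


def total_unread(root):
    """Sum BloglinesUnread over every <outline> at any depth.

    root.iter("outline") plays the role of opml..outline in E4X.
    Outlines without the attribute (the folder) count as 0.
    """
    return sum(int(o.get("BloglinesUnread", "0")) for o in root.iter("outline"))


def search_titles(root, pattern):
    """Return the titles of all outlines whose title matches the regex."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [
        o.get("title")
        for o in root.iter("outline")
        if rx.search(o.get("title", ""))
    ]


root = ET.fromstring(OPML)
print(total_unread(root))             # 8
print(search_titles(root, r"queue"))  # ['ACM Queue']
```

It works, but the XML lives behind a library API instead of being a native datatype, which is the gap E4X closes.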
The tradeoffs? First, there's no XPath support, which was a "non-goal" according to the spec. Implementations can include XPath, and I'd eventually want mine to, but the idea is to provide an 80/20 solution that does what most people need in a way that most people can easily understand. On those terms, E4X succeeds.
Former URL: http://weblog.infoworld.com/udell/2004/09/29.html#a1085