Don Box and Tennessee Williams

Jon Udell has joined the club of those wanting to expose remote XQuery over the Internet.

I have a feeling that Jon may not have read my security concerns over exposing raw XQuery (and XPath) over a public access point.

I have this feeling because it looks like Jon's engine has already melted down from too many //* queries (it's 11:36 PST and the site is effectively wedged).

When I was on the site earlier today, I did notice that Jon's engine was putting an upper bound on the size of the result set. Unfortunately, it looks as if it is not putting an upper bound on the amount of compute resources a given query can consume.

When I tried a //*-style query earlier this afternoon, the HTTP infrastructure between my house and Jon's server wouldn't let a single HTTP request go that long without returning.

If it was my one query that sent Jon's server over the edge, I'm very sorry.

[Don Box's spoutlet: On the Kindness of Strangers]

Not to worry, Don. I'm aware of the concern, and part of this experiment is about exploring its implications. In fact, the queries that are timing out don't seem to be expensive at all. One possible cause was my single-threaded use of Python's minimal BaseHTTPServer class.

So I switched from:

class myHTTPServer (BaseHTTPServer.HTTPServer):

To:

class myHTTPServer (SocketServer.ThreadingMixIn,
                      BaseHTTPServer.HTTPServer):

However, I think the problem may have been even more basic than that: failing to set the Content-Length header when reporting that a query has exceeded the maximum result-set size. We'll see how it goes now.
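
For anyone who wants to try both fixes together, here is a minimal sketch in Python 3, where BaseHTTPServer and SocketServer have been renamed to http.server and socketserver. It is not the code running on the site: the handler name, the error message, the 400 status, and the demo() helper are all assumptions made for illustration. The point is only that the over-limit response carries a Content-Length header matching the body, and that the server handles each request in its own thread.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from socketserver import ThreadingMixIn


class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
    """Same idea as the mix-in above: one thread per request."""
    daemon_threads = True  # stuck requests should not block interpreter exit


class QueryHandler(BaseHTTPRequestHandler):
    # Hypothetical wording; the real service's message is not shown in the post.
    LIMIT_MESSAGE = b"Query exceeded the maximum result-set size.\n"

    def do_GET(self):
        body = self.LIMIT_MESSAGE
        self.send_response(400)  # status code chosen for illustration only
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        # Tell the client exactly how many body bytes to expect, so it does
        # not have to wait for the socket to close to know the reply is done.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def demo():
    """Start the server on an ephemeral local port, make one request, stop."""
    server = ThreadedHTTPServer(("127.0.0.1", 0), QueryHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", "/")
        resp = conn.getresponse()
        body = resp.read()
        result = (resp.status, resp.getheader("Content-Length"), body)
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo() returns the status, the Content-Length header value, and the body for a single request, which is enough to check that the header and the body length agree.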

As an aside, I've added a canned query that finds blog items written using InfoPath, based on its unique HTML coding signature :-)

The general question of how to constrain an engine's use of resources when exposed to arbitrary queries is, of course, extremely interesting.
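
To make the distinction between capping results and capping work concrete, here is a small sketch. It is not how my engine works, and every name in it (bounded_scan, QueryBudgetExceeded, the budget parameters) is made up for illustration. It walks every element of a document, which is roughly the work a //*-style query demands, and gives up once it has visited too many nodes or run too long, no matter how few results have matched so far.

```python
import time
import xml.etree.ElementTree as ET


class QueryBudgetExceeded(Exception):
    """Raised when a scan uses more work or time than it was allowed."""


def bounded_scan(root, predicate, max_visited=10_000, max_seconds=1.0):
    """Return elements under root matching predicate, within a work budget.

    The budget counts nodes visited, not results returned, so a highly
    selective query over a huge document is still cut off.
    """
    deadline = time.monotonic() + max_seconds
    matches = []
    for visited, elem in enumerate(root.iter(), start=1):
        if visited > max_visited:
            raise QueryBudgetExceeded("node budget exhausted")
        if time.monotonic() > deadline:
            raise QueryBudgetExceeded("time budget exhausted")
        if predicate(elem):
            matches.append(elem)
    return matches


# Tiny hypothetical document: four elements in total, two of them <b/>.
doc = ET.fromstring("<a><b/><b/><c/></a>")
found = bounded_scan(doc, lambda e: e.tag == "b", max_visited=10)

try:
    bounded_scan(doc, lambda e: e.tag == "b", max_visited=3)
    budget_tripped = False
except QueryBudgetExceeded:
    budget_tripped = True
```

A cooperative check like this only helps when you control the loop doing the work. For an engine you can't instrument, the limit would presumably have to be imposed from outside, for example by running each query in a separate process that can be killed.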


Former URL: http://weblog.infoworld.com/udell/2004/01/12.html#a884