The present and future value of Python

Jon Udell
Vancouver Python Workshop, Summer 2004

To get ready for this talk, I took an inventory of the various ways in which I use Python on a regular basis. For each of these cases, I asked myself what Python is doing that's special, and why I think it's special. Here's the list of use cases, in no particular order.

Exhibit 1. Zope.

I don't know how many of you might have come to Python by way of Zope, but that's how I got here. The notion that you could write a full-fledged application server, with a really slick through-the-web management interface, and do it in a "scripting language" -- this was a novelty back when I started with Zope about five years ago. In fact, it's still kind of a novelty. What grabbed me at the start was the idea of discoverability. You could do useful things right out of the box with Zope, and then you could gradually peel back the layers. First you'd learn about writing extensions in Python. Later, you'd drill down into the core, which was itself written in Python.

Although that's all still true today, it's not easy to explain how the special relationship between Python and Zope contributes to the -- pardon the expression -- "value proposition" of Zope. But there's clearly something important about that relationship, and I'll come back to that point.

I started writing Zope applications in DTML, which is Zope's funky Document Template Markup Language, and then I switched to Python when Evan Simpson's sandbox became available. Once you could put your Python scripts into the ZODB, instead of keeping them on the filesystem, I never looked back. Later some other stuff came along -- Zope Page Templates, the Content Management Framework -- but since I haven't done anything new in Zope for a while, I haven't learned these things. What I do have is an application, used by a small group of people, that's mostly Python scripts I wrote three or four years ago. Every once in a while, I dive in there to tweak them, and it's always easy to get up to speed. I can't say the same for some things I wrote years ago in other languages, and that's my best example of the readability benefit claimed for Python.

I know I'm preaching to the converted here when I talk about code readability, but if you ever find yourself preaching to the unconverted, here's an interesting data point. Somebody just sent me a copy of the new edition of Steve McConnell's book, Code Complete. The two longest chapters in the book -- together they add up to 90 pages -- are about styles of formatting and commenting code. You wouldn't think that how we arrange symbols on the page could matter so much, but clearly it does. And not having more than one way to do it turns out to be a fantastically useful constraint. I've seen the T-shirt that says "Life is better without braces," and I have to agree.

Exhibit 2. SpamBayes.

The advent of Bayesian filtering has been a life-changing experience for a lot of people, me included. I use it, when I'm running Windows, by way of Mark Hammond's excellent Outlook plugin. In addition to just using that plugin, I wound up hacking Mark's Python wrappers for Win32 and MAPI to do other stuff with my Outlook mail store. I've worked with a few different encapsulations of that hideous MAPI beast over the years, and Mark's struck me as being the cleanest. One explanation might be that Python just naturally lends itself to making things clean. Another might be that people who have an instinct and a talent for making things clean just naturally tend to gravitate to Python. There's probably some truth in both of those explanations.

When I was fiddling around with the MAPI stuff in Python, I made extensive use of Python's interactive shell. I've always been inclined that way. The first real application programming I ever got paid for was in a variant of Lisp. It took me a while to get up to speed with that way of using Python, but now it's a habit. Perl users point out that you can use the Perl debugger in a similar way, and that's true, but clearly Perl doesn't encourage the interactive approach as strongly as Python does.

Exhibit 3. One-off scripts.

Perl used to be the first tool I'd reach for when I needed to whip together a one-off script. Nowadays, I reach first for Python, even for what I used to think of as Perlish tasks. For example, even though this is the Year of Web Services -- or maybe that was last year, I lose track -- I still wind up having to scrape and parse and reformulate Web pages a lot more often than I'd like. For doing that kind of job, I used to rely mainly on regular expressions. At some point Python's regex engine reached parity with Perl's, and that was a significant milestone.
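A minimal sketch of the regex-first scraping style, using only Python's standard `re` module. The page fragment and the `scrape_links` helper are hypothetical, and the pattern assumes simple, well-behaved markup (no nested tags inside a link), which is exactly the assumption that makes this approach brittle on real pages.

```python
import re

# Hypothetical page fragment, standing in for a scraped Web page
html = """
<ul>
  <li><a href="/talks/vancouver.html">Vancouver keynote</a></li>
  <li><a href='/talks/ipc8.html'>IPC8 Zope track</a></li>
</ul>
"""

# Capture the href value (single- or double-quoted) and the link text.
# Assumes href is the only attribute and no tags are nested inside <a>.
LINK_RE = re.compile(
    r"""<a\s+href=(["'])(?P<href>.*?)\1\s*>(?P<text>.*?)</a>""",
    re.IGNORECASE | re.DOTALL,
)


def scrape_links(page):
    """Return (href, text) pairs for every simple anchor in the page."""
    return [(m.group("href"), m.group("text").strip()) for m in LINK_RE.finditer(page)]


for href, text in scrape_links(html):
    print(href, "->", text)
```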

Over time, though, the Web-page-scraping task has morphed into an interesting hybrid of regex and XML processing. Lately I get a lot of mileage out of converting Web pages to XHTML and then navigating them with XPath queries. I really like using libxml for this job, and since I do reach for Python first nowadays, I'm hooked on the Python bindings to libxml. But to be honest, I'm not sure there's any special Pythonic virtue here that you wouldn't find in Perl, or Ruby, or some other environment. Except, maybe, for Python's interactivity, which I've already mentioned.
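The XHTML-plus-XPath style can be sketched as follows. To keep the example free of dependencies, it swaps in the standard library's `xml.etree.ElementTree`, which supports only a limited XPath subset, in place of the libxml2 bindings mentioned above. The document and the `entry` class name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A small XHTML document (hypothetical), as if already converted from tag soup
xhtml = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div class="entry"><a href="/a.html">First</a></div>
    <div class="sidebar"><a href="/b.html">Second</a></div>
    <div class="entry"><a href="/c.html">Third</a></div>
  </body>
</html>"""

# XHTML elements live in a namespace, so queries need a prefix mapping.
NS = {"h": "http://www.w3.org/1999/xhtml"}

root = ET.fromstring(xhtml)

# ElementTree's XPath subset includes attribute-value predicates such as
# [@class='entry'], which is enough to pick out links by page region.
entry_links = [
    a.get("href")
    for a in root.findall(".//h:div[@class='entry']/h:a", NS)
]
print(entry_links)  # ['/a.html', '/c.html']
```

With full XPath (libxml2, or a similar library) the same query can use richer axes and functions; the namespace mapping step is needed either way.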

Exhibit 4. Simple Web applications and services.

Lately, when I deploy Web applications on my home server, I tend to write them as simple extensions of the classes in Python's BaseHTTPServer module. This isn't super-scalable, but most of my stuff doesn't need to be. I like being able to express a complete service in a page of code that I can easily publish, and that somebody else can easily use without having to deal with a zillion dependencies. Perhaps naively, I also think this is a reasonably secure approach. If an Apache or IIS or Zope security patch rolls out, I don't have to scramble to deal with it, or feel guilty for not scrambling to deal with it.
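In current Python the BaseHTTPServer module lives on as `http.server`. Here is a minimal sketch of a complete one-page service in that style, assuming Python 3; the `/hello` endpoint, `HelloHandler`, and `make_server` are hypothetical names chosen for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


class HelloHandler(BaseHTTPRequestHandler):
    """A one-page service: GET /hello?name=X returns a plain-text greeting."""

    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/hello":
            self.send_error(404, "Not Found")
            return
        # parse_qs maps each key to a list of values; take the first one.
        name = parse_qs(url.query).get("name", ["world"])[0]
        body = f"Hello, {name}!\n".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=8000):
    # Bind to localhost only; this server handles one request at a time.
    return HTTPServer(("127.0.0.1", port), HelloHandler)
```

To run it, call `make_server().serve_forever()` and visit `http://127.0.0.1:8000/hello?name=Vancouver`. If one-request-at-a-time ever becomes a problem, `http.server.ThreadingHTTPServer` (Python 3.7 and later) can be swapped in for `HTTPServer` without other changes.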

I'm really hooked on the ability to do things in a simple, portable, and self-contained way. One recent experiment of mine, for example, is an MP3 clipping service. You send it the URL of an MP3 file that's sitting on an HTTP 1.1-capable server, along with start and stop times, and it sends back the piece of the audio file that you asked for. My implementation Does The Simplest Thing That Could Possibly Work. And amazingly, it does work, somehow, thanks to the resilience of the MP3 format and the tolerance of applications that play MP3 files.
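One guess at how a simplest-possible clipper could work (not the author's actual implementation): map the start and stop times to byte offsets, then ask the remote server for just that slice with an HTTP/1.1 `Range` header. The sketch assumes a constant-bitrate file and ignores any leading ID3 tag and frame boundaries; `clip_byte_range` and `fetch_clip` are hypothetical names.

```python
import urllib.request


def clip_byte_range(start_s, stop_s, bitrate_kbps=128):
    """Map a time window to an inclusive byte range, assuming constant bitrate.

    Ignores any ID3 tag and MP3 frame alignment; players generally
    resynchronize on the next frame header, which is what lets a crude
    byte slice play at all.
    """
    if stop_s <= start_s:
        raise ValueError("stop must be after start")
    bytes_per_second = bitrate_kbps * 1000 // 8
    first = int(start_s * bytes_per_second)
    last = int(stop_s * bytes_per_second) - 1  # HTTP byte ranges are inclusive
    return first, last


def fetch_clip(url, start_s, stop_s, bitrate_kbps=128):
    """Request only the bytes covering [start_s, stop_s) from an HTTP/1.1 server."""
    first, last = clip_byte_range(start_s, stop_s, bitrate_kbps)
    req = urllib.request.Request(url, headers={"Range": f"bytes={first}-{last}"})
    with urllib.request.urlopen(req) as resp:
        # 206 Partial Content means the server honored the Range header;
        # a plain 200 means it ignored it and is sending the whole file.
        if resp.status != 206:
            raise RuntimeError(f"server ignored Range request (status {resp.status})")
        return resp.read()
```

For a 128 kbps file (16,000 bytes per second), a clip from 10 s to 20 s becomes the request header `Range: bytes=160000-319999`.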

There are a lot of things that could be done to improve on my hack, but I'm not the guy to do them. I'm more of an idea person. It'll occur to me that it's possible to do something -- for example, to put up a service that lets people form URLs that quote segments from remote MP3 files -- and I'll do a simple implementation. If you want to work with that idea, you don't have to worry about putting up Tomcat, or ASP.NET, or Zope, or some other complicated thing that you might or might not have. Because Python includes everything you need in a self-contained and portable kit, it knocks down barriers that would otherwise get in the way of the free flow of ideas and implementations. This isn't a uniquely Pythonic virtue -- most so-called scripting languages share this quality of being self-contained and complete -- but it's been a really important thing to me.

Exhibit 5. Jython.

I don't do a whole lot of work with Java libraries, but when I do, I like to approach them from the Jython perspective. When you're exploring a new environment you need to write a lot of little test cases, and writing them in Java seems to take forever. Python saves a lot of typing and keeps things easier to read. The downside is the mental translation you have to do in order to use Java APIs from Jython, which is an issue I don't fully understand. I've also run into some Java libraries that Jython can't swallow, again for reasons I haven't figured out.

When Jython has worked well for me, though, I've really enjoyed having interactive access to Java objects. There's that point about interactivity again. It must be important.

Language versus environment

These five examples paint a picture of a programming environment that can be used for everything from a one-off script to a good-sized object-oriented system like Zope. I'm reluctant to introduce the word dynamic -- because it's gotten pretty badly overloaded -- but let's just say that the agile and interactive nature of Python makes it conducive to an exploratory style of development. The readability of the code makes it easier to work with, both during the initial exploration and later. And the standard libraries are complete enough to cover a wide range of use cases.

So, should we define Python's value as the set of things that Python does especially well (if not always uniquely), plus a community of users who happen to care about that particular set of things, plus the software that community has used Python to create? That's a reasonable first cut. It defines a niche inhabited by enthusiasts, like the people I see here and that I run into at other Python events. I happen to think, though, that the world would be a better place if the Pythonic virtues were not confined to a ghetto. I recently interviewed a CTO whose team does enterprise development using .NET. He's never touched Python, but some of his people do use it. Not to build their "real" stuff, mind you, but to automate the processes around that. He told me that he sees the productivity his people are getting out of Python, but that it's a secondary benefit for them, not a primary one.

The same thing appears to be true in the J2EE world. People write the "real" apps in Java, and then -- maybe -- they use Jython to automate building and testing. I've been looking at this issue for a long time, and I've come to the conclusion that it's mainly a tribal thing. For example, here's a quote from Steve Vinoski, a middleware guy at IONA, from an article he wrote called "Middleware Dark Matter":

"The mass of the middleware universe is much greater than the systems -- such as message-oriented middleware (MOM), enterprise application integration (EAI), and application servers based on Corba or J2EE -- that we usually think of when we speak of middleware. We tend to forget or ignore the vast numbers of systems based on other approaches. We can't see them, and we don't talk about them, but they're out there solving real-world integration problems -- and profoundly influencing the middleware space. These systems are the dark matter of the middleware universe."

On my blog, I turned that around like this:

"The mass of the middleware universe is much greater than the systems -- based on Perl, Python, CGI, FTP, Unix shell, and Visual Basic -- that we usually think of when we speak of middleware. We tend to forget or ignore the vast numbers of systems based on other approaches such as message-oriented middleware (MOM), enterprise application integration (EAI), and application servers based on Corba or J2EE. We can't see them, and we don't talk about them, but they're out there solving real-world integration problems -- and profoundly influencing the middleware space. These systems are the dark matter of the middleware universe."

The hilarious thing is that both of these statements ring true for some audience. Dark matter is in the eye of the beholder. It's like a Necker cube: I can see it both ways, depending on how I squint. What I'd like to do is find some way to fuse these images, not only in my brain but in everyone else's brain too. In order to make that happen, I suspect that some things that have been tightly bundled together in Python -- the language, the virtual machine, and the libraries -- may wind up getting taken apart and put back together in different configurations.

Think about how modules work. The level of effort that goes into creating an XML wrapper, and an LDAP wrapper, and a database wrapper, and every other kind of wrapper -- one of each for Perl, and Python, and Ruby, and every other language -- has always bothered me. If that's the itch that you like to scratch, I shouldn't complain. Lord knows I'm grateful to those of you who have scratched those itches in ways that have helped me. But I wish more of your intellectual labor was being spent on new services, and less on cranking out bindings to existing ones.

Meanwhile, the services that have traditionally needed to be wrapped in this way aren't standing still. They're migrating out of C libraries and into class libraries, or frameworks, built with managed code. These managed frameworks come in two primary flavors: Java and .NET. It used to be easier to justify the decision to ignore these class libraries. A lot of important stuff was just plain missing. Java, for example, didn't acquire its regular expression package until, I think, JDK 1.4, which is still the current version of the JDK. But times change. The Java libraries include lots of good stuff now, and they're moving forward. The same is true for the .NET Framework, and for Mono's version of it.

People always say that CPAN accounts for much of the value of Perl. The same can be said of Python's collection of modules. There are two interesting cases to consider here. First, the case where the module wraps an operating-system component, or some other low-level thing that's written in C or C++, for example libxml. These bindings bridge the managed world of the dynamic language to the unmanaged world of the low-level component. This is so valuable that we'll happily accept even an imperfect binding. With libxml's Python binding, for example, I still have to explicitly release XML objects that I've allocated. It's very un-Pythonic, but I overlook that because being able to walk around in XML structures from a Python perspective is so useful.

What happens, though, when the equivalent of libxml is a managed OS service, as is true for the XML APIs in .NET, for example? Today, you can use those APIs from C# or VB.NET or another .NET language. It's not as flexible as using libxml from Python, but it's a lot more flexible than using libxml from C.

Of course I want the best of both worlds: the flexibility of Python, and the seamlessness that comes from binding to components that are themselves managed. So here's an interesting question. In a world of .NET or Mono or Java APIs, how much extra value comes from accessing those APIs by way of Python, rather than by way of Java or C#? For some people, not much. For me, a lot. But either way, Python's value proposition is diluted somewhat, compared to the value it had in a world where you only had unmanaged components.

Then there's the second case, where a Python module is built purely in Python. There are tons of these, and they're incredibly valuable. Would the Java and .NET managed frameworks be more complete if parts of them could be written in Python? Would they evolve more rapidly? Would they hang together as well? I'd like to think so, and you probably would too. Here's where we circle back to the Zope example. On the one hand, Zope is a black-box component that provides a bunch of useful services. It provides them to any consumer by way of its Web APIs, and it provides them to Python scripts by way of its Python APIs. On the other hand, the black box that we call Zope is written, internally, in Python. What's the significance of that? I can't imagine Zope not having been written in Python. I just don't think you'd have gotten the same result if the same small band of developers had been writing C, or Java, or even C#, for the same period of time. If somebody who doesn't know Python and Zope challenged me to prove that assertion, I don't know how I would; it's just my belief.

Of course the world that Python and Zope live in keeps changing. Five years ago I gave the keynote talk at the Zope track of the Eighth International Python Conference in DC. At the time, I was suggesting that ZODB, the Zope Object Database, should become some kind of object/relational/XML hybrid. Since then, the leading relational engines have moved strongly in that direction -- but Zope hasn't. When the Chandler project, which originally intended to use ZODB as its persistence engine, wound up instead using Berkeley DB XML, I wasn't too surprised. I've been using Berkeley DB XML myself, for a series of XML microcontent experiments I've been doing over the last year or so. For these kinds of things, a language-neutral XML database turned out to be more useful than a Python-oriented object database.

The endgame here is a hybrid data engine with object, relational, and XML surfaces. Could you build such a thing in Python? I don't see why not. If you can build a scalable high-performance object database like ZODB in Python, I'll bet you can build the kind of hybrid I'm talking about. Of course, there's not an infinite supply of Jim Fultons. And a lot of companies are chasing the universal database holy grail. Oracle and IBM have gotten pretty far down that road already. At the other end of the commercial spectrum, OpenLink Software's Virtuoso has been delivering the goods for a couple of years now. In the open source world, I'm not sure where things stand. Postgres and ZODB and MySQL and Berkeley DB XML are all pieces of the puzzle, but I don't see any plan for fitting them together.

And then there's WinFS, Microsoft's reincarnation of the Cairo object filesystem from a decade ago. My guess is that whenever Longhorn finally does ship, WinFS will do what Microsoft's stuff always eventually does: deliver the 80/20 solution in a package that lands on a whole lot of machines. I'm not sure what the Java equivalent of WinFS is going to be, but I'll bet there will be one, and I'll bet it'll run on cellphones long before WinFS runs on PocketPCs.

The universal database is just one example of the kind of next-generation platform service that will be used primarily through managed interfaces. As operating systems consolidate around managed interfaces -- to data, to middleware, to graphics -- they're going to tend to prefer the Java and .NET and Mono VMs over the Perl, Python, or PHP VMs. But the agility of the dynamic languages, and the collaborative energy of their open-source communities, will matter more than ever. Injecting these qualities into the mainstream VMs is something I've always thought was crucial.

Now, as many of you have probably heard, Jim Hugunin made two dramatic announcements on Wednesday at the O'Reilly Open Source Conference. Jim's the guy who created Jython, which is Python for the JVM. His first announcement was that IronPython, which is Python for the .NET Common Language Runtime and for Mono, has been released. The second announcement was that Jim starts his new job at Microsoft on Monday, where he'll work on IronPython and help make the CLR friendlier to dynamic languages. I think this is a huge deal. Managed code isn't a panacea, but it's the dominant way of making programming easier and safer. Last month I wrote a blog item with the title: "It's not the J in Java Virtual Machine that matters, it's the VM." For the same reasons there aren't a dozen CPU architectures that matter, I don't think there will be a dozen mainstream VMs. There will be the JVM, there will be the CLR, and -- let's all pray -- there will be a viable non-Windows alternative to the CLR in the form of Mono. And then, maybe, there will be Parrot, one runtime to bind all the open source dynamic languages.

I don't mean to suggest that integration with the mainstream VMs is a survival issue. Python's doing just fine all by itself. BitTorrent, for example, is touching millions of lives. Users of the SpamBayes Outlook plugin have no idea they're running Python. When I was poking around in the Gmail help system the other day, a Python stack trace came spewing out. If Chandler succeeds, it'll be the first major user-facing GUI application written in Python, or indeed in any open source dynamic language, and that's something I've been wanting to see for a very long time.

What I do want to suggest is that, if we can get really good implementations of Python running on the mainstream VMs, Python will be in a position to touch many more millions of lives and, what's equally interesting to me, to influence the evolution of the managed frameworks running on top of those VMs. There hasn't been anybody inside Microsoft who cares about this, but on Monday that'll change. There hasn't been anybody inside Sun who cares about this either, and I don't know when or how that might change. Still, it isn't ultimately up to Sun or Microsoft to make this happen. What they can do, and should do, is lay the foundation. It's up to somebody in the Python community -- maybe somebody in this room -- to build on top of that. So if you're looking for a project that can really make a difference, you might want to consider Jython or IronPython. Any takers?