Gathering and exchanging email threads

Twice in the past few weeks, once for business and once for a personal matter, I've had to collect and transmit a set of related email threads. In both cases, the Gmail query that produced these collections searched tags (what Gmail calls labels) as well as Subject: or From: fields. For example, one of the queries looked like this in Gmail's user interface:

in:school teacher1 teacher2
where in:school refers to the virtual folder created by assigning the school tag, and teacher1 and teacher2 are the names of teachers.

Gmail is terrifically handy for this kind of thing. I've gone back and reviewed three popular mail clients -- Outlook 11, OS X Mail, and Thunderbird -- and it's not at all clear how to achieve the same result using those programs. It would be interesting to collect and compare recipes for doing this, actually. If you have such recipes and post them with a link back to this entry, the collection will assemble itself automatically.

Gmail's search result wasn't the final solution, though. There's no way to expand the results so that you can review a collection that spans multiple threads, as I needed to do. That being the case, there's obviously also no way to share a reviewable instance of the multi-thread collection with other people, as I also needed to do.

I started opening up messages, and copying and pasting, and after a short while slapped myself upside the head. It's 2006 and we're still playing this game? Insane. The result would only be the kind of document we all dread: a sequential dump of messages whose contextual relationships must be inferred by the reader.

Well, I could at least automate the extraction. Here's the elegantly simple script I used to do that:

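import libgmail, getpass, time

query = 'in:school teacher1 teacher2'

# Prompt for the password rather than hard-coding it.
pwd = getpass.getpass("pwd: ")

ga = libgmail.GmailAccount('jonudell', pwd)
ga.login()

# The query returns threads; each thread yields its messages.
threads = ga.getMessagesByQuery(query)
mbox = []

for thread in threads:
  for msg in thread:
    print msg.id, msg.number, msg.subject
    # Normalize line endings and trim leading whitespace from the raw source.
    source = msg.source.replace("\r","").lstrip()
    # mbox wants a "From " separator line before each message;
    # the date here is only a placeholder.
    mbox.append("From - Thu Jan 22 22:03:29 1998\n")
    mbox.append(source)
    mbox.append("\n\n")
    # Pause between fetches to keep the request rate low.
    time.sleep(1)

out = open('mbox','w')
out.writelines(mbox)
out.close()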

This is possible thanks to libgmail, a third-party Gmail API that's so nicely done I consider it a work of art. In using it, of course, I'm violating Gmail's terms of service and risking a lockdown in sector 4. But infrequent queries that yield small result sets can evidently fly under the radar.

The output of this script is, however, merely the aforementioned sequential dump, in mbox format. How can you recover, and more importantly share, the interactive experience that email clients provide in their threaded views?
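The threading information isn't actually lost in that dump; it's sitting in the headers. Here's a minimal sketch, in Python 3 with only the standard library, of how a reply tree might be rebuilt from Message-ID and In-Reply-To. The function names build_threads and outline are made up for illustration, and real clients typically also consult the References header and fall back on subject matching, which this sketch doesn't attempt.

```python
from collections import defaultdict

def _msg_id(msg):
    """Return the message's Message-ID with whitespace trimmed, or ''."""
    return (msg["Message-ID"] or "").strip()

def build_threads(messages):
    """Group messages into reply trees using Message-ID and In-Reply-To.

    Returns (roots, children): roots is a list of messages whose parent
    is absent from the collection; children maps a Message-ID to the
    messages that reply to it.
    """
    messages = list(messages)
    known_ids = set(_msg_id(m) for m in messages)
    known_ids.discard("")
    roots, children = [], defaultdict(list)
    for m in messages:
        parent = (m["In-Reply-To"] or "").strip()
        if parent in known_ids:
            children[parent].append(m)
        else:
            # No parent header, or the parent isn't in this collection.
            roots.append(m)
    return roots, children

def outline(roots, children, depth=0):
    """Return the thread structure as indented 'From: Subject' lines."""
    lines = []
    for m in roots:
        lines.append("  " * depth + "%s: %s" % (m["From"], m["Subject"]))
        lines.extend(outline(children.get(_msg_id(m), []), children, depth + 1))
    return lines
```

Assuming the script's output file is named mbox, something like build_threads(mailbox.mbox('mbox')) from the standard mailbox module should supply the messages, and printing the lines from outline gives a rough text-mode version of a threaded view.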

One solution is to simply import the mbox file. Except it's not so simple. Although OS X Mail and Thunderbird can do it, I don't think Outlook can, at least not directly, though I'd like to be proven wrong. But in any event, receiving an mbox file, along with instructions for importing it into a variety of email clients, is a solution only geeks will love.
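Whichever client does the importing, the file has to be well-formed mbox first, and one classic trap is a body line that begins with "From ", which a reader can mistake for a message separator. As a hedged sketch, here's how Python 3's standard mailbox module could write the file instead of hand-assembling it; as far as I know, its mbox class adds the separator lines and escapes such body lines itself. The helper name write_mbox is hypothetical, and the messages are assumed to be raw bytes.

```python
import mailbox

def write_mbox(path, raw_messages):
    """Append raw messages (bytes) to the mbox file at path.

    Returns the number of messages added. Locking is omitted here on the
    assumption that nothing else is writing to the file.
    """
    box = mailbox.mbox(path)
    count = 0
    try:
        for raw in raw_messages:
            # add() writes a "From " separator and escapes body lines
            # that start with "From " so they don't split the message.
            box.add(raw)
            count += 1
    finally:
        box.close()  # flushes pending writes and closes the file
    return count
```

Reading the file back with mailbox.mbox(path) and counting the messages is a cheap check to run before sending it to anyone.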

A slightly less user-hostile approach is to produce an HTML archive. Many years ago I first used mhonarc for this purpose. I just checked: it's still an active project, and the current version works just as I remember.

The problem with this approach, though, is that mhonarc writes a collection of HTML files -- one per message, plus index pages. That's fine for web archives, but how are normal folks expected to package up these collections and pass them around? Zipping and unzipping are, again, solutions that will appeal mainly to geeks. Normal folks would like to be able to use a single compound document. And actually, so would I.

So here's a LazyWeb request. One of the subtler aspects of the AJAX revolution is the newfound ability to do amazing things with standalone HTML files. I've mentioned the concept of SPADE -- that is, Single-Page Application And Development Environment. And if you haven't seen TiddlyWiki you should take a look. It's a self-contained Wiki. All the code and all the data live in a single transportable HTML file.

We've long suffered from the lack of a standard web-friendly compound document format. Now that AJAX has finally gone mainstream, we can revisit this problem and solve it using techniques that we formerly had to rule out. Single-page applications like TiddlyWiki, or my InfoWorld explorer, are possible now.

My LazyWeb request, therefore, is for a single-page mbox viewer that works cross-browser, with all the expected features: index views, forward and backward chaining, outliner-style expand/collapse, search, and a print-friendly view. It'd be handy, don't you think?
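To make the request a little more concrete, here's a rough sketch of the smallest piece of it: turning a set of messages into one self-contained HTML file with outliner-style expand/collapse. It's Python 3, standard library only, and it leans on the HTML details and summary elements, which current browsers support but which are an assumption of this sketch rather than anything available when the request was made. The names body_text and render_page are invented for illustration; search, forward and backward chaining, and a print view are not attempted.

```python
import html
from collections import defaultdict

def body_text(msg):
    """Return the first text/plain part of a message, decoded to str."""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            try:
                return payload.decode(charset, errors="replace")
            except LookupError:
                # Unknown charset name; fall back to UTF-8.
                return payload.decode("utf-8", errors="replace")
    return ""

def render_page(messages, title="Email threads"):
    """Render messages as one HTML page of nested, collapsible replies."""
    messages = list(messages)
    ids = {(m["Message-ID"] or "").strip() for m in messages}
    ids.discard("")
    roots, replies = [], defaultdict(list)
    for m in messages:
        parent = (m["In-Reply-To"] or "").strip()
        # A message is a reply only if its parent is in this collection.
        (replies[parent] if parent in ids else roots).append(m)

    def node(m):
        summary = html.escape("%s: %s" % (m["From"], m["Subject"]))
        key = (m["Message-ID"] or "").strip()
        kids = "".join(node(k) for k in replies.get(key, []))
        return ("<details><summary>%s</summary><pre>%s</pre>%s</details>\n"
                % (summary, html.escape(body_text(m)), kids))

    return ("<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\">"
            "<title>%s</title>"
            "<style>details details{margin-left:2em}</style></head>\n"
            "<body>\n%s</body></html>\n"
            % (html.escape(title), "".join(node(m) for m in roots)))
```

Writing the returned string to a single .html file gives something that can be attached to an email and opened with a double-click, which is the property that matters for normal folks.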


PS: I think Gmail should hire the libgmail team, make libgmail an officially supported API, and tie it in with all the extraordinary Greasemonkey work that's been done for Gmail. The combination of these things is as potent in the email domain as the mapping stuff is in its realm.

Meanwhile, I'd really like to know how to export my complete mail archive without using libgmail in a way that violates Gmail's terms of service. Perhaps Gmail's product manager Keith Coleman, whose first and only blog entry is here, can post a second entry when he finds the answer.


Former URL: http://weblog.infoworld.com/udell/2006/02/07.html#a1383