Tangled in the Threads
Jon Udell, September 27, 2000
Whatever became of computer telephony?
CTI (computer/telephone integration) was the Next Big Thing in 1994. It still is. We tend to focus on merging voice and data traffic into the same pipe. That'll eventually happen, but meanwhile can't our PCs help us use the existing telephone infrastructure more effectively?
Once upon a time (1994) I wrote a cover story for BYTE on the subject of computer telephony. Last week, I was reminded of that story when I found myself involved in several of those annoying conference call fiascos:
"Hang on, I'll try to patch in so-and-so."
"If I lose you, I'll call back."
The number of minutes of productivity lost every day to this sort of nonsense is startling, and the situation seems to have improved not at all in recent years. I still don't see voice-over-IP as a great near-term solution. Internet audio is, for many people, a disappointment. And in any case, why not leverage the reliable low-latency 64kbps voice circuits that we already have?
I still see CTI (computer-telephone integration) as a killer app waiting to happen. In that 1994 article, the focus wasn't on merging voice and data traffic into the same pipe. Rather, it was on ways that our PCs could help us use the existing telephone infrastructure more effectively.
You don't have to think very hard to come up with a whole series of interesting opportunities:
Universal inbox. Remember what this was supposed to be like? My computer keeps track of all my messaging channels: telephone, multiple email accounts, maybe instant messaging too. A message from so-and-so, regardless of which channel it used to reach me, ends up in my folder of so-and-so's messages.
Call setup. I ought to be able to schedule a conference call without the usual round of preparatory pairwise email and voice messages. A scheduling service such as TimeDance (sadly now gone from the scene; see my recent report on Internet groupware for a discussion of why TimeDance was so cool) could help reach agreement on the right list of attendees, and the time. Then, screen-based call control ought to automate the mechanics of getting everybody into the call.
Presence management. As I discussed last week, presence management -- based on the ability of instant-messaging applications to identify that users really are present on a given channel, and a given device -- looks like not just a killer app, but a killer platform. In the conference-call-scheduling example, this can be a way to know who's really available right now to join the call, and to communicate with that person in realtime. Such communication ought to automatically try various channels. Can we simply call the person? Great. If the phone's busy, but call waiting allows us to interrupt, that's fine too. Failing that, deliver the instant message "conference call starting now." Failing that, say the same thing in email.
Screen pops. In a high-volume call center, a "screen pop" means that when I call L.L. Bean, my account info is already on the agent's screen when he or she picks up the phone. There's no time lost establishing a context for the call. Well, wouldn't this be useful for all of us? If I receive a call from you, and can use caller ID to identify you, I'd love to see a screen pop with general information I've recorded about you, and a recent history of the messages that were exchanged using all the channels through which we communicate.
Speech synthesis. Kevin Lenzo says he likes to pipe IRC channels into Festival, save the synthesized speech it produces, then listen to it while driving. In the CTI realm, I'd like to be able to do things like dial into my universal inbox, scan headers, play selected messages, and respond. One message might be voice, another text. I'd still like to hear them both. Granted, it's asking too much of current speech-recognition technology to turn a voice reply into a text reply. But just being able to make a voice reply to a text message using the phone would be incredibly handy. The Remark! product which I mentioned way back in that 1994 story still exists. I'm sure there are others. (Vendors of such products are welcome to pop into the CTI thread in my newsgroup.) But my point is that by now, these kinds of products ought not to be specialty items. This is stuff we should all be taking for granted.
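The channel-by-channel fallback described under presence management (call, then call waiting, then instant message, then email) can be sketched in a few lines. This is only an illustration: the channel functions below are hypothetical stand-ins that return canned results, not calls into any real telephony or IM API.

```python
from typing import Callable, List, Optional, Tuple

# Each channel is a (name, attempt) pair. attempt(person, message) returns
# True when the message got through. Every name here is hypothetical.
Channel = Tuple[str, Callable[[str, str], bool]]


def notify(person: str, message: str, channels: List[Channel]) -> Optional[str]:
    """Try each channel in order; return the name of the first that succeeds."""
    for name, attempt in channels:
        try:
            if attempt(person, message):
                return name
        except Exception:
            # A broken channel should not stop us from trying the next one.
            continue
    return None


# Stand-in channel functions with canned outcomes, for illustration only.
def phone_call(person: str, message: str) -> bool:
    return False  # pretend the line is busy


def call_waiting(person: str, message: str) -> bool:
    return False  # pretend call waiting is not available


def instant_message(person: str, message: str) -> bool:
    return True  # pretend the person is present on IM


def email(person: str, message: str) -> bool:
    return True  # email is the last resort


CHANNELS: List[Channel] = [
    ("phone", phone_call),
    ("call-waiting", call_waiting),
    ("im", instant_message),
    ("email", email),
]

if __name__ == "__main__":
    print(notify("so-and-so", "conference call starting now", CHANNELS))
```

The ordering of the list is the whole policy: richer, more immediate channels go first, and the persistent one goes last.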
Forget the phone, use IM?
In follow-on discussion, Mark Wilcox suggests that maybe the phone just matters less than it used to:
Instead of doing voice conferences, use instant messaging. (Full disclosure: I'm working on a consulting project for Jabber.com, who sponsors the open-source Jabber IM project.)
I'll admit that before July, when I started working on this project, I thought IM was kind of cheesy, but I've been using it regularly to do the development work for the project instead of just email.
The cool thing about IM is that you can do things like regular email (send a note to a single person) including attachments. But, you can know if they are online or not, via a presence protocol. On Jabber, you can even send a person who's offline a message and they'll get it once they are online.
Then there is groupchat, which is more like your traditional IRC environment. It does suffer from confusion with 3 or 4 people involved, but I don't think it's worse than 3 or more people on a conference call.
The only real downside at the moment is that you don't have a guaranteed standard way to 'persist' the conversation, though many clients support a feature like that. But if you really need this, you can always revert back to persistent messaging, or what we've more traditionally called "e-mail."
At one time I too underestimated the significance of IM. I've been corrected by various people who've shown me the business relevance of the medium. But people talk much faster than they type. Equally important, they convey a great deal in tone and inflection. Voice is a rich medium, and it is carried most effectively through the existing and well-established circuit-switched network. Integrating that network with the packet data network need not necessarily wait for voice-over-IP, and should not. As far as ditching voice conferences for IM, I just can't see it. As a general rule, I think the name of the game is not to converge on a single best channel of communication, because they all have strengths and weaknesses. Rather, we want to find the best ways to coordinate the use of multiple channels.
The point about persistent chat is, by the way, interesting. Notes James Power:
Initially I thought it would be great to have a written record of the conversation. In fact, chat sessions (those I've looked at afterwards) are too disjointed and full of abbreviations to do anything with. Just more junk to file and ignore, or worse, spend time exploring only to come up with nothing.
I wouldn't expect to usefully spend much time reading chat transcripts. Searching them, though, that's something else again. Chat is not normally used in a mode where everything is indexed and searchable, but it can be, and that can be useful.
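To make "indexed and searchable" concrete, here is a minimal sketch of an inverted index over chat lines. The sample transcript and the function names are invented for illustration, and a real system would presumably lean on an existing full-text engine rather than a dictionary in memory.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9']+")


def build_index(lines):
    """Map each lowercase word to the set of line numbers containing it."""
    index = defaultdict(set)
    for number, line in enumerate(lines):
        for word in WORD.findall(line.lower()):
            index[word].add(number)
    return index


def search(index, query):
    """Return sorted line numbers that contain every word in the query."""
    words = WORD.findall(query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


# A made-up transcript, abbreviations and all.
TRANSCRIPT = [
    "joe: brb",
    "ann: did anyone try the jabber gateway?",
    "joe: yes, the gateway works w/ irc",
    "ann: lol ok",
]

if __name__ == "__main__":
    index = build_index(TRANSCRIPT)
    for number in search(index, "gateway irc"):
        print(number, TRANSCRIPT[number])
```

The transcript stays as disjointed as ever; the index just gets you to the one line worth rereading.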
Something else that might be useful, someone else pointed out, would be a mechanism for threading and marking threads.
Like NNTP, except real-time, and with the ability for me to mark a thread as "interesting." Just a quick mouse-click or keypress, nothing more. I often have trouble finding that useful remark that's buried among three hundred lines of crud. Maybe I can't even remember what the idea was -- I just remember that Joe said something that I thought was interesting at the time.
Added James Ramirez:
What about leveraging the kind of solution that occurs in MUDs? Instead of explicitly tagging conversations/comments, individuals adopt a different method of communication. They use 'page' rather than 'say', or use a different channel ('admin' vs 'public') according to need. These methods tend to make your communication visibly distinct.
This raises the general issue of metadata-tagging our messages. I'd like to be able to incorporate the "speech act" model of communication into our messaging. In other words, I'd like (in any given messaging channel: mail, chat, phone) to be able to be explicit about the nature and purpose of the message. Is it:
a Request for information?
a Response to a Request?
an Assertion of fact/opinion?
a Response to an Assertion (subtypes: Agree/Disagree/ModifyPremise)?
a Promise to perform a Task on a Date?
There is a kind of deep structure underlying many communication acts. I have long believed that software should ultimately guide us, as we exchange messages, in making such deep structure explicit as metadata -- and thereby available for processing that can help us streamline and organize our communication.
Things like Subject/Author/Date (in email), Rate This Message 1-5 (in forums), and Message from Joe at 2PM (in voicemail) barely scratch the surface, with respect to what message metadata could and perhaps should be.
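As a rough sketch of what such metadata might look like, the speech-act types listed above can be modeled as an enumeration attached to every message, whatever channel carried it. The field names and the sample messages are my own assumptions for illustration, not an existing standard or product.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import List, Optional


class Act(Enum):
    REQUEST = "request"
    RESPONSE_TO_REQUEST = "response-to-request"
    ASSERTION = "assertion"
    AGREE = "agree"                    # response to an assertion
    DISAGREE = "disagree"              # response to an assertion
    MODIFY_PREMISE = "modify-premise"  # response to an assertion
    PROMISE = "promise"


@dataclass
class Message:
    ident: int
    channel: str                       # "mail", "chat", or "phone"
    author: str
    body: str
    act: Act
    in_reply_to: Optional[int] = None  # ident of the message being answered
    due: Optional[date] = None         # only meaningful for a PROMISE


def promises_due(messages: List[Message], by: date) -> List[Message]:
    """Return the promises whose due date falls on or before the given day."""
    return [
        m for m in messages
        if m.act is Act.PROMISE and m.due is not None and m.due <= by
    ]


# Made-up sample data.
SAMPLE = [
    Message(1, "mail", "ann", "Can you send the draft?", Act.REQUEST),
    Message(2, "chat", "joe", "Draft by Friday.", Act.PROMISE,
            in_reply_to=1, due=date(2000, 9, 29)),
    Message(3, "phone", "joe", "Slides next month.", Act.PROMISE,
            due=date(2000, 10, 27)),
]

if __name__ == "__main__":
    for m in promises_due(SAMPLE, date(2000, 9, 30)):
        print(m.author, m.body, m.due)
```

Once the act is explicit, a question like "what have people promised me by the end of the week?" becomes a one-line filter rather than a trawl through three inboxes.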
Voice recognition's hybrid potential
Let's switch back to the voice medium. What wouldn't I give for a searchable archive of my phone conversations? Here's another killer app just waiting in the wings. Some years ago, I saw a really interesting hybrid app. It scanned and OCR'd a bunch of resumes into a searchable database. Even though there was no correction on the OCR, it turned out that for searching, it worked fine. If I'm looking for key terms like "Java" or "SSL" in a stack of resumes, I can easily find the image of the document containing those recognized terms. The OCR'd text may not be all that readable, but the corresponding document image certainly is. The trick is to find it. The OCR'd text can be a great way to locate a document image.
The fact that fulltext search is an effective locator of documents, even when the text is in a degraded state, was a revelation. Now consider voice. Suppose all my phone conversations spool to disk (only, of course, with the permission of my interlocutors). Suppose those voice conversations are automatically recognized as (imperfect) text. Now I search. The recognized text may be just fine to help me locate, and randomly access, a piece of recorded audio.
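Here is a small sketch of that idea, assuming the recognizer hands back text in timestamped segments. The segment data is invented, with recognition errors planted on purpose, and the fuzzy matching uses Python's standard difflib module rather than anything specific to a speech product.

```python
import difflib

# (start_seconds, recognized_text) pairs. The text is deliberately imperfect,
# the way unedited recognizer output would be. All of it is made up.
SEGMENTS = [
    (0.0, "hello thanks for calling"),
    (12.5, "we need the jabba server up by friday"),
    (31.0, "ess ess el certificate expires next month"),
    (47.2, "ok talk to you later"),
]


def locate(segments, term, cutoff=0.7):
    """Return the start times of segments holding a word close to term."""
    term = term.lower()
    offsets = []
    for start, text in segments:
        words = text.lower().split()
        # get_close_matches tolerates small recognition errors in a word.
        if difflib.get_close_matches(term, words, n=1, cutoff=cutoff):
            offsets.append(start)
    return offsets


if __name__ == "__main__":
    # "jabber" was misrecognized as "jabba", but the search still finds it.
    print(locate(SEGMENTS, "jabber"))
```

The returned offsets are where playback would start. The recognized text never has to be read by anyone; like the OCR'd resumes, it only has to be good enough to point at the recording.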
There are lots more good ideas along these lines, I'm sure. Clearly 1995 wasn't the "Year of CTI," as I had hoped. I wonder when that year will be?
Jon Udell (http://udell.roninhouse.com/) was BYTE Magazine's executive editor for new media, the architect of the original www.byte.com, and author of BYTE's Web Project column. He's now an independent Web/Internet consultant, and is the author of Practical Internet Groupware, from O'Reilly and Associates. His recent BYTE.com columns are archived at http://www.byte.com/index/threads
This work is licensed under a Creative Commons License.