What's the video threshold for face-reading?

Why isn't videoconferencing more compelling? When we say we want to look the other person in the eye, what we really want to do is read the microexpressions of the face. As Malcolm Gladwell points out in Blink, people adept at reading faces can literally read minds. And at a sufficient frame rate the visual channel can transmit those microexpressions. [Full story at InfoWorld.com]

This week's column is a follow-up to an earlier one about corporate PR use of Second Life, parodied here. While that column was in the pipeline, Cisco announced its new high-end teleconferencing system, videoblogged by Robert Scoble here. That got me thinking again about what the minimum requirements for emotionally effective telepresence might actually be.

Here's a 20-second snippet from Robert's video to give you a feel for how the Cisco system works:

At the crappy level of video quality shown here, or even at the level shown in Robert's original H.264 QuickTime clip, we can't judge the effect of Cisco's high-definition video. We can, however, see how the layout of screens and desks creates a powerful illusion of circular seating. Cisco's astronomical price notwithstanding, though, the need for all participants to visit these specially designed rooms limits this approach to special circumstances.

What could make ordinary iChat A/V or Skype videoconferencing more emotionally effective? It's natural to assume that high-definition video is a make-or-break requirement. But if what really matters is face-reading, is the ability to count the number of whiskers in a five-o'clock shadow as critical as we imagine?

Here's the critical passage from Malcolm Gladwell's 2002 New Yorker article, The Naked Face, which became the face-reading chapter of Blink:

Perhaps the most famous involuntary expression is what Ekman has dubbed the Duchenne smile, in honor of the nineteenth-century French neurologist Guillaume Duchenne, who first attempted to document the workings of the muscles of the face with the camera. If I ask you to smile, you'll flex your zygomatic major. By contrast, if you smile spontaneously, in the presence of genuine emotion, you'll not only flex your zygomatic but also tighten the orbicularis oculi, pars orbitalis, which is the muscle that encircles the eye. It is almost impossible to tighten the orbicularis oculi, pars lateralis, on demand, and it is equally difficult to stop it from tightening when we smile at something genuinely pleasurable. This kind of smile "does not obey the will," Duchenne wrote. "Its absence unmasks the false friend." When we experience a basic emotion, a corresponding message is automatically sent to the muscles of the face. That message may linger on the face for just a fraction of a second, or be detectable only if you attached electrical sensors to the face, but it's always there. [Malcolm Gladwell: The Naked Face]

The protagonist in this piece, psychologist Paul Ekman, mapped out his Facial Action Coding System back in the 1960s, using videotapes. He clearly didn't watch those tapes in high definition. He did, however, watch them at 30 frames per second.

What's the minimum frame rate for face-reading? Is it possible that the 15fps typical of web-style video doesn't capture fleeting microexpressions but that 30fps does? If we traded resolution for frame rate, might low-end videoconferencing cross a threshold of effectiveness? I'd love to know if this experiment has been done, and if so, what its outcome was.
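The arithmetic behind the 15fps-versus-30fps question can be sketched in a few lines of Python. This is a back-of-the-envelope sketch under loud assumptions, not an answer to the experimental question: it treats each frame as an instantaneous sample (ignoring shutter time, motion blur, and codec behavior), and the 1/25-second expression duration and the 640x480 and 320x240 resolutions are illustrative values I picked, not measurements. The helper names `frames_captured` and `pixel_rate` are hypothetical.

```python
from fractions import Fraction
from math import ceil, floor


def frames_captured(duration_s: Fraction, fps: int) -> tuple[int, int]:
    """Return (fewest, most) frames that can land inside an expression
    lasting duration_s seconds, over all possible timing offsets.

    Idealized model (an assumption): each frame is an instantaneous sample
    taken every 1/fps seconds, and the expression occupies a half-open
    interval [t0, t0 + duration_s). Shutter time, blur, and compression
    are ignored.
    """
    samples_per_event = duration_s * fps  # exact arithmetic via Fraction
    return floor(samples_per_event), ceil(samples_per_event)


def pixel_rate(width: int, height: int, fps: int) -> int:
    """Uncompressed pixels per second: a crude proxy for a stream's cost."""
    return width * height * fps


if __name__ == "__main__":
    # Hypothetical expression duration, chosen only for illustration.
    brief = Fraction(1, 25)
    for fps in (15, 30):
        fewest, most = frames_captured(brief, fps)
        print(f"{fps} fps: {fewest} to {most} frame(s) land on a 1/25 s expression")

    # Trading resolution for frame rate: halve each dimension, double the fps.
    print(pixel_rate(640, 480, 15), pixel_rate(320, 240, 30))
```

Under this idealized model, a 1/25-second expression can slip entirely between frames at 15fps (0 to 1 frames), while at 30fps at least one frame always lands on it (1 to 2 frames). And halving each dimension while doubling the frame rate halves the raw pixel rate (2,304,000 versus 4,608,000 pixels per second), though real codecs don't scale that linearly.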

Former URL: http://weblog.infoworld.com/udell/2006/10/26.html#a1552