Perception is Everything

By Jeff LeBlanc

Summer is rolling right along, and so are our attempts to simulate some of Tony Stark's lifestyle here at ICS.  A while back, I wrote about trying to create our own version of Jarvis.  We got a fair distance and demonstrated our results during the ICS QuickStarts earlier in the year.  Jarvis was a bit twitchy, but overall he performed well.

To give credit where credit is due, our work is based on the Perceptual Computing initiative from Intel.  Intel is looking to move beyond touch as an interaction paradigm and add the modalities of voice commands, gesture control and facial recognition in order to provide a truly immersive and engaging user experience.  Intel is so convinced that this is the next wave of user interaction that it launched a contest to develop new cutting-edge apps using its Perceptual Computing SDK (PCSDK).  I am pleased to say that the initial version of Jarvis did quite well and is currently a finalist in the contest.

For those interested, check out the video here:

We've continued our work beyond this point, since of course our intention is to win Intel's $100,000 first prize.  To that end, we've been working on Jarvis' cousin, Anita (A.N.I.T.A., A Natural Interface Training Assistant); we changed the name so the Stark lawyers don't call us.  Our focus with Anita has been eliminating many of the false positives (i.e., the "open mic" effect) we were getting with the first cut at voice commands.  We did this by adding facial recognition, so Anita responds to voice only when I'm looking at the camera.  Our early testing has shown some rather encouraging results.  We also added a virtual laser pointer, so I can point at the screen and have a laser dot highlight my presentation; no more dead batteries in my pointer.
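To make the face-gating idea concrete, here is a minimal sketch of the logic in Python.  It is an illustration only, not our actual PCSDK code: the class and method names are hypothetical, and it assumes a face detector and a speech recognizer (not shown) that each report events with a timestamp in seconds.  A recognized command is passed through only if a face was seen within a short window before it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AttentionGate:
    """Hypothetical gate that passes voice commands only while a face is seen.

    The face detector and speech recognizer are assumed to live elsewhere;
    this class just combines their timestamped events.
    """
    window: float = 1.0                    # seconds a face sighting stays valid
    last_face_seen: Optional[float] = None

    def on_face(self, timestamp: float) -> None:
        # Called whenever the (assumed) face detector reports a face
        # looking at the camera.
        self.last_face_seen = timestamp

    def accept_command(self, command: str, timestamp: float) -> Optional[str]:
        # Drop the command if no face has been seen yet ("open mic" case).
        if self.last_face_seen is None:
            return None
        # Drop the command if the last face sighting is too old.
        if timestamp - self.last_face_seen > self.window:
            return None
        return command


gate = AttentionGate(window=1.0)
print(gate.accept_command("next slide", 0.5))  # None: nobody looking yet
gate.on_face(1.0)
print(gate.accept_command("next slide", 1.4))  # next slide
print(gate.accept_command("next slide", 3.0))  # None: face sighting expired
```

The window length is the main tuning knob in this sketch: too short and real commands get dropped when the speaker glances away, too long and the "open mic" effect creeps back in.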

Another big change with Anita was using true voice synthesis for the speech.  Jarvis, like his movie counterpart, used pre-recorded speech for his responses, which of course sounded like natural conversation.  Computer-generated speech has improved every year, as the progression of Microsoft's text-to-speech voices shows, from Microsoft Sam to the David and Zira voices shipping with Windows 8.  It still doesn't sound quite as natural as pre-recorded speech; however, text-to-speech gave us much more flexibility, since Anita can say anything we can write rather than only the phrases we recorded in advance.

Anita is still a work in progress, but she's coming along nicely.  We are continually tuning the facial recognition and the voice command grammar to eliminate false positives from both voice and gesture interaction.  Right now it works pretty well, and our goal is to make it even more reliable before the contest ends next month, so tune in then for an update.  Hopefully by then, I'll even be able to have Anita introduce herself.