Tips for Creating Voice User Interfaces

By Jeff LeBlanc Wednesday, April 24, 2019

We had a very enthusiastic Q&A session at the end of our recent Introduction to Voice Design webinar, which you can download here. The content sparked a lot of thought-provoking questions from attendees. I’m answering five of them here.

Q: What is the biggest change to voice user interfaces (VUI) when used in a car scenario? Can something like Alexa link into the car systems?

Since Amazon has just released its Echo Auto and the accompanying Alexa Auto SDK this year, that question is an easy one to answer on the surface: it’s a definite yes. What gets more interesting is how the device could kickstart the acceptance of VUIs and voice assistants in vehicles. A limiting factor in how well the current generation of vehicle assistants works is connectivity, because doing VUIs well requires cloud computing. Since the new Alexa service is built from the ground up to work in the cloud, users can expect the same level of voice accuracy in their cars as they get at home.

Q: How do you handle concepts like sarcasm, where the stated words could be the opposite of the intent?

That may have been my favorite question from the session, as it appealed to the Tony Stark who is never far from the surface with me! In the movies, Tony and Jarvis have a great rapport. But in real life, getting a computer to understand sarcasm is no easy task. There’s a lot of intense research going on in many areas of emotion recognition in speech and natural language, such as detecting anger in a caller to an automated support system and quickly escalating the call to a human operator. Detecting emotion from spoken language involves deep machine learning and neural networks. While this is beyond the scope of today’s Q&A, I’ll cover the topic in an upcoming blog post.

Q: What tools do you use when designing and prototyping VUIs?

The best tools for VUI design are often the same as for GUI design. VUI design starts with the whiteboard. The designer maps out the workflows and conversational flows on the board until the concept is fully thought through, and then moves to a graphical tool to document and refine the design. Any type of tool that can create flowcharts, such as Microsoft Visio, works well here. For functional prototypes, we work a lot with the Amazon toolchain, which has great support for intents and synonyms. A bonus: it's basically free. Add to that some homegrown JavaScript code to catch and echo the responses from the Amazon cloud and you have a good environment to build in.
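To make the idea of intents and synonyms concrete, here is a minimal sketch in Python. It is an illustration only: the intent names, the sample phrases, and the `resolve_intent` helper are all hypothetical, and the structure is a simplification rather than the actual format used by the Amazon toolchain.

```python
from typing import Optional

# Hypothetical, simplified intent table: each intent lists the phrases
# (synonyms) that should trigger it. This is NOT the Alexa model format.
INTENTS = {
    "StartTimerIntent": ["start a timer", "set a timer", "begin a countdown"],
    "StopTimerIntent": ["stop the timer", "cancel the timer", "end the countdown"],
}


def normalize(utterance: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    kept = "".join(ch for ch in utterance.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def resolve_intent(utterance: str) -> Optional[str]:
    """Return the first intent whose phrase appears in the utterance, else None."""
    text = normalize(utterance)
    for intent, phrases in INTENTS.items():
        if any(phrase in text for phrase in phrases):
            return intent
    return None


if __name__ == "__main__":
    print(resolve_intent("Please set a timer!"))
    print(resolve_intent("What's the weather like?"))
```

A table like this is easy to sketch on the whiteboard first: each box in the conversational flow becomes an intent, and the different ways a user might phrase the request become its synonyms.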

Q: Do you have specific go-to methods or strategies for testing VUIs?

Testing VUIs can be a challenge because different audiences need different levels of testing. For instance, while engineers may do proof-of-concept testing, a UX designer is looking to test the experience. That may require a certain level of “scaffolding” to accomplish. One way to eliminate that need is to do “Wizard of Oz” (WOZ) testing, where a real person plays the part of the computer and handles the response. That can be done totally free-form or by having a person trigger responses from a list of pre-recorded ones. WOZ testing is frequently used to test experiences that might otherwise require a lot of technology to simulate. Additionally, some of the big VUI building systems like the Amazon cloud include plenty of hooks to facilitate testing.
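The pre-recorded flavor of WOZ testing can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the response keys, the canned text, and the `WozSession` class are hypothetical, and a real setup would play audio clips rather than return strings.

```python
from dataclasses import dataclass, field

# Hypothetical canned responses the human "wizard" can trigger by key.
RESPONSES = {
    "greet": "Hi, how can I help?",
    "confirm": "Okay, I've done that.",
    "reprompt": "Sorry, could you say that again?",
}


@dataclass
class WozSession:
    """Records what the participant said and what the wizard chose to play."""

    responses: dict
    log: list = field(default_factory=list)

    def respond(self, heard: str, key: str) -> str:
        # Fall back to the reprompt if the wizard picks an unknown key.
        reply = self.responses.get(key, self.responses["reprompt"])
        self.log.append((heard, key, reply))
        return reply


if __name__ == "__main__":
    session = WozSession(RESPONSES)
    print(session.respond("turn on the lights", "confirm"))
    print(session.log)
```

The session log is the useful part for a UX designer: it pairs each thing the participant said with the response the wizard selected, which can be reviewed after the test.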

Q: How do you see VUIs being integrated into the healthcare and medical fields?

I think the biggest area we’ll see VUI impact in the short term will be home healthcare, but the technology will eventually make its way into clinical settings like hospitals, where medical professionals have used dictation-based software for years to transcribe their notes. Why isn’t the tech starting in the place where it seemingly is needed most? The combination of HIPAA regulations and the overall noise level in hospitals makes extending VUIs into those environments a real challenge, one for which solutions are still evolving.

On the other hand, home assistants (“Alexa, how do you make an apple pie?”) have made huge leaps in their accuracy and acceptance. That means the technology already exists for targeted interfaces for consumer-level home-health devices. That’s great news, since these products, from smart blood glucose monitors to wireless blood pressure monitors, can help people manage chronic conditions and help an aging population maintain independence.

If you’re interested in reading about developing voice interfaces, check out How Speech Synthesis is Helping Alexa Grow Up. You can find general articles on UX design here. And if you’d like to listen to our on-demand webinar on medical device development, find it here.