"Alexa, Go Talk to Someone Else"

If Oscar Wilde was correct in observing that "life imitates art," the tech world should be celebrating Stanley Kubrick rather than Steve Jobs. Sure, Jobs stole the idea of the computer mouse and graphical user interface from Xerox PARC and brought them mainstream in the Mac (after the failed Lisa initiative), but it was Kubrick's "2001: A Space Odyssey" that popularized the idea of interacting with computers via speech. While chatting with HAL was a great cinematic effect, that doesn't necessarily mean it's an inevitable, or even useful, advance in man-machine interfaces for the enterprise.

One of the principles that has guided me in my career in technology is the recognition that our ability to do something has absolutely no relationship to whether it needs to be done in the first place. That principle is something we should keep in sight as we consider how we craft user experiences that are appropriate to enterprise applications.

When reviewing or commenting on user interface capabilities in the consumer mobile space (i.e., the area where most of these ideas originate), "fun" is a word I often find myself interjecting. From the introduction of point and click through all the marvels of touchscreens, users have delighted in each new way they've been able to interact with their personal technology -- and designers continue to pile on the goodies. Touch has morphed into gradients of touch (e.g., Apple's 3D Touch), and haptic engines give our devices the ability to touch back.

Even in areas like security, we went from no security (not a good start) to passcodes, then fingerprints, and finally "look and unlock."

We may have reached the limits of enterprise-appropriate fun when we got to speech recognition. My initial unfavorable reaction to speech recognition came from the fact that it simply didn't work very well, even at basic stuff like number recognition. All I could do was pray that the system offered the option of touchtone inputs.

Researchers tell us that basic recognition accuracy has now passed 95%, but we've decided to up the ante by creating the expectation of natural language processing (NLP) to support a more natural dialog between man and machine. Apple led the charge with Siri, but quickly fell back into the pack as the capabilities of Amazon Alexa, Google Assistant, and even Microsoft Cortana challenged Mr. Kubrick's vision.

Like most of us, I have a ton of tech in my office that I interact with thousands of times a day. I don't think I'm unique in my expectation regarding my work tools. I want them to work. I've got stuff to do, and nothing is going to send my blood pressure into the danger zone faster than having my tools get in the way of what I've got to get done. My clients are looking for output, and they don't place any premium on the fact that I can produce this stuff in an environment that evokes a 1960s sci-fi flick.

If I'm home and happen to be playing around with my tech toys, my wife does get a kick out of some of the Alexa tricks I show her (I get them from those emails Amazon sends me), but the entertainment value fades quickly. And forget about showing any of that stuff to your friends if you still want them to be friends. In the end, my wife and I both use speech recognition primarily to avoid typing on a touchscreen.

There are occasions when a speech interface is important in an enterprise, particularly when people need to input data but need to keep their hands free. That capability is also useful in industrial settings where workers need their wits about them to avoid hazards. Those systems often increase reliability (system failures also misdirect the user's attention) by simplifying the recognition task (e.g., numbers only, regular cadence, no NLP, etc.).

That's great for the warehouse environment, but what's the value of pushing it into the office? In an Enterprise Connect 2018 keynote, Amazon Web Services pitched the idea of using Alexa to take the confusion out of logging into audio, video, and Web conferences. Through Alexa Skills, users can define their own vocabulary and usage to improve NLP performance. However, what's the likelihood that throwing an even more complex solution on top of this is going to make things any better?

If the challenge is people, it's important to remember that we can retrain people. No one knew how to point and click until we gave them a mouse, but they all seem to have caught on. The same can be said of swipe, pinch, and stretch, and many of the interactive elements we build into Web experiences. Hey, we even taught everyone to start every sentence with "Alexa."

The core message here is that we have to remember we're operating in a business environment, people have jobs to do, and we provide tools to help them do some parts of those jobs. If a tool is more trouble than it's worth, people won't use it, and that means we failed. If you want a meaningful measure of the utility of an elective (not "mandatory") technology solution we deliver, look no further than user adoption.

Movies are great, entertaining product demos lessen the drudgery of industry conferences, and your in-laws might get a charge out of your asking Alexa silly questions, but when we go to work, we'd better be thinking about work.

Consumer technologies have raised the bar on consumer expectations about creating an engaging user experience, but when it comes down to choosing between function and pizazz, we'd better be focusing on the right target.

The key is "choice." If someone's calling into my contact center, I can force them to deal with an AI-enabled, machine learning NLP-driven bot because it saves me money on contact center agents -- though you might want to consider a touchtone-driven escape route for those of us who don't mark the Turing Test on a curve. The message is, if you're offering this technology to users who must choose to adopt it, you've got to be good.

