The disappearing user interface
From the moment computers found their way into our homes decades ago, people have been trying to find ways to interact with them through voice. Precision input methods have dominated computer interaction design for decades: pointing and clicking, typing with a keyboard, or tapping on displays. Voice felt like a far-off pipe dream.
Until recently. Voice as an interface has hit the mainstream, as an onslaught of smart, voice-controlled devices has entered the market and begun sneaking into homes, redefining what it means to interact with a computer and changing the interface design paradigm dramatically in just under two years.
Voice assistants are creeping into every part of our life in 2018. As predicted years earlier by the Spike Jonze film Her, many consumers are wearing smart devices that give them access to voice assistants which can answer questions and perform tasks without ever having to look at a screen or type out a query.
It wasn’t always like this — when personal computers first showed up in people’s homes decades ago, many saw them as overly complicated, intimidating machines that took an expert to master. For years, those machines were controlled by a Command Line Interface, which required users to memorize complicated strings of commands to run applications or even just move files between folders.
Software developers solved this issue by adding visual interfaces that borrowed metaphors from real-world objects to help users understand what their computers were doing. They made the save icon look like a floppy disk, and used a closing door to signify exiting an application. These images were used in part as a way to ease new groups of people into becoming comfortable with these devices.
Those same trends re-appeared with the onset of the smartphone era. Skeuomorphic interfaces took over, with applications encased in fake wood and leather textures, lined paper inside note-taking apps, and bookshelves used to signify bookstores.
In recent years, designers have largely abandoned those visual metaphors, as computer and smartphone usage has gone mainstream and consumers have become more sophisticated.
Voice as the new command center
Now, however, we’re faced with a platform shift unlike any we’ve seen before — a shift to an invisible user interface. Ushered in by a new era of ambient computing, the invisible interface represents an operating system that users can interact with but don’t actually see.
This shift is being driven by the adoption of smart speakers, voice assistants, and the Internet of Things, and it’s accelerating rapidly. New products, such as Google Home and Amazon Echo, have sold millions of units in just a few years, with consumers rushing to buy speakers that let them interact in a more natural way with the technology they have in their homes.
Instead of relying on clicking, tapping, or typing, smart speakers take a user’s voice as their key input. Thanks to massive advances in natural language processing and artificial intelligence, those devices (and the systems that they connect to) are finally able to understand our commands with relatively high accuracy.
At the same time, the adoption of connected devices throughout the home — whether they are smart lights or Internet-connected TVs — means consumers have unprecedented control of the things around them.
With this paradigm shift come unique challenges for developers, businesses, and the enterprise, as the tech industry looks to adapt to a changing landscape. But chief among them is this: When the user interface disappears, how do you help people discover what they want to do?
For most of us, there’s a learning curve involved, but Travis Bogard, SVP at Samsung NEXT, believes that doesn’t really matter because it’s not targeted at us: “I have two young children and it’s just fascinating to watch how they adapt to these things. I come home, they jump in bed, and immediately call out ‘Alexa, play music.’ They’re walking about with handheld phones made of plastic, and it’s fascinating how there’s a generation of kids who have already figured out the limitations and capabilities of these devices.”
How speech got smart
What makes voice interfaces so unique is that they take advantage of humans’ primary and earliest form of communication. In many ways speech is the most natural way for humans to interact, but until recently computers have struggled to understand intent.
Early speech recognition products, like Dragon NaturallySpeaking, required the user to enunciate clearly and use predictable sentence structure to be understood.
Over many decades, the technology behind speech recognition and natural language processing (NLP) improved. At the same time, high-performance cloud computing increased the amount of voice data that could be collected and processed, further refining these models.
With massive leaps in artificial intelligence and machine learning, computers are better able to understand intent even if users don’t phrase their request using a predictable sentence structure.
According to The New Stack, the error rate for speech recognition was more than 43 percent in the 1990s, but by 2017 Google was able to reduce that to just 4.9 percent of queries. It accomplished this in part by gathering a lot more data:
“Part of the improvement was gaining access to more data and training with it, but it was also about evolving the technology,” she explained. The biggest breakthrough in this period was moving to neural networks while keeping the latency low enough to give results quickly.
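To make the error-rate figures concrete, here is a minimal sketch of how such a number can be computed. It assumes the metric is word error rate (the word-level edit distance between what was said and what was recognized, divided by the number of words said); the article does not specify which measure the quoted figures use, and the function name and sample phrases are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word was dropped
            insertion = curr[j - 1] + 1  # extra word was recognized
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    spoken = "play some jazz music"
    recognized = "play sum jazz"
    print(f"{word_error_rate(spoken, recognized):.0%}")
```

In this example one word is misheard and one is dropped out of four spoken, so half the words count as errors. By the same measure, a 4.9 percent rate would mean roughly one error in every twenty words.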
There are thousands of ways to ask a smart speaker to play a particular song and millions of song choices for that device to choose from, which poses a significant challenge for systems that are expected to return results in seconds.
But given time — and billions of queries — machine learning becomes the critical component to understanding what users really want, and learning from mistakes when something goes wrong.
Enabled by machine learning and far-field microphone technology, voice assistants are now able to respond to user requests with high levels of accuracy, to the point that users around the world feel deeply comfortable interacting with them.
Smart speakers and voice assistants are still in their adolescence — they are just mature enough to be dangerous. Like many computing technologies, voice interfaces are being used as toys (“Alexa, tell me a joke”) or for mundane tasks like setting timers or playing music.
Voice won’t replace other interfaces entirely, but it could become the first port of call for a new generation of users. Before children learn to read or write, they use speech to make their needs known. That means a child’s first interaction with computing might not be with a computer, tablet, or smartphone but with a voice assistant.
This makes voice as a platform important, because it will soon become the default: the place where people go first, before using any other product or service, to get help.
The rise of ambient computing
While voice interfaces are increasingly serving as the front end for user interactions with their computing devices and applications, what’s happening behind the scenes is just as important.
Even after installing multiple “smart devices” in the home, users today must explicitly tell those devices what to do, usually via mobile app. Alternatively, they could program how their hardware should behave in certain situations, often employing “If This Then That”-like recipes or Alexa skills.
With a signal like, “Alexa, it’s movie time,” users can instruct their home’s connected devices to adjust to their preferences — dimming the lights, locking their doors, and firing up Netflix in the process, for instance.
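A routine of this kind can be pictured as a simple lookup from a spoken phrase to a fixed list of device commands. The sketch below is purely illustrative: the device names, command names, and the `ROUTINES` table are hypothetical and do not reflect how Alexa or any real smart-home platform stores its routines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    device: str   # hypothetical device identifier
    command: str  # hypothetical command name


# A routine is an explicit, user-defined mapping: one phrase, many actions.
ROUTINES = {
    "it's movie time": [
        Action("living_room_lights", "dim"),
        Action("front_door", "lock"),
        Action("tv", "open_netflix"),
    ],
}


def normalize(utterance: str) -> str:
    """Lower-case the phrase, straighten apostrophes, drop the wake word."""
    text = utterance.lower().strip().rstrip(".!?").replace("’", "'")
    wake_word = "alexa,"
    if text.startswith(wake_word):
        text = text[len(wake_word):].strip()
    return text


def actions_for(utterance: str) -> list[Action]:
    """Return the actions for a known routine, or nothing if unrecognized."""
    return ROUTINES.get(normalize(utterance), [])


if __name__ == "__main__":
    for action in actions_for("Alexa, it's movie time"):
        print(f"{action.device}: {action.command}")
```

The point of the sketch is the limitation it exposes: the system does only what the table says, and only when the user says the exact phrase.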
Bogard believes that “devices don’t have enough information about the user, or their context to do everything well” but “data and machine learning are driving a revolution in insight and context.”
Bogard expects devices to gain the ability to anticipate a user’s desires, “rather than just running macros or routines.” Ambient computing takes advantage of implicit signals and data it gathers from various sources to act without a user’s explicit direction. These systems try to predict the correct moment to perform tasks or deliver information based on what they know about a user and her environment.
In the home, that could mean changing the temperature and lighting to fit the owner’s needs when they enter a room, playing music when someone arrives home, or turning on a smart TV when they sit on the couch.
The key, Bogard says, is understanding when to do that, and when not to. “Macros or routines are cool to set up — a theme that comes on when you walk into your house, and suddenly it’s embarrassing because the system didn’t have enough context.”
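One way to picture the difference is a routine that checks context before it acts. The following is a minimal sketch under invented assumptions: the `Context` fields, the idea of a single confidence score, and the 0.8 threshold are all hypothetical stand-ins for whatever signals a real ambient system would weigh.

```python
from dataclasses import dataclass


@dataclass
class Context:
    owner_arrived: bool   # e.g. inferred from a phone joining the home network
    guests_present: bool  # e.g. inferred from additional unknown devices
    confidence: float     # how sure the system is of its readings, 0.0 to 1.0


def should_play_arrival_music(ctx: Context, threshold: float = 0.8) -> bool:
    """Act only when the signals are trusted and the moment seems appropriate."""
    if not ctx.owner_arrived:
        return False
    if ctx.guests_present:
        # A theme song is fun alone and embarrassing in company.
        return False
    return ctx.confidence >= threshold


if __name__ == "__main__":
    alone = Context(owner_arrived=True, guests_present=False, confidence=0.95)
    with_guests = Context(owner_arrived=True, guests_present=True, confidence=0.95)
    print(should_play_arrival_music(alone))        # True
    print(should_play_arrival_music(with_guests))  # False
```

A fixed macro would fire in both cases; the context check is what keeps the second one from happening.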
You may already have some ambient computing devices in your home today. Nest’s thermostat, for instance, is a prime example of ambient computing that works so well you forget it’s there. Bogard said that “Nest might seem basic, but it reacts to feedback and learns from it, until you stop changing it at all and it melts away.”
“Behind the scenes, Nest works magic based on that feedback, but you just provide input.”
Another great demonstration of this is found in the wearables market. Smartwatches are now able to monitor your heart rate silently and alert you (or even your doctor) if something appears wrong before it becomes a health problem. In action, this seems trivial, but it marks a big shift away from computers requiring constant attention.
Alongside voice, it’s easy to see the division between technology and the physical world melting away: the invisible computer is here.
Even if voice assistants don’t try to become our companions like they do in Her, as they get better at understanding emotion and intent, we can imagine a world where an AI performs many mundane tasks on our behalf, like automatically paying our bills, answering emails or unlocking the front door for friends.
In the enterprise, voice assistants combined with advanced AI could provide a new way of interacting with complex back-office systems. Information that is currently locked away in databases could be easily accessed using natural language queries.
Consider asking, “How much money did we make from sales representatives in 2017?” Today, coming up with that answer would usually require finding the spreadsheet where that information is, parsing the data ourselves, and coming to a conclusion — but a voice assistant could make fishing that information out much faster.
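As a rough illustration of what happens behind such a question, the sketch below turns the sentence into a filter and a sum over a handful of records. Everything here is an assumption made for the example: the `SALES` figures are invented, and the keyword and year matching is a crude stand-in for the natural language understanding a real assistant would need.

```python
import re

# Invented sample records standing in for a back-office sales database.
SALES = [
    {"year": 2017, "channel": "sales representatives", "amount": 120_000},
    {"year": 2017, "channel": "online", "amount": 80_000},
    {"year": 2016, "channel": "sales representatives", "amount": 95_000},
    {"year": 2017, "channel": "sales representatives", "amount": 30_000},
]


def answer(question: str, records=SALES):
    """Extract a year and a sales channel from the question, then total the matches."""
    text = question.lower()

    year_match = re.search(r"\b(?:19|20)\d{2}\b", text)
    if year_match is None:
        return None  # no year mentioned, so the question cannot be answered
    year = int(year_match.group())

    known_channels = sorted({record["channel"] for record in records})
    channel = next((name for name in known_channels if name in text), None)
    if channel is None:
        return None  # no known channel mentioned

    return sum(
        record["amount"]
        for record in records
        if record["year"] == year and record["channel"] == channel
    )


if __name__ == "__main__":
    print(answer("How much money did we make from sales representatives in 2017?"))
```

With the sample data, the question from the text totals the two matching 2017 rows. The lookup itself is trivial; the hard part, which this sketch skips, is reliably mapping free-form speech onto the right fields.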
The disappearing discovery mechanism
In the short term, voice interfaces won’t replace our interactions with computers, smartphones, or other screen-centric input devices, but augment them. In a more multimodal computing paradigm, however, future generations will likely use speech first, deferring to digital interfaces only when necessary — such as when reading a long-form article or replying to an email.
In the meantime, discovery remains a deep problem for invisible interfaces and speech-based user interfaces that has yet to be solved. Amazon boasts that Alexa now features thousands of apps for users, but data suggests that people rarely use many outside of the assistant’s core features — because there’s no way to find them.
When we design for screens, we have affordances that can be used to coax the user into trying something new. Notifications can be used to tell users about new features on their mobile apps, but a voice assistant has few ways to surface new experiences.
Voice is now one of the tools in a designer’s toolkit, and as with all new platforms, developers will rush to create dedicated experiences for it. For now, however, voice might best be used as a complement to existing interfaces, rather than as a replacement. This would allow developers to extend their apps beyond screens and introduce new concepts in a way that’s familiar to users.
In the future, however, we can imagine a paradigm in which computers and technology truly disappear into the background. Rather than interactions happening on discrete devices at the direction of a user, an invisible operating system will utilize a wide array of signals about you to make decisions automatically.
This is a radically different way of thinking about the user interface, but it is where we’re headed. Today, we’re in a world where we are training our devices. But as smart assistants gobble up more and more data, machine learning increases the breadth and depth of the language they can understand. Every time Alexa fails to understand you, it adds to its library of knowledge, gets a little stronger and a little smarter.
It’s clear we’re headed toward a world in which we interact with computers in a way that’s natural to humans. The only question that remains is whether we’re ready for it yet.