For years now, experts have been working on getting machines to imitate human conversations. We’ve come a long way from the very first conversational artificial intelligence (AI) agent, Eliza, which was developed back in the 1960s. But despite being around for over fifty years, the term conversational AI is still foreign to many of us who don’t work in deep technology. In our last blog, we demystified conversational AI and classified the solutions in this category.
We noted that voice-based and text-based AI are different. Basically, you can speak to voice-enabled AI and it speaks back, while text-based AI requires correspondence through writing. Seems simple enough, but this difference in interaction can impact a user’s experience. In this blog, we look at how user experience differs between voice-based and text-based conversational AI.
An AI tool’s accessibility is influenced by two things: the devices on which it is available and the circumstances in which it can be called upon.
Text-based AI solutions are housed on messaging platforms or web applications, so we interact with them using a phone or a computer. Since we usually have these devices on hand, these tools are convenient to reach.
In comparison, voice AI interactions can take place over a range of devices. Think about the breadth of accessibility with Siri, since it’s built into nearly every device in the Apple ecosystem, from the Apple Watch to AirPods. Even smart speakers like Google Home act as vehicles for voice-activated AI like Google Assistant.
If we look at accessibility over a single device, Curious Thing’s AI voice interviewing technology is a good example. It doesn’t require an internet connection or additional resources, making it a broader-reaching solution than many other AI tools.
Voice has an edge over text since it’s suitable for hands-free communication. If you’re driving your car and need to switch navigation settings, or have your hands full while following a step-by-step recipe, voice-activated AI is handy. On the other hand, customer interactions that call for documentation or online resources are better suited to text. Overall, the extent of accessibility a conversational AI solution needs must be determined by its use case.
Think about a recent phone conversation with a friend. In just five minutes, you probably covered a wide range of topics. Now, if you had the same discussion over text, would it have been as fluid? Would you have been as present? Probably not.
Humans tend to be more psychologically engaged when conversing verbally. Elements of speech like intonation, pitch and volume make for a more interactive conversation and let you hold the listener’s attention. On the flip side, we’re more deliberate when writing. We often pay careful attention to our choice of words and how we string them together to build sentences. If you re-read a text or an email and make multiple edits before clicking send, you’re not alone.
To put it simply, written communication isn’t as free flowing, dynamic or instantaneous as speech. This is true for conversational AI interactions as well. The bi-directional flow of information along with the prompt pace of exchange positively influences engagement levels in a voice conversation.
In addition, just as questioning and seeking clarifications are signs of active listening in human conversations, the same features in a voice AI tool increase interactivity. For example, Curious Thing’s AI emulates face-to-face interview experiences by posing questions, listening in real time and following up with in-context clarifications, much like a human interviewer would. The result? A bilateral exchange that offers a positive candidate experience.
We’re quicker at sharing information when speaking compared to writing. Likewise, we listen faster than we read. As a consequence, a voice-enabled medium transmits more information in less time, just like the phone conversation with your friend.
But this is where user experience comes into play. Even with a lot of information to convey, the solution is rarely to front-load the user with everything at once. Say you’re selecting a hotel room for an upcoming trip. Would you choose to have an AI booking assistant run through the list of amenities verbally for each room? That would be counterproductive at the very least. Interacting with a text-based AI in this case lets you read through the information at your own pace and eliminates the need to remember all of it to make a choice, resulting in a smoother experience.
Interactions via a screen can also accommodate visual elements like videos and images that can help in decision-making. This is especially important for certain use cases, such as the hotel booking in the example above or smart retail, where the user is heavily reliant on visual imagery. Although voice is a richer means of interaction when compared solely to text, it only transmits sonic information, which can sometimes be limiting.
To determine which medium of interaction is superior, voice and text need to be considered in conjunction with the use case in question. For instance, if your AI solution is built for urgent situations where the user must provide quick, specific input, a text-based tool is fitting. But if you need to give the user the opportunity to verbalise and express themselves freely, as in a job interview, a voice-activated AI tool satisfies this need better. The aim is to attain a human-AI interaction that not only helps accomplish the task at hand, but offers a user experience that is refined yet feels natural. And so, the medium that allows you to achieve this for your specific use case is the better choice.