
What Is Conversational AI?

CT Team, January 18, 2021

Artificial intelligence (AI) is making our everyday lives easier.

You no longer need to swipe or click to play music. “Alexa, play my favourite song,” will do the trick.

Looking to open a ticket after business hours to return a pair of shoes? The retailer’s Interactive Voice Response (IVR) has you covered.

But AI-powered solutions that you can interact with in human-like ways go beyond just voice assistants and IVR. 

In this blog, we explain what conversational AI is and broadly categorise solutions that fall within this family.

What is conversational AI?

Conversational AI is a set of technologies that enable computers to understand and process natural language inputs so they can ‘talk’ with humans. Put simply, they allow us to interact with machines in the same way that we do with other humans.

Unlike computer languages, which follow defined rules and ultimately operate in zeros and ones, human language is highly nuanced. We draw on shared knowledge and experiences, and we rely on that context both when we speak and when we interpret meaning. We use synonyms, analogies and jargon, pronounce words differently and have diverse accents. We even use sarcasm!

In comparison, computers are used to seeing things in black and white (figuratively of course!).

The differences in how we communicate add texture and colour to our conversations but make it challenging for computers to understand us. So, the value of conversational AI is in bridging this communication gap between humans and computers to achieve more natural and useful interactions.

But conversational AI is a broad umbrella term and the solutions within it vary greatly. 

Categorising conversational AI solutions

Conversational AI solutions can be classified in a number of ways. For ease of understanding, we’ve grouped them along four simple dimensions: purpose, intelligence, real-timeness and medium.

Purpose

The vast majority of conversational AI tools are designed to answer inquiries or help us accomplish certain tasks. They react to human commands and are essentially conversational user interfaces to an information retrieval system or a form-filling process.

Examples of tools at this end of the spectrum include FAQ bots and task fulfilment chatbots.

FAQ bots simply return the best matches to user inquiries. They treat the user’s input as complete and don’t prompt for clarification or ask follow-up questions. Similarly, task fulfilment chatbots that use an intent detection and template slot-filling model ask questions only to elicit the responses needed to fill a form. Every piece of information requested corresponds to a blank that needs to be filled, so their questioning ability is limited by the scope of the form. They’re passive tools that wait to be activated by the user and ask predefined questions because they need to.
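The intent detection and slot-filling model described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular product works: the keyword-based `detect_intent`, the `SlotFillingBot` class and the return form in `RETURN_FORM_SLOTS` are all hypothetical, and production bots typically use trained intent classifiers and entity extractors. Note that the bot only ever asks the questions its form defines.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical keyword lists for intent detection (real bots usually use a trained classifier).
INTENT_KEYWORDS: Dict[str, set] = {
    "return_item": {"return", "refund", "exchange"},
}

# Hypothetical form template: each slot is a blank the bot must fill, paired with its fixed prompt.
RETURN_FORM_SLOTS: Dict[str, str] = {
    "order_number": "What is your order number?",
    "item": "Which item would you like to return?",
    "reason": "Why are you returning it?",
}


def detect_intent(utterance: str) -> Optional[str]:
    """Return the first intent whose keywords appear in the utterance, or None."""
    words = set(utterance.lower().split())
    for intent, keywords in INTENT_KEYWORDS.items():
        if words & keywords:
            return intent
    return None


@dataclass
class SlotFillingBot:
    slots: Dict[str, str]
    filled: Dict[str, str] = field(default_factory=dict)

    def next_question(self) -> Optional[str]:
        """Prompt for the first empty slot; None means the form is complete and the bot stops asking."""
        for name, prompt in self.slots.items():
            if name not in self.filled:
                return prompt
        return None

    def answer(self, reply: str) -> None:
        """Store the reply in the first empty slot, i.e. the one that was just asked about."""
        for name in self.slots:
            if name not in self.filled:
                self.filled[name] = reply.strip()
                return


def run_form(bot: SlotFillingBot, replies: List[str]) -> List[str]:
    """Drive the bot with scripted replies (one per slot) and return the questions it asked, in order."""
    asked: List[str] = []
    remaining = iter(replies)
    question = bot.next_question()
    while question is not None:
        asked.append(question)
        bot.answer(next(remaining))
        question = bot.next_question()
    return asked


if __name__ == "__main__":
    bot = SlotFillingBot(slots=RETURN_FORM_SLOTS)
    if detect_intent("I want to return my shoes") == "return_item":
        print(run_form(bot, ["A123", "black sneakers", "too small"]))
        print(bot.filled)
```

Once every slot is filled, `next_question` returns `None` and the conversation ends; the bot has no reason, or ability, to ask anything outside the template.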

In contrast, conversational AI solutions at the opposite end of this spectrum proactively ask questions to discover information and gather knowledge from humans. In other words, they drive the conversation and ask questions because they want to.

For example, when screening candidates, Curious Thing’s AI technology leads the conversation by asking open- and closed-ended questions about topics ranging from skills and experience to job expectations and ambitions. The AI is optimised to uncover a candidate’s behavioural traits and capabilities. It has the unique ability to dynamically generate in-context questions on the fly based on the candidate’s responses, which in turn serves the recruiter’s interest in getting to know the candidate better.

Intelligence

The ability to retain context in a conversation and conduct complex interactions reflects the level of intelligence of conversational AI solutions.

On one end of this spectrum are AI solutions that can only hold one-off, single-turn conversations and on the other end are more sophisticated tools that are capable of engaging in complex, multi-turn conversations.

Conversational turn-taking is the back-and-forth nature of orderly communication, much like people taking turns when speaking.

A single-turn conversation involves one back-and-forth exchange. For example, if you ask a weather bot for tomorrow’s forecast in Melbourne, it doesn’t need any additional information to execute your request. It provides the information in one response and the conversation is marked complete.

But if you followed up with “How about Sydney?”, the bot would most likely not understand. This is because an AI solution that can only hold single-turn interactions doesn’t retain context, so it won’t know that you’re still asking about the weather.

Such AI solutions work well for simple tasks like playing a song or sending a text where the user usually provides the information in one go. But some questions can’t be answered in a single turn. The user might ask a query that needs to be refined or filtered before it can be actioned. This is where multi-turn conversations are essential. 

A multi-turn conversation is a dialogue with a series of back-and-forth exchanges. Such conversations are complex because the AI needs to retain context over the course of the discussion and apply it to understand and fulfil the user’s intent.

Tools that achieve this, like Google Duplex, are far more conversational and can handle comparatively elaborate tasks. They can even ask follow-up questions to get more information and steer the user back to the topic if they digress.
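The difference between single-turn and multi-turn behaviour in the weather example comes down to whether any state survives between turns. The Python sketch below assumes a deliberately crude keyword parser (`parse`, `KNOWN_CITIES`) and hypothetical function and class names; it is only meant to show how carrying over a stored intent lets a bot answer “How about Sydney?” and ask a follow-up question when a detail is missing.

```python
from typing import Dict, Optional, Tuple

# Hypothetical city list; a real bot would use a location entity extractor.
KNOWN_CITIES: Tuple[str, ...] = ("melbourne", "sydney", "brisbane")


def parse(utterance: str) -> Dict[str, Optional[str]]:
    """Crude parser: spot a weather intent and a known city name, if present."""
    text = utterance.lower()
    intent = "weather" if ("weather" in text or "forecast" in text) else None
    city = next((c for c in KNOWN_CITIES if c in text), None)
    return {"intent": intent, "city": city}


def forecast_reply(city: str) -> str:
    return f"Fetching tomorrow's forecast for {city.title()}."


def single_turn_reply(utterance: str) -> str:
    """Stateless: every utterance must carry the full request on its own."""
    parsed = parse(utterance)
    if parsed["intent"] == "weather" and parsed["city"]:
        return forecast_reply(parsed["city"])
    return "Sorry, I didn't understand that."


class MultiTurnWeatherBot:
    """Stateful: remembers the intent and city from earlier turns."""

    def __init__(self) -> None:
        self.context: Dict[str, Optional[str]] = {"intent": None, "city": None}

    def reply(self, utterance: str) -> str:
        # New information overrides the stored context; missing pieces are carried over.
        for key, value in parse(utterance).items():
            if value is not None:
                self.context[key] = value
        if self.context["intent"] == "weather" and self.context["city"]:
            return forecast_reply(self.context["city"])
        if self.context["intent"] == "weather":
            # Follow-up question to gather the missing detail.
            return "Which city would you like the forecast for?"
        return "Sorry, I didn't understand that."
```

With `single_turn_reply`, “How about Sydney?” fails because the utterance contains no weather intent. `MultiTurnWeatherBot` keeps the `weather` intent from the previous turn and only replaces the city.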

Real-timeness

Humans in conversation listen, process and respond in real time. However, not all AI solutions work this way. Conversational AI tools can therefore also be categorised as asynchronous or synchronous.

Asynchronous communication is when parties engage in conversation at different times. Likewise, asynchronous conversational AI tools participate in one-directional conversations and process information only once the interaction is complete. Such tools usually require fewer conversational turns since all the information is packaged into a single input and provided to the AI.

If you’ve applied for a job in recent months, you’ve most likely completed a video interview. When you begin, you’re shown a series of predefined questions on your screen. You record an answer to each question in turn and submit them, and only then does the AI start processing the recordings.

On the other hand, synchronous communication happens in real time. A synchronous conversational AI interaction is two-way, with the user and the AI participating at the same time. Like a human, the AI listens, understands and responds, and can potentially deliver results within minutes.
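The practical difference between the two styles is when the analysis runs. In this Python sketch, `analyse` is a hypothetical stand-in for real speech or video analysis (it simply counts words), and the `log` list exists only to record the order of events. “Asynchronous” and “synchronous” here describe the conversation, not Python’s `asyncio`.

```python
from typing import Iterable, List


def analyse(chunk: str) -> int:
    """Hypothetical stand-in for real analysis (e.g. transcription and scoring): counts words."""
    return len(chunk.split())


def process_asynchronously(chunks: Iterable[str], log: List[str]) -> int:
    """Collect the whole recording first; analysis starts only after the speaker is done."""
    recording: List[str] = []
    for chunk in chunks:
        recording.append(chunk)
        log.append("received")
    log.append("speaker done")
    total = 0
    for chunk in recording:
        total += analyse(chunk)
        log.append("analysed")
    return total


def process_synchronously(chunks: Iterable[str], log: List[str]) -> int:
    """Analyse each chunk as it arrives, so nothing is left to do when the speaker stops."""
    total = 0
    for chunk in chunks:
        log.append("received")
        total += analyse(chunk)
        log.append("analysed")
    log.append("speaker done")
    return total
```

Both functions produce the same result; the asynchronous version does all of its analysis after the speaker finishes, while the synchronous version has already done it by then.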

For example, Curious Thing’s AI voice interviewer interacts in real time. As you speak, the AI processes your input and attributes meaning to it. Say you’re interviewing for a developer role and the AI asks if you know Python. If you answer no, the AI understands your reply and skips the next question, which requires Python knowledge.

Because it’s processing while you speak, your results are ready within seconds of you completing.
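The question-skipping behaviour in the Python example can be illustrated with a minimal rule-based sketch. This is not Curious Thing’s implementation: each question may carry a condition on earlier answers and is skipped when that condition fails. The `Question` type, `INTERVIEW_PLAN`, `is_yes` and `run_interview` are hypothetical names, and the naive yes/no matching stands in for real language understanding.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Answers = Dict[str, str]


@dataclass
class Question:
    key: str
    text: str
    # Optional condition on earlier answers; the question is skipped when it returns False.
    ask_if: Optional[Callable[[Answers], bool]] = None


def is_yes(reply: str) -> bool:
    """Naive yes-detector; a real system would interpret free-form speech."""
    return reply.strip().lower().rstrip(".!") in {"yes", "yeah", "yep", "y"}


# Hypothetical interview plan for a developer role.
INTERVIEW_PLAN: List[Question] = [
    Question("knows_python", "Do you know Python?"),
    Question(
        "python_detail",
        "Which Python libraries have you used in production?",
        ask_if=lambda answers: is_yes(answers.get("knows_python", "")),
    ),
    Question("expectations", "What are you looking for in your next role?"),
]


def run_interview(
    plan: List[Question], get_answer: Callable[[str], str]
) -> Tuple[Answers, List[str]]:
    """Ask each question in order, re-checking its condition against answers given so far."""
    answers: Answers = {}
    asked: List[str] = []
    for question in plan:
        if question.ask_if is not None and not question.ask_if(answers):
            continue  # prerequisite not met, so skip this question
        asked.append(question.text)
        answers[question.key] = get_answer(question.text)
    return answers, asked
```

`get_answer` is any function that takes a question and returns the candidate’s reply, which keeps the flow logic separate from how speech is captured.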

Medium

Conversational AI tools can also be classified based on the medium of interaction, i.e. voice or text.

Text-based solutions live on messaging platforms like SMS or in web-based applications. We interact with them the same way we use text-based tools like WhatsApp: through a screen, via chat. An example is Uber’s bot that lets you book a ride on Facebook Messenger.

With voice-based solutions like Siri, however, we converse verbally. And the way we communicate differs between talking and texting. For instance, we’re more likely to be verbose when speaking but to use shorter sentences when texting.

Although voice is a richer medium of communication, it’s more varied and harder for computers to comprehend than text. As a result, conversational AI solutions that can process speech generally sit at the higher end of the intelligence spectrum.