The consumer-business relationship has changed greatly in recent years. Consumers have more choices, and businesses spend time trying to keep up with their changing needs. Consumers choose where to spend their cash depending on their experiences. It’s therefore important for businesses to foster the right kind of relationships to gain and maintain customer trust. Chatbots can help businesses maintain relationships with customers and improve business processes. The chatbot market is also growing quickly, with the global market projected to exceed $994 million by 2024. According to Servion, AI platforms will drive three-quarters of all customer service interactions by 2025. At some point, consumers may even prefer machines over humans.

EXISTING AGENTS AND FRICTIONS

Large-scale conversational AI agents such as Alexa, Siri, and Google Assistant are becoming increasingly prevalent, opening up in new domains and taking on new tasks to help users across the globe. One key consideration in designing such systems is how they can be improved over time at that scale. Users interacting with these agents experience friction for various reasons: 1) Automatic Speech Recognition (ASR) errors, such as “play maj and dragons” (should be “play imagine dragons”), 2) Natural Language Understanding (NLU) errors, such as “don’t play this song again skip” (Alexa would understand if it were formulated as “thumbs down this song”), and 3) user errors, such as “play bazzi angel” (it should have been “play beautiful by bazzi”). Fixing these frictions helps users have a more seamless experience and engage more with the AI agents.

Today, most of these AI agents are built using manually annotated data to train the different components of the system, including ASR, NLU and entity resolution (ER). Typically, the accuracy of the machine learning models in these components is improved by manually transcribing and annotating data. As the scope of these systems increases to cover more scenarios and domains, manual annotation to improve the accuracy of these components becomes prohibitively costly and time-consuming. As conversational agents become more popular and grow into new scopes, it is critical for these systems to have self-learning mechanisms that fix recurring issues continuously with minimal human intervention.

SELF-LEARNING METHODS

FEEDBACK BASED

We first intercept the input being passed to the system and rewrite it with our reformulation engine. We then pass the rewrite, in lieu of the original input, back to the system for interpretation, thus restoring the original data flow. The reformulation engine implements a lightweight service-oriented architecture that encapsulates access to a high-performance, low-latency database, which is queried with the original input to yield its corresponding rewrite candidate. This, along with the fact that the system is fundamentally stateless across users, translates to a scalable customer-facing system.
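To make the data flow concrete, here is a minimal sketch of the intercept-rewrite-forward step. It assumes an in-memory dictionary in place of the low-latency database; the class name `ReformulationEngine`, the `handle_request` helper, and the normalization step are hypothetical and not taken from any production system.

```python
from typing import Callable, Dict


class ReformulationEngine:
    """Stateless lookup of rewrites keyed by the normalized original utterance."""

    def __init__(self, rewrite_table: Dict[str, str]) -> None:
        # Assumption: a plain dict stands in for the low-latency database.
        self._table = {self._normalize(k): v for k, v in rewrite_table.items()}

    @staticmethod
    def _normalize(utterance: str) -> str:
        # Lowercase and collapse whitespace so lookups are robust to spacing.
        return " ".join(utterance.lower().split())

    def rewrite(self, utterance: str) -> str:
        """Return the stored rewrite, or the original utterance if none exists."""
        return self._table.get(self._normalize(utterance), utterance)


def handle_request(utterance: str,
                   engine: ReformulationEngine,
                   interpret: Callable[[str], str]) -> str:
    # Intercept the input, rewrite it, then hand the rewrite to the
    # downstream interpreter in place of the original input.
    return interpret(engine.rewrite(utterance))


engine = ReformulationEngine({"play maj and dragons": "play imagine dragons"})
print(engine.rewrite("Play  maj and dragons"))  # play imagine dragons
print(engine.rewrite("play jazz"))              # play jazz (no rewrite stored)
```

Because the engine holds no per-user state, any number of identical instances could serve requests side by side, which is the property the scalability claim rests on.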

To discover new rewrite candidates and maintain the viability of existing rewrites, a Markov-based model ingests the log data daily to learn from users’ reformulations and then updates the database. This ingestion-to-update process takes place entirely offline, with the rewrites in the database updated via a low-maintenance feature-toggling (i.e. feature-flag) mechanism. We also have an offline blacklisting mechanism that evaluates the rewrites from the Markov model by independently comparing their friction rate against that of the original utterance, and prevents them from being uploaded to the database if they perform worse than their no-rewrite counterpart. This allows us to maintain a high-precision system at runtime. It is worth mentioning that friction detection is done using a pre-trained ML model based on the user’s input and the agent’s response.
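The offline loop described above can be sketched roughly as follows. This is a deliberate simplification: plain first-order transition counts stand in for the Markov-based model, and the log format, function names, and friction labels are all hypothetical. In the real system the friction label would come from the pre-trained friction detector.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical log format: each session is a list of (utterance, had_friction).
Session = List[Tuple[str, bool]]


def mine_candidates(sessions: List[Session]) -> Dict[str, str]:
    """Map each friction utterance to the successful utterance that most often follows it."""
    transitions: Dict[str, Counter] = defaultdict(Counter)
    for session in sessions:
        for (prev, prev_friction), (nxt, nxt_friction) in zip(session, session[1:]):
            if prev_friction and not nxt_friction and prev != nxt:
                transitions[prev][nxt] += 1
    return {src: counts.most_common(1)[0][0] for src, counts in transitions.items()}


def friction_rates(sessions: List[Session]) -> Dict[str, float]:
    """Fraction of occurrences of each utterance that ended in friction."""
    totals: Counter = Counter()
    frictions: Counter = Counter()
    for session in sessions:
        for utterance, had_friction in session:
            totals[utterance] += 1
            frictions[utterance] += int(had_friction)
    return {u: frictions[u] / totals[u] for u in totals}


def filter_rewrites(candidates: Dict[str, str],
                    rates: Dict[str, float]) -> Dict[str, str]:
    """Keep a rewrite only if its friction rate is strictly lower than the original's."""
    # Utterances with no observed rate are treated as worst case (rate 1.0).
    return {src: tgt for src, tgt in candidates.items()
            if rates.get(tgt, 1.0) < rates.get(src, 1.0)}


sessions = [
    [("play maj and dragons", True), ("play imagine dragons", False)],
    [("play maj and dragons", True), ("play imagine dragons", False)],
    [("play maj and dragons", True), ("stop", False)],
    [("play imagine dragons", False)],
]
approved = filter_rewrites(mine_candidates(sessions), friction_rates(sessions))
print(approved)  # {'play maj and dragons': 'play imagine dragons'}
```

Only the `approved` mapping would be uploaded to the runtime database; a candidate whose friction rate is no better than the original's is dropped, which is what keeps precision high.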

KNOWLEDGE SERVICE

Demand analysis identified a need for specialised digital assistants in customer-facing processes. We focus on the kind of questions that can be answered based on linear rules which take a finite set of parameter-value pairs as input. We assume that most customer support inquiries fall within this domain, i.e. questions around coverage and pricing.
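As an illustration of one possible reading of “linear rules”, the sketch below answers a pricing question as a base amount plus a weighted sum of parameter values. The rule, the parameter names, and all numbers are invented for the example and do not come from any real contract.

```python
from typing import Dict

# Hypothetical rule: premium = base + sum(weight * value) over known parameters.
PRICING_RULE = {
    "base": 50.0,
    "weights": {"insured_value_k": 2.0, "driver_under_25": 30.0},
}


def quote(params: Dict[str, float], rule: dict = PRICING_RULE) -> float:
    """Evaluate a linear pricing rule over a finite set of parameter-value pairs."""
    missing = set(rule["weights"]) - set(params)
    if missing:
        # The agent would ask the customer for these values before answering.
        raise ValueError(f"missing parameters: {sorted(missing)}")
    return rule["base"] + sum(
        weight * params[name] for name, weight in rule["weights"].items()
    )


print(quote({"insured_value_k": 20, "driver_under_25": 1}))  # 120.0
```

A rule of this shape gives one answer per combination of inputs without storing each combination as a separate static answer.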

A promising approach to advancing this field is the use of comprehensive knowledge engineering methodologies, which back the agent and upgrade its conversational capabilities from small talk to domain expert. Modelling contracts bottom-up as self-executing contracts, also known as “smart contracts”, is appealing due to its large potential for automation and its resilience to tampering.

However, it is extremely hard to apply this concept to real-world contracts based on existing contract law provisions because smart contracts are “indifferent to the fundamental legal principles, such as lawfulness, fairness, and protection of the weaker party”. Matching inquiries around a contract using a conventional FAQ bot would be an option, in particular when using a deep learning neural model for answer selection. However, static answers can quickly become complex and hard to maintain because the number of possible parameter combinations is unbounded.

DIALOGUE SYSTEMS

The newness of chatbot technology may get people hooked. But users aren’t likely to stay engaged if a chatbot’s answers are boring or flat. Researchers dived into why so many chatbots need improvement. They found that many of them gave repetitive, nonsensical or contradictory responses. Those problems remind us that no matter how lifelike chatbots sometimes seem, they’re still machines.

While the content needs to be minimal, the bot can have a personality of its own. The script will keep the answers on track, but the bot’s personality can create a relationship between it and the user. A personality also adds flavour to the exchange, making it more authentic to the user. Companies can choose a personality that fits their brand and interacts well with their specific demographic.

DYNAMIC SPEAKER MODEL 

The model is based on two motivations. First, a speaker’s utterances reflect intents, speaking style, etc. Thus, we may build speaker embeddings by analysing latent modes that characterise utterances in terms of such characteristics, apart from topic-related interests a user might have. Second, the information about a speaker is accumulated as the conversation evolves, which allows us to gradually refine and update the speaker embeddings. The speaker embeddings can be directly used as features or fine-tuned based on the downstream tasks. We design the dynamic speaker model to focus on learning cues from the speaker’s utterances, and leave the modelling of different speaker-addressee interactions for supervised downstream tasks.
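The second motivation, refining a speaker embedding as the conversation evolves, can be illustrated with a toy update rule. The sketch below uses an exponential moving average over per-utterance vectors, which is a stand-in chosen for simplicity and not the learned update of the model described here; the class name, the `decay` parameter, and the input vectors are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


class SpeakerState:
    """Keeps one running embedding per speaker, updated after every utterance."""

    def __init__(self, decay: float = 0.8) -> None:
        # decay close to 1.0 means the embedding changes slowly.
        self.decay = decay
        self.embeddings: Dict[str, Vector] = {}

    def update(self, speaker: str, utterance_vec: Vector) -> Vector:
        previous = self.embeddings.get(speaker)
        if previous is None:
            # First utterance: the embedding is just the utterance vector.
            updated = list(utterance_vec)
        else:
            if len(previous) != len(utterance_vec):
                raise ValueError("utterance vector has the wrong dimension")
            updated = [
                self.decay * p + (1.0 - self.decay) * u
                for p, u in zip(previous, utterance_vec)
            ]
        self.embeddings[speaker] = updated
        return updated


state = SpeakerState(decay=0.5)
state.update("user", [1.0, 0.0])
print(state.update("user", [0.0, 1.0]))  # [0.5, 0.5]
```

The resulting vector could then be fed as a feature to a downstream classifier, in the spirit of the "directly used as features" option above.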

USER TOPIC DECISION PREDICTION

We first study a prediction task that estimates whether the user engaged in a social-bot conversation would accept a suggested topic. Due to privacy concerns, the social-bot does not have access to any identity information about users. Also, since each device may be used by multiple users, the device address is not a reliable indicator of the user ID. Therefore, the ability to profile the user through one conversational interaction is desirable for guiding the social-bot’s dialog policy.

DIALOG ACT CLASSIFICATION

Dialog act analysis, which is widely used for conversations, identifies the illocutionary force of a speaker’s utterance, following speech act theory. A dialog act classification model is proposed to learn speaker embeddings that are updated as the conversation evolves. The model achieves promising results on two representative tasks in dialogs: user topic decision prediction in human-social-bot conversations and dialog act classification in human-human conversations.

CHALLENGE 

Neural dialogue models have been widely adopted in various chatbot applications because of their good performance in simulating and generalising human conversations. However, these models have a dark side: due to the vulnerability of neural networks, a neural dialogue model can be manipulated by users into saying what they want, which raises concerns about the security of practical chatbot services.

Given that the internal mechanisms of neural networks are not explicitly interpretable, neural dialogue models are vulnerable. For example, they may behave unpredictably in response to some well-crafted inputs. This vulnerability can cause a series of problems, one of which is whether we can manipulate a dialogue agent to say what we want. In other words, can we find a well-designed input that induces the agent to provide a desired output? If this is possible, people with ulterior motives may take advantage of this weakness to guide chatbots to say something malicious or sensitive, causing adverse social impacts.

Deep learning models have proven useful for many tasks across a plethora of domains. More recently, however, researchers have become aware that although these systems perform extremely well in a perfect and stable environment (the type they were designed in), they are quite susceptible to attack when placed in the real world.

SOLUTION

REVERSE DIALOG GENERATOR

We can consider the specific neural dialogue model D of interest as a black-box environment. The dialogue model D takes an input sentence I and outputs a corresponding response sentence O. Given a targeted output O∗, our goal is to find a well-designed input I that leads the dialogue model D to output a response O that is exactly the same as O∗ (O = O∗), or at least semantically similar to O∗ (O ≈ O∗). To achieve this, we build a Reverse Dialogue Generator agent Gθ, which takes the targeted output O∗ as input and predicts its corresponding input I.
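The black-box setup can be illustrated with a toy example. In the sketch below, an exhaustive search over a small pool of candidate inputs stands in for the learned generator Gθ, and token overlap stands in for semantic similarity; `toy_model`, `token_overlap`, and `find_input` are hypothetical names, and the only access to D is by calling it.

```python
from typing import Callable, Iterable, Tuple


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase token sets; a crude proxy for O ≈ O*."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 1.0


def find_input(dialogue_model: Callable[[str], str],
               target: str,
               candidates: Iterable[str]) -> Tuple[str, float]:
    """Query the black-box model with each candidate I and keep the best match to O*."""
    best_input, best_score = "", -1.0
    for candidate in candidates:
        score = token_overlap(dialogue_model(candidate), target)
        if score > best_score:
            best_input, best_score = candidate, score
    return best_input, best_score


def toy_model(text: str) -> str:
    # Stand-in for the black-box dialogue model D; internals are never inspected.
    replies = {
        "hello": "hi there",
        "how are you": "i am fine",
        "bye": "see you later",
    }
    return replies.get(text.lower().strip(), "i do not know")


print(find_input(toy_model, "I am fine", ["hello", "how are you", "bye"]))
# ('how are you', 1.0)
```

A learned generator replaces the fixed candidate pool with inputs it composes itself, using a similarity score of this kind as the signal for how close D's response came to the target.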

BLACK-BOX NEURAL DIALOGUE MODEL

In the practical implementation of dialogue systems, neural dialogue models play an important role. However, concerns have recently arisen for neural models across all domains as to whether they can be manipulated (in most cases, by crafted adversarial examples), which inspires us to examine the same problem in neural dialogue models. They are likely to be manipulated to “say what I want”: by finding well-designed inputs, adversaries can induce the dialogue agent to provide desired outputs. We can use a reinforcement-learning-based Reverse Dialogue Generator, which learns to craft such inputs automatically in the process of interacting with the black-box neural dialogue model.

FINAL THOUGHTS

IMPROVED CUSTOMER EXPERIENCE 

Chatbots offer consumers 24/7 end-user support without a wait time. Consumers can readily get answers to questions and complaints, and this improves their experiences. 

PERSONALISATION

AI chatbots can collect data and enable your business to offer personalised experiences. 

KEEPS GETTING BETTER OVER TIME

They also store customer data, which makes it easy to handle customers who have communicated with the company before. They can also learn more about the consumer’s likes and dislikes, how to keep them engaged, which products to cross-sell and up-sell, what to recommend, and more. AI chatbots get better the more they communicate with consumers.

EFFECTIVE AND EFFICIENT

Using AI chatbots is cheaper than hiring employees to handle huge volumes of consumer queries. They also improve operational efficiency, which makes it easy and fast to meet consumer needs.

Employees can then spend more time handling crucial tasks, which means the business can save on training costs, hiring costs, and payroll.

PRONE TO ATTACKS

Using AI chatbots is the way to go for most business models today, but their neural networks are also prone to attacks: unless they are secured by a backup security feature, adversaries can essentially make them say whatever they want.