Under the Hood With Chatbots

Summary: This is the second in our chatbot series. Here we explore Natural Language Understanding (NLU), the front end of all chatbots. We’ll discuss the programming necessary to build rules based chatbots and then look at the use of deep learning algorithms that are the basis for AI enabled chatbots.

In our last article, the first in this series about chatbots, we covered the basics, including their brief technological history, uses, basic design choices, and where deep learning comes into play.

In this installment we’ll explore in more depth how Natural Language Understanding (NLU) based on deep neural net RNN/LSTMs enables both rules based and AI chatbots. We’ll look at the method, logic, design choices, and programmatic components that are at work in rules based chatbots. Finally we’ll look at the use of RNN/LSTMs to generate long form responses and even carry out seemingly sophisticated conversations that cross over the Turing Test threshold.

Natural Language Understanding – The Front End for All Chatbots

You may be more used to hearing about NLP (Natural Language Processing), but with chatbots we’re much more interested in the subset of NLP called NLU (Natural Language Understanding). This is the machine’s ability to understand human text or speech and extract the correct meaning, despite the problems of accents, misspelling, mispronunciation, or just an odd way of phrasing the input.

Fortunately, you don’t have to build the NLU from scratch, as all the major development platforms have one built in. These may be proprietary models from the likes of Amazon or IBM, or the open source variety found in libraries like Stanford CoreNLP, the Natural Language Toolkit (NLTK), Apache OpenNLP, or spaCy, among others.

While NLU routines do many things, the ones that are particularly important in supporting chatbots are these:

Named Entity Recognition: Identifies categories of words like a person’s name, a product, a date, or an address.
Normalization: Accounts for common spelling errors, typos, or different pronunciations.
Parts of Speech Tagging: Identifies the parts of speech like nouns, verbs, and adjectives as the foundation for understanding sentence structure and how it will impact meaning.
Dependency Parsing: Identifies subjects, objects, actions and the like to find dependent phrases.
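
To make the Normalization step concrete, here is a minimal sketch using only Python’s standard library. It is not how any particular NLU package works internally; the vocabulary, the function names, and the 0.75 similarity cutoff are all assumptions chosen for illustration.

```python
import difflib

# Hypothetical domain vocabulary; a real NLU would use a much larger lexicon.
VOCABULARY = ["weather", "return", "refund", "appointment", "complaint"]


def normalize_token(token, vocabulary=VOCABULARY, cutoff=0.75):
    """Map a possibly misspelled token to the closest known word, if any."""
    lowered = token.lower()
    # get_close_matches returns the best candidates at or above the cutoff.
    matches = difflib.get_close_matches(lowered, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else lowered


def normalize(text):
    """Normalize every whitespace-separated token in the input."""
    return [normalize_token(tok.strip(".,!?")) for tok in text.split()]
```

Words that are close to a known term are corrected, while unrelated words pass through unchanged in lowercase.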

NLU packages will not have been trained on the proper names, events, places, or even acronyms that are unique to different businesses. In some cases it may still be necessary to add ‘domain-specific dictionaries and ontologies’ to allow the NLU to properly interpret how these unique words and phrases should be understood.

However many NLUs can be trained on-the-fly with a built-in technology called ‘communication in focus’ (CIF). CIF develops what are called ‘context discriminants’ (CD) by reducing complex sentences or those containing unknown words into short word groupings and comparing them to ‘semantic neighbors’ comprised of context and points-of-view on subjects. Comparing the new CD to prior CDs produces higher order derivatives allowing the NLU to interpret entity relationships among previously unknown subjects.

There are other helpful things the NLU can add to the process like Sentiment Analysis. A pre-trained NLU can generally detect enough about the tone of the conversation to know if the user is having a good experience or whether the chatbot should forward the conversation to a human backup operator.
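
As a rough illustration of that hand-off decision, the sketch below keeps a running tone score across a conversation. The word lists, the scoring rule, and the threshold are hypothetical stand-ins; a pre-trained NLU would supply a real sentiment model in place of `tone_score`.

```python
# Hypothetical word lists standing in for a real sentiment model.
NEGATIVE = {"angry", "terrible", "useless", "frustrated", "awful", "ridiculous"}
POSITIVE = {"thanks", "great", "perfect", "helpful", "good"}


def tone_score(message):
    """Count positive words minus negative words in one message."""
    words = [w.strip(".,!?").lower() for w in message.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def should_escalate(messages, threshold=-2):
    """Forward to a human once the running tone falls to the threshold."""
    return sum(tone_score(m) for m in messages) <= threshold
```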

Building a Programmed Response Rules-Based Chatbot

The design goal in building a rules-based chatbot is to lay out in detail all of the possible questions, clarifying information, and responses or actions that you intend your chatbot to be able to handle. That can be a lot of detail and is a good reason to keep your knowledge domain narrow. NLU will take care of words with similar meanings or different ways to phrase the request, but that still leaves a lot of work to do.

Despite this up front work, rules-based continues to be the fastest and easiest way to create a chatbot. For the developer, who clearly needs a team that includes one or more SMEs, the process is neither particularly fast nor easy, but less complex than building an AI-powered bot.

This category of chatbot is growing so rapidly that Gartner recently forecast that by 2020 fully 10% of new IT hires would be writing these scripts.

This type of scripting is commonly called ‘waterfall’ since it is a sequential design process in which higher (earlier) phases of the waterfall fill lower level pools which may also flow into one another. My preferred description of this process is ‘decision tree’ which is more descriptive and more familiar to the data science audience.
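
The decision tree view can be sketched as a nested structure in which each answer selects the next branch. The script content and the `walk` helper below are hypothetical and only meant to show the shape of such a script, not the format of any specific platform.

```python
# Hypothetical script: each node has a prompt and branches keyed by the answer.
SCRIPT = {
    "prompt": "Is this about an existing order?",
    "branches": {
        "yes": {
            "prompt": "Do you want to return the item?",
            "branches": {
                "yes": {"prompt": "I'll start a return for you.", "branches": {}},
                "no": {"prompt": "I'll look up your order status.", "branches": {}},
            },
        },
        "no": {"prompt": "I'll connect you with sales.", "branches": {}},
    },
}


def walk(script, answers):
    """Follow the answers down the tree and return the prompts shown."""
    node = script
    prompts = [node["prompt"]]
    for answer in answers:
        node = node["branches"].get(answer.strip().lower())
        if node is None:
            # No branch matches this answer, so the script stops here.
            prompts.append("Sorry, I didn't understand that.")
            break
        prompts.append(node["prompt"])
    return prompts
```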

You could build a chatbot up from any raw code you prefer but the much easier course of action is to use one of the many, many chatbot platforms that have emerged. These offer step-by-step frames for all the necessary components you will need to define.

You may still need to add lines of code in these platforms, for example to describe the source of external data and how to access that information, or to spell out the steps needed for an action such as ‘make an appointment’. Many platforms have prebuilt modules for actions like ‘make an appointment’ that you can customize. Although this isn’t exactly drag-and-drop, you won’t need to learn the ins and outs of NLP. These platforms lead you through a step-by-step process, gathering the necessary NLU information as you go, and then provide an environment for testing prior to deployment.

Agents – Intents – Entities – Dialog Flow

Agents, intents, entities, and dialog flows are the building blocks of your chatbot. This is not intended to be a deep dive but to give you an understanding of these terms and how they relate.

Agent: The Agent is your chatbot. You can have multiple Agents with different objectives, but a single Agent reflects only the specific tasks and limited knowledge you intend. You might have one Agent that returns the weather, another that schedules an appointment, or another that responds to a customer service complaint. While it is possible to put all three of these objectives into a single Agent, it would be unwieldy at best. So your Agent narrowly defines all the things you want this specific chatbot to be able to do.

The definitions that make up the chatbot are based on Intents, Entities, and the Dialog Flow.

Intent: Mapping ‘User Says’ to Actions

The first thing your platform will prompt you for is a set of Intents. The Intents are a mapping between the user’s natural language request and the actions your chatbot should take.

Think of this as intent detection. The first step is typically mapping “User Says” to an “Action”.

If your chatbot is quite narrow, for example reporting the weather in a particular city on a particular day then the list of possible “User Says” statements is likely to be pretty narrow also. E.g. What’s the weather in (city x) (next Friday).

However if you’re building a customer service chatbot then the natural language requests from the user are likely to be much more varied. You need to populate a set of examples that represent what the user might say. You don’t need to think of all of them since the NLU will use your examples to train itself for similar user statements but the more the better.

Let’s suppose your customer service process typically recognizes three types of requests: complaints, returns, and everything else. (The following examples are drawn from the IBM Watson ‘Build-a-chatbot’ blog.)

So for complaints you might enter examples like:

Can I get some help.
I need this fixed.
I’ve got a problem.
I wish to register a complaint.
Please help me out.
Somethings wrong.

For returns you might enter examples like:

Exchange.
I don’t want this anymore.
I’d like to return this.
I need you to take this back.
I want my money back.
Please take this back.
This parrot is dead. I need to return it.

For the ‘everything else’ category you can let your imagination run free since it’s training the NLU to recognize that it’s not a complaint or return. You can have some fun with this category.

A little bit of this, a little bit of that.
Banana.
Dog walker.
Football is nice.
I wish I was a fish.
Where am I.
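
To show how example statements like those above can drive intent detection, here is a deliberately simple sketch that scores a new message by word overlap (Jaccard similarity) against each example. Real NLUs train statistical models on the examples instead; the overlap measure and the function names here are assumptions made for illustration.

```python
import re

# Training examples taken from the lists above, grouped by intent name.
EXAMPLES = {
    "complaint": ["Can I get some help", "I need this fixed", "I've got a problem",
                  "I wish to register a complaint", "Please help me out"],
    "return": ["I don't want this anymore", "I'd like to return this",
               "I need you to take this back", "I want my money back"],
    "other": ["Banana", "Dog walker", "Football is nice", "Where am I"],
}


def tokens(text):
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    """Share of words the two sets have in common."""
    return len(a & b) / len(a | b) if a | b else 0.0


def classify(message):
    """Return (intent, score) for the example that best overlaps the message."""
    words = tokens(message)
    return max(
        ((intent, jaccard(words, tokens(example)))
         for intent, examples in EXAMPLES.items() for example in examples),
        key=lambda pair: pair[1],
    )
```

The returned score is useful later on: a low best score is a signal that none of the defined intents really fits.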

Users may actually make requests that contain multiple intents in the same message, so it’s possible to assign priorities to different intents as well as to define fallback intents. A fallback intent might be used if your chatbot can’t identify the intent and needs to ask for clarification (“I didn’t understand. Would you please clarify your request?”). Or you could design your chatbot to simply refer the customer to a human CSR if the bot fails to understand the intent (“Let me transfer you to an agent”).
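
One way to picture priorities and fallbacks is the sketch below, which takes a set of intent scores, picks the highest-priority intent that clears a confidence threshold, and otherwise asks for clarification or hands off. The priority table, the 0.5 threshold, and the retry limit are hypothetical values, not defaults from any platform.

```python
FALLBACK_REPLY = "I didn't understand. Would you please clarify your request?"
HANDOFF_REPLY = "Let me transfer you to an agent."

# Hypothetical priorities: the lower number wins when several intents match.
PRIORITY = {"complaint": 0, "return": 1, "other": 2}


def choose_intent(scores, threshold=0.5):
    """Pick the highest-priority intent whose score clears the threshold."""
    detected = [intent for intent, score in scores.items() if score >= threshold]
    if not detected:
        return None
    return min(detected, key=lambda intent: PRIORITY.get(intent, len(PRIORITY)))


def respond(scores, failed_attempts=0, max_attempts=2):
    """Route to an intent, ask for clarification, or hand off to a human."""
    intent = choose_intent(scores)
    if intent is not None:
        return f"intent:{intent}"
    if failed_attempts + 1 >= max_attempts:
        return HANDOFF_REPLY
    return FALLBACK_REPLY
```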

Actions

Based on the Intent mapping your chatbot will now understand that it is to take one or more actions. If they are simple like the weather example or ‘make an appointment’ you may be able to customize prebuilt modules. However your actions can be as complex as you wish to make them which will require custom coding.

For example, a complaint might be answered by a text or verbal response that has looked up the customer’s order and included some of that detail in the response.
It may also then provide directly or by email a prefilled return form and offer some instructions on the return procedure and policy.
In a much more complex case it might search available inventory to see if a replacement item (e.g. a different size or color) is immediately available and offer that as a replacement in order to avoid a return.
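
The custom coding behind actions like these often amounts to a table that maps each intent to a handler function. The sketch below is one possible layout under that assumption; the order and inventory data, the handler names, and the reply wording are all invented for illustration.

```python
# Hypothetical in-memory data standing in for an order system and inventory.
ORDERS = {"A100": {"item": "blue jacket", "size": "M"}}
INVENTORY = {("blue jacket", "L"): 3, ("blue jacket", "M"): 0}


def handle_complaint(order_id=None, **_):
    """Look up the order and include some of its detail in the reply."""
    order = ORDERS.get(order_id)
    if order is None:
        return "I couldn't find that order. Could you check the number?"
    return f"I'm sorry about your {order['item']} (order {order_id}). Let's fix this."


def handle_return(order_id=None, preferred_size=None, **_):
    """Offer an in-stock replacement if possible, otherwise start a return."""
    order = ORDERS.get(order_id)
    if order is None:
        return "I couldn't find that order. Could you check the number?"
    if preferred_size and INVENTORY.get((order["item"], preferred_size), 0) > 0:
        return (f"We have the {order['item']} in size {preferred_size}. "
                "Would you like an exchange instead?")
    return f"I've emailed a prefilled return form for order {order_id}."


# Map each intent to the function that carries out its action.
ACTIONS = {"complaint": handle_complaint, "return": handle_return}


def run_action(intent, **params):
    """Dispatch to the handler for the intent, or hand off if none exists."""
    handler = ACTIONS.get(intent)
    if handler is None:
        return "Let me transfer you to an agent."
    return handler(**params)
```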

Contexts

Chatbot platforms also offer a method of recording prior information that might be required to better understand the actual intent of the current request.

For example, if a user is listening to music and finds a song they like they might input “show me more like that”. Context will have stored the song title, genre, category, artist, and other information so it is able to interpret the request as related to the last song heard.

Similarly, if we are dealing with a smart home device, and the first voice command is “turn on the living room lights”, followed by “turn them off” the context will allow the bot to understand that it is the living room lights that are intended.
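
The smart home example can be sketched with a small context store that remembers the last device mentioned and substitutes it for a pronoun. The parsing here is intentionally naive, and the names `PRONOUNS`, `interpret`, and `last_device` are hypothetical.

```python
PRONOUNS = {"them", "it", "that"}
FILLER = {"turn", "on", "off", "the"}


def interpret(command, context):
    """Return (action, device), resolving pronouns from the stored context."""
    words = command.lower().strip(" .!?").split()
    action = next((w for w in words if w in ("on", "off")), None)
    target = " ".join(w for w in words if w not in FILLER)
    if target in PRONOUNS:
        # Fall back to the device named in an earlier command, if any.
        target = context.get("last_device")
    else:
        context["last_device"] = target
    return action, target
```

Used in sequence, the second command picks up the device named in the first:

```python
context = {}
interpret("turn on the living room lights", context)
interpret("turn them off", context)  # resolves to the living room lights
```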

Entities

Entities are real world objects like products, people, places, dates/times, distance, and category names, among others. NLUs can extract the parameter values from the user’s request by looking for entities, some of which will be system defined but many of which will be defined by you during programming.

Entities can also be conditional, defining a filter like ‘lowest price’ or ‘open now’. Some entities like date/time or distance are typically built into the system and don’t need to be separately defined. Your product names and types, however, would have to be input and kept updated.
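
A minimal sketch of entity extraction might combine a developer-defined product list, a few conditional filters, and a pattern for a built-in type such as dates. The product names, filter keys, and the restriction to ISO-style dates are assumptions made to keep the example short.

```python
import re

# Hypothetical developer-defined entities, kept in sync with the catalog.
PRODUCTS = {"jacket", "boots", "backpack"}
FILTERS = {"lowest price": "sort_by_price", "open now": "open_now"}

# Simplified stand-in for a system entity: ISO dates such as 2024-05-17.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def extract_entities(message):
    """Pull product names, filters, and dates out of a user message."""
    text = message.lower()
    words = set(re.findall(r"[a-z]+", text))
    return {
        "products": sorted(PRODUCTS & words),
        "filters": sorted(v for k, v in FILTERS.items() if k in text),
        "dates": DATE_PATTERN.findall(text),
    }
```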

Dialog Flow

Dialog flow is the area of both logic and artistry. You want your chatbot’s dialog to seem natural and conversational. You also want your chatbot’s dialog to be logical, that is, to lead to the user’s satisfaction in the fewest possible steps.

Many chatbots can be constructed with simple linear dialogs, but many others will require non-linear or branching dialogs, for example:

If your chatbot is a customer satisfaction survey and you ask the user to rate their experience as excellent, good, fair, or poor, chances are the next questions you will pose are different for each category and branching is required.
If your chatbot didn’t understand the request you may have several different loops of conversation that the chatbot will pose in order to clarify the user’s intent and required action.
One branch that is almost always present is referral to a live agent. This happens when the chatbot can’t understand the intent or doesn’t have sufficient information to answer, or if the user says a keyword like ‘agent’. In some cases NLU sentiment analysis may detect that the user is becoming frustrated or angry and you may create an exit to a human agent when this is detected.
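
The branching cases above can be sketched as a small state machine: each state lists which answers lead where, an unrecognized answer repeats the question, and repeated failures or the keyword ‘agent’ exit to a human. The state names, the prompts, and the retry limit are hypothetical.

```python
# Hypothetical survey flow: each state has a prompt and answer-to-state branches.
FLOW = {
    "rate": {
        "prompt": "How was your experience: excellent, good, fair, or poor?",
        "next": {"excellent": "praise", "good": "praise",
                 "fair": "improve", "poor": "improve"},
    },
    "praise": {"prompt": "Great! What did you like most?", "next": {}},
    "improve": {"prompt": "Sorry to hear that. What should we do better?", "next": {}},
    "agent": {"prompt": "Let me transfer you to an agent.", "next": {}},
}


def next_state(state, user_input, retries, max_retries=2):
    """Return (new_state, retries) after one user turn."""
    answer = user_input.strip().lower()
    if "agent" in answer:
        return "agent", retries          # keyword exit to a live agent
    target = FLOW[state]["next"].get(answer)
    if target is not None:
        return target, 0                 # recognized answer: take the branch
    if retries + 1 >= max_retries:
        return "agent", retries + 1      # too many failed attempts: hand off
    return state, retries + 1            # stay put so the prompt is repeated
```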

Building Deep-Learning Generative AI-Driven Chatbots

In our first article we described how Generative AI Chatbots are just now being introduced based on RNN/LSTM architectures that can intake long-form complex sentences, retain the context, and provide long form or multi-part responses. All of this occurs without a programmer having to define intents, actions, responses, entities, or dialog flow.

This may be the wave of the future but it’s only just now getting underway. We mentioned Andrew Ng’s recent announcement of Woebot, a chatbot capable of psychological counseling for depression.

Another instructive example was discussed in a research paper (“A Neural Conversational Model”) by two Google researchers last year. Their goal was to create a chatbot that could converse with users to resolve difficult IT support questions. Two important points come from this experience.

As with all CNN and RNN models, you need a lot of data on which to train. Fortunately, there were a lot of computer help desk logs available, containing years of history about different problems and their solutions.

Remember we said that all chatbots required a reasonably closed domain. Computer help desk problems may be a very large domain but it is essentially closed and the knowledge base could be defined by the past support logs.

The second and more fun feature is that the researchers needed to teach their generative chatbot to speak conversational English. So in addition to training on the technical knowledge base, they also trained it on a public dataset of 2,000 screenplays from which it learned modern conversational English.

The result was reported as quite successful with the system able to take in complex multipart spoken language and correctly determine the intent as well as the cause. It could also then respond in complex and multipart conversation so that the user could make interjections or ask for clarification if a particular corrective step wasn’t understood or was too complex.

Yes, there are examples of generative chatbots today, but this is mostly the domain of chatbots yet to come. In the meantime, the path to rules-based chatbots is well developed, even as new as it is, and ready for your project.
