Time for a Chatbot Tune-Up

Summary:  Now that you have a little time for introspection, how about reviewing the performance of your chatbots?

Chances are this lockdown period has given you a little more time for introspection and observation than you had before.  Let me suggest that this would be a good time to take a look at how your chatbots are performing and how they might be improved.

There is no shortage of competing KPIs for measuring performance, so let’s focus on those that are most likely to flag a problem, and on those that may not work equally well in all situations.

Starting at the End – Success Rate

Whether you call this Goal Completion Rate or Self-Service Rate, it means that in the end your customer got what they came for without being referred to a human.  The details of this will vary a great deal depending on the objective of your Bot.

Service Bot responses can be either brief or multi-step.  It may require only a step or two to return information on an account balance or the available appointment times on a specific date.  However, Bots that deal in more complex how-to’s, particularly in tech, can have high escalation rates resulting in referrals to human CSRs.

Service Bots are typically trying to solve a customer problem, which can mean some level of customer displeasure or aggravation from the start.  Commerce Bots, on the other hand, are responding to a more positively motivated user.  They may, however, require more interactive steps to match the user with the right service or product, which can also create complications.

Commerce Bot success rates can be seen in successful orders placed or services utilized.  Service Bots, on the other hand, can be trickier to measure.  It’s always a good idea to offer your user a chance to rate their experience at the conclusion of the session, or to simply ask if their problem has been satisfactorily resolved.  A four or five star rating is a good thing.  However, good ratings may also be disguising high abandonment rates by dissatisfied users who never bother to give you the true feedback of a low rating.
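One way to keep good ratings from hiding abandonment is to report the success rate alongside the share of sessions that left any rating at all.  The following is a minimal sketch, not a definitive implementation.  The `Session` record and its field names are hypothetical; your own logs will have their own schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    # Hypothetical fields; adapt to whatever your bot platform logs.
    completed: bool         # the user reached their goal
    escalated: bool         # the session was handed to a human CSR
    rating: Optional[int]   # 1-5 stars, or None if no rating was left


def success_summary(sessions):
    """Return success rate, rating coverage, and mean rating for a batch."""
    n = len(sessions)
    if n == 0:
        return {"success_rate": 0.0, "rating_coverage": 0.0, "mean_rating": None}
    successes = sum(1 for s in sessions if s.completed and not s.escalated)
    ratings = [s.rating for s in sessions if s.rating is not None]
    return {
        "success_rate": successes / n,
        # Low coverage means the mean rating describes only a few users.
        "rating_coverage": len(ratings) / n,
        "mean_rating": sum(ratings) / len(ratings) if ratings else None,
    }


sessions = [
    Session(completed=True, escalated=False, rating=5),
    Session(completed=True, escalated=False, rating=4),
    Session(completed=False, escalated=True, rating=None),
    Session(completed=False, escalated=False, rating=None),  # silent abandonment
]
summary = success_summary(sessions)
print(summary)
```

In this toy batch the mean rating looks excellent, yet only half the sessions succeeded and only half left a rating, which is the pattern worth watching for.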

Retention Rate, the frequency with which a single user returns to use your Bot and the time between those uses, is considered a good measure for Commerce Bots but not necessarily for Service Bots, where frequent repeat uses might indicate ongoing problems.

If you’re using retention rate be sure to look at the time horizon being measured.  If for example the Bot is for game play or music selection you might correctly assume that a short horizon like 7 days would be appropriate since you may be promoting daily returns by your users.  On the other hand, if your Bot is focused on showing employment opportunities you may need to measure a much longer period, for example 30 days, since it may take that long for the user to evaluate and apply for the first presented jobs before returning to look for more.
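To see how much the time horizon matters, here is a small sketch that computes retention for the same data at 7 and 30 days.  It assumes one particular definition (a user is retained if they come back at least once within the horizon after their first session, on a later day); the event format is hypothetical and other definitions are equally reasonable.

```python
from collections import defaultdict
from datetime import date, timedelta


def retention_rate(events, horizon_days):
    """events: iterable of (user_id, date) pairs, one per session.

    Returns the share of users who returned at least once within
    horizon_days after their first session. Same-day repeat sessions
    are not counted as a return (an assumption of this sketch).
    """
    by_user = defaultdict(list)
    for user_id, day in events:
        by_user[user_id].append(day)
    if not by_user:
        return 0.0
    retained = 0
    for days in by_user.values():
        first = min(days)
        cutoff = first + timedelta(days=horizon_days)
        if any(first < d <= cutoff for d in days):
            retained += 1
    return retained / len(by_user)


events = [
    ("a", date(2020, 4, 1)), ("a", date(2020, 4, 3)),    # back after 2 days
    ("b", date(2020, 4, 1)), ("b", date(2020, 4, 21)),   # back after 20 days
    ("c", date(2020, 4, 2)),                             # never returned
]
rate_7 = retention_rate(events, 7)
rate_30 = retention_rate(events, 30)
print(round(rate_7, 2), round(rate_30, 2))
```

User "b" counts as lost at 7 days and retained at 30, so the same Bot can look weak or healthy depending only on the window you choose.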

Types of Problem KPIs

Fall Back Response or Confusion Rate KPIs are a good place to start looking for opportunities for improvement.  Both these terms describe a similar circumstance when your Bot fails to understand the user input and must either ask for additional information, or worse, fail to respond at all.

Obviously a failure to respond at all is a clear fail.  However, capturing your customer in an endless loop of requests for more information or ‘sorry I didn’t understand’ responses is equally bad.

These may be NLP failures but they can also signal a portion of your response strategy that is no longer working, perhaps because customer needs have changed, products or services have changed, or altogether new problem types have surfaced.
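Both the overall fallback rate and the endless-loop case can be pulled from conversation logs.  The sketch below assumes each conversation is stored as a list of booleans, one per Bot turn, where True marks a fallback reply; that representation and the three-in-a-row threshold are illustrative assumptions.

```python
def fallback_rate(conversations):
    """Share of all Bot turns that were fallback ('didn't understand') replies."""
    turns = [turn for conv in conversations for turn in conv]
    return sum(turns) / len(turns) if turns else 0.0


def longest_fallback_run(conv):
    """Length of the longest streak of consecutive fallback turns."""
    longest = current = 0
    for is_fallback in conv:
        current = current + 1 if is_fallback else 0
        longest = max(longest, current)
    return longest


def looping_conversations(conversations, min_run=3):
    """Indexes of conversations with at least min_run fallbacks in a row."""
    return [
        i for i, conv in enumerate(conversations)
        if longest_fallback_run(conv) >= min_run
    ]


conversations = [
    [False, False, False],        # clean conversation
    [False, True, False],         # one clarifying question, then recovery
    [True, True, True, False],    # user stuck in a loop
]
rate = fallback_rate(conversations)
loops = looping_conversations(conversations)
print(rate, loops)
```

Reading the flagged conversations by hand is usually the quickest way to tell an NLP miss from a request your response strategy no longer covers.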

Long Conversations are Good, Right?

Not necessarily.  Depending on the purpose of your Bot, longer sessions and more steps per conversation may be considered a measure of success.  A little statistical insight here, however, can go a long way.

After clustering your interactions by type it’s useful to examine those that are particularly long or particularly short.  The very short ones may actually be user abandonments.

The longer ones also bear looking into.  Did your customer get into an unsatisfactory loop?  Did your Bot need too many clarifying questions?  Did your user ask the same question repeatedly?

Even where your user finally succeeded in getting the desired information after an extra-long session, a customer’s bad experience with your Bot may deter them from returning next time.
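A simple way to surface those unusually short and long sessions is to compare each one with the median for its own interaction type.  This sketch assumes the clustering has already been done and that each session arrives as a hypothetical (session_id, type, steps) tuple; the 0.5x and 2x cut-offs are arbitrary starting points, not recommended values.

```python
from collections import defaultdict
from statistics import median


def flag_unusual_sessions(sessions, short_factor=0.5, long_factor=2.0):
    """sessions: iterable of (session_id, interaction_type, steps).

    Flags sessions that are much shorter or longer than the median
    step count for their own interaction type.
    """
    by_type = defaultdict(list)
    for session_id, kind, steps in sessions:
        by_type[kind].append((session_id, steps))

    flagged = {"short": [], "long": []}
    for items in by_type.values():
        typical = median(steps for _, steps in items)
        for session_id, steps in items:
            if steps < short_factor * typical:
                flagged["short"].append(session_id)   # possible abandonment
            elif steps > long_factor * typical:
                flagged["long"].append(session_id)    # possible loop
    return flagged


sessions = [
    ("s1", "balance", 3), ("s2", "balance", 3),
    ("s3", "balance", 1), ("s4", "balance", 9),
    ("s5", "how_to", 10), ("s6", "how_to", 12), ("s7", "how_to", 11),
]
flagged = flag_unusual_sessions(sessions)
print(flagged)
```

Comparing within type matters: a ten-step how-to session is normal here, while a nine-step balance inquiry is not.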

Voluntary Use versus Prompted Use

In voluntary use your customer initiates the interaction with the Bot as opposed to being prompted by the Bot to communicate.  Voluntary usage rate along with a good retention rate is a good indicator that your Bot is working well and is popular with users.

Usage Distribution by Hour

Your adoption of chatbots has two potential benefits.  The one we most often think of is cost control by reducing our reliance on human CSRs.  But particularly in the case of commerce bots the additional goal is to increase revenue during those hours when human CSRs are either not on duty or available only with significantly reduced capacity.

Tracking your bot’s usage by hour should show how your new 24/7 channel is augmenting your revenue during the portion of the day when your user support services were previously not available.
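As a rough sketch of that tracking, the snippet below counts sessions per hour and reports the share that fall outside staffed hours.  The 9-to-5 staffing window is an assumption for illustration, and timestamps are taken to be already in your local business time zone.

```python
from collections import Counter
from datetime import datetime


def usage_by_hour(timestamps):
    """Count sessions by hour of day (0-23)."""
    return Counter(ts.hour for ts in timestamps)


def off_hours_share(timestamps, open_hour=9, close_hour=17):
    """Share of sessions starting outside [open_hour, close_hour)."""
    if not timestamps:
        return 0.0
    off = sum(1 for ts in timestamps if not (open_hour <= ts.hour < close_hour))
    return off / len(timestamps)


timestamps = [
    datetime(2020, 4, 6, 10, 15),
    datetime(2020, 4, 6, 14, 30),
    datetime(2020, 4, 6, 22, 5),    # after hours
    datetime(2020, 4, 7, 2, 40),    # after hours
]
hourly = usage_by_hour(timestamps)
share = off_hours_share(timestamps)
print(sorted(hourly.items()), share)
```

Joining the off-hours sessions to order data would then tell you how much revenue the 24/7 channel is actually adding.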

AI/ML Learning Rates

We hope that your Bots are equipped with AI/ML routines that constantly monitor your wins and losses and use that fresh data to improve your recommender, targeting, and pricing algorithms.

Beyond that, we suggest that you pick KPIs that will let you measure directly whether that learning is continuing or even accelerating.  Eventually, even in the best designed systems, the rate of improvement will slow and may stop once the algorithms have been fully optimized.  Understanding where you are on that curve is valuable in deciding whether to keep investing resources in that improvement or to move those resources on to the next opportunity.
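One simple way to locate yourself on that curve is to track the period-over-period change in a KPI and check whether recent gains have flattened.  The sketch below uses a weekly success rate as the KPI; the window of three periods and the 0.005 tolerance are assumptions you would tune to your own data.

```python
def improvements(kpi_by_period):
    """Change in the KPI from each period to the next."""
    return [b - a for a, b in zip(kpi_by_period, kpi_by_period[1:])]


def has_plateaued(kpi_by_period, window=3, tolerance=0.005):
    """True if the average gain over the last `window` periods is below tolerance."""
    deltas = improvements(kpi_by_period)
    if len(deltas) < window:
        return False   # not enough history to judge
    recent = deltas[-window:]
    return sum(recent) / window < tolerance


# Illustrative numbers only, not real results.
weekly_success = [0.60, 0.66, 0.70, 0.72, 0.722, 0.723, 0.723]
early = has_plateaued(weekly_success[:4])
late = has_plateaued(weekly_success)
print(early, late)
```

With these made-up figures the early weeks show steady gains, while the last three weeks average about a tenth of a point of improvement, which is the signal to consider moving resources elsewhere.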

About the author:  Bill is Contributing Editor for Data Science Central.  Bill is also President & Chief Data Scientist at Data-Magnum and has practiced as a data scientist since 2001.  His articles have been read more than 2.1 million times.
