For people working in Artificial Intelligence, the term “Human-in-the-Loop” is familiar: a human is kept in the process to validate and improve the AI. It applies in many situations, as many as there are AI applications. However, even within a single application, there are several distinct ways it can be deployed.
Contact Center Example
Let’s take as an example the automation of a contact center. A customer submits a ticket, which is fielded by a Helpdesk Agent who triages and routes it to the appropriate department, where a Case Handler resolves the query and passes it back to the customer to close off the ticket. For products or services which don’t require specialist case handlers, the Helpdesk Agent and Case Handler are the same person looking after both tasks, e.g. in a retail bank or airline, where the Agent would do almost all the tasks apart from escalations and complaints.
Automation can help to save costs for the contact center but primarily can be used to increase customer satisfaction by speeding up responses and reducing customer effort.
Automation possibilities for the contact center fall broadly into three categories:
- Speeding up or automating the Helpdesk Agent i.e. the staff who capture and triage queries;
- Speeding up or automating the Case Handler i.e. the staff who resolve queries;
- Increasing self-service automation i.e. chatbots, searchable FAQs and self-help tools
AI has limitations which mean it cannot guarantee the accuracy required and expected by customers. Some of these limitations are undoubtedly temporary, e.g. the comprehension capabilities of speech recognition, which will continue to improve; but some are fundamental to the nature of how machine learning bots work.
All machine learning relies on learning from real-life training data and then using the patterns therein to predict or classify new data. The training data needs to be ‘labelled’, i.e. have an outcome or class (aka “tag”) assigned to it by a human judge. Predictions then come from models which assign labels on a probabilistic basis, based on the closest pattern match. For example, if a query comes in that says “My server has crashed and is showing a blank screen.” then the model will assign the best label in its training set, which might be “server crashed”. However, you can see even in this example that there might also be a label “faulty screen”, and the customer would be justifiably annoyed to be dealt with as though they merely had a faulty screen. So this is an example of potential ambiguity. Furthermore, new issues will appear from new product launches, changes in quality and changes in the market. Lastly, the way people describe or view the same problem will be more variable for certain issues than for others.
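To make the ambiguity concrete, here is a toy sketch of probabilistic label assignment. A real system would use a trained statistical model; the crude keyword-overlap scoring, label names and training phrases below are all invented for illustration.

```python
# Illustrative only: a toy keyword-overlap "classifier" standing in for a real
# trained model. Labels and training phrases are invented for this example.
from collections import Counter

TRAINING = {
    "server crashed": ["my server has crashed", "server went down suddenly"],
    "faulty screen":  ["the screen is blank", "display shows nothing"],
}

def tokens(text):
    return set(text.lower().replace(".", "").split())

def classify(query):
    """Return labels scored by crude token overlap, best first."""
    q = tokens(query)
    scores = Counter()
    for label, examples in TRAINING.items():
        for ex in examples:
            scores[label] += len(q & tokens(ex))
    total = sum(scores.values()) or 1
    # Normalise to pseudo-probabilities: note the model *always* picks something.
    return [(label, score / total) for label, score in scores.most_common()]

print(classify("My server has crashed and is showing a blank screen."))
# Both labels score: "server crashed" wins, but "faulty screen" is plausible too.
```

The point is that both labels attract probability mass from the same query, so whichever the model picks, it may be the wrong one from the customer’s point of view.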
So it’s clear that the performance of the machine learning models, however accurate to start with, will degrade over time by an unknown amount. No contact center or customer experience executive in their right mind would let bots loose unsupervised. Getting things wrong is not good for customer satisfaction, and the naïve way that bots get things wrong has potentially more damaging reputational consequences than a human error, which might at least appear justifiable.
The only safe way of deploying bots within a contact center is to have a human-in-the-loop somewhere, i.e. someone to validate what the bots are doing and preferably with minimal impact back to the customer.
So who and where is the human-in-the-loop? It turns out that there are four general possibilities for humans to validate part or all of the process:
- The Helpdesk Agent or Case Handler e.g. by validating suggested responses before sending
- The Customer can validate the response, or indeed validate the question they asked was comprehended
- A third-party Solution Provider can check the performance of the bots is satisfactory and curate the process (this might be an internal or external data science team)
- The Knowledge Base Manager can check the performance of the bots is satisfactory
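All four options share the same underlying mechanic: the bot acts alone only when it is confident, and otherwise defers to a human. A minimal sketch, in which the threshold value and record fields are illustrative assumptions rather than anything from a real product:

```python
# Sketch of the routing decision behind every human-in-the-loop setup:
# act automatically above a confidence cut-off, otherwise ask a human.
CONFIDENCE_THRESHOLD = 0.80  # assumed value; tuned per deployment in practice

def route(prediction, confidence):
    """Decide whether a bot prediction can be sent or needs human validation."""
    if confidence >= CONFIDENCE_THRESHOLD:
        return {"action": "auto_send", "label": prediction}
    return {"action": "human_review", "label": prediction}

print(route("server crashed", 0.95))  # confident: handled automatically
print(route("faulty screen", 0.55))   # uncertain: escalated to a human
```

The four options differ mainly in *who* receives the `human_review` cases and what they do with them.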
More on Different Humans-in-the-Loop
There are pros and cons of different Human-in-the-Loop approaches. Some of these points are quite technical in nature, but with substantial implications.
Agent or Handler
This seems like an obvious place to start. There are solutions on the market whose AI recommends the ‘next best response’ for the Agent or Handler (just “Agent” for brevity here), trained on previous examples. If the Agent is presented with the top 3 responses, each with a confidence score, then they can choose to accept one or pen a new one. Naively, it seems that this is enough to maintain the system. However, validating the recommended options in this ‘reinforcement’ mode carries an inherent training bias compared to assigning a query freely to any category. Whilst the technical explanation is outside the scope of this document, this process will not by itself maintain performance without other humans in the loop elsewhere, e.g. a solution provider.
Secondly, in this setup, the Agents are validating the response, not the categorization. For example, the two queries “The strawberries I bought were tasteless” and “The strawberries I bought made me sick” might both lead to the same recommended response: “We’re really sorry, please accept our voucher”. You can see that the categorization models will degrade, as they are not being updated unless the Agent also validates the category, which adds an extra task and defeats the automation.
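The strawberries example can be sketched directly. The category names below are invented; the point is that two distinct categories collapse to one response, so an Agent approving the response confirms nothing about which category the model chose.

```python
# Why validating responses does not validate categories: two different
# (invented) categories map to the same recommended response, so the Agent's
# approval of the response tells the model nothing about the categorization.
CATEGORY_TO_RESPONSE = {
    "product_quality": "We're really sorry, please accept our voucher.",
    "food_safety":     "We're really sorry, please accept our voucher.",
}

def recommend(category):
    return CATEGORY_TO_RESPONSE[category]

# Both categorizations look identical from the Agent's point of view...
assert recommend("product_quality") == recommend("food_safety")
# ...yet "food_safety" should trigger very different insight and follow-up.
```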
This situation is accentuated where the Agent and Handler are two different roles, e.g. a technical helpdesk or complaints handling as referred to above. There are likely to be a variety of different root causes (i.e. responses) for the same set of symptoms (i.e. queries), and indeed vice versa. So in these situations it might not even be possible to have a workable machine learning model that predicts root causes from queries. Far better that the model classifies the symptoms correctly, with the resolution then derived from sets of logical diagnostic rules in a fault tree.
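The split argued for above can be sketched as: machine learning classifies the *symptom*, and deterministic hand-written rules then pick the resolution. The fault tree below is an invented example, not a real diagnostic procedure.

```python
# Sketch: classified symptom -> deterministic fault-tree rules -> resolution.
# The tree contents are invented for illustration.
FAULT_TREE = {
    "server crashed": [
        ("power light off", "Check power supply and cabling."),
        ("power light on",  "Collect logs and escalate to the server team."),
    ],
}

def resolve(symptom, observed_condition):
    """Walk the rule list for a classified symptom to find a resolution."""
    for condition, resolution in FAULT_TREE.get(symptom, []):
        if condition == observed_condition:
            return resolution
    return "No rule matched: escalate to a Case Handler."

print(resolve("server crashed", "power light off"))
# → Check power supply and cabling.
```

Because the rules are explicit, the Knowledge Base Manager can update them directly when products change, without retraining any model.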
Thirdly, the process is still in-line, meaning that it doesn’t remove agents from the loop, although it does speed them up, so you might need fewer. It will reach a new equilibrium though.
Lastly, there is a broader issue with not maintaining the query models: the insight they generate (i.e. how many tasteless strawberries, how much sickness) is as valuable as the automation itself. It is this insight which allows customer service and customer experience executives to monitor product quality, design and usability, and indeed to generate the self-service tools which can obviate the contact center traffic in the first place (we refer to this again below).
In conclusion, this solution has a lot of merit as a first step towards automation in certain situations, but it does not self-maintain the system without other humans-in-the-loop elsewhere.
Customer
If customers are providing the required validation, then on the one hand it is perfectly scalable from the business’ point of view, but on the other, customers may not be delighted to have to validate or correct their original query or the responses. If the query falls into a new category, then some process for dealing with that needs to be in place. There is also the risk of misunderstanding or mischief from a party who is ultimately not under the control of the supplier and does not necessarily have aligned interests.
Fundamentally the system cannot be relied upon just with these humans-in-the-loop.
Solution Provider
This is the status quo for nearly all machine learning deployments in real-world environments: there is a data science team, either internal or third-party, who set up, curate and retrain the models on a regular basis to maintain their performance. The pro is that these are the only humans-in-the-loop required, given that their remit is holistic. The cons are that data scientists are in short supply, and tying up their time with work much of which is retraining and labelling data, or overseeing it, is time that could undoubtedly be spent on more worthy causes. If it is a third party, it also introduces a dependency. There is potentially also a disconnect: many machine learning projects fail because the business evolves, and new signals or process reorganizations can happen quicker than the solution provider’s refresh period.
Knowledge Base Manager
This role has possibly the most hidden potential for benefit as a human-in-the-loop, without being in-line within the process. In a non-technical environment, they will provide the business rules on how to handle queries (strictly speaking this might be another department, but it is included here as the strategic function of the contact center), and in a technical one, they will provide the training, trouble-shooting guides and fault trees required to resolve issues.
In terms of their day-to-day role, they will be aware of forthcoming product launches and modifications, but they can also use the rich insight of the labels coming from the contact center (both triage and resolution) to make improvements to the knowledge base, as well as to the process, e.g. updating the FAQs so that customers can self-serve more queries, thus obviating rather than automating them (a higher goal). This insight can also be fed back to other functions noted above, such as product quality, product design and customer experience, to help drive improvements there as well.
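One simple way a Knowledge Base Manager might mine the labels to decide which FAQ entries to add is by volume. The labels and the threshold below are invented for illustration.

```python
# Sketch: use label frequencies from the contact center to prioritise
# self-serve FAQ entries. Labels and threshold are invented examples.
from collections import Counter

def faq_candidates(labels, min_volume=3):
    """Return labels frequent enough to justify a self-serve FAQ entry."""
    counts = Counter(labels)
    return [label for label, n in counts.most_common() if n >= min_volume]

week_of_labels = ["reset password"] * 5 + ["tasteless strawberries"] * 3 + ["server crashed"]
print(faq_candidates(week_of_labels))
# → ['reset password', 'tasteless strawberries']
```

High-volume labels become FAQ updates (obviating queries); low-volume outliers stay with human handlers.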
Different Approach – Optimized System
There is an alternative to the approaches above which requires only a few humans-in-the-loop, made possible by a novel type of technology called Optimized Learning. This is a form of machine learning which builds models as before, but invites training from a human in a way designed to minimize the human input needed for maximum performance. It is therefore ideal for spotting early warnings of new signals, as well as improving the models that need it. It doesn’t need to be in-line and suffers from none of the biases above. It requires a fraction of the labelling otherwise required (like for like, just a few percent), even in a changing environment. The implications of this are profound: only a few Agents need be retained after the automation is implemented, and they would do the training that the Optimized Learning invited them to do in an offline capacity. This would maintain the models for labelling the queries, generating both automation and insight, i.e. both speeding up and obviating issues.
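The document doesn’t specify how Optimized Learning works internally; as a hedged sketch, here is one related, well-known idea from the active-learning literature: uncertainty sampling, where only the queries the model is least sure about are sent to a human, keeping labelling to a small fraction of the traffic. All names and numbers below are illustrative.

```python
# Sketch of uncertainty sampling (an active-learning idea, offered only as an
# analogy for "inviting training" efficiently; not the vendor's actual method).
def select_for_human(predictions, budget=2):
    """Pick the `budget` least-confident predictions for human labelling.

    `predictions` is a list of (query, top_label, confidence) tuples.
    """
    ranked = sorted(predictions, key=lambda p: p[2])  # least confident first
    return [query for query, _, _ in ranked[:budget]]

stream = [
    ("My server crashed", "server crashed", 0.96),
    ("Weird smell from the box", "product_quality", 0.41),
    ("App logs me out", "login issue", 0.88),
    ("Strawberries taste odd??", "product_quality", 0.52),
]
print(select_for_human(stream))
# → ['Weird smell from the box', 'Strawberries taste odd??']
```

The confident predictions flow through untouched; human effort is concentrated exactly where the models are weakest, which is also where new signals first appear.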
The rest of the automation would be generated by the rules originating from the Knowledge Base Manager, as informed by the bots too. It also paves the way for improving chatbots and self-serve, searchable FAQs in order to obviate the contact center.
Automation of contact centers holds promise, although not without humans-in-the-loop somewhere in the system to maintain performance. There are many different flavors of human-in-the-loop, and with some novel technology appearing, an optimized system is possible with the minimum number of humans and without any data science skills. There is now no reason why the contact centers of the future need to look like those of the present, and the same goes for the possibilities of customer experience.