I’ve been reading Cathy O’Neil’s interesting and – from a data scientist’s perspective – soul-searching book “Weapons of Math Destruction” (WMD, as used in the book). The book provides several real-world examples of how Big Data and Data Science – when not properly structured – can lead to ethically-wrong unintended consequences.
Chapter 3, “Arms Race: Going to College,” describes how the college ranking system developed by “US News & World Report” in 1983 has created its own self-fulfilling, mis-aligned ecosystem. Because of the influence the “US News & World Report” ranking has on the multi-billion-dollar college recruiting business, a few key metrics – SAT scores, student-teacher ratios, acceptance rates, alumni donations, freshman retention – get over-valued in colleges’ investment strategies.
The unintended consequence is that many colleges focus their investments on overly-opulent facilities and over-paid research faculty programs in an effort to increase their rankings, sometimes at the expense of a more holistic “quality education and enlightening personal experience” for their students.
But the “US News & World Report” ranking is greatly flawed by the omission of several critical metrics. For example, the ranking doesn’t consider price. If cost is not an issue for someone deciding where to go to college, then that’s okay. But for the other 99% of us, cost is an important factor in determining a “quality” educational experience.
And that’s the challenge with AI model biases: if you don’t carefully consider the different variables and metrics against which you need to measure model progress and success, you may end up with AI models that deliver ethically-wrong unintended consequences.
So, how does one mitigate the negative impacts of models that are supposed to represent the real world, but actually provide a dangerously biased and skewed perspective on that world? Here are a couple of things that every organization can do to reduce the ethically-wrong unintended consequences caused by AI models that turn into “Weapons of Math Destruction”:
One way to avoid AI models that deliver unintended consequences is to invest the time upfront to brainstorm a “diverse, sometimes conflicting set of metrics” against which the AI model will seek to optimize. This means embracing a diverse set of stakeholders (a stakeholder map can help to identify the different stakeholders who either impact or are impacted by the AI model) who can provide a diverse set of perspectives on how best to measure the AI model’s progress and success.
To understand why it’s important to capture a diverse and sometimes conflicting set of metrics against which the AI model must seek to optimize, one needs to understand how an AI model (AI Agent) works (see Figure 1):
Bottom-line: the AI Agent determines or learns “right versus wrong” based upon the definition of value as articulated in the AI Utility Function. The AI Utility Function provides the metrics against which the AI model will learn the right actions to take in which situations (see Figure 2).
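To make the Agent-plus-Utility-Function idea concrete, here is a minimal sketch of an agent that “learns right versus wrong” simply by picking the action whose simulated outcome scores highest on its utility function. The actions, outcomes, and reward/cost numbers are hypothetical illustrations, not taken from the book or any specific AI framework.

```python
# Minimal sketch: an AI Agent that chooses actions by maximizing a utility
# function. All actions, outcomes, and numbers below are hypothetical.

def utility(outcome):
    """Score an outcome; the agent treats higher scores as 'more right'."""
    return outcome["reward"] - outcome["cost"]

def choose_action(actions, simulate):
    """Pick the action whose simulated outcome maximizes utility."""
    return max(actions, key=lambda action: utility(simulate(action)))

# Hypothetical candidate actions with their simulated outcomes.
outcomes = {
    "expand_marketing": {"reward": 10.0, "cost": 6.0},  # net utility 4.0
    "improve_support":  {"reward": 8.0,  "cost": 2.0},  # net utility 6.0
}
best = choose_action(list(outcomes), simulate=lambda a: outcomes[a])
print(best)  # improve_support
```

Note that the agent itself is trivial; everything that matters ethically lives inside `utility()`. Change that definition of value and the “right” action changes with it.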
Figure 2: AI Utility Function
To avoid the unintended consequences of a poorly constructed AI Utility Function, collaboration with a diverse set of stakeholders is required to identify the short-term and long-term metrics and KPIs against which AI model progress and success will be measured. The careful weighing of the short-term and long-term metrics associated with the financial/economic, operational, customer, societal, environmental and spiritual dimensions must be taken into consideration if we are to make AI work to the benefit of all stakeholders (and maybe avoid those pesky Terminators in the process).
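One simple way to encode that careful weighing is a weighted sum across the dimensions listed above. The sketch below is a hedged illustration: the weights and per-dimension scores are hypothetical values that a diverse stakeholder group would need to negotiate, not a prescribed formula.

```python
# Hedged sketch of a multi-dimensional AI Utility Function. The dimension
# names mirror the ones discussed above; weights and scores are hypothetical.

# Stakeholder-negotiated weights across conflicting dimensions (sum to 1.0).
WEIGHTS = {
    "financial": 0.30,
    "operational": 0.20,
    "customer": 0.20,
    "societal": 0.15,
    "environmental": 0.15,
}

def utility(scores):
    """Combine per-dimension scores (0 to 1) into a single utility value."""
    return sum(WEIGHTS[dim] * scores.get(dim, 0.0) for dim in WEIGHTS)

# Two candidate model policies: one chases profit, one balances dimensions.
profit_first = {"financial": 0.9, "operational": 0.8, "customer": 0.5,
                "societal": 0.2, "environmental": 0.1}
balanced = {"financial": 0.7, "operational": 0.7, "customer": 0.7,
            "societal": 0.7, "environmental": 0.7}

print(round(utility(profit_first), 3))  # 0.575
print(round(utility(balanced), 3))      # 0.7
```

The point of the exercise isn’t the arithmetic; it’s that the weights make the value trade-offs explicit and debatable, rather than leaving them buried inside the model.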
To help brainstorm this diverse set of metrics, embrace the “Thinking Like a Data Scientist” methodology, which is designed to drive the cross-organizational collaboration necessary to root out and brainstorm these different metrics. The “Thinking Like a Data Scientist” process guides the identification of “diverse, sometimes conflicting metrics” in the data science modeling work, because the real world is full of “diverse, sometimes conflicting metrics” against which it must try to optimize (see Figure 3).
A key deliverable from the “Thinking Like a Data Scientist” process is the Hypothesis Development Canvas. The Hypothesis Development Canvas helps identify the variables and metrics against which one is going to measure the targeted use case’s progress and success. For example: increase financial value, while reducing operational costs and risks, while improving customer satisfaction and likelihood to recommend, while improving societal value and quality of life, while reducing environmental impact and carbon footprint (see Figure 4).
Figure 4: Hypothesis Development Canvas
The AI modeling requirements captured in the Hypothesis Development Canvas then need to be translated into the AI Utility Function that guides the metrics and variables against which the AI model will seek to optimize. Shortcutting the process to define the measures against which to monitor any complicated business initiative is naïve…and could ultimately be dangerous depending upon the costs associated with False Positives and False Negatives.
Unintended consequences can easily occur with an AI model if a thorough, comprehensive exploration of “what could go wrong” isn’t conducted prior to building the AI models, with those costs then integrated into the AI Utility Function. And that brings us into the realm of Type I and Type II errors, or False Positives and False Negatives.
I think most folks struggle to understand Type I (False Positive) and Type II (False Negative) errors, which is why I think Figure 5 summarizes Type I and Type II errors very nicely (he-he-he).
In Figure 5, a Type I Error (False Positive) occurs when the doctor tells the man that he is pregnant, when obviously he can’t be. The Type II Error (False Negative) occurs when the doctor tells the woman that she is NOT pregnant when visual inspection confirms that she is pregnant.
Let’s look at understanding the costs of False Positives and False Negatives using a real-world COVID-19 example. With respect to COVID-19, when one has incomplete data and is trying to buy time in order to get more complete, accurate and trusted data through testing, then the best thing that one can do is to make decisions based upon the costs of the False Positives and False Negatives. In the case of COVID-19, that means weighing the cost of a False Positive (telling someone they are infected when they are not) against the cost of a False Negative (telling someone they are not infected when they actually are):
See the blog “Using Confusion Matrices to Quantify the Cost of Being Wrong” for more homework on understanding the costs associated with False Positives and False Negatives. Maybe some of you can share this blog with some of our elected officials…
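The “cost of being wrong” idea can be sketched in a few lines of code: take the counts from a confusion matrix, assign a cost to each type of error, and add them up. The counts and per-error costs below are hypothetical illustrations, not real COVID-19 figures.

```python
# Minimal sketch: quantifying the cost of being wrong from a confusion
# matrix. All counts and costs below are hypothetical illustrations.

# Confusion-matrix counts from a hypothetical diagnostic test.
true_positives  = 90
false_positives = 40    # Type I error: told infected, actually not
false_negatives = 10    # Type II error: told not infected, actually infected
true_negatives  = 860

# Hypothetical per-error costs: a False Negative (a missed infection that
# can spread) is assumed far costlier than a False Positive (an unneeded
# quarantine).
COST_FALSE_POSITIVE = 500.0
COST_FALSE_NEGATIVE = 25000.0

total_cost = (false_positives * COST_FALSE_POSITIVE
              + false_negatives * COST_FALSE_NEGATIVE)
print(total_cost)  # 270000.0
```

Notice that even though there are four times as many False Positives as False Negatives in this sketch, the False Negatives dominate the total cost – which is exactly the kind of asymmetry a single accuracy number hides.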
Any time you see a very complex, multi-faceted decision that has been boiled down to a single number…WATCH OUT! Creating a single number against which to monitor any complicated business initiative is naïve. Baseball, for example, leverages a bevy of numbers and metrics to determine the value of a particular player, and many of those numbers and metrics – such as Wins above Replacement, Offensive Wins above Replacement, Offensive Runs above Average and W-L Percentage of Offensive Wins above Average – are complex, composite metrics that are composed of additional data and metrics.
In a world more and more driven by AI models, Data Scientists cannot effectively ascertain on their own the costs associated with the unintended consequences of False Positives and False Negatives. Mitigating unintended consequences requires collaboration across a diverse set of stakeholders in order to identify the metrics against which the AI Utility Function will seek to optimize. And these metrics need to represent multiple, sometimes conflicting objectives, including financial/economic, operational, customer, societal, environmental and spiritual objectives. And again, the determination of the metrics that comprise the AI Utility Function is not a Data Scientist’s job alone – unless, of course, you don’t mind herds of Terminators roaming the local mall (I hear that they like sunglasses).