
Warning! Get Ready for Regulations that Restrict Your Analytics

Summary:  AI and predictive analytics are now so prevalent in our day-to-day lives that they have attracted the attention of government, particularly over how certain groups might be adversely impacted.  As data scientists we naturally want free rein to let the data speak as best it can, but as this report from the White House shows, we need to be prepared for some pushback.

 

The last 10 years have been glorious days for data science and predictive analytics, as new algorithmic techniques and faster, cheaper computational platforms have allowed massive improvements in all elements of our lives and businesses.  Whether you call it Big Data, predictive analytics, data science, or artificial intelligence, these are all part of the driving force behind huge advances in convenience, efficiency, productivity, better use of resources, expanded access to knowledge and opportunity, and profitability.

As data scientists we have enjoyed an ever-expanding array of tools and techniques that are increasingly powerful and can achieve previously unobtainable results, like blending voice, text, image, and social media data into our models.  And the businesses we support have incorporated those advances into better and better products and services.

The ‘man on the street’ today would probably credit most of those things to AI (artificial intelligence), but inside the profession we know the landscape is more complex than that.

Our bedrock belief, however, is that, properly interpreted, the numbers don’t lie.  ‘Properly interpreted’ is perhaps a big caveat, but all the same, our models are objective reflections of the data, not opinion.  They are ground truth.  And our goal has been to find increasingly accurate signals in the data that power improvement in all areas of business, personal experience, and society at large.

If there is a better tool, we want to use it.  If the result can be made more accurate, we want to find it. The accuracy and value of our work is judged by adoption, business utility, and observable outcomes.

Unfortunately, those days of unfettered freedom to pursue better results with better data, tools, and techniques are coming to an end.

Outrageous, you say!  How can this possibly be true?  Who will stop me?  The answer is government regulation that has been brewing for some time and that, over the next several years, will increasingly restrict what businesses can and cannot do with data science.

 

Why is this a Problem Now?

We are about to become victims of our own success.  Predictive analytics, whether embedded in company operating systems or in public-facing AI apps, has become so widely adopted that we are now big enough to be a target.

The argument is that purely objective responses guided by machine learning are sensitive only to their immediate goals of efficiency and profitability, not to social goals.  The government argues that oversight is necessary to push back against the unintentional discrimination against, and exclusion of, certain individuals.

If you are in lending or insurance you already know that certain types of data identifying us as members of protected groups, based on factors like sex, age, race, and religion, can’t be used in building decision models that impact customers.  Although this may have resulted in suboptimal decisions about price and risk, which equates to higher costs for all of us, our legislators said it was for our own good.  For the most part no one has pushed back.  As it turns out, this was just the camel getting its nose into the tent.
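To make the current practice concrete, here is a minimal sketch of the ‘drop the protected columns’ approach a lender might take today.  The file name, column names, and target are hypothetical illustrations, not any regulator-prescribed procedure:

```python
# Minimal sketch: exclude protected attributes before fitting a lending model.
# The file name and all column names here are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression

PROTECTED = ["sex", "age", "race", "religion"]  # attributes barred from the model

applicants = pd.read_csv("loan_applications.csv")
X = applicants.drop(columns=PROTECTED + ["defaulted"])  # features minus target
y = applicants["defaulted"]

model = LogisticRegression(max_iter=1000).fit(X, y)
```

Note that dropping the columns does not drop the signal: remaining features such as zip code can act as proxies for the protected attributes, which is precisely the gap the report’s authors want to close.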

 

Where is this Coming From?

Before you decide that this is some sort of hyperbolic rant, please take a look at this report, “Big Data: A Report on Algorithmic Systems, Opportunity, and Civil Rights”.  It was published directly by the Executive Office of the President in May 2016 and details exactly what wrongs they will seek to redress, and broadly how.

The report starts off by crediting the many benefits of Big Data and algorithmic systems in “removing inappropriate human bias and to improve lives, make the economy work better, and save taxpayer dollars” (part of the original 2014 study, to which I was a contributor).

By the second paragraph, the real message emerges:

“As data-driven services become increasingly ubiquitous, and as we come to depend on them more and more, we must address concerns about intentional or implicit biases that may emerge from both the data and the algorithms used as well as the impact they may have on the user and society. Questions of transparency arise when companies, institutions, and organizations use algorithmic systems and automated processes to inform decisions that affect our lives, such as whether or not we qualify for credit or employment opportunities, or which financial, employment and housing advertisements we see.”

 

You Can’t Attack It Unless You Name It

What exactly is it that causes data scientists to be conduits for social injustice?  The report has given our sin a name:

“Data fundamentalism—the belief that numbers cannot lie and always represent objective truth—can present serious and obfuscated bias problems that negatively impact people’s lives.”

Anyone with experience in debate knows that you can’t rally the uninformed behind a cause as broad as ‘the social implications of using mathematical algorithms in decision making’. You have to have something catchy, and “Data Fundamentalism” is our new professional fatal flaw.

 

What Exactly Is Being Proposed?

The report offers a wide variety of examples in which specific groups might hypothetically be disadvantaged by data-science-driven decision making.  I emphasize ‘might hypothetically’ since the report offers nothing in the way of evidence that any of this is actually occurring.  No wronged person or group has yet sought redress against mathematics in a court of law.

Here are a few condensed examples, taken directly from the study, of potential discrimination, its causes, and the restrictions that might be necessary to address it.

1. GPS Apps Providing the Fastest Route to a Destination:  This disadvantages groups who do not own smartphones or who rely on public transportation or bicycles.  Since routes are defined in terms of roads navigated by cars, there is a selection bias discriminating against low-income users.  The proposed solution is to mandate inclusion of non-automotive routes with constantly updated public transportation schedules.

2. Hiring for Cultural Fit:  It is an acknowledged best practice that, aside from specific skills, companies should hire for cultural fit.  This results in employees who are more likely to get along, be retained longer, and be more productive.  Elements of cultural fit have been embedded in the resume-screening programs many HR departments use to decide efficiently which candidates to interview.  But as the study contends:  “In a workplace populated primarily with young white men, for example, an algorithmic system designed primarily to hire for culture fit (without taking into account other hiring goals, such as diversity of experience and perspective) might disproportionally recommend hiring more white men because they score best on fitting in with the culture.”  This perpetuates historical bias in hiring.  This type of screening might be disallowed unless certain diversity goals were achieved and reported.

3. Any Screening or Scoring System: Any system used for selecting or excluding any consumer, potential student, job candidate, defendant, or member of the public cannot be secret or proprietary, must be transparent (interpretable) to the affected person, and both the data used and the scoring technique must be open to direct challenge.  Clearly this would extend even to scoring systems that recommend one product or service to a specific customer instead of another.  It directly challenges the data used as well as the technique, and directly calls out non-interpretable but more accurate techniques like neural nets, SVMs, and ensembles (a sketch contrasting interpretable and black-box scorers follows this list).

4. Simple Recommenders and Search Engines:  These may be conduits of discrimination if the advertisements or products shown to a specific individual can be found to be limiting or mis-targeted.  The report specifically calls out the potential for discrimination when “algorithms used to recommend such content may inadvertently restrict the flow of information to certain groups, leaving them without the same opportunities for economic access and inclusion as others.”

5. Credit Scoring:  The report contends that despite the demonstrable success of credit scoring systems in appropriately pricing and evaluating the risk of loans, they are a primary source of discrimination.  This arises because “30 percent of consumers in low-income neighborhoods are “credit invisible” and the credit records of another 15 percent are unscorable.”  The proposed solution is to mandate the use of alternative scoring systems, not yet developed, that might rely on such factors as: “phone bills, public records, previous addresses, educational background, and tax records, … location data derived from use of cellphones, information gleaned from social media platforms, purchasing preferences via online shopping histories, and even the speeds at which applicants scroll through personal finance websites.”

6. College Student Admission:  Like the hiring process, many colleges use screening apps designed to identify applicants with the best cultural fit and the highest likelihood of success.  These would be disallowed on the grounds that they restrict access for less qualified groups who, after personal review, might be admitted.
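To make the transparency demand in example 3 concrete (as flagged above), here is a minimal sketch contrasting a scorer whose reasoning can be read back to an affected person with a more accurate but opaque one.  The data is synthetic and the feature names are hypothetical:

```python
# Sketch: interpretable scorer vs. black box, on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                        # toy applicant features
y = (X[:, 0] - 0.5 * X[:, 2] + rng.normal(size=1000) > 0).astype(int)
features = ["income", "debt_ratio", "delinquencies"]  # hypothetical names

# Interpretable: each coefficient can be explained to the affected person.
lr = LogisticRegression().fit(X, y)
for name, coef in zip(features, lr.coef_[0]):
    print(f"{name}: {coef:+.2f}")

# Often more accurate, but offering no comparable readout to challenge.
gbm = GradientBoostingClassifier().fit(X, y)
print("ensemble training accuracy:", gbm.score(X, y))
```

Under a transparency mandate, the first model can answer ‘why was I declined?’; the second, however accurate, cannot, and that rather than accuracy becomes the deciding criterion.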

These are only a half-dozen of the many examples offered over the 29 pages of the report.  In short, however, you can see that they challenge the most fundamental tenets of data science by:

  • Restricting the types of data that can be used.
  • Making the user responsible for any perceived inaccuracy in the data or its interpretation.
  • Creating a mechanism by which any person impacted, however tangentially, by a data-driven algorithm can challenge the result.
  • Allowing only algorithms that are interpretable, regardless of the loss of accuracy and business efficiency that practice introduces.
  • Requiring application developers to directly incorporate the needs of any individual who might use the app, regardless of whether that person was part of the target audience.
  • Making our recommenders and search engines intentionally inefficient by requiring a detuning of specificity to provide broader access to information and recommendations.
  • Essentially invalidating the use of scoring algorithms in recommending a product or service to a specific customer who might challenge that decision.

 

Is There Any Good News Here?

Unless you happen to be completely aligned with the goals of total social inclusivity, it is difficult to find any good news here.  If implemented as stated, this would undo years of advancement in model accuracy and the broad use of data.  It would also impose very direct costs on a large number of companies whose current data-aided processes would be made intentionally less efficient.

I imagine many readers are wondering how long this will take to actually become regulation – probably piecemeal over four or five years.  Others may be wondering if they can fly under the radar for an extended period – that probably depends on how deep your pockets are, since it is only a matter of time before the plaintiff’s bar awakens to this new opportunity.

You will notice that many of the arguments made in the study are based on the principle of disparate impact (measuring fairness by equality of outcome rather than equality of opportunity).  The new administration coming to Washington in January is probably less eager to promote regulation and more philosophically inclined toward tests of equal opportunity.
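For readers who have not worked with it, here is a minimal sketch of how disparate impact is typically quantified, via the so-called four-fifths rule.  The group counts below are made up for illustration:

```python
# Sketch of the four-fifths (80%) rule: compare selection *rates* between
# groups; a ratio below ~0.8 is the conventional red flag for disparate impact.
def disparate_impact_ratio(selected_a, total_a, selected_b, total_b):
    rate_a = selected_a / total_a
    rate_b = selected_b / total_b
    return min(rate_a, rate_b) / max(rate_a, rate_b)

# Hypothetical: 50 of 200 group-A applicants approved vs. 90 of 200 in group B.
print(disparate_impact_ratio(50, 200, 90, 200))  # ~0.56, fails the 80% test
```

Note that a model can pass every accuracy test and still fail this one; that is exactly the shift from equal opportunity to equal outcome that the study makes.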

However, now that the services built on our work have such significant impact on society, these reviews and the potential for regulation will not go away.  Tractica, in its research report 10 AI Trends to Watch in 2017, predicts that there will be an “AI Czar” in the federal government by 2020.  And this isn’t only a U.S. issue: Tractica predicts that essentially all of the developed countries will move to centralize the review and control of AI on the same time scale.  If you do business internationally you will probably face the same uncoordinated and contradictory regulatory environment that exists broadly for the internet today.

 

 

About the author:  Bill Vorhies is Editorial Director for Data Science Central and has practiced as a data scientist and commercial predictive modeler since 2001.  He can be reached at:

[email protected]




Comment by Mike O'Neil on December 22, 2016 at 12:06pm

It seems to me that anyone who claims the title of scientist would accept that his or her work could be scrutinized by other scientists, at the least to be reproducible. If the product of the work was flawed and led to a disadvantageous outcome for an individual, then that individual would have grounds to feel aggrieved. The claim that an algorithm or its data is "proprietary" and therefore cannot be subject to this basic principle of scientific method deserves derision.

Comment by Rebecca Barber, PhD on December 12, 2016 at 12:31pm

Models build predictions of the future based on data from the past, meaning all of the negative externalities of the past are baked in.  No one questions the accuracy of the math in the model.  What is being questioned here and in "Weapons of Math Destruction" is the way in which these models and their builders ignore the impact the models have on real people.  Not just white, upper-middle-class people like the builders.  ProPublica did an excellent review of the disproportionate impact of algorithms on some disadvantaged groups (https://www.propublica.org/series/machine-bias), starting with the recidivism models used by judges in sentencing hearings, that is well worth a read.

No one is arguing that models shouldn't be used, but they are arguing that if a human being would be condemned for doing something (sentencing a poor black man on his first conviction to a longer sentence than a working-class white serial offender), then we should condemn algorithms that do that as well.  Our models reflect the history we feed them, and that history reflects the human bias that was in operation when the data was collected.  We can never claim, therefore, that the algorithms are bias-free.  They learn what we give them, and they are learning it better every day, but if society as a whole feels that what they are learning is wrong, unfair, or inappropriate, it is up to us as data scientists to figure out how to either screen that out or at least make it transparent.

Comment by Javier Alonso on December 10, 2016 at 4:15pm

The very first discrimination when looking for a job: Age.

I know of people 60 years old and 7 years unemployed, and even if their resumé is several pages thick and they could line their house's walls with academic and other titles and diplomas, it is impossible for them to find a job (in Spain, right now), even below their qualifications and salary aspirations.

No headhunter here would care for them, in spite of their experience (the most expensive quality to attain) or their desire to work, be useful, or even pay the bills and live in some comfort.

Comment by Timo Elliott on December 8, 2016 at 11:03pm

Here's how I see it: data can be used as a weapon; civilized countries put controls on weapons; government is the right body to adjudicate the tradeoffs and controls that may be needed. Our job is to add to that debate.

Society routinely puts in protections that restrict "freedom", from forcing us to wear crash helmets and seatbelts to how much money a bank has to hold in reserve. 

There are many real and troubling examples of how badly implemented algorithms can do harm. The ProPublica article Machine Bias is a good example, where

"The formula was particularly likely to falsely flag black defendants as future criminals, wrongly labeling them this way at almost twice the rate as white defendants. And white defendants were mislabeled as low risk more often than black defendants."

But the company that makes the software disagrees with that conclusion, and the algorithm is secret, so we can't tell. Is that OK or not? I believe it's certainly important to have the debate!

And Carina C. Zona's presentation on the Consequences of an Insightful Algorithm is a great summary of the potential problems we will see more often in the future.

One big danger of algorithms is "bias-washing" -- they are based on prior data, and if that data is generated by humans, it's going to contain bias of some kind. We already have laws to fight bias -- why wouldn't they apply to the algorithms, too?  For example, imagine a racist landlord decided to use an algorithm to choose tenants, and that algorithm just happened to never choose a black person? Wouldn't society have an interest in being able to take a look at that algorithm? Of course, reality will never be that black and white (apologies for the pun), but more subtle biases could be all the more insidious.

Personally, I'm terrified at the potential for data to be misused by people in positions of power. Today's data scientists are like physicists on the Manhattan project. They should go in with their eyes open, and take dissent very seriously indeed. 

Overall, I find it very good news indeed that these types of concern are being taken seriously! 

Comment by Rubens Zimbres on December 7, 2016 at 9:23am

This is very interesting. Taking a similar example: the law and lawyers do not understand derivatives, known as "advanced financial instruments". These derivatives caused the 2008 crisis, affecting millions of people and causing foreclosures, suicides, bankruptcies, etc. However, governments are not interested in regulating the financial industry. What is that? Do governments want a monopoly over Deep Learning?

Comment by Vincent Granville on December 6, 2016 at 7:04pm

If the people designing the algorithms have put more intelligence into their systems than a court of law can grasp, I am not sure what the outcome will be. It might look like a witch hunt against sophisticated algorithms that no one really understands except a few computer scientists, who make too much money anyway to waste their time testifying in court against their peers.
