
GDPR and the Paradox of Interpretability

Summary:  GDPR carries many new data and privacy requirements, including a “right to explanation”.  On the surface this appears to be similar to US rules for regulated industries.  We examine why this is actually a penalty and not a benefit for the individual, and offer some insight into the actual wording of the GDPR, which also offers some relief.

 

GDPR is now just about 60 days away, and there’s plenty to pay attention to, especially in getting and maintaining permission to use a subscriber’s data.  It’s a tough problem.

If you’re an existing social media platform like Facebook, that’s a huge change to your basic architecture.  If you’re just starting out in the EU, there are some new third-party offerings that promise to keep track of things for you (Integris, Kogni, and Waterline all emphasized this feature at the Strata Data San Jose conference this month).

The item that we keep coming back to, however, is the “right to explanation”.

In the US we’re not strangers to this requirement if you’re in one of the regulated industries like lending or insurance, where you already bear this burden.  US regulators have been pretty specific that this means restricting machine learning techniques to the simplest regression and decision trees, which have the characteristic of being easy to understand and are therefore judged to be ‘interpretable’.

Recently University of Washington professor and leading AI researcher Pedro Domingos created more than a little controversy when he tweeted “Starting May 25, the European Union will require algorithms to explain their output, making deep learning illegal”.  Can this be true?

 

Trading Accuracy for Interpretability Adds Cost

The most significant issue is that restricting ourselves to basic GLM and decision trees directly trades accuracy for interpretability.  As we all know, very small changes in model accuracy can leverage into much larger increases in the success of different types of campaigns and decision criteria.  Intentionally forgoing the benefit of that incremental accuracy imposes a cost on all of society.

We just barely dodged the bullet of ‘disparate impact’ that was to have been a cornerstone of new regulation proposed by the last administration.  Fortunately those proposed regs were abandoned as profoundly unscientific.

Still the costs of dumbing down analytics keep coming.  Basel II and Dodd-Frank place a great deal of emphasis on financial institutions constantly evaluating and adjusting their capital requirements for risk of all sorts. 

This has become so important that larger institutions have had to establish independent Model Validation Groups (MVGs), separate from their operational predictive analytics teams, whose sole role is to constantly challenge whether the models in use are consistent with regulations.  That’s a significant cost of compliance.

 

The Paradox:  Increasing Interpretability Can Reduce Individual Opportunity

Here’s the real paradox.  As we use less accurate techniques to model, that inaccuracy actually excludes some individuals who would have been eligible for credit, insurance, a loan, or other regulated item, and includes some other individuals whose risk should have disqualified them.  The latter increases the rate of bad debt or other costs of bad decisions, which is reflected in everyone’s rates.
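To make the two kinds of error concrete, here is a minimal sketch in Python.  Every number in it is hypothetical and chosen only for illustration (the applicant pool, the two recall rates per model, and the names `decision_errors` and `Outcome` are assumptions, not figures or code from Equifax or any regulator).  It simply counts how many creditworthy applicants are wrongly declined and how many risky applicants are wrongly approved under a less accurate and a more accurate model.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    wrongly_declined: int  # creditworthy applicants the model rejects
    wrongly_approved: int  # risky applicants the model accepts


def decision_errors(n_good: int, n_bad: int,
                    recall_good: float, recall_bad: float) -> Outcome:
    """Count both kinds of mistake for a pool of applicants.

    recall_good: share of creditworthy applicants correctly approved.
    recall_bad:  share of risky applicants correctly declined.
    """
    wrongly_declined = round(n_good * (1 - recall_good))
    wrongly_approved = round(n_bad * (1 - recall_bad))
    return Outcome(wrongly_declined, wrongly_approved)


# Hypothetical pool: 9,000 creditworthy and 1,000 risky applicants.
simple_model = decision_errors(9000, 1000, recall_good=0.90, recall_bad=0.70)
accurate_model = decision_errors(9000, 1000, recall_good=0.93, recall_bad=0.76)

print("simple model:  ", simple_model)
print("accurate model:", accurate_model)
print("fewer wrongly declined:",
      simple_model.wrongly_declined - accurate_model.wrongly_declined)
print("fewer wrongly approved:",
      simple_model.wrongly_approved - accurate_model.wrongly_approved)
```

Under these made-up rates, a few points of extra accuracy on each class moves hundreds of applicants from the wrong side of the decision to the right one, which is the trade-off the paradox describes.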

At the beginning of 2017, Equifax, the credit rating company, quantified this opportunity/cost imbalance.  Comparing the mandated simple models to modern deep learning techniques, they reexamined the last 72 months of their data and decisions.

Peter Maynard, Senior Vice President of Global Analytics at Equifax, says the experiment improved model accuracy by 15% and reduced manual data science time by 20%.

 

The ‘Right to Explanation’ is Really No Benefit to the Individual

Regulators apparently think that rejected consumers should be consoled by this proof that the decision was fair and objective.

However, if you think through to the next step, what is the individual’s recourse?  The factors in any model are objective, not subjective.  It’s your credit rating, your income, your driving history, all facts that you cannot change immediately in order to qualify.

So the individual who has exercised this right has gained nothing in terms of immediate access to the product they desired, and quite possibly lost out on qualifying had a more accurate modeling technique been used.

Peter Maynard of Equifax goes on to say that after reviewing the last two years of data in light of the new model they found many declined loans that could have been made safely. 

 

Are We Stuck With the Simplest Models?

Data scientists in regulated industries have been working this issue hard.  There are some specialized regression techniques like Penalized Regression, Generalized Additive Models, and Quantile Regression, all of which yield somewhat better and still interpretable results.

This past summer, a relatively new technique called RuleFit Ensemble Models was gaining prominence and also promised improvement.

Those same data scientists have also been clever about using black box deep neural nets first to model the data, achieving the most accurate models, and then using those scores and insights to refine and train simpler techniques.
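One common way to do this is a surrogate model: let the black box score the data, then fit a simple, readable model to the black box’s decisions.  The sketch below is only an illustration under stated assumptions.  The function `black_box_score` is a made-up stand-in for a neural net, the two features and the approval cutoff of 30 are hypothetical, and the “simple technique” is a one-rule decision stump rather than whatever any particular team actually uses.

```python
import random

FEATURES = ("income", "debt_ratio")  # hypothetical applicant features


def black_box_score(income, debt_ratio):
    """Stand-in for an opaque, nonlinear model (purely illustrative)."""
    return income * (1.0 - debt_ratio) ** 2


def fit_stump(rows, labels):
    """Return (matches, feature_index, threshold) for the best one-rule
    model of the form: approve if feature >= threshold."""
    best = (-1, None, None)
    for j in range(len(FEATURES)):
        for t in sorted({row[j] for row in rows}):
            matches = sum((row[j] >= t) == y for row, y in zip(rows, labels))
            if matches > best[0]:
                best = (matches, j, t)
    return best


rng = random.Random(0)  # fixed seed so the sketch is repeatable
rows = [(rng.uniform(20, 120), rng.uniform(0.0, 0.8)) for _ in range(200)]

# Step 1: let the black box make the decisions (hypothetical cutoff of 30).
labels = [black_box_score(inc, debt) >= 30.0 for inc, debt in rows]

# Step 2: train the simple rule on the black box's decisions.
matches, feature, threshold = fit_stump(rows, labels)
fidelity = matches / len(rows)  # share of cases where the rule agrees

print(f"approve if {FEATURES[feature]} >= {threshold:.2f} "
      f"(agrees with the black box on {fidelity:.0%} of cases)")
```

The fidelity figure shows how much of the black box’s behavior the readable rule captures.  Whatever it misses is the accuracy given up for interpretability.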

Finally, that same Equifax study quoted above also resulted in a proprietary technique to make deep neural nets explainable.  Apparently Equifax has persuaded some of its regulators to accept this new technique but is so far keeping it to itself.  Perhaps they’ll share.

 

The GDPR “Right to Explanation” Loophole

Thanks to an excellent blog by Sandra Wachter, who is an EU Lawyer and Research Fellow at Oxford, we discover that the “right to explanation” may not be all that it seems.

It seems that the legal interpretation of “right to explanation” is not the same as in the US.  In fact, per Wachter’s blog, “the GDPR is likely to only grant individuals information about the existence of automated decision-making and about ‘system functionality’, but no explanation about the rationale of a decision.”

Wachter goes on to point out that the “right to explanation” was written into a section called Recital 71, which is important because that section is meant as guidance but carries no legal basis to establish stand-alone rights.  Wachter observes that this placement appears intentional, indicating that legislators did not want to make this a right on the same level as other elements of the GDPR.

 

Are We Off the Hook for Individual Level Explanations?

At least in the EU, the legal reading of “right to explanation” seems to give us a clear pass.  Will that hold?  That’s completely up to the discretion of the EU legislators, but at least for now, as written, “right to explanation” should not be a major barrier to operations in the post-GDPR world.

 

 

About the author:  Bill Vorhies is Editorial Director for Data Science Central and has practiced as a data scientist since 2001.  He can be reached at:

[email protected]

 


Comment


Comment by Vincent Granville on April 16, 2018 at 6:23pm

Hi Bob, 

Interesting comment. Assuming that some of the profiling can't be done anymore, do you think that marketers will replace these metrics (race, gender, age etc.) by proxy metrics? For instance, recruiters and landlords routinely reject some candidates based on unfair (and illegal) criteria, but they don't say anything about it except that they found a "better fit." Could you see this happening in the data industry? And how do you stop this practice? You could say the same thing about insurance pricing, which is also heavily data-driven, and penalize people just based on their zip-code for instance. 

Another example is in stock trading. You can not use insider information, but those who have access to this insider data use friends of friends to do the investments in question. My opinion is that it is not an easy problem to fix these issues. At best, GDPR is poorly drafted. On the plus side, it forces many marketers to "clean their stuff."

Vincent

Comment by Bob Vanderheyden on April 16, 2018 at 1:38pm

"Regulators apparently think that rejected consumers should be consoled by this proof that the decision was fair and objective."

I'm not sure that I buy this premise. Regulation is in place because attributes like race, gender, religion and culture (or their proxies) were being included in models to exclude people, based on those attributes. This amoral behavior is the reason for the regulations. Our industry (data science) only has itself to blame for these requirements. 

I can't wait to see what the fallout is from the Facebook/Cambridge Analytica scandal.

Comment by Vincent Granville on March 29, 2018 at 6:53am

I read recently a provocative question about whether GDPR would make machine learning illegal (in Europe.) It won't, but it is interesting to see people asking the question. 
