Data Science and EU Privacy Regulations: A Storm on the Horizon

Guest blog by David Stephenson

The European Union is a few short months away from finalizing a sweeping regulation that will dramatically change how data can be handled and how data science can be applied. This new regulation will affect all corporations using data from EU citizens, not just those with offices in the EU. Those collecting data from more than 5,000 EU citizens per year will be considered accountable, regardless of company location. The EU Parliament is so serious about compliance with these new privacy and data protection laws that it has proposed a fine for violations of up to 5% of global annual turnover (1 million Euros for smaller companies). Needless to say, this massive fine has attracted serious attention to the regulation, and companies have already started preparing to comply.

Legal Background

Personal privacy and data protection are currently legislated and enforced in the EU through a patchwork of individual member state laws and independent supervisors. The lack of a single privacy framework complicates compliance and data transfer for multinational corporations, while also preventing EU supervisors from addressing privacy violations in a unified manner. More to the point, overly aggressive data-driven business models, ineffective lobbying strategies and underinvestment in data protection have given rise to a market failure argument that is now driving stepwise regulatory change. This change will be provided by the General Data Protection Regulation (GDPR).


The GDPR will become the law of the land across the EU, for the most part replacing the current member state regulations. After three years in development, final ratification is due this year, during the current Luxembourg presidency of the EU, or in the worst case during the Dutch presidency (January-June 2016). Enforcement will begin within a two-year window following ratification and will be implemented via a One Stop Shop approach to supervision (the member state where the corporation is headquartered will supervise).


The Police and Judicial Cooperation Data Protection Directive (PJCD) will be released simultaneously and will address use of data by law enforcement agencies.

Relevance for Data Scientists

Potential Conflict of Goals:  The upcoming privacy regulations will be especially challenging for data scientists, as they will push data use in precisely the opposite direction to where many data scientists tend to push it.

Ideally, both data scientists and privacy advocates are pursuing the best interests of the individual. Their methodologies, however, have different goals. Data science has the goal of acquiring new data and finding new uses for existing data. While privacy advocates strive to minimize data collection, data scientists strive to maximize it. While privacy advocates strive to decrease unexpected uses of data, data scientists strive to increase them. Compliance with the GDPR will require very careful alignment and coordination of these goals, so that the individual benefits from both a privacy/data protection perspective and an economic perspective.


Generating Private Data:  We are becoming increasingly aware of the ways in which the analytic techniques of data scientists are able to draw unanticipated insights from what was thought to be innocuous data.  Projects have been carried out which, for example, link sensitive but anonymized data to specific individuals, reveal the gender and/or ethnicity of individuals based on Facebook likes, retrieve personal records of individuals based on a snapshot taken on the street, fingerprint cell phones based on cell tower check-ins, etc.  


In a previous post, I wrote about how Netflix had legal problems when they didn’t realize how data science techniques could de-anonymize legally protected data released during the Netflix Prize.   The state of Massachusetts had a similar problem in 2002 when health care records of public employees were released as anonymous and later partially de-anonymized.
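To make the re-identification risk concrete, here is a minimal Python sketch of a uniqueness check over quasi-identifiers. It is not how the Netflix or Massachusetts data was actually de-anonymized; the records, field names and helper functions are all hypothetical, chosen only to illustrate why stripping names from a data set does not by itself make it anonymous.

```python
from collections import Counter


def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def smallest_group_size(records, quasi_identifiers):
    """Return the size of the smallest group (the 'k' in k-anonymity), or 0 if empty."""
    return min(group_sizes(records, quasi_identifiers).values(), default=0)


def unique_records(records, quasi_identifiers):
    """Return records whose quasi-identifier combination appears exactly once."""
    sizes = group_sizes(records, quasi_identifiers)
    return [
        r for r in records
        if sizes[tuple(r[q] for q in quasi_identifiers)] == 1
    ]


# Hypothetical "anonymized" records: names removed, but ZIP code,
# birth year and sex retained alongside a sensitive attribute.
records = [
    {"zip": "02138", "birth_year": 1945, "sex": "M", "diagnosis": "A"},
    {"zip": "02138", "birth_year": 1962, "sex": "F", "diagnosis": "B"},
    {"zip": "02138", "birth_year": 1962, "sex": "F", "diagnosis": "C"},
    {"zip": "02139", "birth_year": 1980, "sex": "M", "diagnosis": "D"},
]

quasi_identifiers = ["zip", "birth_year", "sex"]

# A smallest group of size 1 means at least one person is uniquely described.
print(smallest_group_size(records, quasi_identifiers))
# Records that could be linked to a named individual via an outside data set.
print(len(unique_records(records, quasi_identifiers)))
```

Any record that is unique on its quasi-identifiers can be matched against an outside source that carries the same fields plus a name, such as a voter list. Running a check like this before releasing data is one simple way to spot that exposure.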


So we see how personal data may be volunteered, observed or inferred.  Although the majority of press in the last few years has focused on concerns over data observation (e.g. cookie legislation, audio/video surveillance, RFID etc.), regulators are shifting their attention to the realms of Big Data, Smart Sensors, and advanced analytics.


Thus, advancements in Data Science have expanded, and will continue to expand, the definition of Personally Identifiable Information (PII).  These advancements will undoubtedly influence privacy legislation in the future.


Working with Data:

Our increased usage of cutting-edge data storage and analytic technologies puts us even more at risk of violating privacy requirements. Modern data technologies, including an abundance of NoSQL technologies, on-demand cloud storage, and in-memory processing, are encouraging data scientists and corporations in general to produce massive stores of raw data (data lakes). This storage raises the following challenges from a privacy compliance perspective:


  1. Data awareness:  Companies lose oversight of what data is stored, where it is replicated, and what the risks and privacy implications of that data may be.


  2. Governance: Raw data may be flowing into the systems of pilot programs without mature governance models.  In addition, there is concern over the security features of many cloud storage systems.


  3. Control:  As raw data with unknown potential is retrieved, stored, copied and distributed, companies may find that they have lost oversight of where data has flowed and, with it, the ability to honor the right to be forgotten/right to erasure.
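The data awareness and control challenges above can be illustrated with a minimal Python sketch of a data inventory: a registry that records every store holding data about a given person, so that an erasure request can reach every copy. This is an assumption-laden toy, not a description of any real product or of what the regulation prescribes; the class, method and store names are hypothetical, and in-memory dictionaries stand in for real storage systems.

```python
from collections import defaultdict


class DataInventory:
    """Toy registry tracking which stores hold data about which data subject."""

    def __init__(self):
        self._stores = {}                    # store name -> {subject_id: payload}
        self._locations = defaultdict(set)   # subject_id -> names of stores holding it

    def register_store(self, name):
        """Make a store known to the inventory (no effect if already registered)."""
        self._stores.setdefault(name, {})

    def write(self, store, subject_id, payload):
        """Write data to a store and record that the store now holds this subject."""
        self._stores[store][subject_id] = payload
        self._locations[subject_id].add(store)

    def copy(self, source, target, subject_id):
        """Replicate a subject's data; the copy is tracked like any other write."""
        self.write(target, subject_id, self._stores[source][subject_id])

    def locations(self, subject_id):
        """Return the sorted names of all stores holding this subject's data."""
        return sorted(self._locations.get(subject_id, set()))

    def erase(self, subject_id):
        """Delete the subject from every tracked store; return the stores cleaned."""
        erased = self.locations(subject_id)
        for store in erased:
            del self._stores[store][subject_id]
        self._locations.pop(subject_id, None)
        return erased


# Hypothetical usage: data lands in a data lake and is copied to a pilot system.
inventory = DataInventory()
inventory.register_store("data_lake")
inventory.register_store("pilot_cluster")
inventory.write("data_lake", "user-1", {"email": "a@example.com"})
inventory.copy("data_lake", "pilot_cluster", "user-1")

print(inventory.locations("user-1"))  # both stores are listed
print(inventory.erase("user-1"))      # both copies are removed
```

The sketch only works because every write and copy goes through the inventory. Data that is replicated outside such a registry is exactly the data a company can no longer find when an erasure request arrives.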


In Part 2, we will discuss:

  • How pending EU privacy regulation will have a direct impact on general data collection and use and specifically on data analysis and data science

  • Steps that should be taken across the organization today



Comment by David Stephenson on October 9, 2015 at 8:38am

That's a good question.  The problem is that companies looking to expand into Europe (and we're seeing a lot of them these days) will need to have a clean record with the EC, and it's not a matter of starting to comply right before beginning expansion.

Comment by Vincent Granville on October 9, 2015 at 8:07am

How could the EC collect money from companies that have no presence in Europe? It is unenforceable for all but a few large corporations with a presence in Europe. The rest of us can simply ignore these regulations, just as laws enacted by the Chinese government only apply to Chinese residents. Indeed, regarding the "cookie warning message" that is supposed (by EC laws) to be displayed on all websites since October 1st, I haven't seen a single instance of a non-European website implementing it. And for companies that can't monetize web traffic from Europe (a majority of us indeed), the solution might just be to block Europe, preventing Europeans from signing up on your website or newsletter, and even preventing them from viewing your website. Maybe you could have an item in your TOS (terms of service) saying that people considered European residents are prohibited from using your website in any capacity. Hopefully, you don't (yet) have to write it in 15 different languages (Portuguese, Dutch, French, Italian, Swedish, etc.) and offer an alternative (sound) for people who are blind or can't read.
