In Part 1, we introduced a pending EU privacy and data protection regulation (the GDPR) that will carry fines for violations of up to 5% of global annual turnover (1 million Euros for smaller companies). We discussed how this regulation will present particular challenges for the collection, storage, and use of data within EU and global organizations. The impact will be felt by data scientists in particular, but also across the IT organization. In this post, we focus on the impact on data science and analytic applications and suggest steps to take in the immediate to near future to avoid fines and/or crippling data blackouts.
The GDPR emphasizes the individual’s rights to understand and control how their data are used. The impact of the GDPR for data scientists includes:
1. Ability to collect data. The principles of Privacy by Design/Privacy by Default will increasingly be written into law. These principles minimize the baseline level of data collected through systems and processes (think, for example, of browser default settings). Individuals will need to give express consent for what data are collected and will need to be informed as to why the data are being collected.
2. Ability to use data. It will become necessary to get express consent for each application of personal data. (Details here are still under debate, and there will likely be certain exceptions.) This could severely impact the ability of data scientists to find new applications for existing data, as those applications will not have been listed in the original consent forms. Note that current consent will likely be grandfathered in, so it is extremely important to ensure that proper consent is in place now.
3. Ability to transfer data to and from third parties. Stiff regulatory fines will certainly produce an environment where corporations are very reluctant to buy, sell, or share data that may be personal. In addition, right to privacy/erasure regulation may have strong implications for data sharing (details are still under discussion in the EU Parliament). As a result, expect certain data sources to dry up.
4. Customer profiling will be specifically affected by the new regulations. In particular, customers must be informed when and how their data will be used to profile them with material impact (e.g. credit scoring, fraud detection, etc.). In addition, they must have the right to opt out of automatic profiling algorithms (which will introduce additional bias that must be dealt with in model calibration). Finally, and significantly, companies can be held in violation if their profiling algorithms are not sufficiently robust.
5. Requirements in storing data. There are some significant issues here.
6. Much heavier emphasis on privacy in your company. A few factors will be at play here.
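To make points 2 and 4 above concrete, here is a minimal sketch of what purpose-specific consent and a profiling opt-out could look like in an analytics pipeline. The `ConsentRecord` class, its field names, and the purpose labels are hypothetical illustrations, not anything prescribed by the GDPR; the actual legal requirements were still under debate at the time of writing.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentRecord:
    """Hypothetical per-user consent record; field names are illustrative only."""
    user_id: str
    purposes: set = field(default_factory=set)  # purposes the user expressly agreed to
    profiling_opt_out: bool = False             # user declined automated profiling


def may_use(record, purpose, is_profiling=False):
    """Return True only if the user consented to this exact purpose
    and, for profiling applications, has not opted out."""
    if is_profiling and record.profiling_opt_out:
        return False
    return purpose in record.purposes


def filter_for_purpose(records, purpose, is_profiling=False):
    """Return the IDs of users whose data may be used for the given purpose."""
    return [r.user_id for r in records if may_use(r, purpose, is_profiling)]


# Made-up example data
records = [
    ConsentRecord("u1", {"billing", "credit_scoring"}),
    ConsentRecord("u2", {"billing"}),
    ConsentRecord("u3", {"billing", "credit_scoring"}, profiling_opt_out=True),
]

# Only u1 both consented to credit scoring and has not opted out of profiling.
print(filter_for_purpose(records, "credit_scoring", is_profiling=True))
```

A new application that was never listed in a consent form (say, a churn model) would return an empty list here, which is exactly the kind of data blackout described above. The users removed by the opt-out are also no longer a random sample, which is where the calibration bias comes from.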
The GDPR is so significant that corporations are already beginning to prepare for its implementation. Compliance involves steps that cannot be taken overnight, and the accountability clause will require a documented awareness of data assets and systems, most likely including some type of data audit and risk assessment.
I recommend beginning now with the following steps:
1. Audit your entire data ecosystem now, and determine how it may expose you to privacy violations. Start with the structured data in your BI systems. Look at the dark data in your operational systems. Look at your Big Data, including the web log data and any sensor data. Document what is there, where it is replicated, who has access, and what controls are in place. Document what data are personal and what may be made personal through various data science techniques. You’ll most likely need to do this audit within the next year or two anyway, so it’s best to do it now and start introducing the necessary changes to product roadmaps.
2. Ensure that user consent is properly implemented before the GDPR takes effect. This is key because the GDPR in its current form allows existing user consent to be grandfathered in. Your ability to use any data that you have collected may be severely limited under the GDPR if you do not have proper user consent.
3. Ensure that all product roadmaps comply with the principles of Privacy by Design. If you aren’t already familiar with the concepts of Privacy by Design/Privacy by Default, familiarize yourself with them now. Communicate with product owners so that products developed in the future maintain full functionality while still complying with the restrictions on data collection required by the GDPR. Design these products so that business-critical data can be collected in a way that honors privacy laws while still enabling the business to be data-driven to the fullest possible extent.
4. Initiate dialogue with your corporate privacy officer or an external expert. The stakes have become quite high, and the subject matter is complex. There will need to be strong two-way communication between legal and technical experts, and that communication should start very soon.
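The audit in step 1 can start as something very simple. The sketch below is one possible way to record the items listed there (what is stored, where it is replicated, who has access, what controls exist, and whether the data are or could become personal). The `DataAsset` fields, the `needs_review` rule, and the example entries are all assumptions made for illustration; a real audit would be shaped by your privacy officer and your own systems.

```python
from dataclasses import dataclass, field


@dataclass
class DataAsset:
    """Hypothetical inventory entry for one data set; fields are illustrative."""
    name: str
    system: str                                        # e.g. BI, operational, Big Data
    replicated_to: list = field(default_factory=list)  # where copies live
    access_groups: list = field(default_factory=list)  # who has access
    controls: list = field(default_factory=list)       # controls in place
    personal: bool = False                              # contains personal data today
    linkable: bool = False                              # could be made personal by joining/modeling


def needs_review(asset):
    """Flag assets that hold, or could yield, personal data but list no controls."""
    return (asset.personal or asset.linkable) and not asset.controls


# Made-up example inventory
inventory = [
    DataAsset("customer_master", "BI", ["dwh_backup"], ["analysts"],
              ["role-based access"], personal=True),
    DataAsset("web_logs", "Big Data", ["hadoop_archive"], ["data_science"],
              linkable=True),
    DataAsset("sensor_readings", "Big Data", access_groups=["engineering"]),
]

flagged = [a.name for a in inventory if needs_review(a)]
print(flagged)  # assets to raise with the privacy officer first
```

Keeping the inventory as structured data rather than a free-form document makes it easier to re-run the checks as systems change, and gives you something concrete to bring to the conversation in step 4.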