Data Science and Law are in many ways vastly different; however, they do have a few things in common: both professions rely heavily on historical data and patterns. Certain supervening phenomena can cause a break in these patterns, and these distinct deviations from a learned practice (whether learned by a machine learning algorithm or through pure human experience) can be problematic.
Those of us in Legal Modelling know that when building and maintaining models we are constantly looking over our shoulders, as if trying to outrun the long arm of the law. Let me give a very basic example. Let's assume we've built a basic but effective model with a binary target class: Conviction or Acquittal. The model is used by criminal defence lawyers to inform strategy. Precision, class recall and accuracy are impeccable; the ROC curves, R-squared and every other performance metric are immaculate. Let's say the model has been deployed and in use for six months, the results are mind-blowing, the client is happy, all is well.
The problem with law is that whenever it changes, the model ought to change along with it. What was permissible under certain circumstances may no longer be permissible; what gave rise to a particular set of outcomes may no longer do so. Therein lies the danger of abruptly supervening legal judgements, which can render the performance of your models, well, laughable. A mere 12-page judgement can render vast amounts of data useless, and the circumstances under which a specific outcome will materialize can change in a matter of hours.
From a purely Data Science perspective, this means that the introduction of this judgement into the data set creates what is technically a rare event, and your computations have to account for it. Essentially, what you could have is a single, outcome-defining event with a low incidence rate of 1 in 1,000. This danger is not limited to classification problems; it extends to key descriptive statistics and cluster analyses as well, potentially swaying your results in the wrong direction.
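To see why a 1-in-1,000 event is so easy to miss, here is a toy sketch (the case counts and labels are hypothetical, not taken from any real model): a classifier trained before the judgement never predicts the new outcome, yet its headline accuracy looks superb while its recall on the new-law class is zero.

```python
# Toy illustration of a 1-in-1000 rare event distorting headline metrics.
# A model trained before the judgement simply never predicts the new outcome.

def rare_event_metrics(n_cases=1000, n_rare=1):
    # Labels: 0 = outcome under the old law, 1 = outcome under the new judgement
    y_true = [1] * n_rare + [0] * (n_cases - n_rare)
    # The pre-judgement model always predicts the majority (old-law) outcome
    y_pred = [0] * n_cases

    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n_cases
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    recall_rare = true_pos / n_rare  # recall on the rare (new-law) class

    return accuracy, recall_rare

acc, rec = rare_event_metrics()
print(f"accuracy={acc:.3f}, recall on rare class={rec:.1f}")
# accuracy=0.999, recall on rare class=0.0
```

An accuracy of 99.9% hides the fact that every single post-judgement case is misclassified, which is exactly the kind of blind spot described above.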
Let’s complicate it even further: a judgement is handed down that adversely affects the performance of your model, but that judgement may not be binding in another jurisdiction; there it may only be persuasive. Essentially, what you have is a model that performs well in one jurisdiction but is absolutely suicidal in another. The client then begins to wonder why the model performs better in certain cities than in others.
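One way to contain this, sketched below with entirely hypothetical model and jurisdiction names, is to route predictions through jurisdiction-specific models, so that a binding judgement only forces retraining of the affected jurisdiction's model:

```python
# Sketch: per-jurisdiction model routing (all names hypothetical).

def model_pre_judgement(features):
    # The old learned rule: an aggravating factor drives a conviction
    return "conviction" if features.get("aggravating_factor") else "acquittal"

def model_post_judgement(features):
    # Retrained after the judgement: the factor is no longer decisive
    return "acquittal"

MODELS = {
    "jurisdiction_a": model_post_judgement,  # the judgement is binding here
    "jurisdiction_b": model_pre_judgement,   # merely persuasive here
}

def predict(jurisdiction, features):
    """Dispatch a case to the model for its jurisdiction."""
    return MODELS[jurisdiction](features)

case = {"aggravating_factor": True}
print(predict("jurisdiction_a", case))  # acquittal
print(predict("jurisdiction_b", case))  # conviction
```

The same case now yields different predictions depending on where it is heard, which mirrors the binding-versus-persuasive distinction rather than papering over it with a single national model.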
That is the prospective calamity we grapple with, the cause of much angst even in the face of a model with spine-tingling performance. In an attempt to assuage this anxiety, I suspect many in Legal Analytics tinker with and tweak their models; however, no amount of scaling, outlier detection or boosting algorithms will protect the validity of your model from a far-reaching judgement from the highest, or even some of the lower, courts in the land.
Data Science practitioners in more generalized fields are also not immune to the potentially devastating effects of a 12-page judgement. The operation of law is far-reaching; very little remains unaffected by it. As a Data Scientist, it would be prudent to understand how sudden changes in law or regulation can have a huge impact on the probative value of your models. For example, we recently completed a data mining project for an actuarial firm that wanted to extract key insights from data in personal injury matters. Insights from the data we extracted can be used to manage and quantify risk in the insurance and banking sectors. However, judgements that are prescriptive in the assessment of a claimant's future employability after a motor vehicle accident will certainly affect models based on this data.
More generally, judgements can also affect models that manage repudiation rates in the insurance sector, models that assess risk in the awarding of home loans or credit, and other models in the financial sector. If a judge woke up on a sunny Wednesday morning and decided that all online customers of a specific product must first register with some overly bureaucratic regulatory body, you can rest assured that this will affect a customer churn model.
There are various contingency measures you could develop to mitigate this. Personally, I prefer short-hand scorecard predictors that can work as a stand-alone solution or be integrated into existing models. Of course, legal judgements won't affect all models, but it would be a mistake not to at least consider the devastating effects they may have.
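A scorecard of this kind might look something like the following minimal sketch. Every feature name and point value here is hypothetical; the point is only that a hand-maintained table can be updated within hours of a new judgement and then used on its own or blended with an existing model's score:

```python
# Minimal hand-maintained scorecard sketch (features and weights hypothetical).

SCORECARD = {
    # feature present in a case -> points contributed
    "prior_conviction": -20,
    "strong_alibi": 30,
    "new_judgement_applies": -40,  # row added the day the judgement lands
}

def scorecard_score(case_features):
    """Sum the points for every scorecard feature present in the case."""
    return sum(points for feat, points in SCORECARD.items()
               if case_features.get(feat))

def blended_score(model_prob, case_features, base=50, weight=0.5):
    """Blend a model probability (0-1) with the scorecard on a 0-100 scale."""
    card = base + scorecard_score(case_features)
    return weight * (model_prob * 100) + (1 - weight) * card

case = {"strong_alibi": True, "new_judgement_applies": True}
print(scorecard_score(case))     # -10
print(blended_score(0.7, case))
```

Because the scorecard is transparent and additive, the effect of a new judgement on any given case is immediately auditable, which is much harder to say of a retrained black-box model.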