Data Science: The numbers game Law almost lost.

On the face of it, analytics and law are manifestly divergent fields of practice. One need only contrast the nature of algorithms, which require numerical attributes for their calculations, with the textual rigidity of substantive law. The very first obstacle in applying analytics to law is the absence of calculable numerical variables in raw legal data: no judicial precedent, statute, or common-law principle has ever been reduced to a mathematically sound numerical expression. Raw legal data is simply not analytics-ready.

There are, however, methods for mining raw legal data. Powerful text-analytics techniques make it possible to build reasonably accurate classification, sentiment-analysis, and other models, and discretization methods (e.g. nominal-to-numerical conversion, as used in neural networks) attempt to facilitate this kind of machine learning. A few further techniques are available, but in my humble opinion they are not worth mentioning, precisely because the results they yield when applied to raw legal data are catastrophic.

Law is incredibly nuanced, with intricacies peculiar to it alone, and those intricacies must be expressed as mathematically sound numerical features before machine learning can be accurate. Simply applying text analytics alone, or simplistic variations of machine learning, is overly facile.

We have seen purely aesthetic numerals used as legal data sets for algorithmic processes. By aesthetic I mean a completely superficial value, say 78, assigned to denote something like contractual breach in a predictive model. The results were a statistical calamity, to say the least: the numerical anomaly was solved at a surface level, but not at an authentic one. Under this sort of weighting, contractual breach is simply expressed as "x", yet "x" must have a legitimate value; it has to be mathematically calculable, and you have to be able to solve for it. Other variables therefore have to be factored into a calculation that ultimately yields breach, or "x", as its result. Superficial weighting does none of this, and it has an adverse effect on the statistical integrity of the algorithm and/or machine learner. Legal data architecture necessitates a collaborative approach between this sort of representative weighting and actual math-based weighting.
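The contrast above can be sketched in a few lines. An arbitrary code like 78 invents numeric structure (distances, orderings) that has no legal meaning, whereas a calculable value is solved from real input variables. Every label, variable name, and weight here is hypothetical, invented purely to illustrate the distinction.

```python
# Arbitrary "aesthetic" codes: assigning breach = 78 and waiver = 3 invents
# a numeric distance (75) that has no legal meaning, yet a distance- or
# magnitude-sensitive learner will treat it as real signal.
arbitrary = {"breach": 78, "waiver": 3, "estoppel": 40}
fake_distance = abs(arbitrary["breach"] - arbitrary["waiver"])  # 75, meaningless

def breach_score(days_overdue, amount_unpaid, cure_attempted):
    """A toy calculable alternative: breach derived from real variables.
    Weights and caps are invented for illustration only."""
    score = (0.5 * min(days_overdue / 30, 1.0)
             + 0.4 * min(amount_unpaid / 10000, 1.0)
             - 0.2 * (1 if cure_attempted else 0))
    return max(score, 0.0)
```

Here "x" (the breach score) can actually be solved for, because it is a function of other measured variables rather than a decorative constant.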

Before one can even purport to mine legal data competently, this numerical anomaly has to be reconciled; legal analytics is simply inconceivable without that reconciliation. It is extremely hard, which is probably why weighting systems in legal technology are virtually non-existent. We spent a year developing a math-based metrics system for law. The system converts raw legal data (for example, a section of an act) into numerical values for the purposes of algorithm-based machine learning. Among other things, such a system requires calculable and interdependent variables, stratified weighting schemes, proportionality, and, most importantly, numerical values that are mathematically apportioned according to their peculiar legal implications. The result is a metric system that can numerically quantify legal permutations not only at a purely epidermal level but at a systemic one as well. For the first time, a common-law principle like the duty to disclose material facts in insurance contracts can be mathematically quantified to a value of, say, "3.22", for the purposes of an algorithmic process. Once that is done, the values can be used for data pre-processing, architecture, and analysis.
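The general shape of such a conversion (one value derived from weighted, interdependent sub-factors) can be sketched as follows. To be clear, this is purely illustrative: the sub-factors, weights, and 0-5 scale are my own invented stand-ins, not the authors' actual proprietary metrics system.

```python
# Illustrative sketch: deriving one numeric value for a legal principle
# from weighted sub-factor scores. All factors and weights are hypothetical.

def legal_metric(factors, weights):
    """Weighted aggregation of sub-factor scores (each in [0, 1]),
    scaled to a 0-5 range and rounded to two decimals."""
    assert set(factors) == set(weights)
    total_weight = sum(weights.values())
    raw = sum(factors[k] * weights[k] for k in factors)
    return round(5 * raw / total_weight, 2)

# Hypothetical sub-factors for the duty to disclose material facts:
duty_to_disclose = {
    "materiality_of_fact": 0.9,
    "insurer_reliance": 0.7,
    "availability_of_fact": 0.5,
}
weights = {
    "materiality_of_fact": 3.0,   # stratified: materiality weighs most
    "insurer_reliance": 2.0,
    "availability_of_fact": 1.0,
}
value = legal_metric(duty_to_disclose, weights)
```

A real system would of course need defensible, legally grounded weights and interdependencies between factors; the point of the sketch is only that the output is solved from inputs rather than assigned by fiat.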

At first we intended to use the metrics system for internal purposes only; we have since resolved to provide these metric-conversion systems to other legal-technology firms and to analytics firms in general. We feel the data-science field needs alternative means of analyzing raw legal data in the most statistically robust manner possible. Without a capable metrics system, legal analytics cannot unearth its subtlest insights, and any legal data mining without such a system is unfortunately very limited. Decision science and diagnostics, advanced predictive modeling, algorithmic trial simulations, pattern recognition, and automated policy enforcement are some of the advanced areas of our practice that simply would not be possible without a math-based metric system for legal data. Many data-science ventures into law fail before they even begin because no numeric conversion system exists for their legal data. For lawyers, data science is a numbers game they almost lost; fortunately, they are now beginning to win it.


Tags: predictive modeling



Comment by Sione Palu on June 6, 2015 at 3:33pm

The following are relevant to this article:



Abstract available but not the full paper:

"A fuzzy case based reasoning system for the legal inference"


Comment by Sione Palu on June 6, 2015 at 3:25pm

Both symbolic & numeric computing systems are applicable to law. Numeric is where big data comes in: techniques such as neural networks or some form of regression & classification, as mentioned in this article, are inductive methods which can help in law. Traditional expert systems are still widely adopted in law, such as legal ontologies, but they are deductive (prior knowledge is required) rather than inductive, which is the domain of numerical computing, the kind of big-data or numerical prediction touched on in this article. There have been some journals dedicated to research in the legal domain published in the last decade or so for expert systems in law by Elsevier & Springer, but I think most research in this area has folded back into LNAI (Lecture Notes in Artificial Intelligence) volumes in recent years.

Comment by Lois Patterson on June 5, 2015 at 1:57pm

Note the first link is a student's thesis.

Comment by Lois Patterson on June 5, 2015 at 1:56pm

I mentioned on Twitter the work of the now-retired professor J. C. Smith at UBC Law School. I took a class from him.



I would like to see his work updated with modern techniques. 

Comment by Robert Klein on June 5, 2015 at 8:31am

This sounds amazing. I hope you'll post links when you can. It seems like quite a victory. Looks like you're done with the initial development, but we just rolled out an API that correlates themes in unstructured information streams. I don't know if it'll help with what you're working on now, but let us know on github if you have any questions or feedback. Best of luck to you. I'm interested to hear more. 
