On the face of it, Analytics and Law are manifestly divergent fields of practice. One need only compare the nature of algorithms, which require numerical attributes for their calculations, with the textual rigidity of substantive law to realize this. The very first obstacle one encounters in applying Analytics to Law is the absence of calculable numerical variables in raw legal data. No judicial precedent, statute or common law principle comes in the form of a mathematically sound numerical expression; raw legal data is simply not Analytics-receptive.
There are, however, some methods of mining raw legal data. Powerful Text Analytics, for example, makes it possible to build reasonably accurate classification, sentiment analysis and many other models. There are also data transformation methods, such as Nominal to Numerical conversion (commonly used to prepare inputs for Neural Networks), that try to facilitate this kind of machine learning. A few more techniques are available, but in my humble opinion they are not worth mentioning, precisely because the results they yield when applied to raw legal data are catastrophic.
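To make the Nominal to Numerical idea concrete, here is a minimal sketch of one common form of it, one-hot encoding, written in plain Python. The category names (`statute`, `precedent`, `common_law`) and the helper `one_hot_encode` are assumptions chosen purely for illustration; they do not come from any particular tool or from our own system.

```python
def one_hot_encode(values):
    """Map each nominal value to a 0/1 indicator vector.

    Hypothetical helper for illustration only.
    """
    # Fix a stable column order so the encoding is reproducible.
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}

    vectors = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # mark the column for this category
        vectors.append(row)
    return categories, vectors


# Assumed example data: the type of legal source behind each record.
sources = ["statute", "precedent", "common_law", "statute"]
categories, vectors = one_hot_encode(sources)

for source, vector in zip(sources, vectors):
    print(source, vector)
```

An encoding like this makes the data digestible for a learner, but it only records which label is present. It carries none of the legal weight behind the label, which is the gap the rest of this piece is concerned with.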
Law is incredibly nuanced and has intricacies peculiar to it alone. You need to be able to express those intricacies as mathematically sound numerical values to assist accurate machine learning. Simply applying text analytics alone, or simplistic variations of machine learning, is overly facile.
We have seen the use of purely aesthetic numerals as legal data sets for algorithmic processes. By aesthetic I mean a completely superficial value, say 78, chosen to denote something like contractual breach in a predictive model. The results were a statistical calamity, to say the least; the approach solved the numerical anomaly at a surface level, but not in any authentic sense. With this sort of weighting, something like contractual breach is simply expressed as “x”. However, “x” has to have a legitimate value: it has to be calculable mathematically, and you have to be able to solve for it. Other variables therefore have to be factored into a calculation that ultimately has breach, or “x”, as its result. Superficial weighting does not do that. Used on its own, this kind of representative weighting has an adverse effect on the statistical integrity of the algorithm and/or machine learner. Legal data architecture necessitates a collaborative approach between representative weighting and actual math-based weighting.
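A small sketch shows why an arbitrarily chosen number is dangerous. Any method that relies on numeric distance will treat the gap between two codes as meaningful, so the result changes whenever the labels are renumbered. The labels, the codes and the helper `nearest_label` below are all invented for illustration.

```python
def nearest_label(codes, query):
    """Return the label whose numeric code lies closest to the query's code.

    Hypothetical helper: a stand-in for any distance-based method.
    """
    target = codes[query]
    others = {label: code for label, code in codes.items() if label != query}
    return min(others, key=lambda label: abs(others[label] - target))


# Two equally arbitrary numberings of the same three legal concepts.
codes_a = {"breach": 78, "misrepresentation": 80, "frustration": 12}
codes_b = {"breach": 78, "misrepresentation": 5, "frustration": 70}

print(nearest_label(codes_a, "breach"))  # misrepresentation
print(nearest_label(codes_b, "breach"))  # frustration
```

Nothing about the law changed between the two runs, only the labels. The "similarity" the method reports is an artifact of the numbering, which is exactly the kind of damage to statistical integrity described above.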
Before even purporting to mine legal data competently, the numerical anomaly has to be reconciled; legal analytics is simply inconceivable without this reconciliation. This is extremely hard, which is probably why weighting systems in legal technology are non-existent. We spent a year developing a math-based metrics system for law. The system facilitates the conversion of raw legal data (for example, a section in an act) into numerical values for the purposes of algorithm-based machine learning. Amongst other things, a system such as this one required calculable and inter-dependent variables, stratified weighting schemes, proportionality and, most importantly, numerical values that are mathematically apportioned according to their peculiar legal implications. What has resulted is a metric system that can numerically quantify legal permutations not only at a purely epidermal level, but at a systemic one as well. For the first time, a common law principle like the duty to disclose material facts in insurance contracts can be mathematically quantified to a value of, say, “3.22” for the purposes of an algorithmic process. Once this has been done, the values can be used for data pre-processing, architecture and analysis.
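The sketch below is not our metrics system. It is only a toy illustration of what "stratified weighting" and "proportionality" can mean in general: factors are grouped into strata, each stratum gets a share of the total in proportion to its weight, and the final value is calculated from the underlying variables rather than assigned by hand. Every stratum name, factor name, weight and score here is a made-up assumption.

```python
def stratified_score(strata):
    """Combine per-stratum factor scores into one weighted value.

    Hypothetical scheme: each stratum contributes the mean of its
    factor scores, scaled by its proportional share of the total weight.
    """
    total_weight = sum(s["weight"] for s in strata.values())
    score = 0.0
    for stratum in strata.values():
        share = stratum["weight"] / total_weight  # proportionality
        factors = stratum["factors"]
        mean = sum(factors.values()) / len(factors)
        score += share * mean
    return round(score, 2)


# Invented inputs for a principle such as the duty to disclose
# material facts; factor scores are on an assumed 0-5 scale.
duty_to_disclose = {
    "authority": {"weight": 5,
                  "factors": {"binding_precedent": 4, "statutory_backing": 3}},
    "consequence": {"weight": 3,
                    "factors": {"policy_avoidance": 5, "damages": 2}},
    "scope": {"weight": 2,
              "factors": {"parties_bound": 3}},
}

value = stratified_score(duty_to_disclose)
print(value)  # 3.4 for these made-up inputs
```

The point of the design is that the output can be solved for: change a factor or a stratum weight and the value moves in a way you can trace, which is not true of a number picked by hand.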
At first we intended to use the metrics system for internal purposes only; however, we have since resolved to provide these metric conversion systems to other Legal Technology firms and to Analytics firms in general. We feel that the data science field needs alternative means of analyzing raw legal data in the most statistically robust manner. Without a capable metrics system in legal analytics it is impossible to unearth very subtle insights, and any legal data mining without such a system is unfortunately very limited. Decision science and diagnostics, advanced predictive modeling, algorithmic trial simulations, pattern recognition and automated policy enforcement are some of the advanced areas of our practice that simply would not be possible without a math-based metric system for legal data. Many data science ventures into law fail before they even begin because there is no numeric conversion system for their legal data. For lawyers, Data Science is a numbers game they almost lost. Fortunately, they are now beginning to win it.