
Math: My Data Science Stimulus Package and its Guerrilla Analytics


Sometimes I don’t trust Data Science, probably because my duty of care is more pronounced on account of working mostly in Legal Analytics. As an Analytics Practitioner in the Legal field, my Data Science methodology cannot afford to yield wild guesses; these are people’s lives I’m dealing with. You have to be very careful with Legal Automation. If you build a Classification model for a State Prosecutor and you miss something, even a very small thing, the results can be cataclysmic. For Analytic practitioners in Law it’s not just about finding subtle or abstract insights that boost efficiency; you are venturing into the innermost sanctum of human life and its consequences. This is not a whimsical Analytic project of the kind some companies venture into because of all the hype around Data Analytics. You know the type: not really knowing what they want out of Analytics, but hoping that they’ll know it when they find it in that elusive golden nugget their vast data holds.


At one of the major Banks the minimum desired accuracy of a Classification Model is 85%. That is the standard for them and for many Data Scientists as well. That level of accuracy is terrific in every other field, except Law. In my line of work it would be a mistake to take a gamble on a model that is wrong in roughly 15% of cases; its probative value is simply inadequate.
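To put that 15% in concrete terms, here is a minimal Python sketch of the arithmetic. The function names and the caseload figures are hypothetical, chosen only for illustration, and the second function assumes that each decision is independent of the others, which real cases may not be.

```python
def expected_errors(n_cases: int, accuracy: float) -> int:
    """Expected number of misclassified cases, rounded to the nearest whole case."""
    return round(n_cases * (1 - accuracy))


def prob_at_least_one_error(n_decisions: int, accuracy: float) -> float:
    """Chance that at least one of n decisions is wrong.

    Assumption (not from the article): decisions are independent and each
    one is correct with probability equal to the model's accuracy.
    """
    return 1 - accuracy ** n_decisions


if __name__ == "__main__":
    # Hypothetical caseload of 1,000 matters scored by an 85%-accurate model.
    print(expected_errors(1000, 0.85))
    # Chance of at least one wrong call across 10 independent decisions.
    print(round(prob_at_least_one_error(10, 0.85), 4))
```

On a hypothetical caseload of 1,000 matters, an 85% accurate model would be expected to get about 150 of them wrong, and under the independence assumption the chance of at least one wrong call in just ten decisions is roughly 80%.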


The statistical volatility of Data Science sometimes requires other instruments to supplement an Analytic process; for me that supplement is Math. Mathematics is a necessity in our Analytics practice rather than a peripheral and elective tool, which is unheard of for Lawyers, but it’s true. I simply cannot rely on Computational Algorithms alone; if I did, it would diminish the veracity of the results of my Analytic projects. This led me to develop a series of fairly elaborate Equations and Formulas that we solve before and after the Analytic process. These math functions have enabled a breed of “Guerrilla Analytics” that has become a staple for us. They are applied to a client’s data, and the results of those calculations are the values that make up a typical Data Set for us.


So while most Analytic practitioners will clean and architect structured and unstructured data, then ultimately model it while keeping the data values mostly as is, we employ a different approach. By the time we are done with our ETL process, the Data will be unrecognisable; this is because the Equations we use transmute the Data into a form that only our math functions will recognize and can rationalize. For example, if we take a Data Set of quarterly revenue and a particular entry is $40000, after our math calculations it will no longer be $40000, but something like “(P+) 7.33”, and that is not meant to denote its “weight” in Data Science terms either. If it is raw Legal Data, a particular averment in one of our client’s Pleadings could be illustrated as “(N-) 0.333”, an answer our formulas arrived at. This is a painstaking but worthwhile process, and a lot of the time the calculations will be done by hand (I’m old school like that). Other times they will take the form of an equation in a Matrix, again worked by hand and only thereafter turned into a computation.


One area that has benefited remarkably from the math equations is our Trial Simulations. Simulating a Legal Trial using Algorithms is an enormously difficult and complex task, one that you simply cannot embark on competently using Traditional Data Science tools alone. Postponements, the introduction of new Evidence, the uncovering of new facts and cross-examinations are all factors that can single-handedly derail any Analytic Model on any Data Science platform you can think of. Surprises like these are far beyond the reach of any Parameter adjustment or Boosting technique. This is especially the case when practicing real-time Litigation Analytics in an actual Trial. If something happens unexpectedly, you need a shorthand technique to quantify those sorts of permutations right then and there; in other words, you need to deploy Analytics summarily, in the quickest way possible. Unfortunately, in a situation like this there is no time for an ETL process, Data Cleansing or Architecture; this is Guerrilla Analytics, and the math functions we’ve developed make it happen.


Now I know that I face imminent attack by Data Science purists when I say that sometimes I don’t trust Data Science on its own, but Machine Learning models have their own peculiar biases and dispositions, some of which can jeopardize Legal Analytics. I would never use a Support Vector Machine alone if I were building a Predictive Model that informs a country’s Attorney General in deciding whether or not to prosecute a citizen. In an instance such as this one, it would be absolutely criminal (maybe even literally) to pursue Data Science recklessly or without some sort of supplementary tool.


Data Science forms the very substratum of an Analytics Practitioner’s work; it’s what sets us apart from Statisticians or Mathematicians. However, in some instances we cannot rely on it alone, and we need to employ other measures to increase its definitiveness. In any event, I am sure many Data Scientists use math and other means, some not even scientific at all, to augment the potency of their Analytics. It is undeniably prudent to do so where necessary, especially in fields that demand a higher standard of accuracy and care.
