The use and value of torture, or enhanced interrogation techniques, on captured enemy combatants is a much-debated topic in US political and media circles. The arguments for and against the use of torture have been discussed from three perspectives: morality, legality, and usefulness. Since I am neither an intelligence official nor a politician, it is inappropriate for me to weigh in on this topic. However, I just became a Data Scientist, and in my year and a half of training I learned various torture techniques – I mean to say, I learned various enhanced interrogation techniques that can be applied to data to derive useful intelligence.
Before I get into the details of Data Science, let me confess that the above quote is somewhat disturbing to me. Most Data Scientists want data to confess the truth, not “anything.” You see, I learned the art of Data Science because I wanted to know how to decipher the chaos created by the deluge of data across every sphere of life.
But here is the truth. The data will not tell you the truth if you just ask gently. It is not that the data is intentionally lying to you. The problem with data is that it just doesn’t know what it knows. The other problem is that each dataset, much like an enemy combatant, has its own personality. You may have a very large dataset (volume), a chatty dataset (velocity), a dataset with multiple personality disorder (variety), a low-quality dataset (veracity), or a low-value dataset (value).
The nature of the data described above, along with other dimensions such as the complexity of the analytic problem being solved, affects a Data Scientist’s approach to interrogation. But in the end, the motivation of Data Scientists is to develop big data solutions that deliver a wide range of insights and benefits, such as:
- Operational Optimization of Businesses: Big data solutions can help a business derive relevant performance metrics from its data in a timely manner. Business leaders can use these metrics to evaluate their business strategy.
- Actionable Intelligence to Improve Performance: Both Business Intelligence (BI) and Military Intelligence (MI) are excellent use cases for applying Data Science. The goal of BI is to gain insights into the workings of an enterprise to improve decision making. The same is true for MI, where the goal is to understand the enemy.
- New Opportunities: Just think of some of the capabilities that were not on our radar just a few years ago. From targeted marketing (annoying ads that follow you from page to page) to recommender systems (you may know xyz), the features implemented on many commercial websites are examples of new opportunities.
- Predictions: Finding potential sales leads in your existing data and telling real customers apart from window (website) shoppers are excellent use cases for Data Science work. The hidden value in your data may be hard for you to spot. Data Scientists thrive on these challenges.
- Fault and Fraud Detection: When a faulty sensor consistently reports slightly skewed observations, it is easy to miss the problem or to dismiss the readings as outliers. But Data Scientists look at all the data, including outliers. Finding fraudulent transactions in an imbalanced dataset, such as credit card fraud data, requires careful selection of statistical models as well as anomaly detection algorithms.
- Decision Making: Statistical analysis using A/B Testing, Visual Analysis using Heat Maps, Time Series Analysis, Network Analysis, and Spatial Data Analysis are all techniques used for decision making. These techniques help people choose between alternatives, find patterns, and make investment decisions.
- Scientific Discoveries: Data Scientists across the world are taking advantage of the Human Genome Project, as well as the increased digitization of health records, to find cures for many currently incurable diseases. From stopping terrorism on our currently livable planet to finding planets with life, Data Scientists are supporting some exciting research activities.
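To make the fraud detection point above a little more concrete, here is a minimal sketch of why an imbalanced dataset calls for careful model selection. Everything in it is made up for illustration: the 1,000 transactions, the 10 frauds, the "lazy" model that never flags anything, and the "detector" with its 8 catches and 20 false alarms are all assumptions, not real data or a real model. It only shows that plain accuracy can flatter a model that never finds a single fraud.

```python
# Hypothetical toy data: 1,000 card transactions, 10 of them fraudulent (1 = fraud).
actual = [1] * 10 + [0] * 990

# "Lazy" model (assumption for illustration): never flags anything as fraud.
lazy_pred = [0] * 1000

# "Detector" (also made up): catches 8 of the 10 frauds and raises
# 20 false alarms among the 990 legitimate transactions.
detector_pred = [1] * 8 + [0] * 2 + [1] * 20 + [0] * 970


def evaluate(actual, predicted):
    """Return (accuracy, precision, recall) for binary labels where 1 = fraud."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # frauds caught
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false alarms
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # frauds missed
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # legitimate, left alone
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


for name, pred in [("lazy", lazy_pred), ("detector", detector_pred)]:
    acc, prec, rec = evaluate(actual, pred)
    print(f"{name:8s} accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
```

On this toy data the lazy model scores 0.990 on accuracy while catching no fraud at all, and the detector scores slightly lower (0.978) while catching 8 of the 10 frauds. That is why precision and recall, rather than accuracy alone, are the usual starting point when interrogating an imbalanced dataset.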
Data Scientists are interested in using their knowledge of data interrogation techniques to help solve problems by finding the truth. The institutions (Government, Businesses, Education, Research, etc.) hiring these professionals should provide scope and constraints in line with the problems they are trying to solve. They must be careful not to push solution narratives that align with their own self-serving interests. It is imperative that Data Scientists be given the freedom to apply their art and implement the right techniques, so that the data speaks the truth rather than confessing to anything.