Imagine if Donald Trump (or some blowhard like him, minus the misogyny) were ranting to apprentices about the current state of data analytics in “real world” corporate America.
Below are 3 Myths, Realities and Truths he might offer (if he knew what he was talking about), as counterpoints to some of the (known/acknowledged) hype:
MYTH #1: Data analytics is transforming business decision-making
POPULAR SUPPORTING RATIONALE: The rise of (distributed) computing power and (telecommunications) bandwidth enables senior executives to make business decisions on a broad range of processes based on complex/sophisticated analyses of empirical evidence.
REALITY: Data in most organizations are still used to justify/support business decisions – not make them. In-depth statistical analyses for decision-making are mostly limited to specialized areas/industries: fraud detection, financial pricing and risk modeling, transportation telematics, customer defection/attrition analytics and cross-selling, computer network security, drug trials and medical outcome research, energy consumption and sustainability informatics. Granted, executive decision-makers would love to know what their customers want and are willing to pay for, what their suppliers are willing to produce and the lowest price they would accept, and what their competitors are up to – but valid hard data are incredibly difficult to come by. Most business decisions are still based on subjective judgment, experiential bias, organizational politics and self-interest. What hard data are available are often presented to executives in very high-level/abstracted/summarized states. Even basic/simple regression forecasting is becoming problematic, as organizations continually change their mix of business, rendering extrapolations from recent history invalid/irrelevant.
TRUTH: Data analytics is as much art as science, and still in its very early stages of evolution. It is a CREATIVE discipline, in which practitioners experiment and explore innovative ways to design variables and produce valid models that can be used to generate knowledge, insights and VALUE.
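The point above about regression forecasts breaking when the mix of business changes can be made concrete with a small sketch. Everything in it is an assumption for illustration only: the two business lines, the revenue numbers, and the divestiture are made up, and `fit_line` is a hypothetical helper, not any particular tool's API.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical quarterly revenue for two business lines (made-up numbers).
quarters = list(range(8))
line_a = [100 + 5 * q for q in quarters]  # growing line
line_b = [80 - 2 * q for q in quarters]   # shrinking line
total_history = [a + b for a, b in zip(line_a, line_b)]

# Fit a trend to the historical company-wide total.
intercept, slope = fit_line(quarters, total_history)

# Extrapolate the historical total one quarter ahead.
next_q = 8
forecast = intercept + slope * next_q

# Assumed mix change: line B is sold off before quarter 8, so only line A remains.
actual_after_divestiture = 100 + 5 * next_q

forecast_error = forecast - actual_after_divestiture
print(f"forecast={forecast:.1f} actual={actual_after_divestiture} error={forecast_error:.1f}")
```

The fit to history is perfect in this toy case, yet the forecast badly overshoots once the mix changes. The model is not wrong about the past; the past simply stopped describing the business.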
MYTH #2: Analytical processes are mature/scalable in most large organizations
POPULAR SUPPORTING RATIONALE: Every day, organizations collect vast amounts (unprecedented volume/velocity/variety) of raw data, which analysts explore using scalable, mature, reproducible or productionized processes in order to unlock actionable insights.
REALITY: The processes by which data are captured/wrangled/cleaned/blended for deep analytical purposes are mostly ad-hoc. Moreover, most of the large volume of data that are captured (e-mails, text messages, web logs) are useless for advancing strategic goals (generating insights on how to optimize performance or be more competitive, productive, effective, efficient, etc.). Even today, most data are collected for tactical/narrow/specific purposes: recording transactions or revenues, generating financial statements, calculating labor or inventory carrying costs, preparing tax forms, producing regulatory-mandated forms, generating reports to help managers track relative contributions of profits or actual versus budgeted expense trends, complying with record retention or forensic accounting requirements, etc. Data are still captured in different system-specific structures and record/file types, making them very difficult to aggregate and compare on a consistent basis. Most data are in what used to be called unstructured (now called NoSQL or less-structured) form, and thus are very difficult for most systems to process/make use of. Most data marts/stores/warehouses were created by presupposing which data elements are critical and which are not, and the elements they do keep are often stored in abstracted and/or summarized form.
TRUTH: Data scientists must explore/investigate global sources to discover/access useful/valid data, and design ad-hoc queries and scripts to retrieve data from the original/underlying target/systems of record.
MYTH #3: Key barriers to user adoption of data-driven/fact-based decision-making are technical and financial
POPULAR SUPPORTING RATIONALE: The main barriers to broader user adoption of advanced data analytics are the cost and difficulty of designing such tools and implementing the supporting infrastructure.
REALITY: The technical design (especially wrangling the data) and implementation challenges can be formidable, but organizational politics can be the real killer. In many enterprise-level transformational initiatives, the data scientist is working AGAINST the vested interests of operational silos and processes. Data science enables empirical, objective, and transparent decision-making – and transparency can be dangerous. Examples include: the relationship of mission-critical job responsibilities with compensation levels across different departments/divisions within a corporation; the relative efficiency or performance of a company or business units against peer groups; hiring decisions based on similarity in race, ethnicity, age range and/or sex of hiring manager relative to candidate; comparative executive compensation as a percent of average employee wage relative to competitors and/or firms in other industries.
TRUTH: Data science is a revolutionary/disruptive discipline which requires its practitioners to have strong social/personal, communication/presentation, and diplomatic/political – as well as technical – skills. The real trick is figuring out what makes sense to analyze: weighing the risk/reward trade-off in quality/availability of data and/or internal validity of the variables, grounded in a solid understanding of cognitive behavior and/or predictive patterns.