The Best Kept Secret in Data Science

Big Data is a big pain. Humans are generating data five times faster than the rate of economic growth. Each day we create 2.5 Exabytes of new data, ninety percent of which is unstructured. That’s roughly half a gigabyte of new data each day per person as we surf the web, play music or videos, text friends, open emails, buy stuff, take photos, make phone calls, travel, visit the doctor or go about our lives. Snails leave a residue in their wake. We leave data.

Collecting, assessing, sorting, filtering, segmenting and storing all that data is a pain, not to mention an incredible analytics challenge. And yet data scientists in every sector are madly trying to get their arms around this data gusher, find relevant patterns and discern market-changing insights. They are simultaneously empowered and encumbered by the array of deployed technologies, most of which do one or two things well but cannot handle the volume or make the underlying connections between data elements.


DMPs store media and third-party data. CRM platforms collect and store psycho-demographic, marketing and purchase data. Marketing cloud offerings use data to deploy campaigns. DSPs manage media targeting and buys. Business Intelligence tools extract data from storage systems but don’t work across platforms. Data lakes, data marts and data warehouses store enormous volumes of information one field at a time. And even though many of these tools connect to each other, none offers a single source of robust, validated information tied to each individual.

Turning Big Data into big value faces three distinct challenges. First is getting a grip on whom to address. Assembling data from multiple sources, in many formats, into a single “golden record” representing everything we know about an individual or a business is difficult but necessary. The stickiest part is resolving identities. We have to connect disparate online, email, social and mobile accounts accessed with different handles and be sure they all correspond to the same person.
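To make the identity-resolution step concrete, here is a minimal sketch in Python. It assumes that two accounts belong to the same person whenever they share an exact email, phone number or device ID. That is a simplification: real systems typically rely on fuzzy or probabilistic matching. The field names and sample accounts are hypothetical.

```python
from collections import defaultdict

def resolve_identities(accounts):
    """Group account records that share any identifier.

    `accounts` is a list of dicts with a hypothetical "account_id" key plus
    optional "email", "phone" and "device_id" fields. Returns a list of sets,
    each holding the account_ids believed to belong to one person.
    """
    # Union-find structure: every account starts as its own group.
    parent = {a["account_id"]: a["account_id"] for a in accounts}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    def union(x, y):
        root_x, root_y = find(x), find(y)
        if root_x != root_y:
            parent[root_y] = root_x

    first_seen = {}  # (field, normalized value) -> first account that used it
    for a in accounts:
        for field in ("email", "phone", "device_id"):
            value = a.get(field)
            if not value:
                continue
            key = (field, value.strip().lower())
            if key in first_seen:
                union(first_seen[key], a["account_id"])
            else:
                first_seen[key] = a["account_id"]

    groups = defaultdict(set)
    for account_id in parent:
        groups[find(account_id)].add(account_id)
    return list(groups.values())

# Hypothetical accounts: three belong to one person, one to someone else.
accounts = [
    {"account_id": "web-1", "email": "Pat@Example.com"},
    {"account_id": "mobile-7", "email": "pat@example.com", "phone": "555-0100"},
    {"account_id": "social-3", "phone": "555-0100"},
    {"account_id": "web-9", "email": "lee@example.com"},
]
golden_records = resolve_identities(accounts)
```

Here the web and mobile accounts are linked by email, and the social account joins them through the shared phone number, so three handles collapse into one candidate golden record.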


Second, finding patterns and extracting actionable, competitive intelligence from mountains of data requires extensive cleansing and preparation and a series of discrete analytical skills. In most cases, 80 percent of analytic time is spent preparing the data, with just 20 percent left for analysis.

Third, storing, maintaining and updating data in formats that can be transported, imported, ingested, used or reused in multiple existing and future systems or structures requires agile formatting and forward thinking to hedge against evolving technology. Too much data is trapped in legacy systems and outdated programming languages, where it is unavailable for analysis.


The answer to these challenges is an ingenious approach to data analytics which transforms, pre-sorts, pre-codes and pre-fabricates raw data into signals. Signals turn massive data sets into manageable, relevant pieces that expedite analysis. Signals are useful information about events, customers, systems and interactions. They describe behaviors, events and attributes, and they can predict future outcomes.

A typical business can be profiled with 3000 baseline signals, which are essentially the family jewels of a business. Signals are combinations of raw data. Often they mirror patterns and dynamics in a business or an industry. Once signals are created they can be reused or recombined like LEGO building blocks. Signals can be constructed, connected and reconfigured easily and quickly. They eliminate the need to go back to or process raw data every time a new use case comes up.
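As a rough illustration of signals as reusable combinations of raw data, the Python sketch below computes a handful of signals once from raw purchase events and then answers two different use cases from the signals alone. The event fields, signal names and thresholds are all hypothetical, and the function assumes at least one event per customer.

```python
from datetime import date

# Hypothetical raw purchase events for a single customer.
raw_events = [
    {"customer": "c1", "day": date(2017, 1, 10), "amount": 120.0},
    {"customer": "c1", "day": date(2017, 2, 20), "amount": 80.0},
    {"customer": "c1", "day": date(2017, 3, 30), "amount": 400.0},
]

def build_signals(events, as_of):
    """Combine raw events into a small set of baseline signals."""
    total = sum(e["amount"] for e in events)
    last_purchase = max(e["day"] for e in events)
    return {
        "purchase_count": len(events),
        "total_spend": total,
        "avg_order_value": total / len(events),
        "days_since_last_purchase": (as_of - last_purchase).days,
    }

signals = build_signals(raw_events, as_of=date(2017, 4, 9))

# New use cases recombine existing signals; the raw events are not re-read.
def is_high_value(s):
    return s["avg_order_value"] >= 150  # illustrative threshold

def is_lapsing(s):
    return s["days_since_last_purchase"] > 60  # illustrative threshold
```

Both checks read only the precomputed dictionary, which is the point of the building-block analogy: adding a third use case means writing another small function, not another pass over the raw data.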


Signals come in two flavors. Descriptive signals explain who (a white-collar man, 35+, living in Cleveland), what (bought a business-class ticket for a window seat to LA, credited to his airline loyalty account), when (Thursday, March 30 at 1:45 pm GMT) and how (using MasterCard #### with a credit limit of $15K). Signals document interest, intent, preferences and actions that can be used to trigger interactions.

Prescriptive signals represent the application of machine learning algorithms to basic signals. They are propensity or model scores applied to individual records, which tell the likelihood of a given event or transaction happening in the future. They express preferences, identify look-alikes and represent experience scores, price sensitivity and metrics predicting sustained loyalty or early defection. They are the building blocks for predictive analytics.
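One simple way to picture a propensity score is a logistic model applied to descriptive signals. The Python sketch below is illustrative only: the signal names, weights and bias are made up for the example, whereas in practice they would be learned from historical outcomes by a machine learning algorithm.

```python
import math

# Hypothetical coefficients; a real model would learn these from data.
WEIGHTS = {
    "purchase_count": 0.4,              # more past purchases raise the score
    "days_since_last_purchase": -0.05,  # long gaps lower the score
}
BIAS = -1.0

def propensity(signals):
    """Return a score between 0 and 1 for one customer's signals."""
    z = BIAS + sum(w * signals.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))  # logistic function

recent_buyer = {"purchase_count": 5, "days_since_last_purchase": 10}
lapsed_buyer = {"purchase_count": 5, "days_since_last_purchase": 90}

recent_score = propensity(recent_buyer)
lapsed_score = propensity(lapsed_buyer)
```

With these made-up weights the recent buyer scores higher than the lapsed one. The resulting score can itself be stored on the customer record as a prescriptive signal.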


Signals create an intermediary stage in Big Data technology because they sit above the raw data stored in marts, warehouses and lakes and below the cloud-based marketing automation systems, CRM platforms, BI tools and other operational structures that create and deploy campaigns. This new middle space in the technology stack is the Signal Layer.

The Signal Layer is the secret sauce of savvy data scientists. The repository of thousands of Signals and billions of raw data components, it automatically updates all the data elements and stores the metadata to make reuse and recalculation easy. The Signal Layer becomes the central nervous system for marketers by drawing basic data from storage, turning it into intelligence, powering specific tasks and use cases and ultimately feeding or directing the systems used to engage and interact with consumers.
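The idea of a repository that keeps signal definitions, their metadata and their refreshed values together can be sketched as a small in-memory registry. This is a toy Python model under stated assumptions, not a description of any particular product: the class names, methods and sample signals are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class SignalDef:
    """A signal definition plus the metadata needed to reuse it."""
    name: str
    compute: Callable[[List[dict]], float]
    description: str = ""
    inputs: List[str] = field(default_factory=list)  # raw fields it reads

class SignalLayer:
    """Toy registry sitting between raw storage and downstream tools."""

    def __init__(self):
        self._defs: Dict[str, SignalDef] = {}
        self._values: Dict[str, Dict[str, float]] = {}

    def register(self, signal: SignalDef) -> None:
        self._defs[signal.name] = signal

    def refresh(self, entity_id: str, raw_rows: List[dict]) -> Dict[str, float]:
        # Recompute every registered signal for one entity from its raw rows.
        values = {name: d.compute(raw_rows) for name, d in self._defs.items()}
        self._values[entity_id] = values
        return values

    def get(self, entity_id: str, name: str) -> float:
        return self._values[entity_id][name]

    def metadata(self, name: str) -> dict:
        d = self._defs[name]
        return {"description": d.description, "inputs": d.inputs}

layer = SignalLayer()
layer.register(SignalDef(
    "total_spend",
    lambda rows: sum(r["amount"] for r in rows),
    description="Sum of purchase amounts",
    inputs=["amount"],
))
layer.register(SignalDef(
    "order_count",
    lambda rows: float(len(rows)),
    description="Number of purchases",
))

layer.refresh("c1", [{"amount": 120.0}, {"amount": 80.0}])
total = layer.get("c1", "total_spend")
```

Downstream systems would call `get` rather than query raw storage, and calling `refresh` again after new data arrives updates every registered signal in one pass.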


Rather than compile mailing lists, develop basic segmentation and deploy one-size-fits-some campaigns, savvy marketers are taking advantage of the signals consumers continuously create. They aim to gain competitive advantage by practicing data-activated marketing based on individuals’ real-time needs, experiences, preferences, interests and behaviors, using signals to power use cases that are stored, refreshed and analyzed in the emerging Signal Layer.
