In conversations with clients, I’m often asked: “Where do we begin?”

The costs of measurement, storage, and computation have collapsed, and organizations now face the often daunting challenge of uncovering value from their data assets. How do we put those assets to work? In nearly every industry, rapid experimentation is no longer a competitive advantage but a market imperative. Our customers, investors, and partners expect us all to experiment and learn continuously, drawing on insights from data throughout that process.

Image credit: Dreamstime

Thankfully, putting your data to work in meaningful and creative ways can be both fun and rewarding in terms of business efficiency. Our goal is to derive insights as efficiently as possible. Needless complexity only slows our ability to test our hypotheses incrementally as we move forward.


Image credit: fineartamerica.com

Too often in our daily encounters at The Data Guild, we see people framing the problem in terms of their data assets: “What could I do with what I have?” It’s a perfectly rational place to start. We hear things like:

  • “Wow, look at everything we can get from Google Analytics!”
  • “We have over X years of historical [patient records/sensor data/financial records/etc.]; let’s see what we can learn!”
  • “Let’s buy the [such-and-such service] that will tell us everything we need to know about [blah]”

While open-ended exploration of data shouldn’t be dismissed out of hand, it can also turn into a “random walk to nowhere.” As the scale of your data grows, it becomes increasingly difficult, and eventually impossible, to comprehend the entire landscape.

At the other extreme, too narrow a focus, without some peripheral vision, can limit the potential of your investigation. You can miss the most important signals even when they’re right in front of you.


Image credit: Alamo Helicopters

So, how to proceed?

First, a caveat: the sheer expanse of the landscape is a challenge for everyone, on the product side AND the services side.

On the product side, vendors that claim single-click solutions to the challenges above are oversimplifying the problem. Even where common problems have worn a well-trodden path toward a solution, products that claim to address these issues in such a manner are very likely to leave you asking: “So what?”

The same applies to services. The opportunities afforded by a connected world of seemingly bottomless data have yet to be tallied. We’ve created an ocean of data; no one has been to the bottom yet, and it is no longer feasible or pragmatic to try.

Image credit: @earthbeauties

The realistic alternative is to prioritize your goals and then work the signals that can help point you in the right direction, while maintaining a good understanding of the limitations of your findings.

This runs contrary to a “turn-key product” solution, since it does not relieve you of the need to think long and hard at every step of the way. We’ll expand on those steps in a moment.

Strategies

That said, there are strategies to employ that will increase your efficiency and effectiveness at uncovering opportunities. The following are some principles we bring to bear in the process of data product design at The Data Guild:

1. Forget about data (for a bit). What is your strategic vision for addressing your market? Where are the opportunities, given global trends and drivers? Where can you carve out new directions based on data assets? What is your secret sauce? What do you personally do on an everyday basis to support that vision? What are your activities? What decisions do you make as part of those activities? Finally, what data do you use to support those decisions?

These questions, considered as a hierarchy, form the basis for any Data Guild engagement. They convert a needle-in-a-haystack problem into an experimental setting within which we can generate actionable signal.

If you answer these questions honestly and thoughtfully, the outcome of your data adventure might be disruptive change in what you and your organization do, day in and day out.

Note, however, that the approach above only attempts to optimize existing processes. While this may generate wins, it risks incrementalism. For that reason, the exercise is best undertaken with an open mind: how could what sits at each level of this hierarchy be changed, cancelled, or defined anew based on a scientific approach to your data?

2. Define your hypothesis. The term “data science” is often misused to mean “data magic.”

The real meaning of the term is (or ought to be) a formal approach to separating the wheat from the chaff. Formulating a hypothesis that can be worked systematically is half the challenge of extracting meaningful signal from data.

Despite the buzz around data science and big data, the truth is: we can’t find planes when they drop out of the sky; we’re terrible at diagnosing and treating chronic disease; we can’t seem to pass decent environmental protection even when we have the data.

The tech industry is notoriously bad at this. Generous profit margins are partly to blame (“we can afford to be wrong/take risks”), but the variability of human behavior and the anonymity of human-computer interaction also make meaningful conclusions hard to draw.

That said, ask yourself specifically: what do I want to test? And more importantly: what would I do differently if I knew?

A partial answer is to work within the bounds of what’s known and actionable. Suppose, for example, you’d like “to improve the usability of my product.” By rephrasing the problem as “I’d like to reconfigure the controls of my user interface to improve task times,” you’ve pivoted from an open-ended problem with subjective measures to a task with objective measurement.
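One way to make “improve task times” testable is to log completion times for users on the current controls and on the reconfigured layout, then ask how often a difference as large as the observed one would arise by chance alone. The sketch below uses a permutation test, which is our illustrative choice rather than a method prescribed here; the function name `permutation_test`, its parameters, and the sample task times are all hypothetical.

```python
import random
from statistics import mean


def permutation_test(control, variant, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in mean task time.

    Returns (observed_difference, p_value), where the difference is
    mean(variant) - mean(control); a negative value means the variant is faster.
    """
    rng = random.Random(seed)  # seeded so the result is reproducible
    observed = mean(variant) - mean(control)
    pooled = list(control) + list(variant)
    n_control = len(control)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle group labels: under "no real effect", any split is equally likely.
        rng.shuffle(pooled)
        diff = mean(pooled[n_control:]) - mean(pooled[:n_control])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    # Hypothetical task times in seconds, invented for illustration only.
    current_controls = [41.2, 38.5, 45.0, 39.8, 42.3, 44.1, 40.7, 43.6]
    reconfigured = [35.1, 33.8, 37.9, 36.4, 34.2, 38.0, 35.5, 36.9]
    diff, p = permutation_test(current_controls, reconfigured)
    print(f"mean difference: {diff:.2f} s, p-value: {p:.4f}")
```

A permutation test makes no assumption that task times are normally distributed, which suits small or skewed samples; the trade-off is thousands of resamples instead of a single closed-form formula.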

3. Expect surprises. There is a commonly held belief that by narrowing your focus to a specific outcome, you limit what you can learn from your data. While working without any focus is unlikely to get you far, many great things have come from a surprise en route to another destination. Without inventors who were open to surprise, the world would be without microwaves and Post-it notes, not to mention Play-Doh, Slinkies, and Silly Putty, all of them accidental discoveries.

4. Always be skeptical. An unavoidable part of being human is that we tend to draw conclusions prematurely. This tendency was an evolutionary advantage, enabling us to make quick decisions in a world of uncertainty. However, this benefit can also be a deficit: our brains are wired to reach closure based on the information at hand, which can give us a false sense of security about apparent causation and consensus. In working with data, we attempt to build models that explain the complexity of the systems that lie beneath, but models are just replicas.

Consider the model airplanes you might have built as a kid. Did you expect them to fly exactly like the thing they replicate? Of course not. Rather, they were meant to give you a glimpse of the form of the real thing. Because they were so much smaller, you could put your eyes close to them and pretend they were real. The key word here is “pretend.” The complexity *not* represented by the model is exactly the thing that will ruin you.

Image credit: xjet/YouTube

In the end, being confident about what you don’t know is as important as being confident in your conclusions. The fragility of models is unavoidable. The important thing is to know what you can and cannot do with them.

5. Iterate! A mind-blowing result is a falsehood unless it is repeatable. Don’t stop your experiment after a single pass. Rinse and repeat with all of the shampoo (time) you can afford.

Modify and improve your experiment as you go (noting where you lose comparability) and seek to generalize. This can be tough in a world of limited resources, but being scrappy is not the same as being sloppy.
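To see why a single pass is not enough, consider a small simulation under stated assumptions: both groups are drawn from the same distribution (so there is no real effect), each experiment has 50 observations per group, and “significant” means p < 0.05 under a normal approximation to Welch’s two-sample test. The names `two_sample_p_value` and `replication_rate`, and all parameter values, are hypothetical choices for illustration.

```python
import random
from statistics import NormalDist, mean, variance


def two_sample_p_value(a, b):
    """Approximate two-sided p-value for a difference in means.

    Uses a normal approximation to Welch's statistic, which is only
    reasonable for fairly large samples (here, 50 per group).
    """
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    z = (mean(b) - mean(a)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def replication_rate(n_experiments=2000, n=50, alpha=0.05, seed=1):
    """Simulate experiments in which there is NO real effect.

    Each first-pass "hit" (p < alpha) is rerun once on fresh data.
    Returns (fraction of first-pass hits, fraction of hits that replicate).
    """
    rng = random.Random(seed)

    def sample():
        # Hypothetical task times: mean 40 s, standard deviation 5 s.
        return [rng.gauss(40.0, 5.0) for _ in range(n)]

    hits = replicated = 0
    for _ in range(n_experiments):
        if two_sample_p_value(sample(), sample()) < alpha:
            hits += 1
            if two_sample_p_value(sample(), sample()) < alpha:
                replicated += 1
    return hits / n_experiments, (replicated / hits if hits else 0.0)


if __name__ == "__main__":
    first_pass, survived = replication_rate()
    print(f"first-pass 'discoveries': {first_pass:.1%}")
    print(f"discoveries that replicate: {survived:.1%}")
```

With no real effect, you should expect roughly 5% of first passes to look significant by chance, and only about 5% of those to survive a rerun. A single exciting pass, in other words, tells you very little on its own.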

Image credit: simplyrecipes.com

Find others who are working in your space and compare notes. Find public data sources to cross-validate your results, or exogenous data sources that can increase the dimensionality of your data. Work the problem until the result is worthy of your good name. If you’ve followed steps 1 through 4 above, at this point you can feel confident about what you’ve learned (or, more likely, you’ll have much better questions than when you started). Don’t back down until you are satisfied with the quality. This is what separates the lions from the lambs. The last 10% of the terrain in your investigation is 90% of the work, since you can’t know what you’ve discovered until you’ve put in the time.

In Summary…

Too often, well-intentioned people trip up on the points above. The challenges to the top-down approach are many. The landscape is littered with abandoned “key performance indicators,” executive dashboards, balanced scorecards, and hours wasted in monthly meetings reviewing numbers that just don’t pass the “so what” test.

Consider your mission, be conscientious about your decisions, test your assumptions, and question your results. If you do, you’re bound to succeed in time.