Want to predict human behavior? Use these 6 lessons based on data from 10 million households

Guest post. First published June 6, 2013, in Opower.

Over the past couple of years, data-driven soothsayers have made a splash with their prescient predictions, forecasting everything from politics to public safety.

Behold Nate Silver’s flawless state-by-state prophecy of the 2012 presidential election. Or Netflix’s increasingly robust guesses of which movies you’ll enjoy. Or New York City’s lifesaving predictive insights about how to stop fires before they start.

But in spite of these success stories, let’s face it: most predictions flop. In 2007, economists projected only a 3% chance that the US economy would slide into recession in 2008 – a year that would feature the biggest downturn since the 1930s. In 2009, scientists didn’t see the swine flu coming, then incorrectly forecasted its trajectory once they did detect it. Failures of prediction are, unfortunately, the norm.

At Opower we’ve made it our business to challenge that norm. Our utility clients rely on us to nail our predictions. Utility companies have to carefully plan their energy efficiency budgets and programs often at least three years into the future; over that time frame, they need to know how much energy their customers will save in response to our home energy reports and other customer engagement initiatives. And that means we need to be able to predict human behavior.

Impossible, you say?

Anyone who’s ever had a temperamental friend or relative may be skeptical that human behavior is predictable. But upon evaluating a larger dataset, a different picture emerges: by using the right analytical tools and approaches, you can actually predict human behavior with exceptional accuracy.

We can’t give away all our secrets, but we can tell you that the following six principles have been central to our success in statistically predicting behavior across 30 US states and three continents. And although our work mainly focuses on energy savings, these lessons can also aid forecasting in other behavior-focused domains, such as the economy and elections.

1) Understand how humans actually behave

Before you start forecasting human behavior, you first need to make sure you have a firm grasp on what makes people tick. And it may not always be what you’d expect.

For instance, money is not always a reliable way to motivate people. Case in point: several years ago, the American Association of Retired Persons (AARP) asked a group of lawyers if they’d be willing to provide legal services to needy retirees at a discounted rate of $30/hour. They refused. But when the lawyers were asked if they’d help out for free, they overwhelmingly said yes.

Likewise, extensive research has shown that people often respond more strongly to so-called social norms (like altruism and peer pressure) than to market norms (like cash bonuses or discounts). For example, simply telling people how their energy usage compares to that of their neighbors prompts them to conserve energy, much more so than advertising the dollar savings or environmental benefits of lower usage. This is just one of the many behavioral science insights on which Opower runs.

There is no shortage of counter-intuitive, seemingly irrational psychological patterns that characterize us as human beings. So if you’re interested in predicting how people are going to behave in the future, then you’ve got to start mastering how their brains are wired now.

2) Know your field

Imagine you’re a successful TV weatherman. One day, you receive a phone call from the local airport, asking if you can help them construct a 3-day weather forecast to aid local pilots over the next few days.

Excited to assist but without knowing any better, you issue a forecast focusing on temperature, rain, and humidity. There’s a slight problem though. Because you are unfamiliar with aviators’ needs, you neglect to consider other critical factors like visibility, cloud height, and wind direction. Your lack of industry context translates into a (dangerously) inadequate forecast.

Gathering practical context is similarly important when it comes to forecasting human behavior. For example, when predicting the amount of energy that utility customers will save, one has to consider contextual factors that shape the energy industry, such as seasonal variations in consumption, regional differences in heating fuels, and the dynamics of the local electricity market.

Knowing your field will also make your forecast more useful to those who rely on it. When developing a quantitative energy-savings forecast for a utility company, it’s important to be well versed in the regulatory landscape of their industry. That’s because many utility companies face strict legal obligations to achieve a particular level of energy savings (e.g. a 2% reduction in regional energy demand each year), and are subject to large penalties if they miss the target. As a forecaster in this context, you’d want to incorporate this need for certainty into your statistical predictions.
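One way to build that need for certainty into a forecast is to report a conservative lower bound alongside the point estimate. The following is a minimal sketch, not Opower's method: the savings figures are hypothetical, and the one-sided 95% bound relies on a normal approximation (the `z=1.645` default), which is an assumption made for illustration.

```python
import statistics

# Hypothetical annual savings (% of baseline usage) from comparable past programs.
past_savings_pct = [1.6, 2.1, 1.9, 2.4, 1.8, 2.2, 2.0, 1.7, 2.3, 2.0]


def conservative_forecast(samples, z=1.645):
    """Return (point_estimate, lower_bound) for the mean of `samples`.

    z=1.645 gives a one-sided 95% bound under a normal approximation,
    which is an assumption made for this sketch.
    """
    mean = statistics.mean(samples)
    std_err = statistics.stdev(samples) / len(samples) ** 0.5
    return mean, mean - z * std_err


point, lower = conservative_forecast(past_savings_pct)
print(f"point estimate: {point:.2f}%, conservative bound: {lower:.2f}%")
```

A utility facing penalties for missing a target would plan against the lower figure rather than the average.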

3) Build a stellar dataset

In God we trust; all others bring data.

Predictions based on theoretical visions or gut feelings do not have a superb track record. In 1995, when the founder of the computer networking company 3Com presaged that the Internet would “go spectacularly supernova and in 1996 catastrophically collapse,” he was likely relying more on speculation than hard data.

To predict human behavior systematically, a high-quality experimental dataset is a must. It serves as a go-to resource for learning how people behave under different conditions.

Say you’re curious how much electricity people in the Northeast are likely to save this summer, in response to receiving personalized energy efficiency advice each month. A million kilowatt-hours? Two million? No need to guess here. Instead, you can accurately predict the answer by consulting a multi-year dataset that combines information on weather, household-level electricity consumption, and the historical impact of personalized advice on energy usage.

A rich, large dataset takes time to build and special expertise to maintain. But the benefits are commensurate. Opower’s ability to foretell the future of energy savings is deeply tied to the size and scope of our data storehouse – consisting of 250 years’ worth of experimental results across 170 behavioral energy efficiency programs and more than 10 million utility customers.

Don’t delay. Start building a stellar dataset that will lay the groundwork for your predictions.

4) Use scientific methods

Put on your lab coat. Using data to make robust predictions is a scientific process.

As atoms and molecules are to physics, data is to statistics. And similar to what takes place in a scientific laboratory, the process of turning data into an effective predictive model is based on making hypotheses, rigorously testing them, and continually evaluating the results.

Choosing which scientific methods are best for making your predictions depends on the nature of your data. Since Opower’s analytics are built primarily upon a panel dataset (so called because it represents a virtual “panel” of many households and their respective behavioral patterns over time), our forecasting engine draws upon methods that are well-suited for panel data – like multivariate regression and machine learning.
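To make the panel idea concrete, here is a small sketch of one common panel technique, a within-household (fixed-effects) estimate. It is an illustration under stated assumptions, not a description of Opower's engine: the households, baselines, and the 20 kWh effect are all hypothetical, and the data is noise-free so the result is easy to check.

```python
from collections import defaultdict

# Hypothetical panel rows: (household, received_report, monthly_kwh).
# Each household has its own baseline; reports cut usage by 20 kWh.
baselines = {"h1": 900.0, "h2": 650.0, "h3": 1200.0, "h4": 800.0}
report_households = {"h1", "h3"}
panel = []
for household, base in baselines.items():
    for month in range(1, 7):
        treated = 1 if household in report_households and month > 3 else 0
        panel.append((household, treated, base - 20.0 * treated))


def naive_difference(rows):
    """Mean usage with reports minus mean usage without, ignoring households."""
    with_report = [kwh for _, treated, kwh in rows if treated]
    without = [kwh for _, treated, kwh in rows if not treated]
    return sum(with_report) / len(with_report) - sum(without) / len(without)


def within_household_effect(rows):
    """Slope of kWh on the report flag after removing each household's mean."""
    by_household = defaultdict(list)
    for household, treated, kwh in rows:
        by_household[household].append((treated, kwh))
    num = den = 0.0
    for obs in by_household.values():
        mean_x = sum(x for x, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            num += (x - mean_x) * (y - mean_y)
            den += (x - mean_x) ** 2
    return num / den


naive = naive_difference(panel)
effect = within_household_effect(panel)
print(f"naive difference: {naive:+.1f} kWh, within-household effect: {effect:+.1f} kWh")
```

In this toy data the naive comparison suggests that reports raise usage, simply because the households that received them happen to be larger consumers. Comparing each household with itself over time recovers the 20 kWh reduction.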

Say, however, that you’re an urban planner interested in predicting ridership on a city’s subway system in the year 2020. This kind of prediction is more likely to be based on “time series” data – which tracks the evolution of a single value (e.g. number of subway passengers) over time. In this case, you’re better off using methods that are specifically suited to time series data, such as scenario forecasting or autoregressive moving-average modeling.
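As a rough illustration of the time series case, the sketch below fits only the autoregressive part of such a model, an AR(1) of the form y[t] = c + phi * y[t-1], by ordinary least squares. This is a deliberately simplified stand-in for full autoregressive moving-average modeling. The ridership numbers are hypothetical and are generated from a known process so that the fit can be verified.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = c + phi * y[t-1]; returns (c, phi)."""
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    cov = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    var = sum((p - mean_prev) ** 2 for p in prev)
    phi = cov / var
    return mean_curr - phi * mean_prev, phi


def forecast(series, steps):
    """Roll the fitted AR(1) forward `steps` periods past the last observation."""
    c, phi = fit_ar1(series)
    last, out = series[-1], []
    for _ in range(steps):
        last = c + phi * last
        out.append(last)
    return out


# Hypothetical annual ridership (millions), generated from a known AR(1)
# process so the fit can be checked: y[t] = 20 + 0.9 * y[t-1].
ridership = [100.0]
for _ in range(9):
    ridership.append(20 + 0.9 * ridership[-1])

c, phi = fit_ar1(ridership)
next_years = forecast(ridership, 3)
print(f"c={c:.2f}, phi={phi:.2f}, next 3 years: {[round(v, 1) for v in next_years]}")
```

Real ridership data would be noisy, so the fitted coefficients would be estimates rather than exact recoveries, and a moving-average term or seasonal component might be warranted.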

If you want the nitty-gritty on scientific forecasting methods, there are some excellent online resources to get you started.

5) Distinguish signal from noise (choose your variables wisely)

Predictive models are all about determining statistical relationships between variables, then quantifying the implications for the future.

The fundamental challenge in behavioral forecasting is that there are literally thousands of variables to choose from — all of which could theoretically be correlated with people’s decisions and actions: income, age, local weather conditions, political affiliation, commodity prices, family size, musical tastes, you name it.

But unless you’re a wizard, it’s not necessarily obvious which variables are the most meaningful predictors of a particular human behavior (e.g. saving energy), and which variables are less relevant. Effective forecasters are nimble at using scientific methods to identify and zero in on the most statistically significant variables – that is, to distinguish the “signal” from the “noise.”

Predictive behavioral models based firmly on signal are so powerful because they reflect the actual underlying forces that make humans tick. And that means that you can confidently take your model across borders and climates, populate it with values for a given context, and still expect it to make accurate predictions. That’s how Opower’s energy savings forecasts are consistently able to hit the mark – whether we’re looking at Hawaii or New York, electricity or gas, winter or summer.

Predictive models are ultimately judged by their accuracy, not by their number of inputs. A forecast model that rests on thousands of far-flung variables may sound impressive, but its misleading statistical relationships and systematically unreliable predictions won’t be. As Nate Silver has aptly noted, all the data available these days presents an endless supply of possible variables, but this doesn’t change the fact that there is “a relatively constant amount of objective truth.” A laser-like focus on variables that convey “signal” will help your predictive models get closer and closer to that truth.
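A first pass at separating signal from noise can be as simple as ranking candidate variables by how strongly they correlate with the outcome. The sketch below does this on synthetic data. The variable names, the seed, and the data-generating rule are hypothetical, and correlation screening is only a crude stand-in for the significance testing a real forecasting model would need.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(42)  # fixed seed so the synthetic data is reproducible
n = 500

# Hypothetical candidate predictors, all drawn independently.
candidates = {
    "local_temperature": [rng.gauss(0, 1) for _ in range(n)],
    "musical_taste_score": [rng.gauss(0, 1) for _ in range(n)],
    "family_size": [rng.gauss(0, 1) for _ in range(n)],
}

# By construction, energy use depends only on temperature plus random noise.
energy_use = [2.0 * t + rng.gauss(0, 1) for t in candidates["local_temperature"]]

correlations = {name: pearson(values, energy_use) for name, values in candidates.items()}
ranked = sorted(correlations, key=lambda name: abs(correlations[name]), reverse=True)
for name in ranked:
    print(f"{name:22s} r = {correlations[name]:+.3f}")
```

Only the temperature variable carries information about energy use here. The other two show small, nonzero correlations purely by chance, which is the kind of noise a model with thousands of inputs can mistake for a relationship.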

6) Never stop improving

One reason that we’re so excited about our company’s rapid growth is that it’s allowing us to gain progressively more insights on how people consume and save energy all over the world. Our dataset’s energy usage dimension alone is growing at a rate of 100 billion meter reads per year. The knowledge that flows from a continually expanding dataset enables increasingly accurate forecasting.

As additional data pours in, you should constantly be evaluating the performance of your predictive model and always be ready to update it to reflect new information and discoveries. When you detect well-founded statistical signals, incorporate them. This kind of iterative calibration is the key to becoming a master forecaster. Just look at the progress that the National Weather Service has made over time in predicting the landfall of hurricanes: in 1970, their 3-day-ahead forecasts were an average of 518 miles off; but this margin of error shrank to 345 miles by 1990, and was just 71 miles for Hurricane Sandy in 2012.

In the field of behavioral prediction, we’re similarly committed to improving over time. For example, when our Analytics teammates verified that email versions of Opower’s home energy reports (sent in combination with our standard mailed paper reports) have a statistically significant effect in motivating households to use less energy, we promptly updated our energy-savings forecasting model to reflect this fact.
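The evaluate-and-update loop described in this section can be sketched as a simple backtest: score the current model and a candidate model against newly observed outcomes, then adopt whichever has the lower error. All numbers and model names below are hypothetical, and mean absolute error is just one reasonable choice of metric.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between observed and forecast values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def choose_model(actual, forecasts_by_model):
    """Return (best_model_name, scores) using mean absolute error."""
    scores = {
        name: mean_absolute_error(actual, predicted)
        for name, predicted in forecasts_by_model.items()
    }
    return min(scores, key=scores.get), scores


# Hypothetical observed savings (%) for four recent periods.
observed_savings = [1.8, 2.1, 2.0, 1.9]

# Hypothetical forecasts for the same periods from two model versions.
forecasts = {
    "current_model": [2.4, 2.5, 2.3, 2.4],
    "with_email_signal": [1.9, 2.0, 2.1, 2.0],
}

best, scores = choose_model(observed_savings, forecasts)
for name, score in scores.items():
    print(f"{name:18s} MAE = {score:.2f}")
print(f"adopt: {best}")
```

Rerunning a comparison like this each time new data arrives keeps the forecast tied to how well it actually performs, not to how well it was expected to perform.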

Keep the above principles in mind, and you’ll be well on your way to accurately predicting human behavior in your chosen field. It’s time to get out there and start impressing your friends. No crystal ball required.

Emily Bailey and Nathan Srinivas are members of Opower’s Analytics team and lead its energy savings forecasting. Barry Fischer is Head Writer at Opower. Special thanks to Ashley Sudney for graphics.





Comment by Richard Ordowich on June 13, 2013 at 5:45am


Great article but a somewhat overreaching closing: "accurately predicting human behavior in your chosen field"


Perhaps the statement “approximating human behavior" is more appropriate to reduce the potential for hubris.



Comment by Tanmoy Thakur on June 12, 2013 at 8:29pm

Nice Blog Emily and Nathan !!

We can keep adding more variables to the model and track the regression-coefficient.

If the value of regression coeff doesn't vary much, our model holds true.

That's what we follow.

