The need to deal with big data has stimulated the development of distributed systems, such as Hadoop and Spark, that can cope with massive calculations. This opens the possibility of using agent-based models to run a large number of simulations and combining the outputs with Bayesian estimation to produce models that estimate key statistical aggregates from easy-to-measure predictors. Suppose, for example, that we want to develop a model to estimate the total production of a crop in an area from the number of people, animals, or cars transporting that crop on selected roads during a given period. We first develop an agent-based model of the production and transportation of the crop, taking into account the existing road network, the production points, and the geographic positions of the main markets. We then run the simulation with various parameters enough times to obtain numerical estimates of the probability distribution of the number of people or cars observed on the key roads, given the production. We can then use Bayesian estimation to get the probability that the production is P given that n people or cars transporting the crop have been observed. Using multiple predictors may give good-quality estimates that are more cost-effective than surveys, or that can be combined with very light surveys to produce statistical information at lower cost.
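To make the simulate-then-invert idea concrete, here is a minimal Python sketch. It is only an illustration under loud assumptions: `simulate_road_count` is a hypothetical toy stand-in for a real agent-based model, and the load per vehicle (2 units), the share of traffic on the observed road (0.3), the candidate production levels, and the uniform prior are all invented for the example.

```python
import random

def simulate_road_count(production, rng):
    # Hypothetical stand-in for one run of the agent-based model.
    # Assumptions: one vehicle carries 2 units of crop, and each vehicle
    # independently takes the observed road with probability 0.3.
    vehicles = production // 2
    return sum(1 for _ in range(vehicles) if rng.random() < 0.3)

def likelihood_table(productions, n_runs, rng):
    """Estimate P(n | production) by counting simulation outcomes."""
    table = {}
    for p in productions:
        freq = {}
        for _ in range(n_runs):
            n = simulate_road_count(p, rng)
            freq[n] = freq.get(n, 0) + 1
        table[p] = freq
    return table

def posterior(table, prior, observed_n, n_runs, max_count):
    """Bayes' rule: P(production | n) is proportional to P(n | production) * P(production)."""
    unnorm = {}
    for p, freq in table.items():
        # Add-one smoothing so a count never seen in simulation keeps a small probability.
        likelihood = (freq.get(observed_n, 0) + 1) / (n_runs + max_count + 1)
        unnorm[p] = likelihood * prior[p]
    total = sum(unnorm.values())
    return {p: w / total for p, w in unnorm.items()}

rng = random.Random(0)                       # fixed seed for reproducibility
productions = list(range(100, 1100, 100))    # assumed candidate production levels
prior = {p: 1 / len(productions) for p in productions}  # assumed uniform prior
n_runs = 1000

table = likelihood_table(productions, n_runs, rng)
post = posterior(table, prior, observed_n=75, n_runs=n_runs, max_count=500)
best = max(post, key=post.get)               # most probable production level
print(best, round(post[best], 3))
```

In this toy setting, 75 observed vehicles point to a production of about 500 units, since 250 vehicles times 0.3 gives 75 on average. With several predictors, one simple option (assuming the predictors are conditionally independent given the production) would be to multiply their likelihoods before normalizing.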

Could such an approach be the best way for developing countries to leverage big data in order to produce better statistical information?

Note: It may be possible to use macroeconomic models, such as computable general equilibrium models, instead of agent-based models, provided we can model probabilistically the individual behaviours that drive the transition from one global equilibrium to another. Such a model could be based, for example, on surveys about how individuals behave in given situations: When do they decide that they have a surplus to sell? Do they just go to the nearest market, or do they try different markets? Which means of transportation do they use?

© 2020 Data Science Central ®
