
33 unusual problems that can be solved with data science

Here is a non-exhaustive list of curious problems that could greatly benefit from data analysis. If you think you can’t get a job as a data scientist (because you only apply to jobs at Facebook, LinkedIn, Twitter or Apple), here’s a way to find or create new jobs, broaden your horizons, and make Earth a better world not just for human beings, but for all living creatures, and even beyond Earth. Help us grow this list of 33 problems to 100+.

The actual number is higher than 33, as I’m adding new entries.


Figure 1: related to problem #33


  1. Automated translation, including translating one programming language into another one (for instance, SQL to Python – the converse is not possible)
  2. Spell checks, especially for people writing in multiple languages – lots of progress to be made here, including automatically recognizing the language as you type, and no longer trying to correct the same word every single time (some browsers have tried to change Ning to Nong hundreds of times, and I have no idea why they continue to try after 50 failures – I call this machine unlearning)
  3. Detection of Earth-like planets – focus on planetary systems with many planets to increase the odds of finding habitable planets, rather than on stars and planets matching our Sun and Earth
  4. Distinguishing between noise and signal on millions of NASA pictures or videos, to identify patterns
  5. Automated piloting (drones, cars without pilots)
  6. Customized, patient-specific medications and diets
  7. Predicting and legally manipulating elections
  8. Sport bets
  9. Predicting oil demand, oil reserves, oil price, impact of coal usage
  10. Predicting chances that a container in a port contains a nuclear bomb
  11. Assessing the probability that a convict is really the culprit, especially when a chain of events resulted in a crime or accident (think about a civil airplane shot down by a missile)
  12. Computing correct average time-to-crime statistics for an average gun (using censored models to compensate for the bias caused by new guns not having a criminal history attached to them)
  13. Predicting iceberg paths: icebergs occasionally need to be towed to avoid collisions
  14. Oil well drilling optimization: how to drill as few test wells as possible to detect the entire area where oil can be found
  15. Predicting solar flares: timing, duration, intensity and localization
  16. Predicting earthquakes
  17. Predicting very local weather (short-term) or global weather (long-term); reconstructing past weather (for instance, 200 million years ago)
  18. Predicting weather on Mars to identify best time and spots for a landing
  19. Predicting riots based on tweets
  20. Designing metrics to predict student success, or employee attrition
  21. Predicting book sales, determining correct price, price elasticity and whether a specific book should be accepted or rejected by a publisher, based on projected ROI
  22. Predicting volcano risk, to evacuate populations or cancel flights, while minimizing expenses caused by these decisions
  23. Predicting 500-year floods, to build dams
  24. Actuarial science: predict your death, and health expenditures, to compute your premiums (based on which population segment you belong to)
  25. Predicting reproduction rate in animal populations
  26. Predicting food reserves each year (fish, meat, crops including crop failures caused by diseases or other problems). Same with electricity and water consumption, as well as rare metals or elements that are critical to build computers and other modern products.
  27. Predicting longevity of a product, or a customer
  28. Asteroid risks
  29. Predicting duration, extent and severity of droughts or fires
  30. Predicting racial and religious mix in a population, detecting change point (e.g. when more people speak Spanish than English, in California) to adapt policies accordingly
  31. Attribution modeling to optimize advertising mix, branding efforts and organic traffic
  32. Predicting new flu viruses to design efficient vaccines each year
  33. Explaining hexagonal patterns in this Death Valley picture (see Figure 1)
  34. Road construction, HOV lanes, and traffic lights designed to optimize highway traffic. Major bottlenecks are caused by 3-lane highways suddenly narrowing down to 2 lanes for no reason on a short section, usually less than 100 yards long. No need for big data to understand and fix this, though if you don’t know basic physics (fluid dynamics) and your job is traffic planning / optimization / engineering, then big data – if used smartly – will help you find the cause, and compensate for your lack of good judgement. These bottlenecks should be your top priority, and they are not expensive to fix.
  35. Google algorithm to predict duration of a road trip, doing much better than GPS systems not connected to the Internet. Potential improvement: when Google tells me at 2pm in Seattle that I will arrive in Portland at 5pm, it should incorporate the traffic forecasted for Portland at 5pm (that is, congestion at peak commuting time), rather than making computations based on Portland traffic at 2pm.
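
To make the improvement suggested in problem 35 concrete, here is a minimal sketch in Python. It splits a trip into segments and compares two estimates: one that uses the speeds observed at departure time for the whole route, and one that looks up the speed forecasted for the time at which each segment is actually reached. Everything here is an assumption made for illustration: the function names, the two-segment route, and the mileage and speed figures are invented, and this is not a description of how Google computes arrival times. The sketch also assumes, for simplicity, that the speed stays constant once a segment is entered.

```python
from typing import Callable, List, Tuple

# A segment is (length in miles, function mapping clock hour -> speed in mph).
# Both the structure and the numbers below are hypothetical.
Segment = Tuple[float, Callable[[float], float]]


def eta_naive(segments: List[Segment], depart_hour: float) -> float:
    """Arrival hour if every segment is driven at its departure-time speed."""
    t = depart_hour
    for miles, speed_at in segments:
        t += miles / speed_at(depart_hour)
    return t


def eta_with_forecast(segments: List[Segment], depart_hour: float) -> float:
    """Arrival hour if each segment is driven at the speed forecasted
    for the moment the driver enters it."""
    t = depart_hour
    for miles, speed_at in segments:
        t += miles / speed_at(t)
    return t


def highway_speed(hour: float) -> float:
    # Assumed: free-flowing highway all day.
    return 60.0


def city_speed(hour: float) -> float:
    # Assumed: rush hour from 4pm to 7pm halves the speed in the city.
    return 15.0 if 16.0 <= hour < 19.0 else 30.0


# Made-up route: 150 highway miles, then 15 miles through the destination city.
route: List[Segment] = [(150.0, highway_speed), (15.0, city_speed)]

print(eta_naive(route, 14.0))          # 17.0, that is, 5pm
print(eta_with_forecast(route, 14.0))  # 17.5, that is, 5:30pm
```

With these made-up numbers, the estimate based on 2pm traffic promises a 5pm arrival, while the forecast-based estimate accounts for the rush hour met on arrival and predicts 5:30pm. A real system would need actual traffic forecasts and much finer segments.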
