Kaggle Releases Data Sets About Global Warming: Make your own Predictions

Analyze the data, draw your own conclusions about global warming and climate change (especially predictions), and post your results in the comment section below: this article will be featured and will reach more than one million data science practitioners.

Is the Kaggle data (see below) good enough to build meaningful confidence intervals? That is, does it have enough data points, well distributed spatially and in time, with low and high temperatures for each weather station, to draw reliable, statistically significant conclusions? Are extremes (low or high) getting worse depending on location? Does warming differ by latitude? Is the gap between two extreme events shrinking, meaning that extremes are becoming more extreme faster than before? Are there any other metrics in the data set, besides temperatures, that could help create a causal, rather than merely correlational, model?
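To make the confidence-interval question concrete, here is a minimal sketch of one way to estimate a warming trend and an approximate 95% interval around it, using ordinary least squares on yearly averages. The function name `slope_with_interval` and the sample numbers are made up for illustration, and the interval uses a normal approximation (z = 1.96) rather than an exact t quantile.

```python
import math

def slope_with_interval(years, temps, z=1.96):
    """Least-squares slope of temps on years, with an approximate 95% interval.

    Assumption: residuals are independent and roughly normal, which real
    temperature series often violate.
    """
    n = len(years)
    if n < 3:
        raise ValueError("need at least three points")
    mean_x = sum(years) / n
    mean_y = sum(temps) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, temps))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares, with n - 2 degrees of freedom for the variance.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(years, temps))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope, slope - z * std_err, slope + z * std_err

# Made-up illustrative values, not taken from the Kaggle files.
years = [2000, 2001, 2002, 2003, 2004, 2005]
temps = [9.2, 9.4, 9.5, 9.5, 9.3, 9.7]
slope, low, high = slope_with_interval(years, temps)
print(f"trend: {slope:.3f} C/year, approx. 95% interval [{low:.3f}, {high:.3f}]")
```

If the interval includes zero, the data used does not by itself establish a trend. Because consecutive years are correlated, an interval computed this way is likely too narrow, so treat it as a first look rather than a final answer.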

Could you make a video showing how Earth temperatures evolved over time in the last 100 years, using a map of Earth with each frame in the video representing one year? (Here is some help on how to produce such a video)

A few questions:

Since the Earth's rotation axis is strongly tilted, for most of the year (all but the days around the equinoxes) there is a region on Earth that is in the dark 24 hours a day, and thus among the coldest places on the planet: the area around the South or North pole. Could human beings migrate from far North to far South each year, to stay in the dark and avoid scorching heat? Even if the greenhouse effect made Earth resemble Venus, we would still be better off than on Venus, which lacks the pronounced effective tilt that creates seasons and 6-month darkness at the poles. This is something human beings could leverage for survival.

What about underground or undersea living?

If the planet gets much warmer, will warming eventually produce many more clouds, which in turn reflect light and contribute to cooling Earth, the same way glaciers do today? That is, can global warming go only so far, or can it get as bad as on Venus? Can the Kaggle data answer that question?

Kaggle data and material

Originally posted here, where you can download the 86MB data set compressed in zip format, as well as some scripts to process the data. For other free data sets, click here. For space weather, click here: it also has very interesting predictive analytics problems. Below is a description of the Kaggle weather project, from the original source.

Some say climate change is the biggest threat of our age while others say it’s a myth based on dodgy science. We are turning some of the data over to you so you can form your own view.


Even more than with other data sets that Kaggle has featured, there’s a huge amount of data cleaning and preparation that goes into putting together a long-time study of climate trends. Early data was collected by technicians using mercury thermometers, where any variation in the visit time impacted measurements. In the 1940s, the construction of airports caused many weather stations to be moved. In the 1980s, there was a move to electronic thermometers that are said to have a cooling bias.

Given this complexity, there is a range of organizations that collate climate trends data. The three most cited land and ocean temperature data sets are NOAA’s MLOST, NASA’s GISTEMP and the UK’s HadCRUT.

We have repackaged the data from a newer compilation put together by Berkeley Earth, which is affiliated with Lawrence Berkeley National Laboratory. The Berkeley Earth Surface Temperature Study combines 1.6 billion temperature reports from 16 pre-existing archives. It is nicely packaged and allows for slicing into interesting subsets (for example by country). They publish the source data and the code for the transformations they applied. They also use methods that allow weather observations from shorter time series to be included, meaning fewer observations need to be thrown away.

In this dataset, we have included several files:

Global Land and Ocean-and-Land Temperatures (GlobalTemperatures.csv):

  • Date: starts in 1750 for average land temperature and 1850 for max and min land temperatures and global ocean and land temperatures
  • LandAverageTemperature: global average land temperature in Celsius
  • LandAverageTemperatureUncertainty: the 95% confidence interval around the average
  • LandMaxTemperature: global average maximum land temperature in Celsius
  • LandMaxTemperatureUncertainty: the 95% confidence interval around the maximum land temperature
  • LandMinTemperature: global average minimum land temperature in Celsius
  • LandMinTemperatureUncertainty: the 95% confidence interval around the minimum land temperature
  • LandAndOceanAverageTemperature: global average land and ocean temperature in Celsius
  • LandAndOceanAverageTemperatureUncertainty: the 95% confidence interval around the global average land and ocean temperature
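As a starting point for working with these columns, here is a minimal sketch that reads rows shaped like GlobalTemperatures.csv and averages the monthly values per year. It assumes the date column is named `dt` in the CSV header and holds ISO dates such as 1850-01-01, neither of which is confirmed by the description above, so check the header of your copy and pass `date_col` accordingly. The sample rows are made up, and `annual_means` is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict

def annual_means(rows, date_col="dt", value_col="LandAverageTemperature"):
    """Average the monthly values per calendar year, skipping blank cells."""
    by_year = defaultdict(list)
    for row in rows:
        value = row.get(value_col)
        if not value:  # some records may have no measurement
            continue
        year = int(row[date_col][:4])  # assumes dates start with a 4-digit year
        by_year[year].append(float(value))
    return {year: sum(vals) / len(vals) for year, vals in sorted(by_year.items())}

# Tiny made-up sample standing in for the real file.
SAMPLE = """dt,LandAverageTemperature,LandAverageTemperatureUncertainty
1850-01-01,1.0,0.5
1850-02-01,3.0,0.5
1851-01-01,,
1851-02-01,2.0,0.4
"""

means = annual_means(csv.DictReader(io.StringIO(SAMPLE)))
print(means)

# With the downloaded file, something like this should work:
# with open("GlobalTemperatures.csv", newline="") as f:
#     means = annual_means(csv.DictReader(f))
```

Passing `value_col="LandAndOceanAverageTemperature"` gives the combined land and ocean series instead, which per the column list only starts in 1850.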

Other files include:

  • Global Average Land Temperature by Country (GlobalLandTemperaturesByCountry.csv)
  • Global Average Land Temperature by State (GlobalLandTemperaturesByState.csv)
  • Global Land Temperatures By Major City (GlobalLandTemperaturesByMajorCity.csv)
  • Global Land Temperatures By City (GlobalLandTemperaturesByCity.csv)
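The by-country file lends itself to the kind of slicing mentioned earlier. Below is a rough sketch that computes decade averages for one country. The column names `dt`, `AverageTemperature` and `Country` are assumptions, not taken from the description above, so verify them against the CSV header; the helper name `decade_means` and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict

def decade_means(rows, country, date_col="dt",
                 value_col="AverageTemperature", country_col="Country"):
    """Average temperature per decade for one country, skipping blank cells."""
    buckets = defaultdict(list)
    for row in rows:
        if row.get(country_col) != country or not row.get(value_col):
            continue
        decade = int(row[date_col][:4]) // 10 * 10  # e.g. 1957 -> 1950
        buckets[decade].append(float(row[value_col]))
    return {d: sum(vals) / len(vals) for d, vals in sorted(buckets.items())}

# Made-up rows with assumed column names, standing in for the real file.
SAMPLE_COUNTRIES = """dt,AverageTemperature,AverageTemperatureUncertainty,Country
1951-01-01,4.0,0.3,Norway
1957-07-01,12.0,0.3,Norway
1962-01-01,5.0,0.2,Norway
1962-01-01,25.0,0.2,Brazil
1963-01-01,,,Norway
"""

result = decade_means(csv.DictReader(io.StringIO(SAMPLE_COUNTRIES)), "Norway")
print(result)
```

Decade averages smooth out the seasonal cycle only if each decade has a balanced set of months, so with patchy early records it may be safer to average each year first and drop incomplete years.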

The raw data comes from the Berkeley Earth data page.

