As data scientists, we get excited about using our talents to solve problems like global climate change and worldwide environmental policy.
This week, I have the opportunity to represent Pivotal and team up with other experts from EMC, Earthwatch, and Schoodic Institute for a week at Acadia National Park. We will be applying data science to phenology, the study of periodic plant and animal life cycle events and how these are influenced by seasonal and inter-annual variations in climate. Ultimately, the work will help scientists and researchers better collect, store, manage, and monitor data, helping us all understand how and why our climate is changing and what the impact is on plants, animals, and humans.
Beginning the Journey
Sitting on the rock-bound Atlantic coast in the state of Maine, Acadia is a cornerstone of the US national parks system, drawing about 2.1 million visitors annually. Covering more than 49,000 acres, it is one of the oldest national parks, dating to the early 1900s, and was the first national park east of the Mississippi River. At 10 am on Monday, five of us boarded a Cessna at Boston’s Logan Airport for the 70-minute ride up to Acadia. As the plane approached the park, we each had eagle-eye-level views of the beauty we were on a mission to protect this week.
Focusing on the Data Science Project Goals
Ultimately, we are all participating in this Earthwatch program to help the world’s top scientists better use data to conduct research and support a sustainable planet. With that in mind, the team will explore the concept of a data lake for climate information and how it will help data scientists explore related phenology data. In a nutshell, we will look at how, by integrating a multitude of sources, both structured and unstructured, scientists can begin to understand new correlations between data sets, build predictive models, provide visualizations to explain these phenomena to the world, influence government policy, and mitigate the negative impact of climate change.
Our goal for the week and beyond is to participate as citizen scientists, brainstorming on ways in which the EMC Federation of companies can improve how we study the impact of climate change by applying modern data science practices. The agenda will cover four things:
Data collection in the field
Limitations of the current data collection approach and system
Requirements for a climate data lake
Applying Data Science to Phenology
As the team assembled at the Schoodic Institute, one of our leaders, Dr. Abe Miller-Rushing, gave several examples to explain phenology within the park, illustrating how climate changes can affect a natural ecosystem. Things like:
Insectivorous (insect eating) birds. For certain species, in order for their offspring to survive, there must be adequate caterpillar biomass in the area. With climate change, caterpillars are hatching earlier than normal. This has had a significant impact on the weight and the number of bird offspring.
Puffin chicks. With oceans warming earlier, certain populations of fish are changing their migrations, adversely affecting the population of puffin chicks in Maine. Due to warmer temperatures, the primary food source for puffin chicks is long gone from the waters off Maine by August. As a result, the puffin parents have attempted to feed their chicks with an alternative food source, butterfish, which the chicks struggle to swallow. This has led to a serious decline in the number of puffin chicks that have successfully fledged.
Flowers blooming. The flowering of certain species in the eastern United States has also changed due to climate. The great American poet and naturalist, Henry David Thoreau, kept records in the mid-19th century and captured when various species of flowers would first open every spring. While Thoreau's records indicate that highbush blueberries flowered in mid-May back in his time, now they are seen blooming as early as the first week in April.
Team Progress Summary
The first day, logically, was spent orienting the team and laying out the challenge ahead. By the end of the day, we had covered an overview of the goals of this expedition, discussed current limitations in the data collection and analysis approach and infrastructure, and outlined how a climate data lake could help study the impact of climate change, one of the important challenges of our generation.
Tomorrow, I'll share more updates from the field about our data collection activities, look at the different data sources we'll be working with during our time here, and provide an assessment of the challenges.
You can read more on this story on the Pivotal blog, along with other Pivotal Data Science stories, or find out more about the open source software and products we use on the Pivotal Data Science blog.