Field Report Day 2: How a Data Lake Could Fast-Forward Our Understanding of Climate Change

In the first post of this series, we gave the background on our data science expedition to Acadia National Park, and now we are seeing its transformative potential.

I am here as a representative of Pivotal and EMC, and our goal is to help a team of phenology scientists make better use of big data platforms and data science tools and techniques, strengthening their research and fast-forwarding our understanding of climate change. In this post, I want to share what we experienced in the field on Day 2: collecting data on bird migration and on aquatic life in tidal pools, and thinking about how to automate these data collection processes and improve their quality. I’m happy to report that, in just 2 days, we’ve begun formulating ways to use a network of stationary cameras, image processing technology, data lakes, and mobile apps to help automate the process, ultimately helping scientists spend more time on science and less time on administrative tasks.

Key Data Science Challenges: Capturing and Cleaning Data

During yesterday’s orientation sessions, we heard from Dr. Abe Miller-Rushing, one of the principal investigators from Acadia National Park. He pointed out that scientists spend a lot of time capturing and cleaning up data; in fact, 50% of their time is spent on data cleansing. That is a lot of time spent making observations in the field and manually counting results, time that could be put to better use. It became clear that technology can play a stronger role in this process and make phenology scientists significantly more productive.

The Current Method: Manual Observation of Bird Migration and Tide Pools

Capturing data on bird migration was our first task for the day. After breakfast at the Schoodic Research Institute, we hiked for a mile to the southern tip of Winter Harbor and Schoodic Point, a picturesque location for observing bird migrations along the Atlantic. There, we set up binoculars and telescopes to spot birds and paired off into teams of two.

Each team, one observer and one note taker, catalogued sightings of three different migratory bird species: the Common Eider, the Northern Gannet, and the Common Loon. After only a few minutes, the challenges in this approach became evident. Bone-chilling temperatures and whipping winds made it uncomfortable to keep our eyes focused on the horizon. Overcast skies, water reflections, fog, clouds, and distance compounded the challenge of accurate observation. As a result, the 5 teams reported considerably different numbers of sightings. Moreover, the reports covered only small slices of time. To get a complete picture, teams would have to endure days or weeks out in the elements. Combining the 5 teams into a single unit and adding an expert both helped raise accuracy, but they also increased the amount of manual labor required for the task. In short, manual collection proved to be error-prone, incomplete, and uncomfortable.

After lunch, we took a firsthand look at studying intertidal ecology. Schoodic Institute field team leader and education project manager, Hannah Weber, led us on a mile-long hike to Acadia’s Diagon Alley, a tidal pool speckled with barnacles, blue mussels, and a variety of seaweeds. Again, we worked in pairs, this time to record the presence or absence of our target species. We used the Point-Intercept method with a quadrat to record our observations. While we found that it was easier to be accurate with this type of data collection, it still took a lot of people braving the elements, and a lot of time, to make it happen.

This experience of walking a day in the life of the research teams helped us to see a lot of ways in which technology, particularly a data lake storing large volumes of automatically captured images, could greatly help scientists and ecologists better record and study these types of data sets.

Changing the Game with Data Science, Connected Cameras, Data Lakes, and Mobile Apps

In both cases, it was clear that technology and automation could change the game by putting a greater volume of more accurate information in the hands of researchers more quickly.

In the bird watching environment, stationary cameras could take high-resolution images every few seconds throughout the day, and those images could then be ingested into a data lake or big data environment. Through object recognition and image processing, the system could separate out the bird images and run a content-based image retrieval (CBIR) engine to match the detected objects against a database of images of migratory birds observed in the region. The resulting classifications could then be queued for researchers to review, so they can override misclassifications and concentrate their efforts on improving accuracy in the final stage of data collection. As a result, a much greater volume of data could be captured with far less effort and error. In addition, an open data lake approach would make these raw data sets easy for other researchers to use, expanding the usefulness of the data.
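To make the CBIR matching and review-queue steps more concrete, here is a minimal sketch in Python. It assumes each detected object has already been reduced to a numeric feature vector by some upstream image-processing step; the three-number vectors, the `REFERENCE` table, the `classify` and `review_queue` helpers, and the 0.9 review threshold are all hypothetical and chosen only for illustration. Cosine similarity is one common way to compare feature vectors, not necessarily what a production system would use.

```python
import math
from dataclasses import dataclass

# Hypothetical reference library: species name -> feature vector.
# Real vectors would come from an image feature extractor; these
# three-number vectors are made up purely for illustration.
REFERENCE = {
    "Common Eider": [0.9, 0.1, 0.2],
    "Northern Gannet": [0.1, 0.9, 0.3],
    "Common Loon": [0.2, 0.3, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class Match:
    detection_id: str
    species: str
    score: float
    needs_review: bool


def classify(detection_id, features, reference=REFERENCE, review_threshold=0.9):
    """Pick the most similar reference species; flag weak matches for a human."""
    best_species, best_score = max(
        ((name, cosine_similarity(features, vec)) for name, vec in reference.items()),
        key=lambda pair: pair[1],
    )
    return Match(detection_id, best_species, best_score, best_score < review_threshold)


def review_queue(matches):
    """Return the matches flagged for review, least confident first."""
    return sorted((m for m in matches if m.needs_review), key=lambda m: m.score)
```

With these made-up vectors, a detection whose features equal the Common Eider reference is accepted automatically, while an ambiguous detection such as `[0.5, 0.5, 0.5]` falls below the threshold and lands in the review queue, which is where a researcher's time would be concentrated.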

For the tide pools, a tablet or smartphone app could be used to take pictures of the quadrats. Again, an image processing program could automatically fill out a matrix of hits and misses for the different species of organisms. Once placed in a data lake, this time-stamped data, along with the GPS coordinates of the tide pool, would be available to researchers worldwide.
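The matrix of hits and misses could look something like the following Python sketch. It assumes an upstream image-processing step has already labeled each intercept point of the quadrat grid with the set of species found there; the `QuadratRecord` type, the `build_record` and `percent_cover` helpers, and the species list are hypothetical names introduced here for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical target species list, based on the organisms mentioned above.
SPECIES = ["barnacle", "blue mussel", "seaweed"]


@dataclass
class QuadratRecord:
    captured_at: datetime  # when the photo was taken
    latitude: float        # GPS position of the tide pool
    longitude: float
    hits: dict             # species name -> grid of 1 (hit) / 0 (miss)


def build_record(point_labels, captured_at, latitude, longitude, species=SPECIES):
    """Turn per-point labels into one hit/miss matrix per species.

    point_labels is a grid (list of rows); each cell is the set of species
    names detected at that intercept point of the quadrat.
    """
    hits = {
        name: [[1 if name in cell else 0 for cell in row] for row in point_labels]
        for name in species
    }
    return QuadratRecord(captured_at, latitude, longitude, hits)


def percent_cover(record, name):
    """Share of intercept points, as a percentage, where the species was hit."""
    cells = [value for row in record.hits[name] for value in row]
    return 100.0 * sum(cells) / len(cells) if cells else 0.0
```

Keeping the timestamp and coordinates on the same record as the matrix means each observation stays self-describing once it is stored alongside records from other sites and dates.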

Wrapping Up Day 2 and Learning More

After one day in the field, it is clear there are many possibilities for automating the data collection process; building larger, more complete, and more accurate data sets; and improving the productivity of the researchers by allowing them to spend more time on science and less time on administrative work. Tomorrow, we’ll continue to look at the scientific research process here in Acadia with a field trip to Mount Desert Island. In the afternoon, we will return to the Schoodic Research Institute and continue our brainstorming on the climate data lake and how it can speed our effort to learn about climate change.

You can read articles from my data science colleagues or find out more about what open source software and products we use at the Pivotal Data Science blog.

Tags: EMC, Federation, analytics, data, for, good, phenology, predictive, science

© 2019 Data Science Central ®