In this article, I hope to inspire you to start exploring satellite imagery datasets. Recently, this technology has gained huge momentum, and we are finding that new possibilities arise when we use satellite image analysis. Satellite data changes the game because it allows us to gather new information that is not readily available to businesses.
Satellite images allow you to view Earth from a broader perspective. You can point to any location on Earth and get the latest satellite images of that area. Also, this information is easy to access. There are free sources that allow you to download the mapped image onto your computer, and then, you can play with it locally.
One of the most important aspects of using satellite images is that you can also browse past images of certain locations. This means that you can track how the area changed over time and predict how it will change in the future. All you have to do is define the properties that are relevant to your use case.
To give you an idea of how satellites track our progress on Earth, we have to take a look at what is above us.
There are currently over 4,500 satellites orbiting the Earth. Some are used for communication or GPS, but over 600 of them regularly take pictures of the Earth’s surface. Currently (as of the end of 2018), the best available resolution is 25cm per pixel, which means that 1 pixel covers a square of 25cm x 25cm. At that resolution, a person takes up about 3 pixels in an image.
Current technology actually allows for an even better resolution, but it is not publicly available, as many governments don’t allow more detailed images to be taken for security reasons. This means you won’t be able to access better quality unless you have security clearance.
Image sources fall into two groups. The first is free public images, the most popular of which are the American Landsat and the European Sentinel. Landsat provides images with a resolution of 30m per pixel roughly every 16 days for any location. Sentinel provides images with a resolution of 10m per pixel roughly every 5 days.
The second group is commercial providers, like DigitalGlobe, that can provide images with a resolution of up to 25cm per pixel and revisit rates of up to twice a day. It is important to strike a balance between the different properties that you need, as the best resolution doesn’t always mean that you get the most frequent images.
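To get a feel for what these resolutions mean in practice, here is a minimal Python sketch that computes how many pixels an object spans along one axis at each of the resolutions quoted above. The 4.5m car length and the names `pixels_spanned` and `car_report` are assumptions made for illustration, not figures or functions from any provider.

```python
def pixels_spanned(object_size_m: float, resolution_m_per_px: float) -> float:
    """Number of pixels an object of the given length spans along one axis."""
    if resolution_m_per_px <= 0:
        raise ValueError("resolution must be positive")
    return object_size_m / resolution_m_per_px


# Resolutions quoted in the text, in metres per pixel.
SOURCES = {
    "commercial (25cm)": 0.25,
    "Sentinel (10m)": 10.0,
    "Landsat (30m)": 30.0,
}

# Assumed length of a typical car; an illustrative value only.
CAR_LENGTH_M = 4.5


def car_report() -> dict:
    """Pixels spanned by the assumed car at each listed resolution."""
    return {name: pixels_spanned(CAR_LENGTH_M, res) for name, res in SOURCES.items()}


if __name__ == "__main__":
    for name, px in car_report().items():
        print(f"{name}: a car spans about {px:.2f} pixels")
```

Under these assumptions, a car spans 18 pixels at 25cm, which is enough to pick it out, while at 10m or 30m it covers less than half a pixel and cannot be seen individually.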
Cost is also an important factor. The best images can cost up to a couple hundred dollars, so it is wise to start building your solution with lower quality images. Just make sure you choose the ones that best fit your particular use case. Of course, commercial sources offer subscriptions, which will reduce the cost per image.
Let’s go through the properties that you have to balance out when choosing an image source. First is spatial resolution, meaning how much ground a single pixel covers. Technology has been advancing rapidly here, and more and more money is being invested into launching better satellites and making them available.
The second factor is temporal resolution, which is how often you get a picture of a given place. This matters because clouds may block your point of interest. For example, if you only get 1 image every 7 days and your location is in a cloudy area, then all of your images in a given month may well be blocked by clouds, which stops you from collecting data for your area. Some algorithms are being created to mitigate this issue, but it is still a big problem when browsing images. For the most part, it is better to get the highest possible frequency to improve your chances of getting a clean shot of the given area in the selected time frame.
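The cloud argument can be put into rough numbers. The sketch below is a simplification that assumes every image is cloudy with the same probability, independently of the others, which real weather does not guarantee. The 70% cloud probability and the function name `prob_all_cloudy` are made-up values for illustration.

```python
def prob_all_cloudy(p_cloudy: float, revisit_days: float, window_days: float = 30) -> float:
    """Chance that every image taken within the window is blocked by clouds.

    Assumes each image is cloudy independently with probability p_cloudy.
    """
    if not 0.0 <= p_cloudy <= 1.0:
        raise ValueError("p_cloudy must be between 0 and 1")
    if revisit_days <= 0:
        raise ValueError("revisit_days must be positive")
    # Number of whole revisits that fit into the window.
    n_images = int(window_days // revisit_days)
    return p_cloudy ** n_images


if __name__ == "__main__":
    # Hypothetical cloudy location: 70% of images are blocked.
    print(f"weekly images:     {prob_all_cloudy(0.7, 7):.4f}")
    print(f"twice-daily images: {prob_all_cloudy(0.7, 0.5):.2e}")
```

With one image every 7 days, about 4 images fit in a month, so under these assumptions there is roughly a 24% chance that all of them are cloudy. With images twice a day, that chance becomes negligible.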
Now, the third factor is interesting: spectral resolution. When you think about an image, you usually think of three layers: red, green, and blue; these layers compose a visual image of the area. This is because the human eye has three types of color-sensitive cones, which react to red, green, and blue.
However, satellites can have many more sensors that allow them to record parts of the spectrum that human eyes cannot see. An image taken by a satellite can have 12 or more layers, and each layer brings more information. By combining the layers, you can create indicators that will give you additional insight into what is happening on the ground.
One fascinating indicator is the normalized difference vegetation index (NDVI), which can be used to estimate the condition of the plants. When you look at a normal picture of a field, you see different shades of green, but it doesn’t tell you how healthy the vegetation is.
We can measure vegetation health by looking at the near-infrared light that gets reflected from flora in different ways depending on the amount of chlorophyll. This allows us to see how healthy the plants in our observation area are, which is not possible to derive from an RGB image.
Another example is soil moisture, meaning how wet the land is. During droughts, like the one in LA, authorities have introduced water restrictions. It turned out that wealthier individuals didn’t follow these restrictions and continued to use large amounts of water. Thanks to satellite images, the government could see which fields had high soil moisture, helping them to better enforce these water restriction laws.
It is also worth mentioning that there is radar technology that allows you to see through the clouds, but it won’t fit every use case.
Current state-of-the-art satellites offer 25cm resolution or images twice a day. This is an example of a standard image.
Clearly, you can see people, and you can even count the number of tables outside the restaurant.
As I said before, you have to strike a good balance between these properties to serve your problem. Spatial resolution may not be the most important factor in your research. You also need to consider the temporal and spectral resolutions, cost, availability, and ease of processing.
Let’s start with what shouldn’t be done in R. There are two main categories: data pre-processing and resource-intensive operations. A single image weighs around 1GB and covers a large area, like half of the state of Washington.
Downloading 100 images and cutting them on your computer is very resource-intensive and shouldn’t be done locally in R. There are platforms available that will do the pre-processing and send you small cutouts matching the shapefile that you want. Among them are Google Earth Engine and Amazon Web Services (AWS), which allow you to simply query an API. They already have public image sets available, and you can upload your own image sets. All of this is available at your fingertips. You just say, “Google, I want a set of dates for Sentinel images that cover a small square containing Loews Hotel,” and you are set. From there, you choose one or more dates and ask the API to send you already cropped images, reducing the image size to hundreds of kilobytes.
This all happens quite quickly, as you’re using huge distributed infrastructure to do the calculations. In addition, you can actually conduct computations there and receive indicators. For example, you can receive the NDVI indicator, which is a simple, mathematical combination of the near-infrared and red channels.
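NDVI is conventionally computed per pixel as (NIR - Red) / (NIR + Red), which gives a value between -1 and 1 for non-negative reflectances. Here is a minimal Python sketch of that formula on plain nested lists. The function names and the choice to return 0 where both bands are 0 are assumptions of this sketch, not how any particular platform implements it.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index for one pixel."""
    total = nir + red
    if total == 0:
        # Assumption of this sketch: no signal in either band maps to 0.
        return 0.0
    return (nir - red) / total


def ndvi_image(nir_band, red_band):
    """Apply NDVI pixel by pixel to two equally sized 2D bands."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


if __name__ == "__main__":
    # Made-up reflectance values for a 2x2 image.
    nir = [[0.50, 0.40], [0.20, 0.00]]
    red = [[0.10, 0.10], [0.20, 0.00]]
    for row in ndvi_image(nir, red):
        print([round(v, 2) for v in row])
```

Pixels that reflect much more near-infrared than red light score close to 1, which points to dense, healthy vegetation, while pixels where the two bands are similar score close to 0.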
Now, R shines when you build dashboards to present the data. You can analyze and forecast the indicators that you’ve built. Operating on small images allows you to leverage many useful R packages to experiment with this data and gain valuable insight. Of course, you can also build neural networks that will help you identify objects in these images.
Here is an example of a dashboard that you could build with R.
By combining the images with publicly available parcel shapefiles, you can draw any parcel on a map and request the available image dates for that parcel. Then, you can analyze the image and indicate where crops are destroyed or unhealthy.
This is an example of visualizing an NDVI indicator.
As seen above, there are sub-areas with healthy crops, while there are others with unhealthy plants. Also, the clouds here are distorting the results, which should be accounted for.
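One simple way to account for clouds is to leave cloudy pixels out before summarizing the indicator. The sketch below assumes you already have a boolean cloud mask of the same size as the NDVI grid; where that mask comes from is outside the scope of this sketch, and the name `mean_ndvi_clear` is hypothetical.

```python
def mean_ndvi_clear(ndvi_values, cloud_mask):
    """Average NDVI over pixels that are not flagged as cloud.

    ndvi_values: 2D list of floats.
    cloud_mask:  2D list of booleans, True where a pixel is cloudy.
    Returns None when every pixel is cloudy.
    """
    clear = [
        value
        for value_row, mask_row in zip(ndvi_values, cloud_mask)
        for value, cloudy in zip(value_row, mask_row)
        if not cloudy
    ]
    if not clear:
        return None
    return sum(clear) / len(clear)


if __name__ == "__main__":
    # Made-up values: the right-hand column is covered by cloud.
    values = [[0.8, 0.1], [0.6, 0.0]]
    mask = [[False, True], [False, True]]
    print(mean_ndvi_clear(values, mask))
```

Averaging all four pixels in this made-up example would give a much lower value than averaging only the two clear ones, which is exactly the kind of distortion that clouds introduce.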
One example of applying deep learning to the pre-processed images that I can share is one where we used Kaggle data to detect whether there was a ship in an image. If you don’t have such a data set available, you have to combine other data sources.
You can use a system called AIS (Automatic Identification System), which requires ships to report their positions on a regular basis. From there, you get a satellite image of the sea, combine it with the ship positions at that time, and cut out the images to prepare the data set.
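The labeling step could look roughly like the Python sketch below: keep only the position reports close to the image’s acquisition time, then mark each cutout as containing a ship if any of those positions falls inside it. The `ShipReport` and `Tile` structures, the 10-minute tolerance, and the coordinates are all hypothetical; real AIS messages carry many more fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShipReport:
    lat: float
    lon: float
    timestamp: float  # seconds since epoch


@dataclass(frozen=True)
class Tile:
    """A rectangular cutout of the satellite image, in degrees."""
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        # Half-open bounds so adjacent tiles never both claim the same ship.
        return self.min_lat <= lat < self.max_lat and self.min_lon <= lon < self.max_lon


def label_tiles(tiles, reports, image_time, tolerance_s=600):
    """Return 1 for tiles with a ship reported near image_time, else 0."""
    nearby = [r for r in reports if abs(r.timestamp - image_time) <= tolerance_s]
    return [int(any(t.contains(r.lat, r.lon) for r in nearby)) for t in tiles]


if __name__ == "__main__":
    tiles = [Tile(54.0, 18.0, 54.1, 18.1), Tile(54.1, 18.0, 54.2, 18.1)]
    reports = [
        ShipReport(54.05, 18.05, 1000.0),  # inside tile 0, at image time
        ShipReport(54.15, 18.05, 5000.0),  # inside tile 1, but too late
    ]
    print(label_tiles(tiles, reports, image_time=1000.0))
```

The time tolerance matters because ships move: a position reported long before or after the image was taken may no longer match what is visible in the cutout.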
In the maritime industry, it is important to know where ships are, as there are some restricted areas. For example, there are areas where fishing is forbidden. Some fishermen would turn their AIS off and go there to fish anyway. Scanning the satellite images allows us to identify some of these illegal acts.
Also, R is great for augmenting your data set to produce even more examples.
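A common form of augmentation for overhead images is flipping and rotating them, since a ship is still a ship when seen from another direction. The idea is language-neutral; the sketch below shows it in Python on a tiny grid of numbers standing in for pixels. The function names are my own, and this is not necessarily the augmentation used in the project described here.

```python
def rotate90(img):
    """Rotate a 2D grid 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def flip_horizontal(img):
    """Mirror a 2D grid left to right."""
    return [list(row[::-1]) for row in img]


def augment(img):
    """Return the 8 variants of a grid: 4 rotations, each with and without a flip."""
    variants = []
    current = [list(row) for row in img]
    for _ in range(4):
        variants.append(current)
        variants.append(flip_horizontal(current))
        current = rotate90(current)
    return variants


if __name__ == "__main__":
    image = [[1, 2], [3, 4]]
    for variant in augment(image):
        print(variant)
```

Each labeled image then yields up to eight training examples with the same label, at no extra labeling cost.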
Here is the network that we used to identify these ships.
It consists of two convolution layers, one max pooling layer, two more convolution layers, another max pooling layer, and then a softmax function. We also use dropout to avoid overfitting. Using Keras simplifies the process of defining a network like this.
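To make the layer sequence more concrete, here is a small Python sketch that traces how the image size shrinks through such a stack and shows what the final softmax does. The 3x3 kernels without padding, the 2x2 pooling, and the 80x80 input are assumptions of this sketch; the actual sizes used in the network are not stated here, and dropout is left out because it does not change the shapes.

```python
import math

# Layer order as described in the text.
LAYERS = ["conv", "conv", "pool", "conv", "conv", "pool"]


def trace_sizes(input_size, layers=LAYERS, kernel=3, pool=2):
    """Side length of a square feature map after each layer.

    Assumes unpadded convolutions with stride 1 and non-overlapping pooling.
    """
    sizes = [input_size]
    for layer in layers:
        size = sizes[-1]
        if layer == "conv":
            size = size - kernel + 1
        elif layer == "pool":
            size = size // pool
        else:
            raise ValueError(f"unknown layer type: {layer}")
        sizes.append(size)
    return sizes


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    print(trace_sizes(80))       # assumed 80x80 input
    print(softmax([2.0, 0.5]))   # made-up scores for "ship" and "no ship"
```

The softmax output can be read as the network’s confidence in each class, so for a two-class problem like this one, the larger of the two values decides whether the image is labeled as containing a ship.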
In this problem, we achieved 98% accuracy, and if you’re interested in the details, you can check out this article on our blog by Michał Maj.
Now, let’s break down a full architecture of a solution that you could build to analyze satellite images and present the results.
First of all, you need a data source. You can get the images either from a platform or directly from the providers.
You have to pre-process the images, which requires large resources, so it is useful to either build your own solution in the cloud or leverage existing platforms. You can batch process many images at once and store them. You can also use an API to get the pre-processed images on demand.
With the prepared images, you can train your network and save the model. Then, you can run a batch process to label your images and store them. Finally, you can build a dashboard that will use them or use the API to request an image, run the model on it, and present results.
Let’s look at some emerging applications in different business areas. These are just some examples, but we are seeing more and more different use cases each day.
First is agriculture. Farmers can have a live overview of their crops that shows the crops’ health and any damage. Using this technology, they can quickly estimate their losses after a drought, flood, or hurricane. Recently, it has also been used in the fertilizing process.
Second is real estate. For example, construction companies can use this technology to see what their competition is doing and benchmark their own performance against it. Further, they can see which areas are expanding to know what locations might be good to invest in. Also, rooftops themselves provide information about the state of a given building, which is valuable to builders.
The third is finance and insurance. Traders can forecast the supply of goods based on the number of containers being delivered from particular areas. This gives them a huge advantage. For example, they can predict that the price of the given goods will rise in the future if there is a supply shortage. With better spatial and temporal resolutions, you can learn more details about an area. For example, you can use this technology to identify cars in different neighborhoods and assess if an area is wealthy. There were even use cases where companies counted the number of cars in a parking lot to see how well a given supermarket was doing. Note that for this type of work, you need much more frequent data.
The technology is getting better and better. If you start experimenting with these images now, you will be on top of this wave soon. It is important to realize that these techniques are technology agnostic, meaning they don’t only apply to satellite pictures.
In the future, it might be possible to even apply these techniques to live drones or airships. The industry is on the rise, and if you start soon, you can get big returns in the future. Use images, share your results with the community, and, most importantly, have fun. Playing with these images, while valuable, is also very exciting.
Original story by Damian Rodziewicz on Appsilon Data Science Blog