
Where data science is making a real revolution

What if I told you that everything you see is not so different from a bunch of numbers? And would you believe me if I said that this fact has a direct impact on industries like agribusiness, retail, transportation and military defense?

Much like in the movie The Matrix, what we see is raw data processed and interpreted by our brains, so fast that we don’t even notice. In computational terms, any image can be thought of as a set of data in which each pixel holds one or more values. Pixels are the building blocks of an image, arranged in a particular order to represent information. As in the game Battleship, each pixel has a specific coordinate within the image, and that arrangement is what gives the image its meaning.

In an 8-bit grayscale image like the one above, pixel values are integers that range from 0 (black) to 255 (white). This means that this image (or any other image) can be analyzed, sampled, classified, predicted, parametrized, or treated in any other way you would treat a dataset. Image analysis is one of the key disciplines in remote sensing, since it delivers some of the most valuable insights. More specifically, remote sensing imagery in agriculture is one of the hot topics in modern agritech, driven by the growing number of modern satellites, some of which provide high-resolution imagery for free. Technology and digitalization in the agricultural industry are giving rise to new possibilities, some of which were impossible to imagine a few years ago.
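To make the "image as dataset" idea concrete, here is a minimal sketch using NumPy. The 4x4 array and its values are made up for illustration; a real image would simply be a much larger array of the same kind.

```python
import numpy as np

# A tiny 4x4 grayscale "image": each entry is one pixel,
# stored as an 8-bit integer from 0 (black) to 255 (white).
image = np.array(
    [
        [0, 64, 128, 255],
        [32, 96, 160, 224],
        [16, 80, 144, 208],
        [8, 72, 136, 200],
    ],
    dtype=np.uint8,
)

# Like a square on a game board, a pixel is addressed by (row, column).
row, col = 1, 2
pixel_value = int(image[row, col])

# Because the image is just numbers, ordinary dataset operations apply.
mean_brightness = float(image.mean())
bright_mask = image > 127              # boolean mask of the brighter pixels
bright_count = int(bright_mask.sum())

print(f"pixel at ({row}, {col}) = {pixel_value}")
print(f"mean brightness = {mean_brightness:.1f}")
print(f"pixels brighter than 127: {bright_count}")
```

The same indexing, averaging and thresholding work unchanged on an image with millions of pixels.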

Nowadays satellites and UAVs (unmanned aerial vehicles, like drones) are taking more precise images than ever before. Temporal resolution (how often a location is imaged) and spatial resolution (how fine the detail in an image is) have improved dramatically in recent years, and the aerospace industry will continue to change in ways we cannot even predict. The sharpest commercial satellites can now resolve details of roughly 30 centimeters on the ground, and low-flying drones can resolve far smaller ones. As pixel density increases (more pixels covering the same area on the ground), we get sharper images and heavier image files. Conversely, as pixel density decreases we lose precision but also reduce processing needs. This is one fundamental tradeoff in current digital image analysis, where the availability of images is growing so fast that our ability to process them cannot seem to keep up.
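The tradeoff between sharpness and data volume can be put into numbers. The sketch below is illustrative only: it measures resolution as ground sample distance (the ground width of one pixel, a term introduced here), and `pixels_needed` and `downsample_mean` are hypothetical helper names, not part of any library.

```python
import numpy as np

def pixels_needed(area_km2: float, gsd_m: float) -> int:
    """Number of pixels needed to cover an area at a given ground sample distance."""
    area_m2 = area_km2 * 1_000_000
    return round(area_m2 / (gsd_m ** 2))

def downsample_mean(image: np.ndarray, factor: int) -> np.ndarray:
    """Coarsen an image by averaging non-overlapping factor x factor blocks."""
    rows, cols = image.shape
    rows -= rows % factor   # trim edges that do not fill a whole block
    cols -= cols % factor
    trimmed = image[:rows, :cols]
    blocks = trimmed.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))

# Halving the pixel size quadruples the number of pixels for the same area.
coarse = pixels_needed(area_km2=100, gsd_m=10)   # 10 m pixels
fine = pixels_needed(area_km2=100, gsd_m=5)      # 5 m pixels

# Going the other way: averaging 4x4 blocks cuts the data volume 16-fold.
rng = np.random.default_rng(seed=0)
scene = rng.integers(0, 256, size=(400, 400)).astype(np.float64)
smaller = downsample_mean(scene, factor=4)

print(coarse, fine, scene.size // smaller.size)
```

Block averaging is only one way to coarsen an image; it is used here because it keeps the overall mean brightness unchanged while discarding fine detail.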

By using remote sensing imagery in disciplines like agriculture we are able to perform different tasks in a much more efficient way:

  • Detect plant diseases
  • Classify land cover (e.g. forest) and land use (e.g. agriculture)
  • Classify crop types by recognizing different crops like soybean, corn, wheat, etc.
  • Estimate crop yields by calculating the expected yield of a crop in a specific area
  • Identify weeds that compete with healthy crops for sunlight and soil nutrients
  • Monitor and predict soil moisture by detecting water stress
  • Assess the effectiveness of herbicides, insecticides and fungicides
  • Identify contaminants in crops and soil
  • Monitor the effectiveness of the seeding process

But how do we get all this information just from an image? It seems a bit too much, right? The thing is, remote sensing imagery provides not just one value per pixel (for example, a color value) but several, which adds a lot of extra data to improve any analysis or decision. To understand how this works, we must first talk about spectral imagery.

Seeing the unseen

As humans we see only a very small part of the electromagnetic spectrum (what we call “visible light”), and the truth is that for centuries we have been experiencing the world through a keyhole.

In 1800 William Herschel (the same person who discovered Uranus) separated sunlight with a prism and placed thermometers under each color of light. By placing an extra one beyond the red light, he found that this extra thermometer recorded the highest temperature of all. This was the discovery of infrared light, and the world learned for the first time that there are types of light we cannot see with our eyes. Only a year later, Johann Wilhelm Ritter explored the other side of the visible spectrum (this time beyond the violet end) and, using light-sensitive silver chloride instead of thermometers, discovered ultraviolet light. Our picture of the electromagnetic spectrum had begun to take shape.


Modern remote sensing imagery can measure various wavelengths (far beyond our known “visible light” spectrum), many of which are invisible to our eyes. This is called multispectral imagery, and it is produced by sensors (on satellites or drones, for example) that measure reflected energy within several sections (or bands) of the electromagnetic spectrum. Since different objects or conditions reflect energy in different ways (for example, vegetation reflects light differently than water), each spectral band captures different attributes.

Multispectral imagery typically carries anywhere from a handful to around 10 or 20 band measurements in each pixel. Satellites like Sentinel-2 (which provides free worldwide imagery for land and water monitoring) have 13 spectral bands, each one associated with a particular range of wavelengths in the electromagnetic spectrum.
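One common way to hold such an image in memory is a three-dimensional array with one layer per band. The sketch below assumes a made-up four-band sensor with random values; the band names and the (bands, rows, columns) layout are illustrative choices, not the format of any particular satellite product.

```python
import numpy as np

# Hypothetical band labels for a small multispectral sensor (illustrative only).
band_names = ["blue", "green", "red", "near_infrared"]

rows, cols = 3, 3
rng = np.random.default_rng(seed=42)

# Shape (bands, rows, cols): one 2-D layer of reflectance values per band.
cube = rng.random((len(band_names), rows, cols))

# One band is an ordinary single-valued image.
red_band = cube[band_names.index("red")]

# One pixel is a vector with one value per band: its spectral signature.
spectrum = cube[:, 1, 2]
signature = dict(zip(band_names, spectrum.round(3).tolist()))

print(cube.shape)
print(signature)
```

A 13-band scene would use exactly the same structure with a first dimension of 13, and a hyperspectral scene with a first dimension in the hundreds.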

But if you think that 13 bands are not enough, then you might consider going hyperspectral. Hyperspectral imagery can provide hundreds of band measurements (yes, hundreds per pixel!) in narrow, contiguous bands across a wide range of the spectrum, which gives higher sensitivity to subtle variations in what is being observed. Images produced by hyperspectral sensors contain substantially more data than those generated by multispectral sensors and have far more potential to detect differences among features. For example, while multispectral imagery can be used to map forested areas, hyperspectral imagery can be used to map tree species within the forest.

The company Satellogic is the leading supplier of high-resolution hyperspectral imagery in the world. Beyond providing businesses in agriculture, mining, construction and environmental mapping with a competitive edge, Satellogic has taken the bold step of providing free hyperspectral data for open scientific research and humanitarian causes.


Different Landsat bands exploded out like layers of onion skin along with corresponding spectral band wavelengths and common band names. Source: NASA


Why is data science a real game changer?

Since images are datasets, you might consider carrying out mathematical operations with them.

By performing algebra with spectral bands, it is possible to combine two or more wavelengths to highlight particular features, such as types of vegetation, burned areas or the presence of water. It is also possible to combine bands from different sources (like different satellites) and reach deeper insights. Imagine combining different light reflectance bands with soil humidity and land elevation, everything stacked in the same dataset. If every band or image in the dataset has the same number of rows and columns and covers the same area, the pixels from one band fall in the same spatial location as the pixels in another. Since each pixel is geocoded, we can add different layers of information at any specific location.
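As an example of band algebra, the sketch below computes a normalized difference between a near-infrared and a red band. This is the formula behind the widely used NDVI vegetation index, which the text above does not name, so treat it as one illustrative choice among many possible band combinations. All reflectance and elevation values are invented.

```python
import numpy as np

def normalized_difference(band_a: np.ndarray, band_b: np.ndarray) -> np.ndarray:
    """Per-pixel (a - b) / (a + b); pixels where a + b == 0 become NaN."""
    a = band_a.astype(np.float64)
    b = band_b.astype(np.float64)
    total = a + b
    result = np.full(total.shape, np.nan)
    np.divide(a - b, total, out=result, where=total != 0)
    return result

# Made-up reflectance values for a 2x2 scene.
red = np.array([[0.10, 0.30], [0.05, 0.00]])
nir = np.array([[0.50, 0.30], [0.45, 0.00]])

# High values suggest healthy vegetation, values near zero bare surfaces.
ndvi = normalized_difference(nir, red)

# Layers from other sources can be stacked when they share the same grid.
elevation_m = np.array([[120.0, 118.0], [125.0, 119.0]])  # hypothetical layer
assert elevation_m.shape == ndvi.shape
stack = np.stack([red, nir, ndvi, elevation_m])   # shape (layers, rows, cols)

print(ndvi)
print(stack.shape)
```

The shape check before stacking is the code equivalent of the requirement that every layer has the same number of rows and columns; real data from different sensors usually has to be resampled onto a common grid first.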

The good news is that this isn’t as hard as it sounds. Google Earth Engine provides petabytes of satellite imagery and geospatial datasets, backed by Google’s own data centers and APIs, all in a user-friendly platform that lets any user test and implement their own models and algorithms. You can perform math and geometric operations, classify pixels or recognize objects, all running in the cloud at high speed. Once again, Google has scaled up and simplified what only a few years ago was extremely complex and expensive.

Moreover, the availability of remote sensing imagery has increased so much over the years that it is possible to add a time dimension. The number of active satellites has grown and their revisit times have shortened, so we can potentially incorporate robust temporal information into any model. By performing time series analysis on images we move away from mono-temporal information (just one glimpse in time) and enter a new realm, where properties change over time and we can observe cycles and trends.
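A minimal sketch of what adding a time dimension can look like: the images of one location are stacked along a time axis, and a linear trend is fitted to every pixel at once. The data is synthetic and `per_pixel_trend` is a hypothetical helper; real analyses would also have to deal with clouds, gaps and seasonality.

```python
import numpy as np

def per_pixel_trend(series: np.ndarray, times: np.ndarray) -> np.ndarray:
    """Least-squares slope of value vs. time for every pixel.

    series has shape (time, rows, cols); times has shape (time,).
    """
    t = times.astype(np.float64)
    t_centered = t - t.mean()
    y_centered = series - series.mean(axis=0)
    # slope = sum(dt * dy) / sum(dt^2), computed for all pixels at once
    numerator = np.tensordot(t_centered, y_centered, axes=(0, 0))
    return numerator / (t_centered ** 2).sum()

# Synthetic example: 6 acquisitions of a 2x2 scene, 10 days apart.
days = np.array([0, 10, 20, 30, 40, 50])
true_slopes = np.array([[0.01, 0.00], [-0.005, 0.002]])
base = np.full((2, 2), 0.3)
series = base + days[:, None, None] * true_slopes

slopes = per_pixel_trend(series, days)
print(slopes)
```

Each entry of `slopes` is the change per day of the measured value at that pixel, so a map of slopes shows at a glance where a property is rising, falling or stable.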


Drawing conclusions in this big data context is extremely challenging, since none of this data would make any sense if we didn’t have the ability to process it. How do you manage and interpret billions of pixels and tremendous datasets?

This is where data science is making a real revolution and fueling a deep disruption in sectors like agriculture. Data science is a fairly recent discipline that focuses on extracting meaningful insights from data for use in strategic decision making, product development, trend analysis and forecasting. Its concepts and methodologies are drawn from disciplines like statistics, data engineering, natural language processing, data warehousing and machine learning, which are well suited to dealing with the current explosion of data. You could say that data scientists are at their best making discoveries while swimming in data.

But of all the subdisciplines in data science, I believe machine learning deserves a special mention. Machine learning describes algorithms that learn from data rather than being explicitly programmed for a task, and they are among the best tools we have for the problem of data abundance. This is the area of data science most likely to provide direct and tangible benefits in the foreseeable future.
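To illustrate what "learning from data" means, here is a deliberately simple sketch: a nearest-centroid classifier that learns a typical spectral signature per land cover class from a few labeled pixels, then labels new pixels. The training values, class names and the functions `fit_centroids` and `predict` are all invented for illustration; production systems use far richer models and far more data.

```python
import numpy as np

def fit_centroids(samples: np.ndarray, labels: np.ndarray) -> dict:
    """Learn one mean spectral signature per class from labeled pixels."""
    return {str(c): samples[labels == c].mean(axis=0) for c in np.unique(labels)}

def predict(centroids: dict, pixels: np.ndarray) -> list:
    """Assign each pixel to the class whose centroid is closest."""
    names = list(centroids)
    centers = np.stack([centroids[n] for n in names])          # (classes, bands)
    distances = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
    return [names[i] for i in distances.argmin(axis=1)]

# Made-up training pixels: columns are (red, near-infrared) reflectance.
train_pixels = np.array([
    [0.05, 0.50], [0.06, 0.45], [0.04, 0.55],   # vegetation
    [0.03, 0.02], [0.04, 0.03], [0.02, 0.01],   # water
])
train_labels = np.array(["vegetation"] * 3 + ["water"] * 3)

# Nothing about "vegetation" or "water" is hard-coded: the rule comes from the data.
model = fit_centroids(train_pixels, train_labels)
new_pixels = np.array([[0.05, 0.48], [0.03, 0.02]])
predicted = predict(model, new_pixels)
print(predicted)
```

Changing the training pixels changes the behavior without touching the code, which is the essential difference from a hand-written rule.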

Companies like Planet exploit machine learning to transform imagery into actionable analytic feeds, tracking changes in infrastructure and land use, at a much better spatial and temporal resolution than free satellites.

From the huge universe of machine learning algorithms, Convolutional Neural Networks (CNNs) are the state of the art in image processing and are being used in fields like agriculture to reshape the way of doing things. CNNs represent a promising technique with current high performance results in terms of precision and accuracy, outperforming other existing image-processing techniques.
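The core operation inside a CNN is the convolution: a small grid of weights slides over the image and produces a map of how strongly each neighborhood matches the pattern. The sketch below shows a single convolution with a hand-written edge filter, assuming a toy 5x5 image; in a real CNN many such filters are stacked in layers and their weights are learned from data rather than written by hand.

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide a kernel over an image (no padding, stride 1).

    Like most deep learning libraries, this computes cross-correlation:
    the kernel is not flipped.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A 5x5 image with a dark left part and a bright right part.
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=np.float64)

# Hand-written vertical-edge filter; a CNN would learn such weights from data.
edge_kernel = np.array([[-1.0, 0.0, 1.0]] * 3)

feature_map = conv2d_valid(image, edge_kernel)
activated = np.maximum(feature_map, 0.0)   # ReLU keeps positive responses

print(feature_map)
```

The response is strong where the window straddles the dark-to-bright boundary and zero over the uniform bright area, which is how a filter turns raw pixels into a feature such as "there is an edge here".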

C2-A2, the world’s first agrodroid, uses CNNs to protect equipment and people from all possible collisions. C2-A2 is an artificial brain that aims to become the universal control system for autonomous agricultural machinery, like harvesters, tractors or sprayers.

Human capabilities are being strengthened at an exceptional rate thanks to these disciplines. The arrival of technology in fields like remote sensing imagery is completely changing the way we interact with the world, and who knows where this will end. In this scenario, it is not so crazy to think about systems that process extremely complex inputs and make decisions on their own, freeing up human time and resources for other endeavors.
