A data refinery, built to understand our planet

Today, we’re excited to announce Descartes Labs’ $30M Series B round of financing, led by March Capital, an LA-based venture capital fund. Our seed round was led by Crosslink Capital and our Series A was led by Cultivian Sandbox. Both Crosslink and Cultivian participated in the Series B along with a few other investors, including one of our customers, Cargill.

Less than three years ago, Descartes Labs was born as a spin-out from Los Alamos National Laboratory with the goal of using satellite imagery to model complex, global systems. In the course of tackling this audacious goal, we built what we believe to be the critical and missing component that unlocks the value of satellite imagery: a data refinery.

Let me explain how we got to this point, what a data refinery is, and why it’s particularly important for the satellite industry.

Supercomputers and satellites: not just for the government anymore

For many years, only governments had the power to compute at petabyte-scale, making satellites and supercomputers exclusively their realm. That’s changing rapidly.

In the past 10 years, companies like Google and Amazon have built their infrastructure on generic commodity hardware, rather than buying specialized supercomputers to chunk through data. Those companies have opened up their infrastructure in the cloud, providing access to virtually unlimited compute and storage without any upfront cost.

Of course, it still takes specialized skills to be able to wrangle that much processing to tackle huge scientific problems. Our founding team of scientists came from Los Alamos National Laboratory with the unique experience of programming the most powerful supercomputers on the planet. One of the earliest things we did at Descartes Labs was prove that we could create a supercomputer in the cloud. We used 30,000 cores to process a full petabyte of satellite imagery in just 16 hours. Since then, we’ve made a huge investment in the infrastructure to host data and run calculations at enormous scale.
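To put those figures in perspective, here is a quick back-of-the-envelope calculation of the throughput they imply. It assumes a decimal petabyte (10^15 bytes) and that all 30,000 cores were busy for the full 16 hours. Neither detail is stated above, so treat the results as rough estimates.

```python
# Back-of-the-envelope throughput for the run described above.
# Assumptions (not from the original post): 1 PB = 10**15 bytes, and
# every core is busy for the whole run.

PETABYTE_BYTES = 10**15
CORES = 30_000
HOURS = 16

seconds = HOURS * 3600
aggregate_bytes_per_sec = PETABYTE_BYTES / seconds
per_core_bytes_per_sec = aggregate_bytes_per_sec / CORES
core_hours = CORES * HOURS

print(f"Aggregate throughput: {aggregate_bytes_per_sec / 1e9:.1f} GB/s")
print(f"Per-core throughput:  {per_core_bytes_per_sec / 1e6:.2f} MB/s")
print(f"Total compute:        {core_hours:,} core-hours")
```

Under these assumptions, the run works out to roughly 17 GB/s in aggregate, or a bit under 0.6 MB/s per core, across 480,000 core-hours.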

For satellites, similar trends are affecting the market. For decades, satellites were the stuff of spies, closely held secrets of the world’s largest and most technically advanced governments. In the 90s, the U.S. government allowed the private sector to build and launch satellites for commercial use. This spawned companies like DigitalGlobe. Though there was some commercial interest in these datasets, the largest customer was still the government.

Just in the past decade, there has been a perfect storm of science, engineering, innovation, and investment that has caused a renaissance in Earth observation. Launch costs have been reduced. Satellites have gotten smaller, going from the size of a VW Bus to the size of a loaf of bread. Over a billion dollars of investment is going into next-generation satellite companies to improve the technology, making Earth observation less expensive and bringing the benefits within reach of a much wider group of interests. For context, 1,500 satellites were launched in the past 10 years; six times that number (9,000) will be launched in the next ten.

These satellites will be producing tremendous amounts of data. The NASA Landsat constellation took four decades and seven, billion-dollar satellites to amass a petabyte of data. By comparison, DigitalGlobe has over 100PB in its archive. A modern constellation with over 100 cubesats will generate petabytes of data each year. Plus, many new types of data will come online, from radar to high-definition video. With the supply of data going up, the price of data will decline, making it accessible to a broader range of consumers.

This explosion of data was the context and founding thesis for Descartes Labs.

Building a data refinery and why it’s important to the satellite industry

Like many companies in our space, we took the obvious path of trying to turn satellite data into actionable information. Our first product was a forecast of corn production for the entire United States. In creating that model, we built a considerable amount of infrastructure.

It was necessary to gather and clean up data from multiple satellite constellations. NASA’s MODIS was great because of its daily revisit. But higher-resolution sensors like Landsat-8, which only gathers data every 16 days, were excellent for better understanding where fields were. And it wasn’t just about satellite data. Weather is an important component of crop health, so we gathered a bunch of weather data, too. On top of this, we built a number of tools that made it easier for our scientists to test and measure the accuracy of models and run large computations. In making our 2016 model, for example, each candidate model required processing 4 quadrillion pixels, and we ran over 1,000 candidate models.

Though our corn forecast was a success with customers, it was usually just the beginning of the conversation. Corn is a known quantity in the United States, and customers wanted to know if we could use the same system we used to build our corn model to look at other crops and other geographies. Also, most of these customers wanted to unlock the power of the data that they’d been collecting over the years and that was simply gathering dust on a server somewhere.

This made us realize it wasn’t our corn model that was special, but rather our infrastructure. Much like Google has created a data refinery for web data and GE has created a data refinery for industrial data with Predix, Descartes Labs is focusing on a data refinery about the physical world, starting with satellite imagery.

A data refinery is a system that pulls in raw data, cleans it up, fuses data from disparate sources, and adds tools on top of it for easier analysis.

One of the hallmarks of a data refinery is data cleanup. The team at Descartes Labs spends an inordinate amount of time on the remote sensing science that enables complex models to be accurate. We’ve been working on coregistration of pixels (making sure every pixel is in the proper place), global surface reflectance (correcting for the effects of the atmosphere), and advanced cloud detection (ensuring models only incorporate the best pixels).
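To make the idea of "only the best pixels" concrete, here is a minimal sketch of one common approach: a per-pixel median over the cloud-free observations of the same location. It is an illustration, not a description of our production pipeline. The scenes are assumed to be already coregistered, the cloud flags are assumed to be given (detecting clouds is the hard part), and the function name and data layout are hypothetical.

```python
from statistics import median
from typing import List, Optional, Tuple

# Hypothetical layout: a scene is a grid of (reflectance, is_cloudy) pairs.
Pixel = Tuple[float, bool]
Scene = List[List[Pixel]]


def cloud_free_composite(scenes: List[Scene]) -> List[List[Optional[float]]]:
    """Per-pixel median of the cloud-free values across coregistered scenes.

    Returns None for pixels that were cloudy in every scene.
    """
    rows, cols = len(scenes[0]), len(scenes[0][0])
    composite = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # Keep only observations that were not flagged as cloud.
            clear = [s[r][c][0] for s in scenes if not s[r][c][1]]
            out_row.append(median(clear) if clear else None)
        composite.append(out_row)
    return composite


# Three made-up 1x3 scenes of the same location on different days.
scenes = [
    [[(0.10, False), (0.90, True), (0.88, True)]],
    [[(0.12, False), (0.30, False), (0.91, True)]],
    [[(0.95, True), (0.32, False), (0.93, True)]],
]
composite = cloud_free_composite(scenes)
print(composite)
```

The third pixel is cloudy in every scene, so it stays empty rather than being filled with a contaminated value. That kind of gap is what fusing in other data sources can help close.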

But even more important than data cleanup is data fusion. Every dataset has unique qualities: finer spatial resolution, pictures with greater frequency, seeing through clouds, and even listening to radio frequency signals or measuring heat. In working on customer problems, one of the things we’ve uncovered is that it’s rarely a single satellite or dataset alone that will solve the problem. Only by turning different datasets into a fused sensor, a super-sensor, are we able to solve the problem.
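As a toy illustration of fusion, consider a sparse but detailed sensor and a frequent but coarse one observing the same location. The sketch below estimates the average offset between the two on days when both observed, then uses the offset-corrected coarse values to fill the days the detailed sensor missed. Real fusion is far more involved. The function name, the simple bias model, and the numbers are assumptions made for the example.

```python
from typing import Dict


def fuse_series(fine: Dict[int, float], coarse: Dict[int, float]) -> Dict[int, float]:
    """Fuse a sparse, fine-resolution series with a dense, coarse one.

    Keys are day numbers; values are a measurement for one location.
    The mean difference on shared days is treated as a constant bias and
    applied to the coarse values to fill days the fine sensor missed.
    """
    overlap = sorted(set(fine) & set(coarse))
    if not overlap:
        raise ValueError("need at least one day observed by both sensors")
    bias = sum(fine[d] - coarse[d] for d in overlap) / len(overlap)
    fused = {day: value + bias for day, value in coarse.items()}
    fused.update(fine)  # direct fine-resolution observations take priority
    return dict(sorted(fused.items()))


# Made-up values: the fine sensor revisits every 16 days, the coarse one
# is sampled every 4 days here to keep the example short.
fine = {0: 0.50, 16: 0.70}
coarse = {0: 0.46, 4: 0.52, 8: 0.58, 12: 0.62, 16: 0.66}
fused = fuse_series(fine, coarse)
for day, value in fused.items():
    print(day, round(value, 2))
```

The fused series keeps the detailed sensor's values where they exist and borrows the temporal shape of the frequent sensor in between, which is the basic intuition behind a "super-sensor."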

Finally, we’re building a series of data tools to make it easier for data scientists to access and manipulate huge numbers of pixels. Already we’ve made it easy to search through our archive and return pixels in under 100 milliseconds. We’ll continue to create software that makes satellite data accessible to all data scientists, not just those with PhDs in remote sensing.

By building a data refinery, Descartes Labs is different from a pure analytics company and different from satellite providers. We believe that there will continue to be huge investment in satellites and innovation on hardware — all of which will produce more datasets with hidden insights to be discovered. We expect to see the emergence of numerous analysis companies, using satellite data in unique ways for consumers, corporations, and governments. In the future, our data refinery will collect all of the imagery generated by satellites, normalize it across all of the different sensors, and present science-ready data, freeing developers to focus on advancing science using satellite imagery instead of wasting time collecting and cleaning data.

But our vision doesn’t end with building a data refinery for satellite imagery. There are other large datasets that are necessary if you want to build complex models of agriculture, shipping & logistics, forestry, energy infrastructure, and human activity. Weather, drones, IoT, even geo-located social media are all useful types of data to use in creating accurate models of activity here on planet Earth.

The first version of this data refinery, the Descartes Labs Platform, is available now. Developers and university researchers have been exploring Earth under our platform beta, and we’ve been directly assisting commercial customers to better understand their businesses and supply chains. Whether you’re a researcher interested in using satellite data to further your science, or a company interested in unlocking understanding of your supply chain through satellite data, talk with us — we’re looking for interesting problems to solve.

Celebrating a venture raise isn’t what really matters; it’s what you do with the money that’s important. We’ll keep you posted as we evolve our data refinery, solve more wickedly hard science problems, and unlock the value of satellite data to better understand this beautiful planet we live on.