As part of a recent Radiant.Earth workshop, 30 leading international experts participated in the launch of a new Technical Working Group on Machine Learning for Global Development.
The group includes Earth observation (EO), machine learning (ML), and land cover (LC) classification experts, all working collaboratively towards three goals: a community standard on best practices for the use of ML with EO, a commons for labeled training data catalogues, and a hierarchical schema for global LC classification.
Radiant.Earth is developing open source datasets of labeled satellite images, which will be hosted on MLHub.Earth with a Creative Commons license. These datasets will lead to a living open image library for ML and EO. Our goal is to create a sustained, community-wide effort to capture image labels that will enable major innovations and drive new, more targeted and timely insights supporting progress in areas such as agriculture, food security, conservation, health, land rights, urban planning, water resources, and other areas relevant to global development and humanitarian response.
The first such dataset that Radiant.Earth will generate will consist of global LC labeled imagery from Sentinel-2 satellites at 10 m spatial resolution. This will enable fully automated and dynamic LC classification algorithms that use open source satellite imagery. Radiant.Earth will label these images using a combination of ML and crowdsourcing to generate a human-verified training dataset.
Existing training datasets for LC classification have limitations that do not support the development of a global EO-based LC classification algorithm at fine spatial resolutions with high accuracy. These datasets are either generated for specific regions of the world (and therefore lack geo-diversity) or are based on imagery that is not freely available at the global scale (and therefore are not open source). Moreover, in many cases, very few labeled images are available for a specific class within the dataset, which limits the ability of an ML algorithm to learn the particular features of that class.
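The class-scarcity problem described above can be checked with a simple count of labels per class. The following is a minimal sketch in Python; the class names, the counts, and the `min_count` threshold are made up for illustration and do not come from any Radiant.Earth dataset.

```python
from collections import Counter


def rare_classes(labels, min_count):
    """Return the class names that appear fewer than min_count times."""
    counts = Counter(labels)
    return sorted(name for name, n in counts.items() if n < min_count)


# Hypothetical label list: one entry per labeled image.
labels = (
    ["cropland"] * 120
    + ["forest"] * 95
    + ["wetland"] * 4
    + ["bare soil"] * 7
)

# Classes with too few examples for a model to learn their features well.
print(rare_classes(labels, min_count=10))  # ['bare soil', 'wetland']
```

A check like this only flags under-represented classes; what counts as "too few" depends on the algorithm and the imagery, so the threshold here is an assumption.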
Key topics of the Technical Working Group
Radiant.Earth formed the technical working group on Machine Learning for Global Development to best define the specification of such a global dataset to meet the requirements for end-user applications and to standardize best practices to increase the interoperability of different datasets and algorithms. The group members are experts from commercial, government, non-profit and academic organizations with subject matter knowledge related to this topic. Existing and future activities of the group are documented on this GitHub repository.
The first meeting of the working group, on the topic of “Machine Learning for Global Land Cover Classification,” took place on June 14–15, 2018 in Washington, D.C. Thirty experts representing 23 institutions gathered and presented their latest advancements in the use of ML for LC classification. Presenters also shared their thoughts on the challenges and remaining barriers to improving the accuracy of global LC maps. To facilitate further discussion and examination of key topics, each expert participated in one of three groups, which are summarized below:
Group 1 focused on developing a hierarchical LC schema that includes all major LC classes at global scale and enables inter-comparison and cross-validation of different LC products that use satellite imagery at different spatial resolutions. Highlighting the importance of distinguishing between LC and land use, the group developed a hierarchical LC schema combined with a set of attributes. The schema is translatable, so refined details can be added to each class later on. It is designed for a global LC product and assumes that the LC definitions will be updated annually. Details of the schema are provided here.
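To illustrate how a hierarchical schema with attributes can support inter-comparison of products at different levels of detail, here is a minimal sketch in Python. The class names, the `leaf_type` attribute, and the `LandCoverClass` type are hypothetical and are not the schema the group agreed on.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LandCoverClass:
    """One node in a hypothetical hierarchical land cover schema."""

    name: str
    parent: Optional["LandCoverClass"] = None
    attributes: Dict[str, str] = field(default_factory=dict)

    def path(self) -> List[str]:
        """Return class names from the top level down to this class."""
        names = []
        node = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))

    def at_level(self, level: int) -> str:
        """Roll this class up to the given level (0 is the top level)."""
        names = self.path()
        return names[min(level, len(names) - 1)]


# Illustrative classes only.
vegetation = LandCoverClass("vegetation")
forest = LandCoverClass("forest", parent=vegetation)
evergreen = LandCoverClass(
    "evergreen forest", parent=forest, attributes={"leaf_type": "needleleaf"}
)
grassland = LandCoverClass("grassland", parent=vegetation)

# A fine-resolution product labels a pixel "evergreen forest"; a coarser
# product labels it "forest". Rolling both up to level 1 makes them comparable.
print(evergreen.at_level(1) == forest.at_level(1))  # True
```

Keeping finer detail in child classes and attributes means new detail can be added later without changing the upper levels that coarser products rely on.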
Group 2 reviewed the challenges and ad-hoc choices involved in using ML with EO data. After two days of discussions, they generated a set of best practices for this application. Their recommendations focus on four topics: (1) accuracy of training data labels, (2) achieving higher accuracies within and between LC classes, (3) maintaining labeled training datasets, and (4) best practices for a global LC algorithm using Sentinel-2 imagery. Their detailed recommendations, which cover all aspects of these four topics, are included in the notes from the meeting (available here).
Group 3 examined current standards in storing and distributing labeled satellite imagery and the caveats related to each of them. They also developed a training data architecture using the Spatio-Temporal Asset Catalogue (STAC) specifications. This training data specification enables combining raw imagery and label information in one standard catalogue that is adaptable to a wide range of labeled imagery. It will accelerate adoption and use of these data in ML algorithms. The label asset in the catalogue allows for the labels to be “tile classification,” “object detection,” or “segmentation of pixels.” The draft version of this spec is published in this GitHub repository along with a sample GeoJSON file from the SpaceNet challenge.
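As a rough illustration of combining raw imagery and label information in one catalogue entry, the sketch below builds a GeoJSON Feature in Python. The field names (`label_type`, `imagery`, `labels`), the `make_label_item` helper, and the example URLs are assumptions made for this example; they are not taken from the draft spec, which should be consulted for the actual field definitions.

```python
import json

# The three label kinds named in the text above.
LABEL_TYPES = {
    "tile classification",
    "object detection",
    "segmentation of pixels",
}


def make_label_item(item_id, bbox, label_type, image_href, label_href):
    """Build a GeoJSON Feature linking an image to its labels.

    Field names are hypothetical, not the draft spec's.
    """
    if label_type not in LABEL_TYPES:
        raise ValueError(f"unknown label type: {label_type!r}")
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "id": item_id,
        "bbox": [west, south, east, north],
        "geometry": {
            "type": "Polygon",
            # Closed ring: the first and last positions are identical.
            "coordinates": [[
                [west, south],
                [east, south],
                [east, north],
                [west, north],
                [west, south],
            ]],
        },
        "properties": {"label_type": label_type},
        "assets": {
            "imagery": {"href": image_href},
            "labels": {"href": label_href},
        },
    }


item = make_label_item(
    item_id="example-tile-001",
    bbox=(36.80, -1.30, 36.90, -1.20),
    label_type="object detection",
    image_href="https://example.com/imagery/example-tile-001.tif",
    label_href="https://example.com/labels/example-tile-001.geojson",
)
print(json.dumps(item, indent=2))
```

Storing the imagery and label links side by side in one item is what lets a training pipeline fetch both from a single catalogue query.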
Note: This was originally published by Radiant.Earth on its official Medium blog