INTRODUCTION
This document describes the creation of a global inventory of reference samples and Earth Observation (EO) / gridded datasets for the Global Pasture Watch (GPW) initiative. The inventory supports the training and validation of machine-learning models for GPW grassland mapping. The sections below cover the methodology, data sources, workflow, and results.
Keywords: Grassland, Land Use, Land Cover, Gridded Datasets, Harmonization
OBJECTIVES
Create a global inventory of existing reference samples for land use and land cover (LULC);
Compile global EO / gridded datasets that capture LULC classes and harmonize them to match the GPW classes;
Develop automated scripts for data harmonization and integration.
DATA COLLECTION
Datasets incorporated:
| Dataset | Spatial distribution | Time period | Number of individual samples |
|---|---|---|---|
| WorldCereal | Global | 2016-2021 | 38,267,911 |
| Global Land Cover Mapping and Estimation (GLanCE) | Global | 1985-2021 | 31,061,694 |
| EuroCrops | Europe | 2015-2022 | 14,742,648 |
| GeoWiki G-GLOPS training dataset | Global | 2021 | 11,394,623 |
| MapBiomas Brazil | Brazil | 1985-2018 | 3,234,370 |
| Land Use/Land Cover Area Frame Survey (LUCAS) | Europe | 2006-2018 | 1,351,293 |
| Dynamic World | Global | 2019-2020 | 1,249,983 |
| Land Change Monitoring, Assessment, and Projection (LCMAP) | U.S. (CONUS) | 1984-2018 | 874,836 |
| GeoWiki 2012 | Global | 2011-2012 | 151,942 |
| PREDICTS | Global | 1984-2013 | 16,627 |
| CropHarvest | Global | 2018-2021 | 9,714 |
Total: 102,355,642 samples
WORKFLOW
Harmonization Process
We harmonized global reference samples and EO/gridded datasets to align with the GPW classes so that they can be integrated directly into the GPW machine-learning workflow.
We considered reference samples derived by visual interpretation, with a spatial support of at least 30 m (Landsat and Sentinel), that could represent LULC classes for a point or region.
Each dataset was processed using automated Python scripts to download vector files and convert the original LULC classes into the following GPW classes:
0. Other land cover
1. Natural and Semi-natural grassland
2. Cultivated grassland
3. Crops and other related agricultural practices
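The class conversion described above can be sketched as a lookup from each dataset's original labels to the four GPW classes. This is a minimal illustration, not the GPW scripts themselves: the source labels in `EXAMPLE_LOOKUP` and the names `GPWClass` and `harmonize_label` are hypothetical, and each real dataset would need its own lookup built from its legend.

```python
from enum import IntEnum


class GPWClass(IntEnum):
    """The four GPW target classes listed above."""
    OTHER_LAND_COVER = 0
    NATURAL_SEMINATURAL_GRASSLAND = 1
    CULTIVATED_GRASSLAND = 2
    CROPS_AND_RELATED = 3


# Hypothetical per-dataset lookup: these source labels are illustrative
# assumptions and do not come from any of the datasets in the inventory.
EXAMPLE_LOOKUP = {
    "forest": GPWClass.OTHER_LAND_COVER,
    "natural grassland": GPWClass.NATURAL_SEMINATURAL_GRASSLAND,
    "sown pasture": GPWClass.CULTIVATED_GRASSLAND,
    "cropland": GPWClass.CROPS_AND_RELATED,
}


def harmonize_label(original_label, lookup):
    """Map one original LULC label to a GPW class.

    Labels are normalized (trimmed, lower-cased) before the lookup.
    Unmapped labels raise KeyError rather than silently falling back to
    class 0, so gaps in the lookup are noticed during processing.
    """
    key = original_label.strip().lower()
    if key not in lookup:
        raise KeyError(f"No GPW class defined for label {original_label!r}")
    return lookup[key]


labels = ["Cropland", " sown pasture ", "Forest"]
harmonized = [int(harmonize_label(label, EXAMPLE_LOOKUP)) for label in labels]
```

Raising on unmapped labels is a design choice for this sketch; a production script might instead log them for manual review.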
We empirically assigned a weight to each sample based on the original dataset's class description, reflecting the level of mixture within the class. The weights range from 1 (Low) to 3 (High), with higher weights indicating greater mixture. Samples with low mixture levels are more accurate and effective for differ...
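As a rough illustration of the weighting scheme, the sketch below attaches a mixture weight to each harmonized sample and filters on it. The text only names the endpoints 1 (Low) and 3 (High); treating 2 as a "medium" level, and the names `MIXTURE_WEIGHTS`, `HarmonizedSample`, and `select_by_max_weight`, are assumptions made for this example.

```python
from dataclasses import dataclass

# Weight per mixture level. 1 (low) and 3 (high) follow the text;
# the "medium" label for 2 is an assumption of this sketch.
MIXTURE_WEIGHTS = {"low": 1, "medium": 2, "high": 3}


@dataclass(frozen=True)
class HarmonizedSample:
    """Hypothetical record for one harmonized reference sample."""
    gpw_class: int  # 0-3, one of the GPW classes
    weight: int     # 1-3, higher means more class mixture


def make_sample(gpw_class, mixture_level):
    """Build a sample, translating the mixture level into its weight."""
    return HarmonizedSample(gpw_class, MIXTURE_WEIGHTS[mixture_level])


def select_by_max_weight(samples, max_weight=1):
    """Keep samples whose mixture weight does not exceed max_weight."""
    return [s for s in samples if s.weight <= max_weight]


samples = [
    make_sample(1, "low"),
    make_sample(2, "high"),
    make_sample(3, "medium"),
]
low_mixture_only = select_by_max_weight(samples, max_weight=1)
```

With the default `max_weight=1`, only the low-mixture samples are retained; raising the threshold trades class purity for a larger training set.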