Accuracy assessment of land cover maps

Accuracy assessment of land cover maps is the process of evaluating the reliability and quality of land cover maps. These maps are typically derived from remote sensing or other geospatial data sources using classification techniques, and they are used in environmental monitoring, urban planning, and climate change studies. Because decisions in these fields depend on the maps, their accuracy needs to be established before they are used.
The accuracy is typically assessed by comparison with reference data. These data are usually ground-based data or high-resolution imagery that is considered to represent the "true" land cover. Comparison of land cover maps with reference data can help identify misclassifications, and is often quantified using metrics such as overall accuracy, user's and producer's accuracy, and the Kappa coefficient.
In addition to validating individual maps against reference data, accuracy assessments can compare different land cover products to evaluate their relative accuracy and suitability for various applications.

Sampling strategies

Sampling refers to the procedure of selecting reference data. There are several common sampling strategies:

Simple random sampling: Each unit in the population has an equal probability of selection. This method is fast and widely applicable, but it may result in insufficient representation of rare land cover classes.
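Stratified random sampling: The population is divided into non-overlapping strata (for example, the mapped land cover classes), and a random sample is drawn independently from each stratum. This ensures that every stratum, including rare classes, is represented in the sample.
Systematic sampling: Samples are selected at regular spatial intervals to ensure spatial balance. However, this method may introduce bias if the sampling interval coincides with a periodic pattern in the landscape.
Cluster sampling: The population is divided into groups or clusters, a random subset of clusters is selected, and all or some of the units within each selected cluster are sampled. This approach is cost-effective because the sample units are spatially grouped, but units within the same cluster tend to be similar, which can reduce precision.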
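
As an illustration of the first two strategies, the following is a minimal sketch that assumes the classified map is available as a 2-D NumPy array of class codes. The function names, the toy map, and the choice of using the mapped classes as strata are assumptions made for this example only.

```python
import numpy as np

def simple_random_sample(class_map, n, seed=0):
    """Pick n distinct pixel locations, each pixel being equally likely."""
    rng = np.random.default_rng(seed)
    flat = rng.choice(class_map.size, size=n, replace=False)
    rows, cols = np.unravel_index(flat, class_map.shape)
    return list(zip(rows.tolist(), cols.tolist()))

def stratified_random_sample(class_map, n_per_class, seed=0):
    """Pick up to n_per_class random pixel locations within each mapped class."""
    rng = np.random.default_rng(seed)
    samples = {}
    for cls in np.unique(class_map):
        rows, cols = np.nonzero(class_map == cls)
        k = min(n_per_class, rows.size)  # a rare class may have fewer pixels
        idx = rng.choice(rows.size, size=k, replace=False)
        samples[int(cls)] = list(zip(rows[idx].tolist(), cols[idx].tolist()))
    return samples

# Toy map: class 1 dominates, classes 2 and 3 are rare.
class_map = np.array([[1, 1, 1, 1],
                      [1, 1, 1, 2],
                      [1, 1, 3, 3]])

srs = simple_random_sample(class_map, n=5)
strat = stratified_random_sample(class_map, n_per_class=2)
```

With simple random sampling, the five locations may all fall in class 1, whereas the stratified draw returns at least one location for every class present on the map.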

Sample size selection

Selecting an appropriate sample size is an essential step in the validation design of land cover mapping. Two common ways to decide sample size are:

Cochran’s equation: Estimates the total required sample size from the desired confidence level, the acceptable margin of error, and an anticipated proportion (such as the expected accuracy).
A stratified formula: Computes the sample size for a stratified design, taking into account the permissible error level and the proportion of the area covered by each land cover class.
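
The following sketch shows how these two ideas might be computed. It assumes the common form of Cochran's equation for a proportion, n = z² p (1 − p) / e², and a simple proportional allocation with a minimum per class. The allocation rule, the function names, and the example proportions are illustrative assumptions, not a prescribed procedure.

```python
import math

def cochran_sample_size(z=1.96, p=0.5, e=0.05):
    """Cochran's equation for a proportion: n = z^2 * p * (1 - p) / e^2.

    z: standard normal quantile for the confidence level (1.96 for 95 %)
    p: anticipated proportion (0.5 is the most conservative choice)
    e: acceptable margin of error
    """
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)

def proportional_allocation(n_total, area_proportions, minimum=0):
    """Split n_total among strata by area share, with a floor per stratum.

    The floor is an assumption of this sketch; it protects rare classes.
    """
    return {cls: max(minimum, round(n_total * w))
            for cls, w in area_proportions.items()}

n = cochran_sample_size()  # 95 % confidence, 5 % margin of error
allocation = proportional_allocation(
    n, {"forest": 0.6, "cropland": 0.3, "water": 0.1}, minimum=50)
```

With the default arguments the total is 385 samples. Because of the floor, the allocated samples can add up to more than the total computed by Cochran's equation.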

Sample interpretation

Sample interpretation refers to the assignment of a land cover class to each sample unit. There are several common sample interpretation approaches:

Manual interpretation: Sample labeling is done by experts using aerial or satellite imagery. This approach can provide high-quality labels, but it is time-consuming and does not scale well.
Automated labeling: Algorithms or existing maps are used to assign classes. This approach is faster and scales better to large data sets, but the resulting labels may require manual inspection.
Crowdsourcing: Public volunteers label samples via platforms such as Geo-Wiki. This approach allows large-scale labeling, but label quality may vary.

Accuracy metrics

There are many quantitative metrics used to assess the accuracy of land cover maps. These metrics are usually derived from a confusion matrix, which summarizes the agreement between the classified map labels and the reference labels for a sample set.

Overall accuracy (OA)

Overall accuracy is a map-wide indicator, calculated as the proportion of correctly classified samples out of the total number of samples.
Because overall accuracy can hide poor performance on individual classes, it is valuable to report class-wise accuracy as well.
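
As a minimal sketch, a confusion matrix and the overall accuracy can be computed as follows. The example assumes that rows hold the map labels and columns hold the reference labels (the opposite convention is also in use). The class names and sample labels are made up for illustration.

```python
import numpy as np

def confusion_matrix(map_labels, ref_labels, classes):
    """Count samples per (map class, reference class) pair.

    Rows are map labels and columns are reference labels.
    """
    index = {c: i for i, c in enumerate(classes)}
    cm = np.zeros((len(classes), len(classes)), dtype=int)
    for m, r in zip(map_labels, ref_labels):
        cm[index[m], index[r]] += 1
    return cm

def overall_accuracy(cm):
    """Share of samples on the diagonal, i.e. where map and reference agree."""
    return np.trace(cm) / cm.sum()

classes = ["forest", "crop", "water"]
map_labels = ["forest", "forest", "water", "crop", "crop", "forest"]
ref_labels = ["forest", "crop", "water", "crop", "crop", "forest"]

cm = confusion_matrix(map_labels, ref_labels, classes)
oa = overall_accuracy(cm)  # 5 of the 6 samples agree
```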

User's accuracy (UA), Producer's accuracy (PA) and F1-score

User's accuracy and producer's accuracy are class-wise indicators.
User's accuracy represents the probability that a pixel classified as a specific land cover class on the map actually corresponds to that class on the ground. Its complement (1 - UA) is the commission error.
Producer's accuracy indicates the probability that a reference pixel of a specific land cover class is correctly classified on the map. Its complement (1 - PA) is the omission error.
UA and PA can also be averaged separately to provide an overall perspective of classification performance from the user's and producer's perspectives.
The F1-score combines UA and PA into one metric to measure the trade-off between them. It is the harmonic mean of UA and PA, where the relative contributions of the two metrics are equal.
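
The class-wise metrics can be sketched from a confusion matrix as shown below, assuming rows are map labels and columns are reference labels. Under that convention, UA divides the diagonal by the row totals and PA divides it by the column totals. The matrix values are invented for illustration.

```python
import numpy as np

def class_wise_metrics(cm):
    """Return (UA, PA, F1) arrays, one value per class.

    Assumes rows = map labels and columns = reference labels, and that
    every class has at least one mapped and one reference sample.
    """
    cm = np.asarray(cm, dtype=float)
    diag = np.diag(cm)
    ua = diag / cm.sum(axis=1)   # correct / all samples mapped as the class
    pa = diag / cm.sum(axis=0)   # correct / all reference samples of the class
    f1 = 2 * ua * pa / (ua + pa)  # harmonic mean of UA and PA
    return ua, pa, f1

# Illustrative 3-class matrix with 100 samples.
cm = [[45, 4, 1],
      [6, 30, 4],
      [2, 3, 5]]

ua, pa, f1 = class_wise_metrics(cm)
commission_error = 1 - ua
omission_error = 1 - pa
```

In this example the third class has a UA and PA of 0.5 each, even though 80 of the 100 samples are correct overall, which is the kind of weakness that class-wise reporting exposes.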

Kappa coefficient

The Kappa coefficient accounts for both omission and commission errors, as well as the possibility of chance agreement between the land cover maps and the reference data. Kappa values range from -1 to 1, and common rules of thumb for its interpretation are as follows:

Kappa value	Strength of agreement
< 0	Poor agreement
0–0.20	Slight agreement
0.21–0.40	Fair agreement
0.41–0.60	Moderate agreement
0.61–0.80	Substantial agreement
0.81–1.0	Almost perfect agreement
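
For reference, a minimal sketch of the usual computation is given below: Kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement (the overall accuracy) and p_e is the agreement expected by chance from the row and column totals of the confusion matrix. The matrix values are invented for illustration.

```python
import numpy as np

def kappa_coefficient(cm):
    """Cohen's Kappa from a square confusion matrix."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    p_o = np.trace(cm) / n                                 # observed agreement
    p_e = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / n ** 2  # chance agreement
    return (p_o - p_e) / (1 - p_e)

# Illustrative 3-class matrix with 100 samples (80 on the diagonal).
cm = [[45, 4, 1],
      [6, 30, 4],
      [2, 3, 5]]

kappa = kappa_coefficient(cm)
```

Here p_o is 0.8 and p_e is 0.423, so Kappa is about 0.65, which falls in the 0.61–0.80 band of the table above.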

Confidence intervals

Since accuracy metrics are often sample-based, they are subject to uncertainty. The uncertainty of an estimate can be expressed by calculating its standard error or reporting a confidence interval. A confidence interval provides a range of values for a parameter, accounting for the uncertainty of the sample-based estimate.
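
As a sketch, a confidence interval for overall accuracy can be approximated with the normal approximation for a proportion. This assumes the reference samples come from simple random sampling. Stratified or clustered designs require estimators that account for the design, which this example does not cover. The function name and the counts are illustrative.

```python
import math

def accuracy_confidence_interval(n_correct, n_total, z=1.96):
    """Normal-approximation interval for a proportion.

    Assumes simple random sampling; z = 1.96 gives roughly 95 % confidence.
    Returns (estimate, standard error, (lower, upper)).
    """
    p = n_correct / n_total
    se = math.sqrt(p * (1 - p) / n_total)
    lower = max(0.0, p - z * se)
    upper = min(1.0, p + z * se)
    return p, se, (lower, upper)

# 80 of 100 reference samples agree with the map.
oa, se, (lower, upper) = accuracy_confidence_interval(80, 100)
```

With these counts the standard error is 0.04, so the interval is approximately 0.72 to 0.88. A larger sample narrows the interval.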

Comparative evaluation

In addition to assessing the accuracy of a single land cover product, many studies also conduct comparative evaluations across multiple land cover products. These products often differ in input data, classification schemes, or classification algorithms. Therefore, comparative evaluation is particularly important for understanding the consistency, differences, complementarity, and usability of these datasets.
Comparative evaluation usually involves the following steps:

Harmonize land cover class definitions.
Conduct qualitative comparisons by visual inspection of different land cover maps.
Perform quantitative assessments using a common reference dataset and assessment metrics.
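
The harmonization and quantitative steps can be sketched as follows. The two products, their legends, the lookup tables, and the sample labels are all hypothetical and serve only to show the mechanics. A real comparison would use each product's documented legend and a properly designed reference sample.

```python
def harmonize(labels, lookup):
    """Translate product-specific class names into a common legend."""
    return [lookup[label] for label in labels]

def agreement(labels_a, labels_b):
    """Share of sample units on which two label lists agree."""
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)

# Hypothetical legends of two products mapped to a common legend.
lookup_a = {"Tree cover": "forest", "Cropland": "cropland",
            "Permanent water": "water"}
lookup_b = {"Trees": "forest", "Crops": "cropland", "Water": "water"}

# Labels at the same five sample units (invented for illustration).
reference = ["forest", "forest", "cropland", "water", "cropland"]
product_a = ["Tree cover", "Tree cover", "Cropland",
             "Permanent water", "Tree cover"]
product_b = ["Trees", "Crops", "Crops", "Crops", "Crops"]

harmonized_a = harmonize(product_a, lookup_a)
harmonized_b = harmonize(product_b, lookup_b)

oa_a = agreement(harmonized_a, reference)          # product A vs reference
oa_b = agreement(harmonized_b, reference)          # product B vs reference
consistency = agreement(harmonized_a, harmonized_b)  # product A vs product B
```

Comparing each product with the same reference gives their relative accuracy, while comparing the products with each other gives their thematic consistency. The two can differ considerably.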

Recent studies have compared high-resolution land cover products such as ESA WorldCover, Esri Land Cover, and Google's Dynamic World to assess their relative accuracy and thematic consistency across different regions and land cover types. These efforts help users make informed choices when selecting products for specific purposes.