Analyze data limitations
After identifying relevant data resources, the assessment team should analyze their limitations on the platform using a set of analysis criteria. The limitations analysis aims to:
Eliminate unsuitable datasets early: Quickly disqualify datasets that are not sufficiently relevant or have critical limitations, such as outdated information or low spatial resolution, to avoid unnecessary effort on a full suitability evaluation.
Document dataset characteristics: Record the extent to which each dataset meets the requirements of the associated metric and document any limitations.
Select the best datasets: Choose the most suitable datasets from available options, ensuring they provide the most reliable and accurate information for evaluating metrics.
The goal is to select datasets that either independently or in combination with other datasets deliver reasonably accurate results for the intended metrics. This ensures the assessment uses the best available information for the landscape.
The LandScale platform enables the assessment team to identify any limitations for each dataset based on eight key criteria. For each criterion, the team should document the specific limitations encountered. This documentation is important for clearly outlining the assessment team's considerations when using the data, informing the validation process and metric result limitations, and streamlining data selection efforts for future assessments.
In some cases, datasets may only partially fulfill a metric’s measurement (e.g., spatial data covering part of the landscape). These datasets can still be used if they are intended to be combined with others to meet the full scope of the metric. When this occurs, the team should document the data limitations, noting that partial datasets can still contribute meaningfully to the assessment when used in combination with other resources.
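The documentation workflow above can be sketched as a simple record structure, one per candidate dataset, with a note slot for each of the eight criteria. This is an illustrative sketch only, not the LandScale platform's actual data model; the criterion names are taken from the section headings below.

```python
from dataclasses import dataclass, field

# The eight analysis criteria detailed in the sections below.
CRITERIA = (
    "relevance", "reputability", "spatial coverage", "spatial resolution",
    "temporal coverage", "disaggregation", "sampling", "accessibility and cost",
)

@dataclass
class LimitationsRecord:
    """Documented limitations of one candidate dataset, keyed by criterion."""
    dataset: str
    notes: dict = field(default_factory=dict)

    def add(self, criterion: str, note: str) -> None:
        # Guard against typos so every note maps to a known criterion.
        if criterion not in CRITERIA:
            raise ValueError(f"unknown criterion: {criterion}")
        self.notes[criterion] = note

# Example: record a partial-coverage limitation for a hypothetical dataset.
rec = LimitationsRecord("Regional land-cover map (hypothetical)")
rec.add("spatial coverage", "covers only the northern half of the landscape; "
        "to be combined with a second map for full coverage")
```

Keeping one such record per dataset, even for rejected candidates, is what allows future reassessments to skip unsuitable options quickly.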
Data for future assessments
Documentation of all data investigated and the outcomes of the data limitations analysis within the LandScale platform will enable users to seamlessly avoid unsuitable datasets during reassessments.
However, caution must be exercised when there are significant differences in the characteristics of the data used across assessments. For example, when calculating the amount or rate of ecosystem conversion, newer data with better precision may result in a change measurement that reflects improved precision rather than actual change—leading to significant over- or under-calculation of the true conversion.
It is also important to identify whether a metric definition has changed (e.g., between LandScale framework versions) to ensure that the correct data is used for the specified measurement. If a metric definition has changed, the assessment team may propose using the original metric as an alternative to maintain continuity and comparability of results across assessments, rather than adopting the new metric definition.
Relevance: How is this resource thematically relevant to a LandScale assessment?
The dataset should be thematically relevant to the metrics it is intended to measure. It does not necessarily need to provide a full or direct measure of the metric, but it should address the overall topic.
When using proxy datasets, consider how closely the proxy measure aligns with the intended metric. If only proxy data is available, you may need to revisit the metric and consider proposing an alternative metric.
Reputability: Does the author of this resource have a good track record of producing unbiased, high-quality datasets? Is the resource accompanied by sufficient documentation?
To evaluate the reputability of a dataset, assess whether it was collected and assembled using standardized or widely accepted methods. Check for potential conflicts of interest or biases from the data owner or developer that could affect the dataset's validity. Ideally, the developer should have a proven track record of producing unbiased, high-quality datasets.
Consider the organization hosting or providing the dataset as well. Large data providers and clearinghouses often have well-established criteria for selecting and vetting datasets, which can be a positive indicator of reputability. However, each dataset should still be evaluated on its own merits, rather than relying solely on the reputation of the provider.
The dataset should also include adequate documentation (e.g., metadata) explaining the data fields, attributes, collection methods, and any limitations or disclaimers regarding data quality. Review this documentation to determine whether the dataset reflects reasonable attention to accuracy, precision, and quality control.
Spatial coverage: Does this dataset provide adequate geographic coverage?
Ideally, a dataset should cover the entire landscape. However, when no single dataset provides full coverage for a given metric, multiple datasets may be combined, each covering a portion of the landscape. This approach is particularly useful when the landscape boundary (e.g., a municipality) does not align with the units of measurement for a given metric (e.g., water-related outcomes measured at the catchment level). When combining datasets, ensure that they are comparable and create a coherent picture of the entire landscape.
Remote sensing data may have gaps due to factors like cloud cover. The assessment team should evaluate how these gaps might impact the representativeness and accuracy of the entire dataset. If gaps significantly affect the dataset's quality, the assessment team should explore ways to mitigate this—such as finding supplemental data for the 'no data' areas—or, if the gaps are too large, may need to reject the dataset entirely.
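As a minimal sketch of the gap check described above, the share of no-data pixels in a raster can be computed directly once the dataset's no-data sentinel value is known. The sentinel value here is an assumption; the real value should come from the dataset's metadata.

```python
import numpy as np

NODATA = -9999  # assumed sentinel value; check the dataset's metadata

def nodata_fraction(raster: np.ndarray, nodata: float = NODATA) -> float:
    """Fraction of pixels with no valid observation (e.g., cloud gaps)."""
    return float(np.mean(raster == nodata))

# Example: a 4x4 scene with three cloud-masked pixels.
scene = np.full((4, 4), 1.0)
scene[0, :3] = NODATA
gap = nodata_fraction(scene)  # 3 of 16 pixels -> 0.1875
```

What counts as "too large" a gap is a judgment call for the assessment team; the point of the calculation is to make that judgment against a documented number rather than a visual impression.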
Spatial resolution: What resolution (pixels, meters) is this dataset applicable at?
The spatial resolution of the data must be fine enough to accurately measure the metrics for which it is used. For example:
For land-related metrics derived or informed by remote sensing or land use/land cover maps, the resolution should generally be no coarser than 100 meters, with 30 meters or better being preferred.
For human well-being and governance metrics, the data should cover the entire landscape, sub-units within it, or corresponding comparably sized areas (e.g., municipalities).
If the dataset does not meet these criteria, use the data with the best available spatial resolution and document the limitations. In some cases, data can be down-scaled to the landscape level using advanced analytical models, but this should only be done if the necessary expertise is available.
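The resolution thresholds for land-related metrics given above (no coarser than 100 meters, with 30 meters or better preferred) can be expressed as a small screening helper. The function name and the rating labels are illustrative, not part of the LandScale platform.

```python
def resolution_rating(resolution_m: float) -> str:
    """Rate a land-cover dataset's pixel size against the guidance above:
    <= 30 m preferred, <= 100 m acceptable, coarser is a documented limitation."""
    if resolution_m <= 30:
        return "preferred"
    if resolution_m <= 100:
        return "acceptable"
    return "document as a limitation"

# Example: a 250 m global product would be flagged for documentation,
# while a 30 m Landsat-derived map meets the preferred threshold.
```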
Temporal coverage: Is this dataset sufficiently recent, and does it cover the necessary timespan?
Currentness refers to the date of the most recent observations within a dataset. The dataset should be current enough to provide an accurate representation or approximation of present conditions at the time of the LandScale assessment. Thresholds for currentness depend on the nature of the phenomenon being measured, particularly its rate of change in the context of the landscape.

Temporal coverage refers to the time range of a time-series dataset, defined by its start and end dates. For example, time-series data showing changes in agricultural productivity in a given area may be available from 2000 to 2020. Temporal coverage is most relevant for metrics that track changes over time (e.g., 1.1.2.1) or that require multi-year averages to smooth out inter-annual variation (e.g., 1.3.1.1).
If the metric requires time-series data, the dataset must provide sufficient temporal resolution (e.g., daily, monthly, seasonally, annually) and consistency in data collection over time to ensure accurate analysis of status and trends. Preference should be given to datasets that are expected to be regularly updated so that they will continue meeting these temporal criteria in subsequent assessments. Such datasets will ensure continuity and relevance for ongoing and future trend analyses.
The assessment team should consider potential trade-offs between temporal characteristics and other criteria. While the most current data is generally preferred, there may be trade-offs with other characteristics such as spatial resolution and degree of disaggregation. In some cases, an older dataset may be preferable: in landscapes with little conversion of natural ecosystems, an older ecosystem map may offer higher spatial resolution and more detailed ecosystem classifications than a more recent map that lacks these qualities.
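The two temporal checks described above (currentness and coverage span) can be made explicit as follows. The thresholds passed in are assessment-specific assumptions set by the team, not fixed LandScale rules.

```python
from datetime import date

def temporal_limitations(last_observation: date,
                         series_start: date,
                         max_age_years: float,
                         required_span_years: float,
                         as_of: date) -> list:
    """Flag currentness and temporal-coverage limitations for documentation.
    Thresholds are assessment-specific choices, not fixed LandScale rules."""
    notes = []
    age = (as_of - last_observation).days / 365.25
    if age > max_age_years:
        notes.append(f"most recent observation is {age:.1f} years old "
                     f"(threshold: {max_age_years})")
    span = (last_observation - series_start).days / 365.25
    if span < required_span_years:
        notes.append(f"series spans {span:.1f} years "
                     f"(required: {required_span_years})")
    return notes

# Example: a 2000-2015 series checked in 2024 against a 5-year currentness
# threshold and a 10-year required span fails only the currentness check.
flags = temporal_limitations(date(2015, 1, 1), date(2000, 1, 1),
                             5, 10, date(2024, 1, 1))
```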
Disaggregation: What level of disaggregation does this dataset provide?
The dataset should include at least the degree of disaggregation specified in the metric description and its associated explanation. This means the dataset should break down the data into the categories or units necessary for the given metric. For example, if the metric requires disaggregation by region or demographic characteristics (e.g., age, gender, income), the dataset should include this level of detail.

If the assessment team identifies secondary datasets that do not provide sufficient disaggregation but are otherwise suitable, it may be worth contacting the data provider to inquire whether disaggregated data are available. Some datasets are published in aggregated form, but the disaggregated source data may be available upon request or by special arrangement.
If the dataset lacks the desired degree of disaggregation but is otherwise the most or only suitable dataset, the assessment team may decide to use it, but they should document the disaggregation limitation. Alternatively, they may propose an alternative metric that the available dataset is better suited to measure.
Sampling: Does this dataset have adequate (or the best available) sampling of the given population or phenomenon?
For datasets that rely on sampling and statistical methods to infer population values, the assessment team should evaluate the sampling frame, design, methods, and sample size to determine if they are appropriate. What constitutes 'appropriate' will depend on the metric, the degree of heterogeneity of the metric within the landscape, and adherence to data collection best practices. For example:
Sampling frame: Review how well the sample group represents the full population for the given parameter. Greater heterogeneity within the population typically requires a more sophisticated sampling design and/or a larger sample size to ensure representativeness.
Sampling design and methods: Assess whether the sampling methods (e.g., random, systematic, stratified, convenience) used by the data developer provide suitable representation and sufficient control for bias.
Sample size: Determine if the sample size is large enough to provide reliable results with sufficient statistical power and precision. Consider whether co-variables have been accounted for appropriately in the sampling method, stratification, and analysis.
Consult the dataset’s documentation (metadata) to evaluate its sampling scheme, representativeness, and any limitations. No sample-derived dataset will perfectly represent the full population. By understanding the sampling approach, the assessment team can avoid processing the data beyond its credible limits of inference.
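One concrete way to judge whether a sample size supports the precision a metric needs is the standard margin-of-error formula for an estimated proportion under simple random sampling (normal approximation). This is a generic statistical check, not a LandScale-specific requirement, and it applies only to proportion-type estimates from reasonably simple designs.

```python
import math

def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a sample proportion p_hat
    estimated from n observations under simple random sampling."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Example: a survey of 400 households estimating 30% prevalence gives
# roughly 1.96 * sqrt(0.3 * 0.7 / 400) ~= 0.045 (about +/- 4.5 points).
moe = margin_of_error(0.3, 400)
```

For stratified or clustered designs, the effective precision can differ substantially from this approximation, which is one reason the dataset's documented sampling design matters.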
Accessibility and cost: Does this dataset have adequate or better accessibility and cost relative to alternative datasets, or will it limit publication of the results?
It is generally preferred to select datasets that are in the public domain, as these can be referenced in LandScale assessment reports and examined by any interested user of the results. However, in some cases, the most suitable datasets may be unpublished, proprietary, or confidential. If this is the case, the dataset may be used, provided its source and attributes can be reviewed by the assessment team (and, if validation is being sought, by the LandScale team and local reviewers). This review ensures that the data are appropriate for the metrics they are intended to measure.

When candidate datasets are only available for a fee, the assessment team should weigh the trade-offs between cost and data quality. It is advisable to allocate a budget for data procurement in LandScale assessments, as some expenditure is typically necessary, and to review all potential datasets that require funding. This will help the assessment team determine how best to allocate resources to optimize data availability and quality across all indicators and metrics.
If the best dataset for a given metric is unpublished, proprietary, or private, the assessment team should explore whether a data-sharing agreement or another arrangement can facilitate the dataset’s use. Such agreements can meet the data owner’s needs while enabling the necessary LandScale quality reviews and validation. Before seeking validation, the assessment team must ensure they have permission from the owner of any non-public datasets to publish results derived from them.