A raster-based dataset for spatio-temporal analysis of forest fires in the Amazon rainforest from 2001 to 2020

Mateen Mahmood; Paula Moraga

doi:10.12688/f1000research.164537.2

Data Note

Revised

[version 2; peer review: 1 approved, 2 approved with reservations]

Mateen Mahmood¹, Paula Moraga ¹

PUBLISHED 29 Jan 2026

¹ Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Makkah Province, 23955-6900, Saudi Arabia

Mateen Mahmood
Roles: Data Curation, Investigation, Software, Validation, Visualization, Writing – Original Draft Preparation, Writing – Review & Editing

Paula Moraga
Roles: Conceptualization, Funding Acquisition, Supervision, Writing – Review & Editing

This article is included in the Ecology and Global Change gateway.

Abstract

Forest fires are becoming increasingly common worldwide, posing a threat to the environment, economy, and society. Spatiotemporal analysis of forest fires is important to understand their characteristics and causes and to inform decision-making. This type of analysis requires the availability of a number of factors that contribute to fire occurrence, such as land use, environment, climate, and human activities, at high spatial and temporal resolutions. The South American Amazon rainforest covers a large area, and acquiring a useful dataset for analysis requires extensive effort and computer-intensive processing. This study investigates potential data sources, establishes a methodology, and prepares a dataset of attributes useful for spatiotemporal fire analysis. We provide a raster-based dataset that includes fires, land use, environment, and climate factors at a spatial resolution of 500 m and monthly temporal resolution from 2001 to 2020, which facilitates the analysis of forest fires in the Amazon. Moreover, because data sources and implementation procedures are detailed, this work also encourages similar research in other parts of the world.

Keywords

Amazon; Fires; Burnt Area; Land Cover; Elevation; Precipitation; Humidity; Temperature

Corresponding author: Paula Moraga

Competing interests: No competing interests were disclosed.

Grant information: The author(s) declared that no grants were involved in supporting this work.

Copyright: © 2026 Mahmood M and Moraga P. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite: Mahmood M and Moraga P. A raster-based dataset for spatio-temporal analysis of forest fires in the Amazon rainforest from 2001 to 2020 [version 2; peer review: 1 approved, 2 approved with reservations]. F1000Research 2026, 14:916 (https://doi.org/10.12688/f1000research.164537.2) First published: 12 Sep 2025, 14:916 (https://doi.org/10.12688/f1000research.164537.1) Latest published: 29 Jan 2026, 14:916 (https://doi.org/10.12688/f1000research.164537.2)

Revised Amendments from Version 1

This version adds the full data product names and IDs, clarifies the data integration and resolution procedures, and expands the description of the coordinate transformation process. The data processing and technical validation section has been updated to better highlight quality control procedures. It also includes several rephrased and clarifying statements to improve overall clarity.

See the authors' detailed response to the review by Mahlatse Kganyago
See the authors' detailed response to the review by I NENGAH SURATI JAYA

Introduction

The frequency and severity of forest fires are increasing around the globe, posing a significant threat to forested areas worldwide. These wildfires not only threaten human lives and property but also continue to reshape local and global ecosystems. Because they vary in space and time across multiple scales, wildfires differ substantially in frequency, size, intensity, and pattern.¹ Similarly, ignition results from a combination of factors, including weather, climate, and land use, as well as lightning, volcanic eruptions, rockfalls, and combustible material.² This persistent exposure of forests to wildfires is alarming in itself and, given its ecological and socio-economic consequences, poses a major challenge to fire management authorities and related stakeholders.³

To ensure better preparedness and deploy improved preventive measures, the spatio-temporal relations between the probable causes of wildfires and the characteristics of those fire incidents must be analyzed. Such analysis will not only assist with mitigation but may also aid in the prediction and forecasting of future events by better understanding the underlying events propagating fire occurrences.⁴ Such in-depth spatio-temporal statistical investigations of these complex interactions require the collection of all available associated attributes, combined from heterogeneous sources (with varying extents, spatial scales, temporal resolutions, file formats, etc.) into a processed unified structure available in the form of common specifications.

The South American Amazon is one of the largest rainforests in the world⁵ and hosts thousands of wildfires annually.⁶ Despite numerous studies related to spatio-temporal statistical analysis of forest fires in many regions of the world,²,⁴,⁷⁻⁹ there remains a notable scarcity of basin-wide, multivariate longitudinal studies for the entire Amazon region. While some research has addressed specific drivers of ignition, existing Amazon-specific studies tend to be limited to sub-regions or specific administrative boundaries.³,¹⁰ For a study area of this size, data collection is a time-intensive task, with exhaustive pre-processing requiring cumbersome setups. Hence, the development of an Amazon-wide database that includes all available attributes related to fires, integrated into a common format, is required.

The aim of this work is to provide the scientific community with a dataset for spatiotemporal forest fire analysis in the Amazon region. The dataset includes 20 years of historical data (2001–2020) at a monthly temporal resolution for the complete extent of the Amazon region at a spatial scale of 500 m. Because the study area of the entire Amazon rainforest is large, the raw data sources must be at a global or regional level (in South America). Otherwise, data for the same attribute would have to be gathered from multiple local-level sources, raising concerns regarding data integrity. Global- and regional-level satellite-based raster products were acquired and clipped for the South American region to compute three types of data: (a) raw data, (b) pre-processed data and (c) working data. A schematic overview of this study is presented in Figure 1. Raw data refer to the data file(s) extracted from the accessed data packages (i.e., the data layer of the subject attribute, taken out of a data package that also contains various other attribute layers). The extracted attribute layers have varying spatial resolutions, dissimilar spatial extents, different spatial projections, and inconsistent file formats. Raw data are pre-processed to obtain pre-processed data, in which the attribute layers share a consistent file format and the same projection system. Finally, all attribute layers are processed to obtain working data, with the data extent confined to the Amazon region and a fixed spatial resolution, such that each raster cell of an attribute layer aligns exactly with the corresponding raster cell of every other attribute layer.

Figure 1. Schematic overview of the data processing process.

This manuscript presents the complete process of data collection for raster-based attributes of forest fires in the Amazon rainforest, along with a description of the methodological baseline and details of the implementation process. The availability of such a ready-made dataset with a detailed methodology of data collection and computer-intensive preprocessing procedures will be useful to many researchers working in the domain of forest fire analysis. For example, this dataset has been used to map the geographic and temporal distributions of burned areas and risk factors in the Amazon from 2001 to 2020 using an ensemble approach that harnesses a range of machine learning algorithms.¹¹ Furthermore, this dataset provides encouragement for developing similar datasets tailored to varying study regions, spatial resolutions, and research domains.¹²

Methods

The Amazon rainforest has an area of over 5.2 million square kilometers, covers approximately one-third of South America, and extends into eight countries.⁵ Within this region, data management authorities in each country generally focus on their own regions. To create a database for the entire extent of the Amazon rainforest and to ensure that all relevant areas of potential importance are included in the study area, we defined the study area for this work as the entire Amazon basin, as shown in Figure 2. The extent of the study area can be defined as -79.43629, -18.00816: -44.49108, 8.66346 with the coordinate reference system EPSG:4326 - World Geodetic System (WGS) 84 - Geographic. For spatiotemporal modeling, the selection of the data period needs to have a considerable temporal range as well as data availability for the chosen period. A review of the literature related to spatiotemporal modeling of forest fires, as summarized in Table 1, indicates that a period of 5-30 years with monthly or yearly frequency is used for the temporal characterization of forest fires. Keeping in view what is available for the Amazon Rainforest (for the whole region), we decided to proceed with a data period of 20 years from 2001 to 2020, with a monthly frequency as the temporal resolution. The spatial resolution was finalized as 500 m for the final spatial grid. This is based not only on the available data for the Amazon Rainforest but also on the computational complexity involved in a study area of approximately 5 million square kilometers.

Figure 2. Study area of Amazon rainforest.

Amazon boundary obtained from.²⁰

Table 1. Summary of study characteristics from previous works related to forest fire analysis.

Reference	Study region	Study area	Data period	Temporal resolution
A⁴	Southern France	40,000 sq.km	1995-2018	Monthly
B²¹	Autazes, Brazil	7,632 sq.km	1985-2015	Monthly
C²²	South Korea	99,720 sq.km	1980-2000	Annual
D²,⁷,²³	Catalonia, Spain	30,000 sq.km	2004-2008	Multi-Year
E⁸	Castellon, Spain	6,632 sq.km	2001-2006	Multi-Year
F²⁴	Islamabad, Pakistan	158 sq.km	2005-2018	Multi-Month
G²⁵	California and Nevada, USA	120,000 sq.km	1984-2006	Multi-Month

In addition to the study design involving spatial resolution, temporal frequency, and spatial data extent, another equally important aspect is the selection of covariates. These variables can be broadly categorized as attributes related to land use, climate, the environment, topography, and human activities. Land use and land cover (LULC) variables are highly related to forest fires, as the type of land surface not only determines fire ignition but also its propagation. Climatic variables, such as humidity, precipitation, wind speed, and temperature, also influence the occurrence of forest fires. Topographic variables such as elevation, slope, and aspect are also of core importance as they regulate how quickly a fire will move up or down the hills. Finally, human activities also play a critical role in the initiation of forest fires. Hence, variables such as population density, buildings, and the urban-forest interface are of high significance. Table 2 summarizes the list of potential forest fire analysis attributes discussed in the literature.

Table 2. Summary of study attributes from previous works related to forest fire analysis.

Reference	Description of attributes
F³,⁸,⁹,²⁶,²⁷	Land Use Effects/Vegetation Type/Deforestation/Forest Type/Land Cover
G⁴,⁹,²⁶	Population Density/Housing Density/Buildings
H³,⁴,²⁶,²⁷	Elevation, Slope and Aspect
I⁹	Humidity
J⁹	Wind Speed
K⁴,⁹,²⁶,²⁷	Temperature
L⁴,⁹	Precipitation
M⁸	Isothermality
N⁴	Protected Zones
O³,⁸,⁹,²⁶	Road Density, Distance to Road
P³	Maximum Cumulative Water Deficit
Q³,⁸	Soil Type/Soil Texture/Soil Permeability

Data collection

Not all of the attributes identified in the literature as potentially related to forest fire analysis (Table 2) are available for the entire Amazon rainforest, let alone for the study period 2001–2020. Specifically, variables such as protected zones, isothermality, and maximum cumulative water deficit were only available for certain regions and time periods. Similarly, elevation-related attributes were only available for certain years within 2001–2020. In this study, attributes available for the complete Amazon region and for the selected period of 2001–2020 were identified and acquired, as detailed in Table 3 (date of access: 01 May 2022). This section details the data-acquisition process for each collected attribute.

Table 3. Summary of collected attributes related to forest fire analysis, with original temporal resolution of monthly frequency (except Land Cover which is Annual, and Elevation which is One time).

These attributes were pre-processed to acquire working data at 500 meters and monthly resolution, for the period of 2001 to 2020.

S#	Variable name	Description	Spatial resolution	Source
1.	Burnt Area	Classes (Burnt, Not Burnt, Water)	500 meters	MODIS²⁸
2.	Land Cover (Annual)	11 Classes of Land Cover	5,600 meters	MODIS²⁹
3.	Precipitation	Average rate of precipitation	10,000 meters	GES-DISC³⁰
4.	Soil Moisture	Model-calculated	37,000 meters	CPC³¹
5.	Elevation (One-time)	Based on Digital Elevation Model	1,000 meters	EarthEnv³²
6.	Land Surface Temperature	Daytime observations	5,600 meters	MODIS³³
7.	Specific Humidity	Model-calculated	10,000 meters	GES DISC³⁴
8.	Evapotranspiration (ET)	Model-calculated	10,000 meters	GES DISC³⁴
9.	Near Surface Wind Speed	Model-calculated	10,000 meters	GES DISC³⁴
10.	Near Surface Air Temperature	Model-calculated	10,000 meters	GES DISC³⁴

Burnt Area (BA)

The data product acquired was MODIS/Terra+Aqua Direct Broadcast Burned Area Monthly L3 Global 500 m SIN Grid V006 MCD64A1 Version 6.1, a gridded burnt area product at a resolution of 500 m, available in Hierarchical Data Format (HDF). The product provides the date of burn (as the day of the year) for individual cells, with additional classes for unburnt cells, missing data, and water. The data product is available from 2000 to the present (2022), with global spatial coverage in the form of regional subsets. The layers extracted from the data source are for regions 5 and 6, which cover the Amazon area. Burn-date values are expressed as the day of the year, with a valid range of 1–366. Further details related to the product, including the quality assessment and known issues, are available at MODIS MCD64A1 (https://lpdaac.usgs.gov/products/mcd64a1v061/).

As the burnt area product is available at the regional level, an additional data processing step for the burnt area product is the merging of two separate regional-level products to cover the entire region of the Amazon basin boundary. Additionally, the data were re-classified to assign a single value of 1 to all burn dates (1-366) to identify the cell with burn data as simply burnt. Hence, working data has four classes (burnt, unburnt, missing, and water) with values (1, 0, -1, and -2), respectively.
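The reclassification step can be illustrated with a minimal base R sketch. The sample values and the function name `reclassify_burn` are hypothetical, and the sketch assumes that unburnt, missing, and water cells already carry the codes 0, -1, and -2 before reclassification, so that only the burn dates need to be collapsed.

```r
# Hypothetical sample of burn-date cell values (assumed coding):
# 1-366 = day of burn, 0 = unburnt, -1 = missing data, -2 = water.
burn_date <- c(0L, 45L, 366L, -1L, -2L, 213L, 0L)

# Collapse every valid burn day (1-366) to the single class 1;
# all other codes are passed through unchanged.
reclassify_burn <- function(x) {
  ifelse(x >= 1L & x <= 366L, 1L, x)
}

burnt_class <- reclassify_burn(burn_date)

# Label the four working-data classes for a quick frequency summary.
class_labels <- c("-2" = "water", "-1" = "missing",
                  "0" = "unburnt", "1" = "burnt")
class_counts <- table(class_labels[as.character(burnt_class)])
print(class_counts)
```

For full rasters, the same rule could be applied cell-wise to a raster object instead of a plain vector.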

Land Cover (LC)

The data product acquired was MODIS/Terra+Aqua Land Cover Type Yearly L3 Global 0.05Deg CMG V006 MCD12C1 Version 6, which consists of three gridded land cover classification schemes at a resolution of 5,600 m, available in the HDF format. The three classification schemes are the International Geosphere-Biosphere Programme (IGBP) scheme with 17 classes, the University of Maryland (UMD) scheme with 16 classes, and the Leaf Area Index (LAI) scheme with 11 classes. The LAI scheme was extracted from the data product because its 11 classes are sufficient to represent the main land covers (Water, Urban, Forest, Grassland, etc.), whereas the additional classes in the other schemes are further subdivisions of forest and grassland types. The data product is available from 2000 to the present (2022) with global spatial coverage. The details of the land cover classes of the LAI scheme are provided in Table 4. The name of the layer extracted from the data source is Land Cover Type-3, with data values ranging from class 0 to class 10. Further details related to the product, including the quality assessment and known issues, are available at MODIS MCD12C1 (https://lpdaac.usgs.gov/products/mcd12c1v006/).

Table 4. Class details of Leaf Area Index (LAI) classification scheme, from MODIS.²⁹

Class name	Value	Description
Water Bodies	0	Permanent water bodies
Grasslands	1	Dominated by herbaceous annuals (<2 m)
Shrublands	2	Shrub (1-2 m)
Broadleaf Croplands	3	Dominated by herbaceous annuals (<2 m) - cultivated with broadleaf crops
Savannas	4	From 10% to 60% tree cover (>2 m)
Evergreen Broadleaf Forests	5	Dominated by evergreen broadleaf and palmate trees (>2 m)
Deciduous Broadleaf Forests	6	Dominated by deciduous broadleaf trees (>2 m)
Evergreen Needleleaf Forests	7	Dominated by evergreen conifer trees (>2 m)
Deciduous Needleleaf Forests	8	Dominated by deciduous needleleaf (larch) tree (>2 m)
Non-Vegetated Lands	9	Non-vegetated barren (sand, rock, soil) /permanent snow and ice
Urban and Built-up Lands	10	Impervious surface area including building materials, asphalt, and vehicles
Unclassified	255	Missing inputs

Precipitation

The data product acquired is the Integrated Multi-satellite Retrievals for GPM (Global Precipitation Measurement) multi-satellite precipitation product, Version 06B, available in Hierarchical Data Format version 5 (HDF5). It provides monthly average precipitation rates at a 0.1° × 0.1° (approximately 10,000 m at the equator) spatial resolution, estimated from numerous precipitation-relevant satellite passive microwave (PMW) sensors. The dataset is available for 2000–2021 with global spatial coverage. The values are represented in millimeters per hour (mm/h), with a scale factor of 1000 and missing values marked with -9999. Thus, a stored value of 500 indicates 500/1000 = 0.5 mm/h. Further details related to the product are available at the GES-DISC GPM IMERG Final Precipitation L3 (https://disc.gsfc.nasa.gov/datasets/GPM_3IMERGM_06/summary).
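As a minimal sketch of how the stored precipitation values could be decoded, the following base R snippet divides by the scale factor and replaces the missing-value marker with NA. The function name `decode_precip` and the sample values are hypothetical, and the monthly total assumes that the stored rate is the mean rate over a 31-day month.

```r
# Decode stored precipitation values: replace the -9999 marker with NA,
# then divide by the scale factor of 1000 to obtain mm/h.
decode_precip <- function(stored, scale = 1000, missing = -9999) {
  stored[stored == missing] <- NA
  stored / scale
}

stored <- c(500, 0, -9999, 1250)        # hypothetical stored cell values
rate_mm_per_h <- decode_precip(stored)  # precipitation rate in mm/h

# Approximate accumulated precipitation for a 31-day month (assumption:
# the rate is the monthly mean, so total = rate * 24 hours * 31 days).
total_mm <- rate_mm_per_h * 24 * 31
print(data.frame(stored, rate_mm_per_h, total_mm))
```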

Soil moisture

The data product acquired is CPC Soil Moisture Version 2, a model-calculated (not directly observed) average soil moisture expressed as a water height equivalent, available in the GEOTIFF format. It is a monthly product at a 0.5° × 0.5° (approximately 37,000 m at the equator) spatial resolution, with data available from 1948 to the present (2022). The spatial coverage of the product is 89.75N–89.75S, 0.25E–359.75E. The values are represented in millimeters (mm), with missing values marked as -9999. Further details related to the product are available at CPC Soil Moisture (https://psl.noaa.gov/data/gridded/data.cpcsoil.html).

In the preprocessing of the Soil Moisture data product, data transformation is implemented as an additional step. As the source data have a spatial offset, not aligning with the reference base map, the data are transformed to correct alignment using the Geospatial Data Abstraction Library (GDAL).¹³

Elevation

The acquired data product is a global multivariate package of terrain features that can serve many large-scale research applications. It is based on a 250 m Digital Elevation Model (DEM), available in Tagged Image File Format (TIF), from the Global Multi-Resolution Terrain Elevation Data 2010 (GMTED2010).¹⁴ The product provides many topographic variables, such as elevation, slope, aspect, northness, eastness, roughness index, and topographic position index, at resolutions of 1, 10, 50, or 100 km, with global spatial coverage; however, our focus is only on elevation. The elevation values are represented in meters (m). Further details related to this product are available at EarthEnv (https://www.earthenv.org/topography).

Land Surface Temperature (LST)

The data product acquired was MODIS/Terra Land-Surface Temperature/Emissivity Monthly Global 0.05Deg CMG MOD11C3 Version 6, a monthly Land Surface Temperature & Emissivity (LST&E) product at a spatial resolution of 0.05° (approximately 5,600 m), available in the HDF format. The data product provides values for both daytime and nighttime observations, along with other details related to the quality assessment. The data product is available from 2000 to the present (2022) with global spatial coverage. The temperature values are represented in kelvin (K), with a scale factor of 0.02 and a range of values between 7,500 and 65,535. Thus, a stored LST value of X represents X × 0.02 K. Further details related to this product are available at MODIS MOD11C3 (https://lpdaac.usgs.gov/products/mod11c3v006/).
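Unlike a divisor-type scale factor, the LST scale factor is applied by multiplication. The base R sketch below illustrates the conversion; the function name `decode_lst` and the sample values are hypothetical, and treating stored values outside the 7,500–65,535 range as missing is an assumption made for illustration.

```r
# Convert stored LST values to kelvin by multiplying by 0.02.
# Assumption: stored values outside the stated range are treated as NA.
decode_lst <- function(stored, scale = 0.02, valid = c(7500, 65535)) {
  stored[stored < valid[1] | stored > valid[2]] <- NA
  stored * scale
}

stored <- c(15000, 14750, 0, 7500)   # hypothetical stored cell values
lst_kelvin  <- decode_lst(stored)    # land surface temperature in K
lst_celsius <- lst_kelvin - 273.15   # optional conversion to degrees Celsius
print(data.frame(stored, lst_kelvin, lst_celsius))
```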

Specific humidity, Evapotranspiration (ET), wind and air temperature

The acquired data provide a set of parameters related to land surface observations. The data are a simulation-based product of the Noah 3.6.1 model from the Famine Early Warning Systems Network (FEWS NET) Land Data Assimilation System (FLDAS). All variables are available as monthly products at a 0.10° spatial resolution (approximately 10,000 m at the equator), stored as layers in NETCDF files. The dataset is available from 1982 to the present (2022) with global spatial coverage. Specific Humidity is expressed in kg/kg, i.e., kilograms of water (moisture) per kilogram of air, whereas Evapotranspiration, Wind Speed, and Air Temperature are measured in kg m⁻² s⁻¹, m/s, and kelvin (K), respectively. Further details related to the product are available at the GES DISC-FLDAS Noah Land Surface Model L4 (https://disc.gsfc.nasa.gov/datasets/FLDAS_NOAH01_C_GL_M_001/summary).

While these model-derived variables (including Soil Moisture) introduce inherent numerical uncertainties compared to direct field observations, they are incorporated to expand the suite of available environmental covariates, providing the multivariate depth necessary for robust spatiotemporal analysis across the Amazon. As the primary objective of this work is the curation and standardization of a basin-wide Amazon dataset, a formal independent uncertainty analysis remains beyond the scope of this data curation effort. By providing these products in a common, analysis-ready format, this work establishes the necessary foundation for future studies to conduct such analytical sensitivity assessments and empirical validations.

Data processing

The attributes collected in the database have different spatial resolutions, as described in Table 3. Likewise, not all variables are available at a monthly resolution: Land Cover is annual and Elevation is a one-time product. Moreover, the variables cover different spatial extents and have dissimilar spatial orientations. To obtain a dataset with all variables at a fixed spatial extent and resolution, we constructed a spatial grid of 500 m resolution covering the Amazon region and obtained the cell values for this raster following the steps described below. Similarly, we harmonized all variables to a monthly temporal resolution for the period from 2001 to 2020.

To achieve temporal harmonization across these differing frequencies, we adopted a temporal expansion framework where annual and static values are mapped consistently across the corresponding twelve monthly increments of each year. This ensures that every monthly snapshot in the 240-month time series contains a complete suite of environmental covariates, enabling dynamic analyses such as fire risk modeling or temporal trend assessment. This approach maintains temporal sensitivity by allowing dynamic climate variables to fluctuate monthly while the slower-evolving landscape attributes such as Land Cover provide a stable structural context for each year.
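The temporal expansion can be sketched in base R for a single hypothetical cell. The land cover and elevation values below are invented for illustration; only the mapping logic (annual value repeated over twelve months, static value repeated over all 240 months) reflects the description above.

```r
# Build the 240-month index for 2001-2020 (month varies fastest).
months <- expand.grid(month = 1:12, year = 2001:2020)
months <- months[, c("year", "month")]

# Hypothetical annual attribute: one land cover class per year for one cell
# (class 5 for 2001-2010, class 4 for 2011-2020).
landcover_by_year <- setNames(rep(c(5, 4), each = 10), 2001:2020)

# Hypothetical static attribute: a single elevation value in meters.
elevation <- 120

# Temporal expansion: each annual value is mapped to the twelve months of
# its year, and the static value is mapped to every month.
months$landcover <- unname(landcover_by_year[as.character(months$year)])
months$elevation <- elevation

print(head(months, 13))
```

Each row then carries a complete set of covariates for one month, which is the structure needed to pair slowly changing layers with the monthly climate layers.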

Although the collected data packages for different attributes have heterogeneous specifications, their processing generally follows a common workflow. A methodological baseline of the processing steps is shown in Figure 3. Specifically, Accessed Data refers to the downloaded data package from data sources in various formats, such as HDF, HDF5, NETCDF, GEOTIFF, and TIF. Accessed data in source data formats, such as HDF, HDF5, and NETCDF, contained several layers with different attributes, and the layer related to the subject attribute was extracted from this set of layers. Accessed data with the source data formats of GEOTIFF or TIF contained only the required layer that was extracted. These extracted layers are referred to as the raw data.

Figure 3. Overview of methodology for data processing.

To resolve the inherent inconsistencies in spatial orientation and resolution, we developed a standardized processing framework. First, all raw data layers are projected onto EPSG:102033-South America Albers Equal Area Conic, to acquire the Projected Layer. This equal-area coordinate reference system was specifically chosen to maintain geometric fidelity across the vast longitudinal expanse of the Amazon basin,¹⁵ minimizing the distortion of area and shape that typically occurs in global projections. To correct for grid misalignments and boundary distortions caused by the differing source orientations of the datasets collected, we introduced a standardized spatial grid as a master template. By mapping all attributes onto this fixed spatial grid, we ensured that every pixel across all variables represents the exact same geographical footprint, thereby eliminating spatial offsets and ensuring seamless interoperability between datasets.

The projected layers are at either a global or regional-level (based on the specifications of the data source), and to confine them all to the Amazon Basin boundary, these layers were further clipped using a shapefile-based (vector) Amazon Basin boundary. This clipped layer is labelled as pre-processed data.
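The projection and clipping steps can be sketched with the terra package on small in-memory objects. This is an illustration under stated assumptions, not the authors' script: the raw layer and the rectangular polygon standing in for the Amazon Basin boundary are hypothetical, and the target projection is written as "ESRI:102033" because PROJ-based tools usually register code 102033 under the ESRI authority.

```r
library(terra)

# Hypothetical raw layer: a coarse lon/lat grid over the study extent.
raw <- rast(nrows = 54, ncols = 70,
            xmin = -79.43629, xmax = -44.49108,
            ymin = -18.00816, ymax = 8.66346,
            crs = "EPSG:4326")
values(raw) <- seq_len(ncell(raw))

# Hypothetical stand-in for the basin boundary (a simple rectangle);
# in practice this would be read from the boundary shapefile with vect().
basin <- vect("POLYGON ((-70 -12, -50 -12, -50 2, -70 2, -70 -12))",
              crs = "EPSG:4326")

# Project both objects to South America Albers Equal Area Conic
# (assumption: the code is available as ESRI:102033 in the local PROJ database).
target_crs <- "ESRI:102033"
projected  <- project(raw, target_crs, method = "near")
basin_proj <- project(basin, target_crs)

# Clip to the boundary: crop to its bounding box, then mask cells outside it.
preproc <- mask(crop(projected, basin_proj), basin_proj)
print(preproc)
```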

Although all layers are cropped to the Amazon basin boundary, their respective cells may not exactly align with each other owing to differences in their source data extent, cell-grid orientation, and spatial resolution. To obtain layers of the same spatial extent, resolution, and orientation, we executed a rigorous two-step disaggregation and resampling procedure to transfer cell values from the pre-processed data to the fixed spatial grid (master template). The spatial grid covered the entire Amazon Basin boundary and had a cell resolution of 500 m. The value of each cell was transferred to this grid for each attribute, and the process was repeated for all attributes, thereby creating a separate spatial grid for each attribute.

First, layers with a resolution coarser than 500 m (ranging up to 37 km) were disaggregated to approximately 500 m. The disaggregation factor varied by attribute, depending on the spatial resolution of the source data. Next, the terra::resample function in R was used to transfer information from the attribute layers to the fixed spatial grid. To minimize spatial inaccuracy in this environmentally heterogeneous region, the resampling method was tailored to the variable: the ‘near’ (nearest neighbor) method was used for Land Cover, Burnt Area, Soil Moisture, Specific Humidity, Evapotranspiration, Near Surface Wind Speed, and Near Surface Air Temperature, so that cells keep their original source values and categorical classes remain intact. Conversely, ‘bilinear’ interpolation was used for Land Surface Temperature and Precipitation to represent the continuous spatial gradients of these variables.
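The two-step procedure can be sketched with terra on a toy example. The grid sizes, offsets, and cell values below are hypothetical (an 8 × 8 template at 500 m and a misaligned 2,000 m source layer), and deriving the disaggregation factor as the rounded ratio of source to target resolution is an assumption made for illustration.

```r
library(terra)

# Hypothetical 500 m master template (tiny 8 x 8 grid in projected meters).
template <- rast(nrows = 8, ncols = 8, xmin = 0, xmax = 4000,
                 ymin = 0, ymax = 4000, crs = "ESRI:102033")

# Hypothetical coarse source layer at 2,000 m, offset from the template grid.
coarse <- rast(nrows = 3, ncols = 3, xmin = -700, xmax = 5300,
               ymin = -700, ymax = 5300, crs = "ESRI:102033")
values(coarse) <- c(1, 1, 2, 1, 3, 2, 3, 3, 2)

# Step 1: disaggregate to roughly the target resolution
# (assumed factor: source resolution / target resolution, here 2000 / 500).
fact <- round(res(coarse)[1] / res(template)[1])
fine <- disagg(coarse, fact = fact)

# Step 2: transfer cell values onto the fixed template grid.
categorical <- resample(fine, template, method = "near")      # keeps source values
continuous  <- resample(fine, template, method = "bilinear")  # smooth gradients

print(categorical)
print(continuous)
```

With the nearest neighbor method every output cell holds a value that already exists in the source layer, whereas bilinear interpolation can produce intermediate values near cell boundaries.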

The resulting spatial grids corresponding to each attribute constitute the final layers available for analysis, hence called working data. This workflow was followed for each monthly file (i.e., 240 files over 20 years) to achieve a consistent monthly temporal resolution from 2001 to 2020. By maintaining this monthly sensitivity for dynamic climate variables while holding slower-evolving landscape variables such as Land Cover constant within each annual cycle, the dataset remains sensitive to the immediate environmental drivers for fire while providing a stable structural context. While an analytical sensitivity assessment regarding the impact of integrated variable frequencies is a valuable research direction, such an analysis is beyond the scope of this data curation work.

Figure 4 illustrates an example of Land Surface Temperature in January 2020 for all three categories of raw data, pre-processed data and working data. Similarly, Figure 5 presents an example of a single monthly instance from January 2020 for all the variables collected.

Figure 4. Land surface temperature for January 2020.

Top: Raw data (Global), Bottom: Pre-processed data (cropped) and working data (re-sampled spatial grid).

Figure 5. Plots of the variables related to forest fires for the region of Amazon Rainforest in January 2020.

In terms of implementation, pre-processing work was completed using GIS software, and the processing work was executed in the statistical computing software R.¹⁶,¹⁷ All data layers were managed using the SpatRaster data structure in the terra package,¹⁸ ensuring a transparent, scripted workflow. This algorithmic approach serves as a systematic process log, minimizing accumulated errors across the 240 temporal layers and ensuring the reproducibility of the dataset.

Technical validation

The raster-based dataset of covariates presented in this study is a collection of established datasets that do not include any newly created data records. This work mainly focuses on exhaustive data search and its acquisition process, followed by computer-intensive pre-processing to develop a dataset for the Amazon region. As noted in the Data Collection section, these industry-standard source datasets have undergone rigorous independent validation, as documented in their respective technical documentation; therefore, a secondary validation against field observations is beyond the scope of this curation effort. To ensure process transparency, the dataset follows standardized naming conventions.

While the primary contribution of this work lies in workflow standardization and the resolution of dataset fragmentation, this technical framework serves as the essential infrastructure required for future algorithmic advancements. By delivering a harmonized, analysis-ready foundation, this study enables high-level environmental research and predictive modeling,¹¹ that were previously hindered by spatial and temporal data incompatibility. This standardized curation ensures that the resulting working data is fit-for-purpose for complex spatiotemporal analyses and provides a reproducible baseline for the wider research community.

License

The raster-based dataset of covariates presented in this study was published under a Creative Commons Attribution 4.0, International (CC BY 4.0) License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution, and reproduction in any medium or format, as long as appropriate credit is given to the authors and the source, a link to the license is provided, and it is indicated if changes were made.

Data availability

The dataset with all the collected variables related to forest fires is available at the Zenodo repository titled ‘Raster-based dataset for spatio-temporal analysis of forest fires in the Amazon rainforest from 2001 to 2020’ (https://doi.org/10.5281/zenodo.7215402).³⁵ For each of the ten variables, the dataset comprises three folders, one per data category: 01. Raw Data, 02. Pre-Processed Data, and 03. Working Data. An additional Read Me document includes details regarding the coordinate system, data extent, and data sources. All files are in GeoTIFF format, which can be read with the statistical software R¹⁶ or with GIS software such as Quantum GIS - QGIS (open source) (https://www.qgis.org/en/site/), GRASS GIS (open source) (https://grass.osgeo.org/), or ArcGIS (proprietary) (https://www.arcgis.com/index.html).

In the case of Land Cover, which is annual data, the filename includes the variable short name (Landcover), the respective data category (raw for raw data, preproc for pre-processed data, or working for working data), and the year (2001–2020):

[Landcover]_[data_category]_[year].tif

In the case of Elevation, which is a single static layer, the filename includes the variable short name and the respective data category:

[Elevation]_[data_category].tif

For all other variables, the filename includes the variable short name, the respective data category, and the year and month:

[variable_short_name]_[data_category]_[year]_[month].tif
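The three naming patterns above can be generated and parsed programmatically. The following is a minimal sketch in base R, not part of the published workflow. The short name `VarName`, the helper `parse_filename`, and the two-digit zero-padded month are assumptions made for illustration; the actual short names and month format should be checked against the Read Me document and the files themselves. The parser also assumes that neither the short name nor the data category contains an underscore.

```r
# Build the 240 expected monthly file names for one variable (20 years x 12 months).
# "VarName" is a placeholder short name; "%02d" assumes zero-padded months.
grid <- expand.grid(month = 1:12, year = 2001:2020)
monthly_files <- sprintf("%s_%s_%d_%02d.tif",
                         "VarName", "working", grid$year, grid$month)

# Hypothetical helper: split a file name into its components.
# Year and month are NA when the pattern does not include them
# (Elevation has neither; Landcover has a year but no month).
parse_filename <- function(filename) {
  stem  <- sub("\\.tif$", "", basename(filename))
  parts <- strsplit(stem, "_", fixed = TRUE)[[1]]
  data.frame(
    variable = parts[1],
    category = parts[2],
    year     = if (length(parts) >= 3) as.integer(parts[3]) else NA_integer_,
    month    = if (length(parts) >= 4) as.integer(parts[4]) else NA_integer_,
    stringsAsFactors = FALSE
  )
}

length(monthly_files)                         # number of monthly layers
parse_filename("Landcover_working_2015.tif")  # annual pattern
parse_filename("Elevation_raw.tif")           # static pattern
```

Parsing the month with `as.integer` means the helper works whether or not the month is zero-padded in the actual file names.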

To load and visualize the data in R, .tif files of any of the three categories can be loaded as a raster by using the raster¹⁹ or terra packages. The plot function of terra can be used to visualize the raster as follows:

r <- terra::rast('<filepath/filename.tif>')
plot(r)
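Several monthly files can also be read at once into a single multi-layer SpatRaster. The sketch below illustrates one way to do this with terra. It does not use the published data: it first writes three small synthetic GeoTIFFs to a temporary folder, and the short name `Demo`, the folder name, and the zero-padded month are placeholders chosen for the example.

```r
library(terra)

# Write three small synthetic monthly rasters (random values, NOT real data)
# so that the example runs without downloading the dataset.
out_dir <- file.path(tempdir(), "demo_working")
dir.create(out_dir, showWarnings = FALSE)
set.seed(1)
for (m in 1:3) {
  r <- rast(nrows = 10, ncols = 10, xmin = -80, xmax = -44, ymin = -18, ymax = 9)
  values(r) <- runif(ncell(r))
  writeRaster(r, file.path(out_dir, sprintf("Demo_working_2001_%02d.tif", m)),
              overwrite = TRUE)
}

# List the files in chronological order and load them as one multi-layer raster.
files <- sort(list.files(out_dir, pattern = "^Demo_working_2001_[0-9]{2}\\.tif$",
                         full.names = TRUE))
s <- rast(files)
names(s) <- sub("\\.tif$", "", basename(files))  # label each layer by its file name

nlyr(s)                                          # number of layers loaded
monthly_mean <- global(s, "mean", na.rm = TRUE)  # one mean value per layer
plot(s)                                          # one panel per month
```

With the real dataset, `out_dir` would point to a variable's 03. Working Data folder and the pattern would use that variable's short name.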

Similarly, to visualize the data in Quantum GIS (QGIS), a .tif file can be loaded by selecting the Raster option in the Data Source Manager:

[Data Source Manager > Raster > (filepath)]

References

1. Pimont F, et al.: Prediction of regional wildfire activity in the probabilistic bayesian framework of firelihood. Ecol. Appl. 2021; 31: e02316. PubMed Abstract | Publisher Full Text
2. Serra L, Juan P, Varga D, et al.: Spatial pattern modelling of wildfires in catalonia, spain 2004–2008. Environ. Model. Softw. 2013; 40: 235–244. Publisher Full Text
3. Dos Reis M, de Alencastro Graça PML, Yanai AM, et al.: Forest fires and deforestation in the central amazon: Effects of landscape and climate on spatial and temporal dynamics. J. Environ. Manag. 2021; 288: 112310. PubMed Abstract | Publisher Full Text
4. Opitz T, Bonneu F, Gabriel E: Point-process based bayesian modeling of space–time structures of forest fire occurrences in mediterranean france. Spatial Stat. 2020; 40: 100429. Publisher Full Text
5. Watson G: Amazon Rainforest. Weigl Publishers; 2019.
6. Bonilla-Aldana D, et al.: Brazil burning! what is the potential impact of the amazon wildfires on vector-borne and zoonotic emerging diseases? – a statement from an international experts meeting. Travel Med. Infect. Dis. 2019; 31: 101474. PubMed Abstract | Publisher Full Text
7. Juan P, Mateu J, Saez M: Pinpointing spatio-temporal interactions in wildfire patterns. Stoch. Env. Res. Risk A. 2012; 26: 1131–1150. Publisher Full Text
8. Aragó P, Juan P, Díaz-Avalos C, et al.: Spatial point process modeling applied to the assessment of risk factors associated with forest wildfires incidence in castellón, spain. Eur. J. For. Res. 2016; 135: 451–464. Publisher Full Text
9. Papakosta P, Straub D: Probabilistic prediction of daily fire occurrence in the mediterranean with readily available spatio-temporal data. iForest-Biogeosciences For. 2016; 10: 32.
10. Cano-Crespo A, Traxl D, Thonicke K: Spatio-temporal patterns of extreme fires in amazonian forests. The Eur. Phys. J. Special Top. 2021; 230: 3033–3044. Publisher Full Text
11. Abid M, Gonzalez JA, de Rivera OR, et al.: Mapping the spatio-temporal distribution of burned areas in the amazon from 2001 to 2020: An ensemble modeling approach. Environ. Ecol. Stat. 2025; 32: 707–734. Publisher Full Text
12. Moraga P, Baker L: rspatialdata: a collection of data sources and tutorials on downloading and visualising spatial data using r. F1000Res. 2022; 11: 770. PubMed Abstract | Publisher Full Text | Free Full Text
13. GDAL/OGR contributors: GDAL/OGR Geospatial Data Abstraction software Library. Open Source Geospatial Foundation; 2020.
14. Danielson JJ, Gesch DB: Global multi-resolution terrain elevation data 2010 (GMTED2010). DC, USA: US Department of the Interior, US Geological Survey Washington; 2011.
15. ESRI. Albers - arcmap.
16. R Core Team: R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2020.
17. Moraga P: Spatial Statistics for Data Science: Theory and Practice with R. Data Science series. Boca Raton, Florida: Chapman & Hall/CRC; 2023.
18. Hijmans RJ: terra: Spatial Data Analysis. R package version 1.5-21. 2022.
19. Hijmans RJ: raster: Geographic Data Analysis and Modeling. R package version 3.5-21. 2022.
20. Amazon Basin Polygon: ESRI ArcGIS. Reference Source
21. dos Reis M, de Alencastro Graça PML, Yanai AM, et al.: Forest fires and deforestation in the central amazon: Effects of landscape and climate on spatial and temporal dynamics. J. Environ. Manag. 2021; 288: 112310. PubMed Abstract | Publisher Full Text
22. Kim SJ, et al.: Multi-temporal analysis of forest fire probability using socio-economic and environmental variables. Remote Sens. 2019; 11: 86. Publisher Full Text
23. Trilles S, Juan P, Diaz L, et al.: Integration of environmental models in spatial data infrastructures: A use case in wildfire risk prediction. IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens. 2013; 6: 128–138. Publisher Full Text
24. Tariq A, et al.: Forest fire monitoring using spatial-statistical and geo-spatial analysis of factors determining forest fire in margalla hills, islamabad, pakistan. Geomat. Nat. Haz. Risk. 2021; 12: 1212–1233. Publisher Full Text
25. Miller JD, Safford H, Crimmins M, et al.: Quantitative evidence for increasing forest fire severity in the sierra nevada and southern cascade mountains, california and nevada, usa. Ecosystems. 2009; 12: 16–32. Publisher Full Text
26. Serra L, et al.: Spatio-temporal log-gaussian cox processes for modelling wildfire occurrence: the case of catalonia, 1994–2008. Environ. Ecol. Stat. 2014; 21: 531–563. Publisher Full Text
27. Møller J, Díaz-Avalos C: Structured spatio-temporal shot-noise cox point process models, with a view to modelling forest fires. Scand. J. Stat. 2010; 37: 2–25. Publisher Full Text
28. Giglio L, Justice C, Boschetti L, et al.: MODIS/Terra+Aqua burned area monthly L3 global 500m SIN grid V061. 2021.
29. Friedl M, Sulla-Menashe D: MCD12C1 MODIS/Terra+Aqua land cover type yearly L3 global 0.05deg CMG V006. 2015.
30. Huffman GJ, Stocker EF, Bolvin DT, et al.: GPM IMERG Final Precipitation L3 1 month 0.1 degree x 0.1 degree V06. Greenbelt, MD: Goddard Earth Sciences Data; 2019.
31. Fan Y, Van Den Dool H: Climate prediction center global monthly soil moisture data set at 0.5 resolution for 1948 to present. J. Geophys. Res.-Atmos. 2004; 109. Publisher Full Text
32. Amatulli G, et al.: A suite of global, cross-scale topographic variables for environmental and biodiversity modeling. Sci Data. 2018; 5: 1–15. Publisher Full Text
33. Wan Z, Hook S, Hulley G: MOD11C3 MODIS/Terra land surface Temperature/Emissivity monthly L3 global 0.05deg CMG V006. 2015.
34. Mcnally A, Hsl N: FLDAS Noah Land Surface Model L4 Global Monthly 0.1 x 0.1 degree (MERRA-2 and CHIRPS). Greenbelt, MD, USA: Goddard Earth Sciences Data; 2018.
35. Mahmood M, Moraga P: Raster-based dataset for spatio-temporal analysis of forest fires in the Amazon rainforest from 2001 to 2020 (Version 1.0). [Dataset]. Zenodo. 2022. Publisher Full Text

Version 2

VERSION 2 PUBLISHED 12 Sep 2025

Competing interests

No competing interests were disclosed.

Grant information

The author(s) declared that no grants were involved in supporting this work.

Article Versions (2)

version 2

Revised

Published: 29 Jan 2026, 14:916

https://doi.org/10.12688/f1000research.164537.2

version 1

Published: 12 Sep 2025, 14:916

https://doi.org/10.12688/f1000research.164537.1

© 2026 Mahmood M and Moraga P. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

how to cite this article

Mahmood M and Moraga P. A raster-based dataset for spatio-temporal analysis of forest fires in the Amazon rainforest from 2001 to 2020 [version 2; peer review: 1 approved, 2 approved with reservations]. F1000Research 2026, 14:916 (https://doi.org/10.12688/f1000research.164537.2)

NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Key to Reviewer Statuses

Approved: The paper is scientifically sound in its current form and only minor, if any, improvements are suggested.

Approved with reservations: A number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.

Not approved: Fundamental flaws in the paper seriously undermine the findings and conclusions.

Version 2

PUBLISHED 29 Jan 2026

Revised

Reviewer Report 02 Jun 2026

Suresh Babu KV Babu, University of Cyprus, Nicosia, Cyprus

Approved

https://doi.org/10.5256/f1000research.195424.r482336

This work presents a large-scale, harmonized, analysis-ready dataset designed for forest fire research in the Amazon basin. It integrates 20 years of monthly environmental and fire-related variables at a spatial resolution of 500 meters across the entire Amazon region. The dataset fills a major gap in basin-wide wildfire research by giving researchers a standardized and reproducible way to do multivariate spatiotemporal analysis.
By converting heterogeneous raw data into a consistent format, including projection, extent, and resolution, the study significantly facilitates researchers’ investigations into wildfire dynamics in the Amazon. Additionally, it offers a valuable methodological framework for creating similar datasets in other regions or for various environmental applications.

Is the rationale for creating the dataset(s) clearly described?

Yes

Are the protocols appropriate and is the work technically sound?

Yes

Are sufficient details of methods and materials provided to allow replication by others?

Yes

Are the datasets clearly presented in a useable and accessible format?

Yes

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Wildfire risk modeling, fire danger prediction, Burned area, Fire forecasting, Fire detection, Burned area mapping etc.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Version 1

PUBLISHED 12 Sep 2025

Reviewer Report 08 Nov 2025

Mahlatse Kganyago, University of Johannesburg, Johannesburg, South Africa

Approved with Reservations

https://doi.org/10.5256/f1000research.181063.r422841

A raster-based dataset for spatio-temporal analysis of forest fires in the Amazon rainforest from 2001 to 2020
The authors present a curated dataset of the Amazon basin, including various landscape and climate variables necessary to model long-term trends of forest fires. The paper is generally well written, and the methods can be replicated in other regions.
Abstract
“and monthly resolution” – I suggest that authors add temporal after monthly.
Introduction
“The South American Amazon is one of the largest…” – should start as new paragraph since it diverts from the idea communicated in this paragraph.
“similar studies do not exist for the Amazon region” – Please confirm through a comprehensive literature search. I doubt this is entirely accurate.
“this dataset encourages the creation of similar datasets” – please rephrase to improve clarity.
Methods
“[-79.43629, -18.00816: -44.49108, 8.66346]” – Should not be in brackets.
“MCD64A1”, “MCD12C1”, “MOD11C3” – please provide the name of the product in addition to its ID or acronym.
“…acquired is (Integrated Multi-satellite Retrievals for GPM (Global Precipitation Measurement (GPM)…” – Should not be in brackets.
[89.75N–89.75S, 0.25E–359.75E] - Should not be in brackets.

“…as – -9999.” – not clear, please consider removing the dash.

“Noah 3.6.1, model…” – the comma here creates fragmentation.

“…and are termed Working Data” – The term has been used above, but only defined here. Please consider defining a term when they are first mentioned.

“file (240 files over 20 years)” – please insert “i.e.,” in brackets.

Data availability
There is space on the data link which breaks the hyperlink. Please correct.
https://doi.org/10.5281/ zenodo.7215402

Is the rationale for creating the dataset(s) clearly described?

Yes

Are the protocols appropriate and is the work technically sound?

Yes

Are sufficient details of methods and materials provided to allow replication by others?

Yes

Are the datasets clearly presented in a useable and accessible format?

Yes

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Remote sensing of vegetation

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Author Response 29 Jan 2026

Paula Moraga, King Abdullah University of Science and Technology Computer Electrical and Mathematical Science and Engineering Division, Thuwal, Saudi Arabia

Thank you for your helpful and insightful comments. Please find our responses below.

“and monthly resolution” – I suggest that authors add temporal after monthly.
Response: We have added the word “temporal” to clarify the nature of the resolution.

“The South American Amazon is one of the largest…” – should start as new paragraph since it diverts from the idea communicated in this paragraph.
Response: A new paragraph has been created at this location to ensure a more logical flow of ideas.

“similar studies do not exist for the Amazon region” – Please confirm through a comprehensive literature search. I doubt this is entirely accurate.
Response: We have refined the text to clarify that while sub-regional studies exist, there is a scarcity of integrated, basin-wide, multivariate studies for the entire Amazon region.

“this dataset encourages the creation of similar datasets” – please rephrase to improve clarity.
Response: We have rephrased this to emphasize that the dataset provides a framework/standard for the development of future longitudinal fire databases in other regions.

“[-79.43629, -18.00816: -44.49108, 8.66346]” – Should not be in brackets.
Response: The brackets have been removed from the geographic coordinate strings.

“MCD64A1”, “MCD12C1”, “MOD11C3” – please provide the name of the product in addition to its ID or acronym.
Response: Full product names have been added alongside their respective IDs.

“…acquired is (Integrated Multi-satellite Retrievals for GPM (Global Precipitation Measurement (GPM)…” – Should not be in brackets.
Response: The unnecessary nested parentheses have been removed for better readability.

[89.75N–89.75S, 0.25E–359.75E] - Should not be in brackets.
Response: Brackets have been removed from the spatial extent definition.

“…as – -9999.” – not clear, please consider removing the dash.
Response: The dash has been removed to clarify that -9999 is the specific value used for missing data.

“Noah 3.6.1, model…” – the comma here creates fragmentation.
Response: The comma has been removed to improve the sentence flow.

“…and are termed Working Data” – The term has been used above, but only defined here. Please consider defining a term when they are first mentioned.
Response: The first use of Working Data has been corrected to ensure conceptual clarity. This and similar terms are introduced in the Introduction section, and defined and explained in the Data Processing section.

“file (240 files over 20 years)” – please insert “i.e.,” in brackets.
Response: The text now has an added i.e., as recommended.

There is space on the data link which breaks the hyperlink. Please correct.
https://doi.org/10.5281/ zenodo.7215402
Response: The space in the DOI URL has been removed.
Competing Interests: No competing interests were disclosed.


Reviewer Report 27 Oct 2025

I NENGAH SURATI JAYA, IPB University, Bogor, Indonesia

Approved with Reservations

https://doi.org/10.5256/f1000research.181063.r418949

Overall, this manuscript makes a meaningful contribution to advancing understanding of forest fire dynamics in the tropical Amazon through an integrative, data-driven approach. This manuscript presents a comprehensive methodological framework for integrating heterogeneous spatial and temporal datasets to develop a high-resolution spatio-temporal database for forest fire analysis in the Amazon region. The approach is technically robust and highly relevant to large-scale environmental monitoring, forest management, and climate research.
The novelty of the study lies not in producing new observations but in the harmonization of heterogeneous remote sensing and model-based datasets through rigorous preprocessing and geospatial standardization. This contributes a ready-to-use dataset that bridges gaps between data availability, accessibility, and analytical readiness—serving as a foundation for advanced environmental and fire research.
While the study demonstrates methodological rigor and conceptual clarity, several areas require further elaboration to enhance the manuscript’s scientific depth, reproducibility, and practical impact. In particular, the data preprocessing procedures, validation strategy, and uncertainty assessment need to be described in greater detail to strengthen the transparency and reliability of the proposed workflow.

Major Comments

Data Integration and Resolution Consistency
The integration of datasets with diverse spatial (250 m–37 km) and temporal (annual–monthly) resolutions may lead to interoperability inconsistencies. The authors should explicitly describe the re-sampling and disaggregation procedures, including interpolation methods and error assessments, to minimize spatial inaccuracy, particularly in environmentally heterogeneous regions.
Uncertainty of Model-Derived Variables
The inclusion of model-based parameters (e.g., soil moisture, evapotranspiration) increases temporal completeness but introduces uncertainty. An uncertainty analysis and comparison with field or independent observational datasets are strongly recommended to enhance the reliability of the results.
Reprojection and Coordinate Transformation
The reprojection of datasets to a standard coordinate reference system (e.g., EPSG:102033) must be carefully documented. The authors should describe the methods used to maintain geometric fidelity and correct for potential grid misalignments or boundary distortions.
Temporal Harmonization and Analytical Sensitivity
Given the integration of variables with differing temporal frequencies (e.g., annual land cover versus monthly climate data), a clear temporal harmonization framework is needed. This should ensure that the resulting datasets retain temporal sensitivity for dynamic analyses such as fire risk modeling or temporal trend assessment.
Workflow Transparency and Quality Control
With more than 240 temporal layers per attribute, reproducibility is a primary concern. The authors are encouraged to implement and document standardized quality control procedures—including metadata compliance (e.g., ISO 19115), workflow logs, and version tracking—to ensure process transparency and minimize accumulated processing errors.
Validation and Methodological Innovation
Although the study effectively tackles dataset fragmentation and incompatibility using open-source solutions, its primary innovation lies in workflow standardization rather than algorithmic advancement. Incorporating validation experiments or developing a predictive model would strengthen the methodological contribution and scientific novelty of the work.

Is the rationale for creating the dataset(s) clearly described?

Yes

Are the protocols appropriate and is the work technically sound?

Yes

Are sufficient details of methods and materials provided to allow replication by others?

Partly

Are the datasets clearly presented in a useable and accessible format?

Yes

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Applied remote sensing and quantitative approach (machine learning/deep learning) in forestry and environment, e.g., forest fire, landslide, spatial modelling.

Author Response 29 Jan 2026

Paula Moraga, King Abdullah University of Science and Technology Computer Electrical and Mathematical Science and Engineering Division, Thuwal, Saudi Arabia

Thank you for your thorough review and insightful comments. We address each comment separately below.

1. Data Integration and Resolution Consistency
The integration of datasets with diverse spatial (250 m–37 km) and temporal (annual–monthly) resolutions may lead to interoperability inconsistencies. The authors should explicitly describe the re-sampling and disaggregation procedures, including interpolation methods and error assessments, to minimize spatial inaccuracy, particularly in environmentally heterogeneous regions.

Response: We have detailed our two-step alignment procedure in the Data Processing section (paragraphs 6 and 7). This includes an initial disaggregation of coarse datasets followed by a strategic resampling using the terra package in R. To ensure spatial accuracy across heterogeneous landscapes, we implemented "nearest neighbor" resampling for categorical/discrete variables to preserve data integrity, and "bilinear" interpolation for continuous atmospheric variables to accurately represent spatial gradients.
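
As a rough illustration of why the two methods are assigned this way, the following is a minimal pure-Python sketch of nearest-neighbor and bilinear resampling onto a finer grid. It is not the authors' terra code: the function `resample`, the helper `_clamp`, the sample arrays, and the refinement factor are all hypothetical, and the method labels "near" and "bilinear" are used here only as plain strings.

```python
from math import floor


def _clamp(i, n):
    """Keep index i inside 0..n-1, so edge cells are repeated at the border."""
    return max(0, min(n - 1, i))


def resample(src, factor, method):
    """Resample a 2D grid (list of rows) onto a grid `factor` times finer.

    Fine-cell centres are used as sample positions.
    method: "near" (nearest neighbour) or "bilinear".
    """
    rows, cols = len(src), len(src[0])
    out = []
    for r in range(rows * factor):
        # Centre of the fine cell, expressed in coarse-cell index units.
        y = (r + 0.5) / factor - 0.5
        line = []
        for c in range(cols * factor):
            x = (c + 0.5) / factor - 0.5
            if method == "near":
                # Copy the value of the closest coarse cell unchanged.
                line.append(src[_clamp(floor(y + 0.5), rows)][_clamp(floor(x + 0.5), cols)])
            elif method == "bilinear":
                # Weighted average of the four surrounding coarse cells.
                y0, x0 = floor(y), floor(x)
                fy, fx = y - y0, x - x0
                y0c, y1c = _clamp(y0, rows), _clamp(y0 + 1, rows)
                x0c, x1c = _clamp(x0, cols), _clamp(x0 + 1, cols)
                top = src[y0c][x0c] * (1 - fx) + src[y0c][x1c] * fx
                bottom = src[y1c][x0c] * (1 - fx) + src[y1c][x1c] * fx
                line.append(top * (1 - fy) + bottom * fy)
            else:
                raise ValueError("method must be 'near' or 'bilinear'")
        out.append(line)
    return out


# Hypothetical inputs: categorical class codes and a continuous field.
land_cover = [[1, 4], [4, 9]]
temperature = [[10.0, 20.0], [30.0, 40.0]]

lc_fine = resample(land_cover, 2, "near")      # only the codes 1, 4, 9 appear
t_fine = resample(temperature, 2, "bilinear")  # smooth gradient between cells
```

In this sketch, nearest-neighbor output contains only the original class codes, whereas applying "bilinear" to `land_cover` would produce fractional values such as 1.75 that correspond to no class. That is the reason interpolation is reserved for continuous variables.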

2. Uncertainty of Model-Derived Variables
The inclusion of model-based parameters (e.g., soil moisture, evapotranspiration) increases temporal completeness but introduces uncertainty. An uncertainty analysis and comparison with field or independent observational datasets are strongly recommended to enhance the reliability of the results.

Response: We agree that model-derived products involve inherent uncertainties. We have added a clarifying statement at the end of the Data Collection section explicitly defining the intended use of these parameters. While independent field validation is a valuable endeavor, it remains beyond the scope of this data curation manuscript. Our primary objective is to provide a standardized, analysis-ready infrastructure for the research community. The utility and reliability of this curated framework have already been demonstrated in a published application (Abid et al., 2025, Environ Ecol Stat, https://doi.org/10.1007/s10651-025-00661-x).

3. Reprojection and Coordinate Transformation
The reprojection of datasets to a standard coordinate reference system (e.g., EPSG:102033) must be carefully documented. The authors should describe the methods used to maintain geometric fidelity and correct for potential grid misalignments or boundary distortions.

Response: We have expanded the description of our coordinate transformation process in Data Processing (paragraph 4). The use of EPSG:102033 (South America Albers Equal Area Conic) is now explicitly justified as a measure to maintain geometric fidelity and minimize area/shape distortions across the basin’s longitudinal extent. To resolve grid misalignments, we describe the implementation of a standardized fixed spatial grid as a master template, ensuring identical geographical footprints for every pixel across all variables.
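
The idea of a fixed master grid can be sketched in a few lines of Python. This is an illustration only, not the authors' implementation: the template origin and size, the layer dictionaries, and the functions `cell_index` and `same_grid` are hypothetical, and coordinates are assumed to be already projected, in metres.

```python
def cell_index(x, y, x_min, y_max, res):
    """Return (row, col) of the template cell containing the point (x, y).

    Rows count downward from the top edge y_max, columns rightward from x_min.
    """
    col = int((x - x_min) // res)
    row = int((y_max - y) // res)
    return row, col


def same_grid(a, b):
    """True when two layer descriptions share origin, resolution and size."""
    keys = ("x_min", "y_max", "res", "nrows", "ncols")
    return all(a[k] == b[k] for k in keys)


# Hypothetical template with 500 m cells; origin and dimensions are made up.
template = {"x_min": 0.0, "y_max": 0.0, "res": 500.0, "nrows": 4, "ncols": 6}
fire_layer = dict(template)                   # built on the template
shifted_layer = dict(template, x_min=250.0)   # offset by half a cell

idx = cell_index(1250.0, -750.0, template["x_min"], template["y_max"], template["res"])
aligned = same_grid(template, fire_layer)
misaligned = same_grid(template, shifted_layer)
```

A layer that passes a check like `same_grid` can be compared with any other layer pixel by pixel, because a given (row, col) pair refers to the same ground footprint in both.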

4. Temporal Harmonization and Analytical Sensitivity
Given the integration of variables with differing temporal frequencies (e.g., annual land cover versus monthly climate data), a clear temporal harmonization framework is needed. This should ensure that the resulting datasets retain temporal sensitivity for dynamic analyses such as fire risk modeling or temporal trend assessment.

Response: We have updated the Data Processing section (paragraph 2) to define our "temporal expansion" framework, in which annual and static variables are mapped consistently across the corresponding monthly increments to create a synchronized 240-month time series. While we recognize that assessing the impact of data frequency on model sensitivity is an important research direction, such analytical performance testing is outside the scope of this work, which focuses on workflow standardization and the resolution of dataset fragmentation.
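
A minimal sketch of such a temporal expansion, assuming one annual layer per year for 2001 to 2020, is given below. The function `expand_annual_to_monthly`, the `YYYY-MM` keys, and the placeholder layer names are hypothetical and stand in for the actual raster files.

```python
def expand_annual_to_monthly(annual, start_year, end_year):
    """Repeat each year's layer for the 12 months of that year.

    annual: dict mapping year -> layer (any object, here a placeholder name).
    Returns a dict mapping "YYYY-MM" -> layer.
    """
    monthly = {}
    for year in range(start_year, end_year + 1):
        for month in range(1, 13):
            monthly[f"{year}-{month:02d}"] = annual[year]
    return monthly


# Placeholder annual layers; real data would be rasters, not strings.
annual_land_cover = {year: f"land_cover_{year}" for year in range(2001, 2021)}
monthly = expand_annual_to_monthly(annual_land_cover, 2001, 2020)
```

Twenty annual layers yield 20 × 12 = 240 monthly entries, so every monthly climate layer has a land-cover counterpart with the same time stamp. The annual value is constant within each year, so no within-year variation is introduced.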

5. Workflow Transparency and Quality Control
With more than 240 temporal layers per attribute, reproducibility is a primary concern. The authors are encouraged to implement and document standardized quality control procedures—including metadata compliance (e.g., ISO 19115), workflow logs, and version tracking—to ensure process transparency and minimize accumulated processing errors.

Response: We have updated the Data Processing (paragraphs 8 and 10) and Technical Validation sections to highlight our quality control procedures. By utilizing a scripted algorithmic pipeline (terra package), we ensure that all 240 monthly layers are processed with identical parameters, effectively serving as an automated workflow log to eliminate manual intervention errors. While formal ISO-compliant certification is beyond the scope of this curation effort, our systematic documentation provides a transparent, reproducible framework already validated by its use in published dynamic research (Abid et al., 2025).

6. Validation and Methodological Innovation
Although the study effectively tackles dataset fragmentation and incompatibility using open-source solutions, its primary innovation lies in workflow standardization rather than algorithmic advancement. Incorporating validation experiments or developing a predictive model would strengthen the methodological contribution and scientific novelty of the work.

Response: We have added a clarifying statement in the Technical Validation section regarding the scientific novelty of this work. We believe that resolving dataset fragmentation through a standardized, analysis-ready framework is a critical scientific contribution that removes the primary bottleneck for modeling in the Amazon. While developing a new predictive model is beyond the current scope, the utility of this dataset has been proven in recent research (Abid et al., 2025), where the curated variables demonstrated the sensitivity required for complex ensemble modeling of fire dynamics.

Competing Interests: No competing interests were disclosed.

Version 2

VERSION 2 PUBLISHED 12 Sep 2025

Open Peer Review

Reviewer Status

Reviewer Reports

	Invited Reviewers
	1	2	3
Version 2 (revision) 29 Jan 26			read
Version 1 12 Sep 25	read	read

I NENGAH SURATI JAYA, IPB University, Bogor, Indonesia
Mahlatse Kganyago, University of Johannesburg, Johannesburg, South Africa
Suresh Babu KV Babu, University of Cyprus, Nicosia, Cyprus

Reviewer Report

02 Jun 2026 | for Version 2

Suresh Babu KV Babu, University of Cyprus, Nicosia, Cyprus

Approved

Is the rationale for creating the dataset(s) clearly described?

Yes
Are the protocols appropriate and is the work technically sound?

Yes
Are sufficient details of methods and materials provided to allow replication by others?

Yes
Are the datasets clearly presented in a useable and accessible format?

Yes

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Wildfire risk modeling, fire danger prediction, burned area, fire forecasting, fire detection, burned area mapping, etc.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Reviewer Report

08 Nov 2025 | for Version 1

Mahlatse Kganyago, University of Johannesburg, Johannesburg, South Africa

Approved With Reservations

Is the rationale for creating the dataset(s) clearly described?

Yes
Are the protocols appropriate and is the work technically sound?

Yes
Are sufficient details of methods and materials provided to allow replication by others?

Yes
Are the datasets clearly presented in a useable and accessible format?

Yes

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Remote sensing of vegetation

Responses (1)

Author Response

29 Jan 2026

Paula Moraga, King Abdullah University of Science and Technology Computer Electrical and Mathematical Science and Engineering Division, Thuwal, Saudi Arabia

Thank you for your helpful and insightful comments. Please find our responses below.

“and monthly resolution” – I suggest that the authors add “temporal” after “monthly”.
Response: We have added the word “temporal” to clarify the nature of the resolution.

“The South American Amazon is one of the largest…” – should start as a new paragraph, since it diverges from the idea communicated in this paragraph.
Response: A new paragraph has been created at this location to ensure a more logical flow of ideas.

“similar studies do not exist for the Amazon region” – Please confirm through a comprehensive literature search. I doubt this is entirely accurate.
Response: We have refined the text to clarify that while sub-regional studies exist, there is a scarcity of integrated, basin-wide, multivariate studies for the entire Amazon region.

“this dataset encourages the creation of similar datasets” – please rephrase to improve clarity.
Response: We have rephrased this to emphasize that the dataset provides a framework/standard for the development of future longitudinal fire databases in other regions.

“[-79.43629, -18.00816: -44.49108, 8.66346]” – Should not be in brackets.
Response: The brackets have been removed from the geographic coordinate strings.

“MCD64A1”, “MCD12C1”, “MOD11C3” – please provide the name of each product in addition to its ID or acronym.
Response: Full product names have been added alongside their respective IDs.

“…acquired is (Integrated Multi-satellite Retrievals for GPM (Global Precipitation Measurement (GPM)…” – Should not be in brackets.
Response: The unnecessary nested parentheses have been removed for better readability.

[89.75N–89.75S, 0.25E–359.75E] - Should not be in brackets.
Response: Brackets have been removed from the spatial extent definition.

“…as – -9999.” – not clear, please consider removing the dash.
Response: The dash has been removed to clarify that -9999 is the specific value used for missing data.

“Noah 3.6.1, model…” – the comma here creates fragmentation.
Response: The comma has been removed to improve the sentence flow.

“…and are termed Working Data” – The term has been used above, but is only defined here. Please consider defining terms when they are first mentioned.
Response: The first use of Working Data has been corrected to ensure conceptual clarity. This and similar terms are introduced in the Introduction section, and defined and explained in the Data Processing section.

“file (240 files over 20 years)” – please insert “i.e.,” in brackets.
Response: The text now has an added i.e., as recommended.

There is a space in the data link, which breaks the hyperlink. Please correct.
https://doi.org/10.5281/ zenodo.7215402
Response: The space in the DOI URL has been removed.

Competing Interests

No competing interests were disclosed.

Reviewer Report

27 Oct 2025 | for Version 1

I NENGAH SURATI JAYA, IPB University, Bogor, Indonesia

Approved With Reservations

Data Integration and Resolution Consistency
The integration of datasets with diverse spatial (250 m–37 km) and temporal (annual–monthly) resolutions may lead to interoperability inconsistencies. The authors should explicitly describe the re-sampling and disaggregation procedures, including interpolation methods and error assessments, to minimize spatial inaccuracy, particularly in environmentally heterogeneous regions.
Uncertainty of Model-Derived Variables
The inclusion of model-based parameters (e.g., soil moisture, evapotranspiration) increases temporal completeness but introduces uncertainty. An uncertainty analysis and comparison with field or independent observational datasets are strongly recommended to enhance the reliability of the results.
Reprojection and Coordinate Transformation
The reprojection of datasets to a standard coordinate reference system (e.g., EPSG:102033) must be carefully documented. The authors should describe the methods used to maintain geometric fidelity and correct for potential grid misalignments or boundary distortions.
Temporal Harmonization and Analytical Sensitivity
Given the integration of variables with differing temporal frequencies (e.g., annual land cover versus monthly climate data), a clear temporal harmonization framework is needed. This should ensure that the resulting datasets retain temporal sensitivity for dynamic analyses such as fire risk modeling or temporal trend assessment.
Workflow Transparency and Quality Control
With more than 240 temporal layers per attribute, reproducibility is a primary concern. The authors are encouraged to implement and document standardized quality control procedures—including metadata compliance (e.g., ISO 19115), workflow logs, and version tracking—to ensure process transparency and minimize accumulated processing errors.
Validation and Methodological Innovation
Although the study effectively tackles dataset fragmentation and incompatibility using open-source solutions, its primary innovation lies in workflow standardization rather than algorithmic advancement. Incorporating validation experiments or developing a predictive model would strengthen the methodological contribution and scientific novelty of the work.

Is the rationale for creating the dataset(s) clearly described?

Yes

Are the protocols appropriate and is the work technically sound?

Yes

Are sufficient details of methods and materials provided to allow replication by others?

Partly

Are the datasets clearly presented in a useable and accessible format?

Yes

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Applied remote sensing and quantitative approach (machine learning/deep learning) in forestry and environment, e.g., forest fire, landslide, spatial modelling.


Responses (1)

Author Response

29 Jan 2026

Paula Moraga, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia

Thank you for your thorough review and insightful comments. We address each comment separately below.

1. Data Integration and Resolution Consistency
The integration of datasets with diverse spatial (250 m–37 km) and temporal (annual–monthly) resolutions may lead to interoperability inconsistencies. The authors should explicitly describe the re-sampling and disaggregation procedures, including interpolation methods and error assessments, to minimize spatial inaccuracy, particularly in environmentally heterogeneous regions.

Response: We have detailed our two-step alignment procedure in the Data Processing section (paragraphs 6 and 7). Coarse datasets are first disaggregated and then resampled using the terra package in R. To ensure spatial accuracy across heterogeneous landscapes, we used "nearest neighbor" resampling for categorical/discrete variables to preserve data integrity, and "bilinear" interpolation for continuous atmospheric variables to accurately represent spatial gradients.
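The difference between the two resampling methods named in this response can be illustrated with a minimal pure-Python sketch. This is not the authors' terra code: the function `resample`, the 2 x 2 input grids, and their values are hypothetical, and only the method names "near" and "bilinear" are borrowed from terra's vocabulary.

```python
import math


def resample(grid, out_rows, out_cols, method="bilinear"):
    """Resample a 2-D list-of-lists grid to a new shape (illustrative only).

    Output cell centres are mapped to fractional row/column positions
    on the input grid, then filled by the chosen method.
    """
    in_rows, in_cols = len(grid), len(grid[0])
    out = []
    for i in range(out_rows):
        # Position of the output cell centre in input-cell units.
        r = (i + 0.5) * in_rows / out_rows - 0.5
        row = []
        for j in range(out_cols):
            c = (j + 0.5) * in_cols / out_cols - 0.5
            if method == "near":
                # Copy the closest input cell, so no new values can appear.
                ri = min(max(math.floor(r + 0.5), 0), in_rows - 1)
                ci = min(max(math.floor(c + 0.5), 0), in_cols - 1)
                row.append(grid[ri][ci])
            else:
                # Clamp to the grid, then blend the four surrounding cells.
                rc = min(max(r, 0.0), in_rows - 1.0)
                cc = min(max(c, 0.0), in_cols - 1.0)
                r0, c0 = int(rc), int(cc)
                r1 = min(r0 + 1, in_rows - 1)
                c1 = min(c0 + 1, in_cols - 1)
                fr, fc = rc - r0, cc - c0
                top = grid[r0][c0] * (1 - fc) + grid[r0][c1] * fc
                bottom = grid[r1][c0] * (1 - fc) + grid[r1][c1] * fc
                row.append(top * (1 - fr) + bottom * fr)
        out.append(row)
    return out


# Hypothetical inputs: class codes (categorical) and temperatures (continuous).
land_cover = [[1, 2], [3, 4]]
temperature = [[10.0, 20.0], [30.0, 40.0]]

classes = resample(land_cover, 4, 4, method="near")
smooth = resample(temperature, 4, 4, method="bilinear")
```

In this sketch the nearest-neighbor output contains only the original class codes, whereas the bilinear output contains intermediate values, which is why averaging is unsuitable for categorical layers such as land cover.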

2. Uncertainty of Model-Derived Variables
The inclusion of model-based parameters (e.g., soil moisture, evapotranspiration) increases temporal completeness but introduces uncertainty. An uncertainty analysis and comparison with field or independent observational datasets are strongly recommended to enhance the reliability of the results.

Response: We agree that model-derived products involve inherent uncertainties. We have added a clarifying statement at the end of the Data Collection section explicitly defining the intended use of these parameters. While independent field validation is a valuable endeavor, it remains beyond the scope of this data curation manuscript. Our primary objective is to provide a standardized, analysis-ready infrastructure for the research community. The utility and reliability of this curated framework have already been demonstrated in a published application (Abid et al., 2025, Environ Ecol Stat, https://doi.org/10.1007/s10651-025-00661-x).

3. Reprojection and Coordinate Transformation
The reprojection of datasets to a standard coordinate reference system (e.g., EPSG:102033) must be carefully documented. The authors should describe the methods used to maintain geometric fidelity and correct for potential grid misalignments or boundary distortions.

Response: We have expanded the description of our coordinate transformation process in Data Processing (paragraph 4). The use of EPSG:102033 (South America Albers Equal Area Conic) is now explicitly justified as a measure to maintain geometric fidelity and minimize area/shape distortions across the basin’s longitudinal extent. To resolve grid misalignments, we describe the implementation of a standardized fixed spatial grid as a master template, ensuring identical geographical footprints for every pixel across all variables.
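The idea of a fixed master grid can be sketched as follows. This is a hedged illustration under stated assumptions, not the authors' implementation: the helper names `cell_index` and `is_aligned`, the template origin at (0, 0), and the example coordinates are all hypothetical; only the 500 m cell size comes from the manuscript.

```python
import math


def cell_index(x, y, xmin, ymax, res=500.0):
    """Row and column of the template cell containing projected point (x, y).

    (xmin, ymax) is the upper-left corner of the template grid.
    """
    col = math.floor((x - xmin) / res)
    row = math.floor((ymax - y) / res)
    return row, col


def is_aligned(raster_xmin, raster_ymax, raster_res, xmin, ymax,
               res=500.0, tol=1e-6):
    """True when a raster shares the template's cell size and cell boundaries."""
    if abs(raster_res - res) > tol:
        return False
    # Offsets from the template corner must be whole numbers of cells.
    dx = (raster_xmin - xmin) / res
    dy = (ymax - raster_ymax) / res
    return abs(dx - round(dx)) < tol and abs(dy - round(dy)) < tol


# Hypothetical template corner at (0, 0) in projected metres.
idx = cell_index(750.0, -1250.0, xmin=0.0, ymax=0.0)
ok = is_aligned(1000.0, -500.0, 500.0, xmin=0.0, ymax=0.0)
shifted = is_aligned(250.0, 0.0, 500.0, xmin=0.0, ymax=0.0)
```

A layer for which `is_aligned` returns `False` would need resampling onto the template before its pixels could be compared cell by cell with the other variables.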

4. Temporal Harmonization and Analytical Sensitivity
Given the integration of variables with differing temporal frequencies (e.g., annual land cover versus monthly climate data), a clear temporal harmonization framework is needed. This should ensure that the resulting datasets retain temporal sensitivity for dynamic analyses such as fire risk modeling or temporal trend assessment.

Response: We have updated the Data Processing section (paragraph 2) to define our "temporal expansion" framework, in which annual and static variables are mapped consistently across the corresponding monthly increments to create a synchronized 240-month time series. While we recognize that assessing the impact of data frequency on model sensitivity is an important research direction, such analytical performance testing is outside the scope of this work, which focuses on workflow standardization and the resolution of dataset fragmentation.
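A minimal sketch of the "temporal expansion" described here, assuming that each annual layer is simply reused for the twelve months of its year. The function name `expand_annual_to_monthly` and the placeholder layer names are hypothetical; in practice each value would be a raster layer rather than a string.

```python
def expand_annual_to_monthly(annual_layers, start_year=2001, end_year=2020):
    """Map each (year, month) pair to the annual layer of that year."""
    monthly = {}
    for year in range(start_year, end_year + 1):
        layer = annual_layers[year]
        for month in range(1, 13):
            # Every month within a year reuses the same annual layer.
            monthly[(year, month)] = layer
    return monthly


# Hypothetical placeholders standing in for annual land cover rasters.
annual = {year: f"land_cover_{year}" for year in range(2001, 2021)}
monthly = expand_annual_to_monthly(annual)

# A static variable (e.g. elevation) can be handled by repeating one layer.
static = expand_annual_to_monthly({year: "elevation" for year in range(2001, 2021)})
```

With 20 years and 12 months per year, both dictionaries hold 240 entries, matching the 240-month series mentioned in the response.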

5. Workflow Transparency and Quality Control
With more than 240 temporal layers per attribute, reproducibility is a primary concern. The authors are encouraged to implement and document standardized quality control procedures—including metadata compliance (e.g., ISO 19115), workflow logs, and version tracking—to ensure process transparency and minimize accumulated processing errors.

Response: We have updated the Data Processing (paragraphs 8 and 10) and Technical Validation sections to highlight our quality control procedures. By using a scripted pipeline built on the terra package, we ensure that all 240 monthly layers are processed with identical parameters; the scripts effectively serve as an automated workflow log and eliminate manual intervention errors. While formal ISO-compliant certification is beyond the scope of this curation effort, our systematic documentation provides a transparent, reproducible framework already validated by its use in published dynamic research (Abid et al., 2025).

6. Validation and Methodological Innovation
Although the study effectively tackles dataset fragmentation and incompatibility using open-source solutions, its primary innovation lies in workflow standardization rather than algorithmic advancement. Incorporating validation experiments or developing a predictive model would strengthen the methodological contribution and scientific novelty of the work.

Response: We have added a clarifying statement in the Technical Validation section regarding the scientific novelty of this work. We believe that resolving dataset fragmentation through a standardized, analysis-ready framework is a critical scientific contribution that removes the primary bottleneck for modeling in the Amazon. While developing a new predictive model is beyond the current scope, the utility of this dataset has been proven in recent research (Abid et al., 2025), where the curated variables demonstrated the sensitivity required for complex ensemble modeling of fire dynamics.


Competing Interests

No competing interests were disclosed.

Alongside their report, reviewers assign a status to the article:

Approved - the paper is scientifically sound in its current form and only minor, if any, improvements are suggested

Approved with reservations - a number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.

Not approved - fundamental flaws in the paper seriously undermine the findings and conclusions

1. Pimont F, et al.: Prediction of regional wildfire activity in the probabilistic Bayesian framework of Firelihood. Ecol. Appl. 2021; 31: e02316.

2. Serra L, Juan P, Varga D, et al.: Spatial pattern modelling of wildfires in Catalonia, Spain 2004–2008. Environ. Model. Softw. 2013; 40: 235–244.

3. Dos Reis M, de Alencastro Graça PML, Yanai AM, et al.: Forest fires and deforestation in the central Amazon: Effects of landscape and climate on spatial and temporal dynamics. J. Environ. Manag. 2021; 288: 112310.

4. Opitz T, Bonneu F, Gabriel E: Point-process based Bayesian modeling of space–time structures of forest fire occurrences in Mediterranean France. Spatial Stat. 2020; 40: 100429.

5. Watson G: Amazon Rainforest. Weigl Publishers; 2019.

6. Bonilla-Aldana D, et al.: Brazil burning! what is the potential impact of the Amazon wildfires on vector-borne and zoonotic emerging diseases? – a statement from an international experts meeting. Travel Med. Infect. Dis. 2019; 31: 101474.

7. Juan P, Mateu J, Saez M: Pinpointing spatio-temporal interactions in wildfire patterns. Stoch. Env. Res. Risk A. 2012; 26: 1131–1150.

8. Aragó P, Juan P, Díaz-Avalos C, et al.: Spatial point process modeling applied to the assessment of risk factors associated with forest wildfires incidence in Castellón, Spain. Eur. J. For. Res. 2016; 135: 451–464.

9. Papakosta P, Straub D: Probabilistic prediction of daily fire occurrence in the Mediterranean with readily available spatio-temporal data. iForest-Biogeosciences For. 2016; 10: 32.

10. Cano-Crespo A, Traxl D, Thonicke K: Spatio-temporal patterns of extreme fires in Amazonian forests. The Eur. Phys. J. Special Top. 2021; 230: 3033–3044.

11. Abid M, Gonzalez JA, de Rivera OR, et al.: Mapping the spatio-temporal distribution of burned areas in the Amazon from 2001 to 2020: An ensemble modeling approach. Environ. Ecol. Stat. 2025; 32: 707–734.

12. Moraga P, Baker L: rspatialdata: a collection of data sources and tutorials on downloading and visualising spatial data using R. F1000Res. 2022; 11: 770.

13. GDAL/OGR contributors: GDAL/OGR Geospatial Data Abstraction software Library. Open Source Geospatial Foundation; 2020.

14. Danielson JJ, Gesch DB: Global multi-resolution terrain elevation data 2010 (GMTED2010). DC, USA: US Department of the Interior, US Geological Survey Washington; 2011.

15. ESRI. Albers - arcmap.

16. R Core Team: R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2020.

17. Moraga P: Spatial Statistics for Data Science: Theory and Practice with R. Data Science series. Boca Raton, Florida: Chapman & Hall/CRC; 2023.

18. Hijmans RJ: terra: Spatial Data Analysis. R package version 1.5-21. 2022.

19. Hijmans RJ: raster: Geographic Data Analysis and Modeling. R package version 3.5-21. 2022.

20. Amazon Basin Polygon: ESRI ArcGIS.

21. dos Reis M, de Alencastro Graça PML, Yanai AM, et al.: Forest fires and deforestation in the central Amazon: Effects of landscape and climate on spatial and temporal dynamics. J. Environ. Manag. 2021; 288: 112310.

22. Kim SJ, et al.: Multi-temporal analysis of forest fire probability using socio-economic and environmental variables. Remote Sens. 2019; 11: 86.

23. Trilles S, Juan P, Diaz L, et al.: Integration of environmental models in spatial data infrastructures: A use case in wildfire risk prediction. IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens. 2013; 6: 128–138.

24. Tariq A, et al.: Forest fire monitoring using spatial-statistical and geo-spatial analysis of factors determining forest fire in Margalla Hills, Islamabad, Pakistan. Geomat. Nat. Haz. Risk. 2021; 12: 1212–1233.

25. Miller JD, Safford H, Crimmins M, et al.: Quantitative evidence for increasing forest fire severity in the Sierra Nevada and southern Cascade Mountains, California and Nevada, USA. Ecosystems. 2009; 12: 16–32.

26. Serra L, et al.: Spatio-temporal log-Gaussian Cox processes for modelling wildfire occurrence: the case of Catalonia, 1994–2008. Environ. Ecol. Stat. 2014; 21: 531–563.

27. Møller J, Díaz-Avalos C: Structured spatio-temporal shot-noise Cox point process models, with a view to modelling forest fires. Scand. J. Stat. 2010; 37: 2–25.

28. Giglio L, Justice C, Boschetti L, et al.: MODIS/Terra+Aqua burned area monthly L3 global 500m SIN grid V061. 2021.

29. Friedl M, Sulla-Menashe D: MCD12C1 MODIS/Terra+Aqua land cover type yearly L3 global 0.05deg CMG V006. 2015.

30. Huffman GJ, Stocker EF, Bolvin DT, et al.: GPM IMERG Final Precipitation L3 1 month 0.1 degree x 0.1 degree V06. Greenbelt, MD: Goddard Earth Sciences Data; 2019.

31. Fan Y, Van Den Dool H: Climate prediction center global monthly soil moisture data set at 0.5 resolution for 1948 to present. J. Geophys. Res.-Atmos. 2004; 109.

32. Amatulli G, et al.: A suite of global, cross-scale topographic variables for environmental and biodiversity modeling. Sci Data. 2018; 5: 1–15.

33. Wan Z, Hook S, Hulley G: MOD11C3 MODIS/Terra land surface Temperature/Emissivity monthly L3 global 0.05deg CMG V006. 2015.

34. Mcnally A, Hsl N: FLDAS Noah Land Surface Model L4 Global Monthly 0.1 x 0.1 degree (MERRA-2 and CHIRPS). Greenbelt, MD, USA: Goddard Earth Sciences Data; 2018.

35. Mahmood M, Moraga P: Raster-based dataset for spatio-temporal analysis of forest fires in the Amazon rainforest from 2001 to 2020 (Version 1.0). [Dataset]. Zenodo. 2022.

