1 Introduction
Variation in soil biological, chemical, and physical properties occurs across the landscape and in response to both regional and local (i.e., field-scale) variations in the five soil forming factors: parent material, relief or topography, biota, climate, and time. Superimposed on this is the influence of changes in land use and current and historic management practices which can further modify soil properties. Quantifying and understanding the patterns and drivers for this variation is an important component of many agri-environmental studies. For example, to meet the desired level of precision for agronomic and environmental nutrient management plans the spatial variability in soil nutrients will influence the soil sampling design in terms of number and locations of soil samples (Kariuki et al., 2009; Starr et al., 1995).
Sediment source fingerprinting is a watershed-scale technique that is used to identify and quantify the relative proportions of sediment derived from unique sources. This technique uses natural occurring biogeochemical properties as fingerprints (i.e., tracers) to discriminate between potential sources of sediment and are linked to downstream sediment using mixing models. From a sediment fingerprinting perspective, investigating the spatial variability of soil properties at a watershed-scale can be advantageous to identify, classify, and distinguish between potential sources of sediment (Pulley et al., 2017). However, investigating spatial variability at smaller scales is less common (Collins et al., 2019; e.g., Du and Walling, 2017; Luna Miño et al., 2024; Pulley et al., 2018) and remains a research priority (Collins et al., 2020).
There are three main, interconnected, ways that spatial variability in fingerprint properties are an important aspect of sediment fingerprinting. First is to adequately quantify the fingerprint properties such that it is representative of that source. For some fingerprints the variability is not random but rather varies in a more systematic way. For example, the pattern of fallout radionuclides will reflect the patterns of soil erosion and deposition (Wilkinson et al., 2015). Designing and implementing source sampling plans need to take this into consideration as the sampling designed used has been shown to influence the characterization of wide range of commonly used fingerprints (Luna Miño et al., 2024).
Secondly, the issue of spatial variability of fingerprint properties is further complicated by overlying spatial variability in the rates of erosion and sediment delivery. Incorporation of both types of variability into the mixing model will provide a more reliable estimate of the proportion of sediment derived from each source. Many mixing models have well defined inputs (sources) and outputs (sediment) that are characterized by their mean and standard deviation and the spatial distribution or pattern of fingerprints are not considered. This is not ideal as the values of samples that are collected closer, and more hydrologically connected, to the stream network may in fact present a better representation of that source despite potentially deviating from the mean value. This issue can be addressed by strategic sampling where the more likely to erode areas are targeted for sampling. However, a considerable amount of information and insight is lost through that approach. There has been some progress using information on erosion rates to calculate a erosion rate-weighted mean (Du and Walling, 2017; Wilkinson et al., 2015) and using spatially interpolated maps of fingerprint values to provide a finer resolution of the fingerprint variability within each source (Haddadchi et al., 2019).
Lastly, understanding the geomorphic, hydrologic, and biochemical processes that have led to the observed patterns in spatial variability helps in the selection of robust and reliable fingerprints and/or guide the sampling design for source characterization. In selecting fingerprints that provide good discrimination between sources many studies typically used a statistical-based approach (Collins et al., 1997). However, there are concerns that this approach may result in the inclusion of false positives (i.e., type I error) or non-conservative fingerprints (Koiter et al., 2013a). Consequently, there has been a call for the inclusion of a process-based (e.g., weathering, erosion) or geologic/lithologic-based explanation of the fingerprints selected to address these concerns (Collins et al., 2020). Furthermore, there is also a lack of standardization in how sediment source areas are sampled (e.g., judgement, random, transect, grid, stratified) and it can be difficult to have an efficient sampling design without prior knowledge of why and how soil properties vary across the landscape (Luna Miño et al., 2024). Prior knowledge of the spatial variability of soil fingerprint properties would be beneficial; however, in practice this can be difficult, particularly with geochemical properties as routine lab analysis often return information on more than 50 elements. The spatial patterns of some soil properties are well studied because of their agronomic importance or ability to infer other important soil properties and processes and can include fallout radionuclides [e.g., 137Cs, 7Be; (Ritchie et al., 1970)], plant nutrients [e.g., N, P; (Vasu et al., 2017)], soil colour [e.g., hue, value; (Viscarra Rossel et al., 2006)], major non-acid forming cations [e.g., Ca, Na; (Sun et al., 2021a)]. In contrast, the processes that control the distribution of other soil properties, such as rare earth elements and trace metals, are less well studied or tend to be site-specific, making it difficult to draw generalizations.
Terrain attributes such as elevation, slope curvature, slope position, and soil wetness indices have been shown to be useful information in the understanding and modelling of a range of soil properties including soil moisture (Beaudette et al., 2013), texture (Kokulan et al., 2018), colour (Brown et al., 2004), organic matter (Zhang et al., 2012), conductivity (Umali et al., 2012), and geochemistry (Lima et al., 2023). Similar techniques may provide additional insight into the pedologic and geomorphic processes that drive the observed patterns of fingerprint properties within a given source. Since digital elevation models (DEMs) are more publicly available and can in some case be generated using drone imagery, while soil property data are often limited, terrain attributes derived from DEMs can be used to guide sampling design.
This study builds on the previous work of Luna Miño (Luna Miño et al., 2024) where the impact of three different sampling designs on the characterization of source materials, within the framework of the sediment fingerprinting approach, was assessed. This study expands that study by using the data from grid sampling approach to assess the spatial autocorrelation, create soil property (i.e., fingerprint) maps, and identify important terrain attributes driving the observed patterns. The objectives of this study were (1) to investigate the spatial variability of a range of soil colour and geochemical properties in an agricultural and forested site; and (2) to assess the relative importance and correlation of terrain attributes with the spatial distribution of these soil properties. Together, these objectives address how terrain attributes may be used to understand spatial distributions of soil properties and help guide sampling design.
2 Methods
2.1 Site description
Two sites of contrasting land uses located in the Wilson Creek Watershed (WCW), near McCreary, Manitoba, Canada were selected to investigate the spatial variability in fingerprint properties. The headwaters of the WCW are located on top of the Manitoba Escarpment within the boundary of Riding Mountain National Park. There is a ~300m drop in elevation crosses the escarpment where the streams become deeply incised. At the base of the escarpment is a large alluvial fan situated in the lacustrine deposits of glacial lake Aggasiz where the main stem has a meandering form. However, beyond the national park boundary the stream flows straight through an engineered drain until it reaches the Turtle river (Figure 1). Both sites are both hydrologicaly connected to the mainstem of the Wilson Creek
The first site was a mixedwood forest including white and black spruce (Picea glauca, Picea mariana), balsam fir (Abies balsamea), larch (Larix laricina) and young stands of deciduous trees including trembling aspen (Populus tremuloides). The forested site is located within the boundaries of the national park where there is little disturbance beyond recreational hiking trails. The soils within the park are not well mapped but likely are part of the Grey Wooded soil association (Luvisol) consisting of fine sandy loam to clay loam soils developed on boulder till of mostly shale with some limestone, and granitic rocks (Ehrlich et al., 1958). The second site is under agricultural production and includes rotations of grain crops and forage. The site is mapped to the Edwards Soil Series (Cumulic Regosol) consisting of silty clay loam to silty clay soil developed on recent alluvial deposits (Ehrlich et al., 1958).
The Köppen-Geiger climate classification of the WCW is cold, without dry season, and with warm summer (Dfb) (Beck et al., 2018). The average annual precipitation is ~539 mm, with approximately 27% falling as snow with a mean annual temperature is 3.0°C (Environment and Canada, 2024). The hydrology of the watershed is snowmelt dominated with ~ 80% of the cumulative runoff occurring during the spring season (May and June) (MacKay, 1970).
2.2 Soil sampling and analysis
This study uses samples and data collected as part of the grid sampling design outlined in Luna Miño (Luna Miño et al., 2024). Briefly, at each site 49 samples were collected using a soil auger on a 7x7 grid at a 100m spacing. Within the forested surface soil samples were collected below the LFH layer to a depth of 5cm, and the agricultural site was sampled to a depth of 15cm to account for the regular mixing of the soil due to tillage and other field operations.
Samples were dried, homogenized with a mortar and pestle, and sieved through a 63 𝜇m sieve to remove the sand fraction. The sand fraction was removed in an effort to reduce the differences in grain size and organic matter content between the two sites (Laceby et al., 2017). Samples were analyzed for a broad suite geochemical element using inductively coupled plasma mass spectrometry (ICP-MS) following a microwave-assisted digestion with aqua-regia (ALS Mineral Division, North Vancouver, BC, Canada). Spectral measurements were made with a spectroradiometer (ASD FieldSpecPro Malvern Panalytical Inc Westborough MA 01581, United States). Spectral reflectance measurements were taken in 1 nm increments over the 0.4-2.5 μm wavelength range. Both samples and Spectralon standard (white reference) were illuminated with a white light source using a halogen-based lamp (12 VDC, 20 Watt). Following the method outlined in Boudreault et al. (Boudreault et al., 2018), fifteen colour coefficients (R, G, B, x, y, Y, X, Z, L, a*, b*, u*, v*, c*, h*) were calculated for each sample (Koiter, 2021). Based on the results of Luna Miño (Luna Miño et al., 2024), a composite fingerprint consisting of 10 geochemical elements (Ca, Co, Cs, Fe, Li, La, Nb, Ni, Rb, and Sr) and five colour coefficients (a*, b*, c*, h*, and x) were identifying as providing a strong discrimination between the agricultural and forested surface soils. These fifteen soil properties are the focus of the detailed spatial analysis detailed in this study.
2.3 Geospatial and terrain analysis
All geostatistics were performed with ArcGIS Pro (v 3.3.0 Esri, 2024). Semivariograms were used to quantify spatial correlation for each of the 15 soil properties. The optimization tool, based on minimizing the mean square error, was used to parameterize the semivariogram model. Kriging was used to interpolate and generate maps of each soil property. The exploratory interpolation tool (Geostatistical Analyst extension) was used to select the kriging type with the highest ranked prediction accuracy.
A Digital Elevation Model (DEM) for the forested site was acquired from publicly available data (Canada, 2024). A DEM for the agricultural site was generated by photogrammetry using UAV imagery, including the use of ground control and check points, with Agisoft Metashape Professional (v1.8.2 Agisoft, 2021). Ordinary kriging was used to calculate a 1 m gridded digital elevation model for each site. Geographic information software (SAGA v2.1.4 Conrad et al., 2015) was used to calculate six additional terrain attributes and included plan and profile curvatures, saga wetness index, catchment area, relative slope position, and vertical channel network distance (Table S2).
2.4 Data analysis
All subsequent statistical analysis was undertaken using R statistical Software v4.4.0 (Team, 2024) through RStudio Integrated Development Environment v2024.04.2 (RStudio, 2024). Plots and maps were created using the R package ggplot2
v 3.5.2 (Wickham, 2016). Skewness was categorized as values between -0.5 and 0.5 considered approximately symmetric, -1.0 to -0.5 or 0.5 to 1 as moderately skewed, and < -1.0 or > 1.0 as highly skewed. Coefficient of variation (CV) thresholds were categorized as low (<15%), moderate (15–35%), high (35–75%), and very high (>75%). Interpolated soil property and terrain attribute data were resampled to a 10 m resolution prior to analysis (terra
v1.8.29 Hijmans, 2024). Random Forest Regression (randomForest
v4.7.1.2 Liaw and Wiener, 2002) was used to assess the relative importance of the terrain attributes on the spatial distribution of soil properties. The dataset was randomly split into training, validation, and testing datasets. Multicollinearity among the terrain attributed was assessed using the Variance Inflation Factor with a threshold of eight and correlated terrain attributes were removed (usdm
v2.1.7 Naimi et al., 2014). The number of variables randomly sampled as candidates at each split within the random forest model was tuned using the training and validation data sets (caret
v7.0.1 Kuhn and Max, 2008). The number of trees to grow was set to 500 and model performance was assessed using the Mean Square Error (MSE) and percent variance explained for both the training (Out of Bag Error) and the validation data sets. To test the model, actual and predicted values were plotted and the R2 and MSE were calculated using the testing data set. Because analyzing interpolated data can cause issues, the random forest model was used to predict the original 49 non-interpolated data points at each site as an additional check.
3 Results
3.1 Univariate summary
Overall, the agricultural site had soil colour and geochemical properties that exhibited lower variability and more symmetrical data distributions as compared to the forested site (Table 1). All 15 colour properties at both sites displayed approximately symmetrical distributions. At the agricultural site, all colour properties were characterized by low coefficients of variation (CV), while the forested site showed slightly greater variability, with 10 colour properties having low CVs and five having moderate CVs.
Similarly, geochemical data at the agricultural site showed lower variability and greater symmetry. Most elements were approximately symmetrical, with only nine exhibiting moderate skewness and five highly skewed (Table 1). Variability was also limited, with the majority of elements having low CVs; 12 had moderate CVs and five had high CVs. In contrast, the forested site showed greater skewness and variability: seven elements exhibited moderate skewness, 14 were highly skewed, 28 had moderate CVs, six had high CVs, and two had very high CVs.
The agricultural site has a relatively flat topography with an elevation change of approximately 3m, with the field draining toward a ditch in the northeast corner. The forested site has a relatively more complex topography, with a channel flowing from the southwest toward the northeast and an overall elevation difference of 18 m across the site. The mean plan and profile curvature measurements for both sites are near zero indicating an area of sediment transit and not accumulation or erosion (Table 2). The agricultural site had a higher SAGA Wetness Index but the forested site had a larger range in values and exhibited a higher degree of variability. The forested site exhibited a smaller mean Relative Slope Position value (streams and depressional areas) and a smaller Vertical Distance to Channel Network, and for both terrain attributes a greater variability as compared to the agricultural reflecting the presence of the stream crossing the forested site.
3.2 Spatial analysis
Soil colour and geochemical composition varied across both sites. In the agricultural field, all 15 soil color and geochemical properties exhibited spatial autocorrelation with most properties demonstrating a strong spatial dependency (Table 3). Some of the soil properties presented a pattern that roughly matches (e.g., Rb, Cs) or mirrors (e.g., Ca, Sr) the overall topography of the site with a gradation between the highest point in the south-west corner towards the lowest points in the north-east (Figure 2). Other properties appear to have more localized highs and low concentrations/values (e.g., c*, h). The geochemical concentrations of Ca and Rb had the largest range values and as result displayed a less patchy distribution across the site. The nugget (Co) was small for all soil properties (<1.5), and Sr had an exceptionally large sill value (900).
At the forested site, the geochemical concentrations of Fe and Rb, along with the color properties a*, b*, c*, and x showed no spatial autocorrelation and were excluded from further analysis and four and five properties exhibiting strong and moderate spatial dependency, respectively (Table 3). In comparison to the agricultural site, the soil properties at the forested site displayed a more moderate spatial dependency. The nugget (Co) was generally small for most soil properties (<2) with the exception of La and Ni. The range values were similar across the different soil properties and fell between 176 and 298 m. Overall, the influence of the channel and floodplain environment can be seen in the pattern of the nine soil properties (Figure 3).
Across both sites, there was a significant (p < 0.05) correlation between the selected soil properties and the terrain attributes, with the exception of the plan and profile curvature attributes (Table S3). The elevation attribute generally had higher correlation coefficients; however, the direction and strength of the correlation did vary between both site and soil property. Overall, the random forest regression models exhibited relatively strong predictive performance, with the models better performing at the agricultural site compared to the forested site (Table 4). With the exception of the Ni concentration and x colour values at the agricultural site, elevation was consistently the terrain attribute that provided the greatest predictive power (Figure 4). SAGA Wetness and relative slope position were generally the second and third most informative terrain attributes. Plan curvature was consistently ranked least important predictive terrain attribute.
4 Discussion
4.1 Variability of soil properties
Variability in soil geochemical properties have been studied at a range of scales including continental (Drew et al., 2010), regional (Rattenbury et al., 2018), watershed (Nanos and Rodríguez Martín, 2012), hillslope/catena, and farm field (Sun et al., 2021b). The objectives of these studies included addressing issues of pollution/contamination, providing benchmark/baseline information, investigating pedological and weathering properties and processes, and soil surveying and mapping (Wilson et al., 2008). Similarly, variability in soil colour, typically using the Munsell colour system, is a commonly reported diagnostic feature used in soil classification and ranges in spatial scale from reconnaissance to detailed soil surveys and maps. For sediment fingerprinting studies, these types of studies are often too site-specific or focus on a smaller subset of soil properties to effectively guide sample design to ensure the desired confidence is met characterizing sources of sediment.
Data distributions in soil science commonly exhibit a positively skewed distribution. This is likely due to several factors including that data of this nature are a semi-bounded distribution, with a lower bound of zero and no upper bound. Hot spots of soil processes, local variations in soil forming factors, and soil/land management practices can also lead to more extreme values (e.g., Vidon et al., 2010). In many cases the cumulative effects of these processes, factors, and practices are multiplicative (i.e., interact) and not linearly additive, resulting in a skewed data distribution. Lastly, the distribution of data will also be a product of the scale of observation, number of samples, and sampling design.
Soil colour properties exhibited a near-normal distribution with a low CV which is consistent with claims that soil hue and value (Munsell colour system) have a low CV (Pennock et al., 2008). These data distribution properties are ideal for statistical and environmental modeling as it typically meets the model assumptions with out requiring transformations. For example, in sediment source fingerprinting, soil properties (i.e., fingerprints) are considered more reliable and robust for use in unmixing models when they show large differences between sources and low variability within each source. Additionally, most mixing models assume fingerprint data are normally distributed. (Luna Miño et al., 2024) demonstrated that soil colour coefficients a*, b*, c*, h*, and x provided good discrimination between the agricultural and forested sites, and the low CV and skewness values reported in Table 1 makes these colour properties ideal fingerprints for sediment source apportionment.
The geochemical properties were more variable and skewed as compared to the soil colour properties. For many trace elements, concentrations are strongly correlated with the proportion of fine-grained material (<2 µm), due to its high specific surface area and enhanced chemical reactivity (Horowitz, 1991). However, in this study the sand-size (>63 µm) material was removed prior to analysis to reduce the effects of grain-size on concentration. This likely resulted in lower variability and less extreme concentrations as compared to other studies that focus on bulk soil samples (<2 mm). In particular, the forested site exhibited a greater amount of variability which is likely due to the more complex topography and geomorphic setting. The floodplain within the forested site is likely accumulating shale-rich material derived from the Manitoba Escarpment which is enriched in trace metals (Nicolas and Bamburak, 2011). This creates a zone of high concentrations relative to upland areas Figure 3. The forested site also had a higher and much more variable soil organic matter content (\(\bar{x}\) = 8.5 %, CV = 51.9 %) as compared to the agricultural site (\(\bar{x}\) = 11.6 %, CV = 16.1 %), which similarly to the grain size distribution, can influence the concentration of many major and trace elements (Horowitz, 1991). These results provide evidence that both land use and landscape complexity both play a role in driving soil property variability.
4.2 Spatial distribution
The difference in the number of soil properties and the magnitude of the spatial auto correlation between the two sites can be used in designing an effect sampling campaign. The agricultural site, which has a simpler topography and a higher degree of spatial autocorrelation, the range values can be used to guide the distance between sampling points and a grid-style sampling regime may be an effective approach. In contrast, the forested site, which has a more complex geomorphic setting and a lower degree of spatial autocorrelation, a stratified sampling design may be the better approach. For example, at the forested site the stratas could include near-stream and hillslope environments. In situations where the soil properties of interest are not known or selected a priori (e.g., sediment fingerprinting) the differences in their spatial autocorrelation are difficult to accommodate in the sampling design. A sampling grid with irregular spacing, including spacing less than 100m, would have provided information on the spatial autocorrelation over shorter distances and reduced the uncertainty in the interpolation of soil properties (Lark and Marchant, 2018).
Mapping the soil properties that have a moderate to high spatial dependence can provide information on underlying soil forming processes and properties. At both sites, to some extent, the patterns appear to reflect the topography of the sites suggesting that geomorphic and hydrologic processes and properties are likely driving the observed patterns. Identifying patterns and understanding the underlying process and properties that drive these patterns are important considerations when designing as soil sampling campaign to successfully meet study objectives, including characterizing soil properties of a field site. In a related context, Koiter (Koiter et al., 2013b) discussed the issues surrounding the use of a statistical only approach to selecting fingerprints and that consideration of how fingerprints have developed improves the robustness of the sediment fingerprinting approach. However, local information on the spatial distribution of geochemical and colour properties at field scales (< 1 km²) is often unavailable, and the processes driving these patterns are also not well documented or studied. When such information does exist, it typically focuses on agronomically important properties (e.g., Mzuku et al., 2005) or is used for soil classification (e.g., Group, 1998). These datasets usually include geochemical properties such as nitrogen (N), phosphorus (P), potassium (K), sulfur (S), calcium (Ca), magnesium (Mg), sodium (Na), iron (Fe), aluminum (Al), nitrate (NO₃⁻), carbonate (CO₃²⁻), bicarbonate (HCO₃⁻), chloride (Cl⁻), and sulfate (SO₄²⁻). They may also include colour characteristics, such as Munsell hue, value, and chroma, as well as other soil properties like texture, organic matter content, and pH. The lack of information on the wide range of soil properties means the researchers are relying on other data, most often elevation, for informing sampling designs.
4.3 Terrain attributes and soil properties
Both the correlation analysis and random forest regression identified elevation as the most influential terrain attribute, followed by the SAGA Wetness Index and relative slope position, in explaining most of the observed variation in soil geochemical and colour properties. This is consistent with the findings of Mashalaba (Mashalaba et al., 2020) who also found that similar terrain attributes were important in predicting a range of other soil properties including texture, bulk density, and hydaulic conductivity. These attributes likely emerged as the most important factor in explaining the observed variability as they are strongly linked to a range of geomorphic and hydrologic process and conditions [(Mello et al., 2022); (Libohova et al., 2024)]. For example, in eroded landscapes in the Prairie region of Canada, Ca concentrations have been found to be higher in upper slope positions from erosion and subsquent exposure of high-carbonate subsoil (Papiernik et al., 2005). In contrast, higher Ca concentrations have been noted in lower slope and depressional areas due to higher solubility of many Ca-minerals (e.g., CaCO3) and the subsequent downslope transport in solution and reduced leaching losses in these accumulation zones. Landscape position can also have a strong influence on pedogenic process; for example, the translocation of Fe and clay down the soil profile is a diagnostic criteria used in classifying soils (Stonehouse and St. Arnaud, 1971). Soil colour also tends to change in a predictable manner in relation to local relief. Tillage and water erosion results in the net loss of darker organic-rich topsoil from upper slope positions resulting in the exposure of the lighter subsoil (Papiernik et al., 2005). Moisture availability is also greater in the lower slope and depressional areas resulting in increased organic matter production resulting in darker organic-rich topsoil as compared to the upper slope positions. There is also evidence that suggests that soil texture varies with elevation and slope position, with coarser material on upper slopes and finer material accumulating in lower positions (Cox et al., 2003; Kokulan et al., 2018). Given the strong correlation of organic matter and texture with soil geochemistry (Horowitz, 1991) and colour (Viscarra Rossel et al., 2009), these properties may also help explain the observed spatial patterns .
The relative importance of terrain attributes in explaining soil property variability differs both among soil properties and between sites. The land use and the overall geomorphic complexity differences between the two study sites are likely interacting with terrain attributes and influencing the patterns of soil properties and modifying the nature of terrain attribute and soil property relationship. This suggests that these relationships observed in this study may not be broadly generalized. Similarly, information on how terrain attributes influence the spatial distribution of many trace elements and soil colour, beyond the Munsell system, at the field scale is limited in the scientific literature. Additional variables including climate and large-scale landscape features will also influence the observed patterns of soil properties. As a result, using terrain attributes to guide soil sampling or interpret spatial patterns of many soil properties remains challenging.
The impact of sampling design at the field scale on the characterization of soil properties can be substantial (Luna Miño et al., 2024), which in turn can affect the interpretation of data, modeling results, and the conclusion drawn. High-quality LiDAR data or digital elevation models (DEMs) are increasingly openly available in many regions and can be used to create detailed terrain attribute maps. By incorporating terrain attributes into the sampling framework, researchers can ensure that key geomorphic and hydrologic gradients are adequately represented. Ultimately, integrating terrain analysis into sediment source fingerprinting is promising not only as a mechanism to improve the quality of source characterization but to also better link source material to downstream sediment.
5 Conclusions
Understanding the spatial variability and distribution of soil geochemical and colour properties at a field-scale is important for agricultural and environmental research, monitoring, modeling, and management practices. This study conducted both univariate and spatial analyses of a suite of soil geochemical and colour properties at two sites with contrasting land uses. The agricultural site, characterized by gently sloping topography, exhibited lower coefficients of variation, approximately normal data distributions, and moderate to strong spatial autocorrelation across most measured properties. In contrast, the forested site featured more geomorphologically complex terrain, with greater variability in soil properties, data distributions that more frequently deviated from normality, and fewer properties exhibiting spatial autocorrelation. Despite these differences, random forest regression consistently identified elevation, the SAGA Wetness Index, and relative slope position as the three most important terrain attributes explaining the observed variability.
These findings underscore the role of topographic controls on many soil property distributions, regardless of land use. However, the strength and direction of the relationship between terrain attributes and soil property results were inconsistent between both site and soil property. While the study was limited to two sites, the approach demonstrates the value of integrating tools like random forest regression with spatial data to better understand soil-landscape relationships. Future research should expand to broader landscapes and incorporate additional biophysical variables to improve generalizability. Overall, this work highlights how terrain-driven spatial patterns can inform more targeted soil sampling, modeling, and land management strategies.
Acknowledgments
Special thanks and recognition for the field and technical support from A. Avila and the Riding Mountain National Park personnel.
Statements and declarations
Funding
This research was supported by the Natural Sciences and Engineering Research Council of Canada Discovery Grant - From source to sink: Investigating the linkages between sources of sediment and downstream water quality in Canadian watersheds - awarded to AJK (RGPIN-2019-05273).
Competing interests
The authors have no competing interests to declare that are relevant to the content of this article.
Data and code availability
Data and source code for analysis and manuscript available on GitHub: https://github.com/alex-koiter/sampling-design-manuscript
References
Supplemental figures
Supplemental tables
Colour space model | Parameter | Abbreviation |
---|---|---|
RGB | Red | R |
RGB | Green | G |
RGB | Blue | B |
CIE xyY | Chromatic Coordinate x | x |
CIE xyY | Chromatic Coordinate y | y |
CIE xyY | Brightness | Y |
CIE XYZ | Virtual component X | X |
CIE XYZ | Virtual component Z | Z |
CIE LAB | Metric lightness function | L |
CIE LAB | Chromatic coordinate opponent red–green scales | a* |
CIE LAB | Chromatic coordinate opponent red–green scales | b* |
CIE LUV | Chromatic coordinate opponent blue–yellow scales | u* |
CIE LUV | Chromatic oordinate opponent red–green scales | v* |
CIE LCH | CIE hue | c* |
CIE LCH | CIE chroma | h* |
Terrain Attribute | Description |
---|---|
Elevation | Meters above sea level |
Plan Curvature | Across slope curvature |
Profile Curvature | Down slope curvature |
SAGA Wetness Index | Similar to the ‘Topographic Wetness Index’ (TWI), but it is based on a modified catchment area calculation, which does not think of the flow as very thin film. As result it predicts for cells situated in valley floors with a small vertical distance to a channel a more realistic, higher potential soil moisture compared to the standard TWI calculation |
Catchment Area | Area of upslope contributing area |
Relative Slope Position | A value between 0 and 1 illustrating the position of a pixel within the landscape with values approaching 0 indicating streams to pits, and values approaching 1 indicating upper slope positions to peaks |
Vertical Distance to Channel | The vertical distance to a channel network base level. The algorithm consists of two major steps: 1. Interpolation of a channel network base level elevation |
Property | Elevation | SAGA Wetness Index | Rel. Slope Position | Vert. Dist. Channel | Catchment Area | Plan Curvature | Profile Curvature |
---|---|---|---|---|---|---|---|
Agriculture | |||||||
Ca | -0.76*** | 0.59*** | -0.26*** | -0.25*** | 0.1*** | NS | NS |
Co | -0.63*** | 0.61*** | -0.23*** | -0.24*** | 0.14*** | NS | NS |
Cs | 0.58*** | -0.45*** | 0.06*** | 0.14*** | -0.11*** | NS | NS |
Fe | -0.14*** | 0.32*** | -0.09*** | -0.09*** | 0.09*** | NS | NS |
Li | -0.4*** | 0.27*** | -0.42*** | -0.2*** | 0.13*** | NS | NS |
La | 0.53*** | -0.3*** | 0.07*** | 0.15*** | NS | NS | NS |
Nb | 0.48*** | -0.52*** | 0.25*** | 0.15*** | -0.08*** | NS | NS |
Ni | -0.75*** | 0.71*** | -0.32*** | -0.32*** | 0.2*** | NS | NS |
Rb | 0.81*** | -0.71*** | 0.24*** | 0.3*** | -0.15*** | NS | NS |
Sr | -0.81*** | 0.64*** | -0.35*** | -0.27*** | 0.12*** | NS | NS |
a* | 0.61*** | -0.44*** | 0.35*** | 0.23*** | -0.12*** | NS | NS |
b* | 0.39*** | -0.22*** | 0.22*** | 0.1*** | -0.09*** | NS | NS |
c* | 0.41*** | -0.24*** | 0.22*** | 0.11*** | -0.09*** | NS | NS |
h* | -0.22*** | 0.31*** | -0.15*** | -0.15*** | NS | NS | NS |
x | -0.15*** | 0.25*** | -0.23*** | -0.12*** | NS | NS | NS |
Forest | |||||||
Ca | -0.11*** | -0.47*** | 0.09*** | 0.4*** | 0.24*** | 0.1*** | NS |
Co | -0.04* | -0.34*** | 0.05*** | 0.32*** | 0.15*** | 0.05** | -0.03* |
Cs | 0.09*** | -0.4*** | 0.17*** | 0.34*** | 0.18*** | 0.06*** | -0.03* |
Li | 0.1*** | -0.15*** | 0.05*** | 0.18*** | 0.07*** | NS | NS |
La | NS | -0.23*** | NS | 0.24*** | 0.11*** | 0.05*** | NS |
Nb | 0.12*** | 0.46*** | -0.11*** | -0.34*** | -0.21*** | -0.09*** | 0.05*** |
Ni | 0.19*** | -0.27*** | 0.14*** | 0.2*** | 0.04** | NS | NS |
Sr | -0.34*** | -0.44*** | -0.12*** | 0.36*** | 0.26*** | 0.11*** | -0.05** |
h* | -0.08*** | -0.36*** | NS | 0.35*** | 0.22*** | 0.08*** | -0.03* |
x | -0.04** | -0.38*** | 0.09*** | 0.35*** | 0.22*** | 0.09*** | -0.03* |
*** p < 0.001; ** p < 0.01; * p < 0.05; NS = non-significant at p = 0.05 |
Citation
@online{luna_miño,
author = {Luna Miño, Maria and J Koiter, Alexander and E Lychuk, Taras
and Waddel, Arnie and Moulin, Alan},
title = {Investigating the {Spatial} {Variability} in {Soil}
{Geochemical} and {Colour} {Properties} {Across} {Two} {Contrasting}
{Land} {Uses} in {South-Central} {Manitoba}},
langid = {en},
abstract = {Quantification and accurate assessment of the spatial
variability and distribution of soil physical and biogeochemical
properties are vital components of agri-environmental research and
modeling, including sediment source fingerprinting. Understanding
the distribution of soil properties is crucial in the development of
appropriate, reliable, and efficient sampling campaigns. This study
was aimed at investigating the spatial variability in soil
geochemical and colour (i.e., spectral reflectance) soil properties
(\textless63um) across two contrasting land uses. The main
objectives of this study are to: 1) quantify the spatial variability
of geochemical and colour properties at a field-scale
(\textasciitilde{} 40 ha) across agricultural and forested sites; 2)
assess the spatial variability and distribution of soil properties
and its relation to seven terrain attributes (e.g., catchment area,
elevation). A combination of univariate analysis and geostatistical
methods were applied to characterize the soil geochemistry and
colour properties. This information was used to both quantify and
assess the variability in soil properties. The variability and
spatial autocorrelation were generally both site and soil property
specific. For a selection of soil properties exhibiting some spatial
autocorrelation, random forest regression was used to identify the
relative importance of terrain attributes on observed patterns of
soil geochemical and colour properties. Elevation was found to
explain the greatest amount of the variation in soil properties
followed by the SAGA wetness index and relative slope position.
These findings can be used to help create efficient soil sampling
designs by providing information that can inform sampling locations
and number of samples collected in order to meet research needs and
objectives.}
}