Effect of Sampling Design on Characterizing Surface Soil Fingerprinting Properties

Maria Alejandra Luna Miño; Alexander J Koiter; David A Lobb

Abstract

Purpose: The characterization of soil properties is an important part of many different types of agri-environmental research including inventory, comparison, and manipulation studies. Sediment source fingerprinting is a method that is increasingly being used to link sediment sources to downstream sediment. There is currently no standard approach to characterizing sources and the different approaches to sampling have not been well assessed. Methods: Grid (n=49), transect (n=14), and likely to erode (n=8) sampling designs were used to characterize the geochemical, colour, grain size distribution, and soil organic matter content at two sites under contrasting land uses (agricultural and forested). The impact of the three sampling designs on characterization of fingerprint properties, the relationship between particle size and organic matter content on fingerprint properties, fingerprint selection, source discrimination, and mixing apportionment results were evaluated using a range of methods including 21 virtual mixtures. Results: The likely to erode design resulted in a unique fingerprint signature compared to the other two sampling designs. The correlation between particle size and organic matter on fingerprint properties varied between fingerprint, source, and sampling design. While the number and composition of the fingerprints selected varied between sampling designs there was strong (100%) discrimination between sources regardless of the sampling approach. The maximum absolute difference between the virtual mixtures and the modeled proportions was 7.7, 7.8, and 8.9% for the grid, transect, and likely to erode sampling designs, respectively. Conclusions: The likely to erode sampling design was not representative of the upslope areas as characterized by the grid and transect methods. Despite these differences the final apportionment results using virtual mixtures were qualitatively similar between the three sampling designs. Continued work at the watershed scale is needed to fully evaluate the importance of source sampling design on the sediment source fingerprinting approach.

1 Introduction

1.1 Sediment pollution and erosion

Sediment pollution has been identified as a major cause of surface water impairment, and sediment is considered to be a common pollutant in many watersheds (Owens et al. 2005). Agriculture was found to be a key threat to water quality because of high sediment and nutrient loads (Vörösmarty et al. 2010). Excessive sediment loads can negatively affect the health of aquatic biota, contribute to eutrophication, and increase the cost of maintaining drainage ditches. Suitable management practices can be implemented across watersheds to reduce sediment load and erosion in different land uses (Noe et al. 2020). Moreover, fine sediment represents a substantial diffuse source pollutant in surface water because of its role in governing the transfer and fate of nutrients, heavy metals, and pesticides, and because of its impacts on aquatic ecology (Walling and Collins 2008). Sediment delivered to streams has been considered a main source of impairment of streams and watersheds (Bilotta and Brazier 2008). Therefore, semi-empirical information to identify the origins of sediment sources can be used to target management strategies (Mukundan et al. 2012). Sediment fingerprinting links sources to downstream sediment using natural fingerprints (e.g., physical or biogeochemical properties) and uses an unmixing model to estimate the contributions from each source. For an overview and examples of methodology and applications see Collins et al. (1997), Walling and Collins (2008), Davis and Fox (2009), Owens et al. (2016), Collins et al. (2020), and Evrard et al. (2022).

1.2 Potential source characterization

Characterizing potential sources of sediment may be a problematic and complicated task yet is an important step in the sediment fingerprinting technique. This step may be difficult to accomplish since the different approaches reported in the literature have not been well evaluated, and there is not a standard approach to characterize sources (Collins et al. 2020). There is a broad range in source and in-stream sediment sampling designs used (e.g., Boudreault et al. 2019). These approaches have included the sampling of areas likely to be eroded, but not necessarily actively eroding (Collins et al. 1997), a targeted sampling of actively eroding sites (Wallbrink 2004) and transect/grid sampling (Miller et al. 2005). The characterization of the potential sources typically has three main goals: 1) assess the impact of grain size and organic matter content on fingerprint properties, 2) identify fingerprinting properties that have the ability to discriminate between sources, and 3) serve as end members for the unmixing model. While each of these have been previously investigated, the focus has been more on implications of how the data is processed post sampling (e.g., Haddadchi et al. 2014; Smith and Blake 2014; Laceby et al. 2015; Laceby et al. 2017; Batista et al. 2022).

Along the sediment cascade from source to downstream sediment there is typically a degree of sorting resulting in a shift in both particle size distribution and organic matter content. Given the strong dependency of many fingerprints on these two properties, steps are typically taken to account for this shift. This is typically done through a combination of sieving to remove coarse-grained material and the application of additional corrections (e.g., normalization, regression) (Laceby et al. 2017). The latter step requires information on both the grain size and organic matter content in addition to fingerprint composition, where the nature and quality of the data are dependent on the sampling design used. Similarly, the sampling design used to characterize the mean and variability of fingerprint properties will influence which fingerprints provide the best discrimination between sources. Most selection methods preferentially select fingerprints that have low within-source variability and large differences between sources (Pulley et al. 2017). Together, these two steps provide the end members used in mixing models and the final apportionment results will be a reflection of the quality of these inputs.

The lack of standardization in sampling designs makes it difficult to compare results between studies as the differences in sampling designs represent an unquantified source of uncertainty (e.g., Koiter et al. 2013b). An effective sampling campaign is designed to efficiently collect samples that are representative of the potential sources of sediment identified within a watershed. The importance of the sampling campaign variables, including the number of samples collected, sampling methods, and sampling design developed to characterize sources, is not well understood and may influence the final sediment fingerprinting apportionment results (Collins et al. 2020). Source sampling typically occurs at two scales of observation. The watershed scale captures the variability within the sources across different regions or physiographic units. The local scale sampling (e.g., individual fields or streambanks within a reach) is where the variability across a catena or along the depth of a streambank is captured (Collins et al. 2020). The focus of this paper is on sampling at the local scale, but it is important to note that sampling at the watershed scale is of equal importance.

Using expert knowledge, the likely to erode approach identifies sources with a high probability of contributing sediment to streams. An advantage of this approach is that it allows for the selection of sampling areas close to drainage pathways which are conceptually and physically more directly linked to the downstream sediment. The disadvantage is that the sampling choices made will vary between individuals/experts and it provides no information on soil properties and processes further up slope (i.e., limited context). In contrast, a randomized sampling approach overcomes the reliance on expert opinion in the likely to erode approach; however, one of its limitations is that it may miss important landscape features, resulting in an inadequate characterization. This limitation can be partially offset by a stratified random sampling approach whereby distinctive units are first identified and sampling is randomized within each predefined unit (Pennock et al. 2008).

A commonly used sampling design for various field studies is systematic sampling using either transects (e.g., Du and Walling 2017) or grids (e.g., Lauzon et al. 2005). The transect is used for a variety of field studies to show the changes of soil properties along a toposequence parallel to the dominant slope gradient which can extend from the edge of the streambank to the hilltop. This sampling design is recommended to understand the variation of soil properties along the catena as the variability in this direction is typically the largest and the most informative (Pennock et al. 2008). In terms of determining the spacing of samples, soil samples can be collected from equally spaced locations, based on hill slope position, or other landscape features (e.g., edge of field, fence lines).

The grid sampling design is often used for spatial pattern studies because of the ease with which pattern maps can be derived from the grids. Grid sampling designs tend to be a more expensive method of soil sampling because of the large number of samples that need to be collected, processed, and analyzed. However, they can provide highly detailed information about the distribution of variability in soil properties (Pennock et al. 2008). Geostatistical approaches typically use grid sampling designs (Pennock et al. 2008). For example, Cambardella et al. (1994) demonstrated that grid sampling designs allow field-scale variability in soil properties to be assessed beyond univariate statistics (e.g., mean, standard deviations) and can provide additional insight into the underlying processes resulting in the observed patterns. In general, the sampling spacing should be smaller than the distance between relevant landforms in the field (Pennock et al. 2008). When available, sample spacing can be based on prior knowledge of the area and may include soil maps, visual observation of vegetation/yield, or past research studies.

The sediment fingerprinting technique has identified a wide range of potential sources of sediments, for example: arable lands (e.g., Russell et al. 2001), pasture lands (e.g., Blake et al. 2012), gully erosion areas (e.g., Evrard et al. 2013), channel banks (e.g., Collins et al. 2010), landslides (e.g., Nelson and Booth 2002), and urban sources (e.g., Carter et al. 2003). The number and nature of the potential sources of sediment identified will vary among watersheds. This variation is mostly related to geology, geomorphology, soil type, hydrology, spatial scale, topography, and land use. The quality of the representation of potential sources of sediment is largely based on the sampling design used. The sampling design has multiple and cascading effects within the fingerprinting framework, including estimates of mean and variance, fingerprint selection, discrimination potential, and apportionment results. Therefore, careful consideration needs to be given to this step to achieve reliable and robust apportionment results.

The three main objectives of this study are to: 1) compare three unique sampling approaches (transect, grid, and likely to erode) in characterizing potential sources of sediment; 2) evaluate the impact of sampling designs on fingerprint selection and the ability to discriminate between a forested and an agricultural site; and 3) determine the implications of different sampling designs on the final apportionment results using virtual mixtures. Overall, addressing these objectives will improve the sediment fingerprinting approach and lead to more robust and reliable results.

2 Material and Methods

2.1 Site descriptions

The Wilson Creek watershed (WCW) is located in south-western Manitoba, Canada near the town of McCreary (Figure 1). The WCW headwaters are on top of the Manitoba Escarpment, and the stream drops 300 metres as it crosses the escarpment. Beyond the escarpment, there is an alluvial fan, which lies in the lacustrine deposits of glacial Lake Agassiz (McGinn 1979) (Figure 1). The upper portion of the watershed is within the boundaries of Riding Mountain National Park where the development is restricted to recreational hiking trails and is forested. Downstream of the park boundary, the land use is agriculture, and the stream is enclosed in an engineered drainage ditch to the point where it enters the Turtle River (MacKay 1970; McGinn 1979).

Figure 1: Location of study sites in south-western Manitoba, Canada. Regional land use, digital elevation model, and sampling points for the two study sites. Note the transect samples are also part of the grid sampling design.

The climate of the region is classified as sub-humid with an average annual precipitation of approximately 538.9 mm, with approximately 27% falling as snow. The mean annual temperature is 3.0°C (1981–2010 climate normals, Environment and Climate Change Canada 2024), and the hydrology of the watershed is characterized as snowmelt dominated with ~ 80% of the cumulative runoff occurring during the spring season (May and June) (MacKay 1970). The dominant soils in the area are Chernozems (Black-Meadow soils) and are developed on thin, loam to clay loam lacustrine deposits, which lie over reworked boulder till. Surface soil textures range from fine sandy loam to clay loam (Ehrlich et al. 1958). Visual observations of erosional and depositional features indicate that forested and agricultural areas in the watershed are contributing sediment.

The two sites encompassing the predominant land uses within the WCW (Figure 1) and adjacent to the mainstem were selected to investigate the implications of source sampling designs on commonly used fingerprinting properties. Both sites are located on an alluvial fan, where the apex is situated at the base of the escarpment where there is an abrupt change in gradient and the stream is no longer deeply incised. The forested site is characterized by a lower gradient stream relative to the stream crossing over the Manitoba Escarpment, floodplains, and a meandering channel form (McGinn 1979) (Figure 1). The forested site is bounded by the park boundary to the west and the vegetation is principally mixedwood, including white and black spruce (Picea glauca, Picea mariana), balsam fir (Abies balsamea), larch (Larix laricina) and young stands of deciduous trees including trembling aspen (Populus tremuloides) and white birch (Betula papyrifera). The agricultural site is low relief and the agricultural production in the field includes rotations of grain crops and forage. The field is bounded by Wilson Creek (engineered channel) on the north, a secondary surface drain on the east, and other fields on the south and west (Figure 1) (McGinn 1979).

2.2 Sampling design

Transect, grid, and likely to erode sampling designs were used to characterize the fingerprint properties of both the agricultural and forested sites (Figure 1). Across all three sampling designs, a total of 114 unique sampling points were established, with 57 samples collected at each site. At each of the sampling points, a soil auger was used to sample surface soil from 0-15 cm (to account for depth of regular tillage) in the agricultural field and from 0-5 cm below the LFH layer in the forested site.

The transect sampling design consisted of two parallel transects with 7 sampling points per transect, spaced 100 m apart. The two transects were orientated parallel to the predominant drainage pathway for the agricultural site and perpendicular to the stream for the forested site. For the grid sampling design, an additional 35 samples per site spaced at 100 m were collected, creating a 7x7 grid. At the forested site, eight likely to erode source samples were collected on the flood plain parallel to the stream at the north-east corner of the 7x7 grid. At the agricultural site, eight likely to erode source samples were collected at the field edge where infield drainage pathways directly connected the field to the secondary surface drain.
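
The sample counts described above can be checked with a short sketch. The coordinates, and the choice of which two grid columns act as the transects, are assumptions made only for illustration; the text does not state them.

```r
# Hypothetical layout: 7 x 7 grid at 100 m spacing (coordinates are illustrative).
spacing <- 100
grid <- expand.grid(x = seq(0, 6) * spacing, y = seq(0, 6) * spacing)

# Assumption: the two transects coincide with two of the grid columns.
transect_cols <- c(2, 4) * spacing
grid$transect <- grid$x %in% transect_cols

n_grid     <- nrow(grid)            # all grid points
n_transect <- sum(grid$transect)    # grid points that also belong to a transect
n_extra    <- n_grid - n_transect   # additional grid-only points
n_lte      <- 8                     # likely to erode samples per site
n_site     <- n_grid + n_lte        # unique points per site
n_total    <- 2 * n_site            # both sites

c(grid = n_grid, transect = n_transect, extra = n_extra,
  per_site = n_site, total = n_total)
```

Under these assumptions the counts reproduce the 49 grid, 14 transect, 35 additional, 57 per-site, and 114 total sampling points reported in the text.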

2.3 Laboratory analysis

Source samples were air-dried, manually disaggregated using a mortar and pestle, and passed through a 2-mm stainless steel sieve to remove coarse fragments and vegetation; a subsample was further sieved to < 63 µm. Previous work has shown that sieving to < 63 µm is an effective way to reduce the differences in particle size and organic matter between different sources (Laceby et al. 2017). For particle size analysis, samples were digested with hydrogen peroxide (35%) to remove organic matter and an aliquot of a dispersing agent (sodium hexametaphosphate) was added following the procedure of Kroetsch and Cang (2007). The grain size distribution and specific surface area (SSA) were measured using a Mastersizer 3000 laser diffraction system (Malvern, UK; 0.01 – 3500 μm diameter measurement range). A constant particle density of 2.65 \(g cm^{-3}\) was used to estimate the SSA. Soil organic matter (SOM) content was determined using loss-on-ignition. Following the procedure of Nelson and Sommers (1996), three grams of oven-dried soil (105°C for 24 hours) were ashed at 400°C for 16 hours. Organic matter and grain size were analyzed as supporting information to help interpret the observed variability in surface soil fingerprint properties.
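
As a rough illustration of the two supporting measurements, the sketch below computes SOM from loss-on-ignition masses and an SSA from a volume-based size distribution. The spherical-particle formula is a common convention for laser diffraction data and is assumed here; it is not confirmed as the exact calculation used by the instrument software. The function names, masses, and size bins are hypothetical.

```r
# Loss-on-ignition: SOM (%) from oven-dry and ashed masses (grams).
som_loi <- function(mass_dry, mass_ashed) {
  100 * (mass_dry - mass_ashed) / mass_dry
}

# SSA (m^2 kg^-1) from a volume-based size distribution, assuming smooth spheres.
# d_um: bin mid-point diameters (micrometres); vol: volume fraction in each bin;
# density: particle density in g cm^-3 (2.65 in the text).
ssa_spheres <- function(d_um, vol, density = 2.65) {
  d_m <- d_um * 1e-6        # micrometres -> metres
  rho <- density * 1000     # g cm^-3 -> kg m^-3
  6 * sum(vol / d_m) / (rho * sum(vol))
}

som_loi(mass_dry = 3.00, mass_ashed = 2.76)                 # hypothetical masses
ssa_spheres(d_um = c(2, 10, 40), vol = c(0.2, 0.5, 0.3))    # hypothetical bins
```

Because the SSA is weighted by the inverse of particle diameter, a small shift in the fine fraction changes the estimate far more than the same shift in the coarse fraction.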

All soil samples were analyzed through a commercial laboratory (ALS Mineral Division, North Vancouver, BC, Canada) for a broad suite of 51 geochemical elements (Ag, Al, As, Au, B, Ba, Be, Bi, Ca, Cd, Ce, Co, Cr, Cs, Cu, Fe, Ga, Ge, Hf, Hg, In, K, La, Li, Mg, Mn, Mo, Na, Nb, Ni, P, Pb, Rb, Re, S, Sb, Sc, Se, Sn, Sr, Ta, Te, Th, Ti, Tl, U, V, W, Y, Zn and Zr) using inductively coupled plasma mass spectrometry (ICP-MS) following a microwave-assisted digestion with aqua regia. Of the 51 geochemical elements examined, seven (Au, Ge, Na, Re, Ta, Ti, and W) were below the detection limit in one or more of the samples analyzed and were therefore excluded from subsequent analyses. For colour properties, soil samples were analyzed with a spectroradiometer (ASD FieldSpecPro, Malvern Panalytical Inc., Westborough, MA, United States). Spectral reflectance measurements were taken in 1 nm increments over the 0.4-2.5 μm wavelength range. Both samples and the Spectralon standard (white reference) were illuminated with a white light source using a halogen-based lamp (12 VDC, 20 Watt). Light was collected with a fiber optic cable mounted approximately 2 cm from the sample/white reference panel at an angle of 45°. The reflectance was measured from raw data returned by the FieldSpecPro using RS3 software. Following the method outlined in Boudreault et al. (2018), fifteen colour coefficients (X, Y, Z, x, y, u, v, L, a, b, h, c, R, G, B) (Table S1) were calculated for each sample.

2.4 Data analysis

All statistical analyses were undertaken using R statistical software v 4.1.1 (R Core Team 2021) through the RStudio Integrated Development Environment v 1.4.1717 (RStudio 2021). Plots were created using the R packages ggplot2 v 3.3.5 (Wickham 2016) and ggfortify v 0.4.14 (Tang et al. 2016). A Mann–Whitney U test (α = 0.05) was used to assess differences in SSA and SOM between the two sampling sites. A Kruskal–Wallis H test was used to detect differences (α = 0.05) between the three sampling designs for each site and fingerprint property independently. Dunn's test (FSA v 0.9.5; Ogle et al. 2023) with the Benjamini-Hochberg p-adjustment method was then used as a post-hoc test (α = 0.05) to investigate all pair-wise comparisons between the three sampling designs. The relation between fingerprint value/concentration and SOM (%) and SSA (\(m^2 kg^{-1}\)) was evaluated by calculating the Pearson correlation coefficient (α = 0.05). Coefficients were calculated for each fingerprint and site independently, as well as for the fingerprint data combined across both sites, as sediments are a mixture of both sources.
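
The testing sequence described above might look like the following sketch for a single fingerprint at one site. All values are simulated and the column names are hypothetical. `FSA::dunnTest()` is called only if the FSA package is installed; the fallback is a pairwise Wilcoxon test, which is a stand-in and not Dunn's test.

```r
set.seed(1)
# Simulated stand-in for one fingerprint at one site (values are invented).
d <- data.frame(
  design = factor(rep(c("grid", "transect", "lte"), times = c(49, 14, 8))),
  value  = c(rnorm(49, 10, 1), rnorm(14, 10, 1), rnorm(8, 14, 1))
)

# Omnibus test across the three sampling designs.
kw <- kruskal.test(value ~ design, data = d)

# Pairwise follow-up with Benjamini-Hochberg adjustment.
if (requireNamespace("FSA", quietly = TRUE)) {
  posthoc <- FSA::dunnTest(value ~ design, data = d, method = "bh")
} else {
  # Stand-in only: pairwise Wilcoxon tests, not Dunn's test.
  posthoc <- pairwise.wilcox.test(d$value, d$design, p.adjust.method = "BH")
}

# Pearson correlation between the fingerprint and SSA (simulated SSA values).
ssa <- rnorm(nrow(d), 1800, 180)
ct  <- cor.test(d$value, ssa, method = "pearson")

kw$p.value < 0.05   # TRUE when at least one design differs
```

Note that in the study the transect points are a subset of the grid, so the groups are not strictly independent; the simulated groups here are drawn separately for simplicity.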

2.5 Fingerprint selection and apportionment model

A set of 21 virtual mixtures was generated by combining all unique samples across the three sampling designs (combined mixtures). An additional three sets of 21 virtual mixtures were created using the source samples from each sampling design separately (design specific mixtures). The virtual mixtures were calculated by multiplying the mean source fingerprint values by their proportion in each mixture (Batista et al. 2022). The mixtures range from 0% agriculture, 100% forest through to 100% agriculture, 0% forest. For each successive mixture, the proportion of each source increased/decreased in 5% increments. The results using the combined mixtures are the focus of the results and discussion, but the results using the design specific mixtures are in the supplementary materials.
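
A minimal sketch of how such virtual mixtures can be built is given below. The source means are invented placeholders, and the three fingerprints are chosen only as examples.

```r
# Hypothetical source means for three fingerprints (values are invented).
src_means <- rbind(
  agriculture = c(Li = 20, Fe = 2.5, a_star = 4.0),
  forest      = c(Li = 10, Fe = 1.5, a_star = 6.0)
)

# Agricultural proportion from 0 to 1 in 5% steps gives 21 mixtures.
p_ag  <- seq(0, 1, by = 0.05)
props <- cbind(agriculture = p_ag, forest = 1 - p_ag)

# Each mixture is the proportion-weighted sum of the source means.
mixtures <- props %*% src_means

nrow(mixtures)     # number of virtual mixtures
mixtures[11, ]     # the 50:50 mixture
```

Because the mixtures are linear combinations of the source means, their known proportions provide a reference against which the unmixing model output can be compared.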

For each of the four sets of virtual mixtures and each sampling design, the fingerprint properties were selected following the three-step procedure outlined in Batista et al. (2022). First, the range/bracket test was used to identify fingerprint properties that fall outside of the mixing polygon. For the range test, the fingerprint concentration/value in the mixture should be bracketed by the interquartile ranges (IQRs) of the sources. Any fingerprint that did not meet this criterion for all 21 mixture samples was not considered for further analysis. Second, the Mann–Whitney U test (α = 0.01) was used to select fingerprint properties that could discriminate between sources. Properties that yielded U statistic values above the critical U value were not considered to be successful in discriminating between source groups and were removed.
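
The first two screening steps could be sketched as follows for one fingerprint. The bracketing rule coded here (a mixture must lie between the lowest first quartile and the highest third quartile of the sources) is one reading of the range test and is an assumption, as are the simulated values and the helper name `passes_range`.

```r
set.seed(42)
# Simulated source samples for one fingerprint (invented values).
ag     <- rnorm(49, mean = 20, sd = 2)
forest <- rnorm(49, mean = 10, sd = 2)

# Assumed range test: a mixture passes when its value lies between the lowest
# first quartile and the highest third quartile of the sources.
passes_range <- function(mix, sources) {
  q <- sapply(sources, quantile, probs = c(0.25, 0.75))
  mix >= min(q[1, ]) & mix <= max(q[2, ])
}

# 21 virtual mixtures built from the source means.
p_ag <- seq(0, 1, by = 0.05)
mix  <- p_ag * mean(ag) + (1 - p_ag) * mean(forest)

# The fingerprint is kept only if every mixture is bracketed.
keep_range <- all(passes_range(mix, list(ag = ag, forest = forest)))

# Mann-Whitney U test between sources at alpha = 0.01.
mw      <- wilcox.test(ag, forest)
keep_mw <- mw$p.value < 0.01

c(range_test = keep_range, mann_whitney = keep_mw)
```

A fingerprint advances to the discriminant function analysis only when both checks are satisfied.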

Finally, discriminant function analysis (DFA) (klaR v 1.7-0; Weihs et al. 2005) was used to select the minimum number of fingerprints that provide the best discrimination between sources (i.e., to remove redundant fingerprints) (Collins et al. 1997). This analysis is based on the stepwise selection algorithm of minimization of the Wilks' lambda (λ), using a niveau = 0.1, to select the smallest set of fingerprint properties for optimal distinction between sources. Linear discriminant analysis with leave-one-out cross-validation was applied to assess the accuracy of discrimination following the fingerprint selection process. Principal component analysis (PCA) plots were used to visually assess the discriminatory power of the selected fingerprints.
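
The selection and validation steps might be sketched as below using simulated data. `klaR::greedy.wilks()` is called only if klaR is installed; the fingerprint values, the uninformative `noise` column, and the subset passed to `lda()` are all assumptions for illustration.

```r
library(MASS)  # lda(); MASS ships with standard R installations

set.seed(7)
n <- 20
# Simulated fingerprint table (invented values); "noise" carries no source signal.
dat <- data.frame(
  source = factor(rep(c("agriculture", "forest"), each = n)),
  Li     = c(rnorm(n, 20, 1), rnorm(n, 10, 1)),
  Fe     = c(rnorm(n, 2.5, 0.2), rnorm(n, 1.5, 0.2)),
  noise  = rnorm(2 * n)
)

# Stepwise selection on Wilks' lambda, if klaR is available.
if (requireNamespace("klaR", quietly = TRUE)) {
  sel <- klaR::greedy.wilks(source ~ ., data = dat, niveau = 0.1)
  print(sel$results)
}

# Leave-one-out cross-validated LDA on a chosen subset of fingerprints.
fit <- lda(source ~ Li + Fe, data = dat, CV = TRUE)
loo_accuracy <- mean(fit$class == dat$source)

# PCA on scaled fingerprints; the first two score columns are what a PCA plot shows.
pca <- prcomp(dat[, c("Li", "Fe")], scale. = TRUE)
head(pca$x[, 1:2])

loo_accuracy
```

With `CV = TRUE`, `lda()` returns the class predicted for each sample when that sample is left out of the fit, so the proportion of matches is the leave-one-out classification accuracy.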

For each unique virtual mixture and sampling design combination, the proportion of sediment derived from potential sources was estimated using the multivariate mixing model MixSIAR (MixSIAR v 3.1.12; Stock and Semmens 2016a). The MixSIAR model has been used in several fingerprinting studies since it is an inclusive and flexible Bayesian mixing model framework implemented as an open-source R package (Stock and Semmens 2016b). For mixing model runs, summarized source data (means and standard deviations) were used. The MCMC parameters correspond to a normal run (chain length = 100000, burn = 50000, thin = 50, chains = 3) and an uninformative prior. The trophic enrichment factors were set to zero. The residual error of the model was not included but the process error was included. Model convergence was assessed by the Gelman-Rubin diagnostic, in which none of the variables had a value greater than 1.01. Following the procedures outlined in Batista et al. (2022), four types of model assessment criteria were used: uncertainty, residuals, performance, and contingency errors. Further details and equations are found in the supplementary materials (Table S2).
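
To make the MCMC settings and convergence check concrete, the sketch below counts the retained draws (assuming the chain length includes the burn-in) and computes a basic Gelman-Rubin statistic on simulated chains. The `rhat` function is a simplified, uncorrected form written for illustration and is not the routine used by MixSIAR; the simulated chains are invented.

```r
# Retained draws per chain under the stated settings
# (assumption: chain length includes the burn-in period).
chain_length <- 100000; burn <- 50000; thin <- 50; n_chains <- 3
draws_per_chain <- (chain_length - burn) / thin
total_draws     <- draws_per_chain * n_chains

# Basic potential scale reduction factor for one parameter.
# chains: matrix with one column per chain and one row per retained draw.
rhat <- function(chains) {
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))   # mean within-chain variance
  B <- n * var(colMeans(chains))     # between-chain variance
  sqrt(((n - 1) / n * W + B / n) / W)
}

set.seed(3)
# Simulated, well-mixed chains standing in for a source proportion (invented).
sim <- matrix(rnorm(draws_per_chain * n_chains, mean = 0.4, sd = 0.05),
              ncol = n_chains)
rhat(sim)

# A chain stuck at a different value inflates the statistic.
bad <- sim
bad[, 1] <- bad[, 1] + 0.2
rhat(bad)
```

Values close to 1 indicate that the between-chain and within-chain variances agree, which is the basis for the 1.01 threshold quoted above.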

3 Results

3.1 Characterization of soil properties

The agricultural site had an average SOM content for the grid, transect and likely to erode sampling designs of 8.5 % (SD = 1.4 %), 7.8 % (SD = 1.1 %), and 5.9 % (SD = 0.8 %), respectively. The forested site had an average SOM content for the grid, transect and likely to erode sampling designs of 11.3 % (SD = 6.1 %), 11.6 % (SD = 6.1 %), and 2.4 % (SD = 1.2 %), respectively (Figure 2). Across both sites, the likely to erode sampling design had the lowest SOM content and the smallest variability. Between sites, the SOM content variability was greater in the forested site than in the agricultural site for each of the three sampling designs. The SOM content was significantly higher in the forested site for the grid and transect sampling designs, but the opposite was found using the likely to erode sampling design.

The SSA of the soil samples was considered as the principal measurement of particle size (Figure 2). The agricultural site had an average SSA of 1853 (SD = 187), 1871 (SD = 154), and 1343 (SD = 138) \(m^2 kg^{-1}\) for the grid, transect, and likely to erode sampling designs, respectively. The forested site followed a similar pattern with an average SSA of 764 (SD = 134), 790 (SD = 145), and 502 (SD = 138) \(m^2 kg^{-1}\) for the grid, transect, and likely to erode sampling designs, respectively. Regardless of the sampling design used, the forested site was composed of significantly coarser-grained material than the agricultural site. At both sites, the grid and transect designs had comparable grain size results; however, the likely to erode design resulted in a coarser estimate of grain size relative to the other sampling designs.

In [1]:

library(ggplot2)
library(dplyr)    # select(), distinct()
library(viridis)  # scale_fill_viridis()

# Boxplots of specific surface area by sampling design, filled by site
p1 <- ggplot(data = psa, aes(x = sampling_design, y = specific_surface_area, fill = site)) +
  geom_boxplot() +
  # Significance letters: one per site and design, dodged to sit above each box
  geom_text(data = select(psa, site, sampling_design, ymax, sig_letter) |> distinct(),
            position = position_dodge(width = 0.75),
            aes(y = ymax, label = sig_letter, group = site), size = 2.5) +
  theme_bw(base_size = 10) +
  scale_y_continuous(expand = c(0, 0), limits = c(250, 2500), breaks = seq(0, 2500, 500)) +
  labs(y = expression(paste("Specific Surface Area (", m^2*~kg^{-1}, ")"))) +
  theme(axis.title.x = element_blank(),
        legend.title = element_blank(),
        axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "none") +
  scale_fill_viridis(discrete = TRUE, begin = 0, end = 0.6)

Figure 2: Specific surface area and soil organic matter for the agricultural and forested sites as determined by the three sampling designs.

For colour properties, six (X, Y, Z, L, G, B) and three (x, u*, a*) colour coefficients showed no significant differences between the sampling designs at the agricultural and forested sites, respectively (Figure 3). The post-hoc test showed that there were no significant differences between the grid and transect sampling designs at either site for any colour fingerprint (Figure 3). There were also no differences between the likely to erode and transect sampling designs for the colour properties R and h* (Figure 3).

At both sites, no significant differences (p > 0.05) were found between the sampling designs for five geochemical elements (Ce, Co, La, Te, Y), with an additional ten (Ba, Bi, Ca, Cs, Hg, K, Pb, Rb, Se, Sr) and eight (Fe, Hf, Li, Mn, Ni, Sb, Sc, Tl) geochemical elements showing no significant differences at the agricultural and forested sites, respectively (Figure 3). For the forested site, the post-hoc test showed that there were no significant differences between the grid and transect sampling designs for all geochemical fingerprints. In contrast, there were significant differences in the other two comparisons, with the exceptions of In and Zr (Figure 3). Similarly, within the agricultural site there were 18 geochemical fingerprints that showed no significant differences between the grid and transect sampling designs, nine between transect and likely to erode, and four between the grid and likely to erode (Figure 3).

In [2]:

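library(patchwork)  # plot composition with +, guide_area(), plot_layout()

# Combine the three panels; collect legends and axis titles into shared ones
p6 <- p4 + p5 + p3 + guide_area() +
  plot_layout(guides = "collect", axis_titles = "collect")
p6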

Figure 3: Results for the pair-wise Dunn’s post-hoc test to determine differences in fingerprint properties between the three sampling designs for each site. Dashed line represents an α value of 0.05. Fingerprints that showed no significant differences (p value > 0.05) following the Kruskal Wallis test are not included.

In assessing the correlation between colour and SSA, there were clear differences between sites and sampling designs (Figure S1 and Table S3). Within the likely to erode design, the correlation was found to be significant for 12 out of the 15 colour coefficients only when the data were combined across both sites. For the agricultural site, only the grid design detected significant correlations. Interestingly, the direction and magnitude of the correlations depended on the sampling design used and/or whether the data were grouped by site or combined. For example, u* demonstrates a much higher degree of correlation when the data are pooled across the two sites as compared to assessing each site independently. Overall, the colour properties showed a higher number of significant correlations with SOM as compared to SSA regardless of sampling design or how the data were grouped (Figure S2 and Table S3). Similar to the correlations with SSA, the number of significant correlations with SOM within the likely to erode design was greatest when the data were combined. The magnitude and direction of the correlations between colour properties and SOM are also variable across sampling designs and the grouping of the data. For example, the colour coefficient x demonstrates no significant correlations when each site is considered independently, but the combined data demonstrate a positive but weak correlation (r < 0.4) for the grid and transect designs and a stronger and negative correlation (r = -0.75) for the likely to erode design.

Similar to the relation between colour and SSA, there were differences between sites and sampling designs in assessing the influence of grain size on the concentration of geochemical elements (Figure S3 and Table S3). Using the likely to erode sampling design, there were few significant correlations when the agricultural and forested sites were investigated independently; however, pooling the data across both sites increased the number of observed significant correlations. The direction of the correlation between concentration and SSA was mostly positive when the data were combined but more variable when each site was assessed independently. The correlation between geochemical concentration and SOM shares some similarities with the correlation with SSA (Figure S4 and Table S3). However, one observed difference is that when the data are pooled across the sites the likely to erode sampling design typically had a positive correlation between SOM and concentration, whereas the other two sampling designs were typically negatively correlated.

The correlations (i.e., slope) between fingerprints and SSA and SOM also vary widely between fingerprint, site, and the sampling designs used to characterize the soil properties (Figures 4 and 5). For example, the relation between the concentration of Al and SSA is a good example of where combining data from both sites may be preferable to an assessment of each site independently. In contrast, the Tl data suggest that a site/land use specific assessment may be more appropriate (Figure 5). However, a complete assessment of the role of particle size and organic matter on the fingerprinting method requires information on downstream sediment as well.
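
One way pooled and site-specific correlations can diverge is illustrated by the sketch below, in which two simulated sites have no within-site relation between SSA and concentration but differ in the means of both. All values are invented and are not the study's data.

```r
set.seed(11)
n <- 49
# Simulated data (invented): no within-site relation, but sites differ in both
# SSA and fingerprint concentration.
ag <- data.frame(site = "agriculture",
                 ssa  = rnorm(n, 1850, 180),
                 conc = rnorm(n, 10, 1))
forest <- data.frame(site = "forest",
                     ssa  = rnorm(n, 770, 140),
                     conc = rnorm(n, 5, 1))
both <- rbind(ag, forest)

r_ag     <- cor(ag$ssa, ag$conc)
r_forest <- cor(forest$ssa, forest$conc)
r_pooled <- cor(both$ssa, both$conc)

round(c(agriculture = r_ag, forest = r_forest, pooled = r_pooled), 2)
```

In this construction the pooled coefficient is driven by the difference between sites rather than by any within-site dependence on grain size, which is one reason to inspect both groupings before applying a correction.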

In [3]:

p6 <- p_ssa_col + p_om_col +
  plot_layout(guides = 'collect')
p6


Figure 4: Exploring the relation between select colour coefficients and specific surface area and soil organic matter content. Solid lines indicate linear relation for each site and sampling design independently and dashed lines indicate linear relation for each sampling design with data combined across both sites.

In [4]:

p3 <- p_ssa + p_om +
  plot_layout(guides = 'collect')
p3


Figure 5: Exploring the relation between select geochemical concentrations and specific surface area and soil organic matter content. Solid lines indicate linear relation for each site and sampling design independently and dashed lines indicate linear relation for each sampling design with data combined across both sites.

3.2 Source discrimination and apportionment

A total of 58, 50, and 12 fingerprint properties passed the range test for the grid, transect, and likely to erode designs, respectively (Table 1). For all the colour fingerprints at least one of the virtual mixtures fell outside of the IQR for the likely to erode sources, resulting in their elimination from further analysis. The Mann Whitney U-test (p-value < 0.01) further reduced the number of fingerprints for the grid and transect designs to 49 and 33, respectively, and the number remained the same for the likely to erode sampling design (Table 2). The stepwise DFA further reduced the number of fingerprints to 16, three, and three for the grid, transect, and likely to erode designs, respectively (Table 3). Of note is that the geochemical concentrations of Li and Fe, as well as the colour coefficient a*, were selected as the first, second, or third variables in the DFA in two of the three sampling designs. The fingerprints selected for all three sampling designs classified 100% of the source samples correctly (Table 3). Using the design specific mixtures, a greater number of fingerprints passed the range and Mann Whitney U tests; however, the same fingerprints were ultimately selected for all three designs by the DFA with 100% classification (Tables S4, S5, and S6).
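The first two screening steps can be sketched in base R as follows. This is a minimal illustration and not the authors' implementation: the concentrations are invented, the helper `passes_range_test()` is hypothetical, and the exact range-test rule (every mixture must lie between the lowest first quartile and the highest third quartile of the sources) is an assumption based on the IQR criterion mentioned above.

```r
# Hypothetical concentrations (ppm) for one fingerprint; not data from this study.
sources <- list(
  agriculture = c(10.1, 10.4, 10.9, 11.3, 11.8, 12.2, 12.6, 13.0),
  forest      = c(20.2, 20.7, 21.1, 21.5, 22.0, 22.4, 22.9, 23.3)
)
mixtures <- c(12.5, 16.0, 20.5)

# Assumed range-test rule: all mixture values must fall between the lowest
# first quartile and the highest third quartile across the sources.
passes_range_test <- function(mixture, sources) {
  lower <- min(sapply(sources, quantile, probs = 0.25))
  upper <- max(sapply(sources, quantile, probs = 0.75))
  all(mixture >= lower & mixture <= upper)
}

range_ok <- passes_range_test(mixtures, sources)

# Mann Whitney U-test (Wilcoxon rank-sum test in R) between the two sources
mw_p <- wilcox.test(sources$agriculture, sources$forest)$p.value

# Keep the fingerprint only if it passes both steps (threshold from the text)
keep_fingerprint <- range_ok && mw_p < 0.01
print(keep_fingerprint)
```

In practice this check would be repeated for every fingerprint and sampling design before passing the survivors to the stepwise DFA.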

Table 1: Fingerprint properties that passed the range test for conservative behavior for each sampling approach.

Sampling design	Fingerprinting properties
Grid	Ag Al As Ba Be Bi Ca Ca Cd Ce Co Cr Cu Fe Ga Hf Hg In K K La Li Mg Mn Nb Ni P Pb Rb S Sb Sb Sc Se Sn Sr Th Tl U V Y Zn Zr R G B x* y* Y X Z L a* b* u* v* c* h*
Transect	Ag Al As Ba Be Ca Cd Ce Co Cr Cs Fe Ga Hg In La Mg Mn Mo Ni P Pb Rb S Sb Sc Se Sn Sr Te Th Tl U V Y Zn R G B x* y* Y X Z L a* b* u* v* c* h*
Likely to erode	Ba Bi Co Cs Li Mo Pb S Sn Tl U Zr

Table 2: Fingerprint properties that passed the Mann Whitney test for each sampling approach.

Sampling design	Fingerprinting properties
Grid	Ag Al As Be Bi Ca Cd Ce Co Cr Cs Cu Fe Ga Hf Hg In K La Li Mg Nb Ni Rb S Sb Sc Se Sn Sr Te Th U V Y Zr B G x Y X Z L a* b* u* v* c* h*
Transect	Ag Al As Be Ca Cd Ce Co Cr Cs Fe Ga In La Ni S Sb Sc Se Sn Sr Te U V Y B x Z a* b* u* c* h*
Likely to erode	Ba Bi Co Cs Li Mo Pb Sn Tl U

Table 3: Results of the stepwise DFA for each sampling approach including the percent of samples correctly classified for each site.

(a) Grid sampling design

Composite fingerprint	Wilks’ lambda	Agriculture	Forest
Li	0.062	100	100
Li + a*	0.044	100	100
Li + a* + Fe	0.028	100	100
Li + a* + Fe + Co	0.023	100	100
Li + a* + Fe + Co + Hg	0.022	100	100
Li + a* + Fe + Co + Hg + x	0.019	100	100
Li + a* + Fe + Co + Hg + x + Cs	0.018	100	100
Li + a* + Fe + Co + Hg + x + Cs + La	0.015	100	100
Li + a* + Fe + Co + Hg + x + Cs + La + Ni	0.013	100	100
Li + a* + Fe + Co + Hg + x + Cs + La + Ni + Nb	0.013	100	100
Li + a* + Fe + Co + Hg + x + Cs + La + Ni + Nb + h*	0.012	100	100
Li + a* + Fe + Co + Hg + x + Cs + La + Ni + Nb + h* + b*	0.011	100	100
Li + a* + Fe + Co + Hg + x + Cs + La + Ni + Nb + h* + b* + Rb	0.011	100	100
Li + a* + Fe + Co + Hg + x + Cs + La + Ni + Nb + h* + b* + Rb + Ca	0.010	100	100
Li + a* + Fe + Co + Hg + x + Cs + La + Ni + Nb + h* + b* + Rb + Ca + Sr	0.009	100	100
Li + a* + Fe + Co + Hg + x + Cs + La + Ni + Nb + h* + b* + Rb + Ca + Sr + c*	0.009	100	100

(b) Transect sampling design

Composite fingerprint	Wilks’ lambda	Agriculture	Forest
Fe	0.052	100	100
Fe + a*	0.025	100	100
Fe + a* + Co	0.009	100	100

(c) Likely to erode sampling design

Composite fingerprint	Wilks’ lambda	Agriculture	Forest
Li	0.024	100	100
Li + U	0.011	100	100
Li + U + Bi	0.008	100	100
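For readers unfamiliar with the Wilks' lambda values in Table 3, the statistic for a single fingerprint reduces to the ratio of the within-source sum of squares to the total sum of squares, so values near zero indicate strong separation between sources. The base R sketch below uses invented values and a hypothetical helper, `wilks_lambda_1d()`; it covers only the one-variable case and is not the stepwise procedure used in the study.

```r
# Wilks' lambda for one variable: within-group SS divided by total SS.
wilks_lambda_1d <- function(x, group) {
  ss_total  <- sum((x - mean(x))^2)
  ss_within <- sum(tapply(x, group, function(v) sum((v - mean(v))^2)))
  ss_within / ss_total
}

# Hypothetical fingerprint values for two sources; not data from this study.
conc  <- c(1, 2, 3, 11, 12, 13)
group <- rep(c("Agriculture", "Forest"), each = 3)

lambda <- wilks_lambda_1d(conc, group)
print(round(lambda, 3))
```

With more than one fingerprint the same idea applies to the determinants of the within-group and total sums-of-squares matrices.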

The subsequent principal component analysis (PCA) illustrated that the geochemical and colour composition in the agricultural field, considering the three sampling approaches, was different from the forested site (Figure 6 and Figure S5). The first principal component for the grid sampling demonstrates a similar magnitude in loadings where a*, b*, c*, and x* are negative and Co, Fe, La, Li, Nb, Ni, Sr, and h* are positive. The second principal component demonstrates that the sites are differentiated mostly by b*, c*, and Hg. For the transect sampling, the first principal component demonstrated a similar magnitude in loadings for all three fingerprints (a*, Co, and Fe) while the second principal component is differentiated by a* and Co. Similarly, the first principal component for the likely to erode design demonstrated a similar magnitude in loadings for all three fingerprints (Bi, Li, and U) while the second principal component is primarily differentiated by U.
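A PCA of this kind can be sketched with base R's `prcomp()`. The values below are invented for three fingerprints (named after the likely to erode composite fingerprint only for readability) and are not data from this study; centring and scaling to unit variance is an assumption about the preprocessing.

```r
# Hypothetical fingerprint values; not data from this study.
fp <- data.frame(
  Li = c(30, 32, 31, 10, 12, 11),
  U  = c(3.0, 3.1, 2.9, 1.0, 1.2, 1.1),
  Bi = c(0.20, 0.22, 0.21, 0.40, 0.41, 0.43)
)
site <- rep(c("Agriculture", "Forest"), each = 3)

# PCA on centred, unit-variance variables (assumed preprocessing)
pca <- prcomp(fp, center = TRUE, scale. = TRUE)

loadings      <- pca$rotation                  # variable loadings per component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)  # proportion of variance
pc1           <- pca$x[, "PC1"]                # sample scores on PC1

print(round(loadings, 2))
print(round(var_explained, 2))
print(split(round(pc1, 2), site))
```

Loadings of similar magnitude on the first component, as reported above, indicate that each selected fingerprint contributes roughly equally to separating the two sites.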

In [5]:

p4 <- p1 + p2 + p3 +
  plot_layout(guides = 'collect') & theme(legend.position = "bottom")
p4

Figure 6: Principal component analysis demonstrating the discriminatory ability for the selected fingerprints for the three sampling designs.

Means and standard deviations of all the fingerprints that were included in the MixSIAR runs are illustrated in Figure 7. The relative contributions of potential sources to the virtual mixtures for each sampling design are shown in Figure 8. For all three sampling designs the difference between the virtual mixture and the modeled proportions was larger when the difference between the source contributions was large (e.g., 0% forest, 100% agriculture) (Figure 9). The maximum median absolute difference between the virtual mixtures and the modeled proportions was 7.7, 7.8, and 8.9% for the grid, transect, and likely to erode sampling designs, respectively. The mixing model results for the design specific mixtures simulation were similar (Figures S6 and S7). The uncertainty metrics were generally similar among the three sampling designs, with the exception that the W95 (0.025-0.975 quantile width) for the grid was smaller than for the other designs (Table 4). In terms of the uncertainty and performance metrics the grid was marginally better. Lastly, the contingency metrics were generally similar among the three sampling designs, with the exception of the CSI95 (critical success index) for the grid sampling, where the hit rate for the 95% interval was higher (~2 to 10%). The CRPS showed a U-shaped response with higher values at the 0 and 100% contributions (Figure S8). Overall, there was a high degree of similarity in model evaluation metrics. Using the design specific mixtures the modeled apportionment and model performance metrics are similar to the combined mixtures (Table S7 and Figure S9).
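Several of the evaluation metrics named above can be illustrated with a short base R sketch. The formulas used here are the common textbook forms of the mean absolute error (MAE), mean error (ME), Nash-Sutcliffe efficiency (NSE), and interval width (W95); they are assumptions and may differ in detail from the definitions used to produce Table 4. All numbers are invented and are not model output from this study.

```r
# Hypothetical known proportions and posterior summaries; not results from this study.
true_prop  <- c(0.00, 0.25, 0.50, 0.75, 1.00)  # forest proportion in the virtual mixture
median_est <- c(0.05, 0.27, 0.50, 0.72, 0.94)  # posterior median
lower_95   <- c(0.00, 0.18, 0.41, 0.63, 0.85)  # 0.025 posterior quantile
upper_95   <- c(0.14, 0.36, 0.59, 0.81, 1.00)  # 0.975 posterior quantile

err <- median_est - true_prop

mae <- mean(abs(err))   # mean absolute error of the median
me  <- mean(err)        # mean error (bias) of the median
nse <- 1 - sum(err^2) / sum((true_prop - mean(true_prop))^2)  # Nash-Sutcliffe efficiency
w95 <- mean(upper_95 - lower_95)                              # mean 95% interval width

print(round(c(MAE = mae, ME = me, NSE = nse, W95 = w95), 3))
```

An NSE close to 1 and a narrow W95 together indicate that the posterior medians track the known mixture proportions closely and with little spread.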

In [6]:

p1 <- ggplot(data = plotting3, aes(x = sample_design, y = avg, colour = Source)) +
  geom_point(position = position_dodge(width = 0.5)) +
  geom_errorbar(data = plotting3, aes(ymax = avg + sd, ymin = avg -sd, x = sample_design), position = position_dodge(width = 0.5), width = 0.5) +
  scale_colour_viridis_d(begin = 0, end = 0.6) +
  theme_bw() +
  labs(y = "Concentration (ppm) or value") +
  theme(axis.title.x = element_blank(),
        axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1),
        legend.position = "bottom",
        legend.title = element_blank()) +
  facet_wrap(~Fingerprint, ncol = 4, scales = "free_y", labeller = "label_parsed")

p1

Figure 7: Means and standard deviations of the fingerprints used in the mixing model.

In [7]:

mixing_plot1 <- ggplot(data = all, aes(x = as.factor(prop_forest), y = Forest, fill = sampling_design)) +
  geom_boxplot(size = 0.1, outlier.size = 0.1) +
  theme_bw() +
  scale_y_continuous(expand = c(0,0.01)) +
  labs(y = "Modelled Forest Proportion", x ="Virtual Mixture Forest Proportion") +
  theme(legend.position = "bottom", legend.title = element_blank()) +
  scale_fill_viridis_d()
mixing_plot1

Figure 8: Comparison of the posterior distribution of the modeled proportion of forest source to the proportion of forest source in the virtual mixtures for each of the three sampling designs.

In [8]:

mixing_plot2 <- ggplot(data = all, aes(x = as.factor(prop_forest), y = Forest - prop_forest, fill = sampling_design)) +
  geom_hline(yintercept = 0) +
  geom_boxplot(size = 0.1, outlier.size = 0.1) +
  theme_bw() +
  scale_y_continuous(expand = c(0,0.01), limits = c(-0.5, 0.5)) +
  labs(y = "Proportion Difference (Modelled - Mixture)", x ="Virtual Mixture Forest Proportion") +
  theme(legend.position = "bottom", legend.title = element_blank()) +
  scale_fill_viridis_d() +
  annotate(geom = "text", x = "0.5", y = 0.5, label = "Over estimation", vjust = 1) +
  annotate(geom = "text", x = "0.5", y = -0.5, label = "Under estimation", vjust = 0)
mixing_plot2

Figure 9: Differences in the proportions between modeled and virtual mixtures.

In [10]:

summary_all |>
  group_by(`Evaluation criteria`) |>
  gt()|>
  tab_style(style =  cell_text(weight = "bold", align = "center"), locations =  cells_row_groups()) |>
  tab_options(column_labels.font.weight = "bold")

Table 4: Model evaluation metrics grouped by sampling design and source.

Parameter	Grid Agriculture	Grid Forest	Transect Agriculture	Transect Forest	Likely to erode Agriculture	Likely to erode Forest
Residuals
MAE50	0.01	0.01	0.02	0.02	0.02	0.02
MAE95	0.00	0.00	0.00	0.00	0.00	0.00
ME50	0.00	0.00	0.00	0.00	0.00	0.00
ME95	0.00	0.00	0.00	0.00	0.00	0.00
Performance
NSE50	0.99	0.99	0.98	0.98	0.98	0.98
NSE95	1.00	1.00	1.00	1.00	1.00	1.00
CRPS	0.02	0.02	0.03	0.03	0.02	0.02
Contingency
CSI50	0.92	0.93	0.86	0.90	0.88	0.89
CSI95	0.86	0.82	0.75	0.80	0.77	0.78
HR50	0.99	0.98	0.99	0.98	0.98	0.99
HR95	1.00	1.00	1.00	1.00	1.00	1.00
Uncertainty
W50	0.06	0.06	0.09	0.09	0.09	0.09
W95	0.18	0.18	0.27	0.27	0.26	0.26
P50	0.25	0.25	0.23	0.27	0.24	0.26
P95	0.47	0.48	0.44	0.51	0.45	0.50

4 Discussion

4.1 Characterization of soil properties

SOM can exert a strong control on the concentration and values for geochemical- and colour-based fingerprints and is therefore an important factor to consider when evaluating the results from sediment source fingerprinting studies (Horowitz 1991; Viscarra Rossel et al. 2009). Results from this study indicate that there are differences in means for SOM content, and the amount of variation is different among the three sampling approaches and land uses (Figure 2). Results from the grid and transect sampling approaches illustrate that the amount of SOM in surface soil was lower at the agricultural site compared to the forested site. The lower SOM content at the agricultural site is likely due to the regular harvesting of biomass and mixing of soil due to tillage. The conversion of native ecosystems to agricultural land uses generally results in a net loss of SOM as tillage increases the contact between soil and biomass and improves soil aeration, resulting in higher rates of decomposition (Brady and Weil 2001). However, results from the likely to erode sampling approach indicated that the SOM was higher at the agricultural site compared to the forested site. This difference may be due to soil forming factors, including hydrology (i.e., lower slope positions) and the accumulation of organic-rich particles due to erosion. Similarly, natural factors may also reduce the accumulation or dilute the SOM in the forested site as the likely to erode sampling location is situated within a floodplain, and the deposition of coarse-grained and organic-poor sediment (i.e., shale from the Manitoba Escarpment) has been observed following flooding.

The grid, transect, and likely to erode sampling designs all showed differences in grain size (i.e., SSA) in the surface soil between the forested and agricultural sites. There is little evidence that land use practices have a direct impact on the grain size (i.e., rate of clay formation); however, geomorphic processes both local and regional may impact the grain size. Regionally, the forested site is closer to the base of the escarpment and coarser-grained material eroded within the escarpment (i.e., deeply incised stream) is likely being deposited near the apex of the alluvial fan, which lies in the lacustrine deposits of glacial Lake Agassiz (McGinn 1979). Locally, wind, water, and tillage erosion, and localized flooding, may exert a strong influence on the spatial pattern and range in grain size. For example, the likely to erode material sampled at both sites functions as intermediate storage, both a sink of sediment derived from upslope and a source of sediment to the adjacent channel or surface drain, which may explain the observed coarse-grained material near the channel environment. This study highlights the influence the sampling strategy can have on characterizing both grain size and SOM properties.

In characterizing the geochemical concentration and colour properties of the two sites, this study demonstrated that the sampling design used had an impact. Fingerprint properties that exhibit strong spatial patterns due to environmental gradients (e.g., soil moisture) and geomorphic processes (e.g., erosional and depositional areas) (Hoffmann et al. 2009; Borch et al. 2010) are likely more sensitive to the sampling design (i.e., significant differences between sampling designs) while fingerprints that exhibit no spatial patterns are likely to be less sensitive. The comparison between the grid and transect sampling designs resulted in the fewest differences in fingerprinting properties (Figure 3). This is likely due to the fact that the transect is a subset of the larger grid. The samples collected using the likely to erode design had a unique fingerprint as compared to the other two sampling designs (Figures 3 and 7). The material from likely to erode sampling is normally collected at the edge of the field or close to the stream where the hydrologic regime is different than up-slope positions and is typically characterized by an overall higher moisture status and a fluctuating water table.

The redox conditions often found in these areas can facilitate the precipitation of some minerals (e.g., zinc sulphide) and increase the solubility and mobility of certain elements (e.g., As, Fe and Mn) (Du Laing et al. 2010; Rinklebe et al. 2016). For example, at the agricultural site the concentration of Fe was significantly lower for the likely to erode samples compared to the other two designs (Figures 3 and 7) and may be related to the hydrology of the lower slope position. However, grain size and SOM are often correlated with geochemical concentrations due to their high SSA and reactivity, and the lower Fe concentration observed may be related to the coarser grain size and lower SOM observed (Figure 2). In contrast, Mn should follow a similar pattern but there was no difference in Mn concentration detected between the three sampling designs. Differentiating between the roles of pedogenic processes and grain size in the concentration of Fe, Mn or other geochemical properties across the landscape is difficult and requires additional study that is typically outside the scope of most fingerprinting studies. Similarly, the observed differences in both SSA and SOM (Figure 2) are likely driving the observed difference in colour properties between sites (Table 2) and between the likely to erode and the other two designs (Figure 3) as higher SOM and SSA typically result in darker brown/black colours (Viscarra Rossel et al. 2006). Spatial interpolation of the fingerprint data combined with landscape attributes derived from topographic information would identify whether spatial patterns exist and if they relate to landforms. This would help identify the underlying processes that create the observed patterns and further inform sampling design and fingerprint selection.

Accurate characterization of sediment sources is a key component underlying the sediment source fingerprinting approach. Key to achieving this is using an appropriate sampling approach so as not to introduce sampling bias and to provide an accurate estimation of the mean and variance for each source. Sampling designs are sensitive to fingerprint variance and heterogeneity. Identifying fingerprints that are more homogeneous within a given source would be beneficial as there is less potential for an observed difference (or no difference) between sources to be an artifact of sampling design and not underlying geology/land use. From a practical perspective, there is often little information on the variance and heterogeneity of fingerprints both within and between sources prior to sampling. However, this study investigated a single location for each potential source of sediment, and the variability of these fingerprints between different locations across the watershed remains unquantified and a research priority. Due to time and cost constraints, there is often a trade-off between the number of samples at a given location and the number of locations throughout the watershed. Observations at both scales would provide insight into the distribution of fingerprint properties. The impact of different sampling designs at the larger watershed scale is also an important research question that should be addressed in future studies.

Information on grain size and organic matter content is needed for watershed and environmental management tools, including sediment source fingerprinting (Laceby et al. 2017). These soil properties have a demonstrated impact on many fingerprint concentrations, primarily geochemical and radionuclide concentrations, because of the high SSA and chemical reactivity of the organic-rich and fine-grained materials (Horowitz 1991). When differences in grain size are observed, making a direct comparison of sources of sediment and downstream sediment can be problematic. Other processes, including abrasion/breakage, adsorption/desorption, and organic matter decomposition can alter the physical and biogeochemical properties of sediments during transport from source to sink (Koiter et al. 2013b). However, these processes have received less attention in the literature because of the difficulty of predicting the behaviour of fingerprint properties in the environment.

Differences in particle size and organic matter content can be addressed through the application of correction factors, which traditionally have been based on the ratio of SSA and organic matter content between collected in-stream sediment and potential sediment sources (Collins et al. 1997). Fingerprint and source specific approaches, including linear regression (Gellis and Noe 2013) and normalization with immobile elements (Vale et al. 2016), have also been used. However, the application of correction factors has been criticized since it has not been comprehensively examined among studies (Laceby et al. 2017). Other approaches have included sieving to a smaller grain size (e.g., <10 µm; Wilkinson et al. 2009) or fingerprinting defined grain size fractions (e.g., <2, 2–20, 20–40, and 40–63 µm; Hatfield and Maher 2009).
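To make the traditional ratio-based correction concrete, the base R sketch below applies a single generic SSA factor to a set of source concentrations. It is a simplified illustration under stated assumptions: the factor is taken as sediment SSA divided by mean source SSA, the variable names are hypothetical, and the values are invented (not data from this study). It is shown only to clarify the idea, not as a recommended procedure.

```r
# Hypothetical values for illustration only; not data from this study.
source_conc  <- c(12.0, 15.0, 18.0)  # fingerprint concentration in source samples (ppm)
source_ssa   <- c(0.8, 1.0, 1.2)     # specific surface area of the source samples
sediment_ssa <- 1.5                  # specific surface area of the sediment sample

# Assumed generic correction factor: sediment SSA relative to mean source SSA
ssa_factor <- sediment_ssa / mean(source_ssa)

# Source concentrations rescaled to the grain size of the sediment
corrected_conc <- source_conc * ssa_factor

print(ssa_factor)
print(corrected_conc)
```

A single factor like this assumes the same positive, proportional relation between SSA and concentration for every fingerprint and source, which is the assumption that fingerprint and source specific approaches try to relax.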

Results from this study are in line with the move away from generic particle size and organic matter correction factors (e.g., SSA or SOM ratios) as the magnitude and direction of the correlation between these soil properties and geochemical and colour fingerprints are not consistent between sources, fingerprints, or sampling designs. There can be challenges with investigating the relation between SSA and SOM and fingerprint properties. For example, the observed range of SSA within each site is relatively small, making the evaluation of the impact of SSA on fingerprint concentration and values difficult. In this study there is some evidence that combining the two sites extends the observed range of SSA, creates a clearer representation of the relationships between SSA and geochemical and colour fingerprint properties, and recognizes that sediments are a mixture of both sources. Similarly, the range in the observed SOM for the agricultural site was narrow, but this was not the case for the forested site. The small number of samples per site for the likely to erode sampling design was also problematic for investigating the correlation between fingerprint properties and SSA and SOM. However, additional sampling across a range of locations for each source (i.e., land use) and information on the properties of downstream sediment are needed to fully evaluate the impact of grain size and SOM on the sediment fingerprinting approach.

4.2 Implications of different sampling designs on sediment fingerprint selection, discrimination, and apportionment results

The fingerprint selection procedure determined the optimal composition of fingerprints that provide the best discrimination for each sampling design by eliminating fingerprints that fell outside the range of the sources, were non-informative, or were redundant. The fingerprint selection was influenced by each of the sampling designs as they present different quantitative results. No colour-based fingerprints and relatively few geochemical fingerprints within the likely to erode sampling design passed the range test and were ultimately included in the final composite fingerprint. This is likely due to a combined effect of how the virtual mixtures were created (i.e., average of all unique samples) and, more importantly, the landscape position in which the likely to erode samples were collected, which resulted in unique fingerprint properties, SOM content, and grain size distribution (Figures 3, 4, and 5). The likely to erode sampling design characterized the source in a manner that was fundamentally different relative to the other two designs and highlights how sampling strategy is a critical step in the fingerprinting approach.
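The construction of virtual mixtures can be sketched in base R as a weighted average of the source signatures. This is an illustration under stated assumptions, not the authors' code: the source means are invented, the 5% increment (which yields 21 mixtures from 0 to 100% forest) is assumed, and the name `prop_forest` is borrowed from the plotting code only for readability.

```r
# Hypothetical source means for two fingerprints; not data from this study.
source_means <- rbind(
  Agriculture = c(Li = 30, Fe = 25000),
  Forest      = c(Li = 10, Fe = 40000)
)

# Assumed forest proportions: 0 to 100% in 5% steps (21 mixtures)
prop_forest <- seq(0, 100, by = 5) / 100

# Each virtual mixture is a proportion-weighted average of the two source means
mix_values <- outer(prop_forest, source_means["Forest", ]) +
  outer(1 - prop_forest, source_means["Agriculture", ])

virtual_mixtures <- data.frame(prop_forest = prop_forest, mix_values)
print(head(virtual_mixtures, 3))
```

Because the mixtures are built directly from the source data, they test the internal consistency of the fingerprint selection and mixing model rather than processes that act on sediment during transport.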

Strong discrimination was obtained regardless of the sampling design used (Figure 6 and Table 3). Li, Fe, and a* were each the first, second, or third fingerprint selected as part of the DFA in two of the three sampling designs (Table 3). These three fingerprints may be more robust and reliable fingerprints as the differences between the sites were large and consistent across the three sampling designs. The significant difference observed between likely to erode and the other two designs for Fe and a* at the agricultural site (Figure 3) suggests heterogeneity within the source and that the choice of sampling design impacted the estimation of the mean and variance, but the difference between designs was small relative to the difference between the sources. In contrast, Li showed no significant differences between the sampling designs at either site (Figure 3), suggesting more homogeneity within the site but still a consistent difference between the sites.

In watersheds with heterogeneous lithologies, a sampling approach relying on geochemical concentrations will likely be meaningful to discriminate between sources based on distinctive geomorphic environments (Evrard et al. 2022). In this study area, the forested site is situated closer to the base of the escarpment and there is likely a gradient in soil properties from the apex of the alluvial fan towards the distal edge. The clear discrimination between the two sources may also be influenced by the observed differences in SOM and particle size. In this study, the two sites had distinctive differences in both SOM and grain size. Despite sieving to < 63 µm, the agricultural site had a lower organic matter content and a finer texture as compared to the forested site. There is considerable evidence that both these properties exert a strong influence on geochemical composition. Adjusting for these differences to get a more direct comparison is complex as the relations are not consistent between fingerprints and sites. The soft and friable shale being exported from the escarpment quickly disintegrates into smaller grain sizes due to abrasion, breakage, freeze/thaw, and wet/dry cycles (Koiter et al. 2013a). This dynamic grain size further complicates the issue of fingerprint adjustments based on grain size.

In this study, the amount that the modeled median contributions deviated from the virtual mixtures was similar across the three sampling designs (Figures 8 and 9) and it can be concluded that the sampling design did not have a substantial impact on the practical conclusions drawn from the final apportionment results. However, the use of virtual mixtures to evaluate the apportionment results needs to be interpreted with caution as it excludes watershed processes such as particle size selectivity (Batista et al. 2022). The larger deviation between the modeled and virtual mixture proportions when one source contribution was large (90-100 %) is likely a product of the other source having a very small (~0 %) contribution. This deviation is reflected in the U-shaped relation between virtual mixture source proportions and the continuous ranked probability score (Figure S8) and is similar to the results found in Batista et al. (2022).

The other consideration in sampling design is the sample size for each source. Small sample sizes can impact fingerprint selection as the low statistical power may not be able to detect differences between sources (Type II error). Additionally, small sample sizes can result in poor discrimination between sources and ultimately higher uncertainty in the final apportionment results due to the lower precision and accuracy surrounding the estimates of the mean and variance for each source. For example, the MixSIAR model uses the underlying probability distribution of each source, which is partly dependent on the sample size. The inclusion of informative priors can improve model performance with low sample numbers, or the mean and variance can be specified as fixed by setting an arbitrarily high sample size (Ward et al. 2010; Parnell et al. 2013; Stock et al. 2018). In contrast, exceptionally large sample sizes can lead to the detection of very small differences between source groups that are not geologically relevant or are sensitive to non-conservative behaviour. The latter is typically not an issue due to high analytical and labour costs, and in some cases, the analytical costs dictate the sample size (e.g., Kieta et al. 2023). For the fingerprints selected for the mixing model simulation, in addition to differences in the means, the standard deviations were typically smaller for the likely to erode sampling design as compared to the other designs, likely due to the samples being collected in close proximity to each other (Figure 7). However, the standard error, which accounts for differences in sample sizes, was comparable across all three sampling designs.
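The distinction between standard deviation and standard error can be shown with a short base R calculation. The sample sizes (49, 14, and 8) are those of the three designs in this study; the standard deviations are invented for illustration and are not values from Figure 7.

```r
# Sample sizes for the three designs (from the study design)
n <- c(grid = 49, transect = 14, likely_to_erode = 8)

# Hypothetical standard deviations for one fingerprint; not data from this study.
sd_fp <- c(grid = 3.5, transect = 1.9, likely_to_erode = 1.4)

# Standard error of the mean: SD divided by the square root of the sample size
se_fp <- sd_fp / sqrt(n)

print(round(se_fp, 3))
```

In this constructed case the smallest design has the smallest standard deviation, yet the standard errors are nearly equal because the larger designs divide a larger spread by a larger square root of n.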

4.3 Importance of a sampling design

A sampling campaign includes a selection of the most efficient approach for collecting samples that can be utilized to evaluate a range of soil properties at a site, field, or experimental/observational unit with respect to the purpose/objective of the study (Pennock et al. 2008). This is important for the subsequent data analysis, interpretation of results, and implications drawn from the data collected. Within the sediment fingerprinting approach, designing a sampling campaign that accurately characterizes both the within and between source variability of fingerprint properties, at both the local and watershed scale, is a critical step. Because the sampling design is sensitive to the variance of fingerprint properties, preliminary sampling and/or prior knowledge should be used to guide sampling where possible.

Different sampling approaches employed in some studies (e.g., Boudreault et al. 2019) are particularly valuable in watersheds which are heterogeneous regarding land use, topography, geomorphology, pedology, and geology. Developing a set of guidelines and recommendations to help practitioners select appropriate sampling designs would represent a significant advancement in one of the main steps of sediment fingerprinting and would benefit future sediment source fingerprinting and agri-environmental studies. In this exercise, the use of virtual mixtures demonstrated that the final apportionment results were similar among the three sampling designs despite the sensitivity to fingerprint variability. This suggests that the likely to erode design, with the fewest samples, is the most cost-efficient approach. However, the use of virtual mixtures does not answer the very important question as to which sampling approach is the most representative of the sources of the downstream sediment. A likely to erode sampling approach with some strategic transect style sampling may be a good way forward as the transect sampling may provide important context as to how fingerprints vary and give some indication of how pedogenic and geomorphic processes may be influencing fingerprint variability.

Conceptually, when viewed from the sediment cascade perspective (Burt and Allison 2010), material collected near the channel environment (i.e., likely to erode sampling) is often not the start of the cascade but rather captures the accumulated upslope/upstream processes and functions as both sink and source of sediment. The advantage of the likely to erode sampling design is that it provides a well-defined source or input into the fluvial system. In contrast, grid and transect sampling approaches can provide insight into the upslope/upstream processes and provide information closer to the start of the cascade but may not provide a well-defined source. This is one of the conceptual issues with the use of mixing models, which require well-defined inputs (source) and outputs (sediment) when the reality is a cascade or continuum. Mixing models that can accommodate variability in sources on a sub-watershed level as well as hierarchical or distributed sediment sampling (e.g., Vale et al. 2016; Blake et al. 2018) have helped address this issue. Continued advancement of models that more closely represent the full sediment cascade along with source sampling guidelines should remain a research priority.

5 Conclusion

The literature provides few explicit guidelines for designing a reliable source sampling campaign for sediment source fingerprinting studies. This study compared three different sampling approaches in characterizing soil fingerprint properties at the field scale across two contrasting land uses in the Canadian Prairies. The characterization of fingerprint properties of the two sites showed significant differences between the sampling designs. The transect and grid sampling designs were the most similar whereas the likely to erode sampling approach provided a unique fingerprint signature. Due to the within source spatial heterogeneity for many geochemical and colour fingerprints, the likely to erode sampling design might lead to estimates of fingerprint property mean and variance that may not be representative of the entire source. Similarly, characterization of soil organic matter and grain size, often considered as supporting information to help explain the variability observed in fingerprinting properties, varied between the different sampling designs. These differences between sampling designs resulted in variation in the number and composition of the fingerprints selected. Despite these differences, each sampling design demonstrated effective discrimination between the two sources. Using virtual mixtures, the apportionment results and model performance metrics were similar across all three sampling designs.

The sediment fingerprinting technique can be used as a tool for implementing management strategies to control the impacts of soil erosion and excessive sediment loads in watersheds. Future sediment source fingerprinting studies must carefully select an appropriate field sampling design. Overall, the results from this study showed that the choice of sampling design was an important factor in characterizing the source material and in the fingerprint selection process, but did not have a large impact on the ability to discriminate between sources or on the final apportionment results. However, further standardization of guidelines for sediment source sampling will improve the repeatability of the sediment fingerprinting technique. A more reliable sediment fingerprinting approach would lead to better water quality and watershed management.

Acknowledgments

Special thanks to A. Avila, A. Baig, and the Riding Mountain National Park personnel for their field and technical support.

Statements and declarations

Funding

This research was supported by the Natural Sciences and Engineering Research Council of Canada Discovery Grant - From source to sink: Investigating the linkages between sources of sediment and downstream water quality in Canadian watersheds - awarded to AJK (RGPIN-2019-05273).

Competing interests

The authors have no competing interests to declare that are relevant to the content of this article.

Data and code availability

Data and source code for the analysis and manuscript are available on GitHub: https://github.com/alex-koiter/sampling-design-manuscript

References

Batista PVG, Laceby JP, Evrard O (2022) How to evaluate sediment fingerprinting source apportionments. Journal of Soils and Sediments 22:1315–1328. https://doi.org/10.1007/s11368-022-03157-4

Bilotta GS, Brazier RE (2008) Understanding the influence of suspended solids on water quality and aquatic biota. Water Research 42:2849–2861. https://doi.org/10.1016/j.watres.2008.03.018

Blake WH, Boeckx P, Stock BC, et al (2018) A deconvolutional Bayesian mixing model approach for river basin sediment source apportionment. Scientific Reports 8:13073. https://doi.org/10.1038/s41598-018-30905-9

Blake WH, Ficken KJ, Taylor P, et al (2012) Tracing crop-specific sediment sources in agricultural catchments. Geomorphology 140:322–329. https://doi.org/10.1016/j.geomorph.2011.10.036

Borch T, Kretzschmar R, Kappler A, et al (2010) Biogeochemical redox processes and their impact on contaminant dynamics. Environmental Science & Technology 44:15–23. https://doi.org/10.1021/es9026248

Boudreault M, Koiter AJ, Lobb DA, et al (2018) Using colour, shape and radionuclide sediment fingerprints to identify sources of sediment in an agricultural watershed in Atlantic Canada. Canadian Water Resources Journal 43:347–365. https://doi.org/10.1080/07011784.2018.1451781

Boudreault M, Koiter AJ, Lobb DA, et al (2019) Comparison of sampling designs for sediment source fingerprinting in an agricultural watershed in Atlantic Canada. Journal of Soils and Sediments 19:3302–3318. https://doi.org/10.1007/s11368-019-02306-6

Brady NC, Weil RR (2001) The nature and properties of soils, 13th edn. Prentice Hall, New Jersey, USA

Burt TP, Allison RJ (2010) Sediment cascades in the environment: An integrated approach. In: Burt TP, Allison RJ (eds). John Wiley & Sons, Chichester, UK, pp 1–15

Cambardella CA, Moorman TB, Novak JM, et al (1994) Field-scale variability of soil properties in central Iowa soils. Soil Science Society of America Journal 58:1501–1511. https://doi.org/10.2136/sssaj1994.03615995005800050033x

Carter J, Owens PN, Walling DE, Leeks GJL (2003) Fingerprinting suspended sediment sources in a large urban river system. Science of the Total Environment 314–316:513–534. https://doi.org/10.1016/S0048-9697(03)00071-8

Collins AL, Blackwell M, Boeckx P, et al (2020) Sediment source fingerprinting: benchmarking recent outputs, remaining challenges and emerging themes. Journal of Soils and Sediments 20:4160–4193. https://doi.org/10.1007/s11368-020-02755-4

Collins AL, Walling DE, Leeks GJL (1997) Source type ascription for fluvial suspended sediment based on a quantitative composite fingerprinting technique. Catena 29:1–27. https://doi.org/10.1016/S0341-8162(96)00064-1

Collins AL, Walling DE, McMellin GK, et al (2010) A preliminary investigation of the efficacy of riparian fencing schemes for reducing contributions from eroding channel banks to the siltation of salmonid spawning gravels across the south west UK. Journal of Environmental Management 91:1341–1349. https://doi.org/10.1016/j.jenvman.2010.02.015

Davis CM, Fox JF (2009) Sediment fingerprinting: Review of the method and future improvements for allocating nonpoint source pollution. Journal of Environmental Engineering 135:490–504. https://doi.org/10.1061/(ASCE)0733-9372(2009)135:7(490)

Du Laing G, Hanssen T, Bogaert G, Tack FMG (2010) Factors affecting metal mobilisation during oxidation of sulphidic, sandy wetland substrates. In: Vymazal (ed). Springer Science, Dordrecht, Netherlands, pp 287–297

Du P, Walling DE (2017) Fingerprinting surficial sediment sources: Exploring some potential problems associated with the spatial variability of source material properties. Journal of Environmental Management 194:4–15. https://doi.org/10.1016/j.jenvman.2016.05.066

Ehrlich WA, Pratt LE, Leclaire FP (1958) Reconnaissance soil survey of West-Lake map sheet area

Environment, Climate Change Canada (2024) Canadian Climate Normals

Evrard O, Batista PVG, Company J, et al (2022) Improving the design and implementation of sediment fingerprinting studies: summary and outcomes of the TRACING 2021 Scientific School. Journal of Soils and Sediments 22:1648–1661. https://doi.org/10.1007/s11368-022-03203-1

Evrard O, Poulenard J, Némery J, et al (2013) Tracing sediment sources in a tropical highland catchment of central Mexico by using conventional and alternative fingerprinting methods. Hydrological Processes 27:911–922. https://doi.org/10.1002/hyp.9421

Gellis AC, Noe GB (2013) Sediment source analysis in the Linganore Creek watershed, Maryland, USA, using the sediment fingerprinting approach: 2008 to 2010. Journal of Soils and Sediments 13:1735–1753. https://doi.org/10.1007/s11368-013-0771-6

Haddadchi A, Nosrati K, Ahmadi F (2014) Differences between the source contribution of bed material and suspended sediments in a mountainous agricultural catchment of western Iran. Catena 116:105–113. https://doi.org/10.1016/j.catena.2013.12.011

Hatfield RG, Maher BA (2009) Fingerprinting upland sediment sources: Particle size-specific magnetic linkages between soils, lake sediments and suspended sediments. Earth Surface Processes and Landforms 34:1359–1373. https://doi.org/10.1002/esp.1824

Hoffmann CC, Kjaergaard C, Uusi-Kamppa J, et al (2009) Phosphorus retention in riparian buffers: Review of their efficiency. Journal of Environmental Quality 38:1942–1955. https://doi.org/10.2134/jeq2008.0087

Horowitz AJ (1991) A primer on sediment-trace element chemistry, 2nd ed. Lewis Publishers, Chelsea, Michigan, USA

Kieta KA, Owens PN, Petticrew EL, et al (2023) Polycyclic aromatic hydrocarbons in terrestrial and aquatic environments following wildfire: A review. Environmental Reviews 31:141–167. https://doi.org/10.1139/er-2022-0055

Koiter AJ, Lobb DA, Owens PN, et al (2013a) Investigating the role of connectivity and scale in assessing the sources of sediment in an agricultural watershed in the Canadian Prairies using sediment source fingerprinting. Journal of Soils and Sediments 13:1676–1691. https://doi.org/10.1007/s11368-013-0762-7

Koiter AJ, Owens PN, Petticrew EL, Lobb DA (2013b) The behavioural characteristics of sediment properties and their implications for sediment fingerprinting as an approach for identifying sediment sources in river basins. Earth-Science Reviews 125:24–42. https://doi.org/10.1016/j.earscirev.2013.05.009

Kroetsch D, Wang C (2007) Particle size distribution. In: Carter MR, Gregorich EG (eds) 2nd edn. CRC Press, Boca Raton, FL, USA, pp 713–727

Laceby JP, Evrard O, Smith HG, et al (2017) The challenges and opportunities of addressing particle size effects in sediment source fingerprinting: A review. Earth-Science Reviews 169:85–103. https://doi.org/10.1016/j.earscirev.2017.04.009

Laceby JP, McMahon J, Evrard O, Olley J (2015) A comparison of geological and statistical approaches to element selection for sediment fingerprinting. Journal of Soils and Sediments 15:2117–2131. https://doi.org/10.1007/s11368-015-1111-9

Lauzon JD, O’Halloran IP, Fallow DJ, et al (2005) Spatial variability of soil test phosphorus, potassium, and pH of Ontario soils. Agronomy Journal 97:524–532. https://doi.org/10.2134/agronj2005.0524

MacKay GH (1970) A quantitative study of geomorphology of the Wilson Creek watershed, Manitoba. PhD thesis

McGinn RA (1979) Alluvial fan geomorphic systems: The Riding Mountain escarpment model. PhD thesis

Miller JR, Lord M, Yurkovich S, et al (2005) Historical Trends in Sedimentation Rates and Sediment Provenance, Fairfield Lake, Western North Carolina. Journal of the American Water Resources Association 41:1053–1075. https://doi.org/10.1111/j.1752-1688.2005.tb03785.x

Mukundan R, Walling DE, Gellis AC, et al (2012) Sediment source fingerprinting: Transforming from a research tool to a management tool. Journal of the American Water Resources Association 48:1241–1257. https://doi.org/10.1111/j.1752-1688.2012.00685.x

Nelson DW, Sommers LE (1996) Total Carbon, Organic Carbon, and Organic Matter. John Wiley & Sons, Ltd, pp 961–1010

Nelson EJ, Booth DB (2002) Sediment sources in an urbanizing, mixed land-use watershed. Journal of Hydrology 264:51–68. https://doi.org/10.1016/S0022-1694(02)00059-8

Noe GB, Cashman MJ, Skalak K, et al (2020) Sediment dynamics and implications for management: State of the science from long-term research in the Chesapeake Bay watershed, USA. WIREs Water 7:e1454. https://doi.org/10.1002/wat2.1454

Ogle D, Doll J, Wheeler A, Dinno A (2023) FSA: Simple fisheries stock assessment methods

Owens PN, Batalla RJ, Collins AJ, et al (2005) Fine-grained sediment in river systems: Environmental significance and management issues. River Research and Applications 21:693–717. https://doi.org/10.1002/rra.878

Owens PN, Blake WH, Gaspar L, et al (2016) Fingerprinting and tracing the sources of soils and sediments: Earth and ocean sciences, geoarchaeological, forensic, and human health applications. Earth-Science Reviews 162:1–23. https://doi.org/10.1016/j.earscirev.2016.08.012

Parnell AC, Phillips DL, Bearhop S, et al (2013) Bayesian stable isotope mixing models. Environmetrics 24:387–399. https://doi.org/10.1002/env.2221

Pennock D, Yates T, Braidek J (2008) Soil sampling designs. In: Carter MR, Gregorich EG (eds) 2nd edn. CRC Press, Boca Raton, FL, USA

Pulley S, Foster I, Collins AL (2017) The impact of catchment source group classification on the accuracy of sediment fingerprinting outputs. Journal of Environmental Management 194:16–26. https://doi.org/10.1016/j.jenvman.2016.04.048

R Core Team (2021) R: A language and environment for statistical computing

Rinklebe J, Shaheen SM, Yu K (2016) Release of As, Ba, Cd, Cu, Pb, and Sr under pre-definite redox conditions in different rice paddy soils originating from the U.S.A. and Asia. Geoderma 270:21–32. https://doi.org/10.1016/j.geoderma.2015.10.011

RStudio (2021) RStudio: Integrated development environment for R

Russell MA, Walling DE, Hodgkinson RA (2001) Suspended sediment sources in two small lowland agricultural catchments in the UK. Journal of Hydrology 252:1–24. https://doi.org/10.1016/S0022-1694(01)00388-2

Smith HG, Blake WH (2014) Sediment fingerprinting in agricultural catchments: A critical re-examination of source discrimination and data corrections. Geomorphology 204:177–191. https://doi.org/10.1016/j.geomorph.2013.08.003

Stock BC, Jackson AL, Ward EJ, et al (2018) Analyzing mixing systems using a new generation of Bayesian tracer mixing models. PeerJ 6:e5096. https://doi.org/10.7717/peerj.5096

Stock BC, Semmens BX (2016a) Unifying error structures in commonly used biotracer mixing models. Ecology 97:2562–2569. https://doi.org/10.1002/ecy.1517

Stock BC, Semmens BX (2016b) MixSIAR GUI user manual v3.1

Tang Y, Horikoshi M, Li W (2016) ggfortify: Unified interface to visualize statistical results of popular R packages. The R Journal 8.2:478–489

Vale SS, Fuller IC, Procter JN, et al (2016) Application of a confluence-based sediment-fingerprinting approach to a dynamic sedimentary catchment, New Zealand. Hydrological Processes 30:812–829. https://doi.org/10.1002/hyp.10611

Viscarra Rossel RA, Cattle SR, Ortega A, Fouad Y (2009) In situ measurements of soil colour, mineral composition and clay content by visNIR spectroscopy. Geoderma 150:253–266. https://doi.org/10.1016/j.geoderma.2009.01.025

Viscarra Rossel RA, Walvoort DJJ, McBratney AB, et al (2006) Visible, near infrared, mid infrared or combined diffuse reflectance spectroscopy for simultaneous assessment of various soil properties. Geoderma 131:59–75. https://doi.org/10.1016/j.geoderma.2005.03.007

Vörösmarty CJ, McIntyre PB, Gessner MO, et al (2010) Global threats to human water security and river biodiversity. Nature 467:555–561. https://doi.org/10.1038/nature09440

Wallbrink PJ (2004) Quantifying the erosion processes and land-uses which dominate fine sediment supply to Moreton Bay, Southeast Queensland, Australia. Journal of Environmental Radioactivity 76:67–80. https://doi.org/10.1016/j.jenvrad.2004.03.019

Walling DE, Collins AL (2008) The catchment sediment budget as a management tool. Environmental Science and Policy 11:136–143. https://doi.org/10.1016/j.envsci.2007.10.004

Ward EJ, Semmens BX, Schindler DE (2010) Including Source Uncertainty and Prior Information in the Analysis of Stable Isotope Mixing Models. Environmental Science & Technology 44:4645–4650. https://doi.org/10.1021/es100053v

Weihs C, Ligges U, Luebke K, Raabe N (2005) klaR analyzing German business cycles. In: Baier D, Decker R, Schmidt-Thieme L (eds). Springer, Berlin, Germany, pp 335–343

Wickham H (2016) ggplot2: Elegant graphics for data analysis. Springer-Verlag, New York NY U.S.A

Wilkinson SN, Wallbrink PJ, Hancock GJ, et al (2009) Fallout radionuclide tracers identify a switch in sediment sources and transport-limited sediment yield following wildfire in a eucalypt forest. Geomorphology 110:140–151. https://doi.org/10.1016/j.geomorph.2009.04.001

Supplemental figures

Figure S1: Pearson's correlation coefficients for colour properties and specific surface area for each site independently and both sites combined. Fingerprints without a significant correlation (p ≥ 0.05) were omitted.

Figure S2: Pearson's correlation coefficients for colour properties and soil organic matter content for each site independently and both sites combined. Fingerprints without a significant correlation (p ≥ 0.05) were omitted.

Figure S3: Pearson's correlation coefficients for colour properties and specific surface area for each site independently and both sites combined. Fingerprints without a significant correlation (p ≥ 0.05) were omitted.

Figure S4: Pearson's correlation coefficients for colour properties and soil organic matter content for each site independently and both sites combined. Fingerprints without a significant correlation (p ≥ 0.05) were omitted.

Figure S5: Principal component analysis and loadings demonstrating the discriminatory ability of the selected fingerprints for the three sampling designs.

Figure S6: Comparison of the posterior distribution of the modeled proportion of forest source to the proportion of forest source in the virtual mixtures for each of the three sampling designs using the design-specific mixtures.

Figure S7: Differences in the proportions between modeled and virtual design-specific mixtures.

Figure S8: Relation between virtual mixture source proportions and the CRPS for the three different sampling designs.

Figure S9: Relation between virtual mixture source proportions and the CRPS for the three different sampling designs using the design-specific mixtures.

Supplemental tables

Table S1: Description of spectral reflectance colour coefficients used as fingerprints. Reproduced from Boudreault et al. (2018)

Parameter	Abbreviation
RGB
Red	R
Green	G
Blue	B
CIE xyY
Chromatic coordinate x	x
Chromatic coordinate y	y
Brightness	Y
CIE LAB
Metric lightness function	L
Chromatic coordinate opponent red–green scales	a*
Chromatic coordinate opponent blue–yellow scales	b*
CIE LUV
Chromatic coordinate opponent red–green scales	u*
Chromatic coordinate opponent blue–yellow scales	v*
CIE LCH
CIE chroma	c*
CIE hue	h*

Table S2: Model evaluation metrics and criteria

Criteria	Parameter	Equation	Reference
Uncertainty	Interval accuracy (P)	\[ \frac{\text{encompassed}}{\text{total}} \]
Uncertainty	Interval width (W)	\[ \text{upper quantile} - \text{lower quantile} \]
Residual methods	Mean absolute error (MAE)	\[ \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i| \]	Bennett et al. (2013)
Residual methods	Mean error (ME)	\[ \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i) \]	Bennett et al. (2013)
Performance	Continuous ranked probability score (CRPS)	\[ \mathrm{CRPS}(F_i, y_i) = \int_{-\infty}^{\infty} \left( F_i(x) - H[x \geq y_i] \right)^2 dx \]	Matheson and Winkler (1976)
Performance	Nash–Sutcliffe efficiency index (NSE)	\[ 1 - \frac{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2} \]	Nash and Sutcliffe (1970)
Contingency	Critical success index (CSI)	\[ \frac{\text{hits}}{\text{hits} + \text{misses} + \text{false alarms}} \]	Bennett et al. (2013)
Contingency	Hit rate (HR)	\[ \frac{\text{hits}}{\text{hits} + \text{misses}} \]	Bennett et al. (2013)
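
Several of the metrics in Table S2 can be sketched in a few lines of Python. This is an illustrative sketch rather than the code used in the analysis: it assumes that y is the known proportion in a virtual mixture and ŷ the modeled proportion, the function names and example values are hypothetical, and the CRPS is computed with the sample-based identity CRPS = E|X − y| − ½E|X − X′|, which equals the integral form when F is the empirical distribution of the posterior samples.

```python
def mean_absolute_error(observed, predicted):
    """MAE: average magnitude of the residuals."""
    return sum(abs(y - y_hat) for y, y_hat in zip(observed, predicted)) / len(observed)


def mean_error(observed, predicted):
    """ME: average signed residual, a measure of bias."""
    return sum(y - y_hat for y, y_hat in zip(observed, predicted)) / len(observed)


def nash_sutcliffe(observed, predicted):
    """NSE: 1 minus the ratio of residual variance to observed variance."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1 - ss_res / ss_tot


def interval_accuracy(observed, lower, upper):
    """P: fraction of observed values encompassed by the interval."""
    inside = sum(1 for y, lo, hi in zip(observed, lower, upper) if lo <= y <= hi)
    return inside / len(observed)


def crps_from_samples(samples, observed_value):
    """CRPS of the empirical distribution of posterior samples."""
    n = len(samples)
    term_obs = sum(abs(x - observed_value) for x in samples) / n
    term_spread = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return term_obs - 0.5 * term_spread


# Hypothetical proportions, for illustration only
observed = [0.0, 0.5, 1.0]     # known virtual mixture proportions
predicted = [0.1, 0.5, 0.9]    # modeled proportions (e.g., posterior medians)

mae = mean_absolute_error(observed, predicted)
me = mean_error(observed, predicted)
nse = nash_sutcliffe(observed, predicted)
p_acc = interval_accuracy(observed, [0.0, 0.45, 0.8], [0.2, 0.55, 0.95])
crps = crps_from_samples([0.4, 0.6], 0.5)
```

The double loop in `crps_from_samples` is quadratic in the number of posterior samples, which is acceptable for a sketch but would likely need a sorted-sample formulation for large posteriors.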

Table S3: Overall summary of significant (p < 0.05) Pearson's correlations between soil properties (specific surface area (SSA) and organic matter content) and fingerprinting properties.

Site	Property	Fingerprint	No. fingerprints	No. p < 0.05	% p<0.05
Grid
Agriculture	Organic matter	Colour	15	14	93.3
Agriculture	Organic matter	Geochemistry	44	14	31.8
Agriculture	SSA	Colour	15	8	53.3
Agriculture	SSA	Geochemistry	44	22	50.0
Combined	Organic matter	Colour	15	13	86.7
Combined	Organic matter	Geochemistry	44	30	68.2
Combined	SSA	Colour	15	13	86.7
Combined	SSA	Geochemistry	44	37	84.1
Forest	Organic matter	Colour	15	14	93.3
Forest	Organic matter	Geochemistry	44	34	77.3
Forest	SSA	Colour	15	13	86.7
Forest	SSA	Geochemistry	44	37	84.1
Likely to erode
Agriculture	Organic matter	Colour	15	7	46.7
Agriculture	Organic matter	Geochemistry	44	4	9.1
Agriculture	SSA	Colour	15	0	0.0
Agriculture	SSA	Geochemistry	44	7	15.9
Combined	Organic matter	Colour	15	13	86.7
Combined	Organic matter	Geochemistry	44	35	79.5
Combined	SSA	Colour	15	12	80.0
Combined	SSA	Geochemistry	44	36	81.8
Forest	Organic matter	Colour	15	0	0.0
Forest	Organic matter	Geochemistry	44	1	2.3
Forest	SSA	Colour	15	0	0.0
Forest	SSA	Geochemistry	44	6	13.6
Transect
Agriculture	Organic matter	Colour	15	9	60.0
Agriculture	Organic matter	Geochemistry	44	9	20.5
Agriculture	SSA	Colour	15	0	0.0
Agriculture	SSA	Geochemistry	44	3	6.8
Combined	Organic matter	Colour	15	11	73.3
Combined	Organic matter	Geochemistry	44	27	61.4
Combined	SSA	Colour	15	10	66.7
Combined	SSA	Geochemistry	44	32	72.7
Forest	Organic matter	Colour	15	13	86.7
Forest	Organic matter	Geochemistry	44	29	65.9
Forest	SSA	Colour	15	12	80.0
Forest	SSA	Geochemistry	44	32	72.7

Table S4: Fingerprint properties that passed the range test for conservative behavior for each sampling approach using the design-specific mixtures.

Sampling design	Fingerprinting properties
Grid	Ag Al As Ba Be Bi Ca Cd Ce Co Cr Cs Cu Fe Ga Hf Hg In K La Li Mg Mn Mo Nb Ni P Pb Rb S Sb Sc Se Sn Sr Te Th Tl U V Y Zn Zr R G B x* y* Y X Z L a* b* u* v* c* h*
Transect	Ag Al As B Ba Be Bi Ca Cd Ce Co Cr Cs Cu Fe Ga Hf Hg In K La Li Mg Mn Mo Nb Ni P Pb Rb S Sb Sc Se Sn Sr Te Th Tl U V Y Zn Zr R G B x* y* Y X Z L a* b* u* v* c* h*
Likely to erode	Ag Al As B Ba Be Bi Ca Cd Ce Co Cr Cs Cu Fe Ga Hf Hg In K La Li Mg Mn Mo Nb Ni P Pb Rb S Sb Sc Se Sn Sr Te Th Tl U V Y Zn Zr R G B x* y* Y X Z L a* b* u* v* c* h*
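
A range test of this kind can be sketched as follows. The exact criterion is not restated here, so the sketch assumes a common form of the test: a fingerprint passes when every mixture value falls between the minimum and maximum values observed across all source samples. The function name and the example concentrations are hypothetical.

```python
def passes_range_test(mixture_values, source_values):
    """Return True if all mixture values lie within the pooled source range.

    mixture_values: list of values of one fingerprint in the mixtures.
    source_values: dict mapping source name -> list of sample values.
    Assumed criterion: min(all sources) <= mixture <= max(all sources).
    """
    lowest = min(min(values) for values in source_values.values())
    highest = max(max(values) for values in source_values.values())
    return all(lowest <= m <= highest for m in mixture_values)


# Hypothetical Li concentrations, for illustration only
sources = {
    "agriculture": [10.0, 12.0, 11.5],
    "forest": [28.0, 31.0, 30.0],
}

li_conservative = passes_range_test([11.0, 20.0, 29.5], sources)   # within range
li_enriched = passes_range_test([11.0, 20.0, 35.0], sources)       # 35.0 exceeds max
```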

Table S5: Fingerprint properties that passed the Mann-Whitney test for each sampling approach using the design-specific mixtures.

Sampling design	Fingerprinting properties
Grid	Ag Al As Be Bi Ca Cd Ce Co Cr Cs Cu Fe Ga Hf Hg In K La Li Mg Nb Ni Rb S Sb Sc Se Sn Sr Te Th U V Y Zr G B x Y X Z L a* b* u* v* c* h*
Transect	Ag Al As B Be Bi Ca Cd Ce Co Cr Cs Cu Fe Ga In La Li Nb Ni S Sb Sc Se Sn Sr Te U V Y B x* Z a* b* u* c* h*
Likely to erode	Ag Al As B Ba Be Bi Ca Cd Co Cr Cs Cu Fe Ga Hg In K Li Mg Mo Nb Ni Pb Rb Sb Sc Se Sn Sr Th Tl U V Y Zn R G x* y* Y X L a* b* u* v* c*
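
For readers unfamiliar with the test, a minimal sketch of a two-sided Mann-Whitney U test is given below. It is not the implementation used in the analysis: the p-value uses the normal approximation with a continuity correction and no tie correction (a simplifying assumption), and the example values are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using a normal approximation.

    Returns (U, p_value). Ties receive midranks; the variance is not
    tie-corrected, which is a simplification made for this sketch.
    """
    combined = sorted((value, group) for group, values in ((0, x), (1, y)) for value in values)
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1

    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)

    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = max((abs(u - mean_u) - 0.5) / sd_u, 0.0)  # continuity correction
    p_value = math.erfc(z / math.sqrt(2))         # two-sided tail probability
    return u, p_value


# Hypothetical fingerprint values for two sources, for illustration only
agriculture = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
forest = [11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0]

u_stat, p_val = mann_whitney_u(agriculture, forest)
discriminates = p_val < 0.05  # fingerprint would be retained at alpha = 0.05
```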

Table S6: Results of the stepwise DFA for each sampling approach, including the percent of samples correctly classified for each site using the design-specific mixtures.

Grid sampling design
Composite fingerprint	Wilks’ lambda	Agriculture	Forest
Li	0.062	100	100
Li + a*	0.044	100	100
Li + a* + Fe	0.028	100	100
Li + a* + Fe + Co	0.023	100	100
Li + a* + Fe + Co + Hg	0.022	100	100
Li + a* + Fe + Co + Hg + x*	0.019	100	100
Li + a* + Fe + Co + Hg + x* + Cs	0.018	100	100
Li + a* + Fe + Co + Hg + x* + Cs + La	0.015	100	100
Li + a* + Fe + Co + Hg + x* + Cs + La + Ni	0.013	100	100
Li + a* + Fe + Co + Hg + x* + Cs + La + Ni + Nb	0.013	100	100
Li + a* + Fe + Co + Hg + x* + Cs + La + Ni + Nb + h*	0.012	100	100
Li + a* + Fe + Co + Hg + x* + Cs + La + Ni + Nb + h* + b*	0.011	100	100
Li + a* + Fe + Co + Hg + x* + Cs + La + Ni + Nb + h* + b* + Rb	0.011	100	100
Li + a* + Fe + Co + Hg + x* + Cs + La + Ni + Nb + h* + b* + Rb + Ca	0.010	100	100
Li + a* + Fe + Co + Hg + x* + Cs + La + Ni + Nb + h* + b* + Rb + Ca + Sr	0.009	100	100
Li + a* + Fe + Co + Hg + x* + Cs + La + Ni + Nb + h* + b* + Rb + Ca + Sr + c*	0.009	100	100

Transect sampling design
Composite fingerprint	Wilks’ lambda	Agriculture	Forest
Li	0.048	100	100
Li + Cu	0.035	100	100
Li + Cu + Ca	0.026	100	100
Li + Cu + Ca + Be	0.021	100	100
Li + Cu + Ca + Be + Co	0.016	100	100

Likely to erode sampling design
Composite fingerprint	Wilks’ lambda	Agriculture	Forest
Li	0.024	100	100
Li + Sc	0.008	100	100
Li + Sc + Sn	0.006	100	100
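
To show what the Wilks' lambda column measures, the sketch below computes the statistic for a single fingerprint, where it reduces to the ratio of the within-group to the total sum of squares (values near 0 indicate strong separation between sources). It covers only the first step of a stepwise selection and is not the stepwise DFA procedure used in this study; the data and names are hypothetical.

```python
def wilks_lambda_univariate(groups):
    """Wilks' lambda for one variable: SS_within / SS_total.

    groups: list of lists, one list of sample values per source.
    """
    all_values = [v for group in groups for v in group]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_within = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)
        ss_within += sum((v - group_mean) ** 2 for v in group)
    return ss_within / ss_total


# Hypothetical values: each fingerprint maps to [agriculture, forest] samples
fingerprints = {
    "Li": [[10.0, 12.0, 11.0], [30.0, 31.0, 29.0]],   # well separated
    "Ca": [[10.0, 20.0, 30.0], [15.0, 25.0, 35.0]],   # heavily overlapping
}

lambdas = {name: wilks_lambda_univariate(groups) for name, groups in fingerprints.items()}

# First step of a forward selection: pick the fingerprint with the lowest lambda
first_selected = min(lambdas, key=lambdas.get)
```

Later steps of a stepwise procedure would add the fingerprint that most reduces the multivariate statistic, which requires determinants of the within-group and total covariance matrices and is omitted here.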

Table S7: Model evaluation metrics grouped by sampling design and source using the design-specific mixtures.

Parameter	Grid Agriculture	Grid Forest	Transect Agriculture	Transect Forest	Likely to erode Agriculture	Likely to erode Forest
Residuals
MAE50	0.01	0.01	0.02	0.02	0.01	0.01
MAE95	0.00	0.00	0.00	0.00	0.00	0.00
ME50	0.00	0.00	0.00	0.00	0.00	0.00
ME95	0.00	0.00	0.00	0.00	0.00	0.00
Performance
NSE50	0.99	0.99	0.99	0.99	0.99	0.99
NSE95	1.00	1.00	1.00	1.00	1.00	1.00
CRPS	0.02	0.02	0.02	0.02	0.01	0.01
Contingency
CSI50	0.93	0.92	0.91	0.91	0.93	0.93
CSI95	0.82	0.86	0.81	0.80	0.87	0.87
HR50	0.98	0.99	0.97	0.98	0.99	0.99
HR95	1.00	1.00	1.00	1.00	1.00	1.00
Uncertainty
W50	0.06	0.06	0.08	0.08	0.06	0.06
W95	0.18	0.18	0.24	0.24	0.18	0.18
P50	0.25	0.25	0.25	0.25	0.25	0.25
P95	0.48	0.47	0.48	0.47	0.47	0.48
