Back to Article
MixSIAR results plotting and model performance - composite mixtures
Download Source

MixSIAR results plotting and model performance - composite mixtures

Author

Alex Koiter

Load Libraries

In [1]:
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.0     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(patchwork)
library(scoringRules)
library(viridis)
Loading required package: viridisLite
library(gt)

Load data

In [2]:
all_grid <- read_csv(here::here("./notebooks/Mixtures_all_grid/grid_final_results.csv")) %>%
  mutate(sampling_design = "Grid")
Rows: 63000 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): model_filename, run
dbl (4): prop_forest, alpha.prior, Agriculture, Forest
lgl (4): mix, source, discr, model

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
all_transect <- read_csv(here::here("./notebooks/Mixtures_all_transect/transect_final_results.csv")) %>%
  mutate(sampling_design = "Transect")
Rows: 63000 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): model_filename, run
dbl (4): prop_forest, alpha.prior, Agriculture, Forest
lgl (4): mix, source, discr, model

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
all_likely <- read_csv(here::here("./notebooks/Mixtures_all_likely/likely_final_results.csv")) %>%
  mutate(sampling_design = "Likely to erode")
Rows: 63000 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): model_filename, run
dbl (4): prop_forest, alpha.prior, Agriculture, Forest
lgl (4): mix, source, discr, model

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
all <- all_grid %>%
  bind_rows(all_transect, all_likely) %>%
  mutate(sampling_design = fct_relevel(sampling_design, "Grid", "Transect", "Likely to erode"))

Plotting

In [3]:
mixing_plot1 <- ggplot(data = all, aes(x = as.factor(prop_forest), y = Forest, fill = sampling_design)) +
  geom_boxplot(size = 0.1, outlier.size = 0.1) +
  theme_bw() +
  scale_y_continuous(expand = c(0,0.01)) +
  labs(y = "Modelled Forest Proportion", x ="Virtual Mixture Forest Proportion") +
  theme(legend.position = "bottom", legend.title = element_blank()) +
  scale_fill_viridis_d()
mixing_plot1
Comparison of the posterior distribution of the modeled proportion of forest source to the proportion of forest source in the virtual mixtures for each of the three sampling designs.
In [4]:
mixing_plot2 <- ggplot(data = all, aes(x = as.factor(prop_forest), y = Forest - prop_forest, fill = sampling_design)) +
  geom_hline(yintercept = 0) +
  geom_boxplot(size = 0.1, outlier.size = 0.1) +
  theme_bw() +
  scale_y_continuous(expand = c(0,0.01), limits = c(-0.5, 0.5)) +
  labs(y = "Proportion Difference (Modelled - Mixture)", x ="Virtual Mixture Forest Proportion") +
  theme(legend.position = "bottom", legend.title = element_blank()) +
  scale_fill_viridis_d() +
  annotate(geom = "text", x = "0.5", y = 0.5, label = "Over estimation", vjust = 1) +
  annotate(geom = "text", x = "0.5", y = -0.5, label = "Under estimation", vjust = 0)
mixing_plot2
Differences in the proportions between modeled and virtual mixtures.
In [5]:
all %>%
  select(prop_forest, Agriculture, Forest, sampling_design) %>%
  mutate(diff = Forest - prop_forest) %>%
  group_by(sampling_design, prop_forest) %>%
  summarize(median = median(diff)) %>%
  print(n=63)
`summarise()` has grouped output by 'sampling_design'. You can override using
the `.groups` argument.
# A tibble: 63 × 3
# Groups:   sampling_design [3]
   sampling_design prop_forest   median
   <fct>                 <dbl>    <dbl>
 1 Grid                   0     0.0731 
 2 Grid                   0.05  0.0611 
 3 Grid                   0.1   0.0547 
 4 Grid                   0.15  0.0474 
 5 Grid                   0.2   0.0414 
 6 Grid                   0.25  0.0368 
 7 Grid                   0.3   0.0278 
 8 Grid                   0.35  0.0236 
 9 Grid                   0.4   0.0180 
10 Grid                   0.45  0.0143 
11 Grid                   0.5   0.00657
12 Grid                   0.55  0.00424
13 Grid                   0.6  -0.00145
14 Grid                   0.65 -0.00378
15 Grid                   0.7  -0.0108 
16 Grid                   0.75 -0.0136 
17 Grid                   0.8  -0.0210 
18 Grid                   0.85 -0.0243 
19 Grid                   0.9  -0.0329 
20 Grid                   0.95 -0.0419 
21 Grid                   1    -0.0601 
22 Transect               0     0.0782 
23 Transect               0.05  0.0612 
24 Transect               0.1   0.0545 
25 Transect               0.15  0.0459 
26 Transect               0.2   0.0483 
27 Transect               0.25  0.0462 
28 Transect               0.3   0.0440 
29 Transect               0.35  0.0393 
30 Transect               0.4   0.0418 
31 Transect               0.45  0.0384 
32 Transect               0.5   0.0387 
33 Transect               0.55  0.0372 
34 Transect               0.6   0.0346 
35 Transect               0.65  0.0320 
36 Transect               0.7   0.0321 
37 Transect               0.75  0.0330 
38 Transect               0.8   0.0257 
39 Transect               0.85  0.0174 
40 Transect               0.9   0.00477
41 Transect               0.95 -0.0212 
42 Transect               1    -0.0538 
43 Likely to erode        0     0.0895 
44 Likely to erode        0.05  0.0693 
45 Likely to erode        0.1   0.0596 
46 Likely to erode        0.15  0.0526 
47 Likely to erode        0.2   0.0392 
48 Likely to erode        0.25  0.0283 
49 Likely to erode        0.3   0.0277 
50 Likely to erode        0.35  0.0226 
51 Likely to erode        0.4   0.0209 
52 Likely to erode        0.45  0.0217 
53 Likely to erode        0.5   0.0233 
54 Likely to erode        0.55  0.0222 
55 Likely to erode        0.6   0.0276 
56 Likely to erode        0.65  0.0260 
57 Likely to erode        0.7   0.0257 
58 Likely to erode        0.75  0.0226 
59 Likely to erode        0.8   0.0203 
60 Likely to erode        0.85  0.0147 
61 Likely to erode        0.9   0.00542
62 Likely to erode        0.95 -0.0148 
63 Likely to erode        1    -0.0465 
plot1 <- all %>%
  select(prop_forest, Agriculture, Forest, sampling_design) %>%
  pivot_longer(cols = c(Agriculture,  Forest), names_to = "source", values_to = "value") %>%
  group_by(prop_forest, sampling_design, source) %>%
  summarise(`0.5 quantile` = median(value),
            `0.25 quantile` = quantile(value, 0.25),
            `0.75 quantile` = quantile(value, 0.75),
            `0.975 quantile` = quantile(value, 0.975),
            `0.025 quantile` = quantile(value, 0.025)) %>%
  pivot_longer(cols = `0.5 quantile`:`0.025 quantile`, names_to = "quartile", values_to = "value") %>%
  #filter(source == "Forest") %>%
  pivot_wider(names_from = sampling_design, values_from = value)
`summarise()` has grouped output by 'prop_forest', 'sampling_design'. You can
override using the `.groups` argument.
plot1 %>%
  group_by(source, quartile) %>%
  summarise(r = round(cor(Transect, `Likely to erode`), 3)) 
`summarise()` has grouped output by 'source'. You can override using the
`.groups` argument.
# A tibble: 10 × 3
# Groups:   source [2]
   source      quartile           r
   <chr>       <chr>          <dbl>
 1 Agriculture 0.025 quantile 0.999
 2 Agriculture 0.25 quantile  0.999
 3 Agriculture 0.5 quantile   0.999
 4 Agriculture 0.75 quantile  0.999
 5 Agriculture 0.975 quantile 0.999
 6 Forest      0.025 quantile 0.999
 7 Forest      0.25 quantile  0.999
 8 Forest      0.5 quantile   0.999
 9 Forest      0.75 quantile  0.999
10 Forest      0.975 quantile 0.999
plot1 %>%
  group_by(source, quartile) %>%
  summarise(r = round(cor(Grid, `Likely to erode`), 3)) 
`summarise()` has grouped output by 'source'. You can override using the
`.groups` argument.
# A tibble: 10 × 3
# Groups:   source [2]
   source      quartile           r
   <chr>       <chr>          <dbl>
 1 Agriculture 0.025 quantile 0.995
 2 Agriculture 0.25 quantile  0.999
 3 Agriculture 0.5 quantile   0.999
 4 Agriculture 0.75 quantile  0.999
 5 Agriculture 0.975 quantile 0.997
 6 Forest      0.025 quantile 0.997
 7 Forest      0.25 quantile  0.999
 8 Forest      0.5 quantile   0.999
 9 Forest      0.75 quantile  0.999
10 Forest      0.975 quantile 0.995
plot1 %>%
  group_by(source, quartile) %>%
  summarise(r = round(cor(Transect, Grid), 3)) 
`summarise()` has grouped output by 'source'. You can override using the
`.groups` argument.
# A tibble: 10 × 3
# Groups:   source [2]
   source      quartile           r
   <chr>       <chr>          <dbl>
 1 Agriculture 0.025 quantile 0.993
 2 Agriculture 0.25 quantile  0.998
 3 Agriculture 0.5 quantile   0.999
 4 Agriculture 0.75 quantile  0.999
 5 Agriculture 0.975 quantile 0.999
 6 Forest      0.025 quantile 0.999
 7 Forest      0.25 quantile  0.999
 8 Forest      0.5 quantile   0.999
 9 Forest      0.75 quantile  0.998
10 Forest      0.975 quantile 0.993
p1 <- ggplot(data = plot1, aes(x = Grid, y = Transect, colour = source)) +
  geom_point() +
  theme_bw() +
  geom_abline(slope = 1) +
  labs(x = "Grid", y = "Transect") +
  scale_y_continuous(expand = c(0,0.01)) +
  scale_x_continuous(expand = c(0,0.01)) +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
  scale_colour_manual(values = c("#C5B477", "#439078")) +
  facet_wrap(~quartile, ncol = 5)

p2 <- ggplot(data = plot1, aes(x = Grid, y = `Likely to erode`, colour = source)) +
  geom_point() +
  theme_bw() +
  geom_abline(slope = 1) +
  labs(x = "Grid", y = "Likely to erode") +
  scale_y_continuous(expand = c(0,0.01)) +
  scale_x_continuous(expand = c(0,0.01)) +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
  scale_colour_manual(values = c("#C5B477", "#439078")) +
  facet_wrap(~quartile, ncol = 5)

p3 <- ggplot(data = plot1, aes(x = Transect, y = `Likely to erode`, colour = source)) +
  geom_point() +
  theme_bw() +
  geom_abline(slope = 1) +
  labs(x = "Transect",y = "Likely to erode") +
  scale_y_continuous(expand = c(0,0.01)) +
  scale_x_continuous(expand = c(0,0.01)) +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
  scale_colour_manual(values = c("#C5B477", "#439078")) +
  facet_wrap(~quartile, ncol = 5)

mixing_plot3 <- p1 / p2 / p3 + plot_layout(guides = "collect") & theme(legend.position = 'bottom', legend.title = element_blank())
mixing_plot3

#ggsave(plot = mixing_plot3, filename = "Figures R1/Mixture_all_1to1.png", width = 174, height = 200, units = "mm", dpi =600)

Model performance

In [6]:
summary_1 <- all %>%
  select(prop_forest, Agriculture, Forest, sampling_design) %>%
  pivot_longer(cols = c(Agriculture,  Forest), names_to = "source", values_to = "value") %>%
  group_by(sampling_design, prop_forest, source) %>%
  summarise(median = median(value),
            q25 = quantile(value, 0.25),
            q75 = quantile(value, 0.75),
            W50 = quantile(value, 0.75) - quantile(value, 0.25),
            W95 = quantile(value, 0.975) - quantile(value, 0.025),
            P50 = sum(value[value < quantile(value, 0.75) & value > quantile(value, 0.25)])/n(),
            P95 = sum(value[value < quantile(value, 0.975) & value > quantile(value, 0.025)])/n()) %>%
  group_by(sampling_design, source) %>%
  summarise(W50 = mean(W50),
            W95 = mean(W95),
            P50 = mean(P50),
            P95 = mean(P95))
`summarise()` has grouped output by 'sampling_design', 'prop_forest'. You can
override using the `.groups` argument.
`summarise()` has grouped output by 'sampling_design'. You can override using
the `.groups` argument.
summary_2 <- all %>%
  mutate(prop_ag = 1 - prop_forest) %>%
  group_by(prop_forest, sampling_design) %>%
  summarise(CRPS_forest = crps_sample(y = unique(prop_forest), dat = Forest),
            CRPS_ag = crps_sample(y = unique(prop_ag), dat = Agriculture)) %>%
  group_by(sampling_design) %>%
  summarise(Forest = mean(CRPS_forest),
            Agriculture = mean(CRPS_ag)) %>%
  pivot_longer(cols = c(Forest, Agriculture), names_to = "source", values_to = "CRPS")
`summarise()` has grouped output by 'prop_forest'. You can override using the
`.groups` argument.
CRPS_plot <- all %>%
  mutate(prop_ag = 1 - prop_forest) %>%
  group_by(prop_forest, prop_ag, sampling_design) %>%
  summarise(Forest = crps_sample(y = unique(prop_forest), dat = Forest),
            Agriculture = crps_sample(y = unique(prop_ag), dat = Agriculture)) 
`summarise()` has grouped output by 'prop_forest', 'prop_ag'. You can override
using the `.groups` argument.
In [7]:
supfig8 <- ggplot(data = CRPS_plot, aes(x = prop_forest, y = Forest, fill = sampling_design)) +
  geom_line(aes(colour = sampling_design)) +
  geom_point(shape = 21) +
  theme_bw() +
  scale_y_continuous(expand = c(0,0.01)) +
  scale_x_continuous(expand = c(0,0.01)) +
  scale_fill_viridis_d() +
  scale_colour_viridis_d() +
  theme(legend.position = "bottom", legend.title = element_blank()) +
  labs(x = "Virtual Mixture Forest Proportion", y = "CRPS") 
supfig8

saveRDS(supfig8, file = here::here("./images/supfig8.rds"))
# ggsave(plot = mixing_plot4, filename = "Figures R1/Mixture_all_CRPS.png", width = 84, height = 84, units = "mm", dpi =600)
In [8]:
summary_3 <- all %>%
  select(prop_forest, Agriculture, Forest, sampling_design) %>%
  #mutate(prop_ag = 1 - prop_forest) %>%
  pivot_longer(cols = c(Agriculture,  Forest), names_to = "source", values_to = "value") %>%
  #pivot_longer(cols = c(prop_forest,  prop_ag), names_to = "proportion", values_to = "actual") %>%
  group_by(sampling_design, source, prop_forest) %>%
  mutate(error50 = ifelse(value >= quantile(value, 0.25) & value <= quantile(value, 0.75), 0,
                     ifelse(value < quantile(value, 0.25), quantile(value, 0.25) - value,
                            quantile(value, 0.75) - value)),
         error95 = ifelse(value >= quantile(value, 0.025) & value <= quantile(value, 0.975), 0,
                          ifelse(value < quantile(value, 0.025), quantile(value, 0.025) - value,
                                 quantile(value, 0.975) - value)),
         Main = value > .5,
         Main50 = quantile(value, 0.75) > .5,
         Main95 = quantile(value, 0.975) > .5,
         Hit50 = ifelse(Main == T & Main50 == T, T, F),
         Hit95 = ifelse(Main == T & Main95 == T, T, F),
         Miss50 = ifelse(Main == T & Main50 == F, T, F),
         Miss95 = ifelse(Main == T & Main95 == F, T, F),
         FA50 = ifelse(Main == F & Main50 == T, T, F),
         FA95 = ifelse(Main == F & Main95 == T, T, F)) %>%
  group_by(sampling_design, source) %>%
  summarise(MAE50 = mean(abs(error50)),
            MAE95 = mean(abs(error95)),
            ME50 = mean(error50),
            ME95 = mean(error95),
            NSE50 = 1 - (sum(error50^2)) / sum((value - mean(value))^2),
            NSE95 = 1 - (sum(error95^2)) / sum((value - mean(value))^2),
            CSI50 = sum(Hit50)/(sum(Hit50) + sum(Miss50) + sum(FA50)),
            CSI95 = sum(Hit95)/(sum(Hit95) + sum(Miss95) + sum(FA95)),
            HR50 = sum(Hit50) / (sum(Hit50) + sum(Miss50)),
            HR95 = sum(Hit95) / (sum(Hit95) + sum(Miss95)))
`summarise()` has grouped output by 'sampling_design'. You can override using
the `.groups` argument.
In [9]:
summary_all <- summary_3 %>%
  right_join(summary_2) %>%
  right_join(summary_1) %>% 
  ungroup() %>%
  mutate(analyis = paste(sampling_design, source)) %>%
  select(-sampling_design, -source) %>%
  pivot_longer(cols = MAE50:P95, names_to = "Parameter", values_to = "value") %>%
  mutate(value = format(round(value, 2), digits = 2)) %>%
  pivot_wider(names_from = analyis, values_from = value) %>%
  mutate(`Evaluation criteria` = case_when(Parameter %in% c("P50", "P95", "W50", "W95") ~ "Uncertainty",
                                           Parameter %in% c("MAE50", "MAE95", "ME50", "ME95") ~ "Residuals",
                                           Parameter %in% c("CRPS", "NSE50", "NSE95") ~ "Performance",
                                           Parameter %in% c("CSI50", "CSI95", "HR50", "HR95") ~ "Contingency"))
Joining with `by = join_by(sampling_design, source)`
Joining with `by = join_by(sampling_design, source)`
summary_all
# A tibble: 15 × 8
   Parameter `Grid Agriculture` `Grid Forest` `Transect Agriculture`
   <chr>     <chr>              <chr>         <chr>                 
 1 MAE50     0.01               0.01          0.02                  
 2 MAE95     0.00               0.00          0.00                  
 3 ME50      0.00               0.00          0.00                  
 4 ME95      0.00               0.00          0.00                  
 5 NSE50     0.99               0.99          0.98                  
 6 NSE95     1.00               1.00          1.00                  
 7 CSI50     0.92               0.93          0.86                  
 8 CSI95     0.86               0.82          0.75                  
 9 HR50      0.99               0.98          0.99                  
10 HR95      1.00               1.00          1.00                  
11 CRPS      0.02               0.02          0.03                  
12 W50       0.06               0.06          0.09                  
13 W95       0.18               0.18          0.27                  
14 P50       0.25               0.25          0.23                  
15 P95       0.47               0.48          0.44                  
# ℹ 4 more variables: `Transect Forest` <chr>,
#   `Likely to erode Agriculture` <chr>, `Likely to erode Forest` <chr>,
#   `Evaluation criteria` <chr>
In [10]:
In [11]:

summary_all |>
  group_by(`Evaluation criteria`) |>
  gt()|>
  tab_style(style =  cell_text(weight = "bold", align = "center"), locations =  cells_row_groups()) |>
  tab_options(column_labels.font.weight = "bold")
Model evaluation metrics grouped by sampling design and source.
Parameter Grid Agriculture Grid Forest Transect Agriculture Transect Forest Likely to erode Agriculture Likely to erode Forest
Residuals
MAE50 0.01 0.01 0.02 0.02 0.02 0.02
MAE95 0.00 0.00 0.00 0.00 0.00 0.00
ME50 0.00 0.00 0.00 0.00 0.00 0.00
ME95 0.00 0.00 0.00 0.00 0.00 0.00
Performance
NSE50 0.99 0.99 0.98 0.98 0.98 0.98
NSE95 1.00 1.00 1.00 1.00 1.00 1.00
CRPS 0.02 0.02 0.03 0.03 0.02 0.02
Contingency
CSI50 0.92 0.93 0.86 0.90 0.88 0.89
CSI95 0.86 0.82 0.75 0.80 0.77 0.78
HR50 0.99 0.98 0.99 0.98 0.98 0.99
HR95 1.00 1.00 1.00 1.00 1.00 1.00
Uncertainty
W50 0.06 0.06 0.09 0.09 0.09 0.09
W95 0.18 0.18 0.27 0.27 0.26 0.26
P50 0.25 0.25 0.23 0.27 0.24 0.26
P95 0.47 0.48 0.44 0.51 0.45 0.50