---
title: "Landscape Metric Covariates"
author: "Bill Peterman"
output:
  rmarkdown::html_vignette:
    number_sections: true
    toc: true
vignette: >
  %\VignetteIndexEntry{Landscape Metric Covariates}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
library(multiScaleR)
library(terra)

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 6.5,
  fig.height = 4.25,
  warning = FALSE,
  message = FALSE
)

pkg_extdata <- function(...) {
  src_path <- normalizePath(file.path("..", "inst", "extdata", ...),
                            winslash = "/",
                            mustWork = FALSE)
  if (file.exists(src_path)) {
    return(src_path)
  }

  inst_path <- system.file("extdata", ..., package = "multiScaleR")
  if (nzchar(inst_path) && file.exists(inst_path)) {
    return(inst_path)
  }

  stop("Could not locate required extdata file: ", paste(..., collapse = "/"))
}
```

# Why explicit covariate specifications?

Traditional `multiScaleR` workflows assume that each raster layer becomes one
optimized, kernel-weighted covariate. That remains the default behavior. For
landscape metrics, however, the relationship between raster layers and model
covariates is no longer one-to-one.

For example, the same binary forest raster might be used to calculate:

- a distance-weighted proportion of forest cover,
- edge density within a fixed buffer, and
- edge density at an optimized scale.

Those are different covariates, even though they come from the same source
raster. The `msr_vars()` interface lets you name those derived covariates
explicitly.

---

# Landscape metric reference

`landscape_var()` supports 16 metrics grouped into three categories. All metrics
are computed within a circular buffer centered on each sample point (or on each
raster cell when projecting with `kernel_scale.raster()`). Metrics that operate
on categorical rasters expect integer class values; continuous rasters should
be classified before use.

## Composition metrics

These metrics describe the diversity and abundance of patch types within the
buffer. They require a categorical (integer-valued) raster.

| Code | Name | Description |
|------|------|-------------|
| `shdi` | Shannon Diversity Index | $-\sum p_i \ln p_i$, where $p_i$ is the proportion of class $i$. Increases with richness and evenness. The `base` argument controls the logarithm base (default: natural log). |
| `shei` | Shannon Evenness Index | SHDI / ln(richness). Ranges from 0 (one dominant class) to 1 (perfectly even). |
| `sidi` | Simpson Diversity Index | $1 - \sum p_i^2$. Probability that two randomly selected cells belong to different classes. Ranges from 0 to 1. |
| `siei` | Simpson Evenness Index | SIDI / (1 − 1/richness). Standardizes SIDI against maximum possible diversity; requires ≥ 2 classes. |
| `msidi` | Modified Simpson Diversity | $-\ln \sum p_i^2$. Logarithmic transformation of Simpson concentration; unbounded above. |
| `msiei` | Modified Simpson Evenness | MSIDI / ln(richness). Evenness analogue of MSIDI. |
| `pr` | Patch Richness | Count of distinct classes present in the buffer. |
| `prd` | Patch Richness Density | PR per 100 ha of buffer area. |
| `rpr` | Relative Patch Richness | (PR / `classes_max`) × 100. Requires the `classes_max` argument. |
| `ta` | Total Area | Total area of the buffer in hectares. |

## Edge metrics

These metrics quantify the amount and complexity of boundaries between classes.
They require a categorical raster and use cell adjacency information.

| Code | Name | Description |
|------|------|-------------|
| `ed` | Edge Density | Total internal edge length (m) per hectare of buffer area. Higher values indicate a more fragmented landscape. |
| `te` | Total Edge | Total internal edge length in metres. |
| `lsi` | Landscape Shape Index | Total perimeter / minimum possible perimeter for the same number of cells. LSI = 1 for a perfectly compact (circular) patch; increases with irregularity. |

## Adjacency metrics

These metrics describe how often similar or dissimilar classes occur next to one
another. They require a categorical raster and use cell-pair counts.

| Code | Name | Description |
|------|------|-------------|
| `ai` | Aggregation Index | Area-weighted aggregation of like adjacencies within classes. Higher values indicate that classes are more aggregated. |
| `pladj` | Proportion of Like Adjacencies | Percentage of cell-pair boundaries shared between cells of the *same* class. High values indicate a clumped or aggregated landscape. |
| `contag` | Contagion | Measures the degree of clumping across all class pairs. Ranges from 0 (maximum interspersion) to 100 (single class). Requires ≥ 2 classes. |

---

# Load example data

This example uses the simulated count data and survey locations included with
`multiScaleR`. The data contain 100 spatial sample points, which is large enough
to demonstrate the workflow without making the vignette expensive to build.

```{r load-data}
data("landscape_counts")
dat <- landscape_counts

data("surv_pts")
pts <- vect(surv_pts)

land_rast <- rast(pkg_extdata("landscape.tif"))
```

The raster layer `land1` is a binary habitat layer. To demonstrate a categorical
landscape-diversity metric, we also create a simple three-class land-cover layer
from the continuous `land2` raster. In a real analysis, this would usually be a
mapped categorical land-cover product.

```{r build-landcover}
landcover <- classify(
  land_rast$land2,
  rcl = matrix(
    c(-Inf, -0.5, 1,
      -0.5,  0.5, 2,
       0.5,  Inf, 3),
    ncol = 3,
    byrow = TRUE
  )
)
names(landcover) <- "landcover"

metric_rasters <- c(land_rast$land1, landcover)
```

```{r plot-inputs, fig.cap = "***Source rasters used to derive model covariates.***"}
plot(metric_rasters)
plot(metric_rasters$land1, main = "land1 with survey points")
points(pts, pch = 19, cex = 0.6, col = "red")
```

---

# Define derived covariates

Use `msr_vars()` to define model covariates as transformations of source raster
layers. Here, `land1_prop` is optimized as a kernel-weighted covariate, while
`land1_ed_500` and `landcover_shdi_500` are fixed-radius landscape metrics
calculated within 500 m of each point.

```{r define-vars}
scale_vars <- msr_vars(
  land1_prop = kernel_var("land1"),
  land1_ed_500 = landscape_var("land1", metric = "ed", radius = 500),
  landcover_shdi_500 = landscape_var("landcover", metric = "shdi", radius = 500)
)

scale_vars
```

This is the key distinction:

- `land1` and `landcover` are source raster layers.
- `land1_prop`, `land1_ed_500`, and `landcover_shdi_500` are model covariates.
- Only `land1_prop` receives an optimized sigma parameter in this example.

---

# Prepare model data

Pass the covariate specification to `kernel_prep()`. The returned
`kernel_dat` data frame contains the derived covariates, not merely the original
raster layer names.

```{r kernel-prep}
kernel_inputs <- kernel_prep(
  pts = pts,
  raster_stack = metric_rasters,
  max_D = 1200,
  kernel = "gaussian",
  scale_vars = scale_vars,
  verbose = FALSE
)

head(kernel_inputs$kernel_dat)
```

Combine the derived covariates with the response data and fit the starting
model.

```{r fit-model}
df <- data.frame(dat, kernel_inputs$kernel_dat)

mod <- glm(
  counts ~ site + land1_prop + land1_ed_500 + landcover_shdi_500,
  family = poisson(),
  data = df
)
```

---

# Optimize the kernel covariate

Because the two landscape metrics were defined with fixed radii, only
`land1_prop` is optimized. This keeps the example fast while still showing how
kernel-scaled and landscape-metric covariates can be included in the same model.

```{r optimize}
opt <- multiScale_optim(
  fitted_mod = mod,
  kernel_inputs = kernel_inputs,
  verbose = FALSE
)
```

```{r summary}
summary(opt)
```

---

# Project derived covariates to rasters

`kernel_scale.raster()` also understands `scale_vars`. When a fitted
`multiScaleR` object is supplied, optimized covariates use their fitted scale
and fixed-radius landscape metrics use their specified radius.

```{r project, fig.cap = "***Projected derived covariates.***"}
projected <- kernel_scale.raster(
  raster_stack = metric_rasters,
  multiScaleR = opt,
  scale_center = TRUE,
  clamp = TRUE,
  verbose = FALSE
)

plot(projected)
```

The resulting raster names match the model covariates:

```{r projected-names}
names(projected)
```

---

# Producing landscape metric rasters without a fitted model

`kernel_scale.raster()` can generate landscape metric raster surfaces directly,
without having fitted a model at all. This is useful when you want to explore
how a metric varies across the landscape at a given scale, compare metrics
side-by-side before committing to a model structure, or simply produce a
covariate map from a classified raster.

Pass a `scale_vars` object in which all covariates have a fixed `radius` (i.e.,
`radius` is not `NULL` in `landscape_var()`). No `sigma` argument is needed
because no scale is being optimized.

```{r standalone-metric-rasters, fig.cap = "***Landscape metric surfaces computed directly from the source rasters at a fixed 500 m radius.***", fig.height = 7}
## Define several metrics at a fixed 500 m radius
standalone_vars <- msr_vars(
  land1_ai    = landscape_var("land1",    metric = "ai",    radius = 500),
  land1_ed    = landscape_var("land1",    metric = "ed",    radius = 500),
  land1_lsi   = landscape_var("land1",    metric = "lsi",   radius = 500),
  land1_pladj = landscape_var("land1",    metric = "pladj", radius = 500),
  cover_shdi  = landscape_var("landcover", metric = "shdi",  radius = 500),
  cover_pr    = landscape_var("landcover", metric = "pr",    radius = 500),
  cover_sidi  = landscape_var("landcover", metric = "sidi",  radius = 500)
)

## Apply directly — no multiScaleR object or sigma values required
metric_surfaces <- kernel_scale.raster(
  raster_stack = metric_rasters,
  scale_vars   = standalone_vars,
  verbose      = FALSE
)

plot(metric_surfaces)
```

Each output layer is named after the covariate defined in `msr_vars()`:

```{r standalone-names}
names(metric_surfaces)
```

You can also compute a single metric in one line without defining a `msr_vars()`
object — simply wrap a single `landscape_var()` call:

```{r standalone-single, fig.cap = "***Edge density within a 300 m radius.***"}
ed_300 <- kernel_scale.raster(
  raster_stack = metric_rasters["land1"],
  scale_vars   = msr_vars(ed_300m = landscape_var("land1", metric = "ed", radius = 300)),
  verbose      = FALSE
)

plot(ed_300, main = "Edge density (m/ha) within 300 m")
```

Different radii produce notably different surfaces for the same metric. The
example below computes edge density at three scales and plots them together for
comparison:

```{r standalone-multiscale, fig.cap = "***Edge density at three spatial scales.***", fig.height = 3.5}
ed_multi <- kernel_scale.raster(
  raster_stack = metric_rasters["land1"],
  scale_vars   = msr_vars(
    ed_200m = landscape_var("land1", metric = "ed", radius = 200),
    ed_500m = landscape_var("land1", metric = "ed", radius = 500),
    ed_900m = landscape_var("land1", metric = "ed", radius = 900)
  ),
  verbose = FALSE
)

plot(ed_multi,
     main = c("ED — 200 m", "ED — 500 m", "ED — 900 m"))
```

---

# Optimizing landscape metric radii

To optimize the scale of a landscape metric, omit `radius` from
`landscape_var()`. For example:

```{r optimized-landscape-vars, eval = FALSE}
scale_vars_opt <- msr_vars(
  land1_prop = kernel_var("land1"),
  land1_ed = landscape_var("land1", metric = "ed"),
  landcover_shdi = landscape_var("landcover", metric = "shdi")
)
```

This is statistically meaningful, but it is more computationally expensive than
optimizing a kernel-weighted mean. At each candidate radius, the landscape metric
must be recalculated for every sample point. For moderate or large data sets,
consider first testing the workflow with a subset of points or using
precomputed results in reproducible documents.

---

# Practical guidance

Use fixed-radius landscape metrics when you have a biologically motivated buffer
distance or when the metric is expensive to recompute. Use optimized landscape
metric radii when the scale itself is a central question and computation time is
acceptable.

At present, optimized landscape metrics are not supported with
`kernel = "expow"` because the exponential-power shape parameter does not apply
cleanly to fixed-buffer landscape metrics.