Spatial Clustering of Points Data for Tidy-modeling — spatial_cluster

This is a wrapper function around spatial_clustering_cv from spatialsample. Spatial cluster sampling splits the data into V groups using partitioning (kmeans) or hierarchical (hclust) clustering of some variables, typically spatial coordinates. Each resample uses V-1 of the folds/clusters as the analysis set, while the assessment set contains the remaining fold/cluster. In basic spatial cross-validation (i.e. no repeats), the number of resamples is equal to V.

Usage

spatial_cluster_sample(
  data = data,
  coords = NULL,
  v = 10,
  spatial = TRUE,
  clust_method = "kmeans",
  dist_clust = NULL,
  ...
)

Arguments

data: input data set; one of sp, sf, or a data.frame with X and Y as variables
coords: (vector) the pair of coordinate variables, used when the data are aspatial (data.frame)
v: number of partitions of the data set, i.e. the number of clusters
spatial: (logical) whether the data set is spatial (sf or sp) or aspatial (data.frame)
clust_method: the clustering method, either partitioning ("kmeans", the default) or hierarchical ("hclust")
dist_clust: the agglomeration method to be used. This should be one of "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC). dist_clust corresponds to the method argument of stats::hclust
...: extra arguments passed on to stats::kmeans() or stats::hclust()
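
As a rough illustration of what the hierarchical option and an agglomeration method amount to, the sketch below clusters toy coordinates with stats::hclust() and cuts the tree into v groups with stats::cutree(). The data frame pts and the choice of "ward.D2" are hypothetical, and whether spatial_cluster_sample performs exactly these steps internally is an assumption.

```r
# Hypothetical toy data: 100 points with X and Y coordinates
set.seed(1318)
pts <- data.frame(X = runif(100), Y = runif(100))

v <- 10

# Pairwise Euclidean distances between the points
d <- stats::dist(pts[, c("X", "Y")])

# Hierarchical clustering; "ward.D2" is one of the agglomeration methods
# listed for dist_clust (passed as `method` to stats::hclust)
hc <- stats::hclust(d, method = "ward.D2")

# Cut the dendrogram into v clusters; each cluster would serve as one fold
fold_id <- stats::cutree(hc, k = v)

# Number of points per cluster
table(fold_id)
```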

Value

A tibble with classes spatial_clustering_cv, rset, tbl_df, tbl, and data.frame. The results include a column for the data split objects and an identification variable id.

Details

The variables in the coords argument (if the input data is a data.frame), or the coordinates extracted from sp or sf data, are used to cluster the data into disjoint sets. These clusters are used as the folds for cross-validation. Depending on how the data are distributed spatially, there may not be an equal number of points in each fold.
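
The idea of using clusters as folds can be sketched in base R as follows. This is only a minimal illustration under stated assumptions: the data frame pts, the value of v, and the list layout of splits are hypothetical, and the code is not the internal implementation of spatial_cluster_sample or spatialsample.

```r
# Hypothetical toy data: 200 points with X and Y coordinates
set.seed(1318)
pts <- data.frame(X = runif(200), Y = runif(200))

v <- 5

# Partition the coordinates into v clusters; each cluster acts as one fold
km <- stats::kmeans(pts[, c("X", "Y")], centers = v)
fold_id <- km$cluster

# For each fold k, the assessment set is cluster k and the
# analysis set is the remaining v - 1 clusters
splits <- lapply(seq_len(v), function(k) {
  list(
    analysis   = which(fold_id != k),
    assessment = which(fold_id == k)
  )
})

# Fold sizes depend on the spatial distribution of the points
table(fold_id)
```

Each element of splits holds the row indices of one resample, so the number of resamples equals v.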

Because this function relies heavily on spatialsample, the classes and attributes of the returned object are not modified from those produced by spatialsample. The same holds for repeated_spatial_cluster_sample.

References

A. Brenning, "Spatial cross-validation and bootstrap for the assessment of prediction rules in remote sensing: The R package sperrorest," 2012 IEEE International Geoscience and Remote Sensing Symposium, Munich, 2012, pp. 5372-5375, doi: 10.1109/IGARSS.2012.6352393.

Julia Silge (2021). spatialsample: Spatial Resampling Infrastructure. https://github.com/tidymodels/spatialsample, https://spatialsample.tidymodels.org.

Julia Silge, Fanny Chow, Max Kuhn and Hadley Wickham (2021). rsample: General Resampling Infrastructure. R package version 0.1.1. https://CRAN.R-project.org/package=rsample

Examples


# spatial point clustering

# read data

data("landcover")

# set a seed for reproducibility

set.seed(1318)

spc_fold <- spatial_cluster_sample(data = landcover, coords = NULL, v = 10, spatial = TRUE,
                                   clust_method = "kmeans")
#> Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 8.2.1; sf_use_s2() is TRUE
class(spc_fold)
#> [1] "spatial_clustering_cv" "rset"                  "tbl_df"               
#> [4] "tbl"                   "data.frame"           


if (FALSE) {
data("landcover")

scv <- spatial_cluster_sample(data = landcover, coords = NULL, v = 10, spatial = TRUE,
                              clust_method = "kmeans", dist_clust = NULL)
scv
}