Spatial Clustering of Points Data for Tidy-modeling
Source: R/spatial_cluster_sample.R
spatial_cluster_sample.Rd
This is a wrapper function around spatial_clustering_cv from spatialsample. Spatial cluster sampling splits the data into V groups using partitioning (kmeans) or hierarchical (hclust) clustering of some variables, typically spatial coordinates. Each resample's analysis set consists of V-1 of the folds/clusters, while the assessment set contains the remaining fold/cluster. In basic spatial cross-validation (i.e. no repeats), the number of resamples is equal to V.
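The fold construction described above can be sketched with base R alone. This is a minimal illustration under stated assumptions, not the package's implementation: the data frame pts, its columns X and Y, and the list resamples are hypothetical names invented for the example.

```r
# Minimal sketch of cluster-based folds using base R only.
set.seed(1318)

# Hypothetical point data with X/Y coordinates (not the landcover data set)
pts <- data.frame(
  X = runif(200, 0, 100),
  Y = runif(200, 0, 100)
)

v <- 5

# Partition the coordinates into v clusters; each cluster becomes one fold
km <- stats::kmeans(pts[, c("X", "Y")], centers = v)
pts$fold <- km$cluster

# Build one resample per fold: the held-out cluster is the assessment set,
# the remaining v - 1 clusters form the analysis set
resamples <- lapply(seq_len(v), function(k) {
  list(
    analysis   = pts[pts$fold != k, ],
    assessment = pts[pts$fold == k, ]
  )
})

length(resamples)   # one resample per fold
table(pts$fold)     # number of points in each fold
```

Every point appears in exactly one assessment set, which is what makes the folds usable for cross-validation.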
Usage
spatial_cluster_sample(
data = data,
coords = NULL,
v = 10,
spatial = TRUE,
clust_method = "kmeans",
dist_clust = NULL,
...
)
Arguments
- data
input data set: an sp or sf object, or a data.frame with X and Y as variables
- coords
(vector) pair of coordinates if data type is aspatial
- v
number of partitions of the data set or number of clusters
- spatial
(logical) if data set is spatial (when sf or sp) or aspatial (data.frame)
- clust_method
one of partitioning (default = kmeans) or hierarchical (hclust) clustering methods
- dist_clust
the agglomeration method to be used. This should be one of "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC). dist_clust corresponds to the method argument of stats::hclust.
- ...
Extra arguments passed on to stats::kmeans() or stats::hclust()
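To make the hierarchical option concrete, the sketch below shows how an agglomeration method such as "ward.D2" is supplied to stats::hclust and how the resulting tree can be cut into v groups. It uses base R only. The data frame pts is hypothetical, and the use of stats::cutree to obtain the groups is an assumption about how fold labels could be derived, not a statement about the wrapper's internals.

```r
# Minimal sketch: hierarchical clustering of coordinates into v folds.
set.seed(1318)

# Hypothetical point data (not the landcover data set)
pts <- data.frame(
  X = runif(200, 0, 100),
  Y = runif(200, 0, 100)
)

v <- 5

# Pairwise Euclidean distances between the points
d <- stats::dist(pts[, c("X", "Y")])

# The agglomeration method plays the role described for dist_clust
hc <- stats::hclust(d, method = "ward.D2")

# Cut the dendrogram into v groups (assumed step; labels run from 1 to v)
pts$fold <- stats::cutree(hc, k = v)

table(pts$fold)   # number of points in each fold
```

Swapping "ward.D2" for another of the listed methods changes how clusters are merged and therefore how the folds are shaped.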
Value
A tibble with classes spatial_clustering_cv, rset, tbl_df, tbl, and data.frame. The results include a column for the data split objects and an identification variable id.
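Objects of class rset are usually inspected with helpers from rsample. The sketch below uses an ordinary, non-spatial rsample::vfold_cv() result on the built-in mtcars data purely as a stand-in, on the assumption that the split column of a spatial rset can be accessed in the same way.

```r
library(rsample)

set.seed(1318)

# Stand-in rset: ordinary (non-spatial) V-fold CV on a built-in data set
folds <- vfold_cv(mtcars, v = 4)

names(folds)                      # "splits" and "id"

# Pull the first split object out of the list column
first_split <- folds$splits[[1]]

train <- analysis(first_split)    # rows from the other V-1 folds
test  <- assessment(first_split)  # rows from the held-out fold

# Together the two sets cover the full data set
nrow(train) + nrow(test) == nrow(mtcars)
```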
Details
The variables in the coords argument (if the input data is a data.frame), or the coordinates extracted from sp or sf data, are used to cluster the data into disjoint sets. These clusters are used as the folds for cross-validation. Depending on how the data are distributed spatially, the folds may not contain equal numbers of points.
Because this function relies heavily on spatialsample, the classes and attributes of the returned object are left unchanged from those produced by spatialsample. The same holds for repeated_spatial_cluster_sample.
References
A. Brenning, "Spatial cross-validation and bootstrap for the assessment of prediction rules in remote sensing: The R package sperrorest," 2012 IEEE International Geoscience and Remote Sensing Symposium, Munich, 2012, pp. 5372-5375, doi: 10.1109/IGARSS.2012.6352393.
Julia Silge (2021). spatialsample: Spatial Resampling Infrastructure. https://github.com/tidymodels/spatialsample, https://spatialsample.tidymodels.org.
Julia Silge, Fanny Chow, Max Kuhn and Hadley Wickham (2021). rsample: General Resampling Infrastructure. R package version 0.1.1. https://CRAN.R-project.org/package=rsample
Examples
# spatial point clustering
# read data
data("landcover")
# set the seed for reproducibility
set.seed(1318)
spc_fold <- spatial_cluster_sample(data = landcover, coords = NULL, v = 10,
                                   spatial = TRUE, clust_method = "kmeans")
#> Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 8.2.1; sf_use_s2() is TRUE
class(spc_fold)
#> [1] "spatial_clustering_cv" "rset" "tbl_df"
#> [4] "tbl" "data.frame"
if (FALSE) {
data("landcover")
scv <- spatial_cluster_sample(data = landcover, coords = NULL, v = 10,
                              spatial = TRUE, clust_method = "kmeans",
                              dist_clust = NULL)
scv
}