- Fixed bug where passing a polygon to `spatial_nndm_cv()` forced leave-one-out CV, rather than the intended sampling of prediction points from the polygon.
- `spatial_block_cv()` now adds an `expand_bbox` attribute to the resulting rset for compatibility with `rsample::reshuffle_rset()`.
- `autoplot.spatial_block_cv()` now plots the proper grid (using the new `expand_bbox` attribute).
- `spatial_block_cv()` gains an argument, `expand_bbox`, which represents the proportion a bounding box should be expanded by (each corner of the bounding box is expanded by `bbox_corner_value * expand_bbox`).
  - This is a breaking change for data in planar coordinate reference systems. Set `expand_bbox = 0` to obtain the previous behavior.
  - Data in geographic coordinates already had its bounding box expanded by the default of 0.00001.
  - This makes regularly spaced data less likely to fall precisely along grid lines (and therefore fall into two assessment sets), and makes geographic data more likely to fall within the constructed grid.
  - Thanks to Nikos on StackOverflow for reporting this behavior: https://stackoverflow.com/q/77374348/9625040
- `spatial_block_cv()` will now throw an error if observations are in multiple assessment folds (caused by observations, or observation centroids, falling precisely along grid polygon boundaries).
- In `spatial_nndm_cv()`, passing a single polygon (or multipolygon) to the `prediction_sites` argument will result in prediction sites being sampled from that polygon, rather than from its bounding box.
- `get_rsplit()` is now re-exported from the rsample package. This provides a more natural, pipe-able interface for accessing individual splits; `get_rsplit(rset, 1)` is identical to `rset$splits[[1]]`.
- `spatial_nndm_cv()` is a new function for nearest neighbor distance matching cross-validation, as described in Milà et al. 2022 (doi: 10.1111/2041-210X.13851). NNDM was first implemented in CAST (https://cran.r-project.org/package=CAST).
- `spatial_clustering_cv()` no longer accepts non-sf objects. Use `rsample::clustering_cv()` for these instead (#126).
- `spatial_clustering_cv()` now uses edge-to-edge distances, like the rest of the package, rather than centroids (#126).
- All functions now have a `repeats` argument, defaulting to 1, allowing for repeated cross-validation (#122, #125, #126).
- `spatial_clustering_cv()` now has a `distance_function` argument, set by default to `as.dist(sf::st_distance(x))` (#126).
- Outputs from `spatial_buffer_vfold_cv()` should now have the correct `radius` and `buffer` attributes (#110).
- `spatial_buffer_vfold_cv()` now has the correct `id` values when using repeats (#116).
- `spatial_buffer_vfold_cv()` now throws an error when `repeats > 1 && v >= nrow(data)` (#116).
- The minimum `sf` version required is now `>= 1.0-9`, so that unit objects can be passed to `cellsize` in `spatial_block_cv()` (#113; #124).
- `autoplot()` now handles repeated cross-validation properly (#123).
- Mike Mahoney is taking over as package maintainer, as Julia Silge (who remains a package author) moves to focus on ModelOps work.
- Functions will now return rsplits without `out_id`, like most rsample functions, whenever `buffer` is `NULL`.
- `spatial_block_cv()`, `spatial_buffer_vfold_cv()`, and buffering now support sf or sfc objects with a missing CRS. Data with an `NA` CRS is assumed to be projected, with all distance values in the same unit as the projection; trying to use alternative units will fail. Set a CRS if these assumptions aren't correct.
- `spatial_buffer_vfold_cv()` and buffering no longer support tibble or data.frame inputs (they now require sf or sfc objects). These inputs were never easy to use and should always have caused an error; use `rsample::vfold_cv()` instead, or transform your data into an sf object.
- `spatial_buffer_vfold_cv()` has had some attribute changes to match `rsample`:
  - The `strata` attribute is now the name of the column used for stratification, or is not set if there was no stratification.
  - `pool` and `breaks` have been added as attributes.
  - `radius` and `buffer` are now set to 0 if they were passed as `NULL`.
- `spatial_buffer_vfold_cv()` is a new function which wraps `rsample::vfold_cv()`, allowing users to add inclusion radii and exclusion buffers to their vfold resamples. This is the supported way to perform spatially buffered leave-one-out cross-validation (set `v` to `nrow(data)`).
- `spatial_leave_location_out_cv()` is a new function which wraps `rsample::group_vfold_cv()`, allowing users to add inclusion radii and exclusion buffers to their vfold resamples.
- `spatial_block_cv()` is a new function for performing spatial block cross-validation. It currently supports randomly assigning blocks to folds.
- `spatial_clustering_cv()` gains an argument, `cluster_function`, which specifies what type of clustering to perform. `cluster_function = "kmeans"`, the default, uses `stats::kmeans()` for k-means clustering, while `cluster_function = "hclust"` uses `stats::hclust()` for hierarchical clustering. Users can also provide their own clustering function.
- `spatial_clustering_cv()` now supports `sf` objects! Coordinates are inferred automatically when using `sf` objects, and anything passed to `coords` will be ignored with a warning. Clusters made using `sf` objects will take coordinate reference systems into account (using `sf::st_distance()`), unlike those made using data frames.
- All resampling functions now support spatial buffering using two arguments. `radius` lets you specify an inclusion radius for your test set, where any data within `radius` of the original assessment set will be added to the assessment set. `buffer` specifies an exclusion buffer around the test set, where any data within `buffer` of the assessment set (after `radius` is applied) will be excluded from both sets.
- `autoplot()` now has a method for spatial resamples built from `sf` objects. It works both on `rset` objects and on `rsplit` objects, and has a special method for outputs from `spatial_block_cv()`.
- `boston_canopy` is a new dataset with data on tree canopy change over time in Boston, Massachusetts, USA. It uses a projected coordinate reference system and US customary units; see `?boston_canopy` for instructions on how to install these into your PROJ installation if needed.
- The "Getting Started" vignette has been revised to demonstrate the new features and clustering methods.
- A new vignette has been added walking through the spatial buffering process.
- R versions before 3.4 are no longer supported.
- `glue`, `sf`, and `units` have been added to Imports.
- `ggplot2` has been moved to Imports. It had been in Suggests.
- `covr`, `gifski`, `lwgeom`, and `vdiffr` are now in Suggests.
- `rlang` now has a minimum version of 1.0.0 (was previously unversioned).
- Added a `NEWS.md` file to track changes to the package.
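
The `expand_bbox` arithmetic from the `spatial_block_cv()` entry above can be sketched in base R. This is a minimal illustration, not spatialsample's implementation. `expand_bbox_sketch()` and `cells_containing()` are hypothetical helpers, and the grid is reduced to one dimension with two cells. The sketch also assumes that "expanded" means each corner moves outward (lower corners down, upper corners up) by `abs(corner_value * expand_bbox)`. It shows how a regularly spaced point sitting exactly on a grid line (and so falling into two assessment sets) stops doing so once the box is expanded.

```r
# Hypothetical helper (not part of spatialsample): move each bounding-box
# corner outward by abs(corner_value * expand_bbox).
expand_bbox_sketch <- function(bbox, expand_bbox = 0.00001) {
  lower <- c("xmin", "ymin")
  upper <- c("xmax", "ymax")
  bbox[lower] <- bbox[lower] - abs(bbox[lower] * expand_bbox)
  bbox[upper] <- bbox[upper] + abs(bbox[upper] * expand_bbox)
  bbox
}

# Hypothetical helper: for 1-D points, count how many closed grid cells
# [break_i, break_(i+1)] contain each point; a point lying exactly on an
# interior grid line is counted in two cells.
cells_containing <- function(x, lo, hi, n_cells = 2) {
  breaks <- seq(lo, hi, length.out = n_cells + 1)
  lower <- head(breaks, -1)
  upper <- tail(breaks, -1)
  vapply(x, function(p) sum(p >= lower & p <= upper), integer(1))
}

bbox <- c(xmin = 2, ymin = 2, xmax = 12, ymax = 12)
x <- 2:12  # regularly spaced points along the x axis

# Unexpanded box: the interior grid line sits at x = 7, so point 7 is in two cells
cells_containing(x, bbox[["xmin"]], bbox[["xmax"]])

# Expanded by 1%: xmin/ymin become 1.98 and xmax/ymax become 12.12,
# which moves the interior grid line to 7.05
expanded <- expand_bbox_sketch(bbox, expand_bbox = 0.01)
cells_containing(x, expanded[["xmin"]], expanded[["xmax"]])
```

The 1% expansion is exaggerated so the shift is easy to see. The default of 0.00001 moves the grid lines much less.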
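
The `radius` and `buffer` rules from the spatial buffering entry above can be illustrated with a small base-R sketch. It is not the package's implementation, which works with sf geometries and sf distance functions. `apply_radius_buffer()` is a hypothetical helper that uses Euclidean distances between points and assumes "within" means "less than or equal to". Points inside the inclusion radius join the assessment set. Remaining points inside the exclusion buffer of that enlarged set are left out of both sets.

```r
# Hypothetical sketch (not spatialsample's implementation) of the
# radius/buffer rule, using plain Euclidean distances between points.
apply_radius_buffer <- function(coords, assessment, radius = NULL, buffer = NULL) {
  d <- as.matrix(dist(coords))  # pairwise distances between all points
  n <- nrow(coords)
  # Inclusion radius: add every point within `radius` (<=) of the original
  # assessment set to the assessment set
  if (!is.null(radius)) {
    near <- apply(d[, assessment, drop = FALSE] <= radius, 1, any)
    assessment <- sort(union(assessment, which(near)))
  }
  analysis <- setdiff(seq_len(n), assessment)
  # Exclusion buffer: drop every remaining point within `buffer` (<=) of the
  # enlarged assessment set, so it ends up in neither set
  if (!is.null(buffer)) {
    near <- apply(d[, assessment, drop = FALSE] <= buffer, 1, any)
    analysis <- setdiff(analysis, which(near))
  }
  list(analysis = analysis, assessment = assessment)
}

# Ten points spaced one unit apart along a line; point 5 is the original
# assessment set
coords <- cbind(x = 1:10, y = 0)
split <- apply_radius_buffer(coords, assessment = 5L, radius = 1, buffer = 2)
split$assessment  # points 4, 5, 6
split$analysis    # points 1, 9, 10 (points 2, 3, 7, 8 fall inside the buffer)
```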