Closed
Description
The problem
step_geodist computes distance with the Pythagorean theorem (plain Euclidean distance), yet the documentation examples pass latitude-longitude pairs. Euclidean distance is not valid for latitude-longitude pairs, because the ground distance covered by one degree of longitude shrinks toward the poles. As a result, step_geodist returns the same distance for both examples below, which is likely not what the user wants. The haversine formula is one way to calculate the great-circle distance that users more commonly expect.
If there is interest, I can open a pull request that keeps both options and adds a new is_lat_lon argument to step_geodist to choose between them.
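To make the proposal concrete, here is a minimal base-R sketch of the two calculations side by side. It is not the recipes implementation: the function names haversine_km and geo_dist_sketch are hypothetical, the is_lat_lon switch only mirrors the argument suggested above, and the 6371 km mean Earth radius is an assumption (it appears to match the pracma results in the example below).

```r
# Great-circle distance in kilometres using the haversine formula.
# earth_radius_km = 6371 is an assumed mean Earth radius.
haversine_km <- function(lat, lon, ref_lat, ref_lon, earth_radius_km = 6371) {
  to_rad <- pi / 180
  d_lat <- (ref_lat - lat) * to_rad
  d_lon <- (ref_lon - lon) * to_rad
  a <- sin(d_lat / 2)^2 +
    cos(lat * to_rad) * cos(ref_lat * to_rad) * sin(d_lon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * earth_radius_km * asin(pmin(1, sqrt(a)))
}

# Hypothetical dispatcher: haversine for latitude-longitude input,
# Euclidean (Pythagorean) distance otherwise.
geo_dist_sketch <- function(lat, lon, ref_lat, ref_lon, is_lat_lon = TRUE) {
  if (is_lat_lon) {
    haversine_km(lat, lon, ref_lat, ref_lon)
  } else {
    sqrt((lat - ref_lat)^2 + (lon - ref_lon)^2)
  }
}

near_pole <- geo_dist_sketch(88, 88, 89, 89)  # roughly 111 km
equator   <- geo_dist_sketch(0, 0, 1, 1)      # roughly 157 km
flat      <- geo_dist_sketch(0, 0, 1, 1, is_lat_lon = FALSE)  # sqrt(2), in degrees
```

With is_lat_lon = TRUE the two one-degree offsets give different distances, while the Euclidean branch returns sqrt(2) for both, in units of degrees rather than kilometres.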
Reproducible example
library(pracma) # for haversine function
library(recipes)
#> Loading required package: dplyr
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
#>
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#>
#> step
p1 <- data.frame(latitude = 88.0, longitude = 88.0)
ex_1 <- recipe( ~ ., data = p1) |>
  step_geodist(lat = latitude, lon = longitude,
               ref_lat = 89.0, ref_lon = 89.0) |>
  prep() |>
  bake(new_data = NULL)
p1 <- data.frame(latitude = 0.0, longitude = 0.0)
ex_2 <- recipe( ~ ., data = p1) |>
  step_geodist(lat = latitude, lon = longitude,
               ref_lat = 1.0, ref_lon = 1.0) |>
  prep() |>
  bake(new_data = NULL)
ex_1$geo_dist
#> [1] 1.414214
ex_2$geo_dist
#> [1] 1.414214
pracma::haversine(c(88.0, 88.0), c(89.0, 89.0))
#> [1] 111.2288
pracma::haversine(c(0.0, 0.0), c(1.0, 1.0))
#> [1] 157.2494
Created on 2021-06-14 by the reprex package (v2.0.0)
Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.1.0 (2021-05-18)
#> os Pop!_OS 20.04 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language en_CA:en
#> collate en_CA.UTF-8
#> ctype en_CA.UTF-8
#> tz America/Toronto
#> date 2021-06-14
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.1.0)
#> backports 1.2.1 2020-12-09 [1] CRAN (R 4.1.0)
#> class 7.3-19 2021-05-03 [4] CRAN (R 4.0.5)
#> cli 2.5.0 2021-04-26 [1] CRAN (R 4.1.0)
#> crayon 1.4.1 2021-02-08 [1] CRAN (R 4.1.0)
#> DBI 1.1.1 2021-01-15 [1] CRAN (R 4.1.0)
#> digest 0.6.27 2020-10-24 [1] CRAN (R 4.1.0)
#> dplyr * 1.0.6 2021-05-05 [1] CRAN (R 4.1.0)
#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.0)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.1.0)
#> fansi 0.5.0 2021-05-25 [1] CRAN (R 4.1.0)
#> fs 1.5.0 2020-07-31 [1] CRAN (R 4.1.0)
#> generics 0.1.0 2020-10-31 [1] CRAN (R 4.1.0)
#> glue 1.4.2 2020-08-27 [1] CRAN (R 4.1.0)
#> gower 0.2.2 2020-06-23 [1] CRAN (R 4.1.0)
#> highr 0.9 2021-04-16 [1] CRAN (R 4.1.0)
#> htmltools 0.5.1.1 2021-01-22 [1] CRAN (R 4.1.0)
#> ipred 0.9-11 2021-03-12 [1] CRAN (R 4.1.0)
#> knitr 1.33 2021-04-24 [1] CRAN (R 4.1.0)
#> lattice 0.20-44 2021-05-02 [4] CRAN (R 4.1.0)
#> lava 1.6.9 2021-03-11 [1] CRAN (R 4.1.0)
#> lifecycle 1.0.0 2021-02-15 [1] CRAN (R 4.1.0)
#> lubridate 1.7.10 2021-02-26 [1] CRAN (R 4.1.0)
#> magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.1.0)
#> MASS 7.3-54 2021-05-03 [4] CRAN (R 4.0.5)
#> Matrix 1.3-4 2021-06-01 [4] CRAN (R 4.1.0)
#> nnet 7.3-16 2021-05-03 [4] CRAN (R 4.0.5)
#> pillar 1.6.1 2021-05-16 [1] CRAN (R 4.1.0)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.0)
#> pracma * 2.3.3 2021-01-23 [1] CRAN (R 4.1.0)
#> prodlim 2019.11.13 2019-11-17 [1] CRAN (R 4.1.0)
#> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.1.0)
#> R6 2.5.0 2020-10-28 [1] CRAN (R 4.1.0)
#> Rcpp 1.0.6 2021-01-15 [1] CRAN (R 4.1.0)
#> recipes * 0.1.16.9000 2021-06-14 [1] local
#> reprex 2.0.0 2021-04-02 [1] CRAN (R 4.1.0)
#> rlang 0.4.11 2021-04-30 [1] CRAN (R 4.1.0)
#> rmarkdown 2.8 2021-05-07 [1] CRAN (R 4.1.0)
#> rpart 4.1-15 2019-04-12 [4] CRAN (R 4.0.0)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.1.0)
#> stringi 1.6.2 2021-05-17 [1] CRAN (R 4.1.0)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.1.0)
#> styler 1.4.1 2021-03-30 [1] CRAN (R 4.1.0)
#> survival 3.2-11 2021-04-26 [4] CRAN (R 4.0.5)
#> tibble 3.1.2 2021-05-16 [1] CRAN (R 4.1.0)
#> tidyr 1.1.3 2021-03-03 [1] CRAN (R 4.1.0)
#> tidyselect 1.1.1 2021-04-30 [1] CRAN (R 4.1.0)
#> timeDate 3043.102 2018-02-21 [1] CRAN (R 4.1.0)
#> utf8 1.2.1 2021-03-12 [1] CRAN (R 4.1.0)
#> vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.1.0)
#> withr 2.4.2 2021-04-18 [1] CRAN (R 4.1.0)
#> xfun 0.23 2021-05-15 [1] CRAN (R 4.1.0)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.1.0)
#>
#> [1] /home/jonathankennel/R/x86_64-pc-linux-gnu-library/4.1
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library