step_geodist uses the Pythagorean theorem for latitude-longitude examples #725

Closed
jkennel opened this issue Jun 14, 2021 · 2 comments
Contributor

jkennel commented Jun 14, 2021


The problem

step_geodist computes distances with the Pythagorean theorem, yet its documentation examples pass latitude-longitude pairs. The Pythagorean theorem does not apply to latitude-longitude pairs: degrees are angles on a sphere, and the ground distance covered by one degree of longitude shrinks toward the poles. As a result, step_geodist returns the same distance for both examples below, which is likely not what the user wants. The haversine formula is one way to calculate the more common great-circle distance.

I can initiate a pull request with both options and a new is_lat_lon argument to step_geodist if interested.
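
To make the proposal concrete, here is a rough base-R sketch of what the two options could look like. The helper name `geo_dist_sketch`, the `radius` argument, and its default of 6371 km are assumptions made for illustration; this is not the recipes implementation, only one possible shape for an `is_lat_lon` switch.

```r
# Hypothetical helper (not part of recipes): distance from each (lat, lon)
# point to a single reference location.
geo_dist_sketch <- function(lat, lon, ref_lat, ref_lon,
                            is_lat_lon = TRUE, radius = 6371) {
  if (!is_lat_lon) {
    # Planar coordinates: the Pythagorean theorem applies directly,
    # and the result is in the same units as the inputs.
    return(sqrt((lat - ref_lat)^2 + (lon - ref_lon)^2))
  }

  # Convert degrees to radians before using trigonometric functions.
  to_rad <- pi / 180
  phi_1 <- lat * to_rad
  phi_2 <- ref_lat * to_rad
  d_phi <- (ref_lat - lat) * to_rad
  d_lambda <- (ref_lon - lon) * to_rad

  # Haversine formula on a sphere; radius is assumed to be in kilometres.
  a <- sin(d_phi / 2)^2 + cos(phi_1) * cos(phi_2) * sin(d_lambda / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1.
  2 * radius * asin(pmin(1, sqrt(a)))
}

near_pole    <- geo_dist_sketch(88, 88, 89, 89)  # great-circle distance, km
near_equator <- geo_dist_sketch(0, 0, 1, 1)      # great-circle distance, km
planar       <- geo_dist_sketch(0, 0, 1, 1, is_lat_lon = FALSE)
```

With the assumed radius of 6371 km, the two great-circle results differ from each other (roughly 111 km near the pole versus roughly 157 km at the equator), while the planar branch gives the same `sqrt(2)` for both pairs of points.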

Reproducible example

library(pracma)   # for haversine function
library(recipes)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
#> 
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#> 
#>     step

p1 <- data.frame(latitude = 88.0, longitude = 88.0)
ex_1 <- recipe( ~ ., data = p1) |>
  step_geodist(lat = latitude, lon = longitude,
               ref_lat = 89.0, ref_lon = 89.0) |>
  prep() |>
  bake(new_data = NULL)

p1 <- data.frame(latitude = 0.0, longitude = 0.0)
ex_2 <- recipe( ~ ., data = p1) |>
  step_geodist(lat = latitude, lon = longitude,
               ref_lat = 1.0, ref_lon = 1.0) |>
  prep() |>
  bake(new_data = NULL)


ex_1$geo_dist
#> [1] 1.414214
ex_2$geo_dist
#> [1] 1.414214

pracma::haversine(c(88.0, 88.0), c(89.0, 89.0))
#> [1] 111.2288
pracma::haversine(c(0.0, 0.0), c(1.0, 1.0))
#> [1] 157.2494

Created on 2021-06-14 by the reprex package (v2.0.0)

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.1.0 (2021-05-18)
#>  os       Pop!_OS 20.04 LTS           
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language en_CA:en                    
#>  collate  en_CA.UTF-8                 
#>  ctype    en_CA.UTF-8                 
#>  tz       America/Toronto             
#>  date     2021-06-14                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version     date       lib source        
#>  assertthat    0.2.1       2019-03-21 [1] CRAN (R 4.1.0)
#>  backports     1.2.1       2020-12-09 [1] CRAN (R 4.1.0)
#>  class         7.3-19      2021-05-03 [4] CRAN (R 4.0.5)
#>  cli           2.5.0       2021-04-26 [1] CRAN (R 4.1.0)
#>  crayon        1.4.1       2021-02-08 [1] CRAN (R 4.1.0)
#>  DBI           1.1.1       2021-01-15 [1] CRAN (R 4.1.0)
#>  digest        0.6.27      2020-10-24 [1] CRAN (R 4.1.0)
#>  dplyr       * 1.0.6       2021-05-05 [1] CRAN (R 4.1.0)
#>  ellipsis      0.3.2       2021-04-29 [1] CRAN (R 4.1.0)
#>  evaluate      0.14        2019-05-28 [1] CRAN (R 4.1.0)
#>  fansi         0.5.0       2021-05-25 [1] CRAN (R 4.1.0)
#>  fs            1.5.0       2020-07-31 [1] CRAN (R 4.1.0)
#>  generics      0.1.0       2020-10-31 [1] CRAN (R 4.1.0)
#>  glue          1.4.2       2020-08-27 [1] CRAN (R 4.1.0)
#>  gower         0.2.2       2020-06-23 [1] CRAN (R 4.1.0)
#>  highr         0.9         2021-04-16 [1] CRAN (R 4.1.0)
#>  htmltools     0.5.1.1     2021-01-22 [1] CRAN (R 4.1.0)
#>  ipred         0.9-11      2021-03-12 [1] CRAN (R 4.1.0)
#>  knitr         1.33        2021-04-24 [1] CRAN (R 4.1.0)
#>  lattice       0.20-44     2021-05-02 [4] CRAN (R 4.1.0)
#>  lava          1.6.9       2021-03-11 [1] CRAN (R 4.1.0)
#>  lifecycle     1.0.0       2021-02-15 [1] CRAN (R 4.1.0)
#>  lubridate     1.7.10      2021-02-26 [1] CRAN (R 4.1.0)
#>  magrittr      2.0.1       2020-11-17 [1] CRAN (R 4.1.0)
#>  MASS          7.3-54      2021-05-03 [4] CRAN (R 4.0.5)
#>  Matrix        1.3-4       2021-06-01 [4] CRAN (R 4.1.0)
#>  nnet          7.3-16      2021-05-03 [4] CRAN (R 4.0.5)
#>  pillar        1.6.1       2021-05-16 [1] CRAN (R 4.1.0)
#>  pkgconfig     2.0.3       2019-09-22 [1] CRAN (R 4.1.0)
#>  pracma      * 2.3.3       2021-01-23 [1] CRAN (R 4.1.0)
#>  prodlim       2019.11.13  2019-11-17 [1] CRAN (R 4.1.0)
#>  purrr         0.3.4       2020-04-17 [1] CRAN (R 4.1.0)
#>  R6            2.5.0       2020-10-28 [1] CRAN (R 4.1.0)
#>  Rcpp          1.0.6       2021-01-15 [1] CRAN (R 4.1.0)
#>  recipes     * 0.1.16.9000 2021-06-14 [1] local         
#>  reprex        2.0.0       2021-04-02 [1] CRAN (R 4.1.0)
#>  rlang         0.4.11      2021-04-30 [1] CRAN (R 4.1.0)
#>  rmarkdown     2.8         2021-05-07 [1] CRAN (R 4.1.0)
#>  rpart         4.1-15      2019-04-12 [4] CRAN (R 4.0.0)
#>  sessioninfo   1.1.1       2018-11-05 [1] CRAN (R 4.1.0)
#>  stringi       1.6.2       2021-05-17 [1] CRAN (R 4.1.0)
#>  stringr       1.4.0       2019-02-10 [1] CRAN (R 4.1.0)
#>  styler        1.4.1       2021-03-30 [1] CRAN (R 4.1.0)
#>  survival      3.2-11      2021-04-26 [4] CRAN (R 4.0.5)
#>  tibble        3.1.2       2021-05-16 [1] CRAN (R 4.1.0)
#>  tidyr         1.1.3       2021-03-03 [1] CRAN (R 4.1.0)
#>  tidyselect    1.1.1       2021-04-30 [1] CRAN (R 4.1.0)
#>  timeDate      3043.102    2018-02-21 [1] CRAN (R 4.1.0)
#>  utf8          1.2.1       2021-03-12 [1] CRAN (R 4.1.0)
#>  vctrs         0.3.8       2021-04-29 [1] CRAN (R 4.1.0)
#>  withr         2.4.2       2021-04-18 [1] CRAN (R 4.1.0)
#>  xfun          0.23        2021-05-15 [1] CRAN (R 4.1.0)
#>  yaml          2.2.1       2020-02-01 [1] CRAN (R 4.1.0)
#> 
#> [1] /home/jonathankennel/R/x86_64-pc-linux-gnu-library/4.1
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library
@juliasilge
Member

Closed in #728

github-actions bot commented Aug 4, 2021

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex https://reprex.tidyverse.org) and link to this issue.

github-actions bot locked and limited conversation to collaborators Aug 4, 2021