
step_geodist uses the Pythagorean theorem for latitude-longitude examples #725

@jkennel

Description


The problem

step_geodist computes distance with the Pythagorean theorem, yet the documentation examples pass latitude-longitude pairs. The Pythagorean theorem does not apply to latitude-longitude pairs: degrees are angles on a sphere, and the ground distance covered by one degree of longitude shrinks toward the poles. As a result, step_geodist returns the same distance (1.414214) for both examples below, although the great-circle distances differ substantially, which is likely not what the user wants. The haversine formula is one way to calculate the more common great-circle distance.

If there is interest, I can open a pull request that supports both calculations, selected with a new is_lat_lon argument to step_geodist.
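As a rough sketch of what that switch could look like, the standalone function below chooses between the two calculations. The name geo_dist_sketch, the radius_km argument, and the 6371 km mean Earth radius are assumptions made for illustration; none of this is the recipes implementation, and only is_lat_lon comes from the proposal above.

```r
# Hypothetical helper, not part of recipes: illustrates the proposed
# is_lat_lon switch. All inputs are numeric; lat/lon are in degrees.
geo_dist_sketch <- function(lat, lon, ref_lat, ref_lon,
                            is_lat_lon = TRUE, radius_km = 6371) {
  if (!is_lat_lon) {
    # Planar coordinates: plain Euclidean (Pythagorean) distance.
    return(sqrt((lat - ref_lat)^2 + (lon - ref_lon)^2))
  }

  # Convert degrees to radians before applying the haversine formula.
  to_rad <- pi / 180
  phi_1 <- lat * to_rad
  phi_2 <- ref_lat * to_rad
  d_phi <- (ref_lat - lat) * to_rad
  d_lambda <- (ref_lon - lon) * to_rad

  a <- sin(d_phi / 2)^2 + cos(phi_1) * cos(phi_2) * sin(d_lambda / 2)^2

  # pmin() guards against rounding pushing sqrt(a) slightly above 1.
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

near_pole <- geo_dist_sketch(88, 88, 89, 89)
at_equator <- geo_dist_sketch(0, 0, 1, 1)
planar <- geo_dist_sketch(0, 0, 1, 1, is_lat_lon = FALSE)
```

With is_lat_lon = TRUE the result is a great-circle distance in kilometers, so the near-pole and equator cases no longer coincide; with is_lat_lon = FALSE the function falls back to the Pythagorean calculation on the raw coordinates.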

Reproducible example

library(pracma)   # for haversine function
library(recipes)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
#> 
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#> 
#>     step

# Example 1: a point near the North Pole, reference one degree away in each direction
p1 <- data.frame(latitude = 88.0, longitude = 88.0)
ex_1 <- recipe( ~ ., data = p1) |>
  step_geodist(lat = latitude, lon = longitude,
               ref_lat = 89.0, ref_lon = 89.0) |>
  prep() |>
  bake(new_data = NULL)

# Example 2: a point on the equator, same one-degree offsets
p2 <- data.frame(latitude = 0.0, longitude = 0.0)
ex_2 <- recipe( ~ ., data = p2) |>
  step_geodist(lat = latitude, lon = longitude,
               ref_lat = 1.0, ref_lon = 1.0) |>
  prep() |>
  bake(new_data = NULL)


ex_1$geo_dist
#> [1] 1.414214
ex_2$geo_dist
#> [1] 1.414214

# Great-circle distances in km for the same two pairs of points
pracma::haversine(c(88.0, 88.0), c(89.0, 89.0))
#> [1] 111.2288
pracma::haversine(c(0.0, 0.0), c(1.0, 1.0))
#> [1] 157.2494

Created on 2021-06-14 by the reprex package (v2.0.0)

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.1.0 (2021-05-18)
#>  os       Pop!_OS 20.04 LTS           
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language en_CA:en                    
#>  collate  en_CA.UTF-8                 
#>  ctype    en_CA.UTF-8                 
#>  tz       America/Toronto             
#>  date     2021-06-14                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version     date       lib source        
#>  assertthat    0.2.1       2019-03-21 [1] CRAN (R 4.1.0)
#>  backports     1.2.1       2020-12-09 [1] CRAN (R 4.1.0)
#>  class         7.3-19      2021-05-03 [4] CRAN (R 4.0.5)
#>  cli           2.5.0       2021-04-26 [1] CRAN (R 4.1.0)
#>  crayon        1.4.1       2021-02-08 [1] CRAN (R 4.1.0)
#>  DBI           1.1.1       2021-01-15 [1] CRAN (R 4.1.0)
#>  digest        0.6.27      2020-10-24 [1] CRAN (R 4.1.0)
#>  dplyr       * 1.0.6       2021-05-05 [1] CRAN (R 4.1.0)
#>  ellipsis      0.3.2       2021-04-29 [1] CRAN (R 4.1.0)
#>  evaluate      0.14        2019-05-28 [1] CRAN (R 4.1.0)
#>  fansi         0.5.0       2021-05-25 [1] CRAN (R 4.1.0)
#>  fs            1.5.0       2020-07-31 [1] CRAN (R 4.1.0)
#>  generics      0.1.0       2020-10-31 [1] CRAN (R 4.1.0)
#>  glue          1.4.2       2020-08-27 [1] CRAN (R 4.1.0)
#>  gower         0.2.2       2020-06-23 [1] CRAN (R 4.1.0)
#>  highr         0.9         2021-04-16 [1] CRAN (R 4.1.0)
#>  htmltools     0.5.1.1     2021-01-22 [1] CRAN (R 4.1.0)
#>  ipred         0.9-11      2021-03-12 [1] CRAN (R 4.1.0)
#>  knitr         1.33        2021-04-24 [1] CRAN (R 4.1.0)
#>  lattice       0.20-44     2021-05-02 [4] CRAN (R 4.1.0)
#>  lava          1.6.9       2021-03-11 [1] CRAN (R 4.1.0)
#>  lifecycle     1.0.0       2021-02-15 [1] CRAN (R 4.1.0)
#>  lubridate     1.7.10      2021-02-26 [1] CRAN (R 4.1.0)
#>  magrittr      2.0.1       2020-11-17 [1] CRAN (R 4.1.0)
#>  MASS          7.3-54      2021-05-03 [4] CRAN (R 4.0.5)
#>  Matrix        1.3-4       2021-06-01 [4] CRAN (R 4.1.0)
#>  nnet          7.3-16      2021-05-03 [4] CRAN (R 4.0.5)
#>  pillar        1.6.1       2021-05-16 [1] CRAN (R 4.1.0)
#>  pkgconfig     2.0.3       2019-09-22 [1] CRAN (R 4.1.0)
#>  pracma      * 2.3.3       2021-01-23 [1] CRAN (R 4.1.0)
#>  prodlim       2019.11.13  2019-11-17 [1] CRAN (R 4.1.0)
#>  purrr         0.3.4       2020-04-17 [1] CRAN (R 4.1.0)
#>  R6            2.5.0       2020-10-28 [1] CRAN (R 4.1.0)
#>  Rcpp          1.0.6       2021-01-15 [1] CRAN (R 4.1.0)
#>  recipes     * 0.1.16.9000 2021-06-14 [1] local         
#>  reprex        2.0.0       2021-04-02 [1] CRAN (R 4.1.0)
#>  rlang         0.4.11      2021-04-30 [1] CRAN (R 4.1.0)
#>  rmarkdown     2.8         2021-05-07 [1] CRAN (R 4.1.0)
#>  rpart         4.1-15      2019-04-12 [4] CRAN (R 4.0.0)
#>  sessioninfo   1.1.1       2018-11-05 [1] CRAN (R 4.1.0)
#>  stringi       1.6.2       2021-05-17 [1] CRAN (R 4.1.0)
#>  stringr       1.4.0       2019-02-10 [1] CRAN (R 4.1.0)
#>  styler        1.4.1       2021-03-30 [1] CRAN (R 4.1.0)
#>  survival      3.2-11      2021-04-26 [4] CRAN (R 4.0.5)
#>  tibble        3.1.2       2021-05-16 [1] CRAN (R 4.1.0)
#>  tidyr         1.1.3       2021-03-03 [1] CRAN (R 4.1.0)
#>  tidyselect    1.1.1       2021-04-30 [1] CRAN (R 4.1.0)
#>  timeDate      3043.102    2018-02-21 [1] CRAN (R 4.1.0)
#>  utf8          1.2.1       2021-03-12 [1] CRAN (R 4.1.0)
#>  vctrs         0.3.8       2021-04-29 [1] CRAN (R 4.1.0)
#>  withr         2.4.2       2021-04-18 [1] CRAN (R 4.1.0)
#>  xfun          0.23        2021-05-15 [1] CRAN (R 4.1.0)
#>  yaml          2.2.1       2020-02-01 [1] CRAN (R 4.1.0)
#> 
#> [1] /home/jonathankennel/R/x86_64-pc-linux-gnu-library/4.1
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library
