Hi,
I have a data frame with multiple date columns (encoded as characters) where the date format can differ between columns (e.g. col1 has the following format YYYY-MM-DD and col2 is coded as DD/MM/YYYY. What I'd like to do is supply multiple formats while converting the char columns to date columns.
Using lubridate::as_date(..., format = c(...)), lubridate version is 1.8.0
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
tibble(
date1 = c("2020-10-02", NA_character_),
date2 = c("03/10/2020", "03/11/2020")
) %>%
mutate(
across(
.cols = contains("date"),
.fns = ~ lubridate::as_date(.x, format = c("%Y-%m-%d", "%d/%m/%Y"))
)
)
#> # A tibble: 2 × 2
#> date1 date2
#> <date> <date>
#> 1 2020-10-02 NA
#> 2 NA 2020-11-03
packageVersion("lubridate")
#> [1] '1.8.0'
packageVersion("dplyr")
#> [1] ‘1.0.8’
Not sure why this fails to convert the first element in date2 from 03/10/2020 to 2020-10-03.
On the other hand, using base R's as.Date(..., tryFormats = c(...)) works as expected / desired.
tibble(
date1 = c("2020-10-02", NA_character_),
date2 = c("03/10/2020", "03/11/2020")
) %>%
mutate(
across(
.cols = contains("date"),
.fns = ~ as.Date(.x, tryFormats = c("%Y-%m-%d", "%d/%m/%Y"))
)
)
#> # A tibble: 2 × 2
#> date1 date2
#> <date> <date>
#> 1 2020-10-02 2020-10-03
#> 2 NA 2020-11-03
Additionally, using as_date(parse_date_time(..., orders = c(...))) gives the right result too on this example.
tibble(
date1 = c("2020-10-02", NA_character_),
date2 = c("03/10/2020", "03/11/2020")
) %>%
mutate(
across(
.cols = contains("date"),
.fns = ~ lubridate::as_date(lubridate::parse_date_time(.x, orders = c("%Y-%m-%d", "%d/%m/%Y")))
)
)
#> # A tibble: 2 × 2
#> date1 date2
#> <date> <date>
#> 1 2020-10-02 2020-10-03
#> 2 NA 2020-11-03
What is unexpected / surprising is that as_date(...) accepts multiple formats but generates NAs where there should be data. As far as I can tell, this only seems to be happening when combining as_date and across. If there are multiple formats in one single column then as_date works as expected
tibble(
date1 = c("2020-10-02", "03/10/2020", NA_character_)
) %>%
mutate(
date1_1 = lubridate::as_date(date1, format = "%Y-%m-%d"),
date1_2 = lubridate::as_date(date1, format = c("%Y-%m-%d", "%d/%m/%Y")))
#> # A tibble: 3 × 3
#> date1 date1_1 date1_2
#> <chr> <date> <date>
#> 1 2020-10-02 2020-10-02 2020-10-02
#> 2 03/10/2020 NA 2020-10-03
#> 3 <NA> NA NA
Hi,
I have a data frame with multiple date columns (encoded as characters) where the date format can differ between columns (e.g.
col1has the following formatYYYY-MM-DDandcol2is coded asDD/MM/YYYY. What I'd like to do is supply multiple formats while converting the char columns to date columns.Using
lubridate::as_date(..., format = c(...)), lubridate version is 1.8.0Not sure why this fails to convert the first element in date2 from
03/10/2020to2020-10-03.On the other hand, using base R's
as.Date(..., tryFormats = c(...))works as expected / desired.tibble( date1 = c("2020-10-02", NA_character_), date2 = c("03/10/2020", "03/11/2020") ) %>% mutate( across( .cols = contains("date"), .fns = ~ as.Date(.x, tryFormats = c("%Y-%m-%d", "%d/%m/%Y")) ) ) #> # A tibble: 2 × 2 #> date1 date2 #> <date> <date> #> 1 2020-10-02 2020-10-03 #> 2 NA 2020-11-03Additionally, using
as_date(parse_date_time(..., orders = c(...)))gives the right result too on this example.tibble( date1 = c("2020-10-02", NA_character_), date2 = c("03/10/2020", "03/11/2020") ) %>% mutate( across( .cols = contains("date"), .fns = ~ lubridate::as_date(lubridate::parse_date_time(.x, orders = c("%Y-%m-%d", "%d/%m/%Y"))) ) ) #> # A tibble: 2 × 2 #> date1 date2 #> <date> <date> #> 1 2020-10-02 2020-10-03 #> 2 NA 2020-11-03What is unexpected / surprising is that
as_date(...)accepts multiple formats but generatesNAs where there should be data. As far as I can tell, this only seems to be happening when combiningas_dateandacross. If there are multiple formats in one single column thenas_dateworks as expectedtibble( date1 = c("2020-10-02", "03/10/2020", NA_character_) ) %>% mutate( date1_1 = lubridate::as_date(date1, format = "%Y-%m-%d"), date1_2 = lubridate::as_date(date1, format = c("%Y-%m-%d", "%d/%m/%Y"))) #> # A tibble: 3 × 3 #> date1 date1_1 date1_2 #> <chr> <date> <date> #> 1 2020-10-02 2020-10-02 2020-10-02 #> 2 03/10/2020 NA 2020-10-03 #> 3 <NA> NA NA