Skip to content

Parsing issues when combining as_date and across with multiple date formats #1042

@ad1729

Description

@ad1729

Hi,

I have a data frame with multiple date columns (encoded as characters) where the date format can differ between columns (e.g. col1 has the following format YYYY-MM-DD and col2 is coded as DD/MM/YYYY. What I'd like to do is supply multiple formats while converting the char columns to date columns.

Using lubridate::as_date(..., format = c(...)), lubridate version is 1.8.0

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

tibble(
    date1 = c("2020-10-02", NA_character_), 
    date2 = c("03/10/2020", "03/11/2020")
    ) %>% 
    mutate(
        across(
            .cols = contains("date"), 
            .fns = ~ lubridate::as_date(.x, format = c("%Y-%m-%d", "%d/%m/%Y"))
            )
        )
#> # A tibble: 2 × 2
#>   date1      date2     
#>   <date>     <date>    
#> 1 2020-10-02 NA        
#> 2 NA         2020-11-03

packageVersion("lubridate")
#> [1] '1.8.0'

packageVersion("dplyr")
#> [1] ‘1.0.8’

Not sure why this fails to convert the first element in date2 from 03/10/2020 to 2020-10-03.

On the other hand, using base R's as.Date(..., tryFormats = c(...)) works as expected / desired.

tibble(
    date1 = c("2020-10-02", NA_character_), 
    date2 = c("03/10/2020", "03/11/2020")
    ) %>% 
    mutate(
        across(
            .cols = contains("date"), 
            .fns = ~ as.Date(.x, tryFormats = c("%Y-%m-%d", "%d/%m/%Y"))
            )
        )
#> # A tibble: 2 × 2
#>   date1      date2     
#>   <date>     <date>    
#> 1 2020-10-02 2020-10-03
#> 2 NA         2020-11-03

Additionally, using as_date(parse_date_time(..., orders = c(...))) gives the right result too on this example.

tibble(
    date1 = c("2020-10-02", NA_character_), 
    date2 = c("03/10/2020", "03/11/2020")
    ) %>% 
    mutate(
        across(
            .cols = contains("date"), 
            .fns = ~ lubridate::as_date(lubridate::parse_date_time(.x, orders = c("%Y-%m-%d", "%d/%m/%Y")))
        )
    )
#> # A tibble: 2 × 2
#>   date1      date2     
#>   <date>     <date>    
#> 1 2020-10-02 2020-10-03
#> 2 NA         2020-11-03

What is unexpected / surprising is that as_date(...) accepts multiple formats but generates NAs where there should be data. As far as I can tell, this only seems to be happening when combining as_date and across. If there are multiple formats in one single column then as_date works as expected

tibble(
    date1 = c("2020-10-02", "03/10/2020", NA_character_)
    ) %>% 
    mutate(
        date1_1 = lubridate::as_date(date1, format = "%Y-%m-%d"), 
        date1_2 = lubridate::as_date(date1, format = c("%Y-%m-%d", "%d/%m/%Y")))
#> # A tibble: 3 × 3
#>   date1      date1_1    date1_2   
#>   <chr>      <date>     <date>    
#> 1 2020-10-02 2020-10-02 2020-10-02
#> 2 03/10/2020 NA         2020-10-03
#> 3 <NA>       NA         NA

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions