joins match rows on NA across datasets #2033

@garrettgman

Below, inner_join() matches two rows that each have NA as the key. This seems to violate the usual intuition about NAs (one missing value is not known to equal another) and leads to an incorrect result.

This seems to happen for all of the joins.

library(dplyr)

songs <- data_frame(song = c("Do-Re-Mi", "A Spoonful of Sugar"), movie = c("The Sound of Music", NA))
songs
##                  song              movie
##                 <chr>              <chr>
## 1            Do-Re-Mi The Sound of Music
## 2 A Spoonful of Sugar               <NA>

singers <- data_frame(movie = c(NA, "The Sound of Music"), singer = c("Arnold Schwarzenegger", "Julie Andrews"))
singers
##                movie                singer
##                <chr>                 <chr>
## 1               <NA> Arnold Schwarzenegger
## 2 The Sound of Music         Julie Andrews

songs %>% inner_join(singers, by = "movie")
##                  song              movie                singer
##                 <chr>              <chr>                 <chr>
## 1            Do-Re-Mi The Sound of Music         Julie Andrews
## 2 A Spoonful of Sugar               <NA> Arnold Schwarzenegger
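
For comparison, here is a minimal sketch of the behavior I would have expected, plus a possible workaround. It assumes that rows with a missing key should never match each other. That is my reading of the NA intuition, not documented join behavior. The sketch first checks that comparing NA with NA gives NA rather than TRUE, then drops the NA keys from both tables before joining. The data is rebuilt with tibble() so the snippet runs on its own.

```r
library(dplyr)

# Comparing a missing value with another missing value is itself missing.
na_cmp <- NA_character_ == NA_character_
is.na(na_cmp)

songs <- tibble(
  song  = c("Do-Re-Mi", "A Spoonful of Sugar"),
  movie = c("The Sound of Music", NA)
)
singers <- tibble(
  movie  = c(NA, "The Sound of Music"),
  singer = c("Arnold Schwarzenegger", "Julie Andrews")
)

# Workaround sketch (assumption: NA keys should not match):
# remove rows whose key is NA from both sides, then join.
joined <- songs %>%
  filter(!is.na(movie)) %>%
  inner_join(filter(singers, !is.na(movie)), by = "movie")

joined
```

With this filtering, only the "The Sound of Music" row should survive the inner join. For a left_join(), filtering only the right-hand table would keep the unmatched row from songs while still preventing the NA-to-NA match.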

Labels: bug (an unexpected problem or unintended behavior)