sum(is.na()) helpers #2

Closed
hadley opened this issue Sep 6, 2016 · 7 comments
@hadley
Member

hadley commented Sep 6, 2016

n_absent() and n_present()

@njtierney

Are you looking to do something really efficient in cpp or would something like the following suffice?

df <- tibble::tibble(x = c(NA, NA, NA, 1.6, 1.8),
                     y = c(NA, 5, 9, 10, NA))

df
#> # A tibble: 5 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1    NA    NA
#> 2    NA     5
#> 3    NA     9
#> 4   1.6    10
#> 5   1.8    NA
# n_absent ------------------------------------------------------------

sum(is.na(df$x))
#> [1] 3
sum(is.na(df$y))
#> [1] 2

n_absent <- function(x) sum(is.na(x))

n_absent(df$x)
#> [1] 3
n_absent(df$y)
#> [1] 2

# n_present -----------------------------------------------------------

sum(!(is.na(df$x)))
#> [1] 2
sum(!(is.na(df$y)))
#> [1] 3

n_present <- function(x) sum(!(is.na(x)))

n_present(df$x)
#> [1] 2
n_present(df$y)
#> [1] 3

@hadley
Member Author

hadley commented Sep 12, 2016

It can be a little more efficient if done in C, but basically that.

@njtierney

Just a friendly note that some of these helpers are in naniar at the moment. I'm not sure whether you plan to implement this in vctrs, but if you do, perhaps let me know so I can reduce overlap and/or help out!

library(naniar)

n_miss(airquality)
#> [1] 44
n_miss(airquality$Ozone)
#> [1] 37
n_complete(airquality)
#> [1] 874
n_complete(airquality$Ozone)
#> [1] 116

prop_miss(airquality)
#> [1] 0.04793028
prop_miss(airquality$Ozone)
#> [1] 0.2418301
prop_complete(airquality)
#> [1] 0.9520697
prop_complete(airquality$Ozone)
#> [1] 0.7581699

pct_miss(airquality)
#> [1] 4.793028
pct_miss(airquality$Ozone)
#> [1] 24.18301
pct_complete(airquality)
#> [1] 95.20697
pct_complete(airquality$Ozone)
#> [1] 75.81699
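
For comparison, the same summaries can be sketched in base R. This is only an illustration of the arithmetic, assuming the built-in airquality data; it is not how naniar implements these helpers, and the variable names are made up for the example.

```r
# Base R sketch of the counts and proportions above (not naniar's code).
# is.na() on a data frame returns a logical matrix, so sum() and mean()
# run over every cell.
n_miss_all   <- sum(is.na(airquality))        # missing cells in the whole data frame
n_miss_ozone <- sum(is.na(airquality$Ozone))  # missing values in one column

prop_miss_all   <- mean(is.na(airquality))        # share of missing cells
prop_miss_ozone <- mean(is.na(airquality$Ozone))  # share missing in one column

# Percentages are the proportions scaled by 100.
pct_miss_ozone <- 100 * prop_miss_ozone

# The "complete" counterparts negate the missingness mask.
n_complete_ozone    <- sum(!is.na(airquality$Ozone))
prop_complete_ozone <- mean(!is.na(airquality$Ozone))
```

Because mean() of a logical vector is the fraction of TRUE values, the proportion helpers reduce to mean(is.na(x)), and the missing and complete proportions always sum to 1.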

@njtierney

Wanted to add a note here from r-lib/rlang#558

A verb/function that always returns a data.frame / matrix:

  • is_na returns TRUE/FALSE (scalar)
  • are_na returns vector of TRUE/FALSE (vector)
  • are_na_<class> returns an object of class <class>, e.g., are_na_df returns a data.frame of NA?

And quoting @hadley :

I think the principle that we could now follow is that are_na(x) has type logical, but shape that matches x. I think that is a succinct description of the behaviour that you desire. (And actually that's a nice description of a vectorised function - the shape of the output matches the shape of the input(s))
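
As a rough illustration of the "type logical, shape matches x" principle, base is.na() already behaves this way for vectors and matrices. The sketch below uses only base R; the object names are invented for the example, and it does not show any proposed are_na() implementation.

```r
# Base R sketch: the result of is.na() is logical and keeps the shape of its input.
x  <- c(NA, 1.6, 1.8)
m  <- matrix(c(NA, 5, 9, NA), nrow = 2)
df <- data.frame(x = c(NA, 1.6), y = c(5, NA))

vec_na <- is.na(x)   # logical vector, same length as x
mat_na <- is.na(m)   # logical matrix, same dimensions as m
df_na  <- is.na(df)  # logical matrix (not a data frame), same dimensions as df

length(vec_na) == length(x)
identical(dim(mat_na), dim(m))
identical(dim(df_na), dim(df))
```

Note that for a data frame the dimensions are preserved but the class is not: the result is a matrix, which is the gap a class-specific variant such as are_na_df would be meant to fill.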

hadley transferred this issue from r-lib/vctrs on Oct 31, 2018
hadley closed this as completed in 08645fe on Nov 29, 2019
@hadley
Member Author

hadley commented Nov 29, 2019

I think we'll just stick with is_na() for now; I don't want to commit to 4+ helpers yet.

@lionel-
Member

lionel- commented Nov 29, 2019

@hadley Should we try to maintain a different prefix between is_ predicate and vectorised predicates? This way if you see is_ you know that by design it returns a single non-missing boolean, and it's safe to use in if () for instance.
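
To make the distinction concrete, here is a minimal sketch of the two kinds of predicate. Both function names are hypothetical and chosen only for this example; they are not the rlang or vctrs API.

```r
# Hypothetical scalar predicate: always returns a single TRUE or FALSE,
# never NA and never a longer vector, so it is safe inside if ().
is_single_na <- function(x) {
  length(x) == 1 && isTRUE(is.na(x))
}

# Hypothetical vectorised predicate: one logical value per element of x.
each_is_na <- function(x) {
  is.na(x)
}

x <- c(NA, 1.6, 1.8)

# The scalar predicate can guard a branch directly.
if (is_single_na(x[1])) {
  first_is_missing <- TRUE
} else {
  first_is_missing <- FALSE
}

# The vectorised predicate has to be reduced first, e.g. with any() or all().
some_missing <- any(each_is_na(x))
```

Passing the vectorised result straight to if () would hand it a length-3 condition, which is the situation a distinct prefix is meant to flag at a glance.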

@hadley
Copy link
Member Author

hadley commented Nov 29, 2019

I suspect that ship has already sailed, but it’s worth thinking about.
