Auto casting in rbind_all #493

Closed
jeroen opened this issue Jul 11, 2014 · 5 comments
Labels
feature a feature request or enhancement

@jeroen

jeroen commented Jul 11, 2014

The following works in plyr; it would be great to get it to work in dplyr as well.

mydata <- list(
  data.frame(x=c("foo", "bar")),
  data.frame(x=NA)
)

plyr::rbind.fill(mydata)
dplyr::rbind_all(mydata)

Some context: such data often appear when parsing JSON, where a null might become NA. For example:

#requires jsonlite >= 0.9.9
library(jsonlite)

#store all pages in a list first
baseurl <- "http://projects.propublica.org/nonprofits/api/v1/search.json?order=revenue&sort_order=desc"
pages <- list()
for(i in 0:20){
  mydata <- fromJSON(paste0(baseurl, "&page=", i), flatten=TRUE)
  message("Retrieving page ", i)
  pages[[i+1]] <- mydata$filings
}

#combine all into one 
library(plyr)
filings <- rbind.fill(pages)

@jimhester
Contributor

Also, in this case you can get it to work (with a cast from factor to character) if you specify the character form of NA.

mydata <- list(
  data.frame(x=c("foo", "bar")),
  data.frame(x=NA_character_)
)

This would be nice to special-case, since this situation probably happens fairly often in practice (any time a calculation over a given group returns NA).

Also, your example is missing a comma between the data frames.
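
For context, here is a minimal base R sketch of why the bare NA trips the type check while NA_character_ does not. A bare NA is a logical constant, so data.frame(x = NA) produces a logical column. Nothing here calls dplyr, and the variable names are purely illustrative.

```r
# A bare NA is a logical constant; the typed variants carry their own type.
na_types <- c(
  plain     = class(NA),
  character = class(NA_character_),
  integer   = class(NA_integer_),
  real      = class(NA_real_)
)
print(na_types)

# The column type of a one-row data frame follows the NA that was supplied.
logical_col   <- data.frame(x = NA)$x
character_col <- data.frame(x = NA_character_, stringsAsFactors = FALSE)$x
stopifnot(is.logical(logical_col), is.character(character_col))
```

This is why a column that is NA throughout ends up as logical unless a typed NA is used when building the data frame.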

@yasminlucero

I have the same request. Below is my minimal reproducible example. If I pass only NA, it gives an error. But if I cast the NA to NA_character_, it works as desired.

> t1 = data.frame(a = 1:3, b = rep(NA, 3)) 
> t2 = data.frame(a = 11:13, b = rep('a', 3), stringsAsFactors = FALSE)
> tmp = list(t1, t2)
> rbind_all(tmp)

Error: incompatible type (data index: 2, column: 'b', was collecting: logical (dplyr::Collecter_Impl<10>), incompatible with data of type: character

> t1 = data.frame(a = 1:3, b = rep(NA_character_, 3)) 
> tmp = list(t1, t2)
> rbind_all(tmp)

   a    b
  1 NA
  2 NA
  3 NA
 11    a
 12    a
 13    a

@hadley
Member

hadley commented Jul 28, 2014

@romainfrancois I think we discussed this in another issue - this one has convinced me that we should allow a vector containing only NA to be converted to any other type.

@romainfrancois
Member

Just added a test for "are all these values NA":

template <int RTYPE>
inline bool all_na_impl( const Vector<RTYPE>& x ){
    return all( is_na(x) ).is_true() ; 
}

inline bool all_na( SEXP x ){
    RCPP_RETURN_VECTOR( all_na_impl, x ) ;        
}

And do nothing in that case, because the values are already NA:

            } else if( all_na(source) ) {
                // do nothing, the collecter already initialized data with the
                // right NA 
            }
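
To illustrate the idea at the R level, the following is a rough sketch, not dplyr's actual implementation. The names all_na and collect_chunks are hypothetical. It assumes plain atomic vectors, preallocates the result as NA of the first non-all-NA chunk's type, and skips chunks that are entirely NA. Unlike dplyr, it does not report genuine type mismatches.

```r
# Hypothetical R analogue of the C++ all_na() helper.
all_na <- function(x) all(is.na(x))

# Hypothetical collector for a list of atomic vectors.
collect_chunks <- function(chunks) {
  has_values <- !vapply(chunks, all_na, logical(1))
  # Assumption: fall back to logical when every chunk is entirely NA.
  template <- if (any(has_values)) chunks[[which(has_values)[1]]] else logical(0)
  # Preallocate with NA of the template's type.
  out <- rep(template[NA_integer_], sum(lengths(chunks)))
  pos <- 0L
  for (chunk in chunks) {
    idx <- pos + seq_along(chunk)
    if (!all_na(chunk)) {
      # Note: this sketch coerces silently instead of checking types.
      out[idx] <- chunk
    }
    # Otherwise do nothing: the slots already hold NA of the right type.
    pos <- pos + length(chunk)
  }
  out
}

combined <- collect_chunks(list(NA, c("foo", "bar")))
print(combined)
```

Because the output type is taken from the first chunk that has real values, the result does not depend on whether the all-NA chunk comes first or last in this sketch.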

@jeroen
Author

jeroen commented Nov 12, 2014

I don't think this fully works:

mydata <- list(
  data.frame(x=NA, stringsAsFactors = FALSE),
  data.frame(x=c("foo", "bar"), stringsAsFactors = FALSE)
)

plyr::rbind.fill(mydata)$x
# [1] NA    "foo" "bar"
dplyr::rbind_all(mydata)$x
# Error: incompatible type 
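
Until the all-NA-first case is handled, one possible workaround is to retype all-NA columns before binding. The sketch below is only an illustration: retype_all_na is a hypothetical helper, not part of dplyr or plyr, and it uses base rbind to stay self-contained. Whether rbind_all accepts the retyped list in a given dplyr version is not verified here.

```r
# Hypothetical helper: give all-NA columns the type found in the other frames.
retype_all_na <- function(frames) {
  cols <- unique(unlist(lapply(frames, names)))
  for (col in cols) {
    # Find the first frame whose column holds at least one non-NA value.
    template <- NULL
    for (df in frames) {
      if (col %in% names(df) && !all(is.na(df[[col]]))) {
        template <- df[[col]]
        break
      }
    }
    if (is.null(template)) next  # column is NA everywhere; leave it alone
    frames <- lapply(frames, function(df) {
      if (col %in% names(df) && all(is.na(df[[col]]))) {
        # Replace with NA of the template's type, one per row.
        df[[col]] <- rep(template[NA_integer_], nrow(df))
      }
      df
    })
  }
  frames
}

mydata <- list(
  data.frame(x = NA, stringsAsFactors = FALSE),
  data.frame(x = c("foo", "bar"), stringsAsFactors = FALSE)
)
fixed <- retype_all_na(mydata)
bound <- do.call(rbind, fixed)
print(bound$x)
```

After retyping, the first data frame's x column is character rather than logical, so both frames agree on the column type regardless of their order in the list.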

@lock lock bot locked as resolved and limited conversation to collaborators Jun 10, 2018