do() gives error if returned values are not data frames #397
Oh, maybe it's that you have to provide a name if the result isn't a data frame? That seems kind of counterintuitive, or at least it could be documented better, and could be mentioned in the error message.
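Here is a minimal sketch of that distinction, assuming a reasonably recent dplyr. The object names are illustrative only. An unnamed do() expression has to evaluate to a data frame for every group, while a named argument stores whatever each group returns in a list column:

```r
library(dplyr)

# Unnamed: the expression must return a data frame for every group;
# the per-group data frames are bound together with the grouping column.
counts_df <- iris %>%
  group_by(Species) %>%
  do(data.frame(n = nrow(.)))

# Named: each group's result (here a plain integer) is kept as one
# element of a list column called `n`, one row per group.
counts_list <- iris %>%
  group_by(Species) %>%
  do(n = nrow(.))

counts_df$n            # an ordinary column
unlist(counts_list$n)  # list column flattened to a vector
```

Dropping the name from the second call, as in do(nrow(.)), is what triggers the error this issue is about.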
It would also be useful to have a function for extracting columns. You can use
Maybe this, perhaps? Not sure it is more elegant, though:
I'd prefer to leave functions that operate on vectors/columns in magrittr. I'll try to make the docs clearer.
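magrittr already ships aliases that work at the end of a pipe, for example extract2() for `[[` and use_series() for `$`. The sketch below uses them to pull a list column back out of a named do() result. The object names are illustrative, and the sketch assumes both dplyr and magrittr are installed:

```r
library(dplyr)
library(magrittr)

# A named do() result: one row per group, with a list column `n`.
per_group <- iris %>%
  group_by(Species) %>%
  do(n = nrow(.))

# extract2() is magrittr's alias for `[[`, so it can pull the list
# column out at the end of a pipe; unlist() then flattens it.
sizes <- per_group %>%
  extract2("n") %>%
  unlist()

# use_series() is the alias for `$` and yields the same list column.
same_sizes <- per_group %>%
  use_series(n) %>%
  unlist()
```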
I'm with @wch; my expectation today was that I could run
Edit: actually, I get what I expect if I just assign to a variable name (as @wch mentions above). One improvement might be to return a list with the
Hi @hadley, I've been battling with this one today. I'm happy to paste a full working example, but I'm guessing that since this is a year old you're possibly done and dusted with the topic.

An example of something I do regularly: I have hospitalisation records for people (multiple records per person), and I might be interested in the first record for each person for a particular diagnosis. I write a small function to find the first record, pull that row out of that person's larger set of records, and return "all the first records". In the past this has worked perfectly, rbinding (I presume) the records together.

In dplyr I have to do cs_all %>% … I tried cs_all %>% … but I guess it fails because do() wants a data frame as the output? Even editing my findfirst function to output a data frame instead of a vector didn't work. I'm obviously failing to make the connection between some of the nuances of these functions and how they handle different types of data.

That said, there's probably a much better way to find the first record than the cobbled-together functions I'm using, but it does feel somewhat clean and followable from my end :/
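Below is a rough sketch of the task described above. It uses made-up data and assumed column names (id, date, diag_lab) rather than the poster's actual data or findfirst function. It keeps only the diagnosis of interest, sorts by date, and takes the first row per person:

```r
library(dplyr)

# Hypothetical hospitalisation records (column names are assumptions):
# several rows per person, each with a date and a diagnosis label.
cs_all <- data.frame(
  id = c(1, 1, 1, 2, 2, 3),
  date = as.Date(c("2013-03-10", "2013-01-05", "2013-02-01",
                   "2013-04-02", "2013-01-20", "2013-05-15")),
  diag_lab = c("asthma", "asthma", "copd", "asthma", "copd", "copd"),
  stringsAsFactors = FALSE
)

# First asthma record per person: filter to the diagnosis, sort by
# person and date, then keep the first row within each person.
first_asthma <- cs_all %>%
  filter(diag_lab == "asthma") %>%
  arrange(id, date) %>%
  group_by(id) %>%
  slice(1) %>%
  ungroup()
```

Person 3 has no asthma record and so simply drops out. No per-group function or rbind step is needed.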
@nzcoops -- you should consider posting this to Stack Overflow; it's a good place for questions like this, and @hadley is very active there. I think what you want is probably
or, for a parallel with
Without seeing the definition of
Thank you (lots!) for the reply.
findfirst <- function(x, outcome, var = "diag_lab"){
I could have hard-coded x$diag_lab instead of doing it the way I did, but I wanted it to be generic. I am on Stack Overflow; googling and a few other SO questions (including one of mine related to do()) led me here. Both your suggestions work. I like the second; the first is just a step I would never have dreamt of taking :) My findfirst is redundant! But it did work well in ddply (though there was probably a method similar to your examples within the plyr framework that I could have used). I guess that while these solutions work, they don't change the underlying want for this framework to rbind vectors without the need to wrap them in as.data.frame? For example, in a scenario where one might also be doing manipulation within the findfirst function (something more complex than just head/row number as a proxy). Clearly I need to go further down the rabbit hole!
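A vector-returning function can be adapted by converting its result to a one-row data frame inside do(). In the sketch below, group_summary() is a hypothetical stand-in for a function like findfirst that returns a named vector:

```r
library(dplyr)

# Hypothetical per-group function that returns a named numeric vector
# rather than a data frame.
group_summary <- function(d) {
  c(n = nrow(d), mean_length = mean(d$Sepal.Length))
}

# as.list() turns the named vector into a list, and as.data.frame()
# turns that into a one-row data frame, which unnamed do() accepts
# and binds together across groups.
summaries <- iris %>%
  group_by(Species) %>%
  do(as.data.frame(as.list(group_summary(.))))
```

When every element is a single number, summarise() with one expression per value is usually the more direct route. The wrapper is mainly useful when the per-group logic is already packaged in a function.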
Looking at your code and trying some things, I see the problem here: it's not when your function returns results, it's when it doesn't. So this might be something that is actually fixable on the dplyr side, but it's probably worth bringing up as a separate issue, namely, that functions returning NULL in a
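The workaround this suggests is to return a zero-row data frame with the same columns instead of NULL when a group has nothing to contribute. Here is a minimal sketch using a hypothetical helper first_wide() on the iris data:

```r
library(dplyr)

# Hypothetical helper: return the first row whose Petal.Width exceeds
# `threshold`. When no row qualifies it returns a zero-row data frame
# with the same columns instead of NULL, so unnamed do() still gets a
# data frame for every group.
first_wide <- function(d, threshold) {
  hits <- d[d$Petal.Width > threshold, , drop = FALSE]
  if (nrow(hits) == 0) {
    return(d[0, , drop = FALSE])
  }
  hits[1, , drop = FALSE]
}

# Only virginica has petals wider than 2, so the other two groups
# contribute zero rows to the combined result.
wide <- iris %>%
  group_by(Species) %>%
  do(first_wide(., threshold = 2))
```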
@andrewla me again... So I'm trying to do something similar but different, and I'm running into another error.
Because x will always be returned as a data frame in this instance, it's not the (exact) same issue as last time. Any insights or thoughts are appreciated. I can export those 100 records and share them somewhere if you a) can't spot it from the code and b) care enough to pursue it :)
Yet again I arrive at this issue after Googling,
You can still use do to get arbitrary output by using side effects:
@kendonB it's much simpler to do:
iris %>% group_by(Species) %>%
  do(model = lm(Petal.Length ~ Petal.Width, data = .))
You should never use
But these days I'd recommend the combination of dplyr, tidyr, and purrr described in https://www.youtube.com/watch?v=rz3_FDVt9eg
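The video isn't reproduced here. What follows is our own sketch of the nest-then-map pattern commonly used with that dplyr/tidyr/purrr combination, assuming current versions of tidyr and purrr are installed. It fits the same per-species model as the do() call above:

```r
library(dplyr)
library(tidyr)
library(purrr)

# Nest each species' rows into a list column of data frames, then map
# a model-fitting function over that column. The fitted models live in
# another list column, one row per species.
by_species <- iris %>%
  group_by(Species) %>%
  nest() %>%
  ungroup() %>%
  mutate(
    model = map(data, ~ lm(Petal.Length ~ Petal.Width, data = .x)),
    slope = map_dbl(model, ~ coef(.x)[["Petal.Width"]])
  )
```

Because the models sit in an ordinary list column, later steps such as extracting slopes are just more map_*() calls inside mutate().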
Oh I see. I was trying this:
when it should be this:
And for completeness, I believe this was what hadley was talking about:
The docs say:
But it doesn't work this way. For example:
The result I would expect is something like this (which presently works):
Note that I'm just using nrow to illustrate; I know you can use group_size to get this result. The thing I'm actually doing returns a vector for each group, so there's not a simple drop-in replacement for it.