-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Breaking change in nth() for S4 input #6670
Comments
It seems like
Got the relevant tests passing at least. |
I looked into the Message class, and it looks like that is really a scalar object rather than a vector class (in the same sense of how lm is a scalar object). In that case, this new behavior is actually slightly more correct than what was happening before (but yes it is a behavior change). We do still support S4 classes that setClass("Foo", contains = "integer")
x <- new("Foo", 1:5)
x
#> An object of class "Foo"
#> [1] 1 2 3 4 5
dplyr::first(x)
#> An object of class "Foo"
#> [1] 1 Created on 2023-02-01 with reprex v2.0.2.9000 |
Thanks! Would you say as.list() is likely to be a common workaround? I guess it's not possible to say in full generality. In any case, I think a quick NEWS bullet is warranted for closing the issue. |
@MichaelChirico Do you have a more realistic example? i.e. are these getting put into a data frame somehow? |
Yes I think that's right. Here's the function in which this was found (eliminating some parts that may be extraneous): # Converts a message into a list. If there are any nested messages, those are
# merged with the rest of the message at the highest level. This function only
# handles ONE level of nesting.
#'
#' @param response a proto from an RPC response.
#' @param nested a vector containing element names that contain a nested message
#' @param collection a vector containing element names that contain collections
#' like lists.
#' @return a flat data.frame with data types preserved.
MessagesToDataFrame <- function(response, nested = NULL, collection = NULL) {
# Convert the message to a list of rows, each element containing a flattened
# message, even when there is a nested message.
list.of.flattened.messages <- lapply(
dplyr::first(response),
function(.) {
outer <- as.list(.)
# Process collections.
collections <- list()
for (key in collection) {
# Cache the collection to preserve its type.
collections[[key]] <- outer[[key]]
# Remove the collection from the list.
outer[key] <- NULL
}
# Process nested messages.
for (key in nested) {
# Extract the nested message as a list.
submessagelist <- as.list(first(outer[[key]]))
# Remove the old entry.
outer[[key]] <- NULL
# Merge the message and the nested message.
outer <- c(outer, submessagelist)
}
# Add collections back to the entry with preserved data type.
for (key in collection) {
outer[[key]] <- collections[key]
}
return(outer)
}
)
# Combine the lists into one data frame.
res <- data.table::rbindlist(list.of.flattened.messages, fill = TRUE)
return(res)
} It'll take a bit more work to convert the test cases into something I could share here, but let me know if that's worthwhile. |
It seems like maybe |
I'm also a bit puzzled looking at the code. Let me get you a better reprex. |
Some thing seems off with the object definition: library(RProtoBuf)
example("asMessage", package="RProtoBuf", echo= FALSE)
x <- asMessage(tutorial.Person)
length(x)
#> [1] 5
length(names(x))
#> [1] 10
length(as.list(x))
#> [1] 10 Created on 2023-02-01 with reprex v2.0.2 |
Working on a more faithful recreation. |
Here's a more faithful reprex: cat(file = tmp<-tempfile(), '
syntax = "proto2";
package Ex;
message Response {
enum MyType {
A = 1;
B = 2;
C = 3;
D = 4;
E = 5;
}
message Score {
// Metadata about the score.
optional int64 project_id = 1;
optional int64 item_id = 2;
optional string result_id = 3;
repeated string dupe = 4;
optional MyType type = 5;
optional bool rejected = 6;
optional double minimum = 7;
optional double maximum = 8;
repeated string score = 9;
repeated string flag = 10;
optional double other_minimum = 11;
optional double other_maximum = 12;
}
repeated Score score = 1;
}
')
library(RProtoBuf)
readProtoFiles(tmp)
input <- P("Ex.Response")$readASCII('
score {
project_id: 1
item_id: 1
result_id: "1"
dupe: ""
type: A
rejected: false
minimum: 3.5
maximum: 4.5
other_minimum: 3.5
other_maximum: 4.5
}
score {
project_id: 1
item_id: 2
result_id: "2"
dupe: ""
type: B
rejected: true
minimum: 3.5
maximum: 5.5
other_minimum: 3.5
other_maximum: 5.5
}
')
dplyr::first(input)
# [[1]]
# message of type 'Ex.Response.Score' with 10 fields set
# [[2]]
# message of type 'Ex.Response.Score' with 10 fields set
as.list(input)
# $score
# $score[[1]]
# message of type 'Ex.Response.Score' with 10 fields set
# $score[[2]]
# message of type 'Ex.Response.Score' with 10 fields set
input[[1]]
# [[1]]
# message of type 'Ex.Response.Score' with 10 fields set
# [[2]]
# message of type 'Ex.Response.Score' with 10 fields set It looks like |
I think we are going to have to chalk this up to being part of the technical limitations of vctrs right now, which really only supports S4 is they are naturally backed by an atomic type |
saw the NEWS item, that works for me! thanks |
We have some code using
dplyr::first()
on an S4 input that's broken by the change to 1.1.0.I don't see this mentioned in the NEWS, so I'm filing this bug in case it's unintentional / to find workarounds.
minimal-ish Reprex:
The text was updated successfully, but these errors were encountered: