Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Breaking change in nth() for S4 input #6670

Closed
MichaelChirico opened this issue Jan 30, 2023 · 12 comments
Closed

Breaking change in nth() for S4 input #6670

MichaelChirico opened this issue Jan 30, 2023 · 12 comments
Milestone

Comments

@MichaelChirico
Copy link
Contributor

We have some code using dplyr::first() on an S4 input that's broken by the change to 1.1.0.

I don't see this mentioned in the NEWS, so I'm filing this bug in case it's unintentional / to find workarounds.

minimal-ish Reprex:

library(RProtoBuf)
example("asMessage", package="RProtoBuf")

x = asMessage( tutorial.Person )

## before 1.1.0
dplyr::first(x)
# [1] "Person"
## after 1.1.0
dplyr::first(x)
# Error in vec_size(x) : `x` must be a vector, not a <Message> object.
@MichaelChirico
Copy link
Contributor Author

It seems like as.list() was sufficient to get the code working again:

dplyr::first(as.list(x))`

Got the relevant tests passing at least.

@DavisVaughan
Copy link
Member

I looked into the Message class, and it looks like that is really a scalar object rather than a vector class (in the same sense of how lm is a scalar object).

In that case, this new behavior is actually slightly more correct than what was happening before (but yes it is a behavior change).

We do still support S4 classes that contain a vector class, like "integer", as these function more like vectors.

setClass("Foo", contains = "integer")

x <- new("Foo", 1:5)
x
#> An object of class "Foo"
#> [1] 1 2 3 4 5

dplyr::first(x)
#> An object of class "Foo"
#> [1] 1

Created on 2023-02-01 with reprex v2.0.2.9000

@MichaelChirico
Copy link
Contributor Author

Thanks! Would you say as.list() is likely to be a common workaround? I guess it's not possible to say in full generality.

In any case, I think a quick NEWS bullet is warranted for closing the issue.

@hadley
Copy link
Member

hadley commented Feb 1, 2023

@MichaelChirico Do you have a more realistic example? i.e. are these getting put into a data frame somehow?

@MichaelChirico
Copy link
Contributor Author

Yes I think that's right. Here's the function in which this was found (eliminating some parts that may be extraneous):

# Converts a message into a list. If there are any nested messages, those are
# merged with the rest of the message at the highest level. This function only
# handles ONE level of nesting.
#'
#' @param response a proto from an RPC response.
#' @param nested a vector containing element names that contain a nested message
#' @param collection a vector containing element names that contain collections
#'        like lists.
#' @return a flat data.frame with data types preserved.
MessagesToDataFrame <- function(response, nested = NULL, collection = NULL) {
  # Convert the message to a list of rows, each element containing a flattened
  # message, even when there is a nested message.
  list.of.flattened.messages <- lapply(
    dplyr::first(response),
    function(.) {
      outer <- as.list(.)
      # Process collections.
      collections <- list()
      for (key in collection) {
        # Cache the collection to preserve its type.
        collections[[key]] <- outer[[key]]
        # Remove the collection from the list.
        outer[key] <- NULL
      }
      # Process nested messages.
      for (key in nested) {
        # Extract the nested message as a list.
        submessagelist <- as.list(first(outer[[key]]))
        # Remove the old entry.
        outer[[key]] <- NULL
        # Merge the message and the nested message.
        outer <- c(outer, submessagelist)
      }
      # Add collections back to the entry with preserved data type.
      for (key in collection) {
        outer[[key]] <- collections[key]
      }
      return(outer)
    }
  )
  # Combine the lists into one data frame.
  res <- data.table::rbindlist(list.of.flattened.messages, fill = TRUE)
  return(res)
}

It'll take a bit more work to convert the test cases into something I could share here, but let me know if that's worthwhile.

@hadley
Copy link
Member

hadley commented Feb 1, 2023

It seems like maybe dplyr::first() didn't do anything since response is a scalar?

@MichaelChirico
Copy link
Contributor Author

I'm also a bit puzzled looking at the code. Let me get you a better reprex.

@hadley
Copy link
Member

hadley commented Feb 1, 2023

Some thing seems off with the object definition:

library(RProtoBuf)
example("asMessage", package="RProtoBuf", echo= FALSE)

x <- asMessage(tutorial.Person)
length(x)
#> [1] 5
length(names(x))
#> [1] 10
length(as.list(x))
#> [1] 10

Created on 2023-02-01 with reprex v2.0.2

@MichaelChirico
Copy link
Contributor Author

tutorial.Person is not the actual message type being used, I just pulled that one since it's included in the package itself.

Working on a more faithful recreation.

@MichaelChirico
Copy link
Contributor Author

Here's a more faithful reprex:

cat(file = tmp<-tempfile(), '
syntax = "proto2";
package Ex;

message Response {
  enum MyType {
    A = 1;
    B = 2;
    C = 3;
    D = 4;
    E = 5;
  }

  message Score {
    // Metadata about the score.
    optional int64 project_id = 1;
    optional int64 item_id = 2;
    optional string result_id = 3;
    repeated string dupe = 4;
    optional MyType type = 5;

    optional bool rejected = 6;
    optional double minimum = 7;
    optional double maximum = 8;
    repeated string score = 9;
    repeated string flag = 10; 
    optional double other_minimum = 11;
    optional double other_maximum = 12;
  }

  repeated Score score = 1;
}
')

library(RProtoBuf)
readProtoFiles(tmp)
input <- P("Ex.Response")$readASCII('
score {
  project_id: 1
  item_id: 1
  result_id: "1"
  dupe: ""
  type: A
  rejected: false
  minimum: 3.5
  maximum: 4.5
  other_minimum: 3.5
  other_maximum: 4.5
}
score {
  project_id: 1
  item_id: 2
  result_id: "2"
  dupe: ""
  type: B
  rejected: true
  minimum: 3.5
  maximum: 5.5
  other_minimum: 3.5
  other_maximum: 5.5
}
')

dplyr::first(input)
# [[1]]
# message of type 'Ex.Response.Score' with 10 fields set

# [[2]]
# message of type 'Ex.Response.Score' with 10 fields set

as.list(input)
# $score
# $score[[1]]
# message of type 'Ex.Response.Score' with 10 fields set

# $score[[2]]
# message of type 'Ex.Response.Score' with 10 fields set

input[[1]]
# [[1]]
# message of type 'Ex.Response.Score' with 10 fields set

# [[2]]
# message of type 'Ex.Response.Score' with 10 fields set

It looks like [[1]] is actually more faithful to what dplyr::first() was doing.

DavisVaughan added a commit that referenced this issue Feb 1, 2023
@DavisVaughan DavisVaughan added this to the 1.1.1 milestone Feb 7, 2023
@DavisVaughan
Copy link
Member

I think we are going to have to chalk this up to being part of the technical limitations of vctrs right now, which really only supports S4 is they are naturally backed by an atomic type
https://vctrs.r-lib.org/dev/reference/vector-checks.html#technical-limitations

@MichaelChirico
Copy link
Contributor Author

saw the NEWS item, that works for me! thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants