Support for formats other than data.frame #5390

Closed
antagomir opened this issue Aug 12, 2023 · 12 comments · Fixed by #5404

Comments

@antagomir

antagomir commented Aug 12, 2023

Problem: ggplot2 supports data.frame, but many other data structures could also benefit from ggplot2 functionality. Examples include DataFrame, matrix, dgCMatrix, DelayedMatrix, SparseMatrix, etc. Many of these classes support as.data.frame() and can easily be converted into a data.frame. However, having to do this in every ggplot2 call quickly becomes repetitive.

Suggested solution: The default fortify() method, ggplot2:::fortify.default(), could just try to call as.data.frame() on the supplied object. This would directly make ggplot() work on any object that supports as.data.frame() (e.g. DataFrame, matrix, dgCMatrix, DelayedMatrix, SparseMatrix, etc.).

Let's load libraries and example data

library(S4Vectors)
library(ggplot2)
data(iris)

Usual data.frame works as expected:

ggplot(iris, aes(x=Sepal.Width, y=Sepal.Length)) + geom_point()

DataFrame does not work, and the ggplot() call throws an error:

ggplot(DataFrame(iris), aes(x=Sepal.Width, y=Sepal.Length)) + geom_point()

Error in fortify():
! data must be a <data.frame>, or an object coercible by fortify(),
not a object.

At the moment our default solution has been to always add as.data.frame() around DataFrame objects, like:

ggplot(as.data.frame(d), aes(x, y)) + geom_point()

There was an initial discussion about the challenges this adds to teaching standard plotting in ecosystems that rely on classes that are closely related to data.frame but are not data.frame themselves.

The initial thought was to solve this in S4Vectors (for DataFrame); see the PR by @kevinrue. Then @hpages pointed out the more general solution described above.

-> Could ggplot2 add the as.data.frame() check to extend support to formats other than data.frame? If yes, we might be able to provide a PR.

@teunbrand
Collaborator

Using Bioconductor extensively myself, I think this is a good idea in principle.
I'm a little unsure why this was never done, so it might be a good idea to ask @hadley whether there is a good reason for not attempting to coerce with as.data.frame() in fortify.default().

@hadley
Member

hadley commented Aug 13, 2023

I don't think this is a good idea because as.data.frame() accepts many inputs that don't make sense as inputs to ggplot2, e.g.

as.data.frame(1)
#>   1
#> 1 1
as.data.frame(matrix(1:4, nrow = 2))
#>   V1 V2
#> 1  1  3
#> 2  2  4
as.data.frame(NULL)
#> data frame with 0 columns and 0 rows

Created on 2023-08-13 with reprex v2.0.2

In general, automatic coercion tends to make a small number of use cases easier at the high cost of making many mistakes harder to discover.

I think it would be fine to add these methods individually, but I don't think that a blanket default to as.data.frame() is a good idea.
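As a rough illustration of what "adding these methods individually" could look like, here is a minimal sketch of a class-specific fortify() method. The `point_matrix` class and its constructor are hypothetical names invented for this example; only `fortify()`, `ggplot()`, `aes()` and `geom_point()` come from ggplot2, and `registerS3method()` from base R.

```r
library(ggplot2)

# Hypothetical S3 class (an assumption for illustration): a thin wrapper
# around a matrix that has named columns.
new_point_matrix <- function(m) {
  stopifnot(is.matrix(m), !is.null(colnames(m)))
  structure(list(values = m), class = "point_matrix")
}

# Class-specific method: the conversion to data.frame is spelled out for
# this one class instead of relying on a blanket as.data.frame() default.
fortify.point_matrix <- function(model, data, ...) {
  as.data.frame(model$values)
}

# Register explicitly so dispatch also works when this code does not run
# at top level (e.g. inside a function or a test environment).
registerS3method(
  "fortify", "point_matrix", fortify.point_matrix,
  envir = asNamespace("ggplot2")
)

pm <- new_point_matrix(cbind(x = 1:10, y = (1:10)^2))
p <- ggplot(pm, aes(x, y)) + geom_point()
```

In a package, the same method would normally be registered through the NAMESPACE file rather than by calling `registerS3method()` directly.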

@antagomir
Author

One problem, for the DataFrame class specifically, is that it is defined in the Bioconductor package S4Vectors. Adding support for it to ggplot2:::fortify.default() might require importing a Bioconductor package in ggplot2, but I am not sure whether this is a possible solution?

@teunbrand
Collaborator

No, I think that importing from BioC is not a good option since it isn't in any sense required for the functioning of ggplot2. For that same reason, I don't think it is a good idea for S4Vectors to import ggplot2. Which leaves a little bit of a dilemma.

However, this is precisely what the external generic concept in S7 seems to solve.

@hpages
Contributor

hpages commented Aug 14, 2023

@hadley Those are valid concerns but maybe having a few additional sanity checks like length(dim(x)) == 2 and is.character(colnames(x)) before actually trying to call as.data.frame(x) could help mitigate this?
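The two checks suggested here can be sketched in base R as follows. `coerce_if_frame_like()` is a hypothetical helper written for this illustration, not ggplot2 code; it simply refuses to call as.data.frame() unless the object has two dimensions and character column names.

```r
# Hypothetical helper (an assumption, not ggplot2's implementation):
# only coerce objects that look like a table with named columns.
coerce_if_frame_like <- function(x) {
  if (length(dim(x)) != 2L) {
    stop("`x` must have exactly two dimensions.", call. = FALSE)
  }
  if (!is.character(colnames(x))) {
    stop("`x` must have character column names.", call. = FALSE)
  }
  as.data.frame(x)
}

# A matrix with column names passes both checks.
ok <- coerce_if_frame_like(cbind(x = 1:3, y = 4:6))

# The three inputs from the earlier as.data.frame() examples are rejected:
# 1 and NULL have no dim(), and the unnamed matrix has no colnames().
rejected <- vapply(
  list(1, matrix(1:4, nrow = 2), NULL),
  function(obj) inherits(try(coerce_if_frame_like(obj), silent = TRUE), "try-error"),
  logical(1)
)
```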

@hadley
Member

hadley commented Aug 15, 2023

@hpages yeah, that might be a reasonable approach.

@antagomir
Author

Shall we prepare a PR, or what would be a good way to proceed?

@hpages
Contributor

hpages commented Aug 15, 2023

I'll work on a PR in the next few days. Thanks @hadley and @antagomir!

@kevinrue

Thanks @hpages and everyone involved in the resolution!

@antagomir
Author

Awesome. Looking fwd to testing!

@hpages
Contributor

hpages commented Sep 1, 2023

This is ready for testing @antagomir. With this change:

library(ggplot2)
library(S4Vectors)

DF1 <- DataFrame(x=1:10, y=runif(10))
ggplot(DF1, mapping=aes(x, y)) + geom_point()  # works!

But:

DF2 <- DataFrame(X=I(cbind(x=1:10, y=runif(10))))
ggplot(DF2, mapping=aes(x, y)) + geom_point()
# Error in `.as_data_frame_trust_no_one()`:
# ! Calling `as.data.frame()` on data-frame-like object `data` (a <DFrame>
#   object) did not preserve its dimensions.
# Run `rlang::last_trace()` to see where the error occurred.

This is a feature!
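To make the "did not preserve its dimensions" failure concrete without needing S4Vectors, here is a base R sketch. Both `as_data_frame_checked()` and the `nested_frame` class are hypothetical stand-ins: the helper only mimics the kind of check described by the error message and is not the actual `.as_data_frame_trust_no_one()`, and the toy class imitates an object whose single declared column holds a two-column matrix.

```r
# Hypothetical helper (an assumption): coerce, then verify that the result
# has the same dimensions the object reported before coercion.
as_data_frame_checked <- function(x) {
  df <- as.data.frame(x)
  if (!identical(dim(df), dim(x))) {
    stop("as.data.frame() did not preserve the dimensions of `x`.", call. = FALSE)
  }
  df
}

# Toy S3 class (an assumption): it reports one column, but that column is a
# two-column matrix, so coercion expands it into two data.frame columns.
nested <- structure(list(X = cbind(x = 1:3, y = 4:6)), class = "nested_frame")
dim.nested_frame <- function(x) c(3L, 1L)
as.data.frame.nested_frame <- function(x, ...) as.data.frame(unclass(x)$X)

# A plain named matrix keeps its 3 x 2 shape and is accepted.
flat_ok <- as_data_frame_checked(cbind(x = 1:3, y = 4:6))

# The nested object goes from 3 x 1 to 3 x 2 and is rejected.
nested_result <- try(as_data_frame_checked(nested), silent = TRUE)
```

Failing loudly here means a mapping such as aes(x, y) never silently picks up columns that only appeared as a side effect of coercion.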

@antagomir
Author

Works like a charm for all the use cases that I have tested so far. We can and will do some more testing if this PR gets merged.

teunbrand pushed a commit that referenced this issue Sep 20, 2023
* fortify.default() accepts data-frame-like objects

`fortify.default()` now accepts a data-frame-like object granted the
object exhibits healthy `dim()`, `colnames()`, and `as.data.frame()`
behaviors. Closes #5390.

* Update snapshot of ggplot(aes(x = x))

* Improve fortify.default() based on Teun's feedback

* Follow style guide a little bit more closely in error messages

(see https://style.tidyverse.org/error-messages.html)