-
Notifications
You must be signed in to change notification settings - Fork 131
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow sparse vectors in tibbles? #196
Comments
Oh. I now see that tibble is a very lightweight wrapper around data.frame. So I guess the above is doomed. |
We could do that, but probably not until we've switched to vctrs, and support for sparse columns is added there. What happens if you call as_tibble() on such a data frame? |
Actually it works. If I remove the 1d check in dataframe.R I get: > tt <- tibble(a = 1:10, b = sparseVector(c(T, T, T), c(2, 5, 9), 10))
> dd <- cbind(tt, tt)
> str(dd)
'data.frame': 10 obs. of 4 variables:
$ a: int 1 2 3 4 5 6 7 8 9 10
$ b:Formal class 'lsparseVector' [package "Matrix"] with 3 slots
.. ..@ x : logi TRUE TRUE TRUE
.. ..@ length: num 10
.. ..@ i : num 2 5 9
$ a: int 1 2 3 4 5 6 7 8 9 10
$ b:Formal class 'lsparseVector' [package "Matrix"] with 3 slots
.. ..@ x : logi TRUE TRUE TRUE
.. ..@ length: num 10
.. ..@ i : num 2 5 9
> dd[1, "b"]
sparse vector (nnz/length = 0/1) of class "lsparseVector"
[1] . The problem is with the print method: > tt
Error in prettyNum(.Internal(format(x, trim, digits, nsmall, width, 3L, (from utils-format.r#89) :
first argument must be atomic But that's easy to fix I would guess. |
I can look deeper into it if there is interest. Adjusting is_vector to is_vector <- function(x) {
is_atomic(x) || is.list(x) || inherits(x, "sparseVector")
} is enough to fix the constructor. |
I'd prefer if this was tackled with r-lib/vctrs#27, then tibbles get to support this data type once we switch to vctrs. For now you can probably work around at your own risk with as_tibble(validate = FALSE) . |
I don't think you are right about passing the ball to that project. These two projects have orthogonal concerns. The number of data structures supported by The core task of tibble is to store column data. Potentially arbitrary types of data could be stored as columns as long as those types support basic protocol (length, [, [[, what else?). Why would one want to artificially restrict this abstraction to vectrs? Finaly the scope of that project is pretty undefined at the moment and it might take years till it finally takes shape. |
I'd like to refrain from supporting in tibble anything that might bring dplyr in trouble. The support for nonstandard data types in dplyr is rather weak, to improve it we need a solid foundation like vctrs. So, currently you'll have to use I don't see the issue of "exponential growth of cross-type operations" because vctrs will be very strict about what can be combined. @hadley: Could you please comment? |
Vctrs is where we will be thinking about "vector type" objects including thinking about what the generic API should be. A data frame/tibble is just a little of vectors of equal lengths. So this discussion definitely belongs in vctrs. I expect the bulk of vctrs to be complete by end of next year, and the a lot of the API design will be complete in the next 3 months. |
@mmaechler: Now that vctrs is gaining traction, what's the best way to support sparse vectors and matrices in data frames or tibbles? Currently I see: tibble::tibble(a = 1:10, b = Matrix::sparseVector(c(T, T, T), c(2, 5, 9), 10))
#> All columns in a tibble must be 1d or 2d objects:
#> * Column `b` is lsparseVector
data.frame(a = 1:10, b = Matrix::sparseVector(c(T, T, T), c(2, 5, 9), 10))
#> Error in as.data.frame.default(x[[i]], optional = TRUE): cannot coerce class 'structure("lsparseVector", package = "Matrix")' to a data.frame Created on 2019-08-07 by the reprex package (v0.3.0) |
Interesting: As As |
Thanks, Martin. {vctrs} 0.2.0 is on CRAN. The relevant text is https://vctrs.r-lib.org/articles/s3-vector.html, it's talking about S3 vectors but perhaps we can use S3 dispatch for the S4 classes? Outside of vctrs, we could see if we can implement an I suspect it's sufficient to implement methods for the base class(es). We could start the work in a package that imports {Matrix} and then incorporate if desired. |
@lionel-: What would be a good |
I'm not familiar with sparse implementations. Is it similar to run-length encoding? Maybe a record vector would be appropriate? |
It's an S4 class with two vector and one scalar components. x <- Matrix::sparseVector(1:3, 3:5, 7)
x
#> sparse vector (nnz/length = 3/7) of class "isparseVector"
#> [1] . . 1 2 3 . .
as.numeric(x)
#> [1] 0 0 1 2 3 0 0
x@x
#> [1] 1 2 3
x@i
#> [1] 3 4 5
x@length
#> [1] 7 Created on 2019-11-01 by the reprex package (v0.3.0) |
Thanks, I'll look into it once I'm done with tidyselect. I'm going to implement methods for third party packages like sf. By the way, I think the tibble methods should also stay in vctrs for a while, because there will probably be API changes in the coming year. |
Does this also mean that 98cac0d (and other packages we discover) belongs in vctrs? Happy to submit PRs. Are we still recommending the "list subclass" approach, where we treat each S3 class that explicitly inherits from |
If it's an important class feel free to submit a PR implementing methods. Perhaps after I have implemented sf, so that we have figured out the layout, conventions, and testing setup. We're definitely going to do the explicit list inheritance approach, that's the first thing I'll tackle when I get back to work on vctrs, which I plan to this week. |
Requires r-lib/vctrs#332, r-lib/vctrs#550 and probably r-lib/vctrs#551, before we can start adding vctrs support to {Matrix} classes. Once the following returns vctrs::vec_is(Matrix::sparseVector(integer(), integer(), 0))
#> [1] FALSE Created on 2020-03-11 by the reprex package (v0.3.0) Even then this would have to happen in a different place, I'm thinking about a {mtrx} package which could eventually be incorporated into {Matrix}. |
This old thread has been automatically locked. If you think you have found something related to this, please open a new issue and link to this old issue if necessary. |
How difficult would it be to allow for sparse vectors as part of tibbles?
For instance
My primary interest is in space efficient logical vectors.
The text was updated successfully, but these errors were encountered: