Base scale not working with tibble #762

Closed
TiberiusGracchus2020 opened this issue Apr 15, 2020 · 13 comments

@TiberiusGracchus2020 TiberiusGracchus2020 commented Apr 15, 2020

Applying base scale() to multiple columns of a tibble produces incorrectly scaled output and breaks the tibble on sort. Fresh Windows install, latest R/RStudio, latest tidyverse.

library(tibble)
iris <- as_tibble(iris)
iris[1:3] <- scale(iris[1:3])
head(iris)
#> # A tibble: 6 x 5
#>   Sepal.Length[,"~ [,"Sepal.Width"] [,"Petal.Length~ Sepal.Width[,"S~
#>              <dbl>            <dbl>            <dbl>            <dbl>
#> 1           -0.898           1.02              -1.34           -0.898
#> 2           -1.14           -0.132             -1.34           -1.14 
#> 3           -1.38            0.327             -1.39           -1.38 
#> 4           -1.50            0.0979            -1.28           -1.50 
#> 5           -1.02            1.25              -1.34           -1.02 
#> 6           -0.535           1.93              -1.17           -0.535
#> # ... with 7 more variables: [,"Sepal.Width"] <dbl>, [,"Petal.Length"] <dbl>,
#> #   Petal.Length[,"Sepal.Length"] <dbl>, [,"Sepal.Width"] <dbl>,
#> #   [,"Petal.Length"] <dbl>, Petal.Width <dbl>, Species <fct>
Member

@krlmlr krlmlr commented Apr 15, 2020

scale() returns a matrix. The right-hand side must be a list or data frame to be distributed across columns when assigning.

library(tibble)
iris <- as_tibble(iris)
scaled <- scale(iris[1:3])  # scale() returns a numeric matrix
class(scaled)
#> [1] "matrix"
iris[1:3] <- as.data.frame(scaled)  # one data frame column per matrix column
head(iris)
#> # A tibble: 6 x 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#> 1       -0.898      1.02          -1.34         0.2 setosa 
#> 2       -1.14      -0.132         -1.34         0.2 setosa 
#> 3       -1.38       0.327         -1.39         0.2 setosa 
#> 4       -1.50       0.0979        -1.28         0.2 setosa 
#> 5       -1.02       1.25          -1.34         0.2 setosa 
#> 6       -0.535      1.93          -1.17         0.4 setosa

Created on 2020-04-15 by the reprex package (v0.3.0)
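
The list form of the right-hand side mentioned above could look roughly like the following. This is only a sketch, assuming tibble 3.0.0 or later; the helper name `standardize` and the variable `iris_tbl` are made up for this example and do not come from the thread.

```r
library(tibble)

# Hypothetical helper: z-score one numeric vector and drop the matrix
# attributes that scale() attaches, so a plain numeric vector comes back.
standardize <- function(x) as.numeric(scale(x))

iris_tbl <- as_tibble(iris)

# lapply() returns a list with one element per column, which `[<-` can
# distribute across the three target columns.
iris_tbl[1:3] <- lapply(iris_tbl[1:3], standardize)

head(iris_tbl)
```

Scaling column by column keeps every column a plain numeric vector, so no matrix ever reaches the assignment.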

Author

@TiberiusGracchus2020 TiberiusGracchus2020 commented Apr 15, 2020

Fixed by rolling the tibble package back to version 2.1.3.

Member

@krlmlr krlmlr commented Apr 15, 2020

See https://tibble.tidyverse.org/dev/articles/invariants.html for details. What is your use case?

Author

@TiberiusGracchus2020 TiberiusGracchus2020 commented Apr 15, 2020

Standardizing (mean = 0, sd = 1) multiple columns at once before aggregating them to create an index measure. Data science/analytics.
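
A workflow like this might be sketched as follows. The aggregation step is an assumption: the thread does not say how the index is built, so an unweighted row mean of the z-scores stands in for it, and the names `iris_tbl`, `z_scores` and `index` are illustrative only.

```r
library(tibble)

iris_tbl <- as_tibble(iris)

# Standardize the first three columns (mean 0, sd 1); as.data.frame()
# turns the matrix returned by scale() into one column per variable.
z_scores <- as.data.frame(scale(iris_tbl[1:3]))

# Assumed aggregation: unweighted mean of the z-scores in each row.
iris_tbl$index <- rowMeans(z_scores)

head(iris_tbl)
```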

Member

@krlmlr krlmlr commented Apr 15, 2020

Are you calling scale() in a script of yours, or is a package failing now?

Author

@TiberiusGracchus2020 TiberiusGracchus2020 commented Apr 15, 2020

Calling it from a script.

Member

@krlmlr krlmlr commented Apr 15, 2020

I wonder if this is common enough to warrant a compatibility fix that will be converted to an error later on.

We would disallow assignment of columns if the right-hand side is a matrix, and issue a deprecation warning. To get rid of the warning, users would need to use as.data.frame() or list() to specify the intended behavior.

Would that help your use case?

Member

@krlmlr krlmlr commented Apr 15, 2020

Alternatively, we could also revert to previous behavior and ask users to wrap in a list if they really want a matrix column.
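
For readers unfamiliar with the term, a matrix column is a single tibble column that holds a whole matrix. The sketch below only illustrates what such a column looks like when built directly with `tibble()`; it does not show the assignment syntax discussed here, and the names `tb`, `id` and `m` are made up.

```r
library(tibble)

# A 3 x 2 matrix stored as a single column named `m`.
tb <- tibble(id = 1:3, m = matrix(1:6, nrow = 3))

ncol(tb)   # counts `m` once, not once per matrix column
dim(tb$m)  # the matrix keeps its own dimensions
```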

Author

@TiberiusGracchus2020 TiberiusGracchus2020 commented Apr 15, 2020

I'm no expert in R, but for my case the previous behavior is desirable. In the latest version, the tibble is "broken" after I run the same command, and sorting returns an error:
Error in xj[i, , drop = FALSE] : subscript out of bounds
scale() on tibbles has been a rather important part of my data-wrangling toolset.

Member

@krlmlr krlmlr commented Apr 16, 2020

Thanks. Now:

library(tibble)
iris <- as_tibble(iris)
iris[1:3] <- as.data.frame(scale(iris[1:3]))
head(iris)
#> # A tibble: 6 x 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#> 1       -0.898      1.02          -1.34         0.2 setosa 
#> 2       -1.14      -0.132         -1.34         0.2 setosa 
#> 3       -1.38       0.327         -1.39         0.2 setosa 
#> 4       -1.50       0.0979        -1.28         0.2 setosa 
#> 5       -1.02       1.25          -1.34         0.2 setosa 
#> 6       -0.535      1.93          -1.17         0.4 setosa

Created on 2020-04-16 by the reprex package (v0.3.0)

Added a section to https://tibble.tidyverse.org/dev/articles/invariants.html#column-subassignment-x-j-a.

Author

@TiberiusGracchus2020 TiberiusGracchus2020 commented Apr 16, 2020

Thanks!


@mlane3 mlane3 commented Apr 16, 2020

oh my gosh thanks so much!

@krlmlr krlmlr closed this as completed Apr 17, 2020
@krlmlr krlmlr added this to the 3.0.1 milestone Apr 17, 2020
krlmlr added a commit that referenced this issue Feb 25, 2021
tibble 3.0.1

- `[<-.tbl_df()` coerces matrices to data frames (#762).

- Use delayed import for cli to work around unload problems in downstream packages (#754).

- More soft-deprecation warnings are actually visible.

- If `.name_repair` is a function, no repair messages are shown (#763).

- Remove superseded signal for `as_tibble.list()`, because `as_tibble_row()` only works for size 1.

- `as_tibble(validate = )` now always triggers a deprecation warning.

- Subsetting and subassignment of rows with one-column matrices work again, with a deprecation warning (#760).

- Attempts to update a tibble row with an atomic vector give a clearer error message. Recycling message for subassignment appears only if target size is != 1.

- Tweak title of "Invariants" vignette.
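
The first changelog entry suggests that the original call from this issue should work unchanged once tibble 3.0.1 is installed. The following is a rough check under that assumption; the version guard and the name `iris_tbl` are additions for this sketch, not part of the release notes.

```r
library(tibble)

iris_tbl <- as_tibble(iris)

# Assumption: with tibble >= 3.0.1 the matrix returned by scale() is
# coerced to a data frame before being assigned column by column.
if (packageVersion("tibble") >= "3.0.1") {
  iris_tbl[1:3] <- scale(iris_tbl[1:3])
}

names(iris_tbl)
```

If the coercion applies, the tibble keeps its five original column names instead of gaining the malformed ones shown in the report.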

@github-actions github-actions bot commented Apr 18, 2021

This old thread has been automatically locked. If you think you have found something related to this, please open a new issue and link to this old issue if necessary.

@github-actions github-actions bot locked and limited conversation to collaborators Apr 18, 2021