efficient rename #232

Closed
romainfrancois opened this issue Feb 2, 2014 · 11 comments

@romainfrancois
Member

http://stackoverflow.com/questions/21502465/replacement-for-rename-in-dplyr

plyr::rename makes deep copies:

> dplyr:::changes( mtcars, rename(mtcars, c("disp" = "displacement")) )
Changed variables:
             old         new
mpg          0x102a35d60 0x102a4f5a0
cyl          0x102a35e90 0x102a4f6d0
hp           0x102a360f0 0x102a4f930
drat         0x102a36220 0x102a4fa60
wt           0x102a36350 0x102a4fb90
qsec         0x102a36480 0x102a4fcc0
vs           0x102a365b0 0x102a4fdf0
am           0x102a366e0 0x102a4ff20
gear         0x102a36810 0x102a50050
carb         0x102a36940 0x102a50180
disp         0x102a35fc0
displacement             0x102a4f800

Changed attributes:
             old         new
names        0x1032f02a8 0x10595c2a0
row.names    0x102a36a70 0x102a502b0
class        0x1035f7168 0x10595cfe8

And we could have something more dplyr like:

rename( mtcars, displacement = disp )
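
For illustration, here is a minimal base R sketch of how a call in that style could be turned into a rename. `rename_sketch` is a hypothetical name and this is not the dplyr implementation; it only captures the `new = old` pairs with `substitute()` and rewrites the names attribute, leaving the column vectors untouched.

```r
# Hypothetical helper for illustration only, not dplyr's rename().
rename_sketch <- function(df, ...) {
  # Capture the unevaluated `new = old` arguments as a named list of symbols.
  exprs <- as.list(substitute(list(...)))[-1]
  old <- vapply(exprs, deparse, character(1))
  new <- names(exprs)

  # Every argument must be named and must refer to an existing column.
  stopifnot(
    length(exprs) > 0,
    !is.null(new),
    all(nzchar(new)),
    all(old %in% names(df))
  )

  # Only the names attribute is rewritten.
  nms <- names(df)
  nms[match(old, nms)] <- new
  names(df) <- nms
  df
}

out <- rename_sketch(mtcars, displacement = disp)
names(out)
```

The renamed column keeps its position, which matches what one would expect from a rename as opposed to a select.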
@romainfrancois
Member Author

rename_ (just naming it like this for now) uses shallow copies:

> mtcars2 <- rename_( mtcars, disp2 = disp )
> dplyr:::changes(mtcars2, mtcars)
Changed variables:
      old         new
disp2 0x10323d420
disp              0x10323d420

Changed attributes:
      old         new
names 0x10aaf5200 0x104e90558
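
The same kind of check can be approximated without dplyr internals. The sketch below is an assumption-laden illustration: it relies on base R's `tracemem()`, which returns an object's address as a string and is only available when R was built with memory profiling. `same_address` is a made-up helper name. Whether a plain `names<-` rename shares the column vectors depends on the R version, so the result is worth checking rather than assuming.

```r
# Hypothetical helper: TRUE if x and y are the same object in memory,
# FALSE if not, NA if this R build lacks memory profiling.
same_address <- function(x, y) {
  if (!capabilities("profmem")) return(NA)
  ax <- tracemem(x)   # address of x as a string such as "<0x...>"
  ay <- tracemem(y)
  untracemem(x)       # stop tracing so later copies are not reported
  untracemem(y)
  identical(ax, ay)
}

# Rename one column with base R and compare column addresses.
renamed <- mtcars
names(renamed)[names(renamed) == "disp"] <- "disp2"

same_address(renamed$disp2, mtcars$disp)  # was the renamed column reused?
same_address(renamed$mpg, mtcars$mpg)     # were the other columns reused?
```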

@romainfrancois
Member Author

Needs more logic to handle the grouped case, i.e., handling the labels and vars attributes.
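
In the grouped case the grouping metadata has to follow the rename too. Here is a rough sketch of that bookkeeping, assuming purely for illustration that the grouping variables are held as a plain character vector. The real vars and labels attributes are stored differently, and `rename_group_vars` is a made-up name.

```r
# Hypothetical: `group_vars` holds the grouping variable names and
# `renames` is a named character vector of the form c(new = "old").
rename_group_vars <- function(group_vars, renames) {
  hit <- match(group_vars, renames)   # position of each group var in renames
  out <- group_vars
  found <- !is.na(hit)
  out[found] <- names(renames)[hit[found]]  # swap in the new names
  out
}

rename_group_vars(c("cyl", "gear"), c(cylinders = "cyl"))
```

Grouping variables that are not being renamed pass through unchanged.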

@romainfrancois
Member Author

Hmm. Did not see #192. We have too many open issues about select. We should decide what it can do. The shallow_copy internal function makes it quite trivial to implement.

@hadley
Member

hadley commented Feb 3, 2014

Yes, select should do this. I'll try to implement a few this week (or at least figure out what select should do).

@romainfrancois
Member Author

When I get a chance (travelling now), I'll add support code internally that takes a data frame and a character vector of names and does the right shallow copy thing with them.

This way, on the R side we just have to calculate the wanted columns.

@hadley
Member

hadley commented Feb 3, 2014

That sounds like a reasonable implementation. Can you make it take a named vector so it can simultaneously rename and select?

@hadley
Member

hadley commented Feb 3, 2014

Initial implementation in 8ccdb07 - I think this will make it easy to implement select() for any backend.

@hadley
Member

hadley commented Feb 3, 2014

@romainfrancois to be precise, could you please write select_impl() that is given a data frame and a named vector and produces an output without making any copies. I'll then use that in the select() methods.

@romainfrancois
Member Author

I've put in some initial code for a back end select_impl. It takes 3 parameters:

  • the data frame (or grouped data frame).
  • a character vector of current names of variables we want to keep
  • the names we want to give them

Depending on what we want, the 2nd and 3rd might be the same vector.

The code as it is now assumes that arguments 2 and 3 have the same length and that every string in argument 2 really is a column of the data frame. It handles the vars and labels attributes we use for grouping.

We could collapse arguments 2 and 3 into a single named vector if you prefer.
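
To make the two calling conventions concrete, here is a small base R sketch. Both function names are hypothetical and this is not the internal select_impl: it handles plain data frames only, ignores the grouping attributes, and turns the two stated assumptions into explicit checks.

```r
# Hypothetical three-argument form: data frame, current names, new names.
select_sketch <- function(df, keep, new_names = keep) {
  stopifnot(
    is.character(keep),
    is.character(new_names),
    length(keep) == length(new_names),  # assumption 1: same length
    all(keep %in% names(df))            # assumption 2: columns exist
  )
  out <- df[unname(keep)]   # list-style subsetting keeps a data frame
  names(out) <- new_names
  out
}

# Hypothetical named-vector form: names are the new names, values the
# current names. Unnamed entries keep their current name.
select_named <- function(df, vars) {
  new_names <- names(vars)
  if (is.null(new_names)) new_names <- rep("", length(vars))
  blank <- !nzchar(new_names)
  new_names[blank] <- vars[blank]
  select_sketch(df, unname(vars), new_names)
}

head(select_named(mtcars, c(cyl2 = "cyl", "mpg")))
```

With the named-vector form, passing the same vector for selection and renaming comes for free, since unnamed entries simply fall back to their current names.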

@romainfrancois
Member Author

Done, in the select branch.

> select_impl( mtcars, c( cyl2 = "cyl" ) )
                    cyl2
Mazda RX4              6
Mazda RX4 Wag          6
Datsun 710             4
Hornet 4 Drive         6
Hornet Sportabout      8
...

hadley added a commit that referenced this issue Feb 4, 2014
@hadley
Member

hadley commented Feb 4, 2014

Thanks. Just pushed changes to use select_vars() throughout, and select_impl() for data frames, tbl_df() and grouped_df(). Will merge branch tomorrow once I've updated the docs.

@hadley hadley closed this as completed in d948c6e Feb 4, 2014
@hadley hadley modified the milestones: v0.1.2, v0.2 Feb 17, 2014
krlmlr pushed a commit to krlmlr/dplyr that referenced this issue Mar 2, 2016
@lock lock bot locked as resolved and limited conversation to collaborators Jun 10, 2018