I was working with a dataset that had repeated column names, and the existing map-based methods made it tricky to rename the columns to remove the duplicates. I think positional assignment of column names could both solve this particular problem and be a more generally useful capability for `tech.ml.dataset`. On Zulip, it was suggested that I open an issue here about this.
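To illustrate why a rename keyed by column name struggles here, the sketch below models column names as a plain Clojure vector. It is not `tech.ml.dataset` code; `cols` and `rename-by-map` are hypothetical names used only for illustration.

```clojure
;; Column names as they might come from a CSV header with a repeated "price".
(def cols ["id" "price" "price"])

;; Hypothetical map-based rename: each name is looked up as a key, and
;; names without an entry are kept unchanged.
(defn rename-by-map [names rename-map]
  (mapv #(get rename-map % %) names))

(rename-by-map cols {"price" "price-usd"})
;; => ["id" "price-usd" "price-usd"]

;; Both "price" columns receive the same new name, so the duplicate
;; survives. A map cannot hold two different targets for one key:
;; a literal like {"price" "price-usd" "price" "price-eur"} is rejected
;; with a duplicate-key error, and assoc simply overwrites the old value.
```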
If possible, I'd like to work up a PR for this. It could either be a new function, `name-columns`, or an extension to `rename-columns` that behaves differently when a vector is passed in. I have no preference either way.
Both are good options. I would like to stick with `rename-columns` to avoid adding another symbol. If we go with the vector approach, the only requirement I see is that the vector must be complete: it must have a name for every column. The map approach, by contrast, does not currently require an entry for every column.
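A minimal sketch of the positional semantics and the completeness requirement, assuming a toy representation of a dataset as a vector of column maps with `:name` and `:data` keys. This is not the library's internal structure, and `columns` and `rename-positional` are hypothetical names. In `rename-columns` itself, this would presumably be a branch taken when the argument is a vector rather than a map.

```clojure
;; Toy stand-in for a dataset: two columns share the name "price".
(def columns
  [{:name "id"    :data [1 2]}
   {:name "price" :data [10 20]}
   {:name "price" :data [9 18]}])

;; Hypothetical positional rename: the i-th new name goes to the i-th
;; column. Per the requirement above, the vector must name every column,
;; so a length mismatch throws ex-info instead of renaming only some columns.
(defn rename-positional [cols new-names]
  (when-not (vector? new-names)
    (throw (ex-info "Positional rename expects a vector of names"
                    {:new-names new-names})))
  (when-not (= (count cols) (count new-names))
    (throw (ex-info "Positional rename needs exactly one name per column"
                    {:column-count (count cols)
                     :name-count   (count new-names)})))
  ;; Pair each column with its new name by position.
  (mapv (fn [col new-name] (assoc col :name new-name)) cols new-names))

(mapv :name (rename-positional columns ["id" "price-usd" "price-eur"]))
;; => ["id" "price-usd" "price-eur"]

;; (rename-positional columns ["id" "price-usd"]) throws an ExceptionInfo
;; whose ex-data is {:column-count 3, :name-count 2}.
```

Rejecting an incomplete vector outright, rather than leaving the trailing columns untouched, keeps the positional form unambiguous. Partial renames remain the job of the map form.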
* Initial implementation of positional rename

  This adds a test to replicate the case of a CSV file with multiple columns that have the same name (but different values). It only tests the successful case, not the error case.
* Add `ex-info` and tests for incorrect rename args