Closed
Description
Feature Type
Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas
Problem Description
df.join() does not retain the attrs of the dataset. Given that the attrs manual states that "many operations that create new datasets will retain attrs", this seems like an omission.
Feature Description
Join is different from concat because there is a clear dataframe that the operation is on. Therefore, it would seem natural if df.join() would retain the attrs of the initial dataframe.
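A minimal sketch of the situation described above. The frame names and the attrs value are illustrative and not taken from the report, and what the last line prints depends on the installed pandas version.

```python
import pandas as pd

# Illustrative data: only the left frame carries metadata in attrs.
left = pd.DataFrame({"a": [1, 2]}, index=["x", "y"])
left.attrs = {"source": "sensor-1"}
right = pd.DataFrame({"b": [3, 4]}, index=["x", "y"])

joined = left.join(right)

# The report is that attrs of the calling frame are not carried over here.
# No expected output is given because the result is version-dependent.
print(joined.attrs)
```

The request is for `joined.attrs` to match `left.attrs` in a case like this one.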
Alternative Solutions
It would also be possible to make the attrs dependent on "how", but this would only be natural for "left" and "right".
Additional Context
No response
Activity
timhoffm commented on Nov 18, 2024
This should be handled the same way as `concat`: propagate only if all inputs have the same attrs. `concat` is currently a hard-coded special case,
pandas/pandas/core/generic.py
Line 6056 in 7fe270c
but we may want to delegate the attrs combination back to the operation instead, i.e.
Note that `other` is the "combination object" for these calls, i.e. `_MergeOperation`, `_Concatenator`, etc., which will have to grow the logic for combining attrs.
Alternatively, one could leave the combination logic in `__finalize__` but provide a uniform interface on all "combination objects" to give their inputs. Currently, that's non-uniform: `_Concatenator.objs`, but `_MergeOperation.left` / `_MergeOperation.right`.
rhshadrach commented on Nov 18, 2024
Thanks for the report! @timhoffm - can you post a reproducible example?
BUG: Copy attrs on pd.merge()
timhoffm commented on Nov 19, 2024
#60357 should fix this. I've chosen the somewhat smaller refactoring and not pushed the combination logic back into the "combination objects". In fact, #59141 removed `_Concatenator` in favor of simple functions. Therefore, I've now chosen the common API to be "provides the inputs via the `input_objs` parameter".
Note that `join()` is implemented via `concat()` or `merge()` depending on the case. I've only added explicit tests for these fundamental operations, not for `join()`, but could add that if desired.
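The rule discussed in this thread (propagate attrs only if all inputs have the same attrs) can be sketched in plain Python. This is an illustration under stated assumptions, not the code from #60357: `combine_attrs` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for DataFrames that expose an `attrs` dict.

```python
from copy import deepcopy
from types import SimpleNamespace


def combine_attrs(input_objs):
    """Hypothetical helper: return a copy of the shared attrs, or {} if inputs disagree."""
    attrs_list = [obj.attrs for obj in input_objs]
    if attrs_list and all(attrs == attrs_list[0] for attrs in attrs_list):
        # Copy so the result does not share mutable state with the inputs.
        return deepcopy(attrs_list[0])
    return {}


# Stand-ins for DataFrames; only the attrs attribute matters for this sketch.
left = SimpleNamespace(attrs={"unit": "m"})
right = SimpleNamespace(attrs={"unit": "m"})
other = SimpleNamespace(attrs={"unit": "s"})

same = combine_attrs([left, right])   # inputs agree, attrs are propagated
mixed = combine_attrs([left, other])  # inputs disagree, attrs are dropped
```

Passing the inputs as a single list mirrors the idea of one uniform way to reach them, so the combining logic does not need to know whether the operation was a concat or a merge.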