Description
In #50370 the function dedup_names has been moved to pandas.io.common so it can be reused by any reader that has to deal with duplicate column names. Since the function may later be extended to support custom renaming patterns, every reader should use it, both to keep the behavior consistent and to avoid duplicated code. At least one place was identified in #50370 where a different implementation is used to rename duplicate columns; it should call dedup_names instead. If other alternative implementations exist, they should be found and switched to dedup_names as well.
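A minimal sketch of the renaming rule described above, for illustration only. dedup_names_sketch is a hypothetical standalone re-implementation, not the pandas function. It assumes the period-plus-counter scheme that dedup_names currently uses. The real helper in pandas.io.common also handles MultiIndex (tuple) column names, which this sketch omits. parse_header is likewise a hypothetical reader step that shows the point of the issue: readers pass their raw names through one shared helper instead of carrying their own renaming loop.

```python
from collections import defaultdict
from collections.abc import Hashable, Sequence


def dedup_names_sketch(names: Sequence[Hashable]) -> list[Hashable]:
    """Hypothetical stand-in for dedup_names: rename duplicates as name.N."""
    result = list(names)
    # counts[name] = next suffix number to try if `name` shows up again (0 = unused)
    counts: defaultdict[Hashable, int] = defaultdict(int)
    for i, col in enumerate(result):
        cur_count = counts[col]
        # Name already taken: append ".N" and re-check the new name,
        # which may itself be taken (e.g. an existing "a.1" column).
        while cur_count > 0:
            counts[col] = cur_count + 1
            col = f"{col}.{cur_count}"
            cur_count = counts[col]
        result[i] = col
        counts[col] = cur_count + 1
    return result


def parse_header(line: str) -> list[Hashable]:
    # Hypothetical reader step: every reader routes its raw header names
    # through the one shared helper instead of its own renaming loop.
    return dedup_names_sketch(line.split(","))


print(dedup_names_sketch(["x", "y", "x", "x"]))  # ['x', 'y', 'x.1', 'x.2']
print(parse_header("a,b,a"))  # ['a', 'b', 'a.1']
```

Because the renaming rule lives in one place, a future custom pattern would only need a change to the helper, and every reader calling it would pick that change up.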
Activity
muddi900 commented on Dec 24, 2022
take
leftful commented on Mar 1, 2023
Hi @muddi900, how are you going with this issue?
Happy to take over if you don't have the time :)
muddi900 commented on Mar 1, 2023
You can take over if the maintainers allow.
leftful commented on Mar 2, 2023
take
shteken commented on Apr 22, 2023
Hi @RhysJohnLewis, how are you going with this issue?
Happy to take over if you don't have the time :)
leftful commented on Apr 24, 2023
@shteken please do. I have not had the time.
hamedgibago commented on Jul 1, 2023
Certainly, no problem. I commented out the code above and added a new line, as you can see in the first line. The current tests passed, but others failed, so I need to spend some time debugging.
@datapythonista, please do not unassign me. Let's both work on the issue. Thank you.
rsm-23 commented on Jul 1, 2023
Thanks @hamedgibago, I'll try independently when I get some time :)
rsm-23 commented on Jul 1, 2023
@datapythonista, the two implementations are definitely different. One approach names the columns [col, col.1, col.1.1], while the other names them [col, col.1, col.2]. I need your input: should we change all the affected tests, or change the implementation of dedup_names?
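To make the difference concrete, here is a simplified sketch of both strategies as described in the comment above. mangle_dupes and its skip_existing flag are hypothetical; the flag also happens to illustrate the "parameter" option rsm-23 mentions further down. The skip_existing=True branch is an assumption about how the alternative implementation behaves, reduced to its core idea (skip any candidate suffix that already appears in the header), and it ignores any other bookkeeping the real parser does.

```python
from collections import defaultdict
from collections.abc import Sequence


def mangle_dupes(names: Sequence[str], skip_existing: bool) -> list[str]:
    """Hypothetical helper: rename duplicate names by appending '.N'.

    skip_existing=False chains suffixes onto already-taken names
    (the [col, col.1, col.1.1] pattern); skip_existing=True jumps past any
    candidate that already appears in the original header
    (the [col, col.1, col.2] pattern).
    """
    original = set(names)
    # counts[name] = next suffix number to try if `name` shows up again
    counts: defaultdict[str, int] = defaultdict(int)
    result = list(names)
    for i, base in enumerate(result):
        col = base
        cur_count = counts[col]
        while cur_count > 0:
            if skip_existing:
                counts[base] = cur_count + 1
                col = f"{base}.{cur_count}"
                # Candidate clashes with a name in the header: try the next number.
                cur_count = cur_count + 1 if col in original else counts[col]
            else:
                counts[col] = cur_count + 1
                col = f"{col}.{cur_count}"
                cur_count = counts[col]
        result[i] = col
        counts[col] = cur_count + 1
    return result


header = ["a", "a.1", "a"]
print(mangle_dupes(header, skip_existing=False))  # ['a', 'a.1', 'a.1.1']
print(mangle_dupes(header, skip_existing=True))   # ['a', 'a.1', 'a.2']
```

Both strategies agree when no suffixed name is already present: ["x", "y", "x", "x"] becomes ["x", "y", "x.1", "x.2"] either way. They diverge only when a name such as a.1 already appears in the header. This is consistent with the observation in this thread that some existing tests would need updating if one implementation replaced the other.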
hamedgibago commented on Jul 2, 2023
As far as I know, we do not change existing tests unless we find a bug and report it to a maintainer. After changing the code, we can add new tests and make sure all the other tests still pass.
Good luck.
rsm-23 commented on Jul 2, 2023
@hamedgibago I think it really depends. Some existing tests check the output of the custom method rather than that of dedup_names, and, as I mentioned above, the two approaches handle de-duplication differently. So we need to either adjust the implementation of dedup_names or adjust the unit tests. Even if we adjust dedup_names, there should be existing unit tests that validate its output, so changing its behavior would mean modifying those tests as well. One more approach would be to introduce a parameter that selects which algorithm dedup_names follows, but personally I am not a fan of this.
hamedgibago commented on Jul 2, 2023
@datapythonista What is your idea?
yoav-edelist commented on Dec 14, 2024
@hamedgibago @datapythonista Is this still in the works? Is this free?
hamedgibago commented on Dec 21, 2024
It's been a long time since I worked on it. I have to check.
leftful commented on Dec 21, 2024
LifeAsPixels commented on May 14, 2025
@hamedgibago @datapythonista Is this issue available to be taken? This would be my first issue to work on. It looks like the function already exists, but it still needs to be called in place of the other code that does the same thing elsewhere, as noted by the 'TODO' in #50370.
Veneel77 commented on Jun 16, 2025
take
CLN: Use dedup_names for column name mangling in Python parser (panda…