
CLN: Use dedup_names in all instances where duplicate column names are renamed #50371

Open
@datapythonista

Description

@datapythonista
Member

In #50370 the function dedup_names was moved to pandas.io.common so it can be reused by any reader that has to deal with duplicate column names. The function may be extended in the future to support custom renaming patterns, so every reader should use it; this keeps the behavior consistent and avoids duplicated code. #50370 identified at least one place where a different implementation is used to rename duplicate columns. That code should call dedup_names instead, and any other alternative implementations should also be found and replaced with calls to dedup_names.
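As a rough illustration of the consolidation described above, here is a minimal pure-Python sketch of a single shared renaming helper that every reader could call, with a hook for the custom renaming patterns mentioned. The names dedup_column_names, default_pattern and pattern are hypothetical; this is not the pandas implementation of dedup_names, and it ignores MultiIndex (tuple) column names.

```python
from collections import defaultdict
from typing import Callable, List, Sequence


def default_pattern(name: str, n: int) -> str:
    # Hypothetical default scheme: the n-th repeat of "col" becomes "col.n".
    return f"{name}.{n}"


def dedup_column_names(
    names: Sequence[str],
    pattern: Callable[[str, int], str] = default_pattern,
) -> List[str]:
    """Hypothetical shared helper: make every column label unique.

    The first occurrence of a name is kept as-is; later repeats are
    renamed with pattern(name, n), increasing n until the new label
    does not clash with any label already produced.
    """
    seen = set()               # labels emitted so far
    counts = defaultdict(int)  # last suffix number tried per base name
    result = []
    for name in names:
        candidate = name
        while candidate in seen:
            counts[name] += 1
            candidate = pattern(name, counts[name])
        seen.add(candidate)
        result.append(candidate)
    return result


print(dedup_column_names(["a", "b", "a", "a"]))  # ['a', 'b', 'a.1', 'a.2']
print(dedup_column_names(["a", "a"], pattern=lambda name, n: f"{name}_{n}"))  # ['a', 'a_1']
```

Because every reader would route through one function, changing the default scheme, or adding a new one, would happen in a single place.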

Activity

muddi900 commented on Dec 24, 2022

@muddi900

take

deleted a comment from jayam30 on Feb 16, 2023

leftful commented on Mar 1, 2023

@leftful

Hi @muddi900, how are you going with this issue?
Happy to take over if you don't have the time :)

muddi900 commented on Mar 1, 2023

@muddi900

You can take over if the maintainers allow.

removed their assignment on Mar 1, 2023

leftful commented on Mar 2, 2023

@leftful

take

shteken commented on Apr 22, 2023

@shteken
Contributor

Hi @RhysJohnLewis, how are you going with this issue?
Happy to take over if you don't have the time :)

leftful commented on Apr 24, 2023

@leftful

@shteken please do. I have not had the time.

removed their assignment on Apr 24, 2023

12 remaining items

hamedgibago commented on Jul 1, 2023

@hamedgibago

Certainly, no problem. I commented out the code above and added a new line, as you can see in the first line. The current tests passed, but others failed, so I need to spend some time debugging.
@datapythonista please do not unassign me. Let us both work on the issue. Thank you.

rsm-23 commented on Jul 1, 2023

@rsm-23
Contributor

Thanks @hamedgibago, I'll try independently when I get some time :)

rsm-23 commented on Jul 1, 2023

@rsm-23
Contributor

@datapythonista the two implementations are definitely different. One approach names the columns [col, col.1, col.1.1], while the other names them [col, col.1, col.2]. I need your input: should we change all the tests, or change the implementation of dedup_names?
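To make the difference concrete, here is a hypothetical sketch of two renaming strategies that yield the two patterns above for three columns all named col. Neither function (dedup_counter, dedup_chained) is taken from the pandas code base; they only illustrate how a per-name counter and a "keep appending .1" rule diverge.

```python
from collections import defaultdict
from typing import List, Sequence


def dedup_counter(names: Sequence[str]) -> List[str]:
    # Hypothetical: one counter per base name, so repeats become col.1, col.2, ...
    seen = set()
    counts = defaultdict(int)
    result = []
    for name in names:
        candidate = name
        while candidate in seen:
            counts[name] += 1
            candidate = f"{name}.{counts[name]}"
        seen.add(candidate)
        result.append(candidate)
    return result


def dedup_chained(names: Sequence[str]) -> List[str]:
    # Hypothetical: append ".1" to the current candidate until it is unique,
    # so repeats become col.1, col.1.1, ...
    seen = set()
    result = []
    for name in names:
        candidate = name
        while candidate in seen:
            candidate = f"{candidate}.1"
        seen.add(candidate)
        result.append(candidate)
    return result


cols = ["col", "col", "col"]
print(dedup_counter(cols))  # ['col', 'col.1', 'col.2']
print(dedup_chained(cols))  # ['col', 'col.1', 'col.1.1']
```

Any test that hard-codes one of these outputs would fail once the other strategy is used, which is why the choice of implementation affects the existing tests.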

hamedgibago commented on Jul 2, 2023

@hamedgibago

As far as I know, we don't change existing tests unless we find a bug and report it to the maintainers. After changing the code, we can add new tests and make sure all the other tests still pass.
Good luck.

rsm-23 commented on Jul 2, 2023

@rsm-23
Contributor

@hamedgibago I think it really depends. Some existing tests check the output of the custom method rather than that of dedup_names, and, as I mentioned above, the two approaches handle de-duplication differently. So we need to either adjust the implementation of dedup_names or adjust the unit tests. Even if we adjust the output of dedup_names, there should be existing unit tests that validate this method's output, so changing its behavior would mean modifying those tests as well. Another approach would be to introduce a parameter that selects which algorithm dedup_names follows, but personally I am not a fan of this.

hamedgibago commented on Jul 2, 2023

@hamedgibago

@datapythonista, what do you think?

yoav-edelist commented on Dec 14, 2024

@yoav-edelist

@hamedgibago @datapythonista Is this still in the works? Is this free?

hamedgibago commented on Dec 21, 2024

@hamedgibago

It's been a long time since I worked on it. I'll have to check.

leftful commented on Dec 21, 2024

@leftful

LifeAsPixels commented on May 14, 2025

@LifeAsPixels

@hamedgibago @datapythonista Is this issue available to be taken? This would be my first issue to work on. It looks like the function already exists, but it still needs to be called in place of other code elsewhere that does the same thing, as noted in the 'TODO' from #50370.

Veneel77 commented on Jun 16, 2025

@Veneel77

take


Metadata


Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Participants

@hamedgibago @datapythonista @LifeAsPixels @shteken @muddi900


      CLN: Use dedup_names in all instances where duplicate column names are renamed · Issue #50371 · pandas-dev/pandas