perf: improve membership check performance in column filtering #61046

Open
wants to merge 2 commits into
base: main
from

Conversation


@allrob23 allrob23 commented Mar 4, 2025

@allrob23
Author

allrob23 commented Mar 4, 2025

pre-commit.ci autofix

@mroeschke added the Performance (memory or execution speed performance) and IO CSV (read_csv, to_csv) labels Mar 4, 2025
@mroeschke
Member

Do you have an example benchmark where this PR improves the performance of read_csv?

@allrob23
Author

allrob23 commented Mar 4, 2025

This optimization was flagged by a tool I’m developing, which performs code inspection to identify potential performance improvements. However, it doesn’t measure execution times, so I haven’t benchmarked the actual impact yet.

From a theoretical perspective, the change makes sense: the previous implementation performed each membership lookup against a list, which is O(n) in the length of the list, while the new approach uses a set, where a lookup is O(1) on average.
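To make the complexity argument concrete, here is a minimal sketch of the two lookup strategies timed with the standard-library `timeit` module. The function names and the column names are hypothetical and are not taken from the pandas code touched by this PR; it only illustrates list versus set membership in isolation, not the end-to-end effect on `read_csv`.

```python
import timeit


def filter_with_list(columns, wanted):
    # `wanted` is a list: every `in` check scans it linearly, O(len(wanted)).
    return [c for c in columns if c in wanted]


def filter_with_set(columns, wanted):
    # Build the set once; every `in` check is then an average O(1) hash lookup.
    wanted_set = set(wanted)
    return [c for c in columns if c in wanted_set]


# Hypothetical column names, chosen only for illustration.
columns = [f"col_{i}" for i in range(2_000)]
wanted = [f"col_{i}" for i in range(0, 2_000, 2)]

# Both variants must select the same columns in the same order.
assert filter_with_list(columns, wanted) == filter_with_set(columns, wanted)

list_time = timeit.timeit(lambda: filter_with_list(columns, wanted), number=5)
set_time = timeit.timeit(lambda: filter_with_set(columns, wanted), number=5)
print(f"list lookup: {list_time:.4f}s  set lookup: {set_time:.4f}s")
```

The gap between the two timings should grow with the number of wanted columns; for a handful of columns the cost of building the set may outweigh the savings, which is why a measurement on realistic inputs still matters.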

Would you be able to help me create a proper benchmark for this case?

Development

Successfully merging this pull request may close these issues.

PERF: Optimize membership check in column filtering for better performance
2 participants