I have checked that this issue has not already been reported.
I have confirmed this issue exists on the latest version of pandas.
I have confirmed this issue exists on the main branch of pandas.
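Currently, the columns variable is a list of hashable elements returned by _filter_usecols. The dictionary comprehension at pandas/pandas/io/parsers/c_parser_wrapper.py#L262 checks membership against that list once per key in col_dict, and each check is a linear scan of the list: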
col_dict = {k: v for k, v in col_dict.items() if k in columns}
Convert columns to a set once, before performing the membership checks, reducing each lookup to O(1) on average:
columns_set = set(columns)  # Convert once
col_dict = {k: v for k, v in col_dict.items() if k in columns_set}
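This avoids repeated list traversal when filtering columns: with m keys in col_dict and n entries in columns, the worst-case cost of the filter drops from O(m * n) comparisons to O(m + n) on average.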
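The difference can be checked outside pandas with a small standalone sketch. The helper names filter_with_list and filter_with_set and the synthetic data are hypothetical stand-ins, not pandas code; the sketch only assumes, as described above, that columns is a list of hashable elements. Timings depend on the machine, so none are quoted here.

```python
import timeit


def filter_with_list(col_dict, columns):
    # Each `k in columns` scans the list, so the cost grows with len(columns).
    return {k: v for k, v in col_dict.items() if k in columns}


def filter_with_set(col_dict, columns):
    # Build the set once; each lookup is then O(1) on average.
    columns_set = set(columns)
    return {k: v for k, v in col_dict.items() if k in columns_set}


# Hypothetical stand-in data: 2,000 parsed columns, every other one kept.
n = 2_000
col_dict = {i: [i] for i in range(n)}
columns = list(range(0, n, 2))

# Both versions keep the same keys, in col_dict's iteration order.
list_result = filter_with_list(col_dict, columns)
set_result = filter_with_set(col_dict, columns)
assert list(list_result.items()) == list(set_result.items())

t_list = timeit.timeit(lambda: filter_with_list(col_dict, columns), number=5)
t_set = timeit.timeit(lambda: filter_with_set(col_dict, columns), number=5)
print(f"list membership: {t_list:.4f}s, set membership: {t_set:.4f}s")
```

Because the comprehension still iterates over col_dict, the order of the resulting dictionary does not depend on whether columns is a list or a set. The gain is likely to be noticeable only when many columns are selected; for a handful of columns, building the set may cost about as much as it saves.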
Expected Benefits
Installed Versions
INSTALLED VERSIONS
commit : 0691c5c
python : 3.10.8
python-bits : 64
OS : Linux
OS-release : 6.5.0-1025-azure
Version : #26~22.04.1-Ubuntu SMP Thu Jul 11 22:33:04 UTC 2024
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.2.3
numpy : 1.26.4
pytz : 2025.1
dateutil : 2.9.0.post0
pip : 25.0.1
Cython : 3.0.12
sphinx : 8.1.3
IPython : 8.33.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.13.3
blosc : None
bottleneck : 1.4.2
dataframe-api-compat : None
fastparquet : 2024.11.0
fsspec : 2025.2.0
html5lib : 1.1
hypothesis : 6.127.5
gcsfs : 2025.2.0
jinja2 : 3.1.5
lxml.etree : 5.3.1
matplotlib : 3.10.1
numba : 0.61.0
numexpr : 2.10.2
odfpy : None
openpyxl : 3.1.5
pandas_gbq : None
psycopg2 : 2.9.10
pymysql : 1.4.6
pyarrow : 19.0.1
pyreadstat : 1.2.8
pytest : 8.3.5
python-calamine : None
pyxlsb : 1.0.10
s3fs : 2025.2.0
scipy : 1.15.2
sqlalchemy : 2.0.38
tables : 3.10.1
tabulate : 0.9.0
xarray : 2024.9.0
xlrd : 2.0.1
xlsxwriter : 3.2.2
zstandard : 0.23.0
tzdata : 2025.1
qtpy : None
pyqt5 : None
Prior Performance
No response