fix for 61123 read_excel nrows param reads extra rows #61129

zanuka · 2025-03-15T05:32:09Z

Issue: GH-61123
When reading Excel files with pd.read_excel and specifying nrows=4, the behavior differs depending on whether there’s a blank row between tables. For a file with two tables (each with a header and 3 data rows), nrows=4 should yield a DataFrame with one header and 3 data rows (shape (3, n)). However:

In test1.xlsx (with a blank row), it correctly reads the first table (header + 3 rows).
In test2.xlsx (no blank row), it incorrectly includes the second table’s header as a data row, resulting in a shape of (4, n).

This inconsistency occurs because read_excel doesn’t properly respect table boundaries when tables are adjacent, despite the nrows limit.
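A minimal reproduction sketch of the two layouts described above, assuming pandas and openpyxl are installed. The column names, cell values, and the helpers `write_rows` and `reproduce` are illustrative and do not come from the issue's attachments. The script only prints the shapes so the two cases can be compared on whatever pandas version is installed.

```python
import tempfile
from pathlib import Path

import pandas as pd
from openpyxl import Workbook

# Hypothetical sample data: two tables, each with a header and 3 data rows.
HEADER_1, HEADER_2 = ["a", "b"], ["c", "d"]
TABLE_1 = [[1, 2], [3, 4], [5, 6]]
TABLE_2 = [[7, 8], [9, 10], [11, 12]]


def write_rows(path, rows):
    """Write rows to the first sheet; an empty list leaves that row blank."""
    wb = Workbook()
    ws = wb.active
    for r, row in enumerate(rows, start=1):
        for c, value in enumerate(row, start=1):
            ws.cell(row=r, column=c, value=value)
    wb.save(path)


def reproduce(nrows=4):
    """Return the shapes read from the blank-row and adjacent layouts."""
    with tempfile.TemporaryDirectory() as tmp:
        with_blank = Path(tmp) / "test1.xlsx"
        adjacent = Path(tmp) / "test2.xlsx"
        write_rows(with_blank, [HEADER_1, *TABLE_1, [], HEADER_2, *TABLE_2])
        write_rows(adjacent, [HEADER_1, *TABLE_1, HEADER_2, *TABLE_2])
        df1 = pd.read_excel(with_blank, nrows=nrows, engine="openpyxl")
        df2 = pd.read_excel(adjacent, nrows=nrows, engine="openpyxl")
        return df1.shape, df2.shape


if __name__ == "__main__":
    shape1, shape2 = reproduce()
    print("test1.xlsx (blank row):", shape1)
    print("test2.xlsx (adjacent): ", shape2)
```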

Fix:

Modified pandas/io/excel/_base.py and related reader modules (_openpyxl.py, _pyxlsb.py, _xlrd.py) to ensure nrows limits reading to the specified number of rows, excluding subsequent table headers even when tables are adjacent.
Added a new test test_excel_read_tables_with_and_without_blank_row in pandas/tests/io/excel/test_readers.py to verify that nrows=4 consistently returns a DataFrame with shape (3, 2) (header + 3 data rows) for both cases.

Changes:

Updated Excel reader logic to stop at nrows without parsing beyond table boundaries.
Ensured consistent behavior across openpyxl, pyxlsb, and xlrd engines.
Squashed commits into a single commit for clarity.
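The behavior this description asks for can be modeled in plain Python, independent of any Excel engine. The sketch below is not the code from this PR. `take_table` and `is_blank` are hypothetical helpers, and the sketch assumes, as the description does, that nrows=4 counts the header row as one of the four rows.

```python
from itertools import islice


def is_blank(row):
    """A row is blank if it has no cells or only empty cells."""
    return all(cell is None or cell == "" for cell in row)


def take_table(rows, nrows):
    """Consume at most nrows rows (header included); return (header, data)."""
    window = list(islice(rows, nrows))
    if not window:
        return [], []
    header, *data = window
    # Blank rows inside the window carry no data, so drop them.
    return header, [row for row in data if not is_blank(row)]


# Illustrative sheets: the same two tables, with and without a blank row.
with_blank = [["a", "b"], [1, 2], [3, 4], [5, 6], [], ["c", "d"], [7, 8]]
adjacent = [["a", "b"], [1, 2], [3, 4], [5, 6], ["c", "d"], [7, 8]]

header1, data1 = take_table(iter(with_blank), 4)
header2, data2 = take_table(iter(adjacent), 4)
```

Under this model the reader never looks past the fourth row, so the second table's header cannot leak into the result whether or not a blank row separates the tables.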

Verification:

Tested with test1.xlsx (blank row) and test2.xlsx (no blank row).
Confirmed both now yield a DataFrame with shape (3, 2) and only the first table’s data.

Steps to Test:

Run pytest pandas/tests/io/excel/test_readers.py::TestReaders::test_excel_read_tables_with_and_without_blank_row.
Verify df1.shape == (3, 2) and df2.shape == (3, 2) match the expected output.

Related Files:

pandas/io/excel/_base.py
pandas/io/excel/_openpyxl.py
pandas/io/excel/_pyxlsb.py
pandas/io/excel/_xlrd.py
pandas/tests/io/excel/test_readers.py

Closes #61123

mroeschke

Did you use an LLM to largely solve this issue? (I see a commit from Jolt AI.) At this time, the project does not want contributions that are largely AI generated.

zanuka · 2025-03-17T20:13:02Z

yeah, was running a test on behalf of Jolt AI to see how their system would handle these types of issues. am fine to close the PR.

fix for 61123 read_excel nrows param reads extra rows

94fcb02

zanuka requested a review from rhshadrach as a code owner March 15, 2025 05:32

test fixups

476a24d

zanuka force-pushed the fix/61123-read_excel-nrows-param-reads-extra-rows branch from 3d54264 to 476a24d March 16, 2025 07:59

zanuka added 2 commits March 16, 2025 01:51

test updates

68cabec

test updates

1aacb98

mroeschke requested changes Mar 17, 2025

zanuka closed this Mar 17, 2025
