[SPARK-40353][PS][CONNECT] Fix index nullable mismatch in ps.read_excel #50323

Closed
wants to merge 2 commits into from

zhengruifeng
Contributor

What changes were proposed in this pull request?

Fix nullable mismatch in ps.read_excel

Why are the changes needed?

To re-enable the tests.

Does this PR introduce any user-facing change?

no

How was this patch tested?

Updated unit tests.

Was this patch authored or co-authored using generative AI tooling?

no

@zhengruifeng zhengruifeng changed the title [SPARK-40353][PYTHON] Fix nullable mismatch in ps.read_excel [SPARK-40353][PS] Fix nullable mismatch in ps.read_excel Mar 19, 2025
@@ -266,33 +264,34 @@ def test_read_excel(self):
pd.read_excel(open(path1, "rb"), index_col=0),
)
self.assert_eq(
ps.read_excel(open(path1, "rb"), index_col=0, squeeze=True),
Contributor Author

`squeeze` was removed in pandas 2.0, so we need to remove it from this test.
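
For reference, the pandas deprecation note for `squeeze` suggested calling `.squeeze("columns")` on the result instead. A minimal sketch of that behaviour, using an in-memory frame rather than an Excel file (the column and index names here are made up for illustration):

```python
import pandas as pd

# A single-column frame, similar in shape to what read_excel(..., index_col=0)
# returns for a sheet with one data column. Names are illustrative only.
pdf = pd.DataFrame(
    {"value": [1.0, 2.0, 3.0]},
    index=pd.Index(["a", "b", "c"], name="key"),
)

# Replacement for the removed squeeze=True argument: squeeze the column axis.
# A one-column DataFrame collapses to a Series named after that column.
pser = pdf.squeeze("columns")

# With more than one column there is nothing to squeeze, so the
# DataFrame comes back unchanged.
wide = pdf.assign(other=[10, 20, 30]).squeeze("columns")
```

Here `pser` is a `Series` that keeps the original index, while `wide` is still a `DataFrame`.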

@@ -331,30 +321,17 @@ def test_read_excel(self):
self.assert_eq(psdfs["Sheet_name_1"], pdfs1["Sheet_name_1"])
self.assert_eq(psdfs["Sheet_name_2"], pdfs1["Sheet_name_2"])

psdfs = ps.read_excel(tmp, sheet_name=sheet_name, index_col=0, squeeze=True)
Contributor Author

These tests no longer make sense, since `squeeze=True` is not allowed any more.

@@ -1186,9 +1188,29 @@ def read_excel_on_spark(
pdf = pdf_or_pser

psdf = cast(DataFrame, from_pandas(pdf))
return_schema = force_decimal_precision_scale(
as_nullable_spark_type(psdf._internal.spark_frame.drop(*HIDDEN_COLUMNS).schema)
Contributor Author

`as_nullable_spark_type` converts all fields (both index and data) to `nullable=true`, while `InternalFrame` asserts that the index fields have `nullable=false`.
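
To illustrate the mismatch, here is a rough, Spark-free model of it. `Field`, `as_nullable`, and `restore_index_nullability` are hypothetical stand-ins written for this sketch, not PySpark APIs, and the second helper shows only one possible way to reconcile the two expectations, not necessarily what this patch does:

```python
from dataclasses import dataclass, replace
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a Spark StructField (name and nullable flag only)."""

    name: str
    nullable: bool


def as_nullable(fields: Iterable[Field]) -> List[Field]:
    """Model of the blanket conversion: every field becomes nullable."""
    return [replace(f, nullable=True) for f in fields]


def restore_index_nullability(fields: Iterable[Field], index_names: Set[str]) -> List[Field]:
    """Assumed fix-up: mark index fields non-nullable again, leave data fields alone."""
    return [
        replace(f, nullable=False) if f.name in index_names else f
        for f in fields
    ]


# Illustrative schema: one index column and two data columns.
schema = [
    Field("__index_level_0__", nullable=False),
    Field("a", nullable=False),
    Field("b", nullable=True),
]

relaxed = as_nullable(schema)  # index is now nullable, which the index check rejects
fixed = restore_index_nullability(relaxed, {"__index_level_0__"})
```

In `relaxed` the index field is nullable like everything else; in `fixed` only the index field is back to `nullable=False`, and the data fields stay nullable.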

@zhengruifeng zhengruifeng changed the title [SPARK-40353][PS] Fix nullable mismatch in ps.read_excel [SPARK-40353][PS][CONNECT] Fix index nullable mismatch in ps.read_excel Mar 19, 2025
@HyukjinKwon
Member

Merged to master.

@zhengruifeng zhengruifeng deleted the ps_read_excel branch March 24, 2025 02:40
SauronShepherd pushed a commit to SauronShepherd/spark that referenced this pull request Mar 25, 2025
…cel`

### What changes were proposed in this pull request?
Fix nullable mismatch in `ps.read_excel`

### Why are the changes needed?
to re-enable the tests

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
updated ut

### Was this patch authored or co-authored using generative AI tooling?
no

Closes apache#50323 from zhengruifeng/ps_read_excel.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
kazemaksOG pushed a commit to kazemaksOG/spark-custom-scheduler that referenced this pull request Mar 27, 2025