Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix multimethod bug in pyspark #1260

Merged
merged 3 commits into from
Aug 11, 2023
Merged

Conversation

cosmicBboy
Copy link
Collaborator

No description provided.

@cosmicBboy
Copy link
Collaborator Author

hey @mfkapton hope you don't mind, made a PR from your fork, I'll test your changes out further

@codecov
Copy link

codecov bot commented Jul 14, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.08 🎉

Comparison is base (62bc484) 93.74% compared to head (78efc9c) 93.82%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1260      +/-   ##
==========================================
+ Coverage   93.74%   93.82%   +0.08%     
==========================================
  Files          90       90              
  Lines        6711     6704       -7     
==========================================
- Hits         6291     6290       -1     
+ Misses        420      414       -6     
Impacted Files Coverage Δ
pandera/api/pyspark/types.py 100.00% <100.00%> (ø)
pandera/backends/pyspark/checks.py 100.00% <100.00%> (ø)

... and 8 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@mfkaptan
Copy link

@cosmicBboy sorry for the late reply. Here's a way to replicate the bug:

import pyspark.sql.types as T
from pyspark.sql import DataFrame, SparkSession

import pandera.extensions as extensions
from pandera.pyspark import DataFrameModel


@extensions.register_check_method(
    supported_types=DataFrame,
)
def fake_check(df: DataFrame):
    return True

class ExampleDFModel(DataFrameModel):
    name: T.StringType()
    age: T.LongType()
    
    class Config:
        fake_check = ()

example_data_cols = ("name", "age")
example_data = [("foo", 42), ("bar", 24)]

# Get the pyspark session
session = SparkSession.builder.config().getOrCreate()
df = session.createDataFrame(example_data, example_data_cols)

out = ExampleDFModel(df, lazy=False)

raises an Exception:

SchemaError: Error while executing check function: DispatchError("No matching functions found")
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.9/site-packages/pandera/backends/pyspark/container.py", line 197, in run_checks
    self.run_check(check_obj, schema, check, check_index)
  File "/home/airflow/.local/lib/python3.9/site-packages/pandera/backends/pyspark/base.py", line 85, in run_check
    check_result = check(check_obj, *args)
  File "/home/airflow/.local/lib/python3.9/site-packages/pandera/api/checks.py", line 229, in __call__
    return backend(check_obj, column)
  File "/home/airflow/.local/lib/python3.9/site-packages/pandera/backends/pyspark/checks.py", line 107, in __call__
    check_obj = self.preprocess(check_obj, key)
  File "/home/airflow/.local/lib/python3.9/site-packages/multimethod/__init__.py", line 408, in __call__
    raise DispatchError("No matching functions found")
multimethod.DispatchError: No matching functions found

Here's the output of pip freeze (I'm on python3.9):

req.txt

@mfkaptan mfkaptan force-pushed the pyspark branch 2 times, most recently from 80542c5 to ada409e Compare July 17, 2023 10:33
@mfkaptan
Copy link

@cosmicBboy I've added a test that could help you understand the issue. I've changed the order of the commits so now the test is the first commit. You can check out the commit with the test that fails without the fix:

git co 8f854961f47ee4cc8e03625d6fa179cabff18805
pytest tests/pyspark/test_pyspark_model.py::test_registered_dataframemodel_checks

@cosmicBboy
Copy link
Collaborator Author

hi @mfkaptan can you follow the instructions here to run the linters for this project?

Signed-off-by: mfkaptan <mustafa.kaptan@motius.de>
Making `key` argument optional did not work because of an existing
bug in multimethod: coady/multimethod#90
Added a second overloaded `preprocess` method that doesn't have `key`
arg so multimethod lib can dispatch correctly.
Same with apply function.

Signed-off-by: mfkaptan <mustafa.kaptan@motius.de>
Signed-off-by: mfkaptan <mustafa.kaptan@motius.de>
@cosmicBboy
Copy link
Collaborator Author

looks like unit tests are failing

@mfkaptan
Copy link

looks like unit tests are failing

could you maybe rerun the tests? I feel like this looks irrelevant to my changes 🤔

Screenshot 2023-07-24 at 16 03 44

@mfkaptan
Copy link

@cosmicBboy the tests are passing again ✅

@cosmicBboy cosmicBboy merged commit ef4b5aa into unionai-oss:main Aug 11, 2023
41 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants