Skip to content
This repository has been archived by the owner on Nov 22, 2022. It is now read-only.

Fix 2-argument math functions #139

Closed
wants to merge 3 commits into from
Closed

Conversation

harpaj
Copy link
Contributor

@harpaj harpaj commented Jun 1, 2019

Fixes the binary math functions:

  • atan2 and hypot take two arguments, not one
  • pow supports taking a literal numeric value as its second argument in addition to a Column.

third_party/3/pyspark/sql/functions.pyi Show resolved Hide resolved
third_party/3/pyspark/sql/functions.pyi Show resolved Hide resolved
third_party/3/pyspark/sql/functions.pyi Outdated Show resolved Hide resolved
@zero323
Copy link
Owner

zero323 commented Jun 30, 2019

Probably the best approach is to define the second argument as Union[Column, typing.SupportsFloat]

@harpaj
Copy link
Contributor Author

harpaj commented Jul 9, 2019

Done! As seen in the code you linked above, this applies to both the first and the second argument though.

@zero323
Copy link
Owner

zero323 commented Jul 9, 2019

Done!

Thanks!

As seen in the code you linked above, this applies to both the first and the second argument though.

That's actually only partially true. While Python backend will accept (SupportsFloat, SupportsFloat) just fine, it will fail once call hits Py4j as JVM backend doesn't provide (Double, Double) => Column variant.

>>> from pyspark.sql.functions import atan2                                                                                                                                                                        
>>> atan2(3, 2)
Traceback (most recent call last):
...
Py4JError: An error occurred while calling z:org.apache.spark.sql.functions.atan2. Trace:
py4j.Py4JException: Method atan2([class java.lang.Double, class java.lang.Double]) does not exist
...

I guess the question is what we really want to support here... I guess these would be great:

  • (Union[Column, str], Union[Column, str]) -> Column
  • (SupportsFloat, Union[Column, str]) -> Column
  • (Union[Column, str], SupportsFloat) -> Column

but I suspect (?) we might hit some unsafe overlaps at some point, as str <: SupportsFloat

@zero323
Copy link
Owner

zero323 commented Aug 20, 2019

Hi @harpaj. Do you plan to continue working on this?

@harpaj
Copy link
Contributor Author

harpaj commented Aug 20, 2019

Hi, sorry about that, I was busy at the time you wrote and then forgot about it.
I made the changes you requested. Somehow the git history looks very weird locally for me - are you fine with just squashing on merge or should I try to clean it up?

@zero323
Copy link
Owner

zero323 commented Aug 21, 2019

Hi, sorry about that, I was busy at the time you wrote and then forgot about it.

No worries. I just wanted to know if I should take over the issue.

I made the changes you requested. Somehow the git history looks very weird locally for me - are you fine with just squashing on merge or should I try to clean it up?

Sure, that's not a big deal however, it looks like MyPy is unhappy with such formulation after all.

Traceback (most recent call last):
  File "/path/to/bin/mypy", line 10, in <module>
    sys.exit(console_entry())
....
RecursionError: maximum recursion depth exceeded while calling a Python object
third_party/3/pyspark/sql/functions.pyi:162: : note: use --pdb to drop into pdb

Interestingly enough this doesn't occur in isolation, i.e. such module

from typing import SupportsFloat, overload, Union                             
                                                                             
class Column: ...
                       
ColumnOrStr = Union[str, Column]
                                                                                
@overload                                                                       
def f(col1: ColumnOrStr, col2: ColumnOrStr) -> Column: ...
@overload
def f(col1: ColumnOrStr, col2: SupportsFloat) -> Column: ...
@overload
def f(col1: SupportsFloat, col2: ColumnOrStr) -> Column: ...

type checks just fine.

So it seems we're hitting some MyPy bug here, but I cannot say if it should fail, in a controlled way, in both cases, or rather pass both. I'll try to investigate this further, when I have some time to spare.

Though I guess it would be nice to wrap it up for now, especially because we're dealing with incorrect annotation.

I guess we can drop the protocol for now, i.e.

@overload
def hypot(col1: ColumnOrName, col2: ColumnOrName) -> Column: ...
@overload
def hypot(col1: float, col2: ColumnOrName) -> Column: ...
@overload
def hypot(col1: ColumnOrName, col2: float) -> Column: ...

It is a bit more restrictive than it suppose to, but it is almost as expressive as the other one. What do you think?

@harpaj
Copy link
Contributor Author

harpaj commented Aug 21, 2019

What would you think of instead keeping the slightly more permission version of eeb854b?
I would tend to say that its better to be too permissive than too restrictive for type annotations.

@zero323
Copy link
Owner

zero323 commented Aug 21, 2019

To be honest I am not convinced. When in doubt I have strong preference towards false positive (type checker error, when the code is valid) over false negative (type checker pass, when code is invalid).

The rationale is simple here:

  • If the code doesn't type check, user can always double check, and if necessary mark problematic code with type: ignore. This relatively cheap operation (nothing is executed yet), and can be applied even if type check is hard requirement (let's say in CI pipeline).
  • If the code fails on runtime (false negative) then you might have already pay a lot of money, just to detect a simple mistake.

On a side I personally consider annotations as tool to hint what is intended (contract, over details of implementation) and what is good practice ("one-- and preferably only one --obvious way to do it").

Furthermore, due to Python's dynamism, it is often impossible to provide exhaustive annotations, so some false positives are simply to be expected.

zero323 added a commit that referenced this pull request Aug 24, 2019
Merge changes from #139 and patch to avoid mypy RecursionError
@zero323
Copy link
Owner

zero323 commented Aug 24, 2019

I am closing this as superseded by #184 (combined history of this PR is included as fcf3a25 and ported to branch-2.3 and branch-2.4). Thanks for you contribution @harpaj!

@zero323 zero323 closed this Aug 24, 2019
@harpaj
Copy link
Contributor Author

harpaj commented Aug 26, 2019

Sure, thanks a lot for all the time you put into this project!

@harpaj harpaj deleted the patch-1 branch August 26, 2019 07:53
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants