Fix 2-argument math functions #139

harpaj · 2019-06-01T08:36:25Z

Fixes the binary math functions:

atan2 and hypot take two arguments, not one
pow supports taking a literal numeric value as its second argument in addition to a Column.

third_party/3/pyspark/sql/functions.pyi

zero323 · 2019-06-30T14:58:12Z

Probably the best approach is to define the second argument as Union[Column, typing.SupportsFloat]

harpaj · 2019-07-09T08:51:43Z

Done! As seen in the code you linked above, this applies to both the first and the second argument though.

zero323 · 2019-07-09T13:54:40Z

> Done!

Thanks!

> As seen in the code you linked above, this applies to both the first and the second argument though.

That's actually only partially true. While the Python backend will accept (SupportsFloat, SupportsFloat) just fine, the call will fail once it hits Py4J, because the JVM backend doesn't provide a (Double, Double) => Column variant.

>>> from pyspark.sql.functions import atan2
>>> atan2(3, 2)
Traceback (most recent call last):
...
Py4JError: An error occurred while calling z:org.apache.spark.sql.functions.atan2. Trace:
py4j.Py4JException: Method atan2([class java.lang.Double, class java.lang.Double]) does not exist
...

I guess the question is what we really want to support here... I guess these would be great:

(Union[Column, str], Union[Column, str]) -> Column
(SupportsFloat, Union[Column, str]) -> Column
(Union[Column, str], SupportsFloat) -> Column

but I suspect (?) we might hit some unsafe overlaps at some point, as str <: SupportsFloat
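
As a quick sanity check of the overlap concern, here is a minimal sketch that only looks at runtime behaviour. It assumes Python 3.8+, where typing.SupportsFloat is runtime-checkable; is_float_like is a hypothetical helper, not part of PySpark or the stubs. What a static checker concludes depends on its own stubs, so treat the result as a hint rather than an answer.

```python
from typing import SupportsFloat


def is_float_like(value: object) -> bool:
    """Return True if the value structurally satisfies SupportsFloat at runtime."""
    # The runtime protocol check only looks for a __float__ method.
    return isinstance(value, SupportsFloat)


# Probe one literal of each kind discussed in the thread.
checks = {type(v).__name__: is_float_like(v) for v in (3, 2.5, "2.5")}
print(checks)  # {'int': True, 'float': True, 'str': False}
```

At least at runtime on CPython, str does not define __float__, so a plain string does not satisfy the protocol here.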

zero323 · 2019-08-20T13:25:34Z

Hi @harpaj. Do you plan to continue working on this?

harpaj · 2019-08-20T13:51:29Z

Hi, sorry about that, I was busy at the time you wrote and then forgot about it.
I made the changes you requested. Somehow the git history looks very weird locally for me - are you fine with just squashing on merge or should I try to clean it up?

zero323 · 2019-08-21T08:46:35Z

> Hi, sorry about that, I was busy at the time you wrote and then forgot about it.

No worries. I just wanted to know if I should take over the issue.

> I made the changes you requested. Somehow the git history looks very weird locally for me - are you fine with just squashing on merge or should I try to clean it up?

Sure, that's not a big deal. However, it looks like mypy is unhappy with this formulation after all.

Traceback (most recent call last):
  File "/path/to/bin/mypy", line 10, in <module>
    sys.exit(console_entry())
....
RecursionError: maximum recursion depth exceeded while calling a Python object
third_party/3/pyspark/sql/functions.pyi:162: : note: use --pdb to drop into pdb

Interestingly enough, this doesn't occur in isolation, i.e. a module like this

from typing import SupportsFloat, overload, Union

class Column: ...

ColumnOrStr = Union[str, Column]

@overload
def f(col1: ColumnOrStr, col2: ColumnOrStr) -> Column: ...
@overload
def f(col1: ColumnOrStr, col2: SupportsFloat) -> Column: ...
@overload
def f(col1: SupportsFloat, col2: ColumnOrStr) -> Column: ...

type checks just fine.

So it seems we're hitting a mypy bug here, but I cannot say whether it should fail in a controlled way in both cases or pass in both. I'll try to investigate this further when I have some time to spare.

Though I guess it would be nice to wrap it up for now, especially because we're dealing with an incorrect annotation.

I guess we can drop the protocol for now, i.e.

@overload
def hypot(col1: ColumnOrName, col2: ColumnOrName) -> Column: ...
@overload
def hypot(col1: float, col2: ColumnOrName) -> Column: ...
@overload
def hypot(col1: ColumnOrName, col2: float) -> Column: ...

It is a bit more restrictive than it is supposed to be, but it is almost as expressive as the other one. What do you think?
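
To make the three accepted call shapes concrete, here is a minimal, self-contained sketch. The Column class and the runtime body of hypot are hypothetical stand-ins written for illustration only; they are not the PySpark implementation. The body simply mimics the behaviour described earlier in the thread, where a call with two plain numbers has no matching backend variant.

```python
from typing import Union, overload


class Column:
    """Hypothetical stand-in for pyspark.sql.Column (illustration only)."""

    def __init__(self, expr: str) -> None:
        self.expr = expr


ColumnOrName = Union[Column, str]


@overload
def hypot(col1: ColumnOrName, col2: ColumnOrName) -> Column: ...
@overload
def hypot(col1: float, col2: ColumnOrName) -> Column: ...
@overload
def hypot(col1: ColumnOrName, col2: float) -> Column: ...


def hypot(col1, col2):
    # Toy runtime body: reject the (number, number) shape that none of the
    # overloads above declares, and build a textual expression otherwise.
    if not isinstance(col1, (Column, str)) and not isinstance(col2, (Column, str)):
        raise TypeError("at least one argument must be a Column or a column name")

    def render(arg):
        return arg.expr if isinstance(arg, Column) else str(arg)

    return Column(f"hypot({render(col1)}, {render(col2)})")


print(hypot("a", 2.0).expr)          # hypot(a, 2.0)
print(hypot(Column("x"), "y").expr)  # hypot(x, y)
```

With these overloads a type checker should flag hypot(3.0, 4.0), which matches the shape that fails at runtime.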

harpaj · 2019-08-21T11:09:52Z

What would you think of instead keeping the slightly more permissive version of eeb854b?
I would tend to say that it's better to be too permissive than too restrictive for type annotations.

zero323 · 2019-08-21T13:11:42Z

To be honest, I am not convinced. When in doubt, I have a strong preference for false positives (a type checker error when the code is valid) over false negatives (a type checker pass when the code is invalid).

The rationale is simple here:

If the code doesn't type check, the user can always double check and, if necessary, mark the problematic code with type: ignore. This is a relatively cheap operation (nothing is executed yet) and can be applied even if type checking is a hard requirement (let's say in a CI pipeline).
If the code fails at runtime (false negative), then you might have already paid a lot of money just to detect a simple mistake.
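
A minimal sketch of the first case, assuming a deliberately strict annotation: hypot_strict is a hypothetical function, not part of PySpark. The call is valid at runtime but would be reported by a type checker, and the comment silences that single line.

```python
import math
from fractions import Fraction


def hypot_strict(col1: float, col2: float) -> float:
    """Hypothetical helper with a narrower annotation than the runtime needs."""
    return math.hypot(col1, col2)


# Fraction is not a float for the type checker, but math.hypot converts it
# through __float__, so the call works. The ignore comment marks the false positive.
value = hypot_strict(Fraction(3), Fraction(4))  # type: ignore
print(value)  # 5.0
```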

As an aside, I personally consider annotations a tool to hint at what is intended (the contract, over details of implementation) and what is good practice ("one-- and preferably only one --obvious way to do it").

Furthermore, due to Python's dynamism, it is often impossible to provide exhaustive annotations, so some false positives are simply to be expected.

Merge changes from #139 and patch to avoid mypy RecursionError

zero323 · 2019-08-24T19:02:42Z

I am closing this as superseded by #184 (the combined history of this PR is included as fcf3a25 and has been ported to branch-2.3 and branch-2.4). Thanks for your contribution @harpaj!

harpaj · 2019-08-26T07:53:14Z

Sure, thanks a lot for all the time you put into this project!

Fix 2-argument math functions

eee964a

zero323 suggested changes Jun 30, 2019


Fix two-argument math functions

eeb854b

zero323 added 2.3 2.4 3.0 bug labels Aug 4, 2019

Merge branch 'master' into patch-1

d9dd735

harpaj force-pushed the patch-1 branch from 7dbf215 to d9dd735 Compare August 20, 2019 13:48

zero323 mentioned this pull request Aug 24, 2019

Merge changes from #139 and patch to avoid mypy RecursionError #184

Merged

zero323 added a commit that referenced this pull request Aug 24, 2019

Merge pull request #184 from zero323/harpaj-binary-functions-patch

feb0c61

Merge changes from #139 and patch to avoid mypy RecursionError

zero323 closed this Aug 24, 2019

harpaj deleted the patch-1 branch August 26, 2019 07:53
