This repository has been archived by the owner on Nov 22, 2022. It is now read-only.

Jedi doesn't work with MLReaders #352

Closed
zero323 opened this issue Jan 23, 2020 · 7 comments

Comments

@zero323
Owner

zero323 commented Jan 23, 2020

It seems like there is some problem with Jedi compatibility, though some components work pretty well. For example, DataFrame without stubs:

In [1]: import jedi

In [2]: from pyspark.sql import SparkSession

In [3]: jedi.Interpreter("SparkSession.builder.getOrCreate().createDataFrame([]).", [globals()]).completions()
---------------------------------------------------------------------------
AttributeError
...
AttributeError: 'ModuleContext' object has no attribute 'py__path__'

and with stubs:

In [1]: from pyspark.sql import SparkSession

In [2]: import jedi

In [3]: jedi.Interpreter("SparkSession.builder.getOrCreate().createDataFrame([]).", [globals()]).completions()
Out[3]: 
[<Completion: agg>,
 <Completion: alias>,
 <Completion: approxQuantile>,
 <Completion: cache>,
 <Completion: checkpoint>,
 <Completion: coalesce>,
 <Completion: collect>,
 <Completion: colRegex>,
 <Completion: columns>,
 <Completion: corr>,
 <Completion: count>,
 <Completion: cov>,
...
 <Completion: __str__>]

So far so good. However, if we take, for example, LinearRegressionModel.load, things don't work so well. Without stubs, Jedi provides no suggestions:

In [1]: import jedi

In [2]: from pyspark.ml.regression import LinearRegressionModel

In [3]: jedi.Interpreter("LinearRegressionModel.load('foo').", [globals()]).completions()
Out[3]: []

but the ones provided with stubs:

In [1]: import jedi

In [2]: from pyspark.ml.regression import LinearRegressionModel

In [3]: jedi.Interpreter("LinearRegressionModel.load('foo').", [globals()]).completions()
Out[3]: 
[<Completion: load>,
 <Completion: read>,
 <Completion: __annotations__>,
 <Completion: __class__>,
 <Completion: __delattr__>,
 <Completion: __dict__>,
 <Completion: __dir__>,
 <Completion: __doc__>,
 <Completion: __eq__>,
 <Completion: __format__>,
 <Completion: __getattribute__>,
 <Completion: __hash__>,
 <Completion: __init__>,
 <Completion: __init_subclass__>,
 <Completion: __module__>,
 <Completion: __ne__>,
 <Completion: __new__>,
 <Completion: __reduce__>,
 <Completion: __reduce_ex__>,
 <Completion: __repr__>,
 <Completion: __setattr__>,
 <Completion: __sizeof__>,
 <Completion: __slots__>,
 ...]

don't make much sense. If the model is fitted:

In [4]: from pyspark.ml.regression import LinearRegression

In [5]: jedi.Interpreter("LinearRegression().fit(...).", [globals()]).completions()
Out[5]: 
[<Completion: aggregationDepth>,
 <Completion: append>,
 <Completion: clear>,
 <Completion: coefficients>,
 <Completion: copy>,
 <Completion: count>,
...
 <Completion: __str__>]

A model that is explicitly annotated works fine, so it seems like something in MLReader or one of its subclasses causes the failure.

We already have data tests for this (as well as some test cases from apache/spark examples), and mypy seems to be fine with it.

Since LinearRegression.fit works fine (and some toy tests confirm that), Generics alone are not sufficient to reproduce the problem. So it seems the type parameter is not processed correctly along the inheritance path.

Tested with:

  • jedi==0.15.2 and jedi==0.16.0 (0c56aa4).
  • pyspark-stubs==3.0.0.dev5
  • pyspark==3.0.0.dev0 (afe70b3)
@zero323
Owner Author

zero323 commented Jan 23, 2020

CC @davidhalter. I wonder if you could take a look at this and let me know if you see any obvious culprit. I cannot say whether it's an annotation issue missed by mypy, or a problem with Jedi.

@davidhalter

@zero323 If the first example (AttributeError) really happens with latest master, please report it, that's definitely a bug.

I'm not really sure I understand the other stuff. Please explain precisely what completions you would want at a certain point and what is wrong. I have never worked with the pyspark API.

@zero323
Owner Author

zero323 commented Jan 23, 2020

If the first example (AttributeError) really happens with latest master, please report it, that's definitely a bug.

Here you are: davidhalter/jedi#1479. It should be fully reproducible with master HEAD.

@zero323
Owner Author

zero323 commented Jan 23, 2020

I'm not really sure I understand the other stuff. Please explain precisely what completions you would want at a certain point and what is wrong. I have never worked with the pyspark API.

Thank you. In general, the structure goes through multiple levels of inheritance, where a Reader object:

class Reader(Generic[T]):
    @classmethod
    def load(cls, path: str) -> T: ...

is mixed into different classes:

class Model(Reader["Model"]):
    def transform(self, data): ...

So I'd expect that Model.load("path") would be able to complete transform. However, it seems Jedi recognizes Model.load("path") as a Reader. Unfortunately, I have been unable to isolate this behavior into a minimal example so far. I'll work on that...
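A self-contained version of the hierarchy sketched above looks roughly like the following (names are illustrative stand-ins, not the actual pyspark classes; the `load` body is a stub so the file runs):

```python
from typing import Generic, TypeVar

T = TypeVar("T")

class Reader(Generic[T]):
    """Mixin that loads a persisted instance; T is the concrete class."""

    @classmethod
    def load(cls, path: str) -> T:
        # A real implementation would deserialize from `path`;
        # returning cls() keeps this sketch runnable.
        return cls()  # type: ignore[return-value]

class Model(Reader["Model"]):
    def transform(self, data):
        return data

# At runtime load() returns a Model, so `transform` should be completable:
model = Model.load("path")
```

At runtime this works as expected; the question is whether static tools follow `T` through the classmethod.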

Thanks again for your time. Much obliged.

@davidhalter

davidhalter commented Jan 23, 2020

Hmmm, if I had to guess, this might have to do with the classmethod. It doesn't have to be that, but the classmethod might cause trouble, because I'm pretty sure that case is not tested.

If you find a simple reproduction case (you can probably do that in one simple file with classmethods), that would be awesome!
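For context, a common alternative way to express "load returns the caller's class" binds the TypeVar to `cls` instead of parametrizing the base class. This is an illustration, not a pattern from the thread; `Loadable` and `Estimator` are hypothetical names:

```python
from typing import Type, TypeVar

T = TypeVar("T", bound="Loadable")

class Loadable:
    @classmethod
    def load(cls: Type[T], path: str) -> T:
        # Stand-in for real deserialization from `path`.
        return cls()

class Estimator(Loadable):
    def fit(self, data):
        return data

# The annotation on `cls` lets a checker infer Estimator here:
est = Estimator.load("path")
```

Because the return type is tied directly to `cls`, this variant avoids the Generic mixin entirely, which can sidestep inference trouble in tools that mishandle a class-level type parameter inside a classmethod.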

@zero323
Owner Author

zero323 commented Jan 24, 2020

With your guidance I've been able to reduce this problem to davidhalter/jedi#1480. The behavior is slightly different, but it looks similar enough to suspect a common root cause.

Thanks again @davidhalter!

@zero323
Owner Author

zero323 commented Jan 25, 2020

Resolved in Jedi with davidhalter/jedi@da2a55c

@zero323 zero323 closed this as completed Jan 25, 2020