This repository has been archived by the owner on Nov 22, 2022. It is now read-only.

Jedi doesn't work with MLReaders #352

Closed
zero323 opened this issue Jan 23, 2020 · 7 comments

Comments

@zero323
Owner

zero323 commented Jan 23, 2020

It seems like there is some problem with Jedi compatibility, though some components work pretty well. For example, DataFrame without stubs:

In [1]: import jedi

In [2]: from pyspark.sql import SparkSession

In [3]: jedi.Interpreter("SparkSession.builder.getOrCreate().createDataFrame([]).", [globals()]).completions()
---------------------------------------------------------------------------
AttributeError
...
AttributeError: 'ModuleContext' object has no attribute 'py__path__'

and with stubs:

In [1]: from pyspark.sql import SparkSession

In [2]: import jedi

In [3]: jedi.Interpreter("SparkSession.builder.getOrCreate().createDataFrame([]).", [globals()]).completions()
Out[3]: 
[<Completion: agg>,
 <Completion: alias>,
 <Completion: approxQuantile>,
 <Completion: cache>,
 <Completion: checkpoint>,
 <Completion: coalesce>,
 <Completion: collect>,
 <Completion: colRegex>,
 <Completion: columns>,
 <Completion: corr>,
 <Completion: count>,
 <Completion: cov>,
...
 <Completion: __str__>]

So far so good. However, if we take, for example, LinearRegressionModel.load, things don't work so well. Without stubs, Jedi provides no suggestions:

In [1]: import jedi

In [2]: from pyspark.ml.regression import LinearRegressionModel

In [3]: jedi.Interpreter("LinearRegressionModel.load('foo').", [globals()]).completions()
Out[3]: []

but the ones provided with stubs:

In [1]: import jedi

In [2]: from pyspark.ml.regression import LinearRegressionModel

In [3]: jedi.Interpreter("LinearRegressionModel.load('foo').", [globals()]).completions()
Out[3]: 
[<Completion: load>,
 <Completion: read>,
 <Completion: __annotations__>,
 <Completion: __class__>,
 <Completion: __delattr__>,
 <Completion: __dict__>,
 <Completion: __dir__>,
 <Completion: __doc__>,
 <Completion: __eq__>,
 <Completion: __format__>,
 <Completion: __getattribute__>,
 <Completion: __hash__>,
 <Completion: __init__>,
 <Completion: __init_subclass__>,
 <Completion: __module__>,
 <Completion: __ne__>,
 <Completion: __new__>,
 <Completion: __reduce__>,
 <Completion: __reduce_ex__>,
 <Completion: __repr__>,
 <Completion: __setattr__>,
 <Completion: __sizeof__>,
 <Completion: __slots__>,
 ...]

don't make much sense. If the model is fitted:

In [4]: from pyspark.ml.regression import LinearRegression

In [5]: jedi.Interpreter("LinearRegression().fit(...).", [globals()]).completions()
Out[5]: 
[<Completion: aggregationDepth>,
 <Completion: append>,
 <Completion: clear>,
 <Completion: coefficients>,
 <Completion: copy>,
 <Completion: count>,
...
 <Completion: __str__>]

A model that is explicitly annotated works fine, so it seems like something in MLReader or one of its subclasses causes the failure.

We already have data tests for this (as well as some test cases from apache/spark examples), and mypy seems to be fine with it.

Since LinearRegression.fit works fine (and some toy tests confirm that), Generics alone are not sufficient to reproduce the problem. So it seems the type parameter is not processed correctly along the inheritance path.

Tested with:

  • jedi==0.15.2 and jedi==0.16.0 (0c56aa4).
  • pyspark-stubs==3.0.0.dev5
  • pyspark==3.0.0.dev0 (afe70b3)
@zero323
Owner Author

zero323 commented Jan 23, 2020

CC @davidhalter. I wonder if you could take a look at this and let me know if you see any obvious culprit. I cannot say whether it's an annotation issue missed by mypy, or a problem with Jedi.

@davidhalter

@zero323 If the first example (AttributeError) really happens with latest master, please report it, that's definitely a bug.

I'm not really sure I understand the other stuff. Please explain precisely what completions you would want at a certain point and what is wrong. I have never worked with the pyspark API.

@zero323
Owner Author

zero323 commented Jan 23, 2020

If the first example (AttributeError) really happens with latest master, please report it, that's definitely a bug.

Here you are: davidhalter/jedi#1479. It should be fully reproducible with master HEAD.

@zero323
Owner Author

zero323 commented Jan 23, 2020

I'm not really sure I understand the other stuff. Please explain precisely what completions you would want at a certain point and what is wrong. I have never worked with the pyspark API.

Thank you. In general, the structure goes through multiple levels of inheritance, where a Reader object:

class Reader(Generic[T]):
    @classmethod
    def load(cls, path: str) -> T: ...

is mixed into different classes:

class Model(Reader["Model"]):
    def transform(self, data): ...

So I'd expect that Model.load("path") would be able to complete transform. However, it seems Jedi recognizes Model.load("path") as a Reader. Unfortunately, I have been unable to isolate this behavior into a minimal example so far. I'll work on that...
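A self-contained version of the hierarchy sketched above looks roughly like the following (names are illustrative stand-ins, not the actual pyspark classes; the `load` body is a stub so the file runs):

```python
from typing import Generic, TypeVar

T = TypeVar("T")

class Reader(Generic[T]):
    """Mixin that loads a persisted instance; T is the concrete class."""

    @classmethod
    def load(cls, path: str) -> T:
        # A real implementation would deserialize from `path`;
        # returning cls() keeps this sketch runnable.
        return cls()  # type: ignore[return-value]

class Model(Reader["Model"]):
    def transform(self, data):
        return data

# At runtime load() returns a Model, so `transform` should be completable:
model = Model.load("path")
```

At runtime this works as expected; the question is whether static tools follow `T` through the classmethod.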

Thanks again for your time. Much obliged.

@davidhalter

davidhalter commented Jan 23, 2020

Hmmm, if I had to guess, this might have to do with the classmethod. It doesn't have to be that, but the classmethod might cause trouble, because I'm pretty sure that case is not tested.

If you find a simple reproduction case (you can probably do that in one simple file with classmethods), that would be awesome!
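For context, a common alternative way to express "load returns the caller's class" binds the TypeVar to `cls` instead of parametrizing the base class. This is an illustration, not a pattern from the thread; `Loadable` and `Estimator` are hypothetical names:

```python
from typing import Type, TypeVar

T = TypeVar("T", bound="Loadable")

class Loadable:
    @classmethod
    def load(cls: Type[T], path: str) -> T:
        # Stand-in for real deserialization from `path`.
        return cls()

class Estimator(Loadable):
    def fit(self, data):
        return data

# The annotation on `cls` lets a checker infer Estimator here:
est = Estimator.load("path")
```

Because the return type is tied directly to `cls`, this variant avoids the Generic mixin entirely, which can sidestep inference trouble in tools that mishandle a class-level type parameter inside a classmethod.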

@zero323
Owner Author

zero323 commented Jan 24, 2020

With your guidance I've been able to reduce this problem to davidhalter/jedi#1480. The behavior is slightly different, but it looks similar enough to suspect a common root cause.

Thanks again @davidhalter!

@zero323
Owner Author

zero323 commented Jan 25, 2020

Resolved in Jedi with davidhalter/jedi@da2a55c

@zero323 zero323 closed this as completed Jan 25, 2020