Description
Currently, Series is used in several places where the inner type of the Series isn't known, e.g.:
pandas-stubs/pandas-stubs/core/series.pyi
Lines 864 to 871 in 25fe8aa
There are a couple of issues I'm running into with this.
First, the pyright-strict job marks this as partially unknown:
/home/runner/work/pandas-stubs/pandas-stubs/tests/test_series.py:1039:5 - error: Type of "compare" is partially unknown
Type of "compare" is "Overload[(other: Series[Unknown], align_axis: Literal['index', 0], keep_shape: bool = ..., keep_equal: bool = ...) -> Series[Unknown], (other: Series[Unknown], align_axis: Literal['columns', 1] = ..., keep_shape: bool = ..., keep_equal: bool = ...) -> DataFrame]" (reportUnknownMemberType)
Second, when using pyright with --verifytypes to look for uncovered parts of the public API, this is flagged as "unknown type":
{
    "category": "function",
    "name": "pandas.testing.assert_series_equal",
    "referenceCount": 1,
    "isExported": true,
    "isTypeKnown": false,
    "isTypeAmbiguous": false,
    "diagnostics": [
        {
            "file": "/home/marcogorelli/type_coverage_py/.pyright_env_pandas/lib/python3.12/site-packages/pandas/_testing/__init__.pyi",
            "severity": "error",
            "message": "Type of parameter \"left\" is partially unknown\n Parameter type is \"Series[Unknown]\"\n Type argument 1 for class \"Series\" has unknown type",
            "range": {
                "start": {
                    "line": 4,
                    "character": 27
                },
                "end": {
                    "line": 4,
                    "character": 46
                }
            }
        },
Would it be OK to use Series[Any] instead of just Series in such cases? Or, as some libraries do, to introduce a type alias Incomplete: TypeAlias = Any to mean "we should be able to narrow down the type but for now we're not doing so" and use that in some cases?
The latter use-case (--verifytypes) can, I think, really help to prioritise which stubs to add.
Activity
Dr-Irv commented on Feb 28, 2025
We've had discussion about this in another PR. See #1093 (comment)
The idea is to create an UnknownSeries type that would correspond to when we don't know the type. I think this may solve the problem you raise above.
core.strings.pyi using them #1146
Jeitan commented on Mar 18, 2025
Is there something similar happening for DataFrames? I'm also running afoul of things being partially unknown using strict with basedpyright, specifically in the return of read_excel for a particular overload (a single string passed into the io parameter).
Dr-Irv commented on Mar 19, 2025
I don't think the return types of read_excel() should be seen as partially unknown. Can you provide a simple example that illustrates what you are seeing?
What might be happening is the following. Let's say you do
df = pd.read_excel("your string")
The result of that is either a dict of DataFrame objects or a single DataFrame. Let's say it is the latter. If you then do s = df["a"], then s will be partially unknown since it will have type Series[Any]. There are some contributions to change all references in pandas-stubs from Series to UnknownSeries. This is an ongoing effort.
Jeitan commented on Mar 20, 2025
@Dr-Irv Okay gotcha. I probably need to do some better things on my end also, so I'll see if I can address it that way first. Thanks!
default in TypeVar so Series defaults to Series[Any], and Index to Index[Any] #1232