Description
Currently, we use these annotations for data / user_data in many places:
data: list[dict[str, Any]] | dict[str, Any]
data: dict[str, Any]
This works, but it isn't precise - we only accept JSON-serializable types.
We've got this recursive alias:
J = TypeVar('J', bound='JsonSerializable')
JsonSerializable: TypeAlias = Union[
    list[J],
    dict[str, J],
    str,
    bool,
    int,
    float,
    None,
]
But if we use it for these variables:
data: list[dict[str, JsonSerializable]] | dict[str, JsonSerializable]
data: dict[str, JsonSerializable]
We run into variance-related errors, like this:
tests/unit/crawlers/_adaptive_playwright/test_adaptive_playwright_crawler.py:450: error: Argument 1 to "__call__" of "PushDataFunction" has incompatible type "dict[str, str]"; expected "Union[list[dict[str, Union[list[Any], dict[str, Any], str, bool, int, float, None]]], dict[str, Union[list[Any], dict[str, Any], str, bool, int, float, None]]]" [arg-type]
tests/unit/crawlers/_adaptive_playwright/test_adaptive_playwright_crawler.py:450: note: "Dict" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
tests/unit/crawlers/_adaptive_playwright/test_adaptive_playwright_crawler.py:450: note: Consider using "Mapping" instead, which is covariant in the value type
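The variance problem can be reproduced in isolation. Below is a minimal sketch, not code from the repository: JsonValue is a simplified, non-recursive stand-in for the alias, and push_invariant / push_covariant are hypothetical names used only for illustration.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Union

# Simplified, non-recursive stand-in for the JSON value union (assumption).
JsonValue = Union[str, bool, int, float, None]


def push_invariant(data: dict[str, JsonValue]) -> int:
    """Hypothetical function: dict is invariant in its value type."""
    return len(data)


def push_covariant(data: Mapping[str, JsonValue]) -> int:
    """Hypothetical function: Mapping is covariant in its value type."""
    return len(data)


item: dict[str, str] = {'key': 'value'}

# Accepted by mypy: dict[str, str] is a subtype of Mapping[str, JsonValue].
push_covariant(item)

# Rejected by mypy with [arg-type]: dict[str, str] is not a subtype of
# dict[str, JsonValue], even though str is a member of JsonValue.
# The call still runs fine at runtime.
push_invariant(item)
```

The invariance is deliberate: a function that receives dict[str, JsonValue] is allowed to write an int into it, which would corrupt a caller's dict[str, str].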
If we follow the suggestion and use Mapping and Sequence:
data: Sequence[Mapping[str, JsonSerializable]] | Mapping[str, JsonSerializable]
We end up with even more errors on the usage side, e.g.
item = {'key': 'value', 'number': 42}
await dataset_client.push_data(item)
Error (dict[str, object] vs. Mapping[str, JsonSerializable]):
Argument 1 to "push_data" of "MemoryDatasetClient" has incompatible type "dict[str, object]"; expected "Union[Sequence[Mapping[str, Union[list[Any], dict[str, Any], str, bool, int, float, None]]], Mapping[str, Union[list[Any], dict[str, Any], str, bool, int, float, None]]]" Mypy[arg-type](https://mypy.readthedocs.io/en/latest/_refs.html#code-arg-type)
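The dict[str, object] in this message most likely comes from how mypy infers the type of an unannotated dict literal: the value type is the join of str and int, which is object. The sketch below illustrates this under simplifying assumptions: JsonValue is a non-recursive stand-in for the alias, and push_data here is a hypothetical synchronous function, not the real MemoryDatasetClient method.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Union

# Simplified, non-recursive stand-in for the JSON value union (assumption).
JsonValue = Union[str, bool, int, float, None]


def push_data(data: Mapping[str, JsonValue]) -> int:
    """Hypothetical stand-in for the real push_data; returns the item size."""
    return len(data)


# No annotation: mypy infers dict[str, object] from the mixed str/int values.
inferred = {'key': 'value', 'number': 42}

# Explicit annotation: the literal is checked against the declared type.
annotated: dict[str, JsonValue] = {'key': 'value', 'number': 42}

push_data(annotated)  # accepted by mypy
push_data({'key': 'value', 'number': 42})  # accepted: the parameter type is used as context
push_data(inferred)  # rejected by mypy with [arg-type]; runs fine at runtime
```

So the usage-side errors appear only when the literal is first bound to an unannotated variable. That is a common pattern in tests, which may explain why the number of errors goes up.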
Is using the JsonSerializable alias in this context the right choice? Should we adopt something different, and if so, how? The goal is to get precise JSON-serializable typing while avoiding both variance errors and usage-side errors.