Update `push_data` annotations to use `JsonSerializable` type

Currently, we use these annotations for `data` / `user_data` in many places:

```python
data: list[dict[str, Any]] | dict[str, Any]
```

```python
data: dict[str, Any]
```

This works, but it isn't precise: `Any` admits values that can't be serialized, while we only accept JSON-serializable types.
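To make the imprecision concrete, here is a minimal sketch. The `push_data` function below is a hypothetical stand-in that uses the current annotation and serializes with `json.dumps`; it is not the real client method.

```python
import json
from datetime import datetime, timezone
from typing import Any


def push_data(data: dict[str, Any]) -> str:
    """Hypothetical stand-in: serialize the item the way a storage client might."""
    return json.dumps(data)


# Accepted by the type checker, because `Any` matches every value type...
item: dict[str, Any] = {'scraped_at': datetime.now(timezone.utc)}

try:
    push_data(item)
except TypeError as exc:
    # ...but json.dumps rejects datetime objects at runtime.
    error_message = str(exc)
```

The mistake only surfaces when the item is serialized, which is what a more precise annotation would catch earlier.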

We've got this recursive alias:

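```python
from typing import TypeAlias, TypeVar, Union

# When the alias is used without a type argument, `J` falls back to `Any`,
# which is why the mypy output below shows `list[Any]` and `dict[str, Any]`.
J = TypeVar('J', bound='JsonSerializable')
JsonSerializable: TypeAlias = Union[
    list[J],
    dict[str, J],
    str,
    bool,
    int,
    float,
    None,
]
```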

But if we use it for these variables:

```python
data: list[dict[str, JsonSerializable]] | dict[str, JsonSerializable]
```

```python
data: dict[str, JsonSerializable]
```

We run into variance-related errors, like this:

```
tests/unit/crawlers/_adaptive_playwright/test_adaptive_playwright_crawler.py:450: error: Argument 1 to "__call__" of "PushDataFunction" has incompatible type "dict[str, str]"; expected "Union[list[dict[str, Union[list[Any], dict[str, Any], str, bool, int, float, None]]], dict[str, Union[list[Any], dict[str, Any], str, bool, int, float, None]]]"  [arg-type]
tests/unit/crawlers/_adaptive_playwright/test_adaptive_playwright_crawler.py:450: note: "Dict" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
tests/unit/crawlers/_adaptive_playwright/test_adaptive_playwright_crawler.py:450: note: Consider using "Mapping" instead, which is covariant in the value type
```
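The invariance mypy reports exists for a reason. The following is a minimal sketch with a simplified, hypothetical `Value` alias and `fill_missing` helper (neither is part of the codebase): if a `dict[str, str]` were accepted where `dict[str, Value]` is expected, the callee could write a non-string value into it.

```python
from typing import Union

# Simplified stand-in for a JSON value type (an assumption, not the project's alias).
Value = Union[str, int, None]


def fill_missing(data: dict[str, Value]) -> None:
    """Write a None into the dict, which the declared value type allows."""
    data['missing'] = None


strings: dict[str, str] = {'title': 'Example'}

# mypy rejects this call because dict is invariant in its value type.
# At runtime nothing stops it, and `strings` no longer holds only str values.
fill_missing(strings)  # type: ignore[arg-type]
```

`Mapping` has no `__setitem__`, so this kind of write is not possible through it, which is why it can safely be covariant in the value type.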

If we follow the suggestion and use `Mapping` and `Sequence`:

```python
data: Sequence[Mapping[str, JsonSerializable]] | Mapping[str, JsonSerializable]
```

We end up with even more errors on the usage side, e.g.

```python
item = {'key': 'value', 'number': 42}
await dataset_client.push_data(item)
```

Error (`dict[str, object]` vs. `Mapping[str, JsonSerializable]`):

```
Argument 1 to "push_data" of "MemoryDatasetClient" has incompatible type "dict[str, object]"; expected "Union[Sequence[Mapping[str, Union[list[Any], dict[str, Any], str, bool, int, float, None]]], Mapping[str, Union[list[Any], dict[str, Any], str, bool, int, float, None]]]"  [arg-type]
```

Is using the `JsonSerializable` alias in this context the right choice? Should we adopt something different, and if so, how? The goal is to get precise JSON-serializable typing while avoiding both the variance errors and the usage-side errors.

Update `push_data` annotations to use `JsonSerializable` type #1191
