
Error encountered: dictionary changed size during iteration #3554

Closed
gaspardc-met opened this issue Aug 22, 2024 · 3 comments · Fixed by #3637

@gaspardc-met

What happened?

When trying to create an altair chart within a streamlit application, I run into Uncaught Exception: dictionary changed size during iteration. I'm running altair 5.4.0 here; downgrading to 5.3.0 seems to solve this specific issue for the moment.

This is with streamlit caching removed:
Error stack:

Traceback (most recent call last):
  File "/path/to/project/decorators.py", line 68, in wrapper
    result = main_func(*args, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/project/file.py", line 81, in main
    display_tab(
  File "/path/to/project/file.py", line 224, in display_tab
    display_chronograms(
  File "/path/to/project/predictions/plots.py", line 513, in display_chronograms
    plot_chronogram(
  File "/path/to/project/predictions/plots.py", line 328, in plot_chronogram
    .encode(
        ^^^^^^^
  File "/path/to/venv/lib/python3.11/site-packages/altair/vegalite/v5/schema/channels.py", line 31233, in encode
    kwargs = _infer_encoding_types(args, kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/venv/lib/python3.11/site-packages/altair/utils/core.py", line 964, in infer_encoding_types
    return cache.infer_encoding_types(kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/venv/lib/python3.11/site-packages/altair/utils/core.py", line 870, in infer_encoding_types
    return {
  File "/path/to/venv/lib/python3.11/site-packages/altair/utils/core.py", line 870, in <dictcomp>
    return {
RuntimeError: dictionary changed size during iteration
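
For reference (independent of Altair), Python raises this RuntimeError whenever a dict gains or loses keys while it is being iterated over; a minimal illustration:

# Minimal illustration of the error mechanism (no Altair involved):
d = {"x": 1, "y": 2}
out = {}
for key in d:
    d["z"] = 3          # mutating the dict mid-iteration
    out[key] = d[key]
# RuntimeError: dictionary changed size during iteration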

I have had this error intermittently in the past, only on cloud deployments and with altair 5.2.0.
Since it was intermittent and only happened in production, I attributed it at the time to a streamlit cache issue (streamlit/streamlit#8409).
Now the error is happening locally, even with every caching mechanism removed, and it's no longer intermittent: that specific plot never works. Downgrading to altair 5.3.0 seems to fix the issue for this plot for now.

I'm working on a minimal reproduction code example, but so far it does not trigger the error on my side with dummy data.

What would you like to happen instead?

No response

Which version of Altair are you using?

5.4.0

@gaspardc-met
Author

I still cannot manage to reproduce the error with dummy data, but these are basically the plot operations:

numpy==2.1.0
pandas==2.2.2
streamlit==1.37.0

import pandas as pd
import altair as alt
import streamlit as st
import functools
import numpy as np
import random

# Dummy data for the chronogram
data = pd.DataFrame(
    {
        "start_time": pd.date_range("2023-01-01", periods=100, freq="H").tz_localize("Europe/Paris"),
        "end_time": pd.date_range("2023-01-01 01:00", periods=100, freq="H").tz_localize("Europe/Paris"),
        "asset": ["Asset " + str(i % 5) for i in range(100)],
        "load": [float(np.random.uniform(40, 100)) for _ in range(100)],
    }
)

# Set most 'load' values for 'Asset 0' to NaN or None
asset_0_indices = data[data["asset"] == "Asset 0"].index
indices_to_nullify = random.sample(list(asset_0_indices), k=len(asset_0_indices) - 2)  # Keep only 2 non-NaN

data.loc[indices_to_nullify, "load"] = np.nan  # Set
data = data.reset_index()

# Placeholder functions for processing and legend (you would replace these with actual logic)
def chronogram_legend(target, pump_toggle):
    return "Legend", "Short Legend", "Other Info"


def chronogram_processing(chronogram, timedelta, filter_load):
    return chronogram  # Simply returns the input data in this dummy example


def custom_blues():
    return ["low", "medium", "high"], ["#dceefb", "#86c7f3", "#1f77b4"]


def get_assets_starts_and_stops(chronogram, timedelta, separator_dt):
    # Simple dummy start/stop markers within the range
    starts_and_stops = alt.Chart(chronogram).mark_rule(color="red").encode(x="start_time:T")
    starts_and_stops_texts = (
        alt.Chart(chronogram)
        .mark_text(align="left", dx=5, dy=-5, color="red")
        .encode(x="start_time:T", text=alt.value("Start/Stop"))
    )
    return starts_and_stops, starts_and_stops_texts


def get_vertical_separator(separator_dt, labels_y, y_field):
    return None, None  # Placeholder for the actual function output


# Main function with dummy data and simplified inputs
def plot_chronogram(
    data: pd.DataFrame,
    formatted=".0f",
    target="load",
    timedelta="60min",
    filter_load=True,
    expand: bool = False,
    pump_toggle: bool = False,
    display_starts_and_stops: bool = False,
    separator_dt: pd.Timestamp = None,
):
    # Get legend information
    legend, short_legend, _ = chronogram_legend(target=target, pump_toggle=pump_toggle)

    # Process the data (dummy in this case)
    st.write(data.dtypes)

    # Example of expanding the time (dummy logic here)
    if expand:
        data = data.set_index("start_time").sort_index().reset_index()
        data.loc[28:, "end_time"] = data.loc[28:, "end_time"] + pd.Timedelta("45T")

    # Set up the color scale (dummy logic here)
    if target == "pressure":
        bins, colors = custom_blues()
        scale = alt.Scale(domain=bins, range=colors, type="ordinal")
    elif target == "load":
        scale = alt.Scale(domain=[0, 50, 100], range=["#f7fbff", "#6baed6", "#08306b"], type="threshold")
    else:
        scale = alt.Scale(scheme="blues")

    # Define the sorting order for the y-axis
    sort_order = [""] + data["asset"].sort_values().unique().tolist()

    # Main bar chart
    chart = (
        alt.Chart(data)
        .mark_bar()
        .encode(
            x=alt.X("start_time:T", title="Horizon Temporel"),
            x2=alt.X2("end_time:T", title=""),
            y=alt.Y("asset:N", title="Utilisation: Groupes ou AFC", sort=sort_order),
            color=alt.Color(
                "load:Q",
                title=short_legend,
                scale=scale,
                legend=alt.Legend(title=legend),
            ),
            stroke=alt.value("white"),
            strokeWidth=alt.value(2),
            tooltip=[
                alt.Tooltip("start_time:T", format="%Y-%m-%d", title="Date"),
                alt.Tooltip("start_time:T", format="%H:%M", title="Heure"),
                alt.Tooltip("load:Q", format=formatted, title=legend),
            ],
        )
    ).properties(
        title=f"Chronogramme d'opération: {legend}",
        width=1100,
        height=350,
    )

    # Text overlay layer
    text = (
        alt.Chart(data)
        .mark_text(dx=0, dy=0, color="white", fontSize=25)
        .encode(
            x=alt.X("mid_time:T", title=""),
            y=alt.Y("asset:N", title="Utilisation: Groupes ou AFC", sort=None),
            text=alt.Text("load:Q", format=formatted),
            tooltip=[
                alt.Tooltip("start_time:T", format="%Y-%m-%d", title="Date"),
                alt.Tooltip("start_time:T", format="%H:%M", title="Heure"),
                alt.Tooltip("load:Q", format=formatted, title=legend),
            ],
        )
    ).transform_calculate(mid_time="datum.start_time + (datum.end_time - datum.start_time)/2")

    # Additional text layer for a specific condition
    text_hot = (
        alt.Chart(data)
        .mark_text(dx=0, dy=0, color="white", fontSize=25)
        .encode(
            x=alt.X("mid_time:T", title=""),
            y=alt.Y("asset:N", title="Utilisation: Groupes ou AFC", sort=None),
            text=alt.value("Chaud"),
            tooltip=alt.value(None),
        )
    ).transform_calculate(mid_time="datum.start_time + (datum.end_time - datum.start_time)/2")

    # Store all charts to be layered
    all_charts = [chart, text, text_hot]

    # Example of including start and stop markers (dummy logic here)
    if display_starts_and_stops:
        starts_and_stops, starts_and_stops_texts = get_assets_starts_and_stops(
            chronogram=data,
            timedelta=timedelta,
            separator_dt=separator_dt,
        )
        all_charts += [starts_and_stops, starts_and_stops_texts]

    # Example of adding a vertical separator (dummy logic here)
    display_separator = separator_dt is not None and separator_dt > data.index.min()
    if display_separator:
        separator, separator_labels = get_vertical_separator(separator_dt=separator_dt, labels_y="", y_field="asset")
        all_charts += [separator, separator_labels]

    # Combine all chart layers
    composed = (
        functools.reduce(lambda a, b: a + b, all_charts)
        .configure_legend(orient="right", titleOrient="right")
        .configure_axis(labelFontSize=15, titleFontSize=15)
    )

    # Display the composed chart in Streamlit
    st.altair_chart(altair_chart=composed, use_container_width=True)


# Test the function with dummy data
plot_chronogram(data=data, display_starts_and_stops=True)

@dangotbanned
Member

dangotbanned commented Aug 24, 2024

Appreciate the detail here @gaspardc-met in #3554 (comment), but a minimal repro would be helpful.

Uncaught Exception: dictionary changed size during iteration is being raised within a (LayerChart|Chart).encode().
No idea which one though, as there seem to be a few.

I copied your code directly, commented out the streamlit parts, and didn't get any errors.
I added some additional checks at the end; all seem to be working as expected.

Attempted Repro without streamlit
def test_infer_encoding_types_mod_iter() -> None:
    import pandas as pd  # noqa: I001
    import altair as alt

    # import streamlit as st
    import functools
    import numpy as np
    import random

    # Dummy data for the chronogram
    data = pd.DataFrame(
        {
            "start_time": pd.date_range(
                "2023-01-01", periods=100, freq="H"
            ).tz_localize("Europe/Paris"),
            "end_time": pd.date_range(
                "2023-01-01 01:00", periods=100, freq="H"
            ).tz_localize("Europe/Paris"),
            "asset": ["Asset " + str(i % 5) for i in range(100)],
            "load": [float(np.random.uniform(40, 100)) for _ in range(100)],  # noqa: NPY002
        }
    )

    # Set most 'load' values for 'Asset 0' to NaN or None
    asset_0_indices = data[data["asset"] == "Asset 0"].index
    indices_to_nullify = random.sample(
        list(asset_0_indices), k=len(asset_0_indices) - 2
    )  # Keep only 2 non-NaN

    data.loc[indices_to_nullify, "load"] = np.nan  # Set
    data = data.reset_index()

    # Placeholder functions for processing and legend (you would replace these with actual logic)
    def chronogram_legend(target, pump_toggle):
        return "Legend", "Short Legend", "Other Info"

    def chronogram_processing(chronogram, timedelta, filter_load):
        return chronogram  # Simply returns the input data in this dummy example

    def custom_blues():
        return ["low", "medium", "high"], ["#dceefb", "#86c7f3", "#1f77b4"]

    def get_assets_starts_and_stops(chronogram, timedelta, separator_dt):
        # Simple dummy start/stop markers within the range
        starts_and_stops = (
            alt.Chart(chronogram).mark_rule(color="red").encode(x="start_time:T")
        )
        starts_and_stops_texts = (
            alt.Chart(chronogram)
            .mark_text(align="left", dx=5, dy=-5, color="red")
            .encode(x="start_time:T", text=alt.value("Start/Stop"))
        )
        return starts_and_stops, starts_and_stops_texts

    def get_vertical_separator(separator_dt, labels_y, y_field):
        return None, None  # Placeholder for the actual function output

    # Main function with dummy data and simplified inputs
    def plot_chronogram(
        data: pd.DataFrame,
        formatted=".0f",
        target="load",
        timedelta="60min",
        filter_load=True,
        expand: bool = False,
        pump_toggle: bool = False,
        display_starts_and_stops: bool = False,
        separator_dt: pd.Timestamp = None,
    ):
        # Get legend information
        legend, short_legend, _ = chronogram_legend(
            target=target, pump_toggle=pump_toggle
        )

        # Process the data (dummy in this case)
        # st.write(data.dtypes)

        # Example of expanding the time (dummy logic here)
        if expand:
            data = data.set_index("start_time").sort_index().reset_index()
            data.loc[28:, "end_time"] = data.loc[28:, "end_time"] + pd.Timedelta("45T")

        # Set up the color scale (dummy logic here)
        if target == "pressure":
            bins, colors = custom_blues()
            scale = alt.Scale(domain=bins, range=colors, type="ordinal")
        elif target == "load":
            scale = alt.Scale(
                domain=[0, 50, 100],
                range=["#f7fbff", "#6baed6", "#08306b"],
                type="threshold",
            )
        else:
            scale = alt.Scale(scheme="blues")

        # Define the sorting order for the y-axis
        sort_order = [""] + data["asset"].sort_values().unique().tolist()  # noqa: RUF005

        # Main bar chart
        chart = (
            alt.Chart(data)
            .mark_bar()
            .encode(
                x=alt.X("start_time:T", title="Horizon Temporel"),
                x2=alt.X2("end_time:T", title=""),
                y=alt.Y(
                    "asset:N", title="Utilisation: Groupes ou AFC", sort=sort_order
                ),
                color=alt.Color(
                    "load:Q",
                    title=short_legend,
                    scale=scale,
                    legend=alt.Legend(title=legend),
                ),
                stroke=alt.value("white"),
                strokeWidth=alt.value(2),
                tooltip=[
                    alt.Tooltip("start_time:T", format="%Y-%m-%d", title="Date"),
                    alt.Tooltip("start_time:T", format="%H:%M", title="Heure"),
                    alt.Tooltip("load:Q", format=formatted, title=legend),
                ],
            )
        ).properties(
            title=f"Chronogramme d'opération: {legend}",
            width=1100,
            height=350,
        )

        # Text overlay layer
        text = (
            alt.Chart(data)
            .mark_text(dx=0, dy=0, color="white", fontSize=25)
            .encode(
                x=alt.X("mid_time:T", title=""),
                y=alt.Y("asset:N", title="Utilisation: Groupes ou AFC", sort=None),
                text=alt.Text("load:Q", format=formatted),
                tooltip=[
                    alt.Tooltip("start_time:T", format="%Y-%m-%d", title="Date"),
                    alt.Tooltip("start_time:T", format="%H:%M", title="Heure"),
                    alt.Tooltip("load:Q", format=formatted, title=legend),
                ],
            )
        ).transform_calculate(
            mid_time="datum.start_time + (datum.end_time - datum.start_time)/2"
        )

        # Additional text layer for a specific condition
        text_hot = (
            alt.Chart(data)
            .mark_text(dx=0, dy=0, color="white", fontSize=25)
            .encode(
                x=alt.X("mid_time:T", title=""),
                y=alt.Y("asset:N", title="Utilisation: Groupes ou AFC", sort=None),
                text=alt.value("Chaud"),
                tooltip=alt.value(None),
            )
        ).transform_calculate(
            mid_time="datum.start_time + (datum.end_time - datum.start_time)/2"
        )

        # Store all charts to be layered
        all_charts = [chart, text, text_hot]

        # Example of including start and stop markers (dummy logic here)
        if display_starts_and_stops:
            starts_and_stops, starts_and_stops_texts = get_assets_starts_and_stops(
                chronogram=data,
                timedelta=timedelta,
                separator_dt=separator_dt,
            )
            all_charts += [starts_and_stops, starts_and_stops_texts]

        # Example of adding a vertical separator (dummy logic here)
        display_separator = separator_dt is not None and separator_dt > data.index.min()
        if display_separator:
            separator, separator_labels = get_vertical_separator(
                separator_dt=separator_dt, labels_y="", y_field="asset"
            )
            all_charts += [separator, separator_labels]

        # Combine all chart layers
        composed = (
            functools.reduce(lambda a, b: a + b, all_charts)  # noqa: FURB118
            .configure_legend(orient="right", titleOrient="right")
            .configure_axis(labelFontSize=15, titleFontSize=15)
        )

        # Display the composed chart in Streamlit
        # st.altair_chart(altair_chart=composed, use_container_width=True)
        return composed

    # Test the function with dummy data
    composed = plot_chronogram(data=data, display_starts_and_stops=True)

    # NOTE: Reaching here wouldn't be possible if the error raised
    validated = composed.to_dict(validate=True)
    # NOTE: Another error would have been raised if the spec we returned was not valid
    assert isinstance(validated, dict)

    # NOTE: These may require optional dependencies you don't have,
    # but provide more evidence of the spec produced being valid
    vega_editor_url = composed.to_url()
    assert isinstance(vega_editor_url, str)
    composed.open_editor()

If you do not have the required dependencies for Chart.open_editor, see the chart in the Vega Editor instead.

[Screenshot: the layered chart rendered in the Vega Editor]
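
If neither of those is convenient, the spec can also be checked without any optional dependencies; a quick sketch (the output filename is arbitrary):

# Neither of these needs vl-convert or a browser helper:
spec_json = composed.to_json()      # serializes the spec (validation on by default)
composed.save("chronogram.html")    # writes an HTML file that renders the chart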

Edit

A possible issue here is that streamlit 1.37.0 (https://github.com/streamlit/streamlit/releases/tag/1.37.0) was released prior to altair 5.4.0 (https://github.com/vega/altair/releases/tag/v5.4.0).
streamlit may be making assumptions about the internals of altair that no longer hold since #3444.

I'm not familiar with streamlit, but one of its modules may be altering the dictionary while Altair is iterating over it.
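
To illustrate that hypothesis (a sketch only, not streamlit's actual code): a second thread mutating a mapping while the main thread iterates over it fails with the same error, and only some of the time, which would also explain why such a failure can be intermittent.

import threading
import time

shared = {f"k{i}": i for i in range(100_000)}

def downstream_mutator():
    # stands in for downstream code touching the dict Altair is iterating over
    for i in range(10_000):
        shared[f"extra{i}"] = i
        time.sleep(0)

t = threading.Thread(target=downstream_mutator)
t.start()
try:
    {k: v for k, v in shared.items()}  # like the dictcomp in infer_encoding_types
except RuntimeError as exc:
    print(exc)  # dictionary changed size during iteration (only sometimes)
t.join()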

@dangotbanned
Member

Closing as it appears to be a downstream issue in streamlit.

@gaspardc-met please feel free to comment if you feel I've made a mistake in this assessment.
