
Error encountered: dictionary changed size during iteration #3554

Closed
gaspardc-met opened this issue Aug 22, 2024 · 3 comments · Fixed by #3637

@gaspardc-met

What happened?

When trying to create an altair chart within a streamlit application, I run into Uncaught Exception: dictionary changed size during iteration. I'm running altair 5.4.0 here; downgrading to 5.3.0 seems to solve this specific issue for the moment.

This is with streamlit caching removed:
Error stack:

Traceback (most recent call last):
  File "/path/to/project/decorators.py", line 68, in wrapper
    result = main_func(*args, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/project/file.py", line 81, in main
    display_tab(
  File "/path/to/project/file.py", line 224, in display_tab
    display_chronograms(
  File "/path/to/project/predictions/plots.py", line 513, in display_chronograms
    plot_chronogram(
  File "/path/to/project/predictions/plots.py", line 328, in plot_chronogram
    .encode(
        ^^^^^^^
  File "/path/to/venv/lib/python3.11/site-packages/altair/vegalite/v5/schema/channels.py", line 31233, in encode
    kwargs = _infer_encoding_types(args, kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/venv/lib/python3.11/site-packages/altair/utils/core.py", line 964, in infer_encoding_types
    return cache.infer_encoding_types(kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/venv/lib/python3.11/site-packages/altair/utils/core.py", line 870, in infer_encoding_types
    return {
  File "/path/to/venv/lib/python3.11/site-packages/altair/utils/core.py", line 870, in <dictcomp>
    return {
RuntimeError: dictionary changed size during iteration
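
For reference (independent of Altair), Python raises this RuntimeError whenever a dict gains or loses keys while it is being iterated over; a minimal illustration:

# Minimal illustration of the error mechanism (no Altair involved):
d = {"x": 1, "y": 2}
out = {}
for key in d:
    d["z"] = 3          # mutating the dict mid-iteration
    out[key] = d[key]
# RuntimeError: dictionary changed size during iteration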

I have had this error intermittently in the past, only on cloud deployments and with altair 5.2.0.
Since it was intermittent and only happened in production, I attributed it at the time to a streamlit cache issue (streamlit/streamlit#8409).
Now the error is happening locally, even with every caching mechanism removed, and it's no longer intermittent: that specific plot never works. Downgrading to altair 5.3.0 seems to fix the issue for this plot for now.

I'm working on a minimal reproduction code example, but so far it does not trigger the error on my side with dummy data.

What would you like to happen instead?

No response

Which version of Altair are you using?

5.4.0

@gaspardc-met
Author

I still cannot manage to reproduce the error with dummy data, but these are basically the plot operations:

numpy==2.1.0
pandas==2.2.2
streamlit==1.37.0

import pandas as pd
import altair as alt
import streamlit as st
import functools
import numpy as np
import random

# Dummy data for the chronogram
data = pd.DataFrame(
    {
        "start_time": pd.date_range("2023-01-01", periods=100, freq="H").tz_localize("Europe/Paris"),
        "end_time": pd.date_range("2023-01-01 01:00", periods=100, freq="H").tz_localize("Europe/Paris"),
        "asset": ["Asset " + str(i % 5) for i in range(100)],
        "load": [float(np.random.uniform(40, 100)) for _ in range(100)],
    }
)

# Set most 'load' values for 'Asset 0' to NaN or None
asset_0_indices = data[data["asset"] == "Asset 0"].index
indices_to_nullify = random.sample(list(asset_0_indices), k=len(asset_0_indices) - 2)  # Keep only 2 non-NaN

data.loc[indices_to_nullify, "load"] = np.nan  # Set
data = data.reset_index()

# Placeholder functions for processing and legend (you would replace these with actual logic)
def chronogram_legend(target, pump_toggle):
    return "Legend", "Short Legend", "Other Info"


def chronogram_processing(chronogram, timedelta, filter_load):
    return chronogram  # Simply returns the input data in this dummy example


def custom_blues():
    return ["low", "medium", "high"], ["#dceefb", "#86c7f3", "#1f77b4"]


def get_assets_starts_and_stops(chronogram, timedelta, separator_dt):
    # Simple dummy start/stop markers within the range
    starts_and_stops = alt.Chart(chronogram).mark_rule(color="red").encode(x="start_time:T")
    starts_and_stops_texts = (
        alt.Chart(chronogram)
        .mark_text(align="left", dx=5, dy=-5, color="red")
        .encode(x="start_time:T", text=alt.value("Start/Stop"))
    )
    return starts_and_stops, starts_and_stops_texts


def get_vertical_separator(separator_dt, labels_y, y_field):
    return None, None  # Placeholder for the actual function output


# Main function with dummy data and simplified inputs
def plot_chronogram(
    data: pd.DataFrame,
    formatted=".0f",
    target="load",
    timedelta="60min",
    filter_load=True,
    expand: bool = False,
    pump_toggle: bool = False,
    display_starts_and_stops: bool = False,
    separator_dt: pd.Timestamp = None,
):
    # Get legend information
    legend, short_legend, _ = chronogram_legend(target=target, pump_toggle=pump_toggle)

    # Process the data (dummy in this case)
    st.write(data.dtypes)

    # Example of expanding the time (dummy logic here)
    if expand:
        data = data.set_index("start_time").sort_index().reset_index()
        data.loc[28:, "end_time"] = data.loc[28:, "end_time"] + pd.Timedelta("45T")

    # Set up the color scale (dummy logic here)
    if target == "pressure":
        bins, colors = custom_blues()
        scale = alt.Scale(domain=bins, range=colors, type="ordinal")
    elif target == "load":
        scale = alt.Scale(domain=[0, 50, 100], range=["#f7fbff", "#6baed6", "#08306b"], type="threshold")
    else:
        scale = alt.Scale(scheme="blues")

    # Define the sorting order for the y-axis
    sort_order = [""] + data["asset"].sort_values().unique().tolist()

    # Main bar chart
    chart = (
        alt.Chart(data)
        .mark_bar()
        .encode(
            x=alt.X("start_time:T", title="Horizon Temporel"),
            x2=alt.X2("end_time:T", title=""),
            y=alt.Y("asset:N", title="Utilisation: Groupes ou AFC", sort=sort_order),
            color=alt.Color(
                "load:Q",
                title=short_legend,
                scale=scale,
                legend=alt.Legend(title=legend),
            ),
            stroke=alt.value("white"),
            strokeWidth=alt.value(2),
            tooltip=[
                alt.Tooltip("start_time:T", format="%Y-%m-%d", title="Date"),
                alt.Tooltip("start_time:T", format="%H:%M", title="Heure"),
                alt.Tooltip("load:Q", format=formatted, title=legend),
            ],
        )
    ).properties(
        title=f"Chronogramme d'opération: {legend}",
        width=1100,
        height=350,
    )

    # Text overlay layer
    text = (
        alt.Chart(data)
        .mark_text(dx=0, dy=0, color="white", fontSize=25)
        .encode(
            x=alt.X("mid_time:T", title=""),
            y=alt.Y("asset:N", title="Utilisation: Groupes ou AFC", sort=None),
            text=alt.Text("load:Q", format=formatted),
            tooltip=[
                alt.Tooltip("start_time:T", format="%Y-%m-%d", title="Date"),
                alt.Tooltip("start_time:T", format="%H:%M", title="Heure"),
                alt.Tooltip("load:Q", format=formatted, title=legend),
            ],
        )
    ).transform_calculate(mid_time="datum.start_time + (datum.end_time - datum.start_time)/2")

    # Additional text layer for a specific condition
    text_hot = (
        alt.Chart(data)
        .mark_text(dx=0, dy=0, color="white", fontSize=25)
        .encode(
            x=alt.X("mid_time:T", title=""),
            y=alt.Y("asset:N", title="Utilisation: Groupes ou AFC", sort=None),
            text=alt.value("Chaud"),
            tooltip=alt.value(None),
        )
    ).transform_calculate(mid_time="datum.start_time + (datum.end_time - datum.start_time)/2")

    # Store all charts to be layered
    all_charts = [chart, text, text_hot]

    # Example of including start and stop markers (dummy logic here)
    if display_starts_and_stops:
        starts_and_stops, starts_and_stops_texts = get_assets_starts_and_stops(
            chronogram=data,
            timedelta=timedelta,
            separator_dt=separator_dt,
        )
        all_charts += [starts_and_stops, starts_and_stops_texts]

    # Example of adding a vertical separator (dummy logic here)
    display_separator = separator_dt is not None and separator_dt > data.index.min()
    if display_separator:
        separator, separator_labels = get_vertical_separator(separator_dt=separator_dt, labels_y="", y_field="asset")
        all_charts += [separator, separator_labels]

    # Combine all chart layers
    composed = (
        functools.reduce(lambda a, b: a + b, all_charts)
        .configure_legend(orient="right", titleOrient="right")
        .configure_axis(labelFontSize=15, titleFontSize=15)
    )

    # Display the composed chart in Streamlit
    st.altair_chart(altair_chart=composed, use_container_width=True)


# Test the function with dummy data
plot_chronogram(data=data, display_starts_and_stops=True)

@dangotbanned
Member

dangotbanned commented Aug 24, 2024

Appreciate the detail here @gaspardc-met in #3554 (comment), but a minimal repro would be helpful.

Uncaught Exception: dictionary changed size during iteration is being raised within a (LayerChart|Chart).encode().
No idea which one though, as there seem to be a few.

I copied your code directly, commented out the streamlit parts, and didn't get any errors.
I added some additional checks at the end; all seem to be working as expected.

Attempted Repro without streamlit
def test_infer_encoding_types_mod_iter() -> None:
    import pandas as pd  # noqa: I001
    import altair as alt

    # import streamlit as st
    import functools
    import numpy as np
    import random

    # Dummy data for the chronogram
    data = pd.DataFrame(
        {
            "start_time": pd.date_range(
                "2023-01-01", periods=100, freq="H"
            ).tz_localize("Europe/Paris"),
            "end_time": pd.date_range(
                "2023-01-01 01:00", periods=100, freq="H"
            ).tz_localize("Europe/Paris"),
            "asset": ["Asset " + str(i % 5) for i in range(100)],
            "load": [float(np.random.uniform(40, 100)) for _ in range(100)],  # noqa: NPY002
        }
    )

    # Set most 'load' values for 'Asset 0' to NaN or None
    asset_0_indices = data[data["asset"] == "Asset 0"].index
    indices_to_nullify = random.sample(
        list(asset_0_indices), k=len(asset_0_indices) - 2
    )  # Keep only 2 non-NaN

    data.loc[indices_to_nullify, "load"] = np.nan  # Set
    data = data.reset_index()

    # Placeholder functions for processing and legend (you would replace these with actual logic)
    def chronogram_legend(target, pump_toggle):
        return "Legend", "Short Legend", "Other Info"

    def chronogram_processing(chronogram, timedelta, filter_load):
        return chronogram  # Simply returns the input data in this dummy example

    def custom_blues():
        return ["low", "medium", "high"], ["#dceefb", "#86c7f3", "#1f77b4"]

    def get_assets_starts_and_stops(chronogram, timedelta, separator_dt):
        # Simple dummy start/stop markers within the range
        starts_and_stops = (
            alt.Chart(chronogram).mark_rule(color="red").encode(x="start_time:T")
        )
        starts_and_stops_texts = (
            alt.Chart(chronogram)
            .mark_text(align="left", dx=5, dy=-5, color="red")
            .encode(x="start_time:T", text=alt.value("Start/Stop"))
        )
        return starts_and_stops, starts_and_stops_texts

    def get_vertical_separator(separator_dt, labels_y, y_field):
        return None, None  # Placeholder for the actual function output

    # Main function with dummy data and simplified inputs
    def plot_chronogram(
        data: pd.DataFrame,
        formatted=".0f",
        target="load",
        timedelta="60min",
        filter_load=True,
        expand: bool = False,
        pump_toggle: bool = False,
        display_starts_and_stops: bool = False,
        separator_dt: pd.Timestamp = None,
    ):
        # Get legend information
        legend, short_legend, _ = chronogram_legend(
            target=target, pump_toggle=pump_toggle
        )

        # Process the data (dummy in this case)
        # st.write(data.dtypes)

        # Example of expanding the time (dummy logic here)
        if expand:
            data = data.set_index("start_time").sort_index().reset_index()
            data.loc[28:, "end_time"] = data.loc[28:, "end_time"] + pd.Timedelta("45T")

        # Set up the color scale (dummy logic here)
        if target == "pressure":
            bins, colors = custom_blues()
            scale = alt.Scale(domain=bins, range=colors, type="ordinal")
        elif target == "load":
            scale = alt.Scale(
                domain=[0, 50, 100],
                range=["#f7fbff", "#6baed6", "#08306b"],
                type="threshold",
            )
        else:
            scale = alt.Scale(scheme="blues")

        # Define the sorting order for the y-axis
        sort_order = [""] + data["asset"].sort_values().unique().tolist()  # noqa: RUF005

        # Main bar chart
        chart = (
            alt.Chart(data)
            .mark_bar()
            .encode(
                x=alt.X("start_time:T", title="Horizon Temporel"),
                x2=alt.X2("end_time:T", title=""),
                y=alt.Y(
                    "asset:N", title="Utilisation: Groupes ou AFC", sort=sort_order
                ),
                color=alt.Color(
                    "load:Q",
                    title=short_legend,
                    scale=scale,
                    legend=alt.Legend(title=legend),
                ),
                stroke=alt.value("white"),
                strokeWidth=alt.value(2),
                tooltip=[
                    alt.Tooltip("start_time:T", format="%Y-%m-%d", title="Date"),
                    alt.Tooltip("start_time:T", format="%H:%M", title="Heure"),
                    alt.Tooltip("load:Q", format=formatted, title=legend),
                ],
            )
        ).properties(
            title=f"Chronogramme d'opération: {legend}",
            width=1100,
            height=350,
        )

        # Text overlay layer
        text = (
            alt.Chart(data)
            .mark_text(dx=0, dy=0, color="white", fontSize=25)
            .encode(
                x=alt.X("mid_time:T", title=""),
                y=alt.Y("asset:N", title="Utilisation: Groupes ou AFC", sort=None),
                text=alt.Text("load:Q", format=formatted),
                tooltip=[
                    alt.Tooltip("start_time:T", format="%Y-%m-%d", title="Date"),
                    alt.Tooltip("start_time:T", format="%H:%M", title="Heure"),
                    alt.Tooltip("load:Q", format=formatted, title=legend),
                ],
            )
        ).transform_calculate(
            mid_time="datum.start_time + (datum.end_time - datum.start_time)/2"
        )

        # Additional text layer for a specific condition
        text_hot = (
            alt.Chart(data)
            .mark_text(dx=0, dy=0, color="white", fontSize=25)
            .encode(
                x=alt.X("mid_time:T", title=""),
                y=alt.Y("asset:N", title="Utilisation: Groupes ou AFC", sort=None),
                text=alt.value("Chaud"),
                tooltip=alt.value(None),
            )
        ).transform_calculate(
            mid_time="datum.start_time + (datum.end_time - datum.start_time)/2"
        )

        # Store all charts to be layered
        all_charts = [chart, text, text_hot]

        # Example of including start and stop markers (dummy logic here)
        if display_starts_and_stops:
            starts_and_stops, starts_and_stops_texts = get_assets_starts_and_stops(
                chronogram=data,
                timedelta=timedelta,
                separator_dt=separator_dt,
            )
            all_charts += [starts_and_stops, starts_and_stops_texts]

        # Example of adding a vertical separator (dummy logic here)
        display_separator = separator_dt is not None and separator_dt > data.index.min()
        if display_separator:
            separator, separator_labels = get_vertical_separator(
                separator_dt=separator_dt, labels_y="", y_field="asset"
            )
            all_charts += [separator, separator_labels]

        # Combine all chart layers
        composed = (
            functools.reduce(lambda a, b: a + b, all_charts)  # noqa: FURB118
            .configure_legend(orient="right", titleOrient="right")
            .configure_axis(labelFontSize=15, titleFontSize=15)
        )

        # Display the composed chart in Streamlit
        # st.altair_chart(altair_chart=composed, use_container_width=True)
        return composed

    # Test the function with dummy data
    composed = plot_chronogram(data=data, display_starts_and_stops=True)

    # NOTE: Reaching here wouldn't be possible if the error raised
    validated = composed.to_dict(validate=True)
    # NOTE: Another error would have been raised if the spec we returned was not valid
    assert isinstance(validated, dict)

    # NOTE: These may require optional dependencies you don't have,
    # but provide more evidence of the spec produced being valid
    vega_editor_url = composed.to_url()
    assert isinstance(vega_editor_url, str)
    composed.open_editor()

If you do not have the required dependencies for Chart.open_editor, see the chart in the Vega Editor instead.

[Screenshot: the layered chart rendered in the Vega Editor]
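
If neither of those is convenient, the spec can also be checked without any optional dependencies; a quick sketch (the output filename is arbitrary):

# Neither of these needs vl-convert or a browser helper:
spec_json = composed.to_json()      # serializes the spec (validation on by default)
composed.save("chronogram.html")    # writes an HTML file that renders the chart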

Edit

A possible issue here is that streamlit 1.37.0 (https://github.com/streamlit/streamlit/releases/tag/1.37.0) was released prior to altair 5.4.0 (https://github.com/vega/altair/releases/tag/v5.4.0).
streamlit may be making assumptions about the internals of altair that no longer hold since #3444.

I'm not familiar with streamlit, but one of its modules may be altering the dictionary while Altair is iterating over it.
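
To illustrate that hypothesis (a sketch only, not streamlit's actual code): a second thread mutating a mapping while the main thread iterates over it fails with the same error, and only some of the time, which would also explain why such a failure can be intermittent.

import threading
import time

shared = {f"k{i}": i for i in range(100_000)}

def downstream_mutator():
    # stands in for downstream code touching the dict Altair is iterating over
    for i in range(10_000):
        shared[f"extra{i}"] = i
        time.sleep(0)

t = threading.Thread(target=downstream_mutator)
t.start()
try:
    {k: v for k, v in shared.items()}  # like the dictcomp in infer_encoding_types
except RuntimeError as exc:
    print(exc)  # dictionary changed size during iteration (only sometimes)
t.join()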

@dangotbanned
Member

Closing as it appears to be a downstream issue in streamlit.

@gaspardc-met please feel free to comment if you feel I've made a mistake in this assessment.
