Plotting a DataFrame with duplicate columns should produce a more informative error #3181

schlich · 2021-05-06T02:07:33Z

Some quirks in the pandas API allow a DataFrame to have duplicate column names:

import pandas as pd
import plotly.express as px

df = pd.DataFrame.from_records(
    {"Variable1": 1, "Variable2": 2}, index=[1]
).rename(columns={"Variable2": "Variable1"})

   Variable1  Variable1
1          1          2

Attempting to plot this results in an uninformative ValueError:

>>> px.scatter(df)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-95-d52a0e44708e> in <module>
      1 import plotly.express as px
      2 df=pd.DataFrame.from_records({"Variable1": "Value", "Variable2": "Value"}, index=[1]).rename(columns={"Variable2": "Variable1"})
----> 3 px.scatter(df)
      4 # px.scatter(
      5 #     df,

~/.cache/pypoetry/virtualenvs/psychoanalyze-zEU_a_jI-py3.9/lib/python3.9/site-packages/plotly/express/_chart_types.py in scatter(data_frame, x, y, color, symbol, size, hover_name, hover_data, custom_data, text, facet_row, facet_col, facet_col_wrap, facet_row_spacing, facet_col_spacing, error_x, error_x_minus, error_y, error_y_minus, animation_frame, animation_group, category_orders, labels, orientation, color_discrete_sequence, color_discrete_map, color_continuous_scale, range_color, color_continuous_midpoint, symbol_sequence, symbol_map, opacity, size_max, marginal_x, marginal_y, trendline, trendline_color_override, log_x, log_y, range_x, range_y, render_mode, title, template, width, height)
     62     mark in 2D space.
     63     """
---> 64     return make_figure(args=locals(), constructor=go.Scatter)
     65 
     66 

~/.cache/pypoetry/virtualenvs/psychoanalyze-zEU_a_jI-py3.9/lib/python3.9/site-packages/plotly/express/_core.py in make_figure(args, constructor, trace_patch, layout_patch)
   1859     apply_default_cascade(args)
   1860 
-> 1861     args = build_dataframe(args, constructor)
   1862     if constructor in [go.Treemap, go.Sunburst] and args["path"] is not None:
   1863         args = process_dataframe_hierarchy(args)

~/.cache/pypoetry/virtualenvs/psychoanalyze-zEU_a_jI-py3.9/lib/python3.9/site-packages/plotly/express/_core.py in build_dataframe(args, constructor)
   1375     # now that things have been prepped, we do the systematic rewriting of `args`
   1376 
-> 1377     df_output, wide_id_vars = process_args_into_dataframe(
   1378         args, wide_mode, var_name, value_name
   1379     )

~/.cache/pypoetry/virtualenvs/psychoanalyze-zEU_a_jI-py3.9/lib/python3.9/site-packages/plotly/express/_core.py in process_args_into_dataframe(args, wide_mode, var_name, value_name)
   1196                 else:
   1197                     col_name = str(argument)
-> 1198                     df_output[col_name] = to_unindexed_series(df_input[argument])
   1199             # ----------------- argument is likely a column / array / list.... -------
   1200             else:

~/.cache/pypoetry/virtualenvs/psychoanalyze-zEU_a_jI-py3.9/lib/python3.9/site-packages/plotly/express/_core.py in to_unindexed_series(x)
   1047     required to get things to match up right in the new DataFrame we're building
   1048     """
-> 1049     return pd.Series(x).reset_index(drop=True)
   1050 
   1051 

~/.cache/pypoetry/virtualenvs/psychoanalyze-zEU_a_jI-py3.9/lib/python3.9/site-packages/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    266             name = ibase.maybe_extract_name(name, data, type(self))
    267 
--> 268             if is_empty_data(data) and dtype is None:
    269                 # gh-17261
    270                 warnings.warn(

~/.cache/pypoetry/virtualenvs/psychoanalyze-zEU_a_jI-py3.9/lib/python3.9/site-packages/pandas/core/construction.py in is_empty_data(data)
    626     is_none = data is None
    627     is_list_like_without_dtype = is_list_like(data) and not hasattr(data, "dtype")
--> 628     is_simple_empty = is_list_like_without_dtype and not data
    629     return is_none or is_simple_empty
    630 

~/.cache/pypoetry/virtualenvs/psychoanalyze-zEU_a_jI-py3.9/lib/python3.9/site-packages/pandas/core/generic.py in __nonzero__(self)
   1440     @final
   1441     def __nonzero__(self):
-> 1442         raise ValueError(
   1443             f"The truth value of a {type(self).__name__} is ambiguous. "
   1444             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
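
For anyone landing here from a search: the traceback suggests the root cause is that selecting a duplicated label (`df_input[argument]`) returns a DataFrame rather than a Series, which `to_unindexed_series` then passes to `pd.Series`. Below is a minimal sketch of that behaviour, plus one possible workaround. `dedupe_columns` is a hypothetical helper written for illustration; it is not part of pandas or Plotly, and it assumes the suffixed names do not collide with existing columns.

```python
import pandas as pd

# Rebuild the frame from the report: two columns that share one name.
df = pd.DataFrame({"Variable1": [1], "Variable2": [2]}, index=[1]).rename(
    columns={"Variable2": "Variable1"}
)

# Selecting a duplicated label yields a DataFrame, not a Series.
selected = df["Variable1"]
print(type(selected).__name__)  # DataFrame

# Index.duplicated() flags every repeat of an earlier label.
print(df.columns[df.columns.duplicated()].tolist())  # ['Variable1']


def dedupe_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: suffix repeated column names with _1, _2, ..."""
    seen = {}
    new_names = []
    for name in frame.columns:
        count = seen.get(name, 0)
        new_names.append(name if count == 0 else f"{name}_{count}")
        seen[name] = count + 1
    out = frame.copy()
    out.columns = new_names
    return out


clean = dedupe_columns(df)
print(clean.columns.tolist())  # ['Variable1', 'Variable1_1']
```

Passing `clean` instead of `df` to `px.scatter` should avoid the ambiguous selection, since every label then maps to exactly one column.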


nicolaskruchten · 2021-05-06T12:46:06Z

Indeed, we need a better error message here! Sorry if it took you a while to figure out what was happening :) We'd love a pull request to make this clearer in the future :)

dvfariaf-bops · 2023-01-09T10:30:43Z

+1 on this!

kb- · 2023-07-06T23:04:20Z

+1 I lost quite some time on this tonight

PhorstenkampFuzzy · 2023-10-13T12:14:38Z

This is a super annoying bug.

mohana-martin · 2023-10-16T17:06:43Z

+1 It's been two years. Please fix this.
Plotly is otherwise a great plotting tool, but I must say it is very annoying to use because of the inconsistencies and very obscure errors.

gvwilson · 2024-07-11T13:44:09Z

Hi - we are tidying up stale issues and PRs in Plotly's public repositories so that we can focus on things that are still important to our community. Since this one has been sitting for a while, I'm going to close it; if you'd like to submit a PR, we'd be happy to prioritize a review. Thank you - @gvwilson

MarcoGorelli · 2024-11-01T16:20:05Z

#4790 addresses this "for free", and produces the error message:

ValueError: Expected unique column names, got:
- 'Variable1' 2 times
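
As a rough illustration of the kind of up-front check that yields a message in this shape: the sketch below is not the code from #4790, and `check_unique_columns` is a hypothetical name. It only shows how counting column labels before any plotting logic runs can turn the ambiguous truth-value error into one that names the offending column.

```python
from collections import Counter

import pandas as pd


def check_unique_columns(frame: pd.DataFrame) -> None:
    """Hypothetical check: raise if any column name occurs more than once."""
    counts = Counter(frame.columns)
    duplicates = {name: n for name, n in counts.items() if n > 1}
    if duplicates:
        lines = "\n".join(f"- {name!r} {n} times" for name, n in duplicates.items())
        raise ValueError(f"Expected unique column names, got:\n{lines}")


# A frame with the same duplicated label as in the original report.
df = pd.DataFrame([[1, 2]], columns=["Variable1", "Variable1"], index=[1])

try:
    check_unique_columns(df)
except ValueError as exc:
    message = str(exc)
    print(message)
```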

MarcoGorelli · 2024-11-13T17:08:16Z

This is addressed by #4790; I think it can be closed.

turbotimon · 2025-02-24T16:44:27Z

maybe similar to #4826

gvwilson self-assigned this Jul 11, 2024

gvwilson removed their assignment Aug 2, 2024

gvwilson added P3 feature labels Aug 12, 2024

gvwilson closed this as completed Nov 13, 2024

turbotimon mentioned this issue Feb 24, 2025

Missing Rows/Columns in Correlation Matrix with Duplicate Column Names After Name Shortening — No Warnings Raised #4826

Open
