Plotting a DataFrame with duplicate columns should produce a more informative error #3181

Closed
schlich opened this issue May 6, 2021 · 9 comments
Labels
feature something new P3 backlog

Comments

@schlich

schlich commented May 6, 2021

Some quirks in the pandas API allow a DataFrame to have duplicate column names:

import pandas as pd

df = pd.DataFrame.from_records(
    {"Variable1": 1, "Variable2": 2}, index=[1]
).rename(columns={"Variable2": "Variable1"})

   Variable1  Variable1
1          1          2

Attempting to plot this results in an uninformative ValueError:

>> px.scatter(df)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-95-d52a0e44708e> in <module>
      1 import plotly.express as px
      2 df=pd.DataFrame.from_records({"Variable1": "Value", "Variable2": "Value"}, index=[1]).rename(columns={"Variable2": "Variable1"})
----> 3 px.scatter(df)
      4 # px.scatter(
      5 #     df,

~/.cache/pypoetry/virtualenvs/psychoanalyze-zEU_a_jI-py3.9/lib/python3.9/site-packages/plotly/express/_chart_types.py in scatter(data_frame, x, y, color, symbol, size, hover_name, hover_data, custom_data, text, facet_row, facet_col, facet_col_wrap, facet_row_spacing, facet_col_spacing, error_x, error_x_minus, error_y, error_y_minus, animation_frame, animation_group, category_orders, labels, orientation, color_discrete_sequence, color_discrete_map, color_continuous_scale, range_color, color_continuous_midpoint, symbol_sequence, symbol_map, opacity, size_max, marginal_x, marginal_y, trendline, trendline_color_override, log_x, log_y, range_x, range_y, render_mode, title, template, width, height)
     62     mark in 2D space.
     63     """
---> 64     return make_figure(args=locals(), constructor=go.Scatter)
     65 
     66 

~/.cache/pypoetry/virtualenvs/psychoanalyze-zEU_a_jI-py3.9/lib/python3.9/site-packages/plotly/express/_core.py in make_figure(args, constructor, trace_patch, layout_patch)
   1859     apply_default_cascade(args)
   1860 
-> 1861     args = build_dataframe(args, constructor)
   1862     if constructor in [go.Treemap, go.Sunburst] and args["path"] is not None:
   1863         args = process_dataframe_hierarchy(args)

~/.cache/pypoetry/virtualenvs/psychoanalyze-zEU_a_jI-py3.9/lib/python3.9/site-packages/plotly/express/_core.py in build_dataframe(args, constructor)
   1375     # now that things have been prepped, we do the systematic rewriting of `args`
   1376 
-> 1377     df_output, wide_id_vars = process_args_into_dataframe(
   1378         args, wide_mode, var_name, value_name
   1379     )

~/.cache/pypoetry/virtualenvs/psychoanalyze-zEU_a_jI-py3.9/lib/python3.9/site-packages/plotly/express/_core.py in process_args_into_dataframe(args, wide_mode, var_name, value_name)
   1196                 else:
   1197                     col_name = str(argument)
-> 1198                     df_output[col_name] = to_unindexed_series(df_input[argument])
   1199             # ----------------- argument is likely a column / array / list.... -------
   1200             else:

~/.cache/pypoetry/virtualenvs/psychoanalyze-zEU_a_jI-py3.9/lib/python3.9/site-packages/plotly/express/_core.py in to_unindexed_series(x)
   1047     required to get things to match up right in the new DataFrame we're building
   1048     """
-> 1049     return pd.Series(x).reset_index(drop=True)
   1050 
   1051 

~/.cache/pypoetry/virtualenvs/psychoanalyze-zEU_a_jI-py3.9/lib/python3.9/site-packages/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    266             name = ibase.maybe_extract_name(name, data, type(self))
    267 
--> 268             if is_empty_data(data) and dtype is None:
    269                 # gh-17261
    270                 warnings.warn(

~/.cache/pypoetry/virtualenvs/psychoanalyze-zEU_a_jI-py3.9/lib/python3.9/site-packages/pandas/core/construction.py in is_empty_data(data)
    626     is_none = data is None
    627     is_list_like_without_dtype = is_list_like(data) and not hasattr(data, "dtype")
--> 628     is_simple_empty = is_list_like_without_dtype and not data
    629     return is_none or is_simple_empty
    630 

~/.cache/pypoetry/virtualenvs/psychoanalyze-zEU_a_jI-py3.9/lib/python3.9/site-packages/pandas/core/generic.py in __nonzero__(self)
   1440     @final
   1441     def __nonzero__(self):
-> 1442         raise ValueError(
   1443             f"The truth value of a {type(self).__name__} is ambiguous. "
   1444             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
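
For anyone landing here from the traceback: the error appears to come from the fact that selecting a duplicated label returns a DataFrame rather than a Series, and pandas then truth-tests that DataFrame. Below is a minimal sketch of that mechanism, independent of plotly. The frame is built directly with a duplicated label, which is assumed to be equivalent to the rename above, and the exact wording of the message may differ between pandas versions.

```python
import pandas as pd

# Build a frame with a duplicated column label directly
# (assumed equivalent to the rename in the report).
df = pd.DataFrame([[1, 2]], columns=["Variable1", "Variable1"], index=[1])

# Selecting a duplicated label returns a DataFrame, not a Series.
selected = df["Variable1"]
print(type(selected).__name__)

# Truth-testing a DataFrame raises the ValueError seen in the traceback.
message = None
try:
    bool(selected)
except ValueError as exc:
    message = str(exc)
    print(message)
```

So the message is accurate about what pandas saw, but it says nothing about the duplicated column name that caused it.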
@nicolaskruchten
Contributor

Indeed, we need a better error message here! Sorry if it took you a while to figure out what was happening :) We'd love a pull request to make this clearer in the future :)

@dvfariaf-bops

+1 on this!

@kb-

kb- commented Jul 6, 2023

+1 I lost quite some time on this tonight

@PhorstenkampFuzzy

This is a super annoying bug.

@mohana-martin

+1 It's been two years.. Please fix this...
Plotly is otherwise a great plotting tool, but I must say it is very annoying to use due to the inconsistencies and very obscure errors..

@gvwilson
Contributor

Hi - we are tidying up stale issues and PRs in Plotly's public repositories so that we can focus on things that are still important to our community. Since this one has been sitting for a while, I'm going to close it; if you'd like to submit a PR, we'd be happy to prioritize a review. Thank you - @gvwilson

@gvwilson gvwilson self-assigned this Jul 11, 2024
@gvwilson gvwilson removed their assignment Aug 2, 2024
@gvwilson gvwilson added P3 backlog feature something new labels Aug 12, 2024
@MarcoGorelli
Contributor

#4790 addresses this "for free", and produces the error message:

ValueError: Expected unique column names, got:
- 'Variable1' 2 times
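
On versions without that change, a similar check can be run on the user side before plotting. This is a rough sketch only: `check_unique_columns` is a hypothetical helper, not plotly's implementation, and its message format is simply modeled on the one quoted above.

```python
from collections import Counter


def check_unique_columns(columns):
    """Raise ValueError listing every column label that appears more than once.

    Hypothetical helper for illustration; not part of plotly or pandas.
    """
    counts = Counter(columns)
    duplicates = {name: n for name, n in counts.items() if n > 1}
    if duplicates:
        lines = [f"- {name!r} {n} times" for name, n in duplicates.items()]
        raise ValueError("Expected unique column names, got:\n" + "\n".join(lines))


# Example with the column labels from the report.
error_text = None
try:
    check_unique_columns(["Variable1", "Variable1"])
except ValueError as exc:
    error_text = str(exc)
    print(error_text)
```

With a DataFrame, this would be called as `check_unique_columns(df.columns)` just before `px.scatter(df)`.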

@MarcoGorelli
Contributor

This is addressed by #4790; I think it can be closed.

@turbotimon

maybe similar to #4826


9 participants