"TypeError: data type not understood" with dtype: period[M] #1931

Open
tehfink opened this issue Jan 25, 2020 · 7 comments

@tehfink

tehfink commented Jan 25, 2020

Thanks for a great project! With these versions:

  • altair: 4.0.1
  • pandas: 0.25.3
  • numpy: 1.18.1
  • Python: 3.8.1

I get the following error when passing a dataframe with a period column to Altair: TypeError: data type not understood

The error disappears when the 'Sale month' column is not passed to altair.

sale['Sale month'] = pd.to_datetime(sale['Sale month'], format='%b-%y').dt.to_period('M')

sale['Sale month']
0        2011-07
1        2011-08
2        2011-09
3        2011-10
4        2011-11
          ...   
15845    2013-11
15846    2013-12
15847    2014-01
15848    2014-02
15849    2014-03
Name: Sale month, Length: 1173, dtype: period[M]

alt.Chart(sale[['Value', 'Sale month']].head()).mark_bar().encode(
)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/Documents//virtualenv/lib/python3.8/site-packages/altair/vegalite/v4/api.py in to_dict(self, *args, **kwargs)
    353         copy = self.copy(deep=False)
    354         original_data = getattr(copy, 'data', Undefined)
--> 355         copy.data = _prepare_data(original_data, context)
    356 
    357         if original_data is not Undefined:

~/Documents//virtualenv/lib/python3.8/site-packages/altair/vegalite/v4/api.py in _prepare_data(data, context)
     82     # convert dataframes  or objects with __geo_interface__ to dict
     83     if isinstance(data, pd.DataFrame) or hasattr(data, '__geo_interface__'):
---> 84         data = pipe(data, data_transformers.get())
     85 
     86     # convert string input to a URLData

~/Documents//virtualenv/lib/python3.8/site-packages/toolz/functoolz.py in pipe(data, *funcs)
    632     """
    633     for func in funcs:
--> 634         data = func(data)
    635     return data
    636 

~/Documents/…/virtualenv/lib/python3.8/site-packages/toolz/functoolz.py in __call__(self, *args, **kwargs)
    301     def __call__(self, *args, **kwargs):
    302         try:
--> 303             return self._partial(*args, **kwargs)
    304         except TypeError as exc:
    305             if self._should_curry(args, kwargs, exc):

~/Documents/…/virtualenv/lib/python3.8/site-packages/altair/vegalite/data.py in default_data_transformer(data, max_rows)
     11 @curry
     12 def default_data_transformer(data, max_rows=5000):
---> 13     return pipe(data, limit_rows(max_rows=max_rows), to_values)
     14 
     15 

~/Documents/…/virtualenv/lib/python3.8/site-packages/toolz/functoolz.py in pipe(data, *funcs)
    632     """
    633     for func in funcs:
--> 634         data = func(data)
    635     return data
    636 

~/Documents//virtualenv/lib/python3.8/site-packages/toolz/functoolz.py in __call__(self, *args, **kwargs)
    301     def __call__(self, *args, **kwargs):
    302         try:
--> 303             return self._partial(*args, **kwargs)
    304         except TypeError as exc:
    305             if self._should_curry(args, kwargs, exc):

~/Documents//virtualenv/lib/python3.8/site-packages/altair/utils/data.py in to_values(data)
    138         return {'values': data}
    139     elif isinstance(data, pd.DataFrame):
--> 140         data = sanitize_dataframe(data)
    141         return {'values': data.to_dict(orient='records')}
    142     elif isinstance(data, dict):

~/Documents//virtualenv/lib/python3.8/site-packages/altair/utils/core.py in sanitize_dataframe(df)
    221             # otherwise it will give an error on np.issubdtype(dtype, np.integer)
    222             continue
--> 223         elif np.issubdtype(dtype, np.integer):
    224             # convert integers to objects; np.int is not JSON serializable
    225             df[col_name] = df[col_name].astype(object)

~/Documents//virtualenv/lib/python3.8/site-packages/numpy/core/numerictypes.py in issubdtype(arg1, arg2)
    391     """
    392     if not issubclass_(arg1, generic):
--> 393         arg1 = dtype(arg1).type
    394     if not issubclass_(arg2, generic):
    395         arg2_orig = arg2

TypeError: data type not understood
@jakevdp
Collaborator

jakevdp commented Jan 25, 2020

The sanitization code needs to be updated to recognize period types. That said, Vega-Lite has no data type representing time periods, so I think the best possible outcome would probably be a more informative error.
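
A minimal sketch of the "more informative error" idea, assuming a hypothetical pre-check that runs before the dataframe is sanitized. The helper names `find_period_columns` and `raise_on_period_columns` are illustrative and not part of Altair; the only library feature relied on is `pd.PeriodDtype`.

```python
import pandas as pd


def find_period_columns(df):
    """Return the names of columns whose dtype is a pandas PeriodDtype."""
    return [
        name
        for name, dtype in df.dtypes.items()
        if isinstance(dtype, pd.PeriodDtype)
    ]


def raise_on_period_columns(df):
    """Hypothetical pre-check: fail early with a readable message.

    This is an assumption about what a clearer error could look like,
    not Altair's actual behavior.
    """
    period_cols = find_period_columns(df)
    if period_cols:
        raise ValueError(
            "Columns with period dtype cannot be serialized for Vega-Lite: "
            f"{period_cols!r}. Convert them to timestamps or strings first."
        )
    return df


sale = pd.DataFrame(
    {
        "Value": [10, 20],
        "Sale month": pd.period_range("2011-07", periods=2, freq="M"),
    }
)

try:
    raise_on_period_columns(sale)
except ValueError as exc:
    message = str(exc)
    print(message)
```

With a check like this, the user sees which column is the problem instead of the low-level NumPy message from the traceback above.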

@jdewanwala

Hi Jake, any idea of when this issue will be resolved? Still getting the TypeError: data type not understood error when PeriodIndex is present in the dataframe. Thank you!

@jakevdp
Collaborator

jakevdp commented Jun 14, 2020

It will be resolved when someone fixes it. Are you interested?

@jakevdp
Collaborator

jakevdp commented Jun 14, 2020

The relevant code that would need to be modified is here: https://github.com/altair-viz/altair/blob/03f37ca8b2701bf8d9e61646cacf8eee2edcad87/altair/utils/core.py#L243-L350

That said, it's unclear how best to represent time periods in Vega-Lite, which doesn't have the concept of time periods. I suppose it could be represented as timestamps with respect to some standard start time, such as 2012-01-01 00:00:00.
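
One way to read the timestamp idea, sketched on the user side rather than inside Altair: convert each period to the timestamp at which it starts, using pandas' `Series.dt.to_timestamp()`. The helper name `periods_to_timestamps` and the sample data are assumptions for illustration; choosing the period start (rather than its end) is also an assumption.

```python
import pandas as pd


def periods_to_timestamps(df):
    """Return a copy of df with every period column converted to timestamps.

    Each period is mapped to the timestamp at which it starts
    (the default behavior of to_timestamp).
    """
    out = df.copy()
    for name, dtype in df.dtypes.items():
        if isinstance(dtype, pd.PeriodDtype):
            out[name] = out[name].dt.to_timestamp()
    return out


sale = pd.DataFrame(
    {
        "Value": [10, 20, 30],
        "Sale month": pd.period_range("2011-07", periods=3, freq="M"),
    }
)

sale_ts = periods_to_timestamps(sale)
print(sale_ts.dtypes)
```

The converted frame holds an ordinary datetime column, which is a dtype the sanitization code in the traceback already handles; the original frame is left unchanged.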

@jakevdp
Collaborator

jakevdp commented Jun 14, 2020

Quick question on this: how would you expect your period columns to be represented within the Vega-Lite chart? Should it be a timestamp? A string? Silently ignored?
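
For comparison, here is a hedged sketch of the string and "ignore" options from the question above, again done by the user before the data reaches Altair. Both helper names are hypothetical; the string form relies on pandas rendering a monthly period as text such as `2011-07`.

```python
import pandas as pd


def periods_to_strings(df):
    """Return a copy of df with period columns rendered as strings."""
    out = df.copy()
    for name, dtype in df.dtypes.items():
        if isinstance(dtype, pd.PeriodDtype):
            out[name] = out[name].astype(str)
    return out


def drop_period_columns(df):
    """Return df without its period columns (the 'ignored' option)."""
    keep = [
        name
        for name, dtype in df.dtypes.items()
        if not isinstance(dtype, pd.PeriodDtype)
    ]
    return df[keep]


sale = pd.DataFrame(
    {
        "Value": [10, 20],
        "Sale month": pd.period_range("2011-07", periods=2, freq="M"),
    }
)

as_strings = periods_to_strings(sale)
without_periods = drop_period_columns(sale)
print(list(as_strings["Sale month"]))
print(list(without_periods.columns))
```

Strings keep the human-readable label but lose temporal ordering semantics in the chart, while dropping the column loses the information entirely, which is why neither is an obvious default.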

@jdewanwala

Hi Jake, "I suppose it could be represented as timestamps with respect to some standard start time, such as 2012-01-01 00:00:00" - that would be my thought.

Could the sanitization code be used to convert periods into timestamps before they get passed to Vega-Lite?

And if, say, the PeriodIndex has a frequency of "M" (monthly), perhaps the Altair code could implicitly convert it to 'month(period_column)'? I'm not sure what the best way to do this second part is. Alternatively, the easy way out may be for the user to explicitly specify 'month(period_column)', knowing that period_column has been cast to a timestamp through sanitization.

Let me know your thoughts. Thank you.

@jakevdp
Collaborator

jakevdp commented Jun 15, 2020

We don't have any mechanism at the moment for data sanitization to affect the encoding specification, and I think we should probably avoid making those kinds of assumptions for the user. Silent changes of defaults that are convenient in some situations often lead to a lot of confusion in others.
