
Timezone-aware bug with Multi-Index #1371

Closed
2 tasks done
aboomer07 opened this issue Oct 5, 2023 · 0 comments · Fixed by #1426
Labels
bug Something isn't working

Comments

@aboomer07
Contributor

aboomer07 commented Oct 5, 2023

Describe the bug
A timezone-aware field fails when it is used as one level of a multi-index.

Using a timezone-aware field as the only Index works, but using it as one level of a multi-index does not. This looks like an old bug that was previously fixed, but only for the case where the timestamp is the sole index level (see here).

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandera.

Code Sample

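# .example() requires hypothesis: pip install pandera[strategies]
import pandas as pd
from pandera import DataFrameModel, Field
from pandera.typing import Index, Series

# A single tz-aware index level: example generation works.
class PassingModel(DataFrameModel):
    time: Index[pd.DatetimeTZDtype] = Field(
        dtype_kwargs={"unit": "ns", "tz": "UTC"}
    )
    value: Series[float] = Field()

# The same tz-aware level plus a second index level (a MultiIndex): example generation fails.
class FailingModel(DataFrameModel):
    time: Index[pd.DatetimeTZDtype] = Field(
        dtype_kwargs={"unit": "ns", "tz": "UTC"}
    )
    category: Index[str] = Field()
    value: Series[float] = Field()

PassingModel.example(size=5)  # works
FailingModel.example(size=5)  # raises TypeError (traceback below)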
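For reference, pandas itself can hold a timezone-aware level inside a MultiIndex, so the schema that FailingModel describes is satisfiable. A minimal pandas-only sketch of the kind of frame one would expect FailingModel.example to return (the dates, categories and values are made up for illustration):

```python
import pandas as pd

# Two index levels: a tz-aware "time" level and a string "category" level.
index = pd.MultiIndex.from_arrays(
    [
        pd.date_range("2023-10-05", periods=3, freq="D", tz="UTC"),
        ["a", "b", "c"],
    ],
    names=["time", "category"],
)
df = pd.DataFrame({"value": [1.0, 2.0, 3.0]}, index=index)

# The "time" level keeps its timezone inside the MultiIndex.
print(df.index.get_level_values("time").dtype)
```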

Expected behavior

Both models should generate example data. In practice, generating an example DataFrame with PassingModel works, but it fails with FailingModel. The traceback is below:

Traceback

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File [/lib/python3.10/site-packages/pandera/strategies/pandas_strategies.py:176], in strategy_import_error.<locals>._wrapper(*args, **kwargs)
    169 if not HAS_HYPOTHESIS:  # pragma: no cover
    170     raise ImportError(
    171         'Strategies for generating data requires "hypothesis" to be \n'
    172         "installed. You can install pandera together with the strategies \n"
    173         "dependencies with:\n"
    174         "pip install pandera[strategies]"
    175     )
--> 176 return fn(*args, **kwargs)

File [/lib/python3.10/site-packages/pandera/api/pandas/model.py:327], in DataFrameModel.example(cls, **kwargs)
    318 @classmethod
    319 @docstring_substitution(example_doc=DataFrameSchema.strategy.__doc__)
    320 @st.strategy_import_error
   (...)
    323     **kwargs,
    324 ) -> DataFrameBase[TDataFrameModel]:
    325     """%(example_doc)s"""
    326     return cast(
--> 327         DataFrameBase[TDataFrameModel], cls.to_schema().example(**kwargs)
    328     )

File [/lib/python3.10/site-packages/pandera/api/pandas/container.py:1391], in DataFrameSchema.example(self, size, n_regex_columns)
   1384 with warnings.catch_warnings():
   1385     warnings.simplefilter(
   1386         "ignore",
   1387         category=hypothesis.errors.NonInteractiveExampleWarning,
   1388     )
   1389     return self.strategy(
   1390         size=size, n_regex_columns=n_regex_columns
-> 1391     ).example()

File [/lib/python3.10/site-packages/hypothesis/strategies/_internal/strategies.py:337], in SearchStrategy.example(self)
    325 @given(self)
    326 @settings(
    327     database=None,
   (...)
    333 )
    334 def example_generating_inner_function(ex):
    335     self.__examples.append(ex)
--> 337 example_generating_inner_function()
    338 shuffle(self.__examples)
    339 return self.__examples.pop()

File [/lib/python3.10/site-packages/hypothesis/strategies/_internal/strategies.py:326], in SearchStrategy.example.<locals>.example_generating_inner_function()
    321 from hypothesis.core import given
    323 # Note: this function has a weird name because it might appear in
    324 # tracebacks, and we want users to know that they can ignore it.
    325 @given(self)
--> 326 @settings(
    327     database=None,
    328     max_examples=100,
    329     deadline=None,
    330     verbosity=Verbosity.quiet,
    331     phases=(Phase.generate,),
    332     suppress_health_check=list(HealthCheck),
    333 )
    334 def example_generating_inner_function(ex):
    335     self.__examples.append(ex)
    337 example_generating_inner_function()

    [... skipping hidden 1 frame]

File [/lib/python3.10/site-packages/pandera/strategies/pandas_strategies.py:1197], in dataframe_strategy.<locals>._dataframe_strategy(draw)
   1192     for check in column_checks:  # type: ignore
   1193         strategy = undefined_check_strategy(
   1194             strategy, check, column=col_name
   1195         )
-> 1197 return draw(strategy)

File [/lib/python3.10/site-packages/hypothesis/internal/conjecture/data.py:951], in ConjectureData.draw(self, strategy, label)
    949 try:
    950     if not at_top_level:
--> 951         return strategy.do_draw(self)
    952     else:
    953         assert start_time is not None

File [/lib/python3.10/site-packages/hypothesis/strategies/_internal/lazy.py:160], in LazyStrategy.do_draw(self, data)
    159 def do_draw(self, data):
--> 160     return data.draw(self.wrapped_strategy)

File [/lib/python3.10/site-packages/hypothesis/internal/conjecture/data.py:951], in ConjectureData.draw(self, strategy, label)
    949 try:
    950     if not at_top_level:
--> 951         return strategy.do_draw(self)
    952     else:
    953         assert start_time is not None

File [/lib/python3.10/site-packages/hypothesis/strategies/_internal/core.py:1539], in CompositeStrategy.do_draw(self, data)
   1538 def do_draw(self, data):
-> 1539     return self.definition(data.draw, *self.args, **self.kwargs)

File [/lib/python3.10/site-packages/pandera/strategies/pandas_strategies.py:146], in set_pandas_index(draw, df_or_series_strat, index)
    144 """Sets Index or MultiIndex object to pandas Series or DataFrame."""
    145 df_or_series = draw(df_or_series_strat)
--> 146 df_or_series.index = draw(index.strategy(size=df_or_series.shape[0]))
    147 return df_or_series

File [/lib/python3.10/site-packages/hypothesis/internal/conjecture/data.py:951], in ConjectureData.draw(self, strategy, label)
    949 try:
    950     if not at_top_level:
--> 951         return strategy.do_draw(self)
    952     else:
    953         assert start_time is not None

File [/lib/python3.10/site-packages/hypothesis/strategies/_internal/strategies.py:836], in MappedSearchStrategy.do_draw(self, data)
    834 try:
    835     data.start_example(MAPPED_SEARCH_STRATEGY_DO_DRAW_LABEL)
--> 836     x = data.draw(self.mapped_strategy)
    837     result = self.pack(x)  # type: ignore
    838     data.stop_example()

File [/lib/python3.10/site-packages/hypothesis/internal/conjecture/data.py:951], in ConjectureData.draw(self, strategy, label)
    949 try:
    950     if not at_top_level:
--> 951         return strategy.do_draw(self)
    952     else:
    953         assert start_time is not None

File [/lib/python3.10/site-packages/hypothesis/strategies/_internal/strategies.py:836], in MappedSearchStrategy.do_draw(self, data)
    834 try:
    835     data.start_example(MAPPED_SEARCH_STRATEGY_DO_DRAW_LABEL)
--> 836     x = data.draw(self.mapped_strategy)
    837     result = self.pack(x)  # type: ignore
    838     data.stop_example()

File [/lib/python3.10/site-packages/hypothesis/internal/conjecture/data.py:951], in ConjectureData.draw(self, strategy, label)
    949 try:
    950     if not at_top_level:
--> 951         return strategy.do_draw(self)
    952     else:
    953         assert start_time is not None

File [/lib/python3.10/site-packages/hypothesis/strategies/_internal/strategies.py:837], in MappedSearchStrategy.do_draw(self, data)
    835 data.start_example(MAPPED_SEARCH_STRATEGY_DO_DRAW_LABEL)
    836 x = data.draw(self.mapped_strategy)
--> 837 result = self.pack(x)  # type: ignore
    838 data.stop_example()
    839 _current_build_context.value.record_call(result, self.pack, [x], {})

File [/lib/python3.10/site-packages/pandera/strategies/pandas_strategies.py:1241] in multiindex_strategy.<locals>.<lambda>(x)
   1227 index_dtypes = {
   1228     index.name if index.name is not None else i: str(index.dtype)
   1229     for i, index in enumerate(indexes)
   1230 }
   1231 nullable_index = {
   1232     index.name if index.name is not None else i: index.nullable
   1233     for i, index in enumerate(indexes)
   1234 }
   1235 strategy = pdst.data_frames(
   1236     [index.strategy_component() for index in indexes],
   1237     index=pdst.range_indexes(
   1238         min_size=0 if size is None else size, max_size=size
   1239     ),
   1240 ).map(
-> 1241     lambda x: x.astype(index_dtypes)  # type: ignore[arg-type]
   1242 )
   1244 # this is a hack to convert np.str_ data values into native python str.
   1245 for name, dtype in index_dtypes.items():

File [/lib/python3.10/site-packages/pandas/core/generic.py:6305], in NDFrame.astype(self, dtype, copy, errors)
   6303 else:
   6304     try:
-> 6305         res_col = col.astype(dtype=cdt, copy=copy, errors=errors)
   6306     except ValueError as ex:
   6307         ex.args = (
   6308             f"{ex}: Error while type casting for column '{col_name}'",
   6309         )

File [/lib/python3.10/site-packages/pandas/core/generic.py:6324] in NDFrame.astype(self, dtype, copy, errors)
   6317     results = [
   6318         self.iloc[:, i].astype(dtype, copy=copy)
   6319         for i in range(len(self.columns))
   6320     ]
   6322 else:
   6323     # else, only a single dtype is given
-> 6324     new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
   6325     return self._constructor(new_data).__finalize__(self, method="astype")
   6327 # GH 33113: handle empty frame or series

File [/lib/python3.10/site-packages/pandas/core/internals/managers.py:451], in BaseBlockManager.astype(self, dtype, copy, errors)
    448 elif using_copy_on_write():
    449     copy = False
--> 451 return self.apply(
    452     "astype",
    453     dtype=dtype,
    454     copy=copy,
    455     errors=errors,
    456     using_cow=using_copy_on_write(),
    457 )

File [/lib/python3.10/site-packages/pandas/core/internals/managers.py:352], in BaseBlockManager.apply(self, f, align_keys, **kwargs)
    350         applied = b.apply(f, **kwargs)
    351     else:
--> 352         applied = getattr(b, f)(**kwargs)
    353     result_blocks = extend_blocks(applied, result_blocks)
    355 out = type(self).from_blocks(result_blocks, self.axes)

File [/lib/python3.10/site-packages/pandas/core/internals/blocks.py:511], in Block.astype(self, dtype, copy, errors, using_cow)
    491 """
    492 Coerce to the new dtype.
    493 
   (...)
    507 Block
    508 """
    509 values = self.values
--> 511 new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
    513 new_values = maybe_coerce_values(new_values)
    515 refs = None

File [/lib/python3.10/site-packages/pandas/core/dtypes/astype.py:242], in astype_array_safe(values, dtype, copy, errors)
    239     dtype = dtype.numpy_dtype
    241 try:
--> 242     new_values = astype_array(values, dtype, copy=copy)
    243 except (ValueError, TypeError):
    244     # e.g. _astype_nansafe can fail on object-dtype of strings
    245     #  trying to convert to float
    246     if errors == "ignore":

File [/lib/python3.10/site-packages/pandas/core/dtypes/astype.py:184], in astype_array(values, dtype, copy)
    180     return values
    182 if not isinstance(values, np.ndarray):
    183     # i.e. ExtensionArray
--> 184     values = values.astype(dtype, copy=copy)
    186 else:
    187     values = _astype_nansafe(values, dtype, copy=copy)

File [/lib/python3.10/site-packages/pandas/core/arrays/datetimes.py:656], in DatetimeArray.astype(self, dtype, copy)
    651     return super().astype(dtype, copy=copy)
    652 elif self.tz is None:
    653     # pre-2.0 this did self.tz_localize(dtype.tz), which did not match
    654     #  the Series behavior which did
    655     #  values.tz_localize("UTC").tz_convert(dtype.tz)
--> 656     raise TypeError(
    657         "Cannot use .astype to convert from timezone-naive dtype to "
    658         "timezone-aware dtype. Use obj.tz_localize instead or "
    659         "series.dt.tz_localize instead"
    660     )
    661 else:
    662     # tzaware unit conversion e.g. datetime64[s, UTC]
    663     np_dtype = np.dtype(dtype.str)

TypeError: Cannot use .astype to convert from timezone-naive dtype to timezone-aware dtype. Use obj.tz_localize instead or series.dt.tz_localize instead
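The last frames of the traceback point at the cause: multiindex_strategy draws the index levels as ordinary DataFrame columns (via hypothesis' pdst.data_frames) and then maps x.astype(index_dtypes) over them, where each target is the string form of the level's dtype (here datetime64[ns, UTC]). The drawn time column is timezone-naive, and since pandas 2.0 DatetimeArray.astype refuses a naive-to-aware cast. The sketch below reproduces this with pandas alone and shows one way a cast step could localize instead. It assumes pandas >= 2.0; cast_index_components is a hypothetical helper, not part of pandera's API and not necessarily what the fix in #1426 does.

```python
import pandas as pd
from pandas.api.types import is_datetime64_dtype, pandas_dtype


def cast_index_components(df, index_dtypes):
    """Hypothetical helper: cast each column to its target dtype, but
    localize tz-naive datetime columns instead of calling .astype on them."""
    out = df.copy()
    for name, dtype in index_dtypes.items():
        target = pandas_dtype(dtype)  # "datetime64[ns, UTC]" -> DatetimeTZDtype
        col = out[name]
        if isinstance(target, pd.DatetimeTZDtype) and is_datetime64_dtype(col.dtype):
            # Naive datetimes: attach the timezone, then match the target unit.
            out[name] = col.dt.tz_localize(target.tz).astype(target)
        else:
            out[name] = col.astype(target)
    return out


# Stand-in for what hypothesis draws for the two index levels.
naive = pd.DataFrame({
    "time": pd.to_datetime(["2023-10-05", "2023-10-06"]),
    "category": ["a", "b"],
})
dtypes = {"time": "datetime64[ns, UTC]", "category": "object"}

raised = False
try:
    naive.astype(dtypes)  # the plain cast: TypeError on pandas >= 2.0
except TypeError as exc:
    raised = True
    print(f"astype failed: {exc}")

fixed = cast_index_components(naive, dtypes)
print(fixed.dtypes)
```

Localizing keeps the drawn wall-clock values and only attaches the timezone, which is what a data generator needs here; converting from an assumed source timezone would be a different (and arbitrary) choice.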