[BUG-REPORT] Dataframes with no columns raise errors for various operations #2094

Open
honno opened this issue Jun 22, 2022 · 0 comments

I'm able to create a dataframe with zero columns, but displaying it raises the following error:

>>> import vaex
>>> df = vaex.from_dict({})
>>> df
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../vaex/packages/vaex-core/vaex/dataframe.py", line 4221, in __repr__
    return self._head_and_tail_table(format='plain')
  File ".../vaex/packages/vaex-core/vaex/dataframe.py", line 3961, in _head_and_tail_table
    if N <= n:
TypeError: '<=' not supported between instances of 'NoneType' and 'int'

I'm not too familiar with Vaex, but I imagine this kind of bug, where code assumes at least one column, will pop up in various operations. For example, df.concat(df) raises too, although maybe that operation is nonsensical in the first place (interestingly, pandas.concat([pd.DataFrame({}), pd.DataFrame({})]) works).
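For comparison, here is a minimal sketch of the pandas behaviour mentioned above. It only uses pandas itself and shows that a zero-column dataframe can be created, displayed, and concatenated without errors; the variable names are just for illustration.

```python
import pandas as pd

# A dataframe with no columns (and therefore no rows).
empty = pd.DataFrame({})

# Displaying it works instead of raising.
print(repr(empty))

# Concatenating two zero-column dataframes also works.
result = pd.concat([empty, empty])
print(result.shape)  # (0, 0)
```

Whether Vaex should match this behaviour or reject such dataframes outright is the open question here.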

Also, such dataframes cannot interoperate with the dataframe interchange protocol implementation from pandas-dev/pandas#46141:

>>> from pandas.api.exchange import from_dataframe
>>> from_dataframe(df)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../pandas/core/exchange/from_dataframe.py", line 57, in from_dataframe
    return _from_dataframe(df.__dataframe__(allow_copy=allow_copy))
  File ".../pandas/core/exchange/from_dataframe.py", line 77, in _from_dataframe
    for chunk in df.get_chunks():
  File ".../vaex/packages/vaex-core/vaex/dataframe_protocol.py", line 750, in get_chunks
    n_chunks = n_chunks if n_chunks is not None else self.num_chunks()
  File ".../vaex/packages/vaex-core/vaex/dataframe_protocol.py", line 712, in num_chunks
    if isinstance(self.get_column(0)._col.values, pa.ChunkedArray):
  File ".../vaex/packages/vaex-core/vaex/dataframe_protocol.py", line 721, in get_column
    return _VaexColumn(self._df[:, i], allow_copy=self._allow_copy)
  File ".../vaex/packages/vaex-core/vaex/dataframe.py", line 5355, in __getitem__
    df = df[item[0]]
  File ".../vaex/packages/vaex-core/vaex/dataframe.py", line 5371, in __getitem__
    stop = stop or len(self)
TypeError: 'NoneType' object cannot be interpreted as an integer

I searched around and couldn't figure out whether such dataframes are even supported by Vaex in the first place. I have no use case for them myself heh, it's just that such dataframes are valid in other dataframe libraries (like pandas). If they're not supported, constructors should possibly raise ValueError when asked to create a zero-column dataframe.
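If rejecting them is the preferred route, the check could look roughly like the sketch below. Note that `ensure_has_columns` is a hypothetical helper I made up for illustration, not an existing Vaex function, and where such a check would actually live inside Vaex is an assumption on my part.

```python
def ensure_has_columns(arrays):
    """Hypothetical guard: reject a column mapping that has no columns."""
    if len(arrays) == 0:
        raise ValueError("cannot create a dataframe with zero columns")
    return arrays


# Assumed usage: validate the input before handing it to a constructor,
# e.g. vaex.from_dict(ensure_has_columns(data)).
try:
    ensure_has_columns({})
except ValueError as exc:
    print(f"rejected: {exc}")
```

That way the failure happens at construction time with a clear message, rather than later as a TypeError in `__repr__` or `get_chunks`.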

Vaex was built locally from source (upstream master) on Ubuntu 20.04.
