
[BUG-REPORT] Issue converting object to string #1686

Open
grafail opened this issue Nov 8, 2021 · 4 comments

Comments

@grafail

grafail commented Nov 8, 2021

Description
Trying to convert an "object" column to string fails.

Example code:

import vaex
import numpy as np

if __name__ == "__main__":
    arr = np.array([123, "test", None], dtype=object)
    df = vaex.from_arrays(test=arr)
    df['test'] = df['test'].astype('str')
    print(df.head())

Exception:

TypeError: to_string(): incompatible function arguments. The following argument types are supported:
    1. (arg0: numpy.ndarray[float32]) -> vaex.superstrings.StringList64
    2. (arg0: numpy.ndarray[float64]) -> vaex.superstrings.StringList64
    3. (arg0: numpy.ndarray[int64]) -> vaex.superstrings.StringList64
    4. (arg0: numpy.ndarray[int32]) -> vaex.superstrings.StringList64
    5. (arg0: numpy.ndarray[int16]) -> vaex.superstrings.StringList64
    6. (arg0: numpy.ndarray[int8]) -> vaex.superstrings.StringList64
    7. (arg0: numpy.ndarray[uint64]) -> vaex.superstrings.StringList64
    8. (arg0: numpy.ndarray[uint32]) -> vaex.superstrings.StringList64
    9. (arg0: numpy.ndarray[uint16]) -> vaex.superstrings.StringList64
    10. (arg0: numpy.ndarray[uint8]) -> vaex.superstrings.StringList64
    11. (arg0: numpy.ndarray[bool]) -> vaex.superstrings.StringList64
    12. (arg0: numpy.ndarray[float32], arg1: numpy.ndarray[bool]) -> vaex.superstrings.StringList64
    13. (arg0: numpy.ndarray[float64], arg1: numpy.ndarray[bool]) -> vaex.superstrings.StringList64
    14. (arg0: numpy.ndarray[int64], arg1: numpy.ndarray[bool]) -> vaex.superstrings.StringList64
    15. (arg0: numpy.ndarray[int32], arg1: numpy.ndarray[bool]) -> vaex.superstrings.StringList64
    16. (arg0: numpy.ndarray[int16], arg1: numpy.ndarray[bool]) -> vaex.superstrings.StringList64
    17. (arg0: numpy.ndarray[int8], arg1: numpy.ndarray[bool]) -> vaex.superstrings.StringList64
    18. (arg0: numpy.ndarray[uint64], arg1: numpy.ndarray[bool]) -> vaex.superstrings.StringList64
    19. (arg0: numpy.ndarray[uint32], arg1: numpy.ndarray[bool]) -> vaex.superstrings.StringList64
    20. (arg0: numpy.ndarray[uint16], arg1: numpy.ndarray[bool]) -> vaex.superstrings.StringList64
    21. (arg0: numpy.ndarray[uint8], arg1: numpy.ndarray[bool]) -> vaex.superstrings.StringList64
    22. (arg0: numpy.ndarray[bool], arg1: numpy.ndarray[bool]) -> vaex.superstrings.StringList64

Invoked with: array([123], dtype=object)

Process finished with exit code 0

Software information

  • Vaex version (import vaex; vaex.__version__):
{'vaex': '4.5.0', 'vaex-core': '4.6.0a5', 'vaex-viz': '0.5.0', 'vaex-hdf5': '0.10.0', 'vaex-server': '0.6.1', 'vaex-astro': '0.9.0', 'vaex-jupyter': '0.6.0', 'vaex-ml': '0.14.0', 'vaex-arrow': '0.4.2'}
  • Vaex was installed via: pip
  • OS: Ubuntu 20.04

Additional information
This makes it impossible to export to HDF5 or to run any other operation that does not support the object type. This is especially blocking when loading CSV files.
Should be relevant to #1568.

@JovanVeljanoski
Member

JovanVeljanoski commented Nov 9, 2021

Hi,

Vaex does not officially support object columns or columns of mixed types. Please cast the data to a single type before passing it to vaex. This is similar to how a database behaves.

Reading CSVs should work; the behavior there is a bit different. If you encounter an issue, please make a small CSV file that reproduces it.
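Following the suggestion above to cast the data to a single type before handing it to vaex, here is a minimal sketch in plain Python/NumPy. The helper names `object_to_str_list` and `clean_object_columns` are hypothetical, not part of vaex. The design choice here is to keep `None` as a missing value instead of turning it into the literal string "None".

```python
import numpy as np
from typing import List, Optional


def object_to_str_list(arr: np.ndarray) -> List[Optional[str]]:
    """Hypothetical helper: cast a dtype=object array to str values, keeping None as missing."""
    # str() every non-missing value so the column has a single type (string);
    # None is preserved rather than becoming the literal string "None".
    return [None if value is None else str(value) for value in arr]


def clean_object_columns(columns: dict) -> dict:
    """Hypothetical helper: return a copy where every dtype=object NumPy array is cast to strings."""
    cleaned = {}
    for name, values in columns.items():
        if isinstance(values, np.ndarray) and values.dtype == object:
            cleaned[name] = object_to_str_list(values)
        else:
            # Columns that already have a concrete dtype are passed through unchanged.
            cleaned[name] = values
    return cleaned


arr = np.array([123, "test", None], dtype=object)
cleaned = clean_object_columns({"test": arr, "x": np.arange(3)})
print(cleaned["test"])  # ['123', 'test', None]
```

Assuming pyarrow is installed, the cleaned list could then be wrapped with `pa.array(cleaned["test"], type=pa.string())` and passed to `vaex.from_arrays(test=...)`; that last step assumes a vaex 4.x install that accepts Arrow arrays.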

@maartenbreddels
Member

I do think this would be a nice feature. But since dtype=object data is in memory anyway, I think it's easy to do this beforehand. It's unlikely to be big data, nor will it work multithreaded.

Thanks for the report. Let's keep this open as a low-priority feature for now!

@bls-lehoai

bls-lehoai commented Dec 13, 2022

@maartenbreddels
Same issue here. I tried to left join two tables (so some cells contain None) where the join key is a calculated string column, but the result was automatically converted to the object type and can't be exported to an HDF5 file.
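The vaex join itself is not reproduced here. This sketch only shows, in plain NumPy, how a list of strings with `None` for unmatched rows ends up as `dtype=object`, and how such a column could be flagged before attempting an export. The `keys` data and the `is_object_column` helper are hypothetical.

```python
import numpy as np

# Hypothetical join-key column after a left join: unmatched rows have no key (None).
keys = ["a1", "b2", None]

# A mix of str and None has no common fixed-width dtype, so NumPy falls back to object.
inferred = np.array(keys)
print(inferred.dtype)  # object


def is_object_column(values: np.ndarray) -> bool:
    """Hypothetical check: True if the array uses dtype=object (kind 'O')."""
    return values.dtype.kind == "O"


print(is_object_column(inferred))                 # True
print(is_object_column(np.array(["a1", "b2"])))   # False
```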

@maartenbreddels
Member

Can you provide a reproducible example? Also, I think this is a different issue, right?
