[BUG-REPORT] tolist is much slower than to_numpy().tolist() #2325

Open
Ben-Epstein opened this issue Jan 6, 2023 · 3 comments

@Ben-Epstein
Contributor

Ben-Epstein commented Jan 6, 2023

Thank you for reaching out and helping us improve Vaex!

Before you submit a new Issue, please read through the documentation. Also, make sure you search through the Open and Closed Issues - your problem may already be discussed or addressed.

Description
Please provide a clear and concise description of the problem. This should contain all the steps needed to reproduce the problem. A minimal code example that exposes the problem is very appreciated.

Software information

  • Vaex version (import vaex; vaex.__version__): {'vaex-core': '4.16.0', 'vaex-hdf5': '0.12.2'}
  • Vaex was installed via: pip / conda-forge / from source
  • OS:

Additional information

import vaex
df = vaex.example()
df.export("file.arrow")
df2 = vaex.open("file.arrow")

# time this
with vaex.cache.off():
    df2.id.tolist()


# vs this
with vaex.cache.off():
    df2.id.to_numpy().tolist()


@maartenbreddels
Member

Interesting. It seems that this is due to Arrow's .to_pylist(). Can you see if you can reproduce this using arrow only? If so, this is an arrow performance issue.

@Ben-Epstein
Contributor Author

Ben-Epstein commented Feb 26, 2023

@maartenbreddels yes, it's happening in arrow as well


When the column is a numpy array within vaex, it is fast

Maybe vaex can know if the column can be a numpy array, and do this automatically? I will also open an issue in pyarrow

@Ben-Epstein
Contributor Author

Created apache/arrow#34354
