
Vaex string API behaves differently from pandas #1524

Open
bergkvist opened this issue Aug 16, 2021 · 7 comments

@bergkvist

bergkvist commented Aug 16, 2021

Code example

Note that display() is a Jupyter (IPython) function and is not available by default in plain Python.

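import pandas as pd
import vaex
from IPython.display import display  # predefined in Jupyter; this import makes it work in plain Python too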

df = pd.DataFrame({ 'amount': [ '1 BTC', '0.03 BTC', '0.4 ETH' ] })
vdf = vaex.from_pandas(df)

display(df.amount.str.split(' ').str.get(-1))
print('\n---\n')
display(vdf.amount.str.split(' ').str.get(-1))

Output:

0    BTC
1    BTC
2    ETH
Name: amount, dtype: object

---

Expression = str_get(str_split(amount, ' '), -1)
Length: 3 dtype: list<item: string> (expression)
------------------------------------------------
0  ['1', 'C']
1  ['3', 'C']
2  ['4', 'H']

Software information

  • Vaex version (import vaex; vaex.__version__):
{'vaex': '4.4.0',
 'vaex-core': '4.4.0',
 'vaex-viz': '0.5.0',
 'vaex-hdf5': '0.9.0',
 'vaex-server': '0.6.0',
 'vaex-astro': '0.8.3',
 'vaex-jupyter': '0.6.0',
 'vaex-ml': '0.13.0'}
  • Vaex was installed via: pip
  • OS: Manjaro Linux

Additional information
Pandas supports two ways of indexing into strings, while Vaex supports only one of them:

# pandas: both forms work
df.amount.str.get(0)
df.amount.str[0]

# vaex: only .str.get() works
vdf.amount.str.get(0)
vdf.amount.str[0]  # TypeError: 'StringOperations' object is not subscriptable
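Supporting the subscript form mostly comes down to a __getitem__ on the accessor that forwards integers to get() and slices to slice(). As far as I know, pandas' StringMethods dispatches in a similar way, but the class below is a hypothetical, pure-Python stand-in operating on a plain list of strings, not vaex or pandas code. Returning None for an out-of-range index is an assumption (pandas yields NaN there).

```python
class StringOps:
    """Hypothetical minimal .str-style accessor over a list of strings."""

    def __init__(self, values):
        self.values = list(values)

    def get(self, i):
        # Character i of each string; None when the index is out of range.
        out = []
        for s in self.values:
            try:
                out.append(s[i])
            except IndexError:
                out.append(None)
        return out

    def slice(self, start=None, stop=None, step=None):
        # Substring of each string, same rules as Python's s[start:stop:step].
        return [s[start:stop:step] for s in self.values]

    def __getitem__(self, key):
        # str[a:b] -> slice(), str[i] -> get(); `slice` here is the builtin type.
        if isinstance(key, slice):
            return self.slice(key.start, key.stop, key.step)
        return self.get(key)


ops = StringOps(['1 BTC', '0.03 BTC', '0.4 ETH'])
print(ops[0])    # ['1', '0', '0']
print(ops[-3:])  # ['BTC', 'BTC', 'ETH']
```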
@maartenbreddels
Member

Thanks for the report. The vaex behavior is intended: when applied to a list of strings, every .str function acts on each element of the list, and I have no problem with that being different from pandas. What I think we are missing is a list_get. Similar to the .struct accessor (see https://vaex--1522.org.readthedocs.build/en/1522/api.html#struct-arrow-operations from PR #1447, cc @mansenfranzen), we should have some .list operations as well, and map df.x[:,i] to a list_get function.
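A plain-Python sketch of the two semantics on the example data may help. str_get_elementwise mimics the vaex behavior described above (get applied to each string inside each list), which reproduces the output in the report; list_get is a hypothetical helper illustrating what the proposed list_get would return, matching what pandas gives. Neither function is vaex code, and returning None for out-of-range indices is an assumption.

```python
def str_get_elementwise(lists, i):
    # vaex-style: .str.get(i) on a list<string> column applies get to every
    # string inside each list, so the list structure is preserved.
    return [[s[i] if -len(s) <= i < len(s) else None for s in row] for row in lists]


def list_get(lists, i):
    # Hypothetical list_get: take element i of each list (None if out of range).
    return [row[i] if -len(row) <= i < len(row) else None for row in lists]


amounts = ['1 BTC', '0.03 BTC', '0.4 ETH']
split = [a.split(' ') for a in amounts]  # [['1', 'BTC'], ['0.03', 'BTC'], ['0.4', 'ETH']]

print(str_get_elementwise(split, -1))  # [['1', 'C'], ['3', 'C'], ['4', 'H']]
print(list_get(split, -1))             # ['BTC', 'BTC', 'ETH']
```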

@bergkvist
Author

bergkvist commented Feb 9, 2022

@JovanVeljanoski Has this been implemented now, since the issue was closed? If so, what syntax did it land on, and in which version is it available?

@maartenbreddels
Member

Not sure why this was closed. @JovanVeljanoski, I think we should have an issue about supporting lists and link to that; there is also a discussion on this, IIRC.

@JovanVeljanoski
Member

Because the main issue is about accessing and operating on lists, which I believe is tracked elsewhere. But we can keep this open as well.

@longtng

longtng commented Feb 13, 2023

I have faced a similar issue: I would like to extract the first 10 characters of each cell in a column. In pandas, something like this works:
df["COMPANY_CODE"] = df["COMPANY_CODE"].str[:10]
But this does not work in vaex, which raises:
TypeError: 'StringOperations' object is not subscriptable

@maartenbreddels
Member

Hi,

I think you want https://vaex.io/docs/api.html#vaex.expression.StringOperations.slice
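The linked StringOperations.slice works per string, like Python slicing applied to each element. Assuming the vaex call would be vdf.COMPANY_CODE.str.slice(start=0, stop=10) (parameter names as I understand the linked API page; please check them there), the plain-Python sketch below shows what that computes on made-up codes. It also shows why the .str part matters in pandas: slicing the column itself selects rows, not characters.

```python
codes = ['ABCDEFGHIJKLMN', 'XYZ', '0123456789AB']  # made-up example values

# Per-string character slice, i.e. what .str[:10] (pandas) or, assumed,
# .str.slice(start=0, stop=10) (vaex) compute; shorter strings stay as-is.
first_ten = [c[0:10] for c in codes]
print(first_ten)  # ['ABCDEFGHIJ', 'XYZ', '0123456789']

# Slicing the column without .str selects whole rows instead of characters.
first_two_rows = codes[:2]
print(first_two_rows)  # ['ABCDEFGHIJKLMN', 'XYZ']
```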

@longtng

longtng commented Feb 14, 2023

> Hi,
>
> I think you want https://vaex.io/docs/api.html#vaex.expression.StringOperations.slice

Superb, thank you very much!
