[FEATURE-REQUEST] Create pyarrow structs via vaex #2032
Comments
What do you think of this:
import pyarrow as pa
import vaex

@vaex.register_function()
def create_arrow_struct(**kwargs):
    # Build one struct array with one field per keyword argument.
    return pa.StructArray.from_arrays(list(kwargs.values()), names=list(kwargs.keys()))

df = vaex.datasets.titanic()
df.func.create_arrow_struct(name=df['name'], age=df['age']) |
That's great! But @maartenbreddels it doesn't work if you try to listAgg that struct column. Maybe that's a new issue, not sure. |
Yeah, we can only do that on primitives and strings. Maybe we can split the struct, and merge it back again automatically. |
@JovanVeljanoski any opinions on this? How should we attach this, or do you like my code proposal? |
This is the opposite of #2072 so once we merge that we should take another look at this. |
Still thinking about it... I want to do some tests, but I'm busy... :S |
I think this would be nice:
df = vaex.from_scalars(user_name="Maarten", user_surname="Breddels")
df = df.struct.merge(join_char="_")  # this will automatically collect all user_* into a column named user
and
df = vaex.datasets.titanic()
df = df.struct.merge({'person': ['name', 'age']})  # will create a person struct column based on name and age
or..
df = df.struct.merge({'Person': {'name': 'Name', 'age': 'Age'}})  # use a dict to rename? |
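The automatic collection of `user_*` columns could work roughly as follows. This is a pure-Python sketch of the grouping step only; `group_columns_by_prefix` is a hypothetical helper, and the rule that a prefix needs at least two columns to form a struct is an assumption, not something stated in the thread.

```python
from collections import defaultdict


def group_columns_by_prefix(column_names, join_char="_"):
    """Map each shared prefix to the column names that start with it."""
    groups = defaultdict(list)
    for name in column_names:
        prefix, sep, rest = name.partition(join_char)
        # Skip names without the separator or with an empty side.
        if sep and prefix and rest:
            groups[prefix].append(name)
    # Assumption: a single column with a prefix is not worth turning into a struct.
    return {prefix: names for prefix, names in groups.items() if len(names) > 1}


print(group_columns_by_prefix(["user_name", "user_surname", "age"]))
```

The resulting mapping has the same shape as the explicit `{'person': ['name', 'age']}` form, so both call styles could feed the same merge step.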
I like the proposal of @maartenbreddels above. The one correction/suggestion I would make is this:
df['person'] = df.struct.merge(['name', 'age'])
df['person'] = df.struct.merge({'name': 'Name', 'age': 'Age'})
Although I have to say I do not know if |
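Both argument forms in this suggestion, a list of column names and a dict that renames them, can be reduced to one mapping before any struct is built. The sketch below is pure Python; `normalize_struct_spec` is a hypothetical helper and not part of vaex.

```python
def normalize_struct_spec(spec):
    """Return {source column: struct field name} for a list or dict spec."""
    if isinstance(spec, dict):
        # Dict form: keys are source columns, values are the new field names.
        return dict(spec)
    # List form: every column keeps its own name as the field name.
    return {name: name for name in spec}


print(normalize_struct_spec(["name", "age"]))
print(normalize_struct_spec({"name": "Name", "age": "Age"}))
```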
Yes, since you can imagine `df.struct` doing a type check, it also feels odd to me. But this does organize all the methods. Can you start by writing a test? We can do a last-minute name change anyway. |
Description
Since vaex provides all these great struct operations, it would be great if we could create structs in vaex directly via massive dataframes
Additional context
Now we can use structs, but we brought everything into memory
that would be great, but it fails.
Even better would be a helper function, something like
or something similar