Better handle unicode column names #558

Closed
maartenbreddels opened this issue Jan 21, 2020 · 2 comments · Fixed by #617

Comments

@maartenbreddels
Member

Creating a dataframe with unicode chars like this:

df = vaex.from_dict({'ÚÚ': vaex.vrange(0, 100)})

Will replace the two characters with two underscores, which effectively hides the column (column names beginning with __ are hidden).
This was mentioned here:
https://stackoverflow.com/questions/59738879/python-vaex-how-to-create-dataframe-from-a-csv-file
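The effect described above can be reproduced outside vaex with a minimal sketch. It assumes an ASCII-only cleanup rule; `ascii_sanitize` and `is_hidden` are hypothetical helpers written for illustration and are not vaex's actual code.

```python
import re


def ascii_sanitize(name):
    # Hypothetical rule (an assumption, not vaex's implementation):
    # replace every character outside [A-Za-z0-9_] with an underscore.
    return re.sub(r"[^A-Za-z0-9_]", "_", name)


def is_hidden(name):
    # Per the issue text: column names beginning with "__" are hidden.
    return name.startswith("__")


for original in ["\u00da\u00da", "./label", "x"]:
    cleaned = ascii_sanitize(original)
    print(repr(original), "->", repr(cleaned), "hidden:", is_hidden(cleaned))
```

Under this assumed rule, any name whose first two characters fall outside the ASCII set ends up with a leading `__`, so it is treated as hidden.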

@markkoob

I think this will be obvious (to you!) but since I am here anyway:

df = vaex.from_dict({'./label': vaex.vrange(0,100)})

These characters have special meaning for us, so mutating the string is pretty inconvenient. I'm hoping the solution involves preserving the original string!

@maartenbreddels
Member Author

Thanks for sharing that.

For some background: vaex uses column names as variable names, and since everything is built on valid Python expressions, the names have to be valid identifiers. In #370 we improved this by providing automatic translations, but it seems we need to do some more work.
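To illustrate the identifier constraint, here is a rough sketch using Python's built-in `str.isidentifier` and `keyword.iskeyword`. Python 3 itself accepts non-ASCII letters in identifiers, so a name like 'ÚÚ' passes this check while './label' does not. The `make_aliases` helper is purely hypothetical: it shows one way to keep the original strings while generating valid names, and it is not how vaex or #370 does it.

```python
import keyword


def is_valid_column_identifier(name):
    # str.isidentifier follows Python's identifier grammar, which
    # allows non-ASCII letters; keywords such as "class" are excluded.
    return name.isidentifier() and not keyword.iskeyword(name)


def make_aliases(names):
    # Hypothetical helper: map each original column name to a valid
    # identifier, keeping the original string as the dictionary key.
    aliases = {}
    used = set()
    for index, name in enumerate(names):
        alias = name if is_valid_column_identifier(name) else f"col_{index}"
        while alias in used:
            # Avoid collisions between generated and existing names.
            alias += "_"
        used.add(alias)
        aliases[name] = alias
    return aliases


print(make_aliases(["\u00da\u00da", "./label", "x"]))
```

With a mapping like this, expressions can use the generated identifier while the user-facing name stays untouched.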
