
[BUG-REPORT] df.func.where won't replace with np.nan #1199

Open
aanilpala opened this issue Jan 28, 2021 · 5 comments

Comments

@aanilpala

Description
df.func.where won't replace with np.nan

To reproduce:


import numpy as np
import vaex
df = vaex.from_arrays(x=np.array([0, 1, 2, 3, 4, 0, 0, 5]))
df.func.where(df.x == 0, np.nan, df.x)

ERROR:MainThread:vaex.execution:error in task, flush task queue
Traceback (most recent call last):
  File "/Users/ahmet-anil.pala/klarna_repos/fraud-model-utils/.venv/lib/python3.7/site-packages/vaex/execution.py", line 176, in execute_async
    raise RuntimeError(f'Oops, requesting column {column} from dataset, but it does not exist')
RuntimeError: Oops, requesting column nan from dataset, but it does not exist
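The error message suggests that the constant np.nan ends up in the expression as the bare name `nan`, which is then looked up as a column. A rough sketch of how that can happen, assuming (not confirmed in this thread) that the constant is turned into expression text via repr(); `build_where_expression` and `referenced_names` are hypothetical helpers, not vaex API. np.nan is a plain Python float, so math.nan behaves the same here.

```python
import ast
import math


def build_where_expression(condition: str, if_true, if_false: str) -> str:
    # Hypothetical helper (not part of vaex): embeds a constant into an
    # expression string using repr(), as a stand-in for how a literal might
    # be serialized into expression text.
    return f"where({condition}, {if_true!r}, {if_false})"


def referenced_names(expression: str) -> set:
    # Collect every bare identifier in the expression; an evaluator has to
    # resolve each one as a function, column, or variable.
    tree = ast.parse(expression, mode="eval")
    return {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}


expr = build_where_expression("x == 0", math.nan, "x")
print(expr)                            # where(x == 0, nan, x)
print(sorted(referenced_names(expr)))  # ['nan', 'where', 'x']
```

A constant like 0.5 serializes as a number literal and parses as a constant, but `nan` has no literal syntax in Python, so it parses as a name that must be resolved somewhere.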

Software information

  • Vaex version: {'vaex': '4.0.0a10',
    'vaex-core': '4.0.0a16',
    'vaex-viz': '0.5.0.dev1',
    'vaex-hdf5': '0.7.0a6',
    'vaex-server': '0.4.0a2',
    'vaex-astro': '0.8.0a1',
    'vaex-jupyter': '0.6.0.dev1',
    'vaex-ml': '0.11.0a5'}
  • Pandas version: 1.1.5
  • Numpy version: 1.19.5
  • Vaex was installed via: pipenv
  • OS: OS X 10.15.7


@akhtar-shah

Hi guys,

Thanks for creating Vaex. It is super helpful.

Did you guys fix this bug?

@derrickcchow

Hi guys,
I was wondering if this bug has been fixed, or if there is a workaround to replace the values with np.nan?

@maartenbreddels
Member

This should work. I wonder if we should special-case 'nan' so that it is always available.

import numpy as np
import vaex
df = vaex.from_arrays(x=np.array([0, 1, 2, 3, 4, 0, 0, 5]))
df.variables['nan'] = np.nan
df.func.where(df.x == 0, np.nan, df.x)
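Why registering a variable helps can be sketched with a toy evaluator: every bare name in the expression must resolve to something, so putting `nan` into the lookup table (the role `df.variables['nan'] = np.nan` plays above) lets the expression evaluate. This is a simplified model built on Python's eval, assumed for illustration only; `evaluate_rows` and its `variables` table are hypothetical, not vaex internals.

```python
import math


def where(condition, if_true, if_false):
    # Scalar version of a where() function, applied one row at a time.
    return if_true if condition else if_false


def evaluate_rows(expression, column, variables):
    # Hypothetical toy evaluator: every bare name must resolve to the row
    # value `x`, the `where` function, or an entry in `variables`.
    results = []
    for value in column:
        namespace = {"where": where, "x": value, **variables}
        results.append(eval(expression, {"__builtins__": {}}, namespace))
    return results


x = [0, 1, 2, 3, 4, 0, 0, 5]
expr = "where(x == 0, nan, x)"

try:
    evaluate_rows(expr, x, variables={})
except NameError as exc:
    print("without 'nan':", exc)  # name 'nan' is not defined

# Registering 'nan', analogous to df.variables['nan'] = np.nan
values = evaluate_rows(expr, x, variables={"nan": math.nan})
print(values)  # [nan, 1, 2, 3, 4, nan, nan, 5]
```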

@akhtar-shah
Copy link

Thanks! I didn’t know that some things have to be made available before use.
It’d be great to have it always available, or to have this step documented.

@marixko

marixko commented Feb 26, 2023

Thanks, it also worked here! But now I am not able to drop NaNs after replacing some values with np.nan. Any idea what is happening?

import pandas as pd
import numpy as np
import vaex
df = pd.DataFrame([[1,1],[-2,2],[3,3], [4,4], [np.nan, 5]], columns=["first", "second"])
df = vaex.from_pandas(df)
df.variables['nan'] = np.nan
df["first"] = df.func.where(df["first"]<0, np.nan, df["first"])
print(df.dropna())

  #    first    second
  0        1         1
  1      nan         2
  2        3         3
  3        4         4
  4      nan         5
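For comparison, this is the result one would expect dropna to return for the data above, computed in plain Python without vaex. It only pins down the expected rows after the replacement; it does not explain why vaex keeps the NaN rows. NaN has to be detected with math.isnan (or an equivalent), because NaN compares unequal to everything, including itself.

```python
import math

# Same data as the example above: "first" is float (it contains a NaN).
first = [1.0, -2.0, 3.0, 4.0, math.nan]
second = [1, 2, 3, 4, 5]

# Same replacement: negative values in "first" become NaN.
# (NaN < 0 is False, so the existing NaN stays NaN.)
first = [math.nan if v < 0 else v for v in first]

# NaN is never equal to itself, so an equality check cannot find it.
print(math.nan == math.nan)  # False

# Keep only rows whose "first" value is not NaN.
rows = [(f, s) for f, s in zip(first, second) if not math.isnan(f)]
print(rows)  # [(1.0, 1), (3.0, 3), (4.0, 4)]
```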
