[BUG-REPORT] Jupyter Notebook kernel hanging when running basic data aggregations #1398

Open
mtwichan opened this issue Jun 10, 2021 · 9 comments

@mtwichan

mtwichan commented Jun 10, 2021

Description
I'm trying to run the Vaex tutorial in a Jupyter Notebook and the Jupyter Notebook is hanging/freezing anytime I run an aggregation. I would really appreciate any help. I tested the code snippets below on two operating systems with the same results.

Code snippet (run in Jupyter Notebook) pulled from here:

import vaex
df = vaex.example()
df.x
df.x.values

import numpy as np
np.sqrt(df.x**2 + df.y**2 + df.z**2)

df['r'] = np.sqrt(df.x**2 + df.y**2 + df.z**2) # freezes here
df[['x', 'y', 'z', 'r']]

Another example of the notebook freezing (run in Jupyter Notebook):

import vaex
# 107 GB dataset
df = vaex.open('s3://vaex/taxi/yellow_taxi_2009_2015_f32.hdf5?anon=true')
mean = df.mean(df.passenger_count)

Software information

  • Vaex version (import vaex; vaex.__version__): 4.2.0
  • Vaex was installed via: pip / conda-forge / from source: pip
  • OS: Windows 10 Home (10.0.19042 Build 19042) & MacOS Mojave (10.14.16)

Additional information
Please state any supplementary information or provide additional context for the problem (e.g. screenshots, data, etc..).

Python Version: Python 3.9.2 (Windows) & Python 3.7.2 (MacOS)
requirements.txt (relevant libraries):

ipydatawidgets==4.2.0
ipykernel==5.5.0
ipyleaflet==0.13.6
ipympl==0.7.0
ipython==7.20.0
ipython-genutils==0.2.0
ipyvolume==0.5.2
ipyvue==1.5.0
ipyvuetify==1.6.2
ipywebrtc==0.6.0
ipywidgets==7.6.3
...
jupyter==1.0.0
jupyter-client==6.1.11
jupyter-console==6.2.0
jupyter-core==4.7.1
jupyter-dash==0.4.0
jupyterlab-pygments==0.1.2
jupyterlab-widgets==1.0.0
...
vaex==4.2.0
vaex-astro==0.8.1
vaex-core==4.2.0
vaex-hdf5==0.7.0
vaex-jupyter==0.6.0
vaex-ml==0.12.0
vaex-server==0.4.1
vaex-viz==0.5.0
...
@JovanVeljanoski
Member

Hi,

Can you tell us which version of numpy you have? By the way, does your machine have an SSD or an HDD?

Also, are you sure that the datasets (the example and especially the taxi one) have been downloaded successfully? It might take a while just to download the data before any computations are done.

@mtwichan
Author

Hi @JovanVeljanoski,

My machine (Windows) has an SSD.

I'm using numpy==1.19.5.

With regard to downloading the data successfully: I'm fetching the data either from S3 or from the sample data provided with the library, all within Jupyter Notebook. The cell appears to run successfully (see image below). Where should I look to confirm that the data has finished downloading?

[Screenshot: jupyter_notebook]
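
One rough way to answer the question above for a dataset stored as a local file is to check that the file exists and has a plausible size. The sketch below uses only the standard library. The helper name describe_file and the temporary stand-in file are illustrative assumptions, and the thread does not say where vaex stores downloaded data, so the real path has to be supplied by the reader. It does not apply to data streamed from S3.

```python
import os
import tempfile


def describe_file(path):
    """Return (exists, size_in_bytes) for a local file path."""
    if not os.path.isfile(path):
        return False, 0
    return True, os.path.getsize(path)


# Hypothetical stand-in for a downloaded dataset: a 1 KiB temporary file.
# Replace `demo_path` with the real location of the file on your machine.
with tempfile.NamedTemporaryFile(suffix=".hdf5", delete=False) as handle:
    handle.write(b"\x00" * 1024)
    demo_path = handle.name

exists, size = describe_file(demo_path)
print(f"{demo_path}: exists={exists}, size={size} bytes")

# Clean up the stand-in file.
os.remove(demo_path)
```

A file that is missing, or much smaller than expected, would suggest the download did not complete.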

@maartenbreddels
Member

import vaex
df = vaex.example()
df.x
df.x.values

import numpy as np
np.sqrt(df.x**2 + df.y**2 + df.z**2)

df['r'] = np.sqrt(df.x**2 + df.y**2 + df.z**2) # freezes here

If it already freezes here, I don't think we should look at the next example yet. Can you also try it from a plain Python console?

@mtwichan
Author

@maartenbreddels it seems to be working when run as a Python script and in the Python console.

Python console example:

Python 3.9.2 (tags/v3.9.2:1a79785, Feb 19 2021, 13:44:55) [MSC v.1928 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import vaex
>>> df = vaex.example()
>>> import numpy as np
>>> df['r'] = np.sqrt(df.x**2 + df.y**2 + df.z**2) # freezes here
>>> print(df[['x', 'y', 'z', 'r']])
#        x             y            z            r
0        1.2318684     -0.39692867  -0.59805775  1.4257367
1        -0.16370061   3.6542213    -0.25490645  3.6667573
2        -2.120256     3.3260527    1.7078403    4.298236
3        4.715589      4.585251     2.2515438    6.9520326
4        7.217187      11.994717    -1.0645622   14.039028
...      ...           ...          ...          ...
329,995  1.9938701     0.7892761    0.2220599    2.1558723
329,996  3.7180912     0.7213376    1.6415337    4.127852
329,997  0.36885077    13.029609    -3.6339347   13.531897
329,998  -0.112592645  1.4529126    2.1689527    2.6130419
329,999  20.79622      -3.3313878   12.188416    24.333895
>>>

Looks like it's a Jupyter Notebook issue, then? I've tried launching the notebook with both "jupyter notebook" and "ipython notebook", with the same results. I'm not sure whether this makes a difference.

Thanks for the help by the way!

CC: @Kully

@mtwichan
Author

@maartenbreddels I'm not sure why it does not run correctly on our machines, so I've switched to Google Colaboratory and it works great!

@JovanVeljanoski
Member

If you are running multiple environments, check that the Jupyter Notebook you are running comes from the same environment in which you installed vaex.

If it works in the console (ipython) like you've shown above, my bet is on something like that.

Vaex and jupyter are for sure compatible - I use that combination daily. Just make sure that you are in the same environment!
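
A minimal sketch of how the environment check suggested above could be done, assuming it is run once in a notebook cell and once in the console so the two outputs can be compared. The helper name locate_package is hypothetical; only standard-library calls are used.

```python
import importlib.util
import sys


def locate_package(name):
    """Return the path a top-level package would be imported from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


# Which Python executable and environment is this session using?
print("interpreter:", sys.executable)
print("environment:", sys.prefix)

# Where would `import vaex` load from in this session (None if not installed here)?
print("vaex found at:", locate_package("vaex"))
```

If the interpreter or the vaex location differs between the notebook and the console, the two are not using the same environment.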

@mtwichan
Author

Hi @JovanVeljanoski,

I spoke with @maartenbreddels about this bug via video call. If you folks need any more assistance from me, I'm happy to help!

@maartenbreddels
Member

maartenbreddels commented Jun 29, 2021 via email

@mtwichan
Author

@maartenbreddels here you go!

adal==1.2.7
adlfs==0.7.7
aiobotocore==1.3.0
aiohttp==3.7.4.post0
aioitertools==0.7.1
alabaster==0.7.12
alembic==1.5.8
amqp==5.0.6
ansi2html==1.6.0
aplus==0.11.0
appdirs==1.4.4
arabic-reshaper==2.1.3
argon2-cffi==20.1.0
asgiref==3.3.4
aspy.refactor-imports==2.1.1
astropy==4.2.1
async-generator==1.10
async-timeout==3.0.1
atomicwrites==1.4.0
attrs==20.3.0
Authlib==0.15.3
awsebcli==3.19.3
azure-common==1.1.27
azure-core==1.15.0
azure-datalake-store==0.0.52
azure-identity==1.6.0
azure-mgmt-core==1.2.2
azure-mgmt-storage==18.0.0
azure-storage-blob==12.8.1
Babel==2.9.0
backcall==0.2.0
bandit==1.7.0
billiard==3.6.4.0
black==20.8b1
bleach==3.3.0
bokeh==2.3.2
Bootstrap-Flask==1.5.2
boto3==1.17.92
botocore==1.20.92
bqplot==0.12.27
branca==0.4.2
Brotli==1.0.9
cached-property==1.5.2
cachetools==4.2.2
cairocffi==1.2.0
CairoSVG==2.5.2
can-decoder==0.1.1
celery==5.0.5
cement==2.8.2
certifi==2020.12.5
cffi==1.14.5
cfgv==3.2.0
chardet==3.0.4
click==7.1.2
click-didyoumean==0.0.3
click-plugins==1.1.1
click-repl==0.1.6
cloudpickle==1.6.0
colorama==0.4.3
colorcet==2.0.6
colorlover==0.3.0
cryptography==3.4.7
cssselect2==0.4.1
cycler==0.10.0
dash==1.20.0
dash-bootstrap-components==0.12.0
dash-core-components==1.16.0
dash-cytoscape==0.3.0
dash-design-kit==1.6.2
dash-enterprise-auth==0.0.4
dash-html-components==1.1.3
dash-renderer==1.9.1
dash-table==4.11.3
dask==2021.5.0
datashader==0.13.0
datashape==0.5.2
ddtrace==0.48.0
decorator==4.4.2
defusedxml==0.6.0
distlib==0.3.1
distributed==2021.5.0
Django==3.2
dnspython==2.1.0
docutils==0.16
dominate==2.6.0
email-validator==1.1.2
entrypoints==0.3
et-xmlfile==1.0.1
filelock==3.0.12
flake8==3.8.4
Flask==1.1.2
Flask-Assets==2.0
Flask-Caching==1.10.1
Flask-Compress==1.9.0
Flask-Login==0.5.0
Flask-Migrate==2.7.0
Flask-SQLAlchemy==2.5.1
Flask-WTF==0.14.3
frozendict==2.0.2
fsspec==2021.6.0
future==0.16.0
gitdb==4.0.5
GitPython==3.1.13
gql==2.0.0
graphql-core==2.3.2
greenlet==1.0.0
gunicorn==20.0.4
h5py==3.2.1
HeapDict==1.0.1
holoviews==1.14.4
html5lib==1.1
identify==1.5.14
idna==2.10
image==1.5.33
imagesize==1.2.0
iniconfig==1.1.1
ipydatawidgets==4.2.0
ipykernel==5.5.0
ipyleaflet==0.13.6
ipympl==0.7.0
ipython==7.20.0
ipython-genutils==0.2.0
ipyvolume==0.5.2
ipyvue==1.5.0
ipyvuetify==1.6.2
ipywebrtc==0.6.0
ipywidgets==7.6.3
isodate==0.6.0
itsdangerous==1.1.0
jdcal==1.4.1
jedi==0.18.0
Jinja2==2.11.3
jmespath==0.10.0
jsonschema==3.2.0
jupyter==1.0.0
jupyter-client==6.1.11
jupyter-console==6.2.0
jupyter-core==4.7.1
jupyter-dash==0.4.0
jupyterlab-pygments==0.1.2
jupyterlab-widgets==1.0.0
kiwisolver==1.3.1
kombu==5.0.2
llvmlite==0.36.0
locket==0.2.1
Mako==1.1.4
Markdown==3.3.4
MarkupSafe==1.1.1
matplotlib==3.4.1
mccabe==0.6.1
mdf-iter==0.0.4
mistune==0.8.4
msal==1.11.0
msal-extensions==0.3.0
msgpack==1.0.2
msrest==0.6.21
multidict==5.1.0
multipledispatch==0.6.0
mypy-extensions==0.4.3
mysql==0.0.2
mysql-connector-python==8.0.24
mysqlclient==2.0.3
nbclient==0.5.2
nbconvert==6.0.7
nbformat==5.1.2
nest-asyncio==1.5.1
nodeenv==1.5.0
notebook==6.4.0
numba==0.53.1
numpy==1.20.3
oauthlib==3.1.1
openpyxl==3.0.6
packaging==20.9
pandas==1.1.5
pandas-market-calendars==1.6.1
pandocfilters==1.4.3
panel==0.11.3
param==1.10.1
parso==0.8.1
partd==1.2.0
pathspec==0.5.9
patsy==0.5.1
pbr==5.5.1
pickleshare==0.7.5
Pillow==8.2.0
plotly==4.14.3
pluggy==0.13.1
portalocker==1.7.1
pre-commit==2.10.1
progressbar2==3.53.1
prometheus-client==0.9.0
promise==2.3
prompt-toolkit==3.0.16
protobuf==3.15.8
psutil==5.8.0
py==1.10.0
pyarrow==4.0.0
pyasn1==0.4.8
pycodestyle==2.6.0
pycparser==2.20
pyct==0.4.8
pyerfa==2.0.0
pyflakes==2.2.0
Pygments==2.8.0
PyJWT==2.0.1
pyOpenSSL==20.0.1
pyparsing==2.4.7
PyPDF2==1.26.0
Pyphen==0.10.0
pypiwin32==223
pyrsistent==0.17.3
pytest==6.2.2
python-bidi==0.4.2
python-dateutil==2.8.1
python-dotenv==0.17.0
python-editor==1.0.4
python-utils==2.5.6
pythreejs==2.3.0
pytz==2021.1
pyviz-comms==2.0.2
pywin32==300
pywinpty==0.5.7
PyYAML==5.3.1
pyzmq==22.0.3
qtconsole==5.0.2
QtPy==1.9.0
redis==3.5.3
regex==2020.11.13
reorder-python-imports==2.4.0
reportlab==3.5.67
requests==2.24.0
requests-oauthlib==1.3.0
retrying==1.3.3
rsa==4.7.2
Rx==1.6.1
s3fs==2021.6.0
s3transfer==0.4.2
scipy==1.6.3
semantic-version==2.5.0
Send2Trash==1.5.0
Shapely==1.7.1
six==1.14.0
smart-open==5.1.0
smmap==3.0.5
snowballstemmer==2.1.0
sortedcontainers==2.4.0
Sphinx==3.4.3
sphinxcontrib-applehelp==1.0.2
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-htmlhelp==1.0.3
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-serializinghtml==1.1.4
SQLAlchemy==1.4.9
sqlparse==0.4.1
statsmodels==0.12.2
stevedore==3.3.0
tabulate==0.8.9
tblib==1.7.0
tenacity==7.0.0
termcolor==1.1.0
terminado==0.9.2
testpath==0.4.4
tinycss2==1.1.0
toml==0.10.2
toolz==0.11.1
tornado==6.1
tqdm==4.61.1
trading-calendars==2.1.1
traitlets==5.0.5
traittypes==0.2.1
typed-ast==1.4.2
typing-extensions==3.7.4.3
urllib3==1.25.11
vaex==4.2.0
vaex-astro==0.8.1
vaex-core==4.2.0
vaex-hdf5==0.7.0
vaex-jupyter==0.6.0
vaex-ml==0.12.0
vaex-server==0.4.1
vaex-viz==0.5.0
vine==5.0.0
virtualenv==20.4.2
visitor==0.1.3
wcwidth==0.1.9
WeasyPrint==52.4
webassets==2.0
webencodings==0.5.1
Werkzeug==1.0.1
widgetsnbextension==3.5.1
wrapt==1.12.1
WTForms==2.3.3
xarray==0.18.2
xhtml2pdf==0.2.5
xlrd==2.0.1
yarl==1.6.3
zict==2.0.0
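
A list like the one above describes whichever environment pip was run in. This thread mentions numpy 1.19.5 in one comment and numpy 1.20.3 in the list above, so it may be worth confirming what the notebook kernel itself sees. The following is a sketch, assuming Python 3.8 or newer (importlib.metadata is not in the standard library on 3.7); the helper name installed_versions is hypothetical, and the package names are taken from the lists in this thread.

```python
from importlib import metadata  # standard library from Python 3.8 onward


def installed_versions(names):
    """Map each distribution name to its installed version, or None if it is absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Print versions in the same name==version style as the list above.
packages = ["vaex", "vaex-core", "vaex-jupyter", "numpy", "ipykernel", "notebook"]
for name, version in installed_versions(packages).items():
    print(f"{name}=={version}")
```

Running this inside a notebook cell and comparing the output with the pip list shows whether the kernel and the command line agree.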
