Problems with vaex to support Python3 #369

Closed
fprada opened this issue Jul 30, 2019 · 25 comments

Comments

@fprada

fprada commented Jul 30, 2019

After we installed vaex using conda, the original Python 3 was replaced by Python 2 when we call ipython. In principle vaex supports Python 3, so how can we avoid this?

@maartenbreddels
Member

What I suspect is happening:

  • you have a Python 2 Anaconda distribution for the root
  • you installed vaex in an environment (let's assume you call it vaex)
  • you've activated the vaex environment (I assume with Python 3).

Now when you execute $ ipython, you may end up in the root Python environment, which is Python 2.
You can check this by executing $ which ipython (which in that case will not print e.g. ~/anaconda/envs/vaex/bin/ipython).
Is this correct?
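
The same check can be made from inside the session itself. This is only a minimal sketch: the helper name `describe_interpreter` is made up for illustration, and it uses nothing beyond the standard library.

```python
import shutil
import sys


def describe_interpreter():
    """Report the running Python version, its executable, and the ipython on PATH."""
    # shutil.which does not exist on Python 2, so look it up defensively.
    which = getattr(shutil, "which", None)
    return {
        "version": "%d.%d.%d" % tuple(sys.version_info[:3]),
        "executable": sys.executable,
        # None means ipython was not found on PATH (or shutil.which is unavailable).
        "ipython_on_path": which("ipython") if which else None,
    }


if __name__ == "__main__":
    for key, value in sorted(describe_interpreter().items()):
        print("%s: %s" % (key, value))
```

If `executable` points at the root installation rather than at the environment's `bin` directory, the session is not running in the environment where vaex was installed.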

@fprada
Author

fprada commented Jul 30, 2019

Hi again, here are more details.

We have installed the latest Anaconda version with Python 3 as root.

When we follow your instructions:

conda install -c maartenbreddels vaex

Vaex is installed successfully, but it downgrades our conda Python version to 2.7.16.

Then I can invoke the conda Python, which is version 2.7.16, and I can import vaex with “import vaex”, but everything runs in Python 2.7.16.

Can we use vaex with Python 3?

Regards.

P.S. Here is the attached message when we try to install vaex with conda:


[root@skun6 ~]# conda install -c maartenbreddels vaex
WARNING: The conda.compat module is deprecated and will be removed in a future release.
Collecting package metadata: done
Solving environment: done

Package Plan

environment location: /usr/local/anaconda3

added / updated specs:
- vaex

The following packages will be downloaded:

package                    |            build
---------------------------|-----------------
_ipyw_jlab_nb_ext_conf-0.1.0|           py27_0           4 KB
_libgcc_mutex-0.1          |             main           3 KB
alabaster-0.7.12           |           py27_0          17 KB
anaconda-client-1.7.2      |           py27_0         140 KB
anaconda-navigator-1.9.7   |           py27_0         4.8 MB
anaconda-project-0.8.3     |             py_0         212 KB
aplus-0.11.0               |           py27_0           9 KB  maartenbreddels
asn1crypto-0.24.0          |           py27_0         155 KB
astroid-1.6.5              |           py27_0         402 KB
astropy-2.0.9              |   py27hdd07704_0         6.8 MB
atomicwrites-1.3.0         |           py27_1          13 KB
attrdict-2.0.0             |           py27_0          18 KB  maartenbreddels
attrs-19.1.0               |           py27_1          56 KB
babel-2.7.0                |             py_0         5.8 MB
backcall-0.1.0             |           py27_0          19 KB
backports-1.0              |             py_2         139 KB
backports.functools_lru_cache-1.5|             py_2           9 KB
backports.os-0.1.1         |           py27_0          15 KB
backports.shutil_get_terminal_size-1.0.0|           py27_2           8 KB
backports.tempfile-1.0     |             py_1          12 KB
backports.weakref-1.0.post1|             py_1           7 KB
backports_abc-0.5          |             py_0          13 KB
beautifulsoup4-4.6.3       |           py27_0         135 KB
bitarray-0.9.3             |   py27h7b6447c_0          60 KB
bkcharts-0.2               |           py27_0         124 KB
bleach-3.1.0               |           py27_0         228 KB
bokeh-1.3.0                |           py27_0         4.0 MB
boto-2.49.0                |           py27_0         1.4 MB
bottleneck-1.2.1           |   py27h035aef0_1         127 KB
ca-certificates-2019.5.15  |                0         133 KB
cachetools-1.1.6           |           py27_0          24 KB  maartenbreddels
certifi-2019.6.16          |           py27_1         156 KB
cffi-1.12.3                |   py27h2e261b9_0         218 KB
chardet-3.0.4              |        py27_1003         186 KB
click-7.0                  |           py27_0         116 KB
cloudpickle-1.2.1          |             py_0          28 KB
clyent-1.2.2               |           py27_1          18 KB
colorama-0.4.1             |           py27_0          24 KB
conda-4.7.10               |           py27_0         3.0 MB
conda-build-3.18.9         |           py27_0         530 KB
conda-package-handling-1.3.11|           py27_0         260 KB
conda-verify-3.4.2         |             py_1          25 KB
configparser-3.7.4         |           py27_0          41 KB
contextlib2-0.5.5          |           py27_0          15 KB
cryptography-2.7           |   py27h1ba5d50_0         602 KB
cycler-0.10.0              |           py27_0          13 KB
cython-0.29.12             |   py27he6710b0_0         2.2 MB
cytoolz-0.10.0             |   py27h7b6447c_0         422 KB
dask-1.2.2                 |             py_0          11 KB
dask-core-1.2.2            |             py_0         539 KB
decorator-4.4.0            |           py27_1          18 KB
defusedxml-0.6.0           |             py_0          23 KB
distributed-1.28.1         |           py27_0         852 KB
docutils-0.15.1            |           py27_0         743 KB
entrypoints-0.3            |           py27_0          12 KB
enum34-1.1.6               |           py27_1          57 KB
et_xmlfile-1.0.1           |           py27_0          20 KB
fastcache-1.1.0            |   py27h7b6447c_0          31 KB
filelock-3.0.12            |             py_0          12 KB
flask-1.1.1                |             py_0          73 KB
funcsigs-1.0.2             |           py27_0          20 KB
functools32-3.2.3.2        |           py27_1          23 KB
future-0.17.1              |           py27_0         710 KB
futures-3.3.0              |           py27_0          28 KB
gevent-1.4.0               |   py27h7b6447c_0         2.5 MB
glob2-0.7                  |             py_0          14 KB
gmpy2-2.0.8                |   py27h10f8cd9_2         168 KB
greenlet-0.4.15            |   py27h7b6447c_0          20 KB
h5py-2.9.0                 |   py27h7918eee_0         1.1 MB
heapdict-1.0.0             |           py27_2           8 KB
html5lib-1.0.1             |           py27_0         189 KB
idna-2.8                   |           py27_0         133 KB
imageio-2.5.0              |           py27_0         3.3 MB
imagesize-1.1.0            |           py27_0           9 KB
ipaddress-1.0.22           |           py27_0          32 KB
ipykernel-4.10.0           |           py27_0         145 KB
ipython-5.8.0              |           py27_0         1.0 MB
ipython_genutils-0.2.0     |           py27_0          38 KB
ipywidgets-7.5.0           |             py_0         107 KB
isort-4.3.21               |           py27_0          68 KB
itsdangerous-1.1.0         |           py27_0          26 KB
jdcal-1.4.1                |             py_0          11 KB
jedi-0.13.3                |           py27_0         233 KB
jinja2-2.10.1              |           py27_0         181 KB
jprops-1.0                 |           py27_0           9 KB  maartenbreddels
jsonschema-3.0.1           |           py27_0          86 KB
jupyter-1.0.0              |           py27_7           6 KB
jupyter_client-5.3.1       |             py_0          69 KB
jupyter_console-5.2.0      |           py27_1          35 KB
jupyter_core-4.5.0         |             py_0          48 KB
jupyterlab-0.33.11         |           py27_0        10.0 MB
jupyterlab_launcher-0.11.2 |   py27h28b3542_0          32 KB
keyring-18.0.0             |           py27_0          54 KB
kiwisolver-1.1.0           |   py27he6710b0_0          91 KB
lazy-object-proxy-1.4.1    |   py27h7b6447c_0          29 KB
linecache2-1.0.0           |           py27_0          24 KB
llvmlite-0.29.0            |   py27hd408876_0        17.7 MB
locket-0.2.0               |           py27_1           8 KB
lxml-4.3.4                 |   py27hefd8a0e_0         1.4 MB
markupsafe-1.1.1           |   py27h7b6447c_0          29 KB
matplotlib-2.2.3           |   py27hb69df0a_0         6.5 MB
mccabe-0.6.1               |           py27_1          13 KB
mistune-0.8.4              |   py27h7b6447c_0          53 KB
mkl-service-2.0.2          |   py27h7b6447c_0          67 KB
mkl_fft-1.0.12             |   py27ha843d7b_0         163 KB
mkl_random-1.0.2           |   py27hd81dba3_0         383 KB
more-itertools-5.0.0       |           py27_0          86 KB
mpmath-1.1.0               |           py27_0         972 KB
msgpack-python-0.6.1       |   py27hfd86e86_1          90 KB
multipledispatch-0.6.0     |           py27_0          21 KB
navigator-updater-0.2.1    |           py27_0         1.2 MB
nbconvert-5.5.0            |             py_0         381 KB
nbformat-4.4.0             |           py27_0         139 KB
networkx-2.2               |           py27_1         2.0 MB
nltk-3.4.4                 |           py27_0         2.1 MB
nose-1.3.7                 |           py27_2         213 KB
notebook-5.7.8             |           py27_0         7.2 MB
numba-0.45.0               |   py27h962f231_0         3.0 MB
numexpr-2.6.9              |   py27h9e4a6bb_0         193 KB
numpy-1.16.4               |   py27h7e9f1db_0          49 KB
numpy-base-1.16.4          |   py27hde5b4d6_0         4.3 MB
numpydoc-0.9.1             |             py_0          31 KB
olefile-0.46               |           py27_0          48 KB
openpyxl-2.6.2             |             py_0         157 KB
openssl-1.1.1c             |       h7b6447c_1         3.8 MB
packaging-19.0             |           py27_0          37 KB
pandas-0.24.2              |   py27he6710b0_0        10.9 MB
pandocfilters-1.4.2        |           py27_1          13 KB
parso-0.5.0                |             py_0          67 KB
partd-1.0.0                |             py_0          19 KB
path.py-11.1.0             |           py27_0          52 KB
pathlib2-2.3.4             |           py27_0          35 KB
patsy-0.5.1                |           py27_0         375 KB
pep8-1.7.1                 |           py27_0          51 KB
pexpect-4.7.0              |           py27_0          80 KB
pickleshare-0.7.5          |           py27_0          12 KB
pillow-6.1.0               |   py27h34e0f95_0         631 KB
pip-19.1.1                 |           py27_0         1.8 MB
pkginfo-1.5.0.1            |           py27_0          41 KB
pluggy-0.11.0              |             py_0          20 KB
ply-3.11                   |           py27_0          79 KB
progressbar2-3.6.0         |           py27_0          25 KB  maartenbreddels
prometheus_client-0.7.1    |             py_0          42 KB
prompt_toolkit-1.0.15      |           py27_0         333 KB
psutil-5.6.3               |   py27h7b6447c_0         321 KB
ptyprocess-0.6.0           |           py27_0          22 KB
py-1.8.0                   |           py27_0         137 KB
py-lief-0.9.0              |   py27h7725739_2         1.6 MB
pycodestyle-2.5.0          |           py27_0          60 KB
pycosat-0.6.3              |   py27h14c3975_0         103 KB
pycparser-2.19             |           py27_0         173 KB
pycrypto-2.6.1             |   py27h14c3975_9         460 KB
pycurl-7.43.0.2            |   py27h1ba5d50_0         184 KB
pyflakes-2.1.1             |           py27_0         100 KB
pygments-2.4.2             |             py_0         664 KB
pylint-1.9.2               |           py27_0         772 KB
pyodbc-4.0.26              |   py27he6710b0_0          71 KB
pyopengl-3.1.1a1           |           py27_0         1.3 MB
pyopenssl-19.0.0           |           py27_0          80 KB
pyparsing-2.4.0            |             py_0          58 KB
pyqt-5.9.2                 |   py27h05f1152_2         5.4 MB
pyrsistent-0.14.11         |   py27h7b6447c_0          88 KB
pysocks-1.7.0              |           py27_0          29 KB
pytables-3.5.1             |   py27h71ec239_0         1.4 MB
pytest-4.5.0               |           py27_0         358 KB
pytest-arraydiff-0.3       |   py27h39e3cac_0          15 KB
pytest-astropy-0.5.0       |           py27_0           6 KB
pytest-doctestplus-0.3.0   |           py27_0          23 KB
pytest-openfiles-0.3.2     |           py27_0          11 KB
pytest-remotedata-0.3.1    |           py27_0          13 KB
python-2.7.16              |       h9bab390_0        12.8 MB
python-dateutil-2.8.0      |           py27_0         279 KB
python-libarchive-c-2.8    |          py27_11          22 KB
pytz-2019.1                |             py_0         236 KB
pywavelets-1.0.3           |   py27hdd07704_1         4.4 MB
pyyaml-5.1.1               |   py27h7b6447c_0         177 KB
pyzmq-18.0.0               |   py27he6710b0_0         463 KB
qtawesome-0.5.7            |           py27_1         615 KB
qtconsole-4.5.2            |             py_0          92 KB
qtpy-1.8.0                 |             py_0          38 KB
requests-2.22.0            |           py27_0          89 KB
rope-0.14.0                |             py_0         113 KB
ruamel_yaml-0.15.46        |   py27h14c3975_0         241 KB
scandir-1.10.0             |   py27h7b6447c_0          32 KB
scikit-image-0.14.2        |   py27he6710b0_0        24.0 MB
scikit-learn-0.20.3        |   py27hd81dba3_0         5.8 MB
scipy-1.2.1                |   py27h7c811a0_0        17.6 MB
seaborn-0.9.0              |           py27_0         374 KB
send2trash-1.5.0           |           py27_0          16 KB
setuptools-41.0.1          |           py27_0         640 KB
simplegeneric-0.8.1        |           py27_2           9 KB
singledispatch-3.4.0.3     |           py27_0          15 KB
sip-4.19.8                 |   py27hf484d3e_0         291 KB
six-1.12.0                 |           py27_0          22 KB
snowballstemmer-1.9.0      |             py_0          53 KB
sortedcollections-1.1.2    |           py27_0          17 KB
sortedcontainers-2.1.0     |           py27_0          44 KB
sphinx-1.8.5               |           py27_0         1.9 MB
sphinxcontrib-1.0          |           py27_1           3 KB
sphinxcontrib-websupport-1.1.2|             py_0          35 KB
spyder-3.3.6               |           py27_0         2.5 MB
spyder-kernels-0.5.1       |           py27_0          68 KB
sqlalchemy-1.3.5           |   py27h7b6447c_0         1.7 MB
statsmodels-0.10.1         |   py27hdd07704_0         9.6 MB
subprocess32-3.5.4         |   py27h7b6447c_0          49 KB
sympy-1.4                  |           py27_0         9.9 MB
tblib-1.4.0                |             py_0          14 KB
terminado-0.8.2            |           py27_0          22 KB
testpath-0.4.2             |           py27_0          91 KB
toolz-0.10.0               |             py_0          50 KB
tornado-5.1.1              |   py27h7b6447c_0         643 KB
tqdm-4.32.1                |             py_0          48 KB
traceback2-1.4.0           |           py27_0          30 KB
traitlets-4.3.2            |           py27_0         128 KB
typing-3.7.4               |           py27_0          49 KB
unicodecsv-0.14.1          |           py27_0          24 KB
unittest2-1.1.0            |           py27_0         143 KB
urllib3-1.24.2             |           py27_0         151 KB
vaex-1.0.0b2               |           py27_0         1.0 MB  maartenbreddels
wcwidth-0.1.7              |           py27_0          25 KB
webencodings-0.5.1         |           py27_1          19 KB
werkzeug-0.15.4            |             py_0         262 KB
wheel-0.33.4               |           py27_0          39 KB
widgetsnbextension-3.5.0   |           py27_0         1.8 MB
wrapt-1.11.2               |   py27h7b6447c_0          48 KB
wurlitzer-1.0.2            |           py27_0          12 KB
xlrd-1.2.0                 |           py27_0         187 KB
xlsxwriter-1.1.8           |             py_0         105 KB
xlwt-1.3.0                 |           py27_0         160 KB
zict-1.0.0                 |             py_0          12 KB
zipp-0.5.1                 |             py_0           8 KB
------------------------------------------------------------
                                       Total:       234.9 MB

The following NEW packages will be INSTALLED:

_libgcc_mutex pkgs/main/linux-64::_libgcc_mutex-0.1-main
aplus maartenbreddels/linux-64::aplus-0.11.0-py27_0
attrdict maartenbreddels/linux-64::attrdict-2.0.0-py27_0
backports.functoo~ pkgs/main/noarch::backports.functools_lru_cache-1.5-py_2
backports.tempfile pkgs/main/noarch::backports.tempfile-1.0-py_1
backports.weakref pkgs/main/noarch::backports.weakref-1.0.post1-py_1
backports_abc pkgs/main/noarch::backports_abc-0.5-py_0
cachetools maartenbreddels/linux-64::cachetools-1.1.6-py27_0
conda-package-han~ pkgs/main/linux-64::conda-package-handling-1.3.11-py27_0
configparser pkgs/main/linux-64::configparser-3.7.4-py27_0
enum34 pkgs/main/linux-64::enum34-1.1.6-py27_1
funcsigs pkgs/main/linux-64::funcsigs-1.0.2-py27_0
functools32 pkgs/main/linux-64::functools32-3.2.3.2-py27_1
futures pkgs/main/linux-64::futures-3.3.0-py27_0
ipaddress pkgs/main/linux-64::ipaddress-1.0.22-py27_0
jprops maartenbreddels/linux-64::jprops-1.0-py27_0
jupyterlab_launch~ pkgs/main/linux-64::jupyterlab_launcher-0.11.2-py27h28b3542_0
linecache2 pkgs/main/linux-64::linecache2-1.0.0-py27_0
progressbar2 maartenbreddels/linux-64::progressbar2-3.6.0-py27_0
pyopengl pkgs/main/linux-64::pyopengl-3.1.1a1-py27_0
scandir pkgs/main/linux-64::scandir-1.10.0-py27h7b6447c_0
subprocess32 pkgs/main/linux-64::subprocess32-3.5.4-py27h7b6447c_0
traceback2 pkgs/main/linux-64::traceback2-1.4.0-py27_0
typing pkgs/main/linux-64::typing-3.7.4-py27_0
unittest2 pkgs/main/linux-64::unittest2-1.1.0-py27_0
vaex maartenbreddels/linux-64::vaex-1.0.0b2-py27_0

The following packages will be REMOVED:

anaconda-2019.03-py37_0
importlib_metadata-0.8-py37_0
jeepney-0.4-py37_0
jupyterlab_server-0.2.0-py37_0
secretstorage-3.1.1-py37_0
soupsieve-1.8-py37_0

The following packages will be UPDATED:

anaconda-project pkgs/main/linux-64::anaconda-project-~ --> pkgs/main/noarch::anaconda-project-0.8.3-py_0
babel pkgs/main/linux-64::babel-2.6.0-py37_0 --> pkgs/main/noarch::babel-2.7.0-py_0
backports pkgs/main/linux-64::backports-1.0-py3~ --> pkgs/main/noarch::backports-1.0-py_2
bitarray 0.8.3-py37h14c3975_0 --> 0.9.3-py27h7b6447c_0
bokeh 1.0.4-py37_0 --> 1.3.0-py27_0
ca-certificates 2019.1.23-0 --> 2019.5.15-0
certifi 2019.3.9-py37_0 --> 2019.6.16-py27_1
cffi 1.12.2-py37h2e261b9_1 --> 1.12.3-py27h2e261b9_0
chardet 3.0.4-py37_1 --> 3.0.4-py27_1003
cloudpickle pkgs/main/linux-64::cloudpickle-0.8.0~ --> pkgs/main/noarch::cloudpickle-1.2.1-py_0
conda 4.6.11-py37_0 --> 4.7.10-py27_0
conda-build 3.17.8-py37_0 --> 3.18.9-py27_0
conda-verify pkgs/main/linux-64::conda-verify-3.1.~ --> pkgs/main/noarch::conda-verify-3.4.2-py_1
cryptography 2.6.1-py37h1ba5d50_0 --> 2.7-py27h1ba5d50_0
cython 0.29.6-py37he6710b0_0 --> 0.29.12-py27he6710b0_0
cytoolz 0.9.0.1-py37h14c3975_1 --> 0.10.0-py27h7b6447c_0
dask pkgs/main/linux-64::dask-1.1.4-py37_1 --> pkgs/main/noarch::dask-1.2.2-py_0
dask-core pkgs/main/linux-64::dask-core-1.1.4-p~ --> pkgs/main/noarch::dask-core-1.2.2-py_0
defusedxml pkgs/main/linux-64::defusedxml-0.5.0-~ --> pkgs/main/noarch::defusedxml-0.6.0-py_0
distributed 1.26.0-py37_1 --> 1.28.1-py27_0
docutils 0.14-py37_0 --> 0.15.1-py27_0
fastcache 1.0.2-py37h14c3975_2 --> 1.1.0-py27h7b6447c_0
filelock pkgs/main/linux-64::filelock-3.0.10-p~ --> pkgs/main/noarch::filelock-3.0.12-py_0
flask pkgs/main/linux-64::flask-1.0.2-py37_1 --> pkgs/main/noarch::flask-1.1.1-py_0
glob2 pkgs/main/linux-64::glob2-0.6-py37_1 --> pkgs/main/noarch::glob2-0.7-py_0
ipywidgets pkgs/main/linux-64::ipywidgets-7.4.2-~ --> pkgs/main/noarch::ipywidgets-7.5.0-py_0
isort 4.3.16-py37_0 --> 4.3.21-py27_0
jdcal pkgs/main/linux-64::jdcal-1.4-py37_0 --> pkgs/main/noarch::jdcal-1.4.1-py_0
jinja2 2.10-py37_0 --> 2.10.1-py27_0
jupyter_client pkgs/main/linux-64::jupyter_client-5.~ --> pkgs/main/noarch::jupyter_client-5.3.1-py_0
jupyter_core pkgs/main/linux-64::jupyter_core-4.4.~ --> pkgs/main/noarch::jupyter_core-4.5.0-py_0
kiwisolver 1.0.1-py37hf484d3e_0 --> 1.1.0-py27he6710b0_0
lazy-object-proxy 1.3.1-py37h14c3975_2 --> 1.4.1-py27h7b6447c_0
llvmlite 0.28.0-py37hd408876_0 --> 0.29.0-py27hd408876_0
lxml 4.3.2-py37hefd8a0e_0 --> 4.3.4-py27hefd8a0e_0
mkl-service 1.1.2-py37he904b0f_5 --> 2.0.2-py27h7b6447c_0
mkl_fft 1.0.10-py37ha843d7b_0 --> 1.0.12-py27ha843d7b_0
nbconvert pkgs/main/linux-64::nbconvert-5.4.1-p~ --> pkgs/main/noarch::nbconvert-5.5.0-py_0
nltk 3.4-py37_1 --> 3.4.4-py27_0
numba 0.43.1-py37h962f231_0 --> 0.45.0-py27h962f231_0
numpy 1.16.2-py37h7e9f1db_0 --> 1.16.4-py27h7e9f1db_0
numpy-base 1.16.2-py37hde5b4d6_0 --> 1.16.4-py27hde5b4d6_0
numpydoc pkgs/main/linux-64::numpydoc-0.8.0-py~ --> pkgs/main/noarch::numpydoc-0.9.1-py_0
openpyxl pkgs/main/linux-64::openpyxl-2.6.1-py~ --> pkgs/main/noarch::openpyxl-2.6.2-py_0
openssl 1.1.1b-h7b6447c_1 --> 1.1.1c-h7b6447c_1
parso pkgs/main/linux-64::parso-0.3.4-py37_0 --> pkgs/main/noarch::parso-0.5.0-py_0
partd pkgs/main/linux-64::partd-0.3.10-py37~ --> pkgs/main/noarch::partd-1.0.0-py_0
pathlib2 2.3.3-py37_0 --> 2.3.4-py27_0
pexpect 4.6.0-py37_0 --> 4.7.0-py27_0
pillow 5.4.1-py37h34e0f95_0 --> 6.1.0-py27h34e0f95_0
pip 19.0.3-py37_0 --> 19.1.1-py27_0
pluggy pkgs/main/linux-64::pluggy-0.9.0-py37~ --> pkgs/main/noarch::pluggy-0.11.0-py_0
prometheus_client pkgs/main/linux-64::prometheus_client~ --> pkgs/main/noarch::prometheus_client-0.7.1-py_0
psutil 5.6.1-py37h7b6447c_0 --> 5.6.3-py27h7b6447c_0
pygments pkgs/main/linux-64::pygments-2.3.1-py~ --> pkgs/main/noarch::pygments-2.4.2-py_0
pyparsing pkgs/main/linux-64::pyparsing-2.3.1-p~ --> pkgs/main/noarch::pyparsing-2.4.0-py_0
pysocks 1.6.8-py37_0 --> 1.7.0-py27_0
pytest 4.3.1-py37_0 --> 4.5.0-py27_0
python-libarchive~ 2.8-py37_6 --> 2.8-py27_11
pytz pkgs/main/linux-64::pytz-2018.9-py37_0 --> pkgs/main/noarch::pytz-2019.1-py_0
pywavelets 1.0.2-py37hdd07704_0 --> 1.0.3-py27hdd07704_1
pyyaml 5.1-py37h7b6447c_0 --> 5.1.1-py27h7b6447c_0
qtconsole pkgs/main/linux-64::qtconsole-4.4.3-p~ --> pkgs/main/noarch::qtconsole-4.5.2-py_0
qtpy pkgs/main/linux-64::qtpy-1.7.0-py37_1 --> pkgs/main/noarch::qtpy-1.8.0-py_0
requests 2.21.0-py37_0 --> 2.22.0-py27_0
rope pkgs/main/linux-64::rope-0.12.0-py37_0 --> pkgs/main/noarch::rope-0.14.0-py_0
setuptools 40.8.0-py37_0 --> 41.0.1-py27_0
snowballstemmer pkgs/main/linux-64::snowballstemmer-1~ --> pkgs/main/noarch::snowballstemmer-1.9.0-py_0
sphinxcontrib-web~ pkgs/main/linux-64::sphinxcontrib-web~ --> pkgs/main/noarch::sphinxcontrib-websupport-1.1.2-py_0
spyder 3.3.3-py37_0 --> 3.3.6-py27_0
spyder-kernels 0.4.2-py37_0 --> 0.5.1-py27_0
sqlalchemy 1.3.1-py37h7b6447c_0 --> 1.3.5-py27h7b6447c_0
statsmodels 0.9.0-py37h035aef0_0 --> 0.10.1-py27hdd07704_0
sympy 1.3-py37_0 --> 1.4-py27_0
tblib pkgs/main/linux-64::tblib-1.3.2-py37_0 --> pkgs/main/noarch::tblib-1.4.0-py_0
terminado 0.8.1-py37_1 --> 0.8.2-py27_0
toolz pkgs/main/linux-64::toolz-0.9.0-py37_0 --> pkgs/main/noarch::toolz-0.10.0-py_0
tqdm pkgs/main/linux-64::tqdm-4.31.1-py37_1 --> pkgs/main/noarch::tqdm-4.32.1-py_0
urllib3 1.24.1-py37_0 --> 1.24.2-py27_0
werkzeug pkgs/main/linux-64::werkzeug-0.14.1-p~ --> pkgs/main/noarch::werkzeug-0.15.4-py_0
wheel 0.33.1-py37_0 --> 0.33.4-py27_0
widgetsnbextension 3.4.2-py37_0 --> 3.5.0-py27_0
wrapt 1.11.1-py37h7b6447c_0 --> 1.11.2-py27h7b6447c_0
xlsxwriter pkgs/main/linux-64::xlsxwriter-1.1.5-~ --> pkgs/main/noarch::xlsxwriter-1.1.8-py_0
zict pkgs/main/linux-64::zict-0.1.4-py37_0 --> pkgs/main/noarch::zict-1.0.0-py_0
zipp pkgs/main/linux-64::zipp-0.3.3-py37_1 --> pkgs/main/noarch::zipp-0.5.1-py_0

The following packages will be DOWNGRADED:

_ipyw_jlab_nb_ext~ 0.1.0-py37_0 --> 0.1.0-py27_0
alabaster 0.7.12-py37_0 --> 0.7.12-py27_0
anaconda-client 1.7.2-py37_0 --> 1.7.2-py27_0
anaconda-navigator 1.9.7-py37_0 --> 1.9.7-py27_0
asn1crypto 0.24.0-py37_0 --> 0.24.0-py27_0
astroid 2.2.5-py37_0 --> 1.6.5-py27_0
astropy 3.1.2-py37h7b6447c_0 --> 2.0.9-py27hdd07704_0
atomicwrites 1.3.0-py37_1 --> 1.3.0-py27_1
attrs 19.1.0-py37_1 --> 19.1.0-py27_1
backcall 0.1.0-py37_0 --> 0.1.0-py27_0
backports.os 0.1.1-py37_0 --> 0.1.1-py27_0
backports.shutil_~ 1.0.0-py37_2 --> 1.0.0-py27_2
beautifulsoup4 4.7.1-py37_1 --> 4.6.3-py27_0
bkcharts 0.2-py37_0 --> 0.2-py27_0
bleach 3.1.0-py37_0 --> 3.1.0-py27_0
boto 2.49.0-py37_0 --> 2.49.0-py27_0
bottleneck 1.2.1-py37h035aef0_1 --> 1.2.1-py27h035aef0_1
click 7.0-py37_0 --> 7.0-py27_0
clyent 1.2.2-py37_1 --> 1.2.2-py27_1
colorama 0.4.1-py37_0 --> 0.4.1-py27_0
contextlib2 0.5.5-py37_0 --> 0.5.5-py27_0
cycler 0.10.0-py37_0 --> 0.10.0-py27_0
decorator 4.4.0-py37_1 --> 4.4.0-py27_1
entrypoints 0.3-py37_0 --> 0.3-py27_0
et_xmlfile 1.0.1-py37_0 --> 1.0.1-py27_0
future 0.17.1-py37_0 --> 0.17.1-py27_0
gevent 1.4.0-py37h7b6447c_0 --> 1.4.0-py27h7b6447c_0
gmpy2 2.0.8-py37h10f8cd9_2 --> 2.0.8-py27h10f8cd9_2
greenlet 0.4.15-py37h7b6447c_0 --> 0.4.15-py27h7b6447c_0
h5py 2.9.0-py37h7918eee_0 --> 2.9.0-py27h7918eee_0
heapdict 1.0.0-py37_2 --> 1.0.0-py27_2
html5lib 1.0.1-py37_0 --> 1.0.1-py27_0
idna 2.8-py37_0 --> 2.8-py27_0
imageio 2.5.0-py37_0 --> 2.5.0-py27_0
imagesize 1.1.0-py37_0 --> 1.1.0-py27_0
ipykernel 5.1.0-py37h39e3cac_0 --> 4.10.0-py27_0
ipython 7.4.0-py37h39e3cac_0 --> 5.8.0-py27_0
ipython_genutils 0.2.0-py37_0 --> 0.2.0-py27_0
itsdangerous 1.1.0-py37_0 --> 1.1.0-py27_0
jedi 0.13.3-py37_0 --> 0.13.3-py27_0
jsonschema 3.0.1-py37_0 --> 3.0.1-py27_0
jupyter 1.0.0-py37_7 --> 1.0.0-py27_7
jupyter_console 6.0.0-py37_0 --> 5.2.0-py27_1
jupyterlab 0.35.4-py37hf63ae98_0 --> 0.33.11-py27_0
keyring 18.0.0-py37_0 --> 18.0.0-py27_0
locket 0.2.0-py37_1 --> 0.2.0-py27_1
markupsafe 1.1.1-py37h7b6447c_0 --> 1.1.1-py27h7b6447c_0
matplotlib 3.0.3-py37h5429711_0 --> 2.2.3-py27hb69df0a_0
mccabe 0.6.1-py37_1 --> 0.6.1-py27_1
mistune 0.8.4-py37h7b6447c_0 --> 0.8.4-py27h7b6447c_0
mkl_random 1.0.2-py37hd81dba3_0 --> 1.0.2-py27hd81dba3_0
more-itertools 6.0.0-py37_0 --> 5.0.0-py27_0
mpmath 1.1.0-py37_0 --> 1.1.0-py27_0
msgpack-python 0.6.1-py37hfd86e86_1 --> 0.6.1-py27hfd86e86_1
multipledispatch 0.6.0-py37_0 --> 0.6.0-py27_0
navigator-updater 0.2.1-py37_0 --> 0.2.1-py27_0
nbformat 4.4.0-py37_0 --> 4.4.0-py27_0
networkx 2.2-py37_1 --> 2.2-py27_1
nose 1.3.7-py37_2 --> 1.3.7-py27_2
notebook 5.7.8-py37_0 --> 5.7.8-py27_0
numexpr 2.6.9-py37h9e4a6bb_0 --> 2.6.9-py27h9e4a6bb_0
olefile 0.46-py37_0 --> 0.46-py27_0
packaging 19.0-py37_0 --> 19.0-py27_0
pandas 0.24.2-py37he6710b0_0 --> 0.24.2-py27he6710b0_0
pandocfilters 1.4.2-py37_1 --> 1.4.2-py27_1
path.py 11.5.0-py37_0 --> 11.1.0-py27_0
patsy 0.5.1-py37_0 --> 0.5.1-py27_0
pep8 1.7.1-py37_0 --> 1.7.1-py27_0
pickleshare 0.7.5-py37_0 --> 0.7.5-py27_0
pkginfo 1.5.0.1-py37_0 --> 1.5.0.1-py27_0
ply 3.11-py37_0 --> 3.11-py27_0
prompt_toolkit 2.0.9-py37_0 --> 1.0.15-py27_0
ptyprocess 0.6.0-py37_0 --> 0.6.0-py27_0
py 1.8.0-py37_0 --> 1.8.0-py27_0
py-lief 0.9.0-py37h7725739_2 --> 0.9.0-py27h7725739_2
pycodestyle 2.5.0-py37_0 --> 2.5.0-py27_0
pycosat 0.6.3-py37h14c3975_0 --> 0.6.3-py27h14c3975_0
pycparser 2.19-py37_0 --> 2.19-py27_0
pycrypto 2.6.1-py37h14c3975_9 --> 2.6.1-py27h14c3975_9
pycurl 7.43.0.2-py37h1ba5d50_0 --> 7.43.0.2-py27h1ba5d50_0
pyflakes 2.1.1-py37_0 --> 2.1.1-py27_0
pylint 2.3.1-py37_0 --> 1.9.2-py27_0
pyodbc 4.0.26-py37he6710b0_0 --> 4.0.26-py27he6710b0_0
pyopenssl 19.0.0-py37_0 --> 19.0.0-py27_0
pyqt 5.9.2-py37h05f1152_2 --> 5.9.2-py27h05f1152_2
pyrsistent 0.14.11-py37h7b6447c_0 --> 0.14.11-py27h7b6447c_0
pytables 3.5.1-py37h71ec239_0 --> 3.5.1-py27h71ec239_0
pytest-arraydiff 0.3-py37h39e3cac_0 --> 0.3-py27h39e3cac_0
pytest-astropy 0.5.0-py37_0 --> 0.5.0-py27_0
pytest-doctestplus 0.3.0-py37_0 --> 0.3.0-py27_0
pytest-openfiles 0.3.2-py37_0 --> 0.3.2-py27_0
pytest-remotedata 0.3.1-py37_0 --> 0.3.1-py27_0
python 3.7.3-h0371630_0 --> 2.7.16-h9bab390_0
python-dateutil 2.8.0-py37_0 --> 2.8.0-py27_0
pyzmq 18.0.0-py37he6710b0_0 --> 18.0.0-py27he6710b0_0
qtawesome 0.5.7-py37_1 --> 0.5.7-py27_1
ruamel_yaml 0.15.46-py37h14c3975_0 --> 0.15.46-py27h14c3975_0
scikit-image 0.14.2-py37he6710b0_0 --> 0.14.2-py27he6710b0_0
scikit-learn 0.20.3-py37hd81dba3_0 --> 0.20.3-py27hd81dba3_0
scipy 1.2.1-py37h7c811a0_0 --> 1.2.1-py27h7c811a0_0
seaborn 0.9.0-py37_0 --> 0.9.0-py27_0
send2trash 1.5.0-py37_0 --> 1.5.0-py27_0
simplegeneric 0.8.1-py37_2 --> 0.8.1-py27_2
singledispatch 3.4.0.3-py37_0 --> 3.4.0.3-py27_0
sip 4.19.8-py37hf484d3e_0 --> 4.19.8-py27hf484d3e_0
six 1.12.0-py37_0 --> 1.12.0-py27_0
sortedcollections 1.1.2-py37_0 --> 1.1.2-py27_0
sortedcontainers 2.1.0-py37_0 --> 2.1.0-py27_0
sphinx 1.8.5-py37_0 --> 1.8.5-py27_0
sphinxcontrib 1.0-py37_1 --> 1.0-py27_1
testpath 0.4.2-py37_0 --> 0.4.2-py27_0
tornado 6.0.2-py37h7b6447c_0 --> 5.1.1-py27h7b6447c_0
traitlets 4.3.2-py37_0 --> 4.3.2-py27_0
unicodecsv 0.14.1-py37_0 --> 0.14.1-py27_0
wcwidth 0.1.7-py37_0 --> 0.1.7-py27_0
webencodings 0.5.1-py37_1 --> 0.5.1-py27_1
wurlitzer 1.0.2-py37_0 --> 1.0.2-py27_0
xlrd 1.2.0-py37_0 --> 1.2.0-py27_0
xlwt 1.3.0-py37_0 --> 1.3.0-py27_0

Proceed ([y]/n)?

@JovanVeljanoski
Member

Hi,

Can you please try installing from conda-forge?

conda install -c conda-forge vaex

Ideally do this in a clean environment.

Cheers,
Jovan.

@fprada
Author

fprada commented Jul 30, 2019

Thanks! It now seems to work after following your suggestions. But I get this error when reading my HDF5 table:

In [15]: ds = vaex.open('/home/users/dae/ishiyama/Uchuu/Rockstar/007/out_7.rockstar.0.hdf5')
ERROR:MainThread:vaex:error opening '/home/users/dae/ishiyama/Uchuu/Rockstar/007/out_7.rockstar.0.hdf5'

ValueError Traceback (most recent call last)
in
----> 1 ds = vaex.open('/home/users/dae/ishiyama/Uchuu/Rockstar/007/out_7.rockstar.0.hdf5')

~/.conda/envs/vaexenv/lib/python3.7/site-packages/vaex/__init__.py in open(path, convert, shuffle, copy_index, *args, **kwargs)
189 ds = from_csv(path, copy_index=copy_index, **kwargs)
190 else:
--> 191 ds = vaex.file.open(path, *args, **kwargs)
192 if convert and ds:
193 ds.export_hdf5(filename_hdf5, shuffle=shuffle)

~/.conda/envs/vaexenv/lib/python3.7/site-packages/vaex/file/__init__.py in open(path, *args, **kwargs)
39 break
40 if dataset_class:
---> 41 dataset = dataset_class(path, *args, **kwargs)
42 return dataset
43

~/.conda/envs/vaexenv/lib/python3.7/site-packages/vaex/hdf5/dataset.py in __init__(self, filename, write)
84 self.h5table_root_name = None
85 self._version = 1
---> 86 self._load()
87
88 def write_meta(self):

~/.conda/envs/vaexenv/lib/python3.7/site-packages/vaex/hdf5/dataset.py in _load(self)
186 if len(root_datasets):
187 # if we have datasets at the root, we assume 'version 1'
--> 188 self._load_columns(self.h5file)
189 self.h5table_root_name = "/"
190

~/.conda/envs/vaexenv/lib/python3.7/site-packages/vaex/hdf5/dataset.py in _load_columns(self, h5data, first)
341 self.add_column(column_name, self._map_hdf5_array(data, column['mask']))
342 else:
--> 343 self.add_column(column_name, self._map_hdf5_array(data))
344 else:
345 transposed = shape[1] < shape[0]

~/.conda/envs/vaexenv/lib/python3.7/site-packages/vaex/dataframe.py in add_column(self, name, f_or_array, dtype)
2757 if len(self) == len(ar):
2758 raise ValueError("Array is of length %s, while the length of the DataFrame is %s due to the filtering, the (unfiltered) length is %s." % (len(ar), len(self), self.length_unfiltered()))
-> 2759 raise ValueError("array is of length %s, while the length of the DataFrame is %s" % (len(ar), self.length_original()))
2760 # assert self.length_unfiltered() == len(data), "columns should be of equal length, length should be %d, while it is %d" % ( self.length_unfiltered(), len(data))
2761 self.columns[name] = f_or_array

ValueError: array is of length 16, while the length of the DataFrame is 69882354

In [16]:

@JovanVeljanoski
Member

Hi,

How did you create the hdf5 file? The data can be stored in multiple ways inside an hdf5 file.
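
One way to answer this is to list what the file actually contains. The sketch below is a minimal example, assuming h5py is installed; `list_datasets` is a hypothetical helper (not part of vaex or h5py) that walks an open file and records the path, shape, and dtype of every dataset. Datasets of different lengths at the root, for example one of length 16 next to columns with 69882354 rows, would be consistent with the ValueError above.

```python
import h5py


def list_datasets(h5file):
    """Return (path, shape, dtype) for every dataset in an open h5py file.

    Hypothetical helper for inspection only; not part of vaex or h5py.
    """
    found = []

    def visit(name, obj):
        # visititems calls this for every group and dataset in the file;
        # groups are skipped so that only array-like objects are reported.
        if isinstance(obj, h5py.Dataset):
            found.append((name, obj.shape, str(obj.dtype)))

    h5file.visititems(visit)
    return found
```

To use it, open the file read-only, for example `with h5py.File(path, "r") as f: print(list_datasets(f))`, where `path` is one of your hdf5 files.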

@fprada
Author

fprada commented Jul 30, 2019

Hi,

The hdf5 files were created by our group with our own C code, starting from a huge ASCII file that is split and then converted into several hdf5 files. I guess we are not following a standard format? You can take a look at the code here: https://bitbucket.org/cnvega/rockstar_outputs/src/default/

Your help is very much welcome! Thanks.

@JovanVeljanoski
Member

The easiest and fastest way would probably be to use vaex to read the ascii file and output one or more hdf5 files. Then you are guaranteed compatibility. Maybe I could help with this, if you send me a couple of lines from that ascii file?

In general, you can use vaex.read_csv, which uses pandas.read_csv to read a text file into memory. It does not have to be a csv file; it can be any ascii file, you can define the delimiter, and I think it does have support for standard ascii files.

I hope this helps.

@fprada
Author

fprada commented Jul 30, 2019

Thank you! Indeed starting from the ascii certainly would be the best. It'd be great if you can help with this. Please find below the first lines of the ascii (halo catalog) which includes the header and data for 4 halos:

#ID DescID Mvir Vmax Vrms Rvir Rs Np X Y Z VX VY VZ JX JY JZ Spin rs_klypin Mvir_all M200b M200c M500c M2500c Xoff Voff spin_bullock b_to_a c_to_a A[x] A[y] A[z] b_to_a(500c) c_to_a(500c) Ax Ay Az T/|U| M_pe_Behroozi M_pe_Diemer Halfmass_Radius rvmax PID
#a = 0.537760
#Om = 0.308900; Ol = 0.691100; h = 0.677400
#FOF linking length: 0.280000
#Unbound Threshold: 0.500000; FOF Refinement Threshold: 0.700000
#Particle mass: 3.27018e+08 Msun/h
#Box size: 2000.000000 Mpc/h
#Force resolution assumed: 0.00427 Mpc/h
#Units: Masses in Msun / h
#Units: Positions in Mpc / h (comoving)
#Units: Velocities in km / s (physical, peculiar)
#Units: Halo Distances, Lengths, and Radii in kpc / h (comoving)
#Units: Angular Momenta in (Msun/h) * (Mpc/h) * km/s (physical)
#Units: Spins are dimensionless
#Np is an internal debugging quantity.
#Rockstar Version: 0.99.9-RC3+
11721 -1 3.270e+09 28.59 31.33 35.278 7.694 45 1.01573 0.49961 0.27360 9.30 -121.61 -692.93 2.487e+08 -1.555e+08 -8.470e+07 0.06865 7.69378 3.9242e+09 3.2702e+09 3.2702e+09 0.0000e+00 0.0000e+00 5.81215 11.02 0.12781 0.11185 0.04569 -6.87745 -1.79801 20.51143 0.00000 0.00000 0.00000 0.00000 0.00000 1.2660 3.083e+09 2.943e+09 22.007 28.796 11723
93688 -1 4.251e+09 30.67 38.40 38.502 9.779 39 3.55980 3.13285 1.20113 -109.68 -70.84 -368.92 3.505e+08 3.955e+08 -1.963e+08 0.11550 9.77894 4.5783e+09 4.2512e+09 2.9432e+09 0.0000e+00 0.0000e+00 13.06878 9.52 0.15239 0.09049 0.00000 1.25496 12.71236 13.98056 0.00000 0.00000 0.00000 0.00000 0.00000 1.5498 5.686e+09 4.251e+09 28.150 36.152 -1
11722 -1 9.810e+08 19.56 5.01 23.616 4.395 50 1.16682 0.36458 0.13502 48.84 -281.26 -530.28 2.162e+08 -9.036e+07 -8.889e+07 1.50052 4.39534 1.3081e+09 9.8105e+08 9.8105e+08 0.0000e+00 0.0000e+00 16.63480 44.30 0.78025 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 14.2525 9.997e+08 3.270e+08 18.805 20.511 11723
62160 -1 3.009e+10 61.51 55.45 73.920 13.338 106 2.04250 0.29428 4.37749 -116.57 -18.30 -463.31 4.077e+08 2.614e+09 -1.089e+09 0.03017 13.33807 3.0086e+10 3.0413e+10 2.7470e+10 1.9948e+10 7.8484e+09 6.86375 0.00 0.02965 0.74100 0.59323 0.55459 8.36537 3.35550 0.63043 0.55334 0.83416 7.19021 -0.82567 0.4740 2.806e+10 2.485e+10 34.376 52.545 -1

@JovanVeljanoski
Member

Ah, this is from ROCKSTAR, the clustering algorithm, right? It should be straightforward to read in the data then.

All you need to do is this:

import vaex
names = ['ID', 'DescID', 'Mvir', 'Vmax', 'Vrms', 'Rvir', 'Rs', 'Np', 'X', 'Y', 'Z', 'VX', 'VY', 'VZ', 'JX', 'JY', 'JZ', 'Spin', 'rs_klypin', 'Mvir_all', 'M200b', 'M200c', 'M500c', 'M2500c', 'Xoff', 'Voff', 'spin_bullock', 'b_to_a', 'c_to_a', 'A[x]', 'A[y]', 'A[z]', 'b_to_a(500c)', 'c_to_a(500c)', 'Ax', 'Ay', 'Az', 'T/|U|', 'M_pe_Behroozi', 'M_pe_Diemer', 'Halfmass_Radius', 'rvmax', 'PID']

ds = vaex.read_csv(filepath_or_buffer='data.txt', delim_whitespace=True, comment='#', header=None, names=names, copy_index=False)

where data.txt is just the data you sent above copied to a plain text file, and names is a list with the names of each column, which I took from the data you sent.

Alternatively, you can set the header to be inferred. That requires the top non-comment line of the file to contain all column names. You can either edit the file to achieve this, or perhaps adjust the output of ROCKSTAR such that the header is a bit different.
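
Another option, which avoids editing the file or typing the names by hand, is to take the column names from the commented header line. This is a minimal sketch, assuming the first line of the file starts with `#` and holds all column names on a single physical line separated by whitespace (as in the ROCKSTAR sample above); `read_rockstar_header` is a hypothetical helper name, not part of vaex or pandas.

```python
import io


def read_rockstar_header(handle):
    """Return the column names found on the first line of an open text file.

    Assumes that line looks like '#ID DescID Mvir ...'. Hypothetical helper.
    """
    first = handle.readline()
    if not first.startswith("#"):
        raise ValueError("expected the first line to start with '#'")
    # Drop the leading '#' and split on any run of whitespace.
    return first.lstrip("#").split()


# Small in-memory example standing in for a real catalog file.
sample = io.StringIO("#ID DescID Mvir\n#a = 0.537760\n11721 -1 3.270e+09\n")
names = read_rockstar_header(sample)
```

The resulting list can then be passed as `names=names` to the read_csv call shown above.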

Hope this helps. Please let me know if this works.

@fprada
Author

fprada commented Jul 30, 2019

That's right! We are using ROCKSTAR to create the halo catalogs. In this case it is for a new two-trillion-particle N-body simulation! ROCKSTAR provides an ASCII file for each time epoch. The ASCII file is huge: we have more than 4 billion halos! This is why we converted the ASCII file to hdf5, and also split it to help with the file transfer.

OK. Good. Let me then follow your advice and use vaex.read_csv ...

Thank you!

@fprada
Author

fprada commented Jul 30, 2019

Hi Jovan,

I forgot to ask: once we have read the ASCII file in vaex, how can we convert it into several hdf5 files?

Thanks!

@JovanVeljanoski
Member

Once you read everything in:

df.export_hdf5('/somewhere/on/disk/file.hdf5', progress=True)
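
If you want several hdf5 files rather than one, a possible approach is to export row ranges one at a time. This is a minimal sketch, assuming the DataFrame supports `len()` and row slicing such as `df[start:stop]` (vaex DataFrames do); `chunk_bounds` and `export_in_parts` are hypothetical helper names, and the file name pattern is only an example.

```python
def chunk_bounds(n_rows, rows_per_file):
    """Split the range [0, n_rows) into consecutive (start, stop) pairs."""
    if rows_per_file <= 0:
        raise ValueError("rows_per_file must be positive")
    return [
        (start, min(start + rows_per_file, n_rows))
        for start in range(0, n_rows, rows_per_file)
    ]


def export_in_parts(df, pattern, rows_per_file):
    """Export df to one hdf5 file per row range and return the paths written.

    `pattern` is a format string such as 'halos_{:03d}.hdf5' (example name).
    """
    paths = []
    for i, (start, stop) in enumerate(chunk_bounds(len(df), rows_per_file)):
        path = pattern.format(i)
        # Slicing selects the rows; export_hdf5 writes them to disk.
        df[start:stop].export_hdf5(path)
        paths.append(path)
    return paths
```

For example, `export_in_parts(df, 'halos_{:03d}.hdf5', 100_000_000)` would write files of at most 100 million rows each.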

You may want to read through
https://vaex.readthedocs.io/en/latest/tutorial.html
just to maximize the value from using vaex.

Cheers

@fprada
Author

fprada commented Jul 30, 2019

Got it, thanks! Let me work on it. Keep in touch.
Best

@fprada
Author

fprada commented Jul 30, 2019

Hi,

vaex read the ASCII file well and it worked fine, great!

When I try to create an hdf5 version with df.export_hdf5('/somewhere/on/disk/file.hdf5', progress=True), I get this error:


OSError Traceback (most recent call last)
in
----> 1 ds.export_hdf5("test.hdf5", progress=True)

~/.conda/envs/vaexenv/lib/python3.7/site-packages/vaex/dataframe.py in export_hdf5(self, path, column_names, byteorder, shuffle, selection, progress, virtual, sort, ascending)
5066 """
5067 import vaex.export
-> 5068 vaex.export.export_hdf5(self, path, column_names, byteorder, shuffle, selection, progress=progress, virtual=virtual, sort=sort, ascending=ascending)
5069
5070 def export_fits(self, path, column_names=None, shuffle=False, selection=False, progress=None, virtual=False, sort=None, ascending=True):

~/.conda/envs/vaexenv/lib/python3.7/site-packages/vaex/export.py in export_hdf5(dataset, path, column_names, byteorder, shuffle, selection, progress, virtual, sort, ascending)
340 kwargs = locals()
341 import vaex.hdf5.export
--> 342 vaex.hdf5.export.export_hdf5(**kwargs)
343
344

~/.conda/envs/vaexenv/lib/python3.7/site-packages/vaex/hdf5/export.py in export_hdf5(dataset, path, column_names, byteorder, shuffle, selection, progress, virtual, sort, ascending)
124 selection = "default"
125 # first open file using h5py api
--> 126 with h5py.File(path, "w") as h5file_output:
127
128 h5table_output = h5file_output.require_group("/table")

~/.conda/envs/vaexenv/lib/python3.7/site-packages/h5py/_hl/files.py in __init__(self, name, mode, driver, libver, userblock_size, swmr, rdcc_nslots, rdcc_nbytes, rdcc_w0, track_order, **kwds)
392 fid = make_fid(name, mode, userblock_size,
393 fapl, fcpl=make_fcpl(track_order=track_order),
--> 394 swmr=swmr)
395
396 if swmr_support:

~/.conda/envs/vaexenv/lib/python3.7/site-packages/h5py/_hl/files.py in make_fid(name, mode, userblock_size, fapl, fcpl, swmr)
174 fid = h5f.create(name, h5f.ACC_EXCL, fapl=fapl, fcpl=fcpl)
175 elif mode == 'w':
--> 176 fid = h5f.create(name, h5f.ACC_TRUNC, fapl=fapl, fcpl=fcpl)
177 elif mode == 'a':
178 # Open in append mode (read/write).

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/h5f.pyx in h5py.h5f.create()

OSError: Unable to create file (unable to truncate a file which is already open)

@maartenbreddels
Member

Odd, are you maybe writing to a file you already opened? Can you change the filename?

@fprada
Author

fprada commented Jul 30, 2019

Yeah, I changed the filename and the error persists. Yet I've noticed that the hdf5 file is created in the directory ...

@JovanVeljanoski
Member

Hi @fprada

Can you tell me which version of h5py you have installed in the same env as vaex?

Can you try writing to a different directory altogether?
You can also try exporting to arrow or parquet format.

Also, if that does not work, can you give us the output of
df.dtypes

Cheers

@JovanVeljanoski
Member

(Oops, sorry, I closed it by mistake.)

On the positive side, I think I figured it out. I think some of the column names are too exotic for h5py, in particular 'T/|U|' and potentially 'A[x]' and 'c_to_a(500c)'.

I suggest renaming the columns so that the names contain only letters (lower or upper case), numbers, and underscores. Other characters such as [, (, \, /, ? etc. may raise issues. I am not sure at this point whether this is due to vaex or h5py.

Please try using simpler column names, and then exporting.
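
A minimal sketch of such a renaming, using only the standard library: every character that is not a letter, digit, or underscore is replaced by an underscore. `sanitize` and `sanitize_all` are hypothetical helpers written for illustration, not vaex functions, and the exact replacement scheme is an assumption.

```python
import re


def sanitize(name):
    """Replace every character that is not a letter, digit, or underscore."""
    clean = re.sub(r"[^0-9A-Za-z_]", "_", name)
    # Avoid empty names and names that start with a digit.
    if not clean or clean[0].isdigit():
        clean = "_" + clean
    return clean


def sanitize_all(names):
    """Sanitize a list of names, adding a numeric suffix on collisions."""
    used = set()
    result = []
    for name in names:
        base = sanitize(name)
        candidate = base
        suffix = 1
        while candidate in used:
            candidate = f"{base}_{suffix}"
            suffix += 1
        used.add(candidate)
        result.append(candidate)
    return result


clean_names = sanitize_all(["Mvir", "A[x]", "c_to_a(500c)", "T/|U|"])
```

With this scheme 'T/|U|' becomes 'T__U_' and 'A[x]' becomes 'A_x_', so no name contains a '/' any more.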

Cheers.

@maartenbreddels
Member

@fprada
With the file example you gave I could reproduce your issue, but there were actually two issues.

The first time you run it, you see a different stacktrace than the second time (that confused me!).

The first time it got confused by 'T/|U|', which h5py interprets as a group 'T' with a dataset '|U|'. This should be fixed by #370 (I'll keep this open till it is released).

The second time it complains that the file is already open (which is the stacktrace you gave). I think we can improve that as well.

The workaround, for now, is what Jovan suggested:

import vaex
names = ['ID', 'DescID', 'Mvir', 'Vmax', 'Vrms', 'Rvir', 'Rs', 'Np', 'X', 'Y', 'Z', 'VX', 'VY', 'VZ', 'JX', 'JY', 'JZ', 'Spin', 'rs_klypin', 'Mvir_all', 'M200b', 'M200c', 'M500c', 'M2500c', 'Xoff', 'Voff', 'spin_bullock', 'b_to_a', 'c_to_a', 'A[x]', 'A[y]', 'A[z]', 'b_to_a(500c)', 'c_to_a(500c)', 'Ax', 'Ay', 'Az', 'T/|U|', 'M_pe_Behroozi', 'M_pe_Diemer', 'Halfmass_Radius', 'rvmax', 'PID']
names = [vaex.utils.find_valid_name(k) for k in names]
df = vaex.read_csv('somefile.csv', delim_whitespace=True, comment='#', header=None, names=names, copy_index=False)
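
The first issue can be reproduced with h5py alone. This is a minimal sketch, assuming h5py is installed; it uses an in-memory file (the `core` driver with `backing_store=False`) so nothing is written to disk, and the file name is a placeholder.

```python
import h5py

# In-memory HDF5 file: the name is a placeholder and no file is created on disk.
with h5py.File("demo.hdf5", "w", driver="core", backing_store=False) as f:
    # The '/' in the name is treated as a path separator, so h5py creates
    # a group 'T' and stores the dataset under it as '|U|'.
    f.create_dataset("T/|U|", data=[1.2660, 1.5498])
    top_level = list(f.keys())
    nested = list(f["T"].keys())
    is_group = isinstance(f["T"], h5py.Group)

print(top_level)
print(nested)
print(is_group)
```

The top level holds only the group 'T', and the column itself sits one level down, which is not where a reader expecting a flat list of column datasets would look for it.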

@fprada
Author

fprada commented Jul 31, 2019

Excellent, it works! Thanks very much Maarten and Jovan for your help. Now it creates the hdf5 file, and when I read it with vaex everything looks fine. Great.

Now, reading a much bigger ascii file (230 GB) with vaex takes really long (it is still reading after 1.5 hrs, using all 128 GB of RAM while running on 1 CPU). Is there a way to speed up the reading? Why does it take all that RAM?

There are 661592956 rows in the original ascii file. Note that this is a file with only 1/8 of the entire Rockstar data, which contains about 5 billion rows for one redshift snapshot of the simulation :-)

Let me also mention that when I exported the previous ascii file to hdf5, I noticed that its size is about the same. Our hdf5 file created with C is about half the size. Is there a way in vaex to reduce the size (some compression?) when exporting hdf5? This is our main interest in having the data in hdf5 instead of ascii.

I should mention that our interest in vaex is to provide efficient manipulation and analysis of our data for the entire astronomical community. We plan to have a first data release soon. Thanks again for all your support!

@JovanVeljanoski
Member

Hi @fprada

I am happy to hear that it works.

Perhaps it is best to open another issue for any follow-up questions, so as not to divert this thread too much, but I will offer some advice here.

To your first point: you are trying to read a 230 GB file but have only 128 GB of RAM, which limits how much you can effectively read into memory at one time. Your computer is probably using swap space as additional RAM, but this is much slower and is best avoided if possible.

How to deal with this: we will eventually support converting larger-than-memory text (csv, ascii) files to hdf5 out of the box, but we are busy working on other things right now, so this will perhaps happen in a month or two.

In the meantime you can do the following: familiarize yourself with pandas.read_csv, which is what vaex uses to read csv/ascii files. You will see that pandas.read_csv supports reading files in chunks, so read only as many lines as fit into the RAM of your machine and export that to hdf5. Then do the same with the next portion of your massive text file, and so on. You can write a loop/iterator to do this for you. You will end up with a bunch of hdf5 files, which you can open all together with vaex.open_many; the result will be a single DataFrame, just as if you had opened a single massive hdf5 file. If you prefer to store a single hdf5 file, you can then export this DataFrame into a single hdf5 file and remove the smaller hdf5 files. There may be (small?) performance benefits to working with a single hdf5 file, but it should not matter much.
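
The loop described above could look roughly like this. It is a sketch under stated assumptions, not a tested recipe for a 230 GB file: `convert_in_chunks` and `export_with_vaex` are hypothetical helper names, the output pattern is an example, and `sep=r"\s+"` is used as the whitespace delimiter. The export step is passed in as a function so the chunking can be tried without vaex installed.

```python
import pandas as pd


def export_with_vaex(chunk, path):
    """Default export step: write one pandas chunk to hdf5 through vaex."""
    import vaex  # imported lazily so the chunking logic works without vaex

    vaex.from_pandas(chunk, copy_index=False).export_hdf5(path)


def convert_in_chunks(text_path, names, chunk_rows, out_pattern,
                      export=export_with_vaex):
    """Read a whitespace-delimited text file in chunks; export each chunk.

    Returns the list of output paths, e.g. for vaex.open_many.
    """
    reader = pd.read_csv(
        text_path,
        sep=r"\s+",            # any run of whitespace separates columns
        comment="#",           # skip the commented header lines
        header=None,
        names=names,
        chunksize=chunk_rows,  # makes read_csv return an iterator of frames
    )
    paths = []
    for i, chunk in enumerate(reader):
        path = out_pattern.format(i)
        export(chunk, path)
        paths.append(path)
    return paths
```

Afterwards, `df = vaex.open_many(paths)` opens all parts as one DataFrame. Choose `chunk_rows` so that a single chunk fits comfortably in RAM.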

Once you have the data in hdf5, regardless of its size, you can work with the entire dataset, because vaex uses memory mapping: you are not actually reading the 200+ GB into memory all at once, as you would with a typical csv or ascii file. This is why we convert the data to hdf5 (or arrow, or parquet).

About the size of the hdf5: when the ascii data is read by python, it is stored as float64 data type in memory, and as such it is exported to the hdf5 file, which takes more space than the few decimal places you have in the raw ascii file. What you could do is for instance use float32 if you do not need the extra precision. This way the data file will be smaller.
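
As a rough illustration of the float32 suggestion, the sketch below converts only the float64 columns of a pandas DataFrame before export and leaves integer columns such as IDs untouched. `downcast_floats` is a hypothetical helper, and the two-row frame is made up from values in the sample above. Keep in mind that float32 carries only about 7 significant decimal digits.

```python
import numpy as np
import pandas as pd


def downcast_floats(frame):
    """Return a copy of frame with float64 columns stored as float32."""
    float_cols = frame.select_dtypes(include="float64").columns
    return frame.astype({col: np.float32 for col in float_cols})


# Tiny example frame built from two rows of the sample catalog.
frame = pd.DataFrame({"ID": [11721, 93688], "Mvir": [3.270e09, 4.251e09]})
small = downcast_floats(frame)

bytes_before = frame["Mvir"].to_numpy().nbytes
bytes_after = small["Mvir"].to_numpy().nbytes
```

Each converted column takes half the bytes it did as float64, so a file made up mostly of float columns should shrink by close to a factor of two.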

We would be very grateful if you cite/mention the use of this project if it helps you out :)

@fprada
Author

fprada commented Jul 31, 2019

Hi,

FYI, after more than 2.5 hr the reading hasn't finished ... Still going.

Thanks.

@fprada
Author

fprada commented Jul 31, 2019

Thanks Jovan,

that's why we split the original big ascii file into several smaller hdf5 files. We did that with our own C code, but unfortunately vaex cannot recognise our format, likely because of the issue you pointed out with the column names. If we can solve this, then the best option would be to use vaex.open_many to read the many hdf5 files.

I will take a look at the pandas.read_csv ...

It'd be a pleasure to acknowledge vaex. Hopefully we will make use of it once we are able to make it work for our application ;-) It is an amazing tool! Congratulations.

Best.

@maartenbreddels
Member

But unfortunately vaex cannot recognise our format.

It might be possible to get it compatible, but I'm not sure what is more work now.

Thanks for your positive words, glad you find it helpful.

I'll close this issue, feel free to open new ones for new issues.

cheers,

Maarten

@fprada
Author

fprada commented Jul 31, 2019

Thank you! Thanks Maarten and Jovan.
I'll be back ;-)
Cheers,
Francisco.
