Quick Overview docs page #62

Merged: 5 commits, Mar 3, 2022
2 changes: 1 addition & 1 deletion docs/source/api.rst
@@ -11,7 +11,6 @@ DataTree
:toctree: generated/

DataTree
DataNode

Attributes
----------
@@ -51,6 +50,7 @@ Methods
.. autosummary::
:toctree: generated/

DataTree.from_dict
DataTree.load
DataTree.compute
DataTree.persist
6 changes: 4 additions & 2 deletions docs/source/conf.py
@@ -45,6 +45,8 @@
"sphinx.ext.intersphinx",
"sphinx.ext.extlinks",
"sphinx.ext.napoleon",
"IPython.sphinxext.ipython_console_highlighting",
"IPython.sphinxext.ipython_directive",
]

extlinks = {
@@ -76,9 +78,9 @@
# built documents.
#
# The short X.Y version.
version = "0.0.0" # datatree.__version__
version = "0.0.1" # datatree.__version__
# The full version, including alpha/beta/rc tags.
release = "0.0.0" # datatree.__version__
release = "0.0.1" # datatree.__version__

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
14 changes: 9 additions & 5 deletions docs/source/index.rst
@@ -1,17 +1,21 @@
Datatree
========

**Datatree is a WIP implementation of a tree-like hierarchical data structure for xarray.**
**Datatree is a prototype implementation of a tree-like hierarchical data structure for xarray.**


.. toctree::
:maxdepth: 2
:caption: Documentation Contents

installation
tutorial
api
contributing
Installation <installation>
Quick Overview <quick-overview>
Tutorial <tutorial>
API Reference <api>
How do I ... <howdoi>
Contributing Guide <contributing>
Development Roadmap <roadmap>
GitHub repository <https://github.com/TomNicholas/datatree>

Feedback
--------
19 changes: 18 additions & 1 deletion docs/source/installation.rst
@@ -2,4 +2,21 @@
Installation
============

Coming soon!
Datatree is not yet available on PyPI or via conda, so for now you will have to install it from source.

``git clone https://github.com/TomNicholas/datatree.git``

``pip install -e ./datatree/``

The main branch will be kept up-to-date, so cloning main gives you the most recent version.
To check that the installation works, run the test suite with ``pytest datatree``; it should report no failures.

You will need xarray and `anytree <https://github.com/c0fec0de/anytree>`_
as dependencies, with netcdf4, zarr, and h5netcdf as optional dependencies to allow file I/O.
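
To see which of the optional file I/O backends can be found in the current environment, a small check along the following lines may help. This is only a sketch: the helper ``check_optional_backends`` is hypothetical (it is not part of datatree), and the import names ``netCDF4``, ``zarr`` and ``h5netcdf`` are assumed to be the modules that those packages provide.

```python
from importlib.util import find_spec

# Import names assumed for the optional I/O dependencies listed above.
OPTIONAL_BACKENDS = ("netCDF4", "zarr", "h5netcdf")


def check_optional_backends(names=OPTIONAL_BACKENDS):
    """Return {module name: bool} saying whether each module can be located.

    Hypothetical helper for illustration; nothing is actually imported.
    """
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, available in check_optional_backends().items():
        print(f"{name}: {'installed' if available else 'missing'}")
```

A backend reported as missing only matters if you intend to read or write files in the corresponding format.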

.. note::

Datatree is very much still in the early stages of development. There may be functions that are present but whose
internals are not yet implemented, or significant changes to the API in future.
That said, if you try it out and find some behaviour that looks like a bug to you, please report it on the
`issue tracker <https://github.com/TomNicholas/datatree/issues>`_!
83 changes: 83 additions & 0 deletions docs/source/quick-overview.rst
@@ -0,0 +1,83 @@
##############
Quick overview
##############

DataTrees
---------

:py:class:`DataTree` is a tree-like container of ``DataArray`` objects, organised into multiple mutually alignable groups.
You can think of it like a (recursive) ``dict`` of ``Dataset`` objects.

Let's first make some example xarray datasets (following on from xarray's
`quick overview <https://docs.xarray.dev/en/stable/getting-started-guide/quick-overview.html>`_ page):

.. ipython:: python

    import numpy as np
    import xarray as xr

    data = xr.DataArray(np.random.randn(2, 3), dims=("x", "y"), coords={"x": [10, 20]})
    ds = xr.Dataset(dict(foo=data, bar=("x", [1, 2]), baz=np.pi))
    ds

    ds2 = ds.interp(coords={"x": [10, 12, 14, 16, 18, 20]})
    ds2

    ds3 = xr.Dataset(
        dict(people=["alice", "bob"], heights=("people", [1.57, 1.82])),
        coords={"species": "human"},
    )
    ds3

Now we'll put this data into a multi-group tree:

.. ipython:: python

    from datatree import DataTree

    dt = DataTree.from_dict(
        {"root/simulation/coarse": ds, "root/simulation/fine": ds2, "root": ds3}
    )
    print(dt)

This creates a datatree with various groups. We have one root group (named ``root``), containing information about individual people.
The root group then has one subgroup ``simulation``, which contains no data itself but does contain another two subgroups,
named ``fine`` and ``coarse``.

The (sub-)sub-groups ``fine`` and ``coarse`` contain two very similar datasets.
They both have an ``"x"`` dimension, but the dimension is of different lengths in each group, which makes the data in each group unalignable.
In the root group (``root``) we placed some completely unrelated information, showing how we can use a tree to store heterogeneous data.

The constraints on each group are therefore the same as the constraints on the data arrays within a single dataset:
inside one group, variables that share a dimension name must agree on its length.
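
As a rough illustration of that constraint, the sketch below compares dimension sizes using plain dictionaries. The helper ``are_alignable`` is hypothetical and checks lengths only, which is a simplification of what xarray does; the sizes are those of the example datasets created earlier.

```python
def are_alignable(*dim_sizes):
    """Return True if every shared dimension name has the same length.

    Each argument is a plain dict mapping dimension name -> length, standing
    in for the sizes of one variable or group (hypothetical helper).
    """
    seen = {}
    for sizes in dim_sizes:
        for dim, length in sizes.items():
            # setdefault stores the first length seen for this dimension
            # and returns the stored value on later encounters.
            if seen.setdefault(dim, length) != length:
                return False
    return True


coarse = {"x": 2, "y": 3}  # sizes in the ``coarse`` group
fine = {"x": 6, "y": 3}    # sizes in the ``fine`` group after interpolation
people = {"people": 2}     # sizes of the unrelated data in the root group

print(are_alignable(coarse, people))  # no dimension names in common
print(are_alignable(coarse, fine))    # "x" has two different lengths
```

Under this check ``coarse`` and ``fine`` could not share a group, which is why they are stored as separate nodes of the tree.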

We created the sub-groups using a filesystem-like syntax, and accessing groups works the same way.
We can access individual dataarrays in a similar fashion

.. ipython:: python

    dt["simulation/coarse/foo"]

and we can also pull out the data in a particular group as a ``Dataset`` object using ``.ds``:

.. ipython:: python

    dt["simulation/coarse"].ds
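
The filesystem-like addressing can be pictured with plain nested dictionaries. The sketch below is not how datatree is implemented; ``build_tree`` and ``get_node`` are hypothetical helpers, and strings stand in for the datasets.

```python
def _new_node():
    """Create an empty group: no data yet and no children."""
    return {"data": None, "children": {}}


def build_tree(flat):
    """Turn {"a/b/c": value} into nested nodes, creating parents as needed."""
    tree = _new_node()
    for path, value in flat.items():
        node = tree
        for part in path.split("/"):
            node = node["children"].setdefault(part, _new_node())
        node["data"] = value
    return tree


def get_node(node, path):
    """Walk a "/"-separated path downwards, starting from ``node``."""
    for part in path.split("/"):
        node = node["children"][part]
    return node


tree = build_tree(
    {"root/simulation/coarse": "ds", "root/simulation/fine": "ds2", "root": "ds3"}
)
root = tree["children"]["root"]

print(get_node(root, "simulation/coarse")["data"])
print(sorted(get_node(root, "simulation")["children"]))
```

Note that the intermediate ``simulation`` node is created implicitly and holds no data of its own, mirroring the empty ``simulation`` group in the tree above.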

Operations map over subtrees, so we can take a mean over the ``x`` dimension of both the ``fine`` and ``coarse`` groups just by calling ``mean`` on their parent group:

.. ipython:: python

    avg = dt["simulation"].mean(dim="x")
    print(avg)

Here the ``"x"`` dimension used is always the one local to that sub-group.

You can do almost everything you can do with ``Dataset`` objects with ``DataTree`` objects
(including indexing and arithmetic), as operations will be mapped over every sub-group in the tree.
This allows you to work with multiple groups of non-alignable variables at once.
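
The idea of mapping an operation over every sub-group can be sketched in a few lines of plain Python. This is an illustration under simplifying assumptions, not datatree's implementation: ``map_over_tree`` is a hypothetical helper, groups are nested dictionaries, and lists of numbers stand in for each group's data.

```python
def map_over_tree(func, tree):
    """Apply ``func`` to every leaf of a nested dict, keeping the structure.

    Nested dicts play the role of sub-groups; any other value is treated as
    the data of a group (hypothetical layout for illustration).
    """
    return {
        name: map_over_tree(func, child) if isinstance(child, dict) else func(child)
        for name, child in tree.items()
    }


def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


# Two groups whose data have different lengths, like ``coarse`` and ``fine``.
simulation = {"coarse": [1.0, 3.0], "fine": [1.0, 1.5, 2.0, 2.5, 3.0]}

averaged = map_over_tree(mean, simulation)
print(averaged)
```

Each group is reduced independently, so the differing lengths never need to be reconciled.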

.. note::

If all of your variables are mutually alignable
(i.e. they live on the same grid, such that every common dimension name maps to the same length),
then you probably don't need :py:class:`DataTree`, and should consider just sticking with ``xarray.Dataset``.