Overview notebook edits #110
Merged
3 commits
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -36,23 +36,13 @@ | |
| "%config InlineBackend.figure_format='retina'" | ||
scottyhq marked this conversation as resolved.
Contributor
"See the documentation for more"
There's a bunch more places we could do this. I don't know why I didn't do it the first time around. |
||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "# load tutorial dataset\n", | ||
| "ds = xr.tutorial.load_dataset(\"air_temperature\")" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "## What's in a Dataset? \n", | ||
| "Xarray has a few small real-world tutorial datasets hosted in this GitHub repository https://github.com/pydata/xarray-data\n", | ||
| "\n", | ||
| "*Many DataArrays!*" | ||
| "[xarray.tutorial.load_dataset](https://docs.xarray.dev/en/stable/generated/xarray.tutorial.open_dataset.html#xarray.tutorial.open_dataset) is a convenience function to download and open Datasets by name. Here we'll use `air_temperature` from the National Centers for Environmental Prediction. Xarray objects have convenient HTML representations to give an overview of what we're working with:" | ||
| ] | ||
| }, | ||
| { | ||
|
|
@@ -61,16 +51,27 @@ | |
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "# dataset repr\n", | ||
| "ds = xr.tutorial.load_dataset(\"air_temperature\")\n", | ||
| "ds" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "Datasets are dict-like containers of DataArrays i.e. they are a mapping of\n", | ||
| "variable name to DataArray.\n" | ||
| "Note that behind the scenes the [`xarray.open_dataset`](https://docs.xarray.dev/en/latest/generated/xarray.open_dataset.html#xarray-open-dataset) function is opening this tutorial data with the \"netCDF engine\" because the data is stored in that format. A few things are done automatically upon opening, but they can be controlled by keyword arguments. For example, try passing the keyword argument `mask_and_scale=False`... what happens?" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "## What's in a Dataset? \n", | ||
| "\n", | ||
| "*Many DataArrays!* \n", | ||
| "\n", | ||
| "Datasets are dictionary-like containers of DataArrays. They are a mapping of\n", | ||
| "variable name to DataArray:" | ||
| ] | ||
| }, | ||
| { | ||
|
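As a rough illustration of the `mask_and_scale` question raised in the hunk above, the following sketch avoids any download by building a small, hypothetical packed dataset in memory and decoding it with `xr.decode_cf`. The variable name `air`, the values, and the attributes are assumptions made for the example; they are not taken from the tutorial file.

```python
import numpy as np
import xarray as xr

# Hypothetical packed data, mimicking how values are often stored on disk:
# int16 integers plus CF attributes that describe how to unpack them.
packed = xr.Dataset(
    {
        "air": (
            "x",
            np.array([100, 200, -999], dtype="int16"),
            {"scale_factor": 0.01, "_FillValue": np.int16(-999)},
        )
    }
)

# Default behaviour: fill values become NaN and scale_factor is applied.
decoded = xr.decode_cf(packed, mask_and_scale=True)

# With mask_and_scale=False the stored integers are left untouched.
raw = xr.decode_cf(packed, mask_and_scale=False)

print(decoded.air.values)
print(raw.air.values)
```

With masking and scaling switched off, the integers and their `scale_factor` / `_FillValue` attributes are passed through as stored, which is presumably what the notebook's "what happens?" prompt is inviting the reader to discover.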
|
@@ -121,7 +122,9 @@ | |
| "\n", | ||
| "### Named dimensions \n", | ||
| "\n", | ||
| "(`.dims`)" | ||
| "`.dims` correspond to the axes of your data. \n", | ||
| "\n", | ||
| "In this case we have 2 spatial dimensions (`latitude` and `longitude` are stored with shorthand names `lat` and `lon`) and one temporal dimension (`time`)." | ||
| ] | ||
| }, | ||
| { | ||
|
|
@@ -139,10 +142,10 @@ | |
| "source": [ | ||
| "### Coordinate variables \n", | ||
| "\n", | ||
| "(`.coords`)\n", | ||
| "\n", | ||
| "`.coords` is a simple [data container](https://xarray.pydata.org/en/stable/data-structures.html#coordinates)\n", | ||
| "for coordinate variables.\n" | ||
| "for coordinate variables.\n", | ||
| "\n", | ||
| "Here we see the actual timestamps and spatial positions of our air temperature data:" | ||
| ] | ||
| }, | ||
| { | ||
|
|
@@ -196,10 +199,8 @@ | |
| "source": [ | ||
| "### Arbitrary attributes \n", | ||
| "\n", | ||
| "(`.attrs`)\n", | ||
| "\n", | ||
| "`.attrs` is a dictionary that can contain arbitrary python objects. Your only\n", | ||
| "limitation is that some attributes may not be writeable to a certain file formats\n" | ||
| "`.attrs` is a dictionary that can contain arbitrary Python objects (strings, lists, integers, dictionaries, etc.). Your only\n", | ||
| "limitation is that some attributes may not be writeable to certain file formats." | ||
| ] | ||
| }, | ||
| { | ||
|
|
@@ -217,7 +218,7 @@ | |
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "# assign your own attribute\n", | ||
| "# assign your own attributes!\n", | ||
| "ds.air.attrs[\"who_is_awesome\"] = \"xarray\"\n", | ||
| "ds.air.attrs" | ||
| ] | ||
|
|
@@ -228,13 +229,11 @@ | |
| "source": [ | ||
| "### Underlying data \n", | ||
| "\n", | ||
| "(`.data`)\n", | ||
| "`.data` contains the [numpy array](https://numpy.org) storing air temperature values.\n", | ||
| "\n", | ||
| "Xarray structures wrap underlying simpler data structures. In this case, the\n", | ||
| "underlying data is a numpy array which you may be familiar with.\n", | ||
| "<img src=\"https://raw.githubusercontent.com/numpy/numpy/623bc1fae1d47df24e7f1e29321d0c0ba2771ce0/branding/logo/primary/numpylogo.svg\" width=\"25%\">\n", | ||
| "\n", | ||
| "This part of xarray is quite extensible allowing for GPU arrays, sparse arrays,\n", | ||
| "arrays with units etc. See the demo at the end.\n" | ||
| "Xarray structures wrap underlying simpler data structures. This part of Xarray is quite extensible, allowing for GPU arrays, sparse arrays, arrays with units, etc., which we'll look at later in this tutorial." | ||
| ] | ||
| }, | ||
| { | ||
|
|
@@ -260,31 +259,17 @@ | |
| "type(ds.air.data)" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "A numpy array!\n", | ||
| "\n", | ||
| "<img src=\"https://raw.githubusercontent.com/numpy/numpy/623bc1fae1d47df24e7f1e29321d0c0ba2771ce0/branding/logo/primary/numpylogo.svg\" style=\"width:20%\">\n" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "### Review\n", | ||
| "\n", | ||
| "Xarray provides two main data structures\n", | ||
| "\n", | ||
| "- DataArrays that wrap underlying data containers (e.g. numpy arrays) and\n", | ||
| " contain associated metadata\n", | ||
| "- Datasets that are dict-like containers of DataArrays\n", | ||
| "Xarray provides two main data structures:\n", | ||
| "\n", | ||
| "For more see\n", | ||
| "1. [`DataArrays`](https://xarray.pydata.org/en/stable/data-structures.html#dataarray) that wrap underlying data containers (e.g. numpy arrays) and contain associated metadata\n", | ||
| "\n", | ||
| "- https://xarray.pydata.org/en/stable/data-structures.html#dataset\n", | ||
| "- https://xarray.pydata.org/en/stable/data-structures.html#dataarray\n" | ||
| "1. [`Datasets`](https://xarray.pydata.org/en/stable/data-structures.html#dataset) that are dictionary-like containers of DataArrays" | ||
| ] | ||
| }, | ||
| { | ||
|
|
@@ -295,7 +280,7 @@ | |
| "\n", | ||
| "## Why Xarray? \n", | ||
| "\n", | ||
| "Use metadata for fun and ~profit~ papers!\n", | ||
| "Metadata provides context and makes code more legible. This reduces the likelihood of errors from typos and makes analysis more intuitive and fun!\n", | ||
| "\n", | ||
| "### Analysis without xarray: `X(`\n" | ||
| ] | ||
|
|
@@ -365,14 +350,12 @@ | |
| "\n", | ||
| "## Extracting data or \"indexing\" \n", | ||
| "\n", | ||
| "(`.sel`, `.isel`)\n", | ||
| "\n", | ||
| "Xarray supports\n", | ||
| "\n", | ||
| "- label-based indexing using `.sel`\n", | ||
| "- position-based indexing using `.isel`\n", | ||
| "\n", | ||
| "For more see https://xarray.pydata.org/en/stable/indexing.html\n" | ||
| "See the documentation for more: https://xarray.pydata.org/en/stable/indexing.html\n" | ||
| ] | ||
| }, | ||
| { | ||
|
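The difference between `.sel` and `.isel` mentioned in the hunk above can be sketched on a small synthetic array. The latitude values below mirror a 2.5-degree grid running from 75 down to 15 degrees, which is an assumption for illustration rather than something read from the dataset.

```python
import numpy as np
import xarray as xr

# Synthetic 1D array on an assumed 2.5-degree latitude grid (75 down to 15).
lat = np.arange(75.0, 14.0, -2.5)
da = xr.DataArray(
    np.arange(lat.size, dtype=float),
    dims="lat",
    coords={"lat": lat},
    name="demo",
)

# Label-based: pick the point whose latitude coordinate equals 50.0.
by_label = da.sel(lat=50.0)

# Position-based: pick the 11th point along the lat dimension (index 10).
by_position = da.isel(lat=10)

# Label-based with inexact labels: fall back to the nearest coordinate.
nearest = da.sel(lat=51.0, method="nearest")

print(float(by_label), float(by_position), float(nearest.lat))
```

Both selections land on the same element here, because index 10 on this grid happens to carry the label 50.0.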
|
@@ -486,25 +469,33 @@ | |
| "source": [ | ||
| "---\n", | ||
| "\n", | ||
| "## Concepts for computation\n" | ||
| "## Concepts for computation\n", | ||
| "\n", | ||
| "Consider calculating the *mean air temperature per unit surface area* for this dataset. Because latitude and longitude correspond to spherical coordinates for Earth's surface, each 2.5x2.5 degree grid cell actually has a different surface area as you move away from the equator! This is because *latitudinal length* is fixed ($ \\delta Lat = R \\delta \\phi $), but *longitudinal length varies with latitude* ($ \\delta Lon = R \\delta \\lambda \\cos(\\phi) $)\n", | ||
| "\n", | ||
| "So the [area element for lat-lon coordinates](https://en.wikipedia.org/wiki/Spherical_coordinate_system#Integration_and_differentiation_in_spherical_coordinates) is\n", | ||
| "\n", | ||
| "\n", | ||
| "$$ \\delta A = R^2 \\delta \\phi \\delta \\lambda \\cos(\\phi) $$\n", | ||
| "\n", | ||
| "where $\\phi$ is latitude, $\\delta \\phi$ is the spacing of the points in latitude, $\\delta \\lambda$ is the spacing of the points in longitude, and $R$ is Earth's radius. (In this formula, $\\phi$ and $\\lambda$ are measured in radians.)" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "### Broadcasting: expanding data\n", | ||
| "\n", | ||
| "Let's try to calculate grid cell area associated with the air temperature data.\n", | ||
| "We may want this to make a proper area-weighted domain-average for example\n", | ||
| "# Earth's average radius\n", | ||
| "R = 6.371e6\n", | ||
| "\n", | ||
| "A very approximate formula is\n", | ||
| "# Coordinate spacing for this dataset is 2.5 x 2.5 degrees\n", | ||
| "dϕ = np.deg2rad(2.5)\n", | ||
| "dλ = np.deg2rad(2.5)\n", | ||
| "\n", | ||
| "$$\n", | ||
| "Δlat \\times Δlon \\times \\cos(\\text{latitude}) \n", | ||
| "$$\n", | ||
| "\n", | ||
| "assuming that $Δlon$ = 111km and $Δlat$ = 111km\n" | ||
| "dlat = R * dϕ * xr.ones_like(ds.air.lon)\n", | ||
| "dlon = R * dλ * np.cos(np.deg2rad(ds.air.lat))" | ||
| ] | ||
| }, | ||
| { | ||
|
|
@@ -513,8 +504,8 @@ | |
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "dlon = np.cos(ds.air.lat * np.pi / 180) * 111e3\n", | ||
| "dlon" | ||
| "# cell latitude length is constant with longitude\n", | ||
| "dlat" | ||
| ] | ||
| }, | ||
| { | ||
|
|
@@ -523,8 +514,17 @@ | |
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "dlat = 111e3 * xr.ones_like(ds.air.lon)\n", | ||
| "dlat" | ||
| "# cell longitude length changes with latitude\n", | ||
| "dlon" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "### Broadcasting: expanding data\n", | ||
| "\n", | ||
| "Our longitude and latitude length DataArrays are both 1D with different dimension names. If we multiply these DataArrays together the dimensionality is expanded to 2D via `broadcasting`:" | ||
| ] | ||
| }, | ||
| { | ||
|
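The broadcasting step described in the hunk above can be tried without the tutorial download. This is a minimal sketch under the assumption of a 25 x 53 grid with 2.5-degree spacing; the synthetic `lat` and `lon` arrays stand in for `ds.air.lat` and `ds.air.lon`.

```python
import numpy as np
import xarray as xr

R = 6.371e6  # Earth's average radius in metres
dphi = np.deg2rad(2.5)  # latitude spacing in radians
dlam = np.deg2rad(2.5)  # longitude spacing in radians

# Synthetic stand-ins for ds.air.lat and ds.air.lon (assumed grid).
lat_vals = np.arange(75.0, 14.0, -2.5)
lon_vals = np.arange(200.0, 331.0, 2.5)
lat = xr.DataArray(lat_vals, dims="lat", coords={"lat": lat_vals})
lon = xr.DataArray(lon_vals, dims="lon", coords={"lon": lon_vals})

# Two 1D arrays with different dimension names.
dlat = R * dphi * xr.ones_like(lon)            # dims: ("lon",)
dlon = R * dlam * np.cos(np.deg2rad(lat))      # dims: ("lat",)

# Multiplying them broadcasts by dimension name to a 2D result.
cell_area = dlon * dlat

print(cell_area.dims, cell_area.shape)
```

Because xarray matches on dimension names rather than axis positions, no manual reshaping or `np.newaxis` is needed to get the 2D grid of areas.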
|
@@ -562,7 +562,7 @@ | |
| "\n", | ||
| "When doing arithmetic operations xarray automatically \"aligns\" i.e. puts the\n", | ||
| "data on the same grid. In this case `cell_area` and `ds.air` are at the same\n", | ||
| "lat, lon points so things are multiplied as you would expect\n" | ||
| "lat, lon points, so we end up with a result with the same shape (25x53):\n" | ||
| ] | ||
| }, | ||
| { | ||
|
|
@@ -571,7 +571,7 @@ | |
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "(cell_area * ds.air.isel(time=1))" | ||
| "ds.air.isel(time=1) / cell_area" | ||
| ] | ||
| }, | ||
| { | ||
|
|
@@ -590,7 +590,7 @@ | |
| "# make a copy of cell_area\n", | ||
| "# then add 1e-5 degrees to latitude\n", | ||
| "cell_area_bad = cell_area.copy(deep=True)\n", | ||
| "cell_area_bad[\"lat\"] = cell_area.lat + 1e-5\n", | ||
| "cell_area_bad[\"lat\"] = cell_area.lat + 1e-5 # latitudes are off by 1e-5 degrees!\n", | ||
| "cell_area_bad" | ||
| ] | ||
| }, | ||
|
|
@@ -607,6 +607,8 @@ | |
| "cell_type": "markdown", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "The result is an empty array with no latitude coordinates because none of them were aligned!\n", | ||
| "\n", | ||
| "**Tip:** If you notice extra NaNs or missing points after xarray computation, it\n", | ||
| "means that your xarray coordinates were not aligned _exactly_.\n", | ||
| "\n", | ||
|
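The misalignment described in the hunk above can be reproduced on a small synthetic array. The sketch below assumes xarray's default inner join for arithmetic and uses `assign_coords` to shift the labels; `xr.align(..., join="exact")` is shown as one possible way to turn a silent mismatch into an error.

```python
import numpy as np
import xarray as xr

# Synthetic 1D "area" array on an assumed latitude grid.
lat = np.arange(75.0, 14.0, -2.5)
area = xr.DataArray(np.ones(lat.size), dims="lat", coords={"lat": lat})

# Same data, but every latitude label is off by 1e-5 degrees.
area_bad = area.assign_coords(lat=lat + 1e-5)

# Arithmetic keeps only labels present in both operands,
# so no latitude survives the multiplication.
product = area * area_bad
print(product.sizes)

# Asking for an exact match raises instead of silently dropping points.
try:
    xr.align(area, area_bad, join="exact")
    exact_ok = True
except ValueError:
    exact_ok = False
print(exact_ok)
```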
|
@@ -716,7 +718,7 @@ | |
| "outputs": [], | ||
| "source": [ | ||
| "# weight by cell_area and take mean over (time, lon)\n", | ||
| "ds.weighted(cell_area).mean([\"lon\", \"time\"]).air.plot()" | ||
| "ds.weighted(cell_area).mean([\"lon\", \"time\"]).air.plot();" | ||
| ] | ||
| }, | ||
| { | ||
|
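To see what the `.weighted(...)` call in the hunk above changes, here is a small sketch with two made-up latitude points and cosine-of-latitude weights. The values are purely illustrative and are not drawn from the air temperature data.

```python
import numpy as np
import xarray as xr

# Two hypothetical points: one at the equator, one at 60 degrees north.
temps = xr.DataArray(
    [1.0, 3.0],
    dims="lat",
    coords={"lat": [0.0, 60.0]},
)

# Cosine-of-latitude weights: roughly 1.0 at the equator, 0.5 at 60 degrees.
weights = np.cos(np.deg2rad(temps.lat))

plain = temps.mean("lat")
weighted = temps.weighted(weights).mean("lat")

print(float(plain), float(weighted))
```

The weighted mean is pulled toward the equatorial value because that point represents more surface area, which is the same reasoning behind weighting by `cell_area` in the notebook.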
|
@@ -744,7 +746,7 @@ | |
| "outputs": [], | ||
| "source": [ | ||
| "# facet the seasonal_mean\n", | ||
| "seasonal_mean.air.plot(col=\"season\")" | ||
| "seasonal_mean.air.plot(col=\"season\");" | ||
| ] | ||
| }, | ||
| { | ||
|
|
@@ -754,7 +756,7 @@ | |
| "outputs": [], | ||
| "source": [ | ||
| "# contours\n", | ||
| "seasonal_mean.air.plot.contour(col=\"season\", levels=20, add_colorbar=True)" | ||
| "seasonal_mean.air.plot.contour(col=\"season\", levels=20, add_colorbar=True);" | ||
| ] | ||
| }, | ||
| { | ||
|
|
@@ -764,7 +766,16 @@ | |
| "outputs": [], | ||
| "source": [ | ||
| "# line plots too? wut\n", | ||
| "seasonal_mean.air.mean(\"lon\").plot.line(hue=\"season\", y=\"lat\")" | ||
| "seasonal_mean.air.mean(\"lon\").plot.line(hue=\"season\", y=\"lat\");" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "ds" | ||
| ] | ||
| }, | ||
| { | ||
|
|
@@ -789,6 +800,15 @@ | |
| "ds.to_netcdf(\"my-example-dataset.nc\")" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "```{note}\n", | ||
| "To avoid the `SerializationWarning` you can assign a `_FillValue` for any NaNs in the `air` array by adding the keyword argument `encoding=dict(air={\"_FillValue\": -9999})`\n", | ||
| "```" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
|
|
@@ -879,20 +899,20 @@ | |
| "\n", | ||
| "Xarray can wrap other array types! For example:\n", | ||
| "\n", | ||
| "<img src=\"https://docs.dask.org/en/latest/_static/images/dask-horizontal-white.svg\" style=\"width:25%\">\n", | ||
| "<img src=\"https://docs.dask.org/en/latest/_static/images/dask-horizontal-white.svg\" width=\"25%\">\n", | ||
| "\n", | ||
| "**dask** : parallel arrays https://xarray.pydata.org/en/stable/dask.html &\n", | ||
| "https://docs.dask.org/en/latest/array.html\n", | ||
| "\n", | ||
| "<img src=\"https://sparse.pydata.org/en/stable/_images/logo.png\" style=\"width:12%\">\n", | ||
| "<img src=\"https://sparse.pydata.org/en/stable/_images/logo.png\" width=\"12%\">\n", | ||
| "\n", | ||
| "**pydata/sparse** : sparse arrays http://sparse.pydata.org\n", | ||
| "\n", | ||
| "<img src=\"https://raw.githubusercontent.com/cupy/cupy.dev/master/images/cupy_logo.png\" style=\"width:22%\">\n", | ||
| "<img src=\"https://raw.githubusercontent.com/cupy/cupy.dev/master/images/cupy_logo.png\" width=\"22%\">\n", | ||
| "\n", | ||
| "**cupy** : GPU arrays http://cupy.chainer.org\n", | ||
| "\n", | ||
| "<img src=\"https://pint.readthedocs.io/en/stable/_images/logo-full.jpg\" style=\"width:10%\">\n", | ||
| "<img src=\"https://pint.readthedocs.io/en/stable/_images/logo-full.jpg\" width=\"10%\">\n", | ||
| "\n", | ||
| "**pint** : unit-aware computations https://pint.readthedocs.org &\n", | ||
| "https://github.com/xarray-contrib/pint-xarray\n" | ||
|
|
||