# Reading and writing files

One of Xarray's most widely used features is its ability to [read from and write
to a variety of data formats](https://docs.xarray.dev/en/stable/user-guide/io.html). 
For example, Xarray can read the following formats using `open_dataset`/`open_mfdataset`:

- [NetCDF](https://www.unidata.ucar.edu/software/netcdf/)
- [Zarr](https://zarr.readthedocs.io/en/stable/)

Support for additional formats is available through external packages:
- [GRIB](https://en.wikipedia.org/wiki/GRIB) using the [cfgrib](https://github.com/ecmwf/cfgrib) package
- [GeoTIFF](https://gdal.org/drivers/raster/gtiff.html) /
  [GDAL rasters](https://svn.osgeo.org/gdal/tags/gdal_1_2_5/frmts/formats_list.html)
  using the [rioxarray package](https://corteva.github.io/rioxarray/stable/)
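Backends from external packages plug into the same `open_dataset` interface and are selected with the `engine` argument, for example `engine="cfgrib"` or `engine="rasterio"`. To check which engines your environment supports, call `xr.backends.list_engines()`. It returns a dictionary keyed by engine name. Its contents depend on which optional packages are installed, so the printed list is only illustrative:

```python
import xarray as xr

# Maps engine name -> backend entrypoint. The contents depend on which optional
# packages (netCDF4, h5netcdf, scipy, zarr, cfgrib, rioxarray, ...) are installed.
engines = xr.backends.list_engines()
print(sorted(engines))
```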

<img src="https://www.unidata.ucar.edu/images/logos/netcdf-400x400.png" align="right" width="20%">

## NetCDF

The recommended way to store xarray data structures is NetCDF, which is a binary
file format for self-described datasets that originated in the geosciences.
Xarray is based on the netCDF data model, so netCDF files on disk directly
correspond to Dataset objects.

Xarray reads from and writes to NetCDF files using the `open_dataset` /
`open_dataarray` functions and the `to_netcdf` method.

Let's first create some datasets and write them to disk using `to_netcdf`, which
takes the path we want to write to:


In [None]:
import numpy as np
import xarray as xr

# Ensure random arrays are the same each time
np.random.seed(0)

The constructor of `Dataset` takes three parameters:

- `data_vars`: dict-like mapping names to values. It has the format described in
  [coordinates](#coordinates), except that each value must be either a `DataArray`
  object or use the tuple syntax `(dims, data)`, since we have to provide dimensions.
- `coords`: same as for `DataArray`
- `attrs`: dict-like of metadata attributes, same as for `DataArray`
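The following standalone sketch shows the two accepted forms for `data_vars` side by side, using made-up values. One uses the tuple syntax `(dims, data, attrs)` and the other an equivalent `DataArray`. Both build the same dataset, which `Dataset.identical` confirms (it also compares attributes):

```python
import numpy as np
import xarray as xr

# Made-up example values: 3 points along "x", 2 along "y"
values = np.arange(6.0).reshape(3, 2)
coords = {"x": [10, 20, 30], "y": [0, 1]}

# Tuple syntax: (dims, data) or (dims, data, attrs)
ds_tuple = xr.Dataset(
    data_vars={"a": (("x", "y"), values, {"units": "m"})},
    coords=coords,
    attrs={"title": "example"},
)

# The same variable expressed as a DataArray
ds_dataarray = xr.Dataset(
    data_vars={"a": xr.DataArray(values, dims=("x", "y"), attrs={"units": "m"})},
    coords=coords,
    attrs={"title": "example"},
)

print(ds_tuple.identical(ds_dataarray))
```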

In [None]:
ds1 = xr.Dataset(
    data_vars={
        "a": (("x", "y"), np.random.randn(4, 2)),
        "b": (("z", "x"), np.random.randn(6, 4)),
    },
    coords={
        "x": np.arange(4),
        "y": np.arange(-2, 0),
        "z": np.arange(-3, 3),
    },
)
ds2 = xr.Dataset(
    data_vars={
        "a": (("x", "y"), np.random.randn(7, 3)),
        "b": (("z", "x"), np.random.randn(2, 7)),
    },
    coords={
        "x": np.arange(6, 13),
        "y": np.arange(3),
        "z": np.arange(3, 5),
    },
)

# write datasets
ds1.to_netcdf("ds1.nc")
ds2.to_netcdf("ds2.nc")

# write dataarray
ds1.a.to_netcdf("da1.nc")

Reading those files is just as simple:


In [None]:
xr.open_dataset("ds1.nc")

In [None]:
xr.open_dataarray("da1.nc")
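`open_dataset` is lazy. It reads the file's metadata immediately but loads variable values only when they are accessed, and it keeps the file open in the meantime. A common pattern is to open the file inside a `with` block and call `.load()`, so the data stay usable after the file is closed. The sketch below uses throwaway files in a temporary directory; the file names and the variable `b` are illustrative:

```python
import os
import tempfile

import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
ds = xr.Dataset(
    {
        "a": (("x", "y"), rng.standard_normal((4, 2))),
        "b": (("x",), rng.standard_normal(4)),
    },
    coords={"x": np.arange(4), "y": np.arange(-2, 0)},
)

with tempfile.TemporaryDirectory() as tmpdir:
    ds_path = os.path.join(tmpdir, "example_ds.nc")
    da_path = os.path.join(tmpdir, "example_da.nc")
    ds.to_netcdf(ds_path)
    ds["a"].to_netcdf(da_path)

    # .load() reads every variable into memory; the file is closed when the
    # with-block exits, but the loaded values remain available
    with xr.open_dataset(ds_path) as opened:
        reloaded = opened.load()

    # The variable name is stored in the file, so open_dataarray restores it
    with xr.open_dataarray(da_path) as opened_da:
        reloaded_da = opened_da.load()

print(reloaded)
print(reloaded_da.name)
```

`open_dataarray` expects a file with exactly one data variable. For a file such as `ds1.nc`, which holds both `a` and `b`, it raises a `ValueError`, so use `open_dataset` and select the variable instead.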

<img src="https://zarr.readthedocs.io/en/stable/_static/logo1.png" align="right" width="20%">


## Zarr

[Zarr](https://zarr.readthedocs.io/en/stable/) is a Python package and data
format providing an implementation of chunked, compressed, N-dimensional arrays.
Zarr has the ability to store arrays in a range of ways, including in memory, in
files, and in cloud-based object storage such as Amazon S3 and Google Cloud
Storage. Xarray’s Zarr backend allows xarray to leverage these capabilities.

Zarr stores can be written with `to_zarr`:


In [None]:
ds1.to_zarr("ds1.zarr", mode="w")

We can then read the created file with:


In [None]:
xr.open_zarr("ds1.zarr", chunks=None)

Setting the `chunks` parameter to `None` avoids using `dask` (more on that in a
later session).


**Tip:** You can write to any dictionary-like (`MutableMapping`) interface:

In [None]:
mystore = {}

ds1.to_zarr(store=mystore)

## Raster files using rioxarray

[rioxarray](https://corteva.github.io/rioxarray/) is an *Xarray extension* that allows reading and writing a wide variety of geospatial image formats compatible with Geographic Information Systems (GIS), for example GeoTIFF.

If rioxarray is installed in your environment, it will be detected automatically and give you access to the `.rio` accessor:

In [None]:
# Importing rioxarray explicitly guarantees the .rio accessor is registered
import rioxarray  # noqa: F401

da = xr.DataArray(
    data=ds1.a.data,
    dims=("y", "x"),
    coords={
        "y": np.linspace(47.5, 47.8, 4),
        "x": np.linspace(-122.9, -122.7, 2),
    },
)

# Add Geospatial Coordinate Reference System https://epsg.io/4326
# this is stored as a 'spatial_ref' coordinate
da.rio.write_crs("epsg:4326", inplace=True)
da

In [None]:
da.rio.to_raster('ds1_a.tiff')

**Note:** You can now load this file into GIS tools like [QGIS](https://www.qgis.org), or open it back in Xarray:

In [None]:
da_reloaded = xr.open_dataarray('ds1_a.tiff', engine='rasterio')
da_reloaded.rio.crs