# <center>Lesson 12: Saving Reloadable Data</center>
### <center>yt user/developer workshop, July 2025</center>

## Data analysis generates more data
* Analysis is a form of data reduction.
* We will likely produce intermediate data products that turn into the final plots/figures/images.
* The reduced products may be moved to a different filesystem from the original data.
* Ideal properties of reduced data:
  * consistent with original data (same naming conventions, units, etc.)
  * just as easy to ingest
  * returns us to a state similar to the one in which it was produced

## The [save_as_dataset](https://yt-project.org/docs/dev/reference/api/yt.frontends.ytdata.utilities.html#yt.frontends.ytdata.utilities.save_as_dataset) function:
* saves data in a format that can be loaded with `yt.load`
* can retain relevant metadata and units of original dataset (if there is one)
* works with totally generic array data, data containers, plot data, and profiles
* HDF5 under the hood

## Totally generic data
* no dataset required
* with or without units

### Make some random data

In [None]:
import numpy as np
import unyt
import yt

In [None]:
array1 = np.random.random((10, 10))
# give this one some units
array2 = unyt.unyt_array(np.random.normal(loc=88., scale=2., size=50), "mile/hr")

# store the data in a dictionary
my_data = {"numbers": array1, "speed": array2}

### Save it

In [None]:
fn = yt.save_as_dataset({}, filename="my_data.h5", data=my_data)

### Reload it
* access through `data` attribute (**caveat: no other data containers available**)
* by default, fields are of type `data`

In [None]:
my_ds = yt.load(fn)

In [None]:
my_ds.field_list

In [None]:
my_ds.data["data", "speed"]

### Saving additional metadata
* the `extra_attrs` keyword accepts a dictionary that can hold almost anything

In [None]:
my_extra_attrs = {"date": "November 5, 1955",
                  "important_value": unyt.unyt_quantity(30, "yr")}

In [None]:
fn = yt.save_as_dataset({}, filename="my_data.h5", data=my_data, extra_attrs=my_extra_attrs)

In [None]:
my_ds = yt.load(fn)

### Access additional metadata through the `parameters` attribute.

In [None]:
my_ds.parameters

In [None]:
my_ds.parameters["important_value"] * 2

### Retaining information from a parent dataset
* replace the first argument with a dataset

In [None]:
ds = yt.load("/Users/britton/EnzoRuns/yt-workshop-2025/primordial_star/DD0157/DD0157")

In [None]:
fn = yt.save_as_dataset(ds, filename="my_data.h5", data=my_data, extra_attrs=my_extra_attrs)

In [None]:
my_ds = yt.load(fn)

In [None]:
# unitful values have access to unit system of original dataset
my_ds.data["data", "speed"].to("code_velocity")

## Saving data containers
* most data containers have their own `save_as_dataset` method
* it accepts a list of fields; if none is given, it saves the fields that have already been queried

In [None]:
value, center = ds.find_max(("gas", "density"))
sp = ds.sphere(center, (5, "pc"))

In [None]:
fn = sp.save_as_dataset(fields=[("gas", "density"),
                                ("gas", "temperature"),
                                ("gas", "cell_mass"),
                                ("all", "particle_mass")])

In [None]:
sp_ds = yt.load(fn)

### Access data through regular data containers.
* grid fields have `grid` type (usually with `gas` alias)
* particle fields have original type
* position information for grid and particle data will be saved automatically
* **caveat:** the reloaded dataset is technically a particle dataset, so grid-based functionality (e.g. slices, projections) is limited

In [None]:
ad = sp_ds.all_data()

In [None]:
ad["grid", "density"]

In [None]:
ad["all", "particle_mass"].to("Msun")

### Make a profile from the reloaded sphere dataset

In [None]:
profile = yt.create_profile(ad, [("gas", "density")], [("gas", "temperature")],
                            weight_field=("gas", "cell_mass"))

In [None]:
from matplotlib import pyplot as plt
import matplotlib
%matplotlib inline

T_mean = profile["gas", "temperature"]
T_std = profile.standard_deviation["gas", "temperature"]
plt.loglog(profile.x, T_mean)
plt.fill_between(profile.x, y1=T_mean-T_std, y2=T_mean+T_std, alpha=0.5)
plt.xlabel("$\\rho\\ [g/cm^{3}]$")
plt.ylabel("T [K]")
plt.show()

## Saving profiles
* Profiling is usually the expensive part; we should save it.

In [None]:
value, center = ds.find_max(("gas", "density"))
sp = ds.sphere(center, (5, "pc"))

bin_fields = [("gas", "density"),
              ("gas", "temperature")]
profile_fields = [("gas", "cell_mass")]

profile = yt.create_profile(sp, bin_fields, profile_fields, weight_field=None)

In [None]:
fn = profile.save_as_dataset()

In [None]:
prof_ds = yt.load(fn)

### The `profile` attribute has the same properties and functionality as the original profile object

In [None]:
prof_ds.profile.x
# prof_ds.profile.y
# prof_ds.profile.x_bins
# prof_ds.profile.y_bins
# prof_ds.profile.used

In [None]:
prof_ds.profile["data", "cell_mass"]

In [None]:
from yt.utilities.physical_constants import mh
X = prof_ds.profile.x / mh
Y = prof_ds.profile.y
# use the reloaded profile, not the original in-memory one
Z = prof_ds.profile["data", "cell_mass"].to("Msun").T

plt.xscale("log")
plt.yscale("log")
plt.xlabel("$\\rho\\ [1/cm^{3}]$")
plt.ylabel("T [K]")

my_norm = matplotlib.colors.LogNorm(vmin=Z[Z>0].min(), vmax=Z.max())
my_plot = plt.pcolormesh(X, Y, Z, norm=my_norm)
plt.colorbar(my_plot, label="$M_{gas}\\ [M_{\\odot}]$")
plt.show()

## What else can you do with `save_as_dataset`?
* [save plot datasets](https://yt-project.org/docs/dev/visualizing/plots.html#remaking-plots)
* [save covering grids](https://yt-project.org/docs/dev/analyzing/saving_data.html#grid-data-containers): these will continue to act like grid datasets on reload
* [save image data](https://yt-project.org/docs/dev/analyzing/saving_data.html#grid-data-containers) (further down the same page as the link above): 2D fixed resolution buffers of projections/slices
* pipe generic data into other tools with yt interface (example: halo finders)
* save supporting metadata
  * maps between output name and time/redshift for an entire simulation (string data can be saved too)
  * unit systems, runtime parameters, etc.