# <center>Lesson 3: on-disk fields and basic data containers</center>
### <center>yt user/developer workshop, July 2025</center>

# Covered in this lesson:
* what data are present?
* geometric data containers
* querying field data
* unitful arrays and quantities
* field aliases
* profiles

## Previous concepts:
* **dataset**: a collection of data that we load at once. This holds metadata and is cheap to load.
* **frontend**: how we define one distinct source of data. This gives us the **index**, the object that understands the relationship between the spatial layout of the data and the file layout.

#### We have not yet accessed any actual data (other than metadata).

## New concepts:
* **field**: an array of values describing a quantity associated with each element in the `dataset`. This is the data we want. Examples: the gas densities of the grid cells, the positions of the particles, the brightness of the pixels.
* **data container**: an object containing one or more elements of a `dataset`. It provides access to `fields` for all the elements it contains.

## Load a dataset

In [None]:
import yt

ds = yt.load("/Users/britton/EnzoRuns/yt-workshop-2025/primordial_star/DD0157/DD0157")

## On-Disk Fields

These are the fields saved within the dataset. Their names are defined by the frontend.

### What fields are present?

The dataset's `field_list` attribute lists every field found on disk.

In [None]:
ds.field_list

### Field naming convention: all fields are named by a 2-element tuple of `(field type, field name)`
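Because every field name is a `(field type, field name)` tuple, a field list can be grouped by type with ordinary tuple unpacking. The sketch below is plain Python and uses a short hand-written list (an assumption for illustration, not the actual output of `ds.field_list`) built from field names that appear in this lesson. The helper `group_by_field_type` is hypothetical.

```python
from collections import defaultdict

# Hand-written sample, NOT real output from ds.field_list.
sample_field_list = [
    ("enzo", "Density"),
    ("enzo", "GasEnergy"),
    ("enzo", "Temperature"),
    ("enzo", "x-velocity"),
    ("io", "particle_mass"),
]


def group_by_field_type(field_list):
    """Return a dict mapping each field type to a sorted list of its field names."""
    grouped = defaultdict(list)
    for field_type, field_name in field_list:
        grouped[field_type].append(field_name)
    return {ftype: sorted(names) for ftype, names in grouped.items()}


fields_by_type = group_by_field_type(sample_field_list)
for ftype, names in fields_by_type.items():
    print(ftype, names)
```

With a real dataset, the same function should accept `ds.field_list` directly, since that is also a list of 2-element tuples.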

### Notes about creating the `field_list`
* This is the first expensive operation. The `index` gets built.
* The available fields are being detected by inspecting the data.
* The field names are defined by the dataset's `frontend` (mostly).

### Use `field_info` to get more information on each field.

In [None]:
ds.field_info["enzo", "Density"]

In [None]:
ds.field_info["enzo", "x-velocity"]

In [None]:
ds.field_info["enzo", "GasEnergy"]

In [None]:
ds.field_info["io", "particle_mass"]

### `ds.fields` is an even better way to do this.
* `ds.fields.<field type>.<field name>`
* can use `dir` or tab-completion
  * `dir(ds.fields)` for available types
  * `dir(ds.fields.<field type>)` for all fields of this type

In [None]:
ds.fields.nbody.particle_mass

## Data Containers

Data containers (aka "data objects") are objects that provide access to field data.

yt Documentation: [all available data containers](https://yt-project.org/docs/dev/analyzing/objects.html#available-objects)



### Geometric data containers
* For 3D shapes: they provide access to elements contained within them. All grid cells in which the cell center is within the boundary are considered contained.
* For 0D, 1D, 2D: they access intersecting elements or elements that contain them.
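
The cell-center rule for 3D containers can be sketched in plain Python. This is only an illustration of the rule stated above, not yt's selection code: the grid is a hypothetical uniform 4x4x4 grid on a unit cube, the function names are made up, and treating cells exactly on the boundary as contained is an assumption.

```python
import itertools


def cell_centers(n_cells, left_edge=0.0, width=1.0):
    """Return an iterator over (x, y, z) centers of a uniform n_cells**3 grid."""
    dx = width / n_cells
    coords = [left_edge + (i + 0.5) * dx for i in range(n_cells)]
    return itertools.product(coords, repeat=3)


def cells_in_sphere(centers, sphere_center, radius):
    """Keep the cells whose center lies inside (or on) the sphere."""
    r2 = radius * radius
    return [
        c
        for c in centers
        if sum((ci - si) ** 2 for ci, si in zip(c, sphere_center)) <= r2
    ]


# Hypothetical sphere at the middle of the unit cube.
selected = cells_in_sphere(cell_centers(4), (0.5, 0.5, 0.5), 0.3)
print(len(selected), "of", 4**3, "cells selected")
```

Only the cells whose centers fall within the radius are kept, even though the sphere's surface also cuts through some of their neighbors.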

### Some 3D data containers:
* [sphere](https://yt-project.org/docs/dev/reference/api/yt.data_objects.selection_objects.spheroids.html#yt.data_objects.selection_objects.spheroids.YTSphere)
* [disk](https://yt-project.org/docs/dev/reference/api/yt.data_objects.selection_objects.disk.html#yt.data_objects.selection_objects.disk.YTDisk)
* [region](https://yt-project.org/docs/dev/reference/api/yt.data_objects.selection_objects.region.html#yt.data_objects.selection_objects.region.YTRegion) (rectangular prism)
* [ellipsoid](https://yt-project.org/docs/dev/reference/api/yt.data_objects.selection_objects.spheroids.html#yt.data_objects.selection_objects.spheroids.YTEllipsoid)
* [all data](https://yt-project.org/docs/dev/reference/api/yt.data_objects.static_output.html#yt.data_objects.static_output.Dataset.all_data) (shortcut for `region` with domain boundaries for corners)

### Some 0/1/2D data containers:
* [point](https://yt-project.org/docs/dev/reference/api/yt.data_objects.selection_objects.point.html#yt.data_objects.selection_objects.point.YTPoint)
* [ray](https://yt-project.org/docs/dev/reference/api/yt.data_objects.selection_objects.ray.html#yt.data_objects.selection_objects.ray.YTRay) (line with start/end point)
* [cutting-plane](https://yt-project.org/docs/dev/reference/api/yt.data_objects.selection_objects.slices.html#yt.data_objects.selection_objects.slices.YTCuttingPlane) (2D plane or "slice" with arbitrary normal vector)

## Creating data containers

In [None]:
center = ds.domain_center
radius = 0.1 * ds.domain_width[0]

In [None]:
sp = ds.sphere(center, radius)

### Data access: query a data container like a dictionary with the full name of the field

In [None]:
sp["enzo", "Density"]

### What is returned from a query?
* a `unyt` array: a NumPy array with symbolic units that are aware of the dataset's internal units
* an ordered 1D array: the order of elements (grid cells/particles) is always the same. This allows meaningful array operations.
* cached data: values are read from disk on the first query and served from a cache afterward (`clear_data` empties the cache).
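
The read-once-then-cache behavior can be pictured with a toy class. This is a deliberately simplified stand-in, not yt's implementation: `ToyContainer`, `fake_reader`, and the `disk_reads` counter are all hypothetical names, and the returned values are made up.

```python
class ToyContainer:
    """Toy stand-in for a data container; NOT yt's implementation."""

    def __init__(self, reader):
        self._reader = reader  # function that pretends to read from disk
        self._cache = {}       # maps field tuple -> data
        self.disk_reads = 0    # counts how often the reader is called

    def __getitem__(self, field):
        # Read only when the field is not cached yet.
        if field not in self._cache:
            self.disk_reads += 1
            self._cache[field] = self._reader(field)
        return self._cache[field]

    def clear_data(self):
        """Drop all cached fields so the next query reads again."""
        self._cache.clear()


def fake_reader(field):
    # Hypothetical reader returning made-up values.
    return [1.0, 2.0, 3.0]


toy = ToyContainer(fake_reader)
toy["enzo", "Density"]  # first query: triggers a read
toy["enzo", "Density"]  # second query: served from the cache
reads_before_clear = toy.disk_reads
toy.clear_data()
toy["enzo", "Density"]  # read again after clearing
reads_after_clear = toy.disk_reads
print(reads_before_clear, reads_after_clear)
```

Note that `toy["enzo", "Density"]` passes the tuple `("enzo", "Density")` as the key, which is why field tuples can be written without extra parentheses when querying a container.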

In [None]:
sp["enzo", "Density"].to("g/cm**3")

In [None]:
sp["io", "particle_mass"].to("Msun")

In [None]:
sp["enzo", "Density"] * sp["enzo", "Temperature"]

In [None]:
sp["enzo", "Density"]

In [None]:
sp.clear_data()
sp["enzo", "Density"]

## Position data for grid cells

The "index" field type defines position fields in several reference frames, some of which are especially relevant to particular data containers.
* "x[y,z]"
* "spherical_radius[theta,phi]"
* "cylindrical_radius[theta,z]"

These are technically "derived fields" (covered later in Lesson 5).
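
As a rough picture of what the radius fields represent, the sketch below computes a spherical and a cylindrical radius for one cell position. It assumes the radii are measured from a chosen center and that the cylinder axis is parallel to z; the function names and the numbers are hypothetical and the code does not call yt.

```python
import math


def spherical_radius(position, center):
    """Distance of `position` from `center`."""
    dx, dy, dz = (p - c for p, c in zip(position, center))
    return math.sqrt(dx * dx + dy * dy + dz * dz)


def cylindrical_radius(position, center):
    """Distance from the axis through `center`, assumed parallel to z."""
    dx, dy, _ = (p - c for p, c in zip(position, center))
    return math.sqrt(dx * dx + dy * dy)


# Made-up positions in arbitrary length units.
center = (10.0, 10.0, 10.0)
cell = (13.0, 14.0, 22.0)

r_sph = spherical_radius(cell, center)
r_cyl = cylindrical_radius(cell, center)
print(r_sph, r_cyl)
```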

In [None]:
sp["index", "x"]

In [None]:
sp["index", "spherical_radius"]

## More on unitful arrays and quantities

These are of type `unyt_array` and `unyt_quantity` from the [unyt](https://unyt.readthedocs.io/en/stable/) package (spun off from yt).

In [None]:
from unyt import unyt_array, unyt_quantity

In [None]:
some_array = unyt_array([1., 2., 3.], "kg")
some_quantity = unyt_quantity(50000, "km/hr**2")

In [None]:
(some_array * some_quantity).to("N")
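
The conversion to newtons can be checked by hand. The sketch below repeats the arithmetic in plain Python, without `unyt`, using the same masses and acceleration as the cells above; the variable names are made up for this check.

```python
# Same numbers as above: masses in kg, acceleration in km/hr**2.
masses_kg = [1.0, 2.0, 3.0]
accel_km_per_hr2 = 50000.0

M_PER_KM = 1000.0   # meters in one kilometer
S_PER_HR = 3600.0   # seconds in one hour

# km/hr**2 -> m/s**2: multiply by m/km, divide by (s/hr)**2.
accel_m_per_s2 = accel_km_per_hr2 * M_PER_KM / S_PER_HR**2

# F = m * a, in newtons (kg * m / s**2).
forces_newton = [m * accel_m_per_s2 for m in masses_kg]
print(forces_newton)
```

The unit-aware arrays carry out this bookkeeping automatically, which is the main reason to keep units attached to the data.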

#### Caveat: general `unyt` objects don't know about the dataset's internal units 

In [None]:
# this won't work
# some_array.to("code_mass")

#### Instead, use `ds.arr` and `ds.quan`.

In [None]:
another_array = ds.arr([1., 2., 3.], "kg")
another_array.to("code_mass")

#### Caveat for cosmology simulations: comoving and proper reference frames

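Generally speaking, length units refer to the proper frame. Appending "cm" to the unit name (e.g., "Mpccm") gives the comoving frame, and "/h" (e.g., "Mpccm/h") adds the usual factor of the Hubble parameter.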

In [None]:
print(ds.domain_width.to("Mpc"))
print(ds.domain_width.to("Mpccm"))
print(ds.domain_width.to("Mpccm/h"))
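
The relation between these three numbers is simple arithmetic. The sketch below works it through in plain Python using the standard relations: a proper length is the comoving length divided by (1 + z), and a value in "/h" units is the value multiplied by h. The box size, redshift, and Hubble parameter are hypothetical, not taken from this dataset.

```python
def comoving_to_proper(length_comoving, redshift):
    """Proper length corresponding to a comoving length at a given redshift."""
    return length_comoving / (1.0 + redshift)


def to_per_h_units(length, hubble_h):
    """Express a length in '/h' units: the numerical value is multiplied by h."""
    return length * hubble_h


# Hypothetical values, for illustration only.
box_mpccm = 1.0   # comoving box width in Mpccm
redshift = 1.0
hubble_h = 0.7

box_mpc = comoving_to_proper(box_mpccm, redshift)  # proper Mpc
box_mpccm_h = to_per_h_units(box_mpccm, hubble_h)  # Mpccm/h

print(box_mpc, box_mpccm, box_mpccm_h)
```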

Take great care when working with multiple cosmological snapshots at once. For length-related quantities, prefer "unitary", a unit system normalized to the size of the box: the domain width is always 1 in "unitary" units, whatever the redshift.

In [None]:
print(ds.domain_width.to("unitary"))
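
To see why a box-normalized unit is convenient across snapshots, the sketch below follows a feature at a fixed comoving position through three redshifts. Its proper position changes with redshift, but its position as a fraction of the box width does not. All numbers and names are hypothetical and the code does not call yt.

```python
def to_unitary(length_proper, box_width_proper):
    """Express a length as a fraction of the box width (same frame for both)."""
    return length_proper / box_width_proper


# Hypothetical comoving lengths, in Mpccm.
box_comoving = 2.0
feature_comoving = 0.5

proper_positions = []
unitary_positions = []
for redshift in (0.0, 1.0, 9.0):
    scale_factor = 1.0 / (1.0 + redshift)
    box_proper = box_comoving * scale_factor
    feature_proper = feature_comoving * scale_factor
    proper_positions.append(feature_proper)
    unitary_positions.append(to_unitary(feature_proper, box_proper))

print(proper_positions)
print(unitary_positions)
```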

## Field aliases

Field types and names exist for quantities that many data formats have in common. Fields are all named using the lowercase-underscore convention. Data is returned in universally defined units (i.e., not frontend-specific).

The best example of this is **"gas"**.

In [None]:
sp["gas", "density"]

The "gas" field type also defines several other common "derived fields" (Lesson 5).
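
Conceptually, an alias is a lookup from a universal name to the frontend's on-disk name. The toy table below illustrates the idea for a few Enzo fields from this lesson. It is an illustrative sketch only: the real mapping is defined by each yt frontend, the `resolve` helper is hypothetical, and the unit conversion that accompanies a real alias is ignored here.

```python
# Illustrative alias table; the real mapping is defined by each yt frontend.
TOY_ALIASES = {
    ("gas", "density"): ("enzo", "Density"),
    ("gas", "temperature"): ("enzo", "Temperature"),
    ("gas", "velocity_x"): ("enzo", "x-velocity"),
}


def resolve(field, aliases=TOY_ALIASES):
    """Return the on-disk field for an alias, or the field itself if not aliased."""
    return aliases.get(field, field)


print(resolve(("gas", "density")))
print(resolve(("io", "particle_mass")))
```

Analysis code written against the universal names does not need to know which frontend produced the data.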

## Profiles

In [None]:
# YOU ARE HERE