# <center>Lesson 4: Derived quantities</center>
### <center>yt user/developer workshop, July 2025</center>

# Covered in this lesson:
- what are derived quantities?
- why do we need them?
- how to compute them?
- how to write your own derived quantities?

## Previous concepts:
* **field**: an array of values describing a quantity associated with each element in the `dataset`. This is the data we want. Examples: the gas densities of the grid cells, the positions of the particles, the brightness of the pixels.
* **data container**: an object containing one or more elements of a `dataset`. It provides access to `fields` for all the elements it contains.

## Load a dataset and some data containers

In [10]:
import yt
import numpy as np

In [11]:
ds = yt.load_sample("output_00080")

# Select the whole dataset (we could just as well select only a subset of it).
ad = ds.all_data()

# Also select a 100 kpc sphere centered on the density maximum.
center = ad.argmax(("gas", "density"))
sp = ds.sphere(center, (100, "kpc"))

yt : [INFO     ] 2025-07-15 12:09:07,390 Sample dataset found in '/home/cphyc/Documents/prog/yt-data/output_00080/info_00080.txt'
yt : [INFO     ] 2025-07-15 12:09:07,696 Parameters: current_time              = 11.925285011256845 Gyr
yt : [INFO     ] 2025-07-15 12:09:07,697 Parameters: domain_dimensions         = [64 64 64]
yt : [INFO     ] 2025-07-15 12:09:07,697 Parameters: domain_left_edge          = [0. 0. 0.]
yt : [INFO     ] 2025-07-15 12:09:07,698 Parameters: domain_right_edge         = [1. 1. 1.]
yt : [INFO     ] 2025-07-15 12:09:07,699 Parameters: cosmological_simulation   = True
yt : [INFO     ] 2025-07-15 12:09:07,699 Parameters: current_redshift          = 0.14255728632206321
yt : [INFO     ] 2025-07-15 12:09:07,699 Parameters: omega_lambda              = 0.723999977111816
yt : [INFO     ] 2025-07-15 12:09:07,700 Parameters: omega_matter              = 0.276000022888184

# Derived quantities - what's in a name?

A derived quantity is a value obtained by reducing one or more fields over all the elements of a data container: a sum, an average, an extremum, etc. It is not stored in the dataset, but computed on demand from the fields that are.

### Doing things manually

Let's try to implement a simple summation of the total mass in the dataset.

In [12]:
Mcell = ad["gas", "cell_mass"]
Mpart = ad["all", "particle_mass"]

Mtot = Mcell.sum() + Mpart.sum()
Mtot.to("Msun")

unyt_quantity(1.35570908e+16, 'Msun')

This works, but it has a cost: both field arrays, covering every cell and every particle in the dataset, are now held in memory at once!

In [13]:
print(f"{Mcell.size / 1e6} Mib")
print(f"{Mpart.size / 1e6} Mib")

1.749455 Mib
1.090895 Mib
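The numbers printed here count elements (in millions), not bytes. Assuming each value is a 64-bit float (8 bytes), a quick back-of-the-envelope computation turns these counts into a memory footprint; for an actual array, `.nbytes` gives the byte count directly. The helper `float64_mib` is hypothetical, written only for this estimate.

```python
BYTES_PER_FLOAT64 = 8
BYTES_PER_MIB = 1024 ** 2


def float64_mib(n_elements: int) -> float:
    """Memory footprint, in MiB, of n_elements 64-bit floats (assumed dtype)."""
    return n_elements * BYTES_PER_FLOAT64 / BYTES_PER_MIB


n_cells = 1_749_455      # Mcell.size from the cell above
n_particles = 1_090_895  # Mpart.size from the cell above
print(f"cells: {float64_mib(n_cells):.2f} MiB")          # cells: 13.35 MiB
print(f"particles: {float64_mib(n_particles):.2f} MiB")  # particles: 8.32 MiB
```

This is per field: every extra field read (or needed to derive another one) adds a comparable amount.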


What if we want to compute the total mass in the sphere?

In [14]:
Mcell = sp["gas", "cell_mass"]
Mpart = sp["all", "particle_mass"]

Mtot = Mcell.sum() + Mpart.sum()
Mtot.to("Msun")

yt : [INFO     ] 2025-07-15 12:11:11,530 Identified     4/   16 intersecting domains (    5 through hilbert key indexing)


unyt_quantity(4.38037979e+11, 'Msun')

Most of the code is the same; only the data container it acts on has changed.
To keep things clean (and avoid repeating ourselves), this _could_ be wrapped in a function:

In [15]:
def total_mass(data):
    return data["gas", "cell_mass"].sum() + data["all", "particle_mass"].sum()

print(f"Total mass: {total_mass(ad)}, in sphere: {total_mass(sp)}")

Total mass: 2.695713431072103e+49 g, in sphere: 8.7100166557051e+44 g


but *yt has your back.*

## Getting started: sum, averages, etc.

In [16]:
# total_mass() returns a pair [gas mass, particle mass]; sum() adds the two up
Mtot = ad.quantities.total_mass()
print(f"Total mass: {sum(Mtot)}, in sphere: {sum(sp.quantities.total_mass())}")

Total mass: 2.695713431072103e+49 g, in sphere: 8.710016655705098e+44 g


Unsurprisingly, we get the same result as before. But we can also compute more complex quantities.

### Weighted averages
This computes
$$ \langle q \rangle = \frac{1}{\sum w_i} \sum w_i q_i. $$

Again, we could compute this manually, but handling the weights (and units) for several fields at once quickly becomes tedious.
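For reference, here is what the formula amounts to when written by hand for several fields sharing one weight. This is a plain-Python sketch with made-up numbers; the function name `weighted_averages` is hypothetical, and unlike yt it ignores units entirely.

```python
from typing import Dict, List


def weighted_averages(fields: Dict[str, List[float]], weight: List[float]) -> Dict[str, float]:
    """<q> = sum(w_i q_i) / sum(w_i) for every field, sharing one weight array."""
    total_weight = sum(weight)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    return {
        name: sum(w * q for q, w in zip(values, weight)) / total_weight
        for name, values in fields.items()
    }


# Toy data for three cells (made-up numbers, no units).
fields = {"temperature": [100.0, 200.0, 400.0], "dx": [1.0, 1.0, 0.5]}
print(weighted_averages(fields, [1.0, 1.0, 1.0]))  # unit weights: plain means
print(weighted_averages(fields, [2.0, 1.0, 1.0]))  # {'temperature': 200.0, 'dx': 0.875}
```

Swapping the weight array (ones, masses, volumes) is all it takes to go from one kind of average to another, which is exactly what the `weight=` argument does in the cells below.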

In [17]:
# Weight with w_i = 1
ad.quantities.weighted_average_quantity(
    [("gas", "temperature"), ("gas", "number_density"), ("index", "dx")],
    weight=("index", "ones"),
)



[unyt_quantity(63209.57283701, 'K/dimensionless'),
 unyt_quantity(0.00123836, '1/(cm**3*dimensionless)'),
 unyt_quantity(0.0036679, 'code_length/dimensionless')]

In [18]:
# Mass-weighting
ad.quantities.weighted_average_quantity(
    [("gas", "temperature"), ("gas", "number_density"), ("index", "dx")],
    weight=("gas", "cell_mass"),
)

[unyt_quantity(1860174.32956431, 'K'),
 unyt_quantity(2.85243794e-05, 'cm**(-3)'),
 unyt_quantity(0.01530151, 'code_length')]

In [19]:
# Volume-weighting
ad.quantities.weighted_average_quantity(
    [("gas", "temperature"), ("gas", "number_density"), ("index", "dx")],
    weight=("gas", "cell_volume"),
)

[unyt_quantity(77231.50108772, 'K'),
 unyt_quantity(6.28859148e-07, 'dimensionless/cm**3'),
 unyt_quantity(0.01523595, 'code_length')]

### Slight detour: what is actually happening under the hood?
And why are yt's derived quantities arguably better?

Let's compute the mass-weighted average of `temperature_over_mu` in a sphere of radius $100\,\mathrm{kpc}$, first by hand, then with a derived quantity. We recreate the sphere each time so that the log shows every read from disk (debug logging is switched on and off with `yt.mylog.setLevel()`).

In [20]:
sp = ds.sphere(center, (100, "kpc"))

# Temporarily activate debug logs to see what's happening
old_level = yt.mylog.level
yt.mylog.setLevel("DEBUG")

yt.mylog.info(">>>>>>>>>>>>>>>> Reading T")
T = sp["gas", "temperature_over_mu"]
yt.mylog.info(">>>>>>>>>>>>>>>> Reading cell mass")
mcell = sp["gas", "cell_mass"]
yt.mylog.info(">>>>>>>>>>>>>>>> Computing weighted average")
T_avg = np.average(T, weights=mcell)
yt.mylog.info(f">>>>>>>>>>>>>>> Average temperature: {T_avg.to('K'):.2e}")

# Restore the old log level
yt.mylog.setLevel(old_level)

yt : [INFO     ] 2025-07-15 12:15:48,136 >>>>>>>>>>>>>>>> Reading T
yt : [DEBUG    ] 2025-07-15 12:15:48,159 Identified domain 8
yt : [DEBUG    ] 2025-07-15 12:15:48,163 Identified domain 9
yt : [DEBUG    ] 2025-07-15 12:15:48,166 Identified domain 10
yt : [DEBUG    ] 2025-07-15 12:15:48,169 Identified domain 11
yt : [INFO     ] 2025-07-15 12:15:48,170 Identified     4/   16 intersecting domains (    5 through hilbert key indexing)
yt : [DEBUG    ] 2025-07-15 12:15:48,171 Appending object to info_00080 (type: <class 'yt.frontends.ramses.data_structures.RAMSESDomainSubset'>)
yt : [DEBUG    ] 2025-07-15 12:15:48,178 Filling Density with 7867 (1.920e+00 1.301e+05) (7867 zones)
yt : [DEBUG    ] 2025-07-15 12:15:48,181 Filling Pressure with 7867 (1.402e-04 5.434e-01) (7867 zones)
yt : [DEBUG    ] 2025-07-15 12:15:48,189 Filling Density with 11232 (4.262e+00 2.087e+06) (11232 zones)

In [21]:
sp = ds.sphere(center, (100, "kpc"))

# Temporarily activate debug logs to see what's happening
old_level = yt.mylog.level
yt.mylog.setLevel("DEBUG")

yt.mylog.info(">>>>>>>>>>>>>>>> Computing weighted average")
T_avg = sp.quantities.weighted_average_quantity(
    ("gas", "temperature_over_mu"),
    weight=("gas", "cell_mass"),
)
yt.mylog.info(f">>>>>>>>>>>>>>> Average temperature: {T_avg.to('K'):.2e}")

# Restore the old log level
yt.mylog.setLevel(old_level)

yt : [INFO     ] 2025-07-15 12:17:31,229 >>>>>>>>>>>>>>>> Computing weighted average
yt : [DEBUG    ] 2025-07-15 12:17:31,260 Identified domain 8
yt : [DEBUG    ] 2025-07-15 12:17:31,263 Identified domain 9
yt : [DEBUG    ] 2025-07-15 12:17:31,266 Identified domain 10
yt : [DEBUG    ] 2025-07-15 12:17:31,269 Identified domain 11
yt : [INFO     ] 2025-07-15 12:17:31,270 Identified     4/   16 intersecting domains (    5 through hilbert key indexing)
yt : [DEBUG    ] 2025-07-15 12:17:31,272 Appending object to info_00080 (type: <class 'yt.frontends.ramses.data_structures.RAMSESDomainSubset'>)
yt : [DEBUG    ] 2025-07-15 12:17:31,281 Filling Density with 7867 (1.920e+00 1.301e+05) (7867 zones)
yt : [DEBUG    ] 2025-07-15 12:17:31,282 Filling Pressure with 7867 (1.402e-04 5.434e-01) (7867 zones)
yt : [DEBUG    ] 2025-07-15 12:17:31,289 Filling Density with 7867 (1.920e+00 1.301e+05) (7867 zones)

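Spot the difference? The first cell computes the average by hand: each field access loads the full array for the whole sphere, and only then is the average computed. The second cell uses yt's built-in derived quantity, which iterates over the data chunk by chunk and reduces each chunk as it goes, so the full arrays never need to be held in memory together.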
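The chunked structure can be sketched in plain Python. yt's own derived quantities are organized around a similar split into a per-chunk step (`process_chunk`) and a final combination step (`reduce_intermediate`); the hypothetical `ChunkedQuantity` and `Mean` classes below only mirror that idea and are not yt's API. Writing your own quantity follows the same recipe: decide what small partial result each chunk returns, and how to combine those partials.

```python
from typing import Iterable, List, Sequence


class ChunkedQuantity:
    """Sketch of a chunked reduction: per-chunk partial results, then a final combine."""

    def process_chunk(self, chunk: Sequence[float]) -> List[float]:
        raise NotImplementedError

    def reduce_intermediate(self, partials: List[List[float]]) -> float:
        raise NotImplementedError

    def __call__(self, chunks: Iterable[Sequence[float]]) -> float:
        # Each chunk is reduced as soon as it is produced; only the small partials are kept.
        partials = [self.process_chunk(chunk) for chunk in chunks]
        return self.reduce_intermediate(partials)


class Mean(ChunkedQuantity):
    """Unweighted mean: each chunk contributes (sum of values, number of values)."""

    def process_chunk(self, chunk: Sequence[float]) -> List[float]:
        return [sum(chunk), float(len(chunk))]

    def reduce_intermediate(self, partials: List[List[float]]) -> float:
        total = sum(p[0] for p in partials)
        count = sum(p[1] for p in partials)
        return total / count


print(Mean()([[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]))  # 3.5
```

Note that a mean cannot be combined by averaging the per-chunk means (chunks have different sizes); returning the sum and the count per chunk is what makes the final reduction exact.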

### Slightly more complicated: weighted standard deviation
This computes
$$ \sigma_q^2 = \frac{1}{\sum w_i} \sum w_i (q_i - \langle{q}\rangle_w)^2, $$
where $\langle{q}\rangle_w$ is the weighted average defined above. `ad.quantities.weighted_standard_deviation` returns, for each field, the pair $(\sigma_q, \langle q \rangle_w)$: the weighted standard deviation followed by the weighted average.
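Unlike a sum, a standard deviation cannot be assembled by simply adding per-chunk values, because each chunk is centered on its own mean. A common fix, the pairwise merge used in parallel variance algorithms, keeps per chunk the total weight $W$, the weighted mean $M$ and the weighted sum of squared deviations $S = \sum w_i (q_i - M)^2$, and adds a correction for the difference between chunk means when merging. This is a sketch of that general technique, not necessarily the exact scheme yt uses internally; all names are hypothetical.

```python
import math
from typing import Iterable, List, Optional, Tuple

Stats = Tuple[float, float, float]  # (W = sum w, M = weighted mean, S = sum w (q - M)^2)


def chunk_stats(q: List[float], w: List[float]) -> Stats:
    """Statistics of one chunk, computed with a direct two-pass formula."""
    W = sum(w)
    M = sum(wi * qi for qi, wi in zip(q, w)) / W
    S = sum(wi * (qi - M) ** 2 for qi, wi in zip(q, w))
    return W, M, S


def merge(a: Stats, b: Stats) -> Stats:
    """Combine two chunks' statistics; the delta term corrects for their different means."""
    Wa, Ma, Sa = a
    Wb, Mb, Sb = b
    W = Wa + Wb
    delta = Mb - Ma
    M = Ma + delta * Wb / W
    S = Sa + Sb + delta ** 2 * Wa * Wb / W
    return W, M, S


def weighted_std(chunks: Iterable[Tuple[List[float], List[float]]]) -> Tuple[float, float]:
    """Return (sigma_q, <q>_w), merging chunk statistics one chunk at a time."""
    stats: Optional[Stats] = None
    for q, w in chunks:
        s = chunk_stats(q, w)
        stats = s if stats is None else merge(stats, s)
    if stats is None:
        raise ValueError("no data")
    W, M, S = stats
    return math.sqrt(S / W), M


# Values 1, 2, 3, 4 with unit weights, split into two chunks: mean 2.5, sigma = sqrt(1.25)
print(weighted_std([([1.0, 2.0], [1.0, 1.0]), ([3.0, 4.0], [1.0, 1.0])]))
```

Dividing $S$ by $W$ at the end gives exactly $\sigma_q^2$ as defined above, whatever the way the data were split into chunks.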

In [22]:
ad.quantities.weighted_standard_deviation(
    [("gas", "temperature"), ("gas", "number_density"), ("gas", "metallicity"), ("index", "dx")],
    weight=("gas", "cell_mass"),
)

[unyt_array([4469959.29211321, 1860174.32956431], 'K'),
 unyt_array([5.15787974e-03, 2.85243794e-05], 'cm**(-3)'),
 unyt_array([3.71280598e-05, 3.52757100e-07], '(dimensionless)'),
 unyt_array([0.00168741, 0.01530151], 'code_length')]

In [23]:
ad.quantities.weighted_standard_deviation(
    [("gas", "temperature"), ("gas", "number_density"), ("index", "dx")],
    weight=("gas", "cell_volume"),
)

[unyt_array([662529.70632066,  77231.50108772], 'K'),
 unyt_array([4.18835923e-06, 6.28859148e-07], 'cm**(-3)'),
 unyt_array([0.00189583, 0.01523595], 'code_length')]

### Other useful quantities

In [24]:
# Min/max of the quantity
rhomin, rhomax = ad.quantities.extrema(("gas", "density")).in_units("mp/cm**3")

# Min/max locations of the quantity
rhomin, *xyz = ad.quantities.min_location(("gas", "density"))
rhomax, *xyz = ad.quantities.max_location(("gas", "density"))

# Sample some other fields at the location of the maximum density
rhomax, T_at_rhomax, dx_at_rhomax = ad.quantities.sample_at_max_field_values(
    ("gas", "density"), [("gas", "temperature"), ("index", "dx")],
)

# Sum of some quantities
ad.quantities.total_quantity([("gas", "cell_mass"), ("gas", "cell_volume")])

# Useful shortcuts
com = ad.quantities.center_of_mass(use_gas=True, use_particles=False)
vbulk = ad.quantities.bulk_velocity(use_gas=True, use_particles=False)
Jtot = ad.quantities.angular_momentum_vector(use_gas=True, use_particles=False)
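Quantities such as `max_location` or `sample_at_max_field_values` reduce in the same chunk-friendly way: each chunk only needs to report its own maximum together with whatever should be sampled there, and the final step keeps the largest. A plain-Python sketch, where the per-cell record layout (`density`, `temperature`, `x` keys) and the function `sample_at_max` are hypothetical:

```python
from typing import Dict, Iterable, List, Optional

Record = Dict[str, float]  # hypothetical per-cell record, e.g. density, temperature, x


def sample_at_max(chunks: Iterable[List[Record]], key: str) -> Optional[Record]:
    """Return the record with the largest `key`, scanning one chunk at a time."""
    best: Optional[Record] = None
    for chunk in chunks:
        if not chunk:
            continue  # empty chunks contribute nothing
        chunk_best = max(chunk, key=lambda rec: rec[key])  # per-chunk reduction
        if best is None or chunk_best[key] > best[key]:
            best = chunk_best  # final reduction: keep the overall maximum
    return best


chunks = [
    [{"density": 1.0, "temperature": 1e4, "x": 0.1}, {"density": 5.0, "temperature": 2e4, "x": 0.2}],
    [{"density": 3.0, "temperature": 8e3, "x": 0.7}],
]
peak = sample_at_max(chunks, "density")
print(peak["temperature"], peak["x"])  # 20000.0 0.2
```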



### Why use derived quantities?
- Convenience: you don't have to write the same code over and over again.
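- Efficiency: yt reads the data chunk by chunk, loading only the fields the quantity needs and reducing each chunk as it goes, so the full arrays never have to sit in memory at once.
- Parallelization: the same call runs in parallel over MPI when the script enables `yt.enable_parallelism()`; see `derived_quantities_parallel.py`.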
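Because every chunk is reduced independently, the per-chunk work can be spread over several workers and only the small partial results need to be gathered. yt does this with MPI; the sketch below uses a standard-library thread pool purely as an analogy, computing the same kind of (min, max) pair as `extrema`. The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def chunk_extrema(chunk: List[float]) -> Tuple[float, float]:
    """Per-chunk work: the chunk's own (min, max). Assumes a non-empty chunk."""
    return min(chunk), max(chunk)


def parallel_extrema(chunks: List[List[float]], workers: int = 4) -> Tuple[float, float]:
    """Reduce chunks on several workers, then combine the partial (min, max) pairs."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(chunk_extrema, chunks))
    return min(p[0] for p in partials), max(p[1] for p in partials)


chunks = [[3.0, 1.5, 7.0], [0.2, 4.0], [9.5]]
print(parallel_extrema(chunks))  # (0.2, 9.5)
```

The combine step does not care which worker handled which chunk, which is why the result is the same whether the quantity runs serially or in parallel.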