# 🧑‍🎨 Curate and label planets

This notebook assembles planet populations and labels many of them as having atmospheres or not. It produces a folder of organized populations in `organized-exoatlas-populations`. This sets up the data for the fits, so it should be run before attempting `fit-one-shoreline.ipynb` or `fit-many-shorelines.ipynb`.

## 💿 Download Archival Data 
Let's use `exoatlas` to get access to planet populations.

In [None]:
from shoreline import * 

In [None]:
pops = dict(
    transit=TransitingExoplanets(),
    major=SolarSystem(),
    dwarf=SolarSystemDwarfPlanets(),
    moons=SolarSystemMoons(),
    #minor=SolarSystemMinorPlanets(),
)
pops

## 📝 Update planet references.

Some planets have poor choices of reference listed for their parameters. Let's fix them.

In [None]:
pops['transit'].load_individual_references()


In [None]:
pops['transit'].update_values(planets=['LTT1445Ab', 'LTT1445Ac'], stellar_luminosity=np.nan, stellar_luminosity_uncertainty=np.nan)
pops['transit'].update_reference('Pass et al. 2023')
pops['transit'].display_individual_references(keys=['stellar_luminosity', 'stellar_teff', 'stellar_radius'], planets=['LTT1445Ab'])


## 🏷️ Label planets with atmospheres.

Let's record which objects in the Solar System have atmospheres and/or significant surface volatiles. We'll define a few categories:

- `any` represents any evidence of an atmosphere or surface volatiles whatsoever; this is effectively the original cosmic shoreline criterion
- `CO2` represents an Earth/Venus-like hypothesis of a thick CO2 atmosphere, which is only considered for planets between the limits of a magma ocean at the hottest and CO2 freeze-out (roughly the outer edge of the habitable zone) at the coolest
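To get a feel for the size of the `CO2` window, here is a back-of-the-envelope sketch that converts the two temperature limits used in the labeling cell below (a 1673 K dayside temperature for magma oceans, and a 194 K freeze-out temperature for CO2) into bolometric insolation relative to Earth. It assumes a zero-albedo equilibrium temperature of the form T = [f (1 - A) S / σ]^(1/4), with f = 2/3 for a dayside with no heat redistribution and f = 1/4 for full redistribution. Whether `exoatlas`'s `teq()` uses exactly this convention is an assumption here, and the solar constant is a rounded value.

```python
# Back-of-the-envelope conversion of temperature limits into insolation.
# Assumes T_eq = (f * (1 - A) * S / sigma)**0.25 (an assumed convention).
SIGMA_SB = 5.670374419e-8  # Stefan-Boltzmann constant [W m^-2 K^-4]
S_EARTH = 1361.0           # approximate solar constant at 1 AU [W m^-2]


def insolation_for_temperature(temperature, f, albedo=0.0):
    """Return the insolation (relative to Earth) that yields this equilibrium temperature."""
    flux = SIGMA_SB * temperature**4 / (f * (1.0 - albedo))
    return flux / S_EARTH


# hot edge: the dayside (f = 2/3) reaches the magma-ocean temperature
s_hot = insolation_for_temperature(1673.0, f=2 / 3)

# cool edge: the fully redistributed (f = 1/4) temperature drops to CO2 freeze-out
s_cool = insolation_for_temperature(194.0, f=1 / 4)

print(f"CO2 window spans roughly {s_cool:.2f} to {s_hot:.0f} times Earth's insolation")
```

Under these assumptions the window covers more than three orders of magnitude in insolation, which is why the `CO2` category removes only the hottest and coldest planets.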

We'll start by adding the actual quantitative constraints we have on atmospheres for Solar System planets.

In [None]:
# add some columns to record atmospheres or volatiles
for k in pops:
    pops[k].add_column('atmospheric_pressure_H', np.zeros(len(pops[k]))*u.bar)
    pops[k].add_column('atmospheric_pressure_CNO', np.zeros(len(pops[k]))*u.bar)
    pops[k].add_column('surface_volatiles', np.zeros(len(pops[k])).astype(bool))
    pops[k].add_column('is_exoplanet', np.zeros(len(pops[k])).astype(bool))
    pops[k].add_column(f'has_atmosphere', np.nan*np.ones(len(pops[k])))

# add atmospheres to whatever population the planet's in
def add_atmosphere(name, pressure, kind='CNO'):
    '''
    Record an atmosphere on a planet, in any population. 
    '''
    success = False
    for k, p in pops.items():
        try:
            p.table.loc[clean(name).lower()][f'atmospheric_pressure_{kind}'] = pressure 
            success = True
            #print(f'Added {pressure} {kind} atmosphere to pops["{k}"]["{name}"].')
        except KeyError:
            pass 
    if not success:
        print(f'Could not add atmosphere to {name}!')

add_atmosphere('Mercury', 1e-14*u.bar)
add_atmosphere('Venus', 92*u.bar)
add_atmosphere('Earth', 1*u.bar)
add_atmosphere('Mars', 0.007*u.bar)
add_atmosphere('Jupiter', np.inf*u.bar, kind='H')
add_atmosphere('Saturn', np.inf*u.bar, kind='H')
add_atmosphere('Uranus', np.inf*u.bar, kind='H')
add_atmosphere('Neptune', np.inf*u.bar, kind='H')
add_atmosphere('Titan', 1.5*u.bar)
add_atmosphere('Moon', 1e-14*u.bar)
add_atmosphere('Triton', 1.4e-5*u.bar)
add_atmosphere('Io', 1e-8*u.bar)
add_atmosphere('Europa', 1e-12*u.bar)
add_atmosphere('Ganymede', 1e-12*u.bar)
add_atmosphere('Callisto', 1e-11*u.bar)
add_atmosphere('Pluto', 1e-5*u.bar)


# add surface volatiles that are interacting with atmosphere
def add_surface_volatiles(name):
    '''
    Record surface volatiles on a planet, in any population. 
    '''
    for k, p in pops.items():
        try:
            p.table.loc[clean(name).lower()]['surface_volatiles'] = True 
            #print(f'Added surface volatiles to pops["{k}"]["{name}"].')
        except KeyError:
            pass

for p in ['Mars', 'Pluto', 'Makemake', 'Eris', 'Quaoar']:
    add_surface_volatiles(p)
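`add_atmosphere` looks a planet up by name in every population and records the pressure wherever it finds a match, complaining only if no population contains it. The sketch below shows the same lookup pattern with plain dictionaries standing in for the `exoatlas` tables. `normalize_name` and `record_value` are hypothetical helpers for illustration; in particular, `normalize_name` is not the actual implementation of `clean`, which may normalize names differently.

```python
# Stdlib-only sketch of the lookup pattern used by add_atmosphere.
# Plain dicts stand in for exoatlas tables; normalize_name is a
# hypothetical stand-in for clean(), not its actual implementation.


def normalize_name(name):
    """Lowercase and drop spaces, so 'LTT 1445Ab' and 'LTT1445Ab' share one key."""
    return name.replace(" ", "").lower()


def record_value(populations, name, column, value):
    """Set table[key][column] = value in every population that contains this planet."""
    key = normalize_name(name)
    found = False
    for table in populations.values():
        if key in table:
            table[key][column] = value
            found = True
    if not found:
        print(f"Could not find {name} in any population!")
    return found


populations = {
    "major": {"earth": {}, "mars": {}},
    "moons": {"moon": {}, "titan": {}},
}
record_value(populations, "Titan", "atmospheric_pressure_CNO", 1.5)  # found in "moons"
record_value(populations, "Vulcan", "atmospheric_pressure_CNO", 1.0)  # prints a warning
```

Reporting only when *no* population matches lets one helper serve planets, dwarf planets, and moons without knowing in advance where each object lives.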


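For exoplanets, let's assume that planets larger than certain radii must have atmospheres or volatiles of some sort, and that small planets are rocky with no H envelope. We'll also drop planets orbiting young stars.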

In [None]:
radius_upper_limit_for_rocky  = 1.8*u.Rearth
radius_lower_limit_for_H  = pops['major']['Neptune'].radius()
radius_lower_limit_for_CNO = 2.2*u.Rearth


# do some work with the transiting planet population
t = pops['transit']

# indicate these are exoplanets (for future tables) 
t.table['is_exoplanet'] = True

# set all atmospheres initially to unknown
for kind in ['H', 'CNO']:
    t.table[f'atmospheric_pressure_{kind}'] = np.nan

# make some decisions based on the size of the planets
#is_precise = t.get_fractional_uncertainty('radius') < 0.25
is_small_enough_to_be_rocky = t.radius() < radius_upper_limit_for_rocky  
is_big_enough_to_have_H = t.radius() > (radius_lower_limit_for_H + t.radius_uncertainty())
is_big_enough_to_have_CNO = t.radius() > (radius_lower_limit_for_CNO + t.radius_uncertainty())

# planets bigger than Neptune must have H envelopes, assume (!) rocky planets have no H
t.table['atmospheric_pressure_H'][is_big_enough_to_have_H] = np.inf 
t.table['atmospheric_pressure_H'][is_small_enough_to_be_rocky] = 0 

# planets bigger than some limit must have at least substantial CNO atmospheres
t.table['atmospheric_pressure_CNO'][is_big_enough_to_have_CNO] = np.inf 

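# remove planets orbiting young stars
# (a nan age compares as False, so planets with unknown ages are kept)
stellar_age_limit = 0.5*u.Gyr
too_young = (t.stellar_age() < stellar_age_limit)
is_old = ~too_young

# trim to planets that are not known to be young
pops['transit'] = t[is_old]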
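The size cuts above only call a planet "big enough" when its radius exceeds the limit by more than its own 1σ radius uncertainty, and the age cut only removes planets whose stellar age is known to be below 0.5 Gyr. Here is a toy numpy illustration of both behaviors, using made-up radii and ages as plain floats rather than astropy quantities.

```python
import numpy as np

# made-up example planets (radii in Earth radii, stellar ages in Gyr)
radius = np.array([1.2, 2.3, 2.6, 4.5])
radius_uncertainty = np.array([0.1, 0.2, 0.2, 0.3])
stellar_age = np.array([5.0, 0.1, np.nan, 3.0])

radius_upper_limit_for_rocky = 1.8
radius_lower_limit_for_CNO = 2.2

# a planet must clear the limit by more than its 1-sigma radius uncertainty
is_small_enough_to_be_rocky = radius < radius_upper_limit_for_rocky
is_big_enough_to_have_CNO = radius > (radius_lower_limit_for_CNO + radius_uncertainty)

# nan < 0.5 evaluates to False, so a planet with an unknown age is not "too young"
too_young = stellar_age < 0.5
is_old = ~too_young

print(is_small_enough_to_be_rocky)
print(is_big_enough_to_have_CNO)
print(is_old)
```

The 2.3 Earth-radius planet is neither "rocky" nor "big enough" once its uncertainty is included, so both of its pressures stay unknown, and the planet with a `nan` age survives the age cut.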

Now, let's add some atmosphere constraints from direct (mostly JWST) observations.

In [None]:
# revisit these! 
rocky_atmosphere = ['55 Cnc e']
for r in rocky_atmosphere:
    add_atmosphere(r, 0.1*u.bar)


rocky_no_atmosphere = [

    # Spitzer phase curve
    'LHS 3844 b', # 2019Natur.573...87K

    # MIRI LRS phase curve
    'GJ367b', # 2024ApJ...961L..44Z

    # NIRSpec phase curve
    'TOI-1685b', # 2024arXiv241203411L

    # Spitzer eclipses
    'GJ1252b', # 2022ApJ...937L..17C


    # MIRI LRS eclipses 
    'GJ486b', # 2024ApJ...975L..22W
    'GJ1132b', 
    'LTT1445Ab', # 2025AJ....169..311W


    # MIRI photometry eclipses 
    'TOI-1468 b',  # 2025A&A...698A..68M
    'LHS 1140 c', # 2025arXiv250522186F

] 

for r in rocky_no_atmosphere:
    add_atmosphere(r, 0.0*u.bar)


other_atmospheres = ['K2-18b']  # (just barely misses the radius cut once its uncertainty is included?)


rocky_worlds_targets = ['LTT 1445Ab', 
                        'LTT 1445Ac', 
                        'GJ 3929b', 
                        'LHS 1140b']

questionable = [
                'Trappist-1b', # probably no, but let's be chill about it
                'Trappist-1c', # 2023Natur.620..746Z = no, 
                'LHS 1478 b', # 2025A&A...695A.171A

                # Kepler phase curves
                'Kepler-10b', 
                'Kepler-78b',
                'K2-141 b', 'L 98-59b'
                ]

other_interesting = ['GJ 1214b', 'K2-18b', 
                     'TOI-700d', 'TOI-700e', 
                     'Kepler-62e', 'Kepler-62f',
                     ]

solar_system_to_annotate = [str(x) for x in pops['major'].name()] + ['Moon', 'Pluto', 'Eris', 'Haumea', 'Makemake', 'Ceres']
planets_to_annotate = solar_system_to_annotate + rocky_atmosphere + rocky_no_atmosphere + rocky_worlds_targets + questionable + other_interesting
f'{planets_to_annotate=}'

Let's check for bad data (non-finite or negative values in the quantities we'll fit), and remove those rows from each table.

In [None]:
# mask out planets with bad data (but tell us about them)
for k in pops:
    problems = []
    for c in ['radius', 'relative_insolation', 'relative_escape_velocity', 'stellar_luminosity']:
        for s in ['', '_uncertainty']:
            x = pops[k].get(f'{c}{s}', kludge=True)
            # flag non-finite or negative values
            problem = ~np.isfinite(x) | (x < 0)
            print(f'{sum(problem)}/{len(x)} planets are bad for {c}{s}:')
            print(pops[k].name()[problem])
            print()
            problems.append(problem)

    # keep only planets with no problems in any quantity
    ok = ~np.any(problems, axis=0)
    pops[k] = pops[k][ok]
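The cleaning cell flags a planet as bad if any checked quantity (or its uncertainty) is non-finite or negative, and keeps only planets with no flags at all. Here is a minimal numpy sketch of that mask logic on made-up arrays, using `np.any` to combine the per-column flags.

```python
import numpy as np

# made-up columns for four planets; nan, inf, and negative entries are "bad"
columns = {
    "radius": np.array([1.0, np.nan, 2.0, 1.5]),
    "radius_uncertainty": np.array([0.1, 0.1, -0.2, 0.1]),
    "relative_insolation": np.array([1.0, 10.0, 100.0, np.inf]),
}

problems = []
for name, x in columns.items():
    # flag non-finite or negative values in this column
    problem = ~np.isfinite(x) | (x < 0)
    print(f"{problem.sum()}/{len(x)} planets are bad for {name}")
    problems.append(problem)

# keep a planet only if no column flagged it
ok = ~np.any(problems, axis=0)
print(ok)
```

Stacking the flags and reducing along `axis=0` gives one boolean per planet, so a single bad value anywhere is enough to drop that row.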

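Now, let's set thresholds for what falls into the "has atmosphere" category. We do this for each combination of subset (`solar`, `exo`, or `all`), atmosphere kind (`any` or `CO2`), and size cut (all radii, or `_small` for planets smaller than Neptune).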

In [None]:
A = dict()
for subset in ['solar', 'exo', 'all']:
    for kind in ["any", "CO2"]:
        for size in ['', '_small']:

            # containers for planets labeled yes, no, and everything considered
            A[f'{subset}-{kind}{size}'] = dict(yes={}, no={}, everything={})

            for k in pops:
                if subset == 'exo':
                    if k != 'transit':
                        continue 
                elif subset == 'solar':
                    if k == 'transit':
                        continue 
                elif subset == 'all':
                    pass 

                if size == '_small':
                    ok = pops[k].radius() < radius_lower_limit_for_H
                else:
                    ok = pops[k].radius() < np.inf

                # use 'ok' to define what planets to consider at all
                if kind == "CO2":

                    # avoid magma oceans (Kite et al. 2016)
                    dayside_temperature_for_magma_ocean = 1673 * u.K
                    ok *= pops[k].teq(f=2 / 3, albedo=0.0) < dayside_temperature_for_magma_ocean

                    # avoid frozen CO2, very approximately from CO2 SVP=1 bar
                    temperature_of_CO2_freezeout = 194 * u.K
                    ok *= pops[k].teq(f=1 / 4, albedo=0.0) > temperature_of_CO2_freezeout


                # skip populations that have no relevant planets
                if sum(ok) == 0:
                    continue 

                # create a reference subpopulation
                this = pops[k][ok]
                this.color = "gray"
                this.label = None

                if kind == "CO2":

                    # (include Mars + lowest detectable JWST?)
                    atmospheric_pressure_threshold = 1e-3 * u.bar

                    # is there actual CNO-dominated atmosphere?
                    i_yes = this.atmospheric_pressure_CNO() >= atmospheric_pressure_threshold

                    # require an actual constraint to say something doesn't have an atmosphere
                    i_no = this.atmospheric_pressure_CNO() < atmospheric_pressure_threshold

                    # (all others that aren't yes/no will be treated as uncertain)

                elif kind == "H":

                    # require a substantial H atmosphere (not that we have many in-betweens)
                    atmospheric_pressure_threshold = 1 * u.bar

                    # is there evidence of a H-based atmosphere, or not?
                    i_yes = this.atmospheric_pressure_H() >= atmospheric_pressure_threshold
                    i_no = this.atmospheric_pressure_H() < atmospheric_pressure_threshold

                    # (all others that aren't yes/no will be treated as uncertain)

                elif kind == 'any': # (Is identical to CNO, except for radius cut?)

                    # (include Triton + Pluto + such)
                    atmospheric_pressure_threshold = 1e-6 * u.bar

                    # is there a CNO atmosphere?
                    i_yes = this.atmospheric_pressure_CNO() >= atmospheric_pressure_threshold

                    # is there an H atmosphere? 
                    i_yes = i_yes | (
                        this.atmospheric_pressure_H() > atmospheric_pressure_threshold
                    )

                    # also count surface volatiles
                    i_yes = i_yes | (this.surface_volatiles())

                    # require an actual constraint to say something doesn't have an atmosphere
                    i_no = (
                        (this.atmospheric_pressure_CNO() < atmospheric_pressure_threshold)
                        & (this.surface_volatiles() == False)
                        & (i_yes == False)
                    )

                    # (all others that aren't yes/no will be treated as uncertain)


                # store atmosphere label in subpopulation table
                this.table[f"has_atmosphere"][i_yes] = 1
                this.table[f"has_atmosphere"][i_no] = 0

                # store all "ok" planets for visual reference 
                A[f'{subset}-{kind}{size}']["everything"][k] = this

                # store planets with these atmospheres
                if sum(i_yes) > 0:
                    yes = this[i_yes]
                    yes.color = "navy"  #'royalblue'
                    if k == "major":
                        yes.label = f"has {kind} atmosphere"
                    else:
                        yes.label = None
                    yes.zorder = this.zorder + 1
                    A[f'{subset}-{kind}{size}']["yes"][k] = yes

                # store planets without atmospheres
                if sum(i_no) > 0:
                    no = this[i_no]
                    no.color = "saddlebrown"  # sienna'
                    if k == "major":
                        no.label = f"no {kind} atmosphere"
                    else:
                        no.label = None
                    no.zorder = this.zorder + 1
                    A[f'{subset}-{kind}{size}']["no"][k] = no

A
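Each planet ends up with `has_atmosphere` equal to 1, 0, or `nan`. For the `any` category, a planet counts as "yes" if either pressure reaches the 10⁻⁶ bar threshold or it has surface volatiles, and as "no" only if its CNO pressure is actually constrained below the threshold and it has no volatiles; unconstrained (`nan`) pressures stay unknown. Here is a toy numpy version of that rule with made-up inputs, mirroring the comparisons in the cell above.

```python
import numpy as np

threshold = 1e-6  # bar, the threshold used for the 'any' category

# made-up inputs: [airless rock, Pluto-like, unconstrained exoplanet, giant planet]
pressure_CNO = np.array([0.0, 1e-5, np.nan, 0.0])
pressure_H = np.array([0.0, 0.0, np.nan, np.inf])
surface_volatiles = np.array([False, True, False, False])

# "yes": a CNO or H atmosphere above threshold, or surface volatiles
i_yes = (pressure_CNO >= threshold) | (pressure_H > threshold) | surface_volatiles

# "no": CNO pressure constrained below threshold, no volatiles, and not already "yes"
i_no = (pressure_CNO < threshold) & ~surface_volatiles & ~i_yes

# start everything as unknown (nan), then fill in the labels we can justify
has_atmosphere = np.full(len(pressure_CNO), np.nan)
has_atmosphere[i_yes] = 1
has_atmosphere[i_no] = 0
print(has_atmosphere)
```

Because comparisons with `nan` are always `False`, a planet with no pressure constraint falls into neither mask and keeps its `nan` label, which is what lets the fits treat it as uncertain.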

## 💾 Save organized populations.
Let's save these populations out to files, so we can reload them easily into other notebooks. 

In [None]:
save_organized_populations(A)

Let's summarize the datasets.

In [None]:
from IPython.display import display 

mkdir('population-definitions')
for kind in ['all-any', 'all-CO2']:
    print(f'🗂️🗂️🗂️🗂️🗂️ {kind} 🗂️🗂️🗂️🗂️🗂️')
    for label in A[kind]:
        N = sum([len(v) for k, v in A[kind][label].items()])
        print(label, N)
    print(A[kind])
    display(convert_labeled_populations_into_table(A[kind]))
    if False:  # set to True to also plot and save each population definition
        visualize_labeled_populations(A[kind])
        plt.suptitle(kind)
        plt.savefig(f'population-definitions/exoatlas-populations-{kind}.pdf')
        plt.show()


When run, this notebook should produce a dictionary of curated populations and a dictionary of tables, all of which are safe to use for fitting (= no `nan` or non-physical values in any quantities we care about).