# Exploring

An exoplanet `Population` is designed to be a (hopefully!) relatively easy way to interact with data for a group of exoplanet systems. Here we step through the basics of how we can explore a population of planets, access standardized planet properties, and filter subsets of planet populations.

## Getting started
The `exoatlas` package contains the tools we will use. All planet properties inside a population have astropy [units](https://docs.astropy.org/en/stable/units/) associated with them, so we may also want to have access to those units for our calculations.

In [None]:
import exoatlas as ea

ea.version()

## Create a `Population`
Now, to get started, we'll make a population that contains all confirmed transiting exoplanets. We can read more about the different populations we can create over on the [Creating](creating.html) page. When we create this population, the code will download a table of the latest data from the NASA Exoplanet Archive.

In [None]:
pop = ea.TransitingExoplanets()

## What's inside a `Population`?
The core ingredient to an exoplanet `Population` is a table of planet properties that have been standardized and populated with astropy units. This `pop.standard` table is an astropy [Table](https://docs.astropy.org/en/stable/table/), so its contents can be accessed or modified as any other astropy `Table`.

In [None]:
pop.standard

## How do we access planet properties?
The main way to access planet properties within a `Population` is with its methods. That is, we can access an array of the values for some property `x` by calling `pop.x()`. Behind the scenes, the population will look to see if there is a column called `"x"` in the standardized table and return that column, or it will do a calculation using some of the internal data. For example, we can get an array of planet names with:

In [None]:
pop.name()

In [None]:
pop.radius()

In [None]:
pop.get('radius')

We also have access to quantities that are not directly included in the table itself but can be calculated from them. For example, we can get an array of the amount of insolation that the planets receive from their stars as:

In [None]:
pop.insolation()

In this case, the insolation is calculated from the planet's orbital separation and the luminosity of the star (which is itself calculated from the stellar effective temperature and radius).
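
As a rough illustration of that chain of calculations, here is a minimal sketch in plain Python. It assumes the star radiates as a blackbody sphere and uses SI units throughout; the function names `stellar_luminosity` and `insolation` are hypothetical helpers for this example, not the `exoatlas` implementation.

```python
import math

# Physical constants in SI units
STEFAN_BOLTZMANN = 5.670374419e-8  # W m^-2 K^-4
SOLAR_RADIUS = 6.957e8  # m
AU = 1.495978707e11  # m


def stellar_luminosity(radius_m, teff_k):
    """Luminosity (W) of a star, assuming it radiates as a blackbody sphere."""
    return 4 * math.pi * radius_m**2 * STEFAN_BOLTZMANN * teff_k**4


def insolation(radius_m, teff_k, separation_m):
    """Flux (W/m^2) arriving at a planet a given distance from its star."""
    luminosity = stellar_luminosity(radius_m, teff_k)
    return luminosity / (4 * math.pi * separation_m**2)


# Sanity check with the Sun and Earth (hypothetical helper inputs)
earth_insolation = insolation(SOLAR_RADIUS, 5772.0, AU)
print(f"{earth_insolation:.0f} W/m^2")
```

Plugging in solar values should land close to the familiar solar constant, which is a handy check that the ingredients are combined correctly.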

If information needed to do a calculation is missing, `exoatlas` will try to estimate it from other available information. In the `.insolation` case, some planets had no semimajor axes defined in the `.standard` table, but we were able to calculate this quantity from the orbital period, the stellar mass, and Newton's version of Kepler's third law.
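
To make the fallback concrete, here is a minimal sketch of Newton's version of Kepler's third law, $a^3 = G M_\star P^2 / (4\pi^2)$, in plain Python. It assumes the planet's mass is negligible compared to the star's, and `semimajor_axis` is a hypothetical helper written for this example rather than an `exoatlas` function.

```python
import math

# Physical constants in SI units
G = 6.67430e-11  # m^3 kg^-1 s^-2
SOLAR_MASS = 1.98841e30  # kg
AU = 1.495978707e11  # m
DAY = 86400.0  # s


def semimajor_axis(period_s, stellar_mass_kg):
    """Semimajor axis (m) from orbital period and stellar mass.

    Assumes the planet's mass is negligible compared to the star's.
    """
    return (G * stellar_mass_kg * period_s**2 / (4 * math.pi**2)) ** (1 / 3)


# Sanity check: a one-year orbit around a solar-mass star
a_earth = semimajor_axis(365.25 * DAY, SOLAR_MASS) / AU
print(f"{a_earth:.3f} AU")
```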

With this toolkit, you can now access the data you need to make some pretty fundamental plots in exoplanetary science. For example:

In [None]:
import matplotlib.pyplot as plt

plt.loglog(pop.relative_insolation(), pop.radius(), ".")
plt.xlabel("Flux Received (relative to Earth)")
plt.ylabel("Planet Radius (Earth radii)");

## Why is everything a function? 

Many bits of data are simply columns in a giant table, so it might feel a little unnecessary to have to call them as functions. However, others that depend on calculations might need to have custom keyword inputs and/or options to specify, so they need to be callable. For consistency, we just make everything act like a function. 

For example, equilibrium temperature depends on assumed albedo, so it's nice to be able to provide that as a keyword:

In [None]:
pop.teq()

In [None]:
pop.teq(albedo=0.7)
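
For intuition about why albedo matters, here is a minimal sketch of the textbook equilibrium temperature, $T_{eq} = T_{eff}\sqrt{R_\star/(2a)}\,(1-A)^{1/4}$. It assumes heat is redistributed uniformly over the whole planet; that assumption, and the helper name `equilibrium_temperature`, belong to this example and may not match the defaults inside `exoatlas`.

```python
import math

SOLAR_RADIUS = 6.957e8  # m
AU = 1.495978707e11  # m


def equilibrium_temperature(stellar_teff_k, stellar_radius_m, separation_m, albedo=0.0):
    """Equilibrium temperature (K), assuming uniform heat redistribution."""
    geometric_dilution = math.sqrt(stellar_radius_m / (2 * separation_m))
    return stellar_teff_k * geometric_dilution * (1 - albedo) ** 0.25


# An Earth-like orbit around a Sun-like star, with two different albedos
teq_dark = equilibrium_temperature(5772.0, SOLAR_RADIUS, AU)
teq_reflective = equilibrium_temperature(5772.0, SOLAR_RADIUS, AU, albedo=0.7)
print(f"{teq_dark:.0f} K, {teq_reflective:.0f} K")
```

A more reflective planet absorbs less starlight, so its equilibrium temperature drops by a factor of $(1-A)^{1/4}$.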

Or, for some calculations, there's a `kludge` option that allows missing masses and/or radii to be replaced with reasonable (but horribly imprecise!) theoretical estimates. We can see this making a difference if we look at a population with imprecise or missing mass estimates, and try to calculate the estimated transmission spectroscopy signal (see [Observing](observing.html)), which depends on the planet's surface gravity, and therefore its mass. By default, nothing will be calculated for planets without masses; if `kludge=True`, planets without masses will use estimated masses instead!

In [None]:
b = ea.BadMass()
b.transmission_signal()

In [None]:
b.transmission_signal(kludge=True)
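
To see why a missing mass blocks this calculation, here is a minimal sketch of a common textbook estimate: the transit depth change from one atmospheric scale height, $2 H R_p / R_\star^2$, with $H = k_B T / (\mu m_u g)$ and $g = G M_p / R_p^2$. The helper name `transmission_signal_sketch`, the default mean molecular weight, and the example planet values are all assumptions for this illustration, not the `exoatlas` implementation.

```python
import math

# Physical constants in SI units
G = 6.67430e-11  # m^3 kg^-1 s^-2
K_B = 1.380649e-23  # J/K
M_U = 1.66053906660e-27  # kg, atomic mass unit


def transmission_signal_sketch(mass_kg, radius_m, stellar_radius_m, temperature_k, mu=2.32):
    """Approximate transit depth contributed by one atmospheric scale height.

    mu is the mean molecular weight; 2.32 is an assumed value for a
    hydrogen-dominated atmosphere.
    """
    gravity = G * mass_kg / radius_m**2
    scale_height = K_B * temperature_k / (mu * M_U * gravity)
    return 2 * scale_height * radius_m / stellar_radius_m**2


# A made-up hot-Jupiter-like planet, with and without a known mass
known = transmission_signal_sketch(1.898e27, 7.1492e7, 6.957e8, 1500.0)
heavier = transmission_signal_sketch(2 * 1.898e27, 7.1492e7, 6.957e8, 1500.0)
unknown = transmission_signal_sketch(math.nan, 7.1492e7, 6.957e8, 1500.0)
print(known, heavier, unknown)
```

Because the signal scales inversely with surface gravity, a missing mass propagates straight through as a missing signal, and any estimated mass carries its imprecision into the result.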

## How do we retrieve uncertainties? 

We will often want to know the uncertainty on a particular quantity. We can retrieve this either with the `.get_uncertainty()` method, or by appending `_uncertainty` to the name of a quantity. For core table quantities, uncertainties are extracted directly from the table. 


In [None]:
sigma = pop.get_uncertainty('radius')
sigma

In [None]:
sigma = pop.radius_uncertainty()
sigma

Some uncertainties might be asymmetric, with different upper and lower uncertainties, such as 
$x^{+\sigma_{upper}}_{-\sigma_{lower}}$. We can extract these asymmetric uncertainties with `.get_uncertainty_lowerupper()` or by appending `_uncertainty_lowerupper`.

In [None]:
sigma_lower, sigma_upper = pop.get_uncertainty_lowerupper('stellar_teff')
sigma_lower, sigma_upper

In [None]:
sigma_lower, sigma_upper = pop.stellar_teff_uncertainty_lowerupper()
sigma_lower, sigma_upper

We can force asymmetric uncertainties to be symmetric, calculated as $\sigma = (\sigma_{lower} + \sigma_{upper})/2$, just by asking for a simple symmetric uncertainty.

In [None]:
sigma = pop.get_uncertainty('stellar_teff')
sigma 

We can also estimate uncertainties on derived quantities in the same way. Behind the scenes, uncertainties on derived quantities are estimated using [`astropy.uncertainty`](https://docs.astropy.org/en/stable/uncertainty/index.html). Samples are created for each ingredient table column using skew-normal distributions for asymmetric uncertainties as advocated by [Pineda et al. (2021)](https://ui.adsabs.harvard.edu/abs/2021ApJ...918...40P/abstract), and estimated errors are based on the central 68% confidence intervals of the calculated distributions. 


In [None]:
pop.get_uncertainty('scale_height')

In [None]:
pop.scale_height_uncertainty()
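
The sampling idea can be sketched with only the Python standard library. This simplified version draws plain Gaussian samples for a single ingredient (not the skew-normal distributions or `astropy.uncertainty` machinery described above), pushes them through a function, and reads uncertainties off the central 68% interval. The helper `propagate` and the example radius of 2.0 ± 0.1 are hypothetical.

```python
import math
import random


def propagate(function, center, sigma, n_samples=20000, seed=0):
    """Estimate (lower, upper) uncertainties on function(x) by sampling x.

    Draws Gaussian samples of x, evaluates the function on each, and
    measures the central 68% interval around the median of the results.
    """
    rng = random.Random(seed)
    samples = sorted(function(rng.gauss(center, sigma)) for _ in range(n_samples))
    lower, median, upper = (
        samples[int(q * (n_samples - 1))] for q in (0.16, 0.50, 0.84)
    )
    return median - lower, upper - median


def surface_area(radius):
    """Surface area of a sphere."""
    return 4 * math.pi * radius**2


# A planet radius of 2.0 +/- 0.1 (in whatever units we like)
sigma_lower, sigma_upper = propagate(surface_area, center=2.0, sigma=0.1)
print(sigma_lower, sigma_upper)
```

For small uncertainties the result should be close to the linear estimate $\sigma_A \approx 8\pi r \sigma_r$, while the sampling approach also captures asymmetries that linear propagation would miss.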

We might commonly be interested in the fractional uncertainty on a quantity. We can either calculate this ourselves, or use the `.get_fractional_uncertainty` wrapper.

In [None]:
pop.get_uncertainty('scale_height')/pop.get('scale_height')

In [None]:
pop.get_fractional_uncertainty('scale_height')

Keyword arguments can be supplied when calculating derived quantities, to be passed into the function that actually does the calculating. 

In [None]:
pop.teq_uncertainty(albedo=0.5)

In [None]:
pop.get_uncertainty('teq', albedo=0.5)

In [None]:
pop.teq_uncertainty_lowerupper(albedo=0.5)

In [None]:
pop.get_uncertainty_lowerupper('teq', albedo=0.5)

## How do we access some sub-population of planets?
Often we'll want to pull out some subset of a population. We might want a smaller sample of planets, or all the planets that meet some particular criterion, or maybe the properties of one individual planet. In our experience with `numpy` arrays or `astropy` tables, we've often done this by indexing (`x[0]` or `x[[0, 1, 5]]`), slicing (`x[3:30]`), or masking (`x[some_array > some_other_array]`). 

We can apply the same methods to a `Population`, creating smaller populations by indexing, slicing, or masking. Anything we can do with a `Population` we can do with one of these sub-`Population`s that we create.

In [None]:
pop

In [None]:
one_planet = pop[0]
one_planet

In [None]:
one_planet.name(), one_planet.radius(), one_planet.insolation()

In [None]:
prime_planets = pop[[2, 3, 5, 7, 11, 13, 17, 19, 23]]
prime_planets

In [None]:
first_ten = pop[:10]
first_ten

In [None]:
every_other_exoplanet = pop[::2]
every_other_exoplanet

In [None]:
import astropy.units as u

small = pop[pop.radius() < 4 * u.Rearth]
small

Additionally, we can extract an individual planet or a list of planets by indexing the population with planet name(s). This is using astropy tables' `.loc` functionality, with `"name"` being used as an index.

In [None]:
cute_planet = pop["GJ 1214b"]
cute_planet

In [None]:
cute_planets = pop[["LHS 1140b", "GJ 1214b", "GJ 436b"]]
cute_planets

Systems of planets can also be extracted via name using the `hostname`.

In [None]:
pop['TRAPPIST-1']

Unfortunately it's not possible (right now) to mix and match selecting objects via both the planet name and the host name at the same time. Sorry!

## How do we add new quantities?
Sometimes we might want to add additional quantities into a population, for filtering or calculating or visualizing.

### `.add_column` for arrays
For static quantities, we could just add a new column to the internal `.standard` astropy Table, but it's a little more graceful to use the `.add_column` wrapper. This both adds the data into `.standard` and registers a new method that can be used to extract it (or calculate its uncertainties).

In [None]:
s = ea.SolarSystem()
new_column_name='is_inhabited'
new_column_data=(s.name() == 'Earth')*1
new_column_uncertainty = (s.name() != 'Earth')*0.01
s.add_column(name=new_column_name, data=new_column_data, uncertainty=new_column_uncertainty)


We can see that our column was added, both to the table and as a callable column method.

In [None]:
s.standard['is_inhabited']

In [None]:
s.standard['is_inhabited_uncertainty']

In [None]:
s.is_inhabited()

In [None]:
s.is_inhabited_uncertainty()

### `.add_calculation` for functions 
For quantities we want to calculate on the fly, we can define a new function and then assign it to a method name inside our population using `.add_calculation`. If we want to be able to propagate uncertainties, we will need to pass the `distribution` keyword into all ingredient quantities that go into the calculation, and set its default for the function overall to `False`. This says that when the method is called normally, it should just return quantity arrays, but when it's called inside an uncertainty propagation calculation, everything will be treated as quantity distributions.

In [41]:
import numpy as np 

def f(self, distribution=False):
    '''
    Surface Area (Rearth**2)
    
    Calculate the surface area of a planet,
    based on its radius.
    '''
    # the surface area of a sphere is 4*pi*r**2
    return 4*np.pi*self.radius(distribution=distribution)**2


pop.add_calculation(name='surface_area', function=f)

In [None]:
pop.surface_area()

In [None]:
pop.surface_area_uncertainty()

## How do we get help on available quantities? 

It's super important to be able to know exactly what the quantity we're retrieving represents. There are a few tools for quick documentation. 

Short descriptions of some common attributes can be printed with the `describe_columns()` function.

In [None]:
ea.describe_columns()

Most quantities will have docstrings associated with them, which you can view either by putting a `?` after the method name or, in some tools like `jupyter lab`, by hovering the cursor over the method name. 

In [None]:
pop.mass?

In [None]:
pop.teq?

## Explore!
That's about it. For more information about different pre-defined populations see [Creating](creating.html), and for more about pre-packaged visualizations see [Visualizing](visualizing.html).