# Curating

Sometimes you might have good reason to replace some of the values in a `Population` with better ones. Maybe you prefer one reference over another, maybe you have some unpublished measurements you want to include, or maybe you just want to experiment with changing some values. This page mostly discusses curating the data inside an `Exoplanets` population, but some of the methods may apply to other populations too.

In [None]:
import exoatlas as ea
ea.version()

### Changes are Temporary 

Please note that changes made with `update_reference` or `update_values` affect only the population in the current Python session. The underlying standardized data file is unchanged. The end of this page shows how to save and load a curated population.

### Using Different References in an `Exoplanets` Population

The main data in the `Exoplanets` population come from the NASA Exoplanet Archive Planetary Systems Composite Parameters table. There is one entry in this population for each planet in the archive.

In [None]:
e = ea.Exoplanets()
e

If we want to see every individual reference for each planet, we'll need to load the `.individual_references` population. This is a much larger table, with many more rows than there are planets, so it might take a while.

In [None]:
e.load_individual_references()

Once that's loaded, there's a secret sneaky internal population that we can access through the `.individual_references` attribute, containing every reference for every planet in the archive.

In [None]:
e.individual_references

Now, let's say we want to update what reference is being used to provide the `period` (and related) values for a particular planet. First, we can check what the options are with `.display_individual_references`. 

In [None]:
e.display_individual_references(planets='HD189733b', keys='period')

Then, we can update the population to use one of those options instead of the default.

In [6]:
e.update_reference(planets='HD189733b', references='Ivshina + Winn 2022')

Finally, we can confirm that our change took effect, by checking the references again.

In [None]:
e.display_individual_references(planets='HD189733b', keys='period')

### Updating Values in an `Exoplanets` Population

If we have some custom values we'd like to apply to a planet, we can update its data with the `.update_values` method.

In [None]:
import astropy.units as u 
e.update_values(planets='HD189733b', radius=1*u.R_jupiter, radius_uncertainty=0.1*u.R_jupiter)

In [None]:
e['HD189733'].radius()

### Curating Your Own Exoplanet Population

You might want to be able to regularly curate many changes to the default exoplanet parameters from the NASA Exoplanet Archive, and also be able to reapply those changes even when downloading newly updated archive data. In general, that might look like the following:

- Write a function called something like `curate_population()` that makes the changes you want by calling `.update_reference` and `.update_values` one after another. You might consider saving this function in a local module and importing it whenever you need it.
- Generate an `Exoplanets` population (or some subset of it), and apply your function to it. This function will change the population in-place; if you want to access the unmodified parameters, you'll need to create a new population. 
- If you want to save and reuse your curation, use the `population_to_save.save(filename)` method to save your curated population out to a local file and `loaded_population = Population(filename)` to load it back in. 

In [10]:
import exoatlas as ea
import astropy.units as u  # needed for the unit-bearing values below

In [11]:
def curate_population(x):
    '''
    Curate the values in an exoplanet population by 
    making small changes to the values being used.

    Parameters 
    ---------- 
    x : exoatlas.populations.Exoplanets
        An exoplanet population that needs to be curated. 

    Returns
    -------
    Nothing, but the population `x` has been modified 
    in place with updated values and/or reference choices. 
    '''

    # (down)load the individual references
    x.load_individual_references()

    # update the reference for a planet's period/epoch
    x.update_reference(planets='HD189733b', references='Ivshina + Winn 2022')

    # update another value 
    x.update_values(planets='HD189733b', radius=1*u.R_jupiter, radius_uncertainty=0.1*u.R_jupiter)  

In [None]:
e = ea.TransitingExoplanets()
e = e[e.distance() < 50*u.pc]
curate_population(e)

In [None]:
e.save('curated-population.ecsv')

In [None]:
curated_and_saved = ea.Population('curated-population.ecsv')
curated_and_saved['HD189733'].radius()
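
One way to keep many changes organized is to store them as data and apply them in a loop. The sketch below is an illustration, not part of `exoatlas`: `CURATION_STEPS`, `ALLOWED_METHODS`, `apply_curation`, and the `RecordingPopulation` stand-in are all hypothetical names. The only assumption about a real population is that it accepts the `update_reference` and `update_values` keyword calls used on this page. The stand-in just records calls, so the example runs without downloading anything, and the radius is a plain number here where a real population would want an `astropy` quantity such as `1*u.R_jupiter`.

```python
# Hypothetical sketch: none of these names are part of exoatlas.

# Each step names one of the two update methods and the keyword
# arguments to pass to it, mirroring the calls used on this page.
# (Plain numbers stand in for astropy quantities in this demo.)
CURATION_STEPS = [
    ("update_reference",
     dict(planets="HD189733b", references="Ivshina + Winn 2022")),
    ("update_values",
     dict(planets="HD189733b", radius=1.0, radius_uncertainty=0.1)),
]

ALLOWED_METHODS = {"update_reference", "update_values"}


def apply_curation(population, steps):
    """Apply each (method name, keyword arguments) step in order, in place."""
    for method_name, kwargs in steps:
        if method_name not in ALLOWED_METHODS:
            raise ValueError(f"unexpected curation method: {method_name}")
        getattr(population, method_name)(**kwargs)
    return len(steps)


class RecordingPopulation:
    """Stand-in that records update calls instead of changing real data."""

    def __init__(self):
        self.calls = []

    def update_reference(self, **kwargs):
        self.calls.append(("update_reference", kwargs))

    def update_values(self, **kwargs):
        self.calls.append(("update_values", kwargs))


demo = RecordingPopulation()
n_applied = apply_curation(demo, CURATION_STEPS)
```

With a real population, the equivalent call would presumably be `apply_curation(e, CURATION_STEPS)` after `e.load_individual_references()`. Because the steps are plain data, they can be reapplied to a freshly created population whenever the archive data are updated.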

## 🔔 Problems: 🔔

*Please note there are still a couple of problems that need solving:*

- A curated population that has been saved out to a file and then reloaded via `Population(filename)` will (currently) always load in as a plain `Population` object. It therefore won't have any of the special powers of the specialized predefined populations like `Exoplanets`.
- Sometimes (?) trying to do curation with `update_reference` gobbles up bonkers amounts of memory, presumably something to do with indexing large tables many times. This might be solvable with more careful memory management or index-resetting. Curating mostly works, but sometimes it feels like there's a monster lurking underneath.

If any of these problems are catastrophic for you, please open an issue and we'll try to fix it!
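
If you do run into the memory problem and want to report it, a rough measurement can make the issue easier to diagnose. The sketch below uses Python's standard `tracemalloc` module. `peak_memory_during` and `allocate_list` are hypothetical helpers written for this example, and the workload is a stand-in rather than a real curation call. `tracemalloc` only sees allocations it is able to trace, so treat the number as a rough lower bound.

```python
import tracemalloc


def peak_memory_during(function, *args, **kwargs):
    """Call function(*args, **kwargs); return (result, peak traced bytes)."""
    tracemalloc.start()
    try:
        result = function(*args, **kwargs)
        # get_traced_memory() returns (current, peak) sizes in bytes
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def allocate_list(n):
    """Hypothetical stand-in workload that builds a list of n integers."""
    return list(range(n))


result, peak_bytes = peak_memory_during(allocate_list, 100_000)
print(f"peak traced memory: {peak_bytes / 1e6:.1f} MB")
```

In a real session, something like `peak_memory_during(e.update_reference, planets='HD189733b', references='Ivshina + Winn 2022')` would wrap the call used earlier on this page.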