# Atmospheric retrieval with petitRADTRANS

This tutorial covers atmospheric retrievals with [petitRADTRANS](https://petitradtrans.readthedocs.io), using NIR spectra of the directly imaged planet [beta Pic b](https://exoplanets.nasa.gov/exoplanet-catalog/7040/beta-pictoris-b/). Free retrievals are computationally expensive due to the high number of parameter dimensions and the scattering radiative transfer that is important in cloudy atmospheres (see [Mollière et al. 2020](https://ui.adsabs.harvard.edu/abs/2020A%26A...640A.131M/abstract)). Similar to [FitModel](https://species.readthedocs.io/en/latest/species.analysis.html#species.analysis.fit_model.FitModel), the nested sampling supports multiprocessing, so it is recommended to run free retrievals on a cluster. Before starting, ``petitRADTRANS`` should be installed together with the line and continuum opacities (see [installation instructions](https://petitradtrans.readthedocs.io/en/latest/content/installation.html)).

When running a retrieval, it is important to comment out any of the functions that access the [Database](https://species.readthedocs.io/en/latest/species.data.html#species.data.database.Database) because writing to the HDF5 file is not possible with multiprocessing. Therefore, the companion data should first be added to the database, then [AtmosphericRetrieval](https://species.readthedocs.io/en/latest/species.analysis.html#species.analysis.retrieval.AtmosphericRetrieval) and [run_multinest](https://species.readthedocs.io/en/latest/species.analysis.html#species.analysis.retrieval.AtmosphericRetrieval.run_multinest) can be executed, and finally the results can be extracted and plotted while commenting out the retrieval part or using a separate script.
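
The three steps described above (adding the data, running the retrieval, extracting the results) can also be kept in one script with a stage selector instead of commenting code in and out. The following is only a minimal sketch: the stage names and the `select_stage` and `describe` helpers are hypothetical and not part of `species`.

```python
# Hypothetical stage names for the workflow; they are not part of species.
STAGES = ("prepare", "retrieve", "analyze")


def select_stage(argv):
    """Return the stage requested as first command-line argument."""
    stage = argv[1] if len(argv) > 1 else "prepare"
    if stage not in STAGES:
        raise ValueError(f"Unknown stage '{stage}', choose from {STAGES}")
    return stage


def describe(stage):
    """Summarize which part of the tutorial belongs to a stage."""
    return {
        "prepare": "single process: add the companion data to the database",
        "retrieve": "multiprocessing: run the retrieval without writing to the database",
        "analyze": "single process: store and plot the posterior samples",
    }[stage]


# 'python script.py retrieve' would give argv = ['script.py', 'retrieve']
print(describe(select_stage(["script.py", "retrieve"])))
```

In an actual script, the returned stage would decide which of the tutorial cells below is executed, so that only the retrieval stage is launched with multiple processes.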

Once `species` and `petitRADTRANS` are fully installed, we can get started with the retrieval!

## Getting started

We start by adding the library path of ``MultiNest`` to the ``DYLD_LIBRARY_PATH`` environment variable such that ``PyMultiNest`` can find the compiled library (see [installation instructions](https://johannesbuchner.github.io/PyMultiNest/install.html)).

In [1]:
import os
os.environ['DYLD_LIBRARY_PATH'] = '/Users/tomasstolker/applications/MultiNest/lib'
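
Note that ``DYLD_LIBRARY_PATH`` is specific to macOS; on Linux the equivalent variable is ``LD_LIBRARY_PATH``. A minimal sketch of a platform-independent variant is shown below. The helper names and the folder `/path/to/MultiNest/lib` are placeholders introduced for illustration.

```python
import os
import platform


def library_path_variable(system=None):
    """Return the name of the dynamic-library search path variable."""
    system = platform.system() if system is None else system
    return "DYLD_LIBRARY_PATH" if system == "Darwin" else "LD_LIBRARY_PATH"


def prepend_library_path(folder, system=None, environ=os.environ):
    """Prepend a folder to the library search path, keeping existing entries."""
    name = library_path_variable(system)
    current = environ.get(name, "")
    environ[name] = folder if not current else folder + os.pathsep + current
    return name


# Placeholder folder; replace it with the location of the MultiNest library.
# A separate dictionary is used here so the real environment is not modified.
example_env = {}
prepend_library_path("/path/to/MultiNest/lib", system="Linux", environ=example_env)
print(example_env)
```

Calling `prepend_library_path` without the `environ` argument modifies `os.environ` of the running process, in the same way as the cell above.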

Next, we import the `species` toolkit and initiate the workflow with an instance of the [SpeciesInit](https://species.readthedocs.io/en/latest/species.core.html#species.core.init.SpeciesInit) class. This will create the HDF5 database and the [configuration file](https://species.readthedocs.io/en/latest/configuration.html) in the working folder. If one of these files was already present, then the existing database and/or configuration will be used.

In [2]:
import species
species.SpeciesInit()

Initiating species v0.4.0... [DONE]
Creating species_config.ini... [DONE]
Database: /Users/tomasstolker/applications/species/docs/tutorials/species_database.hdf5
Data folder: /Users/tomasstolker/applications/species/docs/tutorials/data
Working folder: /Users/tomasstolker/applications/species/docs/tutorials
Creating species_database.hdf5... [DONE]
Creating data folder... [DONE]


<species.core.init.SpeciesInit at 0x10ff68850>

## Adding observational data to the database

The database is used by ``species`` as the central storage for models, data, and results. The data in the HDF5 file is accessed through the [Database](https://species.readthedocs.io/en/latest/species.data.html#species.data.database.Database) class. As mentioned already, it is best to add the data beforehand and comment out the database part when executing the retrieval part on a cluster with multi-core processing. To access the HDF5 file, we start by creating an instance of [Database](https://species.readthedocs.io/en/latest/species.data.html#species.data.database.Database).

In [3]:
database = species.Database()

Next, we simply use the [add_companion](https://species.readthedocs.io/en/latest/species.data.html#species.data.database.add_companion) method to automatically add some of the [available spectra, magnitudes, and distance](https://github.com/tomasstolker/species/blob/master/species/data/companions.py) of beta Pic b to the `Database`. This includes the GRAVITY $K$ band spectrum from Gravity Collaboration et al. 2020 and the GPI $YJH$ band spectrum from [Chilcote et al. 2017](https://ui.adsabs.harvard.edu/abs/2017AJ....153..182C/abstract). The magnitudes will be calibrated into fluxes with the filter profiles and a flux-calibrated spectrum of Vega.

In [4]:
database.add_companion('beta Pic b')

Getting GPI_YJHK spectrum of beta Pic b... [DONE]
IMPORTANT: Please cite Chilcote et al. 2017, AJ, 153, 182
           when making use of this spectrum in a publication
Getting GRAVITY spectrum of beta Pic b... [DONE]
IMPORTANT: Please cite Gravity Collaboration et al. 2020, A&A, 633, 110
           when making use of this spectrum in a publication
Downloading Vega spectrum (270 kB)... [DONE]
Adding Vega spectrum... [DONE]
Adding filter: Magellan/VisAO.Ys... [DONE]
Adding filter: Paranal/NACO.J... [DONE]
Adding filter: Gemini/NICI.ED286... [DONE]
Adding filter: Paranal/NACO.H... [DONE]
Adding filter: Paranal/NACO.Ks... [DONE]
Adding filter: Paranal/NACO.NB374... [DONE]
Adding filter: Paranal/NACO.Lp... [DONE]
Adding filter: Paranal/NACO.NB405... [DONE]
Adding filter: Paranal/NACO.Mp... [DONE]
Adding object: beta Pic b
   - Distance (pc) = 19.75 +/- 0.13
   - Magellan/VisAO.Ys:
      - Apparent magnitude = 15.53 +/- 0.34
      - Flux (W m-2 um-1) = 4.25e-15 +/- 1.35e-15
   - Paranal/NAC



      - Database tag: GPI_YJHK
      - Filename: ./data/companion_data/betapicb_gpi_yjhk.dat
      - Data shape: (152, 3)
      - Wavelength range (um): 0.98 - 2.37
      - Mean flux (W m-2 um-1): 5.40e-15
      - Mean error (W m-2 um-1): 6.44e-16
   - GRAVITY spectrum:
      - Object: Unknown
      - Database tag: GRAVITY
      - Filename: ./data/companion_data/BetaPictorisb_2018-09-22.fits
      - Data shape: (237, 3)
      - Wavelength range (um): 1.97 - 2.49
      - Mean flux (W m-2 um-1): 4.65e-15
      - Mean error (W m-2 um-1): 1.00e-16
   - GRAVITY covariance matrix:
      - Object: Unknown
      - Database tag: GRAVITY
      - Filename: ./data/companion_data/BetaPictorisb_2018-09-22.fits
      - Data shape: (237, 237)
   - Spectral resolution:
      - GPI_YJHK: 40.0
      - GRAVITY: 500.0
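
To illustrate the conversion from magnitudes to fluxes, here is a minimal sketch of the standard relation $F = F_\mathrm{Vega} \, 10^{-0.4 (m - m_\mathrm{Vega})}$. The function names and the default Vega magnitude of 0.03 are assumptions for this example, and `species` may implement the calibration differently. For the uncertainty, the sketch takes the mean of the upper and lower flux error, which gives a value consistent with the Magellan/VisAO.Ys entry printed above.

```python
import math


def mag_to_flux(mag, zero_point_flux, vega_mag=0.03):
    """Convert an apparent magnitude into a flux, given the Vega flux in the filter."""
    return zero_point_flux * 10.0 ** (-0.4 * (mag - vega_mag))


def flux_error(flux, mag_error):
    """Mean of the upper and lower flux uncertainty for a magnitude error."""
    upper = flux * (10.0 ** (0.4 * mag_error) - 1.0)
    lower = flux * (1.0 - 10.0 ** (-0.4 * mag_error))
    return 0.5 * (upper + lower)


# Flux and magnitude error of the Magellan/VisAO.Ys entry in the output above
print(f"{flux_error(4.25e-15, 0.34):.2e}")  # 1.35e-15
```

The zero-point flux of a filter follows from integrating the Vega spectrum over the filter profile, which is why both are downloaded by `add_companion`.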


Instead of using [add_companion](https://species.readthedocs.io/en/latest/species.data.html#species.data.database.add_companion), it is also possible to manually add the data of an object with the [add_object](https://species.readthedocs.io/en/latest/species.data.html#species.data.database.add_object) method. See [this tutorial](https://species.readthedocs.io/en/latest/tutorials/fitting_model_spectra.html) on fitting data with a grid of model spectra for an example.

## Atmospheric retrieval of abundances, clouds, and P-T structure

We are now ready to start the retrieval! We start by creating an instance of [AtmosphericRetrieval](https://species.readthedocs.io/en/latest/species.analysis.html#species.analysis.retrieval.AtmosphericRetrieval). Here we provide, among other things, the database tag of the planet data, the line and cloud species that should be included in the forward model, and whether scattering should be turned on in the radiative transfer. Scattering makes the calculation of the forward model much slower but is important for cloudy atmospheres.

In [5]:
retrieve = species.AtmosphericRetrieval(object_name='beta Pic b',
                                        line_species=['CO_all_iso_HITEMP', 'H2O_HITEMP', 'CH4', 'NH3', 'CO2', 'H2S', 'Na_allard', 'K_allard', 'PH3', 'VO_Plez', 'TiO_all_Exomol', 'FeH'],
                                        cloud_species=['MgSiO3(c)_cd', 'Fe(c)_cd'],
                                        scattering=True,
                                        output_folder='multinest',
                                        wavel_range=(1.1, 2.46),
                                        inc_spec=['GPI-Y', 'GPI-J', 'GPI-H', 'GRAVITY'],
                                        inc_phot=False,
                                        pressure_grid='smaller',
                                        weights=None)

Object: beta Pic b
Distance: 19.75
Line species:
   - CO_all_iso_HITEMP
   - H2O_HITEMP
   - CH4
   - NH3
   - CO2
   - H2S
   - Na_allard
   - K_allard
   - PH3
   - VO_Plez
   - TiO_all_Exomol
   - FeH
Cloud species:
   - MgSiO3(c)_cd
   - Fe(c)_cd
Line-by-line species: None
Scattering: True
Getting object: beta Pic b... [DONE]
Photometric data:
Spectroscopic data:
   - GRAVITY
     Wavelength range (um) = 1.97 - 2.49
     Spectral resolution = 500.00
Initiating 180 pressure levels (bar): 1.00e-06 - 1.00e+03
Weights for the log-likelihood function:
   - GPI-Y = 1.00e+00
   - GPI-J = 1.00e+00
   - GPI-H = 1.00e+00
   - GRAVITY = 1.00e+00


Next, we execute the actual retrieval with [run_multinest](https://species.readthedocs.io/en/latest/species.analysis.html#species.analysis.retrieval.AtmosphericRetrieval.run_multinest). The nested sampling algorithm supports multiprocessing, so make sure to use MPI when running on a cluster. For testing purposes this is not required, since the retrieval also runs without MPI.

The `run_multinest` method has some parameters that are used by the forward model (e.g. specifying the type of chemistry with `chemistry` and the P-T parametrization with `pt_profile`). The `bounds` argument requires a dictionary that sets the boundaries of the priors, which are either uniform or log-uniform. The latter applies to parameters whose names typically start with `log_`. An additional parameter, `prior`, can be used to apply a Gaussian prior to any of the parameters (including the planet mass).
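
As an illustration of how such priors are commonly handled in nested sampling, the sketch below maps a unit-cube value onto a uniform range and evaluates a Gaussian prior as a log-density term. This is a generic sketch of the technique and not the implementation in `species`; the function names are hypothetical, while the numbers are the `logg` and `log_kzz` bounds and the mass prior used in this tutorial.

```python
import math


def uniform_prior(cube, lower, upper):
    """Map a unit-cube value in [0, 1] onto a uniform prior range."""
    return lower + (upper - lower) * cube


def gaussian_ln_prior(value, mean, sigma):
    """Log-density of a Gaussian prior, up to an additive constant."""
    return -0.5 * ((value - mean) / sigma) ** 2


# Uniform prior on logg between 2 and 6
logg = uniform_prior(0.5, 2.0, 6.0)

# A uniform prior on log_kzz is a log-uniform prior on Kzz itself
kzz = 10.0 ** uniform_prior(0.5, 1.0, 15.0)

# Gaussian prior on the mass (mean 9, standard deviation 1.6), evaluated at 10.6
ln_prior = gaussian_ln_prior(10.6, 9.0, 1.6)

print(logg, kzz, ln_prior)
```

A sample that lies one standard deviation from the prior mean is penalized by 0.5 in log-probability, so the Gaussian prior gently steers the retrieval without imposing hard boundaries.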

Not all the parameters that are included in `bounds` are mandatory so some will only be included in the model if they are provided in the dictionary. Furthermore, there are various parametrizations for the clouds available. The model that is used is also determined by the parameters that are included. In fact, in the example below, only the surface gravity (`logg`) and radius (`radius`) are mandatory parameters.

In this case, the C/O ratio and metallicity are free parameters because we are using a chemical equilibrium model. We also retrieve the cloud mass fractions relative to the equilibrium abundances (`mgsio3_fraction` and `fe_fraction`), the sedimentation parameter (`fsed`; determines the vertical extent of the clouds), the eddy diffusion coefficient (`log_kzz`; determines the particle sizes of the clouds), and the width of the log-normal size distribution (`sigma_lnorm`) for the cloud particles. Finally, we fit a scaling (`GPI_scaling`) to the GPI fluxes to account for a systematic difference with the GRAVITY spectrum and use a Gaussian process to model the covariances in the GPI spectrum (see `fit_corr` parameter).

In [None]:
retrieve.run_multinest(bounds={'logg': (2., 6.),
                               'c_o_ratio': (0.1, 1.5),
                               'metallicity': (-3., 3.),
                               'radius': (0.1, 5.),
                               'fsed': (0., 10.),
                               'log_kzz': (1., 15.),
                               'sigma_lnorm': (1.05, 5.),
                               'mgsio3_fraction': (-3., 1.),
                               'fe_fraction': (-3., 1.),
                               'GPI': ((0.8, 1.2), None, None)},
                       chemistry='equilibrium',
                       quenching='pressure',
                       pt_profile='molliere',
                       fit_corr=['GPI'],
                       n_live_points=1000,
                       resume=True,
                       plotting=False,
                       pt_smooth=None,
                       check_flux=None,
                       temp_nodes=None,
                       prior={'mass': (9., 1.6)})
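
To give an idea of what modeling the covariances with a Gaussian process involves, the sketch below builds a covariance matrix from a squared-exponential kernel plus an uncorrelated term and evaluates the corresponding Gaussian log-likelihood. This is one common form and only an assumption about the approach; the names `gp_covariance`, `ln_likelihood`, `corr_len`, and `corr_amp` are hypothetical, and the data are synthetic.

```python
import numpy as np


def gp_covariance(wavelength, sigma, corr_len, corr_amp):
    """Covariance from a squared-exponential kernel plus an uncorrelated term."""
    delta = wavelength[:, None] - wavelength[None, :]
    kernel = np.exp(-delta**2 / (2.0 * corr_len**2))
    correlated = corr_amp * np.outer(sigma, sigma) * kernel
    uncorrelated = (1.0 - corr_amp) * np.diag(sigma**2)
    return correlated + uncorrelated


def ln_likelihood(data, model, cov):
    """Gaussian log-likelihood of the residuals for a given covariance matrix."""
    resid = data - model
    chi2 = resid @ np.linalg.solve(cov, resid)
    _, ln_det = np.linalg.slogdet(2.0 * np.pi * cov)
    return -0.5 * (chi2 + ln_det)


# Synthetic example: five wavelengths (um) with equal uncertainties
wavelength = np.linspace(1.1, 1.8, 5)
sigma = np.full(5, 0.1)
data = np.array([1.0, 1.1, 0.9, 1.05, 0.95])
model = np.ones(5)

cov = gp_covariance(wavelength, sigma, corr_len=0.2, corr_amp=0.5)
print(ln_likelihood(data, model, cov))
```

With `corr_amp=0` the matrix reduces to the usual diagonal of variances, so the correlated case is a direct generalization of an uncorrelated chi-square fit.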

We will not run the actual retrieval since it will take too long. Instead, we will download the results as they would be stored in the `output_folder`.

In [None]:
import urllib.request
urllib.request.urlretrieve('https://home.strw.leidenuniv.nl/~stolker/species/retrieval.tgz',
                           'retrieval.tgz')

And we unpack this compressed TAR archive that includes the output from `MultiNest`.

In [None]:
import tarfile
with tarfile.open('retrieval.tgz') as tar:
    tar.extractall('./')

## Nested sampling output folder

The output data from the nested sampling with ``MultiNest`` is stored in the ``output_folder``. We can use the [add_retrieval](https://species.readthedocs.io/en/latest/species.data.html#species.data.database.Database.add_retrieval) method of [Database](https://species.readthedocs.io/en/latest/species.data.html#species.data.database.Database) to store the posterior samples and relevant attributes in the HDF5 database. The argument of ``tag`` is used as name tag in the database. It is also possible to calculate $T_\mathrm{eff}$ for each sample but this takes a long time because each spectrum needs to be calculated over a broad wavelength range.

In [None]:
database.add_retrieval(tag='betapicb',
                       output_folder='multinest',
                       inc_teff=False)

Instead, we use the [get_retrieval_teff](https://species.readthedocs.io/en/latest/species.data.html#species.data.database.Database.get_retrieval_teff) method to estimate $T_\mathrm{eff}$ from a small number of samples. The value is stored in the database as an attribute of the ``tag`` group.

In [None]:
database.get_retrieval_teff(tag='betapicb',
                            random=30)
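
The reason a broad wavelength range is needed is that $T_\mathrm{eff}$ follows from the bolometric flux at the planet surface through the Stefan-Boltzmann law, $T_\mathrm{eff} = (F_\mathrm{bol}/\sigma)^{1/4}$. The sketch below illustrates that relation with a blackbody as a stand-in for a model spectrum. It is a generic example and not the code path of `species`; the function names are hypothetical.

```python
import numpy as np

SIGMA_SB = 5.670374419e-8  # Stefan-Boltzmann constant (W m-2 K-4)
H_PLANCK = 6.62607015e-34  # Planck constant (J s)
C_LIGHT = 2.99792458e8  # speed of light (m s-1)
K_BOLTZ = 1.380649e-23  # Boltzmann constant (J K-1)


def effective_temperature(wavelength, surface_flux):
    """T_eff from a surface flux density (W m-2 m-1) on a wavelength grid (m)."""
    widths = np.diff(wavelength)
    # Trapezoidal integration of the flux density over wavelength
    f_bol = np.sum(0.5 * (surface_flux[1:] + surface_flux[:-1]) * widths)
    return (f_bol / SIGMA_SB) ** 0.25


def blackbody_flux(wavelength, temperature):
    """Surface flux density of a blackbody, pi * B_lambda (W m-2 m-1)."""
    exponent = H_PLANCK * C_LIGHT / (wavelength * K_BOLTZ * temperature)
    return 2.0 * np.pi * H_PLANCK * C_LIGHT**2 / (wavelength**5 * np.expm1(exponent))


# Wavelengths from 0.1 um to 1 mm, expressed in meters
wavelength = np.logspace(-7, -3, 5000)
print(effective_temperature(wavelength, blackbody_flux(wavelength, 1700.0)))
```

A flux observed at distance $d$ would first need to be scaled to the surface of a planet with radius $R$ by multiplying with $(d/R)^2$.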

## Plotting the posterior distributions

We can now read the posterior samples from the database and use the plot functionalities to visualize the results. Let's first plot the marginalized posterior distributions by using the [corner.py](https://corner.readthedocs.io) package. The plot is created with the [plot_posterior](https://species.readthedocs.io/en/latest/species.plot.html#species.plot.plot_mcmc.plot_posterior) function. Since there are many free parameters, we will leave out those for the P-T profile by setting `inc_pt_param=False`.

In [None]:
species.plot_posterior(tag='betapicb',
                       offset=(-0.3, -0.35),
                       vmr=False,
                       inc_mass=False,
                       inc_pt_param=False,
                       output='posterior.png')

Let's have a look at the corner plot!

In [None]:
from IPython.display import Image
Image('posterior.png')

## ReadRadtrans and random spectra

In order to post-process the posterior samples, we need to recreate the ``Radtrans`` object of ``petitRADTRANS``. We will use the [get_retrieval_spectra](https://species.readthedocs.io/en/latest/species.data.html#species.data.database.Database.get_retrieval_spectra) method of [Database](https://species.readthedocs.io/en/latest/species.data.html#species.data.database.Database) to create an instance of [ReadRadtrans](https://species.readthedocs.io/en/latest/species.read.html#species.read.read_radtrans.ReadRadtrans) with the adopted parameters from the retrieval. The ``Radtrans`` object is stored as an attribute of ``ReadRadtrans`` and will be used by ``species`` but can typically be ignored by the user. The method also returns a list of random spectra (30 in the example below) that have been recalculated at a resolving power of $R = 500$. Each of the model spectra is stored in a [ModelBox](https://species.readthedocs.io/en/latest/species.core.html#species.core.box.ModelBox).

In [None]:
samples, radtrans = database.get_retrieval_spectra(tag='betapicb',
                                                   random=30,
                                                   wavel_range=(0.5, 6.),
                                                   spec_res=500.)
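
A resolving power of $R = \lambda / \Delta\lambda = 500$ implies a resolution element that grows with wavelength, so a matching wavelength grid is evenly spaced in $\ln\lambda$. The following is a minimal sketch of such a grid for the wavelength range used above. The helper `constant_r_grid` and the choice of two points per resolution element are assumptions for illustration, not the binning that `species` applies.

```python
import numpy as np


def constant_r_grid(wavel_min, wavel_max, spec_res, sampling=2.0):
    """Wavelength grid with a fixed number of points per resolution element."""
    step = 1.0 / (spec_res * sampling)  # spacing in ln(wavelength)
    n_points = int(np.ceil(np.log(wavel_max / wavel_min) / step)) + 1
    return wavel_min * np.exp(step * np.arange(n_points))


grid = constant_r_grid(0.5, 6.0, 500.0)

# Resolving power per grid step, which equals spec_res * sampling
print(grid.size, np.mean(grid[:-1] / np.diff(grid)))
```

Because the spacing is set relative to the wavelength, the grid is denser at 0.5 um than at 6 um while the number of points per resolution element stays constant.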

## Create plots of P-T profiles, opacities, and clouds

In [None]:
species.plot_pt_profile(tag='betapicb',
                        random=100,
                        xlim=(0., 6000.),
                        offset=(-0.07, -0.14),
                        output='pt_profile.png',
                        radtrans=radtrans,
                        extra_axis='grains')

species.plot_opacities(tag='betapicb',
                       offset=(-0.1, -0.14),
                       output='opacities.png',
                       radtrans=radtrans)

species.plot_clouds(tag='betapicb',
                    offset=(-0.1, -0.15),
                    output='clouds.png',
                    radtrans=radtrans,
                    composition='MgSiO3')

## Read companion data, best-fit sample, and fit residuals

In [None]:
best = database.get_probable_sample(tag='betapicb')

objectbox = database.get_object('beta Pic b',
                                inc_phot=False)

objectbox = species.update_spectra(objectbox, best)

residuals = species.get_residuals(datatype='model',
                                  spectrum='petitradtrans',
                                  parameters=best,
                                  objectbox=objectbox,
                                  inc_phot=False,
                                  inc_spec=True,
                                  radtrans=radtrans)

modelbox = radtrans.get_model(model_param=best,
                              spec_res=500.,
                              plot_contribution='contribution.png')

no_clouds = best.copy()
no_clouds['mgsio3_fraction'] = -100.
no_clouds['fe_fraction'] = -100.
model_no_clouds = radtrans.get_model(no_clouds)
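
Fit residuals are typically expressed relative to the data uncertainties, which is presumably why the residual axis in the plot below is limited to a range of -5 to 5. The sketch below shows this normalization on synthetic numbers. It is an assumption about the convention rather than the implementation of `get_residuals`, and the function names are hypothetical.

```python
import numpy as np


def normalized_residuals(data_flux, data_error, model_flux):
    """Difference between data and model in units of the data uncertainty."""
    return (data_flux - model_flux) / data_error


def reduced_chi_square(residuals, n_param):
    """Chi-square per degree of freedom for normalized residuals."""
    return np.sum(residuals**2) / (residuals.size - n_param)


# Synthetic fluxes (W m-2 um-1), not taken from the beta Pic b data
data_flux = np.array([5.0e-15, 5.4e-15, 4.6e-15, 5.2e-15])
data_error = np.full(4, 2.0e-16)
model_flux = np.full(4, 5.0e-15)

resid = normalized_residuals(data_flux, data_error, model_flux)
print(resid, reduced_chi_square(resid, n_param=1))
```

This simple form ignores correlations between wavelengths, so for a spectrum with a covariance matrix it only gives a first impression of the fit quality.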

## Plot the SED with data and models

In [None]:
species.plot_spectrum(boxes=[samples, modelbox, model_no_clouds, objectbox],
                      filters=None,
                      plot_kwargs=[{'ls': '-', 'lw': 0.1, 'color': 'gray'},
                                   {'ls': '-', 'lw': 0.5, 'color': 'black'},
                                   {'ls': '--', 'lw': 0.3, 'color': 'black'},
                                   {'GPI': {'marker': 'o', 'ms': 2., 'color': 'tab:green', 'ls': 'none', 'alpha': 0.2, 'mew': 0., 'label': 'GPI'},
                                    'GRAVITY': {'marker': 'o', 'ms': 2., 'color': 'tab:blue', 'ls': 'none', 'alpha': 0.2, 'mew': 0., 'label': 'GRAVITY'}}],
                      residuals=residuals,
                      xlim=(1.1, 2.5),
                      ylim=(0.15e-16, 1.15e-15),
                      ylim_res=(-5., 5.),
                      scale=('linear', 'linear'),
                      offset=(-0.6, -0.05),
                      figsize=(8, 2.5),
                      legend=[{'loc': 'upper right', 'fontsize': 8.}, {'loc': 'lower left', 'fontsize': 8.}],
                      output='spectrum.png')