# Reading/Writing a 🌈

This page describes how to read and/or write `Rainbow` objects, using a variety of format definitions that have been included with the main `chromatic` package.

In [1]:
from chromatic import read_rainbow, version

In [None]:
version()

## Quickstart

If you have a file that you think contains flux as a function of wavelength and time ("time-series spectra" or "multiwavelength light curves" or some such), start with the default `read_rainbow` function. It will try to guess the file format from the filename.

In [None]:
rainbow = read_rainbow(
    "example-datasets/stsci/jw02734002001_04101_00001-seg00*_nis_x1dints.fits"
)

Then, to save a file, try just using the `.save()` method. Again, it will try to guess the file format from the filename.


In [4]:
rainbow.save("example-datasets/chromatic/ero-transit-wasp-96b.rainbow.npy")

The sections below provide more details on some of the available file formats for reading and writing files, but the basic process is what you've already seen: use `read_rainbow()` and `.save()` to load and save spectroscopic light curve data with a variety of formats!

## Reading Files

`chromatic` can load data from a variety of different file formats. Whether these are time-series spectra or binned spectroscopic light curves, there's a good chance that the `read_rainbow` function might be able to load them into a 🌈. By writing custom readers for different data formats, we hope to make it easier to use `chromatic` to compare the results of different analyses.

**Download Example Inputs:** If you want to test out any of these readers, you'll need data files in each format to test on. You can download *some* example datasets from [this link](https://www.dropbox.com/s/es5drnp6ufkz8wv/example-datasets.zip?dl=0). Simply extract that `.zip` file into the directory from which you'll be running this notebook. Another source of files you might want to try reading would be the simulated data generated for the [ers-transit Spring 2022 Data Challenge](https://ers-transit.github.io/data-challenge-with-simulated-data.html#simulated-data).

### `chromatic` rainbow files (`*.rainbow.npy`)

The `chromatic` toolkit saves files in its own default format, which can then be shared and loaded back in. These files directly encode the core dictionaries in binary files, so they load and save quickly. They have the extension `.rainbow.npy` and can be written from any `Rainbow` object. 

In [5]:
r = read_rainbow("example-datasets/chromatic/test.rainbow.npy")

The `Rainbow` reader will try to guess the format of the file from the filepath. If that doesn't work for some reason, you can pass the keyword `format='rainbow_npy'` to require the `from_rainbow_npy` reader needed for these files.
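
To make the idea of guessing a format from a filename concrete, here is a minimal sketch using only the standard library. It is not `chromatic`'s actual implementation: the `guess_format` function, the `PATTERNS` list, and the matching order are hypothetical. Only the filename patterns and `format=` names are taken from this page.

```python
from fnmatch import fnmatch
from pathlib import Path

# (filename pattern, format name) pairs copied from the headings and
# keywords on this page. The ordering and matching logic are assumptions
# made for illustration, not chromatic's own code.
PATTERNS = [
    ("*.rainbow.npy", "rainbow_npy"),
    ("*.rainbow.fits", "rainbow_FITS"),
    ("*x1dints.fits", "x1dints"),
    ("S3*SpecData.h5", "eureka_s3"),
    ("S4*LCData.h5", "eureka_s4"),
    ("*S5_*_Table_Save_*.txt", "eureka_s5"),
    ("*.txt", "text"),  # generic patterns go last so specific ones win
    ("*.csv", "text"),
]


def guess_format(filepath):
    """Return the first format whose pattern matches the filename, else None."""
    name = Path(filepath).name
    for pattern, format_name in PATTERNS:
        if fnmatch(name, pattern):
            return format_name
    return None


for example in [
    "example-datasets/chromatic/test.rainbow.npy",
    "example-datasets/eureka/S3_example_SpecData.h5",
    "mystery-file.dat",
]:
    print(example, "->", guess_format(example))
```

When a guess like this returns nothing useful (as for `mystery-file.dat`), the explicit `format=` keyword is the way to tell the reader what to expect.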

### `chromatic` rainbow FITS files (`*.rainbow.fits`)

Because you might want to share a `Rainbow` object with someone not using Python, we define a FITS-based file format. The [Flexible Image Transport System](https://docs.astropy.org/en/stable/io/fits/index.html) is common in astronomy, so there's a good chance someone will be able to load this file into whatever coding language they're using. These files have the extension `.rainbow.fits`, and they will load a tiny bit more slowly than `.rainbow.npy` files; they can be written from any `Rainbow` object. 

In [6]:
r = read_rainbow("example-datasets/chromatic/test.rainbow.fits")

The `Rainbow` reader will try to guess the format of the file from the filepath. If that doesn't work for some reason, you can pass the keyword `format='rainbow_FITS'` to require the `from_rainbow_FITS` reader needed for these files.

### generic text files (`*.txt`, `*.csv`)

Text files are slower to read or write, but everyone can make them. This reader will try to load one giant text file in which light curves for all wavelengths are stacked on top of each other or spectra for all times are stacked on top of each other. The text file should at least have columns that look like:
- `wavelength` for wavelength in microns
- `time` for time in days (preferably BJD$_{\rm TDB}$)
- `flux` for flux in any units
- `uncertainty` for flux uncertainties in the same units as `flux`

Additional columns will also be read, and they will be stored in the `.fluxlike` core dictionary.

In [None]:
r = read_rainbow("example-datasets/chromatic/test.rainbow.txt")

If the file-format guess fails, you can feed in the keyword `format='text'` to tell the reader to expect one of these files.
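
As a rough illustration of the column layout described above, the sketch below writes a small stacked table with the standard library and reads it back. The filename, the constant flux values, and the comma delimiter are placeholders chosen for this example; whether `chromatic`'s text reader accepts this exact delimiter is an assumption, not something this page confirms.

```python
import csv
import tempfile
from pathlib import Path

wavelengths = [1.0, 1.5, 2.0]      # microns
times = [0.00, 0.01, 0.02, 0.03]   # days

# Write one row per (wavelength, time) pair, with the light curve for
# each wavelength stacked on top of the next.
path = Path(tempfile.mkdtemp()) / "stacked-light-curves.csv"
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["wavelength", "time", "flux", "uncertainty"])
    for w in wavelengths:
        for t in times:
            writer.writerow([w, t, 1.0, 0.01])  # placeholder flux values

# Read the table back to confirm its shape and column names.
with open(path, newline="") as f:
    rows = list(csv.DictReader(f))

print(len(rows), "rows with columns", list(rows[0]))
```

A file laid out like this could then be passed to `read_rainbow(path, format='text')`, with the caveat about the delimiter noted above.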

### STScI `jwst` pipeline outputs (`x1dints.fits`)

The `jwst` pipeline developed at the Space Telescope Science Institute will produce extracted 1D stellar spectra for time-series observations with the James Webb Space Telescope. Details about the pipeline itself are available [here](https://jwst-pipeline.readthedocs.io/en/latest/).

These files typically end with the `_x1dints.fits` suffix. Each file contains a number of individual "integrations" (= time points). Because the datasets can get large, sometimes a particular observation might be split into multiple segments, each with its own file. As such, the reader for these files is designed to handle either a single file or a path with a `*` in it that points to a group of files from an observation that's been split into segments.

In [None]:
r = read_rainbow("example-datasets/stsci/*_x1dints.fits")
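
To see how a path with a `*` in it can point to several segment files, here is a small standard-library sketch. It creates empty placeholder files (they are not valid FITS files, and their names are made up) and expands a wildcard with `glob`. Whether `chromatic` uses `glob` internally is an assumption; the sketch only shows how such a pattern selects a group of files.

```python
import glob
import tempfile
from pathlib import Path

# Create a scratch directory with three fake "segment" files plus one
# unrelated file. These are empty placeholders, not real x1dints data.
directory = Path(tempfile.mkdtemp())
for segment in ["001", "002", "003"]:
    (directory / f"observation-seg{segment}_nis_x1dints.fits").touch()
(directory / "unrelated-notes.txt").touch()

# Expand the wildcard; sorting keeps the segments in a predictable order.
pattern = str(directory / "*_x1dints.fits")
segment_files = sorted(glob.glob(pattern))

for filename in segment_files:
    print(Path(filename).name)
```

Running the same kind of expansion on your own data directory is a quick way to check which files a wildcard path will match before handing it to `read_rainbow`.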

If the file-format guess fails, you can feed in the keyword `format='x1dints'` to tell the reader to expect one of these files. This reader was rewritten on 13 July 2022 to read in the JWST/ERO `x1dints` datasets. It might not work on earlier simulated `x1dints` files like those in the simulated datasets available [here](https://app.box.com/folder/154382715453?s=tj1jnivn9ekiyhecl5up7mkg8xrd1htl); for those, try using the `format='x1dints_kludge'` keyword.

### `eureka` pipeline outputs (`S[3|4|5].[h5|txt]`)

The [Eureka!](https://github.com/kevin218/Eureka) pipeline is one of many community tools being designed to extract spectra from JWST data. The current outputs have filenames that look like `S3*SpecData.h5` for Stage 3 (extracted spectra), `S4*LCData.h5` for Stage 4 (raw binned light curves), and a group of files `*S5_*_Table_Save_*.txt` for Stage 5 (fitted binned light curves) for all channels. Any of these three stages can be read with `chromatic`.

In [None]:
s3 = read_rainbow("example-datasets/eureka/S3_example_SpecData.h5")

In [None]:
s4 = read_rainbow("example-datasets/eureka/S4_example_LCData.h5")

In [11]:
s5 = read_rainbow("example-datasets/eureka/S5*Table_Save_*.txt")

If the file-format guess fails, you can feed in the keywords `format='eureka_s3'`, `format='eureka_s4'`, or `format='eureka_s5'` to tell the reader what file(s) to expect. (Older versions of Eureka! used text files for earlier stages, with filenames like `S3_*_Table_Save.txt`; that format will continue to work with `format='eureka_txt'`.)

### `xarray`-based ERS format (`*.xc`)

Natasha Batalha, Lili Alderson, Munazza Alam, and Hannah Wakeford put together some specifications for a standard format for publishing datasets. The details may still change a little bit (as of 13 July 2022), but `chromatic` can currently read a version of their `stellar-spec`, `raw-light-curves`, and `fitted-light-curves` formats.

In [12]:
spectra = read_rainbow("example-datasets/xarray/stellar-spec.xc")

In [13]:
raw_lcs = read_rainbow("example-datasets/xarray/raw-light-curves.xc")

In [14]:
fitted_lcs = read_rainbow("example-datasets/xarray/fitted-light-curves.xc")

## Writing Files

`chromatic` can write out files in a variety of different file formats. Paired with the available readers, this makes it possible to convert one file format to another, simply by reading one file in and saving it out as another. To demonstrate the writers, let's create a simple simulated dataset.

In [15]:
from chromatic import SimulatedRainbow

In [16]:
simulated = SimulatedRainbow().inject_transit().inject_systematics().inject_noise()
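
The read-then-save conversion mentioned above can be wrapped in a tiny helper. This is a minimal sketch: `convert_rainbow` is a hypothetical name, not part of `chromatic`, and it relies only on the `read_rainbow()` and `.save()` calls shown on this page, with both formats guessed from the filenames.

```python
def convert_rainbow(input_path, output_path):
    """Read a file in one supported format and save it in another."""
    # Imported inside the function so this helper can be defined even
    # in an environment where chromatic is not installed.
    from chromatic import read_rainbow

    rainbow = read_rainbow(input_path)  # format guessed from input filename
    rainbow.save(output_path)           # format guessed from output filename
    return rainbow
```

For example, `convert_rainbow("example-datasets/chromatic/test.rainbow.npy", "example-datasets/chromatic/test.rainbow.fits")` would turn the default binary file into a FITS file, provided the input file exists.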

### `chromatic` rainbow files (`*.rainbow.npy`)

The default file format for saving files encodes the core dictionaries in binary files, using the extension `.rainbow.npy`. This is a file that can be read directly back into `chromatic`. (Indeed, the command below created the file that we read above.)

In [17]:
simulated.save("example-datasets/chromatic/test.rainbow.npy")

### `chromatic` rainbow FITS files (`*.rainbow.fits`)

If you want to share your Rainbow object with someone who might not be using Python, consider sharing a `.rainbow.fits` file. This is a normal FITS file that many astronomers will have a way of reading. The primary extension has no data but a header that might contain some metadata. The three other extensions `fluxlike`, `wavelike`, and `timelike` contain quantities that have shapes of `(nwave, ntime)`, `(nwave)`, `(ntime)`, respectively.

In [None]:
simulated.save("example-datasets/chromatic/test.rainbow.fits")
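
Someone receiving a `.rainbow.fits` file might first want to check which extensions it contains. The sketch below uses `astropy.io.fits` for that. The helper name `list_rainbow_fits_extensions` is hypothetical, and because this page does not say whether each extension is stored as a table or an image, the sketch only lists the extensions rather than unpacking their data.

```python
def list_rainbow_fits_extensions(path):
    """Print a summary of a FITS file and return its extension names."""
    # Imported inside the function so this helper can be defined even
    # in an environment where astropy is not installed.
    from astropy.io import fits

    with fits.open(path) as hdul:
        hdul.info()  # prints one line per extension (name, type, dimensions)
        return [hdu.name for hdu in hdul]
```

Calling `list_rainbow_fits_extensions("example-datasets/chromatic/test.rainbow.fits")` after the save above should show the primary extension along with the `fluxlike`, `wavelike`, and `timelike` extensions described in this section.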

### `xarray`-based ERS format (`*.xc`)

`chromatic` can write out to the standard `xarray`-based format described above. These writers will generally raise warnings if important metadata is missing.

In [None]:
spectra = simulated.save("example-datasets/xarray/stellar-spec.xc")

In [None]:
raw_lcs = simulated.save("example-datasets/xarray/raw-light-curves.xc")

In [21]:
fitted_lcs = simulated.save("example-datasets/xarray/fitted-light-curves.xc")

### generic text files (`*.txt`, `*.csv`)

Text files provide a more generally readable file format, even though they may be slower to read or write. This writer will create one giant text file that stacks the light curves for all wavelengths on top of each other (if the `group_by='wavelength'` keyword is set) or the spectra for all times on top of each other (if the `group_by='time'` keyword is set). The resulting text file should at least have columns that look like:
- `wavelength` for wavelength in microns
- `time` for time in days (preferably BJD$_{\rm TDB}$)
- `flux` for flux in any units
- `uncertainty` for flux uncertainties in the same units as `flux`

In [22]:
simulated.save("example-datasets/chromatic/test.rainbow.txt")
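
The two stacking orders described above differ only in which axis varies fastest from row to row. The sketch below illustrates that with plain Python. The `stacked_rows` function is hypothetical and the exact row order `chromatic` writes is an assumption; only the `group_by='wavelength'` and `group_by='time'` keyword values come from this page.

```python
wavelengths = [1.0, 2.0]   # microns
times = [0.0, 0.1, 0.2]    # days


def stacked_rows(wavelengths, times, group_by="wavelength"):
    """Return (wavelength, time) pairs in the requested stacking order."""
    if group_by == "wavelength":
        # one full light curve per wavelength, stacked on top of each other
        return [(w, t) for w in wavelengths for t in times]
    if group_by == "time":
        # one full spectrum per time, stacked on top of each other
        return [(w, t) for t in times for w in wavelengths]
    raise ValueError("group_by must be 'wavelength' or 'time'")


by_wavelength = stacked_rows(wavelengths, times, group_by="wavelength")
by_time = stacked_rows(wavelengths, times, group_by="time")

print("grouped by wavelength:", by_wavelength)
print("grouped by time:      ", by_time)
```

Both orderings contain the same rows, so either choice stores the same information; the keyword only changes how the file reads to a human scanning it from top to bottom.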

## Other File Formats

Naturally, you might want to use readers or writers other than those listed here, to interpret outputs from other analyses or to produce the inputs needed for various light curve analyses. We've already added a number of custom readers and writers. Here are the currently available file formats:

In [23]:
from chromatic import available_readers, available_writers

In [None]:
list(available_readers)

In [None]:
list(available_writers)

If you would like a reader and/or writer for a format that isn't listed above, please either [submit an Issue](../github#Should-I-submit-an-Issue-to-the-chromatic-GitHub-repository?) to discuss its creation or see [Designing New 🌈 Features](../designing) to learn how to add it yourself. We've invested effort in trying to make it as easy as possible to develop readers/writers for new formats, so please don't hesitate to ask for something to be added. It's really not that hard to add something new.