# TESS Data in the Cloud with TIKE

Adapted from [Project TIKEBook on GitHub](https://github.com/spacetelescope/project-tikebook/blob/main/notebooks/00-the-cloud/00-the-cloud.ipynb)

In this live coding exercise, we're going to take some of the things we've learned and try them out ourselves! This is a good time to get practice using `astroquery` while learning how to work with TESS data in the cloud.

Before we get started, make sure the environment in the upper right is set to "TESS Environment".

## Learning Goals: 
- Understand what TIKE is, and the principles behind cloud platforms
- Define cloud terminology: what is a “bucket” or a server? For that matter, what is the “cloud”?
- Access MAST data through astroquery by name, region, or criteria
- Query TESS data and show a light curve

## What is TIKE?

TIKE stands for the *Timeseries Integrated Knowledge Engine*.

TIKE uses a web-based platform, called JupyterHub, to allow you to run [Jupyter Notebooks](https://jupyterlab.readthedocs.io/en/latest/) and other software "on the cloud" using your web browser: you don't need to install anything on your local computer. TIKE has access to a cloud copy of the [MAST Archive](https://archive.stsci.edu), enabling anyone to access and analyze data from NASA's [TESS mission](https://archive.stsci.edu/missions-and-data/tess). We also have copies of other mission datasets, including data from HST, GALEX, and PanSTARRS. They are generally cataloged in full on the MAST Public Datasets page, so check there for an updated list.

TIKE is continually maintained and updated by humans, so if you run into issues please let us know. Don't hesitate to send us your suggestions for packages and tutorials, either through the [MAST help desk](mailto:archive@stsci.edu) or the [tike_content repository](https://github.com/spacetelescope/tike_content).

## What is the "cloud"?

The "cloud", or cloud computing, refers to the practice of remotely accessing computing resources, rather than hosting them yourself. This term might also be used to refer to software and databases running on those servers. As Randall Munroe put it, "turns out the cloud is just other people's computers".

In our case, "the cloud" is the AWS East Datacenters in northern Virginia. TIKE runs in proximity to this copy of MAST data. This means that the data is not transmitted over the internet, but rather within a data center. This leads to faster access, since data centers have high-quality (likely fiber optic) connections between their machines. 

### Why would I want to work on the cloud?
Using the cloud has several benefits; principally, as mentioned above, there's no need to download data to your local machine. This speeds up data access, and allows you to perform analyses that wouldn't be possible without a major upgrade to your hard drive capacity or internet service. You can access data whenever and wherever you want to, from any device, as long as you have an internet connection. 

![tike-cloud](TIKE-Cloud-Photo.png)

### What's the difference between working on the cloud and working on TIKE?
Although you might choose to work directly with data stored on the cloud, it can be complex to configure such a system. TIKE handles this complexity, making it as easy as opening a Jupyter Notebook.

### How can I access cloud-hosted data?

There are two approaches to accessing cloud-hosted data:
1. While on TIKE, loading files directly into memory (recommended)
2. A traditional download to your local machine from the cloud-hosted copy of MAST

Whenever possible, it's best to use the first method. The vast majority of users, with small tweaks to existing code, should be able to access data this way.

## Imports and Setup

We'll use the standard tools to open and plot a FITS file:
- `matplotlib` to create the plot
- `numpy` to automatically set brightness limits in the plot

To access the cloud data, we need
- `astroquery.mast` to search for and select data

Finally, we need
- `lightkurve` to read and manipulate light curve data

In [None]:
import matplotlib.pyplot as plt
import numpy as np

from astroquery.mast import Observations

import lightkurve as lk

The most important step in this process is to enable cloud data access. Once we do, we'll be able to get cloud filenames and access files directly. If you're working locally, you can use this command to download data from the cloud copy of MAST data.
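
The cell that enables cloud access is not shown above. Here is a minimal sketch, assuming the command in question is astroquery's `Observations.enable_cloud_dataset`; the wrapper name `enable_cloud_access` is ours, purely for illustration:

```python
def enable_cloud_access(provider="AWS"):
    """Ask astroquery to resolve MAST products to their cloud-hosted copies.

    Assumption: this is the command the text above refers to.
    """
    # Imported inside the function so that defining it needs no network access.
    from astroquery.mast import Observations

    Observations.enable_cloud_dataset(provider=provider)
    return Observations
```

Calling `enable_cloud_access()` once near the top of a session should be enough; later calls to `get_cloud_uris` can then look up cloud filenames.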

## 1. Query for MAST Observations
We've seen how to use `astroquery.mast` to query MAST data. Now let's put it to use!

### Workflow Reminder
Remember, the path from "I want MAST data" to "I have MAST data" has three steps:

1. Filter MAST Observations using metadata, such as RA/Dec, exposure time, and wavelength.
2. Filter the underlying files associated with each Observation (e.g. using calibration level or file type).
3. Access the data, by downloading it or loading it directly into memory.

Here are our three querying functions again:
- `query_region()`
- `query_object()`
- `query_criteria()`

### Warmup: Count Results
You can append `_count` to any of the above functions to get the number of matching results. For example, we can query within 1 arcminute of the coordinates of Fomalhaut:

In [None]:
coordinates = "22h57m39.04625s -29d37m20.0533s"
Observations.query_region_count(coordinates, radius="1 arcmin")

Now it's your turn! How many Observations in MAST are within 2 arcseconds of Trappist-1?

In [None]:
# TYPE ANSWER HERE


In [None]:
# hint: uncomment and run
#Observations.query_object?

#### Querying for a Light Curve

Let's choose a new star: Pi Mensae, a G-dwarf in the southern constellation Mensa, which means "Table".

We'll use the `query_criteria` function to look for TESS Observations within 2 arcseconds. The relevant keywords here are 'objectname', 'radius', and 'obs_collection'.

The full table can be a bit overwhelming. Let's only show a subset of columns.

In [None]:
cols = ['target_name', 's_ra', 's_dec', 'dataproduct_type', 'calib_level', 't_exptime', 'sequence_number', 'dataRights', 'distance']
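
The query itself is left for you to type during the session. One possible version, as a sketch: it uses the three keywords named above ('objectname', 'radius', 'obs_collection'), while the helper name `query_tess_observations` and the radius string `"2 arcsec"` are our own choices.

```python
# Columns shown in the summary table (same list as in the cell above).
COLS = ['target_name', 's_ra', 's_dec', 'dataproduct_type', 'calib_level',
        't_exptime', 'sequence_number', 'dataRights', 'distance']


def query_tess_observations(objectname="Pi Mensae", radius="2 arcsec"):
    """Return TESS Observations near a named object, restricted to COLS."""
    # Imported inside the function so that defining it needs no network access.
    from astroquery.mast import Observations

    tess_obs = Observations.query_criteria(
        objectname=objectname,      # resolved to coordinates by MAST
        radius=radius,              # search radius around those coordinates
        obs_collection="TESS",      # keep only TESS mission Observations
    )
    return tess_obs[COLS]
```

On TIKE, `query_tess_observations()` should return one row per matching Observation, with only the columns listed in `COLS`.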


The `distance` to all of these observations is zero, even though their coordinates (`s_ra` and `s_dec`) are different. What gives?

As it turns out, `distance` is a measure of the separation (in arcseconds) of our input coordinates and the Observation footprint. So long as our coordinates are within the footprint, the `distance` will be zero.
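
To make this concrete, here is a toy model of that behavior. It is only a sketch: it treats the footprint as a circle around the Observation's center, whereas real footprints are more complicated shapes, and both function names are hypothetical.

```python
import math


def separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation of two sky positions (degrees in, arcseconds out)."""
    ra1, dec1, ra2, dec2 = (math.radians(v) for v in (ra1, dec1, ra2, dec2))
    # Haversine formula, which stays accurate for small angles.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h))) * 3600


def distance_to_footprint(target, center, footprint_radius_arcsec):
    """Toy model: 0 inside a circular footprint, else the gap to its edge."""
    sep = separation_arcsec(*target, *center)
    return max(0.0, sep - footprint_radius_arcsec)
```

With made-up numbers, `distance_to_footprint((10.0, -20.0), (10.01, -20.0), 60.0)` is zero even though the two positions differ, because the target still falls inside the 60-arcsecond footprint.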

Since we want to plot a light curve, we'll select one of the 120-second cadence time series. Let's use sector 27. We could use standard Python indexing for this, but we could also just reformat our query. We will use the keywords 'objectname', 'obs_collection', 'sequence_number', 't_exptime', 'radius', and 'dataproduct_type'.


In [None]:
# option 1: use bitwise and 
# match = (tess_obs['sequence_number'] == 27) & (tess_obs['dataproduct_type'] == "timeseries") & (tess_obs['t_exptime'] == 120)
# tess_obs[match]

# option 2: format the query
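tess_obs = Observations.query_criteria(
    objectname="Pi Mensae",
    obs_collection="TESS",
    sequence_number=27,            # TESS sector
    t_exptime=120,                 # 120-second cadence
    radius="2 arcsec",
    dataproduct_type="timeseries",
)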

tess_obs[cols]

As expected, we only get one matching observation back.

## 2: Get Products

Now that we have our Observation, we'll use `get_product_list` to find the underlying files.

This returns multiple data products: a light curve and a target pixel file. You can use `Observations.filter_products` to filter these down to the product(s) you want.
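
A sketch of this step, wrapped in a hypothetical helper named `light_curve_products`. The filter values are assumptions: we take `"SCIENCE"` to mark science products and `"LC"` to be the `productSubGroupDescription` code for TESS light curves, so check the product table if the result comes back empty.

```python
def light_curve_products(observations):
    """List the files behind `observations` and keep only science light curves."""
    # Imported inside the function so that defining it needs no network access.
    from astroquery.mast import Observations

    # Step 2 of the workflow: every file associated with the Observation(s).
    products = Observations.get_product_list(observations)

    # Keep only the light curve; the filter values are assumptions (see above).
    return Observations.filter_products(
        products,
        productType="SCIENCE",
        productSubGroupDescription="LC",
    )
```

For example, `science_products = light_curve_products(tess_obs)` would feed the data access step below.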

## 3: Data Access

Once you've identified your file(s) of interest, you must choose your access method.

### Downloading

We won't say much about this method, since it's not recommended to do this on the cloud. Just know that the option exists, both on TIKE and on your local machine.

In [None]:
# img_path = Observations.download_products(science_products)

### Streaming to Memory
A downloaded file has a path on your computer (e.g. `Downloads/docs/copy-of-untitled1.txt`). We need to use the cloud equivalent of this. Fortunately, there's a function for that: `Observations.get_cloud_uris`.

As of this past August, the `lightkurve` package can read data products from the cloud just by passing the URI. Let's see it in action:
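
A minimal sketch of how this might look, assuming a `lightkurve` version recent enough to accept cloud URIs in `lk.read`. The helper name `read_first_from_cloud` is hypothetical, and `products` stands for a filtered product table from the previous step.

```python
def read_first_from_cloud(products):
    """Stream the first cloud-hosted product into memory with lightkurve."""
    # Imported inside the function so that defining it needs no network access.
    from astroquery.mast import Observations
    import lightkurve as lk

    # Cloud access must be enabled before cloud filenames can be looked up.
    Observations.enable_cloud_dataset()

    # Products without a cloud copy may come back as None, so drop those.
    uris = [uri for uri in Observations.get_cloud_uris(products) if uri is not None]
    if not uris:
        raise ValueError("none of these products has a cloud copy")

    # No download step: lightkurve reads straight from the URI.
    return lk.read(uris[0])
```

Passing the light curve product from step 2 should give back a `lightkurve` object that is ready to plot.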

These new features let you read light curves and other MAST data without the need for lengthy downloads, because the TIKE environment lets you work right next to the data!

The next session will go into these features in much more detail. This is just to get us started!

### Display the Light Curve

Finally, let's plot our light curve.
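
The plotting cell is left blank here, so the following is only one way to do it. It assumes `lc` is a `lightkurve` light curve object, such as the one returned by `lk.read`; the helper name and default title are ours.

```python
def plot_light_curve(lc, title="Pi Mensae, TESS sector 27"):
    """Plot a lightkurve light curve and give the figure a title."""
    ax = lc.plot()        # lightkurve returns the matplotlib Axes it drew on
    ax.set_title(title)
    return ax
```

Because the Axes object is returned, you can keep customizing the figure afterwards with the usual `matplotlib` methods.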

## Summary

Congratulations! By now you should understand
- what TIKE is, and the principles behind cloud platforms
- basic cloud terminology: buckets, servers, and cloud
- how to access MAST data through astroquery by name, region, or criteria
- how to query TESS data and show a light curve


For full details on how TESS collects and processes images and produces light curves see the [TESS Instrument Handbook](https://archive.stsci.edu/missions/tess/doc/TESS_Instrument_Handbook_v0.1.pdf).

## Additional Resources
Can't get enough? Here are some links to more information!

If you need an introduction (or a refresher!) to basic Python syntax, there are several great resources available online. [Codecademy](https://www.codecademy.com/learn/learn-python-3) is a great service with a totally free option for getting started with Python, though you will have to create an account to use it. Additionally, the YouTube channel freeCodeCamp.org has a great [video tutorial](https://www.youtube.com/watch?v=rfscVS0vtbw) on everything you need to get started programming in Python. Another good resource is the [Python for Everybody](https://www.py4e.com/) book.

The full astropy documentation can be found [here](https://docs.astropy.org/en/stable/index.html).

For more info on FITS files, here is a link to the [FITS NASA site](https://fits.gsfc.nasa.gov/). 

SIMBAD is a web-based query service from the University of Strasbourg. It is a great resource for getting quick info on stars and other astronomical targets. Here is the link to [Pi Mensae's SIMBAD page](https://simbad.u-strasbg.fr/simbad/sim-basic?Ident=pi+mensae&submit=SIMBAD+search).