# Flyby

Cassini made 23 (or perhaps 22) flybys of Enceladus. Find the precise time of closest approach for each flyby. Doesn't NASA already know this?

Search the `events` table where `title ILIKE '%flyby%' OR title ILIKE '%fly by%'`, and include targets so we can filter for Enceladus.
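A quick way to sanity-check a pattern with `%` wildcards on both sides is to run it against a few made-up titles. The sketch below uses an in-memory SQLite table as a stand-in for Postgres (SQLite's `LIKE` is case-insensitive for ASCII text, so it behaves like `ILIKE` here). The titles and the `find_flyby_titles` helper are hypothetical, not taken from the real `events` table.

```python
import sqlite3

# Hypothetical titles for illustration only; not taken from the Cassini data.
SAMPLE_TITLES = [
    "Enceladus FLYBY observation",
    "Titan fly by imaging",
    "Ring plane crossing",
]

def find_flyby_titles(titles):
    """Return titles matching '%flyby%' or '%fly by%', ignoring ASCII case."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (title TEXT)")
    conn.executemany("INSERT INTO events (title) VALUES (?)", [(t,) for t in titles])
    rows = conn.execute(
        "SELECT title FROM events "
        "WHERE title LIKE '%flyby%' OR title LIKE '%fly by%' "
        "ORDER BY rowid"
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]

matches = find_flyby_titles(SAMPLE_TITLES)
print(matches)
```

Only `%` (any run of characters) and `_` (one character) are wildcards in `LIKE`/`ILIKE`; any other character in the pattern, including `&`, is matched literally.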

Spot-check the data against ground truth, i.e. `import.master_plan`; the first Enceladus flyby should be on Feb 17, 2005.

Apparently that first flyby was called "obtain wideband examples of lightning whistlers", and target was "Saturn", not "Enceladus"???

## Sargable vs non-sargable

SARG is short for **search argument**. Sargable means that the database is able to perform an *index seek* to match the search predicate. If we search on an indexed integer column, the index keeps those values in sorted order, so the engine can *seek* straight to the matching rows without scanning every row.

Non-sargable means that the database is not able to use an index seek, typically because the predicate wraps the column in a SQL function, e.g. `WHERE UPPER(name) LIKE 'CASSINI'`. The storage engine must return all rows to the SQL engine so the function can be evaluated before the comparison is made. This becomes a sequential scan; all rows must be evaluated.
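The difference can be seen in a query plan. The sketch below is a minimal illustration using SQLite's `EXPLAIN QUERY PLAN` as a stand-in for Postgres `EXPLAIN`. The `spacecraft` table, its index, and the `query_plan` helper are assumptions made up for this example. The exact plan wording varies between SQLite versions, so no expected output is shown.

```python
import sqlite3

def query_plan(conn, sql):
    """Return the planner's human-readable detail lines for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]  # column 3 holds the detail text

conn = sqlite3.connect(":memory:")
# Hypothetical table and index, for illustration only.
conn.execute("CREATE TABLE spacecraft (id INTEGER PRIMARY KEY, name TEXT, notes TEXT)")
conn.execute("CREATE INDEX idx_spacecraft_name ON spacecraft (name)")

# Sargable: the bare column is compared, so the index on name can be used.
sargable = query_plan(conn, "SELECT * FROM spacecraft WHERE name = 'Cassini'")

# Non-sargable: the column is wrapped in a function, so the index is skipped.
non_sargable = query_plan(conn, "SELECT * FROM spacecraft WHERE upper(name) = 'CASSINI'")

print(sargable)
print(non_sargable)
conn.close()
```

The first plan should report a search using `idx_spacecraft_name`, while the second should report a scan of the whole table.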

In [2]:
import os
from dotenv import load_dotenv

In [3]:
load_dotenv("../.env")
user = os.environ.get('POSTGRES_USER')
pw = os.environ.get('POSTGRES_PASSWORD')
db_name = os.environ.get('POSTGRES_DB')
host = 'localhost'
port = 5432
conn_str = f'postgresql://{user}:{pw}@{host}:{port}/{db_name}'

In [5]:
%load_ext sql

In [6]:
%sql $conn_str

## Materialized views and indexing

A view is not an actual table; it is just a stored snippet of SQL. Create one with:

In [None]:
%%sql
drop view if exists enceladus_events;
create view enceladus_events as
select
    events.time_stamp,
    events.time_stamp::date as date,
    event_types.description as event,
    to_tsvector(concat(
        events.description, ' ',
        events.title)
    ) as search
from events
inner join event_types
on event_types.id = events.event_type_id
where target_id=28
order by time_stamp;

**Materialized** views are similar, but they store the query result, and they also allow indexing, which improves search performance:

In [None]:
%%sql
-- only works once enceladus_events is a materialized view;
-- Postgres cannot index a plain view
create index idx_event_search
on enceladus_events using GIN(search);

Now a search with `to_tsquery` can use the index instead of going through every row.

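Creating a plain view does not execute it, unlike creating a table; the view's SQL only executes when the view is queried. A materialized view, by contrast, runs its query when it is created and keeps the result until it is refreshed.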

## Full-text indexing

Full-text indexing prioritizes useful terms and deprioritizes *noise*. This is critical when searching through our `title` or `description` columns in a large database.

In Postgres, `to_tsvector(events.description)` is a function that converts the string column into a `tsvector`, a normalized list of searchable terms; the index itself is created separately on that column. To make use of this searchable column:

1. create a view with `to_tsvector(events.description) as search`
2. use the new `search` column in a where clause: `where search @@ to_tsquery('thermal')`. That will show all matches for `thermal`. `@@` is the match operator; inside the query string, `&`, `|`, `!` and `<->` combine terms as AND, OR, NOT and FOLLOWED BY.

Combine `concat()` with `to_tsvector` to search through two different text columns: `to_tsvector(concat(events.description, ' ', events.title))`
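To make the idea concrete, here is a toy illustration in plain Python of what the combined column does conceptually: join the text columns with a space, normalize the words, drop noise words, and test whether a term is present. This is only a rough analogy, not how Postgres implements `to_tsvector` (which also stems words and records positions). The stop-word list, the sample strings, and the names `toy_tsvector` and `toy_match` are all made up for this sketch.

```python
import re

# A tiny, hand-picked stop-word list; Postgres uses a much larger dictionary.
STOP_WORDS = {"a", "an", "and", "of", "the", "to", "in", "on", "for"}

def toy_tsvector(*columns):
    """Join text columns with a space, lowercase, and keep non-stop-word terms."""
    text = " ".join(c or "" for c in columns)  # None is treated as empty text
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if w not in STOP_WORDS}

def toy_match(vector, term):
    """Rough analogue of `search @@ to_tsquery(term)` for a single term."""
    return term.lower() in vector

# Hypothetical description and title, for illustration only.
search = toy_tsvector("Thermal mapping of the south pole", "Enceladus closest approach")
print(sorted(search))
print(toy_match(search, "thermal"), toy_match(search, "the"))
```

The space between the two columns matters: without it, the last word of the description and the first word of the title would fuse into a single term.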

## First flyby

Via historical context (i.e. domain knowledge), we know that Feb 17, 2005 was definitely the first flyby. Time to identify it in our fact table, i.e. `events`.

Look in `events`, and restore the text descriptions by joining the dimension tables for manual inspection. This way we find out how the scientists actually labelled their flyby:

In [None]:
%%sql
select
    targets.description as target,
    events.time_stamp,
    event_types.description as event
from events
inner join event_types on event_types.id = events.event_type_id
inner join targets on targets.id = events.target_id
where events.time_stamp::date = '2005-02-17'
order by events.time_stamp;

This lists all events on that date, with the original target and event type descriptions as strings.

One line reads: `Enceladus closest approach observation` with `Enceladus` as target, so let's add that restriction: `targets.description ILIKE 'enceladus'`. However, instead of doing a slow string query, we can find the integer target ID via `select * from targets where description = 'Enceladus'` (28 for me; 40 in the book) and filter on `target_id` with an index search. This cuts the time from 94 to 55 ms, a reduction of roughly 40%; non-sargable vs sargable.

The flyby unexpectedly revealed some signs of an atmosphere. The second flyby threw all their instruments at it. The most active team on the 2005-03-09 flyby was CIRS (Composite Infrared Spectrometer), followed by UVIS (Ultraviolet Imaging Spectrograph), to take UV images, then VIMS for infrared. This avalanche of readings confirmed that Enceladus indeed possessed an atmosphere.

## All closest flybys

Use `concat` with `to_tsvector` to search for `closest` in description *and* title:

In [None]:
%%sql
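-- the first run drops the plain view created earlier; on re-runs use
-- `drop materialized view if exists enceladus_events;` instead
drop view if exists enceladus_events;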
create materialized view enceladus_events as
select
    events.id,
    events.title,
    events.description,
    events.time_stamp,
    events.time_stamp::date as date,
    event_types.description as event,
    to_tsvector(
        concat(events.description, ' ', events.title)
    ) as search
from events
inner join event_types
on event_types.id = events.event_type_id
where target_id=28
order by time_stamp;

-- create index on our search column
create index idx_event_search
on enceladus_events using GIN(search);


-- search for closest
select * from enceladus_events
where search @@ to_tsquery('closest');

This returns two closest-approach events on 2009-11-02. Data entry error? Two actual flybys? The data may not be as reliable as hoped. Turn to a different dataset? Try INMS.
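One way to flag suspicious dates like this is to count hits per calendar day once the rows are in Python. The sketch below assumes each row is a `(time_stamp, title)` tuple. The helper name `dates_with_multiple_hits` and the placeholder rows are invented for illustration; only the 2009-11-02 date comes from the result above.

```python
from collections import Counter
from datetime import datetime

def dates_with_multiple_hits(rows):
    """Return {date: count} for dates that appear more than once."""
    counts = Counter(ts.date() for ts, _title in rows)
    return {day: n for day, n in counts.items() if n > 1}

# Placeholder rows: times and titles are invented, not real query output.
rows = [
    (datetime(2009, 11, 2, 7, 0), "placeholder closest-approach row A"),
    (datetime(2009, 11, 2, 7, 30), "placeholder closest-approach row B"),
    (datetime(2005, 2, 17, 3, 0), "placeholder closest-approach row C"),
]
duplicates = dates_with_multiple_hits(rows)
print(duplicates)
```

Any date that shows up in the result deserves a manual look at the underlying rows before deciding whether it is a duplicate entry or two separate events.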