# Under the Ice

The goal is to create a thermochemical map that can be overlaid onto Enceladus' south pole, using `chem_data.csv`.

In [1]:
%load_ext sql

In [2]:
import os
from dotenv import load_dotenv

In [3]:
load_dotenv("../.env")
user = os.environ.get('POSTGRES_USER')
pw = os.environ.get('POSTGRES_PASSWORD')
db_name = os.environ.get('POSTGRES_DB')
host = 'localhost'
port = 5432
conn_str = f'postgresql://{user}:{pw}@{host}:{port}/{db_name}'

In [4]:
%sql $conn_str
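
The connection string above is assembled with a plain f-string, which can break if the password contains characters such as `@`, `:` or `/`. A minimal sketch of a safer variant, assuming the same variable names; `build_conn_str` is a hypothetical helper, not part of the notebook, and the credentials in the example are made up:

```python
from urllib.parse import quote_plus


def build_conn_str(user: str, pw: str, host: str, port: int, db_name: str) -> str:
    """Build a PostgreSQL URL, percent-encoding the credentials."""
    return f"postgresql://{quote_plus(user)}:{quote_plus(pw)}@{host}:{port}/{db_name}"


# Made-up credentials, chosen to contain characters that would confuse URL parsing.
example = build_conn_str("postgres", "p@ss:word/1", "localhost", 5432, "enceladus")
print(example)
```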

In [10]:
%%sql
-- manifest does not accurately describe what's going on
-- visual check on .csv revealed the schema
CREATE TABLE IF NOT EXISTS chem_data (
    name TEXT,
    formula TEXT,
    molecular_weight INTEGER,
    peak INTEGER,
    sensitivity NUMERIC
);

COPY chem_data
FROM '/home/curious/data/INMS/chem_data.csv'
WITH DELIMITER ',' HEADER CSV;

 * postgresql://postgres:***@localhost:5432/enceladus
Done.
82 rows affected.


[]

In [11]:
%%sql 
select * from chem_data limit 5;

 * postgresql://postgres:***@localhost:5432/enceladus
5 rows affected.


name,formula,molecular_weight,peak,sensitivity
Water,H2O,18,18,0.000482
Cyanogen,C2N2,52,52,0.000603
Carbon Monoxide,CO,28,28,0.000603
Carbon Monoxide,CO(iso),28,29,0.000603
Propyne,C3H4,40,40,0.00061


## It shouldn't exist

Named after a giant in Greek myth, Enceladus first intrigued scientists with its high reflectivity, but was quickly dismissed as a "dead iceball" like its neighbouring moons. Voyager found that it sits in the middle of the diffuse and less dramatic E-ring, and hypotheses abounded about its contribution to the ring. Was it volcanism? Why hasn't it drained out eons ago? Why is it so reflective even after so many inevitable impacts?

E0-2 investigated the first strange phenomenon: Saturn's magnetic field being *bent* by Enceladus. The flybys also discovered that the moon spewed ice, creating a small atmosphere. Then CIRS discovered a hot spot centered on the south pole, which was expected to be the coldest region. Readings of water and organic molecules were later registered. Even stranger - salt and silica were also found.

Salt could only mean the presence of liquid water, and liquid meant more heat than predicted

First hypothesis posited tidal pull from Saturn and Dione as the heat source, but stress zones measured did not match

Enceladus is tidally locked to Saturn, rotating once per orbit around Saturn. This meant its face should be static relative to Saturn, except it was not. They also noted that changes in orbital distance changed plume activity, presumably by varying the tidal stress; this indicated a flexible icy crust. Together this meant that the ice was completely detached from the core - Enceladus had a global ocean underneath.

And still the question remained - where is the heat source to keep things liquid?

Planetary science at the time had no working models to explain the global ocean and plume system, where it could peak at 90 °C

If the core was porous, then water could seep in and become heated enough to drive the plumes

With water and heat, the only thing left was the proper chemistry to make up the primordial ooze; specifically H2

E20-22 were dedicated to sampling the plumes for hydrogen

## Relating `chem_data.csv` with INMS

Of note:

- dates
- source
- mass table
- mass per charge
- particle energy
- relative speed
- C1/C2 counters

Relate to `mass_peak` in `chem_data.csv`

First, create schema for INMS, and move `chem_data` inside

In [12]:
%%sql
-- make inms schema
DROP SCHEMA IF EXISTS inms CASCADE;
CREATE SCHEMA inms;

-- move chem_data in
ALTER TABLE chem_data
SET SCHEMA inms;

-- load from import.inms
-- recall that the speed is relative to TARGET, i.e. enceladus
SELECT
    sclk::timestamp as time_stamp,
    source::text,
    mass_table,
    alt_t::numeric(9,2) as altitude,
    mass_per_charge::numeric(6,3),
    p_energy::numeric(7,3),
    pythag3(
    sc_vel_t_scx::numeric,
    sc_vel_t_scy::numeric,
    sc_vel_t_scz::numeric
    ) as relative_speed,
    c1counts::integer as high_counts,
    c2counts::integer as low_counts
into inms.readings
from import.inms
order by time_stamp;

ALTER TABLE inms.readings
ADD id SERIAL PRIMARY KEY;

 * postgresql://postgres:***@localhost:5432/enceladus
Done.
Done.
Done.
13857176 rows affected.
Done.


[]
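
`pythag3` is not defined in this notebook; it is presumably a helper created earlier in the project. A minimal Python sketch of what it is assumed to compute, the magnitude of the three velocity components (this is an assumption about the SQL function, not its actual definition):

```python
import math


def pythag3(x: float, y: float, z: float) -> float:
    """Magnitude of a 3-D vector: sqrt(x^2 + y^2 + z^2)."""
    return math.sqrt(x * x + y * y + z * z)


# Made-up velocity components, chosen so the result is easy to verify by hand.
speed = pythag3(2.0, 3.0, 6.0)
print(speed)  # 7.0
```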

Relate `readings` to `flybys` via `flybys.id`:

In [13]:
%%sql
ALTER TABLE inms.readings
ADD flyby_id INT REFERENCES flybys(id);

 * postgresql://postgres:***@localhost:5432/enceladus
Done.


[]

### EXPLAIN and ANALYZE

An expensive op to be sure: the timestamp in each of the 13M rows has to be cast to a date and compared.

```sql
-- OPTION 1: join through UPDATE ... FROM
UPDATE inms.readings
SET flyby_id = flybys.id
FROM flybys
WHERE flybys.date = inms.readings.time_stamp::date;
```

There is another option, using a correlated subquery:

```sql
-- OPTION 2
UPDATE inms.readings
SET flyby_id = (
    SELECT
        f.id
    FROM flybys f
    WHERE f.date = inms.readings.time_stamp::date
    LIMIT 1
);
```

Simply put `EXPLAIN` in front of the query and postgres will tell us what's what

In [17]:
%%sql
EXPLAIN UPDATE inms.readings
SET flyby_id = flybys.id
FROM flybys
WHERE flybys.date = readings.time_stamp::date;

 * postgresql://postgres:***@localhost:5432/enceladus
9 rows affected.


QUERY PLAN
Update on readings (cost=1.52..358353.90 rows=0 width=0)
-> Hash Join (cost=1.52..358353.90 rows=1593532 width=16)
Hash Cond: ((readings.time_stamp)::date = flybys.date)
-> Seq Scan on readings (cost=0.00..273133.04 rows=13856804 width=14)
-> Hash (cost=1.23..1.23 rows=23 width=14)
-> Seq Scan on flybys (cost=0.00..1.23 rows=23 width=14)
JIT:
Functions: 10
"Options: Inlining false, Optimization false, Expressions true, Deforming true"


In [18]:
%%sql
EXPLAIN UPDATE inms.readings
SET flyby_id = (
    SELECT
        f.id
    FROM flybys f
    WHERE f.date = inms.readings.time_stamp::date
    LIMIT 1
);

 * postgresql://postgres:***@localhost:5432/enceladus
9 rows affected.


QUERY PLAN
Update on readings (cost=0.00..18910534.42 rows=0 width=0)
-> Seq Scan on readings (cost=0.00..18910534.42 rows=13856804 width=10)
SubPlan 1
-> Limit (cost=0.00..1.34 rows=1 width=4)
-> Seq Scan on flybys f (cost=0.00..1.34 rows=1 width=4)
Filter: (date = (readings.time_stamp)::date)
JIT:
Functions: 9
"Options: Inlining true, Optimization true, Expressions true, Deforming true"


What do they mean??? These are steps taken by postgres should the query be made.

- all values are estimates
- `cost` - estimated cost in arbitrary planner units (not milliseconds), shown as `startup..total`; lower is cheaper
- `rows` - est number of row output

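Our `EXPLAIN` results returned a total cost of about 358k for option 1 and 18.9 million for option 2. Option 1 hashes the 23 flybys once and scans `readings` a single time, whereas option 2 runs the subquery (`SubPlan 1`) once for each of the ~13.9M rows.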
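
As a rough check on where option 2's figure comes from, the numbers printed in the two plans can be combined by hand. This is only back-of-the-envelope arithmetic on the `EXPLAIN` output, not a reproduction of the planner's cost model:

```python
# Figures copied from the EXPLAIN output above.
rows = 13_856_804            # planner's row estimate for inms.readings
seq_scan_cost = 273_133.04   # one sequential scan of readings
subplan_cost = 1.34          # one pass over flybys (SubPlan 1), per outer row
reported_total = 18_910_534.42

# One scan of readings plus one subplan execution per row.
estimate = seq_scan_cost + rows * subplan_cost
share_explained = estimate / reported_total

print(f"estimate: {estimate:,.0f}")
print(f"share of reported cost: {share_explained:.1%}")
```

Nearly all of the reported cost is accounted for by repeating the subplan per row; the small remainder is per-row overhead this sketch ignores.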

### Junction table

There is a hidden option 3: a junction table, which exists solely to support a relation between two otherwise unrelated tables, i.e. `inms.readings` and `public.flybys`

Create a table with two fields storing the **primary keys** from the two tables. The primary key of the junction table can then be the combination of the two fields, i.e. `flybys.id` and `readings.id`. Note that a composite key only guarantees that each *pair* is unique; a record from `inms.readings` could still appear more than once, paired with different flybys

In [46]:
%%sql
-- junction table
DROP TABLE IF EXISTS flyby_readings;
CREATE TABLE flyby_readings(
    reading_id INT NOT NULL UNIQUE REFERENCES inms.readings(id),
    flyby_id INT NOT NULL REFERENCES public.flybys(id), -- CANNOT BE UNIQUE
    PRIMARY KEY (reading_id, flyby_id)
);

 * postgresql://postgres:***@localhost:5432/enceladus
Done.
Done.


[]
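
To see what the extra `UNIQUE` on `reading_id` buys over the composite primary key alone, here is a small sketch using Python's built-in `sqlite3` as a stand-in for postgres. The table names and rows are made up for illustration; only the constraint layout mirrors the junction table above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Composite primary key only: each (reading, flyby) pair must be unique.
conn.execute("""
    CREATE TABLE pair_only (
        reading_id INTEGER NOT NULL,
        flyby_id   INTEGER NOT NULL,
        PRIMARY KEY (reading_id, flyby_id)
    )
""")

# Same key, plus UNIQUE on reading_id: a reading can belong to one flyby only.
conn.execute("""
    CREATE TABLE one_flyby_per_reading (
        reading_id INTEGER NOT NULL UNIQUE,
        flyby_id   INTEGER NOT NULL,
        PRIMARY KEY (reading_id, flyby_id)
    )
""")


def try_insert(table: str, rows: list) -> int:
    """Insert rows one at a time and return how many were accepted."""
    accepted = 0
    for row in rows:
        try:
            conn.execute(f"INSERT INTO {table} VALUES (?, ?)", row)
            accepted += 1
        except sqlite3.IntegrityError:
            pass  # rejected by a constraint
    return accepted


# Reading 1 paired with two different flybys, then an exact duplicate pair.
rows = [(1, 10), (1, 11), (1, 10)]
pair_only_count = try_insert("pair_only", rows)
unique_count = try_insert("one_flyby_per_reading", rows)
print(pair_only_count, unique_count)  # prints: 2 1
```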

Filling in the data

In [21]:
%%sql
-- filling the ID table
EXPLAIN SELECT
    r.id,
    f.id
FROM inms.readings r
JOIN flybys f
ON r.time_stamp::DATE = f.date;

 * postgresql://postgres:***@localhost:5432/enceladus
8 rows affected.


QUERY PLAN
Hash Join (cost=1.52..358353.90 rows=1593532 width=8)
Hash Cond: ((r.time_stamp)::date = f.date)
-> Seq Scan on readings r (cost=0.00..273133.04 rows=13856804 width=12)
-> Hash (cost=1.23..1.23 rows=23 width=8)
-> Seq Scan on flybys f (cost=0.00..1.23 rows=23 width=8)
JIT:
Functions: 11
"Options: Inlining false, Optimization false, Expressions true, Deforming true"


In [24]:
%%sql
-- ANALYZE will execute the statement, and analyze performance
EXPLAIN ANALYZE SELECT
    r.id,
    f.id
FROM inms.readings r
JOIN flybys f
ON r.time_stamp::DATE = f.date;

 * postgresql://postgres:***@localhost:5432/enceladus
12 rows affected.


QUERY PLAN
Hash Join (cost=1.52..358353.90 rows=1593532 width=8) (actual time=794.183..14288.897 rows=13310039 loops=1)
Hash Cond: ((r.time_stamp)::date = f.date)
-> Seq Scan on readings r (cost=0.00..273133.04 rows=13856804 width=12) (actual time=0.948..10587.385 rows=13857176 loops=1)
-> Hash (cost=1.23..1.23 rows=23 width=8) (actual time=790.894..790.896 rows=23 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on flybys f (cost=0.00..1.23 rows=23 width=8) (actual time=790.779..790.786 rows=23 loops=1)
Planning Time: 0.318 ms
JIT:
Functions: 11
"Options: Inlining false, Optimization false, Expressions true, Deforming true"


In [25]:
%%sql
EXPLAIN SELECT
    r.id,
    f.id
FROM inms.readings r
JOIN flybys f
ON DATE_PART('year', time_stamp) = f.year
AND DATE_PART('week', time_stamp) = f.week;

 * postgresql://postgres:***@localhost:5432/enceladus
10 rows affected.


QUERY PLAN
Gather (cost=1001.58..266304.11 rows=7968 width=8)
Workers Planned: 2
-> Hash Join (cost=1.57..264507.31 rows=3320 width=8)
"Hash Cond: ((date_part('year'::text, r.time_stamp) = (f.year)::double precision) AND (date_part('week'::text, r.time_stamp) = (f.week)::double precision))"
-> Parallel Seq Scan on readings r (cost=0.00..192301.68 rows=5773668 width=12)
-> Hash (cost=1.23..1.23 rows=23 width=12)
-> Seq Scan on flybys f (cost=0.00..1.23 rows=23 width=12)
JIT:
Functions: 14
"Options: Inlining false, Optimization false, Expressions true, Deforming true"


In [26]:
%%sql
EXPLAIN ANALYZE SELECT
    r.id,
    f.id
FROM inms.readings r
JOIN flybys f
ON DATE_PART('year', time_stamp) = f.year
AND DATE_PART('week', time_stamp) = f.week;

 * postgresql://postgres:***@localhost:5432/enceladus
15 rows affected.


QUERY PLAN
Gather (cost=1001.58..266304.11 rows=7968 width=8) (actual time=66.651..5979.141 rows=13857176 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Hash Join (cost=1.57..264507.31 rows=3320 width=8) (actual time=51.392..4376.661 rows=4619059 loops=3)
"Hash Cond: ((date_part('year'::text, r.time_stamp) = (f.year)::double precision) AND (date_part('week'::text, r.time_stamp) = (f.week)::double precision))"
-> Parallel Seq Scan on readings r (cost=0.00..192301.68 rows=5773668 width=12) (actual time=0.656..684.204 rows=4619059 loops=3)
-> Hash (cost=1.23..1.23 rows=23 width=12) (actual time=49.403..49.403 rows=23 loops=3)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on flybys f (cost=0.00..1.23 rows=23 width=12) (actual time=49.163..49.190 rows=23 loops=3)
Planning Time: 0.205 ms


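The year/week join ran roughly 2.4x faster (about 6.0 s vs 14.3 s), helped by the parallel scan. The two joins are not equivalent, though: the date join matched 13,310,039 rows, while the year/week join matched all 13,857,176 rows in `readings`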
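
The two join conditions do not select the same rows: matching on year and week also accepts readings taken on other days of the flyby's week. A small Python sketch of the difference, assuming `flybys.week` holds the ISO week number (which is what `DATE_PART('week', ...)` returns in postgres); the reading timestamp is made up:

```python
from datetime import date, datetime


def same_date(ts: datetime, flyby_date: date) -> bool:
    """Mirror of: r.time_stamp::date = f.date"""
    return ts.date() == flyby_date


def same_year_week(ts: datetime, year: int, week: int) -> bool:
    """Mirror of: DATE_PART('year', ts) = f.year AND DATE_PART('week', ts) = f.week"""
    # isocalendar()[1] is the ISO 8601 week number; ts.year is the calendar year.
    return ts.year == year and ts.isocalendar()[1] == week


flyby_date = date(2008, 3, 12)               # E-3 flyby date, from the notebook output
flyby_week = flyby_date.isocalendar()[1]
reading = datetime(2008, 3, 10, 8, 0, 0)     # hypothetical reading two days earlier

matches_date = same_date(reading, flyby_date)
matches_week = same_year_week(reading, flyby_date.year, flyby_week)
print(matches_date, matches_week)  # prints: False True
```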

In [36]:
%%sql
with dups as (
    SELECT
        f.id flybyid,
        r.id readid,
        ROW_NUMBER() OVER (PARTITION BY f.id, r.id) AS dup_count
    FROM inms.readings r
    INNER JOIN flybys f
    ON DATE_PART('year', time_stamp) = f.year
    AND DATE_PART('week', time_stamp) = f.week
)
SELECT * FROM dups
WHERE dup_count > 1
ORDER BY dup_count
LIMIT 5;

 * postgresql://postgres:***@localhost:5432/enceladus
0 rows affected.


flybyid,readid,dup_count


In [47]:
%%sql
INSERT INTO flyby_readings(reading_id, flyby_id)
SELECT
    r.id,
    f.id
    --ROW_NUMBER() OVER (PARTITION BY f.id, r.id) AS dup_count
FROM flybys f
INNER JOIN inms.readings r
ON DATE_PART('year', r.time_stamp) = f.year
AND DATE_PART('week', r.time_stamp) = f.week;

 * postgresql://postgres:***@localhost:5432/enceladus
13857176 rows affected.


[]

List all constraints on a specific table, for troubleshooting:

In [67]:
%%sql
SELECT con.*
       FROM pg_catalog.pg_constraint con
            INNER JOIN pg_catalog.pg_class rel
                       ON rel.oid = con.conrelid
            INNER JOIN pg_catalog.pg_namespace nsp
                       ON nsp.oid = connamespace
       WHERE nsp.nspname = 'public'
             AND rel.relname = 'flybys';


 * postgresql://postgres:***@localhost:5432/enceladus
1 rows affected.


oid,conname,connamespace,contype,condeferrable,condeferred,convalidated,conrelid,contypid,conindid,conparentid,confrelid,confupdtype,confdeltype,confmatchtype,conislocal,coninhcount,connoinherit,conkey,confkey,conpfeqop,conppeqop,conffeqop,confdelsetcols,conexclop,conbin
33501,flybys_2_pkey,2200,p,False,False,True,33495,0,33500,0,0,,,,True,0,True,[1],,,,,,,


### Indexes vs Joins

Even though a junction table is good design, its cost needs to be considered; the added cost of storing a 13 million row table created from JOINs may not be justified by the added convenience, especially if *indexing* can already solve the problem of query performance

By indexing `time_stamp`, we would achieve the original goal of faster and easier querying. Index the field we want to join on.

### Timestamps and Indexes

TS to consider:

1. `inms.readings.time_stamp`
2. `flybys.window_start`
3. `flybys.window_end`

Different approaches to structuring and querying the data

### Simple BTREE index

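A B-tree index is a balanced, multi-way search tree (not a binary tree): keys are kept in sorted order and every leaf sits at the same depth, so equality and range lookups take logarithmic time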

```sql
CREATE INDEX CONCURRENTLY idx_ts 
ON inms.readings(time_stamp)
WHERE altitude IS NOT NULL;
```

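This lets postgres do the heavy lifting, at the price of extra storage. A partial index (the `WHERE` clause) restricts indexing to only the rows we're interested in, which keeps it smaller.

Building an index over 13M rows takes a while, and a plain `CREATE INDEX` blocks writes to the table until it finishes. Adding `CONCURRENTLY` avoids that lock at the cost of a slower build, and it cannot run inside a transaction block. This feature is usually available only in enterprise versions of other databases.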
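
Why an ordered index helps with range queries can be sketched with a sorted list and binary search. This is only an analogy for how a B-tree locates the start and end of a range, not how postgres stores its index pages; the keys are made up:

```python
from bisect import bisect_left, bisect_right

# Stand-in for indexed time_stamp values, kept in sorted order.
sorted_keys = [5, 12, 19, 23, 31, 40, 47, 58]


def range_scan(keys: list, low: int, high: int) -> list:
    """Return keys with low < key < high using two binary searches."""
    start = bisect_right(keys, low)   # first position strictly above low
    end = bisect_left(keys, high)     # first position at or above high
    return keys[start:end]


def seq_scan(keys: list, low: int, high: int) -> list:
    """Same result, but every key is inspected."""
    return [k for k in keys if low < k < high]


indexed = range_scan(sorted_keys, 12, 40)
scanned = seq_scan(sorted_keys, 12, 40)
print(indexed)  # [19, 23, 31]
```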

In [50]:
%%sql
CREATE INDEX idx_ts 
ON inms.readings(time_stamp)
WHERE altitude IS NOT NULL;

 * postgresql://postgres:***@localhost:5432/enceladus
Done.


[]

Try querying using the newly indexed time_stamp field

In [79]:
%%sql
EXPLAIN ANALYZE SELECT * FROM inms.readings
WHERE time_stamp > '2015-12-19 17:48:55.275'
and time_stamp < '2015-12-19 17:49:35.275' 
and altitude is not null;

 * postgresql://postgres:***@localhost:5432/enceladus
4 rows affected.


QUERY PLAN
Index Scan using idx_ts on readings (cost=0.43..25.45 rows=451 width=55) (actual time=3.157..6.628 rows=1175 loops=1)
Index Cond: ((time_stamp > '2015-12-19 17:48:55.275'::timestamp without time zone) AND (time_stamp < '2015-12-19 17:49:35.275'::timestamp without time zone))
Planning Time: 0.978 ms
Execution Time: 6.706 ms


In [52]:
%%sql
SELECT * FROM inms.readings
WHERE time_stamp > '2015-12-19 17:48:55.275'
and time_stamp < '2015-12-19 17:49:35.275' 
and altitude is not null;

 * postgresql://postgres:***@localhost:5432/enceladus
1175 rows affected.


time_stamp,source,mass_table,altitude,mass_per_charge,p_energy,relative_speed,high_counts,low_counts,id,flyby_id
2015-12-19 17:48:55.307000,csn,16,5003.94,28.0,70.228,9.54,1,0,13616140,
2015-12-19 17:48:55.341000,csn,16,5003.93,28.0,70.228,9.54,2,0,13616141,
2015-12-19 17:48:55.375000,csn,16,5003.91,2.0,5.016,9.54,2,0,13616142,
2015-12-19 17:48:55.409000,csn,16,5003.9,44.0,110.359,9.54,0,0,13616143,
2015-12-19 17:48:55.443000,csn,16,5003.89,14.0,35.114,9.54,0,0,13616144,
2015-12-19 17:48:55.477000,csn,16,5003.88,15.0,37.622,9.54,0,0,13616145,
2015-12-19 17:48:55.511000,csn,16,5003.87,28.0,70.228,9.54,2,0,13616146,
2015-12-19 17:48:55.545000,csn,16,5003.85,16.0,40.13,9.54,0,0,13616147,
2015-12-19 17:48:55.579000,csn,16,5003.84,17.0,42.639,9.54,0,0,13616148,
2015-12-19 17:48:55.614000,csn,16,5003.83,18.0,45.147,9.54,0,0,13616149,


With timestamp indexed, join the `flybys` table with `window_start` and `window_end`

In [60]:
%%sql
SELECT
    name,
    mass_per_charge,
    time_stamp,
    inms.readings.altitude
FROM inms.readings
INNER JOIN flybys
ON time_stamp >= window_start::timestamp
AND time_stamp <= window_End::timestamp
WHERE flybys.id = 4
limit 10;

 * postgresql://postgres:***@localhost:5432/enceladus
(psycopg2.errors.UndefinedColumn) column "name" does not exist
LINE 2:     name,
            ^
HINT:  Perhaps you meant to reference the column "flybys.date".

[SQL: SELECT
    name,
    mass_per_charge,
    time_stamp,
    inms.readings.altitude
FROM inms.readings
INNER JOIN flybys
ON time_stamp >= window_start
AND time_stamp <= window_End
WHERE flybys.id = 4
limit 10;]
(Background on this error at: https://sqlalche.me/e/20/f405)


Missed `flybys.name`...

Use `flybys.id` to build it

In [71]:
%%sql
ALTER TABLE flybys
DROP COLUMN IF EXISTS name;

ALTER TABLE flybys
ADD name TEXT;

UPDATE flybys
SET name = CONCAT('E-', id - 1);

 * postgresql://postgres:***@localhost:5432/enceladus
Done.
Done.
23 rows affected.


[]

In [89]:
%%sql
EXPLAIN ANALYZE SELECT
    f.name,
    r.mass_per_charge,
    r.time_stamp,
    r.altitude
FROM inms.readings r
INNER JOIN flybys f
ON r.time_stamp >= f.window_start
AND r.time_stamp <= f.window_end
WHERE f.id = 4;

 * postgresql://postgres:***@localhost:5432/enceladus
13 rows affected.


QUERY PLAN
Nested Loop (cost=0.00..480995.69 rows=1539686 width=53) (actual time=257.263..1314.147 rows=1175 loops=1)
Join Filter: ((r.time_stamp >= f.window_start) AND (r.time_stamp <= f.window_end))
Rows Removed by Join Filter: 13856001
-> Seq Scan on flybys f (cost=0.00..1.29 rows=1 width=48) (actual time=18.254..18.259 rows=1 loops=1)
Filter: (id = 4)
Rows Removed by Filter: 22
-> Seq Scan on readings r (cost=0.00..273136.76 rows=13857176 width=21) (actual time=0.019..758.526 rows=13857176 loops=1)
Planning Time: 0.092 ms
JIT:
Functions: 8


Not seeing `index scan`...search through metadata on indexes

In [78]:
%%sql
SELECT * FROM pg_indexes
WHERE tablename = 'readings';

 * postgresql://postgres:***@localhost:5432/enceladus
2 rows affected.


schemaname,tablename,indexname,tablespace,indexdef
inms,readings,idx_ts,,CREATE INDEX idx_ts ON inms.readings USING btree (time_stamp) WHERE (altitude IS NOT NULL)
inms,readings,readings_pkey,,CREATE UNIQUE INDEX readings_pkey ON inms.readings USING btree (id)


#### When postgres chooses sequential over index

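As a rule of thumb, once a query is expected to read more than a few percent of the table (~5%), a sequential scan beats the random IO of an index scan, so the engine may still choose it; when the query is looking for a small subset of the table, ~1%, an index scan should be faster. Here the planner expected 1,539,686 rows (about 11% of the table), because the window bounds come from `flybys` rather than from constants. In addition, `idx_ts` is a partial index (`WHERE altitude IS NOT NULL`), and a partial index is only considered when the query's own `WHERE` implies that predicate, which this join does not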
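
The gap between what the planner expected and what the query returned can be put in numbers, using the figures from the `EXPLAIN ANALYZE` output above. The estimate lands almost exactly on one ninth of the table, which looks like a default guess rather than an estimate based on statistics (that interpretation is an assumption):

```python
# Figures copied from the EXPLAIN ANALYZE output above.
total_rows = 13_857_176
estimated_rows = 1_539_686   # planner's estimate for the range join
actual_rows = 1_175          # rows the join really returned

estimated_fraction = estimated_rows / total_rows
actual_fraction = actual_rows / total_rows

print(f"estimated: {estimated_fraction:.2%} of the table")
print(f"actual:    {actual_fraction:.4%} of the table")
```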

In [91]:
%%sql
EXPLAIN ANALYZE SELECT
    f.name,
    r.mass_per_charge,
    r.time_stamp,
    r.altitude
FROM inms.readings r
INNER JOIN flybys f
ON r.time_stamp >= f.window_start
AND r.time_stamp <= f.window_end
WHERE f.id = 4


 * postgresql://postgres:***@localhost:5432/enceladus
13 rows affected.


QUERY PLAN
Nested Loop (cost=0.00..480995.69 rows=1539686 width=53) (actual time=232.840..1324.737 rows=1175 loops=1)
Join Filter: ((r.time_stamp >= f.window_start) AND (r.time_stamp <= f.window_end))
Rows Removed by Join Filter: 13856001
-> Seq Scan on flybys f (cost=0.00..1.29 rows=1 width=48) (actual time=10.823..10.826 rows=1 loops=1)
Filter: (id = 4)
Rows Removed by Filter: 22
-> Seq Scan on readings r (cost=0.00..273136.76 rows=13857176 width=21) (actual time=0.041..786.811 rows=13857176 loops=1)
Planning Time: 0.151 ms
JIT:
Functions: 8


## Ranges

Postgres supports numeric and datetime ranges. Instead of using `WHERE` and `<=/>=`, we can define a column, `analysis_window`:

In [92]:
%%sql
ALTER TABLE flybys
ADD analysis_window TSRANGE;

UPDATE flybys
SET analysis_window = tsrange(window_start, window_end, '[]'); 

 * postgresql://postgres:***@localhost:5432/enceladus
Done.
23 rows affected.


[]

The `'[]'` indicates inclusive bounds, vs `'()'` for exclusive bounds
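
The effect of the bound flags can be sketched in Python. `TsRange` is a hypothetical stand-in for postgres' `tsrange`, written only to show how inclusive and exclusive bounds change containment at the edges; the default of `'[)'` matches what `tsrange()` uses when the third argument is omitted:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TsRange:
    lower: datetime
    upper: datetime
    bounds: str = "[)"  # '[' or ']' means inclusive, '(' or ')' means exclusive

    def contains(self, ts: datetime) -> bool:
        """Rough equivalent of: range @> ts"""
        above = ts >= self.lower if self.bounds[0] == "[" else ts > self.lower
        below = ts <= self.upper if self.bounds[1] == "]" else ts < self.upper
        return above and below


# Window of the first flyby, taken from the notebook output.
start = datetime(2005, 2, 17, 3, 29, 52, 119000)
end = datetime(2005, 2, 17, 3, 30, 32, 119000)

closed_has_end = TsRange(start, end, "[]").contains(end)
open_has_end = TsRange(start, end, "()").contains(end)
print(closed_has_end, open_has_end)  # prints: True False
```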

In [93]:
%%sql
SELECT analysis_window FROM flybys LIMIT 5;

 * postgresql://postgres:***@localhost:5432/enceladus
5 rows affected.


analysis_window
"[2005-02-17 03:29:52.119000, 2005-02-17 03:30:32.119000]"
"[2005-03-09 09:07:43.098000, 2005-03-09 09:08:23.098000]"
"[2008-03-12 19:05:51.458000, 2008-03-12 19:06:31.458000]"
"[2008-10-09 19:06:19.605000, 2008-10-09 19:06:59.605000]"
"[2009-11-02 07:41:37.503000, 2009-11-02 07:42:17.503000]"


Use the range with special operators

In [94]:
%%sql
-- @> checks whether the given value is within the range
SELECT name FROM flybys
WHERE analysis_window @> '2005-02-17 03:30:12.119'::timestamp;

 * postgresql://postgres:***@localhost:5432/enceladus
1 rows affected.


name
E-0


Instead of `WHERE, <=, >=`, use `@>`:

In [99]:
%%sql
EXPLAIN ANALYZE
SELECT 
    f.name,
    r.mass_per_charge,
    r.time_stamp,
    r.altitude - f.nadir as distance
from inms.readings r
inner join flybys f 
on f.analysis_window @> r.time_stamp
where f.id = 4;

 * postgresql://postgres:***@localhost:5432/enceladus
13 rows affected.


QUERY PLAN
Nested Loop (cost=0.00..446388.68 rows=13857 width=77) (actual time=366.070..2119.223 rows=1175 loops=1)
Join Filter: (f.analysis_window @> r.time_stamp)
Rows Removed by Join Filter: 13856001
-> Seq Scan on flybys f (cost=0.00..2.58 rows=1 width=96) (actual time=8.104..8.110 rows=1 loops=1)
Filter: (id = 4)
Rows Removed by Filter: 22
-> Seq Scan on readings r (cost=0.00..273136.76 rows=13857176 width=21) (actual time=0.037..780.102 rows=13857176 loops=1)
Planning Time: 0.097 ms
JIT:
Functions: 8


### Index scan and BTREEs

`@>` cannot be used with a BTREE index; range operators are served by GiST (or SP-GiST) indexes on a range column. Who keeps track of this stuff. Can we use range values on INMS data? If `ip_start` and `ip_end`, the before and after timestamps which bound the tiny 30 ms reading duration, had existed, we could add a `tsrange` to `inms.readings`:

In [100]:
%%sql
ALTER TABLE inms.readings
ADD integration_period tsrange;

UPDATE inms.readings
SET integration_period = tsrange(ip_start, ip_end, '[]');

CREATE INDEX idx_ip_range ON inms.readings
USING GIST(integration_period);

 * postgresql://postgres:***@localhost:5432/enceladus
Done.
(psycopg2.errors.UndefinedColumn) column "ip_start" does not exist
LINE 2: SET integration_period = tsrange(ip_start, ip_end, '[]');
                                         ^

[SQL: UPDATE inms.readings
SET integration_period = tsrange(ip_start, ip_end, '[]');]
(Background on this error at: https://sqlalche.me/e/20/f405)


Ranges would then allow operations like

- overlap
- exclusion
- containment
- existence
- et al.

However, we're using a range to constrain results, not querying the range directly

## Back to the speeds

Now that we know the velocities are relative to target, retry the speed calculation from INMS data using the `pythag3`'d relative_speed

In [105]:
%%sql
SELECT
    f.id,
    f.speed,
    avg(r.relative_speed)::numeric(9,1)
FROM flybys f
INNER JOIN inms.readings r
ON r.time_stamp BETWEEN f.window_start and f.window_end
GROUP BY f.speed, f.id
ORDER BY f.id;

 * postgresql://postgres:***@localhost:5432/enceladus
23 rows affected.


id,speed,avg
1,,6.6
2,,6.6
3,8.2,8.2
4,14.4,14.4
5,17.7,17.7
6,17.7,17.7
7,17.7,17.7
8,7.7,7.7
9,7.7,7.7
10,6.5,6.5


Seems like a good fit, which means we now also have credible data for the first two missing flybys

## Chem_data

With credible flyby speeds and timestamps, we can relate `chem_data` to see what molecules were detected when, via `inms.readings.mass_per_charge` and `chem_data.peak`
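
Joining on `peak` can return more than one compound per reading, because several entries in `chem_data` may share the same peak value. A small sketch of that fan-out: the first five rows are copied from the `chem_data` preview at the top of the notebook, while the nitrogen row and the readings are assumptions added for illustration:

```python
from collections import defaultdict

chem_rows = [
    # (name, formula, peak) from the chem_data preview
    ("Water", "H2O", 18),
    ("Cyanogen", "C2N2", 52),
    ("Carbon Monoxide", "CO", 28),
    ("Carbon Monoxide", "CO(iso)", 29),
    ("Propyne", "C3H4", 40),
    # Assumed row for illustration: N2 has mass 28, the same as CO.
    ("Molecular Nitrogen", "N2", 28),
]

# Hypothetical readings: (label, mass_per_charge)
readings = [("t0", 28.0), ("t1", 18.0), ("t2", 3.0)]

# Index compounds by peak, like the hash side of a hash join.
by_peak = defaultdict(list)
for name, _formula, peak in chem_rows:
    by_peak[peak].append(name)

# Inner join: a reading appears once per matching compound, or not at all.
joined = [(label, name) for label, mpc in readings for name in by_peak.get(mpc, [])]
print(joined)
```

A reading at mass 28 shows up once for each compound with that peak, and a reading with no matching peak drops out of the result entirely.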

In [109]:
%%sql
SELECT
    f.name,
    r.time_stamp,
    r.altitude,
    c.name as chem
FROM inms.readings AS r
INNER JOIN flybys f
ON f.analysis_window @> r.time_stamp
INNER JOIN inms.chem_data c
ON r.mass_per_charge = c.peak
ORDER BY f.name, r.time_stamp
LIMIT 5;

 * postgresql://postgres:***@localhost:5432/enceladus
5 rows affected.


name,time_stamp,altitude,chem
E-0,2005-02-17 03:30:09.873000,1273.32,Molecular Hydrogen
E-0,2005-02-17 03:30:09.873000,1273.32,Molecular Hydrogen
E-0,2005-02-17 03:30:09.873000,1273.32,Molecular Hydrogen
E-0,2005-02-17 03:30:09.873000,1273.32,Molecular Hydrogen
E-0,2005-02-17 03:30:09.941000,1273.28,Helium


In [114]:
%%sql
SELECT
    f.name,
    r.time_stamp,
    r.altitude,
    c.name as chem
FROM inms.readings AS r
INNER JOIN flybys f
ON f.analysis_window @> r.time_stamp
INNER JOIN inms.chem_data c
ON r.mass_per_charge = c.peak
WHERE f.id = 4
ORDER BY f.name, r.time_stamp

 * postgresql://postgres:***@localhost:5432/enceladus
2049 rows affected.


name,time_stamp,altitude,chem
E-3,2008-03-12 19:05:51.521000,169.46,Acetylene
E-3,2008-03-12 19:05:51.521000,169.46,Acetylene
E-3,2008-03-12 19:05:51.521000,169.46,Acetylene
E-3,2008-03-12 19:05:51.521000,169.46,Acetylene
E-3,2008-03-12 19:05:51.555000,169.12,Hydrogen cyanide
E-3,2008-03-12 19:05:51.589000,168.77,Ethane
E-3,2008-03-12 19:05:51.589000,168.77,Ethane
E-3,2008-03-12 19:05:51.589000,168.77,Ethane
E-3,2008-03-12 19:05:51.589000,168.77,Ethane
E-3,2008-03-12 19:05:51.589000,168.77,Carbon Monoxide


## High/low sensitivity

Relative density of each compound can be deduced from `high_counts` and `low_counts` in `inms.readings`

In [119]:
%%sql
SELECT
    c.name,
    sum(r.high_counts) as high_counts,
    sum(r.low_counts) as low_counts
FROM flybys f
INNER JOIN inms.readings r
ON r.time_stamp BETWEEN f.window_start AND f.window_end
INNER JOIN inms.chem_data c
ON c.peak = r.mass_per_charge
WHERE f.id = 4
GROUP BY c.name, f.speed
ORDER BY high_counts DESC;

 * postgresql://postgres:***@localhost:5432/enceladus
23 rows affected.


name,high_counts,low_counts
Molecular Hydrogen,5904,16
Carbon Monoxide,2972,6
Ethane,2348,4
Ethylene,2348,4
Molecular Nitrogen,2348,4
Methane,816,16
Carbon Dioxide,440,4
Acetylene,348,12
Water,297,2
Methane (isotope),268,8


As is, the results are misleading. We need to include `source` as well, which informs us on the analyses and how they were performed

From manifest:

Ion source used for this measurement

- Open Source Ion (osi)
- Closed Source Neutral (csn)
- Open Source Neutral Beam (osnb)
- Open Source Neutral Thermal (osnt)

Closed source - non-reactive neutrals e.g. CH4, N2

Open - positive-ions

E-3 shows that H2 was found using closed source:

In [122]:
%%sql
SELECT
    c.name chem,
    r.source,
    sum(r.high_counts) as high_counts,
    sum(r.low_counts) as low_counts
FROM flybys f
INNER JOIN inms.readings r
ON r.time_stamp BETWEEN f.window_start AND f.window_end
INNER JOIN inms.chem_data c
ON c.peak = r.mass_per_charge
WHERE f.id = 4
GROUP BY c.name, r.source
ORDER BY high_counts DESC;

 * postgresql://postgres:***@localhost:5432/enceladus
42 rows affected.


chem,source,high_counts,low_counts
Molecular Hydrogen,csn,5892,8
Carbon Monoxide,csn,2944,5
Ethane,csn,2328,4
Molecular Nitrogen,csn,2328,4
Ethylene,csn,2328,4
Methane,csn,804,8
Carbon Dioxide,csn,436,0
Acetylene,csn,348,0
Water,csn,296,1
Methane (isotope),csn,252,0


From a recent paper, it has been posited that presence of H2 could be attributed to high-speed sampling of H2O-rich plume using the closed source method. E-21 flyby was used to try and detect H2 using the open source method.

So let's look at E-21

In [123]:
%%sql
SELECT
    c.name chem,
    r.source,
    sum(r.high_counts) as high_counts,
    sum(r.low_counts) as low_counts
FROM flybys f
INNER JOIN inms.readings r
ON r.time_stamp BETWEEN f.window_start AND f.window_end
INNER JOIN inms.chem_data c
ON c.peak = r.mass_per_charge
WHERE f.id = 22
AND r.source != 'csn'
GROUP BY c.name, r.source
ORDER BY high_counts DESC;

 * postgresql://postgres:***@localhost:5432/enceladus
8 rows affected.


chem,source,high_counts,low_counts
Water,osnb,34484,2
Molecular Hydrogen,osnb,3872,4
Carbon Dioxide,osnb,132,0
Propane,osnb,33,0
Carbon Monoxide,osnb,15,0
Molecular Nitrogen,osnb,12,0
Ethane,osnb,12,0
Ethylene,osnb,12,0


E-21 disproved fears that H2 was simply an artifact of closed source high-speed sampling. Plumes did indeed contain H2, a smoking gun indicator of active hydrothermal systems

What about other life necessities?

In [124]:
%%sql
SELECT
    f.name,
    c.name chem,
    r.source,
    sum(r.high_counts) as high_counts,
    sum(r.low_counts) as low_counts
FROM flybys f
INNER JOIN inms.readings r
ON r.time_stamp BETWEEN f.window_start AND f.window_end
INNER JOIN inms.chem_data c
ON c.peak = r.mass_per_charge
WHERE f.targeted = true
AND C.formula in ('H2', 'CH4', 'CO2', 'H2O')
GROUP BY f.id, f.name, c.name, r.source
ORDER BY f.id;


 * postgresql://postgres:***@localhost:5432/enceladus
46 rows affected.


name,chem,source,high_counts,low_counts
E-2,Molecular Hydrogen,csn,744,0
E-2,Water,csn,101,0
E-2,Methane,osi,4,0
E-2,Molecular Hydrogen,osi,4,4
E-2,Carbon Dioxide,osnb,0,0
E-2,Methane,osnb,0,0
E-2,Methane,csn,76,4
E-2,Water,osnb,1,0
E-2,Carbon Dioxide,csn,32,0
E-2,Carbon Dioxide,osi,0,0


Methanogenesis is how deep-water microbes convert H2 and CO2 to CH4 and water, forming the basis of deep-sea ecosystems. All the ingredients and products are found on Enceladus
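
Written out as a balanced equation, the conversion described above is (standard hydrogenotrophic methanogenesis, stated from general chemistry rather than from the INMS data):

$$
\mathrm{CO_2 + 4\,H_2 \rightarrow CH_4 + 2\,H_2O}
$$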