# Make sure learners are in the student_download directory

# Notebook Intro

* code and markdown
* shift+enter
* can execute multiple times and out of order
* add new cells
* restart kernel

# Query Language:

Astronomical Data Query Language (ADQL): a dialect of SQL (Structured Query Language)  
A programming language for communicating with a database

# Connecting to Gaia

In [3]:
import requests
import pyvo as vo

# Most significant differences:
* Each user must create an account and find their token
* getting at Table metadata is different and not as informative
* running a job and getting the table is a different command
* ADQL shapes must have a first entry of 'ICRS'
* Default "units" in the table contain both the definition and the units, so we have to fix them manually (I've written a function)
* We won't have access to the PanSTARRS photometry table for the final exercise in the JOIN section.
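As a rough illustration of the 'ICRS' point above, here is a minimal sketch of building ADQL shape expressions with the coordinate frame as the first argument. The helper names `adql_circle` and `cone_search_where` are hypothetical (they are not part of pyvo or of these notes), and the column names `ra` and `dec` are assumed to match the table being queried.

```python
def adql_circle(ra_deg, dec_deg, radius_deg, frame="ICRS"):
    """Return an ADQL CIRCLE expression with the frame as its first entry."""
    return f"CIRCLE('{frame}', {ra_deg}, {dec_deg}, {radius_deg})"


def cone_search_where(ra_deg, dec_deg, radius_deg):
    """Return a WHERE condition selecting rows inside a circle on the sky."""
    # Assumption: the table stores positions in columns named ra and dec.
    point = "POINT('ICRS', ra, dec)"
    circle = adql_circle(ra_deg, dec_deg, radius_deg)
    return f"1 = CONTAINS({point}, {circle})"


print(cone_search_where(88.8, 7.4, 0.08))
```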

Create account: https://gaia.aip.de/accounts/signup/
Verify email with confirmation email
Your Name --> API token

In [4]:
url="http://TAPVizieR.u-strasbg.fr/TAPVizieR/tap"
tap_session = requests.Session()
tap_service = vo.dal.TAPService(url, session=tap_session)

# Databases and Tables

Database: collection of one or more named tables with one or more named columns
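To make the definition concrete without a network connection, the sketch below uses Python's built-in `sqlite3` as a stand-in for a remote database. The table and column names are made up for illustration. SQLite keeps its list of tables in `sqlite_master`, which plays roughly the role that `tap_schema.tables` plays on a TAP service.

```python
import sqlite3

# An in-memory database with two named tables, each with named columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stars (source_id INTEGER, ra REAL, dec REAL, parallax REAL)"
)
conn.execute("CREATE TABLE photometry (source_id INTEGER, g_mag REAL)")

# List the tables in the database.
table_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# List the columns of one table; the second field of each row is the name.
column_names = [row[1] for row in conn.execute("PRAGMA table_info(stars)")]

print(table_names)
print(column_names)
```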

### This is the basic structure of all SQL queries

In [6]:
table_query = "SELECT TOP 10 table_name FROM tap_schema.tables"

SELECT only reads data (it does not add, modify, or delete anything)

In [7]:
table_result = tap_service.run_sync(table_query)
table_table = table_result.to_table()
table_table

table_name
object
J/A+A/622/A164/tableb4
J/MNRAS/406/460/spectra
J/AJ/135/10/table5
J/A+A/591/A129/tablea5
J/ApJS/230/28/obs
J/ApJ/822/81/table5
J/ApJS/194/25/table3
J/AJ/158/141/table5
J/MNRAS/405/1930/table5
J/A+A/598/A62/tablea1


In [8]:
type(table_table)

astropy.table.table.Table

A database table is like this Astropy `Table`, except that it is persistent (stored on disk)

In [5]:
description_query = "SELECT * FROM tap_schema.tables WHERE table_name='I/345/gaia2'"

In [6]:
description_result= tap_service.run_sync(description_query)
description_table = description_result.to_table()
description_table

schema_name,table_name,table_type,description,utype
object,object,object,object,object
large_tables,I/345/gaia2,table,GaiaSource DR2 data ( Gaia collaboration),


# Columns

In [7]:
column_query = "select column_name, description, unit from tap_schema.columns where table_name = 'I/345/gaia2'"

In [8]:
column_table = tap_service.run_sync(column_query)
column_table.to_table().show_in_notebook()

idx,column_name,description,unit
0,designation,Unique source designation (unique across all Data Releases) (Gaia DR2 NNNNNNNNNNNNNNNNNNN) (designation) (1),
1,ra,Barycentric right ascension (ICRS) at Ep=2015.5 (ra),deg
2,ra_error,Standard error of right ascension (e_RA*cosDE) (ra_error),mas
3,dec,Barycentric declination (ICRS) at Ep=2015.5 (dec),deg
4,dec_error,Standard error of declination (dec_error),mas
5,solution_id,Solution Identifier (solution_id) (G1),
6,source_id,Unique source identifier (unique within a particular Data Release) (source_id) (G2),
7,random_index,Random index used to select subsets (random_index) (2),
8,ref_epoch,[2015.5] Reference epoch (ref_epoch),yr
9,parallax,? Absolute stellar parallax (parallax),mas


In [9]:
print(len(column_table))

101


## Exercise: 
Choose a VizieR table from the website and get its description (search for your favorite catalog here: https://vizier.cds.unistra.fr/viz-bin/VizieR)

In [10]:
# Solution
description_query_sternberg = "SELECT * FROM tap_schema.tables WHERE table_name='II/256/sn'"
description_result_sternberg= tap_service.run_sync(description_query_sternberg)
description_table_sternberg = description_result_sternberg.to_table()
description_table_sternberg

schema_name,table_name,table_type,description,utype
object,object,object,object,object
II_photometry,II/256/sn,table,"The Catalog ( Tsvetkov D.Yu., Pavlyuk N.N., Bartunov O.S.)",


In [11]:
query1 = """SELECT
TOP 10 
source_id, ra, dec, parallax
FROM \"I/345/gaia2\""""
print(query1)

SELECT
TOP 10 
source_id, ra, dec, parallax
FROM "I/345/gaia2"


In [12]:
job1 = tap_service.run_sync(query1)

In [13]:
job1.infos

{'QUERY_STATUS': 'OK',
 'PROVIDER': 'CDS',
 'QUERY': 'SELECT TOP 10  source_id, ra, dec, parallax FROM "I/345/gaia2"'}

In [14]:
job1.fielddescs

[<FIELD ID="source_id" datatype="long" name="source_id" ucd="meta.id;meta.main"/>,
 <FIELD ID="ra" datatype="double" name="ra" ref="coosys_gaiadr2" ucd="pos.eq.ra;meta.main" unit="deg"/>,
 <FIELD ID="dec" datatype="double" name="dec" ref="coosys_gaiadr2" ucd="pos.eq.dec;meta.main" unit="deg"/>,
 <FIELD ID="parallax" datatype="double" name="parallax" ucd="pos.parallax" unit="mas"/>]

In [15]:
results1 = job1.to_table()

In [17]:
results1

source_id,ra,dec,parallax
,deg,deg,mas
int64,float64,float64,float64
135471519149587456,44.27783663006,31.3806184472,0.0522
135471514853916416,44.27234588784,31.3847972548,-0.1367
135471549213655808,44.27692696596,31.3862181075,0.9698
135471583573391232,44.26480540106,31.38265980594,0.4449
135471617933137024,44.27834355418,31.39338662446,0.4264
135471622228946048,44.2796614331,31.39990398464,1.1557
135471652292864384,44.22971318907,31.37532060425,-0.1837
135471652292868480,44.23210249423,31.38210552696,0.0555
135471690948280192,44.24072358824,31.38214601813,0.8395
135471686652930944,44.24875057151,31.38817876092,1.4454


Notice these columns have units when appropriate

# Exercise
Read the documentation (https://gea.esac.esa.int/archive/documentation/GDR2/Gaia_archive/chap_datamodel/sec_dm_main_tables/ssec_dm_gaia_source.html) of this table and choose a column that looks interesting to you. Add the column name to the query and run it again. What are the units of the column you selected? What is its data type?

# Solution
For example, we can add radial_velocity: Radial velocity (double, Velocity[km/s]) - Spectroscopic radial velocity in the solar barycentric reference frame. The radial velocity provided is the median value of the radial velocity measurements at all epochs.  
query1_with_rv = """SELECT  
TOP 10  
source_id, ra, dec, parallax, radial_velocity  
FROM \"I/345/gaia2\"  
"""  

# Asynchronous queries

* sync limited to 2000 rows
* can use count to figure out how many rows will be returned
* async results are stored on the server for a few days
* first use of WHERE: only download the rows you need
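The idea of counting rows before downloading them can be sketched as a small query builder. The helper name `count_query` and the alias `n_rows` are hypothetical. Counting over a very large table may itself be slow, so this is a suggestion to adapt rather than a recipe.

```python
def count_query(table, where=None):
    """Build an ADQL query that counts rows instead of returning them."""
    # The table name is double-quoted because VizieR names contain slashes.
    query = f'SELECT COUNT(*) AS n_rows FROM "{table}"'
    if where:
        query += f" WHERE {where}"
    return query


print(count_query("I/345/gaia2", "parallax < 1"))
```

The resulting string can be passed to `tap_service.run_sync` like any other query.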

In [18]:
query2 = """SELECT
TOP 3000
source_id, ra, dec, pmra, pmdec, parallax
FROM \"I/345/gaia2\"
WHERE parallax <1"""

In [19]:
job2 = tap_service.run_async(query2)

In [20]:
job2.infos

{'QUERY_STATUS': 'OK',
 'PROVIDER': 'CDS',
 'QUERY': 'SELECT TOP 3000 source_id, ra, dec, pmra, pmdec, parallax FROM "I/345/gaia2" WHERE parallax <1'}

In [21]:
job2.fielddescs

[<FIELD ID="source_id" datatype="long" name="source_id" ucd="meta.id;meta.main"/>,
 <FIELD ID="ra" datatype="double" name="ra" ref="coosys_gaiadr2" ucd="pos.eq.ra;meta.main" unit="deg"/>,
 <FIELD ID="dec" datatype="double" name="dec" ref="coosys_gaiadr2" ucd="pos.eq.dec;meta.main" unit="deg"/>,
 <FIELD ID="pmra" datatype="double" name="pmra" ref="coosys_gaiadr2" ucd="pos.pm;pos.eq.ra" unit="mas / yr"/>,
 <FIELD ID="pmdec" datatype="double" name="pmdec" ref="coosys_gaiadr2" ucd="pos.pm;pos.eq.dec" unit="mas / yr"/>,
 <FIELD ID="parallax" datatype="double" name="parallax" ucd="pos.parallax" unit="mas"/>]

In [22]:
results2 = job2.to_table()
results2

source_id,ra,dec,pmra,pmdec,parallax
,deg,deg,mas / yr,mas / yr,mas
int64,float64,float64,float64,float64,float64
4090728411324689792,274.00370353498,-21.90304545436,-380.708,640.299,-1856.5756
4052499285375616384,273.22350731748,-27.04011955991,1814.623,132.266,-1786.9964
4059697925504813440,262.07178056495,-28.34443767611,476.243,350.167,-1706.6966
4089303169338901632,275.61123203432,-24.03645016946,647.939,-1389.615,-1621.1692
4049954706219787776,270.76007637661,-30.7254521201,1188.279,-1035.576,-1511.6861
4065007295292484736,272.68112514411,-25.21471802626,1074.65,-742.094,-1481.1277
...,...,...,...,...,...
2021623892004471168,294.97897383293,25.00535849305,16.679,5.743,-44.7506
405363140861201408,25.9716800283,48.75690176266,-39.99,-56.917,-44.7427
5865116393503408384,200.36931291244,-63.87568569469,18.941,70.6,-44.7365
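`run_async` submits the job, waits, and fetches the result in one call. For longer queries it may help to see the individual steps. The sketch below assumes pyvo's asynchronous job interface (`submit_job`, `run`, `wait`, `raise_if_error`, `fetch_result`, `delete`); the wrapper name `fetch_async` is hypothetical, and the exact behaviour should be checked against the installed pyvo version.

```python
def fetch_async(service, query, timeout=600.0):
    """Run a TAP query as an explicit asynchronous job and return a table.

    `service` is expected to behave like a pyvo TAPService.
    """
    job = service.submit_job(query)  # create the job on the server
    try:
        job.run()  # start executing it
        # Block until the job reaches one of these phases (or times out).
        job.wait(phases=["COMPLETED", "ERROR", "ABORTED"], timeout=timeout)
        job.raise_if_error()  # raise an exception if the job failed
        return job.fetch_result().to_table()
    finally:
        job.delete()  # remove the job and its stored results from the server
```

With the session above, this would be called as `fetch_async(tap_service, query2)`.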


### Exercise

The clauses in a query have to be in the right order. Go back and change the order of the clauses in `query2` and run it again. The modified query should fail, but notice that you don’t get much useful debugging information.

For this reason, developing and debugging ADQL queries can be really hard. A few suggestions that might help:
* Whenever possible, start with a working query, either an example you find online or a query you have used in the past.
* Make small changes and test each change before you continue.
* While you are debugging, use TOP to limit the number of rows in the result. That will make each test run faster, which reduces your development time.
* Launching test queries synchronously might make them start faster, too.
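Because the service gives little debugging information, a rough local check of clause order before submitting can save a round trip. The sketch below is a heuristic, not an ADQL parser: the name `clauses_in_order` is hypothetical, and it will be fooled by subqueries or by keywords inside string literals.

```python
import re

# Top-level clauses in the order they must appear in a query.
CLAUSE_ORDER = ["SELECT", "FROM", "WHERE", "GROUP BY", "HAVING", "ORDER BY"]


def clauses_in_order(query):
    """Return True if the clauses that are present appear in the usual order."""
    positions = []
    for clause in CLAUSE_ORDER:
        # Allow any whitespace between the words of two-word clauses.
        pattern = r"\b" + clause.replace(" ", r"\s+") + r"\b"
        match = re.search(pattern, query, re.IGNORECASE)
        if match:
            positions.append(match.start())
    return positions == sorted(positions)


good = 'SELECT TOP 10 source_id FROM "I/345/gaia2" WHERE parallax < 1'
bad = 'SELECT TOP 10 WHERE parallax < 1 source_id FROM "I/345/gaia2"'
print(clauses_in_order(good), clauses_in_order(bad))
```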

In [23]:
query2_erroneous = """SELECT 
TOP 3000
WHERE parallax < 1
source_id, ref_epoch, ra, dec, parallax
FROM \"I/345/gaia2\"
"""

In [24]:
job2_err = tap_service.run_async(query2_erroneous)



DALQueryError: Query Error

`WHERE` operators:
* \>, <, >=, <=, =, != or <>
* AND / OR
* NOT: invert comparison results


Read about SQL operators here (https://www.w3schools.com/sql/sql_operators.asp) and then modify the previous query to select rows where bp_rp is between -0.75 and 2.

In [11]:
#Solution
query2_sol1 = """SELECT 
TOP 10
source_id, ref_epoch, ra, dec, parallax
FROM \"I/345/gaia2\"
WHERE parallax < 1 
  AND bp_rp > -0.75 AND bp_rp < 2
"""

# or, using BETWEEN

query2_sol2 = """SELECT 
TOP 10
source_id, ref_epoch, ra, dec, parallax
FROM \"I/345/gaia2\"
WHERE parallax < 1 
  AND bp_rp BETWEEN -0.75 AND 2
"""
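The two solutions differ slightly at the boundaries: `BETWEEN` is inclusive, while `>` and `<` are strict. The sketch below shows this with Python's built-in `sqlite3` as a local stand-in for the remote service; the table `colors` and its four values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE colors (bp_rp REAL)")
# Two values sit exactly on the boundaries, one inside, one outside.
conn.executemany(
    "INSERT INTO colors VALUES (?)", [(-0.75,), (0.5,), (2.0,), (2.5,)]
)


def count_rows(condition):
    """Count the rows of the toy table that satisfy a WHERE condition."""
    sql = f"SELECT COUNT(*) FROM colors WHERE {condition}"
    return conn.execute(sql).fetchone()[0]


strict = count_rows("bp_rp > -0.75 AND bp_rp < 2")
inclusive = count_rows("bp_rp BETWEEN -0.75 AND 2")
explicit = count_rows("bp_rp >= -0.75 AND bp_rp <= 2")
print(strict, inclusive, explicit)
```

For real-valued colors the difference rarely matters in practice, but it is worth knowing when the bounds are round numbers.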

Stars with bp_rp > 2 (red, low-luminosity stars) at GD-1's distance would be too faint to detect, so any we do detect are likely in the foreground --> this cut eliminates foreground stars

# Formatting queries

In [5]:
columns = 'source_id, ra, dec, pmra, pmdec, parallax'

In [6]:
query3_base = """SELECT
TOP 10
{columns}
FROM \"I/345/gaia2\"
WHERE parallax < 1
AND bp_rp*1 BETWEEN -0.75 AND 2
"""

In [7]:
query3 = query3_base.format(columns=columns)

In [8]:
query3

'SELECT\nTOP 10\nsource_id, ra, dec, pmra, pmdec, parallax\nFROM "I/345/gaia2"\nWHERE parallax < 1\nAND bp_rp*1 BETWEEN -0.75 AND 2\n'

In [9]:
print(query3)

SELECT
TOP 10
source_id, ra, dec, pmra, pmdec, parallax
FROM "I/345/gaia2"
WHERE parallax < 1
AND bp_rp*1 BETWEEN -0.75 AND 2



In [10]:
job3 = tap_service.run_sync(query3)

In [11]:
job3.infos

{'QUERY_STATUS': 'OK',
 'PROVIDER': 'CDS',
 'QUERY': 'SELECT TOP 10 source_id, ra, dec, pmra, pmdec, parallax FROM "I/345/gaia2" WHERE parallax < 1 AND bp_rp*1 BETWEEN -0.75 AND 2 '}

In [12]:
results3 = job3.to_table()

In [13]:
results3

source_id,ra,dec,pmra,pmdec,parallax
,deg,deg,mas / yr,mas / yr,mas
int64,float64,float64,float64,float64,float64
4065007295292484736,272.68112514411,-25.21471802626,1074.65,-742.094,-1481.1277
4089995896030234624,277.47709019321,-22.31726676663,176.148,-75.61,-1303.7501
4103265386511561472,278.9009174759,-14.98276694189,159.992,-126.24,-1212.0058
4064704104946409728,273.98353695245,-25.76445838489,863.318,-573.93,-1170.3647
4089352058964259328,275.37301547876,-24.03866349786,1650.896,1364.132,-1142.5851
234052082432242176,63.56813620225,46.75409674382,-486.709,349.81,-1051.0301
6727658835736018304,273.4136374329,-38.37560146276,-61.859,-65.886,-1041.7257
4050001298041477248,271.46203146129,-30.07578497878,-1698.109,-1425.887,-932.8031
4197974878501154560,288.5558728788,-12.58188071755,259.481,74.743,-915.859
4052429427794816000,273.47107023179,-27.37362948985,632.087,-426.837,-854.2021


### Exercise
This query always selects sources with parallax less than 1. But suppose you want to take that upper bound as an input.
Modify `query3_base` to replace `1` with a format specifier like `{max_parallax}`. Now, when you call `format`, add a keyword argument that assigns a value to max_parallax, and confirm that the format specifier gets replaced with the value you provide.

In [None]:
#Solution
query_base_sol = """SELECT 
TOP 10
{columns}
FROM \"I/345/gaia2\"
WHERE parallax < {max_parallax} AND 
bp_rp*1 BETWEEN -0.75 AND 2
"""

query_sol = query_base_sol.format(columns=columns,
                          max_parallax=0.5)
print(query_sol)
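Going one step further, the template and the call to `format` can be wrapped in a function so that each input has a name and, where sensible, a default. This is only a sketch: `build_query` and the `{top}` specifier are hypothetical additions, not part of the lesson. It also shows that `format` raises `KeyError` when a specifier is left without a value.

```python
query_base = """SELECT
TOP {top}
{columns}
FROM "I/345/gaia2"
WHERE parallax < {max_parallax}
"""


def build_query(columns, max_parallax, top=10):
    """Fill the template from a list of column names and a parallax bound."""
    return query_base.format(
        columns=", ".join(columns), max_parallax=max_parallax, top=top
    )


query = build_query(["source_id", "ra", "dec"], max_parallax=0.5)
print(query)

# Leaving out a value for a specifier is an error rather than a silent gap.
try:
    query_base.format(columns="source_id")
except KeyError as err:
    print("missing value for specifier:", err)
```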