# Explanation

The HSC data is too large to store as a single SQLite database file on GitHub, so you need to fetch it yourself, separately from cloning the repository. This notebook is a work in progress to help automate that process and to make sure that the final schema is correct.

## SQL Query
In the future I'll write a script that queries and downloads the correct data. For now, you'll need to do it manually, by running several (~6) queries of the form:

    SELECT object_id,
           ra, dec,
           detect_is_patch_inner, detect_is_tract_inner,
           detect_is_primary,
           gcmodel_flux, gcmodel_flux_err,
           gcmodel_flux_flags,
           rcmodel_flux, rcmodel_flux_err,
           rcmodel_flux_flags,
           icmodel_flux, icmodel_flux_err,
           icmodel_flux_flags,
           zcmodel_flux, zcmodel_flux_err,
           zcmodel_flux_flags,
           ycmodel_flux, ycmodel_flux_err,
           ycmodel_flux_flags
    FROM pdr1_cosmos_widedepth_median.forced
    LIMIT X
    OFFSET Y
    
where you change `X` (in the `LIMIT` line) and `Y` (in the `OFFSET` line) so that each query returns a different subset of the rows. (For example, download 250,000 records at a time by using `LIMIT 250000 OFFSET 0`, then `LIMIT 250000 OFFSET 250000`, and so on.)
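The `LIMIT`/`OFFSET` pairs can be generated rather than typed by hand. The following is a minimal sketch, not part of the notebook's workflow: `paged_queries` and its parameters are hypothetical names, the column list is abbreviated for illustration, and the total row count (taken here from the shape of the combined table further down) is something you have to supply yourself.

```python
def paged_queries(base_query, total_rows, chunk_size=250000):
    """Yield one query per chunk, appending LIMIT/OFFSET to base_query."""
    for offset in range(0, total_rows, chunk_size):
        yield "{}\nLIMIT {}\nOFFSET {}".format(base_query, chunk_size, offset)

# Abbreviated column list, for illustration only; use the full SELECT in practice.
base_query = "SELECT object_id, ra, dec\nFROM pdr1_cosmos_widedepth_median.forced"

queries = list(paged_queries(base_query, total_rows=1263503))
print(len(queries))                    # 6
print(queries[1].splitlines()[-2:])    # ['LIMIT 250000', 'OFFSET 250000']
```

In general, SQL does not guarantee a stable row order without an `ORDER BY` clause, so it is worth checking the combined result for duplicated or missing `object_id` values; whether the HSC server returns rows in a consistent order is not something this notebook verifies.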

This will give you a number of partial SQLite files, which you should download into `data/partial_hsc_tables`. This notebook then combines those databases into a single table and writes it to a new SQLite file.
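Before combining, it can help to confirm that every partial file was downloaded completely. The sketch below uses only the standard library; `summarize_partial_files` is a hypothetical helper, and it assumes each partial file stores its rows in a table named `table_1` (the name the code below reads from).

```python
import glob
import sqlite3

def summarize_partial_files(pattern, table="table_1"):
    """Return {filename: row_count} for every SQLite file matching pattern."""
    counts = {}
    for filename in sorted(glob.glob(pattern)):
        conn = sqlite3.connect(filename)
        try:
            # Raises sqlite3.OperationalError if the table is missing.
            (n_rows,) = conn.execute(
                'SELECT COUNT(*) FROM "{}"'.format(table)).fetchone()
        finally:
            conn.close()
        counts[filename] = n_rows
    return counts

# Prints an empty dict if no partial files have been downloaded yet.
print(summarize_partial_files("data/partial_hsc_tables/*.sqlite3"))
```

With 250,000-row chunks, every file except the last should report 250,000 rows.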

## To do list:
- add code that will actually query + download the correct data, using the command line tool provided by the HSC data release website.

# Code

In [1]:
import glob
import pandas as pd

In [5]:
database_filenames = sorted(glob.glob("data/partial_hsc_tables/*.sqlite3"))

In [11]:
dfs = [pd.read_sql_table("table_1", "sqlite:///{}".format(database_filename),
                         index_col="object_id")
       for database_filename in database_filenames]

In [23]:
combined = pd.concat(dfs)
combined.head()

object_id,ra,dec,detect_is_patch_inner,detect_is_tract_inner,detect_is_primary,gcmodel_flux,gcmodel_flux_err,gcmodel_flux_flags,rcmodel_flux,rcmodel_flux_err,rcmodel_flux_flags,icmodel_flux,icmodel_flux_err,icmodel_flux_flags,zcmodel_flux,zcmodel_flux_err,zcmodel_flux_flags,ycmodel_flux,ycmodel_flux_err,ycmodel_flux_flags
43158034708430849,150.897479,1.688999,False,True,False,1.022866e-30,3.236877e-31,False,,,,3.268063e-30,3.96473e-31,False,1.879463e-31,5.945371000000001e-31,False,7.561983e-30,2.600942e-30,False
43158034708430850,150.898466,1.68969,False,True,False,2.6514959999999998e-30,3.508947e-31,False,,,,,,True,-6.070069e-31,5.435255e-31,False,5.880465e-30,2.467269e-30,False
43158034708430851,150.899683,1.691651,False,True,False,-1.5785530000000001e-31,3.3639900000000004e-31,False,,,,,,True,1.345349e-29,7.116031e-31,False,6.750262e-31,2.719863e-30,False
43158034708430852,150.900281,1.691985,False,True,False,,,True,,,,,,True,,,True,,,True
43158034708430853,150.903878,1.697209,True,True,True,,,True,,,,,,True,,,True,,,True


In [17]:
combined.shape

(1263503, 20)

In [19]:
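hsc_database_filename = "HSC_COSMOS_median_forced.sqlite3"
# Write the combined table as "hsc"; the object_id index is stored as a column.
# With pandas' default if_exists="fail", this raises if the table already exists.
combined.to_sql("hsc", "sqlite:///{}".format(hsc_database_filename))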

In [22]:
combined.keys()

Index(['ra', 'dec', 'detect_is_patch_inner', 'detect_is_tract_inner',
       'detect_is_primary', 'gcmodel_flux', 'gcmodel_flux_err',
       'gcmodel_flux_flags', 'rcmodel_flux', 'rcmodel_flux_err',
       'rcmodel_flux_flags', 'icmodel_flux', 'icmodel_flux_err',
       'icmodel_flux_flags', 'zcmodel_flux', 'zcmodel_flux_err',
       'zcmodel_flux_flags', 'ycmodel_flux', 'ycmodel_flux_err',
       'ycmodel_flux_flags'],
      dtype='object')
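The explanation at the top mentions making sure the final schema is correct. One possible check, sketched here with only the standard library, reads the column names, the row count, and the number of repeated IDs back from the written file. `check_hsc_table` is a hypothetical helper, not part of the notebook.

```python
import os
import sqlite3

def check_hsc_table(database_filename, table="hsc", id_column="object_id"):
    """Return (column_names, n_rows, n_repeated_ids) for the given table."""
    if not os.path.exists(database_filename):
        # sqlite3.connect would otherwise silently create an empty file.
        raise FileNotFoundError(database_filename)
    conn = sqlite3.connect(database_filename)
    try:
        # PRAGMA table_info yields one row per column; field 1 is the name.
        columns = [row[1] for row in
                   conn.execute('PRAGMA table_info("{}")'.format(table))]
        # COUNT(DISTINCT ...) skips NULLs, so NULL ids also count as "repeated".
        n_rows, n_distinct = conn.execute(
            'SELECT COUNT(*), COUNT(DISTINCT "{1}") FROM "{0}"'.format(
                table, id_column)).fetchone()
    finally:
        conn.close()
    return columns, n_rows, n_rows - n_distinct

# Only runs once the combined database has been written.
if os.path.exists("HSC_COSMOS_median_forced.sqlite3"):
    columns, n_rows, n_repeated = check_hsc_table("HSC_COSMOS_median_forced.sqlite3")
    print(len(columns), n_rows, n_repeated)
```

If the write above succeeded, one would expect 21 columns (`object_id` plus the 20 listed above) and 1,263,503 rows. A nonzero third value would suggest that the `LIMIT`/`OFFSET` pages overlapped.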