# Store Data in SQL

In [49]:
import pandas as pd
import numpy as np
import sys

from os import environ
from sqlalchemy import create_engine

# Add the parent directory to the import path so the project's `util` package can be found.
sys.path.append("..")

from util.parsing import get_variables

## Find the CSV Files and Variables of Interest

### How to Specify CSVs and Variables


The ```.csv``` files, their variables of interest, and the new names for these variables are specified in [```data/processing/variables_to_extract.txt```](../data/processing/variables_to_extract.txt). See the header of this file for the format.

### Read in Data

We read the data from the CSV files stored in [```data/csv/```](../data/csv/). Later in this notebook, the data is written to our PostgreSQL database.

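We construct a dictionary whose keys are the names of the ```.csv``` files. Each value is a dict with two entries: ```sql_name```, the name of the SQL table the file's data will be written to, and ```variables```, a dict that maps each variable's PSID code to its user-chosen name. Each file is read with only those columns, which are then renamed.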

In [55]:
table_variable_dict = get_variables("../data/processing/variables_to_extract.txt")
df_dict = {}

for file, info in table_variable_dict.items():
    sql_name = info["sql_name"]    # name of the target SQL table
    variables = info["variables"]  # PSID code -> new column name
    # Read only the columns of interest, then rename them.
    df = pd.read_csv("../data/csv/" + file, usecols=list(variables)).rename(columns=variables)
    df_dict[sql_name] = df
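To make the dictionary structure concrete, here is a small self-contained sketch of the same read-and-rename step. The entry in `table_variable_dict` is hypothetical: its shape is inferred from how the loop above uses the output of `get_variables`, and the file name, table name, and codes `V001`/`V002`/`V003` are illustrative placeholders, not real PSID variables. An in-memory string stands in for a file in `data/csv/`.

```python
import io

import pandas as pd

# Hypothetical entry shaped like the output of get_variables() as used above;
# the file name, table name, and variable codes are illustrative only.
table_variable_dict = {
    "example.csv": {
        "sql_name": "example",
        "variables": {"V001": "interview_no", "V002": "person_no"},
    }
}

# In-memory stand-in for ../data/csv/example.csv; V003 is not of interest.
csv_text = "V001,V002,V003\n1,1,5\n2,3,7\n"

df_dict = {}
for file, info in table_variable_dict.items():
    variables = info["variables"]
    df = pd.read_csv(
        io.StringIO(csv_text),    # the notebook reads "../data/csv/" + file
        usecols=list(variables),  # keep only the listed variable codes
    ).rename(columns=variables)   # variable code -> user-chosen name
    df_dict[info["sql_name"]] = df

print(df_dict["example"])
```

Only the two listed columns survive, under their new names, and the resulting DataFrame is stored under its SQL table name rather than its file name.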

### SQL Table Names

In [57]:
list(df_dict.keys())

['fam01',
 'ind17',
 'child02',
 'pcg02',
 'wlth01',
 'assess',
 'demog',
 'ta17',
 'cah']

## Write with SQLAlchemy and Pandas

Connect to PostgreSQL using SQLAlchemy. The URI must point to an existing PostgreSQL database. The URI below is hardcoded for one user's local setup and must be changed before anyone else can run this notebook.

In [58]:
# SQLAlchemy 1.4+ requires the "postgresql" dialect name; the "postgres" alias was removed.
uri = "postgresql+psycopg2://zhou@localhost:5432/psid"
engine = create_engine(uri, echo=False)
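One possible way to avoid editing the hardcoded URI is to read it from an environment variable and fall back to the notebook's value. This is only a sketch: the variable name `PSID_DATABASE_URI` and the helper `get_db_uri` are hypothetical and not part of this project.

```python
import os

# Fallback taken from this notebook; it only works for the original author's setup.
DEFAULT_URI = "postgresql+psycopg2://zhou@localhost:5432/psid"


def get_db_uri(env_var="PSID_DATABASE_URI"):
    """Return the database URI from env_var if it is set, else DEFAULT_URI.

    PSID_DATABASE_URI is a hypothetical variable name chosen for this sketch.
    """
    return os.environ.get(env_var, DEFAULT_URI)
```

With this helper, the connection cell would become `engine = create_engine(get_db_uri(), echo=False)`, and each user would export their own URI before starting Jupyter.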

Write each DataFrame to the database as a table named after its key in ```df_dict```.

In [53]:
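for sql_name, df in df_dict.items():
    # Drop and recreate the table if it already exists; the DataFrame index is also written as a column.
    df.to_sql(sql_name, con=engine, if_exists="replace")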
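The following sketch illustrates two consequences of the `to_sql` call above without needing a PostgreSQL server. As an assumption for portability, it uses an in-memory SQLite database (pandas accepts a `sqlite3` connection directly) and made-up table names `demo_a` and `demo_b`. It shows that rerunning the write loop with `if_exists="replace"` does not duplicate rows, and that the default `index=True` stores the DataFrame index as an extra column named `index`.

```python
import sqlite3

import pandas as pd

# Made-up stand-in for df_dict: table name -> DataFrame.
df_dict = {
    "demo_a": pd.DataFrame({"x": [1, 2, 3]}),
    "demo_b": pd.DataFrame({"y": ["p", "q"]}),
}

# In-memory SQLite stands in for the PostgreSQL engine so the sketch runs
# without a database server.
con = sqlite3.connect(":memory:")

# Run the write loop twice: if_exists="replace" drops and recreates each
# table, so the second pass does not append duplicate rows.
for _ in range(2):
    for sql_name, df in df_dict.items():
        df.to_sql(sql_name, con=con, if_exists="replace")

# Count rows per table to confirm nothing was appended.
counts = {
    name: int(pd.read_sql_query(f"SELECT COUNT(*) AS n FROM {name}", con)["n"].iloc[0])
    for name in df_dict
}

# With the default index=True, the unnamed DataFrame index is stored as a
# column called "index" next to the data columns.
columns = pd.read_sql_query("SELECT * FROM demo_a", con).columns.tolist()
print(counts, columns)
```

If the extra `index` column is not wanted in the PostgreSQL tables, passing `index=False` to `to_sql` would omit it.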