# Introduction to the Python Package SQL_runner

# Agenda
1. What is SQL_runner
2. Why you should use SQL_runner
3. How to use SQL_runner
4. Resources

## PART1: WHAT IS SQL_runner
SQL_runner is a Python library that provides PostgreSQL adaptors and QA tools. It supports parsing, splitting, and formatting SQL scripts, as well as building QA tools for both customized and recurring projects.

## PART2: WHY YOU SHOULD USE SQL_runner

- Convenience: everything lives on the same page (Greenplum, PostgreSQL, Excel, manual work, comments, etc.).
- Markdown: easy to check what you are doing or what you did, which also eases project transitions.
- Efficiency: one click runs a recurring project, with no manual work needed.
- QA tools: easy to build customized QA pipelines and tools for different projects.
- Power: bridges Python and Greenplum, which makes it possible to leverage the wider Python ecosystem (time, calendar, pandas, numpy, matplotlib).

## PART3: HOW TO USE SQL_runner

### Step 0: Import the SQL_runner package

In [9]:
import SQL_runner as sr
import pandas as pd

### Step 1: Connect to a Greenplum (GP) database

In [10]:
db1 = sr.connect_db('edw', 'wecheng')

edw is successfully connected


In [23]:
db2 = sr.connect_db('ord', 'wecheng')

ord is successfully connected


In [4]:
db3 = sr.connect_db('adw', 'wecheng')

adw is successfully connected


### Step 2: How to make replacements and run a SQL script

In [42]:
# Set up script path
folder = '\\\\csiadcai02\\Custom_Scripts\\wecheng_script_csiadcai02\\Demo\\'
demo_script_path = folder + 'Demo_sql.sql'
# Set up month_id
calendar_month = '201801'

In [43]:
sr.parse_replace_execute(sql = demo_script_path,
                         conn = db2,
                         replace_items={'@CALENDAR_MONTH': calendar_month,
                                        '@VERSION' : 'wenqi'},
                         print_script = False,
                         save_granted_public_table=False, 
                         save_granted_public_table_path='', 
                         output_last_query_string=False,
                         gzip_file= True
                        )

1. The path of running script: \\csiadcai02\Custom_Scripts\wecheng_script_csiadcai02\Demo\Demo_sql.sql

needed_replace: @VERSION
replace_to: wenqi

needed_replace: @CALENDAR_MONTH
replace_to: 201801

2. Input file: \\csor2gpl04\incoming\wecheng\Demo\demo_input_201801.txt
3. Create public table: public.demo_country_table_201801_wenqi
4. Table public.country_lookup is being used
5. Grant table: public.demo_country_table_201801_wenqi
6. Output_file: \\csor2gpl04\incoming\wecheng\Demo\demo_out_201801.txt
7. Table public.demo_country_table_201801_wenqi is being used

YOU ARE GOOD TO MOVE ON!
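
The `needed_replace` / `replace_to` lines in the output suggest that each key in `replace_items` is substituted into the script text before it runs. The sketch below shows one way such a substitution could work. It is an illustration only: `replace_placeholders` is a hypothetical helper, not part of SQL_runner, and the real package may handle replacement differently.

```python
def replace_placeholders(sql_text, replace_items):
    """Return sql_text with every placeholder key replaced by its value.

    Hypothetical helper for illustration; not SQL_runner's own code.
    """
    # Replace longer keys first so that a key such as '@CALENDAR_MONTH_PREV'
    # is not partially overwritten by the shorter '@CALENDAR_MONTH'.
    for key in sorted(replace_items, key=len, reverse=True):
        sql_text = sql_text.replace(key, str(replace_items[key]))
    return sql_text


template = "select * from public.demo_country_table_@CALENDAR_MONTH_@VERSION limit 5"
query = replace_placeholders(template,
                             {'@CALENDAR_MONTH': '201801', '@VERSION': 'wenqi'})
print(query)
```

Sorting the keys by length is a defensive choice: with plain string replacement, a placeholder that is a prefix of another one would otherwise corrupt the longer name.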


### Step 3: How to run a single SQL query

In [48]:
sr.run_sql_query('''
select *
from public.demo_country_table_@CALENDAR_MONTH_@VERSION
limit 5
''',
                 conn = db2,
                 replace_items={'@CALENDAR_MONTH': calendar_month,
                                '@VERSION' : 'wenqi'})

| | calendar_month | country_name | country_code |
|---|---|---|---|
| 0 | 201801 | Brazil | BR |
| 1 | 201801 | France | FR |
| 2 | 201801 | Italy | IT |
| 3 | 201801 | Spain | ES |
| 4 | 201801 | Canada | CA |
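
To try the "substitute, then execute" pattern without a Greenplum connection, the sketch below uses an in-memory SQLite database from the standard library as a stand-in. Everything here is an assumption made for illustration: `run_query` is a hypothetical function, the `public.` schema prefix is dropped because SQLite has no such schema, and the rows are copied from the output above.

```python
import sqlite3

# In-memory SQLite stands in for the Greenplum connection (assumption).
conn = sqlite3.connect(':memory:')
conn.execute("create table demo_country_table_201801_wenqi "
             "(calendar_month text, country_name text, country_code text)")
conn.executemany(
    "insert into demo_country_table_201801_wenqi values (?, ?, ?)",
    [('201801', 'Italy', 'IT'), ('201801', 'Brazil', 'BR'), ('201801', 'France', 'FR')],
)


def run_query(sql_text, conn, replace_items):
    """Substitute placeholders, run the query, and return all rows (hypothetical)."""
    for key, value in replace_items.items():
        sql_text = sql_text.replace(key, value)
    return conn.execute(sql_text).fetchall()


rows = run_query(
    "select * from demo_country_table_@CALENDAR_MONTH_@VERSION "
    "order by country_name limit 2",
    conn,
    {'@CALENDAR_MONTH': '201801', '@VERSION': 'wenqi'},
)
print(rows)
```

Table names cannot be passed as bound parameters in SQL, which is one reason text substitution is a natural fit for month- and version-suffixed tables like these.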


### Step 4: How to loop through multiple months

In [65]:
# loop through 2 months (201801, 201802)
for month in range(201801, 201803):
    sr.parse_replace_execute(sql = demo_script_path,
                             conn = db2,
                             replace_items={'@CALENDAR_MONTH': str(month),
                                            '@VERSION' : 'wenqi'})

1. The path of running script: \\csiadcai02\Custom_Scripts\wecheng_script_csiadcai02\Demo\Demo_sql.sql

needed_replace: @VERSION
replace_to: wenqi

needed_replace: @CALENDAR_MONTH
replace_to: 201801

2. Input file: \\csor2gpl04\incoming\wecheng\Demo\demo_input_201801.txt
3. Create public table: public.demo_country_table_201801_wenqi
4. Table public.country_lookup is being used
5. Grant table: public.demo_country_table_201801_wenqi
6. Output_file: \\csor2gpl04\incoming\wecheng\Demo\demo_out_201801.txt
7. Table public.demo_country_table_201801_wenqi is being used

YOU ARE GOOD TO MOVE ON!
1. The path of running script: \\csiadcai02\Custom_Scripts\wecheng_script_csiadcai02\Demo\Demo_sql.sql

needed_replace: @VERSION
replace_to: wenqi

needed_replace: @CALENDAR_MONTH
replace_to: 201802

2. Input file: \\csor2gpl04\incoming\wecheng\Demo\demo_input_201802.txt
3. Create public table: public.demo_country_table_201802_wenqi
4. Table public.country_lookup is being used
5. Grant table: public.dem
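
The `range(201801, 201803)` loop works because both months fall in the same year. Across a year boundary, a plain integer range would produce invalid month ids such as 201813. The sketch below shows one way to generate valid `YYYYMM` ids. `month_ids` is a hypothetical helper, not part of SQL_runner.

```python
def month_ids(start, end):
    """Yield YYYYMM strings from start to end, inclusive (hypothetical helper)."""
    year, month = divmod(int(start), 100)
    end_year, end_month = divmod(int(end), 100)
    while (year, month) <= (end_year, end_month):
        yield f"{year}{month:02d}"
        month += 1
        if month > 12:
            # Roll over to January of the next year.
            year, month = year + 1, 1


months = list(month_ids('201811', '201902'))
print(months)  # ['201811', '201812', '201901', '201902']
```

Each yielded string can be passed directly as the `'@CALENDAR_MONTH'` value in `replace_items`.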

### Step 5: QA tools - compare with historical data

In [69]:
sr.compare_gp_table(table1 = 'public.demo_country_table_201801_wenqi',
                    table2 = 'public.demo_country_table_201802_wenqi',
                    conn = db2,
                    columns = 'country_name, country_code',
                    table1_name = 'last_month',
                    table2_name = 'current_month').sort_values('country_name')

table1:last_month
table2:current_month

num_of_rows(last_month):9
num_of_rows(current_month):9


| | country_name | country_code | lable |
|---|---|---|---|
| 0 | Brazil | BR | last_month |
| 10 | China | CN | current_month |
| 2 | United Kingdom | GB | last_month |
| 9 | United Kingdom | UK | current_month |
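
Judging from the output, the comparison appears to return the rows that occur in only one of the two tables, each labelled with the table it came from. The sketch below reproduces that idea on plain Python tuples. It is an assumption about the behaviour, not SQL_runner's implementation: `compare_rows` is a hypothetical function, and the sample rows are a small subset taken from the outputs in this demo.

```python
def compare_rows(rows1, rows2, name1='table1', name2='table2'):
    """Return rows found in only one input, each tagged with its source name."""
    only_in_1 = set(rows1) - set(rows2)
    only_in_2 = set(rows2) - set(rows1)
    labelled = [row + (name1,) for row in only_in_1]
    labelled += [row + (name2,) for row in only_in_2]
    # Sort so that related rows (same country_name) end up next to each other.
    return sorted(labelled)


last_month = [('Brazil', 'BR'), ('France', 'FR'), ('United Kingdom', 'GB')]
current_month = [('China', 'CN'), ('France', 'FR'), ('United Kingdom', 'UK')]

diff = compare_rows(last_month, current_month, 'last_month', 'current_month')
for row in diff:
    print(row)
```

Rows shared by both inputs, such as France, drop out. A changed value, such as the United Kingdom code, shows up as one row per table.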


## PART4: RESOURCES
- Python packages used in SQL_runner: sqlparse [click here](https://pypi.org/project/sqlparse/), psycopg2 [click here](http://initd.org/psycopg/), pandas, etc.
- SQL script used in this demo [click here](https://github.com/wenqicheng/SQL_runner/blob/master/Demo_sql.sql)
- Wenqi's LinkedIn [click here](https://www.linkedin.com/in/wenqicheng/)
- Wenqi's GitHub homepage [click here](https://github.com/wenqicheng)