# Compare-1: distinct NPI count should be similar for consecutive timespans

Description: check whether the distinct NPI count stays similar from one timespan to the next.

Starting Author: Amy Jin (amy@careset.com)

Date: July 23rd, 2018

https://docs.google.com/spreadsheets/d/1IYg01IpssJaWHo6KxO4_dSDgXtYNFy41S5cIHFLvlGQ/edit#gid=604789549

## Connection to Parenthood Server

In [2]:
# Packages import
import os
import sys
import numpy as np
import pandas as pd
from collections import Counter
import operator
import mysql.connector
import sshtunnel
import pureyaml

# Handle path
project_dir = !pwd  # dir of current script/notebook file (IPython shell capture)
with open(project_dir[0] + "/db.yaml") as config_file:
    config = pureyaml.load(config_file.read())

# Argument dictionary for sshtunnel
ssh_config = {
    'ssh_address_or_host': ('parenthood.set.care', 22),
    'ssh_username':        config['ssh_username'],
    'ssh_password':        config['ssh_password'],
    'remote_bind_address': ('127.0.0.1', 3306),
    'local_bind_address':  ('0.0.0.0', 3333),
}

# Argument dictionary for mysql.connector
mysql_config = {
    'user':     config['mysql_user'],
    'password': config['mysql_passwd'],
    'host':     config['mysql_host'],
    'database': 'patch',
    'port':     3333,
}

# Connect to Parenthood server
with sshtunnel.SSHTunnelForwarder(**ssh_config) as tunnel:
    print('SSH tunneling successful on port: {}'.format(tunnel.local_bind_port))
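    connection = mysql.connector.connect(**mysql_config)
    cur = connection.cursor()
    print('MySQL server connected successfully!')
    # This cell only tests connectivity; compare_1 opens its own connection
    cur.close()
    connection.close()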

SSH tunneling successful on port: 3333
MySQL server connected successfully!


## Test Function

In [3]:
# --------------------------------------- Inputs: ---------------------------------------
# 1) db_name:                database name in server
# 2) table_name:             table name
# 3) npi:                    npi column
# --------------------------------------- Outputs: --------------------------------------
# 1) Test result:  distinct NPI count.


def compare_1(db_name, table_name, npi):

    with sshtunnel.SSHTunnelForwarder(**ssh_config) as tunnel:
        connection = mysql.connector.connect(**mysql_config)
        cur = connection.cursor()
        
        # MySQL query to find the distinct NPI count.
        # Identifiers cannot be bound as query parameters, so they are
        # formatted into the string; pass trusted names only.
        query = ('''
                SELECT COUNT(DISTINCT {col1})
                FROM {db}.{t1};
        '''.format(db=db_name, t1=table_name, col1=npi))

        try:
            cur.execute(query)

            print("The distinct NPI count in {}.{} is:".format(db_name, table_name) + '\n')
            for row in cur.fetchall():
                for value in row:
                    print(str(value))
        finally:
            # Release the cursor and connection even if the query fails
            cur.close()
            connection.close()
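
Because `compare_1` formats the database, table, and column names directly into the SQL string, a malformed name would become part of the query. The following is a minimal sketch of one way to guard against that, assuming names are restricted to the characters MySQL permits in unquoted identifiers (letters, digits, `_`, `$`). The helpers `quote_identifier` and `build_count_query` are hypothetical and not part of the original notebook.

```python
import re

# Hypothetical helpers for illustration; not part of the original notebook.
# Characters MySQL permits in unquoted identifiers.
_IDENTIFIER = re.compile(r'[A-Za-z0-9_$]+')


def quote_identifier(name):
    """Return name wrapped in backticks, or raise ValueError if it
    contains anything other than letters, digits, '_' or '$'."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError("unsafe SQL identifier: {!r}".format(name))
    return '`{}`'.format(name)


def build_count_query(db_name, table_name, npi):
    """Build the same COUNT(DISTINCT ...) query as compare_1,
    with every identifier validated and quoted first."""
    return 'SELECT COUNT(DISTINCT {col}) FROM {db}.{tbl};'.format(
        col=quote_identifier(npi),
        db=quote_identifier(db_name),
        tbl=quote_identifier(table_name),
    )


print(build_count_query('_amy', 'test_data_good', 'npi'))
```

The resulting string could be passed to `cur.execute` in place of the formatted query; a name such as `'npi; DROP TABLE x'` raises `ValueError` before any query is sent.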

## Test Example

In [4]:
compare_1('_amy', 'test_data_good', 'npi')

The distinct NPI count in _amy.test_data_good is:

2000


In [5]:
compare_1('_amy', 'test_data_bad1', 'npi')

The distinct NPI count in _amy.test_data_bad1 is:

981


In [6]:
compare_1('_amy', 'test_data_bad2', 'npi')

The distinct NPI count in _amy.test_data_bad2 is:

596


## Internal Data Example

In [7]:
compare_1('npi_inst_icdproc', 'npi_inst_icdproc_RQ17','npi')

The distinct NPI count in npi_inst_icdproc.npi_inst_icdproc_RQ17 is:

143358


In [6]:
compare_1('npi_inst_icdproc', 'npi_inst_icdproc_RIFS2010','npi')

The distinct NPI count in npi_inst_icdproc.npi_inst_icdproc_RIFS2010 is:

117473


In [7]:
compare_1('npi_inst_icdproc', 'npi_inst_icdproc_RIFS2011','npi')

The distinct NPI count in npi_inst_icdproc.npi_inst_icdproc_RIFS2011 is:

117012


In [8]:
compare_1('npi_inst_icdproc', 'npi_inst_icdproc_RIFS2012','npi')

The distinct NPI count in npi_inst_icdproc.npi_inst_icdproc_RIFS2012 is:

113900


In [9]:
compare_1('npi_inst_icdproc', 'npi_inst_icdproc_RIFS2013','npi')

The distinct NPI count in npi_inst_icdproc.npi_inst_icdproc_RIFS2013 is:

111848


In [10]:
compare_1('npi_inst_icdproc', 'npi_inst_icdproc_RIFQ2014','npi')

The distinct NPI count in npi_inst_icdproc.npi_inst_icdproc_RIFQ2014 is:

108650


In [11]:
compare_1('npi_inst_icdproc', 'npi_inst_icdproc_RIFQ2015','npi')

The distinct NPI count in npi_inst_icdproc.npi_inst_icdproc_RIFQ2015 is:

86509


In [12]:
compare_1('npi_inst_icdproc', 'npi_inst_icdproc_RIFQ2016','npi')

The distinct NPI count in npi_inst_icdproc.npi_inst_icdproc_RIFQ2016 is:

87948


In [13]:
compare_1('npi_inst_icdproc', 'npi_inst_icdproc_16_17_6','npi')

The distinct NPI count in npi_inst_icdproc.npi_inst_icdproc_16_17_6 is:

93558


In [14]:
compare_1('npi_inst_icdproc', 'npi_inst_icdproc_16_17_9','npi')

The distinct NPI count in npi_inst_icdproc.npi_inst_icdproc_16_17_9 is:

95043
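
The cells above print one count per table, so the comparison between consecutive timespans is left to the reader. A minimal sketch of how that comparison might be automated follows. It assumes the table suffixes 2010 through 2016 denote consecutive yearly timespans, and it uses a hypothetical 10% tolerance (`THRESHOLD`); neither the helper functions nor the tolerance come from the original notebook. The counts are copied from the outputs above.

```python
# Distinct NPI counts copied from the notebook outputs above.
# Assumption: the suffixes 2010..2016 are consecutive yearly timespans.
counts = [
    ('RIFS2010', 117473),
    ('RIFS2011', 117012),
    ('RIFS2012', 113900),
    ('RIFS2013', 111848),
    ('RIFQ2014', 108650),
    ('RIFQ2015', 86509),
    ('RIFQ2016', 87948),
]

THRESHOLD = 0.10  # hypothetical tolerance: flag changes larger than 10%


def relative_changes(series):
    """Return (previous label, label, relative change) for each consecutive pair."""
    changes = []
    for (prev_label, prev_count), (label, count) in zip(series, series[1:]):
        changes.append((prev_label, label, (count - prev_count) / float(prev_count)))
    return changes


def flag_large_changes(series, threshold):
    """Return the consecutive pairs whose absolute relative change exceeds threshold."""
    return [(a, b) for a, b, change in relative_changes(series)
            if abs(change) > threshold]


for a, b, change in relative_changes(counts):
    marker = 'CHECK' if abs(change) > THRESHOLD else 'ok'
    print('{} -> {}: {:+.2%} {}'.format(a, b, change, marker))
```

With this tolerance only the RIFQ2014 to RIFQ2015 step is flagged, a drop of roughly 20%; every other consecutive pair changes by less than 3%.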
