# POC merge ownership export values with quality_vat_export_2003_2010

# Objective(s)

As part of a POC, we previously created the ownership export values. We now need to merge those tables with `quality_vat_export_2003_2010` and check that the query is correct. The merge must not introduce duplicates.

# Metadata

* Epic: Epic 1
* US: US 3
* Date Begin: 10/3/2020
* Duration Task: 0
* Description: Merge export values with quality_vat_export_2003_2010 to see if the query generates duplicates
* Step type:  
* Status: Active
* Source URL: US 03 Export share
* Task type: Jupyter Notebook
* Users: Thomas Pernet
* Watchers: Thomas Pernet
* User Account: https://468786073381.signin.aws.amazon.com/console
* Estimated Log points: 8
* Task tag: #lookup-table,#athena,#sql,#data-preparation
* Toggl Tag: #data-preparation
* Meetings:  
* Email Information:  
  * thread: Number of threads: 0(Default 0, to avoid display email)
  *  

# Input Cloud Storage [AWS/GCP]

## Table/file

* Origin: 
  * Athena
* Name: 
  * lag_foreign_export_ckjr
  * lag_foreign_export_ckr
  * lag_soe_export_ckjr
  * lag_soe_export_ckr
* Github: 
  * https://github.com/thomaspernet/VAT_rebate_quality_china/blob/master/01_data_preprocessing/02_prepare_tables_model/00_POC_prepare_tables_model/00_export_share_foreign_SOE.md

# Destination Output/Delivery

## Table/file

* Origin: 
  * Athena
* Name:
  * quality_vat_export_covariate_2003_2010
* GitHub:
  * https://github.com/thomaspernet/VAT_rebate_quality_china/blob/master/01_data_preprocessing/02_prepare_tables_model/00_POC_prepare_tables_model/01_merge_export_share_foreign_SOE_quality.md

In [1]:
from awsPy.aws_authorization import aws_connector
from awsPy.aws_s3 import service_s3
from awsPy.aws_glue import service_glue
from pathlib import Path
import pandas as pd
import numpy as np
import seaborn as sns
import os, shutil, json

path = os.getcwd()
parent_path = str(Path(path).parent.parent.parent)


name_credential = 'thomas_vat_credentials.csv'
region = 'eu-west-3'
bucket = 'chinese-data'
path_cred = "{0}/creds/{1}".format(parent_path, name_credential)

In [2]:
con = aws_connector.aws_instantiate(credential=path_cred, region=region)
client = con.client_boto()
s3 = service_s3.connect_S3(client=client, bucket=bucket, verbose=True)
glue = service_glue.connect_glue(client=client)

In [3]:
pandas_setting = True
if pandas_setting:
    cm = sns.light_palette("green", as_cmap=True)
    pd.set_option('display.max_columns', None)
    pd.set_option('display.max_colwidth', None)

# Prepare query POC

This notebook is at the POC stage: write your queries here and test that they work. Once you are satisfied with the jobs, move the queries to the ETL. Prepare a pseudo JSON file to save time during the US linked to the ETL. Use the following format to validate the queries:

## Template prepare table

- To create a new table from existing tables (i.e. Athena tables), copy the template below and paste it inside the list `TABLES.PREPARATION.ALL_SCHEMA`
    - The list `ALL_SCHEMA` accepts one or more steps. Each step, `STEPS_X`, can be a sequence of query executions. 

```
"PREPARATION":{
   "ALL_SCHEMA":[
      {
         "STEPS_0":{
            "name":"",
            "execution":[
               {
                  "database":"",
                  "name":"",
                  "output_id":"",
                  "query":{
                     "top":"",
                     "middle":"",
                     "bottom":""
                  }
               }
            ],
            "schema":[
               {
                  "Name":"",
                  "Type":"",
                  "Comment":""
               }
            ]
         }
      }
   ],
   "template":{
      "top":"CREATE TABLE {}.{} WITH (format = 'PARQUET') AS "
   }
}
``` 

To add a step, use this template inside `TABLES.PREPARATION.ALL_SCHEMA`

```
{
   "STEPS_X":{
      "name":"",
      "execution":[
         {
            "database":"",
            "name":"",
            "output_id":"",
            "query":{
               "top":"",
               "middle":"",
               "bottom":""
            }
         }
      ],
      "schema":[
               {
                  "Name":"",
                  "Type":"",
                  "Comment":""
               }
            ]
   }
}
```

To add a query execution within a step, use the following template inside the list `STEPS_X.execution`

```
{
   "database":"",
   "name":"",
   "output_id":"",
   "query":{
      "top":"",
      "middle":"",
      "bottom":""
   }
}
``` 



Each step name should follow this format `STEPS_0`, `STEPS_1`, `STEPS_2`, etc
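
As a minimal sketch of how one `execution` entry could be turned into a runnable statement, the snippet below formats `template.top` with the database and table name, then appends the three `query` parts. The concatenation order, the helper name `build_query` and the placeholder SQL are assumptions made for illustration; the ETL may assemble the pieces differently.

```python
import json

# Hypothetical PREPARATION block that follows the template above
preparation = json.loads("""
{
   "ALL_SCHEMA":[
      {
         "STEPS_0":{
            "name":"Merge export shares",
            "execution":[
               {
                  "database":"chinese_trade",
                  "name":"quality_vat_export_covariate_2003_2010",
                  "output_id":"",
                  "query":{
                     "top":"WITH merge_cov AS (SELECT 1 AS x)",
                     "middle":"SELECT x",
                     "bottom":"FROM merge_cov"
                  }
               }
            ],
            "schema":[]
         }
      }
   ],
   "template":{
      "top":"CREATE TABLE {}.{} WITH (format = 'PARQUET') AS "
   }
}
""")


def build_query(execution, template_top):
    """Assumed assembly: CREATE TABLE header, then top, middle, bottom."""
    header = template_top.format(execution["database"], execution["name"])
    parts = [execution["query"][key] for key in ("top", "middle", "bottom")]
    # Skip empty parts so that unused slots do not add stray spaces
    return header + " ".join(part for part in parts if part)


queries = []
for step in preparation["ALL_SCHEMA"]:
    for step_name, content in step.items():
        for execution in content["execution"]:
            queries.append(build_query(execution, preparation["template"]["top"]))

print(queries[0])
```

Keeping the `CREATE TABLE` header in a shared template means each execution only stores the part of the SQL that differs from one table to the next.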

## Template add comments to Glue

The AWS Glue Data Catalog contains references to data that is used as sources and targets of your extract, transform, and load (ETL) jobs in AWS Glue. To create your data warehouse or data lake, you must catalog this data. The AWS Glue Data Catalog is an index to the location, schema, and runtime metrics of your data. You use the information in the Data Catalog to create and monitor your ETL jobs. Information in the Data Catalog is stored as metadata tables, where each table specifies a single data store.

We make use of the `boto3` API to add comments in the metastore. 

- To alter the metadata (only comments), copy the template below and paste it inside the list `PREPARATION.STEPS_X.schema`. 

```
[
   {
      "Name":"",
      "Type":"",
      "Comment":""
   }
]
```

The schema is related to a table and will be modified through the Glue API. **Only** the variables inside the list will be modified; the remaining variables keep their default values.
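
The "only listed variables change" rule can be sketched as a pure function that merges the comments into the column list of a Glue table. The helper name `merge_comments` and the comment text are hypothetical, and the function is only an illustration of the rule, not the code used by the ETL.

```python
def merge_comments(columns, schema):
    """Return a copy of `columns` where only the names listed in `schema`
    receive a new Comment; every other column is left untouched."""
    comments = {field["Name"]: field["Comment"] for field in schema}
    return [
        {**col, "Comment": comments[col["Name"]]}
        if col["Name"] in comments
        else dict(col)
        for col in columns
    ]


# Columns as they appear in a Glue StorageDescriptor
columns = [
    {"Name": "geocode4_corr", "Type": "string", "Comment": ""},
    {"Name": "year", "Type": "string", "Comment": ""},
]
# Hypothetical schema entry; the comment text is illustrative only
schema = [{"Name": "year", "Type": "string", "Comment": "Year of the export flow"}]

updated = merge_comments(columns, schema)
print(updated)
```

With `boto3`, the merged list would typically be written back by reading the table with the Glue client's `get_table` and passing the modified `StorageDescriptor` to `update_table` inside `TableInput`.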


# Prepare parameters file

There are three steps to prepare the parameter file:

1. Prepare `GLOBAL` parameters
2. Prepare `TABLES.CREATION`:
    - Usually a notebook in the folder `01_prepare_tables` 
3. Prepare `TABLES.PREPARATION`
    - Usually a notebook in the folder `02_prepare_tables_model` 
    
The parameter file is named `parameters_ETL.json` and is moved each time to the root folder `01_data_preprocessing` for versioning. Once the parameter file is finished, we will use it in the deployment process to run the entire process.
 

# Steps

Merge all tables with `quality_vat_export_2003_2010`

- Match variables for city, regime, product, destination:
    - year
    - regime
    - geocode4_corr
    - hs6
    - iso_alpha
    
- Match variables for city, regime, product:
    - year
    - regime
    - geocode4_corr
    - hs6

The new table name is `quality_vat_export_covariate_2003_2010`
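
The two sets of match variables can be illustrated with a small pandas sketch on toy rows (not real data). It mirrors the left joins of the Athena query for the SOE shares only, and replaces unmatched shares with 0. The `validate="many_to_one"` argument is an extra safeguard assumed here: it raises an error if a lookup table has duplicated keys, which is the usual source of duplicated rows after a merge.

```python
import pandas as pd

keys_ckjr = ["year", "regime", "geocode4_corr", "hs6", "iso_alpha"]
keys_ckr = ["year", "regime", "geocode4_corr", "hs6"]

# Toy rows, not real data
quality = pd.DataFrame({
    "year": ["2004", "2004"],
    "regime": ["ELIGIBLE", "ELIGIBLE"],
    "geocode4_corr": ["3101", "3101"],
    "hs6": ["392690", "392690"],
    "iso_alpha": ["USA", "JPN"],
})
share_ckr = pd.DataFrame({
    "year": ["2004"], "regime": ["ELIGIBLE"], "geocode4_corr": ["3101"],
    "hs6": ["392690"], "lag_soe_export_share_ckr": [0.25],
})
share_ckjr = pd.DataFrame({
    "year": ["2004"], "regime": ["ELIGIBLE"], "geocode4_corr": ["3101"],
    "hs6": ["392690"], "iso_alpha": ["USA"],
    "lag_soe_export_share_ckjr": [0.4],
})

merged = (
    quality
    # city-product-regime share: same value for every destination
    .merge(share_ckr, how="left", on=keys_ckr, validate="many_to_one")
    # city-product-destination-regime share: matched on iso_alpha as well
    .merge(share_ckjr, how="left", on=keys_ckjr, validate="many_to_one")
    # unmatched rows get a share of 0, as in the SQL query
    .fillna({"lag_soe_export_share_ckr": 0, "lag_soe_export_share_ckjr": 0})
)
print(merged)
```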

In [4]:
s3.download_file(key = 'DATA/ETL/parameters_ETL.json')
with open('parameters_ETL.json', 'r') as fp:
    parameters = json.load(fp)
db = parameters['GLOBAL']['DATABASE']
s3_output = parameters['GLOBAL']['QUERIES_OUTPUT']

In [5]:
from datetime import date
today = date.today().strftime('%Y%m%d')  # %m is the month; %M would be the minute

# Table `quality_vat_export_covariate_2003_2010`

- Table name: `quality_vat_export_covariate_2003_2010`

In [16]:
query = """
DROP TABLE IF EXISTS quality_vat_export_covariate_2003_2010
"""
s3.run_query(
                    query=query,
                    database=db,
                    s3_output=s3_output,
                )

{'Results': {'State': 'SUCCEEDED',
  'SubmissionDateTime': datetime.datetime(2020, 10, 4, 14, 50, 15, 753000, tzinfo=tzlocal()),
  'CompletionDateTime': datetime.datetime(2020, 10, 4, 14, 50, 16, 327000, tzinfo=tzlocal())},
 'QueryID': '36254277-c1a2-4a11-974d-be953318492e'}

In [17]:
query = """
CREATE TABLE chinese_trade.quality_vat_export_covariate_2003_2010
WITH (
  format='PARQUET'
) AS
WITH merge_cov AS (
SELECT 
  quality_vat_export_2003_2010.cityen, 
  quality_vat_export_2003_2010.geocode4_corr, 
  quality_vat_export_2003_2010.year, 
  quality_vat_export_2003_2010.regime, 
  quality_vat_export_2003_2010.hs6, 
  hs4, 
  hs3, 
  quality_vat_export_2003_2010.country_en, 
  quality_vat_export_2003_2010.iso_alpha,
  gni_per_capita,
  gpd_per_capita,
  income_group,
  quantity, 
  value, 
  unit_price, 
  kandhelwal_quality, 
  price_adjusted_quality, 
  lag_tax_rebate, 
  ln_lag_tax_rebate, 
  lag_import_tax, 
  ln_lag_import_tax, 
  sigma, 
  sigma_price, 
  y, 
  prediction, 
  residual, 
  FE_ck, 
  FE_cst, 
  FE_ckr, 
  FE_csrt, 
  FE_kt, 
  FE_pj, 
  FE_jt, 
  FE_ct, 
  COALESCE(lag_foreign_export_share_ckr, 0) AS lag_foreign_export_share_ckr,
  COALESCE(lag_soe_export_share_ckr, 0) AS lag_soe_export_share_ckr,
  COALESCE(lag_foreign_export_share_ckjr, 0) AS lag_foreign_export_share_ckjr,
  COALESCE(lag_soe_export_share_ckjr, 0) AS lag_soe_export_share_ckjr

  FROM quality_vat_export_2003_2010 
  
  LEFT JOIN chinese_trade.lag_foreign_export_ckr
ON quality_vat_export_2003_2010.geocode4_corr = lag_foreign_export_ckr.geocode4_corr AND
quality_vat_export_2003_2010.year = lag_foreign_export_ckr.year AND
quality_vat_export_2003_2010.hs6 = lag_foreign_export_ckr.hs6 AND
quality_vat_export_2003_2010.regime = lag_foreign_export_ckr.regime


LEFT JOIN chinese_trade.lag_soe_export_ckr
ON quality_vat_export_2003_2010.geocode4_corr = lag_soe_export_ckr.geocode4_corr AND
quality_vat_export_2003_2010.year = lag_soe_export_ckr.year AND
quality_vat_export_2003_2010.hs6 = lag_soe_export_ckr.hs6 AND
quality_vat_export_2003_2010.regime = lag_soe_export_ckr.regime

LEFT JOIN chinese_trade.lag_foreign_export_ckjr
ON quality_vat_export_2003_2010.geocode4_corr = lag_foreign_export_ckjr.geocode4_corr AND
quality_vat_export_2003_2010.year = lag_foreign_export_ckjr.year AND
quality_vat_export_2003_2010.hs6 = lag_foreign_export_ckjr.hs6 AND
quality_vat_export_2003_2010.regime = lag_foreign_export_ckjr.regime AND
quality_vat_export_2003_2010.iso_alpha = lag_foreign_export_ckjr.iso_alpha

LEFT JOIN chinese_trade.lag_soe_export_ckjr
ON quality_vat_export_2003_2010.geocode4_corr = lag_soe_export_ckjr.geocode4_corr AND
quality_vat_export_2003_2010.year = lag_soe_export_ckjr.year AND
quality_vat_export_2003_2010.hs6 = lag_soe_export_ckjr.hs6 AND
quality_vat_export_2003_2010.regime = lag_soe_export_ckjr.regime AND
quality_vat_export_2003_2010.iso_alpha = lag_soe_export_ckjr.iso_alpha

LEFT JOIN world_bank.world_gdp_per_capita
ON quality_vat_export_2003_2010.iso_alpha = world_gdp_per_capita.iso_alpha03 AND 
quality_vat_export_2003_2010.year = world_gdp_per_capita.year 

WHERE quantity IS NOT NULL
  
  ) 
  
  SELECT 
  
  merge_cov.cityen, 
  merge_cov.geocode4_corr, 
  merge_cov.year, 
  merge_cov.regime, 
  merge_cov.hs6, 
  hs4, 
  hs3, 
  country_en, 
  merge_cov.iso_alpha, 
  gni_per_capita,
  gpd_per_capita,
  income_group,
  quantity, 
  value, 
  unit_price, 
  kandhelwal_quality, 
  price_adjusted_quality, 
  lag_tax_rebate, 
  ln_lag_tax_rebate, 
  lag_import_tax, 
  ln_lag_import_tax, 
  lag_soe_export_share_ckr,
  lag_foreign_export_share_ckr,
  lag_soe_export_share_ckjr,
  lag_foreign_export_share_ckjr,
  sigma, 
  sigma_price, 
  y, 
  prediction, 
  residual, 
  FE_ck, 
  FE_cst, 
  FE_ckr, 
  FE_csrt, 
  FE_kt, 
  FE_pj, 
  FE_jt, 
  FE_ct
  FROM merge_cov
  INNER JOIN (
    SELECT 
    year, regime, geocode4_corr, iso_alpha, hs6
    FROM merge_cov
  GROUP BY year, regime, geocode4_corr, iso_alpha, hs6
  HAVING COUNT(*) = 1
    ) as no_duplicate ON
    merge_cov.year = no_duplicate.year AND
    merge_cov.regime = no_duplicate.regime AND
    merge_cov.geocode4_corr = no_duplicate.geocode4_corr AND
    merge_cov.iso_alpha = no_duplicate.iso_alpha AND
    merge_cov.hs6 = no_duplicate.hs6


"""
s3.run_query(
                    query=query,
                    database=db,
                    s3_output=s3_output,
                )

{'Results': {'State': 'SUCCEEDED',
  'SubmissionDateTime': datetime.datetime(2020, 10, 4, 14, 50, 17, 595000, tzinfo=tzlocal()),
  'CompletionDateTime': datetime.datetime(2020, 10, 4, 14, 50, 40, 285000, tzinfo=tzlocal())},
 'QueryID': 'd692df6e-c929-4aab-87b0-ad10aa3f6112'}

Count the number of observations in `quality_vat_export_2003_2010`

In [9]:
query = """
SELECT COUNT(*) as CNT
FROM "chinese_trade"."quality_vat_export_2003_2010" 
"""
s3.run_query(
                    query=query,
                    database=db,
                    s3_output=s3_output,
    filename="count", 
                )

CNT
11669096


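Count the number of observations in `quality_vat_export_covariate_2003_2010`. It should have fewer observations because the query drops the rows with a missing `quantity` and removes every key (year, regime, geocode4_corr, iso_alpha, hs6) that appears more than once.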

In [18]:
query = """
SELECT COUNT(*) as CNT
FROM "chinese_trade"."quality_vat_export_covariate_2003_2010" 
"""
s3.run_query(
                    query=query,
                    database=db,
                    s3_output=s3_output,
    filename="count", 
                )

CNT
5832945
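
The inner join on groups `HAVING COUNT(*) = 1` removes every row whose key is duplicated; it does not keep one representative per key. A minimal pandas sketch on toy rows (not real data) shows the difference between the two behaviours:

```python
import pandas as pd

keys = ["year", "regime", "geocode4_corr", "iso_alpha", "hs6"]

# Toy rows: the first two share the same key
df = pd.DataFrame({
    "year": ["2004", "2004", "2005"],
    "regime": ["ELIGIBLE", "ELIGIBLE", "ELIGIBLE"],
    "geocode4_corr": ["3101", "3101", "3101"],
    "iso_alpha": ["USA", "USA", "USA"],
    "hs6": ["392690", "392690", "392690"],
    "quantity": [10, 12, 7],
})

# Equivalent of the INNER JOIN on groups HAVING COUNT(*) = 1:
# keep=False flags every member of a duplicated key
no_duplicate = df[~df.duplicated(subset=keys, keep=False)]

# For comparison only: keep the first row of each key
first_only = df.drop_duplicates(subset=keys, keep="first")

print(len(df), len(no_duplicate), len(first_only))
```

On these toy rows the first approach keeps one row and the second keeps two, which is why a large drop in the row count is expected when many keys are duplicated.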


Check duplicates: each combination of year, regime, geocode4_corr, iso_alpha and hs6 should appear exactly once

In [19]:
query = """
SELECT CNT, COUNT(*) AS CNT_DUPLICATE
FROM (
SELECT year, regime, geocode4_corr, iso_alpha, hs6, COUNT(*) as CNT
FROM "chinese_trade"."quality_vat_export_covariate_2003_2010" 
GROUP BY 
year, regime, geocode4_corr, iso_alpha, hs6
  )
  GROUP BY CNT
  ORDER BY CNT_DUPLICATE
"""
s3.run_query(
                    query=query,
                    database=db,
                    s3_output=s3_output,
    filename="duplicates", 
                )

CNT,CNT_DUPLICATE
1,5832945
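
The same check can be reproduced locally on a sample with two successive group-bys, mirroring the nested SQL query. This is a sketch on toy rows (not real data); a clean table returns a single row with `CNT` equal to 1.

```python
import pandas as pd

keys = ["year", "regime", "geocode4_corr", "iso_alpha", "hs6"]

# Toy rows: the first two share the same key
df = pd.DataFrame({
    "year": ["2004", "2004", "2005"],
    "regime": ["ELIGIBLE", "ELIGIBLE", "ELIGIBLE"],
    "geocode4_corr": ["3101", "3101", "3101"],
    "iso_alpha": ["USA", "USA", "USA"],
    "hs6": ["392690", "392690", "392690"],
})

# Inner query: number of rows per key
sizes = df.groupby(keys).size().reset_index(name="CNT")
# Outer query: number of keys for each group size
distribution = sizes.groupby("CNT").size().reset_index(name="CNT_DUPLICATE")
print(distribution)

# The table is free of duplicates when the only group size is 1
is_clean = distribution["CNT"].tolist() == [1]
print(is_clean)
```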


# Analytics

In this part, we provide basic summary statistics. Since the tables have been created, we can parse the schema in Glue and use our JSON file to generate the analysis automatically.

The cells below execute the job in the key `ANALYSIS`. You need to change the `primary_key` and `secondary_key` 

## Table `quality_vat_export_covariate_2003_2010`

In [20]:
table = 'quality_vat_export_covariate_2003_2010'
schema = glue.get_table_information(
    database = db,
    table = table
)['Table']
schema

{'Name': 'quality_vat_export_covariate_2003_2010',
 'DatabaseName': 'chinese_trade',
 'Owner': '468786073381',
 'CreateTime': datetime.datetime(2020, 10, 4, 14, 50, 40, tzinfo=tzlocal()),
 'UpdateTime': datetime.datetime(2020, 10, 4, 14, 50, 40, tzinfo=tzlocal()),
 'LastAccessTime': datetime.datetime(1970, 1, 1, 1, 0, tzinfo=tzlocal()),
 'Retention': 0,
 'StorageDescriptor': {'Columns': [{'Name': 'cityen',
    'Type': 'string',
    'Comment': ''},
   {'Name': 'geocode4_corr', 'Type': 'string', 'Comment': ''},
   {'Name': 'year', 'Type': 'string', 'Comment': ''},
   {'Name': 'regime', 'Type': 'string', 'Comment': ''},
   {'Name': 'hs6', 'Type': 'string', 'Comment': ''},
   {'Name': 'hs4', 'Type': 'string', 'Comment': ''},
   {'Name': 'hs3', 'Type': 'string', 'Comment': ''},
   {'Name': 'country_en', 'Type': 'string', 'Comment': ''},
   {'Name': 'iso_alpha', 'Type': 'string', 'Comment': ''},
   {'Name': 'gni_per_capita', 'Type': 'float', 'Comment': ''},
   {'Name': 'gpd_per_capita', 'Typ

## Count missing values

In [13]:
from datetime import date
today = date.today().strftime('%Y%m%d')  # %m is the month; %M would be the minute

In [21]:
table_top = parameters["ANALYSIS"]["COUNT_MISSING"]["top"]
table_middle = ""
table_bottom = parameters["ANALYSIS"]["COUNT_MISSING"]["bottom"].format(
    db, table
)

for key, value in enumerate(schema["StorageDescriptor"]["Columns"]):
    if key == len(schema["StorageDescriptor"]["Columns"]) - 1:

        table_middle += "{} ".format(
            parameters["ANALYSIS"]["COUNT_MISSING"]["middle"].format(value["Name"])
        )
    else:
        table_middle += "{} ,".format(
            parameters["ANALYSIS"]["COUNT_MISSING"]["middle"].format(value["Name"])
        )
query = table_top + table_middle + table_bottom
output = s3.run_query(
    query=query,
    database=db,
    s3_output=s3_output,
    filename="count_missing",  ## Add filename to print dataframe
    destination_key=None,  ### Add destination key if need to copy output
)
display(
    output.T.rename(columns={0: "total_missing"})
    .assign(total_missing_pct=lambda x: x["total_missing"] / x.iloc[0, 0])
    .sort_values(by=["total_missing"], ascending=False)
    .style.format("{0:,.2%}", subset=["total_missing_pct"])
    .bar(subset="total_missing_pct", color=["#d65f5f"])
)

column,total_missing,total_missing_pct
nb_obs,5832945,100.00%
gpd_per_capita,630881,10.82%
income_group,630881,10.82%
gni_per_capita,630881,10.82%
fe_kt,0,0.00%
prediction,0,0.00%
lag_foreign_export_share_ckr,0,0.00%
lag_soe_export_share_ckjr,0,0.00%
lag_foreign_export_share_ckjr,0,0.00%
sigma,0,0.00%
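
The loop that builds `table_middle` only needs to join one expression per column with commas, which `str.join` does without the special case for the last column. The sketch below uses assumed template strings, since the actual content of `parameters["ANALYSIS"]["COUNT_MISSING"]` is not shown in this notebook.

```python
# Assumed templates; the real strings live in
# parameters["ANALYSIS"]["COUNT_MISSING"] and may differ
top = "SELECT COUNT(*) AS nb_obs, "
middle = "SUM(CASE WHEN {0} IS NULL THEN 1 ELSE 0 END) AS {0}"
bottom = " FROM {}.{}"

# Same shape as schema["StorageDescriptor"]["Columns"]
columns = [{"Name": "gni_per_capita"}, {"Name": "income_group"}]

# One expression per column, separated by commas, no trailing comma
select_list = ", ".join(middle.format(col["Name"]) for col in columns)
query = top + select_list + bottom.format(
    "chinese_trade", "quality_vat_export_covariate_2003_2010"
)
print(query)
```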


# Brief description table

In this part, we provide brief summary statistics from the latest jobs. For the continuous analysis with a primary/secondary key, please add the relevant variables for which you want the count and distribution

## Categorical Description

During the categorical analysis, we will count the number of observations for a given group and for a pair.

### Count obs by group

- Index: primary group
- nb_obs: Number of observations per primary group value
- percentage: Percentage of observation per primary group value over the total number of observations

Returns the top 10 only

In [22]:
for field in schema["StorageDescriptor"]["Columns"]:
    if field["Type"] in ["string", "object", "varchar(12)"]:

        print("Nb of obs for {}".format(field["Name"]))

        query = parameters["ANALYSIS"]["CATEGORICAL"]["PAIR"].format(
            db, table, field["Name"]
        )
        output = s3.run_query(
            query=query,
            database=db,
            s3_output=s3_output,
            filename="count_categorical_{}".format(
                field["Name"]
            ),  ## Add filename to print dataframe
            destination_key=None,  ### Add destination key if need to copy output
        )

        ### Print top 10

        display(
            (
                output.set_index([field["Name"]])
                .assign(percentage=lambda x: x["nb_obs"] / x["nb_obs"].sum())
                .sort_values("percentage", ascending=False)
                .head(10)
                .style.format("{0:.2%}", subset=["percentage"])
                .bar(subset=["percentage"], color="#d65f5f")
            )
        )

Nb of obs for cityen


cityen,nb_obs,percentage
Shenzhen,548330,9.40%
Shanghai,534609,9.17%
Dongguan,337831,5.79%
Canton,253336,4.34%
Ningbo,228647,3.92%
Beijing,192764,3.30%
Hangzhou,180731,3.10%
Qingdao,167449,2.87%
Tianjin,157711,2.70%
Nanjing,152037,2.61%


Nb of obs for geocode4_corr


geocode4_corr,nb_obs,percentage
4403,548330,9.40%
3101,534609,9.17%
4419,337831,5.79%
4401,253336,4.34%
3302,228647,3.92%
1101,192764,3.30%
3301,180731,3.10%
3702,167449,2.87%
1201,157711,2.70%
3201,152037,2.61%


Nb of obs for year


year,nb_obs,percentage
2007,998575,17.12%
2008,833268,14.29%
2010,828552,14.20%
2009,805702,13.81%
2005,705563,12.10%
2006,687578,11.79%
2004,546670,9.37%
2003,427037,7.32%


Nb of obs for regime


regime,nb_obs,percentage
ELIGIBLE,4921987,84.38%
NOT_ELIGIBLE,910958,15.62%


Nb of obs for hs6


hs6,nb_obs,percentage
392690,30109,0.52%
420212,21975,0.38%
732690,20877,0.36%
848180,17082,0.29%
392640,17044,0.29%
850440,16740,0.29%
870899,16227,0.28%
640299,15728,0.27%
940540,15555,0.27%
940320,15434,0.26%


Nb of obs for hs4


hs4,nb_obs,percentage
6204,83181,1.43%
4202,81256,1.39%
6104,75417,1.29%
9403,72185,1.24%
3926,71149,1.22%
9401,63886,1.10%
9405,62449,1.07%
8708,62207,1.07%
8516,49249,0.84%
3923,47982,0.82%


Nb of obs for hs3


hs3,nb_obs,percentage
620,245618,4.21%
940,238928,4.10%
392,207439,3.56%
610,204309,3.50%
851,158753,2.72%
841,157273,2.70%
630,138033,2.37%
850,134056,2.30%
820,122935,2.11%
853,122623,2.10%


Nb of obs for country_en


country_en,nb_obs,percentage
United States,296690,5.09%
Hong Kong,242113,4.15%
Japan,201991,3.46%
Korea,186759,3.20%
Germany,173057,2.97%
Australia,150018,2.57%
United Kingdom,148178,2.54%
Italy,140280,2.40%
Canada,133672,2.29%
Spain,117725,2.02%


Nb of obs for iso_alpha


iso_alpha,nb_obs,percentage
USA,296690,5.09%
HKG,242113,4.15%
JPN,201991,3.46%
DEU,173057,2.97%
KOR,156487,2.68%
AUS,150018,2.57%
GBR,148178,2.54%
ITA,140280,2.40%
CAN,133672,2.29%
ESP,117725,2.02%


Nb of obs for income_group


income_group,nb_obs,percentage
High income: OECD,2548802,43.70%
Upper middle income,1130519,19.38%
Lower middle income,893786,15.32%
,630881,10.82%
High income: nonOECD,504768,8.65%
Low income,124189,2.13%


Nb of obs for fe_ck


fe_ck,nb_obs,percentage
38210,1577,0.03%
88136,1429,0.02%
242446,1365,0.02%
35102,1355,0.02%
39351,1327,0.02%
85703,1266,0.02%
13217,1175,0.02%
132035,1174,0.02%
38209,1146,0.02%
241592,1106,0.02%


Nb of obs for fe_cst


fe_cst,nb_obs,percentage
319694,1346,0.02%
363529,1305,0.02%
136210,1250,0.02%
206440,1202,0.02%
110334,1185,0.02%
136192,1185,0.02%
312035,1165,0.02%
132238,1162,0.02%
145381,1136,0.02%
318556,1108,0.02%


Nb of obs for fe_ckr


fe_ckr,nb_obs,percentage
53204,1013,0.02%
48471,989,0.02%
118110,977,0.02%
321366,912,0.02%
400193,867,0.01%
76619,812,0.01%
180779,807,0.01%
177696,807,0.01%
320046,806,0.01%
398454,800,0.01%


Nb of obs for fe_csrt


fe_csrt,nb_obs,percentage
448140,914,0.02%
87779,848,0.01%
87781,807,0.01%
87791,806,0.01%
448144,793,0.01%
33905,792,0.01%
541759,775,0.01%
31145,765,0.01%
172565,759,0.01%
150720,739,0.01%


Nb of obs for fe_kt


fe_kt,nb_obs,percentage
3581,5016,0.09%
3584,4438,0.08%
3583,4284,0.07%
25377,4025,0.07%
3582,3992,0.07%
12061,3483,0.06%
3579,3466,0.06%
3580,3457,0.06%
8615,3365,0.06%
3573,3228,0.06%


Nb of obs for fe_pj


fe_pj,nb_obs,percentage
46049,921,0.02%
148320,831,0.01%
268336,723,0.01%
46011,693,0.01%
313917,689,0.01%
136393,688,0.01%
46048,688,0.01%
46075,676,0.01%
131266,675,0.01%
46084,667,0.01%


Nb of obs for fe_jt


fe_jt,nb_obs,percentage
52,49628,0.85%
110,44350,0.76%
139,40081,0.69%
87,38538,0.66%
174,38414,0.66%
32,37902,0.65%
92,36690,0.63%
141,32377,0.56%
53,32291,0.55%
164,31376,0.54%


Nb of obs for fe_ct


fe_ct,nb_obs,percentage
51,49628,0.85%
109,44350,0.76%
138,40081,0.69%
86,38538,0.66%
173,38414,0.66%
32,37902,0.65%
91,36690,0.63%
140,32377,0.56%
52,32291,0.55%
163,31376,0.54%


### Count obs by pair of groups

You need to pass the primary group in the cell below

- Index: primary group
- Columns: Secondary key -> All the categorical variables in the dataset
- nb_obs: Number of observations per primary group value
- Total: Total number of observations per primary group value (sum by row)
- percentage: Percentage of observations per primary group value over the total number of observations per primary group value (sum by row)

Returns the top 10 only
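
The layout described above (counts per pair, a row total, and row percentages) can be sketched with a pivot on toy counts (not real data). It is a simplified illustration of the target shape, not a replacement for the notebook cell, and it leaves out the top 10 filter and the styling.

```python
import pandas as pd

# Toy counts, shaped like the query output (one row per pair)
output = pd.DataFrame({
    "year": ["2003", "2004", "2003", "2004"],
    "regime": ["ELIGIBLE", "ELIGIBLE", "NOT_ELIGIBLE", "NOT_ELIGIBLE"],
    "nb_obs": [30, 70, 10, 40],
})

# Secondary key in rows, primary key (year) in columns
counts = output.pivot_table(
    index="regime", columns="year", values="nb_obs", aggfunc="sum", fill_value=0
)

# Share of each year within a row; computed before adding the total column
percentage = counts.div(counts.sum(axis=1), axis=0)

# Row total across all years
counts["total"] = counts.sum(axis=1)

print(counts)
print(percentage)
```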

In [23]:
primary_key = "year"

In [24]:
for field in schema["StorageDescriptor"]["Columns"]:
    if field["Type"] in ["string", "object", "varchar(12)"]:
        if field["Name"] != primary_key:
            print(
                "Nb of obs for the primary group {} and {}".format(
                    primary_key, field["Name"]
                )
            )
            query = parameters["ANALYSIS"]["CATEGORICAL"]["MULTI_PAIR"].format(
                db, table, primary_key, field["Name"]
            )

            output = s3.run_query(
                query=query,
                database=db,
                s3_output=s3_output,
                filename="count_categorical_{}_{}".format(
                    primary_key, field["Name"]
                ),  # Add filename to print dataframe
                destination_key=None,  # Add destination key if need to copy output
            )

            display(
                (
                    pd.concat(
                        [
                            (
                                output.loc[
                                    lambda x: x[field["Name"]].isin(
                                        (
                                            output.assign(
                                                total_secondary=lambda x: x["nb_obs"]
                                                .groupby([x[field["Name"]]])
                                                .transform("sum")
                                            )
                                            .drop_duplicates(
                                                subset="total_secondary", keep="last"
                                            )
                                            .sort_values(
                                                by=["total_secondary"], ascending=False
                                            )
                                            .iloc[:10, 1]
                                            .to_list()
                                        )
                                    )
                                ]
                                .set_index([primary_key, field["Name"]])
                                .unstack([0])
                                .fillna(0)
                                .assign(total=lambda x: x.sum(axis=1))
                                .sort_values(by=["total"])
                            ),
                            (
                                output.loc[
                                    lambda x: x[field["Name"]].isin(
                                        (
                                            output.assign(
                                                total_secondary=lambda x: x["nb_obs"]
                                                .groupby([x[field["Name"]]])
                                                .transform("sum")
                                            )
                                            .drop_duplicates(
                                                subset="total_secondary", keep="last"
                                            )
                                            .sort_values(
                                                by=["total_secondary"], ascending=False
                                            )
                                            .iloc[:10, 1]
                                            .to_list()
                                        )
                                    )
                                ]
                                .rename(columns={"nb_obs": "percentage"})
                                .set_index([primary_key, field["Name"]])
                                .unstack([0])
                                .fillna(0)
                                .apply(lambda x: x / x.sum(), axis=1)
                            ),
                        ],
                        axis=1,
                    )
                    .fillna(0)
                    # .sort_index(axis=1, level=1)
                    .style.format("{0:,.2f}", subset=["nb_obs", "total"])
                    .bar(subset=["total"], color="#d65f5f")
                    .format("{0:,.2%}", subset=("percentage"))
                    .background_gradient(
                        cmap=sns.light_palette("green", as_cmap=True), subset=("nb_obs")
                    )
                )
            )

Nb of obs for the primary group year and cityen


cityen,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,total,percentage,percentage,percentage,percentage,percentage,percentage,percentage,percentage
year,2003,2004,2005,2006,2007,2008,2009,2010,,2003,2004,2005,2006,2007,2008,2009,2010
Nanjing,15548.0,18648.0,22154.0,20790.0,20587.0,16264.0,18062.0,19984.0,152037.0,10.23%,12.27%,14.57%,13.67%,13.54%,10.70%,11.88%,13.14%
Tianjin,9656.0,12151.0,15019.0,15347.0,32480.0,25271.0,23729.0,24058.0,157711.0,6.12%,7.70%,9.52%,9.73%,20.59%,16.02%,15.05%,15.25%
Qingdao,14397.0,17844.0,20509.0,20454.0,28797.0,21997.0,21157.0,22294.0,167449.0,8.60%,10.66%,12.25%,12.22%,17.20%,13.14%,12.63%,13.31%
Hangzhou,16791.0,20598.0,24991.0,25700.0,27677.0,20237.0,20893.0,23844.0,180731.0,9.29%,11.40%,13.83%,14.22%,15.31%,11.20%,11.56%,13.19%
Beijing,12458.0,15680.0,22707.0,23795.0,34266.0,26087.0,26315.0,31456.0,192764.0,6.46%,8.13%,11.78%,12.34%,17.78%,13.53%,13.65%,16.32%
Ningbo,15251.0,20065.0,30173.0,29919.0,40495.0,28620.0,29364.0,34760.0,228647.0,6.67%,8.78%,13.20%,13.09%,17.71%,12.52%,12.84%,15.20%
Canton,22614.0,27648.0,35209.0,29678.0,40567.0,30283.0,36984.0,30353.0,253336.0,8.93%,10.91%,13.90%,11.71%,16.01%,11.95%,14.60%,11.98%
Dongguan,24876.0,22887.0,27113.0,22277.0,61817.0,64100.0,62674.0,52087.0,337831.0,7.36%,6.77%,8.03%,6.59%,18.30%,18.97%,18.55%,15.42%
Shanghai,31630.0,41513.0,48814.0,51563.0,100425.0,80068.0,90661.0,89935.0,534609.0,5.92%,7.77%,9.13%,9.64%,18.78%,14.98%,16.96%,16.82%
Shenzhen,28234.0,33826.0,48462.0,41149.0,97079.0,128865.0,85246.0,85469.0,548330.0,5.15%,6.17%,8.84%,7.50%,17.70%,23.50%,15.55%,15.59%


Nb of obs for the primary group year and geocode4_corr


geocode4_corr,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,total,percentage,percentage,percentage,percentage,percentage,percentage,percentage,percentage
year,2003,2004,2005,2006,2007,2008,2009,2010,,2003,2004,2005,2006,2007,2008,2009,2010
1101,12458.0,15680.0,22707.0,23795.0,34266.0,26087.0,26315.0,31456.0,192764.0,6.46%,8.13%,11.78%,12.34%,17.78%,13.53%,13.65%,16.32%
1201,9656.0,12151.0,15019.0,15347.0,32480.0,25271.0,23729.0,24058.0,157711.0,6.12%,7.70%,9.52%,9.73%,20.59%,16.02%,15.05%,15.25%
3101,31630.0,41513.0,48814.0,51563.0,100425.0,80068.0,90661.0,89935.0,534609.0,5.92%,7.77%,9.13%,9.64%,18.78%,14.98%,16.96%,16.82%
3201,15548.0,18648.0,22154.0,20790.0,20587.0,16264.0,18062.0,19984.0,152037.0,10.23%,12.27%,14.57%,13.67%,13.54%,10.70%,11.88%,13.14%
3301,16791.0,20598.0,24991.0,25700.0,27677.0,20237.0,20893.0,23844.0,180731.0,9.29%,11.40%,13.83%,14.22%,15.31%,11.20%,11.56%,13.19%
3302,15251.0,20065.0,30173.0,29919.0,40495.0,28620.0,29364.0,34760.0,228647.0,6.67%,8.78%,13.20%,13.09%,17.71%,12.52%,12.84%,15.20%
3702,14397.0,17844.0,20509.0,20454.0,28797.0,21997.0,21157.0,22294.0,167449.0,8.60%,10.66%,12.25%,12.22%,17.20%,13.14%,12.63%,13.31%
4401,22614.0,27648.0,35209.0,29678.0,40567.0,30283.0,36984.0,30353.0,253336.0,8.93%,10.91%,13.90%,11.71%,16.01%,11.95%,14.60%,11.98%
4403,28234.0,33826.0,48462.0,41149.0,97079.0,128865.0,85246.0,85469.0,548330.0,5.15%,6.17%,8.84%,7.50%,17.70%,23.50%,15.55%,15.59%
4419,24876.0,22887.0,27113.0,22277.0,61817.0,64100.0,62674.0,52087.0,337831.0,7.36%,6.77%,8.03%,6.59%,18.30%,18.97%,18.55%,15.42%


Nb of obs for the primary group year and regime


regime,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,total,percentage,percentage,percentage,percentage,percentage,percentage,percentage,percentage
year,2003,2004,2005,2006,2007,2008,2009,2010,,2003,2004,2005,2006,2007,2008,2009,2010
NOT_ELIGIBLE,31807.0,32806.0,35740.0,33455.0,213220.0,165638.0,198018.0,200274.0,910958.0,3.49%,3.60%,3.92%,3.67%,23.41%,18.18%,21.74%,21.98%
ELIGIBLE,395230.0,513864.0,669823.0,654123.0,785355.0,667630.0,607684.0,628278.0,4921987.0,8.03%,10.44%,13.61%,13.29%,15.96%,13.56%,12.35%,12.76%


Nb of obs for the primary group year and hs6


Unnamed: 0_level_0,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,total,percentage,percentage,percentage,percentage,percentage,percentage,percentage,percentage
year,2003,2004,2005,2006,2007,2008,2009,2010,Unnamed: 9_level_1,2003,2004,2005,2006,2007,2008,2009,2010
hs6,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2
392640,1701.0,2088.0,1971.0,1768.0,3228.0,2130.0,2198.0,1960.0,17044.0,9.98%,12.25%,11.56%,10.37%,18.94%,12.50%,12.90%,11.50%
392690,2393.0,3063.0,3466.0,3457.0,5016.0,3992.0,4284.0,4438.0,30109.0,7.95%,10.17%,11.51%,11.48%,16.66%,13.26%,14.23%,14.74%
420212,1988.0,2298.0,2465.0,2182.0,4025.0,2961.0,3082.0,2974.0,21975.0,9.05%,10.46%,11.22%,9.93%,18.32%,13.47%,14.03%,13.53%
640299,1196.0,1363.0,1578.0,1508.0,2773.0,2235.0,2464.0,2611.0,15728.0,7.60%,8.67%,10.03%,9.59%,17.63%,14.21%,15.67%,16.60%
732690,1684.0,2059.0,2597.0,2522.0,3483.0,2741.0,2843.0,2948.0,20877.0,8.07%,9.86%,12.44%,12.08%,16.68%,13.13%,13.62%,14.12%
848180,1069.0,1325.0,1692.0,1745.0,2989.0,2669.0,2686.0,2907.0,17082.0,6.26%,7.76%,9.91%,10.22%,17.50%,15.62%,15.72%,17.02%
850440,913.0,1235.0,1475.0,1538.0,2990.0,2115.0,3109.0,3365.0,16740.0,5.45%,7.38%,8.81%,9.19%,17.86%,12.63%,18.57%,20.10%
870899,889.0,1369.0,1895.0,2065.0,2896.0,2560.0,2362.0,2191.0,16227.0,5.48%,8.44%,11.68%,12.73%,17.85%,15.78%,14.56%,13.50%
940320,967.0,1310.0,1702.0,1699.0,2965.0,2226.0,2163.0,2402.0,15434.0,6.27%,8.49%,11.03%,11.01%,19.21%,14.42%,14.01%,15.56%
940540,1263.0,1506.0,1775.0,1617.0,3019.0,2338.0,1955.0,2082.0,15555.0,8.12%,9.68%,11.41%,10.40%,19.41%,15.03%,12.57%,13.38%


Nb of obs for the primary group year and hs4


Unnamed: 0_level_0,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,total,percentage,percentage,percentage,percentage,percentage,percentage,percentage,percentage
year,2003,2004,2005,2006,2007,2008,2009,2010,Unnamed: 9_level_1,2003,2004,2005,2006,2007,2008,2009,2010
hs4,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2
3923,3351.0,4467.0,5801.0,5740.0,8603.0,6269.0,6783.0,6968.0,47982.0,6.98%,9.31%,12.09%,11.96%,17.93%,13.07%,14.14%,14.52%
3926,6082.0,7647.0,8348.0,7927.0,12804.0,9062.0,9677.0,9602.0,71149.0,8.55%,10.75%,11.73%,11.14%,18.00%,12.74%,13.60%,13.50%
4202,6817.0,8584.0,9561.0,8526.0,15171.0,10506.0,10988.0,11103.0,81256.0,8.39%,10.56%,11.77%,10.49%,18.67%,12.93%,13.52%,13.66%
6104,5792.0,7222.0,9059.0,8013.0,12278.0,10215.0,11021.0,11817.0,75417.0,7.68%,9.58%,12.01%,10.62%,16.28%,13.54%,14.61%,15.67%
6204,6541.0,7813.0,10633.0,9790.0,14071.0,11785.0,11459.0,11089.0,83181.0,7.86%,9.39%,12.78%,11.77%,16.92%,14.17%,13.78%,13.33%
8516,2730.0,3333.0,4015.0,3743.0,9485.0,8126.0,8926.0,8891.0,49249.0,5.54%,6.77%,8.15%,7.60%,19.26%,16.50%,18.12%,18.05%
8708,3346.0,5159.0,7648.0,7796.0,10078.0,9801.0,9298.0,9081.0,62207.0,5.38%,8.29%,12.29%,12.53%,16.20%,15.76%,14.95%,14.60%
9401,3744.0,5398.0,7044.0,6584.0,12312.0,9230.0,9126.0,10448.0,63886.0,5.86%,8.45%,11.03%,10.31%,19.27%,14.45%,14.28%,16.35%
9403,4807.0,6647.0,8617.0,8332.0,12000.0,10353.0,10540.0,10889.0,72185.0,6.66%,9.21%,11.94%,11.54%,16.62%,14.34%,14.60%,15.08%
9405,5086.0,6243.0,7294.0,6673.0,11560.0,8559.0,8474.0,8560.0,62449.0,8.14%,10.00%,11.68%,10.69%,18.51%,13.71%,13.57%,13.71%


Nb of obs for the primary group year and hs3


Unnamed: 0_level_0,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,total,percentage,percentage,percentage,percentage,percentage,percentage,percentage,percentage
year,2003,2004,2005,2006,2007,2008,2009,2010,Unnamed: 9_level_1,2003,2004,2005,2006,2007,2008,2009,2010
hs3,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2
392,15076.0,20347.0,24751.0,24083.0,38056.0,27210.0,28479.0,29437.0,207439.0,7.27%,9.81%,11.93%,11.61%,18.35%,13.12%,13.73%,14.19%
610,16722.0,20207.0,27422.0,23288.0,34436.0,28056.0,26891.0,27287.0,204309.0,8.18%,9.89%,13.42%,11.40%,16.85%,13.73%,13.16%,13.36%
620,21078.0,24827.0,31919.0,29117.0,40289.0,34692.0,32354.0,31342.0,245618.0,8.58%,10.11%,13.00%,11.85%,16.40%,14.12%,13.17%,12.76%
630,11119.0,13910.0,17629.0,16468.0,22044.0,19528.0,18764.0,18571.0,138033.0,8.06%,10.08%,12.77%,11.93%,15.97%,14.15%,13.59%,13.45%
820,9789.0,11550.0,15076.0,14361.0,20926.0,17239.0,16800.0,17194.0,122935.0,7.96%,9.40%,12.26%,11.68%,17.02%,14.02%,13.67%,13.99%
841,8597.0,11603.0,15762.0,16915.0,27217.0,24980.0,24676.0,27523.0,157273.0,5.47%,7.38%,10.02%,10.76%,17.31%,15.88%,15.69%,17.50%
850,8740.0,11048.0,13748.0,14844.0,22857.0,19521.0,20711.0,22587.0,134056.0,6.52%,8.24%,10.26%,11.07%,17.05%,14.56%,15.45%,16.85%
851,10773.0,13132.0,16639.0,15867.0,27305.0,22860.0,25658.0,26519.0,158753.0,6.79%,8.27%,10.48%,9.99%,17.20%,14.40%,16.16%,16.70%
853,8405.0,10295.0,12856.0,12345.0,20426.0,18724.0,18648.0,20924.0,122623.0,6.85%,8.40%,10.48%,10.07%,16.66%,15.27%,15.21%,17.06%
940,16028.0,21607.0,27584.0,26360.0,43553.0,34263.0,33874.0,35659.0,238928.0,6.71%,9.04%,11.54%,11.03%,18.23%,14.34%,14.18%,14.92%


Nb of obs for the primary group year and country_en


Unnamed: 0_level_0,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,total,percentage,percentage,percentage,percentage,percentage,percentage,percentage,percentage
year,2003,2004,2005,2006,2007,2008,2009,2010,Unnamed: 9_level_1,2003,2004,2005,2006,2007,2008,2009,2010
Spain,7973.0,10153.0,13991.0,14167.0,20696.0,17486.0,16468.0,16791.0,117725.0,6.77%,8.62%,11.88%,12.03%,17.58%,14.85%,13.99%,14.26%
Canada,9971.0,12141.0,16465.0,16343.0,23538.0,18381.0,18000.0,18833.0,133672.0,7.46%,9.08%,12.32%,12.23%,17.61%,13.75%,13.47%,14.09%
Italy,10215.0,12792.0,17363.0,17162.0,23586.0,20206.0,19462.0,19494.0,140280.0,7.28%,9.12%,12.38%,12.23%,16.81%,14.40%,13.87%,13.90%
United Kingdom,11084.0,14051.0,18244.0,17567.0,25152.0,19948.0,20631.0,21501.0,148178.0,7.48%,9.48%,12.31%,11.86%,16.97%,13.46%,13.92%,14.51%
Australia,12002.0,15134.0,18739.0,17874.0,24410.0,20102.0,20516.0,21241.0,150018.0,8.00%,10.09%,12.49%,11.91%,16.27%,13.40%,13.68%,14.16%
Germany,12616.0,15777.0,20925.0,20571.0,29614.0,24197.0,24474.0,24883.0,173057.0,7.29%,9.12%,12.09%,11.89%,17.11%,13.98%,14.14%,14.38%
Korea,17806.0,20954.0,27063.0,24786.0,32291.0,23374.0,19781.0,20704.0,186759.0,9.53%,11.22%,14.49%,13.27%,17.29%,12.52%,10.59%,11.09%
Japan,20786.0,24137.0,27908.0,25286.0,30811.0,25030.0,24033.0,24000.0,201991.0,10.29%,11.95%,13.82%,12.52%,15.25%,12.39%,11.90%,11.88%
Hong Kong,28691.0,31376.0,32377.0,27975.0,44350.0,23121.0,27585.0,26638.0,242113.0,11.85%,12.96%,13.37%,11.55%,18.32%,9.55%,11.39%,11.00%
United States,25412.0,30025.0,38538.0,36690.0,49628.0,38414.0,37902.0,40081.0,296690.0,8.57%,10.12%,12.99%,12.37%,16.73%,12.95%,12.77%,13.51%


Nb of obs for the primary group year and iso_alpha


Unnamed: 0_level_0,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,total,percentage,percentage,percentage,percentage,percentage,percentage,percentage,percentage
year,2003,2004,2005,2006,2007,2008,2009,2010,Unnamed: 9_level_1,2003,2004,2005,2006,2007,2008,2009,2010
ESP,7973.0,10153.0,13991.0,14167.0,20696.0,17486.0,16468.0,16791.0,117725.0,6.77%,8.62%,11.88%,12.03%,17.58%,14.85%,13.99%,14.26%
CAN,9971.0,12141.0,16465.0,16343.0,23538.0,18381.0,18000.0,18833.0,133672.0,7.46%,9.08%,12.32%,12.23%,17.61%,13.75%,13.47%,14.09%
ITA,10215.0,12792.0,17363.0,17162.0,23586.0,20206.0,19462.0,19494.0,140280.0,7.28%,9.12%,12.38%,12.23%,16.81%,14.40%,13.87%,13.90%
GBR,11084.0,14051.0,18244.0,17567.0,25152.0,19948.0,20631.0,21501.0,148178.0,7.48%,9.48%,12.31%,11.86%,16.97%,13.46%,13.92%,14.51%
AUS,12002.0,15134.0,18739.0,17874.0,24410.0,20102.0,20516.0,21241.0,150018.0,8.00%,10.09%,12.49%,11.91%,16.27%,13.40%,13.68%,14.16%
KOR,15633.0,18076.0,22688.0,21788.0,25686.0,19546.0,16325.0,16745.0,156487.0,9.99%,11.55%,14.50%,13.92%,16.41%,12.49%,10.43%,10.70%
DEU,12616.0,15777.0,20925.0,20571.0,29614.0,24197.0,24474.0,24883.0,173057.0,7.29%,9.12%,12.09%,11.89%,17.11%,13.98%,14.14%,14.38%
JPN,20786.0,24137.0,27908.0,25286.0,30811.0,25030.0,24033.0,24000.0,201991.0,10.29%,11.95%,13.82%,12.52%,15.25%,12.39%,11.90%,11.88%
HKG,28691.0,31376.0,32377.0,27975.0,44350.0,23121.0,27585.0,26638.0,242113.0,11.85%,12.96%,13.37%,11.55%,18.32%,9.55%,11.39%,11.00%
USA,25412.0,30025.0,38538.0,36690.0,49628.0,38414.0,37902.0,40081.0,296690.0,8.57%,10.12%,12.99%,12.37%,16.73%,12.95%,12.77%,13.51%


Nb of obs for the primary group year and income_group


Unnamed: 0_level_0,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,total,percentage,percentage,percentage,percentage,percentage,percentage,percentage,percentage
year,2003,2004,2005,2006,2007,2008,2009,2010,Unnamed: 9_level_1,2003,2004,2005,2006,2007,2008,2009,2010
Low income,7597.0,10529.0,13810.0,12876.0,19633.0,18599.0,19732.0,21413.0,124189.0,6.12%,8.48%,11.12%,10.37%,15.81%,14.98%,15.89%,17.24%
High income: nonOECD,32686.0,44035.0,59661.0,59492.0,89179.0,75757.0,71154.0,72804.0,504768.0,6.48%,8.72%,11.82%,11.79%,17.67%,15.01%,14.10%,14.42%
,52990.0,64876.0,75491.0,66551.0,119872.0,83094.0,84685.0,83322.0,630881.0,8.40%,10.28%,11.97%,10.55%,19.00%,13.17%,13.42%,13.21%
Lower middle income,59665.0,79383.0,104081.0,101850.0,154909.0,134568.0,127008.0,132322.0,893786.0,6.68%,8.88%,11.64%,11.40%,17.33%,15.06%,14.21%,14.80%
Upper middle income,76502.0,102326.0,134721.0,137841.0,185637.0,165436.0,160453.0,167603.0,1130519.0,6.77%,9.05%,11.92%,12.19%,16.42%,14.63%,14.19%,14.83%
High income: OECD,197597.0,245521.0,317799.0,308968.0,429345.0,355814.0,342670.0,351088.0,2548802.0,7.75%,9.63%,12.47%,12.12%,16.84%,13.96%,13.44%,13.77%


Nb of obs for the primary group year and fe_ck


Unnamed: 0_level_0,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,total,percentage,percentage,percentage,percentage,percentage,percentage,percentage,percentage
year,2003,2004,2005,2006,2007,2008,2009,2010,Unnamed: 9_level_1,2003,2004,2005,2006,2007,2008,2009,2010
fe_ck,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2
13217,170.0,166.0,172.0,171.0,125.0,157.0,86.0,128.0,1175.0,14.47%,14.13%,14.64%,14.55%,10.64%,13.36%,7.32%,10.89%
35102,106.0,115.0,121.0,123.0,235.0,185.0,248.0,222.0,1355.0,7.82%,8.49%,8.93%,9.08%,17.34%,13.65%,18.30%,16.38%
38209,133.0,120.0,113.0,98.0,198.0,221.0,132.0,131.0,1146.0,11.61%,10.47%,9.86%,8.55%,17.28%,19.28%,11.52%,11.43%
38210,171.0,153.0,164.0,144.0,232.0,257.0,235.0,221.0,1577.0,10.84%,9.70%,10.40%,9.13%,14.71%,16.30%,14.90%,14.01%
39351,155.0,140.0,150.0,126.0,192.0,202.0,183.0,179.0,1327.0,11.68%,10.55%,11.30%,9.50%,14.47%,15.22%,13.79%,13.49%
85703,149.0,145.0,161.0,155.0,159.0,175.0,150.0,172.0,1266.0,11.77%,11.45%,12.72%,12.24%,12.56%,13.82%,11.85%,13.59%
88136,121.0,131.0,157.0,169.0,191.0,248.0,199.0,213.0,1429.0,8.47%,9.17%,10.99%,11.83%,13.37%,17.35%,13.93%,14.91%
132035,90.0,89.0,115.0,104.0,193.0,235.0,168.0,180.0,1174.0,7.67%,7.58%,9.80%,8.86%,16.44%,20.02%,14.31%,15.33%
241592,97.0,96.0,100.0,96.0,212.0,163.0,187.0,155.0,1106.0,8.77%,8.68%,9.04%,8.68%,19.17%,14.74%,16.91%,14.01%
242446,142.0,145.0,154.0,124.0,208.0,231.0,185.0,176.0,1365.0,10.40%,10.62%,11.28%,9.08%,15.24%,16.92%,13.55%,12.89%


Nb of obs for the primary group year and fe_cst


Unnamed: 0_level_0,nb_obs,nb_obs,nb_obs,total,percentage,percentage,percentage
year,2007,2008,2010,Unnamed: 4_level_1,2007,2008,2010
fe_cst,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2
110064,0.0,1104.0,0.0,1104.0,0.00%,100.00%,0.00%
110334,0.0,1185.0,0.0,1185.0,0.00%,100.00%,0.00%
132238,0.0,1162.0,0.0,1162.0,0.00%,100.00%,0.00%
136210,0.0,1250.0,0.0,1250.0,0.00%,100.00%,0.00%
145381,0.0,1136.0,0.0,1136.0,0.00%,100.00%,0.00%
206440,0.0,1202.0,0.0,1202.0,0.00%,100.00%,0.00%
312035,0.0,0.0,1165.0,1165.0,0.00%,0.00%,100.00%
318556,1108.0,0.0,0.0,1108.0,100.00%,0.00%,0.00%
319694,0.0,1346.0,0.0,1346.0,0.00%,100.00%,0.00%
363529,0.0,1305.0,0.0,1305.0,0.00%,100.00%,0.00%


Nb of obs for the primary group year and fe_ckr


Unnamed: 0_level_0,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,total,percentage,percentage,percentage,percentage,percentage,percentage,percentage,percentage
year,2003,2004,2005,2006,2007,2008,2009,2010,Unnamed: 9_level_1,2003,2004,2005,2006,2007,2008,2009,2010
fe_ckr,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2
48471,104.0,114.0,120.0,123.0,134.0,127.0,129.0,138.0,989.0,10.52%,11.53%,12.13%,12.44%,13.55%,12.84%,13.04%,13.95%
53202,124.0,112.0,107.0,90.0,112.0,107.0,66.0,81.0,799.0,15.52%,14.02%,13.39%,11.26%,14.02%,13.39%,8.26%,10.14%
53204,122.0,115.0,125.0,110.0,133.0,136.0,136.0,136.0,1013.0,12.04%,11.35%,12.34%,10.86%,13.13%,13.43%,13.43%,13.43%
76619,131.0,136.0,132.0,136.0,83.0,68.0,75.0,51.0,812.0,16.13%,16.75%,16.26%,16.75%,10.22%,8.37%,9.24%,6.28%
118110,88.0,98.0,124.0,133.0,118.0,157.0,129.0,130.0,977.0,9.01%,10.03%,12.69%,13.61%,12.08%,16.07%,13.20%,13.31%
177696,69.0,92.0,84.0,96.0,112.0,110.0,114.0,130.0,807.0,8.55%,11.40%,10.41%,11.90%,13.88%,13.63%,14.13%,16.11%
320046,94.0,95.0,99.0,94.0,120.0,114.0,95.0,95.0,806.0,11.66%,11.79%,12.28%,11.66%,14.89%,14.14%,11.79%,11.79%
321366,113.0,121.0,122.0,104.0,126.0,123.0,99.0,104.0,912.0,12.39%,13.27%,13.38%,11.40%,13.82%,13.49%,10.86%,11.40%
398454,80.0,131.0,143.0,139.0,49.0,143.0,59.0,56.0,800.0,10.00%,16.38%,17.88%,17.38%,6.12%,17.88%,7.38%,7.00%
400193,97.0,111.0,115.0,116.0,98.0,119.0,102.0,109.0,867.0,11.19%,12.80%,13.26%,13.38%,11.30%,13.73%,11.76%,12.57%


Nb of obs for the primary group year and fe_csrt


Unnamed: 0_level_0,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,total,percentage,percentage,percentage,percentage,percentage
year,2004,2005,2006,2008,2009,Unnamed: 6_level_1,2004,2005,2006,2008,2009
fe_csrt,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2
31145,0.0,0.0,765.0,0.0,0.0,765.0,0.00%,0.00%,100.00%,0.00%,0.00%
33905,0.0,0.0,0.0,0.0,792.0,792.0,0.00%,0.00%,0.00%,0.00%,100.00%
87779,0.0,848.0,0.0,0.0,0.0,848.0,0.00%,100.00%,0.00%,0.00%,0.00%
87781,0.0,0.0,807.0,0.0,0.0,807.0,0.00%,0.00%,100.00%,0.00%,0.00%
87791,806.0,0.0,0.0,0.0,0.0,806.0,100.00%,0.00%,0.00%,0.00%,0.00%
150720,0.0,0.0,0.0,739.0,0.0,739.0,0.00%,0.00%,0.00%,100.00%,0.00%
172565,0.0,0.0,0.0,759.0,0.0,759.0,0.00%,0.00%,0.00%,100.00%,0.00%
448140,0.0,914.0,0.0,0.0,0.0,914.0,0.00%,100.00%,0.00%,0.00%,0.00%
448144,0.0,0.0,0.0,793.0,0.0,793.0,0.00%,0.00%,0.00%,100.00%,0.00%
541759,0.0,0.0,775.0,0.0,0.0,775.0,0.00%,0.00%,100.00%,0.00%,0.00%


Nb of obs for the primary group year and fe_kt


Unnamed: 0_level_0,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,total,percentage,percentage,percentage,percentage,percentage,percentage
year,2005,2006,2007,2008,2009,2010,Unnamed: 7_level_1,2005,2006,2007,2008,2009,2010
fe_kt,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2
3573,0.0,0.0,3228.0,0.0,0.0,0.0,3228.0,0.00%,0.00%,100.00%,0.00%,0.00%,0.00%
3579,3466.0,0.0,0.0,0.0,0.0,0.0,3466.0,100.00%,0.00%,0.00%,0.00%,0.00%,0.00%
3580,0.0,3457.0,0.0,0.0,0.0,0.0,3457.0,0.00%,100.00%,0.00%,0.00%,0.00%,0.00%
3581,0.0,0.0,5016.0,0.0,0.0,0.0,5016.0,0.00%,0.00%,100.00%,0.00%,0.00%,0.00%
3582,0.0,0.0,0.0,3992.0,0.0,0.0,3992.0,0.00%,0.00%,0.00%,100.00%,0.00%,0.00%
3583,0.0,0.0,0.0,0.0,4284.0,0.0,4284.0,0.00%,0.00%,0.00%,0.00%,100.00%,0.00%
3584,0.0,0.0,0.0,0.0,0.0,4438.0,4438.0,0.00%,0.00%,0.00%,0.00%,0.00%,100.00%
8615,0.0,0.0,0.0,0.0,0.0,3365.0,3365.0,0.00%,0.00%,0.00%,0.00%,0.00%,100.00%
12061,0.0,0.0,3483.0,0.0,0.0,0.0,3483.0,0.00%,0.00%,100.00%,0.00%,0.00%,0.00%
25377,0.0,0.0,4025.0,0.0,0.0,0.0,4025.0,0.00%,0.00%,100.00%,0.00%,0.00%,0.00%


Nb of obs for the primary group year and fe_pj


Unnamed: 0_level_0,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,total,percentage,percentage,percentage,percentage,percentage,percentage,percentage,percentage
year,2003,2004,2005,2006,2007,2008,2009,2010,Unnamed: 9_level_1,2003,2004,2005,2006,2007,2008,2009,2010
fe_pj,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2
46011,74.0,80.0,86.0,79.0,117.0,81.0,92.0,84.0,693.0,10.68%,11.54%,12.41%,11.40%,16.88%,11.69%,13.28%,12.12%
46048,57.0,81.0,86.0,87.0,103.0,96.0,86.0,92.0,688.0,8.28%,11.77%,12.50%,12.65%,14.97%,13.95%,12.50%,13.37%
46049,88.0,105.0,104.0,105.0,142.0,119.0,121.0,137.0,921.0,9.55%,11.40%,11.29%,11.40%,15.42%,12.92%,13.14%,14.88%
46075,61.0,67.0,79.0,71.0,103.0,101.0,93.0,101.0,676.0,9.02%,9.91%,11.69%,10.50%,15.24%,14.94%,13.76%,14.94%
46084,65.0,72.0,81.0,78.0,121.0,71.0,90.0,89.0,667.0,9.75%,10.79%,12.14%,11.69%,18.14%,10.64%,13.49%,13.34%
131266,72.0,79.0,86.0,90.0,98.0,77.0,88.0,85.0,675.0,10.67%,11.70%,12.74%,13.33%,14.52%,11.41%,13.04%,12.59%
148320,84.0,99.0,102.0,103.0,134.0,97.0,108.0,104.0,831.0,10.11%,11.91%,12.27%,12.39%,16.13%,11.67%,13.00%,12.52%
268336,74.0,77.0,86.0,76.0,124.0,92.0,92.0,102.0,723.0,10.24%,10.65%,11.89%,10.51%,17.15%,12.72%,12.72%,14.11%
279689,60.0,72.0,84.0,71.0,102.0,83.0,89.0,97.0,658.0,9.12%,10.94%,12.77%,10.79%,15.50%,12.61%,13.53%,14.74%
313917,67.0,83.0,90.0,80.0,105.0,83.0,90.0,91.0,689.0,9.72%,12.05%,13.06%,11.61%,15.24%,12.05%,13.06%,13.21%


Nb of obs for the primary group year and fe_jt


Unnamed: 0_level_0,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,total,percentage,percentage,percentage,percentage,percentage,percentage,percentage
year,2004,2005,2006,2007,2008,2009,2010,Unnamed: 8_level_1,2004,2005,2006,2007,2008,2009,2010
fe_jt,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2
32,0.0,0.0,0.0,0.0,0.0,37902.0,0.0,37902.0,0.00%,0.00%,0.00%,0.00%,0.00%,100.00%,0.00%
52,0.0,0.0,0.0,49628.0,0.0,0.0,0.0,49628.0,0.00%,0.00%,0.00%,100.00%,0.00%,0.00%,0.00%
53,0.0,0.0,0.0,32291.0,0.0,0.0,0.0,32291.0,0.00%,0.00%,0.00%,100.00%,0.00%,0.00%,0.00%
87,0.0,38538.0,0.0,0.0,0.0,0.0,0.0,38538.0,0.00%,100.00%,0.00%,0.00%,0.00%,0.00%,0.00%
92,0.0,0.0,36690.0,0.0,0.0,0.0,0.0,36690.0,0.00%,0.00%,100.00%,0.00%,0.00%,0.00%,0.00%
110,0.0,0.0,0.0,44350.0,0.0,0.0,0.0,44350.0,0.00%,0.00%,0.00%,100.00%,0.00%,0.00%,0.00%
139,0.0,0.0,0.0,0.0,0.0,0.0,40081.0,40081.0,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,100.00%
141,0.0,32377.0,0.0,0.0,0.0,0.0,0.0,32377.0,0.00%,100.00%,0.00%,0.00%,0.00%,0.00%,0.00%
164,31376.0,0.0,0.0,0.0,0.0,0.0,0.0,31376.0,100.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%
174,0.0,0.0,0.0,0.0,38414.0,0.0,0.0,38414.0,0.00%,0.00%,0.00%,0.00%,100.00%,0.00%,0.00%


Nb of obs for the primary group year and fe_ct


Unnamed: 0_level_0,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,nb_obs,total,percentage,percentage,percentage,percentage,percentage,percentage,percentage
year,2004,2005,2006,2007,2008,2009,2010,Unnamed: 8_level_1,2004,2005,2006,2007,2008,2009,2010
fe_ct,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2
32,0.0,0.0,0.0,0.0,0.0,37902.0,0.0,37902.0,0.00%,0.00%,0.00%,0.00%,0.00%,100.00%,0.00%
51,0.0,0.0,0.0,49628.0,0.0,0.0,0.0,49628.0,0.00%,0.00%,0.00%,100.00%,0.00%,0.00%,0.00%
52,0.0,0.0,0.0,32291.0,0.0,0.0,0.0,32291.0,0.00%,0.00%,0.00%,100.00%,0.00%,0.00%,0.00%
86,0.0,38538.0,0.0,0.0,0.0,0.0,0.0,38538.0,0.00%,100.00%,0.00%,0.00%,0.00%,0.00%,0.00%
91,0.0,0.0,36690.0,0.0,0.0,0.0,0.0,36690.0,0.00%,0.00%,100.00%,0.00%,0.00%,0.00%,0.00%
109,0.0,0.0,0.0,44350.0,0.0,0.0,0.0,44350.0,0.00%,0.00%,0.00%,100.00%,0.00%,0.00%,0.00%
138,0.0,0.0,0.0,0.0,0.0,0.0,40081.0,40081.0,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,100.00%
140,0.0,32377.0,0.0,0.0,0.0,0.0,0.0,32377.0,0.00%,100.00%,0.00%,0.00%,0.00%,0.00%,0.00%
163,31376.0,0.0,0.0,0.0,0.0,0.0,0.0,31376.0,100.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%
173,0.0,0.0,0.0,0.0,38414.0,0.0,0.0,38414.0,0.00%,0.00%,0.00%,0.00%,100.00%,0.00%,0.00%


## Continuous description

There are three ways to show the distribution of a continuous variable:

1. Display the percentile
2. Display the percentile, with one primary key
3. Display the percentile, with one primary key, and a secondary key

### 1. Display the percentile

- pct: Percentile [.25, .50, .75, .95, .99]
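
As a local cross-check of the query output, the same kind of percentile table can be sketched in pandas. This is a minimal illustration on a toy DataFrame, not the SQL run by this notebook: `df`, its values, and the helper `percentile_table` are hypothetical. pandas interpolates linearly between observations, so if the SQL template relies on an approximate percentile function (an assumption, since the template is not shown here), small differences are to be expected.

```python
import pandas as pd

# Toy data standing in for the Athena table (values are made up).
df = pd.DataFrame(
    {
        "quantity": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        "value": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0],
        "regime": ["ELIGIBLE", "NOT_ELIGIBLE"] * 5,
    }
)

PERCENTILES = [0.25, 0.50, 0.75, 0.95, 0.99]


def percentile_table(data: pd.DataFrame, percentiles=PERCENTILES) -> pd.DataFrame:
    """Return one row per percentile and one column per numeric variable."""
    numeric = data.select_dtypes(include="number")  # drop string columns
    table = numeric.quantile(percentiles)
    table.index.name = "pct"
    return table


result = percentile_table(df)
print(result.round(2))
```

The result has `pct` as its index and one column per numeric variable, which is the same layout as the table printed by the cell below.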

In [25]:
table_top = ""
table_top_var = ""
table_middle = ""
table_bottom = ""

var_index = 0
# Count the continuous (numeric) columns so the loop can detect the last one
size_continuous = len(
    [
        x
        for x in schema["StorageDescriptor"]["Columns"]
        if x["Type"] in ["float", "double", "bigint", "int"]
    ]
)
cont = 0
for key, value in enumerate(schema["StorageDescriptor"]["Columns"]):
    if value["Type"] in ["float", "double", "bigint", "int"]:
        cont +=1

        if var_index == 0:
            table_top_var += "{} ,".format(value["Name"])
            table_top = parameters["ANALYSIS"]["CONTINUOUS"]["DISTRIBUTION"][
                "bottom"
            ].format(db, table, value["Name"], key)
        else:
            temp_middle_1 = "{} {}".format(
                parameters["ANALYSIS"]["CONTINUOUS"]["DISTRIBUTION"]["middle_1"],
                parameters["ANALYSIS"]["CONTINUOUS"]["DISTRIBUTION"]["bottom"].format(
                    db, table, value["Name"], key
                ),
            )
            temp_middle_2 = parameters["ANALYSIS"]["CONTINUOUS"]["DISTRIBUTION"][
                "middle_2"
            ].format(value["Name"])

            if cont == size_continuous:

                table_top_var += "{} {}".format(
                    value["Name"],
                    parameters["ANALYSIS"]["CONTINUOUS"]["DISTRIBUTION"]["top_3"],
                )
                table_bottom += "{} {})".format(temp_middle_1, temp_middle_2)
            else:
                table_top_var += "{} ,".format(value["Name"])
                table_bottom += "{} {}".format(temp_middle_1, temp_middle_2)
        var_index += 1

query = (
    parameters["ANALYSIS"]["CONTINUOUS"]["DISTRIBUTION"]["top_1"]
    + table_top
    + parameters["ANALYSIS"]["CONTINUOUS"]["DISTRIBUTION"]["top_2"]
    + table_top_var
    + table_bottom
)
output = s3.run_query(
    query=query,
    database=db,
    s3_output=s3_output,
    filename="count_distribution",  ## Add filename to print dataframe
    destination_key=None,  ### Add destination key if need to copy output
)
(output.sort_values(by="pct").set_index(["pct"]).style.format("{0:.2f}"))

pct,gni_per_capita,gpd_per_capita,quantity,value,unit_price,kandhelwal_quality,price_adjusted_quality,lag_tax_rebate,ln_lag_tax_rebate,lag_import_tax,ln_lag_import_tax,sigma,sigma_price,y,prediction,residual
0.25,3410.0,4633.59,383.0,3071.0,1.32,-0.9,-0.83,2.5,1.25,6.5,2.01,2.75,0.94,9.69,10.4,-2.23
0.5,14848.0,18470.02,3583.0,15103.0,3.5,0.05,1.33,4.0,1.61,10.0,2.4,3.36,4.35,12.81,12.6,0.14
0.75,38800.0,42724.76,23295.0,67711.0,12.0,0.95,3.89,5.0,1.79,15.0,2.77,4.67,9.45,16.81,16.08,2.31
0.95,49110.0,51808.77,335871.0,655359.0,704.0,3.11,10.0,12.0,2.56,22.0,3.14,25.03,30.62,34.19,35.12,7.0
0.99,63790.0,75143.7,2621439.0,3276799.0,28672.0,7.0,38.0,17.0,2.89,32.0,3.5,39.28,152.0,160.0,132.0,33.0


### 2. Display the percentile, with one primary key

The primary key is applied to every continuous variable.

- index: 
    - Continuous variable
    - Percentile [.25, .50, .75, .95, .99] per primary group value
- Columns: Primary group
- Heatmap is colored based on the row, i.e. darker blue indicates larger values for a given row
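
A minimal pandas sketch of the layout of the heatmap this step produces: one row per variable and percentile, one column per primary-key value. The DataFrame `df` and its values are made up for illustration, and this is a local analogue rather than the Athena query built in the cell below.

```python
import pandas as pd

# Toy data (made up); "year" plays the role of the primary key.
df = pd.DataFrame(
    {
        "year": [2003] * 4 + [2004] * 4,
        "quantity": [1, 2, 3, 4, 10, 20, 30, 40],
        "value": [5.0, 6.0, 7.0, 8.0, 50.0, 60.0, 70.0, 80.0],
    }
)

primary_key = "year"
percentiles = [0.25, 0.50, 0.75, 0.95, 0.99]

# Percentiles of every numeric column within each primary-key group.
grouped = df.groupby(primary_key)[["quantity", "value"]].quantile(percentiles)
grouped.index.names = [primary_key, "pct"]

# Rows: (variable, pct); columns: primary-key values.
layout = grouped.unstack("pct").T
print(layout)
```

Calling `.style.background_gradient(axis=1)` on `layout` computes the color scale within each row, so colors are comparable across years for a given variable and percentile, but not across rows.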

In [26]:
primary_key = "year"
table_top = ""
table_top_var = ""
table_middle = ""
table_bottom = ""
var_index = 0
# size_continuous is reused from the previous cell
cont = 0
for key, value in enumerate(schema["StorageDescriptor"]["Columns"]):

    if value["Type"] in ["float", "double", "bigint", "int"]:
        cont +=1

        if var_index == 0:
            table_top_var += "{} ,".format(value["Name"])
            table_top = parameters["ANALYSIS"]["CONTINUOUS"]["ONE_PAIR_DISTRIBUTION"][
                "bottom"
            ].format(
                db, table, value["Name"], key, primary_key
            )
        else:
            temp_middle_1 = "{} {}".format(
                parameters["ANALYSIS"]["CONTINUOUS"]["ONE_PAIR_DISTRIBUTION"][
                    "middle_1"
                ],
                parameters["ANALYSIS"]["CONTINUOUS"]["ONE_PAIR_DISTRIBUTION"][
                    "bottom"
                ].format(
                    db, table, value["Name"], key, primary_key
                ),
            )
            temp_middle_2 = parameters["ANALYSIS"]["CONTINUOUS"][
                "ONE_PAIR_DISTRIBUTION"
            ]["middle_2"].format(value["Name"], primary_key)

            if cont == size_continuous:

                table_top_var += "{} {}".format(
                    value["Name"],
                    parameters["ANALYSIS"]["CONTINUOUS"]["ONE_PAIR_DISTRIBUTION"][
                        "top_3"
                    ],
                )
                table_bottom += "{} {})".format(temp_middle_1, temp_middle_2)
            else:
                table_top_var += "{} ,".format(value["Name"])
                table_bottom += "{} {}".format(temp_middle_1, temp_middle_2)
        var_index += 1

query = (
    parameters["ANALYSIS"]["CONTINUOUS"]["ONE_PAIR_DISTRIBUTION"]["top_1"]
    + table_top
    + parameters["ANALYSIS"]["CONTINUOUS"]["ONE_PAIR_DISTRIBUTION"]["top_2"].format(
        primary_key
    )
    + table_top_var
    + table_bottom
)
output = s3.run_query(
    query=query,
    database=db,
    s3_output=s3_output,
    filename="count_distribution_primary_key",  # Add filename to print dataframe
    destination_key=None,  # Add destination key if need to copy output
)
(
    output.set_index([primary_key, "pct"])
    .unstack(1)
    .T.style.format("{0:,.2f}")
    .background_gradient(cmap=sns.light_palette("blue", as_cmap=True), axis=1)
)

Unnamed: 0_level_0,year,2003,2004,2005,2006,2007,2008,2009,2010
Unnamed: 0_level_1,pct,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
gni_per_capita,0.25,2180.0,2530.0,2790.0,3100.0,3600.0,3970.0,4140.0,4510.0
gni_per_capita,0.5,13790.0,16200.0,18520.0,14160.0,15800.0,14270.0,14620.0,13190.0
gni_per_capita,0.75,26070.0,31570.0,35710.0,38500.0,40130.0,43460.0,43520.0,44480.0
gni_per_capita,0.95,39770.0,43510.0,46190.0,47880.0,49110.0,52670.0,52470.0,53900.0
gni_per_capita,0.99,44750.0,53800.0,61620.0,63990.0,62830.0,65580.0,70370.0,77570.0
gpd_per_capita,0.25,3969.73,4190.49,4337.88,4525.96,4745.3,4716.2,4729.74,5076.34
gpd_per_capita,0.5,18034.72,18470.02,19225.01,19367.58,19187.81,15207.51,14370.51,13113.53
gpd_per_capita,0.75,41532.22,42757.0,42641.68,43442.57,44742.41,43216.26,41983.07,44141.88
gpd_per_capita,0.95,46811.89,47926.75,48813.89,50033.88,51808.77,52727.52,51767.38,52022.12
gpd_per_capita,0.99,67385.3,68781.61,70471.44,72823.84,75143.7,75793.63,73189.2,74605.72


### 3. Display the percentile, with one primary key, and a secondary key

The primary and secondary keys are applied to every continuous variable. The output can be large, so only the 10 secondary-key values with the largest totals are printed.

- index: Secondary group
- Columns: 
    - Primary group
    - Percentile [.25, .50, .75, .95, .99] per primary group value
- Heatmap is colored based on the column, i.e. darker green indicates larger values for a given column
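
The top-N restriction and the reshaping can be sketched on a toy long-format result as follows. The DataFrame `output`, its values, and `top_n = 2` are made up for illustration. This is a simplified alternative that selects the secondary-key column by name, not the exact selection logic of the cell below.

```python
import pandas as pd

# Toy long-format result (made up): one row per primary key,
# secondary key and percentile.
output = pd.DataFrame(
    {
        "year": [2003, 2003, 2003, 2004, 2004, 2004],
        "regime": ["A", "B", "C", "A", "B", "C"],
        "pct": [0.5] * 6,
        "quantity": [10.0, 1.0, 5.0, 20.0, 2.0, 6.0],
    }
)

primary_key, secondary_key, variable = "year", "regime", "quantity"
top_n = 2  # the notebook keeps 10

# Rank secondary-key values by the sum of the variable and keep the largest.
top_values = output.groupby(secondary_key)[variable].sum().nlargest(top_n).index

wide = (
    output.loc[output[secondary_key].isin(top_values)]
    .set_index([primary_key, "pct", secondary_key])
    .unstack([0, 1])  # rows: secondary key; columns: (variable, primary key, pct)
    .fillna(0)
)
print(wide)
```

Ranking on the grouped sum keeps two secondary-key values that happen to share the same total, which a de-duplication on the total would merge into one.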

In [27]:
primary_key = 'year'
secondary_key = 'regime'

In [28]:
for key, value in enumerate(schema["StorageDescriptor"]["Columns"]):

    if value["Type"] in ["float", "double", "bigint", "int"]:

        query = parameters["ANALYSIS"]["CONTINUOUS"]["TWO_PAIRS_DISTRIBUTION"].format(
            db, table,
            primary_key,
            secondary_key,
            value["Name"],
        )

        output = s3.run_query(
            query=query,
            database=db,
            s3_output=s3_output,
            filename="count_distribution_{}_{}_{}".format(
                primary_key, secondary_key, value["Name"]
            ),  ## Add filename to print dataframe
            destination_key=None,  ### Add destination key if need to copy output
        )

        print(
            "Distribution of {}, by {} and {}".format(
                value["Name"], primary_key, secondary_key,
            )
        )

        display(
            (
                output.loc[
                    lambda x: x[secondary_key].isin(
                        (
                            output.assign(
                                total_secondary=lambda x: x[value["Name"]]
                                .groupby([x[secondary_key]])
                                .transform("sum")
                            )
                            .drop_duplicates(subset="total_secondary", keep="last")
                            .sort_values(by=["total_secondary"], ascending=False)
                            .iloc[:10, 1]
                        ).to_list()
                    )
                ]
                .set_index([primary_key, "pct", secondary_key])
                .unstack([0, 1])
                .fillna(0)
                .sort_index(axis=1, level=[1, 2])
                .style.format("{0:,.2f}")
                .background_gradient(cmap=sns.light_palette("green", as_cmap=True))
            )
        )

Distribution of gni_per_capita, by year and regime


Unnamed: 0_level_0,gni_per_capita,gni_per_capita,gni_per_capita,gni_per_capita,gni_per_capita,gni_per_capita,gni_per_capita,gni_per_capita,gni_per_capita,gni_per_capita,gni_per_capita,gni_per_capita,gni_per_capita,gni_per_capita,gni_per_capita,gni_per_capita,gni_per_capita,gni_per_capita,gni_per_capita,gni_per_capita,gni_per_capita,gni_per_capita,gni_per_capita,gni_per_capita,gni_per_capita,gni_per_capita,gni_per_capita,gni_per_capita,gni_per_capita,gni_per_capita,gni_per_capita,gni_per_capita,gni_per_capita,gni_per_capita,gni_per_capita,gni_per_capita,gni_per_capita,gni_per_capita,gni_per_capita,gni_per_capita
year,2003,2003,2003,2003,2003,2004,2004,2004,2004,2004,2005,2005,2005,2005,2005,2006,2006,2006,2006,2006,2007,2007,2007,2007,2007,2008,2008,2008,2008,2008,2009,2009,2009,2009,2009,2010,2010,2010,2010,2010
pct,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99
regime,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
ELIGIBLE,2180.0,13790.0,26070.0,39770.0,44750.0,2520.0,14050.0,31570.0,43510.0,53800.0,2790.0,12650.0,35710.0,46190.0,61620.0,3100.0,14160.0,38230.0,47880.0,63990.0,3600.0,15000.0,40130.0,49110.0,62830.0,4030.0,15200.0,43460.0,52670.0,65580.0,4140.0,14110.0,43520.0,52470.0,70370.0,4580.0,12770.0,43850.0,53900.0,77570.0
NOT_ELIGIBLE,5480.0,23350.0,32550.0,39770.0,45620.0,7820.0,28270.0,38350.0,43510.0,54110.0,10160.0,34800.0,40560.0,46190.0,63790.0,8690.0,34250.0,41460.0,48820.0,69990.0,3600.0,23440.0,40440.0,49110.0,62830.0,3030.0,10050.0,42330.0,52670.0,65580.0,3550.0,17900.0,43520.0,53670.0,70370.0,4410.0,19280.0,44550.0,53900.0,77570.0


Distribution of gpd_per_capita, by year and regime


Unnamed: 0_level_0,gpd_per_capita,gpd_per_capita,gpd_per_capita,gpd_per_capita,gpd_per_capita,gpd_per_capita,gpd_per_capita,gpd_per_capita,gpd_per_capita,gpd_per_capita,gpd_per_capita,gpd_per_capita,gpd_per_capita,gpd_per_capita,gpd_per_capita,gpd_per_capita,gpd_per_capita,gpd_per_capita,gpd_per_capita,gpd_per_capita,gpd_per_capita,gpd_per_capita,gpd_per_capita,gpd_per_capita,gpd_per_capita,gpd_per_capita,gpd_per_capita,gpd_per_capita,gpd_per_capita,gpd_per_capita,gpd_per_capita,gpd_per_capita,gpd_per_capita,gpd_per_capita,gpd_per_capita,gpd_per_capita,gpd_per_capita,gpd_per_capita,gpd_per_capita,gpd_per_capita
year,2003,2003,2003,2003,2003,2004,2004,2004,2004,2004,2005,2005,2005,2005,2005,2006,2006,2006,2006,2006,2007,2007,2007,2007,2007,2008,2008,2008,2008,2008,2009,2009,2009,2009,2009,2010,2010,2010,2010,2010
pct,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99
regime,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
ELIGIBLE,3969.73,17627.22,41331.38,46811.89,67385.3,4112.67,18470.02,41903.98,47926.75,68781.61,4337.88,19225.01,42641.68,48813.89,70471.44,4525.96,19193.75,42785.59,50033.88,72823.84,4745.3,17842.53,44710.39,51808.77,75143.7,4801.88,16762.28,43216.26,52727.52,75793.63,4744.76,14103.13,41907.01,51767.38,73189.2,5076.34,12808.03,41531.93,52022.12,74605.72
NOT_ELIGIBLE,9135.63,38073.76,42744.01,48236.96,83941.37,9744.66,38619.87,43671.68,50546.49,86759.14,13134.43,39984.17,44637.86,52276.25,88432.62,11808.83,40798.07,45951.73,52316.96,89828.42,4745.3,22819.5,45687.27,51808.77,75143.7,3652.15,11801.55,43216.26,52727.52,75793.63,3928.45,18883.2,41983.07,51767.38,73189.2,4633.59,19808.07,44507.68,52022.12,74605.72


Distribution of quantity, by year and regime


Unnamed: 0_level_0,quantity,quantity,quantity,quantity,quantity,quantity,quantity,quantity,quantity,quantity,quantity,quantity,quantity,quantity,quantity,quantity,quantity,quantity,quantity,quantity,quantity,quantity,quantity,quantity,quantity,quantity,quantity,quantity,quantity,quantity,quantity,quantity,quantity,quantity,quantity,quantity,quantity,quantity,quantity,quantity
year,2003,2003,2003,2003,2003,2004,2004,2004,2004,2004,2005,2005,2005,2005,2005,2006,2006,2006,2006,2006,2007,2007,2007,2007,2007,2008,2008,2008,2008,2008,2009,2009,2009,2009,2009,2010,2010,2010,2010,2010
pct,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99
regime,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
ELIGIBLE,591.0,4735.0,26623.0,360447.0,2883583.0,510.0,4383.0,25599.0,352255.0,2752511.0,421.0,3903.0,23935.0,327679.0,2555903.0,370.0,3711.0,24575.0,368639.0,2883583.0,427.0,3903.0,24575.0,335871.0,2359295.0,371.0,3823.0,25599.0,360447.0,2555903.0,295.0,3000.0,20927.0,282623.0,2031615.0,280.0,3000.0,22000.0,313343.0,2162687.0
NOT_ELIGIBLE,1487.0,8287.0,52991.0,917503.0,8388607.0,1399.0,8351.0,53247.0,925695.0,8912895.0,1292.0,7607.0,46719.0,786431.0,8650751.0,1250.0,7343.0,45247.0,753663.0,7340031.0,395.0,3063.0,19007.0,294911.0,2490367.0,252.0,2839.0,19967.0,348159.0,3014655.0,216.0,2059.0,13887.0,229375.0,2031615.0,156.0,1751.0,12703.0,229375.0,2031615.0


Distribution of value, by year and regime


Unnamed: 0_level_0,value,value,value,value,value,value,value,value,value,value,value,value,value,value,value,value,value,value,value,value,value,value,value,value,value,value,value,value,value,value,value,value,value,value,value,value,value,value,value,value
year,2003,2003,2003,2003,2003,2004,2004,2004,2004,2004,2005,2005,2005,2005,2005,2006,2006,2006,2006,2006,2007,2007,2007,2007,2007,2008,2008,2008,2008,2008,2009,2009,2009,2009,2009,2010,2010,2010,2010,2010
pct,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99
regime,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
ELIGIBLE,2399.0,11327.0,47935.0,430079.0,2228223.0,2559.0,12287.0,52223.0,491519.0,2490367.0,2631.0,12735.0,54271.0,507903.0,2621439.0,3095.0,15103.0,65535.0,638975.0,3407871.0,3287.0,15775.0,69119.0,614399.0,2883583.0,4511.0,21375.0,91647.0,786431.0,3670015.0,3527.0,16895.0,74495.0,622591.0,2686975.0,3823.0,18815.0,85503.0,745471.0,3276799.0
NOT_ELIGIBLE,3367.0,17231.0,90879.0,1179647.0,6684671.0,3519.0,18431.0,103295.0,1310719.0,7340031.0,3423.0,17407.0,97023.0,1245183.0,7864319.0,3647.0,18943.0,101375.0,1294335.0,8388607.0,2401.0,11519.0,56831.0,696319.0,4194303.0,2991.0,16383.0,90367.0,1114111.0,6553599.0,2095.0,10623.0,55295.0,700415.0,4587519.0,2000.0,11231.0,60927.0,778239.0,5111807.0


Distribution of unit_price, by year and regime


Unnamed: 0_level_0,unit_price,unit_price,unit_price,unit_price,unit_price,unit_price,unit_price,unit_price,unit_price,unit_price,unit_price,unit_price,unit_price,unit_price,unit_price,unit_price,unit_price,unit_price,unit_price,unit_price,unit_price,unit_price,unit_price,unit_price,unit_price,unit_price,unit_price,unit_price,unit_price,unit_price,unit_price,unit_price,unit_price,unit_price,unit_price,unit_price,unit_price,unit_price,unit_price,unit_price
year,2003,2003,2003,2003,2003,2004,2004,2004,2004,2004,2005,2005,2005,2005,2005,2006,2006,2006,2006,2006,2007,2007,2007,2007,2007,2008,2008,2008,2008,2008,2009,2009,2009,2009,2009,2010,2010,2010,2010,2010
pct,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99
regime,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
ELIGIBLE,0.8,2.17,6.73,176.0,7680.0,0.95,2.45,7.75,260.0,10240.0,1.07,2.8,9.0,440.0,14336.0,1.25,3.3,11.56,736.0,24576.0,1.27,3.45,11.52,672.0,24576.0,1.76,4.54,15.75,1280.0,38912.0,1.78,4.66,15.87,960.0,28672.0,1.97,5.02,17.69,1120.0,32768.0
NOT_ELIGIBLE,0.88,1.67,4.16,28.0,192.0,0.91,1.8,4.96,30.72,192.0,0.97,1.89,5.27,32.0,336.0,1.04,2.02,5.93,33.75,384.0,1.21,2.83,10.25,992.0,45056.0,1.52,4.04,19.56,3648.0,102399.99,1.64,3.77,14.37,1664.0,81919.99,1.93,4.52,17.53,2304.0,81919.99


Distribution of kandhelwal_quality, by year and regime


Unnamed: 0_level_0,kandhelwal_quality,kandhelwal_quality,kandhelwal_quality,kandhelwal_quality,kandhelwal_quality,kandhelwal_quality,kandhelwal_quality,kandhelwal_quality,kandhelwal_quality,kandhelwal_quality,kandhelwal_quality,kandhelwal_quality,kandhelwal_quality,kandhelwal_quality,kandhelwal_quality,kandhelwal_quality,kandhelwal_quality,kandhelwal_quality,kandhelwal_quality,kandhelwal_quality,kandhelwal_quality,kandhelwal_quality,kandhelwal_quality,kandhelwal_quality,kandhelwal_quality,kandhelwal_quality,kandhelwal_quality,kandhelwal_quality,kandhelwal_quality,kandhelwal_quality,kandhelwal_quality,kandhelwal_quality,kandhelwal_quality,kandhelwal_quality,kandhelwal_quality,kandhelwal_quality,kandhelwal_quality,kandhelwal_quality,kandhelwal_quality,kandhelwal_quality
year,2003,2003,2003,2003,2003,2004,2004,2004,2004,2004,2005,2005,2005,2005,2005,2006,2006,2006,2006,2006,2007,2007,2007,2007,2007,2008,2008,2008,2008,2008,2009,2009,2009,2009,2009,2010,2010,2010,2010,2010
pct,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99
regime,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
ELIGIBLE,-0.54,0.38,1.38,4.06,9.37,-0.61,0.29,1.24,3.7,8.37,-0.68,0.22,1.15,3.54,8.0,-0.79,0.11,1.0,3.25,7.25,-0.79,0.14,1.01,3.11,7.0,-0.96,-0.04,0.74,2.53,5.5,-0.97,-0.04,0.8,2.68,5.69,-1.08,-0.1,0.73,2.52,5.25
NOT_ELIGIBLE,-0.79,0.13,1.13,3.48,9.56,-0.92,0.02,1.01,3.31,8.62,-1.08,-0.14,0.82,2.92,8.0,-1.21,-0.29,0.62,2.54,6.69,-1.1,-0.09,0.87,3.37,7.5,-1.4,-0.26,0.68,3.06,6.62,-1.42,-0.31,0.64,2.77,6.34,-1.54,-0.35,0.6,2.67,6.16


Distribution of price_adjusted_quality, by year and regime


Unnamed: 0_level_0,price_adjusted_quality,price_adjusted_quality,price_adjusted_quality,price_adjusted_quality,price_adjusted_quality,price_adjusted_quality,price_adjusted_quality,price_adjusted_quality,price_adjusted_quality,price_adjusted_quality,price_adjusted_quality,price_adjusted_quality,price_adjusted_quality,price_adjusted_quality,price_adjusted_quality,price_adjusted_quality,price_adjusted_quality,price_adjusted_quality,price_adjusted_quality,price_adjusted_quality,price_adjusted_quality,price_adjusted_quality,price_adjusted_quality,price_adjusted_quality,price_adjusted_quality,price_adjusted_quality,price_adjusted_quality,price_adjusted_quality,price_adjusted_quality,price_adjusted_quality,price_adjusted_quality,price_adjusted_quality,price_adjusted_quality,price_adjusted_quality,price_adjusted_quality,price_adjusted_quality,price_adjusted_quality,price_adjusted_quality,price_adjusted_quality,price_adjusted_quality
year,2003,2003,2003,2003,2003,2004,2004,2004,2004,2004,2005,2005,2005,2005,2005,2006,2006,2006,2006,2006,2007,2007,2007,2007,2007,2008,2008,2008,2008,2008,2009,2009,2009,2009,2009,2010,2010,2010,2010,2010
pct,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99
regime,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
ELIGIBLE,-2.06,-0.02,2.45,9.0,44.0,-1.69,0.35,2.82,9.03,40.5,-1.4,0.65,3.16,9.2,38.5,-0.95,1.12,3.6,9.56,36.0,-0.92,1.11,3.57,9.28,33.0,-0.18,1.82,4.22,9.8,27.0,-0.23,1.79,4.19,9.7,31.5,-0.03,2.04,4.49,10.0,29.5
NOT_ELIGIBLE,-2.14,0.28,3.18,28.87,78.5,-1.74,0.64,3.5,30.0,82.75,-1.2,1.18,4.08,26.97,75.37,-0.62,1.72,4.61,28.0,79.0,-0.61,1.59,4.2,11.19,47.0,0.15,2.46,5.18,11.62,38.0,0.12,2.39,5.07,11.8,46.0,0.34,2.73,5.5,12.16,45.5


Distribution of lag_tax_rebate, by year and regime


Unnamed: 0_level_0,lag_tax_rebate,lag_tax_rebate,lag_tax_rebate,lag_tax_rebate,lag_tax_rebate,lag_tax_rebate,lag_tax_rebate,lag_tax_rebate,lag_tax_rebate,lag_tax_rebate,lag_tax_rebate,lag_tax_rebate,lag_tax_rebate,lag_tax_rebate,lag_tax_rebate,lag_tax_rebate,lag_tax_rebate,lag_tax_rebate,lag_tax_rebate,lag_tax_rebate,lag_tax_rebate,lag_tax_rebate,lag_tax_rebate,lag_tax_rebate,lag_tax_rebate,lag_tax_rebate,lag_tax_rebate,lag_tax_rebate,lag_tax_rebate,lag_tax_rebate,lag_tax_rebate,lag_tax_rebate,lag_tax_rebate,lag_tax_rebate,lag_tax_rebate,lag_tax_rebate,lag_tax_rebate,lag_tax_rebate,lag_tax_rebate,lag_tax_rebate
year,2003,2003,2003,2003,2003,2004,2004,2004,2004,2004,2005,2005,2005,2005,2005,2006,2006,2006,2006,2006,2007,2007,2007,2007,2007,2008,2008,2008,2008,2008,2009,2009,2009,2009,2009,2010,2010,2010,2010,2010
pct,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99
regime,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
ELIGIBLE,0.0,2.0,2.0,4.0,5.67,0.0,2.0,2.0,4.0,5.0,4.0,4.0,4.0,4.0,12.0,4.0,4.0,4.0,4.0,13.0,4.0,4.0,4.0,6.5,13.25,4.0,5.0,8.0,10.5,17.0,4.0,4.33,10.0,13.0,17.0,2.0,2.5,7.33,12.0,17.0
NOT_ELIGIBLE,0.0,2.0,2.0,4.0,4.0,0.0,1.33,2.0,4.0,4.0,4.0,4.0,4.0,4.0,7.0,4.0,4.0,4.0,4.0,8.33,4.0,4.0,4.0,6.5,17.0,4.0,5.0,8.0,10.5,17.0,3.5,5.0,10.0,13.25,17.0,2.0,2.5,7.33,12.0,17.0


Distribution of ln_lag_tax_rebate, by year and regime


Unnamed: 0_level_0,ln_lag_tax_rebate,ln_lag_tax_rebate,ln_lag_tax_rebate,ln_lag_tax_rebate,ln_lag_tax_rebate,ln_lag_tax_rebate,ln_lag_tax_rebate,ln_lag_tax_rebate,ln_lag_tax_rebate,ln_lag_tax_rebate,ln_lag_tax_rebate,ln_lag_tax_rebate,ln_lag_tax_rebate,ln_lag_tax_rebate,ln_lag_tax_rebate,ln_lag_tax_rebate,ln_lag_tax_rebate,ln_lag_tax_rebate,ln_lag_tax_rebate,ln_lag_tax_rebate,ln_lag_tax_rebate,ln_lag_tax_rebate,ln_lag_tax_rebate,ln_lag_tax_rebate,ln_lag_tax_rebate,ln_lag_tax_rebate,ln_lag_tax_rebate,ln_lag_tax_rebate,ln_lag_tax_rebate,ln_lag_tax_rebate,ln_lag_tax_rebate,ln_lag_tax_rebate,ln_lag_tax_rebate,ln_lag_tax_rebate,ln_lag_tax_rebate,ln_lag_tax_rebate,ln_lag_tax_rebate,ln_lag_tax_rebate,ln_lag_tax_rebate,ln_lag_tax_rebate
year,2003,2003,2003,2003,2003,2004,2004,2004,2004,2004,2005,2005,2005,2005,2005,2006,2006,2006,2006,2006,2007,2007,2007,2007,2007,2008,2008,2008,2008,2008,2009,2009,2009,2009,2009,2010,2010,2010,2010,2010
pct,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99
regime,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
ELIGIBLE,0.0,1.1,1.1,1.61,1.9,0.0,1.1,1.1,1.61,1.79,1.61,1.61,1.61,1.61,2.5,1.61,1.61,1.61,1.61,2.61,1.61,1.61,1.61,2.01,2.64,1.61,1.79,2.2,2.44,2.89,1.61,1.67,2.4,2.64,2.89,1.1,1.25,2.12,2.56,2.89
NOT_ELIGIBLE,0.0,1.1,1.1,1.61,1.61,0.0,0.85,1.1,1.61,1.61,1.61,1.61,1.61,1.61,2.03,1.61,1.61,1.61,1.61,2.23,1.61,1.61,1.61,2.01,2.89,1.61,1.79,2.2,2.44,2.89,1.5,1.79,2.4,2.66,2.89,1.1,1.25,2.12,2.56,2.89
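
Setting this table beside the lag_tax_rebate table suggests that the log variable is ln(1 + x) rather than ln(x): a rebate of 0.0 stays at 0.0 and 4.0 maps to 1.61. The transformation is not stated in this part of the notebook, so this is an inference from the printed quantiles. The sketch below recomputes the 2003 ELIGIBLE row with `numpy.log1p`. A few 0.99 entries elsewhere in the table (for example 12.0 against 2.5 in 2005) do not agree to two decimals, so the check is indicative only.

```python
import numpy as np

# 2003 ELIGIBLE quantiles copied from the two tables above.
lag_tax_rebate = np.array([0.0, 2.0, 2.0, 4.0, 5.67])
ln_lag_tax_rebate = np.array([0.0, 1.1, 1.1, 1.61, 1.9])

# Assumption: the log variable was built as ln(1 + x).
recomputed = np.round(np.log1p(lag_tax_rebate), 2)

# Compare with the reported values, allowing for two-decimal rounding.
matches = np.isclose(recomputed, ln_lag_tax_rebate, atol=0.005)
print(recomputed, matches.all())
```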


Distribution of lag_import_tax, by year and regime


Unnamed: 0_level_0,lag_import_tax,lag_import_tax,lag_import_tax,lag_import_tax,lag_import_tax,lag_import_tax,lag_import_tax,lag_import_tax,lag_import_tax,lag_import_tax,lag_import_tax,lag_import_tax,lag_import_tax,lag_import_tax,lag_import_tax,lag_import_tax,lag_import_tax,lag_import_tax,lag_import_tax,lag_import_tax,lag_import_tax,lag_import_tax,lag_import_tax,lag_import_tax,lag_import_tax,lag_import_tax,lag_import_tax,lag_import_tax,lag_import_tax,lag_import_tax,lag_import_tax,lag_import_tax,lag_import_tax,lag_import_tax,lag_import_tax,lag_import_tax,lag_import_tax,lag_import_tax,lag_import_tax,lag_import_tax
year,2003,2003,2003,2003,2003,2004,2004,2004,2004,2004,2005,2005,2005,2005,2005,2006,2006,2006,2006,2006,2007,2007,2007,2007,2007,2008,2008,2008,2008,2008,2009,2009,2009,2009,2009,2010,2010,2010,2010,2010
pct,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99
regime,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
ELIGIBLE,8.0,13.3,20.0,24.0,32.0,7.3,12.0,18.0,22.2,32.0,6.5,10.0,15.3,21.0,32.0,6.5,10.0,14.0,21.0,30.0,6.5,10.0,14.43,21.0,32.0,6.5,10.0,14.0,21.0,32.0,6.5,10.0,14.0,21.0,30.0,6.5,10.0,14.0,21.0,30.0
NOT_ELIGIBLE,10.0,16.3,22.0,24.4,32.0,8.4,14.0,20.0,24.0,32.0,8.0,14.0,18.0,22.5,30.0,7.5,12.5,16.0,23.0,32.0,7.5,10.0,16.0,22.0,32.0,7.0,10.0,15.0,22.0,32.0,6.55,10.0,15.0,22.0,32.0,6.5,10.0,15.0,22.0,32.0


Distribution of ln_lag_import_tax, by year and regime


Unnamed: 0_level_0,ln_lag_import_tax,ln_lag_import_tax,ln_lag_import_tax,ln_lag_import_tax,ln_lag_import_tax,ln_lag_import_tax,ln_lag_import_tax,ln_lag_import_tax,ln_lag_import_tax,ln_lag_import_tax,ln_lag_import_tax,ln_lag_import_tax,ln_lag_import_tax,ln_lag_import_tax,ln_lag_import_tax,ln_lag_import_tax,ln_lag_import_tax,ln_lag_import_tax,ln_lag_import_tax,ln_lag_import_tax,ln_lag_import_tax,ln_lag_import_tax,ln_lag_import_tax,ln_lag_import_tax,ln_lag_import_tax,ln_lag_import_tax,ln_lag_import_tax,ln_lag_import_tax,ln_lag_import_tax,ln_lag_import_tax,ln_lag_import_tax,ln_lag_import_tax,ln_lag_import_tax,ln_lag_import_tax,ln_lag_import_tax,ln_lag_import_tax,ln_lag_import_tax,ln_lag_import_tax,ln_lag_import_tax,ln_lag_import_tax
year,2003,2003,2003,2003,2003,2004,2004,2004,2004,2004,2005,2005,2005,2005,2005,2006,2006,2006,2006,2006,2007,2007,2007,2007,2007,2008,2008,2008,2008,2008,2009,2009,2009,2009,2009,2010,2010,2010,2010,2010
pct,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99
regime,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
ELIGIBLE,2.2,2.66,3.04,3.22,3.5,2.12,2.56,2.94,3.14,3.5,2.01,2.4,2.79,3.09,3.5,2.01,2.4,2.71,3.09,3.43,2.01,2.4,2.74,3.09,3.5,2.01,2.4,2.71,3.09,3.5,2.01,2.4,2.71,3.09,3.43,2.01,2.4,2.71,3.09,3.43
NOT_ELIGIBLE,2.4,2.85,3.14,3.24,3.5,2.24,2.71,3.04,3.22,3.5,2.2,2.71,2.94,3.16,3.43,2.14,2.6,2.83,3.19,3.5,2.14,2.4,2.83,3.14,3.5,2.08,2.4,2.77,3.14,3.5,2.02,2.4,2.77,3.14,3.5,2.01,2.4,2.77,3.14,3.5


Distribution of sigma, by year and regime


Unnamed: 0_level_0,sigma,sigma,sigma,sigma,sigma,sigma,sigma,sigma,sigma,sigma,sigma,sigma,sigma,sigma,sigma,sigma,sigma,sigma,sigma,sigma,sigma,sigma,sigma,sigma,sigma,sigma,sigma,sigma,sigma,sigma,sigma,sigma,sigma,sigma,sigma,sigma,sigma,sigma,sigma,sigma
year,2003,2003,2003,2003,2003,2004,2004,2004,2004,2004,2005,2005,2005,2005,2005,2006,2006,2006,2006,2006,2007,2007,2007,2007,2007,2008,2008,2008,2008,2008,2009,2009,2009,2009,2009,2010,2010,2010,2010,2010
pct,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99
regime,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
ELIGIBLE,2.77,3.39,4.67,25.03,39.28,2.77,3.39,4.67,25.03,39.28,2.77,3.36,4.67,25.03,39.28,2.75,3.39,4.67,25.03,39.28,2.75,3.36,4.67,15.09,33.55,2.75,3.33,4.67,25.03,33.55,2.75,3.33,4.67,25.03,33.55,2.75,3.33,4.67,25.03,33.55
NOT_ELIGIBLE,3.02,3.65,4.67,33.55,39.28,3.02,3.65,4.67,33.55,39.28,3.07,3.67,4.67,33.55,39.28,3.02,3.67,4.67,33.55,39.28,2.77,3.36,4.67,25.03,39.28,2.75,3.33,4.67,25.03,39.28,2.77,3.36,4.67,25.03,39.28,2.75,3.33,4.67,33.55,39.28


Distribution of sigma_price, by year and regime


Unnamed: 0_level_0,sigma_price,sigma_price,sigma_price,sigma_price,sigma_price,sigma_price,sigma_price,sigma_price,sigma_price,sigma_price,sigma_price,sigma_price,sigma_price,sigma_price,sigma_price,sigma_price,sigma_price,sigma_price,sigma_price,sigma_price,sigma_price,sigma_price,sigma_price,sigma_price,sigma_price,sigma_price,sigma_price,sigma_price,sigma_price,sigma_price,sigma_price,sigma_price,sigma_price,sigma_price,sigma_price,sigma_price,sigma_price,sigma_price,sigma_price,sigma_price
year,2003,2003,2003,2003,2003,2004,2004,2004,2004,2004,2005,2005,2005,2005,2005,2006,2006,2006,2006,2006,2007,2007,2007,2007,2007,2008,2008,2008,2008,2008,2009,2009,2009,2009,2009,2010,2010,2010,2010,2010
pct,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99
regime,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
ELIGIBLE,-0.72,2.67,7.19,25.94,128.0,-0.19,3.09,7.76,27.5,130.0,0.22,3.52,8.33,28.37,132.0,0.75,4.12,9.28,30.5,152.0,0.77,4.27,9.22,29.0,131.0,1.86,5.27,10.39,31.19,158.0,1.87,5.34,10.35,31.56,156.0,2.14,5.64,10.74,32.5,165.0
NOT_ELIGIBLE,-0.49,1.91,6.51,31.09,116.0,-0.35,2.19,7.27,32.0,121.5,-0.12,2.42,7.66,33.25,120.0,0.12,2.8,8.28,37.0,133.6,0.66,3.71,9.03,31.19,150.0,1.43,4.99,11.0,38.44,216.0,1.58,4.7,10.5,40.0,200.0,2.01,5.39,11.37,45.0,216.0


Distribution of y, by year and regime


Unnamed: 0_level_0,y,y,y,y,y,y,y,y,y,y,y,y,y,y,y,y,y,y,y,y,y,y,y,y,y,y,y,y,y,y,y,y,y,y,y,y,y,y,y,y
year,2003,2003,2003,2003,2003,2004,2004,2004,2004,2004,2005,2005,2005,2005,2005,2006,2006,2006,2006,2006,2007,2007,2007,2007,2007,2008,2008,2008,2008,2008,2009,2009,2009,2009,2009,2010,2010,2010,2010,2010
pct,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99,0.25,0.5,0.75,0.95,0.99
regime,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
ELIGIBLE,8.2,11.32,15.0,29.81,132.0,8.68,11.73,15.43,31.31,137.0,9.04,12.05,15.82,32.0,140.0,9.6,12.6,16.59,34.31,160.0,9.7,12.79,16.68,32.06,137.0,10.82,13.76,17.71,34.75,161.0,10.52,13.55,17.49,34.75,160.0,10.82,13.87,17.87,36.19,171.0
NOT_ELIGIBLE,8.37,11.49,15.56,39.0,125.0,8.57,11.77,16.13,40.5,128.0,8.68,11.9,16.26,41.0,130.0,8.94,12.26,16.8,45.0,142.0,8.97,12.14,16.4,35.0,156.0,9.86,13.45,18.05,43.0,220.0,9.65,12.87,17.33,46.0,206.0,10.05,13.41,17.96,51.25,220.0


Distribution of prediction, by year and regime


Variable: prediction; columns are percentiles (pct).

| year | regime | 0.25 | 0.5 | 0.75 | 0.95 | 0.99 |
|---|---|---|---|---|---|---|
| 2003 | ELIGIBLE | 8.24 | 10.13 | 13.47 | 30.5 | 108.0 |
| 2003 | NOT_ELIGIBLE | 8.57 | 10.83 | 14.59 | 50.0 | 134.0 |
| 2004 | ELIGIBLE | 8.91 | 10.74 | 14.14 | 31.5 | 109.0 |
| 2004 | NOT_ELIGIBLE | 9.24 | 11.5 | 15.25 | 53.0 | 161.62 |
| 2005 | ELIGIBLE | 9.4 | 11.24 | 14.71 | 31.75 | 112.0 |
| 2005 | NOT_ELIGIBLE | 9.77 | 12.12 | 15.77 | 51.5 | 144.0 |
| 2006 | ELIGIBLE | 10.24 | 12.04 | 15.69 | 33.75 | 132.0 |
| 2006 | NOT_ELIGIBLE | 10.54 | 12.81 | 16.65 | 65.54 | 164.12 |
| 2007 | ELIGIBLE | 10.36 | 12.2 | 15.65 | 31.87 | 108.75 |
| 2007 | NOT_ELIGIBLE | 10.34 | 12.27 | 15.68 | 41.06 | 145.0 |
| 2008 | ELIGIBLE | 11.79 | 13.61 | 17.2 | 33.5 | 128.0 |
| 2008 | NOT_ELIGIBLE | 11.84 | 13.87 | 17.58 | 44.5 | 200.96 |
| 2009 | ELIGIBLE | 11.52 | 13.36 | 16.87 | 34.0 | 112.0 |
| 2009 | NOT_ELIGIBLE | 11.56 | 13.47 | 17.02 | 47.0 | 200.0 |
| 2010 | ELIGIBLE | 12.01 | 13.81 | 17.39 | 35.75 | 133.0 |
| 2010 | NOT_ELIGIBLE | 12.12 | 14.03 | 17.75 | 51.75 | 200.75 |


Distribution of residual, by year and regime


Variable: residual; columns are percentiles (pct).

| year | regime | 0.25 | 0.5 | 0.75 | 0.95 | 0.99 |
|---|---|---|---|---|---|---|
| 2003 | ELIGIBLE | -1.5 | 1.0 | 3.12 | 7.14 | 24.0 |
| 2003 | NOT_ELIGIBLE | -2.7 | 0.39 | 3.01 | 7.47 | 25.5 |
| 2004 | ELIGIBLE | -1.65 | 0.75 | 2.84 | 6.97 | 26.5 |
| 2004 | NOT_ELIGIBLE | -3.0 | 0.06 | 2.74 | 7.58 | 30.0 |
| 2005 | ELIGIBLE | -1.77 | 0.58 | 2.67 | 6.92 | 28.0 |
| 2005 | NOT_ELIGIBLE | -3.52 | -0.42 | 2.26 | 7.09 | 32.0 |
| 2006 | ELIGIBLE | -2.0 | 0.29 | 2.39 | 7.11 | 33.5 |
| 2006 | NOT_ELIGIBLE | -4.0 | -0.86 | 1.78 | 6.78 | 32.0 |
| 2007 | ELIGIBLE | -1.96 | 0.35 | 2.4 | 6.61 | 28.0 |
| 2007 | NOT_ELIGIBLE | -2.83 | -0.22 | 2.12 | 7.2 | 33.0 |
| 2008 | ELIGIBLE | -2.23 | -0.11 | 1.87 | 6.44 | 32.5 |
| 2008 | NOT_ELIGIBLE | -3.3 | -0.65 | 1.76 | 8.0 | 44.5 |
| 2009 | ELIGIBLE | -2.27 | -0.09 | 1.98 | 6.94 | 37.0 |
| 2009 | NOT_ELIGIBLE | -3.38 | -0.8 | 1.66 | 8.53 | 48.0 |
| 2010 | ELIGIBLE | -2.45 | -0.27 | 1.83 | 7.0 | 40.0 |
| 2010 | NOT_ELIGIBLE | -3.57 | -0.91 | 1.61 | 9.19 | 52.0 |
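
A percentile summary by year and regime, such as the ones reported in this section, can be computed with a pandas `groupby` followed by `quantile`. The sketch below is only an illustration: the helper `percentile_table`, the data frame and its values are synthetic assumptions, not the project's Athena table or the notebook's actual code. It returns one row per (year, regime) pair and one column per percentile.

```python
import numpy as np
import pandas as pd

PCTS = [0.25, 0.5, 0.75, 0.95, 0.99]


def percentile_table(df, value, pcts=PCTS):
    """Hypothetical helper: percentiles of `value` by year and regime."""
    return (
        df.groupby(["year", "regime"])[value]
        .quantile(pcts)              # adds the percentile as the last index level
        .unstack()                   # move the percentiles to the columns
        .rename_axis(columns="pct")
        .round(2)
    )


# Synthetic data, for illustration only
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame(
    {
        "year": rng.choice([2003, 2004], size=n),
        "regime": rng.choice(["ELIGIBLE", "NOT_ELIGIBLE"], size=n),
        "residual": rng.normal(size=n),
    }
)

table = percentile_table(df, "residual")
print(table)
```

Calling `.unstack("year")` on the result would move the years to the columns, if a wide layout is preferred.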


# Generation report

In [29]:
import os, time, shutil, urllib.request, ipykernel, json
from pathlib import Path
from notebook import notebookapp

In [30]:
def create_report(extension = "html", keep_code = False):
    """
    Create a report from the current notebook and save it in the
    Reports folder (child of the current working directory)

    1. Extract the current notebook name
    2. Convert the notebook
    3. Move the newly created report

    Args:
    extension: string. Can be "html", "pdf", "md"
    keep_code: boolean. If False, the code cells are hidden in the report

    """

    ### Get notebook name
    ### The connection file is named kernel-<kernel id>.json
    connection_file = os.path.basename(ipykernel.get_connection_file())
    kernel_id = connection_file.split('-', 1)[1].split('.')[0]

    notebookname = None
    for srv in notebookapp.list_running_servers():
        try:
            if srv['token']=='' and not srv['password']:
                req = urllib.request.urlopen(srv['url']+'api/sessions')
            else:
                req = urllib.request.urlopen(srv['url']+ \
                                             'api/sessions?token=' + \
                                             srv['token'])
            sessions = json.load(req)
            ### Keep the session attached to the current kernel
            for sess in sessions:
                if sess['kernel']['id'] == kernel_id:
                    notebookname = sess['name']
        except Exception:
            ### Server not reachable, try the next one
            pass

    if notebookname is None:
        raise RuntimeError("Cannot find the name of the current notebook")

    sep = '.'
    path = os.getcwd()
    #parent_path = str(Path(path).parent)

    ### Path report
    #path_report = "{}/Reports".format(parent_path)
    #path_report = "{}/Reports".format(path)

    ### Path destination
    name_no_extension = notebookname.split(sep, 1)[0]
    source_to_move = name_no_extension +'.{}'.format(extension)
    dest = os.path.join(path,'Reports', source_to_move)

    ### Generate notebook
    ### nbconvert names the markdown exporter "markdown", not "md"
    to_format = 'markdown' if extension == 'md' else extension
    if keep_code:
        os.system('jupyter nbconvert --to {} "{}"'.format(
            to_format, notebookname))
    else:
        os.system('jupyter nbconvert --no-input --to {} "{}"'.format(
            to_format, notebookname))

    ### Move notebook to report folder
    #time.sleep(5)
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    shutil.move(source_to_move, dest)
    print("Report Available at this address:\n {}".format(dest))

In [31]:
create_report(extension = "html")

Report Available at this address:
 /Users/thomas/Google Drive/Projects/GitHub/Repositories/VAT_rebate_quality_china/01_data_preprocessing/02_prepare_tables_model/00_POC_prepare_tables_model/Reports/01_merge_export_share_foreign_SOE_quality.html
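
The destination path built inside `create_report` can be checked without a running Jupyter server. The sketch below isolates that step with `pathlib`; the helper name `report_destination` and its `base_dir` argument are hypothetical and are not part of the notebook.

```python
from pathlib import Path


def report_destination(notebook_name, extension="html", base_dir="."):
    """Hypothetical helper: path of the report inside <base_dir>/Reports."""
    # Same rule as create_report: keep everything before the first dot
    stem = notebook_name.split(".", 1)[0]
    return Path(base_dir) / "Reports" / "{}.{}".format(stem, extension)


dest = report_destination("01_merge_export_share_foreign_SOE_quality.ipynb")
print(dest)
```

Because the name is split at the first dot, a notebook whose file name contains extra dots would be truncated; `Path(notebook_name).stem` is an alternative if that ever matters.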
