# Transform ASIF data by constructing financial variables raw data data to S3

# Objective(s)

## Business needs 

Transform (creating financial variables) ASIF data using Athena and save output to S3 + Glue. 

## Description

### Objective 

Construct the financial ratio variables by aggregating the data (not anymore at the firm level)

The asif_financial_ratio  has the following levels:

* year
* city
* industry

**Construction variables**

* Rescale output, fa_net, employment
* construct the following ratio:
    * If possible compute by:
      1. industry level
      2. city-industry level
      3. city-industry-year level
  * Working capital = Current Assets - Current Liabilities
  * Asset Tangibility
  * Current Ratio: 
    * Cash = non-cash assets -  total current assets
      * non-cash assets = short-term investments, accounts receivable, inventory and supplies 
  * Liabilities/Assets (Total-Debt-to-Total-Assets)
  * Sales/Assets
  * Return on Asset
* Fixed effect:
  * city-industry
  * year-industry
  * city-year

**Steps** 

We will clean the table by doing the following steps:

1. Compute the financial ratio by aggregating the data

**Cautious**

* Make sure there is no duplicates when merging ratio from different level

**Target**

* The file is saved in S3: 
  * bucket: datalake-datascience 
  * path: DATA/ECON/FIRM_SURVEY/ASIF_CHINA/TRANSFORMED/FINANCIAL_RATIO 
* Glue data catalog should be updated
  * database: firms_survey 
  * table prefix: asif_city_industry 
    * table name (prefix + last folder S3 path): asif_city_industry_financial_ratio 

# Metadata

* Key: spr04tlko02392a
* Parent key (for update parent):  
* Notebook US Parent (i.e the one to update): 
* https://github.com/thomaspernet/Financial_dependency_pollution/blob/master/01_data_preprocessing/02_transform_tables/00_asif_financial_ratio.md
* Epic: Epic 2
* US: US 1
* Date Begin: 11/23/2020
* Duration Task: 1
* Description: Transform (creating financial variables) ASIF data using Athena and save output to S3 + Glue. 
* Step type: Transform table
* Status: Active
* Source URL: Create Task and Epics
* Task type: Jupyter Notebook
* Users: Thomas Pernet
* Watchers: Thomas Pernet
* User Account: https://468786073381.signin.aws.amazon.com/console
* Estimated Log points: 10
* Task tag: #athena,#glue,#crawler,#financial-ratio
* Toggl Tag: #data-transformation
* current nb commits: 
 * Meetings:  
* Presentation:  
* Email Information:  
  * thread: Number of threads: 0(Default 0, to avoid display email)
  *  

# Input Cloud Storage [AWS/GCP]

## Table/file

* Origin: 
* Athena
* Name: 
* asif_firms_prepared
* Github: 
  * https://github.com/thomaspernet/Financial_dependency_pollution/blob/master/01_data_preprocessing/01_prepare_tables/00_prepare_asif.md

# Destination Output/Delivery

## Table/file

* Origin: 
* S3
* Athena
* Name:
* DATA/ECON/FIRM_SURVEY/ASIF_CHINA/TRANSFORMED/FINANCIAL_RATIO
* asif_city_industry_financial_ratio
* GitHub:
* https://github.com/thomaspernet/Financial_dependency_pollution/blob/master/01_data_preprocessing/02_transform_tables/00_asif_financial_ratio.md
* URL: 
  * datalake-datascience/DATA/ECON/FIRM_SURVEY/ASIF_CHINA/TRANSFORMED/FINANCIAL_RATIO
* 

# Knowledge

## List of candidates

* [List of financial ratios that can be computed with ASIF panel data](https://roamresearch.com/#/app/thomas_db/page/PS3o9Z3VA)

In [1]:
from awsPy.aws_authorization import aws_connector
from awsPy.aws_s3 import service_s3
from awsPy.aws_glue import service_glue
from pathlib import Path
import pandas as pd
import numpy as np
import seaborn as sns
import os, shutil, json

path = os.getcwd()
parent_path = str(Path(path).parent.parent)


name_credential = 'financial_dep_SO2_accessKeys.csv'
region = 'eu-west-3'
bucket = 'datalake-datascience'
path_cred = "{0}/creds/{1}".format(parent_path, name_credential)

In [2]:
con = aws_connector.aws_instantiate(credential = path_cred,
                                       region = region)
client= con.client_boto()
s3 = service_s3.connect_S3(client = client,
                      bucket = bucket, verbose = True) 
glue = service_glue.connect_glue(client = client) 

In [3]:
pandas_setting = True
if pandas_setting:
    cm = sns.light_palette("green", as_cmap=True)
    pd.set_option('display.max_columns', None)
    pd.set_option('display.max_colwidth', None)

# Prepare query 

Write query and save the CSV back in the S3 bucket `datalake-datascience` 

# Steps

Detail computation:

1. `working capital`:
    - Inventory [存货 (c81)] + Accounts receivable [应收帐款 (c80)] - Accounts payable [应付帐款  (c96)]
2. `Asset Tangibility: 
    - Total fixed assets [固定资产合计 (c85)] - Intangible assets [无形资产 (c91)]
3. `Current Ratio`:
    - Current asset [cuasset] / Current liabilities [c95]
4. `Cash/Assets`:
    - non-cash assets -  total current assets / non-cash assets
        - Cash [( 其中：短期投资 (c79) + 应收帐款 (c80) + 存货 (c81)) - cuasset)] /  Assets [其中：短期投资 (c79) + 应收帐款 (c80) + 存货 (c81)]
5. `Liabilities/Assets` (Total-Debt-to-Total-Assets)
    - (Total current liabilities + Total long-term liabilities)/ Total assets
        - Liabilities [(流动负债合计 (c95) + 长期负债合计 (c97))] /  Total assets [资产总计318 (c93)]
        - Total Liabilities [负债合计 (c98)]  /  Total assets [资产总计318 (c93)]
6. `Sales/Assets`:
    - Total annual revenue [全年营业收入合计 (c64) ] / ($\Delta$ Total assets 318 [$\Delta$ 资产总计318 (c98)]/2)
7. `Return on Asset`
    - (Total annual revenue - Income tax payable) [(全年营业收入合计 (c64) - 应交所得税 (c134))] / Total assets [资产总计318 (c98)]
    
    
**pct missing**

![](https://drive.google.com/uc?export=view&id=1LPNhZIPkJgx0-ZsM6NLNAB6dGH9h7ELo)

## Example step by step



In [5]:
DatabaseName = 'firms_survey'
s3_output_example = 'SQL_OUTPUT_ATHENA'

1. Add consistent city code

There is a need to remove the duplicates in `china_city_code_normalised` because it is possible to have the same code but different Chinese name link Chongqing

In [6]:
query = """
SELECT *
FROM chinese_lookup.china_city_code_normalised 
WHERE extra_code = '5001'
"""
output = s3.run_query(
                    query=query,
                    database=DatabaseName,
                    s3_output=s3_output_example,
    filename = 'example_1'
                )
output

Unnamed: 0,extra_code,geocode4_corr,citycn,cityen,citycn_correct,cityen_correct,province_cn,province_en
0,5001,5001,重庆市,Chongqing,重庆,Chongqing,重庆市,Chongqing
1,5001,5001,重庆,Chongqing,重庆,Chongqing,重庆市,Chongqing


In [7]:
query = """
WITH test AS (
SELECT firm, year, citycode, geocode4_corr, cic
  FROM firms_survey.asif_firms_prepared 
INNER JOIN 
  (
  SELECT extra_code, geocode4_corr
  FROM chinese_lookup.china_city_code_normalised 
  GROUP BY extra_code, geocode4_corr
  ) as no_dup_citycode
ON asif_firms_prepared.citycode = no_dup_citycode.extra_code
  )
  SELECT CNT, COUNT(*) 
  FROM(
  SELECT firm, year, geocode4_corr, cic, COUNT(*) AS CNT
  FROM test
  GROUP BY firm, year, geocode4_corr, cic
    )
    GROUP BY CNT
"""
output = s3.run_query(
                    query=query,
                    database=DatabaseName,
                    s3_output=s3_output_example,
    filename = 'example_1'
                )
output

Unnamed: 0,CNT,_col1
0,1,2088300


Make sure the output is the same before and after the use of city consistent code

In [9]:
query = """
WITH test AS (
SELECT 
  year, 
  geocode4_corr,
  cic, 
  SUM(output) as sum_output,
  SUM(c81) + SUM(c80) - SUM(c96) AS working_capital_cit, 
  SUM(c85) - SUM(c91) AS asset_tangibility_cit, 
  CAST(
    SUM(cuasset) AS DECIMAL(16, 5)
  ) / NULLIF(
    CAST(
      SUM(c95) AS DECIMAL(16, 5)
    ), 
    0
  ) AS current_ratio_cit, 
  CAST(
    (SUM(c79) + SUM(c80) + SUM(c81)) - SUM(cuasset) AS DECIMAL(16, 5)
  ) / NULLIF(CAST(
    SUM(c93) AS DECIMAL(16, 5)
  ), 
    0
  ) AS cash_assets_cit, 
  CAST(
    SUM(c95) + SUM(c97) AS DECIMAL(16, 5)
  )/ NULLIF(
    CAST(
      SUM(c93) AS DECIMAL(16, 5)
    ), 
    0
  ) AS liabilities_assets_cit, 
  CAST(
    SUM(c64) - SUM(c134) AS DECIMAL(16, 5)
  )/ NULLIF(
    CAST(
      SUM(c98) AS DECIMAL(16, 5)
    ), 
    0
  ) AS return_on_asset_cit, 
  CAST(
    SUM(cuasset) AS DECIMAL(16, 5)
  )/ NULLIF(
    CAST(
      (
        SUM(c98) - lag(
          SUM(c98), 
          1
        ) over(
          partition by geocode4_corr, 
          cic 
          order by 
            geocode4_corr, 
            cic, 
            year
        )
      )/ 2 AS DECIMAL(16, 5)
    ), 
    0
  ) AS sales_assets_cit 
FROM firms_survey.asif_firms_prepared 
INNER JOIN 
  (
  SELECT extra_code, geocode4_corr
  FROM chinese_lookup.china_city_code_normalised 
  GROUP BY extra_code, geocode4_corr
  ) as no_dup_citycode
  
ON asif_firms_prepared.citycode = no_dup_citycode.extra_code
GROUP BY 
  geocode4_corr, 
  cic, 
  year 
)
SELECT SUM(sum_output) as sum_output
FROM test

"""
output_1 = s3.run_query(
                    query=query,
                    database=DatabaseName,
                    s3_output=s3_output_example,
    filename = 'example_1'
                )
output_1

Unnamed: 0,sum_output
0,166657829347


In [11]:
query = """
SELECT SUM(output) as sum_output
FROM firms_survey.asif_firms_prepared 

"""
output_2 = s3.run_query(
                    query=query,
                    database=DatabaseName,
                    s3_output=s3_output_example,
    filename = 'example_1'
                )
output_2

Unnamed: 0,sum_output
0,170680191587


In [12]:
output_1 > output_2

Unnamed: 0,sum_output
0,False


2. Computation ratio by city-industry-year

In [15]:
query = """

SELECT 
  year, 
  geocode4_corr,
  cic, 
  SUM(c81) + SUM(c80) - SUM(c96) AS working_capital_cit, 
  SUM(c85) - SUM(c91) AS asset_tangibility_cit, 
  CAST(
    SUM(cuasset) AS DECIMAL(16, 5)
  ) / NULLIF(
    CAST(
      SUM(c95) AS DECIMAL(16, 5)
    ), 
    0
  ) AS current_ratio_cit, 
  CAST(
    (SUM(c79) + SUM(c80) + SUM(c81)) - SUM(cuasset) AS DECIMAL(16, 5)
  ) / NULLIF(CAST(
    SUM(c93) AS DECIMAL(16, 5)
  ), 
    0
  ) AS cash_assets_cit, 
  CAST(
    SUM(c95) + SUM(c97) AS DECIMAL(16, 5)
  )/ NULLIF(
    CAST(
      SUM(c93) AS DECIMAL(16, 5)
    ), 
    0
  ) AS liabilities_assets_cit, 
  CAST(
    SUM(c64) - SUM(c134) AS DECIMAL(16, 5)
  )/ NULLIF(
    CAST(
      SUM(c98) AS DECIMAL(16, 5)
    ), 
    0
  ) AS return_on_asset_cit, 
  CAST(
    SUM(cuasset) AS DECIMAL(16, 5)
  )/ NULLIF(
    CAST(
      (
        SUM(c98) - lag(
          SUM(c98), 
          1
        ) over(
          partition by geocode4_corr, 
          cic 
          order by 
            geocode4_corr, 
            cic, 
            year
        )
      )/ 2 AS DECIMAL(16, 5)
    ), 
    0
  ) AS sales_assets_cit 
FROM firms_survey.asif_firms_prepared 
INNER JOIN 
  (
  SELECT extra_code, geocode4_corr
  FROM chinese_lookup.china_city_code_normalised 
  GROUP BY extra_code, geocode4_corr
  ) as no_dup_citycode
  
ON asif_firms_prepared.citycode = no_dup_citycode.extra_code
WHERE year in ('2001', '2002', '2003', '2004', '2005', '2006', '2007') 
GROUP BY 
  geocode4_corr, 
  cic, 
  year 
LIMIT 
  10

"""
output = s3.run_query(
                    query=query,
                    database=DatabaseName,
                    s3_output=s3_output_example,
    filename = 'example_1'
                )
output

Unnamed: 0,year,geocode4_corr,cic,working_capital_cit,asset_tangibility_cit,current_ratio_cit,cash_assets_cit,liabilities_assets_cit,return_on_asset_cit,sales_assets_cit
0,2001,1101,1311,,15001.0,1.83709,,,,
1,2001,1101,1351,,497.0,1.28746,,,,
2,2002,1101,1351,,542.0,0.85517,,,,-0.1638
3,2003,1101,1351,,,0.86504,,,,1.51412
4,2004,1101,1351,237751.0,,0.90487,,,2.1785,3.93178
5,2005,1101,1351,210146.0,,0.95418,-0.21929,0.66754,2.70258,13.64152
6,2006,1101,1351,246596.0,,0.80885,-0.23367,0.68175,2.71386,5.44698
7,2007,1101,1351,246085.0,,0.62528,-0.22591,0.75284,3.18626,4.84
8,2000,1101,1512,,186351.0,0.90246,,0.58692,,
9,2001,1101,1512,,266860.0,1.31703,,,,15.22591


2. Computation ratio by city-industry

As an average over year 2002 to 2005

In [18]:
query = """
WITH ratio AS (
SELECT 
  year, 
  geocode4_corr,
  cic, 
  SUM(c81) + SUM(c80) - SUM(c96) AS working_capital_cit, 
  SUM(c85) - SUM(c91) AS asset_tangibility_cit, 
  CAST(
    SUM(cuasset) AS DECIMAL(16, 5)
  ) / NULLIF(
    CAST(
      SUM(c95) AS DECIMAL(16, 5)
    ), 
    0
  ) AS current_ratio_cit, 
  CAST(
    (SUM(c79) + SUM(c80) + SUM(c81)) - SUM(cuasset) AS DECIMAL(16, 5)
  ) / NULLIF(CAST(
    SUM(c93) AS DECIMAL(16, 5)
  ), 
    0
  ) AS cash_assets_cit, 
  CAST(
    SUM(c95) + SUM(c97) AS DECIMAL(16, 5)
  )/ NULLIF(
    CAST(
      SUM(c93) AS DECIMAL(16, 5)
    ), 
    0
  ) AS liabilities_assets_cit, 
  CAST(
    SUM(c64) - SUM(c134) AS DECIMAL(16, 5)
  )/ NULLIF(
    CAST(
      SUM(c98) AS DECIMAL(16, 5)
    ), 
    0
  ) AS return_on_asset_cit, 
  CAST(
    SUM(cuasset) AS DECIMAL(16, 5)
  )/ NULLIF(
    CAST(
      (
        SUM(c98) - lag(
          SUM(c98), 
          1
        ) over(
          partition by geocode4_corr, 
          cic 
          order by 
            geocode4_corr, 
            cic, 
            year
        )
      )/ 2 AS DECIMAL(16, 5)
    ), 
    0
  ) AS sales_assets_cit 
FROM firms_survey.asif_firms_prepared 
INNER JOIN 
  (
  SELECT extra_code, geocode4_corr
  FROM chinese_lookup.china_city_code_normalised 
  GROUP BY extra_code, geocode4_corr
  ) as no_dup_citycode
  
ON asif_firms_prepared.citycode = no_dup_citycode.extra_code
WHERE year in ('2001', '2002', '2003', '2004', '2005') 
GROUP BY 
  geocode4_corr, 
  cic, 
  year 
  )
  SELECT
  geocode4_corr, 
  cic,
  AVG(working_capital_cit) AS working_capital_ci,
  AVG(asset_tangibility_cit) AS asset_tangibility_ci,
  AVG(current_ratio_cit) AS current_ratio_ci,
  AVG(cash_assets_cit) AS cash_assets_ci,
  AVG(liabilities_assets_cit) AS liabilities_assets_ci,
  AVG(return_on_asset_cit) AS return_on_asset_ci,
  AVG(sales_assets_cit) AS sales_assets_ci
  FROM ratio
  GROUP BY geocode4_corr, cic
  LIMIT 10
"""
output = s3.run_query(
                    query=query,
                    database=DatabaseName,
                    s3_output=s3_output_example,
    filename = 'example_2'
                )
output

Unnamed: 0,geocode4_corr,cic,working_capital_ci,asset_tangibility_ci,current_ratio_ci,cash_assets_ci,liabilities_assets_ci,return_on_asset_ci,sales_assets_ci
0,1101,1415,,6432.0,1.53949,,0.57835,,12.37887
1,1101,1493,-31021.5,,0.09502,-0.01881,0.74422,0.64927,-1.33252
2,1101,1513,,2859962.0,1.53131,,0.21446,,8.5062
3,1101,1789,,72.0,5.42152,,,,
4,1101,1925,,3571.333,1.75451,,0.19324,,202.58851
5,1101,2631,77131.5,,1.09402,-0.23297,0.62844,1.25175,2.56717
6,1101,3050,51031.75,405174.0,0.99917,-0.08008,0.61953,1.48202,429.48911
7,1101,3070,181565.75,108511.3,1.0921,-0.2185,0.63928,1.19696,-12.75603
8,1101,3148,149911.0,79041.33,1.11265,-0.17345,0.57475,1.22193,6.65139
9,1101,3155,,1989.0,0.45908,,0.55266,,4.08998


3. Computation ratio by industry

As an average over year 2002 to 2005

In [19]:
query = """
WITH ratio AS (
SELECT 
  year, 
  geocode4_corr,
  cic, 
  SUM(c81) + SUM(c80) - SUM(c96) AS working_capital_cit, 
  SUM(c85) - SUM(c91) AS asset_tangibility_cit, 
  CAST(
    SUM(cuasset) AS DECIMAL(16, 5)
  ) / NULLIF(
    CAST(
      SUM(c95) AS DECIMAL(16, 5)
    ), 
    0
  ) AS current_ratio_cit, 
  CAST(
    (SUM(c79) + SUM(c80) + SUM(c81)) - SUM(cuasset) AS DECIMAL(16, 5)
  ) / NULLIF(CAST(
    SUM(c93) AS DECIMAL(16, 5)
  ), 
    0
  ) AS cash_assets_cit, 
  CAST(
    SUM(c95) + SUM(c97) AS DECIMAL(16, 5)
  )/ NULLIF(
    CAST(
      SUM(c93) AS DECIMAL(16, 5)
    ), 
    0
  ) AS liabilities_assets_cit, 
  CAST(
    SUM(c64) - SUM(c134) AS DECIMAL(16, 5)
  )/ NULLIF(
    CAST(
      SUM(c98) AS DECIMAL(16, 5)
    ), 
    0
  ) AS return_on_asset_cit, 
  CAST(
    SUM(cuasset) AS DECIMAL(16, 5)
  )/ NULLIF(
    CAST(
      (
        SUM(c98) - lag(
          SUM(c98), 
          1
        ) over(
          partition by geocode4_corr, 
          cic 
          order by 
            geocode4_corr, 
            cic, 
            year
        )
      )/ 2 AS DECIMAL(16, 5)
    ), 
    0
  ) AS sales_assets_cit 
FROM firms_survey.asif_firms_prepared 
INNER JOIN 
  (
  SELECT extra_code, geocode4_corr
  FROM chinese_lookup.china_city_code_normalised 
  GROUP BY extra_code, geocode4_corr
  ) as no_dup_citycode
  
ON asif_firms_prepared.citycode = no_dup_citycode.extra_code
WHERE year in ('2001', '2002', '2003', '2004', '2005') 
GROUP BY 
  geocode4_corr, 
  cic, 
  year 
  )
  SELECT
  cic,
  AVG(working_capital_cit) AS working_capital_i,
  AVG(asset_tangibility_cit) AS asset_tangibility_i,
  AVG(current_ratio_cit) AS current_ratio_i,
  AVG(cash_assets_cit) AS cash_assets_i,
  AVG(liabilities_assets_cit) AS liabilities_assets_i,
  AVG(return_on_asset_cit) AS return_on_asset_i,
  AVG(sales_assets_cit) AS sales_assets_it
  FROM ratio
  GROUP BY cic
  LIMIT 10
"""
output = s3.run_query(
                    query=query,
                    database=DatabaseName,
                    s3_output=s3_output_example,
    filename = 'example_3'
                )
output

Unnamed: 0,cic,working_capital_i,asset_tangibility_i,current_ratio_i,cash_assets_i,liabilities_assets_i,return_on_asset_i,sales_assets_it
0,3622,38343.150259,31409.538462,1.48731,-0.20733,0.58717,5.62412,12.27174
1,3929,95914.903614,,1.86058,-0.22818,0.55598,4.02962,-111.28421
2,2210,23945.919118,108385.191011,2.60178,-0.17457,0.48103,10.66836,5.90217
3,3632,16745.189815,13620.280632,1.54545,-0.1909,0.58318,8.22095,6.58961
4,4500,18030.867044,,84.60114,-0.23902,0.56558,1.85657,-5.2246
5,1498,,31947.697436,1.12852,,,,2.5114
6,1730,64584.009375,,1.79452,-0.16932,0.61261,5.05479,-0.5912
7,3487,,42851.627907,1.27867,,,,-6.139
8,4171,,253280.430769,1.27952,,,,1.22531
9,2770,31258.05,,2.54675,-0.18105,0.49835,8.27916,-11.32556


# Table `asif_city_industry_financial_ratio`


Since the table to create has missing value, please use the following at the top of the query

```
CREATE TABLE database.table_name WITH (format = 'PARQUET') AS
```


Choose a location in S3 to save the CSV. It is recommended to save in it the `datalake-datascience` bucket. Locate an appropriate folder in the bucket, and make sure all output have the same format

First, we need to delete the table (if exist)

In [20]:
table_name = 'asif_city_industry_financial_ratio'
s3_output = 'DATA/ECON/FIRM_SURVEY/ASIF_CHINA/TRANSFORMED/FINANCIAL_RATIO'

In [21]:
try:
    response = glue.delete_table(
        database=DatabaseName,
        table=table_name
    )
    print(response)
except Exception as e:
    print(e)

{'ResponseMetadata': {'RequestId': 'dde70c08-cf55-4164-94c8-66f56489d9b6', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Wed, 25 Nov 2020 12:26:59 GMT', 'content-type': 'application/x-amz-json-1.1', 'content-length': '2', 'connection': 'keep-alive', 'x-amzn-requestid': 'dde70c08-cf55-4164-94c8-66f56489d9b6'}, 'RetryAttempts': 0}}


Clean up the folder with the previous csv file. Be careful, it will erase all files inside the folder

In [22]:
s3.remove_all_bucket(path_remove = s3_output)

True

In [24]:
%%time
query = """
CREATE TABLE {0}.{1} WITH (format = 'PARQUET') AS

WITH ratio AS (
SELECT 
  year, 
  geocode4_corr,
  cic, 
  SUM(c81) + SUM(c80) - SUM(c96) AS working_capital_cit, 
  SUM(c85) - SUM(c91) AS asset_tangibility_cit, 
  CAST(
    SUM(cuasset) AS DECIMAL(16, 5)
  ) / NULLIF(
    CAST(
      SUM(c95) AS DECIMAL(16, 5)
    ), 
    0
  ) AS current_ratio_cit, 
  CAST(
    (SUM(c79) + SUM(c80) + SUM(c81)) - SUM(cuasset) AS DECIMAL(16, 5)
  ) / NULLIF(CAST(
    SUM(c93) AS DECIMAL(16, 5)
  ), 
    0
  ) AS cash_assets_cit, 
  CAST(
    SUM(c95) + SUM(c97) AS DECIMAL(16, 5)
  )/ NULLIF(
    CAST(
      SUM(c93) AS DECIMAL(16, 5)
    ), 
    0
  ) AS liabilities_assets_cit, 
  CAST(
    SUM(c64) - SUM(c134) AS DECIMAL(16, 5)
  )/ NULLIF(
    CAST(
      SUM(c98) AS DECIMAL(16, 5)
    ), 
    0
  ) AS return_on_asset_cit, 
  CAST(
    SUM(cuasset) AS DECIMAL(16, 5)
  )/ NULLIF(
    CAST(
      (
        SUM(c98) - lag(
          SUM(c98), 
          1
        ) over(
          partition by geocode4_corr, 
          cic 
          order by 
            geocode4_corr, 
            cic, 
            year
        )
      )/ 2 AS DECIMAL(16, 5)
    ), 
    0
  ) AS sales_assets_cit 
FROM firms_survey.asif_firms_prepared 
INNER JOIN 
  (
  SELECT extra_code, geocode4_corr
  FROM chinese_lookup.china_city_code_normalised 
  GROUP BY extra_code, geocode4_corr
  ) as no_dup_citycode
  
ON asif_firms_prepared.citycode = no_dup_citycode.extra_code
WHERE year in ('2001', '2002', '2003', '2004', '2005', '2006', '2007') 
GROUP BY 
  geocode4_corr, 
  cic, 
  year 
) 
SELECT 
  ratio.geocode4_corr, 
  ratio.cic, 
  ratio.year,
  working_capital_cit, 
  working_capital_ci, 
  working_capital_i, 
  asset_tangibility_cit, 
  asset_tangibility_ci, 
  asset_tangibility_i, 
  current_ratio_cit, 
  current_ratio_ci, 
  current_ratio_i, 
  cash_assets_cit, 
  cash_assets_ci, 
  cash_assets_i, 
  liabilities_assets_cit,
  liabilities_assets_ci, 
  liabilities_assets_i, 
  return_on_asset_cit, 
  return_on_asset_ci, 
  return_on_asset_i, 
  sales_assets_cit,
  sales_assets_ci,
  sales_assets_i
  FROM ratio
  LEFT JOIN (
    SELECT
  geocode4_corr, 
  cic,
  AVG(working_capital_cit) AS working_capital_ci,
  AVG(asset_tangibility_cit) AS asset_tangibility_ci,
  AVG(current_ratio_cit) AS current_ratio_ci,
  AVG(cash_assets_cit) AS cash_assets_ci,
  AVG(liabilities_assets_cit) AS liabilities_assets_ci,
  AVG(return_on_asset_cit) AS return_on_asset_ci,
  AVG(sales_assets_cit) AS sales_assets_ci
  FROM ratio
  WHERE year in ('2001', '2002', '2003', '2004', '2005') 
  GROUP BY geocode4_corr, cic
  
    ) as ratio_ci
    ON ratio.geocode4_corr = ratio_ci.geocode4_corr AND
    ratio.cic = ratio_ci.cic
  LEFT JOIN (
    SELECT
  cic,
  AVG(working_capital_cit) AS working_capital_i,
  AVG(asset_tangibility_cit) AS asset_tangibility_i,
  AVG(current_ratio_cit) AS current_ratio_i,
  AVG(cash_assets_cit) AS cash_assets_i,
  AVG(liabilities_assets_cit) AS liabilities_assets_i,
  AVG(return_on_asset_cit) AS return_on_asset_i,
  AVG(sales_assets_cit) AS sales_assets_i
  FROM ratio
  WHERE year in ('2001', '2002', '2003', '2004', '2005') 
  GROUP BY cic
    ) as ratio_i
    ON ratio.cic = ratio_i.cic    
""".format(DatabaseName, table_name)
output = s3.run_query(
                    query=query,
                    database=DatabaseName,
                    s3_output=s3_output,
                )
output

CPU times: user 1.12 s, sys: 98.1 ms, total: 1.21 s
Wall time: 8.97 s


{'Results': {'State': 'SUCCEEDED',
  'SubmissionDateTime': datetime.datetime(2020, 11, 25, 13, 28, 48, 486000, tzinfo=tzlocal()),
  'CompletionDateTime': datetime.datetime(2020, 11, 25, 13, 28, 57, 285000, tzinfo=tzlocal())},
 'QueryID': '1752a662-b577-4502-b7aa-82c8a659f1f5'}

In [25]:
query = """
SELECT COUNT(*) AS CNT
FROM {}.{} 
""".format(DatabaseName, table_name)
output = s3.run_query(
                    query=query,
                    database=DatabaseName,
                    s3_output=s3_output_example,
    filename = 'count_{}'.format(table_name)
                )
output

Unnamed: 0,CNT
0,285981


# Validate query

This step is mandatory to validate the query in the ETL. If you are not sure about the quality of the query, go to the next step.

To validate the query, please fillin the json below. Don't forget to change the schema so that the crawler can use it.

1. Add a partition key:
    - Inform if there is group in the table so that, the parser can compute duplicate
2. Add the steps number -> Not automtic yet. Start at 0
3. Change the schema if needed. It is highly recommanded to add comment to the fields
4. Provide a description -> detail the steps 

1. Add a partition key

In [27]:
partition_keys = ["geocode4_corr", "cic", "year"]

2. Add the steps number

In [28]:
step = 0

3. Change the schema

Bear in mind that CSV SerDe (OpenCSVSerDe) does not support empty fields in columns defined as a numeric data type. All columns with missing values should be saved as string. 

In [29]:
glue.get_table_information(
    database=DatabaseName,
    table=table_name)['Table']['StorageDescriptor']['Columns']

[{'Name': 'geocode4_corr', 'Type': 'string', 'Comment': ''},
 {'Name': 'cic', 'Type': 'string', 'Comment': ''},
 {'Name': 'year', 'Type': 'string', 'Comment': ''},
 {'Name': 'working_capital_cit', 'Type': 'bigint', 'Comment': ''},
 {'Name': 'working_capital_ci', 'Type': 'double', 'Comment': ''},
 {'Name': 'working_capital_i', 'Type': 'double', 'Comment': ''},
 {'Name': 'asset_tangibility_cit', 'Type': 'bigint', 'Comment': ''},
 {'Name': 'asset_tangibility_ci', 'Type': 'double', 'Comment': ''},
 {'Name': 'asset_tangibility_i', 'Type': 'double', 'Comment': ''},
 {'Name': 'current_ratio_cit', 'Type': 'decimal(21,5)', 'Comment': ''},
 {'Name': 'current_ratio_ci', 'Type': 'decimal(21,5)', 'Comment': ''},
 {'Name': 'current_ratio_i', 'Type': 'decimal(21,5)', 'Comment': ''},
 {'Name': 'cash_assets_cit', 'Type': 'decimal(21,5)', 'Comment': ''},
 {'Name': 'cash_assets_ci', 'Type': 'decimal(21,5)', 'Comment': ''},
 {'Name': 'cash_assets_i', 'Type': 'decimal(21,5)', 'Comment': ''},
 {'Name': 'lia

In [30]:
schema = [{'Name': 'geocode4_corr', 'Type': 'string', 'Comment': ''},
          {'Name': 'cic', 'Type': 'string', 'Comment': ''},
          {'Name': 'year', 'Type': 'string', 'Comment': ''},
          {'Name': 'working_capital_cit', 'Type': 'bigint', 'Comment': 'Inventory [存货 (c81)] + Accounts receivable [应收帐款 (c80)] - Accounts payable [应付帐款  (c96)] city industry year'},
          {'Name': 'working_capital_ci', 'Type': 'double', 'Comment': 'Inventory [存货 (c81)] + Accounts receivable [应收帐款 (c80)] - Accounts payable [应付帐款  (c96)] city industry'},
          {'Name': 'working_capital_i', 'Type': 'double', 'Comment': 'Inventory [存货 (c81)] + Accounts receivable [应收帐款 (c80)] - Accounts payable [应付帐款  (c96)] industry'},
          {'Name': 'asset_tangibility_cit', 'Type': 'bigint', 'Comment': 'Total fixed assets [固定资产合计 (c85)] - Intangible assets [无形资产 (c91)] city industry year'},
          {'Name': 'asset_tangibility_ci', 'Type': 'double', 'Comment': 'Total fixed assets [固定资产合计 (c85)] - Intangible assets [无形资产 (c91)] city industry'},
          {'Name': 'asset_tangibility_i', 'Type': 'double', 'Comment': 'Total fixed assets [固定资产合计 (c85)] - Intangible assets [无形资产 (c91)] industry'},
          {'Name': 'current_ratio_cit',
              'Type': 'decimal(21,5)', 'Comment': 'Current asset [cuasset] / Current liabilities [c95]  city industry year'},
          {'Name': 'current_ratio_ci', 'Type': 'decimal(21,5)', 'Comment': 'Current asset [cuasset] / Current liabilities [c95] city industry'},
          {'Name': 'current_ratio_i', 'Type': 'decimal(21,5)', 'Comment': 'Current asset [cuasset] / Current liabilities [c95] industry'},
          {'Name': 'cash_assets_cit', 'Type': 'decimal(21,5)', 'Comment': 'Cash [( 其中：短期投资 (c79) + 应收帐款 (c80) + 存货 (c81)) - cuasset)] /  Assets [其中：短期投资 (c79) + 应收帐款 (c80) + 存货 (c81)]  city industry year'},
          {'Name': 'cash_assets_ci', 'Type': 'decimal(21,5)', 'Comment': 'Cash [( 其中：短期投资 (c79) + 应收帐款 (c80) + 存货 (c81)) - cuasset)] /  Assets [其中：短期投资 (c79) + 应收帐款 (c80) + 存货 (c81)] city industry'},
          {'Name': 'cash_assets_i', 'Type': 'decimal(21,5)', 'Comment': 'Cash [( 其中：短期投资 (c79) + 应收帐款 (c80) + 存货 (c81)) - cuasset)] /  Assets [其中：短期投资 (c79) + 应收帐款 (c80) + 存货 (c81)] industry'},
          {'Name': 'liabilities_assets_cit',
              'Type': 'decimal(21,5)', 'Comment': 'Liabilities [(流动负债合计 (c95) + 长期负债合计 (c97))] /  Total assets [资产总计318 (c93)]  city industry year'},
          {'Name': 'liabilities_assets_ci',
              'Type': 'decimal(21,5)', 'Comment': 'Liabilities [(流动负债合计 (c95) + 长期负债合计 (c97))] /  Total assets [资产总计318 (c93)] city industry'},
          {'Name': 'liabilities_assets_i',
              'Type': 'decimal(21,5)', 'Comment': 'Liabilities [(流动负债合计 (c95) + 长期负债合计 (c97))] /  Total assets [资产总计318 (c93)] industry'},
          {'Name': 'return_on_asset_cit',
              'Type': 'decimal(21,5)', 'Comment': 'Total annual revenue [全年营业收入合计 (c64) ] / (Delta Total assets 318 [$\Delta$ 资产总计318 (c98)]/2)  city industry year'},
          {'Name': 'return_on_asset_ci',
              'Type': 'decimal(21,5)', 'Comment': 'Total annual revenue [全年营业收入合计 (c64) ] / (Delta Total assets 318 [$\Delta$ 资产总计318 (c98)]/2) city industry'},
          {'Name': 'return_on_asset_i',
              'Type': 'decimal(21,5)', 'Comment': 'Total annual revenue [全年营业收入合计 (c64) ] / (Delta Total assets 318 [$\Delta$ 资产总计318 (c98)]/2) industry'},
          {'Name': 'sales_assets_cit', 'Type': 'decimal(21,5)', 'Comment': '(Total annual revenue - Income tax payable) [(全年营业收入合计 (c64) - 应交所得税 (c134))] / Total assets [资产总计318 (c98)]  city industry year'},
          {'Name': 'sales_assets_ci', 'Type': 'decimal(21,5)', 'Comment': '(Total annual revenue - Income tax payable) [(全年营业收入合计 (c64) - 应交所得税 (c134))] / Total assets [资产总计318 (c98)] city industry'},
          {'Name': 'sales_assets_i', 'Type': 'decimal(21,5)', 'Comment': '(Total annual revenue - Income tax payable) [(全年营业收入合计 (c64) - 应交所得税 (c134))] / Total assets [资产总计318 (c98)] industry'}]

4. Provide a description

In [31]:
description = """
Compute the financial ratio by city industry year, city industry and industry
"""

5. provide metadata

- DatabaseName
- TablePrefix
- 

In [32]:
DatabaseName = 'firms_survey'

In [38]:
json_etl = {
    'step': step,
    'description':description,
    'query':query,
    'schema': schema,
    'partition_keys':partition_keys,
    'metadata':{
    'DatabaseName' : DatabaseName,
    'TableName' : table_name,
    'target_S3URI' : os.path.join('s3://',bucket, s3_output),
    'from_athena': 'True'    
    }
}
json_etl

{'step': 0,
 'description': '\nCompute the financial ratio by city industry year, city industry and industry\n',
 'query': '\nSELECT COUNT(*) AS CNT\nFROM firms_survey.asif_city_industry_financial_ratio \n',
 'schema': [{'Name': 'geocode4_corr', 'Type': 'string', 'Comment': ''},
  {'Name': 'cic', 'Type': 'string', 'Comment': ''},
  {'Name': 'year', 'Type': 'string', 'Comment': ''},
  {'Name': 'working_capital_cit',
   'Type': 'bigint',
   'Comment': 'Inventory [存货 (c81)] + Accounts receivable [应收帐款 (c80)] - Accounts payable [应付帐款  (c96)] city industry year'},
  {'Name': 'working_capital_ci',
   'Type': 'double',
   'Comment': 'Inventory [存货 (c81)] + Accounts receivable [应收帐款 (c80)] - Accounts payable [应付帐款  (c96)] city industry'},
  {'Name': 'working_capital_i',
   'Type': 'double',
   'Comment': 'Inventory [存货 (c81)] + Accounts receivable [应收帐款 (c80)] - Accounts payable [应付帐款  (c96)] industry'},
  {'Name': 'asset_tangibility_cit',
   'Type': 'bigint',
   'Comment': 'Total fixed assets

In [39]:
with open(os.path.join(str(Path(path).parent), 'parameters_ETL_Financial_dependency_pollution.json')) as json_file:
    parameters = json.load(json_file)

Remove the step number from the current file (if exist)

In [40]:
index_to_remove = next(
                (
                    index
                    for (index, d) in enumerate(parameters['TABLES']['TRANSFORMATION']['STEPS'])
                    if d["step"] == step
                ),
                None,
            )
if index_to_remove != None:
    parameters['TABLES']['TRANSFORMATION']['STEPS'].pop(index_to_remove)

In [41]:
parameters['TABLES']['TRANSFORMATION']['STEPS'].append(json_etl)

Save JSON

In [42]:
with open(os.path.join(str(Path(path).parent), 'parameters_ETL_Financial_dependency_pollution.json'), "w")as outfile:
    json.dump(parameters, outfile)

# Create or update the data catalog

The query is saved in the S3 (bucket `datalake-datascience`) but the table is not available yet in the Data Catalog. Use the function `create_table_glue` to generate the table and update the catalog.

Few parameters are required:

- name_crawler: Name of the crawler
- Role: Role to temporary provide an access tho the service
- DatabaseName: Name of the database to create the table
- TablePrefix: Prefix of the table. Full name of the table will be `TablePrefix` + folder name

To update the schema, please use the following structure

```
schema = [
    {
        "Name": "VAR1",
        "Type": "",
        "Comment": ""
    },
    {
        "Name": "VAR2",
        "Type": "",
        "Comment": ""
    }
]
```

In [43]:
glue.update_schema_table(
    database = DatabaseName,
    table = table_name,
    schema= schema)

{'ResponseMetadata': {'RequestId': '6281c00f-9efe-4d3e-a3e4-fe707d87f581',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'date': 'Wed, 25 Nov 2020 12:30:28 GMT',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '2',
   'connection': 'keep-alive',
   'x-amzn-requestid': '6281c00f-9efe-4d3e-a3e4-fe707d87f581'},
  'RetryAttempts': 0}}

## Check Duplicates

One of the most important step when creating a table is to check if the table contains duplicates. The cell below checks if the table generated before is empty of duplicates. The code uses the JSON file to create the query parsed in Athena. 

You are required to define the group(s) that Athena will use to compute the duplicate. For instance, your table can be grouped by COL1 and COL2 (need to be string or varchar), then pass the list ['COL1', 'COL2'] 

In [44]:
partition_keys = ["geocode4_corr", "cic", "year"]

with open(os.path.join(str(Path(path).parent), 'parameters_ETL_Financial_dependency_pollution.json')) as json_file:
    parameters = json.load(json_file)

In [45]:
### COUNT DUPLICATES
if len(partition_keys) > 0:
    groups = ' , '.join(partition_keys)

    query_duplicates = parameters["ANALYSIS"]['COUNT_DUPLICATES']['query'].format(
                                DatabaseName,table_name,groups
                                )
    dup = s3.run_query(
                                query=query_duplicates,
                                database=DatabaseName,
                                s3_output="SQL_OUTPUT_ATHENA",
                                filename="duplicates_{}".format(table_name))
    display(dup)


Unnamed: 0,CNT,CNT_DUPLICATE
0,1,285981


# Analytics

In this part, we are providing basic summary statistic. Since we have created the tables, we can parse the schema in Glue and use our json file to automatically generates the analysis.

The cells below execute the job in the key `ANALYSIS`. You need to change the `primary_key` and `secondary_key` 

For a full analysis of the table, please use the following Lambda function. Be patient, it can takes between 5 to 30 minutes. Times varies according to the number of columns in your dataset.

Use the function as follow:

- `output_prefix`:  s3://datalake-datascience/ANALYTICS/OUTPUT/TABLE_NAME/
- `region`: region where the table is stored
- `bucket`: Name of the bucket
- `DatabaseName`: Name of the database
- `table_name`: Name of the table
- `group`: variables name to group to count the duplicates
- `keys`: Variable name to perform the grouping -> Only one variable for now, Variable name to perform the secondary grouping -> Only one variable for now
    - format: 'A,B'
- `proba`: Chi-square analysis probabilitity
- `y_var`: Continuous target variables

Check the job processing in Sagemaker: https://eu-west-3.console.aws.amazon.com/sagemaker/home?region=eu-west-3#/processing-jobs

The notebook is available: https://s3.console.aws.amazon.com/s3/buckets/datalake-datascience?region=eu-west-3&prefix=ANALYTICS/OUTPUT/&showversions=false

Please, download the notebook on your local machine, and convert it to HTML:

```
cd "/Users/thomas/Downloads/Notebook"
aws s3 cp s3://datalake-datascience/ANALYTICS/OUTPUT/asif_unzip_data_csv/Template_analysis_from_lambda-2020-11-22-08-12-20.ipynb .

## convert HTML no code
jupyter nbconvert --no-input --to html Template_analysis_from_lambda-2020-11-21-14-30-45.ipynb
jupyter nbconvert --to html Template_analysis_from_lambda-2020-11-22-08-12-20.ipynb
```

Then upload the HTML to: https://s3.console.aws.amazon.com/s3/buckets/datalake-datascience?region=eu-west-3&prefix=ANALYTICS/HTML_OUTPUT/

Add a new folder with the table name in upper case

In [46]:
import boto3

key, secret_ = con.load_credential()
client_lambda = boto3.client(
    'lambda',
    aws_access_key_id=key,
    aws_secret_access_key=secret_,
    region_name = region)

In [47]:
primary_key = 'year'
secondary_key = 'cic'
y_var = 'working_capital_cit'

In [48]:
payload = {
    "input_path": "s3://datalake-datascience/ANALYTICS/TEMPLATE_NOTEBOOKS/template_analysis_from_lambda.ipynb",
    "output_prefix": "s3://datalake-datascience/ANALYTICS/OUTPUT/{}/".format(table_name.upper()),
    "parameters": {
        "region": "{}".format(region),
        "bucket": "{}".format(bucket),
        "DatabaseName": "{}".format(DatabaseName),
        "table_name": "{}".format(table_name),
        "group": "{}".format(','.join(partition_keys)),
        "keys": "{},{}".format(primary_key,secondary_key),
        "y_var": "{}".format(y_var),
        "threshold":0.5
    },
}
payload

{'input_path': 's3://datalake-datascience/ANALYTICS/TEMPLATE_NOTEBOOKS/template_analysis_from_lambda.ipynb',
 'output_prefix': 's3://datalake-datascience/ANALYTICS/OUTPUT/ASIF_CITY_INDUSTRY_FINANCIAL_RATIO/',
 'parameters': {'region': 'eu-west-3',
  'bucket': 'datalake-datascience',
  'DatabaseName': 'firms_survey',
  'table_name': 'asif_city_industry_financial_ratio',
  'group': 'geocode4_corr,cic,year',
  'keys': 'year,cic',
  'y_var': 'working_capital_cit',
  'threshold': 0.5}}

In [49]:
response = client_lambda.invoke(
    FunctionName='RunNotebook',
    InvocationType='RequestResponse',
    LogType='Tail',
    Payload=json.dumps(payload),
)
response

{'ResponseMetadata': {'RequestId': '389f5aae-56d0-40f7-9555-030198e2e385',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'date': 'Wed, 25 Nov 2020 12:31:22 GMT',
   'content-type': 'application/json',
   'content-length': '75',
   'connection': 'keep-alive',
   'x-amzn-requestid': '389f5aae-56d0-40f7-9555-030198e2e385',
   'x-amzn-remapped-content-length': '0',
   'x-amz-executed-version': '$LATEST',
   'x-amz-log-result': 'U1RBUlQgUmVxdWVzdElkOiAzODlmNWFhZS01NmQwLTQwZjctOTU1NS0wMzAxOThlMmUzODUgVmVyc2lvbjogJExBVEVTVApFTkQgUmVxdWVzdElkOiAzODlmNWFhZS01NmQwLTQwZjctOTU1NS0wMzAxOThlMmUzODUKUkVQT1JUIFJlcXVlc3RJZDogMzg5ZjVhYWUtNTZkMC00MGY3LTk1NTUtMDMwMTk4ZTJlMzg1CUR1cmF0aW9uOiAyMTkzLjc3IG1zCUJpbGxlZCBEdXJhdGlvbjogMjIwMCBtcwlNZW1vcnkgU2l6ZTogMTI4IE1CCU1heCBNZW1vcnkgVXNlZDogODEgTUIJSW5pdCBEdXJhdGlvbjogMjczLjQwIG1zCQo=',
   'x-amzn-trace-id': 'root=1-5fbe4e97-58b7bdcd2dc98f495a329d1b;sampled=0'},
  'RetryAttempts': 0},
 'StatusCode': 200,
 'LogResult': 'U1RBUlQgUmVxdWVzdElkOiAzODlmNWFhZS01NmQwLTQwZ

# Generation report

In [50]:
import os, time, shutil, urllib, ipykernel, json
from pathlib import Path
from notebook import notebookapp

In [52]:
def create_report(extension = "html", keep_code = False):
    """
    Create a report from the current notebook and save it in the 
    Report folder (Parent-> child directory)
    
    1. Exctract the current notbook name
    2. Convert the Notebook 
    3. Move the newly created report
    
    Args:
    extension: string. Can be "html", "pdf", "md"
    
    
    """
    
    ### Get notebook name
    connection_file = os.path.basename(ipykernel.get_connection_file())
    kernel_id = connection_file.split('-', 1)[0].split('.')[0]

    for srv in notebookapp.list_running_servers():
        try:
            if srv['token']=='' and not srv['password']:  
                req = urllib.request.urlopen(srv['url']+'api/sessions')
            else:
                req = urllib.request.urlopen(srv['url']+ \
                                             'api/sessions?token=' + \
                                             srv['token'])
            sessions = json.load(req)
            notebookname = sessions[0]['name']
        except:
            pass  
    
    sep = '.'
    path = os.getcwd()
    #parent_path = str(Path(path).parent)
    
    ### Path report
    #path_report = "{}/Reports".format(parent_path)
    #path_report = "{}/Reports".format(path)
    
    ### Path destination
    name_no_extension = notebookname.split(sep, 1)[0]
    source_to_move = name_no_extension +'.{}'.format(extension)
    dest = os.path.join(path,'Reports', source_to_move)
    
    ### Generate notebook
    if keep_code:
        os.system('jupyter nbconvert --to {} {}'.format(
    extension,notebookname))
    else:
        os.system('jupyter nbconvert --no-input --to {} {}'.format(
    extension,notebookname))
    
    ### Move notebook to report folder
    #time.sleep(5)
    shutil.move(source_to_move, dest)
    print("Report Available at this adress:\n {}".format(dest))

In [53]:
create_report(extension = "html", keep_code = True)

Report Available at this adress:
 /Users/thomas/Google Drive/Projects/GitHub/Repositories/Financial_dependency_pollution/01_data_preprocessing/02_transform_tables/Reports/00_asif_financial_ratio.html
