# Estimate baseline equation 

This notebook has been generated on 08/14/2020

## Objective(s)

* Estimate the baseline regression with the new dataset. The baseline regression is the following:

Regress the quality index, measured at the firm, product, city, destination, and year level, on the VAT rebate, eligibility, and other control variables, plus a set of fixed effects detailed later on.

![](https://drive.google.com/uc?export=view&id=1stdefCGutycsRZU9EuLgUoe3am8tgg5T)

* The coefficient on the VAT rebate should be negative and significant
* Include product-year fixed effects

## Metadata

* Task type:
  * Jupyter Notebook
* Users:
  * Thomas Pernet
* Watchers:
  * Thomas Pernet
* Estimated Log points:
  * One being a simple task, 15 a very difficult one
  *  10
* Task tag
  *  #linear-regression,#baseline-results,#fixed-effect
* Toggl Tag
  * #baseline-result

## Input Cloud Storage [AWS/GCP]

* BigQuery 
  * Table: quality_vat_export_2003_2010
    * Notebook construction file (data lineage) 
      * md : [01_preparation_quality.md](https://github.com/thomaspernet/VAT_rebate_quality_china/blob/master/01_Data_preprocessing/01_preparation_quality.md)

## Destination Output/Delivery

1. Latex table (Latex & pdf)
  * Description: The table should look like the one from the paper: thomaspernet/VAT_rebate_quality_china: table 1
  * Github branch: master 
  * Folder: [02_Data_analysis/02_new_baseline_table/Tables](https://github.com/thomaspernet/VAT_rebate_quality_china/tree/master/02_Data_analysis/02_new_baseline_table/Tables)

## Things to know (Steps, Attention points or new flow of information)

* Documentation 
  * Coda: 
    * [US 1 Empirical analysis Baseline](https://coda.io/d/VAT-Rebate_d_s12qjWA8O/US-1-Empirical-analysis-Baseline_sugol): Details about FE and baseline regression
* Github
    1. Repo: [thomaspernet/VAT_rebate_quality_china: New FE table](https://github.com/thomaspernet/VAT_rebate_quality_china/blob/master/02_Data_analysis/01_new_fixed_effect/01_baseline_table.md#new-fe-table) → Table with the fixed effect to reproduce and baseline table

# Load Dataset

## inputs

- Filename: quality_vat_export_2003_2010
- Link: [BigQuery](https://console.cloud.google.com/bigquery?project=valid-pagoda-132423&p=valid-pagoda-132423&d=China&t=quality_vat_export_2003_2010&page=table)
- Type: Table

In [None]:
import pandas as pd 
import numpy as np
from pathlib import Path
import os, re,  requests, json 
from GoogleDrivePy.google_authorization import authorization_service
from GoogleDrivePy.google_platform import connect_cloud_platform

In [None]:
import function.latex_beautify as lb

%load_ext autoreload
%autoreload 2

In [1]:
options(warn=-1)
library(tidyverse)
library(lfe)
library(lazyeval)
library('progress')
path = "function/table_golatex.R"
source(path)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──

✔ ggplot2 3.3.2     ✔ purrr   0.3.4
✔ tibble  3.0.3     ✔ dplyr   1.0.1
✔ tidyr   1.1.1     ✔ stringr 1.4.0
✔ readr   1.3.1     ✔ forcats 0.5.0

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

Loading required package: Matrix


Attaching package: ‘Matrix’


The following objects are masked from ‘package:tidyr’:

    expand, pack, unpack




ERROR: Error in library(lazyeval): there is no package called ‘lazyeval’


In [None]:
path = os.getcwd()
parent_path = str(Path(path).parent)
project = 'valid-pagoda-132423'


auth = authorization_service.get_authorization(
    path_credential_gcp = "{}/creds/service.json".format(parent_path),
    verbose = False
)

gcp_auth = auth.authorization_gcp()
gcp = connect_cloud_platform.connect_console(project = project, 
                                             service_account = gcp_auth) 

In [None]:
query = (
    "SELECT * "
    "FROM China.quality_vat_export_2003_2010"
)

In [2]:
#df_final = gcp.upload_data_from_bigquery(query = query, location = 'US')
#df_final.head()
path = '../../00_Data_catalogue/temporary_local_data/quality_vat_export_2003_2010.csv'
df_final <- read_csv(path) %>%
    mutate_if(is.character, as.factor) %>%
    mutate_at(vars(starts_with("FE")), as.factor) %>%
    # Use Not_Eligible as the reference level of regime
    mutate(regime = relevel(regime, ref='Not_Eligible'))

Parsed with column specification:
cols(
  .default = col_double(),
  cityen = col_character(),
  regime = col_character(),
  Country_en = col_character(),
  ISO_alpha = col_character()
)

See spec(...) for full column specifications.



In [None]:
#import pandas as pd
#path = '../../00_Data_catalogue/temporary_local_data/quality_vat_export_2003_2010.csv'
#print(pd.read_csv(path).dtypes.to_markdown())

# Models to estimate

Variables:


|       Variables        | Type    |
|:-----------------------|:--------|
| cityen                 | object  |
| geocode4_corr          | int64   |
| year                   | int64   |
| regime                 | object  |
| HS6                    | int64   |
| HS4                    | int64   |
| HS3                    | int64   |
| Country_en             | object  |
| ISO_alpha              | object  |
| Quantity               | int64   |
| value                  | int64   |
| unit_price             | float64 |
| kandhelwal_quality     | float64 |
| price_adjusted_quality | float64 |
| lag_tax_rebate         | float64 |
| ln_lag_tax_rebate      | float64 |
| lag_import_tax         | float64 |
| ln_lag_import_tax      | float64 |
| sigma                  | float64 |
| sigma_price            | float64 |
| y                      | float64 |
| prediction             | float64 |
| residual               | float64 |
| FE_cp                  | int64   |
| FE_cst                 | int64   |
| FE_cpr                 | int64   |
| FE_csrt                | int64   |
| FE_pt                  | int64   |
| FE_pd                  | int64   |
| FE_dt                  | int64   |
| FE_ct                  | int64   |

## Fixed Effect

| Benchmark | Origin    | Name                     | Description                                                                                                                                                                                                                                                                                                                                    | Math_notebook     |
|-----------|-----------|--------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------|
| Yes       | Current   | firm-product-eligibility | captures all the factors that affect firms regardless of the time and type of regime. This firm‒product pair eliminates the demand shocks that firms face and that are not correlated with the types of status. The fixed effects are also responsible for potential correlations between subsidies, R&D, or trade policies and VAT rebates.   | $\alpha^{E}_{it}$ |
| Yes       | Current   | HS4-year-eligibility     |                                                                                                                                                                                                                                                                                                                                                | $\alpha^{E}_{st}$ |
| Yes       | Current   | city-year                | captures the differences in demand, capital intensity, or labor supply that prevail between cities each year                                                                                                                                                                                                                                   | $\alpha_{ct}$     |
| Yes       | Current   | destination-year         | Captures additional level of control, encompassing all the shocks and developments in the economies to which China exports.                                                                                                                                                                                                                    | $\alpha_{dt}$     |
|           | Candidate | Product-year             | account for all factors that affect product-level export irrespective of the trade regime in a given year                                                                                                                                                                                                                                      | $\alpha_{pt}$     |
|           | Candidate | product-destination      |                                                                                                                                                                                                                                                                                                                                                | $\alpha_{pd}$     |
|           | Candidate | Product-destination-year |                                                                                                                                                                                                                                                                                                                                                | $\alpha_{pdt}$    |
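
To make concrete what "absorbing" a fixed effect means, here is a minimal sketch of the within transformation for a single set of groups. The toy data, group labels, and function names are hypothetical and are not part of the project; `felm` handles several sets of fixed effects at once, which this sketch does not attempt.

```python
from collections import defaultdict


def demean_by_group(values, groups):
    """Subtract the group mean from each value (the 'within' transformation)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def within_slope(y, x, groups):
    """OLS slope of y on x after absorbing one set of group fixed effects."""
    y_w = demean_by_group(y, groups)
    x_w = demean_by_group(x, groups)
    sxx = sum(a * a for a in x_w)
    sxy = sum(a * b for a, b in zip(x_w, y_w))
    return sxy / sxx


# Hypothetical toy data: two city-year cells with different intercepts
# (10 and 20) and a common slope of -0.5
groups = ["Beijing-2005"] * 3 + ["Shanghai-2005"] * 3
x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
y = [10.0 - 0.5 * v for v in x[:3]] + [20.0 - 0.5 * v for v in x[3:]]

slope = within_slope(y, x, groups)
print(round(slope, 6))
```

The group intercepts drop out after demeaning, so the slope is recovered from within-group variation only. This is the intuition behind each row of the table above.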

## Table 01

Equation to estimate:

$$ $$
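
As a sketch inferred from the `felm` call for `t_2` in this section (the notation is an assumption and is not taken from the paper), the interacted specification can be written as:

$$
\begin{aligned}
\text{Quality}^{r}_{ckjt} ={}& \beta_1 \ln \text{Rebate}_{k,t-1} + \beta_2 \ln \text{Rebate}_{k,t-1} \times \text{Eligible}^{r} \\
&+ \beta_3 \ln \text{ImportTax}_{k,t-1} + \beta_4 \ln \text{ImportTax}_{k,t-1} \times \text{Eligible}^{r} \\
&+ \alpha^{r}_{ck} + \alpha^{r}_{cst} + \alpha_{kj} + \varepsilon^{r}_{ckjt}
\end{aligned}
$$

Here $c$ is the city, $k$ the HS6 product, $s$ the HS4 sector, $j$ the destination, $t$ the year, and $r$ the regime. The terms $\alpha^{r}_{ck}$, $\alpha^{r}_{cst}$, and $\alpha_{kj}$ stand for `FE_cpr`, `FE_csrt`, and `FE_pd`, and standard errors are clustered at the HS6 level. The $k,t-1$ subscript on the rebate and the import tax is assumed from the `lag_` prefix of the variable names.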


- Overleaf:

In [18]:
t_0 <- felm(kandhelwal_quality ~ln_lag_tax_rebate+ ln_lag_import_tax 
            | FE_cp + FE_cst+FE_pd|0 | HS6, df_final %>% filter(regime == 'Eligible'),
            exactDOF = TRUE)
summary(t_0)


Call:
   felm(formula = kandhelwal_quality ~ ln_lag_tax_rebate + ln_lag_import_tax |      FE_cp + FE_cst + FE_pd | 0 | HS6, data = df_final %>% filter(regime ==      "Eligible"), exactDOF = TRUE) 

Residuals:
    Min      1Q  Median      3Q     Max 
-55.833  -0.587   0.000   0.656  55.833 

Coefficients:
                  Estimate Cluster s.e. t value Pr(>|t|)    
ln_lag_tax_rebate -0.26915      0.04787  -5.622    2e-08 ***
ln_lag_import_tax  0.01229      0.04290   0.287    0.774    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.875 on 4006510 degrees of freedom
Multiple R-squared(full model): 0.4402   Adjusted R-squared: 0.3144 
Multiple R-squared(proj model): 0.0001461   Adjusted R-squared: -0.2246 
F-statistic(full model, *iid*):3.499 on 900412 and 4006510 DF, p-value: < 2.2e-16 
F-statistic(proj model): 15.98 on 2 and 4663 DF, p-value: 1.213e-07 



In [17]:
t_1 <- felm(kandhelwal_quality ~ln_lag_tax_rebate + ln_lag_import_tax
            | FE_cp + FE_cst + FE_pd|0 | HS6, df_final %>% filter(regime != 'Eligible'),
            exactDOF = TRUE)
summary(t_1)


Call:
   felm(formula = kandhelwal_quality ~ ln_lag_tax_rebate + ln_lag_import_tax |      FE_cp + FE_cst + FE_pd | 0 | HS6, data = df_final %>% filter(regime !=      "Eligible"), exactDOF = TRUE) 

Residuals:
    Min      1Q  Median      3Q     Max 
-31.779  -0.473   0.000   0.525  23.857 

Coefficients:
                  Estimate Cluster s.e. t value Pr(>|t|)
ln_lag_tax_rebate -0.08788      0.06368  -1.380    0.168
ln_lag_import_tax -0.11199      0.09200  -1.217    0.224

Residual standard error: 1.934 on 577012 degrees of freedom
Multiple R-squared(full model): 0.6387   Adjusted R-squared: 0.4306 
Multiple R-squared(proj model): 2.346e-05   Adjusted R-squared: -0.5762 
F-statistic(full model, *iid*):3.068 on 332502 and 577012 DF, p-value: < 2.2e-16 
F-statistic(proj model):  1.67 on 2 and 4162 DF, p-value: 0.1884 



In [19]:
t_2 <- felm(kandhelwal_quality ~ln_lag_tax_rebate* regime + ln_lag_import_tax * regime
            | FE_cpr + FE_csrt + FE_pd|0 | HS6, df_final,
            exactDOF = TRUE)
summary(t_2)


Call:
   felm(formula = kandhelwal_quality ~ ln_lag_tax_rebate * regime +      ln_lag_import_tax * regime | FE_cpr + FE_csrt + FE_pd | 0 |      HS6, data = df_final, exactDOF = TRUE) 

Residuals:
    Min      1Q  Median      3Q     Max 
-55.856  -0.593   0.000   0.660  55.856 

Coefficients:
                                 Estimate Cluster s.e. t value Pr(>|t|)  
ln_lag_tax_rebate                -0.11724      0.06144  -1.908   0.0564 .
regimeEligible                         NA      0.00000      NA       NA  
ln_lag_import_tax                -0.07306      0.08197  -0.891   0.3728  
ln_lag_tax_rebate:regimeEligible -0.15361      0.07224  -2.126   0.0335 *
regimeEligible:ln_lag_import_tax  0.08477      0.08666   0.978   0.3280  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.908 on 4707517 degrees of freedom
Multiple R-squared(full model): 0.4519   Adjusted R-squared: 0.3228 
Multiple R-squared(proj model): 0.0001261   Adjusted R-squared: 
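
A short worked example of how to read the interaction terms, using only the point estimates printed above. Since `Not_Eligible` is the reference level of `regime`, the implied rebate effect for eligible exports is the base coefficient plus the interaction. The variable names are illustrative, and the sketch ignores the covariance between estimates, so it says nothing about the standard error of the sum.

```python
# Point estimates copied from the summary() outputs of t_0, t_1 and t_2
base_rebate = -0.11724        # ln_lag_tax_rebate in t_2 (Not_Eligible)
interaction = -0.15361        # ln_lag_tax_rebate:regimeEligible in t_2
eligible_only = -0.26915      # ln_lag_tax_rebate in t_0 (Eligible subsample)
not_eligible_only = -0.08788  # ln_lag_tax_rebate in t_1 (other regimes)

# Implied effect for eligible exports in the pooled, interacted model
implied_eligible = base_rebate + interaction

# Gap between the two split-sample estimates, for comparison with the
# interaction coefficient
split_gap = eligible_only - not_eligible_only

print("implied eligible effect:", round(implied_eligible, 5))
print("split-sample gap:", round(split_gap, 5))
```

The implied effect for eligible exports is close to the split-sample estimate from `t_0`, which is what one would expect when the fixed effects are interacted with the regime.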

In [25]:
t_3 <- felm(kandhelwal_quality ~ln_lag_tax_rebate* regime + ln_lag_import_tax * regime
            | FE_cpr + FE_csrt+FE_pt|0 | HS6, df_final,
            exactDOF = TRUE)
summary(t_3)


Call:
   felm(formula = kandhelwal_quality ~ ln_lag_tax_rebate * regime +      ln_lag_import_tax * regime | FE_cpr + FE_csrt + FE_pt | 0 |      HS6, data = df_final, exactDOF = TRUE) 

Residuals:
    Min      1Q  Median      3Q     Max 
-82.744  -0.669   0.000   0.732  35.425 

Coefficients:
                                 Estimate Cluster s.e. t value Pr(>|t|)  
ln_lag_tax_rebate                      NA      0.00000      NA       NA  
regimeEligible                         NA      0.00000      NA       NA  
ln_lag_import_tax                      NA      0.00000      NA       NA  
ln_lag_tax_rebate:regimeEligible -0.15153      0.08488  -1.785   0.0743 .
regimeEligible:ln_lag_import_tax  0.05567      0.10460   0.532   0.5946  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.057 on 5028950 degrees of freedom
Multiple R-squared(full model): 0.3196   Adjusted R-squared: 0.2131 
Multiple R-squared(proj model): 3.447e-06   Adjusted R-squared: 
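
One plausible reading of the `NA` rows above, assuming the lagged rebate and import tax vary only by product and year: once `FE_pt` is included, the main effects have no variation left within a product-year cell, while the interactions with `regime` still do. The sketch below illustrates this with hypothetical toy values; the cell labels and numbers are invented.

```python
from collections import defaultdict


def demean_by_group(values, groups):
    """Subtract the group mean from each value (the 'within' transformation)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


# Hypothetical product-year cells: the rebate takes one value per cell,
# and each cell holds one eligible and one non-eligible observation
cells = ["850110-2004", "850110-2004", "850110-2005", "850110-2005"]
rebate = [0.13, 0.13, 0.17, 0.17]
eligible = [1, 0, 1, 0]

# The main effect is wiped out by the product-year fixed effect
rebate_within = demean_by_group(rebate, cells)

# The interaction with eligibility keeps within-cell variation
interaction = [r * e for r, e in zip(rebate, eligible)]
interaction_within = demean_by_group(interaction, cells)

print(rebate_within)
print(interaction_within)
```

A regressor that is all zeros after demeaning cannot be estimated, which would explain why only the interaction terms are reported in `t_3`.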

In [None]:
# Remove table files left over from a previous run, if any
for fname in ("table_1.txt", "table_1.tex"):
    try:
        os.remove(fname)
    except FileNotFoundError:
        pass

In [None]:
dep <- "Dependent variable: Quality of city $c$ for product $k$ exported to country $j$ at year $t$"
table_1 <- go_latex(list(
    t_1
),
    title="VAT export tax and product's quality upgrading, baseline regression",
    dep_var = dep,
    addFE='',
    save=TRUE,
    note = FALSE,
    name="table_1.txt"
)

In [None]:
tbe1 = ""

In [None]:
lb.beautify(table_number = 1,
            new_row= False,
           table_nte = tbe1,
           jupyter_preview = True,
            resolution = 150)

## Test: log export value

In [None]:
t_0 <- felm(log(value) ~ln_lag_tax_rebate* regime + ln_lag_import_tax * regime
            | FE_cpr + FE_csrt+FE_pt+ FE_pd|0 | HS6, df_final,
            exactDOF = TRUE)
summary(t_0)
t_1 <- felm(log(value) ~ln_lag_tax_rebate+ ln_lag_import_tax 
            | FE_cp + FE_cst+FE_pd|0 | HS6, df_final %>% filter(regime == 'Eligible'),
            exactDOF = TRUE)
summary(t_1)
t_2 <- felm(log(value) ~ln_lag_tax_rebate + ln_lag_import_tax
            | FE_cp + FE_cst + FE_pd|0 | HS6, df_final %>% filter(regime != 'Eligible'),
            exactDOF = TRUE)
summary(t_2)
t_3 <- felm(log(value) ~ln_lag_tax_rebate* regime + ln_lag_import_tax * regime
            | FE_cpr + FE_csrt + FE_pd|0 | HS6, df_final,
            exactDOF = TRUE)
summary(t_3)

# CREATE REPORT

In [None]:
import os, time, shutil, urllib.request, ipykernel, json
from pathlib import Path
from notebook import notebookapp

In [None]:
def create_report(extension = "html"):
    """
    Create a report from the current notebook and save it in the
    Reports folder of the current working directory

    1. Extract the current notebook name
    2. Convert the notebook
    3. Move the newly created report

    Args:
    extension: string. Can be "html", "pdf", "md"
    """

    ### Get notebook name
    connection_file = os.path.basename(ipykernel.get_connection_file())
    kernel_id = connection_file.split('-', 1)[1].split('.')[0]

    notebookname = None
    for srv in notebookapp.list_running_servers():
        try:
            if srv['token'] == '' and not srv['password']:
                req = urllib.request.urlopen(srv['url'] + 'api/sessions')
            else:
                req = urllib.request.urlopen(srv['url'] +
                                             'api/sessions?token=' +
                                             srv['token'])
            sessions = json.load(req)
            # Keep the session attached to the current kernel
            for sess in sessions:
                if sess['kernel']['id'] == kernel_id:
                    notebookname = sess['name']
        except Exception:
            # Server not reachable: try the next one
            pass

    if notebookname is None:
        raise RuntimeError("Could not find the name of the current notebook")

    sep = '.'
    path = os.getcwd()

    ### Path destination
    name_no_extension = notebookname.split(sep, 1)[0]
    source_to_move = name_no_extension + '.{}'.format(extension)
    dest = os.path.join(path, 'Reports', source_to_move)

    ### Generate notebook
    os.system('jupyter nbconvert --no-input --to {} {}'.format(
        extension, notebookname))

    ### Move notebook to report folder
    shutil.move(source_to_move, dest)
    print("Report available at this address:\n {}".format(dest))