# Canadian Income Survey 
## Analyses for the "Estimating Costs of Poverty in Quebec" project


By: Thierry Gagné, PhD

Started: May 2nd, 2023
Last updated: May 5th, 2023

**GOAL**

To produce the following estimates:

1. Le revenu des ménages sous le seuil de la MPC au seuil pondéré (20 007).
2. Le revenu des ménages sous le seuil de la MPC au 1e quintile.
3. Le revenu des ménages sous le seuil de la MPC à la moyenne du 2e quintile.
4. Le revenu des ménages sous le premier quintile au 1e quintile.
5. Le revenu des ménages sous le premier quintile à la moyenne du 2e quintile.
6. Le revenu des ménages sous le seuil de revenu viable pondéré proposé par l’IRIS à cette limite (26 712).
7. Le revenu des ménages sous 50% de la médiane à cette limite.
8. Le revenu des ménages sous 60% de la médiane à cette limite.

**N.B.**

The CEPE report looks into the lost incomes in those aged 18-64 and measures it at the "family unit" level.

**INTRODUCTION**

The Canadian Income Survey (CIS) is a cross-sectional survey developed to provide a portrait of the income and income sources of Canadians, with their individual and household characteristics.

**REFERENCES:**

Link to the Public-Use Microdata File dataset:
https://abacus.library.ubc.ca/dataset.xhtml?persistentId=hdl:11272.1/AB2/KDU2UJ

Description of the dataset:
https://www23.statcan.gc.ca/imdb/p2SV.pl?Function=getSurvey&SDDS=5200




In [70]:
# OPEN DATASET

import numpy as np
import pandas as pd
import readability
import warnings
import re
import statsmodels.stats.weightstats as weightstats
import math
# warnings.simplefilter(action='ignore', category=FutureWarning)

pd.options.display.max_colwidth = 100

# The easiest way to open the dataset was to open the Stata file, export it as a .csv, and open that here.

df = pd.read_csv("CIS2019_PUMF.csv")
print("The 2019 CSI PUMF sample has", len(df), "participants")

df = df[df["prov"] == 24]
print("The 2019 CSI PUMF sample has", len(df), "QC participants")

df = df[df["agegp"] >= 5]
df = df[df["agegp"] <= 13]
print("The 2019 CSI PUMF sample has", len(df), "QC participants aged 18-64.")

print("\nThere is", sum(df["EFMBIN18"].isna() == True), "missing cases on the EFMBIN18 variable.")
print("There is", sum(df["MBSCF18"].isna() == True), "missing cases on the MBSCF18 variable.")
print("There is", sum(df["hhsize"].isna() == True), "missing cases on the hhsize variable.")
print("There is", sum(df["fweight"].isna() == True), "missing cases on the fweight variable.")

df["efdispincequi"] = df.apply(lambda x: x["EFMBIN18"] / math.sqrt(x["hhsize"]), axis=1)

result1 = weightstats.DescrStatsW(df["efdispincequi"], weights = df["fweight"]).mean
result2 = weightstats.DescrStatsW(df["efdispincequi"], weights = df["fweight"]).quantile(0.5).loc[0.5]

print("\nMean equivalized disposable income:", f'{math.trunc(round(result1, 0)):,}', "$")
print("Median equivalized disposable income:", f'{math.trunc(round(result2, 0)):,}', "$")

#### PRODUCING RAW ESTIMATES

result1 = weightstats.DescrStatsW(df[df["MBSCF18"]==1]["efdispincequi"], weights = df[df["MBSCF18"]==1]["fweight"]).mean
result2 = weightstats.DescrStatsW(df["efdispincequi"], weights = df["fweight"]).quantile(0.2).loc[0.2]
result3 = weightstats.DescrStatsW(df[df["efdispincequi"] < result2]["efdispincequi"], weights = df[df["efdispincequi"] < result2]["fweight"]).mean
intermediate1 = weightstats.DescrStatsW(df["efdispincequi"], weights = df["fweight"]).quantile(0.4).loc[0.4]
result4 = weightstats.DescrStatsW(df[(df["efdispincequi"] >= result2) & (df["efdispincequi"] < intermediate1)]["efdispincequi"], weights = df[(df["efdispincequi"] >= result2) & (df["efdispincequi"] < intermediate1)]["fweight"]).mean
result5 = weightstats.DescrStatsW(df["efdispincequi"], weights = df["fweight"]).quantile(0.5).loc[0.5] * 0.5
result6 = weightstats.DescrStatsW(df[df["efdispincequi"] < result5]["efdispincequi"], weights = df[df["efdispincequi"] < result5]["fweight"]).mean
result7 = weightstats.DescrStatsW(df["efdispincequi"], weights = df["fweight"]).quantile(0.6).loc[0.6] * 0.6
result8 = weightstats.DescrStatsW(df[df["efdispincequi"] < result7]["efdispincequi"], weights = df[df["efdispincequi"] < result7]["fweight"]).mean

def clean(x):
    return f'{math.trunc(round(x, 0)):,}'

print("\nBASIC ESTIMATES")
print("\n1 - Mean equivalised disposable income in those under MBM:", clean(result1), "$")
print("2 - Equivalised disposable income for 1st quintile:", clean(result2), "$")
print("3 - Mean equivalised disposable income in 1st quintile:", clean(result3), "$")
print("4 - Mean equivalised disposable income in 2nd quintile:", clean(result4), "$")
print("5 - Equivalised disposable income for 50% of median (for 1 person):", clean(result5), "$")
print("6 - Mean equivalised disposable income in those below this 50% threshold:", clean(result6), "$")
print("7 - Equivalised disposable income for 60% of median (for 1 person):", clean(result7), "$")
print("8 - Mean equivalised disposable income in those below this 60% threshold:", clean(result8), "$")

print("\nN.B. These numbers are based on the STC derived variable EFMBIN18")

#### PRODUCING COSTS OF POVERTY ESTIMATES

def clean2(x):
    return f'{round(x, 2):,}'

print("\nOPPORTUNITY COSTS ESTIMATES IN 2019:")

result_op1 = sum(df[df["MBSCF18"] == 1]["fweight"]) * (20007 - result1)
result_op1 = result_op1 / (10**9)

print("\n1- Not lifting households under MBM to global MBM:", clean2(result_op1), "billion $")
print("    [= " + str(clean(sum(df[df["MBSCF18"] == 1]["fweight"]))) + " * (" + str(clean(20007)) + " - " + str(clean(result1)) + ")]")

result_op2 = sum(df[df["MBSCF18"] == 1]["fweight"]) * (26712 - result1)
result_op2 = result_op2 / (10**9)

print("\n2- Not lifting households under MBM to global IRIS \"viable income\" measure:", clean2(result_op2), "billion $")
print("    [= " + str(clean(sum(df[df["MBSCF18"] == 1]["fweight"]))) + " * (" + str(clean(26712)) + " - " + str(clean(result1)) + ")]")

result_op3 = sum(df[df["MBSCF18"] == 1]["fweight"]) * (result4 - result1)
result_op3 = result_op3 / (10**9)

result_op4 = sum(df[df["MBSCF18"] == 1]["fweight"]) * (result2 - result1)
result_op4 = result_op4 / (10**9)

print("\n3- Not lifting households under MBM to 2nd quintile:", clean2(result_op4), "billion $")
print("    [= " + str(clean(sum(df[df["MBSCF18"] == 1]["fweight"]))) + " * (" + str(clean(result2) + " - " + str(clean(result1)) + ")]"))

print("\n4- Not lifting households under MBM to mean of 2nd quintile:", clean2(result_op3), "billion $ (N.B. Method used by CEPE)")
print("    [= " + str(clean(sum(df[df["MBSCF18"] == 1]["fweight"]))) + " * (" + str(clean(result4) + " - " + str(clean(result1)) + ")]"))

result_op5 = sum(df[df["efdispincequi"] < result2]["fweight"]) * (result2 - result3)
result_op5 = result_op5 / (10**9)

result_op6 = sum(df[df["efdispincequi"] < result2]["fweight"]) * (result4 - result3)
result_op6 = result_op6 / (10**9)

print("\n5- Not lifting households in 1st quintile to 2nd quintile:", clean2(result_op5), "billion $ ")
print("    [= " + str(clean(sum(df[df["efdispincequi"] < result2]["fweight"]))) + " * (" + str(clean(result2) + " - " + str(clean(result3)) + ")]"))

print("\n6- Not lifting households in 1st quintile to mean of 2nd quintile:", clean2(result_op6), "billion $ (N.B. Method used by Laurie)")
print("    [= " + str(clean(sum(df[df["efdispincequi"] < result2]["fweight"]))) + " * (" + str(clean(result4) + " - " + str(clean(result3)) + ")]"))

result_op7 = sum(df[df["efdispincequi"] < result5]["fweight"]) * (result5 - result6)
result_op7 = result_op7 / (10**9)

result_op8 = sum(df[df["efdispincequi"] < result7]["fweight"]) * (result7 - result8)
result_op8 = result_op8 / (10**9)

print("\n7- Not lifting households below 50% of median to threshold:", clean2(result_op7), "billion $")
print("    [= " + str(clean(sum(df[df["efdispincequi"] < result5]["fweight"]))) + " * (" + str(clean(result5) + " - " + str(clean(result6)) + ")]"))

print("\n8- Not lifting households below 60% of median to threshold:", clean2(result_op8), "billion $")
print("    [= " + str(clean(sum(df[df["efdispincequi"] < result7]["fweight"]))) + " * (" + str(clean(result7) + " - " + str(clean(result8)) + ")]"))



The 2019 CSI PUMF sample has 72643 participants
The 2019 CSI PUMF sample has 13862 QC participants
The 2019 CSI PUMF sample has 8024 QC participants aged 18-64.

There is 0 missing cases on the EFMBIN18 variable.
There is 0 missing cases on the MBSCF18 variable.
There is 0 missing cases on the hhsize variable.
There is 0 missing cases on the fweight variable.

Mean equivalized disposable income: 46,243 $
Median equivalized disposable income: 42,580 $

BASIC ESTIMATES

1 - Mean equivalised disposable income in those under MBM: 11,439 $
2 - Equivalised disposable income for 1st quintile: 27,103 $
3 - Mean equivalised disposable income in 1st quintile: 16,299 $
4 - Mean equivalised disposable income in 2nd quintile: 32,151 $
5 - Equivalised disposable income for 50% of median (for 1 person): 21,290 $
6 - Mean equivalised disposable income in those below this 50% threshold: 12,552 $
7 - Equivalised disposable income for 60% of median (for 1 person): 28,653 $
8 - Mean equivalised disposable