# Data Checker

This notebook runs a set of checks to confirm that the data is accurate.

The two main checks are:

1. Check that the amounts in the nested tree are accurate.

2. Check that the total surplus (or deficit) adds up to what was presented in the financial statement.

*Note: the total revenue and the total spend will not match the Consolidated Statement of Change in Net Financial Assets on page 16 of the financial statement. Note 15 on page 35 provides more granular data, which I used to separate the taxes collected by the city (revenue) from the taxes remitted to various Metro Vancouver organizations (expenses), whereas page 16 shows only the net difference, within revenue.*
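To illustrate why the totals can differ while the surplus still matches, here is a minimal sketch of net versus gross presentation. All figures are made up for illustration and are not taken from the financial statement; the variable names are hypothetical as well.

```python
# Hypothetical figures in millions, chosen only for illustration
# (NOT taken from the financial statement).
taxes_collected = 500   # collected by the city on behalf of other bodies
taxes_remitted = 450    # passed on to those bodies
other_revenue = 1000
other_expenses = 800

# Net presentation (as described for page 16): only the difference
# between collected and remitted taxes appears, inside revenue.
net_revenue = other_revenue + (taxes_collected - taxes_remitted)
net_expenses = other_expenses

# Gross presentation (as described for Note 15): both sides are shown.
gross_revenue = other_revenue + taxes_collected
gross_expenses = other_expenses + taxes_remitted

print(net_revenue, net_expenses)      # 1050 800
print(gross_revenue, gross_expenses)  # 1500 1250

# The surplus is the same either way, which is why check 2 compares
# the surplus rather than the revenue and spend totals.
print(net_revenue - net_expenses == gross_revenue - gross_expenses)  # True
```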

## Check to ensure that the amounts in the nested tree are accurate.

We define a function here to recursively sum up the "amount" field, no matter how deeply nested it is. This will help us get the tallies for the revenue and the spend.

In [9]:
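def sum_amounts(node):
    """Recursively sum the "amount" of every leaf under node."""
    if isinstance(node, dict):
        # A node with a "children" list is a branch: sum its children
        # and ignore any "amount" stored on the branch itself.
        if isinstance(node.get("children"), list):
            return sum(sum_amounts(c) for c in node["children"])
        # Otherwise the node is a leaf: use its own amount (0.0 if missing).
        return float(node.get("amount", 0.0))
    if isinstance(node, list):
        return sum(sum_amounts(x) for x in node)
    # Anything else (None, strings, numbers) contributes nothing.
    return 0.0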
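As a quick illustration of how the function behaves, the sketch below runs it on a small made-up tree. The `"name"` field and all amounts are hypothetical and only stand in for the real Sankey data; the function is repeated so the example runs on its own.

```python
def sum_amounts(node):
    if isinstance(node, dict):
        if isinstance(node.get("children"), list):
            return sum(sum_amounts(c) for c in node["children"])
        return float(node.get("amount", 0.0))
    if isinstance(node, list):
        return sum(sum_amounts(x) for x in node)
    return 0.0

# Hypothetical tree: the "name" keys and amounts are invented for this example.
sample = {
    "name": "Revenue",
    "amount": 999.0,  # ignored, because this node has a "children" list
    "children": [
        {"name": "Taxes", "amount": 10.0},
        {
            "name": "Fees",
            "children": [
                {"name": "Permits", "amount": 2.5},
                {"name": "Parking", "amount": 1.5},
            ],
        },
    ],
}

print(sum_amounts(sample))  # 14.0
```

Only leaf amounts are counted, so an incorrect "amount" stored on a parent node would not change this total.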

Next, we import the data.

In [10]:
import json
from pathlib import Path

with open(Path("./final_data/sankey.json"), "r", encoding="utf-8") as f:
    data = json.load(f)


Now, we take the reported revenue and the reported spend in the JSON data and compare them against the amounts summed from the nested Sankey data.

In [11]:
deep_revenue  = sum_amounts(data.get("revenue_data", {}))
deep_spending = sum_amounts(data.get("spending_data", {}))

print("Sankey nested revenue total:", deep_revenue)
print("Sankey nested spending total:", deep_spending)
print("Reported revenue:", data.get("revenue"))
print("Reported spending:", data.get("spending"))
print("Revenue diff:", deep_revenue - data.get("revenue", 0.0))
print("Spending diff:", deep_spending - data.get("spending", 0.0))

Sankey nested revenue total: 4130.1849999999995
Sankey nested spending total: 3269.5509999999995
Reported revenue: 4130.185
Reported spending: 3269.551
Revenue diff: -9.094947017729282e-13
Spending diff: -4.547473508864641e-13


The revenue and spending differences are both below 1e-12, which is floating-point rounding error rather than a real discrepancy. The nested sums therefore add up to the reported totals, and the first check passes.
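Tiny differences like these are typical when many floats are added one at a time. As a small sketch of the effect, unrelated to the actual Sankey data, the standard library's `math.fsum` tracks the lost precision that the built-in `sum` discards:

```python
import math

# Ten copies of 0.1 should add up to 1.0, but 0.1 has no exact
# binary representation, so naive summation drifts slightly.
values = [0.1] * 10

print(sum(values))        # 0.9999999999999999
print(math.fsum(values))  # 1.0
```

Swapping `sum` for `math.fsum` in `sum_amounts` would be one way to reduce this drift, though the differences observed here are already negligible.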

## Check that the total surplus (or deficit) adds up to what was presented in the financial statement.

Per page 16 of the financial statement (included in the raw data as a PDF), the City of Vancouver had a surplus of $860,634,000.

We will confirm that our Sankey data reflects this.

In [8]:


# The City of Vancouver reports in thousands ('000), while the Sankey data is in millions, so we convert to millions here.
pdf_surplus = 860634/1000

sankey_revenue = data.get("revenue")

sankey_spend = data.get("spending")

sankey_surplus = sankey_revenue - sankey_spend

print("Financial Statement Surplus: ", pdf_surplus)
print("Sankey Statement Surplus: ", sankey_surplus)
print("Checker: ", round(pdf_surplus - sankey_surplus))

Financial Statement Surplus:  860.634
Sankey Statement Surplus:  860.6340000000005
Checker:  0


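The two surplus figures differ by roughly 5e-13, which is floating-point rounding error, so check #2 passes. Note that `round()` with no digits argument rounds to the nearest whole million, so the checker would also print 0 for any gap smaller than half a million; it is the printed surplus values above that confirm the match.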
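A stricter variant of this check could compare against an explicit tolerance instead of rounding. The sketch below assumes a tolerance of 1e-6 (one dollar, since the units are millions) and uses a made-up mismatching value of 860.2 purely for contrast; neither is part of the original notebook.

```python
import math

pdf_surplus = 860634 / 1000          # financial statement figure, in millions
sankey_surplus = 860.6340000000005   # value printed by the cell above
wrong_surplus = 860.2                # hypothetical bad value, for contrast only

# Rounding to the nearest integer cannot tell these two cases apart.
print(round(pdf_surplus - sankey_surplus))  # 0
print(round(pdf_surplus - wrong_surplus))   # 0

# An absolute tolerance of 1e-6 (an assumed threshold) does.
TOLERANCE = 1e-6
print(math.isclose(pdf_surplus, sankey_surplus, rel_tol=0.0, abs_tol=TOLERANCE))  # True
print(math.isclose(pdf_surplus, wrong_surplus, rel_tol=0.0, abs_tol=TOLERANCE))   # False
```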