# Validation

OpenFisca-UK runs unit and integration tests on each new version (see the [tests directory](https://github.com/PSLmodels/openfisca-uk/tree/master/tests)).
In addition, the tables below show the aggregates and caseloads that the model produces for the major taxes and benefits, compared with UKMOD (latest [country report](https://www.iser.essex.ac.uk/research/publications/working-papers/cempa/cempa7-20.pdf)) and official sources.[^1]
UKMOD and administrative sources refer to 2018, and OpenFisca-UK is simulated on policy at the end of 2018.
Aggregates are in billions of pounds; caseloads are in millions.

[^1]: Sources as listed in the UKMOD country report. Unless otherwise specified: Department for Work and Pensions, <https://www.gov.uk/government/publications/benefit-expenditure-and-caseload-tables-2018>. Best Start Grant: <https://www2.gov.scot/Topics/Statistics/Browse/Social-Welfare/SocialSecurityforScotland/BSGJune2019>. Child Tax Credit and Working Tax Credit: HMRC statistics, <https://www.gov.uk/government/statistics/child-and-working-tax-credits-statistics-finalised-annual-awards-2016-to-2017>. Scottish Child Payment: Scottish Fiscal Commission, <https://www.fiscalcommission.scot/forecast/supplementary-costing-scottish-child-payment>. Scottish Child Winter Heating Assistance: Scottish Fiscal Commission, <https://www.fiscalcommission.scot/forecast/supplementary-costing-child-winter-heating-assistance>. Income Tax: HMRC statistics, <https://www.gov.uk/government/statistics/income-tax-liabilities-statistics-tax-year-2014-to-2015-to-tax-year-2017-to-2018>. National Insurance contributions: ONS Blue Book Table 5.2.4s.

## Aggregate tables

OpenFisca-UK uprates its input Family Resources Survey (FRS) data. The tables below compare the aggregates calculated by OpenFisca-UK with those from UKMOD and external sources.

### Aggregates in full

In [1]:
import numpy as np
import pandas as pd
from openfisca_uk import Microsimulation

sim = Microsimulation(duplicate_records=2)

_ = np.nan
VARIABLES = [
    "income_tax",
    "total_NI",
    "universal_credit",
    "working_tax_credit",
    "child_tax_credit",
    "child_benefit",
    "housing_benefit",
    "pension_credit",
    "income_support",
    "JSA_income",
    "council_tax_less_benefit",
    "state_pension",
    "ESA_income",
]

df = pd.concat(
    [
        (sim.df(VARIABLES, map_to="household", period=year).sum() / 1e9)
        for year in range(2018, 2023)
    ],
    axis=1,
)
df.columns = list(range(2018, 2023))
df.index = [
    sim.simulation.tax_benefit_system.variables[var].label for var in df.index
]
ukmod_df = pd.DataFrame(
    {
        "Income Tax": [163.7, 165.9, 165.0, 173.9, _],
        "National Insurance (total)": [138.6, 144.2, 141.6, 148.0, _],
        "Universal Credit": [11.7, 24.8, 41.3, 40.4, _],
        "Working Tax Credit": [2.5, 1.6, 1.3, 0.6, _],
        "Child Tax Credit": [11.4, 7.1, 4.4, 2.8, _],
        "Housing Benefit": [15.1, 11.0, 8.6, 7.5, _],
        "Child Benefit": [11.5, 11.4, 11.6, 11.6, _],
        "Pension Credit": [4.1, 3.6, 3.6, 2.9, _],
        "Income Support": [_, _, _, _, _],
        "JSA (income-based)": [_, _, _, _, _],
        "Council Tax (less CTB)": [_, _, _, _, _],
    }
).T
ukmod_df.columns = list(range(2018, 2023))
# source: https://www.microsimulation.ac.uk/wp-content/uploads/2020/10/cempa7-20.pdf#page=130
# where missing, UKMOD does not separate benefits and therefore figures cannot be obtained

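# Official statistics are stored as calibration parameters in the tax-benefit system
statistics = sim.simulation.tax_benefit_system.parameters.calibration
# Evaluate a parameter on 1 January of each year and rescale it (1e-9 converts pounds to £bn)
get_yearly = lambda param, multiplier: [
    round(param(f"{year}-01-01") * multiplier, 1) for year in range(2018, 2023)
]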
external_df = pd.DataFrame(
    {
        "Income Tax": get_yearly(statistics.aggregate.income_tax, 1e-9),
        "National Insurance (total)": get_yearly(
            statistics.aggregate.total_NI, 1e-9
        ),
        "Universal Credit": get_yearly(
            statistics.aggregate.universal_credit, 1e-9
        ),
        "Working Tax Credit": get_yearly(
            statistics.aggregate.working_tax_credit, 1e-9
        ),
        "Child Tax Credit": get_yearly(
            statistics.aggregate.child_tax_credit, 1e-9
        ),
        "Housing Benefit": get_yearly(
            statistics.aggregate.housing_benefit, 1e-9
        ),
        "Child Benefit": get_yearly(statistics.aggregate.child_benefit, 1e-9),
        "Pension Credit": get_yearly(
            statistics.aggregate.pension_credit, 1e-9
        ),
        "Income Support": get_yearly(
            statistics.aggregate.income_support, 1e-9
        ),
        "JSA (income-based)": get_yearly(
            statistics.aggregate.JSA_income, 1e-9
        ),
        "Council Tax (less CTB)": get_yearly(
            statistics.aggregate.council_tax_less_benefit, 1e-9
        ),
        "State Pension": get_yearly(statistics.aggregate.state_pension, 1e-9),
        "ESA (income-based)": get_yearly(
            statistics.aggregate.ESA_income, 1e-9
        ),
    }
).T
external_df.columns = list(range(2018, 2023))

df = df.drop(2018, axis=1)
ukmod_df = ukmod_df.drop(2018, axis=1)
external_df = external_df.drop(2018, axis=1)
pd.concat(
    [df.apply(lambda col: col.round(1)), ukmod_df, external_df],
    axis=1,
    keys=["OpenFisca-UK", "UKMOD", "External"],
).fillna("")

| | OpenFisca-UK 2019 | OpenFisca-UK 2020 | OpenFisca-UK 2021 | OpenFisca-UK 2022 | UKMOD 2019 | UKMOD 2020 | UKMOD 2021 | UKMOD 2022 | External 2019 | External 2020 | External 2021 | External 2022 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Income Tax | 192.0 | 187.0 | 201.8 | 203.2 | 165.9 | 165.0 | 173.9 | | 193.6 | 188.2 | 203.6 | 205.0 |
| National Insurance (total) | 144.5 | 140.4 | 158.1 | 150.8 | 144.2 | 141.6 | 148.0 | | 145.0 | 140.8 | 145.9 | 151.3 |
| Universal Credit | 18.4 | 38.2 | 41.0 | 43.5 | 24.8 | 41.3 | 40.4 | | 18.4 | 38.3 | 41.1 | 43.7 |
| Working Tax Credit | 4.2 | 3.2 | 2.4 | 1.8 | 1.6 | 1.3 | 0.6 | | 3.8 | 3.1 | 2.2 | 1.6 |
| Child Tax Credit | 14.7 | 11.5 | 8.2 | 6.4 | 7.1 | 4.4 | 2.8 | | 13.9 | 11.4 | 8.1 | 6.0 |
| Child Benefit | 11.1 | 11.2 | 11.1 | 11.3 | 11.4 | 11.6 | 11.6 | | 11.1 | 11.1 | 11.0 | 11.2 |
| Housing Benefit | 5.1 | 16.2 | 15.9 | 14.9 | 11.0 | 8.6 | 7.5 | | 18.4 | 17.3 | 17.1 | 15.9 |
| Pension Credit | 5.0 | 5.0 | 4.9 | 4.4 | 3.6 | 3.6 | 2.9 | | 5.1 | 5.1 | 5.0 | 4.5 |
| Income Support | 1.4 | 0.9 | 0.8 | 0.6 | | | | | 1.4 | 1.1 | 0.9 | 0.7 |
| JSA (income-based) | 0.5 | 0.4 | 0.3 | 0.2 | | | | | 0.6 | 0.4 | 0.3 | 0.2 |


### Differences

#### Absolute

In [2]:
pd.concat(
    [
        external_df,
        (ukmod_df - external_df).round(1).fillna(""),
        (df - external_df).round(1).fillna(""),
    ],
    axis=1,
    keys=[
        "External",
        "UKMOD Difference (£bn)",
        "OpenFisca-UK Difference (£bn)",
    ],
).fillna("")

| | External 2019 | External 2020 | External 2021 | External 2022 | UKMOD Difference (£bn) 2019 | UKMOD Difference (£bn) 2020 | UKMOD Difference (£bn) 2021 | UKMOD Difference (£bn) 2022 | OpenFisca-UK Difference (£bn) 2019 | OpenFisca-UK Difference (£bn) 2020 | OpenFisca-UK Difference (£bn) 2021 | OpenFisca-UK Difference (£bn) 2022 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Income Tax | 193.6 | 188.2 | 203.6 | 205.0 | -27.7 | -23.2 | -29.7 | | -1.6 | -1.2 | -1.8 | -1.8 |
| National Insurance (total) | 145.0 | 140.8 | 145.9 | 151.3 | -0.8 | 0.8 | 2.1 | | -0.5 | -0.4 | 12.2 | -0.5 |
| Universal Credit | 18.4 | 38.3 | 41.1 | 43.7 | 6.4 | 3.0 | -0.7 | | -0.0 | -0.1 | -0.1 | -0.2 |
| Working Tax Credit | 3.8 | 3.1 | 2.2 | 1.6 | -2.2 | -1.8 | -1.6 | | 0.4 | 0.1 | 0.2 | 0.2 |
| Child Tax Credit | 13.9 | 11.4 | 8.1 | 6.0 | -6.8 | -7.0 | -5.3 | | 0.8 | 0.1 | 0.1 | 0.4 |
| Housing Benefit | 18.4 | 17.3 | 17.1 | 15.9 | -7.4 | -8.7 | -9.6 | | -13.3 | -1.1 | -1.2 | -1.0 |
| Child Benefit | 11.1 | 11.1 | 11.0 | 11.2 | 0.3 | 0.5 | 0.6 | | 0.0 | 0.1 | 0.1 | 0.1 |
| Pension Credit | 5.1 | 5.1 | 5.0 | 4.5 | -1.5 | -1.5 | -2.1 | | -0.1 | -0.1 | -0.1 | -0.1 |
| Income Support | 1.4 | 1.1 | 0.9 | 0.7 | | | | | -0.0 | -0.2 | -0.1 | -0.1 |
| JSA (income-based) | 0.6 | 0.4 | 0.3 | 0.2 | | | | | -0.1 | -0.0 | 0.0 | -0.0 |


#### Relative

In [3]:
pd.concat(
    [
        external_df,
        ((ukmod_df / external_df - 1).round(3) * 100).fillna(""),
        ((df / external_df - 1).round(3) * 100).fillna(""),
    ],
    axis=1,
    keys=["External", "UKMOD Difference (%)", "OpenFisca-UK Difference (%)"],
).fillna("")

| | External 2019 | External 2020 | External 2021 | External 2022 | UKMOD Difference (%) 2019 | UKMOD Difference (%) 2020 | UKMOD Difference (%) 2021 | UKMOD Difference (%) 2022 | OpenFisca-UK Difference (%) 2019 | OpenFisca-UK Difference (%) 2020 | OpenFisca-UK Difference (%) 2021 | OpenFisca-UK Difference (%) 2022 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Income Tax | 193.6 | 188.2 | 203.6 | 205.0 | -14.3 | -12.3 | -14.6 | | -0.8 | -0.6 | -0.9 | -0.9 |
| National Insurance (total) | 145.0 | 140.8 | 145.9 | 151.3 | -0.6 | 0.6 | 1.4 | | -0.3 | -0.3 | 8.4 | -0.3 |
| Universal Credit | 18.4 | 38.3 | 41.1 | 43.7 | 34.8 | 7.8 | -1.7 | | -0.3 | -0.3 | -0.2 | -0.4 |
| Working Tax Credit | 3.8 | 3.1 | 2.2 | 1.6 | -57.9 | -58.1 | -72.7 | | 11.0 | 4.7 | 8.1 | 13.5 |
| Child Tax Credit | 13.9 | 11.4 | 8.1 | 6.0 | -48.9 | -61.4 | -65.4 | | 5.7 | 1.3 | 0.9 | 7.2 |
| Housing Benefit | 18.4 | 17.3 | 17.1 | 15.9 | -40.2 | -50.3 | -56.1 | | -72.1 | -6.3 | -7.0 | -6.4 |
| Child Benefit | 11.1 | 11.1 | 11.0 | 11.2 | 2.7 | 4.5 | 5.5 | | 0.4 | 0.7 | 1.3 | 1.2 |
| Pension Credit | 5.1 | 5.1 | 5.0 | 4.5 | -29.4 | -29.4 | -42.0 | | -1.9 | -1.8 | -1.7 | -1.7 |
| Income Support | 1.4 | 1.1 | 0.9 | 0.7 | | | | | -0.1 | -14.7 | -5.8 | -9.5 |
| JSA (income-based) | 0.6 | 0.4 | 0.3 | 0.2 | | | | | -9.7 | -3.7 | 10.0 | -8.3 |

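The relative differences above are computed as the ratio of a modelled figure to the external figure, minus one, expressed in percent. The following is a minimal standalone sketch of that calculation. The function name `relative_difference_pct` is hypothetical, and the inputs are the rounded Income Tax figures for 2019 from the aggregate table. The notebook itself works from unrounded OpenFisca-UK values, so other rows may differ slightly from a calculation on the rounded table entries.

```python
def relative_difference_pct(model: float, external: float) -> float:
    """Percentage difference of a model figure from an external figure."""
    # Same order of operations as the notebook cell: ratio minus one,
    # rounded to three decimals, then scaled to percent.
    return round(model / external - 1, 3) * 100


# Income Tax, 2019 (£bn): rounded figures taken from the aggregate table.
openfisca_uk, ukmod, external = 192.0, 165.9, 193.6

print(f"OpenFisca-UK: {relative_difference_pct(openfisca_uk, external):.1f}%")  # -0.8%
print(f"UKMOD: {relative_difference_pct(ukmod, external):.1f}%")  # -14.3%
```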

## Caseload tables

OpenFisca-UK uprates its input Family Resources Survey (FRS) data. The tables below compare the caseloads calculated by OpenFisca-UK with those from UKMOD and external sources.

### Caseloads in full

In [4]:
import numpy as np
import pandas as pd
from openfisca_uk import Microsimulation

sim = Microsimulation(duplicate_records=2)

_ = np.nan
VARIABLES = [
    "income_tax",
    "universal_credit",
    "working_tax_credit",
    "child_tax_credit",
    "child_benefit",
    "housing_benefit",
    "pension_credit",
    "income_support",
    "JSA_income",
    "state_pension",
    "ESA_income",
]


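def get_caseload(variable, year):
    """Weighted count, in millions, of units with a positive value of `variable` in `year`."""
    # Entity (e.g. person or household) on which the variable is defined
    entity = sim.simulation.tax_benefit_system.variables[variable].entity.key
    # Flag units with a positive amount
    value = sim.calc(variable, period=year).values > 0
    # Map the flags to household level so that they align with household weights
    household_level = sim.map_to(value, entity, "household")
    return (
        sim.calc("household_weight", period=year).values * household_level
    ).sum() / 1e6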


df = pd.concat(
    [
        (
            pd.Series(
                {
                    variable: get_caseload(variable, year)
                    for variable in VARIABLES
                }
            )
        )
        for year in range(2018, 2023)
    ],
    axis=1,
)
df.columns = list(range(2018, 2023))
df.index = [
    sim.simulation.tax_benefit_system.variables[var].label for var in df.index
]
ukmod_df = pd.DataFrame(
    {
        "Income Tax": [_, 29.3, 29.4, 29.9, 30.0],
        "Universal Credit": [_, 3.0, 4.6, 4.8, 5.6],
        "Working Tax Credit": [_, 0.5, 0.4, 0.2, 0.1],
        "Child Tax Credit": [_, 1.5, 0.9, 0.6, 0.2],
        "Housing Benefit": [_, 2.6, 2.0, 1.8, 1.5],
        "Child Benefit": [_, 7.2, 7.2, 7.1, 7.1],
        "Pension Credit": [_, 1.5, 1.5, 1.3, 1.3],
        "Income Support": [_, _, _, _, _],
        "JSA (income-based)": [_, _, _, _, _],
    }
).T
ukmod_df.columns = list(range(2018, 2023))
# source: https://www.microsimulation.ac.uk/wp-content/uploads/2020/10/cempa7-20.pdf#page=130
# where missing, UKMOD does not separate benefits and therefore figures cannot be obtained

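# Official statistics are stored as calibration parameters in the tax-benefit system
statistics = sim.simulation.tax_benefit_system.parameters.calibration
# Evaluate a parameter on 1 January of each year and rescale it (1e-6 converts counts to millions)
get_yearly = lambda param, multiplier: [
    round(param(f"{year}-01-01") * multiplier, 1) for year in range(2018, 2023)
]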
external_df = pd.DataFrame(
    {
        "Income Tax": get_yearly(statistics.count.income_tax, 1e-6),
        "Universal Credit": get_yearly(
            statistics.count.universal_credit, 1e-6
        ),
        "Working Tax Credit": get_yearly(
            statistics.count.working_tax_credit, 1e-6
        ),
        "Child Tax Credit": get_yearly(
            statistics.count.child_tax_credit, 1e-6
        ),
        "Housing Benefit": get_yearly(statistics.count.housing_benefit, 1e-6),
        "Child Benefit": get_yearly(statistics.count.child_benefit, 1e-6),
        "Pension Credit": get_yearly(statistics.count.pension_credit, 1e-6),
        "Income Support": get_yearly(statistics.count.income_support, 1e-6),
        "JSA (income-based)": get_yearly(statistics.count.JSA_income, 1e-6),
        "State Pension": get_yearly(statistics.count.state_pension, 1e-6),
        "ESA (income-based)": get_yearly(statistics.count.ESA_income, 1e-6),
    }
).T
external_df.columns = list(range(2018, 2023))

df = df.drop(2018, axis=1)
ukmod_df = ukmod_df.drop(2018, axis=1)
external_df = external_df.drop(2018, axis=1)
pd.concat(
    [df.apply(lambda col: col.round(1)), ukmod_df, external_df],
    axis=1,
    keys=["OpenFisca-UK", "UKMOD", "External"],
).fillna("")

| | OpenFisca-UK 2019 | OpenFisca-UK 2020 | OpenFisca-UK 2021 | OpenFisca-UK 2022 | UKMOD 2019 | UKMOD 2020 | UKMOD 2021 | UKMOD 2022 | External 2019 | External 2020 | External 2021 | External 2022 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Income Tax | 31.2 | 31.5 | 32.0 | 32.0 | 29.3 | 29.4 | 29.9 | 30.0 | 31.4 | 31.7 | 32.2 | 32.2 |
| Universal Credit | 2.1 | 4.0 | 4.3 | 4.6 | 3.0 | 4.6 | 4.8 | 5.6 | 2.1 | 4.1 | 4.3 | 4.6 |
| Working Tax Credit | 1.6 | 1.2 | 1.0 | 1.0 | 0.5 | 0.4 | 0.2 | 0.1 | 1.6 | 1.2 | 1.0 | 1.0 |
| Child Tax Credit | 2.7 | 2.1 | 1.7 | 1.6 | 1.5 | 0.9 | 0.6 | 0.2 | 2.8 | 2.1 | 1.7 | 1.7 |
| Child Benefit | 7.3 | 7.2 | 7.2 | 7.2 | 7.2 | 7.2 | 7.1 | 7.1 | 7.3 | 7.2 | 7.2 | 7.2 |
| Housing Benefit | 1.4 | 3.1 | 3.0 | 2.8 | 2.6 | 2.0 | 1.8 | 1.5 | 3.4 | 3.0 | 2.9 | 2.7 |
| Pension Credit | 1.6 | 1.5 | 1.5 | 1.4 | 1.5 | 1.5 | 1.3 | 1.3 | 1.6 | 1.5 | 1.5 | 1.4 |
| Income Support | 0.4 | 0.2 | 0.2 | 0.2 | | | | | 0.4 | 0.3 | 0.2 | 0.2 |
| JSA (income-based) | 0.1 | 0.1 | 0.1 | 0.0 | | | | | 0.1 | 0.1 | 0.1 | 0.0 |
| State Pension | 12.5 | 12.3 | 12.4 | 12.6 | | | | | 12.6 | 12.4 | 12.5 | 12.7 |

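The OpenFisca-UK caseloads above are weighted counts: `get_caseload` flags units with a positive amount, maps the flags to households, multiplies by `household_weight` and divides by one million. The sketch below reproduces that arithmetic on made-up data, without `openfisca_uk`. The weights, recipient counts and the function name `weighted_caseload_millions` are all hypothetical. It also assumes that mapping to household level sums the positive-value flags within each household, which the notebook does not state.

```python
# Hypothetical survey households: each has a grossing-up weight and a number
# of members (or benefit units) receiving a positive amount of the benefit.
household_weights = [1_200.0, 950.0, 1_500.0, 800.0]
recipients_per_household = [1, 0, 2, 0]


def weighted_caseload_millions(weights, recipient_counts):
    """Weighted number of recipients, in millions."""
    total = sum(w * n for w, n in zip(weights, recipient_counts))
    return total / 1e6


caseload = weighted_caseload_millions(household_weights, recipients_per_household)
# 1200 * 1 + 1500 * 2 = 4200 weighted recipients, i.e. about 0.0042 million
print(f"{caseload:.4f}")
```

Each sampled household stands for `weight` households in the population, so a household with two recipients and a weight of 1,500 contributes 3,000 to the caseload.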

### Differences

#### Absolute

In [5]:
pd.concat(
    [
        external_df,
        (ukmod_df - external_df).round(1).fillna(""),
        (df - external_df).round(1).fillna(""),
    ],
    axis=1,
    keys=[
        "External",
        "UKMOD Difference (m)",
        "OpenFisca-UK Difference (m)",
    ],
).fillna("")

| | External 2019 | External 2020 | External 2021 | External 2022 | UKMOD Difference (m) 2019 | UKMOD Difference (m) 2020 | UKMOD Difference (m) 2021 | UKMOD Difference (m) 2022 | OpenFisca-UK Difference (m) 2019 | OpenFisca-UK Difference (m) 2020 | OpenFisca-UK Difference (m) 2021 | OpenFisca-UK Difference (m) 2022 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Income Tax | 31.4 | 31.7 | 32.2 | 32.2 | -2.1 | -2.3 | -2.3 | -2.2 | -0.2 | -0.2 | -0.2 | -0.2 |
| Universal Credit | 2.1 | 4.1 | 4.3 | 4.6 | 0.9 | 0.5 | 0.5 | 1.0 | 0.0 | -0.1 | -0.0 | 0.0 |
| Working Tax Credit | 1.6 | 1.2 | 1.0 | 1.0 | -1.1 | -0.8 | -0.8 | -0.9 | -0.0 | -0.0 | 0.0 | -0.0 |
| Child Tax Credit | 2.8 | 2.1 | 1.7 | 1.7 | -1.3 | -1.2 | -1.1 | -1.5 | -0.1 | 0.0 | -0.0 | -0.1 |
| Housing Benefit | 3.4 | 3.0 | 2.9 | 2.7 | -0.8 | -1.0 | -1.1 | -1.2 | -2.0 | 0.1 | 0.1 | 0.1 |
| Child Benefit | 7.3 | 7.2 | 7.2 | 7.2 | -0.1 | 0.0 | -0.1 | -0.1 | -0.0 | 0.0 | 0.0 | 0.0 |
| Pension Credit | 1.6 | 1.5 | 1.5 | 1.4 | -0.1 | 0.0 | -0.2 | -0.1 | -0.0 | -0.0 | -0.0 | 0.0 |
| Income Support | 0.4 | 0.3 | 0.2 | 0.2 | | | | | -0.0 | -0.1 | 0.0 | -0.0 |
| JSA (income-based) | 0.1 | 0.1 | 0.1 | 0.0 | | | | | 0.0 | 0.0 | -0.0 | 0.0 |
| State Pension | 12.6 | 12.4 | 12.5 | 12.7 | | | | | -0.1 | -0.1 | -0.1 | -0.1 |


#### Relative

In [6]:
pd.concat(
    [
        external_df,
        ((ukmod_df / external_df - 1).round(3) * 100).fillna(""),
        ((df / external_df - 1).round(3) * 100).fillna(""),
    ],
    axis=1,
    keys=["External", "UKMOD Difference (%)", "OpenFisca-UK Difference (%)"],
).fillna("")

| | External 2019 | External 2020 | External 2021 | External 2022 | UKMOD Difference (%) 2019 | UKMOD Difference (%) 2020 | UKMOD Difference (%) 2021 | UKMOD Difference (%) 2022 | OpenFisca-UK Difference (%) 2019 | OpenFisca-UK Difference (%) 2020 | OpenFisca-UK Difference (%) 2021 | OpenFisca-UK Difference (%) 2022 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Income Tax | 31.4 | 31.7 | 32.2 | 32.2 | -6.7 | -7.3 | -7.1 | -6.8 | -0.6 | -0.6 | -0.5 | -0.5 |
| Universal Credit | 2.1 | 4.1 | 4.3 | 4.6 | 42.9 | 12.2 | 11.6 | 21.7 | 1.5 | -1.3 | -0.9 | 0.9 |
| Working Tax Credit | 1.6 | 1.2 | 1.0 | 1.0 | -68.8 | -66.7 | -80.0 | -90.0 | -3.1 | -1.3 | 1.7 | -0.6 |
| Child Tax Credit | 2.8 | 2.1 | 1.7 | 1.7 | -46.4 | -57.1 | -64.7 | -88.2 | -2.1 | 0.5 | -1.4 | -3.2 |
| Housing Benefit | 3.4 | 3.0 | 2.9 | 2.7 | -23.5 | -33.3 | -37.9 | -44.4 | -59.6 | 3.4 | 3.6 | 2.8 |
| Child Benefit | 7.3 | 7.2 | 7.2 | 7.2 | -1.4 | 0.0 | -1.4 | -1.4 | -0.3 | 0.2 | 0.1 | 0.1 |
| Pension Credit | 1.6 | 1.5 | 1.5 | 1.4 | -6.2 | 0.0 | -13.3 | -7.1 | -2.2 | -1.3 | -1.4 | 0.4 |
| Income Support | 0.4 | 0.3 | 0.2 | 0.2 | | | | | -11.7 | -29.6 | 4.3 | -21.7 |
| JSA (income-based) | 0.1 | 0.1 | 0.1 | 0.0 | | | | | 42.7 | 0.9 | -15.0 | inf |
| State Pension | 12.6 | 12.4 | 12.5 | 12.7 | | | | | -1.2 | -1.0 | -0.9 | -1.1 |

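The `inf` entry for JSA (income-based) in 2022 appears to arise because the external caseload rounds to 0.0 million, so the ratio divides a positive modelled figure by zero. One way to avoid reporting an infinite percentage is to return a missing value when there is no usable baseline. The sketch below illustrates this in plain Python. The function name `safe_relative_difference_pct` and the 0.03 input are hypothetical and not part of OpenFisca-UK.

```python
import math


def safe_relative_difference_pct(model: float, external: float) -> float:
    """Percentage difference, or NaN when the external figure is zero."""
    if external == 0:
        # No meaningful baseline: report "not available" instead of inf.
        return math.nan
    return round(model / external - 1, 3) * 100


# Hypothetical modelled caseload of 0.03m against an external figure of 0.0m.
print(safe_relative_difference_pct(0.03, 0.0))  # nan
# Income Tax caseload, 2019 (millions), rounded figures from the table above.
print(f"{safe_relative_difference_pct(31.2, 31.4):.1f}")  # -0.6
```

Keeping external figures unrounded until after the division would be another option, at the cost of very large percentages for near-zero caseloads.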

## Automated tests

Below are test results from the most recent version.

In [7]:
from openfisca_uk.tests.microsimulation.test_statistics import tests

pd.set_option("display.max_colwidth", 0)
pd.set_option("display.max_rows", 500)
pd.DataFrame({"Name": tests, "Passed": [test.test()[0] for test in tests]})

| | Name | Passed |
|---|---|---|
| 0 | OpenFisca-UK Child Benefit aggregate error is less than 10.0% in 2019 | True |
| 1 | OpenFisca-UK Child Benefit aggregate error is less than 10.0% in 2020 | True |
| 2 | OpenFisca-UK Child Benefit aggregate error is less than 10.0% in 2021 | True |
| 3 | OpenFisca-UK Child Benefit aggregate error is less than 10.0% in 2022 | True |
| 4 | OpenFisca-UK Child Benefit caseload error is less than 10.0% in 2019 | True |
| 5 | OpenFisca-UK Child Benefit caseload error is less than 10.0% in 2020 | True |
| 6 | OpenFisca-UK Child Benefit caseload error is less than 10.0% in 2021 | True |
| 7 | OpenFisca-UK Child Benefit caseload error is less than 10.0% in 2022 | True |
| 8 | OpenFisca-UK Council Tax (less CTB) aggregate error is less than 10.0% in 2019 | True |
| 9 | OpenFisca-UK Council Tax (less CTB) aggregate error is less than 10.0% in 2020 | True |

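Each test name above states that a relative error stays under a threshold (10.0% in the rows shown). The sketch below illustrates what such a check could look like. It is not the implementation in `openfisca_uk.tests.microsimulation.test_statistics`: the function name `error_is_below` and its handling of a zero external figure are assumptions made for this example.

```python
def error_is_below(model: float, external: float, threshold: float = 0.10) -> bool:
    """True if the absolute relative error of `model` against `external` is under `threshold`."""
    if external == 0:
        # Assumed convention: with a zero baseline, only an exact match passes.
        return model == 0
    return abs(model / external - 1) < threshold


# Child Benefit aggregate, 2019 (£bn), rounded figures from the aggregate table.
print(error_is_below(11.1, 11.1))  # True
# Housing Benefit aggregate, 2019 (£bn): 5.1 against 18.4 is far outside 10%.
print(error_is_below(5.1, 18.4))  # False
```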