# Example Predictor: Linear Rollout Predictor

This example contains basic functionality for training and evaluating a linear predictor that rolls out predictions day-by-day.

First, a training data set is created from historical case and non-pharmaceutical intervention (NPI) data.

Second, a linear model is trained to predict future cases from prior case data along with prior and future NPI data.
The model is an off-the-shelf sklearn Lasso model that uses a positive weight constraint to enforce the assumption that stricter NPIs are negatively correlated with future cases.
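
The sign convention can be written out explicitly. As a sketch, using symbols that do not appear in the notebook ($c_{-k}$ for new cases $k$ days before the end of the lookback window, $n_{-k,j}$ for the level of NPI $j$ on that day, $L$ for the window length, $\hat{y}$ for the predicted daily new cases), the linear model has the form:

```latex
\hat{y} = b + \sum_{k=1}^{L} w^{c}_{k}\, c_{-k}
            + \sum_{k=1}^{L} \sum_{j} w^{n}_{k,j}\, \bigl(-n_{-k,j}\bigr),
\qquad w^{c}_{k} \ge 0, \quad w^{n}_{k,j} \ge 0 .
```

Because the NPI inputs are negated before fitting, the effective coefficient on each NPI level is $-w^{n}_{k,j} \le 0$, so under this model a stricter NPI can only lower the predicted case count or leave it unchanged.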

Third, a sample evaluation set is created, and the predictor is applied to this evaluation set to produce prediction results in the correct format.
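
To make the day-by-day rollout idea concrete, here is a minimal plain-Python sketch. It is not the code in `predict.py`: the function name `rollout`, the weights, and the intercept are hypothetical, and NPI inputs are left out to keep the example short. Each prediction is appended to the lookback window and used as input for the following day.

```python
from typing import List, Sequence


def rollout(history: Sequence[float],
            weights: Sequence[float],
            intercept: float,
            nb_days: int) -> List[float]:
    """Predict nb_days ahead, feeding each prediction back as input.

    history   -- known daily new cases, oldest first
    weights   -- hypothetical per-day coefficients, one per lookback day
    intercept -- hypothetical model intercept
    """
    lookback = len(weights)
    if len(history) < lookback:
        raise ValueError("history is shorter than the lookback window")
    window = list(history[-lookback:])
    predictions = []
    for _ in range(nb_days):
        pred = intercept + sum(w * x for w, x in zip(weights, window))
        pred = max(pred, 0.0)  # never predict negative cases
        predictions.append(pred)
        # Slide the window: drop the oldest day, append the new prediction.
        window = window[1:] + [pred]
    return predictions


preds = rollout(history=[10.0, 12.0, 14.0],
                weights=[0.0, 0.5, 0.5],
                intercept=1.0,
                nb_days=3)
print(preds)  # [14.0, 15.0, 15.5]
```

Because later predictions depend on earlier ones, errors can compound over long horizons.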

## Training

In [1]:
import pickle
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

### Copy the data locally

In [2]:
# Main source for the training data
DATA_URL = 'https://raw.githubusercontent.com/OxCGRT/covid-policy-tracker/master/data/OxCGRT_latest.csv'
# Local file
DATA_FILE = 'data/OxCGRT_latest.csv'

In [3]:
import os
import urllib.request
os.makedirs('data', exist_ok=True)
urllib.request.urlretrieve(DATA_URL, DATA_FILE)

('data/OxCGRT_latest.csv', <http.client.HTTPMessage at 0x7f84785c6910>)

In [4]:
# Load historical data from local file
df = pd.read_csv(DATA_FILE, 
                 parse_dates=['Date'],
                 encoding="ISO-8859-1",
                 dtype={"RegionName": str,
                        "RegionCode": str},
                 # on_bad_lines replaces error_bad_lines, which was removed in pandas 2.0
                 on_bad_lines='skip')

In [5]:
df.columns

Index(['CountryName', 'CountryCode', 'RegionName', 'RegionCode',
       'Jurisdiction', 'Date', 'C1_School closing', 'C1_Flag',
       'C2_Workplace closing', 'C2_Flag', 'C3_Cancel public events', 'C3_Flag',
       'C4_Restrictions on gatherings', 'C4_Flag', 'C5_Close public transport',
       'C5_Flag', 'C6_Stay at home requirements', 'C6_Flag',
       'C7_Restrictions on internal movement', 'C7_Flag',
       'C8_International travel controls', 'E1_Income support', 'E1_Flag',
       'E2_Debt/contract relief', 'E3_Fiscal measures',
       'E4_International support', 'H1_Public information campaigns',
       'H1_Flag', 'H2_Testing policy', 'H3_Contact tracing',
       'H4_Emergency investment in healthcare', 'H5_Investment in vaccines',
       'H6_Facial Coverings', 'H6_Flag', 'M1_Wildcard', 'ConfirmedCases',
       'ConfirmedDeaths', 'StringencyIndex', 'StringencyIndexForDisplay',
       'StringencyLegacyIndex', 'StringencyLegacyIndexForDisplay',
       'GovernmentResponseIndex', 'Gove

In [6]:
# For testing, restrict training data to dates on or before a hypothetical predictor submission date
HYPOTHETICAL_SUBMISSION_DATE = np.datetime64("2020-07-31")
df = df[df.Date <= HYPOTHETICAL_SUBMISSION_DATE]

In [7]:
# Add GeoID column that combines CountryName and RegionName for easier manipulation of data
df['GeoID'] = df['CountryName'] + '__' + df['RegionName'].astype(str)

In [8]:
# Add new cases column
df['NewCases'] = df.groupby('GeoID').ConfirmedCases.diff().fillna(0)
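
As an illustration of what the grouped `diff().fillna(0)` computes, the following plain-Python sketch derives daily new cases from cumulative counts. The function name `daily_new_cases` and the sample rows are hypothetical, and the sketch assumes rows are in date order within each region with no missing values.

```python
from typing import Dict, List, Tuple


def daily_new_cases(rows: List[Tuple[str, float]]) -> List[float]:
    """Return per-row new cases from (geo_id, cumulative_cases) rows.

    The first row of each geo_id has no previous day to compare with,
    so its value is 0, which mirrors the fillna(0) in the notebook.
    """
    previous: Dict[str, float] = {}
    new_cases = []
    for geo_id, cumulative in rows:
        if geo_id in previous:
            new_cases.append(cumulative - previous[geo_id])
        else:
            new_cases.append(0.0)
        previous[geo_id] = cumulative
    return new_cases


rows = [("A__nan", 5.0), ("A__nan", 8.0), ("B__nan", 100.0),
        ("A__nan", 8.0), ("B__nan", 130.0)]
print(daily_new_cases(rows))  # [0.0, 3.0, 0.0, 0.0, 30.0]
```

Grouping by `GeoID` matters here: without it, the first row of one region would be differenced against the last row of the previous region.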

In [9]:
# Keep only columns of interest
id_cols = ['CountryName',
           'RegionName',
           'GeoID',
           'Date']
cases_col = ['NewCases']
npi_cols = ['C1_School closing',
            'C2_Workplace closing',
            'C3_Cancel public events',
            'C4_Restrictions on gatherings',
            'C5_Close public transport',
            'C6_Stay at home requirements',
            'C7_Restrictions on internal movement',
            'C8_International travel controls',
            'H1_Public information campaigns',
            'H2_Testing policy',
            'H3_Contact tracing',
            'H6_Facial Coverings']
df = df[id_cols + cases_col + npi_cols]

In [10]:
# Fill any missing case values by interpolation, then set any remaining NaNs to 0
df.update(df.groupby('GeoID').NewCases.apply(
    lambda group: group.interpolate()).fillna(0))

In [11]:
# Fill any missing NPIs by assuming they are the same as previous day
for npi_col in npi_cols:
    df.update(df.groupby('GeoID')[npi_col].ffill().fillna(0))
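
The effect of `ffill().fillna(0)` on a single region's NPI column can be sketched in plain Python as follows. The function name `forward_fill` is hypothetical, and `None` stands in for a missing value.

```python
from typing import List, Optional


def forward_fill(values: List[Optional[float]],
                 default: float = 0.0) -> List[float]:
    """Carry the last known value forward; use default before any value is known."""
    filled = []
    last: Optional[float] = None
    for value in values:
        if value is not None:
            last = value
        filled.append(default if last is None else last)
    return filled


print(forward_fill([None, 1.0, None, None, 2.0, None]))
# [0.0, 1.0, 1.0, 1.0, 2.0, 2.0]
```

Leading gaps become 0, which amounts to assuming that no intervention was in place before the first recorded value.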

In [12]:
df

Unnamed: 0,CountryName,RegionName,GeoID,Date,NewCases,C1_School closing,C2_Workplace closing,C3_Cancel public events,C4_Restrictions on gatherings,C5_Close public transport,C6_Stay at home requirements,C7_Restrictions on internal movement,C8_International travel controls,H1_Public information campaigns,H2_Testing policy,H3_Contact tracing,H6_Facial Coverings
0,Aruba,,Aruba__nan,2020-01-01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Aruba,,Aruba__nan,2020-01-02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Aruba,,Aruba__nan,2020-01-03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Aruba,,Aruba__nan,2020-01-04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Aruba,,Aruba__nan,2020-01-05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
89440,Zimbabwe,,Zimbabwe__nan,2020-07-27,192.0,3.0,1.0,2.0,3.0,1.0,2.0,2.0,4.0,2.0,1.0,1.0,4.0
89441,Zimbabwe,,Zimbabwe__nan,2020-07-28,113.0,3.0,1.0,2.0,3.0,1.0,2.0,2.0,4.0,2.0,1.0,1.0,4.0
89442,Zimbabwe,,Zimbabwe__nan,2020-07-29,62.0,3.0,1.0,2.0,3.0,1.0,2.0,2.0,4.0,2.0,1.0,1.0,4.0
89443,Zimbabwe,,Zimbabwe__nan,2020-07-30,213.0,3.0,1.0,2.0,3.0,1.0,2.0,2.0,4.0,2.0,1.0,1.0,4.0


In [13]:
# Set number of past days to use to make predictions
nb_lookback_days = 30

# Create training data across all countries for predicting one day ahead
X_cols = cases_col + npi_cols
y_col = cases_col
X_samples = []
y_samples = []
geo_ids = df.GeoID.unique()
for g in geo_ids:
    gdf = df[df.GeoID == g]
    all_case_data = np.array(gdf[cases_col])
    all_npi_data = np.array(gdf[npi_cols])

    # Create one sample for each day where we have enough data
    # Each sample consists of cases and npis for previous nb_lookback_days
    nb_total_days = len(gdf)
    for d in range(nb_lookback_days, nb_total_days - 1):
        X_cases = all_case_data[d-nb_lookback_days:d]

        # Take negative of npis to support positive
        # weight constraint in Lasso.
        X_npis = -all_npi_data[d - nb_lookback_days:d]

        # Flatten all input data so it fits Lasso input format.
        X_sample = np.concatenate([X_cases.flatten(),
                                   X_npis.flatten()])
        y_sample = all_case_data[d + 1]
        X_samples.append(X_sample)
        y_samples.append(y_sample)

X_samples = np.array(X_samples)
y_samples = np.array(y_samples).flatten()
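
The windowing in the loop above can be checked on a tiny example. The sketch below mirrors the same indexing in plain Python for a single region; the function name `make_samples` and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def make_samples(cases: Sequence[float],
                 npis: Sequence[Sequence[float]],
                 lookback: int) -> Tuple[List[List[float]], List[float]]:
    """Build (X, y) for one region using the same indices as the notebook loop."""
    X, y = [], []
    for d in range(lookback, len(cases) - 1):
        x_cases = list(cases[d - lookback:d])
        # Negate the NPI levels and flatten them day by day (day-major order).
        x_npis = [-level for day in npis[d - lookback:d] for level in day]
        X.append(x_cases + x_npis)
        y.append(cases[d + 1])
    return X, y


cases = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
npis = [[1.0, 2.0]] * 6  # two hypothetical NPI columns, constant over time
X, y = make_samples(cases, npis, lookback=3)
print(X[0])  # [1.0, 2.0, 3.0, -1.0, -2.0, -1.0, -2.0, -1.0, -2.0]
print(y)     # [5.0, 6.0]
```

With 30 lookback days and 12 NPI columns, each sample has 30 + 30 * 12 = 390 features. Note that the window covers indices `d - lookback` to `d - 1` while the target is taken at index `d + 1`, so the value at index `d` appears in neither; this is worth keeping in mind when reading the "one day ahead" comment.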

In [14]:
# Helper function to compute the mean absolute error (MAE)
def mae(pred, true):
    return np.mean(np.abs(pred - true))
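
As a quick sanity check of the metric, here is a plain-Python equivalent applied to a small hand-made example. The name `mae_plain` and the numbers are hypothetical.

```python
from typing import Sequence


def mae_plain(pred: Sequence[float], true: Sequence[float]) -> float:
    """Mean absolute error: the average of |pred - true| over all samples."""
    if len(pred) != len(true):
        raise ValueError("pred and true must have the same length")
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)


# Absolute errors are 2, 0 and 4, so the mean is 6 / 3 = 2.0
print(mae_plain([10.0, 0.0, 5.0], [8.0, 0.0, 9.0]))  # 2.0
```

The result is in the same units as the target, that is, daily new cases.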

In [15]:
# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X_samples,
                                                    y_samples,
                                                    test_size=0.2,
                                                    random_state=301)

In [16]:
# Create and train Lasso model.
# Set positive=True to enforce assumption that cases are positively correlated
# with future cases and npis are negatively correlated.
model = Lasso(alpha=0.1,
              precompute=True,
              max_iter=10000,
              positive=True,
              selection='random')
# Fit model
model.fit(X_train, y_train)

Lasso(alpha=0.1, max_iter=10000, positive=True, precompute=True,
      selection='random')

In [17]:
# Evaluate model
train_preds = model.predict(X_train)
train_preds = np.maximum(train_preds, 0) # Don't predict negative cases
print('Train MAE:', mae(train_preds, y_train))

test_preds = model.predict(X_test)
test_preds = np.maximum(test_preds, 0) # Don't predict negative cases
print('Test MAE:', mae(test_preds, y_test))

Train MAE: 143.96072294863617
Test MAE: 147.06037606429106


In [18]:
# Inspect the learned feature coefficients for the model
# to see what features it's paying attention to.

# Give names to the features
x_col_names = []
for d in range(-nb_lookback_days, 0):
    x_col_names.append('Day ' + str(d) + ' ' + cases_col[0])
for d in range(-nb_lookback_days, 0):
    for col_name in npi_cols:
        x_col_names.append('Day ' + str(d) + ' ' + col_name)

# View non-zero coefficients
for (col, coeff) in zip(x_col_names, list(model.coef_)):
    if coeff != 0.:
        print(col, coeff)
print('Intercept', model.intercept_)

Day -7 NewCases 0.04383780182133295
Day -6 NewCases 0.38835038550625
Day -5 NewCases 0.26795066645359944
Day -4 NewCases 0.06055017624961843
Day -3 NewCases 0.024816828421479375
Day -2 NewCases 0.08895321464301278
Day -1 NewCases 0.19943197458860648
Day -26 C6_Stay at home requirements 5.934389586953503
Day -22 C2_Workplace closing 2.9795410868712344
Day -21 C2_Workplace closing 10.513420195418005
Intercept 26.970195244296576
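
The names printed above follow directly from how each sample was flattened: first one case value per lookback day, then the NPI values in day-major order. A minimal sketch of that mapping from a flat coefficient index to a name is shown below; the function `feature_name` is hypothetical, and the example uses a shortened lookback and NPI list.

```python
from typing import List


def feature_name(index: int, lookback: int,
                 case_col: str, npi_cols: List[str]) -> str:
    """Map a flat feature index to a 'Day -k <column>' label."""
    nb_features = lookback * (1 + len(npi_cols))
    if not 0 <= index < nb_features:
        raise IndexError(f"index {index} outside 0..{nb_features - 1}")
    if index < lookback:
        return f"Day {index - lookback} {case_col}"
    # NPI block: consecutive entries cycle through the NPI columns for one day.
    day, col = divmod(index - lookback, len(npi_cols))
    return f"Day {day - lookback} {npi_cols[col]}"


short_npi_cols = ["C1_School closing", "C2_Workplace closing"]
for i in (0, 2, 3, 8):
    print(i, feature_name(i, 3, "NewCases", short_npi_cols))
```

With the notebook's values (30 lookback days, 12 NPI columns), index 83 maps to `Day -26 C6_Stay at home requirements`, since 83 = 30 + 4 * 12 + 5.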


In [19]:
# Save model to file
os.makedirs('models', exist_ok=True)
with open('models/model.pkl', 'wb') as model_file:
    pickle.dump(model, model_file)
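
For completeness, the round trip of saving and loading can be sketched as follows. A plain dict stands in for the fitted model so that the example runs without scikit-learn, and the temporary path is hypothetical. When loading a real scikit-learn model, it is safest to use the same scikit-learn version that saved it, and to unpickle only files from a trusted source.

```python
import os
import pickle
import tempfile

# A plain dict stands in for the fitted model; any picklable object works the same way.
stand_in_model = {"coef": [0.2, 0.4], "intercept": 27.0}

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "model.pkl")
    # Save, as in the cell above.
    with open(model_path, "wb") as model_file:
        pickle.dump(stand_in_model, model_file)
    # Load it back, as a predictor script would do before predicting.
    with open(model_path, "rb") as model_file:
        restored = pickle.load(model_file)

print(restored == stand_in_model)  # True
```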

## Evaluation

Now that the predictor has been trained and saved, this section contains the functionality for evaluating it on sample evaluation data.

In [20]:
# Reload the module to get the latest changes
import predict
from importlib import reload
reload(predict)
from predict import predict_df

In [21]:
%%time
preds_df = predict_df("2020-08-01", "2020-08-31", path_to_ips_file="../../../validation/data/2020-09-30_historical_ip.csv", verbose=True)


Predicting for Aruba__nan
2020-08-01: 67.88819334179738
2020-08-02: 73.1637233995244
2020-08-03: 84.95080954976001
2020-08-04: 96.29760644457569
2020-08-05: 97.85387095462863
2020-08-06: 106.45721477847292
2020-08-07: 135.76958680770292
2020-08-08: 151.2895591192481
2020-08-09: 165.14907624330607
2020-08-10: 175.8822876383444
2020-08-11: 184.82277383208378
2020-08-12: 200.10783734821467
2020-08-13: 220.9762227879765
2020-08-14: 238.3953180283373
2020-08-15: 253.5849088054375
2020-08-16: 266.77851061792927
2020-08-17: 280.49497533225224
2020-08-18: 297.7553663808854
2020-08-19: 317.10668692526184
2020-08-20: 335.39019779745865
2020-08-21: 352.21449183984174
2020-08-22: 368.1864836925478
2020-08-23: 385.02395537278454
2020-08-24: 403.8168194571238
2020-08-25: 423.6483821518853
2020-08-26: 443.11688375130797
2020-08-27: 455.9300441600354
2020-08-28: 483.8125527787333
2020-08-29: 507.45105659176943
2020-08-30: 529.4925862635175
2020-08-31: 551.2009183533467

[... verbose output continues in the same format, one block of daily predictions per region; the remaining blocks were truncated in the source and are omitted here ...]

CPU times: user 15.9 s, sys: 4.33 s, total: 20.2 s
Wall time: 18.8 s


In [22]:
# Check the predictions
preds_df.head()

Unnamed: 0,CountryName,RegionName,Date,PredictedDailyNewCases
213,Aruba,,2020-08-01,67.888193
214,Aruba,,2020-08-02,73.163723
215,Aruba,,2020-08-03,84.95081
216,Aruba,,2020-08-04,96.297606
217,Aruba,,2020-08-05,97.853871


# Validation
This is how the predictor is going to be called during the competition.  
!!! PLEASE DO NOT CHANGE THE API !!!

In [23]:
!python predict.py -s 2020-08-01 -e 2020-08-04 -ip ../../../validation/data/2020-09-30_historical_ip.csv -o predictions/2020-08-01_2020-08-04.csv

Generating predictions from 2020-08-01 to 2020-08-04...
Saved predictions to predictions/2020-08-01_2020-08-04.csv
Done!


In [24]:
!head predictions/2020-08-01_2020-08-04.csv

CountryName,RegionName,Date,PredictedDailyNewCases
Aruba,,2020-08-01,67.88819334179738
Aruba,,2020-08-02,73.1637233995244
Aruba,,2020-08-03,84.95080954976001
Aruba,,2020-08-04,96.29760644457569
Afghanistan,,2020-08-01,221.55244153541952
Afghanistan,,2020-08-02,283.13901472200837
Afghanistan,,2020-08-03,336.41679497414344
Afghanistan,,2020-08-04,399.5451746296149
Angola,,2020-08-01,102.16466310293828
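
The output above shows the expected file layout: one row per country or region and day, with the four columns `CountryName`, `RegionName`, `Date`, and `PredictedDailyNewCases`. As a rough illustration, the sketch below runs a few sanity checks on such a file using only the standard library. The function name `check_prediction_rows` and the specific checks are assumptions made for this example; they are not the official validator, which may check more or different things.

```python
import csv
import io
from datetime import date, timedelta

EXPECTED_COLUMNS = ["CountryName", "RegionName", "Date", "PredictedDailyNewCases"]

def check_prediction_rows(csv_text, start_date, end_date):
    """Hypothetical helper: return a list of problems found (empty if none)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        return [f"Unexpected columns: {reader.fieldnames}"]

    # Every day in the requested range, both ends included
    n_days = (end_date - start_date).days + 1
    expected_days = {(start_date + timedelta(days=i)).isoformat()
                     for i in range(n_days)}

    problems = []
    seen = {}  # (country, region) -> set of date strings found in the file
    for row in reader:
        key = (row["CountryName"], row["RegionName"])
        seen.setdefault(key, set()).add(row["Date"])
        try:
            value = float(row["PredictedDailyNewCases"])
        except ValueError:
            problems.append(f"{key} {row['Date']}: prediction is not a number")
            continue
        # `not value >= 0` also catches NaN
        if not value >= 0:
            problems.append(f"{key} {row['Date']}: negative or NaN prediction")

    for key, days in seen.items():
        missing = expected_days - days
        if missing:
            problems.append(f"{key}: missing dates {sorted(missing)}")
    return problems

# Tiny in-memory example (made-up values, for illustration only)
sample = (
    "CountryName,RegionName,Date,PredictedDailyNewCases\n"
    "Aruba,,2020-08-01,67.8\n"
    "Aruba,,2020-08-02,73.1\n"
)
sample_problems = check_prediction_rows(sample, date(2020, 8, 1), date(2020, 8, 2))
print(sample_problems or "No problems found")
```

To check a real file, the text could be read with `open(path).read()` and passed in the same way.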


# Test cases
Now that we can generate a prediction file, let's validate a few cases.

In [25]:
import os
from covid_xprize.validation.predictor_validation import validate_submission

def validate(start_date, end_date, ip_file, output_file):
    # First, delete any potential old file
    try:
        os.remove(output_file)
    except OSError:
        pass
    
    # Then generate the prediction, calling the official API
    !python predict.py -s {start_date} -e {end_date} -ip {ip_file} -o {output_file}
    
    # And validate it
    errors = validate_submission(start_date, end_date, ip_file, output_file)
    if errors:
        for error in errors:
            print(error)
    else:
        print("All good!")

## 4 days, no gap
- All countries and regions
- Official number of cases is known up to start_date
- Intervention Plans are the official ones

In [26]:
validate(start_date="2020-08-01",
         end_date="2020-08-04",
         ip_file="../../../validation/data/2020-09-30_historical_ip.csv",
         output_file="./predictions/val_4_days.csv")

Generating predictions from 2020-08-01 to 2020-08-04...
Saved predictions to ./predictions/val_4_days.csv
Done!
All good!


## 1 month in the future
- 2 countries only
- There is a gap between the date of the last known number of cases and start_date
- For future dates, the Intervention Plans contain scenarios for which predictions are requested, to answer the question: what will happen if we apply these plans?

In [27]:
%%time
validate(start_date="2021-01-01",
         end_date="2021-01-31",
         ip_file="../../../validation/data/future_ip.csv",
         output_file="predictions/val_1_month_future.csv")

Generating predictions from 2021-01-01 to 2021-01-31...
Saved predictions to predictions/val_1_month_future.csv
Done!
All good!
CPU times: user 41 ms, sys: 17.1 ms, total: 58.2 ms
Wall time: 1.97 s


## 180 days, from a future date, all countries and regions
- Prediction start date is 1 week from now (i.e. assuming the submission date is 1 week from now).  
- Prediction end date is 6 months after start date.  
- Prediction is requested for all available countries and regions.  
- Intervention plan scenario: freeze last known intervention plans for each country and region.  

The number of cases between today and the start date is not known yet, but the model relies on it, so the model first has to predict those cases in order to use them as input.  
This is the most demanding test. It should take less than 1 hour to generate the prediction file.
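
The gap-filling described above can be sketched as a day-by-day rollout: each predicted value is appended to the case history and becomes an input for the next day, and only the days from the start date onward are reported. This is a simplified illustration, not the predictor used in this notebook. `rollout`, `predict_one_day`, and `mean_of_window` are hypothetical names, the NPI inputs are left out for brevity, and a plain average stands in for the trained model.

```python
from datetime import date, timedelta

def rollout(known_cases, last_known_date, start_date, end_date,
            predict_one_day, lookback=3):
    """Hypothetical sketch: roll predictions forward one day at a time.

    known_cases: daily new cases up to and including last_known_date.
    predict_one_day: callable mapping a window of recent cases to one value.
    """
    history = list(known_cases)
    predictions = {}
    current = last_known_date + timedelta(days=1)
    while current <= end_date:
        window = history[-lookback:]
        # Clip at zero: a negative number of cases makes no sense
        pred = max(0.0, predict_one_day(window))
        # The prediction becomes part of the history for the next step
        history.append(pred)
        # Gap days are predicted but not reported
        if current >= start_date:
            predictions[current] = pred
        current += timedelta(days=1)
    return predictions

def mean_of_window(window):
    """Stand-in for a trained model (assumption for this example)."""
    return sum(window) / len(window)

# Cases are known up to 2020-11-30; predictions are requested from 2020-12-03
preds = rollout(known_cases=[10.0, 20.0, 30.0],
                last_known_date=date(2020, 11, 30),
                start_date=date(2020, 12, 3),
                end_date=date(2020, 12, 4),
                predict_one_day=mean_of_window)
for day, value in sorted(preds.items()):
    print(f"{day}: {value:.2f}")
```

In this example the two gap days (2020-12-01 and 2020-12-02) are predicted internally and feed the reported predictions, but they do not appear in the result.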

### Generate the scenario

In [28]:
from datetime import datetime, timedelta

start_date = datetime.now() + timedelta(days=7)
start_date_str = start_date.strftime('%Y-%m-%d')
end_date = start_date + timedelta(days=180)
end_date_str = end_date.strftime('%Y-%m-%d')
print(f"Start date: {start_date_str}")
print(f"End date: {end_date_str}")

Start date: 2020-12-10
End date: 2021-06-08


In [29]:
from covid_xprize.validation.scenario_generator import get_raw_data, generate_scenario, NPI_COLUMNS
DATA_FILE = 'data/OxCGRT_latest.csv'
latest_df = get_raw_data(DATA_FILE, latest=True)
scenario_df = generate_scenario(start_date_str, end_date_str, latest_df, countries=None, scenario="Freeze")
scenario_file = "predictions/180_days_future_scenario.csv"
scenario_df.to_csv(scenario_file, index=False)
print(f"Saved scenario to {scenario_file}")

Saved scenario to predictions/180_days_future_scenario.csv
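
To illustrate what a "freeze" scenario means, the sketch below holds each region's last known intervention values constant up to the end date. It is only an illustration under assumed data structures (plain dictionaries and lists); it is not the implementation of `generate_scenario`, and `freeze_scenario` is a hypothetical name.

```python
from datetime import date, timedelta

def freeze_scenario(npi_history, end_date):
    """Hypothetical sketch: extend each region's NPIs by repeating the last row.

    npi_history: {region: [(date, {npi_name: value}), ...]}, sorted by date.
    """
    scenario = {}
    for region, rows in npi_history.items():
        last_date, last_npis = rows[-1]
        extended = list(rows)
        day = last_date + timedelta(days=1)
        while day <= end_date:
            # Copy the dict so later edits to one day do not affect the others
            extended.append((day, dict(last_npis)))
            day += timedelta(days=1)
        scenario[region] = extended
    return scenario

# Made-up history for one region and one NPI column
history = {
    "Aruba": [
        (date(2020, 12, 1), {"C1_School closing": 2}),
        (date(2020, 12, 2), {"C1_School closing": 3}),
    ]
}
frozen = freeze_scenario(history, end_date=date(2020, 12, 5))
for day, npis in frozen["Aruba"]:
    print(day, npis)
```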


### Check it

In [30]:
%%time
validate(start_date=start_date_str,
         end_date=end_date_str,
         ip_file=scenario_file,
         output_file="predictions/val_6_month_future.csv")

Generating predictions from 2020-12-10 to 2021-06-08...
Saved predictions to predictions/val_6_month_future.csv
Done!
All good!
CPU times: user 3.55 s, sys: 820 ms, total: 4.37 s
Wall time: 3min 7s
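
The wall time reported above is well under the 1 hour limit. If the limit needs to be checked programmatically rather than by reading the `%%time` output, something along these lines could work. `run_within_budget` is a hypothetical helper written for this example, not part of the competition tooling.

```python
import time

ONE_HOUR_SECONDS = 60 * 60

def run_within_budget(func, budget_seconds=ONE_HOUR_SECONDS):
    """Hypothetical helper: run func and report whether it met the time budget."""
    start = time.perf_counter()
    result = func()
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed <= budget_seconds

# Stand-in workload; in the notebook this would wrap the call to validate(...)
result, elapsed, within_budget = run_within_budget(lambda: sum(range(1000)))
print(f"Took {elapsed:.4f} s, within budget: {within_budget}")
```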
