# Example Predictor: Linear Rollout Predictor

This example contains basic functionality for training and evaluating a linear predictor that rolls out predictions day-by-day.

First, a training data set is created from historical case and NPI (non-pharmaceutical intervention) data.

Second, a linear model is trained to predict future cases from prior case data along with prior and future NPI data.
The model is an off-the-shelf sklearn Lasso model that uses a positive weight constraint to enforce the assumption that increased NPIs are negatively correlated with future cases.

Third, a sample evaluation set is created, and the predictor is applied to this evaluation set to produce prediction results in the correct format.
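
The sign trick behind the positive weight constraint can be illustrated on synthetic data. The sketch below is not the notebook's training code: the data, the coefficients 0.8 and 5.0, and the name `toy_model` are made up for illustration. It only shows that negating an NPI column lets a Lasso with `positive=True` express a negative effect of the NPI itself.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n = 500
prev_cases = rng.uniform(0, 100, size=n)   # synthetic prior-day cases
npi_level = rng.integers(0, 4, size=n)     # synthetic NPI level, 0..3

# Synthetic ground truth: more cases yesterday means more cases today,
# a stricter NPI means fewer cases today.
next_cases = 0.8 * prev_cases - 5.0 * npi_level + rng.normal(0, 1, size=n)

# Negate the NPI column so that a non-negative weight on -npi
# corresponds to a non-positive effect of the NPI itself.
X = np.column_stack([prev_cases, -npi_level])

toy_model = Lasso(alpha=0.1, positive=True, max_iter=10000)
toy_model.fit(X, next_cases)

cases_weight, neg_npi_weight = toy_model.coef_
print('weight on prior cases:', cases_weight)
print('weight on -npi:', neg_npi_weight)
```

Both learned weights are non-negative by construction. Without the negation, the constraint would force the NPI weight to zero, because the only helpful weight would be negative.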

## Training

In [1]:
import pickle
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

### Copy the data locally

In [2]:
# Main source for the training data
DATA_URL = 'https://raw.githubusercontent.com/OxCGRT/covid-policy-tracker/master/data/OxCGRT_latest.csv'
# Local file
DATA_FILE = 'data/OxCGRT_latest.csv'

In [3]:
import os
import urllib.request
# Create the local data directory if needed, then download the latest data
os.makedirs('data', exist_ok=True)
urllib.request.urlretrieve(DATA_URL, DATA_FILE)

('data/OxCGRT_latest.csv', <http.client.HTTPMessage at 0x24b756e0ba8>)

In [4]:
# Load historical data from local file
# Note: on_bad_lines='skip' (pandas >= 1.3) replaces the deprecated error_bad_lines=False
df = pd.read_csv(DATA_FILE,
                 parse_dates=['Date'],
                 encoding="ISO-8859-1",
                 dtype={"RegionName": str,
                        "RegionCode": str},
                 on_bad_lines='skip')

In [5]:
df.columns

Index(['CountryName', 'CountryCode', 'RegionName', 'RegionCode',
       'Jurisdiction', 'Date', 'C1_School closing', 'C1_Flag',
       'C2_Workplace closing', 'C2_Flag', 'C3_Cancel public events', 'C3_Flag',
       'C4_Restrictions on gatherings', 'C4_Flag', 'C5_Close public transport',
       'C5_Flag', 'C6_Stay at home requirements', 'C6_Flag',
       'C7_Restrictions on internal movement', 'C7_Flag',
       'C8_International travel controls', 'E1_Income support', 'E1_Flag',
       'E2_Debt/contract relief', 'E3_Fiscal measures',
       'E4_International support', 'H1_Public information campaigns',
       'H1_Flag', 'H2_Testing policy', 'H3_Contact tracing',
       'H4_Emergency investment in healthcare', 'H5_Investment in vaccines',
       'H6_Facial Coverings', 'H6_Flag', 'M1_Wildcard', 'ConfirmedCases',
       'ConfirmedDeaths', 'StringencyIndex', 'StringencyIndexForDisplay',
       'StringencyLegacyIndex', 'StringencyLegacyIndexForDisplay',
       'GovernmentResponseIndex', 'Gove

In [6]:
# For testing, restrict training data to that before a hypothetical predictor submission date
HYPOTHETICAL_SUBMISSION_DATE = np.datetime64("2020-07-31")
df = df[df.Date <= HYPOTHETICAL_SUBMISSION_DATE]

In [7]:
# Add GeoID column that combines CountryName and RegionName for easier manipulation of data
df['GeoID'] = df['CountryName'] + '__' + df['RegionName'].astype(str)

In [8]:
# Add new cases column
df['NewCases'] = df.groupby('GeoID').ConfirmedCases.diff().fillna(0)
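
To see why the difference is taken per `GeoID`, here is a minimal sketch on a made-up two-region frame (the `toy` DataFrame and its values are illustrative only). Grouping first means the first row of each region gets 0 new cases instead of a spurious jump from the previous region's cumulative total.

```python
import pandas as pd

toy = pd.DataFrame({
    'GeoID': ['A', 'A', 'A', 'B', 'B', 'B'],
    'ConfirmedCases': [10, 15, 30, 100, 100, 130],
})

# diff() restarts within each group, so the first day of every region is NaN,
# which fillna(0) then turns into 0 new cases.
toy['NewCases'] = toy.groupby('GeoID').ConfirmedCases.diff().fillna(0)
print(toy)
```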

In [9]:
# Keep only columns of interest
id_cols = ['CountryName',
           'RegionName',
           'GeoID',
           'Date']
cases_col = ['NewCases']
npi_cols = ['C1_School closing',
            'C2_Workplace closing',
            'C3_Cancel public events',
            'C4_Restrictions on gatherings',
            'C5_Close public transport',
            'C6_Stay at home requirements',
            'C7_Restrictions on internal movement',
            'C8_International travel controls',
            'H1_Public information campaigns',
            'H2_Testing policy',
            'H3_Contact tracing',
            'H6_Facial Coverings']
df = df[id_cols + cases_col + npi_cols]

In [10]:
# Fill any missing case values by interpolation and setting NaNs to 0
df.update(df.groupby('GeoID').NewCases.apply(
    lambda group: group.interpolate()).fillna(0))

In [11]:
# Fill any missing NPIs by assuming they are the same as previous day
for npi_col in npi_cols:
    df.update(df.groupby('GeoID')[npi_col].ffill().fillna(0))
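
The two gap-filling rules can be checked on a small made-up frame. This sketch uses `transform` and direct column assignment rather than the `apply` plus `df.update` pattern of the cells above; that is a stylistic substitution for illustration, and the `toy` data is invented.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'GeoID': ['A'] * 4 + ['B'] * 4,
    'NewCases': [1.0, np.nan, 3.0, np.nan, np.nan, 10.0, np.nan, 30.0],
    'C1_School closing': [np.nan, 2.0, np.nan, 3.0, 1.0, np.nan, np.nan, 0.0],
})

# Linear interpolation within each region. Gaps before the first known value
# stay NaN and are then set to 0; gaps after the last known value repeat it.
toy['NewCases'] = (toy.groupby('GeoID')['NewCases']
                      .transform(lambda s: s.interpolate())
                      .fillna(0))

# Carry the last known NPI value forward within each region;
# days before the first known value are set to 0.
toy['C1_School closing'] = (toy.groupby('GeoID')['C1_School closing']
                               .ffill()
                               .fillna(0))
print(toy)
```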

In [12]:
df

Unnamed: 0,CountryName,RegionName,GeoID,Date,NewCases,C1_School closing,C2_Workplace closing,C3_Cancel public events,C4_Restrictions on gatherings,C5_Close public transport,C6_Stay at home requirements,C7_Restrictions on internal movement,C8_International travel controls,H1_Public information campaigns,H2_Testing policy,H3_Contact tracing,H6_Facial Coverings
0,Aruba,,Aruba__nan,2020-01-01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Aruba,,Aruba__nan,2020-01-02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Aruba,,Aruba__nan,2020-01-03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Aruba,,Aruba__nan,2020-01-04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Aruba,,Aruba__nan,2020-01-05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
89176,Zimbabwe,,Zimbabwe__nan,2020-07-27,192.0,3.0,1.0,2.0,3.0,1.0,2.0,2.0,4.0,2.0,1.0,1.0,4.0
89177,Zimbabwe,,Zimbabwe__nan,2020-07-28,113.0,3.0,1.0,2.0,3.0,1.0,2.0,2.0,4.0,2.0,1.0,1.0,4.0
89178,Zimbabwe,,Zimbabwe__nan,2020-07-29,62.0,3.0,1.0,2.0,3.0,1.0,2.0,2.0,4.0,2.0,1.0,1.0,4.0
89179,Zimbabwe,,Zimbabwe__nan,2020-07-30,213.0,3.0,1.0,2.0,3.0,1.0,2.0,2.0,4.0,2.0,1.0,1.0,4.0


In [13]:
# Set number of past days to use to make predictions
nb_lookback_days = 30

# Create training data across all countries for predicting one day ahead
X_cols = cases_col + npi_cols
y_col = cases_col
X_samples = []
y_samples = []
geo_ids = df.GeoID.unique()
for g in geo_ids:
    gdf = df[df.GeoID == g]
    all_case_data = np.array(gdf[cases_col])
    all_npi_data = np.array(gdf[npi_cols])

    # Create one sample for each day where we have enough data
    # Each sample consists of cases and npis for previous nb_lookback_days
    nb_total_days = len(gdf)
    for d in range(nb_lookback_days, nb_total_days - 1):
        X_cases = all_case_data[d - nb_lookback_days:d]

        # Take negative of npis to support positive
        # weight constraint in Lasso.
        X_npis = -all_npi_data[d - nb_lookback_days:d]

        # Flatten all input data so it fits Lasso input format.
        X_sample = np.concatenate([X_cases.flatten(),
                                   X_npis.flatten()])
        # Note: the input window ends at index d - 1 and the target is taken
        # at index d + 1, so index d itself is in neither the inputs nor the target.
        y_sample = all_case_data[d + 1]
        X_samples.append(X_sample)
        y_samples.append(y_sample)

X_samples = np.array(X_samples)
y_samples = np.array(y_samples).flatten()
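
The shape of the resulting training matrix is easier to see on a toy series. The sketch below mirrors the windowing of the cell above, including its target index, with made-up sizes: 10 days, 2 NPI columns, and a lookback of 3 standing in for `nb_lookback_days`.

```python
import numpy as np

lookback = 3                                         # toy stand-in for nb_lookback_days
cases = np.arange(10, dtype=float).reshape(-1, 1)    # shape (days, 1)
npis = np.arange(20, dtype=float).reshape(-1, 2)     # shape (days, 2 NPI columns)

X_rows, y_rows = [], []
for d in range(lookback, len(cases) - 1):
    window_cases = cases[d - lookback:d]             # indices d-lookback .. d-1
    window_npis = -npis[d - lookback:d]              # negated, same window
    X_rows.append(np.concatenate([window_cases.flatten(),
                                  window_npis.flatten()]))
    y_rows.append(cases[d + 1])                      # same target index as the cell above

X_toy = np.array(X_rows)
y_toy = np.array(y_rows).flatten()
print(X_toy.shape, y_toy.shape)
```

Each row holds `lookback` case values followed by `lookback * number_of_npis` negated NPI values, so the real training matrix has 30 + 30 * 12 = 390 columns.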

In [14]:
# Helper function to compute the mean absolute error (MAE)
def mae(pred, true):
    return np.mean(np.abs(pred - true))

In [15]:
# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X_samples,
                                                    y_samples,
                                                    test_size=0.2,
                                                    random_state=301)

In [16]:
# Create and train Lasso model.
# Set positive=True to enforce assumption that cases are positively correlated
# with future cases and npis are negatively correlated.
model = Lasso(alpha=0.1,
              precompute=True,
              max_iter=10000,
              positive=True,
              selection='random')
# Fit model
model.fit(X_train, y_train)

Lasso(alpha=0.1, max_iter=10000, positive=True, precompute=True,
      selection='random')

In [17]:
# Evaluate model
train_preds = model.predict(X_train)
train_preds = np.maximum(train_preds, 0) # Don't predict negative cases
print('Train MAE:', mae(train_preds, y_train))

test_preds = model.predict(X_test)
test_preds = np.maximum(test_preds, 0) # Don't predict negative cases
print('Test MAE:', mae(test_preds, y_test))

Train MAE: 143.99183348823468
Test MAE: 147.08894148293928


In [18]:
# Inspect the learned feature coefficients for the model
# to see what features it's paying attention to.

# Give names to the features
x_col_names = []
for d in range(-nb_lookback_days, 0):
    x_col_names.append('Day ' + str(d) + ' ' + cases_col[0])
# NPI features cover the same lookback window as the cases (days -nb_lookback_days..-1)
for d in range(-nb_lookback_days, 0):
    for col_name in npi_cols:
        x_col_names.append('Day ' + str(d) + ' ' + col_name)

# View non-zero coefficients
for (col, coeff) in zip(x_col_names, list(model.coef_)):
    if coeff != 0.:
        print(col, coeff)
print('Intercept', model.intercept_)

Day -7 NewCases 0.044054272748459467
Day -6 NewCases 0.38818943395674793
Day -5 NewCases 0.267987985568403
Day -4 NewCases 0.060603381610973964
Day -3 NewCases 0.024635984695023948
Day -2 NewCases 0.08923571788822025
Day -1 NewCases 0.19920207572483511
Day -26 C6_Stay at home requirements 5.773239160619521
Day -22 C2_Workplace closing 2.8975369086913227
Day -21 C2_Workplace closing 10.74535993710082
Intercept 26.96089126501346
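
Because the inputs were flattened before training, each coefficient position corresponds to one (day offset, column) pair. The helper below, `describe_feature`, is a hypothetical illustration of that mapping and not part of the notebook or of `predict.py`; it assumes the layout built during training, with the case window first and then the NPI window flattened row by row.

```python
def describe_feature(flat_index, nb_lookback_days, npi_cols, cases_name='NewCases'):
    """Map a flat feature index back to a (day offset, column name) pair."""
    if flat_index < nb_lookback_days:
        # First block: one case value per lookback day, oldest first.
        return flat_index - nb_lookback_days, cases_name
    # Second block: the NPI window flattened row by row (day-major order).
    day, col = divmod(flat_index - nb_lookback_days, len(npi_cols))
    return day - nb_lookback_days, npi_cols[col]


toy_npis = ['C1_School closing', 'C2_Workplace closing']
oldest_case = describe_feature(0, 30, toy_npis)
newest_case = describe_feature(29, 30, toy_npis)
first_npi = describe_feature(30, 30, toy_npis)
last_npi = describe_feature(89, 30, toy_npis)
print(oldest_case, newest_case, first_npi, last_npi)
```

Since the NPI inputs were negated, a positive coefficient on an NPI feature means that a stricter value of that NPI on that day lowers the predicted case count.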


In [19]:
# Save model to file
os.makedirs('models', exist_ok=True)
with open('models/model.pkl', 'wb') as model_file:
    pickle.dump(model, model_file)

## Evaluation

Now that the predictor has been trained and saved, this section contains the functionality for evaluating it on sample evaluation data.
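
The day-by-day rollout can be sketched as follows. This is an assumption about the general technique, not a copy of `predict.py`: the function `rollout`, the stand-in model `MeanOfWindow`, and all data are hypothetical. The idea is that each prediction is appended to the case history and used as input for the next day, while the NPIs for future days come from the intervention plan.

```python
import numpy as np


def rollout(model, past_cases, past_npis, future_npis, nb_lookback_days):
    """Predict one day at a time, feeding each prediction back in as input.

    Hypothetical helper for illustration; predict.py may differ in detail.
    """
    cases = list(past_cases[-nb_lookback_days:])
    npis = [row for row in past_npis[-nb_lookback_days:]]
    preds = []
    for day_npis in future_npis:
        X = np.concatenate([np.array(cases[-nb_lookback_days:]),
                            -np.array(npis[-nb_lookback_days:]).flatten()])
        # Clip at zero: negative case counts are not meaningful.
        pred = max(float(model.predict(X.reshape(1, -1))[0]), 0.0)
        preds.append(pred)
        cases.append(pred)      # the prediction becomes part of the history
        npis.append(day_npis)   # the planned NPIs for this day join the window
    return preds


class MeanOfWindow:
    """Stand-in model: predicts the mean of the case part of the window."""

    def __init__(self, nb_lookback_days):
        self.nb_lookback_days = nb_lookback_days

    def predict(self, X):
        return X[:, :self.nb_lookback_days].mean(axis=1)


past_cases = np.array([10.0, 20.0, 30.0])
past_npis = np.zeros((3, 2))
future_npis = np.ones((2, 2))
preds = rollout(MeanOfWindow(3), past_cases, past_npis, future_npis, 3)
print(preds)
```

Any object with a scikit-learn style `predict` method, such as the pickled Lasso model, could be passed in place of the stand-in. Because predictions feed back into the inputs, errors can compound over a long horizon.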

In [20]:
# Reload the module to get the latest changes
import predict
from importlib import reload
reload(predict)
from predict import predict_df

In [21]:
os.getcwd()

'C:\\Users\\marti\\Desktop\\Projects\\Pandemic-Prize\\covid-xprize\\covid_xprize\\examples\\predictors\\linear'

In [22]:
%%time
preds_df = predict_df("2020-08-01", "2020-08-31",
                      path_to_ips_file="../../../validation/data/2020-09-30_historical_ip.csv",
                      verbose=True)


Predicting for Aruba__nan
2020-08-01: 68.90672517815139
2020-08-02: 78.75962279046016
2020-08-03: 80.54279905477833
2020-08-04: 87.0935887074165
2020-08-05: 93.59258062852393
2020-08-06: 104.97895215458959
2020-08-07: 136.29639033408034
2020-08-08: 151.44636536090846
2020-08-09: 160.8150866202873
2020-08-10: 169.85801839988343
2020-08-11: 180.6294319956761
2020-08-12: 197.83009449421263
2020-08-13: 219.72698291681203
2020-08-14: 236.20863056584778
2020-08-15: 249.24998969929118
2020-08-16: 261.7101890920532
2020-08-17: 276.27840844709937
2020-08-18: 294.6321983761607
2020-08-19: 314.3604236374357
2020-08-20: 331.89975385766235
2020-08-21: 347.6169227889191
2020-08-22: 363.22682927851645
2020-08-23: 380.4893327791255
2020-08-24: 399.8246381552569
2020-08-25: 419.72096325592
2020-08-26: 438.6707519904341
2020-08-27: 451.0275120517328
2020-08-28: 478.9654654450156
2020-08-29: 502.7740303827118
2020-08-30: 525.0608758664021
2020-08-31: 546.7158382751627

Predicting for Afghanistan__nan
20

2020-08-28: 413.54775685942536
2020-08-29: 426.58592404358143
2020-08-30: 443.2209112140717
2020-08-31: 460.7170937678342

Predicting for Belgium__nan
2020-08-01: 2730.1979632842913
2020-08-02: 2813.763529664165
2020-08-03: 2163.7824135899723
2020-08-04: 1686.213165881071
2020-08-05: 1573.5273311829505
2020-08-06: 1553.810694364145
2020-08-07: 2488.6502982665106
2020-08-08: 2619.9234474181076
2020-08-09: 2345.562331034678
2020-08-10: 2080.2388490107396
2020-08-11: 1992.7252959611476
2020-08-12: 2190.720223334238
2020-08-13: 2596.3940538243155
2020-08-14: 2695.254905073923
2020-08-15: 2578.8985212689336
2020-08-16: 2448.00021099647
2020-08-17: 2445.9623828926624
2020-08-18: 2618.7202918998605
2020-08-19: 2835.370117359391
2020-08-20: 2911.0265947342946
2020-08-21: 2873.6707363570126
2020-08-22: 2853.792996756292
2020-08-23: 2907.027841436503
2020-08-24: 3044.555838821467
2020-08-25: 3185.935476045042
2020-08-26: 3255.3803117950956
2020-08-27: 3271.949176070933
2020-08-28: 3298.168661415

2020-08-01: 136.84581659927403
2020-08-02: 137.7988468709695
2020-08-03: 137.00313087671987
2020-08-04: 157.15017898221737
2020-08-05: 150.75260027024171
2020-08-06: 149.0967600467018
2020-08-07: 197.52810492409355
2020-08-08: 214.27669474047792
2020-08-09: 226.6386096183896
2020-08-10: 237.7598477910824
2020-08-11: 242.3864129537243
2020-08-12: 257.67441333102124
2020-08-13: 285.37183646459385
2020-08-14: 304.9894989063766
2020-08-15: 320.5429677579667
2020-08-16: 333.1023112414685
2020-08-17: 345.5368636231106
2020-08-18: 364.2677028114136
2020-08-19: 387.04318225422895
2020-08-20: 407.3227429426853
2020-08-21: 424.8775756573214
2020-08-22: 451.51871626577224
2020-08-23: 473.56951425780875
2020-08-24: 495.9233747225572
2020-08-25: 519.1652944633563
2020-08-26: 541.5278348624686
2020-08-27: 564.7910784888365
2020-08-28: 590.3725329841441
2020-08-29: 615.2278951006581
2020-08-30: 640.2677923289082
2020-08-31: 665.5137763548657

Predicting for Brazil__nan
2020-08-01: 35419.08539693116
2

2020-08-19: 308.9705068840412
2020-08-20: 327.1823555519127
2020-08-21: 342.9283260390576
2020-08-22: 379.4082656763324
2020-08-23: 406.19285530427584
2020-08-24: 429.3297801830998
2020-08-25: 451.68044656404817
2020-08-26: 473.24455433996656
2020-08-27: 504.1920385802301
2020-08-28: 536.2666894957445
2020-08-29: 565.5084362899388
2020-08-30: 592.4161324990364
2020-08-31: 618.5258172443096

Predicting for Cote d'Ivoire__nan
2020-08-01: 77.85248032050114
2020-08-02: 94.84791695842748
2020-08-03: 104.099804354474
2020-08-04: 94.38355945010962
2020-08-05: 89.38559159675441
2020-08-06: 107.68684438649612
2020-08-07: 145.85174136088068
2020-08-08: 166.8820043281226
2020-08-09: 176.36129360762646
2020-08-10: 177.47203217050023
2020-08-11: 183.9065232872797
2020-08-12: 203.9073464863522
2020-08-13: 230.32489548581248
2020-08-14: 249.98333673283722
2020-08-15: 262.0433173498274
2020-08-16: 270.6360119747928
2020-08-17: 283.4158880320139
2020-08-18: 303.34407946123736
2020-08-19: 325.8011642569

2020-08-09: 3363.1110874225037
2020-08-10: 3713.0700675836542
2020-08-11: 3519.3479323052857
2020-08-12: 3285.757760013455
2020-08-13: 3454.8408126128948
2020-08-14: 3576.5876720438832
2020-08-15: 3803.1050181334263
2020-08-16: 3945.814429341883
2020-08-17: 3885.3191794018785
2020-08-18: 3845.0626281694704
2020-08-19: 3936.8609115091067
2020-08-20: 4074.126913344433
2020-08-21: 4236.5438123963295
2020-08-22: 4330.133826338703
2020-08-23: 4344.230579907205
2020-08-24: 4374.018933645067
2020-08-25: 4464.006754809845
2020-08-26: 4591.46495520665
2020-08-27: 4709.103396455851
2020-08-28: 4795.1968292029305
2020-08-29: 4849.016166709912
2020-08-30: 4914.34243785056
2020-08-31: 5011.810349708254

Predicting for Germany__nan
2020-08-01: 15565.304231851434
2020-08-02: 11773.172872895571
2020-08-03: 10949.305356165763
2020-08-04: 15965.602651952364
2020-08-05: 15683.349691824662
2020-08-06: 10846.611745832264
2020-08-07: 13866.58160396314
2020-08-08: 13326.4733213026
2020-08-09: 14209.539675856

2020-08-07: 410.17452760569284
2020-08-08: 463.6479368140212
2020-08-09: 468.00325890916355
2020-08-10: 436.5374491206868
2020-08-11: 411.5178745871988
2020-08-12: 441.4509244936225
2020-08-13: 511.5299158147673
2020-08-14: 553.5074925490796
2020-08-15: 562.9581845974246
2020-08-16: 553.3994205471745
2020-08-17: 554.5429768920583
2020-08-18: 585.9924049781575
2020-08-19: 632.4685856825542
2020-08-20: 665.8972315274443
2020-08-21: 680.5040300180044
2020-08-22: 675.714523558599
2020-08-23: 685.2573974649918
2020-08-24: 713.8304768272344
2020-08-25: 749.5266053480557
2020-08-26: 778.0704932543429
2020-08-27: 782.5369920142616
2020-08-28: 789.9265283133832
2020-08-29: 805.8143099664014
2020-08-30: 832.5567978021863
2020-08-31: 862.5194930992889

Predicting for Ethiopia__nan
2020-08-01: 599.9294415982624
2020-08-02: 602.9402357439978
2020-08-03: 655.7366030158935
2020-08-04: 616.2233243527271
2020-08-05: 494.7503731609208
2020-08-06: 440.30956602704447
2020-08-07: 633.3983841901286
2020-08-

2020-08-29: 880.5456031688996
2020-08-30: 925.3949736897705
2020-08-31: 960.9317766210315

Predicting for Georgia__nan
2020-08-01: 3956.48528025528
2020-08-02: 4260.474885081105
2020-08-03: 4258.700126898798
2020-08-04: 3828.918885866515
2020-08-05: 3140.5270241155113
2020-08-06: 2608.4330819214824
2020-08-07: 3882.0454137304437
2020-08-08: 4337.094461467641
2020-08-09: 4384.092243732135
2020-08-10: 4081.7954589701226
2020-08-11: 3685.4120574748217
2020-08-12: 3712.644292375891
2020-08-13: 4271.007814420043
2020-08-14: 4601.927710532281
2020-08-15: 4651.6013198110895
2020-08-16: 4484.928527163677
2020-08-17: 4338.259474696003
2020-08-18: 4458.191750576893
2020-08-19: 4774.531703232134
2020-08-20: 5000.905086319445
2020-08-21: 5057.488865726139
2020-08-22: 5012.950244820454
2020-08-23: 5004.635476136177
2020-08-24: 5138.988061303145
2020-08-25: 5356.090019660207
2020-08-26: 5525.3978513012435
2020-08-27: 5595.532462645279
2020-08-28: 5621.077547320753
2020-08-29: 5680.567928337184
2020-

2020-08-17: 539.4265880783685
2020-08-18: 578.7190592902589
2020-08-19: 633.1013119256502
2020-08-20: 668.9318627375758
2020-08-21: 678.9970072272449
2020-08-22: 688.8802114071993
2020-08-23: 706.8775091035018
2020-08-24: 743.424858734119
2020-08-25: 785.608406927598
2020-08-26: 817.3172726399889
2020-08-27: 843.296591501394
2020-08-28: 863.6584429761829
2020-08-29: 890.5865028070918
2020-08-30: 926.6141601772883
2020-08-31: 964.7528275468258

Predicting for Croatia__nan
2020-08-01: 3361.8135396755383
2020-08-02: 3413.888681572356
2020-08-03: 3007.1349861439207
2020-08-04: 2654.786436300207
2020-08-05: 2343.543540188397
2020-08-06: 2065.5507370382497
2020-08-07: 3140.287523874405
2020-08-08: 3359.8613134639504
2020-08-09: 3223.7662460144497
2020-08-10: 2987.772966812679
2020-08-11: 2788.317780736651
2020-08-12: 2903.8720403186103
2020-08-13: 3358.828245565662
2020-08-14: 3536.6633973160037
2020-08-15: 3497.045139395386
2020-08-16: 3372.1763603930326
2020-08-17: 3318.864150513006
2020-0

2020-08-09: 22291.000459674593
2020-08-10: 20636.372987223564
2020-08-11: 18764.267095959658
2020-08-12: 19274.631804983015
2020-08-13: 22291.564462015584
2020-08-14: 23651.40896243859
2020-08-15: 23549.82058841308
2020-08-16: 22592.590278656116
2020-08-17: 21946.326606000417
2020-08-18: 22736.224481942583
2020-08-19: 24364.212180273535
2020-08-20: 25318.626654666135
2020-08-21: 25398.262002718595
2020-08-22: 25038.017958737008
2020-08-23: 25014.177741139218
2020-08-24: 25751.5252012997
2020-08-25: 26814.767763559517
2020-08-26: 27533.5010295674
2020-08-27: 27753.145966592405
2020-08-28: 27789.192603865988
2020-08-29: 28070.591589524687
2020-08-30: 28748.948797003897
2020-08-31: 29561.223786297654

Predicting for Jamaica__nan
2020-08-01: 103.82804767409753
2020-08-02: 117.36729539275156
2020-08-03: 120.68227800194424
2020-08-04: 124.5415234329355
2020-08-05: 117.53598994019973
2020-08-06: 126.65843375694682
2020-08-07: 170.00923111648092
2020-08-08: 190.23838041629756
2020-08-09: 200.8

2020-08-09: 1580.4308562760784
2020-08-10: 1520.0894778601175
2020-08-11: 1425.906931559155
2020-08-12: 1450.797525891651
2020-08-13: 1646.6990194908287
2020-08-14: 1742.6243840908164
2020-08-15: 1759.2800440476647
2020-08-16: 1728.4314610279919
2020-08-17: 1705.4595631741906
2020-08-18: 1762.3668168289637
2020-08-19: 1874.7526223489735
2020-08-20: 1950.1138315278176
2020-08-21: 1977.5889841648402
2020-08-22: 1989.3523365097392
2020-08-23: 2010.686480404326
2020-08-24: 2072.426848860066
2020-08-25: 2144.168843591038
2020-08-26: 2203.8789044961236
2020-08-27: 2252.8725212581526
2020-08-28: 2287.861169470044
2020-08-29: 2330.3673771609033
2020-08-30: 2390.915388230703
2020-08-31: 2457.1718986269993

Predicting for Liberia__nan
2020-08-01: 54.275729713601606
2020-08-02: 63.13832154346803
2020-08-03: 69.5708848461858
2020-08-04: 72.98026059914253
2020-08-05: 77.74106116696318
2020-08-06: 94.23447965205725
2020-08-07: 121.8629976504326
2020-08-08: 136.71765652138146
2020-08-09: 146.63819814


Predicting for Madagascar__nan
2020-08-01: 52.150266432044646
2020-08-02: 62.53870775491113
2020-08-03: 69.26177329309724
2020-08-04: 72.81281508838494
2020-08-05: 77.53653954383194
2020-08-06: 93.56524388587715
2020-08-07: 120.70280379286612
2020-08-08: 136.00239974819146
2020-08-09: 146.1720220215084
2020-08-10: 154.1436145812531
2020-08-11: 164.94625309524235
2020-08-12: 182.69007173003246
2020-08-13: 203.34211480024493
2020-08-14: 219.64866969133834
2020-08-15: 232.58972555022865
2020-08-16: 244.64433735140614
2020-08-17: 259.15354019309325
2020-08-18: 277.32491819294097
2020-08-19: 296.48923925139565
2020-08-20: 313.72423702717373
2020-08-21: 329.16704855252266
2020-08-22: 355.2378786533075
2020-08-23: 377.3259405900905
2020-08-24: 398.306262746845
2020-08-25: 418.8934838265753
2020-08-26: 438.6640389395419
2020-08-27: 460.03565294916706
2020-08-28: 484.5559052675445
2020-08-29: 508.42758204566525
2020-08-30: 531.7301659740679
2020-08-31: 554.6158536877706

Predicting for Mexico_

2020-08-18: 331.7399109665229
2020-08-19: 355.81468174041544
2020-08-20: 375.5256859374943
2020-08-21: 390.4044055892771
2020-08-22: 414.81591016482696
2020-08-23: 436.7522035676604
2020-08-24: 460.0105667029523
2020-08-25: 483.65577755481337
2020-08-26: 505.1607894720971
2020-08-27: 520.869902577893
2020-08-28: 543.9206865991546
2020-08-29: 567.7008717738172
2020-08-30: 592.5168438930116
2020-08-31: 617.0687191974914

Predicting for Niger__nan
2020-08-01: 73.7995353911185
2020-08-02: 84.23822535482115
2020-08-03: 99.34550749754837
2020-08-04: 105.29062254322201
2020-08-05: 104.6982729324329
2020-08-06: 111.40611092396244
2020-08-07: 143.52306458147078
2020-08-08: 162.217052232478
2020-08-09: 176.8538442002
2020-08-10: 185.45007398814022
2020-08-11: 192.70511008696846
2020-08-12: 207.59570373887715
2020-08-13: 230.08090001035583
2020-08-14: 249.18264806310134
2020-08-15: 264.60984703891063
2020-08-16: 276.76994188903
2020-08-17: 289.58771370762804
2020-08-18: 306.8895989321475
2020-08-

2020-08-11: 1662.0269825090961
2020-08-12: 2014.3622763985952
2020-08-13: 2494.433098978621
2020-08-14: 2548.579767804803
2020-08-15: 2316.5290896641955
2020-08-16: 2119.0485065673183
2020-08-17: 2154.4408836088614
2020-08-18: 2405.5883478218807
2020-08-19: 2656.2384764724115
2020-08-20: 2697.465303460201
2020-08-21: 2595.7602270243374
2020-08-22: 2533.9220606654844
2020-08-23: 2603.975076198913
2020-08-24: 2778.6276900215485
2020-08-25: 2931.39511344454
2020-08-26: 2975.180464901707
2020-08-27: 2951.846593446608
2020-08-28: 2955.741670541339
2020-08-29: 3036.046688073536
2020-08-30: 3166.2941497367733
2020-08-31: 3276.8184795365155

Predicting for Philippines__nan
2020-08-01: 1530.172362123874
2020-08-02: 1850.6440379888015
2020-08-03: 1994.2621296370064
2020-08-04: 1775.899437824449
2020-08-05: 1302.703753956648
2020-08-06: 1098.5292097774845
2020-08-07: 1641.7844410428538
2020-08-08: 1937.1934616862454
2020-08-09: 2022.1606172105617
2020-08-10: 1861.2073894837183
2020-08-11: 1628.90

2020-08-05: 21131.609263852773
2020-08-06: 16311.707703553417
2020-08-07: 23488.35957747276
2020-08-08: 26609.26661157806
2020-08-09: 27971.04903036342
2020-08-10: 26763.01444438715
2020-08-11: 23705.612477971474
2020-08-12: 23021.848585742817
2020-08-13: 26075.508197684183
2020-08-14: 28366.860921354717
2020-08-15: 29236.044633723563
2020-08-16: 28419.148923934896
2020-08-17: 27152.184932903925
2020-08-18: 27405.406270673473
2020-08-19: 29144.671163399216
2020-08-20: 30689.94964258605
2020-08-21: 31301.866504753732
2020-08-22: 31012.245426280355
2020-08-23: 30695.579969929975
2020-08-24: 31223.964050204457
2020-08-25: 32431.346319970144
2020-08-26: 33534.12592028214
2020-08-27: 34083.37060450952
2020-08-28: 34182.62270025215
2020-08-29: 34357.66103227774
2020-08-30: 34996.47581288405
2020-08-31: 35962.58227258014

Predicting for Rwanda__nan
2020-08-01: 71.8816584147928
2020-08-02: 83.52759907352102
2020-08-03: 91.77815139853107
2020-08-04: 90.4653783444508
2020-08-05: 90.8705692007435

2020-08-26: 398.71495739721803
2020-08-27: 401.4058542802249
2020-08-28: 410.6628687144066
2020-08-29: 423.73151203900875
2020-08-30: 440.36272077784446
2020-08-31: 457.78905957212544

Predicting for Serbia__nan
2020-08-01: 6535.843438631363
2020-08-02: 6660.244344041794
2020-08-03: 6823.647764375616
2020-08-04: 6998.721974206707
2020-08-05: 5994.720285577773
2020-08-06: 4546.50522669321
2020-08-07: 6400.734080015706
2020-08-08: 7006.756424936052
2020-08-09: 7312.246332815978
2020-08-10: 7191.193606586334
2020-08-11: 6551.508442983128
2020-08-12: 6348.031358782454
2020-08-13: 7084.3457000251765
2020-08-14: 7588.573942747657
2020-08-15: 7823.788003600713
2020-08-16: 7716.485539497466
2020-08-17: 7464.964134481616
2020-08-18: 7531.79288095899
2020-08-19: 7946.264401870992
2020-08-20: 8313.30054195921
2020-08-21: 8494.569161399553
2020-08-22: 8489.741089046858
2020-08-23: 8457.557389071191
2020-08-24: 8603.359863579257
2020-08-25: 8902.59746330872
2020-08-26: 9183.448133202912
2020-08-27:

2020-08-09: 164.41577010823966
2020-08-10: 172.09767144264475
2020-08-11: 181.73401020824824
2020-08-12: 199.0765608338408
2020-08-13: 221.62002758075272
2020-08-14: 238.92010882438774
2020-08-15: 252.17206715960396
2020-08-16: 263.980459620607
2020-08-17: 278.0344194543367
2020-08-18: 296.46074132019163
2020-08-19: 316.63080591677465
2020-08-20: 334.61508276277925
2020-08-21: 350.3740663205319
2020-08-22: 376.4110861245536
2020-08-23: 398.4807918590231
2020-08-24: 419.8560766910261
2020-08-25: 441.1411134410284
2020-08-26: 461.5033182447625
2020-08-27: 471.66388254192583
2020-08-28: 494.0406789086313
2020-08-29: 516.6390446845619
2020-08-30: 539.5958881945037
2020-08-31: 562.0136846780688

Predicting for Thailand__nan
2020-08-01: 61.0013864080702
2020-08-02: 75.46945360144812
2020-08-03: 85.03475658957943
2020-08-04: 86.95758435364276
2020-08-05: 90.04464242688883
2020-08-06: 101.65630320532121
2020-08-07: 131.6362913739823
2020-08-08: 149.70423352126792
2020-08-09: 161.3176661932376


2020-08-03: 13561.641341296201
2020-08-04: 12365.295994007758
2020-08-05: 10326.750327445212
2020-08-06: 8679.516872517508
2020-08-07: 13053.623061295892
2020-08-08: 14283.411151152035
2020-08-09: 14116.962479699761
2020-08-10: 13151.436391367928
2020-08-11: 11954.146302866791
2020-08-12: 12142.875545273337
2020-08-13: 13994.738289714775
2020-08-14: 14917.946580911444
2020-08-15: 14930.008005823473
2020-08-16: 14368.854754026279
2020-08-17: 13936.387719416063
2020-08-18: 14373.205084722114
2020-08-19: 15383.13151478992
2020-08-20: 16021.823118646667
2020-08-21: 16118.696841106934
2020-08-22: 15913.143157736476
2020-08-23: 15882.24226093332
2020-08-24: 16330.746832778836
2020-08-25: 17003.484922329888
2020-08-26: 17482.68575161366
2020-08-27: 17652.23576370679
2020-08-28: 17688.71963137803
2020-08-29: 17862.83591511375
2020-08-30: 18287.023271372433
2020-08-31: 18807.562210744618

Predicting for Uruguay__nan
2020-08-01: 216.57773529035967
2020-08-02: 251.82732114462414
2020-08-03: 258.2

2020-08-15: 446.7674795951295
2020-08-16: 441.8119644514944
2020-08-17: 443.9234818532983
2020-08-18: 466.97828665711296
2020-08-19: 504.51648326608483
2020-08-20: 536.3480173123698
2020-08-21: 553.2506957705948
2020-08-22: 571.8503498414475
2020-08-23: 589.8002097612309
2020-08-24: 616.4835640139959
2020-08-25: 649.0013551092559
2020-08-26: 677.9695871596084
2020-08-27: 695.5620613962328
2020-08-28: 716.8449002621214
2020-08-29: 740.2768705835197
2020-08-30: 768.8960794012496
2020-08-31: 799.8402578562593

Predicting for United States__Delaware
2020-08-01: 547.9141861428388
2020-08-02: 578.5250776628316
2020-08-03: 613.607264120076
2020-08-04: 606.1852096582377
2020-08-05: 560.3336825705401
2020-08-06: 445.22847960890687
2020-08-07: 610.6953948366354
2020-08-08: 677.2282434055257
2020-08-09: 712.6108495422993
2020-08-10: 709.0736191960782
2020-08-11: 674.2204816467897
2020-08-12: 669.5063839935974
2020-08-13: 744.5059798523566
2020-08-14: 800.5513664364421
2020-08-15: 831.898286994460


Predicting for United States__Louisiana
2020-08-01: 2558.1268874662915
2020-08-02: 1362.1773485730325
2020-08-03: 1557.6339310374024
2020-08-04: 2112.300222754389
2020-08-05: 2875.4532550318186
2020-08-06: 1854.5458776786909
2020-08-07: 2182.694261817366
2020-08-08: 1910.1978386173812
2020-08-09: 2078.1285331500953
2020-08-10: 2461.9202165341685
2020-08-11: 2613.623868842917
2020-08-12: 2390.968930290319
2020-08-13: 2389.167878595007
2020-08-14: 2349.616289461911
2020-08-15: 2501.32109295358
2020-08-16: 2711.5102745015092
2020-08-17: 2781.9629105974223
2020-08-18: 2735.8622809188187
2020-08-19: 2726.2304715839277
2020-08-20: 2759.89402929989
2020-08-21: 2882.3502726882043
2020-08-22: 3024.618943297424
2020-08-23: 3091.284065745197
2020-08-24: 3104.9430521680224
2020-08-25: 3127.790584607837
2020-08-26: 3189.285370471774
2020-08-27: 3289.3230085017944
2020-08-28: 3394.616264569045
2020-08-29: 3463.224135736203
2020-08-30: 3506.840224960721
2020-08-31: 3556.25830648868

Predicting for U

2020-08-15: 923.2625067907852
2020-08-16: 893.9105214844808
2020-08-17: 878.5789642686393
2020-08-18: 920.3418428790806
2020-08-19: 993.7267200124531
2020-08-20: 1045.9680158925921
2020-08-21: 1063.1786481144457
2020-08-22: 1060.6615154036265
2020-08-23: 1071.3776087701874
2020-08-24: 1112.0804117291862
2020-08-25: 1166.452870127184
2020-08-26: 1209.1522544315833
2020-08-27: 1226.6968482945751
2020-08-28: 1240.4611864132405
2020-08-29: 1264.0726177286854
2020-08-30: 1303.8679432158303
2020-08-31: 1349.6474656374196

Predicting for United States__Nebraska
2020-08-01: 1330.17255044436
2020-08-02: 1322.1199137397682
2020-08-03: 1561.757866844711
2020-08-04: 1796.7303607907431
2020-08-05: 1441.8158107632257
2020-08-06: 1053.491120571913
2020-08-07: 1400.2525647863424
2020-08-08: 1559.8661182223343
2020-08-09: 1747.1691663853876
2020-08-10: 1790.3906524949634
2020-08-11: 1609.1728499882963
2020-08-12: 1517.7671608706112
2020-08-13: 1656.0801407252177
2020-08-14: 1801.0621951021838
2020-08-1

2020-08-24: 7635.678405906122
2020-08-25: 7929.449264085939
2020-08-26: 8159.91400170012
2020-08-27: 8266.863093314852
2020-08-28: 8312.406880719205
2020-08-29: 8401.029624755372
2020-08-30: 8590.676634892769
2020-08-31: 8828.0282349506

Predicting for United States__Rhode Island
2020-08-01: 861.3992657015269
2020-08-02: 492.81476625186593
2020-08-03: 1040.9401051652173
2020-08-04: 1716.6321242183683
2020-08-05: 1132.5550534642366
2020-08-06: 769.4141683982318
2020-08-07: 878.3134721092433
2020-08-08: 935.9199247907216
2020-08-09: 1290.3856644196555
2020-08-10: 1476.729932608497
2020-08-11: 1259.2171388720674
2020-08-12: 1107.225621610635
2020-08-13: 1125.3247712853997
2020-08-14: 1243.4537936549802
2020-08-15: 1441.749649447899
2020-08-16: 1512.688405030346
2020-08-17: 1431.562912301987
2020-08-18: 1370.0435340955446
2020-08-19: 1396.3016818466224
2020-08-20: 1498.1378457276817
2020-08-21: 1615.526094361959
2020-08-22: 1659.4489110353386
2020-08-23: 1637.920460596424
2020-08-24: 1626.

2020-08-16: 4365.877248797856
2020-08-17: 4208.532973818745
2020-08-18: 4249.119711921447
2020-08-19: 4524.462762315782
2020-08-20: 4766.102515740357
2020-08-21: 4860.372593584282
2020-08-22: 4833.946999951258
2020-08-23: 4805.173364397522
2020-08-24: 4896.662396735072
2020-08-25: 5090.811463699995
2020-08-26: 5266.535388344196
2020-08-27: 5353.760064036155
2020-08-28: 5383.327724093908
2020-08-29: 5425.280621687262
2020-08-30: 5535.351638624455
2020-08-31: 5693.525249111232

Predicting for United States__West Virginia
2020-08-01: 829.9526082555417
2020-08-02: 949.7696445798135
2020-08-03: 1083.4001080054704
2020-08-04: 1013.4927604187817
2020-08-05: 840.5133075166373
2020-08-06: 659.68619451273
2020-08-07: 925.8967234284755
2020-08-08: 1073.1780600937757
2020-08-09: 1149.7523590452736
2020-08-10: 1106.1417821996793
2020-08-11: 1005.3612993198888
2020-08-12: 985.7317918546298
2020-08-13: 1111.2381427417304
2020-08-14: 1218.7836023841903
2020-08-15: 1269.341895495805
2020-08-16: 1250.34

2020-08-30: 545.267706757942
2020-08-31: 567.9580662965348

Predicting for Zimbabwe__nan
2020-08-01: 140.61104827252353
2020-08-02: 138.26075091182886
2020-08-03: 142.1479517585227
2020-08-04: 193.9264194421586
2020-08-05: 190.51803659494539
2020-08-06: 164.85582825166162
2020-08-07: 207.0190616053062
2020-08-08: 222.5059409874887
2020-08-09: 243.7960895111115
2020-08-10: 268.2603264697254
2020-08-11: 272.05106499518575
2020-08-12: 277.6396207729732
2020-08-13: 300.37118689270443
2020-08-14: 320.5488824380828
2020-08-15: 342.4671955371108
2020-08-16: 360.9831927319385
2020-08-17: 372.54916600152006
2020-08-18: 386.69643914208586
2020-08-19: 406.8089204899259
2020-08-20: 428.1929171822265
2020-08-21: 449.65631056576177
2020-08-22: 468.445794442883
2020-08-23: 484.9464967849322
2020-08-24: 503.12613422239843
2020-08-25: 524.1450141885684
2020-08-26: 546.4385228839337
2020-08-27: 568.5122174499627
2020-08-28: 589.1797120673693
2020-08-29: 609.1945993836807
2020-08-30: 630.3376313269725
20

In [23]:
# Check the predictions
preds_df.head()

Unnamed: 0,CountryName,RegionName,Date,PredictedDailyNewCases
213,Aruba,,2020-08-01,68.906725
214,Aruba,,2020-08-02,78.759623
215,Aruba,,2020-08-03,80.542799
216,Aruba,,2020-08-04,87.093589
217,Aruba,,2020-08-05,93.592581
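
Beyond looking at the first rows, a few quick sanity checks can catch obvious problems before the official validation runs. The sketch below is only an illustration and is not the competition's validator: `check_predictions` is a hypothetical helper, and the only things taken from this notebook are the column names shown above and the `Country__Region` naming used in the prediction log.

```python
import pandas as pd

def check_predictions(preds_df, start_date, end_date):
    """Return a list of human-readable problems found in a predictions frame.

    Hypothetical helper for illustration; the checks are assumptions about
    what a well-formed prediction file should look like.
    """
    required = ["CountryName", "RegionName", "Date", "PredictedDailyNewCases"]
    missing = [c for c in required if c not in preds_df.columns]
    if missing:
        return [f"Missing columns: {missing}"]

    problems = []
    values = preds_df["PredictedDailyNewCases"]
    if values.isna().any():
        problems.append("NaN predictions found")
    if (values < 0).any():
        problems.append("Negative predictions found")

    # Every country/region should have one row per day in the requested range.
    expected_days = len(pd.date_range(start_date, end_date))
    geo = preds_df["CountryName"] + "__" + preds_df["RegionName"].astype(str)
    days_per_geo = preds_df.groupby(geo)["Date"].nunique()
    for name, n_days in days_per_geo[days_per_geo != expected_days].items():
        problems.append(f"{name}: {n_days} days, expected {expected_days}")
    return problems
```

Calling `check_predictions(preds_df, "2020-08-01", "2020-08-31")` returns an empty list when none of these checks fail.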


# Validation
This is how the predictor is going to be called during the competition.  
!!! PLEASE DO NOT CHANGE THE API !!!

In [24]:
!python predict.py -s 2020-08-01 -e 2020-08-04 -ip ../../../validation/data/2020-09-30_historical_ip.csv -o predictions/2020-08-01_2020-08-04.csv

Generating predictions from 2020-08-01 to 2020-08-04...
Saved predictions to predictions/2020-08-01_2020-08-04.csv
Done!


In [25]:
!head predictions/2020-08-01_2020-08-04.csv

'head' is not recognized as an internal or external command,
operable program or batch file.
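
The `head` shell command is not available in the Windows command prompt, which is why the cell above fails. A portable alternative is to read the first lines in Python instead. The helper below is a minimal sketch; the name `head` and its default of 10 lines are choices made here to mirror the Unix tool.

```python
from itertools import islice

def head(path, n=10):
    """Return the first n lines of a text file, similar to the Unix head command."""
    with open(path, encoding="utf-8") as f:
        # islice stops after n lines, so large files are not read in full.
        return [line.rstrip("\n") for line in islice(f, n)]

# Usage in this notebook (path taken from the cell above):
# print("\n".join(head("predictions/2020-08-01_2020-08-04.csv")))
```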


In [27]:
os.getcwd()

'C:\\Users\\marti\\Desktop\\Projects\\Pandemic-Prize\\covid-xprize\\covid_xprize\\examples\\predictors\\linear'

In [27]:
%cd C:/Users/marti/Desktop/Projects/Pandemic-Prize/covid-xprize

C:\Users\marti\Desktop\Projects\Pandemic-Prize\covid-xprize


# Test cases
We can generate a prediction file. Let's validate a few cases...

In [41]:
%cd ../../../../

C:\Users\marti\Desktop\Projects\Pandemic-Prize\covid-xprize


In [29]:
import os
from covid_xprize.validation.predictor_validation import validate_submission

def validate(start_date, end_date, ip_file, output_file):
    # First, delete any potential old file
    try:
        os.remove(output_file)
    except OSError:
        pass
    
    # Then generate the prediction, calling the official API
    !python predict.py -s {start_date} -e {end_date} -ip {ip_file} -o {output_file}
    
    # And validate it
    errors = validate_submission(start_date, end_date, ip_file, output_file)
    if errors:
        for error in errors:
            print(error)
    else:
        print("All good!")
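
The `!python ...` line in `validate` is IPython shell syntax, so the function only works inside a notebook. If the same check needs to run from a plain Python script, the call can be made with `subprocess` instead. The sketch below is one possible way to do that; `build_predict_command` and `run_predict` are hypothetical helper names, and only the `-s`, `-e`, `-ip` and `-o` flags come from the cells above.

```python
import subprocess
import sys

def build_predict_command(start_date, end_date, ip_file, output_file,
                          script="predict.py"):
    """Build the argument list for the predictor's command-line API."""
    return [sys.executable, script,
            "-s", start_date,
            "-e", end_date,
            "-ip", ip_file,
            "-o", output_file]

def run_predict(start_date, end_date, ip_file, output_file, script="predict.py"):
    """Run the predictor and return (return code, stdout, stderr)."""
    cmd = build_predict_command(start_date, end_date, ip_file, output_file, script)
    completed = subprocess.run(cmd, capture_output=True, text=True)
    return completed.returncode, completed.stdout, completed.stderr
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.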

## 4 days, no gap
- All countries and regions
- Official number of cases is known up to start_date
- Intervention Plans are the official ones

In [42]:
%cd covid_xprize/examples/predictors/linear

C:\Users\marti\Desktop\Projects\Pandemic-Prize\covid-xprize\covid_xprize\examples\predictors\linear


In [36]:
validate(start_date="2020-08-01",
         end_date="2020-08-04",
         ip_file="../../../validation/data/2020-09-30_historical_ip.csv",
         output_file="./predictions/val_4_days.csv")

Generating predictions from 2020-08-01 to 2020-08-04...
Saved predictions to ./predictions/val_4_days.csv
Done!
All good!


## 1 month in the future
- 2 countries only
- There is a gap between the date of the last known number of cases and start_date
- For future dates, the intervention plan file contains the scenarios for which predictions are requested. They answer the question: what will happen if we apply these plans?

In [37]:
%%time
validate(start_date="2021-01-01",
         end_date="2021-01-31",
         ip_file="../../../validation/data/future_ip.csv",
         output_file="predictions/val_1_month_future.csv")

Generating predictions from 2021-01-01 to 2021-01-31...
Saved predictions to predictions/val_1_month_future.csv
Done!
All good!
Wall time: 1.76 s


## 180 days, from a future date, all countries and regions
- Prediction start date is 1 week from now. (i.e. assuming submission date is 1 week from now)  
- Prediction end date is 6 months after start date.  
- Prediction is requested for all available countries and regions.  
- Intervention plan scenario: freeze last known intervention plans for each country and region.  

The number of cases between today and the start date is not known yet, but the model relies on it as input, so the model first has to predict the cases for that gap and then use its own predictions.  
This is the most demanding test. It should take less than 1 hour to generate the prediction file.
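
The idea of predicting through the gap can be sketched with a toy day-by-day rollout. This is a simplified illustration and not the code in `predict.py`: the function names, the feature layout (a window of past cases followed by that day's NPI values) and the weights are all assumptions made for the example.

```python
import numpy as np

def rollout(past_cases, future_npis, weights, intercept, lookback):
    """Roll a linear model forward one day at a time.

    past_cases:  1-D array of known daily cases, most recent last
    future_npis: 2-D array with one row of NPI values per day to predict
    weights:     1-D array of length lookback + number of NPIs (hypothetical)
    """
    window = np.asarray(past_cases, dtype=float)[-lookback:]
    preds = []
    for npis_today in np.asarray(future_npis, dtype=float):
        features = np.concatenate([window, npis_today])
        pred = max(float(np.dot(weights, features) + intercept), 0.0)  # no negative cases
        preds.append(pred)
        # Slide the window: drop the oldest day and append the new prediction.
        window = np.append(window[1:], pred)
    return np.array(preds)

def predict_with_gap(past_cases, npis_from_last_known, weights, intercept,
                     lookback, gap_days):
    """Predict from the day after the last known data, then drop the gap days."""
    all_preds = rollout(past_cases, npis_from_last_known, weights, intercept, lookback)
    return all_preds[gap_days:]
```

The predictions for the gap days are computed only so that later days have a full input window; they are discarded from the returned result.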

### Generate the scenario

In [38]:
from datetime import datetime, timedelta

start_date = datetime.now() + timedelta(days=7)
start_date_str = start_date.strftime('%Y-%m-%d')
end_date = start_date + timedelta(days=180)
end_date_str = end_date.strftime('%Y-%m-%d')
print(f"Start date: {start_date_str}")
print(f"End date: {end_date_str}")

Start date: 2020-12-09
End date: 2021-06-07


In [39]:
from covid_xprize.validation.scenario_generator import get_raw_data, generate_scenario, NPI_COLUMNS
DATA_FILE = 'data/OxCGRT_latest.csv'
latest_df = get_raw_data(DATA_FILE, latest=True)
scenario_df = generate_scenario(start_date_str, end_date_str, latest_df, countries=None, scenario="Freeze")
scenario_file = "predictions/180_days_future_scenario.csv"
scenario_df.to_csv(scenario_file, index=False)
print(f"Saved scenario to {scenario_file}")

Saved scenario to predictions/180_days_future_scenario.csv
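
To make the "freeze" idea concrete, the sketch below repeats each region's last known NPI values for every future date. It only illustrates the concept and is not the implementation of `generate_scenario`; `freeze_last_known` is a hypothetical helper, and it assumes a frame with `CountryName`, `RegionName`, a datetime `Date` column and the NPI columns passed in.

```python
import pandas as pd

def freeze_last_known(npi_df, npi_columns, end_date):
    """Extend each country/region to end_date by repeating its last known NPIs."""
    frames = []
    groups = npi_df.groupby(["CountryName", "RegionName"], dropna=False)
    for (country, region), group in groups:
        group = group.sort_values("Date")
        last = group.iloc[-1]
        future_dates = pd.date_range(last["Date"] + pd.Timedelta(days=1), end_date)
        if len(future_dates) == 0:
            # Nothing to extend: data already reaches end_date.
            frames.append(group)
            continue
        future = pd.DataFrame({"CountryName": country,
                               "RegionName": region,
                               "Date": future_dates})
        for col in npi_columns:
            future[col] = last[col]  # frozen at the last known value
        frames.append(pd.concat([group, future], ignore_index=True))
    return pd.concat(frames, ignore_index=True)
```

`dropna=False` keeps countries without regions, whose `RegionName` is missing, in the result.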


### Check it

In [40]:
%%time
validate(start_date=start_date_str,
         end_date=end_date_str,
         ip_file=scenario_file,
         output_file="predictions/val_6_month_future.csv")

Generating predictions from 2020-12-09 to 2021-06-07...
Saved predictions to predictions/val_6_month_future.csv
Done!
All good!
Wall time: 3min 35s
