![](../img/330-banner.png)

# Tutorial 7

UBC 2024-25

## Outline

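During this tutorial, you will practice working with time series data: parsing dates, making time-aware train/test splits, and engineering date-based and lag-based features.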

All questions can be discussed with your classmates and the TAs - this is not a graded exercise!

### Imports

In [1]:
import matplotlib.pyplot as plt
import os
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_transformer
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    TimeSeriesSplit,
    cross_val_score,
    cross_validate,
    train_test_split,
)
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler

plt.rcParams["font.size"] = 12
from datetime import datetime

DATA_DIR = os.path.join(os.path.abspath(".."), "data/")

## Time series analysis on a more complicated dataset 

For this exercise, we will use the [rain in Australia](https://www.kaggle.com/jsphyg/weather-dataset-rattle-package) dataset. Our goal is to predict whether or not it will rain tomorrow based on today's measurements.

In [2]:
rain_df = pd.read_csv(DATA_DIR + "weatherAUS.csv")
rain_df.head()

Unnamed: 0,Date,Location,MinTemp,MaxTemp,Rainfall,Evaporation,Sunshine,WindGustDir,WindGustSpeed,WindDir9am,...,Humidity9am,Humidity3pm,Pressure9am,Pressure3pm,Cloud9am,Cloud3pm,Temp9am,Temp3pm,RainToday,RainTomorrow
0,2008-12-01,Albury,13.4,22.9,0.6,,,W,44.0,W,...,71.0,22.0,1007.7,1007.1,8.0,,16.9,21.8,No,No
1,2008-12-02,Albury,7.4,25.1,0.0,,,WNW,44.0,NNW,...,44.0,25.0,1010.6,1007.8,,,17.2,24.3,No,No
2,2008-12-03,Albury,12.9,25.7,0.0,,,WSW,46.0,W,...,38.0,30.0,1007.6,1008.7,,2.0,21.0,23.2,No,No
3,2008-12-04,Albury,9.2,28.0,0.0,,,NE,24.0,SE,...,45.0,16.0,1017.6,1012.8,,,18.1,26.5,No,No
4,2008-12-05,Albury,17.5,32.3,1.0,,,W,41.0,ENE,...,82.0,33.0,1010.8,1006.0,7.0,8.0,17.8,29.7,No,No


In [3]:
rain_df.shape

(145460, 23)

**Questions of interest**

- Can we **forecast** into the future? Can we predict whether it's going to rain tomorrow?
    - The target variable is `RainTomorrow`. The target is categorical and not continuous in this case. 
- Can the date/time features help us predict the target value?


### Exploratory data analysis

We start with some basic EDA to help you familiarize yourself with the dataset. Check the results below.

In [4]:
rain_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 145460 entries, 0 to 145459
Data columns (total 23 columns):
 #   Column         Non-Null Count   Dtype  
---  ------         --------------   -----  
 0   Date           145460 non-null  object 
 1   Location       145460 non-null  object 
 2   MinTemp        143975 non-null  float64
 3   MaxTemp        144199 non-null  float64
 4   Rainfall       142199 non-null  float64
 5   Evaporation    82670 non-null   float64
 6   Sunshine       75625 non-null   float64
 7   WindGustDir    135134 non-null  object 
 8   WindGustSpeed  135197 non-null  float64
 9   WindDir9am     134894 non-null  object 
 10  WindDir3pm     141232 non-null  object 
 11  WindSpeed9am   143693 non-null  float64
 12  WindSpeed3pm   142398 non-null  float64
 13  Humidity9am    142806 non-null  float64
 14  Humidity3pm    140953 non-null  float64
 15  Pressure9am    130395 non-null  float64
 16  Pressure3pm    130432 non-null  float64
 17  Cloud9am       89572 non-null

In [5]:
rain_df.describe(include="all")

Unnamed: 0,Date,Location,MinTemp,MaxTemp,Rainfall,Evaporation,Sunshine,WindGustDir,WindGustSpeed,WindDir9am,...,Humidity9am,Humidity3pm,Pressure9am,Pressure3pm,Cloud9am,Cloud3pm,Temp9am,Temp3pm,RainToday,RainTomorrow
count,145460,145460,143975.0,144199.0,142199.0,82670.0,75625.0,135134,135197.0,134894,...,142806.0,140953.0,130395.0,130432.0,89572.0,86102.0,143693.0,141851.0,142199,142193
unique,3436,49,,,,,,16,,16,...,,,,,,,,,2,2
top,2013-11-12,Canberra,,,,,,W,,N,...,,,,,,,,,No,No
freq,49,3436,,,,,,9915,,11758,...,,,,,,,,,110319,110316
mean,,,12.194034,23.221348,2.360918,5.468232,7.611178,,40.03523,,...,68.880831,51.539116,1017.64994,1015.255889,4.447461,4.50993,16.990631,21.68339,,
std,,,6.398495,7.119049,8.47806,4.193704,3.785483,,13.607062,,...,19.029164,20.795902,7.10653,7.037414,2.887159,2.720357,6.488753,6.93665,,
min,,,-8.5,-4.8,0.0,0.0,0.0,,6.0,,...,0.0,0.0,980.5,977.1,0.0,0.0,-7.2,-5.4,,
25%,,,7.6,17.9,0.0,2.6,4.8,,31.0,,...,57.0,37.0,1012.9,1010.4,1.0,2.0,12.3,16.6,,
50%,,,12.0,22.6,0.0,4.8,8.4,,39.0,,...,70.0,52.0,1017.6,1015.2,5.0,5.0,16.7,21.1,,
75%,,,16.9,28.2,0.8,7.4,10.6,,48.0,,...,83.0,66.0,1022.4,1020.0,7.0,7.0,21.6,26.4,,


- Many columns have missing values; `Evaporation` and `Sunshine` alone are missing in more than 40% of the rows.
- Some target values are also missing (3,267 rows). Let's drop these rows.

In [6]:
rain_df = rain_df[rain_df["RainTomorrow"].notna()]
rain_df.shape

(142193, 23)

### Parsing datetimes 

- In general, datetimes are a huge pain! 
    - Think of all the formats: MM-DD-YY, DD-MM-YY, YY-MM-DD, MM/DD/YY, DD/MM/YY, DD/MM/YYYY, 20:45, 8:45am, 8:45 PM, 8:45a, 08:00, 8:10:20, ...
    - Time zones.
    - Daylight saving time.
- Thankfully, pandas does a pretty good job here.
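
To see why the format matters, here is a minimal sketch (the date string is made up for illustration) in which the same text parses to two different days depending on the format passed to `pd.to_datetime`:

```python
import pandas as pd

# A made-up, ambiguous date string: is it March 4th or April 3rd?
ambiguous = "03-04-21"

# Passing an explicit format removes the ambiguity
as_month_first = pd.to_datetime(ambiguous, format="%m-%d-%y")
as_day_first = pd.to_datetime(ambiguous, format="%d-%m-%y")

print(as_month_first.date(), as_day_first.date())
```

When the format of a column is known, stating it explicitly is usually safer than relying on inference.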

In [7]:
dates_rain = pd.to_datetime(rain_df["Date"])
dates_rain

0        2008-12-01
1        2008-12-02
2        2008-12-03
3        2008-12-04
4        2008-12-05
            ...    
145454   2017-06-20
145455   2017-06-21
145456   2017-06-22
145457   2017-06-23
145458   2017-06-24
Name: Date, Length: 142193, dtype: datetime64[ns]

Once parsed, all values share a single datetime type, so we can do arithmetic on dates and compare them:

In [8]:
dates_rain[1] - dates_rain[0] 

Timedelta('1 days 00:00:00')

In [9]:
dates_rain[1] > dates_rain[0]

True

In [10]:
(dates_rain[1] - dates_rain[0]).total_seconds()

86400.0

We can also easily extract information from the date columns. 

In [11]:
dates_rain[1]

Timestamp('2008-12-02 00:00:00')

In [12]:
dates_rain[1].month_name()

'December'

In [13]:
dates_rain[1].day_name()

'Tuesday'

In [14]:
dates_rain[1].is_year_end

False

In [15]:
dates_rain[1].is_leap_year

True
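
The calls above work on a single `Timestamp`. For a whole column, the same information is available through the `.dt` accessor. A small sketch on a made-up Series of dates:

```python
import pandas as pd

# Toy dates for illustration only (not taken from the dataset)
toy_dates = pd.Series(pd.to_datetime(["2008-12-02", "2009-01-31", "2012-02-29"]))

# The .dt accessor applies datetime properties and methods element-wise
month_names = toy_dates.dt.month_name()
weekdays = toy_dates.dt.day_name()
leap_flags = toy_dates.dt.is_leap_year

print(pd.DataFrame({"month": month_names, "weekday": weekdays, "leap": leap_flags}))
```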

Above, we converted the `Date` column explicitly with `pd.to_datetime`, and pandas inferred the date format automatically. You can also tell pandas to parse the dates when reading in the CSV:

In [16]:
rain_df = pd.read_csv(DATA_DIR + "weatherAUS.csv", parse_dates=["Date"])
rain_df.head()

Unnamed: 0,Date,Location,MinTemp,MaxTemp,Rainfall,Evaporation,Sunshine,WindGustDir,WindGustSpeed,WindDir9am,...,Humidity9am,Humidity3pm,Pressure9am,Pressure3pm,Cloud9am,Cloud3pm,Temp9am,Temp3pm,RainToday,RainTomorrow
0,2008-12-01,Albury,13.4,22.9,0.6,,,W,44.0,W,...,71.0,22.0,1007.7,1007.1,8.0,,16.9,21.8,No,No
1,2008-12-02,Albury,7.4,25.1,0.0,,,WNW,44.0,NNW,...,44.0,25.0,1010.6,1007.8,,,17.2,24.3,No,No
2,2008-12-03,Albury,12.9,25.7,0.0,,,WSW,46.0,W,...,38.0,30.0,1007.6,1008.7,,2.0,21.0,23.2,No,No
3,2008-12-04,Albury,9.2,28.0,0.0,,,NE,24.0,SE,...,45.0,16.0,1017.6,1012.8,,,18.1,26.5,No,No
4,2008-12-05,Albury,17.5,32.3,1.0,,,W,41.0,ENE,...,82.0,33.0,1010.8,1006.0,7.0,8.0,17.8,29.7,No,No


In [17]:
# Since we re-read the csv file, let's remove the missing targets again
rain_df = rain_df[rain_df["RainTomorrow"].notna()]
rain_df.shape

(142193, 23)

### <font color='red'>Question 1</font>
- How many time series are present in this dataset? 
- Are the measurements equally spaced? Use the function provided below to help you answer this question.


In [18]:
def plot_time_spacing_distribution(df, region="Adelaide"):
    """
    Plots the distribution of time spacing for a given region.
    
    Parameters:
        df (pd.DataFrame): The input DataFrame with columns 'Location' and 'Date'.
        region (str): The region (e.g., location) to analyze.
    """
    # Filter data for the given region
    region_data = df[df['Location'] == region]
    
    if region_data.empty:
        print(f"No data available for region: {region}")
        return
    
    # Calculate time differences between consecutive dates.
    # 'Date' is parsed on the fly so the caller's DataFrame is not modified.
    time_diffs = pd.to_datetime(region_data['Date']).sort_values().diff().dropna()
    
    # Count the frequency of each time difference
    value_counts = time_diffs.value_counts().sort_index()
    
    # Display value counts
    print(f"Time spacing counts for {region}:\n{value_counts}\n")
    
    # Plot the bar chart
    plt.bar(value_counts.index.astype(str), value_counts.values, color='skyblue', edgecolor='black')
    plt.title(f"Time Difference Distribution for {region}")
    plt.xlabel("Time Difference (days)")
    plt.ylabel("Frequency")
    plt.xticks(rotation=45)
    plt.grid(axis='y', linestyle='--', alpha=0.7)
    plt.show()
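
As a complement to the plot, the spacing can also be tabulated for all locations at once. The sketch below uses a made-up two-location DataFrame (not the real dataset) and assumes one time series per `Location`:

```python
import pandas as pd

# Toy data: location "B" has a 3-day gap at the end
toy_df = pd.DataFrame(
    {
        "Location": ["A", "A", "A", "B", "B", "B"],
        "Date": pd.to_datetime(
            ["2020-01-01", "2020-01-02", "2020-01-03",
             "2020-01-01", "2020-01-02", "2020-01-05"]
        ),
    }
)

# One time series per location
n_series = toy_df["Location"].nunique()

# Gap between consecutive measurements, computed within each location
gaps = (
    toy_df.sort_values(["Location", "Date"])
    .groupby("Location")["Date"]
    .diff()
    .dropna()
)
gap_counts = gaps.value_counts().sort_index()
print(n_series)
print(gap_counts)
```

If every gap were exactly one day, `gap_counts` would have a single row.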

### <font color='red'>Question 2</font>

Create train/test splits, using at least 20% of the samples for the test set. Remember that we should not call the usual `train_test_split` with shuffling: if we want to forecast, we aren't allowed to know what happened in the future!

Make sure to call the resulting dataframes `train_df` and `test_df` for the rest of the notebook to work.
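
One possible way to split by time is to pick a cutoff date and keep everything up to it for training. The sketch below uses a made-up DataFrame and an arbitrary cutoff; the `toy_` names are assumptions for illustration, and in the notebook the results would be called `train_df` and `test_df`.

```python
import pandas as pd

# Toy data: 10 consecutive days, 2 locations per day
toy_df = pd.DataFrame(
    {
        "Date": pd.date_range("2020-01-01", periods=10, freq="D").repeat(2),
        "Location": ["A", "B"] * 10,
        "RainTomorrow": ["No", "Yes"] * 10,
    }
)

# Arbitrary cutoff chosen so that at least 20% of rows fall after it
cutoff = pd.Timestamp("2020-01-07")
toy_train_df = toy_df[toy_df["Date"] <= cutoff]
toy_test_df = toy_df[toy_df["Date"] > cutoff]

print(len(toy_train_df), len(toy_test_df))
```

Every training row is dated before every test row, so the model never sees the future during training.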

### Preprocessing

We have different types of features requiring preprocessing. Let's define a preprocessor with a column transformer. 

This portion of the exercise is given to you, as it is not focused on using temporal information.

- We have missing data. 
- We have categorical features and numeric features. 
- To build a baseline, let's drop the date column and treat this as a usual supervised machine learning problem. 

In [19]:
numeric_features = [
    "MinTemp",
    "MaxTemp",
    "Rainfall",
    "Evaporation",
    "Sunshine",
    "WindGustSpeed",
    "WindSpeed9am",
    "WindSpeed3pm",
    "Humidity9am",
    "Humidity3pm",
    "Pressure9am",
    "Pressure3pm",
    "Cloud9am",
    "Cloud3pm",
    "Temp9am",
    "Temp3pm",
]
categorical_features = [
    "Location",
    "WindGustDir",
    "WindDir9am",
    "WindDir3pm",
    "RainToday",
]
drop_features = ["Date"]
target = ["RainTomorrow"]

We'll be doing feature engineering and preprocessing features several times. So I've written a function for preprocessing. 

In [20]:
def preprocess_features(
    train_df,
    test_df,
    numeric_features,
    categorical_features,
    drop_features,
    target
):

    all_features = set(numeric_features + categorical_features + drop_features + target)
    if set(train_df.columns) != all_features:
        # Report mismatches in both directions before failing
        print("Columns in train_df but not in the feature lists:", set(train_df.columns) - all_features)
        print("Listed features missing from train_df:", all_features - set(train_df.columns))
        raise Exception("Columns do not match")

    numeric_transformer = make_pipeline(
        SimpleImputer(strategy="median"), StandardScaler()
    )
    categorical_transformer = make_pipeline(
        SimpleImputer(strategy="constant", fill_value="missing"),
        OneHotEncoder(handle_unknown="ignore", sparse_output=False),
    )

    preprocessor = make_column_transformer(
        (numeric_transformer, numeric_features),
        (categorical_transformer, categorical_features),
        ("drop", drop_features),
    )
    preprocessor.fit(train_df)
    ohe_feature_names = (
        preprocessor.named_transformers_["pipeline-2"]
        .named_steps["onehotencoder"]
        .get_feature_names_out(categorical_features)
        .tolist()
    )
    new_columns = numeric_features + ohe_feature_names

    X_train_enc = pd.DataFrame(
        preprocessor.transform(train_df), index=train_df.index, columns=new_columns
    )
    X_test_enc = pd.DataFrame(
        preprocessor.transform(test_df), index=test_df.index, columns=new_columns
    )

    y_train = train_df["RainTomorrow"]
    y_test = test_df["RainTomorrow"]

    return X_train_enc, y_train, X_test_enc, y_test, preprocessor

In [21]:
X_train_enc, y_train, X_test_enc, y_test, preprocessor = preprocess_features(
    train_df,
    test_df,
    numeric_features,
    categorical_features,
    drop_features, target
)

NameError: name 'train_df' is not defined

In [None]:
# Peek at X_train_enc to see the results of preprocessing
X_train_enc.head()

Unnamed: 0,MinTemp,MaxTemp,Rainfall,Evaporation,Sunshine,WindGustSpeed,WindSpeed9am,WindSpeed3pm,Humidity9am,Humidity3pm,...,WindDir3pm_SSE,WindDir3pm_SSW,WindDir3pm_SW,WindDir3pm_W,WindDir3pm_WNW,WindDir3pm_WSW,WindDir3pm_missing,RainToday_No,RainToday_Yes,RainToday_missing
0,0.204302,-0.027112,-0.205323,-0.140641,0.160729,0.298612,0.666166,0.599894,0.115428,-1.433514,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0
1,-0.741037,0.287031,-0.275008,-0.140641,0.160729,0.298612,-1.125617,0.373275,-1.314929,-1.288002,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0
2,0.125523,0.372706,-0.275008,-0.140641,0.160729,0.450132,0.55418,0.826513,-1.632786,-1.045481,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0
3,-0.457435,0.701128,-0.275008,-0.140641,0.160729,-1.216596,-0.341712,-1.099749,-1.261953,-1.724539,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
4,0.850283,1.315134,-0.158867,-0.140641,0.160729,0.07133,-0.789657,0.146656,0.698167,-0.899969,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0


### <font color='red'>Question 3</font>

Let's treat this as a usual supervised machine learning problem and create a couple of baseline models, to get an idea of what the performance would be if we ignored the time feature.

- Fit and score a `DummyClassifier` (for this exercise, score also on the test set)
- Fit and score a `LogisticRegression` model. 
- Comment on the performance of these baselines.
- Examine the coefficients of the logistic regression model. What features have the biggest impact on the output?
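
As a rough illustration of the workflow (not a solution on the real data), the sketch below fits both baselines on synthetic features and ranks the logistic regression coefficients by magnitude. The feature names and the data-generating rule are made up.

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic data: the label depends on "humidity" only; "noise" is irrelevant
rng = np.random.default_rng(0)
X_toy = pd.DataFrame(
    {"humidity": rng.normal(size=200), "noise": rng.normal(size=200)}
)
y_toy = np.where(X_toy["humidity"] + 0.1 * rng.normal(size=200) > 0.5, "Yes", "No")

# Baseline 1: always predict the most frequent class
dummy = DummyClassifier(strategy="most_frequent").fit(X_toy, y_toy)
# Baseline 2: logistic regression
lr = LogisticRegression(max_iter=1000).fit(X_toy, y_toy)

dummy_score = dummy.score(X_toy, y_toy)
lr_score = lr.score(X_toy, y_toy)

# Coefficients sorted by absolute value (largest impact first)
coefs = pd.DataFrame({"Coef": lr.coef_.ravel()}, index=X_toy.columns).sort_values(
    "Coef", key=abs, ascending=False
)
print(dummy_score, lr_score)
print(coefs)
```

Comparing magnitudes like this is only meaningful when the features are on a similar scale, which is why the preprocessing above standardizes the numeric columns.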

### <font color='red'>Question 4</font>

- Use [`TimeSeriesSplit`](https://scikit-learn.org/stable/modules/cross_validation.html#time-series-split) to carry out cross-validation for a `LogisticRegression` model. 

Some notes before proceeding:
- Things are a bit more complicated here because this dataset has **multiple time series**, one per location. 
- Our approach today will be to ignore the fact that we have multiple time series and just one-hot encode (OHE) the location.
- We'll have multiple measurements for a given timestamp, and that's OK.
- However, `TimeSeriesSplit` expects the dataframe to be sorted by date, so let's sort it by date before trying cross-validation.

In [None]:
train_df_ordered = train_df.sort_values(by=["Date"])
y_train_ordered = train_df_ordered["RainTomorrow"]

In [None]:
train_df_ordered

Unnamed: 0,Date,Location,MinTemp,MaxTemp,Rainfall,Evaporation,Sunshine,WindGustDir,WindGustSpeed,WindDir9am,...,Humidity9am,Humidity3pm,Pressure9am,Pressure3pm,Cloud9am,Cloud3pm,Temp9am,Temp3pm,RainToday,RainTomorrow
45587,2007-11-01,Canberra,8.0,24.3,0.0,3.4,6.3,NW,30.0,SW,...,68.0,29.0,1019.7,1015.0,7.0,7.0,14.4,23.6,No,Yes
45588,2007-11-02,Canberra,14.0,26.9,3.6,4.4,9.7,ENE,39.0,E,...,80.0,36.0,1012.4,1008.4,5.0,3.0,17.5,25.7,Yes,Yes
45589,2007-11-03,Canberra,13.7,23.4,3.6,5.8,3.3,NW,85.0,N,...,82.0,69.0,1009.5,1007.2,8.0,7.0,15.4,20.2,Yes,Yes
45590,2007-11-04,Canberra,13.3,15.5,39.8,7.2,9.1,NW,54.0,WNW,...,62.0,56.0,1005.5,1007.0,2.0,7.0,13.5,14.1,Yes,Yes
45591,2007-11-05,Canberra,7.6,16.1,2.8,5.6,10.6,SSE,50.0,SSE,...,68.0,49.0,1018.3,1018.5,7.0,7.0,11.1,15.4,Yes,No
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
57415,2015-06-30,Ballarat,-0.3,10.5,0.0,,,S,26.0,,...,99.0,63.0,1029.5,1027.7,,8.0,4.7,9.3,No,No
119911,2015-06-30,PerthAirport,10.1,23.5,0.0,3.2,5.8,NNE,31.0,NE,...,48.0,33.0,1023.6,1021.7,7.0,6.0,13.3,22.2,No,No
60455,2015-06-30,Bendigo,0.3,11.4,0.0,,,W,19.0,,...,89.0,56.0,1029.3,1027.4,8.0,7.0,6.4,10.5,No,No
66473,2015-06-30,MelbourneAirport,3.2,13.2,0.0,0.8,3.9,N,20.0,N,...,91.0,50.0,1029.6,1027.3,2.0,7.0,5.3,11.9,No,No


In [None]:
# Cross-validation
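
A minimal sketch of how `TimeSeriesSplit` can be passed as the `cv` argument, using synthetic rows that are assumed to be already sorted by time (the array names are made up). It also checks the key property: in each fold, all training indices come before the validation indices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Synthetic, time-ordered data: row i is assumed to be measured before row i + 1
rng = np.random.default_rng(42)
X_sorted = rng.normal(size=(120, 3))
y_sorted = (X_sorted[:, 0] + 0.5 * rng.normal(size=120) > 0).astype(int)

tscv = TimeSeriesSplit(n_splits=5)

# For each fold, record the last training index and the first validation index
fold_bounds = [
    (train_idx.max(), valid_idx.min()) for train_idx, valid_idx in tscv.split(X_sorted)
]

# The splitter plugs into cross_val_score like any other cv object
scores = cross_val_score(LogisticRegression(max_iter=1000), X_sorted, y_sorted, cv=tscv)
print(fold_bounds)
print(scores.mean())
```

On the real data, the same `cv=tscv` argument would be used with a pipeline containing the preprocessor, applied to `train_df_ordered` and `y_train_ordered`.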

### <font color='red'>Question 5</font>

The feature `Date` is probably very useful for predicting the target (e.g. different amounts of rain in different seasons), so let's include it as a feature!

This is feature engineering!

Think of at least 3 ways to generate features from the `Date` column. Some examples to start:
- Create a column for "days since starting date".
- Use the month as a numerical feature, or one-hot encode it.
- One-hot encode the seasons (this requires converting months to seasons; also, remember that seasons in Australia are reversed relative to the Northern Hemisphere!).

After adding the new features to the dataset, re-train a `LogisticRegression` model, and see how the performance has changed. Also, observe the coefficients: are the new features important?
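
The three ideas above could look roughly like this on a made-up DataFrame. The column names `Days_since`, `Month`, and `Season` are assumptions for illustration; the season mapping follows the Southern Hemisphere convention (December to February is summer).

```python
import pandas as pd

# Toy dates for illustration only
toy = pd.DataFrame(
    {"Date": pd.to_datetime(["2008-12-01", "2009-03-15", "2009-07-04", "2009-10-20"])}
)

# Southern Hemisphere seasons by month number
SEASON_BY_MONTH = {
    12: "Summer", 1: "Summer", 2: "Summer",
    3: "Autumn", 4: "Autumn", 5: "Autumn",
    6: "Winter", 7: "Winter", 8: "Winter",
    9: "Spring", 10: "Spring", 11: "Spring",
}

first_day = toy["Date"].min()
toy = toy.assign(
    Days_since=(toy["Date"] - first_day).dt.days,  # numeric trend feature
    Month=toy["Date"].dt.month,                    # numeric or categorical
    Season=toy["Date"].dt.month.map(SEASON_BY_MONTH),
)

# One-hot encode the season
season_ohe = pd.get_dummies(toy["Season"], prefix="Season")
print(toy)
print(season_ohe)
```

In a real pipeline, `first_day` should be computed from the training set only and reused for the test set, so that both sets share the same reference point.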


### Lag-based features

Realistically, it may be helpful to know if it rained yesterday to predict if it will rain today. Let's add lagged features to our data.

We can "lag" (or "shift") a time series in pandas with the `.shift()` method. 

In [None]:
# Recreating training and test set, to start from a clean slate

train_df = rain_df.query("Date <= 20150630")
test_df = rain_df.query("Date >  20150630")

In [None]:
# Adding 1 lagged column
train_df = train_df.assign(Rainfall_lag1=train_df["Rainfall"].shift(1))

In [None]:
train_df[["Date", "Location", "Rainfall", "Rainfall_lag1"]][:20]

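|   | Date | Location | Rainfall | Rainfall_lag1 |
|---|------|----------|----------|---------------|
| 0 | 2008-12-01 | Albury | 0.6 | NaN |
| 1 | 2008-12-02 | Albury | 0.0 | 0.6 |
| 2 | 2008-12-03 | Albury | 0.0 | 0.0 |
| 3 | 2008-12-04 | Albury | 0.0 | 0.0 |
| 4 | 2008-12-05 | Albury | 1.0 | 0.0 |
| 5 | 2008-12-06 | Albury | 0.2 | 1.0 |
| 6 | 2008-12-07 | Albury | 0.0 | 0.2 |
| 7 | 2008-12-08 | Albury | 0.0 | 0.0 |
| 8 | 2008-12-09 | Albury | 0.0 | 0.0 |
| 9 | 2008-12-10 | Albury | 1.4 | 0.0 |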


**Problem!** We have multiple time series here and we need to be more careful with this. 

When we switch from one location to another, we do not want to take the value from the previous location. The function below handles this by shifting within each location separately.

In [None]:
def create_lag_feature(df, orig_feature, lag):
    """Creates a new df with a new feature that's a lagged version of the original, where lag is an int."""
    # note: pandas .shift() kind of does this for you already, but oh well I already wrote this code

    new_df = df.copy()
    new_feature_name = "%s_lag%d" % (orig_feature, lag)
    new_df[new_feature_name] = np.nan
    for location, df_location in new_df.groupby(
        "Location"
    ):  # Each location is its own time series
        new_df.loc[df_location.index[lag:], new_feature_name] = df_location.iloc[:-lag][
            orig_feature
        ].values
    return new_df
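
As the comment in the function hints, pandas can do a per-location shift directly by combining `groupby` with `.shift()`. A minimal sketch on made-up data, assuming the rows within each location are already in date order:

```python
import pandas as pd

# Toy data: two locations, rows already ordered by date within each location
toy_df = pd.DataFrame(
    {
        "Location": ["A", "A", "A", "B", "B"],
        "Rainfall": [1.0, 2.0, 3.0, 10.0, 20.0],
    }
)

# Shift within each location so values never leak across locations
toy_df["Rainfall_lag1"] = toy_df.groupby("Location")["Rainfall"].shift(1)
print(toy_df)
```

The first row of each location gets `NaN`, because there is no earlier measurement to copy from.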

### <font color='red'>Question 6</font>

- Use `create_lag_feature` to add lagged rainfall features (1 day is enough to start).
- Discuss: would it be ok to add this feature to the test set? If the answer is yes, add it to the test set too.
- Fit and score a `LogisticRegression` model on this new dataset. Compare the results to what was achieved before.
- Observe the coefficients for `Rainfall` and `Rainfall_lag1`. What is their relationship with the target?

Last, here are some more options for lagged features to check out:

- We could also create a lagged version of the target. In fact, this dataset already has that built in! `RainToday` is the lagged version of the target `RainTomorrow`.
- We could also create lagged versions of other features, or add more lags.

In [None]:
rain_df_modified = create_lag_feature(rain_df, "Rainfall", 1)
rain_df_modified = create_lag_feature(rain_df_modified, "Rainfall", 2)
rain_df_modified = create_lag_feature(rain_df_modified, "Rainfall", 3)
rain_df_modified = create_lag_feature(rain_df_modified, "Humidity3pm", 1)

In [None]:
rain_df_modified[
    [
        "Date",
        "Location",
        "Rainfall",
        "Rainfall_lag1",
        "Rainfall_lag2",
        "Rainfall_lag3",
        "Humidity3pm",
        "Humidity3pm_lag1",
    ]
].head(10)

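|   | Date | Location | Rainfall | Rainfall_lag1 | Rainfall_lag2 | Rainfall_lag3 | Humidity3pm | Humidity3pm_lag1 |
|---|------|----------|----------|---------------|---------------|---------------|-------------|------------------|
| 0 | 2008-12-01 | Albury | 0.6 | NaN | NaN | NaN | 22.0 | NaN |
| 1 | 2008-12-02 | Albury | 0.0 | 0.6 | NaN | NaN | 25.0 | 22.0 |
| 2 | 2008-12-03 | Albury | 0.0 | 0.0 | 0.6 | NaN | 30.0 | 25.0 |
| 3 | 2008-12-04 | Albury | 0.0 | 0.0 | 0.0 | 0.6 | 16.0 | 30.0 |
| 4 | 2008-12-05 | Albury | 1.0 | 0.0 | 0.0 | 0.0 | 33.0 | 16.0 |
| 5 | 2008-12-06 | Albury | 0.2 | 1.0 | 0.0 | 0.0 | 23.0 | 33.0 |
| 6 | 2008-12-07 | Albury | 0.0 | 0.2 | 1.0 | 0.0 | 19.0 | 23.0 |
| 7 | 2008-12-08 | Albury | 0.0 | 0.0 | 0.2 | 1.0 | 19.0 | 19.0 |
| 8 | 2008-12-09 | Albury | 0.0 | 0.0 | 0.0 | 0.2 | 9.0 | 19.0 |
| 9 | 2008-12-10 | Albury | 1.4 | 0.0 | 0.0 | 0.0 | 27.0 | 9.0 |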


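Note the pattern of `NaN` values: a feature with lag k is missing for the first k rows of each location, because no earlier measurement exists to copy from.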

In [None]:
train_df = rain_df_modified.query("Date <= 20150630")
test_df = rain_df_modified.query("Date >  20150630")

In [None]:
X_train_enc, y_train, X_test_enc, y_test, preprocessor = preprocess_features(
    train_df,
    test_df,
    numeric_features
    + ["Rainfall_lag1", "Rainfall_lag2", "Rainfall_lag3", "Humidity3pm_lag1"],
    categorical_features,
    drop_features,
    target
)
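
The helper `score_lr_print_coeff` called in the next cell is not defined in this tutorial. The sketch below is one hypothetical implementation that matches how it is called and what it prints; the actual helper may differ. The toy data and the `toy_` names are made up so that the block runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def score_lr_print_coeff(preprocessor, train_df, y_train, test_df, y_test, X_train_enc):
    """Hypothetical helper: fit preprocessor + LogisticRegression, print the
    train/test accuracy, and return the coefficients as a DataFrame."""
    lr_pipe = make_pipeline(preprocessor, LogisticRegression(max_iter=1000))
    lr_pipe.fit(train_df, y_train)
    print("Train score: %0.2f" % lr_pipe.score(train_df, y_train))
    print("Test score: %0.2f" % lr_pipe.score(test_df, y_test))
    return pd.DataFrame(
        data=lr_pipe.named_steps["logisticregression"].coef_.ravel(),
        index=X_train_enc.columns,  # encoded feature names
        columns=["Coef"],
    ).sort_values(by="Coef", ascending=False)


# Toy usage: the label depends only on Humidity3pm
rng = np.random.default_rng(1)
toy = pd.DataFrame(
    {"Humidity3pm": rng.normal(size=100), "Rainfall": rng.normal(size=100)}
)
toy["RainTomorrow"] = np.where(toy["Humidity3pm"] > 0, "Yes", "No")
toy_train, toy_test = toy.iloc[:80], toy.iloc[80:]

features = ["Humidity3pm", "Rainfall"]
toy_preprocessor = make_column_transformer((StandardScaler(), features))
toy_preprocessor.fit(toy_train)
toy_X_train_enc = pd.DataFrame(
    toy_preprocessor.transform(toy_train), index=toy_train.index, columns=features
)

toy_coef = score_lr_print_coeff(
    toy_preprocessor,
    toy_train,
    toy_train["RainTomorrow"],
    toy_test,
    toy_test["RainTomorrow"],
    toy_X_train_enc,
)
print(toy_coef)
```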

In [None]:
lr_coef = score_lr_print_coeff(
    preprocessor, train_df, y_train, test_df, y_test, X_train_enc
)

Train score: 0.85
Test score: 0.85


In [None]:
lr_coef.loc[
    [
        "Rainfall",
        "Rainfall_lag1",
        "Rainfall_lag2",
        "Rainfall_lag3",
        "Humidity3pm",
        "Humidity3pm_lag1",
    ]
]

| Feature | Coef |
|---------|------|
| Rainfall | 0.107917 |
| Rainfall_lag1 | 0.023105 |
| Rainfall_lag2 | 0.018434 |
| Rainfall_lag3 | 0.017829 |
| Humidity3pm | 1.278441 |
| Humidity3pm_lag1 | -0.26612 |


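Note the pattern in the magnitude of the coefficients: the `Rainfall` coefficients shrink as the lag grows (about 0.108, 0.023, 0.018, 0.018), so older measurements carry less weight. Likewise, `Humidity3pm_lag1` has a much smaller magnitude than `Humidity3pm`.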
