<a href="https://colab.research.google.com/github/mfligiel/Models-for-MLOPS-Review/blob/main/WeatherModel.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Weather Data

I am going to predict Chicago's daily maximum temperature from the weather in five other nearby cities, using data pulled from a weather API. This model isn't the most useful, but it is good for showcasing model monitoring.

In [1]:
import requests
import pandas as pd
import time

In [2]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [3]:
a = requests.get('http://www.7timer.info/bin/api.pl?lon=113.17&lat=23.09&product=astro&output=json')

In [4]:
print(a.json())

{'product': 'astro', 'init': '2021062812', 'dataseries': [{'timepoint': 3, 'cloudcover': 9, 'seeing': 6, 'transparency': 4, 'lifted_index': -4, 'rh2m': 9, 'wind10m': {'direction': 'SE', 'speed': 2}, 'temp2m': 30, 'prec_type': 'none'}, {'timepoint': 6, 'cloudcover': 9, 'seeing': 6, 'transparency': 4, 'lifted_index': -4, 'rh2m': 10, 'wind10m': {'direction': 'S', 'speed': 2}, 'temp2m': 29, 'prec_type': 'none'}, {'timepoint': 9, 'cloudcover': 9, 'seeing': 7, 'transparency': 5, 'lifted_index': -4, 'rh2m': 11, 'wind10m': {'direction': 'S', 'speed': 2}, 'temp2m': 29, 'prec_type': 'none'}, {'timepoint': 12, 'cloudcover': 9, 'seeing': 6, 'transparency': 4, 'lifted_index': -4, 'rh2m': 10, 'wind10m': {'direction': 'S', 'speed': 3}, 'temp2m': 30, 'prec_type': 'rain'}, {'timepoint': 15, 'cloudcover': 9, 'seeing': 6, 'transparency': 4, 'lifted_index': -6, 'rh2m': 8, 'wind10m': {'direction': 'S', 'speed': 3}, 'temp2m': 34, 'prec_type': 'rain'}, {'timepoint': 18, 'cloudcover': 9, 'seeing': 6, 'transpa

OK, this API works. What I need to do is get the latitude/longitude of five cities and pull this data over time.

Let me try another API to see how it does:

In [5]:
b = requests.get('https://www.metaweather.com/api/location/44418/2013/4/27/')

In [6]:
print(b.json())

[{'id': 366945, 'weather_state_name': 'Light Rain', 'weather_state_abbr': 'lr', 'wind_direction_compass': 'N', 'created': '2013-04-27T22:52:57.403100Z', 'applicable_date': '2013-04-27', 'min_temp': 3.07, 'max_temp': 10.01, 'the_temp': None, 'wind_speed': 9.85, 'wind_direction': 358.0, 'air_pressure': None, 'humidity': 74, 'visibility': 9.997862483098704, 'predictability': 75}, {'id': 373220, 'weather_state_name': 'Light Rain', 'weather_state_abbr': 'lr', 'wind_direction_compass': 'N', 'created': '2013-04-27T20:52:55.929470Z', 'applicable_date': '2013-04-27', 'min_temp': 3.07, 'max_temp': 10.01, 'the_temp': None, 'wind_speed': 9.85, 'wind_direction': 358.0, 'air_pressure': None, 'humidity': 74, 'visibility': 9.997862483098704, 'predictability': 75}, {'id': 371006, 'weather_state_name': 'Clear', 'weather_state_abbr': 'c', 'wind_direction_compass': 'NNE', 'created': '2013-04-27T18:52:50.537450Z', 'applicable_date': '2013-04-27', 'min_temp': 4.0, 'max_temp': None, 'the_temp': None, 'wind_s

This seems to work! I need the Where On Earth IDs (WOEIDs) for each place, and then the maximum temperature for each day. I don't want to make a gazillion requests to them, so I'll slow things down with a 45-second sleep between requests.
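The request plan above (one call per WOEID per day, with a pause between calls) could be wrapped in a small helper. This is a minimal sketch, not the code used in this notebook: `day_url` and `fetch_day` are hypothetical names, the 45-second default mirrors the `time.sleep(45)` in the collection loop, and the `timeout` and `raise_for_status()` check are my own additions. The `session` argument is assumed to be anything with a `requests`-style `.get()`, such as a `requests.Session`.

```python
import time

# Per-day MetaWeather endpoint, in the same shape as the request above:
# .../api/location/<woeid>/<year>/<month>/<day>/
BASE_URL = "https://www.metaweather.com/api/location"


def day_url(woeid, year, month, day):
    """Build the URL for one location (WOEID) and one calendar day."""
    return f"{BASE_URL}/{woeid}/{year}/{month}/{day}/"


def fetch_day(session, woeid, year, month, day, pause_s=45.0):
    """Fetch one day's forecast records, then pause before returning.

    `session` is assumed to expose a requests-style .get(url, timeout=...).
    """
    resp = session.get(day_url(woeid, year, month, day), timeout=30)
    resp.raise_for_status()  # fail loudly instead of parsing an error page
    records = resp.json()
    time.sleep(pause_s)  # space out consecutive calls to the API
    return records
```

With `requests` installed, `fetch_day(requests.Session(), '2379574', 2021, 3, 1)` would request Chicago's records for 1 March 2021.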

I also need to decide what to do with the JSON it returns: I'll collapse it into a table and then take the maximum temperature for each day.

In [7]:
pd.DataFrame(b.json()).max()

id                                             373220
weather_state_name                            Showers
weather_state_abbr                                  s
wind_direction_compass                            WSW
created                   2013-04-27T22:52:57.403100Z
applicable_date                            2013-04-27
min_temp                                         5.21
max_temp                                         11.6
the_temp                                        12.76
wind_speed                                         12
wind_direction                                    358
air_pressure                                     1017
humidity                                           76
visibility                                    18.9655
predictability                                     75
dtype: object

In [8]:
pd.to_datetime(pd.DataFrame(b.json()).max()['created']).date()

datetime.date(2013, 4, 27)

In [9]:
pd.DataFrame(b.json()).max()['max_temp']

11.6
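One subtlety in the cells above: `DataFrame.max()` takes the maximum of each column independently, so the `created` value and the `max_temp` value can come from different records. Also, `created` is the UTC time a forecast was generated rather than the day it describes, so a late-evening forecast can be stamped with the following date (this may be why the saved CSVs further down start on 2021-03-02). A minimal alternative, assuming each record carries an `applicable_date` field as in the response printed above, reads the date from that field instead. `summarize_day` is a hypothetical helper, and the sample records are a trimmed illustration, not real API output.

```python
import pandas as pd


def summarize_day(records):
    """Collapse one day's list of forecast records into (date, max_temp)."""
    frame = pd.DataFrame(records)
    # Use the date the forecasts apply to, not when they were created.
    day = pd.to_datetime(frame["applicable_date"]).dt.date.max()
    # Coerce None to NaN; Series.max() skips NaN by default.
    max_temp = pd.to_numeric(frame["max_temp"], errors="coerce").max()
    return day, float(max_temp)


# Trimmed, illustrative records shaped like the MetaWeather response above.
sample = [
    {"applicable_date": "2013-04-27", "max_temp": 10.01},
    {"applicable_date": "2013-04-27", "max_temp": None},
    {"applicable_date": "2013-04-27", "max_temp": 11.6},
]
print(summarize_day(sample))  # (datetime.date(2013, 4, 27), 11.6)
```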

This should work! Next, I'll find the IDs of the five cities I will use to predict Chicago's weather:

Milwaukee\
Detroit\
Toronto\
St. Louis\
Omaha, NE


I'll use this site to look it up: https://www.findmecity.com/

Milwaukee: 2451822\
Detroit: 2391585 \
Toronto: 4118\
St. Louis: 2486982\
Omaha, NE: 2465512


In [11]:
# dictionary of cities and their Where On Earth IDs (WOEIDs)
cities = {'Milwaukee': '2451822', 'Detroit': '2391585', 'Toronto': '4118',
          'St. Louis': '2486982', 'Omaha': '2465512', 'Chicago': '2379574'}

# empty list to collect [city, date, max temp] rows
values = []

# loop through cities
for k, v in cities.items():
  # pull one city per run (change the name here to pull a different city)
  if k != 'Chicago':
    continue
  # loop through March, April and May
  for mth in ['3', '4', '5']:
    # only request days 1-30; this isn't treated as a time series, so skipping the 31st is fine
    for day in range(1, 31):
      # URL for this city and day
      strng = f'https://www.metaweather.com/api/location/{v}/2021/{mth}/{day}/'
      if day == 1:
        print(strng)
      reqst = requests.get(strng)
      # collapse the JSON into a table once, then take the column-wise maxima
      summary = pd.DataFrame(reqst.json()).max()
      date = pd.to_datetime(summary['created']).date()
      maxtemp = summary['max_temp']
      values.append([k, date, maxtemp])
      # wait between requests so the API isn't flooded
      time.sleep(45)





https://www.metaweather.com/api/location/2379574/2021/3/1/
https://www.metaweather.com/api/location/2379574/2021/4/1/
https://www.metaweather.com/api/location/2379574/2021/5/1/


In [12]:
import pickle

#pickle.dump(values, open('.pkl', 'wb'))

In [13]:
pd.DataFrame(values).to_csv('Chicago.csv')
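Because `to_csv` writes the DataFrame index by default and these columns have no names, reading the file back yields an extra `Unnamed: 0` column plus headers `0`, `1`, `2`, which is exactly what the loading step later has to clean up. The sketch below compares the default with named columns and `index=False`. The column names are my own choice, `roundtrip_columns` is a hypothetical helper, and the rows are illustrative; note that the later renaming cell assumes the original four-column layout, so it would need to drop its `'drp'` entry if the files were written this way.

```python
import io

import pandas as pd


def roundtrip_columns(frame, **to_csv_kwargs):
    """Write `frame` to an in-memory CSV, read it back, return the column names."""
    buf = io.StringIO()  # stands in for a file such as Chicago.csv
    frame.to_csv(buf, **to_csv_kwargs)
    buf.seek(0)
    return list(pd.read_csv(buf).columns)


# Illustrative rows in the same [city, date, maxtemp] shape as `values`.
rows = [["Chicago", "2021-03-02", 5.0], ["Chicago", "2021-03-03", 7.5]]

# Default: index written, unnamed columns become '0', '1', '2'.
print(roundtrip_columns(pd.DataFrame(rows)))  # ['Unnamed: 0', '0', '1', '2']

# Named columns and no index: the file reads back cleanly.
named = pd.DataFrame(rows, columns=["city", "date", "maxtemp"])
print(roundtrip_columns(named, index=False))  # ['city', 'date', 'maxtemp']
```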

In [14]:
!ls

Chicago.csv  gdrive  sample_data


In [15]:
!cp Chicago.csv gdrive/MyDrive

Now that the data is pulled in, I can start on a basic model. Since the point is mostly to have something to track, I am fine with just fitting an SVM and doing minimal hyperparameter optimization.

To do this, I will
- load the files
- pivot them so each city becomes its own column
- drop the date column (I am ignoring the time-series aspect here)
- run a quick grid search
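The last step, a quick grid search over an SVM, might look something like the sketch below. It is not the notebook's code: I'm assuming scikit-learn's `SVR` (the regression form of an SVM, since the target is a temperature), feature scaling inside a pipeline, a deliberately small grid over `C` and `epsilon`, and mean absolute error as the score. `fit_svr` is a hypothetical name; it expects the pivoted table with one column per city and Chicago as the target, and it "drops the date" simply by not using the index as a feature. The data in the example is random toy data, only there to show the call.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def fit_svr(wide, target="Chicago", cv=5):
    """Grid-search an SVR that predicts `target` from the other city columns."""
    wide = wide.dropna()  # SVR cannot handle missing temperatures
    X = wide.drop(columns=target).to_numpy()  # the date index is not a feature
    y = wide[target].to_numpy()
    pipe = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
    # make_pipeline names the SVR step "svr", hence the "svr__" prefixes.
    grid = {"svr__C": [0.1, 1, 10, 100], "svr__epsilon": [0.1, 0.5, 1.0]}
    search = GridSearchCV(pipe, grid, cv=cv, scoring="neg_mean_absolute_error")
    search.fit(X, y)
    return search


# Toy data only; the real input is the pivoted city table.
rng = np.random.default_rng(0)
cols = ["Detroit", "Milwaukee", "Omaha", "St. Louis", "Toronto"]
toy = pd.DataFrame(rng.normal(15, 6, size=(60, 5)), columns=cols)
toy["Chicago"] = toy.mean(axis=1) + rng.normal(0, 1, size=60)
search = fit_svr(toy, cv=3)
print(search.best_params_)
```

Scaling sits inside the pipeline so each cross-validation fold is scaled using only its own training data.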

In [9]:
!ls gdrive/MyDrive/ModelMonitoringBlog/

 Chicago.csv
 Detroit.csv
'First take on Evidently, and potential Datasets, MF 6.18.21.gdoc'
 Milwaukee.csv
'Model Code'
'Notes, 6.23.2021.gdoc'
 Omaha.csv
 St_Louis.csv
'Table of Contents.gdoc'
 Toronto.csv


In [33]:
# re-creating the dictionary from above
cities = {'Milwaukee':'2451822', 'Detroit':'2391585', 'Toronto':'4118', 'St. Louis':'2486982', 'Omaha':'2465512', 'Chicago':'2379574'}

df = pd.DataFrame()

for i in cities.keys():
  if i == 'St. Louis':
    i = 'St_Louis'
  pth = "gdrive/MyDrive/ModelMonitoringBlog/" + i + ".csv"
  print(pth)
  to_append = pd.read_csv(pth)
  print(to_append.head())
  if df.empty:
    df = to_append
    print(df.empty)
  else:
    df = pd.concat([df, to_append], ignore_index=True)
  


gdrive/MyDrive/ModelMonitoringBlog/Milwaukee.csv
   Unnamed: 0          0           1      2
0           0  Milwaukee  2021-03-02  3.505
1           1  Milwaukee  2021-03-03  7.490
2           2  Milwaukee  2021-03-04  9.355
3           3  Milwaukee  2021-03-05  9.935
4           4  Milwaukee  2021-03-06  9.345
False
gdrive/MyDrive/ModelMonitoringBlog/Detroit.csv
   Unnamed: 0        0           1       2
0           0  Detroit  2021-03-02   8.360
1           1  Detroit  2021-03-03   6.820
2           2  Detroit  2021-03-04  12.215
3           3  Detroit  2021-03-05  11.255
4           4  Detroit  2021-03-06  10.325
gdrive/MyDrive/ModelMonitoringBlog/Toronto.csv
   Unnamed: 0        0           1      2
0           0  Toronto  2021-03-02  6.035
1           1  Toronto  2021-03-03  3.360
2           2  Toronto  2021-03-04  7.225
3           3  Toronto  2021-03-05  7.525
4           4  Toronto  2021-03-06  7.010
gdrive/MyDrive/ModelMonitoringBlog/St_Louis.csv
   Unnamed: 0          0     
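An equivalent, more compact way to build the combined table is to read every file into a list and call `pd.concat` once, instead of growing `df` inside the loop. This is a sketch assuming the same folder layout and file names as above (with `St. Louis` saved as `St_Louis.csv`); `load_city_files` is a hypothetical helper name.

```python
import os

import pandas as pd


def load_city_files(folder, city_names):
    """Read <folder>/<city>.csv for each city and stack them into one frame."""
    frames = []
    for name in city_names:
        # File names use an underscore in place of '. ' (e.g. St_Louis.csv).
        fname = name.replace(". ", "_") + ".csv"
        frames.append(pd.read_csv(os.path.join(folder, fname)))
    # A single concat at the end; ignore_index gives a fresh 0..n-1 index.
    return pd.concat(frames, ignore_index=True)
```

Usage would be `df = load_city_files("gdrive/MyDrive/ModelMonitoringBlog", cities.keys())`.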

In [34]:
df

     Unnamed: 0          0           1       2
0             0  Milwaukee  2021-03-02   3.505
1             1  Milwaukee  2021-03-03   7.490
2             2  Milwaukee  2021-03-04   9.355
3             3  Milwaukee  2021-03-05   9.935
4             4  Milwaukee  2021-03-06   9.345
..          ...        ...         ...     ...
535          85    Chicago  2021-05-27  28.900
536          86    Chicago  2021-05-28  23.165
537          87    Chicago  2021-05-29  23.900
538          88    Chicago  2021-05-30  28.300


In [35]:
# Rename the columns; 'drp' is the leftover index column written by to_csv, so drop it
df.columns = ['drp', 'city', 'date', 'maxtemp']
df.drop('drp', axis=1, inplace=True)

In [38]:
df = df.pivot(index='date', columns='city', values='maxtemp')
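`pivot` turns the long table (one row per city and date) into a wide one with a date index and one column per city. Any date missing for some city becomes `NaN`, so those rows need to be dropped or filled before modelling, and `pivot` raises an error if a city has two rows for the same date. Once the date is only the index, dropping the date amounts to not using the index as a feature. A minimal sketch on a toy long table (the values are made up for illustration):

```python
import pandas as pd

# Toy long-format rows: city, date, maxtemp (illustrative values only).
long_df = pd.DataFrame(
    {
        "city": ["Chicago", "Detroit", "Chicago", "Detroit", "Detroit"],
        "date": ["2021-03-02", "2021-03-02", "2021-03-03", "2021-03-03", "2021-03-04"],
        "maxtemp": [5.0, 8.4, 7.1, 6.8, 12.2],
    }
)

wide = long_df.pivot(index="date", columns="city", values="maxtemp")
print(wide.isna().sum().to_dict())  # {'Chicago': 1, 'Detroit': 0}

complete = wide.dropna()              # keep dates present for every city
X = complete.drop(columns="Chicago")  # features: the other cities
y = complete["Chicago"]               # target: Chicago's max temperature
print(X.shape, y.shape)  # (2, 1) (2,)
```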