# Taiwan Weatherman: A 7-Day Forecasting Bot

This program provides a clean, weekly weather forecast. The results are written to a newly opened, unsaved Excel workbook: for each of three cities (台北市, 台中市, 新竹市, i.e. Taipei, Taichung and Hsinchu), it shows the daily high and low temperatures along with a line chart. Here is the outline of the program:

- Step 1: Fetch the weekly forecast from an API (JSON)
- Step 2: Process the data
- Step 3: Write the forecast to an Excel file

The data structure we build looks like this:
```python
forecasts = [
    [...],  # Taipei: one dict per day (day 1, day 2, day 3, ...)
    [...],  # Taichung
    [...],  # Hsinchu
]
```
Each day is represented by a dictionary:
```python
{
    "Date": "2/16",
    "High": 23,
    "Low": 16,
    "City": "臺中市",
    "Description": "多雲時陰短暫陣雨",  # cloudy, at times overcast, with brief showers
}
```

My biggest takeaway from this project was learning to use the data-analysis package pandas. Although I'm still not comfortable modifying DataFrames, I'm very grateful to my teacher for these two weeks. :)

## Step 1: Fetch the weekly forecast from an API (JSON)
* [Python’s Requests Library (Guide)](https://realpython.com/python-requests/)

There are many ways to get weather data, and the most accurate source is the API of the Central Weather Bureau (中央氣象局). Here we take a shortcut and use the Taipei City Government's open-data API, which needs no preparation steps. The trade-off is that its data is harder to work with (I suppose, lol).

```python
import requests
from requests.exceptions import HTTPError

url = "https://data.taipei/opendata/datalist/apiAccess?scope=resourceAquire&rid=e6831708-02b4-4ef8-98fa-4b4ce53459d9"

try:
    response = requests.get(url, timeout=10)
    # Raise an HTTPError for 4xx/5xx responses before trying to parse the body
    response.raise_for_status()
    full_data = response.json()
except HTTPError as http_err:
    print(f'HTTP error occurred: {http_err}')
except Exception as err:
    print(f'Other error occurred: {err}')
else:
    print('Success!')
```

```
Success!
```


## Step 2: Process the data
We don't need the weather for every city, so we keep only the ones we care about.

```python
locs = ['臺北市', '臺中市', '新竹市']
forecasts = []
data = full_data["result"]["results"]

for loc in locs:
    row = []
    for dataset in data:
        if dataset['locationName'] == loc:
            row.append(dataset)
    forecasts.append(row)
forecasts
```

```
[[{'parameterName2': '22',
   'parameterUnit2': 'C',
   'parameterName1': '晴時多雲',
   'parameterUnit3': 'C',
   'parameterName3': '18',
   'parameterValue1': '2',
   'locationName': '臺北市',
   'endTime': '2020-02-19T18:00:00+08:00',
   'startTime': '2020-02-19T12:00:00+08:00',
   '_id': 1},
  {'parameterName2': '18',
   'parameterUnit2': 'C',
   'parameterName1': '晴時多雲',
   'parameterUnit3': 'C',
   'parameterName3': '15',
   'parameterValue1': '2',
   'locationName': '臺北市',
   'endTime': '2020-02-20T06:00:00+08:00',
   'startTime': '2020-02-19T18:00:00+08:00',
   '_id': 2},
  {'parameterName2': '23',
   'parameterUnit2': 'C',
   'parameterName1': '多雲',
   'parameterUnit3': 'C',
   'parameterName3': '15',
   'parameterValue1': '4',
   'locationName': '臺北市',
   'endTime': '2020-02-20T18:00:00+08:00',
   'startTime': '2020-02-20T06:00:00+08:00',
   '_id': 3},
  {'parameterName2': '20',
   'parameterUnit2': 'C',
   'parameterName1': '多雲時晴',
   'parameterUnit3': 'C',
   'parameterName3': '16
...
```
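The nested loop above scans the whole `data` list once per city. As an alternative sketch (not the author's code), the same filtering can be done in a single pass by bucketing records in a dict keyed by city name. The four records below are trimmed-down stand-ins that keep only `locationName` and one temperature field.

```python
# Trimmed-down stand-ins for API records (city name plus one temperature field)
data = [
    {'locationName': '臺北市', 'parameterName2': '22'},
    {'locationName': '連江縣', 'parameterName2': '13'},
    {'locationName': '臺中市', 'parameterName2': '23'},
    {'locationName': '臺北市', 'parameterName2': '18'},
]
locs = ['臺北市', '臺中市', '新竹市']

# One empty bucket per city we care about
by_city = {loc: [] for loc in locs}

# Single pass over the data: records of other cities are skipped
for record in data:
    if record['locationName'] in by_city:
        by_city[record['locationName']].append(record)

# Same nested-list shape as `forecasts`, in the order of `locs`
forecasts = [by_city[loc] for loc in locs]
```

A city with no records (here 新竹市) simply ends up with an empty list.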

Next, clean the data.
1. Select the columns we need:
```python
'parameterName2', 'parameterName3', 'parameterName1', 'locationName', 'startTime'
```
2. Rename the columns, since the original names are ambiguous:
```python
'High (最高溫)', 'Low (最低溫)', 'Description (天氣概況)', 'City (縣市)', 'Date (起始日期時間)'
```
> Note: the next cell triggers pandas' `SettingWithCopyWarning` ("A value is trying to be set on a copy of a slice from a DataFrame").
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

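```python
# Using pandas
import pandas as pd

df = pd.DataFrame(data)
df2 = df[['parameterName2', 'parameterName3', 'parameterName1', 'locationName', 'startTime']]
df2.rename(columns={'parameterName2': 'High',
                    'parameterName3': 'Low',
                    'parameterName1': 'Description',
                    'locationName':   'City',
                    'startTime':      'Date'},
           inplace=True)
# The API returns temperatures as strings such as '22'; convert them so that
# max/min later compare numbers instead of text
df2 = df2.astype({'High': int, 'Low': int})
df2
```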

```
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  return super().rename(**kwargs)
```


| | High | Low | Description | City | Date |
|---|---|---|---|---|---|
| 0 | 22 | 18 | 晴時多雲 | 臺北市 | 2020-02-19T12:00:00+08:00 |
| 1 | 18 | 15 | 晴時多雲 | 臺北市 | 2020-02-19T18:00:00+08:00 |
| 2 | 23 | 15 | 多雲 | 臺北市 | 2020-02-20T06:00:00+08:00 |
| 3 | 20 | 16 | 多雲時晴 | 臺北市 | 2020-02-20T18:00:00+08:00 |
| 4 | 26 | 16 | 晴時多雲 | 臺北市 | 2020-02-21T06:00:00+08:00 |
| ... | ... | ... | ... | ... | ... |
| 303 | 13 | 12 | 多雲時晴 | 連江縣 | 2020-02-23T18:00:00+08:00 |
| 304 | 16 | 12 | 晴時多雲 | 連江縣 | 2020-02-24T06:00:00+08:00 |
| 305 | 14 | 13 | 晴時多雲 | 連江縣 | 2020-02-24T18:00:00+08:00 |
| 306 | 17 | 13 | 晴時多雲 | 連江縣 | 2020-02-25T06:00:00+08:00 |
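The warning in the cell above most likely comes from calling `rename(..., inplace=True)` on `df[[...]]`, which pandas treats as a possible slice of `df`. One common way to avoid it, offered as a suggestion rather than the author's code, is to chain `rename` without `inplace` so every step returns a fresh DataFrame. The two records are stand-ins copied from the output above, and the `astype` call is an addition, since the API returns temperatures as strings.

```python
import pandas as pd

# Two stand-in records in the same shape as the API results shown above
data = [
    {'parameterName2': '22', 'parameterName3': '18', 'parameterName1': '晴時多雲',
     'locationName': '臺北市', 'startTime': '2020-02-19T12:00:00+08:00'},
    {'parameterName2': '18', 'parameterName3': '15', 'parameterName1': '晴時多雲',
     'locationName': '臺北市', 'startTime': '2020-02-19T18:00:00+08:00'},
]
df = pd.DataFrame(data)

# Select, rename and convert in one chain; nothing is modified in place,
# so no assignment targets a possible slice of `df`
df2 = (
    df[['parameterName2', 'parameterName3', 'parameterName1', 'locationName', 'startTime']]
    .rename(columns={'parameterName2': 'High',
                     'parameterName3': 'Low',
                     'parameterName1': 'Description',
                     'locationName': 'City',
                     'startTime': 'Date'})
    .astype({'High': int, 'Low': int})  # '22' -> 22
)
```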


Divide the big DataFrame into smaller DataFrames by **city** and store them in a list.

```python
# Select the rows of each of the 3 cities
# https://datatofish.com/select-rows-pandas-dataframe/
df_list = []
for city in locs:
    df_list.append(df2.loc[df2['City'] == city])

# test forecast in Taichung
df_list[1]
```

| | High | Low | Description | City | Date |
|---|---|---|---|---|---|
| 42 | 23 | 19 | 晴時多雲 | 臺中市 | 2020-02-19T12:00:00+08:00 |
| 43 | 19 | 14 | 晴時多雲 | 臺中市 | 2020-02-19T18:00:00+08:00 |
| 44 | 25 | 14 | 晴時多雲 | 臺中市 | 2020-02-20T06:00:00+08:00 |
| 45 | 21 | 16 | 晴時多雲 | 臺中市 | 2020-02-20T18:00:00+08:00 |
| 46 | 25 | 16 | 晴時多雲 | 臺中市 | 2020-02-21T06:00:00+08:00 |
| 47 | 21 | 16 | 晴時多雲 | 臺中市 | 2020-02-21T18:00:00+08:00 |
| 48 | 25 | 16 | 晴時多雲 | 臺中市 | 2020-02-22T06:00:00+08:00 |
| 49 | 22 | 18 | 晴時多雲 | 臺中市 | 2020-02-22T18:00:00+08:00 |
| 50 | 27 | 18 | 晴時多雲 | 臺中市 | 2020-02-23T06:00:00+08:00 |
| 51 | 23 | 19 | 晴時多雲 | 臺中市 | 2020-02-23T18:00:00+08:00 |
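The loop above filters `df2` once per city. As an alternative sketch (not the author's code), `DataFrame.groupby` can split the frame in one pass, and `get_group` then returns each city's rows. The small frame below is a stand-in built from a few rows of the outputs above, so only two cities are looked up.

```python
import pandas as pd

# Stand-in for df2: a few rows taken from the outputs above
df2 = pd.DataFrame({
    'High': [22, 18, 23, 13],
    'Low': [18, 15, 19, 12],
    'City': ['臺北市', '臺北市', '臺中市', '連江縣'],
})
locs = ['臺北市', '臺中市']

# Split once by City, then pick out the cities we care about, in order
groups = df2.groupby('City')
df_list = [groups.get_group(city) for city in locs]
```

Note that `get_group` raises a `KeyError` if a city has no rows, whereas the boolean-mask loop returns an empty DataFrame.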


The data looks clean, but one problem remains: the **Date** column is a string that also contains a time-zone offset.
Let's write a function that parses the string into a `datetime`.
> PS: The API is a free service from the Taipei City Government, so its data is rather hard to deal with.

Sample input and output:
```python
parse_date('2020-02-16T06:00:00+08:00')
# → datetime.datetime(2020, 2, 16, 6, 0)
```

```python
# Note: since Python 3.7, strptime's %z also accepts offsets like '+08:00';
# here we simply drop the offset and keep the local time.
from datetime import datetime

def parse_date(in_str):
    date_str, time_str = in_str.split('T')  # split at the 'T' separator
    time_str = time_str.split('+')[0]       # strip the time-zone offset, including '+'
    date = datetime.strptime(f"{date_str} {time_str}", '%Y-%m-%d %H:%M:%S')
    return date
```
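Since Python 3.7, `datetime.fromisoformat` accepts strings like `'2020-02-16T06:00:00+08:00'` directly, offset included. Here is a sketch of an alternative to `parse_date` (the name `parse_date_iso` is hypothetical) that drops the time-zone info afterwards, so the result matches the original function's naive datetime:

```python
from datetime import datetime

def parse_date_iso(in_str):
    # fromisoformat understands the 'T' separator and the '+08:00' offset (Python 3.7+)
    aware = datetime.fromisoformat(in_str)
    # Drop the offset to get a naive local time, like the original parse_date
    return aware.replace(tzinfo=None)

print(parse_date_iso('2020-02-16T06:00:00+08:00'))  # 2020-02-16 06:00:00
```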

Looking at the timestamps, each day has two records: a daytime forecast and a nighttime forecast. To draw the high/low temperature chart (高溫低溫圖), we need only one record per day.
1. Use a lambda function to keep the **date** and drop the **time**.
2. Group the rows by date, taking the highest temperature for 'High' and the lowest for 'Low'.
3. We still want the other columns, so take the **first** value of 'City' and 'Description'.
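For example, these rows:

| Id | High | Low | Description | City | Date |
|---|---|---|---|---|---|
| 42 | 16 | 12 | 多雲時晴 | 臺中市 | 2020-02-17 06:00:00 |
| 43 | 13 | 10 | 晴時多雲 | 臺中市 | 2020-02-17 18:00:00 |
| 44 | 19 | 10 | 晴時多雲 | 臺中市 | 2020-02-18 06:00:00 |
| 45 | 16 | 12 | 晴時多雲 | 臺中市 | 2020-02-18 18:00:00 |

become:

| Date | High | Low | City | Description |
|---|---|---|---|---|
| 2020-02-17 | 16 | 10 | 臺中市 | 多雲時晴 |
| 2020-02-18 | 19 | 10 | 臺中市 | 晴時多雲 |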
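The grouping rule can be checked on the four example rows. A minimal sketch, assuming the Date column already holds `datetime.date` values (step 1), uses pandas named aggregation, which is a slightly different spelling of the dict-based `agg` used later in this notebook; the rows are copied from the example table.

```python
import datetime
import pandas as pd

# The four example rows, with Date already reduced to a date
df = pd.DataFrame({
    'High': [16, 13, 19, 16],
    'Low': [12, 10, 10, 12],
    'Description': ['多雲時晴', '晴時多雲', '晴時多雲', '晴時多雲'],
    'City': ['臺中市'] * 4,
    'Date': [datetime.date(2020, 2, 17), datetime.date(2020, 2, 17),
             datetime.date(2020, 2, 18), datetime.date(2020, 2, 18)],
})

# One row per date: hottest High, coldest Low, first City/Description of the day
daily = df.groupby('Date').agg(
    High=('High', 'max'),
    Low=('Low', 'min'),
    City=('City', 'first'),
    Description=('Description', 'first'),
)
print(daily)
```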
The next cell raises a [SettingWithCopyWarning](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy) as well. I couldn't solve it. Help!

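```python
# Parse the 'Date' column
# Ignore day vs. night; keep only the forecast date

# These always throw a warning 0.0; I don't understand DataFrames well yet
# adj_series = df_list[1]['Date'].apply(lambda s: parse_date(s))
# df_list[1].loc[:, ('Date')] = adj_series

for df in df_list:
    # Replace each timestamp string with its date, e.g. datetime.date(2020, 2, 19)
    df.loc[:, 'Date'] = df['Date'].apply(lambda s: parse_date(s).date())

# test forecast in Taichung
df_list[1]
```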

```
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.obj[item_labels[indexer[info_axis]]] = value
```


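| | High | Low | Description | City | Date |
|---|---|---|---|---|---|
| 42 | 23 | 19 | 晴時多雲 | 臺中市 | 2020-02-19 |
| 43 | 19 | 14 | 晴時多雲 | 臺中市 | 2020-02-19 |
| 44 | 25 | 14 | 晴時多雲 | 臺中市 | 2020-02-20 |
| 45 | 21 | 16 | 晴時多雲 | 臺中市 | 2020-02-20 |
| 46 | 25 | 16 | 晴時多雲 | 臺中市 | 2020-02-21 |
| 47 | 21 | 16 | 晴時多雲 | 臺中市 | 2020-02-21 |
| 48 | 25 | 16 | 晴時多雲 | 臺中市 | 2020-02-22 |
| 49 | 22 | 18 | 晴時多雲 | 臺中市 | 2020-02-22 |
| 50 | 27 | 18 | 晴時多雲 | 臺中市 | 2020-02-23 |
| 51 | 23 | 19 | 晴時多雲 | 臺中市 | 2020-02-23 |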
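A common way to avoid the `SettingWithCopyWarning` shown above, offered as a suggestion rather than a confirmed fix for this exact notebook, is to make each per-city frame an explicit copy when it is created, i.e. `df2.loc[df2['City'] == city].copy()`. Each frame then owns its data, so assigning a new `Date` column is not a write into a slice of `df2`. The sketch below uses a small stand-in for `df2` and swaps in `datetime.fromisoformat` (Python 3.7+) for brevity.

```python
from datetime import datetime
import pandas as pd

# Stand-in for df2 with one Taipei row and two Taichung rows
df2 = pd.DataFrame({
    'City': ['臺北市', '臺中市', '臺中市'],
    'Date': ['2020-02-19T12:00:00+08:00', '2020-02-19T12:00:00+08:00',
             '2020-02-19T18:00:00+08:00'],
})

# .copy() gives each per-city frame its own data
df_list = [df2.loc[df2['City'] == city].copy() for city in ['臺北市', '臺中市']]

for df in df_list:
    # Keep only the local date of each forecast
    df['Date'] = df['Date'].map(lambda s: datetime.fromisoformat(s).date())
```

`df2` itself is left untouched; only the copies in `df_list` get the parsed dates.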


```python
forecasts = []
for df in df_list:
    # One row per date: highest High, lowest Low, first City/Description of the day
    result_df = df.groupby('Date').agg({'High': 'max',
                                        'Low': 'min',
                                        'City': 'first',
                                        'Description': 'first'})
    forecasts.append(result_df)

# test forecast in Taichung
forecasts[1]
```

| Date | High | Low | City | Description |
|---|---|---|---|---|
| 2020-02-19 | 23 | 14 | 臺中市 | 晴時多雲 |
| 2020-02-20 | 25 | 14 | 臺中市 | 晴時多雲 |
| 2020-02-21 | 25 | 16 | 臺中市 | 晴時多雲 |
| 2020-02-22 | 25 | 16 | 臺中市 | 晴時多雲 |
| 2020-02-23 | 27 | 18 | 臺中市 | 晴時多雲 |
| 2020-02-24 | 26 | 18 | 臺中市 | 晴時多雲 |
| 2020-02-25 | 27 | 18 | 臺中市 | 晴時多雲 |
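To get back to the per-day dictionary format from the top of this notebook (`"Date": "2/16"`, ...), one option is `reset_index` followed by `to_dict('records')`. This is a sketch that assumes `result_df` looks like the Taichung table above (only two of its rows are used as a stand-in); the `month/day` date string is an assumption chosen to match the `"2/16"` example.

```python
import datetime
import pandas as pd

# Stand-in for forecasts[1]: two rows of the Taichung table, indexed by Date
result_df = pd.DataFrame(
    {'High': [23, 25], 'Low': [14, 14],
     'City': ['臺中市', '臺中市'], 'Description': ['晴時多雲', '晴時多雲']},
    index=pd.Index([datetime.date(2020, 2, 19), datetime.date(2020, 2, 20)], name='Date'),
)

# Move Date back into a column, format it as month/day, and emit one dict per day
days = (
    result_df.reset_index()
    .assign(Date=lambda d: d['Date'].map(lambda x: f"{x.month}/{x.day}"))
    .to_dict('records')
)
```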


## Step 3: Write the forecast to an Excel file

I initially assumed Excel's plotting would be strong and that xlwings would make it easy yet powerful. However, I had trouble making the charts cleaner and better-looking because the documentation is thin: the official docs don't give the exact arguments, and Microsoft's chart-type constants didn't work for me. :(
- https://docs.xlwings.org/en/stable/api.html#chart
- [XlChartType enumeration (Microsoft docs)](https://docs.microsoft.com/zh-tw/office/vba/api/Excel.XlChartType)

Thankfully, the result still looks okay. The one thing I'd still like to add is the temperature value on each data point. If you manage to do it, please let me know. Thanks!

```python
import xlwings as xw

# Open a new, unsaved workbook
wb = xw.Book()
```

```python
for idx, city in enumerate(locs):
    # For each city, create a sheet
    sheet = wb.sheets.add(name=city, after=wb.sheets[-1])

    # Write the forecast DataFrame to the sheet, starting at A1
    sheet.range("A1").value = forecasts[idx]

    # Draw the temperature line chart
    chart = sheet.charts.add()  # add an empty chart object to the sheet
    chart.top = sheet.range('G1').top
    chart.left = sheet.range('G1').left

    # Columns A:C hold Date (the index), High and Low
    chart.set_source_data(sheet.range('A1:C1').expand("down"))
    chart.chart_type = 'line'
```

### Sample output (Excel file):
![Sample Excel output](https://i.imgur.com/FA5MYiw.png)