<a href="https://colab.research.google.com/github/xrisaD/ScalableMLProject/blob/main/2_feature_pipeline.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# <span style="font-weight:bold; font-size: 3rem; color:#1EB182;"><img src="../../images/icon102.png" width="38px"/> **Hopsworks Feature Store** </span><span style="font-weight:bold; font-size: 3rem; color:#333;">- Part 02: Feature Pipeline</span>


## 🗒️ This notebook is divided into the following sections:
1. Parse Data
2. Feature Group Insertion

## <span style='color:#ff5f27'> 📝 Imports</span>

In [1]:
import sys
sys.path.append('../')

In [2]:
import hopsworks
import pandas as pd
from datetime import datetime, timedelta
import time 
import requests

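# Helper functions used below (explicit names instead of a wildcard import)
from features.feature_engineering import (
    get_yesterday,
    get_streamflow_data,
    get_weather_data,
    get_streamflow_df,
    get_weather_df,
)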

## <span style='color:#ff5f27'> 🌎 Locations</span>

In [3]:
place_list = ['Abisko', 'Uppsala', 'Spånga']

place_streamflow = [2357, 2609, 2212]

# [latitude, longitude] pairs, in the same order as place_list
lat_long = [[68.35, 18.82], [59.87, 17.60], [58.00, 12.73]]

## <span style='color:#ff5f27'> Dates</span>

In [4]:
yesterday = get_yesterday(datetime.today())
yesterday

datetime.datetime(2022, 12, 26, 0, 0)
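
`get_yesterday` is imported from `features.feature_engineering`, and its source is not shown in this notebook. The following is a minimal sketch of what such a helper might look like, assuming it subtracts one day and truncates to midnight, which is consistent with the output above. The name `get_yesterday_sketch` is hypothetical.

```python
from datetime import datetime, timedelta

def get_yesterday_sketch(today: datetime) -> datetime:
    """Return midnight of the day before `today` (assumed behaviour)."""
    previous_day = today - timedelta(days=1)
    # Drop the time-of-day part so only the calendar date remains.
    return previous_day.replace(hour=0, minute=0, second=0, microsecond=0)

example = get_yesterday_sketch(datetime(2022, 12, 27, 14, 30))
print(example)
```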

---

## <span style="color:#ff5f27;"> 🔮 Connecting to Hopsworks Feature Store </span>

In [5]:
project = hopsworks.login()

fs = project.get_feature_store()

Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/5318




Connected. Call `.close()` to terminate connection gracefully.


In [6]:
streamflow_fg = fs.get_or_create_feature_group(
    name = 'streamflow_fg',
    version = 1
)

In [7]:
weather_fg = fs.get_or_create_feature_group(
    name = 'weather_fg',
    version = 1
)

## <span style='color:#ff5f27'> 🧙🏼‍♂️ Parsing Data</span>

In [8]:
data_streamflow = []
for p1, p2 in zip(place_streamflow, place_list):
    data_streamflow.extend(get_streamflow_data(p1, yesterday, p2))
data_streamflow

[['2022-12-26', '30.0488', 'Abisko'],
 ['2022-12-26', '5.0344', 'Uppsala'],
 ['2022-12-26', '5.1179', 'Spånga']]
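
Each record above is a `[date, streamflow, place]` triple in which the streamflow value is still a string. A minimal sketch of turning such rows into typed records could look like this; `parse_streamflow_rows` is a hypothetical helper and is not part of `features.feature_engineering` as far as this notebook shows.

```python
def parse_streamflow_rows(rows):
    """Convert [date, streamflow, place] string triples into typed dicts."""
    return [
        {"date": date, "streamflow": float(value), "place": place}
        for date, value, place in rows
    ]

# Rows copied from the cell output above.
raw = [
    ["2022-12-26", "30.0488", "Abisko"],
    ["2022-12-26", "5.0344", "Uppsala"],
    ["2022-12-26", "5.1179", "Spånga"],
]
records = parse_streamflow_rows(raw)
print(records[0])
```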

In [9]:
data_weather = [get_weather_data(ll, place) for ll, place in zip(lat_long, place_list)]
data_weather

[{'time': ['2022-12-26',
   '2022-12-27',
   '2022-12-28',
   '2022-12-29',
   '2022-12-30',
   '2022-12-31',
   '2023-01-01',
   '2023-01-02'],
  'temperature_2m_max': [-6.7, -1.5, -3.2, -5.7, -7.8, -6.2, -4.1, -0.4],
  'temperature_2m_min': [-12.7, -10.5, -10.3, -12.2, -9.0, -10.6, -5.6, -7.0],
  'precipitation_sum': [0.0, 0.0, 0.2, 0.0, 0.7, 0.0, 0.0, 0.0],
  'rain_sum': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
  'snowfall_sum': [0.0, 0.0, 0.14, 0.0, 0.42, 0.0, 0.0, 0.0],
  'precipitation_hours': [0.0, 0.0, 2.0, 0.0, 7.0, 0.0, 0.0, 0.0],
  'windspeed_10m_max': [13.0, 20.2, 18.4, 37.1, 24.6, 9.0, 33.8, 29.0],
  'windgusts_10m_max': [41.4, 43.6, 38.2, 76.7, 49.7, 16.2, 61.2, 54.4],
  'winddirection_10m_dominant': [81, 102, 201, 75, 107, 210, 296, 297],
  'et0_fao_evapotranspiration': [0.03,
   0.12,
   0.03,
   0.26,
   0.16,
   0.0,
   0.38,
   0.31],
  'place': 'Abisko'},
 {'time': ['2022-12-26',
   '2022-12-27',
   '2022-12-28',
   '2022-12-29',
   '2022-12-30',
   '2022-12-31',
 

---

## <span style='color:#ff5f27'> 🧑🏻‍🏫 Dataset Preparation</span>

#### <span style='color:#ff5f27'> 👩🏻‍🔬 Streamflow Data</span>

In [10]:
df_streamflow = get_streamflow_df(data_streamflow)

df_streamflow

| | date | streamflow | place |
|---|---|---|---|
| 0 | 1672009200000 | 30.0488 | Abisko |
| 1 | 1672009200000 | 5.0344 | Uppsala |
| 2 | 1672009200000 | 5.1179 | Spånga |
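
The `date` column holds Unix timestamps in milliseconds rather than date strings. The value 1672009200000 corresponds to 2022-12-25 23:00 UTC, which is midnight on 2022-12-26 at UTC+1, so the conversion appears to have used Swedish winter time. That is an inference from the values, not something the notebook states. The sketch below decodes the value both ways; the fixed `CET` offset and the helper name `ms_to_local` are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+1 offset (Swedish winter time), no DST handling.
CET = timezone(timedelta(hours=1), name="CET")

def ms_to_local(ms: int, tz: timezone = CET) -> str:
    """Format a millisecond Unix timestamp in the given time zone."""
    return datetime.fromtimestamp(ms / 1000, tz=tz).strftime("%Y-%m-%d %H:%M")

print(ms_to_local(1672009200000))                # interpreted at UTC+1
print(ms_to_local(1672009200000, timezone.utc))  # interpreted at UTC
```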


#### <span style='color:#ff5f27'> 🌦 Weather Data</span>

In [11]:
df_weather = get_weather_df(data_weather)

df_weather.head()

| | date | temperature_2m_max | temperature_2m_min | precipitation_sum | rain_sum | snowfall_sum | precipitation_hours | windspeed_10m_max | windgusts_10m_max | winddirection_10m_dominant | et0_fao_evapotranspiration | place |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1672009200000 | -6.7 | -12.7 | 0.0 | 0.0 | 0.0 | 0.0 | 13.0 | 41.4 | 81.0 | 0.03 | Abisko |
| 1 | 1672095600000 | -1.5 | -10.5 | 0.0 | 0.0 | 0.0 | 0.0 | 20.2 | 43.6 | 102.0 | 0.12 | Abisko |
| 2 | 1672182000000 | -3.2 | -10.3 | 0.2 | 0.0 | 0.14 | 2.0 | 18.4 | 38.2 | 201.0 | 0.03 | Abisko |
| 3 | 1672268400000 | -5.7 | -12.2 | 0.0 | 0.0 | 0.0 | 0.0 | 37.1 | 76.7 | 75.0 | 0.26 | Abisko |
| 4 | 1672354800000 | -7.8 | -9.0 | 0.7 | 0.0 | 0.42 | 7.0 | 24.6 | 49.7 | 107.0 | 0.16 | Abisko |
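
The raw weather payload is column-oriented: one dict per place, with a list of daily values under each key. `get_weather_df` is not shown here, so the following is only a sketch of how one such dict could be flattened into one record per day. The helper name `weather_dict_to_rows` is hypothetical, and the sample uses the first two Abisko days from the output above.

```python
def weather_dict_to_rows(entry: dict) -> list:
    """Turn one column-oriented payload into one record per day."""
    place = entry["place"]
    # Every key except 'place' maps to a list with one value per day.
    columns = [key for key in entry if key != "place"]
    n_days = len(entry["time"])
    rows = []
    for i in range(n_days):
        row = {col: entry[col][i] for col in columns}
        row["place"] = place
        rows.append(row)
    return rows

sample = {
    "time": ["2022-12-26", "2022-12-27"],
    "temperature_2m_max": [-6.7, -1.5],
    "temperature_2m_min": [-12.7, -10.5],
    "place": "Abisko",
}
rows = weather_dict_to_rows(sample)
print(rows[0])
```

Concatenating the rows for all three places and passing them to `pd.DataFrame` would give a long table shaped like the one above (3 places × 8 days = 24 rows). Converting `time` into the millisecond `date` column is a separate step not covered by this sketch.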


---

## <span style="color:#ff5f27;">⬆️ Uploading new data to the Feature Store</span>

In [12]:
streamflow_fg.insert(df_streamflow)

Uploading Dataframe: 0.00% |          | Rows 0/3 | Elapsed Time: 00:00 | Remaining Time: ?

Launching offline feature group backfill job...
Backfill Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai/p/5318/jobs/named/streamflow_fg_1_offline_fg_backfill/executions


(<hsfs.core.job.Job at 0x15375793d90>, None)

In [13]:
weather_fg.insert(df_weather)

Uploading Dataframe: 0.00% |          | Rows 0/24 | Elapsed Time: 00:00 | Remaining Time: ?

Launching offline feature group backfill job...
Backfill Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai/p/5318/jobs/named/weather_fg_1_offline_fg_backfill/executions


(<hsfs.core.job.Job at 0x1537532c340>, None)

---