# NOTE
**Training uses "new_weather.csv"; the test set used when submitting the solution is "new_weather_2010-2011".**

## Constructing the weather-conditions column

To predict hotel occupancy, we decided to check whether the weather on a given day correlates with room occupancy in the hotel.
The data used here was downloaded from [Visual Crossing](https://www.visualcrossing.com/weather/weather-data-services/Rijeka/us/last15days).

After cleaning, we get a new column where:
- Sunny day (no precipitation type recorded) = 1
- Rainy day = 0
- Snow (recorded as "rain,snow") = 0.5
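
Once the weather code is joined with occupancy data, the correlation check mentioned above could look roughly like the sketch below. The `occupancy` column and all values in `toy` are made up for illustration; they are not taken from the hotel dataset.

```python
import pandas as pd

# Toy values for illustration only; they do not come from the hotel dataset.
toy = pd.DataFrame({
    "transformed_preciptype": [0.5, 0.5, 0.5, 1.0, 0.0, 1.0, 0.0, 1.0],
    "occupancy": [0.60, 0.55, 0.58, 0.80, 0.40, 0.75, 0.45, 0.85],  # hypothetical share of occupied rooms
})

# Pearson correlation between the weather code and the occupancy share.
pearson = toy["transformed_preciptype"].corr(toy["occupancy"])
print(f"Pearson correlation: {pearson:.3f}")
```

Note that a Pearson coefficient treats the code as numeric, so it implicitly assumes that snow (0.5) lies halfway between rain (0) and sun (1).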

In [7]:
import matplotlib.pyplot as plt
import pandas as pd

In [8]:
dataset = pd.read_csv("weather.csv")
dataset.head()

Unnamed: 0,datetime,tempmax,tempmin,temp,feelslikemax,feelslikemin,feelslike,humidity,precip,precipprob,preciptype,cloudcover
0,2008-01-01,43.4,32.0,39.0,40.9,25.0,34.3,49.7,0.02,100,"rain,snow",28.5
1,2008-01-02,42.9,30.9,35.7,40.2,23.0,31.2,57.5,0.008,100,"rain,snow",28.4
2,2008-01-03,39.9,30.7,35.3,39.3,23.2,32.1,58.9,0.0,0,"rain,snow",66.5
3,2008-01-04,51.5,37.9,44.3,51.5,32.4,42.4,57.0,0.0,0,,87.7
4,2008-01-05,49.4,40.8,46.3,49.4,37.6,45.5,90.9,0.051,100,rain,95.3


In [9]:
dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 731 entries, 0 to 730
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   datetime      731 non-null    object 
 1   tempmax       731 non-null    float64
 2   tempmin       731 non-null    float64
 3   temp          731 non-null    float64
 4   feelslikemax  731 non-null    float64
 5   feelslikemin  731 non-null    float64
 6   feelslike     731 non-null    float64
 7   humidity      731 non-null    float64
 8   precip        731 non-null    float64
 9   precipprob    731 non-null    int64  
 10  preciptype    367 non-null    object 
 11  cloudcover    731 non-null    float64
dtypes: float64(9), int64(1), object(2)
memory usage: 68.7+ KB


The 'preciptype' column records whether a day had rain or snow; a missing value is treated here as a sunny day. We extract this column and write the encoded values as a new column in 'new_weather.csv'.
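
Before encoding, it can help to check which distinct values the column actually contains. The following is a minimal sketch that uses only the five rows displayed above, inlined so it runs without the file; the names `weather_sample` and `type_counts` are introduced here for illustration.

```python
import io
import pandas as pd

# The five rows shown in the head() output, reduced to the two relevant columns.
sample_csv = io.StringIO(
    'datetime,preciptype\n'
    '2008-01-01,"rain,snow"\n'
    '2008-01-02,"rain,snow"\n'
    '2008-01-03,"rain,snow"\n'
    '2008-01-04,\n'
    '2008-01-05,rain\n'
)
weather_sample = pd.read_csv(sample_csv)

# dropna=False keeps days without a recorded precipitation type in the tally.
type_counts = weather_sample["preciptype"].value_counts(dropna=False)
print(type_counts)
```

Running the same tally on the full file would show whether any value other than "rain" and "rain,snow" occurs.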

In [10]:
import csv

def transform_preciptype(input_csv, output_csv):
    """Read input_csv and write datetime plus a numeric weather code to output_csv."""
    with open(input_csv, mode="r", encoding="utf-8") as infile, open(
        output_csv, mode="w", encoding="utf-8", newline=""
    ) as outfile:
        reader = csv.DictReader(infile)
        fieldnames = ["datetime", "transformed_preciptype"]  # Include datetime in the fieldnames

        writer = csv.DictWriter(outfile, fieldnames=fieldnames)
        writer.writeheader()

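        for row in reader:
            # csv.DictReader returns an empty string for a missing value.
            preciptype = row.get("preciptype", "")
            if preciptype == "rain":
                transformed_preciptype = 0  # rainy day
            elif preciptype == "rain,snow":
                transformed_preciptype = 0.5  # snow
            else:
                # No recorded precipitation type (or any other value): sunny day
                transformed_preciptype = 1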

            # Write both datetime and transformed_preciptype to the output dataset
            writer.writerow({
                "datetime": row["datetime"],
                "transformed_preciptype": transformed_preciptype
            })

input_csv_path = "weather.csv"
output_csv_path = "new_weather.csv"
transform_preciptype(input_csv_path, output_csv_path)
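
For comparison, the same encoding can be written with pandas instead of the `csv` module. This is a sketch rather than the method used in this notebook; it inlines the five rows shown earlier so that it runs without `weather.csv`, and the names `codes` and `encoded` are introduced here for illustration.

```python
import io
import pandas as pd

# The five rows shown in the head() output, reduced to the two relevant columns.
sample_csv = io.StringIO(
    'datetime,preciptype\n'
    '2008-01-01,"rain,snow"\n'
    '2008-01-02,"rain,snow"\n'
    '2008-01-03,"rain,snow"\n'
    '2008-01-04,\n'
    '2008-01-05,rain\n'
)
weather = pd.read_csv(sample_csv)

# Same rules as transform_preciptype: "rain" -> 0, "rain,snow" -> 0.5, anything else -> 1.
codes = {"rain": 0.0, "rain,snow": 0.5}
weather["transformed_preciptype"] = weather["preciptype"].map(codes).fillna(1.0)

encoded = weather[["datetime", "transformed_preciptype"]]
print(encoded)
```

Values missing from `codes`, including empty cells, become NaN after `map` and are then filled with 1, which mirrors the `else` branch above.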


In [11]:
new_w = pd.read_csv("new_weather.csv")
new_w.head()

Unnamed: 0,datetime,transformed_preciptype
0,2008-01-01,0.5
1,2008-01-02,0.5
2,2008-01-03,0.5
3,2008-01-04,1.0
4,2008-01-05,0.0


In [12]:
new_w.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 731 entries, 0 to 730
Data columns (total 2 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   datetime                731 non-null    object 
 1   transformed_preciptype  731 non-null    float64
dtypes: float64(1), object(1)
memory usage: 11.5+ KB
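
The `info()` output shows that `datetime` is stored as `object`, that is, as plain strings. If the column is later matched against dates in the hotel data, parsing it on load may be convenient. A minimal sketch on the five rows displayed above (the names `new_w_sample` and `code_counts` are introduced here for illustration):

```python
import io
import pandas as pd

# The five rows shown in the new_w.head() output.
sample_csv = io.StringIO(
    "datetime,transformed_preciptype\n"
    "2008-01-01,0.5\n"
    "2008-01-02,0.5\n"
    "2008-01-03,0.5\n"
    "2008-01-04,1.0\n"
    "2008-01-05,0.0\n"
)
# parse_dates converts the column to datetime64 while reading.
new_w_sample = pd.read_csv(sample_csv, parse_dates=["datetime"])

# How many days fall into each weather code.
code_counts = new_w_sample["transformed_preciptype"].value_counts().sort_index()
print(code_counts)
print(new_w_sample["datetime"].min(), new_w_sample["datetime"].max())
```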
