# Task

### Solve Notebook-1 Tasks Below

### Solve Notebook-2 Tasks Below

# Project: Holiday weather


There is nothing I like better than taking a holiday. In this project I am going to use historical weather data for London from the Weather Underground to try to predict two good weather weeks to take off as holiday. Of course the weather in the summer of 2025 may be very different from 2023, but it should give some indication of when would be a good time to take a summer break.

## Getting the data

Weather Underground keeps historical weather data collected in many airports around the world. Right-click on the following URL and choose 'Open Link in New Window' (or similar, depending on your browser):

http://www.wunderground.com/history

When the new page opens start typing 'LHR' in the 'Location' input box and when the pop up menu comes up with the option 'LHR, United Kingdom' select it and then click on 'Submit'. 

When the next page opens with London Heathrow data, click on the 'Custom' tab and select the time period From: 1 January 2023 to: 31 December 2023 and then click on 'Get History'. The data for that year should then be displayed further down the page. 

You can copy each month's data directly from the browser to a text editor like Notepad or TextEdit, to obtain a single file with as many months as you wish.


Now load the CSV file into a dataframe, making sure that any extra spaces are skipped. The cells below read a one-year Heathrow export named `export.csv` from a Kaggle input directory.
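
As a minimal sketch of what "skipping extra spaces" can look like, the example below reads a small in-memory CSV whose fields are padded with a space after each comma. The sample rows and the name `raw_csv` are made up for illustration; the only pandas feature relied on is the `skipinitialspace` parameter of `pd.read_csv`.

```python
import io

import pandas as pd

# Hypothetical sample: every field is padded with a space after the comma.
raw_csv = io.StringIO(
    "date, tavg, prcp\n"
    "2023-01-01, 10.6, 4.8\n"
    "2023-01-02, 5.9, 0.0\n"
)

# skipinitialspace=True drops the spaces that follow each delimiter,
# so the column names come out as 'tavg' rather than ' tavg'.
sample = pd.read_csv(raw_csv, skipinitialspace=True)

print(sample.columns.tolist())
```

Without the flag, the padded names would have to be cleaned afterwards, for example with `sample.columns.str.strip()`.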

## Step-1 Cleaning the data


In [1]:
import warnings
warnings.filterwarnings("ignore", category=RuntimeWarning)
print("Warning filters have been set.")



In [2]:
import pandas as pd
import numpy as np
df = pd.read_csv('/kaggle/input/heathrow-airport-meteostat/export.csv')


In [3]:
# --- 1. Initial Inspection ---
print("--- Initial Data Information ---")
df.info()

print("\n--- First 5 Rows ---")
print(df.head())

--- Initial Data Information ---
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 365 entries, 0 to 364
Data columns (total 11 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   date    365 non-null    object 
 1   tavg    365 non-null    float64
 2   tmin    365 non-null    float64
 3   tmax    365 non-null    float64
 4   prcp    365 non-null    float64
 5   snow    4 non-null      float64
 6   wdir    0 non-null      float64
 7   wspd    365 non-null    float64
 8   wpgt    365 non-null    int64  
 9   pres    365 non-null    float64
 10  tsun    365 non-null    int64  
dtypes: float64(8), int64(2), object(1)
memory usage: 31.5+ KB

--- First 5 Rows ---
                  date  tavg  tmin  tmax  prcp  snow  wdir  wspd  wpgt  \
0  2023-01-01 00:00:00  10.6   7.5  13.6   4.8   NaN   NaN  19.0    50   
1  2023-01-02 00:00:00   5.9   3.5   8.7   0.0   NaN   NaN  10.2    24   
2  2023-01-03 00:00:00   8.6   2.1  12.7   0.0   NaN   NaN  18.6    53   


In [4]:
# --- 2. Data Cleaning ---

# --- Problem 1: 'date' is stored as text (object), so convert it to datetime
df['date'] = pd.to_datetime(df['date'])

# --- Problems 2 & 3: 'wdir' is 100% empty and 'snow' is about 99% empty
# (only 4 of 365 values are present), so drop both columns
df = df.drop(columns=['wdir', 'snow'])

# --- Final step: set 'date' as the index so rows can be selected by date
df = df.set_index('date')

# --- Verification 
print("--- Cleaned Data Info ---")

df.info()

print("\n--- Cleaned Data Head ---")
print(df.head())

--- Cleaned Data Info ---
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 365 entries, 2023-01-01 to 2023-12-31
Data columns (total 8 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   tavg    365 non-null    float64
 1   tmin    365 non-null    float64
 2   tmax    365 non-null    float64
 3   prcp    365 non-null    float64
 4   wspd    365 non-null    float64
 5   wpgt    365 non-null    int64  
 6   pres    365 non-null    float64
 7   tsun    365 non-null    int64  
dtypes: float64(6), int64(2)
memory usage: 25.7 KB

--- Cleaned Data Head ---
            tavg  tmin  tmax  prcp  wspd  wpgt    pres  tsun
date                                                        
2023-01-01  10.6   7.5  13.6   4.8  19.0    50  1007.9    61
2023-01-02   5.9   3.5   8.7   0.0  10.2    24  1016.9   174
2023-01-03   8.6   2.1  12.7   0.0  18.6    53  1018.5     6
2023-01-04  12.6  10.8  13.6   0.0  31.4    68  1014.6    49
2023-01-05  11.4   8.9  13.4   0.0  2
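
The decision to drop `wdir` and `snow` above was read off the `df.info()` output by eye. A possible way to make the same check explicit is to compute the share of missing values per column and keep only the columns below a threshold. This is a sketch on a made-up miniature table; the name `toy` and the 50% cutoff `max_missing` are assumptions, not part of the original analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the raw table; the values are invented.
toy = pd.DataFrame({
    "tavg": [10.6, 5.9, 8.6, 12.6],
    "snow": [np.nan, np.nan, np.nan, 10.0],
    "wdir": [np.nan] * 4,
})

# isna() marks missing cells; the column mean is the fraction missing.
missing_share = toy.isna().mean()

# Assumed cutoff: keep columns that are at most half empty.
max_missing = 0.5
kept = toy.loc[:, missing_share <= max_missing]

print(missing_share)
print(kept.columns.tolist())
```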

---

## Step-2 Finding a summer break

According to meteorologists, summer extends for the whole months of June, July, and August in the northern hemisphere and the whole months of December, January, and February in the southern hemisphere. London is in the northern hemisphere, so create a dataframe that holds just June, July, and August by slicing the `datetime` index.

In [5]:
summer_df = df.loc['2023-06-01':'2023-08-31'].copy()

# --- Verification ---
# check the first and last few rows to make sure.
print(f"--- Summer Dataframe Created: {len(summer_df)} days ---")
print("\nFirst 5 rows (starting June 1st):")
print(summer_df.head())
print("\nLast 5 rows (ending August 31st):")
print(summer_df.tail())

--- Summer Dataframe Created: 92 days ---

First 5 rows (starting June 1st):
            tavg  tmin  tmax  prcp  wspd  wpgt    pres  tsun
date                                                        
2023-06-01  13.0   9.9  19.8   0.0  15.8    41  1025.5   439
2023-06-02  13.8  10.5  19.4   0.0  16.7    37  1024.5   473
2023-06-03  15.2   7.4  22.8   0.0  14.0    35  1023.4   538
2023-06-04  16.1   9.1  22.6   0.0  13.2    35  1023.8   781
2023-06-05  13.4   9.5  21.4   0.0  14.8    32  1024.5   513

Last 5 rows (ending August 31st):
            tavg  tmin  tmax  prcp  wspd  wpgt    pres  tsun
date                                                        
2023-08-27  15.4  10.5  21.5   1.0  14.1    38  1010.7   312
2023-08-28  16.2  12.0  21.0   0.3   9.5    27  1014.6   279
2023-08-29  16.6  14.5  21.5   0.0  11.7    37  1012.3   244
2023-08-30  15.6  10.5  21.2   0.5  13.2    34  1008.7   378
2023-08-31  14.2  11.3  18.9   0.0   9.0    25  1009.8     3
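
The slice above hard-codes the year 2023. If the file held several years, one alternative would be to filter on the month number of the index instead. The sketch below assumes a dataframe with a daily `DatetimeIndex`; `toy` and `summer_months` are illustrative names and the `tmax` values are placeholders.

```python
import pandas as pd

# Hypothetical daily frame covering one full year; values are placeholders.
days = pd.date_range("2023-01-01", "2023-12-31", freq="D")
toy = pd.DataFrame({"tmax": range(len(days))}, index=days)

# Meteorological summer in the northern hemisphere: June, July, August.
summer_months = [6, 7, 8]

# index.month gives the month number of every row; isin builds a boolean mask.
toy_summer = toy[toy.index.month.isin(summer_months)]

print(len(toy_summer), toy_summer.index.min().date(), toy_summer.index.max().date())
```

For a single year this selects the same 92 days as the date-string slice.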


In [6]:
# 4. Analysis: Find Best 14-Day Holiday Period

# Define the period for our analysis
window_size = 14

# Calculate 14-Day Rolling Statistics
# We calculate the statistics for our three key weather metrics (tmax, prcp, tsun)
# using a 14-day "sliding window" (.rolling).
# We use .mean() for temperature and .sum() for the rain and sunshine totals.

summer_df['Avg_Max_Temp_14Day'] = summer_df['tmax'].rolling(window=window_size).mean()
summer_df['Total_Precip_14Day'] = summer_df['prcp'].rolling(window=window_size).sum()
summer_df['Total_Sunshine_14Day'] = summer_df['tsun'].rolling(window=window_size).sum()

# Normalize Metrics for Fair Comparison 
# To combine these different scales (°C, mm, minutes), we normalize each metric to a common scale from 0 to 1.
# Score_Temp: Normalized temperature (1.0 = hottest, 0.0 = coolest).

summer_df['Score_Temp'] = (summer_df['Avg_Max_Temp_14Day'] - summer_df['Avg_Max_Temp_14Day'].min()) / \
                          (summer_df['Avg_Max_Temp_14Day'].max() - summer_df['Avg_Max_Temp_14Day'].min())

# Score_Precip: Normalized precipitation (1.0 = least rain, 0.0 = most rain).
# Note: We reverse the calculation (max - value) / (max - min) so that a low rainfall gets a high score.

summer_df['Score_Precip'] = (summer_df['Total_Precip_14Day'].max() - summer_df['Total_Precip_14Day']) / \
                           (summer_df['Total_Precip_14Day'].max() - summer_df['Total_Precip_14Day'].min())

# Score_Sunshine: Normalized sunshine (1.0 = sunniest, 0.0 = cloudiest).
summer_df['Score_Sunshine'] = (summer_df['Total_Sunshine_14Day'] - summer_df['Total_Sunshine_14Day'].min()) / \
                              (summer_df['Total_Sunshine_14Day'].max() - summer_df['Total_Sunshine_14Day'].min())

# --- Calculate Final Holiday Score ---
# Combine the normalized scores into a single 'Holiday_Score'.
# Temperature gets a 1.5x weight, making it the most important factor in our model;
# sunshine and precipitation each keep a weight of 1.
summer_df['Holiday_Score'] = (summer_df['Score_Temp'] * 1.5) + \
                             summer_df['Score_Sunshine'] + \
                             summer_df['Score_Precip']

# --- Verification ---
# Display the top 5 best-scoring periods.
# Use .dropna() to remove the first 13 days of June: their rolling values are NaN
# because a full 14-day window is not yet available.
print("--- Top 5 Best 14-Day Periods (by Holiday Score) ---")
print(summer_df.dropna().sort_values('Holiday_Score', ascending=False).head())

--- Top 5 Best 14-Day Periods (by Holiday Score) ---
            tavg  tmin  tmax  prcp  wspd  wpgt    pres  tsun  \
date                                                           
2023-06-27  18.7  14.4  23.2   0.0  14.7    33  1021.0   111   
2023-06-28  20.7  18.3  26.6   0.0  15.0    35  1017.2   115   
2023-06-26  18.6  13.9  25.7   0.0  19.8    46  1019.4   438   
2023-06-22  21.3  14.5  29.0   0.0   8.8    35  1018.7   570   
2023-06-23  21.6  16.4  27.6   0.0  11.6    37  1023.2   540   

            Avg_Max_Temp_14Day  Total_Precip_14Day  Total_Sunshine_14Day  \
date                                                                       
2023-06-27           26.992857                 7.6                6348.0   
2023-06-28           26.842857                 7.6                5759.0   
2023-06-26           27.392857                34.3                7006.0   
2023-06-22           27.685714                47.0                6951.0   
2023-06-23           27.728571            
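
The three normalization formulas in the cell above follow one pattern, min-max scaling, with the rain score flipped so that less rain scores higher. A possible way to express that once is a small helper. This is only a sketch: `min_max_score` and its `higher_is_better` flag are hypothetical names, and the handling of a constant series is an added assumption that the original cell does not cover.

```python
import pandas as pd


def min_max_score(series, higher_is_better=True):
    """Scale a series to the 0..1 range; optionally flip it so low values score high."""
    lo, hi = series.min(), series.max()
    span = hi - lo
    if span == 0:
        # Assumption: a constant series carries no information, so score it 0.
        return series * 0.0
    scaled = (series - lo) / span
    # 1 - (x - min) / (max - min) equals (max - x) / (max - min).
    return scaled if higher_is_better else 1.0 - scaled


# Made-up 14-day aggregates for three candidate windows.
avg_max_temp = pd.Series([20.0, 25.0, 30.0])
total_precip = pd.Series([0.0, 10.0, 40.0])

score_temp = min_max_score(avg_max_temp)
score_precip = min_max_score(total_precip, higher_is_better=False)

print(score_temp.tolist())
print(score_precip.tolist())
```

Because pandas skips NaN when computing `min()` and `max()`, the helper can be applied to rolling columns whose first 13 entries are NaN; those entries simply stay NaN.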

In [7]:
# --- 5. Final Recommendation ---

# .idxmax() finds the date (the index) of the highest 'Holiday_Score'
# This is the *end date* of the best 14-day period.
best_end_date = summer_df['Holiday_Score'].idxmax()

# Calculate the start date
# (We subtract 13 days to get a 14-day inclusive period)
best_start_date = best_end_date - pd.Timedelta(days=(window_size - 1))

# Get the full row of stats for the 14-day window ending on that date
best_period_stats = summer_df.loc[best_end_date]


# --- Display The Final Result ---
print("--- 🏆 Holiday Recommendation ---")
print("\nBased on the 2023 data analysis, the best two-week holiday period is:")
print(f"  Start Date: {best_start_date.strftime('%B %d, %Y')}")
print(f"  End Date:   {best_end_date.strftime('%B %d, %Y')}")

print("\nWeather stats for this winning period:")
print(f"  Average Max Temp: {best_period_stats['Avg_Max_Temp_14Day']:.1f}°C")
print(f"  Total Precipitation: {best_period_stats['Total_Precip_14Day']:.1f} mm")
print(f"  Total Sunshine: {best_period_stats['Total_Sunshine_14Day']/60:.1f} hours")

--- 🏆 Holiday Recommendation ---

Based on the 2023 data analysis, the best two-week holiday period is:
  Start Date: June 14, 2023
  End Date:   June 27, 2023

Weather stats for this winning period:
  Average Max Temp: 27.0°C
  Total Precipitation: 7.6 mm
  Total Sunshine: 105.8 hours
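
The recommendation relies on `.rolling()` labelling each window by its last day, which is why 13 days are subtracted to find the start date. The sketch below checks that convention on a made-up series: the rolling sum at the end date should match a direct sum over the 14-day slice. The names `toy`, `rolling_sum` and `direct_sum` are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series: 30 days with values 0, 1, ..., 29.
days = pd.date_range("2023-06-01", periods=30, freq="D")
toy = pd.Series(np.arange(30, dtype="float64"), index=days)

window_size = 14
rolling_sum = toy.rolling(window=window_size).sum()

# The rolling result is labelled by the LAST day of each window.
end_date = rolling_sum.idxmax()
start_date = end_date - pd.Timedelta(days=window_size - 1)

# Recompute the same total directly from the inclusive date slice.
window = toy.loc[start_date:end_date]
direct_sum = window.sum()

print(start_date.date(), end_date.date(), len(window), direct_sum)
```

The same cross-check can be run against `summer_df` by slicing `summer_df.loc[best_start_date:best_end_date]` and comparing its `tmax` mean and `prcp` sum with the reported figures.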


# Publication Link