
Reading Data

In [9]:
import pandas as pd

# Load the CSV file into a DataFrame
data = pd.read_csv('wind.csv')

# Print the first few rows of the DataFrame to inspect the data
print(data.head())

# Combine the 'Yr', 'Mo', and 'Dy' columns into a single datetime column
data['Yr_Mo_Dy'] = pd.to_datetime(data[['Yr', 'Mo', 'Dy']].astype(str).agg('-'.join, axis=1), format='%y-%m-%d')

# Drop the original 'Yr', 'Mo', and 'Dy' columns
data.drop(columns=['Yr', 'Mo', 'Dy'], inplace=True)

# Reorder the columns to have 'Yr_Mo_Dy' as the first column
cols = ['Yr_Mo_Dy'] + [col for col in data if col != 'Yr_Mo_Dy']
data = data[cols]

# Print the first few rows of the DataFrame to verify the changes
print(data.head())

# Verify the shape of the DataFrame
print("DataFrame shape:", data.shape)

   Yr  Mo  Dy    RPT    VAL    ROS    KIL    SHA   BIR    DUB    CLA    MUL  \
0  61   1   1  15.04  14.96  13.17   9.29    NaN  9.87  13.67  10.25  10.83   
1  61   1   2  14.71    NaN  10.83   6.50  12.62  7.67  11.50  10.04   9.79   
2  61   1   3  18.50  16.88  12.33  10.13  11.17  6.17  11.25    NaN   8.50   
3  61   1   4  10.58   6.63  11.75   4.58   4.54  2.88   8.63   1.79   5.83   
4  61   1   5  13.33  13.25  11.42   6.17  10.71  8.21  11.92   6.54  10.92   

     CLO    BEL    MAL  
0  12.58  18.50  15.04  
1   9.67  17.54  13.83  
2   7.67  12.75  12.71  
3   5.88   5.46  10.88  
4  10.34  12.92  11.83  
    Yr_Mo_Dy    RPT    VAL    ROS    KIL    SHA   BIR    DUB    CLA    MUL  \
0 2061-01-01  15.04  14.96  13.17   9.29    NaN  9.87  13.67  10.25  10.83   
1 2061-01-02  14.71    NaN  10.83   6.50  12.62  7.67  11.50  10.04   9.79   
2 2061-01-03  18.50  16.88  12.33  10.13  11.17  6.17  11.25    NaN   8.50   
3 2061-01-04  10.58   6.63  11.75   4.58   4.54  2.88   8.63   

Fixing Data
Year 2061? Surely we don't have data from the future. The two-digit years were parsed with %y, which maps 00-68 to 2000-2068. Write a function that corrects this anomaly and apply it: any date with a year after 1989 is a parsing error, so count how often this occurs and subtract 100 years from each affected date.

All dates are between 1961 and 1978.
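
The anomaly can also be avoided before parsing. As a minimal sketch, the snippet below first reproduces the problem and then builds four-digit years up front. The three-row `sample` frame is hypothetical stand-in data, and adding 1900 assumes every record in the file falls in the 1900s.

```python
import pandas as pd

# Hypothetical stand-in for the Yr/Mo/Dy columns of wind.csv
sample = pd.DataFrame({"Yr": [61, 68, 78], "Mo": [1, 12, 6], "Dy": [1, 31, 15]})

# Parsing two-digit years with %y pushes 61 and 68 into the 2000s
parsed = pd.to_datetime(
    sample[["Yr", "Mo", "Dy"]].astype(str).agg("-".join, axis=1),
    format="%y-%m-%d",
)
print(parsed.dt.year.tolist())  # [2061, 2068, 1978]

# Assumption: all records are from the 1900s, so build four-digit years first
parts = pd.DataFrame(
    {"year": 1900 + sample["Yr"], "month": sample["Mo"], "day": sample["Dy"]}
)
dates = pd.to_datetime(parts)
print(dates.dt.year.tolist())  # [1961, 1968, 1978]
```

Correcting after parsing, as the exercise asks, has the advantage that the number of affected rows can be counted.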



In [10]:
import pandas as pd

def correct_year_anomaly(df):
    # Count the number of anomalies
    anomaly_count = (df['Yr_Mo_Dy'].dt.year > 1989).sum()

    # Correct the year anomaly
    df.loc[df['Yr_Mo_Dy'].dt.year > 1989, 'Yr_Mo_Dy'] = df['Yr_Mo_Dy'] - pd.DateOffset(years=100)

    return df, anomaly_count

# Load the CSV file into a DataFrame
data = pd.read_csv('wind.csv')

# Combine the 'Yr', 'Mo', and 'Dy' columns into a single datetime column
data['Yr_Mo_Dy'] = pd.to_datetime(data[['Yr', 'Mo', 'Dy']].astype(str).agg('-'.join, axis=1), format='%y-%m-%d')

# Drop the original 'Yr', 'Mo', and 'Dy' columns
data.drop(columns=['Yr', 'Mo', 'Dy'], inplace=True)

# Reorder the columns to have 'Yr_Mo_Dy' as the first column
cols = ['Yr_Mo_Dy'] + [col for col in data if col != 'Yr_Mo_Dy']
data = data[cols]

# Correct the year anomaly and get the count of anomalies
data, anomaly_count = correct_year_anomaly(data)

# Print the corrected DataFrame and the count of anomalies
print(data.head())
print(f"Number of anomalies corrected: {anomaly_count}")


    Yr_Mo_Dy    RPT    VAL    ROS    KIL    SHA   BIR    DUB    CLA    MUL  \
0 1961-01-01  15.04  14.96  13.17   9.29    NaN  9.87  13.67  10.25  10.83   
1 1961-01-02  14.71    NaN  10.83   6.50  12.62  7.67  11.50  10.04   9.79   
2 1961-01-03  18.50  16.88  12.33  10.13  11.17  6.17  11.25    NaN   8.50   
3 1961-01-04  10.58   6.63  11.75   4.58   4.54  2.88   8.63   1.79   5.83   
4 1961-01-05  13.33  13.25  11.42   6.17  10.71  8.21  11.92   6.54  10.92   

     CLO    BEL    MAL  
0  12.58  18.50  15.04  
1   9.67  17.54  13.83  
2   7.67  12.75  12.71  
3   5.88   5.46  10.88  
4  10.34  12.92  11.83  
Number of anomalies corrected: 2922


Setting Index
Set the corrected dates as the index. Pay attention to the data type: it should be datetime64!

Date column type is transformed to datetime64.

Date column is set as the index of DataFrame.

The result should look like this:

RPT	VAL	ROS
Yr_Mo_Dy
1961-01-01	15.04	14.96	13.17
1961-01-02	14.71	NaN	10.83
1961-01-03	18.50	16.88	12.33
1961-01-04	10.58	6.63	11.75
1961-01-05	13.33	13.25	11.42
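
One way to confirm the index type is to check it programmatically rather than by eye. This is a small sketch on a hypothetical two-row `sample` frame, not on the real dataset.

```python
import pandas as pd

# Hypothetical two-row sample with an already parsed date column
sample = pd.DataFrame(
    {"Yr_Mo_Dy": pd.to_datetime(["1961-01-01", "1961-01-02"]), "RPT": [15.04, 14.71]}
)
indexed = sample.set_index("Yr_Mo_Dy")

# A DatetimeIndex confirms that the dates kept their datetime64 type
is_datetime_index = isinstance(indexed.index, pd.DatetimeIndex)
has_datetime_dtype = pd.api.types.is_datetime64_any_dtype(indexed.index)
print(is_datetime_index, has_datetime_dtype)  # True True
```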


Step 1: Correct the Year Anomaly
First, we'll correct the year anomaly as we did previously.

In [11]:
import pandas as pd

def correct_year_anomaly(df):
    # Count the number of anomalies
    anomaly_count = (df['Yr_Mo_Dy'].dt.year > 1989).sum()

    # Correct the year anomaly
    df.loc[df['Yr_Mo_Dy'].dt.year > 1989, 'Yr_Mo_Dy'] = df['Yr_Mo_Dy'] - pd.DateOffset(years=100)

    return df, anomaly_count

# Load the CSV file into a DataFrame
data = pd.read_csv('wind.csv')

# Combine the 'Yr', 'Mo', and 'Dy' columns into a single datetime column
data['Yr_Mo_Dy'] = pd.to_datetime(data[['Yr', 'Mo', 'Dy']].astype(str).agg('-'.join, axis=1), format='%y-%m-%d')

# Drop the original 'Yr', 'Mo', and 'Dy' columns
data.drop(columns=['Yr', 'Mo', 'Dy'], inplace=True)

# Reorder the columns to have 'Yr_Mo_Dy' as the first column
cols = ['Yr_Mo_Dy'] + [col for col in data if col != 'Yr_Mo_Dy']
data = data[cols]

# Correct the year anomaly and get the count of anomalies
data, anomaly_count = correct_year_anomaly(data)

# Print the corrected DataFrame and the count of anomalies
print(data.head())
print(f"Number of anomalies corrected: {anomaly_count}")


    Yr_Mo_Dy    RPT    VAL    ROS    KIL    SHA   BIR    DUB    CLA    MUL  \
0 1961-01-01  15.04  14.96  13.17   9.29    NaN  9.87  13.67  10.25  10.83   
1 1961-01-02  14.71    NaN  10.83   6.50  12.62  7.67  11.50  10.04   9.79   
2 1961-01-03  18.50  16.88  12.33  10.13  11.17  6.17  11.25    NaN   8.50   
3 1961-01-04  10.58   6.63  11.75   4.58   4.54  2.88   8.63   1.79   5.83   
4 1961-01-05  13.33  13.25  11.42   6.17  10.71  8.21  11.92   6.54  10.92   

     CLO    BEL    MAL  
0  12.58  18.50  15.04  
1   9.67  17.54  13.83  
2   7.67  12.75  12.71  
3   5.88   5.46  10.88  
4  10.34  12.92  11.83  
Number of anomalies corrected: 2922


Step 2: Set the Corrected Dates as Index

Now, we'll set the corrected dates as the index of the DataFrame and ensure the data type is datetime64.

In [12]:
# Set the 'Yr_Mo_Dy' column as the index
data.set_index('Yr_Mo_Dy', inplace=True)

# Ensure the index is of type datetime64
data.index = pd.to_datetime(data.index)

# Print the first few rows of the DataFrame to verify the changes
print(data.head())


              RPT    VAL    ROS    KIL    SHA   BIR    DUB    CLA    MUL  \
Yr_Mo_Dy                                                                   
1961-01-01  15.04  14.96  13.17   9.29    NaN  9.87  13.67  10.25  10.83   
1961-01-02  14.71    NaN  10.83   6.50  12.62  7.67  11.50  10.04   9.79   
1961-01-03  18.50  16.88  12.33  10.13  11.17  6.17  11.25    NaN   8.50   
1961-01-04  10.58   6.63  11.75   4.58   4.54  2.88   8.63   1.79   5.83   
1961-01-05  13.33  13.25  11.42   6.17  10.71  8.21  11.92   6.54  10.92   

              CLO    BEL    MAL  
Yr_Mo_Dy                         
1961-01-01  12.58  18.50  15.04  
1961-01-02   9.67  17.54  13.83  
1961-01-03   7.67  12.75  12.71  
1961-01-04   5.88   5.46  10.88  
1961-01-05  10.34  12.92  11.83  


Dealing with Missing Values
Compute how many values are missing for each location over the entire period.

Number of missing values for each location:

RPT 6
VAL 3
ROS 2
KIL 5
SHA 2
BIR 0
DUB 3
CLA 2
MUL 3
CLO 1
BEL 0
MAL 4
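
As a quick sanity check on counts like these, the missing and non-missing entries of each column should add up to the number of rows. The sketch below uses a hypothetical three-row `sample` frame.

```python
import pandas as pd

# Hypothetical sample with a few gaps (None becomes NaN in float columns)
sample = pd.DataFrame({"RPT": [15.04, 14.71, None], "VAL": [14.96, None, None]})

missing = sample.isna().sum()    # NaN entries per column
present = sample.notna().sum()   # valid entries per column, same as sample.count()

# Each column's missing and present counts sum to the row count
totals = missing + present
print(missing)
print(present)
```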

Step 1: Load the Data and Correct the Year Anomaly

First, we'll load the data and correct the year anomaly as we did previously.

In [13]:
import pandas as pd

def correct_year_anomaly(df):
    # Count the number of anomalies
    anomaly_count = (df['Yr_Mo_Dy'].dt.year > 1989).sum()

    # Correct the year anomaly
    df.loc[df['Yr_Mo_Dy'].dt.year > 1989, 'Yr_Mo_Dy'] = df['Yr_Mo_Dy'] - pd.DateOffset(years=100)

    return df, anomaly_count

# Load the CSV file into a DataFrame
data = pd.read_csv('wind.csv')

# Combine the 'Yr', 'Mo', and 'Dy' columns into a single datetime column
data['Yr_Mo_Dy'] = pd.to_datetime(data[['Yr', 'Mo', 'Dy']].astype(str).agg('-'.join, axis=1), format='%y-%m-%d')

# Drop the original 'Yr', 'Mo', and 'Dy' columns
data.drop(columns=['Yr', 'Mo', 'Dy'], inplace=True)

# Reorder the columns to have 'Yr_Mo_Dy' as the first column
cols = ['Yr_Mo_Dy'] + [col for col in data if col != 'Yr_Mo_Dy']
data = data[cols]

# Correct the year anomaly and get the count of anomalies
data, anomaly_count = correct_year_anomaly(data)

# Set the 'Yr_Mo_Dy' column as the index
data.set_index('Yr_Mo_Dy', inplace=True)

# Ensure the index is of type datetime64
data.index = pd.to_datetime(data.index)

# Print the first few rows of the DataFrame to verify the changes
print(data.head())
print(f"Number of anomalies corrected: {anomaly_count}")


              RPT    VAL    ROS    KIL    SHA   BIR    DUB    CLA    MUL  \
Yr_Mo_Dy                                                                   
1961-01-01  15.04  14.96  13.17   9.29    NaN  9.87  13.67  10.25  10.83   
1961-01-02  14.71    NaN  10.83   6.50  12.62  7.67  11.50  10.04   9.79   
1961-01-03  18.50  16.88  12.33  10.13  11.17  6.17  11.25    NaN   8.50   
1961-01-04  10.58   6.63  11.75   4.58   4.54  2.88   8.63   1.79   5.83   
1961-01-05  13.33  13.25  11.42   6.17  10.71  8.21  11.92   6.54  10.92   

              CLO    BEL    MAL  
Yr_Mo_Dy                         
1961-01-01  12.58  18.50  15.04  
1961-01-02   9.67  17.54  13.83  
1961-01-03   7.67  12.75  12.71  
1961-01-04   5.88   5.46  10.88  
1961-01-05  10.34  12.92  11.83  
Number of anomalies corrected: 2922


Step 2: Compute the Number of Missing Values for Each Location

Now, we'll compute the number of missing values for each location.

In [14]:
# Compute the number of missing values for each location
missing_values = data.isnull().sum()

# Print the number of missing values for each location
print("Number of missing values for each location:\n", missing_values)


Number of missing values for each location:
 RPT    6
VAL    3
ROS    2
KIL    5
SHA    2
BIR    0
DUB    3
CLA    2
MUL    3
CLO    1
BEL    0
MAL    4
dtype: int64


Calculating Average Windspeed

Compute the average wind speed across all locations and for the entire dataset (in other words, from every row simultaneously). Round off the final figure to two decimal places.

The expected result is approximately 10.23.
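
When values are missing, averaging the per-column means is not the same as averaging every individual value, because a column with fewer readings gets the same weight as a complete one. The sketch below shows the difference on hypothetical data. On the wind dataset the two are expected to differ only slightly, since few values are missing.

```python
import pandas as pd

# Hypothetical sample in which column B has gaps
sample = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [10.0, None, None]})

# Mean of column means: (2.0 + 10.0) / 2
mean_of_column_means = sample.mean().mean()

# Mean of all recorded values: (1 + 2 + 3 + 10) / 4
overall_mean = sample.sum().sum() / sample.count().sum()

print(mean_of_column_means, overall_mean)  # 6.0 4.0
```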



Calculate the Average Wind Speed

Now, we'll calculate the average wind speed across all locations and for the entire dataset.

In [21]:
# Average every recorded value at once: total of all readings divided by
# the number of non-missing readings (not the mean of the column means)
average_wind_speed = data.sum().sum() / data.count().sum()

# Round off the final figure to two decimal places
average_wind_speed = round(average_wind_speed, 2)

# Print the average wind speed
print("Average wind speed across all locations and for the entire dataset:", average_wind_speed)


Average wind speed across all locations and for the entire dataset: 10.23


Using Basic Descriptive Statistics
Construct a pd.DataFrame() named wind_stats and determine the minimum, maximum, average wind speeds, and standard deviations of the wind speeds for each location across all days.

The expected result looks like this:

RPT	VAL	ROS
count	6568.000000	6571.000000	6572.000000
mean	12.362987	10.644314	11.660526
std	5.618413	5.267356	5.008450
min	0.670000	0.210000	1.500000
50%	11.710000	10.170000	10.920000
max	35.800000	33.370000	33.840000
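
If only the four requested statistics are wanted, `agg` with a list of function names is one option. This is a sketch on a hypothetical three-row `sample` frame; the name `wind_stats_sketch` is illustrative.

```python
import pandas as pd

# Hypothetical sample with one missing value
sample = pd.DataFrame({"RPT": [15.04, 14.71, 18.50], "VAL": [14.96, None, 16.88]})

# One row per statistic, one column per location; NaN values are skipped
wind_stats_sketch = sample.agg(["min", "max", "mean", "std"])
print(wind_stats_sketch)
```

Unlike `describe()`, this leaves out the count and the quartiles.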


In [22]:
# Calculate descriptive statistics
wind_stats = data.describe().T

# Print the descriptive statistics
print("Descriptive statistics for wind speeds:\n", wind_stats)


Descriptive statistics for wind speeds:
       count       mean       std   min    25%    50%    75%    max
RPT  6568.0  12.362987  5.618413  0.67   8.12  11.71  15.92  35.80
VAL  6571.0  10.644314  5.267356  0.21   6.67  10.17  14.04  33.37
ROS  6572.0  11.660526  5.008450  1.50   8.00  10.92  14.67  33.84
KIL  6569.0   6.306468  3.605811  0.00   3.58   5.75   8.42  28.46
SHA  6572.0  10.455834  4.936125  0.13   6.75   9.96  13.54  37.54
BIR  6574.0   7.092254  3.968683  0.00   4.00   6.83   9.67  26.16
DUB  6571.0   9.797343  4.977555  0.00   6.00   9.21  12.96  30.37
CLA  6572.0   8.495053  4.499449  0.00   5.09   8.08  11.42  31.08
MUL  6571.0   8.493590  4.166872  0.00   5.37   8.17  11.19  25.88
CLO  6573.0   8.707332  4.503954  0.04   5.33   8.29  11.63  28.21
BEL  6574.0  13.121007  5.835037  0.13   8.71  12.50  16.88  42.38
MAL  6570.0  15.599079  6.699794  0.67  10.71  15.00  19.83  42.54


Daily Mean
Construct a pd.DataFrame() named wind_stats_daily and compute the minimum, maximum, average wind speed, and standard deviations of the wind speeds across all locations for each day separately.

The resulting table should look like this:

min	max	mean	std
Yr_Mo_Dy
1961-01-01	9.29	18.50	13.018182	2.808875
1961-01-02	6.50	17.54	11.336364	3.188994
1961-01-03	6.17	18.50	11.641818	3.681912
1961-01-04	1.79	11.75	6.619167	3.198126
1961-01-05	6.17	13.33	10.630000	2.445356


In [23]:
# Calculate daily descriptive statistics
wind_stats_daily = data.agg(['min', 'max', 'mean', 'std'], axis=1)

# Print the daily descriptive statistics
print("Daily descriptive statistics for wind speeds:\n", wind_stats_daily.head())


Daily descriptive statistics for wind speeds:
              min    max       mean       std
Yr_Mo_Dy                                    
1961-01-01  9.29  18.50  13.018182  2.808875
1961-01-02  6.50  17.54  11.336364  3.188994
1961-01-03  6.17  18.50  11.641818  3.681912
1961-01-04  1.79  11.75   6.619167  3.198126
1961-01-05  6.17  13.33  10.630000  2.445356


Average Wind Speed for January

Determine the average wind speed for each location during the month of January.

The result should be the following:

RPT	14.847325
VAL	12.914560
ROS	13.299624
KIL	7.199498
SHA	11.667734
BIR	8.054839
DUB	11.819355
CLA	9.512047
MUL	9.543208
CLO	10.053566
BEL	14.550520
MAL	18.028763


Filter Data for January



In [24]:
# Filter data for January
january_data = data[data.index.month == 1]

# Print the first few rows of the January data to verify the changes
print(january_data.head())


              RPT    VAL    ROS    KIL    SHA   BIR    DUB    CLA    MUL  \
Yr_Mo_Dy                                                                   
1961-01-01  15.04  14.96  13.17   9.29    NaN  9.87  13.67  10.25  10.83   
1961-01-02  14.71    NaN  10.83   6.50  12.62  7.67  11.50  10.04   9.79   
1961-01-03  18.50  16.88  12.33  10.13  11.17  6.17  11.25    NaN   8.50   
1961-01-04  10.58   6.63  11.75   4.58   4.54  2.88   8.63   1.79   5.83   
1961-01-05  13.33  13.25  11.42   6.17  10.71  8.21  11.92   6.54  10.92   

              CLO    BEL    MAL  
Yr_Mo_Dy                         
1961-01-01  12.58  18.50  15.04  
1961-01-02   9.67  17.54  13.83  
1961-01-03   7.67  12.75  12.71  
1961-01-04   5.88   5.46  10.88  
1961-01-05  10.34  12.92  11.83  


Calculate the Average Wind Speed for Each Location in January

In [25]:
# Calculate the average wind speed for each location in January
average_wind_speed_january = january_data.mean()

# Print the average wind speed for each location in January
print("Average wind speed for each location in January:\n", average_wind_speed_january)


Average wind speed for each location in January:
 RPT    14.847325
VAL    12.914560
ROS    13.299624
KIL     7.199498
SHA    11.667734
BIR     8.054839
DUB    11.819355
CLA     9.512047
MUL     9.543208
CLO    10.053566
BEL    14.550520
MAL    18.028763
dtype: float64


Getting Yearly Statistics
Compute the annual average wind speed for each location.

The result is the following:

RPT	VAL	ROS
Yr_Mo_Dy
1961	12.299583	10.351796	11.362369
1962	12.246923	10.110438	11.732712
1963	12.813452	10.836986	12.541151
1964	12.363661	10.920164	12.104372
1965	12.451370	11.075534	11.848767
1966	13.461973	11.557205	12.020630
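
The table above labels each row with a plain year. One way to get that layout is to group by the year of the index, sketched here on a hypothetical four-row `sample` frame.

```python
import pandas as pd

# Hypothetical sample spanning two years
idx = pd.to_datetime(["1961-01-01", "1961-06-01", "1962-01-01", "1962-06-01"])
sample = pd.DataFrame({"RPT": [10.0, 14.0, 11.0, 15.0]}, index=idx)
sample.index.name = "Yr_Mo_Dy"

# Group rows by calendar year and average each location
yearly = sample.groupby(sample.index.year).mean()
print(yearly)
```

The resulting index holds integers such as 1961 rather than year-end timestamps.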


In [27]:
import pandas as pd

def correct_year_anomaly(df):
    # Count the number of anomalies
    anomaly_count = (df['Yr_Mo_Dy'].dt.year > 1989).sum()

    # Correct the year anomaly
    df.loc[df['Yr_Mo_Dy'].dt.year > 1989, 'Yr_Mo_Dy'] = df['Yr_Mo_Dy'] - pd.DateOffset(years=100)

    return df, anomaly_count

# Load the CSV file into a DataFrame
data = pd.read_csv('wind.csv')

# Combine the 'Yr', 'Mo', and 'Dy' columns into a single datetime column
data['Yr_Mo_Dy'] = pd.to_datetime(data[['Yr', 'Mo', 'Dy']].astype(str).agg('-'.join, axis=1), format='%y-%m-%d')

# Drop the original 'Yr', 'Mo', and 'Dy' columns
data.drop(columns=['Yr', 'Mo', 'Dy'], inplace=True)

# Reorder the columns to have 'Yr_Mo_Dy' as the first column
cols = ['Yr_Mo_Dy'] + [col for col in data if col != 'Yr_Mo_Dy']
data = data[cols]

# Correct the year anomaly and get the count of anomalies
data, anomaly_count = correct_year_anomaly(data)

# Set the 'Yr_Mo_Dy' column as the index
data.set_index('Yr_Mo_Dy', inplace=True)

# Ensure the index is of type datetime64
data.index = pd.to_datetime(data.index)

# Print the first few rows of the DataFrame to verify the changes
print(data.head())
print(f"Number of anomalies corrected: {anomaly_count}")


              RPT    VAL    ROS    KIL    SHA   BIR    DUB    CLA    MUL  \
Yr_Mo_Dy                                                                   
1961-01-01  15.04  14.96  13.17   9.29    NaN  9.87  13.67  10.25  10.83   
1961-01-02  14.71    NaN  10.83   6.50  12.62  7.67  11.50  10.04   9.79   
1961-01-03  18.50  16.88  12.33  10.13  11.17  6.17  11.25    NaN   8.50   
1961-01-04  10.58   6.63  11.75   4.58   4.54  2.88   8.63   1.79   5.83   
1961-01-05  13.33  13.25  11.42   6.17  10.71  8.21  11.92   6.54  10.92   

              CLO    BEL    MAL  
Yr_Mo_Dy                         
1961-01-01  12.58  18.50  15.04  
1961-01-02   9.67  17.54  13.83  
1961-01-03   7.67  12.75  12.71  
1961-01-04   5.88   5.46  10.88  
1961-01-05  10.34  12.92  11.83  
Number of anomalies corrected: 2922


In [28]:
# Group the data by year and calculate the mean for each location
annual_avg_wind_speed = data.resample('Y').mean()

# Print the annual average wind speed
print("Annual average wind speed for each location:\n", annual_avg_wind_speed)


Annual average wind speed for each location:
                   RPT        VAL        ROS       KIL        SHA       BIR  \
Yr_Mo_Dy                                                                     
1961-12-31  12.299583  10.351796  11.362369  6.958227  10.881763  7.729726   
1962-12-31  12.246923  10.110438  11.732712  6.960440  10.657918  7.393068   
1963-12-31  12.813452  10.836986  12.541151  7.330055  11.724110  8.434712   
1964-12-31  12.363661  10.920164  12.104372  6.787787  11.454481  7.570874   
1965-12-31  12.451370  11.075534  11.848767  6.858466  11.024795  7.478110   
1966-12-31  13.461973  11.557205  12.020630  7.345726  11.805041  7.793671   
1967-12-31  12.737151  10.990986  11.739397  7.143425  11.630740  7.368164   
1968-12-31  11.835628  10.468197  11.409754  6.477678  10.760765  6.067322   
1969-12-31  11.166356   9.723699  10.902000  5.767973   9.873918  6.189973   
1970-12-31  12.600329  10.726932  11.730247  6.217178  10.567370  7.609452   
1971-12-31  11.273


Weekly Statistics
Reduce the dataset to a weekly frequency for each location, aggregating the data points to represent each week (by getting the mean values).

The weekly means should be the following:

RPT	VAL	ROS
Yr_Mo_Dy
1960-12-26/1961-01-01	15.040000	14.960000	13.170000
1961-01-02/1961-01-08	13.541429	11.486667	10.487143
1961-01-09/1961-01-15	12.468571	8.967143	11.958571
1961-01-16/1961-01-22	13.204286	9.862857	12.982857
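
The row labels above are week spans rather than single dates. As a sketch, converting the resampled index to weekly periods produces labels of that form. The eight-day `sample` frame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical sample: eight consecutive days starting Sunday 1961-01-01
idx = pd.date_range("1961-01-01", periods=8, freq="D", name="Yr_Mo_Dy")
sample = pd.DataFrame({"RPT": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]}, index=idx)

# resample("W") bins into weeks ending on Sunday;
# to_period("W") relabels each bin with its Monday-to-Sunday span
weekly = sample.resample("W").mean().to_period("W")
print(weekly)
```

The first week contains only 1961-01-01, because that date is a Sunday and closes the week that began on 1960-12-26.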


In [29]:
# Resample the data to weekly frequency and calculate the mean for each week
weekly_data = data.resample('W').mean()

# Print the first few rows of the weekly data to verify the changes
print(weekly_data.head())


                  RPT        VAL        ROS        KIL        SHA        BIR  \
Yr_Mo_Dy                                                                       
1961-01-01  15.040000  14.960000  13.170000   9.290000        NaN   9.870000   
1961-01-08  13.541429  11.486667  10.487143   6.417143   9.474286   6.435714   
1961-01-15  12.468571   8.967143  11.958571   4.630000   7.351429   5.072857   
1961-01-22  13.204286   9.862857  12.982857   6.328571   8.966667   7.417143   
1961-01-29  19.880000  16.141429  18.225714  12.720000  17.432857  14.828571   

                  DUB        CLA        MUL        CLO        BEL        MAL  
Yr_Mo_Dy                                                                      
1961-01-01  13.670000  10.250000  10.830000  12.580000  18.500000  15.040000  
1961-01-08  11.061429   6.616667   8.434286   8.497143  12.481429  13.238571  
1961-01-15   7.535714   6.820000   5.712857   7.571429  11.125714  11.024286  
1961-01-22   9.257143   7.875714   7.145714 

More Weekly Statistics
Compute the minimum, maximum, average wind speeds, and standard deviations of the wind speeds across all locations for each week, starting from the week of January 2, 1961, covering the initial 52 weeks.

The result is the following:

RPT				VAL
min	max	mean	std	min	max	mean	std
Yr_Mo_Dy
1961-01-08	10.58	18.50	13.541429	2.631321	6.63	16.88	11.486667	3.949525
1961-01-15	9.04	19.75	12.468571	3.555392	3.54	12.08	8.967143	3.148945
1961-01-22	4.92	19.83	13.204286	5.337402	3.42	14.37	9.862857	3.837785
1961-01-29	13.62	25.04	19.880000	4.619061	9.96	23.91	16.141429	5.170224
1961-02-05	10.58	24.21	16.827143	5.251408	9.46	24.21	15.460000	5.187395
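
The table above has four statistics per location, computed from the daily values in each week. A minimal sketch of one way to produce that layout follows. The 15-day `sample` frame is hypothetical, and `weekly_by_location` is an illustrative name.

```python
import pandas as pd

# Hypothetical sample: 15 consecutive days for two locations
idx = pd.date_range("1961-01-01", periods=15, freq="D", name="Yr_Mo_Dy")
values = [float(v) for v in range(1, 16)]
sample = pd.DataFrame({"RPT": values, "VAL": [2 * v for v in values]}, index=idx)

# Start on Monday 1961-01-02, aggregate the daily values of each week
# per location, and keep the first 52 weeks
weekly_by_location = (
    sample.loc["1961-01-02":]
    .resample("W")
    .agg(["min", "max", "mean", "std"])
    .head(52)
)
print(weekly_by_location)
```

The columns form a two-level index of (location, statistic), so a single series can be selected with, for example, `weekly_by_location[("RPT", "mean")]`.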


In [33]:
# Filter the data to start from January 2, 1961
weekly_data = weekly_data.loc['1961-01-02':'1962-01-01']

# Summarize the weekly means across locations: one min/max/mean/std row per week
weekly_stats = weekly_data.agg(['min', 'max', 'mean', 'std'], axis=1)

# Print the weekly descriptive statistics
print("Weekly descriptive statistics for wind speeds:\n", weekly_stats.head(52))


Weekly descriptive statistics for wind speeds:
                   min        max       mean       std
Yr_Mo_Dy                                             
1961-01-08   6.417143  13.541429   9.847659  2.601705
1961-01-15   4.630000  12.468571   8.353214  2.719649
1961-01-22   6.328571  13.204286   9.368413  2.224531
1961-01-29  12.720000  22.530000  16.958095  2.915635
1961-02-05   8.247143  16.827143  11.800357  2.807310
1961-02-12  10.774286  21.832857  15.891548  3.147412
1961-02-19   9.542857  21.167143  13.726825  3.105819
1961-02-26   8.524286  16.304286  12.604286  2.364323
1961-03-05   7.834286  17.842857  11.766766  2.535336
1961-03-12   6.881429  16.701429  10.612579  2.746233
1961-03-19   7.084286  19.350000  11.756310  3.320318
1961-03-26   6.648571  18.134286  10.462857  3.071975
1961-04-02   7.300000  13.900000  10.268433  1.883742
1961-04-09   5.958571  13.607143   9.412381  2.399840
1961-04-16   4.947143   9.482857   6.845595  1.803831
1961-04-23   7.768571  13.620000  