Heat-wave days were defined following Nissan et al. (2017), who showed an increase of about 20% in all-cause mortality in
Bangladesh when both day- and night-time temperatures remained elevated above the 95th percentile for at least three consecutive
days. Percentiles were defined across all days during the study period, 1979–2005.


Here, day- and night-time temperatures refer to the daily maximum (T_max) and daily minimum (T_min) temperatures, respectively.
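
One way to write this definition down formally is sketched below. The symbols are assumptions introduced here for illustration, not notation from Nissan et al. (2017): \(h_d\) marks a hot day, \(q^{\max}_{95}\) and \(q^{\min}_{95}\) are the 95th percentiles of T_max and T_min, and \(\mathrm{HW}_d\) marks a heat-wave day. Whether every day of a qualifying run counts, or only the third day onward, is an interpretation; this sketch flags every day of the run, as the detection loop later in this notebook does.

\[
h_d = \mathbf{1}\left[\, T^{\max}_d > q^{\max}_{95} \ \text{and}\ T^{\min}_d > q^{\min}_{95} \,\right],
\qquad
\mathrm{HW}_d = \max_{j \in \{0,1,2\}} \; h_{d-j}\, h_{d-j+1}\, h_{d-j+2}
\]

Each product is 1 only when the three-day window starting at day \(d-j\) is hot throughout, so \(\mathrm{HW}_d = 1\) exactly when day \(d\) lies inside at least one such window.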


# Percentile 



A percentile is a statistical measure used to describe the relative position of a data point in a dataset. It indicates the value below which a given percentage of observations falls.

Here's a breakdown:

1. **Percentiles in Everyday Language**: You're likely familiar with percentiles in everyday language. For example, if you scored in the 90th percentile on a test, it means you scored better than 90% of the people who took the test.

2. **Mathematical Definition**: In statistics, the pth percentile of a dataset is the value below which p% of the data falls. For example, the 25th percentile (also known as the first quartile) is the value below which 25% of the data falls.

3. **Median and Quartiles**: The 50th percentile is a special case called the median, which is the middle value when the data is sorted. The 25th and 75th percentiles are called the first and third quartiles respectively.

4. **Percentile Formula**: If you have data sorted in ascending order, you can find the pth percentile using the following formula:
   
   - For a continuous dataset: 
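     - \(P = \frac{n-1}{100} \times p\), where \(n\) is the total number of data points and \(P\) is a zero-based position in the sorted data.
     - The pth percentile corresponds to the value at the index \(\text{round}(P)\) in the sorted data. When \(P\) is not an integer, pandas and NumPy by default interpolate linearly between the two neighbouring values instead of rounding.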
   
   - For a discrete dataset:
     - The pth percentile is the value in the dataset such that \(p\) percent of the data is less than or equal to that value.

5. **Use Cases**:
   - Percentiles are used in various fields. For example, in healthcare, percentiles are used to assess growth and development in children. In finance, percentiles are used to analyze income distributions.

6. **Interpretation**:
   - If a data point is at the 90th percentile, it means it's higher than 90% of the other data points.
   - If a data point is at the 10th percentile, it means it's higher than only 10% of the other data points.

Percentiles are particularly useful for understanding the distribution of data and identifying outliers or extreme values. They provide a more nuanced view than just looking at averages or medians.
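
As a small illustration of the formula in item 4, the sketch below compares the rounded-position rule with NumPy's default linear interpolation. The helper `percentile_nearest_rank` and the sample values are made up for this example and are not part of the notebook's analysis.

```python
import numpy as np

def percentile_nearest_rank(values, p):
    """Percentile via the rounded position P = (n - 1) * p / 100 (hypothetical helper)."""
    data = np.sort(np.asarray(values, dtype=float))
    position = (len(data) - 1) * p / 100  # zero-based position in the sorted data
    # Note: Python's round() rounds exact halves to the nearest even integer
    return data[int(round(position))]

sample = [15, 20, 35, 40, 50]  # illustrative values only

median_rank = percentile_nearest_rank(sample, 50)  # position 2.0 -> 35.0
p95_rank = percentile_nearest_rank(sample, 95)     # position 3.8 -> index 4 -> 50.0
p95_linear = np.percentile(sample, 95)             # 40 + 0.8 * (50 - 40) = 48.0

print(median_rank, p95_rank, p95_linear)
```

The two rules agree when the position is an integer and can differ otherwise, which is worth keeping in mind when comparing thresholds computed by different tools.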

In [60]:
import pandas as pd

# Directory holding the individual preprocessed CSV files
DATA_DIR = '/mnt/1A42C1DD42C1BE2F/MyProjects/ML_HEATWAVE/Preprocessed Data/indv df v1'

t2m_df = pd.read_csv(f'{DATA_DIR}/t2m_df.csv')
tmax_df = pd.read_csv(f'{DATA_DIR}/tmax_df.csv')
tmin_df = pd.read_csv(f'{DATA_DIR}/tmin_df.csv')
surface_pressure_df = pd.read_csv(f'{DATA_DIR}/surface_pressure_df.csv')
slp_df = pd.read_csv(f'{DATA_DIR}/slp_df.csv')
precp_df = pd.read_csv(f'{DATA_DIR}/precp_df.csv')
uwind_df = pd.read_csv(f'{DATA_DIR}/uwind_df.csv')
vwind_df = pd.read_csv(f'{DATA_DIR}/vwind_df.csv')
geo_hgt_df = pd.read_csv(f'{DATA_DIR}/geo_hgt_df.csv')
rhum_df = pd.read_csv(f'{DATA_DIR}/rhum_df.csv')
air_temp_df = pd.read_csv(f'{DATA_DIR}/air_temp_df.csv')
shum_df = pd.read_csv(f'{DATA_DIR}/shum_df.csv')


In [61]:
dfs = [t2m_df, tmax_df, tmin_df, surface_pressure_df, slp_df, precp_df,
       uwind_df, vwind_df, geo_hgt_df, rhum_df, air_temp_df, shum_df]
# Initialize the merged_df with the first DataFrame in the list
merged_df = dfs[0]

# Merge the remaining DataFrames one by one; with no `on` argument,
# pd.merge joins on every column name the two frames share
for other_df in dfs[1:]:
    merged_df = pd.merge(merged_df, other_df)

# Now merged_df contains the merged result
merged_df

Unnamed: 0.1,Unnamed: 0,time,lat,lon,t2m,tmax,tmin,Surface Pressure,slp,prate,...,rhum_500hpa,rhum_300hpa,air_temp_1000hpa,air_temp_850hpa,air_temp_500hpa,air_temp_300hpa,shum_1000hpa,shum_850hpa,shum_500hpa,shum_300hpa
0,0,1981-01-01,25.0,90.0,286.04797,293.80746,280.11120,100540.0,101615.0,2.675628e-06,...,6.50,11.25,290.12500,283.15002,262.40002,241.67500,0.009118,0.006580,0.000222,0.000119
1,1,1981-01-01,22.5,92.5,280.86148,291.43850,272.60922,96700.0,101680.0,4.526225e-06,...,5.25,9.25,287.44998,282.30002,262.12500,241.72500,0.007985,0.006525,0.000172,0.000087
2,2,1981-01-01,25.0,90.0,292.60080,296.82843,288.97372,101940.0,101453.0,1.624216e-07,...,7.75,5.25,290.87500,283.75000,263.97500,242.92502,0.007218,0.004200,0.000279,0.000052
3,3,1981-01-01,22.5,92.5,285.08072,294.98502,277.57010,95940.0,101483.0,1.415758e-07,...,6.75,1.50,288.59998,283.60000,264.25000,242.97500,0.006265,0.004450,0.000263,0.000065
4,4,1981-01-02,25.0,90.0,287.01675,294.82352,279.39893,100470.0,101550.0,6.726953e-06,...,11.25,9.75,291.22500,282.82500,261.82500,240.87500,0.009428,0.005455,0.000372,0.000080
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
61355,61355,2022-12-30,22.5,92.5,285.86066,297.28740,275.80807,96225.0,101720.0,0.000000e+00,...,11.75,11.25,291.50000,283.75000,267.52496,243.67499,0.005520,0.002625,0.000588,0.000129
61356,61356,2022-12-31,25.0,90.0,285.81174,295.73610,276.47375,100815.0,101907.5,1.094083e-06,...,7.00,17.00,291.34998,282.12497,263.47498,242.42499,0.008135,0.004732,0.000258,0.000170
61357,61357,2022-12-31,22.5,92.5,285.01230,295.29507,275.02954,96885.0,101807.5,2.083968e-07,...,7.25,15.00,290.37500,282.27496,263.12500,243.50000,0.007278,0.005328,0.000258,0.000166
61358,61358,2022-12-31,25.0,90.0,292.70038,298.03186,288.64160,102212.5,101722.5,0.000000e+00,...,4.50,13.75,292.59998,284.32498,266.82498,243.34998,0.006417,0.003502,0.000217,0.000151
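
Because the merge above joins on every shared column, leftover index columns such as `Unnamed: 0` silently become part of the join key. A minimal sketch of a stricter variant follows, assuming that `time`, `lat` and `lon` are the intended keys and that each key combination occurs once per file. The preview above shows the same (time, lat, lon) on more than one row, so that assumption may not hold for this data; in that case `validate="one_to_one"` raises an error rather than duplicating rows. The helper `merge_on_keys` and the toy frames are hypothetical.

```python
from functools import reduce

import pandas as pd

KEYS = ["time", "lat", "lon"]  # assumed join keys

def merge_on_keys(frames, keys=KEYS):
    """Drop leftover 'Unnamed' index columns, then merge all frames on explicit keys."""
    cleaned = [f.loc[:, ~f.columns.str.startswith("Unnamed")] for f in frames]
    return reduce(
        lambda left, right: pd.merge(left, right, on=keys, how="inner",
                                     validate="one_to_one"),
        cleaned,
    )

# Toy frames standing in for two of the CSV files
a = pd.DataFrame({"Unnamed: 0": [0, 1],
                  "time": ["2021-01-01", "2021-01-02"],
                  "lat": [25.0, 25.0], "lon": [90.0, 90.0],
                  "tmax": [300.0, 301.0]})
b = pd.DataFrame({"Unnamed: 0": [0, 1],
                  "time": ["2021-01-01", "2021-01-02"],
                  "lat": [25.0, 25.0], "lon": [90.0, 90.0],
                  "tmin": [290.0, 291.0]})

merged = merge_on_keys([a, b])
print(merged)
```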


In [62]:
df=merged_df

In [63]:
def remove_column(df, column_to_remove):
    # Remove the specified column from the DataFrame
    df = df.drop(column_to_remove, axis=1)
    return df


column_to_remove = 'Unnamed: 0'
df = remove_column(df, column_to_remove)

In [64]:
df

Unnamed: 0,time,lat,lon,t2m,tmax,tmin,Surface Pressure,slp,prate,daily_precp,...,rhum_500hpa,rhum_300hpa,air_temp_1000hpa,air_temp_850hpa,air_temp_500hpa,air_temp_300hpa,shum_1000hpa,shum_850hpa,shum_500hpa,shum_300hpa
0,1981-01-01,25.0,90.0,286.04797,293.80746,280.11120,100540.0,101615.0,2.675628e-06,0.231174,...,6.50,11.25,290.12500,283.15002,262.40002,241.67500,0.009118,0.006580,0.000222,0.000119
1,1981-01-01,22.5,92.5,280.86148,291.43850,272.60922,96700.0,101680.0,4.526225e-06,0.391066,...,5.25,9.25,287.44998,282.30002,262.12500,241.72500,0.007985,0.006525,0.000172,0.000087
2,1981-01-01,25.0,90.0,292.60080,296.82843,288.97372,101940.0,101453.0,1.624216e-07,0.014033,...,7.75,5.25,290.87500,283.75000,263.97500,242.92502,0.007218,0.004200,0.000279,0.000052
3,1981-01-01,22.5,92.5,285.08072,294.98502,277.57010,95940.0,101483.0,1.415758e-07,0.012232,...,6.75,1.50,288.59998,283.60000,264.25000,242.97500,0.006265,0.004450,0.000263,0.000065
4,1981-01-02,25.0,90.0,287.01675,294.82352,279.39893,100470.0,101550.0,6.726953e-06,0.581209,...,11.25,9.75,291.22500,282.82500,261.82500,240.87500,0.009428,0.005455,0.000372,0.000080
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
61355,2022-12-30,22.5,92.5,285.86066,297.28740,275.80807,96225.0,101720.0,0.000000e+00,0.000000,...,11.75,11.25,291.50000,283.75000,267.52496,243.67499,0.005520,0.002625,0.000588,0.000129
61356,2022-12-31,25.0,90.0,285.81174,295.73610,276.47375,100815.0,101907.5,1.094083e-06,0.094529,...,7.00,17.00,291.34998,282.12497,263.47498,242.42499,0.008135,0.004732,0.000258,0.000170
61357,2022-12-31,22.5,92.5,285.01230,295.29507,275.02954,96885.0,101807.5,2.083968e-07,0.018005,...,7.25,15.00,290.37500,282.27496,263.12500,243.50000,0.007278,0.005328,0.000258,0.000166
61358,2022-12-31,25.0,90.0,292.70038,298.03186,288.64160,102212.5,101722.5,0.000000e+00,0.000000,...,4.50,13.75,292.59998,284.32498,266.82498,243.34998,0.006417,0.003502,0.000217,0.000151


In [65]:
df['time']=pd.to_datetime(df['time'])

In [66]:
df.set_index('time',inplace=True)


In [67]:
df

time,lat,lon,t2m,tmax,tmin,Surface Pressure,slp,prate,daily_precp,10m_uwind,...,rhum_500hpa,rhum_300hpa,air_temp_1000hpa,air_temp_850hpa,air_temp_500hpa,air_temp_300hpa,shum_1000hpa,shum_850hpa,shum_500hpa,shum_300hpa
1981-01-01,25.0,90.0,286.04797,293.80746,280.11120,100540.0,101615.0,2.675628e-06,0.231174,-0.939862,...,6.50,11.25,290.12500,283.15002,262.40002,241.67500,0.009118,0.006580,0.000222,0.000119
1981-01-01,22.5,92.5,280.86148,291.43850,272.60922,96700.0,101680.0,4.526225e-06,0.391066,-0.572421,...,5.25,9.25,287.44998,282.30002,262.12500,241.72500,0.007985,0.006525,0.000172,0.000087
1981-01-01,25.0,90.0,292.60080,296.82843,288.97372,101940.0,101453.0,1.624216e-07,0.014033,-2.372857,...,7.75,5.25,290.87500,283.75000,263.97500,242.92502,0.007218,0.004200,0.000279,0.000052
1981-01-01,22.5,92.5,285.08072,294.98502,277.57010,95940.0,101483.0,1.415758e-07,0.012232,-0.787714,...,6.75,1.50,288.59998,283.60000,264.25000,242.97500,0.006265,0.004450,0.000263,0.000065
1981-01-02,25.0,90.0,287.01675,294.82352,279.39893,100470.0,101550.0,6.726953e-06,0.581209,-0.938654,...,11.25,9.75,291.22500,282.82500,261.82500,240.87500,0.009428,0.005455,0.000372,0.000080
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-12-30,22.5,92.5,285.86066,297.28740,275.80807,96225.0,101720.0,0.000000e+00,0.000000,0.724410,...,11.75,11.25,291.50000,283.75000,267.52496,243.67499,0.005520,0.002625,0.000588,0.000129
2022-12-31,25.0,90.0,285.81174,295.73610,276.47375,100815.0,101907.5,1.094083e-06,0.094529,0.112595,...,7.00,17.00,291.34998,282.12497,263.47498,242.42499,0.008135,0.004732,0.000258,0.000170
2022-12-31,22.5,92.5,285.01230,295.29507,275.02954,96885.0,101807.5,2.083968e-07,0.018005,0.021775,...,7.25,15.00,290.37500,282.27496,263.12500,243.50000,0.007278,0.005328,0.000258,0.000166
2022-12-31,25.0,90.0,292.70038,298.03186,288.64160,102212.5,101722.5,0.000000e+00,0.000000,-0.440561,...,4.50,13.75,292.59998,284.32498,266.82498,243.34998,0.006417,0.003502,0.000217,0.000151


In [68]:
df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 61360 entries, 1981-01-01 to 2022-12-31
Data columns (total 27 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   lat               61360 non-null  float64
 1   lon               61360 non-null  float64
 2   t2m               61360 non-null  float64
 3   tmax              61360 non-null  float64
 4   tmin              61360 non-null  float64
 5   Surface Pressure  61360 non-null  float64
 6   slp               61360 non-null  float64
 7   prate             61360 non-null  float64
 8   daily_precp       61360 non-null  float64
 9   10m_uwind         61360 non-null  float64
 10  10m_vwind         61360 non-null  float64
 11  geo_hgt_1000hpa   61360 non-null  float64
 12  geo_hgt_850hpa    61360 non-null  float64
 13  geo_hgt_500hpa    61360 non-null  float64
 14  geo_hgt_300hpa    61360 non-null  float64
 15  rhum_1000hpa      61360 non-null  float64
 16  rhum_850hpa       61360

In [69]:
df.index.min(), df.index.max()

(Timestamp('1981-01-01 00:00:00'), Timestamp('2022-12-31 00:00:00'))

Heat wave during 21-27 March 2021
Heat wave during 11-15 April 2021
Heat wave during 24-30 April 2021
Heat wave during 15-24 May 2021

In [74]:
df.loc['2021-05-15':'2021-05-24']

time,lat,lon,t2m,tmax,tmin,Surface Pressure,slp,prate,daily_precp,10m_uwind,...,rhum_500hpa,rhum_300hpa,air_temp_1000hpa,air_temp_850hpa,air_temp_500hpa,air_temp_300hpa,shum_1000hpa,shum_850hpa,shum_500hpa,shum_300hpa
2021-05-15,25.0,90.0,298.17624,302.0728,292.88605,99637.5,100690.0,6.908354e-05,5.968818,0.087481,...,9.25,22.5,300.84998,292.37497,268.425,246.32498,0.01699,0.010042,0.000489,0.000324
2021-05-15,22.5,92.5,295.06686,299.62427,290.39923,95937.5,100735.0,7.21472e-05,6.233518,0.401998,...,12.75,26.75,298.95,291.025,268.74997,246.47498,0.019283,0.01063,0.000695,0.00039
2021-05-15,25.0,90.0,303.37057,306.31326,299.00592,101032.5,100590.0,0.0,0.0,1.387169,...,0.0,19.0,302.425,293.875,270.175,245.99998,0.017125,0.009787,0.0,0.000269
2021-05-15,22.5,92.5,298.8169,303.80994,293.17493,95342.5,100707.5,2.505306e-05,2.164584,1.382105,...,6.0,20.25,301.07498,292.525,270.44995,246.32498,0.018218,0.01135,0.000359,0.000294
2021-05-16,25.0,90.0,300.0233,306.03564,294.12305,99670.0,100717.5,5.626714e-06,0.486148,0.318702,...,9.5,23.0,301.375,292.425,270.225,246.125,0.015652,0.010622,0.000575,0.00033
2021-05-16,22.5,92.5,295.88474,299.4533,291.4573,95977.5,100760.0,4.307547e-05,3.72172,0.379008,...,18.5,24.0,299.4,290.84998,270.625,246.12497,0.01933,0.011222,0.001165,0.000342
2021-05-16,25.0,90.0,303.79425,307.42624,300.36252,101120.0,100675.0,0.0,0.0,1.883998,...,6.0,24.25,302.7,293.925,271.025,245.92497,0.016348,0.009028,0.00039,0.000342
2021-05-16,22.5,92.5,299.61835,305.10757,293.91693,95397.5,100752.5,2.154912e-05,1.861844,1.764873,...,9.5,10.25,301.69998,292.8,271.45,245.84998,0.017505,0.01041,0.000648,0.000142
2021-05-17,25.0,90.0,301.12653,308.1484,294.4106,99612.5,100660.0,0.0,0.0,0.540554,...,23.0,13.25,302.025,292.72498,270.75,246.65001,0.01393,0.009983,0.001472,0.000199
2021-05-17,22.5,92.5,297.65652,303.01553,291.8659,95950.0,100720.0,3.928179e-06,0.339395,0.601909,...,17.0,13.25,300.8,291.84998,271.625,246.625,0.017657,0.011633,0.001156,0.000199


In [75]:
df['tmax'].describe(percentiles=[.5,.75,.90,.95])

count    61360.000000
mean       301.154367
std          3.810131
min        274.416500
50%        300.630160
75%        302.781700
90%        306.481320
95%        308.236734
max        323.688230
Name: tmax, dtype: float64

In [72]:
df['tmin'].describe(percentiles=[.5,.75,.95])

count    61360.000000
mean       290.876644
std          6.732665
min        266.896030
50%        292.849315
75%        295.839420
95%        299.799700
max        302.774630
Name: tmin, dtype: float64

In [5]:
temperature_data=df['tmax']
percentile_95=temperature_data.quantile(0.95)
percentile_95

310.690346

In [6]:
df['tmin'].describe(percentiles=[.5,.75,.95])

count    306800.000000
mean        288.582205
std           9.843414
min         247.475740
50%         290.831800
75%         296.321253
95%         300.766862
max         307.399380
Name: tmin, dtype: float64

In [7]:
temperature_data=df['tmin']
percentile_95=temperature_data.quantile(0.95)
percentile_95

300.76686249999995
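
The definition at the top of this notebook takes percentiles over a fixed study period (1979–2005), whereas the cells above use every row in the table. If a baseline window is wanted, a sketch along the following lines could restrict the calculation. The helper `baseline_thresholds` and the toy data are hypothetical, and the data loaded here starts in 1981, so the full 1979–2005 window would not be available.

```python
import pandas as pd

def baseline_thresholds(frame, start, end, q=0.95):
    """Return (tmax, tmin) quantiles using only rows with start <= time <= end."""
    time = pd.to_datetime(frame["time"])
    in_window = (time >= pd.Timestamp(start)) & (time <= pd.Timestamp(end))
    base = frame[in_window]
    return base["tmax"].quantile(q), base["tmin"].quantile(q)

# Toy data: five days, with a three-day baseline window
demo = pd.DataFrame({
    "time": pd.date_range("2004-12-30", periods=5),
    "tmax": [300.0, 302.0, 304.0, 306.0, 308.0],
    "tmin": [290.0, 291.0, 292.0, 293.0, 294.0],
})

tmax_thr, tmin_thr = baseline_thresholds(demo, "2004-12-30", "2005-01-01", q=0.5)
print(tmax_thr, tmin_thr)
```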

In [8]:
df[df['tmax'] > 310].count()


time                18066
lat                 18066
lon                 18066
t2m                 18066
tmax                18066
tmin                18066
Surface Pressure    18066
slp                 18066
prate               18066
daily_precp         18066
10m_uwind           18066
10m_vwind           18066
geo_hgt_1000hpa     18066
geo_hgt_850hpa      18066
geo_hgt_500hpa      18066
geo_hgt_300hpa      18066
rhum_1000hpa        18066
rhum_850hpa         18066
rhum_500hpa         18066
rhum_300hpa         18066
air_temp_1000hpa    18066
air_temp_850hpa     18066
air_temp_500hpa     18066
air_temp_300hpa     18066
shum_1000hpa        18066
shum_850hpa         18066
shum_500hpa         18066
shum_300hpa         18066
dtype: int64

In [9]:
lat_values=df.lat.unique()
lon_values=df.lon.unique()
dates=df.time.unique()

In [10]:
from tqdm.auto import tqdm

# 95th-percentile thresholds (K) taken from the quantile cells above
TMAX_P95 = 310.690346
TMIN_P95 = 300.76686249999995

for lat in tqdm(lat_values, desc='Latitudes', leave=True):
    for lon in tqdm(lon_values, desc='Longitudes', leave=False):
        # Reset the run at every grid point so a hot spell cannot
        # carry over from one location to the next
        consecutive_days = 0
        index_list = []
        grid_mask = (df['lat'] == lat) & (df['lon'] == lon)

        for date in tqdm(dates, desc='Dates', leave=False):
            # Rows for this grid point on this date
            rows = df[grid_mask & (df['time'] == date)]

            if (rows['tmax'].values >= TMAX_P95).any() and (rows['tmin'].values >= TMIN_P95).any():
                consecutive_days += 1
                index_list.extend(rows.index)
            else:
                consecutive_days = 0
                index_list = []

            if consecutive_days >= 3:
                # Flag every day of the current run as a heat-wave day
                df.loc[index_list, 'Heatwave'] = 1
                if consecutive_days == 3:
                    # Print the row index of the first day of the run
                    print(index_list[0])


2945
2945
68665
98065
98065
105305
105505
105505
105505
105505
105505
112545
141705
141705
185605
185605
229945
229945
244185
244185
244485
258325
258325
258325
258325
273525
273525
287785
302145
302145
302545
302765
302765
302765


2730
2930
2930
2930
2930
2930
2930
2930
17330
17330
17330
17330
17330
17770
31910
32250
39230
39230
39230
53790
61530
68530
68530
68530
68530
68530
68530
68530
68530
68530
68530
68530
68530
68530
68530
68530
97970
97970
97970
97970
97970
97970
97970
112530
112530
112630
119830
141690
141690
141690
163890
163890
163890
170830
170830
178070
185610
193090
214750
214750
244210
244370
244370
244370
244370
244370
244370
244370
244370
251210
251210
251210
258350
280810
280810
302770


3175
3175
3175
3175
3175
3175
3175
3175
9855
9855
17015
17655
17655
17655
17655
17655
17655
24435
24575
24575
24575
24575
24575
24935
31615
32535
32535
39275
39275
39275
39695
39695
39695
46635
46635
46635
46635
46635
46635
46635
46635
46635
46635
46915
46915
47215
53655
53655
53655
53655
54135
54235
54235
54455
68975
68975
75555
75555
75755
75755
75755
75875
75875
75875
83355
89975
89975
89975
89975
89975
89975
89975
89975
89975
89975
89975
90275
90655
90895
97415
97415
97415
97415
97995
97995
105395
105395
105395
105395
105395
112255
112255
112475
112475
112475
119755
119755
119755
119755
119755
119755
119755
119755
119755
119755
119755
119755
119755
119755
119755
119755
119755
120215
126375
126375
126375
126375
126615
126615
127075
127075
127075
127075
127075
127075
127075
127075
127075
127075
127075
127075
127075
127075
127075
127075
127075
133995
133995
134195
134195
134195
134415
134415
134415
134415
134415
134415
134615
134615
141495
141495
141495
141495
141655
141655
148435
148

2691
2691
9891
9891
16951
16951
16951
17111
17111
17371
17371
17371
24571
32251
32251
53771
54011
54011
54011
54011
68651
75491
97991
97991
112531
126991
126991
133911
133911
133911
141711
141711
163891
163891
163891
170691
170691
178071
214451
214451
214451
214931
229531
229531
236291
236291
244211
244391
244471
244471
244471
251211
251211
251211
258371
280291
280291
280291
280431
280851
280851
280851


In [11]:
# df.to_csv('/mnt/1A42C1DD42C1BE2F/MyProjects/ML_HEATWAVE/Preprocessed Data/testdfHW v1.csv')

In [12]:
import pandas as pd

df=pd.read_csv('/mnt/1A42C1DD42C1BE2F/MyProjects/ML_HEATWAVE/Preprocessed Data/testdf v3.csv')

In [13]:
df

Unnamed: 0.2,Unnamed: 0.1,Unnamed: 0,time,lat,lon,t2m,tmax,tmin,Surface Pressure,slp,...,rhum_300hpa,air_temp_1000hpa,air_temp_850hpa,air_temp_500hpa,air_temp_300hpa,shum_1000hpa,shum_850hpa,shum_500hpa,shum_300hpa,Heatwave
0,0,0,1981-01-01,27.5,85.0,279.42917,285.85928,263.38593,80000.0,101135.0,...,17.50,292.92500,283.87500,260.90002,238.57501,0.007865,0.005078,0.000320,0.000132,
1,1,1,1981-01-01,25.0,87.5,274.31250,281.47565,263.82330,72370.0,101170.0,...,8.50,292.90002,284.07500,260.65002,238.77501,0.008783,0.005775,0.000120,0.000063,
2,2,2,1981-01-01,22.5,90.0,273.32410,280.04720,268.87910,71430.0,101365.0,...,3.50,291.10000,283.42500,260.22504,238.80000,0.009350,0.006583,0.000249,0.000033,
3,3,3,1981-01-01,20.0,92.5,275.79236,281.85210,270.35547,78660.0,101635.0,...,2.25,289.20000,282.60000,259.57500,238.47500,0.010030,0.007505,0.000610,0.000022,
4,4,4,1981-01-01,27.5,95.0,278.98663,283.70680,272.85193,86430.0,101815.0,...,4.50,289.22498,282.55000,258.80000,237.87500,0.011783,0.008733,0.000667,0.000031,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
306795,306795,306795,2022-12-31,20.0,85.0,295.16550,301.03357,287.86496,100435.0,101760.0,...,12.00,297.17500,287.59998,268.22498,242.40000,0.010760,0.006025,0.000693,0.000120,
306796,306796,306796,2022-12-31,27.5,87.5,296.73250,297.56732,295.06418,102035.0,101815.0,...,12.75,295.50000,287.02500,268.34998,242.74998,0.010610,0.004300,0.000657,0.000133,
306797,306797,306797,2022-12-31,25.0,90.0,295.04993,295.60022,294.09973,102087.5,101847.5,...,13.75,293.94998,286.62497,268.44998,242.99998,0.008788,0.002942,0.000576,0.000148,
306798,306798,306798,2022-12-31,22.5,92.5,293.97375,296.33350,290.93094,100190.0,101787.5,...,13.00,293.67500,286.69998,268.40000,243.09998,0.008037,0.002800,0.000610,0.000142,


# We should use this map extent: [lon_min, lon_max, lat_min, lat_max]
ax.set_extent([88, 92.7, 20.7, 26.7])