<a id="top"></a>
# <div style="padding:20px;color:white;margin:0;font-size:35px;font-family:Georgia;text-align:center;display:fill;border-radius:5px;background-color:#254E58;overflow:hidden"><b>Woman Life Freedom</b></div>

<div style="text-align:center;">
  <img src="https://www.cfg.polis.cam.ac.uk/sites/www.cfg.polis.cam.ac.uk/files/styles/leading/public/shutterstock_2214441509.png?itok=8kwjDfB1" alt="woman_life_freedom">
</div>

<div style="text-align: justify;">
This notebook is dedicated to the brave women of Iran who are fighting for their freedom. Despite facing significant obstacles, Iranian women have continued to stand up for their rights and demand greater freedom and equality. We recognize that the struggle for women's rights is ongoing and that there is much work to be done. By supporting the fight for women's life and freedom, we are hoping to create a better future for all, one in which every person has the opportunity to live a free and fulfilling life.</div>

# <div style="padding:20px;color:white;margin:0;font-size:35px;font-family:Georgia;text-align:left;display:fill;border-radius:5px;background-color:#254E58;overflow:hidden"><b>WiDS 2023</b></div>

<a id="1.2"></a>
<h2 style="font-family: Verdana; font-size: 20px; font-style: normal; font-weight: normal; text-decoration: none; text-transform: none; letter-spacing: 2px; color: #155D07; background-color: #ffffff;"><b>WiDS 2023</b> Bayesian Optimization for CatBoost Hyperparameter Tuning and ...</h2>

<div style="text-align: justify;">In this Kaggle notebook, we have employed a variety of advanced machine learning techniques to improve our model's performance. Firstly, we have used <b>Bayesian optimization</b> to tune the hyperparameters of our <b>CatBoost model</b>, which is a powerful gradient boosting algorithm. This approach enables us to automatically search the hyperparameter space, saving us significant amounts of time and manual effort. 
Furthermore, we have utilized <b>Adversarial Validation</b>, a technique that involves training a classifier to differentiate between training and test data. If the classifier separates the two sets easily, their feature distributions differ, and the features it relies on most are candidates for removal, which helps the final model generalize to the test data.
Additionally, we have implemented <b>Explainable Machine Learning using SHAP</b>, a method that provides insights into how different features affect our model's predictions. This enables us to understand and interpret our model's behavior more effectively, improving our ability to make informed decisions based on its outputs.
We have also employed <b>Pseudo Labeling</b>, a technique that involves using a model's predictions on unlabeled data to generate new labeled data. This approach can significantly increase the amount of training data available to us, improving our model's performance on the test data.
Lastly, we have used <b>Ensemble Learning</b>, which involves combining the outputs of multiple models to create a final prediction. This approach can improve the overall accuracy and robustness of our model, particularly in cases where individual models may struggle to capture the full complexity of the data.</div>

# <div style="padding:20px;color:white;margin:0;font-size:35px;font-family:Georgia;text-align:left;display:fill;border-radius:5px;background-color:#254E58;overflow:hidden"><b>Table of Contents</b></div>

<div style="background-color:aliceblue; padding:30px; font-size:15px;color:#034914">
    
<a id="TOC"></a>
## Table of Contents
* [Importing Required Libraries](#lib)
* [Reading Dataset](#read_data)
* [Processing Dataset](#process)
* [Splitting the Dataset](#split)
* [Adversarial Validation](#adv)
* [Bayesian Optimization for CatBoost](#bocat)
* [Feature Importance](#fi)
* [Explainability](#xml)
* [Pseudo Labeling](#PL)
* [Ensemble Learning](#EL)
* [Submission](#submit)
* [List of Kaggle Notebooks Used as a Reference](#list)

<a id="lib"></a>
# <div style="padding:20px;color:white;margin:0;font-size:35px;font-family:Georgia;text-align:left;display:fill;border-radius:5px;background-color:#254E58;overflow:hidden"><b>Importing Required Libraries</b></div> 

In [2]:
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', None)
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import LabelEncoder
from catboost import CatBoostRegressor

<a id="read_data"></a>
# <div style="padding:20px;color:white;margin:0;font-size:35px;font-family:Georgia;text-align:left;display:fill;border-radius:5px;background-color:#254E58;overflow:hidden"><b>Reading Dataset</b></div> 

In [3]:
train_raw = pd.read_csv('../../data/train_data.csv', parse_dates=["startdate"])
test_raw = pd.read_csv('../../data/test_data.csv', parse_dates=["startdate"])
submit = pd.read_csv('../../data/sample_solution.csv')
target = 'contest-tmp2m-14d__tmp2m'

train_raw.head()

Unnamed: 0,index,lat,lon,startdate,contest-pevpr-sfc-gauss-14d__pevpr,nmme0-tmp2m-34w__cancm30,nmme0-tmp2m-34w__cancm40,nmme0-tmp2m-34w__ccsm30,nmme0-tmp2m-34w__ccsm40,nmme0-tmp2m-34w__cfsv20,nmme0-tmp2m-34w__gfdlflora0,nmme0-tmp2m-34w__gfdlflorb0,nmme0-tmp2m-34w__gfdl0,nmme0-tmp2m-34w__nasa0,nmme0-tmp2m-34w__nmme0mean,contest-wind-h10-14d__wind-hgt-10,nmme-tmp2m-56w__cancm3,nmme-tmp2m-56w__cancm4,nmme-tmp2m-56w__ccsm3,nmme-tmp2m-56w__ccsm4,nmme-tmp2m-56w__cfsv2,nmme-tmp2m-56w__gfdl,nmme-tmp2m-56w__gfdlflora,nmme-tmp2m-56w__gfdlflorb,nmme-tmp2m-56w__nasa,nmme-tmp2m-56w__nmmemean,contest-rhum-sig995-14d__rhum,nmme-prate-34w__cancm3,nmme-prate-34w__cancm4,nmme-prate-34w__ccsm3,nmme-prate-34w__ccsm4,nmme-prate-34w__cfsv2,nmme-prate-34w__gfdl,nmme-prate-34w__gfdlflora,nmme-prate-34w__gfdlflorb,nmme-prate-34w__nasa,nmme-prate-34w__nmmemean,contest-wind-h100-14d__wind-hgt-100,nmme0-prate-56w__cancm30,nmme0-prate-56w__cancm40,nmme0-prate-56w__ccsm30,nmme0-prate-56w__ccsm40,nmme0-prate-56w__cfsv20,nmme0-prate-56w__gfdlflora0,nmme0-prate-56w__gfdlflorb0,nmme0-prate-56w__gfdl0,nmme0-prate-56w__nasa0,nmme0-prate-56w__nmme0mean,nmme0-prate-34w__cancm30,nmme0-prate-34w__cancm40,nmme0-prate-34w__ccsm30,nmme0-prate-34w__ccsm40,nmme0-prate-34w__cfsv20,nmme0-prate-34w__gfdlflora0,nmme0-prate-34w__gfdlflorb0,nmme0-prate-34w__gfdl0,nmme0-prate-34w__nasa0,nmme0-prate-34w__nmme0mean,contest-tmp2m-14d__tmp2m,contest-slp-14d__slp,contest-wind-vwnd-925-14d__wind-vwnd-925,nmme-prate-56w__cancm3,nmme-prate-56w__cancm4,nmme-prate-56w__ccsm3,nmme-prate-56w__ccsm4,nmme-prate-56w__cfsv2,nmme-prate-56w__gfdl,nmme-prate-56w__gfdlflora,nmme-prate-56w__gfdlflorb,nmme-prate-56w__nasa,nmme-prate-56w__nmmemean,contest-pres-sfc-gauss-14d__pres,contest-wind-uwnd-250-14d__wind-uwnd-250,nmme-tmp2m-34w__cancm3,nmme-tmp2m-34w__cancm4,nmme-tmp2m-34w__ccsm3,nmme-tmp2m-34w__ccsm4,nmme-tmp2m-34w__cfsv2,nmme-tmp2m-34w__gfdl,nmme-tmp2m-34w__gfdlflora,nmme-tmp2m-34w__gfdlflorb,nmme-tmp2m-34w__nasa,nmme-tmp2m-34w__n
mmemean,contest-prwtr-eatm-14d__prwtr,contest-wind-vwnd-250-14d__wind-vwnd-250,contest-precip-14d__precip,contest-wind-h850-14d__wind-hgt-850,contest-wind-uwnd-925-14d__wind-uwnd-925,contest-wind-h500-14d__wind-hgt-500,cancm30,cancm40,ccsm30,ccsm40,cfsv20,gfdlflora0,gfdlflorb0,gfdl0,nasa0,nmme0mean,climateregions__climateregion,elevation__elevation,wind-vwnd-250-2010-1,wind-vwnd-250-2010-2,wind-vwnd-250-2010-3,wind-vwnd-250-2010-4,wind-vwnd-250-2010-5,wind-vwnd-250-2010-6,wind-vwnd-250-2010-7,wind-vwnd-250-2010-8,wind-vwnd-250-2010-9,wind-vwnd-250-2010-10,wind-vwnd-250-2010-11,wind-vwnd-250-2010-12,wind-vwnd-250-2010-13,wind-vwnd-250-2010-14,wind-vwnd-250-2010-15,wind-vwnd-250-2010-16,wind-vwnd-250-2010-17,wind-vwnd-250-2010-18,wind-vwnd-250-2010-19,wind-vwnd-250-2010-20,wind-uwnd-250-2010-1,wind-uwnd-250-2010-2,wind-uwnd-250-2010-3,wind-uwnd-250-2010-4,wind-uwnd-250-2010-5,wind-uwnd-250-2010-6,wind-uwnd-250-2010-7,wind-uwnd-250-2010-8,wind-uwnd-250-2010-9,wind-uwnd-250-2010-10,wind-uwnd-250-2010-11,wind-uwnd-250-2010-12,wind-uwnd-250-2010-13,wind-uwnd-250-2010-14,wind-uwnd-250-2010-15,wind-uwnd-250-2010-16,wind-uwnd-250-2010-17,wind-uwnd-250-2010-18,wind-uwnd-250-2010-19,wind-uwnd-250-2010-20,mjo1d__phase,mjo1d__amplitude,mei__mei,mei__meirank,mei__nip,wind-hgt-850-2010-1,wind-hgt-850-2010-2,wind-hgt-850-2010-3,wind-hgt-850-2010-4,wind-hgt-850-2010-5,wind-hgt-850-2010-6,wind-hgt-850-2010-7,wind-hgt-850-2010-8,wind-hgt-850-2010-9,wind-hgt-850-2010-10,sst-2010-1,sst-2010-2,sst-2010-3,sst-2010-4,sst-2010-5,sst-2010-6,sst-2010-7,sst-2010-8,sst-2010-9,sst-2010-10,wind-hgt-500-2010-1,wind-hgt-500-2010-2,wind-hgt-500-2010-3,wind-hgt-500-2010-4,wind-hgt-500-2010-5,wind-hgt-500-2010-6,wind-hgt-500-2010-7,wind-hgt-500-2010-8,wind-hgt-500-2010-9,wind-hgt-500-2010-10,icec-2010-1,icec-2010-2,icec-2010-3,icec-2010-4,icec-2010-5,icec-2010-6,icec-2010-7,icec-2010-8,icec-2010-9,icec-2010-10,wind-uwnd-925-2010-1,wind-uwnd-925-2010-2,wind-uwnd-925-2010-3,wind-uwnd-925-2010-4,wind-uwn
d-925-2010-5,wind-uwnd-925-2010-6,wind-uwnd-925-2010-7,wind-uwnd-925-2010-8,wind-uwnd-925-2010-9,wind-uwnd-925-2010-10,wind-uwnd-925-2010-11,wind-uwnd-925-2010-12,wind-uwnd-925-2010-13,wind-uwnd-925-2010-14,wind-uwnd-925-2010-15,wind-uwnd-925-2010-16,wind-uwnd-925-2010-17,wind-uwnd-925-2010-18,wind-uwnd-925-2010-19,wind-uwnd-925-2010-20,wind-hgt-10-2010-1,wind-hgt-10-2010-2,wind-hgt-10-2010-3,wind-hgt-10-2010-4,wind-hgt-10-2010-5,wind-hgt-10-2010-6,wind-hgt-10-2010-7,wind-hgt-10-2010-8,wind-hgt-10-2010-9,wind-hgt-10-2010-10,wind-hgt-100-2010-1,wind-hgt-100-2010-2,wind-hgt-100-2010-3,wind-hgt-100-2010-4,wind-hgt-100-2010-5,wind-hgt-100-2010-6,wind-hgt-100-2010-7,wind-hgt-100-2010-8,wind-hgt-100-2010-9,wind-hgt-100-2010-10,wind-vwnd-925-2010-1,wind-vwnd-925-2010-2,wind-vwnd-925-2010-3,wind-vwnd-925-2010-4,wind-vwnd-925-2010-5,wind-vwnd-925-2010-6,wind-vwnd-925-2010-7,wind-vwnd-925-2010-8,wind-vwnd-925-2010-9,wind-vwnd-925-2010-10,wind-vwnd-925-2010-11,wind-vwnd-925-2010-12,wind-vwnd-925-2010-13,wind-vwnd-925-2010-14,wind-vwnd-925-2010-15,wind-vwnd-925-2010-16,wind-vwnd-925-2010-17,wind-vwnd-925-2010-18,wind-vwnd-925-2010-19,wind-vwnd-925-2010-20
0,0,0.0,0.833333,2014-09-01,237.0,29.02,31.64,29.57,30.73,29.71,31.52,31.68,30.56,29.66,30.46,31246.63,28.3,29.47,27.13,27.36,27.71,28.25,27.7,28.72,28.38,28.11,81.72,25.33,17.55,13.59,25.28,38.05,18.06,23.2,38.59,16.5,24.02,16666.81,17.41,5.89,14.37,11.6,17.63,1.17,2.6,0.32,14.88,9.54,35.64,17.54,5.19,16.93,23.16,3.28,3.06,8.41,14.53,14.19,28.74448,101352.08,4.41,18.45,18.36,10.35,35.4,34.54,19.54,35.99,28.31,18.89,24.43,98644.97,-2.56,27.83,29.34,27.57,27.98,27.3,28.27,28.42,28.3,28.55,28.17,42.45,-3.52,94.31,1535.52,-5.22,5899.66,30.18,32.86,28.85,30.73,29.33,31.66,31.45,31.33,29.51,30.65,BSh,200.0,-111.29,33.66,-129.06,20.57,-123.14,-158.0,-125.92,104.95,15.14,-99.89,7.88,5.91,-208.23,18.67,21.0,134.88,43.65,-44.7,-3.7,-65.02,628.66,130.79,163.84,80.55,-86.61,83.69,-79.66,99.19,-11.93,21.48,62.06,285.66,-114.96,-28.03,-109.81,125.75,-71.99,35.85,-17.34,19.48,4,1.23,0.961,56,4,-2277.72,410.1,-2321.02,-1423.47,1064.98,-816.0,77.17,90.35,-160.02,413.91,352.2,-22.37,-19.69,13.58,19.29,-12.78,-25.2,7.55,-33.72,23.53,-7267.97,1100.68,-3189.61,993.02,1410.84,-1868.95,-1254.45,714.05,1549.2,-602.97,-4.33,0.97,0.15,-0.16,-0.08,0.15,-0.06,0.03,0.03,0.13,143.64,-13.59,-64.22,-0.32,124.3,-1.43,-81.98,61.4,89.64,17.96,-9.56,66.65,3.0,-69.2,-69.16,27.55,-18.55,-54.43,-12.14,39.02,-72427.68,-16054.1,10487.61,-4560.34,7128.13,-2281.45,-6076.15,-2209.63,3864.18,-3051.21,-25749.7,-5160.59,-1507.91,3391.32,-288.52,-1585.41,1544.02,944.73,-1267.75,-2402.46,-107.46,42.55,29.16,-63.35,23.47,45.56,-33.43,-3.89,4.18,69.09,-27.68,-37.21,8.32,9.56,-2.03,48.13,28.09,-13.5,11.9,4.58
1,1,0.0,0.833333,2014-09-02,228.9,29.02,31.64,29.57,30.73,29.71,31.52,31.68,30.56,29.66,30.46,31244.78,28.3,29.47,27.13,27.36,27.71,28.25,27.7,28.72,28.38,28.11,82.56,25.33,17.55,13.59,25.28,38.05,18.06,23.2,38.59,16.5,24.02,16667.31,17.41,5.89,14.37,11.6,17.63,1.17,2.6,0.32,14.88,9.54,35.64,17.54,5.19,16.93,23.16,3.28,3.06,8.41,14.53,14.19,28.370585,101396.02,3.74,18.45,18.36,10.35,35.4,34.54,19.54,35.99,28.31,18.89,24.43,98686.8,-2.39,27.83,29.34,27.57,27.98,27.3,28.27,28.42,28.3,28.55,28.17,42.66,-4.49,100.85,1538.0,-5.2,5901.03,30.18,32.86,28.85,30.73,29.33,31.66,31.45,31.33,29.51,30.65,BSh,200.0,-99.47,53.8,-117.91,56.54,-123.66,-150.93,-109.57,117.1,-2.39,-113.06,1.33,17.87,-206.98,23.89,5.08,139.95,45.29,-37.26,3.63,-50.56,615.58,135.48,166.71,69.06,-89.23,85.25,-68.43,109.73,6.93,11.32,72.99,269.52,-130.3,-21.22,-93.94,148.57,-62.79,28.76,-7.75,25.38,4,1.53,0.961,56,4,-2287.34,354.17,-2270.79,-1419.57,977.45,-670.75,77.48,-48.07,-71.8,507.96,350.96,-21.58,-20.66,12.14,19.55,-13.34,-25.84,6.36,-34.63,22.98,-7189.77,970.54,-3095.67,891.34,1382.87,-1740.29,-953.14,769.46,1667.04,-849.11,-4.33,0.97,0.15,-0.16,-0.08,0.15,-0.06,0.03,0.03,0.13,143.41,-13.11,-61.28,-2.19,119.37,-9.0,-95.96,64.9,92.06,29.89,-5.77,61.97,1.23,-71.98,-63.53,38.16,-16.09,-50.22,-8.39,36.31,-70659.22,-16485.03,12437.04,-5318.37,8109.37,-1957.36,-6672.23,-3786.46,2626.55,-3623.29,-25474.37,-5356.7,-1367.76,3188.99,-221.06,-1193.63,1256.48,2018.62,-1110.56,-2413.81,-105.73,45.91,34.83,-56.93,36.28,41.43,-38.06,-1.65,10.08,71.93,-21.13,-36.57,8.77,21.17,4.44,48.6,27.41,-23.77,15.44,3.42
2,2,0.0,0.833333,2014-09-03,220.69,29.02,31.64,29.57,30.73,29.71,31.52,31.68,30.56,29.66,30.46,31239.27,28.3,29.47,27.13,27.36,27.71,28.25,27.7,28.72,28.38,28.11,83.29,25.33,17.55,13.59,25.28,38.05,18.06,23.2,38.59,16.5,24.02,16668.39,17.41,5.89,14.37,11.6,17.63,1.17,2.6,0.32,14.88,9.54,35.64,17.54,5.19,16.93,23.16,3.28,3.06,8.41,14.53,14.19,28.133059,101429.25,3.4,18.45,18.36,10.35,35.4,34.54,19.54,35.99,28.31,18.89,24.43,98712.85,-2.76,27.83,29.34,27.57,27.98,27.3,28.27,28.42,28.3,28.55,28.17,43.23,-5.44,101.25,1540.32,-5.0,5902.18,30.18,32.86,28.85,30.73,29.33,31.66,31.45,31.33,29.51,30.65,BSh,200.0,-88.76,74.89,-102.43,94.19,-117.99,-138.5,-88.66,115.57,-20.02,-118.37,-8.5,24.55,-194.54,28.53,-15.99,140.68,42.8,-27.63,9.96,-32.91,602.14,142.06,168.92,57.24,-89.53,86.13,-61.32,118.23,20.24,-1.16,85.99,251.42,-139.9,-12.64,-82.93,168.5,-52.32,24.45,2.0,31.32,4,1.46,0.961,56,4,-2295.33,334.95,-2261.8,-1396.14,902.95,-552.13,116.42,-150.51,39.03,631.29,349.86,-20.77,-21.34,10.97,19.5,-13.59,-26.26,5.42,-35.04,22.54,-7090.19,880.24,-3070.28,774.09,1336.25,-1635.57,-702.21,797.41,1743.11,-1129.0,-4.33,0.97,0.15,-0.16,-0.08,0.15,-0.06,0.03,0.03,0.13,145.35,-10.54,-61.45,-2.44,116.55,-15.54,-104.55,67.94,93.13,43.62,3.96,56.82,-2.95,-71.58,-55.77,46.78,-13.1,-46.99,-3.23,32.81,-68699.81,-16789.52,14299.35,-5947.6,9137.52,-1747.08,-7296.78,-5193.92,1591.47,-4080.94,-25200.29,-5546.88,-1230.46,2996.82,-111.6,-796.13,936.58,2959.85,-995.13,-2302.45,-102.51,46.8,42.82,-47.66,44.84,35.24,-44.97,-0.51,16.77,68.09,-10.72,-34.16,6.99,32.16,5.01,48.53,19.21,-33.16,15.11,4.82
3,3,0.0,0.833333,2014-09-04,225.28,29.02,31.64,29.57,30.73,29.71,31.52,31.68,30.56,29.66,30.46,31232.86,28.3,29.47,27.13,27.36,27.71,28.25,27.7,28.72,28.38,28.11,83.26,25.33,17.55,13.59,25.28,38.05,18.06,23.2,38.59,16.5,24.02,16667.39,17.41,5.89,14.37,11.6,17.63,1.17,2.6,0.32,14.88,9.54,35.64,17.54,5.19,16.93,23.16,3.28,3.06,8.41,14.53,14.19,28.256798,101440.85,3.29,18.45,18.36,10.35,35.4,34.54,19.54,35.99,28.31,18.89,24.43,98711.7,-3.0,27.83,29.34,27.57,27.98,27.3,28.27,28.42,28.3,28.55,28.17,43.11,-5.76,101.9,1541.1,-4.61,5903.07,30.18,32.86,28.85,30.73,29.33,31.66,31.45,31.33,29.51,30.65,BSh,200.0,-77.04,79.46,-88.59,121.78,-103.53,-124.32,-77.34,104.81,-37.61,-114.71,-13.53,26.21,-173.41,31.15,-34.71,139.34,41.75,-18.27,20.78,-19.75,589.63,149.98,171.96,41.14,-91.42,87.07,-55.28,123.91,27.45,-12.97,95.4,227.56,-145.99,-3.04,-72.03,183.67,-45.98,20.95,11.51,33.25,4,1.51,0.961,56,4,-2263.73,438.36,-2352.35,-1427.64,825.67,-480.55,183.76,-228.7,206.58,774.61,348.91,-20.01,-21.92,9.78,19.24,-13.75,-26.48,4.58,-35.28,22.25,-6914.23,876.33,-3137.47,736.64,1258.48,-1558.63,-516.17,824.13,1746.57,-1363.79,-4.33,0.97,0.15,-0.16,-0.08,0.15,-0.06,0.03,0.03,0.13,146.9,-6.5,-65.74,-2.59,116.69,-27.0,-107.03,71.46,93.19,57.48,16.86,49.49,-13.59,-69.2,-49.13,55.99,-9.86,-43.77,6.23,27.68,-66588.29,-16959.46,15886.51,-6369.78,10010.87,-1643.65,-7883.22,-6267.81,816.9,-4445.18,-24789.7,-5692.21,-1177.18,2799.89,-38.07,-362.72,608.32,3796.72,-966.93,-2125.09,-96.11,39.69,56.54,-38.25,47.34,27.05,-53.85,-3.6,17.6,60.2,0.33,-31.04,6.17,39.66,-1.41,50.59,8.29,-37.22,18.24,9.74
4,4,0.0,0.833333,2014-09-05,237.24,29.02,31.64,29.57,30.73,29.71,31.52,31.68,30.56,29.66,30.46,31226.16,28.3,29.47,27.13,27.36,27.71,28.25,27.7,28.72,28.38,28.11,82.5,25.33,17.55,13.59,25.28,38.05,18.06,23.2,38.59,16.5,24.02,16665.65,17.41,5.89,14.37,11.6,17.63,1.17,2.6,0.32,14.88,9.54,35.64,17.54,5.19,16.93,23.16,3.28,3.06,8.41,14.53,14.19,28.372353,101419.53,3.27,18.45,18.36,10.35,35.4,34.54,19.54,35.99,28.31,18.89,24.43,98686.46,-3.4,27.83,29.34,27.57,27.98,27.3,28.27,28.42,28.3,28.55,28.17,42.98,-6.09,82.95,1539.73,-4.25,5903.36,30.18,32.86,28.85,30.73,29.33,31.66,31.45,31.33,29.51,30.65,BSh,200.0,-65.4,70.1,-78.36,144.48,-94.88,-107.74,-78.18,92.37,-53.54,-104.17,-19.09,24.1,-151.92,34.04,-38.14,137.02,40.43,-10.92,27.23,-7.78,576.23,159.72,177.59,17.51,-96.24,91.71,-52.36,122.64,28.07,-19.38,98.55,205.57,-148.74,4.09,-61.63,189.46,-39.14,11.62,17.77,33.8,4,1.51,0.961,56,4,-2198.01,627.74,-2452.71,-1519.72,770.7,-457.91,264.03,-285.36,427.52,871.73,348.03,-19.25,-22.54,8.51,18.92,-13.85,-26.78,3.79,-35.41,22.23,-6652.36,1003.01,-3247.54,823.97,1198.6,-1474.82,-405.27,848.8,1698.71,-1571.86,-4.33,0.97,0.15,-0.16,-0.08,0.15,-0.06,0.03,0.03,0.13,148.67,-1.42,-70.59,-5.65,118.16,-37.81,-111.03,77.17,90.67,69.27,27.42,40.16,-22.93,-69.88,-44.64,63.34,-7.72,-44.71,17.8,19.15,-64311.3,-17057.07,17285.6,-6673.71,10458.26,-1488.12,-8267.89,-7134.7,227.75,-4620.95,-24181.96,-5754.12,-1208.87,2582.56,-35.19,80.43,355.94,4507.2,-1028.33,-1885.9,-89.19,33.68,69.34,-30.75,46.03,22.73,-62.81,-9.07,17.09,52.09,9.83,-31.8,7.47,38.62,-5.21,54.73,-2.58,-42.3,21.91,10.95


<a id = "process"></a>
# <div style="padding:20px;color:white;margin:0;font-size:35px;font-family:Georgia;text-align:left;display:fill;border-radius:5px;background-color:#254E58;overflow:hidden"><b>Preprocessing Dataset</b></div> 

In [4]:
def rmse(actual, predicted):
    # Take the root explicitly: the `squared=False` argument is deprecated in recent scikit-learn releases
    return np.sqrt(mean_squared_error(actual, predicted))

def location_nom(train, test):
    # Ref: https://www.kaggle.com/code/flaviafelicioni/wids-2023-different-locations-train-test-solved
    scale = 14

    # Round the coordinates so that identical locations match exactly in train and test
    train['lat'] = train['lat'].round(scale)
    train['lon'] = train['lon'].round(scale)
    test['lat'] = test['lat'].round(scale)
    test['lon'] = test['lon'].round(scale)

    all_df = pd.concat([train, test], axis=0)
    all_df['loc_group'] = all_df.groupby(['lat','lon']).ngroup()
    train = all_df.iloc[:len(train)]
    test = all_df.iloc[len(train):].drop(target, axis=1)
    
    return train, test

def categorical_encode(train, test):
    # TODO: change it to one-hot encoding
    le = LabelEncoder()
    train['climateregions__climateregion'] = le.fit_transform(train['climateregions__climateregion'])
    test['climateregions__climateregion'] = le.transform(test['climateregions__climateregion'])
    return train, test
    
def fill_na(df):
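    # TODO: fill na with mean or median
    # Forward-fill in (location, date) order. Note that ffill runs over the whole
    # sorted frame, so a gap at the start of one location is filled from the last
    # row of the previous location.
    df = df.sort_values(by=['loc_group', 'startdate']).ffill()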
    return df

def creat_new_featute(df):
    df['year'] = df['startdate'].dt.year
    df['month'] = df['startdate'].dt.month
    df['day_of_year'] = df['startdate'].dt.dayofyear
    # TODO: add periodical features
    # df['day_of_week'] = df['startdate'].dt.dayofweek
    # df['week_of_year'] = df['startdate'].dt.isocalendar().week
    return df

#TODO: drop features with high correlation
def feature_engineering(train_raw, test_raw):
    train, test = location_nom(train_raw, test_raw)
    train = fill_na(train)
    train = creat_new_featute(train)
    test = creat_new_featute(test)
    train, test = categorical_encode(train, test)

    drop_cols = ['index', 'startdate', 'lat', 'lon', target]
    features = [col for col in train.columns if col not in drop_cols]
    X = train[features]
    X_test = test[features]
    y = train[target]

    return X, y, X_test
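
The `TODO: add periodical features` note above could be addressed with a cyclical encoding of `day_of_year`. The following is only a minimal, dependency-free sketch of that idea, not part of the original pipeline: the helper names and the year length of 365.25 days are assumptions made for illustration. In the notebook itself the same transform could be applied to the `day_of_year` column with `np.sin` and `np.cos`.

```python
import math

DAYS_PER_YEAR = 365.25  # assumption: average year length, leap years ignored


def cyclical_encode(day_of_year, period=DAYS_PER_YEAR):
    """Map a day of the year onto the unit circle as a (sin, cos) pair."""
    angle = 2.0 * math.pi * day_of_year / period
    return math.sin(angle), math.cos(angle)


def circle_distance(day_a, day_b):
    """Euclidean distance between two days after cyclical encoding."""
    return math.dist(cyclical_encode(day_a), cyclical_encode(day_b))


# 31 December and 1 January are neighbours on the circle,
# although their raw day_of_year values differ by 364.
print(circle_distance(365, 1))
# Early July and 1 January sit on opposite sides of the circle.
print(circle_distance(182, 1))
```

With the raw integer, a tree model sees day 365 and day 1 as far apart; the two encoded columns keep them adjacent, which is usually what a seasonal signal needs.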

<a id="split"></a>
# <div style="padding:20px;color:white;margin:0;font-size:35px;font-family:Georgia;text-align:left;display:fill;border-radius:5px;background-color:#254E58;overflow:hidden"><b>Train and Validation</b></div> 

In [5]:
X, y, X_test = feature_engineering(train_raw.copy(), test_raw.copy())
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.33, random_state=42)
print(f'Train_shape: {X_train.shape}    |   Val_shape: {X_val.shape}    |   Test_shape: {X_test.shape}')

Train_shape: (251741, 245)    |   Val_shape: (123993, 245)    |   Test_shape: (31354, 245)


<a id = "adv"></a>
# <div style="padding:20px;color:white;margin:0;font-size:35px;font-family:Georgia;text-align:left;display:fill;border-radius:5px;background-color:#254E58;overflow:hidden"><b>Adversarial Validation</b></div>

Adversarial Validation is a technique for checking whether the distribution of the data in the training set is similar to that of the test set. This matters because if the training data is not representative of the test data, the model's predictions on the test set may be inaccurate.

To perform Adversarial Validation, the following steps are taken:

1. Combine the train and test features into a single set
1. Create a target label to indicate whether a sample is from the train or test set
1. Build a model to classify samples as belonging to the train or test set

If the model is able to accurately distinguish between train and test samples, this indicates that there are features in the data that are different between the two sets. Adversarial Validation can be used to identify these features by using the feature importance generated by the model and evaluating the separation between the train and test datasets using the AUC metric.

Reference: Pan, J., Pham, V., Dorairaj, M., Chen, H., & Lee, J. Y. (2020). Adversarial validation approach to concept drift problem in user targeting automation systems at uber. [arXiv preprint arXiv:2004.03045](https://arxiv.org/abs/2004.03045).
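
To make the role of the AUC metric concrete, here is a small dependency-free sketch that scores how well a single feature, taken on its own, separates train from test. It is a simplification of the classifier-based procedure described above, not the method used in this notebook; the two features are synthetic and hypothetical, and the function names are assumptions for illustration.

```python
import random


def auc(pos_scores, neg_scores):
    """AUC as the probability that a random positive outranks a random negative (ties count half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def separability(train_values, test_values):
    """0.5 means indistinguishable, 1.0 means perfectly separable (in either direction)."""
    a = auc(train_values, test_values)
    return max(a, 1.0 - a)


rng = random.Random(0)
# Hypothetical features: one with the same distribution in both sets,
# one whose mean is shifted in the test set.
stable_train = [rng.gauss(0.0, 1.0) for _ in range(300)]
stable_test = [rng.gauss(0.0, 1.0) for _ in range(300)]
drift_train = [rng.gauss(0.0, 1.0) for _ in range(300)]
drift_test = [rng.gauss(10.0, 1.0) for _ in range(300)]

stable_score = separability(stable_train, stable_test)
drift_score = separability(drift_train, drift_test)
print(f"stable feature:   {stable_score:.3f}")
print(f"drifting feature: {drift_score:.3f}")
```

A score close to 0.5 suggests the feature looks the same in both sets, while a score close to 1.0 flags a feature that alone is enough to tell train from test.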

In [6]:
import lightgbm as lgb

def run_adversial_validation(train_X_ml, test_X_ml):
    
    lgb_params = {'n_estimators':100,
                'boosting_type': 'gbdt',
                'objective': 'binary',
                'metric': 'auc',
                'verbose': 0
                    }
    # combine train & test features, create label to identify test vs train
    ad_y = np.array([1]*train_X_ml.shape[0] + [0]*test_X_ml.shape[0])
    ad_X = pd.concat([train_X_ml, test_X_ml])

    # evaluate model performance using cross-validation
    lgb_data = lgb.Dataset(ad_X, ad_y)
    cv_lgb = lgb.cv(lgb_params, lgb_data)

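    # LightGBM 4.x names the result key 'valid auc-mean'; older releases use 'auc-mean'
    auc_key = next(k for k in cv_lgb if k.endswith('auc-mean'))
    print("Adversarial Validation AUC Score: {}".format(cv_lgb[auc_key][-1]))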
    
    # train model & get feature importance
    ad_val_mod = lgb.train(lgb_params, lgb_data)
    
    print(pd.DataFrame(
        {'feat':ad_X.columns, 
         'imp':ad_val_mod.feature_importance()}).sort_values('imp', ascending = False))
    
    return ad_val_mod

In [7]:
ad_val_mod = run_adversial_validation(X_train, X_test)



Adversarial Validation AUC Score: 1.0




                                   feat  imp
0    contest-pevpr-sfc-gauss-14d__pevpr  221
1              nmme0-tmp2m-34w__cancm30  105
242                                year  100
2              nmme0-tmp2m-34w__cancm40   34
3               nmme0-tmp2m-34w__ccsm30   25
..                                  ...  ...
129               wind-uwnd-250-2010-14    0
130               wind-uwnd-250-2010-15    0
131               wind-uwnd-250-2010-16    0
132               wind-uwnd-250-2010-17    0
244                         day_of_year    0

[245 rows x 2 columns]


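An AUC of 1.0 means the classifier separates the train and test samples perfectly, so there appears to be concept drift between the two datasets. Removing the features the classifier relies on most, such as "contest-pevpr-sfc-gauss-14d__pevpr" and "nmme0-tmp2m-34w__cancm30", might help to improve performance.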

In [None]:
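# `columns=` is required here: without it, drop() looks for these names among the row labels
# X.drop(columns=['contest-pevpr-sfc-gauss-14d__pevpr','nmme0-tmp2m-34w__cancm30'], inplace = True)
# X_test.drop(columns=['contest-pevpr-sfc-gauss-14d__pevpr','nmme0-tmp2m-34w__cancm30'], inplace = True)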

<a id="bocat"></a>
# <div style="padding:20px;color:white;margin:0;font-size:35px;font-family:Georgia;text-align:left;display:fill;border-radius:5px;background-color:#254E58;overflow:hidden"><b>Bayesian Optimization for CatBoost</b></div> 

Bayesian optimization (see the [Wikipedia article](https://en.wikipedia.org/wiki/Bayesian_optimization)) is essentially a way to find good parameters by searching for them sequentially: the next parameter values to try depend on how the previous values performed. It is a popular technique for hyperparameter tuning, but it may take some time to run. Many parameters can be tuned; the ones below are just those I chose. To save time, the number of estimators is not tuned in the cell below and is left at CatBoost's default (the `iterations` and `n_estimators` arguments are commented out), but you can set it to your liking or tune it too. You can also change the number of initial points and iterations.
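
The sequential idea can be illustrated with a dependency-free toy sketch: a Gaussian-process surrogate is fitted to the points evaluated so far, and the next point is the one with the highest upper confidence bound (UCB). This is not how `bayes_opt` is implemented internally; the kernel length-scale, the exploration weight of 2.0, the search grid, and the one-dimensional toy objective are all assumptions chosen for illustration.

```python
import math


def rbf(a, b, length=0.2):
    """Squared-exponential kernel between two scalars."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def gp_posterior(xs, ys, x_new, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    mean = sum(k * w for k, w in zip(k_star, solve(K, ys)))
    var = 1.0 - sum(k * w for k, w in zip(k_star, solve(K, k_star)))
    return mean, math.sqrt(max(var, 0.0))


def objective(x):
    """Toy score to maximize; its peak is at x = 0.7."""
    return -(x - 0.7) ** 2


grid = [i / 100 for i in range(101)]  # candidate values in [0, 1]
xs = [0.0, 1.0]                       # initial points
ys = [objective(x) for x in xs]

for _ in range(8):
    def ucb(x):
        mean, std = gp_posterior(xs, ys, x)
        return mean + 2.0 * std       # optimism in the face of uncertainty

    x_next = max(grid, key=ucb)       # next trial depends on all previous results
    xs.append(x_next)
    ys.append(objective(x_next))

best_y = max(ys)
best_x = xs[ys.index(best_y)]
print(f"best x = {best_x:.2f}, objective = {best_y:.4f}")
```

Each new trial is chosen from the surrogate fitted to all earlier trials, which is what distinguishes this from grid or random search.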

In [10]:
# source: https://medium.com/ai-in-plain-english/catboost-cross-validated-bayesian-hyperparameter-tuning-91f1804b71dd

X1, Y1 = X.copy(), y.copy()

from catboost import Pool, cv, CatBoostRegressor
from bayes_opt import BayesianOptimization
from bayes_opt import BayesianOptimization as BO
import warnings
from sklearn.model_selection import * 
from sklearn.metrics import *

Use_BO = True

if Use_BO:
    #n_estimators,
    # num_leaves
    def CB_opt(depth, learning_rate, subsample, l2_leaf_reg, model_size_reg): 

        scores = []
    #     skf = StratifiedKFold(n_splits = 5, shuffle = True, random_state = 1944)
        trainx, valx, trainy, valy = train_test_split(X1, Y1, test_size=0.33, random_state=42)

        reg = CatBoostRegressor(   
                                        verbose = 0,
                                        #iterations=10,
                                        #n_estimators = 10,
                                        learning_rate = learning_rate,
                                        subsample = subsample, 
                                        l2_leaf_reg = l2_leaf_reg,
                                        max_depth = int(depth),
                                        #num_leaves = int(num_leaves),
                                        random_state = 1212,
                                        #grow_policy = "Lossguide",
    #                                     max_bin = int(max_bin),  
                                        use_best_model = True, 
                                        # bootstrap_type='Bayesian',
                                        loss_function='RMSE',
                                        model_size_reg = model_size_reg
                                    )

        reg.fit(trainx, trainy, eval_set = (valx, valy))
        y_pred = reg.predict(valx)
        scores.append(rmse(valy, y_pred))

        return 1/np.mean(scores)

    #"n_estimators": (150,1200),
    # "num_leaves": (100,150),
    # "max_bin":(150,300),
    pbounds = {
               "depth": (6, 7),
               "learning_rate": (0.09, 0.0980689972639084),
               "subsample":(0.7, 0.800000011920929),
               "l2_leaf_reg":(2,4),
               "model_size_reg": (0.48, 0.5)
    }

    optimizer = BayesianOptimization(f = CB_opt, pbounds = pbounds,  verbose = 2, random_state = 1212)

    # Gaussian-process settings go through set_gp_params; passing `acq` or `alpha`
    # to maximize() is deprecated in recent bayes_opt releases. UCB is the default
    # acquisition function, so `acq` can simply be dropped.
    optimizer.set_gp_params(alpha = 1e-6)
    optimizer.maximize(init_points = 7, n_iter = 30)

    print(optimizer.max)

    max_bo_params = optimizer.max['params']

    max_bo_params

|   iter    |  target   |   depth   | l2_lea... | learni... | model_... | subsample |
-------------------------------------------------------------------------------------
| 1         | 1.684     | 6.245     | 3.609     | 0.09141   | 0.4994    | 0.7225    |
| 2         | 1.735     | 6.869     | 2.563     | 0.09585   | 0.495     | 0.7247    |
| 3         | 1.731     | 6.772     | 2.171     | 0.09719   | 0.4844    | 0.7934    |
| 4         | 1.69      | 6.091     | 3.615     | 0.09314   | 0.4989    | 0.7017    |
| 5         | 1.716     | 6.542     | 2.583     | 0.09693   | 0.4901    | 0.7719    |
| 6         | 1.697     | 6.311     | 2.531     | 0.09148   | 0.4864    | 0.7726    |
| 7         | 1.687     | 6.908     | 2.939     | 0.0909    | 0.4853    | 0.7682    |

KeyboardInterrupt: 
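
Because the optimizer maximizes its objective, `CB_opt` returns 1/RMSE rather than the RMSE itself, so the `target` column in the log above has to be inverted to read it as an error. A short sketch of that conversion follows; the helper name is hypothetical and the two values are copied from iterations 1 and 2 of the log.

```python
def rmse_from_target(target):
    """Invert the 1/RMSE score returned by the objective function."""
    return 1.0 / target


# `target` values copied from iterations 1 and 2 of the log above
for iteration, target in [(1, 1.684), (2, 1.735)]:
    print(f"iter {iteration}: target={target} -> RMSE ~ {rmse_from_target(target):.4f}")
```

A higher `target` therefore means a lower validation RMSE: iteration 2 (about 0.576) improves on iteration 1 (about 0.594).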

In [17]:
Use_BO_result = False

if Use_BO_result:
    opt_params = {
              'iterations':2000,
              'verbose':0,
              'learning_rate' : max_bo_params['learning_rate'],
              'subsample' : max_bo_params['subsample'], 
              'l2_leaf_reg' : max_bo_params['l2_leaf_reg'],
              'max_depth' : int(max_bo_params['depth']), 
              'use_best_model' : True, 
              'loss_function' : 'RMSE',
              'model_size_reg' : max_bo_params['model_size_reg']
             }
else:
    opt_params = {
          'iterations':25000,
          'verbose':2,
          'learning_rate' : 0.0980689972639084,
          'subsample' : 0.7443133148363695, 
          'l2_leaf_reg' : 2.3722386345448316,
          'max_depth' : int(6.599144674342465),
          'use_best_model' : True, 
          'loss_function' : 'RMSE',
          'model_size_reg' : 0.4833187897595954
         }

In [21]:
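## CatBoost Pool object holding the full training set
train_pool = Pool(data=X1, label=Y1)

# Note: train_pool contains every row of X1, so the hold-out rows below are also
# seen during training. The reported "test" RMSE, and the iteration selected by
# use_best_model, are therefore optimistic.
X_train, X_test2, y_train, y_test = train_test_split(X1, Y1, test_size=0.33, random_state=42)
# set verbose higher to see more information
bst = CatBoostRegressor(**opt_params)
bst.fit(train_pool, eval_set=(X_test2, y_test), plot=False)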
print(bst.get_best_score())

0:	learn: 9.0160862	test: 9.0117759	best: 9.0117759 (0)	total: 110ms	remaining: 45m 39s
2:	learn: 7.5600058	test: 7.5565097	best: 7.5565097 (2)	total: 221ms	remaining: 30m 40s
4:	learn: 6.3970904	test: 6.3950056	best: 6.3950056 (4)	total: 372ms	remaining: 31m 1s
6:	learn: 5.4473414	test: 5.4456523	best: 5.4456523 (6)	total: 520ms	remaining: 30m 57s
8:	learn: 4.6840196	test: 4.6824534	best: 4.6824534 (8)	total: 620ms	remaining: 28m 41s
10:	learn: 4.0783284	test: 4.0765575	best: 4.0765575 (10)	total: 722ms	remaining: 27m 19s
12:	learn: 3.5961510	test: 3.5943468	best: 3.5943468 (12)	total: 820ms	remaining: 26m 15s
14:	learn: 3.2138718	test: 3.2122997	best: 3.2122997 (14)	total: 912ms	remaining: 25m 18s
16:	learn: 2.9048395	test: 2.9032687	best: 2.9032687 (16)	total: 1.01s	remaining: 24m 48s
18:	learn: 2.6598907	test: 2.6587649	best: 2.6587649 (18)	total: 1.11s	remaining: 24m 19s
20:	learn: 2.4534963	test: 2.4520579	best: 2.4520579 (20)	total: 1.2s	remaining: 23m 50s
22:	learn: 2.2982026	t

In [24]:
y_pred_cat = bst.predict(X_test)
submit[target] = y_pred_cat
submit.to_csv('submission0.csv', index = False)

<a id="fi"></a>
# <div style="padding:20px;color:white;margin:0;font-size:35px;font-family:Georgia;text-align:left;display:fill;border-radius:5px;background-color:#254E58;overflow:hidden"><b>Feature Importance</b></div>

In [None]:
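import matplotlib.pyplot as plt
import numpy as np

# Plot the 50 most important features of the fitted CatBoost model
feature_importance = bst.feature_importances_
max_features = 50
sorted_idx = np.argsort(feature_importance)[-max_features:]
fig = plt.figure(figsize=(8, 12))
plt.barh(range(len(sorted_idx)), feature_importance[sorted_idx], align='center')
# Label the bars with the names of the features the model was trained on
plt.yticks(range(len(sorted_idx)), np.array(bst.feature_names_)[sorted_idx])
plt.xlabel('Importance')
plt.title('Feature Importance')
plt.show()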

<a id = "xml"></a>
# <div style="padding:20px;color:white;margin:0;font-size:35px;font-family:Georgia;text-align:left;display:fill;border-radius:5px;background-color:#254E58;overflow:hidden"><b>Machine Learning Explainability</b></div>

<div style="text-align: justify;">In this section, we showcase a sample SHAP explainability evaluation for our model. SHAP (SHapley Additive exPlanations) is a popular approach to model interpretability that measures the contribution of each feature to an individual prediction. This lets us identify which features have the greatest impact on the model's outputs and in which direction they push the predicted value.
To perform the evaluation, we pass the fitted CatBoost model to the SHAP library and compute SHAP values for the rows of the feature matrix. Each SHAP value is a feature's share of the difference between a row's prediction and the model's average (base) prediction, averaged over the possible combinations of the other features. For any single row, the base value plus that row's SHAP values equals the model's prediction. We then visualize the explanation of the first prediction with a SHAP waterfall plot, which starts from the base value and shows how each feature moves the prediction up or down, with the largest contributions listed first. This gives insight into the relationships the model has learned between the features and the target variable, and helps us judge whether its predictions rest on sensible signals.</div>
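
To make the additive property concrete, here is a minimal brute-force sketch of exact Shapley values for a tiny hand-written function. It is not the SHAP library's tree algorithm and not the notebook's model: `toy_model`, `shapley_values`, and the single all-zero `baseline` row are assumptions made for illustration (SHAP itself averages over background data rather than one baseline row).

```python
from itertools import combinations
from math import factorial


def toy_model(x):
    """Hypothetical stand-in for a fitted model (not the CatBoost regressor)."""
    return 2.0 * x[0] + x[1] * x[2]


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x`; absent features take `baseline` values."""
    n = len(x)

    def value(subset):
        # Features in `subset` keep their real value, the rest use the baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = value(set(subset) | {i})
                without_i = value(set(subset))
                phi[i] += weight * (with_i - without_i)
    return phi


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]

phi = shapley_values(toy_model, x, baseline)
base_value = toy_model(baseline)
prediction = toy_model(x)

# Additivity: base value + sum of contributions reproduces the prediction.
print(phi, base_value + sum(phi), prediction)
```

The interaction term `x[1] * x[2]` is split evenly between the two features involved, which is the kind of attribution a waterfall plot displays bar by bar.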

In [None]:
import shap

explainer = shap.Explainer(bst)
shap_values = explainer(X1)

# visualize the first prediction's explanation
shap.plots.waterfall(shap_values[0])

<a id = "PL"></a>
# <div style="padding:20px;color:white;margin:0;font-size:35px;font-family:Georgia;text-align:left;display:fill;border-radius:5px;background-color:#254E58;overflow:hidden"><b>Pseudo Labeling and Postprocessing</b></div>

In [26]:
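# Pseudo labelling: use the first model's test predictions as labels for the test rows
train_pseudo = X_test.copy()
ddf = pd.read_csv('submission0.csv')
y_test_pred = ddf[target]  # same values as bst.predict(X_test)
train_pseudo[target] = y_test_pred
# Rows coming from X_train carry no target value, so they show NaN in `target` after the concat
train_mod = pd.concat([X_train.copy(), train_pseudo], axis=0).reset_index(drop=True)
features = [c for c in X_test.columns if (c != 'id')]
display(train_mod)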

XX = train_mod[features]
yy = train_mod[target]
y_oof_pred = np.zeros(len(yy))

X_testt = X_test[features].values
y_test_pred2 = np.zeros(len(X_testt))

,contest-pevpr-sfc-gauss-14d__pevpr,nmme0-tmp2m-34w__cancm30,...,loc_group,year,month,day_of_year,contest-tmp2m-14d__tmp2m
0,158.93,18.39,...,345,2015,10,287,NaN
1,17.68,-2.01,...,194,2015,12,356,NaN
2,15.29,5.69,...,209,2016,1,14,NaN
3,39.74,-6.40,...,366,2016,1,21,NaN
4,176.57,9.80,...,35,2015,12,348,NaN
...,...,...,...,...,...,...,...,...
283090,62.72,4.60,...,513,2022,12,361,4.632752
283091,73.41,4.60,...,513,2022,12,362,4.634675
283092,70.00,4.60,...,513,2022,12,363,4.425073
283093,79.81,4.60,...,513,2022,12,364,5.008143
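
The pseudo-labeling idea used in this section can be sketched on toy data as follows. This is a simplified illustration under stated assumptions rather than the notebook's pipeline: a one-feature least-squares line stands in for CatBoost, `fit_line` and `predict` are hypothetical helpers, the numbers are made up, and the labeled rows keep their true targets when the two sets are stacked.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b on one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    return a, mean_y - a * mean_x


def predict(params, xs):
    """Apply a fitted (slope, intercept) pair to new inputs."""
    a, b = params
    return [a * x + b for x in xs]


# Hypothetical toy data: labeled training rows and unlabeled test rows.
x_labeled = [0.0, 1.0, 2.0, 3.0]
y_labeled = [1.1, 2.9, 5.2, 6.8]
x_unlabeled = [4.0, 5.0]

# Step 1: fit a first model on the labeled rows only.
first_model = fit_line(x_labeled, y_labeled)

# Step 2: predict the unlabeled rows; these predictions are the pseudo labels.
pseudo_labels = predict(first_model, x_unlabeled)

# Step 3: stack both sets. Labeled rows keep their true targets.
x_all = x_labeled + x_unlabeled
y_all = y_labeled + pseudo_labels

# Step 4: refit on the enlarged training set.
second_model = fit_line(x_all, y_all)
print(first_model, second_model)
```

For an ordinary least-squares line the pseudo-labeled points lie exactly on the first fit, so the refit reproduces the same line; with tree ensembles the second fit may differ from the first.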


In [28]:
# Rows that came from X_train have NaN targets after the concat; set them to 0.
# fillna avoids the SettingWithCopyWarning raised by in-place assignment on a slice.
yy = yy.fillna(0)
train_pool = Pool(data=XX, label=yy)

X_train3, X_test3, y_trai3, y_test3 = train_test_split(XX, yy, test_size=0.33, random_state=42)

# Note: train_pool contains every row of XX, so the eval_set rows are also seen during training
bst2 = CatBoostRegressor(**opt_params)
bst2.fit(train_pool, eval_set=(X_test3, y_test3), plot=True, silent=True)
print(bst2.get_best_score())

{'learn': {'RMSE': 0.007829498592474067}, 'validation': {'RMSE': 0.007881709771620286}}


In [29]:
y_pred_cat = bst2.predict(X_test)
submit[target] = y_pred_cat
submit.to_csv('submission1.csv', index = False)

<a id = "EL"></a>
# <div style="padding:20px;color:white;margin:0;font-size:35px;font-family:Georgia;text-align:left;display:fill;border-radius:5px;background-color:#254E58;overflow:hidden"><b>Ensemble Learning</b></div>

In [30]:
import lightgbm as lgb

# set up parameters for LightGBM
params = {'boosting_type': 'gbdt',
          'objective': 'regression',
          'metric': 'rmse',
          'max_depth': 4,
          'num_leaves': 31,
          'learning_rate': 0.05,
          'feature_fraction': 0.9,
          'bagging_fraction': 0.8,
          'bagging_freq': 5,
          'early_stopping_round': 50,
          'n_estimators': 15000}

reg_lgb = lgb.LGBMRegressor(**params)

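# `verbose` in fit() works on LightGBM < 4.0; newer versions expect callbacks=[lgb.log_evaluation(100)]
reg_lgb.fit(X_train3, y_trai3, eval_set=(X_test3, y_test3), verbose=100)

y_pred_cat = bst2.predict(X_test)
y_pred_lgb = reg_lgb.predict(X_test)

# Weighted average of the two models: 60% LightGBM, 40% CatBoost
ensemble_preds = y_pred_lgb * 0.60 + y_pred_cat * 0.40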



[100]	valid_0's rmse: 0.25542
[200]	valid_0's rmse: 0.202219
[300]	valid_0's rmse: 0.176698
[400]	valid_0's rmse: 0.158693
[500]	valid_0's rmse: 0.146652
[600]	valid_0's rmse: 0.137246
[700]	valid_0's rmse: 0.129094
[800]	valid_0's rmse: 0.123615
[900]	valid_0's rmse: 0.118599
[1000]	valid_0's rmse: 0.114125
[1100]	valid_0's rmse: 0.110321
[1200]	valid_0's rmse: 0.106653
[1300]	valid_0's rmse: 0.103873
[1400]	valid_0's rmse: 0.10095
[1500]	valid_0's rmse: 0.0982762
[1600]	valid_0's rmse: 0.0961838
[1700]	valid_0's rmse: 0.0943425
[1800]	valid_0's rmse: 0.09244
[1900]	valid_0's rmse: 0.0910512
[2000]	valid_0's rmse: 0.0895354
[2100]	valid_0's rmse: 0.0882368
[2200]	valid_0's rmse: 0.0868187
[2300]	valid_0's rmse: 0.0857039
[2400]	valid_0's rmse: 0.0847194
[2500]	valid_0's rmse: 0.0837633
[2600]	valid_0's rmse: 0.0827735
[2700]	valid_0's rmse: 0.0819588
[2800]	valid_0's rmse: 0.0812568
[2900]	valid_0's rmse: 0.080396
[3000]	valid_0's rmse: 0.0797055
[3100]	valid_0's rmse: 0.0790706
[3200
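
The 0.60/0.40 weights above are fixed by hand. One hedged way to pick such a weight is to score a few candidate blends on hold-out rows; the sketch below does that with made-up numbers. `rmse`, `blend`, and the toy lists are assumptions for illustration, not part of the notebook.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean squared error between two equally long sequences."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def blend(pred_a, pred_b, weight_a):
    """Weighted average of two prediction lists; the weights sum to 1."""
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(pred_a, pred_b)]


# Hypothetical hold-out targets and predictions from two models.
y_holdout = [10.0, 12.0, 14.0, 16.0]
pred_lgb = [10.5, 11.5, 14.5, 15.5]
pred_cat = [9.0, 13.0, 13.0, 17.0]

# Score blends with the first model's weight on a grid from 0.0 to 1.0.
scores = {w / 10: rmse(y_holdout, blend(pred_lgb, pred_cat, w / 10)) for w in range(11)}
best_weight = min(scores, key=scores.get)

print(best_weight, scores[best_weight])
```

Choosing the weight on rows that neither model was trained on keeps the comparison honest; a weight tuned on training rows tends to favor whichever model overfits more.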

In [31]:
submit_cat = submit.copy()
submit_cat[target] = y_pred_cat
submit_cat.to_csv('y_pred_cat.csv', index = False)

submit_lgb = submit.copy()
submit_lgb[target] = y_pred_lgb
submit_lgb.to_csv('y_pred_lgb.csv', index = False)

<a id="submit"></a>
# <div style="padding:20px;color:white;margin:0;font-size:35px;font-family:Georgia;text-align:left;display:fill;border-radius:5px;background-color:#254E58;overflow:hidden"><b>Submission</b></div>

In [32]:
submit[target] = ensemble_preds
submit.to_csv('submission_final.csv', index = False)

<a id = "list"></a>
<div style="padding:20px;color:white;margin:0;font-size:35px;font-family:Georgia;text-align:left;display:fill;border-radius:5px;background-color:#254E58;overflow:hidden"><b>List of Kaggle Notebooks Used as a Reference</b></div>

<div style="background-color:aliceblue; padding:30px; font-size:15px;color:#034914">

* [[WiDS 2023] Simple basline - RMSE 1.14](https://www.kaggle.com/code/ducanger/wids-2023-simple-basline-rmse-1-14) by [DAT DO](https://www.kaggle.com/ducanger), used as the base, especially for preprocessing.
* [🔥 EDA & ML on Game Play 🎮 (ongoing)](https://www.kaggle.com/code/nguyenthicamlai/eda-ml-on-game-play-ongoing) by [Nguyen Thi Cam Lai](https://www.kaggle.com/nguyenthicamlai), used for the HTML-based headers.
* [[WiDS 2021] Tips & Tricks (CatBoost Version)](https://www.kaggle.com/code/kooaslansefat/tips-tricks-catboost-version), used for CatBoost hyperparameter tuning and adversarial validation.
* [WiDS2023_Data_Buddies](https://www.kaggle.com/code/nicholasdominic/wids2023-data-buddies) by [Nicholas Dominic](https://www.kaggle.com/nicholasdominic), used for ensemble learning.

<center> <a href="#TOC" role="button" aria-pressed="true" >⬆️Back to Table of Contents ⬆️</a>

<div style="border-radius:10px;border:#034914 solid;padding: 15px;background-color:aliceblue;font-size:90%;text-align:left">

<h4><b>Authors :</b> Mojgan Hashemian and Koorosh Aslansefat </h4>  
    
<center> <strong> If you liked this Notebook, please do upvote. </strong>
    
<center> <strong> If you have any questions, feel free to contact us! </strong>

<center> <img src="https://gregcfuzion.files.wordpress.com/2022/01/kind-regards-2.png" style='width: 600px; height: 300px;'>