This code was written for the Kaggle Allstate Claims Severity competition, in which the goal is to predict the severity of insurance claims.
The dataset consists of 116 categorical features (each named with the generic pattern 'cat'+str(i)) and 15 continuous features (named in a similar fashion).
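Because the features follow a fixed naming pattern, the two groups can be separated by prefix. The snippet below is a minimal sketch, not the code used in this repository: it assumes the continuous columns start with 'cont', and the helper name split_feature_columns and the toy column list (including 'id' and 'loss') are hypothetical stand-ins for the real file header.

```python
def split_feature_columns(columns):
    """Group column names into categorical and continuous features by prefix."""
    # Assumption: categorical columns start with 'cat', continuous ones with 'cont'.
    cat_cols = [c for c in columns if c.startswith("cat")]
    cont_cols = [c for c in columns if c.startswith("cont")]
    return cat_cols, cont_cols


# Toy header standing in for the real columns (hypothetical subset).
example_columns = ["id", "cat1", "cat2", "cont1", "cont2", "loss"]
cat_cols, cont_cols = split_feature_columns(example_columns)
print(cat_cols)   # ['cat1', 'cat2']
print(cont_cols)  # ['cont1', 'cont2']
```

With pandas, the same helper could be called on the column index of a loaded DataFrame, for example split_feature_columns(df.columns).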
The training data consists of 188318 examples and the testing data consists of 125546 examples.
The data can be obtained from: https://www.kaggle.com/c/allstate-claims-severity/data
The code will take you through the steps of exploring, visualizing and transforming the data, and finally evaluating different regression models.