# Car Price Modelling & Analysis

This notebook contains the initial exploration of modelling approaches to tackle the following challenges:

1. A user is looking for a specific make/model of car and wants to know how much it might cost.
2. A user is looking for a specific type of car and wondering what makes/models fit within that range (e.g. SUV, mini-van, truck)

The results should be easily interpretable by users. Expected user flow is as follows:

1. User is looking for make/model
   1. User comes to the site
   2. Selects type of vehicle they're looking for
      - From: SUV, Truck, Van, Sedan, Sports Car
   3. Optionally selects a make and/or model, which shows the average price by make and/or model (?)
   4. Optionally enters a desired vehicle age in years and a desired mileage range.
   5. With the inputs entered the following cases are displayed:
       - Just type of vehicle: a break down by make is shown with price by age of vehicle
       - Type of vehicle and make: a break down by model of vehicle price and age of vehicle
       - Type/make/model: details on price by age as well as mileage
       - Type/budget price: for all makes and models, find the age whose price is closest to the budget amount. Open question: how to optimize for both age and mileage? Perhaps use mileage per year?
         - Possibly leave out mileage and report the average mileage per year plus a confidence interval, i.e. "this age with ~X-Y km".
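
The budget case above can be sketched by inverting a per-model linear price curve: given an intercept and an age slope, solve for the age whose predicted price equals the budget. This is only an illustration; the `age_for_budget` helper and the coefficients are hypothetical placeholders, not fitted values from this notebook.

```python
def age_for_budget(budget, intercept, slope_age, max_age=25.0):
    """Return the vehicle age (years) whose predicted price is closest to budget.

    Assumes price = intercept + slope_age * age, with slope_age < 0.
    """
    if slope_age >= 0:
        raise ValueError("slope_age must be negative (price falls with age)")
    age = (budget - intercept) / slope_age
    # clamp to a plausible age range
    return min(max(age, 0.0), max_age)


# hypothetical per-model coefficients: (intercept in $CAD, slope in $CAD per year)
curves = {
    ("Toyota", "RAV4"): (40_000.0, -2_500.0),
    ("Honda", "CR-V"): (38_000.0, -2_400.0),
}

budget = 25_000
ages = {mm: round(age_for_budget(budget, b0, b1), 1) for mm, (b0, b1) in curves.items()}
print(ages)
```

Mileage could then be reported alongside each age as the average mileage per year for that model times the age, rather than being optimized jointly.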

## Modelling Approach

First we need to group makes/models into vehicle types SUV, Truck, Sedan, etc. A hash table mapping make/model to vehicle type must be created. 
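
A minimal sketch of such a mapping, using a plain dictionary keyed on `(make, model)`. The entries and the helper names `vehicle_type` and `models_for_type` are illustrative assumptions, not the production mapping.

```python
# Hypothetical make/model -> vehicle type lookup; entries are illustrative only.
VEHICLE_TYPES = {
    ("Toyota", "RAV4"): "SUV",
    ("Honda", "CR-V"): "SUV",
    ("Ford", "F-150"): "Truck",
    ("Honda", "Civic"): "Sedan",
}


def vehicle_type(make, model, default=None):
    """Look up the vehicle type for a make/model pair."""
    return VEHICLE_TYPES.get((make, model), default)


def models_for_type(vtype):
    """List all (make, model) pairs mapped to the given vehicle type."""
    return sorted(mm for mm, t in VEHICLE_TYPES.items() if t == vtype)


print(vehicle_type("Toyota", "RAV4"))
print(models_for_type("SUV"))
```

With pandas, the same dictionary could be applied row-wise, for example with a list comprehension over `zip(df.make, df.model)`.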

### Baseline

A single OLS linear regression on age in years and mileage, pooled across all vehicles, serves as the baseline price model.

### Linear Mixed Effects Modelling of Price

To improve the model and meet the above expectations, a linear mixed effects model will be used. It will have independent intercepts for both make and model. This has the following advantages:
- we can predict on the group average, i.e. all SUVs
- we can extract the individual make/model values independently

This will be done with `statsmodels`.
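
One way to write the model described above, with a random intercept and a random age slope per group, is sketched below. The notation is an assumption introduced here for illustration: $i$ indexes ads, $j$ indexes groups (make or model), and $m_{ij}$ is mileage per year. Categorical terms such as the drive system are omitted for brevity.

$$
\text{price}_{ij} = (\beta_0 + u_{0j}) + (\beta_1 + u_{1j})\,\text{age}_{ij} + \beta_2\, m_{ij} + \varepsilon_{ij},
\qquad (u_{0j}, u_{1j}) \sim \mathcal{N}(0, \Sigma),\quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
$$

Here the $\beta$ terms are fixed effects shared by all groups, while $u_{0j}$ and $u_{1j}$ are the group-specific offsets to the intercept and the age slope.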


In [None]:
import os, sys

import pandas as pd
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf
import plotly.express as px

cur_dir = os.getcwd()
SRC_PATH = cur_dir[: cur_dir.index("fortunato-wheels-engine") + len("fortunato-wheels-engine")]
if SRC_PATH not in sys.path:
    sys.path.append(SRC_PATH)

from src.data.car_ads import CarAds

%load_ext autoreload
%autoreload 2

In [None]:
ads = CarAds()
ads.get_car_ads(sources=["cargurus", "kijiji"])

In [None]:
ads.df.info()

In [None]:
ads.preprocess_ads()

## Vehicle Type Mapping

Starting with SUVs as the target group, the following makes/models are used:


In [None]:
suv = pd.DataFrame([
        ["Toyota", "RAV4"],
        ["Toyota", "Highlander"],
        # ["Toyota", "RAV4 Prime"], # only 15 in dataset
        ["Subaru", "Forester"],
        ["Subaru", "Outback"],
        ["Subaru", "Impreza"],
        # ["Subaru", "WRX"],
        ["Subaru", "Crosstrek"],
        ["Honda", "CR-V"],
        # ["Honda", "CR-V Hybrid"],
        ["Honda", "HR-V"],
    ],
    columns=["make", "model"]
)

In [None]:
suv_df = ads.df.query("(make in @suv.make) & (model in @suv.model) & (price < 120_000)").reset_index(drop=True)
suv_df["year_posted"] = suv_df.listed_date.dt.year
# suv_df.make = suv_df.make.cat.remove_unused_categories();
# suv_df.model = suv_df.model.cat.remove_unused_categories();

In [None]:
from sklearn.model_selection import train_test_split

train_df, test_df = train_test_split(
    suv_df,
    test_size=0.25,
    random_state=42,
    stratify=suv_df["model"],
)

model_features = [
    "age_at_posting",
    "mileage_per_year",
    "mileage",
    "make",
    "model",
    "price",
    "listed_date",
    "year_posted",
    "wheel_system"
    
]

train_df = train_df[model_features].dropna().reset_index(drop=True)
test_df = test_df[model_features].dropna().reset_index(drop=True)

In [None]:
test_df.model.value_counts() / len(test_df)

In [None]:
train_df.model.value_counts() / len(train_df)

In [None]:
# plot the number of ads for the 20 most common models in the training set
fig = px.histogram(
    # ads.loc[ads.make_name.isin(ads.make_name.value_counts().index[:15])],
    train_df.loc[train_df.model.isin(train_df.model.value_counts().index[:20])],
    x="model",
    title="Number of Ads by Model",
    color="model",
    labels={"model": "Model"},
    color_discrete_sequence=px.colors.qualitative.Dark24,
    height=500,
    category_orders={"model": train_df.model.value_counts().index[:15]}
)
fig.show()

In [None]:
fig = px.scatter(
    train_df,
    x = "age_at_posting",
    y = "price",
    color = "make",
    opacity=0.2
)
fig.show()

In [None]:
# linear regression model based on age and mileage
ols = (
    smf.ols(
        "price ~ age_at_posting + mileage_per_year + mileage", 
        train_df[["make", "age_at_posting", "price", "mileage_per_year", "mileage"]], 
    )
    .fit()
)
print(ols.summary())

In [None]:
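# RMSE is the root of the mean squared residual (np.std matches it only when the mean residual is zero)
rmse = np.sqrt(np.mean(ols.resid ** 2))
mape = np.mean(np.abs(ols.resid / train_df.price))
print(f"OLS Model price RMSE in CAD: ${rmse:.0f}")
print(f"OLS Model price MAPE: {mape:.2%}")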

In [None]:
fig = px.histogram(
    x = ols.resid,
    color=train_df.make,
    title = "OLS Model of Price Residuals Distribution",
    labels = {
        "x" : "Model Residual ($CAD)"
    }, 
    height = 800,
    barmode="overlay",
    histnorm="percent",
).update_layout(
    # set xa axis limits
    xaxis = dict(
        range = [-25_000, 25_000]
    )
)
fig.add_vline(
    x=ols.resid.median(), 
    line_dash = 'dash', 
    line_color = 'firebrick',
    annotation_text = f" Median: ${ols.resid.median():.0f}",
)
fig.show()

## Mixed Effect Model by Make

Introducing random effects per make of vehicle: each make gets its own intercept and its own age and mileage-per-year slopes.

In [None]:
# linear mixed effects model of price, grouped by make
ols_make = (
    smf.mixedlm(
        formula = "price ~ age_at_posting + mileage_per_year", 
        data = train_df, 
        groups = "make",
        re_formula = "~ age_at_posting + mileage_per_year" # random effects with different slopes and intercepts
    )
    .fit(method=["lbfgs"])
)
print(ols_make.summary())

In [None]:
# RMSE is the root of the mean squared residual
rmse = np.sqrt(np.mean(ols_make.resid ** 2))
mape = np.mean(np.abs(ols_make.resid / train_df.price))
print(f"Mixed Effects (Make) Model price RMSE in CAD: ${rmse:.0f}")
print(f"Mixed Effects (Make) Model price MAPE: {mape:.2%}")

In [None]:
fig = px.histogram(
    x = ols_make.resid,
    color=train_df[["make", "price", "age_at_posting", "mileage_per_year"]].dropna().make,
    title = "Mixed Effects (Make) Model of Price Residuals Distribution",
    labels = {
        "x" : "Model Residual ($CAD)"
    }, 
    height = 800,
    barmode="overlay",
    histnorm="percent",
).update_layout(
    xaxis = dict(
        range = [-25_000, 25_000]
    )
)
fig.add_vline(
    x=ols_make.resid.median(), 
    line_dash = 'dash', 
    line_color = 'firebrick',
    annotation_text = f" Median: ${ols_make.resid.median():.0f}",
)
fig.show()

## Mixed Effects by Model

This assumes that no two makes share a model name, since the grouping below is on `model` alone.
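
The assumption can be checked directly. The sketch below uses a hypothetical helper, `shared_model_names`, on illustrative pairs; in this notebook the pairs could come from `zip(train_df.make, train_df.model)`.

```python
from collections import defaultdict


def shared_model_names(pairs):
    """Return {model: sorted makes} for model names used by more than one make."""
    makes_by_model = defaultdict(set)
    for make, model in pairs:
        makes_by_model[model].add(make)
    return {m: sorted(mk) for m, mk in makes_by_model.items() if len(mk) > 1}


# illustrative (make, model) pairs only
pairs = [
    ("Toyota", "RAV4"),
    ("Honda", "CR-V"),
    ("Subaru", "Outback"),
    ("Toyota", "RAV4"),
]
print(shared_model_names(pairs))
```

If the result is non-empty, grouping on the combined `make_model` column would avoid merging distinct vehicles into one group.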

In [None]:
train_df["make_model"] = train_df.make.astype(str) + " " + train_df.model.astype(str)
# train_df_no_crv = train_df.query("make_model != 'Honda CR-V'").reset_index(drop=True)

In [None]:
ols_md = (
    smf.mixedlm(
        formula = "price ~ age_at_posting + wheel_system + mileage_per_year",
        data = train_df,
        groups = "model",
        re_formula = "~ age_at_posting" # random effects with different slopes and intercepts
    )
)
# fit with method lbfgs; maxfun can be set to 1000 via kwargs if needed
# ols_model = ols_md.fit(method=["lbfgs"], options = {'maxfun':1000})
ols_model = ols_md.fit(method=["lbfgs"])

print(ols_model.summary())

In [None]:
ols_model.random_effects

In [None]:
# RMSE is the root of the mean squared residual
rmse = np.sqrt(np.mean(ols_model.resid ** 2))
mape = np.mean(np.abs(ols_model.resid / train_df.price))
print(f"Mixed Effects (Model) price RMSE in CAD: ${rmse:.0f}")
print(f"Mixed Effects (Model) price MAPE: {mape:.2%}")

In [None]:
fig = px.histogram(
    x = ols_model.resid,
    color=train_df[["model", "price", "age_at_posting", "mileage_per_year"]].dropna().model,
    title = "Mixed Effects (Model) Price Residuals Distribution",
    labels = {
        "x" : "Model Residual ($CAD)"
    }, 
    height = 800,
    barmode="overlay",
    histnorm="percent",
    nbins=250,
).update_layout(
    xaxis = dict(
        range = [-25_000, 25_000]
    )
)

fig.add_vline(
    x=ols_model.resid.median(), 
    line_dash = 'dash', 
    line_color = 'firebrick',
    annotation_text = f" Median: ${ols_model.resid.median():.0f}",
)
# approximate 90% band assuming normal residuals (z = 1.645)
fig.add_vline(
    x = ols_model.resid.mean() + 1.645*rmse,
    line_color = "lime",
    annotation_text = f" 90% CI: ±${1.645*rmse:.0f}",
)
fig.add_vline(
    x = ols_model.resid.mean() - 1.645*rmse,
    line_color = "lime",
)
fig.show()

In [None]:
# calculate absolute percent error and plot only rows with an error below 200%
train_df["abs_percent_error"] = np.abs(ols_model.resid / train_df.price)

fig = px.histogram(
    x = train_df.query("abs_percent_error < 2").abs_percent_error,
    color=train_df.query("abs_percent_error < 2").model,
    title = "Mixed Effects (Model) Price Abs Percent Error Distribution",
    labels = {
        "x" : "Absolute Percent Error (fraction of price)"
    }, 
    height = 800,
    barmode="overlay",
    histnorm="percent",
).update_layout(
    xaxis = dict(
        range = [0, 2]
    )
)

# mark the median absolute percent error (not the median dollar residual)
fig.add_vline(
    x=train_df.abs_percent_error.median(), 
    line_dash = 'dash', 
    line_color = 'firebrick',
    annotation_text = f" Median: {train_df.abs_percent_error.median():.1%}",
)

fig.show()

In [None]:
# plot residuals vs age_at_posting
fig = px.scatter(
    train_df,
    x = "age_at_posting",
    y = ols_model.resid,
    color = "model",
    title = "Mixed Effects (Model) Price Residuals vs Age at Posting",
    labels = {
        "age_at_posting" : "Age at Posting (years)",
        "y" : "Model Residual ($CAD)"
    },
    height = 600,
)

fig.show()

The residuals are roughly centred on zero, except for older vehicles. How many vehicles are there at each age? As the counts below show, most ads are for vehicles under 10 years old at posting, as expected, so this is potentially acceptable.

In [None]:
train_df.age_at_posting.value_counts()

## Evaluate Model on Test Set

In [None]:
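# use the model fitted above; note that statsmodels' MixedLM predict reflects only the fixed effects
test_df["pred_price"] = ols_model.predict(test_df)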

In [None]:
# RMSE is the root of the mean squared prediction error
rmse = np.sqrt(np.mean((test_df.pred_price - test_df.price) ** 2))
mape = np.mean(np.abs((test_df.pred_price - test_df.price) / test_df.price))
print(f"Mixed Effects (Model) test price RMSE in CAD: ${rmse:.0f}")
print(f"Mixed Effects (Model) test price MAPE: {mape:.2%}")

In [None]:
test_residuals = test_df.pred_price - test_df.price

fig = px.histogram(
    x = test_residuals,
    color=test_df.model,
    title = "Mixed Effects (Model) Test Price Residuals Distribution",
    labels = {
        "x" : "Model Residual ($CAD)"
    }, 
    height = 800,
    barmode="overlay",
    histnorm="percent",
).update_layout(
    xaxis = dict(
        # range = [-25_000, 25_000]
    )
)

fig.add_vline(
    x=test_residuals.median(), 
    line_dash = 'dash', 
    line_color = 'firebrick',
    annotation_text = f" Median: ${test_residuals.median():.0f}",
)
# approximate 90% band assuming normal residuals (z = 1.645)
fig.add_vline(
    x = test_residuals.mean() + 1.645*rmse,
    line_color = "lime",
    annotation_text = f" 90% CI: ±${1.645*rmse:.0f}",
)
fig.add_vline(
    x = test_residuals.mean() - 1.645*rmse,
    line_color = "lime",
)
fig.show()

# Cleaning Up & Using Custom Model Class CarPricePredictorLME

The class `CarPricePredictorLME` is a wrapper around the `statsmodels` `MixedLM` class. It is used to predict the price of a vehicle based on the make, model, age and mileage. 

It has the following functionality built on top of the `MixedLM` class:
- `predict` method adds the group-level random effects to the predictions, which the standard `predict` function does not do.
- `save_model` method saves the model to disk
- `load_model` method loads the model from disk and correctly parses the parameters of the model into the correct format for the `predict` method.
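
The implementation of `CarPricePredictorLME` is not shown in this notebook. As a rough sketch of the idea behind its `predict` method, the function below adds a group's random intercept and random slopes to the fixed-effects prediction. The function name, the dictionary layout, and all coefficients are hypothetical, and categorical features are left out.

```python
def predict_with_random_effects(row, fixed, random_effects, group_key="model"):
    """Combine fixed effects with the row's group-level random effects.

    fixed: {"Intercept": b0, "<feature>": slope, ...}
    random_effects: {group: {"Intercept": u0, "<feature>": u_slope, ...}}
    Unseen groups fall back to the fixed-effects (population) prediction.
    """
    group_re = random_effects.get(row[group_key], {})
    pred = fixed["Intercept"] + group_re.get("Intercept", 0.0)
    for name, coef in fixed.items():
        if name == "Intercept":
            continue
        # the group offset is zero for features without a random slope
        pred += (coef + group_re.get(name, 0.0)) * row[name]
    return pred


# hypothetical coefficients, not fitted values
fixed = {"Intercept": 35_000.0, "age_at_posting": -2_000.0, "mileage_per_year": -0.25}
random_effects = {"RAV4": {"Intercept": 1_500.0, "age_at_posting": 100.0}}

row = {"model": "RAV4", "age_at_posting": 5, "mileage_per_year": 10_000}
print(predict_with_random_effects(row, fixed, random_effects))
```

Falling back to the population prediction for unseen groups is one possible design choice; raising an error instead would make unsupported models explicit.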

In [None]:
from src.models.linear_mixed_effects import CarPricePredictorLME
import json

In [None]:
# loading in vehicle types and makes/models
with open(os.path.join(SRC_PATH, "models", "vehicle-type-model-config.json"), "r") as f:
    vehicle_types = json.load(f)

suv_prod = pd.DataFrame(vehicle_types["suv"], columns=["make", "model"])

min_price = 1000
max_price = 200_000
min_ads = 4000

# only get makes and models in suv_prod that fall within the price limits
suv_prod_df =  ads.df.query(
    "(make in @suv_prod.make) "
    "& (model in @suv_prod.model)"
    "& (price > @min_price)"
    "& (price < @max_price)"
    ).reset_index(drop=True)

# remove any rows where the make/model has min_ads (4000) or fewer ads
suv_prod_df = suv_prod_df.groupby(["make", "model"]).filter(lambda x: len(x) > min_ads).reset_index(drop=True)

train_df, test_df = train_test_split(
    suv_prod_df,
    test_size=0.25,
    random_state=42,
    stratify=suv_prod_df["model"],
)

model_features = [
    "age_at_posting",
    "mileage_per_year",
    "mileage",
    "make",
    "model",
    "price",
    "listed_date",
    "wheel_system"
]

train_df = train_df[model_features].dropna().reset_index(drop=True)
test_df = test_df[model_features].dropna().reset_index(drop=True)

In [None]:
# plot the number of ads for up to the 60 most common models
fig = px.histogram(
    # ads.loc[ads.make_name.isin(ads.make_name.value_counts().index[:15])],
    suv_prod_df.loc[suv_prod_df.model.isin(suv_prod_df.model.value_counts().index[:60])],
    x="model",
    title="Number of Ads by Model",
    color="model",
    labels={"model": "Model"},
    color_discrete_sequence=px.colors.qualitative.Dark24,
    height=500,
    category_orders={"model": suv_prod_df.model.value_counts().index[:60]}
)
fig.show()

In [None]:
mle_model = CarPricePredictorLME(
    target = "price",
    fixed_continuous_features = ["age_at_posting", "mileage_per_year"],
    fixed_categorical_features= ["wheel_system"],
    random_effects = ["age_at_posting"],
    group = "model"
)

mle_model.fit(train_df)
results = mle_model.evaluate_model()

In [None]:
import plotly.graph_objects as go

train_df["pred_price"] = mle_model.model.fittedvalues
train_df["residuals"] = mle_model.model.resid
train_df["pred_residuals"] = train_df.price - train_df.pred_price 

In [None]:
# calculate rmse and mape for each model of vehicle
results_by_model = train_df.groupby(["make", "model"]).apply(lambda x: {
    "train_rmse" : np.sqrt(np.mean(x.residuals ** 2)),
    "train_mape" : np.mean(np.abs((x.residuals) / x.price))
})
# convert results by model into a df with rmse and mape in their own columns, without make/model as index
results_by_model = pd.DataFrame(results_by_model.tolist(), index=results_by_model.index).reset_index()

In [None]:
from plotly.subplots import make_subplots

# plot distribution of RMSE and MAPE by model
fig = make_subplots(
    rows=2,
    cols=1,
    subplot_titles=("RMSE", "MAPE"),
    row_heights=[0.5, 0.5],
    specs=[[{"type": "bar"}],[{"type": "bar"}]],
)

fig.add_trace(
    go.Bar(
        x=results_by_model.model,
        y=results_by_model.train_rmse,
        name="RMSE",
        marker_color="firebrick",
    ),
    row=1,
    col=1,
)

fig.add_trace(
    go.Bar(
        x=results_by_model.model,
        y=results_by_model.train_mape,
        name="MAPE",
        marker_color="royalblue",
    ),
    row=2,
    col=1,
)

# sort x axis by RMSE
fig.update_xaxes(
    row=1,
    col=1,
    categoryorder="array",
    categoryarray=results_by_model.sort_values("train_rmse", ascending=False).model,
)
fig.update_xaxes(
    row=2,
    col=1,
    categoryorder="array",
    categoryarray=results_by_model.sort_values("train_rmse", ascending=False).model,
)

fig.update_layout(
    title_text="RMSE and MAPE by Model",
    height=800,
    width=1000,
    showlegend=False,
    xaxis_tickangle=-45,
    xaxis_tickfont_size=10,
    yaxis_tickfont_size=10,
)

fig.show()

## Evaluate on Test Set

In [None]:
# use the wrapper's predict so that the random effects are included
test_df["pred_price"] = mle_model.predict(test_df)
test_df["residuals"] = test_df.price - test_df.pred_price
# calculate overall rmse and mape
rmse = np.sqrt(np.mean(test_df.residuals ** 2))
mape = np.mean(np.abs((test_df.residuals) / test_df.price))
print(f"Test RMSE: ${rmse:.0f}")
print(f"Test MAPE: {mape*100:.2f}%")

In [None]:
# calculate rmse and mape for each model of vehicle in test_df
results_by_model_test = test_df.groupby(["make", "model"]).apply(lambda x: {
    "test_rmse" : np.sqrt(np.mean(x.residuals ** 2)),
    "test_mape" : np.mean(np.abs((x.residuals) / x.price))
})
# convert results by model into a df with rmse and mape in their own columns, without make/model as index
results_by_model_test = pd.DataFrame(results_by_model_test.tolist(), index=results_by_model_test.index).reset_index()

In [None]:
from plotly.subplots import make_subplots

# plot distribution of RMSE and MAPE by model
fig = make_subplots(
    rows=2,
    cols=1,
    subplot_titles=("RMSE", "MAPE"),
    row_heights=[0.5, 0.5],
    specs=[[{"type": "bar"}],[{"type": "bar"}]],
)

fig.add_trace(
    go.Bar(
        x=results_by_model_test.model,
        y=results_by_model_test.test_rmse,
        name="RMSE",
        marker_color="firebrick",
    ),
    row=1,
    col=1,
)

fig.add_trace(
    go.Bar(
        x=results_by_model_test.model,
        y=results_by_model_test.test_mape,
        name="MAPE",
        marker_color="royalblue",
    ),
    row=2,
    col=1,
)

# sort x axis by RMSE
fig.update_xaxes(
    row=1,
    col=1,
    categoryorder="array",
    categoryarray=results_by_model_test.sort_values("test_rmse", ascending=False).model,
)
fig.update_xaxes(
    row=2,
    col=1,
    categoryorder="array",
    categoryarray=results_by_model_test.sort_values("test_rmse", ascending=False).model,
)

fig.update_layout(
    title_text="RMSE and MAPE by Model",
    height=800,
    width=1000,
    showlegend=False,
    xaxis_tickangle=-45,
    xaxis_tickfont_size=10,
    yaxis_tickfont_size=10,
)

fig.show()

In [None]:
# combine results_by_model_test and results_by_model on the make and model columns
# to get make, model, train_rmse, test_rmse, train_mape and test_mape for each model
results_by_model_overall = results_by_model_test.merge(results_by_model, on=["make", "model"], how="left")

In [None]:
results_by_model_overall

In [None]:
results_by_model_overall["rmse_diff"] = results_by_model_overall.test_rmse - results_by_model_overall.train_rmse
results_by_model_overall["mape_diff"] = results_by_model_overall.test_mape - results_by_model_overall.train_mape

In [None]:
results_by_model_overall.sort_values("test_rmse", ascending=False)

In [None]:
# select makes/models with test mape < 0.2 or test rmse < 5000
valid_make_models = results_by_model_overall.query("test_mape < 0.2 | test_rmse < 5000")
valid_make_models

In [None]:
# export valid makes/models to valid-vehicle-type-models.json
# with format {"suv": [[make, model], []]}
json_data = {"suv" : valid_make_models[["make", "model"]].values.tolist()}
with open(os.path.join(SRC_PATH, "models", "valid-vehicle-type-models.json"), 'w') as fp:
    json.dump(json_data, fp, indent = 3)

## Retraining with only Valid Makes/Models and All Data

In [None]:
min_price = 1000
max_price = 200_000
min_ads = 4000

# only get makes and models in valid_make_models that fall within the price limits
prod_df =  ads.df.query(
    "(make in @valid_make_models.make) "
    "& (model in @valid_make_models.model)"
    "& (price > @min_price)"
    "& (price < @max_price)"
    ).reset_index(drop=True)

# remove any rows where the make/model has min_ads (4000) or fewer ads
prod_df = prod_df.groupby(["make", "model"]).filter(lambda x: len(x) > min_ads).reset_index(drop=True)

train_df, test_df = train_test_split(
    prod_df,
    test_size=0.25,
    random_state=42,
    stratify=prod_df["model"],
)

model_features = [
    "age_at_posting",
    "mileage_per_year",
    "mileage",
    "make",
    "model",
    "price",
    "listed_date",
    "wheel_system"
]

train_df = prod_df[model_features].dropna().reset_index(drop=True)
# test_df = test_df[model_features].dropna().reset_index(drop=True)

In [None]:
mle_model = CarPricePredictorLME(
    target = "price",
    fixed_continuous_features = ["age_at_posting", "mileage_per_year"],
    fixed_categorical_features= ["wheel_system"],
    random_effects = ["age_at_posting"],
    group = "model"
)

mle_model.fit(train_df)
results = mle_model.evaluate_model()

In [None]:
train_df["pred_price"] = mle_model.model.fittedvalues
train_df["residuals"] = mle_model.model.resid

In [None]:
# plot age at posting vs price and the models predictions across the range of 0-10 years
fig = px.scatter(
    train_df,
    x = "age_at_posting",
    y = "pred_price",
    color = train_df.model,
    hover_data=['make', 'model', 'price', 'pred_price', 'wheel_system', 'mileage_per_year', 'age_at_posting'],
    title = "Linear Mixed Effects Model of Price vs Age at Posting, Drive System, and Mileage per Year",
    labels = {
        "age_at_posting" : "Age at Posting (years)",
        "pred_price" : "Predicted Price ($CAD)",
    },
    height = 800,
    width = 1200,
).update_layout(
    xaxis = dict(
        # range = [0, 10]
    )
)

fig.show()

In [None]:
fig = px.histogram(
    train_df.query("(residuals < 25_000) & (residuals > -25_000)"),
    x = "residuals",
    color="model",
    title = "Linear Mixed Effects Model of Price Residuals Distribution",
    labels = {
        "residuals" : "Model Residual ($CAD)"
    }, 
    height = 800,
    barmode="overlay",
    histnorm="percent",
    nbins=250,
).update_layout(
    xaxis = dict(
        range = [-25_000, 25_000]
    )
)

fig.add_vline(
    x=train_df.residuals.median(), 
    line_dash = 'dash', 
    line_color = 'firebrick',
    annotation_text = f" Median: ${train_df.residuals.median():.0f}",
)
# approximate 90% band assuming normal residuals (z = 1.645)
fig.add_vline(
    x = train_df.residuals.mean() + 1.645*results["rmse"],
    line_color = "lime",
    annotation_text = f" 90% CI: ±${1.645*results['rmse']:.0f}",
)
fig.add_vline(
    x = train_df.residuals.mean() - 1.645*results["rmse"],
    line_color = "lime",
)
fig.show()

## Generate Summary Dictionary for Each Make/Model

In [None]:
# calculate rmse and mape for each model of vehicle
results_by_model = train_df.groupby(["make", "model"]).apply(lambda x: {
    "rmse" : np.sqrt(np.mean(x.residuals ** 2)),
    "mape" : np.mean(np.abs((x.residuals) / x.price))
})
# convert results by model into a df with rmse and mape in their own columns, without make/model as index
results_by_model = pd.DataFrame(results_by_model.tolist(), index=results_by_model.index).reset_index()

model_summary = []
for model in valid_make_models.model:

    model_df = train_df.query("model == @model")
    model_summary.append({
        "vehicle_type": "suv",
        "make" : model_df.make.values[0],
        "model" : model,
        "model_version": 1, 
        "model_type" : "linear_mixed_effects",
        "n" : len(model_df),
        "rmse" : round(results_by_model.query("model == @model").rmse.values[0], 1),
        "mape" : round(results_by_model.query("model == @model").mape.values[0], 3),
        "model_stats" : {
            "median_residual" : round(model_df.residuals.median(), 1),
            "mean_residual" : round(model_df.residuals.mean(), 0),
        },
        "vehicle_stats": {
            "median_price" : round(model_df.price.median(), 0),
            "mean_price" : round(model_df.price.mean(), 0),
            "avg_age_at_posting" : round(model_df.age_at_posting.mean(), 2)
        },
        "model_details" : {
            "fixed_continuous_features" : mle_model.fixed_continuous_features,
            "fixed_categorical_features" : mle_model.fixed_categorical_features,
            "random_effects" : mle_model.random_effects,
            "group": mle_model.group,
            "intercept" : round(mle_model.model.params["Intercept"] + mle_model.model.random_effects[model]["model"], 1),
            "slope_age_at_posting": round(mle_model.model.params["age_at_posting"] + mle_model.model.random_effects[model]["age_at_posting"], 1),
            "slope_mileage_per_year": round(mle_model.model.params["mileage_per_year"], 4),
        }
    })

model_summary[0]

## Save model

In [None]:
mle_model.save_model(os.path.join(SRC_PATH, "models", "vehicle-type", "suv-price-model-v1.pkl"))

In [None]:
ads.df.query("make == 'Toyota'").model.value_counts()

In [None]:
sample = pd.DataFrame({
    "make" : ["Toyota"],
    "model": ["RAV4"],
    "age_at_posting": [6],
    "mileage_per_year": [8000],
    "wheel_system": ["FWD"]
})

mle_model.predict(sample)

# Model Deployment on Azure

Sample multi-model deployment from the docs: https://github.com/Azure/azureml-examples/tree/main/cli/endpoints/online/custom-container/minimal/multimodel