# Comparing the Empirical Performance of Two LBFs

## 1. Introduction

We compare the empirical performance of two linear basis function (LBF) models to determine which is more effective on a given regression task. The comparison lets us choose the basis function family that best suits the application and thereby improve predictive accuracy.

Specifically, we compare polynomial and radial basis function regression models on four datasets.

Our goal is to find which family of basis functions is better suited to approximating highly nonlinear relationships between two scalar-valued variables.

The report proceeds as follows: Section 2 presents the method, Section 3 reports the experimental results and analysis, and Section 4 concludes.

## 2. Methodology

We compare two linear basis function (LBF) models in this section. An LBF model can be expressed as follows:

$$
y=\boldsymbol{\phi}(x)^{\top} \boldsymbol{\beta}+\varepsilon
$$

where $\boldsymbol{\phi}(x):=\left[1, \phi_1(x), \ldots, \phi_p(x)\right]^{\top}$, $\boldsymbol{\beta}:=\left[\beta_0, \beta_1, \ldots, \beta_p\right]^{\top}$, and $\varepsilon$ is random noise. We consider two families of basis functions $\{\phi_i\}_{i=1}^{p}$.


The first function is in the family of polynomial basis functions:

$$
\phi_i(x):=x^i,
$$

The second function is in the family of radial basis functions:

$$
\phi_i(x):=\exp \left\{-\frac{\left(x-\frac{i}{p+1}\right)^2}{2 s^2}\right\} .
$$
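
As a minimal sketch of how the two families translate into design matrices, the snippet below builds $\Phi$ for each basis with NumPy. The function names `poly_design` and `rbf_design` are illustrative rather than taken from the original analysis, and the predictor is assumed to be a 1-D array.

```python
import numpy as np


def poly_design(x, p):
    """Columns [1, x, x^2, ..., x^p] for a 1-D input array x."""
    x = np.asarray(x, dtype=float)
    return np.vander(x, N=p + 1, increasing=True)


def rbf_design(x, p, s):
    """Columns [1, phi_1(x), ..., phi_p(x)] with centres i / (p + 1) and width s."""
    x = np.asarray(x, dtype=float)
    centres = np.arange(1, p + 1) / (p + 1)
    phi = np.exp(-(x[:, None] - centres[None, :]) ** 2 / (2 * s ** 2))
    return np.column_stack([np.ones_like(x), phi])


x = np.array([0.0, 0.5, 1.0])
print(poly_design(x, p=2))         # one row per observation, p + 1 columns
print(rbf_design(x, p=1, s=0.5))   # a single centre at 0.5
```

Note that the centres $i/(p+1)$ all lie strictly between 0 and 1, so the radial basis functions concentrate their resolution on that interval.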

Before comparing the two basis function families, we must set the value of $p$ for the polynomial regression model, as well as the values of $p$ and $s$ for the radial basis function regression model. These hyperparameters are selected separately for each dataset by minimizing the mean squared error (MSE) on a validation set.

The optimal value of $p$ (for each basis function family) is selected by exhaustively searching an equally spaced grid from 1 to 10, with a spacing of 1:

$$
P := \{1, 2, 3, \ldots, 10\}.
$$

For the radial basis functions, in addition to selecting $p$, we also select the optimal value of $s$ by exhaustively searching another equally spaced grid from 0.1 to 1, with a spacing of 0.1:

$$
S := \{0.1, 0.2, 0.3, \ldots, 1\}.
$$

That is, for each dataset, optimal values must be determined for three hyperparameters in total: $p_{pol} \in P$, $p_{rad} \in P$, and $s \in S$, where $p_{pol}$ denotes the number of polynomial basis functions (i.e., the degree of the polynomial) and $p_{rad}$ denotes the number of radial basis functions.
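
The search described above can be sketched as follows. This is an illustration on synthetic data, not on the benchmark datasets: the sine-shaped data, the 300/100 train/validation split, and the helper names `poly_design`, `rbf_design`, and `val_mse` are all assumptions made for the example.

```python
import numpy as np
from itertools import product


def poly_design(x, p):
    return np.vander(x, N=p + 1, increasing=True)


def rbf_design(x, p, s):
    centres = np.arange(1, p + 1) / (p + 1)
    phi = np.exp(-(x[:, None] - centres[None, :]) ** 2 / (2 * s ** 2))
    return np.column_stack([np.ones_like(x), phi])


def val_mse(phi_tr, y_tr, phi_va, y_va):
    """Fit by least squares on the training part, score on the validation part."""
    beta, *_ = np.linalg.lstsq(phi_tr, y_tr, rcond=None)
    return float(np.mean((y_va - phi_va @ beta) ** 2))


# Synthetic stand-in data (hypothetical, not one of the benchmark datasets).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 400)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=400)
x_tr, y_tr, x_va, y_va = x[:300], y[:300], x[300:], y[300:]

P = list(range(1, 11))                          # {1, 2, ..., 10}
S = [round(0.1 * k, 1) for k in range(1, 11)]   # {0.1, 0.2, ..., 1.0}

# One-dimensional search for the polynomial degree.
p_pol = min(P, key=lambda p: val_mse(poly_design(x_tr, p), y_tr,
                                     poly_design(x_va, p), y_va))

# Joint two-dimensional search over (p_rad, s): 10 x 10 = 100 candidates.
p_rad, s = min(product(P, S),
               key=lambda ps: val_mse(rbf_design(x_tr, *ps), y_tr,
                                      rbf_design(x_va, *ps), y_va))

print("p_pol =", p_pol, "| p_rad =", p_rad, "| s =", s)
```

The two families are searched independently, so the choice of $p_{pol}$ never influences the choice of $(p_{rad}, s)$.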

Given the selected hyperparameter values, we estimate the parameter vector $\boldsymbol{\beta}$ by least squares, solving the normal equations:

$$
\boldsymbol{\beta} = (\Phi^{\top} \Phi)^{-1} \Phi^{\top} \mathbf{y}
$$

where $\Phi$ is the design matrix whose rows are the basis vectors $\boldsymbol{\phi}(x)^{\top}$ of the training observations, and $\mathbf{y}$ is the column vector of observed responses.
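
As a hedged illustration of this estimation step, the sketch below solves the normal equations directly and compares the result with `np.linalg.lstsq`, which reaches the same least squares solution without forming $\Phi^{\top}\Phi$. The cubic design matrix and the coefficients in `beta_true` are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 200)

# Hypothetical cubic polynomial design matrix: columns [1, x, x^2, x^3].
Phi = np.vander(x, N=4, increasing=True)
beta_true = np.array([1.0, -2.0, 0.5, 3.0])   # illustrative values only
y = Phi @ beta_true + 0.01 * rng.normal(size=200)

# Normal equations: solve (Phi^T Phi) beta = Phi^T y without an explicit inverse.
beta_normal = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

# Least squares solver working on Phi directly.
beta_lstsq, *_ = np.linalg.lstsq(Phi, y, rcond=None)

print(beta_normal)
print(beta_lstsq)
```

For larger $p$ the columns of $\Phi$ become strongly correlated and $\Phi^{\top}\Phi$ can be ill-conditioned, in which case a solver such as `lstsq` is usually the numerically safer choice.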


After obtaining the parameter vector $\boldsymbol{\beta}$, we use the test set to compare the performance of the two models on the four datasets.


## 3. Empirical Study

In this task, we use four benchmark datasets: A, B, C, and D. Each dataset contains 5,000 observations of the response and predictor variables, which are named $y$ and $x$, respectively. A scatter plot of each dataset is shown in Figure 1.

<img src="http://tva1.sinaimg.cn/large/006ZwIoWly1hjfnkc8kdaj30n40k1adh.jpg" style="zoom:60%;" />
Figure 1: Benchmark Datasets

To obtain the empirical results, we first divided each dataset into training and test sets using an 80/20 split (80% for training, 20% for testing). We then used an exhaustive search to select the polynomial degree ($p_{pol}$), the number of radial basis functions ($p_{rad}$), and the width ($s$) by minimizing the MSE on a validation set held out from the training data, as described in Section 2. Specifically, we searched $p_{pol} \in \{1, 2, \ldots, 10\}$ for the polynomial model, and $p_{rad} \in \{1, 2, \ldots, 10\}$ together with $s \in \{0.1, 0.2, \ldots, 1\}$ for the radial basis function model. We trained a model for each candidate setting and kept the setting with the lowest validation MSE.

For each dataset, the selected values of $p_{pol}$, $p_{rad}$, and $s$ are listed in the following table:

| Dataset | Optimal $p_{pol}$ | Optimal $p_{rad}$ | Optimal $s$ |
|---------|-------------------|-------------------|-------------|
| A       | 3                 | 4                 | 0.3         |
| B       | 8                 | 1                 | 0.4         |
| C       | 6                 | 3                 | 0.2         |
| D       | 2                 | 1                 | 0.1         |

This table gives the best configuration within each basis function family for each dataset. For instance, on Dataset A the best polynomial regression model has $p_{pol}=3$, and the best radial basis function regression model has $p_{rad}=4$ and $s=0.3$.

Finally, we summarized the test mean squared error (MSE) for each basis function family on each dataset in the following table:

| Dataset | Polynomial basis (test MSE) | Radial basis (test MSE) |
|---------|-----------------------------|-------------------------|
| A       | 0.2051                      | 0.0521                  |
| B       | 0.1059                      | 0.1063                  |
| C       | 0.6644                      | 0.0414                  |
| D       | 0.2564                      | 0.0444                  |

The table reports the test MSE of both families on the four datasets. On Dataset A, the radial basis model clearly outperforms the polynomial model (0.0521 versus 0.2051, roughly a fourfold reduction). The two families perform almost identically on Dataset B (0.1059 for the polynomial basis versus 0.1063 for the radial basis). The largest gap occurs on Dataset C, where the radial basis model attains an MSE of 0.0414 against 0.6644 for the polynomial model, about 16 times lower. On Dataset D, the radial basis model is again substantially better, with an MSE of 0.0444 versus 0.2564, nearly six times lower.
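
To put these gaps on a common scale, the short sketch below computes the ratio of polynomial to radial test MSE for each dataset. It uses only the values reported in the table; the names `test_mse` and `ratios` are illustrative.

```python
# Test MSE per dataset as (polynomial basis, radial basis), copied from the table.
test_mse = {
    "A": (0.2051, 0.0521),
    "B": (0.1059, 0.1063),
    "C": (0.6644, 0.0414),
    "D": (0.2564, 0.0444),
}

# A ratio above 1 means the radial basis model has the lower test MSE.
ratios = {name: pol / rad for name, (pol, rad) in test_mse.items()}

for name, ratio in ratios.items():
    print(f"Dataset {name}: polynomial / radial = {ratio:.2f}")
```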

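```python
import numpy as np
import pandas as pd
from itertools import product
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

data = pd.read_csv('../dataset/dataset-a.csv')

# Column 0 holds the response y; column 1 holds the predictor x.
x = data.iloc[:, 1].values
y = data.iloc[:, 0].values

# Hold out 20% of the data for testing, then split the remainder into
# training and validation sets (the 75/25 ratio here is an assumption).
x_rest, x_test, y_rest, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
x_train, x_val, y_train, y_val = train_test_split(x_rest, y_rest, test_size=0.25, random_state=42)

p_values = np.arange(1, 11)                         # P = {1, 2, ..., 10}
s_values = np.round(np.linspace(0.1, 1.0, 10), 1)   # S = {0.1, 0.2, ..., 1.0}


def poly_features(x, p):
    """Polynomial basis x^1, ..., x^p (LinearRegression adds the intercept)."""
    return np.column_stack([x ** i for i in range(1, p + 1)])


def rbf_features(x, p, s):
    """Radial basis functions with centres i / (p + 1), i = 1..p, and width s."""
    centres = np.arange(1, p + 1) / (p + 1)
    return np.exp(-(x[:, None] - centres[None, :]) ** 2 / (2 * s ** 2))


def fit_and_score(phi_tr, y_tr, phi_ev, y_ev):
    """Fit least squares on (phi_tr, y_tr) and return the MSE on (phi_ev, y_ev)."""
    model = LinearRegression().fit(phi_tr, y_tr)
    return mean_squared_error(y_ev, model.predict(phi_ev))


# Select p_pol by minimizing the validation MSE.
best_p_pol, best_val_pol = None, float('inf')
for p in p_values:
    mse = fit_and_score(poly_features(x_train, p), y_train,
                        poly_features(x_val, p), y_val)
    if mse < best_val_pol:
        best_p_pol, best_val_pol = p, mse

# Select (p_rad, s) jointly by minimizing the validation MSE.
# Each family keeps its own running best, so one cannot mask the other.
best_p_rad, best_s, best_val_rad = None, None, float('inf')
for p, s in product(p_values, s_values):
    mse = fit_and_score(rbf_features(x_train, p, s), y_train,
                        rbf_features(x_val, p, s), y_val)
    if mse < best_val_rad:
        best_p_rad, best_s, best_val_rad = p, s, mse

# Evaluate the selected models once on the held-out test set.
test_mse_pol = fit_and_score(poly_features(x_train, best_p_pol), y_train,
                             poly_features(x_test, best_p_pol), y_test)
test_mse_rad = fit_and_score(rbf_features(x_train, best_p_rad, best_s), y_train,
                             rbf_features(x_test, best_p_rad, best_s), y_test)

print("Best polynomial degree:", best_p_pol)
print("Best number of radial basis functions:", best_p_rad)
print("Best radial basis function width:", best_s)
print("Test MSE (polynomial):", test_mse_pol)
print("Test MSE (radial):", test_mse_rad)
```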


The results for datasets B, C, and D are obtained in the same way by changing the input file.

## 4. Conclusion 

Comparing polynomial and radial basis functions, we found that their relative performance varies across datasets. Radial basis functions achieved a markedly lower test MSE on Datasets A, C, and D, while the two families were essentially tied on Dataset B (0.1059 for the polynomial basis versus 0.1063 for the radial basis). No single family is therefore guaranteed to be best in every case, although radial basis functions were the stronger choice on most of the nonlinear relationships studied here. When selecting basis functions, it is important to consider the characteristics of the task at hand and to validate the choice empirically.

Our study has several limitations. Firstly, we only compared two families of basis functions, while there might be other functions suitable for specific tasks. Secondly, we utilized fixed ranges of hyperparameters during the search, potentially impacting the stability of our results. Additionally, our models assumed a fixed form for basis functions, which might lack flexibility in some real-world applications. Lastly, we employed a fixed dataset split, and different splits could yield different outcomes.

Future research could explore novel types of basis functions, like trigonometric or exponential functions, to diversify the options available for modeling. Additionally, investigating adaptive hyperparameter tuning methods such as Bayesian optimization and exploring ensemble learning techniques could enhance the versatility and robustness of predictive models across different applications and datasets. These avenues of study would contribute to a deeper understanding of basis function selection and its impact on model performance.

*Note: Generative AI was used only for editing the final report text.*
