Regression algorithm using linear models coded from scratch.
The training dataset contains 6,000 pairs of coordinates and their corresponding altitude values. You should create feature vectors based on the input coordinates as the model’s inputs and train your regression models to fit the training data. Predict the altitude values for the 2,000 coordinates in the testing dataset and save your predictions along with the model’s weights as a .csv file. For each approach, the mean squared error (MSE) of your predictions on the testing dataset must be less than 900; otherwise, you will fail the correctness check.
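A minimal sketch of the correctness check described above, assuming the true test altitudes are available as an array. The function names and the one-value-per-row CSV layout are illustrative assumptions, not the format required by the assignment.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted altitudes."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def passes_correctness_check(y_true, y_pred, threshold=900.0):
    """The MSE must be strictly less than the threshold (900 in this task)."""
    return mean_squared_error(y_true, y_pred) < threshold

def save_column_csv(path, values):
    """Write one value per row; this layout is an assumption, not the graded format."""
    np.savetxt(path, np.asarray(values, dtype=float).reshape(-1, 1), delimiter=",")
```

Because the comparison is strict, an MSE of exactly 900 would not pass under this reading of the rule.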
After passing the correctness check, TAs will evaluate your performance based on the results of your three approaches and select the best one for ranking. Your performance score will be calculated using the following formula:
Your final score in this part will be determined based on the ranking of your performance score.
- Maximum Likelihood (MSE: 809 < 900)
- Maximum A Posteriori (MSE: 805 < 900)
  - Basis function: Gaussian (the terrain in the topographic map resembles Gaussian basis functions)
  - Prior function: Gaussian (altitudes should have few outliers; e.g., sudden spikes in altitude are rare in the real world)
  - Likelihood function: Gaussian (the true altitude is more likely to lie near the predicted value than to be uniformly distributed)
- Bayesian Regression Method (MSE: 797 < 900)
  - Basis function: Gaussian (the terrain in the topographic map resembles Gaussian basis functions)
  - Prior function: Gaussian (altitudes should have few outliers; e.g., sudden spikes in altitude are rare in the real world)
  - Likelihood function: Gaussian (the distribution of the true altitude is more likely to be centered on the predicted value than to be uniform)
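The three approaches above can be sketched with the standard closed-form solutions for a linear model with Gaussian basis functions, a zero-mean isotropic Gaussian prior, and Gaussian noise. This is an illustrative sketch, not the repository's actual code: the function names, the shared basis width `scale`, and the hyperparameters `lam`, `alpha` (prior precision) and `beta` (noise precision) are all assumptions.

```python
import numpy as np

def gaussian_design_matrix(X, centers, scale):
    """Map (N, 2) coordinates to (N, M + 1) features: a bias column plus one Gaussian per center."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-sq_dist / (2.0 * scale ** 2))
    return np.hstack([np.ones((X.shape[0], 1)), phi])

def fit_ml(Phi, t):
    """Maximum likelihood under Gaussian noise is ordinary least squares."""
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    return w

def fit_map(Phi, t, lam):
    """MAP with a zero-mean Gaussian prior is ridge regression, with lam = alpha / beta."""
    A = lam * np.eye(Phi.shape[1]) + Phi.T @ Phi
    return np.linalg.solve(A, Phi.T @ t)

def fit_bayesian(Phi, t, alpha, beta):
    """Posterior over weights N(m_N, S_N), with S_N^-1 = alpha*I + beta*Phi^T Phi."""
    S_N_inv = alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi
    S_N = np.linalg.inv(S_N_inv)
    m_N = beta * S_N @ Phi.T @ t
    return m_N, S_N

def predict_bayesian(Phi_new, m_N, S_N, beta):
    """Predictive mean and variance for each row of Phi_new."""
    mean = Phi_new @ m_N
    var = 1.0 / beta + np.einsum("ij,jk,ik->i", Phi_new, S_N, Phi_new)
    return mean, var
```

With the same hyperparameters, the Bayesian posterior mean coincides with the MAP weights, so any difference in reported MSE between the two would come from different hyperparameter or basis choices.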
- K-means to find cluster centroids for the Gaussian basis functions
- K-fold cross validation to avoid overfitting
- Find the best basis function and cluster centroids' locations.
- Balance the number of weights used against the mean squared error.
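The points above could be combined roughly as follows: K-means picks the centers of the Gaussian basis functions, and K-fold cross-validation scores each candidate configuration on held-out data. This is a sketch under assumptions; the function names, the ridge-style fit, and the parameters `k`, `scale`, `lam` and `n_folds` are hypothetical and not taken from the repository.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns a (k, d) array of centroids."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dist.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    return centers

def gaussian_features(X, centers, scale):
    """Bias column plus one Gaussian basis function per centroid."""
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.hstack([np.ones((len(X), 1)), np.exp(-sq_dist / (2.0 * scale ** 2))])

def kfold_indices(n, n_folds, seed=0):
    """Yield (train_idx, val_idx) pairs that partition a shuffled range(n)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    for i in range(n_folds):
        train = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        yield train, folds[i]

def cross_val_mse(X, t, k, scale, lam, n_folds=5):
    """Average validation MSE of a ridge fit on Gaussian features with k centroids."""
    X = np.asarray(X, dtype=float)
    t = np.asarray(t, dtype=float)
    errors = []
    for train, val in kfold_indices(len(X), n_folds):
        centers = kmeans(X[train], k)  # centroids come from the training fold only
        Phi = gaussian_features(X[train], centers, scale)
        w = np.linalg.solve(lam * np.eye(Phi.shape[1]) + Phi.T @ Phi, Phi.T @ t[train])
        pred = gaussian_features(X[val], centers, scale) @ w
        errors.append(np.mean((pred - t[val]) ** 2))
    return float(np.mean(errors))
```

One way to balance model size against error would be to evaluate `cross_val_mse` over a small grid of `k` and `scale` values and keep the smallest `k` whose score is close to the best one observed.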