28604/ml-regression

ml-regression

Regression algorithm using linear models coded from scratch.

Problem: use different regression approaches to predict altitudes in southern Taiwan.

An image of a 3D topographic map of southern Taiwan

An image of the regression model pipeline

Description

The training dataset contains 6,000 pairs of coordinates and their corresponding altitude values. You should create feature vectors based on the input coordinates as the model’s inputs and train your regression models to fit the training data. Predict the altitude values for the 2,000 coordinates in the testing dataset and save your predictions along with the model’s weights as a .csv file. For each approach, the mean squared error (MSE) of your predictions on the testing dataset must be less than 900; otherwise, you will fail the correctness check.

Grading Policy

After passing the correctness check, TAs will evaluate your performance based on the results of your three approaches and select the best one for ranking. Your performance score will be calculated using the following formula:

Performance Score = Mean Squared Error × Number of Weights

Your final score in this part will be determined based on the ranking of your performance score.
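As a minimal sketch of how the performance score above could be computed with NumPy: the function names here are hypothetical, and whether a bias term counts toward the number of weights is an assumption (this sketch simply counts every entry of the weight array).

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def performance_score(y_true, y_pred, weights):
    """MSE multiplied by the number of weights (lower is better).

    Assumption: every entry of `weights`, including any bias, is counted.
    """
    return mean_squared_error(y_true, y_pred) * np.size(weights)
```

Because the score multiplies the two quantities, halving the number of weights is worth as much as halving the error, provided the MSE stays under the 900 threshold.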

Regression Methods

  • Maximum Likelihood (MSE: 809 < 900)

    • Basis function: Gaussian (the shape of the topographic map resembles a Gaussian basis function)

    • Likelihood function: Gaussian (the observed altitude is more likely to lie close to the model's prediction than to be uniformly distributed)

      An image of maximum likelihood formula
  • Maximum A Posteriori (MSE: 805 < 900)

    • Basis function: Gaussian (the shape of the topographic map resembles a Gaussian basis function)

    • Prior function: Gaussian (altitudes should not have many outliers; e.g., sudden spikes in altitude are rare in the real world)

    • Likelihood function: Gaussian (the observed altitude is more likely to lie close to the model's prediction than to be uniformly distributed)

      An image of maximum a posteriori formula

      An image of maximum a posteriori formula
  • Bayesian Regression Method (MSE: 797 < 900)

    • Basis function: Gaussian (the shape of the topographic map resembles a Gaussian basis function)

    • Prior function: Gaussian (altitudes should not have many outliers; e.g., sudden spikes in altitude are rare in the real world)

    • Likelihood function: Gaussian (the observed altitude is more likely to lie close to the model's prediction than to be uniformly distributed)

      An image of Bayesian regression formula
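The formula images are not reproduced here, so the following is only a sketch of the three approaches in their standard textbook form, not necessarily the exact code in this repository. It assumes a bias column plus one isotropic Gaussian bump per center, a shared `width`, a ridge strength `lam` for MAP, and prior/noise precisions `alpha` and `beta` for the Bayesian fit; all function and parameter names are hypothetical.

```python
import numpy as np

def gaussian_design_matrix(X, centers, width):
    """Bias column followed by one Gaussian basis function per center."""
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-sq_dist / (2.0 * width ** 2))
    return np.hstack([np.ones((X.shape[0], 1)), phi])

def fit_ml(Phi, y):
    """Maximum likelihood with Gaussian noise reduces to least squares."""
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w

def fit_map(Phi, y, lam):
    """MAP with a zero-mean Gaussian prior: solve (lam*I + Phi^T Phi) w = Phi^T y."""
    A = lam * np.eye(Phi.shape[1]) + Phi.T @ Phi
    return np.linalg.solve(A, Phi.T @ y)

def fit_bayesian(Phi, y, alpha, beta):
    """Posterior N(m_N, S_N) with S_N^{-1} = alpha*I + beta*Phi^T Phi."""
    S_N = np.linalg.inv(alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi)
    m_N = beta * S_N @ Phi.T @ y
    return m_N, S_N

def predict_bayesian(Phi_new, m_N, S_N, beta):
    """Predictive mean and variance for each row of Phi_new."""
    mean = Phi_new @ m_N
    var = 1.0 / beta + np.einsum("ij,jk,ik->i", Phi_new, S_N, Phi_new)
    return mean, var
```

Under these assumptions the Bayesian posterior mean coincides with the MAP solution when `lam = alpha / beta`; what the Bayesian treatment adds is the per-point predictive variance.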

Other Details

  • K-means to find the cluster centroids used as centers of the Gaussian basis functions
  • K-fold cross-validation to avoid overfitting
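Both details can be sketched in plain NumPy. This is an illustrative version of Lloyd's K-means and a shuffled K-fold split, assuming random initialization from the data points; the helper names, the `fit`/`predict` callback interface, and the default fold count are hypothetical and not taken from the repository.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: returns k centroids for the rows of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers

def kfold_indices(n_samples, n_folds, seed=0):
    """Yield (train_idx, val_idx) pairs over a shuffled split into n_folds."""
    perm = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(perm, n_folds)
    for i in range(n_folds):
        train = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        yield train, folds[i]

def cross_val_mse(X, y, fit, predict, n_folds=5, seed=0):
    """Average validation MSE; `fit` and `predict` are caller-supplied."""
    scores = []
    for train_idx, val_idx in kfold_indices(len(X), n_folds, seed):
        model = fit(X[train_idx], y[train_idx])
        pred = predict(model, X[val_idx])
        scores.append(np.mean((y[val_idx] - pred) ** 2))
    return float(np.mean(scores))
```

The centroids returned by `kmeans` would serve as the centers of the Gaussian basis functions, and `cross_val_mse` gives an error estimate on held-out folds for comparing hyperparameter settings.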

Challenges

  • Find the best basis function and cluster centroids' locations.
  • Balance the number of weights used against the mean squared error.
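One way to frame the second challenge is as a constrained selection: among candidate models that pass the MSE < 900 correctness check, keep the one with the lowest MSE × number-of-weights product. The helper below is a hypothetical sketch of that selection step, assuming each candidate has already been evaluated and summarized as a `(n_weights, mse)` pair.

```python
def select_best(candidates, mse_limit=900.0):
    """Pick the (n_weights, mse) pair with the lowest n_weights * mse.

    Only candidates whose MSE is strictly below `mse_limit` are considered.
    Returns None when no candidate passes the limit.
    """
    valid = [(n, mse) for n, mse in candidates if mse < mse_limit]
    if not valid:
        return None
    return min(valid, key=lambda c: c[0] * c[1])
```

A model with more weights is only worth keeping if it lowers the MSE by a larger factor than it grows the weight count.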

About

Regression algorithm using linear models coded using only NumPy.
