ml-regression

Regression algorithms using linear models, coded from scratch.

Problem: use different regression approaches to predict altitudes in southern Taiwan

Description

The training dataset contains 6,000 pairs of coordinates and their corresponding altitude values. You should create feature vectors based on the input coordinates as the model’s inputs and train your regression models to fit the training data. Predict the altitude values for the 2,000 coordinates in the testing dataset and save your predictions along with the model’s weights as a .csv file. For each approach, the mean squared error (MSE) of your predictions on the testing dataset must be less than 900; otherwise, you will fail the correctness check.
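The feature-construction and error-check steps described above can be sketched as follows. This is only an illustration, not the repository's code: the function names `gaussian_features` and `mse`, the shared width `sigma`, the choice of `centers`, and the constant bias column are all assumptions.

```python
import numpy as np

def gaussian_features(coords, centers, sigma):
    """Map (N, 2) coordinates to an (N, M + 1) design matrix.

    Column 0 is a constant bias term (an assumption); column j + 1 is a
    Gaussian bump centred on centers[j] with a shared width sigma.
    """
    coords = np.asarray(coords, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Squared Euclidean distance between every point and every centre.
    sq_dist = ((coords[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-sq_dist / (2.0 * sigma ** 2))
    return np.hstack([np.ones((coords.shape[0], 1)), phi])

def mse(y_true, y_pred):
    """Mean squared error between two 1-D arrays of altitudes."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))
```

With `M` centres the model has `M + 1` weights, and a prediction is `gaussian_features(coords, centers, sigma) @ w` for a weight vector `w`.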

Grading Policy

After passing the correctness check, TAs will evaluate your performance based on the results of your three approaches and select the best one for ranking. Your performance score will be calculated using the following formula:

$\text{Performance Score} = \text{Mean Squared Error} \times \text{Number of Weights}$
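As a small worked example of this formula, the helper below multiplies the two quantities. The function name `performance_score` and the 50 weights in the usage line are hypothetical and not taken from the assignment.

```python
def performance_score(mse, num_weights):
    """Performance score as defined above: MSE times number of weights."""
    return mse * num_weights

# Hypothetical model: MSE of 800 using 50 weights.
print(performance_score(800.0, 50))  # 40000.0
```

Because the score is a product, a lower value is reached either by reducing the error or by using fewer weights.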

Your final score in this part will be determined based on the ranking of your performance score.

Regression Methods

Maximum Likelihood (MSE: 809 < 900)
- Basis function: Gaussian (the terrain in the topographic map resembles Gaussian-shaped bumps)
- Likelihood function: Gaussian (the true altitude is more likely to lie near the predicted value than to be uniformly distributed)
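With a Gaussian likelihood, maximising the likelihood over the weights is equivalent to ordinary least squares on the design matrix. The sketch below assumes a design matrix `Phi` of Gaussian basis features has already been built; the name `fit_maximum_likelihood` is hypothetical and the repository's own implementation may differ.

```python
import numpy as np

def fit_maximum_likelihood(Phi, t):
    """Weights that maximise a Gaussian likelihood (least-squares fit).

    Phi: (N, M) design matrix, t: (N,) target altitudes.
    """
    Phi = np.asarray(Phi, dtype=float)
    t = np.asarray(t, dtype=float)
    # lstsq solves min_w ||Phi w - t||^2 and tolerates rank deficiency.
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    return w
```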
Maximum A Posteriori (MSE: 805 < 900)
- Basis function: Gaussian (the terrain in the topographic map resembles Gaussian-shaped bumps)
- Prior function: Gaussian (altitudes should have few outliers, e.g. sudden spikes in altitude are rare in the real world)
- Likelihood function: Gaussian (the true altitude is more likely to lie near the predicted value than to be uniformly distributed)
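Assuming a zero-mean isotropic Gaussian prior on the weights, the MAP estimate with a Gaussian likelihood reduces to ridge regression. In this sketch `lam` is a hypothetical regularisation strength (the ratio of prior precision to noise precision) and `fit_map` is an illustrative name, neither taken from the repository.

```python
import numpy as np

def fit_map(Phi, t, lam):
    """MAP weights under a zero-mean Gaussian prior (ridge regression).

    Solves (lam * I + Phi^T Phi) w = Phi^T t for w.
    """
    Phi = np.asarray(Phi, dtype=float)
    t = np.asarray(t, dtype=float)
    A = lam * np.eye(Phi.shape[1]) + Phi.T @ Phi
    # Solving the linear system is more stable than inverting A.
    return np.linalg.solve(A, Phi.T @ t)
```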
Bayesian Regression Method (MSE: 797 < 900)
- Basis function: Gaussian (the terrain in the topographic map resembles Gaussian-shaped bumps)
- Prior function: Gaussian (altitudes should have few outliers, e.g. sudden spikes in altitude are rare in the real world)
- Likelihood function: Gaussian (the distribution of the true altitude is more likely to be centred on the predicted value than to be uniform)
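For the Bayesian approach, a Gaussian prior combined with a Gaussian likelihood gives a Gaussian posterior over the weights in closed form. The sketch below follows the standard textbook result and is not necessarily the repository's implementation: `alpha` (prior precision), `beta` (noise precision), and both function names are assumptions.

```python
import numpy as np

def weight_posterior(Phi, t, alpha, beta):
    """Posterior mean m and covariance S of the weights.

    S^-1 = alpha * I + beta * Phi^T Phi,  m = beta * S Phi^T t.
    """
    Phi = np.asarray(Phi, dtype=float)
    t = np.asarray(t, dtype=float)
    S_inv = alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi
    S = np.linalg.inv(S_inv)
    m = beta * S @ Phi.T @ t
    return m, S

def predictive(Phi_new, m, S, beta):
    """Predictive mean and variance for each row of Phi_new."""
    Phi_new = np.asarray(Phi_new, dtype=float)
    mean = Phi_new @ m
    # Noise variance plus the uncertainty contributed by the weights.
    var = 1.0 / beta + np.sum((Phi_new @ S) * Phi_new, axis=1)
    return mean, var
```

The predictive mean is what would be submitted as the altitude prediction; the variance indicates how uncertain the model is at each coordinate.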

Other Details

K-means to find the cluster centroids used as centres of the Gaussian basis functions
K-fold cross-validation to avoid overfitting
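The centroid search can be sketched with a plain Lloyd's-algorithm k-means over the input coordinates. The initialisation (random training points), the iteration cap, and the name `kmeans` are assumptions made for illustration; the repository may differ.

```python
import numpy as np

def kmeans(points, k, num_iters=100, seed=0):
    """Lloyd's algorithm: return (k, D) centroids and a label per point."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct training points.
    centroids = points[rng.choice(len(points), size=k, replace=False)]

    def assign(c):
        sq_dist = ((points[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        return sq_dist.argmin(axis=1)

    for _ in range(num_iters):
        labels = assign(centroids)
        new_centroids = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, assign(centroids)
```

The returned centroids can then serve as the centres of the Gaussian basis functions, so `k` directly controls the number of weights.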

Challenges

Find the best basis function and the best locations for the cluster centroids.
Balance the number of weights used against the mean squared error.
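One possible way to approach this balance is to estimate the held-out MSE with k-fold cross-validation for each candidate model size and multiply it by the number of weights. The sketch below is an assumption about how that could be done, not the repository's procedure: `k_fold_mse`, `cv_score`, the small ridge term `lam`, and the default of 5 folds are all hypothetical.

```python
import numpy as np

def k_fold_mse(Phi, t, k=5, lam=1e-6, seed=0):
    """Average held-out MSE of a lightly regularised linear fit over k folds."""
    Phi = np.asarray(Phi, dtype=float)
    t = np.asarray(t, dtype=float)
    rng = np.random.default_rng(seed)
    # Shuffle the sample indices and split them into k roughly equal folds.
    folds = np.array_split(rng.permutation(len(t)), k)
    errors = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        A = lam * np.eye(Phi.shape[1]) + Phi[train].T @ Phi[train]
        w = np.linalg.solve(A, Phi[train].T @ t[train])
        errors.append(np.mean((Phi[val] @ w - t[val]) ** 2))
    return float(np.mean(errors))

def cv_score(Phi, t, k=5):
    """Cross-validated MSE times the number of weights (columns of Phi)."""
    return k_fold_mse(Phi, t, k) * Phi.shape[1]
```

Evaluating `cv_score` on design matrices built with different numbers of centroids gives a rough picture of where extra weights stop paying for themselves.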
