Group Lasso
===========

.. image:: https://img.shields.io/badge/code%20style-black-000000.svg
:target: https://github.com/python/black

The group lasso [1] regulariser is a well-known method for achieving
structured sparsity in machine learning and statistics. The idea is to
partition the covariates into non-overlapping groups and to recover
regression weights in which only a sparse set of these covariate groups
have non-zero components.
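
For least-squares regression, the group lasso estimate from [1] can be
written as the solution of a penalised problem of the following form
(standard notation; the exact loss and scaling used by this package may
differ):

.. math::

    \hat{\mathbf{w}} = \arg\min_{\mathbf{w}}
        \frac{1}{2} \lVert \mathbf{y} - \mathbf{X}\mathbf{w} \rVert_2^2
        + \lambda \sum_{g} \sqrt{d_g} \, \lVert \mathbf{w}_g \rVert_2,

where :math:`\mathbf{w}_g` is the block of weights belonging to group
:math:`g` and :math:`d_g` is the size of that group. It is the unsquared
:math:`\ell_2` norms that push entire groups of weights to zero at once.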

There are several reasons why this might be a good idea. Say, for
example, that we have a set of sensors, each of which generates five
measurements. We don't want to maintain an unnecessary number of
sensors. If we try normal LASSO regression, then we will get sparse
components. However, these sparse components might not correspond to a
sparse set of sensors, since each sensor generates five measurements. If
we instead use group LASSO with the measurements grouped by the sensor
that generated them, then
we will get a sparse set of sensors.
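
As an illustration, here is a minimal sketch of how the sensor example
might look with a scikit-learn style estimator. The ``GroupLasso`` name,
its ``groups`` argument and the default regularisation are assumptions
made for illustration; consult the package documentation for the actual
signature.

.. code:: python

    import numpy as np

    # Hypothetical setup: 10 sensors, each producing 5 measurements,
    # so the design matrix has 50 columns.
    n_samples, n_sensors, n_meas = 1000, 10, 5
    rng = np.random.default_rng(0)
    X = rng.standard_normal((n_samples, n_sensors * n_meas))

    # Only the first two sensors actually influence the response.
    w_true = np.zeros(n_sensors * n_meas)
    w_true[: 2 * n_meas] = rng.standard_normal(2 * n_meas)
    y = X @ w_true + 0.1 * rng.standard_normal(n_samples)

    # Group label for each column: column j belongs to sensor j // n_meas.
    groups = np.repeat(np.arange(n_sensors), n_meas)

    from group_lasso import GroupLasso

    gl = GroupLasso(groups=groups)
    gl.fit(X, y)

With the right amount of regularisation, whole five-column blocks of the
estimated coefficients are zeroed out together, i.e. whole sensors are
discarded rather than individual measurements.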

About this project:
-------------------
This project is developed by Yngve Mardal Moe and released under an MIT
licence.

Installation guide:
-------------------
The todos are, in decreasing order of importance

- Use Mixins?

5. Classification problems (I have an experimental implementation, but
   it's not validated yet)

Unfortunately, the most interesting parts are the least important ones,
so expect the list to be worked on from both ends simultaneously.

Implementation details
----------------------
The problem is solved using the FISTA optimiser [2] with a gradient-based
adaptive restarting scheme [3]. No line search is currently implemented, but
I hope to look at that later.
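
To make the scheme concrete, here is a minimal, self-contained sketch of
FISTA [2] with the gradient-based restart test of [3]. This illustrates
the general algorithm, not this package's actual implementation; the
fixed step size ``1 / lipschitz`` and the helper names are assumptions.

.. code:: python

    import numpy as np

    def fista(grad_f, prox_g, x0, lipschitz, n_iter=1000):
        """FISTA with gradient-based adaptive restart.

        grad_f    -- gradient of the smooth part of the loss
        prox_g    -- proximal operator of the regulariser: prox_g(z, step)
        lipschitz -- Lipschitz constant of grad_f (step size 1/lipschitz)
        """
        x = x0.copy()
        y = x0.copy()
        t = 1.0
        step = 1.0 / lipschitz
        for _ in range(n_iter):
            x_new = prox_g(y - step * grad_f(y), step)
            # Restart test from [3]: if the momentum direction opposes
            # the (generalised) gradient direction, reset the momentum.
            if np.dot(y - x_new, x_new - x) > 0:
                t = 1.0
                y = x_new
            else:
                t_new = (1 + np.sqrt(1 + 4 * t ** 2)) / 2
                y = x_new + ((t - 1) / t_new) * (x_new - x)
                t = t_new
            x = x_new
        return x

    def group_soft_threshold(z, step, groups, reg):
        """Prox of step * reg * sum_g sqrt(d_g) * ||z_g||_2
        (block soft-thresholding), suitable as prox_g above."""
        out = np.zeros_like(z)
        for g in np.unique(groups):
            idx = groups == g
            thresh = step * reg * np.sqrt(np.count_nonzero(idx))
            norm = np.linalg.norm(z[idx])
            if norm > thresh:
                out[idx] = (1 - thresh / norm) * z[idx]
        return out

For the group lasso penalty, the proximal operator is exactly this kind
of block soft-thresholding: each group of coefficients is shrunk towards
zero as a unit and set to exactly zero if its norm is small enough.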

Although fast, the FISTA optimiser does not reach loss values as low as
those of the significantly slower second-order interior point methods.
This might, at first glance, seem like a problem. However, it does
recover the sparsity pattern of the data, which can be used to train a
new model on the selected subset of the features.

Also, even though the FISTA optimiser is not designed for stochastic
optimisation, in my experience it does not suffer a large drop in
performance when the mini-batches are large enough. I have therefore
implemented mini-batch optimisation using FISTA, and have thus been able
to fit models on data with ~500 columns and 10 000 000 rows on my
moderately priced laptop.
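
As a sketch of the idea (assuming the ``fista`` helper above; the batch
size and sampling scheme are illustrative choices, not the package's):

.. code:: python

    import numpy as np

    def make_minibatch_grad(X, y, batch_size, rng):
        """Least-squares gradient evaluated on a fresh random
        mini-batch of rows at every call."""
        def grad_f(w):
            idx = rng.choice(X.shape[0], size=batch_size, replace=False)
            Xb, yb = X[idx], y[idx]
            return Xb.T @ (Xb @ w - yb) / batch_size
        return grad_f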

Finally, we note that since FISTA uses Nesterov acceleration, it is not
a descent algorithm. We can therefore not expect the loss to decrease
monotonically.

References
----------
[1] Yuan, M. and Lin, Y. (2006), Model selection and estimation in
regression with grouped variables. Journal of the Royal Statistical
Society: Series B, 68: 49-67.

[2] Beck, A. and Teboulle, M. (2009), A fast iterative
shrinkage-thresholding algorithm for linear inverse problems. SIAM
Journal on Imaging Sciences, 2(1): 183-202.

[3] O'Donoghue, B. and Candès, E. (2015), Adaptive restart for
accelerated gradient schemes. Foundations of Computational Mathematics,
15: 715-732.
