Group Lasso
===========

.. image:: https://img.shields.io/badge/code%20style-black-000000.svg
:target: https://github.com/python/black

The group lasso [1] regulariser is a well-known method for achieving
structured sparsity in machine learning and statistics. The idea is to
partition the covariates into non-overlapping groups and to recover
regression weights in which only a sparse set of these covariate groups
have non-zero components.
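
For least-squares regression, the group lasso estimate from [1] can be
written as the solution of a penalised problem of the following form
(standard notation; the exact loss and scaling used by this package may
differ):

.. math::

    \hat{\mathbf{w}} = \arg\min_{\mathbf{w}}
        \frac{1}{2} \lVert \mathbf{y} - \mathbf{X}\mathbf{w} \rVert_2^2
        + \lambda \sum_{g} \sqrt{d_g} \, \lVert \mathbf{w}_g \rVert_2,

where :math:`\mathbf{w}_g` is the block of weights belonging to group
:math:`g` and :math:`d_g` is the size of that group. It is the unsquared
:math:`\ell_2` norms that push entire groups of weights to zero at once.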

There are several reasons why this might be a good idea. Say, for
example, that we have a set of sensors, each of which generates five
measurements. We don't want to maintain an unnecessary number of
sensors. If we try normal LASSO regression, then we will get sparse
components. However, these sparse components might not correspond to a
sparse set of sensors, since each sensor generates five measurements. If
we instead use group LASSO with the measurements grouped by the sensor
that generated them, then
we will get a sparse set of sensors.
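
As an illustration, here is a minimal sketch of how the sensor example
might look with a scikit-learn style estimator. The ``GroupLasso`` name,
its ``groups`` argument and the default regularisation are assumptions
made for illustration; consult the package documentation for the actual
signature.

.. code:: python

    import numpy as np

    # Hypothetical setup: 10 sensors, each producing 5 measurements,
    # so the design matrix has 50 columns.
    n_samples, n_sensors, n_meas = 1000, 10, 5
    rng = np.random.default_rng(0)
    X = rng.standard_normal((n_samples, n_sensors * n_meas))

    # Only the first two sensors actually influence the response.
    w_true = np.zeros(n_sensors * n_meas)
    w_true[: 2 * n_meas] = rng.standard_normal(2 * n_meas)
    y = X @ w_true + 0.1 * rng.standard_normal(n_samples)

    # Group label for each column: column j belongs to sensor j // n_meas.
    groups = np.repeat(np.arange(n_sensors), n_meas)

    from group_lasso import GroupLasso

    gl = GroupLasso(groups=groups)
    gl.fit(X, y)

With the right amount of regularisation, whole five-column blocks of the
estimated coefficients are zeroed out together, i.e. whole sensors are
discarded rather than individual measurements.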

About this project:
-------------------
This project is developed by Yngve Mardal Moe and released under an MIT
licence.

Installation guide:
-------------------
The todos are, in decreasing order of importance

- Use Mixins?

5. Classification problems (I have an experimental implementation, but
   it's not validated yet)

Unfortunately, the most interesting parts are the least important ones,
so expect the list to be worked on from both ends simultaneously.

Implementation details
----------------------
The problem is solved using the FISTA optimiser [2] with a gradient-based
adaptive restarting scheme [3]. No line search is currently implemented, but
I hope to look at that later.
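
To make the scheme concrete, here is a minimal, self-contained sketch of
FISTA [2] with the gradient-based restart test of [3]. This illustrates
the general algorithm, not this package's actual implementation; the
fixed step size ``1 / lipschitz`` and the helper names are assumptions.

.. code:: python

    import numpy as np

    def fista(grad_f, prox_g, x0, lipschitz, n_iter=1000):
        """FISTA with gradient-based adaptive restart.

        grad_f    -- gradient of the smooth part of the loss
        prox_g    -- proximal operator of the regulariser: prox_g(z, step)
        lipschitz -- Lipschitz constant of grad_f (step size 1/lipschitz)
        """
        x = x0.copy()
        y = x0.copy()
        t = 1.0
        step = 1.0 / lipschitz
        for _ in range(n_iter):
            x_new = prox_g(y - step * grad_f(y), step)
            # Restart test from [3]: if the momentum direction opposes
            # the (generalised) gradient direction, reset the momentum.
            if np.dot(y - x_new, x_new - x) > 0:
                t = 1.0
                y = x_new
            else:
                t_new = (1 + np.sqrt(1 + 4 * t ** 2)) / 2
                y = x_new + ((t - 1) / t_new) * (x_new - x)
                t = t_new
            x = x_new
        return x

    def group_soft_threshold(z, step, groups, reg):
        """Prox of step * reg * sum_g sqrt(d_g) * ||z_g||_2
        (block soft-thresholding), suitable as prox_g above."""
        out = np.zeros_like(z)
        for g in np.unique(groups):
            idx = groups == g
            thresh = step * reg * np.sqrt(np.count_nonzero(idx))
            norm = np.linalg.norm(z[idx])
            if norm > thresh:
                out[idx] = (1 - thresh / norm) * z[idx]
        return out

For the group lasso penalty, the proximal operator is exactly this kind
of block soft-thresholding: each group of coefficients is shrunk towards
zero as a unit and set to exactly zero if its norm is small enough.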

Although fast, the FISTA optimiser does not reach loss values as low as
those of the significantly slower second-order interior point methods.
This might, at first glance, seem like a problem. However, it does
recover the sparsity pattern of the data, which can be used to train a
new model on the selected subset of the features.

Also, even though the FISTA optimiser is not designed for stochastic
optimisation, in my experience it does not suffer a large drop in
performance when the mini-batches are large enough. I have therefore
implemented mini-batch optimisation using FISTA, and have thus been able
to fit models on data with ~500 columns and 10 000 000 rows on my
moderately priced laptop.
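
As a sketch of the idea (assuming the ``fista`` helper above; the batch
size and sampling scheme are illustrative choices, not the package's):

.. code:: python

    import numpy as np

    def make_minibatch_grad(X, y, batch_size, rng):
        """Least-squares gradient evaluated on a fresh random
        mini-batch of rows at every call."""
        def grad_f(w):
            idx = rng.choice(X.shape[0], size=batch_size, replace=False)
            Xb, yb = X[idx], y[idx]
            return Xb.T @ (Xb @ w - yb) / batch_size
        return grad_f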

Finally, we note that since FISTA uses Nesterov acceleration, it is not
a descent algorithm. We can therefore not expect the loss to decrease
monotonically.

References
----------
[1] Yuan, M. and Lin, Y. (2006), Model selection and estimation in
regression with grouped variables. Journal of the Royal Statistical
Society: Series B, 68: 49-67.

[2] Beck, A. and Teboulle, M. (2009), A fast iterative
shrinkage-thresholding algorithm for linear inverse problems. SIAM
Journal on Imaging Sciences, 2(1): 183-202.

[3] O'Donoghue, B. and Candès, E. (2015), Adaptive restart for
accelerated gradient schemes. Foundations of Computational Mathematics,
15: 715-732.
