L1-LSMI

L1-LSMI is a Matlab package for non-linear supervised feature selection with L1-penalized squared-loss mutual information. See this page on my web site for more details.
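For background, squared-loss mutual information (SMI) is the Pearson-divergence analogue of ordinary mutual information:

$$ \mathrm{SMI}(X, Y) = \frac{1}{2} \iint \left( \frac{p(x, y)}{p(x)\,p(y)} - 1 \right)^{2} p(x)\, p(y)\, \mathrm{d}x\, \mathrm{d}y $$

Roughly speaking, L1-LSMI searches for a feature-weight vector that maximizes an LSMI (least-squares) estimate of SMI between the weighted features and the output, with the weights constrained to an L1 ball of radius z; this is the same l1-ball radius z that is tuned internally in the demo below.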

The source tree in this git repository includes all files I used for experiments and possibly other third-party packages. If you just need the minimal L1-LSMI package, grab this zip instead.

License

GPL v3

L1-LSMI, a Matlab package for non-linear supervised feature selection
with L1-penalized squared-loss mutual information.
Copyright (C) 2012 Wittawat Jitkrittum

L1-LSMI is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see http://www.gnu.org/licenses/.

Usage Demo

%
% Demonstrate how to use pglsmi (L1-LSMI).
% pg is an internal code for 'Plain gradient'
%
rng(1)
%%%%% Generate a toy dataset
% X is #dim x #sample
% Y is 1 x #sample
[X, Y] = gen_plus(400);

%%%%% Some settings
% Number of features to select. Necessary option.
o.k = 2;

% Number of restarts. More restarts generally give better features, but
% take longer.
o.ztuner_repeat = 1;

% Number of cross-validation folds for LSMI
o.fold = 5; 

% Maximum number of iterations for one value of z (the l1-ball radius)
o.maxIter = 100;

% Internally, pglsmi tries to find a z that yields exactly k features.
% When k is very large, hitting exactly k features can be difficult, so
% some slack may be allowed:
%
% if #features_found is between k and k + high_k_radius, the search is
% treated as successful. Use 0 (exactly k) here since this is just a demo.
o.high_k_radius = 0; 

% There are many other options. See the relevant files in the pglsmi
% folder, in particular the lines calling "myProcessOptions".

%%%%% Run pglsmi
S = fs_pglsmi(X, Y, o); % returns a struct
S

% S.F is a logical vector indicating which features are selected.
% Hopefully we get [1 1 0 0 0 0 0 0 0 0] here (the first two features
% should be selected).
S.F

% S.ZT.W contains the actual weights W found using z = S.ZT.z. The final
% objective function value is S.ZT.f.
S.ZT.W

%%%%%%%%%%%%%%%%%%%%

% Toy data generator. Keep it at the end of the script (R2016b or later)
% or save it in its own file gen_plus.m.
function [X, Y] = gen_plus(n)
% Simple classification problem.    
% 10 features.
% Y = sign(2*X1 - 3*X2 + noise)
    X = randn(10, n);
    Y = sign(2*X(1,:) - 3*X(2,:) + randn(1,n)/2);
    Y(Y==0) = 1;
end
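Aside: o.k is the only option explicitly marked as necessary above. Assuming fs_pglsmi fills every unspecified option with a sensible default (the usual role of myProcessOptions), a minimal call could be as short as this sketch:

% Minimal-call sketch, assuming fs_pglsmi provides defaults (via
% myProcessOptions) for all options other than the required o.k.
o2 = struct('k', 2);        % number of features to select
S2 = fs_pglsmi(X, Y, o2);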

Demo Output

S = 

        timetictoc: 4.2043
           timecpu: 8.8500
                 F: [1 1 0 0 0 0 0 0 0 0]
                ZT: [1x1 struct]
            ZTLogs: {[1x1 struct]}
        redun_rate: 6.0813e-04
    abs_redun_rate: 6.0813e-04


ans =

     1     1     0     0     0     0     0     0     0     0


ans =

    0.5290
    0.7510
         0
         0
         0
         0
         0
         0
         0
         0

This means that only the first two of the 10 features (dimensions) are considered relevant for explaining the output Y, which makes sense since Y in our toy dataset depends only on the first two features.
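As a quick follow-up (illustrative, not part of the package), the selection can be applied to the data in the usual Matlab ways: keep only the selected rows via the logical vector S.F, or rescale every dimension by its learned weight in S.ZT.W (shapes below follow the demo output above).

% Illustrative use of the returned selection on the toy data.
Xsel = X(S.F, :);                  % 2 x 400: keep only the selected features
Xw   = bsxfun(@times, S.ZT.W, X);  % 10 x 400: weight each feature by S.ZT.W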
