pyMIToolbox

The pyMIToolbox is inspired by the MIToolbox and FEAST.

Similar to the MIToolbox, it provides functions to calculate the entropy (H), conditional entropy, mutual information (I), and conditional mutual information. These can be used to implement feature selection mechanisms such as JMI or HJMI. For examples, please have a look at the test folder.
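
To make the four quantities concrete, here is a minimal plain-Python sketch of how they relate to each other for already discretised data. The function names (`entropy`, `conditional_entropy`, `mutual_information`, `conditional_mutual_information`) are hypothetical helpers chosen for this illustration; they are not the pyMIToolbox API, and the code only relies on the standard identities between joint entropies.

```python
from collections import Counter
from math import log2


def entropy(*variables):
    """Joint Shannon entropy (in bits) of one or more discrete sequences."""
    samples = list(zip(*variables))
    n = len(samples)
    return -sum(c / n * log2(c / n) for c in Counter(samples).values())


def conditional_entropy(x, y):
    """H(X | Y) = H(X, Y) - H(Y)."""
    return entropy(x, y) - entropy(y)


def mutual_information(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def conditional_mutual_information(x, y, z):
    """I(X; Y | Z) = H(X, Z) + H(Y, Z) - H(X, Y, Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


x = [0, 0, 1, 1]
y = [0, 0, 1, 1]  # identical to x
z = [0, 1, 0, 1]  # independent of x in this sample

print(entropy(x))                # 1 bit
print(mutual_information(x, y))  # 1 bit: y reveals x completely
print(mutual_information(x, z))  # 0 bits: z reveals nothing about x
```

All values are estimated from relative frequencies, so continuous data has to be discretised first.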

Examples

HJMI feature selection

The Historical JMI (HJMI) feature selection mechanism is an extension of the JMI feature selection mechanism. Both select the same features if the number of features is fixed in advance. However, HJMI additionally allows specifying a stopping criterion based on the improvement in the overall information of the selected features. Details can be found in the following paper:

Gocht, A.; Lehmann, C. & Schöne, R.
A New Approach for Automated Feature Selection
2018 IEEE International Conference on Big Data (Big Data), 2018 , 4915-4920
DOI: 10.1109/BigData.2018.8622548 or OpenAccess

For an implementation of HJMI, please have a look at test/hjmi.py. Please note that the algorithm is slightly modified to avoid a division by zero.

JMI feature selection

The Joint Mutual Information (JMI) feature selection mechanism is based on the work of Brown et al. and Yang et al.:

Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection
G. Brown, A. Pocock, M.-J. Zhao, M. Lujan
Journal of Machine Learning Research, 13:27-66 (2012)
ACM ID: 2188387 or OpenAccess

Yang, H. H. & Moody, J.
Data visualization and feature selection: New algorithms for nongaussian data
Advances in Neural Information Processing Systems, 2000 , 687-693

For an implementation of JMI, please have a look at test/jmi.py.
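
As a rough illustration of the idea, the following plain-Python sketch implements a greedy JMI selection on discretised data. It assumes the criterion as formulated by Brown et al.: a candidate X_k is scored by the sum of I(X_k, X_j; Y) over all already selected features X_j, and the first feature is the one with the highest I(X_k; Y). The names `entropy`, `joint_mi` and `jmi_select` are hypothetical helpers for this example and are not taken from the repository's own implementation.

```python
from collections import Counter
from math import log2


def entropy(*variables):
    """Joint Shannon entropy (in bits) of one or more discrete sequences."""
    samples = list(zip(*variables))
    n = len(samples)
    return -sum(c / n * log2(c / n) for c in Counter(samples).values())


def mi(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def joint_mi(x_a, x_b, y):
    """I(X_a, X_b; Y) = H(X_a, X_b) + H(Y) - H(X_a, X_b, Y)."""
    return entropy(x_a, x_b) + entropy(y) - entropy(x_a, x_b, y)


def jmi_select(columns, y, n_select):
    """Greedily pick n_select column indices using the JMI score."""
    remaining = list(range(len(columns)))
    # First feature: highest mutual information with the target.
    selected = [max(remaining, key=lambda k: mi(columns[k], y))]
    remaining.remove(selected[0])
    while remaining and len(selected) < n_select:
        # JMI score: sum of I(X_k, X_j; Y) over already selected X_j.
        def score(k):
            return sum(joint_mi(columns[k], columns[j], y) for j in selected)
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


y = [0, 1, 2, 3, 0, 1, 2, 3]
columns = [
    [0, 1, 2, 2, 0, 1, 2, 2],  # most informative single feature
    [0, 0, 1, 1, 0, 0, 1, 1],  # redundant: a function of column 0
    [0, 1, 0, 1, 0, 1, 0, 1],  # complements column 0
    [0, 0, 0, 0, 1, 1, 1, 1],  # independent of y in this sample
]

print(jmi_select(columns, y, 2))  # [0, 2]
```

In this toy data set the second pick is column 2 rather than the redundant column 1, because only column 2 adds information about y that column 0 does not already carry.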

Comparing to MATLAB(R)

Please be aware that Matlab uses a different discretisation scheme than Python.

Discretising an array in Python looks like the following:

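import numpy as np

# X is a 2-D array: rows are samples, columns are features
[tmp, features] = X.shape
D = np.zeros([tmp, features])

for i in range(features):
    # E holds the edges of 10 equal-width bins for feature i
    N, E = np.histogram(X[:,i], bins=10)
    D[:,i] = np.digitize(X[:,i], E, right=False)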

In Matlab, the equivalent code looks like:

[tmp, features] = size(X);

D=zeros([tmp, features]);
for i = 1:features
    [N,E] = histcounts(X(:,i),10,'BinLimits',[min(X(:,i)),max(X(:,i))]);
    D(:,i) = discretize(X(:,i),E,'IncludedEdge', 'left');
end

Moreover, Matlab and Python index arrays differently: Python starts at 0, as in C, while Matlab starts at 1.
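
One visible consequence of the two schemes can be checked with a small NumPy experiment. With `right=False`, `np.digitize` treats every bin as half-open, so the maximum of a column falls past the last edge and receives label 11 when 10 bins are used. Matlab's `discretize` with `'IncludedEdge','left'` is documented to include both edges in the last bin, so the maximum stays in bin 10 there. The clipping step at the end is only one possible way to align the two results and is an assumption of this sketch, not something the toolbox is known to do.

```python
import numpy as np

x = np.array([0.0, 2.5, 5.5, 7.5, 10.0])

# Ten equal-width bins between min(x) and max(x) give eleven edges.
counts, edges = np.histogram(x, bins=10)

# right=False assigns label i when edges[i-1] <= value < edges[i].
labels = np.digitize(x, edges, right=False)
print(labels)  # labels are 1, 3, 6, 8, 11: the maximum gets label 11

# Assumption: fold the maximum back into the last bin to mimic Matlab.
clipped = np.minimum(labels, len(edges) - 1)
print(clipped)  # labels are 1, 3, 6, 8, 10

# Shift to 0-based labels if they are later used as array indices.
print(clipped - 1)
```

Since entropy and mutual information depend only on how samples are grouped, not on the label values themselves, the extra bin changes results only when it splits samples differently from Matlab.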
