Add modAL#1008
Closed
cosmic-cortex wants to merge 1 commit intovinta:masterfrom
cosmic-cortex:add-modAL
Closed
Conversation
23abd09 to
40cd98b
Compare
|
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

Modular Active Learning framework for Python3Overview
Introduction
modAL is an active learning framework for Python3, designed with modularity, flexibility and extensibility in mind. Built on top of scikit-learn, it allows you to rapidly create active learning workflows with nearly complete freedom. What is more, you can easily replace parts with your custom built solutions, allowing you to design novel algorithms with ease. modAL is aimed for practicioners and researchers as well, to make the assembly and testing of active learning pipelines quick and easy.
Active learning from bird's-eye view
With the recent explosion of available data, you have can have millions of unlabelled examples with a high cost to obtain labels. For instance, when trying to predict the sentiment of tweets, obtaining a training set can require immense manual labour. But worry not, active learning comes to the rescue! In general, AL is a framework allowing you to increase classification performance by intelligently querying you to label the most informative instances. In general, an active learning workflow looks like the following.
The key components of any workflow are the model you choose, the uncertainty measure you use and the query strategy you apply to request labels. With modAL, instead of choosing from a small set of built-in components, you have the freedom to seamlessly integrate scikit-learn or Keras models into your algorithm and easily tailor your custom query strategies and uncertainty measures.
modAL in action
Let's see what modAL can do for you!
From zero to one in a few lines of code
Active learning with a scikit-learn classifier, for instance RandomForestClassifier, can be as simple as the following.
Replacing parts quickly
If you would like to use different uncertainty measures and query strategies than the default uncertainty sampling, you can either replace them with several built-in strategies or you can design your own by following a few very simple design principles. For instance, replacing the default uncertainty measure to classification entropy looks the following.
Replacing parts with your own solutions
modAL was designed to make it easy for you to implement your own query strategy. For example, implementing and using a simple random sampling strategy is as easy as the following.
An example with active regression
To see modAL in real action, let's consider an active regression problem with Gaussian Processes! In this example, we shall try to learn the noisy sine function:
For active learning, we shall define a custom query strategy tailored to Gaussian processes. In a nutshell, a query stategy in modAL is a function taking (at least) two arguments (an estimator object and a pool of examples), outputting the index of the queried instance and the instance itself. In our case, the arguments are
regressorandX.After setting up the query strategy and the data, the active learner can be initialized.
The initial regressor is not very accurate.
The blue band enveloping the regressor represents the standard deviation of the Gaussian process at the given point. Now we are ready to do active learning!
After a few queries, we can see that the prediction is much improved.
Additional examples
Including this, many examples are available:
Installation
modAL requires
You can install modAL directly with pip:
Alternatively, you can install modAL directly from source:
For running the examples, Matplotlib >= 2.0 is recommended.
About the developer
modAL is developed by me, Tivadar Danka (aka cosmic-cortex in GitHub). I have a PhD in pure mathematics, but I fell in love with biology and machine learning right after I finished my PhD. I have changed fields and now I work in the Bioimage Analysis and Machine Learning Group of Peter Horvath, where I am working to develop active learning strategies for intelligent sample analysis in biology. During my work I realized that in Python, creating and prototyping active learning workflows can be made really easy and fast with scikit-learn, so I ended up developing a general framework for this. The result is modAL :) If you have any questions, requests or suggestions, you can contact me at 85a5187a@opayq.com! I hope you'll find modAL useful!
Anyone who agrees with this pull request could vote for it by adding a 👍 to it, and usually, the maintainer will merge it when votes reach 20.