# Meta-Learning

## 1. Case
- **Few-Shot Learning** [(Brendan Lake et al.)](https://www.cs.cmu.edu/~rsalakhu/papers/LakeEtAl2015Science.pdf): to learn new concepts from one or a few instances of that concept
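
Few-shot learning is commonly framed as *N-way K-shot* classification: each task (an "episode") contains N classes with K labeled examples each (the support set), plus held-out query examples used to measure how well the task was learned. The following is a minimal sketch of sampling one episode. It assumes the dataset is a plain dict mapping each class label to a list of examples; `sample_episode` and its parameters are hypothetical names used only for illustration.

```python
import random
from typing import Any, Dict, Hashable, List, Tuple

def sample_episode(
    data: Dict[Hashable, List[Any]],
    n_way: int,
    k_shot: int,
    n_query: int,
    rng: random.Random,
) -> Tuple[List[Tuple[Any, int]], List[Tuple[Any, int]]]:
    """Sample one N-way K-shot episode as (support, query) lists of (example, label).

    Labels are re-assigned to 0..n_way-1 inside each episode, so the learner
    cannot rely on a global class index.
    """
    classes = rng.sample(list(data), n_way)
    support, query = [], []
    for episode_label, cls in enumerate(classes):
        # Draw K support + n_query query examples without replacement.
        examples = rng.sample(data[cls], k_shot + n_query)
        support += [(x, episode_label) for x in examples[:k_shot]]
        query += [(x, episode_label) for x in examples[k_shot:]]
    return support, query

# Toy dataset: 10 classes with 20 dummy examples each; a 5-way 1-shot episode.
toy = {c: [f"img_{c}_{i}" for i in range(20)] for c in range(10)}
support, query = sample_episode(toy, n_way=5, k_shot=1, n_query=3, rng=random.Random(0))
print(len(support), len(query))  # 5 15
```

A "one-shot" task in the sense above corresponds to `k_shot=1`.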

## 2. Idea
- **Problem of AI systems**: Current AI systems can master a complex skill from scratch, using an understandably large amount of time and experience. However, if we want our agents to be able to acquire many skills and adapt to many environments, we cannot afford to train each skill in each setting from scratch. 
- **How to use meta-learning to address the above problem**: If the agents can learn how to learn new tasks faster by reusing previous experience, rather than considering each new task in isolation, they may be able to adapt intelligently to a wide variety of new, unseen situations. This approach of **learning to learn, or meta-learning**, is a key stepping stone towards versatile agents that can continually learn a wide variety of tasks throughout their lifetimes.
- Meta-learning systems are trained by being exposed to **a large number of tasks** and are then tested in **their ability to learn new tasks**.
- During meta-learning, the model is trained to **learn tasks** in the meta-training set.
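
Because the model is tested on its ability to learn *new* tasks, the classes used to build test tasks are typically kept disjoint from those seen during meta-training. A minimal sketch of that split, assuming a task is simply the set of N classes it discriminates between; `split_classes`, `sample_task`, and the 80/20 ratio are illustrative assumptions, not a prescribed protocol.

```python
import random
from typing import List, Sequence, Tuple

def split_classes(classes: Sequence[int], train_frac: float, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Partition class labels into disjoint meta-training and meta-test pools."""
    shuffled = list(classes)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

def sample_task(pool: List[int], n_way: int, rng: random.Random) -> List[int]:
    """Hypothetical task representation: the N classes the task must tell apart."""
    return rng.sample(pool, n_way)

meta_train, meta_test = split_classes(range(100), train_frac=0.8)
rng = random.Random(1)
train_tasks = [sample_task(meta_train, 5, rng) for _ in range(1000)]  # many tasks for meta-training
test_task = sample_task(meta_test, 5, rng)                            # an unseen task for evaluation

# No class in the test task was ever seen during meta-training.
assert not set(test_task) & set(meta_train)
```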

## 3. Methods
- During meta-learning, there are two optimizations at play:
    1. the learner, which learns new tasks
    2. the meta-learner, which trains the learner. 
- Methods for meta-learning:
    1. Recurrent model
        * The meta-learner uses gradient descent, whereas the learner simply rolls out the recurrent network.
        * Less (meta-)efficient than other methods, because the learner network has to come up with its learning strategy from scratch
    2. Metric Learning
        * To learn a metric space
        * The meta-learning is performed using gradient descent (or your favorite neural network optimizer), whereas the learner corresponds to a comparison scheme, e.g. nearest neighbors, in the meta-learned metric space.
    3. Learning optimisers
        * To learn an optimiser
        * One network (the meta-learner) learns to update another network (the learner) so that the learner effectively learns the task. 
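
In the metric-learning case, the learner needs no gradient steps at all: it only compares query embeddings with support embeddings. The following is a minimal NumPy sketch of one such comparison scheme, nearest class mean ("prototype", as in prototypical networks) under squared Euclidean distance, which is a variant of the nearest-neighbor comparison mentioned above. The fixed random projection `W` stands in for a meta-learned embedding network and is purely hypothetical.

```python
import numpy as np

def prototypes(support_emb: np.ndarray, support_y: np.ndarray, n_way: int) -> np.ndarray:
    """Mean embedding of each class's support examples, shape (n_way, d)."""
    return np.stack([support_emb[support_y == c].mean(axis=0) for c in range(n_way)])

def nearest_prototype(query_emb: np.ndarray, protos: np.ndarray) -> np.ndarray:
    """Label each query with the class of its nearest prototype (squared Euclidean)."""
    # Broadcast (n_query, 1, d) against (1, n_way, d) -> distances of shape (n_query, n_way).
    d2 = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))  # hypothetical stand-in for a meta-learned embedding network

def embed(x: np.ndarray) -> np.ndarray:
    return x @ W

# Toy 3-way 2-shot episode: raw inputs scattered around three well-separated class centres.
centres = rng.normal(scale=5.0, size=(3, 8))
offsets = rng.normal(size=(3, 8))
support_x = np.concatenate([centres + offsets, centres - offsets])  # 2 shots per class
support_y = np.tile(np.arange(3), 2)                                # labels [0 1 2 0 1 2]
query_x = centres + 0.01 * rng.normal(size=(3, 8))                  # one query per class

protos = prototypes(embed(support_x), support_y, n_way=3)
print(nearest_prototype(embed(query_x), protos))
```

Only the embedding would be trained by the meta-learner; adapting to a new task amounts to recomputing `protos` from its support set.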
        
## 4. Limitations of transfer learning
- Typical steps of transfer learning: when approaching a new vision task, the well-known paradigm is to
    1. collect labeled data for the task, 
    2. acquire a network pre-trained on ImageNet classification, and then 
    3. fine-tune the network on the collected data using gradient descent (the last layer of the pre-trained network must first be replaced to match the new set of labels).
- Limitations:
    1. For a very small dataset, e.g. in the few-shot learning setting, there are only a handful of labeled examples, and the fine-tuned model can easily overfit.
    2. For non-vision domains, such as speech, language, and control, there are no analogous pre-training schemes.
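
Step 3 of the paradigm above can be sketched as follows, under loud simplifying assumptions: the "pre-trained" feature extractor is a hypothetical fixed random projection followed by `tanh` (standing in for an ImageNet network), the backbone is kept frozen so only the freshly added softmax head is trained, and plain full-batch gradient descent is used. With only two labeled examples per class, the head is free to memorize its training set, which is where the overfitting risk in limitation 1 comes from.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a frozen pre-trained backbone.
W_backbone = rng.normal(size=(10, 16))

def features(x: np.ndarray) -> np.ndarray:
    return np.tanh(x @ W_backbone)

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_new_head(x, y, n_classes, lr=0.1, steps=300):
    """Replace the last layer with a fresh linear softmax head and train only it."""
    phi = features(x)                          # backbone stays frozen
    W = np.zeros((phi.shape[1], n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    losses = []
    for _ in range(steps):
        p = softmax(phi @ W + b)
        losses.append(-np.mean(np.log(p[np.arange(len(y)), y])))  # mean cross-entropy
        grad = (p - onehot) / len(y)           # gradient of mean CE w.r.t. the logits
        W -= lr * phi.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b, losses

# Few-shot regime: 3 new classes, only 2 labeled examples each.
x_new = rng.normal(size=(6, 10))
y_new = np.array([0, 0, 1, 1, 2, 2])
W, b, losses = train_new_head(x_new, y_new, n_classes=3)
print(f"training loss {losses[0]:.3f} -> {losses[-1]:.3f}")
```

A low training loss here says nothing about accuracy on new examples of these classes; with so little data, that gap is exactly the problem meta-learning tries to address.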

## 5. MAML [C. Finn et al.](https://arxiv.org/abs/1703.03400)
- Model-Agnostic Meta-Learning (MAML) trains over a wide range of tasks
- It trains for a representation that can be quickly adapted to a new task, via a few gradient steps. 
- **The meta-learner seeks to find an initialization** that is not only useful for adapting to various problems, but also can be adapted quickly (in a small number of steps) and efficiently (using only a few examples). 
- Suppose we are seeking a set of parameters $\theta$ that is highly adaptable. During the course of meta-learning (the bold line), MAML optimizes for a set of parameters such that, when a gradient step is taken with respect to a particular task $i$ (the gray lines), the resulting parameters are close to the optimal parameters $\theta^*_i$ for task $i$. See the visualisation below.

<img src='images/MAML.png' width=500>
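
The two nested optimizations of MAML can be written out in the notation of Finn et al.: for a model $f_\theta$, tasks $\mathcal{T}_i$ drawn from a task distribution $p(\mathcal{T})$, inner step size $\alpha$ and meta step size $\beta$, the learner adapts to each task with a gradient step on that task's few training examples,

$$
\theta'_i = \theta - \alpha \nabla_\theta \mathcal{L}_{\mathcal{T}_i}(f_\theta),
$$

and the meta-learner updates the shared initialization using the loss of the *adapted* parameters, evaluated on new examples from each task:

$$
\theta \leftarrow \theta - \beta \nabla_\theta \sum_{\mathcal{T}_i \sim p(\mathcal{T})} \mathcal{L}_{\mathcal{T}_i}\big(f_{\theta'_i}\big).
$$

Since $\theta'_i$ itself depends on $\theta$, this meta-gradient passes through the inner gradient step and therefore involves second derivatives of $\mathcal{L}_{\mathcal{T}_i}$.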
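
To make the inner and outer loops concrete, here is a minimal, self-contained NumPy sketch of MAML on a deliberately tiny problem. The setup is an assumption chosen for readability, not one of the paper's experiments: each task is 1-D linear regression $y = a x + b$ with its own $(a, b)$, the model is $f_\theta(x) = w x + c$ with $\theta = (w, c)$, there is one inner gradient step, and the loss is half the mean squared error. Because that loss is quadratic, the exact meta-gradient, including the second-order factor $(I - \alpha H)$ where $H$ is the Hessian of the support loss, has a closed form, so no autodiff library is needed. `sample_task`, `loss_grad_hess`, `adapted_query_loss`, and all hyperparameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_task():
    """Hypothetical task family: y = a*x + b with a task-specific slope and intercept."""
    a, b = rng.uniform(-2, 2), rng.uniform(1, 3)
    def draw(n):
        x = rng.uniform(-1, 1, size=n)
        X = np.stack([x, np.ones(n)], axis=1)  # design matrix with columns [x, 1]
        return X, a * x + b
    return draw

def loss_grad_hess(theta, X, y):
    """Half MSE 0.5*mean((X@theta - y)**2), plus its gradient and Hessian in theta."""
    n = len(y)
    r = X @ theta - y
    return 0.5 * np.mean(r ** 2), X.T @ r / n, X.T @ X / n

def maml(alpha=0.1, beta=0.05, meta_steps=500, tasks_per_batch=8, k_shot=5):
    theta = np.zeros(2)
    for _ in range(meta_steps):
        meta_grad = np.zeros(2)
        for _ in range(tasks_per_batch):
            draw = sample_task()
            Xs, ys = draw(k_shot)              # support set: used for the inner step
            Xq, yq = draw(k_shot)              # query set: evaluates the adapted parameters
            _, g_s, H_s = loss_grad_hess(theta, Xs, ys)
            theta_i = theta - alpha * g_s      # inner (learner) update
            _, g_q, _ = loss_grad_hess(theta_i, Xq, yq)
            # Chain rule through the inner step: d theta_i / d theta = I - alpha * H_s.
            meta_grad += (np.eye(2) - alpha * H_s) @ g_q
        theta -= beta * meta_grad / tasks_per_batch  # outer (meta-learner) update
    return theta

def adapted_query_loss(theta, alpha=0.1, n_tasks=500, k_shot=5):
    """Average query loss after one inner step on fresh tasks, starting from theta."""
    total = 0.0
    for _ in range(n_tasks):
        draw = sample_task()
        Xs, ys = draw(k_shot)
        Xq, yq = draw(k_shot)
        _, g_s, _ = loss_grad_hess(theta, Xs, ys)
        total += loss_grad_hess(theta - alpha * g_s, Xq, yq)[0]
    return total / n_tasks

theta_meta = maml()
print("meta-learned initialization:", theta_meta)
print("loss after 1 step from zero init:", adapted_query_loss(np.zeros(2)))
print("loss after 1 step from MAML init:", adapted_query_loss(theta_meta))
```

Replacing `(np.eye(2) - alpha * H_s) @ g_q` with just `g_q` drops the second-order term, which corresponds to the first-order approximation discussed in the MAML paper.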