# A Framework for Active Learning in Applications Powered by Deep Learning

Andrej Karpathy, 2022. Software 2.0. URL: https://medium.com/@karpathy/software-2-0-a64152b37c35

## Abstract

Keywords: Infrastructure Engineering, Machine Learning, Artificial Intelligence

## Introduction

Foundational models will progressively be incorporated into virtually all applications, regardless of their target domain. This will lead to a new generation of applications that learn from their users and improve over time. This paper proposes a framework for active learning in applications powered by these models.

A common architecture for these applications is composed of five layers`[1]`:

1. General AI models: these are the foundational models that are trained on large, public datasets.
2. Specific AI models: these are trained on narrow data to outperform general models in specific use cases.
3. Hyperlocal AI models: these are trained on local, proprietary data.
4. Generative OS or API layer: this layer helps the application to access all the AI models required to solve a problem and is responsible for orchestrating the data flow between them.
5. Applications layer: this is the user-facing layer and should have powerful network effects and embedding characteristics.

Layers one and two are easily commoditized, as they are trained on public data. Layer three is the most defensible, as it is trained on proprietary data. Layer four is the most valuable, as it is the layer that orchestrates the data flow between the models. Layer five has characteristics that are similar to those of a social network, as it has powerful network effects and embedding characteristics.
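The orchestration role of layer four can be illustrated with a small sketch. It assumes one possible routing policy that the stack above does not prescribe: try the most specific model first and fall back to more general ones. The `orchestrate` function and the three stand-in models are hypothetical names used only for illustration.

```python
from typing import Callable, Optional, Sequence

# A model is any callable that returns an answer, or None when it declines.
Model = Callable[[str], Optional[str]]


def orchestrate(query: str, models: Sequence[Model]) -> Optional[str]:
    """Try the models in order and return the first non-None answer."""
    for model in models:
        answer = model(query)
        if answer is not None:
            return answer
    return None


# Hypothetical stand-ins for layers three, two, and one.
def hyperlocal_model(query: str) -> Optional[str]:
    return "hyperlocal answer" if "our catalog" in query else None


def specific_model(query: str) -> Optional[str]:
    return "specific answer" if "movie" in query else None


def general_model(query: str) -> Optional[str]:
    return "general answer"  # the general model always answers


# Assumed policy: most specific model first, general model as the fallback.
stack = [hyperlocal_model, specific_model, general_model]
print(orchestrate("rate this movie", stack))
```

Ordering the stack from hyperlocal to general is only one design choice; a real orchestration layer might instead combine several models or route by cost.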

Active learning has historically been used in the context of supervised learning, where labeled data is expensive and the model helps select which examples should be labeled next. In this paper, we propose a framework for active learning in applications powered by deep learning as a way to increase product development speed: the feature is launched first, and the model then learns over time from the data its users generate. This framework is composed of four layers:

1. Prompt engineering and pre-processing data with custom embeddings: in which a general model is used to generate completion candidates, which are then used to generate custom embeddings after user validation.
2. Fine-tuning a foundational model: in which a foundational model is continuously fine-tuned on the data collected from users, in order to improve its performance over time.
3. Post-processing results to enforce ODD: large foundational models hallucinate and generate results that are out of distribution. In order to measure correctness, it is necessary to enforce an ODD (operational design domain), which is the set of conditions under which the model is expected to perform well. This can be supported by uncertainty estimation, which identifies the data points the model is most uncertain about; adding those points to the training data helps the model learn more about the data, which in turn helps it generalize `[4, 5]`.
4. Infrastructure to support end to end validation: in which the application is instrumented to collect and validate data that will be fed into the model as well as orchestrate the data flow among different models.
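
The uncertainty estimation mentioned in layer three can be sketched as follows. This is a minimal illustration using entropy-based uncertainty sampling, a simpler stand-in for the Bayesian and loss-prediction methods of `[4, 5]`, not the method of this framework. The function names and the example probabilities are hypothetical.

```python
import math
from typing import List, Sequence


def predictive_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_uncertain(prob_rows: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return the indices of the k highest-entropy rows, most uncertain first."""
    ranked = sorted(
        range(len(prob_rows)),
        key=lambda i: predictive_entropy(prob_rows[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical model outputs for four unlabeled inputs: [P(negative), P(positive)].
pool_probs = [
    [0.98, 0.02],
    [0.55, 0.45],
    [0.10, 0.90],
    [0.50, 0.50],
]

# Indices of the two inputs that would be sent for user validation first.
to_label = select_most_uncertain(pool_probs, k=2)
print(to_label)
```

In this sketch the inputs whose predictions are closest to uniform are prioritized for validation, and the validated examples would then feed the fine-tuning step of layer two.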

## Related Work

Given a fixed budget for application development, some ingenuity is required in order to acquire annotated data to train a model to perform well on a specific task while retaining maximum generalization performance. This is known as the data annotation bottleneck`[2]` and is related to the correctness spectrum problem`[3]`.

The correctness problem has been studied in the context of self-driving cars, where the goal is to build a system that is correct enough to be safe. In that context, correctness is defined in terms of the probability that the system fails in a given scenario. This is related to the concepts of known unknowns and unknown unknowns: the scenarios that the system is expected to encounter in the real world, and the scenarios that it is not expected to encounter, respectively. If all "knowns" are defined and guardrails are built to guard against any "unknown unknowns", then it is possible to reach a high bar of correctness for specific use cases.

This is related to the concept of operational design domain (ODD), which is the set of scenarios that the system is expected to encounter in the real world. By bounding the application to an ODD, it becomes much easier to test and validate the correctness of a pipeline.

## Datasets

For demonstration purposes, we will use the IMDb dataset, which consists of 50,000 movie reviews labeled as positive or negative `[6]`. It will help to demonstrate how an ODD can be implemented in a real-world application using the proposed framework.
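
One way such an ODD could look for this dataset is sketched below. It assumes the ODD is simply "the output is one of the two IMDb labels and the model is sufficiently confident". The names `enforce_odd` and `OddDecision` and the 0.8 threshold are illustrative assumptions, not part of the framework.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed ODD for binary sentiment: the only labels the pipeline accepts.
ALLOWED_LABELS = frozenset({"positive", "negative"})


@dataclass(frozen=True)
class OddDecision:
    accepted: bool
    label: Optional[str]
    reason: str


def enforce_odd(raw_output: str, confidence: float,
                min_confidence: float = 0.8) -> OddDecision:
    """Accept a model output only if it falls inside the assumed ODD."""
    label = raw_output.strip().lower()  # normalize casing and whitespace
    if label not in ALLOWED_LABELS:
        return OddDecision(False, None, "label outside ODD")
    if confidence < min_confidence:
        return OddDecision(False, None, "confidence below threshold")
    return OddDecision(True, label, "ok")


# Hypothetical raw outputs from a general model, with confidence scores.
print(enforce_odd(" Positive ", 0.93))
print(enforce_odd("mixed feelings", 0.99))
print(enforce_odd("negative", 0.50))
```

Rejected outputs are natural candidates for user validation, since they mark the inputs on which the model falls outside the domain it is expected to handle.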

## Prompt engineering and pre-processing data with custom embeddings

## Fine-tuning a foundational model

## Post-processing results to enforce ODD

## Infrastructure to support end to end validation

## Sources

[1] James Currier, 2022. Generative AI Market Map and 5-Layer Tech Stack. URL: https://www.nfx.com/post/generative-ai-tech-5-layers#The-5--Layer-Generative-Tech-Stack
[2]
[3] Morgan Beller, 2023. The AI Startup Litmus Test. URL: https://www.nfx.com/post/ai-startup-litmus-test
[4] Yarin Gal, Riashat Islam, and Zoubin Ghahramani, 2017. Deep Bayesian Active Learning with Image Data. In International Conference on Machine Learning, pages 1183–1192. PMLR.
[5] Donggeun Yoo and In So Kweon, 2019. Learning Loss for Active Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 93–102.
[6] Andrew L. Maas et al., 2011. Learning Word Vectors for Sentiment Analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 142–150. URL: https://www.aclweb.org/anthology/P11-1015.pdf