<img src="Images/brain.png" alt="Atom" style="width:60px" align="left" vertical-align="middle">

## 1. Introduction to Building AI Applications with Foundation Models
*AI Engineering*

----
The scaling up of AI models has two major consequences:
1. AI models are becoming more powerful and capable of more tasks, enabling more applications.
2. *Model as a service* has emerged: models developed by a few organizations are made available for others to use as a service.

AI engineering involves building applications using existing models. Before LLMs, AI already powered tools like product recommendations, fraud detection, and churn prediction (using data and machine learning to identify customers likely to cancel subscriptions or stop using a service). Today’s large-scale models expand these possibilities, and the AI engineer’s role has evolved beyond that of a traditional ML engineer.

<br/>
<img src="Images/brain.png" alt="Atom" style="width:60px" align="left" vertical-align="middle">

## 2. The Rise of AI Engineering
*AI Engineering*

----
#### A. History
Foundation models evolved from large language models, which began as basic language models in the 1950s. Though tools like ChatGPT and GitHub Copilot may seem sudden, they are the result of decades of technological progress, made scalable through *self-supervision.*

A language model encodes statistical information about a language, predicting the likelihood of a word based on context. For example, in the phrase "My favorite color is __," it would predict "blue" more often than "car."
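
To make the "statistical information" idea concrete, here is a minimal sketch of a bigram model that counts which word follows which. The corpus is made up for illustration, and the function names are hypothetical; real language models are far more sophisticated than raw counts.

```python
from collections import Counter, defaultdict

def train_bigram_model(sentences):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def next_word_probabilities(counts, prev_word):
    """Turn raw follow-up counts into a probability distribution."""
    followers = counts[prev_word.lower()]
    total = sum(followers.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in followers.items()}

# Tiny made-up corpus, for illustration only.
corpus = [
    "my favorite color is blue",
    "my favorite color is green",
    "the sky is blue",
    "my favorite toy is a car",
]
model = train_bigram_model(corpus)
probs = next_word_probabilities(model, "is")
print(probs)  # "blue" gets the highest probability; "car" never follows "is"
```

In this toy corpus, "blue" follows "is" more often than any other word, so the model ranks it first.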

The statistical nature of language dates back centuries. In the 1905 story *The Adventure of the Dancing Men*, Sherlock Holmes used basic statistics of English to decode sequences of stick figures, deducing that the most common figure represented the letter "E." Later, Claude Shannon advanced this idea during WWII to crack enemy codes, and he published his influential paper on modeling English in 1951. Many concepts from this work, like entropy, are still used today. Originally, language models focused on one language, but now they can handle multiple languages.

#### B. Tokens
A language model's basic unit is a token, which can be a character, word, or part of a word (like "-tion"). For instance, GPT-4 splits "I can’t wait to build AI applications" into nine tokens, with "can’t" split into "can" and "’t." This process is called tokenization. On average, a token is about ¾ the length of a word, so 100 tokens equal roughly 75 words. The model's vocabulary is the set of all tokens it can work with. For example, Mixtral 8x7B has a vocabulary size of 32,000, while GPT-4's is 100,256. The tokenization method and vocabulary size are chosen by model developers.

Language models use tokens instead of words or characters for three main reasons:

1. Tokens break words into meaningful components, like splitting "cooking" into "cook" and "ing."
2. There are fewer unique tokens than unique words, which reduces the model's vocabulary size and improves efficiency.
3. Tokens help the model process unknown words, such as splitting "chatgpting" into "chatgpt" and "ing" to infer its structure.

Tokens offer a balance of fewer units than words while preserving more meaning than characters.
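
The splitting behavior described above can be sketched with a toy greedy longest-match tokenizer. This is an illustration only: the vocabulary is hypothetical, and production tokenizers learn their vocabularies from data rather than using a hand-written set.

```python
def tokenize(word, vocabulary):
    """Greedy longest-match split of a word into known subword tokens."""
    tokens = []
    start = 0
    while start < len(word):
        # Try the longest remaining piece first, then shrink it.
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece in vocabulary:
                tokens.append(piece)
                start = end
                break
        else:
            # No known piece starts here: fall back to a single character.
            tokens.append(word[start])
            start += 1
    return tokens

# Hypothetical vocabulary, for illustration only.
vocabulary = {"cook", "ing", "chatgpt", "walk", "ed"}
print(tokenize("cooking", vocabulary))     # ['cook', 'ing']
print(tokenize("chatgpting", vocabulary))  # ['chatgpt', 'ing']
```

Even though "chatgpting" is not a word the vocabulary has seen, it still breaks into pieces the model knows.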

#### C. Types of language models
There are two main types of language models: masked and autoregressive.

* **Masked Language Models** predict missing tokens using context from both before and after the gap. For example, in “My favorite \_\_ is blue,” it might predict “color.” BERT is a well-known example. These models are used for tasks like sentiment analysis and code debugging, where context from both sides is important.

* **Autoregressive Language Models** predict the next token based only on preceding tokens. They generate text one token at a time, making them ideal for text generation. These models, like GPT, are more popular for generating content.

<img src="Images/language_models.png" alt="Language Models">

Language models are generative tools (hence the term *generative AI*) that complete text based on prompts, producing open-ended outputs through probabilistic predictions, which can be both exciting and frustrating.
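
The probabilistic nature of generation can be illustrated with a small sampling sketch. The probabilities below are invented for the example; a real model would compute them from the preceding tokens.

```python
import random

def sample_next_token(distribution, rng):
    """Pick one token at random, weighted by its probability."""
    tokens = list(distribution)
    weights = [distribution[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Hypothetical probabilities for the token after "My favorite color is".
distribution = {"blue": 0.6, "green": 0.3, "red": 0.1}

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [sample_next_token(distribution, rng) for _ in range(1000)]
counts = {token: samples.count(token) for token in distribution}
print(counts)
```

Because each token is sampled rather than always picking the most likely one, the same prompt can produce different outputs across runs.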

#### D. Completion
Completion is a powerful tool that can be applied to tasks like translation, summarization, coding, and math problem-solving. For example, a prompt like “How are you in French is …” might be completed with “Comment ça va,” translating the phrase. Similarly, a prompt asking if an email is likely spam can be completed with “Likely spam,” turning the model into a spam classifier. However, completion isn't the same as conversation—if asked a question, the model might simply ask another question instead of answering.

#### E. Self-supervision
Different types of learning:
1. Supervised
2. Self-supervised: labels are inferred from the input data
3. Unsupervised

Language modeling is just one of many ML algorithms. There are others for object detection, topic modeling, recommender systems, weather forecasting, stock price prediction, etc.

Language models stand out in machine learning because they can be trained with self-supervision, unlike many models that require expensive labeled data (supervision). Self-supervision helps bypass the data labeling bottleneck, enabling models to scale. While supervised models, like fraud detection, are trained on labeled examples (e.g., "fraud" or "not fraud"), language models learn from vast amounts of unlabeled text. The AI successes of the 2010s relied on supervised learning. AlexNet, for example, learned from over a million labeled images to classify images into categories like "car" or "monkey."

A major drawback of supervised learning is the high cost and time required for data labeling, especially for large datasets. Labeling millions of images for ImageNet could cost millions, and some tasks, like medical diagnoses, would be prohibitively expensive. Self-supervision solves this by allowing models to infer labels from input data, eliminating the need for explicit labeling. In language modeling, for example, each sentence provides training samples by predicting tokens based on context, using markers like `<BOS>` and `<EOS>` to indicate the beginning and end of sequences. This enables more scalable and efficient learning.
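
As a minimal sketch of how one sentence yields several training samples, the function below pairs each context with the token that follows it. Splitting on whitespace is a simplifying assumption; a real pipeline would use the model's tokenizer.

```python
BOS, EOS = "<BOS>", "<EOS>"

def make_training_samples(sentence):
    """Return (context, next_token) pairs derived from one sentence."""
    tokens = [BOS] + sentence.split() + [EOS]
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

samples = make_training_samples("I love street food")
for context, target in samples:
    print(context, "->", target)
```

A four-word sentence produces five samples, and no human had to label any of them: the "label" is simply the next token.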

<img src="Images/self_supervision.png" alt="Self-supervision">

Self-supervised learning allows language models to learn from text sequences without labeled data, leveraging the abundance of text in books, articles, and online content. This enables the scaling of language models into large language models (LLMs). The size of a model is measured by its parameters, which are variables updated during training. Larger models typically have more capacity to learn. For example, OpenAI's first GPT model in 2018 had 117 million parameters, but by 2019, GPT-2 with 1.5 billion parameters made it seem small. Today, models with 100 billion parameters are considered large.

> A *parameter* is a variable within an ML model that is updated through the training process. The term usually refers to the model's weights and biases.
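
To show what "counting parameters" means in practice, here is a small sketch for a fully connected network. The layer sizes are hypothetical, and real foundation models use other layer types as well, so this is only an illustration of the arithmetic.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per input-output connection
        total += n_out         # one bias per output unit
    return total

# Hypothetical network: 784 inputs, two hidden layers, 10 outputs.
print(count_parameters([784, 256, 128, 10]))  # 235146
```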

Larger models need more data to fully realize their potential, as their greater capacity requires more training data to maximize performance. Training a large model on a small dataset is inefficient, as smaller models could achieve similar results.

#### F. From Large Language Models to Foundation Models
Language models excel at tasks involving text but are limited in their ability to perceive the world like humans, who process data through vision, hearing, and touch. To address this, models like GPT-4V and Claude 3 are being extended to understand not just text but also images, videos, 3D assets, and other data modalities.

**Gemini and GPT-4V are better described as foundation models rather than LLMs, as they serve as a base for various AI applications and can be adapted for different needs.** These models represent a shift from traditional AI research, which was previously divided by data modalities: text (NLP), vision (computer vision), and audio (speech recognition and synthesis). Models that can process more than one data modality are called multimodal models, and generative multimodal models are also known as large multimodal models (LMMs). Whereas a text-only model generates output conditioned on text alone, an LMM generates output conditioned on both text and other modalities, like images.

Multimodal models, like language models, also require data to scale, and self-supervision works for them too. OpenAI's CLIP model used a variant called natural language supervision, training on 400 million (image, text) pairs found online instead of manually labeled data. This dataset was far larger than the manually labeled ImageNet and avoided manual labeling costs, and it allowed CLIP to generalize across multiple image classification tasks without additional training.

CLIP is an embedding model, not generative, designed to create joint embeddings of text and images to capture their meanings. Multimodal embedding models like CLIP serve as the foundation for generative multimodal models such as Flamingo, LLaVA, and Gemini (formerly Bard).
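
The idea of a joint embedding space can be sketched as follows: if text and images are mapped to vectors in the same space, an image can be classified by finding the closest label text. The three-dimensional vectors below are made up for illustration; a real embedding model would produce much longer vectors, and this sketch does not call CLIP itself.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def classify_image(image_embedding, label_embeddings):
    """Return the label whose text embedding is closest to the image."""
    return max(
        label_embeddings,
        key=lambda label: cosine_similarity(image_embedding, label_embeddings[label]),
    )

# Made-up vectors standing in for real text and image embeddings.
label_embeddings = {
    "a photo of a dog": [0.9, 0.1, 0.0],
    "a photo of a cat": [0.1, 0.9, 0.0],
    "a photo of a car": [0.0, 0.1, 0.9],
}
image_embedding = [0.8, 0.3, 0.1]

best = classify_image(image_embedding, label_embeddings)
print(best)  # a photo of a dog
```

No classifier is trained for these specific labels, which is why this style of classification needs no additional training per task.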

Foundation models, thanks to their scale and the way they are trained, are capable of a wide range of tasks. You can often tweak a general-purpose model to maximize its performance on a specific task. They also mark the transition from task-specific models to general-purpose models.

<img src="Images/super_naturalinstructions_benchmark.png" alt="Super-NaturalInstructions benchmark">

To get a model to generate desired outputs, you can use several common techniques:

* **Prompt Engineering**: Craft detailed instructions with examples to guide the model's responses.
* **Retrieval-Augmented Generation (RAG)**: Connect the model to a database (e.g., customer reviews) to improve its output.
* **Finetuning**: Further train the model on a specialized dataset (e.g., high-quality product descriptions).
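
As a rough sketch of the first technique, a prompt with instructions and examples can be assembled as plain text. The `Input:`/`Output:` layout and the function name are assumptions made for illustration; the right format depends on the model being used.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, worked examples, and a new input."""
    lines = [instruction, ""]
    for text, label in examples:
        lines += [f"Input: {text}", f"Output: {label}", ""]
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)

# Made-up examples, for illustration only.
examples = [
    ("The battery lasts all day.", "positive"),
    ("It broke after a week.", "negative"),
]
prompt = build_few_shot_prompt(
    "Classify the sentiment of each product review.",
    examples,
    "Setup was quick and painless.",
)
print(prompt)
```

The prompt ends with an unfinished `Output:` line, so a completion model's natural continuation is the label for the new input.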

Adapting an existing model is usually easier and faster than building one from scratch, saving time and resources (e.g., 10 examples and one weekend vs. 1 million examples and six months). Foundation models lower development costs and reduce time to market. The data needed to adapt a model depends on the technique used. While task-specific models can offer benefits like being smaller and more cost-effective, deciding whether to build or leverage an existing model is a key decision teams must make.

#### G. From Foundation Models to AI Engineering
AI engineering focuses on building applications using existing foundation models, whereas traditional ML engineering (or MLOps) involves developing new ML models. The shift to AI engineering reflects the trend of leveraging pre-built models rather than creating them from scratch.

The availability and accessibility of powerful foundation models lead to the rapid growth of AI engineering through:
1. Increasing general-purpose AI capabilities: models are powerful at existing tasks and can do more tasks.
2. Increased AI investments: The success of ChatGPT prompted a sharp increase in investments in AI, both from VCs and enterprises.
3. Low entrance barrier to building AI applications: models are exposed via APIs that receive user queries and return outputs. Without these APIs, using an AI model requires the infrastructure to host and serve it. AI also makes it possible to build applications with minimal coding.

Because of the resources it takes to develop foundation models, this process is possible only for big corporations (Google, Meta, Microsoft, Baidu, Tencent), governments (Japan, the UAE), and ambitious, well-funded startups (OpenAI, Anthropic, Mistral).

Fun fact: as of September 16, 2024, the website theresanaiforthat.com lists 16,814 AIs for 14,688 tasks and 4,803 jobs! 

#### H. Applications
1. **Coding:** AI boosts productivity in software engineering, especially for tasks like documentation, code generation, and refactoring—improving output by up to 2x in some areas. It's particularly effective in frontend development but less so in backend work.
2. **Image and Video Production:** AI-generated profile pictures are now widely used on social media, with many people believing they enhance job prospects. While once banned for safety reasons (e.g., by Facebook in 2019), AI headshots are now mainstream, with platforms offering built-in tools by 2023.
3. **Marketing:** AI is being rapidly adopted in enterprise advertising and marketing. It can create promotional images and videos, assist with brainstorming and drafting content, and generate multiple ad variations for testing. AI can also tailor ads to different seasons and locations.
4. **Writing:** AI has long supported writing through tools like autocorrect and autocomplete, and its role has expanded significantly with large language models like ChatGPT, which excel at text generation. An MIT study found ChatGPT improved writing speed by 40% and quality by 18%, especially benefiting less confident writers. Consumers use AI to enhance emails, write essays, and generate interactive books, while tools like Grammarly and Google Docs offer AI-driven writing assistance. Businesses rely on AI for marketing, sales, reports, and SEO-optimized content, but this has also led to misuse—such as low-quality, AI-generated content farms flooding the web.
5. **Education:** As ChatGPT becomes integral to students’ academic work, its outages spark widespread complaints, highlighting its growing role in education. AI can personalize education by adapting materials to individual learning styles, generating quizzes, roleplaying scenarios, and even serving as debate partners or tutors. Though AI disrupts traditional education companies like Chegg, it also offers a powerful opportunity to accelerate and democratize skill acquisition.
6. **Conversational bots:** Serve roles from information retrieval and idea generation to companionship and therapy. They can emulate personalities, power digital relationships, and even simulate societies for research. In business, AI bots streamline customer service and assist with complex tasks like taxes or insurance claims. Intelligent NPCs can enhance gameplay and storytelling, transforming both existing games and enabling entirely new experiences.
7. **Information Aggregation:** AI can help by aggregating and summarizing information efficiently. According to Salesforce’s 2023 research, 74% of generative AI users rely on it to distill complex ideas. Applications now let users interact with documents conversationally, a use case known as "talk-to-your-docs." AI can also summarize websites and research, and generate reports. In businesses, AI helps streamline operations by reducing managerial workload, as seen with Instacart’s “Fast Breakdown” prompt, which summarizes communications into actionable tasks.
8. **Data Organization:** AI can generate text descriptions for images and videos, match visuals to text queries, and even create new images when none exist, as seen in services like Google Photos and Image Search. It excels at data analysis, enabling tasks like data visualization, outlier detection, and revenue forecasting. Enterprises use AI to extract structured data from unstructured sources—ranging from receipts and IDs to contracts and reports—improving organization and searchability. This capability, known as intelligent data processing (IDP), is projected to grow rapidly, reaching $12.81 billion by 2030.
9. **Workflow Automation:** AI should automate as much as possible to boost productivity. For individuals, it can handle routine tasks like booking restaurants, planning trips, or filling out forms. In enterprises, AI can streamline processes such as lead management, invoicing, and customer service. A key advancement is data synthesis, where AI helps generate and label data to improve its own models. Many tasks require access to external tools, which is where AI agents come in—autonomous systems that can plan and act on users’ behalf.

<br/>
<img src="Images/brain.png" alt="Atom" style="width:60px" align="left" vertical-align="middle">

## 3. Planning AI Applications
*AI Engineering*

----
It’s easy to build a cool demo with foundation models. It’s hard to create a profitable product.

#### A. Use Case Evaluation
**Reasons to Build an AI Application (from highest to lowest risk):**
1. **Existential Threat**: Competitors using AI could make your business obsolete; critical for industries like finance, insurance, and creative work.
2. **Missed Opportunities**: AI can boost profits and productivity across operations—marketing, customer support, sales, and more.
3. **Staying Competitive**: You’re unsure where AI fits yet, but want to stay ahead; investing in exploration may prevent falling behind.

**Additional Consideration:**
* If AI is critical to your business, consider building it in-house rather than outsourcing.

#### B. Roles of AI in Applications
1. **Critical vs. Complementary**
   * *Critical*: App depends on AI to function (e.g., Face ID).
   * *Complementary*: App can function without AI, but AI enhances it (e.g., Gmail’s Smart Compose).
   * Higher AI reliability is needed when AI is critical.
2. **Reactive vs. Proactive**
   * *Reactive*: Responds to user actions (e.g., chatbots); speed often matters.
   * *Proactive*: Acts independently when relevant (e.g., Google Maps traffic alerts); must be high quality to avoid being intrusive.
3. **Dynamic vs. Static**
   * *Dynamic*: Continuously updates with user feedback; might mean that each user has their own model, continually finetuned on their data, or other mechanisms for personalization such as ChatGPT’s memory feature.
   * *Static*: Updates periodically or for groups of users; might have one model for a group of users. If that’s the case, these features are updated only when the shared model is updated.

#### C. Role of Humans in Applications
Microsoft (2023) proposed a framework for gradually increasing AI automation in products that they call Crawl-Walk-Run:
* Crawl means human involvement is mandatory.
* Walk means AI can directly interact with internal employees.
* Run means increased automation, potentially including direct AI interactions with external users.

The role of humans can change over time as the quality of the AI system improves. For example, in the beginning, when you’re still evaluating AI capabilities, you might use it to generate suggestions for human agents. If the acceptance rate by human agents is high, for example, 95% of AI-suggested responses to simple requests are used by human agents verbatim, you can let customers interact with AI directly for those simple requests.
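
The acceptance-rate check described above could be sketched like this. The log format, field names, and the default 95% threshold are assumptions made for illustration; a real system would define "accepted" according to its own logging.

```python
def acceptance_rate(suggestions):
    """Fraction of AI suggestions that human agents sent verbatim."""
    if not suggestions:
        return 0.0
    accepted = sum(1 for s in suggestions if s["sent"] == s["suggested"])
    return accepted / len(suggestions)

def can_automate(suggestions, threshold=0.95):
    """True when agents accept suggestions often enough to remove the human step."""
    return acceptance_rate(suggestions) >= threshold

# Hypothetical log: 24 suggestions sent verbatim, 1 edited by the agent.
log = [{"suggested": "Your order has shipped.", "sent": "Your order has shipped."}] * 24
log.append({"suggested": "Your order has shipped.", "sent": "Your order ships tomorrow."})

print(acceptance_rate(log))  # 0.96
print(can_automate(log))     # True
```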

#### D. What differentiates you from competitors?
In a way, building applications on top of foundation models means providing a layer on top of these models. This also means that if the underlying models expand in capabilities, the layer you provide might be subsumed by the models, rendering your application obsolete.

In AI, there are generally three types of competitive advantages: technology, data, and distribution (the ability to bring your product in front of users). 
* With foundation models, the core technologies of most companies will be similar.
* The distribution advantage likely belongs to big companies.
* The data advantage is more nuanced.

#### E. Setting Expectations
Judge the usefulness threshold of a product using metrics:
* Quality metrics: to measure the quality of the AI’s responses.
* Latency metrics: including TTFT (time to first token), TPOT (time per output token), and total latency. What is considered acceptable latency depends on your use case.
* Cost metrics: how much it costs per inference request.
* Other metrics: such as interpretability and fairness.
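
The latency and cost metrics above can be combined into quick back-of-the-envelope estimates. The sketch below assumes total latency is roughly TTFT plus TPOT times the number of output tokens, and that cost is priced per 1,000 tokens; all numbers are hypothetical.

```python
def total_latency(ttft_s, tpot_s, output_tokens):
    """Approximate end-to-end latency in seconds for one response."""
    return ttft_s + tpot_s * output_tokens

def request_cost(input_tokens, output_tokens, input_price_per_1k, output_price_per_1k):
    """Cost of one request when input and output tokens are priced separately."""
    return (
        (input_tokens / 1000) * input_price_per_1k
        + (output_tokens / 1000) * output_price_per_1k
    )

# Hypothetical measurements and prices, for illustration only.
latency = total_latency(ttft_s=0.5, tpot_s=0.02, output_tokens=200)
cost = request_cost(
    input_tokens=1000,
    output_tokens=200,
    input_price_per_1k=0.001,
    output_price_per_1k=0.002,
)
print(f"latency: {latency:.2f} s, cost: ${cost:.4f}")
```

With these numbers, most of the latency comes from generating output tokens rather than from waiting for the first one, which is why output length matters for user experience.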

#### F. Milestone Planning
Building an AI demo is easy and fast, but turning it into a polished product is slow and difficult. As seen in examples like UltraChat and LinkedIn, early progress (up to ~80%) can be fast, but getting from 80% to near-perfection (95–100%) takes the most time and effort, often involving months of work fixing issues like hallucinations and tuning performance.

#### G. Maintenance
AI product planning must include long-term maintenance and adaptability, as the AI field evolves rapidly. Using foundation models means committing to keeping up with constant change. Take note:
- Model *inference,* the process of computing an output given an input, is getting faster and cheaper.
- New and better models appear constantly, but without proper infrastructure for versioning and evaluation in place, swapping them in can cause a lot of headaches.
- Evolving regulations around AI and IP pose risks, potentially threatening product ownership. IP-focused companies (e.g. game studios) are cautious about using AI due to concerns over future IP rights.

<br/>
<img src="Images/brain.png" alt="Atom" style="width:60px" align="left" vertical-align="middle">

## 4. The AI Engineering Stack
*AI Engineering*

----
#### A. Three Layers of the AI Stack
There are three layers to any AI application stack: application development, model development, and infrastructure.
- Application development: UI, prompt engineering, context construction, evaluation
- Model development: inference optimization, dataset engineering, modeling and training (finetuning), evaluation
- Infrastructure: compute management, data management, serving, monitoring

Foundation models are new, but building AI products still follows core principles: solve business problems, experiment systematically, optimize performance, and improve with feedback.

#### B. AI Engineering Versus ML Engineering
1. Model Use vs. Training: Unlike traditional ML, AI engineering focuses on adapting pre-trained foundation models rather than training new ones from scratch.
2. Compute Demands: AI engineering deals with larger, more compute-intensive models, requiring efficient optimization, more GPUs, and engineers skilled in managing large compute clusters.
3. Open-Ended Outputs: These models produce flexible, open-ended outputs, making them versatile but also harder to evaluate, which increases the importance of robust evaluation methods in AI engineering.

AI engineering focuses more on adapting and evaluating models than developing them. Model adaptation falls into two main types:

Prompt-based techniques (e.g., prompt engineering):
1. Don’t change model weights.
2. Use instructions or context to guide the model.
3. Easy to use, requires less data, and good for rapid experimentation.
4. May not suffice for complex or high-performance tasks.

Finetuning:
1. Involves updating model weights.
2. More complex and data-intensive.
3. Can offer better performance and lower latency, and can teach the model new tasks.

#### C. Model development
Model development is the layer most commonly associated with traditional ML engineering. It has three main responsibilities: modeling and training, dataset engineering, and inference optimization.

Modeling and training involve designing model architectures, training, and finetuning using tools like Google's TensorFlow, Meta's PyTorch, and Hugging Face Transformers. This work requires deep ML knowledge—understanding algorithms, neural networks, and training concepts like gradient descent.

However, with foundation models now available, building AI applications no longer requires deep ML expertise—many succeed without it. Still, ML knowledge remains valuable for expanding capabilities and solving complex issues.

Training always involves changing model weights, but not all changes to model weights constitute training. For example, quantization, the process of reducing the precision of model weights, technically changes the model’s weight values but isn’t considered training.
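
To illustrate how quantization changes weight values without any training, here is a minimal sketch of one simple scheme: symmetric 8-bit quantization with a single shared scale. This is an assumption chosen for clarity, not the method of any particular library, and the weights are made up.

```python
def quantize_int8(weights):
    """Map float weights to integers in [-127, 127] with one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    """Recover approximate float weights from the integers."""
    return [q * scale for q in quantized]

# Made-up weights, for illustration only.
weights = [0.5, -1.27, 0.003, 0.9]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

print(quantized)  # [50, -127, 0, 90]
print(restored)   # close to the originals, but 0.003 has become 0.0
```

The restored weights differ slightly from the originals, yet no data or gradient updates were involved, which is why quantization is not considered training.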

Training:
1. Pre-training involves training a model from scratch with randomly initialized weights, typically using tasks like text completion. It is the most resource- and time-intensive part of training—InstructGPT, for example, used up to 98% of compute and data in this phase. Because it's costly and error-prone, few organizations do it, but experts in pre-training are highly sought after.
2. Finetuning is the process of continuing training on a pre-trained model. Since the model already has prior knowledge, finetuning is less resource-intensive than pre-training, requiring less data and compute.
3. Post-training and finetuning both refer to training a model after pre-training and are technically the same. However, the terms differ based on who does the training: Post-training is done by model developers (e.g., OpenAI) to improve general model behavior; Finetuning is done by application developers to adapt the model to specific tasks or needs.

#### D. Dataset engineering
Dataset engineering involves curating, generating, and annotating data for training AI models. Unlike traditional machine learning, which deals with close-ended tasks (e.g., spam detection) and structured data, AI engineering for foundation models handles open-ended tasks and unstructured data, making annotation more complex. Key tasks include deduplication, tokenization, context retrieval, and ensuring data quality. As models become commoditized, data is seen as the main differentiator. The amount of data needed varies by adaptation method (e.g., full training vs. finetuning vs. prompting).
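
Deduplication, one of the tasks listed above, can be sketched in its simplest form as dropping exact duplicates after light normalization. This is a minimal illustration; real pipelines often also detect near-duplicates, which this sketch does not attempt.

```python
def deduplicate(texts):
    """Drop exact duplicates after normalizing case and whitespace."""
    seen = set()
    unique = []
    for text in texts:
        key = " ".join(text.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(text)  # keep the first occurrence as written
    return unique

# Made-up records, for illustration only.
texts = ["Great product!", "great   product!", "Arrived late.", "Great product!"]
print(deduplicate(texts))  # ['Great product!', 'Arrived late.']
```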

#### E. Inference optimization
Inference optimization means making models faster and cheaper. Techniques include quantization, distillation, and parallelism.

#### F. Evaluation
Evaluation in AI helps manage risks and find improvements across the model adaptation process. It's especially challenging for foundation models due to their open-ended nature and varied outputs. The wide range of adaptation techniques also complicates fair comparisons, as performance can vary based on evaluation methods.

#### G. Prompt engineering and context construction
Prompt engineering guides AI behavior through input alone, without changing model weights. It can significantly impact performance: Gemini’s MMLU score, for example, improved when a different prompting technique was used. It involves crafting instructions and providing context or tools, especially for complex tasks.
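
Context construction can be sketched as picking the most relevant documents and placing them in the prompt. The word-overlap scoring below is a deliberately naive stand-in for a real retriever, and the documents, function names, and prompt wording are all hypothetical.

```python
import re

def words(text):
    """Lowercased set of alphanumeric words in a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents, top_k=2):
    """Rank documents by how many words they share with the query."""
    query_words = words(query)
    ranked = sorted(documents, key=lambda d: len(query_words & words(d)), reverse=True)
    return ranked[:top_k]

def build_prompt(query, documents):
    """Place the retrieved documents in the prompt ahead of the question."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

# Made-up knowledge base, for illustration only.
documents = [
    "Refunds are issued within 14 days of purchase.",
    "Our office is closed on public holidays.",
    "Shipping takes 3 to 5 business days.",
]
query = "How many days do refunds take?"
prompt = build_prompt(query, documents)
print(prompt)
```

The model's weights are untouched here; only the input changes, which is what distinguishes this approach from finetuning.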

#### H. AI interface
Some of the interfaces that are gaining popularity for AI applications:
- Standalone web, desktop, and mobile apps (e.g. Streamlit, Gradio, and Plotly Dash for building AI web apps).
- Browser extensions that let users quickly query AI models while browsing.
- Chatbots integrated into chat apps like Slack, Discord, WeChat, and WhatsApp.
- Many products, including VSCode, Shopify, and Microsoft 365, provide APIs that let developers integrate AI into their products as plug-ins and add-ons. These APIs can also be used by AI agents to interact with the world (e.g. MCP).

#### I. Summary of AI vs ML Engineering Workflows
<img src="Images/full_stack.png" alt="AI Engineering Versus Full-Stack Engineering">