## **Outline**

- Introduction to GAI and its application
- Generative Model Types

<img src="./images/border.jpg" height="10" width="1500" align="center"/>

## **Generative AI: A Year of Transformation**

<img src="./images/GAI1.png" width="800" align="center"/>

- Generative AI has gained extensive attention and investment in the past year
  - It can produce coherent text, images, code, and other impressive outputs from just a simple textual prompt
- Generative AI goes beyond typical natural language processing (NLP) tasks

- Countless use cases:
  - Explaining complex algorithms.
  - Building bots.
  - Assisting in app development.
  - Explaining academic concepts.


- Fields undergoing transformation:
  - Animation.
  - Gaming.
  - Art.
  - Movies.
  - Architecture.
  - Coffee Industry (?)

#### The "Aha!" Moment in 2022

- Important questions:
  - Why now?
  - What's next?

<img src="./images/border.jpg" height="10" width="1500" align="center"/>



## **What is Generative AI?**

<img src="./images/GAI8.webp" width="800" align="center"/>


- Generative AI is a subfield of **machine learning**.
- It involves training AI models on **real-world data**.
- These models **generate new content** like text, images, and code.
- Comparable to **what humans would create**.


#### How Generative AI Works

- Training algorithms on large datasets.
  - Identifying and learning patterns.
    - **Neural networks** learn these patterns.
      - **Generate new data** following the learned patterns.

#### Generative AI in Natural Language Processing (NLP)

- Generative AI in NLP processes a vast corpus.
  - Responds to prompts based on learned probabilities.
    - Examples: Autocomplete and advanced models like ChatGPT and DALL-E.
  - Utilizes different model architectures.
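
The idea of responding to a prompt based on learned probabilities can be sketched with a toy bigram model. This is only an illustration of the principle, not how ChatGPT works internally: the corpus, the whitespace tokenization, and the function names `train_bigram` and `next_word_probs` are all assumptions made for this example.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def next_word_probs(counts, word):
    """Turn raw follow-up counts into a probability distribution."""
    followers = counts.get(word, Counter())
    total = sum(followers.values())
    return {w: c / total for w, c in followers.items()} if total else {}

# Hypothetical toy corpus, chosen only for illustration
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]
counts = train_bigram(corpus)
probs = next_word_probs(counts, "the")

# "cat" follows "the" in 2 of 6 cases, so it is the most likely next word
print(max(probs, key=probs.get), probs)
```

Autocomplete picks from a distribution like `probs`; large language models do the same thing with a neural network in place of the count table.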


<img src="./images/border.jpg" height="10" width="1500" align="center"/>


##  **General Components for Generative AI**

1. **Input Data:**
   - Generative AI models start with input data, which can be in various forms such as text, images, or structured data.

2. **Encoder (Optional):**
   - Some models use an encoder to transform input data into a suitable representation, especially in sequence-to-sequence models.

3. **Generator (Decoder):**
   - The core of the generative model is the generator or decoder, which generates new data based on learned patterns.
   - This component typically consists of neural network layers.

4. **Latent Space (Optional):**
   - In certain models like VAEs and GANs, a latent space is used to represent data in a compressed form.
   - The latent space is learned during training and can be sampled for generating new data.

5. **Loss Function:**
   - Generative models use a loss function to measure the difference between generated data and target data.
   - The model aims to minimize this loss during training to improve its generative capabilities.

6. **Training Data:**
   - Generative models are trained on a dataset containing examples of the data they are supposed to generate.
   - The model learns from this data to capture underlying patterns.

7. **Optimizer:**
   - Optimization algorithms like SGD or Adam are used to update the model's parameters during training to minimize the loss function.

8. **Sampling:**
   - Once trained, the model can generate new data by sampling from the learned distribution in the latent space or directly from the generator.

9. **Output Data:**
   - The generated data is the final output of the generative AI model, and it can be in the same or a different format as the input data, depending on the task.
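
To make the roles of these components concrete, here is a deliberately tiny sketch. The "model" is a single Gaussian with two learnable parameters, trained by plain gradient descent on a negative log-likelihood loss. The data, learning rate, step count, and function name `nll` are assumptions for illustration; real generative models replace the Gaussian with a deep neural network.

```python
import math
import random
import statistics

random.seed(0)

# Training data: examples the model should learn to imitate (hypothetical)
training_data = [random.gauss(5.0, 2.0) for _ in range(200)]
data_mean = statistics.fmean(training_data)
data_std = statistics.pstdev(training_data)

# Model parameters: mean and log of the standard deviation
mu, log_sigma = 0.0, 0.0

def nll(mu, log_sigma, data):
    """Loss function: average negative log-likelihood under N(mu, sigma^2)."""
    sigma = math.exp(log_sigma)
    return sum(
        0.5 * math.log(2 * math.pi) + log_sigma + 0.5 * ((x - mu) / sigma) ** 2
        for x in data
    ) / len(data)

initial_loss = nll(mu, log_sigma, training_data)

# Optimizer: plain gradient descent with analytic gradients of the loss
learning_rate = 0.1
n = len(training_data)
for step in range(2000):
    sigma = math.exp(log_sigma)
    grad_mu = sum(-(x - mu) / sigma ** 2 for x in training_data) / n
    grad_log_sigma = sum(1 - ((x - mu) / sigma) ** 2 for x in training_data) / n
    mu -= learning_rate * grad_mu
    log_sigma -= learning_rate * grad_log_sigma

final_loss = nll(mu, log_sigma, training_data)

# Sampling: draw new output data from the learned distribution
generated = [random.gauss(mu, math.exp(log_sigma)) for _ in range(5)]
print(f"data:    mean={data_mean:.2f}, std={data_std:.2f}")
print(f"learned: mean={mu:.2f}, std={math.exp(log_sigma):.2f}")
print("generated samples:", [round(g, 2) for g in generated])
```

There is no encoder or latent space in this sketch; those optional components appear once the model has to compress structured inputs such as images.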



<img src="./images/border.jpg" height="10" width="1500" align="center"/>

## **History of Generative AI**

#### Early Generative AI - Eliza

- Eliza chatbot developed in 1966 by Joseph Weizenbaum.
- Early implementations used rules-based approaches.
- Eliza had a limited vocabulary, lacked context, and overrelied on patterns.
- These limitations led to frequent breakdowns.
- **Challenges**
  - Customization and expansion of early chatbots were challenging
  - They struggled to adapt to different user inputs.
  - Lack of context made meaningful conversations difficult.
  - These limitations hindered their practical use.

<img src="./images/ELIZA_conversation.png" width="800" align="center"/>

#### **Recent Progress in Generative AI**

- Key Factors in Recent Success
  - Deep learning's three critical components:
    - **Scaling models**: Larger and more complex architectures.
    - **Large datasets**: Abundant real-world data for training.
    - Increased **compute power**: Faster training and more sophisticated models.
- These factors work together to drive the generative AI revolution.

<img src="./images/whynow.png" width="800" align="center"/>



<img src="./images/border.jpg" height="10" width="1500" align="center"/>

## **GPUs and their Application to Machine Learning**
-  GPUs vs. CPUs
   - GPUs designed for parallel processing.
   - Ideal for computationally intensive tasks.
   - Thousands of smaller cores for simultaneous processing.
   - Contrasts with CPUs focused on sequential processing.

- GPUs excel in training deep neural networks.
- **Parallelism** speeds up training significantly.
- Enables the handling of large, complex networks.

#### Beyond Training

- GPUs not limited to training.
- Used for inference and real-time applications.
- Widely adopted in industries like healthcare, finance, and gaming.

<img src="./images/Nvidia_CUDA_Logo.jpg" width="500" align="center"/>

<img src="./images/border.jpg" height="10" width="1500" align="center"/>

## **AlexNet — 2012 — The Deep Learning Revolution**
- Deep learning and CNNs led the charge.
- CNNs (Convolutional Neural Networks) have existed since the 1990s.
  - Layers in a CNN
    - Convolutional layers
      - filters to detect various features.
    - Pooling layers
      - Reduce the spatial dimensions while preserving important information.
    - Fully connected layers
      - Often used for classification

<img src="./images/CNN.jpeg" width="500" align="center"/>
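
The convolutional and pooling layers listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the 5x5 image, the hand-written edge filter, and the function names `conv2d` and `max_pool2d` are made up for the example, and a real CNN learns its filter values during training.

```python
def conv2d(image, kernel):
    """Valid 2D convolution (cross-correlation, as deep learning libraries compute it)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling: keeps the strongest response in each block."""
    return [
        [
            max(feature_map[i + a][j + b]
                for a in range(size) for b in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]

# Hypothetical 5x5 image: dark columns (0) on the left, bright columns (1) on the right
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Filter that responds to a dark-to-bright vertical edge
kernel = [[-1, 1],
          [-1, 1]]

features = conv2d(image, kernel)  # 4x4 feature map, non-zero only at the edge
pooled = max_pool2d(features)     # 2x2 map after pooling

print(features)
print(pooled)
```

The pooled map is four times smaller than the feature map but still records where the edge was found.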

- Previously impractical due to intensive computing requirements.

- https://poloclub.github.io/cnn-explainer/

- In 2012, AlexNet emerged.
  - A CNN model trained on GPUs and ImageNet data.
    - An astonishing gap of about 11 percentage points in top-5 error over the runner-up!

<img src="./images/border.jpg" height="10" width="1500" align="center"/>

## **Transformers: Attention Is All You Need (Google) — 2017**

- Deep learning lagged behind in natural language processing (NLP)
- NLP not just about translation or classification
- The challenge was coherent conversations with humans

- **RNN (Recurrent Neural Network)**
  - A type of neural network designed for sequential data.
  - Processes data with loops, allowing information persistence.
    - **How it works:**
        - Takes input at each time step.
        - Maintains a hidden state that captures previous information.

- **LSTM (Long Short-Term Memory)**
  - A type of RNN designed to address the vanishing gradient problem.
  - Keeps long-term dependencies in sequential data.
    - **How it works:**
        - Similar to RNN but with specialized memory cells.
        - Has gates (input, forget, output) to control information flow.
        - Can store, read, and write information selectively.
    - **Advantages:**
        - Handles long-term dependencies effectively.
        - Better at capturing and retaining sequential patterns.
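
The gate mechanics described above can be sketched as a single LSTM step on scalar values. This is a simplified illustration, not a full implementation: real LSTMs use weight matrices and vectors, and the hand-picked `weights` and the function name `lstm_step` are assumptions made for this example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One LSTM time step for scalar input and state.

    `w` maps each gate name to (input weight, recurrent weight, bias).
    """
    def pre(name):
        wx, wh, b = w[name]
        return wx * x + wh * h_prev + b

    f = sigmoid(pre("forget"))        # how much of the old cell state to keep
    i = sigmoid(pre("input"))         # how much new information to write
    o = sigmoid(pre("output"))        # how much of the cell state to expose
    g = math.tanh(pre("candidate"))   # candidate value to write
    c = f * c_prev + i * g            # updated memory cell
    h = o * math.tanh(c)              # updated hidden state
    return h, c

# Hypothetical weights, chosen by hand for illustration
weights = {
    "forget": (0.0, 0.0, 5.0),      # large bias keeps the forget gate near 1
    "input": (5.0, 0.0, -2.5),      # input gate opens only when x is large
    "output": (0.0, 0.0, 5.0),
    "candidate": (1.0, 0.0, 0.0),
}

h, c = 0.0, 0.0
for x in [1.0, 0.0, 0.0, 0.0]:
    h, c = lstm_step(x, h, c, weights)
    print(f"x={x:.1f}  cell={c:.3f}  hidden={h:.3f}")
```

The cell state written at the first step is still mostly intact three steps later, which is the long-term memory a plain RNN struggles to keep.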


<img src="./images/RNNLSTM.png"  width="400" align="center"/>


- Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks were early staples in natural language processing and time series analysis.

**Limitations**
- Proficient at short sequences but struggled with **longer text**.
- Couldn't capture complex ideas in extended text.

## **Transformers**

- Google introduced the "Transformer" model in 2017.
- Presented in the groundbreaking paper "Attention Is All You Need."
- A milestone that revolutionized translation problems.


#### The Power of Attention

- "Attention" mechanism - a neural network game-changer.
- Allows analyzing the entire input sequence.
- Determines relevance to each component of the output.
- Transforms NLP and many other AI domains.
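
The attention mechanism described above can be sketched as scaled dot-product attention, the formulation used in "Attention Is All You Need": each query is compared with every key, the scores are normalized with softmax, and the output is a weighted average of the values. The toy vectors and the function names below are assumptions for illustration; real Transformers add learned projections and multiple heads.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Relevance of every input position to this query
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        w = softmax(scores)
        # Weighted average of the value vectors
        out = [
            sum(wi * v[j] for wi, v in zip(w, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(w)
    return outputs, all_weights

# Hypothetical 2-dimensional keys/values for a 3-token input
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
queries = [[1.0, 0.0]]

outputs, weights = attention(queries, keys, values)
print("attention weights:", [round(w, 3) for w in weights[0]])
print("output:", [round(o, 3) for o in outputs[0]])
```

The query looks at all three positions at once, and the positions whose keys align with it receive the largest weights.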


<img src="./images/border.jpg" height="10" width="1500" align="center"/>

## Before Transformers:

In [1]:
%pip install transformers

Note: you may need to restart the kernel to use updated packages.


In [5]:
import torch
import torch.nn as nn

# Sample text data
text = "Starbucks is one of the most popular companies that provides coffee!"

# Preprocess the text: whitespace tokenization and a vocabulary of unique tokens
tokens = text.split()
vocab = list(dict.fromkeys(tokens))  # unique tokens in first-seen order
word_to_idx = {word: idx for idx, word in enumerate(vocab)}
idx_to_word = {idx: word for word, idx in word_to_idx.items()}
seq_length = 5

# Build (input sequence, next word) training pairs with a sliding window
data = []
for i in range(len(tokens) - seq_length):
    seq_in = tokens[i:i + seq_length]
    seq_out = tokens[i + seq_length]
    data.append((seq_in, seq_out))

# Define an RNN-based (LSTM) text generation model
class RNNTextGenerator(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, embedding_dim)
        self.rnn = nn.LSTM(embedding_dim, hidden_dim)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, hidden):
        x = self.embeddings(x)           # (seq_len, batch) -> (seq_len, batch, embedding_dim)
        x, hidden = self.rnn(x, hidden)  # hidden carries (h, c) across time steps
        x = self.fc(x)                   # project to vocabulary logits
        return x, hidden

# Hyperparameters
vocab_size = len(word_to_idx)
embedding_dim = 10
hidden_dim = 50
learning_rate = 0.01
num_epochs = 100

# Create and train the RNN model
model_rnn = RNNTextGenerator(vocab_size, embedding_dim, hidden_dim)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model_rnn.parameters(), lr=learning_rate)

for epoch in range(num_epochs):
    for seq_in, seq_out in data:
        seq_in_idx = torch.tensor([word_to_idx[word] for word in seq_in], dtype=torch.long)
        seq_out_idx = torch.tensor([word_to_idx[seq_out]], dtype=torch.long)

        optimizer.zero_grad()
        hidden = None  # start each training sequence with a fresh hidden state
        # Feed the sequence one word at a time, carrying the hidden state forward
        for i in range(seq_length):
            output, hidden = model_rnn(seq_in_idx[i].view(1, -1), hidden)

        # Only the prediction made after the last input word is scored
        loss = criterion(output.view(1, -1), seq_out_idx)
        loss.backward()
        optimizer.step()

# Generate text using the RNN model (greedy decoding, one word at a time)
seed_text = "Starbucks"
predicted_text = seed_text
current_word = seed_text
hidden = None
with torch.no_grad():
    for _ in range(10):
        word_idx = torch.tensor([[word_to_idx[current_word]]], dtype=torch.long)
        output, hidden = model_rnn(word_idx, hidden)
        predicted_word_idx = torch.argmax(output).item()
        current_word = idx_to_word[predicted_word_idx]
        predicted_text += " " + current_word
print("********")
print(predicted_text)


********
Starbucks most popular companies that provides coffee! coffee! coffee! coffee! coffee!


<img src="./images/border.jpg" height="10" width="1500" align="center"/>

## After Transformers:

In [6]:
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load the GPT-2 model and tokenizer
model_name = "gpt2"
model = GPT2LMHeadModel.from_pretrained(model_name)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)

# Set the device to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Generate text using the Transformer-based model
input_text = "There are"
inputs = tokenizer(input_text, return_tensors="pt").to(device)
output = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],  # passed explicitly to avoid the attention-mask warning
    pad_token_id=tokenizer.eos_token_id,      # GPT-2 has no pad token, so reuse end-of-sequence
    max_length=30,
    num_return_sequences=1,
)

generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print("********")
print(generated_text)


********
There are many ways to get around the law.

The first is to get a license.

The second is to get a license from


<img src="./images/border.jpg" height="10" width="1500" align="center"/>

## **Transformers Beyond Translation**

- State-of-the-art models for numerous NLP tasks.
- Recently, Transformers made waves in computer vision.

- Impacts on NLP
  - Fostered advancements in conversational AI.
  - Enabled applications in chatbots, virtual assistants, and more.


<img src="./images/NLPEvolution.png"  align="center"/>

<img src="./images/border.jpg" height="10" width="1500" align="center"/>

## **Next Word Prediction, Scale, and Fine Tuning — BERT (Google) and GPT (OpenAI) Family — 2018**

- AI needed to understand **language beyond translation**.
- BERT and GPT addressed this crucial gap.

#### Introducing BERT

- BERT (Bidirectional Encoder Representations from Transformers).
- Google's approach to contextual language understanding.
- Trained on vast amounts of text to predict missing words.
- BERT's Impact
  - Achieved remarkable results in sentiment analysis, question answering, and more.
  - Contextual embeddings revolutionized language understanding.

#### GPT - A Different Approach

- GPT (Generative Pre-trained Transformer) by OpenAI.
- Focus on autoregressive language modeling.
  - Learning to generate text one word at a time.
- GPT's Language Generation
  - GPT-2's surprising ability to generate coherent text.
  - Human-like responses in chatbots and text generation.
  - Demonstrated the power of pre-trained models.
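
The difference between the two training objectives can be sketched by building training examples from one sentence. This is an illustration only: the sentence, the whitespace tokenization, and the function names are assumptions, and BERT in practice masks a random subset of tokens (about 15%) rather than a single chosen position.

```python
def masked_lm_example(tokens, mask_index, mask_token="[MASK]"):
    """BERT-style objective: hide a token and predict it from both sides."""
    masked = list(tokens)
    target = masked[mask_index]
    masked[mask_index] = mask_token
    return masked, target

def autoregressive_examples(tokens):
    """GPT-style objective: predict each token from the tokens before it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

# Hypothetical example sentence
tokens = "the coffee shop opens at nine".split()

masked, target = masked_lm_example(tokens, mask_index=2)
print("BERT input: ", " ".join(masked), "-> predict:", target)

pairs = autoregressive_examples(tokens)
for context, next_token in pairs:
    print("GPT context:", " ".join(context), "-> predict:", next_token)
```

BERT sees context on both sides of the gap, which suits understanding tasks; GPT only sees what came before, which is what makes left-to-right generation possible.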


<img src="./images/GPT.jpeg" width="500" align="center"/>
---

#### Scaling Challenges

- Collecting quality training data remained a challenge.
- ImageNet required meticulous labeling of millions of images.
- Text datasets for language tasks were equally demanding.

#### GPT-3: Scaling New Heights

- OpenAI introduced GPT-3 with 175 billion parameters.
- At its release in 2020, the largest and most powerful language model to date.

#### Fine Tuning - Customizing Models

- Fine tuning adapts large models to specific tasks.
- Cost-effective compared to training from scratch.
- Application in fields like healthcare, finance, and more.
- Examples
  - Fine-tuned models for medical document processing.
  - Improved accuracy in identifying medical conditions.
  - OpenAI's partnership with Microsoft for domain-specific AI.


<img src="./images/border.jpg" height="10" width="1500" align="center"/>

## Sentiment Analysis with and without BERT

In [11]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from transformers import BertTokenizer, BertForSequenceClassification, pipeline
import pandas as pd
from sklearn.model_selection import train_test_split


# Load the dataset from the CSV file
df = pd.read_csv('movie_reviews.csv')

# Split the dataset into training and testing sets
X = df['review']
y = df['label']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Traditional Approach: TF-IDF + Logistic Regression
tfidf_vectorizer = TfidfVectorizer(max_features=5000)
X_train_tfidf = tfidf_vectorizer.fit_transform(X_train)
X_test_tfidf = tfidf_vectorizer.transform(X_test)

lr_classifier = LogisticRegression()
lr_classifier.fit(X_train_tfidf, y_train)

sample_review = ["The movie was disappointing. The acting was mediocre, and the plot lacked depth. I would not recommend it."]

# Transform the sample review using TF-IDF
sample_review_tfidf = tfidf_vectorizer.transform(sample_review)

# Predict the sentiment for the sample review
sample_predicted_sentiment_lr = lr_classifier.predict(sample_review_tfidf)
sample_sentiment = "Positive" if sample_predicted_sentiment_lr[0] == 'positive' else "Negative"
print(f"Sentiment Prediction (TF-IDF + Logistic Regression): {sample_sentiment}")

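# BERT-based Approach
# Note: 'bert-base-uncased' is a pretrained encoder without a trained classification head.
# BertForSequenceClassification adds a randomly initialized head, so the label printed below
# (LABEL_0 / LABEL_1) is arbitrary until the model is fine-tuned on labeled sentiment data.
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
nlp = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)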

results = nlp(sample_review)
predicted_sentiment_bert = results[0]['label']
print(f"Sentiment (BERT): {predicted_sentiment_bert}")


Sentiment Prediction (TF-IDF + Logistic Regression): Positive


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Sentiment (BERT): LABEL_1


<img src="./images/border.jpg" height="10" width="1500" align="center"/>

## **The Challenge of Interaction**

- Early large language models (LLMs) focused on predicting the next word.
- Interacting with them was challenging.
- They had difficulty following human instructions.

- **Instruction Tuning Unveiled**
  - Fine-tuning LLMs to follow human instructions
  - Enhanced interaction and task performance

**Benefits of Instruction Tuning**

- Increased accuracy and capabilities of LLMs.
- Alignment with human values.
- Prevention of undesired or dangerous content.


## **The Arrival of ChatGPT**

- ChatGPT: A milestone in Generative AI.
- Reorganized instruction tuning into a dialogue format.
- User-friendly interface for AI interaction.


## **Popular LLMs**
<img src="./images/GAI2.webp" width="1000" align="center"/>


<img src="./images/border.jpg" height="10" width="1500" align="center"/>

## **OpenAI’s GPT Models**

<img src="./images/GAI3.webp" width="1000" align="center"/>

## **Task specific**

<img src="./images/GAI4.webp" width="1000" align="center"/>


<img src="./images/GAI5.webp" width="1000" align="center"/>

<img src="./images/border.jpg" height="10" width="1500" align="center"/>

## **Google AI and the Pathways Language Model (PaLM)**

<img src="./images/PALM.jpeg" width="500" align="center"/>

- Google's largest publicly disclosed model.
  - Scales up to 540 billion parameters.
  - Trained on 780 billion tokens.
- PaLM serves as a foundation model.
  - Used in various Google projects.
- A substantial leap beyond GPT-3.

#### Training Data

- Self-supervised learning with a diverse text corpus.
- Multilingual web pages, books, code repositories, and more.

#### PaLM's Performance

- PaLM's exceptional few-shot performance.
- Outperforming prior large models like GPT-3.


<img src="./images/border.jpg" height="10" width="1500" align="center"/>

## **DeepMind’s Chinchilla Model**

<img src="./images/Deepmind.webp" width="500" align="center"/>

- DeepMind was founded in 2010.
- Acquired by Google in 2014, now a subsidiary of Alphabet Inc.
  - DeepMind's pursuit of replicating human short-term memory.
  - Creation of a Neural Turing Machine.
  - A step towards understanding memory in AI.

- AlphaZero
  - Competence achieved through reinforcement learning.

- AlphaFold's advances in protein folding.
  - Predicting over 200 million protein structures.
  - Revolutionizing the field of biology.

#### Flamingo - Describing Images


- In April 2022, DeepMind launched Flamingo.
  - A single visual language model capable of describing any picture.
  - Advancing AI's understanding of visual content.
  - https://www.deepmind.com/blog/tackling-multiple-tasks-with-a-single-visual-language-model


#### Chinchilla AI - Outperforming GPT-3

- DeepMind's Chinchilla AI introduced in March 2022.
- Outperforming GPT-3.
- How:
  - Chinchilla has 70B parameters, fewer than GPT-3's 175B.
  - Trained on 1.4 trillion tokens, about 4.7x more than GPT-3.
- Significant benefits for inference costs.
  - Outperforming other large language model platforms.
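
A quick back-of-the-envelope calculation shows why Chinchilla's recipe differs from GPT-3's. The sketch below takes Chinchilla's training set to be 1.4 trillion tokens (an assumption of this example) and derives GPT-3's token count from the 4.7x ratio quoted above instead of asserting it independently. The function name `tokens_per_parameter` is made up for illustration.

```python
def tokens_per_parameter(tokens, params):
    """How many training tokens the model saw per learnable parameter."""
    return tokens / params

chinchilla_params = 70e9
chinchilla_tokens = 1.4e12               # assumption: 1.4 trillion tokens

gpt3_params = 175e9
gpt3_tokens = chinchilla_tokens / 4.7    # implied by the 4.7x ratio

chinchilla_ratio = tokens_per_parameter(chinchilla_tokens, chinchilla_params)
gpt3_ratio = tokens_per_parameter(gpt3_tokens, gpt3_params)

print(f"Chinchilla: {chinchilla_ratio:.1f} tokens per parameter")
print(f"GPT-3:      {gpt3_ratio:.1f} tokens per parameter")
```

Under these figures Chinchilla is a smaller model trained on far more data per parameter, which is also why it is cheaper to run at inference time.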


<img src="./images/border.jpg" height="10" width="1500" align="center"/>

## **Meta AI (formerly FAIR)**

- FAIR, or Facebook Artificial Intelligence Research.
- A laboratory focused on open-source AI frameworks.


#### PyText - Advancing NLP

- In 2018, FAIR released PyText.
- A modeling framework for NLP systems.


#### Galactica - Assisting Scientists

- November 2022: Meta's Galactica.
- Assists scientists with tasks like summarizing papers and annotating molecules.
- Bridging the gap between AI and scientific research.


## **LLaMA - Large Language Model Meta AI**

<img src="./images/llama.jpeg" width="500" align="center"/>

- Released in February 2023.
- A foundational transformer-based language model.
- Aimed at advancing AI research and academic exploration.
- Responsible AI
  - LLaMA models released under non-commercial licenses.
  - Preventing misuse while promoting responsible AI.
  - Access granted to select researchers and organizations.
- Parameters
  - from 7 billion to 65 billion parameters.
  - Comparing LLaMA-65B to Chinchilla and PaLM.
- Training Data
  - LLaMA models trained on 1.4 trillion tokens in 20 languages.
  - Leveraging publicly available unlabeled data.
  - Data sources include CCNet, GitHub, Wikipedia, ArXiv, Stack Exchange, and books.

- Challenges
  - LLaMA's performance varies across languages.
  - Challenges related to bias, toxicity, and hallucination.


<img src="./images/border.jpg" height="10" width="1500" align="center"/>

## **Anthropic and the Claude Chatbot**

<img src="./images/claude.png" width="500" align="center"/>

- Anthropic: An AI startup and public benefit corporation.
- Founded in 2021 by Daniela Amodei and Dario Amodei, former OpenAI members.
- A focus on responsible AI and interpretability.


#### Claude Chatbot

- Introducing Claude, Anthropic's conversational large language model.
- Using **constitutional AI** for better alignment with human intentions.
- Claude Models
  - Claude comes in two versions: Claude-v1 and Claude Instant.
  - Claude-v1 for complex dialogues and creative content.
  - Claude Instant for casual conversations and summarization.

#### Limitations and Concerns

- Claude's limitations in math and programming.
- Occasional hallucinations and dubious instructions.
- Concerns about clever prompting bypassing safety features.

**Availability and Integration**

- Claude's media embargo lifted in January 2023.
- Integration with Discord Juni Tutor Bot and various platforms.

<img src="./images/border.jpg" height="10" width="1500" align="center"/>

## **Open Source Efforts in AI and Machine Learning**



| Model Family Name | Created By | Sizes | Focus | Foundation or Fine-Tuned | License | What’s Interesting | Architectural Notes |
|-------------------|------------|-------|-------|--------------------------|---------|-------------------|--------------------|
| LLaMA | Meta | 7B, 13B, 32B, 65.2B | Varied | Foundation | Non-commercial | Basis for numerous fine-tuned variants | SwiGLU activation instead of ReLU |
| LLaMA 2 | Meta with Microsoft | 7B, 13B, 70B | Chat | Foundation | Commercial | Balances safety and helpfulness better than OpenAI's models | SwiGLU activation, RoPE over traditional embeddings |
| Alpaca | Stanford’s CRFM | 7B | Instruction following | Fine-tuned LLaMA 7B | Non-commercial | Trained on text-davinci-003 examples | - |
| Vicuna | LMSYS | 7B, 13B | Chat | Fine-tuned LLaMA 13B | Non-commercial | Utilizes conversations from ShareGPT.com for training | - |
| Guanaco | KBlueLeaf | 7B | Instruction following | Fine-tuned LLaMA 7B (parameter efficient) | Non-commercial | Fine-tuned using QLoRA | - |
| RedPajama | Multiple collaborators | 3B, 7B | Chat, Instruction following | Foundation | Commercial | Uses the fully open RedPajama dataset following the LLaMA training recipe | Modifications on the Pythia architecture |
| Falcon | Technology Innovation Institute of UAE | 7B, 40B | Varied | Foundation | Commercial | Features a 2D parallelism strategy and ZeRO optimization for efficient training | FlashAttention and Multi-query Attention techniques |
| Flan-T5 | Google | Various, up to 11B | Varied | Foundation | Commercial | Trained on a massive collection of datasets, tasks, and task categories | Based on the T5 encoder-decoder structure |
| Stable Beluga 2 (Freewilly) | Stability AI | 70B | Varied | Fine-tuned LLaMA 2 70B | Non-commercial | Uses a modified Orca approach for high-quality example generation | - |
| MPT | MosaicML | Up to 30B | Varied including story writing | Foundation | Commercial | Capable of generating extremely long texts (up to 84k tokens) with specific configurations | Features FlashAttention |


<img src="./images/border.jpg" height="10" width="1500" align="center"/>


<img src="./images/GAI6.png" width="1000" align="center"/>

<img src="./images/border.jpg" height="10" width="1500" align="center"/>

## **Image Generation: DALL-E | Midjourney | Stable Diffusion | DreamStudio**


<img src="./images/GAI7.webp" width="1000" align="center"/>

# AI Innovations in Creativity


#### DALL-E

**DALL-E** - *By OpenAI*

- Generates images from textual descriptions.
- The name combines "Dalí" (the artist) and "WALL-E" (the Pixar robot).
- Example: "a two-story pink house shaped like a shoe."
- **Limitation:** Limited to predefined concepts, potential for misinterpretation.

![DALL-E](images/dalle.webp)


#### Midjourney

**Midjourney** - *Text-to-Image Generation Service*

- Generates images from text prompts, typically through a Discord bot.
- Developed by the independent research lab of the same name.
- Known for stylized, painterly output.
- **Limitation:** Closed-source, subscription-based service; results depend heavily on prompt wording.

![Midjourney](images/midjourney.png)


#### Stable Diffusion

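**Stable Diffusion** - *Open Latent Diffusion Model*

- An openly released text-to-image model from Stability AI and collaborators (2022).
- Generates an image by iteratively denoising random noise in a compressed latent space.
- A diffusion model rather than a GAN; widely used for AI art generation.
- **Limitation:** Requires extensive computational power and time.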

![Stable Diffusion](images/stable.jpeg)


#### DreamStudio

**DreamStudio** - *AI-Enhanced Creative Tools*

- Stability AI's web app for Stable Diffusion; empowers artists with AI-generated content.
- Seamlessly integrates AI into the creative process.
- Enables new forms of artistic expression.
- **Limitation:** May raise concerns about AI's role in art creation.

![DreamStudio](images/dreamstudio.jpeg)


# Comparison Table

| Innovation       | Description                                          | Application                       | Limitation                        | Link                                       |
|------------------|------------------------------------------------------|-----------------------------------|-----------------------------------|--------------------------------------------|
| DALL-E           | Generates images from text                           | Art, Design, Visual Creativity    | Limited to predefined concepts    | [Link](https://www.openai.com/research/dall-e/) |
| Midjourney       | Text-to-image generation service                     | Art, Illustration, Concept Design | Closed-source, subscription-based | [Link](https://www.midjourney.com/home/?callbackUrl=%2Fapp%2F) |
| Stable Diffusion | Openly released latent diffusion text-to-image model | Generative Art, Image Editing     | High computational requirements   | [Link](https://stablediffusionweb.com/) |
| DreamStudio      | AI-enhanced creative tools                           | Visual Arts, Design, Creativity   | Ethical/artistic concerns         | [Link](https://beta.dreamstudio.ai/generate) |


<img src="./images/border.jpg" height="10" width="1500" align="center"/>

## **Alpaca**

- **Model Origin**: Fine-tuned from LLaMA 7B
- **Training Data**: 52K instruction-following demonstrations
- **Comparison**: Similar behavior to OpenAI’s text-davinci-003
- **Cost**: <$600 for reproduction
- **Code**: [GitHub.com/Stanford-Alpaca/Alpaca7B](#)

- Powerful instruction-following models:
    - GPT-3.5 (text-davinci-003)
    - ChatGPT
    - Claude
    - Bing Chat
- **Challenges**:
    - Generation of false information
    - Propagation of stereotypes
    - Toxic language generation

#### Alpaca Model Details
- **Purpose**: Addressing deficiencies in instruction-following models
- **Base**: Meta’s LLaMA 7B model
- **Training Data**: 52K instructions generated using text-davinci-003
- **Behavior**: Similar to text-davinci-003
- **Cost**: Surprisingly low


#### Training Recipe
- **Challenges**:
    1. Pretrained language model quality
    2. High-quality instruction data
- **Solution**: Meta’s new LLaMA models & self-instruct method
- **Training Details**: Fine-tuned LLaMA 7B on 52K demonstrations from text-davinci-003
- **Data Cost**: <$500 using OpenAI API
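
To show what an instruction-following demonstration looks like as training text, here is a minimal sketch. The template wording follows the style popularized by the Alpaca project but should be treated as an assumption, as should the example record and the function name `build_training_text`.

```python
# Prompt template in the style used for instruction tuning (wording is an assumption)
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)

def build_training_text(example):
    """Return (prompt, full_text) for one instruction-following demonstration."""
    prompt = PROMPT_TEMPLATE.format(instruction=example["instruction"])
    full_text = prompt + example["output"]
    return prompt, full_text

# Hypothetical demonstration, similar in shape to machine-generated instruction data
example = {
    "instruction": "Give three tips for staying healthy.",
    "output": "1. Eat a balanced diet.\n2. Exercise regularly.\n3. Get enough sleep.",
}

prompt, full_text = build_training_text(example)
print(full_text)
```

During fine-tuning the model is trained to continue `prompt` with the demonstrated response; a common choice is to compute the loss only on the response part.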

#### Preliminary Evaluation
- **Method**: Human evaluation on self-instruct evaluation set
- **Comparison**: Blind pairwise comparison between text-davinci-003 & Alpaca 7B
- **Results**: Alpaca and text-davinci-003 had very similar performance
- **Demo**: Interactive testing of Alpaca model



<img src="./images/alpaca.jpeg" width="800" align="center"/>

<img src="./images/border.jpg" height="10" width="1500" align="center"/>

## **Recap: Language Models & LLMs**
- **Highlight**: Advanced instruction-following models.
- **Examples**:
    - Alpaca 7B
    - ChatGPT 
    - Claude

- **Challenges**: 
    - Generation of **false information**
    - Propagation of **stereotypes**
    - Toxic language generation

- **Opportunities**: 
    - Seamless human-computer interaction
    - Enhanced content generation
    - Automation of complex text-based tasks


<img src="./images/border.jpg" height="10" width="1500" align="center"/>

## **Going back to Generative AI**
- **Definition**: AI models designed to **produce** new content.
- **Scope**: Beyond text! Think images, videos, music, designs.
- **Objective**: Generate content nearly indistinguishable from what humans can produce.

#### Key Players in Generative AI
1. **Generative Adversarial Networks (GANs)**
2. **Variational Autoencoders (VAEs)**


## **GANs**
- **Concept**: Duel between two networks.
    - **Generator**: Crafts fake data.
    - **Discriminator**: Sifts real from fake.
- **Training Dynamics**: 
    - Generator crafts better fakes.
    - Discriminator refines its discernment.
- **Applications**:
    - Art creation (e.g., DeepArt)
    - Image super-resolution (e.g., SRGAN)
    - Generating faces (e.g., NVIDIA's StyleGAN)

## **VAEs**
- **Philosophy**: Compress data, then rebuild it.
- **Key Mechanism**: Introduces randomness during compression.
- **Benefits**:
    - Structured latent space for generation.
    - More consistent generation than GANs.
- **Applications**:
    - Image denoising
    - Content interpolation (e.g., morphing one image into another)
    - Generating art with unique styles

<img src="./images/border.jpg" height="10" width="1500" align="center"/>

## **GANs vs VAEs: Quick Comparison**
- **Training Stability**:
    - GANs can be trickier to train due to the dueling nature.
    - VAEs generally offer stable training but might not achieve the same level of detail as GANs.
- **Generated Data Quality**:
    - GANs often produce sharper images.
    - VAEs might produce smoother or blurrier images.
- **Applications**:
    - GANs dominate in tasks requiring high-resolution outputs.
    - VAEs shine where a structured latent space or consistent output is vital.

<img src="./images/border.jpg" height="10" width="1500" align="center"/>

## **GANs**

- Traditional Machine Learning
  - Examines a complex input, like an **image**.
  - Produces a simple output, like a **label** ("cat").

- Generative Models
  - Takes a small piece of input, perhaps a few random numbers.
  - Produces a complex output, like an image of a **realistic-looking face**.

- Generative Adversarial Network (GAN)
  - An effective type of **generative model**.
  - Introduced in 2014 by Ian Goodfellow and colleagues.

- Why Use GANs?
  1. **Intellectual Challenge**: Crafting systems that can generate realistic data.
  2. **Practical Applications**: From creating art to enhancing blurry images.

<img src="./images/border.jpg" height="10" width="1500" align="center"/>

## **How does a GAN work?**

- GANs turn the seemingly impossible goal into reality.
- They utilize two key tricks.

- Using Randomness as an Ingredient

    1. **Variety**: Avoids producing the same output each time.
    2. **Mathematical Framework**: Translates image generation into probabilities.
    3. **Probability Distribution on Images**: Determines which images are likely to be faces.

- Neural Networks and Image Generation
  - Modeling a function on a high-dimensional space.
  - Neural networks excel at this kind of problem.

- The Big Insight: Contest!

    - The **"adversarial"** part of GAN.
    - Two competing networks:
      1. **Generator**: Creates random synthetic outputs.
      2. **Discriminator**: Distinguishes real outputs from synthetic ones.
    
- The Adversarial Duel
  - The two networks face off.
  - They improve together, aiming for a generator that creates realistic outputs.
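
The contest between the two networks can be expressed through their loss functions. The sketch below evaluates the standard binary cross-entropy discriminator loss and the commonly used non-saturating generator loss for given discriminator outputs. The probabilities are made-up numbers for illustration, and no actual networks are trained here.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Discriminator wants D(real) close to 1 and D(fake) close to 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    """Non-saturating generator loss: the generator wants D(fake) close to 1."""
    return -math.log(d_fake)

# Hypothetical discriminator outputs (probability that a sample is real)
# Early in training: the discriminator easily spots the fakes
early_d = discriminator_loss(d_real=0.9, d_fake=0.1)
early_g = generator_loss(d_fake=0.1)

# Later: the discriminator can no longer tell real from fake
late_d = discriminator_loss(d_real=0.5, d_fake=0.5)
late_g = generator_loss(d_fake=0.5)

print(f"early: D loss={early_d:.3f}, G loss={early_g:.3f}")
print(f"late:  D loss={late_d:.3f}, G loss={late_g:.3f}")
```

As the generator improves, its loss falls while the discriminator's rises; a discriminator stuck at 0.5 is reduced to guessing.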

<img src="./images/border.jpg" height="10" width="1500" align="center"/>

<img src="./images/GANLab.png" width="500" align="center"/>

- https://poloclub.github.io/ganlab/
  - GANs are complex systems.
  - This visualization simplifies GAN mechanics for clarity.


- Simplified GAN Mechanics

  - Instead of realistic images, the focus is on a **distribution of points in 2D**.
  - Easier to understand the mechanics in plain old (x,y) space.

- Instructions
  - Choose a probability distribution for the GAN to learn.
  - Visualized as a **set of data samples**.



<img src="./images/border.jpg" height="10" width="1500" align="center"/>


# Variational Autoencoder (VAE) to Generate Images


- What is an Autoencoder?
  - Transforms input to a **lower dimensional space** (encoding step).
  - Reconstructs input from the lower-dimensional representation (decoding step).

- Visual Representation

<img src="./images/VAE1.png" width="500" align="center"/>


- To generate new images using a VAE, input random vectors to the decoder.

<img src="./images/VAE2.png" width="500" align="center"/>


- VAE vs Regular Autoencoders

  1. Imposes a **probability distribution** on the latent space.
  2. Learns the distribution so the outputs match the observed data.
  3. Latent outputs are randomly sampled from the distribution learned by the encoder.
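
Two pieces make the probabilistic latent space work in practice: sampling a latent vector from the encoder's distribution (the reparameterization trick), and a KL-divergence term that pulls that distribution toward a standard normal. The sketch below shows both for a diagonal Gaussian; the encoder outputs `mu` and `log_var` are made-up numbers, and the function names are assumptions for illustration.

```python
import math
import random

def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, log_var)
    ]

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions."""
    return -0.5 * sum(
        1 + lv - m ** 2 - math.exp(lv) for m, lv in zip(mu, log_var)
    )

rng = random.Random(0)

# Hypothetical encoder outputs for one input image (2-dimensional latent space)
mu = [1.0, 0.0]
log_var = [0.0, 0.0]

z = sample_latent(mu, log_var, rng)          # latent vector passed to the decoder
kl = kl_to_standard_normal(mu, log_var)      # regularization term in the VAE loss

print("sampled latent vector:", [round(v, 3) for v in z])
print("KL divergence:", kl)
```

Because training keeps the latent distribution close to a standard normal, new images can be generated by feeding the decoder vectors drawn from N(0, 1).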

<img src="./images/border.jpg" height="10" width="1500" align="center"/>