# **Generative AI Personal Notes**

# **Introduction to Generative AI**

## **Definition of Generative AI**
Generative AI is basically any kind of AI that can *create* new content rather than just analyzing existing data. This could be text, images, music, code, or even video. Unlike traditional AI that mostly classifies, predicts, or detects patterns, generative AI tries to *imitate* the process of human creativity. Think of it as teaching a machine to “imagine” something new based on what it has learned. The cool part is that it doesn’t just copy what it’s seen—it generates something original, though often influenced by the training data.  

## **Brief History and Evolution**
Generative AI isn’t exactly brand new, but it has exploded in the past decade. Some key milestones:  
- Early work: Back in the 1990s and 2000s, we had basic probabilistic models and some neural networks that could generate simple patterns.  
- GANs: In 2014, Generative Adversarial Networks (GANs) came out, which was a huge game-changer for generating realistic images. The “adversarial” part means two networks compete—the generator tries to create content, the discriminator tries to spot fakes. This tug-of-war leads to surprisingly realistic outputs.  
- Transformers & LLMs: Around 2017, the Transformer architecture arrived, making it much easier to handle sequential data like text. That gave rise to models like GPT, which can write paragraphs, answer questions, or even code.  
- Multi-modal models: More recently, AI started handling multiple types of data at once—images, text, audio—blurring the lines between creative fields.  

So basically, the field went from basic pattern generation → convincing images → sophisticated language → multi-modal creativity. The pace has been insane, and every year there’s something new that feels “sci-fi level.”  

## **Main Application Areas**
Generative AI is everywhere now, even if you don’t notice it. Some big categories:  
- **Text & Language**: Chatbots, content creation, summarization, translation, coding assistants. Basically anything involving language can be augmented.  
- **Images & Art**: AI-generated art, photo editing, meme creation, even fashion design. Tools like DALL·E or Midjourney fall here.  
- **Audio & Music**: AI can compose music, create realistic voiceovers, or generate sound effects.  
- **Video & Animation**: Generating short clips, animating characters, deepfake-style video editing. Still harder than text or images, but improving fast.  
- **Science & Research**: Drug discovery, protein folding, chemical simulation, generating hypotheses or datasets.  
- **Business & Productivity**: Automated reports, personalized marketing content, document drafting, and customer support.  

In short, any field where creativity, simulation, or content production is important is being touched by generative AI. It’s not perfect, but the rate of improvement is crazy.  

# **Technical Fundamentals**
## **Types of Generative Models**
### **Autoregressive Models**
### **Variational Autoencoders (VAE)**
### **Generative Adversarial Networks (GAN)**
### **Diffusion Models**
- For text-to-image and multimodal generation
- Produces high-quality outputs by iteratively denoising

---

## **Fine-tuning**

Fine-tuning is the process of taking a pretrained Large Language Model (LLM) and adapting it to a **specific task or domain**. While pretraining gives the model a broad understanding of language, fine-tuning focuses it on particular use cases, improving accuracy and relevance.

---

### **Why Fine-tuning is Needed**
Pretrained LLMs are general-purpose—they know grammar, syntax, and general facts, but they may not perform optimally for specialized tasks like legal document analysis, medical question answering, or customer support automation. Fine-tuning helps the model learn task-specific patterns and terminology.

---

### **Methods of Fine-tuning**

1. **Full Fine-tuning**  
   - All model parameters are updated using task-specific data.  
   - Effective for specialized domains but computationally expensive, especially for very large models.  
   - Requires **high-end GPUs or TPUs** for training, and the process can take hours or days depending on model size.  
   - **Example:** Training GPT on a legal corpus to generate legal contracts.

2. **Parameter-Efficient Fine-tuning**  
   - Only a small fraction of the parameters is trained, using methods such as **adapters**, **LoRA (Low-Rank Adaptation)**, or **prefix tuning**. These typically freeze the base weights and train a small set of added parameters.  
   - Reduces compute costs and memory requirements while maintaining strong performance.  
   - **Example:** Fine-tuning a 70B-parameter LLaMA model for medical question answering without retraining all parameters.

3. **Reinforcement Learning from Human Feedback (RLHF)**  
   - Combines supervised fine-tuning with human preferences.  
   - The model learns to generate outputs aligned with desired behaviors, improving safety, helpfulness, and alignment.  
   - **Example:** ChatGPT is fine-tuned using RLHF to provide more useful, polite, and factual responses.
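
To make the parameter-efficient idea concrete, here is a minimal sketch of the LoRA update for a single weight matrix, in plain Python. It assumes the usual formulation `W_eff = W + (alpha / r) * B @ A` with `B` initialized to zero; the function names, matrix sizes, and `alpha` value are illustrative choices, not taken from any particular library.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank."""
    r = len(a)                      # A has shape (r, d_in)
    delta = matmul(b, a)            # (d_out, r) @ (r, d_in) -> (d_out, d_in)
    scale = alpha / r
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def trainable_params(d_out, d_in, r):
    """Trainable parameter counts for one layer: full fine-tuning vs. LoRA."""
    return {"full": d_out * d_in, "lora": r * (d_in + d_out)}


random.seed(0)
d_out, d_in, r = 4, 6, 2
w = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]   # frozen base weights
a = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]    # trainable, small random init
b = [[0.0] * r for _ in range(d_out)]                                    # trainable, zero init

# With B = 0 the adapter contributes nothing, so training starts from the base model.
w_eff = lora_effective_weight(w, a, b, alpha=4)

# For a 4096 x 4096 layer with rank 8, LoRA trains far fewer parameters.
counts = trainable_params(4096, 4096, r=8)
print(counts)
```

Only `a` and `b` would receive gradient updates during training; `w` stays fixed, which is where the memory and compute savings come from.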

---

### **Challenges and Considerations**

- **Compute Requirements**: Full fine-tuning of large models can be extremely resource-intensive, requiring **powerful GPUs or TPUs**, large memory, and long training times.  
- **Data Sensitivity**: Fine-tuned models are only as good as the data they were trained on. If the task or domain data changes, the model may need to be **re-fine-tuned**, which can be costly and time-consuming.  
- **Maintenance Overhead**: Continually updating or expanding the model for new data or tasks can become a recurring effort.

---

### **Steps in Fine-tuning**
1. **Dataset Preparation**: Curate a high-quality dataset with examples relevant to the task.  
2. **Training**: Update model parameters using the dataset, adjusting learning rates and regularization.  
3. **Validation**: Evaluate performance on unseen examples to prevent overfitting.  
4. **Deployment**: Integrate the fine-tuned model into applications for inference.

---

# **Generative Language Models**

# **LLM Applications**
## **Chatbots and Virtual Assistants**
## **Content Creation**
## **Code Generation**
## **Summarization and Translation**
## **Data Analysis and Insights**
## **Prompt Engineering**
### **Basics of Prompt Design**
### **Temperature and Creativity**
## **Decoding / Generation Strategies**
### **Greedy Search**
### **Beam Search**
### **Top-k Sampling**
### **Top-p (Nucleus Sampling)**
### **Frequency and Presence Penalties**
### **Max Tokens**

## **Prompting and Model Control**
- Prompt engineering
- Template placeholders and runtime variables
- Emphasizing specific vocabulary vs general knowledge
## **Text Generation Parameters**
temperature, top-p, max tokens, stop sequences, penalties
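
As a rough illustration of how temperature, top-k, and top-p interact when picking the next token, here is a small self-contained sketch. It works on a toy list of logits; the function names and the convention that `top_k=0` means "no top-k limit" are assumptions for this example, and real inference stacks may order or combine the filters differently.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_top_k_top_p(probs, top_k=0, top_p=1.0):
    """Keep the top_k most likely tokens, then the smallest prefix whose
    cumulative probability reaches top_p. Returns {index: renormalized prob}."""
    ranked = sorted(enumerate(probs), key=lambda kv: kv[1], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]
    kept, cumulative = [], 0.0
    for idx, p in ranked:
        kept.append((idx, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {idx: p / total for idx, p in kept}


def sample_next_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=random):
    """Pick one token index. Temperature 0 is treated as greedy search."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax(logits, temperature)
    candidates = filter_top_k_top_p(probs, top_k, top_p)
    indices = list(candidates)
    weights = [candidates[i] for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]


logits = [2.0, 1.0, 0.1, -1.0]           # toy scores for a 4-token vocabulary
greedy = sample_next_token(logits, temperature=0)
sampled = sample_next_token(logits, temperature=0.8, top_k=3, top_p=0.9)
print(greedy, sampled)
```

Max tokens and stop sequences are not shown here: they only decide when the generation loop that repeatedly calls something like `sample_next_token` should stop.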

---

## **Retrieval-Augmented Generation (RAG)**

Retrieval-Augmented Generation (RAG) is a method that combines the generative capabilities of Large Language Models (LLMs) with external knowledge retrieval. This allows the model to provide answers that are more accurate, grounded, and relevant, using up-to-date or domain-specific information at inference time. Unlike standard LLM outputs, which rely solely on what the model learned during pretraining, RAG dynamically incorporates external knowledge without retraining.

---

### **How RAG Differs from Prompt Engineering and Fine-Tuning**

- **Prompt Engineering**: Adjusts the input text to guide the model’s responses, but the model is still limited to its pretrained knowledge.  
- **Fine-Tuning**: Updates model parameters on task-specific data, improving performance for specialized tasks, but requires compute resources and retraining.  
- **RAG**: Queries an external knowledge base in real time, feeding the retrieved information into the model. This makes the system flexible, scalable, and able to use current knowledge without retraining.

---

### **Core Components of RAG**

1. **Indexing**  
   - Documents or knowledge sources are preprocessed and organized into an index for efficient retrieval.  
   - The index acts as a map of the content, storing embeddings, metadata, or keywords for quick access.  
   - **Example:** Research papers, FAQs, or company manuals organized for fast lookup.

2. **Embedding Representations**  
   - Queries and document chunks are converted into numerical vectors that capture their meaning.  
   - Similarity between embeddings identifies which documents are most relevant to the query.  
   - **Example:** Using cosine similarity to retrieve the five most relevant articles for a user question.

3. **Vector Storage & Similarity Search**  
   - Embeddings are stored in a vector database optimized for fast semantic retrieval.  
   - Similarity search compares the query embedding against stored embeddings to find the closest matches.  
   - Popular tools include **FAISS**, **Milvus**, and **Weaviate**.  
   - **Example:** A query about “climate change effects” retrieves relevant papers even if the wording differs from the documents.

4. **Chunking**  
   - Long documents are split into smaller pieces, or “chunks,” to handle token limits and improve retrieval accuracy.  
   - Overlapping chunks can be used to maintain context.  
   - Each chunk is embedded and stored in the vector database.  
   - **Example:** A 200-page report is divided into 500 smaller chunks; only the most relevant ones are retrieved for answering a question.

5. **Retrieval Pipeline**  
   - The system uses similarity search on the vector database to retrieve the most relevant chunks for a given query.  
   - Retrieved content is passed to the LLM as context for generating a response.  
   - **Example:** A query about “X disease symptoms” pulls relevant research abstracts before the model generates an answer.

6. **Generation Step**  
   - The LLM produces a response combining its internal knowledge and the retrieved context.  
   - This approach reduces hallucinations and improves factual accuracy.
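
The components above can be sketched end to end in a few dozen lines. This is a toy version under loud assumptions: the "embedding" is just a bag-of-words count (a real system would call an embedding model), the index is a Python list instead of a vector database, and the final LLM call is left out so that the code only builds the prompt. All names and sample documents are made up for illustration.

```python
import math
import re
from collections import Counter


def chunk_words(text, chunk_size=40, overlap=10):
    """Split text into word-based chunks that overlap by `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents, chunk_size=40, overlap=10):
    """Indexing: chunk every document and store each chunk with its vector."""
    index = []
    for doc_id, text in documents.items():
        for chunk in chunk_words(text, chunk_size, overlap):
            index.append({"doc_id": doc_id, "text": chunk, "vector": embed(chunk)})
    return index


def retrieve(index, query, top_k=2):
    """Retrieval: rank chunks by similarity to the query embedding."""
    q = embed(query)
    ranked = sorted(index, key=lambda e: cosine_similarity(q, e["vector"]), reverse=True)
    return ranked[:top_k]


def build_prompt(query, retrieved):
    """Generation input: retrieved chunks are passed to the LLM as context."""
    context = "\n".join(f"[{e['doc_id']}] {e['text']}" for e in retrieved)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


documents = {
    "faq": "Password resets are handled from the account settings page.",
    "manual": "The device battery lasts about ten hours. Charge it with the supplied cable.",
}
index = build_index(documents)
question = "How long does the battery last?"
hits = retrieve(index, question, top_k=1)
prompt = build_prompt(question, hits)
print(prompt)
```

Swapping `embed` for a real embedding model and the list for a vector store changes the quality of retrieval, but the flow of chunk, embed, search, and prompt stays the same.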

---

### **Advantages of RAG**

- **Grounded Responses**: Outputs are based on external sources, improving accuracy.  
- **Domain Adaptability**: Works well for specialized topics without retraining the model.  
- **Reduced Hallucinations**: By using real documents, the model is less likely to invent information.  
- **Dynamic Knowledge Updating**: Knowledge bases can be updated independently of the model.  
- **Resource Efficiency**: Avoids full model retraining, saving time and compute.  
- **Semantic Search**: Retrieves information based on meaning rather than exact words, improving relevance.


---

## **LLM Frameworks and Tools**
LangChain, LlamaIndex, Haystack
## **Advanced Prompting Techniques**
- Preamble/context for style and tone
- Self-consistency checks and iterative reasoning
## **Model Safety and Bias Mitigation**
- Groundedness / factuality
- Prompt injection mitigation
- Bias sources and mitigation strategies
- Fact-checking and auditing pipelines
## **Memory and State Handling**
- Session management / context retention
- Session timeout and ephemeral context
## **Multi-modal LLMs**
- Text + Image
- Text + Audio
- Text + Video

---

## **Bias and Fairness**
## **Hallucinations and Incorrect Outputs**

# **Generative AI for Images and Multimedia**
## **Image Generation Models (DALL·E, Stable Diffusion, etc.)**
## **Audio and Music Generation**
## **Video Generation**

---

# **Tools and Platforms**
## **Libraries and Frameworks (Hugging Face, OpenAI, TensorFlow, etc.)**
## **Deployment Tools and APIs**
- Vector stores and database integration
- On-demand vs batch inference
## **No-code / Low-code Interfaces**


# **OCI Generative AI**

**Oracle Cloud Infrastructure (OCI)** is a comprehensive cloud platform that provides computing, storage, networking, and data management services for enterprises and developers. It offers a range of infrastructure-as-a-service (IaaS), platform-as-a-service (PaaS), and software-as-a-service (SaaS) solutions, designed to support high-performance applications and scalable workloads.  

OCI emphasizes security, reliability, and performance, with features such as dedicated compute clusters, encrypted storage, and advanced networking options. The platform also includes specialized services for artificial intelligence and machine learning, allowing developers to build, train, and deploy AI models with integrated tools and resources.

---

## OCI Generative AI Service

OCI Generative AI Service provides pre-trained foundation models, primarily chat models that can be used in on-demand serving mode. Model endpoints act as the designated points where user requests are sent and responses are received. When using on-demand inference, customers are charged per character processed, which makes it suitable for applications with variable or moderate usage.  

Session-enabled endpoints allow the chat context to be retained during the session, helping maintain continuity in conversations. These sessions automatically end after a specified timeout period.  

The service also offers a citation option, which displays the source details for each chat response, providing transparency and traceability. Content moderation can be enabled during endpoint creation to ensure that generated content complies with safety and policy requirements.

### Types of Pre-trained Models
- **Chat models:** For interactive text-based conversations.
- **Embedding models:** For semantic search, similarity, or vector-based retrieval.
- **Image generation models:** For text-to-image tasks (if enabled in OCI).  
- **Other specialized models:** Depending on service updates, e.g., summarization or code generation.
---

## Fine-Tuning and Custom Models

OCI allows fine-tuning of AI models using TCU methods, which are particularly suitable for large datasets ranging from hundreds of thousands to millions of samples.  

**Training Compute Unit (TCU)** is Oracle's unit of measurement for the computational resources allocated to model training and fine-tuning. Each TCU represents a portion of GPU or TPU resources, memory, and compute capacity, allowing predictable scaling of training workloads. Using TCUs, developers can estimate training time, optimize resource usage, and manage costs effectively.  

When preparing a custom dataset for fine-tuning, it is important to organize the data into separate files for training and validation to ensure proper evaluation and prevent data leakage.
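
A minimal sketch of that split is shown below. It assumes the training data is JSON Lines with one `prompt`/`completion` pair per line, which is my understanding of the format OCI expects and should be checked against the current documentation. The function names, the 80/20 split, and the file names in the usage note are illustrative.

```python
import json
import random


def split_examples(examples, validation_fraction=0.2, seed=42):
    """Deduplicate, shuffle, and split examples into (train, validation).

    Dropping exact duplicates first keeps the same pair from landing in both
    files, which is one simple source of data leakage.
    """
    unique = list({(e["prompt"], e["completion"]): e for e in examples}.values())
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_val = max(1, int(len(unique) * validation_fraction))
    return unique[n_val:], unique[:n_val]


def to_jsonl(examples):
    """Serialize examples as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(e, ensure_ascii=False) for e in examples) + "\n"


def write_jsonl(path, examples):
    """Write examples to a .jsonl file (not called here; see usage note)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(to_jsonl(examples))


# Hypothetical sample data; note the deliberate duplicate of the first pair.
examples = [{"prompt": f"question {i}", "completion": f"answer {i}"} for i in range(9)]
examples.append({"prompt": "question 0", "completion": "answer 0"})

train, validation = split_examples(examples)
print(len(train), len(validation))
```

Calling `write_jsonl("train.jsonl", train)` and `write_jsonl("validation.jsonl", validation)` would then produce the two separate files to upload.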

Once a model has been fine-tuned, it is stored securely in OCI Object Storage, with encryption enabled by default to protect customer data. Dedicated AI clusters provide predictable pricing that does not fluctuate with demand, and they help minimize GPU memory overhead by sharing base model weights across multiple fine-tuned models. These clusters also enable the deployment of several fine-tuned models within a single infrastructure, improving efficiency and resource utilization.

---

## Prompting and RAG in OCI

In OCI, deciding between prompting and training depends on the goal. Prompting is best used when you want the model to emphasize specific vocabulary or phrases, such as product names, during responses. Fine-tuning, on the other hand, is more suitable for improving the model's understanding of broader domain-specific terminology across multiple tasks.  

**Retrieval-Augmented Generation (RAG)** offers a way to enhance model responses without the costs of full model training. Unlike traditional fine-tuning, RAG does not modify the model weights; instead, it augments generation with relevant information retrieved from external documents. This approach reduces hallucinations and ensures that responses are grounded in factual sources.  

The typical RAG pipeline consists of three main stages:

1. **Document Embedding:** External documents are converted into vector representations (embeddings) that capture their semantic meaning.  
2. **Indexing and Retrieval:** The embeddings are stored in a vector database or index. When a query is issued, the system retrieves the most relevant documents by comparing the query embedding with the stored vectors.  
3. **Generation:** The LLM uses the retrieved documents as context to generate responses, combining its language understanding with factual information from the sources.  

**Key points about RAG:**
- It simplifies setup compared to full fine-tuning, avoiding the need for large-scale training datasets and TCU usage.  
- Operational complexity still exists: the retrieval and embedding pipeline must be carefully configured, including document preprocessing, vector indexing, and relevance tuning.  
- RAG works well with prompting: prompts can still guide style, emphasis, or vocabulary, while retrieval provides grounded content.  
- This method allows fine control over sources used, enabling auditing, bias mitigation, and transparent content generation.  

Overall, RAG strikes a balance between **flexibility, grounding, and practical implementation**, making it a preferred choice when you want reliable, fact-based outputs without the overhead of fine-tuning the model itself.

---

## Vector Search and Database Integration

Before executing any code related to vector search in OCI, embeddings must first be created and stored in the database. These embeddings serve as the foundation for building vector stores, which allow the model to efficiently search and retrieve relevant information from large datasets. Using vector store code makes it possible to create these stores directly from database tables of embeddings, simplifying integration with Oracle Database AI services.  

### Vector Store Creation and Management
- **Creating a vector store:** Embeddings generated from documents or data are indexed in a vector database (e.g., OCI Object Storage + indexing service) to enable fast similarity search.  
- **Updating and maintaining:** When new data is added or documents change, embeddings need to be updated in the vector store to maintain search accuracy.  
- **Querying:** When a user query is issued, it is converted into an embedding and compared against the stored vectors to retrieve the most relevant documents.

### Importance for RAG
- Vector stores are a critical component of **Retrieval-Augmented Generation (RAG)** pipelines. The retrieved vectors provide the LLM with grounded, contextually relevant information to generate accurate responses.  
- Proper vector store management ensures minimal retrieval errors, reduces hallucinations, and allows fine control over the documents and sources used during generation.

When integrating vector search functionality with an Oracle Database:
- The **`score` field** in the vector search results represents the distance between the query vector and the corresponding body vector. Because it is a distance, a smaller value means the stored text is closer to the query.  
- **Network configuration:** Subnet ingress rules must be correctly set, and the source type for the Oracle database should be CIDR to ensure secure access.
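
Since the score is a distance, results are ranked in ascending order. The sketch below illustrates this with cosine distance on tiny hand-made vectors. The row layout (`docid`, `body`, `vector`) and the choice of cosine distance are assumptions for illustration; the metric actually used depends on how the database search is configured.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity: 0 means same direction, larger means less similar."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_u * norm_v)


def rank_by_score(query_vector, rows, top_k=3):
    """Attach a distance-based score to each row and return the closest rows first."""
    results = [
        {
            "docid": row["docid"],
            "body": row["body"],
            "score": cosine_distance(query_vector, row["vector"]),
        }
        for row in rows
    ]
    return sorted(results, key=lambda r: r["score"])[:top_k]


# Hypothetical table rows with 2-dimensional embeddings.
rows = [
    {"docid": "a", "body": "exact topic match", "vector": [1.0, 0.0]},
    {"docid": "b", "body": "unrelated topic", "vector": [0.0, 1.0]},
    {"docid": "c", "body": "partially related", "vector": [1.0, 1.0]},
]
results = rank_by_score([1.0, 0.0], rows)
for r in results:
    print(r["docid"], round(r["score"], 4))
```

The closest row comes first, so code that consumes the results should treat low scores as the best matches.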

---

## Knowledge Bases

OCI Generative AI Agents can connect to knowledge bases, which are structured collections of documents or data used to provide contextually accurate responses.  

- **Ingestion:** Documents are ingested into the knowledge base and optionally converted into embeddings for efficient retrieval.  
- **Failure handling:** If an ingestion job fails partially, only the failed and updated files are re-ingested.  
- **Deletion:** Deleting a knowledge base is permanent and cannot be undone.  
- **Integration with vector search:** Knowledge bases can leverage embeddings and vector stores to enhance retrieval, which is critical for RAG pipelines.  
- **Endpoint connection:** Agents can query knowledge bases through model endpoints to provide grounded, fact-based responses.

---

## Session Management in Agents

OCI Generative AI Agents include session management features that track the conversational history, including both user prompts and model responses. This allows the agent to maintain context across interactions, which is essential for creating coherent and continuous conversations.  

When session-enabled endpoints are used, the context is retained for the duration of the session. However, sessions have a timeout, after which the context is automatically cleared.
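
The behaviour described here (history kept per session, cleared after an idle timeout) can be modelled with a small in-memory store. This is only a conceptual sketch and not how OCI implements sessions; the class name, the one-hour default timeout, and the injectable clock are assumptions made so the example is easy to run and test.

```python
import time


class SessionStore:
    """Keeps per-session chat history and drops it after an idle timeout."""

    def __init__(self, timeout_seconds=3600, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self._clock = clock          # injectable so the timeout can be simulated
        self._sessions = {}          # session_id -> {"history": [...], "last_seen": float}

    def _expire_if_idle(self, session_id):
        session = self._sessions.get(session_id)
        if session is not None and self._clock() - session["last_seen"] > self.timeout_seconds:
            del self._sessions[session_id]

    def add_turn(self, session_id, prompt, response):
        """Record one user prompt and the agent's response."""
        self._expire_if_idle(session_id)
        session = self._sessions.setdefault(
            session_id, {"history": [], "last_seen": self._clock()}
        )
        session["history"].append({"user": prompt, "agent": response})
        session["last_seen"] = self._clock()

    def history(self, session_id):
        """Return the retained context, or an empty list if the session expired."""
        self._expire_if_idle(session_id)
        session = self._sessions.get(session_id)
        return list(session["history"]) if session else []


# Simulated clock: a mutable cell we advance by hand instead of sleeping.
now = [0.0]
store = SessionStore(timeout_seconds=60, clock=lambda: now[0])

store.add_turn("s1", "What is RAG?", "Retrieval-augmented generation.")
now[0] = 30.0
store.add_turn("s1", "Does it change weights?", "No, it adds retrieved context.")
turns_before_timeout = len(store.history("s1"))

now[0] = 200.0                        # idle for longer than the timeout
turns_after_timeout = len(store.history("s1"))
print(turns_before_timeout, turns_after_timeout)
```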

---

## Configuration and Code

Configuring OCI Generative AI services requires understanding key code settings as well as operational features of endpoints and session handling.

- **OCI Configuration File:**  
  The line `oci.config.from_file()` loads OCI configuration details from a local file, allowing the system to authenticate and connect to the cloud environment.

- **Serving Mode:**  
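  For chat models, the serving mode is set on the chat request details. In the Python SDK this looks like `chat_detail.serving_mode = oci.generative_ai_inference.models.OnDemandServingMode(model_id=...)`. It ensures that the model operates in on-demand inference mode, generating responses only when requested and providing flexibility in resource usage.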

- **Embeddings Truncation:**  
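  When working with embeddings, setting `truncate` to `NONE` on the embed request (`embed_text_detail.truncate = "NONE"` in the Python SDK) disables truncation, so the full input text is used for generating embeddings. With this setting, an input that exceeds the model's token limit is rejected with an error instead of being shortened, so long texts should be chunked first to preserve context and information for vector search or retrieval tasks.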

- **Model Endpoints:**  
  Endpoints are the designated points where user requests are sent and model responses are received.  
  - **Session-enabled endpoints:** Retain the conversation context for the duration of the session.  
  - **Session timeout:** After the specified timeout, the context is cleared automatically.  
  - **Immutable session option:** Once enabled, the session option cannot be changed.

- **Citation Option:**  
  Enabling citation displays source details for each chat response, improving transparency and traceability of the model's outputs.

- **Content Moderation:**  
  Content moderation ensures that generated outputs comply with safety policies and regulations. It must be enabled when creating the endpoint.

This configuration and operational setup ensures that developers can manage authentication, model behavior, session context, and output transparency effectively while using OCI Generative AI.
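
Putting the three code settings together, a hedged sketch with the OCI Python SDK might look like the following. It assumes the `oci` package is installed and a valid `~/.oci/config` exists. The helper function names are my own, and the compartment OCID, model ID, and service endpoint in the usage note are placeholders to replace. The request is built with Cohere-style chat fields, which is one of the request formats the SDK offers.

```python
def make_client(service_endpoint, profile="DEFAULT"):
    """Load credentials from the OCI config file and create an inference client."""
    import oci  # imported lazily so this module loads even without the SDK

    config = oci.config.from_file("~/.oci/config", profile)
    return oci.generative_ai_inference.GenerativeAiInferenceClient(
        config=config, service_endpoint=service_endpoint
    )


def build_chat_details(compartment_id, model_id, message, max_tokens=600, temperature=0.3):
    """Build a chat request that uses on-demand serving."""
    import oci

    models = oci.generative_ai_inference.models
    chat_request = models.CohereChatRequest()
    chat_request.message = message
    chat_request.max_tokens = max_tokens
    chat_request.temperature = temperature

    details = models.ChatDetails()
    details.compartment_id = compartment_id
    details.serving_mode = models.OnDemandServingMode(model_id=model_id)
    details.chat_request = chat_request
    return details


def build_embed_details(compartment_id, model_id, inputs, truncate="NONE"):
    """Build an embedding request; truncate="NONE" means inputs are never shortened."""
    import oci

    models = oci.generative_ai_inference.models
    details = models.EmbedTextDetails()
    details.compartment_id = compartment_id
    details.serving_mode = models.OnDemandServingMode(model_id=model_id)
    details.inputs = inputs
    details.truncate = truncate
    return details


# Usage (placeholders, not real identifiers):
# client = make_client("https://inference.generativeai.<region>.oci.oraclecloud.com")
# chat = client.chat(build_chat_details("<compartment-ocid>", "<chat-model-id>", "Hello"))
# print(chat.data)
# emb = client.embed_text(build_embed_details("<compartment-ocid>", "<embed-model-id>", ["some text"]))
# print(len(emb.data.embeddings))
```

The session, citation, and content moderation options in the list above are endpoint-level settings and are not shown in this sketch.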

---

## Text Processing and Embeddings

When working with text in OCI Generative AI, processing it properly is essential for generating accurate embeddings and enabling efficient retrieval. Text is typically divided into paragraphs, which are then split into sentences and further broken down into tokens until the desired chunk size is reached. This ensures that each piece of text can be processed by the model without exceeding token limits.  

Handling token limits is particularly important when dealing with long documents, as exceeding the model's capacity can result in incomplete embeddings. In cases where only certain parts of a document are most relevant, a truncation strategy can be applied. For example, if the most important information is at the beginning, the end of the text may be truncated to fit within the token limit while preserving key content.
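
Both ideas, hierarchical splitting and truncation, can be sketched in a few lines. This is a deliberately simple version: tokens are approximated by whitespace-separated words (a real tokenizer would count differently), sentence detection is a basic regex, and the function names and the `"START"`/`"END"` labels are assumptions for this example.

```python
import re


def count_tokens(text):
    """Rough token count: whitespace-separated words (an approximation)."""
    return len(text.split())


def split_to_chunks(text, max_tokens):
    """Split by paragraph first; only break a paragraph into sentences, and a
    sentence into word windows, when it does not fit within max_tokens."""
    chunks = []
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        if not paragraph.strip():
            continue
        if count_tokens(paragraph) <= max_tokens:
            chunks.append(" ".join(paragraph.split()))
            continue
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph.strip()):
            words = sentence.split()
            for start in range(0, len(words), max_tokens):
                chunks.append(" ".join(words[start:start + max_tokens]))
    return chunks


def truncate(text, max_tokens, side="END"):
    """Cut text to max_tokens. side="END" drops the end (keeps the beginning);
    side="START" drops the beginning (keeps the end)."""
    words = text.split()
    if len(words) <= max_tokens:
        return text
    if side == "END":
        return " ".join(words[:max_tokens])
    if side == "START":
        return " ".join(words[-max_tokens:])
    raise ValueError("side must be 'START' or 'END'")


text = (
    "One two three.\n\n"
    "Alpha beta gamma delta. Epsilon zeta eta theta iota kappa."
)
chunks = split_to_chunks(text, max_tokens=4)
print(chunks)
print(truncate("key facts come first then less important detail", 4, side="END"))
```

When the important content sits at the start of a document, truncating the end keeps it, which is the case described above. Chunking avoids losing either end at the cost of producing more vectors to store.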

---

## Multi-modal and Advanced Models

OCI Generative AI supports advanced multi-modal capabilities, allowing models to handle different types of input and output beyond plain text. Diffusion models, for example, are used to generate complex outputs such as converting text descriptions into images. These models iteratively refine the output, producing high-quality and detailed results suitable for applications in design, marketing, and creative content generation.  

The **Cohere Embed V3** model represents an improvement over its predecessors in several ways. It provides higher-dimensional embeddings that capture semantic meaning more accurately, which improves performance in tasks like semantic search, clustering, and retrieval. This enhanced precision allows the model to better understand textual nuances and relationships, making similarity searches and document retrieval more reliable. Compared to previous versions, Cohere Embed V3 is optimized for **speed, accuracy, and generalization**, enabling developers to build applications that require high-quality semantic understanding across large datasets.



# Useful resources

### **YouTube**

*Builders Central*: Retrieval-augmented generation (RAG), Clearly Explained (Why it Matters)
https://www.youtube.com/watch?v=VioF7v8Mikg

### **Websites**

### **Courses**

*Oracle*: OCI Generative AI Professional (2025)
https://mylearn.oracle.com/ou/learning-path/become-an-oci-generative-ai-professional-2025/147863