# Generative AI (Gen AI)
LLMs are large pre-trained models that can be used as-is or fine-tuned on business-specific datasets. They are typically accessed through APIs, since many organizations do not host them locally.
  
### Key Resources:
  - [OpenAI](https://openai.com/) (pay-per-request)
  - [Hugging Face Repository](https://huggingface.co/) (mostly free, access token required)
### Frameworks:
  - https://python.langchain.com/en/latest/index.html
  - https://docs.crewai.com/introduction
  - https://langchain-ai.github.io/langgraph/
  - https://huggingface.co/docs/transformers/transformers_agents
  - https://github.com/jerryjliu/llama_index

### Choose Generative AI Models When:
  1. Low Confidentiality Concerns:
     - Best suited when your data is not highly sensitive. (If some level of confidentiality is required, ensure that third-party providers offer strong encryption and legal agreements such as NDAs or DPAs to safeguard your data.)
  2. Large, Complex Datasets:
     - Ideal for handling massive or varying datasets when there's limited time or expertise to develop and scale a custom deep learning solution in-house.
  3. Multiple Downstream Tasks:
     - Effective when the same dataset must support several business needs, such as named entity recognition (NER), summarization, and classification, where a flexible, general-purpose model streamlines development.
    
### Generative AI - Chat Assistants and Search Engines
  - https://openai.com/
  - https://grok.com/?referrer=website
  - https://chat.deepseek.com/
  - https://gemini.google.com/app
  - https://claude.ai/new
  - https://www.phind.com/
  - https://www.perplexity.ai/  

### AI Coding Assistant
  - Cursor AI
    - AI-powered integrated development environment (IDE) built to boost developer productivity with cutting-edge artificial intelligence features.
    - https://www.cursor.com/en
  - Copilot
    - GitHub Copilot is an AI-driven coding assistant created through a collaboration between GitHub and OpenAI.
    - https://github.com/features/copilot

### Generative AI Models
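#### OpenAI embedding models (text-embedding-ada-002)
- OpenAI offers embedding models that are optimized for converting text into vectors (arrays of numbers). The best-known one is `text-embedding-ada-002`, which is fast and efficient and provides general-purpose embeddings suitable for a range of natural language processing tasks (search, classification, recommendation, etc.). Note that "o1" is the name of OpenAI's reasoning model series; it is not an embedding model, nor an alias or internal code name for one.
- Key features
  - Fast, lightweight embedding generation
  - General-purpose embeddings (used for search, semantic comparison, etc.)
  - Easy to call through the OpenAI API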

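#### Instructor Embedding model
- Instructor Embedding is an instruction-based text embedding model developed by the HKUNLP team. Rather than embedding the text alone, it generates the embedding from the text together with **an instruction that states the purpose of the embedding**. For example, supplying an instruction such as "Represent the query for retrieval:" yields a vector optimized for search.
- Advantages
  - Can generate embeddings tailored to a specific task (search, summarization, classification, etc.)
  - Multi-purpose embeddings through different instructions
  - Published on Hugging Face as hkunlp/instructor-xl, among others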
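
A minimal sketch of how instruction/text pairs might be assembled for an instruction-based embedding model. The helper names `build_pairs` and `embed_with_instructor` are hypothetical. The `INSTRUCTOR(...).encode([[instruction, text]])` call follows the usage shown in the InstructorEmbedding package README, and it is imported lazily so that the pair-building part runs without the package or model weights installed.

```python
from typing import List


def build_pairs(instruction: str, texts: List[str]) -> List[List[str]]:
    """Pair every text with the same task instruction."""
    return [[instruction, text] for text in texts]


def embed_with_instructor(pairs: List[List[str]],
                          model_name: str = "hkunlp/instructor-xl"):
    """Encode [instruction, text] pairs.

    Assumes `pip install InstructorEmbedding` and downloads the model
    weights on first use, so it is not called in this sketch.
    """
    from InstructorEmbedding import INSTRUCTOR  # lazy import: heavy dependency
    model = INSTRUCTOR(model_name)
    return model.encode(pairs)


# Same model, different instruction for queries and for documents.
query_pairs = build_pairs("Represent the query for retrieval:", ["What is RAG?"])
```

Changing only the instruction string is what steers the same model toward retrieval, classification, or another task.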

#### Engage with these LLMs through prompt engineering
- https://learnprompting.org/docs/intro (Recommended)
- https://www.promptingguide.ai
- https://www.cloudskillsboost.google/paths/118
- Prompt Engineering : Design effective prompting techniques that interface with LLMs and other tools

#### Large Language Model (LLM) :
- To access hosted LLMs, you need an access token or API key from the provider you use (Hugging Face, Google Cloud, or OpenAI).
- Obtain a free trial token and/or a paid API key from the following links:
  - Hugging Face: https://huggingface.co/docs/hub/security-tokens
  - OpenAI: https://platform.openai.com/account/api-keys | https://platform.openai.com/settings/organization/api-keys
  - Google Cloud Vertex AI: [Get started for free](https://cloud.google.com/?hl=en)
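
Once a key is issued, it is usually kept out of source code and read from the environment. The sketch below assumes the conventional variable names `OPENAI_API_KEY` and `HF_TOKEN`; the helpers `load_api_keys` and `missing_providers` are hypothetical names for illustration.

```python
import os
from typing import Dict, List, Mapping, Optional

# Conventional variable names (an assumption; adjust to your deployment).
KEY_VARS = {"openai": "OPENAI_API_KEY", "huggingface": "HF_TOKEN"}


def load_api_keys(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the keys that are set, indexed by provider name."""
    source = os.environ if env is None else env
    return {
        provider: source[var]
        for provider, var in KEY_VARS.items()
        if source.get(var)  # skip unset or empty variables
    }


def missing_providers(keys: Mapping[str, str]) -> List[str]:
    """List the providers for which no key was found."""
    return sorted(set(KEY_VARS) - set(keys))


# Passing a dict instead of os.environ keeps the example reproducible.
keys = load_api_keys({"OPENAI_API_KEY": "sk-test"})
```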

- LLM frameworks and tools
  - LangChain : https://python.langchain.com/en/latest/index.html
  - Crew AI : https://github.com/crewAIInc/crewAI
  - Transformers Agent : https://huggingface.co/docs/transformers/transformers_agents
  - LlamaIndex : https://github.com/jerryjliu/llama_index
  - ChainLit: Build and share LLM apps | https://docs.chainlit.io/overview
  - vLLM : LLM serving engine | https://github.com/vllm-project/vllm
  - H2O LLM Studio : GUI designed for fine-tuning state-of-the-art large language models (LLMs) | https://github.com/h2oai/h2o-llmstudi

#### RAG (Retrieval-Augmented Generation)
##### RAG is a hybrid AI architecture that combines:
- Retrieval systems: Search external knowledge sources (like documents, databases, or the web) to find relevant information.
- Generative models: Use that retrieved information to produce accurate, context-aware responses.

##### How RAG Works
- Input query: User asks a question or provides a prompt.
- Retrieval step: The system searches a large corpus (e.g., a knowledge base, documents) to find relevant passages or data.
- Augmentation: The retrieved passages are added to the prompt alongside the user's query.
- Generation step: The generative model produces an output based on both the input and the retrieved knowledge.
- **Typical pipeline (from the professor's slides): Load file -> Split into chunks (by token length) -> Embed (OpenAI, Vertex AI, Hugging Face, InstructorEmbedding) -> Build vector store (Chroma, Deep Lake, FAISS, Annoy, etc.) -> Retrieve -> Q&A**
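
The load, split, embed, store, and retrieve steps above can be sketched in plain Python. This is a toy illustration, not a production pipeline: the bag-of-words `embed` function stands in for a real embedding model, the list returned by `build_vector_store` stands in for a vector store such as Chroma or FAISS, and all function names are hypothetical. The final Q&A step would send the built prompt to an LLM, which is omitted here.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def split_into_chunks(text: str, chunk_size: int = 8, overlap: int = 2) -> List[str]:
    """Split text into overlapping chunks of `chunk_size` whitespace tokens."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    stop = max(len(words) - overlap, 1)  # avoid a tail chunk made only of overlap
    return [" ".join(words[i:i + chunk_size]) for i in range(0, stop, step)]


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def build_vector_store(chunks: List[str]) -> List[Tuple[str, Counter]]:
    """Store each chunk next to its embedding."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(query: str, store: List[Tuple[str, Counter]], k: int = 2) -> List[str]:
    """Return the k chunks whose embeddings are closest to the query."""
    query_vec = embed(query)
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query: str, context: List[str]) -> str:
    """Augment the query with the retrieved chunks."""
    joined = "\n".join(f"- {chunk}" for chunk in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


document = (
    "Chroma and FAISS are vector stores. "
    "A vector store keeps embeddings for similarity search. "
    "LoRA is a parameter-efficient fine-tuning method."
)
chunks = split_into_chunks(document)
store = build_vector_store(chunks)
top_chunks = retrieve("What is LoRA?", store, k=1)
prompt = build_prompt("What is LoRA?", top_chunks)
```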

##### Benefits of RAG
- Improved factual accuracy: The generative model can ground responses in real data, reducing hallucinations.
- Access to up-to-date info: Retrieval allows use of current data without retraining the model.
- Domain adaptability: Easy to swap or update the retrieval corpus for new domains or languages.

##### Applications of RAG
- Customer support bots that pull answers from product manuals.
- Medical assistants referencing medical literature.
- Research assistants summarizing scientific papers.
- Chatbots with access to proprietary or private databases.

##### Example: RAG pipeline in practice
- User: “What are the latest updates on climate policy?”
- Retrieval: Search recent news and reports.
- Generation: Summarize and generate an answer based on retrieved documents.
  
##### Several different types of document loaders available in LangChain
- PDFs: UnstructuredPDFLoader
- Social media: TwitterTweetLoader
- Messaging services: WhatsAppChatLoader
- File types: CSVLoader
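
To make the idea of a document loader concrete, here is a standard-library sketch of what a CSV loader roughly does: each row becomes one document whose text is a list of `column: value` lines, with the source and row index kept as metadata. The `Document` dataclass and `load_csv` function are hypothetical stand-ins written for this example, not LangChain's own classes.

```python
import csv
import os
import tempfile
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Document:
    """Hypothetical stand-in for a loader's document type."""
    page_content: str
    metadata: Dict[str, object] = field(default_factory=dict)


def load_csv(path: str) -> List[Document]:
    """Turn each CSV row into one document of 'column: value' lines."""
    docs = []
    with open(path, newline="", encoding="utf-8") as handle:
        for row_index, row in enumerate(csv.DictReader(handle)):
            content = "\n".join(f"{column}: {value}" for column, value in row.items())
            docs.append(Document(content, {"source": path, "row": row_index}))
    return docs


# Demo on a throwaway file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.csv")
    with open(sample_path, "w", newline="", encoding="utf-8") as handle:
        handle.write("name,role\nAda,engineer\nLin,analyst\n")
    docs = load_csv(sample_path)
```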
  
##### Embedding
- OpenAI (e.g., text-embedding-ada-002)
- Hugging Face models (e.g., sentence-transformers/all-MiniLM-L6-v2)
- InstructorEmbedding (e.g., hkunlp/instructor-xl)
- Google embeddings (e.g., text-embedding-004)
  
##### Types of Chains in LangChain (Stuff, Map-Reduce, Refine and Map-Rerank)
- References :
  - https://medium.com/@vinusebastianthomas/document-chains-in-langchain-d33c4bdbabd8
  - https://medium.com/@minh.hoque/what-are-llm-chains-671b84103ba

##### STUFF
- Ideal for applications where the documents are small and only a few are used at a time.
- The Stuff chain fails if the combined document tokens exceed the LLM's context limit.
- Good : It is cheap (a single LLM call) and works for any application whose documents fit in one prompt.
- Bad : It is not the best practice when dealing with many chunks or with chunks of many different types.
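
A minimal sketch of the Stuff strategy, assuming a crude whitespace token count and a stub in place of a real model. `stuff_chain`, `fake_llm`, and the `max_tokens` limit are hypothetical names and values chosen for illustration.

```python
from typing import Callable, List


def stuff_chain(docs: List[str], question: str,
                llm: Callable[[str], str], max_tokens: int = 50) -> str:
    """Concatenate all documents into a single prompt and call the LLM once."""
    prompt = "Context:\n" + "\n\n".join(docs) + f"\n\nQuestion: {question}"
    token_count = len(prompt.split())  # crude whitespace token estimate
    if token_count > max_tokens:
        raise ValueError(f"prompt has {token_count} tokens, limit is {max_tokens}")
    return llm(prompt)


def fake_llm(prompt: str) -> str:
    """Hypothetical stub that stands in for a real model call."""
    return f"answered from {len(prompt.split())} prompt tokens"


docs = ["Paris is the capital of France.", "Berlin is the capital of Germany."]
answer = stuff_chain(docs, "What is the capital of France?", fake_llm)
```

Passing many more documents than the limit allows raises `ValueError`, which mirrors the failure mode of a real context-window overflow.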

##### Map-Reduce
- Iterates over a list of documents and generates an individual output for each one (map); the outputs are then combined to produce the final result (reduce).
- Good : The per-document calls can run in parallel, which improves efficiency and reduces processing time.
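
A sketch of the map and reduce steps, using a thread pool to show where the parallelism comes from. The `summarize` and `combine` functions are hypothetical stubs for the per-document and combining LLM calls.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def map_reduce_chain(docs: List[str],
                     map_fn: Callable[[str], str],
                     reduce_fn: Callable[[List[str]], str],
                     max_workers: int = 4) -> str:
    """Run map_fn on each document in parallel, then combine with reduce_fn."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(map_fn, docs))  # results keep the input order
    return reduce_fn(partials)


def summarize(doc: str) -> str:
    """Stub for a per-document LLM call: keep the first sentence."""
    return doc.split(".")[0].strip() + "."


def combine(summaries: List[str]) -> str:
    """Stub for the combining LLM call: join the partial summaries."""
    return " ".join(summaries)


docs = [
    "RAG retrieves documents. It then generates an answer.",
    "LoRA adapts weights. It is cheap to train.",
]
summary = map_reduce_chain(docs, summarize, combine)
```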
  
##### Refine Chain
- The Refine chain iteratively refines the output by feeding the result of one iteration into the next, aiming to enhance the accuracy and quality of the final result.
- Good: Enhanced accuracy. By refining the output in each iteration, the chain can improve the accuracy and relevance of the final result.
- Bad: The iterations run one after another, so it needs more computational resources and a longer processing time.
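
A sketch of the Refine loop, with stubs in place of the two prompts a real chain would use (one for the first draft, one for each refinement). The names `refine_chain`, `draft`, and `refine` are hypothetical.

```python
from typing import Callable, List


def refine_chain(docs: List[str],
                 initial_fn: Callable[[str], str],
                 refine_fn: Callable[[str, str], str]) -> str:
    """Draft an answer from the first document, then refine it with each later one."""
    if not docs:
        raise ValueError("at least one document is required")
    answer = initial_fn(docs[0])
    for doc in docs[1:]:
        # Each step depends on the previous answer, so calls cannot run in parallel.
        answer = refine_fn(answer, doc)
    return answer


def draft(doc: str) -> str:
    """Stub for the initial LLM call."""
    return f"Summary: {doc}"


def refine(existing: str, doc: str) -> str:
    """Stub for the refinement LLM call: fold the new document into the answer."""
    return f"{existing}; {doc}"


result = refine_chain(["a", "b", "c"], draft, refine)
```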

##### Map-Rerank
- Can be used for question answering.
- It maps over the documents, trying both to (a) answer the question and (b) assign a score to how good the answer is. It then picks the answer with the highest score.
- Map-Rerank is a good choice for question answering tasks where you expect a single, simple answer in a single document, for example factual questions such as "What is the capital of France?"
- Map-Rerank is not a good choice where you expect multiple answers or where the answer is spread out over multiple documents. For example, you would not use it for open-ended questions such as "What is the meaning of life?" or "What is the best way to solve world hunger?"
  
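
A sketch of Map-Rerank: every document yields an (answer, score) pair and the highest score wins. In a real chain the LLM produces both the answer and the score; here the hypothetical stub `fake_answer_with_score` returns the document itself as the answer and uses word overlap as the score.

```python
import re
from typing import Callable, List, Tuple


def map_rerank_chain(docs: List[str], question: str,
                     answer_with_score: Callable[[str, str], Tuple[str, float]]
                     ) -> Tuple[str, float]:
    """Answer the question per document, then return the highest-scoring answer."""
    scored = [answer_with_score(doc, question) for doc in docs]
    return max(scored, key=lambda pair: pair[1])


def fake_answer_with_score(doc: str, question: str) -> Tuple[str, float]:
    """Stub: score is the fraction of question words found in the document."""
    question_words = set(re.findall(r"[a-z]+", question.lower()))
    doc_words = set(re.findall(r"[a-z]+", doc.lower()))
    if not question_words:
        return doc, 0.0
    return doc, len(question_words & doc_words) / len(question_words)


docs = ["Berlin is in Germany.", "Paris is the capital of France."]
best_answer, best_score = map_rerank_chain(
    docs, "What is the capital of France?", fake_answer_with_score
)
```

Because only one answer is returned, information spread across several documents is never merged, which is why this strategy suits single-document factual questions.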
#### RAG - Search Type
##### Similarity Search
Similarity search selects text chunk vectors that are most similar to the question vector. This is the simplest and most straightforward search method. However, it can sometimes return documents that are not very relevant to the question.
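
A minimal sketch of similarity search using cosine similarity over toy 2-D vectors. Real embeddings have hundreds or thousands of dimensions, and vector stores use approximate indexes rather than a full scan; `similarity_search` and the sample vectors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_search(query_vec: Sequence[float],
                      chunk_vecs: List[Sequence[float]],
                      k: int = 2) -> List[Tuple[int, float]]:
    """Return (index, score) for the k chunks most similar to the query."""
    scored = [(i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(chunk_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


chunk_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
top = similarity_search([1.0, 0.0], chunk_vectors, k=2)
top_indices = [index for index, _ in top]
```

In this toy data the two returned chunks point in almost the same direction, which illustrates how a pure similarity ranking can return near-duplicate results.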

##### Maximum Marginal Relevance (MMR)
- MMR search optimizes for similarity to the query and for diversity among the selected documents: it tries to find documents that are both similar to the question and different from each other.
- To use MMR in LangChain, call the vector store's `as_retriever()` method with `search_type="mmr"`.
- Tuning options are passed as a dictionary through the `search_kwargs` argument of `as_retriever()`. The keys relevant to MMR are:
  - k: The number of documents to return.
  - fetch_k: The number of candidate documents fetched by similarity and passed to the MMR algorithm.
  - lambda_mult: The trade-off between relevance and diversity, from 0 to 1; 1 gives minimum diversity and 0 gives maximum diversity.
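
A plain-Python sketch of greedy MMR selection, to show how relevance and diversity are traded off. The parameter names `k`, `fetch_k`, and `lambda_mult` follow the names LangChain uses for its MMR search options; the function `mmr_select` and the toy 2-D vectors are hypothetical, and this is an illustration of the idea rather than LangChain's own implementation.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec: Sequence[float], doc_vecs: List[Sequence[float]],
               k: int = 2, fetch_k: int = 3, lambda_mult: float = 0.5) -> List[int]:
    """Greedily pick k indices from the fetch_k most query-similar candidates."""
    relevance = [cosine_similarity(query_vec, vec) for vec in doc_vecs]
    candidates = sorted(range(len(doc_vecs)),
                        key=lambda i: relevance[i], reverse=True)[:fetch_k]
    selected: List[int] = []
    while candidates and len(selected) < k:
        def mmr_score(i: int) -> float:
            # Redundancy: similarity to the closest already-selected document.
            redundancy = max(
                (cosine_similarity(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
# Documents 0 and 1 are near-duplicates; document 2 is less relevant but different.
doc_vectors = [[0.9, 0.1], [0.9, 0.12], [0.7, -0.7]]
diverse = mmr_select(query, doc_vectors, k=2, fetch_k=3, lambda_mult=0.5)
relevance_only = mmr_select(query, doc_vectors, k=2, fetch_k=3, lambda_mult=1.0)
```

With `lambda_mult=0.5` the second pick skips the near-duplicate in favor of the different document, while `lambda_mult=1.0` reduces to a plain similarity ranking.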

##### Multiple PDFs in LLM
- https://colab.research.google.com/drive/1mIO99-4QWgIKvjgAFj0vvEbQi5xgNCPk?usp=sharing
- More info : https://www.youtube.com/watch?v=s5LhRdh5fu4
##### LLM - Question Answering on Own Data
- https://medium.com/@onkarmishra/using-langchain-for-question-answering-on-own-data-3af0a82789ed
##### Levels Of Summarization: Novice to Expert
- https://github.com/gkamradt/langchain-tutorials/blob/main/data_generation/5%20Levels%20Of%20Summarization%20%20Novice%20To%20Expert.ipynb
##### Advanced - Generative AI with Large Language Models - DeepLearning.AI and Amazon Web Services
- https://github.com/amruthaa08/Generative_AI_LLMs/tree/main
##### Efficient Large Language Model training with LoRA and Hugging Face
- https://www.philschmid.de/fine-tune-flan-t5-peft
##### crewAI

