# Local RAG-Based AI Assistant with Multimodal Input, Function Calling & Cloud Deployment

### **Project Type**: *End-to-End AI Assistant*
### **Tech Focus**: *Local LLM (e.g. Mistral via Ollama), RAG, Whisper, BLIP, Function Calling, Multimodal AI*
### **Deployment**: *Cloud via Render*
### **System Goal**: *Build a low-resource, locally hosted AI system that accepts user queries (text, image, audio, PDF), retrieves relevant context from documents, optionally calls external APIs, and returns a coherent response using a local LLM — all within ~10GB storage.*

## **Objectives**:

- _Use Retrieval-Augmented Generation (RAG) to ground LLM responses in your own data_
    
- _Enable multimodal input (text, PDF, image, and audio)_
        
- _Use function calling to dynamically pull in external API data (e.g., weather, stocks)_
        
- _Run everything locally with models like Mistral-7B via Ollama_
        
- _Use minimal storage (~10GB) and prepare for cloud deployment on Render_

## **Requirements (Local Setup)**

| Category  |  Requirement                     |
|------------|---------------------------------|
| OS        | Windows/Linux/macOS              |
| Python    | ≥ 3.10                           |
| RAM       | ≥ 8 GB                           |
| Disk      | ≤ ~10 GB (with quantized models) |

### **Recommended Setup:**
- _Use virtualenv or conda for isolation_

- _GPU is optional (Ollama can run on CPU, though generation is slower)_

- _Use a 4-bit quantized model tag such as `mistral:7b-instruct-q4_0` to save space_
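
The requirements above can be checked before installing anything. The following is a minimal sketch using only the standard library; the function names and the 10 GB threshold constant are illustrative assumptions, and RAM is not checked because the standard library has no portable way to read it.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)   # minimum version from the requirements table
DISK_BUDGET_GB = 10    # storage target for this project


def python_ok(version=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the interpreter meets the minimum version."""
    return tuple(version[:2]) >= minimum


def free_disk_gb(path="."):
    """Return free disk space at `path` in gigabytes (10**9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def check_environment(path="."):
    """Collect readable warnings; an empty list means every check passed."""
    warnings = []
    if not python_ok():
        found = sys.version.split()[0]
        warnings.append(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, found {found}")
    if free_disk_gb(path) < DISK_BUDGET_GB:
        warnings.append(f"Less than {DISK_BUDGET_GB} GB free at {path!r}")
    return warnings
```

Calling `check_environment()` from the project folder returns a list of warnings that can be printed before the model downloads start.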

## Technologies & Tools
 
  | **Tech**      | **Use Case**                            |
  | ------------- | ----------------------------------- |
  | **Ollama**    | Run local LLMs (e.g., Mistral 7B)   |
  | **LangChain** | RAG pipeline, function calling      |
  | **FAISS**     | Vector database for semantic search |
  | **Whisper**   | Audio (speech) to text conversion   |
  | **BLIP**      | Image captioning (image to text)    |
  | **PyMuPDF**   | PDF text extraction                 |
  | **Streamlit** | Simple user interface               |
  | **Render**    | Cloud deployment (free tier)        |


 ## **Installation Overview**:

#### **1. Install Ollama**

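> `curl -fsSL https://ollama.com/install.sh | sh`
>
> This script targets Linux; on macOS and Windows, download the installer from https://ollama.com instead.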

#### **2. Pull a small model (e.g., mistral)**

> `ollama pull mistral:7b-instruct`
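
Once the model is pulled, it can be queried over Ollama's local HTTP API. The sketch below assumes the Ollama server is running on its default port (11434) and uses only the standard library; the helper names `build_payload` and `generate` are illustrative, not part of any library.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_payload(prompt, model="mistral:7b-instruct"):
    """Build the JSON body for a single, non-streaming generation request."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(prompt, model="mistral:7b-instruct", url=OLLAMA_URL, timeout=120):
    """Send the prompt to a running Ollama server and return the generated text."""
    data = json.dumps(build_payload(prompt, model)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

With the server running, `generate("Summarize RAG in one sentence.")` returns the model's reply as a string. Setting `"stream": False` keeps the response in a single JSON object, which is simpler to parse than the default streamed output.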

#### **3. Set up Python environment**
> `pip install langchain faiss-cpu streamlit PyMuPDF openai requests`<br>
> `pip install git+https://github.com/openai/whisper.git`<br>
> `pip install transformers accelerate timm`

#### **4. Download BLIP for image captioning**
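> `from transformers import BlipProcessor, BlipForConditionalGeneration`
>
> The import alone downloads nothing; the model weights are fetched from Hugging Face the first time `from_pretrained(...)` is called.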
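
A minimal captioning sketch with these two classes might look like the following. The checkpoint name `Salesforce/blip-image-captioning-base` and the helper names are assumptions for illustration, and the example additionally assumes Pillow and PyTorch are installed.

```python
BLIP_MODEL_ID = "Salesforce/blip-image-captioning-base"  # assumed checkpoint


def load_blip(model_id=BLIP_MODEL_ID):
    """Load the BLIP processor and model; weights are downloaded on first use."""
    from transformers import BlipProcessor, BlipForConditionalGeneration

    processor = BlipProcessor.from_pretrained(model_id)
    model = BlipForConditionalGeneration.from_pretrained(model_id)
    return processor, model


def caption_image(image, processor, model, max_new_tokens=30):
    """Generate a short caption for an RGB PIL image."""
    inputs = processor(images=image, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(output_ids[0], skip_special_tokens=True).strip()


# Example usage (the file path is hypothetical):
# from PIL import Image
# processor, model = load_blip()
# image = Image.open("data/photo.jpg").convert("RGB")
# print(caption_image(image, processor, model))
```

Loading the model once and passing it into `caption_image` avoids reloading the weights for every image.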

<br>

> Tip: Ensure that Python ≥ 3.10 is installed and that you are working inside a virtual environment.

## **Estimated Storage Breakdown (~10GB)**

| Component                     | Approx Size     |
| ----------------------------- | --------------- |
| Mistral 7B Q4 via Ollama      | \~4 GB          |
| Whisper (small or medium)     | \~1-2 GB        |
| BLIP Model                    | \~1 GB          |
| Python + Dependencies         | \~2 GB          |
| Vector DB (e.g., FAISS Index) | \~100–500 MB    |
| **Total**                     | **\~8–10 GB**   |
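
The total can be reproduced by summing the low and high estimates from the table. This is a small arithmetic sketch; the figures are the table's own approximations, not measured sizes.

```python
# (low, high) size estimates in GB, copied from the table above
components = {
    "Mistral 7B Q4 via Ollama": (4.0, 4.0),
    "Whisper (small or medium)": (1.0, 2.0),
    "BLIP model": (1.0, 1.0),
    "Python + dependencies": (2.0, 2.0),
    "FAISS index": (0.1, 0.5),
}


def total_range(sizes):
    """Return the (low, high) total in GB, rounded to one decimal place."""
    low = sum(lo for lo, _ in sizes.values())
    high = sum(hi for _, hi in sizes.values())
    return round(low, 1), round(high, 1)


def fits_budget(sizes, budget_gb=10.0):
    """Return True if even the high estimate stays within the budget."""
    return total_range(sizes)[1] <= budget_gb
```

With these figures the range comes out at 8.1 to 9.5 GB, which is consistent with the rounded ~8–10 GB total and leaves roughly half a gigabyte of headroom in the worst case.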

## **What Will This Project Do?**
1. _Accept text, audio, PDF, or image as input_

2. _Preprocess input (PDF text extraction / Whisper transcription / BLIP captioning)_

3. _Chunk & embed documents using LangChain_

4. _Store embeddings in FAISS_

5. _Retrieve relevant chunks using similarity search_

6. _Use local LLM to generate grounded response_

7. _If query needs external info (e.g., weather), use function calling to fetch from APIs_

8. _Return final answer via UI (Streamlit)_
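
Steps 3 to 6 can be illustrated with a dependency-free toy. In the sketch below, a bag-of-words counter stands in for a real embedding model and a plain Python list stands in for the FAISS index; both are deliberate simplifications, and all names are hypothetical. The real project would use LangChain and FAISS as described above.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into overlapping windows; sizes are counted in words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """In-memory stand-in for a FAISS index."""

    def __init__(self):
        self.items = []  # list of (chunk, vector) pairs

    def add(self, chunks):
        self.items.extend((chunk, embed(chunk)) for chunk in chunks)

    def search(self, query, k=2):
        """Return the k chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Assemble a prompt that asks the LLM to answer from retrieved context only."""
    context = "\n---\n".join(context_chunks)
    return (f"Answer using only the context below.\n\nContext:\n{context}\n\n"
            f"Question: {question}\nAnswer:")
```

The overlap between neighbouring chunks keeps a sentence that straddles a boundary retrievable from at least one chunk. The string returned by `build_prompt` is what would be sent to the local LLM in step 6.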

## **Detailed Architecture**:

![RAG-flowchart.png](attachment:f5e03bfb-5767-4601-aeb5-6b04ca8423f4.png)

<br> 

## Next Steps
- Add PDF chunking and embedding

- Integrate Whisper for audio input

- Integrate BLIP for image-to-text

- Set up sample function calling (e.g., weather API)

- Build the Streamlit UI

- Deploy to Render (minimal cloud setup)
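
As a starting point for the PDF and audio steps, the sketch below routes a file to the right extractor based on its extension. The suffix table and function names are assumptions for illustration. The PyMuPDF and Whisper imports are deferred so the routing logic works even before those packages are installed, and Whisper also needs `ffmpeg` available on the system path.

```python
from pathlib import Path

# Assumed mapping from file extension to input type
MODALITY_BY_SUFFIX = {
    ".txt": "text", ".md": "text",
    ".pdf": "pdf",
    ".png": "image", ".jpg": "image", ".jpeg": "image",
    ".wav": "audio", ".mp3": "audio", ".m4a": "audio",
}


def detect_modality(path):
    """Map a file path to 'text', 'pdf', 'image', or 'audio'."""
    suffix = Path(path).suffix.lower()
    try:
        return MODALITY_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix or path!r}") from None


def pdf_to_text(path):
    """Extract plain text from every page of a PDF with PyMuPDF."""
    import fitz  # PyMuPDF

    with fitz.open(path) as doc:
        return "\n".join(page.get_text() for page in doc)


def audio_to_text(path, model_name="small"):
    """Transcribe an audio file with a local Whisper model."""
    import whisper  # the openai-whisper package

    model = whisper.load_model(model_name)
    return model.transcribe(str(path))["text"].strip()


def extract_text(path, caption_fn=None):
    """Turn any supported input file into text for the RAG pipeline."""
    modality = detect_modality(path)
    if modality == "text":
        return Path(path).read_text(encoding="utf-8")
    if modality == "pdf":
        return pdf_to_text(path)
    if modality == "audio":
        return audio_to_text(path)
    if caption_fn is None:
        raise ValueError("Image input needs a caption_fn (for example, a BLIP wrapper)")
    return caption_fn(path)
```

Passing the image captioner in as `caption_fn` keeps this module independent of how BLIP is loaded.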



## **Folder Structure (Planned):**

```
/local-ai-assistant
│
├── /data/               # PDFs, audio, images
├── /models/             # Optional local copies of models
├── app.py               # Main Streamlit app
├── rag_pipeline.py      # Core logic (RAG, LLM calls)
├── function_tools.py    # API functions for function calling
├── requirements.txt     # Python dependencies
└── README.md            # Project overview
```
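
A first version of `app.py` could be as small as the sketch below. The `retrieve` and `generate` callables are hypothetical hooks that `rag_pipeline.py` would provide. Streamlit is imported inside `main` so the answer logic can be tested without it.

```python
def answer_query(question, retrieve, generate):
    """Retrieve context for the question, then ask the LLM for a grounded answer."""
    question = question.strip()
    if not question:
        return "Please enter a question."
    context = "\n\n".join(retrieve(question))
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)


def main(retrieve, generate):
    """Render a one-field Streamlit page around answer_query."""
    import streamlit as st  # deferred so answer_query has no Streamlit dependency

    st.title("Local RAG Assistant")
    question = st.text_input("Ask a question about your documents")
    if st.button("Ask") and question:
        with st.spinner("Thinking..."):
            st.write(answer_query(question, retrieve, generate))


# In app.py, wire in the real functions (names below are hypothetical) and call:
# main(retrieve=rag_pipeline.retrieve, generate=rag_pipeline.generate)
# Then start the UI with: streamlit run app.py
```

Keeping `answer_query` free of UI code means the same function can later serve an API endpoint or a command-line tool.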

## **Local Models vs. Online APIs**

| Component            | Source               | Internet Required  | Notes                                      |
| -------------------- | -------------------- | ------------------ | ------------------------------------------ |
| **Mistral LLM**      | Ollama (Local)       | ❌                  | Full offline inference after download      |
| **Whisper ASR**      | OpenAI (Local Git)   | ❌ (after download) | Local speech-to-text for audio input       |
| **BLIP Captioning**  | Hugging Face Model   | ❌ (after download) | Converts image to text prompts             |
| **LangChain Tools**  | Local integrations   | ❌ (except APIs)    | Can be extended to call APIs if needed     |
| **Function Calling** | Custom via LangChain | ❌ or ✅             | You decide whether to call APIs externally |

>  _All components are configured to run locally, unless you explicitly add external APIs for function calling (like weather, search, finance, etc.)_
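
One way to keep function calling under your control is a small tool registry: the model is prompted to reply with a JSON object naming a tool, and only registered tools can run. The sketch below is an assumption about how `function_tools.py` might be organised, not a LangChain API. The JSON shape (`tool`, `arguments`) is an illustrative convention, and `get_weather` is a stub that makes no network call.

```python
import json

TOOLS = {}  # tool name -> callable


def tool(fn):
    """Register a function so the model can request it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def get_weather(city):
    """Placeholder: a real version would query a weather API with `requests`."""
    return {"city": city, "forecast": "not available (stub)"}


def dispatch(model_output):
    """Run a tool if the model output is a valid JSON tool call, else return None."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return None  # plain-text answer, no tool requested
    if not isinstance(call, dict):
        return None
    name = call.get("tool")
    if not isinstance(name, str) or name not in TOOLS:
        return None  # unknown tools are never executed
    return TOOLS[name](**call.get("arguments", {}))
```

For example, `dispatch('{"tool": "get_weather", "arguments": {"city": "Paris"}}')` calls the stub, while ordinary prose from the model falls through as `None` and can be shown to the user directly. Because only registered names are dispatched, the system stays offline unless a network-backed tool is added on purpose.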



## Deployment Guide (Free Cloud – Render)
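_Render.com offers a free tier for lightweight web services. Free instances have limited RAM and CPU, so verify that the models you load fit within those limits before deploying._<br>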
##### **1. Prepare a requirements.txt**
```
streamlit
langchain
faiss-cpu
PyMuPDF
openai
requests
transformers
accelerate
timm
openai-whisper @ git+https://github.com/openai/whisper.git
```
##### **2. Prepare render.yaml**
**services:**<br>
  - **type**: web<br>
   **name**: rag-multimodal-app<br>
   **env**: python<br>
   **buildCommand**: pip install -r requirements.txt<br>
   **startCommand**: streamlit run app.py --server.port $PORT --server.address 0.0.0.0<br>
   **plan**: free<br>

##### **3. Steps to Deploy**
1. _Push your code to a GitHub repository._

2. _Create an account on Render._

3. _Click "New Web Service" → Connect GitHub repo._

4. _Use render.yaml for config (Render auto-detects it)._

5. _Deploy_

> Once deployed, the Streamlit app serves as the frontend for interacting with your RAG system.

<br>