In [None]:
# Clicking a command button copies it to the clipboard; you can then paste it into a terminal.

# ✅ STEP 1: Install Tesseract OCR

Tesseract is required for OCR (extracting text from images).

---

## 🪟 Windows Users

1. Download Tesseract from:
   https://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-w64-setup-v5.2.0.20220712.exe

2. Install it (recommended location):
   <div>
  <button onclick="navigator.clipboard.writeText('C:\\Program Files\\Tesseract-OCR')">
    C:\Program Files\Tesseract-OCR
  </button>
</div>


3. Add it to Environment Variables (PATH):

   - Press Windows Key
   - Search "Environment Variables"
   - Click "Edit the system environment variables"
   - Click "Environment Variables"
   - Under "System Variables", find "Path"
   - Click "Edit"
   - Click "New"
   - Add:
     C:\Program Files\Tesseract-OCR
   - Click OK and restart terminal / VS Code
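
To confirm that the PATH change took effect, a small check like the one below can be run from a fresh Python session. This is a minimal sketch: the helper name `dir_on_windows_path` is hypothetical, and it assumes the recommended install location above. `ntpath` is used so the comparison follows Windows path rules (case-insensitive, trailing backslash ignored).

```python
import ntpath
import os


def dir_on_windows_path(directory: str, path_value: str) -> bool:
    """Return True if `directory` is one of the entries in a Windows PATH string."""

    def canon(p: str) -> str:
        # normpath drops trailing separators; normcase lowercases and unifies slashes.
        return ntpath.normcase(ntpath.normpath(p))

    target = canon(directory)
    entries = (e.strip().strip('"') for e in path_value.split(";"))
    return any(canon(e) == target for e in entries if e)


# Hypothetical usage on Windows: check the PATH of the current process.
tesseract_dir = r"C:\Program Files\Tesseract-OCR"
print(dir_on_windows_path(tesseract_dir, os.environ.get("PATH", "")))
```

If this prints `False` on Windows after editing PATH, the terminal or editor was probably started before the change and needs a restart.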

---

## 🍎 Mac Users

Run this in Terminal:

    
   <div>
  <button onclick="navigator.clipboard.writeText('brew install tesseract')">
    brew install tesseract
  </button>
</div>

---

## ✅ Verify Installation

After installation, restart your terminal and run:

    
   <div>
  <button onclick="navigator.clipboard.writeText('tesseract --version')">
   tesseract --version
  </button>
</div>


If it prints version details, the installation was successful.

In [1]:
# Verify Tesseract installation

import shutil
import subprocess

# Check if tesseract is available
tesseract_path = shutil.which("tesseract")

if tesseract_path:
    print(f"✅ Tesseract found at: {tesseract_path}")
    subprocess.run(["tesseract", "--version"])
else:
    print("❌ Tesseract not found.")
    print("Please install Tesseract and make sure it is added to PATH.")

✅ Tesseract found at: /opt/homebrew/bin/tesseract
tesseract 5.5.2
 leptonica-1.87.0
  libgif 5.2.2 : libjpeg 8d (libjpeg-turbo 3.1.3) : libpng 1.6.53 : libtiff 4.7.1 : zlib 1.2.12 : libwebp 1.6.0 : libopenjp2 2.5.4
 Found NEON
 Found libarchive 3.8.4 zlib/1.2.12 liblzma/5.8.2 bz2lib/1.0.8 liblz4/1.10.0 libzstd/1.5.7 expat/expat_2.7.3 CommonCrypto/system libb2/system
 Found libcurl/8.7.1 SecureTransport (LibreSSL/3.3.6) zlib/1.2.12 nghttp2/1.68.0
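
If a later step needs to compare the installed Tesseract version against some minimum, the first line of the output above can be parsed. The following is a minimal sketch; the helper name `parse_tesseract_version` is hypothetical, and the pattern tolerates an optional leading `v` on the assumption that some builds print one.

```python
import re
from typing import Optional, Tuple


def parse_tesseract_version(output: str) -> Optional[Tuple[int, ...]]:
    """Extract the version tuple from the `tesseract <version>` line, if present."""
    match = re.search(r"^tesseract\s+v?(\d+(?:\.\d+)*)", output, flags=re.MULTILINE)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


sample = "tesseract 5.5.2\n leptonica-1.87.0\n"
print(parse_tesseract_version(sample))  # (5, 5, 2)
```

The text to parse can be captured with `subprocess.run(["tesseract", "--version"], capture_output=True, text=True)`.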


# ✅ STEP 2: Check Python Version

This project requires Python 3.11.

Run the next cell to verify.

In [None]:
import sys
print("Python version:", sys.version)
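
Printing the version leaves the comparison to the reader. A minimal sketch of an explicit check follows, assuming "requires Python 3.11" means exactly major version 3 and minor version 11; the names `REQUIRED` and `is_required_python` are hypothetical.

```python
import sys

REQUIRED = (3, 11)  # assumed: major and minor version required by this project


def is_required_python(version_info=sys.version_info, required=REQUIRED) -> bool:
    """True when the interpreter's major.minor matches the required version."""
    return tuple(version_info[:2]) == required


if is_required_python():
    print("Python version OK")
else:
    print(f"Expected Python {REQUIRED[0]}.{REQUIRED[1]}, found {sys.version.split()[0]}")
```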

# ✅ STEP 3: Create Virtual Environment

⚠️ IMPORTANT:
Run this in a terminal (NOT inside the notebook):

    
<div>
  <button onclick="navigator.clipboard.writeText('python -m venv .venv')">
   python -m venv .venv
  </button>
</div>


# ✅ STEP 4: Activate Virtual Environment

## Windows:
<div>
  <button onclick="navigator.clipboard.writeText('.venv\\Scripts\\Activate')">
   .venv\Scripts\Activate
  </button>
</div>

## Mac/Linux:
<div>
  <button onclick="navigator.clipboard.writeText('source .venv/bin/activate')">
   source .venv/bin/activate
  </button>
    </div>


If activated correctly, you will see:

    (.venv)

Then continue with the next step.
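
The `(.venv)` prompt only describes the terminal. To check whether a given Python process (for example the notebook kernel) is running inside a virtual environment, a minimal sketch like this can be used; the function name `in_virtualenv` is hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Virtual environment active:", in_virtualenv())
print("Interpreter:", sys.executable)
```

When the environment is active, the interpreter path should point inside the `.venv` folder.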
 
 <!--# ✅ STEP 5.1: Important (Windows Users Only)


If you are using **Windows**, you must modify the `requirements.txt` file.

### 🔴 Why?
`uvloop` does NOT support Windows.  
If you try installing it on Windows, it will give errors.

---

## 🪟 Windows Users: Do This

1. Open the file:
   
   `requirements.txt`

2. Find this line:

       uvloop==0.22.1

3. Comment it out by adding `#` in front:

       # uvloop==0.22.1

4. Save the file.

---

## 🍎 Mac / Linux Users

Leave this line **uncommented**:

       uvloop==0.22.1

Mac and Linux support `uvloop`, so it should remain active.

--- -->


# ✅ STEP 5: Install Requirements

After activating the virtual environment, run in the terminal:

<div>
  <button onclick="navigator.clipboard.writeText('pip install -r requirements.txt')">
   pip install -r requirements.txt
  </button>
    </div>
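
Once the install finishes, it can be useful to confirm that the key packages actually landed in the active environment. The sketch below uses the standard-library `importlib.metadata`; the helper name `missing_packages` is hypothetical, and the package list is an assumption based on the tools used in later steps, not a copy of `requirements.txt`.

```python
from importlib import metadata
from typing import Iterable, List


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the distribution names that are not installed in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# Assumed package names, taken from the commands used in later steps.
print(missing_packages(["fastapi", "uvicorn", "streamlit", "huggingface-hub"]))
```

An empty list means every listed package was found.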

# ✅ STEP 6: Setup Hugging Face Access Token

1. Go to:
   https://huggingface.co/settings/tokens

2. Create a new token
3. Copy it
4. Paste it in the next cell

In [None]:
from huggingface_hub import login

login("YOUR_HF_ACCESS_TOKEN")
print("Logged in to Hugging Face Hub successfully!")
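
Pasting a token into a cell means it is saved with the notebook. One alternative is to read it from an environment variable. This is a minimal sketch: the helper names are hypothetical, and it assumes the token was exported as `HF_TOKEN` before the notebook was started.

```python
import os
from typing import Mapping


def read_hf_token(env: Mapping[str, str] = os.environ, var: str = "HF_TOKEN") -> str:
    """Return the token stored in `var`, or raise if it is unset or blank."""
    token = env.get(var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {var} environment variable before logging in.")
    return token


def login_from_env() -> None:
    """Log in with the token from the environment (requires huggingface_hub)."""
    # Imported lazily so read_hf_token stays usable without the package installed.
    from huggingface_hub import login

    login(token=read_hf_token())
```

Calling `login_from_env()` then takes the place of `login("YOUR_HF_ACCESS_TOKEN")`.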

# ✅ STEP 7: Preload All Models

⚠️ This will download and load all models into memory.
It may take several minutes.

WAIT until it finishes.

In [None]:
from backend.core.model_manager import load_all_models

load_all_models()

print("✅ All models loaded successfully!")

# ✅ STEP 8: Start Backend (FastAPI)

⚠️ Run this in a NEW TERMINAL (not the notebook)

Make sure the virtual environment is activated.

    
<div>
  <button onclick="navigator.clipboard.writeText('uvicorn backend.main:app --reload')">
   uvicorn backend.main:app --reload
  </button>
    </div>


Wait until you see:

    INFO:     Application startup complete.

Backend runs at:
http://127.0.0.1:8000

Do NOT close this terminal.
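
To check from Python that the backend is accepting connections at the address above, a plain TCP probe is enough. This is a minimal sketch with a hypothetical helper name; it only tests that the port is open, not that the API responds correctly.

```python
import socket


def port_is_open(host: str = "127.0.0.1", port: int = 8000, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


print("Backend reachable:", port_is_open())
```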

# ✅ STEP 9: Start Frontend (Streamlit)

Open ANOTHER new terminal.
Activate the virtual environment again.

Run:

    
 <div>
  <button onclick="navigator.clipboard.writeText('streamlit run frontend/app.py')">
   streamlit run frontend/app.py
  </button>
    </div>

Frontend runs at:
http://localhost:8501

Install Tesseract on your system:

Windows: https://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-w64-setup-v5.2.0.20220712.exe
Mac: brew install tesseract

If you are on Windows, make sure to add Tesseract to PATH in your environment variables.




In [None]:
# Python version should be 3.11


In [None]:
# In a terminal, navigate to the project directory and run the following commands one by one:

# python --version (should be 3.11)
# python -m venv .venv
# .venv\Scripts\Activate (Windows)
# source .venv/bin/activate (Linux/Mac)


In [None]:
# After creating the .venv environment and selecting it as the notebook kernel, run this cell to install the required dependencies.
# %pip installs into the environment of the running kernel.

%pip install -r requirements.txt

python: can't open file 'd:\\Python\\personal_data_detector\\install': [Errno 2] No such file or directory


## Setup Hugging Face Access Token

Run the following two cells to authenticate with Hugging Face so the notebook can download models.


Log in with your access token:
```python
from huggingface_hub import login

login("YOUR_HF_ACCESS_TOKEN")
```

Replace `"YOUR_HF_ACCESS_TOKEN"` with your token from:
https://huggingface.co/settings/tokens

In [None]:
from huggingface_hub import login

token = ""  # paste your Hugging Face access token here
login(token=token)

  from .autonotebook import tqdm as notebook_tqdm


In [None]:
# This cell loads all the models and their weights into memory. It may take some time to run.


from backend.core.model_manager import load_all_models

load_all_models()


Ensuring all models exist...
Downloading BART model...


To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
Loading weights: 100%|██████████| 515/515 [00:01<00:00, 408.37it/s, Materializing param=model.shared.weight]
Writing model shards: 100%|██████████| 1/1 [00:04<00:00,  4.25s/it]


BART saved locally.
Downloading BLIP2 model...


To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
Fetching 2 files: 100%|██████████| 2/2 [14:33<00:00, 436.88s/it]
Loading weights: 100%|██████████| 1247/1247 [00:02<00:00, 467.13it/s, Materializing param=vision_model.post_layernorm.weight]
Writing model shards: 100%|██████████| 1/1 [03:19<00:00, 199.94s/it]


BLIP2 saved locally.
Downloading yolov9c.pt...
Downloading https://github.com/ultralytics/assets/releases/download/v8.4.0/yolov9c.pt to 'yolov9c.pt': 100% ━━━━━━━━━━━━ 49.4MB 10.8MB/s 4.6s
Saved yolov9c.pt
Downloading yolov8l-oiv7.pt...
Downloading https://github.com/ultralytics/assets/releases/download/v8.4.0/yolov8l-oiv7.pt to 'yolov8l-oiv7.pt': 100% ━━━━━━━━━━━━ 84.5MB 10.7MB/s 7.9s
Saved yolov8l-oiv7.pt
Downloading yolov8x-oiv7.pt...
Downloading https://github.com/ultralytics/assets/releases/download/v8.4.0/yolov8x-oiv7.pt to 'yolov8x-oiv7.pt': 100% ━━━━━━━━━━━━ 131.5MB 11.1MB/s 11.9s
Saved yolov8x-oiv7.pt
All models ready.
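
After the download finishes, a quick file check can confirm that the YOLO weights named in the log are present. This is a minimal sketch: the helper name `missing_files` is hypothetical, and it assumes the weights were saved to the current working directory, as the relative paths in the log suggest.

```python
from pathlib import Path
from typing import Iterable, List

# File names taken from the download log above.
YOLO_WEIGHTS = ["yolov9c.pt", "yolov8l-oiv7.pt", "yolov8x-oiv7.pt"]


def missing_files(names: Iterable[str], directory: str = ".") -> List[str]:
    """Return the names that do not exist as files inside `directory`."""
    base = Path(directory)
    return [name for name in names if not (base / name).is_file()]


print("Missing weights:", missing_files(YOLO_WEIGHTS))
```

If the project stores weights elsewhere, pass that folder as `directory`.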


## Run the Application

After activating the virtual environment, start the backend and frontend in separate terminals.

### Start Backend (FastAPI)

    uvicorn backend.main:app --reload


This will start the API server, usually at:
```
http://127.0.0.1:8000
```

### Start Frontend (Streamlit)
```bash
streamlit run frontend/app.py
```

This will open the Streamlit interface in your browser, usually at:
```
http://localhost:8501
```

### Typical Workflow

1. Activate the virtual environment
```bash
source .venv/bin/activate
```

2. Start the backend
```bash
uvicorn backend.main:app --reload
```
`backend.main:app` is the import path of the app, given relative to the project root folder. If your terminal's working directory is different, change the path accordingly.

⚠️ **Important**

After running the backend command, wait for this line:

```text
INFO:     Application startup complete.
```

Only then start the Streamlit frontend.


3. In another terminal, start the frontend
```bash
streamlit run frontend/app.py
```
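
The workflow above can also be scripted so that the frontend starts only after the backend reports that startup is complete. The following is a sketch under stated assumptions, not a replacement for the two-terminal setup: the function names are hypothetical, it must be run from the project root with the virtual environment active, and it launches both tools through `python -m` so they use the current interpreter.

```python
import subprocess
import sys
import threading
from typing import Iterable

STARTUP_MARKER = "Application startup complete."


def wait_for_marker(lines: Iterable[str], marker: str = STARTUP_MARKER) -> bool:
    """Echo lines until one contains `marker`; return False if the stream ends first."""
    for line in lines:
        print(line, end="")
        if marker in line:
            return True
    return False


def drain(lines: Iterable[str]) -> None:
    """Keep echoing output so the backend never blocks on a full pipe."""
    for line in lines:
        print(line, end="")


def run_app() -> None:
    """Start the backend, wait for startup, then run the frontend in the foreground."""
    backend = subprocess.Popen(
        [sys.executable, "-m", "uvicorn", "backend.main:app", "--reload"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge both streams so the marker is seen either way
        text=True,
    )
    try:
        if not wait_for_marker(backend.stdout):
            raise SystemExit("Backend exited before startup completed.")
        threading.Thread(target=drain, args=(backend.stdout,), daemon=True).start()
        subprocess.run([sys.executable, "-m", "streamlit", "run", "frontend/app.py"])
    finally:
        backend.terminate()
```

Calling `run_app()` from a script in the project root starts both processes; stopping Streamlit with Ctrl+C also terminates the backend process that was started here.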

In [11]:
print("Environment setup complete. You can now run the application.")

Environment setup complete. You can now run the application.
