This project provides a blueprint for building visually perceptive and interactive AI agents for video search and summarization (VSS) using:
- LLaMA (LLM)
- Pinecone (vector DB)
- Flask (web UI)
- Vision-Language Models (VLM)
- Retrieval-Augmented Generation (RAG)
Project structure:
- `app/` — Flask web app
- `video_processing/` — Video frame extraction and preprocessing (sketched below)
- `vlm/` — Vision-language model integration
- `llm/` — LLaMA integration
- `db/` — Pinecone vector DB integration
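As an illustration of the `video_processing/` step, frame extraction can be as simple as sampling every Nth frame with OpenCV. The function name and default sampling rate below are illustrative, not the repo's actual API:

```python
import cv2

def extract_frames(video_path, every_n=30):
    """Sample every Nth frame from a video (hypothetical helper)."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    index = 0
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        if index % every_n == 0:
            frames.append(frame)
        index += 1
    cap.release()
    return frames
```

At 30 fps, `every_n=30` yields roughly one frame per second, which keeps downstream captioning costs manageable.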
- Install dependencies (a sample requirements.txt is sketched below): `pip install -r requirements.txt`
- Run the app: `python app/main.py`
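The pinned dependencies aren't reproduced here; a plausible requirements.txt for this stack might look like the following (the exact package names, especially for the Pinecone client and the LLaMA backend, are assumptions):

```text
flask
opencv-python
torch
transformers
pillow
sentence-transformers
pinecone
```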
The `vlm/` module uses BLIP for image captioning, with automatic GPU/CPU device management and robust error handling.
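The repo's implementation isn't reproduced here, but a minimal sketch of what `generate_captions` could look like with Hugging Face's BLIP (the checkpoint `Salesforce/blip-image-captioning-base` and the `max_new_tokens` setting are assumptions):

```python
import cv2
import torch
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

MODEL_ID = "Salesforce/blip-image-captioning-base"  # assumed checkpoint
device = "cuda" if torch.cuda.is_available() else "cpu"
processor = BlipProcessor.from_pretrained(MODEL_ID)
model = BlipForConditionalGeneration.from_pretrained(MODEL_ID).to(device)

def generate_captions(frames):
    """Caption a list of OpenCV (BGR) frames; errors are reported in-place."""
    captions = []
    for frame in frames:
        try:
            # OpenCV frames are BGR; BLIP expects RGB
            image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            inputs = processor(images=image, return_tensors="pt").to(device)
            output = model.generate(**inputs, max_new_tokens=30)
            captions.append(processor.decode(output[0], skip_special_tokens=True))
        except Exception as exc:
            captions.append(f"Error: {exc}")
    return captions
```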
Example usage:

```python
import cv2
from vlm import generate_captions

# Extract a single frame from a video and generate a caption
cap = cv2.VideoCapture('sample.mp4')
frames = []
ret, frame = cap.read()
if ret:
    frames.append(frame)
cap.release()

captions = generate_captions(frames)
print(captions)
```
- The model will use GPU if available, otherwise CPU.
- Errors during caption generation are caught and reported in the output list.
- For best performance, a CUDA-capable GPU is recommended for BLIP and LLaMA models.
- Ensure your environment has the necessary drivers and libraries for GPU support.
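On the retrieval side, the `db/` module presumably stores caption embeddings in Pinecone so the `llm/` module can answer queries over them (the RAG loop). Below is a minimal sketch using the v3 `pinecone` client and `sentence-transformers`; the index name, embedding model, and metadata layout are all assumptions:

```python
from pinecone import Pinecone, ServerlessSpec
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
pc = Pinecone(api_key="YOUR_API_KEY")

# Create the index once; 384 dimensions matches all-MiniLM-L6-v2
if "video-captions" not in pc.list_indexes().names():
    pc.create_index(
        name="video-captions",
        dimension=384,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )
index = pc.Index("video-captions")

# Index each frame caption, keeping the frame number as metadata
captions = ["a person walks into a room", "a dog runs across a field"]
index.upsert(vectors=[
    {
        "id": f"frame-{i}",
        "values": embedder.encode(caption).tolist(),
        "metadata": {"caption": caption, "frame": i},
    }
    for i, caption in enumerate(captions)
])

# Retrieve the most relevant captions and assemble a RAG prompt for LLaMA
query = "What happens at the start of the video?"
matches = index.query(
    vector=embedder.encode(query).tolist(), top_k=5, include_metadata=True
).matches
context = "\n".join(m.metadata["caption"] for m in matches)
prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
# `prompt` would then be passed to the llm/ module's LLaMA wrapper.
```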