tripcoded/ARIA-mypersonalAI

🤖 ARIA: Personal Knowledge AI Assistant

ARIA is a full-stack personal knowledge assistant that ingests multiple knowledge sources and allows users to query them through an intelligent chat interface powered by Retrieval-Augmented Generation (RAG).

ARIA can process:

  • 📄 PDF documents
  • ▶️ YouTube video transcripts
  • 🧑‍💻 GitHub repositories

The system splits these sources into chunks, converts the chunks into vector embeddings stored in a vector database, and retrieves the most relevant chunks to generate contextual answers.

ARIA aims to function as a personal AI knowledge brain, allowing users to build a searchable memory of their documents, repositories, and learning resources.


📸 User Interface

💻 Desktop View

📱 Mobile View

The interface provides a clean, intuitive design featuring:

  • Knowledge Base Management: Upload PDFs, paste GitHub repo links, and index YouTube videos
  • Interactive Chat: Real-time conversation with your personal AI brain
  • Status Indicators: Voice standby and reply status monitoring
  • Indexed History: Context explorer showing all indexed knowledge sources

✨ Key Features

📚 Multi-Source Knowledge Ingestion

ARIA can ingest knowledge from multiple sources including:

  • 📄 PDF documents
  • ▶️ YouTube transcripts
  • 🧑‍💻 GitHub repositories

Each source is processed and converted into embeddings for semantic retrieval.
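The chunking step that comes before embedding can be illustrated with a minimal, stdlib-only sketch that cuts text into fixed-size character windows with overlap. `chunk_text` and its default sizes are hypothetical. The real pipeline in `backend/services/ingestion.py` may use a LangChain text splitter with different settings.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` chars.

    Hypothetical helper for illustration; defaults are placeholders.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each new window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():  # skip windows that contain only whitespace
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

The overlap gives text near a chunk boundary a second chance to appear intact in the neighbouring chunk, at the cost of storing some text twice.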


💬 Retrieval-Augmented Chat

ARIA answers questions by retrieving the most relevant knowledge chunks from the vector database and passing them to a language model for contextual response generation.

This approach is designed to provide:

  • grounded answers
  • traceable sources
  • reduced hallucinations
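Grounding and traceability come mostly from how the retrieved chunks are presented to the model. The sketch below builds a prompt that numbers each chunk and names its source so the answer can cite them. `build_prompt`, the `(source, text)` chunk shape, and the prompt wording are all assumptions, not ARIA's actual prompt.

```python
def build_prompt(question: str, chunks: list[tuple[str, str]]) -> str:
    """Assemble a grounded prompt from (source, text) pairs returned by a retriever.

    Hypothetical helper; the instruction wording is illustrative only.
    """
    context_lines = [
        f"[{i}] (source: {source})\n{text.strip()}"
        for i, (source, text) in enumerate(chunks, start=1)
    ]
    context = "\n\n".join(context_lines) if context_lines else "(no relevant context found)"
    return (
        "Answer the question using only the context below. "
        "Cite sources by their [number]. If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The resulting string is what would be sent to the LLM. Telling the model to admit when the context is insufficient is a common way to reduce unsupported answers.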

🗂️ Source Management

Users can review all indexed knowledge sources through a knowledge ledger interface and remove individual sources when needed.

This allows dynamic control of the knowledge base.
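Removing a source comes down to deleting every chunk whose metadata points at that source. The toy in-memory index below illustrates the idea. `ToyIndex` and the `source` tag are hypothetical. In Chroma itself, the equivalent operation is a metadata-filtered delete such as `collection.delete(where={"source": ...})`, but how ARIA tags its chunks is an assumption.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    """Hypothetical stand-in for a vector store: chunk id -> (source, text)."""

    chunks: dict[str, tuple[str, str]] = field(default_factory=dict)

    def add(self, chunk_id: str, source: str, text: str) -> None:
        self.chunks[chunk_id] = (source, text)

    def list_sources(self) -> dict[str, int]:
        """Return each indexed source with its chunk count (a 'ledger' view)."""
        return dict(Counter(source for source, _ in self.chunks.values()))

    def delete_source(self, source: str) -> int:
        """Drop every chunk that belongs to `source`; return how many were removed."""
        doomed = [cid for cid, (src, _) in self.chunks.items() if src == source]
        for cid in doomed:
            del self.chunks[cid]
        return len(doomed)
```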


🧠 Vector Database Powered Search

All document chunks are embedded using HuggingFace embeddings and stored in a Chroma vector database, enabling fast semantic search across all ingested knowledge.
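Chroma performs the similarity search internally. The core idea is to rank the stored chunk vectors by how similar they are to the query vector. The stdlib-only sketch below ranks by cosine similarity over tiny hand-made vectors. Real HuggingFace embeddings have hundreds of dimensions, and Chroma's default distance (L2 unless a collection is configured otherwise) may differ from cosine. `cosine` and `top_k` are hypothetical helpers.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], store: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Return up to k (chunk_id, score) pairs most similar to `query`, best first."""
    scored = [(chunk_id, cosine(query, vec)) for chunk_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A production vector database avoids this linear scan by using an approximate nearest-neighbour index. The ranking principle is the same.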


🧑‍💻 GitHub Repository Indexing

ARIA can ingest GitHub repositories and index their source files.

To avoid excessively long indexing operations, ingestion is intentionally limited by:

  • maximum file count (GITHUB_MAX_FILES)
  • maximum file size (GITHUB_MAX_FILE_BYTES)
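The sketch below shows one way the two limits could be applied when picking files from a cloned or downloaded repository. The function name, the default values, and the order (skip oversized files first, then cap the count) are assumptions. The limits presumably correspond to the `GITHUB_MAX_FILES` and `GITHUB_MAX_FILE_BYTES` variables listed under Environment Configuration.

```python
def select_repo_files(
    files: list[tuple[str, int]],
    max_files: int = 200,  # hypothetical default for GITHUB_MAX_FILES
    max_file_bytes: int = 100_000,  # hypothetical default for GITHUB_MAX_FILE_BYTES
) -> list[str]:
    """Pick paths from (path, size_in_bytes) entries that respect both limits.

    Oversized files are skipped outright. The remaining files are taken in
    sorted path order until `max_files` is reached.
    """
    eligible = sorted(path for path, size in files if size <= max_file_bytes)
    return eligible[:max_files]
```

Sorting makes the selection deterministic, so re-indexing the same repository picks the same files.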

🏗️ System Architecture

ARIA uses a Retrieval-Augmented Generation pipeline.

User
 │
 ▼
Frontend (Next.js)
 │
 ▼
FastAPI Backend
 │
 ├── Ingestion Pipeline
 │     ├ PDF parsing
 │     ├ GitHub repository crawling
 │     └ YouTube transcript extraction
 │
 ├── Embedding Generation
 │     └ HuggingFace embeddings
 │
 ├── Vector Storage
 │     └ Chroma database
 │
 ▼
Retriever
 │
 ▼
LLM (Groq)
 │
 ▼
Generated Response

🛠️ Tech Stack

🎨 Frontend

  • Next.js
  • React
  • TypeScript
  • Tailwind CSS

⚙️ Backend

  • FastAPI
  • LangChain
  • Chroma Vector Database
  • HuggingFace Embeddings

🧠 AI Infrastructure

  • Groq LLM API
  • Retrieval-Augmented Generation (RAG)

📁 Repository Structure

.
├── backend/
│   ├── main.py
│   ├── requirements.txt
│   ├── services/
│   │   ├── ingestion.py
│   │   └── rag.py
│   └── .env.example
│
├── frontend/
│   ├── package.json
│   └── src/
│
└── README.md

⚙️ Environment Configuration

Create a .env file in the backend directory based on .env.example.

🔑 Required

GROQ_API_KEY=

🔧 Optional

DATA_DIR=
GROQ_CHAT_MODEL=
CHROMA_DB_DIR=
UPLOAD_DIR=
CORS_ALLOWED_ORIGINS=
GITHUB_TOKEN=
GITHUB_MAX_FILES=
GITHUB_MAX_FILE_BYTES=
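Here is a sketch of how the backend might read these variables, with the optional ones falling back to defaults. It assumes the .env file has already been loaded into the process environment by whatever mechanism the backend uses. `Settings`, `load_settings`, and every fallback value (the `./storage` layout, the `chroma`/`uploads` subfolders, the numeric limits) are placeholders for illustration, not ARIA's actual defaults. Check `backend/main.py` and `.env.example` for the real ones.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class Settings:
    groq_api_key: str
    groq_chat_model: Optional[str]  # None means "use the backend's built-in default"
    data_dir: Path
    chroma_db_dir: Path
    upload_dir: Path
    github_token: Optional[str]
    github_max_files: int
    github_max_file_bytes: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read configuration from `env` (defaults to os.environ); fallbacks are placeholders."""
    env = os.environ if env is None else env

    def opt(name: str) -> Optional[str]:
        # A blank entry such as `DATA_DIR=` yields "", so treat "" as unset.
        value = env.get(name, "").strip()
        return value or None

    api_key = opt("GROQ_API_KEY")
    if api_key is None:
        raise RuntimeError("GROQ_API_KEY is required (see backend/.env.example)")
    data_dir = Path(opt("DATA_DIR") or "./storage")
    return Settings(
        groq_api_key=api_key,
        groq_chat_model=opt("GROQ_CHAT_MODEL"),
        data_dir=data_dir,
        # Hypothetical layout: storage subfolders live under DATA_DIR unless overridden.
        chroma_db_dir=Path(opt("CHROMA_DB_DIR") or data_dir / "chroma"),
        upload_dir=Path(opt("UPLOAD_DIR") or data_dir / "uploads"),
        github_token=opt("GITHUB_TOKEN"),
        github_max_files=int(opt("GITHUB_MAX_FILES") or 200),
        github_max_file_bytes=int(opt("GITHUB_MAX_FILE_BYTES") or 100_000),
    )
```

Deriving the storage paths from a single `DATA_DIR` means that pointing it at a persistent disk moves all stored data at once.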

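🚀 Local Development Setup

⚙️ Backend

Create backend/.env first (see Environment Configuration), because GROQ_API_KEY is required.

cd backend
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt

python -m uvicorn main:app --host 127.0.0.1 --port 8000

The activation command above is for Windows. On macOS or Linux, run source venv/bin/activate instead.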

🎨 Frontend

cd frontend
npm install
npm run dev

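By default, the frontend expects the backend at:

http://localhost:8000

To use a different backend URL, set:

NEXT_PUBLIC_API_URL

Next.js inlines NEXT_PUBLIC_ variables into the client bundle at build time, so restart the dev server (or rebuild) after changing this value.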

☁️ Render Deployment

The backend can be deployed to Render as a Python web service.

Recommended service settings:

  • Root Directory: backend
  • Build Command: pip install -r requirements.txt
  • Start Command: python -m uvicorn main:app --host 0.0.0.0 --port $PORT
  • Health Check Path: /health
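Render uses the health check path to decide whether the service is up. To check the same endpoint yourself, for example while a free-tier instance is waking from a cold start, you could use a small stdlib poller like the one below. `wait_for_health` is a hypothetical helper. It assumes `/health` returns HTTP 200 once the service is ready.

```python
import time
import urllib.request


def wait_for_health(base_url: str, timeout_s: float = 60.0, interval_s: float = 2.0) -> bool:
    """Poll GET <base_url>/health until it answers HTTP 200 or `timeout_s` elapses."""
    url = base_url.rstrip("/") + "/health"
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError and HTTPError subclass OSError: the service is not ready yet.
            pass
        time.sleep(interval_s)
    return False
```

For example, `wait_for_health("https://<your-service>.onrender.com", timeout_s=120)` returns `True` once the backend responds. The generous timeout allows for slow cold starts.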

Recommended environment variables:

  • GROQ_API_KEY
  • GROQ_CHAT_MODEL
  • GITHUB_TOKEN (optional)
  • DATA_DIR=./storage

This repo is currently configured for the Render free tier, so it does not use a persistent disk.

That means uploaded files, indexed Chroma data, and saved memory can be lost whenever the service restarts, redeploys, or sleeps.

If you later move to a paid Render plan, you can attach a persistent disk and set DATA_DIR to the disk mount path.

If your frontend runs on a domain other than the default Vercel URL in backend/main.py, add it with:

CORS_ALLOWED_ORIGINS=https://your-frontend-domain.com
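Here is a sketch of how an extra origin could be merged with a default origin list. This README does not say whether `CORS_ALLOWED_ORIGINS` accepts more than one origin, so the comma-separated format is an assumption. `DEFAULT_ORIGINS` is a placeholder, not the actual Vercel URL in `backend/main.py`, and `parse_allowed_origins` is hypothetical.

```python
from typing import Optional

# Hypothetical placeholder; the real default origin is defined in backend/main.py.
DEFAULT_ORIGINS = ("https://your-app.vercel.app",)


def parse_allowed_origins(
    raw: Optional[str], defaults: tuple[str, ...] = DEFAULT_ORIGINS
) -> list[str]:
    """Merge comma-separated origins (assumed format) with the defaults, without duplicates."""
    origins = list(defaults)
    for item in (raw or "").split(","):
        # Browser Origin headers never end with "/", so strip any trailing slash.
        origin = item.strip().rstrip("/")
        if origin and origin not in origins:
            origins.append(origin)
    return origins
```

In FastAPI, a list like this is typically passed as `allow_origins` to `CORSMiddleware`. A trailing slash in the configured value is a common reason CORS requests fail, because the comparison against the browser's Origin header is an exact string match.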

📌 Operational Notes

🧑‍💻 GitHub Repository Ingestion

GitHub repositories are indexed with limits to prevent extremely long ingestion jobs.

This keeps indexing practical and avoids excessive memory consumption.


▶️ YouTube Ingestion

YouTube transcript ingestion is currently experimental.

Possible issues include:

  • transcript extraction failure
  • unavailable transcripts
  • empty ingestion results

Because of this, YouTube ingestion should not yet be considered fully reliable.
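Some of these failures can be caught around the transcript fetch itself: validate the URL before fetching, and treat an empty transcript as an error instead of indexing nothing. The helpers below are a hypothetical stdlib sketch. The actual transcript download is not shown and depends on whichever library the backend uses. The 11-character ID pattern reflects how YouTube IDs look in practice, not a documented guarantee.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# In practice, YouTube video IDs are 11 characters from [A-Za-z0-9_-].
_VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from common YouTube URL shapes, or None if unrecognized."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    candidate = ""
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in {"youtube.com", "www.youtube.com", "m.youtube.com"}:
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [""])[0]
        elif parsed.path.startswith(("/shorts/", "/embed/")):
            candidate = parsed.path.split("/")[2]
    return candidate if _VIDEO_ID.fullmatch(candidate) else None


def require_transcript_text(segments: list[str]) -> str:
    """Join transcript segments, raising instead of silently indexing nothing."""
    text = " ".join(s.strip() for s in segments if s.strip())
    if not text:
        raise ValueError("transcript is empty or unavailable; nothing to index")
    return text
```

Failing loudly on an empty transcript turns the "empty ingestion results" case into an error the UI can report.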


⚠️ Current Limitations

☁️ Backend Deployment

The backend can run on Render or similar platforms, but it works best with a persistent disk and enough RAM for embedding generation.

ARIA performs several compute-intensive tasks including:

  • document parsing
  • chunk generation
  • embedding creation
  • vector indexing
  • retrieval orchestration

Most free-tier cloud platforms impose limitations on:

  • memory
  • CPU resources
  • execution time
  • persistent storage

Because of these infrastructure constraints, low-tier or fully ephemeral instances may still struggle with cold starts, indexing speed, and data persistence.


🎙️ Voice Input

ARIA includes an experimental voice input interface.

Voice interaction is not yet fully reliable and may fail due to:

  • browser speech recognition inconsistencies
  • microphone permission handling
  • device compatibility differences
  • speech-to-text accuracy variations

For now, text input is recommended for interacting with ARIA.


🔒 Security

Never commit:

  • .env files
  • API keys
  • GitHub tokens

Always use .env.example as the public configuration template.

If credentials have been exposed in version control or screenshots, rotate them immediately.


🛣️ Roadmap

Planned improvements include:

  • ☁️ scalable backend deployment
  • 📺 improved YouTube ingestion reliability
  • 🧠 persistent cloud vector storage
  • 📄 improved document chunking
  • 📚 better source attribution
  • ⚡ real-time streaming responses
  • 🎙️ improved voice interaction

📜 License

Copyright (c) 2026 tripcoded. All rights reserved.

This repository, its source code, documentation, design, and associated materials are proprietary.

No part of this repository may be copied, reproduced, distributed, modified, sublicensed, published, or used for commercial or non-commercial redistribution without prior explicit written permission from the copyright holder.
