# Knowledge Chat System

A backend application that lets users register, upload documents, and use generative AI to search and answer questions over their own knowledge base. Built with FastAPI, PostgreSQL, Docker, and OpenAI embeddings with a FAISS vector store.

## Features
- User registration and authentication (JWT-based)
- Secure document upload (txt/pdf)
- Automatic document chunking and OpenAI embedding
- Semantic search using FAISS vector index
- Query natural language questions against your knowledge base and get LLM-based answers
- REST API with automatic Swagger UI docs (`/docs`)
## Project Structure

```
knowledge_chat_system/
│
├── app/
│   ├── db/                    # DB session and ORM setup
│   ├── models/                # Pydantic and SQLAlchemy models/schemas
│   │   ├── database.py
│   │   └── schemas.py
│   ├── routes/                # All API endpoint route definitions
│   │   ├── auth.py
│   │   ├── documents.py
│   │   └── query.py
│   ├── services/              # Business logic and abstractions
│   │   ├── auth_service.py
│   │   ├── document_service.py
│   │   ├── embedding_service.py
│   │   └── query_service.py
│   ├── utils/                 # Utility functions (e.g., logger, helpers)
│   ├── config.py              # App-wide config (from .env)
│   └── main.py                # FastAPI app instance
│
├── uploads/                   # Uploaded user docs (ignored by git)
├── faiss_indexes/             # FAISS vector indexes (ignored by git)
├── logs/                      # Log output (ignored by git)
├── env/                       # Local venv (ignored by git)
│
├── requirements.txt           # Python dependencies
├── Dockerfile                 # For Docker builds
├── docker-compose.yml         # Multi-container orchestration
├── .env.example               # Template for env variables
├── .gitignore
│
└── README.md
```
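The FastAPI instance in `app/main.py` is where these pieces come together. A minimal sketch of that wiring, assuming each module under `app/routes/` exposes an `APIRouter` named `router` (an assumption; the repo's actual file may register things differently):

```python
# app/main.py (illustrative sketch, not the repo's exact code)
from fastapi import FastAPI

from app.routes import auth, documents, query

app = FastAPI(title="Knowledge Chat System")

# Each router owns one slice of the API surface.
app.include_router(auth.router, prefix="/auth", tags=["auth"])
app.include_router(documents.router, prefix="/documents", tags=["documents"])
app.include_router(query.router, prefix="/query", tags=["query"])
```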
## Setup

Clone the repository:

```bash
git clone https://github.com/<your-username>/<repo-name>.git
cd <repo-name>
```

Create and activate a virtual environment, then install the dependencies:

```bash
python3 -m venv env
source env/bin/activate  # On Windows: env\Scripts\activate
pip install -r requirements.txt
```

Copy the environment template:

```bash
cp .env.example .env
```

Edit `.env` and add your configuration:
```env
OPENAI_API_KEY=sk-your-openai-key-here
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/knowledge_chat_db
SECRET_KEY=your-secret-key-here
ALGORITHM=HS256
ACCESS_TOKEN_EXPIRE_MINUTES=30
```
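These values are read at startup by `app/config.py`. A minimal sketch of such a loader, assuming python-dotenv and plain `os.getenv` (the repo may use Pydantic settings instead):

```python
# app/config.py (sketch): pull settings from .env into module-level constants.
import os

from dotenv import load_dotenv  # assumption: python-dotenv is installed

load_dotenv()  # reads .env from the project root

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
DATABASE_URL = os.getenv(
    "DATABASE_URL",
    "postgresql://postgres:postgres@localhost:5432/knowledge_chat_db",
)
SECRET_KEY = os.getenv("SECRET_KEY")
ALGORITHM = os.getenv("ALGORITHM", "HS256")
ACCESS_TOKEN_EXPIRE_MINUTES = int(os.getenv("ACCESS_TOKEN_EXPIRE_MINUTES", "30"))
```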
Start the stack with Docker Compose:

```bash
docker-compose up --build
```

This will:

- Start the FastAPI backend on `http://localhost:8000`
- Start the PostgreSQL database on `localhost:5432`
- Create the necessary tables automatically
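For orientation, a `docker-compose.yml` that produces this behavior could look roughly like the sketch below. The service names `fastapi_app` and `postgres` match the commands used later in this README; the image tag, volumes, and port mappings are illustrative assumptions:

```yaml
# Illustrative sketch; see the repo's docker-compose.yml for the real file.
version: "3.8"
services:
  fastapi_app:
    build: .
    ports:
      - "8000:8000"
    env_file: .env
    depends_on:
      - postgres

  postgres:
    image: postgres:16
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: knowledge_chat_db
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:
```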
## API Documentation

- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
## API Endpoints

### Authentication

- `POST /auth/register` - Register a new user:

  ```json
  {
    "username": "alice",
    "email": "alice@example.com",
    "password": "password123"
  }
  ```

- `POST /auth/login` - Log in and receive a JWT token:

  ```json
  {
    "username": "alice",
    "password": "password123"
  }
  ```
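Behind `/auth/login`, the server issues a signed JWT. A minimal sketch of token creation and verification with PyJWT (the JWT library this project uses), with the secret, algorithm, and expiry coming from the `.env` settings; the function names are illustrative:

```python
# JWT issue/verify sketch with PyJWT; function names are illustrative.
from datetime import datetime, timedelta, timezone

import jwt  # PyJWT

SECRET_KEY = "your-secret-key-here"  # app.config.SECRET_KEY in the real app
ALGORITHM = "HS256"
ACCESS_TOKEN_EXPIRE_MINUTES = 30

def create_access_token(username: str) -> str:
    expires = datetime.now(timezone.utc) + timedelta(minutes=ACCESS_TOKEN_EXPIRE_MINUTES)
    return jwt.encode({"sub": username, "exp": expires}, SECRET_KEY, algorithm=ALGORITHM)

def verify_access_token(token: str) -> str:
    # Raises jwt.ExpiredSignatureError / jwt.InvalidTokenError on bad tokens.
    payload = jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
    return payload["sub"]
```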
### Documents

- `POST /documents/upload` - Upload a document (requires authentication)
  - Headers: `Authorization: Bearer <token>`
  - Body: form data with the file
- `GET /documents/` - List all documents for the current user (requires authentication)
  - Headers: `Authorization: Bearer <token>`
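A hedged sketch of what the upload route in `app/routes/documents.py` might look like with FastAPI's `UploadFile`; the auth dependency and the chunking/embedding calls are omitted, and the names here are assumptions rather than the repo's exact code:

```python
# Sketch of a FastAPI file-upload route; auth and embedding steps omitted.
from pathlib import Path

from fastapi import APIRouter, File, UploadFile

router = APIRouter()
UPLOAD_DIR = Path("uploads")

@router.post("/upload")
async def upload_document(file: UploadFile = File(...)):
    # Persist the raw file; chunking and embedding would follow in a service.
    UPLOAD_DIR.mkdir(exist_ok=True)
    dest = UPLOAD_DIR / file.filename
    dest.write_bytes(await file.read())
    return {"filename": file.filename, "size": dest.stat().st_size}
```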
### Query

- `POST /query/` - Query your documents (requires authentication):

  ```json
  {
    "question": "What is machine learning?",
    "top_k": 3
  }
  ```

  - Headers: `Authorization: Bearer <token>`
  - Response includes: answer, sources, context_used, timestamp
## Example Usage (curl)

Register a user:

```bash
curl -X POST "http://localhost:8000/auth/register" \
  -H "Content-Type: application/json" \
  -d '{
    "username": "alice",
    "email": "alice@example.com",
    "password": "password123"
  }'
```

Log in to get a token:

```bash
curl -X POST "http://localhost:8000/auth/login" \
  -H "Content-Type: application/json" \
  -d '{
    "username": "alice",
    "password": "password123"
  }'
```

Upload a document:

```bash
curl -X POST "http://localhost:8000/documents/upload" \
  -H "Authorization: Bearer <your_access_token>" \
  -F "file=@document.txt"
```

Query your documents:

```bash
curl -X POST "http://localhost:8000/query/" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your_access_token>" \
  -d '{
    "question": "What is the main topic of this document?",
    "top_k": 3
  }'
```
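The same flow as a single Python script using `requests`, assuming the login response returns the token in an `access_token` field (the usual FastAPI convention; the field name is an assumption):

```python
# End-to-end client sketch mirroring the curl examples above.
import requests

BASE = "http://localhost:8000"

requests.post(f"{BASE}/auth/register", json={
    "username": "alice", "email": "alice@example.com", "password": "password123",
})

token = requests.post(f"{BASE}/auth/login", json={
    "username": "alice", "password": "password123",
}).json()["access_token"]  # assumed response field name
headers = {"Authorization": f"Bearer {token}"}

with open("document.txt", "rb") as f:
    requests.post(f"{BASE}/documents/upload", headers=headers, files={"file": f})

result = requests.post(f"{BASE}/query/", headers=headers, json={
    "question": "What is the main topic of this document?", "top_k": 3,
}).json()
print(result)
```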
## Docker Commands

Build and start the stack:

```bash
docker-compose up --build
```

Follow the application logs:

```bash
docker-compose logs -f fastapi_app
```

Stop the stack:

```bash
docker-compose down
```

Remove volumes and rebuild from scratch:

```bash
docker-compose down -v
docker-compose up --build
```

Open a psql shell inside the database container:

```bash
docker-compose exec postgres psql -U postgres -d knowledge_chat_db
```

Inspect uploaded documents:

```bash
docker-compose exec postgres psql -U postgres -d knowledge_chat_db -c "SELECT id, original_filename, chunk_count FROM documents;"
```

Reset all documents and users:

```bash
docker-compose exec postgres psql -U postgres -d knowledge_chat_db -c "
DELETE FROM documents;
ALTER SEQUENCE documents_id_seq RESTART WITH 1;
DELETE FROM users;
ALTER SEQUENCE users_id_seq RESTART WITH 1;
"
```
"git init
git add .
git commit -m "Initial commit: Knowledge Chat System"
git branch -M main
git remote add origin https://github.com/<your-username>/<repo-name>.git
git push -u origin main- Go to https://render.com
- Click "New +" β "Web Service"
- Connect your GitHub repository
- Set build command:
pip install -r requirements.txt - Set start command:
uvicorn app.main:app --host 0.0.0.0 --port 10000 - Add environment variables from your
.envfile - Deploy!
### Railway

- Go to https://railway.app
- Click "New Project" → "Deploy from GitHub Repo"
- Select your repository
- Add environment variables
- Set the start command: `uvicorn app.main:app --host 0.0.0.0 --port $PORT`
- Deploy!
## Environment Variables

| Variable | Description | Example |
|---|---|---|
| `OPENAI_API_KEY` | OpenAI API key for embeddings & LLM | `sk-...` |
| `DATABASE_URL` | PostgreSQL connection string | `postgresql://user:pass@host:5432/db` |
| `SECRET_KEY` | JWT secret key | `your-secret-key` |
| `ALGORITHM` | JWT algorithm | `HS256` |
| `ACCESS_TOKEN_EXPIRE_MINUTES` | JWT expiration time (minutes) | `30` |
## Tech Stack

- Backend Framework: FastAPI
- Web Server: Uvicorn
- Database: PostgreSQL with SQLAlchemy ORM
- Vector Store: FAISS
- Embeddings: OpenAI text-embedding-3-small
- LLM: OpenAI GPT-4
- Authentication: JWT (PyJWT)
- Containerization: Docker & Docker Compose
## How It Works

### Authentication

- Secure password hashing with bcrypt
- JWT-based token authentication
- Protected endpoints require valid tokens
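As a rough illustration, bcrypt hashing is often wired up through passlib as below; passlib is an assumption here, since the repo may call the `bcrypt` package directly:

```python
# Password hashing sketch using passlib's bcrypt backend (assumed choice).
from passlib.context import CryptContext

pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")

def hash_password(password: str) -> str:
    return pwd_context.hash(password)  # salted bcrypt hash

def verify_password(plain: str, hashed: str) -> bool:
    return pwd_context.verify(plain, hashed)
```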
### Document Processing

- Upload and store documents (txt/pdf)
- Automatic chunking into manageable pieces
- OpenAI embeddings for semantic understanding
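A minimal sketch of the chunk-then-embed step with the OpenAI Python SDK and the `text-embedding-3-small` model named in the tech stack above; the fixed chunk size and overlap are illustrative values, not the repo's actual parameters:

```python
# Chunk a document and embed each chunk (parameter values are illustrative).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def embed_chunks(chunks: list[str]) -> list[list[float]]:
    resp = client.embeddings.create(model="text-embedding-3-small", input=chunks)
    return [item.embedding for item in resp.data]
```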
### Semantic Search

- FAISS index for fast similarity search
- Cosine similarity scoring
- Top-K retrieval for context
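Cosine similarity in FAISS is commonly realized as an inner-product index over L2-normalized vectors; a sketch continuing from `embed_chunks` above (the helper names are illustrative):

```python
# FAISS cosine-similarity sketch: normalize vectors, then use inner product.
import faiss
import numpy as np

def build_index(embeddings: list[list[float]]) -> faiss.Index:
    vectors = np.asarray(embeddings, dtype="float32")
    faiss.normalize_L2(vectors)               # unit rows: inner product == cosine
    index = faiss.IndexFlatIP(vectors.shape[1])
    index.add(vectors)
    return index

def search(index: faiss.Index, query_embedding: list[float], top_k: int = 3):
    q = np.asarray([query_embedding], dtype="float32")
    faiss.normalize_L2(q)
    scores, ids = index.search(q, top_k)      # cosine scores, chunk indices
    return scores[0], ids[0]
```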
### Answer Generation

- OpenAI GPT-4 for answer generation
- Context-aware responses
- Source attribution
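The answer step places the retrieved chunks into the prompt as context; a hedged sketch with the OpenAI chat API (the system prompt wording is an assumption):

```python
# Answer-generation sketch: retrieved chunks become prompt context.
from openai import OpenAI

client = OpenAI()

def answer_question(question: str, context_chunks: list[str]) -> str:
    context = "\n\n".join(context_chunks)
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Answer using only the provided context. "
                        "Say so if the context does not contain the answer."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content
```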
## Troubleshooting

**OpenAI API errors**
Solution: Ensure the `.env` file contains your OpenAI API key and is loaded properly.

**Database connection errors**
Solution: Make sure the PostgreSQL container is running:

```bash
docker-compose ps
```

**Queries return no answer**
Solution: Check the logs and ensure:

- the document is uploaded
- the FAISS index exists
- the OpenAI API key is valid
## Notes

- Security: Never commit a `.env` file with real secrets
- Data: `uploads/`, `logs/`, and `faiss_indexes/` are ignored by git
- Testing: Use the `/docs` endpoint to test all APIs interactively
- Local Development: Change `DATABASE_URL` if you use a different Postgres setup
## Contributing

Feel free to fork, modify, and improve the project. For major changes, please open an issue first.

## License

This project is provided as-is for educational and assignment purposes.

## Support

For issues or questions, please create a GitHub issue or contact the project maintainers.

Last Updated: October 27, 2025