An AI-powered chatbot for FinSolve Technologies that provides role-based access to company documents using Retrieval-Augmented Generation (RAG).
Employees across departments query company information through a conversational interface, and role-based access controls restrict each user to the documents their department and clearance level permit.
- Backend: FastAPI application with RAG pipeline
- Frontend: React-based chat interface
- Vector Store: ChromaDB for document embeddings and retrieval
- LLM: HuggingFace API for natural language generation
- Authentication: Role-based access control system
- Database: MongoDB for user management (if needed for persistence)
Component | Technology |
---|---|
Backend Framework | FastAPI |
Frontend Framework | React 18 |
Vector Database | ChromaDB |
LLM Provider | HuggingFace |
Embeddings | sentence-transformers/all-MiniLM-L6-v2 |
Authentication | Basic Auth with RBAC |
Styling | Tailwind CSS |
HTTP Client | Axios |
Role | Username | Password | Access Permissions |
---|---|---|---|
C-Level Executive | tony_sharma | password123 | ALL departments (Finance, Marketing, HR, Engineering, General) |
Engineering | peter_pandey | engineer123 | Engineering documents + General employee handbook |
Finance | finance_user | finance123 | Finance documents + General employee handbook |
Marketing | marketing_user | marketing123 | Marketing documents + General employee handbook |
HR | hr_user | hr123 | HR documents + General employee handbook |
Employee | employee_user | employee123 | ONLY General employee handbook |
The system processes and provides access to the following document types:
- engineering_master_doc.md: Complete technical architecture, development processes, technology stack, security frameworks, testing methodologies, and operational guidelines
- financial_summary.md: Annual financial performance, expense analysis, cash flow data
- quarterly_financial_report.md: Detailed Q1-Q4 2024 financial results and projections
- hr_data.csv: Employee dataset with demographics, compensation, leave, attendance, and performance data (100 employee records)
- marketing_report_2024.md: Annual marketing performance overview
- marketing_report_q1_2024.md: Q1 marketing campaigns and metrics
- marketing_report_q2_2024.md: Q2 marketing campaigns and metrics
- marketing_report_q3_2024.md: Q3 marketing campaigns and metrics
- market_report_q4_2024.md: Q4 marketing campaigns and metrics
- employee_handbook.md: Company policies, benefits, procedures, code of conduct, and general employee information
- Python 3.10+
- Node.js 16+
- Yarn package manager
# Install Python dependencies
cd /app
pip install -e .
# Install frontend dependencies
cd frontend
yarn install
Backend .env file is already configured:
HUGGINGFACE_API_KEY=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
MONGO_URL=mongodb://localhost:27017/finsolve_rag
CHROMA_PERSIST_DIRECTORY=./chroma_db
Frontend .env file:
REACT_APP_BACKEND_URL=http://localhost:8000
Backend Server:
cd /app
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
Frontend Server:
cd /app/frontend
yarn start
Run the comprehensive test suite:
cd /app
python backend_test.py
This tests:
- ✅ Authentication for all user roles
- ✅ Invalid login attempts
- ✅ Role-based document access restrictions
- ✅ Chat functionality with proper RBAC
- ✅ Cross-role access prevention
- ✅ API endpoint functionality
Test Finance User Access:
curl -X POST -u finance_user:finance123 \
-H "Content-Type: application/json" \
-d '{"message":"What was our revenue in Q4 2024?"}' \
http://localhost:8000/api/chat
Test Employee Access Restriction:
curl -X POST -u employee_user:employee123 \
-H "Content-Type: application/json" \
-d '{"message":"What was our revenue in Q4 2024?"}' \
http://localhost:8000/api/chat
- "What was our total revenue for 2024?"
- "What is our current technology stack?"
- "How many employees do we have and what's their performance distribution?"
- "What were our most successful marketing campaigns?"
- "What is our technology stack and architecture?"
- "How do we handle CI/CD and deployment processes?"
- "What security frameworks and compliance standards do we follow?"
- "What are our monitoring and maintenance procedures?"
- "What was our Q4 2024 revenue and how did it compare to previous quarters?"
- "What are our main expense categories and cost breakdowns?"
- "How did our gross margins perform throughout 2024?"
- "What are our cash flow trends and financial risks?"
- "What was our marketing spend in 2024 and how was it allocated?"
- "Which marketing campaigns performed best in terms of ROI?"
- "What is our customer acquisition cost and conversion rates?"
- "How did our brand awareness and engagement metrics change?"
- "How many employees do we have across different departments?"
- "What is the average performance rating and attendance percentage?"
- "What are our leave utilization patterns and balance trends?"
- "What is our salary distribution and compensation structure?"
- "What are the company leave policies and how do I apply?"
- "What benefits and perquisites are available to employees?"
- "What are the working hours and attendance requirements?"
- "How do performance reviews and career development work?"
- Strict Permission Enforcement: Users can only access documents from their authorized departments
- Cross-Role Access Prevention: Finance users cannot see engineering data, employees cannot see executive information
- Source Attribution: All responses include clear source document references
- Authentication Required: All endpoints (except health check) require valid authentication
- Document Metadata: Each document chunk includes department classification
- Query Filtering: Vector store searches are filtered by user permissions before retrieval
- Secure Authentication: Basic auth with encrypted password storage
- Error Handling: Graceful degradation without information leakage
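The README does not say how passwords are protected at rest. One common way to avoid storing them in plaintext is salted key derivation, sketched below with the Python standard library only. The function names and the iteration count are illustrative assumptions, not taken from the FinSolve codebase.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Illustrative work factor; an assumption, not a value from the project.
PBKDF2_ITERATIONS = 100_000


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Derive a key from the password and return (salt, derived_key)."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for each stored password
    key = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, PBKDF2_ITERATIONS
    )
    return salt, key


def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, expected_key)
```

With this approach only the salt and derived key are stored, and a login attempt is checked by calling `verify_password` with the submitted password.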
- Access the Frontend: Navigate to http://localhost:3000
- Login: Use any of the provided demo accounts or click the quick login buttons
- Chat Interface:
  - Type questions in natural language
  - Get AI-powered responses with source citations
  - Use suggested questions for your role
  - View conversation history with timestamps
- Role-Specific Experience:
  - See only relevant sample questions for your role
  - Responses limited to your authorized document access
  - Clear indication of your current role and permissions
Health Check:
curl http://localhost:8000/api/health
Login:
curl -X POST -H "Content-Type: application/json" \
-d '{"username":"tony_sharma","password":"password123"}' \
http://localhost:8000/api/login
Chat:
curl -X POST -u username:password \
-H "Content-Type: application/json" \
-d '{"message":"Your question here"}' \
http://localhost:8000/api/chat
Get Available Documents:
curl -u username:password http://localhost:8000/api/documents
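The chat call can also be issued from Python. The sketch below builds the same request as the curl example (Basic auth header, JSON body with a `message` field) using only the standard library. The helper name `build_chat_request` is hypothetical, and the response format is not documented here, so the sketch stops at constructing the request.

```python
import base64
import json
import urllib.request


def build_chat_request(
    base_url: str, username: str, password: str, message: str
) -> urllib.request.Request:
    """Build a POST request for /api/chat with HTTP Basic authentication."""
    # Basic auth is "username:password", base64-encoded.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/chat",
        data=body,
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending the request requires the backend to be running on port 8000:
#   req = build_chat_request("http://localhost:8000", "finance_user",
#                            "finance123", "What was our revenue in Q4 2024?")
#   with urllib.request.urlopen(req) as resp:
#       print(resp.read().decode("utf-8"))
```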
- Total Documents Processed: 134 chunks across 10 files
- Document Types: Markdown (.md) and CSV (.csv) files
- Vector Embeddings: 384-dimensional embeddings using sentence-transformers
- Supported Users: 6 different roles with varying permission levels
- Test Coverage: 28 comprehensive test cases with 100% pass rate
- Document Processing:
  - Documents are chunked into 1000-character segments with 200-character overlap
  - Each chunk is embedded using sentence-transformers
  - Metadata includes department classification and source file information
- Query Processing:
  - User query is embedded using the same model
  - Vector similarity search finds relevant document chunks
  - Results are filtered by user's role permissions
- Response Generation:
  - Retrieved documents provide context for the LLM
  - HuggingFace API generates natural language responses
  - Response includes source citations and department attributions
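The chunking step under Document Processing can be sketched as follows. This is a minimal illustration that assumes plain character-based splitting; the function name `chunk_text` is hypothetical, and the project's actual splitter may instead respect sentence or section boundaries.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into fixed-size character chunks that overlap their neighbours."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - overlap  # each new chunk starts this far after the previous one
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks
```

With the defaults, consecutive chunks start 800 characters apart, so the last 200 characters of one chunk repeat at the start of the next. The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk.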
ROLE_PERMISSIONS = {
"finance": ["finance", "general"],
"marketing": ["marketing", "general"],
"hr": ["hr", "general"],
"engineering": ["engineering", "general"],
"c_level": ["finance", "marketing", "hr", "engineering", "general"],
"employee": ["general"]
}
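A mapping like this could drive retrieval filtering in two ways, sketched below. The helper names and the `department` metadata key are assumptions for illustration. The dictionary returned by `build_department_filter` follows the `$in` operator form that ChromaDB metadata `where` filters accept, but whether the project filters at query time or afterwards is not confirmed by this README.

```python
from typing import Dict, List

ROLE_PERMISSIONS: Dict[str, List[str]] = {
    "finance": ["finance", "general"],
    "marketing": ["marketing", "general"],
    "hr": ["hr", "general"],
    "engineering": ["engineering", "general"],
    "c_level": ["finance", "marketing", "hr", "engineering", "general"],
    "employee": ["general"],
}


def allowed_departments(role: str) -> List[str]:
    """Return the departments a role may read; unknown roles get nothing."""
    return ROLE_PERMISSIONS.get(role, [])


def build_department_filter(role: str) -> dict:
    """Build a metadata filter restricting a vector search to permitted departments."""
    departments = allowed_departments(role)
    if not departments:
        raise PermissionError(f"role {role!r} has no document access")
    return {"department": {"$in": departments}}


def filter_chunks(chunks: List[dict], role: str) -> List[dict]:
    """Post-filter retrieved chunks, keeping only those the role may see."""
    allowed = set(allowed_departments(role))
    return [chunk for chunk in chunks if chunk.get("department") in allowed]
```

Filtering before retrieval keeps restricted chunks out of the candidate set entirely, while the post-filter can act as a second check on whatever the search returns.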
- Advanced Authentication: Integration with LDAP/Active Directory
- Audit Logging: Track all user queries and accessed documents
- Document Versioning: Handle document updates and change tracking
- Advanced Analytics: Usage patterns and popular query analysis
- Multi-language Support: Internationalization for global teams
- Mobile Application: Native mobile apps for on-the-go access
Backend won't start:
- Check if port 8000 is available
- Verify all Python dependencies are installed
- Check logs: tail -f server.log
Frontend won't connect:
- Ensure backend is running on port 8000
- Verify REACT_APP_BACKEND_URL in frontend/.env
- Check if port 3000 is available
Authentication failures:
- Verify username/password combinations
- Check network connectivity between frontend and backend
- Review browser console for CORS errors
No documents found:
- Verify documents exist in /app/resources/data/
- Check vector store initialization in backend logs
- Confirm user has permissions for the queried department
For technical issues or questions:
- Engineering Team: peter@finsolve.com
- System Admin: admin@finsolve.com
- Documentation: This README and in-code comments
Built with ❤️ for FinSolve Technologies
Empowering teams with secure, intelligent access to company knowledge