A web application that ranks products based on community sentiment on Reddit. Comments for various products are aggregated from related subreddits to figure out what people actually recommend and use.
First, clone the project:
git clone https://github.com/Vchen7629/Cyphria.git
Note: if you don't have Node.js installed on your PC, install it first so you can use the npm package manager: https://nodejs.org/en/download
cd frontend
npm install
npm run dev
This project uses the UV package manager to manage dependencies.
- Installing UV (Windows)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
- Frontend: React interface
- Backend: FastAPI and Apache Airflow
- Data Stores: PostgreSQL and Valkey
- Infrastructure: Kubernetes (K3s)
The data pipeline is split into two categories: historic and daily.
Fresh sentiment for each product is ingested and processed daily to keep rankings up to date.
- Source: Reddit API with PRAW
- Orchestration: Apache Airflow ensures comments are processed in the correct order and triggers ingestion and processing for a different topic every hour of the day to avoid rate limits.
- Processing: Ingested data is processed by separate services for sentiment analysis, ranking, and summarization.
- Storage: Data is stored in a PostgreSQL database to be queried later.
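
The hourly topic rotation described in the orchestration step can be sketched in plain Python. This is only an illustration of the scheduling idea, not the project's actual Airflow DAG: the `TOPICS` list and the `topic_for_hour` helper are hypothetical names, and the real topic set is not listed here.

```python
from datetime import datetime, timezone

# Hypothetical topic list (assumption); the project's real topics may differ.
TOPICS = ["gpu", "laptop", "headphone", "keyboard"]


def topic_for_hour(hour: int, topics: list[str] = TOPICS) -> str:
    """Return the topic scheduled for the given hour (0-23), cycling through the list."""
    if not 0 <= hour <= 23:
        raise ValueError(f"hour must be in 0..23, got {hour}")
    # Modulo spreads topics across the day so each run only hits one topic.
    return topics[hour % len(topics)]


if __name__ == "__main__":
    current_hour = datetime.now(timezone.utc).hour
    print(f"Topic to ingest this hour: {topic_for_hour(current_hour)}")
```

Handling a single topic per hourly run keeps the number of Reddit API requests in any one window small, which is the rate-limit concern the orchestration step is meant to address.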
Comments older than one month are pulled from monthly academic torrent sources to fill out the sentiment history.
- Source:
- Processing: The large dataset is processed via an Apache Spark cluster.
- Storage: Processed data is stored in a PostgreSQL database to be queried later.
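
As a rough sketch of how a comment could be assigned to the historic or the daily pipeline, the check below compares a comment's creation time against a cutoff. It assumes "one month" is approximated as 30 days and that comments carry a Unix timestamp (Reddit exposes this as `created_utc`); `HISTORIC_CUTOFF` and `is_historic` are hypothetical names, not taken from the project.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "one month" is approximated as 30 days.
HISTORIC_CUTOFF = timedelta(days=30)


def is_historic(created_utc: float, now: datetime) -> bool:
    """Return True if a comment created at the given Unix timestamp is older than the cutoff."""
    created = datetime.fromtimestamp(created_utc, tz=timezone.utc)
    return now - created > HISTORIC_CUTOFF


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    two_months_ago = (now - timedelta(days=60)).timestamp()
    print(is_historic(two_months_ago, now))
```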