CrisisCast

🆘 Real-time crisis detection from social signals using scalable big data pipelines and AI.

Note: For a more detailed analysis and screenshots, see the Project Report.

CrisisCast is a fault-tolerant, real-time social signal decoder for emergencies. It ingests, processes, classifies, and visualizes crisis-related content from multiple digital platforms, transforming raw social media and news feeds into actionable intelligence for emergency responders and decision-makers.

📑 Table of Contents

🚀 Features
🏗️ Architecture
🛠️ Tech Stack
📋 Prerequisites
⚙️ Usage
📁 Project Structure
✨ Future Enhancements
👥 Contributors

Features

Multi-source Ingestion: Streams posts from Reddit, Bluesky (AT Protocol), and Google News RSS.
Scalable Streaming: Built on Apache Kafka (three-broker cluster) and Spark Structured Streaming for high-throughput, fault-tolerant data pipelines.
AI-powered Classification: FLAN-T5-base model hosted via FastAPI to classify posts into 13 crisis categories.
Semantic Search: Embeds text with SentenceTransformer (all-MiniLM-L6-v2) and stores vectors in Qdrant for low-latency, metadata-filtered similarity search.
Interactive Dashboard: Real-time visualizations (live feed, time-series, heatmaps, semantic search) built with Plotly Dash.
Resilience & Monitoring: Exponential backoff, Kafka replication, Spark checkpointing, memory monitoring, and Docker-based containerization.

Architecture

Ingestion Layer (ingestion/):
- Custom Kafka producers serialize and publish JSON posts from each source.
Processing Layer (processing/ & classification/):
- Spark Structured Streaming consumes Kafka topics, enforces schema, timestamps records, and calls the classification API.
- Classification service (FastAPI + FLAN-T5) returns crisis labels and metadata.
- Batch embedding and upserts to Qdrant via the utils/ helper module.
Storage:
- MongoDB: Time-series store for enriched post metadata and classification labels.
- Qdrant: Vector database for similarity search with metadata filtering.
Visualization Layer (dashboard/):
- Plotly Dash app retrieves data from MongoDB and Qdrant, rendering live feeds, trend charts, and semantic search results.

Tech Stack

Apache Kafka – Distributed event streaming
PySpark (Structured Streaming) – Distributed micro-batch processing
FastAPI + HuggingFace Transformers (FLAN-T5) – Classification API
SentenceTransformers – Text embedding
Qdrant – Vector similarity search
MongoDB – NoSQL time-series storage
Plotly Dash – Interactive web dashboard
Docker & Docker Compose – Container orchestration

Prerequisites

Docker & Docker Compose (≥ 1.29)
Python 3.8+
Java (for standalone Spark, if running locally)
MongoDB
Reddit API credentials
Bluesky account credentials
pip install requirements.txt

Usage

Live Feed: Displays the latest 20 posts (auto-refresh every 10 s).
Trend Analysis: Time-series charts of crisis mentions over selectable windows (6, 12, 24 h).
Semantic Search: Enter a free-text query to surface similar crisis posts via vector search.
Filtering: By source, crisis category, and time window.

Project Structure

crisiscast/
├── classification/        # FastAPI service for FLAN-T5 classification
├── consumer/              # Kafka consumers & MongoDB/Qdrant upserts
├── dashboard/             # Plotly Dash web application
├── ingestion/             # Kafka producers for Reddit, Bluesky, Google News
├── processing/            # Spark Structured Streaming job
├── utils/                 # Shared utility functions (schema, embedding, logging)
├── tests/                 # Unit and integration tests
├── docker-compose.yml     # Multi-container orchestration
├── requirements.txt       # Python dependencies
├── start_up.sh            # Helper script to launch components in order
└── README.md              # This document

Future Enhancements

GPU-accelerated model serving (e.g., Triton Inference Server)
Automated model fine-tuning with incoming data (RAG pipeline)
Kubernetes-based Spark and Kafka clusters for dynamic scaling
Multi-language support and expanded source coverage
Alerting and notification integration (Slack, SMS)

Contributors

Austin Huang
Nikhil Soni
Omer Basar
Sarasa Pattabiraman
Tanaya Pawar

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CrisisCast

📑 Table of Contents

Features

Architecture

Tech Stack

Prerequisites

Usage

Project Structure

Future Enhancements

Contributors

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
classification		classification
consumer		consumer
dashboard		dashboard
ingestion		ingestion
processing		processing
tests		tests
utils		utils
.gitignore		.gitignore
README.md		README.md
docker-compose.yml		docker-compose.yml
requirements.txt		requirements.txt
start_up.txt		start_up.txt

Folders and files

Latest commit

History

Repository files navigation

CrisisCast

📑 Table of Contents

Features

Architecture

Tech Stack

Prerequisites

Usage

Project Structure

Future Enhancements

Contributors

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages