A data platform composed of Twitter content and a sentiment analysis pipeline
- Elasticsearch
- Apache Pulsar (Stream & Stream Processing)
- FastAPI + WebSockets (Query Tweets + Sentiment Scores)
- Batch + Streaming Pipeline (Tweet Acquisition + Scoring)
Elasticsearch stores tweets processed through Pulsar and Pulsar Functions. The crawler is a set of scripts/functions that load/stream data into Pulsar, parse it, and run NLP sentiment analysis. The API queries Elasticsearch and triggers crawler jobs that load content into Elasticsearch.
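The flow can be sketched end to end in plain Python. The list and dict below are in-memory stand-ins for a Pulsar topic and an Elasticsearch index, and the keyword scorer is a placeholder for the real NLP model — all names here are illustrative, not the project's actual identifiers.

```python
# Conceptual sketch: crawler -> Pulsar topic -> scoring function -> Elasticsearch.
# `tweets_topic` and `index` are in-memory stand-ins for the real services.
import json

tweets_topic: list = []   # stands in for a Pulsar topic
index: dict = {}          # stands in for an Elasticsearch index

def crawl(raw_tweets):
    """Crawler scripts publish raw tweets onto the topic."""
    for tweet in raw_tweets:
        tweets_topic.append(json.dumps(tweet))

def score(text):
    """Placeholder for the NLP sentiment model run by a Pulsar Function."""
    return 1.0 if "love" in text.lower() else 0.0

def consume():
    """Pulsar Function side: enrich each message and index the result."""
    while tweets_topic:
        tweet = json.loads(tweets_topic.pop(0))
        tweet["sentiment"] = score(tweet["text"])
        index[tweet["id"]] = tweet
```

The API then reads from the index side only; it never touches the topic directly.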
Run Pulsar locally
docker run -it \
-p 6650:6650 \
-p 8080:8080 \
--mount source=pulsardata,target=/pulsar/data \
--mount source=pulsarconf,target=/pulsar/conf \
apachepulsar/pulsar:2.8.1 \
bin/pulsar standalone
Run Elasticsearch locally
docker pull docker.elastic.co/elasticsearch/elasticsearch:7.15.2
docker run -p 127.0.0.1:9200:9200 -p 127.0.0.1:9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.15.2
conda env create -f environment.yml
conda activate sentiment
python setup.py develop
./sentiment_api/scripts/runserver-dev.sh
Requires Elasticsearch to be running on localhost:9200
python tweet_analysis/orchestrator.py
- Fetch Latest Content (param: days_back)
- Fetch Top Trending Content
- Search for Content
- Scrape Top Trending Tags for content -> Elasticsearch
- Perform Sentiment Analysis on content in Elasticsearch
- Query the last 10 minutes of tweets in ES
- Score tweet text
- Push updates to ES
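The batch pass above can be sketched as follows. The index/field names (`created_at`, `text`, `sentiment`) and the keyword-based `score_text()` are assumptions to be swapped for the real mappings and NLP model; the returned query dict would go to the elasticsearch-py client's `search`, and the update bodies to a bulk request.

```python
# Sketch of the batch scoring pass: query N minutes back, score, push updates.
from datetime import datetime, timedelta, timezone

def recent_tweets_query(minutes_back: int = 10) -> dict:
    """Range query for tweets indexed in the last N minutes (field name assumed)."""
    since = datetime.now(timezone.utc) - timedelta(minutes=minutes_back)
    return {"query": {"range": {"created_at": {"gte": since.isoformat()}}}}

def score_text(text: str) -> float:
    """Placeholder scorer; the pipeline runs a real NLP model here."""
    positive = {"good", "great", "love"}
    negative = {"bad", "awful", "hate"}
    words = text.lower().split()
    if not words:
        return 0.0
    return (sum(w in positive for w in words) - sum(w in negative for w in words)) / len(words)

def scored_updates(hits: list) -> list:
    """Map query hits to partial-update bodies for a bulk request back to ES."""
    return [
        {"_id": hit["_id"], "doc": {"sentiment": score_text(hit["_source"]["text"])}}
        for hit in hits
    ]
```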
- Pulsar Function that performs sentiment analysis on content as it is streamed into pulsar topics
- Research pulsar-admin and how to deploy functions in production (Dockerfile? docker_run.sh?)
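A minimal sketch of that streaming scorer, written as a plain Python function module of the kind `pulsar-admin functions create --py ...` can deploy. The keyword-based `score_text()` stands in for the real NLP model, and the tweet JSON shape is an assumption.

```python
# Pulsar Function sketch: score each tweet as it streams through a topic.
# Deployed, Pulsar calls process() once per message on the input topic and
# publishes the return value to the output topic.
import json

POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}

def score_text(text: str) -> float:
    """Placeholder; the deployed function would load the real NLP model once."""
    words = text.lower().split()
    if not words:
        return 0.0
    return (sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)) / len(words)

def process(input):
    """Enrich the incoming tweet JSON with a sentiment score."""
    tweet = json.loads(input)
    tweet["sentiment"] = score_text(tweet.get("text", ""))
    return json.dumps(tweet)
```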
- Endpoints:
- Query a user's home timeline (tweepy)
- Fetch tweets by keyword (elasticsearch)
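For the keyword endpoint, the request can translate into a simple match query against the tweet text. The index and field names below are assumptions; a FastAPI route would pass its `keyword` query parameter here and forward the dict to the elasticsearch-py client's `search`.

```python
# Illustrative query builder for the fetch-tweets-by-keyword endpoint.
def keyword_search_query(keyword: str, size: int = 20) -> dict:
    """Match tweets containing the keyword, newest first, scores included."""
    return {
        "size": size,
        "sort": [{"created_at": {"order": "desc"}}],
        "query": {"match": {"text": keyword}},
        "_source": ["text", "created_at", "sentiment"],
    }
```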
Will likely need a docker-compose.yml for easy launch of a local dev instance (Pulsar and Elasticsearch included)
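A starting point for that compose file, mirroring the two docker run commands above; the service and volume names are assumptions.

```yaml
# Sketch only: image tags match the docker run commands in this README.
version: "3.8"
services:
  pulsar:
    image: apachepulsar/pulsar:2.8.1
    command: bin/pulsar standalone
    ports:
      - "6650:6650"
      - "8080:8080"
    volumes:
      - pulsardata:/pulsar/data
      - pulsarconf:/pulsar/conf
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.15.2
    environment:
      - discovery.type=single-node
    ports:
      - "127.0.0.1:9200:9200"
      - "127.0.0.1:9300:9300"
volumes:
  pulsardata:
  pulsarconf:
```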