COVID-19 Live Tweet Analyzer

Pulls tweets from around the world and analyzes the most popular words and hashtags, the most-tweeted locations, and more. Data is ingested with Kafka, stored in Cassandra, analyzed with Spark, and scheduled with Airflow.

The system consists of three Apache Kafka microservices: a producer that pulls tweets from Twitter and pushes them to the raw_tweet_data Kafka topic; a consumer-producer that reads the raw tweets, parses them, and publishes them to a raw_tweet_data Kafka topic; and a consumer that reads the parsed tweets and writes them to Cassandra.
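As a rough illustration of the middle (consumer-producer) service, here is a minimal sketch assuming the kafka-python client and tweets shaped like Twitter API v1.1 objects. The function names, the selected fields, and the default broker address are assumptions for illustration, not the code in KafkaConsumerProducer.py; topic names are passed in as parameters.

```python
import json


def parse_tweet(raw: dict) -> dict:
    """Reduce a raw tweet object to the fields the analysis needs.

    The field names follow the Twitter API v1.1 tweet object; which fields
    the real service keeps is an assumption.
    """
    entities = raw.get("entities") or {}
    user = raw.get("user") or {}
    return {
        "id": raw.get("id_str"),
        "created_at": raw.get("created_at"),
        "text": raw.get("text", ""),
        "hashtags": [h["text"].lower() for h in entities.get("hashtags", [])],
        "location": user.get("location"),
    }


def run_consumer_producer(in_topic: str, out_topic: str,
                          servers: str = "localhost:9092") -> None:
    """Read raw tweets from in_topic, parse them, and publish to out_topic."""
    # Imported here so parse_tweet can be used without kafka-python installed.
    from kafka import KafkaConsumer, KafkaProducer

    consumer = KafkaConsumer(
        in_topic,
        bootstrap_servers=servers,
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    )
    producer = KafkaProducer(
        bootstrap_servers=servers,
        value_serializer=lambda d: json.dumps(d).encode("utf-8"),
    )
    for message in consumer:  # blocks, yielding messages as they arrive
        producer.send(out_topic, parse_tweet(message.value))
```

Keeping the parsing in a pure function means it can be checked without a running broker; only run_consumer_producer needs Kafka.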

After this process is done, an Apache Spark service is spun up to pull the data from Cassandra and analyze it as described above (most popular words, hashtags, most-tweeted locations, and more). After the analysis, the results are appended to an incremental results file that can be used for dashboarding and similar purposes.
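The kind of aggregation described above can be sketched in plain Python. This is a stand-in for the Spark job, not the code in SparkCassandraAnalysis.py: it uses collections.Counter on an in-memory list instead of Spark on Cassandra rows, and the record fields (text, hashtags, location) and the JSON-lines results format are assumptions.

```python
import json
import re
from collections import Counter

WORD_RE = re.compile(r"[a-z']+")


def analyze(tweets, top_n=3):
    """Count the most common words, hashtags, and locations in parsed tweets."""
    words, hashtags, locations = Counter(), Counter(), Counter()
    for tweet in tweets:
        # Drop hashtags, mentions, and URLs before counting plain words.
        text = re.sub(r"(#|@|https?://)\S+", " ", tweet.get("text", "").lower())
        words.update(WORD_RE.findall(text))
        hashtags.update(tweet.get("hashtags", []))
        if tweet.get("location"):
            locations[tweet["location"]] += 1
    return {
        "top_words": words.most_common(top_n),
        "top_hashtags": hashtags.most_common(top_n),
        "top_locations": locations.most_common(top_n),
    }


def append_results(results, path):
    """Append one run's results as a single JSON line (incremental file)."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(results) + "\n")
```

Appending one JSON object per run keeps earlier results intact, so a dashboard can read the file line by line and plot how the counts change over time.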

All of these services are scheduled by Apache Airflow.
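One way such a schedule could be wired up is sketched below, assuming Airflow 2.x and its BashOperator. The script names are taken from the repository, but the DAG id, the task ids, the shell commands, and the assumption that each Kafka script runs for a bounded time and then exits are all hypothetical; the actual airflowDag.py may be organized differently.

```python
from datetime import datetime

# (task_id, shell command) pairs. Task ids and commands are illustrative.
KAFKA_TASKS = [
    ("twitter_producer", "python KafkaTwitterProducer.py"),
    ("consumer_producer", "python KafkaConsumerProducer.py"),
    ("write_to_cassandra", "python KafkaConsumerWriteToCassandra.py"),
]
# A real spark-submit call would likely need extra options (e.g. a
# Cassandra connector package); they are omitted here.
SPARK_TASK = ("spark_analysis", "spark-submit SparkCassandraAnalysis.py")


def build_dag():
    """Build a DAG that runs the Kafka services, then the Spark analysis."""
    # Imported here so the task list can be inspected without Airflow installed.
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="covid_tweet_pipeline",  # hypothetical name
        start_date=datetime(2020, 1, 1),
        schedule_interval=None,         # trigger manually
        catchup=False,
    ) as dag:
        kafka = [BashOperator(task_id=tid, bash_command=cmd)
                 for tid, cmd in KAFKA_TASKS]
        spark = BashOperator(task_id=SPARK_TASK[0], bash_command=SPARK_TASK[1])
        # The Spark task starts only after all three Kafka tasks finish.
        kafka >> spark
    return dag
```

In a DAG file, assigning `dag = build_dag()` at module level lets the Airflow scheduler discover it.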

High-level system architecture (see covid-19-twitter-analytics.png):

NOTE
Please don't take this project's architecture as an indicator of my skills or understanding of the technologies.
This project was built for hands-on experience, with the purpose of setting up and using each of these technologies in one project.

Usage

(These steps assume that Kafka, Airflow, Spark, and Cassandra are all set up and ready to go, and that the user has a Twitter API account and credentials.)

Spin up ZooKeeper so that the Kafka services can work (this could be done from Airflow; still a TODO).
Start the Airflow DAG to kick off the process.

Repository files:

.gitignore
Kafka shell commands.txt
KafkaConsumerProducer.py
KafkaConsumerWriteToCassandra.py
KafkaTwitterProducer.py
LICENSE.md
README.md
SparkCassandraAnalysis.py
airflowDag.py
covid-19-twitter-analytics.png
createDBandTableCassandra.sql
exampleCassandraInteraction.py
exampleCommentTweetJSON.json
tweet fields.txt

vicmar57/COVID-19-Live-Tweet-Analyzer-Kafka-Spark-Cassandra-and-Airflow
