
trannguyenhan/X-news


Install

pip3 install scrapy
pip3 install bs4
pip3 install lxml
pip3 install pyspark
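To confirm that the four packages above are importable by the interpreter you will run the crawler with, a small stdlib-only check can help. This is a sketch and not part of the repository; `missing_modules` and `REQUIRED_MODULES` are hypothetical names.

```python
import importlib.util

# Import names for the packages installed above (here they match the pip names).
REQUIRED_MODULES = ["scrapy", "bs4", "lxml", "pyspark"]


def missing_modules(names):
    """Return the subset of `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All crawler/Spark dependencies are importable.")
```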

Create a new topic in Kafka (run from the Kafka installation directory):

bin/kafka-topics.sh --create --partitions 1 --replication-factor 1 --topic x_news_1 --bootstrap-server localhost:9092
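If you want to script the topic setup, the same command can be wrapped with Python's `subprocess`. This is only a sketch: the helper names are hypothetical and the Kafka install path is whatever you pass in; the flags are exactly those in the command above.

```python
import subprocess
from pathlib import Path


def build_create_topic_cmd(kafka_home, topic="x_news_1", partitions=1,
                           replication_factor=1,
                           bootstrap_server="localhost:9092"):
    """Build the README's kafka-topics.sh invocation as an argv list."""
    script = str(Path(kafka_home) / "bin" / "kafka-topics.sh")
    return [
        script, "--create",
        "--partitions", str(partitions),
        "--replication-factor", str(replication_factor),
        "--topic", topic,
        "--bootstrap-server", bootstrap_server,
    ]


def create_topic(kafka_home, **kwargs):
    """Run the command; raises CalledProcessError if kafka-topics.sh fails."""
    subprocess.run(build_create_topic_cmd(kafka_home, **kwargs), check=True)
```

For example, `create_topic("/opt/kafka")` would create x_news_1 if Kafka were installed under the (assumed) path /opt/kafka.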

Run

Change into the consumer folder and run the Kafka consumer from the packaged JAR:

java -jar target/consumer-V1-jar-with-dependencies.jar 
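To inspect what lands in x_news_1 without the Java consumer, a throwaway Python reader can be handy. This assumes the `kafka-python` package (`pip3 install kafka-python`, not in the install list above) and that message values are UTF-8 JSON, which the README does not state; `decode_values` and `main` are hypothetical names.

```python
import json


def decode_values(values):
    """Decode raw Kafka message values (bytes) into Python objects.

    Assumes values are UTF-8 JSON; anything that is not valid JSON is
    yielded as plain text. Empty (None) values are skipped.
    """
    for raw in values:
        if raw is None:
            continue
        text = raw.decode("utf-8")
        try:
            yield json.loads(text)
        except json.JSONDecodeError:
            yield text


def main():
    # Assumption: kafka-python is installed; it is not in the README's list.
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "x_news_1",
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",  # read from the oldest retained message
    )
    try:
        # Each ConsumerRecord exposes its payload as .value (bytes here).
        for obj in decode_values(record.value for record in consumer):
            print(obj)
    finally:
        consumer.close()
```

Calling `main()` prints messages until interrupted; `decode_values` is kept separate so the decoding can be checked without a broker.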

Then change into the crawler folder and start the news spider:

scrapy crawl news
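The README does not show how the news spider hands its items to Kafka. One common way to do this in Scrapy is an item pipeline; below is a hypothetical sketch assuming `kafka-python` and JSON-encoded values, not the repository's actual code.

```python
import json


def serialize_item(item):
    """Encode a scraped item (dict or scrapy.Item) as UTF-8 JSON bytes."""
    return json.dumps(dict(item), ensure_ascii=False).encode("utf-8")


class KafkaPublishPipeline:
    """Hypothetical Scrapy item pipeline that publishes each item to Kafka.

    Scrapy calls open_spider/process_item/close_spider on pipelines listed
    in ITEM_PIPELINES; the class itself needs no Scrapy import.
    """

    def __init__(self, producer=None, topic="x_news_1",
                 bootstrap_servers="localhost:9092"):
        self.producer = producer  # injected in tests; created lazily otherwise
        self.topic = topic
        self.bootstrap_servers = bootstrap_servers

    def open_spider(self, spider):
        if self.producer is None:
            # Assumption: kafka-python is installed (not in the README's list).
            from kafka import KafkaProducer
            self.producer = KafkaProducer(bootstrap_servers=self.bootstrap_servers)

    def process_item(self, item, spider):
        self.producer.send(self.topic, value=serialize_item(item))
        return item  # pass the item on to any later pipelines

    def close_spider(self, spider):
        self.producer.flush()  # make sure buffered messages are delivered
        self.producer.close()
```

To enable a pipeline like this, list it in the Scrapy project's `ITEM_PIPELINES` setting, e.g. `{"<project>.pipelines.KafkaPublishPipeline": 300}`, where the module path depends on the crawler's layout. When a pipeline class defines neither `from_crawler` nor `from_settings`, Scrapy instantiates it with no arguments, so the defaults above are what it would use.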

Refer

About

A data pipeline built with Scrapy, Kafka, Spark Streaming, Spark ML, Elasticsearch and Kibana.
