Big Data Pipeline:

Prediction of Football Match Results

This pipeline combines several Big Data technologies (Hadoop MapReduce, Kafka, HBase and Spark) to predict the result of a football match.

Requirements

Pull the following Docker image. It is an Ubuntu image with Hadoop (2.7.2), Spark (2.2.1), Kafka (2.11-1.0.2) and HBase (1.4.8):

   docker pull liliasfaxi/spark-hadoop:hv-2.7.2

Data

We took the data from Kaggle: Matches in LaLiga.
The table consists of 37147 lines.
Each line represents the result of one football match.

Architecture

- Took the dataset from Kaggle, which contains results of football matches in LaLiga, and did some preprocessing.

- Launched a MapReduce job to calculate the average number of goals scored by each team.

- Sent the generated output through Kafka to be stored in HBase.

- Launched a Spark job to extract that data from HBase and use it to predict the result of a football match between the two teams named in a user's request.
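
The MapReduce step above (average goals per team) can be illustrated with a small, framework-free sketch. This is not the job shipped in the hadoop folder. It only mimics the map, shuffle and reduce phases in plain Python. The record layout (`home_team,away_team,home_goals,away_goals`), the sample lines and all function names are assumptions made for illustration; the real football.txt schema may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record layout (an assumption, not the real football.txt schema):
# home_team,away_team,home_goals,away_goals
SAMPLE_LINES = [
    "Barcelona,Sevilla,3,1",
    "Sevilla,Valencia,2,2",
    "Valencia,Barcelona,0,1",
]


def map_goals(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit one (team, goals_scored) pair for each side of a match."""
    home, away, home_goals, away_goals = line.strip().split(",")
    yield home, int(home_goals)
    yield away, int(away_goals)


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group the emitted values by team, as the framework would."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for team, goals in pairs:
        grouped[team].append(goals)
    return grouped


def reduce_average(team: str, goals: List[int]) -> Tuple[str, float]:
    """Reduce step: average all goal counts collected for one team."""
    return team, sum(goals) / len(goals)


def average_goals(lines: Iterable[str]) -> Dict[str, float]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_goals(line))
    return dict(reduce_average(team, goals) for team, goals in shuffle(pairs).items())


if __name__ == "__main__":
    # Print one tab-separated "team<TAB>average" line per team.
    for team, avg in sorted(average_goals(SAMPLE_LINES).items()):
        print(f"{team}\t{avg:.2f}")
```

In a real Hadoop job the shuffle is performed by the framework, so only the map and reduce functions would need to be written. The tab-separated output resembles what a later stage could forward through Kafka.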
