Investigate how we can use multiple different Natural Language Processing techniques and methods in order to automatically recognize the main actions in sports events

yanismiraoui/Analyzing-sports-commentary-in-order-to-automatically-recognize-events-and-extract-insights

Analyzing sports commentary in order to automatically recognize events and extract insights

➡️ This repo gathers the code written to investigate how multiple Natural Language Processing techniques can be used to automatically recognize the main actions in sports events. The aim is to extract insights by analyzing live sports commentary from different sources and classifying these major actions.

🔗 Paper

DATA: 🗂️

  • events.csv : main data file used for our analysis
  • events_cleaned.csv : cleaned version of the main data file
  • events_sentiment : portion of the main data file used for sentiment analysis
  • livescore30000.csv : data scraped from the website livescore.com
  • pred_livescore30000.csv : predictions made on the dataset scraped from livescore.com
  • pred_livescore30000_true.csv : predictions made on the dataset scraped from livescore.com along with the true labels
  • dictionary.txt : description of the different event types and their label
  • transcript_paralympic.txt : example transcript of live commentary from the 2021 Paralympics
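
As an illustration of how a file that pairs predictions with true labels (such as pred_livescore30000_true.csv) could be scored, here is a minimal sketch using only the Python standard library. The column names `true_label` and `predicted_label`, the event labels, and the sample rows are assumptions made up for this example; the actual schema of the CSV files and the label set in dictionary.txt may differ.

```python
import csv
import io
from collections import Counter


def per_label_scores(rows, true_col="true_label", pred_col="predicted_label"):
    """Return overall accuracy and per-label precision/recall.

    The column names are hypothetical defaults, not the repo's schema.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    correct = total = 0
    for row in rows:
        truth, pred = row[true_col], row[pred_col]
        total += 1
        if truth == pred:
            correct += 1
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this label, but it was wrong
            fn[truth] += 1  # missed the true label
    scores = {}
    for label in sorted(set(tp) | set(fp) | set(fn)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        scores[label] = {
            "precision": tp[label] / p_den if p_den else 0.0,
            "recall": tp[label] / r_den if r_den else 0.0,
        }
    accuracy = correct / total if total else 0.0
    return accuracy, scores


# Made-up rows standing in for a real predictions file.
SAMPLE = """text,true_label,predicted_label
"Goal! A header from the centre of the box.",goal,goal
"Yellow card shown for a bad foul.",card,card
"Corner conceded by the defender.",corner,goal
"Substitution, the striker comes off.",substitution,substitution
"""

accuracy, scores = per_label_scores(csv.DictReader(io.StringIO(SAMPLE)))
print(f"accuracy = {accuracy:.2f}")
for label, values in scores.items():
    print(label, values)
```

To score a real file, the `io.StringIO(SAMPLE)` argument could be replaced with an opened file handle, with `true_col` and `pred_col` adjusted to the file's actual headers.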

NOTEBOOKS: 📚

  • transcription_evaluation.ipynb : transcribe the audio data using the Google Speech-to-Text API and evaluate the quality of the transcription
  • data_analysis_xgboost.ipynb : explore the distribution of the dataset and perform a first analysis of the data
  • baseline_svm.ipynb : build an SVM model that carefully follows the methods used in the baseline
  • otherclassifiers.ipynb : build other classification models using the same methods and cleaning processes
  • scrape_and_evaluation_commentaries.ipynb : scrape the live commentaries of a sample of football games from livescore.com and evaluate the performance of our model on this unseen dataset
  • bert_collab.ipynb : build a BERT classification model using Google Colab
  • sentiment_bert_collab.ipynb : analyze whether BERT sentiment analysis can help detect main events, using Google Colab
  • grahics.ipynb : build the graphics that will be presented in the research report
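
To give a feel for the kind of text classification the notebooks above perform, here is a minimal sketch of a bag-of-words classifier for commentary lines. It deliberately swaps in a small multinomial Naive Bayes model written with the standard library, not the SVM or BERT models from the notebooks, so that it runs without extra dependencies. The training sentences, the labels, and the names `tokenize` and `NaiveBayes` are all made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.label_counts = Counter(labels)      # label -> number of documents
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = [t for t in tokenize(text) if t in self.vocab]
        n_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(n / n_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up commentary lines; real labels would come from dictionary.txt.
train_texts = [
    "Goal! Smith scores with a right footed shot from the centre of the box.",
    "Goal! A header into the bottom of the net.",
    "Jones is shown the yellow card for a bad foul.",
    "Red card! Brown is sent off for a dangerous tackle.",
    "Substitution, Lee replaces Park.",
    "Substitution for the home side, Diaz comes on for Ruiz.",
]
train_labels = ["goal", "goal", "card", "card", "substitution", "substitution"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("Yellow card for a late foul"))
print(model.predict("Substitution, Kim replaces Cho"))
```

A model this small only shows the overall shape of the pipeline (tokenize, fit, predict); it says nothing about the accuracy reported for the SVM or BERT models.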

🔗 Website of the demo

🔗 Github repository of the demo
