Skip to content

A web app for real-time transcription, speaker diarization, and sentiment analysis

License

Notifications You must be signed in to change notification settings

yujansaya/diarization_sentiment_analyse

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

23 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Audio Diarization and Sentiment Analysis

This project provides a web application for audio diarization and sentiment analysis of conversations. Audio diarization is the process of segmenting and labeling an audio recording based on speaker identities, while sentiment analysis aims to extract sentiment or psychological insights from the conversation. This repository contains Python code utilizing Flask, Whisper API, OpenAI API, and Pyannote to achieve these functionalities.

Functionality:

  • Users can upload audio files (in WAV, MP3, or M4A formats) through a web interface.Screenshot 2024-04-26 at 1 39 58 PM
  • Upon upload, the system transcribes the audio using a speaker diarization algorithm. Screenshot 2024-04-26 at 1 27 55 PM
Screenshot 2024-04-26 at 1 37 46 PM
  • It then performs sentiment analysis on the transcribed text. Screenshot 2024-04-26 at 1 37 57 PM

  • Results, including the transcript and sentiment analysis, are displayed to the user.

Components:

  • main.py: Contains the Flask application, defining routes for file upload and handling, transcribing audio, and performing sentiment analysis using OpenAI.
  • diarisation.py: Implements the speaker diarization algorithm using the Pyannote library. It preprocesses the audio, extracts speaker embeddings, performs clustering, and generates a transcript with speaker labels.
  • upload.html: HTML template for the upload form and result display. Utilizes JavaScript and jQuery for asynchronous file upload and progress tracking.

Dependencies:

  • Flask: Web framework for Python.
  • Pyannote: Library for speaker diarization.
  • OpenAI API: Used for sentiment analysis.
  • Google Cloud Storage: Used for storing uploaded audio files.
  • FFmpeg: Required for audio preprocessing.
  • Werkzeug: Utility library for Flask.
  • Bootstrap: CSS framework for frontend styling.

Workflow:

  • Users upload an audio file through the web interface.
  • The file is processed, transcribed, and analyzed asynchronously.
  • Upon completion, the transcript and sentiment analysis results are displayed to the user.

Deployment: This Flask application is deployed on Google Cloud, access it through this link

Installation

Before running the application, ensure you have Python installed on your system. Additionally, install the required Python packages using pip. The required packages are listed in requirements.txt. As well there is archived ffmpeg file, please unzip it (if you are using Mac). Other wise you coukd download the file from this link. And please remember to change the path to the ffmpeg file in main.py.

os.environ["PATH"] += os.pathsep + f'YOUR ABSOLUTE PATH TO FFMPEG FILE'

Usage

  1. Clone this repository to your local machine:
git clone https://github.com/your_username/your_repository.git
cd your_repository
  1. Set up your OpenAI API key in main.py with your own API key.

  2. Run the Flask application:

python main.py
  1. Open your web browser and go to http://127.0.0.1:5000/. You will see a file upload form.

  2. Upload an audio file.

  3. Wait for the processing to complete. The application will transcribe the audio and perform sentiment analysis. Once done, you will see the transcription and sentiment analysis displayed on the web page.

File Structure

  • main.py: Contains Flask routes for handling file upload, audio diarization, and sentiment analysis.
  • diarisation.py: Implements the SpeakerDiarizer class for audio diarization using Pyannote.
  • upload.html: HTML template for the file upload form.

Dependencies

  • Flask: A micro web framework for Python.
  • Pyannote: A toolkit for speaker diarization and speaker embedding.
  • Whisper API: Used for transcribing audio.
  • OpenAI API: Used for sentiment analysis.
  • NumPy: A library for numerical computing.
  • scikit-learn: A machine learning library for Python.

Final thoughts

Upon finishing the project, I reflected on the following points:

  • Performance Optimization: Consider optimizing the application for better performance, such as by optimizing code execution, improving concurrency, or implementing caching mechanisms.

  • User Experience Enhancements: Enhance the user experience by adding features like real-time progress updates during audio processing, better error handling, or interactive visualizations for analysis results.

  • Scalability and Deployment: Think about strategies for scaling the application, such as deploying it to scalable cloud platforms like Google App Engine or Kubernetes for handling increased user traffic.

About

A web app for real-time transcription, speaker diarization, and sentiment analysis

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published