Counterfactual Analysis for Spoken Dialog Summarization

COS514 Final Project, Fall 2023, Princeton University

Authors: Tanushree Banerjee, Kiyosu Maeda

Instructor: Prof. Sanjeev Arora

This project analyzes how speaker diarization and speech recognition errors affect the quality of text summarization for spoken dialogue. By injecting artificial errors into transcripts and summarizing them with large language models (LLMs), we study how errors in upstream tasks propagate to the downstream summarization task.
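As a rough illustration of what injecting artificial errors can look like, the sketch below corrupts a transcript in two ways: word-level edits to mimic speech recognition errors, and speaker-label swaps to mimic diarization errors. This is not the code in src/error_injection.py. The function names, the `(speaker, text)` segment format, and the uniform choice among substitution, deletion, and insertion are all assumptions made for the example, and the speaker swap works per segment, whereas a real diarization error rate is measured over time.

```python
import random


def inject_word_errors(words, wer, vocab, seed=0):
    """Corrupt roughly a `wer` fraction of words (hypothetical helper).

    Each selected word is substituted, deleted, or followed by an inserted
    word, with the replacement drawn from `vocab`.
    """
    rng = random.Random(seed)
    out = []
    for word in words:
        if rng.random() >= wer:
            out.append(word)  # keep the word unchanged
            continue
        op = rng.choice(["sub", "del", "ins"])
        if op == "sub":
            out.append(rng.choice(vocab))
        elif op == "ins":
            out.extend([word, rng.choice(vocab)])
        # "del": append nothing, so the word is dropped
    return out


def inject_speaker_errors(segments, der, seed=0):
    """Reassign the speaker of roughly a `der` fraction of segments.

    `segments` is a list of (speaker, text) pairs; this format is an
    assumption, not the repository's actual data layout.
    """
    rng = random.Random(seed)
    speakers = sorted({speaker for speaker, _ in segments})
    out = []
    for speaker, text in segments:
        others = [s for s in speakers if s != speaker]
        if others and rng.random() < der:
            speaker = rng.choice(others)  # attribute the turn to someone else
        out.append((speaker, text))
    return out


if __name__ == "__main__":
    segments = [("A", "we should finalize the design"), ("B", "I agree")]
    vocab = ["budget", "meeting", "button"]
    noisy = inject_speaker_errors(segments, der=0.5, seed=1)
    noisy = [(s, " ".join(inject_word_errors(t.split(), 0.2, vocab, seed=1)))
             for s, t in noisy]
    print(noisy)
```

Fixing the seed keeps each corrupted transcript reproducible, which matters when comparing summaries across error levels.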

Project Structure

project-root/
├── data/: Contains raw and processed data.
│   ├── AMI/: Data from the AMI corpus.
│   │   └── ${meeting_id}/
│   │       ├── abstractive_annotation.txt: Annotated abstractive summary.
│   │       ├── extractive_annotation.txt: Annotated extractive summary.
│   │       └── segments_wer_0_der_0.txt: Annotated dialogue.
│   ├── raw/: Original data from the ICSI or AMI corpus.
│   └── processed/: Processed data for the project.
├── src/: Python source code for the different stages.
│   ├── data_preparation.py: Script for loading and preparing data.
│   ├── error_injection.py: Script for injecting errors into transcripts.
│   ├── summarization.py: Script for summarizing transcripts.
│   └── evaluation.py: Script for evaluating the summaries.
├── requirements.txt: List of Python dependencies.
├── LICENSE: License information for the project.
└── README.md: Project documentation.
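The per-meeting layout under data/AMI can be addressed with a small path helper. The sketch below is an assumption-laden illustration: `meeting_files` is a hypothetical function, the meeting ID in the usage line is only an example, and reading the `wer` and `der` numbers in the segments filename as error levels (with other levels following the same naming pattern) is an inference from `segments_wer_0_der_0.txt`, not something the repository states.

```python
from pathlib import Path


def meeting_files(data_root, meeting_id, wer=0, der=0):
    """Return the expected annotation and transcript paths for one meeting.

    Hypothetical helper: the filename pattern for non-zero `wer`/`der`
    values is assumed, not confirmed by the repository.
    """
    meeting_dir = Path(data_root) / "AMI" / meeting_id
    return {
        "abstractive": meeting_dir / "abstractive_annotation.txt",
        "extractive": meeting_dir / "extractive_annotation.txt",
        "segments": meeting_dir / f"segments_wer_{wer}_der_{der}.txt",
    }


if __name__ == "__main__":
    # "ES2002a" is used purely as an example meeting ID.
    for name, path in meeting_files("data", "ES2002a").items():
        print(name, path, "(exists)" if path.exists() else "(missing)")
```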

Getting Started

  1. Clone this repository:
    git clone REPO_URL
  2. Set up a conda environment and install the dependencies from the requirements.txt file:
    conda create --name cos514 --file requirements.txt

Alternatively, use the install.sh script to create the environment and install the dependencies:

    bash scripts/install.sh
  3. Activate the conda environment:
    conda activate cos514
  4. Set your OpenAI API key as an environment variable:
    export OPENAI_API_KEY=YOUR_API_KEY
  5. Run the main script:
    python main.py
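Since the run depends on the OPENAI_API_KEY environment variable, it can help to fail early with a clear message when the key is missing. The check below is a minimal sketch; `require_api_key` is a hypothetical helper and is not known to be part of main.py.

```python
import os


def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key from the environment, or raise if it is unset or blank."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; run `export {name}=...` first")
    return key


if __name__ == "__main__":
    try:
        require_api_key()
        print("OPENAI_API_KEY found")
    except RuntimeError as err:
        print(err)
```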

License

This project is licensed under the MIT License.
