# Narrative Labelling with GPT-4

## Overview
This document describes a Python script that processes multiple CSV files of narrative data. For each file, the script uses OpenAI's API to generate one overarching label for the file's claims and entities, then writes the labels to a new CSV file.

## Dependencies
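- `openai`: To call OpenAI's GPT-4 model. The code below uses `openai.ChatCompletion`, which is only available in versions of the package before 1.0.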
- `csv`: For handling CSV file operations.
- `os`: To interact with the file system.

## Process Description
1. **OpenAI API Key Setup**: The script loads the OpenAI API key from an environment variable.
2. **Function to Process CSV**: A dedicated function (`process_csv`) reads a CSV file, extracts unique entities and claims from the rows, and returns them.
3. **Directory and Output File Setup**: The script sets the directory containing the narrative CSV files (`Narratives/`) and the path for the output CSV (`Narratives/narrative_labelling.csv`).
4. **Iterating Through Files**: The script iterates through the files in the directory and processes those whose names start with `narrative_` and end with `.csv`.
5. **Generating Super Labels with OpenAI**: For each file, it combines the claims and entities and sends them to OpenAI's model to generate a comprehensive label that represents the overall topic.
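6. **Error Handling**: The OpenAI request for each file is wrapped in a `try`/`except` block. On failure, the script prints the error and moves on to the next file, so that file is left out of the output.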
7. **Writing Results to CSV**: The generated labels (`super labels`) are written to a new CSV file along with the respective file names.
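
Steps 2 and 5 can be sketched on in-memory data without calling the API. This is a minimal illustration, not the script itself: the sample rows are hypothetical, the helper names `extract_entities_and_claims` and `build_prompt` are introduced here for the example, and the prompt wording is shortened. It assumes each narrative CSV has an `entities` column (comma-separated) and a `text` column.

```python
import csv
import io

# Hypothetical sample data: two rows using the `entities` and `text` columns.
SAMPLE_CSV = """entities,text
"WHO, vaccine",Claim one about vaccines
"vaccine, CDC",Claim two about vaccines
"""

def extract_entities_and_claims(csv_text):
    """Return (sorted unique entities, claims) from CSV text (step 2)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    unique_entities = set()
    claims = []
    for row in reader:
        # Strip whitespace so "vaccine" and " vaccine" count as one entity
        unique_entities.update(
            e.strip() for e in row["entities"].split(",") if e.strip()
        )
        claims.append(row["text"])
    return sorted(unique_entities), claims

def build_prompt(claims, entities):
    """Combine claims and entities into a single prompt string (step 5)."""
    return (
        "Given a set of claims and unique entities, please generate a single, "
        "comprehensive short label that represents the overall topic. "
        "Claims: " + "; ".join(claims) + ". "
        "Entities: " + ", ".join(entities) + "."
    )

entities, claims = extract_entities_and_claims(SAMPLE_CSV)
prompt = build_prompt(claims, entities)
print(prompt)
```

Sorting the entities is a choice made for this sketch so that the same input always yields the same prompt; a plain `set` has no guaranteed order.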

## Conclusion
The script processes narrative data from multiple CSV files, using OpenAI's GPT-4 model to create one succinct, overarching label that summarizes the content of each file.


import openai
import csv
import os

# Load your OpenAI API key from an environment variable for better security
openai.api_key = os.getenv("OPENAI_API_KEY")  # Assumes the key is exported under this name

# Function to read and process the CSV file
def process_csv(file_path):
    with open(file_path, mode='r', encoding='utf-8') as file:
        reader = csv.DictReader(file)
        rows = list(reader)
        unique_entities = set()
        claims = []

        # Extract unique entities and claims
        for row in rows:
            # Entities are assumed to be comma-separated; strip whitespace and drop empty items
            entities = [e.strip() for e in row['entities'].split(',') if e.strip()]
            unique_entities.update(entities)
            claims.append(row['text'])

        return sorted(unique_entities), claims  # Sorted so the prompt is reproducible across runs

# Directory containing the CSV files
directory_path = 'Narratives/'

# Path for the output CSV file
output_file_path = 'Narratives/narrative_labelling.csv'

# List to store the results
results = []

# Iterate through each file in the directory
for filename in os.listdir(directory_path):
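    # Skip the output file: it lives in the same directory and also matches the name pattern
    if (filename.startswith("narrative_") and filename.endswith(".csv")
            and filename != os.path.basename(output_file_path)):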
        csv_file_path = os.path.join(directory_path, filename)

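        # Read the file and generate a super label for all claims and entities
        try:
            # Get unique entities and claims from the CSV; doing this inside the
            # try block means a malformed file is reported and skipped
            unique_entities_list, claims = process_csv(csv_file_path)
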
            # Prepare the content for OpenAI
            content = ("You are an intelligent assistant skilled in generating overarching topics for fact-checking data. "
                       "Given a set of claims and unique entities, please generate a single, comprehensive short label that represents the overall topic these elements might be part of. "
                       "Claims: " + '; '.join(claims) + ". "
                       "Entities: " + ', '.join(unique_entities_list) + ".")
            
            # Generate the label using OpenAI
            response = openai.ChatCompletion.create(
                model="gpt-4",
                messages=[{"role": "user", "content": content}]
            )

            # Extract the response
            super_label = response.get('choices', [{}])[0].get('message', {}).get('content', '').strip()

            # Add the results to the list
            results.append([filename, super_label])

        except Exception as e:
            print(f"An error occurred while generating the super label for {filename}: {e}")

# Write the results to a CSV file
with open(output_file_path, mode='w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerow(['File Name', 'Super Label'])
    writer.writerows(results)
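
If the installed `openai` package is version 1.0 or later, `openai.ChatCompletion` is no longer available and the request has to go through a client object instead. The following is a minimal sketch of the same request in that style, not part of the original script. The function name `generate_super_label` is hypothetical, and a stand-in client is used so the example runs without network access or an API key.

```python
from types import SimpleNamespace

def generate_super_label(client, content, model="gpt-4"):
    """Request a label through the openai>=1.0 client interface.

    `client` is expected to be an `openai.OpenAI()` instance, which reads
    the key from the OPENAI_API_KEY environment variable by default.
    """
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": content}],
    )
    # In openai>=1.0 the response is an object, so use attribute access
    label = response.choices[0].message.content
    return (label or "").strip()

# Stand-in client (an assumption for this sketch) that mimics the shape of
# the real response object.
def _fake_create(model, messages):
    message = SimpleNamespace(content="  Vaccine safety misinformation \n")
    return SimpleNamespace(choices=[SimpleNamespace(message=message)])

fake_client = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=_fake_create))
)

label = generate_super_label(fake_client, "Claims: ... Entities: ...")
print(label)
```

With the real library, the client would be created with `from openai import OpenAI` and `client = OpenAI()`, then passed to the function in place of `fake_client`.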