# Automatic Topic Labelling with the OpenAI API

## Overview
This document describes a Python script that processes a CSV file of topics. For each topic, the script calls the OpenAI API to generate a human-readable label and a domain name, then writes the enriched rows to a new CSV file.

## Dependencies
- `openai`: For calling OpenAI's GPT-4 model. The script uses `openai.ChatCompletion`, which belongs to the pre-1.0 interface of the library.
- `csv`: For reading and writing CSV files.
- `json`: For JSON parsing (imported, although the script as shown does not call it).
- `os`: For interacting with the operating system.
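
As an illustration of the role `os` plays here, the following is a minimal sketch of reading the API key from the environment. The helper name `load_api_key` is hypothetical, and the variable name `OPENAI_API_KEY` is an assumption about how the environment is configured; neither comes from the original script.

```python
import os


def load_api_key(var_name="OPENAI_API_KEY"):
    """Return the API key stored in the given environment variable.

    `load_api_key` and the default variable name are illustrative
    assumptions. A RuntimeError is raised when the variable is unset or
    empty, so a missing key is reported before any request is attempted.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return key
```

With a helper like this, the script could assign `openai.api_key = load_api_key()` and fail early with a clear message when the key is missing.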

## Process Flow
1. **Set Up the OpenAI API Key**: Load the OpenAI API key from an environment variable rather than hard-coding it in the script.
2. **CSV File Paths**: Set the paths for the input CSV (`biden_topic_info.csv`) and the output CSV (`biden_topic_labelled.csv`).
3. **Reading the CSV File**: Read the input CSV with `csv.DictReader` and store the rows in a list.
4. **Processing Each Row**: Extract the `Topic`, `Representation`, and `Representative_Docs` columns from each row.
5. **Generating Labels with OpenAI**: Send the topic data to the `gpt-4` model through `openai.ChatCompletion.create` and ask for a label and a domain name.
6. **Handling the Response**: Split the response on the first comma into a label and a domain. If the response contains no comma, use the whole response as the label and leave the domain empty. Store both values in the row as `Human_Readable_Topic` and `Domain_Name`.
7. **Error Handling**: Catch exceptions for each row so that one failed request does not stop the run. The error is printed, and the row is written without a label or domain.
8. **Writing to a New CSV File**: Write all rows, including the two new columns, to the output CSV.
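
Steps 6 to 8 can be tried without calling the API. The sketch below is one possible arrangement, not the script's own structure: the function names are hypothetical, and `generate` is an assumed stand-in for the OpenAI request that takes a row and returns the model's reply as a string.

```python
import csv
import io


def split_label_domain(response_text):
    """Split 'label, domain' on the first comma; domain is '' without a comma."""
    label, _, domain = response_text.partition(",")
    return label.strip(), domain.strip()


def label_rows(rows, generate):
    """Add label and domain columns to each row, skipping rows whose call fails.

    `generate` is a hypothetical stand-in for the OpenAI request.
    """
    for row in rows:
        try:
            label, domain = split_label_domain(generate(row))
        except Exception as exc:
            # One failed row is reported and left unlabelled; the loop goes on.
            print(f"Skipping topic {row.get('Topic')}: {exc}")
            continue
        row["Human_Readable_Topic"] = label
        row["Domain_Name"] = domain
    return rows


def rows_to_csv(rows, fieldnames):
    """Serialize rows to CSV text; keys missing from a row become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Passing a fake `generate` function makes it possible to check the parsing and the CSV output before spending API calls. Rows that failed simply lack the two new keys, and `csv.DictWriter` writes empty cells for them.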

## Conclusion
This script shows how OpenAI's GPT-4 model can be used to enrich CSV data: it gives each topic a human-readable label and assigns it to a domain.


In [None]:
import openai
import csv
import json
import os

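# Load your OpenAI API key from an environment variable for better security.
# The variable name OPENAI_API_KEY is an assumption; adjust it to match your environment.
openai.api_key = os.getenv('OPENAI_API_KEY')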

# Path to the input CSV file
csv_file_path = 'biden_topic_info.csv'

# Path to the output CSV file
updated_csv_file_path = 'biden_topic_labelled.csv'

# Read the CSV file
with open(csv_file_path, mode='r', encoding='utf-8') as file:
    reader = csv.DictReader(file)
    rows = list(reader)

# Process each row
for row in rows:
    try:
        # Extract the necessary columns
        topic = row['Topic']
        representation = row['Representation']
        representative_docs = row['Representative_Docs']

        # Generate the human-readable label and domain name using OpenAI
        response = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{
                "role": "user",
                "content": ("You are an intelligent assistant skilled in generating labels and domain names for topics for fact-checking data. "
                            "Given the following details about a topic: "
                            f"Non-human readable Topic: {topic}. "
                            f"Representative Documents: {representative_docs}, "
                            f"Keywords: {representation}. "
                            "Please generate a concise, human-readable label for this topic, and suggest a relevant topic domain/context name such as politics, healthcare, finance, or social issues. "
                            # The parsing step below splits on the first comma, so ask for that format explicitly
                            "Return only the label and the topic domain name, separated by a comma.")
            }]
        )

        # Extract the response
        response_text = response.get('choices', [{}])[0].get('message', {}).get('content', '').strip()

        # Check if the response contains a comma
        if ',' in response_text:
            label, domain = response_text.split(',', 1)  # Splitting the response into label and domain
        else:
            label = response_text  # Use the entire response as the label
            domain = ''  # Default to an empty string if no domain is provided

        # Update the row with the generated data
        row['Human_Readable_Topic'] = label.strip()
        row['Domain_Name'] = domain.strip()

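    except Exception as e:
        # Report which topic failed and continue; the row is written without a label or domain
        print(f"An error occurred while processing topic {row.get('Topic')}: {e}")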

# Write the updated data to a new CSV file
with open(updated_csv_file_path, mode='w', newline='', encoding='utf-8') as file:
    writer = csv.DictWriter(file, fieldnames=reader.fieldnames + ['Human_Readable_Topic', 'Domain_Name'])
    writer.writeheader()
    writer.writerows(rows)