# ScrapUncle Call Analysis Pipeline

This pipeline processes audio or text logs from customer support conversations and extracts:

- Call Summary  
- Sentiment (Positive / Neutral / Negative)  
- Topic of Inquiry
- Corrected Transcript

---

## Core Workflow Steps

1. **File Upload**  
   Supported formats: `.mp3`, `.wav`, `.m4a`, `.flac`, `.txt`, `.json`

2. **Audio Preprocessing**    
   - Tools: `librosa`, `soundfile`

3. **Transcription and Speaker Diarization (audio to text)**  
   - Transcription model: Whisper (`medium`)
   - Speaker diarization: `Resemblyzer + KMeans`

4. **Conversation Formatting**  
   - Converts plain transcript to structured user/agent JSON

5. **Summarization**  
   - Model: `philschmid/bart-large-cnn-samsum` (Hugging Face)

6. **Topic Extraction**  
   - Model: `KeyBERT`

7. **Sentiment Analysis**  
   - Model: `cardiffnlp/twitter-roberta-base-sentiment` – Hugging Face
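
The first decision in the workflow is whether a file needs the audio steps (2 and 3) or can go straight to text processing. A minimal sketch of that routing, using the extensions listed in step 1; the helper name `route_by_extension` is hypothetical and is not part of the notebook:

```python
import os

# Extensions taken from the "File Upload" step above.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac"}
TEXT_EXTENSIONS = {".txt", ".json"}

def route_by_extension(file_name):
    """Return "audio", "text", or None for an unsupported file (hypothetical helper)."""
    ext = os.path.splitext(file_name)[1].lower()
    if ext in AUDIO_EXTENSIONS:
        return "audio"  # needs preprocessing and transcription first
    if ext in TEXT_EXTENSIONS:
        return "text"   # skips the audio steps
    return None

print(route_by_extension("call_01.MP3"))
print(route_by_extension("chat_log.json"))
print(route_by_extension("notes.pdf"))
```

Lower-casing the extension means `call_01.MP3` is still treated as audio.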

---

## How to Use (Colab)

```python
# Upload your file
uploaded = handle_uploaded_file()

# Run the pipeline
result = process_scrap_call_pipeline(uploaded)

# Example Output:
# {
#   "call_duration_seconds": 102.45,
#   "topic": "Pickup scheduling issue",
#   "sentiment": "positive",
#   "sentiment_score": 0.73,
#   "summary": "The agent helped the user schedule a scrap pickup..."
# }
```

# Install and Import Important Libraries and Tools

In [None]:
!pip install -q git+https://github.com/openai/whisper.git
!pip install -q librosa soundfile google-generativeai textblob
!pip install -q keybert resemblyzer



# RoBERTa for Sentiment Analysis

In [None]:
!pip install -q transformers torch torchvision

In [None]:
# For sentiment Analysis

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import torch.nn.functional as F

# Load model and tokenizer once
tokenizer = AutoTokenizer.from_pretrained("cardiffnlp/twitter-roberta-base-sentiment")
model = AutoModelForSequenceClassification.from_pretrained("cardiffnlp/twitter-roberta-base-sentiment")

# Custom label mapping
custom_label_map = {-1: "negative", 0: "neutral", 1: "positive"}

# Mapping model output index to custom keys
index_to_custom_label = {
    0: -1,  # model index 0 → custom label -1 (negative)
    1: 0,   # model index 1 → custom label  0 (neutral)
    2: 1    # model index 2 → custom label  1 (positive)
}




In [None]:
import whisper
import json  # needed by process_scrap_call_pipeline to parse .json files
import os
import librosa
import soundfile as sf
import re
import tempfile
import getpass
import time
import google.generativeai as genai
from transformers import pipeline
from keybert import KeyBERT # for topic extraction

# libraries for audio segmentation and speaker diarization
from resemblyzer import VoiceEncoder, preprocess_wav
from sklearn.cluster import KMeans
from resemblyzer.hparams import sampling_rate
import numpy as np


## For audio segmentation and speaker diarization

In [None]:
from resemblyzer import VoiceEncoder
encoder = VoiceEncoder()

Loaded the voice encoder model on cuda in 0.61 seconds.


## Load the summarization model

In [None]:
summarizer = pipeline("summarization", model="philschmid/bart-large-cnn-samsum")

Device set to use cuda:0


# Pipeline for Text/Audio Processing and Sentiment Analysis

In [None]:
# Load Whisper model
whisper_model = whisper.load_model("medium")


# === audio segmentation and speaker diarization ===
def diarize_and_transcribe_with_labels(audio_path, chunk_length_sec=15, num_speakers=2):
    y, sr = librosa.load(audio_path, sr=16000)
    duration = librosa.get_duration(y=y, sr=sr)

    # === Diarization ===
    wav_f = preprocess_wav(audio_path)
    _, embed_frames, _ = encoder.embed_utterance(wav_f, return_partials=True)

    kmeans = KMeans(n_clusters=num_speakers, random_state=0)
    speaker_labels = kmeans.fit_predict(embed_frames)

    frame_duration = duration / len(speaker_labels)

    # === Chunked Transcription ===
    labeled_segments = []
    for start_sec in range(0, int(duration), chunk_length_sec):
        end_sec = min(start_sec + chunk_length_sec, duration)
        start_sample = int(start_sec * sr)
        end_sample = int(end_sec * sr)
        chunk = y[start_sample:end_sample]

        with tempfile.NamedTemporaryFile(suffix=".wav", delete=True) as tmp_wav:
            sf.write(tmp_wav.name, chunk, sr)
            result = whisper_model.transcribe(tmp_wav.name, language="en")
            text = result["text"].strip()

        # Estimate speaker by majority vote from diarization frame indices
        start_idx = int(start_sec / frame_duration)
        end_idx = int(end_sec / frame_duration)
        speaker_slice = speaker_labels[start_idx:end_idx]
        if len(speaker_slice) == 0:
            continue
        majority_speaker = np.bincount(speaker_slice).argmax()

        if len(text) > 5:
            line = f":{majority_speaker+1}: {text}"
            labeled_segments.append(line)

    return labeled_segments, duration
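
The speaker assignment above is a majority vote: each transcription chunk takes whichever cluster label is most common among the diarization frames that fall inside it. A minimal sketch of that vote using only the standard library; the labels below are made up, and `Counter` is swapped in for `np.bincount`, so ties may resolve differently than in the notebook:

```python
from collections import Counter

def majority_speaker(frame_labels, start_sec, end_sec, frame_duration):
    """Most common label among frames in [start_sec, end_sec); None if the window is empty."""
    start_idx = int(start_sec / frame_duration)
    end_idx = int(end_sec / frame_duration)
    window = frame_labels[start_idx:end_idx]
    if not window:
        return None
    return Counter(window).most_common(1)[0][0]

# Made-up data: one cluster label per 1-second frame of a 10-second call.
labels = [0, 0, 0, 1, 0, 1, 1, 1, 1, 0]
print(majority_speaker(labels, 0, 5, 1.0))   # 0 (four of five frames)
print(majority_speaker(labels, 5, 10, 1.0))  # 1 (four of five frames)
```

A chunk in which both speakers talk still receives a single label, so long chunks can misattribute short replies.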

# === Conversation Formatter ===
def format_conversation_to_json(segments):
    speakers = {1: "User", 2: "Agent"}
    conversation = []

    for seg in segments:
        match = re.match(r"^:(\d+):\s*(.*)", seg)
        if match:
            spk_num = int(match.group(1))
            text = match.group(2).strip()
            speaker = speakers.get(spk_num, f"Speaker {spk_num}")
            conversation.append({"sender": speaker, "message": text})
        else:
            print(f"⚠️ Skipped malformed segment: {seg}")

    return conversation

# === Summarizer Function  ===
def summarize_conversation(convo_json):
    try:
        # Flatten the conversation into "Sender: message" lines
        text = "\n".join([f"{msg['sender'].capitalize()}: {msg['message']}" for msg in convo_json])

        # truncation=True keeps long calls within the model's input limit
        summary = summarizer(text, max_length=200, min_length=30, do_sample=False, truncation=True)[0]['summary_text']
        return summary

    except Exception as e:
        print(f"⚠️ Summary Error: {e}")
        return "Summary not available."

# === Topic Extractor ===
def extract_topic(summary, top_n=5):

    kw_model = KeyBERT()

    keywords = kw_model.extract_keywords(
        summary,
        keyphrase_ngram_range=(1, 2),
        stop_words='english',
        top_n=top_n
    )

    if keywords:
        return keywords[0][0]  # Return only the topic with highest score
    else:
        return None

# === Sentiment Analysis ===

def analyze_sentiment(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits
        probs = F.softmax(logits, dim=1)[0]

    sentiment_score = probs[2] - probs[0]  # positive - negative score
    sentiment_label = custom_label_map[index_to_custom_label[probs.argmax().item()]]
    return sentiment_label, round(float(sentiment_score), 3)
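
The label and the score above are computed differently: the label is the most probable class, while the score is the positive probability minus the negative probability. A minimal standard-library sketch of that arithmetic on made-up logits (no model is loaded; `label_and_score` is a hypothetical name):

```python
import math

LABELS = ["negative", "neutral", "positive"]  # same index order as the mapping above

def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def label_and_score(logits):
    """Label = argmax class; score = P(positive) - P(negative)."""
    probs = softmax(logits)
    label = LABELS[probs.index(max(probs))]
    score = round(probs[2] - probs[0], 3)
    return label, score

# Made-up logits in which "neutral" wins but "negative" outweighs "positive".
print(label_and_score([1.0, 2.0, 0.0]))  # ('neutral', -0.155)
```

This is why a call can be labeled `neutral` and still carry a negative `sentiment_score`, as in the sample output further down.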

In [None]:
def process_audio_librosa_pipeline(audio_path):

    segments, duration = diarize_and_transcribe_with_labels(audio_path)
    conversation = format_conversation_to_json(segments)
    summary = summarize_conversation(conversation)
    topic = extract_topic(summary)
    sentiment, score = analyze_sentiment(summary)

    result = {
        "call_duration_seconds": round(duration, 2),
        "topic": topic,
        "sentiment": sentiment,
        "sentiment_score": score,
        "summary": summary
    }

    return result

def process_text_or_json(convo_json_or_raw):
    if isinstance(convo_json_or_raw, str):
        # Raw text: assume one message per line, alternating user and agent
        lines = convo_json_or_raw.strip().split('\n')
        convo_json = [{"sender": "user" if i % 2 == 0 else "agent", "message": line.strip()}
                      for i, line in enumerate(lines) if line.strip()]
    elif isinstance(convo_json_or_raw, list):
        convo_json = convo_json_or_raw
    else:
        print("❌ Invalid input format for text-based processing.")
        return None

    summary = summarize_conversation(convo_json)
    topic = extract_topic(summary)
    sentiment, score = analyze_sentiment(summary)

    result = {
        "message_count": len(convo_json),
        "topic": topic,
        "sentiment": sentiment,
        "sentiment_score": score,
        "summary": summary
    }

    print("\n📜 Summary:\n", result)
    return result


# === **Final Pipeline** ===
## process_scrap_call_pipeline(uploaded)

In [None]:
def process_scrap_call_pipeline(uploaded):
    if not uploaded:
        print("❌ No file uploaded.")
        return

    file_name = list(uploaded.keys())[0]

    ext = os.path.splitext(file_name)[1].lower()

    if ext in ['.wav', '.mp3', '.m4a', '.flac']:
        print("Detected audio file. Processing...")
        return process_audio_librosa_pipeline(file_name)

    elif ext in ['.json', '.txt']:
        print("Detected JSON/text file. Processing text...")

        try:
            with open(file_name, 'r') as f:
                content = f.read()
                try:
                    data = json.loads(content)
                except json.JSONDecodeError:
                    data = content  # Plain text fallback
        except Exception as e:
            print(f"⚠️ File Read Error: {e}")
            return

        return process_text_or_json(data)

    else:
        print("❌ Unsupported file type.")
        return

# **USE PIPELINE**


In [None]:
process_scrap_call_pipeline(file_path)

Saving 0df64e4188a5a440eafc9521854f1971.mp3 to 0df64e4188a5a440eafc9521854f1971 (2).mp3
✅ File uploaded:
Detected audio file. Processing...


{'call_duration_seconds': 70.99,
 'topic': 'cooler agent',
 'sentiment': 'neutral',
 'sentiment_score': -0.351,
 'summary': "User needs to get some of his stuff from the factory, but his pick up can't be today, because he's getting late. The next step is to share the photo of the cooler with Agent. "}
