In [1]:
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    model="facebook/bart-large-cnn"
)


Device set to use cpu


In [2]:
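def generate_summary(text):
    """Summarize one chunk of text with deterministic (non-sampled) decoding."""
    summary = summarizer(
        text,
        max_length=130,
        min_length=30,
        do_sample=False,
        truncation=True,  # clip inputs that exceed the model's input limit
    )
    return summary[0]['summary_text']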


In [3]:
def chunk_text(text, chunk_size=800):
    """Yield consecutive chunks of at most chunk_size whitespace-separated words."""
    words = text.split()
    for i in range(0, len(words), chunk_size):
        yield " ".join(words[i:i + chunk_size])


In [6]:
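from pathlib import Path

import pandas as pd

# The data directory sits one level above the notebook's working directory.
BASE_DIR = Path.cwd().parent
DATA_PATH = BASE_DIR / "data" / "stakeholder_comments_500.csv"

df = pd.read_csv(DATA_PATH)
# Concatenate every comment into one string for summarization.
all_comments = " ".join(df["comment_text"].astype(str).tolist())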


In [7]:
summaries = [generate_summary(chunk) for chunk in chunk_text(all_comments)]
final_summary = " ".join(summaries)


In [8]:
print(final_summary)

Amendment is a progressive step towards modern corporate regulation. Reduced penalties under this amendment will help MSMEs grow sustainably. The provision aligns well with global corporate governance standards. Amendment is a progressive step towards modern corporate regulation. Reduced penalties under this amendment will help MSMEs grow sustainably. The provision aligns well with global corporate governance standards. I strongly support this amendment as it promotes transparency and accountability. This change will encourage better corporate governance practices. The amendment is a progressive step towards modern corporate regulation. Reduced penalties under this amendment will help MSMEs grow sustainably. The amendment appears reasonable, though implementation will be important. Some provisions are clear while others need more explanation. The changes seem manageable but require proper awareness. The amendment is neither significantly beneficial nor harmful. The amendment is neither

In [9]:
grouped_comments = (
    df.groupby("expected_sentiment")["comment_text"]
      .apply(lambda x: " ".join(x.astype(str)))
      .to_dict()
)

grouped_comments.keys()


dict_keys(['Negative', 'Neutral', 'Positive'])

In [10]:
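def chunk_text(text, max_words=400):
    """Split text into a list of chunks of at most max_words words each.

    Redefines the generator version of chunk_text from the earlier cell.
    """
    words = text.split()
    chunks = []

    for i in range(0, len(words), max_words):
        chunk = " ".join(words[i:i + max_words])
        chunks.append(chunk)

    return chunks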
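
Splitting on a fixed word count can cut a comment in half at a chunk boundary. A minimal sketch of an alternative that keeps each comment whole is shown below. The helper name chunk_comments and the demo data are hypothetical and not part of the original notebook; it assumes the comments are passed as a list of strings (for example df["comment_text"].astype(str).tolist()).

```python
def chunk_comments(comments, max_words=400):
    """Group whole comments into chunks of at most max_words words.

    Hypothetical helper for illustration. A single comment longer than
    max_words is kept intact and becomes its own chunk.
    """
    chunks, current, count = [], [], 0
    for comment in comments:
        n = len(comment.split())
        # Start a new chunk if adding this comment would exceed the budget.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(comment)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


# Illustrative data only, not taken from the dataset.
demo = ["one two three", "four five", "six seven eight nine"]
print(chunk_comments(demo, max_words=5))
```

The trade-off is that chunks become uneven in length, so the last chunk of a group may still be much shorter than the others.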


In [11]:
emotion_summaries = {}

for sentiment, text in grouped_comments.items():
    chunks = chunk_text(text)

    chunk_summaries = []
    for chunk in chunks:
        summary = summarizer(
            chunk,
            max_length=130,
            min_length=40,
            do_sample=False
        )[0]["summary_text"]

        chunk_summaries.append(summary)

    # Merge chunk summaries into one
    emotion_summaries[sentiment] = " ".join(chunk_summaries)


Your max_length is set to 130, but your input_length is only 123. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=61)
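
The warning above appears when a chunk (typically the short final one) has fewer tokens than max_length. One way to avoid it is to scale the length limits to the input size. The sketch below is an assumption-laden illustration: summary_lengths is a hypothetical helper, and the 0.5 ratio is an arbitrary choice that happens to match the value suggested in the warning.

```python
def summary_lengths(n_input_tokens, max_cap=130, min_cap=40, ratio=0.5):
    """Return (max_length, min_length) scaled to the input size.

    Hypothetical helper: ratio=0.5 is an illustrative assumption,
    not a value taken from the notebook.
    """
    # Never ask for more than a fraction of the input, nor more than the cap.
    max_length = max(2, min(max_cap, int(n_input_tokens * ratio)))
    # Keep min_length strictly below max_length.
    min_length = max(1, min(min_cap, max_length - 1))
    return max_length, min_length


print(summary_lengths(123))  # short final chunk, as in the warning
print(summary_lengths(600))  # full-size chunk

# In the notebook, the token count could be taken from the pipeline's
# tokenizer, e.g. len(summarizer.tokenizer(chunk)["input_ids"]), and the
# two values passed as max_length and min_length to summarizer(...).
```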


In [12]:
for sentiment, summary in emotion_summaries.items():
    print(f"\n🔹 {sentiment.upper()} SUMMARY:\n")
    print(summary)



🔹 NEGATIVE SUMMARY:

The amendment may increase operational costs for companies. This proposal does not adequately consider MSME challenges. The changes could lead to increased litigation. The draft law ignores practical implementation issues. The amendment may negatively impact ease of doing business. The amendment complicates existing compliance procedures. The draft law ignores practical implementation issues. The amendment may negatively impact ease of doing business. This proposal does not adequately consider MSME challenges. The changes could lead to increased litigation. The amendment complicates existing compliance procedures. The draft law ignores practical implementation issues. The amendment may negatively impact ease of doing business. This proposal requires substantial reworking before implementation. The changes could lead to increased litigation.

🔹 NEUTRAL SUMMARY:

The draft law introduces incremental changes rather than major reforms. The impact of the amendmen

In [13]:
import json

output = {
    "overall_summary": final_summary,
    "emotion_summaries": emotion_summaries
}

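# Write UTF-8 explicitly so the output does not depend on the platform default.
with open("../data/summaries.json", "w", encoding="utf-8") as f:
    json.dump(output, f, ensure_ascii=False, indent=2)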
