## Detailed Article Explanation

The detailed code explanation for this article is available at the following link:

https://www.daniweb.com/programming/computer-science/tutorials/542973/benchmarking-deepseek-r1-for-text-classification-and-summarization


For my other articles on Daniweb.com, please see this link:

https://www.daniweb.com/members/1235222/usmanmalik57

## Installing and Importing Required Libraries

In [1]:
!pip install huggingface_hub==0.24.7
!pip install rouge-score
!pip install --upgrade pandas openpyxl
!pip install scikit-learn



In [2]:
from huggingface_hub import InferenceClient
import os
import pandas as pd
from rouge_score import rouge_scorer
from sklearn.metrics import accuracy_score
from collections import defaultdict

## Calling DeepSeek R1 Model Using Hugging Face Inference API

In [8]:
hf_token = os.environ.get('HF_TOKEN')

# DeepSeek-R1-Distill model card:
# https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
deepseek_model_client = InferenceClient(
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
    token=hf_token
)


In [8]:
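def make_prediction(model, system_role, user_query):
    # Send the system and user messages to the chat completion endpoint
    response = model.chat_completion(
        messages=[
            {"role": "system", "content": system_role},
            {"role": "user", "content": user_query},
        ],
        max_tokens=1000,
    )

    # Return the raw text, which includes the model's <think> block
    return response.choices[0].message.content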

In [12]:
system_role = "Assign positive, negative, or neutral sentiment to the movie review. Return only a single word in your response"
user_query = "I like this movie a lot"
output = make_prediction(deepseek_model_client,
               system_role,
               user_query)
output

'<think>\nOkay, I need to assign a sentiment to the movie review "I like this movie a lot." Let\'s break this down. The user expressed that they like the movie a lot, which indicates a positive feeling. Words like "like" and "a lot" are strong indicators of satisfaction. There\'s no negative language here, and the enthusiasm is clear. So, the sentiment is definitely positive.\n</think>\n\npositive'

In [13]:
# The answer is the last line of the output, after the closing </think> tag
last_word = output.strip().split("\n")[-1].strip()
print(last_word)

positive
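Taking the last line works when the answer is a single word. A minimal sketch of an alternative is shown below. It removes the whole `<think>...</think>` block with a regular expression, so answers that span several lines are kept. The helper name `strip_reasoning` is hypothetical and not part of the original notebook.

```python
import re

# Matches a complete <think>...</think> block, including newlines inside it.
THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)

def strip_reasoning(raw_output: str) -> str:
    """Return the model's answer with any <think>...</think> block removed."""
    return THINK_BLOCK.sub("", raw_output).strip()

# Hypothetical output in the same shape as the response shown above.
sample = '<think>\nThe reviewer likes the movie.\n</think>\n\npositive'
print(strip_reasoning(sample))  # prints: positive
```

If the response is cut off by `max_tokens` before `</think>` appears, the pattern does not match and the text is returned unchanged, so that case would need separate handling.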


## DeepSeek For Text Classification

In [14]:
## Dataset download link
## https://www.kaggle.com/datasets/crowdflower/twitter-airline-sentiment?select=Tweets.csv

dataset = pd.read_csv(r"D:\Datasets\Tweets.csv")
dataset.head()

Unnamed: 0,tweet_id,airline_sentiment,airline_sentiment_confidence,negativereason,negativereason_confidence,airline,airline_sentiment_gold,name,negativereason_gold,retweet_count,text,tweet_coord,tweet_created,tweet_location,user_timezone
0,570306133677760513,neutral,1.0,,,Virgin America,,cairdin,,0,@VirginAmerica What @dhepburn said.,,2015-02-24 11:35:52 -0800,,Eastern Time (US & Canada)
1,570301130888122368,positive,0.3486,,0.0,Virgin America,,jnardino,,0,@VirginAmerica plus you've added commercials t...,,2015-02-24 11:15:59 -0800,,Pacific Time (US & Canada)
2,570301083672813571,neutral,0.6837,,,Virgin America,,yvonnalynn,,0,@VirginAmerica I didn't today... Must mean I n...,,2015-02-24 11:15:48 -0800,Lets Play,Central Time (US & Canada)
3,570301031407624196,negative,1.0,Bad Flight,0.7033,Virgin America,,jnardino,,0,@VirginAmerica it's really aggressive to blast...,,2015-02-24 11:15:36 -0800,,Pacific Time (US & Canada)
4,570300817074462722,negative,1.0,Can't Tell,1.0,Virgin America,,jnardino,,0,@VirginAmerica and it's a really big bad thing...,,2015-02-24 11:14:45 -0800,,Pacific Time (US & Canada)


In [15]:
# Remove rows where 'airline_sentiment' or 'text' are NaN
dataset = dataset.dropna(subset=['airline_sentiment', 'text'])

# Remove rows where 'airline_sentiment' or 'text' are empty strings
dataset = dataset[(dataset['airline_sentiment'].str.strip() != '') & (dataset['text'].str.strip() != '')]

# Filter the DataFrame for each sentiment
neutral_df = dataset[dataset['airline_sentiment'] == 'neutral']
positive_df = dataset[dataset['airline_sentiment'] == 'positive']
negative_df = dataset[dataset['airline_sentiment'] == 'negative']

# Randomly sample records from each sentiment
neutral_sample = neutral_df.sample(n=34)
positive_sample = positive_df.sample(n=33)
negative_sample = negative_df.sample(n=33)

# Concatenate the samples into one DataFrame
dataset = pd.concat([neutral_sample, positive_sample, negative_sample])

# Reset index if needed
dataset.reset_index(drop=True, inplace=True)

# print value counts
print(dataset["airline_sentiment"].value_counts())

airline_sentiment
neutral     34
positive    33
negative    33
Name: count, dtype: int64


In [24]:
def predict_sentiment(model, system_role, user_query):
    response = model.chat_completion(
        messages=[
            {"role": "system", "content": system_role},
            {"role": "user", "content": user_query},
        ],
        max_tokens=1000,
    )

    # Keep only the last line of the output, which follows the <think> block
    output = response.choices[0].message.content
    last_word = output.strip().split("\n")[-1].strip()
    return last_word

In [25]:
tweets_list = dataset["text"].tolist()
all_sentiments = []
exceptions = 0

for i, tweet in enumerate(tweets_list, 1):
    
    try:
        print(f"Processing tweet {i}")
        system_role = "You are an expert in annotating tweets with positive, negative, and neutral emotions"

        user_query = (
            f"What is the sentiment expressed in the following tweet about an airline? "
            f"Select sentiment value from positive, negative, or neutral. "
            f"Return only the sentiment value in small letters.\n\n"
            f"tweet: {tweet}"
        )

        sentiment_value = predict_sentiment(deepseek_model_client, 
                                            system_role, 
                                            user_query)
        all_sentiments.append({
            'tweet_id': i,
            'sentiment': sentiment_value
        })
        print(i, sentiment_value)

    except Exception as e:
        print("===================")
        print("Exception occurred with Tweet:", i, "| Error:", e)
        exceptions += 1

print("Total exception count:", exceptions)


Processing tweet 1
1 neutral
Processing tweet 2
2 neutral
Processing tweet 3
3 neutral
Processing tweet 4
4 neutral
Processing tweet 5
5 neutral
Processing tweet 6
6 neutral
Processing tweet 7
7 positive
Processing tweet 8
8 negative
Processing tweet 9
9 positive
Processing tweet 10
10 neutral
Processing tweet 11
11 neutral
Processing tweet 12
12 neutral
Processing tweet 13
13 neutral
Processing tweet 14
14 positive
Processing tweet 15
15 negative
Processing tweet 16
16 neutral
Processing tweet 17
17 neutral
Processing tweet 18
18 positive
Processing tweet 19
19 neutral
Processing tweet 20
20 negative
Processing tweet 21
21 negative
Processing tweet 22
22 neutral
Processing tweet 23
23 neutral
Processing tweet 24
24 neutral
Processing tweet 25
25 neutral
Processing tweet 26
26 neutral
Processing tweet 27
27 neutral
Processing tweet 28
28 neutral
Processing tweet 29
29 neutral
Processing tweet 30
30 positive
Processing tweet 31
31 neutral
Processing tweet 32
32 neutral
Processing tweet 
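The loop above counts exceptions but records no prediction for a tweet whose request fails. One possible way to reduce such gaps is to retry a failed call a few times before giving up. The sketch below assumes the failures are transient (for example, rate limits or timeouts). `call_with_retries` and its parameters are hypothetical names, not part of the original notebook or of `huggingface_hub`.

```python
import time

def call_with_retries(func, *args, max_attempts=3, base_delay=1.0, **kwargs):
    """Call func(*args, **kwargs), retrying on any exception with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args, **kwargs)
        except Exception as error:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller's except block record it
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({error}); retrying in {delay:.1f}s")
            time.sleep(delay)

# Demo with a hypothetical flaky function that fails twice before succeeding.
calls = {"count": 0}

def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary failure")
    return "neutral"

print(call_with_retries(flaky, base_delay=0))
```

Inside the loop, the call would then read `call_with_retries(predict_sentiment, deepseek_model_client, system_role, user_query)`, with the existing `try`/`except` still catching a final failure.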

In [42]:
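keywords = {"positive", "negative", "neutral"}

# Reduce each raw model output to the label keyword it contains.
# An output with no keyword adds no row, so results_df can be shorter than all_sentiments.
result = []
for record in all_sentiments:
    for word in keywords:
        if word in record["sentiment"].lower():
            result.append({'sentiment': word})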

results_df = pd.DataFrame(result)
results_df

Unnamed: 0,sentiment
0,neutral
1,neutral
2,neutral
3,neutral
4,neutral
...,...
95,negative
96,negative
97,negative
98,negative


In [43]:
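# accuracy_score expects (y_true, y_pred); the first len(results_df) true labels are compared by position
accuracy = accuracy_score(dataset["airline_sentiment"].iloc[:len(results_df)], results_df['sentiment'])
print(f"Overall Accuracy: {accuracy}")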

Overall Accuracy: 0.87
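Overall accuracy does not show which sentiment classes the model gets wrong. A minimal sketch of a per-class breakdown is shown below. It computes, for each true label, the fraction of its examples that were predicted correctly. The function name and the example labels are hypothetical and are not results from the experiment.

```python
from collections import defaultdict

def per_class_accuracy(true_labels, predicted_labels):
    """Return {label: fraction of that label's examples predicted correctly}."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for truth, prediction in zip(true_labels, predicted_labels):
        totals[truth] += 1
        if truth == prediction:
            correct[truth] += 1
    return {label: correct[label] / totals[label] for label in totals}

# Hypothetical labels, only to show the call; not taken from the experiment.
y_true = ["neutral", "neutral", "positive", "negative"]
y_pred = ["neutral", "positive", "positive", "negative"]
print(per_class_accuracy(y_true, y_pred))
# {'neutral': 0.5, 'positive': 1.0, 'negative': 1.0}
```

With the notebook's data, the two arguments would be the true `airline_sentiment` values and the predicted sentiments, provided both lists are aligned row by row.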


## DeepSeek R1 For Text Summarization

In [3]:
# Dataset download link (GitHub)
# https://github.com/reddzzz/DataScience_FP/blob/main/dataset.xlsx

dataset = pd.read_excel(r"D:\Datasets\dataset.xlsx")
dataset = dataset.sample(frac=1)  # shuffle the rows
print(dataset.shape)
dataset.head()

(1000, 10)


Unnamed: 0.1,Unnamed: 0,id,human_summary,publication,author,date,year,month,theme,content
308,259,17650,The thinking goes that some of the ancient ice...,New York Times,Nicholas St. Fleur,2017-01-09,2017.0,1.0,entertainment,"Every Friday, we’ll offer a Trilobite talking ..."
772,259,18208,They were scheduled to fly from iraq to the un...,New York Times,David Zucchino,2017-02-03,2017.0,2.0,politics,BAGHDAD — The Trump administration amended ...
8,0,17294,The decision they are making is not what does ...,New York Times,John Schwartz,2017-01-05,2017.0,1.0,science,"THOMPSONS, Tex. — Can one of the most promi..."
993,259,18456,bruins goalie tim thomas declined to visit the...,New York Times,Victor Mather,2017-02-13,2017.0,2.0,politics,At least six members of the Super New Englan...
902,259,18355,he would like to return to his country one day...,New York Times,Emily Palmer,2017-02-07,2017.0,2.0,business,Driving down a tight mountain road in Jarabaco...


In [4]:
dataset['summary_length'] = dataset['human_summary'].apply(len)
average_length = dataset['summary_length'].mean()
print(f"Average length of summaries: {average_length:.2f} characters")

Average length of summaries: 1168.78 characters


In [5]:
def generate_summary(model, system_role, user_query):
    response = model.chat_completion(
        messages=[
            {"role": "system", "content": system_role},
            {"role": "user", "content": user_query},
        ],
        max_tokens=1500,
    )

    # Keep only the last line of the output, which follows the <think> block
    output = response.choices[0].message.content
    summary = output.strip().split("\n")[-1].strip()
    return summary

In [6]:
# Function to calculate ROUGE scores
def calculate_rouge(reference, candidate):
    scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
    scores = scorer.score(reference, candidate)
    return {key: value.fmeasure for key, value in scores.items()}
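To make the `fmeasure` values returned above easier to interpret, here is a simplified sketch of how a ROUGE-1 F-measure can be computed from unigram overlap. It is an illustration only, not the `rouge_score` implementation. It assumes lowercase alphanumeric tokens and applies no stemming, so its numbers can differ from the library's. `simple_rouge1_f` is a hypothetical name.

```python
import re
from collections import Counter

def simple_rouge1_f(reference, candidate):
    """Unigram-overlap F-measure between reference and candidate (no stemming)."""
    ref_counts = Counter(re.findall(r"[a-z0-9]+", reference.lower()))
    cand_counts = Counter(re.findall(r"[a-z0-9]+", candidate.lower()))
    # Counter intersection keeps the minimum count per token (clipped matches).
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

# Four of six tokens overlap in each direction, so precision = recall = 4/6.
print(simple_rouge1_f("the cat sat on the mat", "the cat lay on a mat"))
```

ROUGE-2 follows the same idea with bigrams, and ROUGE-L replaces the token overlap with the length of the longest common subsequence.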

In [9]:
results = []

# Summarize the first 20 (shuffled) articles and score each against the human summary
for i, (_, row) in enumerate(dataset[:20].iterrows(), 1):
    article = row['content']
    human_summary = row['human_summary']

    print(f"Summarizing article {i}.")
    system_role = "You are an expert in creating summaries from text"
    user_query = f"""Summarize the following article in 1150 characters. Do not return your thought process. Only the summary.
    Your summary will be evaluated using ROUGE score. The summary should look like human created:\n\n{article}\n\nSummary:"""

    generated_summary = generate_summary(deepseek_model_client, 
                                         system_role, 
                                         user_query)
    
    rouge_scores = calculate_rouge(human_summary, generated_summary)

    results.append({
        'article_id': row['id'],
        'generated_summary': generated_summary,
        'rouge1': rouge_scores['rouge1'],
        'rouge2': rouge_scores['rouge2'],
        'rougeL': rouge_scores['rougeL']
    })

# Create a DataFrame with results
results_df = pd.DataFrame(results)

Summarizing article 1.
Summarizing article 2.
Summarizing article 3.
Summarizing article 4.
Summarizing article 5.
Summarizing article 6.
Summarizing article 7.
Summarizing article 8.
Summarizing article 9.
Summarizing article 10.
Summarizing article 11.
Summarizing article 12.
Summarizing article 13.
Summarizing article 14.
Summarizing article 15.
Summarizing article 16.
Summarizing article 17.
Summarizing article 18.
Summarizing article 19.
Summarizing article 20.


In [12]:
mean_values = results_df[['rouge1', 'rouge2', 'rougeL']].mean()
print(mean_values)

rouge1    0.368321
rouge2    0.102507
rougeL    0.183425
dtype: float64
