## Detailed Article Explanation
The detailed code explanation for this article is available at the following link:

https://www.daniweb.com/programming/computer-science/tutorials/543028/text-classification-and-summarization-with-deepseek-r1-distill-llama-70b

For my other articles on Daniweb.com, see this link:

https://www.daniweb.com/members/1235222/usmanmalik57

## Importing and Installing Required Libraries

In [None]:
!pip install groq
!pip install rouge-score
!pip install --upgrade openpyxl
!pip install pandas scikit-learn

In [None]:
from groq import Groq
import os
import pandas as pd
from rouge_score import rouge_scorer
from sklearn.metrics import accuracy_score
from collections import defaultdict
from google.colab import userdata

## Calling DeepSeek R1 Distill Llama 70B using the Groq API

In [None]:
client = Groq(
    api_key=userdata.get('GROQ_API_KEY'),
)

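def generate_response(system_instructions, user_query):
    """Send a system and user prompt to the model and return only the final answer."""
    response = client.chat.completions.create(
        model="deepseek-r1-distill-llama-70b",
        temperature=0,  # keep outputs as repeatable as possible
        max_tokens=1000,
        messages=[
            {"role": "system", "content": system_instructions},
            {"role": "user", "content": user_query},
        ],
    )

    output = response.choices[0].message.content
    # The model writes its reasoning before a closing </think> tag; keep only the text after it
    final_response = output.strip().split("</think>")[-1].strip()
    return final_response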


system_instructions = "You are an expert Pizza chef"
user_query = """Explain the process of baking a pizza in three simple steps."""
response = generate_response(system_instructions, user_query)
print(response)

The process of baking a pizza can be broken down into three simple steps:

1. **Preparation**: Begin by preparing the pizza dough, allowing it to rise if homemade. Shape the dough into your desired pizza base. Spread a layer of tomato sauce over the dough, followed by your choice of cheese and toppings.

2. **Baking**: Preheat your oven to a high temperature, typically between 450°F to 500°F. Place the prepared pizza in the oven and bake for 10 to 15 minutes, or until the crust is golden brown and the cheese is bubbly and melted.

3. **Serving**: Once baked, remove the pizza from the oven and let it cool for a few minutes. This allows the cheese to set, making it easier to slice. Slice the pizza into portions and serve immediately.

This method ensures a delicious homemade pizza with a crispy crust and well-cooked toppings.
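
`generate_response` keeps only the text after the last `</think>` tag, since the model appears to emit its reasoning inside a `<think>...</think>` block before the answer. A minimal sketch of that step in isolation follows; the helper name `strip_reasoning` and the sample string are illustrative assumptions, not part of the notebook.

```python
def strip_reasoning(raw_output: str) -> str:
    """Return the text after the last </think> tag, or the whole text if no tag is present."""
    return raw_output.strip().split("</think>")[-1].strip()


# Made-up model output for illustration only.
sample = "<think>\nThe user wants a short answer.\n</think>\n\nPreheat, bake, serve."

print(strip_reasoning(sample))                      # text after the reasoning block
print(strip_reasoning("No reasoning block here."))  # unchanged when no tag is present
```

If the response is cut off before the closing tag is written, the whole output (reasoning included) is returned, so it can be worth checking responses that still contain `<think>`.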


## DeepSeek R1 Distill Llama 70B for Text Classification

In [None]:
## Dataset download link
## https://www.kaggle.com/datasets/crowdflower/twitter-airline-sentiment?select=Tweets.csv

dataset = pd.read_csv(r"/content/Tweets.csv")
dataset.head()

Unnamed: 0,tweet_id,airline_sentiment,airline_sentiment_confidence,negativereason,negativereason_confidence,airline,airline_sentiment_gold,name,negativereason_gold,retweet_count,text,tweet_coord,tweet_created,tweet_location,user_timezone
0,570306133677760513,neutral,1.0,,,Virgin America,,cairdin,,0,@VirginAmerica What @dhepburn said.,,2015-02-24 11:35:52 -0800,,Eastern Time (US & Canada)
1,570301130888122368,positive,0.3486,,0.0,Virgin America,,jnardino,,0,@VirginAmerica plus you've added commercials t...,,2015-02-24 11:15:59 -0800,,Pacific Time (US & Canada)
2,570301083672813571,neutral,0.6837,,,Virgin America,,yvonnalynn,,0,@VirginAmerica I didn't today... Must mean I n...,,2015-02-24 11:15:48 -0800,Lets Play,Central Time (US & Canada)
3,570301031407624196,negative,1.0,Bad Flight,0.7033,Virgin America,,jnardino,,0,@VirginAmerica it's really aggressive to blast...,,2015-02-24 11:15:36 -0800,,Pacific Time (US & Canada)
4,570300817074462722,negative,1.0,Can't Tell,1.0,Virgin America,,jnardino,,0,@VirginAmerica and it's a really big bad thing...,,2015-02-24 11:14:45 -0800,,Pacific Time (US & Canada)


In [None]:
# Remove rows where 'airline_sentiment' or 'text' are NaN
dataset = dataset.dropna(subset=['airline_sentiment', 'text'])

# Remove rows where 'airline_sentiment' or 'text' are empty strings
dataset = dataset[(dataset['airline_sentiment'].str.strip() != '') & (dataset['text'].str.strip() != '')]

# Filter the DataFrame for each sentiment
neutral_df = dataset[dataset['airline_sentiment'] == 'neutral']
positive_df = dataset[dataset['airline_sentiment'] == 'positive']
negative_df = dataset[dataset['airline_sentiment'] == 'negative']

# Randomly sample records from each sentiment
neutral_sample = neutral_df.sample(n=34)
positive_sample = positive_df.sample(n=33)
negative_sample = negative_df.sample(n=33)

# Concatenate the samples into one DataFrame
dataset = pd.concat([neutral_sample, positive_sample, negative_sample])

# Reset index if needed
dataset.reset_index(drop=True, inplace=True)

# print value counts
print(dataset["airline_sentiment"].value_counts())

airline_sentiment
neutral     34
positive    33
negative    33
Name: count, dtype: int64


In [None]:

tweets_list = dataset["text"].tolist()
all_sentiments = []
exceptions = 0

for i, tweet in enumerate(tweets_list, 1):

    try:
        print(f"Processing tweet {i}")
        system_instructions = """You are an expert in annotating tweets with positive, negative, and neutral emotions. Think step by step."""

        user_query = (
            f"What is the sentiment expressed in the following tweet about an airline? "
            f"Select sentiment value from positive, negative, or neutral. "
            f"Return only the sentiment value in small letters.\n\n"
            f"tweet: {tweet}"
        )

        sentiment_value = generate_response(system_instructions, user_query)

        all_sentiments.append({
            'tweet_id': i,
            'sentiment': sentiment_value
        })
        print(i, sentiment_value)

    except Exception as e:
        print("===================")
        print("Exception occurred with Tweet:", i, "| Error:", e)
        exceptions += 1

print("Total exception count:", exceptions)


Processing tweet 1
1 negative
Processing tweet 2
2 neutral
Processing tweet 3
3 negative
Processing tweet 4
4 positive
Processing tweet 5
5 positive
Processing tweet 6
6 negative
Processing tweet 7
7 neutral
Processing tweet 8
8 negative
Processing tweet 9
9 neutral
Processing tweet 10
10 positive
Processing tweet 11
11 neutral
Processing tweet 12
12 neutral
Processing tweet 13
13 neutral
Processing tweet 14
14 negative
Processing tweet 15
15 positive
Processing tweet 16
16 negative
Processing tweet 17
17 neutral
Processing tweet 18
18 neutral
Processing tweet 19
19 positive
Processing tweet 20
20 negative
Processing tweet 21
21 neutral
Processing tweet 22
22 negative
Processing tweet 23
23 neutral
Processing tweet 24
24 neutral
Processing tweet 25
25 neutral
Processing tweet 26
26 neutral
Processing tweet 27
27 negative
Processing tweet 28
28 positive
Processing tweet 29
29 positive
Processing tweet 30
30 neutral
Processing tweet 31
31 positive
Processing tweet 32
32 negative
...
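
The loop above counts an exception and moves on, so a tweet whose request fails gets no prediction. One possible mitigation is to retry the call a few times before giving up. The sketch below is an assumption, not part of the notebook: `call_with_retries`, the attempt count, and the delays are hypothetical, and `flaky` is a stand-in for `generate_response`.

```python
import time


def call_with_retries(func, *args, max_attempts=3, base_delay=1.0):
    """Call func(*args); on an exception, wait and retry up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args)
        except Exception:
            if attempt == max_attempts:
                raise  # give up and let the caller count the failure
            # Wait base_delay, then 2x, 4x, ... before the next attempt
            time.sleep(base_delay * 2 ** (attempt - 1))


# Hypothetical stand-in for generate_response that fails twice, then succeeds.
calls = {"count": 0}


def flaky(system_instructions, user_query):
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary failure")
    return "neutral"


result = call_with_retries(flaky, "system prompt", "user prompt", base_delay=0.01)
print(result, calls["count"])
```

In the classification loop, this would wrap the existing call as `call_with_retries(generate_response, system_instructions, user_query)`, leaving the surrounding `try`/`except` in place for requests that still fail.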

In [None]:
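keywords = ["positive", "negative", "neutral"]

# Map each response to exactly one label so predictions stay aligned with the tweets
predictions = []
true_labels = []
for item in all_sentiments:
    text = item["sentiment"].lower()
    # Take the first keyword (in the order listed above) found in the response
    label = next((word for word in keywords if word in text), "unknown")
    predictions.append(label)
    # tweet_id is the 1-based position of the tweet in the sampled dataset
    true_labels.append(dataset["airline_sentiment"].iloc[item["tweet_id"] - 1])

results_df = pd.DataFrame({"sentiment": predictions})

accuracy = accuracy_score(true_labels, results_df["sentiment"])
print(f"Overall Accuracy: {accuracy}")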

Overall Accuracy: 0.69
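
A single accuracy figure does not show which sentiment classes the model confuses. A minimal sketch of a per-class breakdown is given below; the helper name `per_class_accuracy` is hypothetical, and the labels are made up for illustration, not taken from the run above.

```python
from collections import Counter


def per_class_accuracy(true_labels, predicted_labels):
    """Return {label: fraction of items with that true label that were predicted correctly}."""
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: correct[label] / totals[label] for label in totals}


# Made-up labels for illustration only; these are not the notebook's results.
true_labels = ["positive", "negative", "neutral", "neutral", "negative", "positive"]
predicted = ["positive", "negative", "negative", "neutral", "negative", "neutral"]

print(per_class_accuracy(true_labels, predicted))
```

With the notebook's data, the two arguments would be `dataset["airline_sentiment"]` and the predicted labels, provided both are in the same tweet order.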


## DeepSeek R1 Distill Llama 70B for Text Summarization

In [None]:
# Dataset download link
# https://github.com/reddzzz/DataScience_FP/blob/main/dataset.xlsx

dataset = pd.read_excel(r"/content/dataset.xlsx")
dataset = dataset.sample(frac=1)
print(dataset.shape)
dataset.head()

(1000, 10)


Unnamed: 0.1,Unnamed: 0,id,human_summary,publication,author,date,year,month,theme,content
776,259,18212,If you are on your phone and don’t see an audi...,New York Times,Michael Barbaro,2017-02-13,2017.0,2.0,politics,"Back on the campaign trail, Donald J. Trump ..."
501,259,17883,Several Republican senators on Monday proposed...,New York Times,Robert Pear,2017-01-25,2017.0,1.0,business,WASHINGTON — Several Republican senators on...
80,0,17386,A united front of top intelligence officials a...,New York Times,Matt Flegenheimer and Scott Shane,2017-01-06,2017.0,1.0,crime,WASHINGTON — A united front of top intellig...
709,259,18135,"Yes, with nothing but my wife and sheep for co...",New York Times,Ben Rawlence,2017-01-27,2017.0,1.0,lifestyle,It is an idea universally indulged that escapi...
369,259,17726,In recent years the Mercer Family Foundation —...,New York Times,Robin Pogrebin,2017-01-19,2017.0,1.0,business,The American Museum of Natural History has lon...


In [None]:
# Function to calculate ROUGE scores
def calculate_rouge(reference, candidate):
    scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
    scores = scorer.score(reference, candidate)
    return {key: value.fmeasure for key, value in scores.items()}
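
To make the `rouge1` F-measure returned above more concrete, the sketch below computes a simplified version by hand: unigram overlap between reference and candidate, turned into precision, recall, and their harmonic mean. It is an illustration under stated assumptions, not the `rouge_score` implementation: it lowercases and splits on whitespace, applies no stemming, and the function name `rouge1_f` and the two sentences are made up, so its values can differ from the library's.

```python
from collections import Counter


def rouge1_f(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F-measure on lowercased, whitespace-split tokens."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped count of unigrams shared by reference and candidate
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())  # share of candidate tokens that match
    recall = overlap / sum(ref_counts.values())      # share of reference tokens that are covered
    return 2 * precision * recall / (precision + recall)


# Made-up sentences for illustration only.
reference = "the senate proposed a new health bill"
candidate = "senators proposed a health bill on monday"

print(round(rouge1_f(reference, candidate), 3))  # 4 shared tokens out of 7 on each side -> 0.571
```

`rouge2` follows the same idea with word pairs instead of single words, and `rougeL` is based on the longest common subsequence of the two token sequences.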

In [None]:

results = []

# Summarize the first 20 articles of the shuffled dataset
for i, (_, row) in enumerate(dataset[:20].iterrows(), 1):
    article = row['content']
    human_summary = row['human_summary']

    print(f"Summarizing article {i}.")
    system_instructions = "You are an expert in creating summaries from text"
    user_query = f"""Summarize the following article in 1150 characters. Do not return your thought process. Only the summary.
    Your summary will be evaluated using ROUGE score. The summary should look like human created:\n\n{article}\n\nSummary:"""

    generated_summary = generate_response(system_instructions, user_query)

    rouge_scores = calculate_rouge(human_summary, generated_summary)

    results.append({
        'article_id': row['id'],
        'generated_summary': generated_summary,
        'rouge1': rouge_scores['rouge1'],
        'rouge2': rouge_scores['rouge2'],
        'rougeL': rouge_scores['rougeL']
    })

# Create a DataFrame with results
results_df = pd.DataFrame(results)

mean_values = results_df[['rouge1', 'rouge2', 'rougeL']].mean()
print(mean_values)


Summarizing article 1.
Summarizing article 2.
Summarizing article 3.
Summarizing article 4.
Summarizing article 5.
Summarizing article 6.
Summarizing article 7.
Summarizing article 8.
Summarizing article 9.
Summarizing article 10.
Summarizing article 11.
Summarizing article 12.
Summarizing article 13.
Summarizing article 14.
Summarizing article 15.
Summarizing article 16.
Summarizing article 17.
Summarizing article 18.
Summarizing article 19.
Summarizing article 20.
rouge1    0.347660
rouge2    0.100158
rougeL    0.183272
dtype: float64
