## Detailed Article Explanation
The detailed code explanation for this article is available at the following link:

https://www.daniweb.com/programming/computer-science/tutorials/543145/deepseek-r1-vs-llama-3-1-405b-for-text-classification-and-summarization

For my other articles for Daniweb.com, please see this link:



## Importing and Installing Required Libraries

In [None]:
!pip install --upgrade fireworks-ai
!pip install rouge-score
!pip install --upgrade pandas openpyxl



In [None]:
from fireworks.client import Fireworks
import os
import pandas as pd
import time
from rouge_score import rouge_scorer
from sklearn.metrics import accuracy_score
from collections import defaultdict
from google.colab import userdata


FW_API_KEY = userdata.get('FW_API_KEY')

## Calling LLMs Using FireworksAI API

In [None]:
client = Fireworks(api_key=FW_API_KEY)

def generate_response(model, system_instructions, user_query):
  response = client.chat.completions.create(
      model=model,
      max_tokens=4000,
      temperature=0,
      messages=[
              {
              "role": "system",
              "content": system_instructions
              },
              {
              "role": "user",
              "content": user_query
              }
          ]
      )

  output = response.choices[0].message.content

  # DeepSeek R1 emits its reasoning before a closing </think> tag; keep only the final answer
  if "</think>" in output:
    return output.split("</think>")[-1].strip()

  return output
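
The `</think>` check in `generate_response` can be tried out without an API call. The following is a minimal sketch of the same idea as a standalone helper; the name `strip_reasoning` and the regular expression are assumptions for illustration, not part of the Fireworks client.

```python
import re

# Matches a complete <think>...</think> block, including newlines inside it.
THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)

def strip_reasoning(text: str) -> str:
    """Return the model's final answer with any reasoning trace removed."""
    # Remove well-formed reasoning blocks first.
    cleaned = THINK_BLOCK.sub("", text)
    # If only a closing tag remains (opening tag missing),
    # keep whatever follows the last closing tag.
    if "</think>" in cleaned:
        cleaned = cleaned.split("</think>")[-1]
    return cleaned.strip()

print(strip_reasoning("<think>The tweet complains about delays.</think>\nnegative"))
print(strip_reasoning("neutral"))
```

Replies without a reasoning trace pass through unchanged apart from whitespace trimming, so the same helper could be applied to both models.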

In [None]:
model = "accounts/fireworks/models/deepseek-r1"
system_instructions = "You are a helpful assistant."
user_query = "How to build muscles. Reply in three lines."

response = generate_response(model, system_instructions, user_query)
print(response)

1. **Strength Train Regularly:** Focus on compound exercises (squats, deadlifts, bench presses) with progressive overload to challenge muscles.  
2. **Eat Protein-Rich Meals:** Consume 1.6–2.2g of protein per kg of body weight daily and maintain a caloric surplus to fuel growth.  
3. **Prioritize Recovery:** Allow 48 hours between targeting the same muscle group and aim for 7–9 hours of sleep nightly.


## DeepSeek R1 vs Llama 3.1-405b for Text Classification

In [None]:
## Dataset download link
## https://www.kaggle.com/datasets/crowdflower/twitter-airline-sentiment?select=Tweets.csv

dataset = pd.read_csv(r"/content/Tweets.csv")
dataset.head()

Unnamed: 0,tweet_id,airline_sentiment,airline_sentiment_confidence,negativereason,negativereason_confidence,airline,airline_sentiment_gold,name,negativereason_gold,retweet_count,text,tweet_coord,tweet_created,tweet_location,user_timezone
0,570306133677760513,neutral,1.0,,,Virgin America,,cairdin,,0,@VirginAmerica What @dhepburn said.,,2015-02-24 11:35:52 -0800,,Eastern Time (US & Canada)
1,570301130888122368,positive,0.3486,,0.0,Virgin America,,jnardino,,0,@VirginAmerica plus you've added commercials t...,,2015-02-24 11:15:59 -0800,,Pacific Time (US & Canada)
2,570301083672813571,neutral,0.6837,,,Virgin America,,yvonnalynn,,0,@VirginAmerica I didn't today... Must mean I n...,,2015-02-24 11:15:48 -0800,Lets Play,Central Time (US & Canada)
3,570301031407624196,negative,1.0,Bad Flight,0.7033,Virgin America,,jnardino,,0,@VirginAmerica it's really aggressive to blast...,,2015-02-24 11:15:36 -0800,,Pacific Time (US & Canada)
4,570300817074462722,negative,1.0,Can't Tell,1.0,Virgin America,,jnardino,,0,@VirginAmerica and it's a really big bad thing...,,2015-02-24 11:14:45 -0800,,Pacific Time (US & Canada)


In [None]:

# Remove rows where 'airline_sentiment' or 'text' are NaN
dataset = dataset.dropna(subset=['airline_sentiment', 'text'])

# Remove rows where 'airline_sentiment' or 'text' are empty strings
dataset = dataset[(dataset['airline_sentiment'].str.strip() != '') & (dataset['text'].str.strip() != '')]

# Filter the DataFrame for each sentiment
neutral_df = dataset[dataset['airline_sentiment'] == 'neutral']
positive_df = dataset[dataset['airline_sentiment'] == 'positive']
negative_df = dataset[dataset['airline_sentiment'] == 'negative']

# Randomly sample records from each sentiment
neutral_sample = neutral_df.sample(n=34)
positive_sample = positive_df.sample(n=33)
negative_sample = negative_df.sample(n=33)

# Concatenate the samples into one DataFrame
dataset = pd.concat([neutral_sample, positive_sample, negative_sample])

# Reset index if needed
dataset.reset_index(drop=True, inplace=True)

# print value counts
print(dataset["airline_sentiment"].value_counts())


airline_sentiment
neutral     34
positive    33
negative    33
Name: count, dtype: int64


In [None]:

def predict_sentiment(model, sleep_time=0):

  # model is a (display name, Fireworks model ID) tuple; unpack it once
  model_name, model_client = model

  tweets_list = dataset["text"].tolist()
  all_sentiments = []
  exceptions = 0
  for i, tweet in enumerate(tweets_list, 1):


      try:

          system_instructions = "You are an expert in annotating tweets with positive, negative, and neutral emotions"

          user_query = (
              f"What is the sentiment expressed in the following tweet about an airline? "
              f"Select sentiment value from positive, negative, or neutral. "
              f"Return only the sentiment value (positive, negative, or neutral) in small letters.\n\n"
              f"tweet: {tweet}"
          )

          sentiment_value = generate_response(model_client, system_instructions, user_query)
          all_sentiments.append({
              'tweet_id': i,
              'model': model_name,
              'sentiment': sentiment_value
          })
          print(f"Tweet {i} - Model {model_name} - Sentiment {sentiment_value}")

      except Exception as e:
          print("===================")
          print("Exception occurred with model:", model_name, "| Tweet:", i, "| Error:", e)
          exceptions += 1

      time.sleep(sleep_time)

  print("Total exception count:", exceptions)

  return all_sentiments
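
The prompt asks for a bare lowercase label, but a model may still reply with capitalization, punctuation, or markdown. Since accuracy is computed by exact string match, such replies would count as errors. Below is a minimal sketch of a normalization step that could be applied to `sentiment_value` before it is stored; the helper name `normalize_sentiment` and the `"unknown"` fallback are assumptions, not part of the original notebook.

```python
import re

VALID_LABELS = ("positive", "negative", "neutral")

def normalize_sentiment(raw: str, default: str = "unknown") -> str:
    """Map a free-form model reply onto one of the three expected labels."""
    text = raw.strip().lower()
    # Find every label that appears as a whole word in the reply.
    found = [label for label in VALID_LABELS if re.search(rf"\b{label}\b", text)]
    # Accept the reply only when exactly one label is mentioned.
    return found[0] if len(found) == 1 else default

for reply in ["Negative.", "**positive**", "positive or negative", ""]:
    print(repr(reply), "->", normalize_sentiment(reply))
```

Ambiguous replies fall back to the default instead of being guessed, so they still count against the model when scored.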


In [None]:
model = ("llama3.1-405b", "accounts/fireworks/models/llama-v3p1-405b-instruct")

all_sentiments = predict_sentiment(model, sleep_time=0)
results_df = pd.DataFrame(all_sentiments)

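# Match predictions to labels by tweet_id (1-based) so skipped tweets do not shift the alignment
y_true = dataset["airline_sentiment"].iloc[results_df["tweet_id"].to_numpy() - 1]
accuracy = accuracy_score(y_true, results_df["sentiment"])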
print(f"Accuracy for {model[0]}: {accuracy}")

Tweet 1 - Model llama3.1-405b - Sentiment negative
Tweet 2 - Model llama3.1-405b - Sentiment neutral
Tweet 3 - Model llama3.1-405b - Sentiment neutral
Tweet 4 - Model llama3.1-405b - Sentiment neutral
Tweet 5 - Model llama3.1-405b - Sentiment neutral
Tweet 6 - Model llama3.1-405b - Sentiment neutral
Tweet 7 - Model llama3.1-405b - Sentiment neutral
Tweet 8 - Model llama3.1-405b - Sentiment neutral
Tweet 9 - Model llama3.1-405b - Sentiment negative
Tweet 10 - Model llama3.1-405b - Sentiment positive
Tweet 11 - Model llama3.1-405b - Sentiment neutral
Tweet 12 - Model llama3.1-405b - Sentiment neutral
Tweet 13 - Model llama3.1-405b - Sentiment negative
Tweet 14 - Model llama3.1-405b - Sentiment positive
Tweet 15 - Model llama3.1-405b - Sentiment negative
Tweet 16 - Model llama3.1-405b - Sentiment neutral
Tweet 17 - Model llama3.1-405b - Sentiment neutral
Tweet 18 - Model llama3.1-405b - Sentiment neutral
Tweet 19 - Model llama3.1-405b - Sentiment neutral
Tweet 20 - Model llama3.1-405b - S
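
If an API call raises an exception, `predict_sentiment` skips that tweet, so the list of predictions can be shorter than the dataset. The following plain-Python sketch scores only the tweets that received a prediction, matching each one to its label through `tweet_id`. The function name `aligned_accuracy` and the sample data are hypothetical.

```python
def aligned_accuracy(predictions, gold_labels):
    """Accuracy over the tweets that actually received a prediction.

    predictions: list of dicts with a 1-based 'tweet_id' and a 'sentiment',
                 in the shape appended by predict_sentiment.
    gold_labels: list of true labels in dataset order.
    """
    if not predictions:
        return 0.0
    correct = 0
    for item in predictions:
        # tweet_id starts at 1, list positions start at 0.
        gold = gold_labels[item["tweet_id"] - 1]
        if item["sentiment"] == gold:
            correct += 1
    return correct / len(predictions)

gold = ["neutral", "positive", "negative", "negative"]
# Hypothetical run in which the call for tweet 2 raised an exception.
preds = [
    {"tweet_id": 1, "sentiment": "neutral"},
    {"tweet_id": 3, "sentiment": "negative"},
    {"tweet_id": 4, "sentiment": "positive"},
]
print(aligned_accuracy(preds, gold))
```

In this example two of the three scored tweets match their labels; the skipped tweet is left out of the denominator rather than counted as wrong.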

In [None]:
model = ("deepseek-r1", "accounts/fireworks/models/deepseek-r1")

all_sentiments = predict_sentiment(model, sleep_time=4)

results_df = pd.DataFrame(all_sentiments)

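# Match predictions to labels by tweet_id (1-based) so skipped tweets do not shift the alignment
y_true = dataset["airline_sentiment"].iloc[results_df["tweet_id"].to_numpy() - 1]
accuracy = accuracy_score(y_true, results_df["sentiment"])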
print(f"Accuracy for {model[0]}: {accuracy}")

Tweet 1 - Model deepseek-r1 - Sentiment negative
Tweet 2 - Model deepseek-r1 - Sentiment neutral
Tweet 3 - Model deepseek-r1 - Sentiment neutral
Tweet 4 - Model deepseek-r1 - Sentiment neutral
Tweet 5 - Model deepseek-r1 - Sentiment neutral
Tweet 6 - Model deepseek-r1 - Sentiment neutral
Tweet 7 - Model deepseek-r1 - Sentiment positive
Tweet 8 - Model deepseek-r1 - Sentiment positive
Tweet 9 - Model deepseek-r1 - Sentiment neutral
Tweet 10 - Model deepseek-r1 - Sentiment positive
Tweet 11 - Model deepseek-r1 - Sentiment negative
Tweet 12 - Model deepseek-r1 - Sentiment neutral
Tweet 13 - Model deepseek-r1 - Sentiment negative
Tweet 14 - Model deepseek-r1 - Sentiment negative
Tweet 15 - Model deepseek-r1 - Sentiment neutral
Tweet 16 - Model deepseek-r1 - Sentiment negative
Tweet 17 - Model deepseek-r1 - Sentiment neutral
Tweet 18 - Model deepseek-r1 - Sentiment neutral
Tweet 19 - Model deepseek-r1 - Sentiment neutral
Tweet 20 - Model deepseek-r1 - Sentiment negative
Tweet 21 - Model dee
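
A fixed pause between calls is one way to space out requests. Another option is to retry a failed call with increasing delays instead of dropping the tweet. The sketch below shows that pattern in plain Python; `call_with_retries` and its parameters are assumptions for illustration, and the `flaky` function only simulates a failing API call.

```python
import time

def call_with_retries(func, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**attempt and try again."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            # Re-raise once the final attempt has failed.
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))

# Simulated call that fails twice before succeeding.
calls = {"count": 0}

def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("simulated rate limit")
    return "neutral"

delays = []
# Record the delays instead of sleeping so the demo runs instantly.
print(call_with_retries(flaky, sleep=delays.append))
print(delays)
```

Inside `predict_sentiment`, the wrapped call could look like `call_with_retries(lambda: generate_response(model_client, system_instructions, user_query))`.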

## DeepSeek R1 vs Llama 3.1-405b for Text Summarization

In [None]:
# Dataset download link (GitHub)
# https://github.com/reddzzz/DataScience_FP/blob/main/dataset.xlsx

dataset = pd.read_excel(r"/content/dataset.xlsx")
dataset = dataset.sample(frac=1)
print(dataset.shape)
dataset.head()

(1000, 10)


Unnamed: 0.1,Unnamed: 0,id,human_summary,publication,author,date,year,month,theme,content
318,259,17663,” Also eliminated on Thursday was a program th...,New York Times,"Hannah Berkeley Cohen, Azam Ahmed and Frances ...",2017-01-14,2017.0,1.0,business,HAVANA — Andrés Iván and his girlfriend gre...
828,259,18272,He definitely was proud of his mom and had a s...,New York Times,"Adam Liptak, Peter Baker, Nicholas Fandos and ...",2017-02-05,2017.0,2.0,politics,WASHINGTON — Judge Neil M. Gorsuch’s first ...
596,259,17986,It was published on a site affiliated with The...,New York Times,Liz Spayd,2017-01-30,2017.0,1.0,entertainment,The conservative radio host Glenn Beck called ...
458,259,17831,"A. We hypothesize that roughly 50, 000 years a...",New York Times,Claudia Dreifus,2017-01-24,2017.0,1.0,crime,Geneticists tell us that somewhere between 1 a...
933,259,18388,but those criticisms were based on constitutio...,New York Times,Julie Hirschfeld Davis,2017-02-09,2017.0,2.0,business,"WASHINGTON — Judge Neil M. Gorsuch, Preside..."


In [None]:
# Function to calculate ROUGE scores
def calculate_rouge(reference, candidate):
    scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
    scores = scorer.score(reference, candidate)
    return {key: value.fmeasure for key, value in scores.items()}
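
To make the `fmeasure` value more concrete, here is a simplified, hand-rolled version of ROUGE-1 in plain Python. It is only a sketch: it lowercases and splits on whitespace and does no stemming, so its numbers will generally differ from those of `rouge_scorer.RougeScorer` with `use_stemmer=True`. The function name `rouge1_f` is hypothetical.

```python
from collections import Counter

def rouge1_f(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F-measure: lowercase, whitespace tokens, no stemming."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each word counts at most as often as it appears in both texts.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())  # share of candidate words found in the reference
    recall = overlap / sum(ref_counts.values())      # share of reference words found in the candidate
    return 2 * precision * recall / (precision + recall)

print(rouge1_f("the cat sat on the mat", "the cat lay on the mat"))
```

In this example five of the six words overlap in both directions, so precision, recall, and the F-measure all equal 5/6. ROUGE-2 applies the same counting to word pairs, and ROUGE-L uses the longest common subsequence instead.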

In [None]:
def generate_summary(model, sleep_time=0):

  results = []
  model_name, model_client = model
  # Summarize only the first 20 (shuffled) articles to limit API calls
  for i, (_, row) in enumerate(dataset[:20].iterrows(), 1):
      article = row['content']
      human_summary = row['human_summary']

      print(f"Summarizing article {i} with model {model_name}")
      system_role = "You are an expert in creating summaries from text"
      user_query = f"Summarize the following article in 1150 characters. The summary should look like human created:\n\n{article}\n\nSummary:"

      generated_summary = generate_response(model_client, system_role, user_query)
      rouge_scores = calculate_rouge(human_summary, generated_summary)

      results.append({
          'model': model_name,
          'article_id': row['id'],
          'generated_summary': generated_summary,
          'rouge1': rouge_scores['rouge1'],
          'rouge2': rouge_scores['rouge2'],
          'rougeL': rouge_scores['rougeL']
      })

      time.sleep(sleep_time)
  return results
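
The prompt requests a summary of 1150 characters, but nothing in `generate_summary` checks whether the models respect that budget. A small sketch of such a check follows; `length_report` and its return keys are assumptions for illustration, and the sample strings stand in for real summaries.

```python
def length_report(summaries, target_chars=1150):
    """Return per-summary character counts and how many exceed the target."""
    lengths = [len(s) for s in summaries]
    over = sum(1 for n in lengths if n > target_chars)
    mean_length = sum(lengths) / len(lengths) if lengths else 0.0
    return {"lengths": lengths, "over_target": over, "mean_length": mean_length}

# Placeholder strings standing in for generated summaries.
report = length_report(["a" * 1000, "b" * 1200])
print(report["over_target"], report["mean_length"])
```

With the notebook's results, the input could be `results_df['generated_summary'].tolist()`. Summary length matters here because much longer or shorter outputs tend to shift ROUGE precision and recall.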



In [None]:
model = ("llama3.1-405b", "accounts/fireworks/models/llama-v3p1-405b-instruct")
results = generate_summary(model, sleep_time=0)

results_df = pd.DataFrame(results)
average_scores = results_df[['rouge1', 'rouge2', 'rougeL']].mean()
average_scores.head()

Summarizing article 1 with model llama3.1-405b
Summarizing article 2 with model llama3.1-405b
Summarizing article 3 with model llama3.1-405b
Summarizing article 4 with model llama3.1-405b
Summarizing article 5 with model llama3.1-405b
Summarizing article 6 with model llama3.1-405b
Summarizing article 7 with model llama3.1-405b
Summarizing article 8 with model llama3.1-405b
Summarizing article 9 with model llama3.1-405b
Summarizing article 10 with model llama3.1-405b
Summarizing article 11 with model llama3.1-405b
Summarizing article 12 with model llama3.1-405b
Summarizing article 13 with model llama3.1-405b
Summarizing article 14 with model llama3.1-405b
Summarizing article 15 with model llama3.1-405b
Summarizing article 16 with model llama3.1-405b
Summarizing article 17 with model llama3.1-405b
Summarizing article 18 with model llama3.1-405b
Summarizing article 19 with model llama3.1-405b
Summarizing article 20 with model llama3.1-405b


metric,average_fmeasure
rouge1,0.363409
rouge2,0.10389
rougeL,0.194939


In [None]:
model = ("deepseek-r1", "accounts/fireworks/models/deepseek-r1")
results = generate_summary(model, sleep_time=4)

results_df = pd.DataFrame(results)
average_scores = results_df[['rouge1', 'rouge2', 'rougeL']].mean()
average_scores.head()

Summarizing article 1 with model deepseek-r1
Summarizing article 2 with model deepseek-r1
Summarizing article 3 with model deepseek-r1
Summarizing article 4 with model deepseek-r1
Summarizing article 5 with model deepseek-r1
Summarizing article 6 with model deepseek-r1
Summarizing article 7 with model deepseek-r1
Summarizing article 8 with model deepseek-r1
Summarizing article 9 with model deepseek-r1
Summarizing article 10 with model deepseek-r1
Summarizing article 11 with model deepseek-r1
Summarizing article 12 with model deepseek-r1
Summarizing article 13 with model deepseek-r1
Summarizing article 14 with model deepseek-r1
Summarizing article 15 with model deepseek-r1
Summarizing article 16 with model deepseek-r1
Summarizing article 17 with model deepseek-r1
Summarizing article 18 with model deepseek-r1
Summarizing article 19 with model deepseek-r1
Summarizing article 20 with model deepseek-r1


metric,average_fmeasure
rouge1,0.326427
rouge2,0.061559
rougeL,0.161707
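
The two sets of averages can be put side by side with a few lines of plain Python. The values below are copied from the outputs of the two runs above; the helper `score_gaps` is a hypothetical name. Both averages come from only 20 articles, so the gaps should be read as indicative rather than conclusive.

```python
# Average F-measures as reported in the two runs above.
llama = {"rouge1": 0.363409, "rouge2": 0.10389, "rougeL": 0.194939}
deepseek = {"rouge1": 0.326427, "rouge2": 0.061559, "rougeL": 0.161707}

def score_gaps(a, b):
    """Return a[metric] - b[metric] for every metric in a, rounded to 6 places."""
    return {metric: round(a[metric] - b[metric], 6) for metric in a}

for metric, gap in score_gaps(llama, deepseek).items():
    print(f"{metric}: Llama 3.1-405b minus DeepSeek R1 = {gap:+.6f}")
```

A positive gap means Llama 3.1-405b scored higher on that metric in this run.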
