**Car-ing is sharing**, an auto dealership company for car sales and rental, is taking their services to the next level thanks to **Large Language Models (LLMs)**.

As their newly recruited AI and NLP developer, you've been asked to prototype a chatbot app with multiple functionalities that not only assist customers but also provide support to human agents in the company.

The solution should receive textual prompts and use a variety of pre-trained Hugging Face LLMs to handle a series of tasks, such as classifying the sentiment of a car's text review, answering a customer question, and summarizing or translating text.


In [None]:
import pandas as pd
import torch

from transformers import logging
logging.set_verbosity(logging.WARNING)

In [99]:
import evaluate
from transformers import pipeline
import re

In [100]:
file_path = "data/car_reviews.csv"
car_df = pd.read_csv(file_path, delimiter=";")

In [101]:
car_df

Unnamed: 0,Review,Class
0,I am very satisfied with my 2014 Nissan NV SL....,POSITIVE
1,The car is fine. It's a bit loud and not very ...,NEGATIVE
2,"My first foreign car. Love it, I would buy ano...",POSITIVE
3,I've come across numerous reviews praising the...,NEGATIVE
4,I've been dreaming of owning an SUV for quite ...,POSITIVE


In [102]:
# Column names in car_reviews.csv
TEXT_COL = "Review"
LABEL_COL = "Class"

reviews = car_df[TEXT_COL].astype(str).tolist()
true_labels_raw = car_df[LABEL_COL].tolist()

In [103]:
def normalize_binary_label(x):
    label_map = {"NEGATIVE": 0, "POSITIVE": 1}

    if isinstance(x, str):
        s = x.strip().upper()
        if s in label_map:
            return label_map[s]
        if s in {"0", "1"}:
            return int(s)
    elif isinstance(x, (int, float)) and x in (0, 1):
        # Exact match only, so values such as 0.5 are rejected instead of truncated to 0.
        return int(x)

    raise ValueError(f"Cannot normalize label: {x!r}")

true_labels = [normalize_binary_label(l) for l in true_labels_raw]

In [104]:
sentiment_analyzer = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english"
)

Device set to use cpu


In [105]:
predicted_labels = sentiment_analyzer(reviews)

In [106]:
label_to_int = {"NEGATIVE": 0, "POSITIVE": 1}
predictions = [label_to_int[p["label"].upper()] for p in predicted_labels]

In [107]:
accuracy_metric = evaluate.load("accuracy")
f1_metric = evaluate.load("f1")

In [108]:
accuracy_result = accuracy_metric.compute(
    predictions=predictions,
    references=true_labels
)["accuracy"]

In [109]:
accuracy_result

0.8

In [110]:
f1_result = f1_metric.compute(
    predictions=predictions,
    references=true_labels
)["f1"]

In [111]:
f1_result

0.8571428571428571
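
As a sanity check, both scores can be reproduced by hand from the label lists printed in the final report of this notebook. The snippet below is a minimal sketch in plain Python, not the `evaluate` implementation; it assumes the positive class is `1`, which is the default for binary F1.

```python
# Labels as printed in the final report (1 = POSITIVE, 0 = NEGATIVE).
true_labels = [1, 0, 1, 0, 1]
predictions = [1, 1, 1, 0, 1]

pairs = list(zip(true_labels, predictions))

# Confusion-matrix counts, treating 1 as the positive class.
tp = sum(1 for t, p in pairs if t == 1 and p == 1)
fp = sum(1 for t, p in pairs if t == 0 and p == 1)
fn = sum(1 for t, p in pairs if t == 1 and p == 0)
tn = sum(1 for t, p in pairs if t == 0 and p == 0)

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print("accuracy:", accuracy)
print("f1:", f1)
```

The single error is the second review, a negative review predicted as positive, which lowers precision to 0.75 while recall stays at 1.0.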

In [112]:
def first_two_sentences(text):
    sentences = re.split(r'(?<=[\.!?])\s+', text.strip())
    sentences = [s for s in sentences if s]
    if not sentences:
        return text.strip()
    return " ".join(sentences[:2])
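
To illustrate how the regex split behaves, here is a small standalone sketch. The function is reproduced so the snippet runs on its own; the third sentence of `sample` and the `tricky` string are made-up inputs for illustration, not part of the dataset.

```python
import re

def first_two_sentences(text):
    # Split on whitespace that follows ., ! or ? (same pattern as above).
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    sentences = [s for s in sentences if s]
    if not sentences:
        return text.strip()
    return " ".join(sentences[:2])

# The first two sentences match the review; the third is a made-up addition.
sample = (
    "I am very satisfied with my 2014 Nissan NV SL. "
    "I use this van for my business deliveries and personal use. "
    "It has plenty of room."
)
print(first_two_sentences(sample))

# Limitation: abbreviations ending in a period count as sentence ends.
tricky = "Dr. Smith drove it. He liked it. Done."
print(first_two_sentences(tricky))
```

The second call shows the main caveat of this approach: a period after an abbreviation is treated as a sentence boundary, so texts with many abbreviations may need a proper sentence tokenizer.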

In [113]:
first_review = reviews[0]
en_segment = first_two_sentences(first_review)

In [114]:
translator = pipeline(
    "translation_en_to_es",
    model="Helsinki-NLP/opus-mt-en-es"
)

Device set to use cpu


In [115]:
translation_output = translator(
    en_segment,
    max_length=512,
    clean_up_tokenization_spaces=True
)
translated_review = translation_output[0]["translation_text"].strip()

In [116]:
translated_review

'Estoy muy satisfecho con mi Nissan NV SL 2014. Uso esta camioneta para mis entregas de negocios y uso personal.'

In [117]:
with open("data/reference_translations.txt", "r", encoding="utf-8") as f:
    reference_lines = [line.strip() for line in f if line.strip()]

In [118]:
reference_lines

['Estoy muy satisfecho con mi Nissan NV SL 2014. Utilizo esta camioneta para mis entregas comerciales y uso personal.',
 'Estoy muy satisfecho con mi Nissan NV SL 2014. Uso esta furgoneta para mis entregas comerciales y uso personal.']

In [119]:
bleu_metric = evaluate.load("bleu")

In [120]:
bleu_score = bleu_metric.compute(
    predictions=[translated_review],
    references=[reference_lines]
)

In [121]:
bleu_score

{'bleu': 0.7794483794144497,
 'precisions': [0.9090909090909091,
  0.8571428571428571,
  0.75,
  0.631578947368421],
 'brevity_penalty': 1.0,
 'length_ratio': 1.0476190476190477,
 'translation_length': 22,
 'reference_length': 21}
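
The reported fields fit together as follows: BLEU is the brevity penalty multiplied by the geometric mean of the 1- to 4-gram precisions. The sketch below recomputes the score from the numbers in the output above, assuming the standard uniform n-gram weights; it is a worked check, not the `evaluate` source code.

```python
import math

# Values copied from the bleu_score output above.
precisions = [0.9090909090909091, 0.8571428571428571, 0.75, 0.631578947368421]
translation_length = 22
reference_length = 21

# Brevity penalty: only applied when the translation is shorter than the reference.
if translation_length > reference_length:
    brevity_penalty = 1.0
else:
    brevity_penalty = math.exp(1 - reference_length / translation_length)

# Geometric mean of the n-gram precisions (uniform weights of 1/4 each).
geo_mean = math.exp(sum(math.log(p) for p in precisions) / len(precisions))

bleu = brevity_penalty * geo_mean
print("length ratio:", translation_length / reference_length)
print("BLEU:", bleu)
```

Because the translation (22 tokens) is longer than the reference length (21 tokens), no brevity penalty applies and the score is simply the geometric mean of the precisions.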

In [123]:
context = reviews[1]
question = "What did he like about the brand?"

In [124]:
context

"The car is fine. It's a bit loud and not very powerful. On one hand, compared to its peers, the interior is well-built. The transmission failed a few years ago, and the dealer replaced it under warranty with no issues. Now, about 60k miles later, the transmission is failing again. It sounds like a truck, and the issues are well-documented. The dealer tells me it is normal, refusing to do anything to resolve the issue. After owning the car for 4 years, there are many other vehicles I would purchase over this one. Initially, I really liked what the brand is about: ride quality, reliability, etc. But I will not purchase another one. Despite these concerns, I must say, the level of comfort in the car has always been satisfactory, but not worth the rest of issues found."

In [125]:
qa_pipeline = pipeline(
    "question-answering",
    model="deepset/minilm-uncased-squad2"
)

Some weights of the model checkpoint at deepset/minilm-uncased-squad2 were not used when initializing BertForQuestionAnswering: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cpu


In [126]:
qa_output = qa_pipeline(question=question, context=context)
answer = qa_output["answer"].strip()

In [127]:
answer

'ride quality, reliability'
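
Since the model is extractive, the answer is a span copied from the context, so it can be traced back to the sentence it came from. The helper below is a hypothetical addition (`sentence_containing` is not part of the notebook or of `transformers`) and is only a sketch; it uses a short excerpt of the review rather than the full context.

```python
import re

def sentence_containing(context, answer):
    """Return the first sentence of `context` that contains `answer`, or None."""
    sentences = re.split(r'(?<=[.!?])\s+', context.strip())
    for sentence in sentences:
        if answer in sentence:
            return sentence
    return None

# Excerpt from the review used as context above.
excerpt = (
    "Initially, I really liked what the brand is about: "
    "ride quality, reliability, etc. But I will not purchase another one."
)
answer = "ride quality, reliability"

supporting = sentence_containing(excerpt, answer)
print(supporting)
```

Showing the supporting sentence next to the answer can help a human agent judge whether the extracted span really addresses the question.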

In [128]:
last_review = reviews[-1]

In [129]:
last_review

"I've been dreaming of owning an SUV for quite a while, but I've been driving cars that were already paid for during an extended period. I ultimately made the decision to transition to a brand-new car, which, of course, involved taking on new payments. However, given that I don't drive extensively, I was inclined to avoid a substantial financial commitment. The Nissan Rogue provides me with the desired SUV experience without burdening me with an exorbitant payment; the financial arrangement is quite reasonable. Handling and styling are great; I have hauled 12 bags of mulch in the back with the seats down and could have held more. I am VERY satisfied overall. I find myself needing to exercise extra caution when making lane changes, particularly owing to the blind spots resulting from the small side windows situated towards the rear of the vehicle. To address this concern, I am actively engaged in making adjustments to my mirrors and consciously reducing the frequency of lane changes. Th

In [130]:
summarizer = pipeline(
    "summarization",
    model="sshleifer/distilbart-cnn-12-6"
)

Device set to use cpu


In [131]:
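# min_length/max_length are counted in tokens; the hard cap at 55 tokens
# can cut the summary off mid-sentence.
summary_output = summarizer(
    last_review,
    min_length=50,
    max_length=55,
    do_sample=False,
    clean_up_tokenization_spaces=True
)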
summarized_text = summary_output[0]["summary_text"].strip()

In [132]:
summarized_text

'Nissan Rogue provides the desired SUV experience without burdening me with an exorbitant payment. Handling and styling are great; I have hauled 12 bags of mulch in the back with the seats down and could have held more. The engine delivers strong performance, and'

In [133]:
toxicity_measure = evaluate.load("toxicity", module_type="measurement")
toxicity_result = toxicity_measure.compute(
    predictions=[summarized_text],
    aggregation="maximum"
)
max_toxicity = toxicity_result["max_toxicity"]

Device set to use cpu


In [134]:
max_toxicity

0.00013878720346838236

In [135]:
regard_measure = evaluate.load("regard", module_type="measurement")
regard_result = regard_measure.compute(
    data=[summarized_text],
    aggregation="maximum"
)
max_regard = regard_result["max_regard"]

Device set to use cpu


In [136]:
max_regard

{'positive': 0.7860167622566223,
 'other': 0.09952664375305176,
 'neutral': 0.08915404230356216,
 'negative': 0.025302601978182793}
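
One way to turn these two measurements into a quick check is sketched below, using the values printed above. The `TOXICITY_THRESHOLD` of 0.5 is a hypothetical cut-off chosen only for illustration; it does not come from the notebook or from the `evaluate` library.

```python
# Values copied from the outputs above.
max_toxicity = 0.00013878720346838236
max_regard = {
    "positive": 0.7860167622566223,
    "other": 0.09952664375305176,
    "neutral": 0.08915404230356216,
    "negative": 0.025302601978182793,
}

# Hypothetical cut-off, chosen only for this illustration.
TOXICITY_THRESHOLD = 0.5

# Regard category with the highest score.
top_label = max(max_regard, key=max_regard.get)
is_flagged = max_toxicity > TOXICITY_THRESHOLD

print("Dominant regard:", top_label)
print("Flagged as toxic:", is_flagged)
```

For this summary the dominant regard category is positive and the toxicity score is far below the illustrative threshold.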

In [137]:
print("=== Sentiment classification ===")
print("True labels:     ", true_labels)
print("Predictions:     ", predictions)
print("Accuracy:        ", accuracy_result)
print("F1-score:        ", f1_result)
print()

print("=== Translation ===")
print("EN segment:      ", en_segment)
print("Translated (ES): ", translated_review)
print("BLEU score:      ", bleu_score)
print()

print("=== QA ===")
print("Question:        ", question)
print("Context snippet: ", context[:200], "...")
print("Answer:          ", answer)
print()

print("=== Summary & Bias metrics ===")
print("Summary:         ", summarized_text)
print("Max toxicity:    ", max_toxicity)
print("Max regard:      ", max_regard)

=== Sentiment classification ===
True labels:      [1, 0, 1, 0, 1]
Predictions:      [1, 1, 1, 0, 1]
Accuracy:         0.8
F1-score:         0.8571428571428571

=== Translation ===
EN segment:       I am very satisfied with my 2014 Nissan NV SL. I use this van for my business deliveries and personal use.
Translated (ES):  Estoy muy satisfecho con mi Nissan NV SL 2014. Uso esta camioneta para mis entregas de negocios y uso personal.
BLEU score:       {'bleu': 0.7794483794144497, 'precisions': [0.9090909090909091, 0.8571428571428571, 0.75, 0.631578947368421], 'brevity_penalty': 1.0, 'length_ratio': 1.0476190476190477, 'translation_length': 22, 'reference_length': 21}

=== QA ===
Question:         What did he like about the brand?
Context snippet:  The car is fine. It's a bit loud and not very powerful. On one hand, compared to its peers, the interior is well-built. The transmission failed a few years ago, and the dealer replaced it under warran ...
Answer:           ride quality, reliabi