# Implementing an LLM for Summarization of Large Texts

Using open-source LLMs available on Hugging Face

LLM used: **Google Gemma 3 4B IT** (the code cells below load `facebook/bart-large-cnn`)

Note: *meta-llama/Llama-2-7b-chat-hf* requires access approval from Meta, so Gemma was used instead because it is available immediately.

In [None]:
# Installing required packages
!pip install -q datasets transformers langchain langchain_community accelerate langchain-huggingface

In [None]:
# Authenticating with Hugging Face using an access token
!hf auth login


    A token is already saved on your machine. Run `hf auth whoami` to get more information or `hf auth logout` if you want to log out.
    Setting a new token will erase the existing one.
    To log in, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) n
Token is valid (permission: read).
The token `sum

In [None]:
# Importing packages
# (the unused, deprecated `from langchain import LLMChain, PromptTemplate` import is dropped)
from transformers import AutoTokenizer, pipeline
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFacePipeline
from langchain_core.prompts import ChatPromptTemplate
from datasets import Dataset
import torch

In [None]:
# Loading the tokenizer for the pre-trained summarization model
model_name = "facebook/bart-large-cnn"
tokenizer = AutoTokenizer.from_pretrained(model_name)

In [None]:
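# Creating a summarization pipeline
summarization_pipeline = pipeline(
    "summarization",
    model=model_name,
    tokenizer=tokenizer,
    dtype=torch.bfloat16,
    trust_remote_code=True,  # not needed for BART (no custom model code), but harmless
    device_map="auto",
    max_new_tokens=150,
    min_new_tokens=100,
    do_sample=True,  # sampling makes summaries vary between runs; set False for reproducible output
    early_stopping=True  # only affects beam search (num_beams > 1)
)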

Device set to use cuda:0


In [None]:
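# model_kwargs is used when HuggingFacePipeline loads a model itself (from_model_id);
# with an existing pipeline, generation settings come from summarization_pipeline above
llm = HuggingFacePipeline(pipeline=summarization_pipeline, model_kwargs={'temperature': 0})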

In [None]:
messages = [
    {
        "role": "system",
        "content": (
            # Note: BART is not instruction-tuned, so it likely condenses these
            # instructions along with the article instead of following them
            "You are tasked with writing a concise summary of the following text "
            "that combines extractive and abstractive summarization. "
            "Return your response in this format:\n\n"
            "Text:\n"
            "```{text}```\n\n"
            "SUMMARY:"
        )
    },
    {
        "role": "user",
        "content": "{text}\n<end_summary>"
    }
]
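Python joins adjacent string literals inside parentheses at compile time with no separator, so each piece of a multi-line prompt needs its own trailing space or `\n`. A minimal illustration (the sentences here are made up for the demo):

```python
# Adjacent string literals are concatenated with no separator in between
without_space = (
    "Summarize the text concisely."
    "Return your response in this format:"
)
with_space = (
    "Summarize the text concisely. "
    "Return your response in this format:"
)

print(without_space)  # Summarize the text concisely.Return your response in this format:
print(with_space)     # Summarize the text concisely. Return your response in this format:
```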

In [None]:
prompt = ChatPromptTemplate.from_messages(messages)
llm_chain = prompt | llm

In [None]:
# Reading text to summarize
with open('/content/article.txt', 'r') as f:
    text = f.read()

In [None]:
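# Splitting the text into chunks of at most 1500 characters (50-character overlap),
# preferring paragraph breaks, then line breaks, then spaces as split points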
splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=50)
chunks = splitter.split_text(text)
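`RecursiveCharacterTextSplitter` tries its separators in order (by default `"\n\n"`, `"\n"`, `" "`, then single characters) and merges the resulting pieces up to `chunk_size`, so chunks tend to end on paragraph boundaries. The function below is a simplified, hypothetical `split_recursive` that sketches the idea; it is not LangChain's implementation, it ignores `chunk_overlap`, and it drops the separator at chunk boundaries.

```python
def split_recursive(text: str, chunk_size: int,
                    separators: tuple[str, ...] = ("\n\n", "\n", " ", "")) -> list[str]:
    """Split text into pieces of at most chunk_size chars, preferring earlier separators."""
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, finer = separators[0], separators[1:]
    if sep == "":
        # Last resort: hard cut every chunk_size characters
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging pieces while they fit
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Piece is too long on its own: retry with the next, finer separator
            chunks.extend(split_recursive(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


sample = "First paragraph.\n\nSecond paragraph is longer.\n\nThird."
print(split_recursive(sample, 30))
# ['First paragraph.', 'Second paragraph is longer.', 'Third.']
```

With `chunk_size=1500`, whole paragraphs are grouped together until the next one would exceed the limit, which is why the chunks printed below start and end on paragraph boundaries.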

In [None]:
for chunk in chunks:
    print(f'{len(chunk)}: {chunk}\n\n\n')

1199: Arm skipped the NPU hype, making the CPU great at AI instead

In the race to accelerate AI on everyday devices, the industry has long relied on two workhorses: GPUs, with their raw parallel power, and NPUs (Neural Processing Units), designed for specialized neural network tasks. Both come with flaws, though. GPUs have a latency and power overhead for short, spiky workloads (like a voice assistant processing a quick query, or an AI-powered device search), and NPUs, while efficient, are fragmented across vendors and force developers to account for a lot of different hardware. That's why I find Arm's alternative option intriguing, and honestly, outright better than a pure NPU reliance.

I attended Arm's briefing in Cambridge to learn about the company's upcoming Lumex platform, its compute platform aimed at mobile. We were provided a technical walkthrough, breaking down the improvements seen in the C1 Ultra, C1 Premium, C1 Pro, and C1 Nano cores, along with an introduction to the ne

In [None]:
# Testing summarization on the first chunk
response = llm_chain.invoke(chunks[0])

In [None]:
response

"Arm skipped the NPU hype, making the CPU great at AI instead. The company's upcoming Lumex platform, its compute platform aimed at mobile. When questioned on the lack of NPU, Arm made it clear that it had no current interest in joining that race in consumer platforms, though that's not to say that the company is anti-NPU. In the race to accelerate AI on everyday devices, the industry has long relied on two workhorses: GPUs, with their raw parallel power, and NPUs (Neural Processing Units)"

In [None]:
# Creating summaries of chunks and combining them
data = {'text': chunks}
dataset = Dataset.from_dict(data)

summaries = [llm_chain.invoke(text) for text in dataset['text']]

final_summary = "\n".join(summaries)

print("\nGenerated Summaries:")
print(final_summary)
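This cell is the "map" half of a map-reduce summarization: each chunk is summarized independently, and the joined summaries are condensed again by the second pipeline below. If the joined summaries are still too long for a single model call, the reduce step can be repeated on batches. A minimal sketch, assuming a hypothetical `map_reduce_summarize` helper, any `summarize` callable that maps a string to a shorter string, and a character budget `max_chars` standing in for the model's token limit:

```python
from typing import Callable


def map_reduce_summarize(chunks: list[str], summarize: Callable[[str], str],
                         max_chars: int, max_rounds: int = 5) -> str:
    """Summarize chunks, then repeatedly summarize batches of summaries until one call fits."""
    summaries = [summarize(c) for c in chunks]  # map step
    for _ in range(max_rounds):
        combined = "\n".join(summaries)
        if len(combined) <= max_chars:
            return summarize(combined)  # final reduce step
        # Group summaries into batches that fit the budget, then summarize each batch
        batches: list[str] = []
        current: list[str] = []
        for s in summaries:
            if current and len("\n".join(current + [s])) > max_chars:
                batches.append("\n".join(current))
                current = []
            current.append(s)
        if current:
            batches.append("\n".join(current))
        summaries = [summarize(b) for b in batches]
    raise RuntimeError("summaries did not shrink below max_chars")
```

In this notebook, `summarize` could be `llm_chain.invoke` (which already accepts a plain string), e.g. `map_reduce_summarize(chunks, llm_chain.invoke, max_chars=3000)`; the budget of 3000 characters is illustrative, not tuned.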


Generated Summaries:
In the race to accelerate AI on everyday devices, the industry has long relied on two workhorses: GPUs, with their raw parallel power, and NPUs (Neural Processing Units), designed for specialized neural network tasks. When questioned on the lack of NPU, Arm made it clear that it had no current interest in joining that race in consumer platforms, though that's not to say that the company is anti-NPU. I attended Arm's briefing in Cambridge to learn about the company's upcoming Lumex platform, its compute platform aimed at mobile.
On-device AI is just easier to develop for CPU execution while still being generally good enough for decent results. SME2 isn't a replacement for an NPU; it's a complement. Arm's Lumex platform is mobile-focused, but it was also talked about in the context of laptop or even desktop computing several times, which would be Arm's Niva platform. The last presentation, focused on software, ended with a slide that said "Ecosystem is ready across 

In [None]:
len(final_summary)

4079
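A character count like this is only a proxy: BART-style models have a fixed input window measured in tokens (`tokenizer.model_max_length`, which should be 1024 for `facebook/bart-large-cnn`), and longer inputs must be truncated or split. A small hypothetical helper, `token_budget_report`, that counts tokens with any Hugging Face-style tokenizer (called on a string, it returns a mapping with an `input_ids` list):

```python
def token_budget_report(text, tokenizer, limit=None):
    """Return (n_tokens, limit, fits) for text against a tokenizer's input limit."""
    # Calling a Hugging Face tokenizer on a string returns a mapping with "input_ids"
    n_tokens = len(tokenizer(text)["input_ids"])
    if limit is None:
        # Some tokenizers report a very large sentinel here when no limit is configured
        limit = tokenizer.model_max_length
    return n_tokens, limit, n_tokens <= limit
```

Usage in this notebook would be `token_budget_report(final_summary, tokenizer)`; if `fits` is `False`, the combined summaries should be split again or passed with `truncation=True`.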

In [None]:
# Second pipeline (same model, longer output limits) to summarize the combined chunk summaries
final_summarizer = pipeline(
    'summarization',
    model=model_name,
    tokenizer=tokenizer,
    dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
    max_new_tokens=300,
    min_new_tokens=200,
    do_sample=True,
    early_stopping=True
)

Device set to use cuda:0


In [None]:
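# Summarizing all chunk summaries; truncation=True guards against inputs
# longer than the model's 1024-token window
final_model_summary = final_summarizer(final_summary, truncation=True)[0]["summary_text"]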

In [None]:
final_model_summary

'Arm\'s Lumex platform is mobile-focused, but it was also talked about in the context of laptop or even desktop computing. SME2 isn\'t a replacement for an NPU; it\'s a complement. Larger AI workloads will still likely run better on the GPU in the long run, but for something that requires a response in mere seconds, then the CPU will be a significantly better option. Kleidi AI is Arm\'s way of being able to "develop once, test once, deploy everywhere," supporting NEON, SVE2, andSME2. Tools like ONNX, PyTorch, OpenCV, and more can all use Kleidi AI (or Kleidi CV in the case of OpenCV) The Kleidi libraries themselves are mostly written in assembly with simple C/C++ calls to utilize the library, and these handle the execution for you. It\'s a great way to make on-device AI a whole lot better, and this is a good way to help make it better on- device.'

**Summary:**

Arm's Lumex platform is mobile-focused, but it was also talked about in the context of laptop or even desktop computing. SME2 isn't a replacement for an NPU; it's a complement.

Larger AI workloads will still likely run better on the GPU in the long run, but for something that requires a response in mere seconds, then the CPU will be a significantly better option.

Kleidi AI is Arm's way of being able to "develop once, test once, deploy everywhere," supporting NEON, SVE2, and SME2. Tools like ONNX, PyTorch, OpenCV, and more can all use Kleidi AI (or Kleidi CV in the case of OpenCV). The Kleidi libraries themselves are mostly written in assembly with simple C/C++ calls to utilize the library, and these handle the execution for you.

It's a great way to make on-device AI a whole lot better, and this is a good way to help make it better on-device.