
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning" style="width: 600px">
</div>

# LLMs with Hugging Face

In this notebook, we'll take a whirlwind tour of some top applications using Large Language Models (LLMs):
* Summarization
* Sentiment analysis
* Translation
* Zero-shot classification
* Few-shot learning

We will see how existing, open-source (and proprietary) models can be used out-of-the-box for many applications.  For this, we will use [Hugging Face models](https://huggingface.co/models) and some simple prompt engineering.

We will then look at Hugging Face APIs in more detail to understand how to configure LLM pipelines.

### ![Dolly](https://files.training.databricks.com/images/llm/dolly_small.png) Learning Objectives
1. Use a variety of existing models for a variety of common applications.
1. Understand basic prompt engineering.
1. Understand search vs. sampling for LLM inference.
1. Get familiar with the main Hugging Face abstractions: datasets, pipelines, tokenizers, and models.

## Classroom Setup

Libraries:
* [sacremoses](https://github.com/alvations/sacremoses) is for the translation model `Helsinki-NLP/opus-mt-en-es`

In [0]:
%pip install sacremoses==0.0.53

Note: you may need to restart the kernel using dbutils.library.restartPython() to use updated packages.
Collecting sacremoses==0.0.53
  Downloading sacremoses-0.0.53.tar.gz (880 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 880.6/880.6 kB 10.8 MB/s eta 0:00:00
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Building wheels for collected packages: sacremoses
  Building wheel for sacremoses (setup.py): started
  Building wheel for sacremoses (setup.py): finished with status 'done'
  Created wheel for sacremoses: filename=sacremoses-0.0.53-py3-none-any.whl size=895241 sha256=fd538ad8d85a25f418b5dfd7d1f2ad4a4dc2d06f84c5db5098de2d59011fbc40
  Stored in directory: /root/.cache/pip/wheels/00/24/97/a2ea5324f36bc626e1ea0267f33db6aa80d157ee977e9e42fb
Successfully built sacremoses
Installing collected packages: sacremoses
Successfully installed sacremoses-0.0.53

In [0]:
%run ../Includes/Classroom-Setup



Resetting the learning environment:
| enumerating serving endpoints...found 0...(0 seconds)
| No action taken

Skipping install of existing datasets to "dbfs:/mnt/dbacademy-datasets/large-language-models/v01"


Importing lab testing framework.



Using the "default" schema.

Predefined paths variables:
| DA.paths.working_dir: /dbfs/mnt/dbacademy-users/vijaymohire@bhadaleit.onmicrosoft.com/large-language-models
| DA.paths.user_db:     /dbfs/mnt/dbacademy-users/vijaymohire@bhadaleit.onmicrosoft.com/large-language-models/database.db
| DA.paths.datasets:    /dbfs/mnt/dbacademy-datasets/large-language-models/v01

Setup completed (6 seconds)

The models developed or used in this course are for demonstration and learning purposes only.
Models may occasionally output offensive, inaccurate, biased information, or harmful instructions.


## Common LLM applications

The goal of this section is to get your feet wet with several LLM applications and to show how easy it can be to get started with LLMs.

As you go through the examples, note the datasets, models, APIs, and options used.  These simple examples can be starting points when you need to build your own application.

In [0]:
from datasets import load_dataset
from transformers import pipeline

### Summarization

Summarization can take two forms:
* `extractive` (selecting representative excerpts from the text)
* `abstractive` (generating novel text summaries)

Here, we will use a model which does *abstractive* summarization.
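
To make the distinction concrete, here is a toy *extractive* summarizer in plain Python. It is only a sketch for contrast with the abstractive model used in this notebook: the function name `extractive_summary` and the word-frequency scoring are illustrative assumptions, not part of Hugging Face or of this course's code.

```python
import re
from collections import Counter


def extractive_summary(text: str, num_sentences: int = 2) -> str:
    """Toy extractive summarizer: keep the sentences whose words are most frequent."""
    # Split into sentences after ., !, or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Count how often each lowercase word appears in the whole text.
    word_counts = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        words = re.findall(r"[a-z']+", sentence.lower())
        # Use the average word frequency so long sentences are not favored automatically.
        return sum(word_counts[w] for w in words) / max(len(words), 1)

    # Pick the top-scoring sentences, then restore their original order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)


article = (
    "Flooding hit the town on Monday. "
    "The flooding damaged shops in the town centre. "
    "A local cafe owner praised the emergency response."
)
summary = extractive_summary(article, num_sentences=1)
print(summary)
```

Every sentence in the output is copied verbatim from the input. An abstractive model such as T5 instead generates new text, which is why its summaries can rephrase, merge, or (sometimes incorrectly) alter the source wording.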

**Background reading**: The [Hugging Face summarization task page](https://huggingface.co/docs/transformers/tasks/summarization) lists model architectures which support summarization. The [summarization course chapter](https://huggingface.co/course/chapter7/5) provides a detailed walkthrough.

In this section, we will use:
* **Data**: [xsum](https://huggingface.co/datasets/xsum) dataset, which provides a set of BBC articles and summaries.
* **Model**: [t5-small](https://huggingface.co/t5-small) model, which has 60 million parameters (242MB for PyTorch).  T5 is an encoder-decoder model created by Google which supports several tasks such as summarization, translation, Q&A, and text classification.  For more details, see the [Google blog post](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html), [code on GitHub](https://github.com/google-research/text-to-text-transfer-transformer), or the [research paper](https://arxiv.org/pdf/1910.10683.pdf).

In [0]:
xsum_dataset = load_dataset(
    "xsum", version="1.2.0", cache_dir="/dbfs/mnt/dbacademy-datasets/large-language-models/v01"
)  # Note: We specify cache_dir to use predownloaded data.
xsum_dataset  # The printed representation of this object shows the `num_rows` of each dataset split.



Downloading builder script:   0%|          | 0.00/5.76k [00:00<?, ?B/s]

Downloading readme:   0%|          | 0.00/6.24k [00:00<?, ?B/s]

Found cached dataset xsum (/dbfs/mnt/dbacademy-datasets/large-language-models/v01/xsum/default/1.2.0/082863bf4754ee058a5b6f6525d0cb2b18eadb62c7b370b095d1364050a52b71)


  0%|          | 0/3 [00:00<?, ?it/s]

DatasetDict({
    train: Dataset({
        features: ['document', 'summary', 'id'],
        num_rows: 204045
    })
    validation: Dataset({
        features: ['document', 'summary', 'id'],
        num_rows: 11332
    })
    test: Dataset({
        features: ['document', 'summary', 'id'],
        num_rows: 11334
    })
})

This dataset provides 3 columns:
* `document`: the BBC article text
* `summary`: a "ground-truth" summary. Note how subjective this "ground truth" is.  Is this the same summary you would write?  This is a great example of how many LLM applications do not have obvious "right" answers.
* `id`: article ID
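
The subjectivity of the reference summary also matters when scoring generated summaries. The sketch below is a simplified unigram-overlap score in the spirit of ROUGE-1, not the official ROUGE implementation; the function name `unigram_f1` and the two example sentences are made up for illustration. It shows how a reasonable paraphrase can score poorly against a single reference.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Rough ROUGE-1-style F1 over lowercase, whitespace-separated tokens."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


reference = "two tourist buses destroyed by fire in belfast"
paraphrase = "fire wrecks a pair of coaches in northern ireland city"

identical_score = unigram_f1(reference, reference)
paraphrase_score = unigram_f1(paraphrase, reference)
print(identical_score, paraphrase_score)
```

The paraphrase conveys nearly the same facts but shares only two tokens with the reference, so its score is low. Overlap metrics are a useful signal, but they should be read with this limitation in mind.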

In [0]:
xsum_sample = xsum_dataset["train"].select(range(10))
display(xsum_sample.to_pandas())

document,summary,id
"The full cost of damage in Newton Stewart, one of the areas worst affected, is still being assessed. Repair work is ongoing in Hawick and many roads in Peeblesshire remain badly affected by standing water. Trains on the west coast mainline face disruption due to damage at the Lamington Viaduct. Many businesses and householders were affected by flooding in Newton Stewart after the River Cree overflowed into the town. First Minister Nicola Sturgeon visited the area to inspect the damage. The waters breached a retaining wall, flooding many commercial properties on Victoria Street - the main shopping thoroughfare. Jeanette Tate, who owns the Cinnamon Cafe which was badly affected, said she could not fault the multi-agency response once the flood hit. However, she said more preventative work could have been carried out to ensure the retaining wall did not fail. ""It is difficult but I do think there is so much publicity for Dumfries and the Nith - and I totally appreciate that - but it is almost like we're neglected or forgotten,"" she said. ""That may not be true but it is perhaps my perspective over the last few days. ""Why were you not ready to help us a bit more when the warning and the alarm alerts had gone out?"" Meanwhile, a flood alert remains in place across the Borders because of the constant rain. Peebles was badly hit by problems, sparking calls to introduce more defences in the area. Scottish Borders Council has put a list on its website of the roads worst affected and drivers have been urged not to ignore closure signs. The Labour Party's deputy Scottish leader Alex Rowley was in Hawick on Monday to see the situation first hand. He said it was important to get the flood protection plan right but backed calls to speed up the process. ""I was quite taken aback by the amount of damage that has been done,"" he said. 
""Obviously it is heart-breaking for people who have been forced out of their homes and the impact on businesses."" He said it was important that ""immediate steps"" were taken to protect the areas most vulnerable and a clear timetable put in place for flood prevention plans. Have you been affected by flooding in Dumfries and Galloway or the Borders? Tell us about your experience of the situation and how it was handled. Email us on selkirk.news@bbc.co.uk or dumfries@bbc.co.uk.",Clean-up operations are continuing across the Scottish Borders and Dumfries and Galloway after flooding caused by Storm Frank.,35232142
"A fire alarm went off at the Holiday Inn in Hope Street at about 04:20 BST on Saturday and guests were asked to leave the hotel. As they gathered outside they saw the two buses, parked side-by-side in the car park, engulfed by flames. One of the tour groups is from Germany, the other from China and Taiwan. It was their first night in Northern Ireland. The driver of one of the buses said many of the passengers had left personal belongings on board and these had been destroyed. Both groups have organised replacement coaches and will begin their tour of the north coast later than they had planned. Police have appealed for information about the attack. Insp David Gibson said: ""It appears as though the fire started under one of the buses before spreading to the second. ""While the exact cause is still under investigation, it is thought that the fire was started deliberately.""",Two tourist buses have been destroyed by fire in a suspected arson attack in Belfast city centre.,40143035
"Ferrari appeared in a position to challenge until the final laps, when the Mercedes stretched their legs to go half a second clear of the red cars. Sebastian Vettel will start third ahead of team-mate Kimi Raikkonen. The world champion subsequently escaped punishment for reversing in the pit lane, which could have seen him stripped of pole. But stewards only handed Hamilton a reprimand, after governing body the FIA said ""no clear instruction was given on where he should park"". Belgian Stoffel Vandoorne out-qualified McLaren team-mate Jenson Button on his Formula 1 debut. Vandoorne was 12th and Button 14th, complaining of a handling imbalance on his final lap but admitting the newcomer ""did a good job and I didn't"". Mercedes were wary of Ferrari's pace before qualifying after Vettel and Raikkonen finished one-two in final practice, and their concerns appeared to be well founded as the red cars mixed it with the silver through most of qualifying. After the first runs, Rosberg was ahead, with Vettel and Raikkonen splitting him from Hamilton, who made a mistake at the final corner on his first lap. But Hamilton saved his best for last, fastest in every sector of his final attempt, to beat Rosberg by just 0.077secs after the German had out-paced him throughout practice and in the first qualifying session. Vettel rued a mistake at the final corner on his last lap, but the truth is that with the gap at 0.517secs to Hamilton there was nothing he could have done. The gap suggests Mercedes are favourites for the race, even if Ferrari can be expected to push them. Vettel said: ""Last year we were very strong in the race and I think we are in good shape for tomorrow. 
We will try to give them a hard time."" Vandoorne's preparations for his grand prix debut were far from ideal - he only found out he was racing on Thursday when FIA doctors declared Fernando Alonso unfit because of a broken rib sustained in his huge crash at the first race of the season in Australia two weeks ago. The Belgian rookie had to fly overnight from Japan, where he had been testing in the Super Formula car he races there, and arrived in Bahrain only hours before first practice on Friday. He also had a difficult final practice, missing all but the final quarter of the session because of a water leak. Button was quicker in the first qualifying session, but Vandoorne pipped him by 0.064secs when it mattered. The 24-year-old said: ""I knew after yesterday I had quite similar pace to Jenson and I knew if I improved a little bit I could maybe challenge him and even out-qualify him and that is what has happened. ""Jenson is a very good benchmark for me because he is a world champion and he is well known to the team so I am very satisfied with the qualifying."" Button, who was 0.5secs quicker than Vandoorne in the first session, complained of oversteer on his final run in the second: ""Q1 was what I was expecting. Q2 he did a good job and I didn't. Very, very good job. We knew how quick he was."" The controversial new elimination qualifying system was retained for this race despite teams voting at the first race in Australia to go back to the 2015 system. FIA president Jean Todt said earlier on Saturday that he ""felt it necessary to give new qualifying one more chance"", adding: ""We live in a world where there is too much over reaction."" The system worked on the basis of mixing up the grid a little - Force India's Sergio Perez ended up out of position in 18th place after the team miscalculated the timing of his final run, leaving him not enough time to complete it before the elimination clock timed him out. 
But it will come in for more criticism as a result of lack of track action at the end of each session. There were three minutes at the end of the first session with no cars on the circuit, and the end of the second session was a similar damp squib. Only one car - Nico Hulkenberg's Force India - was out on the track with six minutes to go. The two Williams cars did go out in the final three minutes but were already through to Q3 and so nothing was at stake. The teams are meeting with Todt and F1 commercial boss Bernie Ecclestone on Sunday at noon local time to decide on what to do with qualifying for the rest of the season. Todt said he was ""optimistic"" they would be able to reach unanimous agreement on a change. ""We should listen to the people watching on TV,"" Rosberg said. ""If they are still unhappy, which I am sure they will be, we should change it."" Red Bull's Daniel Ricciardo was fifth on the grid, ahead of the Williams cars of Valtteri Bottas and Felipe Massa and Force India's Nico Hulkenberg. Ricciardo's team-mate Daniil Kvyat was eliminated during the second session - way below the team's expectation - and the Renault of Brit Jolyon Palmer only managed 19th fastest. German Mercedes protege Pascal Wehrlein managed an excellent 16th in the Manor car. Bahrain GP qualifying results Bahrain GP coverage details",Lewis Hamilton stormed to pole position at the Bahrain Grand Prix ahead of Mercedes team-mate Nico Rosberg.,35951548
"John Edward Bates, formerly of Spalding, Lincolnshire, but now living in London, faces a total of 22 charges, including two counts of indecency with a child. The 67-year-old is accused of committing the offences between March 1972 and October 1989. Mr Bates denies all the charges. Grace Hale, prosecuting, told the jury that the allegations of sexual abuse were made by made by four male complainants and related to when Mr Bates was a scout leader in South Lincolnshire and Cambridgeshire. ""The defendant says nothing of that sort happened between himself and all these individuals. He says they are all fabricating their accounts and telling lies,"" said Mrs Hale. The prosecutor claimed Mr Bates invited one 15 year old to his home offering him the chance to look at cine films made at scout camps but then showed him pornographic films. She told the jury that the boy was then sexually abused leaving him confused and frightened. Mrs Hale said: ""The complainant's recollection is that on a number of occasions sexual acts would happen with the defendant either in the defendant's car or in his cottage."" She told the jury a second boy was taken by Mr Bates for a weekend in London at the age of 13 or 14 and after visiting pubs he was later sexually abused. Mrs Hale said two boys from the Spalding group had also made complaints of being sexually abused. The jury has been told that Mr Bates was in the RAF before serving as a Lincolnshire Police officer between 1976 and 1983. The trial, which is expected to last two weeks, continues.","A former Lincolnshire Police officer carried out a series of sex attacks on boys, a jury at Lincoln Crown Court was told.",36266422
"Patients and staff were evacuated from Cerahpasa hospital on Wednesday after a man receiving treatment at the clinic threatened to shoot himself and others. Officers were deployed to negotiate with the man, a young police officer. Earlier reports that the armed man had taken several people hostage proved incorrect. The chief consultant of Cerahpasa hospital, Zekayi Kutlubay, who was evacuated from the facility, said that there had been ""no hostage crises"", adding that the man was ""alone in the room"". Dr Kutlubay said that the man had been receiving psychiatric treatment for the past two years. He said that the hospital had previously submitted a report stating that the man should not be permitted to carry a gun. ""His firearm was taken away,"" Dr Kutlubay said, adding that the gun in the officer's possession on Wednesday was not his issued firearm. The incident comes amid tension in Istanbul following several attacks in crowded areas, including the deadly assault on the Reina nightclub on New Year's Eve which left 39 people dead.","An armed man who locked himself into a room at a psychiatric hospital in Istanbul has ended his threat to kill himself, Turkish media report.",38826984
"Simone Favaro got the crucial try with the last move of the game, following earlier touchdowns by Chris Fusaro, Zander Fagerson and Junior Bulumakau. Rynard Landman and Ashton Hewitt got a try in either half for the Dragons. Glasgow showed far superior strength in depth as they took control of a messy match in the second period. Home coach Gregor Townsend gave a debut to powerhouse Fijian-born Wallaby wing Taqele Naiyaravoro, and centre Alex Dunbar returned from long-term injury, while the Dragons gave first starts of the season to wing Aled Brew and hooker Elliot Dee. Glasgow lost hooker Pat McArthur to an early shoulder injury but took advantage of their first pressure when Rory Clegg slotted over a penalty on 12 minutes. It took 24 minutes for a disjointed game to produce a try as Sarel Pretorius sniped from close range and Landman forced his way over for Jason Tovey to convert - although it was the lock's last contribution as he departed with a chest injury shortly afterwards. Glasgow struck back when Fusaro drove over from a rolling maul on 35 minutes for Clegg to convert. But the Dragons levelled at 10-10 before half-time when Naiyaravoro was yellow-carded for an aerial tackle on Brew and Tovey slotted the easy goal. The visitors could not make the most of their one-man advantage after the break as their error count cost them dearly. It was Glasgow's bench experience that showed when Mike Blair's break led to a short-range score from teenage prop Fagerson, converted by Clegg. Debutant Favaro was the second home player to be sin-binned, on 63 minutes, but again the Warriors made light of it as replacement wing Bulumakau, a recruit from the Army, pounced to deftly hack through a bouncing ball for an opportunist try. The Dragons got back within striking range with some excellent combined handling putting Hewitt over unopposed after 72 minutes. 
However, Favaro became sinner-turned-saint as he got on the end of another effective rolling maul to earn his side the extra point with the last move of the game, Clegg converting. Dragons director of rugby Lyn Jones said: ""We're disappointed to have lost but our performance was a lot better [than against Leinster] and the game could have gone either way. ""Unfortunately too many errors behind the scrum cost us a great deal, though from where we were a fortnight ago in Dublin our workrate and desire was excellent. ""It was simply error count from individuals behind the scrum that cost us field position, it's not rocket science - they were correct in how they played and we had a few errors, that was the difference."" Glasgow Warriors: Rory Hughes, Taqele Naiyaravoro, Alex Dunbar, Fraser Lyle, Lee Jones, Rory Clegg, Grayson Hart; Alex Allan, Pat MacArthur, Zander Fagerson, Rob Harley (capt), Scott Cummings, Hugh Blake, Chris Fusaro, Adam Ashe. Replacements: Fergus Scott, Jerry Yanuyanutawa, Mike Cusack, Greg Peterson, Simone Favaro, Mike Blair, Gregor Hunter, Junior Bulumakau. Dragons: Carl Meyer, Ashton Hewitt, Ross Wardle, Adam Warren, Aled Brew, Jason Tovey, Sarel Pretorius; Boris Stankovich, Elliot Dee, Brok Harris, Nick Crosswell, Rynard Landman (capt), Lewis Evans, Nic Cudd, Ed Jackson. Replacements: Rhys Buckley, Phil Price, Shaun Knight, Matthew Screech, Ollie Griffiths, Luc Jones, Charlie Davies, Nick Scott.",Defending Pro12 champions Glasgow Warriors bagged a late bonus-point victory over the Dragons despite a host of absentees and two yellow cards.,34540833
"Veronica Vanessa Chango-Alverez, 31, was killed and another man injured when an Audi A3 struck them in Streatham High Road at 05:30 GMT on Saturday. Ten minutes before the crash the car was in London Road, Croydon, when a Volkswagen Passat collided with a tree. Police want to trace Nathan Davis, 27, who they say has links to the Audi. The car was abandoned at the scene. Ms Chango-Alverez died from multiple injuries, a post-mortem examination found. No arrests have been made as yet, police said. Ms Chango-Alverez was staying at her mother's home in Streatham High Road. She was born in Ecuador and had lived in London for 13 years, BBC London reporter Gareth Furby said. At the time of the crash, she was on her way to work in a hotel. The remains of the bus stop, which was extensively damaged in the crash, have been removed. Flowers have been left at the site in tribute to the victim. A statement from her brother Kevin Raul Chango-Alverez said: ""My family has had its heart torn out, at this Christmas time, we will never be the same again. ""On Friday night we were together as a family with Veronica meeting her newly born nephew and preparing for Christmas. ""I last saw her alive as she left to go to work on Saturday morning, but moments later I was holding her hand as she passed away in the street."" Describing the crash as ""horrific"" Det Insp Gordon Wallace, said: ""The family are devastated. The memory of this senseless death will be with them each time they leave their home. ""The driver fled the scene abandoning the grey Audi, which was extensively damaged. ""We are looking to speak to Mr Nathan Davis in relation to this collision."" The 51-year-old man injured at the bus stop remains in a critical condition in hospital while the condition of the 29-year-old driver of the Volkswagen is now stable.",A man with links to a car that was involved in a fatal bus stop crash in south London is being sought by police.,20836172
"Belgian cyclist Demoitie died after a collision with a motorbike during Belgium's Gent-Wevelgem race. The 25-year-old was hit by the motorbike after several riders came down in a crash as the race passed through northern France. ""The main issues come when cars or motorbikes have to pass the peloton and pass riders,"" Team Sky's Rowe said. ""That is the fundamental issue we're looking into. ""There's a lot of motorbikes in and around the race whether it be cameras for TV, photographers or police motorbikes. ""In total there's around 50 motorbikes that work on each race. ""We've got a riders union and we're coming together to think of a few ideas, whether we cap a speed limit on how fast they can overtake us. ""Say we put a 10 kilometres per hour limit on it, if we're going 50kph they're only allowed to pass us 60kph or something like that."" Demoitie, who was riding for the Wanty-Gobert team, was taken to hospital in Lille but died later. The sport's governing body, the UCI, said it would co-operate with all relevant authorities in an investigation into the incident. The Professional Cyclists' Association (CPA) issued a statement asking what would be done to improve safety. Despite Demoitie's death, attitudes to road racing will stay the same says Rowe, who has been competing in Three Days of De Panne race in Belgium. ""As soon as that element of fear slips into your mind and you start thinking of things that could happen, that's when you're doomed to fail,"" he told BBC Wales Sport. ""If you start thinking about crashes and the consequences and what could potentially happen then you're never going to be at the front of the peloton and you're never going to win any races."" In a separate incident, another Belgian cyclist, Daan Myngheer, 22, died in hospital after suffering a heart attack during the first stage of the Criterium International in Corsica.",Welsh cyclist Luke Rowe says changes to the sport must be made following the death of Antoine Demoitie.,35932467
"Gundogan, 26, told BBC Sport he ""can see the finishing line"" after tearing cruciate knee ligaments in December, but will not rush his return. The German missed the 2014 World Cup following back surgery that kept him out for a year, and sat out Euro 2016 because of a dislocated kneecap. He said: ""It is heavy mentally to accept that."" Gundogan will not be fit for the start of the Premier League season at Brighton on 12 August but said his recovery time is now being measured in ""weeks"" rather than months. He told BBC Sport: ""It is really hard always to fall and fight your way back. You feel good and feel ready, then you get the next kick. ""The worst part is behind me now. I want to feel ready when I am fully back. I want to feel safe and confident. I don't mind if it is two weeks or six."" Gundogan made 15 appearances and scored five goals in his debut season for City following his £20m move from Borussia Dortmund. He is eager to get on the field again and was impressed at the club's 4-1 win over Real Madrid in a pre-season game in Los Angeles on Wednesday. Manager Pep Guardiola has made five new signings already this summer and continues to have an interest in Arsenal forward Alexis Sanchez and Monaco's Kylian Mbappe. Gundogan said: ""Optimism for the season is big. It is huge, definitely. ""We felt that last year as well but it was a completely new experience for all of us. We know the Premier League a bit more now and can't wait for the season to start."" City complete their three-match tour of the United States against Tottenham in Nashville on Saturday. Chelsea manager Antonio Conte said earlier this week he did not feel Tottenham were judged by the same standards as his own side, City and Manchester United. Spurs have had the advantage in their recent meetings with City, winning three and drawing one of their last four Premier League games. And Gundogan thinks they are a major threat. He said: ""Tottenham are a great team. 
They have the style of football. They have young English players. Our experience last season shows it is really tough to beat them. ""They are really uncomfortable to play against. ""I am pretty sure, even if they will not say it loud, the people who know the Premier League know Tottenham are definitely a competitor for the title.""",Manchester City midfielder Ilkay Gundogan says it has been mentally tough to overcome a third major injury.,40758845
"The crash happened about 07:20 GMT at the junction of the A127 and Progress Road in Leigh-on-Sea, Essex. The man, who police said is aged in his 20s, was treated at the scene for a head injury and suspected multiple fractures, the ambulance service said. He was airlifted to the Royal London Hospital for further treatment. The Southend-bound carriageway of the A127 was closed for about six hours while police conducted their initial inquiries. A spokeswoman for Essex Police said it was not possible comment to further as this time as the ""investigation is now being conducted by the IPCC"".","A jogger has been hit by an unmarked police car responding to an emergency call, leaving him with ""serious life-changing injuries"".",30358490


We next use the Hugging Face `pipeline` tool to load a pre-trained model.  In this LLM pipeline constructor, we specify:
* `task`: This first argument specifies the primary task.  See [Hugging Face tasks](https://huggingface.co/tasks) for more information.
* `model`: This is the name of the pre-trained model from the [Hugging Face Hub](https://huggingface.co/models).
* `min_length`, `max_length`: We want our generated summaries to be between these two token lengths.
* `truncation`: Some input articles may be too long for the LLM to process.  Most LLMs have fixed limits on the length of input sequences.  This option tells the pipeline to truncate the input if needed.
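
The length-related options can be pictured with a toy decoding loop. This is a conceptual sketch only, not how `transformers` implements generation: the names `truncate`, `generate`, `propose`, and the `EOS` marker are hypothetical, and real pipelines count model tokens (including special tokens), not words.

```python
from typing import Callable, List

EOS = "</s>"  # hypothetical end-of-sequence marker for this toy example


def truncate(tokens: List[str], max_input_tokens: int) -> List[str]:
    """Keep only the first max_input_tokens tokens, roughly what truncation does."""
    return tokens[:max_input_tokens]


def generate(
    propose: Callable[[List[str]], List[str]], min_length: int, max_length: int
) -> List[str]:
    """Toy greedy decoding that honors min_length and max_length.

    propose(output_so_far) returns candidate next tokens, best first.
    """
    output: List[str] = []
    while len(output) < max_length:  # never exceed max_length
        candidates = propose(output)
        if len(output) < min_length:
            # Too short to stop: ignore the end-of-sequence token for now.
            candidates = [c for c in candidates if c != EOS]
        if not candidates or candidates[0] == EOS:
            break
        output.append(candidates[0])
    return output


def wants_to_stop(output_so_far: List[str]) -> List[str]:
    """Toy model that always prefers to stop immediately."""
    return [EOS, "word"]


def never_stops(output_so_far: List[str]) -> List[str]:
    """Toy model that never proposes the end-of-sequence token."""
    return ["word"]


short_output = generate(wants_to_stop, min_length=3, max_length=5)
long_output = generate(never_stops, min_length=3, max_length=5)
clipped_input = truncate("a very long input article".split(), max_input_tokens=3)
print(len(short_output), len(long_output), clipped_input)
```

The second case is worth noting: when generation is stopped by `max_length` rather than by the model choosing to end, the output can be cut off mid-sentence.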

In [0]:
summarizer = pipeline(
    task="summarization",
    model="t5-small",
    min_length=20,
    max_length=40,
    truncation=True,
    model_kwargs={"cache_dir": "/dbfs/mnt/dbacademy-datasets/large-language-models/v01"},
)  # Note: We specify cache_dir to use predownloaded models.

In [0]:
# Apply to 1 article
summarizer(xsum_sample["document"][0])



[{'summary_text': 'the full cost of damage in Newton Stewart is still being assessed . many roads in peeblesshire remain badly affected by standing water . a flood alert remains in place across the'}]

In [0]:
# Apply to a batch of articles
results = summarizer(xsum_sample["document"])

In [0]:
# Display the generated summary side-by-side with the reference summary and original document.
# We use Pandas to join the inputs and outputs together in a nice format.
import pandas as pd

display(
    pd.DataFrame.from_dict(results)
    .rename({"summary_text": "generated_summary"}, axis=1)
    .join(pd.DataFrame.from_dict(xsum_sample))[
        ["generated_summary", "summary", "document"]
    ]
)



generated_summary,summary,document
the full cost of damage in Newton Stewart is still being assessed . many roads in peeblesshire remain badly affected by standing water . a flood alert remains in place across the,Clean-up operations are continuing across the Scottish Borders and Dumfries and Galloway after flooding caused by Storm Frank.,"The full cost of damage in Newton Stewart, one of the areas worst affected, is still being assessed. Repair work is ongoing in Hawick and many roads in Peeblesshire remain badly affected by standing water. Trains on the west coast mainline face disruption due to damage at the Lamington Viaduct. Many businesses and householders were affected by flooding in Newton Stewart after the River Cree overflowed into the town. First Minister Nicola Sturgeon visited the area to inspect the damage. The waters breached a retaining wall, flooding many commercial properties on Victoria Street - the main shopping thoroughfare. Jeanette Tate, who owns the Cinnamon Cafe which was badly affected, said she could not fault the multi-agency response once the flood hit. However, she said more preventative work could have been carried out to ensure the retaining wall did not fail. ""It is difficult but I do think there is so much publicity for Dumfries and the Nith - and I totally appreciate that - but it is almost like we're neglected or forgotten,"" she said. ""That may not be true but it is perhaps my perspective over the last few days. ""Why were you not ready to help us a bit more when the warning and the alarm alerts had gone out?"" Meanwhile, a flood alert remains in place across the Borders because of the constant rain. Peebles was badly hit by problems, sparking calls to introduce more defences in the area. Scottish Borders Council has put a list on its website of the roads worst affected and drivers have been urged not to ignore closure signs. The Labour Party's deputy Scottish leader Alex Rowley was in Hawick on Monday to see the situation first hand. 
He said it was important to get the flood protection plan right but backed calls to speed up the process. ""I was quite taken aback by the amount of damage that has been done,"" he said. ""Obviously it is heart-breaking for people who have been forced out of their homes and the impact on businesses."" He said it was important that ""immediate steps"" were taken to protect the areas most vulnerable and a clear timetable put in place for flood prevention plans. Have you been affected by flooding in Dumfries and Galloway or the Borders? Tell us about your experience of the situation and how it was handled. Email us on selkirk.news@bbc.co.uk or dumfries@bbc.co.uk."
a fire alarm went off at the Holiday Inn in Hope Street on Saturday . guests were asked to leave the hotel . the two buses were parked side-by-side in,Two tourist buses have been destroyed by fire in a suspected arson attack in Belfast city centre.,"A fire alarm went off at the Holiday Inn in Hope Street at about 04:20 BST on Saturday and guests were asked to leave the hotel. As they gathered outside they saw the two buses, parked side-by-side in the car park, engulfed by flames. One of the tour groups is from Germany, the other from China and Taiwan. It was their first night in Northern Ireland. The driver of one of the buses said many of the passengers had left personal belongings on board and these had been destroyed. Both groups have organised replacement coaches and will begin their tour of the north coast later than they had planned. Police have appealed for information about the attack. Insp David Gibson said: ""It appears as though the fire started under one of the buses before spreading to the second. ""While the exact cause is still under investigation, it is thought that the fire was started deliberately."""
"Sebastian Vettel will start third ahead of team-mate Kimi Raikkonen . stewards only handed Hamilton a reprimand after governing body said ""n",Lewis Hamilton stormed to pole position at the Bahrain Grand Prix ahead of Mercedes team-mate Nico Rosberg.,"Ferrari appeared in a position to challenge until the final laps, when the Mercedes stretched their legs to go half a second clear of the red cars. Sebastian Vettel will start third ahead of team-mate Kimi Raikkonen. The world champion subsequently escaped punishment for reversing in the pit lane, which could have seen him stripped of pole. But stewards only handed Hamilton a reprimand, after governing body the FIA said ""no clear instruction was given on where he should park"". Belgian Stoffel Vandoorne out-qualified McLaren team-mate Jenson Button on his Formula 1 debut. Vandoorne was 12th and Button 14th, complaining of a handling imbalance on his final lap but admitting the newcomer ""did a good job and I didn't"". Mercedes were wary of Ferrari's pace before qualifying after Vettel and Raikkonen finished one-two in final practice, and their concerns appeared to be well founded as the red cars mixed it with the silver through most of qualifying. After the first runs, Rosberg was ahead, with Vettel and Raikkonen splitting him from Hamilton, who made a mistake at the final corner on his first lap. But Hamilton saved his best for last, fastest in every sector of his final attempt, to beat Rosberg by just 0.077secs after the German had out-paced him throughout practice and in the first qualifying session. Vettel rued a mistake at the final corner on his last lap, but the truth is that with the gap at 0.517secs to Hamilton there was nothing he could have done. The gap suggests Mercedes are favourites for the race, even if Ferrari can be expected to push them. Vettel said: ""Last year we were very strong in the race and I think we are in good shape for tomorrow. 
We will try to give them a hard time."" Vandoorne's preparations for his grand prix debut were far from ideal - he only found out he was racing on Thursday when FIA doctors declared Fernando Alonso unfit because of a broken rib sustained in his huge crash at the first race of the season in Australia two weeks ago. The Belgian rookie had to fly overnight from Japan, where he had been testing in the Super Formula car he races there, and arrived in Bahrain only hours before first practice on Friday. He also had a difficult final practice, missing all but the final quarter of the session because of a water leak. Button was quicker in the first qualifying session, but Vandoorne pipped him by 0.064secs when it mattered. The 24-year-old said: ""I knew after yesterday I had quite similar pace to Jenson and I knew if I improved a little bit I could maybe challenge him and even out-qualify him and that is what has happened. ""Jenson is a very good benchmark for me because he is a world champion and he is well known to the team so I am very satisfied with the qualifying."" Button, who was 0.5secs quicker than Vandoorne in the first session, complained of oversteer on his final run in the second: ""Q1 was what I was expecting. Q2 he did a good job and I didn't. Very, very good job. We knew how quick he was."" The controversial new elimination qualifying system was retained for this race despite teams voting at the first race in Australia to go back to the 2015 system. FIA president Jean Todt said earlier on Saturday that he ""felt it necessary to give new qualifying one more chance"", adding: ""We live in a world where there is too much over reaction."" The system worked on the basis of mixing up the grid a little - Force India's Sergio Perez ended up out of position in 18th place after the team miscalculated the timing of his final run, leaving him not enough time to complete it before the elimination clock timed him out. 
But it will come in for more criticism as a result of lack of track action at the end of each session. There were three minutes at the end of the first session with no cars on the circuit, and the end of the second session was a similar damp squib. Only one car - Nico Hulkenberg's Force India - was out on the track with six minutes to go. The two Williams cars did go out in the final three minutes but were already through to Q3 and so nothing was at stake. The teams are meeting with Todt and F1 commercial boss Bernie Ecclestone on Sunday at noon local time to decide on what to do with qualifying for the rest of the season. Todt said he was ""optimistic"" they would be able to reach unanimous agreement on a change. ""We should listen to the people watching on TV,"" Rosberg said. ""If they are still unhappy, which I am sure they will be, we should change it."" Red Bull's Daniel Ricciardo was fifth on the grid, ahead of the Williams cars of Valtteri Bottas and Felipe Massa and Force India's Nico Hulkenberg. Ricciardo's team-mate Daniil Kvyat was eliminated during the second session - way below the team's expectation - and the Renault of Brit Jolyon Palmer only managed 19th fastest. German Mercedes protege Pascal Wehrlein managed an excellent 16th in the Manor car. Bahrain GP qualifying results Bahrain GP coverage details"
"the 67-year-old is accused of committing the offences between March 1972 and October 1989 . he denies all the charges, including two counts of indecency","A former Lincolnshire Police officer carried out a series of sex attacks on boys, a jury at Lincoln Crown Court was told.","John Edward Bates, formerly of Spalding, Lincolnshire, but now living in London, faces a total of 22 charges, including two counts of indecency with a child. The 67-year-old is accused of committing the offences between March 1972 and October 1989. Mr Bates denies all the charges. Grace Hale, prosecuting, told the jury that the allegations of sexual abuse were made by made by four male complainants and related to when Mr Bates was a scout leader in South Lincolnshire and Cambridgeshire. ""The defendant says nothing of that sort happened between himself and all these individuals. He says they are all fabricating their accounts and telling lies,"" said Mrs Hale. The prosecutor claimed Mr Bates invited one 15 year old to his home offering him the chance to look at cine films made at scout camps but then showed him pornographic films. She told the jury that the boy was then sexually abused leaving him confused and frightened. Mrs Hale said: ""The complainant's recollection is that on a number of occasions sexual acts would happen with the defendant either in the defendant's car or in his cottage."" She told the jury a second boy was taken by Mr Bates for a weekend in London at the age of 13 or 14 and after visiting pubs he was later sexually abused. Mrs Hale said two boys from the Spalding group had also made complaints of being sexually abused. The jury has been told that Mr Bates was in the RAF before serving as a Lincolnshire Police officer between 1976 and 1983. The trial, which is expected to last two weeks, continues."
a man receiving psychiatric treatment at the clinic threatened to shoot himself and others . the incident comes amid tension in Istanbul following several attacks on the reina nightclub .,"An armed man who locked himself into a room at a psychiatric hospital in Istanbul has ended his threat to kill himself, Turkish media report.","Patients and staff were evacuated from Cerahpasa hospital on Wednesday after a man receiving treatment at the clinic threatened to shoot himself and others. Officers were deployed to negotiate with the man, a young police officer. Earlier reports that the armed man had taken several people hostage proved incorrect. The chief consultant of Cerahpasa hospital, Zekayi Kutlubay, who was evacuated from the facility, said that there had been ""no hostage crises"", adding that the man was ""alone in the room"". Dr Kutlubay said that the man had been receiving psychiatric treatment for the past two years. He said that the hospital had previously submitted a report stating that the man should not be permitted to carry a gun. ""His firearm was taken away,"" Dr Kutlubay said, adding that the gun in the officer's possession on Wednesday was not his issued firearm. The incident comes amid tension in Istanbul following several attacks in crowded areas, including the deadly assault on the Reina nightclub on New Year's Eve which left 39 people dead."
Gregor Townsend gave a debut to powerhouse wing Taqele Naiyaravoro . the dragons gave first starts of the season to wing a,Defending Pro12 champions Glasgow Warriors bagged a late bonus-point victory over the Dragons despite a host of absentees and two yellow cards.,"Simone Favaro got the crucial try with the last move of the game, following earlier touchdowns by Chris Fusaro, Zander Fagerson and Junior Bulumakau. Rynard Landman and Ashton Hewitt got a try in either half for the Dragons. Glasgow showed far superior strength in depth as they took control of a messy match in the second period. Home coach Gregor Townsend gave a debut to powerhouse Fijian-born Wallaby wing Taqele Naiyaravoro, and centre Alex Dunbar returned from long-term injury, while the Dragons gave first starts of the season to wing Aled Brew and hooker Elliot Dee. Glasgow lost hooker Pat McArthur to an early shoulder injury but took advantage of their first pressure when Rory Clegg slotted over a penalty on 12 minutes. It took 24 minutes for a disjointed game to produce a try as Sarel Pretorius sniped from close range and Landman forced his way over for Jason Tovey to convert - although it was the lock's last contribution as he departed with a chest injury shortly afterwards. Glasgow struck back when Fusaro drove over from a rolling maul on 35 minutes for Clegg to convert. But the Dragons levelled at 10-10 before half-time when Naiyaravoro was yellow-carded for an aerial tackle on Brew and Tovey slotted the easy goal. The visitors could not make the most of their one-man advantage after the break as their error count cost them dearly. It was Glasgow's bench experience that showed when Mike Blair's break led to a short-range score from teenage prop Fagerson, converted by Clegg. 
Debutant Favaro was the second home player to be sin-binned, on 63 minutes, but again the Warriors made light of it as replacement wing Bulumakau, a recruit from the Army, pounced to deftly hack through a bouncing ball for an opportunist try. The Dragons got back within striking range with some excellent combined handling putting Hewitt over unopposed after 72 minutes. However, Favaro became sinner-turned-saint as he got on the end of another effective rolling maul to earn his side the extra point with the last move of the game, Clegg converting. Dragons director of rugby Lyn Jones said: ""We're disappointed to have lost but our performance was a lot better [than against Leinster] and the game could have gone either way. ""Unfortunately too many errors behind the scrum cost us a great deal, though from where we were a fortnight ago in Dublin our workrate and desire was excellent. ""It was simply error count from individuals behind the scrum that cost us field position, it's not rocket science - they were correct in how they played and we had a few errors, that was the difference."" Glasgow Warriors: Rory Hughes, Taqele Naiyaravoro, Alex Dunbar, Fraser Lyle, Lee Jones, Rory Clegg, Grayson Hart; Alex Allan, Pat MacArthur, Zander Fagerson, Rob Harley (capt), Scott Cummings, Hugh Blake, Chris Fusaro, Adam Ashe. Replacements: Fergus Scott, Jerry Yanuyanutawa, Mike Cusack, Greg Peterson, Simone Favaro, Mike Blair, Gregor Hunter, Junior Bulumakau. Dragons: Carl Meyer, Ashton Hewitt, Ross Wardle, Adam Warren, Aled Brew, Jason Tovey, Sarel Pretorius; Boris Stankovich, Elliot Dee, Brok Harris, Nick Crosswell, Rynard Landman (capt), Lewis Evans, Nic Cudd, Ed Jackson. Replacements: Rhys Buckley, Phil Price, Shaun Knight, Matthew Screech, Ollie Griffiths, Luc Jones, Charlie Davies, Nick Scott."
"Veronica Vanessa Chango-Alverez, 31, was killed and another man injured in the crash . police want to trace Nathan Davis, 27, who has links to the Audi .",A man with links to a car that was involved in a fatal bus stop crash in south London is being sought by police.,"Veronica Vanessa Chango-Alverez, 31, was killed and another man injured when an Audi A3 struck them in Streatham High Road at 05:30 GMT on Saturday. Ten minutes before the crash the car was in London Road, Croydon, when a Volkswagen Passat collided with a tree. Police want to trace Nathan Davis, 27, who they say has links to the Audi. The car was abandoned at the scene. Ms Chango-Alverez died from multiple injuries, a post-mortem examination found. No arrests have been made as yet, police said. Ms Chango-Alverez was staying at her mother's home in Streatham High Road. She was born in Ecuador and had lived in London for 13 years, BBC London reporter Gareth Furby said. At the time of the crash, she was on her way to work in a hotel. The remains of the bus stop, which was extensively damaged in the crash, have been removed. Flowers have been left at the site in tribute to the victim. A statement from her brother Kevin Raul Chango-Alverez said: ""My family has had its heart torn out, at this Christmas time, we will never be the same again. ""On Friday night we were together as a family with Veronica meeting her newly born nephew and preparing for Christmas. ""I last saw her alive as she left to go to work on Saturday morning, but moments later I was holding her hand as she passed away in the street."" Describing the crash as ""horrific"" Det Insp Gordon Wallace, said: ""The family are devastated. The memory of this senseless death will be with them each time they leave their home. ""The driver fled the scene abandoning the grey Audi, which was extensively damaged. 
""We are looking to speak to Mr Nathan Davis in relation to this collision."" The 51-year-old man injured at the bus stop remains in a critical condition in hospital while the condition of the 29-year-old driver of the Volkswagen is now stable."
the 25-year-old was hit by a motorbike during the Gent-Wevelgem race . he was riding for the Wanty-Gobert team and was taken,Welsh cyclist Luke Rowe says changes to the sport must be made following the death of Antoine Demoitie.,"Belgian cyclist Demoitie died after a collision with a motorbike during Belgium's Gent-Wevelgem race. The 25-year-old was hit by the motorbike after several riders came down in a crash as the race passed through northern France. ""The main issues come when cars or motorbikes have to pass the peloton and pass riders,"" Team Sky's Rowe said. ""That is the fundamental issue we're looking into. ""There's a lot of motorbikes in and around the race whether it be cameras for TV, photographers or police motorbikes. ""In total there's around 50 motorbikes that work on each race. ""We've got a riders union and we're coming together to think of a few ideas, whether we cap a speed limit on how fast they can overtake us. ""Say we put a 10 kilometres per hour limit on it, if we're going 50kph they're only allowed to pass us 60kph or something like that."" Demoitie, who was riding for the Wanty-Gobert team, was taken to hospital in Lille but died later. The sport's governing body, the UCI, said it would co-operate with all relevant authorities in an investigation into the incident. The Professional Cyclists' Association (CPA) issued a statement asking what would be done to improve safety. Despite Demoitie's death, attitudes to road racing will stay the same says Rowe, who has been competing in Three Days of De Panne race in Belgium. ""As soon as that element of fear slips into your mind and you start thinking of things that could happen, that's when you're doomed to fail,"" he told BBC Wales Sport. 
""If you start thinking about crashes and the consequences and what could potentially happen then you're never going to be at the front of the peloton and you're never going to win any races."" In a separate incident, another Belgian cyclist, Daan Myngheer, 22, died in hospital after suffering a heart attack during the first stage of the Criterium International in Corsica."
"gundogan will not be fit for the start of the premier league season at Brighton on 12 august . the 26-year-old says his recovery time is now being measured in ""week",Manchester City midfielder Ilkay Gundogan says it has been mentally tough to overcome a third major injury.,"Gundogan, 26, told BBC Sport he ""can see the finishing line"" after tearing cruciate knee ligaments in December, but will not rush his return. The German missed the 2014 World Cup following back surgery that kept him out for a year, and sat out Euro 2016 because of a dislocated kneecap. He said: ""It is heavy mentally to accept that."" Gundogan will not be fit for the start of the Premier League season at Brighton on 12 August but said his recovery time is now being measured in ""weeks"" rather than months. He told BBC Sport: ""It is really hard always to fall and fight your way back. You feel good and feel ready, then you get the next kick. ""The worst part is behind me now. I want to feel ready when I am fully back. I want to feel safe and confident. I don't mind if it is two weeks or six."" Gundogan made 15 appearances and scored five goals in his debut season for City following his £20m move from Borussia Dortmund. He is eager to get on the field again and was impressed at the club's 4-1 win over Real Madrid in a pre-season game in Los Angeles on Wednesday. Manager Pep Guardiola has made five new signings already this summer and continues to have an interest in Arsenal forward Alexis Sanchez and Monaco's Kylian Mbappe. Gundogan said: ""Optimism for the season is big. It is huge, definitely. ""We felt that last year as well but it was a completely new experience for all of us. We know the Premier League a bit more now and can't wait for the season to start."" City complete their three-match tour of the United States against Tottenham in Nashville on Saturday. 
Chelsea manager Antonio Conte said earlier this week he did not feel Tottenham were judged by the same standards as his own side, City and Manchester United. Spurs have had the advantage in their recent meetings with City, winning three and drawing one of their last four Premier League games. And Gundogan thinks they are a major threat. He said: ""Tottenham are a great team. They have the style of football. They have young English players. Our experience last season shows it is really tough to beat them. ""They are really uncomfortable to play against. ""I am pretty sure, even if they will not say it loud, the people who know the Premier League know Tottenham are definitely a competitor for the title."""
"the crash happened about 07:20 GMT at the junction of the A127 and Progress Road in leigh-on-Sea, Essex . the man, aged in his 20s","A jogger has been hit by an unmarked police car responding to an emergency call, leaving him with ""serious life-changing injuries"".","The crash happened about 07:20 GMT at the junction of the A127 and Progress Road in Leigh-on-Sea, Essex. The man, who police said is aged in his 20s, was treated at the scene for a head injury and suspected multiple fractures, the ambulance service said. He was airlifted to the Royal London Hospital for further treatment. The Southend-bound carriageway of the A127 was closed for about six hours while police conducted their initial inquiries. A spokeswoman for Essex Police said it was not possible comment to further as this time as the ""investigation is now being conducted by the IPCC""."


### Sentiment analysis

Sentiment analysis is a text classification task that estimates whether a piece of text is positive, negative, or carries some other "sentiment" label.  The precise set of sentiment labels can vary across applications.

**Background reading**: See the Hugging Face [task page on text classification](https://huggingface.co/tasks/text-classification) or [Wikipedia on sentiment analysis](https://en.wikipedia.org/wiki/Sentiment_analysis).

In this section, we will use:
* **Data**: [poem sentiment](https://huggingface.co/datasets/poem_sentiment) dataset, which provides lines from poems tagged with sentiments `negative` (0), `positive` (1), `no_impact` (2), or `mixed` (3).
* **Model**: [fine-tuned version of BERT](https://huggingface.co/nickwong64/bert-base-uncased-poems-sentiment).  BERT, or Bidirectional Encoder Representations from Transformers, is an encoder-only model from Google usable for 11+ tasks such as sentiment analysis and entity recognition.  For more details, see this [Hugging Face blog post](https://huggingface.co/blog/bert-101) or the [Wikipedia page](https://en.wikipedia.org/wiki/BERT_&#40;language_model&#41;).

In [0]:
poem_dataset = load_dataset(
    "poem_sentiment", version="1.0.0", cache_dir="/dbfs/mnt/dbacademy-datasets/large-language-models/v01"
)
poem_sample = poem_dataset["train"].select(range(10))
display(poem_sample.to_pandas())


id,verse_text,label
0,with pale blue berries. in these peaceful shades--,1
1,"it flows so long as falls the rain,",2
2,"and that is why, the lonesome day,",0
3,"when i peruse the conquered fame of heroes, and the victories of mighty generals, i do not envy the generals,",3
4,of inward strife for truth and liberty.,3
5,the red sword sealed their vows!,3
6,and very venus of a pipe.,2
7,"who the man, who, called a brother.",2
8,"and so on. then a worthless gaud or two,",0
9,to hide the orb of truth--and every throne,2


We load the pipeline using the task `text-classification` since we want to classify text with a fixed set of labels.

In [0]:
sentiment_classifier = pipeline(
    task="text-classification",
    model="nickwong64/bert-base-uncased-poems-sentiment",
    model_kwargs={"cache_dir": "/dbfs/mnt/dbacademy-datasets/large-language-models/v01"},
)

In [0]:
results = sentiment_classifier(poem_sample["verse_text"])

In [0]:
# Display the predicted sentiment side-by-side with the ground-truth label and original text.
# The score indicates the model's confidence in its prediction.

# Join predictions with ground-truth data
joined_data = (
    pd.DataFrame.from_dict(results)
    .rename({"label": "predicted_label"}, axis=1)
    .join(pd.DataFrame.from_dict(poem_sample).rename({"label": "true_label"}, axis=1))
)

# Change label indices to text labels
sentiment_labels = {0: "negative", 1: "positive", 2: "no_impact", 3: "mixed"}
joined_data = joined_data.replace({"true_label": sentiment_labels})

display(joined_data[["predicted_label", "true_label", "score", "verse_text"]])

predicted_label,true_label,score,verse_text
positive,positive,0.9965937733650208,with pale blue berries. in these peaceful shades--
no_impact,no_impact,0.9987409710884094,"it flows so long as falls the rain,"
negative,negative,0.995965838432312,"and that is why, the lonesome day,"
mixed,mixed,0.9687354564666748,"when i peruse the conquered fame of heroes, and the victories of mighty generals, i do not envy the generals,"
mixed,mixed,0.975967526435852,of inward strife for truth and liberty.
mixed,mixed,0.9665797352790833,the red sword sealed their vows!
no_impact,no_impact,0.9986388087272644,and very venus of a pipe.
no_impact,no_impact,0.9986108541488647,"who the man, who, called a brother."
negative,negative,0.9965572357177734,"and so on. then a worthless gaud or two,"
no_impact,no_impact,0.9985186457633972,to hide the orb of truth--and every throne
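
As a quick sanity check, we can measure how often the predicted label matches the ground-truth label. The sketch below is a minimal example that re-creates the two label columns by hand from the table above (rather than re-running the model) and computes the fraction of matches. Note that these rows come from the `train` split, and the model may have seen them during fine-tuning, so this should not be read as a held-out accuracy estimate.

```python
import pandas as pd

# Labels copied by hand from the 10-row results table above.
labels = [
    "positive", "no_impact", "negative", "mixed", "mixed",
    "mixed", "no_impact", "no_impact", "negative", "no_impact",
]
predictions = pd.DataFrame({"predicted_label": labels, "true_label": list(labels)})

# Fraction of rows where the prediction agrees with the ground truth.
accuracy = (predictions["predicted_label"] == predictions["true_label"]).mean()
print(f"Accuracy on the 10-row sample: {accuracy:.0%}")
```

In the notebook itself, the same comparison could be applied directly to `joined_data`, which already holds both columns.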


### Translation

Translation models may be designed for specific pairs of languages, or they may support more than two languages.  We will see both below.

**Background reading**: See the Hugging Face [task page on translation](https://huggingface.co/tasks/translation) or the [Wikipedia page on machine translation](https://en.wikipedia.org/wiki/Machine_translation).

In this section, we will use:
* **Data**: We will use some example hard-coded sentences.  However, there are a variety of [translation datasets](https://huggingface.co/datasets?task_categories=task_categories:translation&sort=downloads) available from Hugging Face.
* **Models**:
   * [Helsinki-NLP/opus-mt-en-es](https://huggingface.co/Helsinki-NLP/opus-mt-en-es) is used for the first example of English ("en") to Spanish ("es") translation.  This model is based on [Marian NMT](https://marian-nmt.github.io/), a neural machine translation framework developed by Microsoft and other researchers.  See the [GitHub page](https://github.com/Helsinki-NLP/Opus-MT) for code and links to related resources.
   * [t5-small](https://huggingface.co/t5-small) is a model with 60 million parameters (242 MB for PyTorch).  T5 is an encoder-decoder model created by Google that supports several tasks such as summarization, translation, Q&A, and text classification.  For more details, see the [Google blog post](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html), [code on GitHub](https://github.com/google-research/text-to-text-transfer-transformer), or the [research paper](https://arxiv.org/pdf/1910.10683.pdf).  For our purposes, it supports translation from English to French, Romanian, and German.

Some models are designed for specific language-to-language translation.  Below, we use an English-to-Spanish model.

In [0]:
en_to_es_translation_pipeline = pipeline(
    task="translation",
    model="Helsinki-NLP/opus-mt-en-es",
    model_kwargs={"cache_dir": "/dbfs/mnt/dbacademy-datasets/large-language-models/v01"},
)

In [0]:
en_to_es_translation_pipeline(
    "Existing, open-source (and proprietary) models can be used out-of-the-box for many applications."
)

[{'translation_text': 'Los modelos existentes, de código abierto (y propietario) se pueden utilizar fuera de la caja para muchas aplicaciones.'}]

Other models are designed to handle multiple languages.  Below, we show this with `t5-small`.  Note that, since it supports multiple languages (and tasks), we give it an explicit instruction to translate from one language to another.

In [0]:
t5_small_pipeline = pipeline(
    task="text2text-generation",
    model="t5-small",
    max_length=50,
    model_kwargs={"cache_dir": "/dbfs/mnt/dbacademy-datasets/large-language-models/v01"},
)

In [0]:
t5_small_pipeline(
    "translate English to French: Existing, open-source (and proprietary) models can be used out-of-the-box for many applications."
)

[{'generated_text': 'Les modèles existants, libres (et propriétaires) peuvent être utilisés hors de la boîte de commande pour de nombreuses applications.'}]

In [0]:
t5_small_pipeline(
    "translate English to Romanian: Existing, open-source (and proprietary) models can be used out-of-the-box for many applications."
)

[{'generated_text': 'Modelele existente, deschise (şi proprietăţi) pot fi utilizate în afara legii pentru multe aplicaţii.'}]
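
Since the only thing that changes between the T5 calls above is the instruction prefix, it can help to build the prompt programmatically. The helper below is a small sketch; the function name `build_t5_translation_prompt` is our own and not part of any library, and it simply reproduces the `translate <source> to <target>: <text>` pattern used in the cells above.

```python
def build_t5_translation_prompt(
    text: str, target_language: str, source_language: str = "English"
) -> str:
    """Prefix `text` with the translation instruction used in the cells above."""
    return f"translate {source_language} to {target_language}: {text}"


sentence = (
    "Existing, open-source (and proprietary) models can be used "
    "out-of-the-box for many applications."
)
prompts = {
    language: build_t5_translation_prompt(sentence, language)
    for language in ["French", "Romanian", "German"]
}
print(prompts["German"])
```

Each resulting string could then be passed to `t5_small_pipeline` in the same way as the hand-written prompts.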

### Zero-shot classification

Zero-shot classification (or zero-shot learning) is the task of classifying a piece of text into one of a few given categories or labels, without having explicitly trained the model to predict those categories beforehand.  The idea appeared in the literature before modern LLMs, but recent advances in LLMs have made zero-shot learning much more flexible and powerful.

**Background reading**: See the Hugging Face [task page on zero-shot classification](https://huggingface.co/tasks/zero-shot-classification) or [Wikipedia on zero-shot learning](https://en.wikipedia.org/wiki/Zero-shot_learning).

In this section, we will use:
* **Data**: a few example articles from the [xsum](https://huggingface.co/datasets/xsum) dataset used in the Summarization section above.  Our goal is to label news articles under a few categories.
* **Model**: [nli-deberta-v3-small](https://huggingface.co/cross-encoder/nli-deberta-v3-small), a fine-tuned version of the DeBERTa model.  The DeBERTa base model was developed by Microsoft and is one of several models derived from BERT; for more details on DeBERTa, see the [Hugging Face doc page](https://huggingface.co/docs/transformers/model_doc/deberta), the [code on GitHub](https://github.com/microsoft/DeBERTa), or the [research paper](https://arxiv.org/abs/2006.03654).
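
To build some intuition for how a natural language inference (NLI) model can act as a zero-shot classifier, the sketch below walks through the general idea: each candidate label is turned into a hypothesis sentence, the model scores how strongly the text entails each hypothesis, and the entailment scores are normalized across labels. The template string matches what we understand to be the pipeline's default, and the logits are made-up numbers for illustration only; no model is actually run here.

```python
import math

candidate_labels = ["politics", "finance", "sports"]

# One hypothesis per label; the article text plays the role of the premise.
hypothesis_template = "This example is {}."
hypotheses = [hypothesis_template.format(label) for label in candidate_labels]

# Hypothetical entailment logits, one per (article, hypothesis) pair.
# In practice these would come from the NLI model.
entailment_logits = [-1.2, -0.8, 3.5]


def softmax(values):
    """Normalize logits into scores that sum to 1."""
    max_value = max(values)
    exps = [math.exp(value - max_value) for value in values]
    total = sum(exps)
    return [value / total for value in exps]


scores = softmax(entailment_logits)
ranked = sorted(zip(candidate_labels, scores), key=lambda pair: pair[1], reverse=True)
for label, score in ranked:
    print(f"{label}: {score:.3f}")
```

This is why the labels do not need to appear in the model's training data: the model only needs to judge whether the text supports a sentence that mentions the label.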

In [0]:
zero_shot_pipeline = pipeline(
    task="zero-shot-classification",
    model="cross-encoder/nli-deberta-v3-small",
    model_kwargs={"cache_dir": "/dbfs/mnt/dbacademy-datasets/large-language-models/v01"},
)


def categorize_article(article: str) -> None:
    """
    This helper function defines the categories (labels) which the model must use to label articles.
    Note that our model was NOT fine-tuned to use these specific labels,
    but it "knows" what the labels mean from its more general training.

    This function then prints out the predicted labels alongside their confidence scores.
    """
    results = zero_shot_pipeline(
        article,
        candidate_labels=[
            "politics",
            "finance",
            "sports",
            "science and technology",
            "pop culture",
            "breaking news",
        ],
    )
    # Drop the echoed input text, then show labels and scores as a table
    del results["sequence"]
    display(pd.DataFrame(results))
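
To see why the helper deletes the `sequence` key before building the table, it helps to look at the shape of a single result. The dictionary below is a hand-written stand-in with made-up scores, assuming the pipeline returns the input text under `sequence` plus parallel `labels` and `scores` lists; it is not real model output.

```python
import pandas as pd

# Hypothetical result for one article; the scores are invented for illustration.
results = {
    "sequence": "Glasgow showed far superior strength in depth as they took control of a messy match in the second period.",
    "labels": ["sports", "breaking news", "pop culture"],
    "scores": [0.90, 0.06, 0.04],
}

# Without this, pandas would repeat the full article text on every row.
del results["sequence"]

table = pd.DataFrame(results)
top_label = table.loc[table["scores"].idxmax(), "labels"]
print(table)
print("Top label:", top_label)
```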



In [0]:
categorize_article(
    """
Simone Favaro got the crucial try with the last move of the game, following earlier touchdowns by Chris Fusaro, Zander Fagerson and Junior Bulumakau.
Rynard Landman and Ashton Hewitt got a try in either half for the Dragons.
Glasgow showed far superior strength in depth as they took control of a messy match in the second period.
Home coach Gregor Townsend gave a debut to powerhouse Fijian-born Wallaby wing Taqele Naiyaravoro, and centre Alex Dunbar returned from long-term injury, while the Dragons gave first starts of the season to wing Aled Brew and hooker Elliot Dee.
Glasgow lost hooker Pat McArthur to an early shoulder injury but took advantage of their first pressure when Rory Clegg slotted over a penalty on 12 minutes.
It took 24 minutes for a disjointed game to produce a try as Sarel Pretorius sniped from close range and Landman forced his way over for Jason Tovey to convert - although it was the lock's last contribution as he departed with a chest injury shortly afterwards.
Glasgow struck back when Fusaro drove over from a rolling maul on 35 minutes for Clegg to convert.
But the Dragons levelled at 10-10 before half-time when Naiyaravoro was yellow-carded for an aerial tackle on Brew and Tovey slotted the easy goal.
The visitors could not make the most of their one-man advantage after the break as their error count cost them dearly.
It was Glasgow's bench experience that showed when Mike Blair's break led to a short-range score from teenage prop Fagerson, converted by Clegg.
Debutant Favaro was the second home player to be sin-binned, on 63 minutes, but again the Warriors made light of it as replacement wing Bulumakau, a recruit from the Army, pounced to deftly hack through a bouncing ball for an opportunist try.
The Dragons got back within striking range with some excellent combined handling putting Hewitt over unopposed after 72 minutes.
However, Favaro became sinner-turned-saint as he got on the end of another effective rolling maul to earn his side the extra point with the last move of the game, Clegg converting.
Dragons director of rugby Lyn Jones said: "We're disappointed to have lost but our performance was a lot better [than against Leinster] and the game could have gone either way.
"Unfortunately too many errors behind the scrum cost us a great deal, though from where we were a fortnight ago in Dublin our workrate and desire was excellent.
"It was simply error count from individuals behind the scrum that cost us field position, it's not rocket science - they were correct in how they played and we had a few errors, that was the difference."
Glasgow Warriors: Rory Hughes, Taqele Naiyaravoro, Alex Dunbar, Fraser Lyle, Lee Jones, Rory Clegg, Grayson Hart; Alex Allan, Pat MacArthur, Zander Fagerson, Rob Harley (capt), Scott Cummings, Hugh Blake, Chris Fusaro, Adam Ashe.
Replacements: Fergus Scott, Jerry Yanuyanutawa, Mike Cusack, Greg Peterson, Simone Favaro, Mike Blair, Gregor Hunter, Junior Bulumakau.
Dragons: Carl Meyer, Ashton Hewitt, Ross Wardle, Adam Warren, Aled Brew, Jason Tovey, Sarel Pretorius; Boris Stankovich, Elliot Dee, Brok Harris, Nick Crosswell, Rynard Landman (capt), Lewis Evans, Nic Cudd, Ed Jackson.
Replacements: Rhys Buckley, Phil Price, Shaun Knight, Matthew Screech, Ollie Griffiths, Luc Jones, Charlie Davies, Nick Scott.
"""
)

labels,scores
sports,0.4690114259719848
breaking news,0.223164826631546
science and technology,0.1070253774523735
pop culture,0.1044710054993629
politics,0.0573898740112781
finance,0.0389375276863575


In [0]:
categorize_article(
    """
The full cost of damage in Newton Stewart, one of the areas worst affected, is still being assessed.
Repair work is ongoing in Hawick and many roads in Peeblesshire remain badly affected by standing water.
Trains on the west coast mainline face disruption due to damage at the Lamington Viaduct.
Many businesses and householders were affected by flooding in Newton Stewart after the River Cree overflowed into the town.
First Minister Nicola Sturgeon visited the area to inspect the damage.
The waters breached a retaining wall, flooding many commercial properties on Victoria Street - the main shopping thoroughfare.
Jeanette Tate, who owns the Cinnamon Cafe which was badly affected, said she could not fault the multi-agency response once the flood hit.
However, she said more preventative work could have been carried out to ensure the retaining wall did not fail.
"It is difficult but I do think there is so much publicity for Dumfries and the Nith - and I totally appreciate that - but it is almost like we're neglected or forgotten," she said.
"That may not be true but it is perhaps my perspective over the last few days.
"Why were you not ready to help us a bit more when the warning and the alarm alerts had gone out?"
Meanwhile, a flood alert remains in place across the Borders because of the constant rain.
Peebles was badly hit by problems, sparking calls to introduce more defences in the area.
Scottish Borders Council has put a list on its website of the roads worst affected and drivers have been urged not to ignore closure signs.
The Labour Party's deputy Scottish leader Alex Rowley was in Hawick on Monday to see the situation first hand.
He said it was important to get the flood protection plan right but backed calls to speed up the process.
"I was quite taken aback by the amount of damage that has been done," he said.
"Obviously it is heart-breaking for people who have been forced out of their homes and the impact on businesses."
He said it was important that "immediate steps" were taken to protect the areas most vulnerable and a clear timetable put in place for flood prevention plans.
Have you been affected by flooding in Dumfries and Galloway or the Borders? Tell us about your experience of the situation and how it was handled. Email us on selkirk.news@bbc.co.uk or dumfries@bbc.co.uk.
"""
)

labels,scores
breaking news,0.2082107812166214
politics,0.1737899333238601
pop culture,0.1737534254789352
science and technology,0.1571808010339737
sports,0.1545622050762176
finance,0.132502868771553
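
The two outputs above differ in how decisive they are: the rugby report gets `sports` at about 0.47, while the flooding report's top label scores only about 0.21, barely ahead of the runners-up. Below is a minimal sketch of how one might summarize such a result. It assumes the output dict has `labels` and `scores` lists sorted from highest to lowest score, as in the tables above. The helper name `summarize_prediction` and the `min_margin` threshold are hypothetical, chosen only for illustration.

```python
def summarize_prediction(results: dict, min_margin: float = 0.1) -> dict:
    """Return the top label, its score, and whether it clearly beats the runner-up.

    `results` is assumed to look like the zero-shot pipeline output:
    {"labels": [...], "scores": [...]}, sorted by descending score.
    `min_margin` is an arbitrary illustrative threshold, not a library default.
    """
    labels, scores = results["labels"], results["scores"]
    margin = scores[0] - scores[1] if len(scores) > 1 else scores[0]
    return {
        "label": labels[0],
        "score": scores[0],
        "margin": margin,
        "confident": margin >= min_margin,
    }


# Top three scores copied (rounded) from the two outputs above.
rugby = {
    "labels": ["sports", "breaking news", "science and technology"],
    "scores": [0.469, 0.223, 0.107],
}
flooding = {
    "labels": ["breaking news", "politics", "pop culture"],
    "scores": [0.208, 0.174, 0.174],
}

print(summarize_prediction(rugby))
print(summarize_prediction(flooding))
```

With this threshold, the rugby article counts as a confident prediction (margin of roughly 0.25) and the flooding article does not (margin of roughly 0.03). This suggests that none of the candidate labels fits the flooding article especially well.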


### Few-shot learning

In few-shot learning tasks, you give the model an instruction, a few query-response examples of how to follow that instruction, and then a new query.  The model must generate the response for that new query.  This technique has pros and cons: it is very powerful and allows models to be reused for many more applications, but it can be finicky and require significant prompt engineering to get good and reliable results.

**Background reading**: See the [Wikipedia page on few-shot learning](https://en.wikipedia.org/wiki/Few-shot_learning_&#40;natural_language_processing&#41;) or [this Hugging Face blog about few-shot learning](https://huggingface.co/blog/few-shot-learning-gpt-neo-and-inference-api).

In this section, we will use:
* **Task**: Few-shot learning can be applied to many tasks.  Here, we will do sentiment analysis, which was covered earlier.  However, you will see how few-shot learning allows us to specify custom labels, whereas the previous model was tuned for a specific set of labels.  We will also show other (toy) tasks at the end.  In terms of the Hugging Face `task` specified in the `pipeline` constructor, few-shot learning is handled as a `text-generation` task.
* **Data**: We use a few examples, including a tweet example from the blog post linked above.
* **Model**: [gpt-neo-1.3B](https://huggingface.co/EleutherAI/gpt-neo-1.3B), a version of the GPT-Neo model discussed in the blog linked above.  It is a transformer model with 1.3 billion parameters developed by EleutherAI.  For more details, see the [code on GitHub](https://github.com/EleutherAI/gpt-neo) or the [research paper](https://arxiv.org/abs/2204.06745) on the related GPT-NeoX-20B model.

In [0]:
from transformers import pipeline

# Cap the response length for our few-shot learning tasks with max_new_tokens.
few_shot_pipeline = pipeline(
    task="text-generation",
    model="EleutherAI/gpt-neo-1.3B",
    max_new_tokens=10,
    model_kwargs={"cache_dir": "/dbfs/mnt/dbacademy-datasets/large-language-models/v01"},
)


***Tip***: In the few-shot prompts below, we separate the examples with the marker "###".  Because every example ends with "###", the model tends to emit it after answering the new query, so we pass its token ID to the pipeline as the end-of-sequence (EOS) token to stop generation at that point.

In [0]:
# Get the token ID for "###", which we will use as the EOS token below.
# Taking element [0] assumes the tokenizer encodes "###" as a single token.
eos_token_id = few_shot_pipeline.tokenizer.encode("###")[0]
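
The tweet prompts in the cells that follow all share one layout: an instruction, a blank line, worked examples that each end with "###", and a final query whose answer is left empty. Below is a minimal sketch of a helper that assembles that layout. The function `build_few_shot_prompt` and its parameters are hypothetical and not part of Hugging Face; the notebook itself writes the prompts by hand.

```python
SEPARATOR = "###"  # same marker the notebook uses between examples


def build_few_shot_prompt(instruction, examples, query,
                          query_key="Tweet", answer_key="Sentiment"):
    """Assemble an instruction, worked examples, and a new query into one prompt.

    `examples` is a list of (query, answer) pairs. This helper is hypothetical;
    it only reproduces the layout of the hand-written tweet prompts.
    """
    blocks = [
        f'[{query_key}]: "{q}"\n[{answer_key}]: {a}\n{SEPARATOR}'
        for q, a in examples
    ]
    # Leave the last answer empty so the model fills it in.
    blocks.append(f'[{query_key}]: "{query}"\n[{answer_key}]:')
    return instruction + "\n\n" + "\n".join(blocks)


prompt = build_few_shot_prompt(
    "For each tweet, describe its sentiment:",
    [("This is the link to the article", "Neutral")],
    "This new music video was incredible",
)
print(prompt)
```

Building prompts from a list of pairs makes it easy to add or remove examples and compare how the model's answer changes.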

In [0]:
# Without any examples, the model output is inconsistent and usually incorrect.
results = few_shot_pipeline(
    """For each tweet, describe its sentiment:

[Tweet]: "This new music video was incredible"
[Sentiment]:""",
    eos_token_id=eos_token_id,
)

print(results[0]["generated_text"])

Setting `pad_token_id` to `eos_token_id`:21017 for open-end generation.


For each tweet, describe its sentiment:

[Tweet]: "This new music video was incredible"
[Sentiment]: "very good"
[Tweet]: "it


In [0]:
# With only 1 example, the model may or may not get the answer right.
results = few_shot_pipeline(
    """For each tweet, describe its sentiment:

[Tweet]: "This is the link to the article"
[Sentiment]: Neutral
###
[Tweet]: "This new music video was incredible"
[Sentiment]:""",
    eos_token_id=eos_token_id,
)

print(results[0]["generated_text"])

Setting `pad_token_id` to `eos_token_id`:21017 for open-end generation.


For each tweet, describe its sentiment:

[Tweet]: "This is the link to the article"
[Sentiment]: Neutral
###
[Tweet]: "This new music video was incredible"
[Sentiment]: Neutral
###


In [0]:
# With 1 example for each sentiment, the model is more likely to understand!
results = few_shot_pipeline(
    """For each tweet, describe its sentiment:

[Tweet]: "I hate it when my phone battery dies."
[Sentiment]: Negative
###
[Tweet]: "My day has been 👍"
[Sentiment]: Positive
###
[Tweet]: "This is the link to the article"
[Sentiment]: Neutral
###
[Tweet]: "This new music video was incredible"
[Sentiment]:""",
    eos_token_id=eos_token_id,
)

print(results[0]["generated_text"])

Setting `pad_token_id` to `eos_token_id`:21017 for open-end generation.


For each tweet, describe its sentiment:

[Tweet]: "I hate it when my phone battery dies."
[Sentiment]: Negative
###
[Tweet]: "My day has been 👍"
[Sentiment]: Positive
###
[Tweet]: "This is the link to the article"
[Sentiment]: Neutral
###
[Tweet]: "This new music video was incredible"
[Sentiment]: Positive
###
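
As the outputs above show, `generated_text` repeats the whole prompt before the model's answer, and the answer may be followed by "###" or by further text. Below is a minimal sketch of how the answer alone could be pulled out. It assumes the generated text starts with the prompt; `extract_answer` is a hypothetical helper, not a pipeline feature.

```python
def extract_answer(generated_text: str, prompt: str, stop: str = "###") -> str:
    """Return only the model's answer to the final query.

    Assumes `generated_text` starts with `prompt`, as in the outputs above.
    """
    if generated_text.startswith(prompt):
        completion = generated_text[len(prompt):]
    else:
        completion = generated_text
    # Cut at the separator if the model emitted it ...
    completion = completion.split(stop, 1)[0]
    # ... and keep only the first non-empty line in case the model kept writing.
    for line in completion.splitlines():
        if line.strip():
            return line.strip()
    return ""


tweet_prompt = '[Tweet]: "This new music video was incredible"\n[Sentiment]:'
print(extract_answer(tweet_prompt + " Positive\n###\n", tweet_prompt))
```

Keeping only the first non-empty line also covers runs where the model never emits "###" and simply continues until it reaches `max_new_tokens`.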


Just for fun, we show a few more examples below.

In [0]:
# The model isn't ready to serve drinks!
results = few_shot_pipeline(
    """For each food, suggest a good drink pairing:

[food]: tapas
[drink]: wine
###
[food]: pizza
[drink]: soda
###
[food]: jalapenos poppers
[drink]: beer
###
[food]: scone
[drink]:""",
    eos_token_id=eos_token_id,
)

print(results[0]["generated_text"])

Setting `pad_token_id` to `eos_token_id`:21017 for open-end generation.


For each food, suggest a good drink pairing:

[food]: tapas
[drink]: wine
###
[food]: pizza
[drink]: soda
###
[food]: jalapenos poppers
[drink]: beer
###
[food]: scone
[drink]: club soda

I like a couple of these


In [0]:
# When sampling, this example works only some of the time.  The task may be too abstract.
results = few_shot_pipeline(
    """Given a word describing how someone is feeling, suggest a description of that person.  The description should not include the original word.

[word]: happy
[description]: smiling, laughing, clapping
###
[word]: nervous
[description]: glancing around quickly, sweating, fidgeting
###
[word]: sleepy
[description]: heavy-lidded, slumping, rubbing eyes
###
[word]: confused
[description]:""",
    eos_token_id=eos_token_id,
)

print(results[0]["generated_text"])

Setting `pad_token_id` to `eos_token_id`:21017 for open-end generation.


Given a word describing how someone is feeling, suggest a description of that person.  The description should not include the original word.

[word]: happy
[description]: smiling, laughing, clapping
###
[word]: nervous
[description]: glancing around quickly, sweating, fidgeting
###
[word]: sleepy
[description]: heavy-lidded, slumping, rubbing eyes
###
[word]: confused
[description]: watching, thinking

The word that best describes
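
The comment on the cell above notes that the result varies from run to run when sampling. The toy sketch below shows why. It is not the pipeline's actual decoding code, and the candidate tokens and scores are made up. Greedy selection always returns the highest-scoring token, while sampling draws a token in proportion to its probability, so lower-ranked tokens are sometimes chosen.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert raw scores into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_greedy(tokens, logits):
    """Always return the single most likely token."""
    return tokens[max(range(len(logits)), key=logits.__getitem__)]


def pick_sampled(tokens, logits, rng, temperature=1.0):
    """Draw one token at random, weighted by its probability."""
    return rng.choices(tokens, weights=softmax(logits, temperature), k=1)[0]


# Made-up candidate next tokens and scores, purely for illustration.
tokens = ["frowning", "watching", "thinking", "smiling"]
logits = [2.0, 1.5, 1.2, 0.1]

rng = random.Random(0)  # fixed seed so the draws are repeatable
print("greedy: ", [pick_greedy(tokens, logits) for _ in range(5)])
print("sampled:", [pick_sampled(tokens, logits, rng) for _ in range(5)])
```

Fixing the random seed makes a sampled run repeatable, and lowering the temperature pushes sampling toward the greedy choice.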


In [0]:
# We override max_new_tokens to generate longer answers.
# These book descriptions were taken from their corresponding Wikipedia pages.
results = few_shot_pipeline(
    """Generate a book summary from the title:

[book title]: "Stranger in a Strange Land"
[book description]: "This novel tells the story of Valentine Michael Smith, a human who comes to Earth in early adulthood after being born on the planet Mars and raised by Martians, and explores his interaction with and eventual transformation of Terran culture."
###
[book title]: "The Adventures of Tom Sawyer"
[book description]: "This novel is about a boy growing up along the Mississippi River. It is set in the 1840s in the town of St. Petersburg, which is based on Hannibal, Missouri, where Twain lived as a boy. In the novel, Tom Sawyer has several adventures, often with his friend Huckleberry Finn."
###
[book title]: "Dune"
[book description]: "This novel is set in the distant future amidst a feudal interstellar society in which various noble houses control planetary fiefs. It tells the story of young Paul Atreides, whose family accepts the stewardship of the planet Arrakis. While the planet is an inhospitable and sparsely populated desert wasteland, it is the only source of melange, or spice, a drug that extends life and enhances mental abilities.  The story explores the multilayered interactions of politics, religion, ecology, technology, and human emotion, as the factions of the empire confront each other in a struggle for the control of Arrakis and its spice."
###
[book title]: "Blue Mars"
[book description]:""",
    eos_token_id=eos_token_id,
    max_new_tokens=50,
)

print(results[0]["generated_text"])

Setting `pad_token_id` to `eos_token_id`:21017 for open-end generation.


Generate a book summary from the title:

[book title]: "Stranger in a Strange Land"
[book description]: "This novel tells the story of Valentine Michael Smith, a human who comes to Earth in early adulthood after being born on the planet Mars and raised by Martians, and explores his interaction with and eventual transformation of Terran culture."
###
[book title]: "The Adventures of Tom Sawyer"
[book description]: "This novel is about a boy growing up along the Mississippi River. It is set in the 1840s in the town of St. Petersburg, which is based on Hannibal, Missouri, where Twain lived as a boy. In the novel, Tom Sawyer has several adventures, often with his friend Huckleberry Finn."
###
[book title]: "Dune"
[book description]: "This novel is set in the distant future amidst a feudal interstellar society in which various noble houses control planetary fiefs. It tells the story of young Paul Atreides, whose family accepts the stewardship of the planet Arrakis. While the planet is an in

**Prompt engineering** is a new but critical technique for working with LLMs.  You saw some brief examples above.  As you use more general and powerful models, constructing good prompts becomes ever more important.  Some great resources to learn more are:
* [Wikipedia](https://en.wikipedia.org/wiki/Prompt_engineering) for a brief overview
* [Best practices for prompt engineering with OpenAI API](https://help.openai.com/en/articles/6654000-best-practices-for-prompt-engineering-with-openai-api)
* [🧠 Awesome ChatGPT Prompts](https://github.com/f/awesome-chatgpt-prompts) for fun examples with ChatGPT

## Hugging Face APIs

In this section, we look at the Hugging Face APIs in more detail:
* Search and sampling to generate text
* Auto* loaders for tokenizers and models
* Model-specific loaders

Recall the `xsum` dataset from the **Summarization** section above:

In [0]:
from datasets import load_dataset
from transformers import pipeline

In [0]:
xsum_dataset = load_dataset(
    "xsum", version="1.2.0", cache_dir="/dbfs/mnt/dbacademy-datasets/large-language-models/v01"
)  # Note: We specify cache_dir to use predownloaded data.
xsum_dataset  # The printed representation of this object shows the `num_rows` of each dataset split.

Found cached dataset xsum (/dbfs/mnt/dbacademy-datasets/large-language-models/v01/xsum/default/1.2.0/082863bf4754ee058a5b6f6525d0cb2b18eadb62c7b370b095d1364050a52b71)



DatasetDict({
    train: Dataset({
        features: ['document', 'summary', 'id'],
        num_rows: 204045
    })
    validation: Dataset({
        features: ['document', 'summary', 'id'],
        num_rows: 11332
    })
    test: Dataset({
        features: ['document', 'summary', 'id'],
        num_rows: 11334
    })
})
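
As a quick sanity check on the split sizes printed above, the sketch below computes each split's share of the whole dataset. It uses plain arithmetic only; the dictionary simply restates the `num_rows` values from the output, and the variable names are illustrative.

```python
# num_rows values copied from the DatasetDict output above.
split_sizes = {"train": 204045, "validation": 11332, "test": 11334}

total_rows = sum(split_sizes.values())
shares = {name: rows / total_rows for name, rows in split_sizes.items()}

for name, share in shares.items():
    print(f"{name:<10} {split_sizes[name]:>7} rows  {share:.1%}")
```

The split works out to roughly 90% train, 5% validation, and 5% test.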

In [0]:
xsum_sample = xsum_dataset["train"].select(range(10))
display(xsum_sample.to_pandas())

document,summary,id
"The full cost of damage in Newton Stewart, one of the areas worst affected, is still being assessed. Repair work is ongoing in Hawick and many roads in Peeblesshire remain badly affected by standing water. Trains on the west coast mainline face disruption due to damage at the Lamington Viaduct. Many businesses and householders were affected by flooding in Newton Stewart after the River Cree overflowed into the town. First Minister Nicola Sturgeon visited the area to inspect the damage. The waters breached a retaining wall, flooding many commercial properties on Victoria Street - the main shopping thoroughfare. Jeanette Tate, who owns the Cinnamon Cafe which was badly affected, said she could not fault the multi-agency response once the flood hit. However, she said more preventative work could have been carried out to ensure the retaining wall did not fail. ""It is difficult but I do think there is so much publicity for Dumfries and the Nith - and I totally appreciate that - but it is almost like we're neglected or forgotten,"" she said. ""That may not be true but it is perhaps my perspective over the last few days. ""Why were you not ready to help us a bit more when the warning and the alarm alerts had gone out?"" Meanwhile, a flood alert remains in place across the Borders because of the constant rain. Peebles was badly hit by problems, sparking calls to introduce more defences in the area. Scottish Borders Council has put a list on its website of the roads worst affected and drivers have been urged not to ignore closure signs. The Labour Party's deputy Scottish leader Alex Rowley was in Hawick on Monday to see the situation first hand. He said it was important to get the flood protection plan right but backed calls to speed up the process. ""I was quite taken aback by the amount of damage that has been done,"" he said. 
""Obviously it is heart-breaking for people who have been forced out of their homes and the impact on businesses."" He said it was important that ""immediate steps"" were taken to protect the areas most vulnerable and a clear timetable put in place for flood prevention plans. Have you been affected by flooding in Dumfries and Galloway or the Borders? Tell us about your experience of the situation and how it was handled. Email us on selkirk.news@bbc.co.uk or dumfries@bbc.co.uk.",Clean-up operations are continuing across the Scottish Borders and Dumfries and Galloway after flooding caused by Storm Frank.,35232142
"A fire alarm went off at the Holiday Inn in Hope Street at about 04:20 BST on Saturday and guests were asked to leave the hotel. As they gathered outside they saw the two buses, parked side-by-side in the car park, engulfed by flames. One of the tour groups is from Germany, the other from China and Taiwan. It was their first night in Northern Ireland. The driver of one of the buses said many of the passengers had left personal belongings on board and these had been destroyed. Both groups have organised replacement coaches and will begin their tour of the north coast later than they had planned. Police have appealed for information about the attack. Insp David Gibson said: ""It appears as though the fire started under one of the buses before spreading to the second. ""While the exact cause is still under investigation, it is thought that the fire was started deliberately.""",Two tourist buses have been destroyed by fire in a suspected arson attack in Belfast city centre.,40143035
"Ferrari appeared in a position to challenge until the final laps, when the Mercedes stretched their legs to go half a second clear of the red cars. Sebastian Vettel will start third ahead of team-mate Kimi Raikkonen. The world champion subsequently escaped punishment for reversing in the pit lane, which could have seen him stripped of pole. But stewards only handed Hamilton a reprimand, after governing body the FIA said ""no clear instruction was given on where he should park"". Belgian Stoffel Vandoorne out-qualified McLaren team-mate Jenson Button on his Formula 1 debut. Vandoorne was 12th and Button 14th, complaining of a handling imbalance on his final lap but admitting the newcomer ""did a good job and I didn't"". Mercedes were wary of Ferrari's pace before qualifying after Vettel and Raikkonen finished one-two in final practice, and their concerns appeared to be well founded as the red cars mixed it with the silver through most of qualifying. After the first runs, Rosberg was ahead, with Vettel and Raikkonen splitting him from Hamilton, who made a mistake at the final corner on his first lap. But Hamilton saved his best for last, fastest in every sector of his final attempt, to beat Rosberg by just 0.077secs after the German had out-paced him throughout practice and in the first qualifying session. Vettel rued a mistake at the final corner on his last lap, but the truth is that with the gap at 0.517secs to Hamilton there was nothing he could have done. The gap suggests Mercedes are favourites for the race, even if Ferrari can be expected to push them. Vettel said: ""Last year we were very strong in the race and I think we are in good shape for tomorrow. 
We will try to give them a hard time."" Vandoorne's preparations for his grand prix debut were far from ideal - he only found out he was racing on Thursday when FIA doctors declared Fernando Alonso unfit because of a broken rib sustained in his huge crash at the first race of the season in Australia two weeks ago. The Belgian rookie had to fly overnight from Japan, where he had been testing in the Super Formula car he races there, and arrived in Bahrain only hours before first practice on Friday. He also had a difficult final practice, missing all but the final quarter of the session because of a water leak. Button was quicker in the first qualifying session, but Vandoorne pipped him by 0.064secs when it mattered. The 24-year-old said: ""I knew after yesterday I had quite similar pace to Jenson and I knew if I improved a little bit I could maybe challenge him and even out-qualify him and that is what has happened. ""Jenson is a very good benchmark for me because he is a world champion and he is well known to the team so I am very satisfied with the qualifying."" Button, who was 0.5secs quicker than Vandoorne in the first session, complained of oversteer on his final run in the second: ""Q1 was what I was expecting. Q2 he did a good job and I didn't. Very, very good job. We knew how quick he was."" The controversial new elimination qualifying system was retained for this race despite teams voting at the first race in Australia to go back to the 2015 system. FIA president Jean Todt said earlier on Saturday that he ""felt it necessary to give new qualifying one more chance"", adding: ""We live in a world where there is too much over reaction."" The system worked on the basis of mixing up the grid a little - Force India's Sergio Perez ended up out of position in 18th place after the team miscalculated the timing of his final run, leaving him not enough time to complete it before the elimination clock timed him out. 
But it will come in for more criticism as a result of lack of track action at the end of each session. There were three minutes at the end of the first session with no cars on the circuit, and the end of the second session was a similar damp squib. Only one car - Nico Hulkenberg's Force India - was out on the track with six minutes to go. The two Williams cars did go out in the final three minutes but were already through to Q3 and so nothing was at stake. The teams are meeting with Todt and F1 commercial boss Bernie Ecclestone on Sunday at noon local time to decide on what to do with qualifying for the rest of the season. Todt said he was ""optimistic"" they would be able to reach unanimous agreement on a change. ""We should listen to the people watching on TV,"" Rosberg said. ""If they are still unhappy, which I am sure they will be, we should change it."" Red Bull's Daniel Ricciardo was fifth on the grid, ahead of the Williams cars of Valtteri Bottas and Felipe Massa and Force India's Nico Hulkenberg. Ricciardo's team-mate Daniil Kvyat was eliminated during the second session - way below the team's expectation - and the Renault of Brit Jolyon Palmer only managed 19th fastest. German Mercedes protege Pascal Wehrlein managed an excellent 16th in the Manor car. Bahrain GP qualifying results Bahrain GP coverage details",Lewis Hamilton stormed to pole position at the Bahrain Grand Prix ahead of Mercedes team-mate Nico Rosberg.,35951548
"John Edward Bates, formerly of Spalding, Lincolnshire, but now living in London, faces a total of 22 charges, including two counts of indecency with a child. The 67-year-old is accused of committing the offences between March 1972 and October 1989. Mr Bates denies all the charges. Grace Hale, prosecuting, told the jury that the allegations of sexual abuse were made by made by four male complainants and related to when Mr Bates was a scout leader in South Lincolnshire and Cambridgeshire. ""The defendant says nothing of that sort happened between himself and all these individuals. He says they are all fabricating their accounts and telling lies,"" said Mrs Hale. The prosecutor claimed Mr Bates invited one 15 year old to his home offering him the chance to look at cine films made at scout camps but then showed him pornographic films. She told the jury that the boy was then sexually abused leaving him confused and frightened. Mrs Hale said: ""The complainant's recollection is that on a number of occasions sexual acts would happen with the defendant either in the defendant's car or in his cottage."" She told the jury a second boy was taken by Mr Bates for a weekend in London at the age of 13 or 14 and after visiting pubs he was later sexually abused. Mrs Hale said two boys from the Spalding group had also made complaints of being sexually abused. The jury has been told that Mr Bates was in the RAF before serving as a Lincolnshire Police officer between 1976 and 1983. The trial, which is expected to last two weeks, continues.","A former Lincolnshire Police officer carried out a series of sex attacks on boys, a jury at Lincoln Crown Court was told.",36266422
"Patients and staff were evacuated from Cerahpasa hospital on Wednesday after a man receiving treatment at the clinic threatened to shoot himself and others. Officers were deployed to negotiate with the man, a young police officer. Earlier reports that the armed man had taken several people hostage proved incorrect. The chief consultant of Cerahpasa hospital, Zekayi Kutlubay, who was evacuated from the facility, said that there had been ""no hostage crises"", adding that the man was ""alone in the room"". Dr Kutlubay said that the man had been receiving psychiatric treatment for the past two years. He said that the hospital had previously submitted a report stating that the man should not be permitted to carry a gun. ""His firearm was taken away,"" Dr Kutlubay said, adding that the gun in the officer's possession on Wednesday was not his issued firearm. The incident comes amid tension in Istanbul following several attacks in crowded areas, including the deadly assault on the Reina nightclub on New Year's Eve which left 39 people dead.","An armed man who locked himself into a room at a psychiatric hospital in Istanbul has ended his threat to kill himself, Turkish media report.",38826984
"Simone Favaro got the crucial try with the last move of the game, following earlier touchdowns by Chris Fusaro, Zander Fagerson and Junior Bulumakau. Rynard Landman and Ashton Hewitt got a try in either half for the Dragons. Glasgow showed far superior strength in depth as they took control of a messy match in the second period. Home coach Gregor Townsend gave a debut to powerhouse Fijian-born Wallaby wing Taqele Naiyaravoro, and centre Alex Dunbar returned from long-term injury, while the Dragons gave first starts of the season to wing Aled Brew and hooker Elliot Dee. Glasgow lost hooker Pat McArthur to an early shoulder injury but took advantage of their first pressure when Rory Clegg slotted over a penalty on 12 minutes. It took 24 minutes for a disjointed game to produce a try as Sarel Pretorius sniped from close range and Landman forced his way over for Jason Tovey to convert - although it was the lock's last contribution as he departed with a chest injury shortly afterwards. Glasgow struck back when Fusaro drove over from a rolling maul on 35 minutes for Clegg to convert. But the Dragons levelled at 10-10 before half-time when Naiyaravoro was yellow-carded for an aerial tackle on Brew and Tovey slotted the easy goal. The visitors could not make the most of their one-man advantage after the break as their error count cost them dearly. It was Glasgow's bench experience that showed when Mike Blair's break led to a short-range score from teenage prop Fagerson, converted by Clegg. Debutant Favaro was the second home player to be sin-binned, on 63 minutes, but again the Warriors made light of it as replacement wing Bulumakau, a recruit from the Army, pounced to deftly hack through a bouncing ball for an opportunist try. The Dragons got back within striking range with some excellent combined handling putting Hewitt over unopposed after 72 minutes. 
However, Favaro became sinner-turned-saint as he got on the end of another effective rolling maul to earn his side the extra point with the last move of the game, Clegg converting. Dragons director of rugby Lyn Jones said: ""We're disappointed to have lost but our performance was a lot better [than against Leinster] and the game could have gone either way. ""Unfortunately too many errors behind the scrum cost us a great deal, though from where we were a fortnight ago in Dublin our workrate and desire was excellent. ""It was simply error count from individuals behind the scrum that cost us field position, it's not rocket science - they were correct in how they played and we had a few errors, that was the difference."" Glasgow Warriors: Rory Hughes, Taqele Naiyaravoro, Alex Dunbar, Fraser Lyle, Lee Jones, Rory Clegg, Grayson Hart; Alex Allan, Pat MacArthur, Zander Fagerson, Rob Harley (capt), Scott Cummings, Hugh Blake, Chris Fusaro, Adam Ashe. Replacements: Fergus Scott, Jerry Yanuyanutawa, Mike Cusack, Greg Peterson, Simone Favaro, Mike Blair, Gregor Hunter, Junior Bulumakau. Dragons: Carl Meyer, Ashton Hewitt, Ross Wardle, Adam Warren, Aled Brew, Jason Tovey, Sarel Pretorius; Boris Stankovich, Elliot Dee, Brok Harris, Nick Crosswell, Rynard Landman (capt), Lewis Evans, Nic Cudd, Ed Jackson. Replacements: Rhys Buckley, Phil Price, Shaun Knight, Matthew Screech, Ollie Griffiths, Luc Jones, Charlie Davies, Nick Scott.",Defending Pro12 champions Glasgow Warriors bagged a late bonus-point victory over the Dragons despite a host of absentees and two yellow cards.,34540833
"Veronica Vanessa Chango-Alverez, 31, was killed and another man injured when an Audi A3 struck them in Streatham High Road at 05:30 GMT on Saturday. Ten minutes before the crash the car was in London Road, Croydon, when a Volkswagen Passat collided with a tree. Police want to trace Nathan Davis, 27, who they say has links to the Audi. The car was abandoned at the scene. Ms Chango-Alverez died from multiple injuries, a post-mortem examination found. No arrests have been made as yet, police said. Ms Chango-Alverez was staying at her mother's home in Streatham High Road. She was born in Ecuador and had lived in London for 13 years, BBC London reporter Gareth Furby said. At the time of the crash, she was on her way to work in a hotel. The remains of the bus stop, which was extensively damaged in the crash, have been removed. Flowers have been left at the site in tribute to the victim. A statement from her brother Kevin Raul Chango-Alverez said: ""My family has had its heart torn out, at this Christmas time, we will never be the same again. ""On Friday night we were together as a family with Veronica meeting her newly born nephew and preparing for Christmas. ""I last saw her alive as she left to go to work on Saturday morning, but moments later I was holding her hand as she passed away in the street."" Describing the crash as ""horrific"" Det Insp Gordon Wallace, said: ""The family are devastated. The memory of this senseless death will be with them each time they leave their home. ""The driver fled the scene abandoning the grey Audi, which was extensively damaged. ""We are looking to speak to Mr Nathan Davis in relation to this collision."" The 51-year-old man injured at the bus stop remains in a critical condition in hospital while the condition of the 29-year-old driver of the Volkswagen is now stable.",A man with links to a car that was involved in a fatal bus stop crash in south London is being sought by police.,20836172
"Belgian cyclist Demoitie died after a collision with a motorbike during Belgium's Gent-Wevelgem race. The 25-year-old was hit by the motorbike after several riders came down in a crash as the race passed through northern France. ""The main issues come when cars or motorbikes have to pass the peloton and pass riders,"" Team Sky's Rowe said. ""That is the fundamental issue we're looking into. ""There's a lot of motorbikes in and around the race whether it be cameras for TV, photographers or police motorbikes. ""In total there's around 50 motorbikes that work on each race. ""We've got a riders union and we're coming together to think of a few ideas, whether we cap a speed limit on how fast they can overtake us. ""Say we put a 10 kilometres per hour limit on it, if we're going 50kph they're only allowed to pass us 60kph or something like that."" Demoitie, who was riding for the Wanty-Gobert team, was taken to hospital in Lille but died later. The sport's governing body, the UCI, said it would co-operate with all relevant authorities in an investigation into the incident. The Professional Cyclists' Association (CPA) issued a statement asking what would be done to improve safety. Despite Demoitie's death, attitudes to road racing will stay the same says Rowe, who has been competing in Three Days of De Panne race in Belgium. ""As soon as that element of fear slips into your mind and you start thinking of things that could happen, that's when you're doomed to fail,"" he told BBC Wales Sport. ""If you start thinking about crashes and the consequences and what could potentially happen then you're never going to be at the front of the peloton and you're never going to win any races."" In a separate incident, another Belgian cyclist, Daan Myngheer, 22, died in hospital after suffering a heart attack during the first stage of the Criterium International in Corsica.",Welsh cyclist Luke Rowe says changes to the sport must be made following the death of Antoine Demoitie.,35932467
"Gundogan, 26, told BBC Sport he ""can see the finishing line"" after tearing cruciate knee ligaments in December, but will not rush his return. The German missed the 2014 World Cup following back surgery that kept him out for a year, and sat out Euro 2016 because of a dislocated kneecap. He said: ""It is heavy mentally to accept that."" Gundogan will not be fit for the start of the Premier League season at Brighton on 12 August but said his recovery time is now being measured in ""weeks"" rather than months. He told BBC Sport: ""It is really hard always to fall and fight your way back. You feel good and feel ready, then you get the next kick. ""The worst part is behind me now. I want to feel ready when I am fully back. I want to feel safe and confident. I don't mind if it is two weeks or six."" Gundogan made 15 appearances and scored five goals in his debut season for City following his £20m move from Borussia Dortmund. He is eager to get on the field again and was impressed at the club's 4-1 win over Real Madrid in a pre-season game in Los Angeles on Wednesday. Manager Pep Guardiola has made five new signings already this summer and continues to have an interest in Arsenal forward Alexis Sanchez and Monaco's Kylian Mbappe. Gundogan said: ""Optimism for the season is big. It is huge, definitely. ""We felt that last year as well but it was a completely new experience for all of us. We know the Premier League a bit more now and can't wait for the season to start."" City complete their three-match tour of the United States against Tottenham in Nashville on Saturday. Chelsea manager Antonio Conte said earlier this week he did not feel Tottenham were judged by the same standards as his own side, City and Manchester United. Spurs have had the advantage in their recent meetings with City, winning three and drawing one of their last four Premier League games. And Gundogan thinks they are a major threat. He said: ""Tottenham are a great team. 
They have the style of football. They have young English players. Our experience last season shows it is really tough to beat them. ""They are really uncomfortable to play against. ""I am pretty sure, even if they will not say it loud, the people who know the Premier League know Tottenham are definitely a competitor for the title.""",Manchester City midfielder Ilkay Gundogan says it has been mentally tough to overcome a third major injury.,40758845
"The crash happened about 07:20 GMT at the junction of the A127 and Progress Road in Leigh-on-Sea, Essex. The man, who police said is aged in his 20s, was treated at the scene for a head injury and suspected multiple fractures, the ambulance service said. He was airlifted to the Royal London Hospital for further treatment. The Southend-bound carriageway of the A127 was closed for about six hours while police conducted their initial inquiries. A spokeswoman for Essex Police said it was not possible comment to further as this time as the ""investigation is now being conducted by the IPCC"".","A jogger has been hit by an unmarked police car responding to an emergency call, leaving him with ""serious life-changing injuries"".",30358490


### Search and sampling in inference

You may see parameters like `num_beams`, `do_sample`, etc. specified in Hugging Face pipelines.  These are inference configurations.

LLMs work by predicting (generating) the next token, then the next, and so on.  The goal is to generate a high-probability sequence of tokens, which is essentially a search through the (enormous) space of potential sequences.

To do this search, LLMs use one of two main methods:
* **Search**: Given the tokens generated so far, deterministically pick the next token based on likelihood.
   * **Greedy search** (default): At each step, pick the single most likely next token.
   * **Beam search**: Extend greedy search by tracking several candidate sequences in parallel and keeping the most likely ones; the parameter `num_beams` sets how many.
* **Sampling**: Given the tokens generated so far, pick the next token by sampling from the predicted distribution of tokens.
   * **Top-K sampling**: The parameter `top_k` modifies sampling by limiting it to the `k` most likely tokens.
   * **Top-p sampling**: The parameter `top_p` modifies sampling by limiting it to the smallest set of most likely tokens whose cumulative probability reaches `p`.

You can toggle between search and sampling via the parameter `do_sample`: `do_sample=True` enables sampling.
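
To make the difference between these options concrete, here is a minimal pure-Python sketch of greedy selection, top-K filtering, and top-p filtering applied to a single next-token distribution. The distribution and the helper names (`greedy_pick`, `top_k_filter`, `top_p_filter`, `sample_pick`) are made up for illustration; this is not how Hugging Face implements these options internally, only the idea behind them.

```python
import random


def greedy_pick(probs):
    """Greedy search step: return the single most likely token."""
    return max(probs, key=probs.get)


def top_k_filter(probs, k):
    """Keep only the k most likely tokens, then renormalize to sum to 1."""
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


def top_p_filter(probs, p):
    """Keep the smallest set of most likely tokens whose cumulative
    probability reaches p, then renormalize to sum to 1."""
    kept, cumulative = [], 0.0
    for tok, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {tok: prob / total for tok, prob in kept}


def sample_pick(probs, rng):
    """Sampling step: draw one token according to its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical next-token distribution (not produced by any real model).
next_token_probs = {"the": 0.5, "a": 0.3, "many": 0.15, "zebra": 0.05}

rng = random.Random(0)  # fixed seed so the draw is repeatable
greedy_token = greedy_pick(next_token_probs)
top_k_probs = top_k_filter(next_token_probs, k=2)
top_p_probs = top_p_filter(next_token_probs, p=0.9)
sampled_token = sample_pick(top_p_probs, rng)

print("greedy:", greedy_token)
print("top_k=2 candidates:", top_k_probs)
print("top_p=0.9 candidates:", top_p_probs)
print("sampled from top-p candidates:", sampled_token)
```

Both filters remove the unlikely tail (here, `"zebra"`) before sampling, which is why lowering `top_k` or `top_p` makes sampled text behave more like greedy output.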

For more background on search and sampling, see [this Hugging Face blog post](https://huggingface.co/blog/how-to-generate).

We will illustrate these various options below using our summarization pipeline.

In [0]:
summarizer = pipeline(
    task="summarization",
    model="t5-small",
    min_length=20,
    max_length=40,
    truncation=True,
    model_kwargs={"cache_dir": "/dbfs/mnt/dbacademy-datasets/large-language-models/v01"},
)  # Note: We specify cache_dir to use predownloaded models.

In [0]:
# We previously called the summarization pipeline using the default inference configuration.
# This does greedy search.
summarizer(xsum_sample["document"][0])



[{'summary_text': 'the full cost of damage in Newton Stewart is still being assessed . many roads in peeblesshire remain badly affected by standing water . a flood alert remains in place across the'}]

In [0]:
# We can instead do a beam search by specifying num_beams.
# This takes longer to run, but it might find a better (more likely) sequence of text.
summarizer(xsum_sample["document"][0], num_beams=10)

[{'summary_text': 'the full cost of damage in Newton Stewart is still being assessed . many roads in peeblesshire remain badly affected by standing water . a flood alert remains in place across the'}]
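
In the run above, beam search returned the same summary as greedy search, but that is not guaranteed. The toy sketch below shows a case where greedy search misses the most likely sequence and beam search finds it. `TOY_MODEL` is a hypothetical lookup table standing in for a language model, and `beam_search` is a simplified illustration; real implementations add details such as length penalties and stopping criteria.

```python
import math

# Hypothetical toy "language model": next-token probabilities that depend
# only on the previous token. It supports sequences of two generated tokens.
TOY_MODEL = {
    "<s>": {"A": 0.6, "B": 0.4},
    "A": {"x": 0.5, "y": 0.5},
    "B": {"z": 0.9, "w": 0.1},
}


def beam_search(model, start, steps, num_beams):
    """Return the best (tokens, log_probability) found with num_beams beams.

    With num_beams=1 this reduces to greedy search.
    """
    beams = [([start], 0.0)]
    for _ in range(steps):
        candidates = []
        for tokens, score in beams:
            # Extend every surviving sequence by every possible next token.
            for tok, p in model[tokens[-1]].items():
                candidates.append((tokens + [tok], score + math.log(p)))
        # Keep only the num_beams highest-scoring partial sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:num_beams]
    return beams[0]


greedy_tokens, greedy_score = beam_search(TOY_MODEL, "<s>", steps=2, num_beams=1)
beam_tokens, beam_score = beam_search(TOY_MODEL, "<s>", steps=2, num_beams=2)

print("greedy:", greedy_tokens, round(math.exp(greedy_score), 2))
print("beam:  ", beam_tokens, round(math.exp(beam_score), 2))
```

Greedy search commits to `"A"` because it is the most likely first token, and ends with a sequence of probability 0.3. Keeping two beams lets the search also follow `"B"`, whose continuation `"z"` gives a sequence of probability 0.36.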

In [0]:
# Alternatively, we could use sampling.
summarizer(xsum_sample["document"][0], do_sample=True)

[{'summary_text': 'a full cost of damage in Newton Stewart is still being assessed . many roads in the area remain badly affected by standing water . a flood alert remains in place across the Borders'}]

In [0]:
# We can make sampling more conservative (closer to greedy) by restricting it to the top_k most likely next tokens
# and, via top_p, to the most likely tokens whose cumulative probability reaches top_p.
summarizer(xsum_sample["document"][0], do_sample=True, top_k=10, top_p=0.8)

[{'summary_text': 'the full cost of damage in Newton Stewart is still being assessed . many roads in peeblesshire remain badly affected by standing water . a flood alert remains in place across the'}]

### Auto* loaders for tokenizers and models

We have already seen the `dataset` and `pipeline` abstractions from Hugging Face.  While a `pipeline` is a quick way to set up an LLM for a given task, the slightly lower-level abstractions `model` and `tokenizer` permit a bit more control over options.  We will show how to use those briefly, following this pattern:

* Start with the input articles.
* Tokenize them (converting text to token indices).
* Apply the model to the tokenized data to generate summaries (represented as token indices).
* Decode the summaries into human-readable text.

We will first look at the [Auto* classes](https://huggingface.co/docs/transformers/model_doc/auto) for tokenizers and model types, which simplify loading pre-trained tokenizers and models.

API docs:
* [AutoTokenizer](https://huggingface.co/docs/transformers/main/en/model_doc/auto#transformers.AutoTokenizer)
* [AutoModelForSeq2SeqLM](https://huggingface.co/docs/transformers/main/en/model_doc/auto#transformers.AutoModelForSeq2SeqLM)

In [0]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the pre-trained tokenizer and model.
tokenizer = AutoTokenizer.from_pretrained("t5-small", cache_dir="/dbfs/mnt/dbacademy-datasets/large-language-models/v01")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small", cache_dir="/dbfs/mnt/dbacademy-datasets/large-language-models/v01")

In [0]:
import pandas as pd
# For summarization, T5-small expects a prefix "summarize: ", so we prepend that to each article as a prompt.
articles = ["summarize: " + article for article in xsum_sample["document"]]
display(pd.DataFrame(articles, columns=["prompts"]))



prompts
"summarize: The full cost of damage in Newton Stewart, one of the areas worst affected, is still being assessed. Repair work is ongoing in Hawick and many roads in Peeblesshire remain badly affected by standing water. Trains on the west coast mainline face disruption due to damage at the Lamington Viaduct. Many businesses and householders were affected by flooding in Newton Stewart after the River Cree overflowed into the town. First Minister Nicola Sturgeon visited the area to inspect the damage. The waters breached a retaining wall, flooding many commercial properties on Victoria Street - the main shopping thoroughfare. Jeanette Tate, who owns the Cinnamon Cafe which was badly affected, said she could not fault the multi-agency response once the flood hit. However, she said more preventative work could have been carried out to ensure the retaining wall did not fail. ""It is difficult but I do think there is so much publicity for Dumfries and the Nith - and I totally appreciate that - but it is almost like we're neglected or forgotten,"" she said. ""That may not be true but it is perhaps my perspective over the last few days. ""Why were you not ready to help us a bit more when the warning and the alarm alerts had gone out?"" Meanwhile, a flood alert remains in place across the Borders because of the constant rain. Peebles was badly hit by problems, sparking calls to introduce more defences in the area. Scottish Borders Council has put a list on its website of the roads worst affected and drivers have been urged not to ignore closure signs. The Labour Party's deputy Scottish leader Alex Rowley was in Hawick on Monday to see the situation first hand. He said it was important to get the flood protection plan right but backed calls to speed up the process. ""I was quite taken aback by the amount of damage that has been done,"" he said. 
""Obviously it is heart-breaking for people who have been forced out of their homes and the impact on businesses."" He said it was important that ""immediate steps"" were taken to protect the areas most vulnerable and a clear timetable put in place for flood prevention plans. Have you been affected by flooding in Dumfries and Galloway or the Borders? Tell us about your experience of the situation and how it was handled. Email us on selkirk.news@bbc.co.uk or dumfries@bbc.co.uk."
"summarize: A fire alarm went off at the Holiday Inn in Hope Street at about 04:20 BST on Saturday and guests were asked to leave the hotel. As they gathered outside they saw the two buses, parked side-by-side in the car park, engulfed by flames. One of the tour groups is from Germany, the other from China and Taiwan. It was their first night in Northern Ireland. The driver of one of the buses said many of the passengers had left personal belongings on board and these had been destroyed. Both groups have organised replacement coaches and will begin their tour of the north coast later than they had planned. Police have appealed for information about the attack. Insp David Gibson said: ""It appears as though the fire started under one of the buses before spreading to the second. ""While the exact cause is still under investigation, it is thought that the fire was started deliberately."""
"summarize: Ferrari appeared in a position to challenge until the final laps, when the Mercedes stretched their legs to go half a second clear of the red cars. Sebastian Vettel will start third ahead of team-mate Kimi Raikkonen. The world champion subsequently escaped punishment for reversing in the pit lane, which could have seen him stripped of pole. But stewards only handed Hamilton a reprimand, after governing body the FIA said ""no clear instruction was given on where he should park"". Belgian Stoffel Vandoorne out-qualified McLaren team-mate Jenson Button on his Formula 1 debut. Vandoorne was 12th and Button 14th, complaining of a handling imbalance on his final lap but admitting the newcomer ""did a good job and I didn't"". Mercedes were wary of Ferrari's pace before qualifying after Vettel and Raikkonen finished one-two in final practice, and their concerns appeared to be well founded as the red cars mixed it with the silver through most of qualifying. After the first runs, Rosberg was ahead, with Vettel and Raikkonen splitting him from Hamilton, who made a mistake at the final corner on his first lap. But Hamilton saved his best for last, fastest in every sector of his final attempt, to beat Rosberg by just 0.077secs after the German had out-paced him throughout practice and in the first qualifying session. Vettel rued a mistake at the final corner on his last lap, but the truth is that with the gap at 0.517secs to Hamilton there was nothing he could have done. The gap suggests Mercedes are favourites for the race, even if Ferrari can be expected to push them. Vettel said: ""Last year we were very strong in the race and I think we are in good shape for tomorrow. 
We will try to give them a hard time."" Vandoorne's preparations for his grand prix debut were far from ideal - he only found out he was racing on Thursday when FIA doctors declared Fernando Alonso unfit because of a broken rib sustained in his huge crash at the first race of the season in Australia two weeks ago. The Belgian rookie had to fly overnight from Japan, where he had been testing in the Super Formula car he races there, and arrived in Bahrain only hours before first practice on Friday. He also had a difficult final practice, missing all but the final quarter of the session because of a water leak. Button was quicker in the first qualifying session, but Vandoorne pipped him by 0.064secs when it mattered. The 24-year-old said: ""I knew after yesterday I had quite similar pace to Jenson and I knew if I improved a little bit I could maybe challenge him and even out-qualify him and that is what has happened. ""Jenson is a very good benchmark for me because he is a world champion and he is well known to the team so I am very satisfied with the qualifying."" Button, who was 0.5secs quicker than Vandoorne in the first session, complained of oversteer on his final run in the second: ""Q1 was what I was expecting. Q2 he did a good job and I didn't. Very, very good job. We knew how quick he was."" The controversial new elimination qualifying system was retained for this race despite teams voting at the first race in Australia to go back to the 2015 system. FIA president Jean Todt said earlier on Saturday that he ""felt it necessary to give new qualifying one more chance"", adding: ""We live in a world where there is too much over reaction."" The system worked on the basis of mixing up the grid a little - Force India's Sergio Perez ended up out of position in 18th place after the team miscalculated the timing of his final run, leaving him not enough time to complete it before the elimination clock timed him out. 
But it will come in for more criticism as a result of lack of track action at the end of each session. There were three minutes at the end of the first session with no cars on the circuit, and the end of the second session was a similar damp squib. Only one car - Nico Hulkenberg's Force India - was out on the track with six minutes to go. The two Williams cars did go out in the final three minutes but were already through to Q3 and so nothing was at stake. The teams are meeting with Todt and F1 commercial boss Bernie Ecclestone on Sunday at noon local time to decide on what to do with qualifying for the rest of the season. Todt said he was ""optimistic"" they would be able to reach unanimous agreement on a change. ""We should listen to the people watching on TV,"" Rosberg said. ""If they are still unhappy, which I am sure they will be, we should change it."" Red Bull's Daniel Ricciardo was fifth on the grid, ahead of the Williams cars of Valtteri Bottas and Felipe Massa and Force India's Nico Hulkenberg. Ricciardo's team-mate Daniil Kvyat was eliminated during the second session - way below the team's expectation - and the Renault of Brit Jolyon Palmer only managed 19th fastest. German Mercedes protege Pascal Wehrlein managed an excellent 16th in the Manor car. Bahrain GP qualifying results Bahrain GP coverage details"
"summarize: John Edward Bates, formerly of Spalding, Lincolnshire, but now living in London, faces a total of 22 charges, including two counts of indecency with a child. The 67-year-old is accused of committing the offences between March 1972 and October 1989. Mr Bates denies all the charges. Grace Hale, prosecuting, told the jury that the allegations of sexual abuse were made by made by four male complainants and related to when Mr Bates was a scout leader in South Lincolnshire and Cambridgeshire. ""The defendant says nothing of that sort happened between himself and all these individuals. He says they are all fabricating their accounts and telling lies,"" said Mrs Hale. The prosecutor claimed Mr Bates invited one 15 year old to his home offering him the chance to look at cine films made at scout camps but then showed him pornographic films. She told the jury that the boy was then sexually abused leaving him confused and frightened. Mrs Hale said: ""The complainant's recollection is that on a number of occasions sexual acts would happen with the defendant either in the defendant's car or in his cottage."" She told the jury a second boy was taken by Mr Bates for a weekend in London at the age of 13 or 14 and after visiting pubs he was later sexually abused. Mrs Hale said two boys from the Spalding group had also made complaints of being sexually abused. The jury has been told that Mr Bates was in the RAF before serving as a Lincolnshire Police officer between 1976 and 1983. The trial, which is expected to last two weeks, continues."
"summarize: Patients and staff were evacuated from Cerahpasa hospital on Wednesday after a man receiving treatment at the clinic threatened to shoot himself and others. Officers were deployed to negotiate with the man, a young police officer. Earlier reports that the armed man had taken several people hostage proved incorrect. The chief consultant of Cerahpasa hospital, Zekayi Kutlubay, who was evacuated from the facility, said that there had been ""no hostage crises"", adding that the man was ""alone in the room"". Dr Kutlubay said that the man had been receiving psychiatric treatment for the past two years. He said that the hospital had previously submitted a report stating that the man should not be permitted to carry a gun. ""His firearm was taken away,"" Dr Kutlubay said, adding that the gun in the officer's possession on Wednesday was not his issued firearm. The incident comes amid tension in Istanbul following several attacks in crowded areas, including the deadly assault on the Reina nightclub on New Year's Eve which left 39 people dead."
"summarize: Simone Favaro got the crucial try with the last move of the game, following earlier touchdowns by Chris Fusaro, Zander Fagerson and Junior Bulumakau. Rynard Landman and Ashton Hewitt got a try in either half for the Dragons. Glasgow showed far superior strength in depth as they took control of a messy match in the second period. Home coach Gregor Townsend gave a debut to powerhouse Fijian-born Wallaby wing Taqele Naiyaravoro, and centre Alex Dunbar returned from long-term injury, while the Dragons gave first starts of the season to wing Aled Brew and hooker Elliot Dee. Glasgow lost hooker Pat McArthur to an early shoulder injury but took advantage of their first pressure when Rory Clegg slotted over a penalty on 12 minutes. It took 24 minutes for a disjointed game to produce a try as Sarel Pretorius sniped from close range and Landman forced his way over for Jason Tovey to convert - although it was the lock's last contribution as he departed with a chest injury shortly afterwards. Glasgow struck back when Fusaro drove over from a rolling maul on 35 minutes for Clegg to convert. But the Dragons levelled at 10-10 before half-time when Naiyaravoro was yellow-carded for an aerial tackle on Brew and Tovey slotted the easy goal. The visitors could not make the most of their one-man advantage after the break as their error count cost them dearly. It was Glasgow's bench experience that showed when Mike Blair's break led to a short-range score from teenage prop Fagerson, converted by Clegg. Debutant Favaro was the second home player to be sin-binned, on 63 minutes, but again the Warriors made light of it as replacement wing Bulumakau, a recruit from the Army, pounced to deftly hack through a bouncing ball for an opportunist try. The Dragons got back within striking range with some excellent combined handling putting Hewitt over unopposed after 72 minutes. 
However, Favaro became sinner-turned-saint as he got on the end of another effective rolling maul to earn his side the extra point with the last move of the game, Clegg converting. Dragons director of rugby Lyn Jones said: ""We're disappointed to have lost but our performance was a lot better [than against Leinster] and the game could have gone either way. ""Unfortunately too many errors behind the scrum cost us a great deal, though from where we were a fortnight ago in Dublin our workrate and desire was excellent. ""It was simply error count from individuals behind the scrum that cost us field position, it's not rocket science - they were correct in how they played and we had a few errors, that was the difference."" Glasgow Warriors: Rory Hughes, Taqele Naiyaravoro, Alex Dunbar, Fraser Lyle, Lee Jones, Rory Clegg, Grayson Hart; Alex Allan, Pat MacArthur, Zander Fagerson, Rob Harley (capt), Scott Cummings, Hugh Blake, Chris Fusaro, Adam Ashe. Replacements: Fergus Scott, Jerry Yanuyanutawa, Mike Cusack, Greg Peterson, Simone Favaro, Mike Blair, Gregor Hunter, Junior Bulumakau. Dragons: Carl Meyer, Ashton Hewitt, Ross Wardle, Adam Warren, Aled Brew, Jason Tovey, Sarel Pretorius; Boris Stankovich, Elliot Dee, Brok Harris, Nick Crosswell, Rynard Landman (capt), Lewis Evans, Nic Cudd, Ed Jackson. Replacements: Rhys Buckley, Phil Price, Shaun Knight, Matthew Screech, Ollie Griffiths, Luc Jones, Charlie Davies, Nick Scott."
"summarize: Veronica Vanessa Chango-Alverez, 31, was killed and another man injured when an Audi A3 struck them in Streatham High Road at 05:30 GMT on Saturday. Ten minutes before the crash the car was in London Road, Croydon, when a Volkswagen Passat collided with a tree. Police want to trace Nathan Davis, 27, who they say has links to the Audi. The car was abandoned at the scene. Ms Chango-Alverez died from multiple injuries, a post-mortem examination found. No arrests have been made as yet, police said. Ms Chango-Alverez was staying at her mother's home in Streatham High Road. She was born in Ecuador and had lived in London for 13 years, BBC London reporter Gareth Furby said. At the time of the crash, she was on her way to work in a hotel. The remains of the bus stop, which was extensively damaged in the crash, have been removed. Flowers have been left at the site in tribute to the victim. A statement from her brother Kevin Raul Chango-Alverez said: ""My family has had its heart torn out, at this Christmas time, we will never be the same again. ""On Friday night we were together as a family with Veronica meeting her newly born nephew and preparing for Christmas. ""I last saw her alive as she left to go to work on Saturday morning, but moments later I was holding her hand as she passed away in the street."" Describing the crash as ""horrific"" Det Insp Gordon Wallace, said: ""The family are devastated. The memory of this senseless death will be with them each time they leave their home. ""The driver fled the scene abandoning the grey Audi, which was extensively damaged. ""We are looking to speak to Mr Nathan Davis in relation to this collision."" The 51-year-old man injured at the bus stop remains in a critical condition in hospital while the condition of the 29-year-old driver of the Volkswagen is now stable."
"summarize: Belgian cyclist Demoitie died after a collision with a motorbike during Belgium's Gent-Wevelgem race. The 25-year-old was hit by the motorbike after several riders came down in a crash as the race passed through northern France. ""The main issues come when cars or motorbikes have to pass the peloton and pass riders,"" Team Sky's Rowe said. ""That is the fundamental issue we're looking into. ""There's a lot of motorbikes in and around the race whether it be cameras for TV, photographers or police motorbikes. ""In total there's around 50 motorbikes that work on each race. ""We've got a riders union and we're coming together to think of a few ideas, whether we cap a speed limit on how fast they can overtake us. ""Say we put a 10 kilometres per hour limit on it, if we're going 50kph they're only allowed to pass us 60kph or something like that."" Demoitie, who was riding for the Wanty-Gobert team, was taken to hospital in Lille but died later. The sport's governing body, the UCI, said it would co-operate with all relevant authorities in an investigation into the incident. The Professional Cyclists' Association (CPA) issued a statement asking what would be done to improve safety. Despite Demoitie's death, attitudes to road racing will stay the same says Rowe, who has been competing in Three Days of De Panne race in Belgium. ""As soon as that element of fear slips into your mind and you start thinking of things that could happen, that's when you're doomed to fail,"" he told BBC Wales Sport. ""If you start thinking about crashes and the consequences and what could potentially happen then you're never going to be at the front of the peloton and you're never going to win any races."" In a separate incident, another Belgian cyclist, Daan Myngheer, 22, died in hospital after suffering a heart attack during the first stage of the Criterium International in Corsica."
"summarize: Gundogan, 26, told BBC Sport he ""can see the finishing line"" after tearing cruciate knee ligaments in December, but will not rush his return. The German missed the 2014 World Cup following back surgery that kept him out for a year, and sat out Euro 2016 because of a dislocated kneecap. He said: ""It is heavy mentally to accept that."" Gundogan will not be fit for the start of the Premier League season at Brighton on 12 August but said his recovery time is now being measured in ""weeks"" rather than months. He told BBC Sport: ""It is really hard always to fall and fight your way back. You feel good and feel ready, then you get the next kick. ""The worst part is behind me now. I want to feel ready when I am fully back. I want to feel safe and confident. I don't mind if it is two weeks or six."" Gundogan made 15 appearances and scored five goals in his debut season for City following his £20m move from Borussia Dortmund. He is eager to get on the field again and was impressed at the club's 4-1 win over Real Madrid in a pre-season game in Los Angeles on Wednesday. Manager Pep Guardiola has made five new signings already this summer and continues to have an interest in Arsenal forward Alexis Sanchez and Monaco's Kylian Mbappe. Gundogan said: ""Optimism for the season is big. It is huge, definitely. ""We felt that last year as well but it was a completely new experience for all of us. We know the Premier League a bit more now and can't wait for the season to start."" City complete their three-match tour of the United States against Tottenham in Nashville on Saturday. Chelsea manager Antonio Conte said earlier this week he did not feel Tottenham were judged by the same standards as his own side, City and Manchester United. Spurs have had the advantage in their recent meetings with City, winning three and drawing one of their last four Premier League games. And Gundogan thinks they are a major threat. He said: ""Tottenham are a great team. 
They have the style of football. They have young English players. Our experience last season shows it is really tough to beat them. ""They are really uncomfortable to play against. ""I am pretty sure, even if they will not say it loud, the people who know the Premier League know Tottenham are definitely a competitor for the title."""
"summarize: The crash happened about 07:20 GMT at the junction of the A127 and Progress Road in Leigh-on-Sea, Essex. The man, who police said is aged in his 20s, was treated at the scene for a head injury and suspected multiple fractures, the ambulance service said. He was airlifted to the Royal London Hospital for further treatment. The Southend-bound carriageway of the A127 was closed for about six hours while police conducted their initial inquiries. A spokeswoman for Essex Police said it was not possible comment to further as this time as the ""investigation is now being conducted by the IPCC""."


In [0]:
# Tokenize the input
inputs = tokenizer(
    articles, max_length=1024, return_tensors="pt", padding=True, truncation=True
)
print("input_ids:")
print(inputs["input_ids"])
print("attention_mask:")
print(inputs["attention_mask"])

input_ids:
tensor([[21603,    10,    37,  ...,     0,     0,     0],
        [21603,    10,    71,  ...,     0,     0,     0],
        [21603,    10, 21945,  ..., 18002,    21,     1],
        ...,
        [21603,    10, 21768,  ...,     0,     0,     0],
        [21603,    10,  9982,  ...,     0,     0,     0],
        [21603,    10,    37,  ...,     0,     0,     0]])
attention_mask:
tensor([[1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 1, 1, 1],
        ...,
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0]])
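
The trailing zeros in `input_ids` and `attention_mask` above come from padding: shorter articles are filled out to the length of the longest one in the batch, and the mask marks which positions hold real tokens. The sketch below imitates that behavior in plain Python. `pad_batch` is a hypothetical helper and the token ids are invented (only the padding id 0 and the shared leading ids are taken from the output above); the real tokenizer's truncation is more careful than the simple slice used here.

```python
def pad_batch(token_id_lists, pad_id=0, max_length=None):
    """Truncate each sequence to max_length (if given), then right-pad all
    sequences with pad_id to the length of the longest one.

    Returns (input_ids, attention_mask), where the mask is 1 for real tokens
    and 0 for padding.
    """
    if max_length is not None:
        token_id_lists = [ids[:max_length] for ids in token_id_lists]
    longest = max(len(ids) for ids in token_id_lists)
    input_ids = [ids + [pad_id] * (longest - len(ids)) for ids in token_id_lists]
    attention_mask = [
        [1] * len(ids) + [0] * (longest - len(ids)) for ids in token_id_lists
    ]
    return input_ids, attention_mask


# Hypothetical batch of two already-tokenized prompts of different lengths.
toy_batch = [
    [21603, 10, 37, 1],
    [21603, 10, 71, 423, 583, 1],
]

toy_input_ids, toy_attention_mask = pad_batch(toy_batch)
print(toy_input_ids)
print(toy_attention_mask)
```

Passing the attention mask to the model, as the next cell does, tells it to ignore the padded positions.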


In [0]:
# Generate summaries
summary_ids = model.generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,
    num_beams=2,
    min_length=0,
    max_length=40,
)
print(summary_ids)

tensor([[    0,     8,   423,   583,    13,  1783,    16, 20126, 16496,    19,
           341,   271, 14841,     3,     5,   186,  7540,    16,   158,    15,
          2296,     7,  5718,  2367, 14621,  4161,    57,  4125,   387,     3,
             5,     3,     9,  8347,  5685,  3048,    16,   286,   640,     8],
        [    0,  1472,  6196,   877,   326,    44,     8,  9108,    86,    29,
            16,  6000,  1887,    30,  1856,     3,     5,  2554,   130,  1380,
            12,  1175,     8,  1595,     3,     5,    80,    13,     8,   192,
         14264,    19,    45, 13692,    63,     6,     8,   119,    45, 20576],
        [    0,     3,   849,  2239,     7,   163, 14014,     3,    60,  8234,
           232,   227,     3, 19585,   643,   845,   150,  8033,    47,   787,
            30,   213,     3,    88,   225,  2447,     3,     5,     3,   849,
          2239,     7,   497,     3,    31,    29,    32,   964,  8033,    47],
        [    0,     8,     3,  3708,    18,  1201
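
Setting `num_beams=2` asks `generate` to use beam search: at every step it keeps the two highest-scoring partial sequences rather than only the single most likely next token (greedy search). The toy sketch below shows how the two strategies can disagree. The probability table `NEXT_TOKEN_PROBS` and the function `beam_search` are invented for illustration; real decoding also handles end-of-sequence tokens, length limits, and length penalties, none of which are modeled here.

```python
import math

# Hypothetical next-token probabilities, keyed by the tokens generated so far.
NEXT_TOKEN_PROBS = {
    (): {"the": 0.6, "a": 0.4},
    ("the",): {"cat": 0.55, "dog": 0.45},
    ("a",): {"cat": 0.9, "dog": 0.1},
}


def beam_search(num_beams, steps=2):
    """Return the best token sequence found when keeping `num_beams` candidates."""
    beams = [((), 0.0)]  # (tokens so far, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for tokens, score in beams:
            for token, prob in NEXT_TOKEN_PROBS[tokens].items():
                candidates.append((tokens + (token,), score + math.log(prob)))
        # Keep only the highest-scoring partial sequences.
        candidates.sort(key=lambda candidate: candidate[1], reverse=True)
        beams = candidates[:num_beams]
    return beams[0][0]


greedy = beam_search(num_beams=1)  # one beam is the same as greedy search
beam = beam_search(num_beams=2)
print("greedy:", " ".join(greedy))
print("beam:  ", " ".join(beam))
```

Greedy search commits to "the" because it is the most likely first token, and ends with a sequence of probability 0.6 × 0.55 = 0.33. With two beams, "a" survives the first step and leads to "a cat" with probability 0.4 × 0.9 = 0.36, so the wider search finds a more likely sequence overall.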

In [0]:
# Decode the generated summaries
decoded_summaries = tokenizer.batch_decode(summary_ids, skip_special_tokens=True)
display(pd.DataFrame(decoded_summaries, columns=["decoded_summaries"]))
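
Decoding maps each token ID back to a piece of text and joins the pieces; `skip_special_tokens=True` drops IDs such as padding and end-of-sequence before joining. A rough sketch of that idea follows, assuming a SentencePiece-style word-boundary marker. The vocabulary `TOY_VOCAB` and the function `toy_decode` are hypothetical stand-ins and do not reproduce the real tokenizer's behavior in detail.

```python
WORD_BOUNDARY = "\u2581"  # SentencePiece-style marker: "a new word starts here"
SPECIAL_IDS = {0, 1}  # assumed pad and end-of-sequence IDs

# Hypothetical toy vocabulary mapping IDs to sub-word pieces.
TOY_VOCAB = {
    0: "<pad>",
    1: "</s>",
    2: WORD_BOUNDARY + "the",
    3: WORD_BOUNDARY + "full",
    4: WORD_BOUNDARY + "cost",
    5: "s",
}


def toy_decode(ids, skip_special_tokens=True):
    """Join sub-word pieces into text, optionally dropping special tokens."""
    pieces = [
        TOY_VOCAB[token_id]
        for token_id in ids
        if not (skip_special_tokens and token_id in SPECIAL_IDS)
    ]
    # Pieces without the marker attach to the previous piece ("cost" + "s").
    return "".join(pieces).replace(WORD_BOUNDARY, " ").strip()


toy_ids = [0, 2, 3, 4, 5, 1, 0]
clean_text = toy_decode(toy_ids)
raw_text = toy_decode(toy_ids, skip_special_tokens=False)
print(clean_text)
print(raw_text)
```

Without `skip_special_tokens`, the markers for padding and end-of-sequence would appear in the decoded text, which is rarely what you want when showing summaries to a reader.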

### Model-specific tokenizer and model loaders

You can also load a specific tokenizer or model class directly, rather than relying on the `Auto*` classes to choose the right one for you.

API docs:
* [T5Tokenizer](https://huggingface.co/docs/transformers/main/en/model_doc/t5#transformers.T5Tokenizer)
* [T5ForConditionalGeneration](https://huggingface.co/docs/transformers/main/en/model_doc/t5#transformers.T5ForConditionalGeneration)
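
Conceptually, an `Auto*` class inspects the checkpoint's configuration and dispatches to the matching concrete class, whereas a model-specific class names that class up front. The sketch below illustrates only the dispatch idea. The registry `MODEL_REGISTRY`, the function `auto_load`, and the stand-in classes are hypothetical and are not the `transformers` internals.

```python
class ToyT5Model:
    """Stand-in for a concrete, model-specific class."""

    def __init__(self, name):
        self.name = name


class ToyBartModel:
    """Stand-in for a second concrete class."""

    def __init__(self, name):
        self.name = name


# Hypothetical registry from a config's model type to the class that handles it.
MODEL_REGISTRY = {"t5": ToyT5Model, "bart": ToyBartModel}


def auto_load(config):
    """Pick the concrete class from the config, as an Auto-style loader would."""
    model_type = config["model_type"]
    if model_type not in MODEL_REGISTRY:
        raise ValueError(f"Unrecognized model type: {model_type!r}")
    return MODEL_REGISTRY[model_type](config["name"])


# Auto-style loading: the class is chosen from the config.
auto_model = auto_load({"name": "t5-small", "model_type": "t5"})
# Model-specific loading: the class is named explicitly.
direct_model = ToyT5Model("t5-small")
print(type(auto_model).__name__, type(direct_model).__name__)
```

Naming the class explicitly makes the code's expectations visible and fails early if the checkpoint is of a different type, at the cost of being less generic than the `Auto*` route.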

In [0]:
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained(
    "t5-small", cache_dir="/dbfs/mnt/dbacademy-datasets/large-language-models/v01"
)
model = T5ForConditionalGeneration.from_pretrained(
    "t5-small", cache_dir="/dbfs/mnt/dbacademy-datasets/large-language-models/v01"
)

In [0]:
# The tokenizer and model can then be used similarly to how we used the ones loaded by the Auto* classes.
inputs = tokenizer(
    articles, max_length=1024, return_tensors="pt", padding=True, truncation=True
)
summary_ids = model.generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,
    num_beams=2,
    min_length=0,
    max_length=40,
)
decoded_summaries = tokenizer.batch_decode(summary_ids, skip_special_tokens=True)

display(pd.DataFrame(decoded_summaries, columns=["decoded_summaries"]))

decoded_summaries
the full cost of damage in Newton Stewart is still being assessed. many roads in peeblesshire remain badly affected by standing water. a flood alert remains in place across the
"fire alarm went off at the Holiday Inn in Hope Street on Saturday. guests were asked to leave the hotel. one of the two buses is from germany, the other from china"
stewards only handed reprimand after governing body says no instruction was given on where he should park. stewards say 'no clear instruction was
"the 67-year-old is accused of committing the offences between March 1972 and October 1989. he denies all the charges, including two counts of indecency"
a man receiving treatment at the clinic threatened to shoot himself and others. a young police officer was evacuated from the hospital. the incident comes amid tension in Istanbul following several attacks
Gregor Townsend gave a debut to powerhouse wing Taqele Naiyaravoro. the dragons gave first starts of the season to wing a
"Veronica Vanessa Chango-Alverez, 31, was killed and another man injured in the crash. police want to trace Nathan Davis, 27, who has links to the Audi."
the 25-year-old was hit by a motorbike during the Gent-Wevelgem race. the race passed through northern france. the sport's governing
gundogan says he can see the finishing line after tearing cruciate knee ligaments in December. the german missed the 2014 world cup after back surgery that kept him out for
"the crash happened about 07:20 GMT at the junction of the A127 and Progress Road in leigh-on-Sea, Essex. the man, aged in his 20s"


## Summary

We've covered some common LLM applications and seen how to get started with them quickly using pre-trained models from the Hugging Face Hub. We've also seen how to tweak some of their configurations.

But how did we find those models for our tasks? In the lab, you will use the Hugging Face Hub to find new pre-trained models for tasks. You will also explore tweaking model configurations to gain intuition about their effects.

&copy; 2023 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="https://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="https://help.databricks.com/">Support</a>