# Summary test with BART

This notebook documents the procedure we followed:
- Web scraping -> Newspaper3k
- Summary generation -> BART

In [3]:
from newspaper import Article

In [2]:
cnn_microsoft = 'https://edition.cnn.com/2024/01/12/business/apple-microsoft-most-valuable-publicly-traded-company/index.html'
fox = 'http://fox13now.com/2013/12/30/new-year-new-laws-obamacare-pot-guns-and-drones/'
sky_sports = 'https://www.skysports.com/football/news/11095/13055829/jurgen-klopp-clarifies-mohamed-salahs-liverpool-return-hes-the-most-loyal-egyptian-ive-met' 

In [13]:
def create_article(url):
    article = Article(url)
    article.download()
    article.parse()
    return article

In [14]:
article = create_article(sky_sports)

In [15]:
sinner_dj = 'https://edition.cnn.com/2024/01/26/sport/jannik-sinner-novak-djokovic-australian-open-spt-intl/index.html'
article_2 = create_article(sinner_dj)

In [7]:
from transformers import AutoTokenizer, BartForConditionalGeneration
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")

In [16]:
ARTICLE_TO_SUMMARIZE = (
    article_2.text
)
inputs = tokenizer([ARTICLE_TO_SUMMARIZE], max_length=1024, return_tensors='pt', truncation=True)
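
Because `truncation=True` drops everything past `max_length=1024` tokens, a long article is summarized from its opening portion only. One possible workaround, sketched below, is to split the text into chunks and summarize each one separately. The helper name `split_into_chunks` and the 700-word budget are assumptions for illustration, and word counts are only a rough proxy for BART token counts.

```python
def split_into_chunks(text, max_words=700):
    """Split text into consecutive chunks of at most max_words words.

    Hypothetical helper: words are a rough stand-in for tokens, so the
    budget should stay well below the model's 1024-token limit.
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]


sample = "word " * 1500  # stand-in for a long article body
chunks = split_into_chunks(sample, max_words=700)
print([len(chunk.split()) for chunk in chunks])  # [700, 700, 100]
```

Each chunk could then be passed through the tokenizer and `model.generate` in turn, at the cost of losing context that spans chunk boundaries.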

In [18]:
# Generate Summary
summary_ids = model.generate(inputs["input_ids"], num_beams=2, min_length=100, max_length=200)
tokenizer.batch_decode(summary_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]

"Novak Djokovic beaten by Jannik Sinner in Australian Open semifinal. Djakovic loses for first time at Melbourne Park in 2,195 days. Sinner, 22, will now contest first grand slam final of his career in Sunday's showpiece. Djokovich's bid for an outright record 25th grand slam title is put on hold after defeat.. Sner is the youngest male finalist at the Australian Open since 2008. He will face either Daniil Medvedev or Alexander Zverev in the final."

### Three different summaries

In [24]:
summaries = []
for i in range(3):
    # Vary the number of beams, temperature, and top_k for each summary
    summary_ids = model.generate(inputs["input_ids"],
                                 num_beams=4 + i,  # increasing number of beams (4, 5, 6)
                                 temperature=1.0 + (0.3 * i),  # increasing temperature (1.0, 1.3, 1.6)
                                 top_k=50 + (20 * i),  # increasing top_k (50, 70, 90)
                                 min_length=100,
                                 max_length=200,
                                 do_sample=True,  # sample tokens, so results differ between runs
                                 early_stopping=True)

    summary = tokenizer.batch_decode(summary_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
    summaries.append(summary)
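
To make the schedule in the loop above explicit, the following sketch lists the values each iteration passes to `model.generate`. The helper `sampling_schedule` is hypothetical and only reproduces the arithmetic; it does not call the model.

```python
def sampling_schedule(n_summaries=3):
    """Return the per-iteration generation settings used in the loop.

    Hypothetical helper mirroring the arithmetic 4 + i, 1.0 + 0.3 * i
    and 50 + 20 * i for i = 0 .. n_summaries - 1.
    """
    return [
        {
            "num_beams": 4 + i,
            "temperature": round(1.0 + 0.3 * i, 2),  # round away float noise
            "top_k": 50 + 20 * i,
        }
        for i in range(n_summaries)
    ]


for settings in sampling_schedule():
    print(settings)
```

Each dictionary could be unpacked into the call, for example `model.generate(inputs["input_ids"], do_sample=True, **settings)`, which keeps the varying parameters in one place.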

In [25]:
print(summaries[0])
print()
print(summaries[1])
print()
print(summaries[2])

Novak Djokovic beaten by Jannik Sinner in Australian Open semifinal. Sinner, 22, is youngest man to reach Australian Open final since 2008. Italian will face either Daniil Medvedev or Alexander Zverev in Sunday's showpiece. Djok Serbian has not lost at Australian Open in 2,195 days, 33 matches ago. Back to Mail Online home. back to the page you came from to see the full match. Click here to read the match report.

Novak Djokovic beaten by Jannik Sinner in semifinal of Australian Open. Sinner, 22, is youngest male finalist at Melbourne Park since 2008. The Italian will now face either Daniil Medvedev or Alexander Zverev in Sunday's final. It is the first time in 33 matchesDjokovic has lost at the Australian Open in Melbourne. The Serb is bidding for his 11th title and a record 25th grand slam title in all.

Novak Djokovic beaten by Italy's Jannik Sinner in Australian Open semis. The 22-year-old Sinner is youngest male finalist at Melbourne Park in 10 years. Sinner will now face either D
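
Since the goal is three different summaries, it can help to quantify how much they overlap. A minimal sketch, assuming word-level Jaccard similarity is an acceptable proxy for similarity; the function name `jaccard_similarity` and the short example strings are illustrative and are not taken from the run above.

```python
import re
from itertools import combinations


def jaccard_similarity(text_a, text_b):
    """Word-level Jaccard similarity: shared words divided by all distinct words."""
    words_a = set(re.findall(r"\w+", text_a.lower()))
    words_b = set(re.findall(r"\w+", text_b.lower()))
    if not words_a and not words_b:
        return 1.0  # two empty texts are treated as identical
    return len(words_a & words_b) / len(words_a | words_b)


# Illustrative strings standing in for generated summaries
examples = [
    "Sinner beats Djokovic in the semifinal",
    "Djokovic loses to Sinner in the semifinal",
    "Apple launches a new headset",
]
for (i, a), (j, b) in combinations(enumerate(examples), 2):
    print(f"summary {i} vs {j}: {jaccard_similarity(a, b):.2f}")
```

Values near 1 suggest the sampling settings produced near-duplicates; values near 0 suggest the summaries share little wording.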

### Another approach: beam search without sampling

In [26]:
summaries = []
for i in range(3):
    summary_ids = model.generate(inputs["input_ids"], 
                                 num_beams=4 + i,  # Varying number of beams
                                 no_repeat_ngram_size=2 + i,  # Preventing repeating n-grams
                                 length_penalty=1.0 + (0.1 * i),  # Adjusting length penalty
                                 min_length=100, 
                                 max_length=200, 
                                 early_stopping=True)

    summary = tokenizer.batch_decode(summary_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
    summaries.append(summary)

In [27]:
print(summaries[0])
print()
print(summaries[1])
print()
print(summaries[2])

Novak Djokovic beaten by Jannik Sinner in Australian Open semifinal. Djakovic loses for first time at Melbourne Park in 2,195 days. Italian will now face either Daniil Medvedev or Alexander Zverev in Sunday's final. The 22-year-old is the youngest male finalist at the tournament since 2008. He has now beaten the Serb in three of their last four matches dating back to November. It is his first grand slam final of his career.

Novak Djokovic beaten by Jannik Sinner in Australian Open semifinal. Sinner, 22, is the youngest male finalist at Australian Open since 2008. He will face either Daniil Medvedev or Alexander Zverev in Sunday's showpiece. Djkovic's bid for an outright record 25th grand slam title is put on hold after he was outplayed by the Italian across their three hour, 22-minute contest. The Serb was broken at 2-1 in the fourth set having held a 40-0 lead.

Novak Djokovic beaten by Jannik Sinner 6-1 6-2 6-7(6-8) 6-3 in Australian Open semifinal. The 22-year-old Italian will now 

In [11]:
summary_dicc = {
    'Summary 1': '',
    'Summary 2': '',
    'Summary 3': ''
}

def create_article(url):
    # Download and parse the article at the given URL
    article = Article(url)
    article.download()
    article.parse()
    return article

def summary_generator(selected_news):
    # summary_dicc is only mutated in place, so no `global` declaration is needed
    article = create_article(selected_news)
    ARTICLE_TO_SUMMARIZE = article.text
    inputs = tokenizer([ARTICLE_TO_SUMMARIZE], max_length=1024, return_tensors='pt', truncation=True)
    summaries_ = []
    for i in range(3):
        summary_ids = model.generate(inputs["input_ids"],
                                     num_beams=4 + i,  # increasing number of beams
                                     temperature=1.0 + (0.3 * i),  # increasing temperature
                                     top_k=50 + (20 * i),  # increasing top_k
                                     min_length=100,
                                     max_length=200,
                                     do_sample=True,
                                     early_stopping=True)

        summary = tokenizer.batch_decode(summary_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
        summaries_.append(summary)

    # Store each summary under its 'Summary N' key
    for i, key in enumerate(summary_dicc.keys()):
        summary_dicc[key] = summaries_[i]

    return summaries_[0], summaries_[1], summaries_[2]
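
The function above writes its results into the module-level `summary_dicc`. An alternative, sketched here under the assumption that callers only need the labelled summaries, is to build and return a fresh dictionary. `label_summaries` is a hypothetical helper, and the placeholder strings stand in for model output.

```python
def label_summaries(summaries):
    """Map a sequence of summaries to keys 'Summary 1', 'Summary 2', ..."""
    return {
        f"Summary {index}": summary
        for index, summary in enumerate(summaries, start=1)
    }


# Placeholder strings; in the notebook these would come from summary_generator
labelled = label_summaries(["first", "second", "third"])
print(labelled["Summary 2"])  # second
```

Returning a new dictionary avoids shared state between calls, so summaries from one article cannot linger when the next article is processed.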

In [12]:
x = summary_generator('https://edition.cnn.com/2024/02/02/tech/apple-vision-pro-what-you-need-to-know/index.html')

In [13]:
x

('The Vision Pro, Apple’s first new product in seven years, officially launched in stores on Friday in the US. Retail stores are offering private one-on-one demos on a first come, first served basis. The headset will have 256 GB of storage, and prescription lens inserts for the device will be available starting at $149. Once you factor in additional accessories, like a $200 travel case and $50 battery pack holder and more, it can cost up to $4,600.',
 "The $3,499 Vision Pro is Apple's first major release since the Apple Watch nine years ago. Retail stores are offering private one-on-one demos on a first come, first served basis. The headset will have 256 GB of storage, and prescription lens inserts for the device will be available starting at $149. Once you factor in additional accessories, like a $200 travel case and $50 battery pack holder and more, it can cost up to $4,600, The New York Times reported.",
 "Apple's new mixed reality headset went on sale at Apple Stores across the cou