In [1]:
import os
import sys
import requests
import pandas as pd
import re
import tensorflow as tf
print('Python Version: ' + sys.version)
print('TensorFlow Version: ' + tf.__version__)

Python Version: 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:08:53) 
[GCC 9.4.0]
TensorFlow Version: 2.6.4


## Summarizing Documents With *ktrain*

Original Notebook: https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/examples/text/text_summarization_with_bart.ipynb  

*ktrain* includes the ability to summarize text based on a pretrained [BART](https://arxiv.org/abs/1910.13461) model from the `transformers` library.

To perform summarization, first create a `TransformerSummarizer` instance as follows. (Note that this feature requires PyTorch to be installed on your system.)

In [2]:
# !pip install ktrain --upgrade

In [3]:
import ktrain
from ktrain import text as ktrain_text  # aliased so the `text` variables below do not shadow the module

ts = ktrain_text.TransformerSummarizer()

print('Ktrain Version: ' + ktrain.__version__)

Ktrain Version: 0.31.10


#### Copy files to local FS from GCP bucket

In [4]:
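def get_gcs_data(bucket_name, folder_name, file_name, path_local):
    """Download one object from a public GCS bucket into path_local."""
    url = 'https://storage.googleapis.com/' + bucket_name + '/' + folder_name + '/' + file_name
    r = requests.get(url)
    r.raise_for_status()  # fail loudly instead of saving an error page to disk
    # Write inside a context manager so the file handle is closed promptly
    with open(os.path.join(path_local, file_name), 'wb') as f:
        f.write(r.content)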
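
The download helper builds the object URL by plain string concatenation, which works for the simple file names used here. As a minimal sketch, assuming a publicly readable bucket, the URL could instead be assembled with percent-encoding so that names containing spaces or other special characters still resolve. The function name `gcs_public_url` is hypothetical and not part of the original notebook.

```python
from urllib.parse import quote


def gcs_public_url(bucket_name, *path_parts):
    """Build the HTTPS URL for an object in a publicly readable GCS bucket."""
    # Percent-encode each path segment separately so '/' inside a name is escaped too.
    object_path = '/'.join(quote(part, safe='') for part in path_parts)
    return 'https://storage.googleapis.com/' + bucket_name + '/' + object_path


# Same bucket, folder, and file as used in this notebook.
print(gcs_public_url('msca-bdp-data-open', 'books', '3boat10.txt'))
```

For the names used in this notebook the result is the same URL that the concatenation produces, so the helper only matters once file names become less regular.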

In [5]:
bucket_name = 'msca-bdp-data-open'
folder_name = 'books'
file_name = ['3boat10.txt']
path_local = '/home/jupyter/data/books'

os.makedirs(path_local, exist_ok=True)

for file in file_name:
    get_gcs_data(bucket_name=bucket_name,
                 folder_name=folder_name,
                 file_name=file,
                 path_local=path_local)
    print('Downloaded: ' + file)

Downloaded: 3boat10.txt


In [6]:
bucket_name = 'msca-bdp-data-open'
folder_name = 'news'
file_name = ['news_toyota.json']
path_local = '/home/jupyter/data/news'

os.makedirs(path_local, exist_ok=True)

for file in file_name:
    get_gcs_data(bucket_name=bucket_name,
                 folder_name=folder_name,
                 file_name=file,
                 path_local=path_local)
    print('Downloaded: ' + file)

Downloaded: news_toyota.json


In [7]:
text = '''

The University of Chicago is an urban research university that has driven new ways of thinking since 1890. Our commitment to free and open inquiry draws inspired scholars to our global campuses, where ideas are born that challenge and change the world.

We empower individuals to challenge conventional thinking in pursuit of original ideas. Students in the College develop critical, analytic, and writing skills in our rigorous, interdisciplinary core curriculum. Through graduate programs, students test their ideas with UChicago scholars, and become the next generation of leaders in academia, industry, nonprofits, and government.

UChicago research has led to such breakthroughs as discovering the link between cancer and genetics, establishing revolutionary theories of economics, and developing tools to produce reliably excellent urban schooling. We generate new insights for the benefit of present and future generations with our national and affiliated laboratories: Argonne National Laboratory, Fermi National Accelerator Laboratory, and the Marine Biological Laboratory in Woods Hole, Massachusetts.

The University of Chicago is enriched by the city we call home. In partnership with our neighbors, we invest in Chicago's mid-South Side across such areas as health, education, economic growth, and the arts. Together with our medical center, we are the largest private employer on the South Side.

In all we do, we are driven to dig deeper, push further, and ask bigger questions—and to leverage our knowledge to enrich all human life. Our diverse and creative students and alumni drive innovation, lead international conversations, and make masterpieces. Alumni and faculty, lecturers and postdocs go on to become Nobel laureates, CEOs, university presidents, attorneys general, literary giants, and astronauts. 
'''

print(text)



The University of Chicago is an urban research university that has driven new ways of thinking since 1890. Our commitment to free and open inquiry draws inspired scholars to our global campuses, where ideas are born that challenge and change the world.

We empower individuals to challenge conventional thinking in pursuit of original ideas. Students in the College develop critical, analytic, and writing skills in our rigorous, interdisciplinary core curriculum. Through graduate programs, students test their ideas with UChicago scholars, and become the next generation of leaders in academia, industry, nonprofits, and government.

UChicago research has led to such breakthroughs as discovering the link between cancer and genetics, establishing revolutionary theories of economics, and developing tools to produce reliably excellent urban schooling. We generate new insights for the benefit of present and future generations with our national and affiliated laboratories: Argonne National Labo

Now, let's use our `TransformerSummarizer` instance to summarize the long document.

In [8]:
ts.summarize(text)

'The University of Chicago is an urban research university that has driven new ways of thinking since 1890. UChicago research has led to such breakthroughs as discovering the link between cancer and genetics. Alumni and faculty, lecturers and postdocs go on to become Nobel laureates, CEOs, university presidents, attorneys general, literary giants, and astronauts.'

### Summarizing a book with *ktrain*

In [9]:
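directory = '/home/jupyter/data/books/'
book = '3boat10.txt'
# Read the whole book; the context manager closes the file afterwards
with open(directory + book) as f:
    textRaw = f.read()
# Replace newlines with spaces so the text is one continuous string
text = re.sub(r'\n', ' ', textRaw)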

In [10]:
%time ts.summarize(text)

CPU times: user 2.56 s, sys: 154 ms, total: 2.72 s
Wall time: 2.71 s


"Jerome K. Harris was a victim to one hundred and seven deadly diseases. He was diagnosed with typhoid fever, diphtheria, Bright's disease, Cholera, and housemaid's knee. He also suffered from zymosis, which he thought he had been born with. He wrote a book about his experiences."

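The run above finishes in under three seconds and the summary covers only the opening pages, which suggests that only the beginning of the book reached the model. BART checkpoints such as `facebook/bart-large-cnn` accept at most 1,024 input tokens, so longer input is presumably truncated. A rough sketch of one workaround is to split the text into chunks and summarize each one. The helper names and the 400-word chunk size are assumptions for illustration, and a stand-in function is used here in place of `ts.summarize` so the example runs without the model.

```python
def split_into_chunks(text, max_words=400):
    """Split text into consecutive chunks of at most max_words whitespace-separated words."""
    words = text.split()
    return [' '.join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_long(text, summarize_fn, max_words=400):
    """Summarize each chunk separately and join the partial summaries with spaces."""
    return ' '.join(summarize_fn(chunk) for chunk in split_into_chunks(text, max_words))


# Stand-in summarizer for demonstration: keeps the first three words of each chunk.
# In the notebook, ts.summarize would be passed as summarize_fn instead.
demo = summarize_long('one two three four five six seven eight',
                      lambda chunk: ' '.join(chunk.split()[:3]),
                      max_words=4)
print(demo)  # one two three five six seven
```

Splitting on word counts is a crude proxy for the model's token limit. Running the real summarizer over every chunk of a full book would also take far longer than the single truncated call shown above.
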
### Summarizing news articles with *ktrain*

In [11]:
directory = '/home/jupyter/data/news/'
news_articles = 'news_toyota.json'

path = directory+news_articles

In [12]:
news_df = pd.read_json(path, orient='records', lines=True)

news_df.shape

(100, 4)

In [13]:
# Keep only English-language news
news_eng = news_df[news_df['language']=='english'].reset_index(drop=True)

In [14]:
# Replace \n characters with '.  ' to avoid problems with analysis
news_eng['text_clean'] = news_eng['text'].map(lambda x: re.sub(r'\n', '.  ', str(x)))
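
Replacing every newline with `'.  '` produces doubled periods wherever a line already ends with one, as the `text_clean` column shows (`market. . Ford ...`). As a small sketch of an alternative, each newline and its surrounding whitespace could be collapsed into a single space. The function name `clean_newlines` is hypothetical, and this variant does not add sentence breaks after lines that lack final punctuation.

```python
import re


def clean_newlines(raw):
    """Collapse each run of newlines (with surrounding whitespace) into a single space."""
    return re.sub(r'\s*\n\s*', ' ', str(raw)).strip()


sample = '0 \nNEW YORK: Automakers reported mixed US car sales. \nFord reported declines.'
print(clean_newlines(sample))
# 0 NEW YORK: Automakers reported mixed US car sales. Ford reported declines.
```

In the notebook this could be applied with `news_eng['text'].map(clean_newlines)`.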

In [15]:
pd.set_option('display.max_colwidth', None)
news_eng[['text', 'text_clean']].head(5)

Unnamed: 0,text,text_clean
0,"QR Code Link to This Post All maintenance receipts available, one owner truck. Cash sale. No trades. 6477478013","QR Code Link to This Post All maintenance receipts available, one owner truck. Cash sale. No trades. 6477478013"
1,"0 \nNEW YORK: Automakers reported mixed US car sales in January, with strong demand for SUVs and pickup trucks continuing to provide a cushion in a declining overall auto market. \nFord and Fiat Chrysler reported declines in year-over-year sales, while General Motors scored a modest increase and Toyota saw a more substantial jump. \nUS car sales fell last year for the first time since the financial crisis and are projected to decline again in 2018. Still, analysts and industry executives expect US sales this year to come in above a solid 16 million vehicles amid low unemployment and strong consumer confidence. \n“US economic factors are very healthy and we’re seeing the effect in the auto industry — not just in strong demand for SUVs and pickups, but in demand for high trim versions of vehicles,” said Mark LaNeve, Ford’s vice president for US marketing. \nFord’s January sales dropped 6.6 percent from the same month of 2017 to 161,143. Within the total, car sales slumped 23.3 percent, including big drops for the Fusion and Focus, but that was partially countered by increased sales of the market-leading F-Series pickups. \nFiat Chrysler (FCA) saw sales fall 13 percent to 132,803, with gains for the Jeep brand offset by hefty declines in other models. \nAnd in contrast with the trend for strong sales of pickup trucks, the Ram truck brand fell 16 percent. However, FCA introduced a revamped fleet of the popular pickup at the Detroit Auto Show last month. \nMeanwhile, GM posted a 1.3 percent increase in overall sales compared to January 2017 to 198,548. The biggest US automaker pointed to strong sales of larger vehicles, including the Silverado pickup truck and Chevrolet Equinox SUV. \nToyota lead the pack with a 16.8 percent jump last month to 167,056, on gains in light trucks and in its sedan business. \nThe Toyota Camry, which was revamped with the 2018 model, saw a 21.3 percent increase in January. 
The company also will introduce an upgraded Avalon sedan to dealerships this spring. \nEdmunds.com had projected a 1.4 percent drop in overall sales compared with the same month of last year, largely due to seasonal factors. \n“In January, automakers are expected to pull the reins in on the more generous incentive programs that we saw at the end of 2017,” Jessica Caldwell, executive director of industry analysis at Edmunds, said in a forecast note. \n“However, it’s typical to see a slowdown at dealerships in January following the high-selling holiday months. This isn’t necessarily a solid indicator of the direction that the year is headed in terms of overall sales.” \nCopyright AFP (Agence France-Press), 2018","0 . NEW YORK: Automakers reported mixed US car sales in January, with strong demand for SUVs and pickup trucks continuing to provide a cushion in a declining overall auto market. . Ford and Fiat Chrysler reported declines in year-over-year sales, while General Motors scored a modest increase and Toyota saw a more substantial jump. . US car sales fell last year for the first time since the financial crisis and are projected to decline again in 2018. Still, analysts and industry executives expect US sales this year to come in above a solid 16 million vehicles amid low unemployment and strong consumer confidence. . “US economic factors are very healthy and we’re seeing the effect in the auto industry — not just in strong demand for SUVs and pickups, but in demand for high trim versions of vehicles,” said Mark LaNeve, Ford’s vice president for US marketing. . Ford’s January sales dropped 6.6 percent from the same month of 2017 to 161,143. Within the total, car sales slumped 23.3 percent, including big drops for the Fusion and Focus, but that was partially countered by increased sales of the market-leading F-Series pickups. . Fiat Chrysler (FCA) saw sales fall 13 percent to 132,803, with gains for the Jeep brand offset by hefty declines in other models. . 
And in contrast with the trend for strong sales of pickup trucks, the Ram truck brand fell 16 percent. However, FCA introduced a revamped fleet of the popular pickup at the Detroit Auto Show last month. . Meanwhile, GM posted a 1.3 percent increase in overall sales compared to January 2017 to 198,548. The biggest US automaker pointed to strong sales of larger vehicles, including the Silverado pickup truck and Chevrolet Equinox SUV. . Toyota lead the pack with a 16.8 percent jump last month to 167,056, on gains in light trucks and in its sedan business. . The Toyota Camry, which was revamped with the 2018 model, saw a 21.3 percent increase in January. The company also will introduce an upgraded Avalon sedan to dealerships this spring. . Edmunds.com had projected a 1.4 percent drop in overall sales compared with the same month of last year, largely due to seasonal factors. . “In January, automakers are expected to pull the reins in on the more generous incentive programs that we saw at the end of 2017,” Jessica Caldwell, executive director of industry analysis at Edmunds, said in a forecast note. . “However, it’s typical to see a slowdown at dealerships in January following the high-selling holiday months. This isn’t necessarily a solid indicator of the direction that the year is headed in terms of overall sales.” . Copyright AFP (Agence France-Press), 2018"
2,transmission: automatic 2005 Toyota Camry LE 4 door 4 cyl AUTOMATIC VERY CLEAN INSIDE CLOTH INTERIOR NICE. Just has a Little Hale damage car runs. GREAT 167300 MILEAGE. CALL show contact info . $2450 6473894894,transmission: automatic 2005 Toyota Camry LE 4 door 4 cyl AUTOMATIC VERY CLEAN INSIDE CLOTH INTERIOR NICE. Just has a Little Hale damage car runs. GREAT 167300 MILEAGE. CALL show contact info . $2450 6473894894
3,favorite this post Brand New Toyota Avalon Floor Mats - $115 (New Britain) hide this posting unhide QR Code Link to This Post I have a set of front and rear original Toyota Avalon floor mats in black. These have never been installed and are still in the original wrapping. These mats based on Toyota's website will fit all Avalons from 2013-2018 and retail for 154.47 . If interested I can be reached at show contact info . Thanks do NOT contact me with unsolicited services or offers post id: 6468888955,favorite this post Brand New Toyota Avalon Floor Mats - $115 (New Britain) hide this posting unhide QR Code Link to This Post I have a set of front and rear original Toyota Avalon floor mats in black. These have never been installed and are still in the original wrapping. These mats based on Toyota's website will fit all Avalons from 2013-2018 and retail for 154.47 . If interested I can be reached at show contact info . Thanks do NOT contact me with unsolicited services or offers post id: 6468888955
4,"more ads by this user QR Code Link to This Post Black w/Piano Black w/Perforated NuLuxe Seat Trim. CARFAX One-Owner. 31/21 Highway/City MPG Obsidian Priced below KBB Fair Purchase Price!Locally owned and operated, Coliseum Lexus of Oakland takes pride in treating customers to the very best in customer service. Our Red Carpet Elite programs define Lexus luxury. Also included with all Lexus and Toyota Vehicles unless certified, 3 synthetic oil changes, filter and inspection are included. Follow the links below to learn more.3.5L V6 DOHC Dual VVT-i 24V Engine:","more ads by this user QR Code Link to This Post Black w/Piano Black w/Perforated NuLuxe Seat Trim. CARFAX One-Owner. 31/21 Highway/City MPG Obsidian Priced below KBB Fair Purchase Price!Locally owned and operated, Coliseum Lexus of Oakland takes pride in treating customers to the very best in customer service. Our Red Carpet Elite programs define Lexus luxury. Also included with all Lexus and Toyota Vehicles unless certified, 3 synthetic oil changes, filter and inspection are included. Follow the links below to learn more.3.5L V6 DOHC Dual VVT-i 24V Engine:"


#### Summarizing a single article

In [16]:
text = str(news_eng['text_clean'][1])
text

'0 .  NEW YORK: Automakers reported mixed US car sales in January, with strong demand for SUVs and pickup trucks continuing to provide a cushion in a declining overall auto market. .  Ford and Fiat Chrysler reported declines in year-over-year sales, while General Motors scored a modest increase and Toyota saw a more substantial jump. .  US car sales fell last year for the first time since the financial crisis and are projected to decline again in 2018. Still, analysts and industry executives expect US sales this year to come in above a solid 16 million vehicles amid low unemployment and strong consumer confidence. .  “US economic factors are very healthy and we’re seeing the effect in the auto industry — not just in strong demand for SUVs and pickups, but in demand for high trim versions of vehicles,” said Mark LaNeve, Ford’s vice president for US marketing. .  Ford’s January sales dropped 6.6 percent from the same month of 2017 to 161,143. Within the total, car sales slumped 23.3 perc

In [17]:
%time ts.summarize(text)

CPU times: user 1.17 s, sys: 89.4 ms, total: 1.26 s
Wall time: 1.26 s


'Ford and Fiat Chrysler reported declines in year-over-year sales. General Motors scored a modest increase and Toyota saw a more substantial jump. US car sales fell last year for the first time since the financial crisis. Still, analysts expect US sales this year to come in above a solid 16 million vehicles.'

In [18]:
text = str(news_eng['text_clean'][5])
text

"February 01, 2018, 09:13:  .  (RTTNews.com) - The Japanese stock market is declining on Friday following the mixed cues overnight from Wall Street and on a stronger yen. Rising U.S. bond yields also dented investor sentiment. .  In late-morning trades, the benchmark Nikkei 225 Index is losing 328.25 points or 1.40 percent to 23,157.86, off a low of 23,122.45 earlier. The Japanese market snapped a six-day losing streak and closed higher on Thursday. .  The major exporters are mixed on a stronger yen. Canon is advancing almost 1 percent and Sony is adding 0.4 percent, while Mitsubishi Electric is losing more than 1 percent and Panasonic is declining 0.5 percent. SoftBank Group's shares are lower by 1 percent. .  Among automakers, Toyota is declining 0.6 percent, while Honda is adding 0.1 percent. In the banking sector, Sumitomo Mitsui Financial is losing almost 1 percent and Mitsubishi UFJ Financial is lower by 1 percent. .  In the oil space, Inpex and Japan Petroleum Exploration are lo

In [19]:
%time ts.summarize(text)

CPU times: user 1.42 s, sys: 108 ms, total: 1.52 s
Wall time: 1.52 s


'The benchmark Nikkei 225 Index is losing 328.25 points or 1.40 percent to 23,157.86. The Japanese market snapped a six-day losing streak and closed higher on Thursday. On Wall Street, stocks closed mixed on Thursday as traders seemed reluctant to make significant moves ahead of the release of the closely watched monthly jobs report on Friday.'

#### Summarizing across articles (10 articles)

In [20]:
# Note: str() on a list yields its Python repr (brackets, quotes, escapes), not plain joined text
text = str(news_eng['text_clean'][:10].tolist())

In [21]:
%time ts.summarize(text)

CPU times: user 1.35 s, sys: 188 ms, total: 1.54 s
Wall time: 1.54 s


'Ford and Fiat Chrysler reported declines in year-over-year sales, while General Motors scored a modest increase and Toyota saw a more substantial jump. US car sales fell last year for the first time since the financial crisis and are projected to decline again in 2018. Still, analysts and industry executives expect US sales this year to come in above a solid 16 million vehicles.'

#### Summarizing across articles (100 articles)

In [22]:
text = str(news_eng['text_clean'][:100].tolist())

In [23]:
%time ts.summarize(text)

CPU times: user 2.12 s, sys: 156 ms, total: 2.28 s
Wall time: 2.27 s


'Ford and Fiat Chrysler reported declines in year-over-year sales, while General Motors scored a modest increase and Toyota saw a more substantial jump. US car sales fell last year for the first time since the financial crisis and are projected to decline again in 2018. Still, analysts and industry executives expect US sales this year to come in above a solid 16 million vehicles.'

#### Summarizing across articles (all articles)

In [24]:
text = str(news_eng['text_clean'].tolist())

In [25]:
%time ts.summarize(text)

CPU times: user 1.89 s, sys: 176 ms, total: 2.07 s
Wall time: 2.07 s


'Ford and Fiat Chrysler reported declines in year-over-year sales, while General Motors scored a modest increase and Toyota saw a more substantial jump. US car sales fell last year for the first time since the financial crisis and are projected to decline again in 2018. Still, analysts and industry executives expect US sales this year to come in above a solid 16 million vehicles.'
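
The 10-article, 100-article, and all-article runs return the same summary, drawn from the second article in the list. That is consistent with the concatenated input being cut off at the model's maximum length, so later articles probably never reach the model. A minimal sketch of a per-article approach follows. The helper name `summarize_each` and the `min_words` threshold are assumptions, and a stand-in summarizer replaces `ts.summarize` so the example runs without the model.

```python
def summarize_each(articles, summarize_fn, min_words=50):
    """Return (index, summary) pairs for articles with at least min_words words."""
    results = []
    for i, article in enumerate(articles):
        if len(article.split()) < min_words:
            continue  # skip very short items such as classified ads
        results.append((i, summarize_fn(article)))
    return results


# Stand-in summarizer: first five words. In the notebook, ts.summarize would be passed.
articles = ['short ad', 'a b c d e f g h']
print(summarize_each(articles, lambda t: ' '.join(t.split()[:5]), min_words=3))
# [(1, 'a b c d e')]
```

In the notebook the call might look like `summarize_each(news_eng['text_clean'].tolist(), ts.summarize)`, at roughly one to two seconds per article based on the timings above.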

In [26]:
import datetime
import pytz

datetime.datetime.now(pytz.timezone('US/Central')).strftime("%a, %d %B %Y %H:%M:%S")

'Sun, 30 October 2022 12:34:57'