## Zero Shot Learning Using Natural Language Inference

In this notebook, we demonstrate **zero-shot** topic classification.  **Zero-Shot Learning (ZSL)** is the ability to solve a task without having received any training examples for that task.  The `ZeroShotClassifier` class in *ktrain* can be used to perform topic classification with no training examples.  The technique is based on **Natural Language Inference (NLI)**, as described in [this interesting blog post](https://joeddav.github.io/blog/2020/05/29/ZSL.html) by Joe Davison.

Source: https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/examples/text/zero_shot_learning_with_nli.ipynb

In [1]:
import os
import numpy as np
import pandas as pd
import tensorflow as tf
import re
import string

In [2]:
pd.set_option('display.max_colwidth', None)

#### Check for GPU presence

In [3]:
#Verify we got CPU + GPU or only CPU
tf.config.list_physical_devices()

2022-10-30 17:46:01.405026: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero


[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'),
 PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

In [4]:
!nvidia-smi

Sun Oct 30 17:46:02 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   71C    P0    31W /  70W |      2MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [5]:
tf.__version__

'2.6.4'

In [6]:
# !pip install ktrain --upgrade

In [7]:
import ktrain

#### Create a local working directory for files copied to and from the GCP bucket

In [8]:
path_zsl = '/home/jupyter/data/zsl'

# Create the directory (and any missing parents); do nothing if it already exists
os.makedirs(path_zsl, exist_ok=True)

#### Instantiate the zero-shot-classifier and then describe the topic labels for our classifier with strings.

In [9]:
zsl = ktrain.text.ZeroShotClassifier()
labels=['automotive', 'business', 'crime', 'education', 'finance', 'politics', 'sports', 'technology']

#### Predict labels

There is no training involved here, as we are using **zero-shot learning**.  We simply supply the document to be classified and the `labels` defined earlier. The `predict` method uses Natural Language Inference (NLI) to infer the topic probabilities.

In [10]:
text = '''What a federal government shutdown would mean in Chicago'''

In [11]:
nli_labels = zsl.predict(text, labels=labels, include_labels=True)
pd.DataFrame(nli_labels, columns=['Label', 'Relevance']).sort_values(by=['Relevance'], ascending=False)

|   | Label | Relevance |
|---|---|---|
| 5 | politics | 0.549638 |
| 4 | finance | 0.055144 |
| 1 | business | 0.031941 |
| 7 | technology | 0.014482 |
| 2 | crime | 0.008615 |
| 3 | education | 0.001518 |
| 0 | automotive | 0.001497 |
| 6 | sports | 0.00036 |
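
With `include_labels=True`, `predict` returns `(label, score)` pairs, as the DataFrame construction above suggests. A minimal sketch of turning those pairs into a decision follows. The helper name `select_labels` and the `0.5` threshold are illustrative assumptions, not part of *ktrain*, and the scores are copied from the output above as a stand-in for a live `zsl.predict` call:

```python
def select_labels(pairs, threshold=0.5):
    """Return labels whose score meets the threshold, highest score first."""
    ranked = sorted(pairs, key=lambda p: p[1], reverse=True)
    return [label for label, score in ranked if score >= threshold]

# Scores copied from the output above (stand-in for zsl.predict output)
nli_pairs = [
    ('automotive', 0.001497), ('business', 0.031941), ('crime', 0.008615),
    ('education', 0.001518), ('finance', 0.055144), ('politics', 0.549638),
    ('sports', 0.00036), ('technology', 0.014482),
]

print(select_labels(nli_pairs))        # ['politics']
print(select_labels(nli_pairs, 0.05))  # ['politics', 'finance']
```

Lowering the threshold trades precision for recall; the right value depends on the application.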


In [12]:
text = '''Wolf experts urge UK police not to shoot escaped animal'''

In [13]:
nli_labels = zsl.predict(text, labels=labels, include_labels=True)
pd.DataFrame(nli_labels, columns=['Label', 'Relevance']).sort_values(by=['Relevance'], ascending=False)

|   | Label | Relevance |
|---|---|---|
| 2 | crime | 0.766685 |
| 3 | education | 0.239452 |
| 1 | business | 0.013391 |
| 7 | technology | 0.009085 |
| 6 | sports | 0.005439 |
| 5 | politics | 0.003027 |
| 4 | finance | 0.002569 |
| 0 | automotive | 0.000522 |


In [14]:
text = '''A 29-year-old man was shot Friday evening on the South Side.'''

In [15]:
nli_labels = zsl.predict(text, labels=labels, include_labels=True)
pd.DataFrame(nli_labels, columns=['Label', 'Relevance']).sort_values(by=['Relevance'], ascending=False)

|   | Label | Relevance |
|---|---|---|
| 2 | crime | 0.98751 |
| 7 | technology | 0.040317 |
| 4 | finance | 0.019993 |
| 5 | politics | 0.009519 |
| 0 | automotive | 0.009268 |
| 3 | education | 0.006389 |
| 1 | business | 0.003557 |
| 6 | sports | 0.002449 |


In [16]:
text = '''Three teenagers have been charged with felony robbery after they were 
taken into custody in connection with a string of robberies from the Near North Side to Kenwood.'''

In [17]:
nli_labels = zsl.predict(text, labels=labels, include_labels=True)
pd.DataFrame(nli_labels, columns=['Label', 'Relevance']).sort_values(by=['Relevance'], ascending=False)

|   | Label | Relevance |
|---|---|---|
| 2 | crime | 0.988657 |
| 7 | technology | 0.068227 |
| 1 | business | 0.045834 |
| 4 | finance | 0.034175 |
| 5 | politics | 0.01707 |
| 3 | education | 0.01628 |
| 0 | automotive | 0.005773 |
| 6 | sports | 0.003517 |


In [18]:
text = '''American and Southwest joined United Airlines in reporting expectation-beating earnings 
and unveiling expansion plans.  But investors, fearing that more flights might lead to a fare war, 
pounded airline stocks for a second day even as American Airlines signaled that higher fuel costs 
will probably force it to raise fares..'''

In [19]:
nli_labels = zsl.predict(text, labels=labels, include_labels=True)
pd.DataFrame(nli_labels, columns=['Label', 'Relevance']).sort_values(by=['Relevance'], ascending=False)

|   | Label | Relevance |
|---|---|---|
| 1 | business | 0.834711 |
| 4 | finance | 0.734396 |
| 7 | technology | 0.127065 |
| 2 | crime | 0.046787 |
| 5 | politics | 0.037455 |
| 3 | education | 0.014266 |
| 6 | sports | 0.008155 |
| 0 | automotive | 0.006896 |


In [20]:
text = '''The gorgeous Giulia Quadrifoglio seduces the soul and sears the 
senses with a beautiful balance of aggression and finesse. 
Alfa flaunts its racing pedigree with the four-leaf-clover 
badge displayed on the Giulia’s shapely flanks. 
Its Ferrari-derived twin-turbo V-6 sings a sinister tune, 
belting out 505 horsepower. Its clever, communicative chassis can 
conquer a race course with unfiltered ferocity or coolly traverse 
the tarmac without commotion. An excellent eight-speed automatic 
transmission and rear-wheel drive are standard; sadly, 
a manual gearbox is missing. Alfa Romeo’s past and present 
reliability issues also remain an unknown quantity. 
Still, the Giulia Quadrifoglio, or QF, is an exotic sports sedan 
that sets a new benchmark for the genre—which is why it made our list of 10Best Cars for 2018.'''

In [21]:
nli_labels = zsl.predict(text, labels=labels, include_labels=True)
pd.DataFrame(nli_labels, columns=['Label', 'Relevance']).sort_values(by=['Relevance'], ascending=False)

|   | Label | Relevance |
|---|---|---|
| 6 | sports | 0.988785 |
| 0 | automotive | 0.97294 |
| 7 | technology | 0.203415 |
| 2 | crime | 0.039129 |
| 5 | politics | 0.010752 |
| 4 | finance | 0.007339 |
| 3 | education | 0.005153 |
| 1 | business | 0.000506 |


In [22]:
text = '''My husband ordered a fruit arrangement for me for Valentine's Day. 
He had planned on taking me to the movies with two free tickets he was promised with a 
promotion you had been advertising. My husband was unaware that these tickets came via email. 
However, your sales representative who took his order failed to record his email address. 
Therefore we never received the tickets. 
I have called corporate and the store manager about this. 
They seem to not be able to resolve things in a timely manner. 
Also the fruit was not the best tasting. 
Needless to say we will never be supporting your business again. 
Overall poor customer service and a very overpriced product.'''

In [23]:
nli_labels = zsl.predict(text, labels=labels, include_labels=True)
pd.DataFrame(nli_labels, columns=['Label', 'Relevance']).sort_values(by=['Relevance'], ascending=False)

|   | Label | Relevance |
|---|---|---|
| 1 | business | 0.989676 |
| 7 | technology | 0.364626 |
| 4 | finance | 0.202784 |
| 2 | crime | 0.100025 |
| 3 | education | 0.066658 |
| 6 | sports | 0.028541 |
| 5 | politics | 0.017946 |
| 0 | automotive | 0.016109 |


In [24]:
text = '''

The University of Chicago is an urban research university that has driven new ways of thinking since 1890. Our commitment to free and open inquiry draws inspired scholars to our global campuses, where ideas are born that challenge and change the world.

We empower individuals to challenge conventional thinking in pursuit of original ideas. Students in the College develop critical, analytic, and writing skills in our rigorous, interdisciplinary core curriculum. Through graduate programs, students test their ideas with UChicago scholars, and become the next generation of leaders in academia, industry, nonprofits, and government.

UChicago research has led to such breakthroughs as discovering the link between cancer and genetics, establishing revolutionary theories of economics, and developing tools to produce reliably excellent urban schooling. We generate new insights for the benefit of present and future generations with our national and affiliated laboratories: Argonne National Laboratory, Fermi National Accelerator Laboratory, and the Marine Biological Laboratory in Woods Hole, Massachusetts.

The University of Chicago is enriched by the city we call home. In partnership with our neighbors, we invest in Chicago's mid-South Side across such areas as health, education, economic growth, and the arts. Together with our medical center, we are the largest private employer on the South Side.

In all we do, we are driven to dig deeper, push further, and ask bigger questions—and to leverage our knowledge to enrich all human life. Our diverse and creative students and alumni drive innovation, lead international conversations, and make masterpieces. Alumni and faculty, lecturers and postdocs go on to become Nobel laureates, CEOs, university presidents, attorneys general, literary giants, and astronauts. 
'''

In [25]:
nli_labels = zsl.predict(text, labels=labels, include_labels=True)
pd.DataFrame(nli_labels, columns=['Label', 'Relevance']).sort_values(by=['Relevance'], ascending=False)

|   | Label | Relevance |
|---|---|---|
| 3 | education | 0.604124 |
| 7 | technology | 0.531229 |
| 5 | politics | 0.200898 |
| 1 | business | 0.119388 |
| 4 | finance | 0.114148 |
| 0 | automotive | 0.074157 |
| 6 | sports | 0.054469 |
| 2 | crime | 0.046676 |


### Customizing the Classifier for Zero-Shot Sentiment Analysis

As stated above, the `ZeroShotClassifier` is implemented using Natural Language Inference (NLI).  That is, the document is treated as a **premise**, and each label is treated as a **hypothesis**.  To predict labels, an NLI model is used to predict whether or not each label is entailed by the premise.  By default, the template used for the hypothesis is of the form `"This text is about <label>."`, where `<label>` is replaced with a candidate label (e.g., `politics`, `sports`, etc.).  Although this works well for many text classification problems such as the topic classification examples above, we can customize the template with the `nli_template` parameter if necessary.  For instance, if predicting sentiment of Yelp reviews, we might change the template as follows:

In [26]:
text = '''My husband ordered a fruit arrangement for me for Valentine's Day. 
He had planned on taking me to the movies with two free tickets he was promised with a 
promotion you had been advertising. My husband was unaware that these tickets came via email. 
However, your sales representative who took his order failed to record his email address. 
Therefore we never received the tickets. 
I have called corporate and the store manager about this. 
They seem to not be able to resolve things in a timely manner. 
Also the fruit was not the best tasting. 
Needless to say we will never be supporting your business again. 
Overall poor customer service and a very overpriced product.'''

In [27]:
zsl.predict(text, labels=['negative', 'positive'], include_labels=True,
            nli_template="The sentiment of this restaurant review is {}.")

[('negative', 0.9980232119560242), ('positive', 0.005484164692461491)]

If you compare with the default template, you'll see the negative score is higher with the custom template.
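
To make the premise/hypothesis pairing concrete, here is a small sketch of how a template expands into one hypothesis per label. `build_nli_pairs` is a hypothetical helper written for illustration; it mirrors the description above and is not *ktrain*'s internal code:

```python
def build_nli_pairs(document, labels, nli_template="This text is about {}."):
    """Pair the document (premise) with one hypothesis per candidate label."""
    return [(document, nli_template.format(label)) for label in labels]

pairs = build_nli_pairs(
    "Overall poor customer service and a very overpriced product.",
    ['negative', 'positive'],
    nli_template="The sentiment of this restaurant review is {}.",
)
for premise, hypothesis in pairs:
    print(hypothesis)
# The sentiment of this restaurant review is negative.
# The sentiment of this restaurant review is positive.
```

Each pair is then scored by the NLI model, so the wording of the template directly shapes what "entailment" means for a label.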

Let's now consider a more ambiguous review:
> The food is delicious and the cocktails are excellent, however the service was poor

In [28]:
doc = "The food is delicious and the cocktails are excellent, however the service was poor"
zsl.predict(doc, labels=['negative', 'positive'], include_labels=True,
            nli_template="The sentiment of this restaurant review is {}.")

[('negative', 0.10656122118234634), ('positive', 0.6364313364028931)]

From the output above, we see that the scores do **NOT** sum to one.  By default, `ZeroShotClassifier` treats the task as a multilabel problem, which allows multiple labels to be true, so each label is scored independently against a standard threshold of `0.5`.  In this run, only the `positive` score (about `0.64`) clears that threshold, while the `negative` score (about `0.11`) does not, even though the review contains both praise and criticism.

If the labels are to be treated as mutually exclusive, we can set `multilabel=False`, in which case the scores sum to 1.  Here the review is classified as positive overall:

In [29]:
doc = "The food is delicious and the cocktails are excellent, however the service was poor"
zsl.predict(doc, labels=['negative', 'positive'], include_labels=True,
            nli_template="The sentiment of this restaurant review is {}.",
             multilabel=False)

[('negative', 0.19782660901546478), ('positive', 0.802173376083374)]
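
The difference between the two modes can be sketched in plain Python. This assumes the scoring scheme commonly used for NLI-based zero-shot classification (for example in the Hugging Face zero-shot pipeline): in multilabel mode each label's entailment logit is normalized against its own contradiction logit, while in single-label mode the entailment logits are normalized across labels. Whether *ktrain* follows exactly this scheme is an assumption here, the logits are made up, and the function names are hypothetical:

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def multilabel_scores(logits):
    """Each label independently: softmax over (contradiction, entailment)."""
    return {label: softmax([contra, entail])[1]
            for label, (contra, entail) in logits.items()}

def single_label_scores(logits):
    """Labels compete: softmax over the entailment logits of all labels."""
    labels = list(logits)
    probs = softmax([logits[label][1] for label in labels])
    return dict(zip(labels, probs))

# Made-up (contradiction, entailment) logits, for illustration only
logits = {'negative': (1.0, 0.5), 'positive': (0.2, 1.9)}

multi = multilabel_scores(logits)
single = single_label_scores(logits)
print(sum(multi.values()))   # generally not 1.0
print(sum(single.values()))  # 1.0 up to floating-point error
```

Under this scheme, independent scoring explains why multilabel scores need not sum to one, while the shared softmax forces the single-label scores to do so.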

### Prediction Time and Batch Size

The `predict` method can accept a large list of documents.  Documents are automatically split into batches based on the `batch_size` parameter, which can be increased to speed up predictions.

Note also that the `predict` method of `ZeroShotClassifier` generates a separate NLI prediction for each label included in the `labels` parameter.  As `len(labels)` and the number of documents fed to `predict` increase, the prediction time also increases.  **Increasing `batch_size` can speed up predictions**, although the timings below show that larger is not always faster: on this Tesla T4, `batch_size=8` finished sooner than `batch_size=1`, while `batch_size=64` was the slowest of the three.  The default `batch_size` is currently set conservatively at 8.
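
As a rough way to reason about cost: if every document is paired with every label, the number of NLI inputs is `len(docs) * len(labels)`. The sketch below assumes, as a simplification, that batches are formed over those document-label pairs. How *ktrain* groups inputs internally is not stated here, so treat the numbers as an estimate; `estimate_forward_passes` is a hypothetical helper:

```python
import math

def estimate_forward_passes(n_docs, n_labels, batch_size=8):
    """Estimate model calls when batching over (document, label) pairs."""
    n_pairs = n_docs * n_labels
    return math.ceil(n_pairs / batch_size)

# Example: 100 documents scored against 8 labels
for bs in (1, 8, 64):
    print(bs, estimate_forward_passes(100, 8, batch_size=bs))
# 1 800
# 8 100
# 64 13
```

Fewer passes need not mean lower wall time, since larger batches typically involve more padding and more GPU memory per pass.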

In [30]:
path = 'https://storage.googleapis.com/msca-bdp-data-open/news/news_samsung.json'

news_df = pd.read_json(path, orient='records', lines=True)

news_df = news_df.sample(frac=0.01, replace=False, random_state=1).reset_index(drop=True)
news_df.shape

(285, 4)

In [31]:
# Keep only English-language articles
news_eng = news_df[news_df['language']=='english'].reset_index(drop=True)

In [32]:
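# Remove special characters to avoid problems with analysis (keeps letters, digits, spaces, and @ . , : _)
# Note: ' - ' inside the character class is parsed as a space-to-space range, so hyphens are removed too
news_eng['text_clean'] = news_eng['text'].map(lambda x: re.sub(r'[^a-zA-Z0-9 @ . , : - _]', '', str(x)))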
news_eng = news_eng[['text_clean']]
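
One subtlety in the pattern above: inside a character class, ` - ` (space, hyphen, space) is parsed as a range from space to space, so the hyphen itself is not whitelisted and gets removed. That matches cleaned strings such as `23Jun2019` in the output further down. The sketch below contrasts that behavior with a variant that keeps hyphens by placing `-` last in the class; the sample string is made up:

```python
import re

sample = "Wi-Fi 10-15 seconds, e-mail: a_b@x.com!"

# Pattern as used above: ' - ' is a space-to-space range, so '-' is stripped
original = re.sub('[^a-zA-Z0-9 @ . , : - _]', '', sample)

# Variant that keeps hyphens: a trailing '-' is a literal hyphen
keep_hyphen = re.sub(r'[^a-zA-Z0-9 @.,:_-]', '', sample)

print(original)     # WiFi 1015 seconds, email: a_b@x.com
print(keep_hyphen)  # Wi-Fi 10-15 seconds, e-mail: a_b@x.com
```

Which variant is preferable depends on whether hyphenated tokens matter for the downstream analysis.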

In [33]:
texts = news_eng['text_clean'].tolist()

#### Predicting 8 topics for news articles on GPU using `batch_size=1`

In [34]:
%time predictions = zsl.predict(texts, labels=labels, include_labels=False, batch_size=1)

CPU times: user 2min 1s, sys: 1min 7s, total: 3min 9s
Wall time: 2min 26s


In [35]:
# Pass flat lists of column names; wrapping them in another list would create a MultiIndex
predictions_df = pd.DataFrame(predictions, columns=labels)
news_topics = news_eng.join(predictions_df, how='inner')
news_topics.columns = ['text'] + labels

predictions_df.to_json('/home/jupyter/data/zsl/predictions_zsl_b1.json', orient='records', lines=True)
news_topics.to_json('/home/jupyter/data/zsl/news_topics_zsl_b1.json', orient='records', lines=True)

# Select the small-ish articles only
news_topics.loc[(news_topics["text"].str.len() > 100) & (news_topics["text"].str.len() < 500)].head(10)

Unnamed: 0,text,automotive,business,crime,education,finance,politics,sports,technology
1,"Samsung microwave 50 OnoCame in the caravan we Just bought but dont need it. Looks brand new. Needs a wipe.Pickup only, no holds.",0.037435,0.058699,0.004895,0.008507,0.052492,0.003977,0.003872,0.941041
4,Samsung Galaxy tab A9.7 inch touch screenOtterbox defender series case with screen protector16gb of storageRecharge dock and cableOnly has a few marks on the screen protectorNo marks or scratches on the screen it self,0.008269,0.054831,0.008057,0.011446,0.023182,0.003798,0.008489,0.873278
6,"Samsung is expected to announce a new smartwatch at its Unpacked event next month, the Samsung Galaxy Watch Active2. The device, designed to compete with the Apple Watch, is rumoured to include one of the Apple Watchs most desired features, an ECG reader. Now, according to Wareable.com, the",0.001126,0.460356,0.006225,0.006362,0.034927,0.003169,0.066312,0.981514
8,I want to buy are Samsung Galaxy s10 plus similar to this if anyone is selling one please contact me on,0.02015,0.473502,0.003388,0.008048,0.175045,0.001469,0.005744,0.987131
10,"Samsung Galaxy M10 32GB Memory, 3GB RAM, 3400mAh Battery Brand New 7446677 MVR 2,550.00Samsung Galaxy M10 32GB Memory, 3GB RAM, 3400mAh Battery Brand New SealedFree Delivery Call 7446677We Sell Original Products Only, We are Resposible for any returns within 48hrs after sale. Premium Seller Since 2010 Listing ID : 2662137 Last Updated : 23Jun2019",0.196043,0.846702,0.039778,0.060728,0.607072,0.021529,0.035601,0.952042
17,super cheap with good specs... 23mp cam somemore... what do u thinkSent from Samsung Nokia 3310 using GAGT,0.060886,0.115194,0.019241,0.047752,0.162421,0.01531,0.039002,0.956864
20,samsung 2 x 4 GB ram DescriptionI have a pair of 4 GB Samsung laptop ram Specifications can seen on pics I need 10600 instead of 12800 Shipping: Collection From sellers around the web: Comments Offers,0.032684,0.298925,0.027303,0.015298,0.407947,0.004757,0.004622,0.952372
21,"When we turn on our TV it displays the sources window, covering the bottom 13 of the screen, for 1015 seconds. Sometimes it comes on when we accidentally hit the sources button on the remote VZ remote, and it wont go away until we turn the TV off and on again. Is there a way to disable the automatic display of input source Im pretty sure we know what were watching, and if we want to change the source, we would do so.",0.053012,0.069108,0.050715,0.054511,0.062063,0.040218,0.069196,0.95189
23,iATKOS Inside: Current state of Samsung 970 Evo Plus Current state of Samsung 970 Evo Plus http:4.bp.blogspot.compHUMWpQbhq4WFAFtmrfj4I K_IDKYWOVUmf0YOWWxjtwyE6d9aLKUomwaGgCK4Bs1600iatkos_new.png,0.526383,0.41747,0.00958,0.015382,0.105067,0.005989,0.026775,0.949342
26,https:www.channelnewsasia.comnewsbus...t11693122SEOUL:Samsung Electronics said Friday Jul 5 it expects operating profit to tumble 56 per cent for the second quarter of this year in the face of a weakening chip market.brbrThe worlds largest maker of smartphones and memory chips has enjoyed record profits in recent years despite a series of ...,0.001489,0.505598,0.004281,0.002343,0.071728,0.003583,0.001364,0.617251


#### Predicting 8 topics for news articles on GPU using `batch_size=8`

In [36]:
%time predictions = zsl.predict(texts, labels=labels, include_labels=False, batch_size=8)

  "TIP: Try increasing batch_size to speedup ZeroShotClassifier predictions"


CPU times: user 37.2 s, sys: 1min 25s, total: 2min 2s
Wall time: 1min 49s


In [37]:
# Pass flat lists of column names; wrapping them in another list would create a MultiIndex
predictions_df = pd.DataFrame(predictions, columns=labels)
news_topics = news_eng.join(predictions_df, how='inner')
news_topics.columns = ['text'] + labels

predictions_df.to_json('/home/jupyter/data/zsl/predictions_zsl_b8.json', orient='records', lines=True)
news_topics.to_json('/home/jupyter/data/zsl/news_topics_zsl_b8.json', orient='records', lines=True)

# Select the small-ish articles only
news_topics.loc[(news_topics["text"].str.len() > 100) & (news_topics["text"].str.len() < 500)].head(10)

Unnamed: 0,text,automotive,business,crime,education,finance,politics,sports,technology
1,"Samsung microwave 50 OnoCame in the caravan we Just bought but dont need it. Looks brand new. Needs a wipe.Pickup only, no holds.",0.037435,0.058699,0.004895,0.008507,0.052492,0.003977,0.003872,0.941041
4,Samsung Galaxy tab A9.7 inch touch screenOtterbox defender series case with screen protector16gb of storageRecharge dock and cableOnly has a few marks on the screen protectorNo marks or scratches on the screen it self,0.008269,0.05483,0.008057,0.011446,0.023181,0.003798,0.008489,0.873278
6,"Samsung is expected to announce a new smartwatch at its Unpacked event next month, the Samsung Galaxy Watch Active2. The device, designed to compete with the Apple Watch, is rumoured to include one of the Apple Watchs most desired features, an ECG reader. Now, according to Wareable.com, the",0.001126,0.460354,0.006225,0.006362,0.034927,0.003169,0.066313,0.981514
8,I want to buy are Samsung Galaxy s10 plus similar to this if anyone is selling one please contact me on,0.02015,0.473504,0.003388,0.008048,0.175046,0.001469,0.005744,0.987131
10,"Samsung Galaxy M10 32GB Memory, 3GB RAM, 3400mAh Battery Brand New 7446677 MVR 2,550.00Samsung Galaxy M10 32GB Memory, 3GB RAM, 3400mAh Battery Brand New SealedFree Delivery Call 7446677We Sell Original Products Only, We are Resposible for any returns within 48hrs after sale. Premium Seller Since 2010 Listing ID : 2662137 Last Updated : 23Jun2019",0.196043,0.846702,0.039778,0.060729,0.607072,0.021529,0.035601,0.952042
17,super cheap with good specs... 23mp cam somemore... what do u thinkSent from Samsung Nokia 3310 using GAGT,0.060886,0.115194,0.019241,0.047752,0.162421,0.015311,0.039002,0.956864
20,samsung 2 x 4 GB ram DescriptionI have a pair of 4 GB Samsung laptop ram Specifications can seen on pics I need 10600 instead of 12800 Shipping: Collection From sellers around the web: Comments Offers,0.032683,0.298926,0.027303,0.015298,0.407945,0.004757,0.004622,0.952372
21,"When we turn on our TV it displays the sources window, covering the bottom 13 of the screen, for 1015 seconds. Sometimes it comes on when we accidentally hit the sources button on the remote VZ remote, and it wont go away until we turn the TV off and on again. Is there a way to disable the automatic display of input source Im pretty sure we know what were watching, and if we want to change the source, we would do so.",0.053011,0.069108,0.050715,0.05451,0.062063,0.040218,0.069196,0.95189
23,iATKOS Inside: Current state of Samsung 970 Evo Plus Current state of Samsung 970 Evo Plus http:4.bp.blogspot.compHUMWpQbhq4WFAFtmrfj4I K_IDKYWOVUmf0YOWWxjtwyE6d9aLKUomwaGgCK4Bs1600iatkos_new.png,0.526383,0.417469,0.00958,0.015383,0.105067,0.005989,0.026775,0.949343
26,https:www.channelnewsasia.comnewsbus...t11693122SEOUL:Samsung Electronics said Friday Jul 5 it expects operating profit to tumble 56 per cent for the second quarter of this year in the face of a weakening chip market.brbrThe worlds largest maker of smartphones and memory chips has enjoyed record profits in recent years despite a series of ...,0.001489,0.505598,0.004281,0.002343,0.071728,0.003583,0.001364,0.617253


#### Predicting 8 topics for news articles on GPU using `batch_size=64`

In [38]:
%time predictions = zsl.predict(texts, labels=labels, include_labels=False, batch_size=64)

CPU times: user 45.7 s, sys: 3min 37s, total: 4min 23s
Wall time: 4min 18s
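
The timed cells in this section repeat the same call with different `batch_size` values. One way to fold them into a single loop is sketched below. `time_batch_sizes` is a hypothetical helper, and `fake_predict` merely stands in for `zsl.predict` so the snippet runs without a model:

```python
import time

def time_batch_sizes(predict_fn, docs, labels, batch_sizes=(1, 8, 64)):
    """Call predict_fn once per batch size and record wall-clock seconds."""
    timings = {}
    for bs in batch_sizes:
        start = time.perf_counter()
        predict_fn(docs, labels=labels, include_labels=False, batch_size=bs)
        timings[bs] = time.perf_counter() - start
    return timings

def fake_predict(docs, labels, include_labels=False, batch_size=8):
    """Stand-in for zsl.predict: returns one dummy score per label per doc."""
    return [[0.0] * len(labels) for _ in docs]

timings = time_batch_sizes(fake_predict, ["doc one", "doc two"], ['a', 'b'])
for bs, seconds in timings.items():
    print(f"batch_size={bs}: {seconds:.4f}s")
```

With the real classifier, pass `zsl.predict`, `texts`, and `labels` instead of the stand-ins.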


In [39]:
# Pass flat lists of column names; wrapping them in another list would create a MultiIndex
predictions_df = pd.DataFrame(predictions, columns=labels)
news_topics = news_eng.join(predictions_df, how='inner')
news_topics.columns = ['text'] + labels

predictions_df.to_json('/home/jupyter/data/zsl/predictions_zsl_b64.json', orient='records', lines=True)
news_topics.to_json('/home/jupyter/data/zsl/news_topics_zsl_b64.json', orient='records', lines=True)

# Select the small-ish articles only
news_topics.loc[(news_topics["text"].str.len() > 100) & (news_topics["text"].str.len() < 500)].head(10)

Unnamed: 0,text,automotive,business,crime,education,finance,politics,sports,technology
1,"Samsung microwave 50 OnoCame in the caravan we Just bought but dont need it. Looks brand new. Needs a wipe.Pickup only, no holds.",0.037435,0.0587,0.004895,0.008507,0.052492,0.003977,0.003872,0.941041
4,Samsung Galaxy tab A9.7 inch touch screenOtterbox defender series case with screen protector16gb of storageRecharge dock and cableOnly has a few marks on the screen protectorNo marks or scratches on the screen it self,0.008269,0.05483,0.008057,0.011446,0.023181,0.003798,0.008489,0.873278
6,"Samsung is expected to announce a new smartwatch at its Unpacked event next month, the Samsung Galaxy Watch Active2. The device, designed to compete with the Apple Watch, is rumoured to include one of the Apple Watchs most desired features, an ECG reader. Now, according to Wareable.com, the",0.001126,0.460354,0.006225,0.006362,0.034927,0.003169,0.066313,0.981514
8,I want to buy are Samsung Galaxy s10 plus similar to this if anyone is selling one please contact me on,0.020149,0.473499,0.003388,0.008048,0.175045,0.001469,0.005744,0.987131
10,"Samsung Galaxy M10 32GB Memory, 3GB RAM, 3400mAh Battery Brand New 7446677 MVR 2,550.00Samsung Galaxy M10 32GB Memory, 3GB RAM, 3400mAh Battery Brand New SealedFree Delivery Call 7446677We Sell Original Products Only, We are Resposible for any returns within 48hrs after sale. Premium Seller Since 2010 Listing ID : 2662137 Last Updated : 23Jun2019",0.196042,0.846702,0.039778,0.060729,0.607073,0.021529,0.035601,0.952042
17,super cheap with good specs... 23mp cam somemore... what do u thinkSent from Samsung Nokia 3310 using GAGT,0.060886,0.115194,0.01924,0.047752,0.162421,0.01531,0.039002,0.956864
20,samsung 2 x 4 GB ram DescriptionI have a pair of 4 GB Samsung laptop ram Specifications can seen on pics I need 10600 instead of 12800 Shipping: Collection From sellers around the web: Comments Offers,0.032683,0.298921,0.027303,0.015298,0.407948,0.004757,0.004622,0.952372
21,"When we turn on our TV it displays the sources window, covering the bottom 13 of the screen, for 1015 seconds. Sometimes it comes on when we accidentally hit the sources button on the remote VZ remote, and it wont go away until we turn the TV off and on again. Is there a way to disable the automatic display of input source Im pretty sure we know what were watching, and if we want to change the source, we would do so.",0.053011,0.069108,0.050714,0.05451,0.062064,0.040218,0.069195,0.95189
23,iATKOS Inside: Current state of Samsung 970 Evo Plus Current state of Samsung 970 Evo Plus http:4.bp.blogspot.compHUMWpQbhq4WFAFtmrfj4I K_IDKYWOVUmf0YOWWxjtwyE6d9aLKUomwaGgCK4Bs1600iatkos_new.png,0.526383,0.417469,0.00958,0.015382,0.105068,0.005989,0.026775,0.949343
26,https:www.channelnewsasia.comnewsbus...t11693122SEOUL:Samsung Electronics said Friday Jul 5 it expects operating profit to tumble 56 per cent for the second quarter of this year in the face of a weakening chip market.brbrThe worlds largest maker of smartphones and memory chips has enjoyed record profits in recent years despite a series of ...,0.001489,0.505596,0.004281,0.002343,0.071727,0.003583,0.001364,0.617251


In [40]:
!gsutil -m cp -n '/home/jupyter/data/zsl/*' 'gs://msca-bdp-data-open/zsl/' 

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Skipping existing item: gs://msca-bdp-data-open/zsl/news_topics_zsl_b1.json
Skipping existing item: gs://msca-bdp-data-open/zsl/predictions_zsl_b8.json
Skipping existing item: gs://msca-bdp-data-open/zsl/news_topics_zsl_b64.json
Skipping existing item: gs://msca-bdp-data-open/zsl/predictions_zsl_b1.json
Skipping existing item: gs://msca-bdp-data-open/zsl/predictions_zsl_b64.json
Skipping existing item: gs://msca-bdp-data-open/zsl/news_topics_zsl_b8.json


In [41]:
!nvidia-smi

Sun Oct 30 17:54:55 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   73C    P0    32W /  70W |  13525MiB / 15360MiB |      0%      Default |
|                               |            

#### Reading predicted results

In [42]:
!gsutil -m cp -n 'gs://msca-bdp-data-open/zsl/*.json' '/home/jupyter/data/zsl/'

Skipping existing item: file:///home/jupyter/data/zsl/news_topics_zsl_b1.json
Skipping existing item: file:///home/jupyter/data/zsl/news_topics_zsl_b64.json
Skipping existing item: file:///home/jupyter/data/zsl/news_topics_zsl_b8.json
Skipping existing item: file:///home/jupyter/data/zsl/predictions_zsl_b1.json
Skipping existing item: file:///home/jupyter/data/zsl/predictions_zsl_b64.json
Skipping existing item: file:///home/jupyter/data/zsl/predictions_zsl_b8.json


In [43]:
news_topics_b1 = pd.read_json('/home/jupyter/data/zsl/news_topics_zsl_b1.json', orient='records', lines=True)
news_topics_b8 = pd.read_json('/home/jupyter/data/zsl/news_topics_zsl_b8.json', orient='records', lines=True)
news_topics_b64 = pd.read_json('/home/jupyter/data/zsl/news_topics_zsl_b64.json', orient='records', lines=True)
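
The tables above show that scores from different batch sizes agree only up to small floating-point differences (for example `0.054831` versus `0.05483`). A simple way to check that agreement is a tolerance-based comparison. The helper below is a hypothetical sketch in plain Python, fed with one row of values copied from the outputs above:

```python
def max_abs_diff(row_a, row_b):
    """Largest absolute difference between two equally long score rows."""
    if len(row_a) != len(row_b):
        raise ValueError("rows must have the same length")
    return max(abs(a - b) for a, b in zip(row_a, row_b))

# Row 4 (Galaxy Tab listing) as printed for batch_size=1 and batch_size=8
row_b1 = [0.008269, 0.054831, 0.008057, 0.011446, 0.023182, 0.003798, 0.008489, 0.873278]
row_b8 = [0.008269, 0.05483, 0.008057, 0.011446, 0.023181, 0.003798, 0.008489, 0.873278]

diff = max_abs_diff(row_b1, row_b8)
print(diff < 1e-5)  # True
```

For whole DataFrames, `numpy.allclose` applied to the label columns offers the same kind of check.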

In [44]:
import datetime
import pytz

datetime.datetime.now(pytz.timezone('US/Central')).strftime("%a, %d %B %Y %H:%M:%S")

'Sun, 30 October 2022 12:54:59'