<a href="https://colab.research.google.com/github/thingumajig/colab-experiments/blob/master/GPT_2_abstractive_summarization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# .init

In [1]:
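# Clone the GPT-2 fork and work from inside the repository.
!git clone https://github.com/ilopezfr/gpt-2/
import os
os.chdir('gpt-2')
# Download the 117M and 345M checkpoints, then install the dependencies.
!python download_model.py 117M
!python download_model.py 345M
!pip3 install -r requirements.txt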

Cloning into 'gpt-2'...
remote: Enumerating objects: 256, done.
remote: Total 256 (delta 0), reused 0 (delta 0), pack-reused 256
Receiving objects: 100% (256/256), 4.57 MiB | 19.44 MiB/s, done.
Resolving deltas: 100% (141/141), done.
Fetching checkpoint: 1.00kit [00:00, 519kit/s]                                                      
Fetching encoder.json: 1.04Mit [00:00, 44.2Mit/s]                                                   
Fetching hparams.json: 1.00kit [00:00, 567kit/s]                                                    
Fetching model.ckpt.data-00000-of-00001: 498Mit [00:08, 61.2Mit/s]                                  
Fetching model.ckpt.index: 6.00kit [00:00, 3.40Mit/s]                                               
Fetching model.ckpt.meta: 472kit [00:00, 39.6Mit/s]                                                 
Fetching vocab.bpe: 457kit [00:00, 40.4Mit/s]                                                       
Fetching checkpoint: 1.00kit [00:00, 542kit/s]       

In [20]:
import os
!pwd 
os.chdir('/content/gpt-2')

import json
import numpy as np
import tensorflow as tf

from src import model, sample, encoder

import textwrap

def print_wrapped_text(raw_text):
    """Print raw_text wrapped to 80 columns."""
    wrapper = textwrap.TextWrapper(width=80)
    for line in wrapper.wrap(text=raw_text):
        print(line)


def generate_summary(
    raw_text,
    model_name='117M',
    seed=None,
    nsamples=1,
    batch_size=1,
    length=None,
    temperature=1,
    top_k=0,
    models_dir='models',
):
    """
    Sample continuations of raw_text from the model and print them
    :raw_text : String, the prompt; only the generated continuation is printed
    :model_name=117M : String, which model to use
    :seed=None : Integer seed for random number generators, fix seed to reproduce
     results
    :nsamples=1 : Number of samples to generate and print in total
    :batch_size=1 : Number of samples generated per batch (only affects
     speed/memory).  Must divide nsamples.
    :length=None : Number of tokens in generated text, if None (default), half
     of the model's context window (n_ctx // 2) is used
    :temperature=1 : Float value controlling randomness in the Boltzmann
     distribution. Lower temperature results in less random completions. As the
     temperature approaches zero, the model will become deterministic and
     repetitive. Higher temperature results in more random completions.
    :top_k=0 : Integer value controlling diversity. 1 means only 1 token is
     considered at each step, resulting in deterministic completions,
     while 40 means 40 tokens are considered at each step. 0 (default) is a
     special setting meaning no restrictions. 40 generally is a good value.
    :models_dir=models : path to parent folder containing model subfolders
     (i.e. contains the <model_name> folder)
    """
    models_dir = os.path.expanduser(os.path.expandvars(models_dir))
    if batch_size is None:
        batch_size = 1
    assert nsamples % batch_size == 0

    enc = encoder.get_encoder(model_name, models_dir)
    hparams = model.default_hparams()
    with open(os.path.join(models_dir, model_name, 'hparams.json')) as f:
        hparams.override_from_dict(json.load(f))

    if length is None:
        length = hparams.n_ctx // 2
    elif length > hparams.n_ctx:
        raise ValueError("Can't get samples longer than window size: %s" % hparams.n_ctx)

    with tf.Session(graph=tf.Graph()) as sess:
        context = tf.placeholder(tf.int32, [batch_size, None])
        np.random.seed(seed)
        tf.set_random_seed(seed)
        output = sample.sample_sequence(
            hparams=hparams, length=length,
            context=context,
            batch_size=batch_size,
            temperature=temperature, top_k=top_k
        )

        saver = tf.train.Saver()
        ckpt = tf.train.latest_checkpoint(os.path.join(models_dir, model_name))
        saver.restore(sess, ckpt)

        context_tokens = enc.encode(raw_text)
        generated = 0
        for _ in range(nsamples // batch_size):
            out = sess.run(output, feed_dict={
                context: [context_tokens for _ in range(batch_size)]
            })[:, len(context_tokens):]
            for i in range(batch_size):
                generated += 1
                text = enc.decode(out[i])
                print("=" * 40 + " SAMPLE " + str(generated) + " " + "=" * 40)
                print_wrapped_text(text)
        print("=" * 80)




/content/gpt-2
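
The `temperature` and `top_k` parameters described in the docstring of `generate_summary` can be illustrated on a toy logit vector. The following is a minimal NumPy sketch, not the code in `src/sample.py`: the helper names `top_k_filter`, `sampling_probs`, and `sample_token` are hypothetical, and it assumes the temperature is applied before the top-k cut.

```python
import numpy as np


def top_k_filter(logits, k):
    """Keep the k largest logits and set the rest to -inf (k=0: no restriction)."""
    logits = np.asarray(logits, dtype=np.float64)
    if k == 0:
        return logits
    k = min(k, logits.size)              # guard against k larger than the vocabulary
    kth_largest = np.sort(logits)[-k]    # smallest logit that survives the cut
    return np.where(logits < kth_largest, -np.inf, logits)


def sampling_probs(logits, temperature=1.0, top_k=0):
    """Softmax over temperature-scaled, top-k filtered logits."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    filtered = top_k_filter(scaled, top_k)
    weights = np.exp(filtered - np.max(filtered))  # exp(-inf) == 0 for removed tokens
    return weights / weights.sum()


def sample_token(logits, temperature=1.0, top_k=0, rng=None):
    """Draw one token index from the resulting distribution."""
    rng = rng or np.random.default_rng()
    probs = sampling_probs(logits, temperature, top_k)
    return int(rng.choice(len(probs), p=probs))


toy_logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a 4-token vocabulary
print(sampling_probs(toy_logits, temperature=1.0, top_k=0))
print(sampling_probs(toy_logits, temperature=0.5, top_k=2))
```

With `top_k=1` the distribution collapses onto the single most likely token, which is why that setting gives deterministic completions. Lowering `temperature` has a softer effect of the same kind: it moves probability mass toward the highest-scoring tokens without removing the others.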


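The next cell turns the article into a summarization prompt by appending a `TL;DR:` cue and asking for `text_length` new tokens. A minimal sketch of that prompt construction, together with a token budget check, might look as follows. The helper names are hypothetical, and the default `n_ctx=1024` is an assumption matching the released GPT-2 hyperparameters; in the notebook the real value is read from `hparams.json`. `generate_summary` only compares `length` with `n_ctx`, so the sketch assumes that the prompt and the generated tokens together should also fit in the context window.

```python
TLDR_CUE = "\nTL;DR:\n"  # the same cue the notebook appends to raw_text


def build_tldr_prompt(article, cue=TLDR_CUE):
    """Strip trailing whitespace from the article and append the summary cue."""
    return article.rstrip() + cue


def check_token_budget(n_prompt_tokens, length, n_ctx=1024):
    """Return the number of unused context positions, or raise if over budget.

    n_ctx=1024 is an assumed default; pass hparams.n_ctx when it is available.
    """
    total = n_prompt_tokens + length
    if total > n_ctx:
        raise ValueError(
            "Prompt (%d tokens) plus %d generated tokens exceeds window size %d"
            % (n_prompt_tokens, length, n_ctx)
        )
    return n_ctx - total


prompt = build_tldr_prompt("Some article text.  ")
print(repr(prompt))
print(check_token_budget(n_prompt_tokens=200, length=60))
```

In the notebook, `n_prompt_tokens` would be `len(enc.encode(prompt))`, using the encoder that `generate_summary` already loads.
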
In [21]:
#@title Abstractive Summarization
raw_text = "The results of the European elections revealed some unexpected outcomes Monday as the full picture from the world's biggest multi-country vote became clearer.  Over four days last week voters across 28 countries delivered the highest turnout in a European election for 20 years as they selected new representatives to sit in the European Parliament. Here are some of the key takeaways: Traditional centrist parties took a drubbing, with the so-called Grand Coalition -- which consists of the center-left Progressive Alliance of Socialists and Democrats (S&D) bloc and the center-right European People's Party (EPP) -- losing more than 70 seats and its majority in the EU parliament. One of the key figures in the S&D is Spanish Prime Minister Pedro Sanchez, while German chancellor Angela Merkel is part of the EPP. In contrast, liberal-centrist grouping the Alliance of Liberals and Democrats for Europe (ALDE&R), which includes French President Emmanuel Macron, picked up 32 seats and will now play an important role in nominating officials for key EU positions." #@param {type:"string"}
text_length = 60 #@param {type:"integer"}
raw_text += '\nTL;DR:\n'

print("=" * 40 + f" Source text ({len(raw_text)} chars) " + "=" * 40)
print_wrapped_text(raw_text)
print('='*80)

    
generate_summary(raw_text, model_name='345M', nsamples=3, length=text_length, temperature=1)


The results of the European elections revealed some unexpected outcomes Monday
as the full picture from the world's biggest multi-country vote became clearer.
Over four days last week voters across 28 countries delivered the highest
turnout in a European election for 20 years as they selected new representatives
to sit in the European Parliament. Here are some of the key takeaways:
Traditional centrist parties took a drubbing, with the so-called Grand Coalition
-- which consists of the center-left Progressive Alliance of Socialists and
Democrats (S&D) bloc and the center-right European People's Party (EPP) --
losing more than 70 seats and its majority in the EU parliament. One of the key
figures in the S&D is Spanish Prime Minister Pedro Sanchez, while German
chancellor Angela Merkel is part of the EPP. In contrast, liberal-centrist
grouping the Alliance of Liberals and Democrats for Europe (ALDE&R), which
includes French President Emmanuel Macron, picked up 32 seats and will now play
