<a href="https://colab.research.google.com/github/vishalkesti382/AI_in_Finance/blob/master/Copy_of_Predicting_Movie_Reviews_with_BERT_on_TF_Hub.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [0]:
# Copyright 2019 Google Inc.

# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at

#     http://www.apache.org/licenses/LICENSE-2.0

# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

#Predicting Fund Name Class with BERT on TF Hub

If you’ve been following Natural Language Processing over the past year, you’ve probably heard of BERT: Bidirectional Encoder Representations from Transformers. It’s a neural network architecture designed by Google researchers that’s totally transformed what’s state-of-the-art for NLP tasks, like text classification, translation, summarization, and question answering.

Now that BERT's been added to [TF Hub](https://www.tensorflow.org/hub) as a loadable module, it's easy(ish) to add into existing Tensorflow text pipelines. In an existing pipeline, BERT can replace text embedding layers like ELMO and GloVE. Alternatively, [finetuning](http://wiki.fast.ai/index.php/Fine_tuning) BERT can provide both an accuracy boost and faster training time in many cases.



In [0]:
from sklearn.model_selection import train_test_split
import pandas as pd
import tensorflow as tf
import tensorflow_hub as hub
from datetime import datetime

In [37]:
%tensorflow_version 2.x

TensorFlow is already loaded. Please restart the runtime to change versions.


In addition to the standard libraries we imported above, we'll need to install BERT's python package.

In [14]:
!pip install bert-tensorflow

Collecting bert-tensorflow
[?25l  Downloading https://files.pythonhosted.org/packages/a6/66/7eb4e8b6ea35b7cc54c322c816f976167a43019750279a8473d355800a93/bert_tensorflow-1.0.1-py2.py3-none-any.whl (67kB)
[K     |████▉                           | 10kB 17.9MB/s eta 0:00:01[K     |█████████▊                      | 20kB 1.7MB/s eta 0:00:01[K     |██████████████▋                 | 30kB 2.5MB/s eta 0:00:01[K     |███████████████████▍            | 40kB 1.7MB/s eta 0:00:01[K     |████████████████████████▎       | 51kB 2.1MB/s eta 0:00:01[K     |█████████████████████████████▏  | 61kB 2.5MB/s eta 0:00:01[K     |████████████████████████████████| 71kB 2.4MB/s 
Installing collected packages: bert-tensorflow
Successfully installed bert-tensorflow-1.0.1


In [15]:
import bert
from bert import run_classifier
from bert import optimization
from bert import tokenization




Below, we'll set an output directory location to store our model output and checkpoints. This can be a local directory, in which case you'd set OUTPUT_DIR to the name of the directory you'd like to create. If you're running this code in Google's hosted Colab, the directory won't persist after the Colab session ends.

Alternatively, if you're a GCP user, you can store output in a GCP bucket. To do that, set a directory name in OUTPUT_DIR and the name of the GCP bucket in the BUCKET field.

Set DO_DELETE to rewrite the OUTPUT_DIR if it exists. Otherwise, Tensorflow will load existing model checkpoints from that directory (if they exist).

In [16]:
# Set the output directory for saving model file
# Optionally, set a GCP bucket location

OUTPUT_DIR = 'OUTPUT_DIR_NAME'#@param {type:"string"}
#@markdown Whether or not to clear/delete the directory and create a new one
DO_DELETE = True #@param {type:"boolean"}
#@markdown Set USE_BUCKET and BUCKET if you want to (optionally) store model output on GCP bucket.
USE_BUCKET = False #@param {type:"boolean"}
BUCKET = 'FUND_NAME' #@param {type:"string"}

if USE_BUCKET:
  OUTPUT_DIR = 'gs://{}/{}'.format(BUCKET, OUTPUT_DIR)
  from google.colab import auth
  auth.authenticate_user()

if DO_DELETE:
  try:
    tf.gfile.DeleteRecursively(OUTPUT_DIR)
  except:
    # Doesn't matter if the directory didn't exist
    pass
tf.gfile.MakeDirs(OUTPUT_DIR)
print('***** Model output directory: {} *****'.format(OUTPUT_DIR))


***** Model output directory: OUTPUT_DIR_NAME *****


#Data

First, let's load our fund name data. 

In [0]:
from tensorflow import keras
import os
import re
import pandas as pd
import numpy as np


# Load all files from a directory in a DataFrame.
# Load and process the dataset files.
def dataset_load():
  dataset = pd.read_csv('/content/Clean_classificaion_data.csv')
  dataset.dropna(inplace =True)
  seed = 929
  labels = dataset['doctype'].values
  train_df, test_df = train_test_split(dataset, test_size=0.3, stratify=labels, shuffle=True, random_state=seed) 
  print('data loaded')
  
  return train_df, test_df


In [0]:
dataset = pd.read_csv('/content/Clean_classificaion_data.csv')

In [10]:
dataset.head()

Unnamed: 0.1,Unnamed: 0,doctype,fullfilename,text,preprocessed_text,char_num,word_num
0,0,Red Docs,W:CGSS\InterDept\Digitization\Model Training\N...,\nREDEMPTION REQUEST FORM \n \nTo: Triton Val...,redemption REQUEST FORM To ...,2944,473
1,1,Red Docs,W:CGSS\InterDept\Digitization\Model Training\N...,SC-88707\nCertificate of Registration of Exemp...,SC Certificate of Registration of Exempted Lim...,691,110
2,2,Red Docs,W:CGSS\InterDept\Digitization\Model Training\N...,Citco Fund Services (Singapore) Pte Ltd\nTrans...,Citco Fund Services Singapore Pte Ltd Transfer...,3833,610
3,3,Red Docs,W:CGSS\InterDept\Digitization\Model Training\N...,WC-102342\nCertificate of Registration of Exem...,WC Certificate of Registration of Exempted Lim...,684,110
4,4,Red Docs,W:CGSS\InterDept\Digitization\Model Training\N...,Portcullis Fund Administration \nChange / Retr...,Portcullis Fund Administration Change Retrieve...,1154,183


In [26]:
dataset["doctype"].value_counts()

Trade Contract Note       2386
Statements                2335
Performance Reports       1543
Transfer Contract Note    1009
Trade ACK                  841
Prospectus                 507
Sub Docs                   430
Capital Call Notices       341
Financial Statements       276
Red Docs                   231
Meetings AGM EGM           216
KIID Documents             199
Distribution Notice         66
Dividend Confirmation       66
Corporate Action CN         51
NAV Report                  49
FINRA                        9
Equalisation CN              5
Stock Transfer Forms         5
Compulsory Red CN            5
Liquidations                 3
Switch Contract Note         2
Name: doctype, dtype: int64

In [17]:
train, test = dataset_load()

data loaded


In [0]:
cleanup_nums = {"doctype": {"Trade Contract Note": 0, "Statements": 1, "Performance Reports": 2, "Transfer Contract Note": 3,
                                  "Trade ACK": 4, "Prospectus": 5, "Sub Docs":6, "Capital Call Notices":7, "Financial Statements":8, 
                            "Red Docs":9, "Meetings AGM EGM":10,"KIID Documents":11, "Distribution Notice":12, "Dividend Confirmation":13, "Corporate Action CN":14,
                            "NAV Report": 15, "FINRA":16, "Equalisation CN":17, "Stock Transfer Forms":18, "Compulsory Red CN":19, "Liquidations":20,
                            "Switch Contract Note":21}}

In [28]:
dataset.replace(cleanup_nums, inplace=True)
dataset.head()

Unnamed: 0.1,Unnamed: 0,doctype,fullfilename,text,preprocessed_text,char_num,word_num
0,0,9,W:CGSS\InterDept\Digitization\Model Training\N...,\nREDEMPTION REQUEST FORM \n \nTo: Triton Val...,redemption REQUEST FORM To ...,2944,473
1,1,9,W:CGSS\InterDept\Digitization\Model Training\N...,SC-88707\nCertificate of Registration of Exemp...,SC Certificate of Registration of Exempted Lim...,691,110
2,2,9,W:CGSS\InterDept\Digitization\Model Training\N...,Citco Fund Services (Singapore) Pte Ltd\nTrans...,Citco Fund Services Singapore Pte Ltd Transfer...,3833,610
3,3,9,W:CGSS\InterDept\Digitization\Model Training\N...,WC-102342\nCertificate of Registration of Exem...,WC Certificate of Registration of Exempted Lim...,684,110
4,4,9,W:CGSS\InterDept\Digitization\Model Training\N...,Portcullis Fund Administration \nChange / Retr...,Portcullis Fund Administration Change Retrieve...,1154,183


In [0]:
seed = 929
labels = dataset['doctype'].values
train, test = train_test_split(dataset, test_size=0.3, stratify=labels, shuffle=True, random_state=seed)

In [30]:
train.shape

(7402, 7)

To keep training fast, we'll take a sample of 5000 train and test examples, respectively.

In [0]:
train = train.sample(3000)
test = test.sample(2000)

In [33]:
train.columns

Index(['Unnamed: 0', 'doctype', 'fullfilename', 'text', 'preprocessed_text',
       'char_num', 'word_num'],
      dtype='object')

For us, our input data is the 'sentence' column and our label is the 'polarity' column (0, 1 for negative and positive, respecitvely)

In [0]:
DATA_COLUMN = 'text'
LABEL_COLUMN = 'doctype'
# label_list is the list of labels, i.e. True, False or 0, 1 or 'fundName', 'Not_fundName'
label_list = [0, 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21]

#Data Preprocessing
We'll need to transform our data into a format BERT understands. This involves two steps. First, we create  `InputExample`'s using the constructor provided in the BERT library.

- `text_a` is the text we want to classify, which in this case, is the `Request` field in our Dataframe. 
- `text_b` is used if we're training a model to understand the relationship between sentences (i.e. is `text_b` a translation of `text_a`? Is `text_b` an answer to the question asked by `text_a`?). This doesn't apply to our task, so we can leave `text_b` blank.
- `label` is the label for our example, i.e. True, False

In [0]:
# Use the InputExample class from BERT's run_classifier code to create examples from the data
train_InputExamples = train.apply(lambda x: bert.run_classifier.InputExample(guid=None, # Globally unique ID for bookkeeping, unused in this example
                                                                   text_a = x[DATA_COLUMN], 
                                                                   text_b = None, 
                                                                   label = x[LABEL_COLUMN]), axis = 1)

test_InputExamples = test.apply(lambda x: bert.run_classifier.InputExample(guid=None, 
                                                                   text_a = x[DATA_COLUMN], 
                                                                   text_b = None, 
                                                                   label = x[LABEL_COLUMN]), axis = 1)

Next, we need to preprocess our data so that it matches the data BERT was trained on. For this, we'll need to do a couple of things (but don't worry--this is also included in the Python library):


1. Lowercase our text (if we're using a BERT lowercase model)
2. Tokenize it (i.e. "sally says hi" -> ["sally", "says", "hi"])
3. Break words into WordPieces (i.e. "calling" -> ["call", "##ing"])
4. Map our words to indexes using a vocab file that BERT provides
5. Add special "CLS" and "SEP" tokens (see the [readme](https://github.com/google-research/bert))
6. Append "index" and "segment" tokens to each input (see the [BERT paper](https://arxiv.org/pdf/1810.04805.pdf))

Happily, we don't have to worry about most of these details.




To start, we'll need to load a vocabulary file and lowercasing information directly from the BERT tf hub module:

In [36]:
# This is a path to an uncased (all lowercase) version of BERT
BERT_MODEL_HUB = "https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1"

def create_tokenizer_from_hub_module():
  """Get the vocab file and casing info from the Hub module."""
  with tf.Graph().as_default():
    bert_module = hub.Module(BERT_MODEL_HUB)
    tokenization_info = bert_module(signature="tokenization_info", as_dict=True)
    with tf.Session() as sess:
      vocab_file, do_lower_case = sess.run([tokenization_info["vocab_file"],
                                            tokenization_info["do_lower_case"]])
      
  return bert.tokenization.FullTokenizer(
      vocab_file=vocab_file, do_lower_case=do_lower_case)

tokenizer = create_tokenizer_from_hub_module()

INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


Great--we just learned that the BERT model we're using expects lowercase data (that's what stored in tokenization_info["do_lower_case"]) and we also loaded BERT's vocab file. We also created a tokenizer, which breaks words into word pieces:

In [37]:
tokenizer.tokenize("Bridgewater Pure Alpha Fund I Everett, MA 02149 Class CLASS B")

['bridge',
 '##water',
 'pure',
 'alpha',
 'fund',
 'i',
 'everett',
 ',',
 'ma',
 '02',
 '##14',
 '##9',
 'class',
 'class',
 'b']

Using our tokenizer, we'll call `run_classifier.convert_examples_to_features` on our InputExamples to convert them into features BERT understands.

In [38]:
# We'll set sequences to be at most 128 tokens long.
MAX_SEQ_LENGTH = 128
# Convert our train and test features to InputFeatures that BERT understands.
train_features = bert.run_classifier.convert_examples_to_features(train_InputExamples, label_list, MAX_SEQ_LENGTH, tokenizer)
test_features = bert.run_classifier.convert_examples_to_features(test_InputExamples, label_list, MAX_SEQ_LENGTH, tokenizer)

INFO:tensorflow:Writing example 0 of 3000


INFO:tensorflow:Writing example 0 of 3000


INFO:tensorflow:*** Example ***


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] zeta ##do ##cs : toe ##mail = f ##su ##bs ##holding ##s @ clears ##tream . com , f ##sb ##j ##bh ##old ##ings @ clears ##tream . com ; cal ##ama ##tta cu ##sch ##ieri fund services limited e ##wr ##opa business centre , tri ##q dun ka ##rm , b ' kara - malta all correspondence is to be addressed to : p . o . box 34 ##9 , valle ##tta - malta phone : ( + 356 ) 25 68 ##8 68 ##8 - fa ##x : ( + 356 ) 25 68 ##8 256 email : cc ##fs @ cc . com . mt web : www . cc . com . mt registered in malta no . c 45 ##7 [SEP]


INFO:tensorflow:tokens: [CLS] zeta ##do ##cs : toe ##mail = f ##su ##bs ##holding ##s @ clears ##tream . com , f ##sb ##j ##bh ##old ##ings @ clears ##tream . com ; cal ##ama ##tta cu ##sch ##ieri fund services limited e ##wr ##opa business centre , tri ##q dun ka ##rm , b ' kara - malta all correspondence is to be addressed to : p . o . box 34 ##9 , valle ##tta - malta phone : ( + 356 ) 25 68 ##8 68 ##8 - fa ##x : ( + 356 ) 25 68 ##8 256 email : cc ##fs @ cc . com . mt web : www . cc . com . mt registered in malta no . c 45 ##7 [SEP]


INFO:tensorflow:input_ids: 101 23870 3527 6169 1024 11756 21397 1027 1042 6342 5910 23410 2015 1030 28837 25379 1012 4012 1010 1042 19022 3501 23706 11614 8613 1030 28837 25379 1012 4012 1025 10250 8067 5946 12731 11624 21939 4636 2578 3132 1041 13088 29477 2449 2803 1010 13012 4160 24654 10556 10867 1010 1038 1005 13173 1011 9933 2035 11061 2003 2000 2022 8280 2000 1024 1052 1012 1051 1012 3482 4090 2683 1010 20171 5946 1011 9933 3042 1024 1006 1009 27509 1007 2423 6273 2620 6273 2620 1011 6904 2595 1024 1006 1009 27509 1007 2423 6273 2620 17273 10373 1024 10507 10343 1030 10507 1012 4012 1012 11047 4773 1024 7479 1012 10507 1012 4012 1012 11047 5068 1999 9933 2053 1012 1039 3429 2581 102


INFO:tensorflow:input_ids: 101 23870 3527 6169 1024 11756 21397 1027 1042 6342 5910 23410 2015 1030 28837 25379 1012 4012 1010 1042 19022 3501 23706 11614 8613 1030 28837 25379 1012 4012 1025 10250 8067 5946 12731 11624 21939 4636 2578 3132 1041 13088 29477 2449 2803 1010 13012 4160 24654 10556 10867 1010 1038 1005 13173 1011 9933 2035 11061 2003 2000 2022 8280 2000 1024 1052 1012 1051 1012 3482 4090 2683 1010 20171 5946 1011 9933 3042 1024 1006 1009 27509 1007 2423 6273 2620 6273 2620 1011 6904 2595 1024 1006 1009 27509 1007 2423 6273 2620 17273 10373 1024 10507 10343 1030 10507 1012 4012 1012 11047 4773 1024 7479 1012 10507 1012 4012 1012 11047 5068 1999 9933 2053 1012 1039 3429 2581 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)


INFO:tensorflow:label: 0 (id = 0)


INFO:tensorflow:*** Example ***


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] morgan stanley fund services ci ##tc ##o bank ned ##erland n ##v the observatory , 7 - 11 sir john rogers ##on ' s quay c / o ci ##tc ##o data processing services ltd dublin 2 , ireland 260 ##0 cork airport business park tel : + 1 - 91 ##4 - 225 - 88 ##85 ( us ) kin ##sal ##e road tel : + 35 ##3 - 1 - 79 ##9 - 87 ##7 ##8 ( int ' l ) cork ireland fa ##x : + 35 ##3 - 1 - 65 ##5 - 87 ##9 ##5 6 - sep - 2019 investor id : i ##00 ##00 ##23 ##9 ##13 re : ci ##tc ##o global custody n ##v - re - [SEP]


INFO:tensorflow:tokens: [CLS] morgan stanley fund services ci ##tc ##o bank ned ##erland n ##v the observatory , 7 - 11 sir john rogers ##on ' s quay c / o ci ##tc ##o data processing services ltd dublin 2 , ireland 260 ##0 cork airport business park tel : + 1 - 91 ##4 - 225 - 88 ##85 ( us ) kin ##sal ##e road tel : + 35 ##3 - 1 - 79 ##9 - 87 ##7 ##8 ( int ' l ) cork ireland fa ##x : + 35 ##3 - 1 - 65 ##5 - 87 ##9 ##5 6 - sep - 2019 investor id : i ##00 ##00 ##23 ##9 ##13 re : ci ##tc ##o global custody n ##v - re - [SEP]


INFO:tensorflow:input_ids: 101 5253 6156 4636 2578 25022 13535 2080 2924 12311 22492 1050 2615 1996 9970 1010 1021 1011 2340 2909 2198 7369 2239 1005 1055 21048 1039 1013 1051 25022 13535 2080 2951 6364 2578 5183 5772 1016 1010 3163 13539 2692 8513 3199 2449 2380 10093 1024 1009 1015 1011 6205 2549 1011 14993 1011 6070 27531 1006 2149 1007 12631 12002 2063 2346 10093 1024 1009 3486 2509 1011 1015 1011 6535 2683 1011 6584 2581 2620 1006 20014 1005 1048 1007 8513 3163 6904 2595 1024 1009 3486 2509 1011 1015 1011 3515 2629 1011 6584 2683 2629 1020 1011 19802 1011 10476 14316 8909 1024 1045 8889 8889 21926 2683 17134 2128 1024 25022 13535 2080 3795 9968 1050 2615 1011 2128 1011 102


INFO:tensorflow:input_ids: 101 5253 6156 4636 2578 25022 13535 2080 2924 12311 22492 1050 2615 1996 9970 1010 1021 1011 2340 2909 2198 7369 2239 1005 1055 21048 1039 1013 1051 25022 13535 2080 2951 6364 2578 5183 5772 1016 1010 3163 13539 2692 8513 3199 2449 2380 10093 1024 1009 1015 1011 6205 2549 1011 14993 1011 6070 27531 1006 2149 1007 12631 12002 2063 2346 10093 1024 1009 3486 2509 1011 1015 1011 6535 2683 1011 6584 2581 2620 1006 20014 1005 1048 1007 8513 3163 6904 2595 1024 1009 3486 2509 1011 1015 1011 3515 2629 1011 6584 2683 2629 1020 1011 19802 1011 10476 14316 8909 1024 1045 8889 8889 21926 2683 17134 2128 1024 25022 13535 2080 3795 9968 1050 2615 1011 2128 1011 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 4 (id = 4)


INFO:tensorflow:label: 4 (id = 4)


INFO:tensorflow:*** Example ***


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] whale rock capital management llc quarterly letter october 16 , 2019 dear investor , m co whale rock . flagship fund nas ##da ##q s & p 500 ( tr ) mas ##cm ##i world lp ( net ##1 ) info tech q ##3 2019 ( 9 . 5 % ) ( 0 . 1 % ) 1 . 7 % re 2 . 3 % y ##t ##d 11 . 7 % 20 . 6 % 20 . s ##6 % t 29 . 5 % it ##d r ( may 1 , 2006 ) 343 . 4 % 244 . 4 % lea ##20 ##1 . 3 % 280 . 1 % c q ##3 summary and review @ the whale rock flagship fund [SEP]


INFO:tensorflow:tokens: [CLS] whale rock capital management llc quarterly letter october 16 , 2019 dear investor , m co whale rock . flagship fund nas ##da ##q s & p 500 ( tr ) mas ##cm ##i world lp ( net ##1 ) info tech q ##3 2019 ( 9 . 5 % ) ( 0 . 1 % ) 1 . 7 % re 2 . 3 % y ##t ##d 11 . 7 % 20 . 6 % 20 . s ##6 % t 29 . 5 % it ##d r ( may 1 , 2006 ) 343 . 4 % 244 . 4 % lea ##20 ##1 . 3 % 280 . 1 % c q ##3 summary and review @ the whale rock flagship fund [SEP]


INFO:tensorflow:input_ids: 101 13156 2600 3007 2968 11775 12174 3661 2255 2385 1010 10476 6203 14316 1010 1049 2522 13156 2600 1012 10565 4636 17235 2850 4160 1055 1004 1052 3156 1006 19817 1007 16137 27487 2072 2088 6948 1006 5658 2487 1007 18558 6627 1053 2509 10476 1006 1023 1012 1019 1003 1007 1006 1014 1012 1015 1003 1007 1015 1012 1021 1003 2128 1016 1012 1017 1003 1061 2102 2094 2340 1012 1021 1003 2322 1012 1020 1003 2322 1012 1055 2575 1003 1056 2756 1012 1019 1003 2009 2094 1054 1006 2089 1015 1010 2294 1007 27810 1012 1018 1003 24194 1012 1018 1003 12203 11387 2487 1012 1017 1003 13427 1012 1015 1003 1039 1053 2509 12654 1998 3319 1030 1996 13156 2600 10565 4636 102


INFO:tensorflow:input_ids: 101 13156 2600 3007 2968 11775 12174 3661 2255 2385 1010 10476 6203 14316 1010 1049 2522 13156 2600 1012 10565 4636 17235 2850 4160 1055 1004 1052 3156 1006 19817 1007 16137 27487 2072 2088 6948 1006 5658 2487 1007 18558 6627 1053 2509 10476 1006 1023 1012 1019 1003 1007 1006 1014 1012 1015 1003 1007 1015 1012 1021 1003 2128 1016 1012 1017 1003 1061 2102 2094 2340 1012 1021 1003 2322 1012 1020 1003 2322 1012 1055 2575 1003 1056 2756 1012 1019 1003 2009 2094 1054 1006 2089 1015 1010 2294 1007 27810 1012 1018 1003 24194 1012 1018 1003 12203 11387 2487 1012 1017 1003 13427 1012 1015 1003 1039 1053 2509 12654 1998 3319 1030 1996 13156 2600 10565 4636 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 2 (id = 2)


INFO:tensorflow:label: 2 (id = 2)


INFO:tensorflow:*** Example ***


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] bn ##y mellon alternative investment services ltd . 48 par - la - ville road , suite 46 ##4 hamilton hm ##11 bermuda clears ##tream banking s . a . af ##s 1949 ##55 confirmation of intent clears ##tream banking s . a . date : 28 - aug - 2018 42 , avenue j ##f kennedy account number : * * - * * * * - * * * * 64 ##40 luxembourg l - 1855 luxembourg fa ##x number 35 ##3 - 21 - 49 ##10 ##33 ##5 email address : f ##ste ##am ##f @ clears ##tream . com verde alpha fund ltd trade details the administrator has received your instructions to execute the following trade . verde alpha fund ltd - [SEP]


INFO:tensorflow:tokens: [CLS] bn ##y mellon alternative investment services ltd . 48 par - la - ville road , suite 46 ##4 hamilton hm ##11 bermuda clears ##tream banking s . a . af ##s 1949 ##55 confirmation of intent clears ##tream banking s . a . date : 28 - aug - 2018 42 , avenue j ##f kennedy account number : * * - * * * * - * * * * 64 ##40 luxembourg l - 1855 luxembourg fa ##x number 35 ##3 - 21 - 49 ##10 ##33 ##5 email address : f ##ste ##am ##f @ clears ##tream . com verde alpha fund ltd trade details the administrator has received your instructions to execute the following trade . verde alpha fund ltd - [SEP]


INFO:tensorflow:input_ids: 101 24869 2100 22181 4522 5211 2578 5183 1012 4466 11968 1011 2474 1011 20184 2346 1010 7621 4805 2549 5226 20287 14526 13525 28837 25379 8169 1055 1012 1037 1012 21358 2015 4085 24087 13964 1997 7848 28837 25379 8169 1055 1012 1037 1012 3058 1024 2654 1011 15476 1011 2760 4413 1010 3927 1046 2546 5817 4070 2193 1024 1008 1008 1011 1008 1008 1008 1008 1011 1008 1008 1008 1008 4185 12740 10765 1048 1011 8492 10765 6904 2595 2193 3486 2509 1011 2538 1011 4749 10790 22394 2629 10373 4769 1024 1042 13473 3286 2546 1030 28837 25379 1012 4012 16184 6541 4636 5183 3119 4751 1996 8911 2038 2363 2115 8128 2000 15389 1996 2206 3119 1012 16184 6541 4636 5183 1011 102


INFO:tensorflow:input_ids: 101 24869 2100 22181 4522 5211 2578 5183 1012 4466 11968 1011 2474 1011 20184 2346 1010 7621 4805 2549 5226 20287 14526 13525 28837 25379 8169 1055 1012 1037 1012 21358 2015 4085 24087 13964 1997 7848 28837 25379 8169 1055 1012 1037 1012 3058 1024 2654 1011 15476 1011 2760 4413 1010 3927 1046 2546 5817 4070 2193 1024 1008 1008 1011 1008 1008 1008 1008 1011 1008 1008 1008 1008 4185 12740 10765 1048 1011 8492 10765 6904 2595 2193 3486 2509 1011 2538 1011 4749 10790 22394 2629 10373 4769 1024 1042 13473 3286 2546 1030 28837 25379 1012 4012 16184 6541 4636 5183 3119 4751 1996 8911 2038 2363 2115 8128 2000 15389 1996 2206 3119 1012 16184 6541 4636 5183 1011 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 4 (id = 4)


INFO:tensorflow:label: 4 (id = 4)


INFO:tensorflow:*** Example ***


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] blue ##or ##cha ##rd micro ##fin ##ance fund societe d ’ invest ##isse ##ment a capital variable registered office : 2 , rue d ’ alsace , l - 112 ##2 luxembourg r . c . s . luxembourg : b ##66 ##25 ##8 ( the « fund » ) proxy for use at the annual general meeting of shareholders of the fund on 22 november 2019 at 3 . 00 pm or any rec ##on ##ven ##ing or ad ##jou ##rn ##ment thereof ( the " meeting " ) we , _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ [SEP]


INFO:tensorflow:tokens: [CLS] blue ##or ##cha ##rd micro ##fin ##ance fund societe d ’ invest ##isse ##ment a capital variable registered office : 2 , rue d ’ alsace , l - 112 ##2 luxembourg r . c . s . luxembourg : b ##66 ##25 ##8 ( the « fund » ) proxy for use at the annual general meeting of shareholders of the fund on 22 november 2019 at 3 . 00 pm or any rec ##on ##ven ##ing or ad ##jou ##rn ##ment thereof ( the " meeting " ) we , _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ [SEP]


INFO:tensorflow:input_ids: 101 2630 2953 7507 4103 12702 16294 6651 4636 18341 1040 1521 15697 23491 3672 1037 3007 8023 5068 2436 1024 1016 1010 13413 1040 1521 24922 1010 1048 1011 11176 2475 10765 1054 1012 1039 1012 1055 1012 10765 1024 1038 28756 17788 2620 1006 1996 1077 4636 1090 1007 24540 2005 2224 2012 1996 3296 2236 3116 1997 15337 1997 1996 4636 2006 2570 2281 10476 2012 1017 1012 4002 7610 2030 2151 28667 2239 8159 2075 2030 4748 23099 6826 3672 21739 1006 1996 1000 3116 1000 1007 2057 1010 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 102


INFO:tensorflow:input_ids: 101 2630 2953 7507 4103 12702 16294 6651 4636 18341 1040 1521 15697 23491 3672 1037 3007 8023 5068 2436 1024 1016 1010 13413 1040 1521 24922 1010 1048 1011 11176 2475 10765 1054 1012 1039 1012 1055 1012 10765 1024 1038 28756 17788 2620 1006 1996 1077 4636 1090 1007 24540 2005 2224 2012 1996 3296 2236 3116 1997 15337 1997 1996 4636 2006 2570 2281 10476 2012 1017 1012 4002 7610 2030 2151 28667 2239 8159 2075 2030 4748 23099 6826 3672 21739 1006 1996 1000 3116 1000 1007 2057 1010 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 1035 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 10 (id = 10)


INFO:tensorflow:label: 10 (id = 10)


INFO:tensorflow:Writing example 0 of 2000


INFO:tensorflow:Writing example 0 of 2000


INFO:tensorflow:*** Example ***


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] ram ( cayman ) systematic l / s european e ##qui ##ties fund ltd account statement for the month ended may 31 , 2017 ( una ##udi ##ted ) clears ##tream banking s . a . af ##s 104 ##60 ##2 clears ##tream banking s . a . 42 , avenue j ##f kennedy l - 1855 luxembourg luxembourg email : f ##s ##10 ##46 ##0 ##2 ##holding ##s @ clears ##tream . com shareholder summary shares na ##v per investment mt ##d y ##t ##d held share value % return % return class ib series 1 159 . 510 $ 126 . 79 $ 20 , 224 . 71 0 . 93 % 5 . 96 % total $ 20 , 224 . 71 total [SEP]


INFO:tensorflow:tokens: [CLS] ram ( cayman ) systematic l / s european e ##qui ##ties fund ltd account statement for the month ended may 31 , 2017 ( una ##udi ##ted ) clears ##tream banking s . a . af ##s 104 ##60 ##2 clears ##tream banking s . a . 42 , avenue j ##f kennedy l - 1855 luxembourg luxembourg email : f ##s ##10 ##46 ##0 ##2 ##holding ##s @ clears ##tream . com shareholder summary shares na ##v per investment mt ##d y ##t ##d held share value % return % return class ib series 1 159 . 510 $ 126 . 79 $ 20 , 224 . 71 0 . 93 % 5 . 96 % total $ 20 , 224 . 71 total [SEP]


INFO:tensorflow:input_ids: 101 8223 1006 26164 1007 11778 1048 1013 1055 2647 1041 15549 7368 4636 5183 4070 4861 2005 1996 3204 3092 2089 2861 1010 2418 1006 14477 21041 3064 1007 28837 25379 8169 1055 1012 1037 1012 21358 2015 9645 16086 2475 28837 25379 8169 1055 1012 1037 1012 4413 1010 3927 1046 2546 5817 1048 1011 8492 10765 10765 10373 1024 1042 2015 10790 21472 2692 2475 23410 2015 1030 28837 25379 1012 4012 18668 12654 6661 6583 2615 2566 5211 11047 2094 1061 2102 2094 2218 3745 3643 1003 2709 1003 2709 2465 21307 2186 1015 18914 1012 23475 1002 14010 1012 6535 1002 2322 1010 19711 1012 6390 1014 1012 6109 1003 1019 1012 5986 1003 2561 1002 2322 1010 19711 1012 6390 2561 102


INFO:tensorflow:input_ids: 101 8223 1006 26164 1007 11778 1048 1013 1055 2647 1041 15549 7368 4636 5183 4070 4861 2005 1996 3204 3092 2089 2861 1010 2418 1006 14477 21041 3064 1007 28837 25379 8169 1055 1012 1037 1012 21358 2015 9645 16086 2475 28837 25379 8169 1055 1012 1037 1012 4413 1010 3927 1046 2546 5817 1048 1011 8492 10765 10765 10373 1024 1042 2015 10790 21472 2692 2475 23410 2015 1030 28837 25379 1012 4012 18668 12654 6661 6583 2615 2566 5211 11047 2094 1061 2102 2094 2218 3745 3643 1003 2709 1003 2709 2465 21307 2186 1015 18914 1012 23475 1002 14010 1012 6535 1002 2322 1010 19711 1012 6390 1014 1012 6109 1003 1019 1012 5986 1003 2561 1002 2322 1010 19711 1012 6390 2561 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 1 (id = 1)


INFO:tensorflow:label: 1 (id = 1)


INFO:tensorflow:*** Example ***


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] os ##pre ##y supermarket income and growth 1 lp unit trust application form this application form should be completed in capitals in full and signed and sent by post or by hand to thompson tara ##z deposit ##ary ltd , 47 park lane , london w ##1 ##k 1 ##pr full name of pension scheme ( exactly as it appears on the hm ##rc online system ) ( “ the applicant ” ) : designation : name of pension scheme trustee ( s ) : name of pension scheme administrator : pension schemes office tax reference ( ps ##tr ) number : as evidence of the applicant ’ s ps ##tr number and registration with hm ##rc , please attach to this application form a certified [SEP]


INFO:tensorflow:tokens: [CLS] os ##pre ##y supermarket income and growth 1 lp unit trust application form this application form should be completed in capitals in full and signed and sent by post or by hand to thompson tara ##z deposit ##ary ltd , 47 park lane , london w ##1 ##k 1 ##pr full name of pension scheme ( exactly as it appears on the hm ##rc online system ) ( “ the applicant ” ) : designation : name of pension scheme trustee ( s ) : name of pension scheme administrator : pension schemes office tax reference ( ps ##tr ) number : as evidence of the applicant ’ s ps ##tr number and registration with hm ##rc , please attach to this application form a certified [SEP]


INFO:tensorflow:input_ids: 101 9808 28139 2100 17006 3318 1998 3930 1015 6948 3131 3404 4646 2433 2023 4646 2433 2323 2022 2949 1999 15433 1999 2440 1998 2772 1998 2741 2011 2695 2030 2011 2192 2000 5953 10225 2480 12816 5649 5183 1010 4700 2380 4644 1010 2414 1059 2487 2243 1015 18098 2440 2171 1997 11550 5679 1006 3599 2004 2009 3544 2006 1996 20287 11890 3784 2291 1007 1006 1523 1996 23761 1524 1007 1024 8259 1024 2171 1997 11550 5679 13209 1006 1055 1007 1024 2171 1997 11550 5679 8911 1024 11550 11683 2436 4171 4431 1006 8827 16344 1007 2193 1024 2004 3350 1997 1996 23761 1521 1055 8827 16344 2193 1998 8819 2007 20287 11890 1010 3531 22476 2000 2023 4646 2433 1037 7378 102


INFO:tensorflow:input_ids: 101 9808 28139 2100 17006 3318 1998 3930 1015 6948 3131 3404 4646 2433 2023 4646 2433 2323 2022 2949 1999 15433 1999 2440 1998 2772 1998 2741 2011 2695 2030 2011 2192 2000 5953 10225 2480 12816 5649 5183 1010 4700 2380 4644 1010 2414 1059 2487 2243 1015 18098 2440 2171 1997 11550 5679 1006 3599 2004 2009 3544 2006 1996 20287 11890 3784 2291 1007 1006 1523 1996 23761 1524 1007 1024 8259 1024 2171 1997 11550 5679 13209 1006 1055 1007 1024 2171 1997 11550 5679 8911 1024 11550 11683 2436 4171 4431 1006 8827 16344 1007 2193 1024 2004 3350 1997 1996 23761 1521 1055 8827 16344 2193 1998 8819 2007 20287 11890 1010 3531 22476 2000 2023 4646 2433 1037 7378 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 6 (id = 6)


INFO:tensorflow:label: 6 (id = 6)


INFO:tensorflow:*** Example ***


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] 14 nov 2019 clears ##tream banking sa attention investment fund op ##ps 42 avenue j ##f kennedy luxembourg l ##18 ##55 luxembourg client copy account name : clears ##tream banking sa trade date : 13 / 11 / 2019 account number : 800 ##14 ##90 ##9 settlement date : 19 / 11 / 2019 portfolio number : 01 ##40 designation : 247 ##9 ##2 in accordance with your instructions we have today bought from you : sv ##s church house investment grade fixed interest fund income units is ##in : gb ##00 ##0 ##47 ##43 ##8 ##28 deal reference deal time units price : gb ##p amount red ##eem ##ed : gb ##p dil ##ution levy : gb ##p net consideration : gb ##p cs ##22 [SEP]


INFO:tensorflow:tokens: [CLS] 14 nov 2019 clears ##tream banking sa attention investment fund op ##ps 42 avenue j ##f kennedy luxembourg l ##18 ##55 luxembourg client copy account name : clears ##tream banking sa trade date : 13 / 11 / 2019 account number : 800 ##14 ##90 ##9 settlement date : 19 / 11 / 2019 portfolio number : 01 ##40 designation : 247 ##9 ##2 in accordance with your instructions we have today bought from you : sv ##s church house investment grade fixed interest fund income units is ##in : gb ##00 ##0 ##47 ##43 ##8 ##28 deal reference deal time units price : gb ##p amount red ##eem ##ed : gb ##p dil ##ution levy : gb ##p net consideration : gb ##p cs ##22 [SEP]


INFO:tensorflow:input_ids: 101 2403 13292 10476 28837 25379 8169 7842 3086 5211 4636 6728 4523 4413 3927 1046 2546 5817 10765 1048 15136 24087 10765 7396 6100 4070 2171 1024 28837 25379 8169 7842 3119 3058 1024 2410 1013 2340 1013 10476 4070 2193 1024 5385 16932 21057 2683 4093 3058 1024 2539 1013 2340 1013 10476 11103 2193 1024 5890 12740 8259 1024 23380 2683 2475 1999 10388 2007 2115 8128 2057 2031 2651 4149 2013 2017 1024 17917 2015 2277 2160 5211 3694 4964 3037 4636 3318 3197 2003 2378 1024 16351 8889 2692 22610 23777 2620 22407 3066 4431 3066 2051 3197 3976 1024 16351 2361 3815 2417 21564 2098 1024 16351 2361 29454 13700 12767 1024 16351 2361 5658 9584 1024 16351 2361 20116 19317 102


INFO:tensorflow:input_ids: 101 2403 13292 10476 28837 25379 8169 7842 3086 5211 4636 6728 4523 4413 3927 1046 2546 5817 10765 1048 15136 24087 10765 7396 6100 4070 2171 1024 28837 25379 8169 7842 3119 3058 1024 2410 1013 2340 1013 10476 4070 2193 1024 5385 16932 21057 2683 4093 3058 1024 2539 1013 2340 1013 10476 11103 2193 1024 5890 12740 8259 1024 23380 2683 2475 1999 10388 2007 2115 8128 2057 2031 2651 4149 2013 2017 1024 17917 2015 2277 2160 5211 3694 4964 3037 4636 3318 3197 2003 2378 1024 16351 8889 2692 22610 23777 2620 22407 3066 4431 3066 2051 3197 3976 1024 16351 2361 3815 2417 21564 2098 1024 16351 2361 29454 13700 12767 1024 16351 2361 5658 9584 1024 16351 2361 20116 19317 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)


INFO:tensorflow:label: 0 (id = 0)


INFO:tensorflow:*** Example ***


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] from : + 35 ##3 ( 0 ) 21 49 ##1 03 ##35 page : 1 / 1 date : 12 . 11 . 2019 ##15 : 57 : 08 to : vp ##bm ##ka ##g fr ##or : clears ##tream af ##s fa ##x : + 35 ##3 ( o , 21 49 ##1 03 ##35 ko ##fa ##x ' ) at : 19 - l 1 - 12 - 15 : 56 doc : 125 page : 0 ##ol clears ##ke ##am deutsche bo ##ß ##se ##group dat ##b l ##2 - nov - 2019 pages i com ##pa } i ##y vp ##bank ##ag location va ##du ##z , lie ##ct ##ite ##nstein attention tt ##ans ##fen fr ##ea of pay ##nc ##nt [SEP]


INFO:tensorflow:tokens: [CLS] from : + 35 ##3 ( 0 ) 21 49 ##1 03 ##35 page : 1 / 1 date : 12 . 11 . 2019 ##15 : 57 : 08 to : vp ##bm ##ka ##g fr ##or : clears ##tream af ##s fa ##x : + 35 ##3 ( o , 21 49 ##1 03 ##35 ko ##fa ##x ' ) at : 19 - l 1 - 12 - 15 : 56 doc : 125 page : 0 ##ol clears ##ke ##am deutsche bo ##ß ##se ##group dat ##b l ##2 - nov - 2019 pages i com ##pa } i ##y vp ##bank ##ag location va ##du ##z , lie ##ct ##ite ##nstein attention tt ##ans ##fen fr ##ea of pay ##nc ##nt [SEP]


INFO:tensorflow:input_ids: 101 2013 1024 1009 3486 2509 1006 1014 1007 2538 4749 2487 6021 19481 3931 1024 1015 1013 1015 3058 1024 2260 1012 2340 1012 10476 16068 1024 5401 1024 5511 2000 1024 21210 25526 2912 2290 10424 2953 1024 28837 25379 21358 2015 6904 2595 1024 1009 3486 2509 1006 1051 1010 2538 4749 2487 6021 19481 12849 7011 2595 1005 1007 2012 1024 2539 1011 1048 1015 1011 2260 1011 2321 1024 5179 9986 1024 8732 3931 1024 1014 4747 28837 3489 3286 11605 8945 19310 3366 17058 23755 2497 1048 2475 1011 13292 1011 10476 5530 1045 4012 4502 1065 1045 2100 21210 9299 8490 3295 12436 8566 2480 1010 4682 6593 4221 15493 3086 23746 6962 18940 10424 5243 1997 3477 12273 3372 102


INFO:tensorflow:input_ids: 101 2013 1024 1009 3486 2509 1006 1014 1007 2538 4749 2487 6021 19481 3931 1024 1015 1013 1015 3058 1024 2260 1012 2340 1012 10476 16068 1024 5401 1024 5511 2000 1024 21210 25526 2912 2290 10424 2953 1024 28837 25379 21358 2015 6904 2595 1024 1009 3486 2509 1006 1051 1010 2538 4749 2487 6021 19481 12849 7011 2595 1005 1007 2012 1024 2539 1011 1048 1015 1011 2260 1011 2321 1024 5179 9986 1024 8732 3931 1024 1014 4747 28837 3489 3286 11605 8945 19310 3366 17058 23755 2497 1048 2475 1011 13292 1011 10476 5530 1045 4012 4502 1065 1045 2100 21210 9299 8490 3295 12436 8566 2480 1010 4682 6593 4221 15493 3086 23746 6962 18940 10424 5243 1997 3477 12273 3372 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 3 (id = 3)


INFO:tensorflow:label: 3 (id = 3)


INFO:tensorflow:*** Example ***


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] key investor information wellington management portfolio ##s ( dublin ) p . l . c . this document provides you with key investor information about this sub - fund . it is not marketing material . the information is required by law to help you understand the nature and the risks of investing in this sub - fund . you are advised to read it so you can make an informed decision about whether to invest . global bond portfolio usd class d acc ##um ##ulating hedge ##d global bond portfolio ( the " portfolio " ) a sub - fund of wellington management portfolio ##s ( dublin ) p . l . c . ( the " company " ) is ##in : ie ##00 [SEP]


INFO:tensorflow:tokens: [CLS] key investor information wellington management portfolio ##s ( dublin ) p . l . c . this document provides you with key investor information about this sub - fund . it is not marketing material . the information is required by law to help you understand the nature and the risks of investing in this sub - fund . you are advised to read it so you can make an informed decision about whether to invest . global bond portfolio usd class d acc ##um ##ulating hedge ##d global bond portfolio ( the " portfolio " ) a sub - fund of wellington management portfolio ##s ( dublin ) p . l . c . ( the " company " ) is ##in : ie ##00 [SEP]


INFO:tensorflow:input_ids: 101 3145 14316 2592 8409 2968 11103 2015 1006 5772 1007 1052 1012 1048 1012 1039 1012 2023 6254 3640 2017 2007 3145 14316 2592 2055 2023 4942 1011 4636 1012 2009 2003 2025 5821 3430 1012 1996 2592 2003 3223 2011 2375 2000 2393 2017 3305 1996 3267 1998 1996 10831 1997 19920 1999 2023 4942 1011 4636 1012 2017 2024 9449 2000 3191 2009 2061 2017 2064 2191 2019 6727 3247 2055 3251 2000 15697 1012 3795 5416 11103 13751 2465 1040 16222 2819 10924 17834 2094 3795 5416 11103 1006 1996 1000 11103 1000 1007 1037 4942 1011 4636 1997 8409 2968 11103 2015 1006 5772 1007 1052 1012 1048 1012 1039 1012 1006 1996 1000 2194 1000 1007 2003 2378 1024 29464 8889 102


INFO:tensorflow:input_ids: 101 3145 14316 2592 8409 2968 11103 2015 1006 5772 1007 1052 1012 1048 1012 1039 1012 2023 6254 3640 2017 2007 3145 14316 2592 2055 2023 4942 1011 4636 1012 2009 2003 2025 5821 3430 1012 1996 2592 2003 3223 2011 2375 2000 2393 2017 3305 1996 3267 1998 1996 10831 1997 19920 1999 2023 4942 1011 4636 1012 2017 2024 9449 2000 3191 2009 2061 2017 2064 2191 2019 6727 3247 2055 3251 2000 15697 1012 3795 5416 11103 13751 2465 1040 16222 2819 10924 17834 2094 3795 5416 11103 1006 1996 1000 11103 1000 1007 1037 4942 1011 4636 1997 8409 2968 11103 2015 1006 5772 1007 1052 1012 1048 1012 1039 1012 1006 1996 1000 2194 1000 1007 2003 2378 1024 29464 8889 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 11 (id = 11)


INFO:tensorflow:label: 11 (id = 11)


#Creating a model

Now that we've prepared our data, let's focus on building a model. `create_model` does just this below. First, it loads the BERT tf hub module again (this time to extract the computation graph). Next, it creates a single new layer that will be trained to adapt BERT to our sentiment task (i.e. classifying whether a movie review is positive or negative). This strategy of using a mostly trained model is called [fine-tuning](http://wiki.fast.ai/index.php/Fine_tuning).

In [0]:
def create_model(is_predicting, input_ids, input_mask, segment_ids, labels,
                 num_labels):
  """Creates a classification model."""

  bert_module = hub.Module(
      BERT_MODEL_HUB,
      trainable=True)
  bert_inputs = dict(
      input_ids=input_ids,
      input_mask=input_mask,
      segment_ids=segment_ids)
  bert_outputs = bert_module(
      inputs=bert_inputs,
      signature="tokens",
      as_dict=True)

  # Use "pooled_output" for classification tasks on an entire sentence.
  # Use "sequence_outputs" for token-level output.
  output_layer = bert_outputs["pooled_output"]

  hidden_size = output_layer.shape[-1].value

  # Create our own layer to tune for politeness data.
  output_weights = tf.get_variable(
      "output_weights", [num_labels, hidden_size],
      initializer=tf.truncated_normal_initializer(stddev=0.02))

  output_bias = tf.get_variable(
      "output_bias", [num_labels], initializer=tf.zeros_initializer())

  with tf.variable_scope("loss"):

    # Dropout helps prevent overfitting
    output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)

    logits = tf.matmul(output_layer, output_weights, transpose_b=True)
    logits = tf.nn.bias_add(logits, output_bias)
    log_probs = tf.nn.log_softmax(logits, axis=-1)

    # Convert labels into one-hot encoding
    one_hot_labels = tf.one_hot(labels, depth=num_labels, dtype=tf.float32)

    predicted_labels = tf.squeeze(tf.argmax(log_probs, axis=-1, output_type=tf.int32))
    # If we're predicting, we want predicted labels and the probabiltiies.
    if is_predicting:
      return (predicted_labels, log_probs)

    # If we're train/eval, compute loss between predicted and actual label
    per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1)
    loss = tf.reduce_mean(per_example_loss)
    return (loss, predicted_labels, log_probs)


Next we'll wrap our model function in a `model_fn_builder` function that adapts our model to work for training, evaluation, and prediction.

In [0]:
# model_fn_builder actually creates our model function
# using the passed parameters for num_labels, learning_rate, etc.
def model_fn_builder(num_labels, learning_rate, num_train_steps,
                     num_warmup_steps):
  """Returns `model_fn` closure for TPUEstimator."""
  def model_fn(features, labels, mode, params):  # pylint: disable=unused-argument
    """The `model_fn` for TPUEstimator."""

    input_ids = features["input_ids"]
    input_mask = features["input_mask"]
    segment_ids = features["segment_ids"]
    label_ids = features["label_ids"]

    is_predicting = (mode == tf.estimator.ModeKeys.PREDICT)
    
    # TRAIN and EVAL
    if not is_predicting:

      (loss, predicted_labels, log_probs) = create_model(
        is_predicting, input_ids, input_mask, segment_ids, label_ids, num_labels)

      train_op = bert.optimization.create_optimizer(
          loss, learning_rate, num_train_steps, num_warmup_steps, use_tpu=False)

      # Calculate evaluation metrics. 
      def metric_fn(label_ids, predicted_labels):
        accuracy = tf.metrics.accuracy(label_ids, predicted_labels)
        f1_score = tf.contrib.metrics.f1_score(
            label_ids,
            predicted_labels)
        auc = tf.metrics.auc(
            label_ids,
            predicted_labels)
        recall = tf.metrics.recall(
            label_ids,
            predicted_labels)
        precision = tf.metrics.precision(
            label_ids,
            predicted_labels) 
        true_pos = tf.metrics.true_positives(
            label_ids,
            predicted_labels)
        true_neg = tf.metrics.true_negatives(
            label_ids,
            predicted_labels)   
        false_pos = tf.metrics.false_positives(
            label_ids,
            predicted_labels)  
        false_neg = tf.metrics.false_negatives(
            label_ids,
            predicted_labels)
        return {
            "eval_accuracy": accuracy,
            "f1_score": f1_score,
            "auc": auc,
            "precision": precision,
            "recall": recall,
            "true_positives": true_pos,
            "true_negatives": true_neg,
            "false_positives": false_pos,
            "false_negatives": false_neg
        }

      eval_metrics = metric_fn(label_ids, predicted_labels)

      if mode == tf.estimator.ModeKeys.TRAIN:
        return tf.estimator.EstimatorSpec(mode=mode,
          loss=loss,
          train_op=train_op)
      else:
          return tf.estimator.EstimatorSpec(mode=mode,
            loss=loss,
            eval_metric_ops=eval_metrics)
    else:
      (predicted_labels, log_probs) = create_model(
        is_predicting, input_ids, input_mask, segment_ids, label_ids, num_labels)

      predictions = {
          'probabilities': log_probs,
          'labels': predicted_labels
      }
      return tf.estimator.EstimatorSpec(mode, predictions=predictions)

  # Return the actual model function in the closure
  return model_fn


In [0]:
# Compute train and warmup steps from batch size
# These hyperparameters are copied from this colab notebook (https://colab.sandbox.google.com/github/tensorflow/tpu/blob/master/tools/colab/bert_finetuning_with_cloud_tpus.ipynb)
BATCH_SIZE = 32
LEARNING_RATE = 2e-5
NUM_TRAIN_EPOCHS = 3.0
# Warmup is a period of time where hte learning rate 
# is small and gradually increases--usually helps training.
WARMUP_PROPORTION = 0.1
# Model configs
SAVE_CHECKPOINTS_STEPS = 500
SAVE_SUMMARY_STEPS = 100

In [0]:
# Compute # train and warmup steps from batch size
num_train_steps = int(len(train_features) / BATCH_SIZE * NUM_TRAIN_EPOCHS)
num_warmup_steps = int(num_train_steps * WARMUP_PROPORTION)

In [0]:
# Specify outpit directory and number of checkpoint steps to save
run_config = tf.estimator.RunConfig(
    model_dir=OUTPUT_DIR,
    save_summary_steps=SAVE_SUMMARY_STEPS,
    save_checkpoints_steps=SAVE_CHECKPOINTS_STEPS)

In [44]:
model_fn = model_fn_builder(
  num_labels=len(label_list),
  learning_rate=LEARNING_RATE,
  num_train_steps=num_train_steps,
  num_warmup_steps=num_warmup_steps)

estimator = tf.estimator.Estimator(
  model_fn=model_fn,
  config=run_config,
  params={"batch_size": BATCH_SIZE})


INFO:tensorflow:Using config: {'_model_dir': 'OUTPUT_DIR_NAME', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 500, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f3fbc905940>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


INFO:tensorflow:Using config: {'_model_dir': 'OUTPUT_DIR_NAME', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 500, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f3fbc905940>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


Next we create an input builder function that takes our training feature set (`train_features`) and produces a generator. This is a pretty standard design pattern for working with Tensorflow [Estimators](https://www.tensorflow.org/guide/estimators).

In [0]:
# Create an input function for training. drop_remainder = True for using TPUs.
train_input_fn = bert.run_classifier.input_fn_builder(
    features=train_features,
    seq_length=MAX_SEQ_LENGTH,
    is_training=True,
    drop_remainder=False)

Now we train our model! For me, using a Colab notebook running on Google's GPUs, my training time was about 14 minutes.

In [46]:
print(f'Beginning Training!')
current_time = datetime.now()
estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
print("Training took time ", datetime.now() - current_time)

Beginning Training!
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.


Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.


INFO:tensorflow:Calling model_fn.


INFO:tensorflow:Calling model_fn.


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.


Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.




















Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where


Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.



The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.



Instructions for updating:
Deprecated in favor of operator or tf.math.divide.


Instructions for updating:
Deprecated in favor of operator or tf.math.divide.


INFO:tensorflow:Done calling model_fn.


INFO:tensorflow:Done calling model_fn.


INFO:tensorflow:Create CheckpointSaverHook.


INFO:tensorflow:Create CheckpointSaverHook.


INFO:tensorflow:Graph was finalized.


INFO:tensorflow:Graph was finalized.


INFO:tensorflow:Running local_init_op.


INFO:tensorflow:Running local_init_op.


INFO:tensorflow:Done running local_init_op.


INFO:tensorflow:Done running local_init_op.


INFO:tensorflow:Saving checkpoints for 0 into OUTPUT_DIR_NAME/model.ckpt.


INFO:tensorflow:Saving checkpoints for 0 into OUTPUT_DIR_NAME/model.ckpt.


INFO:tensorflow:loss = 3.147037, step = 0


INFO:tensorflow:loss = 3.147037, step = 0


INFO:tensorflow:global_step/sec: 0.563831


INFO:tensorflow:global_step/sec: 0.563831


INFO:tensorflow:loss = 0.5654856, step = 100 (177.360 sec)


INFO:tensorflow:loss = 0.5654856, step = 100 (177.360 sec)


INFO:tensorflow:global_step/sec: 0.632483


INFO:tensorflow:global_step/sec: 0.632483


INFO:tensorflow:loss = 0.3138714, step = 200 (158.107 sec)


INFO:tensorflow:loss = 0.3138714, step = 200 (158.107 sec)


INFO:tensorflow:Saving checkpoints for 281 into OUTPUT_DIR_NAME/model.ckpt.


INFO:tensorflow:Saving checkpoints for 281 into OUTPUT_DIR_NAME/model.ckpt.


INFO:tensorflow:Loss for final step: 0.52769756.


INFO:tensorflow:Loss for final step: 0.52769756.


Training took time  0:08:42.558131


Now let's use our test data to see how well our model did:

In [0]:
test_input_fn = run_classifier.input_fn_builder(
    features=test_features,
    seq_length=MAX_SEQ_LENGTH,
    is_training=False,
    drop_remainder=False)

In [48]:
estimator.evaluate(input_fn=test_input_fn, steps=None)

INFO:tensorflow:Calling model_fn.


INFO:tensorflow:Calling model_fn.


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


INFO:tensorflow:Done calling model_fn.


INFO:tensorflow:Done calling model_fn.


INFO:tensorflow:Starting evaluation at 2019-12-16T18:07:16Z


INFO:tensorflow:Starting evaluation at 2019-12-16T18:07:16Z


INFO:tensorflow:Graph was finalized.


INFO:tensorflow:Graph was finalized.


INFO:tensorflow:Restoring parameters from OUTPUT_DIR_NAME/model.ckpt-281


INFO:tensorflow:Restoring parameters from OUTPUT_DIR_NAME/model.ckpt-281


INFO:tensorflow:Running local_init_op.


INFO:tensorflow:Running local_init_op.


INFO:tensorflow:Done running local_init_op.


INFO:tensorflow:Done running local_init_op.


InvalidArgumentError: ignored

Now let's write code to make predictions on new sentences:

In [0]:
def getPrediction(in_sentences):
  labels = ["Trade Contract Note", "Statements","Performance Reports", "Transfer Contract Note", "Trade ACK", "Prospectus","Sub Docs",                   
"Capital Call Notices", "Financial Statements", "Red Docs", "Meetings AGM EGM", "KIID Documents", "Distribution Notice",         
"Dividend Confirmation", "Corporate Action CN", "NAV Report", "FINRA", "Equalisation CN", "Stock Transfer Forms", "Compulsory Red CN",            
"Liquidations", "Switch Contract Note" ]
  input_examples = [run_classifier.InputExample(guid="", text_a = x, text_b = None, label = 0) for x in in_sentences] # here, "" is just a dummy label
  input_features = run_classifier.convert_examples_to_features(input_examples, label_list, MAX_SEQ_LENGTH, tokenizer)
  predict_input_fn = run_classifier.input_fn_builder(features=input_features, seq_length=MAX_SEQ_LENGTH, is_training=False, drop_remainder=False)
  predictions = estimator.predict(predict_input_fn)
  return [(sentence, prediction['probabilities'], labels[prediction['labels']]) for sentence, prediction in zip(in_sentences, predictions)]

In [0]:
pred_sentences = ["CAPITAL CALL NOTIFICATION LETTER Global Infrastructure Solutions 3",
"Feeder SCSP July 20, 2018 Dear Investor In accordance with Sections 3.1 and 4.1 of the Amended and",
"Restated Deed of Limited Partnership, we are writing to inform you of the Global Infrastructure Solutions 3 Feeder SCSp’s capital call.",
"The capital call is primarily being drawn down to fund new and follow-on investments. Your capital call details are located on the accompanying notice.",
"The amount due is payable on August 03, 2018 in Euros. Sincerely yours, Serge Lauper Managing Director BlackRock Infrastructure INV ID# 21138394", 
"Global Infrastructure Solutions 3 Feeder SCSp Capital Call Notification ",
"July 20, 2018 PAYMENT DUE AUGUST 03, 2018 Clearstream Banking S.A. AFS 145326 This mailing constitutes a capital call notice for Global ",
"Infrastructure Solutions 3 Feeder SCSp. Please note the total amount due and ensure that the capital call is fully funded no later than the due date. ",
"Capital Call Components - Cash 1 Investments 174,548 Total Capital Call € 174,548 Partner's ",
"Capital Activity Summary Commitment 100.00 % 3,000,000 Funded commitment before this capital call 33.96 % 1,018,765 Current capital call", 
"5.82 % 174,548 Funded commitment after this capital call 39.78 % 1,193,313 Unfunded commitment after this capital call 60.22 %   € 1,806,687 ",
"Please use the instructions below for paying your capital call. When remitting your payment, please ensure that adequate provision is made for",
 "bank charges and transfer costs. Name of Bank: Deutsche Bank AG Swift Number: DEUTDEFFXXX SBOSGB2XXXX, State Street Bank & Trust Account Name: ",
 "Company, London Treasury IBAN: DE43 5007 0010 0927 361600 Amount Due: € 174,548 Due Date: August 03, 2018 BLTNEUR01 - ",
 "Clearstream Banking S.A. AFS Reference: 145326 To properly allocate your payment to our bank account please include BLTNEUR01 in the reference ",
" of the payment. Should you have any questions, please do not hesitate to contact us at +44-207-743-2603 or via email    realassets@blackrock.com. ",
 "Also, to ensure the accuracy of the Fund’s records and your timely receipt of all capital call notification, cash distributions and general",
 " correspondence, please report all changes promptly. 1 Decreases unfunded commitment INV ID# 21138394    Copyright © 2018 BlackRock. All Rights Reserved. "]




In [53]:
predictions = getPrediction(pred_sentences)

INFO:tensorflow:Writing example 0 of 18


INFO:tensorflow:Writing example 0 of 18


INFO:tensorflow:*** Example ***


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: 


INFO:tensorflow:guid: 


INFO:tensorflow:tokens: [CLS] capital call notification letter global infrastructure solutions 3 [SEP]


INFO:tensorflow:tokens: [CLS] capital call notification letter global infrastructure solutions 3 [SEP]


INFO:tensorflow:input_ids: 101 3007 2655 26828 3661 3795 6502 7300 1017 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_ids: 101 3007 2655 26828 3661 3795 6502 7300 1017 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)


INFO:tensorflow:label: 0 (id = 0)


INFO:tensorflow:*** Example ***


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: 


INFO:tensorflow:guid: 


INFO:tensorflow:tokens: [CLS] feeder sc ##sp july 20 , 2018 dear investor in accordance with sections 3 . 1 and 4 . 1 of the amended and [SEP]


INFO:tensorflow:tokens: [CLS] feeder sc ##sp july 20 , 2018 dear investor in accordance with sections 3 . 1 and 4 . 1 of the amended and [SEP]


INFO:tensorflow:input_ids: 101 21429 8040 13102 2251 2322 1010 2760 6203 14316 1999 10388 2007 5433 1017 1012 1015 1998 1018 1012 1015 1997 1996 13266 1998 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_ids: 101 21429 8040 13102 2251 2322 1010 2760 6203 14316 1999 10388 2007 5433 1017 1012 1015 1998 1018 1012 1015 1997 1996 13266 1998 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)


INFO:tensorflow:label: 0 (id = 0)


INFO:tensorflow:*** Example ***


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: 


INFO:tensorflow:guid: 


INFO:tensorflow:tokens: [CLS] rest ##ated deed of limited partnership , we are writing to inform you of the global infrastructure solutions 3 feeder sc ##sp ’ s capital call . [SEP]


INFO:tensorflow:tokens: [CLS] rest ##ated deed of limited partnership , we are writing to inform you of the global infrastructure solutions 3 feeder sc ##sp ’ s capital call . [SEP]


INFO:tensorflow:input_ids: 101 2717 4383 15046 1997 3132 5386 1010 2057 2024 3015 2000 12367 2017 1997 1996 3795 6502 7300 1017 21429 8040 13102 1521 1055 3007 2655 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_ids: 101 2717 4383 15046 1997 3132 5386 1010 2057 2024 3015 2000 12367 2017 1997 1996 3795 6502 7300 1017 21429 8040 13102 1521 1055 3007 2655 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)


INFO:tensorflow:label: 0 (id = 0)


INFO:tensorflow:*** Example ***


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: 


INFO:tensorflow:guid: 


INFO:tensorflow:tokens: [CLS] the capital call is primarily being drawn down to fund new and follow - on investments . your capital call details are located on the accompanying notice . [SEP]


INFO:tensorflow:tokens: [CLS] the capital call is primarily being drawn down to fund new and follow - on investments . your capital call details are located on the accompanying notice . [SEP]


INFO:tensorflow:input_ids: 101 1996 3007 2655 2003 3952 2108 4567 2091 2000 4636 2047 1998 3582 1011 2006 10518 1012 2115 3007 2655 4751 2024 2284 2006 1996 10860 5060 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_ids: 101 1996 3007 2655 2003 3952 2108 4567 2091 2000 4636 2047 1998 3582 1011 2006 10518 1012 2115 3007 2655 4751 2024 2284 2006 1996 10860 5060 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)


INFO:tensorflow:label: 0 (id = 0)


INFO:tensorflow:*** Example ***


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: 


INFO:tensorflow:guid: 


INFO:tensorflow:tokens: [CLS] the amount due is pay ##able on august 03 , 2018 in euros . sincerely yours , serge lau ##per managing director black ##rock infrastructure in ##v id # 211 ##38 ##39 ##4 [SEP]


INFO:tensorflow:tokens: [CLS] the amount due is pay ##able on august 03 , 2018 in euros . sincerely yours , serge lau ##per managing director black ##rock infrastructure in ##v id # 211 ##38 ##39 ##4 [SEP]


INFO:tensorflow:input_ids: 101 1996 3815 2349 2003 3477 3085 2006 2257 6021 1010 2760 1999 19329 1012 25664 6737 1010 21747 21360 4842 6605 2472 2304 16901 6502 1999 2615 8909 1001 19235 22025 23499 2549 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_ids: 101 1996 3815 2349 2003 3477 3085 2006 2257 6021 1010 2760 1999 19329 1012 25664 6737 1010 21747 21360 4842 6605 2472 2304 16901 6502 1999 2615 8909 1001 19235 22025 23499 2549 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)


INFO:tensorflow:label: 0 (id = 0)


INFO:tensorflow:Calling model_fn.


INFO:tensorflow:Calling model_fn.


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Done calling model_fn.


INFO:tensorflow:Done calling model_fn.


INFO:tensorflow:Graph was finalized.


INFO:tensorflow:Graph was finalized.


INFO:tensorflow:Restoring parameters from OUTPUT_DIR_NAME/model.ckpt-281


INFO:tensorflow:Restoring parameters from OUTPUT_DIR_NAME/model.ckpt-281


INFO:tensorflow:Running local_init_op.


INFO:tensorflow:Running local_init_op.


INFO:tensorflow:Done running local_init_op.


INFO:tensorflow:Done running local_init_op.


Great! We have a Fund name classifier!

In [54]:
predictions

[('CAPITAL CALL NOTIFICATION LETTER Global Infrastructure Solutions 3',
  array([-4.829085  , -4.5385303 , -3.588954  , -4.952178  , -4.056877  ,
         -5.4448214 , -4.3627553 , -0.42157695, -5.0326505 , -2.5146303 ,
         -3.650778  , -5.9266686 , -3.8745835 , -4.3029766 , -4.1463785 ,
         -4.142169  , -4.015101  , -4.9833565 , -4.49714   , -3.9920034 ,
         -4.8230915 , -4.452085  ], dtype=float32),
  'Capital Call Notices'),
 ('Feeder SCSP July 20, 2018 Dear Investor In accordance with Sections 3.1 and 4.1 of the Amended and',
  array([-4.938771 , -4.2299557, -2.0524607, -4.472643 , -4.798257 ,
         -1.5666049, -3.648292 , -1.3724289, -3.8219516, -3.478047 ,
         -1.603549 , -4.167856 , -4.8630943, -4.928856 , -4.7478533,
         -4.4528947, -4.8424487, -5.5739536, -5.0385838, -4.859559 ,
         -5.1754646, -5.175075 ], dtype=float32),
  'Capital Call Notices'),
 ('Restated Deed of Limited Partnership, we are writing to inform you of the Global Infrastructu

In [0]:
!ls {'OUTPUT_DIR_NAME'}

checkpoint				     model.ckpt-0.index
eval					     model.ckpt-0.meta
events.out.tfevents.1576259185.f74618c4161e  model.ckpt-468.data-00000-of-00001
graph.pbtxt				     model.ckpt-468.index
model.ckpt-0.data-00000-of-00001	     model.ckpt-468.meta


In [0]:
latest = tf.train.latest_checkpoint('OUTPUT_DIR_NAME')


In [0]:
latest

'OUTPUT_DIR_NAME/model.ckpt-468'

In [0]:
# Create a new model instance
model = create_model(predicting, input_ids, input_mask, segment_ids, labels,num_labels)

# Load the previously saved weights
# model.load_weights(latest)

NameError: ignored

In [0]:
# Save the weights
estimator.save_weights('OUTPUT_DIR_NAME')

# Create a new model instance
model = create_model()

# Restore the weights
model.load_weights('OUTPUT_DIR_NAME')


NameError: ignored

In [0]:
!mkdir -p saved_model

In [0]:
!pip install -q pyyaml h5py