Universal Sentence Encoder lite

The Universal Sentence Encoder (USE) (Cer et al., 2018) is a model that encodes text into 512-dimensional embeddings. These embeddings can then be used as inputs to natural language processing tasks such as sentiment classification and textual similarity analysis.

This module is a TensorFlow.js GraphModel converted from the USE lite module on TensorFlow Hub (TFHub), a lightweight version of the original model. The lite model is based on the Transformer (Vaswani et al., 2017) architecture and uses an 8k wordpiece vocabulary.
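
A wordpiece vocabulary means that text is split into subword pieces before each piece is mapped to an integer id. As a rough illustration of subword tokenization in general, here is a minimal sketch of greedy longest-match-first tokenization in the style of BERT's WordPiece. It is an assumption for illustration only: the toy vocabulary, its ids, the `##` continuation prefix, and the helpers `preTokenize`, `tokenizeWord`, and `toyEncode` are all hypothetical, and the package's own tokenizer may use a different algorithm and will produce different ids.

```javascript
// Greedy longest-match-first subword tokenization in the style of BERT's
// WordPiece. Illustration only: not the algorithm or vocabulary used by
// @tensorflow-models/universal-sentence-encoder.

// Hypothetical toy vocabulary; '##' marks a piece that continues a word.
const toyVocab = new Map([
  ['[UNK]', 0], ['hello', 1], ['how', 2], ['are', 3], ['you', 4],
  ['un', 5], ['##break', 6], ['##able', 7], ['?', 8], [',', 9],
]);

// Lowercase the text and split it into words and single punctuation marks.
function preTokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+|[^\sa-z0-9]/g) || [];
}

// Encode one word by repeatedly taking the longest prefix found in the vocab.
function tokenizeWord(word, vocab) {
  const ids = [];
  let start = 0;
  while (start < word.length) {
    let end = word.length;
    let id;
    // Shrink the candidate piece from the right until it is in the vocab.
    while (end > start) {
      const piece = (start > 0 ? '##' : '') + word.slice(start, end);
      if (vocab.has(piece)) {
        id = vocab.get(piece);
        break;
      }
      end--;
    }
    // Some remainder cannot be matched: map the whole word to [UNK].
    if (id === undefined) return [vocab.get('[UNK]')];
    ids.push(id);
    start = end;
  }
  return ids;
}

// Encode a full string into a flat array of token ids.
function toyEncode(text, vocab) {
  const ids = [];
  for (const word of preTokenize(text)) {
    ids.push(...tokenizeWord(word, vocab));
  }
  return ids;
}

console.log(toyEncode('Hello, how are you?', toyVocab)); // [ 1, 9, 2, 3, 4, 8 ]
console.log(toyEncode('Unbreakable', toyVocab));         // [ 5, 6, 7 ]
```

The greedy strategy keeps a small vocabulary usable for rare words: "unbreakable" is not in the toy vocabulary, but its pieces are.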

In this demo, we embed six sentences with USE and render their self-similarity scores as a matrix (redder means more similar):

[Image: self-similarity matrix of the six sentence embeddings]

The matrix shows that USE embeddings can be used to cluster sentences by similarity.
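
The README does not state which similarity score the demo plots. The sketch below assumes cosine similarity and builds the full matrix from embeddings given as plain arrays of numbers; `dot`, `cosineSimilarity`, and `selfSimilarityMatrix` are hypothetical helpers, and the 3-dimensional vectors stand in for the real 512-dimensional embeddings.

```javascript
// Dot product of two equal-length vectors.
function dot(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Cosine similarity: dot product divided by the product of the L2 norms.
// Returns 0 when either vector has zero length, to avoid dividing by zero.
function cosineSimilarity(a, b) {
  const denom = Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b));
  return denom === 0 ? 0 : dot(a, b) / denom;
}

// Build an n x n matrix whose entry [i][j] is the similarity of rows i and j.
function selfSimilarityMatrix(embeddings) {
  return embeddings.map(a => embeddings.map(b => cosineSimilarity(a, b)));
}

// Toy stand-ins for sentence embeddings (hypothetical values).
const vectors = [[1, 0, 0], [1, 1, 0], [0, 0, 2]];
const matrix = selfSimilarityMatrix(vectors);
console.log(cosineSimilarity(vectors[0], vectors[1]).toFixed(2)); // 0.71
```

With the model loaded, the rows could be obtained from the `embeddings` tensor returned by `model.embed` via `await embeddings.array()` (a `Tensor` method in tfjs 1.x). The matrix is symmetric with ones on the diagonal, so only the upper triangle carries new information.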

The sentences (taken from the TensorFlow Hub USE lite colab):

  1. I like my phone.
  2. Your cellphone looks great.
  3. How old are you?
  4. What is your age?
  5. An apple a day, keeps the doctors away.
  6. Eating strawberries is healthy.
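
One simple way to turn such a similarity matrix into clusters is single-linkage grouping at a fixed threshold: treat each sentence as a node, connect two sentences when their score meets the threshold, and return the connected components. This is a sketch under assumptions; `groupBySimilarity`, the threshold, and the example scores are hypothetical and are not values produced by the demo.

```javascript
// Single-linkage grouping at a fixed threshold: items are nodes, an edge joins
// i and j when matrix[i][j] >= threshold, and each connected component is a group.
function groupBySimilarity(matrix, threshold) {
  const n = matrix.length;
  const groupOf = new Array(n).fill(-1); // -1 means "not yet assigned"
  const groups = [];
  for (let seed = 0; seed < n; seed++) {
    if (groupOf[seed] !== -1) continue;
    // Depth-first search collects everything reachable from this seed.
    const group = [];
    const stack = [seed];
    groupOf[seed] = groups.length;
    while (stack.length > 0) {
      const i = stack.pop();
      group.push(i);
      for (let j = 0; j < n; j++) {
        if (groupOf[j] === -1 && matrix[i][j] >= threshold) {
          groupOf[j] = groups.length;
          stack.push(j);
        }
      }
    }
    groups.push(group.sort((a, b) => a - b));
  }
  return groups;
}

// Hypothetical, symmetric similarity scores for four sentences.
const scores = [
  [1.0, 0.8, 0.1, 0.2],
  [0.8, 1.0, 0.15, 0.1],
  [0.1, 0.15, 1.0, 0.7],
  [0.2, 0.1, 0.7, 1.0],
];
console.log(groupBySimilarity(scores, 0.5)); // [ [ 0, 1 ], [ 2, 3 ] ]
```

Raising the threshold splits groups apart; at 0.9 every sentence in this example ends up in its own group. Because the grouping is transitive, a low threshold can chain loosely related sentences into one large group.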

Installation

Using yarn:

$ yarn add @tensorflow/tfjs@1.0.0 @tensorflow-models/universal-sentence-encoder

Using npm:

$ npm install @tensorflow/tfjs@1.0.0 @tensorflow-models/universal-sentence-encoder

Usage

To import it as an npm module:

import * as use from '@tensorflow-models/universal-sentence-encoder';

or as a standalone script tag:

<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@1.0.0"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/universal-sentence-encoder"></script>

Then:

// Load the model.
use.load().then(model => {
  // Embed an array of sentences.
  const sentences = [
    'Hello.',
    'How are you?'
  ];
  model.embed(sentences).then(embeddings => {
    // `embeddings` is a 2D tensor consisting of the 512-dimensional embeddings for each sentence.
    // So in this example `embeddings` has the shape [2, 512].
    embeddings.print(true /* verbose */);
  });
});

To use the Tokenizer separately:

use.loadTokenizer().then(tokenizer => {
  tokenizer.encode('Hello, how are you?'); // [341, 4125, 8, 140, 31, 19, 54]
});