Converting to embeddings #3

Closed
temiwale88 opened this issue Mar 4, 2023 · 13 comments

@temiwale88

Hello team,

How do I simply output tokens / embeddings from a model like "bert-base-multilingual-cased" using this library?

Thanks.

@xenova
Owner

xenova commented Mar 4, 2023

If I understand correctly, you're looking for something like:

let pipe = await pipeline('embeddings', 'bert-base-multilingual-cased');
let features = await pipe('this is text');
console.log(features); // of shape [1, 768]
Full example
<!DOCTYPE html>
<html lang="en">

<head>
  <script src="https://cdn.jsdelivr.net/npm/@xenova/transformers/dist/transformers.min.js"></script>
</head>

<body>
  <script>
      document.addEventListener('DOMContentLoaded', async () => {
          let pipe = await pipeline('embeddings', 'bert-base-multilingual-cased');
          let features = await pipe('this is text');
          console.log(features)
      });
  </script>
</body>

</html>

If this isn't what you're looking for, could you provide the corresponding Python code?

@temiwale88
Author

Wow. Thanks for your prompt response @xenova! The Python code can be found in Hugging Face's example.
I want the actual embeddings of 'this is text', i.e. the vector of, in this case, 768 real numbers that represents that text.
https://huggingface.co/bert-base-multilingual-cased

from transformers import BertTokenizer, TFBertModel
tokenizer = BertTokenizer.from_pretrained('bert-base-multilingual-cased')
model = TFBertModel.from_pretrained("bert-base-multilingual-cased")
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='tf')
output = model(encoded_input)

@xenova
Owner

xenova commented Mar 4, 2023

Okay sure! Then the code is very similar:

// use online model
let model_path = 'https://huggingface.co/Xenova/transformers.js/resolve/main/quantized/bert-base-multilingual-cased/default'

// or local model...
// let model_path = './models/onnx/quantized/bert-base-multilingual-cased/default'

let tokenizer = await AutoTokenizer.from_pretrained(model_path)
let model = await AutoModel.from_pretrained(model_path)
let text = "Replace me by any text you'd like."
let encoded_input = await tokenizer(text)
let output = await model(encoded_input)
console.log(output)
// output.last_hidden_state is probably what you want, which has dimensions: [1, 13, 768]
Full output
{
  '1607': h {
  dims: [ 1, 768 ],
  type: 'float32',
  data: Float32Array(768) [
       0.3123791813850403, 0.003413991304114461,   0.4250122904777527,
     -0.21854251623153687, -0.03365212306380272,   0.5738293528556824,
      0.44006016850471497,  0.22588559985160828,  -0.5350813269615173,
      0.36405789852142334, 0.021792402490973473,  -0.2876971364021301,
     -0.29596221446990967, 0.020903922617435455,   0.3584246039390564,
      -0.3395065665245056,   0.7530634999275208,   0.1739857792854309,
      0.44116804003715515,  -0.4844452440738678,  -0.9996387362480164,
      -0.5158448815345764, -0.12882454693317413, -0.49362120032310486,
      -0.5188333988189697,    0.336887925863266,  -0.3745831549167633,
      0.36033907532691956,  0.37004369497299194, -0.39894717931747437,
      0.17938275635242462,   -0.999678373336792,   0.7192255854606628,
       0.7524069547653198,  0.30271032452583313,  -0.3640788495540619,
      0.11464710533618927,  0.22076407074928284,  0.22265100479125977,
     -0.10793695598840714, -0.31874018907546997, 0.029070451855659485,
      -0.4867308437824249,  0.05832969397306442,  -0.0781107172369957,
     -0.38290923833847046,  -0.3295677900314331,   0.3887884318828583,
      -0.5113925337791443,   0.1652149111032486,  0.11696229130029678,
      0.29684290289878845,    0.453733891248703,   0.3948168158531189,
       0.2942390441894531,   0.1424790769815445,   0.3129344880580902,
      0.13875140249729156,   0.5188011527061462,  -0.4438818097114563,
     -0.06960029155015945,   0.5636783838272095,   0.2037249058485031,
      -0.0677640363574028,  -0.1857370287179947, -0.40668919682502747,
    -0.009993324056267738,  -0.3047634959220886,   0.5168231129646301,
     -0.30341315269470215,  -0.2810252010822296,  -0.3927687704563141,
     -0.24477936327457428, 0.013969076797366142,  0.18359339237213135,
      -0.3113570809364319,   0.5239997506141663,   0.3185481131076813,
       0.1643698662519455, -0.30105236172676086,  -0.5766139030456543,
      -0.5408143401145935,  -0.5070666670799255,  0.29758158326148987,
     -0.24177992343902588,   0.5012576580047607,  0.21541833877563477,
     -0.47010543942451477,    0.170441672205925, 0.014354686252772808,
      0.33614882826805115,   0.5729205012321472, -0.24921900033950806,
        0.231329545378685, -0.37843769788742065,  -0.4504586458206177,
      -0.8556193709373474, -0.36552679538726807,   -0.440240740776062,
       -0.471213161945343,
    ... 668 more items
  ],
  size: 768
},
last_hidden_state: h {
  dims: [ 1, 13, 768 ],
  type: 'float32',
  data: Float32Array(9984) [
       0.179206982254982,  -0.45218634605407715,    0.3733910918235779,
      0.6348037123680115, -0.029033176600933075,    0.8430363535881042,
     -0.5690401792526245,   0.08038058131933212,   0.07163389027118683,
     0.21128004789352417,   0.18273138999938965,   0.28029823303222656,
     -0.2271048128604889,   -0.5668721199035645,  -0.25434204936027527,
    -0.20566235482692719,  -0.21113985776901245,  0.023820027709007263,
    -0.07503996044397354,    0.6554434299468994,    0.0331328809261322,
      0.5251416563987732,  -0.18271729350090027,   0.40064969658851624,
     0.19416815042495728,    0.1606152355670929,    0.6098182201385498,
    -0.45384591817855835,    0.3769904673099518,   -0.6224309802055359,
      0.3582024872303009,    0.3312928378582001,  -0.15432783961296082,
      0.9041290283203125,    0.6253558397293091,    0.4761025309562683,
      -1.585947871208191,    0.3880254030227661,    0.0689338743686676,
     0.01133832335472107,    0.3931882381439209,  -0.16045214235782623,
      0.7108028531074524,   0.06522128731012344,   0.12024108320474625,
      0.4158273935317993,    0.6558313965797424,   0.40796613693237305,
     0.41697409749031067,   -0.9136700630187988, -0.027383755892515182,
     -0.4363255500793457,    0.7598181962966919,   -0.6640756130218506,
     0.28937026858329773,   0.33868512511253357,    -0.200037881731987,
    -0.08954302966594696,   0.10191089659929276,   0.47241780161857605,
     0.12555810809135437,   -0.2889227867126465,   0.12182769924402237,
     -0.5717242360115051,  -0.22202396392822266,   0.11810725182294846,
    0.031238414347171783,  -0.08660764992237091,   -0.2523582875728607,
    -0.05567580461502075,   0.18019431829452515,   0.07720719277858734,
      0.5153632760047913,   0.14237044751644135,   -0.2686181664466858,
     0.15073272585868835,   0.04963202774524689,  -0.14567162096500397,
     0.11041714251041412,    0.2176727056503296,  -0.23475873470306396,
      0.5151072144508362,   0.22279003262519836,  0.023239057511091232,
     0.11294965445995331,    0.1889713555574417,    0.1793782114982605,
    -0.07152184098958969,  -0.16680236160755157,    -1.068345069885254,
    -0.41137754917144775,   0.07684430480003357,    0.7185859680175781,
     0.33627647161483765,   -0.3375685214996338,   -0.2667768895626068,
     -0.3442692756652832,  -0.10465909540653229,   -0.1447119116783142,
     0.19307956099510193,
    ... 9884 more items
  ],
  size: 9984
}
}

For memory-efficiency purposes, ONNX returns multi-dimensional arrays as a single, flat Float32Array. So, just make sure you index it correctly :)
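For example, with last_hidden_state of dims [1, 13, 768], element [0, t, d] lives at offset t * 768 + d in the flat buffer. A small sketch, reusing output from the snippet above (the buffer is laid out in standard row-major order):

let hidden = output.last_hidden_state;           // dims: [1, 13, 768]
let [, numTokens, hiddenSize] = hidden.dims;

// Row-major layout: element [0, t, d] sits at index t * hiddenSize + d.
let t = 2, d = 5;
let value = hidden.data[t * hiddenSize + d];

// The full 768-dimensional vector for token t:
let tokenVector = hidden.data.slice(t * hiddenSize, (t + 1) * hiddenSize);
console.log(value, tokenVector.length); // ..., 768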


In addition to adding a few await keywords, the main difference is that you have to specify the whole path since, right now, it doesn't try to guess what task you want to perform (default, in this case).

That behaviour might change in the future though.

@temiwale88
Author

Ok. Let me give this a try and report back. My goal is to index text embeddings right within a Node.js app and send the 768-dimensional embedding to Elasticsearch. I'm inclined to use Python (create a FastAPI app that solely embeds incoming text using this tokenizer); however, it'd be nice if I could keep my app primarily written in JS.

If I use Python, then locally I'd use FastAPI, but in production I'd use an AWS Lambda function (with API Gateway) to serve this embedding task.

Thoughts? Does it make sense to use Python in production, or can I still have a production-grade strategy by keeping it in JS, i.e. get the embeddings per your example and store the Float32Array as-is in Elasticsearch?

Thanks!
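(For reference, the Elasticsearch side of this might look roughly like the sketch below. It assumes the official @elastic/elasticsearch client with v8-style request bodies, made-up index and field names, and code running inside an async function; the embedding just needs to be converted from a Float32Array to a plain array before being sent.)

const { Client } = require('@elastic/elasticsearch');
const client = new Client({ node: 'http://localhost:9200' });

// Hypothetical index with a 768-dimensional dense_vector field.
await client.indices.create({
    index: 'texts',
    mappings: {
        properties: {
            text: { type: 'text' },
            embedding: { type: 'dense_vector', dims: 768 },
        },
    },
});

// 'embedding' is assumed to be the 768-dimensional Float32Array from the model.
// Float32Array isn't JSON-friendly as-is, so convert it to a plain array first.
await client.index({
    index: 'texts',
    document: { text: 'this is text', embedding: Array.from(embedding) },
});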

@xenova
Owner

xenova commented Mar 5, 2023

Cool idea! However, if your app is intended to be run entirely on a server, it might be best to use the Python library. The main goal of this project is to bring that functionality to the browser.

That said, many users have found quantized ONNX models to have performance competitive with their PyTorch counterparts... but you would need to test to see which is best for your use case. And of course, the added benefit of running entirely in JS may be a good enough reason to use this library.

Keep me updated though! I'd love to see how the library is used in real applications! 😄

@xenova xenova closed this as completed Mar 5, 2023
@xenova
Owner

xenova commented Mar 5, 2023

(feel free to reopen if needed!)

@temiwale88
Author

Thanks @xenova!

@temiwale88
Author

Hello @xenova. Thanks again for your help! When I run your example in Node.js, I get the following error (alongside a lot of compiled JS):

TypeError [ERR_WORKER_PATH]: The worker script or module filename must be an absolute path or a relative path starting with './' or '../'. Received "blob:nodedata:262343bf-f735-4bbc-b506-34b7fad27351"
    at new NodeError (node:internal/errors:393:5)
    at new Worker (node:internal/worker:165:15)
    at Object.yc (C:\Projects\iTestify\node_modules\onnxruntime-web\dist\ort-web.node.js:6:7890)
    at Object.Cc (C:\Projects\iTestify\node_modules\onnxruntime-web\dist\ort-web.node.js:6:7948)
    at lt (C:\Projects\iTestify\node_modules\onnxruntime-web\dist\ort-web.node.js:6:5690)
    at Et (C:\Projects\iTestify\node_modules\onnxruntime-web\dist\ort-web.node.js:6:9720)
    at wasm://wasm/025ff5d6:wasm-function[10917]:0x7f0507
    at wasm://wasm/025ff5d6:wasm-function[1580]:0xf3ecb
    at wasm://wasm/025ff5d6:wasm-function[2786]:0x1d31ee
    at wasm://wasm/025ff5d6:wasm-function[5903]:0x49713d {
  code: 'ERR_WORKER_PATH'
}

Even so, it takes a little too long to output that error (I suppose it'll take about the same time to output tokens).
Here's my JS code:

const tokenize = async () => {
    // use online model
    let model_path = 'https://huggingface.co/Xenova/transformers.js/resolve/main/quantized/bert-base-multilingual-cased/default'

    // or local model...
    // let model_path = './models/onnx/quantized/bert-base-multilingual-cased/default'

    let tokenizer = await AutoTokenizer.from_pretrained(model_path)
    let model = await AutoModel.from_pretrained(model_path)
    let text = "Replace me by any text you'd like."
    let encoded_input = await tokenizer(text)
    let output = await model(encoded_input)
    return output.last_hidden_state
}

@xenova
Owner

xenova commented Mar 6, 2023

Hi again. Yes, this is a bug in ONNX Runtime. See here for more information: #4

TL;DR:

// 1. Fix "ReferenceError: self is not defined" bug when running directly with node
// https://github.com/microsoft/onnxruntime/issues/13072
global.self = global;

const { pipeline, env } = require('@xenova/transformers')

// 2. Disable spawning worker threads for testing.
// This is done by setting numThreads to 1
env.onnx.wasm.numThreads = 1

// 3. Continue as per usual:
// ...
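
Putting it together, a Node.js script using this workaround might look roughly like the following (a sketch only; it reuses the model path from the earlier comment and wraps the calls in an async IIFE, since top-level await isn't available in CommonJS):

// 1. Workaround for "ReferenceError: self is not defined"
global.self = global;

const { AutoTokenizer, AutoModel, env } = require('@xenova/transformers');

// 2. Disable spawning worker threads
env.onnx.wasm.numThreads = 1;

// 3. Continue as per usual
(async () => {
    let model_path = 'https://huggingface.co/Xenova/transformers.js/resolve/main/quantized/bert-base-multilingual-cased/default';
    let tokenizer = await AutoTokenizer.from_pretrained(model_path);
    let model = await AutoModel.from_pretrained(model_path);
    let encoded_input = await tokenizer("Replace me by any text you'd like.");
    let output = await model(encoded_input);
    console.log(output.last_hidden_state.dims); // [1, num_tokens, 768]
})();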

@temiwale88
Author

Thanks @xenova! It works as you stated! Thank you!
I'm, however, getting a Float32Array(9984), which means it's giving me a vector of 13 × 768 real numbers. So I need to figure that out.
Also, performance is lacking in Node.js (obviously I'm hacking your beautiful work for Node.js instead of the web browser you intended it for :-D).

Any tips? As a way to optimize for speed, shall I 'download the model and place it in the ./models/onnx/quantized folder (or another location, provided you set env.localURL)' as in your comment here #4?

Thanks,
Elijah

@xenova
Owner

xenova commented Mar 7, 2023

I'm, however, getting a Float32Array(9984), which means it's giving me a vector of 13 × 768 real numbers. So I need to figure that out.

I made a utility function to help "reshape" the outputs:

function reshape(data, dimensions) {
    const totalElements = data.length;
    const dimensionSize = dimensions.reduce((a, b) => a * b);
    if (totalElements !== dimensionSize) {
        throw Error(`cannot reshape array of size ${totalElements} into shape (${dimensions})`);
    }
    let reshapedArray = data;
    for (let i = dimensions.length - 1; i >= 0; i--) {
        reshapedArray = reshapedArray.reduce((acc, val) => {
            let lastArray = acc[acc.length - 1];
            if (lastArray.length < dimensions[i]) {
                lastArray.push(val);
            } else {
                acc.push([val]);
            }
            return acc;
        }, [[]]);
    }
    return reshapedArray[0];
}

So, you can use that 👍 I haven't exposed the method from the module - I could add that in the next update perhaps. In the meantime, you can just copy-paste the code :)
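
For example (a sketch reusing output from the earlier snippet), you could reshape last_hidden_state and then mean-pool over the token axis to get a single 768-dimensional vector for the whole text - mean pooling is just one common choice here, not something the library does for you:

let hidden = output.last_hidden_state;            // dims: [1, 13, 768]
let nested = reshape(hidden.data, hidden.dims);   // nested arrays of shape [1][13][768]
let tokenVectors = nested[0];                     // 13 token vectors, each of length 768

// Average across tokens to get one 768-dimensional sentence vector.
let hiddenSize = hidden.dims[2];
let sentenceVector = new Array(hiddenSize).fill(0);
for (let vec of tokenVectors) {
    for (let i = 0; i < hiddenSize; i++) {
        sentenceVector[i] += vec[i] / tokenVectors.length;
    }
}
console.log(sentenceVector.length); // 768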

Also, performance is lacking in nodejs (obviously I'm hacking your beautiful work for nodejs instead of the web browser as you intended it to be :-D ).

The biggest bottleneck will undoubtedly be the "redownloading" of the model on each request. I haven't implemented local caching yet... but it should be as simple as downloading the model to some cache directory.

Any tips? As a way to optimize for speed, shall I 'download the model and place it in the ./models/onnx/quantized folder (or another location, provided you set env.localURL)' as in your comment here #4?

Definitely! If you're running locally, there's no good reason to have to download the model each time ;)
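
For example (a sketch; it simply switches the earlier snippet to the local path already shown commented out above, and assumes the model files from the Xenova/transformers.js repo have been downloaded into that folder):

// Model files downloaded from https://huggingface.co/Xenova/transformers.js
// into ./models/onnx/quantized/bert-base-multilingual-cased/default
let model_path = './models/onnx/quantized/bert-base-multilingual-cased/default';

let tokenizer = await AutoTokenizer.from_pretrained(model_path);
let model = await AutoModel.from_pretrained(model_path);
// (see #4 for the env.localURL option if you keep the models in a different folder)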

@scottroot

Hello, is there new documentation on this? I am seeing that task = "embeddings" is no longer a possible option. The feature-extraction task doesn't seem to produce what I am expecting either, generating arrays that are 2-5x larger than they should be.

@jonathanpv
Contributor

Hello, is there new documentation on this? I am seeing that task = "embeddings" is no longer a possible option. The feature-extraction task doesn't seem to produce what I am expecting either, generating arrays that are 2-5x larger than they should be.

Now you can use the 'feature-extraction' pipeline to generate embeddings!
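
A minimal sketch of that (the model id is shown for illustration - substitute whichever converted model you need; the pooling and normalize options are what collapse the per-token output down to a single vector per input, which is why the raw output otherwise looks several times larger than expected):

import { pipeline } from '@xenova/transformers';

// Feature-extraction pipeline; pooling reduces [1, num_tokens, 768] to one vector per input.
let extractor = await pipeline('feature-extraction', 'Xenova/bert-base-multilingual-cased');
let output = await extractor('this is text', { pooling: 'mean', normalize: true });
console.log(output.dims); // e.g. [1, 768]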
