Optimization of text-embeddings execution and failure handling. #522
Conversation
Added working test case for text embedding
I added some comments @kartik2112
Also, I created a PR from usc-isi-i2 to your fork - kartik2112#1
Always pull the latest changes from the main repo to your fork before creating a PR
pull latest changes in `dev` from usc-isi-i2
@kartik2112 any updates?
@kartik2112 changes are required to handle writing to the output file
This PR deals with issue #519
I updated the code so that the SentenceTransformer now generates the embeddings in batches instead of processing the input line by line; the progress bar is preserved. Additionally, if the default device is not available, the embedding task is re-attempted on the next GPU device.
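The batching-plus-fallback idea described above could be sketched roughly as follows. This is not the PR's actual code: the helper and function names (`encode_batches`, `encode_fn`, `fake_encode`) are illustrative stand-ins, and `encode_fn` plays the role of a call like `SentenceTransformer.encode` on a given device.

```python
def encode_batches(lines, encode_fn, devices, batch_size=32):
    """Encode `lines` in fixed-size batches via `encode_fn(batch, device)`.

    If `encode_fn` raises RuntimeError on a device (e.g. the GPU is
    unavailable), the whole run is re-attempted on the next device in
    `devices` instead of failing line by line.
    """
    for device in devices:
        try:
            embeddings = []
            for start in range(0, len(lines), batch_size):
                batch = lines[start:start + batch_size]
                embeddings.extend(encode_fn(batch, device))
            return embeddings
        except RuntimeError:
            continue  # fall back to the next device
    raise RuntimeError("embedding failed on all devices: %r" % (devices,))


# Toy stand-in for the real encoder: fails on cuda:0 to demonstrate the
# fallback, succeeds on cuda:1 (embedding = string length, for illustration).
def fake_encode(batch, device):
    if device == "cuda:0":
        raise RuntimeError("CUDA error: device unavailable")
    return [[float(len(s))] for s in batch]


vectors = encode_batches(["a", "bb", "ccc"], fake_encode,
                         ["cuda:0", "cuda:1"], batch_size=2)
# vectors == [[1.0], [2.0], [3.0]]
```

In the real code the per-batch call would presumably also pass `show_progress_bar` (or wrap the batch loop in a progress bar) to keep the existing progress display.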
One challenge that could be raised: in some environments
`torch.cuda.device_count()`
will return 1, so there is no second GPU and the code won't re-attempt the execution.
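One possible mitigation (not part of this PR, just a sketch): build the list of devices to try from the visible GPU count and always append the CPU as a last resort, so a single-GPU machine still has a fallback. The function name `fallback_devices` is hypothetical; in real code the count would come from `torch.cuda.device_count()`.

```python
def fallback_devices(visible_gpu_count):
    """Devices to attempt in order: each visible GPU, then the CPU.

    When visible_gpu_count == 1 there is no second GPU to re-attempt on,
    so the CPU becomes the only fallback target.
    """
    gpus = ["cuda:%d" % i for i in range(visible_gpu_count)]
    return gpus + ["cpu"]


fallback_devices(1)  # ['cuda:0', 'cpu']
fallback_devices(0)  # ['cpu']  (no CUDA at all)
```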