
Advices for inference speedup #88

Open
yishusong opened this issue May 14, 2024 · 12 comments

Comments

@yishusong

Hi team,

I'm running inference on a g5.24xlarge GPU instance. The data is currently structured in a Pandas dataframe, and I use the Pandas apply method to apply the predict_entities function row by row. When the df gets fairly large (~1.5M rows), the inference takes days to run.

I'm wondering if there is a way to increase GPU utilization? I suppose a Pandas df is not the most efficient data structure... or maybe there is a parameter I missed that can boost GPU utilization?

Any advice is much appreciated!

@Marwen-Bhj

Marwen-Bhj commented May 14, 2024

Hello @yishusong, using the Pandas apply method is slow. I suppose you want to run the model on one specific column and write the output to another, so:

  1. transform that column into a list and use model.batch_predict_entities(your_list, labels)

  2. create a dictionary from that output and join it back to the dataframe (rough sketch below)

You would probably run OOM, so make sure you run this in batches (split your data) and call torch.cuda.empty_cache() between them.
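A rough sketch of that flow (the column name "text" and the label list are placeholders, and it assumes model is already loaded; since the outputs stay row-aligned, you can also assign them back directly instead of building a dictionary):

import torch

texts = df["text"].tolist()          # placeholder column name
labels = ["person", "organization"]  # placeholder label set

batch_size = 32                      # tune to your GPU memory
all_entities = []
for start in range(0, len(texts), batch_size):
    batch = texts[start:start + batch_size]
    all_entities.extend(model.batch_predict_entities(batch, labels))
    torch.cuda.empty_cache()         # free cached memory between batches

# predictions come back in input order, so they can be joined back row by row
df["entities"] = all_entities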

As for increasing GPU utilization, I am not sure how to increase it, or even confirm that the GPU is being used during inference; I hope someone else can help with that.

@urchade
Owner

urchade commented May 14, 2024

you can create batches like this

# Sample text data
all_text = ["sample text 1", "sample text 2", "sample text n"]  # ...your full list of texts

# Define the batch size
batch_size = 10

# Function to create batches
def create_batches(data, batch_size):
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]

# Example usage of the generator function
all_predictions = []
for batch in create_batches(all_text, batch_size):
    predictions = model.batch_predict(batch)
    all_predictions.extend(predictions)

@yishusong
Author

Thank you very much for the replies! I'll try it out shortly.

Re: @Marwen-Bhj's comment about GPU... I haven't looked into the source code yet, but is it possible to use the model through Hugging Face? I was thinking of something like device_map='auto' to use all GPUs, or setting the data type to float16 to make the model smaller. Does the code base offer configurations like this?

If not, maybe a memory-optimized instance would perform better?
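(A hedged sketch of the float16 idea, assuming the GLiNER model behaves like a plain torch.nn.Module; not verified against the source:)

import torch

model = model.to("cuda").half()   # cast weights to float16 (assumption: nn.Module methods are exposed)

with torch.no_grad():
    predictions = model.batch_predict_entities(batch, labels)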

@urchade
Owner

urchade commented May 14, 2024

You can try the automatic mixed precision (AMP) module in PyTorch for inference. For me it helps speed up training, but I have not tried it for inference:

import torch
from torch.cuda.amp import autocast

with autocast(dtype=torch.float16):
    predictions = model.batch_predict(batch)
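A small variant that also disables autograd, which usually saves some memory and time at inference (torch.inference_mode is standard PyTorch; whether it helps much here is untested):

import torch
from torch.cuda.amp import autocast

with torch.inference_mode(), autocast(dtype=torch.float16):
    predictions = model.batch_predict(batch)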

@Marwen-Bhj

@urchade I tried AMP; it did not increase the inference speed.
Heads-up @yishusong:
surprisingly, running inference on a CPU cluster is at least 3x faster than on a GPU:
CPU cluster: Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
GPU instance: Nvidia V100

@yishusong
Author

Thanks! On CPU there is also joblib, so there should be further speedup.
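A hedged sketch of the joblib idea, using the threading backend so the single loaded model is shared between workers (PyTorch releases the GIL inside its ops, but the actual gain depends on the model and on how many threads PyTorch already uses internally):

from joblib import Parallel, delayed

def predict_chunk(chunk):
    # model and labels assumed to be defined as above
    return model.batch_predict_entities(chunk, labels)

chunk_size = 64
chunks = [texts[i:i + chunk_size] for i in range(0, len(texts), chunk_size)]

results = Parallel(n_jobs=4, backend="threading")(
    delayed(predict_chunk)(chunk) for chunk in chunks
)
all_entities = [ent for chunk_result in results for ent in chunk_result]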

@urchade
Owner

urchade commented May 14, 2024

Ok, that's weird but ok 😅

Did you try model.to('cuda') instead of model.cuda()?
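(For completeness, a quick way to check where the model actually ended up, assuming it exposes torch.nn.Module parameters:)

import torch

model = model.to("cuda")
print(next(model.parameters()).device)   # expect: cuda:0
print(torch.cuda.is_available())         # sanity check that CUDA is visible at all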

@Marwen-Bhj

@urchade that fixed it!
Thank you :)

@yishusong
Author

Thanks a lot! This indeed speeds up inference a lot.

However, model.to('cuda') seems to only utilize one GPU. From what I found online, nn.DataParallel(model) won't extend to GLiNER's batch_predict...
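One workaround, since multi-GPU is not a built-in GLiNER feature: manual sharding, i.e. load one full model copy per GPU, split the texts, and run the shards in threads. A hedged sketch (checkpoint name and labels are placeholders, and it assumes one model copy fits on each GPU):

from concurrent.futures import ThreadPoolExecutor
import torch
from gliner import GLiNER

n_gpus = torch.cuda.device_count()

# one independent model copy per GPU (checkpoint name is a placeholder)
models = [GLiNER.from_pretrained("urchade/gliner_base").to(f"cuda:{i}")
          for i in range(n_gpus)]

# contiguous shards keep it easy to reassemble results in order
shard_size = (len(texts) + n_gpus - 1) // n_gpus
shards = [texts[i * shard_size:(i + 1) * shard_size] for i in range(n_gpus)]

def run_shard(m, shard):
    # batch the shard internally (as in the loop above) if it is large
    return m.batch_predict_entities(shard, labels)

with ThreadPoolExecutor(max_workers=n_gpus) as pool:
    shard_results = list(pool.map(run_shard, models, shards))

all_entities = [ent for shard in shard_results for ent in shard]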

@lifepillar

I'm also interested in how to boost performance using multiple GPUs.

@bartmachielsen

Hi, would it also be possible to speed up using AWS Inferentia / Optimum Neuron? (see article)

@yishusong
Author

I don't think Inferentia works, because it only supports a very limited list of HF models. It also might not be compatible with CUDA, so there could be other dependency issues.
