Advice for inference speedup #88
Hello @yishusong, using Pandas' `apply` method is slow. I suppose you want to run the model over a specific column and have the output in another.
You would probably run out of memory (OOM), so make sure you split your data and run inference in batches. About increasing GPU utilization, I am not sure how we can increase it or even verify that the GPU is being used during inference; I hope someone can help with that.
You can create batches like this:

```python
# Sample text data
all_text = ["sample text 1", "sample text 2", ..., "sample text n"]

# Define the batch size
batch_size = 10

# Generator that yields successive batches from the data
def create_batches(data, batch_size):
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]

# Example usage of the generator function
all_predictions = []
for batch in create_batches(all_text, batch_size):
    predictions = model.batch_predict(batch)
    all_predictions.extend(predictions)
```
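Applied to the DataFrame setup from the original question, the same idea might look like the sketch below; the `text` column name and the `model.batch_predict` call are assumptions carried over from the snippet above, not a confirmed API:

```python
# Hypothetical: pull the column out of the DataFrame once instead of calling
# df.apply per row, then predict in batches and write the results back.
all_text = df["text"].tolist()  # "text" is an assumed column name

all_predictions = []
for batch in create_batches(all_text, batch_size):
    all_predictions.extend(model.batch_predict(batch))

df["entities"] = all_predictions  # one prediction list per input row
```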
Thank you very much for the replies! I'll try it out shortly. Re: @Marwen-Bhj's comment about GPU: I haven't looked into the source code yet, but is it possible to use the model with Hugging Face? I was thinking of something like `device_map='auto'` to use all GPUs, or setting the dtype to `float16` to make the data smaller. Does the codebase offer configurations like this? If not, maybe a memory-optimized instance will perform better?
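As a point of reference, a minimal half-precision sketch. It assumes the model is loaded with `GLiNER.from_pretrained` and behaves like a standard `torch.nn.Module`; whether `float16` weights work end to end depends on the model internals, so treat it as an experiment rather than a confirmed configuration:

```python
from gliner import GLiNER  # assumption: the loader used elsewhere in this thread

# Load, move to GPU, and cast weights to float16 to halve memory use.
model = GLiNER.from_pretrained("urchade/gliner_base")
model = model.to("cuda").half()  # may need AMP instead if inputs stay float32

labels = ["person", "organization"]  # hypothetical label set
entities = model.predict_entities("John works at Acme in Paris.", labels)
```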
You can try the automatic mixed precision (AMP) module in PyTorch for inference. For me it helps speed up training, but I have not tried it for inference:

```python
import torch
from torch.cuda.amp import autocast

with autocast(dtype=torch.float16):
    predictions = model.batch_predict(batch)
```
@urchade I tried AMP; it did not increase the inference speed.
Thanks! With CPU there is joblib, so there will be more speedup.
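For reference, a minimal sketch of CPU-side parallelism with joblib; it reuses the `create_batches` helper from earlier in the thread, and `n_jobs=4` is a hypothetical value you would tune to your core count:

```python
from joblib import Parallel, delayed

# One batch per worker; whether the model can be shared or must be pickled
# per process depends on the model, so this is a sketch, not a drop-in.
batches = list(create_batches(all_text, batch_size))
results = Parallel(n_jobs=4)(
    delayed(model.batch_predict)(batch) for batch in batches
)
all_predictions = [pred for batch_preds in results for pred in batch_preds]
```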
Ok, that's weird but ok 😅 Did you try to pass
@urchade that fixed it!
Thanks a lot! This indeed sped up inference a lot. However,
I'm also interested in how to boost performance using multiple GPUs.
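One generic pattern (not specific to this repo) is plain data parallelism: shard the texts, load one model copy per GPU, and run the shards in separate processes. A rough sketch, assuming the `GLiNER.from_pretrained` loader and the `batch_predict` / `create_batches` usage from earlier in the thread:

```python
import torch.multiprocessing as mp
from gliner import GLiNER  # assumption carried over from earlier comments

def worker(rank, shard, queue):
    # Each process owns one GPU and one model copy.
    model = GLiNER.from_pretrained("urchade/gliner_base").to(f"cuda:{rank}")
    preds = []
    for batch in create_batches(shard, batch_size):
        preds.extend(model.batch_predict(batch))
    queue.put((rank, preds))

if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)  # needed for CUDA + multiprocessing
    n_gpus = 4  # hypothetical; match your instance
    shards = [all_text[i::n_gpus] for i in range(n_gpus)]
    queue = mp.Queue()
    procs = [mp.Process(target=worker, args=(r, shards[r], queue)) for r in range(n_gpus)]
    for p in procs:
        p.start()
    results = dict(queue.get() for _ in procs)  # rank -> that shard's predictions
    for p in procs:
        p.join()
```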
Hi, would it also be possible to speed up inference using AWS Inferentia / Optimum Neuron? (see article)
I don't think Inferentia works, because it only supports a very limited list of HF models. Also, it might not be compatible with CUDA, so there might be other dependency issues.
Hi team,
I'm running inference on a g5.24xlarge GPU instance. The data is currently stored in a Pandas DataFrame, and I use Pandas' `apply` method to call the `predict_entities` function row by row. When the DataFrame gets fairly large (~1.5M rows), inference takes days to run.
I'm wondering if there is a way to increase GPU utilization. I suspect a Pandas DataFrame is not the most efficient data structure, or maybe there is a parameter I missed that can boost GPU utilization?
Any advice is much appreciated!
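For context, the slow per-row pattern described above presumably looks something like the following reconstruction; the `text` column and `labels` list are hypothetical stand-ins:

```python
# Hypothetical reconstruction of the per-row apply pattern that is slow:
# every call processes a single text, so the GPU sits mostly idle.
labels = ["person", "organization"]  # stand-in label set
df["entities"] = df["text"].apply(lambda t: model.predict_entities(t, labels))
```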