
How to overcome memory issues when predicting large batches of data? #69

Closed
SaadAhmed433 opened this issue Oct 18, 2022 · 3 comments

@SaadAhmed433

Hello team,

I have a dataset of about 8000 comments; each comment is around 6 to 8 words (some are shorter, with only 2 words).

The problem is that I am unable to get the predictions because I run out of GPU memory during the process. To work around this, I loop over the comments in batches and append the results to a DataFrame:

comments_list = comments["text"].to_list()
df = pd.DataFrame()

for i in range(0, len(comments_list), 32):
    comms = comments_list[i : i + 32]
    results = Detoxify("original", device=device).predict(comms)
    results = pd.DataFrame(results)
    df = df.append(results, ignore_index=True)

Is there a more efficient way of doing this than writing a for loop?

Currently I have a 16GB Tesla T4 GPU.

Thanks!

@laurahanu
Collaborator

Hello, you should initialise the model only once and assign it to a variable, so that it isn't re-created at each iteration:

model = Detoxify("original", device=device)
comments_list = comments["text"].to_list()
df = pd.DataFrame()

for i in range(0, len(comments_list), 32):
    comms = comments_list[i : i + 32]
    results = model.predict(comms)
    results = pd.DataFrame(results)
    df = df.append(results, ignore_index=True)

Now you should be able to use a bigger batch size as well.
Hope this helps!
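As a further tweak (an editor's sketch, not part of the original answer): `DataFrame.append` was deprecated and later removed in pandas 2.0, and appending inside the loop copies the frame each time. Collecting per-batch frames in a list and concatenating once avoids both issues. The `predict_in_batches` helper name and the batch size of 64 are assumptions; the `model.predict` call mirrors the Detoxify usage shown above.

```python
import pandas as pd

def predict_in_batches(model, texts, batch_size=64):
    """Run model.predict over `texts` in chunks and return one DataFrame.

    Collects each batch's results in a list and concatenates once at the
    end, which is idiomatic for modern pandas (no DataFrame.append).
    """
    frames = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i : i + batch_size]
        # model.predict returns a dict of {label: list_of_scores}
        frames.append(pd.DataFrame(model.predict(batch)))
    return pd.concat(frames, ignore_index=True)
```

Usage would then be a single call, e.g. `df = predict_in_batches(model, comments_list, batch_size=32)`, with the batch size tuned to whatever fits in GPU memory.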

@SaadAhmed433
Author

Thanks for the suggestion and for pointing that out. The change worked remarkably well.

Previously it averaged around 7 minutes to get all the predictions; now everything is done in about 8 seconds.

@laurahanu
Collaborator

Great, glad it helped!
