
Upload model has a bottleneck on the big model transferring #40

Closed
benchuang11046 opened this issue Jun 12, 2019 · 4 comments

@benchuang11046
Collaborator

The upload bottleneck is about 500-600 MB on private cloud (maybe less on public cloud).
Currently, the uploaded model file goes through api-afs and is then written to blob storage.
This can cause a timeout when transferring a big file.
I want to add a function that updates the model on the blob directly.

```python
# training...
# produce model...
...

# Do NOT upload via afs2-model (this goes through api-afs):
# afs2_model.upload_model(...)

# The model file on blob is models/{instance_id}/{model_repository_id}/{model_id}.
# The user should input their blob_credential.
# Check for an existing file on the blob, update the big model on the blob directly,
# then notify api-afs whether the operation succeeded or failed.
afs2_model.update_model_to_blob(blob_credential, model_file)
```
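To make the proposal concrete, here is a minimal sketch of what `update_model_to_blob` could do internally. The helper names (`build_blob_key`, the client's `exists`/`delete`/`put` methods, the `notify` callback) are illustrative assumptions, not the real afs2-model or api-afs API.

```python
# Hypothetical sketch of the proposed direct-to-blob update flow.
# All names below are assumptions for illustration, not the real SDK API.

def build_blob_key(instance_id, model_repository_id, model_id):
    """Model files live at models/{instance_id}/{model_repository_id}/{model_id}."""
    return "/".join(["models", instance_id, model_repository_id, model_id])

def update_model_to_blob(blob_client, notify, blob_credential, model_file,
                         instance_id, model_repository_id, model_id):
    """Upload model_file straight to the blob store, bypassing api-afs,
    then tell api-afs whether the operation succeeded so it can sync its
    metadata. Returns True on success, False on failure."""
    key = build_blob_key(instance_id, model_repository_id, model_id)
    try:
        # Check for an existing file on the blob and overwrite it.
        if blob_client.exists(key):
            blob_client.delete(key)
        with open(model_file, "rb") as f:
            blob_client.put(key, f.read(), credential=blob_credential)
        notify(key, "success")  # notify api-afs: operation succeeded
        return True
    except Exception:
        notify(key, "failed")   # notify api-afs: operation failed
        return False
```

The blob client and notify callback are injected so the function stays independent of any particular blob backend (the user-provided instance discussed below the proposal).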
@chenjr0719
Collaborator

Interesting idea, but this might require a really BIG change. I suggest a new feature for afs-api: let it use the user's own blob storage instance. No centralized blob storage also means you don't have to upload models through afs-api. But it might also increase the effort of metadata management. What do you think?

@benchuang11046
Collaborator Author

We have a plan to add a user-provided blob feature on the portal,
and to put the blob credential into the user workspace's runtime environment variables.
This is why I proposed afs2_model.update_model_to_blob(blob_credential, model_file): the blob_credential argument can become optional once the user-provided blob is available.

Yes, @chenjr0719 is right. Managing models on blob storage without afs-api takes extra effort if we update them directly.
After updating (or uploading) the model, we can notify afs-api that the content was modified and sync the metadata.

However, the centralized blob storage architecture in afs-api will be maintained for at least another month...
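Once the portal injects the user-provided blob credential into the workspace runtime environment variables, blob_credential could become optional, something like this. The variable name AFS_BLOB_CREDENTIAL is a placeholder assumption, not a documented setting.

```python
import json
import os

def resolve_blob_credential(blob_credential=None):
    """Fall back to the credential the portal injected into the workspace
    runtime environment. AFS_BLOB_CREDENTIAL is a hypothetical variable
    name used for illustration only."""
    if blob_credential is not None:
        return blob_credential
    raw = os.environ.get("AFS_BLOB_CREDENTIAL")
    if raw is None:
        raise ValueError("no blob credential passed and none found in the environment")
    return json.loads(raw)  # e.g. {"endpoint": "...", "key": "..."}
```

With this, `update_model_to_blob(model_file)` could work without an explicit credential whenever the user-provided blob has been configured on the portal.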

@estherxyz
Collaborator

Hi @benchuang11046 ,

I have a question about this: "The upload bottleneck is about 500-600 MB on private cloud (maybe less on public cloud)."
Is the bottleneck due to network transmission speed, a limit on API processing speed, or something else?

@benchuang11046
Collaborator Author

Hi @estherxyz ,
In our tests, increasing the afs-api memory improves the maximum upload size a little (600 MB -> 700 MB).
Additionally, many factors contribute to the issue; for example, slow network transmission causes response timeouts.
The core problem is that whether a big model can be uploaded depends on whether afs-api has enough resources.
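Since the maximum upload size scales with afs-api memory, the service presumably buffers the whole file in memory. Reading the model in fixed-size chunks would keep memory flat regardless of model size; a sketch (the 4 MB chunk size is an arbitrary choice, not an api-afs setting):

```python
def stream_file_chunks(path, chunk_size=4 * 1024 * 1024):
    """Yield a model file in fixed-size chunks so the uploader holds at
    most chunk_size bytes in memory, however big the model is."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return
            yield chunk
```

This would not fix network timeouts, but it would decouple the upload size limit from the afs-api memory quota.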

@benchuang11046 benchuang11046 mentioned this issue Jun 24, 2019