Switch to pull model #13

Closed
vkuznet opened this issue Apr 7, 2017 · 4 comments

Comments

@vkuznet
Owner

vkuznet commented Apr 7, 2017

Currently we implement the push approach:

In my prototype it currently works as follows: a client sends a request to an agent to transfer dataset /a/b/c to site X. The agent first checks whether it has this dataset; if so, it initiates the transfer by pushing the data from itself to site X. If the agent does not have the dataset, it broadcasts the request to all known agents. The agent that has it replies, and the request is delegated to that agent, which then pushes the data from itself to site X.
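
For illustration, here is a minimal Python sketch of the push flow described above. The `Agent` class, its `datasets` set and its `peers` list are hypothetical names, not the prototype's actual API, and the real transfer is replaced by a returned string.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical agent: knows its own datasets and the other agents."""
    name: str
    datasets: set = field(default_factory=set)
    peers: list = field(default_factory=list)

    def has(self, dataset):
        return dataset in self.datasets

    def push(self, dataset, dest):
        # Stand-in for the real transfer: this agent pushes the data itself.
        return f"{self.name} pushes {dataset} to {dest}"

    def handle_request(self, dataset, dest):
        # 1. If this agent holds the dataset, it initiates the push.
        if self.has(dataset):
            return self.push(dataset, dest)
        # 2. Otherwise "broadcast" to all known agents; the first one that
        #    holds the dataset takes over the request and pushes the data.
        for peer in self.peers:
            if peer.has(dataset):
                return peer.push(dataset, dest)
        raise LookupError(f"no known agent holds {dataset}")


a = Agent("agentA")
b = Agent("agentB", datasets={"/a/b/c"})
a.peers.append(b)
result = a.handle_request("/a/b/c", "siteX")
print(result)
```

Note that nothing in this flow asks site X whether it is able to receive the data, which is the weakness discussed next.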

This approach has some flaws, e.g. the destination site can go down, be under maintenance, or run out of disk space. Therefore we need to explore, develop and eventually switch to a pull model.

Sites today have complete control over the agent that puts data into their site. This is a design choice that was made in order to put the responsibility for transfers onto the site ops team. E.g. the site can turn off its agent when it has problems with storage. It can throttle the agent if there are issues. It can stop the agent if it loses disk and thus runs out of space, or runs out of space for some other reason. In the pull model the request will land at the site which requests the data, and that site will fetch it from the original site. In terms of the description above, we'll redirect the request to the agent sitting at site X, and it will download dataset /a/b/c from whichever site holds a copy.
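
A rough Python sketch of the pull flow, under stated assumptions: `SiteAgent`, `enabled` and `free_bytes` are made-up names that stand in for the controls the site ops team has over its agent, and the download itself is only simulated.

```python
from dataclasses import dataclass, field


@dataclass
class SiteAgent:
    """Hypothetical agent under the control of its site's ops team."""
    site: str
    enabled: bool = True   # ops can switch the agent off
    free_bytes: int = 0    # space the site is willing to accept
    datasets: dict = field(default_factory=dict)  # dataset name -> size in bytes

    def pull(self, dataset, sources):
        """Fetch `dataset` into this site from any source holding a copy."""
        if not self.enabled:
            return "refused: agent is turned off"
        for src in sources:
            size = src.datasets.get(dataset)
            if size is None:
                continue  # this source has no copy
            if size > self.free_bytes:
                return "refused: not enough disk space"
            # Stand-in for the real download from src to this site.
            self.datasets[dataset] = size
            self.free_bytes -= size
            return f"{self.site} pulled {dataset} from {src.site}"
        return "failed: no source holds the dataset"


origin = SiteAgent("siteA", datasets={"/a/b/c": 500})
site_x = SiteAgent("siteX", free_bytes=1000)
outcome = site_x.pull("/a/b/c", [origin])
print(outcome)
```

The point of the sketch is that the decision to accept data is taken by the destination agent, so a site with storage problems can refuse or delay the transfer on its own.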

@rishiloyola
Collaborator

rishiloyola commented Apr 11, 2017

[Image: screen shot 2017-04-12 at 1 33 15 am]

Implementation details: First, the client will select an agent which has the complete requested dataset. After choosing an agent, it will create the transferRequest and pass it to the request manager of site B on the /manager endpoint. The request manager will store the request in the pool. It will approve requests from the pool if site conditions are good (no disk issues), and may disapprove a request if the site needs time to handle its own issues.

Instead of designing a new endpoint to pull the data, the request manager will pass the transferRequest to the selected agent on the /request endpoint. After getting the request from site B, that agent will push the data to the upload endpoint.
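
To make the message flow concrete, here is a small in-memory Python sketch. The endpoints are modelled as plain methods named after /manager, /request and upload; the class names and the dictionary layout of the request are assumptions for illustration only, not the actual implementation.

```python
from collections import deque


class SourceAgent:
    """Hypothetical agent that holds the complete dataset."""

    def __init__(self, name):
        self.name = name

    def request(self, transfer_request, manager):
        # Plays the role of the /request endpoint: on receiving the request
        # from the destination site, push the data to its upload endpoint.
        manager.upload(transfer_request["dataset"], sender=self.name)


class RequestManager:
    """Hypothetical request manager running at the destination site."""

    def __init__(self):
        self.pool = deque()   # requests waiting for approval
        self.received = []    # (dataset, sender) pairs that arrived

    def manager(self, transfer_request):
        # Plays the role of the /manager endpoint: store the request.
        self.pool.append(transfer_request)

    def approve_next(self, agents):
        # Hand the oldest pooled request to the agent the client selected.
        transfer_request = self.pool.popleft()
        agents[transfer_request["source"]].request(transfer_request, self)

    def upload(self, dataset, sender):
        # Plays the role of the upload endpoint.
        self.received.append((dataset, sender))


agents = {"agentA": SourceAgent("agentA")}
site_b = RequestManager()
site_b.manager({"dataset": "/a/b/c", "source": "agentA"})  # sent by the client
site_b.approve_next(agents)
print(site_b.received)
```

The data still travels by a push from the source agent, but only after the destination site has released the request from its pool.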

The request manager will approve the request based on two parameters: time and data size. The manager will approve the request only if the site has enough storage capacity.
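
One possible shape of this approval check, as a Python sketch. The comment does not spell out what the "time" parameter means, so the interpretation below (the site is not inside a period it reserved for its own issues) is an assumption, as are all the parameter names.

```python
def approve(request_size, free_bytes, now, busy_until):
    """Return True if the pooled request may be approved.

    Assumed interpretation: "data size" means the request must fit into the
    free storage, and "time" means the site is not inside a period
    (ending at `busy_until`) that it reserved to handle its own issues.
    """
    fits = request_size <= free_bytes
    site_ready = now >= busy_until
    return fits and site_ready


# Fits and the site is ready.
print(approve(request_size=500, free_bytes=1000, now=100.0, busy_until=50.0))
# Too large for the free storage.
print(approve(request_size=5000, free_bytes=1000, now=100.0, busy_until=50.0))
# Fits, but the site is still busy.
print(approve(request_size=500, free_bytes=1000, now=10.0, busy_until=50.0))
```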

@vkuznet
Owner Author

vkuznet commented Apr 11, 2017 via email

@vkuznet vkuznet added this to the June development milestone May 10, 2017
@rishiloyola
Collaborator

If the transfer fails, we will create a new instance of the request and send it to the selected agent again. If the same request fails more than three times, we will throw an error and stop the process.
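
A sketch of this retry rule in Python. It reads "fails more than three times" literally, so the error is raised on the fourth failure; `transfer_with_retry`, `TransferError` and the `send` callback are hypothetical names, and the request is assumed to be a plain dictionary.

```python
import copy


class TransferError(Exception):
    """Raised when a request has failed more than the allowed number of times."""


def transfer_with_retry(request, send, max_failures=3):
    """Send `request` until it succeeds; return the number of failed attempts."""
    failures = 0
    while True:
        # A new instance of the request is created for every attempt.
        attempt = copy.deepcopy(request)
        if send(attempt):
            return failures
        failures += 1
        if failures > max_failures:
            raise TransferError(f"{request['dataset']} failed {failures} times")


# Hypothetical sender that fails twice and then succeeds.
calls = []


def flaky_send(req):
    calls.append(req)
    return len(calls) > 2


failed_attempts = transfer_with_retry({"dataset": "/a/b/c"}, flaky_send)
print(failed_attempts)
```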

@vkuznet
Owner Author

vkuznet commented Jun 2, 2017 via email
