zos3: support update of some of workloads #1425

Closed
2 of 4 tasks
muhamadazmy opened this issue Oct 12, 2021 · 2 comments

muhamadazmy commented Oct 12, 2021

The provision engine needs to handle a failed update differently from a failed install. The problem is that a failed update does not mean that the reservation is gone.

For example, increasing a disk size can fail, but that doesn't mean the disk is gone. The disk state should still be okay, but we need to report back that the resize operation has failed. This is currently not possible, because a workload can hold only ONE error message, and it is associated with the workload state.

I suggest that a workload instead has a (State), which is basically the current disk state, plus a list of (states) that are associated with operations.

For example

Disk Workload {
   State: Ok,
   Result: {
       Size: "current disk size",
   },
   Operations: {
      {State: Error, Operation: update, Time: Timestamp, message: "failed to increase disk size, not enough space"},
      {State: Ok, Operation: install, Time: Timestamp, message: ""}
   }
}
  • Deployment storage rewrite to support transactions
  • Disk resize
  • ZDB resize
  • Monitoring should use transaction data to correctly calculate usage
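
The State plus Operations proposal above could look roughly like the following. This is only a minimal sketch, not zos code: the names `Operation`, `DiskWorkload`, `resize` and the `free_space` parameter are hypothetical and exist only for illustration. It shows a failed resize being recorded as an operation while the workload state and size stay untouched.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Operation:
    state: str       # "ok" or "error"
    operation: str   # "install" or "update"
    time: float      # timestamp of the attempt
    message: str = ""


@dataclass
class DiskWorkload:
    state: str = "ok"   # current state of the disk itself
    size: int = 0       # current disk size (the "Result")
    operations: List[Operation] = field(default_factory=list)

    def resize(self, new_size: int, free_space: int, now: float) -> bool:
        """Try to grow the disk; a failure is logged but leaves state and size as-is."""
        if new_size - self.size > free_space:
            # Newest operation first, mirroring the example in the issue.
            self.operations.insert(0, Operation(
                "error", "update", now,
                "failed to increase disk size, not enough space"))
            return False
        self.size = new_size
        self.operations.insert(0, Operation("ok", "update", now))
        return True


# Hypothetical usage: the disk was installed with size 10, then a resize fails.
disk = DiskWorkload(state="ok", size=10,
                    operations=[Operation("ok", "install", 0.0)])
resized = disk.resize(new_size=50, free_space=20, now=1.0)
```

After the failed call the disk is still `ok` with size 10, and the error is only visible in `disk.operations`.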
@xmonader xmonader added the type_feature New feature or request label Oct 28, 2021
@xmonader xmonader added this to Backlog in 3.0.1 via automation Oct 28, 2021
@xmonader xmonader added this to Backlog in TFGrid_3.0.0 via automation Oct 28, 2021
@xmonader xmonader added this to the next milestone Oct 28, 2021
@muhamadazmy muhamadazmy moved this from Backlog to Accepted in 3.0.1 Oct 28, 2021
@muhamadazmy muhamadazmy self-assigned this Oct 28, 2021
@muhamadazmy muhamadazmy moved this from Accepted to In Progress in 3.0.1 Oct 28, 2021
@muhamadazmy muhamadazmy moved this from In Progress to Accepted in 3.0.1 Oct 28, 2021
@muhamadazmy muhamadazmy removed this from Accepted in 3.0.1 Nov 15, 2021
@muhamadazmy muhamadazmy added this to To do in 3.0 via automation Nov 15, 2021
@muhamadazmy muhamadazmy moved this from To do to Accepted in 3.0 Nov 22, 2021
@muhamadazmy muhamadazmy moved this from Accepted to To do in 3.0 Nov 25, 2021
@muhamadazmy muhamadazmy moved this from Accepted to Backlog in 3.0 Dec 7, 2021
@muhamadazmy muhamadazmy removed this from Backlog in 3.0 Dec 28, 2021
@muhamadazmy muhamadazmy added this to Accepted in 3.0.9+ X via automation Dec 28, 2021
muhamadazmy commented

Requirements

I was thinking about the cleanest way to implement this without sacrificing the correctness of the operation, and came across this set of requirements:

  • A deployment is a container of some workloads that can interact together
  • A change to a workload can be a create, an update or a delete
  • Not all workload types can execute an update
  • A successful update needs to change the workload state; a failed update needs to reflect that the transaction has failed, BUT must leave the state unchanged (the state also includes the workload data)
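
The requirements above can be sketched as a small planning step that compares the stored deployment with the requested one. This is an illustration only: `plan_changes`, the dictionary layout, and the `UPDATABLE_TYPES` set are assumptions (disk and ZDB are picked because the task list mentions resizing them), not the actual zos implementation.

```python
from typing import Any, Dict, List, Tuple

# Assumption: only these workload types support an update.
UPDATABLE_TYPES = {"disk", "zdb"}

Workloads = Dict[str, Dict[str, Any]]  # name -> {"type": ..., "data": ...}


def plan_changes(current: Workloads, desired: Workloads) -> List[Tuple[str, str]]:
    """Return (change, workload name) pairs, where change is create, update or delete."""
    changes: List[Tuple[str, str]] = []
    for name, wl in desired.items():
        if name not in current:
            changes.append(("create", name))
        elif wl != current[name]:
            old_type = current[name]["type"]
            # Reject updates for types that cannot be updated, or that change type.
            if wl["type"] != old_type or old_type not in UPDATABLE_TYPES:
                raise ValueError(
                    f"workload {name!r} of type {old_type!r} cannot be updated")
            changes.append(("update", name))
    for name in current:
        if name not in desired:
            changes.append(("delete", name))
    return changes


# Hypothetical deployment: one disk grows, one disk is dropped, one is added.
current = {
    "d1": {"type": "disk", "data": {"size": 10}},
    "n1": {"type": "network", "data": {}},
    "old": {"type": "disk", "data": {"size": 5}},
}
desired = {
    "d1": {"type": "disk", "data": {"size": 20}},
    "n1": {"type": "network", "data": {}},
    "new": {"type": "disk", "data": {"size": 5}},
}
changes = plan_changes(current, desired)
```

Rejecting the whole plan on an unsupported update is one possible design choice; recording a failed transaction for that workload instead would also satisfy the requirements.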

This all pushes toward a different persisted workload storage (currently deployments are stored as single files). Instead, the abstract structure of the storage needs to reflect "transactions", where a transaction can itself contain 'data', 'state' and 'error'. The final state of the object is computed by scanning the transactions and deriving the final state/data of the workload.

deployment {
   ...
   workload: <name> {
     type: <type>
     transactions: [
        {op: install, data: {...}, state: ok, error: none}
        {op: update, data: {...}, state: not-changed, error: <error>}
     ]
   }
}

For example, the workload above will still be in state ok, with the data provided on install.
If we push another update call that succeeds, the workload will also have {op: update, data: {..}, state: ok}. The final state of the workload will then be ok, and the associated data is the data from the last update operation.
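
Computing the final state by scanning the transactions, as described above, might be sketched like this. The function name `current_state` and the dictionary keys are hypothetical, and treating an install as always setting the state (whatever its outcome) is an assumption of this sketch.

```python
from typing import Any, Dict, List, Optional, Tuple

Transaction = Dict[str, Any]  # keys: "op", "data", "state", "error"


def current_state(
    transactions: List[Transaction],
) -> Tuple[Optional[str], Optional[dict]]:
    """Scan transactions oldest to newest and derive the final (state, data)."""
    state: Optional[str] = None
    data: Optional[dict] = None
    for tx in transactions:
        if tx["op"] == "install":
            # Assumption: an install always defines the state and data.
            state, data = tx["state"], tx["data"]
        elif tx["op"] == "update" and tx["state"] == "ok":
            # Only a successful update replaces the state and data.
            state, data = "ok", tx["data"]
        # A failed update ("not-changed") leaves state and data untouched.
    return state, data


# Hypothetical history: install succeeds, then a resize fails.
txs = [
    {"op": "install", "data": {"size": 10}, "state": "ok", "error": None},
    {"op": "update", "data": {"size": 50}, "state": "not-changed",
     "error": "not enough space"},
]
after_failed_update = current_state(txs)

# A later update that passes changes the final data.
txs.append({"op": "update", "data": {"size": 20}, "state": "ok", "error": None})
after_successful_update = current_state(txs)
```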

WORK IN PROGRESS

muhamadazmy commented

The Get Deployment operation needs to be backward compatible: a Get must return the same data (the current state) of the deployment as before.
A new call should return the history of operations, with their data and errors.
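
The split between a backward-compatible Get and a new history call could be sketched as follows. The class `DeploymentStore`, its method names, and the shape of the returned dictionaries are all hypothetical; the real Get response format is not shown in this issue.

```python
from typing import Any, Dict, List, Optional


class DeploymentStore:
    """In-memory stand-in for transaction-based deployment storage."""

    def __init__(self) -> None:
        # workload name -> transactions, oldest first
        self._transactions: Dict[str, List[Dict[str, Any]]] = {}

    def record(self, workload: str, op: str, data: dict, state: str,
               error: Optional[str] = None) -> None:
        self._transactions.setdefault(workload, []).append(
            {"op": op, "data": data, "state": state, "error": error})

    def get(self) -> Dict[str, Dict[str, Any]]:
        """Current state only: callers never see failed update attempts."""
        result: Dict[str, Dict[str, Any]] = {}
        for name, txs in self._transactions.items():
            state, data = None, None
            for tx in txs:
                # Installs always apply; updates apply only when they succeeded.
                if tx["op"] == "install" or tx["state"] == "ok":
                    state, data = tx["state"], tx["data"]
            result[name] = {"state": state, "data": data}
        return result

    def history(self, workload: str) -> List[Dict[str, Any]]:
        """New call: every operation with its data and error."""
        return list(self._transactions.get(workload, []))


store = DeploymentStore()
store.record("disk1", "install", {"size": 10}, "ok")
store.record("disk1", "update", {"size": 50}, "not-changed", "not enough space")
```

Here `store.get()` keeps reporting the installed disk, while `store.history("disk1")` exposes the failed resize.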

@muhamadazmy muhamadazmy moved this from Accepted to In progress in 3.0.9+ X Jan 12, 2022
@muhamadazmy muhamadazmy removed this from In progress in 3.0.9+ X Jan 13, 2022
@muhamadazmy muhamadazmy added this to Accepted in 3.1.0 via automation Jan 13, 2022
@muhamadazmy muhamadazmy moved this from Accepted to In progress in 3.1.0 Jan 13, 2022
@muhamadazmy muhamadazmy removed this from In progress in 3.1.0 Jan 17, 2022
@muhamadazmy muhamadazmy added this to Accepted in 3.0.9+ X via automation Jan 17, 2022
@muhamadazmy muhamadazmy moved this from Accepted to In progress in 3.0.9+ X Jan 17, 2022
TFGrid_3.0.0 automation moved this from Backlog to Done Jan 17, 2022
3.0.9+ X automation moved this from In progress to Done Jan 17, 2022
@muhamadazmy muhamadazmy moved this from Done to Verification in 3.0.9+ X Jan 17, 2022
@muhamadazmy muhamadazmy moved this from Verification to Done in 3.0.9+ X Jan 19, 2022