tfrobot: Support deployments cancelation on Ctrl+c #838

Closed
Eslam-Nawara opened this issue Feb 18, 2024 · 4 comments
Labels: tfrobot, type_feature (New feature or request)

Comments

@Eslam-Nawara
Contributor

Is your feature request related to a problem? Please describe

tfrobot stops deploying on Ctrl+C but doesn't cancel the VMs it has already deployed.

Describe the solution you'd like

Cancelation on Ctrl+C should be added to clean up after incomplete deployments.

@Eslam-Nawara
Contributor Author

Eslam-Nawara commented Feb 18, 2024

Work completed:
Tried running 4 VM groups on different node groups and sent SIGTERM:

tfrobot git:(development-cancel-deployments-on-SIGTERM) ✗ go run main.go deploy -c config.yaml
2:40PM INF starting peer session=tf-320046 twin=10804
2:40PM INF validating configuration file
2:40PM INF done validating configuration file
2:40PM INF Running deployment Node group=group_a
2:40PM INF Filter nodes Node group=group_a
2:40PM WRN Planetary and public IP options are false. Setting planetary IP to true vms group=example1vm
2:40PM INF Starting mass deployment Node group=group_a
2:41PM INF Retrying to deploy Deployment trial=2 Node group=group_a
2:41PM INF Filter nodes Node group=group_a
2:41PM INF Starting mass deployment Node group=group_a
2:41PM INF Done deploying Node group=group_a
2:41PM INF Running deployment Node group=group_b
2:41PM INF Filter nodes Node group=group_b
2:41PM WRN Planetary and public IP options are false. Setting planetary IP to true vms group=example2vm
2:41PM INF Starting mass deployment Node group=group_b
^C
2:41PM INF canceling contracts project name=group_b
2:42PM INF Running deployment Node group=group_c
2:42PM INF canceling contracts project name=group_c
2:42PM INF project is canceled project name=group_c
2:42PM INF Running deployment Node group=group_d
2:42PM INF canceling contracts project name=group_d
2:42PM INF project is canceled project name=group_d
2:42PM INF canceling contracts project name=group_a
2:42PM INF canceling contracts project name=group_b
2:42PM INF project is canceled project name=group_b
2:42PM INF canceling contracts project name=group_c
2:42PM INF project is canceled project name=group_c
2:42PM INF canceling contracts project name=group_d
2:42PM INF project is canceled project name=group_d
2:42PM FTL error="failed to run deployer, deployment was interrupted with signal SIGTERM"
exit status 1
tfrobot git:(development-cancel-deployments-on-SIGTERM)

Eslam-Nawara added this to the 1.0.0 milestone Feb 21, 2024
@A-Harby
Contributor

A-Harby commented Feb 26, 2024

It didn't work for me: I started a deployment and stopped it, and none of the deployed contracts were canceled.

node_groups:
  - name: group_a
    nodes_count: 3
    free_cpu: 5
    free_mru: 85
    free_ssd: 10
    free_hdd: 10
  - name: group_b
    nodes_count: 3
    free_cpu: 5
    free_mru: 85
    free_ssd: 10
    free_hdd: 10
  - name: group_c
    nodes_count: 3
    free_cpu: 5
    free_mru: 85
    free_ssd: 10
    free_hdd: 10
vms:
  - name: examplevm
    vms_count: 1
    node_group: group_a
    cpu: 1
    mem: 10
    flist: 'https://hub.grid.tf/tf-official-apps/base:latest.flist'
    entry_point: '/sbin/zinit init'
    root_size: 0
    ssd:
      - size: 15
        mount_point: '/mnt/ssd'
    ssh_key: example1
    env_vars:
      user: user1
      pwd: 1234
  - name: examplevm
    vms_count: 1
    node_group: group_b
    cpu: 1
    mem: 10
    flist: 'https://hub.grid.tf/tf-official-apps/base:latest.flist'
    entry_point: '/sbin/zinit init'
    root_size: 0
    ssd:
      - size: 15
        mount_point: '/mnt/ssd'
    ssh_key: example1
    env_vars:
      user: user1
      pwd: 1234
  - name: examplevm
    vms_count: 1
    node_group: group_c
    cpu: 1
    mem: 10
    flist: 'https://hub.grid.tf/tf-official-apps/base:latest.flist'
    entry_point: '/sbin/zinit init'
    root_size: 0
    ssd:
      - size: 15
        mount_point: '/mnt/ssd'
    ssh_key: example1
    env_vars:
      user: user1
      pwd: 1234


@Eslam-Nawara
Contributor Author

Eslam-Nawara commented Feb 26, 2024

Work completed:

  • Updated tfrobot to cancel each group's contracts using the correct project name format (vm/<group name>, as seen in the log below).
tfrobot git:(development-fix-tfrobot-cancel-on-ctrl-c) ✗ go run main.go deploy -c config.yml
5:20PM INF starting peer session=tf-573353 twin=6410
5:20PM INF validating configuration file
5:20PM INF done validating configuration file
5:20PM INF Running deployment Node group=group_a
5:20PM INF Filter nodes Node group=group_a
5:20PM WRN Planetary and public IP options are false. Setting planetary IP to true vms group=examplevm
5:20PM INF Starting mass deployment Node group=group_a
5:20PM INF Done deploying Node group=group_a
5:20PM INF Running deployment Node group=group_b
5:20PM INF Filter nodes Node group=group_b
5:20PM WRN Planetary and public IP options are false. Setting planetary IP to true vms group=examplevm
5:20PM INF Starting mass deployment Node group=group_b
^C
5:21PM INF canceling contracts project name=vm/group_b
5:21PM INF project is canceled project name=vm/group_b
5:21PM INF Running deployment Node group=group_c
5:21PM INF canceling contracts project name=vm/group_c
5:21PM INF project is canceled project name=vm/group_c
5:21PM INF canceling contracts project name=vm/group_a
5:21PM INF project is canceled project name=vm/group_a
5:21PM INF canceling contracts project name=vm/group_b
5:21PM INF canceling contracts project name=vm/group_c
5:21PM INF project is canceled project name=vm/group_c
5:21PM FTL error="failed to run deployer, deployment was interrupted with signal SIGTERM"
exit status 1
tfrobot git:(development-fix-tfrobot-cancel-on-ctrl-c) ✗ go run main.go load -c config.yml
5:21PM INF starting peer session=tf-575218 twin=6410
5:21PM INF Loading deployments
ok: {}
error:
    group_a: couldn't find any contracts of node group group_a
    group_b: couldn't find any contracts of node group group_b
    group_c: couldn't find any contracts of node group group_c
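
The difference between the two runs (project name=group_b in the first, project name=vm/group_b in the second) suggests the cleanup was looking up contracts under a different name than the one used at deploy time. A minimal sketch of one way to keep the two in sync is to derive the name in a single place. `projectName` and `vmProjectPrefix` are hypothetical names, and the assumption that deployments use the same `vm/` prefix is inferred from the logs only.

```go
package main

import "fmt"

// vmProjectPrefix matches the "vm/" prefix visible in the log lines above.
// That deployments are created under the same prefix is an assumption.
const vmProjectPrefix = "vm/"

// projectName is a hypothetical helper that builds the project name for a
// node group, so deploy and cancel code paths cannot drift apart.
func projectName(group string) string {
	return vmProjectPrefix + group
}

func main() {
	for _, g := range []string{"group_a", "group_b", "group_c"} {
		fmt.Println(projectName(g))
	}
}
```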

@Eslam-Nawara
Contributor Author

Verified:

Deployments are canceled on Ctrl+C.

5:03PM INF starting peer session=tf-68310 twin=10804
5:03PM INF validating configuration file
5:03PM INF done validating configuration file
5:03PM INF Running deployment Node group=group_b
5:03PM INF Filter nodes Node group=group_b
^C5:03PM WRN ygg ip, mycelium ip and public IP options are false. Setting ygg IP to true vms group=examplevm
5:03PM WRN ygg ip, mycelium ip and public IP options are false. Setting ygg IP to true vms group=examplevm
5:03PM INF Starting mass deployment Node group=group_b
5:03PM INF canceling contracts project name=vm/group_b
5:03PM INF project is canceled project name=vm/group_b
5:03PM INF canceling contracts project name=vm/group_b
5:03PM INF project is canceled project name=vm/group_b
5:03PM FTL error="failed to run deployer, deployment was interrupted with signal SIGTERM"
exit status 1
➜  tfrobot git:(development) ✗
