This repository has been archived by the owner on Nov 9, 2020. It is now read-only.

vmdk_ops fails to start after ESX reboot #976

Closed
ashahi1 opened this issue Feb 28, 2017 · 8 comments

Comments

@ashahi1
Contributor

ashahi1 commented Feb 28, 2017

Docker CLI commands (docker volume create/ls) do not work after a VM reboot if the VM crashed while the container was still running.


Steps:

  1. Created a volume
root@photon-xVmYMbyTn [ ~ ]# docker volume ls
DRIVER              VOLUME NAME
root@photon-xVmYMbyTn [ ~ ]# docker volume create --driver=vmdk --name=testVol -o size=200mb
testVol
root@photon-xVmYMbyTn [ ~ ]# docker volume ls
DRIVER              VOLUME NAME
vmdk                testVol@sharedVmfs-0
root@photon-xVmYMbyTn [ ~ ]# docker volume inspect testVol@sharedVmfs-0
[
    {
        "Name": "testVol@sharedVmfs-0",
        "Driver": "vmdk",
        "Mountpoint": "/mnt/vmdk/testVol@sharedVmfs-0",
        "Status": {
            "access": "read-write",
            "attach-as": "independent_persistent",
            "capacity": {
                "allocated": "14MB",
                "size": "200MB"
            },
            "clone-from": "None",
            "created": "Tue Feb 28 08:30:56 2017",
            "created by VM": "photon-VM1.3",
            "datastore": "sharedVmfs-0",
            "diskformat": "thin",
            "fstype": "ext4",
            "status": "detached"
        },
        "Labels": {},
        "Scope": "global"
    }
]
root@photon-xVmYMbyTn [ ~ ]#
  2. Ran the container with the volume mounted:
root@photon-xVmYMbyTn [ ~ ]# docker volume ls
DRIVER              VOLUME NAME
vmdk                testVol@sharedVmfs-0
root@photon-xVmYMbyTn [ ~ ]# docker run -it --volume-driver=vmdk -v testVol@sharedVmfs-0:/vol1 --name ub ubuntu
Unable to find image 'ubuntu:latest' locally
latest: Pulling from library/ubuntu

d54efb8db41d: Pull complete
f8b845f45a87: Pull complete
e8db7bf7c39f: Pull complete
9654c40e9079: Pull complete
6d9ef359eaaa: Pull complete
Digest: sha256:dd7808d8792c9841d0b460122f1acf0a2dd1f56404f8d1e56298048885e45535
Status: Downloaded newer image for ubuntu:latest
root@e676ea9c8e99:/# ls
bin  boot  dev  etc  home  lib  lib64  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var  vol1
root@e676ea9c8e99:/# echo " Hello World" > vol1/hellos
root@e676ea9c8e99:/# cat vol1/hellos
 Hello World

From ESX:

[root@sc-rdops-vm08-dhcp-226-41:~] /usr/lib/vmware/vmdkops/bin/vmdkops_admin.py ls
Volume   Datastore     Created By VM  Created                   Attached To VM (name/uuid)  Policy  Capacity  Used  Disk Format  Filesystem Type  Access      Attach As
-------  ------------  -------------  ------------------------  --------------------------  ------  --------  ----  -----------  ---------------  ----------  ----------------------
testVol  sharedVmfs-0  photon-VM1.3   Tue Feb 28 08:30:56 2017  photon-VM1.3                N/A     200MB     19MB  thin         ext4             read-write  independent_persistent

[root@sc-rdops-vm08-dhcp-226-41:~]
  3. While the container was still running, rebooted the ESX host. Once the host came back up, powered on the VM.
  4. Then ran the admin CLI command to list volumes:
[root@sc-rdops-vm08-dhcp-226-41:~] /usr/lib/vmware/vmdkops/bin/vmdkops_admin.py ls
Volume   Datastore     Created By VM  Created                   Attached To VM (name/uuid)  Policy  Capacity  Used  Disk Format  Filesystem Type  Access      Attach As
-------  ------------  -------------  ------------------------  --------------------------  ------  --------  ----  -----------  ---------------  ----------  ----------------------
testVol  sharedVmfs-0  photon-VM1.3   Tue Feb 28 08:30:56 2017  photon-VM1.3                N/A     200MB     19MB  thin         ext4             read-write  independent_persistent

[root@sc-rdops-vm08-dhcp-226-41:~]
  5. After that, I tried to run the 'docker volume ls' command on the VM and got the error "Cannot communicate with ESX":
root@photon-xVmYMbyTn [ ~ ]# docker volume ls
list vmdk: VolumeDriver.List: 'list' failed: connection reset by peer (errno=104). Cannot communicate with ESX, please refer to the FAQ https://github.com/vmware/docker-volume-vsphere/wiki#faq
DRIVER              VOLUME NAME
root@photon-xVmYMbyTn [ ~ ]#
@ashahi1 added this to the 0.13 milestone on Feb 28, 2017
@ashahi1
Contributor Author

ashahi1 commented Feb 28, 2017

Attached logs:
vmdk_ops.txt
docker-volume-vsphere.txt

@ashahi1 changed the title from "Docker cli command does not works if vm crashes while the container is still running." to "Docker cli command does not works after reboot if vm crashes while the container is still running." on Feb 28, 2017
@ashahi1 changed the title from "Docker cli command does not works after reboot if vm crashes while the container is still running." to "Docker cli command does not works after vm reboot if vm had crashed while the container was still running with volumes attached." on Feb 28, 2017
@shuklanirdesh82
Contributor

@ashahi1

After that I tried to run 'docker volume ls' command on the vm - error "Cannot communicate with ESX"

Can you walk through the steps mentioned in the FAQ? It is best to make sure that all services are up and running.

FAQ: https://vmware.github.io/docker-volume-vsphere/documentation/user-guide/faq/#i-see-connection-reset-by-peer-errno104-in-the-services-logs-what-is-the-cause

@pdhamdhere
Contributor

The vSphere Host Agent (Hostd) took 13 seconds to start:

2017-02-28T08:42:26Z jumpstart[66472]: executing start plugin: hostd
...
2017-02-28T08:42:39.176Z info hostd[A05A9D0] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 1 : VMware Host Agent started on host sc-rdops-vm08-dhcp-226-41.en

In the meantime, vmdkops tried to start, failed repeatedly, and the watchdog gave up:

2017-02-28T08:42:33Z jumpstart[66472]: executing start plugin: vmdk-opsd
2017-02-28T08:42:34Z watchdog-vmdkops-opsd: [68429] Begin '/usr/lib/vmware/vmdkops/bin/vmdk_ops.py -p 1019', min-uptime = 60, max-quick-failures = 5, max-total-failures = 10, bg_pid_file = '', reboot-flag = '0'
2017-02-28T08:42:34Z watchdog-vmdkops-opsd: Executing '/usr/lib/vmware/vmdkops/bin/vmdk_ops.py -p 1019'
2017-02-28T08:42:34Z watchdog-vmdkops-opsd: '/usr/lib/vmware/vmdkops/bin/vmdk_ops.py -p 1019' exited after 0 seconds (quick failure 1) 1
2017-02-28T08:42:34Z watchdog-vmdkops-opsd: Executing '/usr/lib/vmware/vmdkops/bin/vmdk_ops.py -p 1019'
2017-02-28T08:42:35Z watchdog-vmdkops-opsd: '/usr/lib/vmware/vmdkops/bin/vmdk_ops.py -p 1019' exited after 1 seconds (quick failure 2) 1
2017-02-28T08:42:35Z watchdog-vmdkops-opsd: Executing '/usr/lib/vmware/vmdkops/bin/vmdk_ops.py -p 1019'
2017-02-28T08:42:35Z watchdog-vmdkops-opsd: '/usr/lib/vmware/vmdkops/bin/vmdk_ops.py -p 1019' exited after 0 seconds (quick failure 3) 1
2017-02-28T08:42:35Z watchdog-vmdkops-opsd: Executing '/usr/lib/vmware/vmdkops/bin/vmdk_ops.py -p 1019'
2017-02-28T08:42:36Z watchdog-vmdkops-opsd: '/usr/lib/vmware/vmdkops/bin/vmdk_ops.py -p 1019' exited after 1 seconds (quick failure 4) 1
2017-02-28T08:42:36Z watchdog-vmdkops-opsd: Executing '/usr/lib/vmware/vmdkops/bin/vmdk_ops.py -p 1019'
2017-02-28T08:42:36Z watchdog-vmdkops-opsd: '/usr/lib/vmware/vmdkops/bin/vmdk_ops.py -p 1019' exited after 0 seconds (quick failure 5) 1
2017-02-28T08:42:36Z watchdog-vmdkops-opsd: Executing '/usr/lib/vmware/vmdkops/bin/vmdk_ops.py -p 1019'
2017-02-28T08:42:37Z watchdog-vmdkops-opsd: '/usr/lib/vmware/vmdkops/bin/vmdk_ops.py -p 1019' exited after 1 seconds (quick failure 6) 1
2017-02-28T08:42:37Z watchdog-vmdkops-opsd: End '/usr/lib/vmware/vmdkops/bin/vmdk_ops.py -p 1019', failure limit reached
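
To make the give-up behavior in this log easier to follow, here is a minimal sketch of a restart loop with the same thresholds (min-uptime = 60, max-quick-failures = 5, max-total-failures = 10). The exact rules of the real ESXi watchdog are not shown in this thread, so the accounting below is an assumption inferred from the log lines; `run_watchdog` and `run_once` are hypothetical names.

```python
def run_watchdog(run_once, min_uptime=60, max_quick_failures=5,
                 max_total_failures=10):
    """Restart run_once() until a failure limit is exceeded.

    run_once() returns how many seconds the child stayed up before exiting.
    Returns one (uptime, quick_failures, total_failures) tuple per attempt.
    """
    quick = 0
    total = 0
    history = []
    while True:
        uptime = run_once()
        total += 1
        if uptime < min_uptime:
            quick += 1  # exited before min_uptime: counts as a quick failure
        else:
            quick = 0   # assumption: a long enough run resets the streak
        history.append((uptime, quick, total))
        # Assumption: the watchdog stops once either limit is exceeded.
        if quick > max_quick_failures or total > max_total_failures:
            return history


if __name__ == "__main__":
    # Uptimes taken from the log above: 0, 1, 0, 1, 0, 1 seconds.
    uptimes = iter([0, 1, 0, 1, 0, 1])
    for attempt in run_watchdog(lambda: next(uptimes)):
        print(attempt)
```

Under these assumptions the loop stops at the sixth quick failure, which matches the log: all six attempts happen between 08:42:34 and 08:42:37, before Hostd reports that it has started at 08:42:39.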

Once Hostd was up and running, I manually started the service with /etc/init.d/vmdk-opsd start, and it works fine.

We need to figure out how to add a watchdog dependency so that vmdkops is not started until Hostd is up and running.
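
One way to express "do not start until Hostd is up" inside the service itself, rather than in the watchdog configuration, is to poll a readiness check with a timeout before continuing. This is only a sketch: `wait_until_ready` and `probe` are hypothetical names, and the thread does not say what a reliable Hostd readiness check would look like.

```python
import time


def wait_until_ready(probe, timeout=60.0, interval=1.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll probe() until it returns True or `timeout` seconds elapse.

    probe is a hypothetical callable that reports whether the dependency
    (here: Hostd) is ready. Returns True if it became ready, else False.
    """
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    # Stand-in probe that becomes ready on the third check.
    state = {"checks": 0}

    def fake_probe():
        state["checks"] += 1
        return state["checks"] >= 3

    print(wait_until_ready(fake_probe, timeout=5, interval=0.1))
```

With a 60 second timeout, a start delay like the 13 seconds seen in the log would be absorbed by a single process instead of consuming the watchdog's quick-failure budget.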

@pdhamdhere
Contributor

vmdkops does start after Hostd. However, Hostd may not be fully up at that point.

        load_vmci()
        kv.init()
        connectLocalSi()  # <<< fails here when Hostd is not fully up
        handleVmciRequests(port)

Can we defer "connectLocalSi" until the first request? CC @msterin @govint
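
A minimal sketch of what deferring the connection to the first request could look like. `LazyConnection` is a hypothetical wrapper, and the `connect` callable is a stand-in for connectLocalSi(); the sketch does not account for any state the underlying client library may keep between failed attempts.

```python
import threading


class LazyConnection:
    """Create the connection on first use instead of at startup."""

    def __init__(self, connect):
        self._connect = connect  # hypothetical stand-in for connectLocalSi()
        self._conn = None
        self._lock = threading.Lock()

    def get(self):
        """Return the cached connection, connecting first if needed."""
        with self._lock:
            if self._conn is None:
                # May raise; nothing is cached on failure, so the next
                # request simply tries again.
                self._conn = self._connect()
            return self._conn


if __name__ == "__main__":
    lazy = LazyConnection(lambda: "service-instance")
    print(lazy.get())
```

A request handler would call get() and report an error to the caller if it raises, leaving the service process alive for the next request.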

@govint
Contributor

govint commented Mar 2, 2017

Rather than giving up after a few retries, the ESX service could finish the retries and then continue. Getting the service interface is now done on each call, so that's not an issue.

@pdhamdhere
Contributor

@govint main->connectLocalSi() exits if the connection cannot be established during startup. Are you also suggesting removing connectLocalSi() from main?

@govint
Contributor

govint commented Mar 2, 2017

No, try to connect, but don't exit just because a connection to hostd wasn't established.
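
As a rough illustration of this suggestion: attempt the connection once at startup, log the failure, and enter the request loop anyway. `start_service`, `connect`, and `serve` are hypothetical names; `connect` stands in for connectLocalSi() and `serve` for handleVmciRequests().

```python
import logging

log = logging.getLogger("vmdk_ops_sketch")  # hypothetical logger name


def start_service(connect, serve):
    """Try to connect once at startup; keep serving even if that fails."""
    try:
        conn = connect()
    except Exception as exc:
        # Do not exit: the request path is expected to retry later.
        log.warning("startup connect failed, will retry later: %s", exc)
        conn = None
    # serve() receives either a live connection or None.
    return serve(conn)


if __name__ == "__main__":
    def hostd_not_ready():
        raise ConnectionError("hostd not ready")

    start_service(hostd_not_ready, lambda conn: print("serving, conn =", conn))
```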

@govint changed the title from "Docker cli command does not works after vm reboot if vm had crashed while the container was still running with volumes attached." to "Docker cli command does not works after host reboot if vm had crashed while the container was still running with volumes attached." on Mar 2, 2017
@govint
Contributor

govint commented Mar 2, 2017

Tried out a change to not exit the service on startup and instead try to connect to hostd on the next command. What's happening is that the pyVim package remembers some state, and even after hostd has started it keeps throwing a host connect fault over and over. Unless the service is restarted, this continues forever.

@msterin changed the title from "Docker cli command does not works after host reboot if vm had crashed while the container was still running with volumes attached." to "Docker volume command does not works ("connection reset") after ESX rebooted while a container was running with volumes attached." on Mar 2, 2017
@pdhamdhere changed the title from "Docker volume command does not works ("connection reset") after ESX rebooted while a container was running with volumes attached." to "vmdk_ops fails to start after ESX reboot" on Mar 7, 2017
@tusharnt closed this as completed on Mar 9, 2017