[FVT]:mkvm returns Unknown issue Path on ppc64le vm #1747

Closed
tingtli opened this issue Aug 19, 2016 · 16 comments

Comments
@tingtli
Contributor

tingtli commented Aug 19, 2016

xCAT 2.12.2 (8/17 build), on both rhels7.2 and ubuntu14.04.1.

[root@c910f03fsp03v02 result]# rmvm c910f03fsp03v04 -f -p
[root@c910f03fsp03v02 result]# mkvm c910f03fsp03v04 -s 20G
c910f03fsp03v04: Error: Unknown issue Path c910f03fsp03v04.sda.qcow2 already exists at /opt/xcat/lib/perl/xCAT_plugin/kvm.pm line 318.
[root@c910f03fsp03v02 result]# echo $?
1
[root@c910f03fsp03v02 result]# lsdef -l c910f03fsp03v04
Object name: c910f03fsp03v04
    arch=ppc64le
    cons=kvm
    currchain=boot
    currstate=boot
    groups=all
    initrd=xcat/osimage/rhels7.2-ppc64le-install-compute/initrd.img
    kcmdline=quiet inst.repo=http://!myipfn!:80/install/rhels7.2/ppc64le inst.ks=http://!myipfn!:80/install/autoinst/c910f03fsp03v04 BOOTIF=42:18:0a:03:09:04  ip=dhcp  inst.cmdline  console=tty0 console=hvc0,9600n8r
    kernel=xcat/osimage/rhels7.2-ppc64le-install-compute/vmlinuz
    mac=42:18:0a:03:09:04
    mgt=kvm
    netboot=grub2
    os=rhels7.2
    postbootscripts=otherpkgs
    postscripts=syslog,remoteshell,syncfiles
    profile=compute
    provmethod=rhels7.2-ppc64le-install-compute
    serialflow=hard
    serialport=0
    serialspeed=9600
    status=powering-off
    statustime=08-19-2016 05:40:37
    vmcpus=1
    vmhost=node-32030901
    vmmemory=4096
    vmnicnicmodel=virtio
    vmnics=br0
    vmstorage=dir:///var/lib/libvirt/images/

@gurevichmark This issue occurs in our daily-run test environment; it gets reinstalled again and again. Let me know if you need me to set up an environment.

@whowutwut
Member

@tingtli Can you also provide the output of "ls -ltr /var/lib/libvirt/images", or give us the machine to debug on? Is the hypervisor reinstalled on each run, or is rmvm run to clean up the old VM before this is executed again?

@gurevichmark
Contributor

gurevichmark commented Aug 19, 2016

@tingtli I think this is working correctly. When you run rmvm <vmname>, the VM is removed but the disks are left untouched. Then when you run mkvm <vmname> -s <size>, you are asking to create a VM with a new disk, but the old disk is still there.
So there are 2 choices here:

  1. If you wish to keep the old disk to be reused by the VM, the commands should be:
     rmvm <vmname>
     mkvm <vmname>
  2. If you wish to create a VM with a new disk, the commands should be:
     rmvm -p <vmname> (or rmvm -f -p <vmname> if the VM is powered on)
     mkvm <vmname> -s <size>

Before retrying this, please make sure no old disks remain on the KVM host for the VMs that have been removed. Log in to the KVM host and issue virsh vol-list default to see the list of disks and which VMs they belong to. To remove one manually, issue virsh vol-delete <name> --pool default.
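For example, a minimal cleanup sketch to run on the KVM host (the volume name here is just the one from this report; adjust the pool and disk names for your setup):

# List the disks in the default storage pool and which VMs they belong to
virsh vol-list default

# Remove a stale disk left behind for a VM that has already been deleted
virsh vol-delete c910f03fsp03v04.sda.qcow2 --pool default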

@tingtli
Contributor Author

tingtli commented Aug 19, 2016

@gurevichmark @whowutwut I am sorry. The steps should be
rmvm c910f03fsp03v04 -f -p
@gurevichmark Please log in to node c910f03fsp03v02 (10.3.9.2) to take a look. To reach c910f03fsp03v02, first log in to 10.4.31.1, then ssh to c910f03fsp03v02.
One thing to note: this node will probably be reinstalled by our test control node, so please don't save your code on it.

@gurevichmark
Contributor

@tingtli I logged onto your system to see what is going on. It appears that on the KVM host system node-32030901, you have 4 running VMs:

[root@localhost ~]# virsh list
 Id    Name                           State
----------------------------------------------------
 41    c910f03fsp03v10                running
 44    c910f03fsp03v11                running
 88    c910f03fsp03v02                running
 92    c910f03fsp03v140               running

But there are some disks defined for VMs that are not running:

[root@localhost ~]# virsh vol-list default
 Name                 Path
------------------------------------------------------------------------------
 c910f03fsp03v02.sda.qcow2 /var/lib/libvirt/images/c910f03fsp03v02.sda.qcow2
 c910f03fsp03v03.sda.qcow2 /var/lib/libvirt/images/c910f03fsp03v03.sda.qcow2
 c910f03fsp03v04.sda.qcow2 /var/lib/libvirt/images/c910f03fsp03v04.sda.qcow2
 c910f03fsp03v05.sda.qcow2 /var/lib/libvirt/images/c910f03fsp03v05.sda.qcow2
 c910f03fsp03v06.sda.qcow2 /var/lib/libvirt/images/c910f03fsp03v06.sda.qcow2
 c910f03fsp03v07.sda.qcow2 /var/lib/libvirt/images/c910f03fsp03v07.sda.qcow2
 c910f03fsp03v10.sda.qcow2 /var/lib/libvirt/images/c910f03fsp03v10.sda.qcow2
 c910f03fsp03v11.sda.qcow2 /var/lib/libvirt/images/c910f03fsp03v11.sda.qcow2
 c910f03fsp03v110.sda.qcow2 /var/lib/libvirt/images/c910f03fsp03v110.sda.qcow2
 c910f03fsp03v120.sda.qcow2 /var/lib/libvirt/images/c910f03fsp03v120.sda.qcow2
 c910f03fsp03v130.sda.qcow2 /var/lib/libvirt/images/c910f03fsp03v130.sda.qcow2
 c910f03fsp03v131.sda.qcow2 /var/lib/libvirt/images/c910f03fsp03v131.sda.qcow2
 c910f03fsp03v140.sda.qcow2 /var/lib/libvirt/images/c910f03fsp03v140.sda.qcow2
 c910f03fsp03v141.sda.qcow2 /var/lib/libvirt/images/c910f03fsp03v141.sda.qcow2
 c910f03fsp03v142.sda.qcow2 /var/lib/libvirt/images/c910f03fsp03v142.sda.qcow2

Generally that is not a problem; these disks can stay around in case you want to define a VM and reuse an old disk with mkvm <vmname>. But if you want to start from scratch and define a VM with a new disk via mkvm <vmname> -s <size>, these disks should be deleted first. Normally rmvm <vmname> -f -p would do that, but somehow things got out of sync on node-32030901.
I recommend you review this list of disks and manually delete the ones you feel are not needed using virsh vol-delete <diskname> --pool default, for example virsh vol-delete c910f03fsp03v04.sda.qcow2 --pool default. After that, rmvm and mkvm in your test cases should work correctly.
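Putting that together, a recovery sketch (assuming node-32030901 is reachable from the management node via xdsh, as used later in this thread, and that 20G is still the desired disk size):

# From the management node: delete the stale volume on the KVM host
xdsh node-32030901 "virsh vol-delete c910f03fsp03v04.sda.qcow2 --pool default"

# With the stale disk gone, creating the VM with a new 20G disk should succeed
mkvm c910f03fsp03v04 -s 20G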

@tingtli
Contributor Author

tingtli commented Aug 22, 2016

@gurevichmark Mark, I used your approach and it works now. My question is: why do we need to delete it using the virsh commands? Why doesn't rmvm do that?

@gurevichmark
Contributor

I think this is what is happening:

  1. At one point there was a running VM c910f03fsp03v04 with a disk c910f03fsp03v04.sda.qcow2
  2. A command rmvm c910f03fsp03v04 was issued. This removed the VM, but since the -p option was not used, the disk c910f03fsp03v04.sda.qcow2 was not removed from the KVM host.
  3. Now, when you try to create a new VM with the same name and use the -s <size> option, an error is reported because the disk is still there. Also, if you try to use rmvm -p now, it will not remove the disk, since there is no longer a running VM (it was removed in step 2) and the KVM host does not remember which disks used to belong to a VM that is no longer there.

There are 3 ways to recover from this (see the command sketch after the list):

  1. Run virsh command on KVM host to manually remove the disk
  2. Run mkvm <name> without the -s <size> option to create a new VM and reuse the old disk. Then run rmvm -f -p to remove the VM and its disk.
  3. Run mkvm <name> -f -s <size>. The -f option removes the old disks before creating a new VM.
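As a command sketch of the three options, using the names from this report (option 3's -f flag is the one described in this comment):

# Option 1: remove the leftover disk directly on the KVM host
virsh vol-delete c910f03fsp03v04.sda.qcow2 --pool default

# Option 2: re-adopt the old disk, then remove the VM together with its disk
mkvm c910f03fsp03v04
rmvm c910f03fsp03v04 -f -p

# Option 3: force mkvm to replace the old disk
mkvm c910f03fsp03v04 -f -s 20G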

@gurevichmark
Contributor

@tingtli Can you please let me know whether you agree with my explanation in the previous comment on this issue? If you agree, please close this issue.

@gurevichmark
Contributor

@tingtli Since there has been no activity on this issue for 22 days, I am going to close it. Please reopen it if you feel the issue is still there.

@caomengmeng
Contributor

Reproduced on a sles12.1 ppc64le test environment:

RUN:if [ "ppc64le" != "ppc64"  -a  "kvm" != "ipmi" ];then if [[ "dir:///var/lib/libvirt/images/" =~ "phy" ]]; then rmvm c910f03fsp03v04 -f -p  &&  mkvm c910f03fsp03v04; else rmvm c910f03fsp03v04 -f -p  &&  mkvm c910f03fsp03v04 -s 20G; fi;fi

[if [ "ppc64le" != "ppc64"  -a  "kvm" != "ipmi" ];then if [[ "dir:///var/lib/libvirt/images/" =~ "phy" ]]; then rmvm c910f03fsp03v04 -f -p  &&  mkvm c910f03fsp03v04; else rmvm c910f03fsp03v04 -f -p  &&  mkvm c910f03fsp03v04 -s 20G; fi;fi] Running Time:4 sec
RETURN rc = 1
OUTPUT:
c910f03fsp03v04: Error: Unknown issue Path c910f03fsp03v04.sda.qcow2 already exists at /opt/xcat/lib/perl/xCAT_plugin/kvm.pm line 318.
CHECK:rc == 0   [Failed]

@caomengmeng reopened this Sep 23, 2016
@tingtli
Contributor Author

tingtli commented Sep 23, 2016

@gurevichmark We followed your guidance and deleted the disk files on node-32030901 manually, and the problem seemed to be resolved. But after running for about a month, it has happened again.
As you mentioned, normally rmvm -f -p would do that, but somehow things got out of sync on node-32030901.
So I suspect -p is not working correctly. Would you please take a look at it again? Thanks very much!

@gurevichmark
Contributor

@tingtli It is hard to recreate this problem and see what is going on, since it does not happen every time. I still think that somehow, either manually or as part of the test, rmvm is run for that VM without the -p option. This removes the VM and leaves the disk there. After that, rmvm -p from your test run cannot remove the disk, because the VM is not there anymore.

I think we have 2 choices here (see the command sketch after the list):

  1. The simple solution is to add the -f flag to your mkvm command; this will remove the disk and create a new VM each time.
  2. We can try to confirm my theory that prior to your test the VM is gone but the disk is still there. To do that, you will need to add these 2 commands before and after each rmvm command you run. The output will show us which VMs and which disks exist before and after the rmvm command is executed.
    a. xdsh node-32030901 virsh list
    b. xdsh node-32030901 virsh vol-list default
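For choice 2, the instrumentation around each rmvm in the test could look like this (a sketch; the node and VM names are the ones from this thread):

# Before rmvm: record which VMs and which disks exist on the KVM host
xdsh node-32030901 virsh list
xdsh node-32030901 virsh vol-list default

rmvm c910f03fsp03v04 -f -p

# After rmvm: confirm both the VM and its disk are gone
xdsh node-32030901 virsh list
xdsh node-32030901 virsh vol-list default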

@gurevichmark
Contributor

Recommendation from @daniceexi for this issue:

Hi Mark,
I do think the command 'rmvm <vm> -p' should clean up the disk even if the vm object has been removed from the vm host.
If it's really hard to change our code to fix the 'rmvm <vm> -p' issue, a reasonable message should be displayed.
Also, we can add a '-f' flag for 'mkvm -s' to force replacement of the existing disk.

Thanks
Best Regards
----------------------------------------------------------------------
Wang Xiaopeng (王晓朋)

I will use this issue to do the fix:

If possible, rmvm -p will remove the disk even if the VM is not there. If that is not possible, rmvm -p will return an error saying no disks were removed when the VM is not there.
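In the meantime, a diagnostic sketch (not xCAT code) for spotting disks that rmvm -p can no longer associate with a VM, run on the KVM host and assuming the <node>.sda.qcow2 naming seen above:

# Flag volumes in the default pool whose base name does not match any libvirt domain
for vol in $(virsh vol-list default | awk 'NR>2 {print $1}'); do
    vm=${vol%%.*}    # e.g. c910f03fsp03v04 from c910f03fsp03v04.sda.qcow2
    virsh dominfo "$vm" >/dev/null 2>&1 || echo "orphaned disk: $vol"
done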

@gurevichmark
Contributor

After some discussion, on @daniceexi's recommendation, moving this issue to the next release for more study.
As a workaround for testing, use the suggestions from the Aug 22 comment on this issue.

@gurevichmark
Contributor

Used pull request #1942 to make the rmvm command fail if the VM is not there anymore. Also changed an error message for the mkvm command to advise the user to use the -f option to remove storage.
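With that change, the expected recovery in the failing scenario is to rerun mkvm with -f (a sketch based on the PR description; the exact message text is not reproduced here):

# Old disk still present, VM already gone: mkvm without -f reports the conflict
mkvm c910f03fsp03v04 -s 20G

# Per the new advice, -f removes the existing storage before creating the VM
mkvm c910f03fsp03v04 -f -s 20G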

@tingtli
Contributor Author

tingtli commented Nov 7, 2016

Moving to the next release since it is under monitoring.

@tingtli modified the milestones: 2.13, 2.12.4 Nov 7, 2016
@whowutwut modified the milestones: 2.12.4, 2.13 Nov 8, 2016
@whowutwut
Member

Let's close this and re-open if it comes up again during the monitoring.
