vic-init fails with 4 NICs configured #2802

Closed
hickeng opened this issue Oct 19, 2016 · 13 comments
Labels
area/vsphere: Integration and interoperation with vSphere
impact/test/integration: Requires creation of or changes to an integration test
kind/debt: Problems that increase the cost of other work
kind/defect: Behavior that is inconsistent with what's intended
priority/p2
source/customer: Reported by a customer, directly or via an intermediary
team/foundation

Comments

@hickeng
Member

hickeng commented Oct 19, 2016

The output in #2794 is caused by vic-init exiting when 4 separate NICs are specified. The installer generates the wrong slot number for the 4th NIC and vic-init does not handle this well:

DEBU[2016-10-19T16:36:40Z] Setting "bridge" to slot 288
ethernet3.pciSlotNumber = "1184"

vic-init should not exit when this occurs. Rebooting the VM can resolve some issues, but a hardware misconfiguration is not one of them, and without vic-init running we cannot enable SSH or the console for manual debugging.
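
One way to read this request is that a failure on a single endpoint gets logged and collected while the process keeps running. The following is a minimal Python sketch of that pattern, not vic-init's code; `apply_all`, `apply_endpoint`, and the endpoint names passed in are hypothetical.

```python
import logging

log = logging.getLogger("init-sketch")


def apply_all(endpoints, apply_endpoint):
    """Apply every endpoint config and return {name: error} for the failures.

    `apply_endpoint` is a hypothetical callable that raises on failure.
    """
    failures = {}
    for name in endpoints:
        try:
            apply_endpoint(name)
        except Exception as exc:
            # Record the failure and keep going, so that SSH or console
            # access could still be enabled afterwards for debugging.
            log.error("failed to apply network endpoint config for %s: %s", name, exc)
            failures[name] = exc
    return failures
```

The caller can then decide what to do with a non-empty result, for example running in a degraded mode instead of exiting.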

vic-init output for reference:

time="2016-10-19T16:36:47Z" level=info msg="DHCP response: IP=192.168.78.152, SubnetMask=ffffff00, Gateway=192.168.78.2, DNS=[192.168.78.2], Lease Time=30m0s"
time="2016-10-19T16:36:47Z" level=info msg="setting ip address 192.168.78.152/24 for link eth3"
time="2016-10-19T16:36:47Z" level=debug msg="added address 192.168.78.152/24 to link eth3"
time="2016-10-19T16:36:47Z" level=debug msg="updateEndpoint(192.168.78.152/24, &{Common:{ExecutionEnvironment: ID:256 Name: Notes:} Static:false IP:<nil> Assigned:{IP:<nil> Mask:<nil>} Network:{Common:{ExecutionEnvironment: ID:Network:HaNetwork-VM Network Name:management Notes:} Type: Gateway:{IP:<nil> Mask:<nil>} Default:false Nameservers:[] Pools:[] Aliases:[]} DHCP:0xc4203f4880})"
time="2016-10-19T16:36:47Z" level=debug msg="not setting route for network: default=false gateway=192.168.78.2"
time="2016-10-19T16:36:47Z" level=debug msg="&{Common:{ExecutionEnvironment: ID:256 Name: Notes:} Static:false IP:<nil> Assigned:{IP:192.168.78.152 Mask:ffffff00} Network:{Common:{ExecutionEnvironment: ID:Network:HaNetwork-VM Network Name:management Notes:} Type: Gateway:{IP:192.168.78.2 Mask:ffffff00} Default:false Nameservers:[192.168.78.2] Pools:[] Aliases:[]} DHCP:0xc4203f4880}"
time="2016-10-19T16:36:47Z" level=debug msg="writing \"::1 ipv6-loopback ip6-localhost ip6-loopback ipv6-localhost\\n\" to /etc/hosts"
time="2016-10-19T16:36:47Z" level=debug msg="writing \"127.0.0.1 localhost.localdomain localhost\\n\" to /etc/hosts"
time="2016-10-19T16:36:47Z" level=debug msg="writing \"fe00:: ip6-localnet\\n\" to /etc/hosts"
time="2016-10-19T16:36:47Z" level=debug msg="writing \"ff02::1 ip6-allnodes\\n\" to /etc/hosts"
time="2016-10-19T16:36:47Z" level=debug msg="writing \"ff02::2 ip6-allrouters\\n\" to /etc/hosts"
time="2016-10-19T16:36:47Z" level=debug msg="writing \"127.0.1.1 ghicken-test\\n\" to /etc/hosts"
time="2016-10-19T16:36:47Z" level=debug msg="writing \"ff00:: ip6-mcastprefix\\n\" to /etc/hosts"
time="2016-10-19T16:36:47Z" level=debug msg="writing \"192.168.78.152 management.localhost\\n\" to /etc/hosts"
time="2016-10-19T16:36:47Z" level=info msg="Added nameservers: [192.168.78.2]"
time="2016-10-19T16:36:47Z" level=debug msg="&{Mutex:{state:1 sema:0} EntryConsumer:<nil> dirty:true path:/etc/resolv.conf nameservers:[[192 168 78 2]] timeout:15000000000 attempts:5}"
time="2016-10-19T16:36:47Z" level=debug msg="&{lines:[nameserver 192.168.78.2 options timeout:15 options attempts:5] i:0}"
time="2016-10-19T16:36:47Z" level=debug msg="writing \"nameserver 192.168.78.2\\n\" to /etc/resolv.conf"
time="2016-10-19T16:36:47Z" level=debug msg="writing \"options timeout:15\\n\" to /etc/resolv.conf"
time="2016-10-19T16:36:47Z" level=debug msg="writing \"options attempts:5\\n\" to /etc/resolv.conf"
time="2016-10-19T16:36:47Z" level=error msg="failed to apply network endpoint config: unable to acquire reference to link 288: 0 eth interfaces match /sys/bus/pci/devices/0000:00:19.0/0000:*:00.0/net/* ([])"
time="2016-10-19T16:36:47Z" level=debug msg="Removing the signal notifier"
time="2016-10-19T16:36:47Z" level=debug msg="Signalling the child reaper loop"
time="2016-10-19T16:36:47Z" level=debug msg="Closing the reapers signal channel"
time="2016-10-19T16:36:47Z" level=info msg="Stopping extension Toolbox"

vic-machine log:

INFO[2016-10-19T16:36:40Z] Creating appliance on target
DEBU[0000] [BEGIN] [github.com/vmware/vic/lib/install/management.(*Dispatcher).createApplianceSpec:289]
DEBU[0000] [BEGIN] [github.com/vmware/vic/lib/install/management.(*Dispatcher).addIDEController:265]
DEBU[0000] [ END ] [github.com/vmware/vic/lib/install/management.(*Dispatcher).addIDEController:265] [1.125588ms]
DEBU[0000] [BEGIN] [github.com/vmware/vic/lib/install/management.(*Dispatcher).addParaVirtualSCSIController:277]
DEBU[0001] [ END ] [github.com/vmware/vic/lib/install/management.(*Dispatcher).addParaVirtualSCSIController:277] [3.107932ms]
DEBU[0001] [BEGIN] [github.com/vmware/vic/lib/install/management.(*Dispatcher).addNetworkDevices:209]
DEBU[2016-10-19T16:36:40Z] Setting "external" to slot 192
DEBU[2016-10-19T16:36:40Z] Setting "client" to slot 224
DEBU[2016-10-19T16:36:40Z] Setting "management" to slot 256
DEBU[2016-10-19T16:36:40Z] Setting "bridge" to slot 288
DEBU[0001] [ END ] [github.com/vmware/vic/lib/install/management.(*Dispatcher).addNetworkDevices:209] [18.527895ms]
DEBU[0001] [ END ] [github.com/vmware/vic/lib/install/management.(*Dispatcher).createApplianceSpec:289] [24.747837ms]
DEBU[2016-10-19T16:36:41Z] vm folder name: "ghicken-test_2"
DEBU[2016-10-19T16:36:41Z] vm inventory path: "/ha-datacenter/vm/ghicken-test"
DEBU[0001] [BEGIN] [github.com/vmware/vic/lib/install/management.(*Dispatcher).GenerateExtensionName:364]
DEBU[0001] [ END ] [github.com/vmware/vic/lib/install/management.(*Dispatcher).GenerateExtensionName:364] [10.228262ms]
DEBU[0001] [BEGIN] [github.com/vmware/vic/lib/install/management.(*Dispatcher).reconfigureApplianceSpec:567]
DEBU[0001] [BEGIN] [github.com/vmware/vic/lib/install/management.(*Dispatcher).configIso:377]
DEBU[0001] [ END ] [github.com/vmware/vic/lib/install/management.(*Dispatcher).configIso:377] [11.11156ms]
DEBU[0001] [ END ] [github.com/vmware/vic/lib/install/management.(*Dispatcher).reconfigureApplianceSpec:567] [13.726115ms]
DEBU[0001] [ END ] [github.com/vmware/vic/lib/install/management.(*Dispatcher).createAppliance:403] [574.581136ms]

PCI devices with labels:

root@ghicken-test [ /sys/bus/pci/devices ]# ls -l */*/label
-r--r--r-- 1 root root 4096 Oct 19 16:45 0000:00:15.0/0000:03:00.0/label
-r--r--r-- 1 root root 4096 Oct 19 16:36 0000:00:15.1/0000:04:00.0/label
-r--r--r-- 1 root root 4096 Oct 19 16:36 0000:00:16.0/0000:0b:00.0/label
-r--r--r-- 1 root root 4096 Oct 19 16:36 0000:00:17.0/0000:13:00.0/label
-r--r--r-- 1 root root 4096 Oct 19 16:36 0000:00:18.0/0000:1b:00.0/label
root@ghicken-test [ /sys/bus/pci/devices ]# cat 0000:00:15.0/0000:03:00.0/label
SCSI0
root@ghicken-test [ /sys/bus/pci/devices ]# cat 0000:00:15.1/0000:04:00.0/label
Ethernet3
root@ghicken-test [ /sys/bus/pci/devices ]# cat 0000:00:16.0/0000:0b:00.0/label
Ethernet0
root@ghicken-test [ /sys/bus/pci/devices ]# cat 0000:00:17.0/0000:13:00.0/label
Ethernet1
root@ghicken-test [ /sys/bus/pci/devices ]# cat 0000:00:18.0/0000:1b:00.0/label
Ethernet2
root@ghicken-test [ /sys/bus/pci/devices ]#

slot data for ethernet vNICs from vmx:

[root@office1-sfo2-dhcp51:/vmfs/volumes/56a2eeef-3cafae6e-2bd3-000c29b00df4/ghicken-test_2] cat *.vmx | grep pciSlotNumber
ethernet0.pciSlotNumber = "192"
ethernet1.pciSlotNumber = "224"
ethernet2.pciSlotNumber = "256"
ethernet3.pciSlotNumber = "1184"
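
The mismatch between the requested slots and the vmx can be checked mechanically. The sketch below parses `pciSlotNumber` lines like the ones above and reports any vNIC whose slot differs from the requested value. The function names are hypothetical, and mapping the installer's networks onto `ethernet0` through `ethernet3` in log order is an assumption.

```python
import re

# Matches lines such as: ethernet3.pciSlotNumber = "1184"
_SLOT_RE = re.compile(
    r'^(ethernet\d+)\.pciSlotNumber[ \t]*=[ \t]*"(-?\d+)"[ \t]*$', re.MULTILINE
)


def parse_vmx_slots(vmx_text):
    """Return {"ethernet0": 192, ...} from the text of a .vmx file."""
    return {nic: int(slot) for nic, slot in _SLOT_RE.findall(vmx_text)}


def slot_mismatches(requested, vmx_text):
    """List (nic, requested_slot, actual_slot) where the vmx disagrees."""
    actual = parse_vmx_slots(vmx_text)
    return [
        (nic, want, actual.get(nic))
        for nic, want in requested.items()
        if actual.get(nic) != want
    ]
```

With the requested slots 192, 224, 256, and 288 and the vmx lines above, `slot_mismatches` returns `[("ethernet3", 288, 1184)]`.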
@hickeng
Member Author

hickeng commented Oct 19, 2016

This is easy to recreate in a local ESX environment: create enough vSwitches that each network can be assigned its own vNIC. Deploying with --debug=1 inhibits the rebooting, but getting hold of the logs requires modifying isos/appliance-staging.sh to comment out the disabling of the root user.

@hmahmood
Contributor

Related: #1674

@hickeng
Member Author

hickeng commented Oct 19, 2016

Correcting the slot number in the configuration doesn't help:

time="2016-10-19T16:36:47Z" level=error msg="failed to apply network endpoint config: unable to acquire reference to link 288: 0 eth interfaces match /sys/bus/pci/devices/0000:00:15.0/0000:*:00.1/net/* ([])"

It should be:

/sys/bus/pci/devices/0000:00:15.1/0000:*:00.0/net/*

So it looks like the slot-to-PCI logic is flawed as well.
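
For illustration, here is a Python sketch of a decoding that reproduces the expected path. The bit layout is inferred only from the values in this thread (192, 224, 256 and 1184 against the label listing, and 288 against the error message), so treat it as an assumption and not as the VIC implementation; `slot_to_net_glob` is a hypothetical name.

```python
PCI_ROOT = "/sys/bus/pci/devices"


def slot_to_net_glob(slot):
    """Decode a pciSlotNumber into a sysfs glob for the NIC's net directory.

    Assumed bit layout, inferred from the values in this thread:
      bits 0-4    device number on the bus the NIC sits on
      bits 5-9    bridge index; the bridge is device 0x10 + index on bus 0
      bits 10-12  function number of that bridge
    """
    dev = slot & 0x1F
    bridge = (slot >> 5) & 0x1F
    fun = (slot >> 10) & 0x7
    if bridge == 0:
        # Assumed: no bridge involved, the device sits directly on bus 0.
        return f"{PCI_ROOT}/0000:00:{dev:02x}.{fun}/net/*"
    # The function number belongs to the bridge, not to the device behind it.
    return f"{PCI_ROOT}/0000:00:{0x10 + bridge:02x}.{fun}/0000:*:{dev:02x}.0/net/*"
```

Under this layout, 1184 decodes to `0000:00:15.1/0000:*:00.0`, while 288 decodes to a bridge at `0000:00:19.0`, which does not appear in the label listing above.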

@mdubya66 mdubya66 added kind/defect Behavior that is inconsistent with what's intended and removed kind/bug/p0 labels Oct 19, 2016
@mdubya66
Contributor

If the fix is simple we should fix it, but if not then can we document our way out of this?

@mdubya66 mdubya66 added this to the VIC GA Release milestone Oct 19, 2016
@mdubya66 mdubya66 added the impact/test/integration Requires creation of or changes to an integration test label Oct 19, 2016
@mdubya66
Contributor

@karthik-narayan we might need inputs on this

@hickeng
Member Author

hickeng commented Oct 24, 2016

closing as dup of #1674

@hickeng
Member Author

hickeng commented Oct 24, 2016

Reopening for actual fix

@mdubya66 mdubya66 removed this from the VIC GA Release milestone Oct 31, 2016
@mdubya66
Contributor

For 0.8.0 we will limit the number of NICs to 3. This will need to be fixed before we ship 1.0.

@mlh78750
Contributor

When this is fixed, we must also fix #3125.

@mdubya66 mdubya66 added the impact/doc/note Requires creation of or changes to an official release note label Mar 6, 2017
@stuclem
Contributor

stuclem commented Mar 7, 2017

Copied the note that appears numerous times in the core doc into the release notes for 0.9:


  • Deployment fails if you configure a VCH to use 4 NICs. #2802
    A VCH supports a maximum of 3 distinct network interfaces. Because the bridge network requires its own port group, at least two of the public, client, and management networks must share a network interface and therefore a port group. Container networks do not go through the VCH, so they are not subject to this limitation. This limitation will be removed in a future release.
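
The rule in this note can be expressed as a small pre-deployment check. The Python sketch below is only an illustration of the stated limit; `check_vch_networks` and the port group names used with it are hypothetical, and this is not how vic-machine validates its input.

```python
def check_vch_networks(port_groups, max_nics=3):
    """Return a list of problems for a role -> port group mapping.

    Roles follow the release note: bridge, public, client, management.
    """
    problems = []
    bridge = port_groups.get("bridge")
    # The bridge network must not share its port group with any other role.
    shared = sorted(
        role for role, pg in port_groups.items() if role != "bridge" and pg == bridge
    )
    if shared:
        problems.append(f"bridge port group is shared with: {', '.join(shared)}")
    # Each distinct port group needs its own network interface.
    distinct = set(port_groups.values())
    if len(distinct) > max_nics:
        problems.append(f"{len(distinct)} distinct port groups, maximum is {max_nics}")
    return problems
```

A mapping in which client and management share a port group passes the check; four distinct port groups produce one problem.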

@hickeng and @mlh78750 does this cover it?

@mlh78750
Contributor

mlh78750 commented Mar 9, 2017

@stuclem looks good.

@stuclem
Contributor

stuclem commented Mar 13, 2017

Thanks @mlh78750

@stuclem stuclem removed the impact/doc/note Requires creation of or changes to an official release note label Mar 13, 2017
@mdubya66 mdubya66 added priority/p2 kind/debt Problems that increase the cost of other work and removed priority/p4 labels Jul 18, 2017
@hickeng hickeng added the area/vsphere Integration and interoperation with vSphere label Jul 18, 2017
@hickeng
Member Author

hickeng commented Jul 18, 2017

Debt label because this derives from the use of PCI slot identifiers, which are not predictable past 3 NICs. We should be using the /label content instead. There should also be a vSphere enhancement to allow specifying the label, so that we do not rely on config spec ordering.
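
A label-based lookup could look roughly like the Python sketch below, which scans the same `*/*/label` layout shown in the listing earlier in this thread. `find_device_by_label` is a hypothetical name, and the case-insensitive comparison is an assumption (the vmx keys are lowercase while the labels are capitalised).

```python
import glob
import os


def find_device_by_label(label, root="/sys/bus/pci/devices"):
    """Return the PCI device directory whose label file matches, or None.

    Scans <root>/*/*/label and compares the file content to `label`,
    ignoring case and surrounding whitespace.
    """
    for label_file in sorted(glob.glob(os.path.join(root, "*", "*", "label"))):
        try:
            with open(label_file) as handle:
                text = handle.read().strip()
        except OSError:
            continue  # unreadable entry, skip it
        if text.lower() == label.lower():
            return os.path.dirname(label_file)
    return None
```

On the guest shown above, looking up `Ethernet3` this way would be expected to resolve to `0000:00:15.1/0000:04:00.0` under the root directory, independent of the slot number.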

@cgtexmex cgtexmex added this to the Sprint 16 Foundation milestone Aug 30, 2017
caglar10ur added a commit that referenced this issue Sep 6, 2017
We were generating 192, 224, 256 and 288 as the PCI slot number for a 4 NIC system.

The problem was 288 was behind pciBridge8 which we don't have in our VM.
We are now starting from 1184 (1184, 1216, 1248, and 1280...) to get the persistent
PCI slot mapping between hypervisor and guest.

Fixes #2802
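
The arithmetic behind both sequences in this commit message is a fixed stride of 32 from a starting slot. A tiny Python sketch follows; `nic_slots` is a hypothetical helper, and the defaults are simply the numbers quoted above.

```python
def nic_slots(count, first=1184, stride=32):
    """Return `count` pciSlotNumber values spaced `stride` apart from `first`."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return [first + stride * i for i in range(count)]
```

`nic_slots(4)` gives `[1184, 1216, 1248, 1280]`, and `nic_slots(4, first=192)` gives the earlier sequence `[192, 224, 256, 288]`.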
AngieCris pushed a commit to AngieCris/vic that referenced this issue Nov 20, 2017

8 participants