VMware - Set static IP in Confconsole causes crash & stacktrace #1457

Closed
JedMeister opened this issue May 15, 2020 · 9 comments

Comments

@JedMeister
Member

As reported by Brian in the forums, when he sets a static IP within Confconsole (installed from ISO onto a VMware ESXi/vSphere v6.0 VM), he gets a crash and a stacktrace (rather than a static IP as expected). Here's a screenshot of the stack trace:

I have been unable to recreate the issue (tested in a KVM VM, a VirtualBox VM and a VMware Player v15.5 VM). My investigation of the code was fruitless.

Could you please have a look @OnGle ?

@JedMeister JedMeister added this to the 16.1 milestone May 15, 2020
@JedMeister JedMeister changed the title from "VMware Set static IP in Confconsole causes crash & stacktrace" to "VMware - Set static IP in Confconsole causes crash & stacktrace" May 15, 2020
@OnGle
Member

OnGle commented May 17, 2020

Yep no worries I'll have a geez now

@OnGle
Member

OnGle commented May 17, 2020

From the stacktrace I see a couple of places where an invalid None could pop up, possibly from getting net.addr, net.netmask or net.gateway in https://github.com/turnkeylinux/confconsole/blob/05a1c6e34e37662e2dc7ff7825e93d26de7e4e92/ifutil.py#L245

Essentially from my looking I believe:

  • If it's net.gateway, then it's caused by a CalledProcessError occurring when calling route -n (https://github.com/turnkeylinux/turnkey-netinfo/blob/69bac1c2f07d9a969cf32560b29f669ab5067c4a/netinfo/__init__.py#L120)
  • If it's net.addr or net.netmask, then it's caused by an IOError occurring from a fcntl.ioctl(...) call (https://github.com/turnkeylinux/turnkey-netinfo/blob/69bac1c2f07d9a969cf32560b29f669ab5067c4a/netinfo/__init__.py#L97)

Either way, both cases should probably be handled better by the time they get to confconsole.
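To illustrate what "handled better" could look like, here is a minimal sketch that keeps the reason for a failed lookup instead of passing a bare None along. It is not the netinfo or confconsole code: `LookupResult` and `lookup` are hypothetical names, and the getters passed in are assumed to raise `subprocess.CalledProcessError` (the `route -n` case) or `OSError` (the `fcntl.ioctl(...)` case; `IOError` is an alias of `OSError` in Python 3).

```python
import subprocess
from typing import Callable, NamedTuple, Optional


class LookupResult(NamedTuple):
    """Hypothetical container: a value, or the reason it is missing."""
    name: str
    value: Optional[str]
    reason: Optional[str]


def lookup(name: str, getter: Callable[[], Optional[str]]) -> LookupResult:
    """Call getter() and record why it failed rather than returning a bare None."""
    try:
        value = getter()
    except subprocess.CalledProcessError as exc:
        # e.g. `route -n` exiting with a non-zero status
        return LookupResult(name, None, f"command failed with exit status {exc.returncode}")
    except OSError as exc:
        # e.g. fcntl.ioctl(...) failing for the interface
        return LookupResult(name, None, f"ioctl failed: {exc}")
    if value is None:
        return LookupResult(name, None, "no value available")
    return LookupResult(name, value, None)
```

With something along these lines, the caller can check `result.value is None` and show `result.reason` to the user.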

I'll do some testing and see if I can reproduce the issue.

@OnGle
Member

OnGle commented May 17, 2020

I was able to reproduce the issue by removing the default routing rule on a regular Core ISO install:

route del default

Then opening confconsole and navigating to Advanced -> Networking -> Static IP


I believe it was net.gateway that triggered it: not by raising an error, but because there was no routing rule for 0.0.0.0.
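For reference, this is roughly how a missing default route turns into a None gateway. The sketch below is not the actual netinfo implementation; `default_gateway` is a hypothetical helper that scans text in the usual `route -n` column layout, and the sample tables are made up for illustration.

```python
from typing import Optional

# Made-up sample output in the usual `route -n` layout.
WITH_DEFAULT = """\
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.1.1     0.0.0.0         UG    0      0        0 eth0
192.168.1.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
"""

# Same table after `route del default`.
WITHOUT_DEFAULT = """\
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.1.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
"""


def default_gateway(route_output: str, ifname: str) -> Optional[str]:
    """Return the gateway of the 0.0.0.0 route for ifname, or None if there is none."""
    for line in route_output.splitlines():
        fields = line.split()
        if len(fields) != 8:
            continue  # skips the "Kernel IP routing table" title line
        destination, gateway, _genmask, flags, *_counters, iface = fields
        if destination == "0.0.0.0" and "G" in flags and iface == ifname:
            return gateway
    # No exception is raised here: a missing default route simply yields None.
    return None
```

Calling `default_gateway(WITHOUT_DEFAULT, "eth0")` returns None without any error, which matches the behaviour described above.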


I'm unsure specifically how we want to deal with this issue, but once we've decided on a solution I'd be happy to implement it.

@JedMeister
Member Author

JedMeister commented May 18, 2020

Thanks for this awesome write up of your investigations.

So essentially there are 2 things we need to do here:

  • make confconsole more robust in managing these possibilities (i.e. give some useful error message rather than crash with a stacktrace); and
  • work out why the underlying issue occurs and how to fix it.

To reiterate what we discussed elsewhere, IMO resolving (or even attempting to resolve) the underlying issue is outside the scope of Confconsole.

So if you want to have a look at improving Confconsole's robustness and provide some more useful error message(s), I'll have a closer look at how this situation may have arisen, and what a user might do to resolve it...
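As a rough idea of the first point (a useful message rather than a stacktrace), here is a sketch under the assumption that the Static IP form is pre-filled from the current addr, netmask and gateway. `static_ip_form_defaults` and the message wording are hypothetical, not what Confconsole actually does.

```python
from typing import Dict, Optional, Tuple

FIELDS = ("addr", "netmask", "gateway")


def static_ip_form_defaults(
    addr: Optional[str], netmask: Optional[str], gateway: Optional[str]
) -> Tuple[Dict[str, str], Optional[str]]:
    """Return (form defaults, warning); the warning is None when nothing is missing."""
    raw = dict(zip(FIELDS, (addr, netmask, gateway)))
    missing = [name for name in FIELDS if raw[name] is None]
    # Replace None with "" so later string handling cannot crash on it.
    defaults = {name: raw[name] or "" for name in FIELDS}
    if not missing:
        return defaults, None
    warning = (
        "Could not detect current " + ", ".join(missing)
        + "; please enter the value(s) manually."
    )
    return defaults, warning
```

The form can then still be shown with empty fields, with the warning displayed first, so the user has a chance to enter the missing values.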

@OnGle
Member

OnGle commented May 18, 2020

Increasing confconsole's "robustness"; pending additional testing

PRs for after testing if all goes well:

@JedMeister
Member Author

OK, so I'm not sure if this is the exact same scenario the OP experienced, but I've managed to reproduce the issue in VMware Player (v15.5) by setting the network adaptor to "Host only" (in my tests, both "NAT" and "Bridged" worked as expected).

Note that in my tests, I started with a v16.0 WordPress VM (installed from ISO). Then:

  • with "Bridged" networking:
    • got a DHCP IP by default [worked]
    • set a static IP [worked]
    • reset back to DHCP [worked]
  • Shut down VM, set networking to "NAT" and restarted
    • got new DHCP address [worked]
    • set static IP [worked]
    • reset back to DHCP [worked]
  • Shut down VM, set networking to "Host only" (see fig 1 below) and restarted
    • initially displayed old DHCP network info (from "NAT" config?)
    • selected "Networking" from Confconsole "Advanced" menu - got first error - see fig 2 below
    • clicked 'ok' and got dropped back to selection screen asking 'DHCP' or 'Static' (no networking info shown)
    • selecting 'DHCP' returns to above noted error - see fig 2 below
    • selecting 'Static' returns the OP's stacktrace! - See fig 3 below.
    • note that there is no route - see fig 4 below

fig 1 - VMware network config that causes issue:
network_config

fig 2 - Initial Confconsole networking error with "Host only" networking:
confconsole-dhcp-error-msg

fig 3 - Trying to set a static IP (same stacktrace as OP):
confconsole-static-ip-stacktrace

fig 4 - No routes listed:
route-output

@JedMeister
Member Author

JedMeister commented May 18, 2020

Thanks for these updates @OnGle - on face value they work great! 😄

However, it looks like there may actually be a deeper issue at play here... I say that because my initial testing of the updated packages found no errors (when really it should have thrown the new error).


[update] I wrote up a post about that, but have since moved it to a new issue - #1458

@OnGle
Member

OnGle commented May 18, 2020

My testing indicated that using DHCP changes the routing, so the same steps don't necessarily have the same effect on the system each run. Before I'd consider these tests significant, I think we should check the routing after each run and note exactly what occurred between runs.
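One way to record that, as a sketch only: save the output of `route -n` before and after each run and compare the two. `route_changes` is a hypothetical helper, and it only compares text, so it assumes both snapshots were captured with the same command.

```python
from typing import Dict, List


def route_changes(before: str, after: str) -> Dict[str, List[str]]:
    """Compare two routing-table snapshots line by line."""
    before_lines = [line.strip() for line in before.splitlines() if line.strip()]
    after_lines = [line.strip() for line in after.splitlines() if line.strip()]
    return {
        # Routes present before the run but gone afterwards.
        "removed": [line for line in before_lines if line not in after_lines],
        # Routes that only appear after the run.
        "added": [line for line in after_lines if line not in before_lines],
    }
```

An empty "removed" and "added" list means the run left the routing table unchanged.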


Also, I believe the issues in DHCP should be treated differently from the ones in Static IP config, as they do not occur under the same conditions. Working DHCP may fix routing for Static IP config, or broken DHCP may break it further. In my opinion this is an entirely different and unrelated issue.

@JedMeister
Member Author

This issue is closed by:
turnkeylinux/turnkey-netinfo#1 & turnkeylinux/confconsole#45
