
BUG-fetchall handler fatal error when unable to connect to netscaler #10

Open
mrnetops opened this issue Oct 22, 2020 · 6 comments

mrnetops commented Oct 22, 2020

time="2020-10-22T05:00:31Z" level=fatal msg="timout connecting to 192.168.48.20" handler=fetchall route=loadbalancer user=foo

lbapi dies when this occurs and has to be restarted.

  • also spelling error: timEout ;)

mrnetops commented Oct 22, 2020

I'm guessing it's the o.Log.Fatal(err) in sdk_fork's New():

func New(conf *SdkConf) *SdkFork {
        ////////////////////////////////////////////////////////////////////////////
        var err error
        ////////////////////////////////////////////////////////////////////////////
        o := &SdkFork{
                Virtualserver: virtualserver.New(),
                Loadbalancer:  loadbalancer.New(),
        }
        ////////////////////////////////////////////////////////////////////////////
        o.Target = conf.Target
        o.Log = conf.Log
        ////////////////////////////////////////////////////////////////////////////
        err = o.setConnection()
        if err != nil {
                o.Log.Fatal(err)
        }
        return o
}
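
For illustration, here's a minimal sketch of how New could hand the error back to its caller instead of terminating the whole process. It reuses the names from the snippet above and assumes the logger exposes an Error method (logrus-style); what the caller then does with the error is left open.

func New(conf *SdkConf) (*SdkFork, error) {
        o := &SdkFork{
                Virtualserver: virtualserver.New(),
                Loadbalancer:  loadbalancer.New(),
        }
        o.Target = conf.Target
        o.Log = conf.Log

        // Report the failure but let the caller decide whether it is fatal.
        if err := o.setConnection(); err != nil {
                o.Log.Error(err)
                return nil, err
        }
        return o, nil
}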

@mrnetops (Author)

Looks like we're peppered with a number of o.Log.Fatal(err) entries, which largely appear to be stability time bombs that should be handled more gracefully.

./certificate/avi.go:			o.Log.Fatal(err)
./poolgroup/avi.go:			o.Log.Fatal(err)
./loadbalancer/netscaler_etl.go:			o.Log.Fatal(err)
./loadbalancer/netscaler_etl.go:		o.Log.Fatal(err)
./loadbalancer/netscaler.go:			o.Log.Fatal(err)
./loadbalancer/avi.go:			o.Log.Fatal(err)
./pool/avi.go:			o.Log.Fatal(err)
./sdkfork/sdk_fork.go:		o.Log.Fatal(err)
./virtualserver/avi.go:			o.Log.Fatal(err)
./monitor/netscaler.go:			o.Log.Fatal(err)
./monitor/netscaler.go:			o.Log.Fatal(err)
./monitor/avi.go:			o.Log.Fatal(err)
./persistence/avi.go:			o.Log.Fatal(err)

@CarlosOVillanueva (Contributor)

Log into the lbapi server and run docker logs -f <lbapi instance>.

More than likely, there is a netscaler instance that is no longer live and the system is trying to connect to it; the logs will show which one. If that is the case, remove that netscaler and its HA members from lbapi.

@mrnetops (Author)

That's going to be my short-term fix, but to be clear, it's a short-term fix to a long-term problem: lbapi is vulnerable to dying from a variety of potentially transitory issues that need to be handled more gracefully.

A connection timeout should be a warning and move on, not kill the entire lbapi, and I imagine the same is true for pretty much every o.Log.Fatal(err) entry outside of main.go.
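
As a purely illustrative sketch of that caller side (hypothetical names, and assuming New is changed to return an error as sketched above), the fetchall path could log a warning and skip the unreachable unit:

for _, target := range targets {
        sdk, err := sdkfork.New(&sdkfork.SdkConf{Target: target, Log: log})
        if err != nil {
                // Downgrade the dead netscaler to a warning and keep serving.
                log.WithField("target", target).Warn("skipping unreachable load balancer: ", err)
                continue
        }
        // ...fetch data from sdk as usual...
        _ = sdk
}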

@CarlosOVillanueva (Contributor)

The reasoning for leaving it as is was to raise an alarm if lbapi was unable to talk to a load balancer. This would prevent a client from attempting to build a virtual service on that unit and force the support team to look into why a load balancer was timing out or not available - if that makes sense. But to your point, I completely agree that there are better ways to handle this and it should not be a total panic.

I'll look into having Go recover automatically after the exception, or possibly setting the Docker restart policy to reload lbapi after it exits. In either case, though, there has to be some mechanism to alert the support team that lbapi cannot talk to the destination resource.
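
One wrinkle worth noting: recover() only intercepts panics, and Fatal logs and then calls os.Exit, which skips deferred functions entirely, so the existing o.Log.Fatal(err) calls would first need to become panic(err) or returned errors. A rough sketch of a recover wrapper around a handler (names here are illustrative, not the actual lbapi code):

import (
        "net/http"

        "github.com/sirupsen/logrus"
)

// Illustrative middleware: converts a panic in a handler into a logged error
// and a 502, instead of letting it crash lbapi. Only useful once the Fatal
// calls are turned into panics or returned errors.
func recoverable(log *logrus.Logger, next http.HandlerFunc) http.HandlerFunc {
        return func(w http.ResponseWriter, r *http.Request) {
                defer func() {
                        if rec := recover(); rec != nil {
                                log.WithField("handler", r.URL.Path).Error("recovered from panic: ", rec)
                                http.Error(w, "load balancer backend unavailable", http.StatusBadGateway)
                        }
                }()
                next(w, r)
        }
}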

@mrnetops (Author)

That's a fair point. Sounds a lot like needing a Prometheus exporter + Alertmanager ;)
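
If it came to that, a rough sketch with the client_golang library might look like the following; the metric name and label are made up for illustration, and the counter would be incremented wherever a connection attempt fails:

import (
        "github.com/prometheus/client_golang/prometheus"
        "github.com/prometheus/client_golang/prometheus/promauto"
)

// Hypothetical counter for failed load balancer connections; Alertmanager can
// then page on increases instead of the whole process dying.
var lbConnectFailures = promauto.NewCounterVec(
        prometheus.CounterOpts{
                Name: "lbapi_loadbalancer_connect_failures_total",
                Help: "Failed connection attempts to a backend load balancer.",
        },
        []string{"target"},
)

// e.g. in the error path: lbConnectFailures.WithLabelValues(conf.Target).Inc()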
