Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Abort Kubernetes Ingress update if Kubernetes API call fails #1295

Merged

Conversation

regner
Copy link
Contributor

@regner regner commented Mar 14, 2017

Currently if a Kubernetes API call fails we potentially remove a working service from Traefik. This changes it so if a Kubernetes API call fails we abort out of the ingress update and use the current working config. Github issue: #1240

Tests are currently failing but wanted to get the PR up for a few reasons:

  1. Confirm this is how we want to solve this issue (abort the whole ingress update if a single API call fails)
  2. That there isn't a better way to do the tests
  3. I would like the tests to catch a specific error, but don't know what error to expect being raised from the API call

Fixes #1240

@timoreimann timoreimann self-requested a review March 14, 2017 20:35
Copy link
Contributor

@timoreimann timoreimann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. IMHO that's the right approach. See my comment though on making an error/missing distinction when we get the endpoints.
  2. Generally the test approach is fine. I made a few suggestions though on how to improve the test implementation.
  3. Does my code comment answer this question?

apiEndpointsError: true,
}
provider = Kubernetes{}
_, err = provider.loadIngresses(client)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest we convert the test into a table-driven one using sub-tests. Among several benefits, this should improve readability and reduce the amount of test code needed.

_, err := provider.loadIngresses(client)

if err == nil {
t.Fatal("Should have raised error when loading services but did not.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be t.Error in my view.

@@ -1543,6 +1612,10 @@ func (c clientMock) GetIngresses(namespaces k8s.Namespaces) []*v1beta1.Ingress {
}

func (c clientMock) GetService(namespace, name string) (*v1.Service, bool, error) {
if c.apiServiceError {
return &v1.Service{}, true, c.err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conventionally, if an error occurs in a function returning multiple values, the non-error parameters are set to their default values. Thus, I'd use false for the boolean return value.

Slightly related/unrelated: I don't know why we return true in line 1624 when we seemingly can't find the service.

@@ -1552,6 +1625,10 @@ func (c clientMock) GetService(namespace, name string) (*v1.Service, bool, error
}

func (c clientMock) GetEndpoints(namespace, name string) (*v1.Endpoints, bool, error) {
if c.apiEndpointsError {
return &v1.Endpoints{}, true, c.err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also false here.

ObjectMeta: v1.ObjectMeta{
Namespace: "testing",
Annotations: map[string]string{
"traefik.frontend.passHostHeader": "true",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this annotation for the tests at hand?

I'm always in favor of keeping test fixtures as minimized as possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to move it to a separate test specifically for checking the !exists cases.

Spec: v1beta1.IngressSpec{
Rules: []v1beta1.IngressRule{
{
Host: "foo",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.

return nil, err
}

if !exists {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a test for this case. I suppose it's already covered by a pre-existing test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean the does not exist part? No that was on my todo list, it is actually why the ingress part of the new test has sample data. Was thinking of moving it to a separate test though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separate test sounds good. 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added tests for this and the !exists endpoints

@@ -196,10 +199,10 @@ func (provider *Kubernetes) loadIngresses(k8sClient k8s.Client) (*types.Configur
endpoints, exists, err := k8sClient.GetEndpoints(service.ObjectMeta.Namespace, service.ObjectMeta.Name)
if err != nil || !exists {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason we not splitting up the two cases here as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bad rebase is why actually. Specifically wrote some test code for it. Will fix,

type clientMock struct {
ingresses []*v1beta1.Ingress
services []*v1.Service
endpoints []*v1.Endpoints
watchChan chan interface{}

err error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can get rid of this variable by changing the type of api{Service,Endpoints}Error to error and returning the error if it's non-nil in the stubbed getter methods.

_, err := provider.loadIngresses(client)

if err == nil {
t.Fatal("Should have raised error when loading services but did not.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of just checking for the error being nil, what I think we can do here is pass a custom error variable to the mock (e.g., errors.New("failed")) and check against this very error. It shouldn't matter what error value we inject since our code doesn't depend on any specific error -- we just want to verify that we return the error to the caller that the (mocked) API gives us.
Similar for the endpoint case.

I hope this addresses your third question.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should work great. Will make the change.

@regner
Copy link
Contributor Author

regner commented Mar 15, 2017

Tests SHOULD be passing now. Still need to move the tests over to using the table driven test method, need to add tests for the !exists code paths, and am not sure how we want to handle when things don't exist. For example when a service doesn't exist do we still add the frontend? The two cases of this right now are when services or endpoints don't exist. Both cases can easily be someone putting a typo in the service name or port name.

@timoreimann
Copy link
Contributor

AFAICS, the only alternative we have to skipping frontends with missing services/endpoints is to revert the entire configuration (i.e., apply the error case resolution). IMHO, we shouldn't do that since such misses are usually due to a user mistake (as you said) and would consequently "punish" all other frontends that did everything correct.
There's also the case where a service or endpoint isn't available because the k8s reconciliation loops haven't completed yet. That's another situation where skipping and hoping for the next watch API event to trigger a benevolent update seems like the better option to me.

I believe the focus of this PR should be to fix #1240, which is to protect against the Kube API temporarily being unavailable. By handling that case gracefully (through reverting to the previous configuration) and maintaining the "missing" behavior as-is, I think we have satisfied the goal at hand. Potential improvements to the missing case, especially if the direction isn't super clear, can be handled by follow-up issues/PRs.

Just my 2 cents. :)

@regner
Copy link
Contributor Author

regner commented Mar 15, 2017

Was reviewing this PR this morning and noticed that the condition in my new test to check for failure was actually wrong and would be passing even if it shouldn't.

Fixed that and added test data back in. Without the test data the loadIngress logic doesn't get far enough along to trigger the errors we want to test for. So added the test data back in.

Very sorry about that! The logic in loadIngress was fine and I didn't have to make any changes there. It was just the tests not actually doing anything useful.

Copy link
Contributor

@timoreimann timoreimann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more comment.

Also, anxiously waiting for that table-driven version. :-)

provider := Kubernetes{}
_, err := provider.loadIngresses(client)

if err == nil || err.Error() != "failed service call" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can simplify the assertion by comparing error values:

if err != client.apiServiceError {
  ...
}

Once you turn the test into a table-driven one, define and inject an error variable var apiErr = errors.New("error") and compare against that in each test case.

provider = Kubernetes{}
_, err = provider.loadIngresses(client)

if err == nil || err.Error() != "failed endpoints call" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same pattern as above.


endpoints := []*v1.Endpoints{}
watchChan := make(chan interface{})
apiErr := errors.New("failed")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment, but I would use something more random than failed here.

emilevauge
emilevauge previously approved these changes Mar 15, 2017
Copy link
Member

@emilevauge emilevauge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @regner :)
Could you squash your commits?
Apart from minor comments, LGTM.

Currently if a Kubernetes API call fails we potentially remove a working service from Traefik. This changes it so if a Kubernetes API call fails we abort out of the ingress update and use the current working config. Github issue: traefik#1240

Also added a test to cover when requested resources (services and endpoints) that the user has specified don’t exist.
@regner regner force-pushed the 1240-skip-kubernetes-update-if-api-fails branch from c96f69f to fab3cb3 Compare March 16, 2017 00:29
@regner
Copy link
Contributor Author

regner commented Mar 16, 2017

Commits all squashed and that final test case added.

@timoreimann be aware, as this isn't obvious with the squash, that I renamed the test from TestServiceErrors to TestKubeAPIErrors as it isn't just related to services but endpoints as well.

I also changed apiErr := errors.New("failed") to apiErr := errors.New("failed kube api call") as per the comment from @emilevauge.

Other then that I only added the TestMissingResources test case and nothing else should have changed.

Copy link
Contributor

@timoreimann timoreimann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found one tiny test coding bug and filed two suggestions. :-)

}

for _, tc := range testCases {
t.Run(tc.desc, func(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to capture the loop variable by an assignment tc := tc since we run the tests in parallel (as documented). Otherwise, chances are we will be testing a single case only.

},
},
{
Host: "bar",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Could we change the host name to something like "host_with_missing_service" or place an according comment to indicate that the purpose of this Ingress resource test-wise is to not be found because of that reason?

It takes a bit of back of forth scrolling to understand the purpose of the individual objects this test is setting up. I think purposeful names / comments could alleviate that.

},
},
{
Host: "tar",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, something that indicates this Ingress' Service's Endpoints are missing?

@timoreimann
Copy link
Contributor

The new TestMissingResources test is still failing according to CI.

@emilevauge emilevauge dismissed their stale review March 16, 2017 13:54

Tests are failing

@regner
Copy link
Contributor Author

regner commented Mar 16, 2017

My most recent commit where I add a blank Servers attribute the "missing_service" expected output is required to pass the diff. @timoreimann helped me figure this out, but essentially without it we end up comparing a nil to a blank Servers definition.

-          (string) (len=15) "missing_service": (*types.Backend)(0xc4201b0450)({
-           Servers: (map[string]types.Server) <nil>,
+          (string) (len=15) "missing_service": (*types.Backend)(0xc4201b0270)({
+           Servers: (map[string]types.Server) {
+           },

One of the unfortunate parts of that is the normal logging we do for the failed comparison doesn't show this difference. The above output was retrieved when @timoreimann made the following change:

if !reflect.DeepEqual(actual, expected) {
    t.Fatalf("expected %+v, got %+v", string(expectedJSON), string(actualJSON))
    t.Fatalf("expected\n%v\ngot\n\n%v", spew.Sdump(expected), spew.Sdump(actual))
}

Something to keep in mind in the future.

Regner Blok-Andersen added 3 commits March 16, 2017 13:49
… test to help ensure we have useful output incase of failures
…verride the return value for if the endpoints exist.

After the 1.2 release the use of properExists should be removed and the GetEndpoints function should return false for the second value indicating the endpoint doesn’t exist. However at this time that would break a lot of the tests.
Copy link
Contributor

@timoreimann timoreimann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM -- we did it, @regner. :-)

Major thanks for bearing with me for so long and helping to discover a few issues with the existing tests. This should prove valuable in improving the correcting the tests in the future, which is vital for our code base to stay healthy.

Again: Great job! 🎉

@timoreimann
Copy link
Contributor

timoreimann commented Mar 17, 2017

@regner I pushed a small commit that includes a link to #1307.

The tests went green before, so I think that we can squash and merge as soon as we have @emilevauge's approval. 🎉

@regner
Copy link
Contributor Author

regner commented Mar 17, 2017

Good call on adding the link

Copy link
Member

@emilevauge emilevauge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
Thanks a lot @regner & @timoreimann :)

@emilevauge emilevauge merged commit 75d92c6 into traefik:v1.2 Mar 17, 2017
emilevauge pushed a commit that referenced this pull request Apr 4, 2017
* Abort Kubernetes Ingress update if Kubernetes API call fails

Currently if a Kubernetes API call fails we potentially remove a working service from Traefik. This changes it so if a Kubernetes API call fails we abort out of the ingress update and use the current working config. Github issue: #1240

Also added a test to cover when requested resources (services and endpoints) that the user has specified don’t exist.

* Specifically capturing the tc range as documented here: https://blog.golang.org/subtests

* Updating service names in the mock data to be more clear

* Updated expected data to match what currently happens in the loadIngress

* Adding a blank Servers to the expected output so we compare against that instead of nil.

* Replacing the JSON test output with spew for the TestMissingResources test to help ensure we have useful output incase of failures

* Adding a temporary fix to the GetEndoints mocked function so we can override the return value for if the endpoints exist.

After the 1.2 release the use of properExists should be removed and the GetEndpoints function should return false for the second value indicating the endpoint doesn’t exist. However at this time that would break a lot of the tests.

* Adding quick TODO line about removing the properExists property

* Link to issue 1307 re: properExists flag.
emilevauge pushed a commit that referenced this pull request Apr 4, 2017
* Abort Kubernetes Ingress update if Kubernetes API call fails

Currently if a Kubernetes API call fails we potentially remove a working service from Traefik. This changes it so if a Kubernetes API call fails we abort out of the ingress update and use the current working config. Github issue: #1240

Also added a test to cover when requested resources (services and endpoints) that the user has specified don’t exist.

* Specifically capturing the tc range as documented here: https://blog.golang.org/subtests

* Updating service names in the mock data to be more clear

* Updated expected data to match what currently happens in the loadIngress

* Adding a blank Servers to the expected output so we compare against that instead of nil.

* Replacing the JSON test output with spew for the TestMissingResources test to help ensure we have useful output incase of failures

* Adding a temporary fix to the GetEndoints mocked function so we can override the return value for if the endpoints exist.

After the 1.2 release the use of properExists should be removed and the GetEndpoints function should return false for the second value indicating the endpoint doesn’t exist. However at this time that would break a lot of the tests.

* Adding quick TODO line about removing the properExists property

* Link to issue 1307 re: properExists flag.
emilevauge pushed a commit that referenced this pull request Apr 4, 2017
* Abort Kubernetes Ingress update if Kubernetes API call fails

Currently if a Kubernetes API call fails we potentially remove a working service from Traefik. This changes it so if a Kubernetes API call fails we abort out of the ingress update and use the current working config. Github issue: #1240

Also added a test to cover when requested resources (services and endpoints) that the user has specified don’t exist.

* Specifically capturing the tc range as documented here: https://blog.golang.org/subtests

* Updating service names in the mock data to be more clear

* Updated expected data to match what currently happens in the loadIngress

* Adding a blank Servers to the expected output so we compare against that instead of nil.

* Replacing the JSON test output with spew for the TestMissingResources test to help ensure we have useful output incase of failures

* Adding a temporary fix to the GetEndoints mocked function so we can override the return value for if the endpoints exist.

After the 1.2 release the use of properExists should be removed and the GetEndpoints function should return false for the second value indicating the endpoint doesn’t exist. However at this time that would break a lot of the tests.

* Adding quick TODO line about removing the properExists property

* Link to issue 1307 re: properExists flag.
emilevauge pushed a commit that referenced this pull request Apr 11, 2017
* Abort Kubernetes Ingress update if Kubernetes API call fails

Currently if a Kubernetes API call fails we potentially remove a working service from Traefik. This changes it so if a Kubernetes API call fails we abort out of the ingress update and use the current working config. Github issue: #1240

Also added a test to cover when requested resources (services and endpoints) that the user has specified don’t exist.

* Specifically capturing the tc range as documented here: https://blog.golang.org/subtests

* Updating service names in the mock data to be more clear

* Updated expected data to match what currently happens in the loadIngress

* Adding a blank Servers to the expected output so we compare against that instead of nil.

* Replacing the JSON test output with spew for the TestMissingResources test to help ensure we have useful output incase of failures

* Adding a temporary fix to the GetEndoints mocked function so we can override the return value for if the endpoints exist.

After the 1.2 release the use of properExists should be removed and the GetEndpoints function should return false for the second value indicating the endpoint doesn’t exist. However at this time that would break a lot of the tests.

* Adding quick TODO line about removing the properExists property

* Link to issue 1307 re: properExists flag.
emilevauge pushed a commit that referenced this pull request Apr 11, 2017
* Abort Kubernetes Ingress update if Kubernetes API call fails

Currently if a Kubernetes API call fails we potentially remove a working service from Traefik. This changes it so if a Kubernetes API call fails we abort out of the ingress update and use the current working config. Github issue: #1240

Also added a test to cover when requested resources (services and endpoints) that the user has specified don’t exist.

* Specifically capturing the tc range as documented here: https://blog.golang.org/subtests

* Updating service names in the mock data to be more clear

* Updated expected data to match what currently happens in the loadIngress

* Adding a blank Servers to the expected output so we compare against that instead of nil.

* Replacing the JSON test output with spew for the TestMissingResources test to help ensure we have useful output incase of failures

* Adding a temporary fix to the GetEndoints mocked function so we can override the return value for if the endpoints exist.

After the 1.2 release the use of properExists should be removed and the GetEndpoints function should return false for the second value indicating the endpoint doesn’t exist. However at this time that would break a lot of the tests.

* Adding quick TODO line about removing the properExists property

* Link to issue 1307 re: properExists flag.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants