Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add request accepting grace period delaying graceful shutdown. #1971

Merged
7 changes: 6 additions & 1 deletion cmd/traefik/configuration.go
Original file line number Diff line number Diff line change
Expand Up @@ -180,6 +180,11 @@ func NewTraefikDefaultPointersConfiguration() *TraefikConfiguration {
DialTimeout: flaeg.Duration(configuration.DefaultDialTimeout),
}

// default LifeCycle
defaultLifeycle := configuration.LifeCycle{
GraceTimeOut: flaeg.Duration(configuration.DefaultGraceTimeout),
}

defaultConfiguration := configuration.GlobalConfiguration{
Docker: &defaultDocker,
File: &defaultFile,
Expand All @@ -202,6 +207,7 @@ func NewTraefikDefaultPointersConfiguration() *TraefikConfiguration {
ForwardingTimeouts: &forwardingTimeouts,
TraefikLog: &defaultTraefikLog,
AccessLog: &defaultAccessLog,
LifeCycle: &defaultLifeycle,
}

return &TraefikConfiguration{
Expand All @@ -213,7 +219,6 @@ func NewTraefikDefaultPointersConfiguration() *TraefikConfiguration {
func NewTraefikConfiguration() *TraefikConfiguration {
return &TraefikConfiguration{
GlobalConfiguration: configuration.GlobalConfiguration{
GraceTimeOut: flaeg.Duration(10 * time.Second),
AccessLogsFile: "",
TraefikLogsFile: "",
LogLevel: "ERROR",
Expand Down
32 changes: 1 addition & 31 deletions cmd/traefik/traefik.go
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,6 @@ import (
"github.com/containous/traefik/log"
"github.com/containous/traefik/provider/ecs"
"github.com/containous/traefik/provider/kubernetes"
"github.com/containous/traefik/provider/rancher"
"github.com/containous/traefik/safe"
"github.com/containous/traefik/server"
"github.com/containous/traefik/types"
Expand Down Expand Up @@ -228,36 +227,7 @@ func run(globalConfiguration *configuration.GlobalConfiguration) {

http.DefaultTransport.(*http.Transport).Proxy = http.ProxyFromEnvironment

if len(globalConfiguration.EntryPoints) == 0 {
globalConfiguration.EntryPoints = map[string]*configuration.EntryPoint{"http": {Address: ":80"}}
globalConfiguration.DefaultEntryPoints = []string{"http"}
}

if globalConfiguration.Rancher != nil {
// Ensure backwards compatibility for now
if len(globalConfiguration.Rancher.AccessKey) > 0 ||
len(globalConfiguration.Rancher.Endpoint) > 0 ||
len(globalConfiguration.Rancher.SecretKey) > 0 {

if globalConfiguration.Rancher.API == nil {
globalConfiguration.Rancher.API = &rancher.APIConfiguration{
AccessKey: globalConfiguration.Rancher.AccessKey,
SecretKey: globalConfiguration.Rancher.SecretKey,
Endpoint: globalConfiguration.Rancher.Endpoint,
}
}
log.Warn("Deprecated configuration found: rancher.[accesskey|secretkey|endpoint]. " +
"Please use rancher.api.[accesskey|secretkey|endpoint] instead.")
}

if globalConfiguration.Rancher.Metadata != nil && len(globalConfiguration.Rancher.Metadata.Prefix) == 0 {
globalConfiguration.Rancher.Metadata.Prefix = "latest"
}
}

if globalConfiguration.Debug {
globalConfiguration.LogLevel = "DEBUG"
}
globalConfiguration.SetEffectiveConfiguration()

// logging
level, err := logrus.ParseLevel(strings.ToLower(globalConfiguration.LogLevel))
Expand Down
61 changes: 60 additions & 1 deletion configuration/configuration.go
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@ import (

"github.com/containous/flaeg"
"github.com/containous/traefik/acme"
"github.com/containous/traefik/log"
"github.com/containous/traefik/provider/boltdb"
"github.com/containous/traefik/provider/consul"
"github.com/containous/traefik/provider/docker"
Expand All @@ -37,12 +38,17 @@ const (

// DefaultIdleTimeout before closing an idle connection.
DefaultIdleTimeout = 180 * time.Second

// DefaultGraceTimeout controls how long Traefik serves pending requests
// prior to shutting down.
DefaultGraceTimeout = 10 * time.Second
)

// GlobalConfiguration holds global configuration (with providers, etc.).
// It's populated from the traefik configuration file passed as an argument to the binary.
type GlobalConfiguration struct {
GraceTimeOut flaeg.Duration `short:"g" description:"Duration to give active requests a chance to finish before Traefik stops"`
LifeCycle *LifeCycle `description:"Timeouts influencing the server life cycle"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you expect other settings than timeouts inside of the LifeCycle struct? If not do we want to rename it to LifeCycleTimeouts for consistency reasons?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might, I don't know.

The likely more important point for me is that the other approach stutters: LifeCycleTimeouts.GraceTimeOut has one timeout too many, so it should either be removed from the struct name or the parameters IMHO. I decided for the former as that allows us to extend the struct in the future without the need for another name update.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point. SGTM.

GraceTimeOut flaeg.Duration `short:"g" description:"(Deprecated) Duration to give active requests a chance to finish before Traefik stops"` // Deprecated
Debug bool `short:"d" description:"Enable debug mode"`
CheckNewVersion bool `description:"Periodically check if a new version has been released"`
AccessLogsFile string `description:"(Deprecated) Access logs file"` // Deprecated
Expand Down Expand Up @@ -81,6 +87,52 @@ type GlobalConfiguration struct {
DynamoDB *dynamodb.Provider `description:"Enable DynamoDB backend with default settings"`
}

// SetEffectiveConfiguration adds missing configuration parameters derived from
// existing ones. It also takes care of maintaining backwards compatibility.
func (gc *GlobalConfiguration) SetEffectiveConfiguration() {
if len(gc.EntryPoints) == 0 {
gc.EntryPoints = map[string]*EntryPoint{"http": {Address: ":80"}}
gc.DefaultEntryPoints = []string{"http"}
}

// Make sure LifeCycle isn't nil to spare nil checks elsewhere.
if gc.LifeCycle == nil {
gc.LifeCycle = &LifeCycle{}
}

// Prefer legacy grace timeout parameter for backwards compatibility reasons.
if gc.GraceTimeOut > 0 {
log.Warn("top-level grace period configuration has been deprecated -- please use lifecycle grace period")
gc.LifeCycle.GraceTimeOut = gc.GraceTimeOut
}

if gc.Rancher != nil {
// Ensure backwards compatibility for now
if len(gc.Rancher.AccessKey) > 0 ||
len(gc.Rancher.Endpoint) > 0 ||
len(gc.Rancher.SecretKey) > 0 {

if gc.Rancher.API == nil {
gc.Rancher.API = &rancher.APIConfiguration{
AccessKey: gc.Rancher.AccessKey,
SecretKey: gc.Rancher.SecretKey,
Endpoint: gc.Rancher.Endpoint,
}
}
log.Warn("Deprecated configuration found: rancher.[accesskey|secretkey|endpoint]. " +
"Please use rancher.api.[accesskey|secretkey|endpoint] instead.")
}

if gc.Rancher.Metadata != nil && len(gc.Rancher.Metadata.Prefix) == 0 {
gc.Rancher.Metadata.Prefix = "latest"
}
}

if gc.Debug {
gc.LogLevel = "DEBUG"
}
}

// DefaultEntryPoints holds default entry points
type DefaultEntryPoints []string

Expand Down Expand Up @@ -446,3 +498,10 @@ type ForwardingTimeouts struct {
DialTimeout flaeg.Duration `description:"The amount of time to wait until a connection to a backend server can be established. Defaults to 30 seconds. If zero, no timeout exists"`
ResponseHeaderTimeout flaeg.Duration `description:"The amount of time to wait for a server's response headers after fully writing the request (including its body, if any). If zero, no timeout exists"`
}

// LifeCycle contains configurations relevant to the lifecycle (such as the
// shutdown phase) of Traefik.
type LifeCycle struct {
RequestAcceptGraceTimeout flaeg.Duration `description:"Duration to keep accepting requests before Traefik initiates the graceful shutdown procedure"`
GraceTimeOut flaeg.Duration `description:"Duration to give active requests a chance to finish before Traefik stops"`
}
50 changes: 50 additions & 0 deletions configuration/configuration_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,9 @@ package configuration
import (
"fmt"
"testing"
"time"

"github.com/containous/flaeg"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
Expand Down Expand Up @@ -199,3 +201,51 @@ func TestEntryPoints_Set(t *testing.T) {
})
}
}

func TestSetEffecticeConfiguration(t *testing.T) {
tests := []struct {
desc string
legacyGraceTimeout time.Duration
lifeCycleGraceTimeout time.Duration
wantGraceTimeout time.Duration
}{
{
desc: "legacy grace timeout given only",
legacyGraceTimeout: 5 * time.Second,
wantGraceTimeout: 5 * time.Second,
},
{
desc: "legacy and life cycle grace timeouts given",
legacyGraceTimeout: 5 * time.Second,
lifeCycleGraceTimeout: 12 * time.Second,
wantGraceTimeout: 5 * time.Second,
},
{
desc: "legacy grace timeout omitted",
legacyGraceTimeout: 0,
lifeCycleGraceTimeout: 12 * time.Second,
wantGraceTimeout: 12 * time.Second,
},
}

for _, test := range tests {
test := test
t.Run(test.desc, func(t *testing.T) {
t.Parallel()
gc := &GlobalConfiguration{
GraceTimeOut: flaeg.Duration(test.legacyGraceTimeout),
}
if test.lifeCycleGraceTimeout > 0 {
gc.LifeCycle = &LifeCycle{
GraceTimeOut: flaeg.Duration(test.lifeCycleGraceTimeout),
}
}

gc.SetEffectiveConfiguration()
gotGraceTimeout := time.Duration(gc.LifeCycle.GraceTimeOut)
if gotGraceTimeout != test.wantGraceTimeout {
t.Fatalf("got effective grace timeout %d, want %d", gotGraceTimeout, test.wantGraceTimeout)
}
})
}
}
42 changes: 40 additions & 2 deletions docs/configuration/commons.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,10 +3,16 @@
## Main Section

```toml
# Duration to give active requests a chance to finish before Traefik stops.
# DEPRECATED - for general usage instruction see [lifeCycle.graceTimeOut].
#
# If both the deprecated option and the new one are given, the deprecated one
# takes precedence.
# A value of zero is equivalent to omitting the parameter, causing
# [lifeCycle.graceTimeOut] to be effective. Pass zero to the new option in
# order to disable the grace period.
#
# Optional
# Default: "10s"
# Default: "0s"
#
# graceTimeOut = "10s"

Expand Down Expand Up @@ -303,6 +309,38 @@ Given provider-specific support, the value may be overridden on a per-backend ba
Can be provided in a format supported by [time.ParseDuration](https://golang.org/pkg/time/#ParseDuration) or as raw values (digits).
If no units are provided, the value is parsed assuming seconds.

## Life Cycle

Controls the behavior of Traefik during the shutdown phase.

```toml
[lifeCycle]

# Duration to keep accepting requests prior to initiating the graceful
# termination period (as defined by the `graceTimeOut` option). This
# option is meant to give downstream load-balancers sufficient time to
# take Traefik out of rotation.
# Can be provided in a format supported by [time.ParseDuration](https://golang.org/pkg/time/#ParseDuration) or as raw values (digits).
# If no units are provided, the value is parsed assuming seconds.
# The zero duration disables the request accepting grace period, i.e.,
# Traefik will immediately proceed to the grace period.
#
# Optional
# Default: 0
#
# requestAcceptGraceTimeout = "10s"

# Duration to give active requests a chance to finish before Traefik stops.
# Can be provided in a format supported by [time.ParseDuration](https://golang.org/pkg/time/#ParseDuration) or as raw values (digits).
# If no units are provided, the value is parsed assuming seconds.
# Note: in this time frame no new requests are accepted.
#
# Optional
# Default: "10s"
#
# graceTimeOut = "10s"
```

## Timeouts

### Responding Timeouts
Expand Down
61 changes: 61 additions & 0 deletions integration/basic_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,9 @@ package integration
import (
"fmt"
"net/http"
"os"
"strings"
"syscall"
"time"

"github.com/containous/traefik/integration/try"
Expand Down Expand Up @@ -101,3 +103,62 @@ func (s *SimpleSuite) TestPrintHelp(c *check.C) {
})
c.Assert(err, checker.IsNil)
}

func (s *SimpleSuite) TestRequestAcceptGraceTimeout(c *check.C) {
s.createComposeProject(c, "reqacceptgrace")
s.composeProject.Start(c)

whoami := "http://" + s.composeProject.Container(c, "whoami").NetworkSettings.IPAddress + ":80"

file := s.adaptFile(c, "fixtures/reqacceptgrace.toml", struct {
Server string
}{whoami})
defer os.Remove(file)
cmd, display := s.traefikCmd(withConfigFile(file))
defer display(c)
err := cmd.Start()
c.Assert(err, checker.IsNil)
defer cmd.Process.Kill()

// Wait for Traefik to turn ready.
err = try.GetRequest("http://127.0.0.1:8000/", 2*time.Second, try.StatusCodeIs(http.StatusNotFound))
c.Assert(err, checker.IsNil)

// Make sure exposed service is ready.
err = try.GetRequest("http://127.0.0.1:8000/service", 3*time.Second, try.StatusCodeIs(http.StatusOK))
c.Assert(err, checker.IsNil)

// Send SIGTERM to Traefik.
proc, err := os.FindProcess(cmd.Process.Pid)
c.Assert(err, checker.IsNil)
err = proc.Signal(syscall.SIGTERM)
c.Assert(err, checker.IsNil)

// Give Traefik time to process the SIGTERM and send a request half-way
// into the request accepting grace period, by which requests should
// still get served.
time.Sleep(5 * time.Second)
resp, err := http.Get("http://127.0.0.1:8000/service")
c.Assert(err, checker.IsNil)
defer resp.Body.Close()
c.Assert(resp.StatusCode, checker.Equals, http.StatusOK)

// Expect Traefik to shut down gracefully once the request accepting grace
// period has elapsed.
waitErr := make(chan error)
go func() {
waitErr <- cmd.Wait()
}()

select {
case err := <-waitErr:
c.Assert(err, checker.IsNil)
case <-time.After(10 * time.Second):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5s (sleep) + 10s (after) = 15s
reqAcceptGraceTimeOut (10s) + defaultGraceTimeout(10s) = "20s"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The grace period shouldn't be used: We send the last request to the server at t =~ 5 when we are halfway into the request accept grace timeout. It should finish at t = 5 + delta, so at t = 10 no request should be left that needs to get served.

I left an extra comment to explain the chosen timeout.

// By now we are ~5 seconds out of the request accepting grace period
// (start + 5 seconds sleep prior to the mid-grace period request +
// 10 seconds timeout = 15 seconds > 10 seconds grace period).
// Something must have gone wrong if we still haven't terminated at
// this point.
c.Fatal("Traefik did not terminate in time")
}
}
22 changes: 22 additions & 0 deletions integration/fixtures/reqacceptgrace.toml
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
defaultEntryPoints = ["http"]

logLevel = "DEBUG"

[entryPoints]
[entryPoints.http]
address = ":8000"

[lifeCycle]
requestAcceptGraceTimeout = "10s"

[file]
[backends]
[backends.backend]
[backends.backend.servers.server]
url = "{{.Server}}"

[frontends]
[frontends.frontend]
backend = "backend"
[frontends.frontend.routes.service]
rule = "Path:/service"
2 changes: 2 additions & 0 deletions integration/resources/compose/reqacceptgrace.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
whoami:
image: emilevauge/whoami