-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Only create Connection object for active TChannel connections #587
Conversation
e8cfc7f
to
a16b9cb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@prashantv spent over 1 hour on this, can't find any issues so far.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i've read through this three times and don't have any feedback. canaries & crossdock should catch any interop issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only blocking comment is about removing the exported error. The remainder are nits and questions.
What's our rollout strategy for this change? I'd prefer to deploy it somewhere before we include it in a new release...
@@ -100,19 +96,11 @@ var ( | |||
// on a closed connection | |||
ErrConnectionClosed = errors.New("connection is closed") | |||
|
|||
// ErrConnectionNotReady is returned when a caller attempts to send a | |||
// request through a connection which has not yet been initialized | |||
ErrConnectionNotReady = errors.New("connection is not yet ready") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing this is technically a breaking change. Can we keep it and just document that it's now unused?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, good point.
opts := ch.connectionOptions.withDefaults() | ||
|
||
connID := _nextConnID.Inc() | ||
log := ch.log.WithFields(LogFields{ | ||
{"connID", connID}, | ||
{"localAddr", conn.LocalAddr()}, | ||
{"remoteAddr", conn.RemoteAddr()}, | ||
{"remoteHostPort", remotePeer.HostPort}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a better name we can choose for this field? To me, remoteHostPort
isn't obviously different from remoteAddr
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could come up with a different name, but this is just a refactoring change, and this was the name of the field previously, so I'd like to avoid any renamings:
https://github.com/uber/tchannel-go/blob/dev/connection.go#L1034
@@ -191,12 +191,12 @@ func TestPeerRemoveClosedConnection(t *testing.T) { | |||
|
|||
func TestPeerConnectCancelled(t *testing.T) { | |||
WithVerifiedServer(t, nil, func(ch *Channel, hostPort string) { | |||
ctx, cancel := NewContext(109 * time.Millisecond) | |||
ctx, cancel := NewContext(100 * time.Millisecond) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming that the original 109ms timeout was a typo...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what I'm assuming too, I'm pretty sure the test didn't need that extra 9ms!
cancel() | ||
|
||
_, err := ch.Connect(ctx, "10.255.255.1:1") | ||
require.Error(t, err, "Connect should fail") | ||
assert.EqualError(t, ErrRequestCancelled, err.Error(), "Unknown error") | ||
assert.EqualError(t, err, ErrRequestCancelled.Error(), "Unexpected error") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:( I hate this package sometimes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, consistency is too hard :(
peer_test.go
Outdated
assert.EqualValues(t, 5, count.Load(), | ||
"Expect 5 exchange updates: peer add, init mex, new conn, ping, pong") | ||
assert.EqualValues(t, 4, count.Load(), | ||
"Expect 5 exchange updates: peer add, new conn, ping, pong") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Update error message to 4.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
preinit_connection.go
Outdated
func (ch *Channel) initError(c net.Conn, connDir connectionDirection, id uint32, err error) error { | ||
ch.log.WithFields( | ||
LogField{"connectionDirection", connDir}, | ||
LogField{"error", err.Error()}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ErrField
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally forgot that existed, thanks!
func setInitDeadline(ctx context.Context, c net.Conn) func() { | ||
deadline, ok := ctx.Deadline() | ||
if !ok { | ||
deadline = time.Now().Add(5 * time.Second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a very, very long time...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah :( Unfortunately tests start failing with 1 second, and I think the previous code didn't even have a timeout
deadline = time.Now().Add(5 * time.Second) | ||
} | ||
|
||
c.SetDeadline(deadline) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we always need to do this, or only if we're in the !ok
block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have a deadline in the context, we use that, otherwise we use 5 seconds as a default timeout for inits.
We always want to set a deadline on the connection though to avoid connection exchanges that never end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad - I mixed up the context and the connection.
Currently, we create a Connection object as soon as we have a TCP connection, and then we try to add more information (such as remote peer information) to the logging context after the init handshake. This can lead to data races since we're modifying the logging context while other goroutines may be logging messages with the same context. Instead of using locks for the logging context, instead separate the handshake phase of connections out of Connection. Fixes #584
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the thorough reviews everyone.
In terms of rollout strategy, we can cut a canary of our internal proxy with this change (against dev
), as well as our test services, and if things look good with those, we can cut the release.
@@ -100,19 +96,11 @@ var ( | |||
// on a closed connection | |||
ErrConnectionClosed = errors.New("connection is closed") | |||
|
|||
// ErrConnectionNotReady is returned when a caller attempts to send a | |||
// request through a connection which has not yet been initialized | |||
ErrConnectionNotReady = errors.New("connection is not yet ready") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, good point.
opts := ch.connectionOptions.withDefaults() | ||
|
||
connID := _nextConnID.Inc() | ||
log := ch.log.WithFields(LogFields{ | ||
{"connID", connID}, | ||
{"localAddr", conn.LocalAddr()}, | ||
{"remoteAddr", conn.RemoteAddr()}, | ||
{"remoteHostPort", remotePeer.HostPort}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could come up with a different name, but this is just a refactoring change, and this was the name of the field previously, so I'd like to avoid any renamings:
https://github.com/uber/tchannel-go/blob/dev/connection.go#L1034
@@ -191,12 +191,12 @@ func TestPeerRemoveClosedConnection(t *testing.T) { | |||
|
|||
func TestPeerConnectCancelled(t *testing.T) { | |||
WithVerifiedServer(t, nil, func(ch *Channel, hostPort string) { | |||
ctx, cancel := NewContext(109 * time.Millisecond) | |||
ctx, cancel := NewContext(100 * time.Millisecond) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what I'm assuming too, I'm pretty sure the test didn't need that extra 9ms!
cancel() | ||
|
||
_, err := ch.Connect(ctx, "10.255.255.1:1") | ||
require.Error(t, err, "Connect should fail") | ||
assert.EqualError(t, ErrRequestCancelled, err.Error(), "Unknown error") | ||
assert.EqualError(t, err, ErrRequestCancelled.Error(), "Unexpected error") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, consistency is too hard :(
peer_test.go
Outdated
assert.EqualValues(t, 5, count.Load(), | ||
"Expect 5 exchange updates: peer add, init mex, new conn, ping, pong") | ||
assert.EqualValues(t, 4, count.Load(), | ||
"Expect 5 exchange updates: peer add, new conn, ping, pong") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
preinit_connection.go
Outdated
func (ch *Channel) initError(c net.Conn, connDir connectionDirection, id uint32, err error) error { | ||
ch.log.WithFields( | ||
LogField{"connectionDirection", connDir}, | ||
LogField{"error", err.Error()}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally forgot that existed, thanks!
func setInitDeadline(ctx context.Context, c net.Conn) func() { | ||
deadline, ok := ctx.Deadline() | ||
if !ok { | ||
deadline = time.Now().Add(5 * time.Second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah :( Unfortunately tests start failing with 1 second, and I think the previous code didn't even have a timeout
deadline = time.Now().Add(5 * time.Second) | ||
} | ||
|
||
c.SetDeadline(deadline) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have a deadline in the context, we use that, otherwise we use 5 seconds as a default timeout for inits.
We always want to set a deadline on the connection though to avoid connection exchanges that never end
deadline = time.Now().Add(5 * time.Second) | ||
} | ||
|
||
c.SetDeadline(deadline) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad - I mixed up the context and the connection.
👍 |
As part of #587, we no longer use the connection's `nextMessageID` field for init req messages, and so a new outbound connection would reuse id 1 for the first ping/call.
As part of #587, we no longer use the connection's `nextMessageID` field for init req messages, and so a new outbound connection would reuse id 1 for the first ping/call.
Currently, we create a Connection object as soon as we have a TCP
connection, and then we try to add more information (such as remote peer
information) to the logging context after the init handshake. This can
lead to data races since we're modifying the logging context while
other goroutines may be logging messages with the same context.
Instead of using locks for the logging context, instead separate the
handshake phase of connections out of Connection.
Fixes #584