Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only create Connection object for active TChannel connections #587

Merged
merged 2 commits into from
Feb 24, 2017

Conversation

prashantv
Copy link
Contributor

Currently, we create a Connection object as soon as we have a TCP
connection, and then we try to add more information (such as remote peer
information) to the logging context after the init handshake. This can
lead to data races since we're modifying the logging context while
other goroutines may be logging messages with the same context.

Instead of using locks for the logging context, instead separate the
handshake phase of connections out of Connection.

Fixes #584

Copy link
Contributor

@alxn alxn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prashantv spent over 1 hour on this, can't find any issues so far.

Copy link
Contributor

@billf billf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i've read through this three times and don't have any feedback. canaries & crossdock should catch any interop issues.

Copy link
Contributor

@akshayjshah akshayjshah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only blocking comment is about removing the exported error. The remainder are nits and questions.

What's our rollout strategy for this change? I'd prefer to deploy it somewhere before we include it in a new release...

@@ -100,19 +96,11 @@ var (
// on a closed connection
ErrConnectionClosed = errors.New("connection is closed")

// ErrConnectionNotReady is returned when a caller attempts to send a
// request through a connection which has not yet been initialized
ErrConnectionNotReady = errors.New("connection is not yet ready")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this is technically a breaking change. Can we keep it and just document that it's now unused?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, good point.

opts := ch.connectionOptions.withDefaults()

connID := _nextConnID.Inc()
log := ch.log.WithFields(LogFields{
{"connID", connID},
{"localAddr", conn.LocalAddr()},
{"remoteAddr", conn.RemoteAddr()},
{"remoteHostPort", remotePeer.HostPort},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a better name we can choose for this field? To me, remoteHostPort isn't obviously different from remoteAddr.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could come up with a different name, but this is just a refactoring change, and this was the name of the field previously, so I'd like to avoid any renamings:
https://github.com/uber/tchannel-go/blob/dev/connection.go#L1034

@@ -191,12 +191,12 @@ func TestPeerRemoveClosedConnection(t *testing.T) {

func TestPeerConnectCancelled(t *testing.T) {
WithVerifiedServer(t, nil, func(ch *Channel, hostPort string) {
ctx, cancel := NewContext(109 * time.Millisecond)
ctx, cancel := NewContext(100 * time.Millisecond)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming that the original 109ms timeout was a typo...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I'm assuming too, I'm pretty sure the test didn't need that extra 9ms!

cancel()

_, err := ch.Connect(ctx, "10.255.255.1:1")
require.Error(t, err, "Connect should fail")
assert.EqualError(t, ErrRequestCancelled, err.Error(), "Unknown error")
assert.EqualError(t, err, ErrRequestCancelled.Error(), "Unexpected error")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:( I hate this package sometimes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, consistency is too hard :(

peer_test.go Outdated
assert.EqualValues(t, 5, count.Load(),
"Expect 5 exchange updates: peer add, init mex, new conn, ping, pong")
assert.EqualValues(t, 4, count.Load(),
"Expect 5 exchange updates: peer add, new conn, ping, pong")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Update error message to 4.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

func (ch *Channel) initError(c net.Conn, connDir connectionDirection, id uint32, err error) error {
ch.log.WithFields(
LogField{"connectionDirection", connDir},
LogField{"error", err.Error()},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ErrField?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally forgot that existed, thanks!

func setInitDeadline(ctx context.Context, c net.Conn) func() {
deadline, ok := ctx.Deadline()
if !ok {
deadline = time.Now().Add(5 * time.Second)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a very, very long time...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah :( Unfortunately tests start failing with 1 second, and I think the previous code didn't even have a timeout

deadline = time.Now().Add(5 * time.Second)
}

c.SetDeadline(deadline)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we always need to do this, or only if we're in the !ok block?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have a deadline in the context, we use that, otherwise we use 5 seconds as a default timeout for inits.

We always want to set a deadline on the connection though to avoid connection exchanges that never end

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad - I mixed up the context and the connection.

Currently, we create a Connection object as soon as we have a TCP
connection, and then we try to add more information (such as remote peer
information) to the logging context after the init handshake. This can
lead to data races since we're modifying the logging context while
other goroutines may be logging messages with the same context.

Instead of using locks for the logging context, instead separate the
handshake phase of connections out of Connection.

Fixes #584
Copy link
Contributor Author

@prashantv prashantv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the thorough reviews everyone.

In terms of rollout strategy, we can cut a canary of our internal proxy with this change (against dev), as well as our test services, and if things look good with those, we can cut the release.

@@ -100,19 +96,11 @@ var (
// on a closed connection
ErrConnectionClosed = errors.New("connection is closed")

// ErrConnectionNotReady is returned when a caller attempts to send a
// request through a connection which has not yet been initialized
ErrConnectionNotReady = errors.New("connection is not yet ready")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, good point.

opts := ch.connectionOptions.withDefaults()

connID := _nextConnID.Inc()
log := ch.log.WithFields(LogFields{
{"connID", connID},
{"localAddr", conn.LocalAddr()},
{"remoteAddr", conn.RemoteAddr()},
{"remoteHostPort", remotePeer.HostPort},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could come up with a different name, but this is just a refactoring change, and this was the name of the field previously, so I'd like to avoid any renamings:
https://github.com/uber/tchannel-go/blob/dev/connection.go#L1034

@@ -191,12 +191,12 @@ func TestPeerRemoveClosedConnection(t *testing.T) {

func TestPeerConnectCancelled(t *testing.T) {
WithVerifiedServer(t, nil, func(ch *Channel, hostPort string) {
ctx, cancel := NewContext(109 * time.Millisecond)
ctx, cancel := NewContext(100 * time.Millisecond)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I'm assuming too, I'm pretty sure the test didn't need that extra 9ms!

cancel()

_, err := ch.Connect(ctx, "10.255.255.1:1")
require.Error(t, err, "Connect should fail")
assert.EqualError(t, ErrRequestCancelled, err.Error(), "Unknown error")
assert.EqualError(t, err, ErrRequestCancelled.Error(), "Unexpected error")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, consistency is too hard :(

peer_test.go Outdated
assert.EqualValues(t, 5, count.Load(),
"Expect 5 exchange updates: peer add, init mex, new conn, ping, pong")
assert.EqualValues(t, 4, count.Load(),
"Expect 5 exchange updates: peer add, new conn, ping, pong")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

func (ch *Channel) initError(c net.Conn, connDir connectionDirection, id uint32, err error) error {
ch.log.WithFields(
LogField{"connectionDirection", connDir},
LogField{"error", err.Error()},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally forgot that existed, thanks!

func setInitDeadline(ctx context.Context, c net.Conn) func() {
deadline, ok := ctx.Deadline()
if !ok {
deadline = time.Now().Add(5 * time.Second)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah :( Unfortunately tests start failing with 1 second, and I think the previous code didn't even have a timeout

deadline = time.Now().Add(5 * time.Second)
}

c.SetDeadline(deadline)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have a deadline in the context, we use that, otherwise we use 5 seconds as a default timeout for inits.

We always want to set a deadline on the connection though to avoid connection exchanges that never end

deadline = time.Now().Add(5 * time.Second)
}

c.SetDeadline(deadline)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad - I mixed up the context and the connection.

@prashantv prashantv merged commit 3abb4c0 into dev Feb 24, 2017
@prashantv prashantv deleted the pre_conn branch February 24, 2017 23:28
@alxn
Copy link
Contributor

alxn commented Feb 24, 2017

👍

prashantv added a commit that referenced this pull request Feb 27, 2017
As part of #587, we no longer use the connection's `nextMessageID` field
for init req messages, and so a new outbound connection would reuse id 1
for the first ping/call.
prashantv added a commit that referenced this pull request Feb 28, 2017
As part of #587, we no longer use the connection's `nextMessageID` field
for init req messages, and so a new outbound connection would reuse id 1
for the first ping/call.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants