Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unique session ids #64

Merged
merged 3 commits into from
Nov 20, 2017
Merged

Unique session ids #64

merged 3 commits into from
Nov 20, 2017

Conversation

RalfJung
Copy link
Member

@RalfJung RalfJung commented Nov 9, 2017

Quoting the main commit message:

    L2TPv3 session IDs have to be unique on the entire system, not just per tunnel.
    The only reason that tunneldigger got away with using 1 for all sessions is that
    older Linux kernels failed to properly check for duplicate session IDs.  That
    got fixed by kernel commit dbdbc73b44782e22b3b4b6e8b51e7a3d245f3086.
    
    This patch adds unique session IDs to tunneldigger in a backwards-compatible
    way.  If both ends of the tunnel agree to use a unique session ID, they both
    will use the tunnel ID as the session ID.  To manage this mutual agreement, we
    introduce space for up to 32 feature flags.  Three messages are affected:
    
    CONTROL_TYPE_USAGE optionally takes 4 additional bytes after the padding, used
    to send the client's feature flags.  Brokers may skew the usage they report back
    depending on client features.  Old clients will not send these bytes, which is
    interpreted as all flags being 0.
    
    CONTROL_TYPE_PREPARE is extended the same way, also sending the client's feature
    flags, so that the server does not have to remember them.  These are the
    features that the client is offering to be using for this session.  Old clients
    will not send any flags, which tells the server that no features are supported.
    Old servers will ignore flags if they received them.
    
    CONTROL_TYPE_TUNNEL is extended the same way, reporting the features that are
    actually going to be used for this session.  Typically, this will be the
    client's features masked by what the server supports.  If a feature-aware client
    talks to an old server, the flags are going to be missing, telling the client
    that no features are going to be used in this session.
    
    The first feature flag is 'unique session IDs'.  If enabled, the session ID of
    each side of the connection is going to be the same as the tunnel ID.
    Furthermore, the broker gains an option to report full usage for clients not
    supporting unique session IDs, making sure they connect to other servers (if any
    are available).

I think this is ready to be reviewed. However, before merging, I'd like to

  • roll out an experimental firmware showing that the client stuff really works, and
  • at least try to figure out why we still have two clients connecting to the Gateway that now should report usage 0xFFFF.

Fixes #57

@RalfJung
Copy link
Member Author

RalfJung commented Nov 9, 2017

Oh, another thing I'd like to do: Have the client fail in context_setup_tunnel if the server returns a flag we do not know. That's clearly a non-conforming server which we do not want to have anything to do with.

@mitar
Copy link
Member

mitar commented Nov 9, 2017

A side note: were there any changes on port usage? Can now multiple tunnels be on the same port? Maybe now that they fixed IDs, they can also support that? Or maybe we could send the patch upstream. Currently we have to use NAT ugliness to map different internal ports to same outside port, but there is nothing in the specs which would really require this. Kernel should differentiate between packets based on the ID, not based on the port.

If they fixed this, we could remove NAT and make everything much simpler and also support IPv6.

@RalfJung
Copy link
Member Author

RalfJung commented Nov 9, 2017

I did not look into this at all. The old NAT hacks certainly still work.

Copy link
Member

@kostko kostko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good! I've added some minor comments.

@@ -21,6 +21,9 @@ namespace=default
connection_rate_limit=10
; Set PMTU to a fixed value. Use 0 for automatic PMTU discovery.
pmtu=0
; Whether this server runs a new kernel (4.13 and various stable series; in particular, 4.9.36) and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we perform some autodetection to determine this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we could -- but isn't that too fragile? I don't even have complete data which other kernels the "bad" patch was backported to.

Copy link
Member Author

@RalfJung RalfJung Nov 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, one thing we could do is detect the EEXIST on l2tpv3 session creation, and treat that as a hint that we need unique sessions IDs. That would mean one client has a failed connection attempt, and old clients take 30s to re-try, but the next time around they would see usage 0xFFFF.

Would you prefer that? I think I would.

# Verify cookie value.
timestamp = msg_data[:2]
timestamp = msg_data[offset:offset+2]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spaces around + operator.

signed_value = '%s%s%s' % (address[0], address[1], timestamp)
signature = hmac.HMAC(SECRET_KEY, signed_value, hashlib.sha1).digest()[:6]
timestamp = struct.unpack('!H', timestamp)[0]

if signature != msg_data[2:8] or abs(protocol_time() - timestamp) > 2:
# Reject message if more than 2 protocol ticks old. One tick is 1 >> 6 = 64 seconds.
if signature != msg_data[offset:offset+6] or abs(protocol_time() - timestamp) > 2:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spaces around + operator.

tunnel_manager = self.get_tunnel_manager()
client_features = 0
try:
client_features = struct.unpack('!I', msg_data[8:8+4])[0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spaces around + operator.

@@ -39,6 +39,10 @@
PMTU_PROBE_REPEATS = 4
PMTU_PROBE_COMBINATIONS = PMTU_PROBE_SIZE_COUNT * PMTU_PROBE_REPEATS

# Session feature flags
FEATURE_UNIQUE_SESSION_ID = 1 << 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these should be moved into protocol.py.

@RalfJung
Copy link
Member Author

try to figure out why we still have two clients connecting to the Gateway that now should report usage 0xFFFF.

I think I know what is happening. The problem is that these clients do not yet have b367a33. If they fail to get any reply from any broker, they will, by default, connect to the first one, which happens to be the one that already has the newer kernel.

L2TPv3 session IDs have to be unique on the entire system, not just per tunnel.
The only reason that tunneldigger got away with using 1 for all sessions is that
older Linux kernels failed to properly check for duplicate session IDs.  That
got fixed by kernel commit dbdbc73b44782e22b3b4b6e8b51e7a3d245f3086.

This patch adds unique session IDs to tunneldigger in a backwards-compatible
way.  If both ends of the tunnel agree to use a unique session ID, they both
will use the tunnel ID as the session ID.  To manage this mutual agreement, we
introduce space for up to 32 feature flags.  Three messages are affected:

CONTROL_TYPE_USAGE optionally takes 4 additional bytes after the padding, used
to send the client's feature flags.  Brokers may skew the usage they report back
depending on client features.  Old clients will not send these bytes, which is
interpreted as all flags being 0.

CONTROL_TYPE_PREPARE is extended the same way, also sending the client's feature
flags, so that the server does not have to remember them.  These are the
features that the client is offering to be using for this session.  Old clients
will not send any flags, which tells the server that no features are supported.
Old servers will ignore flags if they received them.

CONTROL_TYPE_TUNNEL is extended the same way, reporting the features that are
actually going to be used for this session.  Typically, this will be the
client's features masked by what the server supports.  If a feature-aware client
talks to an old server, the flags are going to be missing, telling the client
that no features are going to be used in this session.

The first feature flag is 'unique session IDs'.  If enabled, the session ID of
each side of the connection is going to be the same as the tunnel ID.
Furthermore, the broker gains an option to report full usage for clients not
supporting unique session IDs, making sure they connect to other servers (if any
are available).
…pporting unique session IDs that we are full
@RalfJung
Copy link
Member Author

I have an updated version of this that automatically requires unique session IDs upon the first EEXIST on session creation. However, I also rebased on top of #67 (as I want them both on our servers).

I will update this PR once #67 got merged.

@RalfJung
Copy link
Member Author

We now have 10 nodes running with the new client, and 3 of our 4 gateways running with the new broker, for 4 days. Everything seems to run smoothly. So as far as I am concerned, this is ready to be merged.

@RalfJung
Copy link
Member Author

Updated PR to contain the auto-detection, and rebased on top of master.

@kostko kostko merged commit 0a4b330 into wlanslovenija:master Nov 20, 2017
@kostko
Copy link
Member

kostko commented Nov 20, 2017

Thanks!

@RalfJung RalfJung deleted the unique-session-ids branch November 25, 2017 09:45
@RalfJung RalfJung mentioned this pull request Jan 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Recent kernels enforce system-wide unique L2TPv3 session IDs
3 participants