Anti replay #1005

Closed
wants to merge 6 commits into from
Changes from 4 commits
265 changes: 203 additions & 62 deletions draft-ietf-tls-tls13.md
@@ -489,6 +489,14 @@ informative:
author:
-
ins: H. Krawczyk
Mac17:
title: "Security Review of TLS1.3 0-RTT"
date: 2017
target: https://github.com/tlswg/tls13-spec/issues/1001
author:
-
ins: C. MacCarthaigh
Contributor

Technically it's "MacCárthaigh", but maybe RFCs have to be ascii. And now my first comment gets to be super vain! Oh man.

Contributor Author

Correct, ASCII only. Feel free to supply some other flattening :)

Contributor

We're almost able to put people's names in RFCs, but the road is a long one (for reasons that I won't burden you with).

Contributor

Ah, the 8th fallacy of naming! (Not all names are representable in Unicode.) ((Number probably wrong; I made it up.))



--- abstract

@@ -600,6 +608,14 @@ RFC EDITOR PLEASE DELETE THIS SECTION.
(*) indicates changes to the wire protocol which may require implementations
to update.

draft-21

- Add a per-ticket nonce so that each ticket is associated with a
different PSK (*).

- Add discussion of 0-RTT and replay. Recommend that implementations
implement some anti-replay mechanism.

draft-20

- Add "post_handshake_auth" extension to negotiate post-handshake authentication
@@ -1370,7 +1386,7 @@ keys derived using the offered PSK.
Unless the server takes special measures outside those provided by TLS,
the server has no guarantee that the same
0-RTT data was not transmitted on multiple 0-RTT connections
(see {{replay-time}} for more details).
(see {{replay-time}} and {{replay-0rtt}} for more details).
This is especially relevant if the data is authenticated either
with TLS client authentication or inside the application layer
protocol. However, 0-RTT data cannot be duplicated within a connection (i.e., the server
@@ -2938,62 +2954,6 @@ servers MUST process the client's ClientHello and then immediately
send the ServerHello, rather than waiting for the client's
EndOfEarlyData message.

#### Replay Properties {#replay-time}

As noted in {{zero-rtt-data}}, TLS provides a limited mechanism for
replay protection for data sent by the client in the first flight.
This mechanism is intended to ensure that attackers cannot replay
ClientHello messages at a time substantially after the original
ClientHello was sent.

To properly validate the ticket age, a server needs to store
the following values, either locally or by encoding them in
the ticket:

- The time that the server generated the session ticket.
- The estimated round trip time between the client and server;
this can be estimated by measuring the time between sending
the Finished message and receiving the first message in the
client's second flight, or potentially using information
from the operating system.
- The "ticket_age_add" parameter from the NewSessionTicket message in
which the ticket was established.

The server can determine the client's view of the age of the ticket by
subtracting the ticket's "ticket_age_add" value from the
"obfuscated_ticket_age" parameter in the client's "pre_shared_key"
extension. The server can independently determine its view of the
age of the ticket by subtracting the time the ticket was issued
from the current time. If the client and server clocks were running
at the same rate, the client's view of the ticket age would be shorter than the
actual time elapsed on the server by a single round trip time. This
difference is comprised of the delay in sending the NewSessionTicket
message to the client, plus the time taken to send the ClientHello to
the server.

The mismatch between the client's and server's views of age is thus
given by:

~~~~
mismatch = (client's view + RTT estimate) - (server's view)
~~~~

There are several potential sources of error that make an exact
measurement of time difficult. Variations in client and server clock
rates are likely to be minimal, though potentially with gross time
corrections. Network propagation delays are the most likely causes of
a mismatch in legitimate values for elapsed time. Both the
NewSessionTicket and ClientHello messages might be retransmitted and
therefore delayed, which might be hidden by TCP. For browser clients
on the Internet, this implies that an
allowance on the order of ten seconds to account for errors in clocks and
variations in measurements is advisable; other deployment scenarios
may have different needs. Outside the selected range, the
server SHOULD reject early data and fall back to a full 1-RTT
handshake. Clock skew distributions are not
symmetric, so the optimal tradeoff may involve an asymmetric range
of permissible mismatch values.

## Server Parameters

The next two messages from the server, EncryptedExtensions and
@@ -3588,6 +3548,127 @@ appropriate application traffic key as described in {{updating-traffic-keys}}.
In particular, this includes any alerts sent by the
server in response to client Certificate and CertificateVerify messages.

## 0-RTT and Anti-Replay {#replay-time}

Not that it matters much, but should this section be labeled {#anti-replay} or something else instead? The old {#replay-time} section was named that when it only described the stateless time-mismatch mechanism.


As noted in {{zero-rtt-data}}, unlike 1-RTT data, TLS does not provide
inherent replay protection for 0-RTT data. Instead, it provides
mechanisms that allow a server to implement a number of limited
server-side anti-replay defenses. Servers SHOULD implement
either Single-Use Tickets {{single-use-tickets}} or
Client Hello Recording {{client-hello-recording}}
as described below; servers that implement neither SHOULD implement
the stateless mechanism described in {{stateless-anti-replay}}.
See {{replay-0rtt}} for more information on the limitations
of these mechanisms.

Perhaps mention that clients and application protocols must assume in their security profile (and in-terms of what they are willing to send via 0-RTT) that servers are only implementing stateless-anti-replay? (When the "server" spans multiple clusters with non-trivial latency between them, stateless-anti-replay is the only one that works and it has the weakest guarantees.)

(This is mentioned below, but repeating it here as well may be worthwhile.)

Contributor

It's also possible to use an instance of Client Hello Recording per cluster in a distributed setup, effectively reducing the # of possible replay for a "Large Number" to 1 per cluster.

Contributor

Aren't the three alternatives in the SHOULD all equal?

Servers need not permit 0-RTT at all, but those which do SHOULD implement either Single-Use Tickets {{single-use-tickets}}, Client Hello Recording {{client-hello-recording}}, or the stateless mechanism described in {{stateless-anti-replay}}.

Contributor Author

This is a WG decision ultimately, but IMO they are not. the first two clearly are stronger.

Contributor

I'd go further: the third method doesn't work - it doesn't prevent replays. It permits thousands to billions of replays, depending on the amount of bandwidth and hosts available, and it doesn't mitigate several of the attacks. It should be taken out - it's insecure.

Contributor

Here I would argue for something stronger:

"0-RTT server implementations that must interoperate with third party systems and applications MUST implement a robust anti-replay mechanism".

My reasoning here is that a CDN or TLS accelerator that enables 0-RTT without robust anti-replay will break other downstream systems (for example, upstream 0-RTT leading to throttle exhaustion downstream). I want clear and strong language so that there can be no ambiguity when a CVE is requested against the upstream component. It's not ok imo to break a basic assumption about the internet like that.


### Single-Use Tickets

The simplest form of anti-replay defense is for the server to allow
each session ticket to be used only once. To implement this, the server
maintains a database of all outstanding valid tickets, deleting each
ticket from the database as it is used. If an unknown ticket is
provided, the server falls back to a full handshake as normal.
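
A minimal sketch of the single-use check, assuming tickets are random database keys and an in-memory Python dict stands in for the shared session database. `SingleUseTicketStore`, `handle_resumption`, and the outcome strings are hypothetical names for illustration. Deleting the entry in the same step that looks it up is what sends a replayed ticket to the full-handshake path; a multi-node deployment would need that lookup-and-delete to be atomic across servers, which this sketch does not attempt.

```python
import secrets
from typing import Dict, Optional


class SingleUseTicketStore:
    """Hypothetical in-memory stand-in for the shared ticket database."""

    def __init__(self) -> None:
        # Maps each outstanding ticket (a random database key) to its PSK.
        self._outstanding: Dict[bytes, bytes] = {}

    def issue(self, psk: bytes) -> bytes:
        # The ticket is an opaque random label, not a self-contained blob.
        ticket = secrets.token_bytes(16)
        self._outstanding[ticket] = psk
        return ticket

    def redeem(self, ticket: bytes) -> Optional[bytes]:
        # pop() deletes the ticket as it is used, so it can succeed only once.
        return self._outstanding.pop(ticket, None)


def handle_resumption(store: SingleUseTicketStore, ticket: bytes) -> str:
    """Hypothetical server decision for a ClientHello offering `ticket`."""
    psk = store.redeem(ticket)
    if psk is None:
        # Unknown or already-used ticket: fall back to a full handshake.
        return "full_handshake"
    return "resume_with_0rtt"
```

Presenting the same ticket twice yields "resume_with_0rtt" and then "full_handshake".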

If the tickets are not self-contained but rather are database keys,
and these PSKs are deleted upon use, then connections established
using one PSK enjoy forward security with respect to other PSKs
established on the same connection. This is a security advantage for
all 0-RTT data and for PSK usage when PSK is used without DH.
Contributor

I'm not sure this part really fits in the current exposition as-is. (Also, doesn't the addition of a per-ticket nonce into the PSK ticket derivation give self-contained tickets the same forward secrecy property?)

Contributor Author

No, because compromise of the STEK leads to compromise of all tickets.

Contributor

Oh, database-key tickets certainly have better forward security properties than self-contained tickets. I'm just not sure about what the "with respect to other PSKs established on the same connection" means.

Contributor Author

I'll see what I can do to rewrite it.


Because this mechanism requires sharing the session database between
server nodes, it may be hard to achieve high rates of PSK and
Contributor

Minor mistake: "and and"

Contributor

"between server nodes in environments with multiple servers acting as endpoints for the same service"

0-RTT success when compared with self-encrypted tickets which do not
require consistent server-side storage for basic functionality but
only for 0-RTT anti-replay.

Perhaps replace but only for 0-RTT anti-replay with , but may require such storage for 0-RTT anti-replay or something else that makes the sense more clear.

Contributor

There should probably be some more clarification that 0-RTT success and PSK success are partially independent, and that tickets can still be used for PSK even if the single-use property cannot be guaranteed; that is, PSK can succeed even in cases where 0-RTT must be rejected for safety. (Unless I misunderstand?)



### Client Hello Recording

An alternative form of anti-replay is to record each ClientHello or a
unique value derived from the ClientHello and reject
duplicates. However, recording all ClientHellos causes state to grow
without bound, so in practice the server must instead record
ClientHellos within a given time window based on the
"obfuscated_ticket_age" value provided by the client.
Contributor

I would instead say "Recording all ClientHellos causes state to grow without bound. A server can instead record ClientHellos within a given time window and use the "obfuscated_ticket_age" to ensure that tickets aren't reused outside that window."

Contributor

MT's version is better, yes.


In order to implement this mechanism, a server needs to store the
following values, either locally or by encoding them in the ticket:

- The time that the server generated the session ticket.
- The estimated round trip time between the client and server;
this can be estimated by measuring the time between sending
the Finished message and receiving the first message in the
client's second flight, or potentially using information
from the operating system.
Contributor

I think lines 3600 - 3604 are probably impractical. If folks do want to use STEK-encrypted tickets for global resumption, then the RTT to one location will be very different than another. Even absent that, RTTs can vary quite a lot for Mobile users. As an implementor, I'd just use a global tolerance value (like 500ms or something).

Contributor

I think it would make more sense to just say the server should store one thing:

  • The time that the server generated the session ticket, offset by an estimate of the round trip time between client and server.

That also makes it more clear that only information in the ticket can be used for these calculations (any RTT information in the resumption handshake can not be trusted).

- The "ticket_age_add" parameter from the NewSessionTicket message in
which the ticket was established.

The server can determine the client's view of the age of the ticket by
subtracting the ticket's "ticket_age_add" value from the
"obfuscated_ticket_age" parameter in the client's "pre_shared_key"
extension. The server can determine the approximate time that the
client sent the ClientHello as:

~~~~
creation time + (client's view + RTT estimate/2)
Contributor

I don't think the division by two is necessary. The ticket gets delayed by half an RTT on the way to the client in the first place, and half an RTT on the way back to the server. So it nets out to one RTT of difference. Means we also needn't worry about any asymmetry between the two directions.

Contributor Author

I thought that initially, but I believe that's wrong, because we're interested in the client's claimed sending time.

Consider the case where clocks are globally synchronized and we have a 200 ms RTT.

  • We send NST at 0
  • The client receives NST at 100ms
  • The client waits 1000ms and then sends CH (at 1100)
  • The server receives CH at 1200

Now, if the client had put the absolute time in CH, it would have been 1100, but it puts in relative time so that's 1000. When we add 1/2 RTT, we get 1100. If we were to add RTT, we would get 1200, which is wrong, because that's when the server got it.

This is different from below where we are interested in the mismatch.

Contributor

Oh you're right! Wait a second, no. I'm still wrong. But you're not totally right either. If we want to get very pedantic about it (pedantic on a crypto spec? no!) , all we know is that the client is some portion of the RTT behind. We don't know that it's 1/2. The RTT might be asymmetric. Imagine it's 20ms server -> client, but 180ms client to server.

  • We send NST at 0
  • The client receives NST at 20ms
  • The client waits 1000ms and then sends CH (at 1020)
  • The server receives at 1200

Contributor Author

Oh, totally. It's just the best we can do.

Remember that we're not trusting this value, we're just using it to have the best chance of getting the time within the window we are saving CH for.

Contributor Author

Hmm... Well, we are sort of trusting this, because we are using it to distinguish between hard fail and forced 1-RTT. OTOH, the attacker can always force us into that posture by just delaying the packet until it's out of window, so I don't think that it's an issue. But it might be easiest to just use "issue time + obfuscated ticket age"....

Contributor

Sorry @ekr , the comments are so deep I got lost on which section we're in. I thought this was about the stateless anti-replay.

It may be that you always want to force 1-RTT and never hard-fail on detecting replays. There may be some legitimate scenarios to receive a replay. For example, TCP FastOpen (or QUIC) combined with 0RTT and packet duplication in the network. Hard-fail could result in some weird race conditions here. Having hard-fail (fatal alert, I assume?) as distinct from forcing 1-RTT could also just give an attacker more information.

Contributor

TCP FO is still TCP - a duplicate packet will be rejected by the TCP state machine and shouldn't make it as far as TLS.

Contributor

I tend to agree with Nygren about not giving the attacker more information -- that is, don't hard-fail and just fall back to 1-RTT [unless you're under attack and need to shed load]. The 1-RTT will fail for the attacker's replays, of course.

Contributor Author

  1. I'm persuaded by knekrit's argument
  2. MT, cute trick, but it doesn't work if you want the window to be on arrival time, so only for the stateless technique, not strike registers.
  3. enygren: I was more ambivalent about whether to abort in this round

~~~~
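
A sketch of this estimate, assuming ticket ages and "ticket_age_add" are 32-bit millisecond values and that the server kept the ticket creation time and an RTT estimate in the ticket. All function names are hypothetical. The sketch adds half the RTT estimate to the client's view, which reproduces the worked example in the review thread above: ticket issued at 0 ms, 200 ms RTT, client waits 1000 ms, so the estimated send time is 1100 ms.

```python
UINT32_MOD = 2 ** 32


def obfuscate_ticket_age(ticket_age_ms: int, ticket_age_add: int) -> int:
    # Client side: add ticket_age_add modulo 2^32 before sending.
    return (ticket_age_ms + ticket_age_add) % UINT32_MOD


def clients_view_of_age(obfuscated_ticket_age: int, ticket_age_add: int) -> int:
    # Server side: subtract ticket_age_add modulo 2^32 to undo the obfuscation.
    return (obfuscated_ticket_age - ticket_age_add) % UINT32_MOD


def estimated_client_send_time(creation_time_ms: int, rtt_estimate_ms: int,
                               obfuscated_ticket_age: int,
                               ticket_age_add: int) -> float:
    # Server-clock estimate of when the ClientHello was sent: the ticket spent
    # roughly half an RTT in flight before the client's age counter started.
    client_view = clients_view_of_age(obfuscated_ticket_age, ticket_age_add)
    return creation_time_ms + client_view + rtt_estimate_ms / 2


# Worked example from the thread: NST at 0 ms, 200 ms RTT, client waits 1000 ms.
age_add = 0xFFFFFF00  # large value so the 32-bit addition wraps around
obf = obfuscate_ticket_age(1000, age_add)
print(estimated_client_send_time(0, 200, obf, age_add))  # 1100.0
```

The modular subtraction matters: with a large "ticket_age_add" the transmitted value wraps, and plain subtraction would produce a negative age.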

For a given storage window, the server implements anti-replay as
Contributor

Do we need to introduce the phrase "storage window" somehow (or reword)?

follows:

1. If the creation time is outside of the window, then accept the PSK but
reject 0-RTT.

2. If the ClientHello matches an existing ClientHello, then
abort the handshake using an "illegal_parameter" alert
(this should never happen in a functional system).
Contributor

To make this more efficient implementations can use some hashing/bloom filter rather than storing the entire client hello. This will have false-positives, in which case zero RTT data should just be rejected, not the connection aborted.

Contributor

Ahh hipster crypto to the rescue.

Efficient storage isn't so much the problem as global synchronization within reasonable time frames.

Contributor

Sure the entire client hello can be stored, but why not do it more efficiently?

I don't think this state is global (ie one client hello cache per cluster).


3. Otherwise, store the ClientHello during the window
and accept 0-RTT.
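
The three steps can be sketched as a single-node strike register. This assumes the storage window is a symmetric tolerance around the server's current time and that `replay_key` is some value derived from the ClientHello (the thread that follows suggests the PSK binder or ClientHello.random). `ClientHelloRecorder`, its `check` method, and the outcome strings are hypothetical. Each entry is kept only until its estimated time leaves the window, which is what bounds the state. The sketch follows the text as written and aborts on a duplicate, although the discussion argues for falling back to 1-RTT instead.

```python
from typing import Dict


class ClientHelloRecorder:
    """Hypothetical single-node strike register for 0-RTT anti-replay."""

    def __init__(self, window_ms: int) -> None:
        self.window_ms = window_ms
        # replay_key -> time (ms) after which the entry can be forgotten
        self._seen: Dict[bytes, float] = {}

    def _purge(self, now_ms: float) -> None:
        # An entry past its expiry would fail step 1 anyway, so drop it.
        self._seen = {k: exp for k, exp in self._seen.items() if exp >= now_ms}

    def check(self, replay_key: bytes, estimated_time_ms: float,
              now_ms: float) -> str:
        self._purge(now_ms)
        # Step 1: outside the storage window -> keep the PSK, reject 0-RTT.
        if abs(now_ms - estimated_time_ms) > self.window_ms:
            return "accept_psk_reject_0rtt"
        # Step 2: duplicate inside the window -> abort ("illegal_parameter").
        if replay_key in self._seen:
            return "abort_illegal_parameter"
        # Step 3: record it until its estimated time leaves the window.
        self._seen[replay_key] = estimated_time_ms + self.window_ms
        return "accept_0rtt"
```

A replay carries the same obfuscated age, so it maps to the same estimated time; it is either caught in step 2 or, once the window has passed, pushed to step 1.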
Contributor

It might be wise to explicitly recommend storing the PSK binder rather than the entire ClientHello. This has the benefit of essentially being a compact token cryptographically tied to the 0-RTT key (also preventing someone from polluting the replay cache with random data).

Contributor

If the ClientHello needs to be valid, then polluting the cache is as trivial as just creating new and different ClientHello values. In a way, the binder is just a way of having the other side calculate your hash for you.

Contributor Author

Kyle is right that you need to validate the binder, because it changes the cost of polluting the cache to the cost of getting a new PSK (and you can use PSK-specific filtering to blacklist bad actors).

Maybe it's obvious to others, but we also can't use a hash of the packet because if the CH contains two PSKs, then the attacker can corrupt the second binder without detection and potentially pollute the cache. So, I think you want either CH.Random or the binder.


Contributor

It might be worth adding something here along the lines of ...

It is also critical to make sure that the record of ClientHellos that led to accepted 0-RTT sections from a given window is complete, before accepting any new 0-RTT sections for that same window. For example, if the system recording ClientHellos crashes with no durable record of the ClientHellos previously accepted, then the system needs to wait at least one full window before accepting any ClientHellos.

Contributor

That's probably not going to fly. The window is large and waiting that long would kill all the benefits that 0-RTT provides. I would instead recognize that challenges exist in synchronizing state across participating nodes. See Erik Nygren's comments on the list that amount to basically "not gonna happen", which I agree with.

It's fine to recommend this design, but the need to have globally consistent state is a massive hurdle.

Contributor

The window here is just the clock skew tolerance window (ie 10s of seconds). Waiting 1 window before starting to accept 0-RTT data sounds very reasonable to me.

I believe the intent of these is to have 1 of these client hello record caches per cluster, rather than a global state.

Contributor

Sorry, my wording confused things. All I was referring to is that when you restart a strike register with a clean slate, you need to wait the window period of time before accepting any new entries. The reason is because of this race:

T1: Strike register accepts key K
T2: Strike register crashes, loses all memory of key K
T3: Strike register restarts
T4: Strike register accepts K again

To avoid that, the register needs to have a pause on start-up. Or it can record everything durably, but that's very slow. The register can still respond with microseconds during ordinary operation.
Clearly, I need better wording.

Because this mechanism does not require storing all outstanding
tickets, it may be easier to implement in distributed systems with
high rates of resumption and 0-RTT, at the cost of potentially
weaker anti-replay defense because of the difficulty of reliably
storing and retrieving the received ClientHello messages.

### Stateless Time-Based Anti-Replay {#stateless-anti-replay}
Contributor

I think stateless mitigation is pointless, as it doesn't bound the number of replays, but some notes anyway ...

Contributor

"Limited" should probably be in the subsection heading.


Finally, the server can implement a very rough anti-replay mechanism
Contributor

I'd almost call this a "replay reduction" or "replay limitation" mechanism instead of "anti-replay", as "anti-replay" could be read as meaning it is a stronger mechanism than it actually is.

merely by measuring the mismatch between client and server views of
time. The server can determine its view of the age of the ticket by
subtracting the time the ticket was issued from the current
Contributor

"the the"

time. If the client and server clocks were running at the same rate,
the client's view of the ticket age would be shorter than the actual time elapsed on

Something missing after client's view of...

Contributor Author

oops "ticket_age"

the server by a single round trip time. This difference is comprised
of the delay in sending the NewSessionTicket message to the client,
plus the time taken to send the ClientHello to the server.
Contributor

It's a little unfortunate to have the discussion of the time calculations both here and in Client Hello Recording, though I don't have an alternative proposal.


The mismatch between the client's and server's views of age is thus
given by:

~~~~
mismatch = (client's view + RTT estimate) - (server's view)
~~~~

There are several potential sources of error that make an exact
measurement of time difficult. Variations in client and server clock
rates are likely to be minimal, though potentially with gross time
corrections. Network propagation delays are the most likely causes of
a mismatch in legitimate values for elapsed time. Both the
NewSessionTicket and ClientHello messages might be retransmitted and
therefore delayed, which might be hidden by TCP. For browser clients
on the Internet, this implies that an
allowance on the order of ten seconds to account for errors in clocks and
variations in measurements is advisable; other deployment scenarios
Contributor

The more we talk about this I kind of want to drop down to less than 10 seconds, like maybe 5 or even 1. (Yes, I know the text here is just talking order of magnitude.)

may have different needs. Outside the selected range, the
server SHOULD reject early data and fall back to a full 1-RTT
handshake. Clock skew distributions are not
symmetric, so the optimal tradeoff may involve an asymmetric range
of permissible mismatch values.
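
A sketch of the stateless check, assuming millisecond values and that the client's view has already been recovered from "obfuscated_ticket_age". The separate `max_negative_ms` and `max_positive_ms` bounds are hypothetical parameter names; their defaults only follow the ten-second order of magnitude mentioned above, and setting them differently gives the asymmetric range the text describes. A replay delivered long after the original drives the mismatch negative, because the server's view keeps growing while the client's reported view does not.

```python
def ticket_age_mismatch(client_view_ms: int, rtt_estimate_ms: int,
                        server_view_ms: int) -> int:
    # mismatch = (client's view + RTT estimate) - (server's view)
    return (client_view_ms + rtt_estimate_ms) - server_view_ms


def accept_early_data(client_view_ms: int, rtt_estimate_ms: int,
                      ticket_issued_ms: int, now_ms: int,
                      max_negative_ms: int = 10_000,
                      max_positive_ms: int = 10_000) -> bool:
    # Server's view: current time minus the time the ticket was issued.
    server_view_ms = now_ms - ticket_issued_ms
    mismatch = ticket_age_mismatch(client_view_ms, rtt_estimate_ms,
                                   server_view_ms)
    # True: within the (possibly asymmetric) range, 0-RTT may be accepted.
    # False: reject early data and fall back to a full 1-RTT handshake.
    return -max_negative_ms <= mismatch <= max_positive_ms
```

As the comments below point out, this only bounds when a replay can arrive, not how many replays arrive inside the range.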
Contributor

Once the discussion above about storing received ClientHello(-related stuff) settles, we should probably normalize this text with what we end up with there.

Also, a server could rate-limit how often it accepts 0-RTT, to provide some reduction in the amount of replay possible. The amount of reduction gained probably is not enough to make it worth doing, but I'll toss it out there.


Contributor

Here I would suggest adding something that says:

Note that while stateless anti-replay can bound over how long in time a packet may be replayed, the total amount of replays tolerated is bounded by bandwidth and system capacity. This can be thousands to billions of replays in real-world settings.

And I'd argue for adding this too:

Stateless anti-replay SHOULD NOT be used in environments without strong assurance of application and system behavior and MUST NOT be used in environments that must interoperate with third-party systems and applications.

Contributor Author

I added the clarification but not the normative requirement.

## End of Early Data

%%% Updating Keys
@@ -3611,9 +3692,10 @@ appropriate application traffic key.

### New Session Ticket Message {#NSTMessage}

At any time after the server has received the client Finished message, it MAY send
a NewSessionTicket message. This message creates a pre-shared key
(PSK) binding between the ticket value and the resumption master secret.
At any time after the server has received the client Finished message,
it MAY send a NewSessionTicket message. This message creates a
pre-shared key (PSK) binding between the ticket value and a secret
Contributor

"PSK binding" is perhaps a confusing term to use, given that we have "PSK binders" that are different.
Maybe "association" or "relationship"?

derived from the resumption master secret.

The client MAY use this PSK for future handshakes by including the
ticket value in the "pre_shared_key" extension in its ClientHello
@@ -3646,6 +3728,7 @@ handshake, for example.
struct {
uint32 ticket_lifetime;
uint32 ticket_age_add;
opaque ticket_nonce<1..255>;
opaque ticket<1..2^16-1>;
Extension extensions<0..2^16-2>;
} NewSessionTicket;
@@ -3668,6 +3751,9 @@ ticket_age_add
obtain the value that is transmitted by the client. The server MUST
generate a fresh value for each ticket it sends.

ticket_nonce
: A unique per-ticket value.

ticket
: The value of the ticket to be used as the PSK identity.
The ticket itself is an opaque label. It MAY either be a database
@@ -3695,6 +3781,16 @@ max_early_data_size
depend on being able to send large quantities of padding in early data records.
{:br }
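
As a concrete illustration of the ticket_age_add field, the following minimal Python sketch assumes the behavior specified in RFC 8446: the client adds ticket_age_add to its view of the ticket age in milliseconds, modulo 2^32, and sends the result as the obfuscated ticket age in the "pre_shared_key" extension. The function names are hypothetical and are not taken from any TLS library.

```python
import secrets

UINT32_MOD = 2 ** 32  # ticket_age_add and the obfuscated age are uint32 values


def new_ticket_age_add() -> int:
    """Server: generate a fresh, securely random 32-bit value for each ticket."""
    return secrets.randbits(32)


def obfuscate_ticket_age(ticket_age_ms: int, ticket_age_add: int) -> int:
    """Client: hide the ticket age before sending it in "pre_shared_key"."""
    return (ticket_age_ms + ticket_age_add) % UINT32_MOD


def recover_ticket_age(obfuscated_age: int, ticket_age_add: int) -> int:
    """Server: subtract ticket_age_add (mod 2^32) to recover the client's view."""
    return (obfuscated_age - ticket_age_add) % UINT32_MOD


if __name__ == "__main__":
    add = new_ticket_age_add()
    sent = obfuscate_ticket_age(5_000, add)  # the value visible on the wire
    assert recover_ticket_age(sent, add) == 5_000
```

Because the server picks a fresh random value per ticket, the transmitted value on its own does not reveal how long the client has held the ticket; only the server, which knows ticket_age_add, can undo the addition.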

The PSK associated with the ticket is computed as:

~~~~
HKDF-Expand-Label(resumption_master_secret,
"resumption", ticket_nonce, Hash.length)
~~~~

Because the ticket_nonce value is distinct for each NewSessionTicket
message, a different PSK will be derived for each ticket.
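
The derivation above can be sketched in Python with only the standard library. This is a minimal sketch rather than a reference implementation: it assumes SHA-256 as the handshake hash and the HkdfLabel encoding from RFC 8446 (a two-byte output length, then the length-prefixed label carrying the "tls13 " prefix, then the length-prefixed context). Earlier drafts used a different label prefix, so the prefix is exposed as a parameter. The resumption_master_secret in the usage lines is a placeholder, not a real value, and the helper names are hypothetical.

```python
import hashlib
import hmac


def hkdf_expand(prk: bytes, info: bytes, length: int, hash_name: str = "sha256") -> bytes:
    """HKDF-Expand as defined in RFC 5869."""
    hash_len = hashlib.new(hash_name).digest_size
    if length > 255 * hash_len:
        raise ValueError("length too large for HKDF-Expand")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        # T(i) = HMAC(PRK, T(i-1) | info | i), with T(0) empty
        block = hmac.new(prk, block + info + bytes([counter]), hash_name).digest()
        okm += block
        counter += 1
    return okm[:length]


def hkdf_expand_label(secret: bytes, label: str, context: bytes, length: int,
                      hash_name: str = "sha256", prefix: str = "tls13 ") -> bytes:
    """Encode the HkdfLabel structure and pass it to HKDF-Expand as info."""
    full_label = (prefix + label).encode("ascii")
    hkdf_label = (
        length.to_bytes(2, "big")                  # uint16 length
        + bytes([len(full_label)]) + full_label    # opaque label<..255>
        + bytes([len(context)]) + context          # opaque context<0..255>
    )
    return hkdf_expand(secret, hkdf_label, length, hash_name)


def ticket_psk(resumption_master_secret: bytes, ticket_nonce: bytes,
               hash_name: str = "sha256") -> bytes:
    """PSK for one NewSessionTicket, keyed by its ticket_nonce."""
    hash_len = hashlib.new(hash_name).digest_size
    return hkdf_expand_label(resumption_master_secret, "resumption",
                             ticket_nonce, hash_len, hash_name)


if __name__ == "__main__":
    rms = bytes(32)  # placeholder resumption_master_secret, illustration only
    psk_a = ticket_psk(rms, b"\x00")
    psk_b = ticket_psk(rms, b"\x01")
    assert len(psk_a) == 32 and psk_a != psk_b  # distinct nonce, distinct PSK
```

Since the nonce enters the derivation as the HkdfLabel context, two tickets issued on the same connection share a resumption_master_secret but still yield unrelated PSKs.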

Note that in principle it is possible to continue issuing new tickets
which indefinitely extend the lifetime of the keying
material originally derived from an initial non-PSK handshake (which
@@ -4439,8 +4535,8 @@ etc. The initial secret is simply a string of Hash.length zero bytes.
Concretely, for the
present version of TLS 1.3, secrets are added in the following order:

- PSK (a pre-shared key established externally or a resumption_master_secret
value from a previous connection)
- PSK (a pre-shared key established externally or derived from
the resumption_master_secret value from a previous connection)
- (EC)DHE shared secret ({{ecdhe-shared-secret-calculation}})

This produces a full key derivation schedule shown in the diagram below.
@@ -5324,6 +5420,12 @@ secret. The resumption PSK has been designed so that the
resumption master secret computed by connection N and needed to form
connection N+1 is separate from the traffic keys used by connection N,
thus providing forward secrecy between the connections.
In addition, if multiple tickets are established on the same
connection, they are associated with different keys, so compromise of
the PSK associated with one ticket does not lead to the compromise of
connections established with PSKs associated with other tickets.
This property is most interesting if tickets are stored in a database
(and so can be deleted) rather than if they are self-encrypted.

The PSK binder value forms a binding between a PSK
and the current handshake, as well as between the session where the
@@ -5568,6 +5670,45 @@ application protocols separately ensuring that confidential
information is not inadvertently leaked.


## Replay Attacks on 0-RTT {#replay-0rtt}

Replayable 0-RTT data presents a number of security threats to
TLS-using applications. Specifically, if applications are not
engineered to be idempotent, then duplication of requests
may cause side effects (e.g., purchasing an item or transferring
money) to be duplicated, thus harming the site or the user.
Contributor

I think this is some HTTPS mindset sneaking in. (Which is not necessarily wrong, just something to be aware of.)

In addition, if data can be replayed a large number of times,
this enables a variety of attacks via side channels such
as cache timing or measuring the speed of cryptographic
operations {{Mac17}}.

@enygren May 4, 2017

Perhaps say "idempotent and side-effect free" rather than just "idempotent"? DELETE and PUT are idempotent but do have side-effects and without additional application layer controls an attacker doing 0-RTT replay could reorder them.

Contributor

+1 to side-effect free.

Contributor

First of all, I would talk about "actions" rather than requests. The things that the server does in response to receiving 0-RTT are what will be exploited.

Side-effect free is useful, but forbidding that doesn't really cover it. In @enygren's example, the side effects are relevant, but the primary effect (creation/update of a resource vs. removal) is what we're really concerned with. The only way to ensure that this is perfectly safe is to use the "safe" definition in HTTP - that is the request does nothing but generate a response. And even then, it's rare that such a request is ever free from side effects or side channels.

This risks us defining something that is very HTTP-centric. I would prefer that we instead say that idempotency is desirable for the actions that the server takes, but that idempotency could be insufficient. That is more or less what the text here is getting at.

Contributor Author

I rewrote this a bit, but note that I'm not sure that any anti-replay mechanism we have considered will handle this in the face of sufficient client and server complicity.

Consider the following case:

  • Client sends CH + Early data
  • Attacker injects SH + RST
  • Client retries with a fresh connection (and a fresh CH)
  • Server processes the data
  • Attacker replays CH + Early Data

Ugh

The limited anti-replay mechanisms described in {{replay-time}} are
intended to prevent large-scale replay but do not provide complete
protection against replays. Specifically, they fall back to the 1-RTT
handshake when the server does not have any information about the
client, e.g., because it is in a different cluster which does not
share state or because the ticket has been deleted as described in
{{single-use-tickets}}. If the application layer protocol retransmits
data in this setting, then it is possible for an attacker to induce a
replay attack by sending the ClientHello to both the original cluster
(which processes the data immediately) and another cluster which will
fall back to 1-RTT and process the data upon application layer
replay. The scale of this attack is limited by the client's
willingness to replay and therefore only allows a small number of
replays, which will also use different encryption keys.
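
To make the fallback scenario concrete, here is a toy Python model (not a TLS implementation) of two server clusters that do not share anti-replay state. An attacker delivers a copy of the ClientHello and its 0-RTT data to the cluster that holds the ticket, while the client's own connection lands on a cluster that cannot validate the ticket, rejects the early data, and completes a 1-RTT handshake; the application then retransmits. The Cluster class, its methods, and the request string are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """A server cluster with its own, unshared record of usable tickets."""
    name: str
    known_tickets: set = field(default_factory=set)
    executed: list = field(default_factory=list)

    def handle_0rtt(self, ticket: str, request: str) -> bool:
        """Execute early data only if this cluster can validate the ticket."""
        if ticket in self.known_tickets:
            self.known_tickets.discard(ticket)  # single-use ticket
            self.executed.append(request)
            return True
        return False  # 0-RTT rejected; the handshake falls back to 1-RTT

    def handle_1rtt(self, request: str) -> None:
        """Execute a request received after a full 1-RTT handshake."""
        self.executed.append(request)


def fallback_replay() -> list:
    """Return every execution of the request across both clusters."""
    ticket, request = "ticket-1", "POST /transfer"
    cluster_a = Cluster("A", known_tickets={ticket})
    cluster_b = Cluster("B")  # different cluster: no state for this ticket

    # 1. The attacker forwards a copy of the ClientHello + 0-RTT data to
    #    cluster A, which validates the ticket and executes immediately.
    cluster_a.handle_0rtt(ticket, request)

    # 2. The client's own connection reaches cluster B, which cannot
    #    validate the ticket and falls back to a 1-RTT handshake.
    if not cluster_b.handle_0rtt(ticket, request):
        # 3. The application layer retransmits the rejected early data.
        cluster_b.handle_1rtt(request)

    return cluster_a.executed + cluster_b.executed


if __name__ == "__main__":
    print(fallback_replay())  # the request runs once on A and once on B
```

Each cluster individually behaves correctly (A consumes the single-use ticket, B refuses unknown early data), yet the request is executed twice; each additional client retry can add another execution, which is why the scale is governed by the client's retry policy.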
Contributor

I would remove the "therefore only allows a small number of replays". That's all up to the client. I don't consider 10 to be small, which is where we are at in Firefox.

Contributor

Again I agree with MT

Contributor Author

I changed "small" to "limited"


If implemented correctly,
the mechanisms described in {{single-use-tickets}} and
{{client-hello-recording}} prevent a
replayed ClientHello and its associated 0-RTT data from being accepted
multiple times by any cluster with consistent state. However, if
state is not completely consistent, then an attacker might be able to
have multiple copies of the data accepted during the replication
window. The stateless mechanism described in
{{stateless-anti-replay}} only prevents replay outside the
time window.
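
The difference between these mechanisms can be illustrated with a toy Python model. It assumes, as in RFC 8446, that the server computes an expected arrival time from the ticket's creation time and the client's reported ticket age, and that recorded ClientHellos are keyed by a per-ClientHello unique value (RFC 8446 uses the PSK binder). The class and method names and the simple symmetric window check are hypothetical simplifications, and the recorded set here never expires, whereas a real store would drop entries older than the window.

```python
from dataclasses import dataclass, field


@dataclass
class AntiReplayNode:
    """One server node's 0-RTT acceptance logic (toy model)."""
    window_ms: int                      # allowed skew around the expected arrival time
    record_client_hellos: bool = True   # False models the stateless mechanism
    seen: set = field(default_factory=set)

    def accept_0rtt(self, hello_id: bytes, expected_arrival_ms: int, now_ms: int) -> bool:
        # Freshness check: the only protection a stateless server has.
        if abs(now_ms - expected_arrival_ms) > self.window_ms:
            return False
        if self.record_client_hellos:
            # Recording check: reject a ClientHello this node has already seen.
            if hello_id in self.seen:
                return False
            self.seen.add(hello_id)
        return True


def count_accepted(node: AntiReplayNode, copies: int) -> int:
    """Deliver `copies` identical ClientHellos, all inside the window."""
    return sum(node.accept_0rtt(b"binder-1", 1_000, 1_000 + i) for i in range(copies))


if __name__ == "__main__":
    print(count_accepted(AntiReplayNode(10_000, record_client_hellos=False), 1_000))  # 1000
    print(count_accepted(AntiReplayNode(10_000), 1_000))  # 1
    # Two nodes whose recorded state has not replicated yet: each accepts a copy.
    n1, n2 = AntiReplayNode(10_000), AntiReplayNode(10_000)
    print(n1.accept_0rtt(b"binder-1", 1_000, 1_000), n2.accept_0rtt(b"binder-1", 1_000, 1_000))  # True True
    # Outside the window, even the stateless node rejects the ClientHello.
    print(AntiReplayNode(10_000, False).accept_0rtt(b"binder-1", 1_000, 20_000))  # False
```

In this model the stateless node bounds only when a copy may arrive, not how many copies are accepted; the count inside the window is limited only by how fast the attacker can deliver them.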
Contributor

probably should reiterate that this can be tens or hundreds of thousands of replays.

Contributor Author

I added that separately.

Contributor

In the main body text, not the security considerations, if I'm reading the line numbers correctly.
The security considerations might do well to reiterate the consequences of billions of replays (e.g., Colm's analysis).



Add something like this which covers the fundamental requirement and responsibilities:

"The onus is on clients not to send messages in 0-RTT data which are not safe to have replayed and which they would not be willing to retry across multiple 1-RTT connections. The onus is on servers to protect themselves against attacks employing 0-RTT data replication."

(or "___ have responsibility to" instead of "the onus is on"?)

Contributor

This is pretty obvious, but worth stating, I think.

Contributor

Yeah, worth stating again. Maybe also that the application profile should tell the client to do so.

Contributor Author

Added it to an earlier location

# Working Group Information

The discussion list for the IETF TLS working group is located at the e-mail