Discussion: MAC Address usage within UUIDs #13

Open
ben221199 opened this issue Jun 6, 2021 · 10 comments
Labels: Discussion (Further information is requested)

Comments

@ben221199

Section 4.3.3 (UUIDv6 Node Usage) of this specification contains the following text:

UUIDv6 node bits SHOULD be set to a 48-bit random or pseudo-random number. UUIDv6 nodes SHOULD NOT utilize an IEEE 802 MAC address or the [RFC4122], Section 4.5 method of generating a random multicast IEEE 802 MAC address.

What is the motivation for this section telling me that I "SHOULD NOT"?

I know that using MAC addresses isn't fully secure, but RFC 4122 addresses that in Section 4.5: if you cannot or do not want to use a MAC address, you set one bit to 1 (the unicast/multicast bit, which is always 0 in real network-card MAC addresses) and fill the remaining 47 bits with a random number. However, Section 4.3.3 also says I "SHOULD NOT" use the Section 4.5 method.

I don't see any problem with allowing the Section 4.5 method, because setting the unicast/multicast bit makes it more secure, and the only difference is that the generated ID has 47 random bits instead of 48.
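As a rough illustration of the two options being compared, the Python sketch below builds a node ID the RFC 4122 Section 4.5 way (47 random bits with the unicast/multicast bit forced to 1) next to the draft's plain 48 random bits. The helper names are hypothetical. It assumes the node is handled as a 48-bit integer whose first octet is the most significant byte, so the I/G bit sits at bit position 40. (CPython's `uuid.getnode()` documents the same multicast-bit fallback when no hardware address can be found.)

```python
import secrets

# I/G (unicast/multicast) bit: least significant bit of the first octet.
# With the node read as a big-endian 48-bit integer, that is bit 40.
IG_BIT = 1 << 40

def random_multicast_node() -> int:
    """Hypothetical helper: RFC 4122, Section 4.5 style node ID
    (47 random bits, I/G bit forced to 1)."""
    return secrets.randbits(48) | IG_BIT

def random_plain_node() -> int:
    """Hypothetical helper: 48 random bits, no bit forced (the draft's SHOULD)."""
    return secrets.randbits(48)

def format_node(node: int) -> str:
    """Render a 48-bit node ID as a colon-separated MAC-style string."""
    return ":".join(f"{b:02x}" for b in node.to_bytes(6, "big"))

if __name__ == "__main__":
    node = random_multicast_node()
    print(format_node(node))
    # The first octet of a Section 4.5 node ID is always odd.
    assert node.to_bytes(6, "big")[0] & 0x01 == 1
```

The only difference between the two helpers is the single OR with `IG_BIT`, which is exactly the one bit of randomness this thread is arguing about.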

@bradleypeabody
Contributor

The reasons for not using MAC address are:

  • possible security issues from leaking data about which machine produced the value and who manufactured it
  • uniqueness guarantees from MAC addresses are not what they once were, now that VMs and all manner of virtual network interfaces are commonplace

It says SHOULD NOT instead of MUST NOT because it's still at the discretion of the implementation; it's just not recommended.

@alexshpilkin

alexshpilkin commented Jul 28, 2021

The original author of this issue might want to correct me, but it seems that there are two interpretations of the question in this issue, and @bradleypeabody may have responded to the wrong one: the word “fake” in the original title (“Reason for not using (fake) MAC addresses in UUIDv6”) is key.

The first is about the choice between using real MAC addresses and pseudorandom numbers in the node ID field, in general; RFC 4122 §4.5 already discusses the possibility of using either but explicitly leaves the choice to the creator of the UUID (though arguably it treats the real MAC address case as the primary one). If the experience of the authors of this draft suggests that, fifteen years later, it is worth elevating the pseudorandom number case to a SHOULD, I’m willing to believe them. So, no argument here.

The second is about the conflict between the ways RFC 4122 and this draft specify for putting a pseudorandom number in the node ID. This draft (in its v6 section; I haven’t looked at v7 and v8 in detail) tells you to just put 48 bits of randomness there. RFC 4122 instead respects the internal structure of MAC addresses by instructing people to set 47 of the 48 bits randomly but set the I/G bit (“unicast/multicast”, the least significant bit of the first octet) to one, on the theory that a real network card’s MAC address is never going to be a multicast one. (It also notes that the proper way would instead be to set the next bit, U/L, which is explicitly intended to mark MAC addresses that do not fit into the IEEE-managed namespace, but people had been using multicast addresses before the RFC was written, so that’s what the RFC documents.) Thus this draft has a collision probability that is about half as large (one extra bit of randomness doubles the space, so √2 times as many IDs can be generated at the same collision risk), but RFC 4122 allows detecting whether a node ID should be interpreted as a real MAC address (by isolating the random and non-random node ID spaces from each other).
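To put numbers on that trade-off, here is a back-of-the-envelope sketch using the standard birthday approximation p ≈ 1 - exp(-n(n-1) / 2^(b+1)) for n uniformly random b-bit values. It assumes, as a simplification, that only the node bits distinguish the UUIDs (for example, many generators in the same clock tick with the same clock sequence); the function names are illustrative only.

```python
import math

def collision_probability(n: int, bits: int) -> float:
    """Birthday approximation: chance that at least two of n uniformly
    random `bits`-bit values are equal, 1 - exp(-n(n-1) / 2^(bits+1))."""
    return -math.expm1(-n * (n - 1) / 2 ** (bits + 1))

def ids_for_probability(p: float, bits: int) -> float:
    """Approximate number of random values at which the collision
    probability reaches p (inverse of the formula above)."""
    return math.sqrt(2 * 2 ** bits * -math.log1p(-p))

if __name__ == "__main__":
    n = 1_000_000
    p47 = collision_probability(n, 47)  # RFC 4122, Section 4.5: 47 random bits
    p48 = collision_probability(n, 48)  # draft: 48 random bits
    print(f"47 bits: {p47:.3e}  48 bits: {p48:.3e}  ratio: {p47 / p48:.3f}")
    # Capacity at a fixed risk grows by sqrt(2) per extra bit.
    print(ids_for_probability(0.5, 48) / ids_for_probability(0.5, 47))
```

For a fixed number of IDs the extra bit roughly halves the collision probability; for a fixed acceptable risk it buys about √2 times as many IDs.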

Yes, the RFC 4122 way is kind of awkward and clunky and in an ideal world we probably wouldn’t have used it. But in a world where it is already specified for v1 UUIDs, is it worth having a different one for v6 UUIDs, just to have one more random bit to play with? I can believe there is a different and valid reason, but I don’t see it written down anywhere.

@alexshpilkin

... And now that I’ve taken the time to write all of this down, it finally occurs to me that maybe the main thrust of the draft here is that the node ID should not, in fact, identify nodes in any sense, but should instead be generated anew for each UUID (in order to e.g. avoid exposing how many UUID-generating hosts there are), which 4122 §4.5 does not technically prohibit but does not really seem to anticipate.

But while the algorithm in §4.3.4 does tell you to use fresh randomness, 11 items down, I have to admit that this intention was never clear to me from the language in §4.3.3 (quoted in the original issue text) even though I’ve reread the v6 parts of the draft several times over the last couple of weeks. In fact, it still reads as “don’t use real MAC addresses (because security) or set the multicast bit (because we said so)” to me, not “generate fresh random node IDs for each UUID, dummy”.

So even if you think setting the multicast bit as in 4122 is not worth it and don’t want to explain why (?), it seems that §4.3.3 could still use a clarification and/or an explicit comparison to 4122 §4.5.

@ben221199
Author

That is a lot of text.

My point was that RFC 4122 Section 4.5 has two options: use a MAC address, OR use a random value with a specific bit set to 1.
I see UUIDv6 as UUIDv1 with correct time sorting, so the only change UUIDv6 should have compared to UUIDv1 is the order of the time-field bits.
In that case, UUIDv6 should just follow Section 4.5 of RFC 4122.

As for this spec, I would just write that UUIDv6 follows Section 4.5, and perhaps add one sentence saying that using a real MAC address isn't recommended.

@ben221199
Author

So, in conclusion:

I would like the UUIDv6 spec to tell me:

  • If the I/G bit is not set, the 48 bits are a MAC address, and using it this way is NOT RECOMMENDED.
  • If the I/G bit is set, the other 47 bits are a (pseudo)random number, and how they are generated is completely up to the user.

That way, UUIDv6 follows RFC 4122 Section 4.5 and still allows a custom node ID.
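A minimal sketch of the interpretation rule proposed above, using Python's standard `uuid` module: it reads the 48-bit `node` field (which sits in the same position in UUIDv1 and UUIDv6) and checks the I/G bit. The helper names are hypothetical. One caveat follows from the draft's current wording: if nodes are 48 plain random bits, about half of them will have the I/G bit clear, so a clear bit only means "possibly a MAC address", and only if the generator followed the Section 4.5 convention.

```python
import uuid

# I/G (unicast/multicast) bit: least significant bit of the node's first
# octet, i.e. bit 40 when the node is read as a big-endian 48-bit integer.
IG_BIT = 1 << 40

def node_is_marked_random(u: uuid.UUID) -> bool:
    """Hypothetical helper: True if the node carries the RFC 4122,
    Section 4.5 'this is not a MAC address' mark."""
    return bool(u.node & IG_BIT)

def node_as_mac(u: uuid.UUID) -> str:
    """Hypothetical helper: format the 48-bit node field MAC-style."""
    return ":".join(f"{b:02x}" for b in u.node.to_bytes(6, "big"))

if __name__ == "__main__":
    # The DNS namespace UUID from RFC 4122 is a version 1 UUID.
    ns_dns = uuid.NAMESPACE_DNS
    print(node_as_mac(ns_dns), node_is_marked_random(ns_dns))
    # prints: 00:c0:4f:d4:30:c8 False
```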

@bradleypeabody
Contributor

The general ideas that the draft is intended to communicate are:

  • For UUIDv6, it is valid and okay to just shift the timestamp bits around and otherwise use the behavior from RFC 4122
  • But using random data instead is preferred over using the MAC address, because of both security and questions about the uniqueness of MAC addresses in modern environments (e.g., if you run your code on an AWS server in the cloud, what are the odds that you get a physical MAC address with the uniqueness guarantees RFC 4122 originally assumed?)
  • UUIDs are opaque, with the exception of the timestamp. This probably needs to be made clearer in the draft.

but 4122 allows detecting whether a node ID should be interpreted as a real MAC address

What is the use case for interpreting this data? The purpose of the MAC address in RFC 4122 is to obtain a unique value, not to communicate the system's MAC address to the recipient. So why would the recipient of a UUID need to know whether this part is a MAC address or random data? If there is a good answer to this question, then I think it's worth revisiting; if not, then hopefully the intention behind just using random data is clear: it is only meant to produce a unique value, and we're not concerned with reading the MAC address back out of it afterward.

@ben221199
Author

I can't remember whether I missed the previous comment. In any case, I see there is a new version of the draft, which I think is an improvement over the previous one. It now reads:

The 48 bit node SHOULD be set to a pseudo-random value however implementations MAY choose retain the old MAC address behavior from [RFC4122], Section 4.1.6 and [RFC4122], Section 4.5


I also want to answer the question in the previous comment:

Section 4.5 of RFC 4122 is titled "Node IDs that Do Not Identify the Host", so in UUIDv1 the node ID is meant to "identify" hosts. This can be done in two ways: using the MAC address or using a 47-bit random number. Why 47 bits? Because the unicast/multicast bit must be set when NOT using a MAC address. When the bit is NOT set, systems should expect a MAC address. The spec already mentions that using a MAC address could be insecure in some ways, which is why it offers this 47-bit random number with the unicast/multicast bit set.

Because UUIDv6 is EXACTLY the same as UUIDv1, except for the reordered time field and the version number, it should follow the EXACT same rules as UUIDv1. This means that if you want to use a MAC address, the bit should be 0, and if you want to use the 47-bit random number, you should set the bit to 1.

In fact, your UUIDv6 section could consist of little more than this:

UUIDv6 follows exactly the same rules as UUID version 1 described in [RFC4122]. The only differences between UUIDv1 and UUIDv6 are, of course, the timestamp field and the version field. In the timestamp field, the bits are reordered so that the most significant bits come first, which makes it possible to sort UUIDs by time. In the version field, the value should be 6. Also, we want to encourage you to use a 47-bit random number instead of a MAC address. Note that in this case, the unicast/multicast bit should be set, in conformance with [RFC4122], Section 4.5.
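To make the "UUIDv1 with the timestamp reordered" idea concrete, here is a minimal Python sketch of a UUIDv6 builder. It is not the draft's reference algorithm: the function name and the `set_multicast_bit` switch are hypothetical, the clock sequence is simply random, and it assumes the draft layout of time_high (32 bits), time_mid (16), version (4), time_low (12), variant (2), clock sequence (14) and node (48). The switch shows both node options discussed in this thread, and a fresh node is drawn on every call.

```python
import secrets
import time
import uuid

# 100-ns intervals between the Gregorian epoch (1582-10-15) and the Unix
# epoch; the same constant CPython's uuid1() adds to its timestamp.
GREGORIAN_OFFSET = 0x01B21DD213814000
IG_BIT = 1 << 40  # unicast/multicast bit of the 48-bit node

def uuid6_sketch(set_multicast_bit: bool = True) -> uuid.UUID:
    """Hypothetical UUIDv6 builder: UUIDv1 fields with the 60-bit
    timestamp written most-significant-bits first."""
    timestamp = time.time_ns() // 100 + GREGORIAN_OFFSET  # 60-bit count
    time_high = (timestamp >> 28) & 0xFFFFFFFF  # top 32 bits
    time_mid = (timestamp >> 12) & 0xFFFF       # next 16 bits
    time_low = timestamp & 0x0FFF               # bottom 12 bits
    clock_seq = secrets.randbits(14)
    node = secrets.randbits(48)                 # fresh per UUID
    if set_multicast_bit:
        node |= IG_BIT                          # RFC 4122, Section 4.5 mark
    value = (
        (time_high << 96)
        | (time_mid << 80)
        | (0x6 << 76)          # version 6
        | (time_low << 64)
        | (0b10 << 62)         # RFC 4122 variant
        | (clock_seq << 48)
        | node
    )
    return uuid.UUID(int=value)

if __name__ == "__main__":
    print(uuid6_sketch())                         # 47 random bits + I/G bit
    print(uuid6_sketch(set_multicast_bit=False))  # 48 plain random bits
```

Because the high timestamp bits lead, UUIDs from this sketch sort by creation time as long as the clock tick differs; within one tick, the random clock sequence decides the order.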

So, to answer the following question:

What is the use case for interpreting this data?

When I read a UUIDv1 with the unicast/multicast bit NOT set, I expect a MAC address.
When I read a UUIDv1 with the unicast/multicast bit set, I expect a 47-bit random number. (How it is generated is up to the user.)
The use case for the node ID is "identifying" the host.
If I can identify which machine created an object from the MAC address in that object's UUIDv1, I also want to be able to identify which machine created another object from the MAC address in its UUIDv6.
After all, UUIDv6 is in fact the same as UUIDv1, so I expect the same behaviour from it.


PS: Maybe check the punctuation in the draft. In my first quote, for example, there was a period missing at the end.

@kyzer-davis reopened this Feb 24, 2022
@kyzer-davis changed the title from "Reason for not using (fake) MAC addresses in UUIDv6" to "Discussion: MAC Address usage within UUIDs" Feb 24, 2022
@kyzer-davis added the Discussion (Further information is requested) label Feb 24, 2022
@kyzer-davis
Contributor

kyzer-davis commented Mar 1, 2022

Draft 03's Security Considerations section is more or less unchanged.
Draft 03's UUID Version 6 text recommends against putting a MAC address in the node field, but still allows MAC addresses for backwards compatibility, so that a UUIDv1 can be remapped 1:1 to a UUIDv6.
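The 1:1 remap mentioned above can be sketched as follows, assuming the draft's v6 layout. Only the 60-bit timestamp is rearranged (most significant bits first) and the version nibble changes from 1 to 6; the variant, clock sequence and node, including any MAC address, are carried over untouched. `uuid1_to_uuid6` is a hypothetical name, not a standard-library function.

```python
import uuid

def uuid1_to_uuid6(u1: uuid.UUID) -> uuid.UUID:
    """Hypothetical 1:1 remap of a UUIDv1 into the UUIDv6 layout."""
    if u1.version != 1:
        raise ValueError("expected a version 1 UUID")
    ts = u1.time  # 60-bit timestamp reassembled by the uuid module
    value = (
        ((ts >> 28) << 96)                # time_high: top 32 bits
        | (((ts >> 12) & 0xFFFF) << 80)   # time_mid: next 16 bits
        | (0x6 << 76)                     # version 6
        | ((ts & 0x0FFF) << 64)           # time_low: bottom 12 bits
        | (u1.int & ((1 << 64) - 1))      # variant, clock_seq, node as-is
    )
    return uuid.UUID(int=value)

if __name__ == "__main__":
    u1 = uuid.uuid1()
    u6 = uuid1_to_uuid6(u1)
    print(u1, "->", u6)
    assert u6.node == u1.node and u6.clock_seq == u1.clock_seq
```

Since nothing is discarded, the mapping is reversible, which is what makes keeping the MAC-based node useful for backwards compatibility.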

@fluffy

fluffy commented Jun 24, 2022

FWIW, at one point it was very hard to seed a good random number generator on some devices (particularly ones without human I/O). Today most of those devices have hardware RNG support. Given the privacy issues with MAC addresses, I think we should be looking at phasing out MAC-based UUIDs entirely.

@sergeyprokhorenko

@fluffy

...Today most those devices have hardware RNG support...

Do you mean a hardware random number generator? It would be great to use it to seed a faster cryptographically secure pseudorandom number generator for UUID generation.


6 participants