Identifier Grammar

Server Name

A homeserver is uniquely identified by its server name. This value is used in a number of identifiers, as described below.

The server name represents the address at which the homeserver in question can be reached by other homeservers. The complete grammar is:

server_name = host [ ":" port]
port = *DIGIT

where host is as defined by RFC3986, section 3.2.2.

Examples of valid server names are:

  • matrix.org
  • matrix.org:8888
  • 1.2.3.4 (IPv4 literal)
  • 1.2.3.4:1234 (IPv4 literal with explicit port)
  • [1234:5678::abcd] (IPv6 literal)
  • [1234:5678::abcd]:5678 (IPv6 literal with explicit port)
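As an illustration of the grammar above, the following is a minimal sketch of splitting a server name into its host and optional port. The function name `split_server_name` is hypothetical, and the sketch does not validate the host against RFC3986; it only takes care that the colons inside a bracketed IPv6 literal are not mistaken for the port separator.

```python
import re
from typing import Optional, Tuple

# port = *DIGIT, so an empty port is permitted by the grammar.
_PORT_RE = re.compile(r"[0-9]*")


def split_server_name(server_name: str) -> Tuple[str, Optional[str]]:
    """Split `host [ ":" port ]` into (host, port); port is None if absent.

    Hypothetical helper: the host itself is NOT checked against RFC3986.
    """
    if server_name.startswith("["):
        # IPv6 literal: the host extends to the closing bracket, so colons
        # inside the brackets are not port separators.
        end = server_name.find("]")
        if end == -1:
            raise ValueError("unterminated IPv6 literal")
        host, rest = server_name[: end + 1], server_name[end + 1:]
        if rest and not rest.startswith(":"):
            raise ValueError("unexpected text after IPv6 literal")
        port = rest[1:] if rest else None
    else:
        host, sep, tail = server_name.partition(":")
        port = tail if sep else None
    if port is not None and _PORT_RE.fullmatch(port) is None:
        raise ValueError("port must consist of ASCII digits")
    return host, port
```

For example, `split_server_name("[1234:5678::abcd]:5678")` yields the host `[1234:5678::abcd]` and the port `5678`.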

Room Versions

Room versions are used to change properties of rooms that may not be compatible with other servers. For example, changing the rules for event authorization could leave older servers in a split-brain situation, because they would not understand the new rules.

A room version is defined as a string of characters which MUST NOT exceed 32 codepoints in length. Room versions MUST NOT be empty and SHOULD contain only the characters a-z, 0-9, ., and -.

Room versions are not intended to be parsed and should be treated as opaque identifiers. Room versions consisting only of the characters 0-9 and . are reserved for future versions of the Matrix protocol.

The complete grammar for a legal room version is:

room_version = 1*room_version_char
room_version_char = DIGIT
                  / %x61-7A         ; a-z
                  / "-" / "."

Examples of valid room versions are:

  • 1 (would be reserved by the Matrix protocol)
  • 1.2 (would be reserved by the Matrix protocol)
  • 1.2-beta
  • com.example.version
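The rules above could be checked along the following lines. This is a sketch only: the function names are hypothetical, and it follows the strict grammar (the prose only says the character set SHOULD be respected). Python's `len()` counts code points, which matches the 32-codepoint limit.

```python
import re

# room_version_char = DIGIT / a-z / "-" / "."
_ROOM_VERSION_RE = re.compile(r"[0-9a-z.-]+")
# Versions made up only of digits and "." are reserved for the protocol.
_RESERVED_RE = re.compile(r"[0-9.]+")


def is_valid_room_version(version: str) -> bool:
    """Return True if `version` is non-empty, at most 32 code points long,
    and uses only the characters allowed by the grammar."""
    return len(version) <= 32 and _ROOM_VERSION_RE.fullmatch(version) is not None


def is_reserved_room_version(version: str) -> bool:
    """Return True if `version` consists only of 0-9 and "."."""
    return _RESERVED_RE.fullmatch(version) is not None
```

Because room versions are opaque, a check like this is about the only inspection an implementation should need; the version string should not be parsed further.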

Common Identifier Format

The Matrix protocol uses a common format to assign unique identifiers to a number of entities, including users, events and rooms. Each identifier takes the form:

&localpart:domain

where & represents a 'sigil' character; domain is the server name of the homeserver which allocated the identifier, and localpart is an identifier allocated by that homeserver.

The sigil characters are as follows:

  • @: User ID
  • !: Room ID
  • $: Event ID
  • +: Group ID
  • #: Room alias

The precise grammar defining the allowable format of an identifier depends on the type of identifier.
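The common format can be illustrated with a small sketch that separates an identifier into its sigil, localpart and domain. The helper name `split_identifier` is hypothetical. The sketch splits on the first colon after the sigil, which assumes the localpart itself contains no colon; the domain may legitimately contain further colons (a port, or an IPv6 literal).

```python
from typing import Tuple

# Sigil characters and the kind of entity each one identifies.
SIGILS = {
    "@": "user ID",
    "!": "room ID",
    "$": "event ID",
    "+": "group ID",
    "#": "room alias",
}


def split_identifier(identifier: str) -> Tuple[str, str, str]:
    """Split `&localpart:domain` into (sigil, localpart, domain).

    Assumption: the localpart contains no ":"; everything after the first
    colon is treated as the domain.
    """
    if not identifier or identifier[0] not in SIGILS:
        raise ValueError("identifier does not start with a known sigil")
    localpart, sep, domain = identifier[1:].partition(":")
    if not sep or not domain:
        raise ValueError("identifier has no domain")
    return identifier[0], localpart, domain
```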

User Identifiers

Users within Matrix are uniquely identified by their Matrix user ID. The user ID is namespaced to the homeserver which allocated the account and has the form:

@localpart:domain

The localpart of a user ID is an opaque identifier for that user. It MUST NOT be empty, and MUST contain only the characters a-z, 0-9, ., _, =, -, and /.

The domain of a user ID is the server name of the homeserver which allocated the account.

The length of a user ID, including the @ sigil and the domain, MUST NOT exceed 255 characters.

The complete grammar for a legal user ID is:

user_id = "@" user_id_localpart ":" server_name
user_id_localpart = 1*user_id_char
user_id_char = DIGIT
             / %x61-7A                   ; a-z
             / "-" / "." / "=" / "_" / "/"
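A validity check following this grammar might look like the sketch below. The name `is_valid_user_id` is hypothetical, and the server name is only required to be non-empty here; it is not checked against the `server_name` grammar.

```python
import re

# user_id_char = DIGIT / a-z / "-" / "." / "=" / "_" / "/"
_USER_ID_LOCALPART_RE = re.compile(r"[0-9a-z._=/-]+")

MAX_USER_ID_LENGTH = 255  # includes the "@" sigil and the domain


def is_valid_user_id(user_id: str) -> bool:
    """Return True if `user_id` matches the user ID grammar.

    Sketch only: the domain is required to be non-empty but is not
    otherwise validated.
    """
    if len(user_id) > MAX_USER_ID_LENGTH or not user_id.startswith("@"):
        return False
    localpart, sep, server_name = user_id[1:].partition(":")
    if not sep or not server_name:
        return False
    return _USER_ID_LOCALPART_RE.fullmatch(localpart) is not None
```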

Rationale

A number of factors were considered when defining the allowable characters for a user ID.

Firstly, we chose to exclude characters outside the basic US-ASCII character set. User IDs are primarily intended for use as an identifier at the protocol level, and their use as a human-readable handle is of secondary benefit. Furthermore, they are useful as a last-resort differentiator between users with similar display names. Allowing the full Unicode character set would make it very difficult for a human to distinguish two similar user IDs. The limited character set used has the advantage that even a user unfamiliar with the Latin alphabet should be able to distinguish similar user IDs manually, if somewhat laboriously.

We chose to disallow upper-case characters because we do not consider it valid to have two user IDs which differ only in case: indeed it should be possible to reach @user:matrix.org as @USER:matrix.org. However, user IDs are necessarily used in a number of situations which are inherently case-sensitive (notably in the state_key of m.room.member events). Forbidding upper-case characters (and requiring homeservers to downcase usernames when creating user IDs for new users) is a relatively simple way to ensure that @USER:matrix.org cannot refer to a different user to @user:matrix.org.

Finally, we decided to restrict the allowable punctuation to a very basic set to reduce the possibility of conflicts with special characters in various situations. For example, "*" is used as a wildcard in some APIs (notably the filter API), so it cannot be a legal user ID character.

The length restriction is derived from the limit on the length of the sender key on events; since the user ID appears in every event sent by the user, it is limited to ensure that the user ID does not dominate over the actual content of the events.

Matrix user IDs are sometimes informally referred to as MXIDs.

Historical User IDs

Older versions of this specification were more tolerant of the characters permitted in user ID localparts. There are currently active users whose user IDs do not conform to the permitted character set, and a number of rooms whose history includes events with a sender which does not conform. In order to handle these rooms successfully, clients and servers MUST accept user IDs with localparts from the expanded character set:

extended_user_id_char = %x21-39 / %x3B-7E  ; all ASCII printing chars except :
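A tolerant check for historical localparts could be sketched as follows. The function name is hypothetical. The sketch reads the extended set as the printable ASCII characters other than `:` (0x21 to 0x7E); that upper bound, and the requirement that the localpart be non-empty, are assumptions of this example.

```python
def is_historical_localpart(localpart: str) -> bool:
    """Return True if every character is printable ASCII other than ":".

    Assumptions: printable ASCII is taken as 0x21-0x7E, and an empty
    localpart is rejected.
    """
    return bool(localpart) and all(
        0x21 <= ord(ch) <= 0x7E and ch != ":" for ch in localpart
    )
```

An implementation would typically apply the strict grammar when creating new user IDs and fall back to a check like this one only when accepting IDs from existing events.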

Mapping from other character sets

In certain circumstances it will be desirable to map from a wider character set onto the limited character set allowed in a user ID localpart. Examples include a homeserver creating a user ID for a new user based on the username passed to /register, or a bridge mapping user IDs from another protocol.

Implementations are free to do this mapping however they choose. Since the user ID is opaque except to the implementation which created it, the only requirement is that the implementation can perform the mapping consistently. However, we suggest the following algorithm:

  1. Encode character strings as UTF-8.
  2. Convert the bytes A-Z to lower-case.
    • In the case where a bridge must be able to distinguish two different users with IDs which differ only by case, escape upper-case characters by prefixing with _ before downcasing. For example, A becomes _a. Escape a real _ with a second _.
  3. Encode any remaining bytes outside the allowed character set, as well as =, as their hexadecimal value, prefixed with =. For example, # becomes =23; á becomes =c3=a1.
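The suggested algorithm can be sketched in a few lines. The function name `map_to_localpart` and the `preserve_case` flag are hypothetical: the flag selects the optional bridge behaviour from step 2, in which upper-case letters and literal underscores are escaped with `_`.

```python
# Bytes that may appear unescaped in a localpart. "=" is deliberately left
# out because step 3 always escapes it.
_ALLOWED = frozenset(b"abcdefghijklmnopqrstuvwxyz0123456789._-/")


def map_to_localpart(name: str, preserve_case: bool = False) -> str:
    """Map an arbitrary string onto the user ID localpart character set."""
    out = []
    for byte in name.encode("utf-8"):          # step 1: UTF-8 bytes
        if 0x41 <= byte <= 0x5A:               # step 2: A-Z
            lower = chr(byte + 0x20)
            out.append("_" + lower if preserve_case else lower)
        elif byte == 0x5F and preserve_case:   # escape a real "_"
            out.append("__")
        elif byte in _ALLOWED:
            out.append(chr(byte))
        else:                                  # step 3: =xx hex escape
            out.append("=%02x" % byte)
    return "".join(out)
```

With this sketch, `map_to_localpart("#")` gives `=23` and `map_to_localpart("Alice", preserve_case=True)` gives `_alice`, matching the examples in the steps above.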

Rationale

The suggested mapping is an attempt to preserve human-readability of simple ASCII identifiers (unlike, for example, base-32), whilst still allowing representation of any character (unlike punycode, which provides no way to encode ASCII punctuation).

Room IDs and Event IDs

A room has exactly one room ID. A room ID has the format:

!opaque_id:domain

An event has exactly one event ID. An event ID has the format:

$opaque_id:domain

The domain of a room/event ID is the server name of the homeserver which created the room/event. The domain is used only for namespacing to avoid the risk of clashes of identifiers between different homeservers. There is no implication that the room or event in question is still available at the corresponding homeserver.

Event IDs and Room IDs are case-sensitive. They are not meant to be human readable.

Group Identifiers

Groups within Matrix are uniquely identified by their group ID. The group ID is namespaced to the group server which hosts this group and has the form:

+localpart:domain

The localpart of a group ID is an opaque identifier for that group. It MUST NOT be empty, and MUST contain only the characters a-z, 0-9, ., _, =, -, and /.

The domain of a group ID is the server name of the group server which hosts this group.

The length of a group ID, including the + sigil and the domain, MUST NOT exceed 255 characters.

The complete grammar for a legal group ID is:

group_id = "+" group_id_localpart ":" server_name
group_id_localpart = 1*group_id_char
group_id_char = DIGIT
              / %x61-7A                   ; a-z
              / "-" / "." / "=" / "_" / "/"

Room Aliases

A room may have zero or more aliases. A room alias has the format:

#room_alias:domain

The domain of a room alias is the server name of the homeserver which created the alias. Other servers may contact this homeserver to look up the alias.

Room aliases MUST NOT exceed 255 bytes (including the # sigil and the domain).
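Note that this limit is expressed in bytes rather than characters. The sketch below shows one way to check it; the function name is hypothetical, and measuring the length of the UTF-8 encoding is an assumption of this example.

```python
MAX_ALIAS_BYTES = 255  # includes the "#" sigil and the domain


def alias_within_limit(alias: str) -> bool:
    """Return True if `alias` starts with "#" and fits in 255 bytes.

    Assumption: the byte length is that of the UTF-8 encoding.
    """
    return alias.startswith("#") and len(alias.encode("utf-8")) <= MAX_ALIAS_BYTES
```

An alias with many non-ASCII characters can exceed the limit even when it is well under 255 characters long.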