preferredUsername and natural language support #395

Open
steve-bate opened this issue Oct 20, 2023 · 9 comments

Comments

@steve-bate

Properties containing natural language values, such as name, preferredUsername, or summary, make use of natural language support defined in ActivityStreams. -- Section 4.1 Actors.

There doesn't appear to be preferredUsername language map support in the JSON-LD context. Furthermore, the de facto (Mastodon) usage of preferredUsername would not interoperate with a language map, since it's used as an account or login name.

For at least those reasons, I think we should update the note to remove preferredUsername from the set of properties having natural language support.

@tesaguri

In a sense, preferredUsername sort of supports language-tagged strings, in that the term definition doesn't set "@language": null nor is the term value subject to type coercion (cf. Section 4.2.4 of JSON-LD spec). For example, the preferredUsername value in Example 9 of ActivityPub Recommendation has the language tag of ja because the context sets the default language to ja.

But yes, I don't believe the property is actually intended to have language-tagged values, and agree with you that the note may be an erratum.

Moreover, I wonder if the term (and perhaps hreflang, mediaType, rel and units in addition?) shouldn't have accepted language-tagged strings in the first place, but I digress.

@steve-bate
Author

In a sense, preferredUsername sort of supports language-tagged strings

Yes, I was referring to language maps. The name and summary properties have language-mapped context terms:

        "name": "as:name",
        "nameMap": {
            "@id": "as:name",
            "@container": "@language"
        },
        "summary": "as:summary",
        "summaryMap": {
            "@id": "as:summary",
            "@container": "@language"
        },

The preferredUsername property does not, which seems correct for the context. I think it's just the recommendation text that should be modified.
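As a rough illustration of the difference described above, here is a minimal Python sketch of how a plain-JSON consumer might read a property that has a `-Map` counterpart. The helper name `natural_language_value` and the sample actor are hypothetical, not taken from any specification or implementation.

```python
def natural_language_value(obj, prop, preferred_lang):
    """Return the `<prop>Map` entry for preferred_lang, else the plain `prop` value."""
    lang_map = obj.get(prop + "Map")
    if isinstance(lang_map, dict) and preferred_lang in lang_map:
        return lang_map[preferred_lang]
    return obj.get(prop)


# Hypothetical actor: `name` has a language map, `preferredUsername` does not.
sample_actor = {
    "type": "Person",
    "name": "Alyssa",
    "nameMap": {"en": "Alyssa", "ja": "アリッサ"},
    "preferredUsername": "alyssa",
}

ja_name = natural_language_value(sample_actor, "name", "ja")
ja_username = natural_language_value(sample_actor, "preferredUsername", "ja")
print(ja_name, ja_username)
```

Because the context defines no `preferredUsernameMap` term, the second lookup can only ever fall back to the plain string.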

@evanp
Collaborator

evanp commented Oct 25, 2023

So, the problem we have here is that the ActivityPub specification says to use the preferredUsernameMap property for internationalized support of the preferredUsername. However, there is no such property defined in AS2.

We have two remedies for this:

  • Add a preferredUsernameMap to the AS2 context.
  • Publish an erratum for AP saying not to use the -Map version for username specifically.

Which of these we choose depends on how important i18n of usernames is for ActivityPub.

In discussion on the issue triage call, we felt that having an exact match for preferredUsername was important for AP processors, especially for e.g. determining a Webfinger address for an AP actor.
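To make the exact-match concern concrete, the following Python sketch derives a WebFinger `acct:` address from an actor. It assumes the convention used by Mastodon-style implementations (username from `preferredUsername`, host from the actor `id`), which ActivityPub itself does not define. The function name `acct_uri` and the example URL are made up, and the sketch skips any percent-encoding that non-ASCII usernames would need.

```python
from urllib.parse import urlsplit


def acct_uri(actor):
    """Build acct:<preferredUsername>@<host of id> (assumed convention, not normative)."""
    host = urlsplit(actor["id"]).hostname
    return f"acct:{actor['preferredUsername']}@{host}"


# Hypothetical actor document.
example_actor = {
    "id": "https://social.example/users/alyssa",
    "preferredUsername": "alyssa",
}
print(acct_uri(example_actor))
```

A derivation like this needs exactly one username string to work with, which is why a set of per-language usernames would complicate it.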

If supporting multiple usernames in the future is necessary, for i18n or any other reasons, this would be a good topic for a vocabulary extension.

For this reason, I proposed an erratum to the AP document that specifies not to use the -Map property that doesn't exist.

https://www.w3.org/wiki/ActivityPub_errata/Proposed

The note at the end of Section 4.1 should read, "Properties containing natural language values, such as name or summary, make use of natural language support defined in ActivityStreams." There is no natural language value for preferredUsername defined in Activity Streams 2.0.

@tesaguri

In #395 (comment), I wrote:

I don't believe the property is actually intended to have language-tagged values

because of its real-world usage as a WebFinger identifier. However, this is not quite obvious on reflection: neither preferredUsername nor the JRD subject of WebFinger is limited to ASCII strings, and the acct: URI scheme explicitly allows non-ASCII user components as part of its internationalization considerations, which implies that they are intended to be capable of representing internationalized identifiers.

Today, it seems that the majority (if not all) of usernames in the Fediverse are ASCII-only, but as a real-world example from a traditional (centralized) social media platform, non-ASCII identifiers are commonly used on Weibo to mention accounts. For example, you can write something like "ActivityPub is a decentralized social networking protocol standardized by W3C (@w3c中国)". In this example, the post is in English whereas the mention (@w3c中国) is in Simplified Chinese (well, at least partly), which should ideally be language-tagged separately.

But it's true that having multiple representations of preferredUsername might be troublesome given its real-world role as a WebFinger identifier [1]. What we would need for internationalized usernames is the ability to language-tag a single username rather than to have multiple usernames in different languages. So, if a preferredUsernameMap term were to be added, perhaps it should be made RECOMMENDED that the preferredUsernameMap value have only a single entry and that its value match the preferredUsername value (well, it seems that the name, summary and content properties are mainly used in the same way in practice).
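The single-entry recommendation floated above could be checked along these lines. This is only a sketch: `preferredUsernameMap` is not defined in the AS2 context, and the function name and the `zh-Hans` tag are illustrative assumptions.

```python
def username_map_is_consistent(actor):
    """True if a (hypothetical) preferredUsernameMap is absent, or has exactly one
    entry whose value equals preferredUsername."""
    lang_map = actor.get("preferredUsernameMap")
    if lang_map is None:
        return True
    values = list(lang_map.values())
    return len(values) == 1 and values[0] == actor.get("preferredUsername")


# One username, language-tagged once: consistent.
tagged_once = {
    "preferredUsername": "w3c中国",
    "preferredUsernameMap": {"zh-Hans": "w3c中国"},
}
# Two different usernames in two languages: not consistent.
two_names = {
    "preferredUsername": "w3c",
    "preferredUsernameMap": {"en": "w3c", "zh-Hans": "w3c中国"},
}
print(username_map_is_consistent(tagged_once), username_map_is_consistent(two_names))
```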

(For context, a new implementation, Kitsune, is considering non-ASCII username support:
https://corteximplant.com/@0x0/111325717243948397.)


However, there is no such property defined in AS2.

This is a nitpick, but the preferredUsername property isn't defined in Activity Streams/Activity Vocabulary Recommendations either. The property seems to be one of the extension properties introduced by ActivityPub, like source (although the spec doesn't clearly state so).

This implies that it's not that the AS2 specs defined the preferredUsername property but intentionally omitted a preferredUsernameMap term, which makes me wonder if the omission of the term in the context is in fact just an oversight by the AP spec, in which case adding a preferredUsernameMap term would sound slightly more sensible.

Footnotes

  1. Perhaps having multiple representations and making all of them resolve to the same actor via WebFinger would work, but there would still be a question as to which representation should be considered the canonical username, for example.

@steve-bate
Author

I think an update to the AP Note text (or a clarification in the Errata) is sufficient. I don't think a JSON-LD preferredUsername language map is needed or desirable. If the preferredUsername is a string, then JSON-LD 1.1 requires it to be UTF-8. If a server implementation constrains it to be ASCII, that's an implementation decision outside the scope of ActivityPub. It's not clear to me that a language tag (versus a map) for preferredUsername is useful either, but if there is a use case for it, it can be specified in the document @context (like you mentioned earlier, and at either the document level or the preferredUsername term level).

@tesaguri

tesaguri commented Nov 1, 2023

It's not clear to me that a language tag (versus a map) for preferredUsername is useful either,

There are countless materials on it out there, but if I have to pick one, see the W3C article Why use the language attribute? for details (sorry if you didn't mean that in this way). Most importantly, it's crucial for assistive technologies, like screen readers for selecting the language to speak the text as, or braille displays for selecting the braille system to transliterate the text to.

If the preferredUsername is a string, then JSON-LD 1.1 requires it to be UTF-8. If a server implementation constrains it to be ASCII, that's an implementation decision outside the scope of ActivityPub.

So you agree with me that preferredUsername isn't meant to be ASCII-only, don't you? If it were ASCII-only, you might well consider its contents not meant to be in a natural language, but being arbitrary Unicode text on the other hand implies the opposite, which is what I meant to argue.

but if there is a use case for it, it can be specified in the document @context (like you mentioned earlier, and at either the document level or the preferredUsername term level).

Yes, that would work semantically, but without a language map term, there would be multiple ways to express the same semantics in the compacted form. Unfortunately, few, if any, consumers process Activity Streams documents as JSON-LD, and the intention of the AS specs, IIUC, is that they don't even need to do so. Otherwise, you wouldn't need language map terms like nameMap in the first place (you could just write "name": [{"@value": "…", "@language": "…"}, …] after all).
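The equivalence between the two spellings mentioned in the parenthetical can be sketched in a few lines of Python. The helper names and sample values are hypothetical. The point is only that a language map and an array of value objects carry the same information, so a plain-JSON consumer would have to handle both forms if both were allowed.

```python
def language_map_to_value_objects(lang_map):
    """Rewrite {"en": "Hello"} as [{"@value": "Hello", "@language": "en"}]."""
    return [{"@value": text, "@language": lang} for lang, text in lang_map.items()]


def value_objects_to_language_map(values):
    """Inverse of the function above, assuming one value per language."""
    return {v["@language"]: v["@value"] for v in values}


name_map = {"en": "Alyssa", "ja": "アリッサ"}
value_objects = language_map_to_value_objects(name_map)
print(value_objects)
```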

(Well, in #395 (comment), I was talking in terms of theories (which I believe a discussion on formal specs like this needs to consider to some extent) rather than practicalities, but it may well have sounded like I was advocating for interpreting real-world documents that way, and I'm sorry about that. Now I'm talking in terms of practicality.)

Let me put it another way: my personal rule of thumb is that a JSON-LD @context is a stash of things that plain-JSON consumers aren't meant to process (cf. Simplicity of Design Goals and Rationale section of JSON-LD Recommendation). Putting @language in the @context and preserving the syntax of (non-map) preferredUsername means that you don't consider the language tag to be something that ordinary consumers want to understand, but I don't believe so since accessibility is more than nice to have.

@nightpool
Collaborator

nightpool commented Nov 17, 2023

Putting @language in the @context and preserving the syntax of (non-map) preferredUsername means that you don't consider the language tag to be something that ordinary consumers want to understand

I don't think this follows. Putting a @language into the root @context is a perfectly natural and easy way to specify the language of a document for pure JSON consumers, and in some ways it's way easier to just say "store the language for an actor based on the @language property" than it is for a JSON-LD consumer that has to keep track of where in the document the context is applied and how that overrides different subsequent properties, etc. I agree that more exotic constructions (for example, specifying the @context for nested actor documents embedded in e.g. a list of posts, or the {@value:, @language:} array you mention) would be awful for pure JSON consumers, but I don't think that applies to determining the root language for an actor, which is a reasonable request to make of all processors (JSON or JSON-LD) and easily doable by just taking a quick peek in the "metadata".

The only question that remains in my mind is whether specifying the language on the root actor level is enough for most users in most cases, or whether it's necessary to require . To me, requiring all JSON processors to look for preferredUsername || Object.values(preferredUsernameMap)[0] is a much more annoying burden than asking them to keep track of a @language property for each actor.
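For comparison, here is a Python sketch of the two consumer-side approaches being weighed. Both helpers are hypothetical: `root_language` only looks at the top-level `@context` and ignores term-level or nested contexts, and `preferredUsernameMap` is not a term in the AS2 context.

```python
def root_language(doc):
    """Return the last "@language" declared in the root @context, if any."""
    entries = doc.get("@context", [])
    if not isinstance(entries, list):
        entries = [entries]
    lang = None
    for entry in entries:
        if isinstance(entry, dict) and "@language" in entry:
            lang = entry["@language"]
    return lang


def username_with_map_fallback(doc):
    """Python rendering of: preferredUsername || Object.values(preferredUsernameMap)[0]"""
    username = doc.get("preferredUsername")
    if username:
        return username
    values = list(doc.get("preferredUsernameMap", {}).values())
    return values[0] if values else None


# Hypothetical actor that declares its language once, at the root.
actor_doc = {
    "@context": ["https://www.w3.org/ns/activitystreams", {"@language": "ja"}],
    "type": "Person",
    "preferredUsername": "kenzoishii",
}
print(root_language(actor_doc), username_with_map_fallback(actor_doc))
```

With the first approach the username stays a single plain string and the language is read once per actor. With the second, every consumer has to carry the fallback logic.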

In either situation, I think we need to make it clear that preferredUsername should be treated as a functional property, and should only ever have one value, regardless of what language that value may be. Otherwise I don't think there's any way it can meaningfully serve as a "preferred" username.

@trwnh

trwnh commented Dec 27, 2023

In either situation, I think we need to make it clear that preferredUsername should be treated as a functional property, and should only ever have one value, regardless of what language that value may be. Otherwise I don't think there's any way it can meaningfully serve as a "preferred" username.

For current usage (i.e. mapping to Webfinger acct), maybe... but for the given definition (A short username which may be used to refer to the actor, with no uniqueness guarantees.) this isn't strongly implied. But that's a more general problem. There should be a more unambiguous way of deriving a Webfinger acct URI that does give uniqueness guarantees, and there should be a way to specify one (or more!) actual "preferred username(s)" that get used in the order that they are listed (i.e. defined as @container: @list?) This is probably a lot to untangle though and would probably be best explored in a FEP or some other avenue of discussion...

@TallTed
Member

TallTed commented Dec 27, 2023

@nightpool — Please edit your #395 (comment) and codefence every instance of any @word (single backticks will do, a la `@context`), such that every comment made here does not ping those GitHub users who did not choose to participate in this discussion.
