Please remove gender/sex datatype in the spec and never allow demographic data points #1288
Comments
(This is in reference to the "sex" autocomplete tag.) Yeah, this is reasonable. Sex and gender are almost never actually needed in most forms; nearly all collection is over-collection for no good reason. When it is needed, it's typically for government purposes, where the categories are actually set ahead of time, and the spec's suggestion of using So I support removing the "sex" autocomplete label from the spec. It implicitly encourages a bad practice, and doesn't match common usage when it actually is required. (There's nothing wrong with honorific prefix; while some honorific prefixes are gendered, many aren't, and it's a fairly standard part of "how would you like us to refer to you?" data collection.) |
Sorry, further references that @sideshowbarker found:
|
Both the sex autocomplete field name and the gender drop down are examples that are actively going out of their way to show that sex is not a binary proposition. The autocomplete field name explicitly states that it is a free-form field with no predefined values. It gives three examples, so obviously isn't binary. The examples don't even have a Western bias. The definition explicitly refers to this as a "gender identity" field. Similarly, the datalist example is explicitly an example of a field that doesn't have only two values. It's a free-form field; it just happens to have two values that can be selected easily. Realistically, those two values are the most common by orders of magnitude, so it makes sense they'd be the ones that are available in this way. Realistically, lots of Web sites include a sex field. Given this, I think it's reasonable for the spec to show best practices for having one. We can certainly improve the examples if you think they leave something to be desired. Removing it is a net negative, though, since it would leave people thinking that, e.g., a |
Regardless that it is our moral responsibility to protect users from the gathering of personal information and the most obvious way to do so is to suggest that this data is not captured... "Realistically, lots of Web sites include a sex field" Sure, codified content can be auto-mapped to UI features (like web browsers that can show a credit card icon next to a card payment field because it has a globally known type) and autocomplete can work too, but that only requires a web spec for referencing standards that are managed by appropriate bodies. So you could have:
Largely, I don't think HTML5 should be codifying any of the elements in this section. It is restrictive (there are so many things you can code that aren't included), it's English (data fields should be in the language of the business/user where possible) and it has things wrong (telephone numbers in databases often require free-form text ("ask for bob"), transaction amounts cannot be floats in many currencies, and Wales isn't included in Alpha-2 country codes (GB-WLS)). It's also too verbose: as a standard that you expect non-English people to learn, it should limit how much it includes, and this seems like a step too far. |
This implies that the gathering of personal information is intrinsically immoral, which is a rather strong position to take and not one that I think is particularly obvious or even necessarily right. There are many moral reasons why collecting personal information might be a great good. For example, collecting personal information is a key part of a post-crisis emergency response, as a prerequisite to reuniting families separated during the crisis.
There's a whole section on writing a form to collect food preferences: https://html.spec.whatwg.org/multipage/forms.html#introduction-4 Sexual preferences aren't an area that sites that collect that information typically have difficulty doing correctly, as far as I'm aware. I'm not sure I would say that "lots of websites" collect "health problems", but in any case that is an area for which there is ample legislation in many jurisdictions so the spec doesn't have to worry about showing best practices.
The point isn't to show that it can be autocompleted, but to show that it would be wrong to use a
I'm not sure which section you mean, this bug has covered a number of disparate sections.
I'm not sure what you mean. What do you think should be included but isn't?
There are examples in English, but the spec itself is language-neutral. Going back to sex identity in particular, one of the main examples explicitly calls out a case that is non-Western in origin.
That's not a telephone number field, it's a notes field. Telephone number fields typically have to be structured because they are processed by computer (e.g. call centers automatically calling the number then connecting the line to the operator when it connects to the consumer).
The term "valid floating-point number" in the spec doesn't mean IEEE float. It's more equivalent to a fixed point infinite-precision numeric type.
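To illustrate the distinction, here is a minimal sketch in Python. It assumes the spec's "valid floating-point number" is a decimal string syntax (optional minus sign, digits and/or a fractional part, optional exponent); the regular expression is a paraphrase of that grammar rather than spec text, and the helper names `is_valid_floating_point_number` and `parse_amount` are hypothetical. Parsing the string into `decimal.Decimal` instead of a binary float keeps amounts such as 0.10 exact.

```python
import re
from decimal import Decimal

# Paraphrase (an assumption, not spec text) of the string grammar:
# optional "-", then digits and/or "." followed by digits,
# then an optional exponent such as "e-3".
_VALID_FLOAT = re.compile(
    r"-?(?:[0-9]+(?:\.[0-9]+)?|\.[0-9]+)(?:[eE][+-]?[0-9]+)?"
)


def is_valid_floating_point_number(value: str) -> bool:
    """Return True if the whole string matches the decimal syntax above."""
    return _VALID_FLOAT.fullmatch(value) is not None


def parse_amount(value: str) -> Decimal:
    """Parse a submitted value into an exact decimal, not an IEEE float."""
    if not is_valid_floating_point_number(value):
        raise ValueError(f"not a valid floating-point number: {value!r}")
    return Decimal(value)


if __name__ == "__main__":
    total = parse_amount("0.10") + parse_amount("0.20")
    print(total == Decimal("0.30"))  # True: decimal arithmetic is exact here
    print(0.1 + 0.2 == 0.3)          # False: binary floats round
```

Whether a server keeps the value as a decimal, an integer count of minor units, or something else is the site's choice; the point is only that the submitted string does not force a binary float.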
Not sure what you mean here. The string "GB-WLS" doesn't appear in the spec. Most of the items in your most recent comment seem outside of the scope of this issue. Indeed, even the scope of this issue covers multiple items. Please file only one issue per topic. It's totally fine to file multiple issues. This bug requests two things:
The second is impossible. The HTML spec does not allow or disallow anything. If you want to prevent demographic data collection, you should take that up with governments, which is how societies control what is allowed or disallowed. The first would be a net harm. Since people are going to collect sex identity information, we should show best practices for doing so. |
Yes, without a reasonable justification (you want to communicate with your users, sure ask for contact details) it is intrinsically immoral for a website to capture users' demographic information. Who needs to know your sex or gender?
So how many times should you enter this data, maybe 5-10 times a year if you are confident of it being public and otherwise I'd guess < 5 times a year? I think I put in my pizza preferences more often than I enter my gender or sex details, but often I'm being asked for it by organisations for shopping, signing up to publications and wonder why? Likely because they're using it to profile (and ultimately discriminate?) based on gender or sex. They think it is appropriate to take this data... do most shops in the street need to know it? Does your dry cleaner need to know it? Does your gardener, mail-man, etc need to know? Why do websites need it as part of the specification for every website? The use cases for it are in the minority and the organisations that do need to capture it (health, government, sociological research, etc) hopefully have well documented internal standards and are not copying...
It might also be worth reading up on data protection laws; the EU has some pretty well documented ones, which are becoming clearer on this matter: http://www.lgbt-ep.eu/press-releases/new-eu-data-protection-law-meps-want-to-protect-lgbt-peoples-privacy/. General advice: don't capture it.
Exactly, there is no html autocomplete=pizza-topping so why is there an autocomplete=sex? I hope people are entering their favourite pizza more often on websites than they are entering their gender or sex.
There is a requirement for countries to be defined as 2-letter country codes, except 2 letters don't cover all countries; some, like Wales, use longer codes. My point is that HTML should focus on what it's meant to be good at (website function and rendering), not classification of data, which it has gotten wrong either through the restrictive autocomplete and type suggestions or through the examples in the spec.
If a website developer wants to create <input type=text name=sex ...> then let them; I don't have a problem with their ability to do it... but it is no business of the html spec to be suggesting it and getting it very wrong, which is harmful (https://html.spec.whatwg.org/multipage/forms.html#the-datalist-element). The html spec doesn't put in examples asking people what race they are, so it shouldn't put in examples and autocomplete suggestions for the gender or sex people are associated with. In the context of this suggestion (html not getting involved in data classification) it definitely doesn't cause harm: developers do what is appropriate for their use case and users adapt to each field as is necessary for their use case. If it affects autocomplete, well, focus on fields that people really do need every day (okay, people shop regularly and get tired of entering their name, address, ..., but demographics like sex or gender are not requirements in most sites for buying things online). I'm not asking the spec to ban demographics being possible in a web page, but to ban demographics being referenced in the spec as example, type or autocomplete use cases. This is becoming a heavily policed area for developers and they should be avoiding it until they understand the data protection laws in their area. Providing references to where developers should seek guidance on this and providing an html standard way for appropriate bodies to identify types might be best. Summarised, my points are
|
This is a controversial opinion. I think you will find many people disagree.
The more interesting question, IMHO, is "how common is it for Web sites to ask for this information", and "how common is it for Web sites to ask for it in a bad way". For sex identity, it is asked relatively frequently, and it is asked poorly (as a binary or enumerated list) very commonly.
I agree that we should hope for this. Unfortunately it is remarkably common for people to get this wrong.
The short answer is because browser vendors asked for the latter but not the former, and we attempt to specify what browser vendors are going to implement.
Please file this as a separate issue.
The HTML spec focuses on what HTML implementations need to implement.
I've explained why it is the business of the spec, and why it's not wrong. If you wish to counter those specific arguments, that is the most productive way of following-up on this bug.
That's a non-sequitur.
I've explained why it is important to show this particular case. The most productive way to follow-up on this bug would be to explain why my arguments are incorrect.
I disagree, for the reasons stated in earlier comments. If you disagree with my take on this please address my arguments directly.
The HTML spec already attempts to address this very point.
I'm not sure what this means.
Please file separate issues for feature requests. |
Today I ran across an example of a form where "sex" was an appropriate and useful field, and where non-binary gender was included. I recalled this discussion, and thought I would offer it here: link. As this is a form produced by the government of India, presumably an autocomplete containing only "male" and "female" would be incorrect for all such forms produced by that entity. |
Right, as @Hixie states, the HTML Standard already addresses this point, and discourages fields with a binary-only choice. (Note that autocomplete filling out common values does not restrict what is actually typed.) Given the lack of responses to @Hixie's earlier points, and the lack of an actionable change request here that would not simply make things worse by removing advice, it's probably best to close this. We're always happy to continue discussing in the closed thread, however. |
Given the content of the source file it seems evident that the spec doesn't fully understand gender or sex.
Why? The examples suggest that male/female is an appropriate datalist for sex and that perhaps sex and gender are interchangeable. This data harms people and results in discrimination, which is worse than having difficulty filling out the form.
We know that the majority of websites are insecure (seriously insecure, and not necessarily because of hackers, but because of designs that leak lots of data publicly, don't manage access controls well or don't protect data properly internally) and that even when not insecure illegally, they may be insecure legally to policing bodies.
So instead of offering average web devs a standard that suggests "go grab demographic data and play with it", point to guidance on best practice for avoiding the use of demographic data, on ethical, moral, legal and security grounds. Then suggest, if they do need to know more, where they can find guidance and what to do and not to do.
For the vast majority of websites (shopping, news, media, public information, social) the demographic data points are not there to be analysed. In fact, in many countries it may well be illegal for most to try to use the data for adults, never mind children, because the only business reason to have it would be discriminatory. Where businesses do need to capture it, they often need to protect it very carefully, so standardising it to make it easy for as many systems as possible to read it is probably not the first priority.
The spec doesn't standardise religion, and I couldn't see it standardising ethnicity or national identity; gender and sex deserve the same protection.
If websites don't ask, then it can be safer for people:
In the web world, people shouldn't have to worry about the physical discriminations they might incur in the real world, so please remove the suggestion that sex and gender are suitable in a web standard for any website and it might be worth removing honorific-prefix too.