Please remove gender/sex datatype in the spec and never allow demographic data points #1288

markalanrichards · 2016-05-20T01:46:08Z

Given the content of the source file it seems evident that the spec doesn't fully understand gender or sex.

Why? The examples suggest that male/female is an appropriate datalist for sex and that sex and gender are perhaps interchangeable. This data harms people and results in discrimination that is worse than having difficulty filling out the form.

We know that the majority of websites are insecure (seriously, insecure and not necessarily by hackers, but by designs that leak lots of data publicly, don't manage access controls well or don't protect it properly internally) and that even when not insecure illegally, they may be insecure legally to policing bodies.

So instead of offering average web devs a standard that suggests "go grab demographic data and play with it", point to guidance on best practice for avoiding demographic data, on ethical, moral, legal and security grounds. Then suggest where they can find guidance on what to do and not to do if they do need to know more.

For the vast majority of websites (shopping, news, media, public information, social) the demographic data points are not there to be analysed. In fact, in many countries it may well be illegal for most to try to use the data for adults, never mind children, because the only business reason to have it would be discriminatory. Where businesses do need to capture it, they often need to protect it very carefully, so standardising it to make it easy for as many systems as possible to read it is probably not the first priority.

The spec doesn't standardise religion, and I couldn't see it standardising ethnicity or national identity. Gender and sex deserve the same protection.

If websites don't ask, then it can be safer for people:

Then when Fredrick says he is in love with Sam, nobody knows whether Fredrick is breaking laws in certain countries.
When a woman shops online doing activities only men can do in her country, she may not have to worry.
When someone doesn't fit the binary categories often used for gender or sex, they don't have to face the dilemma of making themselves fit something they are not, or suffer the marketing and customer service mistakes that follow.
And the list goes on (for those with better experience, feel free to raise more areas where not knowing the sex or gender improves the web experience)

In the web world, people shouldn't have to worry about the physical discriminations they might incur in the real world, so please remove the suggestion that sex and gender are suitable in a web standard for any website and it might be worth removing honorific-prefix too.

tabatkins · 2016-06-17T20:34:58Z

(This is in reference to the "sex" autocomplete tag.)

Yeah, this is reasonable. Sex and gender is almost never actually needed in any way in most forms; nearly all collection is over-collection for no good reason. When it is needed, it's typically for government purposes, where the categories are actually set ahead of time, and the spec's suggestion of using type=text is actually inappropriate (<select> is better suited in those cases, or radio buttons).

So I support removing the "sex" autocomplete label from the spec. It implicitly encourages a bad practice, and doesn't match common usage when it actually is required.

(There's nothing wrong with honorific prefix; while some honorific prefixes are gendered, many aren't, and it's a fairly standard part of "how would you like us to refer to you?" data collection.)

tabatkins · 2016-06-17T20:39:51Z

Sorry, further references that @sideshowbarker found:

datalist example showing "male" and "female" as the two options for a "gender" dropdown - this is prime for removal. Plenty of good examples for datalists that don't reinforce the gender binary and erase many of our friends.
the "sex" field in the VCard format - provides options for "male", "female", "n/a", "other", and "unknown". This is documenting a format defined elsewhere, so I'm less sure of the appropriateness of changing this, but it might be reasonable to have a note informally deprecating it in favor of either using the "gender-identity" field, or leaving off such information entirely.

Hixie · 2016-06-18T02:27:17Z

Both the sex autocomplete field name and the gender drop down are examples that are actively going out of their way to show that sex is not a binary proposition.

The autocomplete field name explicitly states that it is a free-form field with no predefined values. It gives three examples, so obviously isn't binary. The examples don't even have a Western bias. The definition explicitly refers to this as a "gender identity" field.
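For illustration, a minimal sketch of the kind of free-form field being described. The `autocomplete="sex"` token is the field name under discussion; the `id`, `name`, and label text are made up for this example and are not taken from the spec.

```html
<!-- Free-form text input: the user may type any value. -->
<!-- autocomplete="sex" only tells the browser which stored value it may offer. -->
<label for="gender-identity">Gender identity (optional)</label>
<input type="text" id="gender-identity" name="gender-identity" autocomplete="sex">
```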

Similarly, the datalist example is explicitly an example of a field that doesn't have only two values. It's a free-form field, it just happens to have two values that can be selected easily. Realistically, those two values are the most common by orders of magnitude, so it makes sense they'd be the ones that are available in this way.
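To make the pattern concrete, here is a minimal sketch of a free-form text field with a `datalist` offering suggestions. It is an illustration, not a quotation of the spec's example; the `id` values and label text are assumptions.

```html
<!-- The input accepts any typed text; the datalist only offers suggestions. -->
<label for="sex">Sex</label>
<input type="text" id="sex" name="sex" list="sex-suggestions">
<datalist id="sex-suggestions">
  <option value="Female">
  <option value="Male">
</datalist>
```

Unlike a `<select>`, nothing here prevents the user from entering a value that is not in the list.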

Realistically, lots of Web sites include a sex field. Given this, I think it's reasonable for the spec to show best practices for having one. We can certainly improve the examples if you think they leave something to be desired. Removing it is a net negative, though, since it would leave people thinking that, e.g., a <select> would be appropriate.

markalanrichards · 2016-06-19T21:25:42Z

Regardless that it is our moral responsibility to protect users from the gathering of personal information and the most obvious way to do so is to suggest that this data is not captured...

Realistically, lots of Web sites include a sex field

Lots of websites include fields for food preferences, sexual preferences and health problems, and HTML5 doesn't codify them: you leave it up to the website designer to choose things like <input type="text" name="allergies">, which works. Something that is entered so infrequently by a user (like gender/sex) doesn't need to be autocompleted.

Sure, codified content can be auto-mapped to UI features (like web browsers that can show a credit card icon next to a card payment field because it has a globally known type) and autocomplete can work too, but that only requires a web spec for referencing standards that are managed by appropriate bodies.

So you could have:

<standard name="vcard" body="IMC">
  <field name="EMAIL" >
    <inputoption tag="input" type="hidden">
    ...
  </field>
  ...
</standard>

Largely, I don't think HTML5 should be codifying any of the elements in this section as it is restrictive (there's so many things you can code that aren't included), it's English (data fields should be in the language of the business/user where possible) and it has things wrong (telephone numbers in databases often require free form text ("ask for bob"), transaction amounts cannot be floats in many currencies and Wales isn't included in Alpha-2 country codes (GB-WLS)) and it's too verbose. As a standard that you expect non-English people to learn, it should limit how much it includes, and this seems like a step too far.

Hixie · 2016-06-20T20:09:00Z

Regardless that it is our moral responsibility to protect users from the gathering of personal information

This implies that the gathering of personal information is intrinsically immoral, which is a rather strong position to take and not one that I think is particularly obvious or even necessarily right. There's lots of moral reasons why collecting personal information might be a great good. For example, collecting personal information is a key part of a post-crisis emergency response, as a prerequisite to reuniting families separated during the crisis.

Lots of websites include fields for food preferences, sexual preferences, health problem and HTML5 doesn't code it

There's a whole section on writing a form to collect food preferences: https://html.spec.whatwg.org/multipage/forms.html#introduction-4

Sexual preferences aren't an area that sites that collect that information typically have difficulty doing correctly, as far as I'm aware.

I'm not sure I would say that "lots of websites" collect "health problems", but in any case that is an area for which there is ample legislation in many jurisdictions so the spec doesn't have to worry about showing best practices.

you leave it up to the website designer to choose things like <input type="text" name="allergies"> which works and something that should be input so infrequently entered by a user (like gender/sex) doesn't need to be autocompleted.

The point isn't to show that it can be autocompleted, but to show that it would be wrong to use a <select> widget that only allows two options.

I don't think HTML5 should be codifying any of the elements in this section

I'm not sure which section you mean, this bug has covered a number of disparate sections.

as it is restrictive (there's so many things you can code that aren't included)

I'm not sure what you mean. What do you think should be included but isn't?

it's English (data fields should be in the language of the business/user where possible)

There are examples in English, but the spec itself is language-neutral. Going back to sex identity in particular, one of the main examples explicitly calls out a case that is non-Western in origin.

and it has things wrong (telephone numbers in databases often require free form text ("ask for bob")

That's not a telephone number field, it's a notes field. Telephone number fields typically have to be structured because they are processed by computer (e.g. call centers automatically calling the number then connecting the line to the operator when it connects to the consumer).

transaction amounts cannot be floats in many currencies

The term "valid floating-point number" in the spec doesn't mean IEEE float. It's more equivalent to a fixed point infinite-precision numeric type.
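As a rough sketch of what this means for a transaction amount, assuming a currency with two decimal places (the `id`, `name`, and limits below are hypothetical): the control's value is a decimal string such as "19.99", and how the server parses that string, for instance into a decimal type rather than an IEEE double, is up to the application.

```html
<!-- The control's value is the decimal string "19.99", not a binary float. -->
<!-- step="0.01" constrains input to two decimal places; min="0" rejects negatives. -->
<label for="amount">Amount</label>
<input type="number" id="amount" name="amount" min="0" step="0.01" value="19.99">
```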

and Wales isn't included in Alpha-2 country codes (GB-WLS)) and it's too verbose

Not sure what you mean here. The string "GB-WLS" doesn't appear in the spec.

Most of the items in your most recent comment seem outside of the scope of this issue. Indeed, even the scope of this issue covers multiple items. Please file only one issue per topic. It's totally fine to file multiple issues.

This bug requests two things:

remove gender/sex datatype
never allow demographic data points

The second is impossible. The HTML spec does not allow or disallow anything. If you want to prevent demographic data collection, you should take that up with governments, which is how societies control what is allowed or disallowed.

The first would be a net harm. Since people are going to collect sex identity information, we should show best practices for doing so.

markalanrichards · 2016-06-20T21:37:54Z

This implies that the gathering of personal information is intrinsically immoral

Yes, without a reasonable justification (you want to communicate with your users, sure ask for contact details) it is intrinsically immoral for a website to capture users' demographic information. Who needs to know your sex or gender?

health organisations and governments
you might volunteer it on signing up to a social network or for a research project.

So how many times should you enter this data, maybe 5-10 times a year if you are confident of it being public and otherwise I'd guess < 5 times a year? I think I put in my pizza preferences more often than I enter my gender or sex details, but often I'm being asked for it by organisations for shopping, signing up to publications and wonder why? Likely because they're using it to profile (and ultimately discriminate?) based on gender or sex. They think it is appropriate to take this data... do most shops in the street need to know it? Does your dry cleaner need to know it? Does your gardener, mail-man, etc need to know? Why do websites need it as part of the specification for every website? The use cases for it are in the minority and the organisations that do need to capture it (health, government, sociological research, etc) hopefully have well documented internal standards and are not copying...

<select name=sex>
  <option value="">
  <option>Female
  <option>Male
</select>

It might also be worth reading up on data protection laws. The EU has some pretty well documented ones, which are becoming clearer on this matter: http://www.lgbt-ep.eu/press-releases/new-eu-data-protection-law-meps-want-to-protect-lgbt-peoples-privacy/. General advice: don't capture it.

There's a whole section on writing a form to collect food preferences: https://html.spec.whatwg.org/multipage/forms.html#introduction-4

Exactly, there is no html autocomplete=pizza-topping so why is there an autocomplete=sex? I hope people are entering their favourite pizza more often on websites than they are entering their gender or sex.

Not sure what you mean here. The string "GB-WLS" doesn't appear in the spec.

There is a requirement for countries to be defined as 2-letter country codes, except 2 letters don't cover all countries; some use longer codes, like Wales. My point is that HTML should focus on what it's meant to be good at (website function and rendering), not classification of data, which it has gotten wrong either through the restrictive autocomplete and type suggestions or through examples in the spec.

This bug requests two things:

remove gender/sex datatype

never allow demographic data points

The second is impossible. The HTML spec does not allow or disallow anything. If you want to prevent demographic data collection, you should take that up with governments, which is how societies control what is allowed or disallowed.

The first would be a net harm. Since people are going to collect sex identity information, we should show best practices for doing so.

If a website developer wants to create <input type=text name=sex ...> then let them, I don't have a problem with their ability to do it... but it is no business of the html spec to be suggesting it and getting it very wrong, which is harmful (https://html.spec.whatwg.org/multipage/forms.html#the-datalist-element). html spec doesn't put in examples asking people what race they are, so it shouldn't put in examples and autocomplete suggestions for the gender or sex people are associated with.

In the context of this suggestion (html not getting involved in data classification) it definitely doesn't cause harm: developers do what is appropriate for their use case, and users adapt to each field as is necessary for their use case. If it affects autocomplete, well, focus on fields that people really do need every day (okay, people shop regularly and get tired entering their name, address, ..., but demographics like sex or gender are not requirements in most sites for buying things online).

I'm not asking the spec to ban demographics being possible in a web page, but a ban for demographics to be referenced in the spec as example, type or autocomplete use cases. This is becoming a heavily policed area for developers and they should be avoiding it until they understand the data protection laws in their area. Providing references to where developers should seek guidance on this and providing an html standard way for appropriate bodies to identify types might be best.

Summarised, my points are:

html got sex and gender wrong
sex and gender are too complicated to classify (it can be an option, a textfield, etc and likely will have different validation rules that mean autocomplete is harmful and potentially broken)
html is restricting data classifications (let others define more, better)
there are better organisations for classifying data, why not offer them html features so they can do the job

Hixie · 2016-06-21T17:51:02Z

without a reasonable justification (you want to communicate with your users, sure ask for contact details) it is intrinsically immoral for a website to capture users' demographic information.

This is a controversial opinion. I think you will find many people disagree.

Who needs to know your sex or gender?

The more interesting question, IMHO, is "how common is it for Web sites to ask for this information", and "how common is it for Web sites to ask for it in a bad way". For sex identity, it is asked relatively frequently, and it is asked poorly (as a binary or enumerated list) very commonly.

The use cases for it are in the minority and the organisations that do need to capture it (health, government, sociological research, etc) hopefully have well documented internal standards and are not copying...

I agree that we should hope for this. Unfortunately it is remarkably common for people to get this wrong.

Exactly, there is no html autocomplete=pizza-topping so why is there an autocomplete=sex?

The short answer is because browser vendors asked for the latter but not the former, and we attempt to specify what browser vendors are going to implement.

There is a requirement for countries to be defined as 2 letter country codes; except 2 letters don't cover all countries, some use longer codes, like Wales

Please file this as a separate issue.

my point is that HTML should focus on what it's meant to be good at (website function and rendering) not classification of data which it has gotten wrong either through the restrictive autocomplete and type suggestions or examples in the spec.

The HTML spec focuses on what HTML implementations need to implement.

it is no business of the html spec to be suggesting it and getting it very wrong, which is harmful (https://html.spec.whatwg.org/multipage/forms.html#the-datalist-element).

I've explained why it is the business of the spec, and why it's not wrong. If you wish to counter those specific arguments, that is the most productive way of following-up on this bug.

html spec doesn't put in examples asking people what race they are, so it shouldn't put in examples and autocomplete suggestions for the gender or sex people are associated with.

That's a non-sequitur.

I'm not asking the spec to ban demographics being possible in a web page, but a ban for demographics to be referenced in the spec as example, type or autocomplete use cases.

I've explained why it is important to show this particular case. The most productive way to follow-up on this bug would be to explain why my arguments are incorrect.

html got sex and gender wrong

I disagree, for the reasons stated in earlier comments. If you disagree with my take on this please address my arguments directly.

sex and gender are too complicated to classify (it can be an option, a textfield, etc and likely will have different validation rules that mean autocomplete is harmful and potentially broken)

The HTML spec already attempts to address this very point.

html is restricting data classifications (let others define more, better)

I'm not sure what this means.

there are better organisations for classifying data, why not offer them html features so they can do the job

Please file separate issues for feature requests.

lyndsysimon · 2016-09-05T15:57:11Z

Today I ran across an example of a form where "sex" was an appropriate and useful field, and where non-binary gender was included. I recalled this discussion, and thought I would offer it here: link

As this is a form produced by the government of India, presumably an autocomplete containing only "male" and "female" would be incorrect for all such forms produced by that entity.

domenic · 2016-09-05T16:38:20Z

Right, as @Hixie states, the HTML Standard already addresses this point, and discourages fields with a binary-only choice. (Note that autocomplete filling out common values does not restrict what is actually typed.)

Given the lack of responses to @Hixie's earlier points, and the lack of an actionable change request here that would not simply make things worse by removing advice, it's probably best to close this. We're always happy to continue discussing in the closed thread, however.

domenic closed this as completed Sep 5, 2016
