
Baseline: 2+ major versions isn't enough #174

Closed
dfabulich opened this issue May 11, 2023 · 53 comments
Labels
baseline definition: Issues related to the definition of Baseline

Comments

@dfabulich
Collaborator

Baseline sets a target of 2+ major versions. Today, that means that the <dialog> element, released in Safari 15.4, is "baseline," because the latest version of Safari is 16.4.

I tried launching a feature with <dialog> a couple of months ago in March 2023, without a polyfill, and we got a bunch of customer support tickets from users of iOS Safari 14.

Furthermore, under the 2+ major version policy, assuming Safari 17 releases on schedule in Fall 2023, on the day Safari 17 drops, every feature from Safari 16 will be officially "baseline." For the rest of that month, no more than 50% of users will actually have Safari 17 yet; the "2+ major versions" policy will effectively drop down to 1 major version.

"Baseline" should be 3+ or 4+ major versions.

@romainmenke
Contributor

romainmenke commented May 11, 2023

The same is true, but worse for Chrome.

If a new version of Chrome lands "today" and that release triggers a feature's inclusion in Baseline, then users of that browser would have had only a one-month window to update to the previous release (latest - 1), which is the one that first shipped the feature.

I bet some of us have even gone longer than one month between restarts of Chrome and have delayed updates that way.


This is only relevant if developers adopt Baseline as a support target and if people use those features as foundational building blocks.

Which is why I don't think Baseline should be communicated as anything more than a learning/teaching aid: #132 (comment)

@foolip
Collaborator

foolip commented May 11, 2023

It's great to see discussion about the criteria for what's considered baseline. The current "last two major versions" criterion is a best effort at "will work for most developers most of the time", but it's not set in stone, and there are tradeoffs.

A few approaches that have been up for discussion so far:

  • Supported in past N releases (without considering release dates or their adoption by users)
  • Supported in all releases in the past N months/years. This removes the impact of release cadence, but also means the number of releases in that time window would fluctuate, as release dates aren't perfectly regular.
  • Supported in all browsers, using per-version usage data like that behind browserslist to define a criterion like "supported for X% of Chrome users, X% of Edge users, X% of Firefox users, and X% of Safari users", applied to each browser individually
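
To make the difference between the first two approaches concrete, here is a minimal Python sketch. The browser, its version numbers, its four-week release dates and the 60-day window (standing in for "N months") are all hypothetical, and the sketch assumes support is never removed once it ships. It shows a feature that already counts as "supported in the last two releases" even though the release that first shipped it is barely a month old.

```python
from datetime import date, timedelta

# Hypothetical release history for an imaginary browser on a four-week cadence.
RELEASES = [
    ("101", date(2022, 1, 4)),
    ("102", date(2022, 2, 1)),
    ("103", date(2022, 3, 1)),
    ("104", date(2022, 3, 29)),
    ("105", date(2022, 4, 26)),
]


def supported_in_last_n_releases(first_supported: str, n: int) -> bool:
    """Approach 1: each of the n most recent releases supports the feature.

    Assumes support is never removed, so it is enough that the first
    supporting release is no newer than the n-th most recent release.
    """
    versions = [version for version, _ in RELEASES]
    return versions.index(first_supported) <= len(versions) - n


def supported_for_past_days(first_supported: str, days: int, today: date) -> bool:
    """Approach 2: every release in use during the last `days` days supports it.

    Under the same no-regression assumption, this means the first supporting
    release shipped on or before the start of the window.
    """
    shipped_on = dict(RELEASES)[first_supported]
    return shipped_on <= today - timedelta(days=days)


# A feature that first shipped in "104" (2022-03-29), evaluated on 2022-04-30.
today = date(2022, 4, 30)
print(supported_in_last_n_releases("104", 2))     # True: 104 and 105 both have it
print(supported_for_past_days("104", 60, today))  # False: 103 was current 60 days ago
```

Under the release-count rule, the time a feature has had to reach users depends entirely on the browser's cadence. The time-window rule fixes the time instead and lets the number of releases vary, which is the tradeoff the second bullet describes.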

We haven't considered it an option to use global user share in the style of "supported for X% of users regardless of browsers."

Above all, the criteria need to make sense for developers and need to work for most developers most of the time. There won't be a perfect and objectively right criterion, but I hope that feedback like this issue will move it in the right direction.

@romainmenke
Contributor

romainmenke commented May 11, 2023

> needs to work for most developers most of the time.

I also asked this in the other issue, but this keeps coming up :)

What does this mean?
What needs to work, who are "most developers" and what is "most of the time"?

It is perfectly possible to stay in that description and still have an extremely biased baseline that only works for wealthy people from western countries that are aged 20-60 and do not have any accessibility needs. I know this is not the intention here, but I do fear it will be the outcome.

The current baseline is this extremely biased baseline. I urge you to look at browser versions from the perspective of non-western regions, children on second-hand devices, pensioners on old devices, people who can't afford to upgrade, ...

"Works for most developers, most of the time" is even against the spirit of the web which is intended to be an open and accessible space.


I can't stress this enough: this is irrelevant when Baseline is purely an indicator for "ready to learn more about X". But that is not how it is currently being framed here, in the announcement and on MDN.

@dfabulich
Collaborator Author

dfabulich commented May 11, 2023

The guideline "works for most developers most of the time" doesn't just need clarification.

I argue that it has no meaning at all.

When you say "works for most developers most of the time," it sounds quantitative, but the only actual quantities that we can measure are end-user usage-data numbers like "X% of users/pageviews are in browsers that support <dialog>".

There's no such thing as measuring whether a feature "works for a developer." Features work (or don't work) for end users and whatever creaky old browser they're using right now. And "most of the time" seems to imply that a feature would work for a given developer some of the time and sometimes not work for that same developer. But this is unquantifiable twice over.

If a feature works for only 85% of my users, did the feature work for me, the developer, 85% of the time? 0% of the time? (Of course, it works for me on my machine 100% of the time. 😎)

Since the guideline is unquantifiable, it's impossible to present data to support or refute it. I can show that <dialog> works for 94% of users globally, according to caniuse, but no one can provide data to tell whether <dialog> does or doesn't work for most developers (or any developer) most of the time.

> It is perfectly possible to stay in that description and still have an extremely biased baseline that only works for wealthy people from western countries that are aged 20-60 and do not have any accessibility needs. I know this is not the intention here, but I do fear it will be the outcome.

Ironically, it's the wealthy western web that's mostly on iOS, which is usually the laggard browser for modern web standards.

Import maps just released in iOS 16.4 in March 2023, but they released in Chrome 89 in Feb 2021. Globally, 76% of users can use import maps, but in the US, only 58% of users have browsers that support them. Probably that'll be a bit higher by the time iOS 17 releases in September, but if I use import maps on my site in 2023 and my site stops working for 30% of Americans, that's not OK.

Since I'm running a business here, I want to pick a baseline that will serve the rich and poor alike. 2+ major versions doesn't serve either group of users well.

@dfabulich
Collaborator Author

> We haven't considered it an option to use global user share in the style of "supported for X% of users regardless of browsers."

When you say that we haven't considered it, do you mean, "we think it's a bad idea" or "we just didn't think about it"?

For the record, here's how Android Studio handles it. When you create a new project, it asks which Android version you want to support. The report looks like this, allowing you to choose what percentage of users to support and to see what newer versions have to offer.

[Screenshot: Android Studio's Android version distribution chart]

In the table above, Android Studio recommends that new projects support Android 7.0 and up, by default. When you click on another option, it shows you the new features available at that API level.

Based on that, here's what I recommend:

  1. Define "year-based Baselines" not on the "prior two major versions," but only on the current latest public release (even a minor release) for Chrome, Edge, Firefox, and Safari. Whatever those browsers are shipping on Jan 1 2023 is what defines Baseline 2023.

  2. Make a simple chart, like the chart above, showing what percentage of users can use which Baseline. Perhaps only 50% of users can access all of Baseline 2023, and 94% of users can access all of Baseline 2022, and 95% of users can access all of Baseline 2021, etc. It reduces a multidimensional caniuse report into a simple 2D table.

  3. Select a global user share threshold X%, and recommend a single Baseline year when all of its features are supported by X% of users worldwide. I suggest 95%, not because that's a number that "works for most developers most of the time" (that phrase is meaningless), but because it's subjective, and I think most people would agree that 95% feels pretty reasonable.

    In some years, users might choose to upgrade faster or slower, so Baseline 2023 might be recommended in October 2024, but Baseline 2024 might not really be recommended until Feb 2026. (Thus, there's no way to know in advance whether we'd have to wait for two major versions or three major versions to get to 95% user support.)

    Developers might choose a higher bar, picking only features available to 97% of users or 99% of users, using data in the chart to drive that decision.

  4. All features supported in all major browsers would have a Baseline banner, showing in what year the last "holdout" browser released unprefixed/unflagged support for that feature, even if Safari just shipped the feature yesterday. The MDN banner could say, "Baseline 2024: This feature became available in all major browsers in May 2023." Developers, noticing that it's not 2024 yet, might pause before adopting the feature in production, or could try it out right away, if they're feeling adventurous.

    If Baseline 2022 was at 95%, then the banner could say: "Baseline 2022: This feature has been widely available since 2022."
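
Steps 1, 3 and 4 of this proposal can be sketched in a few lines of Python. The coverage numbers are the illustrative figures from step 2, not measured data, and the function names and holdout release dates are hypothetical. Following step 1, a feature joins the first yearly cut (taken on January 1) on or after the date the last holdout browser shipped it. That is why a May 2023 release lands in Baseline 2024, as in the banner example in step 4.

```python
from datetime import date
from typing import Optional

# Illustrative share of users whose browsers support *all* of each Baseline
# year, copied from the example figures in step 2 (not real usage data).
COVERAGE_BY_YEAR = {2021: 0.95, 2022: 0.94, 2023: 0.50}


def baseline_year(holdout_release_dates: list[date]) -> int:
    """Steps 1 and 4: the first January 1 cut on or after the last holdout release."""
    last = max(holdout_release_dates)
    return last.year if (last.month, last.day) == (1, 1) else last.year + 1


def recommended_year(coverage: dict[int, float], threshold: float = 0.95) -> Optional[int]:
    """Step 3: the newest Baseline year whose features reach `threshold` of users."""
    eligible = [year for year, share in coverage.items() if share >= threshold]
    return max(eligible) if eligible else None


def is_recommended(feature_year: int, coverage: dict[int, float], threshold: float = 0.95) -> bool:
    """True if the feature's Baseline year is at or before the recommended year."""
    target = recommended_year(coverage, threshold)
    return target is not None and feature_year <= target


# Hypothetical feature whose last holdout browser shipped it in May 2023.
year = baseline_year([date(2022, 3, 14), date(2023, 5, 9)])
print(year)                                      # 2024
print(recommended_year(COVERAGE_BY_YEAR))        # 2021
print(recommended_year(COVERAGE_BY_YEAR, 0.90))  # 2022
print(is_recommended(year, COVERAGE_BY_YEAR))    # False
```

Market share only drives which year is *recommended*. A feature's Baseline year is fixed once every browser ships it.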

@foolip
Collaborator

foolip commented May 11, 2023

> needs to work for most developers most of the time.

> What does this mean?

It's about answering the question "can I rely on this feature working for my users?" It's the same question that you come to MDN's compat tables or caniuse.com for, but condensed into something you can understand at a glance.

For the answer to work (to be useful), it has to match developer expectations and it has to match the reality of what will actually work for their users.

This isn't a precisely defined and quantifiable guideline and I don't think anyone expects data alone to guide us to the best definition.

IMHO a compelling argument against the current definition is what @dfabulich noted in this issue: <dialog> is Baseline by the current definition, but trying to use it in production led to user complaints.

@styfle

styfle commented May 11, 2023

> condensed into something you can understand at a glance

I think that's the problem. Baseline is not really better than the current compatibility matrix at the bottom of the page. As a developer, I know which browsers are supported by my application, but I don't know which browsers are supported by Baseline (it's a moving target).

I think step one is to ensure that Baseline is never mentioned without also declaring the year.

In addition to that, perhaps the year could represent the date that all browsers supported the feature. So it would be more like a lowest common denominator. Then I could think about Baseline 2022 the same way I think about ECMAScript 2022.

@foolip
Collaborator

foolip commented May 12, 2023

> We haven't considered it an option to use global user share in the style of "supported for X% of users regardless of browsers."

> When you say that we haven't considered it, do you mean, "we think it's a bad idea" or "we just didn't think about it"?

I mean that it's an obvious option, but not seriously considered because it would involve browser market share, which would be very hard for browser vendors to come together around as a criterion. We'd have to tackle at least two problems. First, what data can be trusted here, and I suspect there would be no agreement on that. Second, how to deal with the fact that market share can go up and down, so at least theoretically features could go in and out of Baseline as that happens. But none of this was discussed at any length.

@foolip
Collaborator

foolip commented May 12, 2023

@dfabulich thanks for sharing that Android Studio screenshot, I agree that something like that based on yearly Baseline cuts has a lot of promise. A great feature of this proposal is that it doesn't bake market share into the definition of Baseline, but uses it to give a recommendation about which Baseline year to target.

@styfle your suggestion is along similar lines, and I think that may very well make sense once Baseline 23 is done and we're into 2024.

Note that a yearly release of Baseline is already part of the plan, see https://web.dev/baseline/.

@chris-morgan

#173 is also obviously relevant here. As it stands, “last 2 versions” means 1–2 years for Safari (which seems a fairly reasonable sort of period in general, though adding a month or two at the low end would probably improve it), and 4–10 weeks in other browsers (which is very unreasonably little, and also means that something like Firefox ESR is never considered, since it’s more like ten months back).

This metric is currently probably ignoring around 15–20% of actual users most of the time. (The amount will vary through each browser’s release cycle, being worst immediately after a release.) The only reason it’s not a total disaster is the different versioning scheme used by Safari, because Safari from a year ago is in most areas close enough to other browsers from a year ago. But if #173 gets fixed (so that “last two versions” of Safari would currently mean 16.3 rather than 15), then this would be a hopelessly bad and terribly misleading metric of a baseline.

“Major versions” is clearly an unsuitable measure. You need something either time-based or usage-based. Time is the easy and practical one to use. And it should definitely be more than “as little as four weeks ago”. Fourteen months feels a fairly practical sort of a figure, not overly conservative but giving some time to update that usage figures show is needed.

@ddbeck
Collaborator

ddbeck commented May 12, 2023

I'm really grateful for the discussion here and I'm learning a lot (it's especially great to see the Android Studio example—it's both wonderful and painfully intimidating at the same time). We relied on survey data in which web developers said they developed for browser version numbers per se, but it didn't capture the whole story, I think.

I do want to note one thing about scope, prompted by @dfabulich's earlier comment:

> There's no such thing as measuring whether a feature "works for a developer." Features work (or don't work) for end users and whatever creaky old browser they're using right now.

I don't think that's exactly right: we can survey and interview web developers and learn about their experiences with features and browsers over time and what constraints they work within. "What are web developers under pressure to do?" is a distinct question from "What browsers are people using this week?"

That is to say, a feature not working for a given end user may or may not be a problem, if a dev (or their bosses or clients) made an intentional choice to not support that user's creaky old browser. Or maybe a specific creaky old browser is actually fundamentally important and widespread usage is totally immaterial (let's show some sympathy to the poor developer who has to maintain compatibility with some ancient iPad kiosks that haven't had an update in years).

We know that web developers routinely make support decisions that are not rooted in global usage share numbers or time since release or number of releases. We know that no one summary indicator of support would be definitive for every developer.

But I think we can still help developers understand what the web can do with respect to their own constraints, if we choose a definition of Baseline that is understandable and predictable enough that a web developer can compare their own work against Baseline. But plainly we have to do some work to help developers know where they are relative to Baseline (e.g., with dates, with usage reports, and so on) no matter where we end up with a definition.

@othermaciej

othermaciej commented May 15, 2023

De facto the way "2+ major versions" has been interpreted is "2 years for Safari, but 8 weeks for Chrome or Firefox" (or maybe it's 1 year, and 4 weeks?).

For consistency, Safari major version should go to one digit after the period; 16.4 is logically as much a major release as Chrome 112.0, and includes substantially more web platform features and bug fixes, so it's hard for me to see why one counts as a major release and the other doesn't. We potentially ship web platform features in all of .0-.6, for a total of 7 releases a year that should plausibly count. Considering only 1 a year seems to be an overly literal interpretation of the version string.

This is somewhat separate from how many major versions to go back for each browser. Perhaps a lot more Safari users are on older versions. However, at a glance at least, this does not seem supported by the available evidence.

Below I'm including data from Statcounter "Browser Version" share for April 2023 (I transposed the CSV download). Note that mobile versions of Chrome and Safari are collapsed to single catchall versions like "Chrome for Android" and "Safari iPhone", so the stats below are meaningful only for desktop. I note that if you go back to Safari 15.4, what was then the year-ago Safari release, there are 9 different versions of Chrome with more use share, with the oldest of these being Chrome 79.0. If you go back to Safari 15.0 (the last "major version" by the scheme that only counts .0 releases), there are 37 different versions of Chrome with more use share, of which the oldest is 49.0.

Perhaps Statcounter is not reliable enough. But ideally the "how many versions back" policy should be informed by actual data, not by counting the dots in version numbers, or by anecdotes about user complaints.

Source: https://gs.statcounter.com/browser-version-market-share

Browser version | Usage share, 2023-04 (%)
-- | --
Chrome for Android | 37.2
Safari iPhone | 14.47
Chrome 112.0 | 9.73
Chrome 111.0 | 8.79
Edge 112 | 3.18
Chrome 109.0 | 2.11
Samsung Internet 20.0 | 1.91
Chrome for iPhone | 1.71
Safari 16.3 | 1.63
Other | 1.5
Edge 111 | 1.44
Firefox 111.0 | 1.11
Firefox 112.0 | 0.97
Safari 15.6 | 0.83
Opera 97.0 | 0.82
Android 0 | 0.74
Safari iPad | 0.73
Safari 16.4 | 0.69
Chrome 110.0 | 0.48
UC Browser 13.4 | 0.41
Safari 16.2 | 0.41
Chrome iPad | 0.34
Safari 14.1 | 0.33
Opera 96.0 | 0.31
Safari 16.1 | 0.28
Chrome 103.0 | 0.27
Chrome 108.0 | 0.25
Opera 74.1 | 0.23
Chrome 79.0 | 0.22
Yandex Browser 23.3 | 0.22
Safari 15.5 | 0.2
360 Safe Browser 0 | 0.2
UC Browser 15.3 | 0.2
IE 11.0 | 0.19
Safari 13.1 | 0.19
Opera 69.0 | 0.16
Chrome 83.0 | 0.14
Chrome 93.0 | 0.14
Samsung Internet 19.0 | 0.13
Safari 14.0 | 0.12
Safari 15.4 | 0.11
Firefox 102.0 | 0.11
Chrome 107.0 | 0.1
Edge 109 | 0.1
Coc Coc 115.0 | 0.1
Chrome 105.0 | 0.09
Edge 110 | 0.09
Chrome 87.0 | 0.09
Safari 16.0 | 0.09
Whale 1.0 | 0.09
Chrome 106.0 | 0.08
Safari 15.3 | 0.08
Chrome 91.0 | 0.08
Opera Mini 4.4 | 0.08
Chrome 99.0 | 0.07
Chrome 85.0 | 0.07
Opera 95.0 | 0.07
Mozilla 0 | 0.07
UC Browser 12.12 | 0.07
KaiOS 2.5 | 0.07
Chrome 94.0 | 0.07
Chrome 56.0 | 0.07
Opera 68.0 | 0.07
Chrome 102.0 | 0.06
Chrome 104.0 | 0.06
Firefox 110.0 | 0.06
Chrome 86.0 | 0.06
Chrome 100.0 | 0.05
Samsung Internet 17.0 | 0.05
Samsung Internet 18.0 | 0.05
Chrome 97.0 | 0.05
Chrome 92.0 | 0.05
Safari 15.1 | 0.05
Safari 15.2 | 0.05
Chrome 81.0 | 0.05
Opera 67.1 | 0.05
Safari 9.1 | 0.05
Puffin 9.9 | 0.05
Chrome 101.0 | 0.04
Instabridge 21.9 | 0.04
Chrome 98.0 | 0.04
Chrome 96.0 | 0.04
Firefox 52.0 | 0.04
Chrome 84.0 | 0.04
Chrome 69.0 | 0.04
Chrome 49.0 | 0.04
Chrome 75.0 | 0.04
Chrome 76.0 | 0.04
Chrome 89.0 | 0.04
Chrome 80.0 | 0.04
UC Browser 13.2 | 0.04
Safari 12.1 | 0.04
Chrome 90.0 | 0.04
Edge 108 | 0.03
Firefox 109.0 | 0.03
Samsung Internet 16.0 | 0.03
Opera 66.2 | 0.03
Sogou Explorer 0 | 0.03
Firefox 78.0 | 0.03
Chrome 74.0 | 0.03
Safari 15.0 | 0.03
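
The version-spread comparison in this comment can be rechecked against the table with a short Python sketch. It copies only the desktop Chrome rows and the two Safari rows the argument uses. Mobile catch-all rows such as "Chrome for Android" carry no version and are left out. The published shares are rounded to two decimals, so the sketch counts Chrome versions with *at least* as much share as the Safari version. This reproduces the figures quoted above: 9 versions, back to Chrome 79.0, for Safari 15.4, and 37 versions, back to Chrome 49.0, for Safari 15.0.

```python
# Desktop Chrome rows from the April 2023 StatCounter table (major version -> share in %).
CHROME_DESKTOP = {
    112: 9.73, 111: 8.79, 109: 2.11, 110: 0.48, 103: 0.27, 108: 0.25, 79: 0.22,
    83: 0.14, 93: 0.14, 107: 0.10, 105: 0.09, 87: 0.09, 106: 0.08, 91: 0.08,
    99: 0.07, 85: 0.07, 94: 0.07, 56: 0.07, 102: 0.06, 104: 0.06, 86: 0.06,
    100: 0.05, 97: 0.05, 92: 0.05, 81: 0.05, 101: 0.04, 98: 0.04, 96: 0.04,
    84: 0.04, 69: 0.04, 49: 0.04, 75: 0.04, 76: 0.04, 89: 0.04, 80: 0.04,
    90: 0.04, 74: 0.03,
}
SAFARI_DESKTOP = {"15.4": 0.11, "15.0": 0.03}


def chrome_versions_with_at_least(share: float) -> list[int]:
    """Chrome versions whose (rounded) share is >= `share`, oldest first."""
    return sorted(v for v, s in CHROME_DESKTOP.items() if s >= share)


for safari_version, share in SAFARI_DESKTOP.items():
    versions = chrome_versions_with_at_least(share)
    print(f"Safari {safari_version}: {len(versions)} Chrome versions, oldest {versions[0]}")
# Safari 15.4: 9 Chrome versions, oldest 79
# Safari 15.0: 37 Chrome versions, oldest 49
```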

@dfabulich
Collaborator Author

There's been a lot of good feedback on the definition of Baseline, but I'm very unclear on what's happening next.

Is this going to be a discussion topic for the next WebDX Community Group meeting? If so, when is that meeting scheduled?

Per web-platform-dx/admin#2 there's no publicly available information on when the next meeting is scheduled, or any information on past meetings (or even whether there have been any past meetings).

What happens now?

@chris-morgan

> Perhaps Statcounter is not reliable enough.

Since people are commonly not aware of this: it’s based on trackers that are blocked by extremely common configurations (probably most ad/content blockers: certainly uBlock Origin with its default lists, and Firefox’s built-in Enhanced Tracking Protection blocks it in Strict mode, and in Private Browsing windows in its (default) Standard mode). It will almost certainly significantly undercount Firefox especially.

The way it pretends that most mobile UAs are instantly up to date (as you noted) is also rather perplexing and unrealistic.

@othermaciej

> Since people are commonly not aware of this: it’s based on trackers that are blocked by extremely common configurations (probably most ad/content blockers: certainly uBlock Origin with its default lists, and Firefox’s built-in Enhanced Tracking Protection blocks it in Strict mode, and in Private Browsing windows in its (default) Standard mode). It will almost certainly significantly undercount Firefox especially.

This is a fair point, but it probably doesn't significantly change the point about relative version spread of different browsers. Perhaps Firefox is even more weighted towards recent versions than the data suggests (since ETP would not be in very old versions). Maybe all browsers are more weighted towards recent versions in reality.

> The way it pretends that most mobile UAs are instantly up to date (as you noted) is also rather perplexing and unrealistic.

I don't think they are claiming they're always up to date; they are just failing to make version distinctions. That seems like a strange limitation. I believe mobile versions of Chrome and Safari both have identifiable versions in the UA string and/or via UA Client Hints.

Anyway, I presented StatCounter not as the best available usage share data, but because it's an easily accessible public data source. Better data or other sources welcome! Even taking account of all the methodology limitations in StatCounter, I think the data still contradicts the position that a month of Firefox or Chrome updates gets the same user saturation as a year of Safari updates. And the current definition of Baseline assumes that position.

@bkardell

bkardell commented May 16, 2023

> I mean that it's an obvious option, but not seriously considered because it would involve browser market share, which would be very hard for browser vendors to come together around as a criteria. We'd have to tackle at least two problems. First, what data can be trusted here, and I suspect there would be no agreement on that.

But... Shouldn't there be? Even if such a dataset (or maybe even more specifically: way to create such a dataset) doesn't exist, shouldn't it? It definitely feels worth a valiant effort at least to see if we can find a way to agree on facts that ultimately shape so much/so many decisions.

@othermaciej

othermaciej commented May 17, 2023

> I mean that it's an obvious option, but not seriously considered because it would involve browser market share, which would be very hard for browser vendors to come together around as a criteria. We'd have to tackle at least two problems. First, what data can be trusted here, and I suspect there would be no agreement on that.

> But... Shouldn't there be? Even if such a dataset (or maybe even more specifically: way to create such a dataset) doesn't exist, shouldn't it? It definitely feels worth a valiant effort at least to see if we can find a way to agree on facts that ultimately shape so much/so many decisions.

Also - I don't think browser vendors were consulted about the current criteria, which are 8 weeks for Chrome, Firefox, and Edge; and 2 years for Safari. At least, no one from Apple was consulted. No one even asked us which Safari versions are major versions. We would not have said it was just the one .0 release a year. The current policy is neither neutral nor non-controversial. It wrongly implies that Safari updates much more rarely than other browsers and/or that Safari users are much less likely to have a recent version.

@atopal
Collaborator

atopal commented May 17, 2023

Thanks a lot for all the comments here. Definitely very useful in iterating towards a more useful definition of Baseline. I want to address a couple of specific things:

@dfabulich Yes, we’re going to discuss this in the next WebDX CG meeting, and I’m collecting incoming feedback for that purpose. We’d love for people to join the CG and attend the next call. Once you join, you should be able to subscribe to the calendar. The next meeting will be on May 25th.

@othermaciej we did bring this up in the WebDX CG multiple times and asked for feedback, but didn’t hear any concerns. We based this initial browser set on research in which we asked developers which browsers and versions they expect a feature to be available in. That said, research on this is not easy; this is just a starting point, and it looks like 2 major versions back is not the one to go with for now, so we’ll focus on a better definition. I’ll ask to put this on the agenda for the next WebDX CG call.

@foolip
Collaborator

foolip commented May 17, 2023

@othermaciej thanks for the feedback! It's a long thread so I want to bring your attention to some possible approaches in #174 (comment). I'm fairly confident we can arrive at something reasonable given the input of browser vendors and web developers.

I'd also like to highlight that the rollout of this on MDN isn't automatic, rather it shows up when features are designated as Baseline in this repo, which is manual with human review. So we can and will steer clear of cases that are in this gray zone of probably too new to call Baseline. Of the features we've already done, I think @layer and structuredClone() are the most debatable. However, both were released in Safari 15.4, so the Safari major/minor issue doesn't come into play.

A good test feature for any definition might be Container Queries, which we (@ddbeck) left commented out. It would be Baseline by the "2+ releases" rule, but not the "2+ major releases" rule.

@foolip
Collaborator

foolip commented May 17, 2023

> I mean that it's an obvious option, but not seriously considered because it would involve browser market share, which would be very hard for browser vendors to come together around as a criteria. We'd have to tackle at least two problems. First, what data can be trusted here, and I suspect there would be no agreement on that.

> But... Shouldn't there be? Even if such a dataset (or maybe even more specifically: way to create such a dataset) doesn't exist, shouldn't it? It definitely feels worth a valiant effort at least to see if we can find a way to agree on facts that ultimately shape so much/so many decisions.

@bkardell it's an excellent point. Do you want to file a new issue for that?

@othermaciej

> @othermaciej thanks for the feedback! It's a long thread so I want to bring your attention to some possible approaches in #174 (comment). I'm fairly confident we can arrive at something reasonable given the input of browser vendors and web developers.

Yes, I did see it. I find the options open-ended enough in how they are stated that I don't know if they solve this problem?

> I'd also like to highlight that the rollout of this on MDN isn't automatic, rather it shows up when features are designated as Baseline in this repo, which is manual with human review. So we can and will steer clear of cases that are in this gray zone of probably too new to call Baseline. Of the features we've already done, I think @layer and structuredClone() are the most debatable. However, both were released in Safari 15.4, so the Safari major/minor issue doesn't come into play.

Interesting. Is the policy for manual human review (who decides, what criteria they should apply) written down?

> A good test feature for any definition might be Container Queries, which we (@ddbeck) left commented out. It would be Baseline by the "2+ releases" rule, but not the "2+ major releases" rule.

I'm not claiming that "releases" is a better rule than "major releases", necessarily. My claim is that your definition of what is a "major release" is incorrect for Safari. Safari 15.4 is a major release. Safari 15.0 is a major release. Safari 15.4.1 would not be. We increment either the first or second number for what we consider a major release of the browser, instead of always the first, like Chrome, Edge or Firefox do. We do it this way so the version number (on both iOS and Mac) stays mostly aligned with iOS version numbers, as we find this is less confusing to users and developers. Any x.0 or x.y release can contain new web platform features and significant fixes. For intermediate bug-fix releases, we only update the third number (after the second dot). Surely this is the meaningful definition of "major release", not merely whether a marketing version string contains only one nonzero element.

Engagement with the Safari team would have clarified this.

I think for now, it would be best to update the definition of what counts as a major version for Safari, even while continuing the broader discussion of what the Baseline criteria should be.

(Perhaps this would be more on-topic for #173?)

@foolip
Collaborator

foolip commented May 20, 2023

> @othermaciej thanks for the feedback! It's a long thread so I want to bring your attention to some possible approaches in #174 (comment). I'm fairly confident we can arrive at something reasonable given the input of browser vendors and web developers.

> Yes, I did see it. I find the options open-ended enough in how they are stated that I don't know if they solve this problem?

None of them rely on a semver-like concept of major release, so any of them should solve the problem.

> I'd also like to highlight that the rollout of this on MDN isn't automatic, rather it shows up when features are designated as Baseline in this repo, which is manual with human review. So we can and will steer clear of cases that are in this gray zone of probably too new to call Baseline. Of the features we've already done, I think @layer and structuredClone() are the most debatable. However, both were released in Safari 15.4, so the Safari major/minor issue doesn't come into play.

> Interesting. Is the policy for manual human review (who decides, what criteria they should apply) written down?

No, but @ddbeck was going to document some things around governance and it would make sense to document review policy as well.

Also let us know if Apple wants to have reviewers, similar to BCD.

> I think for now, it would be best to update the definition of what counts as a major version for Safari, even while continuing the broader discussion of what the Baseline criteria should be.

I agree. I think we should use BCD's concept of release, which doesn't depend on the structure of the version string.

This shouldn't change any Baseline badge shown on MDN today.

As @atopal said this will be on the agenda for next week.

@romainmenke
Contributor

> I'm not claiming that "releases" is a better rule than "major releases", necessarily. My claim is that your definition of what is a "major release" is incorrect for Safari. Safari 15.4 is a major release. Safari 15.0 is a major release. Safari 15.4.1 would not be.

Maybe it's better to use different wording in the context of Baseline, something like "significant release"? It is easier to teach people that Chrome 109 and Safari 15.4 are both significant releases than to explain that semver major/minor doesn't translate well to browser versioning.

For Chrome a significant release is an increment in the first number, for Safari an increment in the second number.
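As a rough sketch of that rule, a version string can be normalized to a "significant release" key by keeping only the leading components that matter for each browser. This is a hypothetical illustration, not how BCD or any Baseline tooling actually groups releases: `significant_release`, `count_significant_releases`, and the `SIGNIFICANT_PARTS` table are assumptions, and only Chrome and Safari are covered because those are the two browsers the rule names.

```python
# Hypothetical table: how many leading version components define a
# "significant release" for each browser, per the rule described above.
SIGNIFICANT_PARTS = {"chrome": 1, "safari": 2}


def significant_release(browser: str, version: str) -> tuple[int, ...]:
    """Map a full version string to its significant-release key.

    Chrome 109.0.5414.120 -> (109,); Safari 15.4.1 -> (15, 4).
    Missing components are treated as 0, so Safari "16" -> (16, 0).
    """
    parts = [int(p) for p in version.split(".")]
    n = SIGNIFICANT_PARTS[browser.lower()]  # KeyError for browsers not covered
    parts += [0] * (n - len(parts))  # pad short versions like "16"
    return tuple(parts[:n])


def count_significant_releases(browser: str, versions: list[str]) -> int:
    """Count distinct significant releases among a list of versions."""
    return len({significant_release(browser, v) for v in versions})


print(significant_release("safari", "15.4.1"))  # (15, 4)
print(count_significant_releases("safari", ["15.0", "15.4", "15.4.1", "16.0"]))  # 3
```

Under this rule Safari 15.4.1 collapses into 15.4, while 15.0 and 15.4 stay distinct, which matches the distinction drawn in the comment.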

@danburzo

Taking yearly snapshots of the interoperable web platform surface makes a lot of sense for many different uses, and the current definition of Baseline (2+ featureful versions) works reasonably well for including or excluding features from these snapshots.

However, my concern is that reusing the criteria to continuously update what’s ‘safe to use right now’ results in imprudent advice to developers, as others have pointed out in previous comments. What’s the argument against offering more conservative guidance?

One idea that hasn’t been discussed, as far as I could find, is placing the feature in relation with these snapshots. So a feature could be one of:

  • Has been included in a previous snapshot — wide support / safe to use
  • Qualifies for the upcoming snapshot — good support / possibly safe to use
  • Does not qualify for the upcoming snapshot — non-interoperable / needs personal evaluation

This essentially makes the current Baseline a semi-recommendation, while raising the bar for what's safe to use.
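A minimal sketch of this three-way classification, assuming each feature records the date it first met the current Baseline criteria (`None` if it hasn't yet) and that the date of the most recent snapshot is known. The category labels come from the list above; the function name and its inputs are hypothetical.

```python
from datetime import date
from typing import Optional


def classify(qualified_on: Optional[date], last_snapshot: date) -> str:
    """Place a feature relative to the most recent yearly snapshot.

    qualified_on: date the feature first met the current Baseline criteria,
    or None if it does not meet them yet.
    """
    if qualified_on is None:
        # Not interoperable yet, so it cannot be in any snapshot.
        return "non-interoperable / needs personal evaluation"
    if qualified_on <= last_snapshot:
        # Already qualified when the previous snapshot was taken.
        return "wide support / safe to use"
    # Qualifies now, so it will appear in the upcoming snapshot.
    return "good support / possibly safe to use"
```

The point of the design is that only the first bucket carries the stronger "safe to use" signal; the current Baseline test decides membership in the second.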

@atopal
Collaborator

atopal commented May 25, 2023

I should clarify that the intent of using browser versions was to have an easy-to-understand proxy for how much uptake a feature has between versions of the same browser. Moving forward, we'll make sure not to refer to Safari Y.x releases as major versions, to avoid unnecessary confusion about their significance.

Independently of that, we need to ensure that the shorthand (x browser versions back) actually represents the coverage that developers expect, which seems to be somewhere above 90% and below 100%. Over the next few weeks we'll try to identify better data on that, as well as research on what those expectations are.

@danburzo snapshots and the ongoing classifications solve different problems. One is a "push message", and gives us the ability to talk about what has changed between releases, the other is a "pull message", giving developers information when they need it and are looking for it.

There is definitely an argument to be made for being more conservative, and 2+ versions might not be enough, but that would also be true for the snapshot (at least shortly after announcing it).

With all of this, it's probably a good reminder that we expect more than 90% of the platform to safely be in the "Baseline" bucket. The vast majority of features of the platform have been around for a long time.

@chris-morgan

chris-morgan commented May 25, 2023

With all of this, it's probably a good reminder that we expect more than 90% of the platform to safely be in the "Baseline" bucket. The vast majority of features of the platform have been around for a long time.

Platform features, sure, but that’s not the point of Baseline, and completely irrelevant—if it were relevant, there wouldn’t be much purpose to Baseline. The point is how ready new features are, which is about actual availability in the browsers people use. And the most generous stats on the current “two major versions” definition, including similar browsers by engine (even though they don’t always have exactly the same feature coverage, for various reasons), fall way short of 90% of users.

If the current definition goes back to Safari 15.0, then depending on the point in the release cycle, it’s only covering about 80–85% of actual users. However, by sheer accident (as it appears to me), most of the remaining gap is made up, feature by feature, by the longer Safari window: most features that Safari has had for 1–2 years, and that Chromium and Firefox have had for at least the 1–2 months the definition requires, have in fact been in Chromium and Firefox for 1–2 years as well. This is the sole saving grace of the current definition.

If the current definition were changed to match the realistic definition of “two major versions” for Safari (as declared by their developers and observed by anyone who can look past a stubbornly fixed notion that X.Y.Z must be major.minor.patch to the clear reality), to what would currently be Safari 16.4, then at some points in release cycles (being worst, I think, immediately after the release of the next major version of Chromium after a Safari X.1 release, which is probably around early November) it could be covering less than two thirds of all users (and that’s globally—the figure would be worse in some areas).

The current definition is already dangerously broken and misleading. Yes, the difference is only visible in a few cases, but those few cases are most of what Baseline’s purpose should be.

@mrleblanc101

What's the definition of "last 2 versions" in the case of Safari?
The current version of Safari is 16.4; does that mean 16.4 and 16.3, or 16.0 and 15.0?

@foolip
Collaborator

foolip commented May 26, 2023

@romainmenke are the numbers you're sharing from the server logs of a specific website, and if so which one? I'm under no illusion that the data behind browserslist is authoritative, but is there any publicly available data source that is less bad?

@foolip
Collaborator

foolip commented May 26, 2023

@mrleblanc101 we decided in the CQ meeting yesterday that we're going to use the last 2 versions of Safari listed in https://github.com/mdn/browser-compat-data/blob/main/browsers/safari.json, which currently means 16.4 and 16.5.

Initially, the plan was 15.x + 16.x, but there was a lot of good feedback in this issue and #173 that this doesn't make sense, it's too literal an interpretation of "major version".
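Selecting "the last 2 versions of Safari listed" in BCD's `browsers/safari.json` could look roughly like the sketch below. It assumes the file's `browsers → safari → releases` layout, where each release entry has a `release_date` and a `status` such as `"retired"`, `"current"`, or `"beta"`. The inline `SAMPLE` is trimmed, illustrative data rather than the real file, and excluding beta releases is an assumption; the thread doesn't say how unreleased entries are handled.

```python
import json

# Trimmed, illustrative stand-in for browsers/safari.json in
# mdn/browser-compat-data (the real file has many more releases and fields).
SAMPLE = json.loads("""
{
  "browsers": {
    "safari": {
      "releases": {
        "16.3": {"release_date": "2023-01-23", "status": "retired"},
        "16.4": {"release_date": "2023-03-27", "status": "retired"},
        "16.5": {"release_date": "2023-05-18", "status": "current"},
        "17":   {"status": "beta"}
      }
    }
  }
}
""")

# Assumption: only shipped releases count toward "last N versions".
SHIPPED = {"retired", "current"}


def last_n_releases(bcd: dict, browser: str, n: int = 2) -> list[str]:
    """Return the n most recent shipped releases, oldest first."""
    releases = bcd["browsers"][browser]["releases"]
    shipped = [
        (info["release_date"], version)
        for version, info in releases.items()
        if info.get("status") in SHIPPED and "release_date" in info
    ]
    shipped.sort()  # ISO 8601 dates sort correctly as strings
    return [version for _, version in shipped[-n:]]


print(last_n_releases(SAMPLE, "safari"))  # ['16.4', '16.5']
```

Because the selection is by release order rather than by parsing the version string, Safari 16.4 and 16.5 count as two separate versions, which is exactly the change described above.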

@chris-morgan

But still only two versions? Ugh. So now it will literally be covering less than two thirds of users at some times of year. It should urgently be left as it is until that aspect is fixed, because otherwise it just makes this a mockery of a concept of a “baseline”, rather than a bad definition redeemed in most cases by an accident.

@foolip
Collaborator

foolip commented May 26, 2023

@chris-morgan we're very aware that the current definition is too close to the bleeding edge in some cases, and while we're working on an updated definition, we'll steer clear of the gray zone and not mark things as Baseline on MDN if they're too new.

As I mentioned in #174 (comment), @layer and structuredClone() are the two newest features that were marked as Baseline ahead of the launch. According to caniuse.com, they have just above 90% global availability:
https://caniuse.com/css-cascade-layers
https://caniuse.com/mdn-api_structuredclone

It's possible those won't be Baseline by a new definition, we'll see.

@dontcallmedom
Member

(thank you all for the great input so far - I would like to ask we all refrain from using derogatory or inflammatory language when expressing our opinions to ensure we keep the conversation constructive and professional)

@mrleblanc101

mrleblanc101 commented May 26, 2023

@mrleblanc101 we decided in the CQ meeting yesterday that we're going to use the last 2 versions of Safari listed in https://github.com/mdn/browser-compat-data/blob/main/browsers/safari.json, which currently means 16.4 and 16.5.

Initially, the plan was 15.x + 16.x, but there was a lot of good feedback in this issue and #173 that this doesn't make sense, it's too literal an interpretation of "major version".

Whoa, that's crazy! I guess the problem is more that Safari is tied to OS releases, and people don't update their OS as often as their browser (since the OS doesn't auto-update in the background the way browsers do).

I also want to raise a HUGE concern about treating only the last 2 minor versions of iOS as Baseline. That will leave tens of MILLIONS, maybe hundreds of millions, of people at risk of using a broken website.

At this point Baseline is just useless, and people will need to keep using tools like Can I Use to know the real support and judge based on their own analytics data.

@dontcallmedom
Member

the decision on Safari is only to align it with other browsers; it doesn't impact features currently marked as baseline, and it doesn't imply this will stay as we revisit the definition.

And again, let's avoid inflammatory and derogatory (or ableist for that matter) language.

@romainmenke
Contributor

#174 (comment)

@romainmenke are the numbers you're sharing from the server logs of a specific website, and if so which one? I'm under no illusion that the data behind browserslist is authoritative, but is there any publicly available data source that is less bad?

My point was more that this data source can be valuable, even when everyone agrees it is not a good source.
By filtering out known bad parts you can still spot general trends.

But if the bad parts are not ignored, the assumptions and conclusions based on this data will be off. 40% is just massive and skews the data heavily.


Our data covers ±40 sites, mainly regional sites in Belgium.
Some of the higher-traffic sites:

I want to make it clear that this data is not authoritative in any way :)
It's just a data point which can serve as part of a frame of reference for browserslist.

I think that at this point it's easier to show that a claim is untrue than to show that it is true.

Example:

Caniuse claims 90% support for @layer

This is untrue because statcounter incorrectly labels users on android.

That it is untrue can easily be checked by anyone keeping their own stats.

How much support does @layer have?

No idea

@foolip
Collaborator

foolip commented May 26, 2023

@romainmenke cool, so IIUC you work for or run a web development agency in Belgium with around 40 sites, that you can get stats for? I agree that's useful to contrast with other public data sources.

Do you happen to have numbers for mobile browsers to give us an idea of what the version breakdown looks like?

@atopal
Collaborator

atopal commented May 28, 2023

@mrleblanc101 I understand your concern. "Last 2 versions" indicates that something might be labeled Baseline when it in fact is not widely supported yet. The goal is to come up with a definition that is both easy to understand and reflect developer expectations when it comes to availability. It seems like coverage of somewhere between 90% and 98% will be needed for that. We're currently exploring ways to better understand that expectation and better understand the data available to make a determination of availability.

Thinking out loud, I'm wondering if it would be better to declare something much more conservative as Baseline, while we explore options. Maybe something like "available in all browsers for at least 2 years", which of course might taint expectations the other way as "This is obviously too long to wait".

@jgraham That's an interesting idea. So, each vendor declares a version threshold they want developers to target, and keeps it updated regularly. That seems very similar to the Android Studio example in Dan's comment above. I do agree, though, that vendors would also have to publish the relative percentage of reach, to make it transparent to devs what they get when they do that.

We could then test whether those vendor recommendations are in line with developer expectations, and if yes, we could declare a feature Baseline when it has support in all Baseline versions of all browsers.

@dfabulich
Collaborator Author

dfabulich commented May 28, 2023

I have some data to share! https://github.com/dfabulich/baseline-calculator

I gathered data from caniuse.com and browser-compat-data to compute how long it takes for web features to achieve widespread availability.

tl;dr: It takes about four years for 80% of features to achieve 95% market share.

| Market share | 50% of features | 75% of features | 80% of features | 90% of features | 95% of features | 97% of features | 98% of features | 99% of features |
|---|---|---|---|---|---|---|---|---|
| 80% share | 0 months | 2 months | 5 months | 16 months | 21 months | 24 months | 29 months | never |
| 90% share | 7 months | 21 months | 22 months | 32 months | never | never | never | never |
| 95% share | 30 months | 45 months | 48 months | never | never | never | never | never |
| 97% share | 56 months | never | never | never | never | never | never | never |
| 98% share | never | never | never | never | never | never | never | never |
| 99% share | never | never | never | never | never | never | never | never |

Details on methodology, steps to reproduce, and interpretations of the data are available in the README of the repository.

Based on this data, I argue for the following conclusions:

Baseline can and should be time-based

  • We should change the goal for the Baseline definition. The current goal for the definition of Baseline is to set a guideline that "works for most developers most of the time." I've argued above that this goal has no meaning, that it's impossible to use data to argue whether a feature does or doesn't work for most developers (or any developer) most of the time.
  • The new goal should be "80% of features with 95% market share." That is, we should say "Baseline means that a feature has been supported in all major browsers for N months." And then, we should choose an N such that 80% of features have achieved 95% market share in that time. I pick the numbers 80 and 95 just because they feel reasonable to me.
  • Baseline should therefore be defined as features that have been available in all major browsers for at least 48 months (four years). The data shows that four years is the amount of time it takes to hit these targets. We can probably revisit this calculation every few years. If, in 2027, we find that 80% of features achieve 95% market share in 36 months, we can redefine Baseline as three years then.
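The "80% of features with 95% market share" rule boils down to a percentile over per-feature waiting times. Here is a minimal sketch, assuming you already have, for each feature, the number of months between shipping in all major browsers and reaching the target share (`None` if it never did). It illustrates the idea only; it is not the baseline-calculator's actual code, and the sample numbers are made up.

```python
from typing import Optional


def months_for_coverage(
    months_to_share: list[Optional[int]], percent_of_features: int
) -> Optional[int]:
    """Smallest N such that percent_of_features% of features hit the share
    target within N months, or None ("never") if not enough features did."""
    if not 0 < percent_of_features <= 100:
        raise ValueError("percent_of_features must be in (0, 100]")
    n = len(months_to_share)
    if n == 0:
        return None
    # Integer ceiling of n * percent / 100: how many features must qualify.
    needed = -(-n * percent_of_features // 100)
    reached = sorted(m for m in months_to_share if m is not None)
    if len(reached) < needed:
        return None
    return reached[needed - 1]


# Made-up months-to-95%-share for five features; None = never reached.
sample = [0, 3, 10, 24, None]
print(months_for_coverage(sample, 80))  # 24
```

Treating "never" as a missing value rather than a large number is what produces the "never" cells in the table: once too many features are missing, no N satisfies the rule.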

@atopal
Collaborator

atopal commented May 28, 2023

That's fantastic Dan! Thanks for sharing. I particularly like that you looked at which features would be excluded at different levels of availability.

One major caveat when it comes to historical comparisons: IE11 was released in 2013 and never updated. Other than UC Browser, I'm not aware of any browser that shares that issue. I'm going to guess that with IE discounted it takes much less than 4 years after availability across all browsers before developers feel it's widely available. One way to test that would be to look at usage metrics for features.

Either way, fantastic tool to have, thanks for making that available, Dan! I'm sure it will be very useful in testing assumptions we are making.

@dfabulich
Collaborator Author

I have a section about this in the repo readme, which you might want to review:

Looking at the cell in the lower left (99% share, 50% of features), the table is saying that more than half of Baselineable features have never achieved 99% market share. In fact, only 61 of the 311 Baselineable features have achieved 99% market share; only 143 have achieved 98% market share.

That's because 98% market share is a surprisingly high bar to achieve. If you insist on avoiding web features that haven't achieved 98% market share, you'll be missing out on features that shipped years ago.

Just to pick a few examples, none of these features have reached 98% market share:

  • CSS
    • calc
    • grid
  • JS language
    • let/const
    • classes
    • arrow functions
    • destructuring
    • template literals
  • JS API
    • Promises
    • IndexedDB
    • URL API
    • Proxy object
  • File formats
    • TTF
    • MP3 audio
    • MP4/H.264 video

All of these features became supported in all major browsers more than five years ago. Many of them became supported in all major browsers in 2015 with the release of Edge 12.

Why does it take so long to achieve 98% market share? There are a few reasons:

  • Despite browser vendors advertising their "evergreen" release strategy, both Apple and Google have withdrawn support for older devices. Users of these devices can't upgrade to the latest browser version without paying to replace their hardware.
  • Even users who can upgrade often upgrade slowly. It's common for iOS users to upgrade annually, if that.
  • There are a bunch of weird old browsers out there. Internet Explorer, Opera Mini, UC Browser… these add up to 1% market share all by themselves.

But that means we shouldn't pick and choose individual browsers with low market share and ignore them. We should say "we target 95% market share." The fact that IE isn't getting updates isn't what matters; what matters is that IE has 0.37% market share. But watch out, because Opera Mini has 0.99% market share all by itself, Android 6 and below has 2% share, and iPhone 7 and below has another 2%.

That's why it takes four years to get to 95% share, and that's why it really, truly, isn't safe to adopt features after just two years from the day all major browsers start supporting it.
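The arithmetic behind that point, using the market-share figures quoted above, under the simplifying assumption that the four groups don't overlap and that none of them will ever receive a newly shipped feature:

```python
# Market-share figures (in %) quoted in the comment above.
# Assumption: the groups are disjoint and none of them gets new features.
stuck_share = {
    "Internet Explorer": 0.37,
    "Opera Mini": 0.99,
    "Android 6 and below": 2.0,
    "iPhone 7 and below": 2.0,
}

total_stuck = sum(stuck_share.values())
ceiling = 100 - total_stuck  # best possible share for a brand-new feature

print(f"stuck: {total_stuck:.2f}%  ceiling: {ceiling:.2f}%")
# stuck: 5.36%  ceiling: 94.64%
```

Under those assumptions a feature shipped today tops out just below 95% until enough of these users retire their devices, which is why the wait is measured in years.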

@atopal
Collaborator

atopal commented May 29, 2023

Thank you Dan! I did see that section, but I don't think it applies to IE11. IE11 was an actively supported browser with much more than 2% marketshare that was installed on new computers for almost a decade.

In the table you provided, there is a massive gap in time between 90% share and 95% share, from 22 to 48 months. Reasonable developers can decide that 5 points are not worth waiting 26 months for. Also, I haven't run the data, and I'm not sure you can, but my hunch is that getting to 90% would be much faster today than say 5 years ago, specifically because of IE11, and because I can see many positive developments to reduce fragmentation (eg. Safari on Mac now being OS independent) and none in the other direction.

That's why it takes four years to get to 95% share, and that's why it really, truly, isn't safe to adopt features after just two years from the day all major browsers start supporting it.

But it's never "really, truly safe" to adopt a "new" feature. There will always be people excluded, assuming that's what you mean by not safe. Most features never, ever get to 97%. Why is 97% not the safe threshold while 95% is? Why not 93%?

In the end, it's a judgement call, because developers and businesses have to balance limited resources and consider diminishing returns. I think one thing we have seen so far is that we neither have enough/reliable data about browser uptake, nor enough research on developer expectations to make definitive statements. That's worth spending time on; I don't think we need to be in a hurry here.

That said, I'm quite excited for all the gaps we have already identified and really appreciate everyone's effort to contribute to a better understanding of the situation. It's not a simple or easy problem to solve, but this is clearly a big web developer pain point and something we can have a positive impact on.

@dfabulich
Collaborator Author

dfabulich commented May 29, 2023

Also, I haven't run the data, and I'm not sure you can, but my hunch is that getting to 90% would be much faster today than say 5 years ago

I did run that data for 95%, but it's too sparse. Out of 110 features released in the last five years, only 74 of them have reached 95% share, because most of them just haven't had four years to mature!

specifically because IE11 and because I can see many positive developments to reduce fragmentation (eg. Safari on Mac now being OS independent) and none in the other direction.

Sadly, that's false. There is a recent major change in the other direction: Google Chrome withdrew support for Android 5 in 2021 and Android 6 in 2022. The plan is to withdraw support for Android 7 later this year, and Android 8 next year, in perpetuity. This policy will create persistent fragmentation in the Android ecosystem forever. 😭

More broadly, I'm afraid that a common view among the team here is that the purpose of Baseline is to signal to developers in a clear, convincing way that features are now mature in less than two years.

But, if your target market share is 95%, that's just not true; it probably never will be.

I think one thing we have seen so far is that we neither have enough/reliable data about browser uptake, nor enough research on developer expectations to make definitive statements.

I agree with half of that: we need to survey developers to find out their desired market share thresholds and how long they're willing to wait for it.

But I don't agree that we don't have enough/reliable data about browser uptake. We have enough data to tell us that 80% of features take four years to reach 95% share and two years to reach 90% share. That lines up with the data from the RUM Archive https://github.com/rum-archive/rum-archive that we've discussed in #190. I see no reason to think that additional data sources will give us a fundamentally different answer.

Now, we just need to pick a market share threshold, which we can do with a survey like the one I've drafted in #208. With that survey data, we'll have all the information we need to make a decision.

@styfle

styfle commented May 30, 2023

That's worth spending time on, I don't think we need to be in a hurry here.

I don’t want to use the word “hurry”, but I do think time is of the essence if you want developers to use Baseline, because it’s already visible on MDN docs. You can’t wait years and then change the definition, because developers are going to be upset or keep assuming the original definition. (For example, look at what happened to Twitter’s verified checkmark when it changed meaning.)

@foolip
Collaborator

foolip commented May 30, 2023

@dfabulich https://github.com/dfabulich/baseline-calculator is pretty cool, thanks for sharing!

I'm wondering if instead of computing a time to availability for each feature, if it's possible to compute the time to 90/95/98% availability at different points in time for a hypothetical feature that is enabled at the same time in all browsers? This is the worst case for a new feature, because any time a feature lands at different times, there's a "head start" for the early browsers when it lands in the final browser and we start counting.

In other words, for every month of the past 5 years, assuming a feature is available in every subsequent browser release, how many months does it take before that hypothetical feature is available to 90/95/98% of users? Both a per-browser number of months and an overall number would be interesting.

My expectation is that per-browser 95% answers will roughly match #190 (comment), and that the overall 95% availability delay in months will be very sensitive to the precise inputs for the browsers with the slowest upgrade rates, both what its market share is and how fast users upgrade.
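One way to read that question, as a sketch: model each browser by its market share and an upgrade curve (the fraction of its users on a release that includes the feature, k months after the feature ships everywhere at once), then find the first month where the share-weighted sum crosses a threshold. All shares and curves below are invented placeholders, not real data, and holding each curve's last value forever stands in for users who never upgrade.

```python
from typing import Optional

# Invented placeholder data: share of all users per browser (the missing
# 5% is assumed never to get the feature) ...
SHARES = {"chrome": 0.65, "safari": 0.25, "firefox": 0.05}
# ... and, for month k = 0, 1, 2, ..., the fraction of that browser's users
# on a release containing the feature. The last value is held forever.
CURVES = {
    "chrome": [0.0, 0.60, 0.90, 0.96, 0.98, 0.99],
    "safari": [0.0, 0.30, 0.55, 0.75, 0.85, 0.90],
    "firefox": [0.0, 0.50, 0.85, 0.93, 0.96, 0.98],
}


def availability(month: int) -> float:
    """Fraction of all users who have the feature `month` months after launch."""
    total = 0.0
    for browser, share in SHARES.items():
        curve = CURVES[browser]
        total += share * curve[min(month, len(curve) - 1)]
    return total


def months_until(threshold: float) -> Optional[int]:
    """First month at which availability reaches threshold, or None if the
    plateau stays below it (every curve is flat past its last entry)."""
    horizon = max(len(c) for c in CURVES.values())
    for month in range(horizon):
        if availability(month) >= threshold:
            return month
    return None


for t in (0.85, 0.90, 0.95):
    print(t, months_until(t))
```

With these made-up inputs the 95% answer is "never" purely because of the plateau values, which illustrates why the overall result is so sensitive to the slowest-upgrading browsers.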

@dfabulich
Collaborator Author

@foolip I'm not 100% sure what you're asking here. I've filed this as dfabulich/baseline-calculator#3 and maybe we can follow up there.

@dfabulich
Collaborator Author

dfabulich commented May 30, 2023

Good news! I've published a year-by-year cohort analysis on https://github.com/dfabulich/baseline-calculator, and it does show improvement over time.

I'm now convinced that we can define Baseline as "release + 30 months" rather than "release + 48 months".

Focusing just on the "80% of features" column, here are the results for all cohort years:

| Cohort year | 80% share | 90% share | 95% share | 97% share | 98% share | 99% share |
|---|---|---|---|---|---|---|
| 2015 | 19 months | 42 months | 54 months | 83 months | never | never |
| 2016 | 19 months | 36 months | 48 months | never | never | never |
| 2017 | 10 months | 24 months | 44 months | never | never | never |
| 2018 | 4 months | 16 months | 36 months | never | never | never |
| 2019 | 2 months | 14 months | 40 months | never | never | never |
| 2020 | 0 months | 8 months | 30 months | never | never | never |
| 2021 | 2 months | 10 months | never | never | never | never |
| 2022 | 2 months | never | never | never | never | never |
| 2023 | never | never | never | never | never | never |

Based on this, it seems plausible that when defining Baseline by a number N months, we should consider only the latest cohort year for determining "N".

80% of 2015's Baselineable features took 54 months to reach 95% market share. 80% of 2020's Baselineable features took just 30 months to get there.

Based on this data, we should define Baseline based on the latest cohort year in which 80% of features have achieved 95% market share. Currently, that's 30 months for 2020, but we can update this in future years based on new data. (We don't yet have 30 months of data for features released in 2021.)
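The selection rule, "the latest cohort year in which 80% of features have achieved 95% market share", can be written out directly. The values are the 95%-share column of the cohort table above (`None` where the table says "never"); the function name is just for illustration.

```python
from typing import Optional

# Months for 80% of each cohort's features to reach 95% market share,
# copied from the cohort table above; None = "never" (not yet reached).
MONTHS_TO_95 = {
    2015: 54, 2016: 48, 2017: 44, 2018: 36, 2019: 40,
    2020: 30, 2021: None, 2022: None, 2023: None,
}


def latest_cohort(
    months_by_year: dict[int, Optional[int]],
) -> Optional[tuple[int, int]]:
    """Return (year, months) for the most recent cohort with a known value."""
    known = [(year, m) for year, m in months_by_year.items() if m is not None]
    return max(known) if known else None  # tuples compare by year first


print(latest_cohort(MONTHS_TO_95))  # (2020, 30)
```

Once the 2021 cohort reaches the threshold, updating its entry is enough for the same function to pick up the new N.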

@tidoust
Member

tidoust commented Dec 18, 2023

Closing this issue as generally addressed: The group rolled back on the "2+ major versions" approach and adopted a new baseline definition, thanks to the research conducted here (and elsewhere). Additional issues may be created to track more specific points.
