
Marketshare: According to what/who/how? #190

Open

bkardell opened this issue May 17, 2023 · 23 comments

Comments

@bkardell

Separating this from #174, though I expect the best answer to 174 could rely on it.
In fact, almost countless decisions are made (even, for example, about selling ads or default search) in large part based on how many users a browser has. So we wind up with lots of people trying their honest best to serve users and make good decisions, but with no real hope of knowing. Without facts that we can agree on and trust, it seems impossible for anyone to make good decisions here.

As it relates to Baseline in particular, there are almost infinitely many ways to slice the data and draw the line - but discussing whether any of them is really 'good' requires agreement on a trustworthy data source.

It's not clear to me that such a thing really exists. What we cite today, as I understand it, relies on trackers, "popular sites" lists, and UA string inspection (which is literally full of lies, especially for browsers that would be undercounted, and especially on popular sites). Most of those are things that I personally would like less of, so I don't think these methods are going to get more useful, only less. I could be wrong! I would love to understand how, if so!

I'm not sure there has ever been a truly great way to know this. Of course, a web browser could collect some fairly easy data point itself, sending a ping once a week or something if it is used - but then who would believe a browser self-reporting? I think it's also important that we have a way to measure the "outliers" - not just how many Chrome, Safari, and Firefox versions there are. Maybe an "everything else" bucket is OK, but it would be great to have a way to really count these.

@foolip
Collaborator

foolip commented May 18, 2023

On the UA string, at least Chrome, Edge, Firefox, Opera, Safari, Samsung Internet, and UC Browser can be distinguished by the UA string. AIUI Brave and Vivaldi cannot be distinguished from Chrome.

I think the hardest problem here is which sites' logs count as representative of the global population. Wikipedia comes to mind, but is it representative on its own? If we had stats from many big sites, a general picture would emerge, but could we turn it into concrete numbers that would be acceptable for use in the definition of what features are broadly available? It seems fraught, but not impossible.

@foolip
Collaborator

foolip commented May 18, 2023

FWIW, even if we had this data, I think we ought not use it as the direct input to what features are broadly available. An obvious problem is that usage fluctuates, and we don't want features to flicker in and out of Baseline as this happens. Some level of indirection is needed.

One possibility is per-browser models of release uptake or "decay" of old releases. It probably looks like S-curve adoption of the new and exponential decay for old browser versions. And each browser vendor is in the best position to define this for their own browser.

Given this, we could model when a feature is available to 95% of each browser's users. In practice it would mean >95% overall availability, but without having to use a weight (market share) for each browser.
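To make the idea concrete, here is a minimal sketch of such a decay model in Python. The 30-day half-life is a made-up illustrative parameter, not a number from any vendor; a real model would use per-browser uptake curves as described above.

```python
import math

def share_on_version_or_newer(release_age_days, half_life_days=30.0):
    """Share of a browser's users on a release that shipped
    `release_age_days` ago, or on anything newer, assuming users
    migrate off older releases with a fixed half-life."""
    if release_age_days <= 0:
        return 0.0
    # Fraction still on releases OLDER than this one decays exponentially.
    older = 0.5 ** (release_age_days / half_life_days)
    return 1.0 - older

def days_until_coverage(target=0.95, half_life_days=30.0):
    """Days after a release ships until `target` of that browser's users have it."""
    return -half_life_days * math.log2(1.0 - target)

print(round(days_until_coverage(0.95, 30.0), 1))  # ≈ 129.7 days with a 30-day half-life
```

Under this toy model, a feature reaches 95% of a browser's users roughly four half-lives after it ships, regardless of that browser's market share weight.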

@bkardell
Author

> FWIW, even if we had this data, I think we ought not use it as the direct input to what features are broadly available. An obvious problem is that usage fluctuates, and we don't want features to flicker in and out of Baseline as this happens. Some level of indirection is needed.

I would argue that if this can easily happen then that definition of baseline is kind of broken or not especially useful.

> On the UA string, at least Chrome, Edge, Firefox, Opera, Safari, Samsung Internet, and UC Browser can be distinguished by the UA string. AIUI Brave and Vivaldi cannot be distinguished from Chrome.
>
> I think the hardest problem here is which sites' logs count as representative for the global population

Is it conceivable that we could come up with a way that doesn't use specific regular sites or UA strings? Idk, I don't have a real suggestion but both of those seem/always have seemed to me problematic.

I feel like there aren't "representative sites", and browsers (especially the ones that don't control an overly large market share) sometimes (more often than I realized) have to lie in their UA strings to sites that are popular.

@romainmenke
Contributor

> Is it conceivable that we could come up with a way that doesn't use specific regular sites or UA strings? Idk, I don't have a real suggestion but both of those seem/always have seemed to me problematic.

You can test actual features in a client-side script and gather stats on the passes/failures. Each test would be minimal (e.g. `'IntersectionObserver' in self` or `CSS.supports('selector(:has(> *))')`).

Distribution could be an iframe that anyone can add to their sites.

It wouldn't be unbiased, but it would have different biases that maybe are less bad.

@foolip
Collaborator

foolip commented May 18, 2023

> FWIW, even if we had this data, I think we ought not use it as the direct input to what features are broadly available. An obvious problem is that usage fluctuates, and we don't want features to flicker in and out of Baseline as this happens. Some level of indirection is needed.
>
> I would argue that if this can easily happen then that definition of baseline is kind of broken or not especially useful.

Yes, this is why I don't consider the seemingly simple "available to 95% of users" an option. (Nobody has proposed it, I'm not arguing against anyone here.)

The current definition of "2+ major releases" has other downsides, but not flip-flopping.

@bkardell
Author

I feel like we're understanding that differently somehow. I was suggesting that we'd have to define it, still using real numbers, in such a way that it didn't flip-flop easily. That shouldn't be hard to do.

But actually, a reversal isn't necessarily a bug at some point, right? It could happen over time, even with the releases-based version, and if it did, we should admit that.

@foolip
Collaborator

foolip commented May 23, 2023

Are you thinking of features that are removed from the platform, or are there other cases where a widely supported feature is no longer widely supported? The only condition I can think of is when a browser without support gains a lot of users fast, but is there a concrete case?

@atopal
Collaborator

atopal commented May 25, 2023

I guess it can (very rarely) happen when a feature that had shipped in all major browsers is removed, in which case, yes, we'd have to remove the Baseline indicator for that feature. It's much harder when it comes to the "Baseline 24" named feature set, which is supposed to be a fixed feature set, discussed in #176.

Re share: browser versions are only proxies for what is, in the end, market share, and I agree that we're lacking actual reliable data on that. Maybe a first step would be to document the current state and the related issues?

And maybe there are multiple problems to solve here: one of market share per browser (which might be impossible in the end), but also versions used as a share of a given browser, as @foolip said, which might be much more tractable.

@romainmenke
Contributor

> Much harder when it comes to the "Baseline 24" named feature set, that is supposed to be a fixed feature set

Maybe it shouldn't be a fixed set?

There's a nuance between "passes Baseline requirements since 2024" and "passes Baseline requirements of 2024".

@atopal
Collaborator

atopal commented May 25, 2023

Yes, that's definitely one way to handle it, and maybe the only real option, but it removes some of the utility that comes from a fixed set. Let's continue this conversation in #176.

@dfabulich
Collaborator

As far as I know, the most widely used data source for browser version market share is StatCounter.

https://gs.statcounter.com/browser-version-market-share

This is the data source that caniuse.com uses. Browserslist, in turn, pulls its data from caniuse.com, so it's StatCounter all the way down. But that's why Browserslist and caniuse allow you to submit your own data from Google Analytics.

@foolip
Collaborator

foolip commented May 26, 2023

Is anyone aware of any large web properties that make their browser stats publicly available?

@romainmenke
Contributor

https://radar.cloudflare.com/adoption-and-usage

This has some bits of info.
It doesn't have any version info, but it does have a distribution of mobile vs. desktop.

Chrome 30.2%
Chrome Mobile 26.6%
Mobile Safari 11.2%
Chrome Mobile Webview 6.2%
Firefox 5.4%
Edge 4.9%
Facebook 3.4%
Safari 3.2%

@JakeChampion we're interested in stats on browser version usage and uptake of new versions.

As far as I know, no stats are gathered for polyfill.io.
This is likely still true after the move to Fastly?

@JakeChampion

> @JakeChampion we're interested in stats on browser version usage and uptake of new versions.
>
> As far as I know no stats are gathered for polyfill.io
> This is likely still true after the move to fastly?

That's correct

@atopal
Collaborator

atopal commented May 28, 2023

> As far as I know, the most widely used data source for browser version market share is StatCounter.
>
> https://gs.statcounter.com/browser-version-market-share
>
> This is the datasource that caniuse.com uses. Browserslist, in turn, pulls its data from caniuse.com, so it's StatCounter all the way down. But that's why Browserslist and Caniuse allow you to submit your own data from Google Analytics.

As @romainmenke mentioned, Browserslist doesn't provide per-version data for mobile browsers (not sure why), so caniuse.com doesn't use that as the source. @Fyrd, would you mind sharing where the mobile usage data on caniuse.com comes from?

@dfabulich
Collaborator

The issue about mobile Android Chrome versions is filed here: Fyrd/caniuse#3518

As I said:

  • StatCounter doesn't provide version data for mobile browsers. You'd have to ask them why. (They're hard to get a hold of.)
  • Caniuse uses StatCounter data, and that's why they don't provide mobile version data.
  • Browserslist ("browsers list", plural) uses Caniuse, and that's why they don't provide mobile version data.

@tunetheweb

tunetheweb commented May 29, 2023

Last year Akamai launched RUMArchive, a public BigQuery dataset of their (anonymised) traffic, including the browser and version used.

For example, this query (1.94 GB so pretty cheap) gets the latest browser versions for all of April:

SELECT DEVICETYPE, USERAGENTFAMILY, USERAGENTVERSION, SUM(BEACONS) AS BEACONCOUNT
FROM `akamai-mpulse-rumarchive.rumarchive.rumarchive_page_loads`
WHERE date >= '2023-04-01' AND date < '2023-05-01'
GROUP BY DEVICETYPE, USERAGENTFAMILY, USERAGENTVERSION
ORDER BY BEACONCOUNT DESC

Which gives this result:

DEVICETYPE USERAGENTFAMILY USERAGENTVERSION BEACONCOUNT
Mobile Mobile Safari 16 1,514,039,616
Mobile Mobile Safari UI/WKWebView 16 646,227,344
Desktop Chrome 112 645,050,437
Desktop Chrome 111 586,880,206
Mobile Chrome Mobile 112 436,583,977
Desktop Edge 112 245,407,639
Mobile Chrome Mobile 111 239,950,871
Mobile Mobile Safari 15 178,082,529
Mobile Chrome Mobile WebView 111 165,569,914
Tablet Mobile Safari 16 164,934,334
Desktop Safari 16 142,521,826
Desktop Edge 111 106,985,538
Mobile Samsung Internet 20 95,654,823
Mobile Chrome Mobile WebView 112 92,066,387
Mobile Chrome Mobile iOS 112 86,763,013
Mobile Facebook   76,041,176
Desktop Chrome 109 69,526,101
Mobile Mobile Safari UI/WKWebView 15 63,244,791
Desktop Safari 15 58,525,750
Desktop Firefox 111 50,272,708

Full data for that query here: https://docs.google.com/spreadsheets/d/14KiJdLG5iEtYoUp1E_mpLBxHBAsvUPbOZlNK6RKj6H8/edit#gid=1816182137

Probably best reaching out to @nicjansma if you want more details (or to verify my query!).

@foolip
Collaborator

foolip commented May 29, 2023

@tunetheweb that's very cool, thank you!

Pardon my clumsy SQL, but I managed to query the version breakdown of Chrome and Safari mobile:

WITH versions AS (
  SELECT USERAGENTVERSION, SUM(BEACONS) AS VERSIONBEACONS
  FROM `akamai-mpulse-rumarchive.rumarchive.rumarchive_page_loads`
  WHERE date >= '2023-04-01' AND date < '2023-05-01' AND USERAGENTFAMILY = 'Chrome Mobile' # or 'Mobile Safari'
  GROUP BY USERAGENTVERSION
), total AS (SELECT SUM(VERSIONBEACONS) AS TOTALBEACONS FROM versions)
SELECT USERAGENTVERSION, VERSIONBEACONS, (100 * VERSIONBEACONS / TOTALBEACONS) AS PERCENTAGE
FROM versions, total
ORDER BY VERSIONBEACONS DESC

The top 10 rows for Chrome Android:

Chrome version Beacons Percentage
112 436657626 58.10
111 240050542 31.94
110 12100592 1.61
106 7692788 1.02
109 7590241 1.01
108 7327194 0.97
104 4437725 0.59
107 3565258 0.47
103 3104149 0.41
99 2767815 0.37

And Safari iOS:

Safari version Beacons Percentage
16 1678974033 84.19
15 221756814 11.12
14 44911843 2.25
13 39042738 1.96
12 7247390 0.36
10 912379 0.05
11 678644 0.03
9 426731 0.02
  125032 0.01
7 86954 0.00

According to this data then, Chrome 111+112 make up 90%, and Chrome 106-112 make up 95%.

For Safari iOS, 15+16 make up 95%, and I've filed rum-archive/rum-archive#16 about minor versions here.
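The 95% cut-off above can be recomputed from the percentage column with a few lines of Python. This is a sketch using only the top-10 Chrome Mobile rows from the table; the tail beyond those rows is ignored, and the helper name is mine, not part of any tool mentioned here:

```python
def min_recent_range(shares, target=95.0):
    """Walk from the newest version downward and return the oldest
    version needed (and the share covered) to reach `target` percent.
    `shares` maps version number -> percent of usage."""
    total = 0.0
    for version in sorted(shares, reverse=True):
        total += shares[version]
        if total >= target:
            return version, total
    return None

# Chrome Mobile percentages from the table above (top 10 versions only).
chrome_mobile = {112: 58.10, 111: 31.94, 110: 1.61, 109: 1.01, 108: 0.97,
                 107: 0.47, 106: 1.02, 104: 0.59, 103: 0.41, 99: 0.37}

oldest, covered = min_recent_range(chrome_mobile)
print(oldest, round(covered, 2))  # → 106 95.12
```

This matches the "Chrome 106-112 make up 95%" figure above.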

@foolip
Collaborator

foolip commented May 29, 2023

Counting how many releases are needed to cover 95% of users on a bunch of browsers, according to the above RUMArchive data:

  • Chrome desktop: 13 (100-112, 1 year apart)
  • Chrome mobile: 7 (106-112, 6 months apart, per above)
  • Edge: 3 (110-112, 2 months apart)
  • Firefox desktop: 12 (102-112, 10 months apart, due to ~6% being on 102, which is ESR)
  • Firefox mobile: 8 (106-112, 6 months apart)
  • Safari desktop: 4 major releases (13-16, 3 years apart, but see Track Safari minor versions rum-archive/rum-archive#16)
  • Safari mobile: 2 major releases (15+16, 1 year apart, per above)

I don't know anything about what kinds of users are over- and underrepresented in this data, but it's a data point at least.

@foolip
Collaborator

foolip commented Jun 3, 2023

In dfabulich/baseline-calculator#7 I've reported that something seems quite strange with the Firefox version breakdown that caniuse has, which I believe is from StatCounter.

I think a useful exercise (for all browsers) would be to compare version breakdown between sources (rumarchive and statcounter currently) and look for differences in the shape of the distribution, in particular in how fat the long tail is. This makes a huge difference to any availability calculation, and has a bigger impact the closer to 100% we want to get.
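One simple way to quantify "how fat the long tail is" per source is the share of usage not covered by the top-k versions. A toy sketch with made-up share lists (not real StatCounter or RUMArchive numbers) to illustrate the comparison:

```python
def tail_mass(shares, top_k):
    """Percent of usage NOT covered by the `top_k` most-used versions.
    `shares` is a list of per-version percentages summing to ~100."""
    ranked = sorted(shares, reverse=True)
    return 100.0 - sum(ranked[:top_k])

# Hypothetical per-version shares (percent) for two data sources:
# the same leading versions, but very different tail shapes.
source_a = [58.0, 32.0, 2.0, 1.5, 1.0, 1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 1.0]
source_b = [45.0, 25.0, 6.0, 4.0, 3.0, 2.5, 2.0, 1.8, 1.5, 1.2, 1.0, 7.0]

for name, shares in (("A", source_a), ("B", source_b)):
    print(name, round(tail_mass(shares, 2), 1))  # A → 10.0, B → 30.0
```

A source shaped like B needs far more versions to reach any coverage threshold near 100%, which is exactly why the tail shape dominates the availability calculation.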

@bluesmoon

Keep in mind that the RUM Archive has usage data, not users. Each user has multiple hits.

@foolip
Collaborator

foolip commented Jun 8, 2023

https://en.wikipedia.org/wiki/Usage_share_of_web_browsers has some good information. From there I found https://analytics.usa.gov/, which at first glance seems like it should be representative of the USA, but the site itself says that 97.7% of traffic is international, which seems strange. But it does say "Visitor Locations Right Now" and also "Realtime data may not be accurate", so maybe it's a temporary hiccup.

@foolip
Collaborator

foolip commented Oct 31, 2023

I've taken a look at the StatCounter data (via caniuse) in a similar way to #190 (comment) to see how many versions are needed to reach 95% of each browser individually, directly from the data and ignoring caniuse features in the analysis. I've sent dfabulich/baseline-calculator#10 and created a spreadsheet to explore, and found that you need these version ranges to get to 95%:

  • Chrome 83-117 (3 years and 4 months apart)
  • Edge 111-115 (4 months apart)
  • Firefox 78-116 (3 years and 1 month apart)
  • Safari 13.1-17 (3 years and 6 months apart)

For Chrome and Firefox, these ranges are much wider than in #190 (comment). I don't have a theory for why, and I don't know which data source is closer to the truth.
