Travis-CI Node version usage stats. #6659

Closed
jdalton opened this issue Sep 30, 2016 · 7 comments

@jdalton jdalton commented Sep 30, 2016

I was wondering if the Travis-CI team had insights into its Node 0.10 usage?
Usage stats help developers gauge when to drop support for older Node versions.

👉 It's good to know the extent to which Node version usage stats are distorted by CIs.

Context:

From time to time @seldo, of npm, posts npm usage stats for various Node versions.
In a recent update, usage for Node 0.10, which drops out of Node support in October,
was ~13%.

Drilling in a bit more, it appears 90% of those on 0.10 come from Linux (maybe from Travis?).
@thealphanerd

@jdalton jdalton commented Sep 30, 2016

Using the Google BigQuery GitHub dataset, I created a query to rank the Node versions specified in .travis.yml files:

SQL query:
SELECT 
  COUNT(*) as cnt, version
FROM 
  JS(
    (SELECT content FROM [bigquery-public-data:github_repos.contents] WHERE id IN (
      SELECT id FROM [bigquery-public-data:github_repos.files] WHERE RIGHT(path, 11) = ".travis.yml" 
    )),
    content,
    "[{ name: 'version', type: 'string'}]",
    "function(row, emit) {
       var yml = row.content || '';
       var match = yml.match(/^\s*node_js\s*:((?:\s*-.+)+|\s*\[[\s\S]+?\])/m);
       var snippet = match ? match[1] : '';
       var cleaned = snippet.replace(/#.+/g, '');
       var versions = cleaned.match(/[\w.&*]+/g) || [];

       versions.forEach(function(ver) {
         var match = ver.match(/^v?([0.]*\d+)/);
         var snippet = match ? match[1] : '';
         var cleaned = snippet.replace(/\b0{2,}\b/g, '0').replace(/^\./, '0.');
         var major = cleaned || ver.replace(/\bio\.js\b/g, 'iojs');

         emit({ version: major });
       });
     }" 
  )
GROUP BY version
ORDER BY cnt DESC
LIMIT 50
The results show a large number of configs targeting 0.10, which roughly matches the npm usage stats.

@seldo
Is there a way to filter out the IP ranges of Travis and other CIs from the npm usage data?
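
For anyone who wants to sanity-check the version extraction without running BigQuery, here is a minimal standalone sketch of the same logic in plain JavaScript. The function names `extractNodeVersions` and `countVersions` are made up for illustration, and the sample configs are hypothetical, not taken from the dataset.

```javascript
// Sketch only: mirrors the regexes from the UDF in the query above.
// extractNodeVersions and countVersions are illustrative names.
function extractNodeVersions(yml) {
  // Grab the node_js block, either a dash list or an inline [a, b] array.
  var match = yml.match(/^\s*node_js\s*:((?:\s*-.+)+|\s*\[[\s\S]+?\])/m);
  var snippet = match ? match[1] : '';
  // Drop trailing comments, then split into version-like tokens.
  var cleaned = snippet.replace(/#.+/g, '');
  var versions = cleaned.match(/[\w.&*]+/g) || [];

  return versions.map(function(ver) {
    // Keep "0.x" for the 0.x line, otherwise only the major number.
    var m = ver.match(/^v?([0.]*\d+)/);
    var num = m ? m[1] : '';
    var normalized = num.replace(/\b0{2,}\b/g, '0').replace(/^\./, '0.');
    // Non-numeric aliases (node, stable, iojs, ...) pass through as-is.
    return normalized || ver.replace(/\bio\.js\b/g, 'iojs');
  });
}

// Equivalent of GROUP BY version / COUNT(*) over a list of config strings.
function countVersions(configs) {
  var counts = {};
  configs.forEach(function(yml) {
    extractNodeVersions(yml || '').forEach(function(version) {
      counts[version] = (counts[version] || 0) + 1;
    });
  });
  return counts;
}

// Hypothetical sample configs, invented for this example.
var samples = [
  "language: node_js\nnode_js:\n  - '0.10'\n  - 4.2.1\n  - v6\n  - iojs\n",
  "node_js: [0.10, 4]"
];
console.log(countVersions(samples));
```

Like the query, this counts each version once per config it appears in, so it measures how many configs list a version rather than how many builds actually ran.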

@Joshua-Anderson Joshua-Anderson commented Oct 1, 2016

Travis CI runs on aws and gcs, so I think filtering IPs is probably impossible to do without excluding all the production applications that run on these platforms.

@jdalton jdalton commented Oct 1, 2016

@Joshua-Anderson

Travis CI runs on aws and gcs, so I think filtering IPs is probably impossible to do without excluding all the production applications that run on these platforms.

Could you double-check?

Via @seldo:

status/782037773150629888:

Their IPs are extremely easy to detect if you are the npm registry :-)

status/782038480993996800:

They have 4 main IPs, linked to http://travis-ci.com hostnames, that account for 10% of traffic.

@Joshua-Anderson Joshua-Anderson commented Oct 1, 2016

@jdalton You're right, I'm a little out of date: https://docs.travis-ci.com/user/ip-addresses/
Travis-CI doesn't give an explicit range for GCS yet, though.

@jdalton jdalton commented Oct 1, 2016

@Joshua-Anderson

To summarize docs.travis-ci.com/user/ip-addresses,
there are 5 published entries (4 Linux; 1 OS X) plus unknown Linux IPs?

As of mid-March 2016:

Container-based (travis-ci.com):
  1) 54.172.141.90/32
  2) 52.3.133.20/32

Container-based (travis-ci.org):
  3) 52.0.240.122/32
  4) 52.22.60.255/32

OS X:
  5) 208.78.110.192/27

Sudo-enabled Linux:
  N/A

👉 The unknown is the IPs for sudo-enabled Linux.

I ran a quick query to see how many Node Travis configs are sudo-enabled.

There's a gotcha, though. The docs state that the default for an unspecified sudo changed sometime in 2015, and this query doesn't account for that: any config without a sudo key is counted as sudo-enabled. I'm also not sure when in 2016 the snapshot of GitHub data was taken.

SQL query:
SELECT 
  COUNT(*) as cnt, sudo
FROM 
  JS(
    (SELECT content FROM [bigquery-public-data:github_repos.contents] WHERE id IN (
      SELECT id FROM [bigquery-public-data:github_repos.files] WHERE RIGHT(path, 11) = ".travis.yml" 
    )),
    content,
    "[{ name: 'sudo', type: 'boolean'}]",
    "function(row, emit) {
       var yml = row.content || '';
       var isNode = /^\s*node_js\s*:\s*(?:-.+|\[[\s\S]+?\])/m.test(yml);

       if (isNode) {
         var match = yml.match(/^\s*sudo\s*:\s*(.+)/m);
         var snippet = match ? match[1] : '';
         var cleaned = snippet.replace(/#.+/g, '').trim();

         var sudo = cleaned != 'false' && cleaned != 'no';
         emit({ sudo: sudo });
       }
     }" 
  )
GROUP BY sudo
ORDER BY cnt DESC

The results show considerably more sudo-enabled configs, about 2.6x as many as non-sudo ones.
In other words, more of them map to IPs that aren't published than to IPs that are.
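
The sudo check used in the query can be pulled out into a plain function to see how it classifies a config. This is only a sketch: `isSudoEnabled` and `tallySudo` are illustrative names, the sample configs are invented, and, as with the query, a config with no `sudo` key lands in the sudo-enabled bucket.

```javascript
// Sketch only: the same regexes as the sudo query, runnable outside BigQuery.
// isSudoEnabled and tallySudo are illustrative names.
function isSudoEnabled(yml) {
  var match = yml.match(/^\s*sudo\s*:\s*(.+)/m);
  var snippet = match ? match[1] : '';
  var cleaned = snippet.replace(/#.+/g, '').trim();
  // Anything other than an explicit false/no counts as sudo-enabled,
  // including a missing sudo key (cleaned is '' in that case).
  return cleaned != 'false' && cleaned != 'no';
}

// Equivalent of GROUP BY sudo / COUNT(*), restricted to Node configs.
function tallySudo(configs) {
  var counts = { sudo: 0, nonSudo: 0 };
  configs.forEach(function(yml) {
    var isNode = /^\s*node_js\s*:\s*(?:-.+|\[[\s\S]+?\])/m.test(yml);
    if (!isNode) {
      return;
    }
    if (isSudoEnabled(yml)) {
      counts.sudo += 1;
    } else {
      counts.nonSudo += 1;
    }
  });
  return counts;
}

// Hypothetical sample configs, invented for this example.
var samples = [
  "language: node_js\nnode_js:\n  - 0.10\nsudo: false",
  "node_js: [4]\n",
  "node_js:\n  - 6\nsudo: required # docker",
  "language: ruby\nrvm:\n  - 2.2\nsudo: required"
];
console.log(tallySudo(samples));
```

Because the second sample has no `sudo` key, it is counted as sudo-enabled, which is the direction in which the 2015 default change would inflate the sudo bucket.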

Filtering down to just 0.10, sudo-enabled configs outnumber the rest by more than 3.5x.
In other words, considerably more 0.10 runs are undetectable than detectable.

SQL query:
SELECT 
  COUNT(*) as cnt, sudo
FROM 
  JS(
    (SELECT content FROM [bigquery-public-data:github_repos.contents] WHERE id IN (
      SELECT id FROM [bigquery-public-data:github_repos.files] WHERE RIGHT(path, 11) = ".travis.yml" 
    )),
    content,
    "[{ name: 'sudo', type: 'boolean'}]",
    "function(row, emit) {
       var yml = row.content || '';
       var match = yml.match(/^\s*node_js\s*:((?:\s*-.+)+|\s*\[[\s\S]+?\])/m);
       var snippet = match ? match[1] : '';
       var cleaned = snippet.replace(/#.+/g, '');
       var versions = cleaned.match(/[\w.&*]+/g) || [];

       versions = versions.map(function(ver) {
         var match = ver.match(/^v?([0.]*\d+)/);
         var snippet = match ? match[1] : '';
         var cleaned = snippet.replace(/\b0{2,}\b/g, '0').replace(/^\./, '0.');
         var major = cleaned || ver.replace(/\bio\.js\b/g, 'iojs');
         return major;
       });

       if (versions.indexOf('0.10') < 0) {
         return;
       }
       var match = yml.match(/^\s*sudo\s*:\s*(.+)/m);
       var snippet = match ? match[1] : '';
       var cleaned = snippet.replace(/#.+/g, '').trim();

       var sudo = cleaned != 'false' && cleaned != 'no';
       emit({ sudo: sudo });
     }" 
  )
GROUP BY sudo
ORDER BY cnt DESC

Does that gel with Travis' data?

@gr2m gr2m commented Oct 1, 2016

This is great work, thanks for doing this!

@stale stale bot commented Apr 14, 2018

Thanks for contributing to this issue. As it has been 90 days since the last activity, we are automatically closing the issue. This is often because the request was already solved in some way and it just wasn't updated or it's no longer applicable. If that's not the case, please do feel free to either reopen this issue or open a new one. We'll gladly take a look again! You can read more here: https://blog.travis-ci.com/2018-03-09-closing-old-issues

@stale stale bot added the stale label Apr 14, 2018
@stale stale bot closed this Apr 15, 2018