
Create lists of commands to test coverage parity against #1070

Open
waldyrious opened this issue Sep 17, 2016 · 41 comments
Assignees
Labels
documentation (Issues/PRs modifying the documentation), tooling (Helper tools, scripts and automated processes)

Comments

@waldyrious
Member

No description provided.

waldyrious added the new command (Issues requesting creation of a new page) label on Sep 17, 2016
waldyrious self-assigned this on Sep 17, 2016
@leostera
Contributor

leostera commented Sep 17, 2016

We should absolutely leverage the online linux man pages to periodically fetch a big, big list of commands.

Sample: http://linux.die.net/man/1/ has almost 10,000 commands.

leostera added the tooling (Helper tools, scripts and automated processes) label on Sep 17, 2016
@waldyrious
Member Author

waldyrious commented Sep 17, 2016

We could make separate projects to track commands based on platform (since they overlap, we can't use milestones, which is a pity since it would give us a nice progress bar)

Linux:

Windows:

OS X:

@be5invis

be5invis commented Jan 16, 2017

@waldyrious For Windows, commands in CMD and PowerShell are DIFFERENT. For example, dir is a CMD built-in, also an alias of Get-ChildItem in PowerShell.
(Even ls in PowerShell is an alias, though they are going to remove it.)

@waldyrious
Member Author

@be5invis thanks for bringing that up. It is certainly something we need to consider (e.g. we currently treat all linuxes the same, even though some of the commands are shell-specific). See #190 and #816 for previous discussion.

That said, that problem does not affect this issue: the former deals with how we organize the command pages we do have, while this issue is about identifying which commands we don't yet have, but should.

@be5invis

be5invis commented Jan 16, 2017

@waldyrious The full PowerShell commands on my PC:
https://gist.github.com/be5invis/57d906e6f6935f7a1f19279878c2c214

@sbrl
Member

sbrl commented May 4, 2017

If the tldr client emits a different exit status depending on whether the page exists or not (like tldr-bash-client does), then we could have a semi-automatic bash script that runs through a list of commands and emits a list of those that don't exist yet. I could even write something like that & create a gist quite easily.

It would certainly help people who want to contribute find a page that needs doing.
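The exit-status approach described above could be sketched roughly like this (a sketch, not a definitive implementation: the checker command and file names are assumptions, and it only works with clients that signal a missing page via their exit code, as tldr-bash-client does):

```shell
# list_missing CHECKER LIST: print each command from LIST for which
# CHECKER exits non-zero. CHECKER stands in for a tldr client that
# signals a missing page via its exit status (an assumption here;
# not every client does this).
list_missing() {
    local checker="$1" list="$2" cmd
    while read -r cmd; do
        "$checker" "$cmd" > /dev/null 2>&1 || echo "$cmd"
    done < "$list"
}

# Usage (illustrative): list_missing tldr commands.txt > missing.txt
```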

@agnivade
Member

agnivade commented May 4, 2017

You can always check the files present in the repo itself for parity, no?

@sbrl
Member

sbrl commented May 4, 2017

@agnivade Yeah, we could do that too! Do a git clone and then a `find tldr -iname "command.md"` or something
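A minimal sketch of that repo-based check (directory layout and file names are assumptions):

```shell
# page_status LIST PAGES_DIR: report, for each command name in LIST,
# whether a matching page file exists anywhere under PAGES_DIR.
page_status() {
    local list="$1" pages="$2" cmd
    while read -r cmd; do
        if find "$pages" -name "$cmd.md" | grep -q .; then
            echo "have: $cmd"
        else
            echo "need: $cmd"
        fi
    done < "$list"
}

# Usage (illustrative):
#   git clone https://github.com/tldr-pages/tldr
#   page_status commands.txt tldr/pages
```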

@sbrl
Member

sbrl commented Jun 29, 2017

Executing

var els = document.querySelectorAll("dt a[href]");
var cmds = [];
for(let el of els) cmds.push(el.innerText);
cmds.join("\n");

on http://linux.die.net/man/1/, gives this file: linux-commands.txt

This is obviously pending sorting, which I'll do soon.

@sbrl
Member

sbrl commented Jul 1, 2017

Sorting complete! Here's what I came up with:

cat linux-commands.txt | xargs -P4 -I {} bash -c 'if [[ "$(find tldr/pages/ -name {}.md | wc -l)" -ne 0 ]]; then echo yep>>yeses.txt; else echo nope>>nos.txt; fi'
echo We have $(cat yeses.txt | wc -l) out of $(cat linux-commands.txt | wc -l) commands in tldr-pages -  $(cat nos.txt | wc -l) commands are missing.

Running the above reveals that:

  • We've got 328 commands documented
  • We've got 9497 to go
  • We've done ~3.34% so far

@waldyrious
Member Author

I wonder if, after we've compiled one or more lists of commands to add, we could somehow calculate the completeness percentages automatically and display them in the README with a badge.

If we do compile multiple lists, we could even organize the completion badges in a table to provide a dashboard similar to the progress table of Wikipedia's WikiProject Missing encyclopedic articles.

Does anyone have an idea whether something like that is doable and/or hints about how to go about implementing it?

@agnivade
Member

agnivade commented Sep 4, 2017

I would like to take a stab at this. I am thinking of just taking the GNU coreutils list and testing parity against it. The linux.die.net page contains a lot of commands that have to be installed separately.

The badge thing can be easily done with a custom svg element.
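One low-effort way to get such a badge (an assumption on my part: using shields.io rather than a hand-rolled SVG) would be to compute the percentage and build a static badge URL from it:

```shell
# badge_url HAVE TOTAL: print a shields.io static-badge URL showing
# the completion percentage. Integer arithmetic; shields.io is an
# assumption here, since the comment above only mentions a custom SVG.
badge_url() {
    local pct=$(( 100 * $1 / $2 ))
    echo "https://img.shields.io/badge/coverage-${pct}%25-blue.svg"
}

# e.g. badge_url 328 9825
```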

@waldyrious
Member Author

@agnivade I can't wait to see what you come up with! I'm more than willing to provide the actual content of the lists if that takes some work off your plate (I have a bunch of notes and links in a google doc, besides the resources I listed above).

@agnivade
Member

agnivade commented Sep 4, 2017

Sure, that would be great.

@sbrl
Member

sbrl commented Sep 4, 2017

Oooh, awesome :D

@agnivade
Member

@waldyrious - I might take a stab at it this weekend. Can you share the links/notes that you have ?

@waldyrious
Member Author

Sure. I'll block off one hour to work on this today, and will post the resulting data.

@waldyrious
Member Author

waldyrious commented Sep 15, 2017

Heads-up: the wiki page "Pages plan" has been deleted to centralize tracking of missing pages in this thread. I've moved all the information that was present there to this spreadsheet, which is publicly viewable and anyone can add comments. It's a work in progress (I just started it). I'll give write access to the current maintainers.

@sbrl
Member

sbrl commented Sep 15, 2017

@waldyrious Wow, that's an impressive spreadsheet! Is there a filter for just the ones that haven't been done yet? How are bulk lists of commands added to the list?

@waldyrious
Member Author

There will be a filter, yeah -- that's one of the reasons I've decided to build it in a spreadsheet. The lists of commands will be added manually (using various helper tools, of course), since the various sources don't use a common format. Let me know on Gitter if you'd like to work on this so we can coordinate.

@agnivade
Member

I am concerned about how I'd get the total list of commands programmatically, since I would like to run the check against every commit merged into master.

@waldyrious
Member Author

waldyrious commented Sep 15, 2017

That document is by no means meant to be the final location of the list. It's just the way I figured would be easiest to get it started and quickly filling it. I don't know yet what setup would be the best balance of (1) community maintenance of the data, (2) machine consumption of the contents, (3) automatic synchronization (as much as possible) as new pages are added. Ideas are welcome.

Also, the choice of how to set this up would depend on how often we would want to update the list. I think we can start with something reasonably static, to make things easier, especially since we have a lot of work to catch up to established commands before it would make sense to start chasing more dynamic lists (say, top node.js-based CLI tools or something like that)

@agnivade
Member

(3) automatic synchronization (as much as possible) as new pages are added.

Umm no .. I think you got the wrong idea. 😝 We don't need to synchronize when new pages are added. That would be crazy. It seems like you put a lot of effort into this. Frankly, I didn't need so much detail.

Here's what my plan is -

  • On every commit to master branch, run a script which will get all the commands in the repo, get a list of target commands we want to match our list with, calculate percentage and update the svg badge which shows the percent completion.

That's it. No need to update any list when new pages are added.
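That per-commit script might look something like this (a sketch under assumed file names; the real implementation could differ):

```shell
# coverage PAGES_DIR TARGET_LIST: print the percentage of commands
# from TARGET_LIST that have a page under PAGES_DIR. TARGET_LIST is
# the checked-in list of target commands described above.
coverage() {
    local have total
    # Page names with the .md suffix stripped, intersected with the
    # (deduplicated) target list via comm.
    have=$(find "$1" -name '*.md' -exec basename {} .md \; | sort -u \
           | comm -12 - <(sort -u "$2") | wc -l)
    total=$(sort -u "$2" | wc -l)
    echo "$(( 100 * have / total ))"
}

# A CI job could then run `coverage pages target-commands.txt` and
# regenerate the badge from the result.
```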

waldyrious added the documentation (Issues/PRs modifying the documentation) label and removed the new command (Issues requesting creation of a new page) label on Sep 15, 2017
@waldyrious
Member Author

Hahah, yeah, I got a little carried away there. Although I might have given off the wrong impression.

The way I was planning to have this "automatic sync" feature was to simply open one issue per command to add, and assign them to milestones according to the lists they appear in. That way we'd get a nice live overview page with progress bars for each of the lists we'd want to reach parity with. For reference, my inspiration came from the overview table of Wikipedia's WikiProject Missing encyclopedic articles.

In addition to one milestone per (major) source, we might also want platform-specific lists (Windows commands, BSD, etc.), and maybe topic-specific lists (email clients, text editors, compilers, etc.).

Of course, this doesn't prevent us from having a "master completeness list" and using that to compute a single "overall completeness" metric. We do need to decide what goes into that list, though. The obvious choice is a metric of the most popular commands (e.g. the top 1000 entries sorted by how many of those lists they appear in), but let me know if you think something else would make more sense.

@agnivade
Member

Your idea seems like a lot of manual work, something I personally would want to avoid. I was planning to just compute that "completeness" metric and be done with it. If we indeed decide that it's just gonna be 1000 commands, then we might as well compute the list and check it into the repo, so that my code can easily compare against it.

@waldyrious
Member Author

Sure, as I said, the list will probably not change much after we compile it. My idea is just a nice-to-have I might do on my own later on (unless you guys object).

For the master list, we just need to decide what criteria we'll use to define its contents -- from there it's just a matter of collecting the rest of the data and applying the filters.

So in that regard, what are your thoughts regarding which criteria to use: which lists to compare against, how many commands to include, etc?

@waldyrious
Member Author

waldyrious commented Sep 16, 2017

Update: the table is pretty much ready now. Some areas that still need some help:

  • cells painted yellow indicate sources that haven't been integrated yet. There are four of them. Any help in that regard would be appreciated (just parsing those sources into a plaintext list of commands would suffice)
  • cells painted orange indicate expected counts (number of "x" marks in that column) that don't match the automated count. I'm not sure what's going on there, so it would be nice if someone with fresh eyes could double-check those columns.

Apart from that, we can start deciding how to use that data to compile our master list :)

Note: I didn't include the linux.die.net manpages, since even just the first section contains about 10,000 commands, which makes the table unwieldy and kinda overwhelming, to be honest.

@waldyrious
Member Author

By the way, the plan to use milestones won't be possible after all. I had already reached this conclusion before, but forgot it in the meantime: it turns out GitHub only allows a single milestone per issue, so there would be no way to simultaneously track progress towards multiple coverage parity goals :(

That said, we could still have a milestone for the master parity list, which IMO would be a good thing as it would make those missing commands more visible as issues that newcomers could tackle. (It could also be the target URL for the badge.)

@agnivade
Member

cells painted orange indicate expected counts (number of "x" marks in that column) that don't match the automated count.

So are you saying you have manually counted each 'x' just to verify the automated count ? That's some dedication ! Why bother with the manual count at all if there is already automation for it ? Unless you suspect that =COUNTIF(L5:L,"x") is wrong ?

@waldyrious
Member Author

waldyrious commented Sep 16, 2017

So are you saying you have manually counted each 'x' just to verify the automated count ? That's some dedication !

Oh god no, haha :P I'm not that crazy ;)
I had the correct count from the actual lists that can be seen in the "lists" sheet (number of lines, basically, which any decent text editor will provide), which gives me more confidence in the result than using the formulas. Besides, some of the automated counts indeed are correct. I found some duplicated entries before, due to imperfect filtering, and that fixed some of the mismatched counts -- but I can't figure out what's causing the remaining mismatches...

@agnivade
Member

Ah I see :) Didn't notice that there was another sheet.

@sbrl
Member

sbrl commented Sep 16, 2017

Awesome work! Yeah, perhaps we could have a 'current goal' to document all the commands in a given list, and keep moving to new lists as we complete old ones. Having a list of commands auto-generated that have yet to be documented for the 'current goal' parity list would be helpful for newcomers, yeah.

The sheet is rather unwieldy though on my screen, since the frozen panes take up about 60% of my available screen real-estate on my laptop 😕

@agnivade
Member

I think we should move the orange and yellow cells to a new row below, because they're in the same row as coverage, and they just signify the expected count, not coverage.

And lastly, our current coverage % is 52 right ?

@waldyrious
Member Author

The sheet is rather unwieldy though on my screen, since the frozen panes take up about 60% of my available screen real-estate on my laptop 😕

I made the heading more compact. Is that workable now?

I think we should move the orange and yellow cells to a new row below. Because its in the same row as coverage. And it just signifies the expected count, not coverage.

Agreed, I just did that. Ideally we won't even have to include the expected count on the table, but until we figure out what's going on with the mismatched values, we'll need those cells.

@waldyrious
Member Author

waldyrious commented Sep 17, 2017

Ok, I've filled the table some more.
Also, apparently I can't reproduce the count mismatch anymore, so ¯\_(ツ)_/¯

The two sources that still need parsing into a plain list of command names are Inconsolation and ArchWiki's List of applications. Any help appreciated!

And lastly, our current coverage % is 52 right ?

Yes, but that's a plain fraction that doesn't consider the relative importance of the missing commands. I'd rather have a weighted coverage percentage, where each entry is weighted by the number of occurrences in these other lists. I'll have that working in a bit. Edit: done (see top right corner of the table). Looks like at this point it isn't much different from the plain percentage, though 😝
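The weighted metric could be computed along these lines (a sketch only; the "command weight" input format is an assumption, with each weight being the number of source lists the command appears in):

```shell
# weighted_coverage WEIGHTS_FILE HAVE_FILE: WEIGHTS_FILE has lines of
# "command weight"; HAVE_FILE lists the commands already documented.
# Prints the percentage of total weight covered, so missing a command
# that appears in many lists hurts more than missing an obscure one.
weighted_coverage() {
    awk 'NR==FNR { have[$1] = 1; next }          # first file: HAVE_FILE
         { total += $2; if ($1 in have) covered += $2 }
         END { printf "%.1f\n", 100 * covered / total }' "$2" "$1"
}
```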

@sbrl
Member

sbrl commented Sep 17, 2017

Through weird es6 magic, I bring you a list of commands for the Inconsolation lists! Here's the code I used in the firefox console for reference:

(function() {
let result = [];
document.querySelectorAll(".entry-content > p:nth-child(4) a[href]").forEach((el) => {
    if(el.innerText.search(":") === -1 || el.innerText.trim()[0] !== el.innerText.trim()[0].toLowerCase() || el.innerText.search(/\./) !== -1) return;
    result.push(...el.innerText.split(":")[0].split(/\s*(and|,)\s*/gi));
});
result = result.filter((cmd) => cmd.search(/[,\*\(\) \{\}]|and/) === -1 && cmd.length > 0);
console.log(result.filter((el, i, arr) => arr.indexOf(el) === i).join("\n"));
})();

...I've pasted them into the spreadsheet. They might need a little bit of tidy-up work though, since the input was messy.

That archwiki one though looks tough, since they don't detail the name of all commands in the list.

@waldyrious
Member Author

Can you explain the code? I'm afraid just parsing the link titles will produce a list with way too many missing entries, because many of the titles don't contain command names directly. On the other hand, I'm not sure I can think of anything that would work better without involving manual processing of each page linked from the entries... 😕

As for the ArchWiki page, I guess it would suffice to extract only the contents of the sections titled "Console". That will definitely leave some gaps in the output, but the page isn't meant to be a structured list anyway, nor does it focus specifically on command line programs, so I guess it's reasonable to parse it more loosely.

@sbrl
Member

sbrl commented Sep 17, 2017

It is a bit messy, isn't it! 😛 What it does is extract the names of the commands listed on the page, since I assumed that it was an index of all the commands the author had talked about. It discards the following:

  • Items in the list with a capital letter at the beginning
  • Items without a colon

Once done, it extracts the bit before the colon and does the following:

  • Splits it on `,` and `and`
  • Discards any parts containing `and`, `,`, `(`, `)`, `{`, or `}`
  • Discards any parts that have a length of zero.

@gingerbeardman
Contributor

Some related discussion here: #1953

@agnivade
Member

agnivade commented Feb 1, 2018

This has been pending too long ! I will go on vacation soon, I promise to work on this during that time !

@gingerbeardman
Contributor

Enjoy your vacation! If the two coincide, then so be it 👍

agnivade removed their assignment on Oct 17, 2019