New pages structure #190

rprieto · 2014-04-23T23:49:05Z

(copied from the conversation at #147)

The current folder structure looks like:

pages
  |__ common
  |   |__ tar.md         # gnu command everyone should have
  |   |__ ssh.md         # gnu command everyone should have
  |   |__ npm.md         # just a tool that could be installed on any OS
  |__ linux
  |   |__ emerge.md      # clearly a linux-only tool
  |__ osx
      |__ ssh.md         #  OSX has a different version with different flags

This has worked so far, but:

- the split can feel arbitrary
- you need to jump around to see if a command is already covered
- clients potentially need to make a lot of requests to get a given page

There seems to be a consensus among clients to move to a flatter folder structure, which would look like this:

pages
   |__ tar.md
   |__ ssh.md
   |__ ssh.osx.md
   |__ emerge.md
   |__ npm.md

By default, we can display <command>.md, but <command>.<os>.md should have precedence if available.

This means the clients would let OSX users query Linux commands, but after all why not, they might just be curious about it. Or they might be using tldr on a Mac while SSHing on a Linux box that doesn't have it.
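
For illustration, the precedence rule described above could be sketched roughly like this. It is only a sketch under stated assumptions: `resolve_page` and the in-memory `pages` listing are hypothetical names, not part of any existing client.

```python
from typing import Iterable, Optional


def resolve_page(command: str, os_name: str, available: Iterable[str]) -> Optional[str]:
    """Pick the page file for a command in the proposed flat layout.

    `available` is a hypothetical listing of the file names under pages/.
    """
    files = set(available)
    # The OS-specific page takes precedence when it exists.
    specific = f"{command}.{os_name}.md"
    if specific in files:
        return specific
    # Otherwise fall back to the generic page, if there is one.
    generic = f"{command}.md"
    return generic if generic in files else None


pages = ["tar.md", "ssh.md", "ssh.osx.md", "emerge.md", "npm.md"]
print(resolve_page("ssh", "osx", pages))     # ssh.osx.md
print(resolve_page("ssh", "linux", pages))   # ssh.md
print(resolve_page("emerge", "osx", pages))  # emerge.md
```

Note that the last call returns the Linux-only page to an OSX user, which matches the behaviour discussed above.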

To clear up platform constraints, the description for emerge for example could say "for Linux" or "Gentoo specific command".

This would be a breaking change though, so we need a plan of attack. One option is to address all open PRs to get to a stable state, then copy all pages to the new structure. The old structure can live on for a while until all clients update. The PR guidelines would say to push changes to the new structure only.

felixonmars · 2014-07-14T13:41:30Z

+1 for keeping the old structure working while switching to the new one, as users may not upgrade their clients in time (or may not care to).

waldyrious · 2014-08-07T22:01:26Z

+1 with @felixonmars

felixonmars · 2014-10-07T07:43:13Z

Hi, any schedule on this? It has been several more months since the discussion 😺

rprieto · 2014-10-07T09:41:29Z

Good point, it's still something I'm keen to look into, but it requires more thinking now that there are many clients.

Should we start a conversation on Gitter?

felixonmars · 2014-10-08T03:32:28Z

I've joined the conversation, though I didn't see anything - does it keep logs?

M3kH · 2015-12-28T21:53:00Z

Can this be done with the metadata syntax in Markdown? Maybe as an addition, to avoid duplicating files across categories/tags?

---
support:
 - osx
 - linux
 -- debian
...
---

igorshubovych · 2015-12-28T22:09:19Z

@M3kH at the end of this optimization process we will build man :)

M3kH · 2015-12-28T22:25:45Z

I would rather store the information in the Markdown as YAML instead of using the file names.
But this is my personal opinion :-)

waldyrious · 2017-09-04T10:36:17Z

Quoting a comment by @agnivade in #1436, which replaced all references to OS X with the new name, macOS:

Just a note that our platform folder is still named osx. And changing that might have consequences for various clients, so keeping it as is for now.

ibnesayeed · 2017-12-01T16:44:53Z

While we are here talking about the new page structure, we should also consider the possibility of adding custom tldrs that are not maintained by the shared repository, but a place where users can add their own overrides or additions for quick references. A note around this approach can be seen in #1726 (comment).

chamini2 · 2018-02-06T20:33:31Z

I like @ibnesayeed's idea. Kind of like brew taps?

zlatanvasovic · 2019-12-13T20:10:08Z

Any updates? Can you provide any advantages this system would give over the current one? As I see it, everything you want to have is already included in the current system. There is already a related issue in tldr-node-client: tldr-pages/tldr-node-client/issues/247

sbrl · 2019-12-14T22:53:52Z

Now that we have languages to contend with, this seems like even less of a good idea.

zlatanvasovic · 2019-12-14T23:08:55Z

Do others agree to close this?

ping @tldr-pages

agnivade · 2019-12-15T14:07:15Z

I agree that languages make this less appealing.

zlatanvasovic · 2020-05-27T20:48:04Z

@sbrl A tldr-bot may help with that, e.g. by comparing different languages. Although I really doubt we'll ever encounter such a translation issue, except for the originally-Chinese pages. So it comes down to one language only.

sbrl · 2020-05-27T21:06:48Z

If it can happen - it will lol

But yeah, we'll need to significantly expand the tldr-bot I think to support this change if we go ahead with option E.

waldyrious · 2020-05-27T22:44:52Z

Well, we can certainly continue the convention of treating the English pages as the master ones, regardless of where they're stored. The file structure was never a technical obstacle to breaking this anyway; it was enforced by convention, and it can continue to be if we agree that's what makes maintenance of the multilingual pages viable for the maintenance team.

Of course, it would be awesome to, on top of this, have tldr-bot perform an automatic check to make sure the example commands (not their descriptions, of course) are the same in every language version of a page — we just need to ignore the token contents, which are language-specific, and the comparison can be done pretty much verbatim. Or am I oversimplifying the problem?
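
A rough sketch of what such a check might look like, assuming that example commands are the lines wrapped in backticks and that tokens are written as `{{...}}`. The function names and the two sample pages are made up for illustration; this is not how tldr-bot currently works.

```python
import re
from typing import List

# Assumed token syntax: anything between double curly braces.
TOKEN = re.compile(r"\{\{.*?\}\}")


def command_skeletons(page_text: str) -> List[str]:
    """Return the example commands of a page with token contents blanked out."""
    skeletons = []
    for line in page_text.splitlines():
        line = line.strip()
        # Assumed page syntax: an example command is a line wrapped in backticks.
        if len(line) >= 2 and line.startswith("`") and line.endswith("`"):
            skeletons.append(TOKEN.sub("{{}}", line[1:-1]))
    return skeletons


def same_commands(page_a: str, page_b: str) -> bool:
    """True if both pages list the same commands in the same order."""
    return command_skeletons(page_a) == command_skeletons(page_b)


en = "# tar\n\n> Archiving utility.\n\n- Extract an archive:\n\n`tar xf {{source.tar}}`\n"
de = "# tar\n\n> Archivierungsprogramm.\n\n- Entpacke ein Archiv:\n\n`tar xf {{quelle.tar}}`\n"
print(command_skeletons(en))  # ['tar xf {{}}']
print(same_commands(en, de))  # True
```

Descriptions and token contents are ignored, so only a difference in the commands themselves (or in their number or order) would be flagged.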

owenvoke · 2020-05-27T23:43:27Z

I think either option E or the current situation makes the most sense personally.

I think with tldr-bot it would be good to add some GitHub Action workflows (or update the bot scripts) for various things to make this easier. 👍

agnivade · 2020-05-28T04:10:43Z

Yes, the problem of reviews is a good point. But IMO it's a human obstacle, not a technical one. @waldyrious' point about treating English pages as master copies by "convention" is a good one. And a good amount of tooling via the tldr-bot to check for overall consistency of pages should take us a long way.

I would also suggest writing down language owners somewhere so that they can be added to reviews, or even have tldr-bot take care of it automatically.

@fejx - would you be open to taking the lead on this one and adding some tooling around tldr-bot to ease this transition? Also cc @Keating950 who is working on maintaining a list of pages needing translations.

fejx · 2020-05-29T20:07:57Z

I am willing to work on this issue, but I am unsure what you mean by "adding some tooling around tldr-bot". Are you referring to additional linting checks? Or are you referring to scripts to transform the existing structure into the new one (as well as the other way around)?

agnivade · 2020-05-30T04:56:49Z

Ah I was referring to the linting checks. Things like:

- Adding a comment on a PR which modifies a page in one language when there are copies of the page in other languages.
- @waldyrious' comment:

  > have tldr-bot perform an automatic check to make sure the example commands (not their descriptions, of course) are the same in every language version of a page; we just need to ignore the token contents, which are language-specific, and the comparison can be done pretty much verbatim.

- If a page is added which has translations in other languages, posting links to those pages in the PR so that it is easy to check for style consistency, and so that the PR submitter is made aware that other pages already exist. The number of examples and their order has to be exactly the same.
- Having language owners for non-English pages and pinging them whenever a PR is created in any of those folders. (I think the GitHub CODEOWNERS file may automatically take care of this.)

The overall objective is to maintain consistency of the pages across languages.
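
As a sketch of the first check, here is one way the copies of a page in other languages might be found. It assumes a hypothetical `<language>/<platform>/<page>.md` layout and takes a plain list of repository paths; `other_language_copies` is an invented name, and a real bot would need to handle paths that do not have exactly three parts.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def other_language_copies(changed: str, all_paths: Iterable[str]) -> List[str]:
    """List copies of the changed page that live under other languages.

    Assumes every path looks like <language>/<platform>/<page>.md.
    """
    lang, platform, page = PurePosixPath(changed).parts
    copies = []
    for path in all_paths:
        parts = PurePosixPath(path).parts
        # Same platform and page name, but a different language folder.
        if len(parts) == 3 and parts[1:] == (platform, page) and parts[0] != lang:
            copies.append(path)
    return sorted(copies)


repo = [
    "en/common/tar.md",
    "de/common/tar.md",
    "fr/common/tar.md",
    "de/linux/tar.md",
    "en/common/ssh.md",
]
print(other_language_copies("en/common/tar.md", repo))
# ['de/common/tar.md', 'fr/common/tar.md']
```

The resulting list is what a bot could post as links in the PR comment.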

fejx · 2020-05-30T12:41:55Z

Sure thing! I will try and see if I get a chance to take a look at it this weekend.

ibnesayeed · 2020-05-30T12:56:56Z

While this is an implementation detail that users might not care about, I personally think a scheme that groups variations related to a specific command together would be much more approachable and manageable from the contributors' perspective. If the fear is that too many files in a single folder will make things slow on certain machines/OSes, I think users of this program are technically savvy people who generally have development machines that are rather recent and powerful. Even that issue can be solved by creating a folder for each command and placing all the variations of documents related to a specific command in its corresponding folder. If, in the future, we find there are way too many commands to manage, they can be further organized into groups of folders named after the first letter of each command.

agnivade · 2020-05-30T14:51:09Z

That's not a bad point. So far, we have been considering platform > language, and language > platform, but command > language is also an option. The problem that I see is that it's an inverted way of grouping things. Usually things are grouped at a larger level (platform, language), and smaller things are included inside it (pages). It's not an issue per se, but I don't see any problems from a contributor's perspective with option E too.

fejx · 2020-05-30T19:38:09Z

I don't see how the structure command > language would reduce the number of files in a folder.

|-- en
|-- |-- ls.md
|-- |-- grep.md
|-- |-- man.md
|-- de
|-- |-- ls.md
|-- |-- grep.md
|-- |-- man.md

These folders have three files each.

|-- ls
|-- |-- de.md
|-- |-- en.md
|-- grep
|-- |-- de.md
|-- |-- en.md
|-- man
|-- |-- de.md
|-- |-- en.md

Here, the root still has three entries; they are just folders instead of files.
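
To make the comparison concrete, here is a small sketch that counts the direct children of every folder in both layouts. The helper `entries_per_folder` is hypothetical and works on plain path strings rather than a real file system.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable


def entries_per_folder(paths: Iterable[str]) -> Dict[str, int]:
    """Map each folder (the root is ".") to its number of direct children."""
    children = {}
    for path in paths:
        parts = PurePosixPath(path).parts
        for depth in range(len(parts)):
            parent = "/".join(parts[:depth]) or "."
            children.setdefault(parent, set()).add(parts[depth])
    return {folder: len(names) for folder, names in children.items()}


by_language = ["en/ls.md", "en/grep.md", "en/man.md", "de/ls.md", "de/grep.md", "de/man.md"]
by_command = ["ls/de.md", "ls/en.md", "grep/de.md", "grep/en.md", "man/de.md", "man/en.md"]

print(entries_per_folder(by_language))  # {'.': 2, 'en': 3, 'de': 3}
print(entries_per_folder(by_command))   # {'.': 3, 'ls': 2, 'grep': 2, 'man': 2}
```

The largest folder holds one entry per command in both cases; only its position in the hierarchy changes.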

agnivade · 2020-05-31T03:35:07Z

It's not a matter of reducing the files. I think it's about "would be much more approachable and manageable from contributors' perspective". Although I don't see any issue with that in option E either.

sbrl · 2020-06-01T20:14:02Z

I like that idea of tooling @waldyrious, and thanks so much @fejx for taking the lead there!

zlatanvasovic · 2020-06-01T20:38:33Z

Just one thing. If the structure is language / platform / page or platform / language / page, it is easier to implement in a tldr client. Any structure where the page comes before the others complicates implementing the language and platform features. This is coming from refactoring the Python and C clients recently.
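
A minimal sketch of why that ordering is convenient for a client, assuming a language / platform / page layout with a `common` platform folder as fallback. The function name and the lookup order (language preference first, then platform) are assumptions for illustration, not taken from any existing client.

```python
from typing import List


def candidate_paths(page: str, languages: List[str], platform: str) -> List[str]:
    """Paths to try, in order, for a language/platform/page layout."""
    # Try the user's platform first, then the shared "common" folder.
    platforms = [platform] if platform == "common" else [platform, "common"]
    return [f"{lang}/{plat}/{page}.md" for lang in languages for plat in platforms]


print(candidate_paths("ssh", ["de", "en"], "osx"))
# ['de/osx/ssh.md', 'de/common/ssh.md', 'en/osx/ssh.md', 'en/common/ssh.md']
```

The whole lookup is two nested loops over folders, with the page name only appended at the end.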

fejx · 2020-06-01T20:56:30Z

Sure, I am looking forward to implementing the tooling! However, I will wait with the implementation until we have converged on a decision and the issue is closed. Otherwise, I might end up implementing style checks for a structure we are not going to use.

agnivade · 2020-06-02T05:04:26Z

@ibnesayeed - Can you lay out specifically why you believe inverting the grouping by putting the page first "would be much more approachable and manageable from contributors" ? And what is the problem this solves over option E ?

Thanks.

ibnesayeed · 2020-06-02T12:25:34Z

One of the reasons why I think putting the command name first in the hierarchy would be a better choice has to do with how semantic URIs and resources on the Web behave. Just think about it: what is the main resource we are talking about that both contributors and consumers are interested in? I think it is the command itself, not the language, not the platform; those are variations of the main resource. This is precisely how content negotiation works in web servers. The URI points to the abstract resource; from there, content type, language, content encoding, character encoding, and many other aspects are negotiated. Content negotiation often happens via the respective headers, but for static files, many servers use a hierarchy of file extensions to store and serve different representations from a specific directory. This is why I think we should have one folder per command and place every variation (those we envision now or may support in the future) in that folder.

Now, let's go through some practical issues that arise if files are organized in a platform and language hierarchy. Suppose a contributor or user is trying to modify a local copy, but he/she does not know about file globbing (i.e., wildcards) and prefers to navigate through files using GUI file managers:

- What if they are not sure which platforms support the command, should they try each platform folder and then each language folder until they find what they were looking for?
- What if they are not sure which platforms the new command they are adding supports, which folder should they place it in?
- What if they do not know what languages a command is documented in already and what languages are still missing, are they supposed to check every language folder?
- What if a command was earlier only supported on one platform, but later it added support for another platform and a while later it supported even more platforms, should all related files/folders be moved around in the hierarchy each time the support matrix changes or is it better to simply play around with nested file extensions when variations change or are added?
- What if they want to see all the variations (i.e., supported platforms and completed languages etc.) that are available for a specific command with the intent to fill some of the gaps, should they be hunting all over the folder hierarchy?

The assumption in these examples is that they know which command they are looking for (as the command is the primary resource). There might be some counter-examples as well, such as one contributor woke up on a breezy morning and decided to add translations to a specific language, irrespective of which commands are available to translate, but I would argue, such situations will be rare and they are not impossible in the proposed hierarchy of resource first, variations next.

One folder per command gives some flexibility for future changes. For example, if a client prefers PDF, PNG, GIF, JSON, or HTML, these content types can be generated and placed in the respective commands' folders with appropriate extensions. Something like:

<cmd_name>/<cmd_name>[.<platform>][.<lang>][.<other_attrs>].<mime_ext>

Or

<cmd_initial>/<cmd_name>/<cmd_name>[.<platform>][.<lang>][.<other_attrs>].<mime_ext>

The latter form would be suitable if we think that the number of commands will be more than the number of files certain platforms' file systems can handle easily. This technique is used in cache stores as well, where the first few characters of the hash of a resource's identifier are used to place it in a sub-folder.
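
As a rough illustration of the two naming patterns above, a path builder might look like the following. `variant_path` and its parameters are hypothetical; the `shard` flag switches to the second form with the first-letter folder.

```python
from typing import Optional, Sequence


def variant_path(
    cmd: str,
    ext: str = "md",
    platform: Optional[str] = None,
    lang: Optional[str] = None,
    other_attrs: Sequence[str] = (),
    shard: bool = False,
) -> str:
    """Build <cmd>/<cmd>[.<platform>][.<lang>][.<other_attrs>].<ext>.

    With shard=True the path is prefixed by the command's first letter.
    """
    # Optional parts are simply skipped when they are not given.
    optional = [part for part in (platform, lang) if part]
    name = ".".join([cmd, *optional, *other_attrs, ext])
    folders = [cmd[0], cmd] if shard else [cmd]
    return "/".join([*folders, name])


print(variant_path("ssh"))                             # ssh/ssh.md
print(variant_path("ssh", platform="osx", lang="de"))  # ssh/ssh.osx.de.md
print(variant_path("tar", ext="json", shard=True))     # t/tar/tar.json
```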

agnivade · 2020-06-03T07:57:40Z

That's great feedback. Thank you for spending the time to write this.

I think asset serving in web servers and how these pages are organized are two orthogonal concerns. The URLs in a web server need not necessarily match the actual content in a directory.

Coming to the issues with option E, yes some of them are fair concerns. I'll try to address them individually.

> What if they are not sure which platforms support the command, should they try each platform folder and then each language folder until they find what they were looking for?

I think a contributor should be pretty sure which language they want to contribute in. Coming to platform, there are just two places to look into: the specific platform they are in, and then common.

> What if they are not sure which platforms the new command they are adding supports, which folder should they place it in?

That is already clarified in the spec. It's common.

> What if they do not know what languages a command is documented in already and what languages are still missing, are they supposed to check every language folder?

This is a weird use case. Are you saying someone who knows like 5-6 languages decides to contribute all translations for a given page? Maybe it's a problem then.

> What if a command was earlier only supported on one platform, but later it added support for another platform and a while later it supported even more platforms, should all related files/folders be moved around in the hierarchy each time the support matrix changes or is it better to simply play around with nested file extensions when variations change or are added?

Assuming we are not going with the file extension route, this problem would still exist in this model. You would have to move around files in the sub-folders.

> What if they want to see all the variations (i.e., supported platforms and completed languages etc.) that are available for a specific command with the intent to fill some of the gaps, should they be hunting all over the folder hierarchy?

Right, this is slightly cumbersome. But again an edge-case IMO. Personally, I have never needed this.

I think it may not be right to hinge our decision solely on these edge cases and on the comfort of contributors "searching files through a GUI". I agree that editing files for a given platform is possibly the only sore thumb here. But I think a good amount of tooling is the right answer here.

I am not opposed to this option per se. But it seems like it is asking us to spend too much effort for little benefit. We also have to take into account the fact that all clients have to change their code to accommodate the new structure.

I will leave it to what others think.

agnivade · 2020-06-11T14:44:15Z

Not to sound hasty, but I'd hate to lose the steam that we have picked up here. Rarely do we see 6-year-old issues being discussed actively.

If there aren't any objections, I'd like to start off with option E as the accepted model.

fejx · 2020-06-19T17:16:40Z

Since no one objected, I would suggest closing this issue about the decision and opening a new one regarding the execution of the new page structure.

waldyrious · 2020-06-19T19:25:42Z

@fejx what do we gain by opening a new issue rather than using this one?

zlatanvasovic · 2020-06-19T20:12:48Z

A clear discussion which doesn't require scrolling past 100 replies.

agnivade · 2020-06-20T04:30:51Z

In other projects, I have seen the top comment being edited and a summary being added to reflect the final decision taken. That way, anybody can get a tldr without having to read through the whole thread. I don't have any opinions either way, just excited to see some work getting started.

waldyrious · 2020-06-20T08:00:23Z

> In other projects, I have seen the top comment being edited and a summary being added to reflect the final decision taken.

I'd prefer this too.

fejx · 2020-06-20T16:25:57Z

Adding a summary on the top is a good idea, but I still would like to open a separate issue.
My reasons for wanting a new issue:

- The decision is a separate discussion which has nothing to do with the execution. Therefore, a separate discussion should take place in a separate issue.
- The issue is full of comments which are irrelevant for the execution.
- Closing would put a defined end to the discussion, allowing us to focus on the how, not the what.
- It is easier to read for archival purposes. (The new issue will link to the discussion about the decision.)

waldyrious · 2020-06-20T19:01:56Z

Fair enough. We don't usually have separate issues for discussion vs. implementation, but your points are well taken.

Feel free to create the new issue for the implementation. I'll go ahead and close this issue with the decision of going with Option E described above.

This was referenced Oct 2, 2014

Allow specifying platform in tldr command line #62

Closed

Non-gnu apps that don't usually come with unix-likes should not be grouped by OS #61

Closed

rprieto mentioned this issue Mar 25, 2015

feature: pages index. #267

Merged

rprieto mentioned this issue Jun 9, 2015

sed: remove extra, incorrect space #281

Closed

waldyrious mentioned this issue Dec 7, 2015

uname: mention lsb_release #302

Merged

waldyrious mentioned this issue Aug 24, 2016

tree.md: add more useful examples #990

Merged

waldyrious added the decision A (possibly breaking) decision regarding tldr-pages content, structure, infrastructure, etc. label Aug 24, 2016

waldyrious added the architecture Organization of the pages per language, platform, etc. label Aug 31, 2016

waldyrious mentioned this issue Sep 21, 2016

Add more fields to the commands #1082

Open

waldyrious mentioned this issue Jan 16, 2017

Create lists of commands to test coverage parity against #1070

Open

waldyrious mentioned this issue May 1, 2017

Syntax for linking to another tool's page #784

Open

agnivade mentioned this issue Dec 1, 2017

Can not-so-common commands be added? #1726

Closed

waldyrious mentioned this issue Feb 2, 2018

Big reorganisation needed #1966

Open

sbrl mentioned this issue Feb 6, 2018

Criteria for supported platforms should be documented in contributing guidelines #1965

Closed

waldyrious mentioned this issue Sep 23, 2018

Support for multiple (human) languages #2339

Closed

waldyrious closed this as completed Jun 20, 2020

fejx mentioned this issue Jun 21, 2020

New directory structure #4120

Closed

6 tasks

CleanMachine1 reopened this Apr 27, 2022

CleanMachine1 closed this as completed Apr 27, 2022
