New pages structure #190

Closed
rprieto opened this issue Apr 23, 2014 · 82 comments
Labels
architecture Organization of the pages per language, platform, etc. decision A (possibly breaking) decision regarding tldr-pages content, structure, infrastructure, etc.

Comments

@rprieto
Contributor

rprieto commented Apr 23, 2014

(copied from the conversation at #147)

The current folder structure looks like:

pages
  |__ common
  |   |__ tar.md         # gnu command everyone should have
  |   |__ ssh.md         # gnu command everyone should have
  |   |__ npm.md         # just a tool that could be installed on any OS
  |__ linux
  |   |__ emerge.md      # clearly a linux-only tool
  |__ osx
      |__ ssh.md         #  OSX has a different version with different flags

This has worked so far, but

  • the split can feel arbitrary
  • you need to jump around to see if a command is already covered
  • clients potentially need to make a lot of requests to get a given page

There seems to be a consensus among clients to move to a flatter folder structure, that would look like this:

pages
   |__ tar.md
   |__ ssh.md
   |__ ssh.osx.md
   |__ emerge.md
   |__ npm.md

By default, we can display <command>.md, but <command>.<os>.md should have precedence if available.
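A minimal sketch of that precedence rule, assuming a client already has the set of file names in the flat `pages` folder. The `resolve_page` helper and its parameters are hypothetical and not taken from any existing client:

```python
from typing import Optional, Set


def resolve_page(command: str, os_name: str, available: Set[str]) -> Optional[str]:
    """Pick the page to display, preferring <command>.<os>.md over <command>.md.

    `available` is assumed to be the set of file names in the flat folder.
    """
    specific = f"{command}.{os_name}.md"
    generic = f"{command}.md"
    if specific in available:
        return specific
    if generic in available:
        return generic
    return None  # no page for this command at all


# The flat folder from the example above.
pages = {"tar.md", "ssh.md", "ssh.osx.md", "emerge.md", "npm.md"}

print(resolve_page("ssh", "osx", pages))     # OS-specific page takes precedence
print(resolve_page("ssh", "linux", pages))   # falls back to the generic page
print(resolve_page("emerge", "osx", pages))  # Linux-only command is still shown
```

Note that the fallback is what lets an OSX user read the `emerge` page in this scheme.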

This means the clients would let OSX users query Linux commands, but why not? They might just be curious, or they might be using tldr on a Mac while SSHed into a Linux box that doesn't have it.

To clear up platform constraints, the description for emerge for example could say "for Linux" or "Gentoo specific command".

This would be a breaking change though, so we need a plan of attack. One option is to address all open PRs to get to a stable state, then copy all pages to the new structure. The old structure can live on for a while until all clients update. The PR guidelines would say to push changes to the new structure only.

@felixonmars
Collaborator

+1 for keeping the old structure working while switching to the new one, as users may not upgrade their clients in time (or may not care to).

@waldyrious
Member

+1 with @felixonmars

@felixonmars
Collaborator

Hi, is there any schedule for this? It has been several more months since the discussion 😺

@rprieto
Contributor Author

rprieto commented Oct 7, 2014

Good point, it's still something I'm keen to look into, but it requires more thinking now that there are many clients.

Should we start a conversation on Gitter?

@felixonmars
Collaborator

I've joined the conversation, though didn't see anything - does it keep logs?

@M3kH

M3kH commented Dec 28, 2015

Could this be done with Markdown metadata (front matter) syntax? Perhaps as an addition, to avoid duplicating files across categories/tags?

---
support:
  - osx
  - linux:
      - debian
...
---

@igorshubovych
Collaborator

@M3kH at the end of this optimization process we will have built man :)

@M3kH

M3kH commented Dec 28, 2015

I would rather store the information in the Markdown as YAML instead of using the file names.
But this is my personal opinion :-)

@waldyrious waldyrious added the decision A (possibly breaking) decision regarding tldr-pages content, structure, infrastructure, etc. label Aug 24, 2016
@waldyrious waldyrious added the architecture Organization of the pages per language, platform, etc. label Aug 31, 2016
@waldyrious
Member

Quoting a comment by @agnivade in #1436, which replaced all references to OS X with the new name, macOS:

> Just a note that our platform folder is still named osx. And changing that might have consequences for various clients, so keeping it as is for now.

@ibnesayeed

While we are here talking about the new page structure, we should also consider the possibility of adding custom tldrs that are not maintained by the shared repository, but a place where users can add their own overrides or additions for quick references. A note around this approach can be seen in #1726 (comment).

@chamini2
Contributor

chamini2 commented Feb 6, 2018

I like @ibnesayeed's idea. Kind of like brew taps?

@zlatanvasovic
Contributor

Any updates? Can you provide any advantages this system would give over the current one? As I see it, everything you want to have is already included in the current system. There is already a related issue in tldr-node-client: tldr-pages/tldr-node-client/issues/247

@sbrl
Member

sbrl commented Dec 14, 2019

Now that we have languages to contend with, this seems like even less of a good idea.

@zlatanvasovic
Contributor

Do others agree to close this?

ping @tldr-pages

@agnivade
Member

I agree that languages make this less appealing.

@zlatanvasovic
Contributor

@sbrl A tldr-bot may help with that, e.g. by comparing different languages. Although I really doubt we'll ever encounter such a translation issue, except for the originally-Chinese pages. So it comes down to one language only.

@sbrl
Member

sbrl commented May 27, 2020

If it can happen - it will lol

But yeah, we'll need to significantly expand the tldr-bot I think to support this change if we go ahead with option E.

@waldyrious
Member

Well, we can certainly continue the convention of treating the English pages as the master ones, regardless of where they're stored. The file structure was never a technical obstacle to breaking this convention anyway; it was enforced by agreement, and it can continue to be if we agree that's what makes maintenance of the multilingual pages viable for the maintenance team.

Of course, it would be awesome to, on top of this, have tldr-bot perform an automatic check to make sure the example commands (not their descriptions, of course) are the same in every language version of a page — we just need to ignore the token contents, which are language-specific, and the comparison can be done pretty much verbatim. Or am I oversimplifying the problem?
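A rough sketch of how such a check might look, assuming the usual tldr page format in which each example command sits on its own line wrapped in backticks and tokens are written as `{{...}}`. The function names here are hypothetical and this is not existing tldr-bot code:

```python
import re
from typing import List

# Token placeholders such as {{source.tar}}; their contents are language-specific.
TOKEN = re.compile(r"\{\{.*?\}\}")


def command_skeletons(page_text: str) -> List[str]:
    """Return each example command with its token contents blanked out."""
    skeletons = []
    for line in page_text.splitlines():
        line = line.strip()
        # Assumption: example commands are the lines wrapped in backticks.
        if len(line) >= 2 and line.startswith("`") and line.endswith("`"):
            skeletons.append(TOKEN.sub("{{}}", line[1:-1]))
    return skeletons


def same_commands(page_a: str, page_b: str) -> bool:
    """True if both pages list the same commands, in the same order."""
    return command_skeletons(page_a) == command_skeletons(page_b)


en = "# tar\n\n> Archiving utility.\n\n- Extract an archive:\n\n`tar xf {{source.tar}}`\n"
de = "# tar\n\n> Archivierungsprogramm.\n\n- Entpacke ein Archiv:\n\n`tar xf {{quelle.tar}}`\n"
drifted = de.replace("tar xf", "tar xvf")

print(same_commands(en, de))       # only descriptions and token contents differ
print(same_commands(en, drifted))  # the command itself has diverged
```

Because the comparison is positional, it would also flag pages whose examples are in a different order or differ in number.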

@owenvoke
Member

I think either option E or the current situation makes the most sense personally.

I think with tldr-bot it would be good to add some GitHub Action workflows (or update the bot scripts) for various things to make this easier. 👍

@agnivade
Member

Yes, the problem of reviews is a good point. But IMO it's a human obstacle, not a technical one. @waldyrious' point about treating English pages as master copies by "convention" is a good one. And a good amount of tooling via the tldr-bot to check for overall consistency of pages should take us a long way.

I would also suggest writing down language owners somewhere so that they can be added to reviews, or even have tldr-bot take care of it automatically.

@fejx - would you be open to taking the lead on this one and adding some tooling around tldr-bot to ease this transition? Also cc @Keating950, who is working on maintaining a list of pages needing translations.

@fejx
Contributor

fejx commented May 29, 2020

I am willing to work on this issue, but I am unsure what you mean by "adding some tooling around tldr-bot". Are you referring to additional linting checks? Or are you referring to scripts to transform the existing structure into the new (as well as the other way around)?

@agnivade
Member

agnivade commented May 30, 2020

Ah I was referring to the linting checks. Things like:

  • Adding a comment on a PR which modifies a page in one language, but there are copies of the page in other languages.
  • @waldyrious' comment

    have tldr-bot perform an automatic check to make sure the example commands (not their descriptions, of course) are the same in every language version of a page — we just need to ignore the token contents, which are language-specific, and the comparison can be done pretty much verbatim.

  • If a page is added which has translations in other languages, post links to those pages in the PR so that it is easy to check for style consistency, and the PR submitter is made aware that other pages already exist. So the number of examples and their order have to be exactly the same.
  • Having language owners for non-English pages and pinging them whenever a PR is created in any of those folders. (I think the GitHub CODEOWNERS file may automatically take care of this.)

The overall objective is to maintain consistency of the pages across languages.

@fejx
Contributor

fejx commented May 30, 2020

Sure thing! I will try and see if I get a chance to take a look at it this weekend.

@ibnesayeed

While this is an implementation detail that users might not care about, I personally think a scheme that groups the variations of a specific command together would be much more approachable and manageable from a contributor's perspective. If the fear is that too many files in a single folder will make things slow on certain machines/OSes, I think the users of this program are technically minded people who generally have fairly recent and powerful development machines. Even that issue can be solved by creating a folder for each command and placing all the variations of documents related to that command in it. If, in the future, we find there are way too many commands to manage, they can be further organized into groups of folders named after the first letter of each command.

@agnivade
Member

That's not a bad point. So far, we have been considering platform > language, and language > platform, but command > language is also an option. The problem that I see is that it's an inverted way of grouping things. Usually things are grouped at a larger level (platform, language), and smaller things are included inside it (pages). It's not an issue per se, but I don't see any problems from a contributor's perspective with option E too.

@fejx
Contributor

fejx commented May 30, 2020

I don't see how the structure command > language would reduce the number of files in a folder.

|-- en
|-- |-- ls.md
|-- |-- grep.md
|-- |-- man.md
|-- de
|-- |-- ls.md
|-- |-- grep.md
|-- |-- man.md

These folders have three files each.

|-- ls
|-- |-- de.md
|-- |-- en.md
|-- grep
|-- |-- de.md
|-- |-- en.md
|-- man
|-- |-- de.md
|-- |-- en.md

Here, the root still has three entries; they are just folders instead of files.
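To make the comparison concrete, here is a small sketch that converts the first layout into the second and counts the entries at each level. The dictionary representation and the `invert_layout` helper are assumptions made purely for illustration:

```python
from collections import defaultdict
from typing import Dict, List


def invert_layout(lang_first: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Turn {language: [command.md, ...]} into {command: [language.md, ...]}."""
    cmd_first: Dict[str, List[str]] = defaultdict(list)
    for lang, files in lang_first.items():
        for name in files:
            # Strip the ".md" suffix to get the command name.
            cmd = name[: -len(".md")] if name.endswith(".md") else name
            cmd_first[cmd].append(f"{lang}.md")
    return {cmd: sorted(files) for cmd, files in cmd_first.items()}


lang_first = {
    "en": ["ls.md", "grep.md", "man.md"],
    "de": ["ls.md", "grep.md", "man.md"],
}
cmd_first = invert_layout(lang_first)

print(cmd_first)
print(len(lang_first["en"]), "files per language folder")
print(len(cmd_first), "folders at the root of the command-first layout")
```

With three commands, each language folder holds three files, and the command-first root holds three folders.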

@agnivade
Member

It's not a matter of reducing the files. I think it's about "would be much more approachable and manageable from contributors' perspective.". Although I don't see any issue with that with option E too.

@sbrl
Member

sbrl commented Jun 1, 2020

I like that idea of tooling @waldyrious, and thanks so much @fejx for taking the lead there!

@zlatanvasovic
Contributor

Just one thing. If the structure is language / platform / page or platform / language / page, it is easier to implement it into a tldr client. Any structure where page is before the others complicates implementing the language and platform features. This is coming from refactoring Python and C clients recently.

@fejx
Contributor

fejx commented Jun 1, 2020

Sure, I am looking forward to implementing the tooling! However, I will wait with the implementation until we have converged on a mutual decision and the issue is closed. Otherwise, I might end up implementing checkstyles for a structure we are not going to use.

@agnivade
Member

agnivade commented Jun 2, 2020

@ibnesayeed - Can you lay out specifically why you believe inverting the grouping by putting the page first "would be much more approachable and manageable from contributors" ? And what is the problem this solves over option E ?

Thanks.

@ibnesayeed

One of the reasons why I think putting the command name first in the hierarchy would be a better choice has to do with how semantic URIs and resources on the Web behave. Just think about it: what is the main resource that both contributors and consumers are interested in? I think it is the command itself, not the language, not the platform; those are variations of the main resource. This is precisely how content negotiation works in web servers. The URI points to the abstract resource; from there, content type, language, content encoding, character encoding, and many other aspects are negotiated. Content negotiation often happens via the respective headers, but for static files, many servers use a hierarchy of file extensions to store and serve different representations from a specific directory. This is why I think we should have one folder per command and place every variation (those we envision now or may support in the future) in that folder.

Now, let's go through some practical issues that arise if files are organized in a platform and language hierarchy. Suppose a contributor or user is trying to modify a local copy, but does not know about file globbing (i.e., wildcards) and prefers to navigate through files using GUI file managers:

  • What if they are not sure which platforms support the command, should they try each platform folder and then each language folder until they find what they were looking for?
  • What if they are not sure which platforms the new command they are adding supports? Which folder should they place it in?
  • What if they do not know what languages a command is documented in already and what languages are still missing? Are they supposed to check every language folder?
  • What if a command was earlier only supported on one platform, but later it added support for another platform and a while later it supported even more platforms? Should all related files/folders be moved around in the hierarchy each time the support matrix changes, or is it better to simply play around with nested file extensions when variations change or are added?
  • What if they want to see all the variations (i.e., supported platforms and completed languages etc.) that are available for a specific command with the intent to fill some of the cavities, should they be hunting all over the folder hierarchy?

The assumption in these examples is that they know which command they are looking for (as the command is the primary resource). There might be some counter-examples as well, such as a contributor who wakes up on a breezy morning and decides to add translations for a specific language, irrespective of which commands are available to translate. But I would argue that such situations will be rare, and they are still possible in the proposed hierarchy of resource first, variations next.

One folder per command gives some flexibility for future changes. For example, if a client prefers PDF, PNG, GIF, JSON, or HTML, these content types can be generated and placed in the respective commands' folders with appropriate extensions. Something like:

<cmd_name>/<cmd_name>[.<platform>][.<lang>][.<other_attrs>].<mime_ext>

Or

<cmd_initial>/<cmd_name>/<cmd_name>[.<platform>][.<lang>][.<other_attrs>].<mime_ext>

The latter form would be suitable if we think that the number of commands will exceed the number of files that certain platforms' file systems can handle easily. This technique is used in cache stores as well, where the first few characters of the hash of a resource's identifier are used to place it in a sub-folder.
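As an illustration only, the two path patterns could be generated by something like the sketch below. The `variant_path` helper, its parameter names, and the `shard` flag are hypothetical, and the `<other_attrs>` slot is left out for brevity:

```python
from typing import Optional


def variant_path(
    cmd: str,
    platform: Optional[str] = None,
    lang: Optional[str] = None,
    ext: str = "md",
    shard: bool = False,
) -> str:
    """Build <cmd>/<cmd>[.<platform>][.<lang>].<ext>, optionally under <cmd_initial>/."""
    if not cmd:
        raise ValueError("command name must not be empty")
    parts = [cmd]
    if platform:
        parts.append(platform)
    if lang:
        parts.append(lang)
    filename = ".".join(parts + [ext])
    # With sharding, the first letter of the command becomes an extra folder level.
    dirs = [cmd[0], cmd] if shard else [cmd]
    return "/".join(dirs + [filename])


print(variant_path("ssh"))                               # ssh/ssh.md
print(variant_path("ssh", platform="osx", lang="de"))    # ssh/ssh.osx.de.md
print(variant_path("ssh", platform="osx", shard=True))   # s/ssh/ssh.osx.md
print(variant_path("tar", lang="fr", ext="json"))        # tar/tar.fr.json
```

Every variation of a command ends up in the same folder, so listing that one folder shows which platforms, languages, and formats already exist.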

@agnivade
Member

agnivade commented Jun 3, 2020

That's great feedback. Thank you for spending the time to write this.

I think asset serving in web servers and the way these pages are organized are two orthogonal concerns. The URLs in a web server need not match the actual content of a directory.

Coming to the issues with option E, yes some of them are fair concerns. I'll try to address them individually.

> What if they are not sure which platforms support the command, should they try each platform folder and then each language folder until they find what they were looking for?

I think a contributor should be pretty sure which language they want to contribute in. As for the platform, there are just two places to look: the specific platform they are on, and then common.

> What if they are not sure which platforms the new command they are adding supports? Which folder should they place it in?

That is already clarified in the spec. It's common.

> What if they do not know what languages a command is documented in already and what languages are still missing? Are they supposed to check every language folder?

This is a weird use case. Are you saying someone who knows 5-6 languages decides to contribute all translations for a given page? Maybe it's a problem then.

> What if a command was earlier only supported on one platform, but later it added support for another platform and a while later it supported even more platforms? Should all related files/folders be moved around in the hierarchy each time the support matrix changes, or is it better to simply play around with nested file extensions when variations change or are added?

Assuming we are not going with the file extension route, this problem would still exist in this model. You would have to move around files in the sub-folders.

> What if they want to see all the variations (i.e., supported platforms and completed languages etc.) that are available for a specific command with the intent to fill some of the cavities, should they be hunting all over the folder hierarchy?

Right, this is slightly cumbersome. But again an edge-case IMO. Personally, I have never needed this.

I think for these edge cases, it may not be right to hinge our decision solely on the comfort of contributors "searching files through a GUI". I agree that editing files for a given platform is possibly the only sore thumb here. But I think a good amount of tooling is the right answer.

I am not opposed to this option per se. But it seems like it is asking us to spend too much effort for little benefit. We also have to take into account the fact that all clients have to change their code to accommodate the new structure.

I will leave it to what others think.

@agnivade
Member

Not to sound hasty, but I'd hate to lose the steam that we have picked up here. Rarely do we see 6 year old issues being discussed actively.

If there aren't any objections, I'd like to start off with option E as the accepted model.

@fejx
Contributor

fejx commented Jun 19, 2020

Since no one objected, I would suggest closing this issue about the decision and opening a new one regarding the execution of the new page structure.

@waldyrious
Member

@fejx what do we gain by opening a new issue rather than using this one?

@zlatanvasovic
Contributor

A clear discussion which doesn't require scrolling past 100 replies.

@agnivade
Member

In other projects, I have seen the top comment being edited and a summary being added to reflect the final decision taken. That way, anybody can get a tldr without having to read through the whole thread. I don't have any opinions either way, just excited to see some work getting started.

@waldyrious
Member

> In other projects, I have seen the top comment being edited and a summary being added to reflect the final decision taken.

I'd prefer this too.

@fejx
Contributor

fejx commented Jun 20, 2020

Adding a summary on the top is a good idea, but I still would like to open a separate issue.
My reasons why I want a new issue:

  • The decision is a separate discussion which has nothing to do with the execution. Therefore, a separate discussion should take place in a separate issue.
  • The issue is full of comments which are irrelevant for the execution.
  • Closing would put a defined end to the discussion, allowing us to focus on the how not the what.
  • It is easier to read for archival purposes. (New issue will link to the discussion about the decision)

@waldyrious
Member

Fair enough. We don't usually have separate issues for discussion vs. implementation, but your points are well taken.

Feel free to create the new issue for the implementation. I'll go ahead and close this issue with the decision of going with Option E described above.
