wiki getting spammed #2873

Closed
Shreeshrii opened this issue Jan 27, 2020 · 53 comments

@Shreeshrii
Collaborator

Please see https://github.com/tesseract-ocr/tesseract/wiki/4.0-with-LSTM/_history
The content on 4.0-with-LSTM page has been deleted.

There should be some control or moderation of changes to the wiki.

I would also suggest using https://github.com/apps/stale bot to prune inactive/old issues, though the number of days of inactivity can be set higher than the default of 60.

@amitdo
Collaborator

amitdo commented Jan 27, 2020

@zdenop,

Please tell me what you think about the following option:
- Clone the wiki and move it to a regular repo, so every edit can be reviewed.

With the current policy, which started today, it seems that most people are blocked from contributing to the wiki (including me).

@stweil
Contributor

stweil commented Jan 27, 2020

What about using http://tesseract-ocr.github.io/ for that? The repository is https://github.com/tesseract-ocr/tesseract-ocr.github.io.

Of course we should invest some work in the current HTML page and add a theme (which one?), but eventually we could offer both source code and user documentation there.

@zdenop
Contributor

zdenop commented Jan 27, 2020

Actually, the wiki is a git repo: try cloning https://github.com/tesseract-ocr/tesseract.wiki.git

I added you and @Shreeshrii as collaborators. Can you try to edit the wiki?
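
The clone URL above follows GitHub's general pattern of exposing a wiki as `<repository>.wiki.git`. A minimal Python sketch of that mapping, where the helper name `wiki_clone_url` is made up for illustration:

```python
def wiki_clone_url(repo_url: str) -> str:
    """Return the clone URL of the wiki that belongs to a GitHub repository."""
    base = repo_url.rstrip("/")
    # Accept both the browser URL and the .git clone URL of the main repo.
    if base.endswith(".git"):
        base = base[: -len(".git")]
    return base + ".wiki.git"


if __name__ == "__main__":
    print(wiki_clone_url("https://github.com/tesseract-ocr/tesseract"))
```

The resulting URL can be passed to `git clone` like any other repository URL.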

@Shreeshrii
Collaborator Author

Thanks, this will stop the spam posts but we need to choose the option which also allows others to easily contribute to documentation efforts, with moderation/approval.

I do not know enough to weigh in on the pros and cons of alternatives proposed.

@zdenop
Contributor

zdenop commented Jan 28, 2020

What I have found so far is that GitHub offers only two settings for who may edit a wiki:

  1. every GitHub user
  2. only contributors (users with write access to the project/repository)

There is no other possibility...

@stweil
Contributor

stweil commented Jan 28, 2020

Here is a test which shows how the documentation could be presented in the future: https://ub-mannheim.github.io/. The current Wiki including the full history could be moved to https://github.com/tesseract-ocr/tesseract-ocr.github.io, and contributors would simply send pull requests to add or update documentation.

@Shreeshrii
Collaborator Author

In addition, I would suggest keeping a single page of wiki, which points to the new locations of documentation.

Does github.io need the pages to be in HTML or would markdown work?

@stweil
Contributor

stweil commented Jan 28, 2020

All pages are markdown (unmodified copy from Wiki).

@Shreeshrii
Collaborator Author

Questions:

Should Api/examples be moved to within tesseract repo in examples directory?
(I had noticed that the current examples do not build with master because of some changes in includes.)

Should Tesseract 5 also be added as a category?
The moved wiki pages could then be streamlined into a better structure (merging duplicates, etc.).

@amitdo
Collaborator

amitdo commented Jan 28, 2020

Zdenko, thanks for the invitation. Yes, now I can edit the wiki.

I still think we should let most users (but not the spammers) edit the docs just like we do with the code. Stefan's proposal looks good to me.

@Shreeshrii
Collaborator Author

> The current Wiki including the full history could be moved to https://github.com/tesseract-ocr/tesseract-ocr.github.io, and contributors would simply send pull requests to add or update documentation.

> Stefan's proposal looks good to me.

I agree.

> add a theme (which one?)

Is there any way to preview the available themes to choose?

@stweil What would be the next steps to make this operational?

@stweil
Contributor

stweil commented Jan 29, 2020

> Is there any way to preview the available themes to choose?

You can use these instructions without really creating a new GitHub Page or read https://pages.github.com/themes/.

See also the other documentation on GitHub Pages.

@stweil
Contributor

stweil commented Jan 29, 2020

> What would be the next steps to make this operational?

  1. Add all files including their Git history from tesseract.wiki to tesseract-ocr.github.io.
  2. Move the new files in tesseract-ocr.github.io to a new subdirectory doc.
  3. Add _config.yml (configures theme), remove index.html and replace it by an updated README.md (navigation for start page).
  4. Generate and add README.md files for the navigation in doc.
  5. Handle special cases (.png and .asciidoc files) which were added from tesseract.wiki
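
Step 4 could be scripted roughly as follows. This is only a sketch under assumptions: the function name `build_navigation` is hypothetical, page files are assumed to be plain `.md` files whose names contain no spaces, and each link text is simply the file name without its extension.

```python
def build_navigation(filenames, title="Tesseract User Manual"):
    """Return Markdown for a README.md that links to every page in a directory."""
    pages = sorted(
        name
        for name in filenames
        # Skip non-Markdown files (for example .png) and the README itself.
        if name.endswith(".md") and name.lower() != "readme.md"
    )
    lines = [f"# {title}", ""]
    # One bullet per page; the link target keeps the .md extension.
    lines += [f"- [{name[:-3]}]({name})" for name in pages]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_navigation(["FAQ.md", "Home.md", "README.md", "logo.png"]))
```

Passing the file names found in the `doc` directory would give text that could be written to `doc/README.md`.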

@Shreeshrii
Collaborator Author

@stweil Thank you!

Does Tesseract have an official logo? Should we adopt the one being used on https://opensource.google/projects/tesseract ? The theme used can then be coordinated with it.

@zdenop Your thoughts?

@stweil
Contributor

stweil commented Jan 29, 2020

https://github.com/tesseract-ocr/tesseract-ocr.github.io/tree/wiki includes steps 1 to 4 from my list. @zdenop, if you agree, I'd push that branch to master to make it operational.

@amitdo
Collaborator

amitdo commented Jan 29, 2020

@stweil,

I think it would be better to separate the doxygen repo from the wiki repo.

https://stackoverflow.com/questions/15563685/can-i-create-more-than-one-repository-for-github-pages

Another thing: I don't see that the wiki's history is preserved on your site.

@stweil
Contributor

stweil commented Jan 29, 2020

I did not clone the wiki history for the UB-Mannheim site, but it is preserved in https://github.com/tesseract-ocr/tesseract-ocr.github.io/tree/wiki.

@stweil
Contributor

stweil commented Jan 29, 2020

> I think it would be better to separate the doxygen repo from the wiki repo.

All GitHub Pages content for Tesseract would always be under https://tesseract-ocr.github.io/. The related repository is https://github.com/tesseract-ocr/tesseract-ocr.github.io.

Each other repository, for example https://github.com/tesseract-ocr/tessdata, can contribute GitHub Pages which would be visible under https://tesseract-ocr.github.io/tessdata.

So separating the Doxygen generated API documentation would require a repository with that documentation, for example https://github.com/tesseract-ocr/api.

Example: https://ub-mannheim.github.io/ with https://ub-mannheim.github.io/PalMA/.
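
The URL scheme described above can be written down as a small Python sketch. The function name `pages_url` is hypothetical, and the sketch assumes the default `github.io` host with no custom domain configured.

```python
def pages_url(owner: str, repo: str) -> str:
    """Return the default GitHub Pages URL for a repository."""
    host = f"{owner.lower()}.github.io"
    if repo.lower() == host:
        # The special <owner>.github.io repository is served from the site root.
        return f"https://{host}/"
    # Every other repository with Pages enabled is served from a subpath.
    return f"https://{host}/{repo}/"


if __name__ == "__main__":
    print(pages_url("tesseract-ocr", "tesseract-ocr.github.io"))
    print(pages_url("tesseract-ocr", "tessdata"))
```

This is why splitting the documentation across several repositories still leaves everything reachable under one host name.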

@Shreeshrii
Collaborator Author

Index page can also link to https://github.com/tesseract-ocr/docs

@zdenop
Contributor

zdenop commented Jan 29, 2020

@stweil: thanks, go ahead. I do not have free time at this time of year, so I appreciate any support...

@Shreeshrii
Collaborator Author

Shreeshrii commented Jan 29, 2020

> I think it would be better to separate the doxygen repo from the wiki repo.

I agree. I think the doxygen repo can be pretty large, keeping info for three releases. It would be better if wiki documentation can be separate from this.

@amitdo
Collaborator

amitdo commented Jan 29, 2020

My suggestions:

Move wiki content to https://github.com/tesseract-ocr/guide
Alternative names for this repo: Doc, Docs, doc, docs. You can change the name of the current 'docs' repo to 'presentations' or 'slides'.

Move doxygen content to https://github.com/tesseract-ocr/doxygen

https://github.com/tesseract-ocr/tesseract-ocr.github.io/ should only contain links to the other repos.

I'm not sure if this is necessary, but instead of one doxygen repo, you can have one 'doxygen-x.xx' repo for each release.

@stweil
Contributor

stweil commented Jan 29, 2020

> I think the doxygen repo can be pretty large, keeping info for three releases.

That's correct. Each version needs about 140 MB, and I think there should be the versions 3.x (currently 3.05), 4.x (currently 4.1) and latest (Git master). Here are the current sizes:

131M	3.05.02
142M	3.x
131M	4.0.0
4.0K	_config.yml
2.8M	doc
12K	LICENSE.md
4.0K	README.md
90M	.git
496M	total

https://github.com/tesseract-ocr/doxygen might be too specific, as the software could change its name or be replaced by a different one in the future. I suggest https://github.com/tesseract-ocr/tessapi. Then forks still have the tess prefix, so chances are higher to have a unique repository name. Renaming https://github.com/tesseract-ocr/tesseract-ocr.github.io would be sufficient for the initial setup. Then https://github.com/tesseract-ocr/tesseract-ocr.github.io could start as a fresh new repository.

@stweil
Contributor

stweil commented Jan 29, 2020

> Move wiki content to https://github.com/tesseract-ocr/guide

Forks would then also be named guide which is rather generic (with the risk of name collisions). What about https://github.com/tesseract-ocr/tessguide or https://github.com/tesseract-ocr/tessdoc?

@Shreeshrii
Collaborator Author

tessdoc sounds good.

@amitdo
Collaborator

amitdo commented Jan 29, 2020

> https://github.com/tesseract-ocr/doxygen might be too specific, as the software could change its name or be replaced by a different one in the future.

Yes, you're right. One alternative is clang-doc.

> I suggest https://github.com/tesseract-ocr/tessapi. Then forks still have the tess prefix, so chances are higher to have a unique repository name.

Ok.

> Renaming https://github.com/tesseract-ocr/tesseract-ocr.github.io would be sufficient for the initial setup.
> Then https://github.com/tesseract-ocr/tesseract-ocr.github.io could start as a fresh new repository.

I hope that no one who forked this repo will complain.

> Forks would then also be named guide which is rather generic (with the risk of name collisions).

You're right.

> What about https://github.com/tesseract-ocr/tessguide or https://github.com/tesseract-ocr/tessdoc?

Choose one :)

@stweil
Contributor

stweil commented Jan 30, 2020

Here is the new status:

@stweil
Contributor

stweil commented Jan 30, 2020

Please note that repo changes need some time (typically a few minutes) until they are visible on GitHub Pages.

@Shreeshrii
Collaborator Author

Thanks @stweil. Looking good.

Will you be deleting the wiki pages and announcing this in the forum/mailing lists now?

@amitdo
Collaborator

amitdo commented Jan 30, 2020

Thanks Stefan. Nice work.

@amitdo
Collaborator

amitdo commented Jan 30, 2020

About the 'old' wiki. There are links from the forum, blogs and other sites to the wiki.

I suggest keeping only the top header in each page and using the global _Footer.md to automatically append something like this message to all pages:

The wiki content has been moved to a separate repository.

If you want, after you do these changes, you can remove the history and keep just the last commit.

@amitdo
Collaborator

amitdo commented Jan 30, 2020

Another option is to just keep the 'Home' page, delete its content and add the message.

I think my previous suggestion is better.

@amitdo
Collaborator

amitdo commented Jan 30, 2020

An example. So if we delete this page, the links will break.

@stweil
Contributor

stweil commented Jan 30, 2020

> I suggest keeping only the top header in each page and using the global _Footer.md [...]

There is now a footer which marks the wiki pages as unmaintained and links to the new URL.
I also added a prominent hint to the start page of the wiki.

@Shreeshrii
Collaborator Author

Is it possible to put an auto redirect from the wiki pages to the new pages?

@amitdo
Collaborator

amitdo commented Jan 30, 2020

I saw your top warning in the 'Home' page, but didn't notice the footer.

I created this page: https://github.com/tesseract-ocr/tesseract/wiki/_Header.md

Please delete it. I don't see an option to delete it.

Sorry for the spamming :)

@Shreeshrii
Collaborator Author

Shreeshrii commented Feb 1, 2020

All links from README in https://github.com/tesseract-ocr/tessdoc are broken.

Edit:
Please see https://github.com/Shreeshrii/tessdoc
I found that it works online if I add .md to the links

# Tesseract User Manual

## Tesseract 3

- [Training-Tesseract-3.00–3.02](Training-Tesseract-3.00–3.02.md)
- [Training-Tesseract-3.03–3.05](Training-Tesseract-3.03–3.05.md)

@stweil This should work for you locally also. Please check.
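
The fix described above (appending `.md` to relative links) could be applied in bulk with something like the following Python sketch. The function name `add_md_extension` and the list of extensions to leave alone are assumptions made for illustration; only inline `[text](target)` links are handled.

```python
import re

# Matches inline Markdown links of the form [text](target).
LINK_RE = re.compile(r"\[([^\]]*)\]\(([^)\s]+)\)")
# Targets ending in one of these already name a file (assumed list).
KNOWN_EXTENSIONS = (".md", ".png", ".jpg", ".html", ".asciidoc")


def add_md_extension(markdown: str) -> str:
    """Append .md to relative link targets that have no known file extension."""

    def fix(match):
        text, target = match.group(1), match.group(2)
        # Leave absolute URLs (scheme:...) and in-page anchors untouched.
        if re.match(r"[A-Za-z][A-Za-z0-9+.-]*:", target) or target.startswith("#"):
            return match.group(0)
        # Keep an optional #fragment after the new extension.
        path, sep, fragment = target.partition("#")
        if path.lower().endswith(KNOWN_EXTENSIONS):
            return match.group(0)
        return f"[{text}]({path}.md{sep}{fragment})"

    return LINK_RE.sub(fix, markdown)


if __name__ == "__main__":
    print(add_md_extension("- [FAQ](FAQ)"))
```

Checking against an explicit extension list, instead of looking for any dot, matters here because page names such as `Training-Tesseract-3.00–3.02` contain dots that are not file extensions.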

@stweil
Contributor

stweil commented Feb 2, 2020

There are also still many absolute links and texts which refer to the Wiki.
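
One way to locate those leftover absolute links is a simple scan of each Markdown file. The sketch below is an illustration only: the function name `find_wiki_links` is hypothetical, and the URL prefix is the wiki address used earlier in this thread.

```python
import re

WIKI_PREFIX = "https://github.com/tesseract-ocr/tesseract/wiki"
# The prefix, optionally followed by a page path up to whitespace or a closing delimiter.
WIKI_LINK_RE = re.compile(re.escape(WIKI_PREFIX) + r"(?:/[^\s)\]>\"']*)?")


def find_wiki_links(markdown: str):
    """Return (line_number, url) pairs for every absolute link to the old wiki."""
    hits = []
    for number, line in enumerate(markdown.splitlines(), start=1):
        for match in WIKI_LINK_RE.finditer(line):
            hits.append((number, match.group(0)))
    return hits


if __name__ == "__main__":
    sample = "See [FAQ](https://github.com/tesseract-ocr/tesseract/wiki/FAQ)."
    for number, url in find_wiki_links(sample):
        print(f"line {number}: {url}")
```

The reported line numbers make it straightforward to review each hit by hand and decide whether it should become a relative link.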

@Shreeshrii
Collaborator Author

Please see nzbget/nzbget#383
Migrate documentation from GitHub wiki to GitHub pages

@Shreeshrii
Collaborator Author

How about removing the wiki content on each page and replacing it with a link to the corresponding GitHub Pages page, something similar to https://github.com/netdata/netdata/wiki/a-github-star-is-important

At a later date we can just have a home page in wiki and point to the new github pages for documentation.
eg. https://github.com/netdata/netdata/wiki

@Shreeshrii
Collaborator Author

Shreeshrii commented Feb 2, 2020

https://github.com/tcort/markdown-link-check
may be useful for checking all the links

https://github.com/dkhamsing/awesome_bot
awesome_bot checks for valid URLs in a file, it can be used to verify pull requests updating a README.

@Shreeshrii
Collaborator Author

> I saw your top warning in the 'Home' page, but didn't notice the footer.

I have added the MOVE notice as a custom sidebar to the wiki pages. It will be more noticeable than the footer and will be available on every wiki page.

@Shreeshrii
Collaborator Author

I have replaced the content in all wiki pages (except Home and ReadMe) with a MOVE message and added a link to corresponding tessdoc page.
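
For anyone repeating this on another wiki, the stub pages could be generated with a sketch like the one below. Everything specific in it is an assumption: the function name `move_notice`, the wording of the message, and the base URL, which is derived from the GitHub Pages naming scheme discussed earlier and not copied from the actual stub pages.

```python
# Assumed Pages location of the tessdoc repository (not taken from the stubs).
TESSDOC_BASE = "https://tesseract-ocr.github.io/tessdoc"


def move_notice(page_name: str) -> str:
    """Return stub Markdown that points a wiki page at its tessdoc counterpart."""
    new_url = f"{TESSDOC_BASE}/{page_name}"
    # Hypothetical wording; the real MOVE message may differ.
    return (
        "This wiki page has moved.\n\n"
        f"The current version is at [{page_name}]({new_url}).\n"
    )


if __name__ == "__main__":
    print(move_notice("FAQ"))
```

Keeping the stub under the original page name means existing links from forums and blogs still resolve.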

I have also created a PR with FAQ.md (converted from asciidoc version).

If there are no more pending items regarding wiki this issue can be closed.

We can open a new issue regarding reorg of pages in tessdoc repo.

@Shreeshrii
Collaborator Author

https://tesseract-ocr.github.io/tessapi/ (repo: https://github.com/tesseract-ocr/tessapi). This contains generated source code documentation.

@stweil I suggest renaming this to tessapidoc to indicate that it is a documentation repo.

@amitdo
Collaborator

amitdo commented Feb 4, 2020

Currently, the tessapi repo lists all headers and source files in the tree.

Shouldn't it just list the header files in the include dir? These files form the public API.

In the future we may also include the files in the (not yet existing) examples dir.

@amitdo
Collaborator

amitdo commented Feb 4, 2020

Could we use a style that does not use frames?
https://doc.dpdk.org/api/index.html

@amitdo
Collaborator

amitdo commented Feb 4, 2020

The tessapi repo is too big. What about my earlier suggestion to split it to several repos?

tessapi-3, tessapi-4, tessapi-5.

@stweil
Contributor

stweil commented Feb 5, 2020

Ideally generated documentation should not be part of the tessapi repository at all. There is no need to keep the history for generated data, and only the latest documentation for the different branches or releases is relevant. If that documentation could be stored on storage with a web interface, that would be sufficient. Is there such storage available?

The generated documentation not only includes the API but is also a documentation of the full source code. I think that is fine because both parts are needed, but maybe the name tessapi (and also tessapidoc) is misleading.

@amitdo
Collaborator

amitdo commented Feb 6, 2020

> ... both parts are needed

Most devs need just the API docs, not the full docs, but examples are also needed to demonstrate how to use the API. The full code docs are needed only by people developing Tesseract itself.

> but maybe the name tessapi (and also tessapidoc) is misleading.

tesscodedoc?

I think it's OK to do a forced push to that repo. With this policy, you can solve the history issue.

@stweil
Contributor

stweil commented Feb 6, 2020

> Could we use a style that does not use frames?

Sure, that would be possible. But why? When I shrink the menu on the left side with the mouse, it looks like the style without frames.

@stweil
Contributor

stweil commented Feb 6, 2020

> The tessapi repo is too big.

Currently it documents 3.05.02, 3.x, 4.0.0 and latest. I think one of 3.05.02 or 3.x could be removed which would reduce the size by about 25 %.
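
As a rough check of the 25 % estimate, the directory sizes from the listing posted on Jan 29 can be compared with the total reported there. This is only back-of-the-envelope arithmetic: it assumes that listing still reflects the repository, and it ignores that the `.git` history does not shrink when a directory is removed from the working tree.

```python
# Sizes in MB, copied from the earlier `du` listing in this thread.
sizes_mb = {"3.05.02": 131, "3.x": 142, "4.0.0": 131}
total_mb = 496  # reported total, including doc and .git


def share_percent(name: str) -> float:
    """Percentage of the reported total taken up by one documentation tree."""
    return 100 * sizes_mb[name] / total_mb


if __name__ == "__main__":
    for name in ("3.05.02", "3.x"):
        print(f"removing {name} frees about {share_percent(name):.0f} % of the total")
```

Both candidates come out slightly above a quarter of the total, which is in line with the estimate.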

latest is always the master branch and is most important for me.

Should we document branches (3.05, 4.1) or tagged versions (3.05.02, 4.1.1) for older code? Maybe the unmaintained branches / versions don't need code documentation online, so removing one more would be possible.

I suggest keeping only latest and 4.x (currently based on the 4.1 branch).

@amitdo
Collaborator

amitdo commented Feb 6, 2020

> But why?

In one word: Accessibility.

Try to zoom-in a page (to 200%) when browsing that repo and you should see the problem.

@stweil
Contributor

stweil commented Feb 6, 2020

The relevant option in Doxyfile is GENERATE_TREEVIEW. It seems to be disabled by default, but is enabled for Tesseract.
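
Switching that option off amounts to a one-line change in the Doxyfile. As an illustration, here is a small Python sketch that rewrites a `NAME = value` assignment in Doxyfile text; the helper name `set_doxyfile_option` is hypothetical, and the sketch handles only simple single-line `=` assignments.

```python
import re


def set_doxyfile_option(doxyfile_text: str, name: str, value: str) -> str:
    """Set `name = value` in Doxyfile text, replacing an existing assignment."""
    # Match an uncommented assignment of this option on its own line.
    pattern = re.compile(rf"^[ \t]*{re.escape(name)}[ \t]*=.*$", re.MULTILINE)
    line = f"{name} = {value}"
    if pattern.search(doxyfile_text):
        return pattern.sub(lambda _match: line, doxyfile_text, count=1)
    # Option not present: append it at the end of the file.
    if doxyfile_text and not doxyfile_text.endswith("\n"):
        doxyfile_text += "\n"
    return doxyfile_text + line + "\n"


if __name__ == "__main__":
    original = "GENERATE_HTML = YES\nGENERATE_TREEVIEW      = YES\n"
    print(set_doxyfile_option(original, "GENERATE_TREEVIEW", "NO"))
```

After regenerating the documentation with the changed Doxyfile, the side panel should no longer be produced, which is what the accessibility concern above is about.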

@amitdo amitdo closed this as completed Apr 23, 2020