Help list for training/tesstrain.sh #1469

Closed
FernandoGOT opened this issue Apr 11, 2018 · 2 comments

@FernandoGOT

I recommend printing a short help list when someone runs training/tesstrain.sh with no arguments, and a more detailed one when it is run as training/tesstrain.sh --help, like the other commands.

@Shreeshrii
Collaborator

Shreeshrii commented Apr 12, 2018

A simple implementation could display the commented-out usage info:


cat <<EOF
Usage: $0 [options]

# USAGE:
#
# tesstrain.sh
#    --fontlist FONTS           # A list of fontnames to train on.
#    --fonts_dir FONTS_PATH     # Path to font files.
#    --lang LANG_CODE           # ISO 639 code.
#    --langdata_dir DATADIR     # Path to tesseract/training/langdata directory.
#    --output_dir OUTPUTDIR     # Location of output traineddata file.
#    --overwrite                # Safe to overwrite files in output_dir.
#    --linedata_only            # Only generate training data for lstmtraining.
#    --run_shape_clustering     # Run shape clustering (use for Indic langs).
#    --exposures EXPOSURES      # A list of exposure levels to use (e.g. "-1 0 1").
#
# OPTIONAL flags for input data. If unspecified we will look for them in
# the langdata_dir directory.
#    --training_text TEXTFILE   # Text to render and use for training.
#    --wordlist WORDFILE        # Word list for the language ordered by
#                               # decreasing frequency.
#
# OPTIONAL flag to specify location of existing traineddata files, required
# during feature extraction. If unspecified will use TESSDATA_PREFIX defined in
# the current environment.
#    --tessdata_dir TESSDATADIR     # Path to tesseract/tessdata directory.
#
# NOTE:
# The font names specified in --fontlist need to be recognizable by Pango using
# fontconfig. An easy way to list the canonical names of all fonts available on
# your system is to run text2image with --list_available_fonts and the
# appropriate --fonts_dir path.


EOF

@Shreeshrii
Collaborator

The above prints the usage text but then falls through to argument parsing, which fails with:

ERROR: Need to specify a language --lang

So it needs to be modified to exit the script right after printing the help.
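
Something along these lines might work. This is only a rough sketch: the function names `print_usage` and `show_help_and_exit_if_requested` are made up for illustration and are not taken from tesstrain.sh, the usage text is shortened to three of the options listed above, and exiting with status 0 in both cases is an assumption.

```shell
#!/bin/bash
# Sketch only: function names are hypothetical, not from the real tesstrain.sh.

# Print an abbreviated usage message (the full option list is in the
# comment above in this thread).
print_usage() {
  cat <<EOF
Usage: $0 [options]
    --lang LANG_CODE           # ISO 639 code.
    --langdata_dir DATADIR     # Path to tesseract/training/langdata directory.
    --output_dir OUTPUTDIR     # Location of output traineddata file.
EOF
}

# Print the help and exit before any argument parsing happens, either when
# no arguments were given or when --help / -h appears among them.
show_help_and_exit_if_requested() {
  if [ "$#" -eq 0 ]; then
    print_usage
    exit 0   # assumption: treat a bare invocation as a help request, not an error
  fi
  local arg
  for arg in "$@"; do
    case "$arg" in
      --help|-h)
        print_usage
        exit 0
        ;;
    esac
  done
}

# In the script itself, this call would go before the existing argument parsing:
# show_help_and_exit_if_requested "$@"
```

Because the function calls `exit` itself, the "Need to specify a language" check is never reached when help is requested; any other invocation returns from the function and continues to the normal parsing.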

@zdenop closed this as completed in 7dbf5a0 on Oct 2, 2018
zdenop added a commit that referenced this issue Oct 9, 2018
* 'master' of https://github.com/tesseract-ocr/tesseract:
  Fix CID 1164579 (Explicit null dereferenced)
  print help for tesstrain.sh; fixes #1469
  Fix CID 1395882 (Uninitialized scalar variable)
  Fix comments
  Move content of ipoints.h to points.h and remove ipoints.h
  remove duplicate help from combine_lang_model
  Fix typo.
  use tprintf instead of printf to be able disable messages by quiet option (issue #1240)
  add "sudo ldconfig" to install instruction. fixes #1212
  unittest: Replace NULL by nullptr
  unittest: Format code
  tesseract app: check if input file exists; fixes #1023
  Format code (replace ( xxx ) by (xxx))
  Simplify boolean expressions
  Win32: use the ISO C and C++ conformant name "_putenv" instead of deprecated "putenv"