Skip to content
forked from synhershko/hspell

hspell fork -- free Hebrew spellchecker and morphology engine.

License

Unknown, GPL-2.0 licenses found

Licenses found

Unknown
LICENSE
GPL-2.0
COPYING
Notifications You must be signed in to change notification settings

unhammer/hspell

 
 

Repository files navigation

This is version 1.1 of Hspell, the free Hebrew spellchecker and morphology
engine.

You can get Hspell from:
	http://hspell.ivrix.org.il/

Hspell was written by Nadav Har'El and Dan Kenigsberg.

People who wish to integrate Hspell's technology into their or others' GPL
software (e.g., word processors, editors), are encouraged to do so, and are
welcome to contact us for help. People who wish to help us with enlarging
the word lists, are also encouraged to contact us and will be appreciated
(see below for instructions on how you can help).

The rest of this README file explains Hspell's spelling standard (niqqud-less),
a bit about the technology behind Hspell, how to use the "hspell" program
(but see the manual page for more current information), and lists a few future
directions. See the separate INSTALL file for instructions on how to install
Hspell.


About Hspell's spelling standard
--------------------------------

Hspell was designed to be 100% and strictly compliant with the official
niqqud-less spelling rules ("Ha-ktiv Khasar Ha-niqqud", colloquially known as
"Ktiv Male") published by the Academy of the Hebrew Language. This is both an
advantage and a disadvantage, depending on your viewpoint. It's an advantage
because it encourages a *correct* and consistent spelling style throughout
your writing. It is a disadvantage, because a few of the Academia's official
spelling decisions are relatively unknown to the general public.

Users of Hspell (and all Hebrew writers, for that matter) are encouraged to 
read the Academia's official niqqud-less spelling rules (which are printed at
the end of most modern Hebrew dictionaries), and to refer to Hebrew
dictionaries which use the niqqud-less spelling (such as Millon Ha-hove or
Rav Milim). We also provide in docs/niqqudless.odt a document (in Hebrew)
which describes in detail Hspell's spelling standard, and why certain words
are spelled the way they are.

Future releases might include an option for alternative spelling standards.


The technology behind Hspell
----------------------------

The "hspell" program itself is mostly a simple (but efficient) program
that checks input words against a long list of valid words. The real "brains"
behind it are the word lists (dictionary) provided by the Hspell project.

In order for it to be completely free of other people's copyright restrictions,
the Hspell project is a clean-room implementation, not based on other
companies' word lists, on other companies' spell checkers, or on copying of
printed dictionaries. The word list is also not based on automatic scanning
of available Hebrew documents (such as online newspapers), because there is
no way to guarantee that such a list will be correct (and not contain
misspellings, useless proper names, slang, and so on), complete (certain
inflections might not appear in the chosen samples) or consistent (especially
when it comes to niqqud-less spelling rules).

Instead, our idea was to write programs which know how to correctly inflect
Hebrew nouns and conjugate Hebrew verbs. The input to these programs is a
list of noun stems and verb roots, plus hints needed for the correct
inflection when these cannot be figured out automatically. These input files
are obviously an important part of the Hspell project. The "word list
generators" (written in Perl, and are also part of the Hspell project) then
create the complete word-list for use by the spellchecking program, hspell.
The generated lists are useful for much more than spellchecking, by the
way - see more on that below ("the future").

Although we wrote all of Hspell's code ourselves, we are truly indebted to
the old-style "open source" pioneers - people who wrote books instead of
hiding their knowledge in proprietary software. For the correct noun
inflections, Dr. Shaul Barkali's "The Complete Noun Book" has been a great
help. Prof. Uzzi Ornan's booklet "Verb Conjugation in Flow Charts" has been
instrumental in the implementation of verb conjugation, and Barkali's
"The Complete Verb Book" was used too.
During our work we have extensively used a number of Hebrew dictionaries,
including Even Shoshan, Millon Ha-hove and Rav-Milim, to ensure the correctness
of certain words. Various Hebrew newspapers and books, both printed and online,
were used for inspiration and for finding words we still do not recognize.
We wish to thank Cilla Tuviana and Dr. Zvi Har'El for their assistance with
some grammatical questions.


Using hspell
------------

After unpacking the distribution and running "configure", "make" and
"make install" (see the INSTALL file for more information), the hspell
executable is installed (by default) in /usr/local/bin, and the dictionary
files are in /usr/local/share/hspell.

The "hspell" program can be used on any sort of text file containing Hebrew
and potentially non-Hebrew characters which it ignores. For example, it
works well on Hebrew text files, TeX/LaTeX files, and HTML. Running

	hspell filename

Will check the spelling in filename and will output the list of incorrect
words (just like the old-fashioned UNIX "spell" program did). If run without
a file parameter, hspell reads from its standard input.

In the current release, hspell expects ISO-8859-8-encoded files. If files
using a different encoding (e.g., UTF8) are to be checked, they must be
converted first to ISO-8859-8 (e.g., see iconv(1), recode(1)).

If the "-c" option is given, hspell will suggest corrections for misspelled
words, whenever it can find such corrections. The correction mechanism in this
release is especially good at finding corrections for incorrect niqqud-less
spellings, with missing or extra 'immot-qri'a.
The "-l" (verbose) option will explain for each correct word why it was
recognized, if Hspell was built with the "linginfo" optional feature enabled
(a morphological analysis is shown, i.e., fully describe all possible ways to
read the given word as an inflected word with optional prefixes).

Because hspell's output (naturally) is "logical-order", it is normally
useful to pipe it to bidiv or rev before viewing. For example

	hspell -c filename | bidiv | less

Another convenient alternative is to run hspell on a BiDi-enabled terminal.


How *you* can help
------------------

As mentioned above, hspell does not cover yet all modern Hebrew.

This is where you enter the picture. If you stumble upon a commonly used word
in modern Israeli Hebrew, you are welcome to add its stem to the appropriate
dat file. Notice that in some cases you should add some flags to hint how the
inflection should be done.
Run wolig/woo to inflect it (or simply "make"), and make sure the output is
correct. Since "open source" should not mean "low quality", you should examine
the outcome carefully to make sure the spelling is perfect - look each and
every word in a good dictionary and/or consult grammer books.

If you send us lists of tested nouns or verbs, and even proper names, you would
do great service to this GPLed project. Be sure not to copy words out of a
dictionary or another copyrighted word lists, but rather use you own knowledge
of Hebrew.

Also, we are very keen to know if you find a spelling error that creeped into
hspell's word lists.


Hspell's license
----------------

Hspell is free software, released under the GNU General Public License (GPL).
Note that not only the programs in the distribution, but also the dictionary
files and the generated word lists, are licensed under the GPL.
There is no warranty of any kind for the contents of this distribution.

See the LICENSE file for more information and the exact license terms.


Contacting the authors
----------------------

Nadav Har'El:    nyh    @ math.technion.ac.il
Dan Kenigsberg:  danken @   cs.technion.ac.il

About

hspell fork -- free Hebrew spellchecker and morphology engine.

Topics

Resources

License

Unknown, GPL-2.0 licenses found

Licenses found

Unknown
LICENSE
GPL-2.0
COPYING

Stars

Watchers

Forks

Packages

No packages published

Languages

  • C 36.9%
  • Shell 27.6%
  • Perl 23.0%
  • JavaScript 4.5%
  • PHP 4.4%
  • Vim Script 3.6%