The current dataset improves upon an existing dataset by explicitly marking the boundaries of different senses and stems.
It tries to encode the implicit typographical information (such as bold and italics) into the pragmatic categories they represent (such as primary and secondary glosses).
We have added sense and stem labels, indicated subsenses, and isolated the sense numbers themselves. Primary glosses, secondary glosses, and highlights are captured as well.
The base data for this repository was taken from https://github.com/eliranwong/unabridged-BDB-Hebrew-lexicon. The data was transformed into individual HTML files for the sake of quality assurance. A color scheme that hurts the eyes was added for that same reason. Feel free to tweak it.
- The `Placeholders` directory contains the image files for ancient languages other than Hebrew and Aramaic. These images have yet to be transcribed (not transliterated) into actual Unicode text.
- The `Entries` directory contains the individual BDB entries, one entry per file. HTML was chosen as the initial format to make the files easy for anyone to open, especially for quality assurance purposes.
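Because each entry is a standalone HTML file, the labelled parts of an entry can be pulled out with a short script. The following is a minimal sketch using only the Python standard library. It assumes the labels are expressed as `class` attributes; the class names `stem`, `sense` and `primary` in the sample are hypothetical and may not match the markup actually used in the entry files, so inspect a file first and adjust accordingly.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not go on the stack.
VOID_TAGS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class ClassTextCollector(HTMLParser):
    """Group the text of an HTML document by the class of its enclosing elements."""

    def __init__(self):
        super().__init__()
        self._stack = []  # class list of each currently open element
        self.by_class = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        self._stack.append(classes)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Credit the text to every class found on any enclosing element.
        for classes in self._stack:
            for name in classes:
                self.by_class[name].append(text)


def collect_by_class(html_text):
    """Return a dict mapping each class name to the text chunks it encloses."""
    parser = ClassTextCollector()
    parser.feed(html_text)
    parser.close()
    return dict(parser.by_class)


# Hypothetical markup: these class names are illustrative assumptions,
# not the repository's actual tag scheme.
sample = (
    '<div class="stem">Qal '
    '<span class="sense">1. <b class="primary">create</b></span>'
    "</div>"
)
result = collect_by_class(sample)
```

The parser keeps a stack of open elements, so text inside a nested label is counted for every enclosing label as well. This sketch assumes well-formed markup in which every non-void element is explicitly closed; files with omitted closing tags would need a more forgiving parser.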
The Brown-Driver-Briggs Lexicon of the Hebrew Bible is in the public domain. All changes and additions to the lexicon are released under a CC BY license. If you use this dataset, please provide a link to this repository.
The following contributors have worked on enhancing the dataset:
- Joel Ruark
- J. de Joode
These contributors have created and maintained the dataset that the current repo started from:
- Stephen Ku
- Eliran Wong
New contributions are welcomed and encouraged. Please try to be systematic in your enhancements.
If you are interested in contributing, the roadmap for this dataset could include things like: a) transcriptions of the placeholders, b) verified internal crosslinks, both within articles and across articles, c) the creation of a glossary based on the tags, etc.