Formal grammar #85
I'm having trouble understanding the exact grammar of even very basic USFM, and I think the specification is very vague on this point. For example, I don't understand whether a USFM file must immediately start with a
Would it be possible to amend the specification with more formal grammar rules, e.g. written in BNF, EBNF or similar? This would make the specification far less ambiguous, and easier for developers like me to write correct parsers.
If you look in the usfm.sty files that are usually available wherever the spec is, the stylesheet contains a bit more information about where each tag is valid. Each tag has an Occursunder field. This should help guide you.
These Occursunder fields aren't spec'd because they are customizable, but if you design for the default, anyone who's using a custom.sty file typically already knows that customizing the stylesheet puts them outside any formal expectation of full support.
But a USFM file does always start with the \id tag, and the \id tag must always have the 3-character book code immediately following \id . This should be (and at one time was) in the specification.
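To make the Occursunder idea concrete, here is a minimal sketch of reading those fields out of a stylesheet. It assumes the layout commonly seen in usfm.sty: each entry opens with a `\Marker` line, is followed by field lines such as `\OccursUnder` with a space-separated list of parent markers, and `#` starts a comment. The function name `occurs_under` is hypothetical, and the real stylesheet format may have details this does not handle.

```python
def occurs_under(sty_text):
    """Map each marker to the set of markers it is allowed to occur under."""
    table = {}
    current = None
    for raw in sty_text.splitlines():
        # Assumption: '#' starts a comment that runs to the end of the line.
        line = raw.split("#", 1)[0].strip()
        if not line.startswith("\\"):
            continue
        parts = line[1:].split(None, 1)
        if not parts:
            continue
        field = parts[0].lower()  # field names are compared case-insensitively
        value = parts[1] if len(parts) > 1 else ""
        if field == "marker":
            current = value.strip()
            table.setdefault(current, set())
        elif field == "occursunder" and current is not None:
            table[current].update(value.split())
    return table
```

A parser could consult this table to decide whether a tag is legal in its current context, with the caveat that a customized stylesheet changes the answer.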
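As an illustration of that rule, a parser could check the opening line along these lines. This is only a sketch: the helper name `leading_book_code` is hypothetical, and it assumes a book code is three uppercase letters or digits (for example GEN or 1CO) and that leading blank lines are tolerated, neither of which is confirmed by the spec text quoted in this thread.

```python
import re

# Assumption: a book code is exactly three uppercase letters or digits.
ID_LINE = re.compile(r"\\id ([A-Z0-9]{3})(?=\s|$)")

def leading_book_code(usfm_text):
    """Return the book code if the text starts with a well-formed \\id line, else None."""
    for line in usfm_text.splitlines():
        if not line.strip():
            continue  # assumption: skip leading blank lines
        match = ID_LINE.match(line)
        return match.group(1) if match else None
    return None
```

Only the first non-blank line is examined, so a file that opens with any other tag is rejected even if an \id appears later.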
However, you CAN have multiple \id lines in a single USFM file, and the USFM remains valid. The spec doesn't say whether this is allowed, but my testing and queries on the subject suggest there is nothing invalid about a 2nd or 66th \id field in a single file.
I'm not sure I understand the
Is there a specification for the style sheets as well? I was unable to locate a reference to it in the USFM spec.
Would something like the following be valid USFM?
A good formal grammar for USFM could rectify most such ambiguities (but not all).
As far as I know, that is valid USFM, unless both GEN sections contain the same chapter number. I think any duplicate pre-chapter-1 material (any tag except a \c coming after the \id ) would make this a duplicate book as well:
Is valid but
is not valid.
Any repeated id + c chapter tag invalidates the file (chapter zero included: the introductory stuff).
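The duplicate rule as described above could be checked roughly like this. It is a sketch of one commenter's understanding, not of official USFM behavior: the function name `duplicate_sections` is hypothetical, markers are assumed to start their own line, introductory material is recorded as chapter "0", and chapter numbers are compared as written (so "1" and "01" would count as different).

```python
def duplicate_sections(usfm_text):
    """Return (book, chapter) pairs that appear more than once; intro material is chapter "0"."""
    seen = set()
    duplicates = []
    book = None
    intro_open = False  # True between an \id line and the first tag that follows it
    for raw in usfm_text.splitlines():
        line = raw.strip()
        if not line.startswith("\\"):
            continue
        parts = line.split()
        marker = parts[0][1:]
        arg = parts[1] if len(parts) > 1 else None
        if marker == "id":
            book = arg
            intro_open = True
            continue
        if book is None:
            continue
        if marker == "c":
            key = (book, arg)
            intro_open = False
        elif intro_open:
            key = (book, "0")   # any non-\c tag right after \id counts as intro material
            intro_open = False  # record the intro once per \id
        else:
            continue
        if key in seen:
            duplicates.append(key)
        else:
            seen.add(key)
    return duplicates
```

Under these assumptions, a file with two GEN sections holding chapters 1 and 2 yields no duplicates, while repeating `\id GEN` followed by `\c 1` yields `("GEN", "1")`.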
However, I don't represent any official USFM body. Any comments that disagree with this likely carry more weight than my understanding.
@cmahte Thank you for your helpful responses for Jaak.
I agree that the current documentation is not sufficient as a grammar. As Michael mentions, the usfm.sty stylesheet contains some additional definition and, as suggested, is more than what CSS is to HTML.
I have added a basic description of stylesheet properties to the
Also, in case it assists, let me refer you to a more formal grammar for use in checking USFM 3 content which is being developed here.