Clarify and document specification regarding whitespace and new lines #5

Closed
klassenjm opened this issue Jul 5, 2016 · 7 comments

Comments

@klassenjm
Contributor

klassenjm commented Jul 5, 2016

Proposal

  • The USFM specification will define that paragraph markers must always begin on a new line.

Details

In Paratext's “Standard” editing view, Paratext automatically standardizes the newlines every time you save a chapter. In Unformatted view, Paratext never alters the newlines present. It is therefore possible to produce files like the following (note \mt2, which does not start on a new line and whose text wraps onto a second line):

\id 1CO My test version \mt2 The first letter of Paul 
to the
\mt1 Corinthians
\c 1
\p
\v 1 From Paul, who was called by the will of God to be an apostle of Christ Jesus ...

Some people also want to be able to insert blank lines temporarily, for example, and want them to stay until they remove them. It is debatable whether this 'feature' is a good thing, because it leaves data in a non-standard state and could break other software that processes the text.
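
As a rough illustration of the proposed rule (not Paratext's actual normalization, which is not described in this thread), the sketch below folds every newline into a space and then starts a new line before each paragraph marker. The marker set is an assumed, incomplete subset chosen for the example, and `normalize_newlines` is a hypothetical name. Verse markers are left inline here because the proposal only mentions paragraph markers.

```python
import re

# Assumed, incomplete subset of USFM paragraph-level markers, for illustration
# only. A real tool would take the full list from the USFM stylesheet.
PARAGRAPH_MARKERS = {"id", "mt", "mt1", "mt2", "c", "p", "q", "q1", "q2", "d", "s", "s1"}


def normalize_newlines(usfm: str) -> str:
    """Start each paragraph marker on a new line; fold other newlines into spaces."""
    # Collapse every run of whitespace (including newlines) to a single space.
    flat = re.sub(r"\s+", " ", usfm).strip()

    def break_before(match: re.Match) -> str:
        marker = match.group(1)
        if marker in PARAGRAPH_MARKERS:
            # Replace the optional preceding space with a newline.
            return "\n\\" + marker
        return match.group(0)  # Character, note and verse markers stay inline.

    # An optional space followed by an opening marker such as \mt2 or \p.
    # The lookahead skips closing markers such as \nd*.
    out = re.sub(r" ?\\([a-z]+[0-9]*)(?![a-z0-9*])", break_before, flat)
    return out.lstrip("\n")
```

Applied to the file above, this would move \mt2 to its own line and rejoin "to the" with the rest of the title, at the cost of also removing any blank lines a user inserted on purpose.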

Application Support Requirements

  • This recommendation would require a change in Paratext so that it also performs all normalization steps in Unformatted view.
klassenjm added this to the 3.0.rc1 milestone on Jul 5, 2016
klassenjm changed the title from "Clarify and document specification regarding new lines" to "Clarify and document specification regarding whitespace and new lines" on Jul 18, 2016
klassenjm changed the milestone from 3.0.rc1 to 3.0.rc2 on Sep 9, 2016
@cmahte

cmahte commented Sep 10, 2016

The statement "All paragraph markers should be preceded by a single newline" would benefit greatly from having the contrapositive stated explicitly as well: "Markers that are not paragraph markers should not be placed immediately following a newline." That is, paragraph tags are distinguishable from note and character styles because they begin on a new line.

To this end, the example of \rq ... \rq* shows the character style appearing on its own line in the USFM code. I haven't fully perused the document for other character styles appearing in column 1 of a row of text, but this should NOT be put forward as a best method, whether or not it is prohibited.
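
A validation tool could check the suggested rule mechanically. The following is only a sketch under stated assumptions: the paragraph marker set is an incomplete subset picked for the example, and `markers_at_line_start` is a hypothetical helper, not part of any existing USFM tool.

```python
import re

# Assumed, incomplete subset of USFM paragraph-level markers, for illustration
# only. A real checker would load the full list from the USFM stylesheet.
PARAGRAPH_MARKERS = {"id", "mt", "mt1", "mt2", "c", "p", "q", "q1", "q2", "d", "s", "s1"}


def markers_at_line_start(usfm, paragraph_markers=PARAGRAPH_MARKERS):
    """Return (line_number, marker) for every line that begins with a non-paragraph marker."""
    findings = []
    for number, line in enumerate(usfm.splitlines(), start=1):
        # Only a marker in column 1 counts; indented markers are ignored here.
        match = re.match(r"\\([a-z]+[0-9]*)", line)
        if match and match.group(1) not in paragraph_markers:
            findings.append((number, match.group(1)))
    return findings
```

Note that a literal reading of the rule also reports \v at the start of a line, so a practical checker would probably need an explicit exception list.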

@DavidHaslam

From the release notes: USFM 2.1 – April 2007
Changed \rq...\rq* from paragraph to character level markup.

@DavidHaslam

Jeff's issue does prompt a question about the right-aligned poetry tag \qr, which is mostly used just before \qs Selah \qs*.

In this context, the right alignment ought to style as follows:

If it will still fit on the current line,
then right align Selah at the end of the line,
else insert a new line and do the same.

Could forcing \qr to a new line in the USFM file prevent the first possibility?
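
The fit-or-wrap rule above can be sketched for fixed-width text as follows. This is a simplification: a real typesetter measures glyph widths rather than counting characters, and `place_right_aligned`, `width` and `gap` are hypothetical names introduced only for this example.

```python
from typing import List


def place_right_aligned(line: str, word: str, width: int, gap: int = 1) -> List[str]:
    """Right-align `word` on the current line if it fits, otherwise on a new line."""
    # The word fits when the line, a minimum gap and the word stay within the width.
    if len(line) + gap + len(word) <= width:
        return [line + word.rjust(width - len(line))]
    # Otherwise keep the line as it is and right-align the word on its own line.
    return [line, word.rjust(width)]
```

The decision depends only on the rendered line length, which suggests that where \qr sits in the USFM file need not dictate which branch a renderer takes.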

klassenjm changed the milestone from 3.0.rc2 to 3.0.0 on Oct 27, 2017
@klassenjm
Contributor Author

Updates have been made to the documentation for managing whitespace and performing whitespace normalization.

@RobH123

RobH123 commented Feb 17, 2019

@cmahte I think the following example in the USFM 3.0 spec https://ubsicap.github.io/usfm/about/syntax.html#newlines implies that your first suggestion above was rejected:

\v 27 Can any of you live a bit longer
\f + \fr 6.27: \fq live a bit longer; \ft or \fq grow a bit taller.\f* by worrying about it?

However, because the newline before the \f is counted as whitespace, the spec also had to add later under Handling Special Contexts:

The normalization rules outlined in 3,5,7 can result in some whitespace remaining in the text which may be considered insignificant depending on its context.

For example, the space preceding the footnote in:

\v 27 Can any of you live a bit longer \f + \fr 6.27: \fq live a bit longer;

could be removed:

\v 27 Can any of you live a bit longer\f + \fr 6.27: \fq live a bit longer;

And a space after a cross reference occurring at the start of a verse

\v 7 \x - \xo 2.7: \xt 1 Co 15.45.\x* Then the \nd Lord\nd* God took some soil
from the ground and formed a man

could be removed:

\v 7 \x - \xo 2.7: \xt 1 Co 15.45.\x*Then the \nd Lord\nd* God took some soil
from the ground and formed a man

Yet, a normalization process cannot generally remove ALL whitespace preceding and following note marker pairs. In many cases a single whitespace is expected between the texts which precede and follow a note. As suggested and recommended earlier:

  • USFM validation tools may flag suspicious whitespace.
  • USFM editors can take steps to discourage ambiguous whitespace wherever possible.
  • USFM normalization tools can identify and handle special contexts (examples above).
  • USFM publication tools and other post processors can identify and handle special contexts in the manner which is most suitable for the intended output.
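
To make the two special contexts quoted above concrete, here is a minimal sketch of how a validation tool might flag them rather than silently remove them. The regular expressions and the name `flag_suspicious_whitespace` are assumptions made for this example and cover only these two cases.

```python
import re

# Whitespace (including a newline) immediately before a footnote opener,
# e.g. "longer \f + ...". The trailing space keeps \fr, \fq and \f* out.
SPACE_BEFORE_FOOTNOTE = re.compile(r"[ \t\r\n]+(?=\\f )")

# Whitespace after the closer of a cross reference that directly follows
# the verse number, e.g. "\v 7 \x - ... \x* Then".
SPACE_AFTER_VERSE_START_XREF = re.compile(
    r"(\\v \S+ \\x .*?\\x\*)[ \t\r\n]+", re.DOTALL
)


def flag_suspicious_whitespace(usfm):
    """Return sorted (offset, description) pairs for whitespace that may be insignificant."""
    findings = []
    for match in SPACE_BEFORE_FOOTNOTE.finditer(usfm):
        findings.append((match.start(), "whitespace before \\f"))
    for match in SPACE_AFTER_VERSE_START_XREF.finditer(usfm):
        # Group 1 ends right after \x*, which is where the whitespace starts.
        findings.append((match.end(1), "whitespace after \\x* at verse start"))
    return sorted(findings)
```

Whether a flagged space should actually be removed still depends on the surrounding text, which is why this sketch only reports offsets.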

(Seems that "could be removed" above should be "should be removed"?)

Isn't this save-all section saying that the USFM 3.0 standard will accept ambiguous whitespace and then try to intelligently figure out what it should really mean? (I guess what it's really doing is saying that human readability of USFM3 trumps machine clarity.) Could this issue be reopened please, @klassenjm?

BTW: It seems that the 3,5,7 reference is a typo?

@RobH123

RobH123 commented Feb 17, 2019

The statement "All paragraph markers should be preceded by a single newline" would benefit greatly from having the contrapositive stated explicitly as well: "Markers that are not paragraph markers should not be placed immediately following a newline." That is, paragraph tags are distinguishable from note and character styles because they begin on a new line.

@cmahte Thinking about this more in light of my comment above, this would imply that \v markers should not be on new lines. I have seen Paratext output that puts "\p \v 2 Verse text..." on the same line, but I don't recall noticing that successive verses in the same paragraph stay on the same line. (That could lead to very long lines.)

Maybe the only good (but probably too radical) solution would be to declare all newlines as insignificant whitespace and to pay more attention to trailing spaces on lines instead??? (Seems that could carry through to USX as well -- I haven't studied whitespace problems in USX in recent years.)
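
To show what that radical option would mean in practice, here is a tiny sketch in which newlines contribute nothing and only the spaces physically present in the file survive. The function name `drop_newlines` is hypothetical, and this is not how the USFM 3.0 spec currently treats newlines.

```python
import re


def drop_newlines(usfm: str) -> str:
    """Treat every newline as insignificant: remove it without adding a space."""
    # Trailing spaces at the end of a line are kept, so they become significant.
    return re.sub(r"\r?\n", "", usfm)
```

Under this reading, "longer" followed by a newline and \f joins with no space, whereas "longer " with a trailing space keeps exactly one, so the trailing space alone decides the outcome.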

@DavidHaslam

It should also be noted that the following is valid USFM for those translations with some verse-tagged descriptive Psalm titles.

\d \v 1  Кіроўцу хору. На Гэцкім \add інструмэньце \add*. Псальма Давідава.

The translations in which Psalm 18 has just over two verses in the descriptive title are interesting cases....


4 participants