Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add html text-indent option for docbuild. #1851

Closed
wants to merge 1 commit into from
Closed

Add html text-indent option for docbuild. #1851

wants to merge 1 commit into from

Conversation

zxygentoo
Copy link

@zxygentoo zxygentoo commented May 5, 2024

Just an initial PoC PR to get your opinions on.

Rationale

  • text-indent for 1 or 2em is the proper typesetting spec for some languages, eg. CJK.
  • This seems not easily achievable via Pandoc.

There is already an Indent First Line feature for odt. I'm not sure in your opinion:

  1. Is this worth adding?
  2. If the UI here confusing in some UX related way? (eg. Maybe this should be some sub-option for Add Css Styles?)

If you like the direction, we can discuss and refine the implementation a bit if necessary, then I can add tests/i18n stuff to make this a proper PR.

Cheers,
Alex

Summary:

Related Issue(s):

Reviewer's Checklist:

  • The header of all files contain a reference to the repository license
  • The overall test coverage is increased or remains the same as before
  • All tests are passing
  • All flake8 checks are passing and the style guide is followed
  • Documentation (as docstrings) is complete and understandable
  • Only files that have been actively changed are committed

@vkbo
Copy link
Owner

vkbo commented May 5, 2024

I'll have a look, but it looks mostly fine. There is no particular reason why this wasn't added to HTML. The only point is that the first paragraphs after a break should not be indented, which is how the ODT version works.

@vkbo
Copy link
Owner

vkbo commented May 5, 2024

Also, for consistency, the indentation for HTML should either be a fixed as it is for ODT, or both should be settings. I think the easiest solution is to make the indentation a general text option, and use the same indentation for both formats.

@zxygentoo
Copy link
Author

zxygentoo commented May 5, 2024

Making it a general option makes sense to me. That's why I asked the second question. And it's pretty clear to me it should be adjustable as people will need different values (eg. Chinese requires 2em, Japanese seems often use 1em). Without making things too complicated, I think something like 0.0 - 8.0 with 0.5 step should be pretty reasonable.

So, put it under Format? Like adding a Text Indentation Section?

The only point is that the first paragraphs after a break should not be indented, which is how the ODT version works.

Did you mean after a head? I'm no expert on typesetting English, but found this Section on MDN:

As the The Chicago Manual of Style puts it, "the first line of text following a subhead may begin flush left or be indented by the usual paragraph indention."

For CJK, this is often required, at least for Chinese the first paragraph is always indented. So if you want to keep the current odt behavior, maybe adding a seperate toggle only for this? Seems clearer to me than something like linking it to current lang setting hence more implicit/magical.

@vkbo
Copy link
Owner

vkbo commented May 6, 2024

Yes, under Format > Text Options.

Sure, we can have a toggle for turning it on or off for the first paragraph after a break (any break, not just a heading). If you add the settings, I can push the changes to the ODT code, and maybe you can mirror it in HTML. The ODT code is not easy to follow. I recently added footnotes to it, and it took me ages to figure out how to do it, and I wrote that code! 😄

@vkbo
Copy link
Owner

vkbo commented May 6, 2024

Also, this PR is made from your copy of the main branch directly. As described in the contributor guide, do not do that. Create a branch at your end first. Doing this from the main branch causes all sorts of problems as the main branch is what you update against from my repo when I make changes.

You should create a branch from the changes you've made and make a new PR from that, and roll back your main branch to the state of the main repo's main branch.

@zxygentoo
Copy link
Author

Cool, I will take a look at the ODT code.

Create a branch at your end first...

Sure, I will consult the contributor guide. Let me figure out the code part first 😂 , then see how to proceed. Here is just a couple lines of PoC code, I can always close this and fork again if necessary.

any break, not just a heading

Just to clarify, for breaks you mean hard/soft scene breaks as in here, right? Or you may point to the relevant code and I could probably figure it out from the ODT implementation.

@vkbo
Copy link
Owner

vkbo commented May 6, 2024

Cool, I will take a look at the ODT code.

Create a branch at your end first...

Sure, I will consult the contributor guide. Let me figure out the code part first 😂 , then see how to proceed. Here is just a couple lines of PoC code, I can always close this and fork again if necessary.

I'm just giving you a warning as I plan to make the 2.4.1 release today. When I merge that back into main after, there will be merge conflicts. Which means you will have the same merge conflicts on your main if you have made changes to it, which you now have. That's why you never work from main, even on a fork. If you keep the main clean, you can just pull in my fixes and not have to deal with conflicts in most cases. 😃

any break, not just a heading

Just to clarify, for breaks you mean hard/soft scene breaks as in here, right? Or you may point to the relevant code and I could probably figure it out from the ODT implementation.

I mean any break that isn't continuous paragraphs. There are many ways to format the output. If you look in the Tokenizer class, there are a bunch of "block styles" other than T_TEXT which corresponds to a paragraph or a paragraph fragment:

T_EMPTY = 1 # Empty line (new paragraph)
T_SYNOPSIS = 2 # Synopsis comment
T_SHORT = 3 # Short description comment
T_COMMENT = 4 # Comment line
T_KEYWORD = 5 # Command line
T_TITLE = 6 # Title
T_HEAD1 = 7 # Heading 1
T_HEAD2 = 8 # Heading 2
T_HEAD3 = 9 # Heading 3
T_HEAD4 = 10 # Heading 4
T_TEXT = 11 # Text line
T_SEP = 12 # Scene separator
T_SKIP = 13 # Paragraph break

In the ODT writer, I handle it with the pIndent flag:

if tType not in (self.T_EMPTY, self.T_TEXT):
pIndent = False
# Process Text Types
if tType == self.T_EMPTY:
if len(pText) > 1 and pStyle is not None:
if self._doJustify:
pStyle.setTextAlign("left")
if len(pText) > 0 and pStyle is not None:
tTxt = ""
tFmt: T_Formats = []
for nText, nFmt in zip(pText, pFmt):
tLen = len(tTxt)
tTxt += f"{nText}\n"
tFmt.extend((p+tLen, fmt, key) for p, fmt, key in nFmt)
# Don't indent a paragraph if it has alignment set
tIndent = self._firstIndent and pIndent and pStyle.isUnaligned()
self._addTextPar(
xText, "First_20_line_20_indent" if tIndent else "Text_20_body",
pStyle, tTxt.rstrip(), tFmt=tFmt
)
pIndent = True
pFmt = []
pText = []
pStyle = None

Something similar would be needed for HTML, but we also need to add a switch that forces it to always be True if we also want to indent first paragraphs.

But we can work out the details when you make a branch and a new PR. We also need to move the setFirstLineIndent method from the ToODT class into the Tokenizer class so that it can be used by both ToODT and ToHTML. But if you handle the HTML side of it, I can tie it all back together with the ODT format.

In any case, thanks for your contribution. I will create a task to associate with this, and we can take the implementation discussion there.

@vkbo
Copy link
Owner

vkbo commented May 6, 2024

I also see now that comments probably shouldn't trigger the non-indentation flag, so there are some more fixes to be made. I'll add a ticket for that.

@zxygentoo
Copy link
Author

zxygentoo commented May 6, 2024

Sounds all good to me. I will close this and make a clean fork to start a new feature branch for future PR.

Took a quick look at the ODT code and it's indeed pretty dense 😂 Specially for someone with no prior knowledge of OpenDocument. I like the idea starting from the HTML side first (after some basic understanding of the odt side). Let's do this then see how to tie things together.

@zxygentoo zxygentoo closed this May 6, 2024
@vkbo
Copy link
Owner

vkbo commented May 6, 2024

Just created release 2.4.1, so there should be less of a chance of merge conflicts now. There were only two actually, and they were pretty trivial.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants