
Add first line indent to HTML #1858

Closed
vkbo opened this issue May 10, 2024 · 19 comments · Fixed by #1898
Labels
build tool (Component: Exports or the build tool), enhancement (Request: New feature or improvement), next release (Note: Features planned for next release)

Comments

@vkbo
Owner

vkbo commented May 10, 2024

The first line indent feature should be extended to include HTML, not just ODT. It also needs more settings to be customisable enough to support the conventions of languages other than English. See PoC PR #1851.

@vkbo vkbo added this to the Release 2.5 Beta 1 milestone May 10, 2024
@vkbo vkbo assigned vkbo and unassigned vkbo May 10, 2024
@vkbo
Owner Author

vkbo commented May 10, 2024

I couldn't find a way to assign you to this issue, @zxygentoo, so I'll tag you instead.

I've made some improvements to the first line indent feature for ODT, and moved the settings to the Format tab. See #1857. It should make it straightforward for you to add it to HTML as well.

@vkbo vkbo added enhancement Request: New feature or improvement build tool Component: Exports or the build tool labels May 10, 2024
@zxygentoo

Cool, I will take a look.

@vkbo
Owner Author

vkbo commented May 10, 2024

The gist of it is that there are now three new private/protected variables in the Tokenizer class that are also available in the ToHtml class.

self._firstIndent  = False    # Enable first line indent
self._firstWidth   = 1.40     # First line indent in units of em
self._indentFirst  = False    # Indent first paragraph

I hope they are self-explanatory. The default value for the indent width is the value that LibreOffice defaults to. The _indentFirst flag indicates whether the first paragraph after a break should be indented or not, like we discussed in #1851.
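To illustrate how these three flags could combine when writing HTML, here is a minimal Python sketch. The `IndentSettings` class, the `render_paragraphs` function and the `("heading", ...)`/`("text", ...)` block tuples are hypothetical and not novelWriter's actual API; emitting an inline `text-indent` style is also an assumption (a real exporter might use a CSS class instead).

```python
from dataclasses import dataclass
from html import escape


@dataclass
class IndentSettings:
    """Hypothetical stand-in for the three Tokenizer flags."""
    first_indent: bool = False  # like _firstIndent: enable first line indent
    first_width: float = 1.40   # like _firstWidth: indent width in em
    indent_first: bool = False  # like _indentFirst: indent first paragraph after a break


def render_paragraphs(blocks, settings):
    """Render ("heading", text) and ("text", text) tuples as HTML lines."""
    out = []
    after_break = True  # the very first paragraph also counts as "after a break"
    for kind, text in blocks:
        if kind == "heading":
            out.append(f"<h2>{escape(text)}</h2>")
            after_break = True
            continue
        # Indent unless this paragraph directly follows a break and
        # indenting such paragraphs has been switched off.
        indent = settings.first_indent and (settings.indent_first or not after_break)
        style = f' style="text-indent: {settings.first_width:.2f}em;"' if indent else ""
        out.append(f"<p{style}>{escape(text)}</p>")
        after_break = False
    return "\n".join(out)


blocks = [("heading", "Chapter One"), ("text", "First."), ("text", "Second.")]
print(render_paragraphs(blocks, IndentSettings(first_indent=True)))
# <h2>Chapter One</h2>
# <p>First.</p>
# <p style="text-indent: 1.40em;">Second.</p>
```

With `indent_first=True`, the paragraph right after the heading would be indented as well, which is the per-language choice discussed in #1851.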

@zxygentoo

zxygentoo commented May 11, 2024

I've been running more tests while reading some of the code, and I have two questions:

  1. What's the thinking behind the M_PREVIEW/M_EXPORT separation? Is it some limitation of the Qt preview? I noticed the preview does lack some of the styling. Also, M_EBOOK doesn't seem to be used yet?
  2. A related one: the current exported HTML doesn't play well with Calibre's ToC generation. I think I can fix this if you like. Once I understand the preview/export separation, it seems to come down to adding some classes to the h1 and h2 tags (e.g. class="chapter").

In my tests, Calibre seems to produce better-formatted ePubs. Pandoc, by design, often removes HTML element classes and therefore loses the styling.

@vkbo
Owner Author

vkbo commented May 11, 2024

M_PREVIEW is used for the document viewer next to the editor, which only supports a subset of HTML 4 (see the Qt documentation on the supported HTML subset). M_EXPORT produces standard HTML5.

The M_EBOOK was indeed intended to produce Calibre-compatible ebook HTML. I used to use their tools for conversion myself. I just never got around to adding that feature. The flag is still there.

@zxygentoo

zxygentoo commented May 11, 2024

I'm no Calibre expert, but it seems feasible to make the M_EXPORT version Calibre-compatible and call it done. I'll read the Qt docs a bit to see if anything can be done there; I really want the preview to look just like the export in styling (at least for the docbuild interface).

@zxygentoo

The code seems to use different tags for preview and export. Are there reasons beyond a better-looking preview?

If not, maybe we can just use one set of tags for both:

  • title -> h1
  • partition -> h1
  • chapter -> h1
  • scene -> h2
  • section -> h3

and add these as HTML classes, e.g. <h1 class="chapter" ...>, then use CSS to get the same look in the preview.
We can use different names; that's not the main point here.
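A minimal Python sketch of this single-tag-set proposal: each heading kind maps to one tag plus a semantic class, and a small stylesheet restyles headings by class. `HEADING_MAP`, `render_heading` and the CSS rules (including the font sizes) are hypothetical illustrations, not existing novelWriter code, and the class names are placeholders as noted above.

```python
from html import escape

# Proposed mapping: heading kind -> (HTML tag, CSS class).
HEADING_MAP = {
    "title": ("h1", "title"),
    "partition": ("h1", "partition"),
    "chapter": ("h1", "chapter"),
    "scene": ("h2", "scene"),
    "section": ("h3", "section"),
}

# Hypothetical stylesheet: the same tag gets a different look per class.
STYLESHEET = """
h1.title     { font-size: 2.4em; text-align: center; }
h1.partition { font-size: 2.0em; text-align: center; }
h1.chapter   { font-size: 1.8em; }
h2.scene     { font-size: 1.4em; }
h3.section   { font-size: 1.2em; }
"""


def render_heading(kind, text):
    """Emit one heading with its semantic class attached."""
    tag, css_class = HEADING_MAP[kind]
    return f'<{tag} class="{css_class}">{escape(text)}</{tag}>'


print(render_heading("chapter", "Chapter 1"))
# <h1 class="chapter">Chapter 1</h1>
```

The structure then lives in the class names rather than in the heading level, which is what would let an ebook converter pick out chapters.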

This would:

  1. Be simpler, and easier to read?
  2. Let Calibre detect the structure, and hence generate a better ToC, etc.

Wouldn't it be sensible and beneficial for NW to export HTML with semantically meaningful classes? Calibre's default chapter detection only matches class="chapter" and otherwise relies on the heading text. That's not very extensible, but it's a different problem. And even the chapters alone will make a quite useful ToC most of the time.
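To show why the class matters, here is a rough Python approximation of Calibre's default chapter-detection rule. The XPath quoted in the comment is, to my understanding, the documented default of Calibre's "Detect chapters at" option; check the Calibre conversion docs before relying on it. `CHAPTER_TEXT` and `looks_like_chapter` are hypothetical helpers written for this illustration only.

```python
import re

# Approximation of Calibre's default "Detect chapters at" XPath, as I
# understand it:
#   //*[((name()='h1' or name()='h2') and
#        re:test(., '\s*((chapter|book|section|part)\s+)|((prolog|prologue|epilogue)(\s+|$))', 'i'))
#       or @class = 'chapter']
CHAPTER_TEXT = re.compile(
    r"\s*((chapter|book|section|part)\s+)|((prolog|prologue|epilogue)(\s+|$))",
    re.IGNORECASE,
)


def looks_like_chapter(tag, css_class, text):
    """Approximate Calibre's default chapter test for a single element."""
    if css_class == "chapter":  # @class = 'chapter' matches any element
        return True
    # re:test behaves like a search anywhere in the text, not a full match.
    return tag in ("h1", "h2") and CHAPTER_TEXT.search(text) is not None


print(looks_like_chapter("h1", "", "Chapter 1"))         # True
print(looks_like_chapter("h1", "", "The Storm"))         # False
print(looks_like_chapter("h1", "chapter", "The Storm"))  # True
```

A chapter titled "The Storm" is only picked up if it carries class="chapter", which is the case the proposed classes would cover.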

@vkbo
Owner Author

vkbo commented May 13, 2024

There is a very good reason for doing it this way, and it has in part to do with HTML imports and also with ebook compatibility. The preview needs to follow the strict levels in order to generate the correct heading sizes since they cannot be overridden in the view.

@vkbo
Owner Author

vkbo commented May 13, 2024

As for Calibre support, I do think tuning the M_EBOOK flag is the better way to go.

In either case, let's not change these things now. This needs a separate feature and discussion. I also need to refresh my thinking on why it was made this way in the first place. Adding more classes is anyway a good thing to do, and I've been gradually doing so.

@zxygentoo

zxygentoo commented May 13, 2024

> let's not change these things now

Sure, let's limit the scope of this task. These are basically just questions from reading the code anyway.

> correct heading sizes since they cannot be overridden in the view

I thought of this after posting my last comment.

If we look a bit further ahead: from a design perspective, the path of least surprise for users seems to be keeping the preview as close as possible to the export (and striving for an export that Calibre can convert into an eBook reasonably close to the preview).

And it seems the current document viewer can't do that anyway (e.g. text-indent). So, in the long run, what are your thoughts on this? Would QWebEngineView be of any use? (My Qt knowledge is basically nil, so I'm just guessing wildly at this point.)

@zxygentoo

To make my point clearer: the docbuild preview seems to inherently serve a different purpose from the document viewer (the one beside the editor, which works perfectly fine for what it is), and maybe it should be treated differently.

@vkbo
Owner Author

vkbo commented May 13, 2024

The preview is only intended to preview the text, like header formatting, what is and isn't included, etc. It was never intended to preview the exact result of the saved document. It simply can't. I've pretty much added the formatting that can be added.

I don't want to add a full web view either. That is a very bulky thing to include. I think the current solution works well as it is. It won't be any better in Qt6 either as far as I can tell, so it is what it is.

@vkbo
Owner Author

vkbo commented May 15, 2024

That said, it is possible to instead generate the preview line by line with direct paragraph and character formatting, without going via HTML. That gives full control over all formatting. It is a lot more work, though, and requires a completely new formatting class. Still, it is an option, and could also be used to improve the document viewer. However, that's a separate task.

What would be needed for this is a ToQTextDocument builder class that outputs a QTextDocument. The document can then be added directly to the viewer or build preview panels. It may not be that much slower to build either.

This is how I'm generating the document in my other project Collett, which is written in C++, and therefore much faster than novelWriter.

@vkbo
Owner Author

vkbo commented May 21, 2024

I've made some major improvements to the Tokenizer in #1885. It now handles all the combining of individual lines belonging to the same paragraph, so the format classes for Markdown, HTML and ODT no longer need the tricky back-and-forth logic with T_TEXT and T_EMPTY. I've made some minor changes to the ToHtml class to strip away that logic. Just so you know.

I will also move the entire preview logic out of the ToHtml class and into a separate class for generating the preview. That's why I needed to clean up the Tokenizer class first. So if you're working on this, you can just ignore the preview bits for now, as I will delete all of that logic.
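The idea of doing paragraph merging once in the Tokenizer, instead of in every format class, can be sketched in a few lines of Python. T_TEXT and T_EMPTY come from the comment above; `merge_paragraphs`, the `T_HEAD1`/`T_PARA` token names and the `(kind, value)` tuple layout are hypothetical, and joining lines with `<br>` is only an assumption about how a format might render line breaks inside a paragraph.

```python
def merge_paragraphs(tokens):
    """Group consecutive T_TEXT lines into paragraphs.

    T_EMPTY closes the current paragraph; any other token also closes it
    and is passed through unchanged. Each paragraph keeps its lines as a
    list, so each output format only decides how to render a line break.
    """
    merged = []
    buffer = []

    def flush():
        if buffer:
            merged.append(("T_PARA", list(buffer)))
            buffer.clear()

    for kind, value in tokens:
        if kind == "T_TEXT":
            buffer.append(value)
        elif kind == "T_EMPTY":
            flush()
        else:
            flush()
            merged.append((kind, value))
    flush()
    return merged


tokens = [
    ("T_HEAD1", "Chapter One"),
    ("T_TEXT", "First line,"),
    ("T_TEXT", "second line."),
    ("T_EMPTY", ""),
    ("T_TEXT", "Next paragraph."),
]
for kind, value in merge_paragraphs(tokens):
    if kind == "T_PARA":
        print("<p>" + "<br>".join(value) + "</p>")
    else:
        print(kind, value)
```

A format class like ToHtml then only sees whole paragraphs and never has to track T_TEXT/T_EMPTY transitions itself.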

@vkbo
Owner Author

vkbo commented May 25, 2024

I've now merged #1892, so there is no more HTML4 preview code in the ToHtml class. All it does now is generate HTML for export. The preview is now handled by a new ToQTextDocument class that generates the Qt document used by the display widgets directly. This can be formatted in every way Qt supports and is not limited to the subset mentioned earlier.

That means first line indent can now be added here for display in the Manuscript build preview. Not sure if you're moving ahead on the HTML implementation, but if you are, I can handle the preview part of it.

@zxygentoo

zxygentoo commented May 27, 2024

I've been caught up with work and moving for a couple of weeks. Now that things have settled a bit, I'll get back to this. I haven't read the code carefully yet, but from the comments it seems like a great direction.

@vkbo
Owner Author

vkbo commented May 27, 2024

Yeah, so the problem is that after I rewrote the Tokenizer class to handle all the paragraph processing, and also added text indent there, the code needed to add it to HTML was exactly 2 lines. I just piggybacked it onto #1898.

Sorry about that! Feel free to have a look if there is anything more that is needed for this feature though.

@vkbo vkbo added the next release Note: Features planned for next release label May 29, 2024
@zxygentoo

😂 I just want the feature, and this seems way better than if I added it to the old code. Just tested it, works like a charm~

@vkbo
Owner Author

vkbo commented Jun 3, 2024

Great! I was just worried that you had spent time making it work, only for me to reshuffle it all and break everything for you. 😄


2 participants