This repository has been archived by the owner on Jan 2, 2023. It is now read-only.

TOC support broken after upgrade to latest QT #1509

Closed
Jimbobnz opened this issue Feb 10, 2014 · 62 comments

Comments

@Jimbobnz

Version: wkhtmltopdf 0.12.0
OS: Centos 6.4
No TOC is being created and no error generated.

bin/wkhtmltopdf toc http://www.w3schools.com/html/html_headings.asp w3schools.com.pdf

Even when an XSL file is specified, the result is the same.

bin/wkhtmltopdf toc --xsl-style-sheet toc.xsl http://www.w3schools.com/html/html_headings.asp w3schools.com.pdf

@ashkulz
Member

ashkulz commented Feb 10, 2014

Did this work with 0.11rc2?

@Jimbobnz
Author

I'm not sure about 0.11rc2, but it does work on wkhtmltopdf-0.11.0_rc1.

@ashkulz
Member

ashkulz commented Feb 10, 2014

I just now tried this on Windows, and it looks like it is working ... can you confirm it with a screenshot? Which PDF viewer are you using?

@Jimbobnz
Author

I'm using Adobe Reader XI version 11.0.06.
Something must be at least partly working, as it does show Bookmarks in Adobe Reader, which means the HTML document is being parsed for heading tags.

Screenshot of the command on Linux:
[image: screenshot015]

The results (no TOC):
[images: progress com pdf-0, progress com pdf-1, progress com pdf-2]

Screenshot of Adobe Reader:
[image: screenshot016]

@Jimbobnz
Author

I've just tested the TOC option on Windows 32-bit and 64-bit, and neither of them created a 'table of contents' page. Has something changed in how this option is enabled?

@Jimbobnz
Author

I've just done some more testing on wkhtmltopdf version 0.11.0 rc1, CentOS 6.5 64-bit.

xvfb-run --server-args="-screen 0 1028x1024x24" wkhtmltopdf --use-xserver toc --xsl-style-sheet defaulttoc.xsl google.com google.com.pdf

A table of contents was produced.
[image: screenshot017]

@ashkulz
Member

ashkulz commented Feb 11, 2014

Sorry, I never used the TOC option -- I thought you were talking of the outlines shown in Adobe Reader. I'll look into it a bit further.

@ashkulz ashkulz added Verified and removed NeedInfo labels Feb 11, 2014
@ashkulz
Member

ashkulz commented Feb 12, 2014

Looks like the XSL transformation is not happening. I'm trying to find out if it is due to a change in the underlying QT, or due to some build configuration changes in wkhtmltopdf.

@ashkulz ashkulz added this to the 0.12.x milestone Feb 12, 2014
@ashkulz
Member

ashkulz commented Feb 12, 2014

I had removed the dependency on XmlPatterns in 303957c -- I was under the impression that reverting this would fix this issue. However, the root cause turned out to be something else.

This was due to QTBUG-10309 in QtXmlPatterns which caused breakage of Google Calendar and hence XSLT support was disabled in WebKit -- apparently, there are other issues in the QT XSLT support. This was imported just after QT 4.8.0-beta in wkhtmltopdf/qt@5f4e810 and hence affected wkhtmltopdf due to the upgrade to QT 4.8.5 (the earlier release 0.11rc2 was built with QT 4.7).

There has been no progress on the bug at the QT end, with even a merge request on Gitorious being ignored. I'm not really sure what the way forward is. I am not comfortable with enabling XSLT support, as it would possibly break a lot of things (but solve this use case). Using libxslt (and libxml) would be an option, but would require additional dependencies and take a lot of effort. Applying the patch above may fix the issue, but I don't know what other issues will pop up.

So as it stands now, TOC support is broken and will remain broken for the near future (or unless we change the approach). How many people use this feature? Please chime in here with comments, as I will have to balance working for this issue against other features/issues.

@ashkulz
Member

ashkulz commented Feb 12, 2014

The best option going forward seems to be dropping the XSL support completely and rolling our own by hand. We have a few options in front of us, ordered from simplest to most complex:

  1. Hardcoding the HTML and CSS generated
  2. Hardcoding the HTML, and allowing the CSS to be overridden from an external file
  3. Hardcoding the nested LI generation, but allowing the outer HTML/CSS to be configurable by a template (see the default TOC XSL -- it means <xsl:template match="outline:item"> part will be hardcoded).
  4. Support XSLT by using libxslt or some other library

Option 3 will require us to use an external template library like HTML Template C++, which is also license-compatible with wkhtmltopdf (it is LGPLv2.1). We have to hardcode the generation of nested LIs, as there is no support for nested recursive templates in most template engines.

Option 4 is ruled out due to the complexity involved; I am more leaning towards 2 or 3. Either way, support will only be added when I have the time (or a PR with the necessary changes is welcome).
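To make the "hardcoded nested LI generation" of option 3 concrete, here is a minimal sketch in Python (wkhtmltopdf itself is C++, and this is not its implementation). The `OutlineItem` type, its fields, the `{{toc_items}}` placeholder and both function names are hypothetical, chosen only for illustration. The idea is that only the recursive list is fixed in code, while the outer HTML/CSS comes from a user-supplied template.

```python
from dataclasses import dataclass, field
from html import escape
from typing import List


@dataclass
class OutlineItem:
    # Hypothetical stand-in for one outline entry (title, page, link target).
    title: str
    page: int
    link: str = ""
    children: List["OutlineItem"] = field(default_factory=list)


def render_items(items: List[OutlineItem]) -> str:
    """Recursively render outline entries as nested <ul>/<li> markup."""
    if not items:
        return ""
    parts = ["<ul>"]
    for item in items:
        parts.append(
            '<li><a href="%s">%s</a> <span>%d</span>%s</li>'
            % (
                escape(item.link, quote=True),
                escape(item.title),
                item.page,
                render_items(item.children),  # recursion handles any depth
            )
        )
    parts.append("</ul>")
    return "".join(parts)


def render_toc(template: str, items: List[OutlineItem]) -> str:
    # The outer page is user-configurable; only the placeholder is replaced.
    return template.replace("{{toc_items}}", render_items(items))
```

A template such as `<h1>Contents</h1>{{toc_items}}` would then be enough to change the caption or styling without touching the recursion, which is the part most template engines cannot express.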

@antialize
Contributor

We used to have hardcoded HTML+CSS, but it proved hard to support everything
that everyone wants, which is why XSLT was chosen in the end.

@ashkulz
Member

ashkulz commented Feb 12, 2014

But does anyone use it? Other than @Jimbobnz, no one has chimed in ... right now, TOC does not work at all -- one would have thought there would be lots of "me too!" reports (which would be welcome, for a change).

@Jimbobnz
Author

I can't speak for everyone but I use the TOC feature for our reporting.

Looking back at the original Google issue tracker (see the link below), I can find quite a few users who do use this feature. Having the TOC, cover page, headers & footers is what makes wkhtmltopdf a powerful tool; this is what makes it stand out from its friendly nemesis, phantomjs.

https://code.google.com/p/wkhtmltopdf/issues/list?can=1&q=toc&colspec=ID+Type+Status+Priority+Milestone+Owner+Summary&cells=tiles

As the wkhtmltopdf project has been on hiatus for over 2 years (since Oct. 2011), it's going to take time for developers/users to come around to the new version 0.12.x. I'd imagine most people are sticking with version 0.11.rc1 (the "if it's not broken, why fix it" attitude). So over time you might find more and more developers asking for this bug/issue to be fixed.

An alternative solution (if at all possible) is to do something similar to the header & footer URL page: post a URL-encoded JSON/XML string containing the TOC data to a user-defined custom HTML page, which could then parse the TOC content using JavaScript to generate a custom TOC page. The generated TOC page could then be inserted back into the PDF output. Hope that all makes sense.

Example command line would look like something like this:

wkhtmltopdf cover mycoverpage.html toc --toc-url mycustomtoc.html somewebpage.html cooloutput.pdf

It's just a rough concept, and I can fully understand if it is not feasible.
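As a rough illustration of the data hand-off in this proposal, the sketch below encodes a list of TOC entries as a URL-encoded JSON query parameter and decodes it again. Everything here is an assumption: `--toc-url` is only the flag proposed in this comment (not an existing wkhtmltopdf option), and the parameter name `toc`, the entry fields and the function names are made up for the example. The decoding step stands in for what the custom page's JavaScript would do.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def build_toc_url(page_url: str, entries: list) -> str:
    """Append the TOC entries to page_url as a URL-encoded JSON parameter."""
    query = urlencode({"toc": json.dumps(entries, ensure_ascii=False)})
    separator = "&" if "?" in page_url else "?"
    return page_url + separator + query


def read_toc_from_url(url: str) -> list:
    """Inverse of build_toc_url; the custom page would do this in JavaScript."""
    params = parse_qs(urlsplit(url).query)
    return json.loads(params["toc"][0])
```

One practical caveat with this design is URL length: a large document outline may be too long to pass in a query string, so a real implementation might need another channel for the data.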

@mn4367
Contributor

mn4367 commented Feb 12, 2014

I'm using the TOC feature a lot.

The biggest problem with disabling the TOC feature is IMO the lack of page numbers. While it is no problem (in my use case) to create a separate HTML document from the source and prepend it via page to the front of the final result, how would I insert the correct page numbers in the self-generated TOC? Using --dump-outline beforehand to get the page numbers is possible, but it would mean rendering the source twice: first to get the page numbers (with --dump-outline), and then again to put the self-generated TOC into the final output using the results from the first run. Not to mention linking, that is, being able to click on an entry in the TOC to get to the page and vice versa.
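The first half of that two-pass approach could look roughly like the Python sketch below, which reads the page numbers out of an outline file written by a first run with `--dump-outline`. The XML shape assumed here (nested `item` elements carrying `title` and `page` attributes) should be checked against a real dump; the function names are hypothetical, and the namespace is deliberately ignored since it differs between versions.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    # Strip an "{namespace}" prefix so the code does not depend on the namespace URI.
    return tag.rsplit("}", 1)[-1]


def collect_entries(element, depth=0):
    """Yield (depth, title, page) for every titled item element, in document order."""
    for child in element:
        if local_name(child.tag) != "item":
            continue
        title = child.get("title", "")
        if title:  # skip untitled wrapper items
            yield depth, title, int(child.get("page", "0"))
        yield from collect_entries(child, depth + 1)


def read_outline(xml_text: str):
    """Parse the dumped outline XML into a flat list of (depth, title, page)."""
    return list(collect_entries(ET.fromstring(xml_text)))
```

Note that page numbers from the first pass do not account for the pages a prepended TOC itself occupies, so the second pass would presumably have to add that offset.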

No matter how it is done, removing TOC XSLT without introducing something equivalent would mean that I have to stick with an older version.

@ashkulz
Member

ashkulz commented Feb 13, 2014

Good to hear it is being used :) @Jimbobnz, @mn4367: how does option 3 sound? Are you using the default TOC XSL or something very much customized? I'm in favor of it as it would be very easy to implement and not cause support issues if/when we upgrade QT and/or Webkit.

@Jimbobnz
Author

I am currently using the default XSL as a base template with just a few minor formatting styles in place. I must confess I don't fully understand option 3, but as long as it works, great.

@ashkulz
Member

ashkulz commented Feb 13, 2014

Can you post the changed XSL so that I can see if it can be easily done in option 3?

@Jimbobnz
Author

Doh. There is no easy way to submit my custom version of the xsl file via github.

@ashkulz
Member

ashkulz commented Feb 13, 2014

Post a gist?

@Jimbobnz
Author

Still learning how to use github, hopefully you can see this.

https://gist.github.com/Jimbobnz/fb10e0b0197088214e0e

@mn4367
Contributor

mn4367 commented Feb 13, 2014

I don't know if option 3 could be a replacement. For example, I'm inserting a caption before the list of TOC entries and I'm parsing the list entries in XSL to change the enumeration style from decimal to roman for certain entries (see https://gist.github.com/mn4367/8972015). So the old decision mentioned by @antialize to offer XSLT was the best way to serve almost all needs.

For option 3, if you allow JavaScript in the TOC HTML page, it could be possible to modify the entries or do other things to customize the TOC. @Jimbobnz's suggestion to post the TOC structure in a simple format to the HTML page sounds good to me, since it allows users to do with it whatever they want. In this case I think that a standard example or default page would be useful for those users who aren't comfortable with hacking JavaScript just to get a default TOC.

Still, retaining page numbers is crucial and forward and backward links like in older versions are very nice. And to push it further, I'm also using headers and footers in the TOC for example to display page numbers =8).

@ashkulz
Member

ashkulz commented Feb 13, 2014

@mn4367: read the notes for the above commit and try to see if it works for you. The XSLT support in XmlPatterns is not complete, and the advanced stuff you are using may/may not work.

@ashkulz ashkulz added the Fixed label Feb 13, 2014
@npinchot
Contributor

Mac OS X binaries for the development snapshot are available.

@mn4367
Contributor

mn4367 commented Feb 16, 2014

Sorry for being very late on this topic. For OS X I can report that TOC generation doesn't work at all (issue #1534). On Windows 7 64-bit it's a little bit different:

  • In general a TOC is generated, meaning the XSL part of my stylesheet is accepted by wkhtmltopdf. The only thing that had to be adapted is the namespace.
  • My stylesheet is XSLT version 1 and I get the warning "Warning in file:///....wkhtmltopdf.exe, at line 11, column 1: Running an XSL-T 1.0 stylesheet with a 2.0 processor." Since XSLT 2 is AFAIK backwards compatible with XSLT 1, in my opinion this warning isn't necessary.
  • In the resulting PDF the layout is completely scrambled. Apparently the CSS in the stylesheet is a problem for rc12. But I'm quite sure that there is nothing wrong with it, since the previous versions rendered it correctly. I also did a check with Firefox and Safari (using my stylesheet and the outline.xml to create an HTML page) and both rendered it OK.
  • Another problem is that, for example, German umlaut characters aren't handled properly. Just use the built-in default stylesheet, change the tag to Table of Contents äöüÄÖÜß, and you'll see the effect.
  • The same is true for umlaut characters in the document itself. The document renders correctly, but the corresponding entry in the TOC is wrong.

So I think it's fair to reopen this issue.

@mikus

mikus commented Feb 17, 2014

I'm using the TOC feature as well. It's working again on Ubuntu 64-bit, even with a more complicated XSL than the default one. I am very grateful for this patch.

@ashkulz
Member

ashkulz commented Feb 18, 2014

@mn4367: you have to make it XSLT 2.0, as otherwise QtXmlPatterns does not work properly. If that doesn't work, please open a new issue with those details (you seem to have created a lot of issues as well ...)

@mn4367
Contributor

mn4367 commented Feb 18, 2014

OK, I checked it again, with <xsl:stylesheet version="1.0" ... and <xsl:stylesheet version="2.0" ... with the default built-in stylesheet and my stylesheet. Same results. I just thought that the warning isn't necessary, since all XSLT features used in the default built-in stylesheet and my stylesheet are covered by XSLT 1.0 and the stylesheet version is just an indicator to the processor of what to expect. I can use XSLT 2.0, no problem. I'll create a new issue.

PS:
For the number of issues, is that a problem?

@jwernerny

[Someone asked for me too reports?]

We use TOC and it is broken in 0.12. It works as expected in 0.11.0-rc2.

@ashkulz
Member

ashkulz commented Mar 6, 2014

@jwernerny: I don't think anyone really likes them, other than the reporter ... did the 0.12.1 build not fix it for you?

@jwernerny

I just downloaded the 0.12.1-dev build (wkhtmltopdf 0.12.1-61b740ee72b5830ad1d07a9bea5246622ed4defb). It appears the issue is fixed in it. Thanks.

@wlievens

Another "me too" here: I use the TOC feature, and the 0.12.1-dev build seems to generate it again.

However, it appears the --toc-depth option is gone. I was using that as well (I don't want my h3's in the TOC). Is there a fix for that?

@mn4367
Contributor

mn4367 commented Mar 12, 2014

In case --toc-depth is no longer available, as a workaround you could always use your own XSL stylesheet to customize the result.

@wlievens

Okay I did exactly that, turned out to be simple even for an xslt noob like me.

@ashkulz
Member

ashkulz commented Mar 13, 2014

I think that the option is now called --outline-depth, there should not be a need to hack the XSL stylesheet.

@jarrett

jarrett commented Apr 4, 2014

Would it be possible to make an OS X dev build? Or is there an older OS X build available that does have TOC support?

@ashkulz
Member

ashkulz commented Apr 5, 2014

The one linked from the website should have this fix, although not all the later fixes. Unfortunately, I don't have access to an OS X environment so I can't really make builds for that platform.

@jarrett

jarrett commented Apr 5, 2014

Sorry, which website do you mean? I tried the builds I could find, and none of them had TOC support.

Is there anyone on the team who's doing OS X builds? Or if not, would you like me to give it a try?

@ashkulz
Member

ashkulz commented Apr 5, 2014

There is only one website: http://wkhtmltopdf.org. Try the development version.

I think that the build_osx.sh script could possibly work but ymmv. @npinchot was going to port it to the common build script but he doesn't seem to have the time right now.

@jarrett

jarrett commented Apr 5, 2014

Unfortunately, the one at http://wkhtmltopdf.org does not have TOC support as of now. I'll try build_osx.sh when I get a chance. Maybe if I'm feeling ambitious, I'll have a look at the common build script and see if I can contribute.

@ashkulz
Member

ashkulz commented Apr 6, 2014

That's a bit unlikely as the snapshot posted above by @npinchot on 14-Feb had the changes.

@jarrett

jarrett commented Apr 6, 2014

I think I may have found the source of my trouble. It appears that the command line arguments changed at some point. The top Google hit for "wkhtmltopdf manual" is this:

http://madalgo.au.dk/~jakobt/wkhtmltoxdoc/wkhtmltopdf-0.9.9-doc.html

I was invoking the program per that manual.

But just now, I was able to find a newer version of the manual, which has different arguments for the TOC:

http://madalgo.au.dk/~jakobt/wkhtmltoxdoc/wkhtmltopdf_0.10.0_rc2-doc.html

I don't know if you have any control over that website, but if so, perhaps a big, bold deprecation notice should be added to the 0.9.9 manual. As it was the top Google hit, it didn't occur to me that a) I was reading an old version, and b) the command line flags had changed.

@ashkulz
Member

ashkulz commented Apr 6, 2014

The authoritative reference is always the official website. I think that @antialize can possibly remove/redirect those links.

@ensensis

I raise my hand for this "request" too. Using wkhtmltopdf 0.12.0 on Fedora 20.

@ashkulz
Member

ashkulz commented May 26, 2014

@ensensis: which request are you referring to? This issue has been fixed in the development build.

@jarrett

jarrett commented May 26, 2014

Perhaps @ensensis is referring to my suggestion: That a deprecation notice be placed on the old manual. The old one, which is no longer correct, is still the top Google hit for "wkhtmltopdf manual."

@ensensis

ensensis commented Jul 2, 2014

Sorry for the delay. I was not notified of replies. @ashkulz, yes, you are correct, the issue has been fixed in the dev build. Thank you so much.

@jomarie

jomarie commented Apr 8, 2015

Hi there. We recently had to upgrade our wkhtmltopdf version when changing servers. We are now running 0.12.2.1 with the latest QT patch, and I am experiencing the missing TOC issue. I understand from the last posts that this should be fixed. Is it possible that something in the most recent wkhtmltopdf / QT build is causing problems again? There is just a blank page where the TOC is supposed to be.

Thanks in advance!

@jomarie

jomarie commented Apr 8, 2015

An update on this issue, for anyone else who might be experiencing it:
We use the following:
toc --exclude-from-outline --toc-header-text \"\"
which correctly sets the document to use a TOC, but takes out the "Table of Contents" heading. If I take this out, or put any other text for the TOC header, the TOC displays again, otherwise it just loads a blank page.

For interest, is there any other way to have the TOC without a specific header? I checked the latest documentation at http://wkhtmltopdf.org/usage/wkhtmltopdf.txt but couldn't see any such option.

@ashkulz
Member

ashkulz commented Apr 8, 2015

What was the version you were using previously?

@jomarie

jomarie commented Apr 8, 2015

The previous version was wkhtmltopdf 0.10.0 rc2. A previous developer implemented it and everything was working until our server change, so we had no need to change it before now.

@ashkulz
Member

ashkulz commented Apr 8, 2015

If you can produce a small, reproducible test case with the latest version, please report it as a separate issue. If you want, you can still download the previous version from the downloads page (see archive).
