This repository has been archived by the owner on Jan 2, 2023. It is now read-only.

meta data for author, title, subject, generator / creator / producer #2000

Closed
matzke opened this issue Oct 15, 2014 · 16 comments

Comments

@matzke

matzke commented Oct 15, 2014

see https://code.google.com/p/wkhtmltopdf/issues/detail?id=1095

@ashkulz
Member

ashkulz commented Oct 31, 2014

This requires a change in the upstream Qt PDF generation code.

@matrixise

OK, closed, but what's the solution for this problem? I think there is no solution. Is that right?

@ashkulz
Copy link
Member

ashkulz commented Jun 1, 2015

Correct, there is no solution unless changes are made upstream in Qt.

@matzke
Author

matzke commented Jun 1, 2015

@ashkulz do you know where I could file a ticket/feature request for that issue?

@ashkulz
Member

ashkulz commented Jun 2, 2015

On the Qt issue tracker.

@matzke
Author

matzke commented Jun 2, 2015

OK, thanks. There is already a ticket for QPdfWriter, https://bugreports.qt.io/browse/QTBUG-44451, but I am not sure whether this is the right class/module. Which component is the "upstream Qt PDF" part of?

@ashkulz
Member

ashkulz commented Jun 2, 2015

Yes, that is the correct ticket. You might want to watch the issue in Qt to get notified when it gets fixed.

@garex

garex commented Dec 20, 2015

Vote for the issue in upstream: https://bugreports.qt.io/browse/QTBUG-44451

@chris-scheurle

In case anyone else wants meta data support without having to wait for any upstream fixes: I wrote this little script. Yes, it's messy but it works at least for me (OS X 10.11.6, on Linux you'd maybe need to install php (or rewrite that part)). 😄
It uses wkhtmltopdf to do the conversion to pdf, then exiftool to add the meta data, then qpdf to relinearize and write protect the file.

html2pdf.sh

#!/bin/bash

# run: . html2pdf.sh input.html output.pdf

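if [[ "$#" -gt 1 ]]; then
    # Temporary files for the intermediate PDFs. The script as posted never defines
    # ${tmp} and ${tmp2}; creating them with mktemp here is an assumption.
    tmp="$(mktemp "${TMPDIR:-/tmp}/html2pdf.XXXXXX")"
    tmp2="$(mktemp "${TMPDIR:-/tmp}/html2pdf.XXXXXX")"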
    title="$(php -r 'echo html_entity_decode($argv[1], ENT_QUOTES|ENT_HTML5, "UTF-8")."\n";' "$(sed -En -e 's/^.*<title>(.*)<\/title>.*$/\1/p' "${1}")")"
    author="$(php -r 'echo html_entity_decode(rawurldecode($argv[1]), ENT_QUOTES|ENT_HTML5, "UTF-8")."\n";' "$(sed -En -e 's/^.*<link rel="author" href="mailto:([^"]*)".*/\1/p' "${1}")")"
    if [ -z "${author}" ]; then
        author="$(php -r 'echo html_entity_decode($argv[1], ENT_QUOTES|ENT_HTML5, "UTF-8")."\n";' "$(sed -En -e 's/^.*<meta name="author" content="([^"]*)".*/\1/p' "${1}")")"
    fi
    subject="$(php -r 'echo html_entity_decode($argv[1], ENT_QUOTES|ENT_HTML5, "UTF-8")."\n";' "$(sed -En -e 's/^.*<meta name="description" lang="[^"]*" content="([^"]*)".*/\1/p' "${1}")")"
    keywords="$(php -r 'echo implode("|[SEPARATOR]|", preg_split("/\s*,[\s,]*/", html_entity_decode($argv[1], ENT_QUOTES|ENT_HTML5, "UTF-8"), -1, PREG_SPLIT_NO_EMPTY))."\n";' "$(sed -En -e 's/^.*<meta name="keywords" lang="[^"]*" content="([^"]*)".*/\1/p' "${1}")")"
    generator="$(php -r 'echo html_entity_decode($argv[1], ENT_QUOTES|ENT_HTML5, "UTF-8")."\n";' "$(sed -En -e 's/^.*<meta name="generator" lang="[^"]*" content="([^"]*)".*/\1/p' "${1}")")"
    if [[ -z "${generator}" ]]; then
        generator='-'
    fi
    wkhtmltopdf \
        --load-error-handling 'abort' --load-media-error-handling 'abort' \
        --print-media-type --minimum-font-size 1 \
        -B 10mm -L 10mm -R 10mm -T 10mm -O Landscape -s A4 \
        --no-stop-slow-scripts \
        --run-script 'window.setTimeout(function(){window.status = "FOOBAR";}, 1000);' --window-status 'FOOBAR' \
        --title "${title}" "${1}" "${tmp}" \
    && exiftool \
        -z -P -sep "|[SEPARATOR]|" \
        -XMP:Format="application/pdf" \
        -Title="${title}" \
        -PDF:Subject="${subject}" -XMP:Description="${subject}" \
        -PDF:Author="${author}" -XMP:Creator="${author}" \
        -XMP:Keywords="${keywords//|\[SEPARATOR\]|/, }" -PDF:Keywords="${keywords//|\[SEPARATOR\]|/, }" \
        -XMP:Subject="${keywords//"\""/}" -AppleKeywords="${keywords//|\[SEPARATOR\]|/, }" \
        -XMP:Marked=True \
        -XMP:DocumentID="$([ -f "${2}" ] && exiftool -q -z -P -s3 -XMP:DocumentID "${2}" || exiftool -q -z -P -p 'uuid:$ExifTool:newguid' "${tmp}")" \
        -XMP:InstanceID="uuid:$(exiftool -q -z -P -s3 -ExifTool:newguid "${tmp}")" \
        -PDF:Creator="${generator}" -XMP:CreatorTool="${generator}" \
        -Producer="$(exiftool -q -z -P -p '$PDF:Creator / $PDF:Producer' "${tmp}")" \
        -CreateDate="$([ -f "${2}" ] && exiftool -q -z -P -s3 -PDF:CreateDate "${2}" || exiftool -q -z -P -s3 -PDF:CreateDate "${tmp}")" '-ModifyDate<PDF:CreateDate' '-XMP:MetadataDate<PDF:CreateDate' \
        -overwrite_original_in_place "${tmp}" \
    && qpdf \
        --suppress-recovery \
        --linearize --stream-data=compress \
        --encrypt "" "$(md5 -q -s "${RANDOM}$(( x=RANDOM, y=RANDOM, x>=y?x-y:y-x ))$(( $(date +%s) % RANDOM ))$(( x=RANDOM, y=RANDOM, x>=y?x-y:y-x ))${RANDOM}")" 128 \
        --accessibility=y --extract=y --print=full --modify=none -- \
        "${tmp}" "${tmp2}"
    if [[ $? -lt 1 ]]; then
        cp -f "${tmp2}" "${2}"
    else
        echo 'Some error occurred' 1>&2
    fi
    if [[ -f "${tmp}" ]]; then
        rm -f "${tmp}"
    fi
    if [[ -f "${tmp2}" ]]; then
        rm -f "${tmp2}"
    fi
else
    echo "2 parameters expected, only got $#" 1>&2
fi

turns

<!DOCTYPE html>
<html lang="de">
    <head>
        <meta charset="utf-8">
        <title>Some Page</title>
        <meta name="author" content="Foo Bar">
        <link rel="author" href="mailto:Foo%20Bar%20%3cfoo%40bar.com%3e">
        <meta name="description" lang="en" content="Some really nice page">
        <meta name="keywords" lang="en" content="foo,bar,page,pdf,wkhtmltopdf,exiftool,qpdf">
        <meta name="generator" lang="en" content="Brackets">
    </head>
    <body style="font-family:sans-serif; font-size:200%;">
        <h1>Foobar!</h1>
        <p>Lorem… <em>you know the drill.</em></p>
    </body>
</html>

into
screenshot01
screenshot02
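
The comment above notes that the php part could be rewritten on systems without php. A minimal sketch of that idea for the title and author fields, assuming bash and the same sed patterns as the script: the function names `urldecode`, `extract_title` and `extract_author` are hypothetical, and HTML entities such as `&amp;` are not decoded here, which the php calls in the original do handle.

```shell
#!/bin/bash
# Sketch only: extract <title> and the author from an HTML file without php.
# All function names are hypothetical. HTML entities are NOT decoded.

# Decode percent-encoding (e.g. %20 -> space) by turning %HH into \xHH
# and letting printf's %b expand the escapes.
urldecode() {
    printf '%b' "${1//%/\\x}"
}

# Print the contents of the first-line <title>...</title> match.
extract_title() {
    sed -En 's/^.*<title>(.*)<\/title>.*$/\1/p' "$1"
}

# Prefer <link rel="author" href="mailto:...">, fall back to <meta name="author">.
extract_author() {
    local raw
    raw="$(sed -En 's/^.*<link rel="author" href="mailto:([^"]*)".*/\1/p' "$1")"
    if [ -n "$raw" ]; then
        urldecode "$raw"
        echo
    else
        sed -En 's/^.*<meta name="author" content="([^"]*)".*/\1/p' "$1"
    fi
}
```

Called as `extract_author input.html`, this prints the decoded mailto value if the link element is present and the meta author otherwise, mirroring the order the original script uses.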

@jeacksmcione

Can you tell me how to run this on Windows 7?

@mikeponco

Hello Chris, can you tell me whether it works for other websites? For example: wkhtmltopdf.exe http://google.com/ google.pdf. Please answer, thanks.

@chris-scheurle

chris-scheurle commented Feb 11, 2017

@jeacksmcione commented:

Can you tell me how to run this on Windows 7?

I'm afraid I can't. But the programs I mentioned also run on Windows, so you should at least be able to do each of these steps manually.

@chris-scheurle

@mikeponco commented:

Hello Chris, can you tell me whether it works for other websites?

No, it does not.

You could try a tool like curl to download the file first, or use a headless browser like PhantomJS to get the meta data.
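
A rough sketch of the curl suggestion, assuming the page is saved to a local file that is then passed to html2pdf.sh as its first argument. `fetch_page` is a hypothetical helper name, and relative links to stylesheets or images in the saved copy may no longer resolve.

```shell
# Sketch only: download a remote page to a local file before conversion.
# fetch_page is a hypothetical helper, not part of the original script.
fetch_page() {
    local url="$1" dest="$2"
    # -f: fail on HTTP errors, -sS: quiet but still report errors,
    # -L: follow redirects, -o: write the response body to $dest
    curl -fsSL -o "$dest" "$url"
}
```

For example, `fetch_page "http://google.com/" page.html` followed by `bash html2pdf.sh page.html google.pdf` would run the script on the saved copy.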

@mikeponco

Oh, thanks Chris. I thought wkhtmltopdf, like mPDF, could create metadata. Still, wkhtmltopdf is a great tool :D

xmo-odoo added a commit to odoo-dev/odoo that referenced this issue Dec 12, 2018
To avoid having to fixup half a dozen places where we're creating PDF
writers, and possibly ending up with new ill-configured writers in
the future, patch PyPDF2's own writer with a subclass setting /Creator
and /Producer.

Note that this will not affect non-post-processed PDFs generated by
wkhtmltopdf. wkhtmltopdf does not allow setting these properties[0][1], so
to fix this issue we'd have to alter _run_wkhtmltopdf to pass the
result through PyPDF2 in order to alter its metadata.

[0] wkhtmltopdf/wkhtmltopdf#2000
[1] https://bugreports.qt.io/browse/QTBUG-44451
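
The commit above post-processes PDFs with PyPDF2. As a different, shell-based route to the same effect, the exiftool tags already used in the script earlier in this thread can overwrite /Creator and /Producer on a finished PDF. This is only a sketch, not the approach taken in the commit, and `set_pdf_creator` is a hypothetical helper name.

```shell
# Sketch only: set /Creator and /Producer on an existing PDF with exiftool.
# set_pdf_creator is a hypothetical helper; this is not the PyPDF2 patch.
set_pdf_creator() {
    local pdf="$1" creator="$2" producer="$3"
    # -q: quiet, -overwrite_original: do not keep a *_original backup copy
    exiftool -q -overwrite_original \
        -PDF:Creator="$creator" \
        -PDF:Producer="$producer" \
        "$pdf"
}
```

A call might look like `set_pdf_creator report.pdf "My App" "My App 1.0"`, where the file name and values are placeholders.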
@inetbiz

inetbiz commented Apr 2, 2019

@chris-scheurle Could you turn this into a github project and also add licensing and licensing URL?

@sunnybear

At least the /Application field (which is filled with wkhtmltopdf 0.12.5) can be changed inside wkhtmltopdf :)

xmo-odoo added a commit to odoo-dev/odoo that referenced this issue Mar 23, 2020
robodoo pushed a commit to odoo/odoo that referenced this issue Mar 23, 2020
To avoid having to fixup half a dozen places where we're creating PDF
writers, and possibly ending up with new ill-configured writers in
the future, patch PyPDF2's own writer with a subclass setting /Creator
and /Producer.

Note that this will not affect non-post-processed PDFs generated by
wkhtmltopdf. wkhtmltopdf does not allow setting these properties[0][1], so
to fix this issue we'd have to alter _run_wkhtmltopdf to pass the
result through PyPDF2 in order to alter its metadata.

[0] wkhtmltopdf/wkhtmltopdf#2000
[1] https://bugreports.qt.io/browse/QTBUG-44451

closes #29460

Signed-off-by: Xavier Morel (xmo) <xmo@odoo.com>