corrupt image when output is specified as "-" in wkhtmltoimage on Windows #1758

Closed
rugglese opened this Issue Jun 4, 2014 · 9 comments

rugglese commented Jun 4, 2014

I am using the 64-bit version of wkhtmltoimage (--version 0-12.1-61cda93 (with patched qt)) and am spawning a process from C# to convert HTML to an image. I want to read the output of the process programmatically rather than writing it to a file. When I read it via StandardOutput, the resulting file is corrupted (testfile2.jpg).

I have successfully created pdfs out of the same html via wkhtmltopdf using the same process.

using System;
using System.Diagnostics;
using System.IO;

internal static class Program
{
    private static void Main(string[] args)
    {
        var desktop = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);

        // Case 1: wkhtmltoimage writes the image to a file itself (testfile.jpg).
        var psi = new ProcessStartInfo();
        psi.FileName = Path.Combine(@"D:\Program Files (x86)\wkhtmltopdf\bin", @"wkhtmltoimage.exe");
        psi.WorkingDirectory = Path.GetDirectoryName(psi.FileName);
        psi.UseShellExecute = false;
        psi.CreateNoWindow = true;
        psi.RedirectStandardInput = true;
        psi.RedirectStandardOutput = false;
        psi.RedirectStandardError = false;
        // Quote the output path in case the desktop path contains spaces.
        psi.Arguments = "-n -q - \"" + Path.Combine(desktop, "testfile.jpg") + "\"";

        // Case 2: wkhtmltoimage writes the image to stdout ("-"), read below (testfile2.jpg).
        var psi2 = new ProcessStartInfo();
        psi2.FileName = Path.Combine(@"D:\Program Files (x86)\wkhtmltopdf\bin", @"wkhtmltoimage.exe");
        psi2.WorkingDirectory = Path.GetDirectoryName(psi2.FileName);
        psi2.UseShellExecute = false;
        psi2.CreateNoWindow = true;
        psi2.RedirectStandardInput = true;
        psi2.RedirectStandardOutput = true;
        psi2.RedirectStandardError = true;
        psi2.Arguments = @"-n -q - -";

        var p = Process.Start(psi);
        // Disposing the writer closes stdin, which signals end of input.
        using (StreamWriter stdin = p.StandardInput)
        {
            stdin.AutoFlush = true;
            stdin.Write(@"<h1>Hello, World!</h1>");
        }

        // wait or exit
        p.WaitForExit(60000);
        p.Close();

        var p2 = Process.Start(psi2);
        using (StreamWriter stdin = p2.StandardInput)
        {
            stdin.AutoFlush = true;
            stdin.Write(@"<h1>Hello, World!</h1>");
        }

        // Read the raw bytes from stdout; BaseStream avoids any text decoding.
        byte[] file;
        var buffer = new byte[32768];
        using (var ms = new MemoryStream())
        {
            while (true)
            {
                int read = p2.StandardOutput.BaseStream.Read(buffer, 0, buffer.Length);
                if (read <= 0)
                    break;
                ms.Write(buffer, 0, read);
            }
            file = ms.ToArray();
        }

        p2.StandardOutput.Close();

        // wait or exit
        p2.WaitForExit(60000);
        p2.Close();

        // FileMode.Create truncates an existing file, so no stale bytes remain.
        using (var fs = new FileStream(Path.Combine(desktop, "testfile2.jpg"), FileMode.Create))
        {
            fs.Write(file, 0, file.Length);
        }
    }
}

[Attached image: testfile.jpg]

[Attached image: testfile2.jpg]


Member

ashkulz commented Jun 4, 2014

Does it work from the command line?


rugglese commented Jun 4, 2014

No. It is corrupted as well.

[Attached image]

[Attached image: testimage.jpg]

The pdf turned out fine, but I can't upload it.

test html.html contains:

<h1>Hello, World!</h1>

Contributor

mn4367 commented Jun 4, 2014

I can confirm this. A binary comparison of the two files shows a difference in the header. The file sizes differ as well: with the example from @rugglese the difference is 37 bytes, and with other, larger files it is bigger (about 120 bytes). The difference in the header seems to be independent of the file size; at first glance the pattern is always the same.

On the Mac there is no such problem.

@ashkulz ashkulz changed the title from "Windows C# wkhtmltoimage - stdout corrupted when reading from code" to "corrupt image when output is specified as "-" in wkhtmltoimage on Windows" Jun 5, 2014

@ashkulz ashkulz added the Verified label Jun 5, 2014


dewiniaid commented Jun 30, 2014

A quick scan of the output in a hex editor shows a bunch of "0D 0A" (CR LF) byte pairs, which suggests something in the pipeline is converting LF bytes in the binary data as if they were line breaks. That would also explain why it works fine on the Mac, where LF is already the native line ending and no conversion takes place.

A similar issue was addressed in PostgreSQL here (found with some quick searching): http://www.postgresql.org/message-id/16907.1106764636@sss.pgh.pa.us
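
To make the CR LF observation concrete, here is a minimal sketch that mimics what a text-mode output stream on Windows does to binary data. The helper names `simulate_text_mode` and `count_byte` are hypothetical and are not part of wkhtmltoimage; the sketch only assumes that each LF byte is written as CR LF.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: mimics a text-mode output stream on Windows, which
// writes every LF (0x0A) byte as the pair CR LF (0x0D 0x0A).
std::vector<std::uint8_t> simulate_text_mode(const std::vector<std::uint8_t>& in) {
    std::vector<std::uint8_t> out;
    out.reserve(in.size());
    std::size_t lf_count = 0;
    for (std::uint8_t b : in) {
        if (b == 0x0A) {
            out.push_back(0x0D);  // the extra CR is what corrupts the image
            ++lf_count;
        }
        out.push_back(b);
    }
    // The output grows by exactly one byte per LF in the input.
    assert(out.size() == in.size() + lf_count);
    return out;
}

// Hypothetical helper: counts how often a byte value occurs in a buffer.
std::size_t count_byte(const std::vector<std::uint8_t>& data, std::uint8_t value) {
    std::size_t n = 0;
    for (std::uint8_t b : data) {
        if (b == value) ++n;
    }
    return n;
}
```

If this translation is the only thing going wrong, the size difference between the good and the corrupted file should equal the number of 0x0A bytes in the good file, which would fit the observation that larger files show a larger difference.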


Member

ashkulz commented Jul 1, 2014

@dewiniaid: thanks for the pointer! I will try it and see if it helps.


Member

ashkulz commented Jul 1, 2014

@dewiniaid: the following patch seems to fix the issue, but I'm not sure I'm applying it at the right place (it is more of a sledgehammer approach):

diff --git a/src/image/wkhtmltoimage.cc b/src/image/wkhtmltoimage.cc
index 0aeedc5..b552154 100644
--- a/src/image/wkhtmltoimage.cc
+++ b/src/image/wkhtmltoimage.cc
@@ -26,7 +26,15 @@
 #include <wkhtmltox/imagesettings.hh>
 #include <wkhtmltox/utilities.hh>

+#ifdef WIN32
+#include <io.h>
+#include <fcntl.h>
+#endif
+
 int main(int argc, char** argv) {
+#ifdef WIN32
+   setmode(fileno(stdout), O_BINARY);
+#endif
    //This will store all our settings
    wkhtmltopdf::settings::ImageGlobal settings;
    //Create a command line parser to parse commandline arguments
diff --git a/src/pdf/wkhtmltopdf.cc b/src/pdf/wkhtmltopdf.cc
index 8250262..caaf35e 100644
--- a/src/pdf/wkhtmltopdf.cc
+++ b/src/pdf/wkhtmltopdf.cc
@@ -34,6 +34,11 @@
 #include <wkhtmltox/pdfsettings.hh>
 #include <wkhtmltox/utilities.hh>

+#ifdef WIN32
+#include <io.h>
+#include <fcntl.h>
+#endif
+
 using namespace wkhtmltopdf::settings;
 using namespace wkhtmltopdf;

@@ -113,6 +118,9 @@ void parseString(char * buff, int &nargc, char **nargv) {
 }

 int main(int argc, char * argv[]) {
+#ifdef WIN32
+   setmode(fileno(stdout), O_BINARY);
+#endif
    //This will store all our settings
    PdfGlobal globalSettings;
    QList<PdfObject> objectSettings;

I'll investigate it a bit further and commit a fix.
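
For readers who hit the same problem in their own tools, the idea behind the patch can be sketched in isolation. This is not the committed fix (c2053e9 and 48bc4c8 are not reproduced here). It assumes the Microsoft C runtime names `_setmode`, `_fileno` and `_O_BINARY` and the `_WIN32` macro, where the patch above uses `setmode`, `fileno`, `O_BINARY` and `WIN32`; the helpers `set_stdout_binary` and `write_bytes` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

#ifdef _WIN32
#include <fcntl.h>
#include <io.h>
#endif

// Hypothetical helper: switches stdout to binary mode on Windows so that
// LF bytes are written unchanged. On other platforms there is no text-mode
// translation, so this does nothing. Returns true on success.
bool set_stdout_binary() {
#ifdef _WIN32
    return _setmode(_fileno(stdout), _O_BINARY) != -1;
#else
    return true;
#endif
}

// Hypothetical helper: writes raw bytes to stdout and returns how many
// bytes were actually written.
std::size_t write_bytes(const unsigned char* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    return std::fwrite(data, 1, n, stdout);
}
```

The mode switch has to happen before the first byte is written to stdout, which is presumably why the patch places the call at the very top of `main`.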


Member

ashkulz commented Jul 2, 2014

Duh! This was already reported as issue 61 and fixed for wkhtmltopdf in 48bc4c8 -- I just have to port the fix for wkhtmltoimage. Still, thanks for the pointer @dewiniaid!

@ashkulz ashkulz closed this in c2053e9 Jul 2, 2014

@ashkulz ashkulz added Fixed and removed Verified labels Jul 2, 2014

@ashkulz ashkulz added this to the 0.12.2 milestone Jul 2, 2014


Member

ashkulz commented Jul 8, 2014

A development snapshot 0.12.2-6a13a51 is available, which should fix this issue. Please report back if your issue is not solved with the above snapshot.


Member

ashkulz commented Jan 10, 2015

0.12.2 has been released, which includes changes related to this issue.
