Binary downloads converted from source data? #503

jdbo opened this Issue Mar 31, 2013 · 3 comments



jdbo commented Mar 31, 2013

capybara-webkit appears to be applying some sort of conversion to binary files downloaded through it (i.e. downloading the file attachment via page#body).

For example, the original file (a JPEG image) begins:

  FF D8 FF E0 00 10 4A 46 49 46 00 01 01 00 00 01 00 01 00 00 FF E1 8D 08 45 78 69 66 00 00 49 49

while the downloaded file begins:

  C3 BF C3 98 C3 BF C3 A0 10 4A 46 49 46 01 01 01 01 C3 BF C3 A1 C2 8D 08 45 78 69 66 49 49 2A 08

Not every byte is changed (the run 4A 46 49 46, i.e. "JFIF", survives intact in both, for example), but more than enough changes are applied to make the downloaded file unusable as a JPEG.
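The two hex dumps above are consistent with one specific text round trip: NUL bytes are dropped, and every remaining byte is treated as a Latin-1 character and re-encoded as UTF-8 (FF becomes C3 BF, D8 becomes C3 98, and so on). The sketch below checks that hypothesis against the bytes quoted in this report. It is only a model of the symptom, not capybara-webkit's actual code path, and `simulate_text_round_trip` is a made-up helper name.

```ruby
# Hex dumps copied from the report above.
ORIGINAL_HEX   = "FF D8 FF E0 00 10 4A 46 49 46 00 01 01 00 00 01 00 01 00 00 " \
                 "FF E1 8D 08 45 78 69 66 00 00 49 49"
DOWNLOADED_HEX = "C3 BF C3 98 C3 BF C3 A0 10 4A 46 49 46 01 01 01 01 " \
                 "C3 BF C3 A1 C2 8D 08 45 78 69 66 49 49 2A 08"

# Turn a space-separated hex dump into an array of integers.
def hex_to_bytes(hex)
  hex.split.map { |h| h.to_i(16) }
end

# Hypothetical model of the corruption (an assumption, not the real code):
# drop NUL bytes, then read each remaining byte as a Latin-1 character
# and re-encode the result as UTF-8.
def simulate_text_round_trip(bytes)
  bytes.reject(&:zero?)
       .pack("C*")
       .force_encoding(Encoding::ISO_8859_1)
       .encode(Encoding::UTF_8)
       .bytes.to_a
end

original   = hex_to_bytes(ORIGINAL_HEX)
downloaded = hex_to_bytes(DOWNLOADED_HEX)
simulated  = simulate_text_round_trip(original)

# The simulated bytes should match the start of the downloaded dump.
puts simulated == downloaded.first(simulated.length)
```

The 32 original bytes map to the first 30 bytes of the downloaded dump. The trailing 2A 08 presumably comes from file data beyond the excerpt quoted here.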

I'm assuming that this conversion is happening at the driver/capybara-webkit level, but if you believe that it's happening elsewhere (within capybara itself, for example), please let me know. Any idea what this conversion might be, and how to bypass it?

I'm running the latest macports of qt4-mac (qt4-mac @4.8.4_6) on OS X 10.7.5 with capybara (2.0.3) and capybara-webkit (0.14.2), w/ ruby-1.9.2-p290 under RVM.

I'd post both images but the converted one is no longer recognized as such; I can email both files directly upon request.

mhoran commented Mar 31, 2013

This should be fixed on master; could you give it a shot? We've not released a new version yet as we're waiting for Capybara 2.1 to be released.

jdbo commented Apr 1, 2013

Thanks for the quick reply!

Unfortunately I'm still encountering some funky behavior around downloading/saving binaries (testing with images); testing as follows:

  • directly visiting the URL for an image and using page#save_page to download the image (downloading a PNG saves the following string: "\u0089PNG\r\n\u001A\n")
  • navigating through a site and following an image download link still converts the binary data (as described above), and now also wraps that data in <html><head></head><body><pre style="word-wrap: break-word; white-space: pre-wrap;"> and </pre></body></html>. (Note that the site I'm working with appears to return odd response headers, though I haven't been able to reproduce that outside of capybara.)
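One quick way to tell whether a saved download is still a usable image is to compare its leading bytes with the format's signature. The following is a minimal sketch: `image_type` is a hypothetical helper, not part of Capybara or capybara-webkit, and the "mangled" sample assumes the string quoted above was written to disk as UTF-8.

```ruby
# File signatures ("magic numbers") as binary strings.
PNG_SIGNATURE  = "\x89PNG\r\n\x1A\n".b
JPEG_SIGNATURE = "\xFF\xD8\xFF".b

# Hypothetical helper: returns :png, :jpeg, or nil based on the leading bytes.
def image_type(data)
  head = data.b # compare raw bytes, whatever encoding the string is tagged with
  return :png  if head.start_with?(PNG_SIGNATURE)
  return :jpeg if head.start_with?(JPEG_SIGNATURE)
  nil
end

intact  = PNG_SIGNATURE + "rest of file"
mangled = "\u0089PNG\r\n\u001A\n" # the text quoted in the report, as UTF-8

p image_type(intact)  # a real PNG header is recognised
p image_type(mangled) # the UTF-8 text starts with C2 89, so it is not
```

Running a check like this on the result of page#body or page#save_page in a test would flag the corruption immediately instead of leaving an unreadable file behind.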

BTW, I had a few issues installing from master:

  • I had to switch from macports to homebrew to work around PCH (precompiled header?) "is a directory" errors
  • I had to install via gem install_specific -l, as installing locally built gems failed because of a missing "lib/capybara_webkit_builder" file during installation (for reasons unknown to me, extconf.rb appeared to want this file present before the file was installed). Please note that I'm pretty new to manually building/installing gems.

Let me know if you have any questions; BTW, I tested under the latest stable rvm with the latest stable ruby 1.9.3 on OS X 10.7.5.

mhoran added a commit that closed this issue Apr 2, 2013

Don't cast raw frame content to QString
The conversion is lossy and drops non-ASCII characters.

Fixes #503.

mhoran closed this in e72b48f Apr 2, 2013

mhoran commented Apr 2, 2013

I found a bug in the Body command, which was converting the raw content into a QString. That conversion is lossy (non-ASCII characters are dropped). Regarding the odd response headers: if the JPEG content is being served up as "text/html", unfortunately there's not much we can do. The Body command looks at the content type to determine whether to return the DOM (in the case of "text/html") or the raw content (which is what was happening here, except that the content was then being converted to ASCII characters).
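To make the content-type decision concrete, here is a small Ruby model of the behaviour described in this comment. It is purely illustrative: the real Body command lives in capybara-webkit's Qt/C++ server, and `body_for` and its parameters are hypothetical names.

```ruby
# Hypothetical model: pick what the Body command returns for a response.
#   content_type - the response's Content-Type header value
#   dom_html     - the serialised DOM the browser built for the page
#   raw_content  - the bytes exactly as they came over the wire
def body_for(content_type, dom_html, raw_content)
  if content_type.to_s.downcase.start_with?("text/html")
    dom_html      # HTML responses: return the DOM
  else
    raw_content.b # anything else: return the bytes untouched
  end
end

jpeg = "\xFF\xD8\xFF\xE0".b
dom  = "<html><head></head><body><pre>...</pre></body></html>"

p body_for("image/jpeg", dom, jpeg).bytes # raw bytes survive
p body_for("text/html", dom, jpeg)        # mislabelled JPEG: DOM comes back instead
```

Under this model, a JPEG mislabelled as "text/html" comes back as the DOM rather than as raw bytes, which would explain the <html>...<pre> wrapper reported earlier in the thread.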

youpy added a commit to youpy/capybara-webkit that referenced this issue Jul 22, 2014

mhoran + youpy: Don't cast raw frame content to QString