Issue reading content encoded as Windows-1258 #141
Comments
Hi @Lepelley Could you confirm which version of mail-mime-parser you're using? Some old versions had an issue with base64 decoding when they relied on PHP's built-in decoding, so I switched to my own implementation based on PSR-7 streams with guzzlehttp. I don't think that issue was specific to 5.4, and unfortunately I can't think of anything else that might be causing this. Otherwise, if it's not a version issue, it would be very helpful if you could narrow it down to a single email so a test can be written from it and we can fix it. |
I was using version 1.2.0, but I also tried 1.2.3. The email I got the error with (some data anonymised): Deleted |
Can you confirm it's the base64 encoded image part that the issue happens on? |
My problem is that the content of the mail is truncated; I'm not sure if it's because of the base64 image. |
Hmm, so it could be an issue with quoted-printable... Is the truncated content specifically the text part, the html part, or both? |
Both |
Hi @Lepelley Sorry for the delay looking at this. This is actually happening to me on PHP 7.4.3 as well, and what I've noticed is that the message specifies an unusual charset for the content: "windows-1258", which according to Wikipedia is "a code page used in Microsoft Windows to represent Vietnamese texts.". Using your attached example, if I manually change the charsets to iso-8859-1, I'm able to see the entire content for both the text/plain and text/html parts. I'm not sure if this is an issue on my end (or with zbateson/mb-wrapper), with PHP, or with an incorrect charset being specified... any ideas? |
Hello @zbateson, |
I've narrowed it down to an iconv function, so this could be system-specific, potentially down to the version of iconv being used (or a bug in PHP's implementation of the function that calls iconv, lol). In zbateson/mb-wrapper, I end up calling: iconv_substr($decodedText, 0, 2037, 'CP1258'); where $decodedText contains the html or text part after quoted-printable decoding. Unfortunately iconv_substr only returns 11 characters, and I'm not sure why. Converting from CP1258 to UTF-8 seems to succeed, and calling iconv_strlen on $decodedText also returns 2037 in this case. I noticed that converting to UTF-8 and then calling mb_substr seems to work (mb_substr doesn't support these Windows charsets and some others, which is why the code uses iconv). Unfortunately that's additional work to get the correct results, but I've had to do that elsewhere anyway. |
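The workaround mentioned above (convert to UTF-8 first, then slice with mb_substr) could be sketched roughly like this. It is only an illustration under stated assumptions, not the actual zbateson/mb-wrapper code: the function name substrViaUtf8 is hypothetical, and returning the slice in UTF-8 rather than converting it back to the source charset is a simplification.

```php
<?php
// Hypothetical helper (NOT the mb-wrapper implementation): take a
// character-based substring of text in a charset that mb_substr may not
// support, by converting to UTF-8 first instead of calling iconv_substr.
function substrViaUtf8(string $text, string $charset, int $start, ?int $length = null): string
{
    // iconv() returns false if the conversion fails or the charset is unknown.
    $utf8 = @iconv($charset, 'UTF-8', $text);
    if ($utf8 === false) {
        throw new RuntimeException("Could not convert from {$charset} to UTF-8");
    }

    // mb_substr counts characters, not bytes, when told the text is UTF-8.
    // A null length means "up to the end of the string".
    // Assumption: the caller is happy to receive the result as UTF-8.
    return mb_substr($utf8, $start, $length, 'UTF-8');
}
```

With something like this, the failing call would become substrViaUtf8($decodedText, 'CP1258', 0, 2037), assuming the local iconv build knows CP1258. Both the iconv and mbstring extensions need to be loaded.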
Oh! I went in to create a test and it seems I was kind of aware of this: I have a comment that reads "// seems to fail only on CP1258, returns incorrect number of characters with iconv_substr". Aah well, I guess time to work that out ;) |
This is fixed in zbateson/mb-wrapper 1.0.1. I released a new mail-mime-parser version 1.3.0 which requires that version, but just updating your dependencies in 1.x will also work. If you get a chance, please have a look and make sure all is well for you now :) |
Works perfectly, thank you ! |
I retrieve mails from the Gmail API using your library, and in some cases (fewer than 3%) it returns only part of the content instead of all of it on PHP 5.4.16, but returns everything on 7.4.6.
<?php
use ZBateson\MailMimeParser\Message;

// Raw MIME message as a string (placeholder value)
$decodedMail = "mime string";
$mime = Message::from($decodedMail);
echo $mime->getHtmlContent();
Do you have any idea what could cause that difference? We will have to upgrade our PHP version eventually, but I'm not sure I can push my boss to do that yet.