->getHeaderValue() is normalizing whitespaces #127

Closed
ThomasLandauer opened this issue Aug 5, 2020 · 10 comments

@ThomasLandauer
Contributor

If I have this in the email (notice the two spaces):

Subject: foo  bar

...then $message->getHeader('subject')->getValue() and $message->getHeaderValue('subject') both give me this (notice the single space):

foo bar

I didn't look at your code yet, I wanted to ask you in advance: Are you doing this on purpose?

RFC 5322 says:

These are referred to as unstructured field bodies. Semantically, unstructured field bodies are simply to be treated as a single line of characters with no further processing (except for "folding" and "unfolding" as described in section 2.2.3).
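For reference, the "unfolding" the RFC mentions only removes a CRLF that is immediately followed by whitespace; the whitespace itself stays. A minimal sketch of that rule in plain PHP (the function name `unfoldHeader` is made up for illustration and is not part of any library):

```php
<?php
// Hypothetical helper, for illustration only.
// Unfolding per RFC 5322 section 2.2.3: drop each CRLF that is immediately
// followed by a space or tab, and leave that whitespace untouched.
function unfoldHeader(string $raw): string
{
    return preg_replace('/\r\n(?=[ \t])/', '', $raw);
}

// A folded subject whose continuation line starts with two spaces.
$folded = "Subject: foo\r\n  bar";

var_dump(unfoldHeader($folded)); // string(17) "Subject: foo  bar"
```

Both spaces survive unfolding, so any collapsing to a single space has to come from a later processing step.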

@zbateson
Owner

zbateson commented Aug 5, 2020

Without investigating I couldn't say either, haha :) let me know if you manage to have a look before I do here.

@ThomasLandauer
Contributor Author

I didn't read RFC 5322 from start to end. But as it says (see above): "with no further processing", I'm 99% sure that it's perfectly legal to have several whitespaces in the subject. After all, you certainly can have several whitespaces in the message's body ;-)

The problem comes from the fact that you split the subject on whitespaces into MimeLiteralParts. What's the purpose of that? Why don't you just keep it as a single string?

@zbateson
Owner

zbateson commented Aug 5, 2020

It's because 'Subject' can actually contain RFC 2047 mime-encoded parts... that may be simply considered an 'extension' to 5322, and you may still be right regardless though (haven't looked for clues why I'm not preserving more than one whitespace, or if it's intentional).
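To illustrate why whitespace matters for RFC 2047: the RFC says that whitespace separating two adjacent encoded-words is ignored, while whitespace next to ordinary text is kept. A rough sketch of just that rule with a regular expression (the function name and the simplified encoded-word pattern are assumptions for illustration, not the parser's implementation):

```php
<?php
// Hypothetical helper, for illustration only.
// Removes whitespace that sits between two adjacent RFC 2047 encoded-words.
// The encoded-word pattern is simplified: =?charset?B-or-Q?text?=
function dropSpaceBetweenEncodedWords(string $value): string
{
    $word = '=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=';
    // The lookahead leaves the second word unconsumed, so it can start the next match.
    return preg_replace('/(' . $word . ')\s+(?=' . $word . ')/', '$1', $value);
}

var_dump(dropSpaceBetweenEncodedWords('=?UTF-8?Q?a?=  =?UTF-8?Q?b?='));
var_dump(dropSpaceBetweenEncodedWords('foo  bar'));
```

The first call joins the two encoded-words with nothing in between; the second leaves `foo  bar` unchanged, because plain text is not covered by this rule.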

@zbateson
Owner

zbateson commented Aug 6, 2020

The reason this happens is this line here:

return $this->partFactory->newToken(' ');

As you've observed, this doesn't preserve multiple whitespaces in subjects: the separator token pattern is '\s+', so a whole run of whitespace is matched at once and replaced by that single space. This works well for RFC 2047 encoded parts when they're next to text or next to each other, which is how that's supposed to work.
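The effect can be reproduced outside the library with plain PHP. This is only a sketch of the behaviour described above, not the parser's actual code; both function names are hypothetical:

```php
<?php
// Hypothetical stand-ins, for illustration only.

// Split on runs of whitespace and re-emit every separator as one space,
// which is what a '\s+' separator plus a fixed ' ' token amounts to.
function collapseWhitespace(string $value): string
{
    $tokens = preg_split('/\s+/', trim($value));
    return implode(' ', $tokens);
}

// Capture the separators while splitting so they can be put back unchanged.
function preserveWhitespace(string $value): string
{
    $tokens = preg_split('/(\s+)/', $value, -1, PREG_SPLIT_DELIM_CAPTURE);
    return implode('', $tokens);
}

var_dump(collapseWhitespace('foo  bar')); // string(7) "foo bar"
var_dump(preserveWhitespace('foo  bar')); // string(8) "foo  bar"
```

Keeping the matched separator instead of substituting a fixed space is one possible way to preserve the original spacing between plain-text tokens.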

  • Gmail always shows the subject without the extra spaces
  • Outlook web does the same
  • Thunderbird shows the extra spaces in the 'list' of messages, but doesn't in the message preview and view windows

I'm feeling torn on this one...

On the one hand, you're right -- the RFC doesn't specifically say they should be replaced by a single space as far as I can tell. On the other hand, I'm not sure most users would specifically write code to handle the extra spaces either, and the expectation may be that a subject wouldn't contain them.

Open to hearing arguments on this one :).

@zbateson
Owner

zbateson commented Aug 6, 2020

Also, I'm not sure my usual Thunderbird test is valid in this case: I'm looking at their "display", which is different from their parsing. The Thunderbird 'positive' is probably more that the subject wasn't handled specifically in the message list, or that everywhere else it's being displayed as HTML anyway and supporting multiple spaces would be more work.

@ThomasLandauer
Contributor Author

  • php-mime-mail-parser's $parser->getHeader('subject') does keep the whitespaces.
  • Thunderbird's Composer removes them when composing a message. But for received messages (i.e. written elsewhere) it keeps them in the source code and in the message list in the main window. It removes them in the message's window itself.
  • But my main argument is the RFC :-)

@zbateson
Owner

zbateson commented Aug 6, 2020

Yeah, I already agreed with all of those -- my remaining question is about the value in changing what's there and user expectation.

@ThomasLandauer
Contributor Author

Well, if you're asking me: You cannot be proud of following every RFC and then, in this case, ask about "user expectation"...

@zbateson
Owner

zbateson commented Aug 6, 2020

Hahaha... well you got me there I guess 👍

@zbateson
Owner

zbateson commented Dec 8, 2020

Released in 1.3.0
