Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug in address parser #165

Closed
postme opened this issue Feb 28, 2021 · 6 comments
Closed

Bug in address parser #165

postme opened this issue Feb 28, 2021 · 6 comments
Labels

Comments

@postme
Copy link

postme commented Feb 28, 2021

Hi Zaahid, hope you're doing well! I found a bug in the address parser, I created a little test case:

<?php

declare(strict_types=1);

require __DIR__ . '/vendor/autoload.php';

use ZBateson\MailMimeParser\Message;

$emailString = <<<EOD
From: "Test Sender" <test@sender.org>
To: "Test Recipient" <test@recipient.org>, "Test Recipient" <test@recipient.org:>, "Test Recipient" <,test@recipient.org>
Subject: Test
Date: Tue, 23 Feb 2021 19:01:51 +0000
Message-ID:
 <AM8P189MB142828DEF87769DE5F072D7BFF809@AM8P189MB1428.EURP189.PROD.OUTLOOK.COM>
MIME-Version: 1.0

Test
EOD;

/* Parse the e-mail and create a ZBateson/MailMimeParser object */
$email = Message::from($emailString);

if (null !== $email->getHeader('To')) {
    $toAddresses = $email->getHeader('To')->getAddresses();
}

var_dump($toAddresses);

This yields:

Address 2 and 3 are incorrect, address 2 yields an empty string, address 3 yields a corrected string. Ideally the address parser should throw an error if it can't correctly process the address?

P.S. I have resolved the issue for now by getting the raw values of To, Cc and Bcc and parsing them with egulias/emailvalidator so no rush here.

Kind regards
Meint

@zbateson
Copy link
Owner

zbateson commented Mar 3, 2021

Hi Meint,

Nice to hear from you! Yes, doing well thank you :)

The issue you're experiencing (off the top of my head) is because top level for an address line is the AddressGroupConsumer. This means ':' takes precedence over actual addresses, so something in the form 'To: mylist: addr1@example.com, addr2@example.com; another-recipient@example.com' correctly parses the 'mylist' part as a group. It's definitely a less commonly-used feature, but part of the standards nonetheless. It looks out for ':' characters to start a group, and ';' as a separator.

There may be a way to solve this by being more intuitive about ignoring the ':' part once in an address... I'll investigate this more thoroughly to figure out what can be done about it.

I've generally avoided throwing errors in favour of 'returning something' on a best-effort basis, kind of like an email client would do... I do like the idea proposed in #124 though. The trouble is, for something like this issue, without additional code to validate it probably wouldn't cause an error anyway -- and I'm not sure the additional validation is worth it.

All the best,
Zaahid

@postme
Copy link
Author

postme commented Mar 3, 2021

Hi Zaahid happy to hear you're doing well!
Thanks for explaining the behaviour, I understand the reasoning, ticket #124 would definitely be a nice way to go about it, i.e. returning something sane but still have an opportunity to inspect parser issues and decide on a different interpretation.

What's not entirely clear to me is whether the domain part of the example I gave, i.e. test@recipient.org: (with a colon at the end) would qualify as a valid domain? RFC 5322 states that the domain part is of type dtext and that can mean a whole range of characters (including colon) however it also states that it needs to constitute a real address, ie. resolvable and from that perspective I don't think that recipient.org: qualifies?

Kind regards
Meint

@zbateson
Copy link
Owner

zbateson commented Mar 3, 2021

I think you're right -- it wouldn't be a correct domain. For MMP's purposes though, I think that validation is out of scope -- as in it'll "return what's there" to the best of its abilities, but the validation for what constitutes correctness is up to the user. At least that's been my working policy so far.

@zbateson
Copy link
Owner

zbateson commented Mar 3, 2021

I see your point though too about the spec, the 'colon' should be handled normally. Thanks for pointing that out as well Meint :)

@zbateson zbateson added the bug label Mar 3, 2021
@postme
Copy link
Author

postme commented Mar 4, 2021

Hi Zaahid, you're welcome, I'm closing the ticket.

For anybody running into the same (or similar) issue, the solution is to retrieve the raw values like this:


$addresses = '';
foreach ($email->getAllHeaders() as $header) {
    if (strtolower($header->getName()) === 'to' || strtolower($header->getName()) === 'cc' || strtolower($header->getName()) === 'bcc') {
        $addresses .= $header->getRawValue() . ', ';
    }
}
$addresses = trim($addresses, ', ');

This will yield a string containing all recipients addresses which can be parsed by an email address validator.

@zbateson
Copy link
Owner

zbateson commented Nov 9, 2021

A fix for this has been released in 2.1.0 that addresses this in MMP directly -- a separate consumer for the email address portion of an address solves it, and it should've been done that way all along :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants