ReceivedHeader: address is null when hostname is unknown #104

Closed
toto4ds opened this issue Jan 18, 2020 · 1 comment

Comments


toto4ds commented Jan 18, 2020

Hello,
This header returns null for the address:
Received: from mx.yandex.ru (unknown [114.234.11.107])
by my.mail.host (Postfix) with ESMTP id A11294BE6E49
for sc2015@my.mail.host; Sat, 18 Jan 2020 14:11:04 +0700 (+07)

ZBateson\MailMimeParser\Header\ReceivedHeader^ {#1216
#comments: array:2 [
0 => "unknown [114.234.11.107]"
1 => "Postfix"
]
#date: DateTime @1579331464 {#1266
date: 2020-01-18 14:11:04.0 +07:00
}
#parameters: array:5 [
"from" => ZBateson\MailMimeParser\Header\Part\ReceivedDomainPart^ {#1249
#ehloName: "mx.yandex.ru"
#hostname: null
#address: null
#name: "from"
#language: null
#canIgnoreSpacesBefore: false
#canIgnoreSpacesAfter: false
#languages: []
#value: "mx.yandex.ru"

thx

@zbateson
Owner

Hi @toto4ds --

That's because "unknown" doesn't look like a hostname. The rules for parsing FROM and BY are defined as follows (from https://mail-mime-parser.org/api/1.2/classes/ZBateson.MailMimeParser.Header.ReceivedHeader.html):

Anything outside and before a parenthesized expression is considered "the name", for example "FROM AlainDeBotton", "AlainDeBotton" would be the name, but also if the name is an address, but exists outside the parenthesized expression, it's still considered "the name". For example: "From [1.2.3.4]", getFromName would return "[1.2.3.4]".

A parenthesized expression MUST match what looks like either a domain name on its own, or a domain name and an address. Otherwise the parenthesized expression is considered a comment, and not parsed into hostname and address. The rules are defined loosely because many implementations differ in how strictly they follow the standard. For a domain, it's enough that the expression starts with any alphanumeric character and contains at least one '.', followed by any number of '.', '-' and alphanumeric characters. The address portion must be surrounded in square brackets, and contain any sequence of '.', ':', numbers, and characters 'a' through 'f'. In addition the string 'ipv6' may start the expression (for instance, '[ipv6:::1]' would be valid). A port number may also be considered valid as part of the address, for example: [1.2.3.4:3231]. No additional validation on the address is done, and so an invalid address such as '....' could be returned, so users using the 'address' header are encouraged to validate it before using it. The square brackets are parsed out of the returned address, so the value returned by getFromAddress() would be "2.2.2.2", not "[2.2.2.2]".
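To make the quoted rules concrete, here is a rough sketch of how they apply to the header in this issue. It is written in Python purely for illustration (the library itself is PHP), the regular expressions are reconstructed from the prose above and are not the library's actual patterns, and `parse_parenthesized` is a hypothetical name.

```python
import re

# Assumption: patterns derived from the documentation text quoted above,
# not copied from the library's source.
# Domain: starts with an alphanumeric character and contains at least one '.'.
DOMAIN = r"[A-Za-z0-9][A-Za-z0-9-]*\.[A-Za-z0-9.-]*"
# Address: inside square brackets, optional 'ipv6' prefix, then '.', ':',
# digits and the letters a-f (this also admits a port such as 1.2.3.4:3231).
ADDRESS = r"\[(?P<address>(?:ipv6)?[0-9a-f.:]+)\]"

PAREN_RE = re.compile(
    rf"^\s*(?P<hostname>{DOMAIN})(?:\s*{ADDRESS})?\s*$",
    re.IGNORECASE,
)


def parse_parenthesized(expr):
    """Return (hostname, address) if expr looks like 'domain' or
    'domain [address]'; return None if it should be kept as a comment."""
    match = PAREN_RE.match(expr)
    if match is None:
        return None
    # The brackets are outside the 'address' group, so they are stripped.
    return match.group("hostname"), match.group("address")


# "unknown" has no '.', so the whole expression falls through to a comment,
# which is why hostname and address are both null in the dump above.
print(parse_parenthesized("unknown [114.234.11.107]"))
print(parse_parenthesized("mail.example.com [1.2.3.4]"))
```

Under these assumed patterns the first call returns `None` and the second returns the hostname and the bracket-free address.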

Because "Received" doesn't seem to be very well documented and there is bound to be a large amount of variation in the wild, I'm not sure I want to chase every variation and make it work for Received.

Having said that, I could consider a change in parsing for this specific issue:

If the pattern of the parenthesized part is "(no-whitespace [ip.address])", ignore the no-whitespace part if it doesn't look like a hostname (that is, if it doesn't contain at least one '.')? Or what change would be the most beneficial?
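A minimal sketch of that proposed rule, again in Python for illustration only and not a patch against the library: the function name `parse_with_fallback` and both patterns are assumptions made up for this example.

```python
import re

# Assumption: same loose hostname shape as in the documented rules
# (starts alphanumeric, contains at least one '.').
HOSTNAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]*\.[A-Za-z0-9.-]*")

# "(no-whitespace [ip.address])": one whitespace-free token followed by a
# bracketed address made of '.', ':', digits and a-f, optionally 'ipv6'.
TOKEN_AND_ADDRESS_RE = re.compile(
    r"^\s*(?P<token>\S+)\s+\[(?P<address>(?:ipv6)?[0-9a-f.:]+)\]\s*$",
    re.IGNORECASE,
)


def parse_with_fallback(expr):
    """Return (hostname, address) for 'token [address]' expressions.

    The token is kept as the hostname only if it looks like one; otherwise
    it is ignored and just the address is returned. Anything that does not
    have the 'token [address]' shape returns None (treated as a comment).
    """
    match = TOKEN_AND_ADDRESS_RE.match(expr)
    if match is None:
        return None
    token = match.group("token")
    hostname = token if HOSTNAME_RE.fullmatch(token) else None
    return hostname, match.group("address")


# With this rule the header from the report would yield the address while
# leaving the hostname null, since "unknown" contains no '.'.
print(parse_with_fallback("unknown [114.234.11.107]"))
```

One open design question with this approach is whether the ignored token ("unknown") should still be exposed somewhere, for example as a comment, so that no information from the header is lost.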
