Fix empty msgid #2063
… in Net::IMAP mail parser
Additionally, this fixes an issue where the IMAP gem and the Mail gem parse fields like the message ID inconsistently. The IMAP gem sometimes leaves whitespace around the message ID, which leads to different MD5 results when checking for duplicate imports. Adding … Happy to answer questions as always!
@dsukhin, thanks for your PR. Can you explain to me what happens if there is no message ID in the incoming email? By the way, normally every MTA adds one if there is none. Thanks for the feedback!
Hi @martini. Sorry for the slow reply.

I handle that by taking the RFC 822 string of the message (which has a fixed set of headers in a deterministic order) and computing its MD5 to generate a message hash. I combine that with a "gen-" prefix, to mark which IDs were generated by Zammad, and the FQDN of the sender, to add a little more uniqueness. The code to generate this ID is present both in the IMAP driver (which checks for duplicate messages) and in the email parser (which creates the ID when a message is added to the database). This prevents duplicate imports even for messages without a message ID.

The second fix in this PR is for an actual bug. The IMAP driver and the mail parser strip whitespace differently when parsing headers. Normally this is fine, but the message ID is run through MD5, so a single extra leading space changes the hash completely. It is therefore important to strip any leading/trailing whitespace from the message ID before computing the MD5 in both places; otherwise duplicates may be imported if you run the IMAP import twice on a message.

I can provide an email test case in a bit to show this behavior more clearly. Please let me know of any questions. Happy to answer.
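A minimal sketch of the fallback ID described above, reusing the `'<gen-' + Digest::MD5.hexdigest(msg) + '@' + fqdn.strip + '>'` expression quoted later in this thread. The helper name `generated_message_id` and the sample message are hypothetical; `fqdn` is passed in as a plain string because how Zammad obtains it is not shown here.

```ruby
require 'digest/md5'

# Hypothetical helper; in the PR the same expression is written inline in
# both the IMAP driver and the email parser.
def generated_message_id(raw_rfc822, fqdn)
  # Hash the complete raw RFC 822 text: the same bytes always give the same
  # digest, so a message seen twice maps to the same generated ID.
  '<gen-' + Digest::MD5.hexdigest(raw_rfc822) + '@' + fqdn.strip + '>'
end

# Illustrative message with no Message-ID header.
raw = "From: alice@example.com\r\nSubject: hi\r\n\r\nHello\r\n"

first  = generated_message_id(raw, 'example.com')
second = generated_message_id(raw, ' example.com ')
puts first            # <gen-(32 hex digits)@example.com>
puts first == second  # true: whitespace around fqdn is stripped
```

Because both code paths hash the same raw bytes, the duplicate check only works if the IMAP driver and the parser see byte-identical `msg` values.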
This is a test message that gets imported multiple times because of the whitespace differences between Message-ID parsing in the Mail gem and in the IMAP gem. The line break on the Message-ID line is fully compliant with the spec for folding long header lines (RFC 5322).
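To illustrate where the stray whitespace comes from, here is a small sketch of RFC 5322 header unfolding applied to a folded Message-ID. The header is made-up example data, not the test message above, and the naive colon split does not claim to reproduce what Net::IMAP or the Mail gem do internally.

```ruby
# Illustrative header: a Message-ID folded onto a continuation line, which
# RFC 5322 permits for long header fields.
raw_header = "Message-ID:\r\n <abc123@mail.example.com>"

# RFC 5322 unfolding: delete each CRLF that is immediately followed by
# whitespace, but keep the whitespace itself.
unfolded = raw_header.gsub(/\r\n(?=[ \t])/, '')

# A naive split on the first colon keeps that leading space in the value.
value = unfolded.split(':', 2).last

p value        # " <abc123@mail.example.com>"
p value.strip  # "<abc123@mail.example.com>"
```

Any parser that does not trim the unfolded field body ends up with the leading space, which is enough to change an MD5 computed over the value.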
@dsukhin, thanks for your contribution, and sorry for the delay in getting it merged. We're currently in the process of incorporating / refactoring your changes, and I wanted to confirm something with you:
In your refactoring, the part that gets MD5-ed is the raw text of the email. In the IMAP driver, that looks like this:

```ruby
msg = @imap.fetch(message_id, 'RFC822')[0].attr['RFC822']
# ...
local_message_id = '<gen-' + Digest::MD5.hexdigest(msg) + '@' + fqdn.strip + '>'
```

and in the EmailParser, it looks like this:

```ruby
'<gen-' + Digest::MD5.hexdigest(msg) + '@' + fqdn.strip + '>'
```

In neither of these cases do you strip whitespace from the raw message text. Have I misunderstood you, or is there something wrong with this patch?

EDIT: Whoops, it looks like you were talking about … If so, false alarm. Sorry for the confusion!
Hey @rlue,

Yup, that's right. I strip the FQDN (the full domain of the From address) in both places (IMAP driver and email parser), but the really important place to do this is here in the IMAP driver: …

This is for the case (for which I provided the test email above) where the message ID is present in the message but has leading whitespace. The Net::IMAP module doesn't strip whitespace on its own the way the Mail gem parser does (see the example message above), so I made sure to add a strip in the IMAP driver. The Mail gem does strip on its own, so it's not strictly necessary there.

This is the original commit where I added the fix: 8604bb2 (it predates the refactor, though). You are right that I left the … It may be worth seeing for yourself how the IMAP driver parses the message ID with whitespace (but you will have to load the test message above into a test IMAP server). Ultimately, I feel this is a bug in Net::IMAP, not Zammad, but the … Let me know if that helps!
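A sketch of why the strip matters for duplicate detection, assuming the duplicate check compares MD5 digests of the Message-ID value. The helper `dedup_key` and both sample values are hypothetical, standing in for what the IMAP side and the Mail-based parser might return for the same message.

```ruby
require 'digest/md5'

# Hypothetical helper: the key used to decide whether a message was already
# imported. Trimming before hashing makes the key independent of which
# parser produced the header value.
def dedup_key(message_id)
  Digest::MD5.hexdigest(message_id.to_s.strip)
end

# Illustrative values: the same Message-ID as two parsers might return it.
from_imap = " <abc123@mail.example.com>"  # leading whitespace left in place
from_mail = "<abc123@mail.example.com>"   # already trimmed

puts Digest::MD5.hexdigest(from_imap) == Digest::MD5.hexdigest(from_mail)  # false
puts dedup_key(from_imap) == dedup_key(from_mail)                          # true
```

Normalizing in a single shared helper, rather than relying on each parser's own trimming, keeps the two code paths from drifting apart again.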
Hey @rlue, I was auditing my server today and found yet another interesting case that the IMAP gem and the Mail gem handle differently…
The message ID is only the part inside the angle brackets. Comments like this are technically allowed by the spec, and the Mail gem parser properly returns only … Thinking about the design a little more, it seems the only way to do clean message-ID checks would be to use the same parser in both cases; in other words, use the Mail gem parser inside the IMAP module to extract the message ID. It's either this or using UIDVALIDITY + UID to uniquely identify messages, which would require a much larger refactor. I wanted to share this new finding as we work to integrate this and think about the design, so that it's properly robust. Let me know if you have any questions! Happy to help.
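For illustration only, a simplified extractor that keeps just the `<…>` part of a Message-ID value and ignores surrounding whitespace and comments. This is a hypothetical helper, not the Mail gem's parser (which the comment above suggests reusing); it does not implement the full RFC 5322 `msg-id` grammar and would mis-handle a comment that itself contains angle brackets.

```ruby
# Hypothetical, simplified: return the text inside the first <...> pair that
# contains an "@", without the brackets, or nil if there is none.
def extract_msg_id(header_value)
  match = header_value.match(/<([^<>\s]+@[^<>\s]+)>/)
  match && match[1]
end

p extract_msg_id(" <abc123@mail.example.com> (relayed)")  # "abc123@mail.example.com"
p extract_msg_id("(comment) <abc123@mail.example.com>")   # "abc123@mail.example.com"
p extract_msg_id("no id here")                             # nil
```

Running both the IMAP driver and the email parser through one such function (or, better, through the same real parser) is what makes their message-ID checks agree.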
Closed by accident; reopening.
Hey @dsukhin, any chance of adding some tests for this? Let me know if we can help!
Hi @thorsteneckel, sorry for the late reply; I missed this message. The only fix needed for that is … To test messages without a message ID, simply remove the Message-ID line, try to import the message, and make sure it generates the `<gen-#hash#@fqdn|zammad-generated>` ID in both the email parser and the IMAP driver.
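The manual check above could be turned into an automated test along these lines. This is a sketch written with Minitest for illustration, not the test the maintainers added: `generated_message_id` is a hypothetical stand-in for the ID generation in both code paths, and the sample message is made up.

```ruby
require 'digest/md5'
require 'minitest/autorun'

# Hypothetical stand-in for the ID generation used by the driver and parser.
def generated_message_id(raw_rfc822, fqdn)
  '<gen-' + Digest::MD5.hexdigest(raw_rfc822) + '@' + fqdn.strip + '>'
end

class MissingMessageIdTest < Minitest::Test
  RAW_WITH_ID = "Message-ID: <abc123@mail.example.com>\r\n" \
                "From: alice@example.com\r\n" \
                "Subject: test\r\n" \
                "\r\n" \
                "Hello\r\n"

  # Simulate the manual step: delete the Message-ID line (case-insensitive).
  RAW_WITHOUT_ID = RAW_WITH_ID.lines.reject { |l| l.match?(/\Amessage-id:/i) }.join

  def test_header_was_removed
    refute_match(/message-id/i, RAW_WITHOUT_ID)
  end

  def test_generated_id_has_expected_shape
    id = generated_message_id(RAW_WITHOUT_ID, 'example.com')
    assert_match(/\A<gen-\h{32}@example\.com>\z/, id)
  end

  def test_generated_id_is_stable_across_imports
    # Importing the same raw message twice must yield the same ID,
    # otherwise the duplicate check lets it through.
    assert_equal generated_message_id(RAW_WITHOUT_ID, 'example.com'),
                 generated_message_id(RAW_WITHOUT_ID, ' example.com ')
  end
end
```

In a real contribution the two assertions would call the actual IMAP driver and email parser instead of a shared helper, so that a divergence between them is caught.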
Hi @dsukhin, thanks for your feedback. I was thinking about tests in terms of code coverage / test files as part of this contribution. Do you mind adding them? Otherwise, please let us know; it just might take longer.
Hi @dsukhin, this part of the code has changed quite a bit since this PR was opened. Part of the issue (a Message-ID on a new line) was already fixed. However, parsing emails without any Message-ID was still broken. This is now fixed with the commit above. I used your core idea of fingerprinting the body, but refactored it a bit, so the commit shows up under my name; I added credit in the commit message. Fingers crossed it works for you.
When messages come in with an empty or missing Message-Id header, it's trivial to generate a unique one from the hash of the RFC822 message body. This is important to prevent duplicate imports via the IMAP driver if a message is ever looked at twice.
Happy to discuss and answer questions.