Improve detection of reply quotations #3
Why is parsing email bodies hard?
(Biased source: SigParser, a paid service for email parsing.)

Existing libraries

I have found https://github.com/mailgun/talon (in Python), which is interesting for its quotation detection for text and HTML, and for its basic text signature detection (forget about the signature detection with machine learning). It also has a lot of real-world fixtures, which is invaluable.

There is a JS port of it, https://github.com/quentez/talonjs/, made by people from Front, who I believe are great engineers. The repo is not documented, but it is recent and maintained. There is also another port, https://github.com/lever/planer, which is older and seems less complete.

Both planer and talonjs require a DOM implementation to work (xmldom or jsdom, for example). talonjs also uses cheerio to clean up the input document a bit.
For information, below is the algorithm used by Talon for HTML messages:
Things we could take from Mailspring:
Things we could take from TalonJS:
We should improve the existing logic for detecting replied messages. We can use blockquotes as indicators, or common attribution strings like:
"On Friday, 27 November 2015, Your Tempo <contact@yourtempo.co> wrote"
Here are some useful regexes for such messages in several languages:
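A minimal sketch of the string-based approach for plain-text bodies, in TypeScript. It cuts the body at the first line that looks like an attribution line ("On ... wrote") or starts with a `>` quote marker. The patterns (English, French, German, Spanish) are illustrative assumptions, not the regexes referenced above, and `stripReplyQuotation` / `SplitResult` are hypothetical names. It also tests each line joined with the next one, on the assumption that long attribution lines often get wrapped onto two lines.

```typescript
// Hypothetical attribution-line patterns (assumptions, not an exhaustive list).
// Real mail clients vary a lot, so treat these as starting points.
const REPLY_HEADER_PATTERNS: RegExp[] = [
  /^On\s.+\swrote:?\s*$/i,          // English: "On <date>, <name> wrote:"
  /^Le\s.+\sa\s+écrit\s*:?\s*$/i,   // French: "Le <date>, <name> a écrit :"
  /^Am\s.+\sschrieb\s.+:\s*$/i,     // German: "Am <date> schrieb <name>:"
  /^El\s.+\sescribió:?\s*$/i,       // Spanish: "El <date>, <name> escribió:"
];

/** Result of splitting a plain-text email body (hypothetical shape). */
interface SplitResult {
  reply: string;     // text written by the sender, trailing whitespace removed
  quotation: string; // detected quoted history, "" if none was found
}

function isReplyHeader(text: string): boolean {
  const candidate = text.trim();
  return REPLY_HEADER_PATTERNS.some((re) => re.test(candidate));
}

// Hypothetical helper: cut at the first attribution line or `>` quote line.
function stripReplyQuotation(body: string): SplitResult {
  const lines = body.split(/\r?\n/);
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i] ?? "";
    const next = lines[i + 1] ?? "";
    const header = isReplyHeader(line);
    // A long attribution line may be wrapped, e.g. "...<a@b.c>" then "wrote:".
    const wrappedHeader =
      !header &&
      line.trim() !== "" &&
      next.trim() !== "" &&
      isReplyHeader(`${line} ${next}`);
    const quoteMarker = /^\s*>/.test(line);
    if (header || wrappedHeader || quoteMarker) {
      return {
        reply: lines.slice(0, i).join("\n").replace(/\s+$/, ""),
        quotation: lines.slice(i).join("\n"),
      };
    }
  }
  // No quotation detected: the whole body is the reply.
  return { reply: body.replace(/\s+$/, ""), quotation: "" };
}
```

Cutting at the first `>` line is deliberately crude: inline (interleaved) replies would be dropped along with the quotation, so that case would need separate handling.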