PHP library for parsing plain text email content.
PHP
Clone or download
Latest commit 8308e02 Feb 7, 2018

README.md

EmailReplyParser

Build Status Total Downloads Latest Stable Version PHP7 ready

EmailReplyParser is a PHP library for parsing plain text email content, based on GitHub's email_reply_parser library written in Ruby.

Installation

The recommended way to install EmailReplyParser is through Composer:

composer require willdurand/email-reply-parser

Usage

Instantiate an EmailParser object and parse your email:

<?php

use EmailReplyParser\Parser\EmailParser;

$email = (new EmailParser())->parse($emailContent);

You get an Email object that contains a set of Fragment objects. The Email class exposes two methods:

  • getFragments(): returns all fragments;
  • getVisibleText(): returns a string which represents the content considered as "visible".

The Fragment represents a part of the full email content, and has the following API:

<?php

$fragment = current($email->getFragments());

$fragment->getContent();

$fragment->isSignature();

$fragment->isQuoted();

$fragment->isHidden();

$fragment->isEmpty();

Alternatively, you can rely on the EmailReplyParser to either parse an email or get its visible content in a single line of code:

$email = \EmailReplyParser\EmailReplyParser::read($emailContent);

$visibleText = \EmailReplyParser\EmailReplyParser::parseReply($emailContent);

Known Issues

Quoted Headers

Quoted headers aren't picked up if there's an extra line break:

On <date>, <author> wrote:

> blah

Also, they're not picked up if the email client breaks it up into multiple lines. GMail breaks up any lines over 80 characters for you.

On <date>, <author>
wrote:
> blah

The above On ....wrote: can be cleaned up with the following regex:

$fragment_without_date_author = preg_replace(
    '/\nOn(.*?)wrote:(.*?)$/si',
    '',
    $fragment->getContent()
);

Note though that we're search for "on" and "wrote". Therefore, it won't work with other languages.

Possible solution: Remove "reply@reply.github.com" lines...

Weird Signatures

Lines starting with - or _ sometimes mark the beginning of signatures:

Hello

--
Rick

Not everyone follows this convention:

Hello

Mr Rick Olson
Galactic President Superstar Mc Awesomeville
GitHub

**********************DISCLAIMER***********************************
* Note: blah blah blah                                            *
**********************DISCLAIMER***********************************

Strange Quoting

Apparently, prefixing lines with > isn't universal either:

Hello

--
Rick

________________________________________
From: Bob [reply@reply.github.com]
Sent: Monday, March 14, 2011 6:16 PM
To: Rick

Unit Tests

Setup the test suite using Composer:

$ composer install

Run it using PHPUnit:

$ phpunit

Contributing

See CONTRIBUTING file.

Credits

License

EmailReplyParser is released under the MIT License. See the bundled LICENSE file for details.