Added option to preserve comments (#179)
* Added option to preserve comments.
* Updated docs on preserve_comments option.
straube authored and colinodell committed Nov 2, 2019
1 parent 63adb92 commit 1faad81
Showing 4 changed files with 67 additions and 6 deletions.
23 changes: 20 additions & 3 deletions README.md
@@ -28,7 +28,7 @@ Typically you would convert HTML to Markdown if:

1. You have an existing HTML document that needs to be edited by people with good taste.
2. You want to store new content in HTML format but edit it as Markdown.
3. You want to convert HTML email to plain text email.
4. You know a guy who's been converting HTML to Markdown for years, and now he can speak Elvish. You'd quite like to be able to speak Elvish.
5. You just really like Markdown.

@@ -95,6 +95,24 @@ $html = '<span>Turnips!</span><div>Monkeys!</div>';
$markdown = $converter->convert($html); // $markdown now contains ""
```

By default, all comments are stripped from the content. To preserve them, use the `preserve_comments` option, like this:

```php
$converter = new HtmlConverter(array('preserve_comments' => true));

$html = '<span>Turnips!</span><!-- Monkeys! -->';
$markdown = $converter->convert($html); // $markdown now contains "Turnips!<!-- Monkeys! -->"
```

To preserve only specific comments, set `preserve_comments` to an array of strings. A comment is kept when its text, trimmed of surrounding whitespace, exactly matches one of the entries:

```php
$converter = new HtmlConverter(array('preserve_comments' => array('Eggs!')));

$html = '<span>Turnips!</span><!-- Monkeys! --><!-- Eggs! -->';
$markdown = $converter->convert($html); // $markdown now contains "Turnips!<!-- Eggs! -->"
```
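
The selection rule is small enough to sketch on its own. The standalone functions below are a hypothetical illustration (`shouldPreserveComment` and `renderComment` are not part of the library's API); they mirror `CommentConverter::shouldPreserve()` and `convert()` from this commit, but take the option value and the raw comment text directly instead of a `Configuration` object and an `ElementInterface`:

```php
<?php

/**
 * Hypothetical standalone version of the preserve_comments rule.
 *
 * @param bool|string[] $preserve Value of the preserve_comments option
 * @param string        $comment  Raw text between "<!--" and "-->"
 *
 * @return bool
 */
function shouldPreserveComment($preserve, $comment)
{
    // true keeps every comment
    if ($preserve === true) {
        return true;
    }
    // An array keeps only comments whose trimmed text matches an entry
    if (is_array($preserve)) {
        return in_array(trim($comment), $preserve);
    }
    // false (the default) strips all comments
    return false;
}

/**
 * Hypothetical helper: a kept comment is re-emitted with its original,
 * untrimmed text; a dropped comment becomes an empty string.
 */
function renderComment($preserve, $comment)
{
    return shouldPreserveComment($preserve, $comment) ? '<!--' . $comment . '-->' : '';
}

echo renderComment(array('Eggs!'), ' Eggs! '), "\n"; // prints "<!-- Eggs! -->"
var_dump(shouldPreserveComment(array('Eggs!'), ' eggs! ')); // bool(false): matching is case-sensitive
```

Only the whitespace around the comment text is ignored when matching; the comparison itself is an exact string match, so `Eggs!` and `eggs!` are treated as different comments.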

### Style options

By default, bold tags are converted using the asterisk syntax and italic tags using the underscore syntax. Change these with the `bold_style` and `italic_style` options.
@@ -161,7 +179,7 @@ $markdown = $converter->convert($html); // $markdown now contains "### Header" a

Headers of H3 priority and lower always use atx style.

- Links and images are referenced inline. Footnote references (where image src and anchor href attributes are listed in the footnotes) are not used.
- Blockquotes aren't line wrapped – it makes the converted Markdown easier to edit.

### Dependencies
@@ -193,4 +211,3 @@ Use one of these great libraries:
- [Parsedown](https://github.com/erusev/parsedown)

No guarantees about the Elvish, though.

38 changes: 37 additions & 1 deletion src/Converter/CommentConverter.php
@@ -2,17 +2,35 @@

namespace League\HTMLToMarkdown\Converter;

use League\HTMLToMarkdown\Configuration;
use League\HTMLToMarkdown\ConfigurationAwareInterface;
use League\HTMLToMarkdown\ElementInterface;

class CommentConverter implements ConverterInterface, ConfigurationAwareInterface
{
    /**
     * @var Configuration
     */
    protected $config;

    /**
     * @param Configuration $config
     */
    public function setConfig(Configuration $config)
    {
        $this->config = $config;
    }

    /**
     * @param ElementInterface $element
     *
     * @return string
     */
    public function convert(ElementInterface $element)
    {
        if ($this->shouldPreserve($element)) {
            return '<!--' . $element->getValue() . '-->';
        }
        return '';
    }

@@ -23,4 +41,22 @@ public function getSupportedTags()
    {
        return array('#comment');
    }

    /**
     * @param ElementInterface $element
     *
     * @return bool
     */
    private function shouldPreserve(ElementInterface $element)
    {
        $preserve = $this->config->getOption('preserve_comments');
        if ($preserve === true) {
            return true;
        }
        if (is_array($preserve)) {
            $value = trim($element->getValue());
            return in_array($value, $preserve);
        }
        return false;
    }
}
5 changes: 3 additions & 2 deletions src/HtmlConverter.php
@@ -40,6 +40,7 @@ public function __construct($options = array())
            'remove_nodes' => '', // space-separated list of dom nodes that should be removed. example: 'meta style script'
            'hard_break' => false, // Set to true to turn <br> into `\n` instead of `  \n`
            'list_item_style' => '-', // Set the default character for each <li> in a <ul>. Can be '-', '*', or '+'
            'preserve_comments' => false, // Set to true to preserve comments, or set to an array of strings to preserve specific comments
        );

        $this->environment = Environment::createDefaultEnvironment($defaults);
@@ -229,13 +230,13 @@ protected function sanitize($markdown)

        return trim($markdown, "\n\r\0\x0B");
    }

    /**
     * Pass a series of key-value pairs in an array; these will be passed
     * through the config and set.
     * The advantage of this is that it can allow for static use (e.g. in Laravel).
     * An example being:
     *
     * HtmlConverter::setOptions(['strip_tags' => true])->convert('<h1>test</h1>');
     */
    public function setOptions(array $options)
7 changes: 7 additions & 0 deletions tests/HtmlConverterTest.php
@@ -237,6 +237,13 @@ public function test_strip_comments()
        $this->html_gives_markdown('<p>Test</p><!-- Test comment -->', 'Test', array('strip_tags' => true));
    }

    public function test_preserve_comments()
    {
        $this->html_gives_markdown('<p>Test</p><!-- Test comment -->', "Test\n\n<!-- Test comment -->", array('preserve_comments' => true));
        $this->html_gives_markdown('<p>Test</p><!-- more -->', "Test\n\n<!-- more -->", array('preserve_comments' => array('more')));
        $this->html_gives_markdown('<p>Test</p><!-- Test comment --><!-- more -->', "Test\n\n<!-- more -->", array('preserve_comments' => array('more')));
    }

    public function test_preserve_whitespace()
    {
        $this->html_gives_markdown('<a href="google.com">google.com</a> <code>test</code>', '[google.com](google.com) `test`');
