An HTML parser written in PHP. Based on nikic's PHP Parser.
HTML parser's goal is to simplify the traversal and modification of an HTML tree using the visitor pattern.
First, you'll want to parse your HTML using the Parser in order to generate a data structure appropriate for the NodeTraverser.
Once that is done, you specify one or more visitors that implement the operations you want to apply to the HTML elements.
Then, you traverse the HTML tree structure, which calls the visitors each time an element is entered or exited.
Finally, you may print the resulting tree back out as a string.
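<?php
// Assumes the Parser, NodeTraverser, Printer and ElementStripper classes
// are already imported or autoloaded.

// 1. Parse the HTML source into a tree the traverser can walk.
$code = file_get_contents('input.html');
$parser = new Parser();
$statements = $parser->parse($code);

// 2. Register the visitors, then traverse the tree.
$traverser = new NodeTraverser();
$traverser->addVisitor(new ElementStripper(['head', 'a'])); // A visitor which removes any element of a specific type
$statements = $traverser->traverse($statements);

// 3. Print the modified tree.
$printer = new Printer();
$printer->output($statements);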
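The example above uses an ElementStripper visitor without showing how such a visitor hooks into the traversal. The following is a minimal, self-contained sketch of the enter/leave visitor idea, not this library's actual API: SketchNode, SketchVisitor, SketchElementStripper and SketchTraverser are hypothetical names invented for illustration, and the real interfaces may differ in method names and return conventions.

```php
<?php
// Hypothetical sketch only: none of these classes come from the library.

// A minimal tree node: an element name plus its child nodes.
final class SketchNode
{
    public string $name;
    /** @var SketchNode[] */
    public array $children;

    public function __construct(string $name, array $children = [])
    {
        $this->name = $name;
        $this->children = $children;
    }
}

// A visitor is notified when the traverser enters and leaves an element.
interface SketchVisitor
{
    // Returning false drops the node and its whole subtree (an assumed convention).
    public function enter(SketchNode $node): bool;

    public function leave(SketchNode $node): void;
}

// Removes every element whose name is in the given list.
final class SketchElementStripper implements SketchVisitor
{
    /** @var string[] */
    private array $names;

    public function __construct(array $names)
    {
        $this->names = $names;
    }

    public function enter(SketchNode $node): bool
    {
        return !in_array($node->name, $this->names, true);
    }

    public function leave(SketchNode $node): void
    {
        // Nothing to do on exit for this visitor.
    }
}

// Walks the tree depth-first, calling every visitor on entry and on exit.
final class SketchTraverser
{
    /** @var SketchVisitor[] */
    private array $visitors = [];

    public function addVisitor(SketchVisitor $visitor): void
    {
        $this->visitors[] = $visitor;
    }

    /**
     * @param SketchNode[] $nodes
     * @return SketchNode[] the nodes that no visitor rejected
     */
    public function traverse(array $nodes): array
    {
        $kept = [];
        foreach ($nodes as $node) {
            if (!$this->enterAll($node)) {
                continue; // a visitor rejected this node: skip its subtree
            }
            $node->children = $this->traverse($node->children);
            foreach ($this->visitors as $visitor) {
                $visitor->leave($node);
            }
            $kept[] = $node;
        }
        return $kept;
    }

    // Calls enter() on each visitor in turn; stops at the first rejection.
    private function enterAll(SketchNode $node): bool
    {
        foreach ($this->visitors as $visitor) {
            if (!$visitor->enter($node)) {
                return false;
            }
        }
        return true;
    }
}
```

In this sketch, registering new SketchElementStripper(['head', 'a']) on a SketchTraverser and traversing a tree returns the same tree with every head and a element, and everything nested inside them, removed. Signalling removal through the return value of enter() is just one possible design; a real traverser may use dedicated constants or replacement nodes instead.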
The code is licensed under the MIT license. See LICENSE.