Selector support for DOMText is missing #62

Open
heldchen opened this issue Feb 1, 2021 · 0 comments

What is this feature about (expected vs actual behaviour)?

In the original simple_html_dom, it is possible to use text in a CSS selector to get the DOMText element back (i.e. the equivalent of XPath's text()). In voku/simple_html_dom, this unfortunately fails.

How can I reproduce it?

$html = '<div> foo <br /> bar </div>';

$dom = (new voku\helper\HtmlDomParser())->loadHtml($html);
var_dump($dom->find('div text', 0)->plaintext);

$dom = (new simple_html_dom())->load($html);
var_dump($dom->find('div text', 0)->plaintext);

output:

string '' (length=0)
string ' foo ' (length=5)

Does it take minutes, hours or days to fix?

hours

Any additional information?

There already seems to be some support for selecting the text node, just not in combination with a CSS selector:

$html = '<div> foo <br /> bar </div>';

$dom = (new voku\helper\HtmlDomParser())->loadHtml($html);
var_dump($dom->find('div', 0)->find('text', 0)->plaintext);

$dom = (new simple_html_dom())->load($html);
var_dump($dom->find('div', 0)->find('text', 0)->plaintext);

output:

string 'foo' (length=3)
string ' foo ' (length=5)

I think CSS selector support could be added by checking whether the last token in the selector is text and, if so, stripping it from the selector and then applying the existing //text() XPath replacement to the resulting node set. Having text appear anywhere else in the selector does not make much sense. It does require a bit of logic, though, since a CSS selector can contain multiple targets (e.g. ->find('div text, span text', 0)).
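To illustrate the idea, here is a minimal sketch of the token check only. The function name splitTrailingTextToken() is hypothetical and not part of voku/simple_html_dom, and the comma split is naive: it assumes no commas inside attribute values or functional pseudo-classes.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of voku/simple_html_dom): splits a selector
 * list on commas and reports, per group, whether it ends in a "text" token.
 *
 * @return array<int, array{selector: string, wantsText: bool}>
 */
function splitTrailingTextToken(string $selector): array
{
    $groups = [];
    // Naive split: assumes no commas inside attribute values or :not(...) etc.
    foreach (explode(',', $selector) as $group) {
        $group = trim($group);
        if ($group === '') {
            continue;
        }
        // Matches "text" on its own, or as the last whitespace-separated token.
        // "div.text" or "div context" do not match.
        if (preg_match('/^(?:(.*\S)\s+)?text$/s', $group, $m) === 1) {
            $groups[] = ['selector' => $m[1] ?? '', 'wantsText' => true];
        } else {
            $groups[] = ['selector' => $group, 'wantsText' => false];
        }
    }

    return $groups;
}

print_r(splitTrailingTextToken('div text, span text'));
```

For every group with wantsText set, the remaining selector would be translated to XPath as usual and //text() appended to it; an empty remaining selector corresponds to a bare ->find('text'). A selector such as 'div > text' leaves a trailing combinator behind and would need extra handling.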

That said, the current ->find('text') implementation unfortunately behaves a bit oddly: it trims the whitespace (compare 'foo' with ' foo ' in the output above), which more often than not is an important part of the result when explicitly looking for text nodes.
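For comparison, this is roughly what an untrimmed lookup could look like with PHP's built-in DOMDocument and DOMXPath. It is only a sketch: rawTextNodes() is a hypothetical name, and this is not how voku/simple_html_dom is implemented internally.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns the values of all text nodes below the
 * elements matched by $elementXPath, exactly as the parser stored them.
 *
 * @return string[]
 */
function rawTextNodes(string $html, string $elementXPath): array
{
    $doc = new DOMDocument();
    // Suppress warnings about imperfect markup; libxml wraps the fragment
    // in <html><body> on its own.
    @$doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    $nodes = $xpath->query($elementXPath . '//text()');
    if ($nodes === false) {
        return []; // invalid XPath expression
    }

    $values = [];
    foreach ($nodes as $node) {
        $values[] = (string) $node->nodeValue; // deliberately no trim()
    }

    return $values;
}

var_dump(rawTextNodes('<div> foo <br /> bar </div>', '//div'));
```

Because nodeValue is returned without trim(), any surrounding whitespace the parser kept reaches the caller, who can still trim it if that is what they want.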

heldchen changed the title from "Selector support for TEXT is missing" to "Selector support for DOMText is missing" on Feb 1, 2021