Using xpath or css selectors with htmerl #2
Comments
Hey! Thanks! 🤗
This is an idea I have played with for a while, but I never really found a simple way to do it. 😄 I've built some DSLs for just this (not open source), but eventually did it all in Rust anyway. 😉 I imagine there's an elegant way of turning path-like strings into simple functions based on their level and placement in the document, but I haven't taken the time to actually build it! Might be a fun project for someone out there with more time on their hands than I have right now. 😺
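Maybe a minimal example of a simple path extraction could be:

    -module(htmerl_example).
    -export([run/0]).

    run() ->
        Html =
            <<"<html><body><p>Check</p>nothing here<p>this <b>bold garbage</b></p>g"
              "arbage<p>out!</p></body></html>">>,
        XPath = <<"html/body/p">>,
        %% Reverse the path so it lines up with the stack of open elements,
        %% which is kept innermost-first.
        Path = lists:reverse(binary:split(XPath, <<"/">>, [global])),
        %% The user state is {CurrentPath, TargetPath, CollectedText}.
        Opts = [{event_fun, fun xpath/3}, {user_state, {[], Path, []}}],
        {ok, TextList, []} = htmerl:sax(Html, Opts),
        TextList.

    %% Keep text only while the current path equals the target path.
    xpath({characters, Text}, _LineNum, {Path, Path, Acc}) ->
        {Path, Path, [Text | Acc]};
    %% Leaving an element: pop it from the current path.
    xpath({endElement, _Ns, Ln, _}, _LineNum, {[Ln | Path], XPath, Acc}) ->
        {Path, XPath, Acc};
    %% Entering an element: push its local name onto the current path.
    xpath({startElement, _Ns, Ln, _, _Atts}, _LineNum, {Path, XPath, Acc}) ->
        {[Ln | Path], XPath, Acc};
    %% End of input: return the collected text in document order.
    xpath(endDocument, _LineNum, {_Path, _XPath, Acc}) ->
        lists:reverse(Acc);
    %% Ignore every other event.
    xpath(_Event, _LineNum, State) ->
        State.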
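
For anyone who wants to pick up the "path-like strings into simple functions" idea from above, here is a rough sketch of what that could look like. It is only an illustration and not part of htmerl: the module name `path_match`, the function `compile/1`, and the `*` single-step wildcard are all assumptions made up for this example.

```erlang
-module(path_match).
-export([compile/1]).

%% Hypothetical helper, not part of htmerl.
%% compile(<<"html/body/p">>) returns a fun that takes the stack of open
%% element names (innermost first) and reports whether it matches the path.
compile(PathBin) ->
    %% Reverse the steps so they line up with an innermost-first stack.
    Steps = lists:reverse(binary:split(PathBin, <<"/">>, [global])),
    fun(Stack) -> match(Steps, Stack) end.

%% Both lists used up at the same time: the whole path matched.
match([], []) ->
    true;
%% Assumed extension: "*" matches any single element name.
match([<<"*">> | Steps], [_ | Stack]) ->
    match(Steps, Stack);
%% A literal step must equal the element name at the same depth.
match([Name | Steps], [Name | Stack]) ->
    match(Steps, Stack);
%% Different depth or different name.
match(_, _) ->
    false.
```

With something like this, the `characters` clause in the example above could call the compiled fun on the current path instead of relying on the `{Path, Path, Acc}` pattern match, which would leave room for wildcards without touching the rest of the event handling.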
Hey!
Maybe that example could be stuck in an example folder or the README or such for future users?
I added the example in the README with #3 😸
Thank you 💜
Hello!
Thanks for another fab parser!
Is there a way to use XPath or CSS selectors with htmerl? If not, do you have a recommended way to get certain elements from the list of SAX events?
Thanks,
Louis