yvern/nemo

You just found Nemo!


What?

A simple Python module to query HTML pages (or XML in general) using (almost) all available CSS selectors and rules, which doesn't bore you with weird objects: just plain old lists and dicts.
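To make the "plain old lists and dicts" idea concrete, here is a rough standard-library sketch of what a single finding looks like: an element flattened into a dict of its attributes plus 'tag' and 'text' keys (the key names come from the example in the Usage section). This is not how Nemo works internally. It uses xml.etree.ElementTree, so it only accepts well-formed XML/XHTML, and leaving out 'text' for empty elements is an assumption based on that example.

```python
import xml.etree.ElementTree as ET


def element_to_dict(el: ET.Element) -> dict:
    """Flatten an element into a plain dict: its attributes plus 'tag' and, if present, 'text'."""
    finding = dict(el.attrib)  # copy attributes such as 'class' or 'href'
    finding["tag"] = el.tag
    text = "".join(el.itertext()).strip()  # all text nested inside the element
    if text:
        finding["text"] = text
    return finding


doc = ET.fromstring(
    "<body><span><a class='last-item'>hi</a></span>"
    "<a href='report.pdf'></a></body>"
)
findings = [element_to_dict(a) for a in doc.iter("a")]
print(findings)
# [{'class': 'last-item', 'tag': 'a', 'text': 'hi'}, {'href': 'report.pdf', 'tag': 'a'}]
```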

Why?

Why, when we already have BeautifulSoup? Because:

bs4 didn't support advanced selectors such as the negation pseudo-class in a:not(.not-this-a) (BeautifulSoup 4.7 and later do support it through the soupsieve package).
Nemo's performance is closer to lxml's range.
I wanted to make something useful in Nim.
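For reference, this is what the selector body span a:not(.first-item) from the Usage section asks for, written out by hand with the standard library: every a that has a span ancestor, which in turn has a body ancestor, and that does not carry the first-item class. It only illustrates the selector's meaning, assuming well-formed XML input; a CSS engine such as Nimquery does this matching for you, and the function name below is hypothetical.

```python
import xml.etree.ElementTree as ET


def has_class(el: ET.Element, name: str) -> bool:
    """True if `name` is one of the element's whitespace-separated classes."""
    return name in el.get("class", "").split()


def body_span_a_not_first_item(root: ET.Element) -> list:
    """Hand-written equivalent of the CSS selector 'body span a:not(.first-item)'."""
    parent = {child: p for p in root.iter() for child in p}  # child -> parent map
    matches = []
    for a in root.iter("a"):
        if has_class(a, "first-item"):
            continue  # the :not(.first-item) part
        # Collect ancestor tags, nearest first.
        ancestors = []
        node = a
        while node in parent:
            node = parent[node]
            ancestors.append(node.tag)
        # Descendant combinators: a 'span' somewhere above the 'a',
        # and a 'body' somewhere above that 'span'.
        if "span" in ancestors and "body" in ancestors[ancestors.index("span") + 1:]:
            matches.append(a)
    return matches


page = ET.fromstring(
    "<html><body>"
    "<span><a class='first-item'>1</a><a class='last-item'>2</a></span>"
    "<a>3</a>"
    "</body></html>"
)
print([a.text for a in body_span_a_not_first_item(page)])  # ['2']
```

Checking the nearest span is enough: if a body sits above any span ancestor, it also sits above the nearest one.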

How?

Nim is a very flexible and powerful language that I am delving deeper into.
Nimquery is a great Nim library that provides the CSS querying capabilities.
Nimpy is an awesome Nim library that builds a native Python extension module (think numpy or pandas) from a Nim module.

Getting it:

Build it on your OS:
- Make sure you have Nim and Nimble installed and working
- Clone this repo
- Run nimble bld to generate the shared library
- Run nimble tst to test it with the bundled Python script
- And you are good to go!
Build it in a Docker container (for use in Alpine or Ubuntu containers):
- Make sure you have make and Docker installed and working
- Clone this repo
- Run make build to get the Alpine version (for Ubuntu, set LINUX = ubuntu)
- Run make test to test it with the bundled Python script, in the same container used to build it
- There you have your nemo.so file to put into your desired container!
Prebuilt binaries (macOS, Alpine and Ubuntu only, for the lazy ones):
- macosx
- alpine
- ubuntu
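The Usage section assumes nemo.so is on Python's module search path. If you would rather keep it somewhere else (say, a folder baked into your container), a minimal sketch like the following loads it from an explicit path with importlib. The helper name load_extension is hypothetical, and it assumes the extension's init function is named after the module (PyInit_nemo), which is the usual convention for a module built from nemo.nim.

```python
import importlib.machinery
import importlib.util
import os
import sys
from types import ModuleType


def load_extension(path: str, name: str = "nemo") -> ModuleType:
    """Import a compiled extension module (such as nemo.so) from an explicit file path.

    `name` must match the name the extension was built with (its PyInit_<name>
    entry point), otherwise initialisation fails with an ImportError.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    # ExtensionFileLoader is the loader CPython uses for native .so/.pyd modules.
    loader = importlib.machinery.ExtensionFileLoader(name, path)
    spec = importlib.util.spec_from_file_location(name, path, loader=loader)
    module = importlib.util.module_from_spec(spec)  # loads the shared library
    sys.modules[name] = module  # register before executing, like a normal import
    try:
        loader.exec_module(module)
    except BaseException:
        del sys.modules[name]  # don't leave a half-initialised module behind
        raise
    return module
```

Usage would look like nemo = load_extension("/opt/nemo/nemo.so"), where the path is only an example. Appending the folder to sys.path and running import nemo works just as well; the explicit loader simply avoids touching sys.path.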

Usage:

```python
import nemo  # assuming nemo.so is on the module search path

some_html = '<html>...</html>'  # the HTML (or XML) document to query, as a string

queries = [
    'body span a:not(.first-item)',
    # all 'a's inside 'span's in 'body' that don't have the 'first-item' class
    '[href$=".pdf"]',
    # all elements whose 'href' ends in '.pdf' (links to PDFs)
    'p, span',
    # all 'p's and 'span's
]

results = dict(nemo.find(some_html, queries))
# a dict mapping each query string to a list of findings,
# where each finding is a dict of the element's attributes, with the
# tag name on key 'tag' and the content on key 'text', like:
# {
#     'body span a:not(.first-item)': [{'tag': 'a', 'text': 'hi', 'class': 'last-item'}],
#     '[href$=".pdf"]': [
#         {'tag': 'a', 'href': 'link-to-pdf'},
#         {'tag': 'a', 'href': 'link-to-other-pdf'},
#     ],
#     'p, span': [
#         # loads of elements, or maybe none, who knows
#     ],
# }
```
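Because findings are plain dicts, post-processing needs nothing beyond the standard library. A small sketch, run on sample data shaped like the example above (not real Nemo output); the helper names are hypothetical:

```python
from collections import Counter


def attribute_values(findings: list, attr: str) -> list:
    """Collect one attribute (e.g. 'href') from a list of findings, skipping elements without it."""
    return [f[attr] for f in findings if attr in f]


def tag_counts(results: dict) -> dict:
    """Count findings per tag name, for every query."""
    return {query: Counter(f["tag"] for f in findings) for query, findings in results.items()}


# Sample data shaped like the README example (not real nemo output).
results = {
    'body span a:not(.first-item)': [{'tag': 'a', 'text': 'hi', 'class': 'last-item'}],
    '[href$=".pdf"]': [
        {'tag': 'a', 'href': 'link-to-pdf'},
        {'tag': 'a', 'href': 'link-to-other-pdf'},
    ],
    'p, span': [],
}

print(attribute_values(results['[href$=".pdf"]'], 'href'))  # ['link-to-pdf', 'link-to-other-pdf']
print(tag_counts(results)['[href$=".pdf"]'])  # Counter({'a': 2})
```

attribute_values skips findings that lack the attribute because, as in the example, not every element carries every key.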
