Node.js module/CLI app for scraping links from a webpage
$ npm install pagelinks
You can also install it globally to use the CLI version
$ npm install pagelinks -g
Pass an object with the uri, file or data for the page you want to scrape.
const pagelinks = require('pagelinks')
const options = {
  uri: 'http://www.google.com'
}
pagelinks(options, function (error, links) {
  if (error) {
    throw error
  }
  console.log(links)
})
or
const options = {
  file: 'mypath/myfile.html'
}
or
const options = {
  data: <pagedata>
}
The callback receives an array of link objects, for example:
[
  {
    text: 'Søk',
    href: 'https://www.google.no/webhp?tab=ww',
    id: 'gb_1',
    class: 'gbzt gbz0l gbp1'
  },
  {
    text: 'Bilder',
    href: 'http://www.google.no/imghp?hl=no&tab=wi',
    id: 'gb_2',
    class: 'gbzt'
  },
  {
    text: 'Maps',
    href: 'http://maps.google.no/maps?hl=no&tab=wl',
    id: 'gb_8',
    class: 'gbzt'
  },
  {
    text: 'Play',
    href: 'https://play.google.com/?hl=no&tab=w8',
    id: 'gb_78',
    class: 'gbzt'
  },
  { text: 'Alt om Google', href: '/intl/no/about.html' },
  {
    text: 'Google.no',
    href: 'http://www.google.com/setprefdomain?prefdom=NO&prev=http://www.google.no/&sig=K_Yael_-8yUXfGhE8aXDXMo07ePOo%3D'
  }
]
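Some hrefs in the result can be relative, such as '/intl/no/about.html' above. The following is a minimal sketch of one way to turn them into absolute URLs with Node's built-in URL class. The helper name resolveLinks and its baseUri parameter are illustrative assumptions and not part of pagelinks.

```javascript
// Hypothetical helper (not part of pagelinks): resolve each link's href
// against the URI of the page it was scraped from.
function resolveLinks (links, baseUri) {
  return links.map(link => {
    if (!link.href) return link // leave links without an href untouched
    try {
      // new URL(relative, base) keeps absolute URLs as they are
      return { ...link, href: new URL(link.href, baseUri).href }
    } catch (err) {
      return link // keep hrefs that cannot be parsed as URLs
    }
  })
}

// Sample data copied from the example output above
const sample = [
  { text: 'Alt om Google', href: '/intl/no/about.html' },
  { text: 'Maps', href: 'http://maps.google.no/maps?hl=no&tab=wl' }
]

console.log(resolveLinks(sample, 'http://www.google.com'))
```

The original array is not modified; each link is copied with its href replaced, so the raw result stays available if needed.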
By default, the module returns the text and the 'href', 'id', 'target' and 'class' attributes of each link. To collect other attributes, supply an array of attribute names in attrs.
const options = {
  uri: <uri>,
  attrs: ['href', 'data-title', 'data-description']
}
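Because the module takes a Node-style callback of the form (error, links), it can probably be wrapped with util.promisify for use with async/await. The sketch below uses a stand-in function, fakePagelinks, so that it runs without the package or network access; the stand-in and its sample result are assumptions for illustration only.

```javascript
const { promisify } = require('util')

// Stand-in with the same (options, callback) shape as the call shown earlier.
// It always succeeds with one made-up link; it is NOT the real module.
function fakePagelinks (options, callback) {
  process.nextTick(callback, null, [
    { text: 'Example', href: 'http://example.com/' }
  ])
}

// With the real package this would be: promisify(require('pagelinks'))
const scrape = promisify(fakePagelinks)

async function main () {
  const links = await scrape({
    uri: 'http://www.google.com',
    attrs: ['href', 'data-title', 'data-description']
  })
  console.log(links)
}

main().catch(console.error)
```

Errors passed to the callback become promise rejections, so they can be handled with try/catch inside an async function.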
To use it as a CLI app install it globally.
To display help
$ pagelinks --help
To display version
$ pagelinks --version
Usage:
$ pagelinks <uri>
or
$ pagelinks --file=<file>
or
$ pagelinks --data=<data>
By default, the CLI returns the text and the 'href', 'id', 'target' and 'class' attributes of each link. To collect other attributes, pass a comma-separated string of attribute names with --attrs.
$ pagelinks <uri> --attrs='href,data-title,data-description'
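The comma-separated --attrs value corresponds to the attrs array used by the module. How the CLI parses it internally is not documented here, so the following is only a plausible sketch of that mapping; parseAttrs is a hypothetical helper name.

```javascript
// Hypothetical helper: convert a comma-separated attribute string
// (as passed to --attrs) into the array form used by the module's options.
function parseAttrs (value) {
  return String(value)
    .split(',')
    .map(attr => attr.trim()) // tolerate spaces around commas
    .filter(attr => attr.length > 0) // drop empty entries such as a trailing comma
}

const options = {
  uri: 'http://www.google.com',
  attrs: parseAttrs('href,data-title,data-description')
}

console.log(options.attrs)
```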