wintermi/get-linked-data


Get Linked Data

Description

A command line application designed to crawl a given set of URLs and scrape the JSON Linked Data (JSON-LD) contained within the webpage before writing the data entries out to a CSV file.
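
The scrape-then-write flow described above can be sketched in a few lines of Go. This is a minimal illustration and not the tool's actual implementation: the function names `extractJSONLD` and `sortedKeys`, the sample page, and the use of a regular expression in place of a real HTML parser are all assumptions made to keep the example dependency-free. It only handles JSON-LD blocks whose top-level value is an object.

```go
package main

import (
	"encoding/csv"
	"encoding/json"
	"fmt"
	"os"
	"regexp"
	"sort"
)

// jsonLDScript matches the body of <script type="application/ld+json"> elements.
// Assumption: a regular expression keeps this sketch stdlib-only; a real
// scraper would normally use an HTML parser and a CSS selector instead.
var jsonLDScript = regexp.MustCompile(
	`(?is)<script[^>]*type=["']application/ld\+json["'][^>]*>(.*?)</script>`)

// extractJSONLD (hypothetical helper) decodes every JSON-LD object in the page.
func extractJSONLD(page string) ([]map[string]any, error) {
	var entries []map[string]any
	for _, m := range jsonLDScript.FindAllStringSubmatch(page, -1) {
		var entry map[string]any
		if err := json.Unmarshal([]byte(m[1]), &entry); err != nil {
			return nil, fmt.Errorf("invalid JSON-LD: %w", err)
		}
		entries = append(entries, entry)
	}
	return entries, nil
}

// sortedKeys (hypothetical helper) returns the entry's keys in a stable order
// so that the header row and the value row line up.
func sortedKeys(entry map[string]any) []string {
	keys := make([]string, 0, len(entry))
	for k := range entry {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	// Made-up sample page standing in for a fetched URL.
	page := `<html><head><script type="application/ld+json">
	{"@type": "Product", "name": "Widget", "sku": "W-1"}
	</script></head></html>`

	entries, err := extractJSONLD(page)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	w := csv.NewWriter(os.Stdout)
	defer w.Flush()
	for _, entry := range entries {
		keys := sortedKeys(entry)
		row := make([]string, len(keys))
		for i, k := range keys {
			row[i] = fmt.Sprint(entry[k])
		}
		w.Write(keys) // header row: the JSON-LD property names
		w.Write(row)  // data row: the matching values
	}
}
```

Nested JSON-LD values would need flattening before they fit a CSV cell; the sketch simply prints them with `fmt.Sprint`.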

USAGE:
    get-linked-data -i URL_CSV -s ELEMENT_SELECTOR -o OUTPUT_CSV -e FAILED_URL_CSV

ARGS:
  -d string
    	Field Delimiter  (Required) (default ",")
  -e string
    	Failed Request URLs Output CSV File  (Required)
  -g	Scrape Google's Cached Version Instead
  -i string
    	CSV File containing URLs to Scrape  (Required)
  -j string
    	jq Selector
  -o string
    	Output Scraped Data CSV File  (Required)
  -p int
    	Parallelism or Maximum allowed Concurrent Requests (default 100)
  -s string
    	Element Selector  (Required)
  -v	Output Verbose Detail
  -w int
    	Random Wait Time in Milliseconds between Requests (default 2000)
  -x	Scrape XML not HTML

Example

get-linked-data -i "urls.csv" -s "script#product-schema" -o "results.csv" -e "failed.csv"

License

get-linked-data is released under the Apache License 2.0 unless a file header explicitly states otherwise.
