Get Linked Data

Description

A command-line application that crawls a given set of URLs, scrapes the JSON Linked Data (JSON-LD) embedded in each webpage, and writes the data entries out to a CSV file.
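To illustrate what "scraping the JSON-LD contained within a webpage" means, here is a minimal sketch that pulls JSON-LD objects out of an HTML string. It is not the tool's implementation: `extractJSONLD` and `ldJSONScript` are hypothetical names, the sample page is invented, and a regular expression stands in for a real HTML parser and element selector so that the example needs only the standard library.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// ldJSONScript matches <script type="application/ld+json"> elements and
// captures their body. Assumption: a regex is used here only to keep the
// sketch dependency-free; a real crawler would use an HTML parser and a
// CSS selector such as the one passed with -s.
var ldJSONScript = regexp.MustCompile(
	`(?is)<script[^>]*type=["']application/ld\+json["'][^>]*>(.*?)</script>`)

// extractJSONLD returns every JSON-LD object found in the page.
// Blocks that do not parse as a JSON object (including top-level arrays)
// are skipped in this simplified version.
func extractJSONLD(page string) []map[string]any {
	var out []map[string]any
	for _, m := range ldJSONScript.FindAllStringSubmatch(page, -1) {
		var obj map[string]any
		if err := json.Unmarshal([]byte(m[1]), &obj); err != nil {
			continue
		}
		out = append(out, obj)
	}
	return out
}

func main() {
	// Invented sample page for illustration.
	page := `<html><head>
<script id="product-schema" type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Example Widget"}
</script></head><body></body></html>`

	for _, obj := range extractJSONLD(page) {
		fmt.Println(obj["@type"], obj["name"])
	}
}
```

Each extracted object could then be flattened into one CSV row; how get-linked-data maps fields to columns is not described here.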

USAGE:
    get-linked-data -i URL_CSV -s ELEMENT_SELECTOR -o OUTPUT_CSV -e FAILED_URL_CSV

ARGS:
  -d string
    	Field Delimiter  (Required) (default ",")
  -e string
    	Failed Request URLs Output CSV File  (Required)
  -g	Scrape Google's Cached Version Instead
  -i string
    	CSV File containing URLs to Scrape  (Required)
  -j string
    	jq Selector
  -o string
    	Output Scraped Data CSV File  (Required)
  -p int
    	Parallelism or Maximum allowed Concurrent Requests (default 100)
  -s string
    	Element Selector  (Required)
  -v	Output Verbose Detail
  -w int
    	Random Wait Time in Milliseconds between Requests (default 2000)
  -x	Scrape XML not HTML

Example

get-linked-data -i "urls.csv" -s "script#product-schema" -o "results.csv" -e "failed-urls.csv"

License

get-linked-data is released under the Apache License 2.0 unless a file header explicitly states otherwise.