This app identifies PDFs and other documents scattered across the web. The motivating use case is to find prospective sales leads who would benefit from better document management services.
This app uses one of Zillabyte's pre-crawled copies of the web. The dataset, web_deep, is a cached copy of the top 500 pages of the top 1 million domains on the web (500,000,000 pages total). The dataset updates with a fresh crawl every week. Although the web_deep dataset is a good proxy for the "important stuff on the web", it is by no means exhaustive. If you require more coverage, you'll need to use Zillabyte's live-crawling feature. The drawback is that live crawling may be more expensive and slower to analyze. Contact us at support@zillabyte.com for more information.
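To make the per-page work concrete, here is a minimal sketch of how document links might be picked out of one crawled page. It is not Zillabyte's API or this app's actual code: it uses only the Python standard library, and the names find_document_links, DocumentLinkParser and DOCUMENT_EXTENSIONS, along with the choice of extensions, are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumed set of file extensions that count as "documents"; adjust as needed.
DOCUMENT_EXTENSIONS = (".pdf", ".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx")


class DocumentLinkParser(HTMLParser):
    """Collects absolute URLs of <a href> links that point at document files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.document_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the URL of the page they came from.
        absolute = urljoin(self.base_url, href)
        # Compare only the path, so query strings and letter case do not matter.
        path = urlparse(absolute).path.lower()
        if path.endswith(DOCUMENT_EXTENSIONS):
            self.document_links.append(absolute)


def find_document_links(page_url, html):
    """Return the document links found in one page's HTML, in page order."""
    parser = DocumentLinkParser(page_url)
    parser.feed(html)
    return parser.document_links
```

A domain whose pages yield many such links is a plausible lead. Matching on file extension is a simplification: documents served from URLs without an extension would be missed unless the response's content type is also checked.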
To start running this app, tweak the code and run zillabyte push. For more information, check out docs.zillabyte.com.