Timothy Duffy edited this page Jan 19, 2015 · 8 revisions

Welcome to the BarkingOwl wiki!

###About###

BarkingOwl is a set of tools, packaged as a library, that focuses on finding documents of different types on websites (such as PDF, DOC, XLS, TXT, and HTML). The library is made up of two primary parts: the Scraper and the Dispatcher.

BarkingOwl uses libmagic to determine the type of each file it finds.

Document Types
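To illustrate what typing a file by its content means, here is a minimal sketch of magic-number sniffing using only the Python standard library. It is not libmagic and not BarkingOwl's code: the `sniff_type` function, the small signature table, and the `application/x-ole-storage` label are assumptions made up for this example, and libmagic's real database is far larger.

```python
# Illustrative only: a tiny hand-written signature table, not libmagic's database.
MAGIC_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    # OLE2 compound file header, the container used by legacy DOC and XLS files.
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "application/x-ole-storage"),
    # ZIP local file header, also the container for DOCX and XLSX.
    (b"PK\x03\x04", "application/zip"),
]


def sniff_type(data: bytes) -> str:
    """Guess a MIME type from the leading bytes of a document (hypothetical helper)."""
    if not data:
        return "application/octet-stream"
    for signature, mime in MAGIC_SIGNATURES:
        if data.startswith(signature):
            return mime
    head = data[:512].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "text/html"
    # Crude heuristic: no NUL bytes in the first 512 bytes suggests plain text.
    if b"\x00" not in data[:512]:
        return "text/plain"
    return "application/octet-stream"
```

The point of checking content rather than the URL suffix is that a link ending in `.pdf` may serve an HTML error page, while a link with no extension may serve a real PDF.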

###Implementation Details###

The BarkingOwl Scraper is the core of the system and does most of the hard work. An extension to the Scraper, called the ScraperWrapper, allows the Scraper to broadcast messages to an AMQP bus. The Scraper can be used as a stand-alone tool, or it can be used via the ScraperWrapper in a message bus topology.

Scraper
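The relationship between a stand-alone scraper and a wrapper that forwards its results to a bus can be sketched as follows. This is a conceptual sketch only, not BarkingOwl's API: the class bodies, the `on_document` and `publish` callbacks, the `pages` argument, and the message fields are all hypothetical, and the bus is represented by any callable that accepts a string.

```python
import json
from typing import Callable, Dict, List


class Scraper:
    """Hypothetical stand-alone scraper: reports each document link it sees."""

    def __init__(self, on_document: Callable[[Dict[str, str]], None]) -> None:
        self.on_document = on_document

    def scrape(self, pages: Dict[str, List[str]], doc_suffix: str = ".pdf") -> None:
        # `pages` maps a page URL to the links found on it. A real scraper
        # would fetch and parse the pages instead of receiving them.
        for page_url, links in pages.items():
            for link in links:
                if link.lower().endswith(doc_suffix):
                    self.on_document({"page": page_url, "doc_url": link})


class ScraperWrapper:
    """Hypothetical wrapper: serializes scraper events and hands them to a bus."""

    def __init__(self, publish: Callable[[str], None]) -> None:
        self.publish = publish  # assumed to send one message to the bus
        self.scraper = Scraper(on_document=self._broadcast)

    def _broadcast(self, doc: Dict[str, str]) -> None:
        # The message shape here is invented for illustration.
        self.publish(json.dumps({"type": "document_found", "message": doc}))

    def run(self, pages: Dict[str, List[str]]) -> None:
        self.scraper.scrape(pages)
```

Used stand-alone, the `Scraper` in this sketch calls back into local code; wrapped, the same events go to whatever `publish` points at, for example a function that sends to an AMQP exchange.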

The Dispatcher takes a list of URLs and dispatches them to available Scrapers waiting on the AMQP bus. The Dispatcher can run in a number of modes, including 'broadcast all once' and 'broadcast each at an interval'.

Dispatcher
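The two dispatch modes named above can be sketched roughly as below. This is one plausible reading of the mode names, not the Dispatcher's actual implementation: the function names, the `publish` callable (standing in for sending a URL to the bus), and the injectable `sleep` parameter are assumptions for this example.

```python
import time
from typing import Callable, Iterable


def broadcast_all_once(urls: Iterable[str], publish: Callable[[str], None]) -> None:
    """Send every URL immediately, one after another, then stop."""
    for url in urls:
        publish(url)


def broadcast_each_at_interval(
    urls: Iterable[str],
    publish: Callable[[str], None],
    interval_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Send URLs one at a time, pausing between consecutive sends."""
    url_list = list(urls)
    for index, url in enumerate(url_list):
        publish(url)
        # No pause is needed after the final URL.
        if index < len(url_list) - 1:
            sleep(interval_seconds)
```

Spacing the sends out keeps a small pool of scrapers from being handed every URL at once; passing a fake `sleep` makes the interval mode easy to test without waiting.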

The Bus Access portion of BarkingOwl lets the programmer interface with any part of the system using the same AMQP bus that the Dispatcher and ScraperWrapper use.

Bus Access
