Merge pull request #7 from vladcalin/rework-config
Update docs
Showing 13 changed files with 410 additions and 33 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,8 +1,9 @@ | ||
from .config import Configuration | ||
from .core import Crawlster, Job | ||
from .config import Configuration, JsonConfiguration | ||
from .core import Crawlster, Job, start | ||
|
||
__all__ = [ | ||
'Crawlster', | ||
'Job', | ||
'Configuration' | ||
'Configuration', | ||
'JsonConfiguration', | ||
'start' | ||
] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,83 @@ | ||
Extending the crawler with helpers | ||
================================== | ||
================================== | ||
|
||
The ``crawlster`` library makes it very easy to extend the functionality | ||
of the crawler through helpers. A helper is simply a utility class that is | ||
attached to the crawler instance. | ||
|
||
Core helpers: | ||
|
||
- :py:class:`crawlster.helpers.RequestsHelper` available as ``http``. | ||
- :py:class:`crawlster.helpers.UrlsHelper` available as ``urls``. | ||
- :py:class:`crawlster.helpers.ExtractHelper` available as ``extract``. | ||
- :py:class:`crawlster.helpers.StatsHelper` available as ``stats``. | ||
- :py:class:`crawlster.helpers.LoggingHelper` available as ``log``. | ||
- :py:class:`crawlster.helpers.QueueHelper` available as ``queue``. | ||
- :py:class:`crawlster.helpers.RegexHelper` available as ``regex``. | ||
|
||
|
||
Create your own helper | ||
---------------------- | ||
|
||
To create your own helper and give your crawler superpowers, | ||
subclass the :py:class:`crawlster.helpers.BaseHelper` base class. | ||
|
||
Then you can start implementing the functionality you need. | ||
|
||
|
||
Methods | ||
------- | ||
|
||
No method is required to be overridden, but some methods can be | ||
overridden to act as hooks. So far, the only two | ||
available hooks are: | ||
|
||
- :py:meth:`crawlster.helpers.BaseHelper.initialize` that performs actions | ||
on crawler start. | ||
- :py:meth:`crawlster.helpers.BaseHelper.finalize` that performs actions | ||
on crawler stop (when there are no more items to process). | ||
|
||
|
||
Configuration | ||
------------- | ||
|
||
Helpers can take advantage of the configuration system the library provides by | ||
defining the ``config_options`` attribute, a mapping of option names to | ||
option values. | ||
|
||
|
||
Attributes | ||
---------- | ||
|
||
The two attributes that are available inside the helper are | ||
``config`` and ``crawler``. | ||
|
||
The ``config`` attribute holds the ``Configuration`` instance used to | ||
initialize the crawler. You can get values from the configuration using | ||
the ``self.config.get(option_name)`` method. | ||
|
||
The ``crawler`` attribute holds the current crawler instance, through which | ||
the helper can access other helpers. Although it is recommended to keep | ||
the helper as independent as possible, sometimes you need | ||
functionality already provided by an existing helper (stats | ||
aggregation, logging, etc.). | ||
|
||
Attaching the helper to the crawler | ||
----------------------------------- | ||
|
||
In the crawler definition, provide the helper instance as a class attribute: | ||
|
||
|
||
:: | ||
|
||
    class MyCrawler(Crawlster): | ||
|
||
        my_helper = MyHelperClass() | ||
|
||
        # ... | ||
|
||
        def some_step(self, url): | ||
            # ... | ||
            self.my_helper.do_amazing_things() | ||
            # ... | ||
|
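The helper pattern documented in this file (hooks, ``config_options``, and the ``config`` attribute) can be illustrated with a minimal, self-contained sketch. It does not import ``crawlster``: ``BaseHelper`` below is a hypothetical stand-in, the ``SeenUrlsHelper`` class and its ``seen_urls.max_size`` option are invented for illustration, and a plain ``dict`` plays the role of the ``Configuration`` instance because it also offers ``.get(option_name)``. How the real crawler wires these attributes is an assumption here.

```python
class BaseHelper:
    """Hypothetical stand-in for crawlster.helpers.BaseHelper (assumption)."""

    config_options = {}

    def __init__(self):
        # In the real library the crawler is expected to set these;
        # here they are assigned by hand further down.
        self.crawler = None
        self.config = None

    def initialize(self):
        """Hook: runs on crawler start."""

    def finalize(self):
        """Hook: runs on crawler stop."""


class SeenUrlsHelper(BaseHelper):
    """Example helper that remembers which URLs were already visited."""

    # Option name -> option value. The option name is made up.
    config_options = {"seen_urls.max_size": 1000}

    def initialize(self):
        self.seen = set()

    def finalize(self):
        self.seen.clear()

    def mark(self, url):
        """Return True if url is new, False if it was seen before."""
        if url in self.seen:
            return False
        # Stop recording once the configured limit is reached.
        if len(self.seen) < self.config.get("seen_urls.max_size"):
            self.seen.add(url)
        return True


# Manual wiring that a crawler would presumably do for us.
helper = SeenUrlsHelper()
helper.config = dict(SeenUrlsHelper.config_options)
helper.initialize()
first = helper.mark("http://example.com")
second = helper.mark("http://example.com")
helper.finalize()
```

Keeping all state inside the helper and reading limits through ``self.config.get`` follows the recommendation above to keep helpers as independent as possible.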
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,7 @@ | ||
How to | ||
====== | ||
|
||
Here you will find some more in-depth guides on various topics. | ||
|
||
.. toctree:: | ||
:maxdepth: 2 | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,5 @@ | ||
Parsing requests and extracting data | ||
==================================== | ||
|
||
We can parse the response data (or basically any string or bytes sequence) using | ||
the core ``.extract`` helper (:py:class:`crawlster.helpers.ExtractHelper`). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,5 @@ | ||
Making HTTP requests | ||
==================== | ||
|
||
Http requests are made through the ``.http`` helper which is a : | ||
:py:class:`crawlster.helpers.RequestsHelper` instance. | ||
HTTP requests are made through the ``.http`` helper, which is | ||
a :py:class:`crawlster.helpers.RequestsHelper` instance. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,13 @@ | ||
Submitting results | ||
================== | ||
|
||
Submitting results is done via the :py:meth:`crawlster.Crawlster.submit_item` | ||
method. The single argument must be a :py:class:`dict` that represents the item. | ||
|
||
After being submitted, the item will be passed through all the defined item | ||
handlers. | ||
|
||
.. seealso:: | ||
|
||
The module reference for :py:mod:`crawlster.handlers` for more details and | ||
all the available item handler classes. |
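The flow described in this file, where a submitted ``dict`` is passed through every defined item handler, can be sketched as follows. This is not the ``crawlster`` implementation: ``SketchCrawler``, ``CollectHandler``, the ``handle`` method name, and the ``TypeError`` raised for non-dict items are all assumptions made for illustration.

```python
class CollectHandler:
    """Hypothetical item handler that stores every item it receives."""

    def __init__(self):
        self.items = []

    def handle(self, item):
        # The method name `handle` is an assumption, not the library's API.
        self.items.append(item)


class SketchCrawler:
    """Hypothetical stand-in for a crawler with item handlers."""

    def __init__(self, item_handlers):
        self.item_handlers = list(item_handlers)

    def submit_item(self, item):
        # The docs say the single argument must be a dict; rejecting
        # anything else with TypeError is a choice made for this sketch.
        if not isinstance(item, dict):
            raise TypeError("item must be a dict")
        # Pass the item through all the defined handlers, in order.
        for handler in self.item_handlers:
            handler.handle(item)


first_handler = CollectHandler()
second_handler = CollectHandler()
crawler = SketchCrawler([first_handler, second_handler])
crawler.submit_item({"url": "http://example.com", "title": "Example"})

try:
    crawler.submit_item("not a dict")
    rejected = False
except TypeError:
    rejected = True
```

Each handler sees the same item once, so independent sinks (for example a file writer and a stats counter) can be combined without knowing about each other.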
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -7,3 +7,4 @@ The crawlster module | |
.. autoclass:: crawlster.Crawlster | ||
:members: | ||
|
||
.. autofunction:: crawlster.start |