Spidermon x Cerberus Documentation #6

Merged
merged 16 commits into from Aug 20, 2019
4 changes: 2 additions & 2 deletions docs/source/actions.rst
@@ -475,7 +475,7 @@ SPIDERMON_REPORT_S3_REGION_ENDPOINT
.. _actions-sentry-action:

Sentry action
============
=============

This action allows you to send custom messages to `Sentry`_ when your
monitor suites finish their execution. To use this action
@@ -533,7 +533,7 @@ It could be set to any level provided by `Sentry Log Level`_
.. _SPIDERMON_SENTRY_FAKE:

SPIDERMON_SENTRY_FAKE
--------------------
---------------------

Default: ``False``

5 changes: 3 additions & 2 deletions docs/source/getting-started.rst
@@ -229,8 +229,8 @@ Item validation

Item validators allow you to match your returned items against a predetermined structure,
ensuring that all fields contain data in the expected format. Spidermon allows
you to choose between schematics_ or `JSON Schema`_ to define the structure
of your item.
you to choose from schematics_, `JSON Schema`_ or `cerberus`_ to define the structure
and validation rules for your item.

In this tutorial, we will use a schematics_ model to make sure that all required
fields are populated and they are all of the correct format.
@@ -385,6 +385,7 @@ The resulted item will look like this:

.. _`JSON Schema`: https://json-schema.org/
.. _`schematics`: https://schematics.readthedocs.io/en/latest/
.. _`cerberus`: https://docs.python-cerberus.org/en/latest/index.html
.. _`Scrapy`: https://scrapy.org/
.. _`Scrapy items`: https://docs.scrapy.org/en/latest/topics/items.html
.. _`Scrapy Tutorial`: https://doc.scrapy.org/en/latest/intro/tutorial.html
63 changes: 62 additions & 1 deletion docs/source/item-validation.rst
@@ -18,7 +18,7 @@ the first step is to enable the built-in item pipeline in your project settings:
}

After that, you need to choose which validation library will be used. Spidermon
accepts schemas defined using schematics_ or `JSON Schema`_.
accepts schemas defined using schematics_, `JSON Schema`_ or cerberus_.

With schematics
---------------
@@ -87,6 +87,34 @@ an example of a schema for the quotes item from the :doc:`tutorial </getting-sta
]
}

With Cerberus
-------------

`Cerberus`_ is a powerful yet simple and lightweight data validation
tool. It is designed to be extensible, allows custom validation, and has no
dependencies. You can define what each field contains, which fields are required,
the type of each field, as well as dependencies and regular expressions.

.. warning::

    You need to install `cerberus`_ to use this feature.

The `usage`_ and `validation-rules`_ guides explain the main keywords and how to write a
schema. Here is an example of a schema for the quotes item from the
:doc:`tutorial </getting-started>`.

.. code-block:: json

    {
        "quote": {"type": "string", "required": true},
        "author": {"type": "string", "required": true},
        "author_url": {"type": "string"},
        "tags": {"type": "list"}
    }
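
The schema above is JSON, so booleans are written as ``true``, while a schema
defined inline in Python uses ``True``. As a minimal sketch (not part of Spidermon
itself), the schema can be kept as a Python `dict` and serialized with the standard
``json`` module to produce a schema file. The variable names and the temporary
output path are assumptions made for illustration only.

.. code-block:: python

    import json
    import tempfile
    from pathlib import Path

    # Schema for the quotes item, written as a Python dict (note ``True``).
    quote_schema = {
        "quote": {"type": "string", "required": True},
        "author": {"type": "string", "required": True},
        "author_url": {"type": "string"},
        "tags": {"type": "list"},
    }

    # Hypothetical location; a real project would use a path in its repository.
    schema_path = Path(tempfile.mkdtemp()) / "quote_schema.json"

    # json.dump converts Python's True into JSON's true.
    with schema_path.open("w", encoding="utf-8") as f:
        json.dump(quote_schema, f, indent=4)

    # Read the file back to confirm the round trip preserves the schema.
    with schema_path.open(encoding="utf-8") as f:
        loaded_schema = json.load(f)

The resulting file holds the same rules as the JSON example, so its path is the
kind of value a schema file entry would point to.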

To use Cerberus validation, add the
:ref:`SPIDERMON_VALIDATION_CERBERUS` setting to your `settings.py`.

Settings
--------

@@ -193,6 +221,36 @@ as a `dict`:
OtherItem: '/path/to/otheritem_schema.json',
}

.. _SPIDERMON_VALIDATION_CERBERUS:

SPIDERMON_VALIDATION_CERBERUS
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Default: ``None``

A `list` of item schemas. As the example below shows, each entry can be a local
path to a schema file, a URL, or a `dict` that defines the schema inline.

.. code-block:: python

    # settings.py

    SPIDERMON_VALIDATION_CERBERUS = [
        '/path/to/schema.json',
        'http://example.com/mycerberusschema',
        {"Field": {"type": "number", "required": True}},
    ]

If your spider produces multiple item types, you can define the schema path for
each item in a `dict`, as shown below:

.. code-block:: python

    # settings.py

    from quotes.items import DummyItem, OtherItem

    SPIDERMON_VALIDATION_CERBERUS = {
        DummyItem: '/path/to/dummyitem_schema.json',
        OtherItem: '/path/to/otheritem_schema.json',
    }

Validation in Monitors
----------------------

@@ -238,3 +296,6 @@ Some examples:
.. _`guide`: http://json-schema.org/learn/getting-started-step-by-step.html
.. _`schematics models`: https://schematics.readthedocs.io/en/latest/usage/models.html
.. _`jsonschema`: https://pypi.org/project/jsonschema/
.. _`cerberus`: https://pypi.org/project/Cerberus/
.. _`usage`: http://docs.python-cerberus.org/en/latest/usage.html
.. _`validation-rules`: http://docs.python-cerberus.org/en/latest/validation-rules.html