Items

The :ref:`provided item classes <item-api>` can be used to map data extracted from web pages, e.g. using :ref:`page objects <page-objects>`.

Creating items from dictionaries

You can create an :ref:`item <items>` from any :class:`dict`-like object via the :meth:`~zyte_common_items.Item.from_dict` method.

For example, to create a :class:`~zyte_common_items.Product`:

>>> from zyte_common_items import Product
>>> data = {
...     'url': 'https://example.com/',
...     'mainImage': {
...         'url': 'https://example.com/image.png',
...     },
...     'gtin': [
...         {'type': 'gtin13', 'value': '9504000059446'},
...     ],
... }
>>> product = Product.from_dict(data)

:meth:`~zyte_common_items.Item.from_dict` applies the right classes to nested data, such as :class:`~zyte_common_items.components.media.Image` and :class:`~zyte_common_items.components.gtin.Gtin` for the input above.

>>> product.url
'https://example.com/'
>>> product.mainImage
Image(url='https://example.com/image.png')
>>> product.canonicalUrl
>>> product.gtin
[Gtin(type='gtin13', value='9504000059446')]

Creating items from lists

You can create items in bulk using the :meth:`~zyte_common_items.Item.from_list` method:

>>> from zyte_common_items import Product
>>> data_list = [
...     {'url': 'https://example.com/1', 'name': 'Product 1'},
...     {'url': 'https://example.com/2', 'name': 'Product 2'},
...     {'url': 'https://example.com/3', 'name': 'Product 3'},
...     {'url': 'https://example.com/4', 'name': 'Product 4'}
... ]
>>> products = Product.from_list(data_list)
>>> len(products)
4
>>> products[0].url
'https://example.com/1'
>>> products[3].name
'Product 4'

This can be especially useful if you're processing lots of items from an API, file, database, etc.
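As a rough sketch of the "file" case, the snippet below reads JSON Lines text into a list of dictionaries using only the standard library. The helper name ``load_records`` and the sample data are assumptions made for illustration; they are not part of zyte-common-items.

```python
import json


def load_records(jsonl_text):
    """Parse JSON Lines text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]


# Hypothetical input, e.g. the contents of an exported .jsonl file.
jsonl_text = """
{"url": "https://example.com/1", "name": "Product 1"}
{"url": "https://example.com/2", "name": "Product 2"}
"""

records = load_records(jsonl_text)
```

The resulting list has the same shape as ``data_list`` above, so it could be passed to :meth:`~zyte_common_items.Item.from_list` in the same way, e.g. ``Product.from_list(records)``.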

Handling unknown fields

:ref:`Items <items>` and :ref:`components <components>` do not allow attributes beyond those they define:

>>> from zyte_common_items import Product
>>> product = Product(url="https://example.com", foo="bar")
Traceback (most recent call last):
...
TypeError: ... got an unexpected keyword argument 'foo'
>>> product = Product(url="https://example.com")
>>> product.foo = "bar"
Traceback (most recent call last):
...
AttributeError: 'Product' object has no attribute 'foo'

However, when using :meth:`~zyte_common_items.Item.from_dict` and :meth:`~zyte_common_items.Item.from_list`, unknown fields assigned to items and components won't cause an error. Instead, they are placed inside the :attr:`~zyte_common_items.Item._unknown_fields_dict` attribute, and can be accessed the same way as known fields using :class:`~zyte_common_items.ZyteItemAdapter`:

>>> from zyte_common_items import Product, ZyteItemAdapter
>>> data = {
...     'url': 'https://example.com/',
...     'unknown_field': True,
... }
>>> product = Product.from_dict(data)
>>> product._unknown_fields_dict
{'unknown_field': True}
>>> adapter = ZyteItemAdapter(product)
>>> adapter['unknown_field']
True

This keeps your code compatible with future field changes in the input data, which could otherwise cause backward-incompatibility issues.
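To illustrate the idea of keeping unknown keys rather than rejecting them, here is a minimal standard-library sketch. ``ToyItem`` and its ``unknown`` attribute are hypothetical stand-ins; this is not how zyte-common-items is implemented, only a toy model of the behavior described above.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ToyItem:
    # Hypothetical stand-in for an item class; not part of zyte-common-items.
    url: str
    unknown: dict = field(default_factory=dict)

    @classmethod
    def from_dict(cls, data):
        # Names of the declared fields, excluding the catch-all container.
        known = {f.name for f in fields(cls)} - {"unknown"}
        item = cls(**{k: v for k, v in data.items() if k in known})
        # Keep unrecognized keys instead of raising, so that new input
        # fields do not break existing code.
        item.unknown = {k: v for k, v in data.items() if k not in known}
        return item


item = ToyItem.from_dict({"url": "https://example.com/", "unknown_field": True})
```

Here ``item.unknown`` ends up holding ``{'unknown_field': True}``, while ``item.url`` is set as usual.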

Note, however, that unknown fields are only supported within items and components. Input processing can still fail for other types of unexpected input:

>>> from zyte_common_items import Product
>>> data = {
...     'url': 'https://example.com/',
...     'mainImage': 'not a dictionary',
... }
>>> product = Product.from_dict(data)
Traceback (most recent call last):
...
ValueError: Expected mainImage to be a dict with fields from zyte_common_items.components.media.Image, got 'not a dictionary'.
>>> data = {
...     'url': 'https://example.com/',
...     'breadcrumbs': 3,
... }
>>> product = Product.from_dict(data)
Traceback (most recent call last):
...
ValueError: Expected breadcrumbs to be a list, got 3.
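Since such input raises :class:`ValueError`, one possible way to process a batch without stopping at the first bad record is to collect failures separately. The sketch below uses only the standard library; ``convert_valid`` and ``toy_convert`` are hypothetical names, and ``toy_convert`` merely imitates the kind of check shown above.

```python
def convert_valid(records, convert):
    """Apply ``convert`` to each record, collecting failures instead of stopping."""
    items, errors = [], []
    for index, record in enumerate(records):
        try:
            items.append(convert(record))
        except ValueError as exc:
            # Remember which record failed and why.
            errors.append((index, str(exc)))
    return items, errors


def toy_convert(record):
    # Hypothetical stand-in for a call such as Product.from_dict.
    if not isinstance(record.get("breadcrumbs", []), list):
        raise ValueError(
            f"Expected breadcrumbs to be a list, got {record['breadcrumbs']!r}."
        )
    return record


items, errors = convert_valid(
    [
        {"url": "https://example.com/1"},
        {"url": "https://example.com/2", "breadcrumbs": 3},
    ],
    toy_convert,
)
```

With zyte-common-items installed, ``convert`` could be ``Product.from_dict`` instead of the stand-in.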

Defining custom items

You can subclass :class:`~zyte_common_items.Item` or any :ref:`item subclass <items>` to define your own item.

:class:`~zyte_common_items.Item` is a slotted attrs class. To keep the benefits of slots, such as lower memory use and an error on assignment to undefined attributes, subclasses should also be slotted attrs classes. For example:

>>> import attrs
>>> from zyte_common_items import Item
>>> @attrs.define
... class CustomItem(Item):
...     foo: str

Keep in mind that slotted classes restrict multiple inheritance: a class cannot inherit from more than one base class that defines non-empty ``__slots__``.
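The restriction comes from Python itself rather than from attrs. The following sketch reproduces it with plain classes that declare ``__slots__`` by hand; the class names are made up for the example.

```python
class A:
    __slots__ = ("a",)


class B:
    __slots__ = ("b",)


try:
    # Two bases that each define non-empty __slots__ cannot be combined.
    type("C", (A, B), {})
except TypeError as exc:
    error = str(exc)
else:
    error = None
```

Python rejects the class creation with a :class:`TypeError`, so ``error`` holds the interpreter's message about conflicting instance layouts. The same applies when combining two slotted attrs classes that each define fields.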