Skip to content

Package for fast integration between SQLAlchemy models and Scrapy spiders.

License

Notifications You must be signed in to change notification settings

xlurio/herodotus

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

35 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Herodotus

Test Coverage Package version

Overview

Herodotus is a simple package for fastening the integration between Scrapy spiders and SQLAlchemy. With a few lines, it easily allows the data to be stored as soon as it gets scraped, providing a faster development for scraping tools.

Requirements

Installation

$ pip install herodotus

---> 100%

Usage

For using the herodotus, you need to wrap the parse method of your spider with the herodotus.parse_for_db decorator and pass a sqlalchemy.orm.Session object to it:

import herodotus

class MySpider(Spider):

    @herodotus.parse_for_db(session=session)
    def parse(response, **kwargs):
        // code

You will also need to yield a object mapped with a database table by using SQLAlchemy ORM:

import herodotus

Base = declarative_base()


class MappedModel(Base):
    
    __tablename__ == mappedmodel
    object_id = Column(Integer, primary_key=True)
    some_field = Column(String(50))


class MySpider(Spider):

    @herodotus.parse_for_db(session=session)
    def parse(response, **kwargs):
        // Scrape data
        yield MappedModel(object_id=object_id, some_field=some_field)

License

This project is licensed under the terms of the MIT license.