
Requestium Pool

Pooling + Requestium (requests + selenium + parsel)

A very simple pooling scheme for working with multiple requestium Sessions. Reusing Sessions that are already open significantly reduces startup costs, while capping the pool at a number of browser instances you know your machine can handle.

Methods

  • acquire(acquire_wait_timeout) : get a requestium Session. Sessions are built on demand, meaning none are created until they are requested. If the pool is maxed out and all Sessions are in use, acquire_wait_timeout sets how long to wait for a free Session before giving up and returning None.
  • release(session) : return the Session to the pool so that other threads can use it (see the sketch after this list).
  • destroy(session) : destroy the Session you pass to it.
  • stop() : kill all Sessions. Runs multithreaded.
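
Since a Session must always be released, even when navigation raises, the acquire/release pair fits naturally into a context manager. A minimal sketch, assuming the pool API above; the pooled_session helper is hypothetical and not part of this library:

from contextlib import contextmanager

@contextmanager
def pooled_session(pool, timeout=60):
    # Hypothetical helper: acquire a Session, yield it, always release it.
    session = pool.acquire(timeout)
    if session is None:
        raise TimeoutError('no free Session within %d seconds' % timeout)
    try:
        yield session
    finally:
        pool.release(session)

# with pooled_session(RPool) as R:
#     R.driver.get('https://www.github.com/')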

Installation

pip install git+https://github.com/ConnorSMaynes/requestiumpool

Usage

from threading import Thread

from requestiumpool import RequestiumPool

DRIVER_PATH = './chromedriver'      # example: path to your webdriver binary
BROWSER_NAME = 'chrome'             # example: name of the browser to drive

requestium_args = {
    'webdriver_path': DRIVER_PATH,
    'browser': BROWSER_NAME,
    'default_timeout': 15,
    # for headless: 'webdriver_options': {'arguments': ['headless']},
}

RPool = RequestiumPool(requestium_args, pool_size=2)

def acquire_and_follow(url):
    R = RPool.acquire(60)           # wait up to 60s for a free Session
    if R is not None:
        try:
            print(url)
            R.driver.get(url)
        finally:
            RPool.release(R)        # always return the Session to the pool
    else:
        print(R)                    # None: no Session acquired within timeout

URLs = ['https://www.google.com/',
        'https://www.stackoverflow.com/',
        'https://www.github.com/']

threads = []
for i in range(5):
    for url in URLs:
        t = Thread(target=acquire_and_follow, args=(url,))
        threads.append(t)
        t.start()

for t in threads:                   # wait for all URLs to be visited
    t.join()

RPool.stop()                        # kill all requestium instances

NOTES

stop() is multithreaded, but it can still take a while with a large number of instances.
Reusing browsers that are already open is where the pool saves time: the cost of launching a browser is paid once per pool slot instead of once per task.
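
The savings depend on how expensive browser startup is on your machine. A rough way to measure it yourself, as a sketch: this assumes the same requestium_args as above, and touches .driver with a navigation since requestium starts the browser lazily. Timings will vary with driver and hardware.

import time

from requestium import Session
from requestiumpool import RequestiumPool

N = 3

start = time.perf_counter()         # fresh browser per task
for _ in range(N):
    s = Session(**requestium_args)
    s.driver.get('about:blank')     # forces the browser to start
    s.driver.quit()
fresh = time.perf_counter() - start

pool = RequestiumPool(requestium_args, pool_size=1)
start = time.perf_counter()         # one pooled browser, reused
for _ in range(N):
    s = pool.acquire(60)
    s.driver.get('about:blank')
    pool.release(s)
reused = time.perf_counter() - start
pool.stop()

print('fresh: %.1fs   pooled: %.1fs' % (fresh, reused))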

Similar Projects

This project was inspired by Webdriver Pool, but targets requestium, which wraps Requests, Selenium, and Parsel.

License

Copyright © 2018, ConnorSMaynes. Released under the MIT License.
