A very simple pooling scheme for working with multiple requestium Sessions. Reduce the time cost of running multiple sessions while keeping the number of instances at a level you know your machine can handle.
- `acquire()` : get a requestium Session. Sessions are built on demand, meaning that none will be created until they are requested. If the pool is maxed out and all Sessions are in use, `acquire_wait_timeout` sets how long you will wait for a free Session before giving up.
- `release()` : release the Session and return it to the pool, so that other processes can use it.
- `destroy()` : destroy the Session you pass to it.
- `stop()` : kill all Sessions. Multithreaded.
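The main example below only exercises `acquire` and `release`; `destroy` is what you want when a Session's browser dies mid-task, so the broken instance is not handed back to the pool. A minimal sketch of that pattern, assuming Selenium's `WebDriverException` as the failure signal and a 30-second acquire timeout (both are illustrative choices, not part of the library):

```python
from selenium.common.exceptions import WebDriverException

def fetch_title(pool, url):
    session = pool.acquire(30)  # wait up to 30 seconds for a free Session
    if session is None:
        return None  # pool exhausted within the timeout
    try:
        session.driver.get(url)
        title = session.driver.title
    except WebDriverException:
        # The browser is likely dead; destroy the Session instead of
        # returning a broken instance to the pool.
        pool.destroy(session)
        return None
    pool.release(session)
    return title
```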
```
pip install git+git://github.com/ConnorSMaynes/requestiumpool
```
```python
from threading import Thread

from requestiumpool import RequestiumPool

DRIVER_PATH = './chromedriver'  # placeholder: path to your webdriver binary
BROWSER_NAME = 'chrome'         # placeholder: browser to launch

requestium_args = {
    'webdriver_path': DRIVER_PATH,
    'browser': BROWSER_NAME,
    'default_timeout': 15,
    # for headless -> 'webdriver_options': {'arguments': ['headless']}
}

RPool = RequestiumPool(requestium_args, pool_size=2)

def acquireAndFollow(url):
    R = RPool.acquire(60)  # wait up to 60 seconds for a free Session
    if R is not None:
        print(url)
        R.driver.get(url)
        RPool.release(R)  # hand the Session back for other threads to reuse
    else:
        print(R)  # prints None if no browser was acquired within the timeout

URLs = ['https://www.google.com/',
        'https://www.stackoverflow.com/',
        'https://www.github.com/']

threads = []
for i in range(5):
    for url in URLs:
        t = Thread(target=acquireAndFollow, args=(url,))
        threads.append(t)
        t.start()

for t in threads:  # wait for all urls to be visited
    t.join()

RPool.stop()  # kill all requestium instances
```
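One caveat in the example above: if `driver.get()` raises, `release` is never called and that pool slot stays occupied until `stop()`. A small guard built on the documented `acquire`/`release` calls avoids this; the `pooled_session` helper below is my own sketch, not part of requestiumpool:

```python
from contextlib import contextmanager

@contextmanager
def pooled_session(pool, timeout=60):
    # Hypothetical helper: yields a Session from the pool and guarantees
    # it is returned even if the body of the with-block raises.
    session = pool.acquire(timeout)
    if session is None:
        raise TimeoutError('no free Session within %s seconds' % timeout)
    try:
        yield session
    finally:
        pool.release(session)

# Usage:
# with pooled_session(RPool) as R:
#     R.driver.get('https://www.github.com/')
```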
`stop()` is multithreaded, but it can take a while with a lot of instances.
By reusing browsers that are already open, you can significantly reduce time costs.
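If you want to measure that saving on your own machine, a rough sketch is below. It assumes the pool and `requestium_args` from the example above, the 2018-era `requestium.Session(**kwargs)` constructor, and arbitrary choices of URL and repeat count:

```python
import time

from requestium import Session

# Baseline: open a fresh browser for every request.
start = time.perf_counter()
for _ in range(3):
    s = Session(**requestium_args)
    s.driver.get('https://www.github.com/')
    s.driver.quit()
print('fresh sessions:', time.perf_counter() - start)

# Pooled: reuse an already-open browser for the same requests.
start = time.perf_counter()
for _ in range(3):
    R = RPool.acquire(60)  # assumes a Session frees up within the timeout
    R.driver.get('https://www.github.com/')
    RPool.release(R)
print('pooled sessions:', time.perf_counter() - start)
```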
This project was inspired by Webdriver Pool, but built for requestium, which wraps Requests, Selenium, and Parsel.
Copyright © 2018, ConnorSMaynes. Released under the MIT License.