# Concurrency, parallelism, and asynchronous I/O

## Concurrency

> the appearance of doing more than one thing at a time (can be time-sliced)

- Easy to do with the `threading` or `multiprocessing` libraries in Python
- `libevent`, `gevent`, etc. provide concurrency as well
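
A minimal sketch of time-sliced concurrency with `threading`, assuming two hypothetical workers (`slow` and `fast`) that only sleep. The names and delays are illustrative, not from the original notes. Both workers start before either finishes, so their waits overlap.

```python
import threading
import time


def worker(name, delay, events):
    # Record when this worker starts and finishes; the sleep stands in for blocking I/O
    events.append(f'{name} start')
    time.sleep(delay)
    events.append(f'{name} done')


events = []
threads = [
    threading.Thread(target=worker, args=('slow', 0.2, events)),
    threading.Thread(target=worker, args=('fast', 0.1, events)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for both workers before reading the log

print(events)
```

The `fast` worker finishes first even though it was started second, because the interpreter switches between threads while each one is blocked.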

## Parallelism

> *actually* doing more than one thing at a time (multi-core/hyperthreading/distributed)

- In CPython, the GIL prevents this for Python code running in *threads*; `gevent`-style greenlets are cooperatively scheduled in a single thread, so they are not parallel either
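
One common way around the GIL is to use processes instead of threads. The sketch below assumes a hypothetical CPU-bound function, `count_primes`, and hands several inputs to `concurrent.futures.ProcessPoolExecutor`; the function and the input sizes are illustrative only.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit):
    """Count primes below `limit` by trial division (deliberately CPU-bound)."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


if __name__ == '__main__':
    limits = [20_000, 30_000, 40_000, 50_000]
    # Each call runs in its own worker process, so the calls can use separate cores
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(count_primes, limits))
    print(dict(zip(limits, results)))
```

The `if __name__ == '__main__':` guard matters here: some start methods re-import the main module in each worker process, and the guard keeps the workers from spawning pools of their own.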

## Asynchronous programming

> programming style where rather than blocking on I/O, we find something useful to do, and "come back" to the I/O later

- `twisted` did this with reactors and callbacks
- `libevent`, `gevent`, et al. are *implicitly asynchronous*: things that would block in a thread instead yield to an **event loop**, which finds something useful to do
- In Py3 (`asyncio` arrived in 3.4), we have an *explicitly asynchronous* style we can use (3.5 added `async`/`await` syntax to support it, and 3.7 made them reserved keywords)
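
To make the callback style and the explicit style concrete, here is a small sketch that uses only `asyncio` (not `twisted`). The coroutine `fetch_value` is a hypothetical stand-in for an I/O operation. The first half registers a callback that fires when the result is ready; the second half simply awaits the result. It is meant to be run as a script, since `asyncio.run` cannot be called from inside an already running event loop.

```python
import asyncio


async def fetch_value():
    # Stand-in for an I/O operation; control returns to the event loop while sleeping
    await asyncio.sleep(0.01)
    return 42


async def main():
    results = []

    # Callback style: say what should happen once the result is available
    task = asyncio.ensure_future(fetch_value())
    task.add_done_callback(lambda fut: results.append(('callback', fut.result())))
    await task
    await asyncio.sleep(0)  # give the loop one more turn so the callback has run

    # Explicit style: the code reads top to bottom, pausing at each await
    results.append(('await', await fetch_value()))
    return results


results = asyncio.run(main())
print(results)
```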

# Raise StopIteration(value)

In [1]:
def mygen():
    if False:
        yield
    return 'Something'

In [2]:
gen = mygen()

In [3]:
next(gen)

StopIteration: Something

In [8]:
def myprint():
    value = yield from mygen()
    print(value)

In [9]:
next(myprint())

Something


StopIteration: 
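
The cells above show that a generator's `return` value travels on the `StopIteration` exception, and that `yield from` unwraps it. A short sketch of both ways to get at the value follows; `delegating` is a hypothetical helper added for illustration.

```python
def mygen():
    if False:
        yield  # never reached; its presence makes this function a generator
    return 'Something'


def delegating():
    # `yield from` evaluates to the value returned by the inner generator
    value = yield from mygen()
    yield value.upper()


try:
    next(mygen())
except StopIteration as exc:
    returned = exc.value  # the return value is carried on the exception

forwarded = next(delegating())
print(returned, forwarded)
```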

# Asyncio in Py3.4 (generator-based coroutines)

In [10]:
%%file data/asyncio-examples/asyncio-old.py
import asyncio
import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s %(levelname)s:%(name)s:%(message)s')

log = logging.getLogger()

def main():
    loop = asyncio.get_event_loop()    
    loop.run_until_complete(asyncio.gather(coroutine_1(), coroutine_2()))
    
    
@asyncio.coroutine
def coroutine_1():
    log.info('coroutine_1 is active on the event loop')

    log.info('coroutine_1 yielding control. Going to be blocked for 4 seconds')
    yield from asyncio.sleep(4)

    log.info('coroutine_1 resumed. coroutine_1 exiting')
    

@asyncio.coroutine
def coroutine_2():
    log.info('coroutine_2 is active on the event loop')

    log.info('coroutine_2 yielding control. Going to be blocked for 5 seconds')
    yield from asyncio.sleep(5)

    log.info('coroutine_2 resumed. coroutine_2 exiting')
    

if __name__ == '__main__':
    main()

Overwriting data/asyncio-examples/asyncio-old.py


In [11]:
!python data/asyncio-examples/asyncio-old.py

2019-03-20 14:11:52,920 INFO:root:coroutine_1 is active on the event loop
2019-03-20 14:11:52,920 INFO:root:coroutine_1 yielding control. Going to be blocked for 4 seconds
2019-03-20 14:11:52,920 INFO:root:coroutine_2 is active on the event loop
2019-03-20 14:11:52,920 INFO:root:coroutine_2 yielding control. Going to be blocked for 5 seconds
2019-03-20 14:11:56,925 INFO:root:coroutine_1 resumed. coroutine_1 exiting
2019-03-20 14:11:57,925 INFO:root:coroutine_2 resumed. coroutine_2 exiting


# Asyncio with `async`/`await`

In [12]:
%%file data/asyncio-examples/asyncio-new.py
import asyncio
import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s %(levelname)s:%(name)s:%(message)s')

log = logging.getLogger()

async def run_all():
    await asyncio.gather(coroutine_1(), coroutine_2())


def main():
    # asyncio.run() (3.7+) creates the event loop, runs the coroutine, and closes the loop
    asyncio.run(run_all())
    
    
async def coroutine_1():
    log.info('coroutine_1 is active on the event loop')

    log.info('coroutine_1 yielding control. Going to be blocked for 4 seconds')
    await asyncio.sleep(4)

    log.info('coroutine_1 resumed. coroutine_1 exiting')
    

async def coroutine_2():
    log.info('coroutine_2 is active on the event loop')

    log.info('coroutine_2 yielding control. Going to be blocked for 5 seconds')
    await asyncio.sleep(5)

    log.info('coroutine_2 resumed. coroutine_2 exiting')
    

if __name__ == '__main__':
    main()

Overwriting data/asyncio-examples/asyncio-new.py


In [13]:
!python data/asyncio-examples/asyncio-new.py

2019-03-20 14:13:14,842 INFO:root:coroutine_1 is active on the event loop
2019-03-20 14:13:14,842 INFO:root:coroutine_1 yielding control. Going to be blocked for 4 seconds
2019-03-20 14:13:14,842 INFO:root:coroutine_2 is active on the event loop
2019-03-20 14:13:14,842 INFO:root:coroutine_2 yielding control. Going to be blocked for 5 seconds
2019-03-20 14:13:18,846 INFO:root:coroutine_1 resumed. coroutine_1 exiting
2019-03-20 14:13:19,847 INFO:root:coroutine_2 resumed. coroutine_2 exiting


# Slightly more complex: simple web crawler

In [14]:
!python3 -m pip install aiohttp-requests beautifulsoup4

In [31]:
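from urllib.parse import urljoin, urlparse

import bs4
from aiohttp_requests import requests

async def get_links(url):
    """Fetch `url` and return the absolute http(s) links found on the page."""
    # ssl=False skips certificate verification (acceptable for a demo only)
    response = await requests.get(url, ssl=False)
    text = await response.text()
    soup = bs4.BeautifulSoup(text, 'html.parser')
    hrefs = (a.attrs.get('href') for a in soup.find_all('a'))
    # drop anchors without an href
    hrefs = (href for href in hrefs if href)
    # resolve relative links against the page URL
    hrefs = (urljoin(url, href) for href in hrefs)
    # keep only http(s) links (urlparse replaces the deprecated splittype)
    hrefs = (href for href in hrefs if urlparse(href).scheme in ('http', 'https'))
    # strip fragments so page#a and page#b count as the same page
    hrefs = (href.split('#')[0] for href in hrefs)
    return list(hrefs)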
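
The link clean-up steps in `get_links` can be tried without any network access. The sketch below pulls them into a hypothetical helper, `normalize`, built only on `urllib.parse`; it uses `urldefrag` where the cell above splits on `'#'`.

```python
from urllib.parse import urldefrag, urljoin, urlparse


def normalize(base, href):
    """Return an absolute http(s) URL without its fragment, or None if not crawlable."""
    absolute = urljoin(base, href)  # resolve relative links against the page URL
    if urlparse(absolute).scheme not in ('http', 'https'):
        return None  # e.g. mailto: or javascript: links
    return urldefrag(absolute).url  # drop any '#fragment'


base = 'https://www.python.org/about/'
print(normalize(base, '../downloads/#top'))
print(normalize(base, '/jobs/'))
print(normalize(base, 'mailto:someone@example.com'))
```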


In [32]:
import re
import asyncio
from urllib.parse import urlparse

hrefs_seen = set()
is_python = re.compile(r'www\.python\.org')
queue = asyncio.Queue()

def valid_host(href):
    pr = urlparse(href)
    return is_python.search(pr.netloc)

async def queue_url(url):
    if url not in hrefs_seen and valid_host(url):
        hrefs_seen.add(url)
        await queue.put(url)
        
async def handle_page(url):
    print(f'Handling {url}')
    for link in await get_links(url):
        await queue_url(link)    
        
async def crawl():
    try:
        while len(hrefs_seen) < 200:
            url = queue.get_nowait()
            await handle_page(url)
    except asyncio.QueueEmpty:
        print('No more urls!')

In [33]:
ROOT = 'https://www.python.org'

In [34]:
await handle_page(ROOT)

Handling https://www.python.org


In [35]:
await asyncio.gather(*[crawl() for i in range(8)])

Handling https://www.python.org
Handling https://www.python.org/
Handling https://www.python.org/psf-landing/
Handling https://www.python.org/jobs/
Handling https://www.python.org/community/
Handling https://www.python.org/psf/donations/
Handling https://www.python.org/community/irc/
Handling https://www.python.org/about/
Handling https://www.python.org/about/apps/
Handling https://www.python.org/about/quotes/
Handling https://www.python.org/about/gettingstarted/
Handling https://www.python.org/about/help/
Handling https://www.python.org/downloads/
Handling https://www.python.org/downloads/source/


[None, None, None, None, None, None, None, None]

In [36]:
while queue.qsize():
    print(await queue.get())

https://www.python.org/downloads/windows/
https://www.python.org/downloads/mac-osx/
https://www.python.org/download/other/
https://www.python.org/download/alternatives
https://www.python.org/doc/
https://www.python.org/doc/av
https://www.python.org/doc/essays/
https://www.python.org/community/survey
https://www.python.org/community/diversity/
https://www.python.org/community/lists/
https://www.python.org/community/forums/
https://www.python.org/community/workshops/
https://www.python.org/community/sigs/
https://www.python.org/community/logos/
https://www.python.org/community/merchandise/
https://www.python.org/community/awards
https://www.python.org/psf/codeofconduct/
https://www.python.org/success-stories/
https://www.python.org/success-stories/category/arts/
https://www.python.org/success-stories/category/business/
https://www.python.org/success-stories/category/education/
https://www.python.org/success-stories/category/engineering/
https://www.python.org/success-stories/category/gov

In [38]:
import re
import asyncio
from urllib.parse import urlparse

hrefs_seen = set()
host_re = re.compile(r'www\.python\.org')
queue = asyncio.Queue()

def valid_host(href):
    pr = urlparse(href)
    return host_re.search(pr.netloc)

async def queue_url(url):
    if url not in hrefs_seen and valid_host(url):
        hrefs_seen.add(url)
        await queue.put(url)
        
async def handle_page(url):
    print(f'Handling {url}')
    for link in await get_links(url):
        await queue_url(link)    
        
async def crawl():
    try:
        while len(hrefs_seen) < 200:
            url = queue.get_nowait()
            await handle_page(url)
    except asyncio.QueueEmpty:
        print('No more urls!')
        
        
await handle_page('https://www.python.com/')

Handling https://www.python.com/


In [39]:
await asyncio.gather(*[crawl() for i in range(8)])

No more urls!
No more urls!
No more urls!
No more urls!
No more urls!
No more urls!
No more urls!
No more urls!


[None, None, None, None, None, None, None, None]

In [40]:
while queue.qsize():
    print(await queue.get())
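
The `crawl()` workers above stop as soon as `get_nowait()` finds the queue empty, even if another worker is still fetching a page that would add more URLs. One possible alternative, sketched here under stated assumptions, is to let workers wait on `queue.get()` and use `queue.join()` to detect when all work is done. To keep the sketch runnable offline, a hypothetical in-memory link graph (`FAKE_SITE`) and `fake_get_links` stand in for `get_links`; all names below are illustrative. It is meant to be run as a script.

```python
import asyncio

# Hypothetical link graph standing in for real pages
FAKE_SITE = {
    'a': ['b', 'c'],
    'b': ['c', 'd'],
    'c': ['a'],
    'd': [],
}


async def fake_get_links(url):
    await asyncio.sleep(0)  # stand-in for network I/O
    return FAKE_SITE.get(url, [])


async def worker(queue, seen, handled):
    while True:
        url = await queue.get()  # wait for work instead of quitting when the queue is empty
        try:
            handled.append(url)
            for link in await fake_get_links(url):
                if link not in seen:
                    seen.add(link)
                    queue.put_nowait(link)
        finally:
            queue.task_done()  # lets queue.join() know this URL is finished


async def crawl_all(root, n_workers=3):
    queue = asyncio.Queue()
    seen = {root}
    handled = []
    queue.put_nowait(root)
    workers = [asyncio.ensure_future(worker(queue, seen, handled))
               for _ in range(n_workers)]
    await queue.join()  # returns once every queued URL has been processed
    for w in workers:
        w.cancel()  # workers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)
    return handled


handled = asyncio.run(crawl_all('a'))
print(sorted(handled))
```

Because the root is added to `seen` before crawling starts, it is handled only once, unlike the run above where `https://www.python.org` appears twice.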

# Lab

Open [AsyncIO Lab](asyncio-lab.ipynb)