# extracting accessibility-related patterns from ravelry

In [1]:
%%
## bootstrap our search for accessible ravelry patterns

there are several accessibility categories that need to be searched.
they are captured in the `searches` mapping where the key is the name/index
for the search and the value defines the query parameters.

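    import os, urllib.parse
    import httpx, pandas
    from pandas import Series
    __import__("dotenv").load_dotenv()  # load RAVELRY_USERNAME / RAVELRY_PASSWORD from a .env file
    auth = (os.environ["RAVELRY_USERNAME"], os.environ["RAVELRY_PASSWORD"])  # basic auth pair
    searches: dict =\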
```toml
adaptive.pa = "adaptive"
"medical device access".pa = "medical-device-access"
"medical device support".pa = "medical-device-accessory"
"mobility aid support".pa = "mobility-aid-accesory"
other.pa = "other-add-accessibility"
"therapy aid/toy".pa = "therapy-aid"
medical.pc = "medical"
```

requesting the `seed_urls` gives us the `first_pages` of results for our `searches`

    seed_urls = ("https://api.ravelry.com/patterns/search.json?" + Series(searches).apply(urllib.parse.urlencode))
    first_pages = seed_urls.apply(httpx.get, auth=auth)
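a rough, network-free sketch of how each seed url is assembled. it assumes the toml above parses into a nested mapping of search name to query parameters, and it only uses a hand-copied subset of `searches`. `BASE` and `build_seed_urls` are names made up for this illustration, not part of the notebook.

```python
from urllib.parse import urlencode

BASE = "https://api.ravelry.com/patterns/search.json?"  # endpoint used in this notebook

# hand-copied subset of `searches`, assuming the toml parses to nested dicts
searches = {
    "adaptive": {"pa": "adaptive"},
    "therapy aid/toy": {"pa": "therapy-aid"},
    "medical": {"pc": "medical"},
}


def build_seed_urls(searches):
    """map each search name to the url of its first results page."""
    return {name: BASE + urlencode(params) for name, params in searches.items()}


seed_urls = build_seed_urls(searches)
for name, url in seed_urls.items():
    print(name, url)
```

`urlencode` takes care of escaping, so parameter values with spaces or slashes would still produce a valid query string.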

{{first_pages.to_frame("responses").T._repr_html_()}}

| | adaptive | medical device access | medical device support | mobility aid support | other | therapy aid/toy | medical |
|---|---|---|---|---|---|---|---|
| responses | `<Response [200 OK]>` | `<Response [200 OK]>` | `<Response [200 OK]>` | `<Response [500 Internal Server Error]>` | `<Response [500 Internal Server Error]>` | `<Response [200 OK]>` | `<Response [200 OK]>` |


In [2]:
%%
### handling bad `searches`

not all of the queries are dialed in yet. any of the `searches` that doesn't return a 200 status code counts as a failure.

    status_codes = first_pages.attrgetter("status_code")
{{status_codes.rename_axis("search", axis=0).to_frame("status code").T._repr_html_()}}

{{first_pages.index[status_codes.ne(200)].map("<samp>{}</samp>".format) | join(", ")}} are not working properly yet.

| search | adaptive | medical device access | medical device support | mobility aid support | other | therapy aid/toy | medical |
|---|---|---|---|---|---|---|---|
| status code | 200 | 200 | 200 | 500 | 500 | 200 | 200 |

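the same split can be sketched without pandas. the status codes below are copied from the table above; `working` and `failed` are illustrative names only.

```python
# status codes copied from the table above
status_codes = {
    "adaptive": 200,
    "medical device access": 200,
    "medical device support": 200,
    "mobility aid support": 500,
    "other": 500,
    "therapy aid/toy": 200,
    "medical": 200,
}

# keep the searches that answered with 200, and list the ones that need fixing
working = [name for name, code in status_codes.items() if code == 200]
failed = [name for name, code in status_codes.items() if code != 200]

print("not working yet:", ", ".join(failed))
```

only the `working` searches are worth paginating; the `failed` ones most likely need their query parameters corrected first.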

In [3]:
%%
## finding the missing paginated search information

from the `first_pages` we learn about the results we missed: the `first_patterns.paginator` attribute in the payload reports how many pages each search has.

    first_patterns = first_pages[status_codes.eq(200)].methodcaller("json").series()
    paginated = first_patterns.paginator.series()
    paginated = paginated[paginated.page_count.gt(1)].drop(columns="page")
    paginated = paginated.join(paginated.page_count.add(1).apply(compose(list, partial(range, 2))).explode().rename("page"))
from `paginated`, we can determine the other requests we need to make to complete our list of patterns.
the search responses carry little pattern information, so we dig deeper with a few more calls later on.

    other_urls = seed_urls[paginated.index] + "&page=" + paginated.page.astype(str)
we end up with the `other_pages`, which are combined with the `first_pages` to provide
all of the `responses`.

    other_pages = other_urls.apply(httpx.get, auth=auth)
    responses = pandas.concat([first_pages, other_pages]).to_frame("response")
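a minimal sketch of the page expansion, assuming the paginator reports a `page_count` and that the search endpoint accepts a `page` query parameter (both as used above). `remaining_page_urls` is a hypothetical helper, not something the notebook defines.

```python
def remaining_page_urls(seed_url, page_count):
    """urls for pages 2..page_count; page 1 was already fetched via the seed url."""
    return [f"{seed_url}&page={page}" for page in range(2, page_count + 1)]


seed = "https://api.ravelry.com/patterns/search.json?pa=adaptive"
print(remaining_page_urls(seed, 3))  # two extra urls, for pages 2 and 3
print(remaining_page_urls(seed, 1))  # a single-page search needs no extra requests
```

starting the range at 2 mirrors the `partial(range, 2)` in the cell above, and the `+ 1` makes the last page inclusive.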

In [4]:
%%
### extract all of the `pattern_ids` we have information about

extract the good `responses` and parse their results into a table of `pattern_ids`

    responses = responses.assign(
        url=responses.response.attrgetter("url"),
        status_code=responses.response.attrgetter("status_code"),
    )
    responses = responses[responses.status_code.eq(200)]
    responses = responses.assign(**responses.response.methodcaller("json").series())

    pattern_ids = responses.patterns.explode().series().reset_index().drop_duplicates("id").set_index("id")
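the de-duplication step can be sketched in plain python. it assumes each search payload has a `patterns` list whose entries carry an `id`, as the code above implies; the sample payloads and names are invented for the illustration.

```python
def unique_patterns(payloads):
    """collect patterns from several search payloads, keeping the first entry per id."""
    seen = {}
    for payload in payloads:
        for pattern in payload.get("patterns", []):
            # setdefault keeps the first occurrence, like drop_duplicates("id")
            seen.setdefault(pattern["id"], pattern)
    return seen


# invented sample payloads: pattern 2 shows up in two different searches
payloads = [
    {"patterns": [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]},
    {"patterns": [{"id": 2, "name": "b again"}, {"id": 3, "name": "c"}]},
]
pattern_ids = unique_patterns(payloads)
print(sorted(pattern_ids))
```

a pattern can match several accessibility categories, so duplicates across searches are expected rather than an error.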

In [5]:
%%
## extract information about each actual pattern

we can use the `patterns.json` endpoint to fetch information about each pattern.
there doesn't seem to be an upper limit on the number of ids per request, but requests are bound by a timeout,
so a partition that is too large can take too long and fail to respond.

    pattern_urls = pipe(
        len(pattern_ids),
        range,
        partition_all(50),  # assuming toolz.curried; unlike `partition`, keeps the final short batch
        map(compose(
            "https://api.ravelry.com/patterns.json?ids={}".format,
            "+".join,
            map(compose(str, pattern_ids.index.__getitem__)),
        )),
        list,
        Series,
    )

    limit = 3 # limit the queries when reporting
    pattern_pages = pattern_urls.iloc[:limit].apply(httpx.get, auth=auth)
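a small stdlib-only sketch of the batching, in case the functional pipeline is hard to follow. `batch_urls` is a hypothetical helper and the ids are made up; the url format is the one used above. note that if `partition` above is toolz's, it drops a final incomplete group, whereas slicing keeps it.

```python
def batch_urls(ids, size=50):
    """build one patterns.json url per batch of at most `size` ids."""
    base = "https://api.ravelry.com/patterns.json?ids="
    return [
        base + "+".join(str(i) for i in ids[start:start + size])
        for start in range(0, len(ids), size)
    ]


# made-up ids and a tiny batch size, just to show the leftover batch
urls = batch_urls([11, 12, 13, 14, 15], size=2)
for url in urls:
    print(url)
```

a smaller `size` means more requests but less risk of any single request hitting the timeout.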

In [6]:
%%
## all of the pattern information

    patterns = pattern_pages.methodcaller("json").series().patterns.apply(compose(list, dict.values)).explode().series()
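the flattening above can be sketched in plain python. it assumes each `patterns.json` payload holds a `patterns` mapping keyed by pattern id, which is what the `dict.values` call implies; the sample records are invented and `flatten_pattern_pages` is a hypothetical name.

```python
def flatten_pattern_pages(payloads):
    """turn a list of patterns.json payloads into one flat list of pattern records."""
    return [
        record
        for payload in payloads
        for record in payload["patterns"].values()
    ]


# invented payloads, shaped like the assumed response: {"patterns": {"<id>": {...}}}
payloads = [
    {"patterns": {"101": {"id": 101, "free": True}, "102": {"id": 102, "free": False}}},
    {"patterns": {"103": {"id": 103, "free": True}}},
]
records = flatten_pattern_pages(payloads)
print([record["id"] for record in records])
```

each record is one row of the final table, so a list like `records` can be handed directly to `pandas.DataFrame`.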
    

{{patterns.head().T.fillna("")._repr_html_()}}

| | 0 | 0.1 | 0.2 | 0.3 | 0.4 |
|---|---|---|---|---|---|
| comments_count | 6 | 23 | 23 | 13 | 32 |
| created_at | 2008/03/28 14:40:29 -0400 | 2008/05/17 10:10:10 -0400 | 2009/01/14 12:30:32 -0500 | 2009/04/04 14:42:17 -0400 | 2011/03/20 18:29:09 -0400 |
| currency | | | USD | USD | |
| difficulty_average | 2.5 | 2.788991 | 1.801282 | 1.811881 | 2.666667 |
| difficulty_count | 14.0 | 109.0 | 156.0 | 101.0 | 72.0 |
| downloadable | True | True | True | False | True |
| favorites_count | 1055 | 1978 | 2767 | 1937 | 2640 |
| free | True | True | True | False | True |
| gauge | 8.0 | 24.0 | 14.0 | 8.0 | 4.0 |
| gauge_divisor | 1.0 | 4.0 | 4.0 | 4.0 | 1.0 |
