# tidying up the wcag dataset

Recently a new data set was published, and I got excited. It is the Web Content Accessibility Guidelines (WCAG) as a structured JSON document. This notebook tidies up that data and explores how it can be used with a testing system like Playwright.

In [1]:
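import pandas, requests, requests_cache
from pandas import Series
from IPython.display import HTML, display

# cache http responses so re-running the notebook does not re-download
requests_cache.install_cache("xxx")

# render bs4 tags as html in notebook output
get_ipython().display_formatter.formatters["text/html"].for_type(
    __import__("bs4").Tag, str
);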

def tag(name="section", *children, **attrs):
    import bs4
    tag = bs4.Tag(name=name, attrs=attrs)
    tag.extend(children)
    return tag

def parse(object):
    return tag(
        "section", *__import__("bs4").BeautifulSoup(object, features="lxml").body.children
    )

In [2]:
%%
load in the new dataset referenced in [this readme](https://github.com/w3c/wcag/tree/main/11ty/json#readme)

    wcag = Series(requests.get("https://www.w3.org/WAI/WCAG22/wcag.json").json())

In [3]:
%%
## structuring the terms for ourselves as a definition list

{{dl}}

    terms = wcag.loc[["terms"]].explode().apply(Series)
    dl = tag("dl", *terms.apply(
        lambda x: [
            tag("dt", x["name"], id=x.id),
            tag("dd", *parse(x["definition"]).children),    
        ],
        axis=1
    ).explode())

In [11]:
%%
## unraveling the wcag `principles`

    principles = wcag.loc[["principles"]].explode().apply(Series).rename(
        columns=dict(num="principle")
    ).set_index("principle")
    guidelines = principles.pop("guidelines").explode().apply(Series).rename(
        columns=dict(num="guideline")
    ).set_index("guideline", append=True)
    success = guidelines.pop("successcriteria")    
    criteria = success.explode().apply(Series).rename(
        columns=dict(num="criteria")
    ).set_index("criteria", append=True)
    
    techniques = criteria.pop("techniques").apply(Series)
    sufficientNote = techniques.pop("sufficientNote")
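
The chain of `explode().apply(Series)` calls walks three nested levels. As a rough illustration of the same flattening without pandas, this sketch runs plain loops over a tiny hand-written stand-in for the dataset. Only the key names `num`, `guidelines`, `successcriteria`, and `handle` come from the cells above; `sample` and `flatten` are hypothetical names. It produces one record per success criterion, keyed by a `(principle, guideline, criteria)` tuple that mirrors the MultiIndex.

```python
# a hand-written stand-in for the nested part of wcag.json (not the real file)
sample = {
    "principles": [
        {
            "num": "1",
            "handle": "Perceivable",
            "guidelines": [
                {
                    "num": "1.1",
                    "handle": "Text Alternatives",
                    "successcriteria": [
                        {"num": "1.1.1", "handle": "Non-text Content"},
                    ],
                },
                {
                    "num": "1.4",
                    "handle": "Distinguishable",
                    "successcriteria": [
                        {"num": "1.4.3", "handle": "Contrast (Minimum)"},
                        {"num": "1.4.10", "handle": "Reflow"},
                    ],
                },
            ],
        }
    ]
}


def flatten(doc):
    """yield ((principle, guideline, criteria), record) pairs, one per success criterion."""
    for principle in doc["principles"]:
        for guideline in principle["guidelines"]:
            for criterion in guideline["successcriteria"]:
                yield (principle["num"], guideline["num"], criterion["num"]), criterion


rows = dict(flatten(sample))
```

Looking up `rows[("1", "1.4", "1.4.10")]` plays the same role as `criteria.loc["1", "1.4", "1.4.10"]` on the data frame.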

### thoughts about the schema after unpacking it

* maybe it's worth posting an issue or looking for a schema for this dataset.

the dataset should also carry a reference to the schema. 

    draft: "schema"=\
```yaml
"@context": https://www.w3.org/TR/WCAG22/
properties:
    version: 
        description: the wcag version 
        enum:
        - "2.0"
        - "2.1"
        - "2.2"
        - "3.0"
    namespace:
        format: uri
        example: https://www.w3.org/TR/WCAG22/
    principles:
        $ref: "#/$defs/principles"
    terms:
        $ref: "#/$defs/terms"
$defs:
    # there should probably be a version here
    principles:
        type: array
    terms:
        type: array

```
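
Until a published schema exists, a few plain checks can stand in for the draft above. This is only a sketch: `check_draft` is a hypothetical helper, the allowed versions are passed in by the caller rather than read from the YAML, and a real setup would more likely hand the schema to a JSON Schema validator.

```python
def check_draft(doc, versions):
    """return a list of problems found when comparing `doc` to the draft schema."""
    problems = []
    # the draft restricts `version` to an enum of known wcag versions
    if "version" in doc and doc["version"] not in versions:
        problems.append(f"unexpected version: {doc['version']!r}")
    for key in ("principles", "terms"):
        # the draft declares both properties as arrays
        if key in doc and not isinstance(doc[key], list):
            problems.append(f"{key} should be an array")
    return problems
```

Calling `check_draft(wcag.to_dict(), versions={"2.0", "2.1", "2.2"})` on the loaded series would return an empty list when nothing looks off.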

## connecting our new dataset with axe.

Running axe on a website returns information about which Web Content Accessibility Guidelines each result applies to. If we can extract that, we have supplementary information to support our development work locally. Datasets like the one we're exploring are nice to have because they can enrich a developer's local productivity without having to go to the web. That's why it's nice to experiment with different notes and representations of the data: at different parts of the testing process, different scales of concern are required.

example: this runs a test on my homepage, which I don't test for accessibility.

In [5]:
    AXE = requests.get("https://cdnjs.cloudflare.com/ajax/libs/axe-core/4.10.3/axe.min.js").text

In [6]:
import playwright.async_api
async with playwright.async_api.async_playwright() as play:
    browser = await play.firefox.launch()
    page = await browser.new_page()
    await page.goto("https://tonyfast.github.io/tonyfast/")
    await page.evaluate(AXE)
    results = await page.evaluate("axe.run()")

Structure the axe test results as a data frame and start extracting which Web Content Accessibility Guidelines we need to consider in the scope of our problem.

In [7]:
    results = Series(results).loc[
        "passes violations incomplete".split()
    ].explode().apply(Series)
    tags = results.tags.explode().drop_duplicates()
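
For readers without pandas at hand, the same tag collection can be sketched with plain dictionaries. axe groups its rule results under `passes`, `violations`, `incomplete`, and `inapplicable`, and each rule result carries a `tags` list. The `sample_results` below is a heavily trimmed, hand-written stand-in rather than real output, and `unique_tags` is a hypothetical helper.

```python
# hand-written stand-in for the object that `axe.run()` resolves to
sample_results = {
    "passes": [
        {"id": "document-title", "tags": ["cat.text-alternatives", "wcag2a", "wcag242"]},
    ],
    "violations": [
        {"id": "color-contrast", "tags": ["cat.color", "wcag2aa", "wcag143"]},
    ],
    "incomplete": [],
    "inapplicable": [
        {"id": "meta-refresh", "tags": ["wcag2a", "wcag221"]},
    ],
}


def unique_tags(results, groups=("passes", "violations", "incomplete")):
    """collect tags from the chosen result groups, first occurrence first."""
    seen = {}
    for group in groups:
        for rule in results.get(group, []):
            for name in rule["tags"]:
                seen.setdefault(name, None)  # dicts keep insertion order
    return list(seen)


found = unique_tags(sample_results)
```

Like the cell above, this skips the `inapplicable` group, so tags that only appear on rules that did not apply to the page are left out.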

In [8]:
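    # keep tags like "wcag143" and turn the digits into a criterion number like "1.4.3"
    relevant = tags[tags.str.contains("^wcag.*[0-9]$")].str.removeprefix("wcag").apply(
        lambda x: ".".join([x[0], x[1], x[2:]])
    )
    # rebuild the (principle, guideline, criteria) index, e.g. ("1", "1.4", "1.4.3")
    guideline = relevant.str.rpartition(".")[0]
    slice = criteria.loc[pandas.MultiIndex.from_arrays([
        guideline.str.rpartition(".")[0], guideline, relevant
    ])]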
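
The tag-to-criterion conversion is easier to see on single strings. In this sketch `criterion_key` is a hypothetical helper, and it uses a stricter pattern than the cell above (exactly one digit for the principle, one for the guideline, and the rest for the criterion), so treat it as an assumption about how axe spells these tags rather than a documented format.

```python
import re

# axe tags such as "wcag143" carry the criterion digits after the "wcag" prefix
CRITERION_TAG = re.compile(r"^wcag(\d)(\d)(\d+)$")


def criterion_key(tag):
    """map "wcag1410" to ("1", "1.4", "1.4.10"); return None for any other tag."""
    match = CRITERION_TAG.match(tag)
    if match is None:
        return None
    principle, guideline, criterion = match.groups()
    return (
        principle,
        f"{principle}.{guideline}",
        f"{principle}.{guideline}.{criterion}",
    )
```

Level tags such as `wcag2aa` and category tags such as `cat.color` fall through to `None`, which is the same filtering the regular expression does in the pandas version.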

### we show the relevant success criteria for our test site

In [9]:
heaven = tag()
for c, row in slice.iterrows():
    heaven.append(
        tag(
            "section",
            tag("hgroup", 
                tag("h4", c[2], " ", row.handle),
                tag("a", "external link", href="https://www.w3.org/TR/WCAG22/#"+row.id)
            ),
            *parse(row.content).children
        )
    )

heaven
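
To make the shape of each rendered section concrete, here is a standard-library sketch that swaps bs4 for `xml.etree.ElementTree`. `criterion_section` is a hypothetical helper and the sample values are typed in by hand; only the base url and the `section > hgroup > h4 + a` layout come from the cell above.

```python
import xml.etree.ElementTree as ET

BASE = "https://www.w3.org/TR/WCAG22/#"  # same base url as the cell above


def criterion_section(num, handle, anchor):
    """build <section><hgroup><h4>…</h4><a href=…>external link</a></hgroup></section>."""
    section = ET.Element("section")
    hgroup = ET.SubElement(section, "hgroup")
    ET.SubElement(hgroup, "h4").text = f"{num} {handle}"
    link = ET.SubElement(hgroup, "a", href=BASE + anchor)
    link.text = "external link"
    return section


markup = ET.tostring(
    criterion_section("1.1.1", "Non-text Content", "non-text-content"),
    encoding="unicode",
)
```

The notebook version additionally appends the parsed `content` of each criterion after the `hgroup`, which this sketch leaves out.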

## custom html representation

In [10]:
principles = principles.assign(html=None)
for p, principled in criteria.groupby("principle"):
    p = principles.loc[p]
    principles.update(Series([tag(
        "section",
        tag("hgroup", 
            tag("h2", p.name, " ",  p.handle),
            tag("a", "external link", href="https://www.w3.org/TR/WCAG22/#"+p.id)
        ),
        *parse(p.content).children
    )], [p.name], None, "html"))
    for g, row in guidelines.loc[p.name].iterrows(): 
        principles.loc[p.name, "html"].append(
            tag(
                "section",
                tag("hgroup", 
                    tag("h3", g, " ",  row.handle),
                    tag("a", "external link", href="https://www.w3.org/TR/WCAG22/#"+row.id)
                ),
                *parse(row.content).children
            )
        )
        for c, row in criteria.loc[p.name, g].iterrows():
            principles.loc[p.name, "html"].append(
                tag(
                    "section",
                    tag("hgroup", 
                        tag("h4", c, " ", row.handle),
                        tag("a", "external link", href="https://www.w3.org/TR/WCAG22/#"+row.id)
                    ),
                    *parse(row.content).children
                )
            )
            

display(*principles.html.astype(str).apply(HTML))