# rendering dataframes for screen readers with `pandas`

in this document we'll explore what it takes to make `pandas.DataFrame`s accessible to screen readers.
we'll follow [Paul J Adam's](https://pauljadam.com) instructions for making [Simple Data Tables](https://pauljadam.com/demos/data-tables.html#heading1).

In [1]:
    
    %reload_ext pidgy
    import pandas.io.formats.style, bs4, pytest
    from IPython.display import HTML  # used later to display the repaired table
    soup = lambda x: bs4.BeautifulSoup(x, features="html.parser")

## scoping this document

in this document we only work with `df`. 

In [2]:
    (df := pandas.DataFrame(
        columns=pandas.Index(list("ABC")), 
        index=pandas.Index(range(2), name="index"))).style

|       | A   | B   | C   |
|-------|-----|-----|-----|
| index |     |     |     |
| 0     | nan | nan | nan |
| 1     | nan | nan | nan |


In [3]:
it is a basic `pandas.DataFrame` because:    
    
* it has ONE row level 
* it has ONE column level 

> we'll deal with more complex shapes in other documents.
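
the two conditions above can be checked directly on a frame. this is a minimal sketch; `is_basic_frame` and the `nested` example are hypothetical names that are not part of the original notebook.

```python
import pandas

def is_basic_frame(df: pandas.DataFrame) -> bool:
    """True when the frame has exactly one row level and one column level."""
    return df.index.nlevels == 1 and df.columns.nlevels == 1

# the same shape as `df` in this document: one row level, one column level
basic = pandas.DataFrame(columns=list("ABC"), index=pandas.Index(range(2), name="index"))

# a frame with two column levels, which falls outside the scope of this document
nested = pandas.DataFrame(
    [[1, 2]], columns=pandas.MultiIndex.from_product([["A"], ["x", "y"]])
)
```

`is_basic_frame(basic)` holds, while `is_basic_frame(nested)` does not because `nested.columns` is a `MultiIndex`.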

In [4]:
## 1. Title of data table is inside the `<caption>` element.

this rule requires knowledge of the data and is context dependent: only someone who knows what the table means can write its title.

In [5]:
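    def assert_has_caption(object):
        # a caption counts only when the element exists and holds visible text;
        # get_text also copes with captions that contain nested markup
        caption = soup(object).select_one("table caption")
        assert caption and caption.get_text(strip=True), "table is missing a <caption>"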

In [6]:
    def set_caption(df, caption) -> pandas.io.formats.style.Styler:
        return df.style.set_caption(caption)
    
once we have a styler we are limited to the subset of pandas operations that it supports.
the styler should be the last stop for the data, so finish any data manipulation first.

In [7]:
    with pytest.raises(AssertionError): assert_has_caption(df._repr_html_())
    assert_has_caption(
        captioned := set_caption(df, "a value-less dataframe with columns and row indexes")._repr_html_()
    )

In [8]:
## 2. Column headers are inside `<th scope="col">` elements.

In [9]:
    def assert_has_col_scope(object, selector="thead tr th"):
        assert all(th.attrs.get("scope") in {"col", "colgroup"} for th in soup(object).select(selector)), "<th> is missing `scope='col'`"

In [10]:
    def set_col_scope(object, selector="thead tr th"):
        for th in (object := soup(object)).select(selector):
            th.attrs.setdefault("scope", "col")
        return str(object)

In [11]:
    with pytest.raises(AssertionError): assert_has_col_scope(captioned)
    assert_has_col_scope(col_scoped := set_col_scope(captioned))
    HTML(col_scoped)

|       | A   | B   | C   |
|-------|-----|-----|-----|
| index |     |     |     |
| 0     | nan | nan | nan |
| 1     | nan | nan | nan |


In [12]:
## 3. Row headers are inside `<th scope="row">` elements.

In [13]:
    def assert_has_row_scope(object, selector="tbody tr th"):
        assert all(th.attrs.get("scope") in {"row", "rowgroup"} for th in soup(object).select(selector)), "<th> is missing `scope='row'`"

In [14]:
    def set_row_scope(object, selector="tbody tr th"):
        for th in (object := soup(object)).select(selector):
            th.attrs.setdefault("scope", "row")
        return str(object)

In [15]:
    with pytest.raises(AssertionError): assert_has_row_scope(col_scoped)
    assert_has_row_scope(row_scoped := set_row_scope(col_scoped))

In [16]:
## 4. Avoid using blank header cells.

name your indexes, otherwise the header cell above the row labels has no text to show.
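
naming can happen on the dataframe before any html is rendered. this is a minimal sketch; `name_index` and its `default` argument are hypothetical and not part of the original notebook.

```python
import pandas

def name_index(df: pandas.DataFrame, default: str = "index") -> pandas.DataFrame:
    """Return a frame whose row index is named, falling back to `default`."""
    if df.index.name is None:
        # rename_axis returns a new frame and leaves the input untouched
        return df.rename_axis(index=default)
    return df

unnamed = pandas.DataFrame({"A": [1, 2]})
named = name_index(unnamed)
```

a descriptive name chosen by the author is better than a generic fallback, since the name becomes the header that a screen reader announces for the row labels.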

In [17]:
    def assert_no_blank_header(body):
        # get_text treats an empty <th></th> as blank instead of raising on a missing string
        assert all(th.get_text(strip=True) for th in soup(body).select("th")), "there is a blank <th>"

In [18]:
    with pytest.raises(AssertionError): assert_no_blank_header(row_scoped)

In [19]:
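    def set_squashed_th(body):
        # the styler renders a named index as two header rows: column labels on top,
        # the index name below; each row is blank where the other one has text.
        table = soup(body)
        tr = bs4.Tag(name="tr")
        col, row = table.select("thead tr")
        for top, bottom in zip(col.select("th"), row.select("th")):
            # keep whichever cell of the pair carries text
            tr.append(top if top.string.strip() else bottom)
        thead = bs4.Tag(name="thead")
        thead.append(tr)
        table.select_one("thead").replace_with(thead)
        return str(table)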

In [20]:
    assert_no_blank_header(squashed_th := set_squashed_th(row_scoped))

In [21]:
## 5. Header cells with text abbreviations that need expansion use the title attribute with the expanded text set as the value.

In [22]:
    def set_header_titles(body, titles):
        for th in (body := soup(body)).select("thead tr th"):
            name = th.string.strip()
            if name in titles:
                th.attrs["title"] = titles[name]
        return str(body)

In [23]:
    titled = set_header_titles(squashed_th, dict(A="apple", B="banana", C="carrot"))
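
the other rules each have an assertion, so a matching check for titles may be useful. this is a minimal sketch; `assert_has_titles` and the `sample` markup are hypothetical and only mirror the pattern of the earlier `assert_*` helpers.

```python
import bs4

def assert_has_titles(body, titles):
    """Every column header whose text is a key of `titles` carries that title."""
    table = bs4.BeautifulSoup(body, features="html.parser")
    for th in table.select("thead tr th"):
        name = th.get_text(strip=True)
        if name in titles:
            assert th.attrs.get("title") == titles[name], f"<th>{name}</th> is missing its title"

# a hand-written header row: A has a title, B does not
sample = (
    "<table><thead><tr>"
    '<th>index</th><th title="apple">A</th><th>B</th>'
    "</tr></thead></table>"
)
assert_has_titles(sample, {"A": "apple"})
```

the check still depends on a mapping supplied by the author, because which headers count as abbreviations can't be derived from the markup.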

In [24]:
    def strip_class_ids(body):
        for e in (body := soup(body)).select("td, th"): 
            e.attrs.pop("id", None)
            e.attrs.pop("class", None)

        return str(body)
    final = strip_class_ids(titled)

## conclusions

for basic dataframes:

* rules 2, 3, and 4 are context free and can be fixed automatically. there are upstream fixes that could be made too.
* rules 1 and 5 are context dependent and can't be autofixed.
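
because rules 2, 3, and 4 need no knowledge of the data, they can be applied together. this is a minimal sketch of such a pass, assuming the two-row header shape seen in this document; `autofix` and the `raw` markup are hypothetical and not part of the original notebook.

```python
import bs4

def autofix(body: str) -> str:
    """Apply the context-free fixes (rules 2, 3 and 4) in one parse."""
    table = bs4.BeautifulSoup(body, features="html.parser")

    # rule 4: with two header rows, keep the non-blank cell of each column
    rows = table.select("thead tr")
    if len(rows) == 2:
        top, bottom = rows
        merged = table.new_tag("tr")
        for upper, lower in zip(top.select("th"), bottom.select("th")):
            merged.append(upper if upper.get_text(strip=True) else lower)
        top.replace_with(merged)
        bottom.decompose()

    # rule 2: column headers get scope="col" unless a scope is already set
    for th in table.select("thead tr th"):
        th.attrs.setdefault("scope", "col")

    # rule 3: row headers get scope="row" unless a scope is already set
    for th in table.select("tbody tr th"):
        th.attrs.setdefault("scope", "row")
    return str(table)

# hand-written markup with the same header layout as a styled, named index
raw = (
    "<table><thead>"
    "<tr><th>&nbsp;</th><th>A</th></tr>"
    "<tr><th>index</th><th>&nbsp;</th></tr>"
    "</thead><tbody><tr><th>0</th><td>nan</td></tr></tbody></table>"
)
fixed = autofix(raw)
```

parsing once and mutating the tree avoids the repeated parse and serialize round trips of chaining the individual `set_*` functions, at the cost of one larger function.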

## the final html

our final dataframe has:

- [x] `<caption>`
- [x] `<th scope="col">`
- [x] `<th scope="row">`
- [x] no empty `<th>`
- [x] `<th title>` for abbreviations
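
the first four items of the checklist can be reported in one pass. this is a minimal sketch; `audit` and the `sample` markup are hypothetical, and the title rule is left out because it needs an author-supplied list of abbreviations.

```python
import bs4

def audit(body: str) -> dict:
    """Report which of the context-free checks a rendered table passes."""
    table = bs4.BeautifulSoup(body, features="html.parser")
    caption = table.select_one("table caption")
    col = table.select("thead tr th")
    row = table.select("tbody tr th")
    return {
        "caption": bool(caption and caption.get_text(strip=True)),
        "col scope": all(th.get("scope") in {"col", "colgroup"} for th in col),
        "row scope": all(th.get("scope") in {"row", "rowgroup"} for th in row),
        "no blank th": all(th.get_text(strip=True) for th in col + row),
    }

# a shortened, hand-written stand-in for the final html
sample = (
    "<table><caption>a value-less dataframe</caption>"
    '<thead><tr><th scope="col">index</th><th scope="col" title="apple">A</th></tr></thead>'
    '<tbody><tr><th scope="row">0</th><td>nan</td></tr></tbody></table>'
)
report = audit(sample)
```

returning a dictionary instead of raising on the first failure makes it possible to list every missing feature of a table at once.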

In [25]:
```html
{{final}}
```

```html
<style type="text/css">
</style>
<table id="T_1111b">
<caption>a value-less dataframe with columns and row indexes</caption>
<thead><tr><th scope="col">index</th><th scope="col" title="apple">A</th><th scope="col" title="banana">B</th><th scope="col" title="carrot">C</th></tr></thead>
<tbody>
<tr>
<th scope="row">0</th>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr>
<th scope="row">1</th>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
</tbody>
</table>

```

* https://pandas.pydata.org/docs/user_guide/style.html
* https://pauljadam.com/demos/data-tables.html
* https://www.w3.org/WAI/tutorials/tables/
* https://developer.mozilla.org/en-US/docs/Learn/HTML/Tables/Advanced
