# rendering dataframes for screen readers with `pandas`

in this document we'll explore what it takes to make `pandas.DataFrame`s accessible.
we'll follow [Paul J Adam's](https://pauljadam.com) instructions for making [Simple Data Tables](https://pauljadam.com/demos/data-tables.html#heading1).

In [1]:
    
    %pip install pandas beautifulsoup4 pidgy pytest jinja2
    %reload_ext pidgy
    import pandas.io.formats.style, bs4, pytest
    shell.weave.reactive = False
    shell.weave.use_async = False
    soup = lambda x: bs4.BeautifulSoup(x, features="html.parser")

In [2]:
    (df := pandas.DataFrame(
        columns=pandas.Index(list("ABC")), 
        index=pandas.Index(range(2), name="index"))
    ).style.set_caption("the basic dataframe <var>df</var> we use for explanation in this document. ")

| | A | B | C |
|---|---|---|---|
| index | | | |
| 0 | nan | nan | nan |
| 1 | nan | nan | nan |


In [3]:
`df` is a basic `pandas.DataFrame` because:    
    
* it has ONE row level 
* it has ONE column level 

we do not need to consider the contents of the cells for the recommendations we are implementing.

In [4]:
## applying best practices to pandas

In [5]:
### 1. Title of data table is inside the `<caption>` element.

the caption depends on the data and affects the visual appearance of the table.
it is context dependent, so it is up to the author to supply it.

In [6]:
    def assert_has_caption(object):
        caption = soup(object).select_one("table caption")
        # get_text also handles captions with nested tags, where `.string` would be None
        assert caption is not None and caption.get_text(strip=True), "table is missing a <caption>"
    # with pytest.raises(AssertionError): assert_has_caption(df._repr_html_())
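a caption can hold nested markup (a markdown caption renders as `<p>` and `<code>`), so a check that reads all of the text inside `<caption>` is more forgiving. the sketch below is a dependency-free variation that swaps `bs4` for the standard library `html.parser`; `CaptionText` and `caption_text` are hypothetical names, not part of `pandas` or `bs4`.

```python
from html.parser import HTMLParser


class CaptionText(HTMLParser):
    """collect every piece of text found inside <caption> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # greater than zero while we are inside a <caption>
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "caption":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "caption" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def caption_text(html):
    """return the whitespace-normalized caption text, or "" when there is none."""
    parser = CaptionText()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


# hypothetical markup, shaped like a caption written in markdown
nested = "<table><caption><p>the <code>df</code> frame</p>\n</caption></table>"
assert caption_text(nested) == "the df frame"
assert caption_text("<table><tr><td>1</td></tr></table>") == ""
```

an empty string is falsy, so `assert caption_text(html)` behaves like the caption assertion above.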

the caption can be set using the `pandas.DataFrame.style.set_caption` method.
the result is a `pandas.io.formats.style.Styler` object that lets us modify how the
instance is displayed.

In [7]:
`set_caption` uses the `pandas.DataFrame.style` attribute to set a caption

In [8]:
    def set_caption(df, caption) -> pandas.io.formats.style.Styler:
        return df.style.set_caption(caption)
    
after we have a styler we are working with a subset of pandas operations.
the styler should be the last stop for the data.

In [9]:
the `<table>` below demonstrates how the `<caption>` appears on a `captioned` `pandas.DataFrame`.

{{captioned}}
    
    assert_has_caption(
        captioned := set_caption(df, "a value-less dataframe with columns and row indexes")._repr_html_()
    );

<style type="text/css">
</style>
<table id="T_595b7">
  <caption>a value-less dataframe with columns and row indexes</caption>
  <thead>
    <tr>
      <th class="blank level0" >&nbsp;</th>
      <th id="T_595b7_level0_col0" class="col_heading level0 col0" >A</th>
      <th id="T_595b7_level0_col1" class="col_heading level0 col1" >B</th>
      <th id="T_595b7_level0_col2" class="col_heading level0 col2" >C</th>
    </tr>
    <tr>
      <th class="index_name level0" >index</th>
      <th class="blank col0" >&nbsp;</th>
      <th class="blank col1" >&nbsp;</th>
      <th class="blank col2" >&nbsp;</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th id="T_595b7_level0_row0" class="row_heading level0 row0" >0</th>
      <td id="T_595b7_row0_col0" class="data row0 col0" >nan</td>
      <td id="T_595b7_row0_col1" class="data row0 col1" >nan</td>
      <td id="T_595b7_row0_col2" class="data row0 col2" >nan</td>
    </tr>
    <tr>
      <th id="T_595b7_level0_row1" class="row_heading level0 row1" >1</th>
      <td id="T_595b7_row1_col0" class="data row1 col0" >nan</td>
      <td id="T_595b7_row1_col1" class="data row1 col1" >nan</td>
      <td id="T_595b7_row1_col2" class="data row1 col2" >nan</td>
    </tr>
  </tbody>
</table>

    
In [10]:
#### `pandas.io.formats.style.Styler`

`pandas.io.formats.style.Styler` gives a different html representation than the dataframe's `_repr_html_`

    assert df.style.to_html() != df.to_html()


In [11]:
### 2. Column headers are inside `<th scope="col">` elements.

the `scope` attribute improves navigation for screen readers when used properly.

In [12]:
    def assert_has_col_scope(object, selector="thead tr th"):
for basic frames, all the `<th>` tags in `<thead>` should have `scope="col"`
        
        assert all(th.attrs.get("scope") in {"col", "colgroup"} for th in soup(object).select(selector)), "<th> is missing `scope='col'`"
    # with pytest.raises(AssertionError): assert_has_col_scope(captioned)

In [13]:
    def set_col_scope(object, selector="thead tr th"):
`set_col_scope` automatically remediates missing column `scope`s
        
        for th in (object := soup(object)).select(selector):
            th.attrs.setdefault("scope", "col")
        return str(object)

In [14]:
the `scope` attribute has no visual effect; however, we can use `assert_has_col_scope` to verify the scope is correct.

    assert_has_col_scope(col_scoped := set_col_scope(captioned));

In [15]:
### 3. Row headers are inside `<th scope="row">` elements.

similar to `scope="col"`, the `<th>` elements in the body require `scope="row"`;
basically every `<th>` needs the `scope` attribute.

In [16]:
    def assert_has_row_scope(object, selector="tbody tr th"):
        assert all(th.attrs.get("scope") in {"row", "rowgroup"} for th in soup(object).select(selector)),\
        "<th> is missing `scope='row'`"
    # with pytest.raises(AssertionError): assert_has_row_scope(col_scoped)

In [17]:
    def set_row_scope(object, selector="tbody tr th, tfoot tr th"):
like the columns, we can deterministically add `scope="row"`

        for th in (object := soup(object)).select(selector):
            th.attrs.setdefault("scope", "row")
        return str(object)

In [18]:
we use `assert_has_row_scope` to verify the `scope` because, again, there aren't any visual effects to these changes.

    assert_has_row_scope(row_scoped := set_row_scope(col_scoped))
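the two scope passes can also be sketched with only the standard library. this assumes a small, well-formed table written by hand: `xml.etree.ElementTree` rejects entities like `&nbsp;` that `pandas` emits, so it is not a drop-in replacement for the `bs4` functions above. `add_scopes` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def add_scopes(markup):
    """add scope="col" to header cells in <thead> and scope="row" to those in <tbody>."""
    table = ET.fromstring(markup)
    for section, scope in (("thead", "col"), ("tbody", "row")):
        for th in table.findall(f"./{section}/tr/th"):
            # keep any scope an author already set, like `setdefault` does
            if th.get("scope") is None:
                th.set("scope", scope)
    return ET.tostring(table, encoding="unicode")


# hypothetical hand-written table, not pandas output
markup = (
    "<table><thead><tr><th>index</th><th>A</th></tr></thead>"
    "<tbody><tr><th>0</th><td>nan</td></tr></tbody></table>"
)
scoped = ET.fromstring(add_scopes(markup))
assert [th.get("scope") for th in scoped.findall("./thead/tr/th")] == ["col", "col"]
assert [th.get("scope") for th in scoped.findall("./tbody/tr/th")] == ["row"]
```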

In [19]:
### 4. Avoid using blank header cells.

this instruction yields a best practice: __name the dataframe index__.
without a name, the index `<th>` will always be empty.

In [20]:
    def assert_no_blank_header(body):
        assert all(th.get_text(strip=True) for th in soup(body).select("th")), "there is a blank <th>"

In [21]:
    # with pytest.raises(AssertionError): assert_no_blank_header(row_scoped)
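`pandas` fills its blank header cells with `&nbsp;`, which decodes to a non-breaking space. python's `str.strip` treats that character as whitespace, which is why those cells count as blank. a minimal illustration using only the standard library (`is_blank` is a hypothetical helper):

```python
import html


def is_blank(cell_text):
    """true when a header cell has no visible text, including a lone &nbsp;."""
    return not html.unescape(cell_text).strip()


assert is_blank("&nbsp;")
assert is_blank("   ")
assert not is_blank("index")
```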

In [22]:
`pandas` requires some upstream work to satisfy this instruction.

In [23]:
    def set_squashed_th(body):
`set_squashed_th` squashes the table column names. this method only works for the most basic dataframes.
        
        table = soup(body)
        tr = bs4.Tag(name="tr")
        col, row = table.select("thead tr")
        for top, bottom in zip(col.select("th"), row.select("th")):
            tr.append(top if top.string.strip() else bottom)
        thead = bs4.Tag(name="thead"); thead.append(tr)
        table.select_one("thead").replace_with(thead)
        return str(table)
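the squashing rule is easier to see on plain lists of header text. the sketch below mirrors the choice made in `set_squashed_th` (keep the top cell unless it is blank, otherwise take the bottom cell) without any html. `squash_headers` is a hypothetical name, and unlike `zip` in the function above it refuses rows of different lengths.

```python
def squash_headers(top, bottom):
    """merge two header rows, preferring the top cell unless it is blank."""
    if len(top) != len(bottom):
        raise ValueError("both header rows need the same number of cells")
    return [t if t.strip() else b for t, b in zip(top, bottom)]


# "\xa0" is the non-breaking space that blank header cells decode to
top = ["\xa0", "A", "B", "C"]
bottom = ["index", "\xa0", "\xa0", "\xa0"]
assert squash_headers(top, bottom) == ["index", "A", "B", "C"]
```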

In [24]:
the frame below doesn't have empty `<th>` elements and is denser.

{{squashed_th}}
    
    assert_no_blank_header(squashed_th := set_squashed_th(row_scoped))

<style type="text/css">
</style>
<table id="T_595b7">
<caption>a value-less dataframe with columns and row indexes</caption>
<thead><tr><th class="index_name level0" scope="col">index</th><th class="col_heading level0 col0" id="T_595b7_level0_col0" scope="col">A</th><th class="col_heading level0 col1" id="T_595b7_level0_col1" scope="col">B</th><th class="col_heading level0 col2" id="T_595b7_level0_col2" scope="col">C</th></tr></thead>
<tbody>
<tr>
<th class="row_heading level0 row0" id="T_595b7_level0_row0" scope="row">0</th>
<td class="data row0 col0" id="T_595b7_row0_col0">nan</td>
<td class="data row0 col1" id="T_595b7_row0_col1">nan</td>
<td class="data row0 col2" id="T_595b7_row0_col2">nan</td>
</tr>
<tr>
<th class="row_heading level0 row1" id="T_595b7_level0_row1" scope="row">1</th>
<td class="data row1 col0" id="T_595b7_row1_col0">nan</td>
<td class="data row1 col1" id="T_595b7_row1_col1">nan</td>
<td class="data row1 col2" id="T_595b7_row1_col2">nan</td>
</tr>
</tbody>
</table>

    
In [25]:
### 5. Header cells with text abbreviations that need expansion use the title attribute with the expanded text set as the value.

the expansion of abbreviated column names is context specific.
authors have to add this information themselves.

In [26]:
    titles = dict(A="apple", B="banana", C="carrot")
    def set_header_titles(body, titles=titles):
`set_header_titles` is a function that writes the expanded text for each abbreviation into the `title` attribute.
        
        for th in (body := soup(body)).select("thead tr th"):
            name = th.string.strip()
            if name in titles:
                th.attrs["title"] = titles[name]
        return str(body)
    
> a real dataset would make more sense in this example.

In [27]:
    titled = set_header_titles(squashed_th, titles)
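as a dependency-free sketch of the same idea, the snippet below adds `title` attributes to a hand-written header row with `xml.etree.ElementTree`. the `add_titles` helper and its `expansions` argument are hypothetical, and the markup is assumed to be well-formed xml.

```python
import xml.etree.ElementTree as ET


def add_titles(markup, expansions):
    """set a title attribute on every <th> whose text is a known abbreviation."""
    root = ET.fromstring(markup)
    for th in root.iter("th"):
        name = (th.text or "").strip()
        if name in expansions:
            th.set("title", expansions[name])
    return ET.tostring(root, encoding="unicode")


# hypothetical header row, shaped like the squashed header above
row = '<tr><th scope="col">index</th><th scope="col">A</th></tr>'
titled_row = ET.fromstring(add_titles(row, {"A": "apple"}))
assert [th.get("title") for th in titled_row.iter("th")] == [None, "apple"]
```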

In [28]:
    def strip_class_ids(body):
`pandas` adds classes and ids to elements that for the sake of this discussion are superfluous.

        for e in (body := soup(body)).select("td, th"): 
            e.attrs.pop("id", None)
            e.attrs.pop("class", None)

        return str(body)
    final = strip_class_ids(titled)

In [29]:
## inconsistencies in labelled indexes

everything goes to hell with `named_column`, which has a column name.

    named_column = df.copy()
    named_column.columns.name = "letters"

In [30]:
### column and index names

consider the case of `named_column`, where both `named_column.columns` and `named_column.index` are named.
    
{% set df = named_column.style.set_caption(pidgy.filters.md("the `named_column` dataframe with a column name")) %}
{{df}}

screen reader visitors may struggle to interpret the meaning of "letters" relative to the index.
ensuring a proper experience for screen readers will require extra markup to group the `<th>` with the columns.

<style type="text/css">
</style>
<table id="T_8b93e">
  <caption><p>the <code>named_column</code> dataframe with a column name</p>
</caption>
  <thead>
    <tr>
      <th class="index_name level0" >letters</th>
      <th id="T_8b93e_level0_col0" class="col_heading level0 col0" >A</th>
      <th id="T_8b93e_level0_col1" class="col_heading level0 col1" >B</th>
      <th id="T_8b93e_level0_col2" class="col_heading level0 col2" >C</th>
    </tr>
    <tr>
      <th class="index_name level0" >index</th>
      <th class="blank col0" >&nbsp;</th>
      <th class="blank col1" >&nbsp;</th>
      <th class="blank col2" >&nbsp;</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th id="T_8b93e_level0_row0" class="row_heading level0 row0" >0</th>
      <td id="T_8b93e_row0_col0" class="data row0 col0" >nan</td>
      <td id="T_8b93e_row0_col1" class="data row0 col1" >nan</td>
      <td id="T_8b93e_row0_col2" class="data row0 col2" >nan</td>
    </tr>
    <tr>
      <th id="T_8b93e_level0_row1" class="row_heading level0 row1" >1</th>
      <td id="T_8b93e_row1_col0" class="data row1 col0" >nan</td>
      <td id="T_8b93e_row1_col1" class="data row1 col1" >nan</td>
      <td id="T_8b93e_row1_col2" class="data row1 col2" >nan</td>
    </tr>
  </tbody>
</table>


In [31]:
### column name and no index name

    named_column_no_index = named_column.copy()
    named_column_no_index.index.name = None
    
{% set df = named_column_no_index.style.set_caption(pidgy.filters.md("the `named_column_no_index` dataframe with a column name and without an index name")) %}
{{df}}

in this configuration, it is possible for a screen reader to misinterpret `letters` as the name of the index column.
when the column index is named, like `letters`, the entry should be `<th scope="row">`.
in this example we can see how instructions 2 and 3 differ.
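one way to act on this is to scope the cells of a header row by the classes `pandas` writes. the sketch below is an assumption rather than a `pandas` feature: it treats an `index_name` cell that sits beside `col_heading` cells as the label for that row of headings and gives it `scope="row"`; every other header cell gets `scope="col"`. `scope_header_row` is a hypothetical helper, and it expects well-formed markup without `&nbsp;`.

```python
import xml.etree.ElementTree as ET


def scope_header_row(tr):
    """assign a scope to each <th> of one <thead> row, in place."""
    cells = tr.findall("th")
    has_col_headings = any(
        "col_heading" in th.get("class", "").split() for th in cells
    )
    for th in cells:
        classes = th.get("class", "").split()
        # a name cell beside column headings labels that row of headings
        if "index_name" in classes and has_col_headings:
            th.set("scope", "row")
        else:
            th.set("scope", "col")


# hypothetical header row, shaped like the `named_column_no_index` table
tr = ET.fromstring(
    '<tr><th class="index_name level0">letters</th>'
    '<th class="col_heading level0 col0">A</th>'
    '<th class="col_heading level0 col1">B</th></tr>'
)
scope_header_row(tr)
assert [th.get("scope") for th in tr] == ["row", "col", "col"]
```

a row that only names the index (an `index_name` cell followed by blank cells) has no `col_heading` siblings, so its cells keep `scope="col"`.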

<style type="text/css">
</style>
<table id="T_3d1a0">
  <caption><p>the <code>named_column_no_index</code> dataframe with a column name and without an index name</p>
</caption>
  <thead>
    <tr>
      <th class="index_name level0" >letters</th>
      <th id="T_3d1a0_level0_col0" class="col_heading level0 col0" >A</th>
      <th id="T_3d1a0_level0_col1" class="col_heading level0 col1" >B</th>
      <th id="T_3d1a0_level0_col2" class="col_heading level0 col2" >C</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th id="T_3d1a0_level0_row0" class="row_heading level0 row0" >0</th>
      <td id="T_3d1a0_row0_col0" class="data row0 col0" >nan</td>
      <td id="T_3d1a0_row0_col1" class="data row0 col1" >nan</td>
      <td id="T_3d1a0_row0_col2" class="data row0 col2" >nan</td>
    </tr>
    <tr>
      <th id="T_3d1a0_level0_row1" class="row_heading level0 row1" >1</th>
      <td id="T_3d1a0_row1_col0" class="data row1 col0" >nan</td>
      <td id="T_3d1a0_row1_col1" class="data row1 col1" >nan</td>
      <td id="T_3d1a0_row1_col2" class="data row1 col2" >nan</td>
    </tr>
  </tbody>
</table>


## conclusions

In [32]:
for basic dataframes, two practices can be enforced without knowledge of the data:

- [x] Column headers are inside `<th scope="col">` elements.
- [x] Row headers are inside `<th scope="row">` elements.
    
    
the abbreviations and caption are context specific and require knowledge of the data:

- [ ] Title of data table is inside the `<caption>` element.
- [ ] Header cells with text abbreviations that need expansion use the title attribute with the expanded text set as the value.
    
with `pandas<=1.4.2`, it is hard to avoid blank header cells without some significant effort.

    
- [ ] Avoid using blank header cells.

    
some conventions we can extract from this study are:
    
* treat the `df.index` as a column that needs to be named
* if an index is superfluous then remove it. this can be done with the styler `df.style.hide(axis=0)`

In [33]:
### final frame

our final dataframe has:

- [x] `<caption>`
- [x] `<th scope="col">`
- [x] `<th scope="row">`
- [x] no empty `<th>`
- [x] `<th title>` for abbreviations

<style type="text/css">
</style>
<table id="T_595b7">
<caption>a value-less dataframe with columns and row indexes</caption>
<thead><tr><th scope="col">index</th><th scope="col" title="apple">A</th><th scope="col" title="banana">B</th><th scope="col" title="carrot">C</th></tr></thead>
<tbody>
<tr>
<th scope="row">0</th>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr>
<th scope="row">1</th>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
</tbody>
</table>



### final html source

```html
<style type="text/css">
</style>
<table id="T_595b7">
<caption>a value-less dataframe with columns and row indexes</caption>
<thead><tr><th scope="col">index</th><th scope="col" title="apple">A</th><th scope="col" title="banana">B</th><th scope="col" title="carrot">C</th></tr></thead>
<tbody>
<tr>
<th scope="row">0</th>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr>
<th scope="row">1</th>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
</tbody>
</table>

```

In [34]:
### about the `scope` attribute

<q cite="https://www.w3schools.com/tags/att_scope.asp">The scope attribute specifies whether a header cell is a header for a column, row, or group of columns or rows.</q>

<q cite="https://dequeuniversity.com/rules/axe/4.0/scope-attr-valid">The scope attribute makes table navigation much easier for screen reader users, provided that it is used correctly. Incorrectly used, scope can make table navigation much harder and less efficient.</q>

### links

* https://pandas.pydata.org/docs/user_guide/style.html
* https://pauljadam.com/demos/data-tables.html
* https://www.w3.org/WAI/tutorials/tables/
* https://developer.mozilla.org/en-US/docs/Learn/HTML/Tables/Advanced
