# Style

This section demonstrates visualization of tabular data using the pandas `style` methods.


## Imports

In [None]:
import pandas as pd
import numpy as np

import ipypandas # enables ipypandas output 


## Styler

### Formatting Values

The [Styler][styler] distinguishes the *display* value from the *actual* value, in both data values and index or column headers. The display value is the string printed in each cell, and the [.format()][formatfunc] and [.format_index()][formatfuncindex] methods control it using either a [format spec string][format] or a callable that takes a single value and returns a string. Formatting can be defined for the whole table, for the index, for individual columns, or for MultiIndex levels. Index labels can also be overwritten with [.relabel_index()][relabelfunc].

Additionally, the format function has a **precision** argument to specifically help format floats, **decimal** and **thousands** separators to support other locales, an **na_rep** argument to display missing data, and **escape** and **hyperlinks** arguments to help display safe HTML or safe LaTeX. The default formatter adopts pandas' global options such as `styler.format.precision`, which can be set temporarily using `with pd.option_context('styler.format.precision', 2):`

[styler]: ../reference/api/pandas.io.formats.style.Styler.rst
[format]: https://docs.python.org/3/library/string.html#format-specification-mini-language
[formatfunc]: ../reference/api/pandas.io.formats.style.Styler.format.rst
[formatfuncindex]: ../reference/api/pandas.io.formats.style.Styler.format_index.rst
[relabelfunc]: ../reference/api/pandas.io.formats.style.Styler.relabel_index.rst

In [None]:
df = pd.DataFrame({
    "strings": ["Adam", "Mike"],
    "ints": [1, 3],
    "floats": [1.123, 1000.23]
})

df.style \
  .format(precision=3, thousands=".", decimal=",") \
  .format_index(str.upper, axis=1) \
  .relabel_index(["row 1", "row 2"], axis=0)


Using Styler to manipulate only the display is useful because the index and data values stay intact for other purposes. You do not have to overwrite your DataFrame to display it how you like. Here is a more comprehensive example of using the formatting functions whilst still relying on the underlying data for indexing and calculations.

<font color="red">
    <ul>
        <li>Crashes on .pipe()</li>
    </ul>
</font>

In [None]:
weather_df = pd.DataFrame(np.random.rand(10,2)*5, 
                          index=pd.date_range(start="2021-01-01", periods=10),
                          columns=["Tokyo", "Beijing"])

def rain_condition(v): 
    if v < 1.75:
        return "Dry"
    elif v < 2.75:
        return "Rain"
    return "Heavy Rain"

def make_pretty(styler):
    styler.set_caption("Weather Conditions")
    styler.format(rain_condition)
    styler.format_index(lambda v: v.strftime("%A"))
    styler.background_gradient(axis=None, vmin=1, vmax=5, cmap="YlGnBu")
    return styler

weather_df.style\
    .pipe(make_pretty)

weather_df

### Hiding Data

The index and column headers can be completely hidden, as can individual rows or columns that one wishes to exclude. Both of these options are performed using the same method.

The index can be hidden from rendering by calling [.hide()][hideidx] without any arguments, which might be useful if your index is integer based. Similarly, column headers can be hidden by calling [.hide(axis="columns")][hideidx] without any further arguments.

Specific rows or columns can be hidden from rendering by calling the same [.hide()][hideidx] method and passing a row/column label, a list-like, or a slice of row/column labels for the ``subset`` argument.

Hiding does not change the integer arrangement of CSS classes, e.g. hiding the first two columns of a DataFrame means the column class indexing will still start at `col2`, since `col0` and `col1` are simply ignored.

[hideidx]: ../reference/api/pandas.io.formats.style.Styler.hide.rst

In [None]:
df = pd.DataFrame(np.random.randn(5, 5))

df.style \
  .hide(subset=[0, 2, 4], axis=0) \
  .hide(subset=[0, 2, 4], axis=1)


### Styling CSS

Below we demonstrate the default output, which looks very similar to the standard DataFrame HTML representation. However, the HTML already has CSS classes attached to each cell, even though we have not yet created any styles. We can view these by calling the [.to_html()][tohtml] method, which returns the raw HTML as a string; this is useful for further processing or for writing to a file (read on in [More about CSS and HTML](#More-About-CSS-and-HTML)). This section also walks through how to convert this default output into a DataFrame output that is more communicative. For example, here is how we can build `s`:

[tohtml]: ../reference/api/pandas.io.formats.style.Styler.to_html.rst

[w3schools]: https://www.w3schools.com/html/html_tables.asp

In [None]:
df = pd.DataFrame([[38.0, 2.0, 18.0, 22.0, 21, np.nan],[19, 439, 6, 452, 226,232]], 
                  index=pd.Index(['Tumour (Positive)', 'Non-Tumour (Negative)'], name='Actual Label:'), 
                  columns=pd.MultiIndex.from_product([['Decision Tree', 'Regression', 'Random'],['Tumour', 'Non-Tumour']], names=['Model:', 'Predicted:']))

df.style


In [None]:
# Hidden cell to just create the below example: code is covered throughout the guide.
s = df.style\
      .hide([('Random', 'Tumour'), ('Random', 'Non-Tumour')], axis='columns')\
      .format('{:.0f}')\
      .set_table_styles([{
        'selector': '',
        'props':  'border-collapse: separate;'
      },{
        'selector': 'caption',
        'props': 'caption-side: bottom; font-size:1.3em;'
      },{
        'selector': '.index_name',
        'props': 'font-style: italic; color: darkgrey; font-weight:normal;'
      },{
        'selector': 'th:not(.index_name)',
        'props': 'background-color: #000066; color: white;'
      },{
        'selector': 'th.col_heading',
        'props': 'text-align: center;'
      },{
        'selector': 'th.col_heading.level0',
        'props': 'font-size: 1.5em;'
      },{
        'selector': 'th.col2',
        'props': 'border-left: 1px solid white;'
      },{
        'selector': '.col2',
        'props': 'border-left: 1px solid #000066;'
      },{
        'selector': 'td',
        'props': 'text-align: center; font-weight:bold;'
      },{
        'selector': '.true',
        'props': 'background-color: #e6ffe6;'
      },{
        'selector': '.false',
        'props': 'background-color: #ffe6e6;'
      },{
        'selector': '.border-red',
        'props': 'border: 2px dashed red;'
      },{
        'selector': '.border-green',
        'props': 'border: 2px dashed green;'
      },{
        'selector': 'td:hover',
        'props': 'background-color: #ffffb3;'
      }])\
      .set_td_classes(pd.DataFrame([['true border-green', 'false', 'true', 'false border-red', '', ''],
                                    ['false', 'true', 'false', 'true', '', '']], 
                                    index=df.index, columns=df.columns))\
      .set_caption("Confusion matrix for multiple cancer prediction models.")\
      .set_tooltips(pd.DataFrame([['This model has a very strong true positive rate', '', '', "This model's total number of false negatives is too high", '', ''],
                                    ['', '', '', '', '', '']], 
                                    index=df.index, columns=df.columns),
                   css_class='pd-tt', props=
    'visibility: hidden; position: absolute; z-index: 1; border: 1px solid #000066;'
    'background-color: white; color: #000066; font-size: 0.8em;' 
    'transform: translate(0px, -24px); padding: 0.6em; border-radius: 0.5em;')

s


## Functions

### Content

Use the following methods to pass your style functions. Both methods take a function (and some other keyword arguments) and apply it to the DataFrame in a certain way, rendering CSS styles.

- [.map()][map] (elementwise): accepts a function that takes a single value and returns a string with the CSS attribute-value pair.
- [.apply()][apply] (column-/row-/table-wise): accepts a function that takes a Series or DataFrame and returns a Series, DataFrame, or numpy array with an identical shape where each element is a string with a CSS attribute-value pair. This method passes each column or row of your DataFrame one-at-a-time or the entire table at once, depending on the `axis` keyword argument. For columnwise use `axis=0`, rowwise use `axis=1`, and for the entire table at once use `axis=None`.

This method is powerful for applying multiple, complex logic to data cells. We create a new DataFrame to demonstrate this.

[apply]: ../reference/api/pandas.io.formats.style.Styler.apply.rst
[map]: ../reference/api/pandas.io.formats.style.Styler.map.rst

In [None]:
np.random.seed(0)

df2 = pd.DataFrame(np.random.randn(10,4), columns=['A','B','C','D'])
df2.style


For example we can build a function that colors text if it is negative, and chain this with a function that partially fades cells of negligible value. Since this looks at each element in turn we use ``map``.

In [None]:
def style_negative(v, props=''):
    return props if v < 0 else None
    
s2 = df2.style\
    .map(style_negative, props='color:red;')\
    .map(lambda v: 'opacity: 20%;' if (v < 0.3) and (v > -0.3) else None)
s2


In [None]:
# Hidden cell to avoid CSS clashes and later code overriding previous formatting
s2.set_uuid('after_applymap')


We can also build a function that highlights the maximum value across rows, columns, or the entire DataFrame. In this case we use ``apply``. Below we highlight the maximum in each column.

In [None]:
def highlight_max(s, props=''):
    return np.where(s == np.nanmax(s.values), props, '')
    
s2.apply(highlight_max, props='color:white;background-color:darkblue', axis=0)


In [None]:
# Hidden cell to avoid CSS clashes and later code overriding previous formatting
s2.set_uuid('after_apply')


We can use the same function across the different axes, highlighting here the DataFrame maximum in purple, and row maximums in pink.

In [None]:
s2.apply(highlight_max, props='color:white;background-color:pink;', axis=1)\
  .apply(highlight_max, props='color:white;background-color:purple', axis=None)


In [None]:
# Hidden cell to avoid CSS clashes and later code overriding previous formatting
s2.set_uuid('after_apply_again')


This last example shows how some styles have been overwritten by others. In general the most recent style applied is active but you can read more in the [section on CSS hierarchies](#CSS-Hierarchies). You can also apply these styles to more granular parts of the DataFrame - read more in section on [subset slicing](#Finer-Control-with-Slicing).

It is possible to replicate some of this functionality using just classes, but it can be more cumbersome. See [item 3) of Optimization](#Optimization).

<div class="alert alert-info">

*Debugging Tip*: If you're having trouble writing your style function, try just passing it into ``DataFrame.apply``. Internally, ``Styler.apply`` uses ``DataFrame.apply`` so the result should be the same, and with ``DataFrame.apply`` you will be able to inspect the CSS string output of your intended function in each cell.

</div>
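
The debugging tip can be tried as follows. This sketch reuses the `highlight_max` function from above on a hypothetical two-row frame named `debug_df`, and prints the CSS strings instead of rendering them.

```python
import numpy as np
import pandas as pd

def highlight_max(s, props=""):
    # Same function as in the example above: CSS for the maximum, "" elsewhere.
    return np.where(s == np.nanmax(s.values), props, "")

# Hypothetical small frame for inspection.
debug_df = pd.DataFrame({"A": [1.0, 3.0], "B": [4.0, 2.0]})

# DataFrame.apply returns the CSS strings themselves instead of rendering them.
css = debug_df.apply(highlight_max, props="color:red;", axis=0)
print(css)
```

The result has the same shape and labels as `debug_df`, which is the contract `Styler.apply` expects from a style function.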

### Headers

Similar application is achieved for headers by using:
    
- [.map_index()][mapindex] (elementwise): accepts a function that takes a single value and returns a string with the CSS attribute-value pair.
- [.apply_index()][applyindex] (level-wise): accepts a function that takes a Series and returns a Series, or numpy array with an identical shape where each element is a string with a CSS attribute-value pair. This method passes each level of your Index one-at-a-time. To style the index use `axis=0` and to style the column headers use `axis=1`.

You can select a `level` of a `MultiIndex` but currently no similar `subset` application is available for these methods.

[applyindex]: ../reference/api/pandas.io.formats.style.Styler.apply_index.rst
[mapindex]: ../reference/api/pandas.io.formats.style.Styler.map_index.rst

<font color="red">
    <ul>
        <li>Column changes color after sorting</li>
    </ul>
</font>

In [None]:
s2.map_index(lambda v: "color:pink;" if v>4 else "color:darkblue;", axis=0)
s2.apply_index(lambda s: np.where(s.isin(["A", "B"]), "color:pink;", "color:darkblue;"), axis=1)

### Slicing


The examples we have shown so far for the `Styler.apply` and `Styler.map` functions have not demonstrated the use of the ``subset`` argument. This is a useful argument which permits a lot of flexibility: it allows you to apply styles to specific rows or columns, without having to code that logic into your `style` function.

The value passed to `subset` behaves similarly to slicing a DataFrame:

- A scalar is treated as a column label
- A list (or Series or NumPy array) is treated as multiple column labels
- A tuple is treated as `(row_indexer, column_indexer)`

Consider using `pd.IndexSlice` to construct the tuple for the last one. We will create a MultiIndexed DataFrame to demonstrate the functionality.

In [None]:
df3 = pd.DataFrame(np.random.randn(4,4), 
                   pd.MultiIndex.from_product([['A', 'B'], ['r1', 'r2']]),
                   columns=['c1','c2','c3','c4'])
df3

We will use subset to highlight the maximum in the third and fourth columns with red text. We will highlight the subset sliced region in yellow.

In [None]:
slice_ = ['c3', 'c4']
df3.style.apply(highlight_max, props='color:red;', axis=0, subset=slice_)\
         .set_properties(**{'background-color': '#ffffb3'}, subset=slice_)

When combined with ``IndexSlice`` as suggested, `subset` can index across both dimensions with greater flexibility.

In [None]:
idx = pd.IndexSlice
slice_ = idx[idx[:,'r1'], idx['c2':'c4']]
df3.style.apply(highlight_max, props='color:red;', axis=0, subset=slice_)\
         .set_properties(**{'background-color': '#ffffb3'}, subset=slice_)

This also provides the flexibility to subselect rows when used with `axis=1`.

In [None]:
slice_ = idx[idx[:,'r2'], :]
df3.style.apply(highlight_max, props='color:red;', axis=1, subset=slice_)\
         .set_properties(**{'background-color': '#ffffb3'}, subset=slice_)

There is also scope to provide **conditional filtering**. 

Suppose we want to highlight the maximum across columns 2 and 4 only in the case that the sum of columns 1 and 3 is less than -2.0 *(essentially excluding rows* `(:,'r2')`*)*.

In [None]:
slice_ = idx[idx[(df3['c1'] + df3['c3']) < -2.0], ['c2', 'c4']]
df3.style.apply(highlight_max, props='color:red;', axis=1, subset=slice_)\
         .set_properties(**{'background-color': '#ffffb3'}, subset=slice_)

Only label-based slicing is supported right now, not positional, and not callables.

If your style function uses a `subset` or `axis` keyword argument, consider wrapping your function in a `functools.partial`, partialing out that keyword.

```python
import functools
my_func2 = functools.partial(my_func, subset=42)
```

## Builtin Styles

Some styling functions are common enough that we've "built them in" to the `Styler`, so you don't have to write them and apply them yourself. The current list of such functions is:

 - [.highlight_null][nullfunc]: for use with identifying missing data. 
 - [.highlight_min][minfunc] and [.highlight_max][maxfunc]: for use with identifying extremities in data.
 - [.highlight_between][betweenfunc] and [.highlight_quantile][quantilefunc]: for use with identifying classes within data.
 - [.background_gradient][bgfunc]: a flexible method for highlighting cells based on their, or other, values on a numeric scale.
 - [.text_gradient][textfunc]: similar method for highlighting text based on their, or other, values on a numeric scale.
 - [.bar][barfunc]: to display mini-charts within cell backgrounds.
 
The individual documentation on each function often gives more examples of their arguments.

[nullfunc]: ../reference/api/pandas.io.formats.style.Styler.highlight_null.rst
[minfunc]: ../reference/api/pandas.io.formats.style.Styler.highlight_min.rst
[maxfunc]: ../reference/api/pandas.io.formats.style.Styler.highlight_max.rst
[betweenfunc]: ../reference/api/pandas.io.formats.style.Styler.highlight_between.rst
[quantilefunc]: ../reference/api/pandas.io.formats.style.Styler.highlight_quantile.rst
[bgfunc]: ../reference/api/pandas.io.formats.style.Styler.background_gradient.rst
[textfunc]: ../reference/api/pandas.io.formats.style.Styler.text_gradient.rst
[barfunc]: ../reference/api/pandas.io.formats.style.Styler.bar.rst

### Highlight Null

In [None]:
df2.iloc[0,2] = np.nan
df2.iloc[4,3] = np.nan
df2.loc[:4].style.highlight_null(color='yellow')

### Highlight Min or Max

In [None]:
df2.loc[:4].style.highlight_max(axis=1, props='color:white; font-weight:bold; background-color:darkblue;')

### Highlight Between

This method accepts ranges as floats, NumPy arrays, or Series, provided the indexes match.

In [None]:
left = pd.Series([1.0, 0.0, 1.0], index=["A", "B", "D"])
df2.loc[:4].style.highlight_between(left=left, right=1.5, axis=1, props='color:white; background-color:purple;')

### Highlight Quantile

Useful for detecting the highest or lowest percentile values.

In [None]:
df2.loc[:4].style.highlight_quantile(q_left=0.85, axis=None, color='yellow')

### Set properties

Use `Styler.set_properties` when the style doesn't actually depend on the values. This is just a simple wrapper for `.map` where the function returns the same properties for all cells.

In [None]:
df2.loc[:4].style.set_properties(**{'background-color': 'black',
                           'color': 'lawngreen',
                           'border-color': 'white'})

### Bar charts

You can include "bar charts" in your DataFrame.

In [None]:
df2.style.bar(subset=['A', 'B'], color='#d65f5f')

## Limitations

- DataFrame only (use `Series.to_frame().style`)
- The index and columns do not need to be unique, but certain styling functions can only work with unique indexes.
- You can only apply styles, you can't insert new HTML entities, except via subclassing.