# Some Plotting Tricks

## Paths

When you analyze data in a notebook, it is important to define all necessary paths up front. I use two dictionaries to define them:
- **IN_PATHS** - the paths where the notebook's input data are stored
- **OUT_PATHS** - the paths where the results are stored

A key in either dictionary may have the `_dir` suffix. The suffix tells me that the corresponding path should be treated as a path to a directory rather than to a file.

In [None]:
IN_PATHS = {
    'timeseries_file' : './input_data/timeseries.csv',
    'input_dir' : './input_data/',
}

OUT_PATHS = {
    'results_dir' : './results/csvs/',
    'figs_dir' : '../paper/figures/',
}

I have an auxiliary function that checks all the paths. For each path in `in_paths`, it checks whether the path exists and warns the user if it does not. For each path in `out_paths`, the function recursively creates all intermediate directories. Thus, when I save data later, the corresponding functions do not complain that the path is not found.

In [None]:
def check_paths(in_paths, out_paths):
    import os

    for pth_key in in_paths:
        pth = in_paths[pth_key]
        if not os.path.exists(pth):
            print(f'Path [{pth}] does not exist')
        if pth_key.endswith('_dir') and (not os.path.isdir(pth)):
            print(f'Path [{pth}] does not correspond to a directory!')

    for pth_key in out_paths:
        pth = out_paths[pth_key]
        if pth_key.endswith('_dir'):
            abs_path = os.path.abspath(pth)
        else:
            abs_path = os.path.abspath(os.path.dirname(pth))
        if not os.path.exists(abs_path):
            print(f'Creating path: [{abs_path}]')
            os.makedirs(abs_path)

check_paths(IN_PATHS, OUT_PATHS)
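
The `_dir` convention for output paths can also be tried in isolation. The sketch below is only an illustration: it works in a temporary directory, and `dir_to_create`, `figs_dir` and `summary_file` are hypothetical names that are not part of this notebook. The helper mirrors the branch in `check_paths`: a key ending in `_dir` is created as given, while any other key is treated as a file whose parent directory is created.

```python
import os
import tempfile


def dir_to_create(key: str, path: str) -> str:
    """Hypothetical helper: return the directory that must exist for `path`."""
    if key.endswith('_dir'):
        return os.path.abspath(path)
    return os.path.abspath(os.path.dirname(path))


with tempfile.TemporaryDirectory() as tmp:
    demo_out_paths = {
        'figs_dir': os.path.join(tmp, 'paper', 'figures'),
        'summary_file': os.path.join(tmp, 'results', 'summary.csv'),
    }
    for key, path in demo_out_paths.items():
        # exist_ok=True makes the call safe to repeat
        os.makedirs(dir_to_create(key, path), exist_ok=True)

    created_figs = os.path.isdir(os.path.join(tmp, 'paper', 'figures'))
    created_results = os.path.isdir(os.path.join(tmp, 'results'))
    # only the parent directory is created, never the file itself
    summary_is_missing = not os.path.exists(demo_out_paths['summary_file'])

print(created_figs, created_results, summary_is_missing)
```

Passing `exist_ok=True` to `os.makedirs` is one possible alternative to the explicit `os.path.exists` check used in `check_paths`.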

### Preprocessing and Loading Data

In [None]:
# imports
import os
import pandas as pd

In [None]:
# loading data
# got the data file from https://github.com/plotly/datasets/blob/master/timeseries.csv
ts_df = pd.read_csv(IN_PATHS['timeseries_file'])
ts_df['Date'] = pd.to_datetime(ts_df.Date)
#ts_df.info()

## Illustrative Example

In [None]:
# importing matplotlib functions
import matplotlib
import matplotlib.pyplot as plt
from matplotlib.pyplot import subplots

In [None]:
# I use 'Roboto' font family
# matplotlib.rcParams['font.family'] = ['Roboto', 'sans-serif']

I use the following code to plot the data loaded into the pandas dataframe. The `x` axis shows the time, while the `y` axis shows the values of the time series.

In [None]:
fig, ax = plt.subplots()
for indx, column_name in enumerate(['A', 'B', 'C', 'D', 'E', 'F', 'G']):
    ax.plot(ts_df['Date'], ts_df[column_name], label=column_name)
ax.tick_params(axis='x', labelrotation = 90)
ax.set(xlabel='Date', ylabel='Value')
ax.legend(loc='center right', ncol=4)

fig.show()

In [None]:
fig.savefig(
    os.path.join(OUT_PATHS['figs_dir'], '1.pdf'),
)

## Improvements

### Tightening Bounding Boxes

**bbox_inches**: str or Bbox, default: rcParams["savefig.bbox"] (default: None)

Bounding box in inches: only the given portion of the figure is saved. If 'tight', try to figure out the tight bbox of the figure.

In [None]:
fig.savefig(
    os.path.join(OUT_PATHS['figs_dir'], '2.pdf'),
    bbox_inches='tight',
)
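
To see what `bbox_inches='tight'` changes, one option is to compare the pixel size of the same figure saved with and without it. The sketch below is an illustration under a few assumptions: it uses the non-interactive `Agg` backend, toy data instead of the time series, and a hypothetical helper `png_size` that reads the image size from the PNG header.

```python
import io
import struct

import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt


def png_size(fig, **savefig_kwargs):
    """Hypothetical helper: save `fig` as PNG in memory, return (width, height) in pixels."""
    buf = io.BytesIO()
    fig.savefig(buf, format='png', dpi=100, **savefig_kwargs)
    # The PNG header stores width and height as big-endian 32-bit integers in bytes 16..24
    return struct.unpack('>II', buf.getvalue()[16:24])


fig, ax = plt.subplots(figsize=(6.4, 4.8))
ax.plot([0, 1, 2], [0, 1, 4])
ax.set(xlabel='x', ylabel='y')

full_size = png_size(fig)                        # the whole figure canvas
tight_size = png_size(fig, bbox_inches='tight')  # cropped to the drawn content plus padding
print('full :', full_size)
print('tight:', tight_size)
plt.close(fig)
```

The tight version drops most of the empty margin around the axes, which is why the saved image no longer has the nominal figure size.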

### Figure Style

Matplotlib ships with a number of built-in styles. Their names are listed in `plt.style.available`.

In [None]:
import math

available_styles = plt.style.available
n_styles = len(available_styles)

fig = plt.figure(dpi=100, figsize=(12.8, 4*n_styles/2), tight_layout=True)
for i, style in enumerate(available_styles):
    with plt.style.context(style):
        ax = fig.add_subplot(math.ceil(n_styles/2.0), 2, i+1)
        for indx, column_name in enumerate(['A', 'B', 'C', 'D', 'E', 'F', 'G']):
            ax.plot(ts_df['Date'], ts_df[column_name], label=column_name)
        ax.tick_params(axis='x', labelrotation = 90)
        ax.set(xlabel='Date', ylabel='Value')
        ax.legend(loc='center right', ncol=4)
        ax.set_title(style)
fig.show()

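From this list, I think the following styles are well suited to scientific papers. Note that Matplotlib 3.6 renamed the seaborn styles with a `seaborn-v0_8-` prefix (for example, `seaborn-v0_8-paper`), so on newer versions the names below have to be adjusted accordingly: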

In [None]:
for i, style in enumerate([
    'seaborn-paper',
    'seaborn-talk',
    'seaborn-notebook',
    'seaborn-colorblind',
    'tableau-colorblind10',
]):
    with plt.style.context(style):
        fig, ax = plt.subplots()
        for indx, column_name in enumerate(['A', 'B', 'C', 'D', 'E', 'F', 'G']):
            ax.plot(ts_df['Date'], ts_df[column_name], label=column_name)
        ax.set_title(style)
        ax.tick_params(axis='x', labelrotation = 90)
        ax.set(xlabel='Date', ylabel='Value')
        ax.legend(loc='center right', ncol=4)

        fig.savefig(
            os.path.join(OUT_PATHS['figs_dir'], '{}.pdf'.format(i+3)),
            bbox_inches='tight',
        )

In [None]:
with plt.style.context(['seaborn-colorblind', 'seaborn-talk']):
    fig, ax = plt.subplots()
    for indx, column_name in enumerate(['A', 'B', 'C', 'D', 'E', 'F', 'G']):
        ax.plot(ts_df['Date'], ts_df[column_name], label=column_name)
    ax.set_title('Colors Combined')
    ax.tick_params(axis='x', labelrotation = 90)
    ax.set(xlabel='Date', ylabel='Value')
    ax.legend(loc='center right', ncol=4)

    fig.savefig(
        os.path.join(OUT_PATHS['figs_dir'], '8.pdf'),
        bbox_inches='tight',
    )

Additionally, matplotlib provides an xkcd sketch-style context, which can be fun to use in informal talks.

In [None]:
with plt.xkcd():
    fig, ax = plt.subplots()
    for indx, column_name in enumerate(['A', 'B', 'C', 'D', 'E', 'F', 'G']):
        ax.plot(ts_df['Date'], ts_df[column_name], label=column_name)
    ax.set_title('xkcd')
    ax.tick_params(axis='x', labelrotation = 90)
    ax.set(xlabel='Date', ylabel='Value')
    ax.legend(loc='center right', ncol=4)
    fig.savefig(
        os.path.join(OUT_PATHS['figs_dir'], '9.pdf'),
        bbox_inches='tight',
    )

To set the style for the whole notebook, call one of the following at the beginning of the notebook:

```python
# for papers
plt.style.use('seaborn-paper')
# for presentations
plt.style.use('seaborn-talk')
# for papers with colors distinguishable by colorblind people
plt.style.use('seaborn-colorblind') 
# HACK: for presentations with colors distinguishable by colorblind people
plt.style.use(['seaborn-colorblind', 'seaborn-talk'])
# to apply the xkcd style to all the graphs 
plt.xkcd()
```
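
The seaborn style names depend on the Matplotlib version: since Matplotlib 3.6 they carry a `seaborn-v0_8-` prefix, and the old names were later removed. A notebook that should run on both old and new versions can look the name up first. The sketch below is one way to do it; `resolve_style` is a hypothetical helper, not a Matplotlib function, and falling back to `'default'` is my own choice.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt


def resolve_style(name: str) -> str:
    """Hypothetical helper: map an old 'seaborn-*' name to what this Matplotlib provides."""
    if name in plt.style.available:
        return name
    renamed = name.replace('seaborn', 'seaborn-v0_8', 1)
    if renamed in plt.style.available:
        return renamed
    return 'default'  # assumed fallback when neither name is known


style = resolve_style('seaborn-paper')
print(style)

with plt.style.context(style):
    fig, ax = plt.subplots()
    ax.plot([0, 1], [0, 1])
    plt.close(fig)
```

The same helper can be applied to each entry of a style list before passing it to `plt.style.use`.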

### Patterns

In [None]:
from cycler import cycler

In [None]:
custom_marker_cycler = (cycler(marker=['o', 'x', 's', 'P']))


fig, ax = plt.subplots()

ax.set_prop_cycle(custom_marker_cycler) # setting the cycler for the current axes

for indx, column_name in enumerate(['A', 'B', 'C', 'D', 'E', 'F', 'G']):
    ax.plot(ts_df['Date'], ts_df[column_name], label=column_name)
ax.tick_params(axis='x', labelrotation = 90)
ax.set(xlabel='Date', ylabel='Value')
ax.legend(loc='center right', ncol=4)

fig.savefig(
    os.path.join(OUT_PATHS['figs_dir'], '10.pdf'),
    bbox_inches='tight',
)

In [None]:
custom_ls_cycler = (cycler(ls=['-', '--', ':', '-.']))


fig, ax = plt.subplots()

ax.set_prop_cycle(custom_ls_cycler) # setting the cycler for the current axes

for indx, column_name in enumerate(['A', 'B', 'C', 'D', 'E', 'F', 'G']):
    ax.plot(ts_df['Date'], ts_df[column_name], label=column_name)
ax.tick_params(axis='x', labelrotation = 90)
ax.set(xlabel='Date', ylabel='Value')
ax.legend(loc='center right', ncol=4)

fig.savefig(
    os.path.join(OUT_PATHS['figs_dir'], '11.pdf'),
    bbox_inches='tight',
)

In [None]:
# check the styles of a custom cycler
for c in custom_marker_cycler:
    print(c)

In [None]:
# adding cyclers
# lengths of the cyclers must be equal, here equal to 4
print('Adding cyclers')
custom_cycler = (
    cycler(marker=['o', 'x', 's', 'P']) +
    cycler(ls=['-', '--', ':', '-.'])
)

for c in custom_cycler:
    print(c)

In [None]:
fig, ax = plt.subplots()

ax.set_prop_cycle(custom_cycler) # setting the cycler for the current axes

for indx, column_name in enumerate(['A', 'B', 'C', 'D', 'E', 'F', 'G']):
    ax.plot(ts_df['Date'], ts_df[column_name], label=column_name)
ax.tick_params(axis='x', labelrotation = 90)
ax.set(xlabel='Date', ylabel='Value')
ax.legend(loc='center right', ncol=4)

fig.savefig(
    os.path.join(OUT_PATHS['figs_dir'], '12.pdf'),
    bbox_inches='tight',
)

In [None]:
# multiplying cyclers
# lengths of the cyclers can be different
print('Multiplying cyclers')
custom_cycler = (
    cycler(marker=['o', 'x', 's', 'P']) *
    cycler(ls=['-', '--', ':'])
)

for c in custom_cycler:
    print(c)
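
The difference between adding and multiplying cyclers is easiest to see from their lengths. The sketch below is a small illustration with the same markers and line styles as above; the variable names are mine. Addition pairs the entries position by position, so both cyclers must have the same length, while multiplication builds every combination and lets the right-hand cycler vary fastest.

```python
from cycler import cycler

markers = cycler(marker=['o', 'x', 's', 'P'])
four_styles = cycler(ls=['-', '--', ':', '-.'])
three_styles = cycler(ls=['-', '--', ':'])

added = markers + four_styles        # i-th marker paired with i-th line style
multiplied = markers * three_styles  # every marker combined with every line style

print(len(added), len(multiplied))
print(list(multiplied)[:3])

# Adding cyclers of different lengths is rejected
try:
    markers + three_styles
except ValueError as err:
    print('Addition failed:', err)
```

Since the time series has seven columns, a cycler of length 4 starts repeating at the fifth line, while a cycler of length 12 only uses its first seven combinations.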

In [None]:
fig, ax = plt.subplots()

ax.set_prop_cycle(custom_cycler) # setting the cycler for the current axes

for indx, column_name in enumerate(['A', 'B', 'C', 'D', 'E', 'F', 'G']):
    ax.plot(ts_df['Date'], ts_df[column_name], label=column_name)
ax.tick_params(axis='x', labelrotation = 90)
ax.set(xlabel='Date', ylabel='Value')
ax.legend(loc='center right', ncol=4)

fig.savefig(
    os.path.join(OUT_PATHS['figs_dir'], '13.pdf'),
    bbox_inches='tight',
)

In [None]:
custom_color_cycler = (cycler(color='rgbk'))

for c in custom_color_cycler:
    print(c)

In [None]:
fig, ax = plt.subplots()

ax.set_prop_cycle(custom_color_cycler) # setting the cycler for the current axes

for indx, column_name in enumerate(['A', 'B', 'C', 'D', 'E', 'F', 'G']):
    ax.plot(ts_df['Date'], ts_df[column_name], label=column_name)
ax.tick_params(axis='x', labelrotation = 90)
ax.set(xlabel='Date', ylabel='Value')
ax.legend(loc='center right', ncol=4)

fig.savefig(
    os.path.join(OUT_PATHS['figs_dir'], '14.pdf'),
    bbox_inches='tight',
)

In [None]:
# all colors can be extracted from the current style
custom_cycler = (cycler(ls=['-', '--', ':', '-.']) * cycler(color=plt.rcParams['axes.prop_cycle'].by_key()['color']))

fig, ax = plt.subplots()

ax.set_prop_cycle(custom_cycler)

for indx, column_name in enumerate(['A', 'B', 'C', 'D', 'E', 'F', 'G']):
    ax.plot(ts_df['Date'], ts_df[column_name], label=column_name)
ax.tick_params(axis='x', labelrotation = 90)
ax.set(xlabel='Date', ylabel='Value')
ax.legend(loc='center right', ncol=4)

fig.savefig(
    os.path.join(OUT_PATHS['figs_dir'], '15.pdf'),
    bbox_inches='tight',
)

In [None]:
# black and white cycler
black_n_white_cycler = (
    cycler(color=['black']) *
    cycler(ls=['-', '--', ':', '-.']) *
    cycler(marker=['o', 'x', 's', 'P'])
)

fig, ax = plt.subplots()

ax.set_prop_cycle(black_n_white_cycler)

for indx, column_name in enumerate(['A', 'B', 'C', 'D', 'E', 'F', 'G']):
    ax.plot(ts_df['Date'], ts_df[column_name], label=column_name)
ax.tick_params(axis='x', labelrotation = 90)
ax.set(xlabel='Date', ylabel='Value')
ax.legend(loc='center right', ncol=4)

fig.savefig(
    os.path.join(OUT_PATHS['figs_dir'], '16.pdf'),
    bbox_inches='tight',
)

### Figure Size

#### Width

**figsize(float, float)**, default: rcParams["figure.figsize"] (default: [6.4, 4.8])

Width, height in inches.

Let's play with different figure sizes while preserving the ratio between width and height (4:3).

In [None]:
# 4 inch by 3 inch
fig, ax = plt.subplots(figsize=(4, 3))
for indx, column_name in enumerate(['A', 'B', 'C', 'D', 'E', 'F', 'G']):
    ax.plot(ts_df['Date'], ts_df[column_name], label=column_name)
ax.tick_params(axis='x', labelrotation = 90)
ax.set(xlabel='Date', ylabel='Value')
ax.legend(loc='center right', ncol=4)

fig.show()
fig.savefig(
    os.path.join(OUT_PATHS['figs_dir'], '17.pdf'),
    bbox_inches='tight',
)

In [None]:
# default
fig, ax = plt.subplots()
for indx, column_name in enumerate(['A', 'B', 'C', 'D', 'E', 'F', 'G']):
    ax.plot(ts_df['Date'], ts_df[column_name], label=column_name)
ax.tick_params(axis='x', labelrotation = 90)
ax.set(xlabel='Date', ylabel='Value')
ax.legend(loc='center right', ncol=4)

fig.show()
fig.savefig(
    os.path.join(OUT_PATHS['figs_dir'], '18.pdf'),
    bbox_inches='tight',
)

In [None]:
fig, ax = plt.subplots(figsize=(16, 12))
for indx, column_name in enumerate(['A', 'B', 'C', 'D', 'E', 'F', 'G']):
    ax.plot(ts_df['Date'], ts_df[column_name], label=column_name)
ax.tick_params(axis='x', labelrotation = 90)
ax.set(xlabel='Date', ylabel='Value')
ax.legend(loc='center right', ncol=4)

fig.show()
fig.savefig(
    os.path.join(OUT_PATHS['figs_dir'], '19.pdf'),
    bbox_inches='tight',
)

An A4 page measures 8.27 x 11.69 inches (210 x 297 mm). Scientific papers usually have a two-column format, so the width of one column is roughly 4 inches. However, as you can see in the paper, the 4-inch-wide figure does not look good. The reason is that matplotlib's default font sizes are too large for a figure of this size.

Personally, I often use either 6 or 6.4 inches.  
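
The page and column widths can be checked with a few lines of arithmetic. The sketch below converts the A4 dimensions (210 x 297 mm) to inches; the margin and gutter values are assumptions chosen only for illustration, so check your paper template for the real ones.

```python
MM_PER_INCH = 25.4

a4_width_in = 210 / MM_PER_INCH
a4_height_in = 297 / MM_PER_INCH
print(f'A4: {a4_width_in:.2f} x {a4_height_in:.2f} in')

# Half of the page width, ignoring margins
half_page_in = a4_width_in / 2
print(f'half page: {half_page_in:.2f} in')

# Assumed layout values, for illustration only
margin_in = 0.75   # left and right page margin
gutter_in = 0.25   # space between the two columns
column_width_in = (a4_width_in - 2 * margin_in - gutter_in) / 2
print(f'column width: {column_width_in:.2f} in')
```

With realistic margins a column is narrower than half a page, which makes the default font sizes look even larger in a figure created at column width.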

In [None]:
fig, ax = plt.subplots(figsize=(6, 4.5))
for indx, column_name in enumerate(['A', 'B', 'C', 'D', 'E', 'F', 'G']):
    ax.plot(ts_df['Date'], ts_df[column_name], label=column_name)
ax.tick_params(axis='x', labelrotation = 90)
ax.set(xlabel='Date', ylabel='Value')
ax.legend(loc='center right', ncol=4)

fig.show()
fig.savefig(
    os.path.join(OUT_PATHS['figs_dir'], '5.pdf'),
    bbox_inches='tight',
)

#### Height

The default width-to-height ratio is 4:3. This is also the ratio of old monitors and, thus, of older presentation slide layouts. Newer monitors, and the slide layouts designed for them, usually have a 16:9 or 16:10 ratio. Therefore, I usually adjust the height of the default figures to one of these ratios:
- 16:10 - (6, 3.75)
- 16:9  - (6, 3.375)
- 16:10 - (6.4, 4)
- 16:9  - (6.4, 3.6)

However, if you have long figures, you may adjust the height according to your needs.
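
The heights in the list above follow directly from the width and the target ratio. A minimal sketch of the computation, where `height_for` is a hypothetical helper name:

```python
def height_for(width: float, ratio_w: int, ratio_h: int) -> float:
    """Return the figure height (inches) that gives `width` the ratio ratio_w:ratio_h."""
    return width * ratio_h / ratio_w


for width in (6, 6.4):
    for ratio in ((4, 3), (16, 10), (16, 9)):
        height = height_for(width, *ratio)
        print(f'{ratio[0]}:{ratio[1]} -> ({width}, {height:g})')
```

The result can be passed straight to `figsize`, for example `figsize=(6.4, height_for(6.4, 16, 10))`.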

In [None]:
fig, ax = plt.subplots(figsize=(6.4, 4))
for indx, column_name in enumerate(['A', 'B', 'C', 'D', 'E', 'F', 'G']):
    ax.plot(ts_df['Date'], ts_df[column_name], label=column_name)
ax.tick_params(axis='x', labelrotation = 90)
ax.set(xlabel='Date', ylabel='Value')
ax.legend(loc='center right', ncol=4)

fig.show()
fig.savefig(
    os.path.join(OUT_PATHS['figs_dir'], '20.pdf'),
    bbox_inches='tight',
)

### Saving Figures

In [None]:
from typing import Tuple

def save_fig(
        fig: matplotlib.figure.Figure, 
        fig_name: str, 
        fig_dir: str, 
        fig_fmt: str,
        fig_size: Tuple[float, float] = (6.4, 4), 
        save: bool = True, 
        dpi: int = 300,
        transparent_png: bool = True,
    ):
    """This procedure stores the generated matplotlib figure to the specified 
    directory with the specified name and format.

    Parameters
    ----------
    fig : matplotlib.figure.Figure
        Matplotlib figure instance
    fig_name : str
        File name where the figure is saved
    fig_dir : str
        Path to the directory where the figure is saved
    fig_fmt : str
        Format of the figure, the format should be supported by matplotlib 
        (additional logic only for pdf and png formats)
    fig_size : Tuple[float, float]
        Size of the figure in inches, by default [6.4, 4] 
    save : bool, optional
        If the figure should be saved, by default True. Set it to False if you 
        do not want to override already produced figures.
    dpi : int, optional
        Dots per inch - the density for rasterized format (png), by default 300
    transparent_png : bool, optional
        If the background should be transparent for png, by default True
    """
    if not save:
        return
    
    fig.set_size_inches(fig_size, forward=False)
    fig_fmt = fig_fmt.lower()
    fig_dir = os.path.join(fig_dir, fig_fmt)
    if not os.path.exists(fig_dir):
        os.makedirs(fig_dir)
    pth = os.path.join(
        fig_dir,
        '{}.{}'.format(fig_name, fig_fmt.lower())
    )
    if fig_fmt == 'pdf':
        metadata={
            'Creator' : '',
            'Producer': '',
            'CreationDate': None
        }
        fig.savefig(pth, bbox_inches='tight', metadata=metadata)
    elif fig_fmt == 'png':
        alpha = 0 if transparent_png else 1
        axes = fig.get_axes()
        fig.patch.set_alpha(alpha)
        for ax in axes:
            ax.patch.set_alpha(alpha)
        fig.savefig(
            pth, 
            bbox_inches='tight',
            dpi=dpi,
        )
    else:
        try:
            fig.savefig(pth, bbox_inches='tight')
        except Exception as e:
            print("Cannot save figure: {}".format(e))

In [None]:
fig, ax = plt.subplots(nrows = 2, ncols=1, figsize=[16, 12], sharex=True)
for indx, column_name in enumerate(['A', 'B', 'C', 'D', 'E', 'F', 'G']):
    ax[0].plot(ts_df['Date'], ts_df[column_name], label=column_name)
for indx, column_name in enumerate(['A', 'B', 'C', 'D', 'E', 'F', 'G']):
    ax[1].plot(ts_df['Date'], ts_df[column_name], label=column_name)
ax[0].tick_params(axis='x', labelrotation = 90)
ax[1].tick_params(axis='x', labelrotation = 90)
ax[0].set(xlabel='Date', ylabel='Value')
ax[1].set(xlabel='Date', ylabel='Value')
ax[0].legend(loc='center right', ncol=4)
ax[1].legend(loc='center right', ncol=4)

fig.show()

In [None]:
# saving non-transparent
save_fig(
    fig,
    'non_transparent',
    OUT_PATHS['figs_dir'],
    'png',
    fig_size=(6.4, 8),
    transparent_png=False,
)

# saving transparent
save_fig(
    fig,
    'transparent',
    OUT_PATHS['figs_dir'],
    'png',
    fig_size=(6.4, 8),
    transparent_png=True,
)

Writing out all these arguments every time is tedious, so I use the following trick. I define several constants:

In [None]:
FIG_SIZE = (6.4, 4)
SAVE_FIG = True
FIG_FMT = 'pdf'
TRANSPARENT_PNG=True


Then the `save_fig` function can be used to store figures in the following way:

In [None]:

save_fig(
    fig,
    'transparent',
    fig_dir=OUT_PATHS['figs_dir'],
    fig_fmt=FIG_FMT,
    save=SAVE_FIG, 
    fig_size=FIG_SIZE,
    transparent_png=TRANSPARENT_PNG,
)

In [None]:

from functools import partial

# pre-bind the common arguments, so that only the figure and its name are passed later
savefig = partial(
    save_fig,
    fig_dir=OUT_PATHS['figs_dir'],
    fig_fmt=FIG_FMT,
    fig_size=FIG_SIZE,
    save=SAVE_FIG,
    transparent_png=TRANSPARENT_PNG,
)


In [None]:
savefig(fig, fig_name='cool_figure')
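
How `functools.partial` pre-binds arguments can be seen without writing any files. The sketch below uses a hypothetical stand-in, `describe_save`, which only returns the path that would be written; it is not the `save_fig` function from this notebook.

```python
from functools import partial


def describe_save(fig_name, fig_dir, fig_fmt='pdf', dpi=300):
    """Stand-in for `save_fig`: returns the target description instead of writing a file."""
    return f'{fig_dir}/{fig_fmt}/{fig_name}.{fig_fmt} @ {dpi} dpi'


# Bind the arguments that stay the same for the whole notebook
save_pdf = partial(describe_save, fig_dir='figures', fig_fmt='pdf')

default_call = save_pdf('cool_figure')
# Pre-bound keyword arguments can still be overridden per call
override_call = save_pdf('cool_figure', fig_fmt='png', dpi=150)

print(default_call)
print(override_call)
```

Because the bound values are keyword arguments, a single call can still deviate from the notebook-wide constants when a figure needs a different format or resolution.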