(auto-reports)=
# Automating Reports

In this chapter, you'll learn how to set up *automated reporting* with code, by which we mean creating outputs such as PDF files, Microsoft Word (or OpenOffice) text documents, and even slides where the content is partly text and partly generated by code. For example, you might write a technical report exploring recent trade statistics in which the code part pulls down the latest data and plots it. Best of all, you can decide whether to hide or show the code parts in the final outputs (or, with HTML outputs, allow users to choose whether to see the code or not).

This chapter has some similarities with the previous chapter, on {ref}`auto-research-outputs` (and uses some of the same underlying technology). But whereas that chapter was about using the LaTeX typesetting language to create research documents and slides that include automatically generated outputs, this chapter skips any typesetting with LaTeX and shows how to create outputs that go straight from the code to the documents and slides. Although LaTeX gives you full control over how outputs look, it comes with a fairly big overhead, and the approach in this chapter trades off control for less complexity.

*The key use cases for the automated reports in this chapter are sharing documents or slides with colleagues or co-authors within an organisation or collaboration*. In more detail, this includes:

- reports that use data and/or charts and that are similar each time they are run (e.g. only the data are updated)
- technical reports that show or use the functionality of an existing code base
- slide decks that summarise the most recent data and that are produced at a regular frequency
- sending exploratory or prototype analysis to co-authors or collaborators
- writing blogs for blogging services that accept `.md` files (make sure to export to markdown)
- amazingly, it can also be used to create automatically updated websites relatively easily; we won't cover that in this chapter, but you can find some [information on this here](https://quarto.org/docs/websites/).

For this chapter, you will need an installation of **Jupyter Lab** (which can be installed via `pip install jupyterlab`), an installation of a programme called [**Quarto**](https://quarto.org/), and the other Python packages used below. For some types of output, you'll also need to have an installation of the typesetting language **LaTeX**, for which this book recommends the [MiKTeX](https://miktex.org/download) distribution.

This chapter has benefitted from work on automated reporting templates by the ever-talented [Grant McDermott](https://grantmcdermott.com/).

## Automated Reports with **Quarto**

The tool that we'll be using to do this is [**Quarto**](https://quarto.org/). You'll need to go to the website and follow the [install instructions](https://quarto.org/docs/getting-started/installation.html) first. You can check you've installed it properly using `quarto check install` on the command line.

**Quarto** is a wrapper around a number of other tools that makes it convenient to produce automated reports. You should check the latest [documentation](https://quarto.org/docs/getting-started/quarto-basics.html) for an up-to-date guide on its use; here, we're going to see the basics and introduce a couple of templates that will serve you well.

Quarto can be used to create *output* documents and slides in a wide variety of formats including HTML, PDF, Microsoft Office (docx and pptx), OpenOffice, and many more.

You can write the *input* documents (and code) in two possible ways:

1. A special kind of markdown file, with extension `.qmd`. For more on markdown, see {ref}`wrkflow-markdown`.
2. Jupyter Notebooks, with extension `.ipynb` (which most of this book is written in)

You optionally add code (e.g. Python, R, JavaScript, etc.) to the documents to dynamically create figures, tables, etc. and then render the documents to their final format using Quarto.

If you only want to work one way, this book recommends the latter, using Jupyter Notebooks. But some of the key principles for both will be covered in the next sub-section.

### A minimal example of a report written with markdown content

We're now going to try the most minimal example of the first approach, a `.qmd` file, that also includes code and outputs.

There are advantages and disadvantages to writing your report in the `.qmd` format. The advantage is that it's just a plain text file and therefore anyone can open it, look at it, and change it with a text editor (and it's more convenient for version control in this way too). The big, big disadvantage is that you cannot see what the code produces as you write it (you have to render the file to see the code outputs, as we'll see in a moment). In the next sub-section, we'll see a way of achieving a better workflow.

Let's get our minimal example set up. The code and markdown below form the content of a file called `report.qmd`:

````markdown
---
title: "Example Report"
author: "Joan Robinson"
format: pdf
toc: true
number-sections: true
jupyter: python3
---

## Polar Axis

For a demonstration of a line plot on a polar axis, see @fig-polar.

```{python}
#| label: fig-polar
#| fig-cap: "A line plot on a polar axis"

import numpy as np
import matplotlib.pyplot as plt

r = np.arange(0, 2, 0.01)
theta = 2 * np.pi * r
fig, ax = plt.subplots(subplot_kw={'projection': 'polar'})
ax.plot(theta, r)
ax.set_rticks([0.5, 1, 1.5, 2])
ax.grid(True)
plt.show()
```

For an example of a code output where the input is not shown, the code below will *only* show the output table by using the `echo: false` option.

```{python}
#| echo: false
import pandas as pd
import seaborn as sns

df = sns.load_dataset("penguins")
pd.crosstab(df["species"], [df["island"], df["sex"]], margins=True)
```

````

To turn this report into a PDF, save it as `report.qmd` and then, on the command line and in the same directory as the file, run

```bash
quarto render report.qmd
```

```{admonition} Exercise

Successfully create a PDF by saving the markdown above into a file called `report.qmd` and then running the quarto render command.

If you get an error about not being able to find the Jupyter kernel, first check you have Jupyter Lab installed and then check what your Jupyter kernel is called using `jupyter kernelspec list` on the command line. You need to specify the name of your Jupyter kernel correctly in the document header (in the example above, it's called 'python3', which is the default).

```
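If you would rather check kernel names from Python, the sketch below parses the output of `jupyter kernelspec list --json`. It is only a minimal illustration: the helper names `kernel_names` and `installed_kernel_names` are made up for this example, and the sample JSON is invented rather than taken from a real installation.

```python
import json
import shutil
import subprocess


def kernel_names(kernelspec_json):
    """Extract kernel names from the text of `jupyter kernelspec list --json`."""
    specs = json.loads(kernelspec_json).get("kernelspecs", {})
    return sorted(specs)


def installed_kernel_names():
    """Ask Jupyter for its kernels; returns [] if Jupyter is not on the PATH."""
    if shutil.which("jupyter") is None:
        return []
    result = subprocess.run(
        ["jupyter", "kernelspec", "list", "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return kernel_names(result.stdout)


# Invented sample output, for illustration only
sample = '{"kernelspecs": {"python3": {"resource_dir": "/path/to/kernels/python3"}}}'
print(kernel_names(sample))
```

Whichever name appears in the list is the one to put after `jupyter:` in the header of your report.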

Now, because we specified `pdf` in the 'header' of our file we automatically got a pdf. But a wide range of output formats are available. For example, HTML

```bash
quarto render report.qmd --to html
```

and Microsoft Word

```bash
quarto render report.qmd --to docx
```

One slight frustration with the conversion to Word Documents is that tables from code (dataframes) are not rendered as tables in the Word doc.

The basic syntax is to write `--to outputformat` at the end of the render command.
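If you need to produce several formats on a schedule, you may prefer to drive the render command from a script. The following is a minimal sketch, not an official Quarto interface: it simply assembles the same command-line call shown above and runs it with Python's `subprocess` module. The function names `build_render_command` and `render` are hypothetical.

```python
import shutil
import subprocess


def build_render_command(input_file, output_format=None):
    """Return the `quarto render` command as a list of arguments."""
    command = ["quarto", "render", input_file]
    if output_format is not None:
        # Mirrors writing `--to outputformat` on the command line
        command += ["--to", output_format]
    return command


def render(input_file, output_format=None):
    """Run the render command, but only if Quarto is on the PATH."""
    if shutil.which("quarto") is None:
        raise RuntimeError("Quarto does not appear to be installed")
    subprocess.run(build_render_command(input_file, output_format), check=True)


# Print the commands that would be run for several formats
for fmt in ["pdf", "html", "docx"]:
    print(" ".join(build_render_command("report.qmd", fmt)))
```

Calling `render("report.qmd", "html")` would then run the same command as typing it in the terminal.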

```{admonition} Exercise

Successfully create a HTML report by saving the markdown above into a file called `report.qmd` and then running the quarto render command with the `--to html` option.

What happens to the menu on the right-hand side as you add extra headings using the `##` markdown syntax?

```

Although it's a bit clunky, it's also possible to insert code results in-line with text. Here's a minimal example of that.

````markdown
---
title: "Example Report with Inline Numbers from Code"
author: "Joan Robinson"
format: pdf
toc: true
number-sections: true
jupyter: python3
---

## Report

The code chunk below is hidden using the `echo: false` option, so only the text that it generates appears in the report.

```{python}
#| echo: false
from IPython.display import display, Markdown
import pandas as pd
import seaborn as sns

df = sns.load_dataset("penguins")
big_pen = df["body_mass_g"].max()
number = len(df)
display(
    Markdown(
    """
### The Heaviest Penguin

We find that the heaviest penguin, out of a total of {number} penguins, has a mass of {big_pen:.2f} grams.
""".format(big_pen = big_pen, number=number)
)
)
```
````

Note that, in this example, the `:.2f` part of `{big_pen:.2f}` is an instruction to report the given number to 2 decimal places.
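To get a feel for how these format specifications behave before putting them in a report, you can try them in a plain Python session. The values below are invented for illustration and are not taken from the penguins data.

```python
# Invented values, for illustration only
mass = 1234.5678
count = 10

print(f"{mass:.2f}")   # fixed-point with two decimal places: 1234.57
print(f"{mass:,.1f}")  # one decimal place and a thousands separator: 1,234.6

# The same specification works with str.format, as used in the report example
sentence = "{n} penguins, heaviest {m:.2f}".format(n=count, m=mass)
print(sentence)        # 10 penguins, heaviest 1234.57
```

The part after the colon is the format specification: the number after the dot sets the number of decimal places and the `f` requests fixed-point notation.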

```{admonition} Exercise

Create a HTML report with an in-line number using the above example but change the formatting of the heaviest penguin to not show any decimal places.

```

There are, of course, loads of extra features that go beyond this example.

### A minimal example of a report written with Jupyter Notebooks

You can also write your reports with Jupyter Notebooks. As a reminder, these have cells that can be either text (in the form of markdown) or code (and they support a lot of languages), and have the file format `.ipynb`. This book recommends working with them in Visual Studio Code. Google Colab notebooks are a type of Jupyter notebook (and can be downloaded as `.ipynb` files).

Writing your automated reports in Jupyter Notebooks has one major advantage over using `.qmd` markdown files: you can run the code as you go, so you know what you're getting and it's easier to weed out bugs. This book recommends this approach to writing automated reports.

So what is the difference from what we've seen above? Very little, actually. Your content will still start with exactly the same header but this time it's going to be in a *markdown cell* at the top of your notebook. To be explicit, the first cell in your notebook will have:

```markdown
---
title: "Example Report"
author: "Joan Robinson"
format: pdf
toc: true
number-sections: true
jupyter: python3
---
```

Subsequent cells will be code or markdown depending on whether you need rich outputs (figures and tables) or text. So, instead of a code chunk that begins with ```` ```{python} ```` like in the `.qmd` approach, you will just create a new code cell.

Remember that putting `format: pdf` into this header means that the `render` command produces a PDF by default. You can change it to `format: html` to default to HTML output instead. Either default can be overridden by passing `--to format`.

As before, you can create text that is dynamically updated with code outputs too; just choose a code cell rather than a markdown cell.
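The notebook layout described above (the header in the first markdown cell, followed by further markdown and code cells) can also be generated from a script, which may be handy if you want a reusable template. This is a rough sketch that writes the notebook's JSON directly using only the standard library. The file name `generated-report.ipynb`, the helper names, and the kernel metadata are assumptions made for this example.

```python
import json
from pathlib import Path

HEADER = """---
title: "Example Report"
author: "Joan Robinson"
format: pdf
toc: true
number-sections: true
jupyter: python3
---"""


def markdown_cell(text):
    """A markdown cell in the version 4 notebook format."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(code):
    """A code cell saved without outputs; these appear once it is executed."""
    return {
        "cell_type": "code",
        "metadata": {},
        "execution_count": None,
        "outputs": [],
        "source": code,
    }


notebook = {
    "cells": [
        markdown_cell(HEADER),  # the header goes in the first cell
        markdown_cell("## Penguins\n\nA count of the rows in the data."),
        code_cell('import seaborn as sns\n\nlen(sns.load_dataset("penguins"))'),
    ],
    # Assumed kernel details; adjust them to match your own kernel name
    "metadata": {
        "kernelspec": {
            "name": "python3",
            "display_name": "Python 3",
            "language": "python",
        }
    },
    "nbformat": 4,
    "nbformat_minor": 4,
}

Path("generated-report.ipynb").write_text(json.dumps(notebook, indent=1), encoding="utf-8")
```

You can open the resulting file in Visual Studio Code or Jupyter Lab and carry on editing it as usual.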

The main difference when it comes to Jupyter Notebooks is that you must decide whether you want to *execute* the notebook before you render it with **Quarto**. Executing a notebook just means running the code before exporting it to a new format. (As an aside, the best practice way to use Jupyter Notebooks is to save them without any code outputs, so execute and render would be the standard way of doing it.) The terminal command to run a notebook and render it is:

```bash
quarto render jupyter-report.ipynb --execute
```
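On the aside about saving notebooks without code outputs: one way to do this, sketched below, is to clear the `outputs` and `execution_count` fields of every code cell in the notebook's JSON. The function name `strip_outputs` and the tiny example notebook are made up for illustration; dedicated tools exist for this job, and this sketch only shows the idea.

```python
import copy


def strip_outputs(notebook):
    """Return a copy of a notebook (as a dict) with code outputs removed."""
    clean = copy.deepcopy(notebook)
    for cell in clean.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return clean


# A tiny invented notebook with one executed code cell
example = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "Some text"},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 3,
            "outputs": [{"output_type": "stream", "name": "stdout", "text": "2\n"}],
            "source": "print(1 + 1)",
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

clean = strip_outputs(example)
print(clean["cells"][1]["outputs"])  # []
```

To apply this to a real file, you would read it with `json.load`, pass the result to the function, and write it back with `json.dump`.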

```{admonition} Exercise

Create a new Jupyter Notebook called `jupyter-report.ipynb` with the header above. Re-use the blocks of code and text from the `qmd` example in the previous section. Render it with the command above.

```

To change the output type, add another instruction to the command using `--to`:

```bash
quarto render jupyter-report.ipynb --execute --to html
```

### Further Customisation

The minimal examples we saw above may not cover everything you want to do. You can look at the Quarto [documentation](https://quarto.org/docs/output-formats/html-basics.html) for more features and customisation options.

## The Optimal Workflow for Writing Automated Reports

We now turn to a big tip on the optimal workflow for making automated reports and slides. Often you are interested in seeing how the final report will look as you change the code in *real time*. This is possible with Jupyter Notebooks and **Quarto**. Run the following in the terminal

```bash
quarto preview jupyter-report.ipynb
```

A browser window will open with a live preview of your pdf (if you set pdf as the default output option in the header). To create a live preview of a HTML document, it's

```bash
quarto preview jupyter-report.ipynb --to html
```

The image below shows an example of using preview side-by-side with a Jupyter Notebook in Visual Studio Code.

![Jupyter notebook and live HTML preview of generated report](https://quarto.org/docs/tools/images/vscode-preview.png)

````{admonition} Exercise
Take the Jupyter example from the previous exercise and change the format to HTML. Then add both a new text cell and a new figure (from a code cell) while in preview mode, making use of the `quarto preview` command above. If you need inspiration for the new figure, here's a simple scatter plot:

```python
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.scatter(
    [1, 2, 3, 4, 5, 6],
    [1, 4, 2, 3, 1, 7],
    s=np.linspace(300, 2000, 6),
    c=["b", "r", "g", "k", "cyan", "yellow"],
    edgecolors="k",
    alpha=0.5,
)
plt.show()
```

````

## Automated Slides with **Quarto**

It isn't just reports that you can create; you can make slide decks too. You have three main output formats to choose from for slides:

- html, via something called 'revealjs'; use `format: revealjs`
- pdf, via the LaTeX beamer package; use `format: beamer`
- PowerPoint, using the pptx format; use `format: pptx`

Everything else is the same as we have seen before. Here's a minimal example showing both code and text. It creates a HTML slide deck.

````markdown
---
title: "My Talk"
author: "Joan Robinson"
format: revealjs
---

## Introduction

- This is some text
- As is this

## Here Are Some Code Outputs

```{python}
#| echo: false
import numpy as np
import matplotlib.pyplot as plt

r = np.arange(0, 2, 0.01)
theta = 2 * np.pi * r
fig, ax = plt.subplots(subplot_kw={'projection': 'polar'})
ax.plot(theta, r)
ax.set_rticks([0.5, 1, 1.5, 2])
ax.grid(True)
plt.show()
```
````

Note that this will not show the code, only the figure, as we have set `#| echo: false` for the code chunk. You could also set `echo: false` for the whole deck in the header.

```{admonition} Exercise
Render this slide example in all three of the main formats.
```

```{admonition} Exercise
Add outputs from in-line code into your rendered deck, adapting the heaviest penguin example from earlier.
```

## Review

Having pored over this chapter, you should now:

- ✅ know that you can create automatic reports incorporating code and code outputs;
- ✅ know how to write automatic reports using **Quarto**, Jupyter Notebooks, and `.qmd` files;
- ✅ know how to produce reports in formats such as PDF, Word doc, and HTML;
- ✅ know how to produce automatic slide decks in formats such as PDF, Microsoft PowerPoint, and HTML; and
- ✅ know how to develop automatic reports using Jupyter Notebooks in preview mode.