## 9.1 A Brief matplotlib API Primer

matplotlib can be a fairly low-level tool. You assemble a plot from its base components: the data display (i.e., the type of plot: line, bar, box, scatter, contour, etc.), legend, title, tick labels, and other annotations.


### Top-level functions using `import matplotlib.pyplot as plt`
`plt.plot(data)` plots a line on the active subplot, creating a figure and subplot first if none exists.

### Axes methods: Figures and Subplots (preferred)

1. Create a `Figure` object with `plt.figure(figsize=(10, 6))`. `figsize` can be omitted.
2. Add plot axes using `fig.add_subplot`.
3. Add a plot using an axes method such as `ax.plot()`. Possible plot types include `.plot`, `.hist`, and `.scatter`.

```
import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure()
# 2 x 2 grid; the third argument selects the subplot (numbered from 1)
ax1 = fig.add_subplot(2, 2, 1)
ax2 = fig.add_subplot(2, 2, 2)
ax3 = fig.add_subplot(2, 2, 3)
ax3.plot(np.random.standard_normal(50).cumsum(), color="black",
         linestyle="dashed")
ax1.hist(np.random.standard_normal(100), bins=20, color="black", alpha=0.3);
ax2.scatter(np.arange(30), np.arange(30) + 3 * np.random.standard_normal(30));
```

`plt.close("all")` closes all open figures.

`plt.subplots` combines `plt.figure()` and `fig.add_subplot()`: it creates a new figure and returns it together with a NumPy array containing the created subplot objects.

`fig, axes = plt.subplots(2, 3)`

The `axes` array can then be indexed like a two-dimensional array; for example, `axes[0, 1]` refers to the subplot in the top row at the center. You can also indicate that subplots should share the same x- or y-axis using `sharex` and `sharey`, respectively.

Table 9.1: matplotlib.pyplot.subplots options

|Argument|Description|
|:----------|:------------------------------------------------------|
|`nrows`|Number of rows of subplots|
|`ncols`|Number of columns of subplots|
|`sharex`|All subplots should use the same x-axis ticks (adjusting the `xlim` will affect all subplots)|
|`sharey`|All subplots should use the same y-axis ticks (adjusting the `ylim` will affect all subplots)|
|`subplot_kw`|Dictionary of keywords passed to the `add_subplot` call used to create each subplot|
|`**fig_kw`|Additional keywords to `subplots` are used when creating the figure, such as `plt.subplots(2, 2, figsize=(8, 6))`|
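
As a rough illustration of `plt.subplots` and the options in Table 9.1, the sketch below builds a shared-axis grid and indexes into the returned array. The random data and the `Agg` backend line are assumptions added so the example runs anywhere; they are not part of the original notes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # made-up data, for illustration only

# 2 x 3 grid sharing both axes; figsize is forwarded to the Figure
fig, axes = plt.subplots(2, 3, sharex=True, sharey=True, figsize=(8, 6))

# axes is a two-dimensional NumPy array of subplot objects
for i in range(2):
    for j in range(3):
        axes[i, j].hist(rng.standard_normal(500), bins=50,
                        color="black", alpha=0.5)

# Top row, center column
axes[0, 1].set_title("axes[0, 1]")
```

Because the axes are shared, changing the limits on any one subplot changes them on all of the others.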

### Adjusting the spacing around subplots

You can change the spacing using the `subplots_adjust` method on Figure objects:
```
fig.subplots_adjust(left=None, bottom=None, right=None, top=None,
                    wspace=None, hspace=None)
```
`left`, `right`, `bottom`, `top`: these parameters give the positions of the left, right, bottom, and top edges of the subplot grid, as fractions of the figure width or height (so `right=0.9` leaves 10% of the figure width as a right margin).

`wspace` and `hspace` control the width and height of the spacing between subplots, expressed as fractions of the average subplot width and height, respectively.
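
A minimal sketch of `subplots_adjust`, assuming made-up random data: setting `wspace` and `hspace` to zero removes the gaps between subplots entirely.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # made-up data, for illustration only

fig, axes = plt.subplots(2, 2, sharex=True, sharey=True)
for ax in axes.flat:
    ax.hist(rng.standard_normal(500), bins=50, color="black", alpha=0.5)

# Shrink the spacing between subplots to zero
fig.subplots_adjust(wspace=0, hspace=0)
```

With zero spacing the tick labels of neighboring subplots may overlap; matplotlib does not check for this, so you may need to adjust the ticks yourself.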

### Colors, Markers, and Line Styles

```
ax.plot(x, y, linestyle="--", color="green")
ax.plot(np.random.standard_normal(30).cumsum(), color="black",
        linestyle="dashed", marker="o");
```

A number of color names are provided for commonly used colors, but you can use any color on the spectrum by specifying its hex code (e.g., `"#CECECE"`). 

For line plots, you will notice that subsequent points are linearly interpolated by default. This can be altered with the `drawstyle` option.

```
ax.plot(data, color="black", linestyle="dashed",
        drawstyle="steps-post", label="steps-post")
# "steps-post": the line jumps after each point
# other options: "steps" (same as "steps-pre") and "steps-mid"
ax.legend()  # create the legend from the label arguments
```

The `loc` legend option tells matplotlib where to place the legend. The default is `"best"`, which tries to choose a location that is most out of the way. To exclude one or more elements from the legend, pass no label or `label="_nolegend_"`.
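
The sketch below puts the `drawstyle`, `label`, and `loc` options together; the random-walk data and the third "hidden" line are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal(30).cumsum()  # made-up random walk

fig, ax = plt.subplots()
ax.plot(data, color="black", linestyle="dashed", label="default")
ax.plot(data, color="black", drawstyle="steps-post", label="steps-post")
# This line is drawn but left out of the legend
ax.plot(data + 5, color="gray", label="_nolegend_")

legend = ax.legend(loc="upper left")
legend_labels = [text.get_text() for text in legend.get_texts()]
```

Only the first two lines appear in the legend, since the third label is `"_nolegend_"`.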

### Ticks, Labels, and Legends
Most kinds of plot decorations can be accessed through methods on matplotlib axes objects. The pyplot interface also provides top-level functions such as `plt.xlim` and `plt.xticks`, which control the plot range and the tick locations and labels, respectively. They can be used in two ways:

Called with no arguments, they return the current parameter value (e.g., `plt.xlim()` returns the current x-axis plotting range).

Called with parameters, they set the parameter value (e.g., `plt.xlim([0, 10])` sets the x-axis range to 0 to 10).

All such functions act on the active or most recently created AxesSubplot. Each corresponds to two methods on the subplot object itself; in the case of `xlim`, these are `ax.get_xlim` and `ax.set_xlim`.

```
ticks = ax.set_xticks([0, 250, 500, 750, 1000])
labels = ax.set_xticklabels(["one", "two", "three", "four", "five"],
                            rotation=30, fontsize=8)
ax.set_xlabel("Stages")
ax.set_title("My first matplotlib plot")
```

Modifying the y-axis consists of the same process, substituting y for x in this example. The axes class has a `set` method that allows batch setting of plot properties. From the prior example, we could also have written:
```
ax.set(title="My first matplotlib plot", xlabel="Stages")
```
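
To make the getter/setter pairing concrete, here is a small sketch on a made-up random walk (the data and label strings are assumptions). It sets the x-axis range on the subplot, reads it back both through the subplot and through pyplot, and then uses `set` for the y-axis label and title.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)

fig, ax = plt.subplots()
ax.plot(rng.standard_normal(1000).cumsum(), color="black")

ax.set_xlim([0, 1000])          # setter on the subplot object
current_xlim = ax.get_xlim()    # matching getter
pyplot_xlim = plt.xlim()        # pyplot function acting on the active subplot

# Same process for the y-axis, plus batch setting with ax.set
ax.set_yticks([-20, 0, 20])
ax.set(ylabel="Value", title="Random walk")
```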

### Annotations and Drawing on a Subplot
In addition to the standard plot types, you may wish to draw your own plot annotations, which could consist of text, arrows, or other shapes. You can add annotations and text using the `text`, `arrow`, and `annotate` functions. `text` draws text at given coordinates (x, y) on the plot with optional custom styling:
```
ax.text(x, y, "Hello world!",
        family="monospace", fontsize=10)
```


`asof` is a pandas Series method: `spx.asof(date)` returns the value of the `spx` data at the specified date, or at the most recent earlier date if the exact date is not available. Here it is used to position each annotation relative to the plotted line:

```
ax.annotate(label,
            xy=(date, spx.asof(date) + 75),       # point the arrow points to
            xytext=(date, spx.asof(date) + 225),  # position of the label text
            arrowprops=dict(facecolor="black", headwidth=4, width=2,
                            headlength=4),
            horizontalalignment="left", verticalalignment="top")

# Zoom in on 2007-2010
ax.set_xlim(["1/1/2007", "1/1/2011"])
ax.set_ylim([600, 1800])
```
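
The following is a self-contained sketch of the same `asof` plus `annotate` pattern. The `prices` series, its dates, and the offsets are hypothetical stand-ins for the `spx` data, chosen so that the lookup date falls between two observations.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: one observation every other day (Jan 1, 3, 5, ...)
dates = pd.date_range("2020-01-01", periods=10, freq="2D")
prices = pd.Series(np.arange(10.0) * 10, index=dates)

event_date = pd.Timestamp("2020-01-06")  # not present in the index
value = prices.asof(event_date)          # falls back to the 2020-01-05 value

fig, ax = plt.subplots()
ax.plot(prices.index, prices.values, color="black")
note = ax.annotate("event",
                   xy=(event_date, value + 2),       # arrow tip
                   xytext=(event_date, value + 30),  # label position
                   arrowprops=dict(facecolor="black", headwidth=4,
                                   width=2, headlength=4),
                   horizontalalignment="left", verticalalignment="top")
```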

Drawing shapes requires some more care. matplotlib has objects that represent many common shapes, referred to as `patches`. Some of these, like `Rectangle` and `Circle`, are found in `matplotlib.pyplot`, but the full set is located in `matplotlib.patches`.

To add a shape to a plot, you create the patch object and add it to a subplot `ax` by passing the patch to `ax.add_patch` (see Data visualization composed from three different patches):

```
fig, ax = plt.subplots(figsize=(6, 6))
rect = plt.Rectangle((0.2, 0.75), 0.4, 0.15, color="black", alpha=0.3)
circ = plt.Circle((0.7, 0.3), 0.15, color="blue", alpha=0.3)
pgon = plt.Polygon([[0.15, 0.15], [0.35, 0.4], [0.2, 0.6]],
                   color="green", alpha=0.5)
ax.add_patch(rect)
ax.add_patch(circ)
ax.add_patch(pgon)
```

### Saving Plots to File
You can save the active figure to file using the figure object’s `savefig` instance method. For example, to save an SVG version of a figure, you need only type:
```
fig.savefig("figpath.svg")  # SVG: Scalable Vector Graphics, an XML-based format commonly used on web pages
```
The file type is inferred from the file extension. So if you used `.pdf` instead, you would get a PDF. One important option that I use frequently for publishing graphics is `dpi`, which controls the dots-per-inch resolution. To get the same plot as a PNG at 400 DPI, you would do:
```
fig.savefig("figpath.png", dpi=400)
```


Table 9.2: Some fig.savefig options

|Argument|Description|
|:-----------|:-----------------------------------------------------|
|`fname`|String containing a filepath or a Python file-like object. The figure format is inferred from the file extension (e.g., `.pdf` for PDF or `.png` for PNG).|
|`dpi`|The figure resolution in dots per inch; defaults to 100 in IPython or 72 in Jupyter out of the box but can be configured.|
|`facecolor`, `edgecolor`|The color of the figure background outside of the subplots; `"w"` (white), by default.|
|`format`|The explicit file format to use (`"png"`, `"pdf"`, `"svg"`, `"ps"`, `"eps"`, ...).|
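
Since `fname` may be a file-like object, a figure can also be written to memory instead of disk. The sketch below, with made-up data, saves a PNG into an `io.BytesIO` buffer; because there is no file extension to infer from, `format` is passed explicitly.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4], color="black")

# Write the figure to an in-memory buffer rather than a path
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=200)
png_bytes = buffer.getvalue()
```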


### matplotlib Configuration
To set the global default figure size to be 10 × 10, you could enter:
```
plt.rc("figure", figsize=(10, 10))
```
All of the current configuration settings are found in the `plt.rcParams` dictionary, and they can be restored to their default values by calling the `plt.rcdefaults()` function.

The first argument to `rc` is the component you wish to customize, such as `"figure"`, `"axes"`, `"xtick"`, `"ytick"`, `"grid"`, `"legend"`, or many others. After that can follow a sequence of keyword arguments indicating the new parameters. A convenient way to write down the options in your program is as a dictionary:
```
font_options = {"family": "monospace", "weight": "bold", "size": 8}
plt.rc("font", **font_options)
```
For more extensive customization and to see a list of all the options, matplotlib comes with a configuration file `matplotlibrc` in the `matplotlib/mpl-data` directory. If you customize this file and place it in your home directory titled `.matplotlibrc`, it will be loaded each time you use matplotlib.
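
As a quick check of how `plt.rc`, `plt.rcParams`, and `plt.rcdefaults` interact, the sketch below changes the default figure size, reads it back, and then restores the defaults. The variable names are illustrative only.

```python
import matplotlib.pyplot as plt

# Change the global default figure size
plt.rc("figure", figsize=(10, 10))
size_after_rc = list(plt.rcParams["figure.figsize"])

# Restore matplotlib's built-in defaults
plt.rcdefaults()
size_after_reset = list(plt.rcParams["figure.figsize"])
```

Settings are stored under dotted keys of the form `"component.option"`, so the `figsize` option of `"figure"` is read back as `plt.rcParams["figure.figsize"]`.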

## 9.2 Plotting with pandas and seaborn

### Line Plots
Series and DataFrame have a `plot` attribute for making some basic plot types. By default, `plot()` makes a line plot. The `plot` attribute contains a "family" of methods for different plot types. For example, `df.plot()` is equivalent to `df.plot.line()`.

The Series object’s `index` is passed to matplotlib for plotting on the x-axis, though you can disable this by passing `use_index=False`. The x-axis ticks and limits can be adjusted with the `xticks` and `xlim` options, and the y-axis ticks and limits with `yticks` and `ylim`.

Table 9.3: Series.plot method arguments

|Argument|Description|
|:----------|:---------------------------------------------------|
|`label`|Label for plot legend|
|`ax`|matplotlib subplot object to plot on; if nothing passed, uses active matplotlib subplot|
|`style`|Style string, like `"ko--"`, to be passed to matplotlib|
|`alpha`|The plot fill opacity (from 0 to 1)|
|`kind`|Can be `"area"`, `"bar"`, `"barh"`, `"density"`, `"hist"`, `"kde"`, `"line"`, or `"pie"`; defaults to `"line"`|
|`figsize`|Size of the figure object to create|
|`logx`|Pass `True` for logarithmic scaling on the x axis; pass `"sym"` for symmetric logarithm that permits negative values|
|`logy`|Pass `True` for logarithmic scaling on the y axis; pass `"sym"` for symmetric logarithm that permits negative values|
|`title`|Title to use for the plot|
|`use_index`|Use the object index for tick labels|
|`rot`|Rotation of tick labels (0 through 360)|
|`xticks`|Values to use for x-axis ticks|
|`yticks`|Values to use for y-axis ticks|
|`xlim`|x-axis limits (e.g., `[0, 10]`)|
|`ylim`|y-axis limits|
|`grid`|Display axis grid (off by default)|

Most of pandas’s plotting methods accept an optional `ax` parameter, which can be a matplotlib subplot object. This gives you more flexible placement of subplots in a grid layout.
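
For instance, the `ax` parameter lets you place two views of the same Series on a grid you created yourself. This is a sketch with made-up data; the titles and style choices are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
s = pd.Series(rng.standard_normal(10).cumsum(), index=np.arange(0, 100, 10))

fig, axes = plt.subplots(2, 1)
# Line plot on the top subplot, using a matplotlib style string
s.plot(ax=axes[0], style="ko--", title="line")
# Bar plot of the same data on the bottom subplot
s.plot(ax=axes[1], kind="bar", color="black", alpha=0.7, rot=0, title="bar")
```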

DataFrame’s plot method ***plots each of its columns*** as a different line on the same subplot, creating a legend automatically using the column names (see Simple DataFrame plot):

```
df = pd.DataFrame(np.random.standard_normal((10, 4)).cumsum(0),
                  columns=["A", "B", "C", "D"],
                  index=np.arange(0, 100, 10))
plt.style.use('grayscale') # use grayscale for all plot elements: line, marker, background
df.plot()
```

:::{.callout-note}
Use `plt.style.use('grayscale')` to switch to a color scheme more suitable for black-and-white publication, since some readers will not be able to see the full color plots.
:::

Table 9.4: DataFrame-specific plot arguments

|Argument|Description|
|:-------------|:---------------------------------------------|
|`subplots`|Plot each DataFrame column in a separate subplot|
|`layout`|2-tuple (rows, columns) providing layout of subplots|
|`sharex`|If `subplots=True`, share the same x-axis, linking ticks and limits|
|`sharey`|If `subplots=True`, share the same y-axis|
|`legend`|Add a subplot legend (`True` by default)|
|`sort_columns`|Plot columns in alphabetical order; by default uses existing column order (this argument has been removed in recent pandas versions)|
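
A brief sketch of the DataFrame-specific options, using made-up data: each column is drawn in its own subplot, arranged in a 2 × 2 layout with a shared x-axis.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 4)).cumsum(0),
                  columns=["A", "B", "C", "D"],
                  index=np.arange(0, 100, 10))

# One subplot per column; the return value is an array of subplot objects
axes = df.plot(subplots=True, layout=(2, 2), sharex=True,
               legend=False, figsize=(8, 6))
```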



### Bar Plots
The `plot.bar()` and `plot.barh()` methods make vertical and horizontal bar plots, respectively. In this case, the Series or DataFrame index will be used as the x (`bar`) or y (`barh`) ticks.

```
fig, axes = plt.subplots(2, 1)
data = pd.Series(np.random.uniform(size=16), index=list("abcdefghijklmnop"))
data.plot.bar(ax=axes[0], color="black", alpha=0.7)
data.plot.barh(ax=axes[1], color="black", alpha=0.7)  # note: the index starts from the bottom
```

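With a DataFrame, bar plots group the values in each row into a group of bars, one bar per column. If the DataFrame’s columns have a name (for example, “Genus”), that name is used to title the legend.

We create stacked bar plots from a DataFrame by passing `stacked=True`, resulting in the values in each row being stacked together horizontally: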

```
df.plot.barh(stacked=True, alpha=0.5)
```

:::{.callout-note}
A useful recipe for bar plots is to visualize a Series’s value frequency using `value_counts`: `s.value_counts().plot.bar()`.
:::

The `pandas.crosstab` function is a convenient way to compute a simple frequency table from two DataFrame columns:

```
party_counts = pd.crosstab(tips["day"], tips["size"]) # row: "day"; col: "size"
```


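Normalizing each row of `party_counts` so that it sums to 1 gives the table of fractions `party_pcts`, which can then be drawn as a stacked bar plot:

`party_pcts.plot.bar(stacked=True)`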
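
The crosstab-then-normalize step can be sketched end to end as follows. The `toy_tips` frame is a tiny made-up stand-in for the tips dataset, and dividing by the row sums is one way to obtain `party_pcts`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical miniature version of the tips data
toy_tips = pd.DataFrame({
    "day": ["Sat", "Sat", "Sat", "Sun", "Sun", "Sun", "Sun"],
    "size": [2, 2, 3, 2, 3, 3, 4],
})

# Frequency table: rows are days, columns are party sizes
party_counts = pd.crosstab(toy_tips["day"], toy_tips["size"])

# Divide each row by its total so that every row sums to 1
party_pcts = party_counts.div(party_counts.sum(axis="columns"), axis="index")

ax = party_pcts.plot.bar(stacked=True)
```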

### Bar Plots with seaborn

`sns.barplot(x="tip_pct", y="day", hue="time", data=tips, orient="h")`

Notice that seaborn has automatically changed the aesthetics of plots: the default color palette, plot background, and grid line colors. You can switch between different plot appearances using `seaborn.set_style`:

`sns.set_style("whitegrid")`

When producing plots for a black-and-white print medium, you may find it useful to set a greyscale color palette, like so:
```
sns.set_palette("Greys_r")
```

### Histograms and Density Plots
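Make a histogram using the `plot.hist` method on the Series:

`tips["tip_pct"].plot.hist(bins=50)`

A related plot type is the density plot, which is formed by computing an estimate of a continuous probability distribution that might have generated the observed data. The usual procedure is to approximate this distribution as a mixture of "kernels", that is, simpler distributions like the normal distribution. Thus, density plots are also known as kernel density estimate (KDE) plots. Using `plot.density` makes a density plot using the conventional mixture-of-normals estimate (this requires the SciPy package):

`tips["tip_pct"].plot.density()`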

seaborn makes histograms and density plots even easier through its `histplot` function, which can plot a histogram and, when `kde=True` is passed, a continuous density estimate on top of it. With `kde=False`, only the histogram is drawn:

`sns.histplot(values, bins=100, kde=False, color="black")`

### Scatter or Point Plots

seaborn’s `regplot` function makes a scatter plot and fits a linear regression line:

```
ax = sns.regplot(x="m1", y="unemp", data=trans_data)
```

In exploratory data analysis, it’s helpful to be able to look at all the scatter plots among a group of variables; this is known as a pairs plot or scatter plot matrix. Making such a plot from scratch is a bit of work, so seaborn has a convenient `pairplot` function that supports placing histograms or density estimates of each variable along the diagonal:

```
# plot_kws is passed to the off-diagonal scatter plots;
# alpha=0.2 makes the points mostly transparent (0 = transparent, 1 = opaque)
sns.pairplot(trans_data, diag_kind="kde", plot_kws={"alpha": 0.2})
```

### Facet Grids and Categorical Data
seaborn has a useful built-in function `catplot` that simplifies making many kinds of faceted plots split by categorical variables:

```
sns.catplot(x="day", y="tip_pct", hue="time", col="smoker",  # facet by "smoker"
            kind="bar", data=tips[tips.tip_pct < 1])
```

```
sns.catplot(x="day", y="tip_pct", row="time",
            col="smoker",
            kind="bar", data=tips[tips.tip_pct < 1])
```

`catplot` supports other plot types that may be useful depending on what you are trying to display. For example, box plots (which show the median, quartiles, and outliers) can be an effective visualization type:

```
sns.catplot(x="tip_pct", y="day", kind="box",
            data=tips[tips.tip_pct < 0.5])
```

You can create your own facet grid plots using the more general `seaborn.FacetGrid` class.