# Examples

Several of these plots have default interactions that you can try out directly on this page.

In [None]:
!pip freeze

In [None]:
import altair_ally as aly
from vega_datasets import data


aly.alt.data_transformers.disable_max_rows()

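# Load the data once, keep only the G, PG, PG-13 and R rated movies,
# sample 400 rows reproducibly, and select the columns of interest.
# The filter could also be written as
# .query('`MPAA Rating` in ["G", "PG", "PG-13", "R"]')
all_movies = data.movies()
movies = (
    all_movies[all_movies['MPAA Rating'].isin(["G", "PG", "PG-13", "R"])]
    .sample(400, random_state=234890)
    [['IMDB Votes', 'IMDB Rating', 'Rotten Tomatoes Rating',
      'Running Time min', 'MPAA Rating', 'Creative Type']])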
movies.info()

## Missing values

A missing value plot can reveal patterns of missingness that would influence downstream analysis,
as well as issues in upstream data wrangling.
It is also useful for indicating which variables are codependent in the data collection process,
such as the IMDB Votes and IMDB Rating columns in the plot below.

Selecting an interval in the heatmap of individual NaNs 
will automatically update the bar plot with the NaN counts.

In [None]:
aly.nan(movies)
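The quantities behind such a plot can also be inspected directly with pandas. The following is a minimal sketch on made-up toy data (the column names and values are illustrative assumptions, not the movies dataset), and it is not necessarily how `aly.nan` computes its output. It counts the NaNs per column and correlates the missingness indicators, so that columns missing in the same rows stand out.

```python
import numpy as np
import pandas as pd

# Made-up toy data: 'votes' and 'rating' are always missing in the same rows
toy = pd.DataFrame({
    'votes':   [100, np.nan, 250, np.nan, 80],
    'rating':  [7.1, np.nan, 6.3, np.nan, 8.0],
    'runtime': [90, 120, np.nan, 101, 95],
})

nan_mask = toy.isna()        # True where a value is missing
nan_counts = nan_mask.sum()  # number of NaNs per column

# Correlation between the missingness indicators of each column pair;
# a value of 1 means two columns are missing in exactly the same rows
co_missing = nan_mask.astype(int).corr()

print(nan_counts)
print(co_missing.round(2))
```

A high correlation between two indicator columns is a hint that the values were collected together, which is the kind of codependence the heatmap makes visible.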

## Univariate distributions

Distributions are shown as densities by default,
and the subplots are laid out in square grids.
Densities can be drawn as areas or lines,
and a rug plot is included to indicate the number of observations,
since densities can be misleadingly smooth even for small datasets.

In [None]:
aly.dist(movies)

In [None]:
aly.dist(movies, 'MPAA Rating')
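To illustrate why the rug plot matters, here is a small sketch of a Gaussian kernel density estimate written with NumPy. The helper `kde_on_grid`, the bandwidth, and the five data points are all assumptions made for this illustration; the package may compute its densities differently. The point is only that the resulting curve is smooth and integrates to about one even though it rests on very few observations.

```python
import numpy as np

def kde_on_grid(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate on `grid` (hypothetical helper)."""
    samples = np.asarray(samples, dtype=float)
    # One Gaussian bump per observation, averaged over all observations
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)

few_points = [1.0, 2.0, 2.5, 4.0, 7.0]  # made-up data, only 5 observations
grid = np.linspace(-5, 13, 361)
density = kde_on_grid(few_points, grid, bandwidth=1.0)

# Approximate the area under the curve with a Riemann sum
area = density.sum() * (grid[1] - grid[0])
print(f"observations: {len(few_points)}, area under curve: {area:.3f}")
```

Nothing in the curve itself reveals that only five values went into it, which is the information the rug marks add.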

Histograms can be made with the `'bar'` mark.

In [None]:
aly.dist(movies, mark='bar')

In [None]:
aly.dist(movies, dtype='object')

## Pairwise variable relationships

Pairplots (also called scatter plot matrices) give an overview of the pairwise relationships
between all quantitative columns in the data.
Selecting points in one plot highlights the same points across all subplots.

In [None]:
aly.pair(movies)

In [None]:
aly.pair(movies, 'MPAA Rating')

## Pairwise variable correlation

A pairwise correlation plot can complement a pairplot
and provide a quantitative measurement of correlation between column pairs.
By default the Pearson and Spearman correlations are shown
to reveal both linear and monotonic non-linear (exponential, logarithmic, etc.) relationships.
Note that neither of these correlations would pick up more complex,
non-monotonic column relationships (e.g. quadratic),
so it is a good idea to use them in tandem with the pairplot.

Hovering over a point shows the exact coefficient
and highlights the point across all subplots.

In [None]:
aly.corr(movies)
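The difference between the two coefficients can be seen on a small synthetic example. This sketch uses made-up data and two hypothetical helper functions, and it computes Spearman's coefficient as the Pearson correlation of the ranks, which is its standard definition. It is not taken from the package's implementation.

```python
import numpy as np
import pandas as pd

x = pd.Series(np.arange(1, 11), dtype=float)  # made-up data: 1, 2, ..., 10
exponential = np.exp(x)                       # monotonic but non-linear
quadratic = (x - x.mean()) ** 2               # non-monotonic (U-shaped)

def pearson(a, b):
    # pandas uses the Pearson correlation by default
    return a.corr(b)

def spearman(a, b):
    # Spearman is the Pearson correlation of the ranks (ties get average ranks)
    return a.rank().corr(b.rank())

for name, y in [('exponential', exponential), ('quadratic', quadratic)]:
    print(f"{name:12s} pearson={pearson(x, y):.2f} spearman={spearman(x, y):.2f}")
```

For the exponential relationship, Pearson stays clearly below 1 while Spearman reaches 1, because the ranks of the two variables agree perfectly. For the symmetric quadratic relationship both coefficients are essentially zero, even though the columns are strongly related.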

## Parallel coordinates

Parallel coordinate plots give an overview
of how individual observations are distributed
across all quantitative columns in the data.
Coloring by a categorical variable can help reveal groupings in the data
and is also effective for qualitatively assessing clustering results
from unsupervised learning algorithms.

Click the legend to hide and show groups.

In [None]:
aly.parcoord(movies, 'MPAA Rating', rescale='min-max')
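Columns on very different scales need to be brought onto a common axis before they can share a parallel coordinate plot. The sketch below assumes that `rescale='min-max'` refers to the usual min-max normalization, which maps every column to the range 0 to 1. The helper `min_max` and the toy values are illustrative assumptions rather than the package's own code.

```python
import pandas as pd

# Made-up toy data with columns on very different scales
toy = pd.DataFrame({
    'votes':   [120, 45000, 980, 310000],
    'rating':  [5.2, 7.8, 6.1, 8.9],
    'runtime': [88, 142, 101, 175],
})

def min_max(df):
    """Rescale each column so its minimum is 0 and its maximum is 1 (hypothetical helper)."""
    return (df - df.min()) / (df.max() - df.min())

rescaled = min_max(toy)
print(rescaled.round(3))
```

After rescaling, each observation can be drawn as one line across the column axes without the column with the largest values dominating the plot.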