# Matplotlib

The following imports numpy.  Do similar imports for matplotlib.pyplot and pandas:

In [4]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

We're going to generate some plots of fictitious data that follow this equation:

$$ y(x) = 4 + 2x - x^2 + 0.075x^3 $$

The following cell generates our `x` and `y` arrays.

In [None]:
x = np.linspace(0, 10, 50)
y = 4 + 2*x - x**2 + 0.075*x**3

Use the matplotlib.pyplot module to make a plot of `y` against `x` (y vertically, x horizontally):

In [9]:
plt.plot(x,y)
plt.ylabel('y')
plt.xlabel('x')
plt.show()


(Check your ranges in the plot to make sure that `x` values are plotted horizontally and `y` values are plotted vertically.)
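
One way to check the ranges before reading them off the plot is to print the extremes of each array. This is a minimal sketch that rebuilds `x` and `y` exactly as in the cell above; the names `x_range` and `y_range` are introduced here only for illustration.

```python
import numpy as np

# Same arrays as in the cell that generates x and y.
x = np.linspace(0, 10, 50)
y = 4 + 2*x - x**2 + 0.075*x**3

# Hypothetical helper names, not part of the exercise.
x_range = (x.min(), x.max())
y_range = (y.min(), y.max())

print("x spans", x_range)  # should match the horizontal axis of the plot
print("y spans", y_range)  # should match the vertical axis of the plot
```

If the horizontal axis of your plot does not run from 0 to 10, the arguments to `plt.plot` are probably swapped.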

Real data is usually noisy.  It may be noise from measurement, or "noise" in the sense that there are aspects of the data that aren't captured by the features that we measure.

The following cell introduces some random noise and stores the result in a new variable, `y_with_noise`, which equals `y + noise`.

In [None]:
# generate 50 points from a normal 
# distribution that has mean = 0 and std dev = 1.5
noise = np.random.normal(0,1.5,50)

# this y is now the theoretical value + noise
y_with_noise = 4 + 2*x - x**2 + 0.075*x**3 + noise
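
Because the noise is random, every run of the cell above produces different values. If you want repeatable results, one option is a seeded generator. The sketch below assumes NumPy's `default_rng` interface; the seed value `0` and the name `rng` are arbitrary choices made for illustration.

```python
import numpy as np

x = np.linspace(0, 10, 50)
y = 4 + 2*x - x**2 + 0.075*x**3

# Seeded generator: the same seed gives the same noise on every run.
rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily
noise = rng.normal(loc=0, scale=1.5, size=x.size)

# Theoretical value plus noise, as in the cell above.
y_with_noise = y + noise
```

Re-running this block gives the same `y_with_noise` each time, which makes plots easier to compare between runs.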

Make a plot of y_with_noise against x:

In [13]:

plt.plot(x, y_with_noise)
plt.show()

Make a figure that includes
* scatter plot of y_with_noise against x
* line plot of y against x

In [14]:

# draw both on the same figure, then show it once
plt.scatter(x, y_with_noise)
plt.plot(x, y)
plt.show()

Make the same plot again with:
* labels for the x-axis and y-axis
* a title
* 16-pt font size for the labels and title
* make the line blue
* make the scatter points black

In [None]:
# keep a handle on the figure so it can be saved in the next cell
fig = plt.figure()

plt.scatter(x, y_with_noise, color='black')
plt.plot(x, y, color='blue')

plt.xlabel('x', fontsize=16)
plt.ylabel('y', fontsize=16)
plt.title('y against x, with and without noise', fontsize=16)
plt.show()

Save the figure into a file named 'matplotlib-plot.png':

In [None]:
fig.savefig('matplotlib-plot.png')

Generate the same plot using the figure and axes objects (you can use the matplotlib.pyplot module to generate the figure, but otherwise don't use matplotlib.pyplot.)

* The documentation for matplotlib.axes -> https://matplotlib.org/stable/api/axes_api.html
* The documentation for matplotlib.figure -> https://matplotlib.org/stable/api/figure_api.html

Save the figure into the file 'figure-plot.png'

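In [None]:
fig, ax = plt.subplots()
ax.scatter(x, y_with_noise, color='black')
ax.plot(x, y, color='blue')
ax.set_xlabel('x', fontsize=16)
ax.set_ylabel('y', fontsize=16)
ax.set_title('y against x, with and without noise', fontsize=16)
fig.savefig('figure-plot.png')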

# Pandas

Use pandas to import the data from 'anscombe.csv' into a dataframe variable

In [None]:
dinodata = pd.read_csv('anscombe.csv')

Print the first 2 rows of the dataframe:

In [None]:
dinodata.loc[[0,1]]

Print the last 2 rows of the dataframe:

In [None]:
dinodata.tail(2)

Print the number of rows and columns of the dataframe:

In [None]:
print(dinodata.shape)

Print the column names of the dataframe:

In [None]:
print(dinodata.columns)

Print the datatypes of each column:

In [None]:
print(dinodata.dtypes)


Print summary statistics about the dataframe:

In [None]:
print(dinodata.describe())

Print the values that are in the 'dataset' column:

In [None]:
print(dinodata['dataset'])

Use "loc" to print the first 10 rows:

In [None]:
# loc slices by label and includes the end label, so 0:9 is 10 rows
dinodata.loc[0:9]

Use "loc" to print the values from the first 10 rows of the 'x' column:

In [None]:
dinodata.loc[0:9, 'x']
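
The difference between `loc` and `iloc` slicing is easy to get wrong, so here is a small sketch of it. The frame `toy` is hypothetical (made-up values, default integer index) and only stands in for the real data.

```python
import pandas as pd

# Hypothetical toy frame; the values are made up for illustration.
toy = pd.DataFrame({
    "dataset": ["I", "I", "I", "II", "II", "II"],
    "x": [1, 2, 3, 4, 5, 6],
})

# loc works with labels, and the end label is included.
by_label = toy.loc[0:2, "x"]

# iloc works with integer positions, and the end position is excluded.
by_position = toy.iloc[0:3, 1]

print(by_label.tolist())
print(by_position.tolist())
```

Both selections return the same three values here, but only because the index labels happen to match the row positions.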

Use "loc" to print the rows for which `dataset` is equal to `III`:

In [None]:
dinodata.loc[dinodata['dataset'] == 'III']

Make a line plot of y against x for dataset III:

In [None]:
a = dinodata[dinodata['dataset'] == 'III']

a.plot(x='x', y='y')

Make a scatter plot of y against x for dataset III:

In [None]:
a.plot(x='x', y='y',kind='scatter')

Make scatter plots for the other 3 datasets:

In [None]:
a = dinodata[dinodata['dataset'] == 'I']
a.plot(x='x', y='y', kind='scatter')

a = dinodata[dinodata['dataset'] == 'II']
a.plot(x='x', y='y', kind='scatter')

a = dinodata[dinodata['dataset'] == 'IV']
a.plot(x='x', y='y', kind='scatter')
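
Repeating the filter and plot for each dataset works, but the same result can be sketched more compactly with `groupby` and a grid of axes. The frame `toy` below is hypothetical: it assumes the column names `dataset`, `x` and `y`, and its values are made up, so substitute the real dataframe when trying this.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the real file: assumed column names, made-up values.
toy = pd.DataFrame({
    "dataset": ["I", "I", "II", "II", "III", "III", "IV", "IV"],
    "x": [1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0],
    "y": [1.0, 2.0, 2.0, 1.0, 1.5, 1.5, 0.5, 2.5],
})

# One panel per dataset; shared axes make the panels directly comparable.
fig, axes = plt.subplots(2, 2, figsize=(8, 6), sharex=True, sharey=True)
for ax, (name, group) in zip(axes.flat, toy.groupby("dataset")):
    ax.scatter(group["x"], group["y"], color="black")
    ax.set_title(f"dataset {name}")
fig.tight_layout()
```

Sharing the axes is a design choice: it keeps the four panels on the same scale, which is the point of comparing the quartet side by side.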

We're working with the datasets from Anscombe's quartet.  You can read about it [here](https://en.wikipedia.org/wiki/Anscombe%27s_quartet).

Use that page to find the equation for the linear regression line of these datasets, and make a figure that includes:
* a scatter plot of dataset I 
* a line plot of the linear regression line

In [None]:
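a = dinodata[dinodata['dataset'] == 'I']

# regression line evaluated over the x range of dataset I
x_fit = np.linspace(a['x'].min(), a['x'].max(), 50)
y_fit = 3.00 + 0.500*x_fit

# scatter returns the axes, so the line can be drawn on the same figure
ax = a.plot(x='x', y='y', kind='scatter', color='black')
ax.plot(x_fit, y_fit)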
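
As a cross-check on the equation from the page, the coefficients can also be estimated from the data with a least-squares fit. The sketch below uses `np.polyfit` and types in the dataset I values from the published quartet, so that it runs without the CSV file; the names `x1` and `y1` are introduced here only for illustration.

```python
import numpy as np

# Dataset I of Anscombe's quartet, transcribed from the published table.
x1 = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y1 = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])

# A degree-1 fit returns the slope first, then the intercept.
slope, intercept = np.polyfit(x1, y1, deg=1)
print(f"y = {intercept:.2f} + {slope:.3f}x")
```

The fitted values should come out very close to the intercept of 3.00 and slope of 0.500 used above.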


Make the same plot again with:
* a title
* 16-pt font size for the labels and title
* make the line blue
* make the scatter points black
* save the figure into a file called 'anscombe-I.png'

In [None]:
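a = dinodata[dinodata['dataset'] == 'I']
x_fit = np.linspace(a['x'].min(), a['x'].max(), 50)

ax = a.plot(x='x', y='y', kind='scatter', color='black')
ax.plot(x_fit, 3.00 + 0.500*x_fit, color='blue')

ax.set_title('Anscombe dataset I', fontsize=16)
ax.set_xlabel('x', fontsize=16)
ax.set_ylabel('y', fontsize=16)

ax.figure.savefig('anscombe-I.png')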

# End