# Data Visualization with Python
Part of the SWEET Workshop series presented by the [IDEA Student Center at UC San Diego](http://www.jacobsschool.ucsd.edu/student/).

### Goals
Learn to create professionally formatted visuals.

### Requirements
- numpy
- matplotlib
- seaborn

In [None]:
# make the code compatible with python 2.x and 3.x
# NOTE: in a plain Python script, __future__ imports must come before any other code
from __future__ import print_function, division

# load required packages

# vectorized functions
import numpy as np

# plotting
import matplotlib.pyplot as plt
%matplotlib inline

## 1) Loading data
Let's start by loading an example data file: data on the number of shark attacks per month vs. the number of ice cream sales.

**Discussion**: Based on the data description:
- How many columns should we get from loading the file?
- Which variables have numeric values (if any)?

In [None]:
# we'll load the data using numpy's genfromtxt() function
#
# NOTE: the data columns are separated by commas
#

# load the data
data = np.genfromtxt("ice_cream_vs_shark_attacks.csv", delimiter=",")

# check the data dimensions
#print( data.shape )

# check the data type
#print( type(data) )

# check the data type of one of the individual elements
#print( type(data[0, 0]) )

**Discussion**:
- How many rows and columns are there?
- What type of data was loaded? Numbers? Text?


And which column is which variable?
- column 0: ice cream sales
- column 1: ???
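
To experiment with `genfromtxt()` without the workshop file, here is a minimal sketch that loads a small made-up table from memory. The values in `csv_text` are invented for illustration and are not the workshop data; the point is only to show what the shape and type checks report.

```python
import io
import numpy as np

# made-up stand-in for a two-column CSV file (NOT the workshop data)
csv_text = "10,1\n20,3\n30,4\n"

# genfromtxt() accepts a file-like object, so a StringIO behaves like a file
example = np.genfromtxt(io.StringIO(csv_text), delimiter=",")

# (number of rows, number of columns)
print(example.shape)

# the loaded data is a numpy array
print(type(example))

# with the default settings, each value is loaded as a float
print(type(example[0, 0]))
```

With the default settings, any field that cannot be read as a number (such as a header row of column names) is loaded as `nan`. If your file has a header row, passing `skip_header=1` skips it.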

## 2) Visualizing data
Now that we've loaded some data, it makes sense to try to visualize it.

In [None]:
# select the two variables from the data set
ice_cream = data[:, 0]
shark_attacks = data[:, 1]

# create a scatter plot
plt.scatter(ice_cream, shark_attacks)

plt.show()

Let's try to improve the plot formatting. After all, almost every data analysis project will involve creating a visual that can then be presented to someone (coworkers, project supervisors, clients, etc.).

Ideas for formatting revisions:
- colors
- figure size
- text labels
- font sizes

In [None]:
# select two of the variables from the data set
ice_cream = data[:, 0]
shark_attacks = data[:, ???]

# set the figure size
plt.figure(figsize=(???, ???))

# create a scatter plot
plt.scatter(ice_cream, shark_attacks, color='???')

# add labels
plt.xlabel('???')
plt.ylabel('???')

# add a grid
plt.grid()

plt.show()

## 3) Formatting with seaborn
Let's "cheat" by using the seaborn package to get better plot defaults.
**NOTE**: everything seaborn does, you can accomplish with just matplotlib. But seaborn makes it a lot easier.
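
To illustrate the note above: seaborn's styling works by changing matplotlib's settings (`rcParams`), which can also be set by hand. The sketch below does this with matplotlib alone. The settings are picked arbitrarily for illustration; they are not seaborn's actual defaults.

```python
import matplotlib.pyplot as plt

# values chosen for illustration; they are NOT seaborn's actual defaults
custom_settings = {
    'font.size': 16,
    'axes.labelsize': 16,
    'axes.grid': True,
    'grid.color': 'lightgray',
    'axes.spines.top': False,
    'axes.spines.right': False,
}

# rc_context applies the settings only inside the "with" block
with plt.rc_context(custom_settings):
    fig = plt.figure(figsize=(8, 6))

    # made-up values for illustration only
    plt.scatter([1, 2, 3, 4], [2, 4, 5, 8])

    plt.xlabel('x variable')
    plt.ylabel('y variable')

    plt.show()
```

Writing out every setting like this quickly gets tedious, which is the main reason to let seaborn handle it.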

In [None]:
# better plotting settings with seaborn
#
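# NOTE: in seaborn versions before 0.8, importing seaborn changes the
#       defaults for all plots; in newer versions the defaults change
#       only once sns.set() is called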
#
import seaborn as sns

In [None]:
# select two of the variables from the data set
ice_cream = data[:, 0]
shark_attacks = data[:, 1]

# set the seaborn settings
#
# context:
# - talk = for a presentation
# - paper = for a report, journal article, etc.
#
# style:
# - white
# - dark
# - whitegrid
# - darkgrid
# - ticks
#
sns.set(context='talk', style='white')

# set the figure size
#plt.figure(figsize=(???, ???))

# create a scatter plot
plt.scatter(ice_cream, shark_attacks)

# add labels
plt.xlabel('???')
plt.ylabel('???')

plt.show()


# try customizing this plot further
# - add labels
# - change the colors

Now let's try some other plot types:
- line plot
- histogram
- bar plot
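
As a starting point, here is a minimal sketch of all three plot types drawn side by side. The data is made up for illustration (a sine curve, random samples, and arbitrary category counts), and the variable names are placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt

# made-up data for illustration only
x = np.linspace(0, 10, 50)
rng = np.random.default_rng(0)  # seeded so the samples are repeatable
samples = rng.normal(loc=0, scale=1, size=500)
categories = ['A', 'B', 'C']
counts = [5, 9, 3]

# one row of three side-by-side plots
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# line plot: connects the (x, y) points in order
axes[0].plot(x, np.sin(x))
axes[0].set_title('line plot')

# histogram: counts how many samples fall into each of 20 bins
axes[1].hist(samples, bins=20)
axes[1].set_title('histogram')

# bar plot: one bar per category
axes[2].bar(categories, counts)
axes[2].set_title('bar plot')

# adjust spacing so the titles and tick labels don't overlap
plt.tight_layout()
plt.show()
```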