Survey Link:
 - https://docs.google.com/forms/d/e/1FAIpQLScpN3Leq765IihESOpakWJgj4MqdtR_jf9GXtlyqrMGA7nYqg/viewform
 
Download this lesson plan:
 - https://github.com/GTLibraryDataVisualization/Plotly-with-Python-Introduction-Workshop
 - OR Google: Gt github data visualization
 
Open this lesson plan:
 - open command prompt -> "jupyter notebook"



*Lesson Plan for Plotly Lab*

What is Python?

    Python is a programming language that is easy to learn and reads closer to English than many other languages. Indentation is VERY important: Python uses it to decide which lines belong to which block of code, that is, "what comes after what."
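
To see why indentation matters, here is a minimal sketch (the score values are made up for illustration). The indented lines belong to the statement above them, so the append only runs for scores that pass the test.

```python
# Made-up SAT averages, for illustration only.
sat_scores = [850, 1147, 1221]
above_1000 = []

for score in sat_scores:
    # Indented once: this line runs for every score in the list.
    if score > 1000:
        # Indented twice: this line runs only when the "if" test is true.
        above_1000.append(score)

# Not indented: this line runs once, after the loop has finished.
print(above_1000)
```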
    
Basic Types

    During this lesson plan we will be using a few different types:
        String: a sequence of letters and symbols surrounded by quotes (ex: "hello")
        Integer: whole number (ex: 5)
        Float: number with decimal places (ex: 5.3)
        Object: a special type that has its own methods we can use.
            A method is a named block of code, attached to an object, that accomplishes some task.
        Dataframe: special object provided by the pandas library; it lets us
            view and manipulate data as a table of rows and columns.
        Series/List: an ordered sequence of values.
        Dictionary: a set of mappings from keys to values (ex: {"size": 4}).
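
As a rough illustration of the built-in types listed above, the sketch below creates one value of each and asks Python what type it is. The variable names are arbitrary; dataframes are left out because they need the pandas library.

```python
school = "Alabama State University"   # string
count = 5                             # integer
ratio = 5.3                           # float
scores = [850, 1147, 1221]            # list
labels = {"x": "SAT Average", "y": "Average Salary"}  # dictionary

# type(...) reports the type of a value; __name__ is its short name.
type_names = [type(v).__name__ for v in (school, count, ratio, scores, labels)]
print(type_names)

# Strings are objects, so they come with methods such as upper().
shout = school.upper()
print(shout)
```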
        
What does "Import" do?

    The import statement gives us functions, objects or a set of objects and functions (known as a
    library) that other people created that we would like to use. It saves us from having to write a
    bunch of code ourselves. The 'as' keyword lets us address this library as any nickname we give it.
    
    numpy
        Gives us fast numeric arrays. Arrays are what we will hand to
        plotly to graph.

    plotly
        Gives us access to all of the methods we need to create a plotly graph.

    chart_studio.plotly
        Gives us the plot and iplot methods, which upload a figure to our
        Plotly (Chart Studio) account and display it.

    plotly.graph_objs
        Holds the building blocks of a plotly graph, such as Scatter3d, Layout,
        and Figure. We rename it to "go" for ease later.

            
    pandas
            Gives us the ability to create a dataframe from our CSV files.
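
The import statement and the 'as' keyword can be tried without installing anything. This sketch uses two modules from Python's standard library (math and statistics) as stand-ins for the workshop libraries; the nickname "m" and the score values are arbitrary choices.

```python
import math as m              # import a whole library under the nickname "m"
from statistics import mean   # import a single function from a library

# Made-up SAT averages, for illustration only.
sat_scores = [850, 1147, 1221]

average_sat = mean(sat_scores)   # someone else wrote mean(); we just use it
root = m.sqrt(16)                # reach into the library through its nickname

print(average_sat, root)
```

The lesson's own imports follow the same pattern: "import pandas as pd" lets us write pd.read_csv instead of pandas.read_csv.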

Project Set Up For Personal Device:

1.) Install Python3 (python.org)

2.) Open command prompt:
    - windows: [Shift key]+[Right click] on desktop and click "Open PowerShell window here."
    - linux: on desktop, press [CTRL]+[ALT]+[T].
    - mac: press [Command]+[Space], type "Terminal", and press enter.

3.) Verify installation:
    - type "python --version" and press enter.
    - Installation was a success if it returns "Python 3.x.x."
    
4.) Install all of the needed libraries with the commands:
    - pip install plotly
    - pip install numpy
    - pip install pandas
    - pip install chart_studio
    - pip install jupyter
    OR
    - python -m pip install plotly numpy pandas chart_studio jupyter
    
5.) Initialize Jupyter Notebook with:
    - jupyter notebook
    
6.) Download project files
    - github.com/GTLibraryDataVisualization/Plotly-with-Python-Introduction-Workshop
    - Press the green "Clone or Download" button and download the project zip file.
    - This holds the example data we will be graphing.

Time to actually get started on the plotting:

In [1]:
#setting up the project

import numpy as np
import chart_studio.plotly as py
import plotly.graph_objs as go
import chart_studio
import pandas as pd

Plotly tools set credentials

    In order to save our graphs to our specific account, Plotly needs a username and API key.
    1) Go to https://plot.ly/
    2) Create an account or login.
    3) Hover over account name in upper-right and click Settings.
    4) Click API keys on the left hand side.
    5) Click (re)generate key, and copy that number.
    6) Paste that key as well as your username below.

In [2]:
#replace FAKE_NAME with your Plotly username and FAKE_KEY with the API key you generated
user = "FAKE_NAME"
key = "FAKE_KEY"
chart_studio.tools.set_credentials_file(username=user, api_key=key)

In [3]:
#reading the data

#csv_file is a string holding the path to our data file.
csv_file = "univ_reduced.csv"
#pd.read_csv asks the pandas library to read our CSV file and turn it into a dataframe,
#which we store in the variable df.
df = pd.read_csv(csv_file)

In [4]:
print(df)
#print() lets us see our data.
#Notice there are over 7500 rows.

                                                 INSTNM  SAT_AVG_ALL     UGDS  \
0                              Alabama A & M University        850.0   4505.0   
1                   University of Alabama at Birmingham       1147.0  11269.0   
2                                    Amridge University          NaN    308.0   
3                   University of Alabama in Huntsville       1221.0   5829.0   
4                              Alabama State University        844.0   4740.0   
...                                                 ...          ...      ...   
7588  National Personal Training Institute of Cleveland          NaN      NaN   
7589  Bay Area Medical Academy - San Jose Satellite ...          NaN      NaN   
7590                        High Desert Medical College          NaN      NaN   
7591                        Vantage College-San Antonio          NaN      NaN   
7592  American Institute of Pharmaceutical Technolog...          NaN      NaN   

      UGDS_RICH MD_EARN_WNE

In [5]:
###cleaning up the data###

#Notice we have a lot of NaN ("not a number") values. Plotly cannot graph those, so we need to get rid of them.
#We do this by indexing into our dataframe using [], selecting the column we want to filter,
#and then keeping only the rows that have a non-null value in that column.

df = df[df.SAT_AVG_ALL.notnull()]
print(df)
df = df[df.INSTNM.notnull()]
df = df[df.UGDS_RICH.notnull()]
df = df[df.MD_EARN_WNE_P10.notnull()]

#There are also some "PrivacySuppressed" values in the salary column. Since this is also not a number,
#we want to remove all rows that contain that value as well.

df = df[~df.MD_EARN_WNE_P10.str.contains('PrivacySuppressed')]

#Print the updated and cleaned dataframe.
print(df)
#Notice the drop from ~7500 rows to now ~1300 rows.
#The rest contained data we could not graph.

                                                INSTNM  SAT_AVG_ALL     UGDS  \
0                             Alabama A & M University        850.0   4505.0   
1                  University of Alabama at Birmingham       1147.0  11269.0   
3                  University of Alabama in Huntsville       1221.0   5829.0   
4                             Alabama State University        844.0   4740.0   
5                            The University of Alabama       1181.0  31005.0   
...                                                ...          ...      ...   
7424   Purdue University - Purdue Polytechnic Columbus       1231.0      NaN   
7425     Purdue University - Purdue Polytechnic Kokomo       1231.0      NaN   
7426   Purdue University - Purdue Polytechnic Richmond       1231.0      NaN   
7427  Purdue University - Purdue Polytechnic Lafayette       1231.0      NaN   
7428  Purdue University - Purdue Polytechnic Vincennes       1231.0      NaN   

      UGDS_RICH MD_EARN_WNE_P10  
0    
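
The filtering above is easier to follow on a tiny table. The sketch below builds a made-up four-row dataframe (the school names and numbers are invented, not taken from univ_reduced.csv) and applies the same kind of filters, so you can see exactly which rows survive.

```python
import pandas as pd

# Hypothetical miniature of the workshop data; every value here is made up.
toy = pd.DataFrame({
    "INSTNM": ["School A", "School B", "School C", "School D"],
    "SAT_AVG_ALL": [1000.0, None, 1200.0, 1100.0],
    "MD_EARN_WNE_P10": ["40000", "35000", "PrivacySuppressed", None],
})

# notnull() gives one True/False per row; indexing with it keeps the True rows.
toy = toy[toy.SAT_AVG_ALL.notnull()]        # drops School B (no SAT average)
toy = toy[toy.MD_EARN_WNE_P10.notnull()]    # drops School D (no salary)

# ~ flips True and False, so we keep rows that do NOT contain the text.
toy = toy[~toy.MD_EARN_WNE_P10.str.contains("PrivacySuppressed")]  # drops School C

remaining = list(toy.INSTNM)
print(remaining)
```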

The following code extracts the columns from our dataframe and gives us arrays that we can use to graph. We do this because plotly graphs sequences of values (such as arrays), one per axis, rather than a whole pandas dataframe. To get the names out of our dataframe, we select the column we want and read its .values attribute. This simply gives us back the values that were in that column as an array. We want to know our university names, their average SAT, their starting salaries, and the percentage of rich students.

In [6]:
#extracting data values

name = df['INSTNM'].values
sat_average = df['SAT_AVG_ALL'].values
salary = df['MD_EARN_WNE_P10'].values
percentage_rich = df['UGDS_RICH'].values

Calling go.Scatter3d (note the capital S) creates a 3D scatter plot object containing all our points.

***IN ORDER TO GRAPH OUR DATA WE NEED TWO THINGS: 1) A DATA SET AND 2) A LAYOUT.***
- The Data set is "what" we are going to show.
- The Layout is "how" we are going to show it.

*Let's start with the data set:*

We want:
    - x axis to equal SAT averages (x = sat_average).
    - y axis to equal starting salary (y = salary).
    - z axis to equal percentage rich (z = percentage_rich).
    - when hovering over a point, to display corresponding school name (text = name).
    - our data points to be displayed as markers / dots (mode = "markers").

Now we want to control what our dots look like. Do this by passing in a dictionary of settings
    (marker = dict(.......)).

    The size of each dot is 4 (size = 4).
    The color will correspond to the percentage of rich students (color = percentage_rich).
    Plotly has premade colorscales so for this example we will use "Viridis" (colorscale = 'Viridis').
    Opacity is how solid our dots will be (0 is invisible, 1 is solid). We will set this to .8 (opacity = 0.8).
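
If the dict(...) call looks unfamiliar, this small sketch shows that it builds an ordinary Python dictionary, the same as writing the keys and values inside curly braces. The percentage_rich values here are made-up stand-ins for the real column.

```python
# Made-up stand-in for the real percentage_rich array.
percentage_rich = [0.12, 0.30, 0.25]

# Two ways of writing the same dictionary of marker settings.
marker_from_call = dict(size=4, color=percentage_rich, colorscale="Viridis", opacity=0.8)
marker_literal = {"size": 4, "color": percentage_rich, "colorscale": "Viridis", "opacity": 0.8}

same = marker_from_call == marker_literal
print(same)

# Individual settings can be looked up by key.
print(marker_from_call["opacity"])
```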
    
Think of a "trace" as a big grouping of all of the data needed to draw a graph. Plotly can support many of these "traces" simultaneously, so it likes to receive a list of traces. For this demonstration, we will use a list of size one to hold our single trace, trace1.

In [7]:
#preparing data for plotly

trace1 = go.Scatter3d(
    x=sat_average,
    y=salary,
    z=percentage_rich,
    text=name,
    mode='markers',
    marker=dict(
        size=4,
        color=percentage_rich, # set color to an array/list of desired values
        colorscale='Viridis',   # choose a colorscale
        opacity=0.8
    )
)
my_data = [trace1]

Next we want to define the layout of our graph

    We want the title of our graph to simply be "University Data" (title = 'University Data').
    To label our axes, we must first create a "scene." Inside this scene, we pass in a dictionary of info, including
    xaxis, yaxis, and zaxis. Then each of those gets its own dictionary of info to describe itself in detail:
    scene=dict(xaxis=dict(......), yaxis=dict(.....), zaxis=dict(.....)).
    
    Let us set our x axis label to SAT average, y to average salary, and z to percentage rich.
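
The scene is a dictionary whose values are themselves dictionaries. As a plain-Python sketch (no plotly needed), the nesting described above looks like this, and a label is reached by following the keys from the outside in.

```python
# A dictionary of dictionaries: one inner dictionary per axis.
scene = dict(
    xaxis=dict(title="SAT Average"),
    yaxis=dict(title="Average Salary"),
    zaxis=dict(title="Percentage Rich"),
)

# Follow the keys inward: first pick the axis, then pick its title.
x_label = scene["xaxis"]["title"]

# Sorting a dictionary gives back its keys in alphabetical order.
axis_names = sorted(scene)

print(x_label, axis_names)
```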

In [8]:
my_layout = go.Layout(
    title='University Data',
    scene=dict(
        xaxis=dict(
            title='SAT Average'
        ),
        yaxis=dict(
            title='Average Salary'
        ),
        zaxis=dict(
            title='Percentage Rich'
        )
    )
)

Once we have all of that, we can actually plot the graph

        First we have to create a completed figure to graph. Plotly will do this automatically for us. All it needs to know is the data we want to plot and the layout. So we specify that the data it should use is my_data, and the layout it should use is my_layout. Finally, in order to display the graph, we call py.plot or py.iplot, specify the figure we want to plot, and what we want to save our graph as.
        
        IMPORTANT: if you are running your code as a script from a text editor, use py.plot. py.iplot only works inside Jupyter Notebook.

In [None]:
fig = go.Figure(data=my_data, layout=my_layout)
py.iplot(fig, filename='univ_vis') #if using a text editor call py.plot(fig, filename='univ_vis')