In [None]:
import plotly.graph_objects as go
import pandas as pd
import plotly.express as px
from datetime import datetime

#**Building a scatterplot with specific colors**
In your work as a data analyst, you have been engaged by a group of Antarctic research scientists to help them explore and report on their work.

They have spent a lot of time collating data on penguin species, but are having difficulty visualizing it to understand what is happening. Specifically, they have asked if you can help them plot their data in relation to statistics on the penguins' body attributes. They also suspect there is some pattern related to species, but are unsure how to plot this extra element.

In this exercise, you will help the scientific team by creating a scatterplot of the 'Culmen' (upper beak) attributes of the scientists' penguin data, ensuring that the species are included as specific colors.

In [None]:
penguins = pd.read_csv('penguins.csv')
penguins.head()

Unnamed: 0.1,Unnamed: 0,studyName,Sample Number,Species,Region,Island,Stage,Individual ID,Clutch Completion,Date Egg,Culmen Length (mm),Culmen Depth (mm),Flipper Length (mm),Body Mass (g),Sex,Delta 15 N (o/oo),Delta 13 C (o/oo),Comments
0,1,PAL0708,1,Adelie Penguin (Pygoscelis adeliae),Anvers,Torgersen,"Adult, 1 Egg Stage",N1A1,Yes,2007-11-11,39.1,18.7,181.0,3750.0,MALE,,,Not enough blood for isotopes.
1,2,PAL0708,2,Adelie Penguin (Pygoscelis adeliae),Anvers,Torgersen,"Adult, 1 Egg Stage",N1A2,Yes,2007-11-11,39.5,17.4,186.0,3800.0,FEMALE,8.94956,-24.69454,
2,3,PAL0708,3,Adelie Penguin (Pygoscelis adeliae),Anvers,Torgersen,"Adult, 1 Egg Stage",N2A1,Yes,2007-11-16,40.3,18.0,195.0,3250.0,FEMALE,8.36821,-25.33302,
3,4,PAL0708,4,Adelie Penguin (Pygoscelis adeliae),Anvers,Torgersen,"Adult, 1 Egg Stage",N2A2,Yes,2007-11-16,,,,,,,,Adult not sampled.
4,5,PAL0708,5,Adelie Penguin (Pygoscelis adeliae),Anvers,Torgersen,"Adult, 1 Egg Stage",N3A1,Yes,2007-11-16,36.7,19.3,193.0,3450.0,FEMALE,8.76651,-25.32426,


In [None]:
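# Keys must match the full labels in the 'Species' column exactly
color_map = {'Adelie Penguin (Pygoscelis adeliae)': 'rgb(235,52,52)',
             'Gentoo penguin (Pygoscelis papua)': 'rgb(235,149,52)',
             'Chinstrap penguin (Pygoscelis antarctica)': 'rgb(67,52,235)'}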

fig = px.scatter(data_frame=penguins, title="Penguin Culmen Statistics",
    x='Culmen Length (mm)',
    y='Culmen Depth (mm)',
    color="Species",
    color_discrete_map=color_map
)
fig.show()
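`color_discrete_map` is matched against the exact values in the `color` column. Short keys such as `'Adelie'` therefore do not match labels like `'Adelie Penguin (Pygoscelis adeliae)'`, and any unmatched species falls back to the default color sequence. The sketch below uses a hypothetical helper, `expand_color_map`, which assumes each species label starts with the short name used as a key:

```python
def expand_color_map(labels, short_map):
    """Return a map from each full label to the color of its first word.

    Assumes (hypothetically) that labels look like 'Adelie Penguin (...)',
    i.e. the short species name is the first whitespace-separated word.
    Labels whose first word has no color are left out of the result.
    """
    full_map = {}
    for label in labels:
        short_name = label.split()[0]
        if short_name in short_map:
            full_map[label] = short_map[short_name]
    return full_map


short_colors = {'Adelie': 'rgb(235,52,52)',
                'Gentoo': 'rgb(235,149,52)',
                'Chinstrap': 'rgb(67,52,235)'}

# Labels as they appear in the 'Species' column of penguins.csv
species_labels = ['Adelie Penguin (Pygoscelis adeliae)',
                  'Chinstrap penguin (Pygoscelis antarctica)',
                  'Gentoo penguin (Pygoscelis papua)']

full_colors = expand_color_map(species_labels, short_colors)
print(full_colors['Gentoo penguin (Pygoscelis papua)'])  # rgb(235,149,52)
```

In the notebook, the labels could come from `penguins['Species'].dropna().unique()`, and the result could be passed as `color_discrete_map`.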

#**Bird feature correlations**
The Antarctic research scientists you have been working with loved the scatterplot you created for them.

Now they are sure there is a relationship between the attributes of the penguins. But how strong is that relationship and in what direction?

They have reached out again for help. Luckily you know just the plot: a correlation plot!

In this exercise, you will help the scientific team by creating a correlation plot between various penguin attributes in the provided penguins DataFrame.

In [None]:
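# Correlate only the numeric measurement columns. Text columns would make recent
# pandas versions raise an error, and index-like columns carry no meaning here.
attributes = ['Culmen Length (mm)', 'Culmen Depth (mm)', 'Flipper Length (mm)',
              'Body Mass (g)', 'Delta 15 N (o/oo)', 'Delta 13 C (o/oo)']
penguin_corr = penguins[attributes].corr(method='pearson')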

fig = go.Figure(go.Heatmap(
        z=penguin_corr.values.tolist(),
        x=penguin_corr.columns,
        y=penguin_corr.columns,
        colorscale='rdylgn', 
        zmin=-1, zmax=1))
fig.show()
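Each heatmap cell holds a Pearson coefficient for a pair of columns. The coefficient always lies between -1 and 1, which is why the cell fixes `zmin=-1, zmax=1`: the middle of the diverging color scale then corresponds to zero correlation. The coefficient is the covariance divided by the product of the standard deviations. The pure-Python sketch below takes its illustrative input from the four complete rows of the `penguins.head()` output. Four points are far too few to draw any conclusion from.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: sum of products of deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: product of root sums of squared deviations (1/n factors cancel)
    spread = math.sqrt(sum((x - mean_x) ** 2 for x in xs)
                       * sum((y - mean_y) ** 2 for y in ys))
    return cov / spread


# Complete rows from penguins.head(): culmen length vs. flipper length
culmen_length = [39.1, 39.5, 40.3, 36.7]
flipper_length = [181.0, 186.0, 195.0, 193.0]
r = pearson(culmen_length, flipper_length)
print(round(r, 3))
```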

#**GDP vs. life expectancy legend**
You have been contacted by the United Nations to help them understand and visualize their data. They have been playing with various visualization tools, but just can't seem to find the design that they want.

They want to understand the relationship (if it exists) between GDP and life expectancy and have gathered data on over 200 countries to analyze. However, their initial efforts have been confusing to stakeholders and they need a clear legend positioned below the plot to help viewers understand it.

In [None]:
life_gdp = pd.read_csv('life_gdp.csv')
life_gdp.head()

Unnamed: 0,Country,Code,Year,Population,Continent,Life expectancy,GDP Per Capita
0,Afghanistan,AFG,2015,34414000,Asia,63.377,1928
1,Albania,ALB,2015,2891000,Europe,78.025,10947
2,Algeria,DZA,2015,39728000,Africa,76.09,13024
3,Angola,AGO,2015,27884000,Africa,59.398,8631
4,Argentina,ARG,2015,43075000,South America,76.068,19316


In [None]:
fig = px.scatter(
        data_frame=life_gdp, 
        x="Life expectancy", 
        y="GDP Per Capita", color='Continent')

my_legend = {'x': 0.2, 'y': 0.95, 
            'bgcolor': 'rgb(60,240,201)', 'borderwidth': 5}

fig.update_layout({'showlegend': True, 'legend': my_legend})

fig.show()
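Legend `x` and `y` are given in normalized coordinates of the plotting area (0 to 1), and `xanchor`/`yanchor` choose which part of the legend box sits at that point. The cell above places the legend near the top of the plotting area. One way to put it below the plot instead is a horizontal legend whose top edge is anchored at a negative `y`. The sketch below uses a hypothetical helper, `legend_below`. Its offset of -0.2 is an assumed value that may need tuning to clear the x-axis title.

```python
def legend_below(bgcolor='rgb(60,240,201)', borderwidth=5, offset=-0.2):
    """Build a Plotly layout.legend dict centered below the axes.

    offset is an assumed default: a negative y in normalized coordinates,
    so the legend's top edge lands under the x-axis title.
    """
    return {
        'orientation': 'h',   # lay legend items out in a row
        'x': 0.5,             # horizontal center of the plotting area
        'xanchor': 'center',  # ...measured from the legend's own center
        'y': offset,          # below the bottom of the plotting area
        'yanchor': 'top',     # ...measured from the legend's top edge
        'bgcolor': bgcolor,
        'borderwidth': borderwidth,
    }


legend = legend_below()
# In the notebook: fig.update_layout({'showlegend': True, 'legend': legend})
```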

#**Enhancing our GDP plot**
The United Nations loved your previous plot: the legend really stands out and makes the plot easier to read.

However, some interesting data points are hard to analyze further because the plot shows so little information about each one.

Your task is to enhance the life_gdp plot from the last exercise to include more information in the hover tooltip and style it as requested.

In [None]:
fig = px.scatter(
  data_frame=life_gdp, 
  x="Life expectancy", 
  y="GDP Per Capita",
  color="Continent",
  hover_data=["Continent", "Life expectancy", "GDP Per Capita"],
  hover_name='Country'
)
fig.show()
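In Plotly Express, `hover_name` is stored as each point's `hovertext` and shown in bold as the tooltip title. If the default `column=value` labels are too terse, you can replace the template with `fig.update_traces(hovertemplate=...)`. The sketch below assumes the same x and y columns as above. `%{hovertext}`, `%{x}` and `%{y}` are Plotly template variables, and `,.0f` is a d3-format spec that adds thousands separators. The helper name `build_hovertemplate` is hypothetical.

```python
def build_hovertemplate(x_label, y_label, y_format=',.0f'):
    """Build a Plotly hovertemplate string for a scatter trace.

    %{hovertext} is the hover_name value (here, the country), %{x} and %{y}
    are the point's coordinates, and y_format is a d3-format spec.
    No <extra> tag is added, so the secondary box still shows the trace
    name (the continent, when color='Continent').
    """
    return ('<b>%{hovertext}</b><br>'
            f'{x_label}: %{{x}}<br>'
            f'{y_label}: %{{y:{y_format}}}')


template = build_hovertemplate('Life expectancy (years)', 'GDP per capita')
# In the notebook: fig.update_traces(hovertemplate=template)
```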

#**Annotating your savings**
You have been working hard over the last 30 weeks to build your savings balance for your first car. However, there is some extra context that needs to be added to explain a drop in savings and, later, a big increase in savings accumulated each fortnight.

Your task is to add two key annotations to the bar chart of your savings balance over the weeks that explain what happened.

For both annotations:

- Ensure the arrow is showing and set `arrowhead` to 4.
- Make the font black using the color string `'black'`, not the RGB method.

In [None]:
savings = pd.read_csv('savings.csv')
savings.head()

Unnamed: 0,Week,Savings
0,2,250
1,4,500
2,6,800
3,8,1200
4,10,400


In [None]:
fig = go.Figure({
    'data': [{'alignmentgroup': 'True',
              'hovertemplate': 'Week=%{x}<br>Savings=%{y}<extra></extra>',
              'legendgroup': '',
              'marker': {'color': '#636efa', 'pattern': {'shape': ''}},
              'name': '',
              'offsetgroup': '',
              'orientation': 'v',
              'showlegend': False,
              'textposition': 'auto',
              'type': 'bar',
              'x': savings["Week"],
              'xaxis': 'x',
              'y': savings["Savings"],
              'yaxis': 'y'}],
    'layout': {'barmode': 'relative',
               'legend': {'tracegroupgap': 0},
               'margin': {'t': 60},
               'xaxis': {'anchor': 'y', 'domain': [0.0, 1.0], 'title': {'text': 'Week'}},
               'yaxis': {'anchor': 'x', 'domain': [0.0, 1.0], 'title': {'text': 'Savings'}}}
})
fig.show()
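Layout annotations are plain dicts. With `showarrow=True`, the arrow points at the given (x, y) spot. Note that `arrowhead` selects one of Plotly's arrowhead *styles* (an integer from 0 to 8), while the head's size is controlled separately by `arrowsize`. The hypothetical helper below applies the shared styling from the task list. It sets `xref='x'` and `yref='y'` explicitly, so that x and y are read on the data axes (week and savings).

```python
def arrow_note(x, y, text, arrowhead=4, color='black'):
    """Build a layout.annotations entry pointing at (x, y) in data coordinates.

    arrowhead picks an arrowhead *style* (0-8); the size would be set with
    the separate 'arrowsize' key, which is left at Plotly's default here.
    """
    return {
        'x': x, 'y': y,
        'xref': 'x', 'yref': 'y',  # interpret x/y on the data axes
        'text': text,
        'showarrow': True,
        'arrowhead': arrowhead,
        'font': {'color': color},
    }


notes = [arrow_note(10, 600, 'Urgent House Repairs'),
         arrow_note(18, 2500, 'New Job!')]
# In the notebook: fig.update_layout({'annotations': notes})
```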

In [None]:
# x is the week and y is the savings balance the arrow points at
loss_annotation = {'x': 10, 'y': 600, 'showarrow': True, 'arrowhead': 4,
                   'font': {'color': 'black'}, 'text': 'Urgent House Repairs'}

gain_annotation = {'x': 18, 'y': 2500, 'showarrow': True, 'arrowhead': 4,
                   'font': {'color': 'black'}, 'text': 'New Job!'}

fig.update_layout({'annotations': [loss_annotation, gain_annotation]})

fig.show()

#**A happier histogram plot**
The stock exchange firm you created the histogram for thinks that all the data and plots being created are too impersonal.

They have requested that a positive message be added to the histogram plot of company revenues you recently created.

You have just the right idea: you can wish the viewer a happy day, using the current day of the week!

In [None]:
revenues = pd.read_csv('revenue_data.csv')
revenues.head()

fig = px.histogram(data_frame=revenues,
                   y="Revenue",
                   nbins=5)
fig.show()

In [None]:
today = datetime.today().strftime('%A')

message_annotation = {
  'x': 0.5, 'y': 0.95, 'xref': 'paper', 'yref': 'paper',
  'text': f'Have a Happy {today} :)',
  'font': {'size': 20, 'color': 'white'},
  'bgcolor': 'rgb(237, 64, 200)', 'showarrow': False}

fig.update_layout({'annotations': [message_annotation]})
fig.show()
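With `xref` and `yref` set to `'paper'`, `x=0.5, y=0.95` means horizontally centered and near the top of the plotting area, whatever the histogram's data range is. The weekday comes from `strftime('%A')`, which gives the full weekday name in the current locale. The sketch below passes the date in as a parameter so the message can be tested. This is a hypothetical refactor, not the original cell.

```python
from datetime import date


def happy_message(day):
    """Return the greeting text for a given date."""
    # %A is the locale's full weekday name ('Monday' in the default C locale)
    return f"Have a Happy {day.strftime('%A')} :)"


def message_annotation(day):
    """Build the paper-positioned, arrowless annotation used in the cell above."""
    return {
        'x': 0.5, 'y': 0.95, 'xref': 'paper', 'yref': 'paper',
        'text': happy_message(day),
        'font': {'size': 20, 'color': 'white'},
        'bgcolor': 'rgb(237, 64, 200)',
        'showarrow': False,
    }


print(happy_message(date(2024, 1, 1)))  # Have a Happy Monday :)
```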

#**Analyzing basketball stats**
You have been contracted by a national basketball team to help them visualize and understand key player stats for their top 50 players.

They have asked you to create a plot comparing players' 'Field Goal Percentage' (FGP) vs. their 'Points Per Game' (PPG). This sounds like a great opportunity to utilize your scatterplot skills!

It is important that this graph is comparable to their other graphs. Therefore, all axes need to start at 0 and the y-axis (FGP) needs to have a range of 0-100, since it is a percentage.

You have available a bball_data DataFrame with columns FGP and PPG.

In [None]:
bball_data = pd.read_csv('bball_data.csv')
bball_data.head()

Unnamed: 0,Player,PPG,FGP
0,1,36.44,51.14
1,2,31.01,50.01
2,3,30.9,44.19
3,4,30.99,52.5
4,5,27.75,46.08


In [None]:
fig = px.scatter(
  data_frame=bball_data,
  x="PPG", 
  y="FGP",
  title='Field Goal Percentage vs. Points Per Game')
fig.show()

fig.update_layout({'xaxis': {'range': [0, bball_data['PPG'].max() + 5]}})
fig.show()

fig.update_layout({'yaxis': {'range' : [0, 100]}})
fig.show()
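Setting `range` explicitly fixes both ends of an axis. In the cell above, the x-axis upper bound is derived from the data (maximum PPG plus 5 points of headroom), while the y-axis is pinned to the full 0-100 percentage scale. The range logic can be sketched as a hypothetical helper. Plotly also offers `rangemode='tozero'` on an axis (for example `fig.update_xaxes(rangemode='tozero')`), which keeps autoscaling but forces the axis to include 0. That is an alternative, not what the exercise uses.

```python
def zero_based_range(values, padding=5):
    """Return [0, max(values) + padding] for use as a Plotly axis 'range'."""
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    return [0, max(values) + padding]


ppg_sample = [36.44, 31.01, 30.9, 30.99, 27.75]  # first rows of bball_data
x_range = zero_based_range(ppg_sample)
y_range = [0, 100]  # FGP is a percentage, so the full scale is fixed

layout_update = {'xaxis': {'range': x_range}, 'yaxis': {'range': y_range}}
# In the notebook: fig.update_layout(layout_update)
```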

#**Styling scientific research**
Now that you have mastered customizing your plots, it is time to let your creative energy flow!

In this exercise, you continue your work with the Antarctic research team, helping them explore and understand the penguin data they have collected.

They have asked you to help them understand how the flipper length differs between species. Time is short, so you think a quick plotly.express visualization would do. However, they also want some specific customizations for the axes' titles and a timestamp annotation when the plot is generated.

Your task is to build a quick bar chart using plotly.express, including the specified customizations.

In [None]:
penguins = pd.read_csv('penguins.csv')
penguins.head()

Unnamed: 0.1,Unnamed: 0,studyName,Sample Number,Species,Region,Island,Stage,Individual ID,Clutch Completion,Date Egg,Culmen Length (mm),Culmen Depth (mm),Flipper Length (mm),Body Mass (g),Sex,Delta 15 N (o/oo),Delta 13 C (o/oo),Comments
0,1,PAL0708,1,Adelie Penguin (Pygoscelis adeliae),Anvers,Torgersen,"Adult, 1 Egg Stage",N1A1,Yes,2007-11-11,39.1,18.7,181.0,3750.0,MALE,,,Not enough blood for isotopes.
1,2,PAL0708,2,Adelie Penguin (Pygoscelis adeliae),Anvers,Torgersen,"Adult, 1 Egg Stage",N1A2,Yes,2007-11-11,39.5,17.4,186.0,3800.0,FEMALE,8.94956,-24.69454,
2,3,PAL0708,3,Adelie Penguin (Pygoscelis adeliae),Anvers,Torgersen,"Adult, 1 Egg Stage",N2A1,Yes,2007-11-16,40.3,18.0,195.0,3250.0,FEMALE,8.36821,-25.33302,
3,4,PAL0708,4,Adelie Penguin (Pygoscelis adeliae),Anvers,Torgersen,"Adult, 1 Egg Stage",N2A2,Yes,2007-11-16,,,,,,,,Adult not sampled.
4,5,PAL0708,5,Adelie Penguin (Pygoscelis adeliae),Anvers,Torgersen,"Adult, 1 Egg Stage",N3A1,Yes,2007-11-16,36.7,19.3,193.0,3450.0,FEMALE,8.76651,-25.32426,


In [None]:
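# numeric_only=True skips text columns, which recent pandas versions would otherwise reject
penguins_agg = penguins.groupby(['Species']).mean(numeric_only=True)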
penguins_agg["Species"] = penguins_agg.index
penguins_agg.head()

Species,Unnamed: 0,Sample Number,Culmen Length (mm),Culmen Depth (mm),Flipper Length (mm),Body Mass (g),Delta 15 N (o/oo),Delta 13 C (o/oo),Species
Adelie Penguin (Pygoscelis adeliae),76.5,76.5,38.791391,18.346358,189.953642,3700.662252,8.859733,-25.804194,Adelie Penguin (Pygoscelis adeliae)
Chinstrap penguin (Pygoscelis antarctica),310.5,34.5,48.833824,18.420588,195.823529,3733.088235,9.356155,-24.546542,Chinstrap penguin (Pygoscelis antarctica)
Gentoo penguin (Pygoscelis papua),214.5,62.5,47.504878,14.982114,217.186992,5076.01626,8.245338,-26.185298,Gentoo penguin (Pygoscelis papua)
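`groupby(...).mean()` averages every numeric column within each species. By default it skips missing values, so the unsampled penguin (row 3, with all measurements empty) is simply left out of the averages. The stdlib-only sketch below applies the same idea to a single column. It uses the flipper lengths from the five rows shown by `penguins.head()`, with a shortened species label and `None` standing in for the missing value. The helper `group_mean` is hypothetical.

```python
from collections import defaultdict


def group_mean(rows, key, value):
    """Average row[value] per row[key], skipping None as pandas skips NaN.

    Groups with no values at all are omitted (pandas would report NaN).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        v = row[value]
        if v is None:  # missing measurement: leave it out of the average
            continue
        totals[row[key]] += v
        counts[row[key]] += 1
    return {k: totals[k] / counts[k] for k in counts}


sample = [
    {'Species': 'Adelie', 'Flipper Length (mm)': 181.0},
    {'Species': 'Adelie', 'Flipper Length (mm)': 186.0},
    {'Species': 'Adelie', 'Flipper Length (mm)': 195.0},
    {'Species': 'Adelie', 'Flipper Length (mm)': None},
    {'Species': 'Adelie', 'Flipper Length (mm)': 193.0},
]
print(group_mean(sample, 'Species', 'Flipper Length (mm)'))  # {'Adelie': 188.75}
```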


In [None]:
timestamp = datetime.now()

# Create plot
fig = px.bar(penguins_agg, x="Species", y="Flipper Length (mm)", color="Species", title='Flipper Length (mm) by Species')

# Change the axis titles
fig.update_layout({'xaxis': {'title': {'text': 'Species'}},
                  'yaxis': {'title': {'text': 'Average Flipper Length (mm)'}}})

# Add an annotation and show
fig.update_layout({'annotations': [{
  "text": f"This graph was generated at {timestamp}", 
  "showarrow": False, "x": 0.5, "y": 1.1, "xref": "paper", "yref": "paper"}]})
fig.show()
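When `datetime.now()` is interpolated directly into the f-string, Python uses `str()`. That output usually includes microseconds (for example `2024-03-05 14:07:09.123456`) and carries no time zone. If a tidier stamp is wanted, `strftime` can trim it. The format below is an assumption, not a requirement from the research team, and the helper name `timestamp_caption` is hypothetical.

```python
from datetime import datetime, timezone


def timestamp_caption(moment, fmt='%Y-%m-%d %H:%M:%S'):
    """Return the annotation text with a trimmed, readable timestamp."""
    return f"This graph was generated at {moment.strftime(fmt)}"


stamp = datetime(2024, 3, 5, 14, 7, 9, 123456)
print(str(stamp))                # 2024-03-05 14:07:09.123456
print(timestamp_caption(stamp))  # This graph was generated at 2024-03-05 14:07:09

# An explicit UTC time avoids ambiguity about the machine's local time zone
print(timestamp_caption(datetime.now(timezone.utc), '%Y-%m-%d %H:%M UTC'))
```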