In [26]:
import plotly.graph_objects as go
import pandas as pd
import numpy as np
import plotly.express as px

In [7]:
monthly_sales = {'data': [{'type': '', 'x': ['Jan', 'Feb', 'March'], 'y': [450, 475, 400]}], 'layout': {'title': {'text': ''}}}

#**Fixing a Plotly figure**
Your colleague started a project to create a visualization of sales for the first three months of 2020. However, she has since gone on annual leave, and the boss wants the visualization now!

She left behind a dictionary object that begins to define the visualization. Your task is to fill in the missing values for its key arguments so it can be turned into a Plotly visualization.

In the exercises where it is needed throughout the course, plotly.graph_objects has already been loaded as go.

A partially completed monthly_sales dictionary is also available.

In [8]:
monthly_sales['data'][0]['type'] = 'bar'
monthly_sales['layout']['title']['text'] = 'Sales for Jan-Mar 2020'

In [9]:
fig = go.Figure(monthly_sales)

In [12]:
fig.show()

#**Student scores bar graph**
The school board has asked you to come and look at some test scores. They want an easy way to visualize the scores of different students within a small class. This seems like a simple use case to practice your bar chart skills!

In this exercise, you will help the school board team by creating a bar chart of school test score values.

A DataFrame student_scores has been provided. 

In [63]:
student_scores = pd.read_csv('student_scores.csv')

In [64]:
student_scores

Unnamed: 0,student_name,score
0,John,80
1,Julia,85
2,Xuan,90
3,Harry,97


In [65]:
# Refer to columns by name; px looks them up in data_frame
fig = px.bar(data_frame=student_scores,
             title='Student Scores by Student',
             y='score',
             x='student_name')
fig.show(renderer="colab")

#**Box plot of company revenues**
You have been contracted by a New York Stock Exchange firm that is interested in improving its data visualization capabilities.

They are cautious about this new technology, so they have tasked you with something simple first: displaying the distribution of the revenues of top companies in the USA. They are particularly interested in what kind of revenue puts you in the 'top bracket' of companies.

They also want to know if there are any outliers and how they can explore this in the plot. This sounds like a perfect opportunity for a box plot.

In this exercise, you will help the investment team by creating a box plot of the revenue of top US companies.
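
As a rough guide to what the box plot will show, the sketch below computes the quartiles and flags outliers using the common 1.5 × IQR rule. The numbers are hypothetical (they are not taken from revenue_data.csv), the helper name `tukey_outliers` is made up for this example, and Plotly's own quartile method may give slightly different values.

```python
import statistics

def tukey_outliers(values):
    """Return (q1, q3, outliers) using the 1.5 * IQR rule."""
    # method='inclusive' treats the data as the full population of interest
    q1, _median, q3 = statistics.quantiles(values, n=4, method='inclusive')
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return q1, q3, outliers

# Hypothetical revenues with one value far above the rest
sample_revenues = [50, 52, 55, 58, 60, 62, 65, 70, 200]
q1, q3, outliers = tukey_outliers(sample_revenues)
print(q1, q3, outliers)
```

Values above the third quartile fall in the top 25% of companies, which is one way to read the 'top bracket'; points beyond the fences are the ones a box plot typically draws individually.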

In [66]:
revenues = pd.read_csv('revenue_data.csv')
revenues.head()

Unnamed: 0,Rank,Company,Revenue
0,1,Walmart,523964.0
1,2,Sinopec Group,407009.0
2,3,State Grid,383906.0
3,4,China National Petroleum,379130.0
4,5,Royal Dutch Shell,352106.0


In [67]:
revenues.tail()

Unnamed: 0,Rank,Company,Revenue
195,196,Auchan Holding,54672.0
196,197,Tencent Holdings,54613.0
197,198,Nippon Steel Corporation,54465.0
198,199,CNP Assurances,54365.0
199,200,Energy Transfer,


In [68]:
fig = px.box(data_frame=revenues,
             y="Revenue",
             hover_data=["Company"])
fig.show()

#**Histogram of company revenues**
The New York Stock Exchange firm loved your previous box plot and wants you to do more work for them.

The box plot was a perfect visualization to help them understand the outliers and quartile-related attributes of their company revenue dataset.

However, they want to understand a bit more about the distribution of the data. Are there many companies with smaller revenue, or larger revenue? Is it somewhat bell-shaped or skewed towards higher or lower revenues?

In this exercise, you will help the investment team by creating a histogram of the revenue of top US companies.
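
To make the idea of binning concrete, here is a minimal sketch that counts values in five equal-width bins by hand. The data are hypothetical, the helper name `bin_counts` is invented for this example, and it assumes the values are not all identical. Plotly may choose rounder bin edges, so its counts can differ from this simple version.

```python
def bin_counts(values, nbins):
    """Count values in nbins equal-width bins spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / nbins  # assumes hi > lo
    counts = [0] * nbins
    for v in values:
        # The maximum value would land at index nbins, so clamp it into the last bin
        idx = min(int((v - lo) / width), nbins - 1)
        counts[idx] += 1
    return counts

# Hypothetical revenues: most are small, one is very large
sample_revenues = [50, 52, 55, 58, 60, 62, 65, 70, 200]
counts = bin_counts(sample_revenues, nbins=5)
print(counts)
```

Most values pile up in the first bin while a single large value sits alone in the last, which is the right-skewed pattern a revenue histogram can reveal.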

In [69]:
# Put Revenue on the x-axis so the bars show how many companies fall in each revenue bin
fig = px.histogram(data_frame=revenues,
                   x="Revenue",
                   nbins=5)
fig.show()

#**Coloring student scores bar graph**
The previous plot that you created was well received by the school board, but they are wondering if there is a way for you to visually identify good and bad performers.

This would be a great opportunity to use color, specifically a color scale. You think a scale from red (worst marks) to green (best marks) would work well.

Part of your previous code to create the student scores bar chart has been provided.
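
Conceptually, a two-color continuous scale maps each score to a blend of the two end colors. The sketch below illustrates that idea with plain linear interpolation; `interpolate_rgb` is a hypothetical helper, and Plotly's internal blending may not match it exactly. The scores are the four values from student_scores.

```python
def interpolate_rgb(start, end, t):
    """Linearly blend two (r, g, b) tuples; t=0 gives start, t=1 gives end."""
    r, g, b = (round(s + (e - s) * t) for s, e in zip(start, end))
    return f'rgb({r},{g},{b})'

red, green = (255, 0, 0), (3, 252, 40)
scores = [80, 85, 90, 97]
lo, hi = min(scores), max(scores)

for score in scores:
    # Position of this score between the lowest (0.0) and highest (1.0) score
    t = (score - lo) / (hi - lo)
    print(score, interpolate_rgb(red, green, t))
```

The lowest score maps to pure red, the highest to the green end color, and everything else falls in between.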

In [70]:
my_scale = ['rgb(255,0,0)', 'rgb(3,252,40)']

fig = px.bar(data_frame=student_scores, 
             x='student_name', y='score', title='Student Scores by Student',
             color='score',
             color_continuous_scale=my_scale
             )
fig.show(renderer="colab")

#**Side-by-side revenue box plots with color**
The New York Stock Exchange firm you worked for previously has contracted you to extend your work on the box plot of company revenues.

They want to understand how different industries compare using the same visualization technique as before. They are also particular about which colors are used for which industries, and have prepared the list of industries and colors below.

Your task is to create a box plot of company revenues, as before, but include the specified colors based on the list of industries given below.

There is a revenues DataFrame already loaded for your use.

*Industry-color RGB definitions:*

Tech = 124, 250, 120

Oil = 112, 128, 144

Pharmaceuticals = 137, 109, 247

Professional Services = 255, 0, 0
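
One way to turn these definitions into the string form that a discrete color map expects is sketched below. The intermediate `industry_rgb` dictionary is an assumption introduced for this example; only the RGB values come from the list above.

```python
# RGB triples copied from the industry-color definitions
industry_rgb = {
    'Tech': (124, 250, 120),
    'Oil': (112, 128, 144),
    'Pharmaceuticals': (137, 109, 247),
    'Professional Services': (255, 0, 0),
}

# Build 'rgb(r,g,b)' strings keyed by industry name
ind_color_map = {
    industry: 'rgb({},{},{})'.format(*rgb)
    for industry, rgb in industry_rgb.items()
}
print(ind_color_map)
```

The keys must match the values in the Industry column exactly, including capitalization.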

In [72]:
revenue2 = pd.read_csv('revenue_data2.csv')
revenue2.head()

Unnamed: 0,Rank,Company,Revenue,employees,Industry,age
0,1,Walmart,523964,2300000,Tech,44
1,2,Sinopec Group,407009,71200,Tech,56
2,3,State Grid,383906,377000,Oil,21
3,4,China National Petroleum,379130,123000,Tech,33
4,5,Royal Dutch Shell,352106,260000,Tech,70


In [73]:
ind_color_map = {'Tech': 'rgb(124,250,120)', 'Oil': 'rgb(112,128,144)',
                 'Pharmaceuticals': 'rgb(137,109,247)', 'Professional Services': 'rgb(255,0,0)'}
fig = px.box(data_frame=revenue2, y="Revenue",
             color_discrete_map=ind_color_map,
             color='Industry')

fig.show()

#**Revenue histogram with stacked bars**
The New York Stock Exchange firm thought your previous histogram provided great insight into how the revenue of the firms they are looking at is distributed.

However, like before, they are interested in learning a bit more about how the industry of the firms could shed more light on what is happening.

Your task is to re-create the histogram of company revenues, as before, but color the bars by industry using the industry colors specified earlier.

In [87]:
revenue2.tail()

Unnamed: 0,Rank,Company,Revenue,employees,Industry,age
195,196,Auchan Holding,54672,49000,Unknown,74
196,197,Tencent Holdings,54613,14715,Unknown,78
197,198,Nippon Steel Corporation,54465,57750,Unknown,9
198,199,CNP Assurances,54365,21900,Unknown,65
199,200,Energy Transfer,Unknown,215000,Unknown,79


In [92]:
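# Drop rows whose Revenue is the placeholder string 'Unknown' so the column can be converted to numbers
revenue2 = revenue2.drop(revenue2[revenue2["Revenue"] == 'Unknown'].index)
revenue2.tail()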

Unnamed: 0,Rank,Company,Revenue,employees,Industry,age
194,195,Vinci,54788,260000,Unknown,71
195,196,Auchan Holding,54672,49000,Unknown,74
196,197,Tencent Holdings,54613,14715,Unknown,78
197,198,Nippon Steel Corporation,54465,57750,Unknown,9
198,199,CNP Assurances,54365,21900,Unknown,65


In [96]:
revenue2["Revenue"] = pd.to_numeric(revenue2["Revenue"])
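
An alternative to dropping the 'Unknown' rows first is to let pandas coerce unparseable values to NaN and then drop them. This is a minimal sketch on a small hypothetical frame (three rows modeled on the tail of revenue2), not a replacement for the cells above.

```python
import pandas as pd

# Small stand-in for revenue2: Revenue is read as strings because of 'Unknown'
df = pd.DataFrame({
    'Company': ['Nippon Steel Corporation', 'CNP Assurances', 'Energy Transfer'],
    'Revenue': ['54465', '54365', 'Unknown'],
})

# errors='coerce' turns anything that is not a number into NaN
cleaned = df.assign(Revenue=pd.to_numeric(df['Revenue'], errors='coerce'))
# Remove the rows where Revenue could not be parsed
cleaned = cleaned.dropna(subset=['Revenue'])
print(cleaned)
```

This handles any non-numeric placeholder, not only the exact string 'Unknown', at the cost of silently discarding values that a typo made unparseable.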

In [97]:
ind_color_map = {'Tech': 'rgb(124,250,120)', 'Oil': 'rgb(112,128,144)',
                 'Pharmaceuticals': 'rgb(137,109,247)',
                 'Professional Services': 'rgb(255,0,0)'}
fig = px.histogram(data_frame=revenue2, x="Revenue", nbins=5,
                   color_discrete_map=ind_color_map,
                   color="Industry")
fig.show()