![tstaunton.jpg](attachment:tstaunton.jpg)

***
# Matplotlib
***

Data science is about telling a story with the data you are working with, whether that is website visitor behavior or customer purchasing patterns. No matter what it is, your job as a data scientist is to use the data you have to tell a story, to understand it and to gain insights. Visualizations are one of the most powerful ways to communicate your story and insights.

**Matplotlib** is a Python plotting library used to create visualizations. In this lesson we're going to start by importing matplotlib, create some simple line, scatter and histogram plots, and finish by creating this chart:

![world_development_2007.png](attachment:world_development_2007.png)

Together we are going to walk step by step through creating this beautiful chart. If you would like more information on matplotlib go to https://matplotlib.org. For some background on this chart and its creator, see the [Wikipedia page on Hans Rosling](https://en.wikipedia.org/wiki/Hans_Rosling).

Let's get started.

In [None]:
# Import matplotlib with subpackage pyplot as plt
import matplotlib.pyplot as plt

# List 1 year01
year01 = [1950, 1970, 1990, 2010]

# List 2 pop01 (world population)
pop01 = [2.519, 3.692, 5.263, 6.972]

So far we have simply imported matplotlib and created two new lists, nothing new compared with previous lessons. To plot this data as a line chart we call **plt.plot** and pass the two lists as the x and y values.

In [None]:
# Create a line plot using year01 and pop01 as inputs
plt.plot(year01, pop01);

There are two very important things to point out at this stage. 

1. The examples in this lesson are created using Jupyter Notebook. When creating plots there is usually a **plt.show()** statement, placed after the plotting commands, which tells Python to render the actual plot. Jupyter Notebook displays plots without it, but it will be required when you run your code from other text editors or IDEs. The plot function specifies what to plot and how to plot it; show actually displays the plot.

2. Take another look at the line _plt.plot(year01, pop01);_. Notice anything out of place? If you picked up on the semi-colon, well done! Adding the semi-colon here is a choice I make to hide additional output that is produced when the plot is created. Go ahead and remove the semi-colon to see the difference. For the rest of this lesson you are going to see the semi-colon at the end of several lines of code. Don't worry about it too much.
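
To see what the semi-colon is hiding, here is a small sketch that captures the value returned by plt.plot() instead of discarding it. The variable name `lines` is my own choice for illustration; everything else reuses the lists from above. It also ends with plt.show(), which is how the same code would look when run as a script outside Jupyter Notebook.

```python
import matplotlib.pyplot as plt

year01 = [1950, 1970, 1990, 2010]
pop01 = [2.519, 3.692, 5.263, 6.972]

# plt.plot returns a list with one Line2D object per line drawn.
# In a notebook, this return value is the extra output the semi-colon hides.
lines = plt.plot(year01, pop01)
print(lines)

# Needed when running as a script; Jupyter Notebook renders without it
plt.show()
```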

Well done, you have just created your first plot. As you can see, Python has drawn a line between the data points of your lists. The list **year01** was the first argument in plt.plot() and was used to create the x-axis. Our second list, **pop01**, was the second argument and was used to create the y-axis.

## Scatter Plots
We can easily modify what we have done so far to create a scatter plot. Simply replace the word "plot" with "scatter" in the plot function. As you can see in the plot below, a scatter plot shows each data point without connecting them by a line, which makes the individual values easier to see. For many applications a scatter plot is a better choice than a line plot.

In [None]:
plt.scatter(year01, pop01);

The two plot examples above have been simple and were created using two small lists. Let's bite off something more challenging. Below are two lists, [gross domestic product](https://www.investopedia.com/ask/answers/what-is-gdp-why-its-important-to-economists-investors/) per capita and life expectancy. I got these figures from the [World Bank](https://data.worldbank.org/indicator) datasets, which are freely available online.

In a later lesson we will learn how to import data files directly into our programs, but for now lists will suffice. Make sure to go to my [GitHub](https://github.com/tstaunton/Learn-Data-Science-with-Python) page and copy the two lists. Don't try to type them in by hand.

In [None]:
gdp_cap = [974.5803384, 5937.029525999998, 6223.367465, 4797.231267, 12779.37964, 34435.367439999995, 36126.4927, 29796.04834,
 1391.253792, 33692.60508, 1441.284873, 3822.137084, 7446.298803, 12569.85177, 9065.800825, 10680.79282, 1217.032994,
 430.0706916, 1713.778686, 2042.09524, 36319.23501, 706.016537, 1704.063724, 13171.63885, 4959.114854, 7006.580419,
 986.1478792, 277.5518587, 3632.557798, 9645.06142, 1544.750112, 14619.222719999998, 8948.102923, 22833.30851,
 35278.41874, 2082.4815670000007, 6025.3747520000015, 6873.262326000001, 5581.180998, 5728.353514, 12154.08975,
 641.3695236000002, 690.8055759, 33207.0844, 30470.0167, 13206.48452, 752.7497265, 32170.37442, 1327.60891, 27538.41188,
 5186.050003, 942.6542111, 579.2317429999998, 1201.637154, 3548.3308460000007, 39724.97867, 18008.94444, 36180.78919,
 2452.210407, 3540.651564, 11605.71449, 4471.061906, 40675.99635, 25523.2771, 28569.7197, 7320.8802620000015, 31656.06806,
 4519.461171, 1463.249282, 1593.06548, 23348.139730000006, 47306.98978, 10461.05868, 1569.331442, 414.5073415, 12057.49928,
 1044.770126, 759.3499101, 12451.6558, 1042.581557, 1803.151496, 10956.99112, 11977.57496, 3095.7722710000007, 9253.896111,
 3820.17523, 823.6856205, 944.0, 4811.060429, 1091.359778, 36797.93332, 25185.00911, 2749.320965, 619.6768923999998,
 2013.977305, 49357.19017, 22316.19287, 2605.94758, 9809.185636, 4172.838464, 7408.905561, 3190.481016, 15389.924680000002,
 20509.64777, 19328.70901, 7670.122558, 10808.47561, 863.0884639000002, 1598.435089, 21654.83194, 1712.472136,
 9786.534714, 862.5407561000002, 47143.17964, 18678.31435, 25768.25759, 926.1410683, 9269.657808, 28821.0637, 3970.095407,
 2602.394995, 4513.480643, 33859.74835, 37506.41907, 4184.548089, 28718.27684, 1107.482182, 7458.396326999998, 882.9699437999999,
 18008.50924, 7092.923025, 8458.276384, 1056.380121, 33203.26128, 42951.65309, 10611.46299, 11415.80569, 2441.576404,
 3025.349798, 2280.769906, 1271.211593, 469.70929810000007]

In [None]:
life_exp = [43.828, 76.423, 72.301, 42.731, 75.32, 81.235, 79.829, 75.635, 64.062, 79.441, 56.728, 65.554, 74.852, 50.728,
 72.39, 73.005, 52.295, 49.58, 59.723, 50.43, 80.653, 44.7410001, 50.651, 78.553, 72.961, 72.889, 65.152, 46.462,
 55.322, 78.782, 48.328, 75.748, 78.273, 76.486, 78.332, 54.791, 72.235, 74.994, 71.33800000000002, 71.878,51.578999,
 58.04, 52.947, 79.313, 80.657, 56.735, 59.448, 79.406, 60.022, 79.483, 70.259, 56.007, 46.38800000000001, 60.916,
 70.19800000000001, 82.208, 73.33800000000002, 81.757, 64.69800000000001, 70.65, 70.964, 59.545, 78.885, 80.745, 80.546,
 72.567, 82.603, 72.535, 54.11, 67.297, 78.623, 77.58800000000002, 71.993, 42.592, 45.678, 73.952, 59.44300000000001,
 48.303, 74.241, 54.467, 64.164, 72.801, 76.195, 66.803, 74.543, 71.164, 42.082, 62.069, 52.90600000000001, 63.785,
 79.762, 80.204, 72.899, 56.867, 46.859, 80.196, 75.64, 65.483, 75.53699999999998, 71.752, 71.421, 71.688, 75.563,
 78.098, 78.74600000000002, 76.442, 72.476, 46.242, 65.528, 72.777, 63.062, 74.002, 42.56800000000001, 79.972,
 74.663, 77.926, 48.159, 49.339, 80.941, 72.396, 58.556, 39.613, 80.884, 81.70100000000002, 74.143, 78.4, 52.517,
 70.616, 58.42, 69.819, 73.923, 71.777, 51.542, 79.425, 78.242, 76.384, 73.747, 74.249, 73.422, 62.698, 42.38399999999999,
 43.487]

Using what we have learned so far, let's plot these two lists. I would like to put gdp_cap on the x-axis and life_exp on the y-axis. This means that they must be the first and second arguments, respectively, of the plt.plot() function.

In [None]:
# Create a line plot using gdp_cap and life_exp
plt.plot(gdp_cap, life_exp);

Ouch! That's a lot of data and not very readable. The line connects the points in the order they appear in the lists, so it zigzags across the chart. You would need to be a special kind of genius to pull any insights from that. Let's keep going. Here it is as a scatter plot.

In [None]:
# Create a scatter plot using gdp_cap and life_exp
plt.scatter(gdp_cap, life_exp)

# Put the x-axis on a logarithmic scale
plt.xscale('log')

You'll notice a new line of code used to create the above plot, plt.xscale('log'). Plotting the x-axis on a logarithmic scale helps us to view data that spans a very wide range of values more clearly. A logarithmic scale is a nonlinear scale based on orders of magnitude: the value represented by each equidistant mark on the scale is the value at the previous mark multiplied by a constant. You can find more information [at this link](https://en.wikipedia.org/wiki/Logarithmic_scale).
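
To make the idea of a multiplicative scale concrete, the sketch below takes three values that each differ by a factor of ten and computes their base-10 logarithms. The logarithms come out evenly spaced, which is why such values sit at equal distances on a log-scaled axis. The variable names are mine, chosen for illustration.

```python
import math

marks = [1000, 10000, 100000]

# Position of each mark on a base-10 logarithmic axis
positions = [math.log10(m) for m in marks]

# Distance between neighbouring marks
gaps = [b - a for a, b in zip(positions, positions[1:])]

print(positions)
print(gaps)
```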

Now the plot is starting to look in better shape.

From this early chart what insights can you glean about our data? Higher GDP seems to correspond to a higher life expectancy. In other words, there is a positive correlation.
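
One way to put a number on this is the correlation coefficient. The sketch below is only an illustration: it uses the first eight values from each list rather than the full data, and it takes the base-10 logarithm of GDP to match the log-scaled x-axis. np.corrcoef returns a matrix, and the off-diagonal entry is the correlation between the two inputs. The names `gdp_sample`, `life_sample` and `r` are my own.

```python
import numpy as np

# First eight values of gdp_cap and life_exp from the lists above
gdp_sample = [974.5803384, 5937.029525999998, 6223.367465, 4797.231267,
              12779.37964, 34435.367439999995, 36126.4927, 29796.04834]
life_sample = [43.828, 76.423, 72.301, 42.731, 75.32, 81.235, 79.829, 75.635]

# Correlation between log10(GDP per capita) and life expectancy
r = np.corrcoef(np.log10(gdp_sample), life_sample)[0, 1]
print(r)
```

A value above zero indicates a positive relationship in this small sample; running the same two lines on the full lists would give the figure for all countries.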

Below is a new list, pop03, which holds population data (in millions) to go with our existing GDP and life expectancy data.

In [None]:
pop03 = [31.889923, 3.600523, 33.333216, 12.420476, 40.301927, 20.434176, 8.199783, 0.708573, 150.448339, 10.392226,
 8.078314, 9.119152, 4.552198, 1.639131, 190.010647, 7.322858, 14.326203, 8.390505, 14.131858, 17.696293, 33.390141, 4.369038,
 10.238807, 16.284741, 1318.683096, 44.22755, 0.71096, 64.606759, 3.80061, 4.133884, 18.013409, 4.493312, 11.416987,
 10.228744, 5.46812, 0.496374, 9.319622, 13.75568, 80.264543, 6.939688, 0.551201, 4.906585, 76.511887, 5.23846,
 61.083916, 1.454867, 1.688359, 82.400996, 22.873338, 10.70629, 12.572928, 9.947814, 1.472041, 8.502814, 7.483763,
 6.980412, 9.956108, 0.301931, 1110.396331, 223.547, 69.45357, 27.499638, 4.109086, 6.426679, 58.147733, 2.780132,
 127.467972, 6.053193, 35.610177, 23.301725, 49.04479, 2.505559, 3.921278, 2.012649, 3.193942, 6.036914, 19.167654,
 13.327079, 24.821286, 12.031795, 3.270065, 1.250882, 108.700891, 2.874127, 0.684736, 33.757175, 19.951656,
 47.76198, 2.05508, 28.90179, 16.570613, 4.115771, 5.675356, 12.894865, 135.031164, 4.627926, 3.204897, 169.270617,
 3.242173, 6.667147, 28.674757, 91.077287, 38.518241, 10.642836, 3.942491, 0.798094, 22.276056, 8.860588, 0.199579,
 27.601038, 12.267493, 10.150265, 6.144562, 4.553009, 5.447502, 2.009245, 9.118773, 43.997828, 40.448191, 20.378239,
 42.292929, 1.133066, 9.031088, 7.554661, 19.314747, 23.174294, 38.13964, 65.068149, 5.701579, 1.056608, 10.276158,
 71.158647, 29.170398, 60.776238, 301.139947, 3.447496, 26.084662, 85.262356, 4.018332, 22.211743, 11.746035, 12.311143]

In [None]:
# Plot pop03 on the x-axis and life_exp on the y-axis
plt.scatter(pop03, life_exp);

## Histograms
Histograms are used to explore the distribution of data. Imagine 12 values between 0 and 6. In the image below I have placed them on a number line.

![number_line.png](attachment:number_line.png)

As you can see, the line is divided into equal chunks known as bins. Now let's reorganize our number line into three bins with a width of two.

![number_line_2.png](attachment:number_line_2.png)

How many data points sit in each bin? Four, six and two. If we were to draw a bar in each bin the height of the bar would represent the number of data points in each bin. 

![histogram.png](attachment:histogram.png)

With our histogram in place we can now see how the values are distributed. Most values are in the middle and there are more values below two than there are above four.

Now that we understand the basics of histograms it's time to use matplotlib to start creating them. As always, if we need some help we can call help(plt.hist).

In [None]:
help(plt.hist)

Look at the first two arguments. The first, x, is the list of values you want to build the histogram for. The second, bins, tells Python how many bins the data should be divided into. Based on this number, hist will automatically find appropriate boundaries for all bins and calculate how many values are in each one. If you don't specify the bins argument it will be ten by default.

Let's get working on a new example.

In [None]:
# Import package
import matplotlib.pyplot as plt

# Create a list of values
values = [0, 0.6, 1.4, 1.6, 2.2, 2.5, 2.6, 3.2, 3.5, 3.9, 4.2, 6]

# Call hist with our list as the argument x; bins = 3 divides the values into 3 bins
plt.hist(values, bins = 3);
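
To check the bin counts from the number line example, we can capture what plt.hist returns. This is a small sketch; the names `counts`, `edges` and `patches` are my own. plt.hist returns the number of values in each bin, the bin boundaries, and the bars that were drawn.

```python
import matplotlib.pyplot as plt

values = [0, 0.6, 1.4, 1.6, 2.2, 2.5, 2.6, 3.2, 3.5, 3.9, 4.2, 6]

# hist returns the bin counts, the bin edges and the bar objects
counts, edges, patches = plt.hist(values, bins=3)

print(counts)  # number of values in each of the 3 bins
print(edges)   # the 4 boundaries that separate the 3 bins
```

The counts match the four, six and two we counted by hand, with bin boundaries at 0, 2, 4 and 6.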

In [None]:
# Create a histogram to see how life expectancy is distributed across countries
plt.hist(life_exp);

In [None]:
# Plot life_exp with 5 bins
plt.hist(life_exp, bins=5);

In [None]:
# Plot life_exp with 20 bins
plt.hist(life_exp, bins = 20);

Too few bins will oversimplify reality and won't show you the details. Too many bins will overcomplicate reality and won't show you the bigger picture. As you can see, we get more insight with 20 bins.

Histograms make it very easy to compare data. Let's do a comparison now: life_exp contains life expectancy data for different countries in 2007. Below is a second list, life_exp1950, which contains similar data for 1950. Let's make a histogram containing both lists.

In [None]:
life_exp1950 = [28.8, 55.23, 43.08, 30.02, 62.48, 69.12, 66.8, 50.94, 37.48, 68.0, 38.22, 40.41,
 53.82, 47.62, 50.92, 59.6, 31.98, 39.03, 39.42, 38.52, 68.75, 35.46, 38.09, 54.74, 44.0,
 50.64, 40.72, 39.14, 42.11, 57.21, 40.48, 61.21, 59.42, 66.87, 70.78, 34.81, 45.93, 48.36,
 41.89, 45.26, 34.48, 35.93, 34.08, 66.55, 67.41, 37.0, 30.0, 67.5, 43.15, 65.86, 42.02, 33.61, 32.5, 37.58, 41.91,
 60.96, 64.03, 72.49, 37.37, 37.47, 44.87, 45.32, 66.91, 65.39, 65.94, 58.53, 63.03, 43.16, 42.27, 50.06, 47.45,
 55.56, 55.93, 42.14, 38.48, 42.72, 36.68, 36.26, 48.46, 33.68, 40.54, 50.99, 50.79, 42.24, 59.16, 42.87, 31.29,
 36.32, 41.72, 36.16, 72.13, 69.39, 42.31, 37.44, 36.32, 72.67, 37.58, 43.44, 55.19, 62.65, 43.9, 47.75, 61.31, 59.82,
 64.28, 52.72, 61.05, 40.0, 46.47, 39.88, 37.28, 58.0, 30.33, 60.4, 64.36, 65.57, 32.98, 45.01, 64.94, 57.59, 38.64,
 41.41, 71.86, 69.62, 45.88, 58.5, 41.22, 50.85, 38.6, 59.1, 44.6, 43.58, 39.98, 69.18, 68.44, 66.07, 55.09, 40.41,
 43.16, 32.55, 42.04, 48.45]

In [None]:
plt.hist(life_exp, bins = 15)
plt.hist(life_exp1950, bins = 15);
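
When two histograms are drawn on the same axes, the second can hide part of the first. One common way to handle this, sketched below, is to give both calls the same bin edges, make the bars partly transparent with alpha, and add a legend. To keep the example short it uses only the first eight values of each list; `shared_bins` and the sample names are my own. In the notebook you could pass life_exp and life_exp1950 instead.

```python
import matplotlib.pyplot as plt

# First eight values of life_exp and life_exp1950 from the lists above
sample_2007 = [43.828, 76.423, 72.301, 42.731, 75.32, 81.235, 79.829, 75.635]
sample_1950 = [28.8, 55.23, 43.08, 30.02, 62.48, 69.12, 66.8, 50.94]

# Bin edges every 5 years from 20 to 90, used for both histograms
shared_bins = list(range(20, 95, 5))

# alpha makes the bars see-through; label feeds the legend
counts_2007, edges_2007, _ = plt.hist(sample_2007, bins=shared_bins,
                                      alpha=0.6, label='2007')
counts_1950, edges_1950, _ = plt.hist(sample_1950, bins=shared_bins,
                                      alpha=0.6, label='1950')

plt.xlabel('Life Expectancy [in years]')
plt.legend()
```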

## Customizing Plots

One of the biggest challenges when dealing with large datasets is communicating your findings and allowing others to gain insights from your visualization work. Again, what you are trying to accomplish is to tell a story with data.

To help with this challenge matplotlib allows us to add customizations to our plots so that they are easier to understand. Let's look at an example to help demonstrate.

In [None]:
# Import correct package as plt 
import matplotlib.pyplot as plt

# Create list of years
year04 = [1950,1951,1952,1953,1954,1955,1956,1957,1958,1959,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015]

# Create list of corresponding populations
pop04 = [2.536, 2.583, 2.630, 2.677, 2.724, 2.772, 2.821, 2.871, 2.924, 2.977, 3.033, 3.090, 3.149, 3.210, 3.273, 3.339, 3.408, 3.479, 3.551, 3.625, 3.700, 3.775, 3.851, 3.927, 4.003, 4.079, 4.154, 4.229, 4.304, 4.380, 4.458, 4.537, 4.618, 4.701, 4.786, 4.873, 4.963, 5.055, 5.148, 5.240, 5.330, 5.418, 5.504, 5.588, 5.670, 5.751, 5.831, 5.910, 5.988, 6.066, 6.145, 6.223, 6.302, 6.381, 6.461, 6.542, 6.623, 6.706, 6.789, 6.873, 6.958, 7.043, 7.128, 7.213, 7.298, 7.383]

# Plot year04 on x-axis and pop04 on y-axis
plt.plot(year04, pop04);

At first glance this is actually not a bad looking plot. But if you were to study it a bit longer, would you know what data it is trying to communicate, what story it is trying to tell? Let's help this plot communicate its story by adding labels to the x and y axes. To do this we use the **xlabel** and **ylabel** functions.

In [None]:
# Import correct package as plt 
import matplotlib.pyplot as plt

# Create list of years
year05 = [1950,1951,1952,1953,1954,1955,1956,1957,1958,1959,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015]

# Create list of corresponding populations
pop05 = [2.536, 2.583, 2.630, 2.677, 2.724, 2.772, 2.821, 2.871, 2.924, 2.977, 3.033, 3.090, 3.149, 3.210, 3.273, 3.339, 3.408, 3.479, 3.551, 3.625, 3.700, 3.775, 3.851, 3.927, 4.003, 4.079, 4.154, 4.229, 4.304, 4.380, 4.458, 4.537, 4.618, 4.701, 4.786, 4.873, 4.963, 5.055, 5.148, 5.240, 5.330, 5.418, 5.504, 5.588, 5.670, 5.751, 5.831, 5.910, 5.988, 6.066, 6.145, 6.223, 6.302, 6.381, 6.461, 6.542, 6.623, 6.706, 6.789, 6.873, 6.958, 7.043, 7.128, 7.213, 7.298, 7.383]

# Plot year05 on x-axis and pop05 on y-axis
plt.plot(year05, pop05)

# Label x-axis
plt.xlabel('Year')

# Label y-axis
plt.ylabel('Population');

In [None]:
# Import correct package as plt 
import matplotlib.pyplot as plt

# Create list of years
year05 = [1950,1951,1952,1953,1954,1955,1956,1957,1958,1959,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015]

# Create list of corresponding populations
pop05 = [2.536, 2.583, 2.630, 2.677, 2.724, 2.772, 2.821, 2.871, 2.924, 2.977, 3.033, 3.090, 3.149, 3.210, 3.273, 3.339, 3.408, 3.479, 3.551, 3.625, 3.700, 3.775, 3.851, 3.927, 4.003, 4.079, 4.154, 4.229, 4.304, 4.380, 4.458, 4.537, 4.618, 4.701, 4.786, 4.873, 4.963, 5.055, 5.148, 5.240, 5.330, 5.418, 5.504, 5.588, 5.670, 5.751, 5.831, 5.910, 5.988, 6.066, 6.145, 6.223, 6.302, 6.381, 6.461, 6.542, 6.623, 6.706, 6.789, 6.873, 6.958, 7.043, 7.128, 7.213, 7.298, 7.383]

# Plot year05 on x-axis and pop05 on y-axis
plt.plot(year05, pop05)

# Label x-axis
plt.xlabel('Year')

# Label y-axis
plt.ylabel('Population');

# Add a title
plt.title('World Population Estimates');

At least now a reader knows what this plot is about. Next let's make the y-axis start from 0 for more context. We do this with the **yticks** function, which sets the positions of the tick marks on the y-axis.

In [None]:
# Import correct package as plt 
import matplotlib.pyplot as plt

# Create list of years
year05 = [1950,1951,1952,1953,1954,1955,1956,1957,1958,1959,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015]

# Create list of corresponding populations
pop05 = [2.536, 2.583, 2.630, 2.677, 2.724, 2.772, 2.821, 2.871, 2.924, 2.977, 3.033, 3.090, 3.149, 3.210, 3.273, 3.339, 3.408, 3.479, 3.551, 3.625, 3.700, 3.775, 3.851, 3.927, 4.003, 4.079, 4.154, 4.229, 4.304, 4.380, 4.458, 4.537, 4.618, 4.701, 4.786, 4.873, 4.963, 5.055, 5.148, 5.240, 5.330, 5.418, 5.504, 5.588, 5.670, 5.751, 5.831, 5.910, 5.988, 6.066, 6.145, 6.223, 6.302, 6.381, 6.461, 6.542, 6.623, 6.706, 6.789, 6.873, 6.958, 7.043, 7.128, 7.213, 7.298, 7.383]

# Plot year05 on x-axis and pop05 on y-axis
plt.plot(year05, pop05)

# Label x-axis
plt.xlabel('Year')

# Label y-axis
plt.ylabel('Population');

# Add a title
plt.title('World Population Estimates');

# Start y-axis at 0
plt.yticks([0,1,2,3,4,5,6,7,8]);
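
Setting the ticks is one way to bring 0 into view. Another option, shown here as a sketch using the short lists from the start of the lesson, is plt.ylim, which sets the limits of the y-axis directly and leaves the tick positions to matplotlib. The names `bottom` and `top` are my own.

```python
import matplotlib.pyplot as plt

year01 = [1950, 1970, 1990, 2010]
pop01 = [2.519, 3.692, 5.263, 6.972]

plt.plot(year01, pop01)

# Force the y-axis to start at 0; the upper limit is left unchanged
plt.ylim(bottom=0)

# Calling ylim with no arguments returns the current (bottom, top) limits
bottom, top = plt.ylim()
print(bottom, top)
```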

We can add a second argument to the yticks function, a list of labels to display at each tick position, which will further annotate our y-axis.

In [None]:
# Import correct package as plt 
import matplotlib.pyplot as plt

# Create list of years
year05 = [1950,1951,1952,1953,1954,1955,1956,1957,1958,1959,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015]

# Create list of corresponding populations
pop05 = [2.536, 2.583, 2.630, 2.677, 2.724, 2.772, 2.821, 2.871, 2.924, 2.977, 3.033, 3.090, 3.149, 3.210, 3.273, 3.339, 3.408, 3.479, 3.551, 3.625, 3.700, 3.775, 3.851, 3.927, 4.003, 4.079, 4.154, 4.229, 4.304, 4.380, 4.458, 4.537, 4.618, 4.701, 4.786, 4.873, 4.963, 5.055, 5.148, 5.240, 5.330, 5.418, 5.504, 5.588, 5.670, 5.751, 5.831, 5.910, 5.988, 6.066, 6.145, 6.223, 6.302, 6.381, 6.461, 6.542, 6.623, 6.706, 6.789, 6.873, 6.958, 7.043, 7.128, 7.213, 7.298, 7.383]

# Plot year05 on x-axis and pop05 on y-axis
plt.plot(year05, pop05)

# Label x-axis
plt.xlabel('Year')

# Label y-axis
plt.ylabel('Population');

# Add a title
plt.title('World Population Estimates');

# Start y-axis at 0 and add second argument
plt.yticks([0,1,2,3,4,5,6,7,8], ['0','1B','2B','3B','4B','5B','6B','7B','8B']);
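
Typing every label by hand gets tedious as the number of ticks grows. As a sketch, the same two lists can be built programmatically; `tick_positions` and `tick_labels` are names I made up for this example.

```python
import matplotlib.pyplot as plt

# Tick positions 0 through 8
tick_positions = list(range(9))

# '0' for the first tick, then '1B' through '8B' for the rest
tick_labels = ['0'] + [str(n) + 'B' for n in tick_positions[1:]]

plt.yticks(tick_positions, tick_labels)
print(tick_labels)
```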

I found some additional population data on Wikipedia. Let's add it to our current data set. If you are unfamiliar with the code below, take a look at [Part 1 of this series which covers Python Lists](https://skl.sh/2XCsI0M).

In [None]:
year06 = [1800, 1850, 1900] + year05
pop06 = [1, 1.262, 1.659] + pop05
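
As a quick reminder of what the + operator does with lists, here is a sketch using the short year list from the start of the lesson (the names `early_years` and `combined` are illustrative). The result is a new, longer list with the first list's items placed in front.

```python
early_years = [1800, 1850, 1900]
year01 = [1950, 1970, 1990, 2010]

# + joins two lists end to end; it does not add the numbers together
combined = early_years + year01

print(combined)
print(len(combined))
```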

In [None]:
# Import correct package as plt 
import matplotlib.pyplot as plt

# Plot year06 on x-axis and pop06 on y-axis
plt.plot(year06, pop06)

# Label x-axis
plt.xlabel('Year')

# Label y-axis
plt.ylabel('Population');

# Add a title
plt.title('World Population Estimates');

# Start y-axis at 0 and add second argument
plt.yticks([0,1,2,3,4,5,6,7,8], ['0','1B','2B','3B','4B','5B','6B','7B','8B']);

Let's go back to some data we were using at the start of the lesson.

In [None]:
gdp_cap = [974.5803384, 5937.029525999998, 6223.367465, 4797.231267, 12779.37964, 34435.367439999995, 36126.4927, 29796.04834,
 1391.253792, 33692.60508, 1441.284873, 3822.137084, 7446.298803, 12569.85177, 9065.800825, 10680.79282, 1217.032994,
 430.0706916, 1713.778686, 2042.09524, 36319.23501, 706.016537, 1704.063724, 13171.63885, 4959.114854, 7006.580419,
 986.1478792, 277.5518587, 3632.557798, 9645.06142, 1544.750112, 14619.222719999998, 8948.102923, 22833.30851,
 35278.41874, 2082.4815670000007, 6025.3747520000015, 6873.262326000001, 5581.180998, 5728.353514, 12154.08975,
 641.3695236000002, 690.8055759, 33207.0844, 30470.0167, 13206.48452, 752.7497265, 32170.37442, 1327.60891, 27538.41188,
 5186.050003, 942.6542111, 579.2317429999998, 1201.637154, 3548.3308460000007, 39724.97867, 18008.94444, 36180.78919,
 2452.210407, 3540.651564, 11605.71449, 4471.061906, 40675.99635, 25523.2771, 28569.7197, 7320.8802620000015, 31656.06806,
 4519.461171, 1463.249282, 1593.06548, 23348.139730000006, 47306.98978, 10461.05868, 1569.331442, 414.5073415, 12057.49928,
 1044.770126, 759.3499101, 12451.6558, 1042.581557, 1803.151496, 10956.99112, 11977.57496, 3095.7722710000007, 9253.896111,
 3820.17523, 823.6856205, 944.0, 4811.060429, 1091.359778, 36797.93332, 25185.00911, 2749.320965, 619.6768923999998,
 2013.977305, 49357.19017, 22316.19287, 2605.94758, 9809.185636, 4172.838464, 7408.905561, 3190.481016, 15389.924680000002,
 20509.64777, 19328.70901, 7670.122558, 10808.47561, 863.0884639000002, 1598.435089, 21654.83194, 1712.472136,
 9786.534714, 862.5407561000002, 47143.17964, 18678.31435, 25768.25759, 926.1410683, 9269.657808, 28821.0637, 3970.095407,
 2602.394995, 4513.480643, 33859.74835, 37506.41907, 4184.548089, 28718.27684, 1107.482182, 7458.396326999998, 882.9699437999999,
 18008.50924, 7092.923025, 8458.276384, 1056.380121, 33203.26128, 42951.65309, 10611.46299, 11415.80569, 2441.576404,
 3025.349798, 2280.769906, 1271.211593, 469.70929810000007]

life_exp = [43.828, 76.423, 72.301, 42.731, 75.32, 81.235, 79.829, 75.635, 64.062, 79.441, 56.728, 65.554, 74.852, 50.728,
 72.39, 73.005, 52.295, 49.58, 59.723, 50.43, 80.653, 44.7410001, 50.651, 78.553, 72.961, 72.889, 65.152, 46.462,
 55.322, 78.782, 48.328, 75.748, 78.273, 76.486, 78.332, 54.791, 72.235, 74.994, 71.33800000000002, 71.878,51.578999,
 58.04, 52.947, 79.313, 80.657, 56.735, 59.448, 79.406, 60.022, 79.483, 70.259, 56.007, 46.38800000000001, 60.916,
 70.19800000000001, 82.208, 73.33800000000002, 81.757, 64.69800000000001, 70.65, 70.964, 59.545, 78.885, 80.745, 80.546,
 72.567, 82.603, 72.535, 54.11, 67.297, 78.623, 77.58800000000002, 71.993, 42.592, 45.678, 73.952, 59.44300000000001,
 48.303, 74.241, 54.467, 64.164, 72.801, 76.195, 66.803, 74.543, 71.164, 42.082, 62.069, 52.90600000000001, 63.785,
 79.762, 80.204, 72.899, 56.867, 46.859, 80.196, 75.64, 65.483, 75.53699999999998, 71.752, 71.421, 71.688, 75.563,
 78.098, 78.74600000000002, 76.442, 72.476, 46.242, 65.528, 72.777, 63.062, 74.002, 42.56800000000001, 79.972,
 74.663, 77.926, 48.159, 49.339, 80.941, 72.396, 58.556, 39.613, 80.884, 81.70100000000002, 74.143, 78.4, 52.517,
 70.616, 58.42, 69.819, 73.923, 71.777, 51.542, 79.425, 78.242, 76.384, 73.747, 74.249, 73.422, 62.698, 42.38399999999999,
 43.487]

In [None]:
# Basic scatter plot, log scale
plt.scatter(gdp_cap, life_exp)
plt.xscale('log') 

# Create string variables to represent the x-axis, y-axis and title labels
xlab = 'GDP per Capita [in USD]'
ylab = 'Life Expectancy [in years]'
title = 'World Development in 2007'

# Add axis labels
plt.xlabel(xlab)
plt.ylabel(ylab)

# Add title
plt.title(title);

In [None]:
# Scatter plot
plt.scatter(gdp_cap, life_exp)

# Previous customizations
plt.xscale('log') 
plt.xlabel('GDP per Capita [in USD]')
plt.ylabel('Life Expectancy [in years]')
plt.title('World Development in 2007')

# Definition of tick_val and tick_lab
tick_val = [1000, 10000, 100000]
tick_lab = ['1k', '10k', '100k']

# Adapt the ticks on the x-axis
plt.xticks(tick_val, tick_lab)

# After customizing, display the plot
plt.show()

In [None]:
pop07 = [31.889923, 3.600523, 33.333216, 12.420476, 40.301927, 20.434176, 8.199783, 0.708573, 150.448339, 10.392226,
 8.078314, 9.119152, 4.552198, 1.639131, 190.010647, 7.322858, 14.326203, 8.390505, 14.131858, 17.696293, 33.390141,
 4.369038, 10.238807, 16.284741, 1318.683096, 44.22755, 0.71096, 64.606759, 3.80061, 4.133884, 18.013409, 4.493312,
 11.416987, 10.228744, 5.46812, 0.496374, 9.319622, 13.75568, 80.264543, 6.939688, 0.551201, 4.906585, 76.511887,
 5.23846, 61.083916, 1.454867, 1.688359, 82.400996, 22.873338, 10.70629, 12.572928, 9.947814, 1.472041, 8.502814,
 7.483763, 6.980412, 9.956108, 0.301931, 1110.396331, 223.547, 69.45357, 27.499638, 4.109086, 6.426679, 58.147733,
 2.780132, 127.467972, 6.053193, 35.610177, 23.301725, 49.04479, 2.505559, 3.921278, 2.012649, 3.193942, 6.036914,
 19.167654, 13.327079, 24.821286, 12.031795, 3.270065, 1.250882, 108.700891, 2.874127, 0.684736, 33.757175, 19.951656,
 47.76198, 2.05508, 28.90179, 16.570613, 4.115771, 5.675356, 12.894865, 135.031164, 4.627926, 3.204897, 169.270617,
 3.242173, 6.667147, 28.674757, 91.077287, 38.518241, 10.642836, 3.942491, 0.798094, 22.276056, 8.860588, 0.199579,
 27.601038, 12.267493, 10.150265, 6.144562, 4.553009, 5.447502, 2.009245, 9.118773, 43.997828, 40.448191, 20.378239,
 42.292929, 1.133066, 9.031088, 7.554661, 19.314747, 23.174294, 38.13964, 65.068149, 5.701579, 1.056608, 10.276158,
 71.158647, 29.170398, 60.776238, 301.139947, 3.447496, 26.084662, 85.262356, 4.018332, 22.211743, 11.746035,
 12.311143]

In [None]:
# Import numpy as np
import numpy as np

# Store pop07 as a numpy array: np_pop
np_pop = np.array(pop07)

# Double np_pop
np_pop = np_pop * 2

# Update: set s argument to np_pop
plt.scatter(gdp_cap, life_exp, s = np_pop)

# Previous customizations
plt.xscale('log') 
plt.xlabel('GDP per Capita [in USD]')
plt.ylabel('Life Expectancy [in years]')
plt.title('World Development in 2007')
plt.xticks([1000, 10000, 100000],['1k', '10k', '100k']);
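
Why convert pop07 to a NumPy array before doubling it? Multiplying a plain list by 2 repeats the list, while multiplying a NumPy array by 2 doubles each value. The sketch below shows the difference on the first three values of pop07; the variable names are mine.

```python
import numpy as np

# First three values of pop07
pop_sample = [31.889923, 3.600523, 33.333216]

# A list times 2 is the list repeated twice (6 items)
repeated = pop_sample * 2

# A NumPy array times 2 doubles every element (still 3 items)
doubled = np.array(pop_sample) * 2

print(len(repeated))
print(doubled)
```

In plt.scatter the s argument sets the marker size, so the doubled array supplies one size per country.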

In [None]:
col = ['red', 'green', 'blue', 'blue', 'yellow', 'black', 'green', 'red', 'red', 'green', 'blue', 'yellow', 'green',
 'blue', 'yellow', 'green', 'blue', 'blue', 'red', 'blue', 'yellow', 'blue', 'blue', 'yellow', 'red', 'yellow', 'blue',
 'blue', 'blue', 'yellow', 'blue', 'green', 'yellow', 'green', 'green', 'blue', 'yellow', 'yellow', 'blue', 'yellow',
 'blue', 'blue', 'blue', 'green', 'green', 'blue', 'blue', 'green', 'blue', 'green', 'yellow', 'blue', 'blue', 'yellow',
 'yellow', 'red', 'green', 'green', 'red', 'red', 'red', 'red', 'green', 'red', 'green', 'yellow', 'red', 'red', 'blue',
 'red', 'red', 'red', 'red', 'blue', 'blue', 'blue', 'blue', 'blue', 'red', 'blue', 'blue', 'blue', 'yellow', 'red',
 'green', 'blue', 'blue', 'red', 'blue', 'red', 'green', 'black', 'yellow', 'blue', 'blue', 'green', 'red', 'red',
 'yellow', 'yellow', 'yellow', 'red', 'green', 'green', 'yellow', 'blue', 'green', 'blue', 'blue', 'red', 'blue', 'green',
 'blue', 'red', 'green', 'green', 'blue', 'blue', 'green', 'red', 'blue', 'blue', 'green', 'green', 'red', 'red', 'blue',
 'red', 'blue', 'yellow', 'blue', 'green', 'blue', 'green', 'yellow', 'yellow', 'yellow', 'red', 'red', 'red', 'blue',
 'blue']

In [None]:
# Specify c and alpha inside plt.scatter()
plt.scatter(x = gdp_cap, y = life_exp, s = np.array(pop07) * 2, c = col, alpha = 0.8)

# Previous customizations
plt.xscale('log') 
plt.xlabel('GDP per Capita [in USD]')
plt.ylabel('Life Expectancy [in years]')
plt.title('World Development in 2007')
plt.xticks([1000,10000,100000], ['1k','10k','100k'])

# Show the plot
plt.show()

In [None]:
# Scatter plot
plt.scatter(x = gdp_cap, y = life_exp, 
            s = np.array(pop07) * 2, c = col, alpha = 0.8)

# Previous customizations
plt.xscale('log') 
plt.xlabel('GDP per Capita [in USD]')
plt.ylabel('Life Expectancy [in years]')
plt.title('World Development in 2007')
plt.xticks([1000,10000,100000], ['1k','10k','100k'])

# Additional customizations
plt.text(1550, 71, 'India')
plt.text(5700, 80, 'China')

# Add grid() call
plt.grid(True)

# Show the plot
plt.show()

For more information on this and other charts created by Prof. Hans Rosling, you can search for him online or check out his Wikipedia page [here](https://en.wikipedia.org/wiki/Hans_Rosling).