## Data Analysis with Pandas

- Pandas is a widely used, open-source Python library for data manipulation and analysis.
- Pandas is built on top of NumPy and provides an efficient implementation of a `DataFrame`.
- The `DataFrame` is essentially a multidimensional array with attached row and column labels.
- A `DataFrame` can hold heterogeneous data types (floats, integers, strings, dates, etc.) and may be hierarchically structured and indexed.
- It provides a large number of powerful functions for cleaning, transforming and aggregating data efficiently.
- The pandas documentation can be found here: https://pandas.pydata.org/
- Pandas addresses several limitations of NumPy: it attaches labels to data, handles missing data, and supports operations that do not map well onto elementwise broadcasting, such as grouping.
- The key data structures in Pandas are `Series` and `DataFrame`, representing a one-dimensional sequence of values and a data table, respectively.
- In this tutorial, we discuss the following concepts:
    - Pandas Objects: `Series`, `DataFrame`, and `Index`
    - Reading and Writing DataFrames
    - Advanced Indexing
    - Data Cleaning - Missing data points, Binding, Outliers
    - Data Grouping and Aggregation
    - Combining Datasets
    - Introduction to Time Series Data
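
To make the comparison with NumPy concrete, here is a minimal sketch using made-up values. It shows that arithmetic on two `Series` is aligned by label and produces `NaN` where a label is missing from one operand, whereas the underlying NumPy arrays are combined purely by position. The variable names are illustrative only.

```python
import pandas as pd

# Two sets of made-up measurements that share only some labels.
a = pd.Series({'x': 1.0, 'y': 2.0, 'z': 3.0})
b = pd.Series({'y': 10.0, 'z': 20.0, 'w': 30.0})

# pandas aligns the operands on their labels; a label present in only
# one operand yields NaN (missing data) in the result.
total = a + b

# The underlying NumPy arrays are simply added position by position,
# regardless of what each position was meant to represent.
positional = a.to_numpy() + b.to_numpy()

print(total)
print(positional)
```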

### Series
- A `Series` is a one-dimensional array similar to a NumPy array.
- Each element in a pandas `Series` is associated with an index.
- Explicit indexing of the entries can be achieved by passing a sequence as the `index` argument or by creating the `Series` from a dictionary.
- A `Series` can be sorted, either by its index or by its values, using `Series.sort_index` and `Series.sort_values`.

In [1]:
import pandas as pd

In [2]:
#create a pandas series
river_length = pd.Series([6300,6650,6275,6400])
river_length

0    6300
1    6650
2    6275
3    6400
dtype: int64

In [3]:
# name the Series and set its data type
river_length = pd.Series([6300,6650,6275,6400],name='River Length /km', dtype=float)
river_length

0    6300.0
1    6650.0
2    6275.0
3    6400.0
Name: River Length /km, dtype: float64

In [4]:
# accessing data from the series
river_length[0]

6300.0

In [5]:
#index range
river_length.index

RangeIndex(start=0, stop=4, step=1)

In [6]:
#customize the indexing for series
river_length = pd.Series(data =[6300, 6650, 6275, 6400], 
                         index = ['Yangtze','Nile','Mississippi','Amazon'], name = 'River Length /km')
river_length

Yangtze        6300
Nile           6650
Mississippi    6275
Amazon         6400
Name: River Length /km, dtype: int64

In [7]:
# accessing the data with the customized index
river_length['Nile']

6650

In [8]:
#creating a series using dictionary
river_length = pd.Series(data = {'Yangtze':6300,'Nile':6650, 'Mississippi':6275, 'Amazon': 6400}, name = 'Length /km')
river_length

Yangtze        6300
Nile           6650
Mississippi    6275
Amazon         6400
Name: Length /km, dtype: int64

In [9]:
river_length['Nile']

6650

In [10]:
# positional access still works via iloc (plain integer keys on a labeled Series are deprecated in recent pandas)
river_length.iloc[1]

6650

In [44]:
# select several entries by passing a list of labels
river_length[['Nile','Amazon']]

Nile      6650
Amazon    6400
Name: Length /km, dtype: int64

In [47]:
# integer slices are positional: start at position 1 and step backwards
river_length[1::-1]

Nile       6650
Yangtze    6300
Name: Length /km, dtype: int64

In [50]:
# label slices include both end points
river_length['Nile':'Amazon']

Nile           6650
Mississippi    6275
Amazon         6400
Name: Length /km, dtype: int64

In [14]:
# Index labels that are valid Python identifiers can also be accessed as attributes
river_length.Mississippi

6275

In [15]:
# numerical operations on Series data
KM_to_Miles = 0.621371
river_length *= KM_to_Miles
river_length.name = 'Length /miles'
river_length

Yangtze        3914.637300
Nile           4132.117150
Mississippi    3899.103025
Amazon         3976.774400
Name: Length /miles, dtype: float64

In [55]:
# comparison operators act elementwise and return a boolean Series
river_length > 3900

Yangtze         True
Nile            True
Mississippi    False
Amazon          True
Name: Length /miles, dtype: bool

In [17]:
river_length[river_length<= 4000]

Yangtze        3914.637300
Mississippi    3899.103025
Amazon         3976.774400
Name: Length /miles, dtype: float64

In [56]:
# Tests for membership for a series examine the index, not the values
'Nile' in river_length

True
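
Since `in` looks only at the index, testing whether a value occurs in a `Series` needs a different approach. The sketch below, on a small made-up `Series`, contrasts the two cases and shows two common ways of testing the values: `in s.values` and the elementwise `isin` method.

```python
import pandas as pd

s = pd.Series({'Nile': 6650, 'Amazon': 6400})

label_found = 'Nile' in s            # looks in the index labels
value_as_label = 6650 in s           # also looks in the index, so this is False
value_found = 6650 in s.values       # looks in the underlying array of values
mask = s.isin([6650, 7000])          # elementwise membership test on the values

print(label_found, value_as_label, value_found)
print(mask)
```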

In [19]:
# Series can be sorted, either by their index or by their values, using Series.sort_index and Series.sort_values
river_length.sort_index()

Amazon         3976.774400
Mississippi    3899.103025
Nile           4132.117150
Yangtze        3914.637300
Name: Length /miles, dtype: float64

In [58]:
river_length.sort_values(ascending=False, inplace=True)  # sort in place so that river_length itself changes
river_length

Nile           4132.117150
Amazon         3976.774400
Yangtze        3914.637300
Mississippi    3899.103025
Name: Length /miles, dtype: float64

In [59]:
# Two pd.Series can be combined
masses = pd.Series({'Ganymede':1.482e23,
                    'Callisto': 1.076e23,
                    'Io': 8.932e22,
                    'Europa':4.800e22,
                    'Moon':7.342e22,
                    'Earth':5.972e24}, name = 'mass /kg')


radii = pd.Series({'Ganymede':2.634e6,
                   'Io':1.822e6, 
                   'Earth': 6.371e6}, name = 'radius /m')



In [60]:
from scipy.constants import G  # gravitational constant
surface_g = G* masses / radii**2
surface_g.name = 'Surface gravity /m.s-2'
surface_g.index.name = 'Body'
surface_g

Body
Callisto         NaN
Earth       9.819973
Europa           NaN
Ganymede    1.425681
Io          1.795799
Moon             NaN
Name: Surface gravity /m.s-2, dtype: float64

In [61]:
surface_g.isnull()

Body
Callisto     True
Earth       False
Europa       True
Ganymede    False
Io          False
Moon         True
Name: Surface gravity /m.s-2, dtype: bool

In [24]:
# return without NaN
surface_g.dropna()

Body
Earth       9.819973
Ganymede    1.425681
Io          1.795799
Name: Surface gravity /m.s-2, dtype: float64

In [25]:
surface_g.values

array([       nan, 9.81997343,        nan, 1.42568108, 1.79579887,
              nan])

In [26]:
# NaN entries in a pandas Series can be replaced with a specified value using the fillna method.
ser1 = pd.Series({'b':2,'c':-5,'d':6.5}, index = list('abcd'))
ser1

a    NaN
b    2.0
c   -5.0
d    6.5
dtype: float64

In [27]:
ser1.fillna(0,inplace=True)
ser1

a    0.0
b    2.0
c   -5.0
d    6.5
dtype: float64

In [28]:
# create a second Series that shares the index of ser1
ser2 = pd.Series([-3.4,0,0,1], index=ser1.index)
ser2

a   -3.4
b    0.0
c    0.0
d    1.0
dtype: float64

In [29]:
ser3 = ser1/ser2
ser3

a   -0.0
b    inf
c   -inf
d    6.5
dtype: float64

In [30]:
import numpy as np


In [31]:
ser3.replace([np.inf,-np.inf],0, inplace=True)
ser3

a   -0.0
b    0.0
c    0.0
d    6.5
dtype: float64

## DataFrame

### Creating a DataFrame 
- A `DataFrame` is a two-dimensional table of data that can be thought of as an ordered set of Series columns, which all have the same index. 
- To create a simple `DataFrame` from a dictionary, assign value sequences to column name keys. 
- The `DataFrame` can be seen as a generalization of a NumPy array or as a specialization of a dictionary.
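
Both views can be illustrated on a small made-up table (the name `df_demo` and its values are illustrative only). Dictionary-style operations act on the column names, while array-style attributes treat the table as a two-dimensional block of values.

```python
import pandas as pd

df_demo = pd.DataFrame({'a': [1, 2, 3], 'b': [4.0, 5.0, 6.0]},
                       index=['r1', 'r2', 'r3'])

# Dictionary-like behaviour: keys are column names, values are Series.
column_names = list(df_demo.keys())
has_column_a = 'a' in df_demo        # membership tests the column names
has_row_label = 'r1' in df_demo      # row labels are not keys
column_b = df_demo['b']

# Array-like behaviour: shape, raw values and transposition.
shape = df_demo.shape
values = df_demo.to_numpy()
transposed = df_demo.T

print(column_names, has_column_a, has_row_label)
print(shape)
print(values)
print(transposed)
```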

In [64]:
#dictionary to DataFrame
data = {'mass':[1.482e23, 1.076e23, 8.932e22, 4.800e22, 7.342e22],
        'radius': [2.634e6 , None , 1.822e6 , None , 1.737e6],
        'parent': ['Jupiter', 'Jupiter', 'Jupiter', 'Jupiter', 'Earth']
        }

index = ['Ganymede','Callisto','Io','Europa','Moon']
df =pd.DataFrame(data, index=index)
df

| | mass | radius | parent |
|---|---|---|---|
| Ganymede | 1.482e+23 | 2634000.0 | Jupiter |
| Callisto | 1.076e+23 | NaN | Jupiter |
| Io | 8.932e+22 | 1822000.0 | Jupiter |
| Europa | 4.8e+22 | NaN | Jupiter |
| Moon | 7.342e+22 | 1737000.0 | Earth |


In [66]:
#df.rename({'parent':'planet'})
df.rename({'parent':'parent_planet'}, axis='columns', inplace=True) 
#df.rename({'Moon':'The Moon'}) # change a row index label
df

| | mass | radius | parent_planet |
|---|---|---|---|
| Ganymede | 1.482e+23 | 2634000.0 | Jupiter |
| Callisto | 1.076e+23 | NaN | Jupiter |
| Io | 8.932e+22 | 1822000.0 | Jupiter |
| Europa | 4.8e+22 | NaN | Jupiter |
| Moon | 7.342e+22 | 1737000.0 | Earth |


In [68]:
#rename the index
df.rename({'Moon':'The Moon'},axis='rows',inplace=True)
df.rename({'mass':'MASS'},axis='columns',inplace=True)
df

| | MASS | radius | parent_planet |
|---|---|---|---|
| Ganymede | 1.482e+23 | 2634000.0 | Jupiter |
| Callisto | 1.076e+23 | NaN | Jupiter |
| Io | 8.932e+22 | 1822000.0 | Jupiter |
| Europa | 4.8e+22 | NaN | Jupiter |
| The Moon | 7.342e+22 | 1737000.0 | Earth |


Because `inplace=True` was passed, these statements modified `df` directly. Without it, `rename` returns a new DataFrame and leaves the original, `df`, unchanged.
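
A minimal sketch of the difference between calling `rename` with and without `inplace=True`, using a made-up two-row table (`df_demo` is a hypothetical name):

```python
import pandas as pd

df_demo = pd.DataFrame({'mass': [1.0, 2.0]}, index=['A', 'B'])

# Without inplace=True, rename returns a new DataFrame; df_demo is untouched.
renamed = df_demo.rename({'mass': 'MASS'}, axis='columns')
columns_before = list(df_demo.columns)
columns_of_copy = list(renamed.columns)

# With inplace=True, df_demo itself is modified and the call returns None.
result = df_demo.rename({'mass': 'MASS'}, axis='columns', inplace=True)
columns_after = list(df_demo.columns)

print(columns_before, columns_of_copy)
print(result, columns_after)
```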

In [35]:
# Accessing rows, columns and cells
df['MASS']  # equivalent to df.MASS
#df[:2:]

Ganymede    1.482000e+23
Callisto    1.076000e+23
Io          8.932000e+22
Europa      4.800000e+22
The Moon    7.342000e+22
Name: MASS, dtype: float64

In [36]:
df.MASS.iloc[2]  # third entry, selected by position

8.932e+22

In [37]:
df.MASS['The Moon']  # the row label was renamed from 'Moon' above

7.342e+22

In [38]:
#create DataFrame from a list of dicts
data = [{'a':i,'b':2*i} for i in range(3)]
pd.DataFrame(data)

| | a | b |
|---|---|---|
| 0 | 0 | 0 |
| 1 | 1 | 2 |
| 2 | 2 | 4 |


In [71]:
#from a 2D NumPy array
import numpy as np 

data = np.random.rand(3,2) #generate random 3x2 array
pd.DataFrame(data,columns=["rand1","rand2"],index=["a","b","c"])

| | rand1 | rand2 |
|---|---|---|
| a | 0.059909 | 0.55095 |
| b | 0.337497 | 0.870966 |
| c | 0.996363 | 0.77823 |


## More about the Index Object

- Note that both pandas objects, `Series` and `DataFrame`, have an explicit `index`.
- This `Index` object is an interesting structure in itself, and it can be thought of either as an *immutable array* or as an *ordered set*.
- The methods `loc` and `iloc` can be used to access and assign to columns, rows and cells:
    - `loc` selects rows and columns by labels only.
    - `iloc` selects rows and columns by integer position only.
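
The two views of an `Index` can be sketched as follows, using made-up integer labels. The object supports array-style indexing and slicing but rejects item assignment, and it also offers set operations such as `intersection`, `union` and `symmetric_difference`.

```python
import pandas as pd

idx1 = pd.Index([1, 3, 5, 7, 9])
idx2 = pd.Index([2, 3, 5, 7, 11])

# Array-like: indexing and slicing work as for a NumPy array.
second = idx1[1]
every_other = idx1[::2]

# Immutable: item assignment raises a TypeError.
try:
    idx1[0] = 0
    is_immutable = False
except TypeError:
    is_immutable = True

# Set-like: labels can be combined with set operations.
common = idx1.intersection(idx2)
combined = idx1.union(idx2)
exclusive = idx1.symmetric_difference(idx2)

print(second, list(every_other), is_immutable)
print(list(common), list(combined), list(exclusive))
```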

In [80]:
#df.loc['Europa']
df.iloc[2]

MASS             89320000000000006291456.0
radius                           1822000.0
parent_planet                      Jupiter
Name: Io, dtype: object

In [81]:
df

| | MASS | radius | parent_planet |
|---|---|---|---|
| Ganymede | 1.482e+23 | 2634000.0 | Jupiter |
| Callisto | 1.076e+23 | NaN | Jupiter |
| Io | 8.932e+22 | 1822000.0 | Jupiter |
| Europa | 4.8e+22 | NaN | Jupiter |
| The Moon | 7.342e+22 | 1737000.0 | Earth |


In [72]:
df.loc['Europa',['MASS','parent_planet']]

MASS             48000000000000000000000.0
parent_planet                      Jupiter
Name: Europa, dtype: object

In [73]:
# slicing
df.loc[:,'MASS'] # the same as df['MASS'] - returns a Series

Ganymede    1.482000e+23
Callisto    1.076000e+23
Io          8.932000e+22
Europa      4.800000e+22
The Moon    7.342000e+22
Name: MASS, dtype: float64

In [74]:
df.loc['Ganymede':'Io',['MASS','radius']]

| | MASS | radius |
|---|---|---|
| Ganymede | 1.482e+23 | 2634000.0 |
| Callisto | 1.076e+23 | NaN |
| Io | 8.932e+22 | 1822000.0 |


In [83]:
df.loc[['The Moon','Europa'],'parent_planet']

The Moon      Earth
Europa      Jupiter
Name: parent_planet, dtype: object

In [76]:
df.loc[df.parent_planet=='Jupiter','radius']

Ganymede    2634000.0
Callisto          NaN
Io          1822000.0
Europa            NaN
Name: radius, dtype: float64

In [None]:
# modify the data 
df.loc['Europa','radius'] = 1.561e6
df.loc['Europa']

In [None]:
df.loc[df.parent_planet=='Jupiter','MASS']

In [None]:
# The rows corresponding to moons with radii less than 2000 km
df.loc[df.radius < 2e6,'parent_planet']

In [None]:
df.iloc[2]

The second method, `iloc`, retrieves data by numerical index position:

In [None]:
df.iloc[1] # the second row

In [None]:
df.iloc[1:,[1,2]] # all rows from the second onward, second and third columns

In [None]:
df.iloc[-1,1] # last row, second column

In [None]:
# For single scalar values
df.at['The Moon','radius'] # same as df.loc['The Moon','radius']


In [None]:
df.iat[-1,0] # same as df.iloc[-1,0]

- `loc` always refers to the index labels, whereas `iloc` takes an integer location index.

In [None]:
df = pd.DataFrame(np.arange(12).reshape(4,3)+10,index=[1,2,3,4], columns = list('abc'))
df

In [None]:
df.loc[1] # the row with index "label" 1 (the first row)

In [None]:
df.iloc[0]

In [None]:
df.iloc[1]

In [None]:
df.index = ['row1',2,2,3] # change the index labels
df

In [None]:
df.loc[2] # a DataFrame: all rows labeled 2

In [None]:
df.iloc[2]

## Combining Series and DataFrame

- A `DataFrame` can also be created from a nested dictionary or from a dictionary of `Series`.

In [None]:
boeing_wingspan = pd.Series({'B747-8': 68.4, 'B777-9': 64.8, 'B787-10': 60.12},
                            name='wingspan')
boeing_length = pd.Series({'B747-8': 76.3, 'B777-9': 76.7, 'B787-10': 68.28},
                          name='length')
boeing_range = pd.Series({'B777-9': 13940, 'B787-10': 11910},
                         name='range', dtype=float)

In [None]:
# Create a DataFrame from a dictionary of Series.
df_boeing = pd.DataFrame({'wingspan':boeing_wingspan,'length':boeing_length, 'range': boeing_range})

In [None]:
# Create a DataFrame from a dictionary of dictionaries
df_airbus = pd.DataFrame({'range': {'A350 -1000': 16100, 'A380 -800': 14800},
                          'wingspan': {'A350 -1000': 64.75, 'A380 -800': 79.75},
                          'length': {'A350 -1000': 73.8, 'A380 -800': 72.72} })

In [None]:
df_airbus

In [None]:
df_boeing

In [None]:
df_aircraft = pd.concat((df_airbus,df_boeing))
df_aircraft

In [None]:
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
pd.concat([df_airbus, df_boeing])

In [None]:
df_airbus['speed'] = [950,903]
df_airbus

In [None]:
df_aircraft = pd.concat((df_airbus,df_boeing))
df_aircraft

In [None]:
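# Assign through .loc on the DataFrame itself: values set on an intermediate
# object (e.g. speeds = df_aircraft['speed']) may not propagate back to df_aircraft.
df_aircraft.loc[['B747-8','B787-10'],'speed'] = [903, 956]
df_aircraft.loc['B747-8','range'] = 15000
df_aircraft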

In [None]:
del df_aircraft['speed']

In [None]:
df_aircraft

In [None]:
# The drop function can be used to selectively remove rows and columns
df_aircraft.drop(['A350 -1000','A380 -800'])

In [None]:
#drop a column/multiple
df_aircraft.drop(['length','wingspan'], axis='columns',inplace=True)
df_aircraft

## Reading and Writing Series and DataFrames
### Reading Text Files

- The core method for reading text files of data into a `DataFrame` is `pd.read_csv`.
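- It accepts a large number of optional arguments (around 50, depending on the pandas version).
- Some important arguments are discussed below:
    - `filepath_or_buffer`: The path to the file to read, or an open file-like object.
    -  `sep`: The column delimiter; by default ',', but use '\s+' for whitespace-delimited columns, '\t' for tab-delimited columns, or `None` to let pandas try to detect the delimiter.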
    -  `delimiter`: An alias for `sep`
    -  `header`: The row numbers to use for the column names. Default `header=0`: use the first row for the column names. If the file doesn't have column names, specify `header=None` and set the column names with the `names` argument.
    - `names`: A sequence of unique column names to use. If the file contains no header, set `header=None` in addition to setting names.
    - `index_col`: The column(s) to use as the row labels in the `DataFrame`.
    - `usecols`: A sequence of column indices or column names identifying the columns to be read into the `DataFrame`.
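    - `squeeze`:  If the data required consist of a single column, then `squeeze=True` will return a `Series` instead of the default, a `DataFrame`. This argument was removed in pandas 2.0; in newer versions, call `.squeeze('columns')` on the returned `DataFrame` instead.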
    - `skiprows`: An integer giving the number of lines at the start of the file to skip over before reading the data or a sequence giving the indices of rows to skip.
    -  `skipfooter`: The number of rows at the bottom of the file to skip (by default, 0).
    -  `nrows`: The number of rows of the file to read: this is useful for reading a subset of lines from a very large file for testing or exploring its data.
    -  `na_values`: A string or sequence of strings to treat as `NaN` values, in addition to the default values which include 'NaN', 'NA', 'NULL' and '#N/A' (see the documentation for a full list).
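
Several of these arguments can be tried out without a file on disk by reading from an in-memory text buffer. The following is a minimal sketch: the file contents, the `;` delimiter, the `'missing'` marker and the column names are all made up for illustration.

```python
import io
import pandas as pd

# Made-up file contents: one comment line, no header row, ';' as delimiter,
# and the word 'missing' marking an absent value.
text = """# river data (illustrative)
Yangtze;6300;Asia
Nile;6650;Africa
Mississippi;missing;North America
Amazon;6400;South America
"""

df_rivers = pd.read_csv(
    io.StringIO(text),                  # any file-like object works as the buffer
    sep=';',                            # column delimiter
    skiprows=1,                         # skip the leading comment line
    header=None,                        # the file has no row of column names
    names=['river', 'length_km', 'continent'],
    usecols=['river', 'length_km'],     # read only these two columns
    index_col='river',                  # use the river names as row labels
    na_values=['missing'],              # treat 'missing' as NaN
    nrows=3,                            # read only the first three data rows
)
print(df_rivers)
```

Because `'missing'` is converted to `NaN`, the `length_km` column is read as floating point rather than as strings.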



In [None]:
import pandas as pd 
import numpy as np

In [None]:
df_airlines = pd.read_csv('E:/MDSC-106/Datasets/airlines_usa.csv')#skiprows=1,index_col=6,usecols=range(2),nrows=3
df_airlines.head()

In [None]:
#df_airlines.columns = df_airlines.columns.str.strip()
#df_airlines.columns  

In [None]:
#rename columns using lambda functions
df_airlines = df_airlines.rename(columns = lambda x: x.strip())  # rename returns a new DataFrame, so reassign it
df_airlines.columns

In [None]:
df_airlines.columns = df_airlines.columns.str.strip()
df_airlines.columns

### Writing Text Files
- The `DataFrame` method `to_csv` writes its data to a text file. Its most important arguments are summarized below:
    - `path_or_buf`: A file path or file object to output to; if `None`, the output is returned as a string.
    - `sep`: The single-character field-delimiter (defaults to ',').
    - `na_rep`: The string to use to represent missing data (defaults to the empty string,'').
    - `float_format`: The C-style format specifier for floating-point numbers.
    - `columns`: A sequence identifying the columns to output.
    - `header`: By default, `True`, indicating that column names should be output; can be set to `False` or a list of column names.
    - `index`: By default, `True`, indicating that row names should be output.
    - `compression`: One of 'infer', 'gzip', 'bz2', 'zip', 'xz', `None` to specify whether and how to compress the output file. The default is 'infer': pandas determines the intended compression method from the filename extension.
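
As a minimal sketch of how these arguments interact, the example below passes `None` as the path so that `to_csv` returns the text instead of writing a file. The table `df_demo` and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({'x': [1.26, np.nan], 'y': [10.0, 20.5]},
                       index=['a', 'b'])

csv_text = df_demo.to_csv(
    None,                   # no path: return the CSV text as a string
    sep=';',                # field delimiter
    na_rep='n/a',           # how missing values are written
    float_format='%.1f',    # one decimal place for floats
    columns=['x'],          # write only column 'x'
    header=['x_value'],     # write it under a different name
    index=True,             # include the row labels
)
print(csv_text)
```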

In [None]:
import pandas as pd  
import numpy as np 

In [None]:
# Reading in a text table of vitamin data

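# sep=r'\s+' replaces the deprecated delim_whitespace=True; skipfooter needs the Python engine
df = pd.read_csv('vitamin.txt', sep=r'\s+', skiprows=4, engine='python',
                 skipfooter=1, header=None, usecols=(1,2,3), names = ['Vitamin','Solubility','RDA'], index_col=0)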
print(df)

In [None]:
def average_rda_in_micrograms(col):
    def ensure_microgram(s):
        if s.endswith('ug'):
            return float(s[:-2]) 
        elif s.endswith('mg'):
            return float(s[:-2]) * 1000  
        raise ValueError(f'Unrecognised Units in {s}')
    fields = col.split('/')
    return sum([ensure_microgram(s) for s in fields]) / len(fields) 


In [None]:

# converters applies average_rda_in_micrograms to every entry of the RDA column as it is read
df = pd.read_csv('vitamin.txt', sep=r'\s+', skiprows=4, engine='python',
                skipfooter=1, header=None, usecols=(1,2,3),
                converters ={'RDA': average_rda_in_micrograms},
                names = ['Vitamin','Solubility','RDA'], index_col=0)
print(df)

In [None]:
df.to_csv('vitamins.csv',float_format='%.1f', columns=['Solubility','RDA'])
df

In [None]:
df

### Microsoft Excel Files

- Pandas can read a `DataFrame` from Excel files with both `.xls` and `.xlsx` extensions using the function `pd.read_excel`.

In [None]:
#pip install xlrd 
#pip install openpyxl
df = pd.read_excel('E:/Datasets/bond-lengths.xlsx',
                   index_col =0, # the first column contains the index labels
                   skipfooter =2, # ignore the last two lines of the sheet
                   header =1, # take the column names from the second row
                   usecols='A:E', # use Excel columns labeled A-E
                   sheet_name ='Diatomics' # take data from this sheet
)
print(df)


## Advanced Indexing
### MultiIndex


In [None]:
cities = ('Paris', 'Berlin', 'Vienna', 'London', 'Madrid')
months = ('Jan', 'Apr', 'Jul', 'Oct')
index = pd.MultiIndex.from_tuples((city, month) for city in cities for month in months)
index


In [None]:
index =pd.MultiIndex.from_product((cities,months))
index

In [None]:
import numpy as np

In [None]:
index.names =['City','Month']
temps = [[4.9 , 11.5 , 20.5 , 13.0] , [0.1 , 9.0, 19.1 , 9.4] ,
         [0.3 , 10.7 , 20.8 , 10.2] , [5.2 , 9.9, 18.7 , 12.0] ,
         [6.3 , 12.9 , 25.6 , 15.1]
]
rainfall = [[51.0 , 51.8 , 62.3 , 61.5] , [37.2 , 33.7 , 52.5 , 32.2] ,
            [38. , 45., 70., 38.] , [55.2 , 43.7 , 44.5 , 68.5] ,
            [33. , 45., 12., 60.]]
arr = np.array((temps , rainfall )).reshape(2,20).T

df = pd.DataFrame(arr, index=index, columns=['Mean temperature /degC','Mean rainfall /mm'])
print(df)


In [None]:
df

- Data cleaning (section still to be added):
    - Missing values
    - Outliers
    - Binding

## Data Grouping and Aggregation
### DataFrame Grouping with groupby
- The `groupby` method is used to analyze data in a `Series` or `DataFrame` by grouping it according to some key row or column values.


In [None]:
import pandas as pd
import numpy as np 

In [None]:
# Consider the following data
data = [['Anu', 'A', 5.4] , ['Anu', 'B', 6.7] , ['Anu', 'C', 10.1] ,
        ['Jenny', 'A', 6.5] , ['Jenny', 'B', 5.9] , ['Jenny', 'C', 12.2] ,
        ['Tom', 'A', 4.0] , ['Tom', 'B', None], ['Tom', 'C', 9.5]
        ]

In [None]:
df = pd.DataFrame(data,columns=['Student','Compound','Yield/g'])
print(df)

In [None]:
grouped = df.groupby('Compound')
grouped

In [None]:
grouped.mean(numeric_only=True)  # average only the numeric columns; the Student names cannot be averaged

In [None]:
grouped.max()

In [None]:
grouped['Yield/g'].mean()

In [None]:
# groupby() can be iterated
for compound, group in grouped:
    print('Compound:', compound)
    print(group)

In [None]:
grouped = df.groupby('Student')

In [None]:
grouped.mean(numeric_only=True)  # the Compound labels are strings, so restrict to numeric columns

In [None]:
degree_programmes = {'Anu':'Chemistry',
                     'Jenny':'Chemistry',
                     'Tom': 'Pharmacology'}

In [None]:
# a dictionary passed to groupby maps index labels to groups, so index by Student first
df.set_index('Student').groupby(degree_programmes).mean(numeric_only=True)
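
Several summary statistics can be computed in one pass with the `agg` method of a grouped column. The sketch below rebuilds the yield table from above under the hypothetical name `df_yield` so that it runs on its own; the choice of `count`, `mean` and `max` is just one possible selection.

```python
import pandas as pd

data = [['Anu', 'A', 5.4], ['Anu', 'B', 6.7], ['Anu', 'C', 10.1],
        ['Jenny', 'A', 6.5], ['Jenny', 'B', 5.9], ['Jenny', 'C', 12.2],
        ['Tom', 'A', 4.0], ['Tom', 'B', None], ['Tom', 'C', 9.5]]
df_yield = pd.DataFrame(data, columns=['Student', 'Compound', 'Yield/g'])

# One row per compound, one column per aggregation function.
# Missing yields (NaN) are ignored by count, mean and max.
summary = df_yield.groupby('Compound')['Yield/g'].agg(['count', 'mean', 'max'])
print(summary)
```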


## Exercise:
- P9.5.1 Pg # 482
- P9.5.2 Pg # 482

## Data Visualization

The file `nuclear-explosion-data.csv`, available to download
[here](https://scipython.com/eg/ban), contains data on all nuclear explosions between 1945 and 1998. We will use pandas to analyze it in various ways.

In [None]:
df = pd.read_csv('E:/Datasets/nuclear-explosion-data.csv')
df.head(10)

In [None]:
df.index

In [None]:
df.columns

In [None]:
df.describe()

In [None]:
# datetime parsing
from datetime import datetime

def parse_time(t):
    # split a time stored as a number of the form HHMMSS into hours, minutes and seconds
    hr, t = divmod(t,10000)
    minute, t = divmod(t,100) 
    return int(hr), int(minute), int(t)  

def parse_datetime(date,time):
    date_and_time = datetime.strptime(str(date), '%Y%m%d')  
    hr, minute, second = parse_time(time)
    return date_and_time.replace(hour=hr, minute=minute, second=second)

In [None]:
from datetime import datetime
dt = datetime.strptime(str(20211025), '%Y%m%d')
dt.replace(hour=14,minute=52,second=2)

In [None]:
divmod(5,3)

In [None]:
df.index = pd.DatetimeIndex([parse_datetime(date,time) for date,time in zip(df['date'],df['time'])])
print(df)

In [None]:
df.index.year

In [None]:
explosion_number = df.groupby(df.index.year).size()
print(explosion_number)

In [None]:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.bar(explosion_number.index,explosion_number.values)
ax.set_xlabel('Year')
ax.set_ylabel('# of nuclear explosions') 
ax.set_title('Nuclear Explosions between 1945-1998')
plt.show()

In [None]:
df2 = df.groupby([df.index.year, df.country])
explosions_by_country = df2.size()
print(explosions_by_country.head(10))

The `unstack()` method pivots the inner index level (here, the country) into columns. 
Year and country combinations with no explosions become missing values after reshaping. They can be filled 
either with the `fill_value` parameter of `unstack()` or, as in the next cell, by chaining `fillna(0)`.
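
The two ways of filling the gaps can be compared on a small `Series` with a two-level index. The counts below are illustrative values rather than rows read from the data file, and the variable names are hypothetical.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [(1945, 'USA'), (1949, 'USSR'), (1951, 'USA'), (1951, 'USSR')],
    names=['year', 'country'])
counts = pd.Series([3, 1, 16, 2], index=index)   # illustrative counts

# Fill the missing year/country combinations while reshaping ...
wide = counts.unstack(fill_value=0)

# ... or reshape first (introducing NaN) and fill afterwards.
wide_filled_later = counts.unstack().fillna(0)

print(wide)
print(wide_filled_later)
```

Both tables hold the same numbers. With `fill_value=0` no `NaN` is ever introduced, so the integer counts are not converted to floating point along the way.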

In [85]:
explosions_by_country = explosions_by_country.unstack().fillna(0)
print(explosions_by_country.head(10))


In [84]:
countries =['USA', 'USSR','UK','FRANCE','CHINA','INDIA','PAKISTAN']
bottom = np.zeros(len(explosions_by_country))
fig, ax = plt.subplots()
for country in countries:
    ax.bar(explosions_by_country.index, explosions_by_country[country],
           bottom=bottom , label=country)
    bottom += explosions_by_country[country].values  
    
ax.set_xlabel('Year')
ax.set_ylabel('Number of nuclear explosions')  
ax.legend()
plt.show()


## Exercise:
- Explore Example E9.14 Page # 488