## What if you want to find:
- the average of a column in your data,
- or the maximum?
- Or maybe you want to calculate some other statistics. Here are some examples of how to do that.

## Min, Max and Average
- Using the same country data set from Part 3A, you can find the largest country by population by calling the "max()" method to get the largest value, then looking up the row at that location.

In [None]:
import pandas as pd
myDataFrame = pd.read_csv("https://psme.foothill.edu/~20033409@ad.fhda.edu/cs3a/nations.csv")

In [None]:
# returns the largest value in the population column
myDataFrame["population"].max()

## What country is this? What year? Etc.
 - The second method, "idxmax()", returns the index label (here, the row number) of the first row that holds the maximum value,
 - which you can then feed into "loc" to see all the details for that row.

In [None]:
myDataFrame.loc[myDataFrame['population'].idxmax()]
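To see how the three pieces fit together without downloading anything, here is a minimal sketch on a tiny made-up table. The `toy` DataFrame and its values are hypothetical and only stand in for the real nations data.

```python
import pandas as pd

# Hypothetical toy data; the values are made up for illustration only.
toy = pd.DataFrame({
    "country": ["A", "B", "C"],
    "year": [2014, 2014, 2014],
    "population": [5_000_000, 120_000_000, 40_000_000],
})

largest_value = toy["population"].max()     # the maximum value itself
largest_label = toy["population"].idxmax()  # index label of the row holding it
largest_row = toy.loc[largest_label]        # the full row, returned as a Series

print(largest_value, largest_label)
print(largest_row["country"])
```

Because the index here is the default 0, 1, 2, ... numbering, the label returned by `idxmax()` happens to match the row position. With a custom index it would be that index's label instead.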

Let's find the average (arithmetic mean) life expectancy in the East Asia & Pacific region.
- To do this,
 - we need to combine multiple conditions: keep only the rows for that region,
 - and keep only the latest year for which data is available (2014), then calculate the average of the life expectancy column.
- In this example we use the "query()" method since it makes the filter much cleaner to read.

In [None]:
myDataFrame.query("(region == 'East Asia & Pacific') and (year == 2014)")["life_expect"].mean()
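For comparison, the same kind of filter can also be written with boolean indexing. The sketch below uses a small made-up table (`toy` is hypothetical, not the real data) to show that both spellings select the same rows and give the same mean.

```python
import pandas as pd

# Hypothetical toy data; the values are made up for illustration only.
toy = pd.DataFrame({
    "region": ["East Asia & Pacific", "East Asia & Pacific",
               "South Asia", "East Asia & Pacific"],
    "year": [2014, 2013, 2014, 2014],
    "life_expect": [80.0, 79.0, 68.0, 70.0],
})

# Filter with query(): the conditions are written as one string.
via_query = toy.query(
    "(region == 'East Asia & Pacific') and (year == 2014)"
)["life_expect"].mean()

# Same filter with a boolean mask: note & instead of "and",
# and the parentheses around each condition.
mask = (toy["region"] == "East Asia & Pacific") & (toy["year"] == 2014)
via_mask = toy.loc[mask, "life_expect"].mean()

print(via_query, via_mask)
```

Which form to use is mostly a matter of taste; `query()` tends to read better once several conditions are combined.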

If you look into the data, some of the life_expect numbers are missing.    
- Pandas handles that fine: "mean()" skips missing values by default and averages only the values that are available.   
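A small sketch of that behavior, using a made-up Series rather than the real column: `mean()` skips missing values unless you pass `skipna=False`, and `count()` tells you how many values were actually used.

```python
import pandas as pd

# Hypothetical values; None becomes NaN (a missing value) in a float Series.
life = pd.Series([80.0, None, 70.0], name="life_expect")

mean_skipping = life.mean()            # default skipna=True: NaN is ignored
mean_strict = life.mean(skipna=False)  # any NaN makes the result NaN
n_used = life.count()                  # number of non-missing values

print(mean_skipping, mean_strict, n_used)
```

Checking `count()` alongside the mean is a cheap way to notice when an average rests on only a handful of rows.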
    
Suppose we hypothesize that high income countries have a higher life expectancy than low income countries.  We can check this easily with the following one-liner:

In [None]:
myDataFrame.query("(region == 'East Asia & Pacific') and (income == 'High income')")["life_expect"].mean() > \
    myDataFrame.query("(region == 'East Asia & Pacific') and (income == 'Low income')")["life_expect"].mean()

Note: this doesn't tell us anything about statistical significance, just whether the average life expectancy is higher in high income countries than in low income countries (in the East Asia & Pacific region, across all years in the data).   
 - Pandas and Python have a suite of tools to perform statistical calculations.  
 - Here's a useful video on YouTube that covers statistics: https://www.youtube.com/watch?v=eMOA1pPVUc4
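A bare True/False also hides the numbers behind the comparison. As a rough sketch (on hypothetical toy data, not the real file), grouping by income level shows each group's mean together with how many values it is based on. This matters because a group with no usable values has a mean of NaN, and any comparison against NaN is False.

```python
import pandas as pd

# Hypothetical toy data; the values are made up for illustration only.
toy = pd.DataFrame({
    "region": ["East Asia & Pacific"] * 4 + ["South Asia"],
    "income": ["High income", "High income",
               "Low income", "Low income", "Low income"],
    "life_expect": [82.0, 80.0, 66.0, None, 60.0],
})

# Keep one region, then summarize life expectancy per income group.
east_asia = toy[toy["region"] == "East Asia & Pacific"]
summary = east_asia.groupby("income")["life_expect"].agg(["mean", "count"])

print(summary)
```

This is still only a descriptive summary, not a significance test.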

## Custom Functions

Notice that our data has a column for GDP per capita, but what if you want the total GDP number?     
- You can calculate that by applying a lambda function to each row.  
- We want the number in billions of dollars (10^9 dollars) so it's easier to read.    
- Check the result to make sure everything looks OK, too.

In [None]:
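# total GDP = GDP per capita * population, scaled to billions of dollars
# axis=1 applies the lambda to each row rather than to each column
myDataFrame["GDP (Billions)"] = myDataFrame.apply(lambda x: x['gdp_percap'] * x['population'] * 10**-9, axis=1)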

In [None]:
myDataFrame.query("year == 2014")[["country","income","population","gdp_percap", "GDP (Billions)"]]
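For simple arithmetic like this, pandas can also multiply whole columns directly, which avoids calling a Python function once per row. The sketch below compares the two approaches on a hypothetical two-row table; the column names match the data set, but the values are made up.

```python
import pandas as pd

# Hypothetical toy data; the values are made up for illustration only.
toy = pd.DataFrame({
    "gdp_percap": [50_000.0, 2_000.0],
    "population": [10_000_000, 100_000_000],
})

# Row-by-row, as in the lambda example.
via_apply = toy.apply(
    lambda x: x["gdp_percap"] * x["population"] * 10**-9, axis=1
)

# Column-wise: the multiplication is applied to every row at once.
via_columns = toy["gdp_percap"] * toy["population"] / 1e9

print(via_apply.tolist())
print(via_columns.tolist())
```

The lambda form remains handy when the per-row logic is more involved than a formula, for example when it needs an if/else.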

For more on lambda functions, see https://www.youtube.com/watch?v=Ob9rY6PQMfI