## 03_functions notebook

##### This notebook covers the basics of creating simple and more complex functions. Conditional logic is introduced within functions, along with the importance of correct code indentation. Incorrect indentation when writing functions caused me a lot of headaches.

Topics covered:

**1. Creating Functions**
* defining a basic function
* Multiple function arguments


**2. Conditional logic** 
* combining conditional logic in functions
* code indent issue examples


In [1]:
# Set up

# pandas and numpy are universally used in python, like tidyverse is in R. 
import pandas as pd
import numpy as np

!pip install openpyxl

# change from scientific notation
pd.set_option('display.float_format', lambda x: '%.5f' % x)

trade = pd.read_excel("data/trade_data.xlsx") # load xlsx
tariff = pd.read_excel("data/tariff_data.xlsx")
uk_trqs = pd.read_csv("data/uk_trqs.csv",dtype={'quota__order_number': str})

trade.columns = trade.columns.str.lower().str.replace(" ","_")

Looking in indexes: https://s3-eu-west-2.amazonaws.com/mirrors.notebook.uktrade.io/pypi/
Collecting openpyxl
  Downloading https://s3-eu-west-2.amazonaws.com/mirrors.notebook.uktrade.io/pypi/openpyxl/openpyxl-3.0.10-py2.py3-none-any.whl (242 kB)
Collecting et-xmlfile
  Downloading https://s3-eu-west-2.amazonaws.com/mirrors.notebook.uktrade.io/pypi/et-xmlfile/et_xmlfile-1.1.0-py3-none-any.whl (4.7 kB)
Installing collected packages: et-xmlfile, openpyxl
Successfully installed et-xmlfile-1.1.0 openpyxl-3.0.10

### 1. Creating functions

In [6]:
# defining a simple function

def simple_function():
    print("Simple none argument function")

In [7]:
simple_function()

Simple none argument function


In [8]:
# simple function with one argument
def simple_function(arg):
    print("Simple function with one argument: "+arg)

In [9]:
simple_function("My first argument")

Simple function with one argument: My first argument


In [10]:
# multiple arguments:
def multi_arg_function(arg1, arg2, arg3):
    print(arg1)
    print(arg2)
    print(arg3)

In [11]:
multi_arg_function(100,"Test argument", "Argument 3")

100
Test argument
Argument 3


In [12]:
multi_arg_function(100,"Test argument")

TypeError: multi_arg_function() missing 1 required positional argument: 'arg3'

Notice the error: the third argument wasn't supplied in the call. We can give an argument a default value of None so that omitting it doesn't break the code.

In [13]:
def multi_arg_function(arg1, arg2, arg3 = None):
    print(arg1)
    print(arg2)
    print(arg3)

In [14]:
multi_arg_function(100,"Test argument")

100
Test argument
None
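
Arguments can also be passed by name, which tends to make calls easier to read once a function has several parameters. A minimal sketch, using a hypothetical `describe_args` helper (not part of the notebook) that returns a string instead of printing so the result is easy to inspect:

```python
# Hypothetical helper for illustration only; not part of the notebook's workflow.
def describe_args(arg1, arg2, arg3=None):
    """Return the three arguments joined into one string."""
    return f"{arg1} | {arg2} | {arg3}"

# Positional call: arg3 falls back to its default of None.
positional = describe_args(100, "Test argument")

# Keyword call: the order no longer matters because each value is named.
keyword = describe_args(arg2="Test argument", arg1=100, arg3="Argument 3")

print(positional)  # 100 | Test argument | None
print(keyword)     # 100 | Test argument | Argument 3
```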


#### Function for data manipulation

Defining a simple function to print values is easy. In practice, we need functions that work on real-world data. Using the trade data set, I will demonstrate creating a function that manipulates the data based on user-defined arguments and returns multiple outputs.

In [16]:
trade.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 41142 entries, 0 to 41141
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   year               41142 non-null  int64  
 1   flow               41142 non-null  object 
 2   commodity_code     41142 non-null  object 
 3   country_code       41142 non-null  object 
 4   country_name       41142 non-null  object 
 5   value_gbp          41142 non-null  int64  
 6   suppression_notes  0 non-null      float64
dtypes: float64(1), int64(2), object(4)
memory usage: 2.2+ MB


In [17]:
trade.head()

Unnamed: 0,year,flow,commodity_code,country_code,country_name,value_gbp,suppression_notes
0,2020,Exports,1012100,TW,Taiwan,892,
1,2020,Exports,1062000,TW,Taiwan,14101,
2,2020,Exports,1063100,TW,Taiwan,1750,
3,2020,Exports,2031913,TW,Taiwan,290818,
4,2020,Exports,2031990,TW,Taiwan,1140,


In [18]:
print(pd.unique(trade["country_name"]))

['Taiwan' 'Tajikistan' 'Tanzania (United Republic of)' 'Thailand'
 'Timor-Leste' 'Tonga' 'Trinidad and Tobago' 'Tunisia' 'Turkmenistan'
 'Tuvalu' 'Uganda' 'Ukraine' 'United States'
 'United States Minor outlying islands' 'United States Virgin Islands'
 'Uruguay' 'Uzbekistan' 'Vanuatu' 'Vatican City State'
 'Venezuela, Bolivarian Republic of' 'Vietnam' 'Wallis and Futuna' 'Yemen'
 'Zambia']


#### Create a function that filters and aggregates this data based on user-selected country.

In [25]:
# simple data filter:
def trade_data_function(country):
    df = trade.copy()
    df_filt = df.loc[df["country_name"] == country]
    

In [30]:
trade_data_function("Thailand")

##### Nothing is returned. We haven't told the function to return the data object. 
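
In Python, a function that reaches its end without a `return` statement hands back `None`. A small sketch of the difference, using hypothetical `no_return` and `with_return` functions and a plain list standing in for the trade data:

```python
# Hypothetical example: a plain Python list stands in for the DataFrame.
def no_return(values, threshold):
    filtered = [v for v in values if v > threshold]  # computed, then discarded

def with_return(values, threshold):
    filtered = [v for v in values if v > threshold]
    return filtered  # hand the result back to the caller

result_a = no_return([1, 5, 10], 4)
result_b = with_return([1, 5, 10], 4)

print(result_a)  # None
print(result_b)  # [5, 10]
```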

In [31]:
def trade_data_function(country):
    df = trade.copy()
    df_filt = df.loc[df["country_name"] == country]
    return(df_filt)

In [34]:
df = trade_data_function("Thailand") # you can create an object from the function
trade_data_function("Thailand") # or simply return the output

Unnamed: 0,year,flow,commodity_code,country_code,country_name,value_gbp,suppression_notes
7614,2020,Exports,01051119,TH,Thailand,3187660,
7615,2020,Exports,01051199,TH,Thailand,54196,
7616,2020,Exports,01051300,TH,Thailand,161070,
7617,2020,Exports,01062000,TH,Thailand,18750,
7618,2020,Exports,02031990,TH,Thailand,7929,
...,...,...,...,...,...,...,...
13198,2020,Imports,97030000,TH,Thailand,254489,
13199,2020,Imports,97040000,TH,Thailand,10338,
13200,2020,Imports,97050000,TH,Thailand,269000,
13201,2020,Imports,97060000,TH,Thailand,48600,


In [35]:
df

Unnamed: 0,year,flow,commodity_code,country_code,country_name,value_gbp,suppression_notes
7614,2020,Exports,01051119,TH,Thailand,3187660,
7615,2020,Exports,01051199,TH,Thailand,54196,
7616,2020,Exports,01051300,TH,Thailand,161070,
7617,2020,Exports,01062000,TH,Thailand,18750,
7618,2020,Exports,02031990,TH,Thailand,7929,
...,...,...,...,...,...,...,...
13198,2020,Imports,97030000,TH,Thailand,254489,
13199,2020,Imports,97040000,TH,Thailand,10338,
13200,2020,Imports,97050000,TH,Thailand,269000,
13201,2020,Imports,97060000,TH,Thailand,48600,


In [38]:
# extending the function:

def trade_data_function(country):
    df = trade.copy()
    df_filt = df.loc[df["country_name"] == country]
    df_agg = df_filt.groupby("country_name").agg({"value_gbp":"sum"})
    return(df_agg)

In [39]:
trade_data_function("Thailand")

Unnamed: 0_level_0,value_gbp
country_name,Unnamed: 1_level_1
Thailand,3725874154


### Passing an array into a function

In [61]:
# we want to extend the filter so the trade data can be filtered for multiple countries.
# the code needs to be slightly reworked:
# the filter changes from == to .isin() so the function can accept a list of countries

def trade_data_function(country):
    df = trade.copy()
    filt = country
    df_filt = df.loc[df["country_name"].isin(filt)]
    df_agg = df_filt.groupby("country_name").agg({"value_gbp":"sum"})
    return df_agg

In [62]:
trade_data_function(["Thailand","Taiwan","United States"])

Unnamed: 0_level_0,value_gbp
country_name,Unnamed: 1_level_1
Taiwan,4470180597
Thailand,3725874154
United States,89903558073


In [69]:
## pre-define function arguments. 

def pre_set_arg_function(country = [] , converter = 1.2, trade_flow = None):
    df = trade.copy()
    filt = country
    df_filt = df.loc[df["country_name"].isin(filt)]
    
    # placeholder: logic for the trade_flow argument would go here
    
    df_agg = df_filt.groupby("country_name").agg({"value_gbp":"sum"})
    df_agg["value_gbp2"] = df_agg["value_gbp"]/converter
    return(df_agg)


In [70]:
pre_set_arg_function(["United States","Vietnam"])

Unnamed: 0_level_0,value_gbp,value_gbp2
country_name,Unnamed: 1_level_1,Unnamed: 2_level_1
United States,89903558073,74919631727.5
Vietnam,4391174212,3659311843.33333
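
One caution on the `country = []` default: Python creates a default list once, when the function is defined, and reuses that same list on every call. `pre_set_arg_function` only reads the list, so it is unaffected, but a function that modifies its default would carry values over between calls. A common alternative is a `None` default, sketched here with two hypothetical functions:

```python
# Hypothetical functions showing why a list default can be surprising.
def append_shared(item, bucket=[]):
    bucket.append(item)  # mutates the single list created at definition time
    return bucket

def append_fresh(item, bucket=None):
    if bucket is None:
        bucket = []  # a new list is created on every call
    bucket.append(item)
    return bucket

# list(...) takes a snapshot, because the shared default keeps changing.
first_shared = list(append_shared("Thailand"))
second_shared = list(append_shared("Vietnam"))

first_fresh = append_fresh("Thailand")
second_fresh = append_fresh("Vietnam")

print(second_shared)  # ['Thailand', 'Vietnam']
print(second_fresh)   # ['Vietnam']
```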


The trade_flow argument defaults to None, so it doesn't interact with the underlying code and won't break it. A None default is a helpful technique when writing more complex functions that require additional logic. This will be demonstrated further in the next section.
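
As a rough sketch of how the `trade_flow` placeholder might be filled in, the function below applies the flow filter only when a value is supplied. The `flow_filter_sketch` name and the tiny `sample` table are assumptions made for illustration; the column names mirror the trade data but the values are made up:

```python
import pandas as pd

# Tiny made-up sample; column names mirror the trade data, the values do not.
sample = pd.DataFrame({
    "country_name": ["Vietnam", "Vietnam", "Thailand", "Thailand"],
    "flow": ["Exports", "Imports", "Exports", "Imports"],
    "value_gbp": [100, 200, 300, 400],
})

def flow_filter_sketch(data, country, trade_flow=None):
    """Filter by country, and by flow only when trade_flow is supplied."""
    df_filt = data.loc[data["country_name"].isin(country)]

    # This step is skipped entirely when trade_flow keeps its default of None.
    if trade_flow is not None:
        df_filt = df_filt.loc[df_filt["flow"] == trade_flow]

    return df_filt.groupby("country_name").agg({"value_gbp": "sum"})

all_flows = flow_filter_sketch(sample, ["Vietnam", "Thailand"])
exports_only = flow_filter_sketch(sample, ["Vietnam", "Thailand"], trade_flow="Exports")

print(all_flows)
print(exports_only)
```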

#### Return multiple outputs from function

In [77]:
# using return you can output multiple values:

def multi_output():
    print("Test function")
    return trade.dtypes, trade.info(), trade.shape # separate your outputs with a comma; trade.info() prints its summary and returns None

In [78]:
multi_output()

Test function
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 41142 entries, 0 to 41141
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   year               41142 non-null  int64  
 1   flow               41142 non-null  object 
 2   commodity_code     41142 non-null  object 
 3   country_code       41142 non-null  object 
 4   country_name       41142 non-null  object 
 5   value_gbp          41142 non-null  int64  
 6   suppression_notes  0 non-null      float64
dtypes: float64(1), int64(2), object(4)
memory usage: 2.2+ MB


(year                   int64
 flow                  object
 commodity_code        object
 country_code          object
 country_name          object
 value_gbp              int64
 suppression_notes    float64
 dtype: object,
 None,
 (41142, 7))

In [79]:
def trade_data_function(country):
    df = trade.copy()
    filt = country
    df_filt = df.loc[df["country_name"].isin(filt)]
    df_agg = df_filt.groupby("country_name").agg({"value_gbp":"sum"})
    return df_agg, df_filt.head()

In [80]:
trade_data_function(["Thailand","United States"])

(                 value_gbp
 country_name              
 Thailand        3725874154
 United States  89903558073,
       year     flow commodity_code country_code country_name  value_gbp  \
 7614  2020  Exports       01051119           TH     Thailand    3187660   
 7615  2020  Exports       01051199           TH     Thailand      54196   
 7616  2020  Exports       01051300           TH     Thailand     161070   
 7617  2020  Exports       01062000           TH     Thailand      18750   
 7618  2020  Exports       02031990           TH     Thailand       7929   
 
       suppression_notes  
 7614                NaN  
 7615                NaN  
 7616                NaN  
 7617                NaN  
 7618                NaN  )

Returning several values this way gives back a tuple, which isn't the most readable when the function returns multiple dataframes. You can instead add the dataframes to a list and return the list, then pick out each object by position.

In [85]:
def trade_data_function(country):
    df = trade.copy()
    filt = country
    outputs = []  # avoid naming this "list", which would shadow the built-in
    df_filt = df.loc[df["country_name"].isin(filt)]
    df_agg = df_filt.groupby("country_name").agg({"value_gbp":"sum"})
    # insert dfs into the list
    outputs.append(df_filt)
    outputs.append(df_agg)
    return outputs # return list for function output

In [90]:
list_output = trade_data_function(["Thailand","Vietnam"]) 

In [91]:
list_output[0]

Unnamed: 0,year,flow,commodity_code,country_code,country_name,value_gbp,suppression_notes
7614,2020,Exports,01051119,TH,Thailand,3187660,
7615,2020,Exports,01051199,TH,Thailand,54196,
7616,2020,Exports,01051300,TH,Thailand,161070,
7617,2020,Exports,01062000,TH,Thailand,18750,
7618,2020,Exports,02031990,TH,Thailand,7929,
...,...,...,...,...,...,...,...
39932,2020,Imports,96200010,VN,Vietnam,2046,
39933,2020,Imports,97011000,VN,Vietnam,3931,
39934,2020,Imports,97019000,VN,Vietnam,9644,
39935,2020,Imports,97050000,VN,Vietnam,3100,


In [100]:
list_output[1]

Unnamed: 0_level_0,value_gbp
country_name,Unnamed: 1_level_1
Thailand,3725874154
Vietnam,4391174212
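
As an alternative to indexing into a list, values returned with commas come back as a tuple that can be unpacked into separate names at the call site. A minimal sketch, with a hypothetical `split_and_total` function and plain numbers standing in for the dataframes:

```python
# Hypothetical example; plain numbers stand in for the notebook's dataframes.
def split_and_total(values, threshold):
    """Return the values above the threshold and their total."""
    above = [v for v in values if v > threshold]
    return above, sum(above)  # comma-separated values are returned as a tuple

# Unpack the tuple into two named objects in one step.
above, total = split_and_total([892, 14101, 1750], 1000)

print(above)  # [14101, 1750]
print(total)  # 15851
```

Named results can be easier to follow than `list_output[0]` and `list_output[1]`, although either approach works.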


#### Text arguments

In Python you can pass a text (string) argument, such as a column name, and use it inside a function to filter or transform data.

In [60]:
# Example: function to filter dataframe based on column input

def filter_col_function(col, arg1):
    df = trade.copy()
    df_filt = df.loc[df[col] == arg1]
    return(df_filt.head(3))

In [57]:
trade.dtypes

year                   int64
flow                  object
commodity_code        object
country_code          object
country_name          object
value_gbp              int64
suppression_notes    float64
dtype: object

In [58]:
trade.head(3)

Unnamed: 0,year,flow,commodity_code,country_code,country_name,value_gbp,suppression_notes
0,2020,Exports,1012100,TW,Taiwan,892,
1,2020,Exports,1062000,TW,Taiwan,14101,
2,2020,Exports,1063100,TW,Taiwan,1750,


In [61]:
filter_col_function("year",2019)

Unnamed: 0,year,flow,commodity_code,country_code,country_name,value_gbp,suppression_notes
11,2019,Exports,2064900,TW,Taiwan,966185,
12,2019,Exports,3021110,TW,Taiwan,270452,
13,2019,Exports,3021400,TW,Taiwan,15220129,


In [62]:
filter_col_function("country_code","US")

Unnamed: 0,year,flow,commodity_code,country_code,country_name,value_gbp,suppression_notes
20956,2020,Exports,1012100,US,United States,16637203,
20957,2020,Exports,1012990,US,United States,1752234,
20958,2020,Exports,1019000,US,United States,54000,


****

### 2. Conditional Logic

Using conditional logic within your functions is a powerful technique for creating more complex and adaptable code. In this section I will demonstrate how to write basic conditional logic, insert it into a function, and execute functions that use it.

In [18]:
# simple conditional logic:

x = 10

if(x > 10):
    print("greater than 10")

elif(x < 10):
    print("less than 10")

elif(x == 10):
    print("equals 10")

else:
    print("other condition")

        

equals 10


In [22]:
# combine with function

def logic_func(x):
    if(x > 10):
        print("greater than 10")
    
    elif(x < 10):
        print("less than 10")
        
    elif(x == 10):
        print("equals 10")
        
    else:
        print("other condition")

In [23]:
logic_func(30)

greater than 10


In [25]:
# code indentation: if the conditions are not lined up correctly within their code block, Python raises an error:
if(x > 10):
        print("greater than 10")
    
    elif(x < 10):
        print("less than 10")
        
    elif(x == 10):
        print("equals 10")


IndentationError: unindent does not match any outer indentation level (<tokenize>, line 5)
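
For comparison, a sketch of the same checks with every `if` / `elif` / `else` starting in the same column and each body indented one level further, which runs without error. The `message` variable is an addition here, used only so the result can be inspected:

```python
x = 10

# if / elif / else all start at the same indentation level;
# each body is indented by four spaces beneath its condition.
if x > 10:
    message = "greater than 10"
elif x < 10:
    message = "less than 10"
else:
    message = "equals 10"

print(message)  # equals 10
```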

The if / elif / else statements must be aligned at the same indentation level. Within each code block you can place additional logic:

In [55]:

if(x > 10):

    # check the largest threshold first so every branch can be reached
    if(x > 1000):
        print("Greater than 1000")
    elif(x > 10*10):
        print("Greater than 100")
    else:
        print("Between 10 and 100")

else:
    print(x)

10


In [None]:
# Function with conditional logic example:
# Apply a USD conversion when the selected country is the United States, then filter the data to that country.
# If the flow flag is set, aggregate the data by trade flow as well

In [81]:
def example_function(country_filt, flow_flag = None):
    df = trade.copy()
    if(country_filt == "United States"):
        df["value_usd"] = df["value_gbp"] / 0.8
    else:
        df["value2"] = df["value_gbp"] / 1000
       
    # end indent block
    df_filt = df.loc[df["country_name"]==country_filt]
    
    
    if(flow_flag is None):
        # numeric_only=True sums only the numeric columns and leaves out text columns
        df_return = df_filt.groupby("country_name").sum(numeric_only=True)
    else:
        df_return = df_filt.groupby(["country_name","flow"]).sum(numeric_only=True)
    
    return df_return

In [75]:
example_function("United States")

Unnamed: 0_level_0,year,value_gbp,suppression_notes,value_usd
country_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
United States,26039140,89903558073,0.0,112379447591.25


The output shows the function logic worked. For the United States, the GBP values have been converted to USD. The flow flag is left at its default, so flow is not included in the aggregation.

In [79]:
example_function("Thailand",flow_flag = "Y")

Unnamed: 0_level_0,Unnamed: 1_level_0,year,value_gbp,suppression_notes,value2
country_name,flow,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Thailand,Exports,6569040,1161053338,0.0,1161053.338
Thailand,Imports,4720740,2564820816,0.0,2564820.816


```python
# Example logical function steps:

def function(x, y):
    print("step 1")
    print("step 2")
    if(x == "something"):
        print("logic_1 step 1")
    elif(x == "something_else"):
        print("logic_2 step 1")
        print("logic_2 step 2")
    else:
        if(y == "something"):
            print(y)
        else:
            print(y*10)
        # end of the inner indent block
        # the next steps line up with the outer if / elif / else
    # all logic statements are in line within the same indent block
    print("step 3 after logic")
    print("step 4 after logic")
    end_result = "done"  # placeholder value so the return statement has something to return
    return end_result
```

End. 