# Defining Custom Functions and Automating Tasks with Python 
*Functions, Loops, and Conditional Statements*
\
March 9 at 3:00 PM \
Vincent Scalfani and Lance Simpson \
*The University of Alabama Libraries* \
[Contact Information on UA Libraries Directory](https://www.lib.ua.edu/#/staffdir?liaison=1&department=Rodgers%20Library%20for%20Science%20and%20Engineering)


**Today, attendees will learn how to:**

* Use Python functions 
* Define custom functions
* Use conditional statements to make choices 
* Create loops to automate tasks

# Announcements

*Future Workshops:*

March 30 at 3:00 PM (in-person) \
**Plotting and Working with Data in Python**

March 31 at 3:00 PM (zoom) \
**Plotting and Working with Data in Python**


More information and registration here:
https://calendar.ua.edu/department/university_libraries/calendar

*Past Workshops:*

**Computational Notebooks and Beginner Syntax with Python**

Archived copy here: https://github.com/ualibweb/UALIB_Workshops



# Setup For Today

If you would like to follow along interactively:

1. Go to the link provided for this Colab notebook. 

2. Save a copy to your Google Drive. You should then be able to run and edit the code interactively.


# Using Python Functions [1,2]

A brief review of using Python functions from our [previous workshop](https://github.com/ualibweb/UALIB_Workshops/blob/master/04_Python_spring_2022/01_Python_computational_notebooks_and_syntax.ipynb).

In [None]:
# functions are called with parentheses
print("Hello World!")

In [None]:
# empty parentheses are still required to call a function or method
# that takes no arguments
myList = [23, 1, 45, 9]
myList.reverse() # () calls the list's reverse method with no arguments
print(myList)

Python [Built-in functions](https://docs.python.org/3/library/functions.html) are always available and do not need to be explicitly imported. However, for modules within the [Python Standard Library](https://docs.python.org/3/library/index.html) (e.g., math functions, file format handling, etc.) and for external libraries, we need to import the library module before we can use its functions.

We can use `import` to load a library module:

In [None]:
import math
# check the help documentation
help(math)

In [None]:
# return square root
math.sqrt(9)

In [None]:
# import specific parts of a module only
from math import sqrt
sqrt(9)
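
A module can also be imported under a shorter alias with `import ... as ...`. The sketch below is only an illustration; the alias `m` is an arbitrary name chosen here and is not used elsewhere in this workshop.

```python
# import the math module under a shorter alias (the name `m` is arbitrary)
import math as m

# functions are then called through the alias
root = m.sqrt(9)
print(root)
```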

Adapted from 

[1] https://nbviewer.jupyter.org/github/jakevdp/WhirlwindTourOfPython/blob/master/02-Basic-Python-Syntax.ipynb

[2] http://swcarpentry.github.io/python-novice-gapminder/06-libraries/index.html

# Defining Custom Functions [3]

Python functions are defined using the `def` statement. A general Python syntax format for a function looks like this:

```python
def function_name():
    do something
```

or 

```python
def function_name(param1, param2, ...):
    do something
```

## Functions without input parameters

In [None]:
def UALIB_address():
    print("The University of Alabama")
    print("University Libraries")
    print("Box 870266")
    print("Tuscaloosa, AL 35487")

In the above function, `UALIB_address` is the function name and `()` contains the function parameters. In this case, there are none. The line containing `def` ends with a `:` and is followed by the indented code that makes up the function body.

To call our new function, type the name of the function with empty parentheses (since no inputs are required):

In [None]:
UALIB_address()

It is a good idea to document your custom function with a docstring. Enclose a description of how to use the function in triple quotes `"""`, placed as the first statement in the function body. There are some guidelines available on how to format docstrings: https://www.python.org/dev/peps/pep-0257/

In [None]:
def UALIB_address():
    """ Print The University of Alabama Libraries mailing
    address. There are no inputs.
    """
    print("The University of Alabama")
    print("University Libraries")
    print("Box 870266")
    print("Tuscaloosa, AL 35487")

In [None]:
# now we can access the docstring with help()
help(UALIB_address)

In [None]:
# You can view the original Python source code with two question marks `??`
# see: https://jakevdp.github.io/PythonDataScienceHandbook/01.01-help-and-documentation.html
UALIB_address??

## Functions with input parameters

We can also write functions with input parameters. For example, a function that prepends a DOI with `https://doi.org/`:

In [None]:
def DOI_link(doi):
    """ prints a DOI with the https://doi.org/ URL prefix. Input a DOI string"""
    print("https://doi.org/" + doi)

In [None]:
# try DOI_link function
m = "10.1021/acs.jmedchem.0c01516"
DOI_link(m)

Multiple parameters can be incorporated into functions:

In [None]:
def id_code(number, location, year):
    formatted_id = str(number) + '-' + str(location) + '-' + str(year)
    print(formatted_id)

In [None]:
id_code(34643,"rodgers",1981)

Note that positional arguments are matched to parameters in the order specified in the function definition. If we want to input arguments in a different order, we need to name the input arguments:

In [None]:
id_code(location="rodgers", year=1981, number=34643)

We can use `return` in functions to output variables/values.

In [None]:
def pint_to_ounce(v):
    """ converts liquid pints to ounces """
    vo = v * 16
    return vo

In [None]:
pint_to_ounce(4)

In [None]:
# check the variable returned
type(pint_to_ounce(4))
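
The practical difference between `return` and `print` is that a returned value can be stored and reused. The sketch below repeats `pint_to_ounce` so that it runs on its own; `print_ounces`, `ounces`, `total_ounces`, and `printed_result` are hypothetical names introduced only for this illustration.

```python
def pint_to_ounce(v):
    """ converts liquid pints to ounces """
    vo = v * 16
    return vo

def print_ounces(v):
    """ hypothetical variant that only prints and does not return a value """
    print(v * 16)

# a returned value can be stored in a variable and reused
ounces = pint_to_ounce(4)
total_ounces = ounces + pint_to_ounce(0.5)
print(total_ounces)

# a function without a return statement gives back None
printed_result = print_ounces(4)
print(printed_result)
```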

When writing functions, it is helpful to figure out the workflow first without a function. For example, let's say we have an [InChI chemical identifier](https://en.wikipedia.org/wiki/International_Chemical_Identifier) and need to write a function to extract out the molecular formula. Here is an example InChI:

\

**InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)**

\

The molecular formula occurs right after the first `/`, so we can use a string split method. 

\

*Note: we are using a string approach to extract out the molecular formula from an InChI. InChI is not designed for this and there are certainly more robust methods for getting the molecular formula for chemical substances, so be cautious if you use a method like this...*





In [None]:
# first create a string with the InChI
myInChI = "InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)"

# Next split the string at each `/` and put into a list
L = myInChI.split('/')
L

In [None]:
# Now all we need to do is index out the molecular formula at index position 1
L[1]

In [None]:
# Next, write a function:
def molecular_formula(InChI):
    """ returns molecular formula from a standard InChI input"""
    L = InChI.split('/')
    return L[1]

In [None]:
# try our new function
molecular_formula("InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)")

In [None]:
molecular_formula("InChI=1S/C7H6O/c8-6-7-4-2-1-3-5-7/h1-6H")
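
Given the caution above about string approaches, one possible safeguard is to check the input before indexing. This is a minimal sketch, not a robust InChI parser: `molecular_formula_checked` is a hypothetical name, and it assumes that a usable input starts with `InChI=` and contains at least one `/`. It uses an `if` statement, which is covered in the next section.

```python
def molecular_formula_checked(inchi):
    """ returns the text after the first '/' of an InChI string,
    or None if the input does not look like an InChI (assumption:
    valid input starts with 'InChI=' and contains at least one '/')
    """
    if not inchi.startswith("InChI="):
        return None
    layers = inchi.split('/')
    if len(layers) < 2:
        return None
    return layers[1]

formula = molecular_formula_checked("InChI=1S/C7H6O/c8-6-7-4-2-1-3-5-7/h1-6H")
not_inchi = molecular_formula_checked("benzaldehyde")
print(formula, not_inchi)
```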

[3] Parts of this section adapted from http://swcarpentry.github.io/python-novice-gapminder/16-writing-functions/index.html

# Conditional Statements [4,5]

A simplified general Python syntax for conditional statements is as follows:

```python
if expression1:
  do something1
elif expression2:
  do something2
else:
  do something3    
```


## if

Use an `if` statement to make a choice and determine the direction of code execution. Start the line of code with `if` followed by the condition, then end with a colon, `:`. Conditions are often written with [comparison operators](https://docs.python.org/3/library/stdtypes.html#comparisons) (e.g., `>`) or [sequence operations](https://docs.python.org/3/library/stdtypes.html#sequence-types-list-tuple-range) (e.g., `x in s`).

In [None]:
# if statement with condition met
state = 'Alabama'

if len(state) > 5:
  print(state, 'has more than 5 characters')

In [None]:
# if statement with condition not met
state = 'Iowa'

if len(state) > 5:
  print(state, 'has more than 5 characters')

## else

In the above example, the condition is not met, so nothing happens. We can add an `else` clause to run alternative code when the condition is False.

In [None]:
# add an else
state = 'Iowa'

if len(state) > 5:
  print(state, 'has more than 5 characters')
else:
  print(state, 'has 5 or fewer characters')

## elif 

Additional conditional tests can be added before `else` with the `elif` statement (else if). 

In [None]:
# for example, what if we want to test len(state) == 5
state = 'Texas'

if len(state) > 5:
  print(state, 'has more than 5 characters')
elif len(state) == 5:
  print(state, 'has 5 characters')
else:
  print(state, 'has less than 5 characters')

## Testing Multiple True conditions

Next, what if we wanted to add another conditional, such as checking whether the state variable has 5 characters and contains the letter `a`? You might first try something like this with another `elif` statement:

In [None]:
state = 'Texas'

if len(state) > 5:
  print(state, 'has more than 5 characters')
elif len(state) == 5:
  print(state, 'has 5 characters')
elif 'a' in state:
  print(state, 'contains the character a')
else:
  print(state, 'has less than 5 characters')

In the above case, we may have expected to see the following printed:

```
Texas has 5 characters
Texas contains the character a
```

However, the `if-elif-else` sequence stops once the first condition is True. An alternative would be to rewrite the code with a [boolean operator](https://docs.python.org/3/library/stdtypes.html#boolean-operations-and-or-not) or use only `if` statements to test all conditions. 

In [None]:
# with boolean operator
state = 'Texas'

if len(state) > 5:
  print(state, 'has more than 5 characters')
elif len(state) == 5 and 'a' in state:
  print(state, 'has 5 characters and contains the character a')
else:
  print(state, 'has less than 5 characters')

In [None]:
# with all if statements
state = 'Texas'

if len(state) > 5:
  print(state, 'has more than 5 characters')
if 'a' in state:
  print(state, 'contains the character a')
if len(state) == 5:
  print(state, 'has 5 characters')
if len(state) < 5:
  print(state, 'has less than 5 characters')

## Incorporating Within Functions

Conditional statements can be incorporated into custom functions. Consider a use case where we need to create a function to test if a string contains a `+` or `-` character, for example, within a [SMILES](https://en.wikipedia.org/wiki/Simplified_molecular-input_line-entry_system) chemical representation string.

\

Example SMILES for imidazolium: **C1=C[NH+]=CN1**

*Note: we are using a string approach to test if a SMILES string contains a `+` or `-` character, which would suggest an atom within the molecule contains a charge. There are more robust ways of doing this with cheminformatics software, so be cautious as there are most certainly edge-cases where string approaches fail.*



In [None]:
def has_Atomcharge(smi):
    """ checks if a molecular SMILES string contains a charged atom using
    character matching.
    Input a SMILES string, returns True or False.
    """
    if "+" in smi:
        return True
    elif "-" in smi:
        return True
    else:
        return False

In [None]:
smi = "C1=C[NH+]=CN1" # imidazolium molecule SMILES
has_Atomcharge(smi)

In [None]:
smi = "CCN1C=C[N+](=C1)C.[Cl-]" # imidazolium chloride molecule SMILES
has_Atomcharge(smi)

In [None]:
smi = "C=CC1=CC=CC=C1" # styrene molecule SMILES
has_Atomcharge(smi)
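
Because both branches of `has_Atomcharge` return `True`, the same check can be written with the boolean operator `or`. This is a sketch of an equivalent form under the same string-matching assumption; `has_atom_charge_short` is a hypothetical name used only here, and the `results` list holds the same three test SMILES as above.

```python
def has_atom_charge_short(smi):
    """ checks if a SMILES string contains a '+' or '-' character.
    Input a SMILES string, returns True or False.
    """
    # the `or` expression already evaluates to True or False
    return "+" in smi or "-" in smi

results = [
    has_atom_charge_short("C1=C[NH+]=CN1"),            # imidazolium
    has_atom_charge_short("CCN1C=C[N+](=C1)C.[Cl-]"),  # imidazolium chloride
    has_atom_charge_short("C=CC1=CC=CC=C1"),           # styrene
]
print(results)
```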

Adapted from 

[4] http://swcarpentry.github.io/python-novice-gapminder/13-conditionals/index.html

[5] https://github.com/vfscalfani/UALIB_Workshops/blob/master/01_MATLAB/06_MATLAB_Conditional_Statements.md

# Loop to Repeat Tasks [6,7,8]

## `for` loops

If we wanted to print a series of statements, we could do this one at a time, but it is very slow and inefficient:

In [None]:
print("Rodgers Library has Nursing collections.")
print("Rodgers Library has Engineering collections.")
print("Rodgers Library has Math collections.")
print("Rodgers Library has Science collections.")

`for` loops allow repeated execution of code on a known collection of values such as a range of numbers or a list. A general syntax example is as follows:

```python

for item in items:
  do something
```


`while` loops are another type of loop. They are useful when you need to iterate until a specific condition is met and/or don't know the number of iterations in advance. For an introduction, see ref [7]. Today we will cover `for` loops. Here is an example with our previous use case:

In [None]:
# put our variables in a list
subjects = ["Nursing", "Engineering", "Math", "Science"]

for subject in subjects:
  print('Rodgers Library has', subject, 'collections.')
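
For comparison with the `while` loops mentioned above, here is a minimal sketch of a loop where the number of iterations is not known in advance. The variable names and the threshold of 1000 are arbitrary choices made for this illustration.

```python
value = 1
doublings = 0

# keep doubling until the value exceeds 1000
while value <= 1000:
    value = value * 2
    doublings = doublings + 1

print(doublings, value)
```

The loop stops as soon as the condition `value <= 1000` is False, so the condition must eventually change inside the loop body.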

Here is another example with a range of values:

In [None]:
for n in range(1,21):
  x = (n * 100) / 2
  print(x)

In loops, it is often useful to store the output in a list variable, rather than printing. Here is one way to do that:

In [None]:
L = [] # create an empty list
for n in range(1,21):
  x = (n * 100) / 2
  L.append(x) # append each x value to List, L.

In [None]:
L

In [None]:
type(L)

In [None]:
# We can take the same approach with the string example:

# put our variables in a list
subjects = ["Nursing", "Engineering", "Math", "Science"]
sentences = []

for subject in subjects:
  s = str('Rodgers Library has ' + subject + ' collections.')
  sentences.append(s)

In [None]:
sentences

In [None]:
type(sentences)

Let's revisit our custom function:

In [None]:
def has_Atomcharge(smi):
    """ checks if a molecular SMILES string contains a charged atom using
    character matching.
    Input a SMILES string, returns True or False.
    """
    if "+" in smi:
        return True
    elif "-" in smi:
        return True
    else:
        return False

and incorporate it into a for loop:

In [None]:
# create a list of SMILES to check
mysmiles = ["C1=C[NH+]=CN1", "CCCCOC", "CCN1C=C[N+](=C1)C.[Cl-]", 
            "C1C(O1)(C2=CC=C(C=C2)Cl)C3=CC=C(C=C3)Cl", "C=CC1=CC=CC=C1"]

charge_check = []
for smi in mysmiles:
    x = has_Atomcharge(smi)
    charge_check.append(x)

In [None]:
charge_check
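
A loop can also be combined with an `if` statement to keep only the items that meet a condition. The sketch below repeats the function and the SMILES list so that it runs on its own; the list name `charged_smiles` is a hypothetical name introduced here.

```python
def has_Atomcharge(smi):
    """ checks if a molecular SMILES string contains a charged atom using
    character matching.
    Input a SMILES string, returns True or False.
    """
    if "+" in smi:
        return True
    elif "-" in smi:
        return True
    else:
        return False

mysmiles = ["C1=C[NH+]=CN1", "CCCCOC", "CCN1C=C[N+](=C1)C.[Cl-]",
            "C1C(O1)(C2=CC=C(C=C2)Cl)C3=CC=C(C=C3)Cl", "C=CC1=CC=CC=C1"]

# collect only the SMILES strings where the check is True
charged_smiles = []
for smi in mysmiles:
    if has_Atomcharge(smi):
        charged_smiles.append(smi)

print(charged_smiles)
```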

**Working with data from a file**

We will review how to work with data files in our next workshop using the [pandas library](https://pandas.pydata.org/). In the meantime, here is a basic approach for loading a text file into a Python list. This can be very convenient if you have hundreds or more items to put into a list and then process, for example, with a custom function.

1. Create a simple text file for testing. We can use the SMILES strings from above:

```
C1=C[NH+]=CN1
CCCCOC
CCN1C=C[N+](=C1)C.[Cl-]
C1C(O1)(C2=CC=C(C=C2)Cl)C3=CC=C(C=C3)Cl
C=CC1=CC=CC=C1
```

2. Copy/paste these into any text editor and save the file as mysmiFile.txt.

3. Upload mysmiFile.txt to Colab as follows:

In [None]:
pwd

In [None]:
ls

In [None]:
mkdir workshop2_test

In [None]:
ls

Next, we can use the file navigation window to upload data directly to our new folder. Next to the workshop2_test folder, click `three dots > upload`.

In [None]:
cd workshop2_test

In [None]:
ls

We can use the Python csv module to load the data into a list [9,10]:

In [None]:
import csv
with open('mysmiFile.txt', newline='') as inp:
  reader = csv.reader(inp)
  mysmilesF = list(reader)

In [None]:
print(mysmilesF)

In [None]:
type(mysmilesF)

Note that our `mysmilesF` list is actually a list of lists:

In [None]:
mysmilesF[0]

In [None]:
type(mysmilesF[0])

This may be preferred for certain applications, but in our case, it is probably easier to "flatten" the list. Here is a convenient way to flatten a list, from Stack Overflow [11]:

In [None]:
flat_mysmilesF = []
for sublist in mysmilesF:
    for smi in sublist:
        flat_mysmilesF.append(smi)
flat_mysmilesF     
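
When a file has exactly one item per line, as assumed here, another option is to read the lines directly, which gives a flat list without the flattening step. This is a sketch and not the csv approach used above. To keep it self-contained, it first writes a small demo file to a temporary folder; `demo_smiles`, `demo_path`, and `smiles_from_file` are hypothetical names. In Colab you would open `mysmiFile.txt` instead.

```python
import os
import tempfile

# create a small demo file so this example runs on its own
demo_smiles = ["C1=C[NH+]=CN1", "CCCCOC", "C=CC1=CC=CC=C1"]
demo_path = os.path.join(tempfile.mkdtemp(), "demo_smiles.txt")
with open(demo_path, "w") as out:
    for smi in demo_smiles:
        out.write(smi + "\n")

# read the file line by line into a flat list
smiles_from_file = []
with open(demo_path) as inp:
    for line in inp:
        smi = line.strip()  # remove the trailing newline character
        if smi:             # skip empty lines
            smiles_from_file.append(smi)

print(smiles_from_file)
```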

In [None]:
# now we can run our check again on the imported data
charge_check2 = []
for smi in flat_mysmilesF:
  charge_check2.append(has_Atomcharge(smi))

In [None]:
charge_check2

It might also be useful to keep track of our lines, using the Python `enumerate` iterator [12]:

In [None]:
list(enumerate(flat_mysmilesF))

In [None]:
for i,smi in enumerate(flat_mysmilesF):
    AtomCharge_log = has_Atomcharge(smi)
    print(i, smi, AtomCharge_log)
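
By default `enumerate` starts counting at 0. If line numbers starting at 1 are preferred, the built-in `start` argument can be used. A minimal sketch, with a short hypothetical list `smiles_list` standing in for the imported data:

```python
smiles_list = ["C1=C[NH+]=CN1", "CCCCOC", "C=CC1=CC=CC=C1"]

numbered = []
for line_number, smi in enumerate(smiles_list, start=1):
    # line_number begins at 1 instead of 0
    numbered.append(str(line_number) + " " + smi)

print(numbered)
```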

In [None]:
# we can also use a string concatenation approach with append to
# save our output into a list
formatted_check = []
for i,smi in enumerate(flat_mysmilesF):
    AtomCharge_log = has_Atomcharge(smi)
    s = str(i) + " " + str(smi) + " " + str(AtomCharge_log)
    formatted_check.append(s)

In [None]:
formatted_check

Finally, we may want to write our results to a file [13,14]:

In [None]:
# write output to file
with open('mysmiFile_checked_2.txt', 'w') as out:
  for check in formatted_check:
      out.write(str(check) + "\n")

# the file is saved in the current Colab directory; you can then download it.

In [None]:
cat mysmiFile_checked_2.txt
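
If the results will later be opened in a spreadsheet or another program, one alternative is to write comma-separated columns with `csv.writer` from the same csv module [10]. This is a sketch under assumptions: the `results` data, the header names, and the temporary file `demo_checked.csv` are hypothetical and only serve to make the example self-contained.

```python
import csv
import os
import tempfile

# hypothetical results: (SMILES, charge check) pairs
results = [("C1=C[NH+]=CN1", True), ("CCCCOC", False)]
out_path = os.path.join(tempfile.mkdtemp(), "demo_checked.csv")

# write a header row, then one row per result
with open(out_path, "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["index", "smiles", "has_charge"])
    for i, (smi, charged) in enumerate(results):
        writer.writerow([i, smi, charged])

# read the file back to confirm what was written
with open(out_path, newline="") as inp:
    rows = list(csv.reader(inp))

print(rows)
```

Note that values read back with `csv.reader` are strings, so the index and the True/False values come back as text.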

References

[6] http://swcarpentry.github.io/python-novice-gapminder/12-for-loops/index.html

[7] https://nbviewer.jupyter.org/github/jakevdp/WhirlwindTourOfPython/blob/master/07-Control-Flow-Statements.ipynb

[8] https://github.com/vfscalfani/UALIB_Workshops/blob/master/01_MATLAB/05_MATLAB_Loops.md

[9] https://stackoverflow.com/questions/24662571/python-import-csv-to-list

[10] https://docs.python.org/3/library/csv.html

[11] https://stackoverflow.com/questions/952914/how-to-make-a-flat-list-out-of-a-list-of-lists

[12] https://nbviewer.jupyter.org/github/jakevdp/WhirlwindTourOfPython/blob/master/10-Iterators.ipynb

[13] https://stackoverflow.com/questions/899103/writing-a-list-to-a-file-with-python

[14] https://docs.python.org/3/tutorial/inputoutput.html?highlight=write

# Python Learning Resources

We recommend the following resources as a start for further reading. Some content in this workshop (as referenced and attributed above) has been adapted and derived from them:


[1] https://github.com/jakevdp/WhirlwindTourOfPython

[CC0-1.0 License](https://github.com/jakevdp/WhirlwindTourOfPython/blob/master/LICENSE)

\

[2] http://swcarpentry.github.io/python-novice-gapminder/

[CC-BY-4.0 License](http://swcarpentry.github.io/python-novice-gapminder/LICENSE.html)

\

[3] For searching specific use-cases: https://stackoverflow.com/questions/tagged/python

\


In addition, UA Libraries provides access to many Python eBooks. Use [Scout](https://www.lib.ua.edu/scout/) to discover them. Start with a search for `python` and limit the results to eBooks within the computer science discipline.

# Notebook Copy

An archived version of this notebook is available on our UALIB_Workshops GitHub repository: https://github.com/ualibweb/UALIB_Workshops