# 11. Modules
*using existing code*

## 11.1 Introduction

Now that we know how to write functions, how can we use them in another program if they are not in the same file? In Python you can import a function into the file containing your code, even if the function is defined in another file. A file containing Python code, such as function definitions, is called a module. A module can hold multiple functions.

Using code in another file is possible by using **import**. In this way you can import your own functions, but also draw on a very extensive library of functions provided by Python (built-in modules). We will first look at the syntax for imports, then explore the most commonly used Python libraries.

## 11.2 How imports work
But how exactly do imports work? Let’s say you import a module abc like so:
```python
import abc
```
The first thing Python will do is look up the name `abc` in [`sys.modules`](https://docs.python.org/3/library/sys.html#sys.modules). This is a cache of all modules that have been previously imported.

If the name isn’t found in the module cache, Python will proceed to search through a list of built-in modules. These are modules that come pre-installed with Python and can be found in the [Python Standard Library](https://docs.python.org/3/library/). If the name still isn’t found in the built-in modules, Python then searches for it in a list of directories defined by [`sys.path`](https://docs.python.org/3/library/sys.html#sys.path). This list usually includes the current directory, which is searched first.

When Python finds the module, it binds it to a name in the local scope. This means that `abc` is now defined and can be used in the current file without throwing a `NameError`.

If the name is never found, you’ll get a `ModuleNotFoundError`. 
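
The lookup described above can be observed directly. The sketch below uses the standard-library module `json` purely as an example, and `a_module_that_does_not_exist` is a made-up name chosen so that the import fails.

```python
import sys

import json  # any standard-library module would do here

# After a successful import the module object is cached under its name
is_cached = "json" in sys.modules
same_object = sys.modules["json"] is json
print(is_cached, same_object)

# A name that is found nowhere raises ModuleNotFoundError
try:
    import a_module_that_does_not_exist  # made-up name, expected to fail
except ModuleNotFoundError as error:
    error_message = str(error)
    print("Import failed:", error_message)
```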


## 11.3 Import syntax 
The simplest example is importing a function from a module in the same directory. Let's create a Python module called 'functions1.py' containing the function 'getMeanValue()' that we wrote earlier (its code is repeated below).

**Create a module in Jupyter Lab/Notebook**
- In order to create a module in Jupyter Lab, first click on the + symbol on the top left and create a new notebook 
- Rename the notebook (e.g. 'functions1.ipynb') and copy paste the code in the notebook 
- Click 'File', 'Export Notebook as' and 'Export Notebook to Executable Script' 
- Jupyter will now download the script to a local folder; move it to your working directory (in our case the same directory as this notebook).

Unfortunately, Jupyter Lab/Notebook doesn't have a streamlined & straightforward way of creating Python modules and Python scripts. When you export the notebook, it will always export the whole Notebook and not just a part of it, which makes it very messy if you have a very large notebook. 


In [None]:
def getMeanValue(valueList):
    """
    Calculate the mean (average) value from a list of values.
    Input: list of integers/floats
    Output: mean value
    """
    valueTotal = 0.0
 
    for value in valueList:
        valueTotal += value
    numberValues = len(valueList)
    
    return (valueTotal/numberValues)
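
If exporting a whole notebook is inconvenient, one possible alternative is to write the module file from Python itself. This is only a sketch under assumptions: the file name `functions1_demo.py` and the temporary folder are made up for illustration, so that nothing in your working directory is overwritten.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source code of the module, stored as an ordinary string
module_source = '''
def getMeanValue(valueList):
    """Calculate the mean (average) value from a list of values."""
    valueTotal = 0.0
    for value in valueList:
        valueTotal += value
    return valueTotal / len(valueList)
'''

# Assumption: a temporary folder is used so no existing file is overwritten
module_dir = Path(tempfile.mkdtemp())
(module_dir / "functions1_demo.py").write_text(module_source)

# Make the folder importable, then import the freshly written module
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()
functions1_demo = importlib.import_module("functions1_demo")

mean_value = functions1_demo.getMeanValue([4, 6, 77, 3, 67, 54, 6, 5])
print(mean_value)
```

Inside a notebook, the IPython cell magic `%%writefile functions1.py` on the first line of a cell is another way to save that cell's contents to a file.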

We can now use the module we just created by importing it. Because we import the whole 'functions1' module, we call the function through the module name with a dot, similar to the methods for lists and strings that we saw earlier:

In [None]:
import functions1

print(functions1.getMeanValue([4,6,77,3,67,54,6,5]))

In a large project, typing long module names over and over gets tedious, so programmers often give a shorter alias to modules they use a lot. Renaming a module on import is therefore common:

In [None]:
import functions1 as f1

print(f1.getMeanValue([4,6,77,3,67,54,6,5]))

When importing a module, Python searches the directories listed in `sys.path`. This typically starts with the directory of the entry-point script (or the current directory in an interactive session) and also includes locations such as the package installation directory. It is actually a little more complex than this, but this covers most cases.
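
As a small illustration of the search path, the sketch below prints the first few entries of `sys.path` and appends one more. The folder name `modules` in the current working directory is an assumption made for the example; appending a folder that does not exist is harmless.

```python
import os
import sys

# The first few locations Python searches, in order
for directory in sys.path[:3]:
    print(repr(directory))

# Assumption: a folder named 'modules' sits in the current working directory
extra_dir = os.path.join(os.getcwd(), "modules")
if extra_dir not in sys.path:
    sys.path.append(extra_dir)

print(extra_dir in sys.path)
```

Once a folder is on `sys.path`, a module inside it can be imported by its bare name.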

You can also import a module from a subdirectory by naming the folder in the import statement. A folder 'modules' has already been created, containing a module called 'functions2.py'. That module holds two functions: 'getMeanValue' and 'compareMeanValueOfLists'.

In [None]:
from modules import functions2

print(functions2.getMeanValue([4,6,77,3,67,54,6,5]))

In [None]:
from modules import functions2 as f2

print(f2.getMeanValue([4,6,77,3,67,54,6,5]))

Another way of writing this is with the full dotted path to the module, which lets you import one specific function directly.

In [None]:
from modules.functions2 import compareMeanValueOfLists

print(compareMeanValueOfLists([1,2,3,4,5,6,7], [4,6,77,3,67,54,6,5]))

So here we *import* the function compareMeanValueOfLists (without brackets!) from the file *functions2* (without .py extension!).
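
The import forms used so far can be compared side by side. So that the snippet runs without the 'modules' folder, it uses the standard-library module `os.path` as a stand-in; the alias names `osp` and `joinPath` are arbitrary choices for this sketch.

```python
import os.path                          # whole module, full name
import os.path as osp                   # whole module, short alias
from os import path                     # module taken from its parent module
from os.path import join                # one function only
from os.path import join as joinPath    # one function, renamed

example = ("modules", "functions2.py")
results = [
    os.path.join(*example),
    osp.join(*example),
    path.join(*example),
    join(*example),
    joinPath(*example),
]
print(results)

# Every form refers to the same function, so all results are identical
all_equal = len(set(results)) == 1
print(all_equal)
```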



## 11.3 Built-in Modules

There are several built-in modules in Python, which you can import whenever you like.

These modules provide many ready-to-use functions that can save you a lot of time when writing code. Some of the most commonly used modules are **time**, **sys**, **os/os.path** and **re**.

### 11.3.1 `time`
With **time** you can get information on the current time and date, and pause your program:

In [None]:
import time

print(time.ctime())  # Print current day and time
print(time.time())   # Print the time in seconds since the epoch
time.sleep(10)       # Sleep for 10 seconds - the program will wait here

See the [Python documentation](https://docs.python.org/3/library/time.html) for a full description of time. Also see [datetime](https://docs.python.org/3/library/datetime.html), which is a module to deal with date/time manipulations.
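
A common use of `time.time()` is measuring how long a piece of code takes. The following is a minimal sketch in which `time.sleep(0.5)` stands in for real work; the timestamp layout passed to `time.strftime()` is just one possible choice.

```python
import time

start = time.time()            # seconds since the epoch, as a float
time.sleep(0.5)                # stand-in for real work
elapsed = time.time() - start
print("Elapsed: {:.2f} seconds".format(elapsed))

# Format the current local time with an explicit layout
timestamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
print(timestamp)
```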


### 11.3.2 `sys`
**sys** gives you system-specific parameters and functions:

In [None]:
import sys
 
print(sys.argv)     # A list of parameters that are given when calling this script 
                    # from the command line (e.g. 'python myScript a b c')
print(sys.platform) # The platform the code is currently running on
print(sys.path)     # The directories where Python will look for things to import

help(sys.exit)      # Show the documentation for sys.exit(), which exits the program immediately

See the [Python documentation](https://docs.python.org/3/library/sys.html) for a full description.
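
To show how `sys.argv` and `sys.exit()` are typically combined, here is a hedged sketch. The function `main()` and the script name `myScript.py` are hypothetical, and the command lines are simulated with plain lists so the code also runs inside a notebook.

```python
import sys

def main(arguments):
    """Return 0 on success and 1 when the expected argument is missing."""
    if len(arguments) < 2:
        print("Usage: python myScript.py <name>")
        return 1
    print("Hello, {}!".format(arguments[1]))
    return 0

# Simulated command lines; in a real script you would pass sys.argv
ok_code = main(["myScript.py", "world"])
missing_code = main(["myScript.py"])

# sys.exit() stops the program by raising SystemExit, which can be observed
try:
    sys.exit(missing_code)
except SystemExit as stop:
    caught_code = stop.code

print(ok_code, missing_code, caught_code)
```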

### 11.3.3 `os` and `os.path` 
**os** and **os.path** are very useful when dealing with files and directories:


In [None]:
import os
 
# Get the current working directory (cwd)
currentDir = os.getcwd()
print(currentDir)

# Get a list of the files in the current working directory    
myFiles = os.listdir(currentDir)
print(myFiles)

# Create a directory, rename it, and remove it
os.mkdir("myTempDir")
os.rename("myTempDir","myNewTempDir")
os.removedirs("myNewTempDir")

# Create a full path name to the functions2 module in the modules folder
myFileFullPath = os.path.join(currentDir,'modules','functions2.py')
print(myFileFullPath)

# Does this file exist?
print(os.path.exists(myFileFullPath))

# How big is the file?
print(os.path.getsize(myFileFullPath))

# Split the directory path from the file name
(myDir,myFileName) = os.path.split(myFileFullPath)
print(myDir, myFileName)

See the Python documentation for [**os**](https://docs.python.org/3/library/os.html) and [**os.path**](https://docs.python.org/3/library/os.path.html) for a full description.
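
A few more `os` and `os.path` calls are sketched below. The folder and file names (`results`, `run1`, `summary.txt`) are invented for the example, and everything happens inside a temporary directory so your own files are not touched.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as baseDir:
    # Create nested folders in one call; exist_ok avoids an error on reruns
    resultDir = os.path.join(baseDir, "results", "run1")
    os.makedirs(resultDir, exist_ok=True)

    # Write a small file so there is something to inspect
    filePath = os.path.join(resultDir, "summary.txt")
    with open(filePath, "w") as fileHandle:
        fileHandle.write("done\n")

    isDir = os.path.isdir(resultDir)
    isFile = os.path.isfile(filePath)
    baseName = os.path.basename(filePath)
    (root, extension) = os.path.splitext(baseName)
    print(isDir, isFile, baseName, root, extension)

# The temporary directory and its contents are removed on leaving the block
stillThere = os.path.exists(baseDir)
print(stillThere)
```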

### 11.3.4 `re`

A library that is very powerful for dealing with strings is **re**. It allows you to use regular expressions to examine text - using these is a course in itself, so just consider this simple example:

In [None]:
import re

myText = """Call me Ishmael. Some years ago - never mind how long precisely -
having little or no money in my purse, and nothing particular to interest me on 
shore, I thought I would sail about a little and see the watery part of the 
world."""

# Compile a regular expression
myPattern = re.compile(r"(w\w+d)")   # Look for the first stretch of text that starts with a w,
                                     # is followed by 1 or more word characters (\w+)
                                     # and ends in a d

mySearch = myPattern.search(myText)

# mySearch will be None if nothing was found
if mySearch:
    print(mySearch.groups())

See the full [Python documentation](https://docs.python.org/3/library/re.html) on regular expressions for more information.
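
The pattern above can match in the middle of a word. As a small variation, the sketch below adds word boundaries (`\b`) and uses `findall()` and `sub()` on a shortened version of the text; the variable names are chosen for this example only.

```python
import re

myText = "I thought I would sail about a little and see the watery part of the world."

# \b marks a word boundary, so the match must cover a whole word
wholeWordPattern = re.compile(r"\bw\w+d\b")

# findall() returns every match instead of only the first one
matches = wholeWordPattern.findall(myText)
print(matches)  # ['would', 'world']

# sub() replaces every match with the given text
replaced = wholeWordPattern.sub("___", myText)
print(replaced)
```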

## 11.4 Putting everything together


---
### 11.4.1 Exercises

Make a new directory in which you write out 5 files with a 2 second delay between them. Each file should contain the date and time at which it was written.

---


In [None]:
# 1
import time, os
 

# Create a variable for the directory name
myDir = "timeTest"

# Check whether the directory exists, if not create it
if not os.path.exists(myDir):
    os.mkdir(myDir)


# Loop from 1 to 5
for i in range(1,6):

    # Get the current time
    currentTime = time.ctime()

    # Write out the file - use i to give a different name to each
    filePath = os.path.join(myDir,"myFile{}.txt".format(i))

    outFileHandle = open(filePath,'w')    
    outFileHandle.write("{}\n".format(currentTime))
    outFileHandle.close()

    print("Written file {}...".format(filePath))

    # Sleep for 2 seconds
    time.sleep(2)

---
### 11.4.2 Exercises

Write a function to read in a FASTA file with an RNA sequence and return the RNA sequence (in 3 base unit chunks).

---

In [None]:
# 2 
import os
 
def readRnaFastaFile(fileName):
 
    if not os.path.exists(fileName):
        print("Error: File {} not available!".format(fileName))
        return (None,None,None)

    fconnect = open(fileName)
    lines = fconnect.readlines()
    fconnect.close()

    sequenceInfo = []
    moleculeName = None
    description = None

    # Get information from the first line - ignore the >
    firstLine = lines[0]
    firstLineCols = firstLine[1:].split()
    moleculeName = firstLineCols[0]
    description = firstLine[1:].replace(moleculeName,'').strip()

    # Now get the full sequence out
    fullSequence = ""
    for line in lines[1:]:

        line = line.strip()
        fullSequence += line

    # Divide the sequence into chunks of 3 bases (codons)
    for seqIndex in range(0,len(fullSequence),3):
        sequenceInfo.append(fullSequence[seqIndex:seqIndex+3])

    return (moleculeName,description,sequenceInfo)


print(readRnaFastaFile("data/rnaSeq.txt"))
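
The chunking step in the solution can also be looked at in isolation. The helper `splitIntoCodons()` below is a hypothetical name used only for this sketch; it shows what happens when the sequence length is not a multiple of three.

```python
def splitIntoCodons(sequence):
    """Split a sequence string into consecutive chunks of three characters."""
    return [sequence[i:i + 3] for i in range(0, len(sequence), 3)]

complete = splitIntoCodons("AUGGCCUAA")
incomplete = splitIntoCodons("AUGGCCUA")

print(complete)    # ['AUG', 'GCC', 'UAA']
print(incomplete)  # ['AUG', 'GCC', 'UA']
```

The last chunk of an incomplete sequence is shorter than three bases, which matters if the chunks are later used as dictionary keys.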

---
### 11.4.3 Exercises

Write a program where you ask the user for a one-letter amino acid sequence, and print out the three-letter amino acid codes. Download the dictionary from section 8.2 and save it as a module named SequenceDicts.py first.

---

In [None]:
# 3
# Note how you can import a function (or variable) with a different name for your program!

from modules.SequenceDicts import proteinOneToThree as oneToThreeLetterCodes

oneLetterSeq = input('Give one letter sequence:')
 
if oneLetterSeq:
    for oneLetterCode in oneLetterSeq:
        if oneLetterCode in oneToThreeLetterCodes:
            print(oneToThreeLetterCodes[oneLetterCode])
        else:
            print("One letter code '{}' is not a valid amino acid code!".format(oneLetterCode))
else:
    print("You didn't give me any information!")

---
### 11.4.4 Exercises

Write a program where you translate the RNA sequence `data/rnaSeq.txt` into 3 letter amino acid codes. Use the dictionary from section 8.2 (called myDictionary) and save it as a module named SequenceDicts.py first. You can use the `readFasta.py` module from the modules folder. 

---

In [None]:
from modules.SequenceDicts import standardRnaToProtein, proteinOneToThree

from modules.readFasta import readRnaFastaFile

(molName,description,sequenceInfo) = readRnaFastaFile("data/rnaSeq.txt")
proteinThreeLetterSeq = []

for rnaCodon in sequenceInfo:

    aaOneLetterCode = standardRnaToProtein[rnaCodon]
    aaThreeLetterCode = proteinOneToThree[aaOneLetterCode]
    proteinThreeLetterSeq.append(aaThreeLetterCode)

print(proteinThreeLetterSeq)

---
### 11.4.5 Exercises

Write a program that:
- Has a function `readSampleInformationFile()` to read the information from this sample data file into a dictionary. Also check whether the file exists.
- Has a function `getSampleIdsForValueRange()` that can extract sample IDs from this dictionary. Print the sample IDs for pH 6.0-7.0, temperature 280-290 and volume 200-220 using this function.

---


In [None]:
import os
 
def readSampleInformationFile(fileName):
 
    # Read in the sample information file in .csv (comma-delimited) format

    # Doublecheck if file exists
    if not os.path.exists(fileName):
        print("File {} does not exist!".format(fileName))
        return None
 
    # Open the file and read the information
    fileHandle = open(fileName)
    lines = fileHandle.readlines()
    fileHandle.close()

    # Now read the information. The first line has the header information which
    # we are going to use to create the dictionary!

    fileInfoDict = {}

    headerCols = lines[0].strip().split(',')

    # Now read in the information, use the first column as the key for the dictionary
    # Note that you could organise this differently by creating a dictionary with
    # the header names as keys, then a list of the values for each of the columns.

    for line in lines[1:]:
 
        line = line.strip()  # Remove newline characters
        cols = line.split(',')

        sampleId = int(cols[0])

        fileInfoDict[sampleId] = {}

        # Skip the first column, it is already the key!
        for i in range(1,len(headerCols)):
            valueName = headerCols[i]
 
            value = cols[i]
            if valueName in ('pH','temperature','volume'):
                value = float(value)

            fileInfoDict[sampleId][valueName] = value

    # Return the dictionary with the file information
    return fileInfoDict

def getSampleIdsForValueRange(fileInfoDict,valueName,lowValue,highValue):
 
    # Return the sample IDs that fit within the given value range for a kind of value
 
    sampleIdList = sorted(fileInfoDict.keys())
    sampleIdsFound = []

    for sampleId in sampleIdList:

        currentValue = fileInfoDict[sampleId][valueName]
 
        if lowValue <= currentValue <= highValue:
            sampleIdsFound.append(sampleId)
 
    return sampleIdsFound
 
if __name__ == '__main__':
 
    fileInfoDict = readSampleInformationFile("data/SampleInfo.txt")

    print(getSampleIdsForValueRange(fileInfoDict,'pH',6.0,7.0))
    print(getSampleIdsForValueRange(fileInfoDict,'temperature',280,290))
    print(getSampleIdsForValueRange(fileInfoDict,'volume',200,220))
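
The solution above wraps its final lines in `if __name__ == '__main__':`. When a file is imported as a module, `__name__` holds the module name, so the guarded lines are skipped; when the file is run directly, `__name__` is `'__main__'` and they execute. The sketch below illustrates the import case with a made-up module `name_demo.py` written to a temporary folder.

```python
import importlib
import sys
import tempfile
from pathlib import Path

module_source = '''
def describe():
    return "imported as " + __name__

if __name__ == "__main__":
    # Runs only when the file is executed directly, not on import
    print("running as a script")
'''

# Assumption: a temporary folder holds the hypothetical module name_demo.py
module_dir = Path(tempfile.mkdtemp())
(module_dir / "name_demo.py").write_text(module_source)

sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()
name_demo = importlib.import_module("name_demo")

# On import, __name__ inside the module is the module name
description = name_demo.describe()
print(description)  # imported as name_demo
```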


## 11.5 The end

Or not? Go to our [next chapter](extra-course-parts/12_Asserts_Try_Except.ipynb). 