# Clean Coding with PEP8
© Explore Data Science Academy

## Learning Objectives
In this train you will learn how to:

- Write clean code following PEP-8 guidelines
- Correctly implement indentation and follow the recommended format used for comments
- Write docstrings in an appropriate format and utilise PEP-8 naming conventions for variable names


## Outline
This train is structured as follows:
- An introduction to PEP-8 guidelines for writing clean code
- Guidelines used for imports, indentation and code layout
- Format used for inserting comments and docstrings in code
- Recommendations for naming conventions and the use of whitespace in expressions and statements

## Introduction

Now that you are writing code, it is worth knowing the ideas and guidelines for how to style it. Writing code is not just about typing words and hoping they will run: the format and style of those words matter. Just as an essay with bad punctuation is hard to read, so is badly formatted code. To make our code highly readable, we will look to the Python Enhancement Proposals (PEPs) for assistance, specifically PEP-8.

When someone wants to propose a change to the Python language, they write a *Python Enhancement Proposal (PEP)*. One of the oldest PEPs is PEP-8, which is simply a set of guidelines for formatting Python code so that it is readable and neat. This matters in projects where people from different departments work on the same piece of code: clean code improves communication and efficiency in a team.

Below is an image that can be used as a *PEP-8 Cheat Sheet*. In this train we will expand on each of the items it covers.

```python
from IPython.display import Image

Image(url="https://github.com/Explore-AI/Pictures/blob/master/PEP_8_Guide.jpg?raw=true")
```

## Imports

Importing libraries and modules is common practice when working with Python for data science, and the first order of business when writing a script is to put those imports at the start of it.

PEP-8 states that imports are always placed at the top of the file, just after any module comments and docstrings, and before module globals and constants [1]. When importing more than one library, each library should be imported on its own line. Imports should also be grouped in the following order, with a blank line between groups: standard library imports, related third-party imports, then local application imports.

* **Good Imports**

The following are examples of some good practices to follow when importing libraries.

```python
# An example of a good import (1)
from sklearn.linear_model import LinearRegression

# Another example of good imports (2)
import numpy as np
import pandas as pd
```

* **Bad Imports**

The following are bad import practices that should be avoided when writing your code. Example (1) imports several libraries on a single line. Wildcard imports, as in example (2), are not suitable because they make it unclear which names are present in your namespace: a name pulled in by `*` can silently shadow a built-in or a name with the same spelling imported from another module [1]. In addition, they make it difficult to identify which module has led to a bug in your code.

```python
# An example of bad imports (1)
import numpy as np, pandas as pd, os, sys

# Another example of bad imports (2)
from sklearn.linear_model import *
```
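To see why wildcard imports are risky, here is a minimal sketch that uses only the standard library. The `os` module defines its own `open()` function (a low-level call that works with file descriptors), so `from os import *` silently replaces the built-in `open()` in the current namespace. The sketch assumes `open` is among the names `os` exports, which is the case on the usual platforms.

```python
import builtins

# Deliberately bad: the wildcard copies every public name from os into
# this namespace without saying which names they are.
from os import *

# os exports its own open(), which takes a path and flags and returns an
# integer file descriptor. It has now replaced the built-in open().
print(open is builtins.open)  # False
```

With an explicit `import os`, the two functions stay distinct as `open()` and `os.open()`.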

## Indentation

Python, unlike Java and C++, does not use braces to mark blocks of code; it uses indentation instead. PEP-8 recommends 4 spaces per indentation level and states that spaces are the preferred indentation method, with tabs used only to stay consistent with code that is already indented with tabs. Python 3 does not allow mixing tabs and spaces for indentation. Because it is easy to type 3 or 5 spaces by mistake, it is worth configuring your editor to insert 4 spaces whenever you press the Tab key; many editors, including Jupyter, do this by default. Lines of code should also be kept short: PEP-8 limits them to 79 characters, and longer lines should be wrapped onto the next line.

All of this will become clear in the next few examples.

* **Good Indents**

The following example indents each block consistently, using 4 spaces per indentation level.

```python
# An example of indentation that is easily distinguishable
def explore(student):
    if student != 'Wisani':  # 4 spaces used
        return student  # 8 spaces used
    else:
        return '{} was at EDSA in 2019'.format(student)
```

* **Bad Indents**

In the following example, the indentation levels are inconsistent (3, then 6, then 4 spaces). Python cannot match the `else:` to any earlier indentation level, so it raises an `IndentationError`. It is better to remain consistent and use 4 spaces throughout.

```python
# An example of inconsistent indentation (this raises an error)
def explore(student):
   if student != 'Wisani':  # 3 spaces used
      return student  # 6 spaces used
    else:  # 4 spaces used
        return '{} was at EDSA in 2019'.format(student)
```

IndentationError: unindent does not match any outer indentation level (<tokenize>, line 5)
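Mixing tabs and spaces fails in a similar way. The sketch below builds a hypothetical `mixed_source` string in which one line is indented with 4 spaces and the next with a tab character, then passes it to the built-in `compile()` instead of running it directly. Python 3 rejects it with a `TabError`, which is a subclass of `IndentationError`.

```python
# Line 2 is indented with 4 spaces, line 3 with a tab character ("\t").
mixed_source = "def explore(student):\n    if student:\n\treturn student\n"

try:
    compile(mixed_source, "<mixed>", "exec")
    caught = None
except IndentationError as error:  # TabError is a subclass of this
    caught = error
    print(type(error).__name__, "-", error.msg)
```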

## Code layout 

Your code layout plays a major role in how readable your code is. Below are key guidelines, covering the use of blank lines and the maximum line length, that will help you write code that is readable, neat and follows PEP-8.

### Blank lines 

The use of blank lines can greatly improve the readability of your code. Code that is bundled up together can be difficult and tiresome to read. Similarly, the inclusion of too many blank lines in your code makes it look very scattered, causing unnecessary scrolling. Below are key guidelines on how to effectively use blank lines in your code.

Top-level functions and classes should be surrounded by two blank lines. Because top-level functions and classes are fairly self-contained, the extra blank lines make each one easy to distinguish when you scan through your code:

```python
class ClassOne:
    pass


class ClassTwo:
    pass


def top_function():
    return None
```

Method definitions inside a class, on the other hand, should be surrounded by a single blank line. The methods within a class are related to one another, so a single blank line is enough to separate them:

```python
class ClassOne:
    def method_one(self):
        return None

    def method_two(self):
        return None
```

### Maximum line length 


PEP-8 suggests that lines should be limited to 79 characters, while docstrings and comments should be limited to 72 characters. The limit is commonly traced back to older terminals, which could only display 80 characters per line. Today, keeping lines to 79 characters still lets you have several files open side by side without line wrapping. The guideline is not fixed: PEP-8 allows a team that agrees on it to raise the limit to 99 characters, provided comments and docstrings are still wrapped at 72 characters.

The preferred way of wrapping long lines is Python's implied line continuation inside parentheses, brackets and braces. Long lines can be broken over multiple lines by wrapping expressions in parentheses, and this should be preferred over using a backslash for line continuation.
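A short sketch of implied line continuation, using a made-up `monthly_sales` list purely for illustration. PEP-8 suggests breaking before binary operators in new code, so each continued line starts with `+`:

```python
monthly_sales = [1200, 1350, 980, 1410, 1575, 1620]

# Preferred: the open parenthesis tells Python the expression continues,
# so no backslash is needed. Each new line starts with the operator.
first_half_total = (monthly_sales[0]
                    + monthly_sales[1]
                    + monthly_sales[2]
                    + monthly_sales[3]
                    + monthly_sales[4]
                    + monthly_sales[5])

# The same applies inside the parentheses of a function call.
message = 'Total sales for the first half of the year: {}'.format(
    first_half_total)

# Works, but discouraged: a backslash continuation breaks if any
# character (even a space) follows the backslash.
backslash_total = monthly_sales[0] + monthly_sales[1] + \
    monthly_sales[2]

print(message)  # Total sales for the first half of the year: 8135
```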

## Comments 

Comments allow the reader of the code to follow along and see what the author was intending to convey. This can be helpful when people change projects and new people have to work on projects they know nothing about.

Generally, there are two types of comments: block comments and inline comments. Unlike C++ and Java, which have a separate syntax for multi-line comments (`/* ... */`), Python uses the hash character (`#`) for both. A block comment sits on its own line(s), is indented to the same level as the code it describes, and each line starts with `#` followed by a single space. An inline comment sits on the same line as a statement and should be separated from it by at least two spaces. Use comments to explain the assumptions behind a line of code and any important details about the problem the code is trying to solve.

A rule of thumb to keep in mind is: *Code tells you how, comments should tell you why*

```python
# This is an example of a block comment. It describes the code that
# follows and is indented to the same level as that code.
values = ['walk barefoot', 'Be ridiculously optimistic',
          'EXPLORE with gumption', 'Choose growth']
sprint = ['build an awesome Predict',
          'accountable to the team']  # This is an inline comment
```

## Docstrings

PEP-8 recommends writing docstrings for all public modules, functions, classes and methods. Documentation strings, or docstrings as they are commonly called, are string literals enclosed in triple quotes (`"""` or `'''`, with PEP-257 recommending `"""`) that appear as the first statement of a module, function, class or method. Docstrings are used to explain and document a specific block of code. The conventions for writing them are described in PEP-257 rather than PEP-8, but they are an important part of writing clean code.

For more detailed examples of docstrings you can visit the PEP-257 website. [2]

```python
# One-line docstring: the opening and closing quotes are on the same line
def spray():
    """Print 'Hello World' three times."""
    print('Hello World\n' * 3)


# Multi-line docstring: a summary line, a blank line, then the details.
# Named add (not sum) so it does not shadow the built-in sum().
def add(a, b):
    """Calculate the sum of the two given integers.

    Parameters
    ----------
    a : int
        The first value.
    b : int
        The second value.

    Returns
    -------
    int
        The result of a + b.
    """
    c = a + b
    return c
```
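Docstrings are not just for people reading the source file: Python stores them on the object's `__doc__` attribute, which is where `help()` gets its text. A minimal sketch, using an illustrative `add` function:

```python
import inspect


def add(a, b):
    """Calculate the sum of two integers.

    Parameters
    ----------
    a : int
        The first value.
    b : int
        The second value.
    """
    return a + b


# The docstring is stored on the function's __doc__ attribute.
summary_line = add.__doc__.splitlines()[0]
print(summary_line)  # Calculate the sum of two integers.

# inspect.getdoc() returns the docstring with its indentation cleaned up,
# close to the text that help(add) displays.
print(inspect.getdoc(add))
```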

## Naming conventions

Consistent naming styles keep code clean and readable. There are many naming styles, but in this course we will only look at the following ones. A more detailed guide exists in the PEP-8 documentation (link provided in the Appendix).

- `lowercase`: variables and functions
- `UPPERCASE`: constants
- `camelCase`: rarely used in Python
- `CapitalisedWords`: classes
- an underscore (`_`) in place of spaces between words, e.g. `gross_wages` or `SPEED_OF_LIGHT`

```python
# lowercase (variable names)
value = 2
name = 'ridha'


# lowercase_underscores (good for function names)
def gross_wages():
    wages = 17000
    return wages


# UPPERCASE (good for constants)
PI = 3.14
GRAVITY = 9.8

# UPPERCASE_UNDERSCORES (also good for constants)
SPEED_OF_LIGHT = 300000  # km/s, approximate
CURRENCY_CODE = 'ZAR'

# camelCase (common in Java, rarely used in Python)
intSpeed = 9
strName = 'Siyanda'


# CapitalisedWords (good for naming classes)
class ClassName:
    my_dog = 'Rufus!'  # attributes use lowercase, like variables


# Capitalised_Words (Please Try To Avoid This, It's Ugly)
Identity_Document = 'YES'
First_Name = 'Wesley'
```

## Whitespace 

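Whitespace aids readability, but misusing it can make code look untidy and difficult to follow. Below are the basics to help you avoid ruining a good piece of code by giving it too much space: avoid extra spaces immediately inside parentheses, brackets and braces, before a comma, between a trailing comma and a closing parenthesis, and before the opening parenthesis of a function call or an index.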

```python
# An example of good use of whitespace
email(df_train[4:25], {student: 64})
jam = (6,)
butter(2)

# An example of bad use of whitespace
email( df_train [ 4 : 25 ] , { student : 64 } )
jam = (6, )
butter (2)
```
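PEP-8 also has rules for whitespace around `=` and other operators. In the sketch below, `scale` and `factor` are hypothetical names used only for illustration:

```python
# No spaces around '=' for a default parameter value.
def scale(value, factor=2):
    return value * factor


# No spaces around '=' for a keyword argument either...
result = scale(10, factor=3)

# ...but one space on either side of an assignment or a comparison.
is_large = result > 20

# With mixed operators, PEP-8 allows dropping the spaces around the
# higher-priority operator to show the order of evaluation.
adjusted = result*2 - 1
```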

## Exercise

Let us now try to play with a few lines of code and test what we have learned so far. It is advised that you go through each section of the trains and complete it as PEP8 will play an important role in your task as a Data Scientist at EDSA and/or elsewhere. If you struggle to do the exercise, you can seek the assistance of a supervisor or ask a friend to help _unblock_ you.

```python
# Exercise 1
def FingTHEminus_OfThe_variables( arg1,b) :
    """The docstring for this function is too long and does not follow PEP8 conventions. Long docstrings should be over multiple lines and not exceed 72 characters.
    """
    variable_Name= arg1 #THE arg1 value is assigned to...
                         #...the value variable_Name   
    
    ''' Return answer'''
    return variable_Name+b
```

```python
# Fixed version (1)
def find_sum(first_arg, second_arg):
    """Return the sum of the two given values.

    Long docstrings should span multiple lines, with no line longer
    than 72 characters.
    """
    total = first_arg + second_arg
    return total
```
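Checking names by hand gets tedious. Tools such as `pycodestyle` (the package formerly called `pep8`) and `flake8` automate PEP-8 checks, with naming checks usually provided by the separate `pep8-naming` plugin for `flake8`. As a rough illustration of how such a check can work, the sketch below uses only the standard library's `ast` and `re` modules to flag function names that are not lowercase_with_underscores. `non_pep8_function_names` is a hypothetical helper, not part of any PEP-8 tool, and it only inspects plain `def` functions.

```python
import ast
import re

# Lowercase letters, digits and underscores, not starting with a digit.
SNAKE_CASE = re.compile(r'[a-z_][a-z0-9_]*')


def non_pep8_function_names(source):
    """Return the function names in source that are not snake_case."""
    tree = ast.parse(source)
    return [node.name for node in ast.walk(tree)
            if isinstance(node, ast.FunctionDef)
            and not SNAKE_CASE.fullmatch(node.name)]


exercise_source = '''
def FingTHEminus_OfThe_variables( arg1,b) :
    return arg1+b
'''
print(non_pep8_function_names(exercise_source))
# ['FingTHEminus_OfThe_variables']
```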


You can access the docstring of any Python package, module or function directly in a Jupyter notebook by calling `help(name)` once it has been imported. You can also type `name?` in Jupyter to open a pop-out window containing its docstring. Uncomment the lines below to see this in action.

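```python
# pycodestyle (formerly called pep8) is a third-party package,
# so install it first with `pip install pycodestyle`
# import pycodestyle
# help(pycodestyle)
```

```python
# unittest is part of the standard library, so it only needs importing
# import unittest
# help(unittest)
```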

## Conclusion
By now we should have a good idea of the basics of writing clean code. We have looked at how to indent, write comments, docstrings, variable names and white space usage. All of this should serve as a starting point for you to write the best code possible with high standards. At EXPLORE Data Science Academy you should aim to go and explore more and learn even more on your own.

PEP-8 contains many more interesting and insightful rules than we have covered here, so have a look at the links provided in the Appendix.

## Appendix 

- [PEP8](https://pep8.org/) 

- [PEP8 in Python](https://www.python.org/dev/peps/pep-0008/)

- [Python code layout](https://realpython.com/python-pep8/#code-layout)

- [Unittest](https://docs.python.org/3/library/unittest.html)