# Software Engineering Best Practices

## What should production code look like?
![prod_code](images/prod_code.png)

* Clean
* Modular
* Efficient
* Well documented
* Versioned

## Clean and Modular Code
* **Production code**: Software running on production servers to handle live users and data of the intended audience. Note that this is different from production-quality code, which describes code that meets expectations for production in reliability, efficiency, and other aspects. Ideally, all code in production meets these expectations, but this is not always the case.
* **Clean code**: Code that is readable, simple, and concise. Clean production-quality code is crucial for collaboration and maintainability in software development.
* **Modular code**: Code that is logically broken up into functions and modules. Modular production-quality code is more organized, efficient, and reusable.
* **Module**: A file. Modules allow code to be reused by encapsulating it in files that can be imported into other files.


## Refactoring code

* Restructuring your code to improve its internal structure without changing its external functionality. This gives you a chance to clean and modularize your program after you've got it working.
* Since it isn't easy to write your best code while you're still trying to just get it working, allocating time to do this is essential to producing high-quality code. Despite the initial time and effort required, this really pays off by speeding up your development time in the long run.
* You become a much stronger programmer when you're constantly looking to improve your code. The more you refactor, the easier it will be to structure and write good code the first time.

> Improve the quality of your code once it works: this reduces the workload in the long run and improves its maintainability and reusability. The refactoring process is also one of the best ways to improve your development skills.
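
As a minimal sketch of what refactoring means in practice, the two functions below return the same results; only the internal structure differs. The function names and the shipping-fee scenario are hypothetical and chosen only for illustration.

```python
# Before refactoring: works, but the names are cryptic and the logic is nested.
def calc(w, e):
    if e:
        if w > 10:
            return 25.0
        else:
            return 15.0
    else:
        if w > 10:
            return 12.0
        else:
            return 5.0


# After refactoring: same inputs and outputs, clearer structure.
HEAVY_PARCEL_KG = 10
SHIPPING_FEES = {
    # (is_express, is_heavy): fee
    (True, True): 25.0,
    (True, False): 15.0,
    (False, True): 12.0,
    (False, False): 5.0,
}


def shipping_fee(weight_kg, is_express):
    is_heavy = weight_kg > HEAVY_PARCEL_KG
    return SHIPPING_FEES[(bool(is_express), is_heavy)]


# External behavior is unchanged, which is what makes this a refactoring.
for weight in (2, 10, 11):
    for express in (True, False):
        assert calc(weight, express) == shipping_fee(weight, express)
```

The loop at the end acts as a small safety net: if the refactored version ever disagrees with the original, the assertion fails immediately.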

## Writing Clean Code

### Use meaningful names
* **Be descriptive and imply type**: For booleans, you can prefix with `is_` or `has_` to make it clear it is a condition. You can also use parts of speech to imply types, like using verbs for functions and nouns for variables.
* **Be consistent but clearly differentiate**: `age_list` and `age` are easier to differentiate than `ages` and `age`.

Example of clean code:

```python
# some code
age_list = [47, 12, 28, 52, 35]
for i, age in enumerate(age_list):
    if age < 18:
        is_minor = True
        age_list[i] = "minor"
# some other code
```

* **Avoid abbreviations and single letters**: Exceptions include counters and common math variables. You can decide when to make these exceptions based on the audience for your code: if you work with other data scientists, certain variable names may be common knowledge, whereas if you work with full-stack engineers, you may need more descriptive names in these cases as well.

Example of dirty code

```python
s = [88, 92, 79, 93, 85] # student test scores
print(sum(s)/len(s)) # print mean of test scores
s1 = [x ** 0.5 * 10 for x in s] # curve scores with square root method and store in new list
print(sum(s1)/len(s1))  # print mean of curved test scores
```

Example of clean code

```python
import math
import numpy as np
test_scores = [88, 92, 79, 93, 85] 
print(np.mean(test_scores))
curved_test_scores = [math.sqrt(score) * 10 for score in test_scores]
print(np.mean(curved_test_scores))
```
* **Long names aren't the same as descriptive names**: You should be descriptive, but only with relevant information. For example, good function names describe what they do well without including details about implementation or highly specific uses.

```python
# bad
def count_unique_values_of_names_list_with_set(names_list):
    return len(set(names_list))

# better
def count_unique_values(arr):
    return len(set(arr))
```
> Try testing how effective your names are by asking a fellow programmer to guess the purpose of a function or variable based on its name, without looking at your code. Coming up with meaningful names often requires effort to get right.

### Use whitespace properly

* Organize your code with consistent indentation: the standard is to use four spaces for each indent. You can make this a default in your text editor.
* Separate sections with blank lines to keep your code well organized and readable.
* Try to limit your lines to around 79 characters, which is the guideline given in the PEP 8 style guide. In many good text editors, there is a setting to display a subtle line that indicates where the 79 character limit is.
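
The whitespace guidelines above can be sketched as follows. The function and data are made up for illustration; the point is the layout: four-space indents, blank lines between logical sections, and lines kept under 79 characters.

```python
def summarize_scores(test_scores, passing_score=60):
    """Return the mean score and the number of passing scores."""
    # Section 1: aggregate
    mean_score = sum(test_scores) / len(test_scores)

    # Section 2: count passes; the long expression is wrapped inside
    # parentheses instead of running past the line-length guideline
    passing_count = sum(
        1 for score in test_scores
        if score >= passing_score
    )

    return mean_score, passing_count


mean_score, passing_count = summarize_scores([88, 92, 79, 93, 55])
print(mean_score, passing_count)
```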

## Writing modular code

* **DRY (Don't Repeat Yourself)**: Modularization allows you to reuse parts of your code. Generalize and consolidate repeated code in functions or loops.

* **Abstract out logic to improve readability**: Abstracting out code into a function not only makes it less repetitive, but also improves readability with descriptive function names. Although your code can become more readable when you abstract out logic into functions, it is possible to over-engineer this and have way too many modules, so use your judgement.

* **Minimize the number of entities (functions, classes, modules, etc.)**: There are trade-offs to having function calls instead of inline logic. If you have broken up your code into an unnecessary number of functions and modules, you'll have to jump around everywhere to view the implementation details of something that may be too small to be worth separating. Creating more modules doesn't necessarily result in effective modularization.

* **Functions should do one thing**: Each function you write should be focused on doing one thing. If a function is doing multiple things, it becomes more difficult to generalize and reuse. Generally, if there's an "and" in your function name, consider refactoring.

* **Arbitrary variable names can be more effective in certain functions**: In general-purpose functions, generic names such as `arr` can make the code more readable than names tied to one specific use.

* **Try to use no more than three arguments per function**: This is not a hard rule, and there are times when it is more appropriate to use many parameters, but in many cases it's more effective to use fewer. Remember that we are modularizing to simplify our code and make it more efficient. If your function has a lot of parameters, you may want to rethink how you are splitting things up.

Example of poorly modularized code

```python
# some code
s = [88, 92, 79, 93, 85] 
print(sum(s)/len(s))

s1 = []
for x in s:
  s1.append(x + 5)
print(sum(s1)/len(s1))

s2 = []
for x in s:
  s2.append(x + 10)
print(sum(s2)/len(s2))

s3 = []
for x in s:
  s3.append(x ** 0.5 * 10)
print(sum(s3)/len(s3))
# some other code
```

A little better

```python
# some code
import math
import numpy as np

test_scores = [88, 92, 79, 93, 85] 
print(np.mean(test_scores))

curved_5 = [score + 5 for score in test_scores]
print(np.mean(curved_5))

curved_10 = [score + 10 for score in test_scores]
print(np.mean(curved_10))

curved_sqrt = [math.sqrt(score) * 10 for score in test_scores]
print(np.mean(curved_sqrt))
# some other code
```

Better code example

```python
# some code
import math
import numpy as np

def flat_curve(arr, n):
  return [i + n for i in arr]

def square_root_curve(arr):
  return [math.sqrt(i) * 10 for i in arr]

test_scores = [88, 92, 79, 93, 85] 
curved_5 = flat_curve(test_scores, 5)
curved_10 = flat_curve(test_scores, 10)
curved_sqrt = square_root_curve(test_scores)

for score_list in curved_5, curved_10, curved_sqrt:
  print(np.mean(score_list))
# some other code
```
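
The guideline that functions should do one thing can be sketched with the same test-score data. The function names below are illustrative assumptions: a function with "and" in its name is split into two focused functions that can be reused independently.

```python
# Does two things: curving and reporting are tangled together.
def curve_and_print_mean(scores, bonus):
    curved = [score + bonus for score in scores]
    print(sum(curved) / len(curved))


# Each function does one thing and can be reused on its own.
def flat_curve(scores, bonus):
    return [score + bonus for score in scores]


def mean(scores):
    return sum(scores) / len(scores)


test_scores = [88, 92, 79, 93, 85]
curved_scores = flat_curve(test_scores, 5)
print(mean(test_scores))    # mean works on the raw scores...
print(mean(curved_scores))  # ...and on any curved list
```

With the split version, `mean` can be applied to any list of scores, whereas the combined function can only print the mean of a flat curve.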

## Efficient code

Knowing how to write code that runs efficiently is another essential skill in software development. Optimizing code to be more efficient can mean making it:

* Execute faster
* Take up less space in memory/storage

The project you're working on determines which of these is more important to optimize for your company or product. When you're performing lots of different transformations on large amounts of data, efficient code can make orders of magnitude of difference in performance.

* **Tip #1**: Use vector operations over loops when possible
* **Tip #2**: Know your data structures and which methods are faster

### Example:

Problem: find the books that appear in both `recent_books` and `coding_books`.<br>

*Inefficient code*
```python
import time

# recent_books and coding_books are assumed to be loaded already
start = time.time()
recent_coding_books = []

for book in recent_books:
    if book in coding_books:
        recent_coding_books.append(book)

print(len(recent_coding_books))
print('Duration: {} seconds'.format(time.time() - start))
```
Duration: 15.872233390808105 seconds<br>


*Better code*: **Tip 1**
```python
import numpy as np

start = time.time()
recent_coding_books = np.intersect1d(recent_books, coding_books)
print(len(recent_coding_books))
print('Duration: {} seconds'.format(time.time() - start))
```
Duration: 0.031774282455444336 seconds<br>


*Even better code*: **Tip 2**
```python
start = time.time()
recent_coding_books = set(recent_books).intersection(coding_books)
print(len(recent_coding_books))
print('Duration: {} seconds'.format(time.time() - start))
```
Duration: 0.0070879459381103516 seconds<br>
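
The timings above depend on book lists that are not shown here. As a self-contained sketch of Tip #2, the snippet below builds two synthetic lists of book IDs (the sizes and ID ranges are arbitrary assumptions) and compares the list-based loop with the set-based intersection. A membership test on a list scans the elements one by one, while a set uses hashing, so the gap widens as the lists grow.

```python
import time

# Synthetic stand-ins for the book lists (assumed data, for illustration only)
recent_books = [str(book_id) for book_id in range(0, 4000, 2)]
coding_books = [str(book_id) for book_id in range(0, 4000, 3)]

# List approach: each `in` check scans coding_books from the start
start = time.perf_counter()
loop_result = [book for book in recent_books if book in coding_books]
loop_duration = time.perf_counter() - start

# Set approach: hash-based lookups, roughly constant time per check
start = time.perf_counter()
set_result = set(recent_books).intersection(coding_books)
set_duration = time.perf_counter() - start

assert set(loop_result) == set_result  # both approaches find the same books
print(len(set_result))
print('Loop: {:.6f} s, set: {:.6f} s'.format(loop_duration, set_duration))
```

The exact durations vary by machine, so the sketch prints them rather than assuming specific values.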

Another example:
```python
######################
#### bad code ########
######################
start = time.time()

total_price = 0
for cost in gift_costs:
    if cost < 25:
        total_price += cost * 1.08  # add cost after tax

print(total_price)
print('Duration: {} seconds'.format(time.time() - start))
# 32765421.24
# Duration: 5.4591240882873535 seconds

######################
#### good code #######
######################
start = time.time()

total_price = np.sum(gift_costs[np.where(gift_costs < 25)]) * 1.08

print(total_price)
print('Duration: {} seconds'.format(time.time() - start))
# 32765421.24
# Duration: 0.08032560348510742 seconds
```
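
A small, self-contained variant of the comparison above, assuming `gift_costs` is a NumPy array (the five values here are made up). A boolean mask such as `gift_costs < 25` can index the array directly, which is a common alternative to `np.where` for this kind of filter.

```python
import math

import numpy as np

gift_costs = np.array([10, 30, 20, 50, 5])  # made-up sample data

# Loop version
loop_total = 0
for cost in gift_costs:
    if cost < 25:
        loop_total += cost * 1.08  # add cost after tax

# Vectorized version: filter with a boolean mask, then sum once
vectorized_total = gift_costs[gift_costs < 25].sum() * 1.08

assert math.isclose(loop_total, vectorized_total)
print(vectorized_total)
```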

## Documentation

* **Documentation**: Additional text or illustrated information that comes with or is embedded in the code of software.
* Documentation is helpful for clarifying complex parts of code, making your code easier to navigate, and quickly conveying how and why different components of your program are used.
* Several types of documentation can be added at different levels of your program:
    * **Inline comments** - line level
    * **Docstrings** - module and function level
    * **Project documentation** - project level

### Inline Comments
* Inline comments are text following hash symbols throughout your code. They are used to explain parts of your code, and really help future contributors understand your work.
* Comments often document the major steps of complex code. Readers may not have to understand the code to follow what it does if the comments explain it. However, others would argue that this is using comments to justify bad code, and that if code requires comments to follow, it is a sign refactoring is needed.
* Comments are valuable for explaining where code cannot. For example, the history behind why a certain method was implemented a specific way. Sometimes an unconventional or seemingly arbitrary approach may be applied because of some obscure external variable causing side effects. These things are difficult to explain with code.
```python
# this is an inline comment
print('Hello')  # also an inline comment
```
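
As a sketch of the difference between a comment that restates the code and one that explains what the code cannot, consider the hypothetical function below. The upstream export and its apostrophe quirk are invented for illustration.

```python
def normalize_phone_number(raw_number):
    # Unhelpful comment (restates the code): strip apostrophe from the left.
    # Helpful comment (explains why): the upstream export prefixes numbers
    # with an apostrophe so spreadsheets treat them as text; remove it
    # before keeping the digits. (Hypothetical scenario.)
    cleaned_number = raw_number.lstrip("'")
    return ''.join(char for char in cleaned_number if char.isdigit())


print(normalize_phone_number("'555-0100"))
```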

### Docstrings
> Docstrings, or documentation strings, are valuable pieces of documentation that explain the functionality of any function or module in your code. Ideally, each of your functions should have a docstring.

Docstrings are surrounded by triple quotes. The first line of the docstring is a brief explanation of the function's purpose.

#### One-line docstring
For a simple function, a one-line summary is enough. If the function is complicated enough to warrant a longer description, you can add a more thorough paragraph after the one-line summary.
```python
def population_density(population, land_area):
    """Calculate the population density of an area."""
    return population / land_area
```

#### Multi-line docstring
The next element of a docstring is an explanation of the function's arguments. Here, you list the arguments, state their purpose, and state what types the arguments should be. Finally, it is common to provide some description of the output of the function. Every piece of the docstring is optional; however, docstrings are a part of good coding practice.
```python
def population_density(population, land_area):
    """Calculate the population density of an area.

    Args:
        population: int. The population of the area.
        land_area: int or float. This function is unit-agnostic: if you
            pass in values in terms of square km or square miles, the
            function will return a density in those units.

    Returns:
        population_density: population/land_area. The population density
            of a particular area.
    """
    return population / land_area
```
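
Docstrings are more than comments: Python stores them on the object, so tools and the built-in `help()` can display them. A minimal sketch, using a shortened version of the function above:

```python
import inspect


def population_density(population, land_area):
    """Calculate the population density of an area.

    Args:
        population: int. The population of the area.
        land_area: int or float. The area, in any consistent unit.

    Returns:
        The population divided by the land area.
    """
    return population / land_area


# The raw docstring is available as the __doc__ attribute
print(population_density.__doc__.splitlines()[0])

# inspect.getdoc returns the docstring with indentation cleaned up
print(inspect.getdoc(population_density))

# help(population_density) shows the same text in an interactive session
```
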
#### Further research
* [PEP 257 - Docstring Conventions](https://www.python.org/dev/peps/pep-0257/) documents the semantics and conventions associated with Python docstrings, if you would like to learn more.
* [NumPy Docstring Guide](https://numpydoc.readthedocs.io/en/latest/format.html): This document describes the syntax and best practices for docstrings used with the numpydoc extension for Sphinx.

### Project documentation
Project documentation is essential for getting others to understand why and how your code is relevant to them, whether they are potential users of your project or developers who may contribute to your code. A great first step in project documentation is your README file. It will often be the first interaction most users have with your project. <br>

Whether it's an application or a package, your project should absolutely come with a README file. At a minimum, this should explain what it does, list its dependencies, and provide sufficiently detailed instructions on how to use it. Make it as simple as possible for others to understand the purpose of your project and quickly get something working.<br>

Translating all your ideas and thoughts formally on paper can be a little difficult, but you'll get better over time, and doing so makes a significant difference in helping others realize the value of your project. Writing this documentation can also help you improve the design of your code, as you're forced to think through your design decisions more thoroughly. It also helps future contributors to follow your original intentions.<br>

READMEs can vary in their length and the information that they include. Here are two examples from popular projects:
* [Bootstrap](https://github.com/twbs/bootstrap/blob/main/README.md): This project has been around for over a decade, and the README contains a lot of information that you may not find in other READMEs.
* [Scikit-learn](https://github.com/scikit-learn/scikit-learn): This project has a more streamlined approach to the README.
