# Software Engineering Practices Part II

Adding robustness with:

- Testing 
- Logging
- Code reviews

## Testing 

- Lack of testing is a common problem, particularly in the work of data scientists.
- The skills needed to prepare code for an industry setting include testing that code.
- **Test driven development:** A development process where you write tests for tasks before writing the code to implement those tasks. 
- **Unit Test:** A type of test that covers a "unit" of code, usually a single function, independently from the rest of the program.

Resources: 

- [Blog Post](https://www.predictiveanalyticsworld.com/patimes/four-ways-data-science-goes-wrong-and-how-test-driven-data-analysis-can-help/6947/)
- Getting Started Testing: [Slide Deck](https://speakerdeck.com/pycon2014/getting-started-testing-by-ned-batchelder) and [Presentation Video](https://www.youtube.com/watch?v=FxSsnHeWQBY)

#### Unit Tests

- Test functions in a way that is repeatable and automated. 
- Ideally, we'd run a test program that runs all our unit tests and cleanly lets us know which ones failed and which ones succeeded. 
- Advantage: Unit tests are isolated from the rest of the program
- Note that passing unit tests isn't always enough to prove that our program is working successfully. 
- [Integration Testing](https://www.fullstackpython.com/integration-testing.html)

#### Unit Testing Tools 

- [pytest: Installation and getting started](https://docs.pytest.org/en/latest/getting-started.html)

> **Task:** Run `pytest` on the command line to collect and run the tests in all files named `test_*.py` or `*_test.py` in the current directory and its subdirectories.

Structure of the tests: 

- Write a function with a specific task in one file (here `compute_launch.py`):
```python
def days_until_launch(current_day, launch_day):
    return launch_day - current_day
```
- Add a second file whose name starts with `test_`, and define a function in it that uses an `assert` statement to check the function's result:

```python
# import function to be tested
from compute_launch import days_until_launch

def test_days_until_launch_4():
    assert days_until_launch(22, 26) == 4
```

- Run `pytest` in the console.
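
A test file usually holds several small tests, one per scenario. The following is a minimal sketch: `days_until_launch` is repeated inline so the block runs on its own (in the layout above it would be imported from `compute_launch`), and the extra scenarios and test names are illustrative assumptions, not part of the original exercise.

```python
# Repeated here so the sketch is self-contained; in the layout above you would write:
# from compute_launch import days_until_launch
def days_until_launch(current_day, launch_day):
    return launch_day - current_day


def test_days_until_launch_4():
    assert days_until_launch(22, 26) == 4


def test_days_until_launch_same_day():
    # Launching today means zero days remain.
    assert days_until_launch(26, 26) == 0


def test_days_until_launch_in_past():
    # The current implementation returns a negative number once the launch
    # day has passed; this test documents that behavior.
    assert days_until_launch(27, 26) == -1
```

Saved in a file whose name starts with `test_`, each function is collected and reported separately by `pytest`, so a failure points at one specific scenario.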


### Test Driven Development

```python
### Workflow example ###

# valid email
def email_validator(email):
    if email.count("@") != 0 and email.count(".") != 0:
        return True
    else:
        return False

# start testing
print(email_validator("tmeiendresch@gmx.de"))
print(email_validator("tmeiendresch@gmx"))
print(email_validator("tmei@endresch@gmx..de")) # <- wait! This should not be valid.

# ...Correct function to account for this error...

# valid email v2
def email_validator(email):
    if email.count("@") == 1 and email.count(".") == 1:
        return True
    else:
        return False

# start testing v2
print(email_validator("tmeiendresch@gmx.de"))
print(email_validator("tmeiendresch@gmx"))
print(email_validator("tmei@endresch@gmx..de")) # <- now rejected

# ... and so forth.
```

Output:

```
True
False
True
True
False
False
```


**This can be automated ...**

```python
def test_email_validator():
    assert email_validator("tmeiendresch@gmx.de")
    assert not email_validator("tmeiendresch@gmx")
    assert not email_validator("tmei@@gmx.de")
    print("Done!")

# after changes in the base function we can test it using
test_email_validator()
```

Output:

```
Done!
```


- **Test Driven Development:** Writing tests before you write the code that's being tested. The tests fail at first, and you know you've finished implementing a task when they pass.
- Tests ought to check for all the different scenarios and edge cases you can think of, before you even start writing your function.
- Run the tests after each change to get immediate feedback.
- When refactoring or adding to your code, tests help you rest assured that the rest of your code didn't break while you were making those changes. Tests also help ensure that your function's behavior is repeatable, regardless of external parameters such as hardware and time.
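
The test-first order described above can be sketched as follows. The list of cases is written before the function exists, and the implementation is then filled in until every case passes. The cases and the structural checks are simplified assumptions for illustration; they are not a complete email validation rule.

```python
# Step 1: write the expected behavior first. Until email_validator exists
# and handles every case, running the test fails.
CASES = [
    ("tmeiendresch@gmx.de", True),
    ("first.last@gmx.de", True),   # dots in the local part are allowed
    ("tmeiendresch@gmx", False),   # no dot in the domain
    ("tmei@@gmx.de", False),       # more than one "@"
    ("@gmx.de", False),            # empty local part
    ("tmei@gmx..de", False),       # empty label inside the domain
]


# Step 2: implement just enough to make the cases pass.
def email_validator(email):
    """Return True if email passes a few simplified structural checks."""
    if email.count("@") != 1:
        return False
    local, domain = email.split("@")
    labels = domain.split(".")
    return bool(local) and len(labels) >= 2 and all(labels)


def test_email_validator():
    for email, expected in CASES:
        # The email is used as the failure message to show which case broke.
        assert email_validator(email) == expected, email
```

When a new edge case turns up later, it is added to `CASES` first, the test is run to confirm that it fails, and only then is the function changed.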

Resources: 

- [Data Science TDD](https://www.linkedin.com/pulse/data-science-test-driven-development-sam-savage/)
- [TDD for Data Science](http://engineering.pivotal.io/post/test-driven-development-for-data-science/)
- [TDD is Essential for Good Data Science Here's Why](https://medium.com/@karijdempsey/test-driven-development-is-essential-for-good-data-science-heres-why-db7975a03a44)
- [Testing your code](http://docs.python-guide.org/en/latest/writing/tests/)

## Logging 

- Log messages help you understand what the model is doing, especially when the run time is long and various tasks are performed sequentially.

> Logging is the process of recording messages to describe events that have occurred while running your software. 

- Use log levels such as DEBUG, INFO, WARNING, and ERROR to indicate the severity of each message.
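
As a minimal sketch of how the levels might be used, the example below relies on Python's standard `logging` module. The logger name `pipeline`, the function `clean_prices`, and the message wording are hypothetical and chosen only for illustration.

```python
import logging

# Show INFO and above; switch to logging.DEBUG to see the debug message too.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("pipeline")  # hypothetical logger name


def clean_prices(raw_prices):
    """Convert raw price strings to floats, logging each notable event."""
    logger.info("Cleaning %d raw prices", len(raw_prices))
    cleaned = []
    for position, raw in enumerate(raw_prices):
        try:
            cleaned.append(float(raw))
        except ValueError:
            # WARNING: something unexpected happened, but the run continues.
            logger.warning("Skipping unparseable price %r at position %d", raw, position)
    # DEBUG: detail that is only useful while investigating a problem.
    logger.debug("Cleaned values: %s", cleaned)
    if not cleaned:
        # ERROR: the step could not produce a usable result.
        logger.error("No valid prices found; downstream steps have no data")
    return cleaned
```

Calling `clean_prices(["1.5", "abc", "2"])` returns `[1.5, 2.0]` and emits one INFO and one WARNING message.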

## Code Reviews

Benefits: 

- Catch errors
- Ensure readability
- Check standards are met
- Share knowledge among teams 

Resources: 

- [Code Review](https://github.com/lyst/MakingLyst/tree/master/code-reviews)
- [Code Review Best Practices](https://www.kevinlondon.com/2015/05/05/code-review-best-practices.html)

#### Questions to Ask Yourself When Conducting a Code Review

Is the code clean and modular?

- Can I understand the code easily?
- Does it use meaningful names and whitespace?
- Is there duplicated code?
- Can you provide another layer of abstraction?
- Is each function and module necessary?
- Is each function or module too long?

Is the code efficient?

- Are there loops or other steps we can vectorize?
- Can we use better data structures to optimize any steps?
- Can we reduce the number of calculations needed for any steps?
- Can we use generators or multiprocessing to optimize any steps?
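
To illustrate the data-structure question, here is a small sketch of the kind of change a reviewer might suggest. The function names and the book example are hypothetical. Both versions find the items that appear in two collections, but the second uses sets instead of repeated list scans.

```python
def common_books_slow(recent_books, coding_books):
    """Each `in` check scans the whole coding_books list."""
    return [book for book in recent_books if book in coding_books]


def common_books_fast(recent_books, coding_books):
    """Set intersection uses hash lookups instead of list scans."""
    return set(recent_books) & set(coding_books)


recent = ["a", "b", "c", "d"]
coding = ["c", "d", "e"]
print(sorted(common_books_slow(recent, coding)))
print(sorted(common_books_fast(recent, coding)))
```

The set version drops duplicates and ordering, so this suggestion only fits when the caller does not depend on either.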

Is documentation effective?

- Are in-line comments concise and meaningful?
- Is there complex code that's missing documentation?
- Do functions use effective docstrings?
- Is the necessary project documentation provided?
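
As one possible reading of "effective docstring", the sketch below documents the `days_until_launch` function from the testing section. The Args/Returns layout is one common convention, not a requirement, and the assumption that both arguments are integer day numbers is mine.

```python
def days_until_launch(current_day, launch_day):
    """Return the number of days remaining until launch.

    Args:
        current_day (int): Day number of the current date.
        launch_day (int): Day number of the launch date.

    Returns:
        int: Days left until launch; negative if the launch day has passed.
    """
    return launch_day - current_day
```

The docstring states what the function returns and how it behaves in the edge case, so a reviewer does not have to read the body to find out.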

Is the code well tested?

- Does the code have high test coverage?
- Do tests check for interesting cases?
- Are the tests readable?
- Can the tests be made more efficient?

Is the logging effective?

- Are log messages clear, concise, and professional?
- Do they include all relevant and useful information?
- Do they use the appropriate logging level?

#### Tips for conducting a code review

- Use a code linter for checking coding standards and PEP 8
- Explain issues and make suggestions
- Keep comments objective 
- Provide code examples

