# Principles of OOP Design

## Introduction
>Effective software design is based on the concept of __Intentional Design__. Code creation often involves the repeated use of the same algorithms. Thus, it is important to organise your code for the following reasons:

1. To save time when you or your team members have to revisit the code.
2. To reuse the same algorithm, but with different parameters.
3. To avoid the common pitfalls caused by long, monolithic blocks of code.
4. To increase code flexibility by incorporating placeholders in functions or methods.

In this notebook, you will learn the hierarchical structure of a project, which is the foundation of clean code. We will learn how to use Python's features to organise code and what level of granularity to consider for separation.

Before proceeding, run the following code to download a few necessary scripts for the lesson.

In [None]:
!wget "https://aicore-files.s3.amazonaws.com/Foundations/Software_Engineering/bar.py" "https://aicore-files.s3.amazonaws.com/Foundations/Software_Engineering/foo.py" 

## Concerns, Scope and Namespaces

Before we explore classes, let us define three concepts: __concerns__, __scope__ and __namespaces__.

### Concerns

In programming, a __concern__ is a distinct responsibility or behaviour that your code handles.

For example, if you are extracting cat images from a website, the concerns could be:
- connecting to the webpage.
- downloading the images.
- validating the URLs you use.

### Scope
The scope of a name defines the region of a program in which that name can be accessed unambiguously.

You may already be familiar with this concept, as well as the two general scopes: global and local.

1. __Global scope__: names defined at the top level of a module are available throughout that module, even within functions.
2. __Local scope__: names are only available within this scope. For example, variables within a function are not accessible outside the function.

In [None]:
outside_variable = 'I am global!'

def awesome_function():
    print('The outside variable says: ' + outside_variable)
    awesomeness = 9001

# Calling the function runs every statement inside it.
# awesome_function has no return statement, so it returns None;
# the only visible output comes from the print call INSIDE the function.
awesome_function()

# This line raises NameError: awesomeness is local to awesome_function
print(awesomeness)

#### Local, enclosing, global and built-in scopes (LEGB)
An important concept associated with scopes in Python is the LEGB rule:

- Local scope: contains the names that are defined within a function.
- Enclosing scope: only exists for nested functions. If the local scope is an inner or nested function, then the enclosing scope is the scope of the outer or enclosing function.
- Global scope: contains all the names that are defined at the top level of a program or a module.
- Built-in scope: contains the names built into Python, such as `print()` and `len()`. It is created whenever you run a script or open an interactive session.

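The LEGB rule determines the order in which Python searches for a name: first the local scope, then the enclosing scope, then the global scope and finally the built-in scope. The first match wins.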


In [None]:
outside_variable = 'I am global!'

def awesome_function():
    print('I am in awesome function, the global variable says: ' + outside_variable)
    enclosing_variable = 'I am an enclosed variable!'
    def incredible_function():
        print('I am in incredible function, the global variable says: ' + outside_variable)
        print('I am in incredible function, the enclosed variable says: ' + enclosing_variable)
        local_variable = 'I am incredible, but since I am local I can\'t be used outside here :('
        return local_variable
    incredible_function()
    # Raises NameError: local_variable only exists inside incredible_function
    print('I am in awesome function, the local variable says: ' + local_variable)

awesome_function()

You may have noticed that we used `print` in every scope in the examples above. This works because `print` lives in the built-in scope, so it can be accessed from anywhere.
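
To see the LEGB order in action when the same name exists at several levels, here is a small sketch. The names `name`, `outer`, `top_level_function` and `shadowed_builtin` are illustrative. Each function returns whichever binding Python finds first, and the last function shows that a local name can even hide a built-in such as `len`.

```python
name = "global"


def outer():
    name = "enclosing"

    def inner_with_local():
        name = "local"
        return name          # L: found in the local scope

    def inner_without_local():
        return name          # E: no local `name`, so the enclosing one is used

    return inner_with_local(), inner_without_local()


def top_level_function():
    return name              # G: no local or enclosing `name`, so the global one is used


def shadowed_builtin():
    def len(obj):            # a local function that hides the built-in len
        return -1
    return len("abc")        # L is searched before B


print(outer())               # ('local', 'enclosing')
print(top_level_function())  # global
print(shadowed_builtin())    # -1
print(len("abc"))            # 3, the built-in len is untouched
```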

### Namespace

A namespace is a collection of currently defined symbolic names, along with information about the object that each name references.

In other words, a namespace is the set of names available in a scope. Namespaces matter so much in Python that [The Zen of Python](https://www.python.org/dev/peps/pep-0020/#id2) ends with: *"Namespaces are one honking great idea -- let's do more of those!"*

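The concepts of _namespaces_ and _scopes_ are related, but they are not the same. A namespace is the mapping from names to objects (for modules and the global level, it is literally a dictionary), whereas a scope is the region of code from which a namespace can be accessed directly, without a prefix.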
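
You can inspect these namespace dictionaries directly. The sketch below uses the standard `math` module and two illustrative names (`greeting`, `make_locals`). In CPython, module and global namespaces are real `dict` objects. Function locals are stored differently for speed, but `locals()` returns a dictionary snapshot of them.

```python
import math

# A module's namespace is an ordinary dictionary, exposed as __dict__
print(type(math.__dict__))              # <class 'dict'>
print(math.__dict__["pi"] is math.pi)   # True: same object, two ways to reach it

# The current module's global namespace is also a dictionary
greeting = "hello"
print(globals()["greeting"])            # hello


def make_locals():
    local_name = 1
    # locals() returns a dictionary snapshot of the function's local namespace
    return locals()


print(make_locals())                    # {'local_name': 1}
```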

Namespaces are useful for the following:

1. Minimising collisions between identical names defined in different scripts.
2. Making it easier to work out where a piece of code lives.
3. Making it easier to decide where new code should go.

When you open a Python interpreter, the built-in scope is populated with Python's built-in objects, such as `print()` and `len()`. In addition, every module has a `__name__` attribute that holds the module's name: when a module is imported, its `__name__` is the name it was imported under. Let us import `foo.py`, then look at its namespace and its `__name__`.

In [None]:
import foo
import bar
foo.__dict__

In [None]:
import foo
foo.__name__

It may seem obvious, but what do you think the value of `__name__` will be in this notebook (`Principles of OOP`)?

In [None]:
print(__name__)

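The module that Python runs directly (here, the notebook itself; usually, the script you pass to `python`) always has `__name__` set to `'__main__'`. This is what makes the `if __name__ == '__main__':` idiom practical: code under that condition runs when the file is executed directly, but not when it is imported.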
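
The sketch below writes a tiny throwaway module to a temporary folder and imports it. The module name `name_demo` and its `greet` function are hypothetical. It shows that the imported module sees its own name in `__name__`, so the code under its `if __name__ == "__main__":` guard is skipped on import.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source of a hypothetical module, written to a temporary folder for the demo
MODULE_SOURCE = '''
def greet(name):
    return f"Hello, {name}!"

print(f"name_demo loaded with __name__ = {__name__!r}")

if __name__ == "__main__":
    # Only runs when the file is executed directly: python name_demo.py
    print(greet("command line"))
'''

tmp_dir = Path(tempfile.mkdtemp())
(tmp_dir / "name_demo.py").write_text(MODULE_SOURCE)
sys.path.insert(0, str(tmp_dir))   # make the temporary folder importable
importlib.invalidate_caches()

# Importing prints: name_demo loaded with __name__ = 'name_demo'
name_demo = importlib.import_module("name_demo")
print(name_demo.greet("notebook"))  # Hello, notebook!
print(__name__)                     # '__main__' in a notebook or a directly run script
```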


## Namespaces and Imports


When a module is imported, Python creates a separate namespace for it (a new dictionary). The module `foo.py`, downloaded earlier, is in this directory. When we import it into our main script, the names defined in `foo.py` become accessible from `__main__`, but only through the prefix `foo.`: the module name acts like a 'first name' for everything defined inside it.

In [None]:
import foo
x = 'I am "x" in this notebook'
print('Printing x: ' + x)
print('Printing foo.x: ' + foo.x)

Observe that `foo` is in the global scope, so we can also access it from within a function, and even from a nested function:

In [None]:
def print_foo():
    print('I am in the outer function, and foo.x says: ' + foo.x)
    def nested_foo():
        print('I am in the inner function, and foo.x says: ' + foo.x)
    nested_foo()

print_foo()


The following image shows the levels of scope and how the namespace can be accessed from each level:
<p align=center><img src=https://github.com/AI-Core/Content-Public/blob/main/Content/units/Software-Engineering/2.%20Software%20Design/2.%20Principles%20of%20OOP%20Design/images/namespaces.png?raw=1 width=600></p>


>__Notably__, Python imports each module only once per session. The first `import foo` executes `foo.py` and caches the resulting module object in `sys.modules`; any later `import foo` simply reuses that cached object. For example, if we import `foo` and then edit `foo.py`, importing `foo` again will not pick up the changes.

In [None]:
import foo
print(foo.x)

We can see that `foo.x` holds the value defined in `foo.py`. Now, let us change it from the main namespace:

In [None]:
foo.x = 'I changed...'
print(foo.x)

If we attempt to re-import `foo`, Python finds that a module named `foo` has already been imported, so the statement does nothing: `foo.x` keeps the value we just assigned instead of being reset to the value in `foo.py`.

In [None]:
import foo
print(foo.x)
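
If you do want Python to re-run a module's code, for example after editing it, the standard library provides `importlib.reload`. The sketch below creates a hypothetical throwaway module, `reload_demo`, in a temporary folder, edits the file on disk, and compares a plain re-import with a reload. Setting `sys.dont_write_bytecode` only keeps this demo from writing `.pyc` files.

```python
import importlib
import sys
import tempfile
from pathlib import Path

sys.dont_write_bytecode = True  # demo only: always re-read the module from source

tmp_dir = Path(tempfile.mkdtemp())
module_file = tmp_dir / "reload_demo.py"   # hypothetical throwaway module
module_file.write_text("x = 'original value'\n")
sys.path.insert(0, str(tmp_dir))
importlib.invalidate_caches()

reload_demo = importlib.import_module("reload_demo")
print(reload_demo.x)                       # original value

module_file.write_text("x = 'edited value'\n")  # edit the file on disk

# Same as `import reload_demo`: the cached module in sys.modules is returned
reload_demo = importlib.import_module("reload_demo")
print(reload_demo.x)                       # original value
print("reload_demo" in sys.modules)        # True

# reload re-executes the module's source and updates the module object
reload_demo = importlib.reload(reload_demo)
print(reload_demo.x)                       # edited value
```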

### Handling methods with the same name
__One final note about namespaces and scopes:__ Python has many libraries, and some of their names will unavoidably clash. For example, the name `time` refers to a function in the `time` module and to a class in the `datetime` module.

In [None]:
from time import time
from datetime import time

print(time())

Which `time` is called? Because we imported the names rather than the modules, the second import rebinds `time` in the namespace and overwrites the first. As a result, `time()` here creates a `datetime.time` object (printed as `00:00:00`). To keep both, use one of the following approaches:

1. Import the modules themselves, so each name is accessed through its module prefix.
2. Give each imported name an alias with `as`.

In [None]:
import datetime
import time

print(datetime.time())
print(time.time())

In [None]:
from datetime import time as dttime
from time import time as ttime

print(dttime())
print(ttime())

## Separation Rules in Python

> Do one thing and do it well

This is the Unix philosophy applied to separating concerns. Each part of your code should be __concerned__ with one behaviour, and each __concern__ should be covered by only one piece of code.


### Functions for separating concerns


1. __Do not create two pieces of code that perform similar tasks__. For example, do not write one piece of code that extracts images of cats and another, nearly identical piece that extracts images of dogs. Instead, create a single function that accepts the animal as an argument.

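Consider the following example of what you should not do: it downloads dog images and then cat images by repeating the same code. (The XPath expressions and class names match Unsplash's HTML at the time of writing, so the code may stop working if the site changes.)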


In [None]:
from selenium import webdriver
from selenium.webdriver.common.by import By  # Selenium 4 locator API
import urllib.request
import time

driver = webdriver.Chrome()
# Get links for dogs
URL = 'https://unsplash.com/s/photos/dog'
driver.get(URL)
dog_list = driver.find_elements(By.XPATH, '//figure[@itemprop="image"]')
links = []
for dog in dog_list:
    links.append(dog.find_element(By.XPATH, './/a').get_attribute('href'))
# Go to each link and download the image it contains
for i, link in enumerate(links):
    driver.get(link)
    time.sleep(0.5)  # give the page time to load
    src = driver.find_element(By.XPATH, '//img[@class="oCCRx"]').get_attribute('src')
    urllib.request.urlretrieve(src, f"dog_{i}.jpg")

# Get links for cats: the same code again, with only the animal changed
URL = 'https://unsplash.com/s/photos/cat'
driver.get(URL)
cat_list = driver.find_elements(By.XPATH, '//figure[@itemprop="image"]')
links = []
for cat in cat_list:
    links.append(cat.find_element(By.XPATH, './/a').get_attribute('href'))
# Go to each link and download the image it contains
for i, link in enumerate(links):
    driver.get(link)
    time.sleep(0.5)
    src = driver.find_element(By.XPATH, '//img[@class="oCCRx"]').get_attribute('src')
    urllib.request.urlretrieve(src, f"cat_{i}.jpg")

The preferred solution:

In [None]:
from selenium import webdriver
from selenium.webdriver.common.by import By
import urllib.request
import time

def get_animal_pictures(driver: webdriver.Chrome, animal: str, root: str) -> None:
    URL = root + animal
    driver.get(URL)
    animal_list = driver.find_elements(By.XPATH, '//figure[@itemprop="image"]')
    links = []
    for item in animal_list:
        links.append(item.find_element(By.XPATH, './/a').get_attribute('href'))
    # Go to each link containing an image
    for i, link in enumerate(links):
        driver.get(link)
        time.sleep(0.5)
        src = driver.find_element(By.XPATH, '//img[@class="oCCRx"]').get_attribute('src')
        urllib.request.urlretrieve(src, f"{animal}_{i}.jpg") # <- We are also using the variable animal here.

driver = webdriver.Chrome()
root = 'https://unsplash.com/s/photos/'
for animal in ['dog', 'cat']:  # the same function now serves every animal
    get_animal_pictures(driver, animal, root)


2. __Do not give one piece of code more than one concern__. For example, the function above goes to the webpage to extract the links __and__ iterates through the links __and__ downloads the images. Instead, create a function for each concern.

Thus, instead of the code above, do the following:

In [None]:
from selenium import webdriver
from selenium.webdriver.common.by import By
import time
import urllib.request

def extract_links(driver: webdriver.Chrome, animal: str, root: str) -> list:
    # Concern 1: collect the links to each photo page
    URL = root + animal
    driver.get(URL)
    animal_list = driver.find_elements(By.XPATH, '//figure[@itemprop="image"]')
    links = []
    for item in animal_list:
        links.append(item.find_element(By.XPATH, './/a').get_attribute('href'))
    return links

def get_image_source(driver: webdriver.Chrome, link: str) -> str:
    # Concern 2: open a photo page and read the image URL
    driver.get(link)
    time.sleep(0.5)
    src = driver.find_element(By.XPATH, '//img[@class="oCCRx"]').get_attribute('src')
    return src

def download_images(src: str, animal: str, i: int) -> None:
    # Concern 3: save one image to disk
    urllib.request.urlretrieve(src, f"{animal}_{i}.jpg")


animal = 'cat'
root = 'https://unsplash.com/s/photos/'
driver = webdriver.Chrome()
links = extract_links(driver, animal, root)
for i, link in enumerate(links):
    src = get_image_source(driver, link)
    download_images(src, animal, i)

Although separating individual concerns into functions seems to involve more work and more code, it pays off. When adding features, debugging or testing, it is easier to find the root cause of a problem because each concern is __detached__ from the others.

> The higher the granularity, the easier debugging becomes.

For example, with this approach we can easily change the animal (flexibility). Each function also has its own local scope, so its variables cannot clash with names elsewhere (robustness), and the code is more readable, which is extremely important.

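Here, the function definitions and the code that uses them share a single cell. In a real project, the functions would usually live in their own module, and the main script would contain only something like the following: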

In [None]:
animal = 'cat'
root = 'https://unsplash.com/s/photos/'
driver = webdriver.Chrome()
links = extract_links(driver, animal, root)
for i, link in enumerate(links):
    src = get_image_source(driver, link)
    download_images(src, animal, i)
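
The testing benefit mentioned above can be sketched offline. Below, two concerns (finding photo links in a page, and choosing a file name) are written as small pure functions, so each can be checked on its own with a hard-coded HTML snippet instead of a live browser. The HTML, the `/photos/` URL pattern and the function names `extract_photo_links` and `make_filename` are hypothetical, and a regular expression stands in for Selenium only to keep the example free of dependencies.

```python
import re


def extract_photo_links(html: str) -> list[str]:
    """One concern: pull photo-page links out of a search page's HTML.

    The markup is hypothetical; a real scraper would use Selenium or an HTML parser.
    """
    return re.findall(r'<a href="(/photos/[^"]+)"', html)


def make_filename(animal: str, i: int) -> str:
    """Another concern: decide the file name an image is saved under."""
    return f"{animal}_{i}.jpg"


# Each concern can be checked in isolation, with no browser or network
sample_html = (
    '<figure><a href="/photos/abc123">cat</a></figure>'
    '<figure><a href="/photos/def456">cat</a></figure>'
)
links = extract_photo_links(sample_html)
print(links)                                                    # ['/photos/abc123', '/photos/def456']
print([make_filename("cat", i) for i, _ in enumerate(links)])  # ['cat_0.jpg', 'cat_1.jpg']
```

If the scraper later saves files with the wrong names, only `make_filename` needs to be inspected; if no links are found, the problem is confined to `extract_photo_links`.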

3. __Things to note on function names__

Before we proceed, note the following:

- __Be descriptive but concise__: Give your function a descriptive name; `get_info()` and `do_this()` are not very informative. However, do not overembellish, e.g. `get_information_about_the_weather_by_scraping_multiple_pages()`.
- __Functions are actions__: Do not name your function with a noun or subject. Functions are actions, and as such, their names should contain a verb. For example, consider the function names `image_scraper()`, `rock_paper_scissor()` and `music_player()`. They hint at what the function is about, but not at what it does: is `image_scraper()` downloading images, returning their URLs or saving them to disk? Names such as `scrape_images()` or `play_music()` answer that question.
- __Follow the naming convention__: Python does not enforce a naming style, but PEP 8 recommends snake_case for functions (e.g. `get_image()`) and CapWords for classes. A function named `GetImage()` could therefore be mistaken for a class.

### Classes for connecting concerns

As you keep adding code to your project, more and more concerns will appear. Over time, you will notice that some functions frequently work in tandem. If you keep passing the result of one function to another, or if several functions require the same inputs, consider defining a class.

A first attempt is simply to move the functions into a class (as we will see, this alone gains us very little):

In [None]:
from selenium import webdriver
from selenium.webdriver.common.by import By
import time
import urllib.request

# Let us define our class: the functions are simply moved inside it.
# Each method needs `self` as its first parameter, because Python passes the instance automatically.
class AnimalScraper:
    def extract_links(self, driver: webdriver.Chrome, animal: str, root: str) -> list:
        URL = root + animal
        driver.get(URL)
        animal_list = driver.find_elements(By.XPATH, '//figure[@itemprop="image"]')
        links = []
        for item in animal_list:
            links.append(item.find_element(By.XPATH, './/a').get_attribute('href'))
        return links

    def get_image_source(self, driver: webdriver.Chrome, link: str) -> str:
        driver.get(link)
        time.sleep(0.5)
        src = driver.find_element(By.XPATH, '//img[@class="oCCRx"]').get_attribute('src')
        return src

    def download_images(self, src: str, animal: str, i: int) -> None:
        urllib.request.urlretrieve(src, f"{animal}_{i}.jpg")

In [None]:
scraper = AnimalScraper()
root = 'https://unsplash.com/s/photos/'
driver = webdriver.Chrome()
animal = 'cat'
links = scraper.extract_links(driver=driver, animal=animal, root=root)
for i, link in enumerate(links):
    src = scraper.get_image_source(driver=driver, link=link)
    scraper.download_images(src=src, animal=animal, i=i)

The class above makes little difference, because we are not exploiting what classes offer. In the cell above, look at the arguments of each method: some values (`driver`, `animal`) are passed again and again, and others (`links`, `src`) are simply the outputs of other methods. An instance created from a class can store such values in attributes. When an instance is created, its `__init__` method runs and assigns the values to `self`, so every method can read them without having them passed in.

In [None]:
from selenium import webdriver
from selenium.webdriver.common.by import By
import os
import time
import urllib.request

# Let's define our class
class AnimalScraper:
    def __init__(self, animal: str, homepage: str):
        self.animal = animal
        self.homepage = homepage
        self.driver = webdriver.Chrome()
        self.links = []  # Initialise links, so every method can rely on the attribute existing
        self.src = None
        os.makedirs('./animals', exist_ok=True)  # download_images saves into this folder

    def extract_links(self) -> None:
        self.driver.get(self.homepage + self.animal)
        animal_list = self.driver.find_elements(By.XPATH, '//figure[@itemprop="image"]')
        self.links = []
        for item in animal_list:
            self.links.append(item.find_element(By.XPATH, './/a').get_attribute('href'))

    def get_image_source(self, link: str) -> None:
        self.driver.get(link)
        time.sleep(0.5)
        self.src = self.driver.find_element(By.XPATH, '//img[@class="oCCRx"]').get_attribute('src')

    def download_images(self, i: int) -> None:
        urllib.request.urlretrieve(self.src, f"./animals/{self.animal}_{i}.jpg")

    def get_animal_images(self) -> None:
        self.extract_links()  # stores the links in self.links
        for i, link in enumerate(self.links):
            self.get_image_source(link)  # stores the image URL in self.src
            self.download_images(i)
        self.links = []

Now, the main code will have the following appearance:

In [None]:
cat_scraper = AnimalScraper('cat', 'https://unsplash.com/s/photos/')
cat_scraper.get_animal_images()

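This is a considerable improvement over the previous code. As a bonus, the user only needs to call one method, and no longer has to pass intermediate values such as the links or the image source between methods, because they are stored as attributes of the instance. (Python does not enforce private attributes, but by convention a leading underscore, e.g. `_extract_links`, marks a method as internal.)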
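
Because the state lives on the instance, a class like this is also easy to exercise without a browser. The sketch below is a hypothetical, offline variant, `OfflineAnimalScraper`, not the class above. Instead of creating a Chrome driver itself, it receives two plain functions that do the fetching (`fetch_links` and `fetch_src`, our own names), a technique usually called dependency injection. Here they are fed fake data, and `get_animal_images` returns the planned file names and image URLs instead of downloading anything.

```python
from typing import Callable


class OfflineAnimalScraper:
    """Offline sketch: fetch_links and fetch_src are hypothetical stand-ins for the Selenium calls."""

    def __init__(self, animal: str, homepage: str,
                 fetch_links: Callable[[str], list[str]],
                 fetch_src: Callable[[str], str]) -> None:
        self.animal = animal
        self.homepage = homepage
        self.fetch_links = fetch_links  # plays the role of scraping the search page
        self.fetch_src = fetch_src      # plays the role of get_image_source
        self.links: list[str] = []

    def extract_links(self) -> None:
        # The result is stored on the instance instead of being returned
        self.links = self.fetch_links(self.homepage + self.animal)

    def get_animal_images(self) -> dict[str, str]:
        # Single entry point: returns the planned downloads as {file name: image URL}
        self.extract_links()
        return {f"{self.animal}_{i}.jpg": self.fetch_src(link)
                for i, link in enumerate(self.links)}


# Fake data instead of a live website (hypothetical URLs)
fake_search_results = {"https://example.com/photos/cat": ["/photos/a", "/photos/b"]}

cat_scraper = OfflineAnimalScraper(
    "cat",
    "https://example.com/photos/",
    fetch_links=lambda url: fake_search_results.get(url, []),
    fetch_src=lambda link: "https://images.example.com" + link,
)
print(cat_scraper.get_animal_images())
# {'cat_0.jpg': 'https://images.example.com/photos/a', 'cat_1.jpg': 'https://images.example.com/photos/b'}
```

Swapping the lambdas for functions that call Selenium would give a live scraper, while tests keep using the fakes.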

> It is worth noting that the process of defining classes and refactoring code is an art, and as such, mastering it requires time and consistency.

## Conclusion
At this point, you should have a good understanding of:

- namespaces, scopes and concerns.
- how to separate concerns and why this matters.
- how to exploit functions and classes to improve your code.