# Organising Data

- These refactoring techniques deal with data handling, letting you replace primitives (with limited functionality) with more appropriate classes

- They also help disentangle class associations, which improves portability and reusability

## 1. Self-Encapsulate Field

- Problem
    - You directly access private fields from within the class's own methods

- Solution
    - Create a getter and setter, and only use these for accessing the field.
    - In Python, you can do this by creating "hidden" fields e.g. `self._protected_field`, and using `@property` decorator to define getters and setters to restrict access

- Motivation
    - There are occasions when plain getting and setting is insufficient, and you want to run some logical checks on the field
    - e.g. when setting a phone number, it is probably good to check that the number of digits is correct

- Drawbacks
    - This logic is useful, but also adds clutter. If every field needs both a getter and a setter, each field now comes with two extra methods

- How to refactor
    - Create a getter (and optional setter) for the field. They should be either protected or public.
    - Find direct accesses to the field and replace them with the getter/setter methods
    - In Python, the idiomatic way is to use `@property`

- Relationships with other refactoring methods
    - Similar
        - Encapsulate Field
    - Related
        - Duplicate Observed Data
        - Replace Type Codes with Subclasses
        - Replace Type Code with State/Strategy

- Example:

In [None]:
class bad__Range():
    def __init__(self, low, high):
        self.high = high
        self.low = low

    def check_value(self, value):
        return (value >= self.low) and (value <= self.high)


class good__Range():
    def __init__(self, low, high):
        self._high = high
        self._low = low

    @property
    def high(self):
        return self._high

    @high.setter
    def high(self, value):
        self._high = value

    @property
    def low(self):
        return self._low

    @low.setter
    def low(self, value):
        self._low = value

    def check_value(self, value):
        return (value >= self.low) and (value <= self.high)
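
The motivation above mentions validating a phone number on assignment. A minimal sketch of that idea follows, assuming a hypothetical `Contact` class and an assumed rule that a valid number has exactly 8 digits (adjust this to your own format):

```python
class Contact:
    # Assumed rule, for illustration only: a valid number has exactly 8 digits
    PHONE_DIGITS = 8

    def __init__(self, phone: str):
        # Go through the setter so the check also runs at construction time
        self.phone = phone

    @property
    def phone(self) -> str:
        return self._phone

    @phone.setter
    def phone(self, value: str) -> None:
        if not (value.isdigit() and len(value) == self.PHONE_DIGITS):
            raise ValueError(f'expected {self.PHONE_DIGITS} digits, got {value!r}')
        self._phone = value


contact = Contact('61234567')
contact.phone = '69876543'   # passes the check
try:
    contact.phone = '123'    # too short, rejected by the setter
except ValueError as err:
    print(err)
```

Because `__init__` assigns through `self.phone` rather than `self._phone`, the class uses its own accessor internally, which is the point of self-encapsulation.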

## 2. Replace Data Value with Object

- Problem
    - You have a class `A` that has some property `p` stored as a primitive, when ideally `p` can and should have richer data fields and functionality

- Solution
    - Turn `p` into a class and store it as a field in `A`

- Motivation
    - Makes the responsibility of objects clearer
    - Group methods and data into a single relevant class

- How to refactor
    - Create a new class for `p` and pass it into `A`

- Relationships with other refactoring methods
    - Similar
        - Extract Class
        - Introduce Parameter Object
        - Replace Array with Object
        - Replace Method with Method Object

- Related code smells
    - Duplicate Code

- Example:

In [1]:
class bad__Order():
    def __init__(self, customer: str):
        self.customer: str = customer

###

class Customer():
    def __init__(self, name: str):
        self.name: str = name

class good__Order():
    def __init__(self, customer: Customer):
        self.customer: Customer = customer
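
One way to see the benefit: once `Customer` is a class, customer-specific data and behaviour have a natural home. A sketch, where the `email` field and the `greeting` method are hypothetical additions for illustration only:

```python
class Customer:
    def __init__(self, name: str, email: str):
        self.name: str = name
        self.email: str = email  # hypothetical extra field

    def greeting(self) -> str:
        # Behaviour that would otherwise be repeated wherever the raw string is used
        return f'Dear {self.name}'


class Order:
    def __init__(self, customer: Customer):
        self.customer: Customer = customer


order = Order(Customer('Alice', 'alice@example.com'))
print(order.customer.greeting())
```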


## 3. Change Value to Reference

- Problem
    - You have many instances of class `A` that each hold their own instance of class `B`, where every instance of `B` is identical
    - If you instantiate `B` separately for each instance of `A`, you have many copies of `B` floating around in memory. This is inefficient
    - It is also an issue if you need to modify `B` in your code, because you need to modify every copy

- Solution
    - Rather than instantiating identical copies of `B`, for every `A`, instantiate `B` once as a single reference object

- Motivation
    - In most systems, objects are either values or references
        - Reference is where 1 object in your programme corresponds to 1 real world object
        - Values (instances) is where many objects in your programme correspond to the same real world object
            - For example, if I store '2024-01-15' as date1 and '2024-01-15' as date2, the same date is stored in 2 different objects
    - When you have something that changes in your code, and you want it to be consistent across your code, you should store a reference to the same object

- How to refactor
    - Keep a registry (e.g. a dict or a factory function) that looks objects up by some identifier, rather than calling the constructor for each new instance

- Relationships with other refactoring methods
    - Opposite 
        - Change Reference to Value

- Example:

In [3]:
customer_map = {}

class Customer():
    def __init__(self, id: int):
        self.id = id

def create_customer(cid: int):
    ## Look up the map before creating a new instance, and return the existing instance if it exists
    if cid not in customer_map:
        customer_map[cid] = Customer(cid)
    return customer_map[cid]

customer1 = create_customer(1)
customer2 = create_customer(1)  # same object as customer1
customer_map

{1: <__main__.Customer at 0x10620f0d0>}

- In fact, in Python, it's possible to modify the class constructor to achieve this without the need for an external hashmap

In [10]:
class Customer():
    ## Store a map to check if the object with the relevant value has already been created
    _created = {}

    def __new__(cls, name):
        if name not in cls._created:
            ## If the object is not yet created, call the __new__ method from the `object` superclass to create a new instance of the class
            cls._created[name] = super().__new__(cls)
        ## Always return the stored instance, otherwise Customer(...) evaluates to None
        return cls._created[name]

    def __init__(self, name):
        self.name = name

ca = Customer('a')
cb = Customer('b')
Customer._created

{'a': <__main__.Customer at 0x1063ccee0>,
 'b': <__main__.Customer at 0x10651d9f0>}

## 4. Change Reference to Value

- Problem
    - Opposite problem to 3. Change Value to Reference
    - An object is too small/infrequently accessed to justify managing its life cycle

- Solution
    - Just turn it into a value object, instead of a reference object

- Motivation
    - Working with references is much harder than with value objects
    - For one, not all reference management approaches are thread safe. If you don't properly implement a mutex in multi-threaded programmes, you can still end up instantiating multiple copies of a reference object

- How to refactor
    - Remove the life cycle management and instantiation checks, and just create objects on the fly

- Drawbacks
    - If you have multiple of these value objects that need to change in tandem, you have to add checks for such behaviour

- Relationships with other refactoring methods
    - Opposite 
        - Change Value to Reference

- Example (see 3. Change Value to Reference)
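
As a sketch of what a value object can look like in Python, a frozen `dataclass` gives immutability and field-based equality, so no registry is needed. The `Money` class here is a hypothetical example and is not taken from the section above:

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Money:
    amount: int
    currency: str


# Created on the fly wherever needed: two separate objects, equal by value
price_a = Money(10, 'SGD')
price_b = Money(10, 'SGD')

print(price_a == price_b)  # compares fields, not identity
print(price_a is price_b)  # checks whether they are the same object in memory

try:
    price_a.amount = 20    # frozen dataclasses reject attribute assignment
except FrozenInstanceError:
    print('immutable')
```

Since a value object cannot be mutated, the "change in tandem" drawback becomes a matter of replacing the object rather than editing it in place.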

## 5. Replace Array with Object

- Problem
    - You have some data stored in a primitive (array, hashmap) that you're passing around 

- Solution
    - Group it into a single object with proper field names

- Motivation
    - If you pass an array around, your code will need to haphazardly track the indices of the objects you want
        - For example, array[0] is username, and array[1] is password
    - This is unclear to future devs, and also unsafe

- How to refactor
    - Create a proper object and store the data in named fields

- Relationships with other refactoring methods
    - Similar
        - Replace Data Value with Object

- Related code smells
    - Primitive Obsession

- Example:

In [1]:
bad__credentials = ['myusername', 'mypassword']

class Credentials():
    def __init__(self, username: str, password: str):
        self.username: str = username
        self.password: str = password

good__credentials = Credentials(username='abc', password='def')

## 6. Duplicate Observed Data

- Problem
    - Your code has several objects that operate on the same data, and the data each of them sees should be consistent
        - For example, in a web application, suppose you want to flash a pop-up when someone scrolls to some part of a page, and to play a sound
        - Both of these are likely to be handled by separate objects
        - But the "state" of the webpage must be consistent across both objects.
        - However, your `Page` class and `SoundPlayer` class each hold their own copy of the page's state, and the copies must be updated independently of each other

- Solution
    - Write a class that is solely responsible for determining the state of a page, and all objects' behaviour relying on page state must get it from this `State` object

- Motivation
    - Avoid duplication in code, or worse, bugs resulting from inconsistent "ground truth"

- How to refactor
    - Pull the data that must stay consistent out into a "source of truth" class
    - Have every dependent object hold a reference to the same instance of that class

- Related Design Pattern
    - Observer

- Related code smells
    - Large Class

- Example:

In [9]:
class bad__A():
    def __init__(self, data):
        self.data = data

    def process_data(self):
        print(self.data)

class bad__B():
    def __init__(self, data):
        self.data = data

    def process_data(self):
        print(self.data)

## No source of truth for dynamic data, can lead to inconsistent processing outcomes between classes that should rely on the same data in theory
data = 10
a = bad__A(data)
a.process_data()
data = 20
b = bad__B(data)
a.process_data()
b.process_data()


10
10
20


In [7]:
class DataObject():
    def __init__(self, data):
        self._data = data

    @property
    def data(self):
        return self._data
    
    @data.setter
    def data(self, value):
        self._data = value

class good__A():
    def __init__(self, data):
        self.data_obj: DataObject = data

    def process_data(self):
        print(self.data_obj.data)

class good__B():
    def __init__(self, data):
        self.data_obj: DataObject = data

    def process_data(self):
        print(self.data_obj.data)

## Single source of truth: both classes read from the same DataObject, so an update is seen by both
data = DataObject(10)
a = good__A(data)
a.process_data()
b = good__B(data)
data.data = 20
a.process_data()
b.process_data()


10
20
20


## 7. Change Unidirectional Association to Bidirectional

- Problem
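    - You have two classes that each need to use the other's features, but the association between them only goes in one direction

- Solution
    - Add the missing association so that it becomes bidirectional

- Motivation
    - Client code needs to navigate the association from both sides. A stored back-reference avoids a search or a complex calculation, which keeps those methods short and easier to understand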

- Drawbacks
    - Bidirectional relationships are much harder to maintain
    - Bidirectional relationships make classes very tightly coupled. Don't introduce it for no reason

- How to refactor
    - Add field to hold the relation in both objects

- Relationships with other refactoring methods
    - Opposite 
        - Change Bidirectional Association to Unidirectional

- Example:
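
A minimal sketch of a bidirectional association, using hypothetical `Customer` and `Order` classes. One side (here `Order`) is assumed to own the logic that keeps both references in sync:

```python
class Customer:
    def __init__(self, name: str):
        self.name = name
        self.orders: list = []  # back-reference to this customer's orders


class Order:
    def __init__(self, customer: Customer):
        self._customer = None
        self.customer = customer  # use the setter so both sides get updated

    @property
    def customer(self) -> Customer:
        return self._customer

    @customer.setter
    def customer(self, new_customer: Customer) -> None:
        # Detach from the previous customer before attaching to the new one
        if self._customer is not None:
            self._customer.orders.remove(self)
        self._customer = new_customer
        new_customer.orders.append(self)


alice, bob = Customer('alice'), Customer('bob')
order = Order(alice)
order.customer = bob  # moves the order; both sides stay consistent
print(len(alice.orders), len(bob.orders))
```

Routing every change through a single setter is one way to limit the maintenance cost listed under the drawbacks.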

## 8. Change Bidirectional Association to Unidirectional

- Problem
    - You have a bidirectional association between two classes, but only one of the classes actually uses the other

- Solution
    - Change it to unidirectional

- Motivation
    - Minimise the number of dependencies
    - Minimise bidirectional associations, because they can lead to memory bloat (objects that reference each other keep each other alive until a cycle-aware garbage collector runs) and a more complex dependency structure

- How to refactor
    - Remove the unused association, and compute the reverse lookup on demand in the few places that still need it

- Relationships with other refactoring methods
    - Opposite 
        - Change Unidirectional Association to Bidirectional

- Related code smells
    - Inappropriate Intimacy

- Example:
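
A sketch of the reverse move, again with hypothetical `Customer` and `Order` classes: the back-reference on `Customer` is dropped, and the rare reverse lookup is computed from a collection the caller is assumed to already have:

```python
class Customer:
    def __init__(self, name: str):
        self.name = name  # no list of orders is kept here any more


class Order:
    def __init__(self, customer: Customer):
        self.customer = customer  # the only remaining direction


def orders_for(customer: Customer, all_orders: list) -> list:
    # Reverse lookup computed on demand instead of being stored on Customer
    return [o for o in all_orders if o.customer is customer]


alice, bob = Customer('alice'), Customer('bob')
all_orders = [Order(alice), Order(bob), Order(alice)]
print(len(orders_for(alice, all_orders)))
```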

## 9. Replace Magic Number with Symbolic Constant

- Problem
    - You have a constant in your code that is used without explanation

- Solution
    - Assign constant to a variable instead, and call the variable

- Motivation
    - There should be no random numbers in your code. Naming something is its own documentation

- How to refactor
    - Assign the value to a well-named constant, and use the constant wherever the literal appeared

- Example:

In [None]:
PI = 3.1415
radius = 5
bad__circle_area = 3.1415 * (radius**2)
good__circle_area = PI * (radius**2)

## 10. Encapsulate Field

- Problem
    - You have a public field

- Solution
    - Make private, and create getters and setters for it

- Motivation
    - You don't always want all components of your code to be visible and modifiable, because this can cause errors or expose data externally

- How to refactor
    - Set fields to private by default, and only choose public when necessary

- Drawbacks
    - If fields are accessed very often, it can be faster to do direct access than to call a method

- Relationships with other refactoring methods
    - Similar
        - Self-Encapsulate Field

- Related code smells
    - Data Class

- Example:

In [20]:
class bad__Person():
    def __init__(self):
        self.name = 'abc'
    
class good__Person():
    def __init__(self):
        self.__name = 'abc'
    
    @property 
    def name(self):
        return self.__name
    
    @name.setter
    def name(self, value):
        raise ValueError('Disallowed')

p1 = good__Person()
# p1.name = 'name'

## 11. Encapsulate Collection

- Problem
    - A class contains a collection field (e.g. a list) and a simple getter and setter for working with the collection
    - But since the getter hands out the collection itself, callers can modify its contents without going through the class

- Solution
    - Put collection in an unmodifiable data structure (e.g. `frozenset` in Python)
    - And create methods to add/delete from it

- Motivation
    - There are times when you want to have a fixed array that is modifiable only in specific circumstances

- How to refactor
    - Store the collection as a `frozenset`, and expose explicit methods for adding and removing elements

- Related code smells
    - Data Class

- Example:

In [26]:
class bad__Person():
    def __init__(self):
        self._bank_accounts: list[str] = ['dbs', 'ocbc', 'uob']
    
    @property
    def bank_accounts(self):
        return self._bank_accounts
    
    @bank_accounts.setter
    def bank_accounts(self, value):
        if not isinstance(value, list):
            raise ValueError('not a list')
        self._bank_accounts = value
    
p1 = bad__Person()
print(p1._bank_accounts)
## Direct access, which is undesirable
p1._bank_accounts.append('not a bank')
print(p1._bank_accounts)

['dbs', 'ocbc', 'uob']
['dbs', 'ocbc', 'uob', 'not a bank']


In [35]:
class good__Person():
    def __init__(self):
        self._bank_accounts: frozenset[str] = frozenset(['dbs', 'ocbc', 'uob'])
    
    @property
    def bank_accounts(self):
        return self._bank_accounts
    
    @bank_accounts.setter
    def bank_accounts(self, value):
        if not isinstance(value, list):
            raise ValueError('not a list')
        self._bank_accounts = frozenset(value)

    def add_bank_account(self, value):
        temp = list(self._bank_accounts) + [value]
        self._bank_accounts = frozenset(temp)

    def remove_bank_account(self, value):
        temp = list(self._bank_accounts)
        if value in temp:
            temp.remove(value)
        self._bank_accounts = frozenset(temp)
        
p2 = good__Person()
print(p2._bank_accounts)
# p2._bank_accounts.append('not a bank')
p2.add_bank_account('not a bank')
print(p2._bank_accounts)

frozenset({'uob', 'ocbc', 'dbs'})
frozenset({'dbs', 'ocbc', 'uob', 'not a bank'})


## 12. Replace Type Code with Class

- Problem
    - You have a bunch of "type codes" (i.e. a fixed set of values that some field is allowed to take, usually stored as numbers or strings)
    - But they aren't used in any conditionals or methods. They are just there as information

- Solution
    - Group them into a new class (specifically, in Python's case, an `Enum`)

- Motivation
    - Much clearer for someone to read your code
    - Static type checkers (e.g. mypy) can flag values that fall outside the defined `Enum`

- How to refactor
    - Make an enum containing the types you want to define
    - Set the field to that enum type

- When NOT to use
    - Remember, in this case, we make the assumption that the type code doesn't affect any actual methods/flows in your class!!
    - If they do, consider the next 2 strategies "Replace Type Code with Subclasses" and "Replace Type Code with State/Strategy"

- Relationships with other refactoring methods
    - Similar
        - Replace Type Code with Subclasses
        - Replace Type Code with State/Strategy

- Related code smells
    - Primitive Obsession

- Example:

In [None]:
from enum import Enum

class BloodType(Enum):
    A = 1
    B = 2
    AB = 3
    O = 4
    
class Person():
    def __init__(self):
        self.valid_bloodtype: BloodType = BloodType.AB
        # self.invalid_bloodtype: BloodType = 'AB'


## 13. Replace Type Code with Subclasses

- Problem
    - You have a bunch of "type code" that is used to control the behaviour of your class

- Solution
    - Make subclasses and implement the different behaviours using polymorphism in the subclass

- Motivation
    - Maintaining type code is always problematic, because you need to worry about the programme mutating them accidentally, or typos in the input etc.
    - When things go wrong, it's not always obvious why

- How to refactor
    - Create a parent class, and add each of the logical flows into a subclass

- When NOT to use
    - If the value of the "type" can change dynamically, this method doesn't make sense, because an object's class is normally fixed when it is created, so you would need to swap in a different object each time the type changes
    - In such a case, use composition with the next approach, "Replace Type Code with State/Strategy"

- Relationships with other refactoring methods
    - Opposite
        - Replace Subclass with Fields
    - Similar
        - Replace Type Code with Class
        - Replace Type Code with State/Strategy

- Related code smells
    - Primitive Obsession

- Example:

In [None]:
from enum import Enum
from abc import ABC, abstractmethod

class BloodType(Enum):
    A = 1
    B = 2
    AB = 3
    O = 4
    
class bad__Person():
    def __init__(self):
        self.bloodtype: BloodType = BloodType.AB
        # self.invalid_bloodtype: BloodType = 'AB'

    def do_something(self):
        if self.bloodtype == BloodType.AB:
            print('BloodType is AB')
        elif self.bloodtype == BloodType.O:
            print('BloodType is O')
        else:
            print('BloodType is boring')

######

class good__Person(ABC):

    @property
    @abstractmethod
    def bloodtype(self):
        ...

    @abstractmethod
    def do_something(self):
        ...

class PersonAB(good__Person):
    def __init__(self):
        self._bloodtype: BloodType = BloodType.AB
    
    @property
    def bloodtype(self): #type: ignore
        return self._bloodtype
    
    def do_something(self):
        print('BloodType is AB')

class PersonA(good__Person):
    def __init__(self):
        self._bloodtype: BloodType = BloodType.A
    
    @property
    def bloodtype(self): #type: ignore
        return self._bloodtype
    
    def do_something(self):
        print('BloodType is boring')

## 14. Replace Type Code with State/Strategy

- Problem
    - You have a bunch of "type code" that is used to control the behaviour of your class
    - AND you can't use subclassing to make it go away because the "type" may change dynamically

- Solution
    - Create a new class that holds the "source of truth" at any point in time
    - And pass it into your class holding the method you are trying to run via composition
    - Note that both State and Strategy involve creating one or more new classes that implement the methods the context class will run
        - The difference is that in state, we expect that the state classes can and will be updated constantly, so the context class should have the ability to update the state

- Motivation
    - Again, this avoids convoluted conditional logic in the behaviour defined in your method
    - Using a "composition" pattern to pass in the state/strategy class lets you easily extend the code when new states occur (you just make a new class with the same interface!)

- How to refactor
    - Create a state/strategy class
    - Pass it into the class containing the operation you want to do (i.e. the context class) via composition

- Drawbacks
    - If you have many states/strategies, you can end up with many classes

- Relationships with other refactoring methods
    - Opposite
        - Replace Subclass with Fields
    - Similar
        - Replace Type Code with Class
        - Replace Type Code with Subclasses

- Related code smells
    - Primitive Obsession

- Related Design Patterns
    - State
    - Strategy

- Example:

In [38]:
from abc import ABC, abstractmethod
from enum import Enum

class BloodType(Enum):
    A = 1
    B = 2
    AB = 3
    O = 4

class Person(ABC):
    @property
    @abstractmethod
    def bloodtype(self):
        ...

class ABPerson(Person):
    def __init__(self):
        self._bloodtype = BloodType.AB
    
    @property
    def bloodtype(self): #type: ignore
        return self._bloodtype

class OPerson(Person):
    def __init__(self):
        self._bloodtype = BloodType.O
    
    @property
    def bloodtype(self): #type: ignore
        return self._bloodtype


class BloodTypePrinter():
    def __init__(self, person: Person): 
        self.person: Person = person
    
    def print_bloodtype(self):
        print(self.person.bloodtype)

    def update_bloodtype(self, person: Person):
        self.person: Person = person

test = BloodTypePrinter(ABPerson())
test.print_bloodtype()
test.update_bloodtype(OPerson())
test.print_bloodtype()

BloodType.AB
BloodType.O
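
For comparison, a sketch where the behaviour itself lives in the interchangeable state classes and the "type" changes at runtime. All names here (`EmployeeType`, `Engineer`, `Manager`, `Employee`) and the bonus figure are hypothetical and are not part of the example above:

```python
from abc import ABC, abstractmethod


class EmployeeType(ABC):
    @abstractmethod
    def pay(self, base: int) -> int:
        ...


class Engineer(EmployeeType):
    def pay(self, base: int) -> int:
        return base


class Manager(EmployeeType):
    BONUS = 1000  # assumed figure, for illustration only

    def pay(self, base: int) -> int:
        return base + self.BONUS


class Employee:
    # Context class: holds the current state object and delegates to it
    def __init__(self, base: int, kind: EmployeeType):
        self.base = base
        self.kind = kind

    def pay(self) -> int:
        return self.kind.pay(self.base)

    def promote(self) -> None:
        # The "type" changes at runtime by swapping the state object
        self.kind = Manager()


employee = Employee(3000, Engineer())
print(employee.pay())
employee.promote()
print(employee.pay())
```

The context class contains no `if`/`elif` chain on a type code. Supporting a new employee type would mean adding one more class with the same `pay` interface.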


## 15. Replace Subclass with Fields

- Problem
    - You have multiple subclasses where the only difference between them is that they return different values in their fields (attributes)

- Solution
    - Instead of splitting these into subclasses, just put them into a single parent class and distinguish them by putting different fields into the constructor

- Motivation
    - Minimise the number of unnecessary classes

- Relationships with other refactoring methods
    - Opposite 
        - Replace Type Code with Subclasses

- Example:

In [None]:
from abc import ABC, abstractmethod
from enum import Enum

class Person(ABC):

    @property
    @abstractmethod
    def gender(self):
        ...

## Subclasses that differ only in the constant value they return
class Male(Person):
    def __init__(self):
        self._gender = 'M'

    @property
    def gender(self):
        return self._gender

class Female(Person):
    def __init__(self):
        self._gender = 'F'

    @property
    def gender(self):
        return self._gender

######

class Gender(Enum):
    M = 1
    F = 2

## A single class, with the differing value passed into the constructor
class good__Person():
    def __init__(self, gender: Gender):
        self.gender = gender
    