
<h1 style="text-align: center;"><a title="Data Science-AIMS-Cmr-2021-22">Chapter 1: Dataclasses</a></h1>

**Instructor:** 

* Rockefeller


##  **Learning Objectives**

By the end of this lesson, you will be able to:
1. Understand the difference between traditional Python classes and dataclasses
2. Create dataclasses to represent structured data
3. Use type hints to make your code more readable and maintainable
4. Work with nested dataclasses for complex data structures



###  **1. From Python Classes to Dataclasses**

### Motivation

Classes are the blueprint for creating objects in Python. They allow you to group related data (called attributes) and functionality (called methods) into a single, organized unit.

Traditional Python classes require a lot of **boilerplate code**. Every time you want to create a class that just **stores data**, you end up writing:

  * An `__init__` method
  * A `__repr__` method for nice printing
  * Sometimes `__eq__` for comparisons

That’s fine, but it gets repetitive when you have dozens of data-focused classes.

**Dataclasses** (added in Python 3.7 through the standard-library `dataclasses` module) were introduced to **reduce this boilerplate** while keeping the clarity of classes.

### **2. The Difference Between Classes and Dataclasses**

* **Normal Class**

  * You write all the methods (`__init__`, `__repr__`, etc.) yourself.
  * Flexible, but verbose.
* **Dataclass**

  * Python automatically generates `__init__`, `__repr__`, `__eq__`, etc.
  * Best for classes whose main job is to **store structured data**.
  * Still lets you add custom methods if needed.

In short: **classes** are general-purpose tools, while **dataclasses** are specialized tools for data modeling.


### **3. The Importance of Dataclasses (especially for data-centric tasks)**

1. **Cleaner Code** – Less boilerplate, more focus on what matters.
2. **Readability** – The structure of data is obvious at first glance.
3. **Maintainability** – Fewer lines of code → fewer bugs.
4. **Extensibility** – You can still add methods like in normal classes.
5. **Data-Centric Tasks** – Perfect for representing rows in a dataset, configurations, parameters, structured inputs/outputs.


### **4. Examples – Simple Dataclass Creation**

In [None]:
from dataclasses import dataclass
from typing import List, Dict, Optional, Tuple, Literal


@dataclass
class Student:
    name: str
    age: int
    height: float  # in meters


* No `__init__`, no `__repr__` needed.
* Usage:

In [3]:
student_0001 = Student(name="Zeinab", age=24, height=1.75)
print(student_0001)

Student(name='Zeinab', age=24, height=1.75)



###  **Understanding What Just Happened**

When you use the `@dataclass` decorator, Python automatically creates several special methods for you:

1. **`__init__()`** - Initializes the object with the fields you defined
   - You don't have to write `self.name = name`, etc.
   - All fields become parameters to `__init__` automatically

2. **`__repr__()`** - Provides a nice string representation when you print the object
   - Shows the class name and all field values
   - Very useful for debugging

3. **`__eq__()`** - Allows you to compare two objects for equality
   - Two students with the same data will be considered equal
   - Useful for testing and deduplication
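To see what the decorator saves you, here is a sketch of a hand-written class that behaves roughly like the methods generated for `Student`. The real generated code differs in small details (for example, exactly how `__eq__` compares fields varies across Python versions), and `ManualStudent` is a hypothetical name used only for this comparison.

```python
from dataclasses import dataclass


# Hand-written class: roughly what @dataclass generates for us
class ManualStudent:
    def __init__(self, name: str, age: int, height: float):
        self.name = name
        self.age = age
        self.height = height

    def __repr__(self) -> str:
        return (f"{type(self).__name__}(name={self.name!r}, "
                f"age={self.age!r}, height={self.height!r})")

    def __eq__(self, other):
        # Only compare objects of exactly the same class
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.name, self.age, self.height) == (other.name, other.age, other.height)


# The same thing with a dataclass: three lines of fields
@dataclass
class Student:
    name: str
    age: int
    height: float  # in meters


a = Student(name="Zeinab", age=24, height=1.75)
b = Student(name="Zeinab", age=24, height=1.75)
print(a == b)  # True: same class and equal field values
print(a is b)  # False: still two distinct objects
print(ManualStudent("Zeinab", 24, 1.75))  # ManualStudent(name='Zeinab', age=24, height=1.75)
```

The dataclass version is shorter and cannot drift out of sync: if you add a field, `__init__`, `__repr__` and `__eq__` all pick it up automatically.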

**Try it yourself:** Create another student and print it. Notice how readable the output is!


In [None]:
# TODO: Write your code here




In [6]:
@dataclass
class Book:
    title: str
    author: str
    pages: int
    year_of_release: int

In [7]:
book_01 = Book(title="Balafon", author="Engelbert Mveng",
               pages=250, year_of_release=1981)
print(book_01)

Book(title='Balafon', author='Engelbert Mveng', pages=250, year_of_release=1981)



### ✏️ **Practice Exercise 1: African Literature**

Create dataclass instances for these famous African books:
1. "Things Fall Apart" by Chinua Achebe (Nigeria), 209 pages, published 1958
2. "Half of a Yellow Sun" by Chimamanda Ngozi Adichie (Nigeria), 433 pages, published 2006
3. "Nervous Conditions" by Tsitsi Dangarembga (Zimbabwe), 204 pages, published 1988

**Tasks:**
- Create three Book instances with the data above
- Store them in a list called `african_books`
- Print all three books
- Calculate and print the average number of pages
- Find and print the oldest book

**Hint:** Use a for loop to iterate through the list!


In [None]:
# TODO: Write your answer here





Let's look at another example.

In [11]:
@dataclass
class AIMSCourse:
    block_category: str
    name: str
    lecturer: str
    students_group: Literal["Regular Master", "COOP"]
    area: dict[str, str]

In [12]:
course_001 = AIMSCourse(block_category="Skill block",
                        name='Data prep', lecturer="Mr Rockefeller",
                        students_group='COOP',
                        area={"1": "data science", "2": "Programming"})

course_001

AIMSCourse(block_category='Skill block', name='Data prep', lecturer='Mr Rockefeller', students_group='COOP', area={'1': 'data science', '2': 'Programming'})

In [25]:
course_001.area


{'1': 'data science', '2': 'Programming'}

## Let's model students across the AIMS Network

In [None]:
@dataclass
class AIMSStudent:
    """Represents a student at AIMS centers across Africa"""
    name: str
    country: str
    program: str
    center: str
    batch: str

    # def __repr__(self):
    #     return f"Hey, my name is {self.name}, student from {self.center}, batch {self.batch}.  I am originally from {self.country}"


    

In [21]:
aims_sen_001 = AIMSStudent(name="Tidiane Fall",
                           country="The Gambia",
                           program="Regular Master",
                           center="AIMS Senegal",
                           batch="2025")



aims_sen_001

AIMSStudent(name='Tidiane Fall', country='The Gambia', program='Regular Master', center='AIMS Senegal', batch='2025')

In [23]:
aims_rw_001 = AIMSStudent(name="Ali Gassama",
                          country="Burkina Faso",
                          program="COOP",
                          center="AIMS Rwanda",
                          batch="2019")

aims_rw_001

AIMSStudent(name='Ali Gassama', country='Burkina Faso', program='COOP', center='AIMS Rwanda', batch='2019')

## Another example: Representing the coordinates of a point on the x-y plane

In [None]:
@dataclass
class Point:
    x: int
    y: int

p1 = Point(x=1, y=2)
p2 = Point(x=-2, y=10.5)  # 10.5 is not an int, but dataclasses do not check types at runtime

print(p1 == p2)

False


## Representing a rectangle and modeling its behaviour

In [35]:
@dataclass
class Rectangle:
    width: float
    height: float

    def area(self) -> float:
        return self.width * self.height

    def perimeter(self) -> float:
        return 2 * (self.width + self.height)

In [36]:
fig_001 = Rectangle(width=10, height=15)
fig_001

Rectangle(width=10, height=15)

In [40]:
area = fig_001.area() 
perimeter = fig_001.perimeter()
area, perimeter

(150, 50)


## **Understanding Type Hints in Dataclasses**

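Type hints are annotations that tell other developers and tools (IDEs, type checkers such as mypy) what type of data each field should contain. In a dataclass they also do one more job: `@dataclass` reads the annotations to decide which class attributes become fields, so a class attribute without a type hint is not a field.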

### Common Type Hints:

| Type Hint | Meaning | Example Values | Use Case |
|-----------|---------|----------------|----------|
| `str` | Text/string | `"Amina"`, `"Nigeria"` | Names, addresses, descriptions |
| `int` | Whole number | `24`, `2024`, `-5` | Ages, years, counts |
| `float` | Decimal number | `1.75`, `88.5`, `3.14` | Heights, grades, prices |
| `bool` | True/False | `True`, `False` | Flags, status indicators |
| `List[str]` | List of strings | `["Python", "Data Science"]` | Tags, skills, courses |
| `Dict[str, int]` | Dictionary | `{"age": 24, "year": 2024}` | Mappings, configurations |
| `Optional[str]` | String or None | `"value"` or `None` | Optional fields |
| `Literal["A", "B"]` | Specific values only | `"A"` or `"B"` | Restricted choices |
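The sketch below puts every hint from the table into one dataclass. `AIMSApplicant` and its fields are hypothetical, chosen only for illustration. Note that `Optional[str]` only says the value *may* be `None`; the field is still a required argument unless you give it a default, as `middle_name` does here. Fields with defaults must come after fields without them.

```python
from dataclasses import dataclass
from typing import Dict, List, Literal, Optional


@dataclass
class AIMSApplicant:  # hypothetical class, for illustration only
    name: str                  # text
    age: int                   # whole number
    gpa: float                 # decimal number
    has_scholarship: bool      # True / False
    skills: List[str]          # list of strings
    scores: Dict[str, int]     # subject -> score mapping
    middle_name: Optional[str] = None                            # a string or None
    track: Literal["Regular Master", "COOP"] = "Regular Master"  # restricted choices


applicant = AIMSApplicant(
    name="Amina",
    age=24,
    gpa=3.6,
    has_scholarship=True,
    skills=["Python", "Data Science"],
    scores={"math": 88, "statistics": 92},
)
print(applicant.middle_name)  # None (the default was used)
print(applicant.track)        # Regular Master
```

Since Python 3.9 you can also write the built-in forms `list[str]` and `dict[str, int]` instead of `List` and `Dict`, as the `AIMSCourse` example does with `dict[str, str]`.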

### Why Type Hints Matter:

1. **Self-Documentation** - Code explains itself
   ```python
   # Without type hints - unclear
   def process(data):
       return data * 2
   
   # With type hints - crystal clear
   def process(data: int) -> int:
       return data * 2
   ```

2. **IDE Support** - Your code editor can:
   - Provide better autocomplete suggestions
   - Warn you about potential type errors
   - Show you what methods are available


3. **Team Collaboration** - Other developers understand your code faster

### ⚠️ Important Note:

In regular dataclasses, type hints are **NOT enforced at runtime**. They're mainly for documentation and tooling.

```python
from dataclasses import dataclass

@dataclass
class Student:
    name: str
    age: int

# This will NOT raise an error, even though both types are wrong
student = Student(name=123, age="twenty")
```

**Coming up:** In the next lesson, we'll learn about Pydantic, which DOES enforce types at runtime!
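If you need a basic check before reaching for Pydantic, one option is the `__post_init__` hook, which the generated `__init__` calls right after it has set the fields. The sketch below hand-writes `isinstance` checks for two fields. `CheckedStudent` is a hypothetical name, and this approach does not validate arbitrary hints (such as `List[str]`) for you; every check has to be written by hand.

```python
from dataclasses import dataclass


@dataclass
class CheckedStudent:  # hypothetical class, for illustration only
    name: str
    age: int

    def __post_init__(self) -> None:
        # Runs right after the generated __init__ has assigned the fields
        if not isinstance(self.name, str):
            raise TypeError(f"name must be str, got {type(self.name).__name__}")
        if not isinstance(self.age, int):
            raise TypeError(f"age must be int, got {type(self.age).__name__}")


ok = CheckedStudent(name="Zeinab", age=24)  # passes both checks

try:
    CheckedStudent(name=123, age="twenty")
except TypeError as err:
    print(err)  # name must be str, got int
```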


In [26]:
from typing import Optional

@dataclass
class AIMSEmployee:
    name: str
    department: Optional[str] 

In [27]:
emp_001 = AIMSEmployee(name="Mariama Diop", department="Kitchen")
emp_002 = AIMSEmployee(name="Joel", department="Academic")
emp_003 = AIMSEmployee(name="Timothee", department="Logistics")


In [28]:
emp_003.__dict__

{'name': 'Timothee', 'department': 'Logistics'}

In [30]:
aims_employees  = []

aims_employees.append(emp_001.__dict__)
aims_employees.append(emp_002.__dict__)

aims_employees.append(emp_003.__dict__)

aims_employees

[{'name': 'Mariama Diop', 'department': 'Kitchen'},
 {'name': 'Joel', 'department': 'Academic'},
 {'name': 'Timothee', 'department': 'Logistics'}]

In [None]:
import json
with open('all_aims_employees.json', 'w') as file:
    json.dump(aims_employees, file, indent=8)
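The cells above turn each employee into a dictionary and write the list to JSON. Going the other way is just as short: each dictionary can be unpacked with `**` into the dataclass constructor. The sketch below uses `dataclasses.asdict` instead of `__dict__`, and works on an in-memory string with `json.dumps`/`json.loads` so it runs on its own; with a file you would use `json.dump`/`json.load` as above.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class AIMSEmployee:
    name: str
    department: Optional[str]


employees = [
    AIMSEmployee(name="Mariama Diop", department="Kitchen"),
    AIMSEmployee(name="Joel", department="Academic"),
]

# Dataclass -> dict -> JSON text
payload = json.dumps([asdict(e) for e in employees], indent=4)

# JSON text -> dict -> dataclass: ** unpacks each dict into keyword arguments
restored = [AIMSEmployee(**row) for row in json.loads(payload)]
print(restored == employees)  # True
```

This round trip works because the JSON keys match the field names exactly; a misspelled or extra key would make the constructor raise `TypeError`.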


In [None]:
import pandas as pd

df = pd.read_json('all_aims_employees.json')
df

|   | name | department |
|---|------|------------|
| 0 | Mariama Diop | Kitchen |
| 1 | Joel | Academic |
| 2 | Timothee | Logistics |

Now, let's revisit the `department` field and introduce enums.

Sometimes, you might want to represent a fixed set of related, unchanging constant values in a safe and readable way. Rather than using "magic strings" or numbers, you can use Python enums. 

Short for *enumerations*, enums let you create a set of symbolic names (members) bound to unique, constant values. An enum also gives these constants their own namespace, which improves code clarity and reduces errors such as typos in repeated strings.

In [1]:
from enum import Enum


class AIMSDepartment(str, Enum):
    KITCHEN = "kitchen"
    ACADEMIC = "Academic"
    LOGISTICS = "Logistics"


In [55]:
AIMSDepartment.ACADEMIC

<AIMSDepartment.ACADEMIC: 'Academic'>
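A few everyday operations on an enum, sketched against the `AIMSDepartment` class above (redefined here so the snippet runs on its own). `"Finance"` is a made-up value used only to show what happens with a department that is not in the enum.

```python
from enum import Enum


class AIMSDepartment(str, Enum):
    KITCHEN = "kitchen"
    ACADEMIC = "Academic"
    LOGISTICS = "Logistics"


dept = AIMSDepartment.ACADEMIC
print(dept.name)    # ACADEMIC  (the symbolic name)
print(dept.value)   # Academic  (the bound constant)
print(AIMSDepartment("Logistics") is AIMSDepartment.LOGISTICS)  # True: lookup by value
print([d.value for d in AIMSDepartment])  # ['kitchen', 'Academic', 'Logistics']
print(dept == "Academic")  # True, because the enum also inherits from str

try:
    AIMSDepartment("Finance")  # not a member, so the lookup fails
except ValueError as err:
    print(err)
```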

In [56]:
@dataclass
class AIMSEmployeeSystem:
    name: str
    department: AIMSDepartment



In [57]:
emp_001 = AIMSEmployeeSystem(name="Mariama Diop", department=AIMSDepartment.KITCHEN)
emp_002 = AIMSEmployeeSystem(name="Joel", department=AIMSDepartment.ACADEMIC)
emp_003 = AIMSEmployeeSystem(name="Timothee", department=AIMSDepartment.LOGISTICS)

emp_001

AIMSEmployeeSystem(name='Mariama Diop', department=<AIMSDepartment.KITCHEN: 'kitchen'>)

In [58]:
aims_employees_002  = []

aims_employees_002.append(emp_001.__dict__)
aims_employees_002.append(emp_002.__dict__)

aims_employees_002.append(emp_003.__dict__)

aims_employees_002

[{'name': 'Mariama Diop', 'department': <AIMSDepartment.KITCHEN: 'kitchen'>},
 {'name': 'Joel', 'department': <AIMSDepartment.ACADEMIC: 'Academic'>},
 {'name': 'Timothee', 'department': <AIMSDepartment.LOGISTICS: 'Logistics'>}]


In [60]:
import json
with open('all_aims_employees_001.json', 'w') as file:
    json.dump(aims_employees_002, file, indent=8)
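Why inherit from `str` as well as `Enum`? Because each member is then also a string, `json` can write it out as its plain value. A plain `Enum` member is not JSON serializable, and `json.dumps` raises `TypeError`. `PlainDepartment` below is a hypothetical class added only to show that contrast.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum


class AIMSDepartment(str, Enum):
    KITCHEN = "kitchen"
    ACADEMIC = "Academic"
    LOGISTICS = "Logistics"


class PlainDepartment(Enum):  # hypothetical: same idea, but without the str mixin
    ACADEMIC = "Academic"


@dataclass
class AIMSEmployeeSystem:
    name: str
    department: AIMSDepartment


emp = AIMSEmployeeSystem(name="Joel", department=AIMSDepartment.ACADEMIC)
print(json.dumps(asdict(emp)))  # {"name": "Joel", "department": "Academic"}

try:
    json.dumps({"department": PlainDepartment.ACADEMIC})
except TypeError as err:
    print("plain Enum is not JSON serializable:", err)
```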



In [None]:
from enum import Enum

class StudentGroup(str, Enum):
    REGULAR = "Regular Program"
    COOP = "Coop Program"


### **5. Mixing Classes and Dataclasses**

Sometimes, you’ll want the **best of both worlds**:

* Use dataclasses for **data storage**
* Use normal methods for **behavior**


### Nested Dataclasses


In [61]:
@dataclass
class Address:
    street: str
    city: str

@dataclass
class Person:
    name: str
    age: int
    address: Address

In [64]:
addr_001 = Address(street="Kwame Nkrumah Avenue",
                   city="Accra")

addr_002 = Address(street="Cimetière de Yoff", city="Dakar")


In [65]:
person_001 = Person(name="Alice",
                    age=26,
                    address=addr_001)

person_002 = Person(name="Mariama", age=31, address=addr_002)

person_001


Person(name='Alice', age=26, address=Address(street='Kwame Nkrumah Avenue', city='Accra'))

In [66]:
person_002

Person(name='Mariama', age=31, address=Address(street='Cimetière de Yoff', city='Dakar'))
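Nested fields are reached by chaining attribute access. When you need plain dictionaries (for JSON, as in the employee example), `dataclasses.asdict` converts nested dataclasses recursively, whereas `__dict__` only goes one level deep and leaves `address` as an `Address` object. The sketch repeats the two classes so it runs on its own.

```python
from dataclasses import asdict, dataclass


@dataclass
class Address:
    street: str
    city: str


@dataclass
class Person:
    name: str
    age: int
    address: Address


person = Person(name="Mariama", age=31,
                address=Address(street="Cimetière de Yoff", city="Dakar"))

print(person.address.city)  # Dakar: chain attribute access to reach nested fields
print(person.__dict__)      # 'address' is still an Address object
print(asdict(person))       # 'address' is converted to a plain dict as well
```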

### Mixing Normal Class and Dataclass