![](../header.jpg)

# DataClass

Kevin J. Walchko, PhD

19 July 2020

---

How `dataclass` compares to the alternatives:

- `namedtuple`: an alternative, but it has some issues:
    - immutable
    - its instances are really just tuples, so they compare equal to any plain `tuple` holding the same values
    - default values are awkward to add (the `defaults` argument only arrived in python 3.7)
- `attrs`: has been around a long time and was an inspiration for `dataclass`, so the two have a lot of similarities
    - mutable, also supports `frozen`
    - not in the standard library, need to do: `pip install attrs`
    - supported by python 2.7 and >3.4
- `dataclass`
    - mutable, but can create a `frozen` class
    - supported by python >=3.7
    - if a parent class has a field with a default value, every field added by a child class *must* have a default too

There is a backport of `dataclass` to 3.6, install with: `pip install dataclasses`
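
A minimal sketch of the first two `namedtuple` points from the list above. The names `PointNT` and `PointDC` are made up for this illustration and are not used elsewhere in the notebook.

```python
from collections import namedtuple
from dataclasses import dataclass

# hypothetical types, only for this comparison
PointNT = namedtuple("PointNT", ["x", "y"])

@dataclass
class PointDC:
    x: int
    y: int

nt = PointNT(1, 2)
dc = PointDC(1, 2)

# a namedtuple is a tuple, so it equals any tuple with the same values
nt_equals_tuple = (nt == (1, 2))
# a dataclass only compares equal to instances of its own class
dc_equals_tuple = (dc == (1, 2))

# namedtuple fields cannot be reassigned
try:
    nt.x = 10
    nt_mutable = True
except AttributeError:
    nt_mutable = False

# dataclass fields can (unless the class is frozen)
dc.x = 10

print(nt_equals_tuple, dc_equals_tuple, nt_mutable, dc)
```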

## References

- Python docs: [dataclasses](https://docs.python.org/3/library/dataclasses.html)
- Real Python: [dataclasses](https://realpython.com/python-data-classes/)

# `dataclass`

In [33]:
from dataclasses import dataclass, field
from dataclasses import asdict, astuple
from dataclasses import InitVar
import time

In [34]:
@dataclass
class C:
    a: int       # 'a' has no default value
    b: int = 0   # assign a default value for 'b'
        
c = C(1)
print(c)

C(a=1, b=0)


In [35]:
@dataclass
class Point:
    x: int
    y: int

In [45]:
p = Point(10, 20)
print(p)
print(p.__sizeof__()) # shallow size of the instance only, the field values are not counted
print(type(p.x), type(p.y))
print(asdict(p))

# represent objects as dicts or tuples
assert asdict(p) == {'x': 10, 'y': 20}
assert astuple(p) == (10, 20)

Point(x=10, y=20)
32
<class 'int'> <class 'int'>
{'x': 10, 'y': 20}


In [37]:
pp = astuple(p)
print(pp)
ppp = Point(*pp)
print(ppp)

(10, 20)
Point(x=10, y=20)
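
One thing worth knowing about `asdict` and `astuple` is that they recurse into nested dataclasses. A small sketch, assuming a hypothetical `Segment` class built from two `Point`s:

```python
from dataclasses import dataclass, asdict, astuple

@dataclass
class Point:
    x: int
    y: int

# hypothetical container class, only for this illustration
@dataclass
class Segment:
    start: Point
    end: Point

s = Segment(Point(0, 0), Point(1, 2))

# nested dataclasses are converted too, not left as Point objects
d = asdict(s)
t = astuple(s)
print(d)
print(t)

# so the round trip is lossy here: the fields come back as plain tuples
s2 = Segment(*t)
print(type(s2.start))
```

Because of that recursion, `Segment(*astuple(s))` does not rebuild the original object the way `Point(*astuple(p))` does for a flat class.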


In [6]:
@dataclass
class CircleArea:
    r: int
    pi: float = 3.14

    @property
    def area(self):
        return self.pi * (self.r ** 2)
    
cir = CircleArea(3)
print(cir)
print(cir.area)

CircleArea(r=3, pi=3.14)
28.26


## Inheritance

In [7]:
from typing import Any

@dataclass
class Base:
    x: Any = 15.0
    y: int = 0

@dataclass
class H(Base):
    z: int = 10
    x: int = 15
        
h = H()
print(h)

# order is base, then derived
# H(x,y,z)
hh = H(15,22,5)
print(hh)

H(x=15, y=0, z=10)
H(x=15, y=22, z=5)
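
The field order can be checked with `fields()`. This sketch repeats the two classes so it runs on its own; note that `x`, which the child redefines, keeps its original position from the base class but takes the child's type and default.

```python
from dataclasses import dataclass, fields
from typing import Any

@dataclass
class Base:
    x: Any = 15.0
    y: int = 0

@dataclass
class H(Base):
    z: int = 10
    x: int = 15   # overrides Base.x, but does not move it

# list each field in constructor order with its type and default
names = [f.name for f in fields(H)]
x_field = fields(H)[0]
print(names)
print(x_field.type, x_field.default)
```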


In [23]:
@dataclass
class Position:
    name: str
    lon: float = 0.0
    lat: float = 0.0

try:
    @dataclass
    class Capital(Position):
        country: str # this will FAIL, no default
except TypeError:
    print("FAIL here ... this needs a default param!")
        
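# the generated constructor would need this signature, which python rejects once
# lon and lat carry defaults. Writing __init__ by hand gets past the check, but
# dataclass keeps it as-is, so the fields are never set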
@dataclass
class Capital(Position):
    def __init__(self, name: str, lon: float, lat: float, country: str):
        pass

FAIL here ... this needs a default param!


In [15]:
pos = Position("here",1.1,2.2)
print(pos)

cap = Capital("seattle",1.1,2.2,"usa")
print(cap)

Position(name='here', lon=1.1, lat=2.2)


AttributeError: 'Capital' object has no attribute 'name'
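
One way around this, sketched below, is simply to give the child's field a default as well. The placeholder value `"unknown"` is an arbitrary choice for this example.

```python
from dataclasses import dataclass

@dataclass
class Position:
    name: str
    lon: float = 0.0
    lat: float = 0.0

@dataclass
class Capital(Position):
    # a default is required because lon and lat already have one
    country: str = "unknown"

cap = Capital("seattle", 1.1, 2.2, "usa")
print(cap)

# the default kicks in when country is left out
print(Capital("here"))
```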

## `field`

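There is also `fields()`, which returns a tuple of `Field` objects describing each field of a dataclass. `Field` is just the type of those objects and is not meant to be created directly (use `field()` for that). Additionally, you can pass `default_factory` to `field()` to build a fresh mutable default, such as a `list`, for every instance, or define `__post_init__()` to set a member after `__init__` has run.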
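
A short sketch of what `fields()` hands back, using a made-up `Sample` class:

```python
from dataclasses import dataclass, field, fields, Field, MISSING

# hypothetical class, only for this illustration
@dataclass
class Sample:
    a: int
    b: list = field(default_factory=list)

info = fields(Sample)

# each entry is a Field object carrying the name, type, and defaults
for f in info:
    print(f.name, f.type, isinstance(f, Field))

# a field without a default reports the MISSING sentinel
a_has_default = info[0].default is not MISSING
b_factory = info[1].default_factory
print(a_has_default, b_factory)
```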

In [48]:
@dataclass
class J:
    x: list = field(default_factory=list)
        
j = J([1,2,3,4,5])
print(j)
print(j.__sizeof__(), j.x.__sizeof__()) # shallow sizes: the instance does not include the list, and the list does not include its items
print(J())

J(x=[1, 2, 3, 4, 5])
32 104
J(x=[])


In [10]:
from typing import List

@dataclass
class K:
    mylist: List[int] = field(default_factory=list)

k = K()
k.mylist += [1, 2, 3]
print(k)

K(mylist=[1, 2, 3])
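
The reason for `default_factory` is that `dataclass` refuses a plain mutable default like `[]`, since every instance would end up sharing the same list. A sketch with two hypothetical classes, `Bad` and `Good`:

```python
from dataclasses import dataclass, field
from typing import List

# a bare list as a default is rejected when the class is created
try:
    @dataclass
    class Bad:
        items: List[int] = []
    rejected = False
except ValueError as err:
    rejected = True
    print("rejected:", err)

@dataclass
class Good:
    items: List[int] = field(default_factory=list)

# each instance gets its own list from the factory
a, b = Good(), Good()
a.items.append(1)
print(a, b)
```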


In [11]:
from collections import defaultdict
from typing import DefaultDict

@dataclass
class R:
    a: DefaultDict[str, List] = field(init=False, default_factory=defaultdict)
    
    def __post_init__(self):
        self.a = defaultdict(list)

r = R()
r.a["bob"].append(5)
print(r)

# print(R({'a':[1,2,3,4]}))  # FAIL: 'a' has init=False, so __init__ takes no arguments

rr = R()
rr.a['a'] = [1,2,3,4,5]
rr.a["b"].append(5)
print(rr)

R(a=defaultdict(<class 'list'>, {'bob': [5]}))
R(a=defaultdict(<class 'list'>, {'a': [1, 2, 3, 4, 5], 'b': [5]}))
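
The same result can probably be had without `__post_init__()` by handing `default_factory` a small lambda. This is a sketch of an alternative, not the version used above, and `R2` is a made-up name.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import DefaultDict, List

@dataclass
class R2:
    # the lambda runs for every new instance and returns a defaultdict of lists
    a: DefaultDict[str, List[int]] = field(
        init=False, default_factory=lambda: defaultdict(list))

r = R2()
r.a["bob"].append(5)   # missing keys start out as empty lists
print(r)

# a second instance starts with its own empty mapping
r_other = R2()
print(r_other)
```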


In [12]:
@dataclass
class F:
    a: float
    b: float
    c: float = field(init=False)
    ts: float = field(init=False, default_factory=time.time)  # called for each new instance; default=time.time() would run only once, at class definition

    def __post_init__(self):
        self.c = self.a + self.b

f = F(3.1,4)
print(f)

F(a=3.1, b=4, c=7.1, ts=1614479851.377162)
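
For a value like a timestamp, the difference between `default=` and `default_factory=` matters: an expression passed to `default=` is evaluated once, when the class is defined, while a `default_factory` is called for every new instance. The sketch below swaps the clock for a counter so the effect is repeatable; `Once` and `Each` are hypothetical names.

```python
import itertools
from dataclasses import dataclass, field

counter = itertools.count()   # stands in for time.time()

@dataclass
class Once:
    # next(counter) runs a single time, while the class body executes
    n: int = field(init=False, default=next(counter))

@dataclass
class Each:
    # the lambda runs every time an instance is created
    n: int = field(init=False, default_factory=lambda: next(counter))

once = [Once().n, Once().n, Once().n]
each = [Each().n, Each().n, Each().n]
print(once)
print(each)
```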


In [13]:
@dataclass
class D:
    x: int
    y: int = field(repr=False)
    z: int = field(repr=False, default=10)
    t: int = 20
        
print(D(1,2))
print(D(1,2,3,4))
print(D(1,1))

d = D(2,2)
print(d.x,d.y,d.z,d.t)

D(x=1, t=20)
D(x=1, t=4)
D(x=1, t=20)
2 2 10 20
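
Note that `repr=False` only hides a field from the printout. The field still takes part in comparisons unless `compare=False` is set too. A sketch, where `E` is a hypothetical variant added for contrast:

```python
from dataclasses import dataclass, field

@dataclass
class D:
    x: int
    y: int = field(repr=False)
    z: int = field(repr=False, default=10)
    t: int = 20

# hypothetical variant: y is hidden AND ignored by ==
@dataclass
class E:
    x: int
    y: int = field(repr=False, compare=False)

# the two D objects print the same but are not equal, because y differs
same_repr = repr(D(1, 2)) == repr(D(1, 1))
d_equal = D(1, 2) == D(1, 1)

# with compare=False the difference in y is ignored
e_equal = E(1, 2) == E(1, 1)
print(same_repr, d_equal, e_equal)
```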


## Immutable (Frozen)

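Not as good as `attrs`, and it can give unexpected results: `frozen=True` only blocks assigning to a field, so a mutable value such as a list can still be changed in place.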

In [15]:
@dataclass(frozen=True)
class Position:
    name: str
    lon: float = 0.0
    lat: float = 0.0

p = Position("a",1,2)
print(p)
p.lon = 4 # this will FAIL

Position(name='a', lon=1, lat=2)


FrozenInstanceError: cannot assign to field 'lon'

In [16]:
@dataclass(frozen=True)
class Position:
    name: str
    lon: float = 0.0
    lat: float = 0.0
    ww: List[int] = field(default_factory=list)

p = Position("a",1,2)
print(p)

p.ww.append(5)  # expect to FAIL, but PASS ... list isn't immutable
print(p)

p.name = "bob" # this will FAIL

Position(name='a', lon=1, lat=2, ww=[])
Position(name='a', lon=1, lat=2, ww=[5])


FrozenInstanceError: cannot assign to field 'name'
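
If the list really needs to be locked down, one option is to store a tuple instead and build changed copies with `dataclasses.replace()`. This is a sketch of that idea under the assumption that a tuple is an acceptable stand-in for the list.

```python
from dataclasses import dataclass, replace, FrozenInstanceError
from typing import Tuple

@dataclass(frozen=True)
class Position:
    name: str
    lon: float = 0.0
    lat: float = 0.0
    ww: Tuple[int, ...] = ()   # tuples have no append, so nothing can change in place

p = Position("a", 1, 2)

# replace() returns a new instance and leaves the original untouched
q = replace(p, ww=p.ww + (5,))
print(p)
print(q)

# direct assignment is still blocked
try:
    p.lon = 4
    blocked = False
except FrozenInstanceError:
    blocked = True
print(blocked)
```

A side benefit is that a frozen dataclass whose fields are all hashable can be hashed, so it can be used as a `dict` key or in a `set`.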

## Slots

`dataclass` instances store their fields in a per-instance `dict` (`__dict__`), but we can use `__slots__` to reduce memory and speed up attribute access. However, `attrs` does it better, since it supports slots directly with `slots=True`.

In [17]:
@dataclass
class SimplePosition:
    name: str
    lon: float
    lat: float

# WARNING: fields listed in __slots__ cannot have default values! A default becomes
# a class attribute, which clashes with the slot of the same name (ValueError)
@dataclass
class SlotPosition:
    __slots__ = ['name', 'lon', 'lat']
    name: str
    lon: float
    lat: float
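
A sketch of what the slots version changes in practice: the instance has no `__dict__`, new attributes cannot be added, and a default value on a slotted field is refused. `Bad` is a hypothetical class used only to trigger the error. Python 3.10 and later also accept `@dataclass(slots=True)`, which is left out here so the example runs on 3.7.

```python
from dataclasses import dataclass

@dataclass
class SlotPosition:
    __slots__ = ['name', 'lon', 'lat']
    name: str
    lon: float
    lat: float

sp = SlotPosition("a", 1.0, 2.0)
print(sp)

# no per-instance dict is created
has_dict = hasattr(sp, "__dict__")

# attributes outside __slots__ cannot be added
try:
    sp.alt = 3.0
    can_add = True
except AttributeError:
    can_add = False

# a default value collides with the slot of the same name
try:
    @dataclass
    class Bad:
        __slots__ = ['x']
        x: int = 0
    defaults_ok = True
except ValueError as err:
    defaults_ok = False
    print("rejected:", err)

print(has_dict, can_add, defaults_ok)
```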