How To Use @dataclass in Python
Modified: July 19 2024
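A dataclass can be faster to write than a regular class, because the decorator generates boilerplate methods such as __init__, __repr__, and __eq__ for you.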
How classes are typically written:
class Fruit:
    def __init__(self, name: str, calories: float):
        self.name = name
        self.calories = calories

banana = Fruit('Banana', 10)
We can do the same with a dataclass:
from dataclasses import dataclass

@dataclass
class Fruit:
    name: str
    calories: float

banana = Fruit('Banana', 10)
With the first example, we wouldn’t be able to print the object nicely without writing a __repr__ or __str__ method, but here we can print out banana and get Fruit(name='Banana', calories=10), which looks much nicer. You could also write calories: float = 10 so that calories is set to 10 by default.
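As a minimal sketch of the default value mentioned above, the example below reuses the Fruit class from these notes. The variable names default_fruit and custom_fruit are only illustrative.

```python
from dataclasses import dataclass


@dataclass
class Fruit:
    name: str
    calories: float = 10  # used when the caller does not pass calories


default_fruit = Fruit('Banana')    # calories falls back to the default
custom_fruit = Fruit('Apple', 52)  # calories is given explicitly

print(default_fruit)  # Fruit(name='Banana', calories=10)
print(custom_fruit)   # Fruit(name='Apple', calories=52)
```

Note that fields with a default value have to come after the fields without one, otherwise the class definition raises a TypeError.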
Another nice feature that dataclasses provide is comparison. With a regular class, the equality operator, ==, returns False for two instances with the same content, since by default it only checks whether they are the same object. One workaround is to fall back on isinstance(), which returns True for each object, and compare the results like so:
banana = Fruit('Banana', 10)
banana2 = Fruit('Banana', 10)
print(isinstance(banana, Fruit) == isinstance(banana2, Fruit))
However, this isn’t very pretty, and it only checks that both objects are instances of the same class, not that their contents match.
With a dataclass, == compares the field values, so the direct comparison works:
banana = Fruit('Banana', 10)
banana2 = Fruit('Banana', 10)
print(banana == banana2)
Another way we could achieve this with regular classes is by writing an __eq__ method, like so:
def __eq__(self, other):
    return self.__dict__ == other.__dict__
With such a basic example, it’s hard to argue against writing a few extra lines. However, when the class gets more complex, I could see a dataclass being more useful.
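To see the difference in comparison behavior side by side, here is a small sketch. PlainFruit is a hypothetical name used only so the regular class and the dataclass can live in the same file.

```python
from dataclasses import dataclass


class PlainFruit:  # hypothetical name for the regular-class version
    def __init__(self, name: str, calories: float):
        self.name = name
        self.calories = calories


@dataclass
class Fruit:
    name: str
    calories: float


# A regular class without __eq__ compares by identity.
print(PlainFruit('Banana', 10) == PlainFruit('Banana', 10))  # False

# The dataclass generates __eq__, which compares the fields.
print(Fruit('Banana', 10) == Fruit('Banana', 10))  # True
print(Fruit('Banana', 10) == Fruit('Apple', 52))   # False
```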
By default, dataclasses allow the same dot operator modifications as normal classes. If we set the frozen parameter of the dataclass decorator to True, the class becomes read-only: assigning to a field raises an error instead of modifying the instance.
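A short sketch of a frozen dataclass, assuming the same Fruit fields used throughout these notes. The failed assignment raises FrozenInstanceError, which is importable from the dataclasses module.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Fruit:
    name: str
    calories: float


banana = Fruit('Banana', 10)

try:
    banana.calories = 20  # not allowed on a frozen dataclass
except FrozenInstanceError as error:
    print(f'Could not modify banana: {error}')

print(banana.calories)  # 10
```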
By default, Python gives every instance of a class or dataclass its own dictionary (__dict__) to make it easier to store and grab attributes. Slots are a good way to avoid creating that dictionary every time we make a new instance of the class. To use them, we define the __slots__ attribute when creating our dataclass. When __slots__ is written by hand like this, we can’t give the fields default values, and the field names are listed as strings (a list is used here, though a tuple works as well):
@dataclass
class Fruit:
    __slots__ = ['name', 'calories']
    name: str
    calories: float
We’ll now have faster attribute access, and memory usage should go down as well.
From Python 3.10 onward, we can achieve the same result more simply:
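The sketch below checks two of the claims about hand-written slots: instances no longer carry a __dict__, and a default value conflicts with the slot of the same name. FruitWithDefault is a hypothetical class added only to trigger that error.

```python
from dataclasses import dataclass


@dataclass
class Fruit:
    __slots__ = ['name', 'calories']
    name: str
    calories: float


banana = Fruit('Banana', 10)
print(hasattr(banana, '__dict__'))  # False

try:
    @dataclass
    class FruitWithDefault:  # hypothetical class, only here to show the failure
        __slots__ = ['name', 'calories']
        name: str
        calories: float = 10  # clashes with the 'calories' slot
except ValueError as error:
    print(f'Could not create class: {error}')
```

The error is raised by Python itself while the class body is being turned into a class, before the dataclass decorator even runs.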
@dataclass(slots=True)
class Fruit:
    name: str
    calories: float
If you want to change the str or repr representation, you can always override it by defining the appropriate __str__ or __repr__ method.
If you pass kw_only=True to the dataclass decorator (available from Python 3.10), you’ll only be able to assign values via keyword arguments. The example below also defines a custom __str__:
@dataclass(kw_only=True)
class Fruit:
    name: str
    calories: float

    def __str__(self):
        return f'{self.name}: {self.calories} calories'