Python Dataclasses
Dataclasses are a great way to eliminate repetitive boilerplate when designing simple, data-driven classes in Python. They provide an easy way to generate several of the special methods that appear in most Python classes. This article highlights the features of dataclasses that I feel are most important; it does not cover every feature. Please see the official documentation for a full module reference.
Why Are Dataclasses Useful?
Dataclasses provide a way to create a more batteries-included type of class in Python. Take any project and quickly glance over it for code that looks like this:
class Employee:
    def __init__(self, name: str, position: str, email: str = "default@gmail.com"):
        self.name = name
        self.position = position
        self.email = email

    def __repr__(self):
        return "{}(name={}, position={}, email={})".format(self.__class__.__name__, self.name, self.position, self.email)

    def __eq__(self, other):
        if other.__class__ is self.__class__:
            return (self.name, self.position, self.email) == (other.name, other.position, other.email)
        else:
            return NotImplemented
If you have spent some time with Python, then you know that these three methods (__init__, __repr__, and __eq__) are pretty essential to any class.
- __init__ is the constructor for the class and defines how objects of the class are created.
- __repr__ provides a readable representation of the object for programmers using it.
- __eq__ lets you test whether object1 == object2.
Dataclasses seek to eliminate the manual implementation of methods like the ones seen above and more. Dataclasses essentially assume and implement the functionality of various methods for you, but still provide you with the flexibility to provide your own implementation if needed.
Basic Usage
Let’s see how using dataclasses can slim down our class example from above. First, import dataclass from dataclasses:
from dataclasses import dataclass
Then apply the @dataclass decorator above our Employee class declaration:
@dataclass
class Employee:
    name: str
    position: str
    email: str
Now, go ahead and test some of the class’s functionality…
emp = Employee("dan", "swe", "foo@gmail.com")
print(emp)
# Employee(name='dan', position='swe', email='foo@gmail.com')
emp2 = Employee("dan", "swe", "foo@gmail.com")
print(emp == emp2)
# True
We can see from above that the dataclass decorator has provided our class with some great basic functionality.
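As a rough sketch of what "basic functionality" means here, the snippet below checks a few more behaviors of the generated methods: keyword construction, inequality when a field differs, and the fact that the methods are defined on the class itself. The variable names (same, different, generated) are only illustrative.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    name: str
    position: str
    email: str


# The generated __init__ accepts keyword arguments as well as positional ones.
emp = Employee(name="dan", position="swe", email="foo@gmail.com")

# The generated __eq__ compares the fields of two instances of the same class.
same = Employee("dan", "swe", "foo@gmail.com")
different = Employee("dan", "manager", "foo@gmail.com")
print(emp == same)       # True
print(emp == different)  # False

# The special methods live on Employee itself rather than being inherited.
generated = [m for m in ("__init__", "__repr__", "__eq__") if m in vars(Employee)]
print(generated)  # ['__init__', '__repr__', '__eq__']
```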
Setting Default Instance Variables
Setting defaults with dataclasses comes with a few things to be aware of.
1.) Instance variables with default values must come after those without a default value
So this is invalid:
@dataclass
class Employee:
    name: str = "foo"
    position: str
    email: str
It raises: TypeError: non-default argument 'position' follows default argument
But this is valid:
@dataclass
class Employee:
    position: str
    email: str
    name: str = "foo"
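The ordering rule exists because the decorator turns the fields into __init__ parameters in declaration order, and Python does not allow a parameter without a default to follow one with a default. The sketch below shows that the error is raised when the class is defined, before any object is created. BadEmployee and error_message are illustrative names, and the exact wording of the message may vary slightly between Python versions.

```python
from dataclasses import dataclass

error_message = None

try:
    @dataclass
    class BadEmployee:
        name: str = "foo"
        position: str  # no default, but it follows a field that has one
        email: str
except TypeError as exc:
    # Raised by the decorator at class-definition time.
    error_message = str(exc)

print(error_message)
```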
2.) If you want to further configure the way dataclass handles your default values, use the field function.
Take a look at this example:
from dataclasses import dataclass, field
@dataclass
class Employee:
    email: str
    position: str = field(default="swe", init=False, repr=False, compare=False)
    name: str = "foo"
Here we can see a few adjustments were made:
- The position variable has a default value of “swe”
- The position variable will not be included in the __init__ constructor
- The position variable will not be included in the __repr__ function output
- The position variable will not be considered in comparison functions (e.g. Employee1 == Employee2)
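To make the four adjustments above concrete, here is a small sketch that exercises each one on the same Employee class. The variable names (emp, other, rejected) and the "manager" value are only illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Employee:
    email: str
    position: str = field(default="swe", init=False, repr=False, compare=False)
    name: str = "foo"


emp = Employee("foo@gmail.com")

# Default value: position is set even though it was never passed in.
print(emp.position)  # swe

# repr=False: position is left out of the printed representation.
print(emp)  # Employee(email='foo@gmail.com', name='foo')

# init=False: position is not a constructor parameter, so passing it fails.
rejected = False
try:
    Employee("foo@gmail.com", position="manager")
except TypeError:
    rejected = True
print(rejected)  # True

# compare=False: objects that differ only in position still compare equal.
other = Employee("foo@gmail.com")
other.position = "manager"
print(emp == other)  # True
```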
If we need to define a default mutable value (e.g. a list or dictionary), use the default_factory parameter:
@dataclass(frozen=True)
class Employee:
    position: str
    email: str
    name: str = "foo"
    teams: list[str] = field(default_factory=list)
e1 = Employee("bar", "foobar@gmail.com")
print(e1.teams)
# []
We can see that by using default_factory, we can easily create default mutable values.
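A short sketch of why default_factory is needed: the decorator rejects a plain list as a default, since a single list would otherwise be shared by every instance, whereas default_factory calls list() once per object. SharedTeams, mutable_default_error, and the "platform" value are illustrative names, not part of the original example.

```python
from dataclasses import dataclass, field

mutable_default_error = None

try:
    @dataclass
    class SharedTeams:
        teams: list = []  # a plain mutable default is rejected
except ValueError as exc:
    mutable_default_error = str(exc)

print(mutable_default_error)


@dataclass
class Employee:
    position: str
    email: str
    name: str = "foo"
    teams: list[str] = field(default_factory=list)


e1 = Employee("swe", "foo@gmail.com")
e2 = Employee("swe", "bar@gmail.com")

# Each instance gets its own list, so mutating one does not affect the other.
e1.teams.append("platform")
print(e1.teams)  # ['platform']
print(e2.teams)  # []
```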
Those are just a few ways to manipulate the behavior of default values with the field function.
If no special configuration is needed, simply define your default values like you normally would (e.g. the name variable above).
Decorator Configuration
We can take dataclass configuration well beyond that of our instance variables. We can directly manipulate the way the class behaves by providing a few parameters to the decorator.
See below:
@dataclass(init=False, repr=False, eq=False)
class Employee:
    position: str
    email: str
    name: str = "foo"
We provided a few parameters here to disable the default generation of our __init__, __repr__, and __eq__ special methods. Those parameters can be useful if you want to explicitly define your own versions of those methods.
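As one possible illustration, the sketch below turns off the generated __init__ and __eq__, supplies hand-written versions, and leaves __repr__ to the decorator. The email normalization and the "equal if emails match" rule are hypothetical choices made up for this example.

```python
from dataclasses import dataclass


@dataclass(init=False, eq=False)
class Employee:
    position: str
    email: str
    name: str = "foo"

    def __init__(self, position: str, email: str, name: str = "foo"):
        self.position = position
        # Hypothetical rule: store emails trimmed and lowercased.
        self.email = email.strip().lower()
        self.name = name

    def __eq__(self, other):
        if other.__class__ is not self.__class__:
            return NotImplemented
        # Hypothetical rule: two employees are equal if their emails match.
        return self.email == other.email


a = Employee("swe", " Foo@Gmail.com ")
b = Employee("manager", "foo@gmail.com", name="bar")

# __repr__ was not disabled, so it is still generated for us.
print(a)  # Employee(position='swe', email='foo@gmail.com', name='foo')

# Our own __eq__ is used instead of a field-by-field comparison.
print(a == b)  # True
```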
Another parameter that is particularly useful and good to know is frozen, which allows us to create immutable objects.
@dataclass(frozen=True)
class Employee:
    position: str
    email: str
    name: str = "foo"
Any attempt to modify an Employee object after creation will raise an error.
e1 = Employee("bar", "foobar@gmail.com")
e1.name = "bar"
# dataclasses.FrozenInstanceError: cannot assign to field 'name'
This can be useful if you want to ensure that objects of a class are not changed after initialization. The implementation details are nicely taken care of by the dataclasses module.
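Immutability raises the question of how to "change" a frozen object. One option, sketched below, is dataclasses.replace, which builds a new object with selected fields changed and leaves the original untouched. The snippet also shows that a frozen dataclass with the default eq=True is hashable, so it can serve as a dictionary key. The names e2 and roster are illustrative.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Employee:
    position: str
    email: str
    name: str = "foo"


e1 = Employee("bar", "foobar@gmail.com")

try:
    e1.name = "bar"
except FrozenInstanceError as exc:
    print(exc)  # cannot assign to field 'name'

# replace() returns a new object; e1 itself is left unchanged.
e2 = replace(e1, name="bar")
print(e1.name)  # foo
print(e2.name)  # bar

# Frozen dataclasses (with the default eq=True) are hashable.
roster = {e1: "original", e2: "renamed"}
print(len(roster))  # 2
```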
Drawbacks
Dataclasses have some minor drawbacks if used incorrectly.
Some of these drawbacks include:
- Codebase clutter - Using too many unnecessary dataclasses can cause objects to fly around with a lot of implicit behavior.
- Minor performance hit - The decorator generates a lot of functionality for you; if you never use it, that work wastes program resources (although the cost is likely unnoticeable).
Ultimately, if you use dataclasses as they were intended to be used and know what they entail, you won’t run into any problems using them in your codebase.
When To Use
Use dataclasses when:
- You need a simple class with minimal configuration to represent data.
Don’t use dataclasses when:
- You need to create a complex object and plan on having lots of encapsulation and private variables.
- Your object doesn’t represent data and instead handles mostly behavior (e.g. FileParser, ContextManager)