Stateful Closures in Python
Closures are functions that have access to nonlocal variables in an enclosing scope.
Here is a quote that compares closures to objects,
Objects are data with methods attached, closures are functions with data attached. 1
Closure Example
Let’s imagine that we help run an online test proctoring website.
Students take tests on the website, they submit the tests, the tests get graded and the results are sent back to the student.
Behind the scenes, everytime a student submits a test it gets published to a queue and is processed asynchronously by a consumer service.
The problem is that the queue sometimes contains duplicates. We need a function that will keep track of tests processed previously and remove any incoming duplicates.
Please note this is a toy example and solution to demonstrate the use of closures.
So let’s create our processing function,
def create_exam_processor():
processed_messages = set()
def process_exam(message):
hashable_message = frozenset(message.items())
if hashable_message in processed_messages:
print("duplicate message encountered!")
else:
grade_exam(message)
upload_exam(message)
processed_messages.add(hashable_message)
return process_exam
Let’s break it down:
-
process_exam
is our closure - it has access to the nonlocal variable
processed_messages
- we process messages and check for duplicates using our nonlocal scope
This closure allows us to access a list of processed messages without passing around a list or maintaining a global variable.
Let’s use it,
process_exam_message = create_exam_processor()
process_exam_message
is a variable that points to a callable function. This is achievable thanks to python having first-class functions.
queue = [{"student_id": 1234, "exam_id": 99}, {"student_id": 1234, "exam_id": 12}, {"student_id": 1234, "exam_id": 12}]
for message in queue:
process_exam_message(message)
# duplicate message encountered!
As you can see, the first two messages were processed and added to processed_messages
. When the third message came through, it was successfully detected as a duplicate.
Closures vs Classes
Now you might be thinking, if I need to track state across calls, then why not use a class?
You can technically accomplish the same thing,
class ExamProcessor():
def __init__(self):
self.processed_messages = set()
def __call__(self, message):
hashable_message = frozenset(message.items())
if hashable_message in self.processed_messages:
print("duplicate message encountered!")
else:
grade_exam(message)
upload_exam(message)
self.processed_messages.add(hashable_message)
Here we implement the __call__
method to turn our class instance into a callable object.
We manage the state through the object’s instance variables instead of the closure’s nonlocal scope.
Let’s now test this out,
exam_processor = ExamProcessor()
for message in queue:
exam_processor(message)
# duplicate message encountered!
It works! So why use one over the other?
We can first think about performance… Does the performance vary depending on whether we chose a closure to manage state or a callable class?
I added two million records to the queue and performance tested each approach,
Closure | Class |
---|---|
1.840176000 | 2.132076500 |
1.934928700 | 2.022337800 |
1.846624400 | 1.993040100 |
1.846624400 | 2.004416600 |
1.793658300 | 2.002499600 |
As you can see, the closure performed slightly better than the class did.
Why is this?
- It’s quicker to access closure scope than instance variables.
- Classes come with all the overhead that allows for inheritance, special methods, etc.
Another great reason to use closures is that the state/data they manage is completely private. Instance variables in python can be accessed anywhere in the code by any user of the class instance, closures provide a convenient way to protect private data.
From this we can conclude that you should try to stick to plain functions for closure functionality opposed to classes. If you require extra functionality that a class might have to offer then maybe you should consider switching, but only after it is deemed necessary.
Closing Remarks
Closures are a great feature in python and a major benefit that comes when a language supports first-class functions.
Remember that a major benefit of closures are encapsulation, they allow you to store and hide implementation details in the closure’s scope.
Look to use closures when you need to carry some state along with a function’s execution.
References
-
https://mrevelle.blogspot.com/2006/10/closure-on-closures.html ↩
Enjoy Reading This Article?
Here are some more articles you might like to read next: