Evolution of Default Dictionaries in Python
I write a lot of code where I use a dictionary of sets (or lists, or counters, etc.):
Method 1
dict_set = {}
if key not in dict_set:
    dict_set[key] = set()
dict_set[key].add(item)
Method 2
dict_set = {}
dict_set.setdefault(key, set()).add(item)
Method 3
from collections import defaultdict
dict_set = defaultdict(set)
dict_set[key].add(item)
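To check that the three methods are interchangeable, here is a small self-contained sketch (Python 3) that builds the same dictionary of sets all three ways. The `pairs` data and the `method_1`/`method_2`/`method_3` function names are hypothetical, made up purely for illustration.

```python
from collections import defaultdict

# Hypothetical (key, item) pairs, purely for illustration.
pairs = [("a", 1), ("b", 2), ("a", 3), ("a", 1)]

def method_1(pairs):
    dict_set = {}
    for key, item in pairs:
        if key not in dict_set:
            dict_set[key] = set()
        dict_set[key].add(item)
    return dict_set

def method_2(pairs):
    dict_set = {}
    for key, item in pairs:
        dict_set.setdefault(key, set()).add(item)
    return dict_set

def method_3(pairs):
    dict_set = defaultdict(set)
    for key, item in pairs:
        dict_set[key].add(item)
    return dict_set

# A defaultdict compares equal to a plain dict with the same contents.
print(method_1(pairs) == method_2(pairs) == method_3(pairs))  # True
```

Since defaultdict is a subclass of dict, the result of Method 3 can be passed anywhere a plain dict is expected.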
setdefault was added in Python 2.0 and I've been using (and loving) it for years.
It was only a month or two ago that I discovered collections.defaultdict. Now I use it almost every day.
UPDATE: I forgot to mention that defaultdict was added in Python 2.5. Also, because int() returns 0, you can use defaultdict(int) for a dictionary of counters.
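As a minimal sketch of the counter idea (Python 3), here is a word count using defaultdict(int), where each missing key starts at int(), i.e. 0. The sample sentence is made up for illustration.

```python
from collections import defaultdict

# Made-up sample text for illustration.
words = "the cat sat on the mat with the hat".split()

counts = defaultdict(int)  # missing keys default to int(), which is 0
for word in words:
    counts[word] += 1      # no need to check whether word is already a key

print(counts["the"])  # 3
print(counts["dog"])  # 0 (this lookup also inserts "dog" into counts)
```

One thing to keep in mind: merely reading a missing key from a defaultdict inserts it, as the last line shows.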
Comments (9)
Brandon Corfman on Feb. 27, 2008:
I didn't know about method 3 at all. Thanks for a great piece of info.
James Tauber on Feb. 27, 2008:
Michael, is setdefault needed anymore given the existence of defaultdict?
Brandon, you're welcome. I didn't know about it either until recently which is why I thought I'd share it.
Eduardo Padoan on Feb. 27, 2008:
"I also love 'setdefault', but it is going in Python 3. Our love is not shared by the BDFL. :-("
Hm, unlikely:
http://www.python.org/dev/peps/pep-3100/
"""
To be removed:
...
# dict.setdefault()? [15] [UNLIKELY]
"""
Tennessee Leeuwenburg on Feb. 27, 2008:
Thank you for this blog post! I was unaware of the behaviour and had been painstakingly using your Method 1 approach. I tend to have more need for lists than sets, but there you go.
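For the lists case, the only change is the factory passed to defaultdict and the method called on the value. A small sketch (Python 3) grouping hypothetical (key, item) pairs into lists, which, unlike sets, keep duplicates and insertion order:

```python
from collections import defaultdict

# Hypothetical (key, item) pairs, purely for illustration.
pairs = [("fruit", "apple"), ("veg", "leek"), ("fruit", "pear"), ("fruit", "apple")]

dict_list = defaultdict(list)
for key, item in pairs:
    dict_list[key].append(item)  # list.append instead of set.add

print(dict(dict_list))
# {'fruit': ['apple', 'pear', 'apple'], 'veg': ['leek']}
```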
James on Feb. 28, 2008:
I like this simple but very helpful article. I've been struggling to understand defaultdict, and this illustration explains it clearly for me.
Mr Me on March 23, 2008:
This code is hardly helpful.
What are the benefits?
How is defaultdict better than the built-in dict? Some explanation, please.
James Tauber on March 23, 2008:
Mr Me, if you don't think Method 3 is better than Method 2 then by all means keep using Method 2.
Some of us think Method 3 is cleaner.
Tal Einat on May 19, 2008:
Method 3 can be somewhat more efficient in certain cases, because the default value is only created when the dict doesn't already have the key. (Note the words "somewhat" and "in certain cases".)
For instance, in the original post, using a defaultdict will cause an empty set to be created only when the dict is accessed with a key it doesn't yet have. With setdefault, on the other hand, a new set will be created on every call, since it is an argument that gets evaluated before setdefault is even called.
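The point about when the default gets built can be checked directly. The sketch below (Python 3) uses a hypothetical counting factory, `make_set`, in place of `set` and counts how many sets each approach creates for the same sequence of keys.

```python
from collections import defaultdict

created = {"count": 0}

def make_set():
    """Hypothetical factory: builds an empty set and records that it did so."""
    created["count"] += 1
    return set()

keys = ["a", "b", "a", "a", "b"]  # 5 lookups, only 2 distinct keys

# Method 2: the default argument is evaluated on every call.
created["count"] = 0
d2 = {}
for k in keys:
    d2.setdefault(k, make_set()).add(k)
setdefault_calls = created["count"]

# Method 3: the factory runs only when a key is missing.
created["count"] = 0
d3 = defaultdict(make_set)
for k in keys:
    d3[k].add(k)
defaultdict_calls = created["count"]

print(setdefault_calls, defaultdict_calls)  # 5 2
```

Both dictionaries end up with the same contents; only the number of throwaway sets differs.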
However, if you want to initialize to the number zero, using setdefault may be slightly more efficient, since initializing the integer zero is practically free. In such cases the gain in efficiency will likely be negligible (if it exists at all).
After all of this rambling, I would suggest just using whatever is clearer in your eyes and not considering efficiency at all. If you later realize that you must optimize your code and discover that the setdefault calls are the bottleneck (very unlikely!), then that would be the right time to take efficiency issues into consideration.
Last Modified: Feb. 27, 2008
Author: James Tauber
Michael Foord on Feb. 27, 2008:
I also love 'setdefault', but it is going in Python 3. Our love is not shared by the BDFL. :-(