Documentation Can Speed Up Your Code
I just was documenting some code that looked (roughly) like this:
def a(x, m):
for i, o in enumerate(sorted(x)):
if o > m:
break
return i
and my comment was "return how many items in x are less than or equal to m".
The very act of writing the comment made me realise the code could be rewritten as:
def a(x, m):
return len([o for o in x if o <= m])
which is not only more succinct but runs 6-8 times faster! (UPDATE: ...on my data size -- as Sergio points out below, the two algorithms differ in complexity)
(Incidentally, the list comprehension also runs twice as fast as the equivalent using filter)
UPDATE (2008-03-26): Fascinating discussion going on in the comments :-) Although still largely orthogonal to the main point which is that I was originally trying to solve a domain-specific problem and it was only when I went to comment the code that I realised that what I was really doing was trying to find how many items in a list are less than or equal to a certain number. How best to do that is a topic for...well, I guess the comment thread :-)
Comments (20)
James Tauber on March 25, 2008:
But I want to reiterate that it *does* have everything to do with good comments as if I hadn't have stopped to think about what the code was doing (in order to document it) I wouldn't have realised the far superior algorithm. And that was the main point of my post.
Stephan Deibel on March 25, 2008:
In other words, sort, then start in the middle and compare, then either go to 1/4 or 3/4, each time reducing the jump size. You'll find the key element in O(log(n)).
Here's an untested example:
def binary_search(items, x):
low = 0
high = len(items)
while low < high:
mid = low + ((high - low) / 2)
if items[mid] > x:
high = mid
elif items[mid] < x:
low = mid + 1
# There may be multiple items that are equal so back up to first one
else:
while mid > low:
if items[mid - 1] != x:
return mid
mid -= 1
return mid
return high
Stephan Deibel on March 25, 2008:
http://wingware.com/pub/misc/code/binary.py
Dave Kirby on March 25, 2008:
def a(x, m): return sum(1 for o in x if o <= m)
this is probably faster, but I have not timed it.
@Stephan: the sorted() function is O(n.log(n)), so doing a binary search aferwards will not change the overall complexity to O(log(n)).
Also, take a look at the bisect module - Python comes with batteries included, and that includes doing binary searches.
Sergio on March 25, 2008:
Doesn't really have to do with good comments or the fact that the list comprehension is C space instead of interpreter space, they both are radically different implementations of the same functionality, the first is awfully wrong and the second is the good one. If you want to check the difference in speed between list comprehensions and for loops, there is a very good paper about that in the python website:
http://www.python.org/doc/essays/list2str/
Sergio on March 25, 2008:
I have tested both approaches (generator and list comprehension) and performance is very much the same even with long lists. So im with you that it would be better to use the less memory hungry one.
And yes, you are right in your point, there is no good code without good comments.
James Tauber on March 25, 2008:
Shane Hathaway on March 25, 2008:
>>> def a(x, m):
... count = 0
... for o in x:
... if o <= m:
... count += 1
... return count
Or if you want to a nice iterator-based solution:
>>> def count_true(iterator):
... count = 0
... for o in iterator:
... if o:
... count += 1
... return count
...
>>> def a(x, m):
... return count_true(1 for o in x if o <= m)
...
Nonetheless, your point is well taken: attempting to explain code often leads to enough clarity to improve the code.
emorfo on March 25, 2008:
Doug Napoleone on March 25, 2008:
*cough*
a = lambda x, m: reduce(lambda c, y: c + (1 if y <= m else 0), x, 0)
reduce+lambda for the win on any iterable!!
Glenn on March 26, 2008:
James Tauber on March 26, 2008:
@Glenn: I'm missing something -- the original version works fine with duplicate values.
James Tauber on March 26, 2008:
With x containing 10,000 random integers 1-1000 and m=500 I got the following:
enumerate+sort: 6.29ms
len+list_comprehension: 1.17ms
len+filter+lambda: 2.89ms
sum+generator_expression: 1.58ms
reduce+lambda+conditional_expression: 3.51ms
philhassey on March 26, 2008:
Glenn on March 27, 2008:
Michael Chermside on March 30, 2008:
James Tauber on March 30, 2008:
Although in my case it wasn't actually a function but a top-level piece of code. Had I turned it into a function, I'm sure the act of naming it would have triggered the improvement.
Doug Napoleone on March 30, 2008:
It was not meant to be readable or understandable; it was meant to look like lisp ;-)
James Tauber on March 30, 2008:
Add a Comment
Last Modified: March 26, 2008
Author: jtauber
Sergio on March 25, 2008:
Doesn't really have to do with good comments or the fact that the list comprehension is C space instead of interpreter space, they both are radically different implementations of the same functionality, the first is awfully wrong and the second is the good one. If you want to check the difference in speed between list comprehensions and for loops, there is a very good paper about that in the python website:
http://www.python.org/doc/essays/list2str/