Guinea Pigs, Karaoke Machines and Minimum Sample Size
Yesterday I read a very strange quote:
2.86% of guinea pigs admitted to veterinary hospitals in the survey had been injured by karaoke machines
from Charlie Stross talking about an article in The Register.
It got me wondering what the minimum sample size would be to get 2.86%. It turns out that 1 in 35 is 2.857...%, which rounds to 2.86%, so it's possible there was only 1 case out of only 35 in the survey.
Here's some Python code for working out the minimum sample size that will result in a given decimal. Note that the decimal must be given as a string so that significant trailing zeroes are preserved. For example, min_sample("0.1") == 7 whereas min_sample("0.10") == 10, as you would expect.
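    def min_sample(decimal):
        # decimal is a string of the form "0.xxx"
        fl = float(decimal)
        assert 0 < fl < 1
        # digits after the "0." prefix, trailing zeroes included
        sig_digits = len(decimal) - 2
        for sample_size in range(1, (10 ** sig_digits) + 1):
            # nearest whole number of cases for this sample size
            if round(round(fl * sample_size) / sample_size, sig_digits) == fl:
                return sample_size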
So the next time you read 22% of X or 24% of Y or 2.86% of Z, you can quickly work out that the sample size could be as low as 9, 17 or 35 respectively.
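To see which counts sit behind those sample sizes, here is a small variation on the same brute-force idea that also returns the number of cases. It is only a sketch: the name min_sample_with_count is made up for illustration, it assumes Python 3 (true division) and it still expects a string of the form "0.xxx".

```python
def min_sample_with_count(decimal):
    # Hypothetical helper: same search as min_sample, but it also
    # reports how many cases give the rounded figure.
    fl = float(decimal)
    assert 0 < fl < 1
    # Assumes the input looks like "0.xxx", so everything after the
    # first two characters is a decimal place.
    sig_digits = len(decimal) - 2
    for sample_size in range(1, 10 ** sig_digits + 1):
        # Nearest whole number of cases for this sample size.
        count = round(fl * sample_size)
        if round(count / sample_size, sig_digits) == fl:
            return count, sample_size


for figure in ("0.22", "0.24", "0.0286"):
    print(figure, min_sample_with_count(figure))
```

So 22% could be 2 out of 9, 24% could be 4 out of 17, and 2.86% could be 1 out of 35.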
P.S. I leave it as an exercise to the reader to rewrite using the decimal module and decide whether it's worth it.
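For anyone who wants a starting point for that exercise, here is one possible sketch using the decimal module. The name min_sample_decimal and the choice of ROUND_HALF_EVEN are assumptions on my part; the survey's actual rounding convention is unknown, and a different rounding mode could change the answer for figures that land exactly on a tie.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def min_sample_decimal(decimal_str):
    # Hypothetical name; a sketch, not a definitive implementation.
    target = Decimal(decimal_str)
    assert 0 < target < 1
    # The exponent of "0.10" is -2, so trailing zeroes still count.
    places = -target.as_tuple().exponent
    quantum = Decimal(1).scaleb(-places)  # e.g. Decimal("0.01")
    for sample_size in range(1, 10 ** places + 1):
        # Nearest whole number of cases for this sample size.
        count = (target * sample_size).to_integral_value(
            rounding=ROUND_HALF_EVEN)
        # Round the resulting proportion to the reported precision
        # (rounding mode is an assumption, see above).
        ratio = (count / Decimal(sample_size)).quantize(
            quantum, rounding=ROUND_HALF_EVEN)
        if ratio == target:
            return sample_size
```

The main gain is that the number of decimal places comes from the Decimal itself rather than from the length of the string, and the comparison no longer relies on floating point equality. Whether that is worth the extra ceremony is still up to you.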
Comments (3)
James Tauber on Oct. 18, 2008:
Very cool, Stephen. I knew there must have been a bunch of work done that didn't require the brute-force of my approach.
James Tauber on Oct. 18, 2008:
Looking at the Stern-Brocot tree reminded me of the Calkin-Wilf tree I used in http://jtauber.com/blog/2004/07/01/enumerating_the_rationals_in_python/
Last Modified: Oct. 18, 2008
Author: James Tauber
Stephen on Oct. 18, 2008:
There's lots of elegant mathematics around best rational approximations to real numbers.
I dimly remember a lecture on Farey sequences, continued fractions, and using them to reverse-engineer minimum sample size for statistics like yours.
I can't remember the exact details (it was a long time ago -- 21 years, wow! -- and my notes are in a box on the other side of the world). But these two links look like a step in the right direction:
http://en.wikipedia.org/wiki/Stern-Brocot_tree
http://www.cut-the-knot.org/ctk/SB_tree.shtml
In short, use a binary search to descend in the Stern-Brocot tree until you have an approximating fraction that is close enough to your decimal number.
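A minimal sketch of that descent, under a few assumptions of mine: the reported figure is treated as the half-open interval of values that would round to it under round-half-up, the input is a string like "0.0286", and the function name is made up for illustration. It walks the tree one mediant at a time, so it is the simple version of the search rather than an optimised one.

```python
from fractions import Fraction


def simplest_fraction_for(decimal_str):
    # Hypothetical helper: returns (cases, sample_size) for the
    # fraction with the smallest denominator that rounds to the input.
    places = len(decimal_str.split(".")[1])
    target = Fraction(decimal_str)
    half = Fraction(1, 2 * 10 ** places)
    # Assumption: round-half-up, so [lo, hi) rounds to the target.
    lo, hi = target - half, target + half

    # Stern-Brocot bounds: 0/1 on the left, 1/0 ("infinity") on the right.
    left_n, left_d = 0, 1
    right_n, right_d = 1, 0
    while True:
        # The mediant of the two bounds is the next node in the tree.
        mid_n, mid_d = left_n + right_n, left_d + right_d
        mid = Fraction(mid_n, mid_d)
        if mid < lo:
            left_n, left_d = mid_n, mid_d    # too small: go right
        elif mid >= hi:
            right_n, right_d = mid_n, mid_d  # too large: go left
        else:
            # First node inside the interval has the smallest denominator.
            return mid_n, mid_d
```

For the figures in the post this agrees with the brute-force search: 2/9 for "0.22", 4/17 for "0.24" and 1/35 for "0.0286".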