Statistics: How Many Would You Check?

Imagine this situation:

You just performed a batch update on millions of users in your database. There were no error messages, and you are confident that everything went well. But it wouldn’t hurt to check…

How many users would you have to check to feel confident that everything worked for at least 95% of users?

Analysis

Libraries needed:

library(binom)
library(ggplot2)

Assuming every checked user turns out fine, the lower bound of the 95% confidence interval is computed for 1 to 100 checks using the Wilson method:

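# Lower bound of the 95% Wilson interval when all i of i checked users are fine
n100.wilson = sapply(1:100, function(i) binom.wilson(i, i)$lower)
# With 0 checks nothing is known, so the lower bound is set to 0%
d.wilson = data.frame(checks=0, low.conf=0)
d.wilson = rbind(d.wilson, data.frame(checks=seq_along(n100.wilson), low.conf=n100.wilson))
head(d.wilson, 11)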

##    checks  low.conf
## 1       0 0.0000000
## 2       1 0.2065493
## 3       2 0.3423802
## 4       3 0.4385030
## 5       4 0.5101092
## 6       5 0.5655175
## 7       6 0.6096657
## 8       7 0.6456696
## 9       8 0.6755924
## 10      9 0.7008550
## 11     10 0.7224672
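For the all-success case used here, the Wilson lower bound has a simple closed form. With n checks that all succeed (observed proportion p̂ = 1) and z the normal quantile (z = qnorm(0.975) ≈ 1.96, which is what the default two-sided 95% interval of binom.wilson corresponds to, assuming no continuity correction), the square root term collapses to z/(2n):

$$
L_n = \frac{\hat p + \frac{z^2}{2n} - z\sqrt{\frac{\hat p(1-\hat p)}{n} + \frac{z^2}{4n^2}}}{1 + \frac{z^2}{n}}
\;\overset{\hat p = 1}{=}\;
\frac{1 + \frac{z^2}{2n} - \frac{z^2}{2n}}{1 + \frac{z^2}{n}}
= \frac{n}{n + z^2}
$$

For example, n = 1 gives 1/(1 + 3.8415) ≈ 0.2065 and n = 10 gives 10/13.8415 ≈ 0.7225, matching the table above.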

These values are then plotted:

ggplot(d.wilson, aes(checks, low.conf)) +
  geom_line(alpha=0.3) +
  geom_point(size=1.3) +
  geom_hline(yintercept=0.95, alpha=0.3, color="red") +
  scale_x_continuous("number of checks", breaks=seq(0, 100, 10), minor_breaks=NULL) +
  scale_y_continuous("success % (lower bound)", breaks=seq(0, 1, 0.1), 
    labels=seq(0, 100, 10), minor_breaks=NULL) +
  coord_cartesian(ylim=c(0, 1))

It takes 73 checks, all successful, for the lower bound of the “true” success rate to reach 95% (the red line).
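The 73 can be cross-checked without the binom package. The sketch below is a minimal base-R version of the Wilson score lower bound, assuming no continuity correction (which agrees with the table above); `wilson.lower` is a hypothetical helper, not part of binom. It searches 1 to 100 all-successful checks and also uses the closed form n / (n + z²), which reaches 0.95 once n ≥ 0.95·z²/0.05.

```r
# Wilson score lower bound (no continuity correction) for x successes in n trials.
# wilson.lower is a hypothetical helper written in base R for illustration.
wilson.lower <- function(x, n, conf.level = 0.95) {
  z <- qnorm(1 - (1 - conf.level) / 2)   # two-sided quantile, ~1.96 for 95%
  p <- x / n
  centre <- p + z^2 / (2 * n)
  margin <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2))
  (centre - margin) / (1 + z^2 / n)
}

checks <- 1:100
low <- wilson.lower(checks, checks)      # every check succeeded

# First number of checks whose lower bound reaches 95%
n.needed <- checks[which(low >= 0.95)[1]]
print(n.needed)

# Closed form when x == n: the bound is n / (n + z^2),
# so n must satisfy n >= 0.95 * z^2 / 0.05
z <- qnorm(0.975)
print(ceiling(0.95 * z^2 / 0.05))
```

Both prints give 73. Note that the default conf.level = 0.95 describes a two-sided interval; if only a one-sided 95% lower bound is wanted, calling `wilson.lower(checks, checks, conf.level = 0.90)` in the same search gives 52 checks instead.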

It’s worth mentioning that the 73 users have to be picked at random. Otherwise, it’s easy to imagine the beginning of the batch going well and everything going to hell later on.
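A random pick takes one line in base R. In this sketch `user.ids` is a hypothetical stand-in for the IDs of every user touched by the batch update, and the seed is only there so the same sample can be re-drawn later:

```r
# Hypothetical IDs of every user touched by the batch update
user.ids <- seq_len(2e6)

set.seed(2024)   # fixed seed only so the same sample can be re-drawn
# Draw 73 distinct positions uniformly at random (sampling without replacement).
# Indexing via sample.int avoids sample()'s special case for a length-1 vector.
to.check <- user.ids[sample.int(length(user.ids), 73)]
```

Sampling uniformly across the whole batch, rather than taking the first 73 rows, is what guards against the "good start, bad finish" failure described above.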