How To Shuffle and Sample on the Command-Line
How do you shuffle on the command-line? With shuf.
On Linux, you already have it: shuf is part of GNU coreutils.
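Here is a minimal sketch of what shuffling looks like, assuming GNU shuf is on your PATH. The input is just the numbers 1 to 10 generated with seq, and the variable name is made up for illustration.

```shell
# Ten lines of input, one number per line, shuffled into a random order.
# shuf reads standard input, so it fits at the end of any pipeline.
shuffled=$(seq 1 10 | shuf)
echo "$shuffled"

# The same lines come back, only the order changes.
echo "$shuffled" | sort -n
```

Passing a file name instead (shuf some-file) shuffles the lines of that file and leaves the file itself untouched.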
On Mac OS X, brew install coreutils installs shuf as gshuf (g for GNU), so I usually alias gshuf to shuf to fix that.
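A sketch of that alias for a shell profile such as ~/.bashrc or ~/.zshrc. The guard around it is my own assumption, not a requirement: it only defines the alias when gshuf exists and no shuf is already available.

```shell
# Only alias when there is no shuf yet but Homebrew's gshuf is installed.
if ! command -v shuf >/dev/null 2>&1 && command -v gshuf >/dev/null 2>&1; then
  alias shuf=gshuf
fi
```

Aliases are normally only expanded in interactive shells, so scripts that must run on a Mac may prefer to call gshuf directly.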
You could use sort -R / sort --random-sort as a poor man's shuf. For larger files, that’s a terrible idea because sort has to sort the whole file before it can print anything.
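There is a second difference worth knowing about. As documented for GNU sort, -R orders lines by a random hash of each line, so identical lines always end up next to each other. The sketch below uses a made-up six-line input to show it.

```shell
# Six lines, but only two distinct values.
input=$(printf 'a\nb\na\nb\na\nb\n')

# After sort -R, equal lines are adjacent, so uniq collapses them into 2 runs.
runs=$(echo "$input" | sort -R | uniq | wc -l)
echo "runs after sort -R: $runs"

# shuf permutes line positions instead, so duplicates can land anywhere.
echo "$input" | shuf
```

If your input has repeated lines and you want a real shuffle, that alone is a reason to reach for shuf.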
“Sampling is the selection of a subset of individuals from within a statistical population” – Wikipedia.
You just have to pass the -n flag to shuf, followed by the number of lines you want.
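A minimal sketch of sampling with -n, again assuming GNU shuf. The input (the numbers 1 to 100) and the sample size of 5 are arbitrary choices for illustration.

```shell
# Pick 5 lines at random out of 100. Each input line is picked at most once.
sample=$(seq 1 100 | shuf -n 5)
echo "$sample"
```

If you ask for more lines than the input has, you simply get the whole input back, shuffled.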
You can allow repeated picks of the same value with the -r flag.
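A sketch of sampling with repeats, using GNU shuf's -r (long form --repeat). The three color names are made-up input.

```shell
# Draw 10 values from only 3 input lines. With -r, the same line can be picked again.
picks=$(printf 'red\ngreen\nblue\n' | shuf -r -n 10)
echo "$picks"
```

Keep the -n here: with -r and no count, shuf keeps printing lines until you stop it.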
Picking one thing is simply a sample of size one: pass 1 as the count to -n.
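For instance, a sketch with a made-up list of lunch options:

```shell
# One random line from standard input.
choice=$(printf 'pizza\nsushi\ntacos\n' | shuf -n 1)
echo "$choice"

# Same idea with -e, which treats each argument as an input line.
shuf -n 1 -e pizza sushi tacos
```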
Why Shuffle? Why Sample?
Every time I’m faced with too many things to look at, I don’t trust myself to pick a representative sample. It’s too easy to say “I’ll pick the first 10” and miss a problem that only happened later.
For example, you might have a system that generates files in a directory. There might be hundreds or thousands of files and you just want to get a feel for their content.
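A sketch of how that could look. Everything about the setup is hypothetical: the scratch directory, the report-N.txt names, and the file contents stand in for whatever your system generates.

```shell
# Hypothetical setup: a scratch directory with 200 small generated files.
dir=$(mktemp -d)
for i in $(seq 1 200); do
  echo "payload $i" > "$dir/report-$i.txt"
done

# Pick 5 files at random and show the first lines of each one.
picked=$(find "$dir" -type f | shuf -n 5)
echo "$picked" | while IFS= read -r f; do
  echo "== $f"
  head -n 3 "$f"
done

# Clean up the scratch directory.
rm -rf "$dir"
```

This assumes file names without newlines in them, which is usually true for generated files.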
Or you might want to get a feel for what’s happening in a log file by reading a random handful of its lines.
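A sketch with a made-up log file of 1000 lines. Numbering the lines with nl before sampling is my own addition: it lets sort -n put the sampled lines back in their original order, which tends to matter for logs.

```shell
# Hypothetical log: 1000 lines of the form "request id=N".
log=$(mktemp)
seq 1 1000 | sed 's/^/request id=/' > "$log"

# Number every line, sample 10 of them, then restore chronological order.
sample=$(nl -ba "$log" | shuf -n 10 | sort -n)
echo "$sample"

# Clean up the temporary file.
rm -f "$log"
```

If the order does not matter to you, shuf -n 10 on the log file alone is enough.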
Depending on your specific situation, this might raise the interesting question of how many things to look at.