Unsorted uniq

May 30, 2014

Everybody gets caught the first time: uniq filters repeated lines – but only if they follow each other. This assumption greatly reduces the memory footprint of uniq and … its usefulness.
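To see the adjacency rule in action, here is a minimal sketch on made-up input (the sample lines and the variable names input and plain_uniq are just for illustration):

```shell
# Hypothetical sample input: "cats" twice in a row, then once more later on.
input=$(printf 'cats\ncats\ndogs\ncats\n')

# uniq only compares each line with the one right before it,
# so the adjacent pair collapses but the later "cats" survives.
plain_uniq=$(printf '%s\n' "$input" | uniq)
printf '%s\n' "$plain_uniq"
```

The output should still contain cats twice, once before dogs and once after.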

I explained previously how awk could be used to replace the classic sort | uniq -c incantation. In short, by skipping the sort, you can scale the solution to much bigger files.
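The earlier post isn't reproduced here, so take the following as a rough sketch of the general idea rather than the exact command it used: keep one counter per distinct line in an awk array and print the counters at the end. The sample input and the counts variable are made up for the example.

```shell
# Count how often each line occurs, without sorting the input first.
# Note: the order of "for (line in cnts)" is unspecified in awk,
# so the result lines may come out in any order.
counts=$(printf 'cats\ncats\ndogs\ncats\n' |
    awk '{ cnts[$0]++ } END { for (line in cnts) print cnts[line], line }')
printf '%s\n' "$counts"
```

The trade-off is that awk holds every distinct line in memory, which is usually fine when the file has many repeats.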

If, however, all you need is an unsorted uniq, there’s an even shorter awk command you can use:

$ cat animals.txt
cats
cats
cats
dogs
birds
cats
dogs
dogs
birds
dogs
dogs
birds

$ cat animals.txt | awk '!cnts[$0]++'
cats
dogs
birds

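Broken down as:

$0 is the current line, and cnts[$0] is how many times it has been seen so far (an entry that doesn’t exist yet counts as 0).
The post-increment ++ hands back that old count, then adds one to it.
The ! negates the old count, so the expression is true only the first time a line shows up.
With no action given, awk prints the line whenever the expression is true.

Or, in English, print the current line if you’ve never seen it, and mark it seen.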
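If the one-liner feels too terse, the same logic can be spelled out as a longer awk program. This is an illustrative sketch that should behave the same way; the sample input and the deduped variable are made up for the example.

```shell
# Long-hand version of the one-liner, fed with hypothetical sample input.
deduped=$(printf 'cats\ncats\ndogs\nbirds\ncats\ndogs\n' | awk '
{
    if (cnts[$0] == 0) {   # never seen this exact line before
        print $0
    }
    cnts[$0]++             # mark it seen (or bump its count)
}')
printf '%s\n' "$deduped"
```

Each distinct line should be printed once, in the order it first appeared.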

As an added bonus, this command doesn’t have to read the whole file before producing output: it prints each new unique line as it comes across it.
