INNER JOIN Files on the Command-Line
A lot of people don’t know that join
exists, or what it does 🤔
I mean the join
command that’s already on your computer and ready to go:
What is join?
Believe it or not, join
is part of
coreutils,
which – as its name implies – is pretty central to UNIX-like systems.
It sits there, with its better known siblings cut
and paste
What does join
do?
join
does to files what SQL INNER JOIN
does to tables.
join Examples
Imagine two files:
By default, the first column (numbers here, but it doesn’t have to be) is the join key:
Notice that 3
is missing from spanish.txt
. It works the way INNER JOIN
works.
Same, flipping the files: (3
still missing)
Comments:
- files must be sorted ~ that’s important!
- fields are separated by blanks – but field separator can be specified with
-t
- there are other options, check the man page as needed
You can even coerce join
into doing:
LEFT JOIN
RIGHT JOIN
UNION
DIFFERENCE
check the online documentation for examples.
Is join Useful?
I recently had to join 2 large .tsv files. My first instinct was to reach for
AWK … but I remembered join
and it did exactly what I needed.
Knowing that join
exists is 80% of its value 😄