INNER JOIN Files on the Command-Line
A lot of people don’t know that join exists, or what it does 🤔
I mean the join command that’s already on your computer and ready to go:
# BSD (macOS)
> join
usage: join [-a fileno | -v fileno ] [-e string] [-1 field] [-2 field]
[-o list] [-t char] file1 file2
# or
# GNU (linux)
> join
join: missing operand
Try 'join --help' for more information.What is join?
Believe it or not, join is part of
coreutils,
which – as its name implies – is pretty central to UNIX-like systems.
It sits there, with its better known siblings cut and paste
What does join do?
join does to files what SQL INNER JOIN does to tables.
join Examples
Imagine two files:
> cat english.txt
1 one
2 two
3 three
4 four
> cat spanish.txt
1 uno
2 dos
4 cuatroBy default, the first column (numbers here, but it doesn’t have to be) is the join key:
> join english.txt spanish.txt
1 one uno
2 two dos
4 four cuatroNotice that 3 is missing from spanish.txt. It works the way INNER JOIN works.
Same, flipping the files: (3 still missing)
> join spanish.txt english.txt
1 uno one
2 dos two
4 cuatro fourComments:
- files must be sorted ~ that’s important!
- fields are separated by blanks – but field separator can be specified with
-t - there are other options, check the man page as needed
You can even coerce join into doing:
LEFT JOINRIGHT JOINUNIONDIFFERENCE
check the online documentation for examples.
Is join Useful?
I recently had to join 2 large .tsv files. My first instinct was to reach for
AWK … but I remembered join and it did exactly what I needed.
Knowing that join exists is 80% of its value 😄
