INNER JOIN Files on the Command-Line

March 8, 2023

A lot of people don’t know that join exists, or what it does 🤔

I mean the join command that’s already on your computer and ready to go:

# BSD (macOS)
> join
usage: join [-a fileno | -v fileno ] [-e string] [-1 field] [-2 field]
            [-o list] [-t char] file1 file2

# or

# GNU (linux)
> join
join: missing operand
Try 'join --help' for more information.

What is join?

Believe it or not, join is part of coreutils, which – as its name implies – is pretty central to UNIX-like systems.

It sits there, with its better known siblings cut and paste

together: cut, paste and join

What does join do?

join does to files what SQL INNER JOIN does to tables.

join Examples

Imagine two files:

> cat english.txt
1 one
2 two
3 three
4 four

> cat spanish.txt
1 uno
2 dos
4 cuatro

By default, the first column (numbers here, but it doesn’t have to be) is the join key:

> join english.txt spanish.txt
1 one uno
2 two dos
4 four cuatro

Notice that 3 is missing from spanish.txt. It works the way INNER JOIN works.

Same, flipping the files: (3 still missing)

> join spanish.txt english.txt
1 uno one
2 dos two
4 cuatro four

Comments:

You can even coerce join into doing:

check the online documentation for examples.

Is join Useful?

I recently had to join 2 large .tsv files. My first instinct was to reach for AWK … but I remembered join and it did exactly what I needed.

Knowing that join exists is 80% of its value 😄

Discuss on Twitter