Handling Broken JSON with jq
Problem
There’s a missing }
after green.
When jq hits invalid JSON, it completely stops processing the stream.
That’s not always great.
Some Unhelpful Solutions
People will be quick to “fix” your problem:
why don’t you fix the JSON at the source?
If you can do this, that’s the cleanest way out of this.
But… real life is messy. You don’t always control the JSON you have to process.
I ran into this recently: I extracted JSON logs from a system that decided to
truncate some lines, some of the time.
why don’t you “check” your JSON before you … “check” your JSON?
Yes – you could do some minimal regex-based checks, possibly with AWK or grep.
But you know what’s already great at handling JSON? jq.
Using --seq
That’s what you’ll find if you keep searching. The documentation says:
but an example might be clearer:
Just put RS
(ASCII character 0x1e) in front of each record. See below for an example.
(or check the internet draft for more details)
The broken “green” entry is skipped…
Why isn’t there a better solution?
When a JSON parser finds a problem, what’s the best solution?
I don’t think there is one.
If the JSON is invalid … how much of it needs to be thrown out?
- only the current field? –
"color": "green"
- only the current record? inside the closest curlies? –
{ ... }
The answer is probably application-specific. And wouldn’t it be worse if jq silently skipped invalid JSON? How long would it take to debug that?!
With RS
delimiters, you’re explicitly boxing failures: on a parse error, it
skips the current record and forwards to the next RS
.
An Example
Catch lines beginning with {
and insert a RS
there.
DISCLAIMER again: the broken “green” entry is skipped… in this case it’s silent, but I’ve seen other broken cases where a message is shown on STDERR. Use responsibly.