This is the kind of thing you don’t need, until you really do:
-f file, --file=file
    Read one or more newline separated patterns from file. Empty pattern lines match every input line. Newlines are not considered part of a pattern. If file is empty, nothing is matched.
Here’s a scenario that recently came up:
You have a file with millions of entries, one per line, in a tab-separated format. One of the fields (and not necessarily the first one) is the "primary key" you use to identify each entry. You ran a batch job, and the logs are telling you about some transient failures. You grep for the failures and accumulate a bunch of "primary keys". You will need to rerun the job for those entries.
Essentially, you need to “grep” the original file for the keys that failed. The problem is that you might have thousands of keys and millions of entries. Depending on the exact size of the data you are dealing with and the amount of time available, you might be able to “brute force” the solution. It might look like this[1]:
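A minimal sketch of the brute-force version, assuming the failed keys ended up in a file called failed-keys.txt and the data lives in entries.tsv (both names are made up for illustration; the sample data here just makes the snippet runnable):

```shell
# Sample data for illustration only -- stand-ins for the real files.
printf 'a1\tfirst\nb2\tsecond\nc3\tthird\n' > entries.tsv
printf 'a1\nc3\n' > failed-keys.txt

# Brute force: spawn one grep process per failed key.
while read -r key; do
    grep "$key" entries.tsv
done < failed-keys.txt
```

With thousands of keys this means thousands of grep invocations, each one scanning the whole file from the top.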
This spawns 1 grep per key – but it’s a one-liner. Compare with the following, which accomplishes the same thing with 1 process:
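This is where -f comes in. A sketch with the same illustrative file names as above:

```shell
# Sample data for illustration only -- stand-ins for the real files.
printf 'a1\tfirst\nb2\tsecond\nc3\tthird\n' > entries.tsv
printf 'a1\nc3\n' > failed-keys.txt

# One grep process: read all the patterns from the file at once.
# Depending on your data you may also want -w (see footnote).
grep -f failed-keys.txt entries.tsv
```

All the patterns are loaded up front, and the data file is scanned once.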
This is much faster: the patterns are loaded once, and the file is scanned in a single pass instead of once per key.
Did I miss anything? How would you tackle this?
[1] Your grep might need qualifiers (-w, for example), but this will depend on your data.