Grab — A better grep
Grab is a more powerful version of the well-known Grep utility, making use of structural regular expressions as described by Rob Pike in this paper. Grab allows you to be far more precise with your searching than Grep, as it doesn’t constrain itself to working only on individual lines.
Installation
To install grab, all you need is a C compiler:
$ cc -o make make.c # Bootstrap the build script
$ ./make # Build the project
$ ./make install # Install the project
If you want to build with optimizations enabled, you can pass the -r
flag.
$ ./make -r
Description
GRAB(1) General Commands Manual GRAB(1)
NAME
grab, git grab — search for patterns in files
SYNOPSIS
grab [-H never | multi | always] [-bcilLpsUz] pattern [file ...]
grab -h
git grab [-H never | multi | always] [-bcilLpsUz] pattern [glob
...]
git grab -h
DESCRIPTION
The grab utility searches for text matching the given pattern in
the files listed on the command-line, printing the matches to the
standard-output. Unlike the grep(1) utility, grab is not
strictly line-oriented; the structure of matches is left up to
the user to define. For more details on the pattern syntax, see
“Pattern Syntax”.
The git grab utility is identical to the grab utility except that
it takes globs matching files as command-line arguments instead
of files, and processes all non-binary files in the current git
repository that match the provided globs. If no globs are pro‐
vided, all non-binary files in the current git repository are
processed.
grab will read from the files provided on the command-line. If
no files are provided, the standard-input will be read instead.
The special filename ‘-’ can also be provided, which represents
the standard-input.
Similar to the grep(1) utility matches are printed to the stan‐
dard output. They are additionally prefixed with the name of the
file in which pattern was matched, as well as the location of the
match.
The options are as follows:
-b, --byte-offset
Report the positions of pattern matches as the (zero-
based) byte offset of the match from the beginning of the
file.
This option is useful if your text editor (such as vim(1)
or emacs(1)) supports jumping directly to a given byte
offset/position.
This is the default behaviour if the -L option is not
provided.
-c, --color
Force colored output, even if the output device is not a
TTY. This is useful when piping the output of grab into
a pager such as less(1).
This option takes precedence over the environment vari‐
ables described in “ENVIRONMENT” that relate to the usage
of color.
-h, --help
Display help information by opening this manual page.
-H, --header-line=when
Control the usage of a dedicated header line, where the
filename and match position are printed on a dedicated
line above the match. The available options for when
are:
never never use a dedicated header line
always always use a dedicated header line
multi use a dedicated header line when the matched pat‐
tern spans multiple lines
-i, --ignore-case
Match patterns case-insensitively.
-l, --literal
Treat patterns as literal strings, i.e. don’t interpret
them as regular expressions.
-L, --line-position
Report the positions of matches as a (one-based) line-
and column position separated by a colon.
This option may be ill-advised in many circumstances.
See “BUGS” for more details.
-p, --predicate
Return an exit status indicating if a match was found
without writing any output to the standard output. When
simply checking for the presence of a pattern in an in‐
put, this option is far more efficient than redirecting
output to /dev/null.
-s, --strip-newline
Don’t print a newline at the end of a match if the match
already ends in a newline. This can make output seem
more ‘natural’, as many matches will already have termi‐
nating newlines.
-U, --no-unicode
Don’t use Unicode properties when matching \d, \w, etc.
Recognize only ASCII values instead.
-z, --zero
Separate output data by null bytes (‘\0’) instead of new‐
lines. This option can be used to process matches con‐
taining newlines.
Pattern Syntax
A pattern is a sequence of whitespace-separated commands. A com‐
mand is a sequence of an operator, an opening delimiter, a regu‐
lar expression, a closing delimter, and zero-or-more flags. The
last command of a pattern if given no flags need not have a clos‐
ing delimter.
The supported operators are as follows:
g Keep matches that match the given regex.
G Keep matches that don’t match the given regex.
h Highlight substrings in matches that match the given
regex.
H Highlight substrings in matches that don’t match the
given regex.
x Select everything that matches the given regex.
X Select everything that doesn’t match the given regex.
An example pattern to match all numbers that contain a ‘3’ but
aren’t ‘1337’ could be ‘x/[0-9]+/ g/3/ G/^1337$/’. In that pat‐
tern, ‘x/[0-9]+/’ selects all numbers in the input, ‘g/3/’ keeps
only those matches that contain the number 3, and ‘G/^1337$/’
filters out the specific number 1337.
The opening- and closing-delimiter used for each given command
can be any valid UTF-8 codepoint. As a result, the following
pattern using the delimiters ‘|’, ‘.’, and ‘ä’ is well-formed:
x|[0-9]+| g.3. Gä^1337$ä
Delimeters also respect the Unicode ‘Bidirectional Paired
Bracket’ property. This means that alongside the previous exam‐
ples, the following non-exhaustive list of character pairs may be
used as opening- and closing delimiters:
• 「…」
• ⟮…⟯
• ⟨…⟩
It is not recommended that you use characters that have a special
meaning in regular expression syntax as delimiters, unless you’re
using literal patterns via the -l option or the ‘l’ command flag.
Operators are not allowed to take empty regular expression argu‐
ments with one exception: ‘h’. When given an empty regular ex‐
pression argument, the ‘h’ operator assumes the same regular ex‐
pression as the previous operator. This allows you to avoid du‐
plication in the common case where a user wishes to highlight
text matched by a ‘g’ or ‘x’ operator. The following example
pattern selects all words that have a capital letter, and high‐
lights the capital letter(s):
x/\w+/ g/\p{Lu}/ h//
The empty ‘h’ operator is not permitted as the first operator in
a pattern.
While various command-line options exist to alter the behaviour
of patterns such as -i to enable case-insensitive matching or -U
to disable Unicode support, various different options can also be
set at the command-level by appending a command with one-or-more
flags. As an example, one could match all sequences of one-or-
more non-whitespace characters that contain the case-insensitive
literal string ‘[hi]’ by using the following pattern:
x/\S+/ g/[hi]/li
The currently supported flags are as follows:
i/I enable or disable case-insensitive matching respectively
l/L enable or disable treating the supplied regex as a fixed
string
u/U enable or disable Unicode support respectively
ENVIRONMENT
Do not display any colored output when set to a non-empty string,
even if the standard-output is a terminal. This environment
variable takes precedence over CLICOLOR_FORCE.
Force display of colored output when set to a non-empty string,
even if the standard-output isn’t a terminal.
If set to ‘dumb’ disables colored output, taking precedence over
all other environment variables.
EXIT STATUS
The grab utility exits with one of the following values:
0 One or more matches were selected.
1 No matches were selected.
2 A non-fatal error occured, such as failure to read
a file.
>2 A fatal error occured.
EXAMPLES
List all your systems CPU flags, sorted and without duplicates:
$ grab 'x/^flags.*?$/ x/\w+/ G/^flags$/' </proc/cpuinfo |
sort -u
Search for a pattern in multiple files without printing filenames
or position information:
$ cat file1 file2 file3 | grab 'x/pattern/'
Search for usages of an ‘<hb-form-text>’ Vue component — but only
those which are being passed a ‘placeholder’ property — searching
all files in the current git-repository:
$ git grab 'x/<hb-form-text.*?>/ g/\bplaceholder\b/'
'*.vue'
Extract bibliographic references from mdoc(7) formatted manual
pages:
$ grab 'x/(^\.%.*?\n)+/' foo.1 bar.1
SEE ALSO
git(1), grep(1), pcre2syntax(3), regex(7)
Rob Pike, Structural Regular Expressions,
https://doc.cat-v.org/bell_labs/structural_regexps/se.pdf, AT&T
Bell Laboratories, Murray Hill, New Jersey 07974, 1987.
SGR Parameters:
https://en.wikipedia.org/wiki/ANSI_escape_code#SGR
AUTHORS
Thomas Voss <mail@thomasvoss.com>
NOTES
When pattern matching with literal strings you should avoid using
delimeters that are contained within the search string as any
backslashes used to escape the delimeters will be searched for in
the text literally.
BUGS
The pattern string provided as a command-line argument as well as
the provided input files must be encoded as UTF-8. No other en‐
codings are supported unless they are UTF-8 compatible, such as
ASCII.
The -L option has incredibly poor performance compared to the -b
option, especially with very large inputs.
Grab 3.0.0 13 November, 2024 GRAB(1)