# Grab — A better grep

Grab is a more powerful version of the well-known Grep utility, making
use of structural regular expressions as described by Rob Pike in [this
paper][1].  Grab allows you to be far more precise with your searching
than Grep, as it doesn’t constrain itself to working only on individual
lines.


## Installation

To install grab, all you need is a C compiler:

```sh
$ cc -o make make.c  # Bootstrap the build script
$ ./make  # Build the project
$ ./make install  # Install the project
```

If you want to build with optimizations enabled, you can pass the `-r`
flag.

```sh
$ ./make -r
```


## Description

```
GRAB(1)                  General Commands Manual                 GRAB(1)

NAME
       grab, git grab — search for patterns in files

SYNOPSIS
       grab [-H never | multi | always] [-bcilLpsUz] pattern [file ...]
       grab -h

       git  grab  [-H never | multi | always] [-bcilLpsUz] pattern [glob
            ...]
       git grab -h

DESCRIPTION
       The grab utility searches for text matching the given pattern  in
       the files listed on the command-line, printing the matches to the
       standard-output.    Unlike  the  grep(1)  utility,  grab  is  not
       strictly line-oriented; the structure of matches is  left  up  to
       the  user to define.  For more details on the pattern syntax, see
       “Pattern Syntax”.

       The git grab utility is identical to the grab utility except that
       it takes globs matching files as command-line  arguments  instead
       of  files,  and processes all non-binary files in the current git
       repository that match the provided globs.  If no globs  are  pro‐
       vided,  all  non-binary  files  in the current git repository are
       processed.

       grab will read from the files provided on the  command-line.   If
       no  files  are provided, the standard-input will be read instead.
       The special filename ‘-’ can also be provided,  which  represents
       the standard-input.

       Similar to the grep(1) utility, matches are printed to the
       standard output.  They are additionally prefixed with the name of
       the file in which the pattern was matched, as well as the
       location of the match.

       The options are as follows:

       -b, --byte-offset
               Report the positions of pattern  matches  as  the  (zero-
               based) byte offset of the match from the beginning of the
               file.

               This option is useful if your text editor (such as vim(1)
               or  emacs(1))  supports  jumping directly to a given byte
               offset/position.

               This is the default behaviour if the  -L  option  is  not
               provided.

       -c, --color
               Force  colored output, even if the output device is not a
               TTY.  This is useful when piping the output of grab  into
               a pager such as less(1).

               This  option  takes precedence over the environment vari‐
               ables described in “ENVIRONMENT” that relate to the usage
               of color.

       -h, --help
               Display help information by opening this manual page.

       -H, --header-line=when
               Control the usage of a dedicated header line,  where  the
               filename  and  match  position are printed on a dedicated
               line above the match.  The  available  options  for  when
               are:

               never   never use a dedicated header line
               always  always use a dedicated header line
               multi   use a dedicated header line when the matched pat‐
                       tern spans multiple lines

       -i, --ignore-case
               Match patterns case-insensitively.

       -l, --literal
               Treat  patterns  as literal strings, i.e. don’t interpret
               them as regular expressions.

       -L, --line-position
               Report the positions of matches as  a  (one-based)  line-
               and column position separated by a colon.

               This  option  may  be  ill-advised in many circumstances.
               See “BUGS” for more details.

       -p, --predicate
               Return an exit status indicating if  a  match  was  found
               without  writing any output to the standard output.  When
               simply checking for the presence of a pattern in  an  in‐
               put,  this  option is far more efficient than redirecting
               output to /dev/null.

       -s, --strip-newline
               Don’t print a newline at the end of a match if the  match
               already  ends  in  a  newline.  This can make output seem
               more ‘natural’, as many matches will already have  termi‐
               nating newlines.

       -U, --no-unicode
               Don’t  use  Unicode properties when matching \d, \w, etc.
               Recognize only ASCII values instead.

       -z, --zero
               Separate output data by null bytes (‘\0’) instead of new‐
               lines.  This option can be used to process  matches  con‐
               taining newlines.

   Pattern Syntax
       A pattern is a sequence of whitespace-separated commands.  A
       command is a sequence of an operator, an opening delimiter, a
       regular expression, a closing delimiter, and zero-or-more flags.
       The last command of a pattern, if given no flags, need not have a
       closing delimiter.

       The supported operators are as follows:

       g       Keep matches that match the given regex.
       G       Keep matches that don’t match the given regex.
       h       Highlight  substrings  in  matches  that  match the given
               regex.
       H       Highlight substrings in  matches  that  don’t  match  the
               given regex.
       x       Select everything that matches the given regex.
       X       Select everything that doesn’t match the given regex.

       An example pattern to match all numbers that contain a ‘3’ but
       aren’t ‘1337’ could be ‘x/[0-9]+/ g/3/ G/^1337$/’.  In that
       pattern, ‘x/[0-9]+/’ selects all numbers in the input, ‘g/3/’
       keeps only those matches that contain the digit 3, and
       ‘G/^1337$/’ filters out the specific number 1337.

       The  opening-  and  closing-delimiter used for each given command
       can be any valid UTF-8 codepoint.  As  a  result,  the  following
       pattern using the delimiters ‘|’, ‘.’, and ‘ä’ is well-formed:

             x|[0-9]+| g.3. Gä^1337$ä

       Delimiters also respect the Unicode ‘Bidirectional Paired
       Bracket’ property.  This means that alongside the previous
       examples, the following non-exhaustive list of character pairs
       may be used as opening- and closing delimiters:

       •   「…」
       •   ⟮…⟯
       •   ⟨…⟩

       It is not recommended that you use characters that have a special
       meaning in regular expression syntax as delimiters, unless you’re
       using literal patterns via the -l option or the ‘l’ command flag.

       Operators  are not allowed to take empty regular expression argu‐
       ments with one exception: ‘h’.  When given an empty  regular  ex‐
       pression  argument, the ‘h’ operator assumes the same regular ex‐
       pression as the previous operator.  This allows you to avoid  du‐
       plication  in  the  common  case where a user wishes to highlight
       text matched by a ‘g’ or ‘x’  operator.   The  following  example
       pattern  selects  all words that have a capital letter, and high‐
       lights the capital letter(s):

             x/\w+/ g/\p{Lu}/ h//

       The empty ‘h’ operator is not permitted as the first operator  in
       a pattern.

       While various command-line options exist to alter the behaviour
       of patterns, such as -i to enable case-insensitive matching or -U
       to disable Unicode support, the same behaviour can also be set at
       the command-level by appending one-or-more flags to a command.
       As an example, one could match all sequences of one-or-more
       non-whitespace characters that contain the case-insensitive
       literal string ‘[hi]’ by using the following pattern:

             x/\S+/ g/[hi]/li

       The currently supported flags are as follows:

       i/I     enable or disable case-insensitive matching respectively
       l/L     enable or disable treating the supplied regex as a  fixed
               string
       u/U     enable or disable Unicode support respectively

ENVIRONMENT
       NO_COLOR
               Do not display any colored output when set to a non-empty
               string, even if the standard-output is a terminal.  This
               environment variable takes precedence over CLICOLOR_FORCE.

       CLICOLOR_FORCE
               Force display of colored output when set to a non-empty
               string, even if the standard-output isn’t a terminal.

       TERM    If set to ‘dumb’, disables colored output, taking
               precedence over all other environment variables.

EXIT STATUS
       The grab utility exits with one of the following values:

             0       One or more matches were selected.
             1       No matches were selected.
             2       A non-fatal error occurred, such as failure to read
                     a file.
             >2      A fatal error occurred.

EXAMPLES
       List all your system’s CPU flags, sorted and without duplicates:

             $ grab 'x/^flags.*?$/ x/\w+/ G/^flags$/' </proc/cpuinfo |
             sort -u

       Search for a pattern in multiple files without printing filenames
       or position information:

             $ cat file1 file2 file3 | grab 'x/pattern/'

       Search for usages of an ‘<hb-form-text>’ Vue component — but only
       those which are being passed a ‘placeholder’ property — searching
       all files in the current git-repository:

             $ git grab 'x/<hb-form-text.*?>/ g/\bplaceholder\b/' '*.vue'

       Extract  bibliographic  references  from mdoc(7) formatted manual
       pages:

             $ grab 'x/(^\.%.*?\n)+/' foo.1 bar.1

SEE ALSO
       git(1), grep(1), pcre2syntax(3), regex(7)

       Rob Pike, Structural Regular Expressions,
       https://doc.cat-v.org/bell_labs/structural_regexps/se.pdf, AT&T
       Bell Laboratories, Murray Hill, New Jersey 07974, 1987.

       SGR Parameters:
       https://en.wikipedia.org/wiki/ANSI_escape_code#SGR

AUTHORS
       Thomas Voss <mail@thomasvoss.com>

NOTES
       When pattern matching with literal strings you should avoid using
       delimiters that are contained within the search string, as any
       backslashes used to escape the delimiters will be searched for in
       the text literally.

BUGS
       The pattern string provided as a command-line argument as well as
       the provided input files must be encoded as UTF-8.  No other  en‐
       codings  are  supported unless they are UTF-8 compatible, such as
       ASCII.

       The -L option has incredibly poor performance compared to the  -b
       option, especially with very large inputs.

Grab 3.0.0                  13 November, 2024                    GRAB(1)
```
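
To get a feel for how the operators chain, it can help to compare the example pattern `x/[0-9]+/ g/3/ G/^1337$/` with a rough line-oriented approximation built from grep(1). The sketch below is only an analogy, not how Grab is implemented: the function name `numbers_with_three` is made up for illustration, `grep -o` is a common extension rather than POSIX, and the approximation only holds because these matches never span lines.

```sh
# Hypothetical helper approximating the grab pattern
#   x/[0-9]+/ g/3/ G/^1337$/
# with line-oriented tools.  Reads from the standard input.
numbers_with_three() {
	grep -oE '[0-9]+' |  # x/[0-9]+/  select every run of digits, one per line
	grep 3 |             # g/3/       keep the selections that contain a 3
	grep -vx 1337        # G/^1337$/  drop the selection that is exactly 1337
}

# Prints 13, 73 and 3, each on its own line
printf 'a13 1337 42\n73 b3\n' | numbers_with_three
```

Each stage of the pipeline plays the role of one command in the pattern. Once a selection needs to span several lines, the grep pipeline no longer works, while the Grab pattern keeps the same shape.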

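The exit statuses documented above, combined with the `-p` option, make Grab usable in shell conditionals. The following is a minimal sketch under a few assumptions: the helper `describe_status` and the file `notes.txt` are hypothetical, and the `command -v` guard is only there so the snippet does nothing on a system without Grab installed.

```sh
# Hypothetical helper: map a grab exit status to a short description,
# following the table in the EXIT STATUS section.
describe_status() {
	case $1 in
	0) echo 'match found' ;;
	1) echo 'no match' ;;
	2) echo 'non-fatal error' ;;
	*) echo 'fatal error' ;;
	esac
}

# notes.txt is a placeholder filename.  With -p nothing is written to
# the standard output, so only the exit status carries information.
if command -v grab >/dev/null 2>&1; then
	status=0
	grab -p 'x/TODO/' notes.txt || status=$?
	describe_status "$status"
fi
```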

[1]: https://doc.cat-v.org/bell_labs/structural_regexps/se.pdf