m4_define(HL, ‘‘m4_patsubst($1, ‘‘<\([^>]*\)>’’, ‘‘@span .hl-red {=\1}’’)’’)m4_dnl
html lang="en" { head { HEAD } body { header { div { h1 {-Reinvent The Wheel!} INCLUDE(nav.gsp) } figure .quote { blockquote { p {= You have to do what must be done. Nobody is going to ask you, “why didn’t you make it?”. It’s either do it or not. Do not think about what you’re feeling, do it no matter what. } } figcaption {-Haroon Khan} } } main {
h2 #story {-Story of a Software Engineer}
p {- It was your average Wednesday afternoon, and I was working my job. My specific task on this day was quite simple: document our custom Vue components that make up most of our product’s UI. }
p {- This should have been a relatively easy task, and for the most part it was, but I had an issue. Some of these components had some @em{-really} obscure properties that could influence their behavior, and seeing as much of the codebase was written 10 years ago by utter idiots, the code implementing these properties is @em{-really} hard to read. }
p {- I decided that instead of trying to study the @em{-definitions} of these properties, it would be quite a bit easier to study their @em{-usage}. But how do I find them? Our codebase is hundreds of thousands of lines of code, and these properties have very generic names such as ‘@em{-browser}’. Additionally, while the components are easy to search for, they’re used in hundreds of places, and such properties may only be used once or twice. }
p {- The solution? I thought it would be the trusty tool in every hacker’s toolbelt: @code{-grep}. }
h2 #downfall {-The Downfall of Grep}
p {- I thought that @code{-grep} would be my saviour. The tool that would answer the call to find me the usages I so desired. So I whipped that baby out and went straight to work: }
figure { pre { FMT_CODE(grep.sh) } }
p {- You can probably tell from the fact I’m writing this post that this did not work. 
If you’ve ever worked with Vue or something similar, you might even be able to figure out why. For those unfamiliar with the frontend (you’re a treasure that must be preserved), allow me to show you something that is all too common in a Vue codebase: }
figure { pre .vue { FMT_CODE(example.vue) } }
p {- The issue here is clear: the property we’re searching for (‘browser’) is on an entirely different line from the component we’re searching for (‘@code{-}’). It’s not enough to search for just the component, because it’s used everywhere but only a few rare usages interest me, and it’s not enough to search for just the attribute, because many different components have attributes of the same name (and @em{-no}, they don’t have the same behavior; the codebase is shit). }
p {- What I need is a tool that will let me search for patterns that span multiple lines. }
h2 #grab {-Introducing Grab}
figure .quote { blockquote { p {= The current UNIX® text processing tools are weakened by the built-in concept of a line. There is a simple notation that can describe the ‘shape’ of files when the typical array-of-lines picture is inadequate. That notation is regular expressions. Using regular expressions to describe the structure in addition to the contents of files has interesting applications, and yields elegant methods for dealing with some problems the current tools handle clumsily. When operations using these expressions are composed, the result is reminiscent of shell pipelines. } } figcaption {-Rob Pike} }
p {- That quote is from the abstract of @cite {-Structural Regular Expressions}, a paper written by the one and only Rob Pike back in 1987. It describes an idea by which we stop assuming that all data is organized in lines, and instead use regular expressions to define the shapes comprising our data. }
p {- I had read this paper some years ago and it had always sat in the back of my mind. 
I had actually toyed around in the past with an implementation of @code{-grep} that wasn’t strictly line-oriented, but it was very bare-bones, and lacked basic facilities such as reporting the positions of matches, something I desperately needed. }
p {- So over the following few days I made major changes, rewrote lots of the code, and overall turned my tool, @code{-grab}, into a staple part of my hacker’s toolbelt. }
h2 #how {-How Grab Finds Text}
p {- If you’re familiar with the UNIX environment, you’re probably used to querying text with tools such as @code{-sed} and @code{-awk} using regular expressions. These are the same regular expressions we as programmers all know and love, but with one important (yet often overlooked) characteristic: you cannot match the newline. }
p {- The @code{-grab} utility moves away from this limiting paradigm; the newline is treated no differently from any other character you want to match. Want to match an entire paragraph of text? The pattern is as simple as ‘@code{-[^\\n].‌+?(?=\\n\\n|$)}’. It may look complicated if you’re new to regular expressions (PCREs, to be specific), but it’s really quite simple. You just match a non-newline character, and then keep matching characters until reaching either a double newline or the end of input. }
p {- On its own this isn’t too amazing though. The great thing about @code{-grep} is that it doesn’t just show you matches; it shows them to you in the context of a complete line. @code{-grab} solves this in the same way described in Rob Pike’s paper: chaining operations. }
p {- Say we want to iterate not over lines but over paragraphs. We can use the following @em{-pattern}: }
figure { pre { FMT_CODE(x.pat) } }
p {- Here we’re using the ‘x’ operator. It iterates over all occurrences of the pattern. In this case we’re iterating over all paragraphs in our input. Maybe we want to see all paragraphs which contain doubled words (for example: ‘the the’), a common typo found in text files. 
For this we can use the ‘g’ operator: }
figure { pre { FMT_CODE(g.pat) } }
p {- The fundamental difference between the two operators is that the ‘x’ operator specifies the structure to iterate over. In the context of @code{-grep} these are lines, but in @code{-grab} they can be whatever you want. The ‘g’ operator on the other hand doesn’t modify the structure of the matches returned to you at all; it simply acts as a filter, selecting the matches which match the given regular expression. }
p {- Here’s an interactive example: }
figure { pre { FMT_CODE(example-1.sh) } }
p {- This is almost perfect; there’s just one bit missing. In my interactive example I’ve shown how you can use the power of @code{-grab} to find paragraphs in your files containing doubled words. This is really handy if you find yourself writing websites, documentation, or other long-form written content. }
p {- Given my example though, how easily were you able to spot the doubled words? They probably didn’t stick out to you right away, as they would have had they been highlighted in some bright, flashy color. It is for this reason that the ‘h’ operator exists. This operator is unique in that it does not change the given selections at all. Any matches made by previous occurrences of ‘x’ and ‘g’ will be displayed the same with and without the use of ‘h’. }
p {- The ‘h’ operator is purely for the user. By using this operator you can specify a pattern whose matching text is to be @em{-highlighted}. Let’s apply it to the previous example and see how the doubled words are made instantly obvious to the user: }
figure { pre { HL(FMT_CODE(example-2.sh)) } }
p {- There is an obvious problem here: the duplication of the regular expression provided to the ‘g’ and ‘h’ operators. It is @em{-extremely} common that you will want to highlight text that was just matched by a ‘g’ operator. Like, @em{-really} common. So common in fact that the ‘h’ operator supports a shorthand syntax for this exact situation: @code {-h//}. 
Giving an empty regular expression as an argument to an operator is illegal, with the exception of the ‘h’ operator. When this operator is given an empty argument, it assumes the regular expression of the previous operator: }
figure { pre { HL(FMT_CODE(example-3.sh)) } }
h2 #final {-Final Solution}
p {- So… what was the final solution to my problem? How did I find all the @code{-} tags in my job’s codebase that were passed the ‘browser’ attribute? Well, here’s how: }
figure { pre { FMT_CODE(answer.sh) } }
p {- Quick, simple, and elegant. Just the way I like it! }
h2 #more {-Additional Operators}
p {- Here I’ve shown you the 3 main operators: ‘x’, ‘g’, and ‘h’. These are not all, however! Each operator also has a capital variant (‘X’, ‘G’, ‘H’) which behaves the same, except that instead of working on text that matches the given pattern, it works on text which @em{-doesn’t} match the given pattern. }
p {- These operators allow for more precise pattern matching. For example, a pattern to match all numbers which contain a ‘3’ but which aren’t ‘1337’ could be written as @code{-x/[0-9]+/ g/3/ G/^1337$/}. }
} footer { FOOT } } }