author Thomas Voss <mail@thomasvoss.com> 2023-08-15 14:57:32 +0200
committer Thomas Voss <mail@thomasvoss.com> 2023-08-15 14:57:32 +0200
commit d5635e946e9df6f519ec8cf08cebfc35dbe6c788
tree 46893cffdf23a2b15f8b7839c69d5df2bcbb8bca /src/prj/mmv/index.html
parent cfa35dcb2d332977e80a5811b6d42e9949bd4814
Add a post on ‘mmv’
Diffstat (limited to 'src/prj/mmv/index.html')
-rw-r--r-- src/prj/mmv/index.html | 658
1 files changed, 658 insertions, 0 deletions
diff --git a/src/prj/mmv/index.html b/src/prj/mmv/index.html
new file mode 100644
index 0000000..d13f7c8
--- /dev/null
+++ b/src/prj/mmv/index.html
@@ -0,0 +1,658 @@
+<!DOCTYPE html>
+<html lang="en">
+ <head>
+ m4_include(head.html)
+ </head>
+ <body>
+ <header>
+ <div>
+ <h1>Moving Files the Right Way</h1>
+ m4_include(nav.html)
+ </div>
+
+ <figure class="quote">
+ <blockquote>
+ <p>I think the OpenBSD crowd is a bunch of masturbating
+ monkeys, in that they make such a big deal about
+ concentrating on security to the point where they pretty much
+ admit that nothing else matters to them.</p>
+ </blockquote>
+ <figcaption>
+ Linus Torvalds
+ </figcaption>
+ </figure>
+ </header>
+
+ <main>
+ <p>
+ <em>
+ You can find the <code>mmv</code> git repository over at
+ <a href="https://git.sr.ht/~mango/mmv"
+ target="_blank">sourcehut</a>
+ or <a href="https://github.com/Mango0x45/mmv">GitHub</a>.
+ </em>
+ </p>
+
+ <h2>Table of Contents</h2>
+
+ <ul>
+ <li><a href="#prologue">Prologue</a></li>
+ <li><a href="#moving">Advanced Moving and Pitfalls</a></li>
+ <li><a href="#mapping">Name Mapping with <code>mmv</code></a></li>
+ <li><a href="#newlines">Filenames with Embedded Newlines</a>
+ <ul>
+ <li><a href="#0-flag">The Simple Case</a></li>
+ <li><a href="#e-flag">Encoding Newlines</a></li>
+ </ul>
+ </li>
+ <li><a href="#i-flag">Individual Execution</a></li>
+ <li><a href="#safety">Safety</a></li>
+ <li><a href="#examples">Examples</a></li>
+ </ul>
+
+ <h2 id="prologue">Prologue</h2>
+ <p>
+ File moving and renaming is one of the most common tasks we
+ undertake on the command-line. We basically always do this with
+ the <code>mv</code> utility, and it gets the job done most of the
+ time. Want to rename one file? Use <code>mv</code>! Want to
+ move a bunch of files into a directory? Use <code>mv</code>!
+ How could <code>mv</code> ever go wrong? Well, I’m glad you asked!
+ </p>
+
+ <h2 id="moving">Advanced Moving and Pitfalls</h2>
+ <p>
+ Let’s start off nice and simple. You just inherited a C project
+ that uses the sacrilegious
+ <a
+ href="https://en.wikipedia.org/wiki/Camel_case"
+ target="_blank"
+ >camelCase</a>
+ naming convention for its files:
+ </p>
+
+ <figure>
+ <pre>m4_fmt_code(ls-files.sh.html)</pre>
+ </figure>
+
+ <p>
+ This deeply upsets you, as it upsets me. So you decide you want
+ to switch all these files to use
+ <a
+ href="https://en.wikipedia.org/wiki/Snake_case"
+ target="_blank"
+ >snake_case</a>,
+ like a normal person. Well, how would you do this? You use
+ <code>mv</code>! This is what you might end up doing:
+ </p>
+
+ <figure>
+ <pre>m4_fmt_code(manual-mv.sh.html)</pre>
+ </figure>
+
+ <p>
+ Well… it works, I guess, but it’s a pretty shitty way of renaming
+ these files. Luckily we only had 5, but what if this were a much
+ larger project with many more files to rename? Things would get
+ tedious. So instead, we can use a pipeline:
+ </p>
+
+ <figure>
+ <pre>m4_fmt_code(camel-to-snake-naïve.sh.html)</pre>
+ </figure>
+
+ <aside>
+ <p>
+ The given example assumes your <code>sed</code>
+ implementation supports ‘<code>\L</code>’, which is a
+ non-standard <abbr class="gnu">GNU</abbr> extension.
+ </p>
+ </aside>
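+
+ <p>
+ If your <code>sed</code> lacks ‘<code>\L</code>’, the same
+ substitution is easy to express elsewhere. Here is a minimal
+ Python sketch of it. The function name <code>camel_to_snake</code>
+ is hypothetical, and the sketch only illustrates what the
+ <code>sed</code> script does. It is not part of <code>mmv</code>:
+ </p>

```python
import re


def camel_to_snake(name: str) -> str:
    # Mirrors GNU sed's 's/[A-Z]/\L_&/g': every ASCII uppercase letter is
    # replaced by an underscore followed by its lowercase form.
    return re.sub(r"[A-Z]", lambda m: "_" + m.group(0).lower(), name)


for f in ["fooBar.c", "someLongName.h", "main.c"]:
    print(f, "->", camel_to_snake(f))
```

+ <p>
+ Like the <code>sed</code> script, this gives any name that starts
+ with an uppercase letter a leading underscore
+ (<em>FooBar.c</em> becomes <em>_foo_bar.c</em>).
+ </p>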
+
+ <p>
+ That works and it gets the job done, but it’s not really ideal,
+ is it? There are a few issues with this approach:
+ </p>
+
+ <ol>
+ <li>
+ <p>
+ You’re writing more complicated code. This has the
+ obvious drawback of potentially being more error-prone,
+ and it can also take longer to write than you’d like,
+ since you might have forgotten whether <code>xargs</code>
+ actually has an ‘<code>-L</code>’ option (which
+ would require reading the
+ <a href="https://www.man7.org/linux/man-pages/man1/xargs.1.html"
+ target="_blank" ><code>xargs(1)</code></a> manual).
+ </p>
+ </li>
+ <li>
+ <p>
+ If you try to rename the file <em>foo</em>
+ to <em>bar</em> but <em>bar</em> already exists, you end
+ up deleting a file you may not have wanted to.
+ </p>
+ </li>
+ <li>
+ <p>
+ In a similar vein to the previous point, you need to be
+ very careful about schemes like renaming the
+ file <em>a</em> to <em>b</em> and <em>b</em>
+ to <em>c</em>. You run the risk of turning <em>a</em>
+ into <em>c</em> and losing the file <em>b</em> entirely.
+ </p>
+ </li>
+ <li>
+ <p>
+ Moving symbolic links is its own whole can of worms. If
+ a symlink points to a relative location, you need to
+ make sure it still points to the right place after the move.
+ If the symlink is absolute, however, you can leave it
+ untouched. But what if the symlink points to a file
+ that you’re moving as part of your batch move operation?
+ Now you need to handle that too.
+ </p>
+ </li>
+ </ol>
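+
+ <p>
+ To make points 2 and 3 concrete, here is a small Python sketch
+ that performs the renames <em>a</em> → <em>b</em> and
+ <em>b</em> → <em>c</em> one after the other, the way a
+ line-by-line <code>mv</code> loop would. It assumes a POSIX
+ system, where <code>os.rename</code> behaves like <code>mv</code>
+ and silently replaces an existing destination. The function name
+ is hypothetical:
+ </p>

```python
import os
import tempfile


def naive_chain_rename(directory: str) -> dict[str, str]:
    # Create two files, a and b, each containing its own name.
    for name in ("a", "b"):
        with open(os.path.join(directory, name), "w") as f:
            f.write(f"contents of {name}")
    # Step 1: a -> b. On POSIX, rename() silently replaces the existing b.
    os.rename(os.path.join(directory, "a"), os.path.join(directory, "b"))
    # Step 2: b -> c. This now moves what used to be a, not the original b.
    os.rename(os.path.join(directory, "b"), os.path.join(directory, "c"))
    # Report what is left in the directory as {name: contents}.
    result = {}
    for name in sorted(os.listdir(directory)):
        with open(os.path.join(directory, name)) as f:
            result[name] = f.read()
    return result


with tempfile.TemporaryDirectory() as d:
    print(naive_chain_rename(d))  # {'c': 'contents of a'}
```

+ <p>
+ The original contents of <em>b</em> are gone, and no error was
+ reported at any point.
+ </p>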
+
+ <h2 id="mapping">Name Mapping with <code>mmv</code></h2>
+
+ <p>
+ What is <code>mmv</code>? It’s the solution to all your
+ problems, that’s what it is! <code>mmv</code> takes as its
+ arguments a utility and that utility’s arguments, and uses them
+ to create a mapping between old and new filenames, similar to
+ the <code>map()</code> function found in many programming
+ languages. I think the best way to convey how the tool works is
+ with an example. Let’s do the same thing we did previously,
+ turning camelCase filenames into snake_case,
+ but using <code>mmv</code>:
+ </p>
+
+ <figure>
+ <pre>m4_fmt_code(camel-to-snake-smart.sh.html)</pre>
+ </figure>
+
+ <p>Let me break down how this works.</p>
+
+ <p>
+ <code>mmv</code> starts by reading a series of filenames
+ separated by newlines from the standard input. Yes, sometimes
+ filenames have newlines in them and yes there is a way to handle
+ them but I shall get to that later. The filenames that
+ <code>mmv</code> reads from the standard input will be referred
+ to as the <em>input files</em>. Once all the input files have
+ been read, the utility specified by the arguments is spawned; in
+ this case that would be <code>sed</code> with the argument
+ <code>'s/[A-Z]/\L_&/g'</code>. The input files are then piped
+ into <code>sed</code> the exact same way that they would have
+ been if we ran the above commands without <code>mmv</code>, and
+ the output of <code>sed</code> then forms what will be referred
+ to as the <em>output files</em>. Once a complete list of output
+ files is accumulated, each input file gets renamed to its
+ corresponding output file.
+ </p>
+
+ <p>
+ Let’s look at a simpler example. Say we want to rename 2 files
+ in the current directory to use lowercase letters; we could use
+ the following command:
+ </p>
+
+ <figure>
+ <pre>m4_fmt_code(mmv-tr.sh.html)</pre>
+ </figure>
+
+ <p>
+ In the above example <code>mmv</code> reads 2 lines from
+ standard input, those being <em>LICENSE</em>
+ and <em>README</em>. Those are our 2 input files now.
+ The <code>tr</code> utility is then spawned and the input files
+ are piped into it. We can simulate this in the shell:
+ </p>
+
+ <figure>
+ <pre>m4_fmt_code(tr.sh.html)</pre>
+ </figure>
+
+ <p>
+ As you can see above, <code>tr</code> has produced 2 lines of
+ output; these are our 2 output files. Since we now have our 2
+ input files and 2 output files, <code>mmv</code> can go ahead
+ and rename the files. In this case it will rename
+ <em>LICENSE</em> to <em>license</em> and
+ <em>README</em> to <em>readme</em>. For more examples, check
+ the <a href="#examples">examples</a> section further down this
+ page.
+ </p>
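+
+ <p>
+ The process just described (read the input names, pipe them into
+ the command, and pair them with its output lines) can be sketched
+ in a few lines of Python. This is not <code>mmv</code>’s source.
+ The function <code>build_mapping</code> and the stand-in for
+ <code>tr</code> are hypothetical. The sketch only covers the plain
+ newline-separated case, and it stops at building the mapping
+ without renaming anything:
+ </p>

```python
import subprocess
import sys


def build_mapping(inputs: list[str], argv: list[str]) -> list[tuple[str, str]]:
    """Pipe the newline-separated input names through argv and pair each
    input name with the corresponding line of output."""
    proc = subprocess.run(
        argv,
        input="".join(name + "\n" for name in inputs),
        capture_output=True,
        text=True,
        check=True,
    )
    outputs = proc.stdout.split("\n")
    if outputs and outputs[-1] == "":
        outputs.pop()  # drop the empty string after the final newline
    if len(outputs) != len(inputs):
        raise ValueError(f"{len(inputs)} input files but {len(outputs)} output files")
    return list(zip(inputs, outputs))


# Stand-in for `tr A-Z a-z`, written in Python so the sketch runs anywhere.
lower = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().lower())"]
print(build_mapping(["LICENSE", "README"], lower))
# [('LICENSE', 'license'), ('README', 'readme')]
```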
+
+ <h2 id="newlines">Filenames with Embedded Newlines</h2>
+
+ <p>
+ People are retarded, and as a result we have filenames with
+ newlines in them. All it would have taken to solve this issue
+ for everyone was for literally <strong>anybody</strong> during
+ the early UNIX days to go “<em>hey, this is a bad idea!</em>”,
+ but alas, we must deal with this. Newlines are of course not
+ the only special characters filenames can contain, but they are
+ the single most infuriating to deal with, since nearly all of the
+ standard UNIX utilities are line-oriented and simply don’t work
+ well with these files.
+ </p>
+
+ <p>
+ So how does <code>mmv</code> deal with special characters, and
+ newlines in particular? Well, it does so by providing the user
+ with the <code>-0</code> and <code>-e</code> flags:
+ </p>
+
+ <dl>
+ <dt><code>-0</code></dt>
+ <dd>
+ <p>
+ Tell <code>mmv</code> to expect its input to be separated
+ not by newlines (‘<code>\n</code>’) but by NUL bytes
+ (‘<code>\0</code>’). Besides the forward slash, the NUL
+ byte is the only character not allowed in filenames, and
+ since slashes are perfectly valid in paths, it is the only
+ byte that can never appear in a path at all. This makes it
+ the obvious choice for an alternative separator.
+ </p>
+ </dd>
+ <dt><code>-e</code></dt>
+ <dd>
+ <p>
+ Encode newlines in filenames before passing them to the
+ provided utility. Newline characters are replaced by the
+ literal string ‘<code>\n</code>’ and backslashes by the
+ literal string ‘<code>\\</code>’. After processing, the
+ resulting output is decoded again.
+ </p>
+ <p>
+ If combined with the <code>-0</code> flag, then input
+ will be read assuming a NUL-byte input separator, but
+ the encoded input files will be written to the spawned
+ process newline-separated.
+ </p>
+ </dd>
+ </dl>
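+
+ <p>
+ To see why backslashes need to be encoded as well as newlines,
+ here is a minimal Python sketch of the encoding scheme described
+ above. The functions <code>encode</code> and <code>decode</code>
+ are hypothetical illustrations of the idea, not
+ <code>mmv</code>’s implementation:
+ </p>

```python
def encode(name: str) -> str:
    # Escape backslashes first so that a literal backslash can never be
    # confused with the start of an encoded newline.
    return name.replace("\\", "\\\\").replace("\n", "\\n")


def decode(text: str) -> str:
    # Scan left to right so that an escaped backslash followed by 'n'
    # is not mistaken for an encoded newline. Any other backslash
    # sequence is left untouched; how mmv itself treats those is not
    # modelled here.
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text) and text[i + 1] in "n\\":
            out.append("\n" if text[i + 1] == "n" else "\\")
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


name = "My\nFile"
print(repr(encode(name)))                  # 'My\\nFile'
print(repr(decode(encode(name).lower())))  # 'my\nfile'
```

+ <p>
+ Without the backslash escape, a name containing the two characters
+ ‘<code>\N</code>’ would be lowercased to ‘<code>\n</code>’ and then
+ wrongly decoded into a newline. With it, ‘<code>\N</code>’ is
+ encoded as ‘<code>\\N</code>’, lowercased to ‘<code>\\n</code>’,
+ and decoded back to the intended ‘<code>\n</code>’.
+ </p>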
+
+ <h3 id="0-flag">The Simple Case</h3>
+
+ <p>
+ In order to better understand these flags and how they work,
+ let’s go through another example. We have 2 files (one with and
+ one without an embedded newline), and our goal is simply to
+ reverse these filenames. In this example I am going to be
+ displaying newlines in filenames with the “<code>$'\n'</code>”
+ syntax as this is how my shell displays embedded newlines.
+ </p>
+
+ <p>
+ We can start by just trying to naïvely pass these 2 files
+ to <code>mmv</code> and use <code>rev</code> to reverse the
+ names, but this doesn’t work:
+ </p>
+
+ <figure>
+ <pre>m4_fmt_code(mmv-rev.sh.html)</pre>
+ </figure>
+
+ <p>
+ This doesn’t work because, due to the line-oriented nature
+ of <code>ls</code> and <code>rev</code>, we are actually
+ trying to rename the files <em>foo</em>, <em>bar</em>, and
+ <em>baz</em> to the new filenames <em>oof</em>,
+ <em>rab</em>, and <em>zab</em>. As can be seen in the following
+ diagram, the embedded newline makes our input ambiguous,
+ and <code>mmv</code> can no longer reliably
+ proceed <x-ref>1</x-ref>:
+ </p>
+
+ <figure>
+ <object data="conflict.svg" type="image/svg+xml"></object>
+ </figure>
+
+ <aside>
+ <p data-ref="1">
+ You get a cryptic “file not found” error message because
+ <code>mmv</code> checks that all the input files actually
+ exist before doing anything. Since “foo” isn’t a real file,
+ it errors out.
+ </p>
+ </aside>
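+
+ <p>
+ The ambiguity can be reproduced without <code>mmv</code> at all.
+ In this Python sketch, the two filenames are joined once with
+ newlines and once with NUL bytes, and then split again. Only the
+ NUL-separated form gives back the original names. The helper
+ <code>split_records</code> is hypothetical and was written only
+ for this illustration:
+ </p>

```python
def split_records(data: str, sep: str) -> list[str]:
    # Treat sep as a terminator: a trailing separator does not produce
    # an empty final record.
    records = data.split(sep)
    if records and records[-1] == "":
        records.pop()
    return records


names = ["foo\nbar", "baz"]  # one name with an embedded newline
newline_separated = "".join(n + "\n" for n in names)
nul_separated = "".join(n + "\0" for n in names)

print(split_records(newline_separated, "\n"))  # ['foo', 'bar', 'baz']
print(split_records(nul_separated, "\0"))      # ['foo\nbar', 'baz']
```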
+
+ <p>
+ The first thing we need to do in order to proceed is to pass
+ the <code>-0</code> flag to <code>mmv</code>. This will
+ tell <code>mmv</code> that we want to use the NUL-byte as our
+ input separator and not the newline. We also need <code>ls</code>
+ to actually provide us with the filenames delimited by NUL-bytes.
+ Luckily <abbr class="gnu">GNU</abbr> <code>ls</code> gives us the
+ <code>--zero</code> flag to do just that:
+ </p>
+
+ <figure>
+ <pre>m4_fmt_code(mmv-rev-zero.sh.html)</pre>
+ </figure>
+
+ <p>
+ So we’re getting places, but we aren’t quite there yet. The
+ issue now is that <code>mmv</code> received 2
+ input files from the standard input, but <code>rev</code>
+ produced 3 output files. Why is that? Well, let’s try our hand
+ at a little bit of command-line debugging with <code>sed</code>:
+ </p>
+
+ <figure>
+ <pre>m4_fmt_code(sed-debugging.sh.html)</pre>
+ </figure>
+
+ <p>
+ If you aren’t quite sure what the above is doing, here’s a quick
+ summary:
+ </p>
+
+ <ul>
+ <li>
+ The <code>-U</code> flag given to <code>ls</code> tells it
+ not to sort its output. This is purely to keep the
+ example clear to the reader.
+ </li>
+ <li>
+ The <code>-n</code> flag given to <code>sed</code> tells it
+ not to print the input line automatically at the end of the
+ provided script.
+ </li>
+ <li>
+ The <code>l</code> command in <code>sed</code> prints the
+ current input in a “visually unambiguous form”.
+ </li>
+ </ul>
+
+ <p>
+ In the <code>sed</code> output, we can see that <samp>$</samp>
+ represents the end of a line, and <samp>\000</samp> represents
+ the NUL-byte. All looks good here: we have two inputs separated
+ by NUL-bytes. Now let’s try to throw in <code>rev</code>:
+ </p>
+
+ <figure>
+ <pre>m4_fmt_code(sed-debugging-rev.sh.html)</pre>
+ </figure>
+
+ <p>
+ Well, wouldn’t you know it? Since <code>rev</code> <em>also</em>
+ works with newline-separated input, it reversed our NUL-byte
+ separators right along with the names and now gives us 3 outputs.
+ Luckily, the folks over at <em>util-linux</em> provided us with
+ the <code>-0</code> flag here too, so that we can properly handle
+ NUL-delimited input. Putting all of this together, we get a final
+ working product:
+ </p>
+
+ <figure>
+ <pre>m4_fmt_code(reverse-embedded-newline.sh.html)</pre>
+ </figure>
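+
+ <p>
+ Why did <code>rev</code> without <code>-0</code> produce three
+ outputs? The following Python sketch models both modes under a
+ simplifying assumption: <code>rev</code> treats a NUL byte like
+ any other character within a line. The input string stands in for
+ the two NUL-terminated names ‘foo\nbar’ and ‘baz’. All function
+ names are hypothetical:
+ </p>

```python
def rev_lines(data: str) -> str:
    # Simplified model of rev without -0: reverse every newline-separated
    # line, treating NUL bytes as ordinary characters.
    return "\n".join(line[::-1] for line in data.split("\n"))


def rev_records(data: str) -> str:
    # Simplified model of rev -0: reverse every NUL-separated record.
    return "\0".join(rec[::-1] for rec in data.split("\0"))


def count_records(data: str) -> int:
    # Count NUL-terminated records the way mmv -0 would see them,
    # ignoring the empty string after a trailing NUL.
    records = data.split("\0")
    if records and records[-1] == "":
        records.pop()
    return len(records)


data = "foo\nbar\0baz\0"  # the names 'foo\nbar' and 'baz', NUL-terminated
print(count_records(rev_lines(data)))    # 3: 'oof\n', 'zab', 'rab'
print(count_records(rev_records(data)))  # 2: 'rab\noof', 'zab'
```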
+
+ <h3 id="e-flag">Encoding Newlines</h3>
+
+ <p>
+ Sometimes we want to rename a bunch of files, but the command we
+ want to use doesn’t support NUL-bytes as nicely as we would
+ like. In these cases, you may want to use the <code>-e</code>
+ flag, which encodes your newline characters into the literal
+ string ‘<code>\n</code>’ and then passes your input to the given
+ command newline-separated.
+ </p>
+
+ <p>
+ For a real-world example, perhaps you want to edit some
+ filenames in vim, or whatever other editor you use. Well, we can
+ do this incredibly easily with the <code>vipe</code> utility
+ from
+ the <a href="https://joeyh.name/code/moreutils/">moreutils</a>
+ collection. The <code>vipe</code> command simply reads input
+ from the standard input, opens it up in your editor, and then
+ prints the resulting output to the standard output; perfect
+ for <code>mmv</code>! We do not really want to deal with
+ NUL-bytes in our text-editor though, so let’s just encode our
+ newlines:
+ </p>
+
+ <figure>
+ <pre>m4_fmt_code(vipe.sh.html)</pre>
+ </figure>
+
+ <aside>
+ <p>
+ Notice how you still need to pass the <code>-0</code> flag
+ to let <code>mmv</code> know that our input files may have
+ embedded newlines.
+ </p>
+ </aside>
+
+ <p>
+ When running the above code example, you will see the following
+ in your editor:
+ </p>
+
+ <figure>
+ <pre>m4_fmt_code(vim.html)</pre>
+ </figure>
+
+ <p>
+ After you exit your editor, <code>mmv</code> will decode all
+ occurrences of ‘<code>\n</code>’ back into a newline, and all
+ occurrences of ‘<code>\\</code>’ back into a backslash:
+ </p>
+
+ <figure>
+ <object data="e-flag.svg" type="image/svg+xml"></object>
+ </figure>
+
+ <h2 id="i-flag">Individual Execution</h2>
+ <p>
+ The previous examples are great and all, but what do you do if
+ your mapping command doesn’t have the concept of an input
+ separator at all? This is where the <code>-i</code> flag comes
+ into play. With the <code>-i</code> flag we can
+ get <code>mmv</code> to execute our mapping command once for
+ every input filename. This means that as long as the command can
+ treat its entire input as a single buffer, we don’t need to worry
+ about separators.
+ </p>
+
+ <p>
+ To be honest, I cannot really think of any situation where you
+ might actually need to do this. If you can think of one,
+ please <a href="mailto:mail@thomasvoss.com">email me</a> and
+ I’ll update the example on this page. Regardless, let’s imagine
+ that we wanted to rename some files so that each filename is
+ replaced with the
+ <a href="https://en.wikipedia.org/wiki/SHA-1" target="_blank">SHA-1
+ hash</a> of that filename.
+ On Linux we have the <code>sha1sum</code> program, which, when
+ given no file arguments, reads from the standard input and outputs
+ the SHA-1 hash of what it read. This is how we would use it
+ with <code>mmv</code>:
+ </p>
+
+ <figure>
+ <pre>m4_fmt_code(sha1sum-long-example.sh.html)</pre>
+ </figure>
+
+ <p>
+ Another approach is to invoke <code>mmv</code> twice:
+ </p>
+
+ <figure>
+ <pre>m4_fmt_code(sha1sum-short-example.sh.html)</pre>
+ </figure>
+
+ <p>
+ If you are wondering why we need to call
+ <code>awk</code>, it’s because the <code>sha1sum</code>
+ program outputs 2 columns of data. The first column is our hash,
+ and the second column is the name of the file the hashed data was
+ read from (‘<code>-</code>’ for the standard input). We only want
+ the first column.
+ </p>
+
+ <p>
+ Unlike in previous examples where one process was spawned to map
+ all our filenames, with the <code>-i</code> flag we are spawning
+ a new instance for each filename. If you struggle to visualize
+ this, perhaps the following diagrams help:
+ </p>
+
+ <figure>
+ <figcaption>Invoking <code>mmv</code> without <code>-i</code></figcaption>
+ <object data="without-i-flag.svg" type="image/svg+xml"></object>
+ </figure>
+
+ <figure>
+ <figcaption>Invoking <code>mmv</code> with <code>-i</code></figcaption>
+ <object data="with-i-flag.svg" type="image/svg+xml"></object>
+ </figure>
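+
+ <p>
+ The difference shown in these diagrams can also be sketched in
+ Python. Here the command is spawned once per input name, gets that
+ name as its entire standard input, and its whole output becomes the
+ new name. Two details are assumptions of this sketch rather than
+ documented <code>mmv</code> behavior: the name is written without a
+ trailing separator, and one trailing newline is stripped from the
+ command’s output. The function <code>map_individually</code> and
+ the stand-in command are hypothetical:
+ </p>

```python
import subprocess
import sys


def map_individually(inputs: list[str], argv: list[str]) -> list[tuple[str, str]]:
    pairs = []
    for name in inputs:
        # One process per input name; the name is the entire stdin,
        # written without any trailing separator (an assumption).
        proc = subprocess.run(argv, input=name, capture_output=True,
                              text=True, check=True)
        out = proc.stdout
        # Assumption: a single trailing newline is not part of the new name.
        if out.endswith("\n"):
            out = out[:-1]
        pairs.append((name, out))
    return pairs


# Stand-in mapping command that reverses its whole input buffer, so an
# embedded newline is simply carried along instead of splitting the name.
rev_whole = [sys.executable, "-c",
             "import sys; sys.stdout.write(sys.stdin.read()[::-1])"]
print(map_individually(["foo\nbar", "baz"], rev_whole))
# [('foo\nbar', 'rab\noof'), ('baz', 'zab')]
```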
+
+ <h2 id="safety">Safety</h2>
+ <p>
+ When compared to the standard <code>for f in *; do mv $f …;
+ done</code> or <code>ls | … | xargs -L2 mv</code>
+ constructs, <code>mmv</code> is significantly safer to use.
+ These are some of the safety features that are built into the
+ tool:
+ </p>
+
+ <ol>
+ <li>
+ If the number of input- and output files differs, execution
+ is aborted before making any changes.
+ </li>
+ <li>
+ If an input file is renamed to the name of another input
+ file, the second input file is not lost (i.e. you can rename
+ <em>a</em> to <em>b</em> and <em>b</em> to <em>a</em> with
+ no problem).
+ </li>
+ <li>
+ All input files must be unique and all output files must be
+ unique. Otherwise execution is aborted before making any
+ changes.
+ </li>
+ <li>
+ In the case that something goes wrong during execution
+ (perhaps you tried to move a file to a non-existent
+ directory, or a syscall failed), a backup of your input
+ files is saved automatically by <code>mmv</code> for
+ recovery.
+ </li>
+ </ol>
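+
+ <p>
+ The checks in points 1 and 3 amount to a single validation step
+ that runs before any file is touched. A minimal Python sketch,
+ assuming names are compared as plain strings (a real tool may also
+ need to treat paths such as <em>./a</em> and <em>a</em> as the
+ same file). The function name <code>validate</code> is
+ hypothetical:
+ </p>

```python
def validate(inputs: list[str], outputs: list[str]) -> None:
    """Raise ValueError, before anything is renamed, if the mapping is unsafe."""
    if len(inputs) != len(outputs):
        raise ValueError(f"{len(inputs)} input files but {len(outputs)} output files")
    if len(set(inputs)) != len(inputs):
        raise ValueError("duplicate input file")
    if len(set(outputs)) != len(outputs):
        raise ValueError("duplicate output file")


validate(["a", "b"], ["b", "a"])  # a swap passes all three checks
```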
+
+ <p>
+ Due to the way <code>mmv</code> handles #2, when things do go
+ wrong you may find that all of your input files have
+ disappeared. Don’t worry, though: <code>mmv</code> takes a
+ backup of your files before doing anything. If you
+ run <code>mmv</code> with the <code>-v</code> option for verbose
+ output, you’ll notice it backing up your stuff in
+ the <code>$XDG_CACHE_HOME</code> directory:
+ </p>
+
+ <figure>
+ <pre>m4_fmt_code(mmv-verbose.sh.html)</pre>
+ </figure>
+
+ <p>
+ Upon successful execution,
+ the <code>$XDG_CACHE_HOME/mmv/TIMESTAMP</code> directory will be
+ automatically removed, but it remains when things go wrong so
+ that you can recover any missing data. The names of the
+ backup subdirectories in the <code>$XDG_CACHE_HOME/mmv</code>
+ directory are timestamps of when the directories were created.
+ This should make it easier for you to figure out which directory
+ you need to recover if you happen to have several of them.
+ </p>
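+
+ <p>
+ Point 2, together with the backup behavior described above,
+ suggests a two-phase approach. First every input file is moved
+ into a backup directory, and then each one is moved to its output
+ name. The following Python sketch illustrates that idea. It is not
+ taken from <code>mmv</code>’s source. The function name and the
+ numbered temporary names are hypothetical, and
+ <code>os.rename</code> only works within a single filesystem,
+ which a real tool would have to handle:
+ </p>

```python
import os
import tempfile


def two_phase_rename(pairs: list[tuple[str, str]], backup_root: str) -> None:
    """Rename every (src, dst) pair via a backup directory so that swaps
    and chains such as a -> b, b -> c cannot clobber each other."""
    backup = tempfile.mkdtemp(dir=backup_root)
    staged = []
    # Phase 1: move every input out of the way under a unique name.
    for i, (src, dst) in enumerate(pairs):
        tmp = os.path.join(backup, str(i))
        os.rename(src, tmp)
        staged.append((tmp, dst))
    # Phase 2: move each staged file to its final name.
    for tmp, dst in staged:
        os.rename(tmp, dst)
    # Only remove the (now empty) backup directory once everything has
    # succeeded; if an exception is raised above, it is left behind so
    # the files can be recovered by hand.
    os.rmdir(backup)
```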
+
+ <h2 id="examples">Examples</h2>
+
+ <aside>
+ <p>
+ All of these examples are ripped straight from
+ the <code>mmv(1)</code> manual page. If you
+ installed <code>mmv</code> through a package manager or
+ via <code>make install</code>, then you should have the
+ manual installed on your system.
+ </p>
+ </aside>
+
+ <p>Swap the files <em>foo</em> and <em>bar</em>:</p>
+ <figure>
+ <pre>m4_fmt_code(examples/swap.sh.html)</pre>
+ </figure>
+
+ <p>
+ Rename all files in the current directory to use hyphens (‘-’)
+ instead of spaces:
+ </p>
+ <figure>
+ <pre>m4_fmt_code(examples/hyphens.sh.html)</pre>
+ </figure>
+
+ <p>
+ Rename a given list of movies to use lowercase letters and
+ hyphens instead of uppercase letters and spaces, and number them
+ so that they’re properly ordered in globs (e.g. rename <em>The
+ Return of the King.mp4</em> to
+ <em>02-the-return-of-the-king.mp4</em>):
+ </p>
+ <figure>
+ <pre>m4_fmt_code(examples/number.sh.html)</pre>
+ </figure>
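+
+ <p>
+ Numbering is an example of a mapping that depends on the whole
+ input list rather than on each name alone. The hypothetical Python
+ function below performs the transformation described above. It
+ only shows the logic and is not the command from the manual page:
+ </p>

```python
def number_movies(titles: list[str]) -> list[str]:
    # Lowercase, replace spaces with hyphens, and prefix a zero-padded
    # index taken from each title's position in the input list.
    return [
        f"{i:02d}-{title.lower().replace(' ', '-')}"
        for i, title in enumerate(titles, start=1)
    ]


print(number_movies(["First Movie.mp4", "The Return of the King.mp4"]))
# ['01-first-movie.mp4', '02-the-return-of-the-king.mp4']
```

+ <p>
+ Because the prefix comes from the position in the list, reordering
+ the input changes the result.
+ </p>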
+
+ <p>
+ Rename files interactively in your editor while encoding newlines
+ into the literal string ‘<code>\n</code>’, making use
+ of <code><a href="https://linux.die.net/man/1/vipe"
+ target="_blank">vipe(1)</a></code> from <em>moreutils</em>:
+ </p>
+ <figure>
+ <pre>m4_fmt_code(examples/vipe.sh.html)</pre>
+ </figure>
+
+ <p>
+ Rename all C source and header files in a git repository
+ to use snake_case instead of camelCase using
+ the <abbr class="gnu">GNU</abbr>
+ <code><a href="https://www.man7.org/linux/man-pages/man1/sed.1.html"
+ target="_blank">sed(1)</a></code> ‘<code>\L</code>’ extension:
+ </p>
+ <figure>
+ <pre>m4_fmt_code(examples/camel-to-snake.sh.html)</pre>
+ </figure>
+
+ <p>
+ Lowercase all filenames within a directory hierarchy which may
+ contain newline characters:
+ </p>
+ <figure>
+ <pre>m4_fmt_code(examples/lowercase.sh.html)</pre>
+ </figure>
+
+ <p>
+ Map filenames which may contain newlines in the current
+ directory with the command ‘<code>cmd</code>’, which itself does
+ not support NUL-byte-separated entries. This only works
+ assuming your mapping doesn’t require any context outside of the
+ given input filename (for example, you would not be able to
+ number your files, as this requires knowledge of each input file’s
+ position in the input list):
+ </p>
+ <figure>
+ <pre>m4_fmt_code(examples/i-flag.sh.html)</pre>
+ </figure>
+ </main>
+
+ <hr>
+
+ <footer>
+ m4_footer
+ </footer>
+ </body>
+</html>