author Thomas Voss <mail@thomasvoss.com> 2023-09-11 05:15:20 +0200
committer Thomas Voss <mail@thomasvoss.com> 2023-09-11 05:15:20 +0200
commit bda44e93541fa478abf3ce4b3461f026a90fa8cb (patch)
tree a62a7e1d456effe914a77b45f66485c3e8bfd92d /src/prj/mmv/index.html
parent ced3ed9ddde25614bbc9777a5d546eee2a44a2e0 (diff)
Move the site from HTML to GSP
Diffstat (limited to 'src/prj/mmv/index.html')
-rw-r--r-- src/prj/mmv/index.html 667
1 files changed, 0 insertions, 667 deletions
diff --git a/src/prj/mmv/index.html b/src/prj/mmv/index.html
deleted file mode 100644
index 09aadb1..0000000
--- a/src/prj/mmv/index.html
+++ /dev/null
@@ -1,667 +0,0 @@
-<!DOCTYPE html>
-<html lang="en">
- <head>
- m4_include(head.html)
- </head>
- <body>
- <header>
- <div>
- <h1>Moving Files the Right Way</h1>
- m4_include(nav.html)
- </div>
-
- <figure class="quote">
- <blockquote>
- <p>I think the OpenBSD crowd is a bunch of masturbating
- monkeys, in that they make such a big deal about
- concentrating on security to the point where they pretty much
- admit that nothing else matters to them.</p>
- </blockquote>
- <figcaption>
- Linus Torvalds
- </figcaption>
- </figure>
- </header>
-
- <main>
- <p>
- <em>
- You can find the <code>mmv</code> git repository over at
- <a href="https://git.sr.ht/~mango/mmv" target="_blank">sourcehut</a>
- or <a href="https://github.com/Mango0x45/mmv"
- target="_blank">GitHub</a>.
- </em>
- </p>
-
- <p>
- NOTE: As of the
- <a href="https://git.sr.ht/~mango/mmv/refs/v1.2.0">v1.2.0</a> release
- there is now also the <code>mcp</code> utility. It behaves the same as
- the <code>mmv</code> utility but it copies files instead of moving them.
- It also doesn’t support the ‘<code>-n</code>’ flag as it doesn’t need to
- deal with backups.
- </p>
-
- <h2>Table of Contents</h2>
-
- <ul>
- <li><a href="#prologue">Prologue</a></li>
- <li><a href="#moving">Advanced Moving and Pitfalls</a></li>
- <li><a href="#mapping">Name Mapping with <code>mmv</code></a></li>
- <li><a href="#newlines">Filenames with Embedded Newlines</a></li>
- <ul>
- <li><a href="#0-flag">The Simple Case</a></li>
- <li><a href="#e-flag">Encoding Newlines</a></li>
- </ul>
- <li><a href="#i-flag">Individual Execution</a></li>
- <li><a href="#safety">Safety</a></li>
- <li><a href="#examples">Examples</a></li>
- </ul>
-
- <h2 id="prologue">Prologue</h2>
- <p>
- File moving and renaming is one of the most common tasks we
- undertake on the command-line. We basically always do this with
- the <code>mv</code> utility, and it gets the job done most of the
- time. Want to rename one file? Use <code>mv</code>! Want to
- move a bunch of files into a directory? Use <code>mv</code>!
- How could mv ever go wrong? Well I’m glad you asked!
- </p>
-
- <h2 id="moving">Advanced Moving and Pitfalls</h2>
- <p>
- Let’s start off nice and simple. You just inherited a C project
- that uses the sacrilegious
- <a
- href="https://en.wikipedia.org/wiki/Camel_case"
- target="_blank"
- >camelCase</a>
- naming convention for its files:
- </p>
-
- <figure>
- <pre>m4_fmt_code(ls-files.sh.html)</pre>
- </figure>
-
- <p>
- This deeply upsets you, as it upsets me. So you decide you want
- to switch all these files to use
- <a
- href="https://en.wikipedia.org/wiki/Snake_case"
- target="_blank"
- >snake_case</a>,
- like a normal person. Well how would you do this? You use
- <code>mv</code>! This is what you might end up doing:
- </p>
-
- <figure>
- <pre>m4_fmt_code(manual-mv.sh.html)</pre>
- </figure>
-
- <p>
- Well… it works I guess, but it’s a pretty shitty way of renaming
- these files. Luckily we only had 5, but what if this was a much
- larger project with many more files to rename? Things would get
- tedious. So instead we can use a pipeline for
- this:
- </p>
-
- <figure>
- <pre>m4_fmt_code(camel-to-snake-naïve.sh.html)</pre>
- </figure>
-
- <aside>
- <p>
- The given example assumes your <code>sed</code>
- implementation supports ‘<code>\L</code>’ which is a
- non-standard <abbr class="gnu">GNU</abbr> extension.
- </p>
- </aside>
-
- <p>
- That works and it gets the job done, but it’s not really ideal is
- it? There are a couple of issues with this.
- </p>
-
- <ol>
- <li>
- <p>
- You’re writing more complicated code. This has the
- obvious drawback of potentially being more error-prone,
- but also risks taking more time to write than you’d like
- as you might have forgotten if <code>xargs</code>
- actually has an ‘<code>-L</code>’ option or not (which
- would require reading the
- <a href="https://www.man7.org/linux/man-pages/man1/xargs.1.html"
- target="_blank" ><code>xargs(1)</code></a> manual).
- </p>
- </li>
- <li>
- <p>
- If you try to rename the file <em>foo</em>
- to <em>bar</em> but <em>bar</em> already exists, you end
- up deleting a file you may not have wanted to.
- </p>
- </li>
- <li>
- <p>
- In a similar vein to the previous point, you need to be
- very careful about schemes like renaming the
- file <em>a</em> to <em>b</em> and <em>b</em>
- to <em>c</em>. You run the risk of turning <em>a</em>
- into <em>c</em> and losing the file <em>b</em> entirely.
- </p>
- </li>
- <li>
- <p>
- Moving symbolic links is its own whole can of worms. If
- a symlink points to a relative location then you need to
- make sure you keep pointing to the right place. If the
- symlink is absolute however then you can leave it
- untouched. But what if the symlink points to a file
- that you’re moving as part of your batch move operation?
- Now you need to handle that too.
- </p>
- </li>
- </ol>
-
- <h2 id="mapping">Name Mapping with <code>mmv</code></h2>
-
- <p>
- What is <code>mmv</code>? It’s the solution to all your
- problems, that’s what it is! <code>mmv</code> takes as its
- argument(s) a utility and that utility’s arguments and uses them
- to create a mapping between old and new filenames — similar to
- the <code>map()</code> function found in many programming
- languages. I think to best convey how the tool functions, I
- should provide an example. Let’s try to do the same thing we did
- previously where we tried to turn camelCase files to snake_case,
- but using <code>mmv</code>:
- </p>
-
- <figure>
- <pre>m4_fmt_code(camel-to-snake-smart.sh.html)</pre>
- </figure>
-
- <p>Let me break down how this works.</p>
-
- <p>
- <code>mmv</code> starts by reading a series of filenames
- separated by newlines from the standard input. Yes, sometimes
- filenames have newlines in them and yes there is a way to handle
- them but I shall get to that later. The filenames that
- <code>mmv</code> reads from the standard input will be referred
- to as the <em>input files</em>. Once all the input files have
- been read, the utility specified by the arguments is spawned; in
- this case that would be <code>sed</code> with the argument
- <code>'s/[A-Z]/\L_&/g'</code>. The input files are then piped
- into <code>sed</code> the exact same way that they would have
- been if we ran the above commands without <code>mmv</code>, and
- the output of <code>sed</code> then forms what will be referred
- to as the <em>output files</em>. Once a complete list of output
- files is accumulated, each input file gets renamed to its
- corresponding output file.
- </p>
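-
- <p>
- As a rough illustration of the mapping step described above, here is a
- minimal shell sketch. The <code>map_names</code> helper is hypothetical
- and is not how <code>mmv</code> is implemented; it only pairs each input
- name with the matching line of the utility’s output and renames nothing.
- </p>
-

```shell
# map_names CMD [ARGS...]: hypothetical helper, not part of mmv.
# Reads newline-separated names on stdin, runs CMD over them, and
# prints "old<TAB>new" pairs without touching any files.
map_names() {
    old=$(mktemp) || return 1
    new=$(mktemp) || return 1
    cat > "$old"                # collect every input name first
    "$@" < "$old" > "$new"      # spawn the mapping utility once
    if [ "$(wc -l < "$old")" -ne "$(wc -l < "$new")" ]; then
        echo 'input and output counts differ' >&2
        rm -f "$old" "$new"
        return 1
    fi
    paste "$old" "$new"         # line N of input maps to line N of output
    rm -f "$old" "$new"
}
```

- <p>
- Running <code>printf '%s\n' fooBar.c | map_names sed 's/[A-Z]/\L_&amp;/g'</code>
- with <abbr class="gnu">GNU</abbr> <code>sed</code> would show the rename
- that is about to happen, one tab-separated pair per line.
- </p>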
-
- <p>
- Let’s look at a simpler example. Say we want to rename 2 files
- in the current directory to use lowercase letters, we could use
- the following command:
- </p>
-
- <figure>
- <pre>m4_fmt_code(mmv-tr.sh.html)</pre>
- </figure>
-
- <p>
- In the above example <code>mmv</code> reads 2 lines from
- standard input, those being <em>LICENSE</em>
- and <em>README</em>. Those are our 2 input files now.
- The <code>tr</code> utility is then spawned and the input files
- are piped into it. We can simulate this in the shell:
- </p>
-
- <figure>
- <pre>m4_fmt_code(tr.sh.html)</pre>
- </figure>
-
- <p>
- As you can see above, <code>tr</code> has produced 2 lines of
- output; these are our 2 output files. Since we now have our 2
- input files and 2 output files, <code>mmv</code> can go ahead
- and rename the files. In this case it will rename
- <em>LICENSE</em> to <em>license</em> and
- <em>README</em> to <em>readme</em>. For more examples, see
- the <a href="#examples">examples</a> section of this page down
- below.
- </p>
-
- <h2 id="newlines">Filenames with Embedded Newlines</h2>
-
- <p>
- People are retarded, and as a result we have filenames with
- newlines in them. All it would have taken to solve this issue
- for everyone was for literally <strong>anybody</strong> during
- the early UNIX days to go “<em>hey, this is a bad idea!</em>”,
- but alas, we must deal with this. Newlines are of course not
- the only special characters filenames can contain, but they are
- the single most infuriating to deal with; the UNIX utilities all
- being line-oriented really doesn’t work well with these files.
- </p>
-
- <p>
- So how does <code>mmv</code> deal with special characters, and
- newlines in particular? Well it does so by providing the user
- with the <code>-0</code> and <code>-e</code> flags:
- </p>
-
- <dl>
- <dt><code>-0</code></dt>
- <dd>
- <p>
- Tell <code>mmv</code> to expect its input to not be
- separated by newlines (‘<code>\n</code>’), but by NUL
- bytes (‘<code>\0</code>’). NUL bytes are the only
- characters not allowed in filenames besides forward
- slashes, so they are an obvious choice for an
- alternative separator.
- </p>
- </dd>
- <dt><code>-e</code></dt>
- <dd>
- <p>
- Encode newlines in filenames before passing them to the
- provided utility. Newline characters are replaced by the
- literal string ‘<code>\n</code>’ and backslashes by the
- literal string ‘<code>\\</code>’. After processing, the
- resulting output is decoded again.
- </p>
- <p>
- If combined with the <code>-0</code> flag, then while
- input will be read assuming a NUL-byte input-separator,
- the encoded input files will be written to the spawned
- process newline-separated.
- </p>
- </dd>
- </dl>
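-
- <p>
- The encoding described for <code>-e</code> can be sketched in plain
- shell. The <code>encode</code> function below is a hypothetical helper
- written for illustration, not code taken from <code>mmv</code>.
- Backslashes are doubled first so that an encoded newline can never be
- confused with a backslash that was already in the name.
- </p>
-

```shell
# encode NAME: hypothetical helper illustrating the -e encoding.
# Backslashes become '\\' and embedded newlines become the two
# characters '\' and 'n'.
encode() {
    printf '%s\n' "$1" |
        sed 's/\\/\\\\/g' |
        sed -e ':a' -e '$!{' -e 'N' -e 'ba' -e '}' -e 's/\n/\\n/g'
}

# A name with an embedded newline is printed on a single line.
encode "$(printf 'foo\nbar')"
```

- <p>
- The call at the end prints <samp>foo\nbar</samp> on one line, which is
- safe to hand to a line-oriented tool.
- </p>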
-
- <h3 id="0-flag">The Simple Case</h3>
-
- <p>
- In order to better understand these flags and how they work
- let’s go through another example. We have 2 files — one with and
- one without an embedded newline — and our goal is to simply
- reverse these filenames. In this example I am going to be
- displaying newlines in filenames with the “<code>$'\n'</code>”
- syntax as this is how my shell displays embedded newlines.
- </p>
-
- <p>
- We can start by just trying to naïvely pass these 2 files
- to <code>mmv</code> and use <code>rev</code> to reverse the
- names, but this doesn’t work:
- </p>
-
- <figure>
- <pre>m4_fmt_code(mmv-rev.sh.html)</pre>
- </figure>
-
- <p>
- This doesn’t work because, due to the line-oriented
- nature of <code>ls</code> and <code>rev</code>, we are actually
- trying to rename the files <em>foo</em>, <em>bar</em>, and
- <em>baz</em> to the new filenames <em>zab</em>,
- <em>rab</em>, and <em>oof</em>. As can be seen in the following
- diagram, the embedded newline is causing our input to be ambiguous
- and <code>mmv</code> can’t reliably proceed
- anymore <x-ref>1</x-ref>:
- </p>
-
- <figure>
- <object data="conflict.svg" type="image/svg+xml"></object>
- </figure>
-
- <aside>
- <p data-ref="1">
- The reason you get a cryptic “file not found” error message
- is because <code>mmv</code> tries to assert that all the
- input files actually exist before doing anything. Since
- “foo” isn’t a real file, we error out.
- </p>
- </aside>
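-
- <p>
- The ambiguity can be reproduced without <code>mmv</code> at all. The
- sketch below uses two hypothetical helpers to count how many records a
- consumer sees for the same two filenames, depending on which byte it
- treats as the separator.
- </p>
-

```shell
# Count records when every newline ends a record (line-oriented tools).
count_by_newline() {
    wc -l | tr -d ' '
}

# Count records when only NUL bytes end a record.
count_by_nul() {
    tr -cd '\000' | wc -c | tr -d ' '
}

# The same two filenames, "foo<newline>bar" and "baz", listed both ways.
printf 'foo\nbar\nbaz\n' | count_by_newline     # newline-terminated list
printf 'foo\nbar\000baz\000' | count_by_nul     # NUL-terminated list
```

- <p>
- The first pipeline prints 3 and the second prints 2: only the
- NUL-terminated listing preserves the real number of files.
- </p>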
-
- <p>
- The first thing we need to do in order to proceed is to pass
- the <code>-0</code> flag to <code>mmv</code>. This will
- tell <code>mmv</code> that we want to use the NUL-byte as our
- input separator and not the newline. We also need <code>ls</code>
- to actually provide us with the filenames delimited by NUL-bytes.
- Luckily <abbr class="gnu">GNU</abbr> <code>ls</code> gives us the
- <code>--zero</code> flag to do just that:
- </p>
-
- <figure>
- <pre>m4_fmt_code(mmv-rev-zero.sh.html)</pre>
- </figure>
-
- <p>
- So we’re getting places, but we aren’t quite there yet. The
- issue we’re getting now is that <code>mmv</code> received 2
- input files from the standard input, but <code>rev</code>
- produced 3 output files. Why is that? Well let’s try our hand
- at a little bit of command-line debugging with <code>sed</code>:
- </p>
-
- <figure>
- <pre>m4_fmt_code(sed-debugging.sh.html)</pre>
- </figure>
-
- <p>
- If you aren’t quite sure what the above is doing, here’s a quick
- summary:
- </p>
-
- <ul>
- <li>
- The <code>-U</code> flag given to <code>ls</code> tells it
- not to sort our output. This is purely just to keep this
- example clear to the reader.
- </li>
- <li>
- The <code>-n</code> flag given to <code>sed</code> tells it
- not to print the input line automatically at the end of the
- provided script.
- </li>
- <li>
- The <code>l</code> command in <code>sed</code> prints the
- current input in a “visually unambiguous form”.
- </li>
- </ul>
-
- <p>
- In the <code>sed</code> output, we can see that <samp>$</samp>
- represents the end of a line, and <samp>\000</samp> represents
- the NUL-byte. All looks good here, we have two inputs separated
- by NUL-bytes. Now let’s try to throw in <code>rev</code>:
- </p>
-
- <figure>
- <pre>m4_fmt_code(sed-debugging-rev.sh.html)</pre>
- </figure>
-
- <p>
- Well wouldn’t you know it? Since <code>rev</code> <em>also</em>
- works with newline-separated input, it reversed our NUL-byte
- separators and now gives us 3 outputs. Luckily the folks over
- at <em>util-linux</em> provided us with the <code>-0</code> flag
- here too, so that we can properly handle NUL-delimited input.
- Combining all of this together we get a final working product:
- </p>
-
- <figure>
- <pre>m4_fmt_code(reverse-embedded-newline.sh.html)</pre>
- </figure>
-
- <h3 id="e-flag">Encoding Newlines</h3>
-
- <p>
- Sometimes we want to rename a bunch of files, but the command we
- want to use doesn’t support NUL-bytes as nicely as we would
- like. In these cases, you may want to consider encoding your
- newline characters into the literal string ‘<code>\n</code>’ and
- then passing your input newline-separated to your given command
- with the <code>-e</code> flag.
- </p>
-
- <p>
- For a real-world example, perhaps you want to edit some
- filenames in vim, or whatever other editor you use. Well we can
- do this incredibly easily with the <code>vipe</code> utility
- from
- the <a href="https://joeyh.name/code/moreutils/">moreutils</a>
- collection. The <code>vipe</code> command simply reads input
- from the standard input, opens it up in your editor, and then
- prints the resulting output to the standard output; perfect
- for <code>mmv</code>! We do not really want to deal with
- NUL-bytes in our text-editor though, so let’s just encode our
- newlines:
- </p>
-
- <figure>
- <pre>m4_fmt_code(vipe.sh.html)</pre>
- </figure>
-
- <aside>
- <p>
- Notice how you still need to pass the <code>-0</code> flag
- to let <code>mmv</code> know that our input files may have
- embedded newlines.
- </p>
- </aside>
-
- <p>
- When running the above code example, you will see the following
- in your editor:
- </p>
-
- <figure>
- <pre>m4_fmt_code(vim.html)</pre>
- </figure>
-
- <p>
- After you exit your editor, <code>mmv</code> will decode all
- occurrences of ‘<code>\n</code>’ back into a newline, and all
- occurrences of ‘<code>\\</code>’ back into a backslash:
- </p>
-
- <figure>
- <object data="e-flag.svg" type="image/svg+xml"></object>
- </figure>
-
- <h2 id="i-flag">Individual Execution</h2>
- <p>
- The previous examples are great and all, but what do you do if
- your mapping command doesn’t have the concept of an input
- separator at all? This is where the <code>-i</code> flag comes
- into play. With the <code>-i</code> flag we can
- get <code>mmv</code> to execute our mapping command for every
- input filename. This means that as long as we can work with a
- complete buffer, we don’t need to worry about separators.
- </p>
-
- <p>
- To be honest, I cannot really think of any situation where you
- might actually need to do this. If you can think of one,
- please <a href="mailto:mail@thomasvoss.com">email me</a> and
- I’ll update the example on this page. Regardless, let’s imagine
- that we wanted to rename some files so that their filenames are
- replaced with the
- <a href="https://en.wikipedia.org/wiki/SHA-1" target="_blank">
- SHA-1 hash</a> of their original filename.
- On Linux we have the <code>sha1sum</code> program which reads
- input from the standard input and outputs the SHA-1 hash. This
- is how we would use it with <code>mmv</code>:
- </p>
-
- <figure>
- <pre>m4_fmt_code(sha1sum-long-example.sh.html)</pre>
- </figure>
-
- <p>
- Another approach is to invoke <code>mmv</code> twice:
- </p>
-
- <figure>
- <pre>m4_fmt_code(sha1sum-short-example.sh.html)</pre>
- </figure>
-
- <p>
- If you are confused about why we need to make a call
- to <code>awk</code>, it’s because the <code>sha1sum</code>
- program outputs 2 columns of data. The first column is our hash
- and the second column is the filename where the to-be-hashed
- data was read from. We don’t want the second column.
- </p>
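-
- <p>
- The column-stripping step can be tried in isolation. The
- <code>hash_name</code> function below is a hypothetical helper, and
- hashing the name without a trailing newline is an assumption made for
- this sketch rather than a statement about what <code>mmv -i</code>
- feeds to the command.
- </p>
-

```shell
# hash_name NAME: hypothetical helper, prints the SHA-1 hash of NAME.
# Assumption: the name is hashed without a trailing newline.
hash_name() {
    # sha1sum prints "<hash>  -" for standard input; keep column one only.
    printf '%s' "$1" | sha1sum | awk '{ print $1 }'
}

hash_name abc
```

- <p>
- For the input <em>abc</em> this prints
- <samp>a9993e364706816aba3e25717850c26c9cd0d89d</samp>.
- </p>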
-
- <p>
- Unlike in previous examples where one process was spawned to map
- all our filenames, with the <code>-i</code> flag we are spawning
- a new instance for each filename. If you struggle to visualize
- this, perhaps the following diagrams help:
- </p>
-
- <figure>
- <figcaption>Invoking <code>mmv</code> without <code>-i</code></figcaption>
- <object data="without-i-flag.svg" type="image/svg+xml"></object>
- </figure>
-
- <figure>
- <figcaption>Invoking <code>mmv</code> with <code>-i</code></figcaption>
- <object data="with-i-flag.svg" type="image/svg+xml"></object>
- </figure>
-
- <h2 id="safety">Safety</h2>
- <p>
- When compared to the standard <code>for f in *; do mv $f …;
- done</code> or <code>ls | … | xargs -L2 mv</code>
- constructs, <code>mmv</code> is significantly safer to use.
- These are some of the safety features that are built into the
- tool:
- </p>
-
- <ol>
- <li>
- If the number of input and output files differs, execution
- is aborted before making any changes.
- </li>
- <li>
- If an input file is renamed to the name of another input
- file, the second input file is not lost (i.e. you can rename
- <em>a</em> to <em>b</em> and <em>b</em> to <em>a</em> with
- no problem).
- </li>
- <li>
- All input files must be unique and all output files must be
- unique. Otherwise execution is aborted before making any
- changes.
- </li>
- <li>
- In the case that something goes wrong during execution
- (perhaps you tried to move a file to a non-existent
- directory, or a syscall failed), a backup of your input
- files is saved automatically by <code>mmv</code> for
- recovery.
- </li>
- </ol>
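-
- <p>
- To see why a swap such as the one in the second point cannot be done
- with two plain <code>mv</code> calls, consider the following sketch. It
- stages both files in a temporary directory before giving them their new
- names. The <code>swap_files</code> helper is hypothetical, and staging
- through <code>mktemp -d</code> is an assumption made for illustration,
- not a description of <code>mmv</code>’s internals.
- </p>
-

```shell
# swap_files A B: hypothetical helper that exchanges the names of two
# files. Both files are parked in a staging directory first, so neither
# one is overwritten by the other.
swap_files() {
    stage=$(mktemp -d) || return 1
    mv -- "$1" "$stage/0" &&
    mv -- "$2" "$stage/1" &&
    mv -- "$stage/0" "$2" &&
    mv -- "$stage/1" "$1" &&
    rmdir -- "$stage"
}
```

- <p>
- If any step fails, the staging directory is left in place, so the files
- that were already moved can still be recovered from it.
- </p>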
-
- <p>
- Due to the way <code>mmv</code> handles #2, when things do go
- wrong you may find that all of your input files have
- disappeared. Don’t worry though, <code>mmv</code> takes a
- backup of your files before doing anything. If you
- run <code>mmv</code> with the <code>-v</code> option for verbose
- output, you’ll notice it backing up your stuff in
- the <code>$XDG_CACHE_HOME</code> directory:
- </p>
-
- <figure>
- <pre>m4_fmt_code(mmv-verbose.sh.html)</pre>
- </figure>
-
- <p>
- Upon successful execution
- the <code>$XDG_CACHE_HOME/mmv/TIMESTAMP</code> directory will be
- automatically removed, but it remains when things go wrong so
- that you can recover any missing data. The names of the
- backup-subdirectories in the <code>$XDG_CACHE_HOME/mmv</code>
- directory are timestamps of when the directories were created.
- This should make it easier for you to figure out which directory
- you need to recover if you happen to have multiple of these.
- </p>
-
- <h2 id="examples">Examples</h2>
-
- <aside>
- <p>
- All of these examples are ripped straight from
- the <code>mmv(1)</code> manual page. If you
- installed <code>mmv</code> through a package manager or
- via <code>make install</code> then you should have the
- manual installed on your system.
- </p>
- </aside>
-
- <p>Swap the files <em>foo</em> and <em>bar</em>:</p>
- <figure>
- <pre>m4_fmt_code(examples/swap.sh.html)</pre>
- </figure>
-
- <p>
- Rename all files in the current directory to use hyphens (‘-’)
- instead of spaces:
- </p>
- <figure>
- <pre>m4_fmt_code(examples/hyphens.sh.html)</pre>
- </figure>
-
- <p>
- Rename a given list of movies to use lowercase letters and
- hyphens instead of uppercase letters and spaces, and number them
- so that they’re properly ordered in globs (e.g. rename <em>The
- Return of the King.mp4</em> to
- <em>02-the-return-of-the-king.mp4</em>):
- </p>
- <figure>
- <pre>m4_fmt_code(examples/number.sh.html)</pre>
- </figure>
-
- <p>
- Rename files interactively in your editor while encoding newlines
- into the literal string ‘<code>\n</code>’, making use
- of <code><a href="https://linux.die.net/man/1/vipe"
- target="_blank">vipe(1)</a></code> from <em>moreutils</em>:
- </p>
- <figure>
- <pre>m4_fmt_code(examples/vipe.sh.html)</pre>
- </figure>
-
- <p>
- Rename all C source code and header files in a git repository
- to use snake_case instead of camelCase using
- the <abbr class="gnu">GNU</abbr>
- <code><a href="https://www.man7.org/linux/man-pages/man1/sed.1.html"
- target="_blank">sed(1)</a></code> ‘<code>\L</code>’ extension:
- </p>
- <figure>
- <pre>m4_fmt_code(examples/camel-to-snake.sh.html)</pre>
- </figure>
-
- <p>
- Lowercase all filenames within a directory hierarchy which may
- contain newline characters:
- </p>
- <figure>
- <pre>m4_fmt_code(examples/lowercase.sh.html)</pre>
- </figure>
-
- <p>
- Map filenames which may contain newlines in the current
- directory with the command ‘<code>cmd</code>’, which itself does
- not support NUL-byte-separated entries. This only works
- assuming your mapping doesn’t require any context outside of the
- given input filename (for example, you would not be able to
- number your files as this requires knowledge of the input file’s
- position in the input list):
- </p>
- <figure>
- <pre>m4_fmt_code(examples/i-flag.sh.html)</pre>
- </figure>
- </main>
-
- <hr>
-
- <footer>
- m4_footer
- </footer>
- </body>
-</html>