Diffstat (limited to 'src/prj/mmv/index.html')
-rw-r--r-- | src/prj/mmv/index.html | 667 |
1 files changed, 0 insertions, 667 deletions
diff --git a/src/prj/mmv/index.html b/src/prj/mmv/index.html deleted file mode 100644 index 09aadb1..0000000 --- a/src/prj/mmv/index.html +++ /dev/null @@ -1,667 +0,0 @@ -<!DOCTYPE html> -<html lang="en"> - <head> - m4_include(head.html) - </head> - <body> - <header> - <div> - <h1>Moving Files the Right Way</h1> - m4_include(nav.html) - </div> - - <figure class="quote"> - <blockquote> - <p>I think the OpenBSD crowd is a bunch of masturbating - monkeys, in that they make such a big deal about - concentrating on security to the point where they pretty much - admit that nothing else matters to them.</p> - </blockquote> - <figcaption> - Linux Torvalds - </figcaption> - </figure> - </header> - - <main> - <p> - <em> - You can find the <code>mmv</code> git repository over at - <a href="https://git.sr.ht/~mango/mmv" target="_blank">sourcehut</a> - or <a href="https://github.com/Mango0x45/mmv" - target="_blank">GitHub</a>. - </em> - </p> - - <p> - NOTE: As of the - <a href="https://git.sr.ht/~mango/mmv/refs/v1.2.0">v1.2.0</a> release - there is now also the <code>mcp</code> utility. It behaves the same as - the <code>mmv</code> utility but it copies files instead of moving them. - It also doesn’t support the ‘<code>-n</code>’ flag as it doesn’t need to - deal with backups. - </p> - - <h2>Table of Contents</h2> - - <ul> - <li><a href="#prologue">Prologue</a></li> - <li><a href="#moving">Advanced Moving and Pitfalls</a></li> - <li><a href="#mapping">Name Mapping with <code>mmv</code></a></li> - <li><a href="#newlines">Filenames with Embedded Newlines</a></li> - <ul> - <li><a href="#0-flag">The Simple Case</a></li> - <li><a href="#e-flag">Encoding Newlines</a></li> - </ul> - <li><a href="#i-flag">Individual Execution</a></li> - <li><a href="#safety">Safety</a></li> - <li><a href="#examples">Examples</a></li> - </ul> - - <h2 id="prologue">Prologue</h2> - <p> - File moving and renaming is one of the most common tasks we - undertake on the command-line. 
We basically always do this with - the <code>mv</code> utility, and it gets the job done most of the - time. Want to rename one file? Use <code>mv</code>! Want to - move a bunch of files into a directory? Use <code>mv</code>! - How could mv ever go wrong? Well I’m glad you asked! - </p> - - <h2 id="moving">Advanced Moving and Pitfalls</h2> - <p> - Let’s start off nice and simple. You just inherited a C project - that uses the sacrilegious - <a - href="https://en.wikipedia.org/wiki/Camel_case" - target="_blank" - >camelCase</a> - naming convention for its files: - </p> - - <figure> - <pre>m4_fmt_code(ls-files.sh.html)</pre> - </figure> - - <p> - This deeply upsets you, as it upsets me. So you decide you want - to switch all these files to use - <a - href="https://en.wikipedia.org/wiki/Snake_case" - target="_blank" - >snake_case</a>, - like a normal person. Well how would you do this? You use - <code>mv</code>! This is what you might end up doing: - </p> - - <figure> - <pre>m4_fmt_code(manual-mv.sh.html)</pre> - </figure> - - <p> - Well… it works I guess, but it’s a pretty shitty way of renaming - these files. Luckily we only had 5, but what if this was a much - larger project with many more files to rename? Things would get - tedious. So instead we can use a pipeline for - this: - </p> - - <figure> - <pre>m4_fmt_code(camel-to-snake-naïve.sh.html)</pre> - </figure> - - <aside> - <p> - The given example assumes your <code>sed</code> - implementation supports ‘<code>\L</code>’ which is a - non-standard <abbr class="gnu">GNU</abbr> extension. - </p> - </aside> - - <p> - That works and it gets the job done, but it’s not really ideal is - it? There are a couple of issues with this. - </p> - - <ol> - <li> - <p> - You’re writing more complicated code. 
This has the - obvious drawback of potentially being more error-prone, - but also risks taking more time to write than you’d like - as you might have forgotten if <code>xargs</code> - actually has an ‘<code>-L</code>’ option or not (which - would require reading the - <a href="https://www.man7.org/linux/man-pages/man1/xargs.1.html" - target="_blank" ><code>xargs(1)</code></a> manual). - </p> - </li> - <li> - <p> - If you try to rename the file <em>foo</em> - to <em>bar</em> but <em>bar</em> already exists, you end - up deleting a file you may not have wanted to. - </p> - </li> - <li> - <p> - In a similar vein to the previous point, you need to be - very careful about schemes like renaming the - file <em>a</em> to <em>b</em> and <em>b</em> - to <em>c</em>. You run the risk of turning <em>a</em> - into <em>c</em> and losing the file <em>b</em> entirely. - </p> - </li> - <li> - <p> - Moving symbolic links is its own whole can of worms. If - a symlink points to a relative location then you need to - make sure it keeps pointing to the right place. If the - symlink is absolute however then you can leave it - untouched. But what if the symlink points to a file - that you’re moving as part of your batch move operation? - Now you need to handle that too. - </p> - </li> - </ol> - - <h2 id="mapping">Name Mapping with <code>mmv</code></h2> - - <p> - What is <code>mmv</code>? It’s the solution to all your - problems, that’s what it is! <code>mmv</code> takes as its - argument(s) a utility and that utility’s arguments and uses that - to create a mapping between old and new filenames, similar to - the <code>map()</code> function found in many programming - languages. I think to best convey how the tool functions, I - should provide an example. 
Let’s try to do the same thing we did - previously where we tried to turn camelCase files to snake_case, - but using <code>mmv</code>: - </p> - - <figure> - <pre>m4_fmt_code(camel-to-snake-smart.sh.html)</pre> - </figure> - - <p>Let me break down how this works.</p> - - <p> - <code>mmv</code> starts by reading a series of filenames - separated by newlines from the standard input. Yes, sometimes - filenames have newlines in them and yes there is a way to handle - them but I shall get to that later. The filenames that - <code>mmv</code> reads from the standard input will be referred - to as the <em>input files</em>. Once all the input files have - been read, the utility specified by the arguments is spawned; in - this case that would be <code>sed</code> with the argument - <code>'s/[A-Z]/\L_&/g'</code>. The input files are then piped - into <code>sed</code> the exact same way that they would have - been if we ran the above commands without <code>mmv</code>, and - the output of <code>sed</code> then forms what will be referred - to as the <em>output files</em>. Once a complete list of output - files is accumulated, each input file gets renamed to its - corresponding output file. - </p> - - <p> - Let’s look at a simpler example. Say we want to rename 2 files - in the current directory to use lowercase letters, we could use - the following command: - </p> - - <figure> - <pre>m4_fmt_code(mmv-tr.sh.html)</pre> - </figure> - - <p> - In the above example <code>mmv</code> reads 2 lines from - standard input, those being <em>LICENSE</em> - and <em>README</em>. Those are our 2 input files now. - The <code>tr</code> utility is then spawned and the input files - are piped into it. We can simulate this in the shell: - </p> - - <figure> - <pre>m4_fmt_code(tr.sh.html)</pre> - </figure> - - <p> - As you can see above, <code>tr</code> has produced 2 lines of - output; these are our 2 output files. 
Since we now have our 2 - input files and 2 output files, <code>mmv</code> can go ahead - and rename the files. In this case it will rename - <em>LICENSE</em> to <em>license</em> and - <em>README</em> to <em>readme</em>. For some examples, check - the <a href="#examples">examples</a> section of this page down - below. - </p> - - <h2 id="newlines">Filenames with Embedded Newlines</h2> - - <p> - People are retarded, and as a result we have filenames with - newlines in them. All it would have taken to solve this issue - for everyone was for literally <strong>anybody</strong> during - the early UNIX days to go “<em>hey, this is a bad idea!</em>”, - but alas, we must deal with this. Newlines are of course not - the only special characters filenames can contain, but they are - the single most infuriating to deal with; the UNIX utilities all - being line-oriented really doesn’t work well with these files. - </p> - - <p> - So how does <code>mmv</code> deal with special characters, and - newlines in particular? Well it does so by providing the user - with the <code>-0</code> and <code>-e</code> flags: - </p> - - <dl> - <dt><code>-0</code></dt> - <dd> - <p> - Tell <code>mmv</code> to expect its input to not be - separated by newlines (‘<code>\n</code>’), but by NUL - bytes (‘<code>\0</code>’). NUL bytes are the only - characters not allowed in filenames besides forward - slashes, so they are an obvious choice for an - alternative separator. - </p> - </dd> - <dt><code>-e</code></dt> - <dd> - <p> - Encode newlines in filenames before passing them to the - provided utility. Newline characters are replaced by the - literal string ‘<code>\n</code>’ and backslashes by the - literal string ‘<code>\\</code>’. After processing, the - resulting output is decoded again. - </p> - <p> - If combined with the <code>-0</code> flag, then while - input will be read assuming a NUL-byte input-seperator, - the encoded input files will be written to the spawned - process newline-seperated. 
- </p> - </dd> - </dl> - - <h3 id="0-flag">The Simple Case</h3> - - <p> - In order to better understand these flags and how they work - let’s go through another example. We have 2 files (one with and - one without an embedded newline) and our goal is to simply - reverse these filenames. In this example I am going to be - displaying newlines in filenames with the “<code>$'\n'</code>” - syntax as this is how my shell displays embedded newlines. - </p> - - <p> - We can start by just trying to naïvely pass these 2 files - to <code>mmv</code> and use <code>rev</code> to reverse the - names, but this doesn’t work: - </p> - - <figure> - <pre>m4_fmt_code(mmv-rev.sh.html)</pre> - </figure> - - <p> - This doesn’t work because, due to the line-oriented - nature of <code>ls</code> and <code>rev</code>, we are actually - trying to rename the files <em>foo</em>, <em>bar</em>, and - <em>baz</em> to the new filenames <em>oof</em>, - <em>rab</em>, and <em>zab</em>. As can be seen in the following - diagram, the embedded newline is causing our input to be ambiguous - and <code>mmv</code> can’t reliably proceed - anymore <x-ref>1</x-ref>: - </p> - - <figure> - <object data="conflict.svg" type="image/svg+xml"></object> - </figure> - - <aside> - <p data-ref="1"> - The reason you get a cryptic “file not found” error message - is because <code>mmv</code> tries to assert that all the - input files actually exist before doing anything. Since - “foo” isn’t a real file, we error out. - </p> - </aside> - - <p> - The first thing we need to do in order to proceed is to pass - the <code>-0</code> flag to <code>mmv</code>. This will - tell <code>mmv</code> that we want to use the NUL-byte as our - input separator and not the newline. We also need <code>ls</code> - to actually provide us with the filenames delimited by NUL-bytes. 
- Luckily <abbr class="gnu">GNU</abbr> <code>ls</code> gives us the - <code>--zero</code> flag to do just that: - </p> - - <figure> - <pre>m4_fmt_code(mmv-rev-zero.sh.html)</pre> - </figure> - - <p> - So we’re getting places, but we aren’t quite there yet. The - issue we’re getting now is that <code>mmv</code> received 2 - input files from the standard input, but <code>rev</code> - produced 3 output files. Why is that? Well let’s try our hand - at a little bit of command-line debugging with <code>sed</code>: - </p> - - <figure> - <pre>m4_fmt_code(sed-debugging.sh.html)</pre> - </figure> - - <p> - If you aren’t quite sure what the above is doing, here’s a quick - summary: - </p> - - <ul> - <li> - The <code>-U</code> flag given to <code>ls</code> tells it - not to sort our output. This is purely just to keep this - example clear to the reader. - </li> - <li> - The <code>-n</code> flag given to <code>sed</code> tells it - not to print the input line automatically at the end of the - provided script. - </li> - <li> - The <code>l</code> command in <code>sed</code> prints the - current input in a “visually unambiguous form”. - </li> - </ul> - - <p> - In the <code>sed</code> output, we can see that <samp>$</samp> - represents the end of a line, and <samp>\000</samp> represents - the NUL-byte. All looks good here, we have two inputs separated - by NUL-bytes. Now let’s try to throw in <code>rev</code>: - </p> - - <figure> - <pre>m4_fmt_code(sed-debugging-rev.sh.html)</pre> - </figure> - - <p> - Well wouldn’t you know it? Since <code>rev</code> <em>also</em> - works with newline-separated input, it reversed our NUL-byte - separators and now gives us 3 outputs. Luckily the folks over - at <em>util-linux</em> provided us with the <code>-0</code> flag - here too, so that we can properly handle NUL-delimited input. 
- Combining all of this together we get a final working product: - </p> - - <figure> - <pre>m4_fmt_code(reverse-embedded-newline.sh.html)</pre> - </figure> - - <h3 id="e-flag">Encoding Newlines</h3> - - <p> - Sometimes we want to rename a bunch of files, but the command we - want to use doesn’t support NUL-bytes as nicely as we would - like. In these cases, you may want to consider encoding your - newline characters into the literal string ‘<code>\n</code>’ and - then passing your input newline-separated to your given command - with the <code>-e</code> flag. - </p> - - <p> - For a real-world example, perhaps you want to edit some - filenames in vim, or whatever other editor you use. Well we can - do this incredibly easily with the <code>vipe</code> utility - from - the <a href="https://joeyh.name/code/moreutils/">moreutils</a> - collection. The <code>vipe</code> command simply reads input - from the standard input, opens it up in your editor, and then - prints the resulting output to the standard output; perfect - for <code>mmv</code>! We do not really want to deal with - NUL-bytes in our text-editor though, so let’s just encode our - newlines: - </p> - - <figure> - <pre>m4_fmt_code(vipe.sh.html)</pre> - </figure> - - <aside> - <p> - Notice how you still need to pass the <code>-0</code> flag - to let <code>mmv</code> know that our input files may have - embedded newlines. 
- </p> - </aside> - - <p> - When running the above code example, you will see the following - in your editor: - </p> - - <figure> - <pre>m4_fmt_code(vim.html)</pre> - </figure> - - <p> - After you exit your editor, <code>mmv</code> will decode all - occurrences of ‘<code>\n</code>’ back into a newline, and all - occurrences of ‘<code>\\</code>’ back into a backslash: - </p> - - <figure> - <object data="e-flag.svg" type="image/svg+xml"></object> - </figure> - - <h2 id="i-flag">Individual Execution</h2> - <p> - The previous examples are great and all, but what do you do if - your mapping command doesn’t have the concept of an input - separator at all? This is where the <code>-i</code> flag comes - into play. With the <code>-i</code> flag we can - get <code>mmv</code> to execute our mapping command for every - input filename. This means that as long as we can work with a - complete buffer, we don’t need to worry about separators. - </p> - - <p> - To be honest, I cannot really think of any situation where you - might actually need to do this. If you can think of one, - please <a href="mailto:mail@thomasvoss.com">email me</a> and - I’ll update the example on this page. Regardless, let’s imagine - that we wanted to rename some files so that their filenames are - replaced with their filename - <a href="https://en.wikipedia.org/wiki/SHA-1" target="_blank"> - SHA-1 hash</a>. - On Linux we have the <code>sha1sum</code> program which reads - input from the standard input and outputs the SHA-1 hash. This - is how we would use it with <code>mmv</code>: - </p> - - <figure> - <pre>m4_fmt_code(sha1sum-long-example.sh.html)</pre> - </figure> - - <p> - Another approach is to invoke <code>mmv</code> twice: - </p> - - <figure> - <pre>m4_fmt_code(sha1sum-short-example.sh.html)</pre> - </figure> - - <p> - If you are confused about why we need to make a call - to <code>awk</code>, it’s because the <code>sha1sum</code> - program outputs 2 columns of data. 
The first column is our hash - and the second column is the filename where the to-be-hashed - data was read from. We don’t want the second column. - </p> - - <p> - Unlike in previous examples where one process was spawned to map - all our filenames, with the <code>-i</code> flag we are spawning - a new instance for each filename. If you struggle to visualize - this, perhaps the following diagrams help: - </p> - - <figure> - <figcaption>Invoking <code>mmv</code> without <code>-i</code></figcaption> - <object data="without-i-flag.svg" type="image/svg+xml"></object> - </figure> - - <figure> - <figcaption>Invoking <code>mmv</code> with <code>-i</code></figcaption> - <object data="with-i-flag.svg" type="image/svg+xml"></object> - </figure> - - <h2 id="safety">Safety</h2> - <p> - When compared to the standard <code>for f in *; do mv $f …; - done</code> or <code>ls | … | xargs -L2 mv</code> - constructs, <code>mmv</code> is significantly safer to use. - These are some of the safety features that are built into the - tool: - </p> - - <ol> - <li> - If the number of input- and output files differs, execution - is aborted before making any changes. - </li> - <li> - If an input file is renamed to the name of another input - file, the second input file is not lost (i.e. you can rename - <em>a</em> to <em>b</em> and <em>b</em> to <em>a</em> with - no problem). - </li> - <li> - All input files must be unique and all output files must be - unique. Otherwise execution is aborted before making any - changes. - </li> - <li> - In the case that something goes wrong during execution - (perhaps you tried to move a file to a non-existent - directory, or a syscall failed), a backup of your input - files is saved automatically by <code>mmv</code> for - recovery. - </li> - </ol> - - <p> - Due to the way <code>mmv</code> handles #2, when things do go - wrong you may find that all of your input files have - disappeared. 
Don’t worry though, <code>mmv</code> takes a - backup of your code before doing anything. If you - run <code>mmv</code> with the <code>-v</code> option for verbose - output, you’ll notice it backing up your stuff in - the <code>$XDG_CACHE_DIR</code> directory: - </p> - - <figure> - <pre>m4_fmt_code(mmv-verbose.sh.html)</pre> - </figure> - - <p> - Upon successful execution - the <code>$XDG_CACHE_DIR/mmv/TIMESTAMP</code> directory will be - automatically removed, but it remains when things go wrong so - that you can recover any missing data. The names of the - backup-subdirectories in the <code>$XDG_CACHE_DIR/mmv</code> - directory are timestamps of when the directories were created. - This should make it easier for you to figure out which directory - you need to recover if you happen to have multiple of these. - </p> - - <h2 id="examples">Examples</h2> - - <aside> - <p> - All of these examples are ripped straight from - the <code>mmv(1)</code> manual page. If you - installed <code>mmv</code> through a package manager or - via <code>make install</code> then you should have the - manual installed on your system. - </p> - </aside> - - <p>Swap the files <em>foo</em> and <em>bar</em>:</p> - <figure> - <pre>m4_fmt_code(examples/swap.sh.html)</pre> - </figure> - - <p> - Rename all files in the current directory to use hyphens (‘-’) - instead of spaces: - </p> - <figure> - <pre>m4_fmt_code(examples/hyphens.sh.html)</pre> - </figure> - - <p> - Rename a given list of movies to use lowercase letters and - hyphens instead of uppercase letters and spaces, and number them - so that they’re properly ordered in globs (e.g. 
rename <em>The - Return of the King.mp4</em> to - <em>02-the-return-of-the-king.mp4</em>): - </p> - <figure> - <pre>m4_fmt_code(examples/number.sh.html)</pre> - </figure> - - <p> - Rename files interactively in your editor while encoding newlines - into the literal string ‘<code>\n</code>’, making use - of <code><a href="https://linux.die.net/man/1/vipe" - target="_blank">vipe(1)</a></code> from <em>moreutils</em>: - </p> - <figure> - <pre>m4_fmt_code(examples/vipe.sh.html)</pre> - </figure> - - <p> - Rename all C source code- and header files in a git repository - to use snake_case instead of camelCase using - the <abbr class="gnu">GNU</abbr> - <code><a href="https://www.man7.org/linux/man-pages/man1/sed.1.html" - target="_blank">sed(1)</a></code> ‘<code>\L</code>’ extension: - </p> - <figure> - <pre>m4_fmt_code(examples/camel-to-snake.sh.html)</pre> - </figure> - - <p> - Lowercase all filenames within a directory hierarchy which may - contain newline characters: - </p> - <figure> - <pre>m4_fmt_code(examples/lowercase.sh.html)</pre> - </figure> - - <p> - Map filenames which may contain newlines in the current - directory with the command ‘<code>cmd</code>’, which itself does - not support NUL-byte separated entries. This only works - assuming your mapping doesn’t require any context outside of the - given input filename (for example, you would not be able to - number your files as this requires knowledge of the input file’s - position in the input list): - </p> - <figure> - <pre>m4_fmt_code(examples/i-flag.sh.html)</pre> - </figure> - </main> - - <hr> - - <footer> - m4_footer - </footer> - </body> -</html>
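The Safety section of the removed page says that a batch rename may swap two files (<em>a</em> to <em>b</em> and <em>b</em> to <em>a</em>) without losing either, and that duplicate output names abort the run before anything changes. The page does not show how mmv achieves this, so the following is only a minimal sketch of one way to get the same behaviour: stage every input under a temporary name, then move each staged file to its final name. The function name `batch_rename`, its parameters, and the refusal to overwrite unrelated existing files are assumptions made for illustration, not mmv's actual implementation.

```python
import os
import uuid


def batch_rename(mapping, root="."):
    """Rename files in `root` according to `mapping` (old name -> new name).

    Hypothetical helper for illustration only; not taken from mmv.
    """
    targets = list(mapping.values())

    # Validate everything up front so a bad mapping changes nothing.
    if len(set(targets)) != len(targets):
        raise ValueError("output names must be unique")
    for src in mapping:
        if not os.path.lexists(os.path.join(root, src)):
            raise FileNotFoundError(src)
    # Extra precaution of this sketch (an assumption, not a documented
    # mmv rule): never clobber a file that is not part of the batch.
    for dst in targets:
        if dst not in mapping and os.path.lexists(os.path.join(root, dst)):
            raise FileExistsError(dst)

    # Phase 1: move every input to a unique temporary name, so no
    # final name is occupied by another input during phase 2.
    staged = {}
    for src, dst in mapping.items():
        tmp = f".batch-rename-{uuid.uuid4().hex}"
        os.rename(os.path.join(root, src), os.path.join(root, tmp))
        staged[tmp] = dst

    # Phase 2: move each staged file to its final name.
    for tmp, dst in staged.items():
        os.rename(os.path.join(root, tmp), os.path.join(root, dst))
```

Calling `batch_rename({"a": "b", "b": "a"}, some_dir)` swaps the two files. A plain loop of `mv` calls would overwrite <em>b</em> on the first step. Unlike the tool described above, this sketch keeps no backup, so a failure in the middle of phase 2 leaves some files under their temporary names.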