From d5635e946e9df6f519ec8cf08cebfc35dbe6c788 Mon Sep 17 00:00:00 2001 From: Thomas Voss Date: Tue, 15 Aug 2023 14:57:32 +0200 Subject: Add a post on ‘mmv’ MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- src/prj/mmv/index.html | 658 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 658 insertions(+) create mode 100644 src/prj/mmv/index.html (limited to 'src/prj/mmv/index.html') diff --git a/src/prj/mmv/index.html b/src/prj/mmv/index.html new file mode 100644 index 0000000..d13f7c8 --- /dev/null +++ b/src/prj/mmv/index.html @@ -0,0 +1,658 @@ + + + + m4_include(head.html) + + +
+
+

Moving Files the Right Way

+ m4_include(nav.html) +
+ +
+
+

I think the OpenBSD crowd is a bunch of masturbating + monkeys, in that they make such a big deal about + concentrating on security to the point where they pretty much + admit that nothing else matters to them.

+
+
+ Linus Torvalds +
+
+
+ +
+

+ + You can find the mmv git repository over at + sourcehut + or GitHub. + +

+ +

Table of Contents

+ + + +

Prologue

+

+ File moving and renaming is one of the most common tasks we + undertake on the command-line. We basically always do this with + the mv utility, and it gets the job done most of the + time. Want to rename one file? Use mv! Want to + move a bunch of files into a directory? Use mv! + How could mv ever go wrong? Well I’m glad you asked! +

+ +

Advanced Moving and Pitfalls

+

+ Let’s start off nice and simple. You just inherited a C project + that uses the sacrilegious + camelCase + naming convention for its files: +

+ +
+
m4_fmt_code(ls-files.sh.html)
+
+ +

+ This deeply upsets you, as it upsets me. So you decide you want + to switch all these files to use + snake_case, + like a normal person. Well how would you do this? You use + mv! This is what you might end up doing: +

+ +
+
m4_fmt_code(manual-mv.sh.html)
+
+ +

+ Well… it works I guess, but it’s a pretty shitty way of renaming + these files. Luckily we only had 5, but what if this was a much + larger project with many more files to rename? Things would get + tedious. So instead we can use a pipeline for + this: +

+ +
+
m4_fmt_code(camel-to-snake-naïve.sh.html)
+
+ + + +

+ That works and it gets the job done, but it’s not really ideal is + it? There are a couple of issues with this. +

+ +
+ 1. You’re writing more complicated code. This has the obvious drawback of potentially being more error-prone, but also risks taking more time to write than you’d like as you might have forgotten if xargs actually has an ‘-L’ option or not (which would require reading the xargs(1) manual).
+ 2. If you try to rename the file foo to bar but bar already exists, you end up deleting a file you may not have wanted to.
+ 3. In a similar vein to the previous point, you need to be very careful about schemes like renaming the file a to b and b to c. You run the risk of turning a into c and losing the file b entirely.
+ 4. Moving symbolic links is its own whole can of worms. If a symlink points to a relative location then you need to make sure you keep pointing to the right place. If the symlink is absolute however then you can leave it untouched. But what if the symlink points to a file that you’re moving as part of your batch move operation? Now you need to handle that too.
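The overwrite hazard described in the list above is easy to reproduce in a scratch directory. The following is only a minimal sketch: `naive_chain` is a hypothetical helper name, and the files a, b, and c are illustrative.

```shell
#!/bin/sh
# Hypothetical demo: rename a -> b, then b -> c, in the naive order.
naive_chain() {
    dir=$(mktemp -d) || return 1
    (
        cd "$dir" || exit 1
        printf 'A\n' >a
        printf 'B\n' >b
        mv a b    # silently replaces the original b
        mv b c    # moves what used to be a
        ls        # only c is left
        cat c     # and it holds the contents of a
    )
    rm -rf "$dir"
}

naive_chain
```

The original contents of b are gone after the first mv, and nothing warned us about it.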

Name Mapping with mmv

+ +

+ What is mmv? It’s the solution to all your problems, that’s what it is! mmv takes as its argument(s) a utility and that utility’s arguments, and uses them to create a mapping between old and new filenames, similar to the map() function found in many programming languages. I think to best convey how the tool functions, I should provide an example. Let’s try to do the same thing we did previously where we tried to turn camelCase files to snake_case, but using mmv:

+ +
+
m4_fmt_code(camel-to-snake-smart.sh.html)
+
+ +

Let me break down how this works.

+ +

+ mmv starts by reading a series of filenames + separated by newlines from the standard input. Yes, sometimes + filenames have newlines in them and yes there is a way to handle + them but I shall get to that later. The filenames that + mmv reads from the standard input will be referred + to as the input files. Once all the input files have + been read, the utility specified by the arguments is spawned; in + this case that would be sed with the argument + 's/[A-Z]/\L_&/g'. The input files are then piped + into sed the exact same way that they would have + been if we ran the above commands without mmv, and + the output of sed then forms what will be referred + to as the output files. Once a complete list of output + files is accumulated, each input file gets renamed to its + corresponding output file. +

+ +

+ Let’s look at a simpler example. Say we want to rename 2 files in the current directory to use lowercase letters. We could use the following command:

+ +
+
m4_fmt_code(mmv-tr.sh.html)
+
+ +

+ In the above example mmv reads 2 lines from + standard input, those being LICENSE + and README. Those are our 2 input files now. + The tr utility is then spawned and the input files + are piped into it. We can simulate this in the shell: +

+ +
+
m4_fmt_code(tr.sh.html)
+
+ +

+ As you can see above, tr has produced 2 lines of output; these are our 2 output files. Since we now have our 2 input files and 2 output files, mmv can go ahead and rename the files. In this case it will rename LICENSE to license and README to readme. For some more examples, check the examples section of this page down below.
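To see which mapping a given command would produce without renaming anything, the pairing of input and output files can be imitated with standard tools. This is a rough sketch and not part of mmv: `preview_mapping` is a hypothetical helper that assumes newline-separated names with no embedded newlines.

```shell
#!/bin/sh
# Hypothetical helper: print "old<TAB>new" pairs without renaming anything.
# Usage: ... | preview_mapping COMMAND [ARGS...]
preview_mapping() {
    src=$(mktemp) || return 1
    dst=$(mktemp) || { rm -f "$src"; return 1; }
    cat >"$src"              # the input files, one per line
    "$@" <"$src" >"$dst"     # the output files, as produced by the utility
    paste "$src" "$dst"      # line N of the input next to line N of the output
    rm -f "$src" "$dst"
}

printf '%s\n' LICENSE README | preview_mapping tr A-Z a-z
```

Each line of the result shows an input file next to the output file it would be paired with.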

+ +

Filenames with Embedded Newlines

+ +

+ People are retarded, and as a result we have filenames with + newlines in them. All it would have taken to solve this issue + for everyone was for literally anybody during + the early UNIX days to go “hey, this is a bad idea!”, + but alas, we must deal with this. Newlines are of course not + the only special characters filenames can contain, but they are + the single most infuriating to deal with; the UNIX utilities all + being line-oriented really doesn’t work well with these files. +

+ +

+ So how does mmv deal with special characters, and + newlines in particular? Well it does so by providing the user + with the -0 and -e flags: +

+ +
+
-0
+
+

+ Tell mmv to expect its input to not be + separated by newlines (‘\n’), but by NUL + bytes (‘\0’). NUL bytes are the only + characters not allowed in filenames besides forward + slashes, so they are an obvious choice for an + alternative separator. +

+
+
-e
+
+

+ Encode newlines in filenames before passing them to the + provided utility. Newline characters are replaced by the + literal string ‘\n’ and backslashes by the + literal string ‘\\’. After processing, the + resulting output is decoded again. +

+

+ If combined with the -0 flag, then while input will be read assuming a NUL-byte input separator, the encoded input files will be written to the spawned process newline-separated.

+
+
+ +
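The encoding described for the -e flag can be approximated in the shell. The sketch below is not mmv’s own code: `encode_names` is a hypothetical helper, and it assumes GNU sed for the -z option, which treats NUL bytes as record separators.

```shell
#!/bin/sh
# Hypothetical helper: NUL-separated names in, one encoded name per line out.
# Assumes GNU sed for -z (NUL-separated records).
encode_names() {
    # Escape backslashes first, so that the backslashes introduced for
    # newlines are not escaped a second time.
    sed -z 's/\\/\\\\/g; s/\n/\\n/g' | tr '\0' '\n'
}

# Two names: one with an embedded newline, one without.
printf 'foo\nbar\0baz\0' | encode_names
```

The first name comes out as the single line foo\nbar, so a line-oriented tool can process it safely.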

The Simple Case

+ +

+ In order to better understand these flags and how they work, let’s go through another example. We have 2 files (one with and one without an embedded newline) and our goal is to simply reverse these filenames. In this example I am going to be displaying newlines in filenames with the “$'\n'” syntax as this is how my shell displays embedded newlines.

+ +

+ We can start by just trying to naïvely pass these 2 files + to mmv and use rev to reverse the + names, but this doesn’t work: +

+ +
+
m4_fmt_code(mmv-rev.sh.html)
+
+ +

+ This doesn’t work because, due to the line-oriented nature of ls and rev, we are actually trying to rename the files foo, bar, and baz to the new filenames zab, rab, and oof. As can be seen in the following diagram, the embedded newline is causing our input to be ambiguous and mmv can’t reliably proceed anymore:

+ +
+ +
+ + + +
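The ambiguity can also be shown by counting entries both ways. This is a small sketch with made-up names; `names`, `count_by_newline`, and `count_by_nul` are hypothetical helpers.

```shell
#!/bin/sh
# Two names, the first with an embedded newline, each terminated by NUL.
names() { printf 'foo\nbar\0baz\0'; }

# Number of entries a line-oriented reader would see.
count_by_newline() { tr '\0' '\n' | wc -l; }

# Number of entries a NUL-oriented reader would see.
count_by_nul() { tr -cd '\0' | wc -c; }

names | count_by_newline
names | count_by_nul
```

A line-oriented reader counts three entries where only two files exist, which is why the newline cannot be trusted as a separator here.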

+ The first thing we need to do in order to proceed is to pass + the -0 flag to mmv. This will + tell mmv that we want to use the NUL-byte as our + input separator and not the newline. We also need ls + to actually provide us with the filenames delimited by NUL-bytes. + Luckily GNU ls gives us the + --zero flag to do just that: +

+ +
+
m4_fmt_code(mmv-rev-zero.sh.html)
+
+ +

+ So we’re getting places, but we aren’t quite there yet. The issue we’re getting now is that mmv received 2 input files from the standard input, but rev produced 3 output files. Why is that? Well let’s try our hand at a little bit of command-line debugging with sed:

+ +
+
m4_fmt_code(sed-debugging.sh.html)
+
+ +

+ If you aren’t quite sure what the above is doing, here’s a quick + summary: +

+ + + +

+ In the sed output, we can see that $ represents the end of a line, and \000 represents the NUL-byte. All looks good here, we have two inputs separated by NUL-bytes. Now let’s try to throw in rev:

+ +
+
m4_fmt_code(sed-debugging-rev.sh.html)
+
+ +

+ Well wouldn’t you know it? Since rev also works with newline-separated input, it reversed our NUL-byte separators and now gives us 3 outputs. Luckily the folks over at util-linux provided us with the -0 flag here too, so that we can properly handle NUL-delimited input. Combining all of this together we get a final working product:

+ +
+
m4_fmt_code(reverse-embedded-newline.sh.html)
+
+ +

Encoding Newlines

+ +

+ Sometimes we want to rename a bunch of files, but the command we want to use doesn’t support NUL-bytes as nicely as we would like. In these cases, you may want to consider encoding your newline characters into the literal string ‘\n’ and then passing your input newline-separated to your given command with the -e flag.

+ +

+ For a real-world example, perhaps you want to edit some + filenames in vim, or whatever other editor you use. Well we can + do this incredibly easily with the vipe utility + from + the moreutils + collection. The vipe command simply reads input + from the standard input, opens it up in your editor, and then + prints the resulting output to the standard output; perfect + for mmv! We do not really want to deal with + NUL-bytes in our text-editor though, so let’s just encode our + newlines: +

+ +
+
m4_fmt_code(vipe.sh.html)
+
+ + + +

+ When running the above code example, you will see the following + in your editor: +

+ +
+
m4_fmt_code(vim.html)
+
+ +

+ After you exit your editor, mmv will decode all occurrences of ‘\n’ back into a newline, and all occurrences of ‘\\’ back into a backslash:

+ +
+ +
+ +
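The decoding step has one subtlety: the name must be scanned from left to right, so that an escaped backslash followed by an n is not mistaken for an encoded newline. The following sketch illustrates this. `decode_name` is a hypothetical helper, and passing unknown escapes through unchanged is an assumption rather than documented mmv behavior.

```shell
#!/bin/sh
# Hypothetical helper: decode one encoded name given as the first argument.
decode_name() {
    printf '%s\n' "$1" | awk '{
        out = ""
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c == "\\" && i < length($0)) {
                n = substr($0, ++i, 1)              # character after the backslash
                if (n == "n")       out = out "\n"  # \n  -> newline
                else if (n == "\\") out = out "\\"  # \\  -> backslash
                else                out = out c n   # assumption: keep unknown escapes
            } else {
                out = out c
            }
        }
        print out
    }'
}

decode_name 'foo\nbar'     # prints foo and bar on separate lines
decode_name 'foo\\nbar'    # prints foo\nbar on a single line
```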

Individual Execution

+

+ The previous examples are great and all, but what do you do if your mapping command doesn’t have the concept of an input separator at all? This is where the -i flag comes into play. With the -i flag we can get mmv to execute our mapping command for every input filename. This means that as long as we can work with a complete buffer, we don’t need to worry about separators.

+ +

+ To be honest, I cannot really think of any situation where you might actually need to do this. If you can think of one, please email me and I’ll update the example on this page. Regardless, let’s imagine that we wanted to rename some files so that each filename is replaced with its SHA-1 hash. On Linux we have the sha1sum program which reads input from the standard input and outputs the SHA-1 hash. This is how we would use it with mmv:

+ +
+
m4_fmt_code(sha1sum-long-example.sh.html)
+
+ +

+ Another approach is to invoke mmv twice: +

+ +
+
m4_fmt_code(sha1sum-short-example.sh.html)
+
+ +

+ If you are confused about why we need to make a call + to awk, it’s because the sha1sum + program outputs 2 columns of data. The first column is our hash + and the second column is the filename where the to-be-hashed + data was read from. We don’t want the second column. +

+ +

+ Unlike in previous examples where one process was spawned to map + all our filenames, with the -i flag we are spawning + a new instance for each filename. If you struggle to visualize + this, perhaps the following diagrams help: +

+ +
+
Invoking mmv without -i
+ +
+ +
+
Invoking mmv with -i
+ +
+ +
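The difference between the two modes can also be imitated in plain shell. In the sketch below, `map_individually` and `hash_name` are hypothetical helpers. The sketch assumes each name reaches the command without a trailing newline, which may not match what mmv does.

```shell
#!/bin/sh
# Hypothetical helper: run the mapping command once per input name,
# instead of once for the whole list.
map_individually() {
    while IFS= read -r name; do
        printf '%s' "$name" | "$@"    # a fresh process for every name
    done
}

# sha1sum prints "HASH  -"; keep only the first column.
hash_name() { sha1sum | awk '{ print $1 }'; }

printf '%s\n' foo bar | map_individually hash_name
```

Each name gets its own sha1sum process, so the command never needs to know where one name ends and the next begins.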

Safety

+

+ When compared to the standard for f in *; do mv $f …; done or ls | … | xargs -L2 mv constructs, mmv is significantly safer to use. These are some of the safety features that are built into the tool:

+ +
+ 1. If the number of input and output files differs, execution is aborted before making any changes.
+ 2. If an input file is renamed to the name of another input file, the second input file is not lost (i.e. you can rename a to b and b to a with no problem).
+ 3. All input files must be unique and all output files must be unique. Otherwise execution is aborted before making any changes.
+ 4. In the case that something goes wrong during execution (perhaps you tried to move a file to a nonexistent directory, or a syscall failed), a backup of your input files is saved automatically by mmv for recovery.
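The count and uniqueness checks from the list above can be sketched as shell pre-flight checks. This only illustrates the idea and is not how mmv implements it. `check_mapping` is a hypothetical helper that takes two files holding one name per line.

```shell
#!/bin/sh
# Hypothetical helper: check_mapping INPUT_LIST OUTPUT_LIST
# Returns non-zero (and touches nothing) if the mapping looks unsafe.
check_mapping() {
    # The number of input and output names must match.
    [ "$(wc -l <"$1")" -eq "$(wc -l <"$2")" ] ||
        { echo 'input and output counts differ' >&2; return 1; }
    # Every input name must be unique.
    [ -z "$(sort "$1" | uniq -d)" ] ||
        { echo 'duplicate input file' >&2; return 1; }
    # Every output name must be unique.
    [ -z "$(sort "$2" | uniq -d)" ] ||
        { echo 'duplicate output file' >&2; return 1; }
}
```

A caller would run the renames only if `check_mapping` succeeds, so a bad mapping is rejected before any file is moved.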

+ Due to the way mmv handles #2, when things do go wrong you may find that all of your input files have disappeared. Don’t worry though, mmv takes a backup of your input files before doing anything. If you run mmv with the -v option for verbose output, you’ll notice it backing up your stuff in the $XDG_CACHE_DIR directory:

+ +
+
m4_fmt_code(mmv-verbose.sh.html)
+
+ +

+ Upon successful execution + the $XDG_CACHE_DIR/mmv/TIMESTAMP directory will be + automatically removed, but it remains when things go wrong so + that you can recover any missing data. The names of the + backup-subdirectories in the $XDG_CACHE_DIR/mmv + directory are timestamps of when the directories were created. + This should make it easier for you to figure out which directory + you need to recover if you happen to have multiple of these. +
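One general way to make a swap such as a to b and b to a safe is to move everything aside first and only then move each file to its final name. The sketch below shows that two-phase idea. It is an assumption for illustration, not a description of mmv’s internals, and `swap_via_staging` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: swap_via_staging DIR A B
# Swaps DIR/A and DIR/B through a temporary staging directory.
swap_via_staging() {
    stage=$(mktemp -d "$1/.stage.XXXXXX") || return 1
    # Phase 1: move both files out of the way.
    mv "$1/$2" "$stage/$2" &&
    mv "$1/$3" "$stage/$3" &&
    # Phase 2: move each file to its new name.
    mv "$stage/$2" "$1/$3" &&
    mv "$stage/$3" "$1/$2" &&
    rmdir "$stage"
}
```

If a step fails partway through, the files are missing from their original location but still sit in the staging directory, which is the same kind of situation the backup described above is meant to cover.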

+ +

Examples

+ + + +

Swap the files foo and bar:

+
+
m4_fmt_code(examples/swap.sh.html)
+
+ +

+ Rename all files in the current directory to use hyphens (‘-’) + instead of spaces: +

+
+
m4_fmt_code(examples/hyphens.sh.html)
+
+ +

+ Rename a given list of movies to use lowercase letters and + hyphens instead of uppercase letters and spaces, and number them + so that they’re properly ordered in globs (e.g. rename The + Return of the King.mp4 to + 02-the-return-of-the-king.mp4): +

+
+
m4_fmt_code(examples/number.sh.html)
+
+ +

+ Rename files interactively in your editor while encoding newlines as the literal string ‘\n’, making use of vipe(1) from moreutils:

+
+
m4_fmt_code(examples/vipe.sh.html)
+
+ +

+ Rename all C source code and header files in a git repository to use snake_case instead of camelCase using the GNU sed(1) ‘\L’ extension:

+
+
m4_fmt_code(examples/camel-to-snake.sh.html)
+
+ +

+ Lowercase all filenames within a directory hierarchy which may + contain newline characters: +

+
+
m4_fmt_code(examples/lowercase.sh.html)
+
+ +

+ Map filenames which may contain newlines in the current directory with the command ‘cmd’, which itself does not support NUL-byte-separated entries. This only works assuming your mapping doesn’t require any context outside of the given input filename (for example, you would not be able to number your files as this requires knowledge of the input file’s position in the input list):

+
+
m4_fmt_code(examples/i-flag.sh.html)
+
+
+ +
+ + + + -- cgit v1.2.3