You can find the mmv git repository over at sourcehut or GitHub.
Table of Contents
- Prologue
- Advanced Moving and Pitfalls
- Name Mapping with mmv
- Filenames with Embedded Newlines
- Individual Execution
- Safety
- Examples
Prologue
File moving and renaming is one of the most common tasks we undertake on the command-line. We basically always do this with the mv utility, and it gets the job done most of the time. Want to rename one file? Use mv! Want to move a bunch of files into a directory? Use mv!
How could mv ever go wrong? Well I’m glad you asked!
Advanced Moving and Pitfalls
Let’s start off nice and simple. You just inherited a C project that uses the sacrilegious camelCase naming convention for its files:
This deeply upsets you, as it upsets me. So you decide you want to switch all these files to use snake_case, like a normal person. Well how would you do this? You use mv! This is what you might end up doing:
Well… it works I guess, but it’s a pretty shitty way of renaming these files. Luckily we only had 5, but what if this was a much larger project with many more files to rename? Things would get tedious. So instead we can use a pipeline for this:
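A minimal sketch of such a pipeline, assuming GNU sed (for the `\L` extension) and a few hypothetical file names. It runs in a scratch directory so nothing real gets touched:

```shell
# Scratch directory with hypothetical camelCase files.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch fooBar.c fooBar.h mainLoop.c

# sed prints every name twice: once unchanged (p) and once converted to
# snake_case. xargs -L2 then hands each old/new pair of lines to mv.
ls | sed 'p; s/[A-Z]/\L_&/g' | xargs -L2 mv

renamed=$(ls | tr '\n' ' ')
echo "$renamed"
```

Note that this already breaks down if one of the names contains a space or is already in snake_case.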
That works and it gets the job done, but it’s not really ideal is it? There are a couple of issues with this.
- You’re writing more complicated code. This has the obvious drawback of potentially being more error-prone, but also risks taking more time to write than you’d like, as you might have forgotten whether xargs actually has an ‘-L’ option or not (which would require reading the xargs(1) manual).
- If you try to rename the file foo to bar but bar already exists, you end up overwriting a file you may not have wanted to lose.
- In a similar vein to the previous point, you need to be very careful about schemes like renaming the file a to b and b to c. You run the risk of turning a into c and losing the file b entirely.
- Moving symbolic links is its own whole can of worms. If a symlink points to a relative location then you need to make sure it keeps pointing to the right place. If the symlink is absolute however then you can leave it untouched. But what if the symlink points to a file that you’re moving as part of your batch move operation? Now you need to handle that too.
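To make the a-to-b, b-to-c pitfall concrete, here is a small sketch using two throwaway files in a scratch directory (the names and contents are made up for illustration):

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
echo 'contents of a' > a
echo 'contents of b' > b

# Intended mapping: a -> b and b -> c. Done naively, in order:
mv a b    # silently overwrites the original b
mv b c    # moves what used to be a

survivor=$(cat c)
echo "$survivor"
```

Only one file is left at the end, and the original contents of b are gone for good.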
Name Mapping with mmv
What is mmv? It’s the solution to all your problems, that’s what it is! mmv takes as its argument(s) a utility and that utility’s arguments, and uses them to create a mapping between old and new filenames, similar to the map() function found in many programming languages. I think to best convey how the tool functions, I should provide an example. Let’s try to do the same thing we did previously where we tried to turn camelCase files to snake_case, but using mmv:
Let me break down how this works. mmv starts by reading a series of filenames separated by newlines from the standard input. Yes, sometimes filenames have newlines in them and yes there is a way to handle them but I shall get to that later. The filenames that mmv reads from the standard input will be referred to as the input files. Once all the input files have been read, the utility specified by the arguments is spawned; in this case that would be sed with the argument 's/[A-Z]/\L_&/g'. The input files are then piped into sed the exact same way that they would have been if we ran the above commands without mmv, and the output of sed then forms what will be referred to as the output files. Once a complete list of output files is accumulated, each input file gets renamed to its corresponding output file.
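The mapping described above can be imitated by hand. This is only a sketch of the idea, not how mmv is implemented; it assumes GNU sed and hypothetical file names, and it prints the pairs instead of renaming anything:

```shell
tmp=$(mktemp -d)

# The input files, one per line, as they would arrive on standard input.
printf '%s\n' fooBar.c fooBar.h mainLoop.c > "$tmp/inputs"

# The same list piped through the mapping utility gives the output files.
sed 's/[A-Z]/\L_&/g' < "$tmp/inputs" > "$tmp/outputs"

# Line N of the inputs is paired with line N of the outputs.
mapping=$(paste -d ' ' "$tmp/inputs" "$tmp/outputs")
echo "$mapping"
```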
Let’s look at a simpler example. Say we want to rename 2 files in the current directory to use lowercase letters. We could use the following command:
In the above example mmv reads 2 lines from standard input, those being LICENSE and README. Those are our 2 input files now. The tr utility is then spawned and the input files are piped into it. We can simulate this in the shell:
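A sketch of that simulation, with printf standing in for whatever feeds the two names to tr:

```shell
# Feed the two input files to tr by hand, one name per line.
lowered=$(printf 'LICENSE\nREADME\n' | tr A-Z a-z)
echo "$lowered"
```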
As you can see above, tr has produced 2 lines of output; these are our 2 output files. Since we now have our 2 input files and 2 output files, mmv can go ahead and rename the files. In this case it will rename LICENSE to license and README to readme. For more examples, check the examples section further down this page.
Filenames with Embedded Newlines
People do questionable things, and as a result we have filenames with newlines in them. All it would have taken to solve this issue for everyone was for literally anybody during the early UNIX days to go “hey, this is a bad idea!”, but alas, we must deal with this. Newlines are of course not the only special characters filenames can contain, but they are the single most infuriating to deal with; the UNIX utilities all being line-oriented really doesn’t work well with these files.
So how does mmv deal with special characters, and newlines in particular? Well it does so by providing the user with the -0 and -e flags:
-0
    Tell mmv to expect its input to be separated not by newlines (‘\n’), but by NUL bytes (‘\0’). NUL bytes are the only characters not allowed in filenames besides forward slashes, so they are an obvious choice for an alternative separator.
-e
    Encode newlines in filenames before passing them to the provided utility. Newline characters are replaced by the literal string ‘\n’ and backslashes by the literal string ‘\\’. After processing, the resulting output is decoded again.
    If combined with the -0 flag, then while input will be read assuming a NUL-byte input separator, the encoded input files will be written to the spawned process newline-separated.
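To get a feel for the encoding the -e flag applies, here is a small POSIX shell sketch. The encode function is a hypothetical helper written for this illustration and is not part of mmv:

```shell
# encode NAME: print NAME with backslashes doubled and every newline
# replaced by the two characters \n. An illustration only, not mmv's code.
encode() {
    printf '%s\n' "$1" |
        sed 's/\\/\\\\/g' |
        awk 'NR > 1 { printf "%s", "\\n" } { printf "%s", $0 }'
}

name=$(printf 'foo\nbar')
encoded=$(encode "$name")
printf '%s\n' "$encoded"
```

Backslashes have to be doubled before the newlines are replaced, otherwise a filename that really contains a backslash followed by an n could not be told apart from an encoded newline.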
The Simple Case
In order to better understand these flags and how they work let’s go through another example. We have 2 files (one with and one without an embedded newline) and our goal is to simply reverse these filenames. In this example I am going to be displaying newlines in filenames with the “$'\n'” syntax as this is how my shell displays embedded newlines.
We can start by just trying to naïvely pass these 2 files to mmv and use rev to reverse the names, but this doesn’t work:
This doesn’t work because, due to the line-oriented nature of ls and rev, we are actually trying to rename the files foo, bar, and baz to the new filenames oof, rab, and zab. As can be seen in the following diagram, the embedded newline is causing our input to be ambiguous and mmv can’t reliably proceed anymore.
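The ambiguity is easy to reproduce without touching any real files. In this sketch printf stands in for a listing of the two files (one named foo, newline, bar, and one named baz), and we just count the records a reader would see:

```shell
# Newline-separated, a line-oriented reader sees three records...
by_line=$(printf 'foo\nbar\nbaz\n' | wc -l)

# ...while NUL-separated input keeps the record count at two.
by_nul=$(printf 'foo\nbar\0baz\0' | tr -cd '\0' | wc -c)

echo "newline-separated: $((by_line)), NUL-separated: $((by_nul))"
```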
The first thing we need to do in order to proceed is to pass the -0 flag to mmv. This will tell mmv that we want to use the NUL-byte as our input separator and not the newline. We also need ls to actually provide us with the filenames delimited by NUL-bytes. Luckily GNU ls gives us the --zero flag to do just that:
So we’re getting places, but we aren’t quite there yet. The issue we’re getting now is that mmv received 2 input files from the standard input, but rev produced 3 output files. Why is that? Well let’s try our hand at a little bit of command-line debugging with sed:
If you aren’t quite sure what the above is doing, here’s a quick summary:
- The -U flag given to ls tells it not to sort our output. This is purely just to keep this example clear to the reader.
- The -n flag given to sed tells it not to print the input line automatically at the end of the provided script.
- The l command in sed prints the current input in a “visually unambiguous form”.
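If you want to try the sed part on its own, the following sketch feeds it a hand-made NUL-delimited listing of the same two hypothetical files. It assumes GNU sed, which copes with NUL bytes in its input:

```shell
# printf stands in for a NUL-delimited listing of "foo<newline>bar" and "baz".
dump=$(printf 'foo\nbar\0baz\0' | sed -n l)
printf '%s\n' "$dump"
```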
In the sed output, we can see that $ represents the end of a line, and \000 represents the NUL-byte. All looks good here, we have two inputs separated by NUL-bytes. Now let’s try to throw in rev:
Well wouldn’t you know it? Since rev also works with newline-separated input, it reversed our NUL-byte separators along with everything else and now gives us 3 outputs. Luckily the folks over at util-linux provided us with the -0 flag here too, so that we can properly handle NUL-delimited input. Combining all of this together we get a final working product:
Encoding Newlines
Sometimes we want to rename a bunch of files, but the command we want to use doesn’t support NUL-bytes as nicely as we would like. In these cases, you may want to consider encoding your newline characters into the literal string ‘\n’ with the -e flag, so that your input can be passed newline-separated to your given command.
For a real-world example, perhaps you want to edit some filenames in vim, or whatever other editor you use. Well we can do this incredibly easily with the vipe utility from the moreutils collection. The vipe command simply reads input from the standard input, opens it up in your editor, and then prints the resulting output to the standard output; perfect for mmv! We do not really want to deal with NUL-bytes in our text-editor though, so let’s just encode our newlines:
When running the above code example, you will see the following in your editor:
After you exit your editor, mmv will decode all occurrences of ‘\n’ back into a newline, and all occurrences of ‘\\’ back into a backslash:
Individual Execution
The previous examples are great and all, but what do you do if your mapping command doesn’t have the concept of an input separator at all? This is where the -i flag comes into play. With the -i flag we can get mmv to execute our mapping command once for every input filename. This means that as long as we can work with a complete buffer, we don’t need to worry about separators.
To be honest, I cannot really think of any situation where you might actually need to do this. If you can think of one, please email me and I’ll update the example on this page. Regardless, let’s imagine that we wanted to rename some files so that each filename is replaced with the SHA-1 hash of that filename.
On Linux we have the sha1sum program which reads input from the standard input and outputs the SHA-1 hash. This is how we would use it with mmv:
Another approach is to invoke mmv twice:
If you are confused about why we need to make a call to awk, it’s because the sha1sum program outputs 2 columns of data. The first column is our hash and the second column is the filename where the to-be-hashed data was read from. We don’t want the second column.
Unlike in previous examples where one process was spawned to map all our filenames, with the -i flag we are spawning a new instance for each filename. If you struggle to visualize this, perhaps the following diagrams help:
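In shell terms, individual execution amounts to roughly the loop below. This is a sketch with hypothetical filenames; it hashes each name without a trailing newline, which is an assumption about what the mapping command receives and may not match mmv exactly:

```shell
# hash_name NAME: print the SHA-1 hash of the name itself. sha1sum prints
# "HASH  -" when reading standard input, so awk keeps only the first column.
hash_name() {
    printf '%s' "$1" | sha1sum | awk '{ print $1 }'
}

# One fresh sha1sum process per filename, embedded newline or not.
for name in abc "$(printf 'foo\nbar')"; do
    hash_name "$name"
done
```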
Safety
When compared to the standard for f in *; do mv $f …; done or ls | … | xargs -L2 mv constructs, mmv is significantly safer to use. These are some of the safety features that are built into the tool:
- If the number of input- and output files differs, execution is aborted before making any changes.
- If an input file is renamed to the name of another input file, the second input file is not lost (i.e. you can rename a to b and b to a with no problem).
- All input files must be unique and all output files must be unique. Otherwise execution is aborted before making any changes.
- In the case that something goes wrong during execution (perhaps you tried to move a file to a non-existent directory, or a syscall failed), a backup of your input files is saved automatically by mmv for recovery.
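One way to get guarantees like these in plain shell is to check the targets up front and then stage every file under a temporary name before moving it into place. The sketch below swaps two throwaway files that way; it illustrates the idea and is not necessarily how mmv does it:

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
echo 'A' > a
echo 'B' > b

# Refuse to continue if any target name appears twice.
targets=$(printf '%s\n' b a)
dups=$(printf '%s\n' "$targets" | sort | uniq -d)
[ -z "$dups" ] || exit 1

# Swap a and b by staging both under temporary names first, so that
# neither file is ever overwritten.
stage=$(mktemp -d "$dir/stage.XXXXXX")
mv a "$stage/0"
mv b "$stage/1"
mv "$stage/0" b
mv "$stage/1" a
rmdir "$stage"
```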
Due to the way mmv handles the second point above, when things do go wrong you may find that all of your input files have disappeared. Don’t worry though, mmv takes a backup of your files before doing anything. If you run mmv with the -v option for verbose output, you’ll notice it backing up your stuff in the $XDG_CACHE_DIR directory:
Upon successful execution the $XDG_CACHE_DIR/mmv/TIMESTAMP directory will be automatically removed, but it remains when things go wrong so that you can recover any missing data. The names of the backup-subdirectories in the $XDG_CACHE_DIR/mmv directory are timestamps of when the directories were created. This should make it easier for you to figure out which directory you need to recover if you happen to have multiple of these.
Examples
Swap the files foo and bar:
Rename all files in the current directory to use hyphens (‘-’) instead of spaces:
Rename a given list of movies to use lowercase letters and hyphens instead of uppercase letters and spaces, and number them so that they’re properly ordered in globs (e.g. rename The Return of the King.mp4 to 02-the-return-of-the-king.mp4):
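As a dry run of the movie mapping, here is a sketch that only prints the new names. Apart from The Return of the King.mp4 the titles are assumed for illustration, and numbering starts at 00 so that the third file ends up as 02:

```shell
movies=$(printf '%s\n' \
    'The Fellowship of the Ring.mp4' \
    'The Two Towers.mp4' \
    'The Return of the King.mp4')

# Number from 00, lowercase everything, and swap spaces for hyphens.
renamed=$(printf '%s\n' "$movies" |
    awk '{ gsub(/ /, "-"); printf "%02d-%s\n", NR - 1, tolower($0) }')
printf '%s\n' "$renamed"
```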
Rename files interactively in your editor while encoding newlines into the literal string ‘\n’, making use of vipe(1) from moreutils:
Rename all C source and header files in a git repository to use snake_case instead of camelCase using the GNU sed(1) ‘\L’ extension:
Lowercase all filenames within a directory hierarchy which may contain newline characters:
Map filenames which may contain newlines in the current directory with the command ‘cmd’, which itself does not support NUL-byte separated entries. This only works assuming your mapping doesn’t require any context outside of the given input filename (for example, you would not be able to number your files as this requires knowledge of the input file’s position in the input list):