<!DOCTYPE html>
<html lang="en">
	<head>
		m4_include(head.html)
	</head>
	<body>
		<header>
			<div>
				<h1>Moving Files the Right Way</h1>
				m4_include(nav.html)
			</div>

			<figure class="quote">
				<blockquote>
					<p>I think the OpenBSD crowd is a bunch of masturbating
					monkeys, in that they make such a big deal about
					concentrating on security to the point where they pretty much
					admit that nothing else matters to them.</p>
				</blockquote>
				<figcaption>
					Linus Torvalds
				</figcaption>
			</figure>
		</header>

		<main>
			<p>
				<em>
					You can find the <code>mmv</code> git repository over at
					<a href="https://git.sr.ht/~mango/mmv"
					   target="_blank">sourcehut</a>
					   or <a href="https://github.com/Mango0x45/mmv">GitHub</a>.
				</em>
			</p>

			<h2>Table of Contents</h2>

			<ul>
				<li><a href="#prologue">Prologue</a></li>
				<li><a href="#moving">Advanced Moving and Pitfalls</a></li>
				<li><a href="#mapping">Name Mapping with <code>mmv</code></a></li>
				<li>
					<a href="#newlines">Filenames with Embedded Newlines</a>
					<ul>
						<li><a href="#0-flag">The Simple Case</a></li>
						<li><a href="#e-flag">Encoding Newlines</a></li>
					</ul>
				</li>
				<li><a href="#i-flag">Individual Execution</a></li>
				<li><a href="#safety">Safety</a></li>
				<li><a href="#examples">Examples</a></li>
			</ul>
			
			<h2 id="prologue">Prologue</h2>
			<p>
				File moving and renaming is one of the most common tasks we
				undertake on the command-line.  We basically always do this with
				the <code>mv</code> utility, and it gets the job done most of the
				time.  Want to rename one file?  Use <code>mv</code>!  Want to
				move a bunch of files into a directory?  Use <code>mv</code>!
				How could <code>mv</code> ever go wrong?  Well, I’m glad you asked!
			</p>

			<h2 id="moving">Advanced Moving and Pitfalls</h2>
			<p>
				Let’s start off nice and simple.  You just inherited a C project
				that uses the sacrilegious
				<a
					href="https://en.wikipedia.org/wiki/Camel_case"
					target="_blank"
				>camelCase</a>
				naming convention for its files:
			</p>

			<figure>
				<pre>m4_fmt_code(ls-files.sh.html)</pre>
			</figure>

			<p>
				This deeply upsets you, as it upsets me.  So you decide you want
				to switch all these files to use
				<a
					href="https://en.wikipedia.org/wiki/Snake_case"
					target="_blank"
				>snake_case</a>,
				like a normal person.  Well how would you do this?  You use
				<code>mv</code>!  This is what you might end up doing:
			</p>

			<figure>
				<pre>m4_fmt_code(manual-mv.sh.html)</pre>
			</figure>

			<p>
				Well… it works I guess, but it’s a pretty shitty way of renaming
				these files.  Luckily we only had 5, but what if this was a much
				larger project with many more files to rename?  Things would get
				tedious.  So instead we can use a pipeline for
				this:
			</p>

			<figure>
				<pre>m4_fmt_code(camel-to-snake-naïve.sh.html)</pre>
			</figure>

			<aside>
				<p>
					The given example assumes your <code>sed</code>
					implementation supports ‘<code>\L</code>’ which is a
					non-standard <abbr class="gnu">GNU</abbr> extension.
				</p>
			</aside>

			<p>
				That works and it gets the job done, but it’s not really ideal is
				it?  There are a couple of issues with this.
			</p>

			<ol>
				<li>
					<p>
						You’re writing more complicated code.  This has the
						obvious drawback of potentially being more error-prone,
						but also risks taking more time to write than you’d like
						as you might have forgotten if <code>xargs</code>
						actually has an ‘<code>-L</code>’ option or not (which
						would require reading the
						<a href="https://www.man7.org/linux/man-pages/man1/xargs.1.html"
							target="_blank" ><code>xargs(1)</code></a> manual).
					</p>
				</li>
				<li>
					<p>
						If you try to rename the file <em>foo</em>
						to <em>bar</em> but <em>bar</em> already exists, you end
						up deleting a file you may not have wanted to.
					</p>
				</li>
				<li>
					<p>
						In a similar vein to the previous point, you need to be
						very careful about schemes like renaming the
						file <em>a</em> to <em>b</em> and <em>b</em>
						to <em>c</em>.  You run the risk of turning <em>a</em>
						into <em>c</em> and losing the file <em>b</em> entirely.
					</p>
				</li>
				<li>
					<p>
						Moving symbolic links is its own whole can of worms.  If
						a symlink points to a relative location then you need to
						make sure you keep pointing to the right place.  If the
						symlink is absolute however then you can leave it
						untouched.  But what if the symlink points to a file
						that you’re moving as part of your batch move operation?
						Now you need to handle that too.
					</p>
				</li>
			</ol>
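
			<p>
				To make the third point concrete, here is a small sketch you
				can run in a throwaway directory.  The filenames and contents
				are made up for illustration; the point is only that applying
				the mapping <em>a</em> to <em>b</em> and <em>b</em> to
				<em>c</em> one <code>mv</code> at a time loses the
				original <em>b</em>:
			</p>

```shell
# Throwaway directory so that nothing real is touched.
dir=$(mktemp -d)
cd "$dir" || exit 1

printf 'contents of a\n' > a
printf 'contents of b\n' > b

# Intended mapping: a -> b and b -> c, applied one rename at a time.
mv a b    # silently overwrites the original b
mv b c    # moves what used to be a, not the original b

ls        # only "c" is left
cat c     # and it holds the contents of a
```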

			<h2 id="mapping">Name Mapping with <code>mmv</code></h2>

			<p>
				What is <code>mmv</code>?  It’s the solution to all your
				problems, that’s what it is!  <code>mmv</code> takes as its
				argument(s) a utility and that utility’s arguments, and uses them
				to create a mapping between old and new filenames, similar to
				the <code>map()</code> function found in many programming
				languages.  I think to best convey how the tool functions, I
				should provide an example.  Let’s try to do the same thing we did
				previously where we tried to turn camelCase files to snake_case,
				but using <code>mmv</code>:
			</p>

			<figure>
				<pre>m4_fmt_code(camel-to-snake-smart.sh.html)</pre>
			</figure>

			<p>Let me break down how this works.</p>

			<p>
				<code>mmv</code> starts by reading a series of filenames
				separated by newlines from the standard input.  Yes, sometimes
				filenames have newlines in them and yes there is a way to handle
				them but I shall get to that later.  The filenames that
				<code>mmv</code> reads from the standard input will be referred
				to as the <em>input files</em>.  Once all the input files have
				been read, the utility specified by the arguments is spawned; in
				this case that would be <code>sed</code> with the argument
				<code>'s/[A-Z]/\L_&/g'</code>. The input files are then piped
				into <code>sed</code> the exact same way that they would have
				been if we ran the above commands without <code>mmv</code>, and
				the output of <code>sed</code> then forms what will be referred
				to as the <em>output files</em>.  Once a complete list of output
				files is accumulated, each input file gets renamed to its
				corresponding output file.
			</p>

			<p>
				Let’s look at a simpler example.  To rename 2 files in the
				current directory to use lowercase letters, we could use the
				following command:
			</p>
			
			<figure>
				<pre>m4_fmt_code(mmv-tr.sh.html)</pre>
			</figure>

			<p>
				In the above example <code>mmv</code> reads 2 lines from
				standard input, those being <em>LICENSE</em>
				and <em>README</em>.  Those are our 2 input files now.
				The <code>tr</code> utility is then spawned and the input files
				are piped into it.  We can simulate this in the shell:
			</p>

			<figure>
				<pre>m4_fmt_code(tr.sh.html)</pre>
			</figure>

			<p>
				As you can see above, <code>tr</code> has produced 2 lines of
				output; these are our 2 output files.  Since we now have our 2
				input files and 2 output files, <code>mmv</code> can go ahead
				and rename the files.  In this case it will rename
				<em>LICENSE</em> to <em>license</em> and
				<em>README</em> to <em>readme</em>.  For more examples, check
				the <a href="#examples">examples</a> section further down this
				page.
			</p>
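
			<p>
				If you want to see a mapping before committing to it, the steps
				described above can be roughly modelled in plain shell.  This
				is only a sketch of the idea and not how <code>mmv</code> is
				implemented: <code>preview_map</code> is a hypothetical helper
				of my own naming, it renames nothing, and it assumes that no
				filename contains a newline or a tab.
			</p>

```shell
# preview_map is a hypothetical helper, not part of mmv.
# usage: preview_map utility [argument ...] < list-of-filenames
preview_map() {
    src=$(mktemp)
    dst=$(mktemp)

    cat > "$src"              # 1. collect the input files
    "$@" < "$src" > "$dst"    # 2. pipe them through the utility

    # 3. refuse to continue if the two lists differ in length
    if [ "$(wc -l < "$src")" -ne "$(wc -l < "$dst")" ]; then
        echo 'preview_map: input and output counts differ' >&2
        rm -f "$src" "$dst"
        return 1
    fi

    # 4. print each old/new pair (tab-separated) instead of renaming
    paste "$src" "$dst"
    rm -f "$src" "$dst"
    return 0
}

printf '%s\n' LICENSE README | preview_map tr A-Z a-z
```

			<p>
				Each line of output is an input file followed by the name it
				would be renamed to.
			</p>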

			<h2 id="newlines">Filenames with Embedded Newlines</h2>

			<p>
				People are retarded, and as a result we have filenames with
				newlines in them.  All it would have taken to solve this issue
				for everyone was for literally <strong>anybody</strong> during
				the early UNIX days to go “<em>hey, this is a bad idea!</em>”,
				but alas, we must deal with this.  Newlines are of course not
				the only special characters filenames can contain, but they are
				the single most infuriating to deal with; the UNIX utilities all
				being line-oriented really doesn’t work well with these files.
			</p>

			<p>
				So how does <code>mmv</code> deal with special characters, and
				newlines in particular?  Well it does so by providing the user
				with the <code>-0</code> and <code>-e</code> flags:
			</p>

			<dl>
				<dt><code>-0</code></dt>
				<dd>
					<p>
						Tell <code>mmv</code> to expect its input to not be
						separated by newlines (‘<code>\n</code>’), but by NUL
						bytes (‘<code>\0</code>’).  NUL bytes are the only
						characters not allowed in filenames besides forward
						slashes, so they are an obvious choice for an
						alternative separator.
					</p>
				</dd>
				<dt><code>-e</code></dt>
				<dd>
					<p>
						Encode newlines in filenames before passing them to the
						provided utility.  Newline characters are replaced by the
						literal string ‘<code>\n</code>’ and backslashes by the
						literal string ‘<code>\\</code>’.  After processing, the
						resulting output is decoded again.
					</p>
					<p>
						If combined with the <code>-0</code> flag, then while
						input will be read assuming a NUL-byte input separator,
						the encoded input files will be written to the spawned
						process newline-separated.
					</p>
				</dd>
			</dl>
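
			<p>
				The encoding scheme described for <code>-e</code> is simple
				enough to sketch in shell.  The <code>encode</code> and
				<code>decode</code> functions below are hypothetical helpers
				that follow the description above; they are not taken from
				<code>mmv</code> itself, and how <code>mmv</code> treats a
				stray trailing backslash is an assumption on my part.
			</p>

```shell
# encode: read one filename from stdin and print it as a single line,
# with backslashes written as \\ and embedded newlines written as \n.
encode() {
    sed 's/\\/\\\\/g' |
        awk 'NR > 1 { printf "%s", "\\n" } { printf "%s", $0 } END { printf "\n" }'
}

# decode: undo the encoding, one encoded filename per input line.
# The scan is left to right so that \\n decodes to a backslash followed
# by "n" rather than to a backslash followed by a newline.
decode() {
    awk '{
        out = ""
        len = length($0)
        for (i = 1; i <= len; i++) {
            c = substr($0, i, 1)
            if (c == "\\" && i < len) {
                i++
                n = substr($0, i, 1)
                out = out (n == "n" ? "\n" : n)
            } else {
                # Assumption: a lone trailing backslash is kept as-is.
                out = out c
            }
        }
        print out
    }'
}

printf 'foo\nbar\\baz' | encode             # prints foo\nbar\\baz
printf 'foo\nbar\\baz' | encode | decode    # prints the original two lines
```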

			<h3 id="0-flag">The Simple Case</h3>

			<p>
				In order to better understand these flags and how they work
				let’s go through another example.  We have 2 files (one with and
				one without an embedded newline) and our goal is to simply
				reverse these filenames.  In this example I am going to be
				displaying newlines in filenames with the “<code>$'\n'</code>”
				syntax as this is how my shell displays embedded newlines.
			</p>

			<p>
				We can start by just trying to naïvely pass these 2 files
				to <code>mmv</code> and use <code>rev</code> to reverse the
				names, but this doesn’t work:
			</p>

			<figure>
				<pre>m4_fmt_code(mmv-rev.sh.html)</pre>
			</figure>

			<p>
			  This doesn’t work because, due to the line-oriented
			  nature of <code>ls</code> and <code>rev</code>, we are actually
			  trying to rename the files <em>foo</em>, <em>bar</em>, and
			  <em>baz</em> to the new filenames <em>zab</em>,
			  <em>rab</em>, and <em>oof</em>.  As can be seen in the following
			  diagram, the embedded newline is causing our input to be ambiguous
			  and <code>mmv</code> can’t reliably proceed
			  anymore <x-ref>1</x-ref>:
			</p>

			<figure>
				<object data="conflict.svg" type="image/svg+xml"></object>
			</figure>

			<aside>
				<p data-ref="1">
					The reason you get a cryptic “file not found” error message
					is because <code>mmv</code> tries to assert that all the
					input files actually exist before doing anything.  Since
					“foo” isn’t a real file, we error out.
				</p>
			</aside>
			
			<p>
			  The first thing we need to do in order to proceed is to pass
			  the <code>-0</code> flag to <code>mmv</code>.  This will
			  tell <code>mmv</code> that we want to use the NUL-byte as our
			  input separator and not the newline.  We also need <code>ls</code>
			  to actually provide us with the filenames delimited by NUL-bytes.
			  Luckily <abbr class="gnu">GNU</abbr> <code>ls</code> gives us the
			  <code>--zero</code> flag to do just that:
			</p>

			<figure>
			  <pre>m4_fmt_code(mmv-rev-zero.sh.html)</pre>
			</figure>

			<p>
				So we’re getting places, but we aren’t quite there yet.  The
				issue we’re getting now is that <code>mmv</code> received 2
				input files from the standard input, but <code>rev</code>
				produced 3 output files.  Why is that?  Well let’s try our hand
				at a little bit of command-line debugging with <code>sed</code>:
			</p>

			<figure>
				<pre>m4_fmt_code(sed-debugging.sh.html)</pre>
			</figure>

			<p>
				If you aren’t quite sure what the above is doing, here’s a quick
				summary:
			</p>

			<ul>
				<li>
					The <code>-U</code> flag given to <code>ls</code> tells it
					not to sort our output.  This is purely just to keep this
					example clear to the reader.
				</li>
				<li>
					The <code>-n</code> flag given to <code>sed</code> tells it
					not to print the input line automatically at the end of the
					provided script.
				</li>
				<li>
					The <code>l</code> command in <code>sed</code> prints the
					current input in a “visually unambiguous form”.
				</li>
			</ul>

			<p>
				In the <code>sed</code> output, we can see that <samp>$</samp>
				represents the end of a line, and <samp>\000</samp> represents
				the NUL-byte.  All looks good here, we have two inputs separated
				by NUL-bytes.  Now let’s try to throw in <code>rev</code>:
			</p>

			<figure>
				<pre>m4_fmt_code(sed-debugging-rev.sh.html)</pre>
			</figure>

			<p>
				Well wouldn’t you know it?  Since <code>rev</code> <em>also</em>
				works with newline-separated input, it reversed our NUL-byte
				separators and now gives us 3 outputs.  Luckily the folks over
				at <em>util-linux</em> provided us with the <code>-0</code> flag
				here too, so that we can properly handle NUL-delimited input.
				Combining all of this together we get a final working product:
			</p>

			<figure>
				<pre>m4_fmt_code(reverse-embedded-newline.sh.html)</pre>
			</figure>
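
			<p>
				When the input and output counts disagree like they did above,
				it can help to count how many records each tool in the pipeline
				actually sees.  The following is a minimal sketch;
				<code>count_nul</code> and <code>count_lines</code> are
				hypothetical helper names, and the <code>printf</code> call
				stands in for a NUL-delimited listing of our two files:
			</p>

```shell
# count_nul and count_lines are hypothetical helper names.
count_nul()   { tr -cd '\000' | wc -c; }    # NUL-terminated records
count_lines() { wc -l; }                    # newline-terminated records

# Two filenames, the first with an embedded newline, each terminated
# by a NUL byte.
printf 'foo\nbar\000baz\000' | count_nul      # 2
printf 'foo\nbar\000baz\000' | count_lines    # 1
```

			<p>
				A NUL-aware reader sees 2 records here, while a line-oriented
				tool finds a single newline and splits the input in a different
				place, which is why every stage of the pipeline has to agree on
				the separator.
			</p>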

			<h3 id="e-flag">Encoding Newlines</h3>

			<p>
				Sometimes we want to rename a bunch of files, but the command we
				want to use doesn’t support NUL-bytes as nicely as we would
				like.  In these cases, you may want to consider encoding your
				newline characters into the literal string ‘<code>\n</code>’ and
				then passing your input newline-separated to your given command
				with the <code>-e</code> flag.
			</p>

			<p>
				For a real-world example, perhaps you want to edit some
				filenames in vim, or whatever other editor you use.  Well we can
				do this incredibly easily with the <code>vipe</code> utility
				from
				the <a href="https://joeyh.name/code/moreutils/">moreutils</a>
				collection.  The <code>vipe</code> command simply reads input
				from the standard input, opens it up in your editor, and then
				prints the resulting output to the standard output; perfect
				for <code>mmv</code>!  We do not really want to deal with
				NUL-bytes in our text-editor though, so let’s just encode our
				newlines:
			</p>

			<figure>
				<pre>m4_fmt_code(vipe.sh.html)</pre>
			</figure>

			<aside>
				<p>
					Notice how you still need to pass the <code>-0</code> flag
					to let <code>mmv</code> know that our input files may have
					embedded newlines.
				</p>
			</aside>

			<p>
				When running the above code example, you will see the following
				in your editor:
			</p>

			<figure>
				<pre>m4_fmt_code(vim.html)</pre>
			</figure>

			<p>
				After you exit your editor, <code>mmv</code> will decode all
				occurrences of ‘<code>\n</code>’ back into a newline, and all
				occurrences of ‘<code>\\</code>’ back into a backslash:
			</p>

			<figure>
				<object data="e-flag.svg" type="image/svg+xml"></object>
			</figure>

			<h2 id="i-flag">Individual Execution</h2>
			<p>
				The previous examples are great and all, but what do you do if
				your mapping command doesn’t have the concept of an input
				separator at all?  This is where the <code>-i</code> flag comes
				into play.  With the <code>-i</code> flag we can
				get <code>mmv</code> to execute our mapping command for every
				input filename.  This means that as long as we can work with a
				complete buffer, we don’t need to worry about separators.
			</p>

			<p>
				To be honest, I cannot really think of any situation where you
				might actually need to do this.  If you can think of one,
				please <a href="mailto:mail@thomasvoss.com">email me</a> and
				I’ll update the example on this page.  Regardless, let’s imagine
				that we wanted to rename some files so that each filename is
				replaced with the
				<a href="https://en.wikipedia.org/wiki/SHA-1" target="_blank">
					SHA-1 hash</a> of that filename.
				On Linux we have the <code>sha1sum</code> program which reads
				input from the standard input and outputs the SHA-1 hash.  This
				is how we would use it with <code>mmv</code>:
			</p>

			<figure>
				<pre>m4_fmt_code(sha1sum-long-example.sh.html)</pre>
			</figure>

			<p>
				Another approach is to invoke <code>mmv</code> twice:
			</p>

			<figure>
				<pre>m4_fmt_code(sha1sum-short-example.sh.html)</pre>
			</figure>

			<p>
				If you are confused about why we need to make a call
				to <code>awk</code>, it’s because the <code>sha1sum</code>
				program outputs 2 columns of data.  The first column is our hash
				and the second column is the filename where the to-be-hashed
				data was read from.  We don’t want the second column.
			</p>

			<p>
				Unlike in previous examples where one process was spawned to map
				all our filenames, with the <code>-i</code> flag we are spawning
				a new instance for each filename.  If you struggle to visualize
				this, perhaps the following diagrams help:
			</p>

			<figure>
				<figcaption>Invoking <code>mmv</code> without <code>-i</code></figcaption>
				<object data="without-i-flag.svg" type="image/svg+xml"></object>
			</figure>

			<figure>
				<figcaption>Invoking <code>mmv</code> with <code>-i</code></figcaption>
				<object data="with-i-flag.svg" type="image/svg+xml"></object>
			</figure>
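
			<p>
				The one-process-per-filename idea can also be imitated in plain
				shell, without <code>mmv</code>.  In this sketch
				<code>hash_name</code> is a hypothetical helper, and I am
				assuming the filename is written to the process without a
				trailing newline; whether <code>mmv -i</code> does the same is
				not something this example tells you.
			</p>

```shell
# hash_name is a hypothetical helper: it spawns one sha1sum process
# per filename, with the whole filename as that process's input.
hash_name() {
    printf '%s' "$1" | sha1sum | awk '{ print $1 }'
}

# Each name is handed over as a complete buffer, so no separator is
# involved and an embedded newline is hashed like any other byte.
for name in 'foo' 'foo
bar'; do
    hash_name "$name"
done
```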

			<h2 id="safety">Safety</h2>
			<p>
				When compared to the standard <code>for f in *; do mv $f …;
				done</code> or <code>ls | … | xargs -L2 mv</code>
				constructs, <code>mmv</code> is significantly safer to use.
				These are some of the safety features that are built into the
				tool:
			</p>

			<ol>
				<li>
					If the number of input- and output files differs, execution
					is aborted before making any changes.
				</li>
				<li>
					If an input file is renamed to the name of another input
					file, the second input file is not lost (i.e. you can rename
					<em>a</em> to <em>b</em> and <em>b</em> to <em>a</em> with
					no problem).
				</li>
				<li>
					All input files must be unique and all output files must be
					unique. Otherwise execution is aborted before making any
					changes.
				</li>
				<li>
					In the case that something goes wrong during execution
					(perhaps you tried to move a file to a non-existent
					directory, or a syscall failed), a backup of your input
					files is saved automatically by <code>mmv</code> for
					recovery.
				</li>
			</ol>
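
			<p>
				The first and third checks are easy to approximate yourself if
				you are stuck with a hand-written pipeline.  The sketch below
				is not <code>mmv</code>’s code: <code>check_lists</code> is a
				hypothetical helper that takes two files, each holding one
				filename per line, and it assumes no filename contains a
				newline.
			</p>

```shell
# check_lists is a hypothetical helper, not part of mmv.
# usage: check_lists input-list output-list   (one filename per line)
check_lists() {
    # The number of inputs must match the number of outputs.
    if [ "$(wc -l < "$1")" -ne "$(wc -l < "$2")" ]; then
        echo 'input and output counts differ' >&2
        return 1
    fi

    for list in "$1" "$2"; do
        # "uniq -d" prints one copy of every line that is repeated.
        if [ -n "$(sort "$list" | uniq -d)" ]; then
            echo "duplicate name in $list" >&2
            return 1
        fi
    done

    return 0
}
```

			<p>
				Running <code>check_lists old.txt new.txt &amp;&amp; …</code>
				before any renaming gives you the same “abort before making any
				changes” behaviour for those two cases; the list filenames here
				are placeholders.
			</p>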

			<p>
				Due to the way <code>mmv</code> handles #2, when things do go
				wrong you may find that all of your input files have
				disappeared.  Don’t worry though, <code>mmv</code> takes a
				backup of your files before doing anything.  If you
				run <code>mmv</code> with the <code>-v</code> option for verbose
				output, you’ll notice it backing up your stuff in
				the <code>$XDG_CACHE_DIR</code> directory:
			</p>

			<figure>
				<pre>m4_fmt_code(mmv-verbose.sh.html)</pre>
			</figure>

			<p>
				Upon successful execution
				the <code>$XDG_CACHE_DIR/mmv/TIMESTAMP</code> directory will be
				automatically removed, but it remains when things go wrong so
				that you can recover any missing data.  The names of the
				backup-subdirectories in the <code>$XDG_CACHE_DIR/mmv</code>
				directory are timestamps of when the directories were created.
				This should make it easier for you to figure out which directory
				you need to recover if you happen to have multiple of these.
			</p>
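
			<p>
				If you are wondering how a swap like <em>a</em> to <em>b</em>
				and <em>b</em> to <em>a</em> can work at all, one common
				approach is to rename in two phases through a staging
				directory.  The sketch below shows that general technique with
				made-up filenames; I am not claiming it is exactly what
				<code>mmv</code> does internally, but it does show why input
				files can appear to be missing if such a process is interrupted
				halfway.
			</p>

```shell
# Throwaway directory so that nothing real is touched.
dir=$(mktemp -d)
cd "$dir" || exit 1

printf 'A\n' > a
printf 'B\n' > b

# Phase 1: move every input out of the way into a staging directory.
# At this point neither a nor b exists under its old name.
stage=$(mktemp -d "$dir/stage.XXXXXX")
mv a "$stage/0"
mv b "$stage/1"

# Phase 2: move each staged file to its final name (a -> b, b -> a).
mv "$stage/0" b
mv "$stage/1" a
rmdir "$stage"

cat a    # B
cat b    # A
```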
			
			<h2 id="examples">Examples</h2>

			<aside>
				<p>
					All of these examples are ripped straight from
					the <code>mmv(1)</code> manual page. If you
					installed <code>mmv</code> through a package manager or
					via <code>make install</code> then you should have the
					manual installed on your system.
				</p>
			</aside>

			<p>Swap the files <em>foo</em> and <em>bar</em>:</p>
			<figure>
				<pre>m4_fmt_code(examples/swap.sh.html)</pre>
			</figure>

			<p>
				Rename all files in the current directory to use hyphens (‘-’)
				instead of spaces:
			</p>
			<figure>
				<pre>m4_fmt_code(examples/hyphens.sh.html)</pre>
			</figure>

			<p>
				Rename a given list of movies to use lowercase letters and
				hyphens instead of uppercase letters and spaces, and number them
				so that they’re properly ordered in globs (e.g. rename <em>The
				Return of the King.mp4</em> to
				<em>02-the-return-of-the-king.mp4</em>):
			</p>
			<figure>
				<pre>m4_fmt_code(examples/number.sh.html)</pre>
			</figure>

			<p>
				Rename files interactively in your editor while encoding newlines
				into the literal string ‘<code>\n</code>’, making use
				of <code><a href="https://linux.die.net/man/1/vipe"
				target="_blank">vipe(1)</a></code> from <em>moreutils</em>:
			</p>
			<figure>
				<pre>m4_fmt_code(examples/vipe.sh.html)</pre>
			</figure>

			<p>
				Rename all C source code- and header files in a git repository
				to use snake_case instead of camelCase using
				the <abbr class="gnu">GNU</abbr>
				<code><a href="https://www.man7.org/linux/man-pages/man1/sed.1.html"
				target="_blank">sed(1)</a></code> ‘<code>\L</code>’ extension:
			</p>
			<figure>
				<pre>m4_fmt_code(examples/camel-to-snake.sh.html)</pre>
			</figure>

			<p>
				Lowercase all filenames within a directory hierarchy which may
				contain newline characters:
			</p>
			<figure>
				<pre>m4_fmt_code(examples/lowercase.sh.html)</pre>
			</figure>

			<p>
				Map filenames which may contain newlines in the current
				directory with the command ‘<code>cmd</code>’, which itself does
				not support NUL-byte separated entries.  This only works
				assuming your mapping doesn’t require any context outside of the
				given input filename (for example, you would not be able to
				number your files as this requires knowledge of the input file’s
				position in the input list):
			</p>
			<figure>
				<pre>m4_fmt_code(examples/i-flag.sh.html)</pre>
			</figure>
		</main>

		<hr>
			
		<footer>
			m4_footer
		</footer>
	</body>
</html>