Internet Architecture Board (IAB) H. Flanagan
Request for Comments: 8153 RFC Editor
Category: Informational April 2017
ISSN: 2070-1721
Digital Preservation Considerations for the RFC Series
Abstract
The RFC Editor is both the publisher and the archivist for the RFC
Series. This document applies specifically to the archivist role of
the RFC Editor. It provides guidance on when and how to preserve
RFCs and describes the tools required to view or re-create RFCs as
necessary. This document also highlights gaps in the current process
and suggests compromises to balance cost with best practice.
Status of This Memo
This document is not an Internet Standards Track specification; it is
published for informational purposes.
This document is a product of the Internet Architecture Board (IAB)
and represents information that the IAB has deemed valuable to
provide for permanent record. It represents the consensus of the
Internet Architecture Board (IAB). Documents approved for
publication by the IAB are not a candidate for any level of Internet
Standard; see Section 2 of RFC 7841.
Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at
http://www.rfc-editor.org/info/rfc8153.
Copyright Notice
Copyright (c) 2017 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document.
Flanagan Informational [Page 1]
RFC 8153 Digital Preservation April 2017
Table of Contents
1. Introduction ....................................................2
1.1. Terminology ................................................4
1.2. Life Cycle of Digital Preservation .........................4
2. Updating Policy and Procedure ...................................5
2.1. Acquisition of Documents ...................................6
2.2. Ingestion of Documents .....................................6
2.3. Metadata and Document Registration .........................7
2.4. Normalization and Standardization of Canonical File
Structure and Format .......................................9
2.4.1. 'Best Effort' Data Retention .......................10
2.4.2. Single Format for Archival Purposes ................11
2.4.3. Holistic Archiving of the Computing Environment ....12
2.5. Transformation/Migration to Current Publication Formats ...12
2.6. System Parameters .........................................13
2.7. Financial Impact ..........................................13
3. Recommendations ................................................14
4. Summary ........................................................15
5. IANA Considerations ............................................15
6. Security Considerations ........................................15
7. Informative References .........................................16
IAB Members at the Time of Approval ...............................18
Author's Address ..................................................18
1. Introduction
The RFC Editor is both the publisher and the archivist for the RFC
Series, a series of technical specifications and policy documents
that includes foundational Internet standards [RFC6635] [RFC-SERIES].
The goal of the RFC Editor is to produce clear, consistent, and
readable documents for the Internet community. Over time, the RFC
Editor will use as many modern features, such as hyperlinks and
content markup, within the document as necessary to convey the
information the authors intended for their audience. As the
archivist, however, the main goal is to preserve both the information
described and the documents themselves for the indefinite future. To
meet both of these goals, the RFC Editor must find the necessary
balance between the publication needs of today and the archival needs
of tomorrow, while acknowledging a finite set of resources to
complete both aspects of the RFC Editor function.
While many files are created during the editing process, this
document focuses on the archival needs of the Internet-Drafts (I-Ds)
that were approved for publication and the RFCs that resulted from
these I-Ds; I-Ds before they are approved for publication by the
appropriate stream-approving body are out of scope.
To summarize, the key areas of tension between the roles of publisher
and archivist are:
o the desire of the publisher to meet the needs expressed by authors
who want to use the latest technology (e.g., vector graphics, live
links, and a rich set of metadata) within their documents; and
o the desire of the archivist to support only the simplest format
for documents possible -- currently held by the Series to be
plain-text, ASCII-only documents -- so that the tools needed to
view the documents are equally simple and resistant to changes in
technology, resulting in a set of documents that will be easier to
archive for at least the next several decades, if not centuries.
Through most of the history of the RFC Series, the file format for
RFCs has been plain text with an ASCII-only character set. This
choice offered the simplest format likely to remain available to the
largest number of consumers and the format most likely to be
resistant to changes in technology over time. Increasingly, however,
consumers and authors are requesting additional features that would
allow for easy reading on a wider array of devices while retaining
all the metadata authors intended in their documents. In 2013, RFC
6949 ("RFC Series Format Requirements and Future Development")
captured the high-level requirements for the Series; the fundamental
issue was that plain-text, ASCII-only documents no longer meet the
needs of the communities interested in using and producing RFCs
[RFC6949].
The assertion that plain-text, ASCII-only documents no longer meet
the needs of the community suggests that the simple archival process
maintained by the RFC Editor is also no longer sufficient. More
complex tools and file formats require a more complex process to
ensure that RFCs can be read and rendered far into the future. This
document describes the considerations that must inform any changes in
policy and procedure, and it describes a model for the RFC Series to
follow when additional formats beyond plain-text, ASCII-only RFCs are
published. The functional model that provides the framework for the
archival process described in this document was derived from the ISO
Open Archival Information System (OAIS) reference model, defined in
"Space data and information transfer systems -- Open archival
information system (OAIS) -- Reference model" [ISO14721].
1.1. Terminology
Acquisition: The point at which a document is accepted by the RFC
Editor for future inclusion into the archive.
Ingestion: The point at which a digital object is assigned all
necessary metadata to describe the object and its contents and is
added to the archive.
Bitstream preservation: The process of storing and maintaining
digital objects over time, ensuring that there is no loss or
corruption of the bits making up those objects.
Content preservation: The retention of the ability to read, listen
to, or watch a digital file in perpetuity. Content preservation is not
about the bits being stored; it is about being able to access and
present those bits to the user.
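Bitstream preservation as defined above is commonly verified with
fixity checks: a digest is recorded for each object at ingestion and
recomputed later to detect loss or corruption. The following Python
sketch illustrates the idea. The choice of SHA-256 and the function
names are assumptions made for illustration; they are not part of the
RFC Editor's documented procedure.

```python
import hashlib
from pathlib import Path


def digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (by relative path) to its digest."""
    return {
        p.relative_to(root).as_posix(): digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the names of objects that are missing or whose bits changed."""
    damaged = []
    for name, expected in manifest.items():
        p = root / name
        if not p.is_file() or digest(p) != expected:
            damaged.append(name)
    return damaged
```

A manifest built at ingestion can be stored with the archive and
passed to verify() after each media refresh or restoration test.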
1.2. Life Cycle of Digital Preservation
The basic process for preserving digital information has been
described by a variety of organizations. From the Life cycle
Information For E-Literature (LIFE) project [LIFE] in the United
Kingdom to the ongoing digital preservation work in the U.S. Library
of Congress [USLOC], the basic digital preservation process is
straightforward. Documents are acquired and processed, metadata is
recorded, physical media is refreshed, and content is regularly
checked to see if it is still accessible by interested parties.
Complexities arise when one considers the need to preserve both the
bits of the digital objects themselves and the tools with which to
express those bits in an environment that experiences rapid changes
in technology.
For most of the existence of the RFC Series, the digital preservation
process has been fairly simple, focusing on bitstream preservation
and relying on paper copies of digital files.
The current archival process for the RFC Series is as follows:
1. Acquisition: The RFC Editor database is updated to indicate an
I-D has been approved for publication. At this point, the
document is taken through the editorial process on the way to
publication [RFC-PUB].
2. Ingestion: The RFC is added to the archive at the time of
publication.
3. Metadata creation: The details regarding an RFC, including RFC
number, author, title, abstract, etc., are created at time of
publication. Additional metadata in the form of status and
errata can be added or changed at any time, following the process
of the originating document stream.
4. Bitstream preservation: This part of the process is handled as
part of the IT system administration; all servers, disks, and
backup technology are refreshed on a regular cycle.
5. Content preservation: All RFCs since January 2010 have been
printed out on standard office paper at time of publication, and
the electronic files have been preserved on disk and in backups
with no particular focus on preserving the entire computing
environment used to create the electronic documents. Most RFCs
prior to January 2010 are also available on paper, but there are
gaps in the record and issues of ownership around the paper
copies before that date.
When the format for RFCs transitions from plain-text, ASCII-only
files to an XML format with multiple outputs, the overall archival
process will become more complex. Additional metadata and some (or
possibly all) of the computing environment may need to be added to
the archive.
2. Updating Policy and Procedure
RFCs are created and published as digital objects. Unlike paper-
based publications, a digital collection requires a focus on
retaining the details of the technology as well as retaining the
object itself. Specifically, a digital archive needs to:
o consider the inherent instability of digital media,
o plan for a relatively short path to technological obsolescence,
o schedule regular media updates,
o apply predefined criteria for technology evaluation, and
o ensure the continued authenticity and integrity of documents
through any changes in technology.
As the custodian and canonical source of RFCs and associated errata,
the RFC Editor must consider how to ensure the availability and
integrity of this document series far into the future and determine
whether the focus must be on bitstream preservation, content
preservation, or both.
The RFC Editor has several advantages in acting as the digital
archivist for the Series. Since the RFC Editor is the publisher as
well as the archivist, the RFC Editor controls the format of the
material and the process for adding that material to an archive and
can add any additional metadata considered necessary. External
material, while a major consideration for more general archives, is
no longer accepted by the RFC Editor. (See "Internet Archaeology:
Documents from Early History" [RFC-HISTORY] for the list of non-RFC
digital objects held by the RFC Editor.)
This document describes several different preservation models that
may fit the needs of the Series and raises several points for
community consideration. Specifically, this document covers
information on:
o Acquisition of documents
o Ingestion of documents
o Metadata and document registration
o Normalization and standardization of canonical file structure and
format
o Transformation/migration to current publication formats
o Content and computing environment preservation
o System parameters
o Financial impact
2.1. Acquisition of Documents
The acquisition process for documents intended for the archive starts
with the submission of an approved I-D for publication. During the
editorial process, information such as the document metadata is
finalized prior to publication. However, the initial I-D as
submitted and the RFC produced from it do not formally enter the
archive until the time of publication, which is considered the point
of ingestion from an archival perspective.
2.2. Ingestion of Documents
Once an RFC is published, the canonical format is considered
immutable. At this point, the RFC Production Center, one of the
internal roles within the RFC Editor, assigns the document metadata
that an archivist needs to identify the unique object.
In the case of RFCs, the metadata assigned to a document at the time
of publication includes:
o the RFC number
o ISSN
o publication date
o Digital Object Identifier (DOI)
Additional metadata, such as author name, is assigned earlier in the
document creation process, but it is subject to change up to the
point of publication. More information on metadata is available in
Section 2.3 ("Metadata and Document Registration").
In terms of deciding what to accept in the archive -- a major
question for most archives and yet a simple one for the RFC Series --
the RFC Editor accepts documents that are approved for publication by
the approving body of one of the document streams: the IETF, IAB,
IRTF, or Independent Submission streams [RFC7841]. Each document
stream has defined processes on when and how I-Ds are approved and
submitted to the RFC Editor for publication. The RFC Editor does not
select documents for publication and archiving; the RFC Editor edits
and publishes documents approved for publication by the document
streams.
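The acquisition and ingestion steps described in Sections 2.1 and 2.2
can be sketched as a small state model. All class, field, and
function names below are hypothetical, and the DOI pattern is an
assumption based on the form used for RFC DOIs; the sketch only
illustrates that acceptance depends on the document stream and that
publication-time metadata is assigned once.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional

# Document streams named in this section [RFC7841].
STREAMS = {"IETF", "IAB", "IRTF", "Independent Submission"}


class State(Enum):
    ACQUIRED = "acquired"   # approved I-D accepted for editing
    INGESTED = "ingested"   # RFC published and added to the archive


@dataclass
class ArchiveEntry:
    draft_name: str                          # I-D draft string
    stream: str
    state: State = State.ACQUIRED
    rfc_number: Optional[int] = None
    publication_date: Optional[date] = None
    doi: Optional[str] = None


def acquire(draft_name: str, stream: str) -> ArchiveEntry:
    """Accept an I-D only if it comes from a known document stream."""
    if stream not in STREAMS:
        raise ValueError(f"unknown document stream: {stream!r}")
    return ArchiveEntry(draft_name, stream)


def ingest(entry: ArchiveEntry, rfc_number: int, published: date) -> ArchiveEntry:
    """Assign publication-time metadata; ingestion happens exactly once."""
    if entry.state is State.INGESTED:
        raise RuntimeError("a published RFC is immutable")
    entry.rfc_number = rfc_number
    entry.publication_date = published
    # Assumption: DOIs take the form 10.17487/RFCnnnn.
    entry.doi = f"10.17487/RFC{rfc_number:04d}"
    entry.state = State.INGESTED
    return entry
```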
The RFC Editor holds no copyright on I-Ds or RFCs. As per the IETF
Trust Legal Provisions [TLP], the copyright for RFCs is held by the
authors and the IETF Trust. At any point in time, the current
entities providing RFC Editor services must be able to release the
archive of RFCs to the IETF Trust.
Note: The RFC Editor is currently responsible only for RFCs.
Associated datasets and other research data are not considered within
the RFC Editor's mandate at this time; therefore, this document does
not cover the archival requirements of such datasets.
2.3. Metadata and Document Registration
Metadata is data about data. In the field of digital archiving, this
is the data that clearly identifies every aspect of a document, from
its identifier (i.e., the RFC number and the I-D draft string) to the
size and file format of the document and more. Metadata is stored in
a central registry that records information on exactly what is being
preserved and where it is located, information on authenticity and
provenance, and details on the hardware and/or software needed to
view or create the documents.
The RFC Editor maintains this registry in the form of a database that
includes all metadata available for documents being edited and for
published RFCs. This database feeds the search engine on the RFC
Editor website and the info pages available for every RFC (e.g.,
http://www.rfc-editor.org/info/rfc####).
Following is the current list of metadata presented in the RFC info
pages:
o RFC number
o Canonical URI
o Title
o Status
o Updates (if applicable)
o Updated by (if applicable)
o Obsoletes (if applicable)
o Obsoleted by (if applicable)
o Authors
o Stream
o Abstract
o Content-Type
o Character Set
o ISSN
o Publication date
o Digital Object Identifier (DOI)
The following metadata will be added in the future:
o Publication format URIs
Info pages also include links to errata, IPR searches, and both
plain-text and XML citation files.
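The metadata listed above can be pictured as a single record per RFC.
The following Python sketch is illustrative only: the class name,
field names, and JSON serialization are assumptions, and the actual
schema of the RFC Editor database is not described in this document.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class InfoPageRecord:
    rfc_number: int
    canonical_uri: str
    title: str
    status: str
    authors: list[str]
    stream: str
    abstract: str
    content_type: str
    character_set: str
    issn: str
    publication_date: str
    doi: str
    # Relationship fields are present only "if applicable".
    updates: list[int] = field(default_factory=list)
    updated_by: list[int] = field(default_factory=list)
    obsoletes: list[int] = field(default_factory=list)
    obsoleted_by: list[int] = field(default_factory=list)
    # Planned addition: one URI per publication format.
    publication_format_uris: dict[str, str] = field(default_factory=dict)

    @property
    def info_page_uri(self) -> str:
        # URI pattern quoted in this section.
        return f"http://www.rfc-editor.org/info/rfc{self.rfc_number}"

    def to_json(self) -> str:
        """Serialize the record so it can be stored beside the document."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

Keeping the relationship fields as empty lists rather than omitting
them makes records uniform, which simplifies later migration of the
registry itself.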
In terms of best practice, all documents used as normative references
within an RFC would also be stored in the archive. While this is
done automatically when the normative reference is another RFC (the
usual case), retaining a copy of third-party documents is considered
out of scope for the RFC Editor. As the digital archive industry
stabilizes, services such as Perma.cc [PERMACC] may be a reasonable
compromise. These services provide a permanent URI and image capture
of online documents, with a goal of buffering against URI and online
availability changes.
2.4. Normalization and Standardization of Canonical File Structure and
Format
The normalization process is perhaps the most technically critical
part of digital archiving. The purpose is content preservation --
making sure the data accepted for archiving are in the most stable
and easily accessed formats possible for the long-term future and
require the least amount of re-engineering and emulation of
environments in order to view the document in the future.
Normalization is about enabling long-term access to the information
within a document.
Over the history of the RFC Series, documents have been submitted for
publication in a variety of formats, including paper for the earliest
RFCs. Today, the majority of RFCs are available in both a canonical
plain-text format and PDF format. For exceptions, see the RFC Online
Project [RFC-ONLINE].
Currently, all RFCs are printed out to paper and stored at time of
publication. This has been a reasonable backup plan for several
decades. With few of the features one might expect from a digital
document format (such as links, metadata within the document, and
line drawings), plain-text files do not lose much, if any,
information when printed out to paper. However, as the published
formats change (see RFC 6949), printing to paper provides less value
as much of the metadata that is an intrinsic yet invisible part of
the rendered document will be lost in such printing. With that in
mind, the focus needs to change to preserving the new file formats
electronically.
While each RFC today is printed to paper and all electronic versions
stored on multiple hard drives, no particular effort is made to
ensure copies of the software used to render or read the canonical
plain-text RFC are also archived. The RFC Editor has several choices
on how to adapt to the need to archive a more complex set of data and
follow best practice as defined by the digital archive community:
o a simplified bitstream preservation model that focuses on standard
"best effort" data-retention practices, which rely on backups,
upgrades, and regular equipment change to preserve the data. This
model assumes that emulators may be built when needed if the
formats used go out of common use (a significant part of the model
currently followed by the RFC Editor).
o a content preservation model that focuses on one publication
format as the version most likely to be viewable and provide all
necessary metadata in the future. This is a viable option
considering that PDF/A-3 [PDF], one of the intended publication
formats, was designed for this type of archiving.
o a complex bitstream and content preservation model that focuses on
archiving the canonical XML and the entire computing environment
required to create, view and render all outputs from that file.
This is the "best practice" from an archivist's perspective.
Those options are listed in order of least to greatest complexity and
expense. More detail on each option is described below.
2.4.1. 'Best Effort' Data Retention
When dealing with very simple data structures such as plain-text,
ASCII-only files, the experience of the RFC Series suggests that for
the last few decades, hardware and operating system changes have had
minimal impact on the document files being stored. While a complete
failure of an operating system migration corrupted the dataset in the
past, that situation represents a somewhat different problem than the
tools themselves changing such that plain-text files are not easily
read with existing technology. Given that the basic plain-text
format and ASCII encoding remain in common use, the standard
protections against file corruption and data loss, such as disk
mirroring, off-site backups, and periodic restoration testing, will
continue to provide access to the entirety of the RFC Series for the
foreseeable future. As has been pointed out, both in this document
and in broader community discussion, those protections are not
sufficient for complex formats such as XML, HTML, and PDF, or for the
proprietary formats offered by today's large IT companies. The risk
of technological
change resulting in the file formats mentioned being deprecated or
changed without backwards compatibility is fairly high when looking
decades or centuries into the future.
It is recommended that this model of archiving the RFC Series cease
to be the primary model after the plain-text, ASCII-only format is no
longer the canonical format. Best effort data retention is a
necessary but not sufficient level of effort for preserving a digital
archive. For more guidance on how to define best effort data
retention, the section on "Media and Formats, Summary
Recommendations" in the 2009 version of the Digital Preservation
Handbook [DPC2009] provides useful and concrete information.
2.4.2. Single Format for Archival Purposes
If preserving the information described by a document, rather than
the document itself, is the primary purpose of an archive, then
focusing efforts on a single file format is a reasonable option.
Some well-supported archival tooling projects follow this route, such
as Archivematica [ARCHIVEMATICA]. By selecting a feature-rich yet
fundamentally stable file format for documents, an organization may
avoid expensive whole-environment reconstruction in order to view the
document. The PDF/A formats were designed to be an archival format
for electronic documents, and PDF/A-3 is one of the options intended
for publication as the RFC Series moves from a plain-text canonical
format to an XML canonical format with multiple publication formats.
A PDF/A-3 file can be produced that embeds the XML from which the
PDF/A-3 file was created; this allows for both original and rendered
document validation if one has the correct tools available to see the
source of the PDF/A-3 file [RFC7995]. The XML is not otherwise
visible when viewing the PDF/A-3 file through typical PDF reader
software.
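The validation mentioned above can be sketched as a comparison
between the canonical XML and the XML recovered from the PDF/A-3
file. Extracting the embedded file depends on PDF tooling and is not
shown here; the hypothetical function below takes the extracted bytes
as input, and its name and the use of SHA-256 are assumptions.

```python
import hashlib
import xml.etree.ElementTree as ET


def embedded_xml_matches(canonical_xml: bytes, embedded_xml: bytes) -> bool:
    """True if the embedded XML is well formed and bit-identical
    to the canonical file."""
    try:
        ET.fromstring(embedded_xml)      # well-formedness check only
    except ET.ParseError:
        return False
    return (hashlib.sha256(canonical_xml).digest()
            == hashlib.sha256(embedded_xml).digest())
```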
When looking at the need to archive RFCs in a resource-limited
environment, a content-preservation-only model has merit, but it is
not without risks. First, PDF/A-3 will not be the canonical format;
it is intended to be one of the rendered outputs. It may contain
rendering bugs that were not intended to be in the document. Second,
while the various PDF/A formats were designed to be archival, they
have not been put to the test of time to determine if they will
actually live up to the design goals.
This is a valid option to consider, but the risks, priorities, and
costs must be discussed by the community before a decision is made to
follow this path. The best option may be to combine this with one of
the other methods of archiving described in this document to help
minimize both risk and cost.
2.4.3. Holistic Archiving of the Computing Environment
Preserving everything published by the RFC Editor in order to have a
permanent record of information, standards, and best practice is
arguably the whole point of being an archival series. One can argue
that it is not only about the information described in an RFC, it is
also about supporting Intellectual Property Rights (IPR) and
retaining the history of the Internet. In following this model,
however, one must consider the complexity of the archival environment
as matching, and possibly exceeding, the complexity of the file
formats being preserved.
Consider a future where XML has been obsoleted for half a century,
HTML5 was a format used three to four human generations ago, and PDF/
A-3 is no longer supported by any existing company's reading
software. For RFCs that were produced with XML as their canonical
format, an archive must not only hold the data, it must also hold the
entire computing environment that allows the data to be rendered and
viewed. Operating systems and the hardware on which they can run,
each major version of each piece of software used or relied upon
during the publication of an RFC, and browsers and readers for HTML,
PDF, and any other publication format must all be preserved in some
fashion.
This is considered best practice when archiving digital documents.
This is also the most expensive method, and the cost only increases
over time as more and more instances of the computing environment
must be preserved over the lifetime of the Series.
This is a valid option to consider, but the sheer scope of resources
required suggests that this must be discussed by the community before
a decision is made. Pursuing this may require an entirely different
paradigm for the RFC Editor from what has been considered in the
past; expanding the scope and resources for the RFC Editor, finding a
third party to take over the responsibilities of archiving, or some
other option may be necessary.
2.5. Transformation/Migration to Current Publication Formats
Because normalization is a complex subject, it is important to
consider how to mitigate the risk of failure of the normalization
process.
The RFC Editor is responsible for making RFCs available to the
Internet community. The canonical version of an RFC does not change
once published; any formats officially rendered from the canonical
version, however, may change. One way to mitigate the need to
preserve the entire computing environment for an RFC, including web
browsers and PDF readers, would be to take advantage of the non-
canonical nature of the publication formats and re-render them from
the canonical source at the point that browser or reader technology
has changed sufficiently to make RFCs largely unavailable to 'modern'
tools.
For example, the RFC Editor may develop the practice of annually
reviewing the tools needed to view the publication formats created by
the RFC Editor to determine whether or not the current common and
popular reader technologies (e.g., web browsers, PDF viewers,
e-readers) can view the existing publication formats. During that
review, the RFC Editor would work with the community to determine if
the current publication formats meet the needs of the community and
whether any should be retired or added to improve the availability of
information to the community at that time.
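The annual review described above amounts to checking each
publication format against the reader technologies currently in
common use. A minimal Python sketch follows; the function name, the
format labels, and the rule that a format with no supporting reader
is flagged for re-rendering are assumptions made for illustration.

```python
def review_formats(
    published: set[str],
    reader_support: dict[str, set[str]],
) -> dict[str, list[str]]:
    """Classify each publication format by whether any current
    reader can still open it."""
    viewable, rerender = [], []
    for fmt in sorted(published):
        readers = reader_support.get(fmt, set())
        (viewable if readers else rerender).append(fmt)
    return {"viewable": viewable, "re-render": rerender}
```

Formats reported under "re-render" would be regenerated from the
canonical source, which itself never changes.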
2.6. System Parameters
While the industry best practice on the backup and restoration of
data is not sufficient as a long-term archival solution, it is still
a necessary part of keeping the Series available now and into the
future. In the past, nearly 800 RFCs had to be manually transcribed
from paper back to electronic format due to a failed server migration
and insufficient backups.
The underlying servers hosting the tools, database, RFCs, and errata
are the physical link in the archival environment. While such
systems cannot and should not remain static and unchanging, there
must be clear documentation regarding the environment, in particular,
the storage, backups, and recovery processes for all RFC-related
material. The documentation must include information on the refresh
cycle for the physical storage and backup media and describe a
regular cycle of data restoration and/or migration testing.
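As an illustration only, a restoration test of the kind described
above could compare checksums of the live archive against a copy
restored from backup. The following is a minimal sketch, not a
procedure used by the RFC Editor: the function names and the choice
of SHA-256 are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(
    expected: dict[str, str], restored: dict[str, str]
) -> dict[str, list[str]]:
    """Report files that are missing, unexpected, or altered after a restore."""
    shared = set(expected) & set(restored)
    return {
        "missing": sorted(set(expected) - set(restored)),
        "unexpected": sorted(set(restored) - set(expected)),
        "altered": sorted(n for n in shared if expected[n] != restored[n]),
    }
```

A test run would build one manifest from the live archive and one
from the restored copy, then treat any non-empty list in the report
as a failed restoration. A matching manifest shows only that the bits
survived; it does not show that the files can still be rendered.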
2.7. Financial Impact
Having a policy regarding digital archiving provides input into the
budget process. The main costs associated with digital archives come
from the complexity and quantity of the material being archived, as
described in Section 2.4 on normalization.
Estimating potential costs and providing figures are outside of the
scope of this document, but it should be noted that costs are a major
factor when determining what level of archival practice an
organization will follow.
For more information on potential business plans and cost modeling
for digital preservation, see the "Business cases, benefits, costs,
and impact" section of the Digital Preservation Handbook [DPC].
3. Recommendations
Given the need to balance cost and complexity with retention of
information for historic, legal, and informational purposes,
preservation efforts should focus on the XML canonical format files,
the PDF/A-3 format files, the xml2rfc tool and its documentation, and
at least two PDF reader applications capable of extracting the
embedded XML. Care should be taken that the software being included
in this archive has a provision for free copies for backup or
archival purposes. All other formats and the overall computing
environment should be stored as described in "best effort" data
retention (Section 2.4.1), which should in turn be described in the
appropriate vendor contract for the RFC Publisher.
Particular preservation efforts should be made by:
o choosing a format designed for archiving RFCs (PDF/A-3 as
indicated by [RFC7995])
o embedding the canonical XML format within the PDF/A-3 file for
RFCs
o retaining a copy of the plain-text or XML file submitted for
approved I-Ds
o retaining all major versions of the tools and their associated
documentation used to acquire and ingest an RFC
o retaining the final XML file as well as the PDF/A-3 file with the
embedded XML
o retaining at least two software reader applications to ensure the
PDF/A-3 and XML files can be viewed in the future
o partnering with other digital archives around the world to mirror
copies of the target data
Printing each RFC to paper upon publication is no longer reasonable,
given the need to control costs and to focus the archiving effort on
the entire content of an RFC, including the metadata and other
features embedded within each RFC published in more than just plain
text. Proper data storage and mirrored copies of RFCs provide more
efficient and effective protection in case of catastrophic failure of
the existing archive of material.
Particular focus should be given to finding partners that specialize
in digital preservation to ingest RFCs. Ideally, they will ingest
all material associated with an RFC, including all metadata, digital
signatures, and the approved I-D that was submitted to the RFC
Editor. The possibilities and options should be discussed with each
archival partner; at minimum, they must ingest copies of RFCs as they
are published, with the basic metadata associated with each document.
Preservation efforts should be reviewed and validated through a
biennial audit that will verify that the targeted content and all its
associated metadata can be read with existing tools. The full
process from acquisition to ingestion should be reviewed to ensure
that best current practice is being followed from the perspective of
the digital archive community. Since the overall model for the
digital archive maintained by the RFC Editor follows the OAIS
reference model, the associated audit guidelines should also be
followed. While the RFC Editor does not seek to be recognized as
'OAIS-compliant' at this time, use of the ISO standard "Space data
and information transfer systems -- Audit and certification of
trustworthy digital repositories" [ISO16363] would provide a solid,
accepted method for structuring an audit for this digital archive.
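One small part of such an audit, checking that archived XML files can
still be read with existing tools, might be automated along the
following lines. This is a hedged sketch under stated assumptions:
the function name and directory layout are hypothetical, and parsing
with a general-purpose XML library confirms only that a file is
well-formed. It does not validate the RFC XML vocabulary, the PDF/A-3
files, or the metadata, and it is not a substitute for an audit
structured around [ISO16363].

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def audit_xml_files(archive_root: Path) -> dict[str, str]:
    """Parse every .xml file under archive_root.

    Returns a mapping from relative path to parser error message for
    each file that could not be parsed; an empty mapping means every
    XML file found was well-formed.
    """
    failures: dict[str, str] = {}
    for path in sorted(archive_root.rglob("*.xml")):
        try:
            ET.parse(path)
        except ET.ParseError as err:
            failures[path.relative_to(archive_root).as_posix()] = str(err)
    return failures
```

An auditor could record the returned mapping in the audit report and
investigate each listed file by hand.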
4. Summary
The RFC Series is worth archiving. It contains the history of the
early Internet, as well as some of the key standards for Internet
technology and best practice today. Who knows what the community
will create in the future? There are many ways to preserve the
Series, from relying on preservation of the bits, to focusing on a
single file format, to preserving the entire computing environment.
Each possibility, or any combination of them, involves risks and
requires varying levels of resources. The goal of this document is
to describe the possibilities and associated risks so that the
community can come to an informed decision regarding what it is
willing to see supported far into the future.
5. IANA Considerations
This document does not require any IANA actions.
6. Security Considerations
This document assumes that the origination of RFCs via the RFC Editor
is secure and trusted. With that assumption, the activities
discussed in this document do not affect the security of the
Internet.
7. Informative References
[ARCHIVEMATICA]
"Archivematica", <https://www.archivematica.org/wiki/
Main_Page>.
[DPC] Digital Preservation Coalition, "Digital Preservation
Handbook", 2015, <http://dpconline.org/handbook>.
[DPC2009] Digital Preservation Coalition, "Digital Preservation
Handbook", 2009, <http://www.dpconline.org/docman/digital-
preservation-handbook/304-digital-preservation-handbook-
media-and-formats>.
[ISO14721] International Organization for Standardization, "Space
data and information transfer systems -- Open archival
information system (OAIS) -- Reference model",
ISO 14721:2012, 2012.
[ISO16363] International Organization for Standardization, "Space
data and information transfer systems -- Audit and
certification of trustworthy digital repositories",
ISO 16363:2012, 2012.
[LIFE] Hole, B., "LIFE^3: Predictive Costing of Digital
Preservation", July 2010,
<http://www.life.ac.uk/3/docs/Hole_pasig_v1.pdf>.
[PDF] International Organization for Standardization, "Document
management -- Electronic document file format for long-
term preservation -- Part 3: Use of ISO 32000-1 with
support for embedded files (PDF/A-3)", ISO 19005-3:2012,
2012.
[PERMACC] "Perma.cc", <http://perma.cc/>.
[RFC-HISTORY]
RFC Editor, "Internet Archaeology: Documents from Early
History", <http://www.rfc-editor.org/history.html>.
[RFC-ONLINE]
RFC Editor, "History of RFC Online Project",
<http://www.rfc-editor.org/rfc-online-2000.html>.
[RFC-PUB] RFC Editor, "Publication Process",
<http://www.rfc-editor.org/pubprocess.html>.
[RFC-SERIES]
RFC Editor, "About Us",
<http://www.rfc-editor.org/RFCoverview.html>.
[RFC6635] Kolkman, O., Ed., Halpern, J., Ed., and IAB, "RFC Editor
Model (Version 2)", RFC 6635, DOI 10.17487/RFC6635, June
2012, <http://www.rfc-editor.org/info/rfc6635>.
[RFC6949] Flanagan, H. and N. Brownlee, "RFC Series Format
Requirements and Future Development", RFC 6949,
DOI 10.17487/RFC6949, May 2013,
<http://www.rfc-editor.org/info/rfc6949>.
[RFC7841] Halpern, J., Ed., Daigle, L., Ed., and O. Kolkman, Ed.,
"RFC Streams, Headers, and Boilerplates", RFC 7841,
DOI 10.17487/RFC7841, May 2016,
<http://www.rfc-editor.org/info/rfc7841>.
[RFC7995] Hansen, T., Ed., Masinter, L., and M. Hardy, "PDF Format
for RFCs", RFC 7995, DOI 10.17487/RFC7995, December 2016,
<http://www.rfc-editor.org/info/rfc7995>.
[TLP] IETF Trust, "Trust Legal Provisions (TLP)",
<https://trustee.ietf.org/trust-legal-provisions.html>.
[USLOC] LeFurgy, B., "Life Cycle Models for Digital Stewardship",
February 2012,
<http://blogs.loc.gov/digitalpreservation/2012/02/
life-cycle-models-for-digital-stewardship/>.
IAB Members at the Time of Approval
The IAB members at the time this document was approved were (in
alphabetical order):
Jari Arkko
Ralph Droms
Ted Hardie
Joe Hildebrand
Lee Howard
Erik Nordmark
Robert Sparks
Andrew Sullivan
Dave Thaler
Martin Thomson
Brian Trammell
Suzanne Woolf
Author's Address
Heather Flanagan
RFC Editor
Email: rse@rfc-editor.org
URI: http://orcid.org/0000-0002-2647-2220