Diffstat (limited to 'doc/rfc/rfc8153.txt')
-rw-r--r-- doc/rfc/rfc8153.txt 1011
1 files changed, 1011 insertions, 0 deletions
diff --git a/doc/rfc/rfc8153.txt b/doc/rfc/rfc8153.txt
new file mode 100644
index 0000000..d21e725
--- /dev/null
+++ b/doc/rfc/rfc8153.txt
@@ -0,0 +1,1011 @@
+
+
+
+
+
+
+Internet Architecture Board (IAB) H. Flanagan
+Request for Comments: 8153 RFC Editor
+Category: Informational April 2017
+ISSN: 2070-1721
+
+
+ Digital Preservation Considerations for the RFC Series
+
+Abstract
+
+ The RFC Editor is both the publisher and the archivist for the RFC
+ Series. This document applies specifically to the archivist role of
+ the RFC Editor. It provides guidance on when and how to preserve
+ RFCs and describes the tools required to view or re-create RFCs as
+ necessary. This document also highlights gaps in the current process
+ and suggests compromises to balance cost with best practice.
+
+Status of This Memo
+
+ This document is not an Internet Standards Track specification; it is
+ published for informational purposes.
+
+ This document is a product of the Internet Architecture Board (IAB)
+ and represents information that the IAB has deemed valuable to
+ provide for permanent record. It represents the consensus of the
+ Internet Architecture Board (IAB). Documents approved for
+ publication by the IAB are not a candidate for any level of Internet
+ Standard; see Section 2 of RFC 7841.
+
+ Information about the current status of this document, any errata,
+ and how to provide feedback on it may be obtained at
+ http://www.rfc-editor.org/info/rfc8153.
+
+Copyright Notice
+
+ Copyright (c) 2017 IETF Trust and the persons identified as the
+ document authors. All rights reserved.
+
+ This document is subject to BCP 78 and the IETF Trust's Legal
+ Provisions Relating to IETF Documents
+ (http://trustee.ietf.org/license-info) in effect on the date of
+ publication of this document. Please review these documents
+ carefully, as they describe your rights and restrictions with respect
+ to this document.
+
+
+
+
+
+
+
+Flanagan Informational [Page 1]
+
+RFC 8153 Digital Preservation April 2017
+
+
+Table of Contents
+
+ 1. Introduction ....................................................2
+ 1.1. Terminology ................................................4
+ 1.2. Life Cycle of Digital Preservation .........................4
+ 2. Updating Policy and Procedure ...................................5
+ 2.1. Acquisition of Documents ...................................6
+ 2.2. Ingestion of Documents .....................................6
+ 2.3. Metadata and Document Registration .........................7
+ 2.4. Normalization and Standardization of Canonical File
+ Structure and Format .......................................9
+ 2.4.1. 'Best Effort' Data Retention .......................10
+ 2.4.2. Single Format for Archival Purposes ................11
+ 2.4.3. Holistic Archiving of the Computing Environment ....12
+ 2.5. Transformation/Migration to Current Publication Formats ...12
+ 2.6. System Parameters .........................................13
+ 2.7. Financial Impact ..........................................13
+ 3. Recommendations ................................................14
+ 4. Summary ........................................................15
+ 5. IANA Considerations ............................................15
+ 6. Security Considerations ........................................15
+ 7. Informative References .........................................16
+ IAB Members at the Time of Approval ...............................18
+ Author's Address ..................................................18
+
+1. Introduction
+
+ The RFC Editor is both the publisher and the archivist for the RFC
+ Series, a series of technical specifications and policy documents
+ that includes foundational Internet standards [RFC6635] [RFC-SERIES].
+ The goal of the RFC Editor is to produce clear, consistent, and
+ readable documents for the Internet community. Over time, the RFC
+ Editor will use as many modern features, such as hyperlinks and
+ content markup, within the document as necessary to convey the
+ information the authors intended for their audience. As the
+ archivist, however, the main goal is to preserve both the information
+ described and the documents themselves for the indefinite future. To
+ meet both of these goals, the RFC Editor must find the necessary
+ balance between the publication needs of today and the archival needs
+ of tomorrow, while acknowledging a finite set of resources to
+ complete both aspects of the RFC Editor function.
+
+ While many files are created during the editing process, this
+ document focuses on the archival needs of the Internet-Drafts (I-Ds)
+ that were approved for publication and the RFCs that resulted from
+ these I-Ds; I-Ds before they are approved for publication by the
+ appropriate stream-approving body are out of scope.
+
+
+
+
+Flanagan Informational [Page 2]
+
+RFC 8153 Digital Preservation April 2017
+
+
+ To summarize, the key areas of tension between the roles of publisher
+ and archivist are:
+
+ o the desire of the publisher to meet the needs expressed by authors
+ who want to use the latest technology (e.g., vector graphics, live
+ links, and a rich set of metadata) within their documents; and
+
+ o the desire of the archivist to support only the simplest format
+ for documents possible -- currently held by the Series to be
+ plain-text, ASCII-only documents -- so that the tools needed to
+ view the documents are equally simple and resistant to changes in
+ technology, resulting in a set of documents that will be easier to
+ archive for at least the next several decades, if not centuries.
+
+ Through most of the history of the RFC Series, the file format for
+ RFCs has been plain text with an ASCII-only character set. This
+ choice offered the simplest format likely to remain available to the
+ largest number of consumers and the format most likely to be
+ resistant to changes in technology over time. Increasingly, however,
+ consumers and authors are requesting additional features that would
+ allow for easy reading on a wider array of devices while retaining
+ all the metadata authors intended in their documents. In 2013, RFC
+ 6949 ("RFC Series Format Requirements and Future Development")
+ captured the high-level requirements for the Series; the fundamental
+ issue was that plain-text, ASCII-only documents no longer meet the
+ needs of the communities interested in using and producing RFCs
+ [RFC6949].
+
+ The assertion that plain-text, ASCII-only documents no longer meet
+ the needs of the community suggests that the simple archival process
+ maintained by the RFC Editor is also no longer sufficient. More
+ complex tools and file formats require a more complex process to
+ ensure that RFCs can be read and rendered far into the future. This
+ document describes the considerations that must inform any changes in
+ policy and procedure, and it describes a model for the RFC Series to
+ follow when additional formats beyond plain-text, ASCII-only RFCs are
+ published. The functional model that provides the framework for the
+ archival process described in this document was derived from the ISO
+ Open Archival Information System (OAIS) reference model, defined in
+ "Space data and information transfer systems -- Open archival
+ information system (OAIS) -- Reference model" [ISO14721].
+
+
+
+
+
+
+
+
+
+
+Flanagan Informational [Page 3]
+
+RFC 8153 Digital Preservation April 2017
+
+
+1.1. Terminology
+
+ Acquisition: The point at which a document is accepted by the RFC
+ Editor for future inclusion into the archive.
+
+ Ingestion: The point at which a digital object is assigned all
+ necessary metadata to describe the object and its contents and is
+ added to the archive.
+
+ Bitstream preservation: The process of storing and maintaining
+ digital objects over time, ensuring that there is no loss or
+ corruption of the bits making up those objects.
+
+ Content preservation: The retention of the ability to read, listen,
+ or watch a digital file in perpetuity. Content preservation is not
+ about the bits being stored; it is about being able to access and
+ present those bits to the user.
+
+1.2. Life Cycle of Digital Preservation
+
+ The basic process for preserving digital information has been
+ described by a variety of organizations. From the Life cycle
+ Information For E-Literature (LIFE) project [LIFE] in the United
+ Kingdom to the ongoing digital preservation work in the U.S. Library
+ of Congress [USLOC], the basic digital preservation process is
+ straightforward. Documents are acquired and processed, metadata is
+ recorded, physical media is refreshed, and content is regularly
+ checked to see if it is still accessible by interested parties.
+ Complexities arise when one considers the need to preserve both the
+ bits of the digital objects themselves and the tools with which to
+ express those bits in an environment that experiences rapid changes
+ in technology.
+
+ For most of the existence of the RFC Series, the digital preservation
+ process has been fairly simple, focusing on bitstream preservation
+ and relying on paper copies of digital files.
+
+ The current archival process for the RFC Series is as follows:
+
+ 1. Acquisition: The RFC Editor database is updated to indicate an
+ I-D has been approved for publication. At this point, the
+ document is taken through the editorial process on the way to
+ publication [RFC-PUB].
+
+ 2. Ingestion: The RFC is added to the archive at the time of
+ publication.
+
+
+
+
+
+Flanagan Informational [Page 4]
+
+RFC 8153 Digital Preservation April 2017
+
+
+ 3. Metadata creation: The details regarding an RFC, including RFC
+ number, author, title, abstract, etc., are created at time of
+ publication. Additional metadata in the form of status and
+ errata can be added or changed at any time, following the process
+ of the originating document stream.
+
+ 4. Bitstream preservation: This part of the process is handled as
+ part of the IT system administration; all servers, disks, and
+ backup technology are refreshed on a regular cycle.
+
+ 5. Content preservation: All RFCs since January 2010 have been
+ printed out on standard office paper at time of publication, and
+ the electronic files have been preserved on disk and in backups
+ with no particular focus on preserving the entire computing
+ environment used to create the electronic documents. Most RFCs
+ prior to January 2010 are also available on paper, but there are
+ gaps in the record and issues of ownership around the paper
+ copies before that date.
+
+ When the format for RFCs transitions from plain-text, ASCII-only
+ files to an XML format with multiple outputs, the overall archival
+ process will become more complex. Additional metadata and some (or
+ possibly all) of the computing environment may need to be added to
+ the archive.
+
+2. Updating Policy and Procedure
+
+ RFCs are created and published as digital objects. Unlike paper-
+ based publications, a digital collection requires a focus on
+ retaining the details of the technology as well as retaining the
+ object itself. Specifically, a digital archive needs to:
+
+ o consider the inherent instability of digital media,
+
+ o plan for a relatively short path to technological obsolescence,
+
+ o schedule regular media updates,
+
+ o apply predefined criteria for technology evaluation, and
+
+ o ensure the continued authenticity and integrity of documents
+ through any changes in technology.
+
+ As the custodian and canonical source of RFCs and associated errata,
+ the RFC Editor must consider how to ensure the availability and
+ integrity of this document series far into the future and determine
+ whether the focus must be on bitstream preservation, content
+ preservation, or both.
+
+
+
+Flanagan Informational [Page 5]
+
+RFC 8153 Digital Preservation April 2017
+
+
+ The RFC Editor has several advantages in acting as the digital
+ archivist for the Series. Since the RFC Editor is the publisher as
+ well as the archivist, the RFC Editor controls the format of the
+ material and the process for adding that material to an archive and
+ can add any additional metadata considered necessary. External
+ material, while a major consideration for more general archives, is
+ no longer accepted by the RFC Editor. (See "Internet Archaeology:
+ Documents from Early History" [RFC-HISTORY] for the list of non-RFC
+ digital objects held by the RFC Editor.)
+
+ This document describes several different preservation models that
+ may fit the needs of the Series and raises several points for
+ community consideration. Specifically, this document covers
+ information on:
+
+ o Acquisition of documents
+
+ o Ingestion of documents
+
+ o Metadata and document registration
+
+ o Normalization and standardization of canonical file structure and
+ format
+
+ o Transformation/migration to current publication formats
+
+ o Content and computing environment preservation
+
+ o System parameters
+
+ o Financial impact
+
+2.1. Acquisition of Documents
+
+ The acquisition process for documents intended for the archive starts
+ with the submission of an approved I-D for publication. During the
+ editorial process, information such as the document metadata is
+ finalized prior to publication. However, the initial I-D as
+ submitted and the RFC produced from it do not formally enter the
+ archive until the time of publication, which is considered the point
+ of ingestion from an archival perspective.
+
+2.2. Ingestion of Documents
+
+ Once an RFC is published, the canonical format is considered
+ immutable. At this point, the RFC Production Center, one of the
+ internal roles within the RFC Editor, assigns the document metadata
+ that an archivist needs to identify the unique object.
+
+
+
+Flanagan Informational [Page 6]
+
+RFC 8153 Digital Preservation April 2017
+
+
+ In the case of RFCs, the metadata assigned to a document at the time
+ of publication includes:
+
+ o the RFC number
+
+ o ISSN
+
+ o publication date
+
+ o Digital Object Identifier (DOI)
+
+ Additional metadata, such as author name, is assigned earlier in the
+ document creation process, but it is subject to change up to the
+ point of publication. More information on metadata is available in
+ Section 2.3 ("Metadata and Document Registration").
+
+ In terms of deciding what to accept in the archive -- a major
+ question for most archives and yet a simple one for the RFC Series --
+ the RFC Editor accepts documents that are approved for publication by
+ the approving body of one of the document streams: the IETF, IAB,
+ IRTF, or Independent Submission streams [RFC7841]. Each document
+ stream has defined processes on when and how I-Ds are approved and
+ submitted to the RFC Editor for publication. The RFC Editor does not
+ select documents for publication and archiving; the RFC Editor edits
+ and publishes documents approved for publication by the document
+ streams.
+
+ The RFC Editor holds no copyright on I-Ds or RFCs. As per the IETF
+ Trust Legal Provisions [TLP], the copyright for RFCs is held by the
+ authors and the IETF Trust. At any point in time, the current
+ entities providing RFC Editor services must be able to release the
+ archive of RFCs to the IETF Trust.
+
+ Note: The RFC Editor is currently responsible only for RFCs.
+ Associated datasets and other research data are not considered
+ within the RFC Editor's mandate at this time; therefore, this
+ document does not cover the archival requirements of such
+ datasets.
+
+2.3. Metadata and Document Registration
+
+ Metadata is data about data. In the field of digital archiving, this
+ is the data that clearly identifies every aspect of a document, from
+ its identifier (i.e., the RFC number and the I-D draft string) to the
+ size and file format of the document and more. Metadata is stored in
+ a central registry that records information on exactly what is being
+
+
+
+
+
+Flanagan Informational [Page 7]
+
+RFC 8153 Digital Preservation April 2017
+
+
+ preserved and where it is located, information on authenticity and
+ provenance, and details on the hardware and/or software needed to
+ view or create the documents.
+
+ The RFC Editor maintains this registry in the form of a database that
+ includes all metadata available for documents being edited and for
+ published RFCs. This database feeds the search engine on the RFC
+ Editor website and the info pages available for every RFC (e.g.,
+ http://www.rfc-editor.org/info/rfc####).
+
+ Following is the current list of metadata presented in the RFC info
+ pages:
+
+ o RFC number
+
+ o Canonical URI
+
+ o Title
+
+ o Status
+
+ o Updates (if applicable)
+
+ o Updated by (if applicable)
+
+ o Obsoletes (if applicable)
+
+ o Obsoleted by (if applicable)
+
+ o Authors
+
+ o Stream
+
+ o Abstract
+
+ o Content-Type
+
+ o Character Set
+
+ o ISSN
+
+ o Publication date
+
+ o Digital Object Identifier (DOI)
+
+ The following metadata will be added in the future:
+
+ o Publication format URIs
+
+
+
+Flanagan Informational [Page 8]
+
+RFC 8153 Digital Preservation April 2017
+
+
+ Info pages also include links to errata, IPR searches, and both
+ plain-text and XML citation files.
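+
+ As an illustration only, the info-page fields listed above could be
+ modeled as a single registry record. The following Python sketch is
+ not the RFC Editor's database schema; the class name, the field
+ names, and the example values are assumptions made for this
+ example. Only the info-page URI pattern is taken from this section.
+
```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class RfcInfoRecord:
    """Hypothetical registry record mirroring the info-page metadata list."""

    number: int
    canonical_uri: str
    title: str
    status: str
    authors: Tuple[str, ...]
    stream: str
    abstract: str
    content_type: str
    character_set: str
    issn: str
    publication_date: str
    doi: str
    # Relationship fields stay empty when they are "not applicable".
    updates: Tuple[int, ...] = ()
    updated_by: Tuple[int, ...] = ()
    obsoletes: Tuple[int, ...] = ()
    obsoleted_by: Tuple[int, ...] = ()
    # Planned future field: publication format name -> URI.
    format_uris: Dict[str, str] = field(default_factory=dict)

    @property
    def info_page_uri(self) -> str:
        # Pattern quoted in this section: .../info/rfc####
        return f"http://www.rfc-editor.org/info/rfc{self.number}"


# Illustrative values only; they are not read from the real database.
example = RfcInfoRecord(
    number=8153,
    canonical_uri="https://www.rfc-editor.org/rfc/rfc8153.txt",
    title="Digital Preservation Considerations for the RFC Series",
    status="Informational",
    authors=("H. Flanagan",),
    stream="IAB",
    abstract="The RFC Editor is both the publisher and the archivist "
             "for the RFC Series.",
    content_type="text/plain",
    character_set="US-ASCII",
    issn="2070-1721",
    publication_date="April 2017",
    doi="10.17487/RFC8153",
)
print(example.info_page_uri)
```
+
+ Keeping the relationship fields as empty tuples rather than omitting
+ them lets a registry distinguish "not applicable" from "not yet
+ recorded" if it later needs to.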
+
+ In terms of best practice, all documents used as normative references
+ within an RFC would also be stored in the archive. While this is
+ done automatically when the normative reference is another RFC (the
+ usual case), retaining a copy of third-party documents is considered
+ out of scope for the RFC Editor. As the digital archive industry
+ stabilizes, services such as Perma.cc [PERMACC] may be a reasonable
+ compromise. These services provide a permanent URI and image capture
+ of online documents, with a goal of buffering against URI and online
+ availability changes.
+
+2.4. Normalization and Standardization of Canonical File Structure and
+ Format
+
+ The normalization process is perhaps the most technically critical
+ part of digital archiving. The purpose is content preservation --
+ making sure the data accepted for archiving are in the most stable
+ and easily accessed formats possible for the long-term future and
+ require the least amount of re-engineering and emulation of
+ environments in order to view the document in the future.
+ Normalization is about enabling long-term access to the information
+ within a document.
+
+ Over the history of the RFC Series, documents have been submitted for
+ publication in a variety of formats, including paper for the earliest
+ RFCs. Today, the majority of RFCs are available in both a canonical
+ plain-text format and PDF format. For exceptions, see the RFC Online
+ Project [RFC-ONLINE].
+
+ Currently, all RFCs are printed out to paper and stored at time of
+ publication. This has been a reasonable backup plan for several
+ decades. With few of the features one might expect from a digital
+ document format (such as links, metadata within the document, and
+ line drawings), plain-text files do not lose much, if any,
+ information when printed out to paper. However, as the published
+ formats change (see RFC 6949), printing to paper provides less value
+ as much of the metadata that is an intrinsic yet invisible part of
+ the rendered document will be lost in such printing. With that in
+ mind, the focus needs to change to preserving the new file formats
+ electronically.
+
+ While each RFC today is printed to paper and all electronic versions
+ stored on multiple hard drives, no particular effort is made to
+ ensure copies of the software used to render or read the canonical
+
+
+
+
+
+Flanagan Informational [Page 9]
+
+RFC 8153 Digital Preservation April 2017
+
+
+ plain-text RFC are also archived. The RFC Editor has several choices
+ on how to adapt to the need to archive a more complex set of data and
+ follow best practice as defined by the digital archive community:
+
+ o a simplified bitstream preservation model that focuses on standard
+ "best effort" data-retention practices, which rely on backups,
+ upgrades, and regular equipment change to preserve the data. This
+ model assumes that emulators may be built when needed if the
+ formats used go out of common use (a significant part of the model
+ currently followed by the RFC Editor).
+
+ o a content preservation model that focuses on one publication
+ format as the version most likely to be viewable and provide all
+ necessary metadata in the future. This is a viable option
+ considering that PDF/A-3 [PDF], one of the intended publication
+ formats, was designed for this type of archiving.
+
+ o a complex bitstream and content preservation model that focuses on
+ archiving the canonical XML and the entire computing environment
+ required to create, view, and render all outputs from that file.
+ This is the "best practice" from an archivist's perspective.
+
+ Those options are listed in order of least to greatest complexity and
+ expense. More detail on each option is described below.
+
+2.4.1. 'Best Effort' Data Retention
+
+ When dealing with very simple data structures such as plain-text,
+ ASCII-only files, the experience of the RFC Series suggests that for
+ the last few decades, hardware and operating system changes have had
+ minimal impact on the document files being stored. While a complete
+ failure of an operating system migration corrupted the dataset in the
+ past, that situation represents a somewhat different problem than the
+ tools themselves changing such that plain-text files are not easily
+ read with existing technology. Given that the basic plain-text
+ format and ASCII encoding remain in common use, the standard
+ protections against file corruption and data loss, such as disk
+ mirroring, off-site backups, and periodic restoration testing, will
+ continue to provide access to the entirety of the RFC Series for the
+ foreseeable future. As has been pointed out, both in this document
+ and in broader community discussion, that is not sufficient for
+ complex formats such as XML, HTML, and PDF, or for the proprietary
+ formats offered by today's large IT companies. The risk that
+ technological change will leave these file formats deprecated, or
+ changed without backwards compatibility, is fairly high when looking
+ decades or centuries into the future.
+
+
+
+
+
+Flanagan Informational [Page 10]
+
+RFC 8153 Digital Preservation April 2017
+
+
+ It is recommended that this model of archiving the RFC Series cease
+ to be the primary model after the plain-text, ASCII-only format is no
+ longer the canonical format. Best effort data retention is a
+ necessary but not sufficient level of effort for preserving a digital
+ archive. For more guidance on how to define best effort data
+ retention, the section on "Media and Formats, Summary
+ Recommendations" in the 2009 version of the Digital Preservation
+ Handbook [DPC2009] provides useful and concrete information.
+
+2.4.2. Single Format for Archival Purposes
+
+ If preserving the information described by a document, rather than
+ the document itself, is the primary purpose of an archive, then
+ focusing efforts on a single file format is a reasonable option.
+ Some well-supported archival tooling projects follow this route, such
+ as Archivematica [ARCHIVEMATICA]. By selecting a feature-rich yet
+ fundamentally stable file format for documents, an organization may
+ avoid expensive whole-environment reconstruction in order to view the
+ document. The PDF/A formats were designed as archival formats
+ for electronic documents, and PDF/A-3 is one of the options intended
+ for publication as the RFC Series moves from a plain-text canonical
+ format to an XML canonical format with multiple publication formats.
+ A PDF/A-3 file can be produced that embeds the XML from which the
+ PDF/A-3 file was created; this allows for both original and rendered
+ document validation if one has the correct tools available to see the
+ source of the PDF/A-3 file [RFC7995]. The XML is not otherwise
+ visible when viewing the PDF/A-3 file through typical PDF reader
+ software.
+
+ When looking at the need to archive RFCs in a resource-limited
+ environment, a content-preservation-only model has merit, but it is
+ not without risks. First, PDF/A-3 will not be the canonical format;
+ it is intended to be one of the rendered outputs. It may contain
+ rendering bugs that were not intended to be in the document. Second,
+ while the various PDF/A formats were designed to be archival, they
+ have not been put to the test of time to determine if they will
+ actually live up to the design goals.
+
+ This is a valid option to consider, but the risks, priorities, and
+ costs must be discussed by the community before a decision is made to
+ follow this path. The best option may be to combine this with one of
+ the other methods of archiving described in this document to help
+ minimize both risk and cost.
+
+
+
+
+
+
+
+
+Flanagan Informational [Page 11]
+
+RFC 8153 Digital Preservation April 2017
+
+
+2.4.3. Holistic Archiving of the Computing Environment
+
+ Preserving everything published by the RFC Editor in order to have a
+ permanent record of information, standards, and best practice is
+ arguably the whole point of being an archival series. One can argue
+ that it is not only about the information described in an RFC; it is
+ also about supporting Intellectual Property Rights (IPR) and
+ retaining the history of the Internet. In following this model,
+ however, one must consider the complexity of the archival environment
+ as matching, and possibly exceeding, the complexity of the file
+ formats being preserved.
+
+ Consider a future where XML has been obsoleted for half a century,
+ HTML5 was a format used three to four human generations ago, and PDF/
+ A-3 is no longer supported by any existing company's reading
+ software. For RFCs that were produced with XML as their canonical
+ format, an archive must not only hold the data; it must also hold the
+ entire computing environment that allows the data to be rendered and
+ viewed. Operating systems and the hardware on which they can run;
+ each major version of each piece of software used or relied upon
+ during the publication of an RFC; and browsers and readers for HTML,
+ PDF, and any other publication format must all be preserved in some
+ fashion.
+ This is considered best practice when archiving digital documents.
+ This is also the most expensive method, and the cost only increases
+ over time as more and more instances of the computing environment
+ must be preserved over the lifetime of the Series.
+
+ This is a valid option to consider, but the sheer scope of resources
+ required suggests that this must be discussed by the community before
+ a decision is made. Pursuing this may require an entirely different
+ paradigm for the RFC Editor from what has been considered in the
+ past; expanding the scope and resources for the RFC Editor, finding a
+ third party to take over the responsibilities of archiving, or some
+ other option may be necessary.
+
+2.5. Transformation/Migration to Current Publication Formats
+
+ Because normalization is a complex subject, it is important to
+ consider how to mitigate the risk of failure of the normalization
+ process.
+
+ The RFC Editor is responsible for making RFCs available to the
+ Internet community. The canonical version of an RFC does not change
+ once published; any formats officially rendered from the canonical
+ version, however, may change. One way to mitigate the need to
+ preserve the entire computing environment for an RFC, including web
+ browsers and PDF readers, would be to take advantage of the non-
+ canonical nature of the publication formats and re-render them from
+
+
+
+Flanagan Informational [Page 12]
+
+RFC 8153 Digital Preservation April 2017
+
+
+ the canonical source at the point that browser or reader technology
+ has changed sufficiently to make RFCs largely unavailable to 'modern'
+ tools.
+
+ For example, the RFC Editor may develop the practice of annually
+ reviewing the tools needed to view the publication formats created by
+ the RFC Editor to determine whether or not the current common and
+ popular reader technologies (i.e., web browsers, PDF viewers,
+ e-readers) can view the existing publication formats. During that
+ review, the RFC Editor would work with the community to determine if
+ the current publication formats meet the needs of the community and
+ whether any should be retired or added to improve the availability of
+ information to the community at that time.
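+
+ The review described above could be recorded and evaluated along
+ the following lines. This Python sketch is hypothetical: the
+ ReaderCheck record, the reader names, and the 50 percent threshold
+ are assumptions for illustration, not part of any RFC Editor
+ procedure. It flags publication formats that too few of the tested
+ readers can still open, making them candidates for re-rendering
+ from the canonical source.
+
```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class ReaderCheck:
    """One review observation: could this reader open this format?"""

    format_name: str
    reader: str
    viewable: bool


def formats_needing_rerender(
    checks: Iterable[ReaderCheck],
    min_viewable_fraction: float = 0.5,  # assumed policy threshold
) -> List[str]:
    """Return formats whose share of working readers is below the threshold."""
    totals: Dict[str, Tuple[int, int]] = {}
    for check in checks:
        ok, seen = totals.get(check.format_name, (0, 0))
        totals[check.format_name] = (ok + int(check.viewable), seen + 1)
    return sorted(
        name
        for name, (ok, seen) in totals.items()
        if ok / seen < min_viewable_fraction
    )


# Hypothetical review results; the reader names are placeholders.
review = [
    ReaderCheck("HTML", "reader-a", True),
    ReaderCheck("HTML", "reader-b", True),
    ReaderCheck("PDF/A-3", "reader-a", False),
    ReaderCheck("PDF/A-3", "reader-b", False),
]
print(formats_needing_rerender(review))
```
+
+ The threshold is a policy choice rather than a technical one; the
+ text above leaves that judgment to the RFC Editor working with the
+ community.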
+
+2.6. System Parameters
+
+ While the industry best practice on the backup and restoration of
+ data is not sufficient as a long-term archival solution, it is still
+ a necessary part of keeping the Series available now and into the
+ future. In the past, nearly 800 RFCs had to be manually transcribed
+ from paper back to electronic format due to a failed server migration
+ and insufficient backups.
+
+ The underlying servers hosting the tools, database, RFCs, and errata
+ are the physical link in the archival environment. While such
+ systems cannot and should not remain static and unchanging, there
+ must be clear documentation regarding the environment, in particular,
+ the storage, backups, and recovery processes for all RFC-related
+ material. The documentation must include information on the refresh
+ cycle for the physical storage and backup media and describe a
+ regular cycle of data restoration and/or migration testing.
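+
+ One common way to make restoration testing checkable is to record a
+ checksum manifest when files are stored and to compare a restored
+ copy against it. The Python sketch below assumes SHA-256 and a
+ simple directory layout; the function names are hypothetical, and
+ nothing here describes the RFC Editor's actual backup tooling.
+
```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_restore(
    manifest: Dict[str, str], restored_root: Path
) -> Dict[str, List[str]]:
    """Compare a restored tree with the manifest taken at storage time."""
    missing: List[str] = []
    corrupted: List[str] = []
    for rel_path, expected in sorted(manifest.items()):
        candidate = restored_root / rel_path
        if not candidate.is_file():
            missing.append(rel_path)
        elif sha256_of(candidate) != expected:
            corrupted.append(rel_path)
    return {"missing": missing, "corrupted": corrupted}
```
+
+ A restoration test passes when both returned lists are empty. A
+ check of this kind addresses bitstream preservation only; it says
+ nothing about whether the restored files can still be rendered.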
+
+2.7. Financial Impact
+
+ Having a policy regarding digital archiving provides input into the
+ budget process. The main costs associated with digital archives come
+ from the complexity and quantity of the material being archived, as
+ described in Section 2.4 on normalization.
+
+ Estimating potential costs and providing figures are outside of the
+ scope of this document, but it should be noted that costs are a major
+ factor when determining what level of archival practice an
+ organization will follow.
+
+ For more information on potential business plans and cost modeling
+ for digital preservation, see the "Business cases, benefits, costs,
+ and impact" section of the Digital Preservation Handbook [DPC].
+
+
+
+
+Flanagan Informational [Page 13]
+
+RFC 8153 Digital Preservation April 2017
+
+
+3. Recommendations
+
+ Given the need to balance cost and complexity with retention of
+ information for historic, legal, and informational purposes,
+ preservation efforts should focus on the XML canonical format files,
+ the PDF/A-3 format files, the xml2rfc tool and its documentation, and
+ at least two PDF reader applications capable of extracting the
+ embedded XML. Care should be taken that the software being included
+ in this archive has a provision for free copies for backup or
+ archival purposes. All other formats and the overall computing
+ environment should be stored as described in "best effort" data
+ retention (Section 2.4.1), which should in turn be described in the
+ appropriate vendor contract for the RFC Publisher.
+
+ Particular preservation efforts should be made by:
+
+ o choosing a format designed for archiving RFCs (PDF/A-3 as
+ indicated by [RFC7995])
+
+ o embedding the canonical XML format within the PDF/A-3 file for
+ RFCs
+
+ o retaining a copy of the plain-text or XML file submitted for
+ approved I-Ds
+
+ o retaining all major versions of the tools and their associated
+ documentation used to acquire and ingest an RFC
+
+ o retaining the final XML file as well as the PDF/A-3 file with the
+ embedded XML
+
+ o retaining at least two software reader applications to ensure the
+ PDF/A-3 and XML files can be viewed in the future
+
+ o partnering with other digital archives around the world to mirror
+ copies of the target data
+
+ Printing each RFC to paper upon publication is no longer reasonable:
+ it adds cost, and paper cannot capture the metadata and other
+ features embedded in RFCs that are published in more than just plain
+ text. Proper data storage and mirrored copies of RFCs provide more
+ efficient and effective protection against catastrophic failure of
+ the existing archive of material.
+
+ Particular focus should be given to finding partners that specialize
+ in digital preservation to ingest RFCs. Ideally, they will ingest
+ all material associated with an RFC, including all metadata, digital
+
+
+
+Flanagan Informational [Page 14]
+
+RFC 8153 Digital Preservation April 2017
+
+
+ signatures, and the approved I-D that was submitted to the RFC
+ Editor. The possibilities and options should be discussed with each
+ archival partner; at a minimum, each partner must ingest copies of
+ RFCs as they are published, along with the basic metadata associated
+ with each document.
+
+ Preservation efforts should be reviewed and validated through a
+ biennial audit that will verify that the targeted content and all its
+ associated metadata can be read with existing tools. The full
+ process from acquisition to ingestion should be reviewed to ensure
+ that best current practice is being followed from the perspective of
+ the digital archive community. Since the overall model for the
+ digital archive maintained by the RFC Editor follows the OAIS
+ reference model, the associated audit guidelines should also be
+ followed. While the RFC Editor does not seek to be recognized as
+ 'OAIS-compliant' at this time, use of the ISO standard "Space data
+ and information transfer systems -- Audit and certification of
+ trustworthy digital repositories" [ISO16363] would provide a solid,
+ accepted method for structuring an audit for this digital archive.
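+
+ As an illustration only, the narrowest part of such an audit
+ (confirming that archived files are present, unaltered, and still
+ readable by existing tools) might be sketched as follows. The
+ manifest layout, the file names, and the functions sha256_of and
+ audit_archive are assumptions made for this sketch; they are not
+ defined by this document or by [ISO16363], and an actual audit
+ covers far more than these checks.

```python
import hashlib
import xml.etree.ElementTree as ET
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_archive(root, manifest):
    """Check every manifest entry and return a list of problems found.

    `manifest` maps a path relative to `root` to its expected SHA-256
    digest. This layout is hypothetical, chosen only for the sketch.
    """
    problems = []
    for relative_name, expected in sorted(manifest.items()):
        path = root / relative_name
        if not path.is_file():
            problems.append(f"{relative_name}: missing")
            continue
        if sha256_of(path) != expected:
            problems.append(f"{relative_name}: checksum mismatch")
            continue
        if path.suffix == ".xml":
            # Well-formedness only; this does not validate the RFC schema.
            try:
                ET.parse(path)
            except ET.ParseError as error:
                problems.append(
                    f"{relative_name}: XML not well-formed ({error})")
        elif path.suffix == ".pdf":
            # Header check only; this does not verify PDF/A-3 conformance.
            with path.open("rb") as handle:
                if handle.read(5) != b"%PDF-":
                    problems.append(f"{relative_name}: missing PDF header")
    return problems
```

+ An empty list from audit_archive means only that the bits are intact
+ and the files parse at the most basic level. Whether the content
+ and its metadata can actually be rendered and understood still has
+ to be confirmed with the retained reader applications.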
+
+4. Summary
+
+ The RFC Series is worth archiving. It contains the history of the
+ early Internet, as well as some of the key standards for Internet
+ technology and best practice today. Who knows what the community
+ will create in the future? There are many ways to preserve the
+ Series, from relying on preservation of the bits, to focusing on a
+ single file format, to preserving the entire computing environment.
+ Each possibility, and any combination of them, carries its own risks
+ and resource requirements. The goal of this document is
+ to describe the possibilities and associated risks so that the
+ community can come to an informed decision regarding what it is
+ willing to see supported far into the future.
+
+5. IANA Considerations
+
+ This document does not require any IANA actions.
+
+6. Security Considerations
+
+ This document assumes that the origination of RFCs via the RFC Editor
+ is secure and trusted. With that assumption, the activities
+ discussed in this document do not affect the security of the
+ Internet.
+
+
+
+
+
+
+
+
+Flanagan Informational [Page 15]
+
+RFC 8153 Digital Preservation April 2017
+
+
+7. Informative References
+
+ [ARCHIVEMATICA]
+ "Archivematica", <https://www.archivematica.org/wiki/
+ Main_Page>.
+
+ [DPC] Digital Preservation Coalition, "Digital Preservation
+ Handbook", 2015, <http://dpconline.org/handbook>.
+
+ [DPC2009] Digital Preservation Coalition, "Digital Preservation
+ Handbook", 2009, <http://www.dpconline.org/docman/digital-
+ preservation-handbook/304-digital-preservation-handbook-
+ media-and-formats>.
+
+ [ISO14721] International Organization for Standardization, "Space
+ data and information transfer systems -- Open archival
+ information system (OAIS) -- Reference model",
+ ISO 14721:2012, 2012.
+
+ [ISO16363] International Organization for Standardization, "Space
+ data and information transfer systems -- Audit and
+ certification of trustworthy digital repositories",
+ ISO 16363:2012, 2012.
+
+ [LIFE] Hole, B., "LIFE^3: Predictive Costing of Digital
+ Preservation", July 2010,
+ <http://www.life.ac.uk/3/docs/Hole_pasig_v1.pdf>.
+
+ [PDF] International Organization for Standardization, "Document
+ management -- Electronic document file format for long-
+ term preservation -- Part 3: Use of ISO 32000-1 with
+ support for embedded files (PDF/A-3)", ISO 19005-3:2012,
+ 2012.
+
+ [PERMACC] "Perma.cc", <http://perma.cc/>.
+
+ [RFC-HISTORY]
+ RFC Editor, "Internet Archaeology: Documents from Early
+ History", <http://www.rfc-editor.org/history.html>.
+
+ [RFC-ONLINE]
+ RFC Editor, "History of RFC Online Project",
+ <http://www.rfc-editor.org/rfc-online-2000.html>.
+
+ [RFC-PUB] RFC Editor, "Publication Process",
+ <http://www.rfc-editor.org/pubprocess.html>.
+
+
+
+
+
+Flanagan Informational [Page 16]
+
+RFC 8153 Digital Preservation April 2017
+
+
+ [RFC-SERIES]
+ RFC Editor, "About Us",
+ <http://www.rfc-editor.org/RFCoverview.html>.
+
+ [RFC6635] Kolkman, O., Ed., Halpern, J., Ed., and IAB, "RFC Editor
+ Model (Version 2)", RFC 6635, DOI 10.17487/RFC6635, June
+ 2012, <http://www.rfc-editor.org/info/rfc6635>.
+
+ [RFC6949] Flanagan, H. and N. Brownlee, "RFC Series Format
+ Requirements and Future Development", RFC 6949,
+ DOI 10.17487/RFC6949, May 2013,
+ <http://www.rfc-editor.org/info/rfc6949>.
+
+ [RFC7841] Halpern, J., Ed., Daigle, L., Ed., and O. Kolkman, Ed.,
+ "RFC Streams, Headers, and Boilerplates", RFC 7841,
+ DOI 10.17487/RFC7841, May 2016,
+ <http://www.rfc-editor.org/info/rfc7841>.
+
+ [RFC7995] Hansen, T., Ed., Masinter, L., and M. Hardy, "PDF Format
+ for RFCs", RFC 7995, DOI 10.17487/RFC7995, December 2016,
+ <http://www.rfc-editor.org/info/rfc7995>.
+
+ [TLP] IETF Trust, "Trust Legal Provisions (TLP)",
+ <https://trustee.ietf.org/trust-legal-provisions.html>.
+
+ [USLOC] LeFurgy, B., "Life Cycle Models for Digital Stewardship",
+ February 2012,
+ <http://blogs.loc.gov/digitalpreservation/2012/02/
+ life-cycle-models-for-digital-stewardship/>.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Flanagan Informational [Page 17]
+
+RFC 8153 Digital Preservation April 2017
+
+
+IAB Members at the Time of Approval
+
+ The IAB members at the time this document was approved were (in
+ alphabetical order):
+
+ Jari Arkko
+ Ralph Droms
+ Ted Hardie
+ Joe Hildebrand
+ Lee Howard
+ Erik Nordmark
+ Robert Sparks
+ Andrew Sullivan
+ Dave Thaler
+ Martin Thomson
+ Brian Trammell
+ Suzanne Woolf
+
+Author's Address
+
+ Heather Flanagan
+ RFC Editor
+
+ Email: rse@rfc-editor.org
+ URI: http://orcid.org/0000-0002-2647-2220
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Flanagan Informational [Page 18]
+