diff --git a/doc/rfc/rfc971.txt b/doc/rfc/rfc971.txt
new file mode 100644
index 0000000..d004e4b
--- /dev/null
+++ b/doc/rfc/rfc971.txt
@@ -0,0 +1,512 @@
+
+Network Working Group                                 Annette L. DeSchon
+Request for Comments: 971                                            ISI
+                                                            January 1986
+
+               A SURVEY OF DATA REPRESENTATION STANDARDS
+
+
+Status of This Memo
+
+ This RFC discusses data representation conventions in the
+ ARPA-Internet and suggests possible resolutions. No proposals in
+ this document are intended as standards for the ARPA-Internet at this
+ time. Rather, it is hoped that a general consensus will emerge as to
+ the appropriate approach to these issues, leading eventually to the
+ adoption of ARPA-Internet standards. Distribution of this memo is
+ unlimited.
+
+1. Introduction
+
+ This report is a comparison of several data representation standards
+ that are currently in use. The standards, or system type
+ definitions, that will be discussed are the CCITT X.409
+ recommendation, the NBS Computer Based Message System (CBMS)
+ standard, the DARPA Multimedia Mail system, the Courier remote
+ procedure call protocol, and the SUN Remote Procedure Call package.
+
+ One purpose of this report is to determine how the CCITT standard,
+ which is gaining wide acceptance internationally, compares with some
+ of the other standards that have been developed in the areas of
+ electronic mail, distributed interprocess communication, and remote
+ procedure call. The CCITT X.409 recommendation, which is entitled
+ "Presentation Transfer Syntax and Notation", is an international
+ standard that is part of the X.400 series Message Handling Systems
+ (MHS) specifications [1]. It has been adopted by both the NBS and
+ the ISO standards organizations. In addition, some commercial
+ organizations have announced intentions to support a CCITT interface
+ for electronic mail. The NBS Computer Based Message System (CBMS)
+ standard was developed previously and was published as a Federal
+ Information Processing Standard (FIPS Publication 98) in 1983 [3].
+ The DARPA Multimedia Mail system is an experimental electronic mail
+ system which is in use in the DARPA Internet [2,4,5]. It is used to
+ create and distribute messages that incorporate text, graphics,
+ stored speech, and images and has been implemented on several very
+ different machines. Courier is the XEROX network systems remote
+ procedure call protocol [7]. The SUN Remote Procedure Call package
+ implements "network pipes" between UNIX machines [6].
+
+
+
+
+
+
+
+DeSchon [Page 1]
+
+
+
+RFC 971 January 1986
+A Survey of Data Representation Standards
+
+
+2. Background
+
+ This section presents a brief overview of the basic terminology and
+ approach of each data representation standard.
+
+ 2.1. Interprocess Communication Standards
+
+ The standards that are oriented towards distributed interprocess
+ communication or remote procedure call, between like machines,
+ generally favor the use of types that map easily into the types
+ defined in the programming language in use on the system. For
+ example, the types defined for the XEROX Courier system resemble
+ the types found in the Mesa programming language. Similarly, the
+ SUN Remote Procedure Call system types resemble the types found in
+ the C programming language. An advantage of a system implemented
+ using like machines is that the external data representation can
+ be defined in such a way that the conversion to and from the local
+ format is minimal.
+
+ 2.1.1. Courier
+
+ The Courier standard data types are used to define the data
+ objects which are transported bi-directionally between system
+ elements that are running the Courier remote procedure call
+ protocol. The "standard representation" of a type is the
+ encoding of the data which is transmitted. The "standard
+ notation" refers to the conventions for the interpretation of
+ the data by higher-level applications. The standard
+ representation of a data object encodes the value of the
+ object, but the type of the object is determined by the
+ software that generates or interprets the representation.
+
+ 2.1.2. SUN Remote Procedure Call Package
+
+ The SUN Remote Procedure Call package includes routines which
+ allow a process on one UNIX machine to consume data produced by
+ a process on another UNIX machine. This is called a "network
+ pipe" and is an extension of the standard UNIX pipe. The
+ "eXternal Data Representation (XDR)" standard defines the
+ routines that are used to encode or "serialize" data for
+ transmission, or to decode or "deserialize" data for local
+ interpretation. The syntax suggests that perhaps it should be
+ called "remote interprocess communication" rather than "remote
+ procedure call".
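As a rough illustration of what "serializing" and "deserializing" mean here, the following Python sketch encodes a signed 32-bit integer and a counted byte string in the XDR layout: big-endian four-octet units, with a string sent as a four-octet length followed by its bytes, zero-padded to a multiple of four octets. The function names are invented for this sketch and are not the names of the routines in the SUN library.

```python
import struct


def xdr_pack_int(value: int) -> bytes:
    """Serialize a signed 32-bit integer as four big-endian octets."""
    return struct.pack(">i", value)


def xdr_pack_string(data: bytes) -> bytes:
    """Serialize a counted byte string: length, bytes, zero padding to 4."""
    padding = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding


def xdr_unpack_int(buf: bytes, offset: int = 0):
    """Deserialize one integer; return (value, next_offset)."""
    (value,) = struct.unpack_from(">i", buf, offset)
    return value, offset + 4


def xdr_unpack_string(buf: bytes, offset: int = 0):
    """Deserialize one counted byte string; return (data, next_offset)."""
    (length,) = struct.unpack_from(">I", buf, offset)
    start = offset + 4
    data = buf[start:start + length]
    padded = length + (4 - length % 4) % 4
    return data, start + padded


# The sender writes an integer followed by a string. The receiver must
# already know that order, since no type information is transmitted.
stream = xdr_pack_int(-2) + xdr_pack_string(b"hello")
number, pos = xdr_unpack_int(stream)
text, pos = xdr_unpack_string(stream, pos)
```

The decoder only works because it calls the unpack routines in the same order the encoder used; nothing in the stream says which octets are an integer and which are a string.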
+
+ 2.2. Message Standards
+
+ The message-oriented standards, including DARPA Multimedia Mail,
+ NBS CBMS, and the CCITT X.409 standard, seem to favor more
+ general, highly extensible type definitions. This may have
+ something to do with the expectation that a system will include
+ many different machines, programmed using many different
+ programming languages.
+
+ 2.2.1. DARPA Multimedia Mail
+
+ The DARPA Multimedia Mail system was developed for use in the DoD
+ Internet community. The set of data elements used in the
+ Multimedia Message Handling Facility (MMHF) is referred to as
+ its "presentation transfer syntax". The encoding of these data
+ elements varies with the data type being represented. Each
+ begins with a one-octet "element-code". Some data elements are
+ of a pre-determined length. For example, the INTEGER data
+ element occupies five octets, one for the element-code, and
+ four which contain the "value component". Other data elements,
+ however, may vary in length. For example, the TEXT data element
+ is made up of a one-octet element-code, a three-octet count of the
+ characters to follow, and a variable number of octets, each
+ containing one right-justified seven-bit ASCII character. The
+ element-code and the length constitute the "tag component".
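The INTEGER and TEXT layouts described above can be sketched in a few lines of Python. The element-codes 4 and 8 are the ones listed for MMM in the Section 4 table. The byte order (big-endian) and the use of two's complement for the INTEGER value are assumptions of this sketch, as are the function names.

```python
INTEGER_CODE = 4  # element-code for INTEGER, from the Section 4 table
TEXT_CODE = 8     # element-code for TEXT, from the Section 4 table


def encode_integer(value: int) -> bytes:
    """One-octet element-code plus a four-octet value component."""
    # Assumption: big-endian, two's complement.
    return bytes([INTEGER_CODE]) + value.to_bytes(4, "big", signed=True)


def encode_text(text: str) -> bytes:
    """Element-code, three-octet character count, one octet per character."""
    chars = text.encode("ascii")  # seven-bit ASCII, one character per octet
    return bytes([TEXT_CODE]) + len(chars).to_bytes(3, "big") + chars


def decode_element(buf: bytes, offset: int = 0):
    """Decode one INTEGER or TEXT element; return (value, next_offset)."""
    code = buf[offset]
    if code == INTEGER_CODE:
        value = int.from_bytes(buf[offset + 1:offset + 5], "big", signed=True)
        return value, offset + 5
    if code == TEXT_CODE:
        count = int.from_bytes(buf[offset + 1:offset + 4], "big")
        start = offset + 4
        return buf[start:start + count].decode("ascii"), start + count
    raise ValueError(f"unsupported element-code {code}")


message = encode_integer(1986) + encode_text("RFC")
first, pos = decode_element(message)
second, pos = decode_element(message, pos)
```

Because each element begins with its own element-code, the decoder can tell the five-octet INTEGER from the variable-length TEXT without being told the order in advance.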
+
+ A "base data element" is self-contained, while a "structured
+ data element" is formed using other data elements. The LIST
+ data element is used to create structures composed of other
+ elements. The tag component of a LIST is made up of a
+ one-octet element-code, a three-octet count of the number of
+ octets to follow, and a two-octet count of the number of
+ elements that follow. The PROPLIST data element is used to
+ create a structure that consists of a set of unordered
+ name-value pairs. The tag component of a PROPLIST is made up
+ of a one-octet element-code, a three-octet count of the number
+ of octets to follow, and a one-octet count of the number of
+ name-value pairs in the PROPLIST. Both the LIST and the
+ PROPLIST elements are followed by an ENDLIST data element.
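A LIST built along these lines might look like the following Python sketch. The element-codes 9 (LIST) and 11 (ENDLIST) come from the Section 4 table. Several details are assumptions made only for illustration: big-endian counts, an ENDLIST that is a single octet, and an octet count that covers the element count, the contained elements, and the closing ENDLIST.

```python
LIST_CODE = 9      # element-code for LIST, from the Section 4 table
ENDLIST_CODE = 11  # element-code for ENDLIST, from the Section 4 table


def encode_list(elements):
    """Wrap already-encoded data elements in a LIST ... ENDLIST structure."""
    body = b"".join(elements)
    item_count = len(elements).to_bytes(2, "big")
    endlist = bytes([ENDLIST_CODE])
    # Assumption: the octet count covers the item count, the elements,
    # and the closing ENDLIST octet.
    following = item_count + body + endlist
    return bytes([LIST_CODE]) + len(following).to_bytes(3, "big") + following


def read_list_tag(buf, offset=0):
    """Return (octets_to_follow, item_count, offset_of_first_element)."""
    if buf[offset] != LIST_CODE:
        raise ValueError("not a LIST element")
    octets = int.from_bytes(buf[offset + 1:offset + 4], "big")
    items = int.from_bytes(buf[offset + 4:offset + 6], "big")
    return octets, items, offset + 6


# Two pre-encoded INTEGER elements (element-code 4, four-octet value).
one = b"\x04\x00\x00\x00\x01"
two = b"\x04\x00\x00\x00\x02"
encoded = encode_list([one, two])
octets, items, first = read_list_tag(encoded)
```

Under these assumptions a receiver that does not care about the list can skip it by advancing `4 + octets` octets from the start of the tag, without decoding any contained element.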
+
+ 2.2.2. NBS Computer Based Message System
+
+ The NBS Computer Based Message System (CBMS) standard was
+ developed to specify the format of a message at the interface
+ between different computer-based message systems. Each data
+ element consists of a series of "components". The five
+ possible types of component are the "identifier octet", the
+ "length code", the "qualifier", the "property-list" component,
+ and the "data element contents". Every data element contains
+ an identifier octet and a length code. The identifier octet
+ contains a one-bit flag that signifies whether the data element
+ contains a property-list, and a code identifying the data
+ element and signifying whether it contains a qualifier. In the
+ NBS standard, the property-list is associated with a data
+ element and contains properties such as a "printing-name" or a
+ "comment". The meaning of the qualifier depends on the data
+ element code. The length code indicates the number of octets
+ following, and is between one and three octets in length.
+
+ Each data element is inherently a "primitive data element",
+ which contains a basic item of information, or a "constructor
+ data element", which contains one or more data elements. The
+ "field" data element (itself a constructor) uses a qualifier
+ component, which contains a "field identifier" to indicate
+ which specific field is being represented within a message.
+
+ 2.2.3. CCITT Recommendation X.409
+
+ The CCITT recommendation X.409 defines the notation and the
+ representational technique used to specify and to encode the
+ Message Handling System (MHS) protocols. The following is a
+ description of the CCITT approach to encoding type definitions.
+ A data element consists of three components, the "identifier"
+ (type), the "length", and the "contents". An element and its
+ components consist of a sequence of an integral number of
+ octets. An identifier consists of a "class" ("universal",
+ "application-wide", "context-specific", or "private-use"), a
+ "form" ("primitive" or "constructor"), and the "id code".
+ There is a convention defined for both single-octet and
+ multi-octet identifiers. The length specifies the length of
+ the contents in octets, and is itself variable in length.
+ There is also an "indefinite" value defined for the length;
+ this means that no length for the contents is specified, and
+ the contents is terminated with the "end-of-contents" (EOC)
+ element. In X.409 it is possible to determine whether a data
+ element is a primitive or a constructor from the form part of
+ the identifier. In addition it is possible to "tag" the data
+ by attaching meaning to an id code within the context of a
+ specific application.
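The identifier, length, and contents structure can be made concrete with a short Python sketch. It assumes the bit layout that X.409 shares with the later ASN.1 Basic Encoding Rules: the class in the two high-order bits of the identifier octet, the form in the next bit, and the id code in the low five bits; a length below 128 in one octet, a longer length as a count octet followed by the length itself, and the value 0x80 for the indefinite form. Only single-octet identifiers are handled, and the function names are invented for illustration.

```python
CLASSES = ("universal", "application-wide", "context-specific", "private-use")


def encode_element(id_code: int, contents: bytes, cls: int = 0,
                   constructor: bool = False) -> bytes:
    """Encode identifier, definite length, and contents (id codes 0..30)."""
    if not 0 <= id_code <= 30:
        raise ValueError("multi-octet identifiers are not handled here")
    identifier = (cls << 6) | (0x20 if constructor else 0) | id_code
    n = len(contents)
    if n < 128:
        length = bytes([n])  # short form: a single length octet
    else:
        size = (n.bit_length() + 7) // 8
        length = bytes([0x80 | size]) + n.to_bytes(size, "big")  # long form
    return bytes([identifier]) + length + contents


def decode_header(buf: bytes, offset: int = 0):
    """Return (class, form, id_code, length, contents_offset).

    length is None for the indefinite form, where the contents run up to
    an end-of-contents element.
    """
    identifier = buf[offset]
    cls = CLASSES[identifier >> 6]
    form = "constructor" if identifier & 0x20 else "primitive"
    id_code = identifier & 0x1F
    if id_code == 0x1F:
        raise ValueError("multi-octet identifiers are not handled here")
    first = buf[offset + 1]
    if first < 0x80:
        return cls, form, id_code, first, offset + 2
    if first == 0x80:
        return cls, form, id_code, None, offset + 2
    size = first & 0x7F
    length = int.from_bytes(buf[offset + 2:offset + 2 + size], "big")
    return cls, form, id_code, length, offset + 2 + size


integer_five = encode_element(2, b"\x05")  # Integer has id code 2
header = decode_header(integer_five)
```

A receiver can read the class and form directly from the first octet, which is what allows it to tell a primitive from a constructor before looking at the contents.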
+
+3. Implicit Versus Explicit Representation
+
+ In both the SUN Remote Procedure Call system and the XEROX Courier
+ system, the type definitions of external data are implicit. This
+ means that for a given type of call, or message, the type definitions
+ that are to be used to interpret the data are agreed upon by the
+ sender and the receiver in advance. In other words, parameters (or
+ message fields) are assumed to be in a predefined order. Each
+ parameter is assumed to be of a predefined type. This means the data
+ cannot be reformatted into the local form until it reaches a process
+ that knows about the types of specific parameters. At this point,
+ the conversion can be accomplished using system routines that know
+ how to convert from the external format to the local format. If the
+ system is homogeneous there may be very little conversion required.
+ In addition, no extra overhead of sending the type definitions with
+ the data is incurred.
+
+ In the DARPA Multimedia Mail system, the NBS CBMS standard, and the
+ CCITT X.409 recommendation, type definitions are explicit. In this
+ case the type definitions are encoded into the message. There are
+ several advantages to this approach. One advantage is that it allows
+ a low level receiver process in the destination host to convert the
+ data from the standard form to a form appropriate for the local host,
+ as it is received. This can increase efficiency if it allows the
+ destination host to avoid passing around data that does not conform
+ to the local word boundaries. Another advantage is that it provides
+ flexibility for future expansion. Since the overall length is a part
+ of the type definition, it allows a host to deal with or ignore data
+ of types that it does not necessarily understand. Since the
+ interpretation of the data is not dependent on its position, message
+ fields (or parameters) can be reordered, or optionally omitted. The
+ disadvantages of this approach are as follows. Assuming that no
+ field could be omitted, the external representation of the message
+ may be longer than it would have been if an implicit representation
+ had been used. In addition, extra time may be consumed by the
+ conversion between external format and local format, since the
+ external format almost certainly will not match the local format for
+ any of the participants.
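The trade-off between the two approaches can be shown with a toy record encoded both ways. The Python sketch below is not any of the surveyed formats: the implicit layout (a 16-bit id followed by a 32-bit count) and the explicit layout (a one-octet type code and a one-octet length before each field) are hypothetical, as are the field codes 1, 2, and 99.

```python
import struct


# Implicit: both sides agree in advance on "16-bit id, then 32-bit count".
def pack_implicit(msg_id: int, count: int) -> bytes:
    return struct.pack(">HI", msg_id, count)


def unpack_implicit(buf: bytes):
    return struct.unpack(">HI", buf)


# Explicit: each field carries a one-octet type code and a one-octet length.
# These codes are hypothetical and exist only for this sketch.
FIELD_ID, FIELD_COUNT, FIELD_NOTE = 1, 2, 99


def pack_field(code: int, value: bytes) -> bytes:
    return bytes([code, len(value)]) + value


def unpack_explicit(buf: bytes, known=(FIELD_ID, FIELD_COUNT)):
    """Decode the known integer fields; skip any field with an unknown code."""
    fields = {}
    offset = 0
    while offset < len(buf):
        code, length = buf[offset], buf[offset + 1]
        value = buf[offset + 2:offset + 2 + length]
        if code in known:
            fields[code] = int.from_bytes(value, "big")
        # An unknown code is skipped using its length alone.
        offset += 2 + length
    return fields


implicit = pack_implicit(7, 300)
explicit = (pack_field(FIELD_NOTE, b"new")
            + pack_field(FIELD_COUNT, (300).to_bytes(4, "big"))
            + pack_field(FIELD_ID, (7).to_bytes(2, "big")))
```

The explicit form is longer (15 octets against 6 here), but the receiver recovers the same two values even though the fields arrive reordered and are preceded by a field type it has never seen.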
+
+4. Data Representation Standards Scorecard
+
+ The following table is a comparison of the data elements defined for
+ the various standards being discussed. It is provided in order to
+ give a general idea of the types defined for each standard, but it
+ should be noted that the grouping of these types does not indicate
+ that one type corresponds exactly to any other. Where applicable,
+ the identifier code appears in parentheses following the name of the
+ data element. Under "NUMBER", "S" stands for signed, "U" stands for
+ unsigned, "V" stands for variable, and the number represents the
+ number of bits. For example, "Integer S16" means a "signed 16-bit
+ integer".
+
+
+ Type     CCITT       MMM         NBS          XEROX       Sun
+ -----------------------------------------------------------------------
+ END    | End-of-   | ENDLIST   | End-of-    |    --     |    --
+        | Contents  |    (11)   | Constructor|           |
+        |    (0)    |           |    (1)     |           |
+        |           |           |            |           |
+ PAD    | Null (5)  | NOP (0)   | No-Op (0)  |    --     |    --
+        |           | PAD (1)   | Padding    |           |
+        |           |           |    (33)    |           |
+        |           |           |            |           |
+ RECORD | Set (17)  | PROPLIST  | Set (11)   |    --     |    --
+        |           |    (14)   |            |           |
+        | Sequence  | LIST (9)  | Sequence   | Sequence  | Structure
+        |    (16)   |           |    (10)    |           |
+        |           |           |            | Record    |
+        |           |           | Message    |           |
+        |           |           |    (77)    |           |
+        |    --     |    --     |    --      | Array     | Fixed Array
+        |           |           |            |           | Counted Array
+        | "Choice"  |    --     |    --      | Choice    |Discriminated-
+        | "Any"     |           |            |           |     Union
+        |           |           |            |           |
+        | "Tagged"  | "name"    | Field (76) |    --     |    --
+        |           |           |Unique-ID(9)|           |
+        |    --     | SHARE-TAG |    --      |    --     |    --
+        |           |    (12)   |            |           |
+        |           | SHARE-REF |            |           |
+        |           |    (13)   |            |           |
+        |           |           |            |           |
+        |    --     |    --     | Compressed |    --     |    --
+        |           |           |    (70)    |           |
+        |    --     | ENCRYPT   | Encrypted  |    --     |    --
+        |           |    (14)   |    (71)    |           |
+        |           |           |            |           |
+ BOOLEAN| Boolean(1)| BOOLEAN(2)| Boolean(8) | Boolean   | Boolean
+        |           |           |            |           |
+ NUMBER | Integer(2)| EPI (5)   | Integer(32)| Integer   | Integer
+        |    SV     |    SV     |    SV      |    S16    |    S32
+        |           | INDEX (3) |            | Cardinal  | Unsigned Int
+        |           |    U16    |            |    U16    |    U32
+        |           | INTEGER(4)|            |Unspecified| Enumeration
+        |           |    S32    |            |    16     |    32
+        |           |           |            | Long Int  | Hyper Integer
+        |           |           |            |    S32    |    S64
+        |           |           |            | Long Card | Uns Hyper Int
+        |           |           |            |    U32    |    U64
+        |           |           |            |           | Double Prec
+        |           |           |            |           |    64
+        |    --     | FLOAT (15)|    --      |    --     | Float Pt
+        |           |    64     |            |           |    32
+        |           |           |            |           |
+ BIT-   | Bit String| BITSTR(6) | Bit-String |    --     |    --
+  STRING|    (3)    |           |    (67)    |           |
+        | Octet-    |    --     |    --      |    --     | Opaque
+        |  String(4)|           |            |           |
+        |           |           |            |           |
+ STRING | IA5 (22)  | TEXT (8)  | ASCII-     | String    | Counted-
+        |           |           |  String (2)|           |  Byte String
+        |           | NAME (7)  |            |           |
+        | Numeric   |           |            |           |
+        |    (18)   |           |            |           |
+        | Printable |           |            |           |
+        |    (19)   |           |            |           |
+        | T.61 (20) |           |            |           |
+        | Videotex  |           |            |           |
+        |    (21)   |           |            |           |
+        |           |           |            |           |
+ OTHER  | UTC Time  |    --     | Date (40)  |    --     |    --
+        |    (23)   |           |            |           |
+        | Gen Time  |           |            |           |
+        |    (24)   |           |            |           |
+        |    --     |    --     | Property-  |    --     |    --
+        |           |           |   List (36)|           |
+        |    --     |    --     |Property(69)|    --     |    --
+        |           |           |            |           |
+        |    --     |    --     |    --      | Procedure |    --
+        |           |           |            |           |
+        |    --     |    --     | Vendor-    |    --     |    --
+        |           |           |  Defined   |           |
+        |           |           |    (127)   |           |
+        |           |           | Extension  |           |
+        |           |           |    (126)   |           |
+
+
+5. Conclusions
+
+ Of the standards discussed in this survey, the CCITT approach (X.409)
+ has already gained wide acceptance. For a system that will include a
+ number of dissimilar hosts, as might be the case for an Internet
+ application, a standard that employs explicit representation, such as
+ the CCITT X.409, would probably work well. Using the CCITT X.409
+ standard it is possible to construct most of the data elements that
+ are specified for the other standards, with the possible exception of
+ the "floating point" type. However, some of the flexibility that has
+ been built into this standard, such as the "private-use" class, may
+ lead to ambiguity and a lack of coordination between implementors at
+ different sites. If a standard such as CCITT X.409 were to be used
+ in an Internet experiment, a fully defined (but large) subset would
+ probably have to be selected.
+
+6. References
+
+ [1] "Message Handling Systems: Presentation Transfer Syntax and
+ Notation", Recommendation X.409, Document AP VIII-66-E,
+ International Telegraph and Telephone Consultative Committee
+ (CCITT), Malaga-Torremolinos, June, 1984.
+
+ [2] J. Garcia-Luna, A. Poggio, and D. Elliot, "Research into
+ Multimedia Message System Architecture", SRI International,
+ February, 1984.
+
+ [3] "Specification for Message Format for Computer Based Message
+ Systems", FIPS Pub 98 (also published as RFC 841), National
+ Bureau of Standards, January, 1983.
+
+ [4] J. Postel, "Internet Multimedia Mail Transfer Protocol", USC
+ Information Sciences Institute, MMM-11 (RFC-759 revised), March,
+ 1982.
+
+ [5] J. Postel, "Internet Multimedia Mail Document Format", USC
+ Information Sciences Institute, MMM-12 (RFC-767 revised), March,
+ 1982.
+
+ [6] "External Data Representation Reference Manual", SUN
+ Microsystems, September, 1984.
+
+ [7] "Courier: The Remote Procedure Call Protocol", XSIS-038112,
+ XEROX Corporation, December, 1981.
+