Diffstat (limited to 'doc/rfc/rfc1505.txt')
-rw-r--r-- doc/rfc/rfc1505.txt 2019
1 files changed, 2019 insertions, 0 deletions
diff --git a/doc/rfc/rfc1505.txt b/doc/rfc/rfc1505.txt
new file mode 100644
index 0000000..d03da24
--- /dev/null
+++ b/doc/rfc/rfc1505.txt
@@ -0,0 +1,2019 @@
+
+
+
+
+
+
+Network Working Group A. Costanzo
+Request for Comments: 1505 AKC Consulting
+Obsoletes: 1154 D. Robinson
+ Computervision Corporation
+ R. Ullmann
+ August 1993
+
+
+ Encoding Header Field for Internet Messages
+
+Status of this Memo
+
+ This memo defines an Experimental Protocol for the Internet
+ community. It does not specify an Internet standard. Discussion and
+ suggestions for improvement are requested. Please refer to the
+ current edition of the "IAB Official Protocol Standards" for the
+ standardization state and status of this protocol. Distribution of
+ this memo is unlimited.
+
+IESG Note
+
+ Note that a standards-track technology already exists in this area
+ [11].
+
+Abstract
+
+ This document expands upon the elective experimental Encoding header
+ field which permits the mailing of multi-part, multi-structured
+ messages. It replaces RFC 1154 [1].
+
+Table of Contents
+
+ 1. Introduction . . . . . . . . . . . . . . . . . . . . 3
+ 2. The Encoding Field . . . . . . . . . . . . . . . . . 3
+ 2.1 Format of the Encoding Field . . . . . . . . . . . 3
+ 2.2 <count> . . . . . . . . . . . . . . . . . . . . . 4
+ 2.3 <keyword> . . . . . . . . . . . . . . . . . . . . 4
+ 2.3.1 Nested Keywords . . . . . . . . . . . . . . . . 4
+ 2.4 Comments . . . . . . . . . . . . . . . . . . . . . 4
+ 3. Encodings . . . . . . . . . . . . . . . . . . . . . 5
+ 3.1 Text . . . . . . . . . . . . . . . . . . . . . . . 5
+ 3.2 Message . . . . . . . . . . . . . . . . . . . . . 6
+ 3.3 Hex . . . . . . . . . . . . . . . . . . . . . . . 6
+ 3.4 EVFU . . . . . . . . . . . . . . . . . . . . . . . 6
+ 3.5 EDI-X12 and EDIFACT . . . . . . . . . . . . . . . 7
+ 3.6 FS . . . . . . . . . . . . . . . . . . . . . . . 7
+ 3.7 LZJU90 . . . . . . . . . . . . . . . . . . . . . . 7
+ 3.8 LZW . . . . . . . . . . . . . . . . . . . . . . . 7
+
+
+
+Costanzo, Robinson & Ullmann [Page 1]
+
+RFC 1505 Encoding Header Field August 1993
+
+
+ 3.9 UUENCODE . . . . . . . . . . . . . . . . . . . . . 7
+ 3.10 PEM and PEM-Clear . . . . . . . . . . . . . . . . 8
+ 3.11 PGP . . . . . . . . . . . . . . . . . . . . . . . 8
+ 3.12 Signature . . . . . . . . . . . . . . . . . . . 10
+ 3.13 TAR . . . . . . . . . . . . . . . . . . . . . . 10
+ 3.14 PostScript . . . . . . . . . . . . . . . . . . . 10
+ 3.15 SHAR . . . . . . . . . . . . . . . . . . . . . . 10
+ 3.16 Uniform Resource Locator . . . . . . . . . . . . 10
+ 3.17 Registering New Keywords . . . . . . . . . . . . 11
+ 4. FS (File System) Object Encoding . . . . . . . . . 11
+ 4.1 Sections . . . . . . . . . . . . . . . . . . . . 12
+ 4.1.1 Directory . . . . . . . . . . . . . . . . . . 12
+ 4.1.2 Entry . . . . . . . . . . . . . . . . . . . . 13
+ 4.1.3 File . . . . . . . . . . . . . . . . . . . . . 13
+ 4.1.4 Segment . . . . . . . . . . . . . . . . . . . 13
+ 4.1.5 Data . . . . . . . . . . . . . . . . . . . . . 14
+ 4.2 Attributes . . . . . . . . . . . . . . . . . . . 14
+ 4.2.1 Display . . . . . . . . . . . . . . . . . . . 14
+ 4.2.2 Comment . . . . . . . . . . . . . . . . . . . 15
+ 4.2.3 Type . . . . . . . . . . . . . . . . . . . . . 15
+ 4.2.4 Created . . . . . . . . . . . . . . . . . . . 15
+ 4.2.5 Modified . . . . . . . . . . . . . . . . . . . 15
+ 4.2.6 Accessed . . . . . . . . . . . . . . . . . . . 15
+ 4.2.7 Owner . . . . . . . . . . . . . . . . . . . . 15
+ 4.2.8 Group . . . . . . . . . . . . . . . . . . . . 16
+ 4.2.9 ACL . . . . . . . . . . . . . . . . . . . . . 16
+ 4.2.10 Password . . . . . . . . . . . . . . . . . . . 16
+ 4.2.11 Block . . . . . . . . . . . . . . . . . . . . 16
+ 4.2.12 Record . . . . . . . . . . . . . . . . . . . . 17
+ 4.2.13 Application . . . . . . . . . . . . . . . . . 17
+ 4.3 Date Field . . . . . . . . . . . . . . . . . . . 17
+ 4.3.1 Syntax . . . . . . . . . . . . . . . . . . . . 17
+ 4.3.2 Semantics . . . . . . . . . . . . . . . . . . 17
+ 5. LZJU90: Compressed Encoding . . . . . . . . . . . 18
+ 5.1 Overview . . . . . . . . . . . . . . . . . . . . 18
+ 5.2 Specification of the LZJU90 compression . . . . 19
+ 5.3 The Decoder . . . . . . . . . . . . . . . . . . 21
+ 5.3.1 An example of an Encoder . . . . . . . . . . . 27
+ 5.3.2 Example LZJU90 Compressed Object . . . . . . . 33
+ 6. Alphabetical Listing of Defined Encodings . . . . 34
+ 7. Security Considerations . . . . . . . . . . . . . 34
+ 8. References . . . . . . . . . . . . . . . . . . . . 34
+ 9. Acknowledgements . . . . . . . . . . . . . . . . . 35
+ 10. Authors' Addresses . . . . . . . . . . . . . . . . 36
+
+1. Introduction
+
+ STD 11, RFC 822 [2] defines an electronic mail message to consist of
+ two parts, the message header and the message body, separated by a
+ blank line.
+
+ The Encoding header field permits the message body itself to be
+ further broken up into parts, each part also separated from the next
+ by a blank line. Thus, conceptually, a message has a header part,
+ followed by one or more body parts, all separated by apparently blank
+ lines. Each body part has an encoding type. The default (no
+ Encoding field in the header) is a one part message body of type
+ "Text".
+
+ The purpose of Encoding is to be descriptive of the content of a mail
+ message without placing constraints on the content or requiring
+ additional structure to appear in the body of the message that will
+ interfere with other processing.
+
+ A similar message format is used in the network news facility, and
+ posted articles are often transferred by gateways between news and
+ mail. The Encoding field is perhaps even more useful in news, where
+ articles often are uuencoded or shar'd, and have a number of
+ different nested encodings of graphics images and so forth. In news
+ in particular, the Encoding header keeps the structural information
+ within the (usually concealed) article header, without affecting the
+ visual presentation by simple news-reading software.
+
+2. The Encoding Field
+
+ The Encoding field consists of one or more subfields, separated by
+ commas. Each subfield corresponds to a part of the message, in the
+ order of that part's appearance. A subfield consists of a line count
+ and a keyword or a series of nested keywords defining the encoding.
+ The line count is optional in the last subfield.
+
+2.1 Format of the Encoding Field
+
+ The format of the Encoding field is:
+
+ [ <count> <keyword> [ <keyword> ]* , ]*
+ [ <count> ] <keyword> [ <keyword> ]*
+
+ where:
+
+ <count> := a decimal integer
+ <keyword> := a single alphanumeric token starting with an alpha
+
+
+2.2 <count>
+
+ The line count is a decimal number specifying the number of text
+ lines in the part. Parts are separated by a blank line, which is not
+ included in the count of either the preceding or following part.
+ Blank lines consist only of CR/LF. Count may be zero; it must be
+ non-negative.
+
+ It is always possible to determine if the count is present because a
+ count always begins with a digit and a keyword always begins with a
+ letter.
+
+ The count is not required on the last or only part. A multi-part
+ message that consists of only one part is thus identical to a
+ single-part message.
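+
+ A minimal sketch of the counting rule above, assuming the body has
+ already been split into lines with their CR/LF terminators removed.
+ The function name split_parts and the use of None for an omitted
+ final count are assumptions made for illustration.
+
```python
def split_parts(body_lines, counts):
    """Split message body lines into parts using per-part line counts.

    counts holds one entry per part; the last entry may be None,
    meaning "all remaining lines".
    """
    parts = []
    pos = 0
    for i, count in enumerate(counts):
        last = i == len(counts) - 1
        if count is None:
            if not last:
                raise ValueError("count may be omitted only on the last part")
            count = len(body_lines) - pos
        if pos + count > len(body_lines):
            raise ValueError("body is shorter than the declared counts")
        parts.append(body_lines[pos:pos + count])
        pos += count
        if not last:
            # One blank separator line sits between parts and is counted
            # in neither the preceding nor the following part.
            if pos >= len(body_lines) or body_lines[pos] != "":
                raise ValueError("expected a blank line between parts")
            pos += 1
    return parts
```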
+
+2.3 <keyword>
+
+ Keyword defines the encoding type. The keyword is a common single-
+ word name for the encoding type and is not case-sensitive.
+
+ Encoding: 107 Text
+
+2.3.1 Nested Keywords
+
+ Nested keywords are a series of keywords defining a multi-encoded
+ message part. The encoding keywords may either be an actual series
+ of encoding steps the encoder used to generate the message part or
+ may merely be used to more precisely identify the type of encoding
+ (as in the use of the keyword "Signature").
+
+ Nested keywords are parsed and generated from left to right. The
+ order is significant. A decoding application would process the list
+ from left to right, whereas an encoder would process the Internet
+ message and generate the nested keywords in the reverse order of the
+ actual encoding process.
+
+ Encoding: 458 uuencode LZW tar (Unix binary object)
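+
+ The left-to-right decoding order can be sketched as a chain of
+ decoding steps. The table below is an assumption for illustration:
+ "hex" follows section 3.3, while "X-zlib" is a hypothetical
+ implementation-specific keyword standing in for a compression step.
+
```python
import zlib

# Hypothetical registry mapping lower-cased keywords to decoding steps.
DECODERS = {
    "hex": lambda data: bytes.fromhex(data.decode("ascii")),
    "x-zlib": zlib.decompress,  # illustrative "X-" keyword, not registered
}

def decode_nested(keywords, data):
    """Apply the decoders named by keywords from left to right."""
    for keyword in keywords:
        step = DECODERS.get(keyword.lower())
        if step is None:
            raise KeyError("no decoder for keyword %r" % keyword)
        data = step(data)
    return data

def encode_nested(keywords, data):
    """Encode in the reverse order: the rightmost keyword is applied first."""
    encoders = {
        "hex": lambda raw: raw.hex().encode("ascii"),
        "x-zlib": zlib.compress,
    }
    for keyword in reversed(keywords):
        data = encoders[keyword.lower()](data)
    return data
```
+
+ With the keyword list "hex X-zlib", the encoder compresses first and
+ hex-encodes second, and the decoder undoes hex first and the
+ compression second.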
+
+2.4 Comments
+
+ Comments enclosed in parentheses may be inserted anywhere in the
+ encoding field. Mail reading systems may pass the comments to their
+ clients. Comments must not be used by mail reading systems for
+ content interpretation. Other parameters defining the type of
+ encoding must be contained within the body portion of the Internet
+ message or be implied by a keyword in the encoding field.
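+
+ A minimal sketch of a parser for the field described in this
+ section. Two points are assumptions rather than rules stated above:
+ comments are allowed to nest, as in RFC 822, and hyphens are accepted
+ inside keywords because keywords such as "EDI-X12" and "PEM-Clear"
+ contain them. The names strip_comments and parse_encoding are
+ illustrative.
+
```python
import re

_COUNT = re.compile(r"[0-9]+")
_KEYWORD = re.compile(r"[A-Za-z][A-Za-z0-9-]*")

def strip_comments(value):
    """Replace each parenthesised comment with a single space."""
    out, depth = [], 0
    for ch in value:
        if ch == "(":
            if depth == 0:
                out.append(" ")
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
    return "".join(out)

def parse_encoding(value):
    """Return one (count, [keywords]) tuple per body part.

    count is None when the (optional) count of the last part is absent.
    Keywords are lower-cased because they are not case-sensitive.
    """
    subfields = [s.split() for s in strip_comments(value).split(",")]
    parts = []
    for i, tokens in enumerate(subfields):
        count = None
        if tokens and _COUNT.fullmatch(tokens[0]):
            count = int(tokens.pop(0))
        elif i < len(subfields) - 1:
            raise ValueError("line count required on all but the last part")
        if not tokens or not all(_KEYWORD.fullmatch(t) for t in tokens):
            raise ValueError("expected one or more keywords")
        parts.append((count, [t.lower() for t in tokens]))
    return parts
```
+
+ For example, parse_encoding("7 Text (Return Reason), Message
+ (Returned Mail)") returns [(7, ['text']), (None, ['message'])].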
+
+3. Encodings
+
+ This section describes some of the defined encodings used. An
+ alphabetical listing is provided in Section 6.
+
+ As with the other keyword-defined parts of the header format
+ standard, new keywords are expected and welcomed. Several basic
+ principles should be followed in adding encodings. The keyword
+ should be the most common single word name for the encoding,
+ including acronyms if appropriate. The intent is that different
+ implementors will be likely to choose the same name for the same
+ encoding. Keywords should not be too general: "binary" would have
+ been a bad choice for the "hex" encoding.
+
+ The encoding should be as free from unnecessary idiosyncrasies as
+ possible, except when conforming to an existing standard, in which
+ case there is nothing that can be done.
+
+ The encoding should, if possible, use only the 7 bit ASCII printing
+ characters if it is a complete transformation of a source document
+ (e.g., "hex" or "uuencode"). If it is essentially a text format, the
+ full range may be used. If there is an external standard, the
+ character set may already be defined. Keywords beginning with "X-"
+ are permanently reserved to implementation-specific use. No standard
+ registered encoding keyword will ever begin with "X-".
+
+ New encoding keywords which are not reserved for implementation-
+ specific use must be registered with the Internet Assigned Numbers
+ Authority (IANA). Refer to section 3.17 for additional information.
+
+3.1 Text
+
+ This indicates that the message is in no particular encoded format,
+ but is to be presented to the user as-is.
+
+ The text is ISO-10646-UTF-1 [3]. As specified in STD 10, RFC 821
+ [10], the message is expected to consist of lines of reasonable
+ length (less than or equal to 1000 characters).
+
+ On some older implementations of mail and news, only the 7 bit subset
+ of ISO-10646-UTF-1 can be used. This is identical to the ASCII 7 bit
+ code. On some mail transports that are not compliant with STD 10,
+ RFC 821 [10], line length may be restricted by the service.
+
+ Text may be followed by a nested keyword to define the encoded part
+ further, e.g., "signature":
+
+ Encoding: 496 Text, 8 Text Signature
+
+ An automated file sending service may find this useful, for example,
+ to identify and ignore the signature area when parsing the body of a
+ message for file requests.
+
+3.2 Message
+
+ This encoding indicates that the body part is itself in the format of
+ an Internet message, with its own header part and body part(s). A
+ "message" body part's message header may be a full Internet message
+ header or it may consist only of an Encoding field.
+
+ Using the message encoding on returned mail makes it practical for a
+ mail reading system to implement a reliable automatic resending
+ function, if the mailer generates it when returning contents. It is
+ also useful in a "copy append" MUA (mail user agent) operation.
+
+ MTAs (mail transfer agents) returning mail should generate an
+ Encoding header. Note that this does not require any parsing or
+ transformation of the returned message; the message is simply
+ appended un-modified; MTAs are prohibited from modifying the content
+ of messages.
+
+ Encoding: 7 Text (Return Reason), Message (Returned Mail)
+
+3.3 Hex
+
+ The encoding indicates that the body part contains binary data,
+ encoded as 2 hexadecimal digits per byte, highest significant nibble
+ first.
+
+ Lines consist of an even number of hexadecimal digits. Blank lines
+ are not permitted. The decode process must accept lines with between
+ 2 and 1000 characters, inclusive.
+
+ The Hex encoding is provided as a simple method of encoding small
+ binary objects.
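+
+ A minimal sketch of the Hex rules above. The function names, the
+ default line width of 72 digits and the choice of upper-case digits
+ on output are assumptions; the text does not prescribe them.
+
```python
import re

_HEX_LINE = re.compile(r"(?:[0-9A-Fa-f]{2})+")

def hex_encode(data, width=72):
    """Encode bytes as lines of hex digits, high nibble first."""
    if width < 2 or width > 1000 or width % 2:
        raise ValueError("width must be an even number from 2 to 1000")
    digits = data.hex().upper()
    return [digits[i:i + width] for i in range(0, len(digits), width)]

def hex_decode(lines):
    """Decode lines of hex digits, rejecting blank or odd-length lines."""
    out = bytearray()
    for line in lines:
        if not 2 <= len(line) <= 1000 or not _HEX_LINE.fullmatch(line):
            raise ValueError("invalid hex line: %r" % line)
        out += bytes.fromhex(line)
    return bytes(out)
```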
+
+3.4 EVFU
+
+ EVFU (electronic vertical format unit) specifies that each line
+ begins with a one-character "channel selector". The original purpose
+ was to select a channel on a paper tape loop controlling the printer.
+
+ This encoding is sometimes called "FORTRAN" format. It is the
+ default output format of FORTRAN programs on a number of computer
+ systems.
+
+ The legal characters are '0' to '9', '+', '-', and space. These
+ correspond to the 12 rows (and absence of a punch) on a printer
+ control tape (used when the control unit was electromechanical).
+
+ The channels that have generally agreed definitions are:
+
+ 1        advances to the first print line on the next page
+ 0        skip a line, i.e., double-space
+ +        over-print the preceding line
+ -        skip 2 lines, i.e., triple-space
+ (space)  print on the next line, single-space
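+
+ The channel selectors with agreed definitions can be interpreted as
+ in the following sketch. It is an illustration only: the function
+ name is hypothetical, a form feed stands in for "next page", a bare
+ carriage return stands in for over-printing, and selectors without an
+ agreed definition are treated like a space.
+
```python
def evfu_to_text(lines):
    """Convert EVFU lines (channel selector + text) to output lines."""
    out = []
    for line in lines:
        channel, text = (line[:1] or " "), line[1:]
        if channel == "1":
            out.append("\f" + text)        # first print line of next page
        elif channel == "0":
            out.extend(["", text])         # skip one line (double-space)
        elif channel == "-":
            out.extend(["", "", text])     # skip two lines (triple-space)
        elif channel == "+" and out:
            out[-1] += "\r" + text         # over-print the preceding line
        else:
            out.append(text)               # single-space (assumed default)
    return out
```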
+
+3.5 EDI-X12 and EDIFACT
+
+ The EDI-X12 and EDIFACT keywords indicate that the message or part is
+ an EDI (Electronic Data Interchange) business document, formatted
+ according to ANSI X12 or the EDIFACT standard.
+
+ A message containing a note and 2 X12 purchase orders might have an
+ encoding of:
+
+ Encoding: 17 TEXT, 146 EDI-X12, 69 EDI-X12
+
+3.6 FS
+
+ The FS (File System) keyword specifies a section consisting of
+ encoded file system objects. This encoding method (defined in
+ section 4) allows the moving of a structured set of files from one
+ environment to another while preserving all common elements.
+
+3.7 LZJU90
+
+ The LZJU90 keyword specifies a section consisting of an encoded
+ binary or text object. The encoding (defined in section 5) provides
+ both compression and representation in a text format.
+
+3.8 LZW
+
+ The LZW keyword specifies a section consisting of the data produced
+ by the Unix compress program.
+
+3.9 UUENCODE
+
+ The uuencode keyword specifies a section consisting of the output of
+ the uuencode program supplied as part of uucp.
+
+3.10 PEM and PEM-Clear
+
+ The PEM and PEM-Clear keywords indicate that the section is encrypted
+ with the methods specified in RFCs 1421-1424 [4,5,6,7] or uses the
+ MIC-Clear encapsulation specified therein.
+
+ A simple text object encrypted with PEM has the header:
+
+ Encoding: PEM Text
+
+ Note that while this indicates that the text resulting from the PEM
+ decryption is ISO-10646-UTF-1 text, the present version of PEM
+ further restricts this to only the 7 bit subset. A future version of
+ PEM may lift this restriction.
+
+ If the object resulting from the decryption starts with Internet
+ message header(s), the encoding is:
+
+ Encoding: PEM Message
+
+ This is useful to conceal both the encoding within and the headers
+ not needed to deliver the message (such as Subject:).
+
+ PEM does not provide detached signatures, but rather provides the
+ MIC-Clear mode to send messages with integrity checks that are not
+ encrypted. In this mode, the keyword PEM-Clear is used:
+
+ Encoding: PEM-Clear EDIFACT
+
+ The example being a non-encrypted EDIFACT transaction with a digital
+ signature. With the proper selection of PEM parameters and
+ environment, this can also provide non-repudiation, but it does not
+ provide confidentiality.
+
+ Decoders that are capable of decrypting PEM treat the two keywords in
+ the same way, using the contained PEM headers to distinguish the
+ mode. Decoders that do not understand PEM can use the PEM-Clear
+ keyword as a hint that it may be useful to treat the section as text,
+ or even continue the decode sequence after removing the PEM headers.
+
+ When Encoding is used for PEM, the RFC934 [9] encapsulation specified
+ in RFC1421 is not used.
+
+3.11 PGP
+
+ The PGP keyword indicates that the section is encrypted using the
+ Pretty Good Privacy specification, or is a public key block, keyring,
+ or detached signature meaningful to the PGP program. (These objects
+ are distinguished by internal information.)
+
+ The keyword actually implies 3 different transforms: a compression
+ step, the encryption, and an ASCII encoding. These transforms are
+ internal to the PGP encoder/decoder. A simple text message encrypted
+ with PGP is specified by:
+
+ Encoding: PGP Text
+
+ An EDI transaction using ANSI X12 might be:
+
+ Encoding: 176 PGP EDI-X12
+
+ Since an eavesdropper can still "see" the nested type (Text or EDI in
+ these examples), thus making information available to traffic
+ analysis which is undesirable in some applications, the sender may
+ prefer to use:
+
+ Encoding: PGP Message
+
+ As discussed in the description of the Message keyword, the enclosed
+ object may have a complete header or consist only of an Encoding:
+ header describing its content.
+
+ When PGP is used to transmit an encoded key or keyring, with no
+ object significant to the mail user agent as a result of the decoding
+ (e.g., text to display), the keyword is used by itself.
+
+ Another case of the PGP keyword occurs in "clear-signing" a message.
+ That is, sending an un-encrypted message with a digital signature
+ providing authentication and (in some environments) non-deniability.
+
+ Encoding: 201 Text, 8 PGP Signature, 4 Text Signature
+
+ This example indicates a 201 line message, followed by an 8 line (in
+ its encoded form) PGP detached signature. The processing of the PGP
+ section is expected (in this example) to result in a text object that
+ is to be treated by the receiver as a signature, possibly something
+ like:
+
+ [PGP signed Ariel@Process.COM Robert L Ullmann VALID/TRUSTED]
+
+ Note that the PGP signature algorithm is applied to the encoded form
+ of the clear-text section, not the object(s) before encoding (which
+ would be quite difficult for encodings like tar or FS). Continuing
+ the example, the PGP signature is then followed by a 4 line
+ "ordinary" signature section.
+
+3.12 Signature
+
+ The signature keyword indicates that the section contains an Internet
+ message signature. An Internet message signature is an area of an
+ Internet message (usually located at the end) which contains a single
+ line or multiple lines of characters. The signature may comprise the
+ sender's name or a saying the sender is fond of. It is normally
+ inserted automatically in all outgoing message bodies. The encoding
+ keyword "Signature" must always be nested and follow another keyword.
+
+ Encoding: 14 Text, 3 Text Signature
+
+ A usenet news posting program should generate an encoding showing
+ which is the text and which is the signature area of the posted
+ message.
+
+3.13 TAR
+
+ The tar keyword specifies a section consisting of the output of the
+ tar program supplied as part of Unix.
+
+3.14 PostScript
+
+ The PostScript keyword specifies a section formatted according to the
+ PostScript [8] computer program language definition. PostScript is a
+ registered trademark of Adobe Systems Inc.
+
+3.15 SHAR
+
+ The SHAR keyword specifies a section encoded in shell archive format.
+ Use of shar, although supported, is not recommended.
+
+ WARNING: Because the shell archive may contain commands you may not
+ want executed, the decoder should not automatically execute decoded
+ shell archived statements. This warning also applies to any future
+ types that include commands to be executed by the receiver.
+
+3.16 Uniform Resource Locator
+
+ The URL keyword indicates that the section consists of zero or more
+ references to resources of some type. URL provides a facility to
+ include by reference arbitrary external resources from various
+ sources in the Internet. The specification of URL is a work in
+ progress in the URI working group of the IETF.
+
+3.17 Registering New Keywords
+
+ New encoding keywords which are not reserved for implementation-
+ specific use must be registered with the Internet Assigned Numbers
+ Authority (IANA). IANA acts as a central registry for these values.
+ IANA may reject or modify the keyword registration request if it does
+ not meet the criteria as specified in section 3. Keywords beginning
+ with "X-" are permanently reserved to implementation-specific use.
+ IANA will not register an encoding keyword that begins with "X-".
+ Registration requests should be sent via electronic mail to IANA as
+ follows:
+
+ To: IANA@isi.edu
+ Subject: Registration of a new EHF-MAIL Keyword
+
+ The mail message must specify the keyword for the encoding and
+ acronyms if appropriate. Documentation defining the keyword and its
+ proposed purpose must be included. The documentation must either
+ reference an external non-Internet standards document or an existing
+ or soon to be RFC. If applicable, the documentation should contain a
+ draft version of the future RFC. The draft must be submitted as an
+ RFC according to the normal procedure within a reasonable amount of
+ time after the keyword's registration has been approved.
+
+4. FS (File System) Object Encoding
+
+ The file system encoding provides a standard, transportable encoding
+ of file system objects from many different operating systems. The
+ intent is to allow the moving of a structured set of files from one
+ environment to another while preserving common elements. At the same
+ time, files can be moved within a single environment while preserving
+ all attributes.
+
+ The representations consist of a series of nested sections, with
+ attributes defined at the appropriate levels. Each section begins
+ with an open bracket "[" followed by a directive keyword and ends
+ with a close bracket "]". Attributes are lines, beginning with a
+ keyword. Lines which begin with a LWSP (linear white space)
+ character are continuation lines.
+
+ Any string-type directive or attribute may be a simple string not
+ starting with a quotation mark ( " ) and not containing special
+ characters (e.g. newline) or LWSP (space and tab). The string name
+ begins with the first non-LWSP character on the line following the
+ attribute or directive keyword and ends with the last non-LWSP
+ character.
+
+ Otherwise, the character string name is enclosed in quotes. The
+ string itself contains characters in ISO-10646-UTF-1 but is quoted
+ and escaped at octet level (as elsewhere in RFC822 [2]). The strings
+ begin and end with a quotation mark ( " ). Octets equal to quote in
+ the string are escaped, as are octets equal to the escape characters
+ (\" and \\). The escaped octets may be part of a UTF multi-octet
+ character. Octets that are not printable are escaped with \nnn octal
+ representation. When an escape (\) occurs at the end of a line, the
+ escape, the end of the line, and the first character of the next
+ line, which must be one of the LWSP characters, are removed
+ (ignored).
+
+ [ file Simple-File.Name
+
+ [ file " Long file name starting with spaces and having a couple\
+ [sic] of nasties in it like this newline\012near the end."
+
+ Note that in the above example, there is one space (not two) between
+ "couple" and "[sic]". The encoder may choose to use the \nnn sequence
+ for any character that might cause trouble. Refer to section 5.1 for
+ line length recommendations.
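+
+ The quoting rules above can be sketched at octet level as follows.
+ The function names are hypothetical, and the sketch works on raw
+ octets so that it makes no assumption about how the UTF octets were
+ produced. Treating octets outside 0x20 to 0x7E as "not printable" is
+ also an assumption.
+
```python
import re

_OCTAL = re.compile(r"[0-7]{3}")
_CONTINUATION = re.compile(r"\r?\n[ \t]")

def fs_quote(octets):
    """Return the quoted form of a byte string."""
    out = ['"']
    for b in octets:
        if b in (0x22, 0x5C):              # quote and backslash
            out.append("\\" + chr(b))
        elif 0x20 <= b <= 0x7E:
            out.append(chr(b))
        else:
            out.append("\\%03o" % b)       # \nnn octal escape
    out.append('"')
    return "".join(out)

def fs_unquote(text):
    """Decode a quoted string, including escaped line continuations."""
    if len(text) < 2 or not (text.startswith('"') and text.endswith('"')):
        raise ValueError("not a quoted string")
    body, out, i = text[1:-1], bytearray(), 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ord(ch))
            i += 1
            continue
        rest = body[i + 1:]
        if rest[:1] in ('"', "\\"):
            out.append(ord(rest[0]))
            i += 2
        elif _OCTAL.match(rest):
            out.append(int(rest[:3], 8))
            i += 4
        else:
            # Escape at end of line: drop the escape, the line ending
            # and the first (LWSP) character of the next line.
            m = _CONTINUATION.match(rest)
            if not m:
                raise ValueError("invalid escape sequence")
            i += 1 + m.end()
    return bytes(out)
```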
+
+4.1 Sections
+
+ A section starts with an open bracket, followed by a keyword that
+ defines the type of section.
+
+ The section keywords are:
+
+ directory
+ entry
+ file
+ segment
+ data
+
+ The encoding may start with either a file, directory or entry. A
+ directory section may contain zero or more file, entry, and directory
+ sections. A file section contains a data section or zero or more
+ segment sections. A segment section contains a data section or zero
+ or more segment sections.
+
+4.1.1 Directory
+
+ This indicates the start of a directory. There is one parameter, the
+ entry name of the directory:
+
+ [ directory foo
+ ...
+ ]
+
+4.1.2 Entry
+
+ The entry keyword represents an entry in a directory that is not a
+ file or a sub-directory. Examples of entries are soft links in Unix,
+ or access categories in Primos. A Primos access category might look
+ like this:
+
+ [ entry SYS.ACAT
+ type ACAT
+ created 27 Jan 1987 15:31:04.00
+ acl SYADMIN:* ARIEL:DALURWX $REST:
+ ]
+
+4.1.3 File
+
+ The file keyword is followed by the entry name of the file. The
+ section then continues with attributes, possibly segments, and then
+ data.
+
+ [ file MY.FILE
+ created 27 Feb 1987 12:10:20.07
+ modified 27 Mar 1987 16:17:03.02
+ type DAM
+ [ data LZJU90
+ * LZJU90
+ ...
+ ]]
+
+4.1.4 Segment
+
+ This is used to define segments of a file. It should only be used
+ when encoding files that are actually segmented. The optional
+ parameter is the number or name of the segment.
+
+ When encoding Macintosh files, the two forks of the file are treated
+ as segments:
+
+ [ file A.MAC.FILE
+ display "A Mac File"
+ type MAC
+ comment "I created this myself"
+ ...
+ [ segment resource
+ [ data ...
+ ...
+ ]]
+ [ segment data
+ [ data ...
+ ...
+ ]]]
+
+4.1.5 Data
+
+ The data section contains the encoded data of the file. The encoding
+ method is defined in section 5. The data section must be last within
+ the containing section.
+
+4.2 Attributes
+
+ Attributes may occur within file, entry, directory, and segment
+ sections. Attributes must occur before sub-sections.
+
+ The attribute directives are:
+
+ display
+ comment
+ type
+ created
+ modified
+ accessed
+ owner
+ group
+ acl
+ password
+ block
+ record
+ application
+
+4.2.1 Display
+
+ This indicates the display name of the object. Some systems, such as
+ the Macintosh, use a different form of the name for matching or
+ uniqueness.
+
+4.2.2 Comment
+
+ This contains an arbitrary comment on the object. The Macintosh
+ stores this attribute with the file.
+
+4.2.3 Type
+
+ The type of an object is usually of interest only to the operating
+ system that the object was created on.
+
+ Types are:
+
+ ACAT    access category (Primos)
+ CAM     contiguous access method (Primos)
+ DAM     direct access method (Primos)
+ FIXED   fixed length records (VMS)
+ FLAT    `flat file', sequence of bytes (Unix, DOS, default)
+ ISAM    indexed-sequential access method (VMS)
+ LINK    soft link (Unix)
+ MAC     Macintosh file
+ SAM     sequential access method (Primos)
+ SEGSAM  segmented sequential access method (Primos)
+ SEGDAM  segmented direct access method (Primos)
+ TEXT    lines of ISO-10646-UTF-1 text ending with CR/LF
+ VAR     variable length records (VMS)
+
+4.2.4 Created
+
+ Indicates the creation date of the file. Dates are in the format
+ defined in section 4.3.
+
+4.2.5 Modified
+
+ Indicates the date and time the file was last modified or closed
+ after being open for write.
+
+4.2.6 Accessed
+
+ Indicates the date and time the file was last accessed on the
+ original file system.
+
+4.2.7 Owner
+
+ The owner directive gives the name or numerical ID of the owner or
+ creator of the file.
+
+4.2.8 Group
+
+ The group directive gives the name(s) or numerical IDs of the group
+ or groups to which the file belongs.
+
+4.2.9 ACL
+
+ This directive specifies the access control list attribute of an
+ object (the ACL attribute may occur more than once within an object).
+ The list consists of a series of pairs of IDs and access codes in the
+ format:
+
+ user-ID:access-list
+
+
+ There are four reserved IDs:
+
+ $OWNER the owner or creator
+ $GROUP a member of the group or groups
+ $SYSTEM a system administrator
+ $REST everyone else
+
+ The access list is zero or more single letters:
+
+ A add (create file)
+ D delete
+ L list (read directory)
+ P change protection
+ R read
+ U use
+ W write
+ X execute
+ * all possible access
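+
+ A minimal sketch of parsing an ACL value such as the one shown in
+ the Primos example of section 4.1.2. It assumes that pairs are
+ separated by white space and that user IDs are unquoted; the names
+ ACCESS_CODES and parse_acl are illustrative.
+
```python
ACCESS_CODES = {
    "A": "add (create file)",
    "D": "delete",
    "L": "list (read directory)",
    "P": "change protection",
    "R": "read",
    "U": "use",
    "W": "write",
    "X": "execute",
}

def parse_acl(value):
    """Map each user ID to the set of access letters granted to it."""
    acl = {}
    for pair in value.split():
        user, sep, codes = pair.partition(":")
        if not sep or not user:
            raise ValueError("expected user-ID:access-list, got %r" % pair)
        unknown = set(codes) - set(ACCESS_CODES) - {"*"}
        if unknown:
            raise ValueError("unknown access codes: %s" % sorted(unknown))
        # "*" stands for all possible access.
        acl[user] = set(ACCESS_CODES) if "*" in codes else set(codes)
    return acl
```
+
+ An empty access list, as in "$REST:", yields an empty set, that is,
+ no access.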
+
+4.2.10 Password
+
+ The password attribute gives the access password for this object.
+ Since the content of the object follows (being the raison d'etre of
+ the encoding), the appearance of the password in plain text is not
+ considered a security problem. If the password is actually set by
+ the decoder on a created object, the security (or lack) is the
+ responsibility of the application domain controlling the decoder as
+ is true of ACL and other protections.
+
+4.2.11 Block
+
+ The block attribute gives the block size of the file as a decimal
+ number of bytes.
+
+4.2.12 Record
+
+ The record attribute gives the record size of the file as a decimal
+ number of bytes.
+
+4.2.13 Application
+
+ This specifies the application that the file was created with or
+ belongs to. This is of particular interest for Macintosh files.
+
+4.3 Date Field
+
+ Various attributes are followed by an associated date and time.
+
+4.3.1 Syntax
+
+ The syntax of the date field is a combination of date, time, and
+ timezone:
+
+ DD Mon YYYY HH:MM:SS.FFFFFF [+-]HHMMSS
+
+ Date   := DD Mon YYYY      1 or 2 Digits " " 3 Alpha " " 4 Digits
+ DD     := Day              e.g. "08", " 8", "8"
+ Mon    := Month            "Jan" | "Feb" | "Mar" | "Apr" |
+                            "May" | "Jun" | "Jul" | "Aug" |
+                            "Sep" | "Oct" | "Nov" | "Dec"
+ YYYY   := Year
+ Time   := HH:MM:SS.FFFFFF  2 Digits ":" 2 Digits [ ":" 2 Digits
+                            [ "." 1 to 6 Digits ] ]
+                            e.g. 00:00:00, 23:59:59.999999
+ HH     := Hours            00 to 23
+ MM     := Minutes          00 to 59
+ SS     := Seconds          00 to 60 (60 only during a leap second)
+ FFFFFF := Fraction
+ Zone   := [+-]HHMMSS       "+" | "-" 2 Digits [ 2 Digits
+                            [ 2 Digits ] ]
+ HH     := Local Hour Offset
+ MM     := Local Minutes Offset
+ SS     := Local Seconds Offset
+
+4.3.2 Semantics
+
+ The date information is that which the file system has stored in
+ regard to the file system object. Date information is stored
+ differently and with varying degrees of precision by different
+ computer file systems. An encoder must include as much date
+ information as it has available concerning the file system object. A
+ decoder which receives an object encoded with a date field containing
+ greater precision than its own must disregard the excessive
+ information. Zone gives the offset of local time from Co-ordinated
+ Universal Time ("UTC", formerly called "Greenwich Mean Time"); that
+ is, it specifies the time zone of the file system object. It is
+ expressed as a signed [+-] two, four or six digit number.
+
+ A file that was created April 15, 1993 at 8:05 p.m. in Roselle Park,
+ New Jersey, U.S.A. might have a date field which looks like:
+
+ 15 Apr 1993 20:05:22.12 -0500
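+
+ As an illustration only, the Zone part of such a field could be
+ converted to a signed number of seconds as sketched below. The
+ function name zone_to_seconds is hypothetical and is not defined by
+ this memo; the sketch assumes the zone text has already been
+ separated from the date and time.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper, not part of the memo: convert a Zone value
   such as "-0500" into a signed offset in seconds from UTC.
   Returns 0 on success, -1 if the text is not [+-]HH[MM[SS]]. */
int zone_to_seconds(const char *zone, long *seconds)
{
    size_t i, n;
    long part[3] = { 0, 0, 0 };     /* hours, minutes, seconds */

    if (zone[0] != '+' && zone[0] != '-')
        return -1;
    n = strlen(zone + 1);
    if (n != 2 && n != 4 && n != 6)  /* two, four or six digits */
        return -1;
    for (i = 0; i < n; i++) {
        if (!isdigit((unsigned char)zone[1 + i]))
            return -1;
        part[i / 2] = part[i / 2] * 10 + (zone[1 + i] - '0');
    }
    *seconds = part[0] * 3600L + part[1] * 60L + part[2];
    if (zone[0] == '-')
        *seconds = -*seconds;
    return 0;
}
```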
+
+5. LZJU90: Compressed Encoding
+
+ LZJU90 is an encoding for a binary or text object to be sent in an
+ Internet mail message. The encoding provides both compression and
+ representation in a text format that will successfully survive
+ transmission through the many different mailers and gateways that
+ comprise the Internet and connected mail networks.
+
+5.1 Overview
+
+ The encoding first compresses the binary object, using a modified
+ LZ77 algorithm, called LZJU90. It then encodes each 6 bits of the
+ output of the compression as a text character, using a character set
+ chosen to survive any translations between codes, such as ASCII to
+ EBCDIC. The 64 six-bit strings 000000 through 111111 are represented
+ by the characters "+", "-", "0" to "9", "A" to "Z", and "a" to "z".
+ The output text begins with a line identifying the encoding. This
+ line is for visual reference only; the "Encoding:" field in the
+ header identifies the section to the user program. The line also
+ names the object that was encoded, usually by a file name.
+
+ The format of this line is:
+
+ * LZJU90 <name>
+
+
+ where <name> is optional. For example:
+
+ * LZJU90 vmunix
+
+ This is followed by the compressed and encoded data, broken into
+ lines where convenient. It is recommended that lines be broken every
+ 78 characters to survive mailers that incorrectly restrict line
+ length. The decoder must accept lines with 1 to 1000 characters on
+ each line. After this, there is one final line that gives the number
+ of bytes in the original data and a CRC of the original data. This
+ should match the byte count and CRC found during decompression.
+
+ This line has the format:
+
+ * <count> <CRC>
+
+
+ where <count> is a decimal number and <CRC> is 8 hexadecimal digits.
+ For example:
+
+ * 4128076 5AC2D50E
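+
+ A minimal sketch of reading this line follows. The function name
+ parse_trailer is hypothetical and does not appear in this memo; the
+ sketch assumes the line has already been read into a string.

```c
#include <stdio.h>

/* Hypothetical helper, not part of the memo: parse the final line
   "* <count> <CRC>". Returns 1 and fills in both values on success,
   0 if the line does not have that shape. */
int parse_trailer(const char *line, unsigned long *count,
                  unsigned long *crc)
{
    /* "*" must match literally; %8lx reads at most 8 hex digits */
    return sscanf(line, "* %lu %8lx", count, crc) == 2;
}
```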
+
+ The count used in the Encoding: field in the message header is the
+ total number of lines, including the start and end lines that begin
+ with *. A complete example is given in section 5.3.2.
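+
+ The character mapping given at the start of this overview can be
+ sketched as a pair of lookups. The names lzju_alphabet,
+ sixbit_to_char and char_to_sixbit are hypothetical; only the
+ ordering of the 64 characters is taken from the text.

```c
#include <string.h>

/* The 64 characters listed in the text, in order of six-bit value. */
static const char lzju_alphabet[] =
    "+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz";

/* Map a six-bit value (0..63) to its text character. */
char sixbit_to_char(unsigned v)
{
    return lzju_alphabet[v & 0x3F];
}

/* Map a text character back to its six-bit value, or -1 if the
   character is not in the alphabet. */
int char_to_sixbit(char ch)
{
    const char *p = (ch != '\0') ? strchr(lzju_alphabet, ch) : NULL;
    return p ? (int)(p - lzju_alphabet) : -1;
}
```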
+
+5.2 Specification of the LZJU90 compression
+
+ The Lempel-Ziv-Storer-Szymanski model of mixing pointers and literal
+ characters is used in the compression algorithm. Repeat occurrences
+ of strings of octets are replaced by pointers to the earlier
+ occurrence.
+
+ The data compression is defined by the decoding algorithm. Any
+ encoder that emits symbols which cause the decoder to produce the
+ original input is defined to be valid.
+
+ There are many possible strategies for the maximal-string matching
+ that the encoder does; section 5.3.1 gives the code for one such
+ algorithm. Regardless of which algorithm is used, and what tradeoffs
+ are made between compression ratio and execution speed or space, the
+ result can always be decoded by the simple decoder.
+
+ The compressed data consists of a mixture of unencoded literal
+ characters and copy pointers which point to an earlier occurrence of
+ the string to be encoded.
+
+ Compressed data contains two types of codewords:
+
+ LITERAL pass the literal directly to the uncompressed output.
+
+ COPY length, offset
+ go back offset characters in the output and copy length
+ characters forward to the current position.
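+
+ The effect of the two codewords on the output can be sketched as
+ follows. The names apply_literal and apply_copy are hypothetical,
+ and the sketch uses a flat output buffer rather than the circular
+ window of the decoder in section 5.3. Copying one byte at a time,
+ as that decoder does, lets a copy overlap the bytes it is writing.

```c
#include <stddef.h>

/* LITERAL: append one byte; returns the new output length. */
size_t apply_literal(unsigned char *out, size_t len, unsigned char c)
{
    out[len] = c;
    return len + 1;
}

/* COPY length, offset: go back offset bytes and copy length bytes
   forward to the current position. The caller must ensure that
   1 <= offset <= len and that out[] has room for length more bytes. */
size_t apply_copy(unsigned char *out, size_t len,
                  size_t length, size_t offset)
{
    size_t i;
    for (i = 0; i < length; i++)
        out[len + i] = out[len + i - offset];   /* byte by byte */
    return len + length;
}
```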
+
+ To distinguish between codewords, the copy length is used. Every
+ codeword begins with a copy length. A copy length of zero indicates
+ that a literal codeword follows (the next 8 bits are the literal
+ character). A copy length greater than zero indicates a copy
+ codeword, and the copy offset follows.
+
+ To improve copy length encoding, a threshold value of 2 has been
+ subtracted from the original copy length for copy codewords, because
+ the minimum copy length is 3 in this compression scheme.
+
+ The maximum offset value is set at 32255. Larger offsets offer
+ extremely low improvements in compression (less than 1 percent,
+ typically).
+
+ No special encoding is done on the LITERAL characters. However,
+ unary encoding is used for the copy length and copy offset values to
+ improve compression. A start-step-stop unary code is used.
+
+ A (start, step, stop) unary code of the integers is defined as
+ follows: The Nth codeword (counting from N = 0) has N ones followed
+ by a zero followed by a field of size START + (N * STEP). If the
+ field width is equal to STOP then the preceding zero is omitted. The
+ integers are laid out sequentially through these codewords. For
+ example, (0, 1, 4) would look like:
+
+ Codeword          Range
+
+ 0                 0
+ 10x               1-2
+ 110xx             3-6
+ 1110xxx           7-14
+ 1111xxxx          15-30
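+
+ The construction above can be sketched as a small routine that
+ writes a codeword as a string of '0' and '1' characters. The name
+ sss_encode is hypothetical and the routine is not the bit packer of
+ section 5.3.1; it is only meant to reproduce the table.

```c
/* Hypothetical helper, not part of the memo. Writes the
   (start, step, stop) codeword for value into bits[] as a string of
   '0' and '1' characters and returns its length, or -1 if value is
   out of range. Assumes stop - start is a multiple of step and that
   bits[] has room for the codeword plus a terminating NUL. */
int sss_encode(unsigned long value, int start, int step, int stop,
               char *bits)
{
    int width, ones, i, n = 0;

    for (width = start, ones = 0; width <= stop;
         width += step, ones++) {
        unsigned long span = 1UL << width;  /* integers in this group */
        if (value < span) {
            for (i = 0; i < ones; i++)
                bits[n++] = '1';
            if (width < stop)               /* zero omitted at STOP */
                bits[n++] = '0';
            for (i = width - 1; i >= 0; i--)
                bits[n++] = ((value >> i) & 1UL) ? '1' : '0';
            bits[n] = '\0';
            return n;
        }
        value -= span;                      /* move to the next group */
    }
    return -1;
}
```

+ Under the same assumptions, the (0, 1, 7) code applied to a stored
+ copy length value of 1 gives 100, and the (9, 1, 14) code applied
+ to offset 32255 gives nineteen 1 bits.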
+
+ Following are the actual values used for copy length and copy offset:
+
+ The copy length is encoded with a (0, 1, 7) code leading to a maximum
+ copy length of 256 by including the THRESHOLD value of 2.
+
+ Codeword          Range
+
+ 0                 0
+ 10x               3-4
+ 110xx             5-8
+ 1110xxx           9-16
+ 11110xxxx         17-32
+ 111110xxxxx       33-64
+ 1111110xxxxxx     65-128
+ 1111111xxxxxxx    129-256
+
+ The copy offset is encoded with a (9, 1, 14) code leading to a
+ maximum copy offset of 32255. Offset 0 is reserved as an end of
+ compressed data flag.
+
+ Codeword               Range
+
+ 0xxxxxxxxx             0-511
+ 10xxxxxxxxxx           512-1535
+ 110xxxxxxxxxxx         1536-3583
+ 1110xxxxxxxxxxxx       3584-7679
+ 11110xxxxxxxxxxxxx     7680-15871
+ 11111xxxxxxxxxxxxxx    15872-32255
+
+ The 0 has been chosen to signal the start of the field for ease of
+ encoding. (The bit generator can simply encode one more bit than is
+ significant in the binary representation of the excess.)
+
+ The stop values are useful in the encoding to prevent out of range
+ values for the lengths and offsets, as well as shortening some codes
+ by one bit.
+
+ The worst case compression using this scheme is a 1/8 increase in
+ size of the encoded data (one zero bit followed by 8 character bits
+ for every input byte). After the character encoding, which carries
+ 6 bits in each 8-bit character, the worst case ratio is
+ 9/8 * 8/6 = 3/2 to the original data.
+
+ The minimum copy length of 3 has been chosen because the worst case
+ copy length and offset is 3 bits (3) and 19 bits (32255) for a total
+ of 22 bits to encode a 3 character string (24 bits).
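+
+ The bit counts quoted above can be checked with a short sketch. The
+ function names length_bits and offset_bits are hypothetical; they
+ only count the bits of the (0, 1, 7) and (9, 1, 14) codewords and do
+ not produce any output.

```c
/* Bits in a copy length codeword: the (0, 1, 7) code applied to
   length - 2. A length of 0 stands for the single "0" bit that
   introduces a literal. Valid for length 0 and 3..256. */
int length_bits(int length)
{
    int v, width;

    if (length == 0)
        return 1;
    v = length - 2;                     /* stored value, 1..254 */
    for (width = 0; width < 7; width++) {
        if (v < (1 << width))
            return 2 * width + 1;       /* ones, a zero, the field */
        v -= 1 << width;
    }
    return 7 + 7;                       /* seven ones, 7-bit field */
}

/* Bits in a copy offset codeword: the (9, 1, 14) code.
   Valid for offset 0..32255. */
int offset_bits(long offset)
{
    int width;

    for (width = 9; width < 14; width++) {
        if (offset < (1L << width))
            return (width - 9) + 1 + width;
        offset -= 1L << width;
    }
    return 5 + 14;                      /* five ones, 14-bit field */
}
```

+ With these definitions a literal costs length_bits(0) + 8 = 9 bits,
+ and the most expensive three character copy costs
+ length_bits(3) + offset_bits(32255) = 22 bits.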
+
+5.3 The Decoder
+
+ As mentioned previously, the compression is defined by the decoder.
+ Any encoder that produces output that is correctly decoded is by
+ definition correct.
+
+ The following is an implementation of the decoder, written more for
+ clarity and as much portability as possible, rather than for maximum
+ speed.
+
+ When optimized for a specific environment, it will run significantly
+ faster.
+
+ /* LZJU 90 Decoding program */
+
+ /* Written By Robert Jung and Robert Ullmann, 1990 and 1991. */
+
+ /* This code is NOT COPYRIGHT, not protected. It is in the true
+    Public Domain. */
+
+ #include <stdio.h>
+ #include <string.h>
+
+ typedef unsigned char uchar;
+ typedef unsigned int uint;
+
+ #define N 32255             /* size of the output history buffer */
+ #define THRESHOLD 3         /* minimum copy length */
+
+ #define STRTP 9             /* (start, step, stop) for copy offsets */
+ #define STEPP 1
+ #define STOPP 14
+ #define STRTL 0             /* (start, step, stop) for copy lengths */
+ #define STEPL 1
+ #define STOPL 7
+
+ static FILE *in;
+ static FILE *out;
+
+ static int getbuf;          /* 16-bit input bit buffer */
+ static int getlen;          /* number of valid bits in getbuf */
+ static long in_count;
+ static long out_count;
+ static unsigned long crc;   /* unsigned: >> must not sign-extend */
+ static unsigned long crctable[256];
+ /* Adjacent string literals are used so that no indentation
+    whitespace ends up inside the table. */
+ static uchar xxcodes[] =
+     "+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
+     "abcdefghijklmnopqrstuvwxyz";
+ static uchar ddcodes[256];
+
+ static uchar text[N];
+
+ #define CRCPOLY 0xEDB88320UL
+ #define CRC_MASK 0xFFFFFFFFUL
+ #define UPDATE_CRC(crc, c) \
+     crc = crctable[((uchar)(crc) ^ (uchar)(c)) & 0xFF] \
+           ^ (crc >> 8)
+ #define START_RECD "* LZJU90"
+
+
+
+ void MakeCrctable(void)     /* Initialize CRC-32 table */
+ {
+     uint i, j;
+     unsigned long r;        /* unsigned so >> never sign-extends */
+     for (i = 0; i <= 255; i++) {
+         r = i;
+         for (j = 8; j > 0; j--) {
+             if (r & 1)
+                 r = (r >> 1) ^ CRCPOLY;
+             else
+                 r >>= 1;
+         }
+         crctable[i] = r;
+     }
+ }
+
+
+
+ int GetXX(void)             /* Get xxcode and translate */
+ {
+     int c;
+     do {
+         if ((c = fgetc(in)) == EOF)
+             c = 0;
+     } while (c == '\n');
+     in_count++;
+     return ddcodes[c];
+ }
+
+
+
+ int GetBit(void)            /* Get one bit from input buffer */
+ {
+     int c;
+     while (getlen <= 0) {
+         c = GetXX();
+         getbuf |= c << (10-getlen);
+         getlen += 6;
+     }
+     c = (getbuf & 0x8000) != 0;
+     getbuf <<= 1;
+     getbuf &= 0xFFFF;
+     getlen--;
+     return(c);
+ }
+
+
+
+ int GetBits(int len)        /* Get len bits */
+ {
+     int c;
+     while (getlen <= 10) {
+         c = GetXX();
+         getbuf |= c << (10-getlen);
+         getlen += 6;
+     }
+     if (getlen < len) {
+         c = (uint)getbuf >> (16-len);
+         getbuf = GetXX();
+         c |= getbuf >> (6+getlen-len);
+         getbuf <<= (10+len-getlen);
+         getbuf &= 0xFFFF;
+         getlen -= len - 6;
+     }
+     else {
+         c = (uint)getbuf >> (16-len);
+         getbuf <<= len;
+         getbuf &= 0xFFFF;
+         getlen -= len;
+     }
+     return(c);
+ }
+
+
+
+ int DecodePosition(void)    /* Decode offset position pointer */
+ {
+     int c;
+     int width;
+     int plus;
+     int pwr;
+     plus = 0;
+     pwr = 1 << STRTP;
+     for (width = STRTP; width < STOPP; width += STEPP) {
+         c = GetBit();
+         if (c == 0)
+             break;
+         plus += pwr;
+         pwr <<= 1;
+     }
+     if (width != 0)
+         c = GetBits(width);
+     c += plus;
+     return(c);
+ }
+
+
+
+ int DecodeLength(void)      /* Decode code length */
+ {
+     int c;
+     int width;
+     int plus;
+     int pwr;
+     plus = 0;
+     pwr = 1 << STRTL;
+     for (width = STRTL; width < STOPL; width += STEPL) {
+         c = GetBit();
+         if (c == 0)
+             break;
+         plus += pwr;
+         pwr <<= 1;
+     }
+     if (width != 0)
+         c = GetBits(width);
+     c += plus;
+     return(c);
+ }
+
+
+ void InitCodes(void)        /* Initialize decode table */
+ {
+     int i;
+     for (i = 0; i < 256; i++) ddcodes[i] = 0;
+     for (i = 0; i < 64; i++) ddcodes[xxcodes[i]] = i;
+     return;
+ }
+
+ int main(int ac, char **av)     /* main program */
+ {
+     int r;
+     int j, k;
+     int c;
+     int pos;
+     char buf[80];
+     unsigned long num;          /* CRC read from the final line */
+     long bytes;                 /* byte count read from the final line */
+
+     if (ac < 3) {
+         fprintf(stderr, "usage: judecode in out\n");
+         return(1);
+     }
+
+     in = fopen(av[1], "r");
+     if (!in) {
+         fprintf(stderr, "Can't open %s\n", av[1]);
+         return(1);
+     }
+
+     out = fopen(av[2], "wb");
+     if (!out) {
+         fprintf(stderr, "Can't open %s\n", av[2]);
+         fclose(in);
+         return(1);
+     }
+
+     /* Skip ahead to the "* LZJU90" start line */
+     while (1) {
+         if (fgets(buf, sizeof(buf), in) == NULL) {
+             fprintf(stderr, "Unexpected EOF\n");
+             return(1);
+         }
+         if (strncmp(buf, START_RECD, strlen(START_RECD)) == 0)
+             break;
+     }
+
+     in_count = 0;
+     out_count = 0;
+     getbuf = 0;
+     getlen = 0;
+
+     InitCodes();
+     MakeCrctable();
+
+     crc = CRC_MASK;
+     r = 0;
+
+     while (feof(in) == 0) {
+         c = DecodeLength();
+         if (c == 0) {           /* literal: next 8 bits are the byte */
+             c = GetBits(8);
+             UPDATE_CRC(crc, c);
+             out_count++;
+             text[r] = c;
+             fputc(c, out);
+             if (++r >= N)
+                 r = 0;
+         }
+         else {                  /* copy: c is the copy length - 2 */
+             pos = DecodePosition();
+             if (pos == 0)       /* offset 0 marks end of data */
+                 break;
+             pos--;
+             j = c + THRESHOLD - 1;
+             pos = r - pos - 1;
+             if (pos < 0)
+                 pos += N;
+             for (k = 0; k < j; k++) {
+                 c = text[pos];
+                 text[r] = c;
+                 UPDATE_CRC(crc, c);
+                 out_count++;
+                 fputc(c, out);
+                 if (++r >= N)
+                     r = 0;
+                 if (++pos >= N)
+                     pos = 0;
+             }
+         }
+     }
+
+     fgetc(in);      /* skip newline */
+
+     if (fscanf(in, "* %ld %lX", &bytes, &num) != 2) {
+         fprintf(stderr, "CRC record not found\n");
+         return(1);
+     }
+     else if (crc != num) {
+         /* expected = value in the file, found = value computed */
+         fprintf(stderr,
+                 "CRC error, expected %lX, found %lX\n",
+                 num, crc);
+         return(1);
+     }
+     else if (bytes != out_count) {
+         fprintf(stderr,
+                 "File size error, expected %ld, found %ld\n",
+                 bytes, out_count);
+         return(1);
+     }
+     else
+         fprintf(stderr,
+                 "File decoded to %ld bytes correctly\n",
+                 out_count);
+
+     fclose(in);
+     fclose(out);
+     return(0);
+ }
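+
+ The CRC accumulated by the decoder above can be isolated as the
+ following sketch. The name lzju_crc is hypothetical. The bit-by-bit
+ loop is a substitute for the table lookup used in the program and
+ should give the same value; as in the program, no final inversion
+ is applied, so the result is the complement of the widely used
+ CRC-32 value for the same data.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the memo: CRC over len bytes,
   starting from 0xFFFFFFFF, using the reflected polynomial
   0xEDB88320, with no final inversion. */
unsigned long lzju_crc(const unsigned char *data, size_t len)
{
    unsigned long crc = 0xFFFFFFFFUL;
    size_t i;
    int bit;

    for (i = 0; i < len; i++) {
        crc ^= data[i];
        for (bit = 0; bit < 8; bit++) {
            if (crc & 1UL)
                crc = (crc >> 1) ^ 0xEDB88320UL;
            else
                crc >>= 1;
        }
    }
    return crc & 0xFFFFFFFFUL;
}
```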
+
+
+5.3.1 An example of an Encoder
+
+ Many algorithms are possible for the encoder, with different
+ tradeoffs between speed, size, and complexity. The following is a
+ simple example program which is fairly efficient; more sophisticated
+ implementations will run much faster, and in some cases produce
+ somewhat better compression.
+
+ This example also shows that the encoder need not use the entire
+ window available. Not using the full window costs a small amount of
+ compression, but can greatly increase the speed of some algorithms.
+
+ /* LZJU 90 Encoding program */
+
+ /* Written By Robert Jung and Robert Ullmann, 1990 and 1991. */
+
+ /* This code is NOT COPYRIGHT, not protected. It is in the true
+    Public Domain. */
+
+ #include <stdio.h>
+
+ typedef unsigned char uchar;
+ typedef unsigned int uint;
+
+ #define N 24000         /* Size of window buffer */
+ #define F 256           /* Size of look-ahead buffer */
+ #define THRESHOLD 3
+ #define K 16384         /* Size of hash table */
+
+ #define STRTP 9
+ #define STEPP 1
+ #define STOPP 14
+
+ #define STRTL 0
+ #define STEPL 1
+ #define STOPL 7
+
+ #define CHARSLINE 78
+
+ static FILE *in;
+ static FILE *out;
+
+ static int putlen;
+ static int putbuf;
+ static int char_ct;
+ static long in_count;
+ static long out_count;
+ static unsigned long crc;   /* unsigned: >> must not sign-extend */
+ static unsigned long crctable[256];
+ /* Adjacent string literals are used so that no indentation
+    whitespace ends up inside the table. */
+ static uchar xxcodes[] =
+     "+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
+     "abcdefghijklmnopqrstuvwxyz";
+ uchar window_text[N + F + 1];
+ /* text contains window, plus 1st F of window again
+    (for comparisons) */
+
+ uint hash_table[K];
+ /* table of pointers into the text */
+
+ #define CRCPOLY 0xEDB88320UL
+ #define CRC_MASK 0xFFFFFFFFUL
+ #define UPDATE_CRC(crc, c) \
+     crc = crctable[((uchar)(crc) ^ (uchar)(c)) & 0xFF] \
+           ^ (crc >> 8)
+
+
+ void MakeCrctable(void)     /* Initialize CRC-32 table */
+ {
+     uint i, j;
+     unsigned long r;        /* unsigned so >> never sign-extends */
+     for (i = 0; i <= 255; i++) {
+         r = i;
+         for (j = 8; j > 0; j--) {
+             if (r & 1)
+                 r = (r >> 1) ^ CRCPOLY;
+             else
+                 r >>= 1;
+         }
+         crctable[i] = r;
+     }
+ }
+
+
+
+ void PutXX(int c)           /* Translate and put xxcode */
+ {
+     c = xxcodes[c & 0x3F];
+     if (++char_ct > CHARSLINE) {
+         char_ct = 1;
+         fputc('\n', out);
+     }
+     fputc(c, out);
+     out_count++;
+ }
+
+
+ void PutBits(int c, int len)    /* Put rightmost "len" bits of "c" */
+ {
+     c <<= 16 - len;
+     c &= 0xFFFF;
+     putbuf |= (uint) c >> putlen;
+     c <<= 16 - putlen;
+     c &= 0xFFFF;
+     putlen += len;
+     while (putlen >= 6) {
+         PutXX(putbuf >> 10);
+         putlen -= 6;
+         putbuf <<= 6;
+         putbuf &= 0xFFFF;
+         putbuf |= (uint) c >> 10;
+         c = 0;
+     }
+ }
+
+
+ void EncodePosition(int ch)     /* Encode offset position pointer */
+ {
+     int width;
+     int prefix;
+     int pwr;
+     pwr = 1 << STRTP;
+     for (width = STRTP; ch >= pwr; width += STEPP, pwr <<= 1)
+         ch -= pwr;
+     if ((prefix = width - STRTP) != 0)
+         PutBits(0xffff, prefix);
+     if (width < STOPP)
+         width++;
+     /* else if (width > STOPP)
+            abort(); do nothing */
+     PutBits(ch, width);
+ }
+
+
+ void EncodeLength(int ch)       /* Encode code length */
+ {
+     int width;
+     int prefix;
+     int pwr;
+     pwr = 1 << STRTL;
+     for (width = STRTL; ch >= pwr; width += STEPL, pwr <<= 1)
+         ch -= pwr;
+     if ((prefix = width - STRTL) != 0)
+         PutBits(0xffff, prefix);
+     if (width < STOPL)
+         width++;
+     /* else if (width > STOPL)
+            abort(); do nothing */
+     PutBits(ch, width);
+ }
+
+ int main(int ac, char **av)     /* main program */
+ {
+     uint r, s, i;
+     int c;                      /* int so that EOF is representable */
+     uchar *p, *rp;
+     int match_position;
+     int match_length;
+     int len;
+     uint hash, h;
+
+     if (ac < 3) {
+         fprintf(stderr, "usage: juencode in out\n");
+         return(1);
+     }
+
+     in = fopen(av[1], "rb");
+     if (!in) {
+         fprintf(stderr, "Can't open %s\n", av[1]);
+         return(1);
+     }
+
+     out = fopen(av[2], "w");
+     if (!out) {
+         fprintf(stderr, "Can't open %s\n", av[2]);
+         fclose(in);
+         return(1);
+     }
+
+     char_ct = 0;
+     in_count = 0;
+     out_count = 0;
+     putbuf = 0;
+     putlen = 0;
+     hash = 0;
+
+     MakeCrctable();
+     crc = CRC_MASK;
+
+     fprintf(out, "* LZJU90 %s\n", av[1]);
+
+     /* The hash table initialization is somewhat arbitrary */
+     for (i = 0; i < K; i++) hash_table[i] = i % N;
+
+     r = 0;
+     s = 0;
+
+     /* Fill lookahead buffer */
+
+     for (len = 0; len < F && (c = fgetc(in)) != EOF; len++) {
+         UPDATE_CRC(crc, c);
+         in_count++;
+         window_text[s++] = c;
+     }
+
+     while (len > 0) {
+         /* look for match in window at hash position */
+         h = ((((window_text[r] << 5) ^ window_text[r+1])
+               << 5) ^ window_text[r+2]);
+         p = window_text + hash_table[h % K];
+         rp = window_text + r;
+         for (i = 0, match_length = 0; i < F; i++) {
+             if (*p++ != *rp++) break;
+             match_length++;
+         }
+         match_position = r - hash_table[h % K];
+         if (match_position <= 0) match_position += N;
+
+         if (match_position > N - F - 2) match_length = 0;
+         if (match_position > in_count - len - 2)
+             match_length = 0;   /* ! :-) */
+
+         if (match_length > len)
+             match_length = len;
+         if (match_length < THRESHOLD) {
+             EncodeLength(0);
+             PutBits(window_text[r], 8);
+             match_length = 1;
+         }
+         else {
+             EncodeLength(match_length - THRESHOLD + 1);
+             EncodePosition(match_position);
+         }
+
+         for (i = 0; i < match_length &&
+                     (c = fgetc(in)) != EOF; i++) {
+             UPDATE_CRC(crc, c);
+             in_count++;
+             window_text[s] = c;
+             if (s < F - 1)
+                 window_text[s + N] = c;
+             if (++s > N - 1) s = 0;
+             hash = ((hash << 5) ^ window_text[r]);
+             if (r > 1) hash_table[hash % K] = r - 2;
+             if (++r > N - 1) r = 0;
+         }
+
+         while (i++ < match_length) {
+             if (++s > N - 1) s = 0;
+             hash = ((hash << 5) ^ window_text[r]);
+             if (r > 1) hash_table[hash % K] = r - 2;
+             if (++r > N - 1) r = 0;
+             len--;
+         }
+     }
+
+     /* end compression indicator */
+     EncodeLength(1);
+     EncodePosition(0);
+     PutBits(0, 7);
+
+     fprintf(out, "\n* %ld %08lX\n", in_count, crc);
+     fprintf(stderr, "Encoded %ld bytes to %ld symbols\n",
+             in_count, out_count);
+
+     fclose(in);
+     fclose(out);
+
+     return(0);
+ }
+
+
+5.3.2 Example LZJU90 Compressed Object
+
+ The following is an example of an LZJU90 compressed object. Using
+ this as source for the program in section 5.3 will reveal what it is.
+
+ Encoding: 7 LZJU90 Text
+
+ * LZJU90 example
+ 8-mBtWA7WBVZ3dEBtnCNdU2WkE4owW+l4kkaApW+o4Ir0k33Ao4IE4kk
+ bYtk1XY618NnCQl+OHQ61d+J8FZBVVCVdClZ2-LUI0v+I4EraItasHbG
+ VVg7c8tdk2lCBtr3U86FZANVCdnAcUCNcAcbCMUCdicx0+u4wEETHcRM
+ 7tZ2-6Btr268-Eh3cUAlmBth2-IUo3As42laIE2Ao4Yq4G-cHHT-wCEU
+ 6tjBtnAci-I++
+ * 190 081E2601
+
+
+6. Alphabetical Listing of Defined Encodings
+
+
+ Keyword          Description              Section    Reference(s)
+ _______          ___________              _______    ____________
+
+ EDIFACT          EDIFACT format           3.5
+ EDI-X12          EDI X12 format           3.5        ANSI X12
+ EVFU             FORTRAN format           3.4
+ FS               File System format       3.6, 4
+ Hex              Hex binary format        3.3
+ LZJU90           LZJU90 format            3.7, 5
+ LZW              LZW format               3.8
+ Message          Encapsulated Message     3.2        STD 11, RFC 822
+ PEM, PEM-Clear   Privacy Enhanced Mail    3.10       RFC 1421-1424
+ PGP              Pretty Good Privacy      3.11
+ Postscript       Postscript format        3.14       [8]
+ Shar             Shell Archive format     3.15
+ Signature        Signature                3.12
+ Tar              Tar format               3.13
+ Text             Text                     3.1        IS 10646
+ uuencode         uuencode format          3.9
+ URL              external URL-reference   3.16
+
+7. Security Considerations
+
+ Security of content and the receiving (decoding) system is discussed
+ in sections 3.10, 3.11, 3.15, and 4.2.10. The considerations
+ mentioned also apply to other encodings and attributes with similar
+ functions.
+
+8. References
+
+ [1] Robinson, D. and R. Ullmann, "Encoding Header Field for Internet
+ Messages", RFC 1154, Prime Computer, Inc., April 1990.
+
+ [2] Crocker, D., "Standard for the Format of ARPA Internet Text
+ Messages", STD 11, RFC 822, University of Delaware, August 1982.
+
+ [3] International Organization for Standardization, Information
+ Technology -- Universal Coded Character Set (UCS). ISO/IEC
+ 10646-1:1993, June 1993.
+
+ [4] Linn, J., "Privacy Enhancement for Internet Electronic Mail: Part
+ I: Message Encryption and Authentication Procedures" RFC 1421,
+ IAB IRTF PSRG, IETF PEM WG, February 1993.
+
+
+ [5] Kent, S., "Privacy Enhancement for Internet Electronic Mail: Part
+ II: Certificate-Based Key Management", RFC 1422, IAB IRTF PSRG,
+ IETF PEM, BBN, February 1993.
+
+ [6] Balenson, D., "Privacy Enhancement for Internet Electronic Mail:
+ Part III: Algorithms, Modes, and Identifiers", RFC 1423, IAB IRTF
+ PSRG, IETF PEM WG, TIS, February 1993.
+
+ [7] Kaliski, B., "Privacy Enhancement for Internet Electronic Mail:
+ Part IV: Key Certification and Related Services", RFC 1424, RSA
+ Laboratories, February 1993.
+
+ [8] Adobe Systems Inc., PostScript Language Reference Manual. 2nd
+ Edition, 2nd Printing, January 1991.
+
+ [9] Rose, M. and E. Stefferud, "Proposed Standard for Message
+ Encapsulation", RFC 934, Delaware and NMA, January 1985.
+
+ [10] Postel, J., "Simple Mail Transfer Protocol", STD 10, RFC 821,
+ USC/Information Sciences Institute, August 1982.
+
+ [11] Borenstein, N., and N. Freed, "MIME (Multipurpose Internet Mail
+ Extensions): Mechanisms for Specifying and Describing the Format
+ of Internet Message Bodies", RFC 1341, Bellcore, Innosoft, June
+ 1992.
+
+ [12] Borenstein, N., and M. Linimon, "Extension of MIME Content-Types
+ to a New Medium", RFC 1437, 1 April 1993.
+
+9. Acknowledgements
+
+ The authors would like to thank Robert Jung for his contributions to
+ this work, in particular the public domain sample code for LZJU90.
+
+10. Authors' Addresses
+
+ Albert K. Costanzo
+ AKC Consulting Inc.
+ P.O. Box 4031
+ Roselle Park, NJ 07204-0531
+
+ Phone: +1 908 298 9000
+ Email: AL@AKC.COM
+
+
+ David Robinson
+ Computervision Corporation
+ 100 Crosby Drive
+ Bedford, MA 01730
+
+ Phone: +1 617 275 1800 x2774
+ Email: DRB@Relay.CV.COM
+
+
+ Robert Ullmann
+
+ Phone: +1 617 247 7959
+ Email: ariel@world.std.com
+
+ \ No newline at end of file