Diffstat (limited to 'doc/rfc/rfc373.txt')
-rw-r--r-- | doc/rfc/rfc373.txt | 218 |
1 files changed, 218 insertions, 0 deletions
diff --git a/doc/rfc/rfc373.txt b/doc/rfc/rfc373.txt
new file mode 100644
index 0000000..fe75d79
--- /dev/null
+++ b/doc/rfc/rfc373.txt
@@ -0,0 +1,218 @@
+
+
+
+
+
+
+NWG/RFC #373                                                14 July 1972
+NIC 11058                                                          SU-AI
+
+
+                        ARBITRARY CHARACTER SETS
+
+                            by John McCarthy
+
+It would be nice to be able to have documents stored in computers that
+could include arbitrary characters and to be able to display them on
+any CRT screen, edit them using any keyboard, and print them on any
+printer. The object of this memorandum is to suggest how to get there
+from here with special reference to the ARPA network.
+
+Where are we now?
+
+   (1) At present, there is 96 character ASCII, and everyone agrees that
+   it should be included in any larger set.
+
+   (2) Many installations are dependent on 64 character sets which do not
+   even include the lower case latin alphabet.
+
+   (3) At the Stanford Artificial Intelligence Laboratory, we have a 114
+   character set that includes 96 character ASCII and which is
+   implemented in our keyboards, displays, and line printer
+
+   (4) Printers are becoming available that get their character designs
+   out of memory, for example, the Xerox XGP printer, one of which we are
+   getting.
+
+   (5) The IMLAC type display has the character designs in main memory so
+   that changing the displayed set is just a matter of reloading the
+   memory.
+
+   (6) Many display systems share the character generator among many
+   display units. In some of these, e.g. the Datadisc, arbitrary sets
+   are probably feasible (using kludgery to be described later), but in
+   other systems, e.g. our III's arbitrary sets are not feasible.
+
+One possible approach to communication in expanded character sets is
+to produce an expanded standard set of characters, perhaps using 8 or
+9 bits and expect new equipment to implement this set. This approach
+has the disadvantage that it will be very hard to get agreement on
+what the next step should be, and even if formal agreement is
+realized, many groups will find it in their interest to ignore the
+standard.
+
+
+
+
+
+                                                                [Page 1]
+
+NWG/RFC# 373 JMC 14-JUL-72 12:41 11058
+ARBITRARY CHARACTER SETS by John McCarthy
+
+Therefore, I would like to suggest that the next step be to arbitrary
+character sets. I suggest implementing this in the following way:
+
+   (1) There be established a registry of characters. Anyone can
+   register a new character. Each character has a unique number, 17 bits
+   should be enough even to include Chinese. Besides this, each
+   character has a name in ASCII usually mnemonic. Finally, the
+   character has a design which is a picture on a 50 by 50 dot matrix.
+
+   (2) Besides the registry of characters, there is a registry of
+   characters sets, which different groups are using for different
+   classes of documents. A registered character set has a registry
+   number and a table giving the correspondence between the character
+   codes as bit sequences and the registered character numbers.
+
+   (3) Associated with a document is a statement of the character code
+   used therein. This may be one of the registered codes or it may
+   contain in addition modifications described by an auxiliary table
+   giving the code correspondence with registered character numbers. A
+   character code may have an escape character that says that the next
+   character is described by its registry number. The statement of the
+   character code may be a header on the document or the receiver may
+   have to learn it by some other means, e.g. because its library
+   catalog entry contains this information.
+
+   (4) Devices such as printers and displays draw characters in different
+   ways and standardization doesn't seem feasible at present. Therefore,
+   it is necessary to provide a way of going from the standard
+   description of a character using a 50 by 50 dot matrix to whatever
+   method the device uses. This is up to the programmers who are
+   supporting the device. Some may choose to manually create files
+   describing how registered characters are implemented. They may find
+   it too much work to provide for all the characters and to update their
+   files when new characters are registered. Others will provide
+   programs for going from the registered descriptions to descriptions
+   compatible with their implementations. Perhaps most will hand tailor
+   the characters most used and provide a program for the others.
+
+
+
+
+
+
+
+
+
+
+
+                                                                [Page 2]
+
+NWG/RFC# 373 JMC 14-JUL-72 12:41 11058
+ARBITRARY CHARACTER SETS by John McCarthy
+
+   (5) The easiest device to handle is the line printer because it is
+   slow. At the beginning of the print job, the SPOOL program will look
+   up the character set and load the printer's memory with the character
+   designs used in the particular document. Sometimes, it may have to go
+   through the network to one of the computers that stores the registry
+   in order to find out what to do.
+
+   (6) Display systems that have a character memory for each display unit
+   can be handled in about the same way. Users will occasionally
+   experience delays when the display programs are surprised by
+   unfamiliar characters.
+
+   (7) Display systems that share character memories require more
+   complicated treatment. The object is to keep the memory large enough
+   to keep all the characters that the current set of users is using and
+   to handle the required table lookups from the different character
+   codes in a nice way. There will be limitations on the diversity of
+   character sets that can be in use simultaneously. Systems like the
+   Datadisc that only look up the character when it is first written can
+   be extended to work with large sets. Systems that have to look up
+   each character code 30 times per second in order to maintain the
+   display won't work so well.
+
+I have no special ideas about how to make keyboards adaptable to
+arbitrary sets. Each user may have to fend for himself.
+
+In this memorandum so far, I have ignored typography, i.e. the fact
+that in printed documents the same letter may be printed in many
+fonts. Perhaps, each character in each font will require a separate
+registered description, but with a constant difference between the
+numbers of the same character in different fonts. Installations will
+again have to decide what font distinctions they will implement.
+
+Some other issues that might be considered are whether means can be
+provided to adapt texts automatically to the line and page lengths of
+the different devices.
+
+It seems to me most likely that the typographical problems cannot be
+solved at this time, and it would be best to adopt conventions for
+registering character designs at this time, and leave typography for
+later.
+
+
+
+
+
+
+
+                                                                [Page 3]
+
+NWG/RFC# 373 JMC 14-JUL-72 12:41 11058
+ARBITRARY CHARACTER SETS by John McCarthy
+
+In my opinion, there is no real obstacle to establishing the registry
+in the ARPA network now, getting the standards organization to work,
+and being able to exchange documents in extended character sets as
+soon as the various installations can acquire the printers and display
+devices.
+
+It is the present policy of the Stanford Artificial Intelligence
+Laboratory to acquire no more devices that are wedded to fixed
+character sets.
+
+
+
+
+
+       [ This RFC was put into machine readable form for entry ]
+       [ into the online RFC archives by BBN Corp. under the   ]
+       [ direction of Alex McKenzie.                      1/97 ]
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+                                                                [Page 4]
+
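
The registry scheme proposed in items (1) to (3) of the added memo can be illustrated with a minimal sketch. It is not part of the committed file and is not taken from the memo itself. The names RegisteredCharacter, decode, code_table and escape_code are hypothetical, and so is the choice to store each design row as an integer bit mask. The memo does not say how a registry number following the escape character is laid out in the bit stream, so the sketch simply assumes it arrives as one integer item.

```python
from dataclasses import dataclass

REGISTRY_BITS = 17   # width the memo suggests for registry numbers
MATRIX_SIDE = 50     # designs are pictures on a 50 by 50 dot matrix


@dataclass(frozen=True)
class RegisteredCharacter:
    """One hypothetical registry entry: number, ASCII name, dot-matrix design."""
    number: int      # unique registry number, must fit in 17 bits
    name: str        # ASCII name, usually mnemonic
    design: tuple    # 50 rows; each row is an int used as a 50-bit mask (assumption)

    def __post_init__(self):
        if not 0 <= self.number < 2 ** REGISTRY_BITS:
            raise ValueError("registry number does not fit in 17 bits")
        if len(self.design) != MATRIX_SIDE:
            raise ValueError("design must have 50 rows")


def decode(codes, code_table, escape_code):
    """Translate a document's character codes into registry numbers.

    codes       -- iterable of integer character codes from the document
    code_table  -- dict mapping each code to a registered character number
    escape_code -- code meaning "the next item is itself a registry number"
    """
    numbers = []
    items = iter(codes)
    for code in items:
        if code == escape_code:
            try:
                # Assumption: the registry number follows as one integer item.
                numbers.append(next(items))
            except StopIteration:
                raise ValueError("escape code at end of document") from None
        else:
            numbers.append(code_table[code])
    return numbers


# Usage: a two-entry character set plus one escaped registry number.
table = {0x41: 65, 0x42: 66}
print(decode([0x41, 0xFF, 70000, 0x42], table, escape_code=0xFF))
```

With this table the call prints [65, 70000, 66]: the two ordinary codes go through the table, while the item after the escape code is passed through as a registry number.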