From 4bfd864f10b68b71482b35c818559068ef8d5797 Mon Sep 17 00:00:00 2001
From: Thomas Voss
Date: Wed, 27 Nov 2024 20:54:24 +0100
Subject: doc: Add RFC documents

---
 doc/rfc/rfc2731.txt | 1291 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 1291 insertions(+)
 create mode 100644 doc/rfc/rfc2731.txt

diff --git a/doc/rfc/rfc2731.txt b/doc/rfc/rfc2731.txt
new file mode 100644
index 0000000..1d1d194
--- /dev/null
+++ b/doc/rfc/rfc2731.txt
@@ -0,0 +1,1291 @@
+
+
+
+
+
+
+Network Working Group                                           J. Kunze
+Request for Comments: 2731                                   Dublin Core
+Category: Informational                              Metadata Initiative
+                                                           December 1999
+
+
+                 Encoding Dublin Core Metadata in HTML
+
+
+Status of this Memo
+
+   This memo provides information for the Internet community. It does
+   not specify an Internet standard of any kind. Distribution of this
+   memo is unlimited.
+
+Copyright Notice
+
+   Copyright (C) The Internet Society (1999). All Rights Reserved.
+
+1. Abstract
+
+   The Dublin Core [DC1] is a small set of metadata elements for
+   describing information resources. This document explains how these
+   elements are expressed using the META and LINK tags of HTML
+   [HTML4.0]. A sequence of metadata elements embedded in an HTML file
+   is taken to be a description of that file. Examples illustrate
+   conventions allowing interoperation with current software that
+   indexes, displays, and manipulates metadata, such as [SWISH-E],
+   [freeWAIS-sf2.0], [GLIMPSE], [HARVEST], [ISEARCH], etc., and the Perl
+   [PERL] scripts in the appendix.
+
+2. HTML, Dublin Core, and Non-Dublin Core Metadata
+
+   The Dublin Core (DC) metadata initiative [DCHOME] has produced a
+   small set of resource description categories [DC1], or elements of
+   metadata (literally, data about data). Metadata elements are
+   typically small relative to the resource they describe and may, if
+   the resource format permits, be embedded in it.
+   Two such formats are
+   the Hypertext Markup Language (HTML) and the Extensible Markup
+   Language (XML); HTML is currently in wide use, but once standardized,
+   XML [XML] in conjunction with the Resource Description Framework
+   [RDF] promises a significantly more expressive means of encoding
+   metadata. The [RDF] specification actually describes a way to use
+   RDF within an HTML document by adhering to an abbreviated syntax.
+
+
+Kunze                        Informational                      [Page 1]
+
+RFC 2731        Encoding Dublin Core Metadata in HTML      December 1999
+
+
+   This document explains how to encode metadata using HTML 4.0
+   [HTML4.0]. It is not concerned with element semantics, which are
+   defined elsewhere. For illustrative purposes, some element semantics
+   are alluded to, but in no way should semantics appearing here be
+   considered definitive.
+
+   The HTML encoding allows elements of DC metadata to be interspersed
+   with non-DC elements (provided such mixing is consistent with rules
+   governing use of those non-DC elements). A DC element is indicated
+   by the prefix "DC", and a non-DC element by another prefix; for
+   example, the prefix "AC" is used with elements from the A-Core [AC].
+
+3. The META Tag
+
+   The META tag of HTML is designed to encode a named metadata element.
+   Each element describes a given aspect of a document or other
+   information resource. For example, this tagged metadata element,
+
+      <meta name    = "DC.Creator"
+            content = "Simpson, Homer">
+
+   says that Homer Simpson is the Creator, where the element named
+   Creator is defined in the DC element set. In the more general form,
+
+      <meta name    = "PREFIX.ELEMENT_NAME"
+            content = "ELEMENT_VALUE">
+
+   the capitalized words are meant to be replaced in actual
+   descriptions; thus in the example,
+
+      ELEMENT_NAME     was:  Creator
+      ELEMENT_VALUE    was:  Simpson, Homer
+      and PREFIX       was:  DC
+
+   Within a META tag the first letter of a Dublin Core element name is
+   capitalized. DC places no restriction on alphabetic case in an
+   element value and any number of META tagged elements may appear
+   together, in any order.
+   More than one DC element with the same name
+   may appear, and each DC element is optional. The next example is a
+   book description with two authors, two titles, and no other metadata.
+
+   The prefix "DC" precedes each Dublin Core element encoded with META,
+   and it is separated by a period (.) from the element name following
+   it. Each non-DC element should be encoded with a prefix that can be
+   used to trace its origin and definition; the linkage between prefix
+   and element definition is made with the LINK tag, as explained in the
+   next section. Non-DC elements, such as Email from the A-Core [AC],
+   may appear together with DC elements, as in
+
+   This example also shows how some special characters may be encoded.
+   The author name in the first element contains a diacritic encoded as
+   an HTML character entity reference -- in this case an accented letter
+   E. Similarly, the last line contains two double-quote characters
+   encoded so as to avoid being interpreted as element content
+   delimiters.
+
+4. The LINK Tag
+
+   The LINK tag of HTML may be used to associate an element name prefix
+   with the reference definition of the element set that it identifies.
+   A sequence of META tags describing a resource is incomplete without
+   one such LINK tag for each different prefix appearing in the
+   sequence. The previous example could be considered complete with the
+   addition of these two LINK tags:
+
+   In general, the association takes the form
+
+   where, in actual descriptions, PREFIX is to be replaced by the prefix
+   and LOCATION_OF_DEFINITION by the URL or URN of the defining
+   document.
+   When embedded in the HEAD part of an HTML file, a sequence
+   of LINK and META tags describes the information in the surrounding
+   HTML file itself. Here is a complete HTML file with its own embedded
+   description.
+
+      <html>
+      <head>
+      <title> A Dirge </title>
+      <meta name    = "DC.Title"
+            content = "A Dirge">
+      <meta name    = "DC.Creator"
+            content = "Shelley, Percy Bysshe">
+      <meta name    = "DC.Type"
+            content = "poem">
+      <meta name    = "DC.Date"
+            content = "1820">
+      <meta name    = "DC.Format"
+            content = "text/html">
+      <meta name    = "DC.Language"
+            content = "en">
+      </head>
+      <body><pre>
+               Rough wind, that moanest loud
+                 Grief too sad for song;
+               Wild wind, when sullen cloud
+                 Knells all the night long;
+               Sad storm, whose tears are vain,
+               Bare woods, whose branches strain,
+               Deep caves and dreary main, -
+                 Wail, for the world's wrong!
+      </pre></body>
+      </html>
+
+5. Encoding Recommendations
+
+   HTML allows more flexibility in principle and in practice than is
+   recommended here for encoding metadata. Limited flexibility
+   encourages easy development of software for extracting and processing
+   metadata. At this early evolutionary stage of internet metadata,
+   easy prototyping and experimentation hasten the development of
+   useful standards.
+
+   Adherence is therefore recommended to the tagging style exemplified
+   in this document as regards prefix and element name capitalization,
+   double-quoting (") of attribute values, and not starting more than
+   one META tag on a line. There is much room for flexibility, but
+   choosing a style and sticking with it will likely make metadata
+   manipulation and editing easier. The following META tags adhere to
+   the recommendations and carry identical metadata in three different
+   styles:
+
+   Use of these recommendations is known to result in metadata that may
+   be harvested, indexed, and manipulated by popular, freely available
+   software packages such as [SWISH-E], [freeWAIS-sf2.0], [GLIMPSE],
+   [HARVEST], and [ISEARCH], among others. These conventions also work
+   with the metadata processing scripts appearing in the appendix, as
+   well as with most of the [DCPROJECTS] applications referenced from
+   the [DCHOME] site. Software support for the LINK tag and qualifier
+   conventions (see the next section) is not currently widespread.
+
+   Ordering of metadata elements is not preserved in general. Writers
+   of software for metadata indexing and display should try to preserve
+   relative ordering among META tagged elements having the same name
+   (e.g., among multiple authors); however, metadata providers and
+   searchers have no guarantee that ordering will be preserved in
+   metadata that passes through unknown systems.
+
+6. Dublin Core in Real Descriptions
+
+   In actual resource description it is often necessary to qualify
+   Dublin Core elements to add nuances of meaning. While neither the
+   general principles nor the specific semantics of DC qualifiers are
+   within scope of this document, everyday uses of the qualifier syntax
+   are illustrated to lend realism to later examples. Without further
+   explanation, the three ways in which the optional qualifier syntax is
+   currently (subject to change) used to supplement the META tag may be
+   summarized as follows:
+
+   Accordingly, a posthumous work in Spanish might be described with
+
+   Note that the qualifier syntax and label suffixes (which follow an
+   element name and a period) used in examples in this document merely
+   reflect current trends in the HTML encoding of qualifiers. Use of
+   this syntax and these suffixes is neither a standard nor a
+   recommendation.
+
+7. Encoding Dublin Core Elements
+
+   This section consists of very simple Dublin Core encoding examples,
+   arranged by element.
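+
+   As a hedged illustration of the encoding pattern applied to the
+   elements listed in this section, the sketch below describes a
+   hypothetical resource. The values shown (title, creator, date,
+   language) are invented for illustration and are not taken from this
+   memo; only the PREFIX.ELEMENT_NAME naming of section 3 and the
+   quoting and one-tag-per-line style of section 5 come from the text.

```html
<!-- Illustrative values only; none of them appear in the memo. -->
<meta name="DC.Title"    content="A Hypothetical Report">
<meta name="DC.Creator"  content="Doe, Jane">
<meta name="DC.Date"     content="1999-12-01">
<meta name="DC.Language" content="en">
```

+   Each tag starts on its own line, capitalizes the first letter of the
+   element name, and double-quotes its attribute values, which is the
+   style section 5 suggests keeps metadata easy to extract.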
+
+   Title (name given to the resource)
+   -----
+
+   Creator (entity that created the content)
+   -------
+
+   Subject (topic or keyword)
+   -------
+
+   Description (account, summary, or abstract of the content)
+   -----------
+
+   Publisher (entity that made the resource available)
+   ---------
+
+   Contributor (other entity that made a contribution)
+   -----------
+
+   Date (of an event in the life of the resource; [WTN8601] recommended)
+   ----
+
+   Type (nature, genre, or category; [DCT1] recommended)
+   ----
+
+   Format (physical or digital data format, plus optional dimensions)
+   ------
+
+   Identifier (of the resource)
+   ----------
+
+   Source (reference to the resource's origin)
+   ------
+
+   Language (of the content of the resource; [RFC1766] recommended)
+   --------
+
+   Relation (reference to a related resource)
+   --------
+
+   Coverage (extent or scope of the content)
+   --------
+
+   Rights (text or identifier of a rights management statement)
+   ------
+
+
+8. Security Considerations
+
+   The syntax rules for encoding Dublin Core metadata in HTML that are
+   documented here pose no direct risk to computers and networks.
+   People can use these rules to encode metadata that is inaccurate or
+   even deliberately misleading (creating mischief in the form of "index
+   spam"); however, this reflects a general pattern of HTML META tag
+   abuse that is not limited to the encoding of metadata from the Dublin
+   Core set. Even traditional metadata encoding schemes (e.g., [MARC])
+   are not immune to inaccuracy, although they are generally followed in
+   environments where production quality greatly exceeds that of the
+   average Web site.
+
+   Systems that process metadata encoded with META tags need to consider
+   issues related to its accuracy and validity as part of their design
+   and implementation, and users of such systems need to consider the
+   design and implementation assumptions. Various approaches may be
+   relevant for certain applications, such as adding statements of
+   metadata provenance, signing of metadata with digital signatures, and
+   automating certain aspects of metadata creation; but these are far
+   outside the scope of this document and the underlying META tag syntax
+   that it describes.
+
+9. Appendix -- Perl Scripts that Manipulate HTML Encoded Metadata
+
+   This section contains two simple programs that work with versions 4
+   and 5 of the Perl [PERL] scripting language interpreter. They may be
+   taken and freely adapted for local organizational needs, research
+   proposals, venture capital bids, etc. A variety of applications are
+   within easy reach of implementors that choose to build on these
+   scripts.
+
+   Script 1: Metadata Format Conversion
+   -------------------------------------
+
+   Here is a simple Perl script that correctly recognizes every example
+   of metadata encoding in this document.
+   It shows how a modest
+   scripting effort can produce a utility that converts metadata from
+   one format to another. Minor changes are sufficient to support a
+   number of output formats.
+
+#!/depot/bin/perl
+#
+# This simple perl script extracts metadata embedded in an HTML file
+# and outputs it in an alternate format. Issues warning about missing
+# element name or value.
+#
+# Handles mixed case tags and attribute values, one per line or spanning
+# several lines. Also handles a quoted string spanning multiple lines.
+# No error checking. Does not tolerate more than one "<meta" per line.
+
+print "@(urc;\n";
+while (<>) {
+        next if (! /<meta/i);           # skip lines with no META tag
+        $meta = $_;
+        if (! /<meta.*>/i) {            # tag continues on later lines
+                while (<>) {
+                        $meta .= $_;
+                        last if (/>/);
+                }
+        }
+        $name = $meta =~ /name\s*=\s*"([^"]*)"/i
+                ? $1 : "MISSING ELEMENT NAME";
+        $content = $meta =~ /content\s*=\s*"([^"]*)"/i
+                ? $1 : "MISSING ELEMENT VALUE";
+        ($scheme) = $meta =~ /scheme\s*=\s*"([^"]*)"/i;
+        ($lang) = $meta =~ /lang\s*=\s*"([^"]*)"/i;
+
+        if ($lang || $scheme) {
+                $mod = " ($lang";
+                if (! $scheme)
+                        { $mod .= ")"; }
+                elsif (! $lang)
+                        { $mod .= "$scheme)" }
+                else
+                        { $mod .= ", $scheme)"; }
+        }
+        else
+                { $mod = ""; }
+
+        print " @|$name$mod; $content\n";
+}
+print "@)urc;\n";
+# ---- end of Perl script ----
+
+   When the conversion script is run on the metadata file example from
+   the LINK tag section (section 4), it produces the following output.
+
+      @(urc;
+       @|DC.Title; A Dirge
+       @|DC.Creator; Shelley, Percy Bysshe
+       @|DC.Type; poem
+       @|DC.Date; 1820
+       @|DC.Format; text/html
+       @|DC.Language; en
+      @)urc;
+
+   Script 2: Automated Metadata Creation
+   --------------------------------------
+
+   The creation and maintenance of high-quality metadata can be
+   extremely expensive without automation to assist in processes such as
+   supplying pre-set or computed defaults, validating syntax, verifying
+   value ranges, spell checking, etc.
+   Considerable relief could be had
+   from a script that reduced an individual provider's metadata burden
+   to just the title of each document. Below is such a script. It lets
+   the provider of an HTML document abbreviate an entire embedded
+   resource description using a single HTML comment statement that looks
+   like
+
+   Our script processes this statement specially as a kind of "metadata
+   block" declaration with attached title. The general form is
+
+   This statement works much like a "Web server-side include" in that
+   the script replaces it with a fully-specified block of metadata and
+   triggers other replacements. Once installed, the script can output
+   HTML files suitable for integration into one's production Web server
+   procedures.
+
+   The individual provider keeps a separate "template" file of
+   infrequently changing pre-set values for metadata elements. If the
+   provider's needs are simple enough, the only element values besides
+   the title that differ from one document to the next may be generated
+   automatically. Using the script, values may be referenced as
+   variables from within the template or within the document. Our
+   variable references have the form "(--mbVARNAME)", and here is what
+   they look like inside a template:
+
+      (--mbtitle)
+
+   The above template represents the metadata block that will describe
+   the document once the variable references are replaced with real
+   values.
+   By the conventions of our script, the following variables
+   will be replaced in both the template and in the document:
+
+      (--mbfilesize)       size of the final output file
+      (--mbtitle)          title of the document
+      (--mblanguage)       language of the document
+      (--mbbaseURL)        beginning part of document identifier
+      (--mbfilename)       last part (minus .html) of identifier
+      (--mbfilemodtime)    last modification date of the document
+
+   Here's an example HTML file to run the script on.
+

+      From:  Acting Shift Supervisor
+      To:    Plant Control Personnel
+      RE:    (--mbtitle)
+      Date:  (--mbfilemodtime)

+      Pursuant to directive DOH:10.2001/405aec of article B-2022,
+      subsection 48.2.4.4.1c regarding staff morale and employee
+      productivity standards, the current allocation of doughnut
+      acquisition funds shall be increased effective immediately.
+
+   Note that because replacement occurs throughout the document, the
+   provider need only enter the title once instead of twice (normally
+   the title must be entered once in the HTML head and again in the HTML
+   body). After running the script, the above file is transformed into
+   this:
+
+      Nutritional Allocation Increase
+

+      From:  Acting Shift Supervisor
+      To:    Plant Control Personnel
+      RE:    Nutritional Allocation Increase
+      Date:  1999-03-08

+      Pursuant to directive DOH:10.2001/405aec of article B-2022,
+      subsection 48.2.4.4.1c regarding staff morale and employee
+      productivity standards, the current allocation of doughnut
+      acquisition funds shall be increased effective immediately.
+
+   Here is the script that accomplishes this transformation.
+
+#!/depot/bin/perl
+#
+# This Perl script processes metadata block declarations of the form
+# and variable references of the
+# form (--mbVARNAME), replacing them with full metadata blocks and
+# variable values, respectively. Requires a "template" file.
+# Outputs an HTML file.
+#
+# Invoke this script with a single filename argument, "foo". It creates
+# an output file "foo.html" using a temporary working file "foo.work".
+# The size of foo.work is measured after variable replacement, and is
+# later inserted into the file in such a way that the file's size does
+# not change in the process. Has little or no error checking.
+
+$infile = shift;
+open(IN, "< $infile")
+        or die("Could not open input file \"$infile\"");
+$workfile = "$infile.work";
+unlink($workfile);
+open(WORK, "+> $workfile")
+        or die("Could not open work file \"$workfile\"");
+
+@offsets = ();          # records locations for late size replacement
+$title = "";            # gets the title during metablock processing
+$language = "en";       # pre-set language here (not in the template)
+$baseURL = "http://moes.bar.com/doh";   # pre-set base URL here also
+$filename = "$infile.html";             # final output filename
+$filesize = "(--mbfilesize)";           # replaced late (separate pass)
+
+($year, $month, $day) = (localtime( (stat IN) [9] ))[5, 4, 3];
+$filemodtime = sprintf "%s-%02s-%02s", 1900 + $year, 1 + $month, $day;
+
+sub putout {            # outputs current line with variable replacement
+        if (! /\(--mb/) {
+                print WORK;
+                return;
+        }
+        if (/\(--mbfilesize\)/)                 # remember where it was
+                { push @offsets, tell WORK; }   # but don't replace yet
+        s/\(--mbtitle\)/$title/g;
+        s/\(--mblanguage\)/$language/g;
+        s/\(--mbbaseURL\)/$baseURL/g;
+        s/\(--mbfilename\)/$filename/g;
+        s/\(--mbfilemodtime\)/$filemodtime/g;
+        print WORK;
+}
+
+while (<IN>) {                  # main loop for input file
+        if (! /(.*)(.*)//) {
+                $remainder = $1;
+        }
+        else {
+                while (<IN>) {
+                        $title .= $_;
+                        last if (/(.*)\s*-->(.*)/);
+                }
+                $title .= $1;
+                $remainder = $2;
+        }
+        open(TPLATE, "< template")
+                or die("Could not open template file");
+        while (<TPLATE>)                # subloop for template file
+                { &putout; }
+        close(TPLATE);
+        $_ = $remainder;
+        &putout;
+}
+close(IN);
+
+# Now replace filesize variables without altering total byte count.
+select( (select(WORK), $| = 1) [0] );   # first flush output so we
+if (($size = -s WORK) < 100000)         # can get final file size
+        { $scale = 0; }                 # and set scale factor or
+else {          # compute it, keeping width of size field low
+        for ($scale = 0; $size >= 1000; $scale++)
+                { $size /= 1024; }
+}
+$filesize = sprintf "%7.7s %sbytes",
+        $size, (" ", "K", "M", "G", "T", "P") [$scale];
+
+foreach $pos (@offsets) {       # loop through saved size locations
+        seek WORK, $pos, 0;     # read the line found there
+        $_ = <WORK>;
+        # $filesize must be exactly as wide as "(--mbfilesize)"
+        s/\(--mbfilesize\)/$filesize/g;
+        seek WORK, $pos, 0;     # rewrite it with replacement
+        print WORK;
+}
+
+close(WORK);
+rename($workfile, "$filename")
+        or die("Could not rename \"$workfile\" to \"$filename\"");
+# ---- end of Perl script ----
+
+10. Author's Address
+
+   John A. Kunze
+   Center for Knowledge Management
+   University of California, San Francisco
+   530 Parnassus Ave, Box 0840
+   San Francisco, CA 94143-0840, USA
+
+   Fax: +1 415-476-4653
+   EMail: jak@ckm.ucsf.edu
+
+11. References
+
+   [AAT]            Art and Architecture Thesaurus, Getty Information
+                    Institute.
+                    http://shiva.pub.getty.edu/aat_browser/
+
+   [AC]             The A-Core: Metadata about Content Metadata, (in
+                    progress)
+                    http://metadata.net/ac/draft-iannella-admin-01.txt
+
+   [DC1]            Weibel, S., Kunze, J., Lagoze, C. and M. Wolf,
+                    "Dublin Core Metadata for Resource Discovery", RFC
+                    2413, September 1998.
+                    ftp://ftp.isi.edu/in-notes/rfc2413.txt
+
+   [DCHOME]         Dublin Core Initiative Home Page.
+                    http://purl.org/DC/
+
+   [DCPROJECTS]     Projects Using Dublin Core Metadata.
+                    http://purl.org/DC/projects/index.htm
+
+   [DCT1]           Dublin Core Type List 1, DC Type Working Group,
+                    March 1999.
+                    http://www.loc.gov/marc/typelist.html
+
+   [freeWAIS-sf2.0] The enhanced freeWAIS distribution, February 1999.
+                    http://ls6-www.cs.uni-dortmund.de/ir/projects/freeWAIS-sf/
+
+   [GLIMPSE]        Glimpse Home Page.
+                    http://glimpse.cs.arizona.edu/
+
+   [HARVEST]        Harvest Web Indexing.
+                    http://www.tardis.ed.ac.uk/harvest/
+
+   [HTML4.0]        Hypertext Markup Language 4.0 Specification, April
+                    1998.
+                    http://www.w3.org/TR/REC-html40/
+
+   [ISEARCH]        Isearch Resources Page.
+                    http://www.etymon.com/Isearch/
+
+   [ISO639-2]       Code for the representation of names of languages,
+                    1996.
+                    http://www.indigo.ie/egt/standards/iso639/iso639-2-en.html
+
+   [ISO8601]        ISO 8601:1988(E), Data elements and interchange
+                    formats -- Information interchange -- Representation
+                    of dates and times, International Organization for
+                    Standardization, June 1988.
+                    http://www.iso.ch/markete/8601.pdf
+
+   [MARC]           USMARC Format for Bibliographic Data, US Library of
+                    Congress.
+                    http://lcweb.loc.gov/marc/marc.html
+
+   [PERL]           L. Wall, T. Christiansen, R. Schwartz, Programming
+                    Perl, Second Edition, O'Reilly, 1996.
+
+   [RDF]            Resource Description Framework Model and Syntax
+                    Specification, February 1999.
+                    http://www.w3.org/TR/REC-rdf-syntax/
+
+   [RFC1766]        Alvestrand, H., "Tags for the Identification of
+                    Languages", RFC 1766, March 1996.
+                    ftp://ftp.isi.edu/in-notes/rfc1766.txt
+
+   [SWISH-E]        Simple Web Indexing System for Humans - Enhanced.
+                    http://sunsite.Berkeley.EDU/SWISH-E/
+
+   [TGN]            Thesaurus of Geographic Names, Getty Information
+                    Institute.
+                    http://shiva.pub.getty.edu/tgn_browser/
+
+   [WTN8601]        W3C Technical Note - Profile of ISO 8601 Date and
+                    Time Formats.
+                    http://www.w3.org/TR/NOTE-datetime
+
+   [XML]            Extensible Markup Language (XML).
+                    http://www.w3.org/TR/REC-xml
+
+12. Full Copyright Statement
+
+   Copyright (C) The Internet Society (1999). All Rights Reserved.
+
+   This document and translations of it may be copied and furnished to
+   others, and derivative works that comment on or otherwise explain it
+   or assist in its implementation may be prepared, copied, published
+   and distributed, in whole or in part, without restriction of any
+   kind, provided that the above copyright notice and this paragraph are
+   included on all such copies and derivative works. However, this
+   document itself may not be modified in any way, such as by removing
+   the copyright notice or references to the Internet Society or other
+   Internet organizations, except as needed for the purpose of
+   developing Internet standards in which case the procedures for
+   copyrights defined in the Internet Standards process must be
+   followed, or as required to translate it into languages other than
+   English.
+
+   The limited permissions granted above are perpetual and will not be
+   revoked by the Internet Society or its successors or assigns.
+
+   This document and the information contained herein is provided on an
+   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Acknowledgement
+
+   Funding for the RFC Editor function is currently provided by the
+   Internet Society.