Diffstat (limited to 'doc/rfc/rfc2731.txt')
-rw-r--r-- | doc/rfc/rfc2731.txt | 1291 |
1 files changed, 1291 insertions, 0 deletions
diff --git a/doc/rfc/rfc2731.txt b/doc/rfc/rfc2731.txt new file mode 100644 index 0000000..1d1d194 --- /dev/null +++ b/doc/rfc/rfc2731.txt @@ -0,0 +1,1291 @@ + + + + + + +Network Working Group J. Kunze +Request for Comments: 2731 Dublin Core +Category: Informational Metadata Initiative + December 1999 + + + Encoding Dublin Core Metadata in HTML + + +Status of this Memo + + This memo provides information for the Internet community. It does + not specify an Internet standard of any kind. Distribution of this + memo is unlimited. + +Copyright Notice + + Copyright (C) The Internet Society (1999). All Rights Reserved. + +1. Abstract + + The Dublin Core [DC1] is a small set of metadata elements for + describing information resources. This document explains how these + elements are expressed using the META and LINK tags of HTML + [HTML4.0]. A sequence of metadata elements embedded in an HTML file + is taken to be a description of that file. Examples illustrate + conventions allowing interoperation with current software that + indexes, displays, and manipulates metadata, such as [SWISH-E], + [freeWAIS-sf2.0], [GLIMPSE], [HARVEST], [ISEARCH], etc., and the Perl + [PERL] scripts in the appendix. + +2. HTML, Dublin Core, and Non-Dublin Core Metadata + + The Dublin Core (DC) metadata initiative [DCHOME] has produced a + small set of resource description categories [DC1], or elements of + metadata (literally, data about data). Metadata elements are + typically small relative to the resource they describe and may, if + the resource format permits, be embedded in it. Two such formats are + the Hypertext Markup Language (HTML) and the Extensible Markup + Language (XML); HTML is currently in wide use, but once standardized, + XML [XML] in conjunction with the Resource Description Framework + [RDF] promise a significantly more expressive means of encoding + metadata. 
The [RDF] specification actually describes a way to use + RDF within an HTML document by adhering to an abbreviated syntax. + + + + + + + +Kunze Informational [Page 1] + +RFC 2731 Encoding Dublin Core Metadata in HTML December 1999 + + + This document explains how to encode metadata using HTML 4.0 + [HTML4.0]. It is not concerned with element semantics, which are + defined elsewhere. For illustrative purposes, some element semantics + are alluded to, but in no way should semantics appearing here be + considered definitive. + + The HTML encoding allows elements of DC metadata to be interspersed + with non-DC elements (provided such mixing is consistent with rules + governing use of those non-DC elements). A DC element is indicated + by the prefix "DC", and a non-DC element by another prefix; for + example, the prefix "AC" is used with elements from the A-Core [AC]. + +3. The META Tag + + The META tag of HTML is designed to encode a named metadata element. + Each element describes a given aspect of a document or other + information resource. For example, this tagged metadata element, + + <meta name = "DC.Creator" + content = "Simpson, Homer"> + + says that Homer Simpson is the Creator, where the element named + Creator is defined in the DC element set. In the more general form, + + <meta name = "PREFIX.ELEMENT_NAME" + content = "ELEMENT_VALUE"> + + the capitalized words are meant to be replaced in actual + descriptions; thus in the example, + + ELEMENT_NAME was: Creator + ELEMENT_VALUE was: Simpson, Homer + and PREFIX was: DC + + Within a META tag the first letter of a Dublin Core element name is + capitalized. DC places no restriction on alphabetic case in an + element value and any number of META tagged elements may appear + together, in any order. More than one DC element with the same name + may appear, and each DC element is optional. The next example is a + book description with two authors, two titles, and no other metadata. 
+ + <meta name = "DC.Title" + content = "The Communist Manifesto"> + <meta name = "DC.Creator" + content = "Marx, K."> + + + + + + +Kunze Informational [Page 2] + +RFC 2731 Encoding Dublin Core Metadata in HTML December 1999 + + + <meta name = "DC.Creator" + content = "Engels, F."> + <meta name = "DC.Title" + content = "Capital"> + + The prefix "DC" precedes each Dublin Core element encoded with META, + and it is separated by a period (.) from the element name following + it. Each non-DC element should be encoded with a prefix that can be + used to trace its origin and definition; the linkage between prefix + and element definition is made with the LINK tag, as explained in the + next section. Non-DC elements, such as Email from the A-Core [AC], + may appear together with DC elements, as in + + <meta name = "DC.Creator" + content = "Da Costa, José"> + <meta name = "AC.Email" + content = "dacostaj@peoplesmail.org"> + <meta name = "DC.Title" + content = "Jesse "The Body" Ventura--A Biography"> + + This example also shows how some special characters may be encoded. + The author name in the first element contains a diacritic encoded as + an HTML character entity reference -- in this case an accented letter + E. Similarly, the last line contains two double-quote characters + encoded so as to avoid being interpreted as element content + delimiters. + +4. The LINK Tag + + The LINK tag of HTML may be used to associate an element name prefix + with the reference definition of the element set that it identifies. + A sequence of META tags describing a resource is incomplete without + one such LINK tag for each different prefix appearing in the + sequence. 
The previous example could be considered complete with the + addition of these two LINK tags: + + <link rel = "schema.DC" + href = "http://purl.org/DC/elements/1.0/"> + <link rel = "schema.AC" + href = "http://metadata.net/ac/2.0/"> + + In general, the association takes the form + + <link rel = "schema.PREFIX" + href = "LOCATION_OF_DEFINITION"> + + + + + + +Kunze Informational [Page 3] + +RFC 2731 Encoding Dublin Core Metadata in HTML December 1999 + + + where, in actual descriptions, PREFIX is to be replaced by the prefix + and LOCATION_OF_DEFINITION by the URL or URN of the defining + document. When embedded in the HEAD part of an HTML file, a sequence + of LINK and META tags describes the information in the surrounding + HTML file itself. Here is a complete HTML file with its own embedded + description. + + <html> + <head> + <title> A Dirge </title> + <link rel = "schema.DC" + href = "http://purl.org/DC/elements/1.0/"> + <meta name = "DC.Title" + content = "A Dirge"> + <meta name = "DC.Creator" + content = "Shelley, Percy Bysshe"> + <meta name = "DC.Type" + content = "poem"> + <meta name = "DC.Date" + content = "1820"> + <meta name = "DC.Format" + content = "text/html"> + <meta name = "DC.Language" + content = "en"> + </head> + <body><pre> + Rough wind, that moanest loud + Grief too sad for song; + Wild wind, when sullen cloud + Knells all the night long; + Sad storm, whose tears are vain, + Bare woods, whose branches strain, + Deep caves and dreary main, - + Wail, for the world's wrong! + </pre></body> + </html> + +5. Encoding Recommendations + + HTML allows more flexibility in principle and in practice than is + recommended here for encoding metadata. Limited flexibility + encourages easy development of software for extracting and processing + metadata. At this early evolutionary stage of internet metadata, + easy prototyping and experimentation hastens the development of + useful standards. 
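As a rough illustration of how little code extraction can take when the tagging style is kept simple, the following sketch reads META and LINK tags back out of an HTML file and reports any element prefix that lacks the "schema.PREFIX" LINK declaration described in section 4. It is written in Python rather than the Perl of the appendix and is not part of the RFC. The names DublinCoreExtractor, extract, undeclared_prefixes, and SAMPLE are hypothetical. The sample input is an assumed mix of lines from the earlier examples, with the AC prefix deliberately left undeclared.

```python
from html.parser import HTMLParser


class DublinCoreExtractor(HTMLParser):
    """Collects META elements and schema LINK declarations (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.elements = []  # (name, content, scheme, lang) in document order
        self.schemas = {}   # prefix -> URL or URN of the defining document

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so mixed-case
        # tagging styles such as <META NAME=...> are handled the same way.
        attr = dict(attrs)
        if tag == "meta" and attr.get("name"):
            self.elements.append(
                (attr["name"], attr.get("content"),
                 attr.get("scheme"), attr.get("lang"))
            )
        elif tag == "link":
            rel = attr.get("rel") or ""
            if rel.lower().startswith("schema."):
                self.schemas[rel[len("schema."):]] = attr.get("href")


def extract(html_text):
    """Parse html_text and return the populated extractor."""
    parser = DublinCoreExtractor()
    parser.feed(html_text)
    parser.close()
    return parser


def undeclared_prefixes(extractor):
    """Return prefixes used in META names that have no schema LINK tag."""
    used = {name.split(".", 1)[0]
            for name, _content, _scheme, _lang in extractor.elements
            if "." in name}
    return sorted(used - set(extractor.schemas))


# Assumed sample input: lines borrowed from the examples in sections 3 and 4.
SAMPLE = """<html><head>
<title> A Dirge </title>
<link rel = "schema.DC"
      href = "http://purl.org/DC/elements/1.0/">
<meta name = "DC.Title"
      content = "A Dirge">
<meta name = "DC.Creator"
      content = "Shelley, Percy Bysshe">
<meta name = "AC.Email"
      content = "dacostaj@peoplesmail.org">
</head><body></body></html>"""

if __name__ == "__main__":
    result = extract(SAMPLE)
    for name, content, scheme, lang in result.elements:
        print(name, content, scheme, lang)
    print("prefixes without a schema LINK:", undeclared_prefixes(result))
```

The sketch keeps elements in a list rather than a dictionary so that repeated names (for example, several DC.Creator elements) and their relative order survive extraction.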
+ + + + + + +Kunze Informational [Page 4] + +RFC 2731 Encoding Dublin Core Metadata in HTML December 1999 + + + Adherence is therefore recommended to the tagging style exemplified + in this document as regards prefix and element name capitalization, + double-quoting (") of attribute values, and not starting more than + one META tag on a line. There is much room for flexibility, but + choosing a style and sticking with it will likely make metadata + manipulation and editing easier. The following META tags adhere to + the recommendations and carry identical metadata in three different + styles: + + <META NAME="DC.Format" + CONTENT="text/html; 12 Kbytes"> + <meta + Content = "text/html; 12 Kbytes" + Name = "DC.Format" + > + <meta name = "DC.Format" content = "text/html; 12 Kbytes"> + + Use of these recommendations is known to result in metadata that may + be harvested, indexed, and manipulated by popular, freely available + software packages such as [SWISH-E], [freeWAIS-sf2.0], [GLIMPSE], + [HARVEST], and [ISEARCH], among others. These conventions also work + with the metadata processing scripts appearing in the appendix, as + well as with most of the [DCPROJECTS] applications referenced from + the [DCHOME] site. Software support for the LINK tag and qualifier + conventions (see the next section) is not currently widespread. + + Ordering of metadata elements is not preserved in general. Writers + of software for metadata indexing and display should try to preserve + relative ordering among META tagged elements having the same name + (e.g., among multiple authors), however, metadata providers and + searchers have no guarantee that ordering will be preserved in + metadata that passes through unknown systems. + +6. Dublin Core in Real Descriptions + + In actual resource description it is often necessary to qualify + Dublin Core elements to add nuances of meaning. 
While neither the + general principles nor the specific semantics of DC qualifiers are + within scope of this document, everyday uses of the qualifier syntax + are illustrated to lend realism to later examples. Without further + explanation, the three ways in which the optional qualifier syntax is + currently (subject to change) used to supplement the META tag may be + summarized as follows: + + <meta lang = "LANGUAGE_OF_METADATA_CONTENT" ... > + + <meta scheme = "CONTROLLED_FORMAT_OR_VOCABULARY_OF_METADATA" ... > + + + + +Kunze Informational [Page 5] + +RFC 2731 Encoding Dublin Core Metadata in HTML December 1999 + + + <meta name = "PREFIX.ELEMENT_NAME.SUBELEMENT_NAME" ... > + + Accordingly, a posthumous work in Spanish might be described with + + <meta name = "DC.Language" + scheme = "rfc1766" + content = "es"> + <meta name = "DC.Title" + lang = "es" + content = "La Mesa Verde y la Silla Roja"> + <meta name = "DC.Title" + lang = "en" + content = "The Green Table and the Red Chair"> + <meta name = "DC.Date.Created" + content = "1935"> + <meta name = "DC.Date.Available" + content = "1939"> + + Note that the qualifier syntax and label suffixes (which follow an + element name and a period) used in examples in this document merely + reflect current trends in the HTML encoding of qualifiers. Use of + this syntax and these suffixes is neither a standard nor a + recommendation. + +7. Encoding Dublin Core Elements + + This section consists of very simple Dublin Core encoding examples, + arranged by element. 
+ + Title (name given to the resource) + ----- + + <meta name = "DC.Title" + content = "Polycyclic aromatic hydrocarbon contamination"> + + <meta name = "DC.Title" + content = "Crime and Punishment"> + + <meta name = "DC.Title" + content = "Methods of Information in Medicine, Vol 32, No 4"> + + <meta name = "DC.Title" + content = "Still life #4 with flowers"> + + <meta name = "DC.Title" + lang = "de" + content = "Das Wohltemperierte Klavier, Teil I"> + + + + +Kunze Informational [Page 6] + +RFC 2731 Encoding Dublin Core Metadata in HTML December 1999 + + + Creator (entity that created the content) + ------- + + <meta name = "DC.Creator" + content = "Gogh, Vincent van"> + <meta name = "DC.Creator" + content = "van Gogh, Vincent"> + + <meta name = "DC.Creator" + content = "Mao Tse Tung"> + <meta name = "DC.Creator" + content = "Mao, Tse Tung"> + + <meta name = "DC.Creator" + content = "Plato"> + <meta name = "DC.Creator" + lang = "fr" + content = "Platon"> + + <meta name = "DC.Creator.Director" + content = "Sturges, Preston"> + <meta name = "DC.Creator.Writer" + content = "Hecht, Ben"> + <meta name = "DC.Creator.Producer" + content = "Chaplin, Charles"> + + Subject (topic or keyword) + ------- + + <meta name = "DC.Subject" + content = "heart attack"> + <meta name = "DC.Subject" + scheme = "MESH" + content = "Myocardial Infarction; Pericardial Effusion"> + + <meta name = "DC.Subject" + content = "vietnam war"> + <meta name = "DC.Subject" + scheme = "LCSH" + content = "Vietnamese Conflict, 1961-1975"> + + <meta name = "DC.Subject" + content = "Friendship"> + <meta name = "DC.Subject" + scheme = "ddc" + content = "158.25"> + + + + + +Kunze Informational [Page 7] + +RFC 2731 Encoding Dublin Core Metadata in HTML December 1999 + + + Description (account, summary, or abstract of the content) + ----------- + + <meta name = "DC.Description" + lang = "en" + content = "The Author gives some Account of Himself and Family + -- His First Inducements to Travel -- He is + 
Shipwrecked, and Swims for his Life -- Gets safe on + Shore in the Country of Lilliput -- Is made a + Prisoner, and carried up the Country"> + + <meta name = "DC.Description" + content = "A tutorial and reference manual for Java."> + + <meta name = "DC.Description" + content = "Seated family of five, coconut trees to the left, + sailboats moored off sandy beach to the right, + with volcano in the background."> + + Publisher (entity that made the resource available) + --------- + + <meta name = "DC.Publisher" + content = "O'Reilly"> + + <meta name = "DC.Publisher" + content = "Digital Equipment Corporation"> + + <meta name = "DC.Publisher" + content = "University of California Press"> + + <meta name = "DC.Publisher" + content = "State of Florida (USA)"> + + Contributor (other entity that made a contribution) + ----------- + + <meta name = "DC.Contributor" + content = "Curie, Marie"> + + <meta name = "DC.Contributor.Photographer" + content = "Adams, Ansel"> + <meta name = "DC.Contributor.Artist" + content = "Sendak, Maurice"> + <meta name = "DC.Contributor.Editor" + content = "Starr, Kenneth"> + + + + + +Kunze Informational [Page 8] + +RFC 2731 Encoding Dublin Core Metadata in HTML December 1999 + + + Date (of an event in the life of the resource; [WTN8601] recommended) + ---- + + <meta name = "DC.Date" + content = "1972"> + + <meta name = "DC.Date" + content = "1998-05-14"> + <meta name = "DC.Date" + scheme = "WTN8601" + content = "1998-05-14"> + + <meta name = "DC.Date.Created" + content = "1998-05-14"> + <meta name = "DC.Date.Available" + content = "1998-05-21"> + <meta name = "DC.Date.Valid" + content = "1998-05-28"> + + <meta name = "DC.Date.Created" + content = "triassic"> + <meta name = "DC.Date.Acquired" + content = "1957"> + + <meta name = "DC.Date.Accepted" + scheme = "WTN8601" + content = "1998-12-02T16:59"> + + <meta name = "DC.Date.DataGathered" + scheme = "ISO8601" + content = "98-W49-3T1659"> + + <meta name = "DC.Date.Issued" + scheme = 
"ANSI.X3.X30-1985" + content = "19980514"> + + Type (nature, genre, or category; [DCT1] recommended) + ---- + + <meta name = "DC.Type" + content = "poem"> + + <meta name = "DC.Type" + scheme = "DCT1" + content = "software"> + <meta name = "DC.Type" + content = "software program source code"> + + + + +Kunze Informational [Page 9] + +RFC 2731 Encoding Dublin Core Metadata in HTML December 1999 + + + <meta name = "DC.Type" + content = "interactive video game"> + + <meta name = "DC.Type" + scheme = "DCT1" + content = "dataset"> + + <meta name = "DC.Type" + content = "web home page"> + <meta name = "DC.Type" + content = "web bibliography"> + + <meta name = "DC.Type" + content = "painting"> + <meta name = "DC.Type" + content = "image; woodblock"> + <meta name = "DC.Type" + scheme = "AAT" + content = "clipeus (portrait)"> + <meta name = "DC.Type" + lang = "en-US" + content = "image; advertizement"> + + <meta name = "DC.Type" + scheme = "DCT1" + content = "event"> + <meta name = "DC.Type" + content = "event; periodic"> + + Format (physical or digital data format, plus optional dimensions) + ------ + + <meta name = "DC.Format" + content = "text/xml"> + <meta name = "DC.Format" + scheme = "IMT" + content = "text/xml"> + + <meta name = "DC.Format" + scheme = "IMT" + content = "image/jpeg"> + <meta name = "DC.Format" + content = "A text file with mono-spaced tables and diagrams."> + + <meta name = "DC.Format" + content = "video/mpeg; 14 minutes"> + + + + + +Kunze Informational [Page 10] + +RFC 2731 Encoding Dublin Core Metadata in HTML December 1999 + + + <meta name = "DC.Format" + content = "unix tar archive, gzip compressed; 1.5 Mbytes"> + + <meta name = "DC.Format" + content = "watercolor; 23 cm x 31 cm"> + + Identifier (of the resource) + ---------- + + <meta name = "DC.Identifier" + content = "http://foo.bar.org/zaf/"> + + <meta name = "DC.Identifier" + content = "urn:ietf:rfc:1766"> + + <meta name = "DC.Identifier" + scheme = "ISBN" + content = "1-56592-149-6"> + + <meta 
name = "DC.Identifier" + scheme = "LCCN" + content = "67-26020"> + + <meta name = "DC.Identifier" + scheme = "DOI" + content = "10.12345/33-824688ab"> + + Source (reference to the resource's origin) + ------ + + <meta name = "DC.Source" + content = "Shakespeare's Romeo and Juliet"> + + <meta name = "DC.Source" + content = "http://a.b.org/manon/"> + + Language (of the content of the resource; [RFC1766] recommended) + -------- + + <meta name = "DC.Language" + content = "en"> + <meta name = "DC.Language" + scheme = "rfc1766" + content = "en"> + <meta name = "DC.Language" + scheme = "ISO639-2" + content = "eng"> + + + + +Kunze Informational [Page 11] + +RFC 2731 Encoding Dublin Core Metadata in HTML December 1999 + + + <meta name = "DC.Language" + scheme = "rfc1766" + content = "en-US"> + + <meta name = "DC.Language" + content = "zh"> + <meta name = "DC.Language" + content = "ja"> + <meta name = "DC.Language" + content = "es"> + <meta name = "DC.Language" + content = "de"> + + <meta name = "DC.Language" + content = "german"> + <meta name = "DC.Language" + lang = "fr" + content = "allemand"> + + Relation (reference to a related resource) + -------- + + <meta name = "DC.Relation.IsPartOf" + content = "http://foo.bar.org/abc/proceedings/1998/"> + + <meta name = "DC.Relation.IsFormatOf" + content = "http://foo.bar.org/cd145.sgml"> + + <meta name = "DC.Relation.IsVersionOf" + content = "http://foo.bar.org/draft9.4.4.2"> + + <meta name = "DC.Relation.References" + content = "urn:isbn:1-56592-149-6"> + + <meta name = "DC.Relation.IsBasedOn" + content = "Shakespeare's Romeo and Juliet"> + + <meta name = "DC.Relation.Requires" + content = "LWP::UserAgent; HTML::Parse; URI::URL; + Net::DNS; Tk::Pixmap; Tk::Bitmap; Tk::Photo"> + + Coverage (extent or scope of the content) + -------- + + <meta name = "DC.Coverage" + content = "US civil war era; 1861-1865"> + + + + + +Kunze Informational [Page 12] + +RFC 2731 Encoding Dublin Core Metadata in HTML December 1999 + + + <meta name = 
"DC.Coverage" + content = "Columbus, Ohio, USA; Lat: 39 57 N Long: 082 59 W"> + + <meta name = "DC.Coverage" + scheme = "TGN" + content = "Columbus (C,V)"> + + <meta name = "DC.Coverage.Jurisdiction" + content = "Commonwealth of Australia"> + + Rights (text or identifier of a rights management statement) + ------ + + <meta name = "DC.Rights" + lang = "en" + content = "Copyright Acme 1999 - All rights reserved."> + + <meta name = "DC.Rights" + content = "http://foo.bar.org/cgi-bin/terms"> + +8. Security Considerations + + The syntax rules for encoding Dublin Core metadata in HTML that are + documented here pose no direct risk to computers and networks. + People can use these rules to encode metadata that is inaccurate or + even deliberately misleading (creating mischief in the form of "index + spam"), however, this reflects a general pattern of HTML META tag + abuse that is not limited to the encoding of metadata from the Dublin + Core set. Even traditional metadata encoding schemes (e.g., [MARC]) + are not immune to inaccuracy, although they are generally followed in + environments where production quality greatly exceeds that of the + average Web site. + + Systems that process metadata encoded with META tags need to consider + issues related to its accuracy and validity as part of their design + and implementation, and users of such systems need to consider the + design and implementation assumptions. Various approaches may be + relevant for certain applications, such as adding statements of + metadata provenance, signing of metadata with digital signatures, and + automating certain aspects of metadata creation; but these are far + outside the scope of this document and the underlying META tag syntax + that it describes. + + + + + + + + + +Kunze Informational [Page 13] + +RFC 2731 Encoding Dublin Core Metadata in HTML December 1999 + + +9. 
Appendix -- Perl Scripts that Manipulate HTML Encoded Metadata + + This section contains two simple programs that work with versions 4 + and 5 of the Perl [PERL] scripting language interpreter. They may be + taken and freely adapted for local organizational needs, research + proposals, venture capital bids, etc. A variety of applications are + within easy reach of implementors that choose to build on these + scripts. + + Script 1: Metadata Format Conversion + ------------------------------------- + + Here is a simple Perl script that correctly recognizes every example + of metadata encoding in this document. It shows how a modest + scripting effort can produce a utility that converts metadata from + one format to another. Minor changes are sufficient to support a + number of output formats. + +#!/depot/bin/perl +# +# This simple perl script extracts metadata embedded in an HTML file +# and outputs it in an alternate format. Issues warning about missing +# element name or value. +# +# Handles mixed case tags and attribute values, one per line or spanning +# several lines. Also handles a quoted string spanning multiple lines. +# No error checking. Does not tolerate more than one "<meta" per line. + +print "@(urc;\n"; +while (<>) { + next if (! /<meta/i); + ($meta) = /(<meta.*$)/i; + if (! /<meta.*>/i) { + while (<>) { + $meta .= $_; + last if (/>/); + } + } + $name = $meta =~ /name\s*=\s*"([^"]*)"/i + ? $1 : "MISSING ELEMENT NAME"; + $content = $meta =~ /content\s*=\s*"([^"]*)"/i + ? $1 : "MISSING ELEMENT VALUE"; + ($scheme) = $meta =~ /scheme\s*=\s*"([^"]*)"/i; + ($lang) = $meta =~ /lang\s*=\s*"([^"]*)"/i; + + if ($lang || $scheme) { + $mod = " ($lang"; + if (! $scheme) + + + +Kunze Informational [Page 14] + +RFC 2731 Encoding Dublin Core Metadata in HTML December 1999 + + + { $mod .= ")"; } + elsif (! 
$lang) + { $mod .= "$scheme)" } + else + { $mod .= ", $scheme)"; } + } + else + { $mod = ""; } + + print " @|$name$mod; $content\n"; +} +print "@)urc;\n"; +# ---- end of Perl script ---- + + When the conversion script is run on the metadata file example from + the LINK tag section (section 4), it produces the following output. + + @(urc; + @|DC.Title; A Dirge + @|DC.Creator; Shelley, Percy Bysshe + @|DC.Type; poem + @|DC.Date; 1820 + @|DC.Format; text/html + @|DC.Language; en + @)urc; + + Script 2: Automated Metadata Creation + -------------------------------------- + + The creation and maintenance of high-quality metadata can be + extremely expensive without automation to assist in processes such as + supplying pre-set or computed defaults, validating syntax, verifying + value ranges, spell checking, etc. Considerable relief could be had + from a script that reduced an individual provider's metadata burden + to just the title of each document. Below is such a script. It lets + the provider of an HTML document abbreviate an entire embedded + resource description using a single HTML comment statement that looks + like + + <!--metablock Little Red Riding Hood --> + + Our script processes this statement specially as a kind of "metadata + block" declaration with attached title. The general form is + + <!--metablock TITLE_OF_DOCUMENT --> + + + + + + +Kunze Informational [Page 15] + +RFC 2731 Encoding Dublin Core Metadata in HTML December 1999 + + + This statement works much like a "Web server-side include" in that + the script replaces it with a fully-specified block of metadata and + triggers other replacements. Once installed, the script can output + HTML files suitable for integration into one's production Web server + procedures. + + The individual provider keeps a separate "template" file of + infrequently changing pre-set values for metadata elements. 
If the + provider's needs are simple enough, the only element values besides + the title that differ from one document to the next may be generated + automatically. Using the script, values may be referenced as + variables from within the template or within the document. Our + variable references have the form "(--mbVARNAME)", and here is what + they look like inside a template: + + <title> (--mbtitle) </title> + <meta name = "DC.Creator" + content = "Simpson, Homer"> + <meta name = "DC.Title" + content = "(--mbtitle)"> + <meta name = "DC.Date.Created" + content = "(--mbfilemodtime)"> + <meta name = "DC.Identifier" + content = "(--mbbaseURL)/(--mbfilename)"> + <meta name = "DC.Format" + content = "text/html; (--mbfilesize)"> + <meta name = "DC.Language" + content = "(--mblanguage)-BUREAUCRATESE"> + <meta name = "RC.MetadataAuthority" + content = "Springfield Nuclear"> + <link rel = "schema.DC" + href = "http://purl.org/DC/elements/1.0/"> + <link rel = "schema.RC" + href = "http://nukes.org/ReactorCore/rc"> + + The above template represents the metadata block that will describe + the document once the variable references are replaced with real + values. By the conventions of our script, the following variables + will be replaced in both the template and in the document: + + (--mbfilesize) size of the final output file + (--mbtitle) title of the document + (--mblanguage) language of the document + (--mbbaseURL) beginning part of document identifier + (--mbfilename) last part (minus .html) of identifier + (--mbfilemodtime) last modification date of the document + + + + + +Kunze Informational [Page 16] + +RFC 2731 Encoding Dublin Core Metadata in HTML December 1999 + + + Here's an example HTML file to run the script on. 
+ + <html> + <head> + <!--metablock Nutritional Allocation Increase --> + <meta name = "DC.Type" + content = "Memorandum"> + </head> + <body> + <p> + From: Acting Shift Supervisor + To: Plant Control Personnel + RE: (--mbtitle) + Date: (--mbfilemodtime) + <p> + Pursuant to directive DOH:10.2001/405aec of article B-2022, + subsection 48.2.4.4.1c regarding staff morale and employee + productivity standards, the current allocation of doughnut + acquisition funds shall be increased effective immediately. + </body> + </html> + + Note that because replacement occurs throughout the document, the + provider need only enter the title once instead of twice (normally + the title must be entered once in the HTML head and again in the HTML + body). After running the script, the above file is transformed into + this: + + <html> + <head> + <title> Nutritional Allocation Increase </title> + <meta name = "DC.Creator" + content = "Simpson, Homer"> + <meta name = "DC.Title" + content = "Nutritional Allocation Increase"> + <meta name = "DC.Date.Created" + content = "1999-03-08"> + <meta name = "DC.Identifier" + content = "http://moes.bar.com/doh/homer.html"> + <meta name = "DC.Format" + content = "text/html; 1320 bytes"> + <meta name = "DC.Language" + content = "en-BUREAUCRATESE"> + <meta name = "RC.MetadataAuthority" + content = "Springfield Nuclear"> + <link rel = "schema.DC" + href = "http://purl.org/DC/elements/1.0/"> + <link rel = "schema.RC" + + + +Kunze Informational [Page 17] + +RFC 2731 Encoding Dublin Core Metadata in HTML December 1999 + + + href = "http://nukes.org/ReactorCore/rc"> + <meta name = "DC.Type" + content = "Memorandum"> + </head> + <body> + <p> + From: Acting Shift Supervisor + To: Plant Control Personnel + RE: Nutritional Allocation Increase + Date: 1999-03-08 + <p> + Pursuant to directive DOH:10.2001/405aec of article B-2022, + subsection 48.2.4.4.1c regarding staff morale and employee + productivity standards, the current allocation of doughnut + acquisition 
funds shall be increased effective immediately. + </body> + </html> + + Here is the script that accomplishes this transformation. + +#!/depot/bin/perl +# +# This Perl script processes metadata block declarations of the form +# <!--metablock TITLE_OF_DOCUMENT --> and variable references of the +# form (--mbVARNAME), replacing them with full metadata blocks and +# variable values, respectively. Requires a "template" file. +# Outputs an HTML file. +# +# Invoke this script with a single filename argument, "foo". It creates +# an output file "foo.html" using a temporary working file "foo.work". +# The size of foo.work is measured after variable replacement, and is +# later inserted into the file in such a way that the file's size does +# not change in the process. Has little or no error checking. + +$infile = shift; +open(IN, "< $infile") + or die("Could not open input file \"$infile\""); +$workfile = "$infile.work"; +unlink($workfile); +open(WORK, "+> $workfile") + or die("Could not open work file \"$workfile\""); + +@offsets = (); # records locations for late size replacement +$title = ""; # gets the title during metablock processing +$language = "en"; # pre-set language here (not in the template) +$baseURL = "http://moes.bar.com/doh"; # pre-set base URL here also +$filename = "$infile.html"; # final output filename +$filesize = "(--mbfilesize)"; # replaced late (separate pass) + + + +Kunze Informational [Page 18] + +RFC 2731 Encoding Dublin Core Metadata in HTML December 1999 + + +($year, $month, $day) = (localtime( (stat IN) [9] ))[5, 4, 3]; +$filemodtime = sprintf "%s-%02s-%02s", 1900 + $year, 1 + $month, $day; + +sub putout { # outputs current line with variable replacement + if (! 
/\(--mb/) { + print WORK; + return; + } + if (/\(--mbfilesize\)/) # remember where it was + { push @offsets, tell WORK; } # but don't replace yet + s/\(--mbtitle\)/$title/g; + s/\(--mblanguage\)/$language/g; + s/\(--mbbaseURL\)/$baseURL/g; + s/\(--mbfilename\)/$filename/g; + s/\(--mbfilemodtime\)/$filemodtime/g; + print WORK; +} + +while (<IN>) { # main loop for input file + if (! /(.*)<!--metablock\s*(.*)/) { + &putout; + next; + } + $title = $2; + $_ = $1; + &putout; + if ($title =~ s/\s*-->(.*)//) { + $remainder = $1; + } + else { + while (<IN>) { + $title .= $_; + last if (/(.*)\s*-->(.*)/); + } + $title .= $1; + $remainder = $2; + } + open(TPLATE, "< template") + or die("Could not open template file"); + while (<TPLATE>) # subloop for template file + { &putout; } + close(TPLATE); + $_ = $remainder; + &putout; + + + + + + + +Kunze Informational [Page 19] + +RFC 2731 Encoding Dublin Core Metadata in HTML December 1999 + + +} +close(IN); + +# Now replace filesize variables without altering total byte count. +select( (select(WORK), $| = 1) [0] ); # first flush output so we +if (($size = -s WORK) < 100000) # can get final file size + { $scale = 0; } # and set scale factor or +else { # compute it, keeping width of size field low + for ($scale = 0; $size >= 1000; $scale++) + { $size /= 1024; } +} +$filesize = sprintf "%7.7s %sbytes", + $size, (" ", "K", "M", "G", "T", "P") [$scale]; + +foreach $pos (@offsets) { # loop through saved size locations + seek WORK, $pos, 0; # read the line found there + $_ = <WORK>; + # $filesize must be exactly as wide as "(--mbfilesize)" + s/\(--mbfilesize\)/$filesize/g; + seek WORK, $pos, 0; # rewrite it with replacement + print WORK; +} + +close(WORK); +rename($workfile, "$filename") + or die("Could not rename \"$workfile\" to \"$filename\""); +# ---- end of Perl script ---- + + + + + + + + + + + + + + + + + + + + + + + + +Kunze Informational [Page 20] + +RFC 2731 Encoding Dublin Core Metadata in HTML December 1999 + + +10. 
Author's Address
+
+   John A. Kunze
+   Center for Knowledge Management
+   University of California, San Francisco
+   530 Parnassus Ave, Box 0840
+   San Francisco, CA 94143-0840, USA
+
+   Fax:   +1 415-476-4653
+   EMail: jak@ckm.ucsf.edu
+
+
+11. References
+
+   [AAT]            Art and Architecture Thesaurus, Getty Information
+                    Institute.
+                    http://shiva.pub.getty.edu/aat_browser/
+
+   [AC]             The A-Core: Metadata about Content Metadata, (in
+                    progress)
+                    http://metadata.net/ac/draft-iannella-admin-01.txt
+
+   [DC1]            Weibel, S., Kunze, J., Lagoze, C. and M. Wolf,
+                    "Dublin Core Metadata for Resource Discovery", RFC
+                    2413, September 1998.
+                    ftp://ftp.isi.edu/in-notes/rfc2413.txt
+
+   [DCHOME]         Dublin Core Initiative Home Page.
+                    http://purl.org/DC/
+
+   [DCPROJECTS]     Projects Using Dublin Core Metadata.
+                    http://purl.org/DC/projects/index.htm
+
+   [DCT1]           Dublin Core Type List 1, DC Type Working Group,
+                    March 1999.
+                    http://www.loc.gov/marc/typelist.html
+
+   [freeWAIS-sf2.0] The enhanced freeWAIS distribution, February 1999.
+                    http://ls6-www.cs.uni-dortmund.de/ir/projects/freeWAIS-sf/
+
+   [GLIMPSE]        Glimpse Home Page.
+                    http://glimpse.cs.arizona.edu/
+
+   [HARVEST]        Harvest Web Indexing.
+                    http://www.tardis.ed.ac.uk/harvest/
+
+   [HTML4.0]        Hypertext Markup Language 4.0 Specification, April
+                    1998.
+                    http://www.w3.org/TR/REC-html40/
+
+   [ISEARCH]        Isearch Resources Page.
+                    http://www.etymon.com/Isearch/
+
+   [ISO639-2]       Code for the representation of names of languages,
+                    1996.
+                    http://www.indigo.ie/egt/standards/iso639/iso639-2-en.html
+
+   [ISO8601]        ISO 8601:1988(E), Data elements and interchange
+                    formats -- Information interchange -- Representation
+                    of dates and times, International Organization for
+                    Standardization, June 1988.
+                    http://www.iso.ch/markete/8601.pdf
+
+   [MARC]           USMARC Format for Bibliographic Data, US Library of
+                    Congress.
+                    http://lcweb.loc.gov/marc/marc.html
+
+   [PERL]           L. Wall, T. Christiansen, R. Schwartz, Programming
+                    Perl, Second Edition, O'Reilly, 1996.
+
+   [RDF]            Resource Description Framework Model and Syntax
+                    Specification, February 1999.
+                    http://www.w3.org/TR/REC-rdf-syntax/
+
+   [RFC1766]        Alvestrand, H., "Tags for the Identification of
+                    Languages", RFC 1766, March 1996.
+                    ftp://ftp.isi.edu/in-notes/rfc1766.txt
+
+   [SWISH-E]        Simple Web Indexing System for Humans - Enhanced.
+                    http://sunsite.Berkeley.EDU/SWISH-E/
+
+   [TGN]            Thesaurus of Geographic Names, Getty Information
+                    Institute.
+                    http://shiva.pub.getty.edu/tgn_browser/
+
+   [WTN8601]        W3C Technical Note - Profile of ISO 8601 Date and
+                    Time Formats.
+                    http://www.w3.org/TR/NOTE-datetime
+
+   [XML]            Extensible Markup Language (XML).
+                    http://www.w3.org/TR/REC-xml
+
+
+12. Full Copyright Statement
+
+   Copyright (C) The Internet Society (1999).  All Rights Reserved.
+
+   This document and translations of it may be copied and furnished to
+   others, and derivative works that comment on or otherwise explain it
+   or assist in its implementation may be prepared, copied, published
+   and distributed, in whole or in part, without restriction of any
+   kind, provided that the above copyright notice and this paragraph are
+   included on all such copies and derivative works.  However, this
+   document itself may not be modified in any way, such as by removing
+   the copyright notice or references to the Internet Society or other
+   Internet organizations, except as needed for the purpose of
+   developing Internet standards in which case the procedures for
+   copyrights defined in the Internet Standards process must be
+   followed, or as required to translate it into languages other than
+   English.
+
+   The limited permissions granted above are perpetual and will not be
+   revoked by the Internet Society or its successors or assigns.
+
+   This document and the information contained herein is provided on an
+   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Acknowledgement
+
+   Funding for the RFC Editor function is currently provided by the
+   Internet Society.