| author | Thomas Voss <mail@thomasvoss.com> | 2024-11-27 20:54:24 +0100 | 
|---|---|---|
| committer | Thomas Voss <mail@thomasvoss.com> | 2024-11-27 20:54:24 +0100 | 
| commit | 4bfd864f10b68b71482b35c818559068ef8d5797 (patch) | |
| tree | e3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc2731.txt | |
| parent | ea76e11061bda059ae9f9ad130a9895cc85607db (diff) | |
doc: Add RFC documents
Diffstat (limited to 'doc/rfc/rfc2731.txt')
| mode | file | lines changed |
|---|---|---|
| -rw-r--r-- | doc/rfc/rfc2731.txt | 1291 |
1 file changed, 1291 insertions, 0 deletions
diff --git a/doc/rfc/rfc2731.txt b/doc/rfc/rfc2731.txt
new file mode 100644
index 0000000..1d1d194
--- /dev/null
+++ b/doc/rfc/rfc2731.txt
@@ -0,0 +1,1291 @@
+
+
+
+
+
+
+Network Working Group                                           J. Kunze
+Request for Comments: 2731                                   Dublin Core
+Category: Informational                              Metadata Initiative
+                                                           December 1999
+
+
+                 Encoding Dublin Core Metadata in HTML
+
+
+Status of this Memo
+
+   This memo provides information for the Internet community.  It does
+   not specify an Internet standard of any kind.  Distribution of this
+   memo is unlimited.
+
+Copyright Notice
+
+   Copyright (C) The Internet Society (1999).  All Rights Reserved.
+
+1. Abstract
+
+   The Dublin Core [DC1] is a small set of metadata elements for
+   describing information resources.  This document explains how these
+   elements are expressed using the META and LINK tags of HTML
+   [HTML4.0].  A sequence of metadata elements embedded in an HTML file
+   is taken to be a description of that file.  Examples illustrate
+   conventions allowing interoperation with current software that
+   indexes, displays, and manipulates metadata, such as [SWISH-E],
+   [freeWAIS-sf2.0], [GLIMPSE], [HARVEST], [ISEARCH], etc., and the Perl
+   [PERL] scripts in the appendix.
+
+2. HTML, Dublin Core, and Non-Dublin Core Metadata
+
+   The Dublin Core (DC) metadata initiative [DCHOME] has produced a
+   small set of resource description categories [DC1], or elements of
+   metadata (literally, data about data).  Metadata elements are
+   typically small relative to the resource they describe and may, if
+   the resource format permits, be embedded in it.  Two such formats are
+   the Hypertext Markup Language (HTML) and the Extensible Markup
+   Language (XML); HTML is currently in wide use, but once standardized,
+   XML [XML] in conjunction with the Resource Description Framework
+   [RDF] promise a significantly more expressive means of encoding
+   metadata.  The [RDF] specification actually describes a way to use
+   RDF within an HTML document by adhering to an abbreviated syntax.
+
+
+
+
+
+
+
+Kunze                        Informational                      [Page 1]
+
+RFC 2731         Encoding Dublin Core Metadata in HTML     December 1999
+
+
+   This document explains how to encode metadata using HTML 4.0
+   [HTML4.0].  It is not concerned with element semantics, which are
+   defined elsewhere.  For illustrative purposes, some element semantics
+   are alluded to, but in no way should semantics appearing here be
+   considered definitive.
+
+   The HTML encoding allows elements of DC metadata to be interspersed
+   with non-DC elements (provided such mixing is consistent with rules
+   governing use of those non-DC elements).  A DC element is indicated
+   by the prefix "DC", and a non-DC element by another prefix; for
+   example, the prefix "AC" is used with elements from the A-Core [AC].
+
+3. The META Tag
+
+   The META tag of HTML is designed to encode a named metadata element.
+   Each element describes a given aspect of a document or other
+   information resource.  For example, this tagged metadata element,
+
+       <meta name    = "DC.Creator"
+             content = "Simpson, Homer">
+
+   says that Homer Simpson is the Creator, where the element named
+   Creator is defined in the DC element set.  In the more general form,
+
+       <meta name    = "PREFIX.ELEMENT_NAME"
+             content = "ELEMENT_VALUE">
+
+   the capitalized words are meant to be replaced in actual
+   descriptions; thus in the example,
+
+             ELEMENT_NAME   was:  Creator
+             ELEMENT_VALUE  was:  Simpson, Homer
+             and PREFIX     was:  DC
+
+   Within a META tag the first letter of a Dublin Core element name is
+   capitalized.  DC places no restriction on alphabetic case in an
+   element value and any number of META tagged elements may appear
+   together, in any order.  More than one DC element with the same name
+   may appear, and each DC element is optional.  The next example is a
+   book description with two authors, two titles, and no other metadata.
+
+       <meta name    = "DC.Title"
+             content = "The Communist Manifesto">
+       <meta name    = "DC.Creator"
+             content = "Marx, K.">
+       <meta name    = "DC.Creator"
+             content = "Engels, F.">
+       <meta name    = "DC.Title"
+             content = "Capital">
+
+   The prefix "DC" precedes each Dublin Core element encoded with META,
+   and it is separated by a period (.) from the element name following
+   it.  Each non-DC element should be encoded with a prefix that can be
+   used to trace its origin and definition; the linkage between prefix
+   and element definition is made with the LINK tag, as explained in the
+   next section.  Non-DC elements, such as Email from the A-Core [AC],
+   may appear together with DC elements, as in
+
+       <meta name    = "DC.Creator"
+             content = "Da Costa, Jos&#233;">
+       <meta name    = "AC.Email"
+             content = "dacostaj@peoplesmail.org">
+       <meta name    = "DC.Title"
+             content = "Jesse &quot;The Body&quot; Ventura--A Biography">
+
+   This example also shows how some special characters may be encoded.
+   The author name in the first element contains a diacritic encoded as
+   an HTML character entity reference -- in this case an accented letter
+   E.  Similarly, the last line contains two double-quote characters
+   encoded so as to avoid being interpreted as element content
+   delimiters.
+
+4. The LINK Tag
+
+   The LINK tag of HTML may be used to associate an element name prefix
+   with the reference definition of the element set that it identifies.
+   A sequence of META tags describing a resource is incomplete without
+   one such LINK tag for each different prefix appearing in the
+   sequence.  The previous example could be considered complete with the
+   addition of these two LINK tags:
+
+       <link rel     = "schema.DC"
+             href    = "http://purl.org/DC/elements/1.0/">
+       <link rel     = "schema.AC"
+             href    = "http://metadata.net/ac/2.0/">
+
+   In general, the association takes the form
+
+       <link rel     = "schema.PREFIX"
+             href    = "LOCATION_OF_DEFINITION">
+
+   where, in actual descriptions, PREFIX is to be replaced by the prefix
+   and LOCATION_OF_DEFINITION by the URL or URN of the defining
+   document.  When embedded in the HEAD part of an HTML file, a sequence
+   of LINK and META tags describes the information in the surrounding
+   HTML file itself.  Here is a complete HTML file with its own embedded
+   description.
+
+       <html>
+       <head>
+       <title> A Dirge </title>
+       <link rel     = "schema.DC"
+             href    = "http://purl.org/DC/elements/1.0/">
+       <meta name    = "DC.Title"
+             content = "A Dirge">
+       <meta name    = "DC.Creator"
+             content = "Shelley, Percy Bysshe">
+       <meta name    = "DC.Type"
+             content = "poem">
+       <meta name    = "DC.Date"
+             content = "1820">
+       <meta name    = "DC.Format"
+             content = "text/html">
+       <meta name    = "DC.Language"
+             content = "en">
+       </head>
+       <body><pre>
+               Rough wind, that moanest loud
+                 Grief too sad for song;
+               Wild wind, when sullen cloud
+                 Knells all the night long;
+               Sad storm, whose tears are vain,
+               Bare woods, whose branches strain,
+               Deep caves and dreary main, -
+                 Wail, for the world's wrong!
+       </pre></body>
+       </html>
+
+5. Encoding Recommendations
+
+   HTML allows more flexibility in principle and in practice than is
+   recommended here for encoding metadata.  Limited flexibility
+   encourages easy development of software for extracting and processing
+   metadata.  At this early evolutionary stage of internet metadata,
+   easy prototyping and experimentation hastens the development of
+   useful standards.
+
+   Adherence is therefore recommended to the tagging style exemplified
+   in this document as regards prefix and element name capitalization,
+   double-quoting (") of attribute values, and not starting more than
+   one META tag on a line.  There is much room for flexibility, but
+   choosing a style and sticking with it will likely make metadata
+   manipulation and editing easier.  The following META tags adhere to
+   the recommendations and carry identical metadata in three different
+   styles:
+
+       <META NAME="DC.Format"
+             CONTENT="text/html; 12 Kbytes">
+       <meta
+               Content = "text/html; 12 Kbytes"
+               Name = "DC.Format"
+       >
+       <meta name = "DC.Format" content = "text/html; 12 Kbytes">
+
+   Use of these recommendations is known to result in metadata that may
+   be harvested, indexed, and manipulated by popular, freely available
+   software packages such as [SWISH-E], [freeWAIS-sf2.0], [GLIMPSE],
+   [HARVEST], and [ISEARCH], among others.  These conventions also work
+   with the metadata processing scripts appearing in the appendix, as
+   well as with most of the [DCPROJECTS] applications referenced from
+   the [DCHOME] site.  Software support for the LINK tag and qualifier
+   conventions (see the next section) is not currently widespread.
+
+   Ordering of metadata elements is not preserved in general.  Writers
+   of software for metadata indexing and display should try to preserve
+   relative ordering among META tagged elements having the same name
+   (e.g., among multiple authors), however, metadata providers and
+   searchers have no guarantee that ordering will be preserved in
+   metadata that passes through unknown systems.
+
+6. Dublin Core in Real Descriptions
+
+   In actual resource description it is often necessary to qualify
+   Dublin Core elements to add nuances of meaning.  While neither the
+   general principles nor the specific semantics of DC qualifiers are
+   within scope of this document, everyday uses of the qualifier syntax
+   are illustrated to lend realism to later examples.  Without further
+   explanation, the three ways in which the optional qualifier syntax is
+   currently (subject to change) used to supplement the META tag may be
+   summarized as follows:
+
+    <meta lang    = "LANGUAGE_OF_METADATA_CONTENT" ... >
+
+    <meta scheme  = "CONTROLLED_FORMAT_OR_VOCABULARY_OF_METADATA" ... >
+
+    <meta name    = "PREFIX.ELEMENT_NAME.SUBELEMENT_NAME" ... >
+
+   Accordingly, a posthumous work in Spanish might be described with
+
+       <meta name    = "DC.Language"
+             scheme  = "rfc1766"
+             content = "es">
+       <meta name    = "DC.Title"
+             lang    = "es"
+             content = "La Mesa Verde y la Silla Roja">
+       <meta name    = "DC.Title"
+             lang    = "en"
+             content = "The Green Table and the Red Chair">
+       <meta name    = "DC.Date.Created"
+             content = "1935">
+       <meta name    = "DC.Date.Available"
+             content = "1939">
+
+   Note that the qualifier syntax and label suffixes (which follow an
+   element name and a period) used in examples in this document merely
+   reflect current trends in the HTML encoding of qualifiers.  Use of
+   this syntax and these suffixes is neither a standard nor a
+   recommendation.
+
+7. Encoding Dublin Core Elements
+
+   This section consists of very simple Dublin Core encoding examples,
+   arranged by element.
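The META conventions of sections 3 through 6 (the "DC" prefix, a capitalized element name, an optional sub-element after a second period, optional lang and scheme qualifiers, double-quoted values, one tag per line) are regular enough to generate mechanically. What follows is a minimal Perl sketch, in the language of the appendix scripts, and it is not part of RFC 2731. The subroutine names `escape_attr` and `dc_meta` and their named-argument interface are hypothetical. The sketch handles only the DC prefix and assumes values arrive as Perl character strings, so that a non-ASCII letter such as the accented E of section 3 becomes a numeric character reference.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from RFC 2731): escape a value for use inside a
# double-quoted HTML attribute.  "&" is replaced first so that the later
# replacements are not escaped twice; any non-ASCII character becomes a
# numeric character reference such as &#233;.
sub escape_attr {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/"/&quot;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/([^\x00-\x7F])/sprintf("&#%d;", ord($1))/ge;
    return $s;
}

# Hypothetical helper: build one META tag in the single-line style of
# section 5.  Named arguments: element (required), subelement, lang,
# scheme, content.  Undefined subelement/lang/scheme are simply omitted.
sub dc_meta {
    my (%arg) = @_;
    my $name = join '.', 'DC', grep { defined } @arg{qw(element subelement)};
    my $tag  = qq(<meta name = "$name");
    $tag .= qq( lang = "$arg{lang}")     if defined $arg{lang};
    $tag .= qq( scheme = "$arg{scheme}") if defined $arg{scheme};
    $tag .= ' content = "' . escape_attr($arg{content}) . '">';
    return $tag;
}

print dc_meta(element => 'Creator', content => "Da Costa, Jos\x{e9}"), "\n";
print dc_meta(element => 'Date', subelement => 'Created', content => '1935'), "\n";
print dc_meta(element => 'Language', scheme => 'rfc1766', content => 'es'), "\n";
```

Run as-is, the three calls print `<meta name = "DC.Creator" content = "Da Costa, Jos&#233;">`, `<meta name = "DC.Date.Created" content = "1935">` and `<meta name = "DC.Language" scheme = "rfc1766" content = "es">`. Emitting each tag on its own line keeps the output within the one-tag-per-line limit that the appendix's conversion script tolerates.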
+ +   Title (name given to the resource) +   ----- + +    <meta name    = "DC.Title" +          content = "Polycyclic aromatic hydrocarbon contamination"> + +    <meta name    = "DC.Title" +          content = "Crime and Punishment"> + +    <meta name    = "DC.Title" +          content = "Methods of Information in Medicine, Vol 32, No 4"> + +    <meta name    = "DC.Title" +          content = "Still life #4 with flowers"> + +    <meta name    = "DC.Title" +          lang    = "de" +          content = "Das Wohltemperierte Klavier, Teil I"> + + + + +Kunze                        Informational                      [Page 6] + +RFC 2731         Encoding Dublin Core Metadata in HTML     December 1999 + + +   Creator (entity that created the content) +   ------- + +    <meta name    = "DC.Creator" +          content = "Gogh, Vincent van"> +    <meta name    = "DC.Creator" +          content = "van Gogh, Vincent"> + +    <meta name    = "DC.Creator" +          content = "Mao Tse Tung"> +    <meta name    = "DC.Creator" +          content = "Mao, Tse Tung"> + +    <meta name    = "DC.Creator" +          content = "Plato"> +    <meta name    = "DC.Creator" +          lang    = "fr" +          content = "Platon"> + +    <meta name    = "DC.Creator.Director" +          content = "Sturges, Preston"> +    <meta name    = "DC.Creator.Writer" +          content = "Hecht, Ben"> +    <meta name    = "DC.Creator.Producer" +          content = "Chaplin, Charles"> + +   Subject (topic or keyword) +   ------- + +    <meta name    = "DC.Subject" +          content = "heart attack"> +    <meta name    = "DC.Subject" +          scheme  = "MESH" +          content = "Myocardial Infarction; Pericardial Effusion"> + +    <meta name    = "DC.Subject" +          content = "vietnam war"> +    <meta name    = "DC.Subject" +          scheme  = "LCSH" +          content = "Vietnamese Conflict, 1961-1975"> + +    <meta name    = "DC.Subject" +          content = "Friendship"> +    <meta name    = 
"DC.Subject" +          scheme  = "ddc" +          content = "158.25"> + + + + + +Kunze                        Informational                      [Page 7] + +RFC 2731         Encoding Dublin Core Metadata in HTML     December 1999 + + +   Description (account, summary, or abstract of the content) +   ----------- + +    <meta name    = "DC.Description" +          lang    = "en" +          content = "The Author gives some Account of Himself and Family +                     -- His First Inducements to Travel -- He is +                     Shipwrecked, and Swims for his Life -- Gets safe on +                     Shore in the Country of Lilliput -- Is made a +                     Prisoner, and carried up the Country"> + +    <meta name    = "DC.Description" +          content = "A tutorial and reference manual for Java."> + +    <meta name    = "DC.Description" +          content = "Seated family of five, coconut trees to the left, +                     sailboats moored off sandy beach to the right, +                     with volcano in the background."> + +   Publisher (entity that made the resource available) +   --------- + +    <meta name    = "DC.Publisher" +          content = "O'Reilly"> + +    <meta name    = "DC.Publisher" +          content = "Digital Equipment Corporation"> + +    <meta name    = "DC.Publisher" +          content = "University of California Press"> + +    <meta name    = "DC.Publisher" +          content = "State of Florida (USA)"> + +   Contributor (other entity that made a contribution) +   ----------- + +    <meta name    = "DC.Contributor" +          content = "Curie, Marie"> + +    <meta name    = "DC.Contributor.Photographer" +          content = "Adams, Ansel"> +    <meta name    = "DC.Contributor.Artist" +          content = "Sendak, Maurice"> +    <meta name    = "DC.Contributor.Editor" +          content = "Starr, Kenneth"> + + + + + +Kunze                        Informational                      [Page 8] + +RFC 2731         
Encoding Dublin Core Metadata in HTML     December 1999 + + +   Date (of an event in the life of the resource; [WTN8601] recommended) +   ---- + +    <meta name    = "DC.Date" +          content = "1972"> + +    <meta name    = "DC.Date" +          content = "1998-05-14"> +    <meta name    = "DC.Date" +          scheme  = "WTN8601" +          content = "1998-05-14"> + +    <meta name    = "DC.Date.Created" +          content = "1998-05-14"> +    <meta name    = "DC.Date.Available" +          content = "1998-05-21"> +    <meta name    = "DC.Date.Valid" +          content = "1998-05-28"> + +    <meta name    = "DC.Date.Created" +          content = "triassic"> +    <meta name    = "DC.Date.Acquired" +          content = "1957"> + +    <meta name    = "DC.Date.Accepted" +          scheme  = "WTN8601" +          content = "1998-12-02T16:59"> + +    <meta name    = "DC.Date.DataGathered" +          scheme  = "ISO8601" +          content = "98-W49-3T1659"> + +    <meta name    = "DC.Date.Issued" +          scheme  = "ANSI.X3.X30-1985" +          content = "19980514"> + +   Type (nature, genre, or category; [DCT1] recommended) +   ---- + +    <meta name    = "DC.Type" +          content = "poem"> + +    <meta name    = "DC.Type" +          scheme  = "DCT1" +          content = "software"> +    <meta name    = "DC.Type" +          content = "software program source code"> + + + + +Kunze                        Informational                      [Page 9] + +RFC 2731         Encoding Dublin Core Metadata in HTML     December 1999 + + +    <meta name    = "DC.Type" +          content = "interactive video game"> + +    <meta name    = "DC.Type" +          scheme  = "DCT1" +          content = "dataset"> + +    <meta name    = "DC.Type" +          content = "web home page"> +    <meta name    = "DC.Type" +          content = "web bibliography"> + +    <meta name    = "DC.Type" +          content = "painting"> +    <meta name    = "DC.Type" +          content = "image; 
woodblock"> +    <meta name    = "DC.Type" +          scheme  = "AAT" +          content = "clipeus (portrait)"> +    <meta name    = "DC.Type" +          lang    = "en-US" +          content = "image; advertizement"> + +    <meta name    = "DC.Type" +          scheme  = "DCT1" +          content = "event"> +    <meta name    = "DC.Type" +          content = "event; periodic"> + +   Format (physical or digital data format, plus optional dimensions) +   ------ + +    <meta name    = "DC.Format" +          content = "text/xml"> +    <meta name    = "DC.Format" +          scheme  = "IMT" +          content = "text/xml"> + +    <meta name    = "DC.Format" +          scheme  = "IMT" +          content = "image/jpeg"> +    <meta name    = "DC.Format" +          content = "A text file with mono-spaced tables and diagrams."> + +    <meta name    = "DC.Format" +          content = "video/mpeg; 14 minutes"> + + + + + +Kunze                        Informational                     [Page 10] + +RFC 2731         Encoding Dublin Core Metadata in HTML     December 1999 + + +    <meta name    = "DC.Format" +          content = "unix tar archive, gzip compressed; 1.5 Mbytes"> + +    <meta name    = "DC.Format" +          content = "watercolor; 23 cm x 31 cm"> + +   Identifier (of the resource) +   ---------- + +    <meta name    = "DC.Identifier" +          content = "http://foo.bar.org/zaf/"> + +    <meta name    = "DC.Identifier" +          content = "urn:ietf:rfc:1766"> + +    <meta name    = "DC.Identifier" +          scheme  = "ISBN" +          content = "1-56592-149-6"> + +    <meta name    = "DC.Identifier" +          scheme  = "LCCN" +          content = "67-26020"> + +    <meta name    = "DC.Identifier" +          scheme  = "DOI" +          content = "10.12345/33-824688ab"> + +   Source (reference to the resource's origin) +   ------ + +    <meta name    = "DC.Source" +          content = "Shakespeare's Romeo and Juliet"> + +    <meta name    = "DC.Source" +          
content = "http://a.b.org/manon/"> + +   Language (of the content of the resource; [RFC1766] recommended) +   -------- + +    <meta name    = "DC.Language" +          content = "en"> +    <meta name    = "DC.Language" +          scheme  = "rfc1766" +          content = "en"> +    <meta name    = "DC.Language" +          scheme  = "ISO639-2" +          content = "eng"> + + + + +Kunze                        Informational                     [Page 11] + +RFC 2731         Encoding Dublin Core Metadata in HTML     December 1999 + + +    <meta name    = "DC.Language" +          scheme  = "rfc1766" +          content = "en-US"> + +    <meta name    = "DC.Language" +          content = "zh"> +    <meta name    = "DC.Language" +          content = "ja"> +    <meta name    = "DC.Language" +          content = "es"> +    <meta name    = "DC.Language" +          content = "de"> + +    <meta name    = "DC.Language" +          content = "german"> +    <meta name    = "DC.Language" +          lang    = "fr" +          content = "allemand"> + +   Relation (reference to a related resource) +   -------- + +    <meta name    = "DC.Relation.IsPartOf" +          content = "http://foo.bar.org/abc/proceedings/1998/"> + +    <meta name    = "DC.Relation.IsFormatOf" +          content = "http://foo.bar.org/cd145.sgml"> + +    <meta name    = "DC.Relation.IsVersionOf" +          content = "http://foo.bar.org/draft9.4.4.2"> + +    <meta name    = "DC.Relation.References" +          content = "urn:isbn:1-56592-149-6"> + +    <meta name    = "DC.Relation.IsBasedOn" +          content = "Shakespeare's Romeo and Juliet"> + +    <meta name    = "DC.Relation.Requires" +          content = "LWP::UserAgent; HTML::Parse; URI::URL; +                     Net::DNS; Tk::Pixmap; Tk::Bitmap; Tk::Photo"> + +   Coverage (extent or scope of the content) +   -------- + +    <meta name    = "DC.Coverage" +          content = "US civil war era; 1861-1865"> + + + + + +Kunze                        Informational    
                 [Page 12] + +RFC 2731         Encoding Dublin Core Metadata in HTML     December 1999 + + +    <meta name    = "DC.Coverage" +          content = "Columbus, Ohio, USA; Lat: 39 57 N Long: 082 59 W"> + +    <meta name    = "DC.Coverage" +          scheme  = "TGN" +          content = "Columbus (C,V)"> + +    <meta name    = "DC.Coverage.Jurisdiction" +          content = "Commonwealth of Australia"> + +   Rights (text or identifier of a rights management statement) +   ------ + +    <meta name    = "DC.Rights" +          lang    = "en" +          content = "Copyright Acme 1999 - All rights reserved."> + +    <meta name    = "DC.Rights" +          content = "http://foo.bar.org/cgi-bin/terms"> + +8. Security Considerations + +   The syntax rules for encoding Dublin Core metadata in HTML that are +   documented here pose no direct risk to computers and networks. +   People can use these rules to encode metadata that is inaccurate or +   even deliberately misleading (creating mischief in the form of "index +   spam"), however, this reflects a general pattern of HTML META tag +   abuse that is not limited to the encoding of metadata from the Dublin +   Core set.  Even traditional metadata encoding schemes (e.g., [MARC]) +   are not immune to inaccuracy, although they are generally followed in +   environments where production quality greatly exceeds that of the +   average Web site. + +   Systems that process metadata encoded with META tags need to consider +   issues related to its accuracy and validity as part of their design +   and implementation, and users of such systems need to consider the +   design and implementation assumptions.  
Various approaches may be +   relevant for certain applications, such as adding statements of +   metadata provenance, signing of metadata with digital signatures, and +   automating certain aspects of metadata creation; but these are far +   outside the scope of this document and the underlying META tag syntax +   that it describes. + + + + + + + + + +Kunze                        Informational                     [Page 13] + +RFC 2731         Encoding Dublin Core Metadata in HTML     December 1999 + + +9. Appendix -- Perl Scripts that Manipulate HTML Encoded Metadata + +   This section contains two simple programs that work with versions 4 +   and 5 of the Perl [PERL] scripting language interpreter.  They may be +   taken and freely adapted for local organizational needs, research +   proposals, venture capital bids, etc.  A variety of applications are +   within easy reach of implementors that choose to build on these +   scripts. + +   Script 1:  Metadata Format Conversion +   ------------------------------------- + +   Here is a simple Perl script that correctly recognizes every example +   of metadata encoding in this document.  It shows how a modest +   scripting effort can produce a utility that converts metadata from +   one format to another.  Minor changes are sufficient to support a +   number of output formats. + +#!/depot/bin/perl +# +# This simple perl script extracts metadata embedded in an HTML file +# and outputs it in an alternate format.  Issues warning about missing +# element name or value. +# +# Handles mixed case tags and attribute values, one per line or spanning +# several lines.  Also handles a quoted string spanning multiple lines. +# No error checking.  Does not tolerate more than one "<meta" per line. + +print "@(urc;\n"; +while (<>) { +        next if (! /<meta/i); +        ($meta) = /(<meta.*$)/i; +        if (! 
/<meta.*>/i) { +                while (<>) { +                        $meta .= $_; +                        last if (/>/); +                } +        } +        $name     =    $meta =~ /name\s*=\s*"([^"]*)"/i +                        ? $1 : "MISSING ELEMENT NAME"; +        $content  = $meta =~ /content\s*=\s*"([^"]*)"/i +                        ? $1 : "MISSING ELEMENT VALUE"; +        ($scheme) =  $meta =~ /scheme\s*=\s*"([^"]*)"/i; +        ($lang)   =    $meta =~ /lang\s*=\s*"([^"]*)"/i; + +        if ($lang || $scheme) { +                $mod = " ($lang"; +                if (! $scheme) + + + +Kunze                        Informational                     [Page 14] + +RFC 2731         Encoding Dublin Core Metadata in HTML     December 1999 + + +                        { $mod .= ")"; } +                elsif (! $lang) +                        { $mod .= "$scheme)" } +                else +                        { $mod .= ", $scheme)"; } +        } +        else +                { $mod = ""; } + +        print "    @|$name$mod; $content\n"; +} +print "@)urc;\n"; +# ---- end of Perl script ---- + +   When the conversion script is run on the metadata file example from +   the LINK tag section (section 4), it produces the following output. + +        @(urc; +            @|DC.Title; A Dirge +            @|DC.Creator; Shelley, Percy Bysshe +            @|DC.Type; poem +            @|DC.Date; 1820 +            @|DC.Format; text/html +            @|DC.Language; en +        @)urc; + +   Script 2:  Automated Metadata Creation +   -------------------------------------- + +   The creation and maintenance of high-quality metadata can be +   extremely expensive without automation to assist in processes such as +   supplying pre-set or computed defaults, validating syntax, verifying +   value ranges, spell checking, etc.  Considerable relief could be had +   from a script that reduced an individual provider's metadata burden +   to just the title of each document.  
Below is such a script.  It lets +   the provider of an HTML document abbreviate an entire embedded +   resource description using a single HTML comment statement that looks +   like + +       <!--metablock Little Red Riding Hood --> + +   Our script processes this statement specially as a kind of "metadata +   block" declaration with attached title.  The general form is + +       <!--metablock TITLE_OF_DOCUMENT --> + + + + + + +Kunze                        Informational                     [Page 15] + +RFC 2731         Encoding Dublin Core Metadata in HTML     December 1999 + + +   This statement works much like a "Web server-side include" in that +   the script replaces it with a fully-specified block of metadata and +   triggers other replacements.  Once installed, the script can output +   HTML files suitable for integration into one's production Web server +   procedures. + +   The individual provider keeps a separate "template" file of +   infrequently changing pre-set values for metadata elements.  If the +   provider's needs are simple enough, the only element values besides +   the title that differ from one document to the next may be generated +   automatically.  Using the script, values may be referenced as +   variables from within the template or within the document.  
Our +   variable references have the form "(--mbVARNAME)", and here is what +   they look like inside a template: + +       <title> (--mbtitle) </title> +       <meta name    = "DC.Creator" +             content = "Simpson, Homer"> +       <meta name    = "DC.Title" +             content = "(--mbtitle)"> +       <meta name    = "DC.Date.Created" +             content = "(--mbfilemodtime)"> +       <meta name    = "DC.Identifier" +             content = "(--mbbaseURL)/(--mbfilename)"> +       <meta name    = "DC.Format" +             content = "text/html; (--mbfilesize)"> +       <meta name    = "DC.Language" +             content = "(--mblanguage)-BUREAUCRATESE"> +       <meta name    = "RC.MetadataAuthority" +             content = "Springfield Nuclear"> +       <link rel     = "schema.DC" +             href    = "http://purl.org/DC/elements/1.0/"> +       <link rel     = "schema.RC" +             href    = "http://nukes.org/ReactorCore/rc"> + +   The above template represents the metadata block that will describe +   the document once the variable references are replaced with real +   values.  By the conventions of our script, the following variables +   will be replaced in both the template and in the document: + +       (--mbfilesize)            size of the final output file +       (--mbtitle)               title of the document +       (--mblanguage)            language of the document +       (--mbbaseURL)             beginning part of document identifier +       (--mbfilename)            last part (minus .html) of identifier +       (--mbfilemodtime)         last modification date of the document + + + + + +Kunze                        Informational                     [Page 16] + +RFC 2731         Encoding Dublin Core Metadata in HTML     December 1999 + + +   Here's an example HTML file to run the script on. 
+ +       <html> +       <head> +       <!--metablock Nutritional Allocation Increase --> +       <meta name    = "DC.Type" +             content = "Memorandum"> +       </head> +       <body> +       <p> +       From:  Acting Shift Supervisor +       To:    Plant Control Personnel +       RE:    (--mbtitle) +       Date:  (--mbfilemodtime) +       <p> +       Pursuant to directive DOH:10.2001/405aec of article B-2022, +       subsection 48.2.4.4.1c regarding staff morale and employee +       productivity standards, the current allocation of doughnut +       acquisition funds shall be increased effective immediately. +       </body> +       </html> + +   Note that because replacement occurs throughout the document, the +   provider need only enter the title once instead of twice (normally +   the title must be entered once in the HTML head and again in the HTML +   body).  After running the script, the above file is transformed into +   this: + +       <html> +       <head> +           <title> Nutritional Allocation Increase </title> +       <meta name    = "DC.Creator" +             content = "Simpson, Homer"> +       <meta name    = "DC.Title" +             content = "Nutritional Allocation Increase"> +       <meta name    = "DC.Date.Created" +             content = "1999-03-08"> +       <meta name    = "DC.Identifier" +             content = "http://moes.bar.com/doh/homer.html"> +       <meta name    = "DC.Format" +             content = "text/html;    1320  bytes"> +       <meta name    = "DC.Language" +             content = "en-BUREAUCRATESE"> +       <meta name    = "RC.MetadataAuthority" +             content = "Springfield Nuclear"> +       <link rel     = "schema.DC" +             href    = "http://purl.org/DC/elements/1.0/"> +       <link rel     = "schema.RC" + + + +Kunze                        Informational                     [Page 17] + +RFC 2731         Encoding Dublin Core Metadata in HTML     December 1999 + + +             href    = 
                       "http://nukes.org/ReactorCore/rc">
       <meta name    = "DC.Type"
             content = "Memorandum">
       </head>
       <body>
       <p>
       From:  Acting Shift Supervisor
       To:    Plant Control Personnel
       RE:    Nutritional Allocation Increase
       Date:  1999-03-08
       <p>
       Pursuant to directive DOH:10.2001/405aec of article B-2022,
       subsection 48.2.4.4.1c regarding staff morale and employee
       productivity standards, the current allocation of doughnut
       acquisition funds shall be increased effective immediately.
       </body>
       </html>

   Here is the script that accomplishes this transformation.

#!/depot/bin/perl
#
# This Perl script processes metadata block declarations of the form
# <!--metablock TITLE_OF_DOCUMENT --> and variable references of the
# form (--mbVARNAME), replacing them with full metadata blocks and
# variable values, respectively.  Requires a file named "template" in
# the current directory.  Outputs an HTML file.
#
# Invoke this script with a single filename argument, "foo".  It creates
# an output file "foo.html" using a temporary working file "foo.work".
# The size of foo.work is measured after variable replacement, and is
# later inserted into the file in such a way that the file's size does
# not change in the process.  Has little or no error checking.
$infile = shift;
open(IN, "< $infile")
        or die("Could not open input file \"$infile\": $!");
$workfile = "$infile.work";
unlink($workfile);
open(WORK, "+> $workfile")
        or die("Could not open work file \"$workfile\": $!");

@offsets = ();          # records locations for late size replacement
$title = "";            # gets the title during metablock processing
$language = "en";       # pre-set language here (not in the template)
$baseURL = "http://moes.bar.com/doh";   # pre-set base URL here also
$filename = "$infile.html";             # final output filename
$filesize = "(--mbfilesize)";           # replaced late (separate pass)

($year, $month, $day) = (localtime( (stat IN) [9] ))[5, 4, 3];
$filemodtime = sprintf "%04d-%02d-%02d", 1900 + $year, 1 + $month, $day;

sub putout {            # outputs current line with variable replacement
        if (! /\(--mb/) {
                print WORK;
                return;
        }
        if (/\(--mbfilesize\)/)                 # remember where it was
                { push @offsets, tell WORK; }   # but don't replace yet
        s/\(--mbtitle\)/$title/g;
        s/\(--mblanguage\)/$language/g;
        s/\(--mbbaseURL\)/$baseURL/g;
        s/\(--mbfilename\)/$filename/g;
        s/\(--mbfilemodtime\)/$filemodtime/g;
        print WORK;
}

while (<IN>) {                          # main loop for input file
        if (!
            /(.*)<!--metablock\s*(.*)/) {
                &putout;
                next;
        }
        $title = $2;
        $_ = $1;
        &putout;
        if ($title =~ s/\s*-->(.*)//) {
                $remainder = $1;
        }
        else {                  # title continues on later lines
                while (<IN>) {
                        chomp;
                        if (/(.*?)\s*-->(.*)/) {        # closing line
                                $title .= " $1";
                                $remainder = $2;
                                last;
                        }
                        $title .= " $_";
                }
        }
        open(TPLATE, "< template")
                or die("Could not open template file: $!");
        while (<TPLATE>)                # subloop for template file
                { &putout; }
        close(TPLATE);
        $_ = $remainder;
        &putout;
}
close(IN);

# Now replace filesize variables without altering total byte count.
select( (select(WORK), $| = 1) [0] );   # first flush output so we
if (($size = -s WORK) < 100000)         # can get final file size
        { $scale = 0; }                 # and set scale factor or
else {                  # compute it, keeping width of size field low
        for ($scale = 0; $size >= 1000; $scale++)
                { $size /= 1024; }
}
$filesize = sprintf "%7.7s %sbytes",
        $size, (" ", "K", "M", "G", "T", "P") [$scale];

foreach $pos (@offsets) {       # loop through saved size locations
        seek WORK, $pos, 0;             # read the line found there
        $_ = <WORK>;
        # $filesize must be exactly as wide as "(--mbfilesize)"
        s/\(--mbfilesize\)/$filesize/g;
        seek WORK, $pos, 0;             # rewrite it with replacement
        print WORK;
}

close(WORK);
rename($workfile, $filename)
        or die("Could not rename \"$workfile\" to \"$filename\": $!");
# ---- end of Perl script ----
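   The size-field trick at the end of the script works because the
   placeholder "(--mbfilesize)" and the formatted size are both exactly
   14 characters wide, so substituting one for the other cannot change
   the size being reported.  The following Python sketch restates the
   script's two passes to illustrate this; it is not part of this
   specification.  It assumes the whole document fits in memory
   (instead of patching saved file offsets), the function names are
   hypothetical, and it stops scaling at "P" rather than running past
   the end of the unit list.

```python
import re

PLACEHOLDER = "(--mbfilesize)"          # 14 characters, as in the Perl script
UNITS = [" ", "K", "M", "G", "T", "P"]  # same scale letters as the script
VAR_RE = re.compile(r"\(--mb(\w+)\)")   # matches (--mbVARNAME)


def substitute(text, values):
    """First pass: replace (--mbNAME) references whose names are in `values`.

    (--mbfilesize) is always left alone because the final size is not
    known yet; unknown names are also left unchanged.
    """
    def repl(match):
        name = match.group(1)
        if name == "filesize":
            return match.group(0)
        return values.get(name, match.group(0))
    return VAR_RE.sub(repl, text)


def format_filesize(size):
    """Mimic the script's sprintf "%7.7s %sbytes": always 14 characters."""
    value, scale = size, 0
    if size >= 100000:                  # only large files are scaled
        # Hypothetical cap at "P" so scale never indexes past UNITS.
        while value >= 1000 and scale < len(UNITS) - 1:
            value /= 1024
            scale += 1
    digits = format(value, ".15g")[:7]  # Perl stringifies numbers like %.15g
    return f"{digits:>7} {UNITS[scale]}bytes"


def fill_filesize(document):
    """Second pass: measure the document, then swap in the size field.

    The swap preserves length, so the reported size remains accurate.
    """
    size = len(document.encode("utf-8"))
    field = format_filesize(size)
    if len(field) != len(PLACEHOLDER):
        raise ValueError("size field would change the document length")
    return document.replace(PLACEHOLDER, field)


if __name__ == "__main__":
    template = ('content = "(--mbtitle)"\n'
                'content = "text/html; (--mbfilesize)"\n')
    page = fill_filesize(
        substitute(template, {"title": "Nutritional Allocation Increase"}))
    print(page, end="")
    print(len(page.encode("utf-8")), "bytes")
```

   Run on the two template lines above, the first pass fills in the
   title while leaving the size placeholder in place, and the second
   pass measures the result and writes a size that matches the final
   output exactly.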
10. Author's Address

   John A. Kunze
   Center for Knowledge Management
   University of California, San Francisco
   530 Parnassus Ave, Box 0840
   San Francisco, CA  94143-0840, USA

   Fax:   +1 415-476-4653
   EMail: jak@ckm.ucsf.edu


11. References

   [AAT]            Art and Architecture Thesaurus, Getty Information
                    Institute.
                    http://shiva.pub.getty.edu/aat_browser/

   [AC]             The A-Core: Metadata about Content Metadata, (in
                    progress)
                    http://metadata.net/ac/draft-iannella-admin-01.txt

   [DC1]            Weibel, S., Kunze, J., Lagoze, C. and M. Wolf,
                    "Dublin Core Metadata for Resource Discovery", RFC
                    2413, September 1998.
                    ftp://ftp.isi.edu/in-notes/rfc2413.txt

   [DCHOME]         Dublin Core Initiative Home Page.
                    http://purl.org/DC/

   [DCPROJECTS]     Projects Using Dublin Core Metadata.
                    http://purl.org/DC/projects/index.htm

   [DCT1]           Dublin Core Type List 1, DC Type Working Group,
                    March 1999.
                    http://www.loc.gov/marc/typelist.html

   [freeWAIS-sf2.0] The enhanced freeWAIS distribution, February 1999.
                    http://ls6-www.cs.uni-dortmund.de/ir/projects/freeWAIS-sf/

   [GLIMPSE]        Glimpse Home Page.
                    http://glimpse.cs.arizona.edu/

   [HARVEST]        Harvest Web Indexing.
                    http://www.tardis.ed.ac.uk/harvest/

   [HTML4.0]        Hypertext Markup Language 4.0 Specification, April
                    1998.
                    http://www.w3.org/TR/REC-html40/

   [ISEARCH]        Isearch Resources Page.
                    http://www.etymon.com/Isearch/

   [ISO639-2]       Code for the representation of names of languages,
                    1996.
                    http://www.indigo.ie/egt/standards/iso639/iso639-2-en.html

   [ISO8601]        ISO 8601:1988(E), Data elements and interchange
                    formats -- Information interchange -- Representation
                    of dates and times, International Organization for
                    Standardization, June 1988.
                    http://www.iso.ch/markete/8601.pdf

   [MARC]           USMARC Format for Bibliographic Data, US Library of
                    Congress.
                    http://lcweb.loc.gov/marc/marc.html

   [PERL]           L. Wall, T. Christiansen, R. Schwartz, Programming
                    Perl, Second Edition, O'Reilly, 1996.

   [RDF]            Resource Description Framework Model and Syntax
                    Specification, February 1999.
                    http://www.w3.org/TR/REC-rdf-syntax/

   [RFC1766]        Alvestrand, H., "Tags for the Identification of
                    Languages", RFC 1766, March 1996.
                    ftp://ftp.isi.edu/in-notes/rfc1766.txt

   [SWISH-E]        Simple Web Indexing System for Humans - Enhanced.
                    http://sunsite.Berkeley.EDU/SWISH-E/

   [TGN]            Thesaurus of Geographic Names, Getty Information
                    Institute.
                    http://shiva.pub.getty.edu/tgn_browser/

   [WTN8601]        W3C Technical Note - Profile of ISO 8601 Date and
                    Time Formats.
                    http://www.w3.org/TR/NOTE-datetime

   [XML]            Extensible Markup Language (XML).
                    http://www.w3.org/TR/REC-xml


12.  Full Copyright Statement

   Copyright (C) The Internet Society (1999).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.
   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

   Funding for the RFC Editor function is currently provided by the
   Internet Society.