1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
|
Network Working Group G. Klyne
Request for Comments: 2938 Content Technologies
Updates: 2533 L. Masinter
Category: Standards Track AT&T
September 2000
Identifying Composite Media Features
Status of this Memo
This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.
Copyright Notice
Copyright (C) The Internet Society (2000). All Rights Reserved.
Abstract
In RFC 2533, an expression format is presented for describing media
feature capabilities as a combination of simple media feature tags.
This document describes an abbreviated format for a composite media
feature set, based upon a hash of the feature expression describing
that composite.
Table of Contents
1. Introduction ................................................2
1.1 Organization of this document ...............................2
1.2 Terminology and document conventions ........................2
2. Motivation and goals ........................................3
3. Composite feature representation ............................4
3.1 Feature set hashed reference format .........................5
3.1.1 Hash value calculation ......................................6
3.1.2 Base-32 value representation ................................7
3.2 Resolving feature set identifiers ...........................8
3.2.1 Query protocol ..............................................8
3.2.2 Inline feature set details ..................................9
4. Examples ...................................................10
5. Internationalization Considerations ........................12
6. Security Considerations ....................................13
7. Acknowledgements ...........................................13
8. References .................................................13
Klyne & Masinter Standards Track [Page 1]
^L
RFC 2938 Identifying Composite Media Features September 2000
9. Authors' Addresses .........................................15
10. Appendix A: The birthday paradox ...........................16
11. Full Copyright Statement ...................................18
1. Introduction
In "A Syntax for Describing Media Feature Sets" [1], an expression
format is presented for describing media feature capabilities as a
combination of simple media feature tags [2].
This document proposes an abbreviated format for a composite media
feature set, based upon a hash of the feature expression describing
that composite.
This memo extends and builds upon the expression syntax described in
RFC 2533 [1], and it is assumed that the reader is familiar with the
interpretation of feature set expressions described there.
1.1 Organization of this document
Section 2 sets out some of the background and goals for feature set
references.
Section 3 presents a syntax for feature set references, and describes
how they are related to feature set expressions.
1.2 Terminology and document conventions
This section defines a number of terms and other document
conventions, which are used with specific meaning in this memo. The
terms are listed in alphabetical order.
dereference
the act of replacing a feature set reference with its
corresponding feature set expression. Also called
"resolution".
feature set
some set of media features described by a media feature
assertion, as described in "A Syntax for Describing Media
Feature Sets" [1]. (See that memo for a more formal
definition of this term.)
feature set expression
a string that describes some feature set, formulated
according to the rules in "A Syntax for Describing Media
feature sets" [1] (and possibly extended by other
specifications).
Klyne & Masinter Standards Track [Page 2]
^L
RFC 2938 Identifying Composite Media Features September 2000
feature set reference
a brief construct that references some feature set. (See
also: "dereference".)
feature set tag
a name that conforms to the syntax of a feature tag [2] that
is used to denote a feature set rather than a single
feature.
resolution
(See "dereference").
This specification uses syntax notation and conventions described
in RFC 2234, "Augmented BNF for Syntax Specifications: ABNF" [3].
NOTE: Comments like this provide additional nonessential
information about the rationale behind this document. Such
information is not needed for building a conformant
implementation, but may help those who wish to understand the
design in greater depth.
2. Motivation and goals
The range of media feature capabilities of a message handling system
can be quite extensive, and the corresponding feature set expression
[1] can reach a significant size.
A requirement has been identified to allow recurring feature sets to
be identified by a single reference value, which can be combined with
other elements in a feature set expression. It is anticipated that
mechanisms will be provided that allow the recipient of such a
feature set reference to discover the corresponding feature set
expression, but any such mechanism is beyond the scope of this
specification.
Thus, the goals for this proposal are:
o to provide an abbreviated form for referencing an arbitrary
feature set expression.
o the meaning of (i.e., the corresponding feature set expression) a
feature set reference should be independent of any particular
mechanism that may be used to dereference it.
o to be able to verify whether a given feature set expression
corresponds to some feature set reference without having to
perform an explicit dereferencing operation (i.e., without
incurring additional network traffic).
Klyne & Masinter Standards Track [Page 3]
^L
RFC 2938 Identifying Composite Media Features September 2000
o for protocol processors that conform to RFC 2533 [1] to be able to
sensibly handle a feature set reference without explicit knowledge
of its meaning (i.e., the introduction of feature set references
should not break existing feature expression processors). That
is, the applicable interpretation and processing rules of RFC 2533
[1] apply equally to expressions containing feature set
references.
NOTE: This proposal does not attempt to address the "override"
or "default" problem. (Where a feature set may be referenced and
selectively modified.)
Some circumstances in which such an abbreviated form might be used
include:
o A media feature expression that contains a repeated sub-
expression. If the sub-expression is quite large, space can be
saved by writing it out once, then using the abbreviated form to
reference it.
o A capability that is common to a range of devices, such as a given
class of fax machine where are large number of feature tags are
involved, but only a small number of common feature sets. If the
recipient understands, or can discover, that some abbreviation
stands for a given feature set then feature expression size can be
reduced by using the abbreviation.
If feature set abbreviations are used in this way, it may be that
they can be interpreted by a simple table lookup rather than full
feature expression parsing. (Making this useful in practice will
depend on crafting the feature subsets appropriately.)
Examples of such usage are given in section 4 of this memo.
This memo does not specify how a program that receives a feature set
abbreviation should discover the corresponding feature set
expression: see section 3.2.
3. Composite feature representation
This specification hinges on two central ideas:
o the use of auxiliary predicates (introduced in RFC 2533 [1]) to
form the basis of a feature set identifier, and
o the use of a token based on a hash function computed over the
referenced feature set expression.
Klyne & Masinter Standards Track [Page 4]
^L
RFC 2938 Identifying Composite Media Features September 2000
A key reason to use a hash function to generate an identifier is to
define a global name space without requiring a central naming
authority. New feature set tags can be introduced by any party
following the appropriate rules of formulation, without reference to
any centralized authority.
Local resolution services may be needed to map feature set tags to
their corresponding feature set expressions, but these are not able
to vary the meaning of any given tag. Failure of a resolution
service to return the correct expression is detectable by a calling
application, which should reject any incorrect value supplied.
NOTE: where a feature set reference is used, its meaning is
defined by substitution of the referenced feature expression into
the referencing expression. When all references have been thus
replaced, the result is interpreted as a normal feature
expression.
In particular, if a referenced feature expression contains some
feature tag that is also constrained by the referencing
expression, the constraints are interpreted per RFC 2533 [1],
without regard for their origin. E.g., (using some notation
introduced below):
(& (pix-x=100) (pix-y<=300)
(h.SBB5REAOMHC09CP2GM4V07PQP0) )
where (h.SBB5REAOMHC09CP2GM4V07PQP0) resolves to:
(& (pix-x<=200) (pix-y<=150) )
yields a result equivalent to:
(& (pix-x=100) (pix-y<=150) )
3.1 Feature set hashed reference format
This specification introduces a special form of auxiliary predicate
name with the following syntax:
fname = "h." 1*BASE32DIGIT
BASE32DIGIT = DIGIT
/ "A" / "B" / "C" / "D" / "E" / "F" / "G" / "H"
/ "I" / "J" / "K" / "L" / "M" / "N" / "O" / "P"
/ "Q" / "R" / "S" / "T" / "U" / "V"
The sequence of base-32 digits represents the value of a hash
function calculated over the corresponding feature set expression
(see following sections). Note that the above syntax allows upper-
or lower-case letters for base-32 digits (per RFC 2234 [3]).
Klyne & Masinter Standards Track [Page 5]
^L
RFC 2938 Identifying Composite Media Features September 2000
Thus, within a feature set expression, a hashed feature set reference
would have the following form:
(h.123456789abcdefghijklmnopq)
3.1.1 Hash value calculation
The hash value is calculated using the MD5 algorithm [6] over the
text of the referenced feature set expression subjected to certain
normalizations. The feature expression must conform to the syntax
given for 'filter' in RFC 2533 [1]:
filter = "(" filtercomp ")" *( ";" parameter )
The steps for calculating a hash value are:
1. Whitespace normalization: all spaces, CR, LF, TAB and any other
layout control characters that may be embedded in the feature
expression string, other than those contained within quoted
strings, are removed (or ignored for the purpose of hash value
computation).
2. Case normalization: all lower case letters in the feature
expression, other than those contained within quoted strings, are
converted to upper case. That is, unquoted characters with US-
ASCII values 97 to 122 (decimal) are changed to corresponding
characters in the range 65 to 90.
3. Hash computation: the MD5 algorithm, described in RFC 1321 [6], is
applied to the normalized feature expression string (represented
as a sequence of octets containing US-ASCII character codes; see
also section 5).
The result obtained in step 3 is a 128-bit (16 octet) value that
is converted to a base-32 representation to form the feature set
reference.
NOTE: under some circumstances, removal of ALL whitespace may
result in an invalid feature expression string. This should not
be a problem as this is done only for the purpose of calculating
a hash value, and significantly different feature expressions are
expected to differ in ways other than their whitespace.
NOTE: case normalization is deemed appropriate since feature tag
and token matching is case insensitive.
Klyne & Masinter Standards Track [Page 6]
^L
RFC 2938 Identifying Composite Media Features September 2000
3.1.2 Base-32 value representation
RFC 1321 [6] describes how to calculate an MD5 hash value that is a
sequence of 16 octets. This is then required to be coded as a base-
32 value, which is a sequence of base-32 digit characters.
Each successive character in a base-32 value represents 5 successive
bits of the underlying octet sequence. Thus, each group of 8
characters represents a sequence of 5 octets (40 bits):
1 2 3
01234567 89012345 67890123 45678901 23456789
+--------+--------+--------+--------+--------+
|< 1 >< 2| >< 3 ><|.4 >< 5.|>< 6 ><.|7 >< 8 >|
+--------+--------+--------+--------+--------+
<===> 8th character
<====> 7th character
<===> 6th character
<====> 5th character
<====> 4th character
<===> 3rd character
<====> 2nd character
<===> 1st character
The value (i.e. sequence of bits) represented by each base-32 digit
character is indicated by the following table:
"0" 0 "A" 10 "K" 20 "U" 30
"1" 1 "B" 11 "L" 21 "V" 31
"2" 2 "C" 12 "M" 22
"3" 3 "D" 13 "N" 23
"4" 4 "E" 14 "O" 24
"5" 5 "F" 15 "P" 25
"6" 6 "G" 16 "Q" 26
"7" 7 "H" 17 "R" 27
"8" 8 "I" 18 "S" 28
"9" 9 "J" 19 "T" 29
When encoding a base-32 value, each full group of 5 octets is
represented by a sequence of 8 characters indicated above. If a
group of less than 5 octets remain after this, they are encoded using
as many additional characters as may be needed: 1, 2, 3 or 4 octets
are encoded by 2, 4, 5 or 7 characters respectively. Any spare bits
represented by the base-32 digit characters are selected to be zero.
Klyne & Masinter Standards Track [Page 7]
^L
RFC 2938 Identifying Composite Media Features September 2000
When decoding a base-32 value, the reverse mapping is applied: each
full group of 8 characters codes a sequence of 5 octets. A final
group of 2, 4, 5 or 7 characters codes a sequence of 1, 2, 3 or 4
octets respectively. Any spare bits represented by the final group
of characters are discarded.
Thus, for a 128-bit (16 octet) MD5 hash value, the first 15 octets
are coded as 24 base 32 digit characters, and the final octet is
coded by two characters.
NOTE: Base64 representation (per MIME [4]) would be more compact
(21 rather than 26 characters for the MD5 128-bit hash value),
but an auxiliary predicate name is defined (by [1]) to have the
same syntax as a feature tag, and the feature tag matching rules
(per [2]) state that feature tag matching is case insensitive.
Base36 representation was considered (i.e., using all letters
"A"-"Z") but was not used because this would require extended
precision multiplication and division operations to encode and
decode the hash values.
3.2 Resolving feature set identifiers
This memo does not mandate any particular mechanism for dereferencing
a feature set identifier. It is expected that specific dereferencing
mechanisms will be specified for any application or protocol that
uses them.
The following sections describe some ways that feature set
dereferencing information may be incorporated into a feature set
expression. These are based on auxiliary predicate definitions
within a "where" clause [1].
When a hashed feature set reference is used, conformance to the
hashing rules takes precedence over any other determination of the
feature expression. Any expression, however obtained, may not be
substituted for the hash-based reference unless it yields the correct
hash value.
3.2.1 Query protocol
A protocol providing request/response type queries (e.g., HTTP, LDAP,
etc.) might be set up to provide a resolution service.
Thus, a query to a server associated with the capabilities could be
performed on the feature set identifier. The response returned would
be a CONNEG expression; e.g.,
Klyne & Masinter Standards Track [Page 8]
^L
RFC 2938 Identifying Composite Media Features September 2000
(h.SBB5REAOMHC09CP2GM4V07PQP0)
where
(h.SBB5REAOMHC09CP2GM4V07PQP0) :- (& (pix-x<=200) (pix-y<=150) )
end
or just:
(& (pix-x<=200) (pix-y<=150) )
This result would be combined with the original expression to
obtain a result not including the hash based predicate.
This process might be further enhanced by using URN resolution
mechanisms (e.g., DNS NAPTR [10]) to discover the resolution
protocol and server.
3.2.2 Inline feature set details
In this case, a reference is resolved by including its definition
inline in an expression.
The feature set expression associated with a reference value may be
specified directly in a "where" clause, using the auxiliary
predicate definition syntax [1]; e.g.,
(& (dpi=100) (h.SBB5REAOMHC09CP2GM4V07PQP0) )
where
(h.SBB5REAOMHC09CP2GM4V07PQP0) :- (& (pix-x<=200) (pix-y<=150) )
end
This form might be used on request (where the request mechanism is
defined by the invoking application protocol), or when the originator
believes the recipient may not understand the reference.
It is an error if the inline feature expression does not yield the
hash value contained in auxiliary predicate name.
NOTE: viewed in isolation, this format does not have any obvious
value, in that the (h.xxx) form of auxiliary predicate could be
replaced by any arbitrary name.
It is anticipated that this form might be used as a follow-up
response in a sequence along the lines of:
A> Capabilities are:
(& (dpi=100) (h.SBB5REAOMHC09CP2GM4V07PQP0) )
B> Do not understand:
(h.SBB5REAOMHC09CP2GM4V07PQP0)
Klyne & Masinter Standards Track [Page 9]
^L
RFC 2938 Identifying Composite Media Features September 2000
A> Capabilities are:
(& (dpi=100) (h.SBB5REAOMHC09CP2GM4V07PQP0) )
where
(h.SBB5REAOMHC09CP2GM4V07PQP0) :- (& (pix-x<=200)
(pix-y<=150) )
end
4. Examples
The following are some examples of feature set expressions containing
feature set references:
(& (dpi=100) (h.SBB5REAOMHC09CP2GM4V07PQP0) )
(& (dpi=100) (h.SBB5REAOMHC09CP2GM4V07PQP0) )
where
(h.SBB5REAOMHC09CP2GM4V07PQP0) :-
(& (pix-x<=200) (pix-y<=150) )
end
(h.QGEOPMCF02P09QC016CEPU22FO)
where
(h.QGEOPMCF02P09QC016CEPU22FO) :-
(| (& (ua-media=continuous) (dpi=200) (dpi-xyratio=200/100)
(color=Binary) (paper-size=B4) (image-coding=MH) )
(& (ua-media=continuous) (dpi=200) (dpi-xyratio=200/100)
(color=Binary) (paper-size=B4) (image-coding=MR) )
(& (ua-media=stationery) (dpi=300) (dpi-xyratio=1)
(color=Binary) (paper-size=A4) (image-coding=JBIG) )
(& (ua-media=transparency) (dpi=300) (dpi-xyratio=1)
(color=Binary) (paper-size=A4) (image-coding=JBIG) ) )
end
The following examples are based on Internet fax work, and show how a
feature-hash might be used to express the commonly-used features. A
form of Internet fax system that is expected to be quite common is a
so-called "simple mode" system, whose capabilities are described by
the following feature expression:
(& (image-file-structure=TIFF-minimal)
(MRC-mode=0)
(color=Binary)
(image-coding=MH) (MRC-mode=0)
(| (& (dpi=204) (dpi-xyratio=[204/98,204/196]) )
(& (dpi=200) (dpi-xyratio=[200/100,1]) ) )
(size-x<=2150/254)
(paper-size=A4)
Klyne & Masinter Standards Track [Page 10]
^L
RFC 2938 Identifying Composite Media Features September 2000
(ua-media=stationery) )
This might be expressed by the hash-based feature set identifier:
(h.MSB955PVIRT1QOHET9AJT5JM3O)
The following example describes capabilities of a full-color
Internet fax system. Note a number of feature values are
applicable in common with '(color=grey)' and '(color=full)':
(& (image-file-structure=TIFF)
(MRC-mode=0)
(| (& (color=Binary)
(image-coding=[MH,MR,MMR])
(| (& (dpi=204) (dpi-xyratio=[204/98,204/196]) )
(& (dpi=200) (dpi-xyratio=[200/100,1]) )
(& (dpi=300) (dpi-xyratio=1) ) ) )
(& (color=grey)
(image-coding=JPEG)
(image-coding-constraint=JPEG-T4E)
(color-levels<=256)
(color-space=CIELAB)
(color-illuminant=D50)
(CIELAB-L-min>=0)
(CIELAB-L-max<=100)
(dpi=[100,200,300]) (dpi-xyratio=1) )
(& (color=full)
(image-coding=JPEG)
(image-coding-constraint=JPEG-T4E)
(color-subsampling=["1:1:1","4:1:1"])
(color-levels<=16777216)
(color-space=CIELAB)
(color-illuminant=D50)
(CIELAB-L-min>=0)
(CIELAB-L-max<=100)
(CIELAB-a-min>=-85)
(CIELAB-a-max<=85)
(CIELAB-b-min>=-75)
(CIELAB-b-max<=125)
(dpi=[100,200,300]) (dpi-xyratio=1) ) )
(size-x<=2150/254)
(paper-size=[letter,A4,B4]) )
(ua-media=stationery) )
Klyne & Masinter Standards Track [Page 11]
^L
RFC 2938 Identifying Composite Media Features September 2000
Separating out the common capabilities yields:
(& (image-file-structure=TIFF)
(MRC-mode=0)
(| (& (color=Binary)
(image-coding=[MH,MR,MMR])
(| (& (dpi=204) (dpi-xyratio=[204/98,204/196]) )
(& (dpi=200) (dpi-xyratio=[200/100,1]) )
(& (dpi=300) (dpi-xyratio=1) ) ) )
(& (color=grey)
(color-levels<=256)
(h.QVSEM8V2LMJ8VOR7V682J7079O) )
(& (color=full)
(color-subsampling=["1:1:1","4:1:1"])
(color-levels<=16777216)
(CIELAB-a-min>=-85)
(CIELAB-a-max<=85)
(CIELAB-b-min>=-75)
(CIELAB-b-max<=125)
(h.QVSEM8V2LMJ8VOR7V682J7079O) ) )
(size-x<=2150/254)
(paper-size=[letter,A4,B4]) )
(ua-media=stationery) )
where
(h.QVSEM8V2LMJ8VOR7V682J7079O) :-
(& (image-coding=JPEG)
(image-coding-constraint=JPEG-T4E)
(color-space=CIELAB)
(color-illuminant=D50)
(CIELAB-L-min>=0)
(CIELAB-L-max<=100)
(dpi=[100,200,300]) (dpi-xyratio=1) )
end
5. Internationalization Considerations
Feature set expressions and URI strings are currently defined to
consist of only characters from the US-ASCII repertoire [1,5]; under
these circumstances this specification is not impacted by
internationalization considerations (other than any already
applicable to URIs [5]).
But, if future revisions of the feature set syntax permit non-US-
ASCII characters (e.g. within quoted strings), then some canonical
representation must be defined for the purposes of calculating hash
values. One choice might be to use a UTF-8 equivalent representation
as the basis for calculating the feature set hash. Another choice
Klyne & Masinter Standards Track [Page 12]
^L
RFC 2938 Identifying Composite Media Features September 2000
might be to leave this as an application protocol issue (but this
could lead to non-interoperable feature sets between different
protocols).
Another conceivable issue is that of up-casing the feature expression
in preparation for computing a hash value. This does not apply to
the content of strings so is not likely to be an issue. But if
changes are made that do permit non-US-ASCII characters in feature
tags or token strings, consideration must be given to properly
defining how case conversion is to be performed.
6. Security Considerations
For the most part, security considerations are the same as those that
apply for capability identification in general [1,2,9].
A possible added consideration is that use of a specific feature set
identifier may reveal more information about a system than is
necessary for a transaction at hand.
7. Acknowledgements
Ideas here have been improved by early discussions with Martin
Duerst, Al Gilman and Ted Hardie. Useful suggestions for improvement
were provided by Maurizio Codogno.
8. References
[1] Klyne, G., "A Syntax for Describing Media Feature Sets", RFC
2533, March 1999.
[2] Mutz, A. and T. Hardie, "Media Feature Tag Registration
Procedure", RFC 2506, March 1999.
[3] Crocker, D. and P. Overell, "Augmented BNF for Syntax
Specifications: ABNF", RFC 2234, November 1997.
[4] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
Extensions (MIME) Part 1: Format of Internet message bodies",
RFC 2045, November 1996.
[5] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform Resource
Identifiers (URI): Generic Syntax", RFC 2396, August 1998.
[6] Rivest, R., "The MD5 Message-Digest Algorithm", RFC 1321, April
1992.
Klyne & Masinter Standards Track [Page 13]
^L
RFC 2938 Identifying Composite Media Features September 2000
[7] "Applied Cryptography"
Bruce Schneier
John Wiley and Sons, 1996 (second edition)
ISBN 0-471-12845-7 (cloth)
ISBN 0-471-11709-9 (paper)
[8] Klyne, G., "Protocol-independent Content Negotiation Framework",
RFC 2703, September 1999.
[9] "Numerical Recipes"
William H Press, Brian P Flannery, Saul A Teukolski and
William T Vetterling
Cambridge University Press (1986)
ISBN 0 521 30811 9
(The Gamma function approximation is presented in chapter 6 on
"Special Functions". There have been several later editions of
this book published, so the chapter reference may change.)
[10] Daniel, R. and M. Mealling, "Resolution of Uniform Resource
Identifiers using the Domain Name System", RFC 2168, June 1997.
[11] Java source code of feature set matching algorithm, with feature
set hash computation option. Linked from
<http://www.imc.org/ietf-medfree/>
Klyne & Masinter Standards Track [Page 14]
^L
RFC 2938 Identifying Composite Media Features September 2000
9. Authors' Addresses
Graham Klyne
Content Technologies Ltd.
1220 Parkview,
Arlington Business Park
Theale
Reading, RG7 4SA
United Kingdom
Phone: +44 118 930 1300
Fax: +44 118 930 1301
EMail: GK@ACM.ORG
Larry Masinter
AT&T Labs
75 Willow Road
Menlo Park, CA 94025
Phone: +1-650-463-7059
EMail: LMM@acm.org
http://larry.masinter.net
Klyne & Masinter Standards Track [Page 15]
^L
RFC 2938 Identifying Composite Media Features September 2000
10. Appendix A: The birthday paradox
NOTE: this entire section is commentary, and does not affect the
feature set reference specification in any way.
The use of a hash value to represent an arbitrary feature set is
based on a presumption that no two distinct feature sets will yield
the same hash value.
There is a small but distinct possibility that two different feature
sets will indeed yield the same hash value.
We assume that the 128-bit hash function distributes hash values for
feature sets, even those with very small differences, randomly and
evenly through the range of 2^128 (approximately 3*10^38) possible
values. This is a fundamental property of a good digest algorithm
like MD5. Thus, the chance that any two distinct feature set
expressions yield the same hash is less than 1 in 10^38. This is
negligible when compared with, say, the probability that a receiving
system will fail having received data conforming to a negotiated
feature set.
But when the number of distinct feature sets in circulation
increases, the probability of repeating a hash value increases
surprisingly. This is illustrated by the "birthday paradox": given
a random collection of just 23 people, there is a greater than even
chance that there exists some pair with the same birthday. This
topic is discussed further in sections 7.4 and 7.5 of Bruce
Schneier's "Applied Cryptography" [7].
The table below shows the "birthday paradox" probabilities that at
least one pair of feature sets has the same hash value for different
numbers of feature sets in use.
Number of feature Probability of two
sets in use sets with the same
hash value
1 0
2 3E-39
10 1E-37
1E3 1E-33
1E6 1E-27
1E9 1E-21
1E12 1E-15
1E15 1E-9
1E18 1E-3
Klyne & Masinter Standards Track [Page 16]
^L
RFC 2938 Identifying Composite Media Features September 2000
The above probability computations are approximate, being
performed using logarithms of a Gamma function
approximation by Lanczos [9]. The probability formula is
'P=1-(m!/((m-n)! m^n))', where 'm' is the total number of
possible hash values (2^128) and 'n' is the number of
feature sets in use.
If original feature set expressions are generated manually, or only
in response to some manually constrained process, the total number
of feature sets in circulation is likely to remain very small in
relation to the total number of possible hash values.
The outcome of all this is: assuming that the feature sets are
manually generated, even taking account of the birthday paradox
effect, the probability of incorrectly identifying a feature set
using a hash value is still negligibly small when compared with
other possible failure modes.
Klyne & Masinter Standards Track [Page 17]
^L
RFC 2938 Identifying Composite Media Features September 2000
11. Full Copyright Statement
Copyright (C) The Internet Society (2000). All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Acknowledgement
Funding for the RFC Editor function is currently provided by the
Internet Society.
Klyne & Masinter Standards Track [Page 18]
^L
|