summaryrefslogtreecommitdiff
path: root/doc/rfc/rfc1439.txt
diff options
context:
space:
mode:
Diffstat (limited to 'doc/rfc/rfc1439.txt')
-rw-r--r--doc/rfc/rfc1439.txt619
1 files changed, 619 insertions, 0 deletions
diff --git a/doc/rfc/rfc1439.txt b/doc/rfc/rfc1439.txt
new file mode 100644
index 0000000..dbdf4de
--- /dev/null
+++ b/doc/rfc/rfc1439.txt
@@ -0,0 +1,619 @@
+
+
+
+
+
+
+Network Working Group C. Finseth
+Request for Comments: 1439 University of Minnesota
+ March 1993
+
+
+ The Uniqueness of Unique Identifiers
+
+Status of this Memo
+
+ This memo provides information for the Internet community. It does
+ not specify an Internet standard. Distribution of this memo is
+ unlimited.
+
+Abstract
+
+ This RFC provides information that may be useful when selecting a
+ method to use for assigning unique identifiers to people.
+
+1. The Issue
+
+ Computer systems require a way to identify the people associated with
+ them. These identifiers have been called "user names" or "account
+ names." The identifers are typically short, alphanumeric strings.
+ In general, these identifiers must be unique.
+
+ The uniqueness is usually achieved in one of three ways:
+
+ 1) The identifiers are assigned in a unique manner without using
+ information associated with the individual. Example identifiers are:
+
+ ax54tv
+ cs00034
+
+ This method was often used by large timesharing systems. While it
+ achieved the uniqueness property, there was no way of guessing the
+ identifier without knowing it through other means.
+
+ 2) The identifiers are assigned in a unique manner where the bulk of
+ the identifier is algorithmically derived from the individual's name.
+ Example identifers are:
+
+ Craig.A.Finseth-1
+ Finseth1
+ caf-1
+ fins0001
+
+ 3) The identifiers are in general not assigned in a unique manner:
+ the identifier is algorithmically derived from the individual's name
+
+
+
+Finseth [Page 1]
+
+RFC 1439 Uniqueness of Unique Identifiers March 1993
+
+
+ and duplicates are handled in an ad-hoc manner. Example identifiers
+ are:
+
+ Craig.Finseth
+ caf
+
+ Now that we have widespread electronic mail, an important feature of
+ an identifier system is the ability to predict the identifier based
+ on other information associated with the individual. This other
+ information is typically the person's name.
+
+ Methods two and three make such predictions possible, especially if
+ you have one example mapping from a person's name to the identifier.
+ Method two relies on using some or all of the name and
+ algorithmically varying it to ensure uniqueness (for example, by
+ appending an integer). Method three relies on using some or all of
+ the name and selects an alternate identifier in the case of a
+ duplication.
+
+ For both methods, it is important to minimize the need for making the
+ adjustments required to ensure uniqueness (i.e., an integer that is
+ not 1 or an alternate identifier). The probability that an
+ adjustment will be required depends on the format of the identifer
+ and the size of the organization.
+
+2. Identifier Formats
+
+ There are a number of popular identifier formats. This section will
+ list some of them and supply both typical and maximum values for the
+ number of possible identifiers. A "typical" value is the number that
+ you are likely to run into in real life. A "maximum" value is the
+ largest number of possible (without getting extreme about it) values.
+ All ranges are expressed as a number of bits.
+
+2.1 Initials
+
+ There are three popular formats based on initials: those with one,
+ two, or three letters. (The number of people with more than three
+ initials is assumed to be small.) Values:
+
+ format typical maximum
+
+ I 4 5
+ II 8 10
+ III 12 15
+
+
+
+
+
+
+Finseth [Page 2]
+
+RFC 1439 Uniqueness of Unique Identifiers March 1993
+
+
+ You can also think of these as first, middle, and last initials:
+
+ I 4 5
+ F L 8 10
+ F M L 12 15
+
+2.2 Names
+
+ Again, there are three popular formats based on using names: those
+ with the first name, last name, and both first and last names.
+ Values:
+
+ format typical maximum
+
+ First 8 14
+ Last 9 13
+ First Last 17 27
+
+2.3 Combinations
+
+ I have seen these combinations in use ("F" is first initial, "M" is
+ middle initial, and "L" is last initial):
+
+ format typical maximum
+
+ F Last 13 18
+ F M Last 17 23
+ First L 12 19
+ First M Last 21 32
+
+2.4 Complete List
+
+ Here are all possible combinations of nothing, initial, and full name
+ for first, middle, and last. The number of Middle names is assumed
+ to be the same as the number of First names. Values:
+
+ format typical maximum
+
+ _ _ _ 0 0
+ _ _ L 4 5
+ _ _ Last 9 13
+
+ _ M _ 4 5
+ _ M L 5 10
+ _ M Last 13 18
+
+ _ Middle _ 8 14
+ _ Middle L 12 19
+
+
+
+Finseth [Page 3]
+
+RFC 1439 Uniqueness of Unique Identifiers March 1993
+
+
+ _ Middle Last 17 27
+
+ F _ _ 4 5
+ F _ L 5 10
+ F _ Last 13 18
+
+ F M _ 5 10
+ F M L 12 15
+ F M Last 17 23
+
+ F Middle _ 12 19
+ F Middle L 16 24
+ F Middle Last 21 32
+
+ First _ _ 8 14
+ First _ L 12 19
+ First _ Last 17 27
+
+ First M _ 12 19
+ First M L 16 24
+ First M Last 21 32
+
+ First Middle _ 16 28
+ First Middle L 20 33
+ First Middle Last 26 40
+
+3. Probabilities of Duplicates
+
+ As can be seen, the information content in these identifiers in no
+ case exceeds 40 bits and the typical information content never
+ exceeds 26 bits. The content of most of them is in the 8 to 20 bit
+ range. Duplicates are thus not only possible but likely.
+
+ The method used to compute the probability of duplicates is the same
+ as that of the well-known "birthday" problem. For a universe of N
+ items, the probability of duplicates in X members is expressed by:
+
+ N N-1 N-2 N-(X-1)
+ - x --- x --- x ... x -------
+ N N N N
+
+ A program to compute this function for selected values of N is given
+ in the appendix, as is its complete output.
+
+ The "1%" column is the number of items (people) before an
+ organization of that (universe) size has a 1% chance of a duplicate.
+ Similarly for 2%, 5%, 10%, and 20%.
+
+
+
+
+Finseth [Page 4]
+
+RFC 1439 Uniqueness of Unique Identifiers March 1993
+
+
+ bits universe 1% 2% 5% 10% 20%
+
+ 6 64 2 3 4 5 6
+ 7 128 3 3 5 6 8
+ 8 256 3 4 6 8 12
+ 9 512 4 6 8 11 16
+ 10 1,024 6 7 11 16 22
+ 11 2,048 7 10 15 22 31
+ 12 4,096 10 14 21 30 44
+ 13 8,192 14 19 30 43 61
+ 14 16,384 19 27 42 60 86
+ 15 32,768 27 37 59 84 122
+ 16 65,536 37 52 83 118 172
+ 17 131,072 52 74 117 167 243
+ 18 262,144 74 104 165 236 343
+ 19 524,288 104 147 233 333 485
+ 20 1,048,576 146 207 329 471 685
+ 21 2,097,152 206 292 465 666 968
+ 22 4,194,304 291 413 657 941 1369
+ 23 8,388,608 412 583 929 1330 1936
+ 24 16,777,216 582 824 1313 1881 2737
+ 25 33,554,432 822 1165 1856 2660 3871
+ 26 67,108,864 1162 1648 2625 3761 5474
+ 27 134,217,728 1644 2330 3712 5319 7740
+ 28 268,435,456 2324 3294 5249 7522 10946
+ 29 536,870,912 3286 4659 7422 10637 15480
+ 30 1,073,741,824 4647 6588 10496 15043 21891
+ 31 2,147,483,648 6571 9316 14844 21273 30959
+
+ For example, assume an organization were to select the "First Last"
+ form. This form has 17 bits (typical) and 27 bits (maximum) of
+ information. The relevant line is:
+
+ 17 131,072 52 74 117 167 243
+
+ For an organization with 100 people, the probability of a duplicate
+ would be between 2% and 5% (probably around 4%). If the organization
+ had 1,000 people, the probability of a duplicate would be much
+ greater than 20%.
+
+Appendix: Reuse of Identifiers and Privacy Issues
+
+ Let's say that an organization were to select the format:
+
+ First.M.Last-#
+
+ as my own organization has. Is the -# required, or can one simply
+ do:
+
+
+
+Finseth [Page 5]
+
+RFC 1439 Uniqueness of Unique Identifiers March 1993
+
+
+ Craig.A.Finseth
+
+ for the first one and
+
+ Craig.A.Finseth-2
+
+ (or -1) for the second? The answer is "no," although for non-obvious
+ reasons.
+
+ Assume that the organization has made this selection and a third
+ party wants to send e-mail to Craig.A.Finseth. Because of the
+ Electronic Communications Privacy Act of 1987, an organization must
+ treat electronic mail with care. In this case, there is no way for
+ the third party user to reliably know that sending to Craig.A.Finseth
+ is (may be) the wrong party. On the other hand, if the -# suffix is
+ always present and attempts to send mail to the non-suffix form are
+ rejected, the third party user will realize that they must have the
+ suffix in order to have a unique identifier.
+
+ For similar reasons, identifiers in this form should not be re-used
+ in the life of the mail system.
+
+Appendix: Perl Program to Compute Probabilities
+
+ #!/usr/local/bin/perl
+
+ for $bits (6..31) {
+ &Compute($bits);
+ }
+ exit(0);
+
+ # ------------------------------------------------------------
+
+ sub Compute {
+ $bits = $_[0];
+ $num = 1 << $bits;
+ $cnt = $num;
+
+ print "bits $bitsnumber $num:0;
+
+ for ($prob = 1; $prob > 0.99; ) {
+ $prob *= $cnt / $num;
+ $cnt--;
+ }
+
+ print "", $num - $cnt, "$prob0;
+
+ for (; $prob > 0.98; ) {
+
+
+
+Finseth [Page 6]
+
+RFC 1439 Uniqueness of Unique Identifiers March 1993
+
+
+ $prob *= $cnt / $num;
+ $cnt--;
+ }
+ print "", $num - $cnt, "$prob0;
+
+ for (; $prob > 0.95; ) {
+ $prob *= $cnt / $num;
+ $cnt--;
+ }
+ print "", $num - $cnt, "$prob0;
+
+ for (; $prob > 0.90; ) {
+ $prob *= $cnt / $num;
+ $cnt--;
+ }
+ print "", $num - $cnt, "$prob0;
+
+ for (; $prob > 0.80; ) {
+ $prob *= $cnt / $num;
+ $cnt--;
+ }
+ print "", $num - $cnt, "$prob0;
+
+ print "0;
+ }
+
+Appendix: Perl Program Output
+
+ bits 6 number 64:
+ 2 0.984375
+ 3 0.95361328125
+ 4 0.90891265869140625
+ 5 0.85210561752319335938
+ 6 0.78553486615419387817
+
+ bits 7 number 128:
+ 3 0.9766845703125
+ 3 0.9766845703125
+ 5 0.92398747801780700684
+ 6 0.88789421715773642063
+ 8 0.79999355674331695809
+
+ bits 8 number 256:
+ 3 0.988311767578125
+ 4 0.97672998905181884766
+ 6 0.94268989971169503406
+ 8 0.89542306910786462204
+ 12 0.76969425214152431547
+
+
+
+Finseth [Page 7]
+
+RFC 1439 Uniqueness of Unique Identifiers March 1993
+
+
+ bits 9 number 512:
+ 4 0.98832316696643829346
+ 6 0.97102570187075798458
+ 8 0.94652632751096643648
+ 11 0.89748056780293572476
+ 16 0.78916761796439427457
+
+ bits 10 number 1024:
+ 6 0.98543241551841020964
+ 7 0.97965839745873206645
+ 11 0.94753115178840541244
+ 16 0.88888866335604777014
+ 22 0.79677613655632184564
+
+ bits 11 number 2048:
+ 7 0.98978773152834598203
+ 10 0.97823367137821537476
+ 15 0.94990722378677450166
+ 22 0.89298119682681720288
+ 31 0.79597589885472519455
+
+ bits 12 number 4096:
+ 10 0.98906539062491305447
+ 14 0.97800426773009718762
+ 21 0.94994111694430838355
+ 30 0.89901365764115603874
+ 44 0.79312138620093930452
+
+ bits 13 number 8192:
+ 14 0.98894703242829806733
+ 19 0.97932692503837115439
+ 30 0.94822407309193512681
+ 43 0.89545741661906652631
+ 61 0.7993625840767998314
+
+ bits 14 number 16384:
+ 19 0.98961337517641645434
+ 27 0.97879319536756481668
+ 42 0.94876352395820107155
+ 60 0.89748107890372830209
+ 86 0.79973683158771624591
+
+ bits 15 number 32768:
+ 27 0.98934263776790121181
+ 37 0.97987304880641035165
+ 59 0.94909471808051404373
+ 84 0.89899774209805793923
+ 122 0.79809378598190949816
+
+
+
+Finseth [Page 8]
+
+RFC 1439 Uniqueness of Unique Identifiers March 1993
+
+
+ bits 16 number 65536:
+ 37 0.98988724065590050216
+ 52 0.97996496661944154649
+ 83 0.94937874420413270737
+ 118 0.89996948010355670711
+ 172 0.79884228150816105618
+
+ bits 17 number 131072:
+ 52 0.98993311138884398925
+ 74 0.97960010416289267088
+ 117 0.94952974978505377823
+ 167 0.89960828942716541956
+ 243 0.79894309171178368167
+
+ bits 18 number 262144:
+ 74 0.98974844864797828503
+ 104 0.97977315557223210174
+ 165 0.94968621078621640041
+ 236 0.8995926348279144058
+ 343 0.7994422793765953994
+
+ bits 19 number 524288:
+ 104 0.98983557888923057178
+ 147 0.97973841652874515962
+ 233 0.94974719445364064185
+ 333 0.89991342619657743729
+ 485 0.79936749144148444568
+
+ bits 20 number 1048576:
+ 146 0.98995567500195758015
+ 207 0.97987072919607220989
+ 329 0.94983990872655321702
+ 471 0.89980857451706741656
+ 685 0.79974215234216872172
+
+ bits 21 number 2097152:
+ 206 0.98998177463778547214
+ 292 0.97994400939715686771
+ 465 0.94985589918092261374
+ 666 0.89978055267663470396
+ 968 0.79994886751736571373
+
+ bits 22 number 4194304:
+ 291 0.98999013137747737812
+ 413 0.97991951242142538714
+ 657 0.94991674892578203959
+ 941 0.89991652739633254399
+ 1369 0.79989205747440361716
+
+
+
+Finseth [Page 9]
+
+RFC 1439 Uniqueness of Unique Identifiers March 1993
+
+
+ bits 23 number 8388608:
+ 412 0.98995762604049764022
+ 583 0.97997846530691334888
+ 929 0.94991024716640248826
+ 1330 0.89999961063320443877
+ 1936 0.79987028265451087794
+
+ bits 24 number 16777216:
+ 582 0.98997307486745211857
+ 824 0.97999203469417239809
+ 1313 0.94995516684099989835
+ 1881 0.89997049960675035152
+ 2737 0.79996700222056416063
+
+ bits 25 number 33554432:
+ 822 0.98999408609360783906
+ 1165 0.9799956928177964155
+ 1856 0.9499899669674316538
+ 2660 0.8999664414095410736
+ 3871 0.79992328289672998132
+
+ bits 26 number 67108864:
+ 1162 0.98999884535478044345
+ 1648 0.9799801637652703068
+ 2625 0.94997437525354821997
+ 3761 0.89999748465616635773
+ 5474 0.79993922903192515861
+
+ bits 27 number 134217728:
+ 1644 0.9899880636014986024
+ 2330 0.97998730103356856969
+ 3712 0.94997727934463771504
+ 5319 0.89998552434244594167
+ 7740 0.79999591580103557309
+
+ bits 28 number 268435456:
+ 2324 0.98999458855588851058
+ 3294 0.97999828329325222587
+ 5249 0.94998397932368705554
+ 7522 0.89998576049206902017
+ 10946 0.79999058777500076101
+
+ bits 29 number 536870912:
+ 3286 0.98999717306002099626
+ 4659 0.97999160965267329004
+ 7422 0.94999720388831232487
+ 10637 0.89999506567702891591
+ 15480 0.7999860979665908145
+
+
+
+Finseth [Page 10]
+
+RFC 1439 Uniqueness of Unique Identifiers March 1993
+
+
+ bits 30 number 1073741824:
+ 4647 0.98999674474047760775
+ 6588 0.97999531736215383937
+ 10496 0.94999806770951356061
+ 15043 0.89999250738244507275
+ 21891 0.79999995570982085358
+
+ bits 31 number 2147483648:
+ 6571 0.98999869761078929109
+ 9316 0.97999801528523688976
+ 14844 0.94999403283519279206
+ 21273 0.89999983631135749285
+ 30959 0.79999272222201334159
+
+References
+
+ Bruce Lansky (1984). The Best Baby Name Book. Deephaven, MN:
+ Meadowbrook. ISBN 0-671-54463-2.
+
+ Lareina Rule (1988). Name Your Baby. Bantam. ISBN 0-553-27145-8.
+
+Security Considerations
+
+ Security issues are not discussed in this memo.
+
+Author's Address
+
+ Craig A. Finseth
+ Networking Services
+ Computer and Information Services
+ University of Minnesota
+ 130 Lind Hall
+ 207 Church St. SE
+ Minneapolis, MN 55455-0134
+
+ EMail: Craig.A.Finseth-1@umn.edu or
+ fin@unet.umn.edu
+
+ Phone: +1 612 624 3375
+ Fax: +1 612 626 1002
+
+
+
+
+
+
+
+
+
+
+
+Finseth [Page 11]
+ \ No newline at end of file