Network Working Group                                     T. Berners-Lee
Request for Comments: 1630                                          CERN
Category: Informational                                        June 1994


                 Universal Resource Identifiers in WWW

                A Unifying Syntax for the Expression of
             Names and Addresses of Objects on the Network
                     as used in the World-Wide Web

Status of this Memo

   This memo provides information for the Internet community.  This memo
   does not specify an Internet standard of any kind.  Distribution of
   this memo is unlimited.

IESG Note:

   Note that the work contained in this memo does not describe an
   Internet standard.  An Internet standard for general Resource
   Identifiers is under development within the IETF.

Introduction

   This document defines the syntax used by the World-Wide Web
   initiative to encode the names and addresses of objects on the
   Internet.  The web is considered to include objects accessed using an
   extendable number of protocols, existing, invented for the web
   itself, or to be invented in the future.  Access instructions for an
   individual object under a given protocol are encoded into forms of
   address string.  Other protocols allow the use of object names of
   various forms.  In order to abstract the idea of a generic object,
   the web needs the concepts of the universal set of objects, and of
   the universal set of names or addresses of objects.

   A Universal Resource Identifier (URI) is a member of this universal
   set of names in registered name spaces and addresses referring to
   registered protocols or name spaces.  A Uniform Resource Locator
   (URL), defined elsewhere, is a form of URI which expresses an address
   which maps onto an access algorithm using network protocols. Existing
   URI schemes which correspond to the (still mutating) concept of IETF
   URLs are listed here. The Uniform Resource Name (URN) debate attempts
   to define a name space (and presumably resolution protocols) for
   persistent object names. This area is not addressed by this document,
   which is written in order to document existing practice and provide a
   reference point for URL and URN discussions.




Berners-Lee                                                     [Page 1]
^L
RFC 1630                      URIs in WWW                      June 1994


   The world-wide web protocols are discussed on the mailing list
   www-talk-request@info.cern.ch, and the newsgroup comp.infosystems.www
   is preferable for beginners' questions.  The mailing list
   uri-request@bunyip.com has discussion related particularly to the URI
   issue.  The author may be contacted as timbl@info.cern.ch.

   This document is available in hypertext form at:

   http://info.cern.ch/hypertext/WWW/Addressing/URL/URI_Overview.html

The Need For a Universal Syntax

   This section describes the concept of the URI and does not form part
   of the specification.

   Many protocols and systems for document search and retrieval are
   currently in use, and many more protocols or refinements of existing
   protocols are to be expected in a field whose expansion is explosive.

   These systems are aiming to achieve global search and readership of
   documents across differing computing platforms, and despite a
   plethora of protocols and data formats.  As protocols evolve,
   gateways can allow global access to remain possible. As data formats
   evolve, format conversion programs can preserve global access.  There
   is one area, however, in which it is impractical to make conversions,
   and that is in the names and addresses used to identify objects.
   This is because names and addresses of objects are passed on in so
   many ways, from the backs of envelopes to hypertext objects, and may
   have a long life.

   A common feature of almost all the data models of past and proposed
   systems is something which can be mapped onto a concept of "object"
   and some kind of name, address, or identifier for that object.  One
   can therefore define a set of name spaces in which these objects can
   be said to exist.

   Practical systems need to access and mix objects which are part of
   different existing and proposed systems.  Therefore, the concept of
   the universal set of all objects, and hence the universal set of
   names and addresses, in all name spaces, becomes important.  This
   allows names in different spaces to be treated in a common way, even
   though names in different spaces have differing characteristics, as
   do the objects to which they refer.

   URIs

      This document defines a way to encapsulate a name in any
      registered name space, and label it with the name space,
      producing a member of the universal set.  Such an encoded and
      labelled member of this set is known as a Universal Resource
      Identifier, or URI.

      The universal syntax allows access of objects available using
      existing protocols, and may be extended with technology.

      The specification of the URI syntax does not imply anything about
      the properties of names and addresses in the various name spaces
      which are mapped onto the set of URI strings.  The properties
      follow from the specifications of the protocols and the associated
      usage conventions for each scheme.

   URLs

      For existing Internet access protocols, it is necessary in most
      cases to define the encoding of the access algorithm into
      something concise enough to be termed an address.  URIs which
      refer to objects accessed with existing protocols are known as
      "Uniform Resource Locators" (URLs) and are listed here as used in
      WWW, but are to be formally defined in a separate document.

   URNs

      There is currently a drive to define a space of more persistent
      names than any URLs.  These "Uniform Resource Names" are the
      subject of an IETF working group's discussions.  (See Sollins and
      Masinter, Functional Specifications for URNs, circulated
      informally.)

      The URI syntax and URL forms have been in widespread use by
      World-Wide Web software since 1990.

Design Criteria and Choices

   This section is not part of the specification: it is simply an
   explanation of the way in which the specification was derived.

   Design criteria

      The syntax was designed to be:

      Extensible              New naming schemes may be added later.

      Complete                It is possible to encode any naming
                              scheme.

      Printable               It is possible to express any URI using
                              7-bit ASCII characters so that URIs may,
                              if necessary, be passed using pen and ink.

   Choices for a universal syntax

      For the syntax itself there is little choice except for the order
      and punctuation of the elements, and the acceptable characters and
      escaping rules.

      The extensibility requirement is met by allowing an arbitrary (but
      registered) string to be used as a prefix.  A prefix is chosen as
      left to right parsing is more common than right to left.  The
      choice of a colon as separator of the prefix from the rest of the
      URI was arbitrary.

      The decoding of the rest of the string is defined as a function of
      the prefix.  New prefixes are introduced for new schemes as
      necessary, in agreement with the registration authority.  The
      registration of a new scheme clearly requires the definition of
      the decoding of the URI into a given name space, and a definition
      of the properties and, where applicable, resolution protocols, for
      the name space.

      The completeness requirement is easily met by allowing
      particularly strange or plain binary names to be encoded in base
      16 or 64 using the acceptable characters.

      The printability requirement could have been met by requiring all
      schemes to encode characters not part of a basic set.  This led to
      many discussions of what the basic set should be.  A difficult
      case, for example, is when an ISO Latin-1 string appears in a URL,
      and within an application with ISO Latin-1 capability, it can be
      handled intact.  However, for transport in general, the non-ASCII
      characters need to be escaped.

      The solution to this was to specify a safe set of characters, and
      a general escaping scheme which may be used for encoding "unsafe"
      characters.  This "safe" set is suitable, for example, for use in
      electronic mail.  This is the canonical form of a URI.

      The choice of escape character for introducing representations of
      non-allowed characters also tends to be a matter of taste.  An
      ANSI standard exists in the C language, using the back-slash
      character "\".  The use of this character on unix command lines,
      however, can be a problem as it is interpreted by many shell
      programs, and would have itself to be escaped.  It is also a
      character which is not available on certain keyboards.  The equals
      sign is commonly used in the encoding of names having
      attribute=value pairs.  The percent sign was eventually chosen as
      a suitable escape character.

      There is a conflict between the need to be able to represent many
      characters including spaces within a URI directly, and the need to
      be able to use a URI in environments which have limited character
      sets or in which certain characters are prone to corruption.  This
      conflict has been resolved by use of a hexadecimal escaping
      method which may be applied to any characters forbidden in a given
      context.  When URLs are moved between contexts, the set of
      characters escaped may be enlarged or reduced unambiguously.

      The use of white space characters is risky in URIs to be printed
      or sent by electronic mail, and the use of multiple white space
      characters is very risky.  This is because of the frequent
      introduction of extraneous white space when lines are wrapped by
      systems such as mail, or sheer necessity of narrow column width,
      and because of the inter-conversion of various forms of white
      space which occurs during character code conversion and the
      transfer of text between applications.  This is why the canonical
      form for URIs has all white spaces encoded.

Recommendations

   This section describes the syntax for URIs as used in the World-Wide
   Web initiative.  The generic syntax provides a framework for new
   schemes for names to be resolved using as yet undefined protocols.

URI syntax

   A complete URI consists of a naming scheme specifier followed by a
   string whose format is a function of the naming scheme.  For locators
   of information on the Internet, a common syntax is used for the IP
   address part. A BNF description of the URL syntax is given in a
   later section. The components are as follows.  Fragment identifiers
   and relative URIs are not involved in the basic URL definition.

   SCHEME

      Within the URI of an object, the first element is the name of the
      scheme, separated from the rest of the URI by a colon.

   PATH

      The rest of the URI follows the colon in a format depending on the
      scheme. The path is interpreted in a manner dependent on the
      protocol being used.  However, when it contains slashes, these
      must imply a hierarchical structure.
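
      As an illustration only, the following Python sketch splits a
      complete URI into the scheme and the scheme-dependent remainder
      at the first colon, and breaks a path into segments at its
      slashes.  The function names split_scheme and path_segments are
      assumptions made for this example and are not part of the syntax.

```python
def split_scheme(uri):
    """Return (scheme, rest) for a complete URI, split at the first colon."""
    scheme, colon, rest = uri.partition(":")
    if not colon or not scheme:
        raise ValueError("not a complete URI, no scheme prefix: %r" % uri)
    return scheme, rest


def path_segments(path):
    """Split a path at its slashes; segments further left are nearer the root."""
    return path.split("/")


scheme, rest = split_scheme("http://info.cern.ch/albert/bertram/marie-claude")
print(scheme)  # http
print(rest)    # //info.cern.ch/albert/bertram/marie-claude
print(path_segments("albert/bertram/marie-claude"))
```

      How the remainder is interpreted is left to the scheme; the
      sketch only performs the split that is common to all schemes.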

Reserved characters

   The path in the URI has a significance defined by the particular
   scheme.  Typically, it is used to encode a name in a given name
   space, or an algorithm for accessing an object.  In either case, the
   encoding may use those characters allowed by the BNF syntax, or
   hexadecimal encoding of other characters.

   Some of the reserved characters have special uses as defined here.

   THE PERCENT SIGN

      The percent sign ("%", ASCII 25 hex) is used as the escape
      character in the encoding scheme and is never allowed for anything
      else.

   HIERARCHICAL FORMS

      The slash ("/", ASCII 2F hex) character is reserved for the
      delimiting of substrings whose relationship is hierarchical.  This
      enables partial forms of the URI.  Substrings consisting of single
      or double dots ("." or "..") are similarly reserved.

      The significance of the slash between two segments is that the
      segment of the path to the left is more significant than the
      segment of the path to the right.  ("Significance" in this case
      refers solely to closeness to the root of the hierarchical
      structure and makes no value judgement!)







Berners-Lee                                                     [Page 6]
^L
RFC 1630                      URIs in WWW                      June 1994


      Note

         The similarity to unix and other disk operating system filename
         conventions should be taken as purely coincidental, and should
         not be taken to indicate that URIs should be interpreted as
         file names.

   HASH FOR FRAGMENT IDENTIFIERS

      The hash ("#", ASCII 23 hex) character is reserved as a delimiter
      to separate the URI of an object from a fragment identifier.

   QUERY STRINGS

      The question mark ("?", ASCII 3F hex) is used to delimit the
      boundary between the URI of a queryable object, and a set of words
      used to express a query on that object.  When this form is used,
      the combined URI stands for the object which results from the
      query being applied to the original object.

      Within the query string, the plus sign is reserved as shorthand
      notation for a space.  Therefore, real plus signs must be encoded.
      This method was used to make query URIs easier to pass in systems
      which did not allow spaces.
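
      The plus-sign convention can be sketched in Python as follows.
      This is a minimal sketch that handles only the space, the plus
      sign and the percent sign; the names encode_query and
      decode_query are assumptions made for the example, and other
      unsafe characters would still need the general escaping scheme.

```python
import re


def encode_query(text):
    """Encode free text for use after the "?" of a query URI."""
    # Escape "%" first so the escapes added afterwards are not escaped again.
    text = text.replace("%", "%25")
    # Real plus signs must be encoded, because "+" will stand for a space.
    text = text.replace("+", "%2B")
    return text.replace(" ", "+")


def decode_query(query):
    """Reverse encode_query: "+" becomes a space, then %XX is decoded."""
    spaced = query.replace("+", " ")
    return re.sub(r"%([0-9A-Fa-f]{2})",
                  lambda m: chr(int(m.group(1), 16)), spaced)


query = encode_query("C++ and 100% more")
print(query)                # C%2B%2B+and+100%25+more
print(decode_query(query))  # C++ and 100% more
```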

      The query string represents some operation applied to the object,
      but this specification gives no common syntax or semantics for it.
      In practice the syntax and semantics may depend on the scheme and
      may even depend on the base URI.

   OTHER RESERVED CHARACTERS

      The asterisk ("*", ASCII 2A hex) and exclamation mark ("!", ASCII
      21 hex) are reserved for use as having special significance within
      specific schemes.

Unsafe characters

   In canonical form, certain characters such as spaces, control
   characters, some characters whose ASCII code is used differently in
   different national character variant 7-bit sets, and all 8-bit
   characters beyond DEL (7F hex) of the ISO Latin-1 set, shall not be
   used unencoded. This is a recommendation for trouble-free
   interchange, and as indicated below, the encoded set may be extended
   or reduced.

Encoding reserved characters

   When a system uses a local addressing scheme, it is useful to provide
   a mapping from local addresses into URIs so that references to
   objects within the addressing scheme may be referred to globally, and
   possibly accessed through gateway servers.

   For a new naming scheme, any mapping scheme may be defined provided
   it is unambiguous, reversible, and provides valid URIs.  It is
   recommended that where hierarchical aspects to the local naming
   scheme exist, they be mapped onto the hierarchical URL path syntax in
   order to allow the partial form to be used.

   It is also recommended that the conventional scheme below be used in
   all cases except for any scheme which encodes binary data as opposed
   to text, in which case a more compact encoding such as pure
   hexadecimal or base 64 might be more appropriate.  For example, the
   conventional URI encoding method is used for mapping WAIS, FTP,
   Prospero and Gopher addresses in the URI specification.

   CONVENTIONAL URI ENCODING SCHEME

      Where the local naming scheme uses ASCII characters which are not
      allowed in the URI, these may be represented in the URL by a
      percent sign "%" immediately followed by two hexadecimal digits
      (0-9, A-F) giving the ISO Latin 1 code for that character.
      Character codes other than those allowed by the syntax shall not
      be used unencoded in a URI.
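
      A minimal Python sketch of this encoding follows.  The set
      ASSUMED_ALLOWED is a deliberately small placeholder chosen for
      the example; the characters actually allowed are those given by
      the BNF syntax.  The function names escape and unescape are
      likewise assumptions.  The sketch treats its input as a single
      name, so a slash in the input is escaped and carries no
      hierarchical significance.

```python
import re
import string

# Assumption for illustration: a small set of characters left unencoded.
# The real set is the one allowed by the BNF syntax.
ASSUMED_ALLOWED = frozenset(string.ascii_letters + string.digits + "-_.")

_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")
_BAD_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def escape(text, allowed=ASSUMED_ALLOWED):
    """Replace each character outside `allowed` by "%" and two hex digits."""
    out = []
    for ch in text:
        if ch in allowed:
            out.append(ch)
        else:
            # encode() raises UnicodeEncodeError outside ISO Latin-1.
            out.append("%{:02X}".format(ch.encode("latin-1")[0]))
    return "".join(out)


def unescape(text):
    """Decode every %XX escape in one pass; reject any other use of "%"."""
    if _BAD_PERCENT.search(text):
        raise ValueError("percent sign not followed by two hex digits")
    return _ESCAPE.sub(
        lambda m: bytes([int(m.group(1), 16)]).decode("latin-1"), text)


print(escape("marie claude"))      # marie%20claude
print(unescape("marie%2Dclaude"))  # marie-claude
```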

   REDUCED OR INCREASED SAFE CHARACTER SETS

      The same encoding method may be used for encoding characters whose
      use, although technically allowed in a URI, would be unwise due to
      problems of corruption by imperfect gateways or misrepresentation
      due to the use of variant character sets, or which would simply be
      awkward in a given environment.  Because a % sign always indicates
      an encoded character, a URI may be made "safer" simply by encoding
      any characters considered unsafe, while leaving already encoded
      characters still encoded.  Similarly, in cases where a larger set
      of characters is acceptable, % signs can be selectively and
      reversibly expanded.

      Before two URIs can be compared, it is therefore necessary to
      bring them to the same encoding level.

      However, the reserved characters mentioned above have a quite
      different significance when encoded, and so may NEVER be encoded
      and unencoded in this way.

      The percent sign intended as such must always be encoded, as its
      presence otherwise always indicates an encoding.  Sequences which
      start with a percent sign but are not followed by two hexadecimal
      characters are reserved for future extension.  (See Example 3.)
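
      Making a URI "safer" in this sense can be sketched in Python as
      follows.  The name make_safer and the choice of which characters
      to pass as unsafe are assumptions made for the example.  The
      RESERVED set lists only the reserved characters named in this
      document (percent, slash, hash, question mark, asterisk and
      exclamation mark), which the sketch refuses to re-encode.

```python
# Reserved characters named in this document; they must not be encoded
# or decoded when the encoding level of a URI is changed.
RESERVED = frozenset("%/#?*!")


def make_safer(uri, unsafe):
    """Encode each character of `unsafe`, leaving existing escapes alone."""
    unsafe = frozenset(unsafe)
    clash = unsafe & RESERVED
    if clash:
        raise ValueError(
            "reserved characters cannot be re-encoded: %r" % sorted(clash))
    # "%" is never in `unsafe`, so escapes already present stay encoded.
    return "".join(
        "%{:02X}".format(ch.encode("latin-1")[0]) if ch in unsafe else ch
        for ch in uri)


print(make_safer("http://info.cern.ch/a b%2Fc~d", " ~"))
# http://info.cern.ch/a%20b%2Fc%7Ed
```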

   Example 1

   The URIs

                http://info.cern.ch/albert/bertram/marie-claude

   and

                http://info.cern.ch/albert/bertram/marie%2Dclaude

   are identical, as the %2D encodes a hyphen character.

   Example 2

   The URIs

                http://info.cern.ch/albert/bertram/marie-claude

   and

                http://info.cern.ch/albert/bertram%2Fmarie-claude

   are NOT identical, as in the second case the encoded slash does not
   have hierarchical significance.

   Example 3

   The URIs

                fxqn:/us/va/reston/cnri/ietf/24/asdf%*.fred

   and

                news:12345667123%asdghfh@info.cern.ch

   are illegal, as all % characters imply encodings, and there is no
   decoding defined for "%*"  or "%as" in this recommendation.
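
   The three examples can be checked with a short Python sketch.  It
   brings two URIs to the same encoding level by decoding only escapes
   for characters in ASSUMED_DECODABLE, a small set chosen for this
   example, and leaves every other escape encoded, so an encoded slash
   stays distinct from a real one.  The names normalize and same_uri
   are assumptions.  The sketch only decodes; it does not encode unsafe
   characters that appear unescaped, so it is a partial illustration
   rather than a complete comparison procedure.

```python
import re
import string

# Assumption for illustration: escapes for these characters may be
# decoded safely.  Reserved characters such as "/" are not in the set.
ASSUMED_DECODABLE = frozenset(string.ascii_letters + string.digits + "-_.")

_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")
_BAD_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def normalize(uri):
    """Decode escapes of decodable characters; keep all others encoded."""
    if _BAD_PERCENT.search(uri):
        raise ValueError("illegal URI, undefined % sequence: " + uri)

    def fix(match):
        ch = chr(int(match.group(1), 16))
        if ch in ASSUMED_DECODABLE:
            return ch
        return "%" + match.group(1).upper()

    return _ESCAPE.sub(fix, uri)


def same_uri(a, b):
    """Compare two URIs after bringing them to the same encoding level."""
    return normalize(a) == normalize(b)


base = "http://info.cern.ch/albert/bertram/marie-claude"
print(same_uri(base, "http://info.cern.ch/albert/bertram/marie%2Dclaude"))
# True
print(same_uri(base, "http://info.cern.ch/albert/bertram%2Fmarie-claude"))
# False
```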

Partial (relative) form

   Within an object whose URI is well defined, the URI of another object
   may be given in abbreviated form, where parts of the two URIs are the
   same. This allows objects within a group to refer to each other
   without requiring the space for a complete reference, and it
   incidentally allows the group of objects to be moved without changing
   any references.  It must be emphasized that when a reference is
   passed in anything other than a well controlled context, the full
   form must always be used.

   In the World-Wide Web applications, the context URI is that of the
   document or object containing a reference. In this case partial URIs
   can be generated in virtual objects or stored in real objects,
   without the need for dramatic change if the higher-order parts of a
   hierarchical naming system are modified.  Apart from terseness, this
   gives greater robustness to practical systems, by enabling
   information hiding between system components.

   The partial form relies on a property of the URI syntax that certain
   characters ("/") and certain path elements ("..", ".") have a
   significance reserved for representing a hierarchical space, and must
   be recognized as such by both clients and servers.

   A partial form can be distinguished from an absolute form in that the
   latter must have a colon and that colon must occur before any slash
   characters. Systems not requiring partial forms should not use any
   unencoded slashes in their naming schemes.  If they do, absolute URIs
   will still work, but confusion may result. (See note on Gopher
   below.)
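
   The test for an absolute form described above can be sketched in
   Python as follows.  The function name is_absolute is an assumption
   made for the example; the sketch applies only the stated rule, that
   a colon must be present and must occur before any slash.

```python
def is_absolute(uri):
    """True if the URI has a colon that occurs before any slash."""
    colon = uri.find(":")
    slash = uri.find("/")
    return colon != -1 and (slash == -1 or colon < slash)


for candidate in ("magic://a/b/c//d/e/f", "g:h", "../g", "//g", "a/b:c"):
    print(candidate, is_absolute(candidate))
```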

   The rules for the use of a partial name relative to the URI of the
   context are:

      If the scheme parts are different, the whole absolute URI must
      be given.  Otherwise, the scheme is omitted, and:

      If the partial URI starts with a non-zero number of consecutive
      slashes, then everything from the context URI up to (but not
      including) the first occurrence of exactly the same number of
      consecutive slashes which has no greater number of consecutive
      slashes anywhere to the right of it is taken to be the same and
      so prepended to the partial URL to form the full URL. Otherwise:

      The last part of the path of the context URI (anything following
      the rightmost slash) is removed, and the given partial URI
      appended in its place, and then:

      Within the result, all occurrences of "xxx/../" or "/." are
      recursively removed, where xxx, ".." and "." are complete path
      elements.

      Note: Trailing slashes

   If a path of the context locator ends in slash, partial URIs are
   treated differently to the URI with the same path but without a
   trailing slash. The trailing slash indicates a void segment of the
   path.

      Note: Gopher

   The gopher system does not have the concept of relative URIs, and the
   gopher community currently allows / as data characters in gopher URIs
   without escaping them to %2F.  Relative forms may not in general be
   used for documents served by gopher servers.  If they are used, then
   WWW software assumes, normally correctly, that in fact they do have
   hierarchical significance despite the specifications. The use of HTTP
   rather than gopher protocol is however recommended.

   Examples

   In the context of URI

                        magic://a/b/c//d/e/f

   the partial URIs would expand as follows:

   g                       magic://a/b/c//d/e/g

   /g                      magic://a/g

   //g                     magic://g

   ../g                    magic://a/b/c//d/g

   g:h                     g:h

   and in the context of the URI

                           magic://a/b/c//d/e/

   the results would be exactly the same.
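
   The rules and worked examples above can be sketched in Python as
   follows.  This is a minimal sketch under stated assumptions, not a
   reference implementation.  The names resolve, is_absolute and
   remove_dot_segments are invented for the example.  For a partial URI
   with leading slashes, the sketch cuts the context at the first run
   of exactly the same number of slashes, which is what reproduces the
   "/g" and "//g" examples; it does not attempt the further condition
   about greater runs of slashes to the right, under which a literal
   reading appears to select a later slash for "/g".  Dot removal is
   applied only in the final rule, following the order in which the
   rules are written.

```python
import re

# "/." where "." is a complete path element.
_DOT = re.compile(r"/\.(?=/|$)")
# "xxx/../" where xxx is a complete element other than "." or "..".
_DOTDOT = re.compile(r"(?<![^/])(?!\.\.?/)[^/]+/\.\./")


def is_absolute(uri):
    """True if the URI has a colon that occurs before any slash."""
    colon = uri.find(":")
    slash = uri.find("/")
    return colon != -1 and (slash == -1 or colon < slash)


def remove_dot_segments(path):
    """Remove "/." and "xxx/../" repeatedly until nothing changes."""
    previous = None
    while previous != path:
        previous = path
        path = _DOT.sub("", path, count=1)
        path = _DOTDOT.sub("", path, count=1)
    return path


def resolve(context, partial):
    """Expand a partial URI relative to the URI of its context."""
    if is_absolute(partial):
        return partial  # a full URI is taken as given
    leading = len(partial) - len(partial.lstrip("/"))
    if leading:
        # Assumption: cut at the first run of exactly `leading` slashes.
        for run in re.finditer(r"/+", context):
            if len(run.group()) == leading:
                return context[:run.start()] + partial
        raise ValueError("context has no run of %d slashes" % leading)
    cut = context.rfind("/")
    if cut == -1:
        raise ValueError("context URI has no hierarchical path")
    # Drop everything after the rightmost slash, then append the partial.
    scheme, colon, rest = (context[:cut + 1] + partial).partition(":")
    return scheme + colon + remove_dot_segments(rest)


for partial in ("g", "/g", "//g", "../g", "g:h"):
    print(partial, resolve("magic://a/b/c//d/e/f", partial))
```

   Run against either context URI above, the loop prints the five
   expansions listed in the examples.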

Fragment-id

   This represents a part of, fragment of, or a sub-function within, an
   object.  Its syntax and semantics are defined by the application
   responsible for the object, or the specification of the content type
   of the object.  The only definition here is of the allowed characters
   by which it may be represented in a URL.

   Specific syntaxes for representing fragments in text documents by
   line and character range, or in graphics by coordinates, or in
   structured documents using ladders, are suitable for standardization
   but not defined here.

   The fragment-id follows the URL of the whole object from which it is
   separated by a hash sign (#).  If the fragment-id is void, the hash
   sign may be omitted: A void fragment-id with or without the hash sign
   means that the URL refers to the whole object.
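
   As a small illustration, the separation of a fragment-id from the
   URL of the whole object can be sketched in Python as follows.  The
   function name split_fragment and the example URL are hypothetical.
   A void fragment-id is returned as an empty string whether or not the
   hash sign is present.

```python
def split_fragment(url):
    """Return (url_of_whole_object, fragment_id), split at the first "#"."""
    whole, _hash, fragment = url.partition("#")
    return whole, fragment


print(split_fragment("http://info.cern.ch/doc#section2"))
print(split_fragment("http://info.cern.ch/doc#"))
print(split_fragment("http://info.cern.ch/doc"))
```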

   While this hook is allowed for identification of fragments, the
   question of addressing of parts of objects, or of the grouping of
   objects and relationship between contained and containing objects, is
   not addressed by this document.

   Fragment identifiers do NOT address the question of objects which are
   different versions of a "living" object, nor of expressing the
   relationships between different versions and the living object.

   There is no implication that a fragment identifier refers to anything
   which can be extracted as an object in its own right.  It may, for
   example, refer to an indivisible point within an object.

Specific Schemes

   The mapping for URIs onto some existing standard and experimental
   protocols is outlined in the BNF syntax definition.  Notes on
   particular protocols follow.  These URIs are frequently referred to
   as URLs, though the exact definition of the term URL is still under
   discussion (March 1993).  The schemes covered are:

   http                    Hypertext Transfer Protocol (examples)

   ftp                     File Transfer Protocol

   gopher                  Gopher protocol

   mailto                  Electronic mail address

   news                    Usenet news

   telnet, rlogin and tn3270
                           Reference to interactive sessions

   wais                    Wide Area Information Servers

   file                    Local file access




   The following schemes are proposed as essential to the unification of
   the web with electronic mail, but not currently (to the author's
   knowledge) implemented:

   mid                     Message identifiers for electronic mail

   cid                     Content identifiers for MIME body parts

   The schemes for X.500, network management database, and Whois++ have
   not been specified and may be the subject of further study.  Schemes
   for Prospero and for restricted NNTP use are not currently
   implemented as far as the author is aware.

   The "urn" prefix is reserved for use in encoding a Uniform Resource
   Name when that has been developed by the IETF working group.

   New schemes may be registered at a later time.

HTTP

   The HTTP protocol specifies that the path is handled transparently by
   those who handle URLs, except for the servers which de-reference
   them.  The path is passed by the client to the server with any
   request, but is not otherwise understood by the client.

   The host details are not passed on to the server when the URL is an
   HTTP URL which refers to the server in question.  In this case the
   string sent starts with the slash which follows the host details.
   However, when an HTTP server is being used as a gateway (or "proxy")
   then the entire URI, whether HTTP or some other scheme, is passed on
   the HTTP command line.  The search part, if present, is sent as part
   of the HTTP command, and may in this respect be treated as part of
   the path.  No fragmentid part of a WWW URI (the hash sign and
   following) is sent with the request.  Spaces and control characters
   in URLs must be escaped for transmission in HTTP, as must other
   disallowed characters.
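
   The rules above might be sketched in Python as follows.  This is an
   illustration under stated assumptions, not a definitive
   implementation: the function name request_uri and the via_proxy
   flag are hypothetical, and sending "/" when the URL has no path is
   an assumption of the sketch.

```python
from urllib.parse import urlsplit


def request_uri(url, via_proxy=False):
    """Return the string a client would send for `url`.

    The fragment-id is never sent.  A gateway ("proxy") receives the
    entire URI; otherwise the string starts with the slash which
    follows the host details, and includes any search part.
    """
    without_fragment = url.partition("#")[0]
    if via_proxy:
        return without_fragment
    parts = urlsplit(without_fragment)
    target = parts.path or "/"  # assumption: a void path is sent as "/"
    if parts.query:
        target += "?" + parts.query
    return target
```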

   EXAMPLES

      These examples are not part of the specification: they are
      provided as illustrations only.  The URI of the "welcome" page to a
      server is conventionally

         http://www.my.work.com/

      As the rest of the URL (after the hostname and port) is opaque
      to the client, it shows great variety but the following are all
      fairly typical.



         http://www.my.uni.edu/info/matriculation/enroling.html

         http://info.my.org/AboutUs/Phonebook

         http://www.library.my.town.va.us/Catalogue/76523471236%2Fwen44--4.98

         http://www.my.org/462F4F2D4241522A314159265358979323846

   A URL for a server on a port other than 80 looks like

        http://info.cern.ch:8000/imaginary/test

   A reference to a particular part of a document may, including the
   fragment identifier, look like

        http://www.myu.edu/org/admin/people#andy

   in which case the string "#andy" is not sent to the server, but is
   retained by the client and used when the whole object has been
   retrieved.

   A search on a text database might look like

        http://info.my.org/AboutUs/Index/Phonebook?dobbins

   and on another database

        http://info.cern.ch/RDB/EMP?*%20where%20name%3Ddobbins

   In all cases the client passes the path string to the server
   uninterpreted, and the client is not expected to deduce anything
   from it.
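
   The escaping of spaces and other disallowed characters in a search
   string might be sketched in Python as follows.  The function name
   escape_search is hypothetical; quote is from the Python standard
   library, and leaving "*" unescaped is a choice made for this
   sketch.

```python
from urllib.parse import quote


def escape_search(text):
    """Percent-escape a search string, leaving "*" as it is."""
    # quote() leaves letters, digits and "_.-~" alone and escapes the
    # rest, including space (%20), "=" (%3D) and "%" itself (%25).
    return quote(text, safe="*")
```

   With this sketch, the string "* where name=dobbins" is escaped as
   "*%20where%20name%3Ddobbins".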

FTP

   The ftp: prefix indicates that the FTP protocol is used, as defined
   in STD 9, RFC 959 or any successor.  The port number, if present,
   gives the port of the FTP server if not the FTP default.

   User name and password

      The syntax allows for the inclusion of a user name and even a
      password for those systems which do not use the anonymous FTP
      convention. The default, however, if no user or password is
      supplied, will be to use that convention, viz. that the user name
      is "anonymous" and the password the user's Internet-style mail
      address.





      Where possible, this mail address should correspond to a usable
      mail address for the user, and preferably give a DNS host name
      which resolves to the IP address of the client.  Note that servers
      currently vary in their treatment of the anonymous password.
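
      The default described above might be sketched in Python as
      follows.  The function name ftp_login is hypothetical, and the
      use of an empty password when a user name is given without one
      is an assumption of the sketch, not a rule of this document.

```python
from urllib.parse import unquote, urlsplit


def ftp_login(url, mail_address):
    """Return the (user, password) pair to use for an FTP URL.

    `mail_address` is the user's Internet-style mail address, used as
    the password under the anonymous FTP convention.
    """
    parts = urlsplit(url)
    if parts.username is None:
        return "anonymous", mail_address
    # Assumption: a user name without a password gives "".
    return unquote(parts.username), unquote(parts.password or "")
```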

   Path

      The FTP protocol allows for a sequence of CWD commands (change
      working directory) and a TYPE command prior to service commands
      such as RETR (retrieve) or NLST (name list) which actually access
      a file or directory.

      The arguments of any CWD commands are successive segment parts of
      the URL delimited by slash, and the final segment is suitable as
      the filename argument to the RETR command for retrieval or the
      directory argument to NLST.
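
      The mapping of path segments onto FTP commands might be sketched
      in Python as follows.  This is an illustration only: the function
      name ftp_commands is hypothetical, the name list command is
      spelled NLST as in RFC 959, a path ending in a slash is assumed
      to request a listing of the directory reached by the CWD
      commands, and any data type suffix is assumed to have been
      removed beforehand.

```python
from urllib.parse import unquote, urlsplit


def ftp_commands(url):
    """Translate the path of an FTP URL into a list of FTP commands."""
    path = urlsplit(url).path or "/"
    # Drop the empty string before the leading slash, then unescape.
    segments = [unquote(s) for s in path.split("/")[1:]]
    *directories, last = segments
    commands = ["CWD " + d for d in directories]
    if last:
        commands.append("RETR " + last)
    else:
        # Assumption: a trailing slash asks for a directory listing.
        commands.append("NLST")
    return commands
```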

      For some file systems (Unix in particular), the "/" used to denote
      the hierarchical structure of the URL corresponds to the delimiter
      used to construct a file name hierarchy, and thus, the filename
      will look the same as the URL path.  This does NOT mean that the
      URL is a Unix filename.

         Note: Retrieving subsequent URLs from the same host

      There is no common hierarchical model to the FTP protocol, so if a
      directory change command has been given, it is impossible in
      general to deduce what sequence should be given to navigate to
      another directory for a second retrieval, if the paths are
      different.  The only reliable algorithm is to disconnect and
      reestablish the control connection.

   Data type

      The data content type of a file can only, in the general FTP case,
      be deduced from the name, normally the suffix of the name.  This
      is not standardized. An alternative is for it to be transferred in
      information outside the URL.  A suitable FTP transfer type (for
      example binary "I" or text "A") must in turn be deduced from the
      data content type.  It is recommended that conventions for
      suffixes of public archives be established, but it is outside the
      scope of this standard.

      An FTP URL may optionally specify the FTP data transfer type by
      which an object is to be retrieved. Most of the methods correspond
      to the FTP "Data Types" ASCII and IMAGE for the retrieval of a
      document, as specified in FTP by the TYPE command.  One method
      indicates directory access.



      The data type is specified by a suffix to the URL.  Possible
      suffixes are:

       ;type = <type-code>     Use FTP type as given to perform data
                               transfer.

       /                       Use FTP directory list commands to read
                               directory

      The type code is in the format defined in RFC 959 except that THE
      SPACE IS OMITTED FROM THE URL.
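
      The suffix rules above might be sketched in Python as follows.
      The function name split_ftp_type is hypothetical, and the suffix
      is assumed to be written ";type=" with no spaces around the
      equals sign.  The sketch restores the space which RFC 959 places
      inside the type code.

```python
import re

# ftptype per the BNF: A formcode | E formcode | I | L digits
_FTPTYPE = re.compile(r"([AE])([NTC])|(I)|(L)([0-9]+)")


def split_ftp_type(url_path):
    """Return (path, TYPE command or None, is_directory)."""
    path, sep, code = url_path.partition(";type=")
    command = None
    if sep:
        match = _FTPTYPE.fullmatch(code)
        if match is None:
            raise ValueError("unknown FTP type code: " + code)
        first, form, image, _local, size = match.groups()
        if first:
            command = "TYPE " + first + " " + form
        elif image:
            command = "TYPE I"
        else:
            command = "TYPE L " + size
    # A path ending in a slash indicates directory access.
    return path, command, path.endswith("/")
```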

   Transfer Mode

      Stream Mode is always used.

Gopher

   The gopher URL specifies the host and optionally the port to which
   the client should connect.  This is followed by a slash and a single
   gopher type code.  This type code is used by the client to determine
   how to interpret the server's reply and is not sent to the server.
   The command string to be sent to the server immediately follows the
   gopher type character.  It consists of the gopher selector string
   followed by any "Gopher plus" syntax, but always omitting the
   trailing CR LF pair.

   When the gopher command string contains characters (such as embedded
   CR LF and HT characters) not allowed in a URL, these are encoded
   using the conventional encoding.

   Note that some gopher selector strings begin with a copy of the
   gopher type character, in which case that character will occur twice
   consecutively.  Also note that the gopher selector string may be an
   empty string since this is how gopher clients refer to the top-level
   directory on a gopher server.

   If the encoded command string (with trailing CR LF stripped) would be
   void then the gopher type character may be omitted and "1" (ASCII 31
   hex) is assumed.

   Note that slash "/" in gopher selector strings may not correspond to
   a level in a hierarchical structure.
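
   The decomposition of a gopher URL might be sketched in Python as
   follows.  This is an illustration only: the function name
   parse_gopher is hypothetical, the default port of 70 is an
   assumption not stated in this document, and a "?" in the command
   string is not handled by this simple sketch.

```python
from urllib.parse import unquote, urlsplit


def parse_gopher(url):
    """Return (host, port, type character, command string)."""
    parts = urlsplit(url)
    rest = parts.path[1:]  # drop the slash after the host details
    gtype = rest[:1] or "1"  # a void type character is assumed "1"
    command = unquote(rest[1:])  # undo the conventional encoding
    # Assumption: port 70 when the URL gives none.
    return parts.hostname, parts.port or 70, gtype, command
```

   The client would append the CR LF pair to the command string before
   sending it to the server.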








Mailto

   This allows a URL to specify an RFC 822 addr-spec mail address.  Note
   that use of "%", for example as used in forming a gatewayed mail
   address, requires conversion to %25 in a URL.
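
   The conversion might be sketched in Python as follows.  The function
   name mailto_url and the gatewayed address used in the usage note are
   hypothetical; quote is from the Python standard library.

```python
from urllib.parse import quote


def mailto_url(addr_spec):
    """Build a mailto URL, escaping "%" and other unsafe characters."""
    # "@" is kept as it is; "%" becomes "%25".
    return "mailto:" + quote(addr_spec, safe="@")
```

   For example, the address "jane%example.org@gateway.example.com"
   would be written as
   "mailto:jane%25example.org@gateway.example.com".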

News

   The news locators refer to either news group names or article message
   identifiers which must conform to the rules for a Message-Id of RFC
   1036 (Horton 1987).  A message identifier may be distinguished from a
   news group name by the presence of the commercial at "@" character.
   These rules imply that within an article, a reference to a news group
   or to another article will be a valid URL (in the partial form).
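
   The distinction might be sketched in Python as follows.  The
   function name classify_news and the returned labels are
   hypothetical; reading "*" as a reference to all groups is an
   assumption based on its appearance in the BNF.

```python
def classify_news(url):
    """Classify a news URL as 'article', 'group' or 'all-groups'."""
    scheme, sep, rest = url.partition(":")
    if not sep or scheme.lower() != "news":
        raise ValueError("not a news URL: " + url)
    if rest == "*":
        return "all-groups"  # assumption, see the note above
    # A message identifier contains the commercial at character.
    return "article" if "@" in rest else "group"
```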

   A news URL may be dereferenced using NNTP (RFC 977, Kantor 1986)
   with the ARTICLE by message-id command, or using any other protocol
   for the conveyance of usenet news articles, or by reference to a
   body of news articles already received.

   Note 1:

      Among URLs the "news" URLs are anomalous in that they are
      location-independent.  They are unsuitable as URN candidates
      because the NNTP architecture relies on the expiry of articles,
      and therefore on only a small number of articles being available
      at any time.
      When a news: URL is quoted, the assumption is that the reader will
      fetch the article or group from his or her local news host.  News
      host names are NOT part of news URLs.

   Note 2:

      An outstanding problem is that the message identifier is
      insufficient to allow the retrieval of an expired article, as no
      algorithm exists for deriving an archive site and file name.  The
      addition of the date and news group set to the article's URL would
      allow this if a directory existed of archive sites by news group.

      This is a suggested subject of study in conjunction with the NNTP
      working group.  A possible further extension would be to allow the
      naming of subject threads as addressable objects.

Telnet, rlogin, tn3270

   The use of URLs to represent interactive sessions is a convenient
   extension to their uses for objects.  This allows access to
   information systems which only provide an interactive service, and no
   information server.  As information within the service cannot be
   addressed individually or, in general, automatically retrieved, this
   is a less desirable, though currently common, solution.

URN

   The "Uniform Resource Name" is currently (March 1993) under
   development in the IETF.  A requirements specification is in
   preparation. It currently looks as though it will be a short string
   suitable for encoding in URI syntax, for which case the "urn:" prefix
   is reserved.  The URN shall be encoded precisely as defined in the
   (future) URN standard, except in that:

      If the official description of the URN syntax includes any
      constant wrapper characters, then they shall not be omitted from
      the URI encoding of the URN;

      If the URN has a hierarchical nature, then the slash delimiter
      shall be used in the URI encoding;

      If the URN has a hierarchical nature, the most significant part
      shall be encoded on the left in the URI encoding;

      Any characters with reserved meanings in the URI syntax shall be
      escape encoded.

   These rules of course apply to any URI scheme.  It is possible that
   the URN syntax will be chosen such that the URI encoding will be a
   1-1 transcription.

   An example might be a name such as

         urn:/iana/dns/ch/cern/cn/techdoc/94/1642-3

   but the reader should refer to the latest URN drafts or
   specifications.

WAIS

   The current public domain WAIS implementation requires that a client
   know the "type" of an object prior to retrieval.  This value is
   returned along with the internal object identifier in the search
   response. It has been encoded into the path part of the URL in order
   to make the URL sufficient for the retrieval of the object.

   Within the WAIS world, names do not of course need to be prefixed by
   "wais:" (by the partial form rules).





   The wpath of a WAIS URL consists of encoded fields of the WAIS
   identifier, in the same order as in the WAIS identifier.  For each
   field, the identifier field number is the digits before the equals
   sign, and the field contents follow, encoded in the conventional
   encoding, terminated by ";".
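
   The decoding of a wpath might be sketched in Python as follows.
   The function name parse_wpath is hypothetical, and the field values
   used to exercise it are invented for illustration.

```python
from urllib.parse import unquote


def parse_wpath(wpath):
    """Return the (field number, field contents) pairs of a wpath."""
    fields = []
    for item in wpath.split(";"):
        if not item:
            continue  # the terminating ";" leaves an empty last item
        number, sep, contents = item.partition("=")
        if not sep or not number.isdigit():
            raise ValueError("malformed wpath field: " + item)
        # Field contents are in the conventional (percent) encoding.
        fields.append((int(number), unquote(contents)))
    return fields
```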

file

   The other URI schemes (except nntp) share the property that they are
   equally valid at any geographical place.

   There is however a real practical requirement to be able to generate
   a URL for an object in a machine's local file system.

   The syntax is similar to the ftp syntax, but in this case the slash
   is used to denote boundaries between directory levels of a
   hierarchical file system.  The "client" software converts the
   file URL into a file name in the local file name conventions.  This
   allows local files to be treated just as network objects without any
   necessity to use a network server for access.  This may be used for
   example for defining a user's "home" document in WWW.

   There is clearly a danger that a link made to a local file will be
   followed by someone on a different system, with unexpected and
   possibly harmful results.  Therefore, the convention is that even a
   "file" URL is provided with a host part.  This allows a client on
   another system to know that it cannot access the file system, or
   perhaps to use some other local mechanism to access the file.

   The special value "localhost" is used in the host field to indicate
   that the filename should really be used on whatever host one is on.
   This for example allows links to be made to files which are
   distributed on many machines, or to "your unix local password file",
   subject of course to consistency across the users of the data.

   A void host field is equivalent to "localhost".
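
   The treatment of the host field might be sketched in Python as
   follows.  The function name local_path is hypothetical, and the
   sketch assumes a system on which the URL path can serve directly as
   a file name; conversion to other local file name conventions is
   left out.

```python
from urllib.parse import unquote, urlsplit


def local_path(url):
    """Return the path named by a file URL, or None if it is remote."""
    parts = urlsplit(url)
    if parts.scheme != "file":
        raise ValueError("not a file URL: " + url)
    # A void host field is equivalent to "localhost".
    if parts.hostname in (None, "localhost"):
        return unquote(parts.path)
    # The file is on another host and cannot be read directly.
    return None
```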

Message-Id

   For systems which include information transferred using mail
   protocols, there is a need to be able to make cross-references
   between different items of information, even though, by the nature of
   mail, those items are only available to a restricted set of people.

   Two schemes are defined.  The first, "mid:", refers to the STD 11,
   RFC 822 Message-Id of a mail message.  This identifier is already
   used in RFC 822 in, for example, the References and In-Reply-To
   fields.



   The rest of the URL after the "mid:" is the RFC822 msg-id with the
   constant <> wrapper removed, leaving an identifier whose format in
   fact happens to be the same as addr-spec format for mailboxes (though
   the semantics are different).
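
   The removal of the wrapper might be sketched in Python as follows.
   The function name mid_url and the message identifier used to
   exercise it are hypothetical, and the sketch does not escape any
   characters of the identifier.

```python
def mid_url(msg_id):
    """Build a mid URL from an RFC 822 msg-id such as "<local@domain>"."""
    if not (msg_id.startswith("<") and msg_id.endswith(">")):
        raise ValueError("msg-id lacks the <> wrapper: " + msg_id)
    # Strip the constant <> wrapper and add the scheme prefix.
    return "mid:" + msg_id[1:-1]
```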

   The use of a "mid" URL implies access to a body of mail already
   received. If a message has been distributed using NNTP or other
   usenet protocols over the news system, then the "news:" form should
   be used.

Content-Id

   The second scheme, "cid:", is similar to "mid:", but makes reference
   to a body part of a MIME message by the value of its content-id
   field.  This allows, for example, a master document being the first
   part of a multipart/related MIME message to refer to component parts
   which are transferred in the same message.

   Note

      Beware however, that content identifiers are only required to be
      unique within the context of a given MIME message, and so the cid:
      URL is only meaningful within the context of the same MIME message.
      For a reference outside the message, it would need to be appended
      to the message-id of the whole message.  A syntax for this has not
      been defined.

Schemes for Further Study

   X.500

      The mapping of X.500 names onto URLs is not defined here.  A
      decision is required as to whether "distinguished names" or "user
      friendly names" (ufn), or both, should be allowed.  If any
      punctuation conversions are needed from the adopted X.500
      representation (such as the use of slashes between parts of a ufn)
      they must be defined.  This is a subject for study.

   WHOIS

      This prefix describes the access using the "whois++" scheme in the
      process of definition.  The host name part is the same as for
      other IP based schemes.  The path part can be either a whois
      handle for a whois object, or it can be a valid whois query
      string. This is a subject for further study.






   NETWORK MANAGEMENT DATABASE

      This is a subject for study.

   NNTP

      This is an alternative form of reference for news articles,
      specifically to be used with NNTP servers, and particularly those
      incomplete server implementations which do not allow retrieval by
      message identifier.  In all other cases the "news" scheme should
      be used.

      The news server name, newsgroup name, and index number of an
      article within the newsgroup on that particular server are given.
      The NNTP protocol must be used.

      Note 1.

         This form of URL is not of global accessibility, as typically
         NNTP servers only allow access from local clients.  Note that
         the article numbers within groups vary from server to server.

         This form of URL should not be quoted outside this local area.
         It should not be used within news articles for wider
         circulation than the one server.  This is a local identifier
         for a resource which is often available globally, and so is not
         recommended except in the case in which incomplete NNTP
         implementations on the local server force its adoption.

Prospero

   The Prospero (Neuman, 1991) directory service is used to resolve the
   URL yielding an access method for the object (which can then itself
   be represented as a URL if translated).  The host part contains a
   host name or internet address.  The port part is optional.

   The path part contains a host specific object name and an optional
   version number. If present, the version number is separated from the
   host specific object name by the characters "%00" (percent zero
   zero), this being an escaped string terminator (null).  External
   Prospero links are represented as URLs of the underlying access
   method and are not represented as Prospero URLs.

Registration of naming schemes

   A new naming scheme may be introduced by defining a mapping onto a
   conforming URL syntax, using a new prefix.  Experimental prefixes may
   be used by mutual agreement between parties, and must start with the
   characters "x-".  The scheme name "urn:" is reserved for the work in
   progress on a scheme for more persistent names.

   It is proposed that the Internet Assigned Numbers Authority (IANA)
   perform the function of registration of new schemes. Any submission
   of a new URI scheme must include a definition of an algorithm for the
   retrieval of any object within that scheme. The algorithm must take
   the URI and produce either a set of URL(s) which will lead to the
   desired object, or the object itself, in a well-defined or
   determinable format.

   It is recommended that those proposing a new scheme demonstrate its
   utility and operability by the provision of a gateway which will
   provide images of objects in the new scheme for clients using an
   existing protocol. If the new scheme is not a locator scheme, then
   the properties of names in the new space should be clearly defined.
   It is likewise recommended that, where a protocol allows for
   retrieval by URL, the client software have provision for being
   configured to use specific gateway locators for indirect access
   through new naming schemes.

BNF of Generic URI Syntax

   This is a BNF-like description of the URI syntax, at the level at
   which specific schemes are not considered.

   A vertical line "|" indicates alternatives, and [brackets] indicate
   optional parts.  Spaces are represented by the word "space", and the
   vertical line character by "vline".  Single letters stand for single
   letters.  All words of more than one letter below are entities
   described somewhere in this description.

   The "generic" production gives a higher level parsing of the same
   URIs as the other productions.  The "national" and "punctuation"
   characters do not appear in any productions and therefore may not
   appear in URIs.

     fragmentaddress        uri [ # fragmentid ]

     uri                    scheme :  path [ ? search ]

     scheme                 ialpha

     path                   void |  xpalphas  [  / path ]

     search                 xalphas [ + search ]

     fragmentid             xalphas



     xalpha                 alpha | digit | safe | extra | escape

     xalphas                xalpha [ xalphas ]

     xpalpha                xalpha | +

     xpalphas               xpalpha [ xpalphas ]

     ialpha                 alpha [ xalphas ]

     alpha                  a | b | c | d | e | f | g | h | i | j | k |
                            l | m | n | o  | p | q | r | s | t | u | v |
                            w | x | y | z | A | B | C  | D | E | F | G |
                            H | I | J | K | L | M | N | O | P |  Q | R |
                            S | T | U | V | W | X | Y | Z

     digit                  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

     safe                   $ | - | _ | @ | . | &

     extra                  ! | * | " |  ' | ( | ) | ,

     reserved               = | ; | / | # | ? | : | space

     escape                 % hex hex

     hex                    digit | a | b | c | d | e | f | A | B | C |
                            D | E | F

     national               { | } | vline | [ | ] | \ | ^ | ~

     punctuation            < | >

     void

      (end of URI BNF)
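
   As an illustration only, the "xalphas" production above might be
   checked in Python as follows.  The function name is_xalphas is
   hypothetical.  The character class is transcribed from the "alpha",
   "digit", "safe" and "extra" productions, with "escape" as a percent
   sign followed by two hex digits.

```python
import re

# xalpha: alpha | digit | safe | extra | escape
_XALPHA = r"""(?:[A-Za-z0-9$\-_@.&!*"'(),]|%[0-9A-Fa-f]{2})"""
_XALPHAS = re.compile(_XALPHA + "+")


def is_xalphas(text):
    """Return True if `text` matches the xalphas production."""
    return _XALPHAS.fullmatch(text) is not None
```

   Reserved, national and punctuation characters, and a percent sign
   not followed by two hex digits, all cause the check to fail.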

BNF for specific URL schemes

   This is a BNF-like description of the Uniform Resource Locator
   syntax.  A vertical line "|" indicates alternatives, and [brackets]
   indicate optional parts.  Spaces are represented by the word "space",
   and the vertical line character by "vline".  Single letters stand for
   single letters.  All words of more than one letter below are entities
   described somewhere in this description.





   The current IETF URI Working Group preference is for the prefixedurl
   production. (Nov 1993. July 93: url).

   The "national" and "punctuation" characters do not appear in any
   productions and therefore may not appear in URLs.

   The "afsaddress" is left in as a historical note, but is not a url
   production.

  prefixedurl            u r l : url

  url                    httpaddress | ftpaddress | newsaddress |
                         nntpaddress | prosperoaddress | telnetaddress
                         | gopheraddress | waisaddress |
                         mailtoaddress  | midaddress | cidaddress

  scheme                 ialpha

  httpaddress            h t t p : / / hostport [ / path ] [ ? search ]

  ftpaddress             f t p : / / login / path [  ftptype ]

  afsaddress             a f s : / / cellname / path

  newsaddress            n e w s : groupart

  nntpaddress            n n t p : group /  digits

  midaddress             m i d  :  addr-spec

  cidaddress             c i d : content-identifier

  mailtoaddress          m a i l t o : xalphas @ hostname

  waisaddress            waisindex | waisdoc

  waisindex              w a i s : / / hostport / database [ ? search ]

  waisdoc                w a i s : / / hostport / database / wtype  /
                         wpath

  wpath                  digits = path ;  [ wpath ]

  groupart               * | group | article

  group                  ialpha [ . group ]



  article                xalphas @ host

  database               xalphas

  wtype                  xalphas

  prosperoaddress        prosperolink

  prosperolink           p r o s p e r o : / / hostport / hsoname [ %
                         0 0 version [ attributes ] ]

  hsoname                path

  version                digits

  attributes             attribute [ attributes ]

  attribute              alphanums

  telnetaddress          t e l n e t : / / login

  gopheraddress          g o p h e r : / / hostport [/ gtype  [
                         gcommand ] ]

  login                  [ user [ : password ] @ ] hostport

  hostport               host [ : port ]

  host                   hostname | hostnumber

  ftptype                A formcode | E formcode | I | L digits

  formcode               N | T | C

  cellname               hostname

  hostname               ialpha [  .  hostname ]

  hostnumber             digits . digits . digits . digits

  port                   digits

  gcommand               path

  path                   void |  segment  [  / path ]

  segment                xpalphas



  search                 xalphas [ + search ]

  user                   alphanum2 [ user ]

  password               alphanum2 [ password ]

  fragmentid             xalphas

  gtype                  xalpha

  alphanum2              alpha | digit | - | _ | . | +

  xalpha                 alpha | digit | safe | extra | escape

  xalphas                xalpha [ xalphas ]

  xpalpha                xalpha | +

  xpalphas               xpalpha [ xpalphas ]

  ialpha                 alpha [ xalphas ]

  alpha                  a | b | c | d | e | f | g | h | i | j | k |
                         l | m | n | o  | p | q | r | s | t | u | v |
                         w | x | y | z | A | B | C  | D | E | F | G |
                         H | I | J | K | L | M | N | O | P |  Q | R |
                         S | T | U | V | W | X | Y | Z

  digit                  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

  safe                   $ | - | _ | @ | . | & | +

  extra                  ! | * |  " |  ' | ( | )  | ,

  reserved               =  |  ;  |  /  |  #  | ? |  : | space

  escape                 % hex hex

  hex                    digit | a | b | c | d | e | f | A | B | C |
                         D | E | F

  national               { | } | vline | [ | ] | \ | ^ | ~

  punctuation            < | >

  digits                 digit [ digits ]




  alphanum               alpha | digit

  alphanums              alphanum [ alphanums ]

  void

   (end of URL BNF)
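
   As an illustration only, the "login" and "hostport" productions
   above might be parsed in Python as follows.  The function name
   parse_login is hypothetical, and the host is simplified to any run
   of characters other than ":", "@" and "/" rather than following
   the "hostname" and "hostnumber" productions exactly.

```python
import re

# alphanum2: alpha | digit | - | _ | . | +
_ALPHANUM2 = r"[A-Za-z0-9\-_.+]"

# login: [ user [ : password ] @ ] hostport
_LOGIN = re.compile(
    rf"(?:(?P<user>{_ALPHANUM2}+)(?::(?P<password>{_ALPHANUM2}+))?@)?"
    r"(?P<host>[^:@/]+)"  # simplification of the host production
    r"(?::(?P<port>[0-9]+))?"
)


def parse_login(login):
    """Return (user, password, host, port); absent parts are None."""
    match = _LOGIN.fullmatch(login)
    if match is None:
        raise ValueError("not a login: " + login)
    port = match.group("port")
    return (
        match.group("user"),
        match.group("password"),
        match.group("host"),
        int(port) if port else None,
    )
```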

References

  Alberti, R., et al., "Notes on the Internet Gopher Protocol",
     University of Minnesota, December 1991,
     <ftp://boombox.micro.umn.edu/pub/gopher/gopher_protocol>. See also
     <gopher://gopher.micro.umn.edu/00/Information About Gopher/About
     Gopher>

  Berners-Lee, T., "Hypertext Transfer Protocol (HTTP)", CERN, December
     1991, as updated from time to time,
     <ftp://info.cern.ch/pub/www/doc/http-spec.txt>

  Crocker, D., "Standard for ARPA Internet Text Messages" STD 11, RFC
     822, UDel, August 1982.

  Davis, F., et al., "WAIS Interface Protocol: Prototype Functional
     Specification", Thinking Machines Corporation, April 23, 1990.
     <ftp://quake.think.com/pub/wais/doc/protspec.txt>

  International Standards Organization, Information and Documentation -
     Search and Retrieve Application Protocol Specification for open
     Systems Interconnection, ISO-10163.

  Horton, M., and R. Adams, "Standard for Interchange of USENET
     messages", RFC 1036, AT&T Bell Laboratories, Center for Seismic
     Studies, December 1987.

  Huitema, C., "Naming: strategies and techniques", Computer Networks
     and ISDN Systems 23 (1991) 107-110.

  Kahle, B., "Document Identifiers, or International Standard Book
     Numbers for the Electronic Age",
     <ftp://quake.think.com/pub/wais/doc/doc-ids.txt>

  Kantor, B., and P. Lapsley, "Network News
     Transfer Protocol", RFC 977, UC San Diego & UC Berkeley, February
     1986.  <ftp://ds.internic.net/rfc/rfc977.txt>

  Kunze, J., "Requirements for URLs", Work in Progress.




  Lynch, C., Coalition for Networked Information: "Workshop on ID and
     Reference Structures for Networked Information", November 1991. See
     <wais://quake.think.com/wais-discussion-archives?lynch>

  Mockapetris, P., "Domain Names - Concepts and Facilities", STD 13, RFC
     1034, USC/Information Sciences Institute, November 1987,
     <ftp://ds.internic.net/rfc/rfc1034.txt>

  Neuman, B. Clifford, "Prospero: A Tool for Organizing Internet
     Resources", Electronic Networking: Research, Applications and
     Policy, Vol 1 No 2, Meckler Westport CT USA, 1992.  See also
     <ftp://prospero.isi.edu/pub/prospero/oir.ps>

  Postel, J., and J. Reynolds, "File Transfer Protocol (FTP)", STD 9,
     RFC 959, USC/Information Sciences Institute, October 1985.
     <ftp://ds.internic.net/rfc/rfc959.txt>

  Sollins, K., and L. Masinter, "Requirements for URNs", Work in
     Progress.

  Yeong, W., "Towards Networked Information Retrieval", Technical report
     91-06-25-01, June 1991, Performance Systems International, Inc.
     <ftp://uu.psi.com/wp/nir.txt>

  Yeong, W., "Representing Public Archives in the Directory", Work in
     Progress, November 1991, now expired.

Security Considerations

   Security issues are not discussed in this memo.

Author's Address

   Tim Berners-Lee
   World-Wide Web project
   CERN
   1211 Geneva 23,
   Switzerland

   Phone: +41 (22)767 3755
   Fax:   +41 (22)767 7155
   EMail: timbl@info.cern.ch








