   Network Working Group                            J. K. Reynolds (ISI)
   Request for Comments:  978                   R. Gillmann (Inner Loop)
                                            W. A. Brackenridge (Alembic)
                                               A. Witkowski (Inner Loop)
                                                         J. Postel (ISI)
                                                           February 1986

   
                 VOICE FILE INTERCHANGE PROTOCOL (VFIP)
   

STATUS OF THIS MEMO

   This memo describes a proposed voice file interchange format for use
   in the ARPA-Internet community.  Suggestions for improvement are
   encouraged.  Distribution of this memo is unlimited.

1.  INTRODUCTION

   The purpose of the Voice File Interchange Protocol (VFIP) is to
   permit the interchange of various types of speech files between
   different systems.  Currently, there are many different voice
   implementations, but no specific standard has been set with an eye
   towards compatibility between these systems.  With the increasing
   interest in and development of voice applications, specifically in
   Multimedia Mail, there is a growing need to represent speech in a
   standardized, common data structure.

   The Voice File Interchange Protocol defines a header to describe the
   voice data.  The 18-byte header contains the header version number,
   the header length, a DTMF mask for Touch-Tones, the recording rate
   in bits per second, the total time in deci-seconds (tenths of a
   second), and the encoding/recording method (see Figure 1).

2.  THE VOICE FILE INTERCHANGE PROTOCOL HEADER

   The Voice File Interchange Protocol header is organized as follows:

   2.1  The Header Version Number

      The version number is a 1-byte field.  This first version is
      number 1.

   2.2  The Header Length

      The length is a 1-byte field indicating the length of the entire
      header in bytes.  For this first version, the length is
      18 (bytes).






Reynolds, et al.                                                [Page 1]
^L


Voice File Interchange Protocol                                  RFC 978


   2.3  The DTMF Mask

      This field describes what is known about DTMF Touch-Tones in the
      data.  The field consists of 16 flag bits, each of which indicates
      what is known about one particular DTMF tone.  The 16 possible
      DTMF tones, in order, are:  0 1 2 3 4 5 6 7 8 9 # * A B C D.  The
      low-order bit of the field is tone 0.

      A 1-bit signifies that the corresponding tone is guaranteed NOT to
      be in the speech file.  A 0-bit signifies that it may or may not
      be in the speech file.  Therefore, a field of 16 zeros denotes
      that nothing is known about the tones.  A field of 16 ones denotes
      that there are no tones in the file.
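
      The mask semantics above can be sketched in Python as follows.
      This is only an illustration; the names DTMF_TONES, dtmf_mask,
      tones_known_absent, and may_contain are hypothetical and do not
      appear in this memo.

```python
# Tone order from Section 2.3; bit 0 (the low-order bit) is tone "0".
DTMF_TONES = "0123456789#*ABCD"


def dtmf_mask(absent_tones):
    """Return the 16-bit mask with a 1-bit for each tone known to be absent."""
    mask = 0
    for tone in absent_tones:
        mask |= 1 << DTMF_TONES.index(tone.upper())
    return mask


def tones_known_absent(mask):
    """List the tones whose bit is set (guaranteed NOT to be in the file)."""
    return [t for i, t in enumerate(DTMF_TONES) if mask & (1 << i)]


def may_contain(mask, tone):
    """True if the mask leaves open whether the tone occurs in the file."""
    return not mask & (1 << DTMF_TONES.index(tone.upper()))
```

      A mask of 0 therefore makes may_contain true for every tone,
      while a mask of 65535 (16 ones) makes it false for every tone.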

   2.4  Recording Rate

      The recording rate is a 32-bit field and is the approximate rate
      in bits/second of the method used to record the speech.  For
      variable rate methods, this may be very approximate.

   2.5  Total Time

      The total time is a 32-bit field giving the length of the
      recording in deci-seconds.  For example, 600 indicates 1 minute
      of speech.
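
      As a rough illustration of how the two fields relate, the
      recording rate and total time together give an estimate of the
      size of the speech data.  The helper names below are
      hypothetical, and because the rate is only approximate (very
      approximate for variable rate methods), the result should be
      treated as an estimate and not as an exact file length.

```python
def to_deciseconds(seconds):
    """Convert a duration in seconds to the deci-second unit of Section 2.5."""
    return round(seconds * 10)


def approx_size_bytes(rate_bps, total_time_ds):
    """Estimate the speech data size in bytes.

    bits = rate_bps * (total_time_ds / 10); bytes = bits / 8,
    hence the combined divisor of 80.
    """
    return rate_bps * total_time_ds // 80


if __name__ == "__main__":
    # One minute at 2400 bits/second.
    print(approx_size_bytes(2400, to_deciseconds(60)))
```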

   2.6  Methods of Encoding/Recording

      This 6-byte ASCII field indicates the method of
      encoding/recording.  Names shorter than six characters are padded
      out to the right with blanks (the ASCII space character, code 32
      decimal).  For comparisons, the names are case insensitive.

      Some known methods of Encoding/Recording are:

        TI - The Texas Instruments card for the IBM PC [5].

        IBM - PC Voice Communications Options.

        NVP-1 and NVP-2 - Network Voice Protocol [1,2].

        COMPUT - Computalker card for the IBM PC [4].
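
      The padding and comparison rules for the method field might be
      implemented along the following lines.  The function names are
      hypothetical.  The memo does not say what to do with names longer
      than six characters, so rejecting them here is an assumption.

```python
def encode_method(name):
    """Encode a method name as the 6-byte, space-padded ASCII field."""
    raw = name.encode("ascii")
    if len(raw) > 6:
        # Assumption: over-long names are an error rather than truncated.
        raise ValueError("method name longer than 6 bytes")
    return raw.ljust(6, b" ")


def same_method(a, b):
    """Compare two method fields, ignoring case and right-hand padding."""
    return a.rstrip(b" ").upper() == b.rstrip(b" ").upper()
```

      For instance, encode_method("TI") yields the two letters followed
      by four spaces, and same_method treats "nvp-2" and "NVP-2" as the
      same method.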










3.  SUMMARY

   This 18-byte header will permit interchange of speech files between
   different systems, as well as facilitate automatic conversion between
   formats.  The header does not have to be prepended to the speech file
   proper; it may be in the form of a separate associated file, if that
   is more convenient.

                   <------------16-bits------------>
                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                   |    Version    |      Length   |
                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                   |             -DTMF-            |
                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                   |          -Recording-          |
                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                   |             -Rate-            |
                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                   |            -Total-            |
                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                   |             -Time-            |
                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                   |       M       |       E       |
                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                   |       T       |       H       |
                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                   |       O       |       D       |
                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                                Figure 1
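
   The layout in Figure 1 maps directly onto a fixed-size binary
   record.  The following Python sketch packs and unpacks such a
   record.  It assumes big-endian ("network") byte order for the
   multi-byte fields, which this memo does not state explicitly, and
   the names HEADER, VfipHeader, pack_header, and unpack_header are
   hypothetical.

```python
import struct
from collections import namedtuple

# Assumption: big-endian byte order; the memo does not specify one.
# Fields: version (1), length (1), DTMF mask (2), rate (4), time (4),
# method (6) for a total of 18 bytes.
HEADER = struct.Struct(">BBHII6s")

VfipHeader = namedtuple(
    "VfipHeader", "version length dtmf_mask rate_bps total_time_ds method"
)


def pack_header(dtmf_mask, rate_bps, total_time_ds, method):
    """Build a version 1 header; method is a str of at most 6 characters."""
    name = method.encode("ascii")
    if len(name) > 6:
        raise ValueError("method name must fit in 6 bytes")
    return HEADER.pack(1, HEADER.size, dtmf_mask, rate_bps,
                       total_time_ds, name.ljust(6, b" "))


def unpack_header(data):
    """Parse the first 18 bytes of data as a version 1 header."""
    header = VfipHeader(*HEADER.unpack(data[:HEADER.size]))
    if header.version != 1 or header.length != HEADER.size:
        raise ValueError("not a version 1 VFIP header")
    return header
```

   Because the header may be kept in a separate associated file, the
   sketch deals only with the 18 header bytes and leaves the location
   of the speech data to the caller.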





















4.  EXAMPLES

   Example 1 shows the header for one minute of 2400 bps NVP-2 speech.
   Nothing is known about DTMF tones in the data.

                   <------------16-bits------------>
                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                   |       1       |      18       |
                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                   |               0               |
                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                   |                               |
                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                   |              2400             |
                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                   |                               |
                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                   |              600              |
                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                   |       N       |       V       |
                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                   |       P       |       -       |
                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                   |       2       |      <sp>     |
                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                                Example 1
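
   For illustration, the header in Example 1 can be serialized to its
   18 bytes as sketched below.  Big-endian byte order is assumed here;
   the memo does not specify one, so the resulting byte sequence is one
   plausible encoding and not a normative one.

```python
import struct

# Assumption: big-endian byte order for the 16-bit and 32-bit fields.
example1 = struct.pack(
    ">BBHII6s",
    1,          # version
    18,         # header length in bytes
    0,          # DTMF mask: nothing known about any tone
    2400,       # recording rate in bits/second
    600,        # total time in deci-seconds (one minute)
    b"NVP-2 ",  # method, padded on the right with one space
)

print(len(example1))
print(example1.hex())
```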
























   Example 2 shows the header for 10 seconds of 1200 bps TI speech, with
   none of the DTMF tones 0-9 in the data, but no information about
   tones #, *, A-D.

                   <------------16-bits------------>
                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                   |       1       |      18       |
                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                   |              1023             |
                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                   |                               |
                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                   |              1200             |
                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                   |                               |
                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                   |              100              |
                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                   |       T       |       I       |
                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                   |      <sp>     |      <sp>     |
                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                   |      <sp>     |      <sp>     |
                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                                Example 2
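
   Going the other way, a receiver might decode the header of
   Example 2 as sketched below.  The hexadecimal string is the header
   of Example 2 written out under the assumption of big-endian byte
   order, which this memo does not state, and the variable names are
   hypothetical.

```python
import struct

# Tone order from Section 2.3; bit 0 (the low-order bit) is tone "0".
DTMF_TONES = "0123456789#*ABCD"

# Example 2 as 18 bytes, assuming big-endian byte order.
raw = bytes.fromhex("011203ff000004b000000064544920202020")

version, length, mask, rate_bps, time_ds, method = struct.unpack(
    ">BBHII6s", raw
)

# Tones whose mask bit is 1 are guaranteed NOT to be in the speech file.
absent = "".join(t for i, t in enumerate(DTMF_TONES) if mask & (1 << i))

print("version:", version, "length:", length)
print("rate (bits/second):", rate_bps)
print("time (seconds):", time_ds / 10)
print("method:", method.decode("ascii").rstrip(" "))
print("tones known absent:", absent)
```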

REFERENCES

   [1]  Cohen, Danny, "Specifications for the Network Voice Protocol
        (NVP)", RFC 741 (NIC 42444), USC/Information Sciences Institute,
        January 1976.

   [2]  Cohen, Danny, "A Network Voice Protocol (NVP-II)",
        USC/Information Sciences Institute, April 1981.

   [3]  O'Leary, G. C., "Local Access Area Facilities for Packet Voice",
        MIT/LL, October 1980.

   [4]  Computalker, "Compu Phone for the IBM PC/XT", Santa Monica,
        California, August 1985.

   [5]  Texas Instruments, Inc., "The TI Speech Application Tool Kit
        Guide", TI Part #2232384-1, May 1985.






