Diffstat (limited to 'doc/rfc/rfc4321.txt')
| -rw-r--r-- | doc/rfc/rfc4321.txt | 563 | 
1 files changed, 563 insertions, 0 deletions
| diff --git a/doc/rfc/rfc4321.txt b/doc/rfc/rfc4321.txt
new file mode 100644
index 0000000..015bb27
--- /dev/null
+++ b/doc/rfc/rfc4321.txt
@@ -0,0 +1,563 @@
+
+Network Working Group                                          R. Sparks
+Request for Comments: 4321                              Estacado Systems
+Category: Informational                                     January 2006
+
+                Problems Identified Associated with the
+       Session Initiation Protocol's (SIP) Non-INVITE Transaction
+
+Status of This Memo
+
+   This memo provides information for the Internet community.  It does
+   not specify an Internet standard of any kind.  Distribution of this
+   memo is unlimited.
+
+Copyright Notice
+
+   Copyright (C) The Internet Society (2006).
+
+Abstract
+
+   This document describes several problems that have been identified
+   with the Session Initiation Protocol's (SIP) non-INVITE transaction.
+
+Table of Contents
+
+   1. Problems under the Current Specifications .......................2
+      1.1. NITs must complete immediately or risk losing a race .......2
+      1.2. Provisional responses can delay recovery from lost
+           final responses ............................................3
+      1.3. Delayed responses will temporarily blacklist an element ....4
+      1.4. 408 for non-INVITE is not useful ...........................6
+      1.5. Non-INVITE timeouts doom forking proxies ...................7
+      1.6. Mismatched timer values make winning the race harder .......7
+   2. Security Considerations .........................................8
+   3. Acknowledgements ................................................8
+   4. Informative References ..........................................9
+
+Sparks                       Informational                      [Page 1]
+
+RFC 4321                SIP Non-INVITE Problems             January 2006
+
+1.  Problems under the Current Specifications
+
+   There are a number of unpleasant edge conditions created by the SIP
+   non-INVITE transaction (NIT) model's fixed duration.  The negative
+   aspects of some of these are exacerbated by the effect that
+   provisional responses have on the non-INVITE transaction state
+   machines as currently defined.
+
+1.1.  NITs must complete immediately or risk losing a race
+
+   The non-INVITE transaction defined in RFC 3261 [1] is designed to
+   have a fixed and finite duration (dependent on T1).  A consequence of
+   this design is that participants must strive to complete the
+   transaction as quickly as possible.  Consider the race condition
+   shown in Figure 1.
+
+                         UAC           UAS
+                          |   request   |
+                     ---  |---.         |
+                      ^   |    `---.    |
+                      |   |         `-->|  ---
+                      |   |             |   ^
+                      |   |             |   |
+                    64*T1 |             |   |
+                      |   |             |   |
+                      |   |             | 64*T1
+                      |   |             |   |
+                      |   |             |   |
+                      v   |             |   |
+        timeout <=== ---  |   200 OK    |   |
+                          |         .---|   v
+                          |    .---'    |  ---
+                          |<--'         |
+
+                Figure 1: Non-Invite Race Condition
+
+   The User Agent Server (UAS) in this figure believes it has responded
+   to the request in time, and that the request succeeded.  The User
+   Agent Client (UAC), on the other hand, believes the request has
+   timed-out, hence failed.  No longer having a matching client
+   transaction, the UAC core will ignore what it believes to be a
+   spurious response.  As far as the UAC is concerned, it received no
+   response at all to its request.  The ultimate result is that the UAS
+   and UAC have conflicting views of the outcome of the transaction.
+
+   Therefore, a UAS cannot wait until the last possible moment to send a
+   final response within a NIT.  It must, instead, send its response so
+   that it will arrive at the UAC before that UAC times out.
+   Unfortunately, the UAS has no way to accurately measure the
+   propagation time of the request or predict the propagation time of
+   the response.  The uncertainty it faces is compounded by each proxy
+   that participates in the transaction.  Thus, the UAS's only choice is
+   to send its final response as soon as it possibly can and hope for
+   the best.
+
+   This result constrains the set of problems that can be solved with a
+   single NIT.  Any delay introduced during processing of a request
+   increases the probability of losing the race.  If the timing
+   characteristics of that processing are not predictable and
+   controllable, a single NIT is an inappropriate model for handling the
+   request.  One viable alternative is to accept the request with a 202
+   and send the ultimate results in a new request in the reciprocal
+   direction.
+
+   In specialized networks, a UAS might have some reliable knowledge of
+   inter-hop latency and could use that knowledge to determine if it has
+   time to delay its final response in order to perform some processing
+   such as a database lookup while mitigating its risk of losing the
+   race in Figure 1.  Establishing this knowledge across arbitrary
+   networks (perhaps using resource reservation techniques and
+   deterministic transports) is not currently feasible.
+
+1.2.  Provisional responses can delay recovery from lost final responses
+
+   The non-INVITE client transaction state machine provides reliability
+   for NITs over unreliable transports (UDP) through retransmission of
+   the request message.  Timer E is set to T1 when a request is
+   initially transmitted.  As long as the machine remains in the Trying
+   state, each time Timer E fires, it will be reset to twice its
+   previous value (capping at T2) and the request is retransmitted.
+
+   If the non-INVITE client transaction state machine sees a provisional
+   response, it transitions to the Proceeding state, where
+   retransmission continues, but the algorithm for resetting Timer E is
+   simply to use T2 instead of doubling at each firing.  (Note that
+   Timer E is not altered during the transition to Proceeding.)
+
+   Making the transition to the Proceeding state before Timer E is reset
+   to T2 can cause recovery from a lost final response to take extra
+   time.  Figure 2 shows recovery from a lost final response with and
+   without a provisional message during this window.  Recovery occurs
+   within 2*T1 in the case without the provisional.  With the
+   provisional, recovery is delayed until T2, which by default is 8*T1.
+
+   In practical terms, a provisional response to a NIT in currently
+   deployed networks can delay transaction completion by up to 3.5
+   seconds.
+
+                 UAC       UAS               UAC        UAS
+                  |         |                 |          |
+            ---   |----.    |            ---  |----.     |
+             ^    |     `-->|             ^   |     `--->|
+         E = T1   |         |         E = T1  |    .-----|(provisional)
+             v    |         |             v   |<--'      |
+            ---   |----.    |            ---  |----.     |
+             ^    |     `-->|             ^   |     `--->|
+             |    |   X<----|(lost final) |   |   X<-----|(lost final)
+             |    |         |             |   |          |
+         E = 2*T1 |         |             |   |          |
+             |    |         |             |   |          |
+             |    |         |             |   |          |
+             v    |         |             |   |          |
+            ---   |----.    |             |   |          |
+                  |     `-->|             |   |          |
+                  |   .-----|(final)      |   |          |
+                  |<-'      |             |   |          |
+                  |         |             |   |          |
+                 \/\       /\/           /\/ /\/        /\/
+                                      E = T2
+                 \/\       /\/           /\/ /\/        /\/
+                  |         |             |   |          |
+                  |         |             v   |          |
+                  |         |            ---  |----.     |
+                  |         |                 |     `--->|
+                  |         |                 |    .-----|(final)
+                  |         |                 |<--'      |
+                  |         |                 |          |
+
+                   Figure 2: Provisionals Can Harm Recovery
+
+   No additional delay is introduced if the first provisional response
+   is received after Timer E has reached its maximum reset interval of
+   T2.
+
+1.3.  Delayed responses will temporarily blacklist an element
+
+   A SIP element's use of DNS Service Record Resource Records [3] is
+   specified in RFC 3263 [2].  That specification discusses how SIP
+   ensures high availability by having upstream elements detect failure
+   of downstream elements.  It proceeds to define several types of
+   failure detection and instructions for failover.  Two of the
+   behaviors it describes are important to this document:
+
+   o  Within a transaction, transport failure is detected either through
+      an explicit report from the transport layer or through timeout.
+      Note specifically that timeout will indicate transport failure
+      regardless of the transport in use.  When transport failure is
+      detected, the request is retried at the next element from the
+      sorted results of the SRV query.
+
+   o  Between transactions, locations reporting temporary failure
+      (through 503/Retry-After, for example) are not used until their
+      requested black-out period expires.
+
+   The specification notes the benefit of caching locations that are
+   successfully contacted, but does not discuss how such a cache is
+   maintained.  It is unclear whether an element should stop using
+   (temporarily blacklist) a location returned in the SRV query that
+   results in a transport error.  If it does, when should such a
+   location be removed from the blacklist?
+
+   Without such a blacklist (or equivalent mechanism), the intended
+   availability mechanism fails miserably.  Consider traffic between two
+   domains.  Proxy pA in domain A needs to forward a sequence of non-
+   INVITE requests to domain B.  Through DNS SRV, pA discovers pB1 and
+   pB2, and the ordering rules of [2] and [3] indicate it should use pB1
+   first.  The first request to pB1 times out.  Since pA is a proxy and
+   a NIT has a fixed duration, pA has no opportunity to retry the
+   request at pB2.  If pA does not remember pB1's failure, the second
+   request (and all subsequent non-INVITE requests until pB1 recovers)
+   are doomed to the same failure.  Caching would allow the subsequent
+   requests to be tried at pB2.
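
The caching behavior that Section 1.3 argues for can be sketched in Python. This is an illustration only, not anything RFC 3263 or this memo specifies: the class name `FailureCache`, its methods, and the fixed `hold_down` period are all hypothetical, and the memo explicitly leaves open how long a location should stay blacklisted.

```python
import time


class FailureCache:
    """Remember SRV targets that timed out and skip them for a while.

    Hypothetical sketch: the hold-down period is an assumption, since the
    specifications do not say when a location leaves the blacklist.
    """

    def __init__(self, hold_down=30.0, clock=time.monotonic):
        self._hold_down = hold_down      # seconds a target stays blacklisted
        self._clock = clock              # injectable so tests can fake time
        self._failed_until = {}          # target -> time it becomes usable

    def record_timeout(self, target):
        """Blacklist a target after a transaction to it timed out."""
        self._failed_until[target] = self._clock() + self._hold_down

    def usable(self, target):
        """Return True if the target is not currently blacklisted."""
        expiry = self._failed_until.get(target)
        if expiry is None:
            return True
        if self._clock() >= expiry:
            del self._failed_until[target]   # hold-down period is over
            return True
        return False

    def pick(self, ordered_targets):
        """Return the first usable target in SRV order, or None."""
        for target in ordered_targets:
            if self.usable(target):
                return target
        return None
```

In the pA example above, recording the timeout at pB1 makes `pick(["pB1", "pB2"])` return pB2 for later requests until the hold-down period ends. The first request still fails, as the text notes.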
+
+   Since miserable failure is not acceptable in deployed networks, we
+   should anticipate that elements will, in fact, cache timeout failures
+   between transactions.  Then the race in Figure 1 becomes important.
+   If an element fails to respond "soon enough", it has effectively not
+   responded at all and will be blacklisted at its peer for some period
+   of time.
+
+   (Note that even with caching, the first request timeout results in a
+   timeout failure all the way back to the original submitter.  The
+   failover mechanisms in [2] work well to increase the resiliency of a
+   given INVITE transaction, but do nothing for a given non-INVITE
+   transaction.)
+
+1.4.  408 for non-INVITE is not useful
+
+   Consider the race condition in Figure 1 when the final response is
+   408 instead of 200.  Under the current specification, the race is
+   guaranteed to be lost.  Most existing endpoints will emit a 408 for a
+   non-INVITE request 64*T1 after receiving the request if they have not
+   emitted an earlier final response.  Such a 408 is guaranteed to
+   arrive at the next upstream element too late to be useful.  In fact,
+   in the presence of proxies, these messages are even harmful.  When
+   the 408 arrives, each proxy will have already terminated its
+   associated client transaction due to timeout.  Therefore, each proxy
+   must forward the 408 upstream statelessly.  This, in turn, is
+   guaranteed to arrive too late.  As Figure 3 shows, this can
+   ultimately result in bombarding the original requester with spurious
+   408s.  (Note that the proxy's client transaction state machine never
+   enters the Completed state, so Timer K does not enter into play.)
+
+                     UAC        P1         P2         P3         UAS
+                      |          |          |          |          |
+                ---  ===---.     |          |          |          |
+                 ^    |     `-->===---.     |          |          |
+                 |    |          |     `-->===---.     |          |
+                 |    |          |          |     `-->===---.     |
+               64*T1  |          |          |          |     `-->===
+                 |    |          |          |          |          |
+                 |    |          |          |          |          |
+                 v    |          |          |          |          |
+      (timeout) ---  ===         |          |          |          |
+                      |    .-408===         |          |          |
+                      |<--'      |    .-408===         |          |
+                      |    .-408-|<--'      |    .-408===         |
+                      |<--'      |    .-408-|<--'      |    .-408===
+                      |    .-408-|<--'      |    .-408-|<--'      |
+                      |<--'      |    .-408-|<--'      |          |
+                      |    .-408-|<--'      |          |          |
+                      |<--'      |          |          |          |
+                      |          |          |          |          |
+
+                     Figure 3: Late 408s to Non-INVITEs
+
+   This response bombardment is not limited to the 408 response, though
+   it only exists when participating client transaction state machines
+   are timing out.  Figure 4 generalizes Figure 1 to include multiple
+   hops.  Note that even though the UAS responds "in time" to P3, the
+   response is too late for P2, P1, and the UAC.
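
The bombardment drawn in Figure 3 can be counted with a short Python sketch. It rests on two assumptions read off the figure rather than taken from any specification text: every element downstream of the UAC emits exactly one 408 when it times out, and every proxy statelessly forwards each 408 it receives. The function name `late_408s` is hypothetical.

```python
def late_408s(path):
    """Count late 408s crossing each upstream link of a request path.

    path lists the elements in request order, for example
    ["UAC", "P1", "P2", "P3", "UAS"].  Assumes each element after the
    first emits one 408 and proxies forward every 408 statelessly.
    """
    links = {}
    for origin in range(1, len(path)):
        # A 408 emitted by path[origin] crosses every link back to path[0].
        for hop in range(origin, 0, -1):
            link = (path[hop], path[hop - 1])
            links[link] = links.get(link, 0) + 1
    return links


counts = late_408s(["UAC", "P1", "P2", "P3", "UAS"])
print(counts[("P1", "UAC")], sum(counts.values()))
```

For the chain in Figure 3 this gives four 408s on the link from P1 to the UAC and ten 408 transmissions in total, which matches the arrows in the figure. Under these assumptions the total grows quadratically with the number of hops.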
+
+                     UAC        P1         P2         P3         UAS
+                      |          |          |          |          |
+                ---  ===---.     |          |          |          |
+                 ^    |     `-->===---.     |          |          |
+                 |    |          |     `-->===---.     |          |
+                 |    |          |          |     `-->===---.     |
+               64*T1  |          |          |          |     `-->===
+                 |    |          |          |          |          |
+                 |    |          |          |          |          |
+                 v    |          |          |          |          |
+      (timeout) ---  ===         |          |          |          |
+                      |    .-408===         |          |    .-200-|
+                      |<--'      |    .-408===   .-200-|<--'      |
+                      |    .-408-|<--'.-200-|<--'     ===         |
+                      |<--'.-200-|<--'      |          |         ===
+                      |<--'      |          |          |          |
+                      |          |          |          |          |
+
+                  Figure 4: Additional Timeout-Related Error
+
+1.5.  Non-INVITE timeouts doom forking proxies
+
+   A single branch with a delayed or missing final response will
+   dominate the processing at a proxy that receives no 2xx responses to
+   a forked non-INVITE request.  This proxy is required to allow all of
+   its client transactions to terminate before choosing a "best
+   response".  This forces the proxy's server transaction to lose the
+   race in Figure 1.  Any response it ultimately forwards (a 401, for
+   example) will arrive at the upstream elements too late to be used.
+   Thus, if no element among the branches would return a 2xx response,
+   failure of a single element (or its transport) dooms the proxy to
+   failure.
+
+1.6.  Mismatched timer values make winning the race harder
+
+   There are many failure scenarios due to misconfiguration or
+   misbehavior that the SIP specification does not discuss.  One is
+   placing two elements with different configured values for T1 and T2
+   on the same network.  Review of Figure 1 illustrates that the race
+   failure is only made more likely in this misconfigured state (it may
+   appear that shortening T1 at the element behaving as a UAS improves
+   this particular situation, but remember that these elements may trade
+   roles on the next request).  Since the protocol provides no mechanism
+   for discovering/negotiating a peer's timer values, exceptional care
+   must be taken when deploying systems with non-defaults to ensure that
+   they will never directly communicate with elements with default
+   values.
+
+2.  Security Considerations
+
+   This document describes some problems in the core SIP specification
+   [1] related to the SIP non-INVITE requests, the messages other than
+   INVITE that begin transactions.  A few of the problems lead to
+   flooding or forgery risk, and could possibly be exploited by an
+   adversary in a denial of service attack.  Solutions are defined in
+   the companion document [4].
+
+   One solution there prohibits proxies and User Agents from sending 408
+   responses to non-INVITE transactions.  Without this change, proxies
+   automatically generate a storm of useless responses.  An attacker
+   could capitalize on this by enticing User Agents to send non-INVITE
+   requests to a black hole (through social engineering or DNS
+   poisoning) or by selectively dropping responses.
+
+   Another solution prohibits proxies from forwarding late responses.
+   Without this change, an attacker could easily forge messages which
+   appear to be late responses.  All proxies compliant with RFC 3261 are
+   required to forward these responses, wasting bandwidth and CPU and
+   potentially overwhelming target User Agents (especially those with
+   low speed connections).
+
+3.  Acknowledgements
+
+   This document captures many conversations about non-INVITE issues.
+   Significant contributors include Ben Campbell, Gonzalo Camarillo,
+   Steve Donovan, Rohan Mahy, Dan Petrie, Adam Roach, Jonathan
+   Rosenberg, and Dean Willis.
+
+4. Informative References
+
+   [1]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A.,
+        Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP:
+        Session Initiation Protocol", RFC 3261, June 2002.
+
+   [2]  Rosenberg, J. and H. Schulzrinne, "Session Initiation Protocol
+        (SIP): Locating SIP Servers", RFC 3263, June 2002.
+
+   [3]  Gulbrandsen, A., Vixie, P., and L. Esibov, "A DNS RR for
+        specifying the location of services (DNS SRV)", RFC 2782,
+        February 2000.
+
+   [4]  Sparks, R., "Actions Addressing Identified Issues with the
+        Session Initiation Protocol's (SIP) Non-INVITE Transaction", RFC
+        4320, January 2006.
+
+Author's Address
+
+   Robert J. Sparks
+   Estacado Systems
+   17210 Campbell Road
+   Suite 250
+   Dallas, TX 75252-4203
+
+   EMail: rjsparks@estacado.net
+
+Full Copyright Statement
+
+   Copyright (C) The Internet Society (2006).
+
+   This document is subject to the rights, licenses and restrictions
+   contained in BCP 78, and except as set forth therein, the authors
+   retain all their rights.
+
+   This document and the information contained herein are provided on an
+   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
+   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
+   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
+   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
+   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
+   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Intellectual Property
+
+   The IETF takes no position regarding the validity or scope of any
+   Intellectual Property Rights or other rights that might be claimed to
+   pertain to the implementation or use of the technology described in
+   this document or the extent to which any license under such rights
+   might or might not be available; nor does it represent that it has
+   made any independent effort to identify any such rights.  Information
+   on the procedures with respect to rights in RFC documents can be
+   found in BCP 78 and BCP 79.
+
+   Copies of IPR disclosures made to the IETF Secretariat and any
+   assurances of licenses to be made available, or the result of an
+   attempt made to obtain a general license or permission for the use of
+   such proprietary rights by implementers or users of this
+   specification can be obtained from the IETF on-line IPR repository at
+   http://www.ietf.org/ipr.
+
+   The IETF invites any interested party to bring to its attention any
+   copyrights, patents or patent applications, or other proprietary
+   rights that may cover technology that may be required to implement
+   this standard.  Please address the information to the IETF at
+   ietf-ipr@ietf.org.
+
+Acknowledgement
+
+   Funding for the RFC Editor function is provided by the IETF
+   Administrative Support Activity (IASA).
|
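
The Timer E behavior described in Section 1.2 of the added file can be modeled in a few lines of Python. This is a simplified sketch under stated assumptions, not an implementation of the RFC 3261 state machine: the function name `retransmit_times` and its parameters are hypothetical, the defaults assume T1 = 0.5 s and T2 = 4 s, and retransmission is assumed to stop once 64*T1 has elapsed.

```python
def retransmit_times(t1=0.5, t2=4.0, provisional_at=None):
    """Return the times (seconds after the first send) when Timer E fires.

    provisional_at is the time a provisional response arrives, or None.
    Simplified model: in Trying the interval doubles up to T2; once in
    Proceeding each firing resets the interval to T2.  The transition to
    Proceeding does not touch the timer that is already running.
    """
    deadline = 64 * t1           # assumed point where the transaction gives up
    interval = t1                # Timer E starts at T1
    now = 0.0
    times = []
    while now + interval < deadline:
        now += interval
        times.append(now)        # the request is retransmitted here
        proceeding = provisional_at is not None and provisional_at <= now
        interval = t2 if proceeding else min(2 * interval, t2)
    return times


print(retransmit_times()[:3])
print(retransmit_times(provisional_at=0.25)[:3])
```

Comparing the two schedules shows the effect illustrated in Figure 2: an early provisional response pushes the second retransmission out to a full T2 after the first. A provisional that arrives after the interval has already reached T2 leaves the schedule unchanged.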