doc: Add RFC documents

author: Thomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
committer: Thomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
commit: 4bfd864f10b68b71482b35c818559068ef8d5797 (patch)
tree: e3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc3124.txt
parent: ea76e11061bda059ae9f9ad130a9895cc85607db (diff)
1 files changed, 1235 insertions, 0 deletions
diff --git a/doc/rfc/rfc3124.txt b/doc/rfc/rfc3124.txt
new file mode 100644
index 0000000..db57bc3
--- /dev/null
+++ b/doc/rfc/rfc3124.txt
@@ -0,0 +1,1235 @@
+
+
+
+
+
+
+Network Working Group                                    H. Balakrishnan
+Request for Comments: 3124                                       MIT LCS
+Category: Standards Track                                      S. Seshan
+                                                                     CMU
+                                                               June 2001
+
+
+                         The Congestion Manager
+
+
+Status of this Memo
+
+   This document specifies an Internet standards track protocol for the
+   Internet community, and requests discussion and suggestions for
+   improvements.  Please refer to the current edition of the "Internet
+   Official Protocol Standards" (STD 1) for the standardization state
+   and status of this protocol.  Distribution of this memo is unlimited.
+
+Copyright Notice
+
+   Copyright (C) The Internet Society (2001).  All Rights Reserved.
+
+Abstract
+
+   This document describes the Congestion Manager (CM), an end-system
+   module that:
+
+   (i) Enables an ensemble of multiple concurrent streams from a sender
+   destined to the same receiver and sharing the same congestion
+   properties to perform proper congestion avoidance and control, and
+
+   (ii) Allows applications to easily adapt to network congestion.
+
+1. Conventions used in this document:
+
+   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+   document are to be interpreted as described in RFC-2119 [Bradner97].
+
+   STREAM
+
+      A group of packets that all share the same source and destination
+      IP address, IP type-of-service, transport protocol, and source and
+      destination transport-layer port numbers.
+
+
+
+
+
+
+
+Balakrishnan, et. al.       Standards Track                     [Page 1]
+
+RFC 3124                 The Congestion Manager                June 2001
+
+
+   MACROFLOW
+
+      A group of CM-enabled streams that all use the same congestion
+      management and scheduling algorithms, and share congestion state
+      information.  Currently, streams destined to different receivers
+      belong to different macroflows.  Streams destined to the same
+      receiver MAY belong to different macroflows.  When the Congestion
+      Manager is in use, streams that experience identical congestion
+      behavior and use the same congestion control algorithm SHOULD
+      belong to the same macroflow.
+
+   APPLICATION
+
+      Any software module that uses the CM.  This includes user-level
+      applications such as Web servers or audio/video servers, as well
+      as in-kernel protocols such as TCP [Postel81] that use the CM for
+      congestion control.
+
+   WELL-BEHAVED APPLICATION
+
+      An application that only transmits when allowed by the CM and
+      accurately accounts for all data that it has sent to the receiver
+      by informing the CM using the CM API.
+
+   PATH MAXIMUM TRANSMISSION UNIT (PMTU)
+
+      The size of the largest packet that the sender can transmit
+      without it being fragmented en route to the receiver.  It includes
+      the sizes of all headers and data except the IP header.
+
+   CONGESTION WINDOW (cwnd)
+
+      A CM state variable that modulates the amount of outstanding data
+      between sender and receiver.
+
+   OUTSTANDING WINDOW (ownd)
+
+      The number of bytes that has been transmitted by the source, but
+      not known to have been either received by the destination or lost
+      in the network.
+
+   INITIAL WINDOW (IW)
+
+      The size of the sender's congestion window at the beginning of a
+      macroflow.
+
+
+
+
+
+
+Balakrishnan, et. al.       Standards Track                     [Page 2]
+
+RFC 3124                 The Congestion Manager                June 2001
+
+
+   DATA TYPE SYNTAX
+
+      We use "u64" for unsigned 64-bit, "u32" for unsigned 32-bit, "u16"
+      for unsigned 16-bit, "u8" for unsigned 8-bit, "i32" for signed
+      32-bit, "i16" for signed 16-bit quantities, "float" for IEEE
+      floating point values.  The type "void" is used to indicate that
+      no return value is expected from a call.  Pointers are referred to
+      using "*" syntax, following C language convention.
+
+      We emphasize that all the API functions described in this document
+      are "abstract" calls and that conformant CM implementations may
+      differ in specific implementation details.
+
+2. Introduction
+
+   The framework described in this document integrates congestion
+   management across all applications and transport protocols.  The CM
+   maintains congestion parameters (available aggregate and per-stream
+   bandwidth, per-receiver round-trip times, etc.) and exports an API
+   that enables applications to learn about network characteristics,
+   pass information to the CM, share congestion information with each
+   other, and schedule data transmissions.  This document focuses on
+   applications and transport protocols with their own independent per-
+   byte or per-packet sequence number information, and does not require
+   modifications to the receiver protocol stack.  However, the receiving
+   application must provide feedback to the sending application about
+   received packets and losses, and the latter is expected to use the CM
+   API to update CM state.  This document does not address networks with
+   reservations or service differentiation.
+
+   The CM is an end-system module that enables an ensemble of multiple
+   concurrent streams to perform stable congestion avoidance and
+   control, and allows applications to easily adapt their transmissions
+   to prevailing network conditions.  It integrates congestion
+   management across all applications and transport protocols.  It
+   maintains congestion parameters (available aggregate and per-stream
+   bandwidth, per-receiver round-trip times, etc.) and exports an API
+   that enables applications to learn about network characteristics,
+   pass information to the CM, share congestion information with each
+   other, and schedule data transmissions.  When the CM is used, all
+   data transmissions subject to the CM must be done with the explicit
+   consent of the CM via this API to ensure proper congestion behavior.
+
+   Systems MAY choose to use CM, and if so they MUST follow this
+   specification.
+
+   This document focuses on applications and networks where the
+   following conditions hold:
+
+
+
+Balakrishnan, et. al.       Standards Track                     [Page 3]
+
+RFC 3124                 The Congestion Manager                June 2001
+
+
+   1. Applications are well-behaved with their own independent
+      per-byte or per-packet sequence number information, and use the
+      CM API to update internal state in the CM.
+
+   2. Networks are best-effort without service discrimination or
+      reservations.  In particular, it does not address situations
+      where different streams between the same pair of hosts traverse
+      paths with differing characteristics.
+
+   The Congestion Manager framework can be extended to support
+   applications that do not provide their own feedback and to
+   differentially-served networks.  These extensions will be addressed
+   in later documents.
+
+   The CM is motivated by two main goals:
+
+   (i) Enable efficient multiplexing.  Increasingly, the trend on the
+   Internet is for unicast data senders (e.g., Web servers) to transmit
+   heterogeneous types of data to receivers, ranging from unreliable
+   real-time streaming content to reliable Web pages and applets.  As a
+   result, many logically different streams share the same path between
+   sender and receiver.  For the Internet to remain stable, each of
+   these streams must incorporate control protocols that safely probe
+   for spare bandwidth and react to congestion.  Unfortunately, these
+   concurrent streams typically compete with each other for network
+   resources, rather than share them effectively.  Furthermore, they do
+   not learn from each other about the state of the network.  Even if
+   they each independently implement congestion control (e.g., a group
+   of TCP connections each implementing the algorithms in [Jacobson88,
+   Allman99]), the ensemble of streams tends to be more aggressive in
+   the face of congestion than a single TCP connection implementing
+   standard TCP congestion control and avoidance [Balakrishnan98].
+
+   (ii) Enable application adaptation to congestion.  Increasingly,
+   popular real-time streaming applications run over UDP using their own
+   user-level transport protocols for good application performance, but
+   in most cases today do not adapt or react properly to network
+   congestion.  By implementing a stable control algorithm and exposing
+   an adaptation API, the CM enables easy application adaptation to
+   congestion.  Applications adapt the data they transmit to the current
+   network conditions.
+
+   The CM framework builds on recent work on TCP control block sharing
+   [Touch97], integrated TCP congestion control (TCP-Int)
+   [Balakrishnan98] and TCP sessions [Padmanabhan98].  [Touch97]
+   advocates the sharing of some of the state in the TCP control block
+   to improve transient transport performance and describes sharing
+   across an ensemble of TCP connections.  [Balakrishnan98],
+
+
+
+Balakrishnan, et. al.       Standards Track                     [Page 4]
+
+RFC 3124                 The Congestion Manager                June 2001
+
+
+   [Padmanabhan98], and [Eggert00] describe several experiments that
+   quantify the benefits of sharing congestion state, including improved
+   stability in the face of congestion and better loss recovery.
+   Integrating loss recovery across concurrent connections significantly
+   improves performance because losses on one connection can be detected
+   by noticing that later data sent on another connection has been
+   received and acknowledged.  The CM framework extends these ideas in
+   two significant ways: (i) it extends congestion management to non-TCP
+   streams, which are becoming increasingly common and often do not
+   implement proper congestion management, and (ii) it provides an API
+   for applications to adapt their transmissions to current network
+   conditions.  For an extended discussion of the motivation for the CM,
+   its architecture, API, and algorithms, see [Balakrishnan99]; for a
+   description of an implementation and performance results, see
+   [Andersen00].
+
+   The resulting end-host protocol architecture at the sender is shown
+   in Figure 1.  The CM helps achieve network stability by implementing
+   stable congestion avoidance and control algorithms that are "TCP-
+   friendly" [Mahdavi98] based on algorithms described in [Allman99].
+   However, it does not attempt to enforce proper congestion behavior
+   for all applications (but it does not preclude a policer on the host
+   that performs this task).  Note that while the policer at the end-
+   host can use CM, the network has to be protected against compromises
+   to the CM and the policer at the end hosts, a task that requires
+   router machinery [Floyd99a].  We do not address this issue further in
+   this document.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Balakrishnan, et. al.       Standards Track                     [Page 5]
+
+RFC 3124                 The Congestion Manager                June 2001
+
+
+   |--------| |--------| |--------| |--------|       |--------------|
+   |  HTTP  | |  FTP   | |  RTP 1 | |  RTP 2 |       |              |
+   |--------| |--------| |--------| |--------|       |              |
+       |          |         |  ^       |  ^          |              |
+       |          |         |  |       |  |          |   Scheduler  |
+       |          |         |  |       |  |  |---|   |              |
+       |          |         |  |-------|--+->|   |   |              |
+       |          |         |          |     |   |<--|              |
+       v          v         v          v     |   |   |--------------|
+   |--------| |--------|  |-------------|    |   |           ^
+   |  TCP 1 | |  TCP 2 |  |    UDP 1    |    | A |           |
+   |--------| |--------|  |-------------|    |   |           |
+      ^   |      ^   |              |        |   |   |--------------|
+      |   |      |   |              |        | P |-->|              |
+      |   |      |   |              |        |   |   |              |
+      |---|------+---|--------------|------->|   |   |  Congestion  |
+          |          |              |        | I |   |              |
+          v          v              v        |   |   |  Controller  |
+     |-----------------------------------|   |   |   |              |
+     |               IP                  |-->|   |   |              |
+     |-----------------------------------|   |   |   |--------------|
+                                             |---|
+
+                                      Figure 1
+
+   The key components of the CM framework are (i) the API, (ii) the
+   congestion controller, and (iii) the scheduler.  The API is (in part)
+   motivated by the requirements of application-level framing (ALF)
+   [Clark90], and is described in Section 4.  The CM internals (Section
+   5) include a congestion controller (Section 5.1) and a scheduler to
+   orchestrate data transmissions between concurrent streams in a
+   macroflow (Section 5.2).  The congestion controller adjusts the
+   aggregate transmission rate between sender and receiver based on its
+   estimate of congestion in the network.  It obtains feedback about its
+   past transmissions from applications themselves via the API.  The
+   scheduler apportions available bandwidth amongst the different
+   streams within each macroflow and notifies applications when they are
+   permitted to send data.  This document focuses on well-behaved
+   applications; a future one will describe the sender-receiver protocol
+   and header formats that will handle applications that do not
+   incorporate their own feedback to the CM.
+
+3. CM API
+
+   By convention, the IETF does not treat Application Programming
+   Interfaces as standards track.  However, it is considered important
+   to have the CM API and CM algorithm requirements in one coherent
+   document.  The following section on the CM API uses the terms MUST,
+
+
+
+Balakrishnan, et. al.       Standards Track                     [Page 6]
+
+RFC 3124                 The Congestion Manager                June 2001
+
+
+   SHOULD, etc., but the terms are meant to apply within the context of
+   an implementation of the CM API.  The section does not apply to
+   congestion control implementations in general, only to those
+   implementations offering the CM API.
+
+   Using the CM API, streams can determine their share of the available
+   bandwidth, request and have their data transmissions scheduled,
+   inform the CM about successful transmissions, and be informed when
+   the CM's estimate of path bandwidth changes.  Thus, the CM frees
+   applications from having to maintain information about the state of
+   congestion and available bandwidth along any path.
+
+   The function prototypes below follow standard C language convention.
+   We emphasize that these API functions are abstract calls and
+   conformant CM implementations may differ in specific details, as long
+   as equivalent functionality is provided.
+
+   When a new stream is created by an application, it passes some
+   information to the CM via the cm_open(stream_info) API call.
+   Currently, stream_info consists of the following information: (i) the
+   source IP address, (ii) the source port, (iii) the destination IP
+   address, (iv) the destination port, and (v) the IP protocol number.
+
+3.1 State maintenance
+
+   1. Open: All applications MUST call cm_open(stream_info) before
+      using the CM API.  This returns a handle, cm_streamid, for the
+      application to use for all further CM API invocations for that
+      stream.  If the returned cm_streamid is -1, then the cm_open()
+      failed and that stream cannot use the CM.
+
+      All other calls to the CM for a stream use the cm_streamid
+      returned from the cm_open() call.
+
+   2. Close: When a stream terminates, the application SHOULD invoke
+      cm_close(cm_streamid) to inform the CM about the termination
+      of the stream.
+
+   3. Packet size: cm_mtu(cm_streamid) returns the estimated PMTU of
+      the path between sender and receiver.  Internally, this
+      information SHOULD be obtained via path MTU discovery
+      [Mogul90].  It MAY be statically configured in the absence of
+      such a mechanism.
+
+
+
+
+
+
+
+
+Balakrishnan, et. al.       Standards Track                     [Page 7]
+
+RFC 3124                 The Congestion Manager                June 2001
+
+
+3.2 Data transmission
+
+   The CM accommodates two types of adaptive senders, enabling
+   applications to dynamically adapt their content based on prevailing
+   network conditions, and supporting ALF-based applications.
+
+   1. Callback-based transmission.  The callback-based transmission API
+   puts the stream in firm control of deciding what to transmit at each
+   point in time.  To achieve this, the CM does not buffer any data;
+   instead, it allows streams the opportunity to adapt to unexpected
+   network changes at the last possible instant.  Thus, this enables
+   streams to "pull out" and repacketize data upon learning about any
+   rate change, which is hard to do once the data has been buffered.
+   The CM must implement a cm_request(i32 cm_streamid) call for streams
+   wishing to send data in this style.  After some time, depending on
+   the rate, the CM MUST invoke a callback using cmapp_send(), which is
+   a grant for the stream to send up to PMTU bytes.  The callback-style
+   API is the recommended choice for ALF-based streams.  Note that
+   cm_request() does not take the number of bytes or MTU-sized units as
+   an argument; each call to cm_request() is an implicit request for
+   sending up to PMTU bytes.  The CM MAY provide an alternate interface,
+   cm_request(int k).  The cmapp_send callback for this request is
+   granted the right to send up to k PMTU sized segments.  Section 4.3
+   discusses the time duration for which the transmission grant is
+   valid, while Section 5.2 describes how these requests are scheduled
+   and callbacks made.
+
+   2. Synchronous-style.  The above callback-based API accommodates a
+   class of ALF streams that are "asynchronous."  Asynchronous
+   transmitters do not transmit based on a periodic clock, but do so
+   triggered by asynchronous events like file reads or captured frames.
+   On the other hand, there are many streams that are "synchronous"
+   transmitters, which transmit periodically based on their own internal
+   timers (e.g., an audio senders that sends at a constant sampling
+   rate).  While CM callbacks could be configured to periodically
+   interrupt such transmitters, the transmit loop of such applications
+   is less affected if they retain their original timer-based loop.  In
+   addition, it complicates the CM API to have a stream express the
+   periodicity and granularity of its callbacks.  Thus, the CM MUST
+   export an API that allows such streams to be informed of changes in
+   rates using the cmapp_update(u64 newrate, u32 srtt, u32 rttdev)
+   callback function, where newrate is the new rate in bits per second
+   for this stream, srtt is the current smoothed round trip time
+   estimate in microseconds, and rttdev is the smoothed linear deviation
+   in the round-trip time estimate calculated using the same algorithm
+   as in TCP [Paxson00].  The newrate value reports an instantaneous
+   rate calculated, for example, by taking the ratio of cwnd and srtt,
+   and dividing by the fraction of that ratio allocated to the stream.
+
+
+
+Balakrishnan, et. al.       Standards Track                     [Page 8]
+
+RFC 3124                 The Congestion Manager                June 2001
+
+
+   In response, the stream MUST adapt its packet size or change its
+   timer interval to conform to (i.e., not exceed) the allowed rate.  Of
+   course, it may choose not to use all of this rate.  Note that the CM
+   is not on the data path of the actual transmission.
+
+   To avoid unnecessary cmapp_update() callbacks that the application
+   will only ignore, the CM MUST provide a cm_thresh(float
+   rate_downthresh, float rate_upthresh, float rtt_downthresh, float
+   rtt_upthresh) function that a stream can use at any stage in its
+   execution.  In response, the CM SHOULD invoke the callback only when
+   the rate decreases to less than (rate_downthresh * lastrate) or
+   increases to more than (rate_upthresh * lastrate), where lastrate is
+   the rate last notified to the stream, or when the round-trip time
+   changes correspondingly by the requisite thresholds.  This
+   information is used as a hint by the CM, in the sense the
+   cmapp_update() can be called even if these conditions are not met.
+
+   The CM MUST implement a cm_query(i32 cm_streamid, u64* rate, u32*
+   srtt, u32* rttdev) to allow an application to query the current CM
+   state.  This sets the rate variable to the current rate estimate in
+   bits per second, the srtt variable to the current smoothed round-trip
+   time estimate in microseconds, and rttdev to the mean linear
+   deviation.  If the CM does not have valid estimates for the
+   macroflow, it fills in negative values for the rate, srtt, and
+   rttdev.
+
+   Note that a stream can use more than one of the above transmission
+   APIs at the same time.  In particular, the knowledge of sustainable
+   rate is useful for asynchronous streams as well as synchronous ones;
+   e.g., an asynchronous Web server disseminating images using TCP may
+   use cmapp_send() to schedule its transmissions and cmapp_update() to
+   decide whether to send a low-resolution or high-resolution image.  A
+   TCP implementation using the CM is described in Section 6.1.1, where
+   the benefit of the cm_request() callback API for TCP will become
+   apparent.
+
+   The reader will notice that the basic CM API does not provide an
+   interface for buffered congestion-controlled transmissions.  This is
+   intentional, since this transmission mode can be implemented using
+   the callback-based primitive.  Section 6.1.2 describes how
+   congestion-controlled UDP sockets may be implemented using the CM
+   API.
+
+3.3 Application notification
+
+   When a stream receives feedback from receivers, it MUST use
+   cm_update(i32 cm_streamid, u32 nrecd, u32 nlost, u8 lossmode, i32
+   rtt) to inform the CM about events such as congestion losses,
+
+
+
+Balakrishnan, et. al.       Standards Track                     [Page 9]
+
+RFC 3124                 The Congestion Manager                June 2001
+
+
+   successful receptions, type of loss (timeout event, Explicit
+   Congestion Notification [Ramakrishnan99], etc.) and round-trip time
+   samples.  The nrecd parameter indicates how many bytes were
+   successfully received by the receiver since the last cm_update call,
+   while the nrecd parameter identifies how many bytes were received
+   were lost during the same time period.  The rtt value indicates the
+   round-trip time measured during the transmission of these bytes.  The
+   rtt value must be set to -1 if no valid round-trip sample was
+   obtained by the application.  The lossmode parameter provides an
+   indicator of how a loss was detected.  A value of CM_NO_FEEDBACK
+   indicates that the application has received no feedback for all its
+   outstanding data, and is reporting this to the CM.  For example, a
+   TCP that has experienced a timeout would use this parameter to inform
+   the CM of this.  A value of CM_LOSS_FEEDBACK indicates that the
+   application has experienced some loss, which it believes to be due to
+   congestion, but not all outstanding data has been lost.  For example,
+   a TCP segment loss detected using duplicate (selective)
+   acknowledgments or other data-driven techniques fits this category.
+   A value of CM_EXPLICIT_CONGESTION indicates that the receiver echoed
+   an explicit congestion notification message.  Finally, a value of
+   CM_NO_CONGESTION indicates that no congestion-related loss has
+   occurred.  The lossmode parameter MUST be reported as a bit-vector
+   where the bits correspond to CM_NO_FEEDBACK, CM_LOSS_FEEDBACK,
+   CM_EXPLICIT_CONGESTION, and CM_NO_CONGESTION.  Note that over links
+   (paths) that experience losses for reasons other than congestion, an
+   application SHOULD inform the CM of losses, with the CM_NO_CONGESTION
+   field set.
+
+   cm_notify(i32 cm_streamid, u32 nsent) MUST be called when data is
+   transmitted from the host (e.g., in the IP output routine) to inform
+   the CM that nsent bytes were just transmitted on a given stream.
+   This allows the CM to update its estimate of the number of
+   outstanding bytes for the macroflow and for the stream.
+
+   A cmapp_send() grant from the CM to an application is valid only for
+   an expiration time, equal to the larger of the round-trip time and an
+   implementation-dependent threshold communicated as an argument to the
+   cmapp_send() callback function.  The application MUST NOT send data
+   based on this callback after this time has expired.  Furthermore, if
+   the application decides not to send data after receiving this
+   callback, it SHOULD call cm_notify(stream_info, 0) to allow the CM to
+   permit other streams in the macroflow to transmit data.  The CM
+   congestion controller MUST be robust to applications forgetting to
+   invoke cm_notify(stream_info, 0) correctly, or applications that
+   crash or disappear after having made a cm_request() call.
+
+
+
+
+
+
+Balakrishnan, et. al.       Standards Track                    [Page 10]
+
+RFC 3124                 The Congestion Manager                June 2001
+
+
+3.4 Querying
+
+   If applications wish to learn about per-stream available bandwidth
+   and round-trip time, they can use the CM's cm_query(i32 cm_streamid,
+   i64* rate, i32* srtt, i32* rttdev) call, which fills in the desired
+   quantities.  If the CM does not have valid estimates for the
+   macroflow, it fills in negative values for the rate, srtt, and
+   rttdev.
+
+3.5 Sharing granularity
+
+   One of the decisions the CM needs to make is the granularity at which
+   a macroflow is constructed, by deciding which streams belong to the
+   same macroflow and share congestion information.  The API provides
+   two functions that allow applications to decide which of their
+   streams ought to belong to the same macroflow.
+
+   cm_getmacroflow(i32 cm_streamid) returns a unique i32 macroflow
+   identifier.  cm_setmacroflow(i32 cm_macroflowid, i32 cm_streamid)
+   sets the macroflow of the stream cm_streamid to cm_macroflowid.  If
+   the cm_macroflowid that is passed to cm_setmacroflow() is -1, then a
+   new macroflow is constructed and this is returned to the caller.
+   Each call to cm_setmacroflow() overrides the previous macroflow
+   association for the stream, should one exist.
+
+   The default suggested aggregation method is to aggregate by
+   destination IP address; i.e., all streams to the same destination
+   address are aggregated to a single macroflow by default.  The
+   cm_getmacroflow() and cm_setmacroflow() calls can then be used to
+   change this as needed.  We do note that there are some cases where
+   this may not be optimal, even over best-effort networks.  For
+   example, when a group of receivers are behind a NAT device, the
+   sender will see them all as one address.  If the hosts behind the NAT
+   are in fact connected over different bottleneck links, some of those
+   hosts could see worse performance than before.  It is possible to
+   detect such hosts when using delay and loss estimates, although the
+   specific mechanisms for doing so are beyond the scope of this
+   document.
+
+   The objective of this interface is to set up sharing of groups not
+   sharing policy of relative weights of streams in a macroflow.  The
+   latter requires the scheduler to provide an interface to set sharing
+   policy.  However, because we want to support many different
+   schedulers (each of which may need different information to set
+   policy), we do not specify a complete API to the scheduler (but see
+
+
+
+
+
+
+Balakrishnan, et. al.       Standards Track                    [Page 11]
+
+RFC 3124                 The Congestion Manager                June 2001
+
+
+   Section 5.2).  A later guideline document is expected to describe a
+   few simple schedulers (e.g., weighted round-robin, hierarchical
+   scheduling) and the API they export to provide relative
+   prioritization.
+
+4. CM internals
+
+   This section describes the internal components of the CM.  It
+   includes a Congestion Controller and a Scheduler, with well-defined,
+   abstract interfaces exported by them.
+
+4.1 Congestion controller
+
+   Associated with each macroflow is a congestion control algorithm; the
+   collection of all these algorithms comprises the congestion
+   controller of the CM.  The control algorithm decides when and how
+   much data can be transmitted by a macroflow.  It uses application
+   notifications (Section 4.3) from concurrent streams on the same
+   macroflow to build up information about the congestion state of the
+   network path used by the macroflow.
+
+   The congestion controller MUST implement a "TCP-friendly" [Mahdavi98]
+   congestion control algorithm.  Several macroflows MAY (and indeed,
+   often will) use the same congestion control algorithm but each
+   macroflow maintains state about the network used by its streams.
+
+   The congestion control module MUST implement the following abstract
+   interfaces.  We emphasize that these are not directly visible to
+   applications; they are within the context of a macroflow, and are
+   different from the CM API functions of Section 4.
+
+   - void query(u64 *rate, u32 *srtt, u32 *rttdev): This function
+     returns the estimated rate (in bits per second) and smoothed
+     round trip time (in microseconds) for the macroflow.
+
+   - void notify(u32 nsent): This function MUST be used to notify the
+     congestion control module whenever data is sent by an
+     application.  The nsent parameter indicates the number of bytes
+     just sent by the application.
+
+   - void update(u32 nsent, u32 nrecd, u32 rtt, u32 lossmode): This
+     function is called whenever any of the CM streams associated with
+     a macroflow identifies that data has reached the receiver or has
+     been lost en route.  The nrecd parameter indicates the number of
+     bytes that have just arrived at the receiver.  The nsent
+     parameter is the sum of the number of bytes just received and the
+
+
+
+
+
+Balakrishnan, et. al.       Standards Track                    [Page 12]
+
+RFC 3124                 The Congestion Manager                June 2001
+
+
+     number of bytes identified as lost en route.  The rtt parameter is
+     the estimated round trip time in microseconds during the
+     transfer.  The lossmode parameter provides an indicator of how a
+     loss was detected (section 4.3).
+
+   Although these interfaces are not visible to applications, the
+   congestion controller MUST implement these abstract interfaces to
+   provide for modular inter-operability with different separately-
+   developed schedulers.
+
+   The congestion control module MUST also call the associated
+   scheduler's schedule function (section 5.2) when it believes that the
+   current congestion state allows an MTU-sized packet to be sent.
+
+4.2 Scheduler
+
+   While it is the responsibility of the congestion control module to
+   determine when and how much data can be transmitted, it is the
+   responsibility of a macroflow's scheduler module to determine which
+   of the streams should get the opportunity to transmit data.
+
+   The Scheduler MUST implement the following interfaces:
+
+   - void schedule(u32 num_bytes): When the congestion control module
+     determines that data can be sent, the schedule() routine MUST be
+     called with no more than the number of bytes that can be sent.
+     In turn, the scheduler MAY call the cmapp_send() function that CM
+     applications must provide.
+
+   - float query_share(i32 cm_streamid): This call returns the
+     described stream's share of the total bandwidth available to the
+     macroflow.  This call combined with the query call of the
+     congestion controller provides the information to satisfy an
+     application's cm_query() request.
+
+   - void notify(i32 cm_streamid, u32 nsent): This interface is used
+     to notify the scheduler module whenever data is sent by a CM
+     application.  The nsent parameter indicates the number of bytes
+     just sent by the application.
+
+     The Scheduler MAY implement many additional interfaces.  As
+     experience with CM schedulers increases, future documents may
+     make additions and/or changes to some parts of the scheduler
+     API.
+
+
+
+
+
+
+
+Balakrishnan, et. al.       Standards Track                    [Page 13]
+
+RFC 3124                 The Congestion Manager                June 2001
+
+
+5. Examples
+
+5.1 Example applications
+
+   This section describes three possible uses of the CM API by
+   applications.  We describe two asynchronous applications---an
+   implementation of a TCP sender and an implementation of congestion-
+   controlled UDP sockets, and a synchronous application---a streaming
+   audio server.  More details of these applications and CM
+   implementation optimizations for efficient operation are described in
+   [Andersen00].
+
+   All applications that use the CM MUST incorporate feedback from the
+   receiver.  For example, it must periodically (typically once or twice
+   per round trip time) determine how many of its packets arrived at the
+   receiver.  When the source gets this feedback, it MUST use
+   cm_update() to inform the CM of this new information.  This results
+   in the CM updating ownd and may result in the CM changing its
+   estimates and calling cmapp_update() of the streams of the macroflow.
+
+   The protocols in this section are examples and suggestions for
+   implementation, rather than requirements for any conformant
+   implementation.
+
+5.1.1 TCP
+
+   A TCP implementation that uses CM should use the cmapp_send()
+   callback API.  TCP only identifies which data it should send upon the
+   arrival of an acknowledgement or expiration of a timer.  As a result,
+   it requires tight control over when and if new data or
+   retransmissions are sent.
+
+   When TCP either connects to or accepts a connection from another
+   host, it performs a cm_open() call to associate the TCP connection
+   with a cm_streamid.
+
+   Once a connection is established, the CM is used to control the
+   transmission of outgoing data.  The CM eliminates the need for
+   tracking and reacting to congestion in TCP, because the CM and its
+   transmission API ensure proper congestion behavior.  Loss recovery is
+   still performed by TCP based on fast retransmissions and recovery as
+   well as timeouts.  In addition, TCP is also modified to have its own
+   outstanding window (tcp_ownd) estimate.  Whenever data segments are
+   sent from its cmapp_send() callback, TCP updates its tcp_ownd value.
+   The ownd variable is also updated after each cm_update() call.  TCP
+   also maintains a count of the number of outstanding segments
+   (pkt_cnt).  At any time, TCP can calculate the average packet size
+   (avg_pkt_size) as tcp_ownd/pkt_cnt.  The avg_pkt_size is used by TCP
+
+
+
+Balakrishnan, et. al.       Standards Track                    [Page 14]
+
+RFC 3124                 The Congestion Manager                June 2001
+
+
+   to help estimate the amount of outstanding data.  Note that this is
+   not needed if the SACK option is used on the connection, since this
+   information is explicitly available.
+
+   The TCP output routines are modified as follows:
+
+      1. All congestion window (cwnd) checks are removed.
+
+      2. When application data is available.  The TCP output routines
+      perform all non-congestion checks (Nagle algorithm, receiver-
+      advertised window check, etc).  If these checks pass, the output
+      routine queues the data and calls cm_request() for the stream.
+
+      3. If incoming data or timers result in a loss being detected, the
+      retransmission is also placed in a queue and cm_request() is
+      called for the stream.
+
+      4. The cmapp_send() callback for TCP is set to an output routine.
+      If any retransmission is enqueued, the routine outputs the
+      retransmission.  Otherwise, the routine outputs as much new data
+      as the TCP connection state allows.  However, the cmapp_send()
+      never sends more than a single segment per call.  This routine
+      arranges for the other output computations to be done, such as
+      header and options computations.
+
+   The IP output routine on the host calls cm_notify() when the packets
+   are actually sent out.  Because it does not know which cm_streamid is
+   responsible for the packet, cm_notify() takes the stream_info as
+   argument (see Section 4 for what the stream_info should contain).
+   Because cm_notify() reports the IP payload size, TCP keeps track of
+   the total header size and incorporates these updates.
+
+   The TCP input routines are modified as follows:
+
+      1. RTT estimation is done as normal using either timestamps or
+      Karn's algorithm.  Any rtt estimate that is generated is passed to
+      CM via the cm_update call.
+
+      2. All cwnd and slow start threshold (ssthresh) updates are
+      removed.
+
+      3. Upon the arrival of an ack for new data, TCP computes the value
+      of in_flight (the amount of data in flight) as snd_max-ack-1
+      (i.e., MAX Sequence Sent - Current Ack - 1).  TCP then calls
+      cm_update(streamid, tcp_ownd - in_flight, 0, CM_NO_CONGESTION,
+      rtt).
+
+
+
+
+
+Balakrishnan, et. al.       Standards Track                    [Page 15]
+
+RFC 3124                 The Congestion Manager                June 2001
+
+
+      4. Upon the arrival of a duplicate acknowledgement, TCP must check
+      its dupack count (dup_acks) to determine its action.  If dup_acks
+      < 3, the TCP does nothing.  If dup_acks == 3, TCP assumes that a
+      packet was lost and that at least 3 packets arrived to generate
+      these duplicate acks.  Therefore, it calls cm_update(streamid, 4 *
+      avg_pkt_size, 3 * avg_pkt_size, CM_LOSS_FEEDBACK, rtt).  The
+      average packet size is used since the acknowledgments do not
+      indicate exactly how much data has reached the other end.  Most
+      TCP implementations interpret a duplicate ACK as an indication
+      that a full MSS has reached its destination.  Once a new ACK is
+      received, these TCP sender implementations may resynchronize with
+      TCP receiver.  The CM API does not provide a mechanism for TCP to
+      pass information from this resynchronization.  Therefore, TCP can
+      only infer the arrival of an avg_pkt_size amount of data from each
+      duplicate ack.  TCP also enqueues a retransmission of the lost
+      segment and calls cm_request().  If dup_acks > 3, TCP assumes that
+      a packet has reached the other end and caused this ack to be sent.
+      As a result, it calls cm_update(streamid, avg_pkt_size,
+      avg_pkt_size, CM_NO_CONGESTION, rtt).
+
+      5. Upon the arrival of a partial acknowledgment (one that does not
+      exceed the highest segment transmitted at the time the loss
+      occurred, as defined in [Floyd99b]), TCP assumes that a packet was
+      lost and that the retransmitted packet has reached the recipient.
+      Therefore, it calls cm_update(streamid, 2 * avg_pkt_size,
+      avg_pkt_size, CM_NO_CONGESTION, rtt).  CM_NO_CONGESTION is used
+      since the loss period has already been reported.  TCP also
+      enqueues a retransmission of the lost segment and calls
+      cm_request().
+
+   When the TCP retransmission timer expires, the sender identifies that
+   a segment has been lost and calls cm_update(streamid, avg_pkt_size,
+   0, CM_NO_FEEDBACK, 0) to signify that no feedback has been received
+   from the receiver and that one segment is sure to have "left the
+   pipe."  TCP also enqueues a retransmission of the lost segment and
+   calls cm_request().
+
+5.1.2 Congestion-controlled UDP
+
+   Congestion-controlled UDP is a useful CM application, which we
+   describe in the context of Berkeley sockets [Stevens94].  They
+   provide the same functionality as standard Berkeley UDP sockets, but
+   instead of immediately sending the data from the kernel packet queue
+   to lower layers for transmission, the buffered socket implementation
+   makes calls to the API exported by the CM inside the kernel and gets
+   callbacks from the CM.  When a CM UDP socket is created, it is bound
+   to a particular stream.  Later, when data is added to the packet
+   queue, cm_request() is called on the stream associated with the
+
+
+
+Balakrishnan, et. al.       Standards Track                    [Page 16]
+
+RFC 3124                 The Congestion Manager                June 2001
+
+
+   socket.  When the CM schedules this stream for transmission, it calls
+   udp_ccappsend() in the UDP module.  This function transmits one MTU
+   from the packet queue, and schedules the transmission of any
+   remaining packets.  The in-kernel implementation of the CM UDP API
+   should not require any additional data copies and should support all
+   standard UDP options.  Modifying existing applications to use
+   congestion-controlled UDP requires the implementation of a new socket
+   option on the socket.  To work correctly, the sender must obtain
+   feedback about congestion.  This can be done in at least two ways:
+   (i) the UDP receiver application can provide feedback to the sender
+   application, which will inform the CM of network conditions using
+   cm_update(); (ii) the UDP receiver implementation can provide
+   feedback to the sending UDP.  Note that this latter alternative
+   requires changes to the receiver's network stack and the sender UDP
+   cannot assume that all receivers support this option without explicit
+   negotiation.
+
+5.1.3 Audio server
+
+   A typical audio application often has access to the sample in a
+   multitude of data rates and qualities.  The objective of the
+   application is then to deliver the highest possible quality of audio
+   (typically the highest data rate) its clients.  The selection of
+   which version of audio to transmit should be based on the current
+   congestion state of the network.  In addition, the source will want
+   audio delivered to its users at a consistent sampling rate.  As a
+   result, it must send data a regular rate, minimizing delaying
+   transmissions and reducing buffering before playback.  To meet these
+   requirements, this application can use the synchronous sender API
+   (Section 4.2).
+
+   When the source first starts, it uses the cm_query() call to get an
+   initial estimate of network bandwidth and delay.  If some other
+   streams on that macroflow have already been active, then it gets an
+   initial estimate that is valid; otherwise, it gets negative values,
+   which it ignores.  It then chooses an encoding that does not exceed
+   these estimates (or, in the case of an invalid estimate, uses
+   application-specific initial values) and begins transmitting data.
+   The application also implements the cmapp_update() callback.  When
+   the CM determines that network characteristics have changed, it calls
+   the application's cmapp_update() function and passes it a new rate
+   and round-trip time estimate.  The application must change its choice
+   of audio encoding to ensure that it does not exceed these new
+   estimates.
+
+
+
+
+
+
+
+Balakrishnan, et. al.       Standards Track                    [Page 17]
+
+RFC 3124                 The Congestion Manager                June 2001
+
+
+5.2 Example congestion control module
+
+   To illustrate the responsibilities of a congestion control module,
+   the following describes some of the actions of a simple TCP-like
+   congestion control module that implements Additive Increase
+   Multiplicative Decrease congestion control (AIMD_CC):
+
+   - query(): AIMD_CC returns the current congestion window (cwnd)
+     divided by the smoothed rtt (srtt) as its bandwidth estimate.  It
+     returns the smoothed rtt estimate as srtt.
+
+   - notify(): AIMD_CC adds the number of bytes sent to its
+     outstanding data window (ownd).
+
+   - update(): AIMD_CC subtracts nsent from ownd.  If the value of rtt
+     is non-zero, AIMD_CC updates srtt using the TCP srtt calculation.
+     If the update indicates that data has been lost, AIMD_CC sets
+     cwnd to 1 MTU if the loss_mode is CM_NO_FEEDBACK and to cwnd/2
+     (with a minimum of 1 MTU) if the loss_mode is CM_LOSS_FEEDBACK or
+     CM_EXPLICIT_CONGESTION.  AIMD_CC also sets its internal ssthresh
+     variable to cwnd/2.  If no loss had occurred, AIMD_CC mimics TCP
+     slow start and linear growth modes.  It increments cwnd by nsent
+     when cwnd < ssthresh (bounded by a maximum of ssthresh-cwnd) and
+     by nsent * MTU/cwnd when cwnd > ssthresh.
+
+   - When cwnd or ownd are updated and indicate that at least one MTU
+     may be transmitted, AIMD_CC calls the CM to schedule a
+     transmission.
+
+5.3 Example Scheduler Module
+
+   To clarify the responsibilities of a scheduler module, the following
+   describes some of the actions of a simple round robin scheduler
+   module (RR_sched):
+
+   - schedule(): RR_sched schedules as many streams as possible in round
+     robin fashion.
+
+   - query_share(): RR_sched returns 1/(number of streams in macroflow).
+
+   - notify(): RR_sched does nothing.  Round robin scheduling is not
+     affected by the amount of data sent.
+
+6. Security Considerations
+
+   The CM provides many of the same services that the congestion control
+   in TCP provides.  As such, it is vulnerable to many of the same
+   security problems.  For example, incorrect reports of losses and
+
+
+
+Balakrishnan, et. al.       Standards Track                    [Page 18]
+
+RFC 3124                 The Congestion Manager                June 2001
+
+
+   transmissions will give the CM an inaccurate picture of the network's
+   congestion state.  By giving CM a high estimate of congestion, an
+   attacker can degrade the performance observed by applications.  For
+   example, a stream on a host can arbitrarily slow down any other
+   stream on the same macroflow, a form of denial of service.
+
+   The more dangerous form of attack occurs when an application gives
+   the CM a low estimate of congestion.  This would cause CM to be
+   overly aggressive and allow data to be sent much more quickly than
+   sound congestion control policies would allow.
+
+   [Touch97] describes a number of the security problems that arise with
+   congestion information sharing.  An additional vulnerability (not
+   covered by [Touch97])) occurs because applications have access
+   through the CM API to control shared state that will affect other
+   applications on the same computer.  For instance, a poorly designed,
+   possibly a compromised, or intentionally malicious UDP application
+   could misuse cm_update() to cause starvation and/or too-aggressive
+   behavior of others in the macroflow.
+
+7. References
+
+   [Allman99]        Allman, M. and Paxson, V., "TCP Congestion
+                     Control", RFC 2581, April 1999.
+
+   [Andersen00]      Balakrishnan, H., System Support for Bandwidth
+                     Management and Content Adaptation in Internet
+                     Applications, Proc. 4th Symp. on Operating Systems
+                     Design and Implementation, San Diego, CA, October
+                     2000.  Available from
+                     http://nms.lcs.mit.edu/papers/cm-osdi2000.html
+
+   [Balakrishnan98]  Balakrishnan, H., Padmanabhan, V., Seshan, S.,
+                     Stemm, M., and Katz, R., "TCP Behavior of a Busy
+                     Web Server:  Analysis and Improvements," Proc. IEEE
+                     INFOCOM, San Francisco, CA, March 1998.
+
+   [Balakrishnan99]  Balakrishnan, H., Rahul, H., and Seshan, S., "An
+                     Integrated Congestion Management Architecture for
+                     Internet Hosts," Proc. ACM SIGCOMM, Cambridge, MA,
+                     September 1999.
+
+   [Bradner96]       Bradner, S., "The Internet Standards Process ---
+                     Revision 3", BCP 9, RFC 2026, October 1996.
+
+   [Bradner97]       Bradner, S., "Key words for use in RFCs to Indicate
+                     Requirement Levels", BCP 14, RFC 2119, March 1997.
+
+
+
+
+Balakrishnan, et. al.       Standards Track                    [Page 19]
+
+RFC 3124                 The Congestion Manager                June 2001
+
+
+   [Clark90]         Clark, D. and Tennenhouse, D., "Architectural
+                     Consideration for a New Generation of Protocols",
+                     Proc. ACM SIGCOMM, Philadelphia, PA, September
+                     1990.
+
+   [Eggert00]        Eggert, L., Heidemann, J., and Touch, J., "Effects
+                     of Ensemble TCP," ACM Computer Comm. Review,
+                     January 2000.
+
+   [Floyd99a]        Floyd, S. and Fall, K.," Promoting the Use of End-
+                     to-End Congestion Control in the Internet,"
+                     IEEE/ACM Trans. on Networking, 7(4), August 1999,
+                     pp. 458-472.
+
+   [Floyd99b]        Floyd, S. and T. Henderson,"The New Reno
+                     Modification to TCP's Fast Recovery Algorithm," RFC
+                     2582, April 1999.
+
+   [Jacobson88]      Jacobson, V., "Congestion Avoidance and Control,"
+                     Proc. ACM SIGCOMM, Stanford, CA, August 1988.
+
+   [Mahdavi98]       Mahdavi, J. and Floyd, S., "The TCP Friendly
+                     Website,"
+                     http://www.psc.edu/networking/tcp_friendly.html
+
+   [Mogul90]         Mogul, J. and S. Deering, "Path MTU Discovery," RFC
+                     1191, November 1990.
+
+   [Padmanabhan98]   Padmanabhan, V., "Addressing the Challenges of Web
+                     Data Transport," PhD thesis, Univ. of California,
+                     Berkeley, December 1998.
+
+   [Paxson00]        Paxson, V. and M. Allman, "Computing TCP's
+                     Retransmission Timer", RFC 2988, November 2000.
+
+   [Postel81]        Postel, J., Editor, "Transmission Control
+                     Protocol", STD 7, RFC 793, September 1981.
+
+   [Ramakrishnan99]  Ramakrishnan, K. and Floyd, S., "A Proposal to Add
+                     Explicit Congestion Notification (ECN) to IP," RFC
+                     2481, January 1999.
+
+
+   [Stevens94]       Stevens, W., TCP/IP Illustrated, Volume 1.
+                     Addison-Wesley, Reading, MA, 1994.
+
+   [Touch97]         Touch, J., "TCP Control Block Interdependence", RFC
+                     2140, April 1997.
+
+
+
+Balakrishnan, et. al.       Standards Track                    [Page 20]
+
+RFC 3124                 The Congestion Manager                June 2001
+
+
+8. Acknowledgments
+
+   We thank David Andersen, Deepak Bansal, and Dorothy Curtis for their
+   work on the CM design and implementation.  We thank Vern Paxson for
+   his detailed comments, feedback, and patience, and Sally Floyd, Mark
+   Handley, and Steven McCanne for useful feedback on the CM
+   architecture.  Allison Mankin and Joe Touch provided several useful
+   comments on previous drafts of this document.
+
+9. Authors' Addresses
+
+   Hari Balakrishnan
+   Laboratory for Computer Science
+   200 Technology Square
+   Massachusetts Institute of Technology
+   Cambridge, MA 02139
+
+   EMail: hari@lcs.mit.edu
+   Web: http://nms.lcs.mit.edu/~hari/
+
+
+   Srinivasan Seshan
+   School of Computer Science
+   Carnegie Mellon University
+   5000 Forbes Ave.
+   Pittsburgh, PA 15213
+
+   EMail: srini@cmu.edu
+   Web: http://www.cs.cmu.edu/~srini/
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Balakrishnan, et. al.       Standards Track                    [Page 21]
+
+RFC 3124                 The Congestion Manager                June 2001
+
+
+Full Copyright Statement
+
+   Copyright (C) The Internet Society (2001).  All Rights Reserved.
+
+   This document and translations of it may be copied and furnished to
+   others, and derivative works that comment on or otherwise explain it
+   or assist in its implementation may be prepared, copied, published
+   and distributed, in whole or in part, without restriction of any
+   kind, provided that the above copyright notice and this paragraph are
+   included on all such copies and derivative works.  However, this
+   document itself may not be modified in any way, such as by removing
+   the copyright notice or references to the Internet Society or other
+   Internet organizations, except as needed for the purpose of
+   developing Internet standards in which case the procedures for
+   copyrights defined in the Internet Standards process must be
+   followed, or as required to translate it into languages other than
+   English.
+
+   The limited permissions granted above are perpetual and will not be
+   revoked by the Internet Society or its successors or assigns.
+
+   This document and the information contained herein is provided on an
+   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Acknowledgement
+
+   Funding for the RFC Editor function is currently provided by the
+   Internet Society.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Balakrishnan, et. al.       Standards Track                    [Page 22]
+
author	Thomas Voss <mail@thomasvoss.com>	2024-11-27 20:54:24 +0100
committer	Thomas Voss <mail@thomasvoss.com>	2024-11-27 20:54:24 +0100
commit	4bfd864f10b68b71482b35c818559068ef8d5797 (patch)
tree	e3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc3124.txt
parent	ea76e11061bda059ae9f9ad130a9895cc85607db (diff)