doc: Add RFC documents

author: Thomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
committer: Thomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
commit: 4bfd864f10b68b71482b35c818559068ef8d5797 (patch)
tree: e3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc8316.txt
parent: ea76e11061bda059ae9f9ad130a9895cc85607db (diff)
1 files changed, 899 insertions, 0 deletions
diff --git a/doc/rfc/rfc8316.txt b/doc/rfc/rfc8316.txt
new file mode 100644
index 0000000..a9d45b4
--- /dev/null
+++ b/doc/rfc/rfc8316.txt
@@ -0,0 +1,899 @@
+
+
+
+
+
+
+Internet Research Task Force (IRTF)                             J. Nobre
+Request for Comments: 8316           University of Vale do Rio dos Sinos
+Category: Informational                                     L. Granville
+ISSN: 2070-1721                  Federal University of Rio Grande do Sul
+                                                                A. Clemm
+                                                                  Huawei
+                                                      A. Gonzalez Prieto
+                                                                  VMware
+                                                           February 2018
+
+
+       Autonomic Networking Use Case for Distributed Detection of
+                Service Level Agreement (SLA) Violations
+
+Abstract
+
+   This document describes an experimental use case that employs
+   autonomic networking for the monitoring of Service Level Agreements
+   (SLAs).  The use case is for detecting violations of SLAs in a
+   distributed fashion.  It strives to optimize and dynamically adapt
+   the autonomic deployment of active measurement probes in a way that
+   maximizes the likelihood of detecting service-level violations with a
+   given resource budget to perform active measurements.  This
+   optimization and adaptation should be done without any outside
+   guidance or intervention.
+
+   This document is a product of the IRTF Network Management Research
+   Group (NMRG).  It is published for informational purposes.
+
+Status of This Memo
+
+   This document is not an Internet Standards Track specification; it is
+   published for informational purposes.
+
+   This document is a product of the Internet Research Task Force
+   (IRTF).  The IRTF publishes the results of Internet-related research
+   and development activities.  These results might not be suitable for
+   deployment.  This RFC represents the consensus of the Network
+   Management Research Group of the Internet Research Task Force (IRTF).
+   Documents approved for publication by the IRSG are not candidates for
+   any level of Internet Standard; see Section 2 of RFC 7841.
+
+   Information about the current status of this document, any errata,
+   and how to provide feedback on it may be obtained at
+   https://www.rfc-editor.org/info/rfc8316.
+
+
+
+
+
+
+Nobre, et al.                 Informational                     [Page 1]
+
+RFC 8316         AN Use Case Detection of SLA Violations   February 2018
+
+
+Copyright Notice
+
+   Copyright (c) 2018 IETF Trust and the persons identified as the
+   document authors.  All rights reserved.
+
+   This document is subject to BCP 78 and the IETF Trust's Legal
+   Provisions Relating to IETF Documents
+   (https://trustee.ietf.org/license-info) in effect on the date of
+   publication of this document.  Please review these documents
+   carefully, as they describe your rights and restrictions with respect
+   to this document.
+
+Table of Contents
+
+   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   3
+   2.  Definitions and Acronyms  . . . . . . . . . . . . . . . . . .   5
+   3.  Current Approaches  . . . . . . . . . . . . . . . . . . . . .   6
+   4.  Use Case Description  . . . . . . . . . . . . . . . . . . . .   7
+   5.  A Distributed Autonomic Solution  . . . . . . . . . . . . . .   8
+   6.  Intended User Experience  . . . . . . . . . . . . . . . . . .  10
+   7.  Implementation Considerations . . . . . . . . . . . . . . . .  11
+     7.1.  Device-Based Self-Knowledge and Decisions . . . . . . . .  11
+     7.2.  Interaction with Other Devices  . . . . . . . . . . . . .  11
+   8.  Comparison with Current Solutions . . . . . . . . . . . . . .  12
+   9.  Related IETF Work . . . . . . . . . . . . . . . . . . . . . .  12
+   10. IANA Considerations . . . . . . . . . . . . . . . . . . . . .  13
+   11. Security Considerations . . . . . . . . . . . . . . . . . . .  13
+   12. Informative References  . . . . . . . . . . . . . . . . . . .  13
+   Acknowledgements  . . . . . . . . . . . . . . . . . . . . . . . .  16
+   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  16
+
+1.  Introduction
+
+   The Internet has been growing dramatically in terms of size,
+   capacity, and accessibility in recent years.  Communication
+   requirements of distributed services and applications running on top
+   of the Internet have become increasingly demanding.  Some examples
+   are real-time interactive video or financial trading.  Providing such
+   services involves stringent requirements in terms of acceptable
+   latency, loss, and jitter.
+
+   Performance requirements lead to the articulation of Service Level
+   Objectives (SLOs) that must be met.  Those SLOs are part of Service
+   Level Agreements (SLAs) that define a contract between the provider
+   and the consumer of a service.  SLOs, in effect, constitute a
+   service-level guarantee that the consumer of the service can expect
+   to receive (and often has to pay for).  Likewise, the provider of a
+   service needs to ensure that the service-level guarantee and
+   associated SLOs are met.  Some examples of clauses that relate to
+   SLOs can be found in [RFC7297].
+
+   Violations of SLOs can be associated with significant financial loss,
+   which can be divided into two categories.  First, there is the loss
+   that can be incurred by the user of a service when the agreed service
+   levels are not provided.  For example, a financial brokerage might
+   suffer losses when it is unable to execute stock orders in a timely
+   manner.  An electronic retailer may lose
+   customers when its online presence is perceived by customers as
+   sluggish.  An online gaming provider may not be able to provide fair
+   access to online players, resulting in frustrated players who are
+   lost as customers.  In each case, the failure of a service provider
+   to meet promised service-level guarantees can have a substantial
+   financial impact on users of the service.  Second, there is the loss
+   that is incurred by the provider of a service who is unable to meet
+   promised SLOs.  Those losses can take several forms, such as
+   penalties for violating the service level agreement and even loss of
+   future revenue due to reduced customer satisfaction (which, in many
+   cases, is more serious).  Hence, SLOs are a key concern for the
+   service provider.  In order to ensure that SLOs are not being
+   violated, service levels need to be continuously monitored at the
+   network infrastructure layer in order to know, for example, when
+   mitigating actions need to be taken.  To that end, service-level
+   measurements must take place.
+
+   Network measurements can be performed using active or passive
+   measurement techniques.  In passive measurements, production traffic
+   is observed, and no monitoring traffic is created by the measurement
+   process itself.  That is, network conditions are checked in a
+   non-intrusive way.  In the context of IP Flow Information Export
+   (IPFIX), several documents were produced that define how to export
+   data associated with flow records, i.e., data that is collected as
+   part of passive measurement mechanisms, generally applied against
+   flows of production traffic (e.g., [RFC7011]).  In addition, it is
+   possible to collect real data traffic (not just summarized flow
+   records) with time-stamped packets, possibly sampled (e.g., per
+   [RFC5474]), as a means of measuring and inferring service levels.
+   Active measurements, on the other hand, are more intrusive to the
+   network in the sense that they involve injecting synthetic test
+   traffic into the network to measure network service levels, as
+   opposed to simply observing production traffic.  The IP Performance
+   Metrics (IPPM) Working Group produced documents that describe active
+   measurement mechanisms such as the One-Way Active Measurement
+   Protocol (OWAMP) [RFC4656], the Two-Way Active Measurement Protocol
+   (TWAMP) [RFC5357], and the Cisco Service-Level Assurance Protocol
+   [RFC6812].  In addition, there are some mechanisms that do not
+   cleanly fit into either active or passive categories, such as
+   Performance and Diagnostic Metrics (PDM) Destination Option
+   techniques [RFC8250].
+
+   Active measurement mechanisms offer a high level of control over what
+   and how to measure.  They do not require inspecting production
+   traffic.  Because of this, active measurements usually offer better
+   accuracy and privacy than passive measurement mechanisms.  Traffic
+   encryption and regulations that limit the amount of payload
+   inspection that can occur are non-issues.  Furthermore, active
+   measurement mechanisms are able to detect end-to-end network
+   performance problems in a fine-grained way (e.g., simulating the
+   traffic that must be handled considering specific SLOs).  As a
+   result, active measurements are often preferred over passive
+   measurement for SLA monitoring.  Measurement probes must be hosted in
+   network devices and measurement sessions must be activated to compute
+   the current network metrics (for example, metrics such as the ones
+   described in [RFC4148], although note that [RFC4148] was obsoleted by
+   [RFC6248]).  This activation should be dynamic in order to follow
+   changes in network conditions, such as those related to routes being
+   added or new customer demands.
+
+   While offering many advantages, active measurements are expensive in
+   terms of network resource consumption.  Active measurements generally
+   involve measurement probes that generate synthetic test traffic that
+   is directed at a responder.  The responder needs to timestamp test
+   traffic it receives and reflect it back to the originating
+   measurement probe.  The measurement probe subsequently processes the
+   returned packets along with time-stamping information in order to
+   compute service levels.  Accordingly, active measurements consume
+   substantial CPU cycles as well as memory of network devices to
+   generate and process test traffic.  In addition, synthetic traffic
+   increases network load.  Thus, active measurements compete for
+   resources with other functions, including routing and switching.
+
+   The resources required and traffic generated by the active
+   measurement sessions are, in large part, a function of the number of
+   measured network destinations.  (In addition, the amount of traffic
+   generated for each measurement plays a role that, in turn,
+   influences the accuracy of the measurement.)  When more destinations
+   are measured, more resources are consumed and more traffic is needed
+   to perform the measurements.  Thus, to have better monitoring
+   coverage, it is necessary to deploy more sessions, which
+   consequently increases consumed resources.  Conversely, observing
+   only a small subset of all network flows can lead to insufficient
+   coverage.
+
+   Furthermore, while some end-to-end service levels can be determined
+   by adding up the service levels observed across different path
+   segments, the same is not true for all service levels.  For example,
+   the end-to-end delay or packet loss from a node A to a node C routed
+   via a node B can often be computed simply by adding delays (or loss)
+   from A to B and from B to C.  This allows the decomposition of a
+   large set of end-to-end measurements into a much smaller set of
+   segment measurements.  However, end-to-end jitter and mean opinion
+   scores cannot be decomposed as easily and, for higher accuracy, must
+   be measured end-to-end.
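
The decomposition described above can be illustrated with a minimal sketch.
It assumes that per-segment delays simply add and that packet loss on each
segment is independent, so end-to-end loss is one minus the product of the
per-segment delivery probabilities (for small loss rates this is close to
the plain sum the text mentions).  The function names and sample values are
hypothetical and are not part of the RFC.

```python
from math import prod


def end_to_end_delay(segment_delays_ms):
    """Sum per-segment delays (in ms) along a path."""
    return sum(segment_delays_ms)


def end_to_end_loss(segment_loss_rates):
    """Compose per-segment loss rates, assuming independent loss per segment."""
    return 1.0 - prod(1.0 - p for p in segment_loss_rates)


# Hypothetical path A -> B -> C built from two segment measurements.
delay_ac = end_to_end_delay([12.0, 8.5])
loss_ac = end_to_end_loss([0.01, 0.02])
```

Jitter and mean opinion scores have no comparable composition rule, which
is why the text treats them as requiring end-to-end measurement.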
+
+   Hence, the decision about how to place measurement probes becomes an
+   important management activity.  The goal is to obtain the maximum
+   benefits of service-level monitoring with a limited amount of
+   measurement overhead.  Specifically, the goal is to maximize the
+   number of service-level violations that are detected with a limited
+   amount of resources.
+
+   The use case and the solution approach described in this document
+   address an important practical issue.  They are intended to provide a
+   basis for further experimentation to lead to solutions for wider
+   deployment.  This document represents the consensus of the IRTF's
+   Network Management Research Group (NMRG).  It was discussed
+   extensively and received three separate in-depth reviews.
+
+2.  Definitions and Acronyms
+
+   Active Measurements: Techniques to measure service levels that
+      involve generating and observing synthetic test traffic
+
+   Passive Measurements: Techniques used to measure service levels based
+      on observation of production traffic
+
+   Autonomic Network: A network containing exclusively autonomic nodes,
+      requiring no configuration, and deriving all required information
+      through self-knowledge, discovery, or intent
+
+   Autonomic Service Agent (ASA): An agent implemented on an autonomic
+      node that implements an autonomic function, either in part (in the
+      case of a distributed function, as in the context of this
+      document) or whole
+
+   Measurement Session: A communications association between a probe and
+      a responder used to send and reflect synthetic test traffic for
+      active measurements
+
+   Probe: The source of synthetic test traffic in an active measurement
+
+   Responder: The destination for synthetic test traffic in an active
+      measurement
+
+   SLA: Service Level Agreement
+
+   SLO: Service Level Objective
+
+   P2P: Peer-to-Peer
+
+   (Note: The definitions for "Autonomic Network" and "Autonomic Service
+   Agent" are borrowed from [RFC7575].)
+
+3.  Current Approaches
+
+   To distribute the available measurement sessions across the network
+   in feasible deployments of active measurement solutions, the
+   current best practice relies entirely on the human administrator's
+   expertise to infer the best locations to activate such sessions.
+   This is done through several steps.  First, it is
+   necessary to collect traffic information in order to grasp the
+   traffic matrix.  Then, the administrator uses this information to
+   infer the best destinations for measurement sessions.  After that,
+   the administrator activates sessions on the chosen subset of
+   destinations, taking the available resources into account.  This
+   practice, however, does not scale well because it is still labor-
+   intensive and error-prone for the administrator to determine which
+   sessions should be activated given the set of critical flows that
+   needs to be measured.  Even worse, this practice completely fails in
+   networks where the most critical flows change rapidly, resulting in
+   dynamic changes to what would be the most important destinations.
+   For example, this can be the case in modern cloud environments.  This
+   is because fast reactions are necessary to reconfigure the sessions,
+   and administrators are just not quick enough in computing and
+
+   activating the new set of required sessions every time the network
+   traffic pattern changes.  Finally, the current practice for active
+   measurements usually covers only a fraction of the network flows that
+   should be observed, which invariably leads to the damaging
+   consequence of undetected SLA violations.
+
+4.  Use Case Description
+
+   The use case involves a service-level provider that needs to monitor
+   the network to detect service-level violations using active service-
+   level measurements and wants to be able to do so with minimal human
+   intervention.  The goal is to conduct the measurements in an
+   effective manner to maximize the percentage of detected service-level
+   violations.  The service-level provider has a bounded resource budget
+   with regard to measurements that can be performed, specifically the
+   number of measurements that can be conducted concurrently from any
+   one network device and possibly the total amount of measurement
+   traffic on the network.  However, while at any one point in time the
+   number of measurements conducted is limited, it is possible for a
+   device to change which destinations to measure over time.  This can
+   be exploited to achieve a balance of eventually covering all possible
+   destinations using a reasonable amount of "sampling" where
+   measurement coverage of a destination cannot be continuous.  The
+   solution needs to be dynamic and able to cope with network conditions
+   that may change over time.  The solution should also be embeddable
+   inside network devices that control the deployment of active
+   measurement mechanisms.
+
+   The goal is to conduct the measurements in a smart manner that
+   ensures that the network is broadly covered and that the likelihood
+   of detecting service-level violations is maximized.  In order to
+   maximize that likelihood, it is reasonable to focus measurement
+   resources on destinations that are more likely to incur a violation,
+   while spending fewer resources on destinations that are more likely
+   to be in compliance.  In order to do this, there are various aspects
+   that can be exploited, including past measurements (destinations
+   close to a service-level threshold requiring more focus than
+   destinations farther from it), complementation with passive
+   measurements such as flow data (to identify network destinations that
+   are currently popular and critical), and observations from other
+   parts of the network.  In addition, measurements can be coordinated
+   among different network devices to avoid hitting the same destination
+   at the same time and to share results that may be useful in future
+   probe placement.
+
+   Clearly, static solutions will have severe limitations.  At the same
+   time, human administrators cannot be in the loop for continuous
+   dynamic reconfigurations of measurement probes.  Thus, an automated
+   solution, or ideally an autonomic solution, is needed so that network
+   measurements are automatically orchestrated and dynamically
+   reconfigured from within the network.  This can be accomplished using
+   an autonomic solution that is distributed, using ASAs that are
+   implemented on nodes in the network.
+
+5.  A Distributed Autonomic Solution
+
+   The use of Autonomic Networking (AN) [RFC7575] can help such
+   detection through an efficient activation of measurement sessions.
+   Such an approach, along with a detailed assessment confirming its
+   viability, is described in [P2PBNM-Nobre-2012].  The problem to be
+   solved by AN in the present use case is how to steer the process of
+   measurement session activation by a complete solution that sets all
+   necessary parameters for this activation to operate efficiently,
+   reliably, and securely, with no required human intervention other
+   than setting overall policy.
+
+   When a node first comes online, it has no information about which
+   measurements are more critical than others.  In the absence of
+   information about past measurements and information from measurement
+   peers, it may start with an initial set of measurement sessions,
+   possibly randomly seeding a set of starter measurements and perhaps
+   taking a round-robin approach for subsequent measurement rounds.
+   However, as measurements are collected, a node will gain an
+   increasing amount of information that it can utilize to refine its
+   strategy of selecting measurement targets going forward.  For one, it
+   may take note of which targets returned measurement results very
+   close to service-level thresholds; these targets may require closer
+   scrutiny compared to others.  Second, it may utilize observations
+   that are made by its measurement peers in order to conclude which
+   measurement targets may be more critical than others and to ensure
+   that proper overall measurement coverage is obtained (so that not
+   every node incidentally measures the same targets, while other
+   targets are not measured at all).
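
One possible reading of the selection strategy above is sketched below.  It
is an illustration only, not the method of [P2PBNM-Nobre-2012].  It assumes
that a larger measured value is worse (for example, delay), that thresholds
are positive, and that peers report which targets they already cover.  The
function name, the parameters, and the `hot_margin` cutoff are all
hypothetical.

```python
import random


def select_targets(targets, budget, history, thresholds, peer_covered,
                   hot_margin=0.2, rng=None):
    """Pick up to `budget` targets for the next measurement round.

    history:      target -> last measured value; empty on a fresh node
    thresholds:   target -> service-level threshold (must be > 0)
    peer_covered: targets that peers report they are already measuring
    """
    rng = rng or random.Random()
    # Skip targets that peers already measure, unless that leaves nothing.
    candidates = [t for t in targets if t not in peer_covered] or list(targets)

    def margin(t):
        # Relative headroom left before the threshold is crossed.
        return (thresholds[t] - history[t]) / thresholds[t]

    unseen = [t for t in candidates if t not in history]
    rng.shuffle(unseen)  # random seeding while nothing is known yet
    seen = sorted((t for t in candidates if t in history), key=margin)
    hot = [t for t in seen if margin(t) < hot_margin]
    rest = [t for t in seen if margin(t) >= hot_margin]
    # Near-threshold targets first, then unexplored ones, then the rest.
    return (hot + unseen + rest)[:budget]


picked = select_targets(
    targets=["a", "b", "c", "d"],
    budget=2,
    history={"a": 95.0, "b": 40.0},
    thresholds={"a": 100.0, "b": 100.0, "c": 100.0, "d": 100.0},
    peer_covered={"c"},
)
```

With an empty `history`, the function falls back to a random starter set,
matching the cold-start case; as results accumulate, targets close to their
thresholds move to the front of the list.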
+
+   We advocate for embedding P2P technology in network devices in order
+   to use autonomic control loops to make decisions about measurement
+   sessions.
+
+   Specifically, we advocate for network devices to implement an
+   autonomic function that monitors service levels for violations of
+   SLOs and that determines which measurement sessions to set up at any
+   given point in time based on current and past observations of the
+   node and of other peer nodes.
+
+   When these functions are performed locally and autonomically on the
+   device itself, the choice of which measurements to conduct can be
+   modified quickly based on local observations while taking local
+   resource availability into account.  This allows a solution to be
+   more robust and react more dynamically to rapidly changing service
+   levels than a solution that has to rely on central coordination.
+   However, in order to optimize decisions about which measurements to
+   conduct, a node will need to communicate with other nodes.  This
+   allows a node to take into account other nodes' observations in
+   addition to its own in its decisions.
+
+   For example, remote destinations whose observed service levels are on
+   the verge of violating stated objectives may require closer
+   monitoring than remote destinations that are comfortably within a
+   range of tolerance.  A distributed autonomic solution also allows
+   nodes to coordinate their probing decisions to collectively achieve
+   the best possible measurement coverage.  Because the resources
+   available for monitoring, exchanging measurement data, and
+   coordinating with other nodes are limited, a node may be interested
+   in identifying other nodes whose observations are similar to and
+   correlated with its own.  This helps a node prioritize and decide
+   which other nodes to coordinate and exchange data with.  All of this
+   requires the use of a P2P overlay.
+
+   A P2P overlay is essential for several reasons:
+
+   o  It makes it possible for nodes (or more specifically, the ASAs
+      that are deployed on those nodes) in the network to autonomically
+      set up measurement sessions without having to rely on a central
+      management system or controller to perform configuration
+      operations associated with configuring measurement probes and
+      responders.
+
+   o  It facilitates the exchange of data between different nodes to
+      share measurement results so that each node can refine its
+      measurement strategy based not just on its own observations, but
+      also on observations from its peers.
+
+   o  It allows nodes to coordinate their measurements to obtain the
+      best possible test coverage and avoid measurements that have a
+      very low likelihood of detecting service-level violations.
+
+   The provisioning of the P2P overlay should be transparent for the
+   network administrator.  An Autonomic Control Plane such as defined in
+   [ACP] provides an ideal candidate for the P2P overlay to run on.
+
+   An autonomic solution for the distributed detection of SLA violations
+   provides several benefits.  First, it provides efficiency; this
+   solution should optimize the resource consumption and avoid resource
+   starvation on the network devices.  A device that is "self-aware" of
+   its available resources will be able to adjust measurement activities
+   rapidly as needed, without requiring a separate control loop
+   involving resource monitoring by an external system.  Second, placing
+   logic about where to conduct measurements into the node enables rapid
+   control loops that allow devices to react instantly to observations
+   and adjust their measurement strategy.  For example, a device could
+   decide to adjust the amount of synthetic test traffic being sent
+   during the measurement itself depending on results observed so far on
+   this and other concurrent measurement sessions.  As a result, the
+   solution could decrease the time necessary to detect SLA violations.
+   Adaptivity features of an autonomic loop could capture the network
+   dynamics faster than a human administrator or even a central
+   controller.  Finally, the solution could help to reduce the workload
+   of human administrators.
+
+   In practice, these factors combine to maximize the likelihood of SLA
+   violations being detected while operating within a given resource
+   budget.  They allow a continuous measurement strategy to be
+   conducted that takes into account past measurement results,
+   observations of other measures such as link utilization or flow
+   data, measurement results shared between network devices, and the
+   coordination of future measurement activities among nodes.
+   Combined, this can result in efficient measurement decisions that
+   strike a balance between offering broad network coverage and homing
+   in on service-level "hot spots".
+
+6.  Intended User Experience
+
+   The autonomic solution should not require any human intervention in
+   the distributed detection of SLA violations.  By virtue of the
+   solution being autonomic, human users will not have to plan which
+   measurements to conduct in a network, which is often a very labor-
+   intensive task that requires detailed analysis of traffic matrices
+   and network topologies and does not lend itself to easy dynamic
+   adjustment.  Likewise, they will not have to configure measurement
+   probes and responders.
+
+   There are some ways in which a human administrator may still interact
+   with the solution.  First, the human administrator will, of course,
+   be notified and obtain reports about service-level violations that
+   are observed.  Second, a human administrator may set policies
+   regarding how closely to monitor the network for service-level
+   violations and how many resources to spend.  For example, an
+   administrator may set a resource budget that is assigned to network
+   devices for measurement operations.  With that given budget, the
+   number of SLO violations that are detected will be maximized.
+   Alternatively, an administrator may set a target for the percentage
+   of SLO violations that must be detected, i.e., a target for the ratio
+   between the number of detected SLO violations and the number of total
+   SLO violations that are actually occurring (some of which might go
+   undetected).  In that case, the solution will aim to minimize the
+   resources spent (i.e., the amount of test traffic and number of
+   measurement sessions) that are required to achieve that target.
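
The two policy styles described above can be sketched as a small data
structure together with the ratio the text defines.  This is a sketch
under assumptions: the class and field names are hypothetical, and the
total number of violations is normally known only in an evaluation setting
where ground truth is available.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MeasurementPolicy:
    """Administrator policy: either a resource budget or a detection target."""
    max_sessions_per_device: Optional[int] = None   # budget mode
    target_detection_ratio: Optional[float] = None  # target mode

    def __post_init__(self):
        # Exactly one of the two modes must be selected.
        if (self.max_sessions_per_device is None) == (
                self.target_detection_ratio is None):
            raise ValueError("set exactly one of budget or target")


def detection_ratio(detected, total):
    """Detected SLO violations divided by all violations that occurred."""
    if total == 0:
        return 1.0  # nothing to detect; a convention chosen for this sketch
    return detected / total


budget_policy = MeasurementPolicy(max_sessions_per_device=5)
ratio = detection_ratio(8, 10)
```

In budget mode, the solution would try to maximize `detection_ratio` under
the session limit; in target mode, it would try to minimize resource use
while keeping the ratio at or above the configured target.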
+
+7.  Implementation Considerations
+
+   The active measurement model assumes that a typical infrastructure
+   will have multiple network segments, multiple Autonomous Systems
+   (ASes), and a reasonably large number of routers.  It also considers
+   that multiple SLOs can be in place at a given time.  Since
+   interoperability in a heterogeneous network is a goal, features found
+   on different active measurement mechanisms (e.g., OWAMP, TWAMP, and
+   Cisco Service-Level Assurance Protocol) and device programmability
+   interfaces (such as Juniper's Junos API or Cisco's Embedded Event
+   Manager) could be used for the implementation.  The autonomic
+   solution should include and/or reference specific algorithms,
+   protocols, metrics, and technologies for the implementation of
+   distributed detection of SLA violations as a whole.
+
+   Finally, it should be noted that there are multiple deployment
+   scenarios, including deployment scenarios that involve physical
+   devices hosting autonomic functions or virtualized infrastructure
+   hosting the same.  Co-deployment in conjunction with Virtual Network
+   Functions (VNFs) is a possibility for further study.
+
+7.1.  Device-Based Self-Knowledge and Decisions
+
+   Each device has self-knowledge about the local SLA monitoring.  This
+   could be in the form of historical measurement data and SLOs.
+   Besides that, the devices would have algorithms that could decide
+   which probes should be activated at a given time.  The choice of
+   which algorithm is better for a specific situation would also be
+   autonomic.
+
+7.2.  Interaction with Other Devices
+
+   Network devices should share information about service-level
+   measurement results.  This information can speed up the detection of
+   SLA violations and increase the number of detected SLA violations.
+   For example, if one device detects that a remote destination is in
+   danger of violating an SLO, other devices may conduct additional
+   measurements to the same destination or other destinations in its
+   proximity.  For any given network device, the exchange of data may be
+   more important with some devices (for example, devices in the same
+   network neighborhood or devices that are "correlated" by some other
+   means) than with others.  Defining the network devices that exchange
+   measurement data (i.e., management peers) creates a new topology.
+   Different approaches could be used to define this topology (e.g.,
+   correlated peers [P2PBNM-Nobre-2012]).  To bootstrap peer selection,
+   each device should use its known neighbors (e.g., FIB and RIB tables)
+   as initial seeds to identify possible peers.  It should be noted that
+   a solution will benefit if topology information and network discovery
+   functions are provided by the underlying autonomic framework.  A
+   solution will need to be able to discover measurement peers as well
+   as measurement targets, specifically measurement targets that support
+   active measurement responders and that will be able to respond to
+   measurement requests and reflect measurement traffic as needed.
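+
+   The peer bootstrap and the reaction to shared results described
+   above can be sketched as follows.  This is an illustration under
+   stated assumptions, not part of any specified protocol: the Device
+   class, its methods, the report fields, and the 0.9 margin are all
+   hypothetical.
+

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical network device taking part in measurement sharing."""
    name: str
    neighbors: set[str]  # assumed to be learned from routing state
    peers: set[str] = field(default_factory=set)
    extra_targets: set[str] = field(default_factory=set)

    def bootstrap_peers(self, max_peers: int) -> None:
        """Seed the peer set from known neighbors (sorted for determinism)."""
        self.peers = set(sorted(self.neighbors)[:max_peers])

    def on_peer_report(self, sender: str, destination: str,
                       measured_ms: float, slo_ms: float,
                       margin: float = 0.9) -> bool:
        """Schedule an extra measurement when a peer reports a value
        close to the SLO threshold; return True if one was scheduled."""
        if sender not in self.peers:
            return False  # ignore reports from devices that are not peers
        if measured_ms >= margin * slo_ms:
            self.extra_targets.add(destination)
            return True
        return False


dev = Device("r1", neighbors={"r2", "r3", "r4"})
dev.bootstrap_peers(max_peers=2)

near_slo = dev.on_peer_report("r2", "dest1", measured_ms=95.0, slo_ms=100.0)
healthy = dev.on_peer_report("r3", "dest2", measured_ms=50.0, slo_ms=100.0)
non_peer = dev.on_peer_report("r4", "dest3", measured_ms=99.0, slo_ms=100.0)
print(sorted(dev.peers), sorted(dev.extra_targets))
```
+
+   Only the report from a peer that is close to the SLO adds a target;
+   a real solution would also need to confirm that the target supports
+   an active measurement responder before probing it.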
+
+8.  Comparison with Current Solutions
+
+   There is no standardized solution for distributed autonomic detection
+   of SLA violations.  Current solutions are restricted to ad hoc
+   scripts running in a per-node fashion to automate some administrator
+   actions.  There are some proposals for passive probe activation
+   (e.g., DECON [DECON] and CSAMP [CSAMP]), but these do not focus on
+   autonomic features.
+
+9.  Related IETF Work
+
+   This section discusses related IETF work and is provided for
+   reference.  This section is not exhaustive; rather, it provides an
+   overview of the various initiatives and how they relate to autonomic
+   distributed detection of SLA violations.
+
+   1.  LMAP: The Large-Scale Measurement of Broadband Performance
+       Working Group standardizes the LMAP measurement system for
+       performance management of broadband access devices.  The
+       autonomic solution could be relevant to LMAP because it deploys
+       measurement probes and could be used for screening for SLA
+       violations.  Besides that, a solution to decrease the workload of
+       human administrators in service providers is probably highly
+       desirable.
+
+   2.  IPFIX: The IP Flow Information Export (IPFIX) Working Group (now
+       concluded) aimed to standardize the export of IP flow
+       information (cf. NetFlow).  IPFIX
+       uses measurement probes (i.e., metering exporters) to gather flow
+       data.  In this context, the autonomic solution for the activation
+       of active measurement probes could possibly be extended to also
+       address passive measurement probes.  Besides that, flow
+       information could be used in making decisions regarding probe
+       activation.
+
+   3.  ALTO: The Application-Layer Traffic Optimization Working Group
+       aims to provide topological information at a higher abstraction
+       layer, which can be based upon network policy, and with
+       application-relevant service functions located in it.  Their work
+       could be leveraged to define the topology for network devices
+       that exchange measurement data.
+
+10.  IANA Considerations
+
+   This document has no IANA actions.
+
+11.  Security Considerations
+
+   The security of this solution hinges on the security of the network
+   underlay, i.e., the Autonomic Control Plane.  If the Autonomic
+   Control Plane were to be compromised, an attacker could undermine the
+   effectiveness of measurement coordination by reporting fraudulent
+   measurement results to peers.  This would cause measurement probes to
+   be deployed in an ineffective manner that would increase the
+   likelihood that violations of SLOs go undetected.
+
+   Likewise, the security of the solution hinges on the security of the
+   deployment mechanism for autonomic functions (in this case, the
+   autonomic function that conducts the service-level measurements).  If
+   an attacker were able to hijack an autonomic function, it could try
+   to exhaust or exceed the resources that should be spent on autonomic
+   measurements in order to deplete network resources, including network
+   bandwidth due to higher-than-necessary volumes of synthetic test
+   traffic generated by measurement probes.  Again, it could also lead
+   to reporting of misleading results; among other things, this could
+   result in non-optimal selection of measurement targets and, in turn,
+   an increase in the likelihood that service-level violations go
+   undetected.
+
+12.  Informative References
+
+   [ACP]      Eckert, T., Ed., Behringer, M., Ed., and S. Bjarnason, "An
+              Autonomic Control Plane (ACP)", Work in Progress,
+              draft-ietf-anima-autonomic-control-plane-13, December
+              2017.
+
+   [CSAMP]    Sekar, V., Reiter, M., Willinger, W., Zhang, H., Kompella,
+              R., and D. Andersen, "CSAMP: A System for Network-Wide
+              Flow Monitoring", NSDI USENIX Symposium Networked Systems
+              Design and Implementation, April 2008.
+
+   [DECON]    di Pietro, A., Huici, F., Costantini, D., and S.
+              Niccolini, "DECON: Decentralized Coordination for Large-
+              Scale Flow Monitoring", IEEE INFOCOM Workshops,
+              DOI 10.1109/INFCOMW.2010.5466642, March 2010.
+
+   [P2PBNM-Nobre-2012]
+              Nobre, J., Granville, L., Clemm, A., and A. Gonzalez
+              Prieto, "Decentralized Detection of SLA Violations Using
+              P2P Technology", 8th International Conference on Network
+              and Service Management (CNSM), 2012,
+              <http://ieeexplore.ieee.org/xpls/
+              abs_all.jsp?arnumber=6379997>.
+
+   [RFC4148]  Stephan, E., "IP Performance Metrics (IPPM) Metrics
+              Registry", BCP 108, RFC 4148, DOI 10.17487/RFC4148, August
+              2005, <https://www.rfc-editor.org/info/rfc4148>.
+
+   [RFC4656]  Shalunov, S., Teitelbaum, B., Karp, A., Boote, J., and M.
+              Zekauskas, "A One-way Active Measurement Protocol
+              (OWAMP)", RFC 4656, DOI 10.17487/RFC4656, September 2006,
+              <https://www.rfc-editor.org/info/rfc4656>.
+
+   [RFC5357]  Hedayat, K., Krzanowski, R., Morton, A., Yum, K., and J.
+              Babiarz, "A Two-Way Active Measurement Protocol (TWAMP)",
+              RFC 5357, DOI 10.17487/RFC5357, October 2008,
+              <https://www.rfc-editor.org/info/rfc5357>.
+
+   [RFC5474]  Duffield, N., Ed., Chiou, D., Claise, B., Greenberg, A.,
+              Grossglauser, M., and J. Rexford, "A Framework for Packet
+              Selection and Reporting", RFC 5474, DOI 10.17487/RFC5474,
+              March 2009, <https://www.rfc-editor.org/info/rfc5474>.
+
+   [RFC6248]  Morton, A., "RFC 4148 and the IP Performance Metrics
+              (IPPM) Registry of Metrics Are Obsolete", RFC 6248,
+              DOI 10.17487/RFC6248, April 2011,
+              <https://www.rfc-editor.org/info/rfc6248>.
+
+   [RFC6812]  Chiba, M., Clemm, A., Medley, S., Salowey, J., Thombare,
+              S., and E. Yedavalli, "Cisco Service-Level Assurance
+              Protocol", RFC 6812, DOI 10.17487/RFC6812, January 2013,
+              <https://www.rfc-editor.org/info/rfc6812>.
+
+   [RFC7011]  Claise, B., Ed., Trammell, B., Ed., and P. Aitken,
+              "Specification of the IP Flow Information Export (IPFIX)
+              Protocol for the Exchange of Flow Information", STD 77,
+              RFC 7011, DOI 10.17487/RFC7011, September 2013,
+              <https://www.rfc-editor.org/info/rfc7011>.
+
+   [RFC7297]  Boucadair, M., Jacquenet, C., and N. Wang, "IP
+              Connectivity Provisioning Profile (CPP)", RFC 7297,
+              DOI 10.17487/RFC7297, July 2014,
+              <https://www.rfc-editor.org/info/rfc7297>.
+
+   [RFC7575]  Behringer, M., Pritikin, M., Bjarnason, S., Clemm, A.,
+              Carpenter, B., Jiang, S., and L. Ciavaglia, "Autonomic
+              Networking: Definitions and Design Goals", RFC 7575,
+              DOI 10.17487/RFC7575, June 2015,
+              <https://www.rfc-editor.org/info/rfc7575>.
+
+   [RFC8250]  Elkins, N., Hamilton, R., and M. Ackermann, "IPv6
+              Performance and Diagnostic Metrics (PDM) Destination
+              Option", RFC 8250, DOI 10.17487/RFC8250, September 2017,
+              <https://www.rfc-editor.org/info/rfc8250>.
+
+Acknowledgements
+
+   We wish to acknowledge the helpful contributions, comments, and
+   suggestions that were received from Mohamed Boucadair, Brian
+   Carpenter, Hanlin Fang, Bruno Klauser, Diego Lopez, Vincent Roca, and
+   Eric Voit.  In addition, we thank Diego Lopez, Vincent Roca, and
+   Brian Carpenter for their detailed reviews.
+
+Authors' Addresses
+
+   Jeferson Campos Nobre
+   University of Vale do Rio dos Sinos
+   Porto Alegre
+   Brazil
+
+   Email: jcnobre@unisinos.br
+
+
+   Lisandro Zambenedetti Granville
+   Federal University of Rio Grande do Sul
+   Porto Alegre
+   Brazil
+
+   Email: granville@inf.ufrgs.br
+
+
+   Alexander Clemm
+   Huawei USA - Futurewei Technologies Inc.
+   Santa Clara, California
+   United States of America
+
+   Email: ludwig@clemm.org, alexander.clemm@huawei.com
+
+
+   Alberto Gonzalez Prieto
+   VMware
+   Palo Alto, California
+   United States of America
+
+   Email: agonzalezpri@vmware.com
+