doc: Add RFC documents

author: Thomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
committer: Thomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
commit: 4bfd864f10b68b71482b35c818559068ef8d5797
tree: e3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc9232.txt
parent: ea76e11061bda059ae9f9ad130a9895cc85607db
1 files changed, 1922 insertions, 0 deletions
diff --git a/doc/rfc/rfc9232.txt b/doc/rfc/rfc9232.txt
new file mode 100644
index 0000000..61383ff
--- /dev/null
+++ b/doc/rfc/rfc9232.txt
@@ -0,0 +1,1922 @@
+
+
+
+
+Internet Engineering Task Force (IETF)                           H. Song
+Request for Comments: 9232                                     Futurewei
+Category: Informational                                           F. Qin
+ISSN: 2070-1721                                             China Mobile
+                                                       P. Martinez-Julia
+                                                                    NICT
+                                                            L. Ciavaglia
+                                                          Rakuten Mobile
+                                                                 A. Wang
+                                                           China Telecom
+                                                                May 2022
+
+
+                      Network Telemetry Framework
+
+Abstract
+
+   Network telemetry is a technology for gaining network insight and
+   facilitating efficient and automated network management.  It
+   encompasses various techniques for remote data generation,
+   collection, correlation, and consumption.  This document describes an
+   architectural framework for network telemetry, motivated by
+   challenges that are encountered as part of the operation of networks
+   and by the requirements that ensue.  This document clarifies the
+   terminology and classifies the modules and components of a network
+   telemetry system from different perspectives.  The framework and
+   taxonomy help to set a common ground for the collection of related
+   work and provide guidance for related technique and standard
+   developments.
+
+Status of This Memo
+
+   This document is not an Internet Standards Track specification; it is
+   published for informational purposes.
+
+   This document is a product of the Internet Engineering Task Force
+   (IETF).  It represents the consensus of the IETF community.  It has
+   received public review and has been approved for publication by the
+   Internet Engineering Steering Group (IESG).  Not all documents
+   approved by the IESG are candidates for any level of Internet
+   Standard; see Section 2 of RFC 7841.
+
+   Information about the current status of this document, any errata,
+   and how to provide feedback on it may be obtained at
+   https://www.rfc-editor.org/info/rfc9232.
+
+Copyright Notice
+
+   Copyright (c) 2022 IETF Trust and the persons identified as the
+   document authors.  All rights reserved.
+
+   This document is subject to BCP 78 and the IETF Trust's Legal
+   Provisions Relating to IETF Documents
+   (https://trustee.ietf.org/license-info) in effect on the date of
+   publication of this document.  Please review these documents
+   carefully, as they describe your rights and restrictions with respect
+   to this document.  Code Components extracted from this document must
+   include Revised BSD License text as described in Section 4.e of the
+   Trust Legal Provisions and are provided without warranty as described
+   in the Revised BSD License.
+
+Table of Contents
+
+   1.  Introduction
+     1.1.  Applicability Statement
+     1.2.  Glossary
+   2.  Background
+     2.1.  Telemetry Data Coverage
+     2.2.  Use Cases
+     2.3.  Challenges
+     2.4.  Network Telemetry
+     2.5.  The Necessity of a Network Telemetry Framework
+   3.  Network Telemetry Framework
+     3.1.  Top-Level Modules
+       3.1.1.  Management Plane Telemetry
+       3.1.2.  Control Plane Telemetry
+       3.1.3.  Forwarding Plane Telemetry
+       3.1.4.  External Data Telemetry
+     3.2.  Second-Level Function Components
+     3.3.  Data Acquisition Mechanism and Type Abstraction
+     3.4.  Mapping Existing Mechanisms into the Framework
+   4.  Evolution of Network Telemetry Applications
+   5.  Security Considerations
+   6.  IANA Considerations
+   7.  Informative References
+   Appendix A.  A Survey on Existing Network Telemetry Techniques
+     A.1.  Management Plane Telemetry
+       A.1.1.  Push Extensions for NETCONF
+       A.1.2.  gRPC Network Management Interface
+     A.2.  Control Plane Telemetry
+       A.2.1.  BGP Monitoring Protocol
+     A.3.  Data Plane Telemetry
+       A.3.1.  Alternate-Marking (AM) Technology
+       A.3.2.  Dynamic Network Probe
+       A.3.3.  IP Flow Information Export (IPFIX) Protocol
+       A.3.4.  In Situ OAM
+       A.3.5.  Postcard-Based Telemetry
+       A.3.6.  Existing OAM for Specific Data Planes
+     A.4.  External Data and Event Telemetry
+       A.4.1.  Sources of External Events
+       A.4.2.  Connectors and Interfaces
+   Acknowledgments
+   Contributors
+   Authors' Addresses
+
+1.  Introduction
+
+   Network visibility is the ability of management tools to see the
+   state and behavior of a network, which is essential for successful
+   network operation.  Network telemetry revolves around network data
+   that 1) can help provide insights about the current state of the
+   network, including network devices, forwarding, control, and
+   management planes; 2) can be generated and obtained through a variety
+   of techniques, including but not limited to network instrumentation
+   and measurements; and 3) can be processed for purposes ranging from
+   service assurance to network security using a wide variety of data
+   analytical techniques.  In this document, network telemetry refers to
+   both the data itself (i.e., "Network Telemetry Data") and the
+   techniques and processes used to generate, export, collect, and
+   consume that data for use by potentially automated management
+   applications.  Network telemetry extends beyond the classical network
+   Operations, Administration, and Maintenance (OAM) techniques and
+   expects to support better flexibility, scalability, accuracy,
+   coverage, and performance.
+
+   However, the term "network telemetry" lacks an unambiguous
+   definition.  Its scope and coverage cause confusion and
+   misunderstandings.  It is beneficial to clarify the concept and
+   provide a clear architectural framework for network telemetry, so we
+   can articulate the technical field and better align the related
+   techniques and standard works.
+
+   To fulfill such an undertaking, we first discuss some key
+   characteristics of network telemetry that set a clear distinction
+   from the conventional network OAM and show that some conventional OAM
+   technologies can be considered a subset of the network telemetry
+   technologies.  We then provide an architectural framework for network
+   telemetry that includes four modules, each associated with a
+   different category of telemetry data and corresponding procedures.
+   All the modules are internally structured in the same way, including
+   components that allow the operator to configure data sources in
+   regard to what data to generate and how to make that available to
+   client applications, components that instrument the underlying data
+   sources, and components that perform the actual rendering, encoding,
+   and exporting of the generated data.  We show how the network
+   telemetry framework can benefit current and future network
+   operations.  Based on the distinction of modules and function
+   components, we can map the existing and emerging techniques and
+   protocols into the framework.  The framework can also simplify
+   designing, maintaining, and understanding a network telemetry system.
+   In addition, we outline the evolution stages of the network telemetry
+   system and discuss the potential security concerns.
+
+   The purpose of the framework and taxonomy is to set a common ground
+   for the collection of related work and provide guidance for future
+   technique and standard developments.  To the best of our knowledge,
+   this document is the first such effort for network telemetry in
+   industry standards organizations.  This document does not define
+   specific technologies.
+
+1.1.  Applicability Statement
+
+   Large-scale network data collection is a major threat to user privacy
+   and may be indistinguishable from pervasive monitoring [RFC7258].
+   The network telemetry framework presented in this document must not
+   be applied to generating, exporting, collecting, analyzing, or
+   retaining individual user data or any data that can identify end
+   users or characterize their behavior without consent.  Based on this
+   principle, the network telemetry framework is not applicable to
+   networks whose endpoints represent individual users, such as general-
+   purpose access networks.
+
+1.2.  Glossary
+
+   Before further discussion, we list some key terminology and
+   abbreviations used in this document.  There is an intended
+   differentiation between the terms "network telemetry" and "OAM".
+   However, it should be understood that there is not a hard-line
+   distinction between the two concepts.  Rather, network telemetry is
+   considered an extension of OAM.  It covers all the existing OAM
+   protocols but puts more emphasis on the newer and emerging techniques
+   and protocols concerning all aspects of network data from acquisition
+   to consumption.
+
+   AI:         Artificial Intelligence.  In the network domain, AI
+               refers to machine-learning-based technologies for
+               automated network operation and other tasks.
+
+   AM:         Alternate Marking.  A flow performance measurement
+               method, as specified in [RFC8321].
+
+   BMP:        BGP Monitoring Protocol.  Specified in [RFC7854].
+
+   DPI:        Deep Packet Inspection.  Refers to the techniques that
+               examine packets beyond packet L3/L4 headers.
+
+   gNMI:       gRPC Network Management Interface.  A network management
+               protocol from the OpenConfig Operator Working Group,
+               mainly contributed by Google.  See [gnmi] for details.
+
+   GPB:        Google Protocol Buffer.  An extensible mechanism for
+               serializing structured data.  See [gpb] for details.
+
+   gRPC:       gRPC Remote Procedure Call.  An open-source high-
+               performance RPC framework that gNMI is based on.  See
+               [grpc] for details.
+
+   IPFIX:      IP Flow Information Export Protocol.  Specified in
+               [RFC7011].
+
+   IOAM:       In situ OAM [RFC9197].  A data plane on-path telemetry
+               technique.
+
+   JSON:       JavaScript Object Notation.  An open standard file format
+               and data interchange format that uses human-readable text
+               to store and transmit data objects, as specified in
+               [RFC8259].
+
+   MIB:        Management Information Base.  A database used for
+               managing the entities in a network.
+
+   NETCONF:    Network Configuration Protocol.  Specified in [RFC6241].
+
+   NetFlow:    A Cisco protocol used for flow record collecting, as
+               described in [RFC3954].
+
+   Network Telemetry:  The process and instrumentation for acquiring and
+               utilizing network data remotely for network monitoring
+               and operation.  A general term for a large set of network
+               visibility techniques and protocols, concerning aspects
+               like data generation, collection, correlation, and
+               consumption.  Network telemetry addresses current network
+               operation issues and enables smooth evolution toward
+               future intent-driven autonomous networks.
+
+   NMS:        Network Management System.  Refers to applications that
+               allow network administrators to manage a network.
+
+   OAM:        Operations, Administration, and Maintenance.  A group of
+               network management functions that provide network fault
+               indication, fault localization, performance information,
+               and data and diagnosis functions.  Most conventional
+               network monitoring techniques and protocols belong to
+               network OAM.
+
+   PBT:        Postcard-Based Telemetry.  A data plane on-path telemetry
+               technique.  A representative technique is described in
+               [IPPM-IOAM-DIRECT-EXPORT].
+
+   RESTCONF:   An HTTP-based protocol that provides a programmatic
+               interface for accessing data defined in YANG, using the
+               datastore concepts defined in NETCONF, as specified in
+               [RFC8040].
+
+   SMIv2:      Structure of Management Information Version 2.  Defines
+               MIB objects, as specified in [RFC2578].
+
+   SNMP:       Simple Network Management Protocol.  Versions 1, 2, and 3
+               are specified in [RFC1157], [RFC3416], and [RFC3411],
+               respectively.
+
+   XML:        Extensible Markup Language.  A markup language for data
+               encoding that is both human readable and machine
+               readable, as specified by W3C [W3C.REC-xml-20081126].
+
+   YANG:       YANG is a data modeling language for the definition of
+               data sent over network management protocols such as
+               NETCONF and RESTCONF.  YANG is defined in [RFC6020] and
+               [RFC7950].
+
+   YANG ECA:   A YANG model for Event-Condition-Action policies, as
+               defined in [NETMOD-ECA-POLICY].
+
+   YANG-Push:  A mechanism that allows subscriber applications to
+               request a stream of updates from a YANG datastore on a
+               network device.  Details are specified in [RFC8639] and
+               [RFC8641].
+
+2.  Background
+
+   The term "big data" is used to describe the extremely large volume of
+   data sets that can be analyzed computationally to reveal patterns,
+   trends, and associations.  Networks are undoubtedly a source of big
+   data because of their scale and the volume of network traffic they
+   forward.  When a network's endpoints do not represent individual
+   users (e.g., in industrial, data-center, and infrastructure
+   contexts), network operations can often benefit from large-scale data
+   collection without breaching user privacy.
+
+   Today, one can access advanced big data analytics capability through
+   a plethora of commercial and open-source platforms (e.g., Apache
+   Hadoop), tools (e.g., Apache Spark), and techniques (e.g., machine
+   learning).  Thanks to the advance of computing and storage
+   technologies, network big data analytics give network operators an
+   opportunity to gain network insights and move towards network
+   autonomy.  Some operators have started to explore the application of
+   Artificial Intelligence (AI) to make sense of network data.  Software
+   tools can use the network data to detect and react to network faults,
+   anomalies, and policy violations, as well as predict future events.
+   In turn, the network policy updates for planning, intrusion
+   prevention, optimization, and self-healing may be applied.
+
+   It is conceivable that an autonomic network [RFC7575] is the logical
+   next step for network evolution following Software-Defined Networking
+   (SDN), which aims to reduce (or even eliminate) human labor, make
+   more efficient use of network resources, and provide better services
+   more aligned with customer requirements.  The IETF ANIMA Working
+   Group is dedicated to developing and maintaining protocols and
+   procedures for automated network management and control of
+   professionally managed networks.  The related technique of
+   Intent-Based Networking (IBN) [NMRG-IBN-CONCEPTS-DEFINITIONS]
+   requires network visibility and telemetry data in order to ensure
+   that the network is behaving as intended.
+
+   However, while the data processing capability is improved and
+   applications require more data to function better, the networks lag
+   behind in extracting and translating network data into useful and
+   actionable information in efficient ways.  The system bottleneck is
+   shifting from data consumption to data supply.  Both the number of
+   network nodes and the traffic bandwidth keep increasing at a fast
+   pace.  Network configurations and policies change more often than
+   before.  More subtle events and fine-grained data through
+   all network planes need to be captured and exported in real time.  In
+   a nutshell, it is a challenge to get enough high-quality data out of
+   the network in a manner that is efficient, timely, and flexible.
+   Therefore, we need to survey the existing technologies and protocols
+   and identify any potential gaps.
+
+   In the remainder of this section, we first clarify the scope of
+   network data (i.e., telemetry data) relevant in this document.  Then,
+   we discuss several key use cases for network operations of today and
+   the future.  Next, we show why the current network OAM techniques and
+   protocols are insufficient for these use cases.  The discussion
+   underlines the need for new methods, techniques, and protocols, as
+   well as the extensions of existing ones, which we assign under the
+   umbrella term "Network Telemetry".
+
+2.1.  Telemetry Data Coverage
+
+   Any information that can be extracted from networks (including the
+   data plane, control plane, and management plane) and used to gain
+   visibility or as a basis for actions is considered telemetry data.
+   It includes statistics, event records and logs, snapshots of state,
+   configuration data, etc.  It also covers the outputs of any active
+   and passive measurements [RFC7799].  In some cases, raw data is
+   processed in network before being sent to a data consumer.  Such
+   processed data is also considered telemetry data.  The value of
+   telemetry data varies.  In some cases, if the cost is acceptable,
+   less data of higher quality is preferred over a lot of low-quality
+   data.  A classification of telemetry data is provided in
+   Section 3.  To preserve the privacy of end users, no user packet
+   content should be collected.  Specifically, the data objects
+   generated, exported, and collected by a network telemetry application
+   should not include any packet payload from traffic associated with
+   end-user systems.
+
+2.2.  Use Cases
+
+   The following set of use cases is essential for network operations.
+   While the list is by no means exhaustive, it is enough to highlight
+   the requirements for data velocity, variety, volume, and veracity,
+   the attributes of big data, in networks.
+
+   *  Security: Network intrusion detection and prevention systems need
+      to monitor network traffic and activities and act upon anomalies.
+      Given increasingly sophisticated attack vectors coupled with
+      increasingly severe consequences of security breaches, new tools
+      and techniques need to be developed, relying on wider and deeper
+      visibility into networks.  The ultimate goal is to achieve
+      security with no, or only minimal, human intervention and without
+      disrupting legitimate traffic flows.
+
+   *  Policy and Intent Compliance: Network policies are the rules that
+      constrain the services for network access, provide service
+      differentiation, or enforce specific treatment on the traffic.
+      For example, a service function chain is a policy that requires
+      the selected flows to pass through a set of ordered network
+      functions.  Intent, as defined in [NMRG-IBN-CONCEPTS-DEFINITIONS],
+      is a set of operational goals that a network should meet and
+      outcomes that a network is supposed to deliver, defined in a
+      declarative manner without specifying how to achieve or implement
+      them.  An intent requires a complex translation and mapping
+      process before being applied on networks.  While a policy or
+      intent is enforced, the compliance needs to be verified and
+      monitored continuously by relying on visibility that is provided
+      through network telemetry data.  Any violation must be reported
+      immediately - this will alert the network administrator to the
+      policy or intent violation and will potentially result in updates
+      to how the policy or intent is applied in the network to ensure
+      that it remains in force.
+
+   *  SLA Compliance: A Service Level Agreement (SLA) is a service
+      contract between a service provider and a client, which includes
+      the metrics for the service measurement and remedy/penalty
+      procedures when the service level misses the agreement.  Users
+      need to check if they get the service as promised, and network
+      operators need to evaluate how they can deliver services that meet
+      the SLA based on real-time network telemetry data, including data
+      from network measurements.
+
+   *  Root Cause Analysis: Many network failures can be the effect of a
+      sequence of chained events.  Troubleshooting and recovery require
+      quick identification of the root cause of any observable issues.
+      However, the root cause is not always straightforward to identify,
+      especially when the failure is sporadic and the number of event
+      messages, both related and unrelated to the same cause, is
+      overwhelming.  While technologies such as machine learning can be
+      used for root cause analysis, it is up to the network to sense and
+      provide the relevant diagnostic data that are either actively fed
+      into or passively retrieved by the root cause analysis
+      applications.
+
+   *  Network Optimization: This covers all short-term and long-term
+      network optimization techniques, including load balancing, Traffic
+      Engineering (TE), and network planning.  Network operators are
+      motivated to optimize their network utilization and differentiate
+      services for better Return on Investment (ROI) or lower Capital
+      Expenditure (CAPEX).  The first step is to know the real-time
+      network conditions before applying policies for traffic
+      manipulation.  In some cases, microbursts need to be detected in a
+      very short time frame so that fine-grained traffic control can be
+      applied to avoid network congestion.  Long-term planning of
+      network capacity and topology requires analysis of real-world
+      network telemetry data that is obtained over long periods of time.
+
+   *  Event Tracking and Prediction: The visibility into traffic path
+      and performance is critical for services and applications that
+      rely on healthy network operation.  Numerous related network
+      events are of interest to network operators.  For example, network
+      operators want to learn where and why packets are dropped for an
+      application flow.  They also want to be warned of issues in
+      advance, so proactive actions can be taken to avoid catastrophic
+      consequences.
+
+2.3.  Challenges
+
+   For a long time, network operators have relied upon SNMP [RFC3416],
+   Command-Line Interface (CLI), or Syslog [RFC5424] to monitor the
+   network.  Some other OAM techniques as described in [RFC7276] are
+   also used to facilitate network troubleshooting.  These conventional
+   techniques are not sufficient to support the above use cases for the
+   following reasons:
+
+   *  Most use cases need to continuously monitor the network and
+      dynamically refine the data collection in real time.  Poll-based
+      low-frequency data collection is ill-suited for these
+      applications.  Subscription-based streaming data directly pushed
+      from the data source (e.g., the forwarding chip) is preferred to
+      provide sufficient data quantity and precision at scale.
+
+   *  Comprehensive data is needed, ranging from packet processing
+      engines to traffic managers, line cards to main control boards,
+      user flows to control protocol packets, device configurations to
+      operations, and physical layers to application layers.
+      Conventional OAM only covers a narrow range of data (e.g., SNMP
+      only handles data from the Management Information Base (MIB)).
+      Classical network devices cannot provide all the necessary probes.
+      More open and programmable network devices are therefore needed.
+
+   *  Many application scenarios need to correlate network-wide data
+      from multiple sources (i.e., from distributed network devices,
+      different components of a network device, or different network
+      planes).  A piecemeal solution often lacks the capability to
+      consolidate the data from multiple sources.  The composition of a
+      complete solution, as partly proposed by Autonomic Resource
+      Control Architecture (ARCA) [NMRG-ANTICIPATED-ADAPTATION], will be
+      empowered and guided by a comprehensive framework.
+
+   *  Some conventional OAM techniques (e.g., CLI and Syslog) lack a
+      formal data model.  Unstructured data hinders tool
+      automation and application extensibility.  Standardized data
+      models are essential to support programmable networks.
+
+   *  Although some conventional OAM techniques support data push (e.g.,
+      SNMP Trap [RFC2981][RFC3877], Syslog, and sFlow [RFC3176]), the
+      pushed data are limited to only predefined management plane
+      warnings (e.g., SNMP Trap) or sampled user packets (e.g., sFlow).
+      Network operators require data of arbitrary source,
+      granularity, and precision, which is beyond the capability of the
+      existing techniques.
+
+   *  Conventional passive measurement techniques can either consume
+      excessive network resources and produce excessive redundant data
+      or lead to inaccurate results; on the other hand, conventional
+      active measurement techniques can interfere with the user traffic,
+      and their results are indirect.  Techniques that can collect
+      direct and on-demand data from user traffic are more favorable.
+
+   These challenges were addressed by newer standards and techniques
+   (e.g., IPFIX/NetFlow, Packet Sampling (PSAMP), IOAM, and YANG-Push),
+   and more are emerging.  These standards and techniques need to be
+   recognized and accommodated in a new framework.
+
+2.4.  Network Telemetry
+
+   Network telemetry has emerged as a mainstream technical term to refer
+   to the network data collection and consumption techniques.  Several
+   network telemetry techniques and protocols (e.g., IPFIX [RFC7011] and
+   gRPC [grpc]) have been widely deployed.  Network telemetry allows
+   separate entities to acquire data from network devices so that data
+   can be visualized and analyzed to support network monitoring and
+   operation.  Network telemetry covers the conventional network OAM and
+   has a wider scope.  For instance, it is expected that network
+   telemetry can provide the necessary network insight for autonomous
+   networks and address the shortcomings of conventional OAM techniques.
+
+   Network telemetry usually assumes machines as data consumers rather
+   than human operators.  Hence, network telemetry can directly trigger
+   the automated network operation, while in contrast, some conventional
+   OAM tools were designed and used to help human operators to monitor
+   and diagnose the networks and guide manual network operations.  Such
+   a proposition leads to very different techniques.
+
+   Although new network telemetry techniques are emerging and subject to
+   continuous evolution, several characteristics of network telemetry
+   have been well accepted.  Note that network telemetry is intended to
+   be an umbrella term covering a wide spectrum of techniques, so the
+   following characteristics are not expected to be held by every
+   specific technique.
+
+   *  Push and Streaming: Instead of polling data from network devices,
+      telemetry collectors subscribe to streaming data pushed from data
+      sources in network devices.
+
+   *  Volume and Velocity: Telemetry data is intended to be consumed by
+      machines rather than by human beings.  Therefore, the data volume
+      can be huge, and the processing is optimized for the needs of
+      automation in real time.
+
+   *  Normalization and Unification: Telemetry aims to address the
+      overall network automation needs.  Efforts are made to normalize
+      the data representation and unify the protocols, so as to simplify
+      data analysis and provide integrated analysis across heterogeneous
+      devices and data sources across a network.
+
+   *  Model-Based: Telemetry data is modeled in advance, which allows
+      applications to configure and consume data with ease.
+
+   *  Data Fusion: The data for a single application can come from
+      multiple data sources (e.g., cross-domain, cross-device, and
+      cross-layer) that are based on a common name/ID and need to be
+      correlated to take effect.
+
+   *  Dynamic and Interactive: Since network telemetry is meant to be
+      used in a closed control loop for network automation, it needs to
+      run continuously and adapt to the dynamic and interactive queries
+      from the network operation controller.
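
The following sketch is not part of the RFC text. It is a minimal
illustration, in Python, of the "Push and Streaming" characteristic
listed above: a collector subscribes once, and the data source pushes
only changed values instead of being polled. The DataSource class, its
methods, and the counter name are hypothetical; RFC 9232 does not define
an API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# All names here are hypothetical illustrations, not an RFC 9232 API.
Update = Dict[str, int]


@dataclass
class DataSource:
    """A device-side data source that pushes updates to subscribers."""

    subscribers: List[Callable[[Update], None]] = field(default_factory=list)
    counters: Dict[str, int] = field(default_factory=dict)

    def subscribe(self, callback: Callable[[Update], None]) -> None:
        """Register a collector callback for future updates."""
        self.subscribers.append(callback)

    def record(self, name: str, value: int) -> None:
        """Store a counter value and push it only if it changed."""
        if self.counters.get(name) == value:
            return  # unchanged value: nothing is sent
        self.counters[name] = value
        for callback in self.subscribers:
            callback({name: value})


def demo() -> List[Update]:
    """Subscribe one collector and feed three writes to the source."""
    received: List[Update] = []
    source = DataSource()
    source.subscribe(received.append)
    source.record("if0/rx_drops", 0)
    source.record("if0/rx_drops", 0)  # duplicate write, not pushed
    source.record("if0/rx_drops", 7)
    return received
```

A polling collector would instead read the counters on a timer and see
the unchanged value repeatedly; in this sketch the duplicate write
produces no update at all.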
+
+   In addition, an ideal network telemetry solution may also have the
+   following features or properties:
+
+   *  In-Network Customization: The data that is generated can be
+      customized in network at runtime to cater to the specific need of
+      applications.  This needs the support of a programmable data
+      plane, which allows probes with custom functions to be deployed at
+      flexible locations.
+
+   *  In-Network Data Aggregation and Correlation: Network devices and
+      aggregation points can work out which events and what data need
+      to be stored, reported, or discarded, thus reducing the load on
+      the central collection and processing points while still ensuring
+      that the right information is ready to be processed in a timely
+      way.
+
+   *  In-Network Processing: Sometimes it is not necessary or feasible
+      to gather all information to a central point to be processed and
+      acted upon.  It is possible for the data processing to be done in
+      network, allowing reactive actions to be taken locally.
+
+   *  Direct Data Plane Export: Data originating from data plane
+      forwarding chips can be directly exported to the data consumer for
+      efficiency, especially when the data bandwidth is large and real-
+      time processing is required.
+
+   *  In-Band Data Collection: In addition to the passive and active
+      data collection approaches, the new hybrid approach allows data
+      to be collected directly for any target flow along its entire
+      forwarding path [OPSAWG-IFIT-FRAMEWORK].
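
The following sketch is not part of the RFC text. It illustrates the
"In-Network Data Aggregation and Correlation" property listed above,
under the assumption that a device collects queue-latency samples per
time window and reports a summary only when the window looks abnormal.
The function name, the microsecond unit, and the threshold rule are all
hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional


def aggregate(samples_us: List[int],
              threshold_us: int) -> Optional[Dict[str, float]]:
    """Summarize one window of latency samples on the device.

    Returns a compact report only when the window's maximum exceeds
    the (hypothetical) threshold; otherwise returns None, meaning the
    raw samples are discarded locally and nothing is exported.
    """
    if not samples_us:
        return None
    peak = max(samples_us)
    if peak <= threshold_us:
        return None
    return {
        "count": len(samples_us),
        "mean_us": mean(samples_us),
        "max_us": peak,
    }
```

For example, `aggregate([10, 20, 30], 50)` exports nothing, while
`aggregate([10, 20, 90], 50)` exports a single three-field summary in
place of the raw samples, which is how the load on the central
collector is reduced.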
+
+   It is worth noting that a network telemetry system should not be
+   intrusive to normal network operations and should avoid the pitfall
+   of the "observer effect".  That is, it should not change the network
+   behavior or affect the forwarding performance.  Moreover, high-
+   volume telemetry traffic may cause network congestion unless proper
+   isolation or traffic engineering techniques are in place, or
+   congestion control mechanisms ensure that telemetry traffic backs off
+   if it exceeds the network capacity.  [RFC8084] and [RFC8085] are
+   relevant Best Current Practices (BCPs) in this space.
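
The following sketch is not part of the RFC text. It shows one simple
way an exporter might keep telemetry traffic within a fixed budget: a
token bucket that refuses records once the budget is spent. This is a
static rate cap, not a congestion control mechanism that reacts to
network feedback as discussed in [RFC8085]. The class name, field
names, and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Hypothetical byte budget for exported telemetry records."""

    rate_bytes_per_s: float  # refill rate of the budget
    burst_bytes: float       # maximum budget that can accumulate
    tokens: float = 0.0      # bytes currently available
    last_time: float = 0.0   # time of the previous call, in seconds

    def allow(self, size_bytes: int, now: float) -> bool:
        """Return True if a record of size_bytes may be sent at time now."""
        elapsed = max(0.0, now - self.last_time)
        refill = elapsed * self.rate_bytes_per_s
        self.tokens = min(self.burst_bytes, self.tokens + refill)
        self.last_time = now
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False  # over budget: caller should drop or defer
```

With a 1000 byte/s rate and a full 1500-byte bucket, a 1000-byte record
is allowed at t=0, a second one at t=0.1 s is refused, and a third at
t=1.0 s is allowed again once the budget has refilled.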
+
+   Although in many cases a system for network telemetry involves a
+   remote data collecting and consuming entity, it is important to
+   understand that there are no inherent assumptions about how a system
+   should be architected.  While a network architecture with a
+   centralized controller (e.g., SDN) seems to be a natural fit for
+   network telemetry, network telemetry can work in distributed fashions
+   as well.  For example, telemetry data producers and consumers can
+   have a peer-to-peer relationship, in which a network node can be the
+   direct consumer of telemetry data from other nodes.
+
+2.5.  The Necessity of a Network Telemetry Framework
+
+   Network data analytics (e.g., machine learning) is applied for
+   network operation automation, relying on abundant and coherent data
+   from networks.  Data acquisition that is limited to a single source
+   and static in nature will in many cases not be sufficient to meet an
+   application's telemetry data needs.  As a result, multiple data
+   sources, involving a variety of techniques and standards, will need
+   to be integrated.  It is desirable to have a framework that
+   classifies and organizes different telemetry data sources and types,
+   defines different components of a network telemetry system and their
+   interactions, and helps coordinate and integrate multiple telemetry
+   approaches across layers.  This allows flexible combinations of data
+   for different applications, while normalizing and simplifying
+   interfaces.  In detail, such a framework would benefit the
+   development of network operation applications for the following
+   reasons:
+
+   *  Future networks, autonomous or otherwise, depend on holistic and
+      comprehensive network visibility.  Use cases and applications are
+      better when supported uniformly and coherently using an
+      integrated, converged mechanism and common telemetry data
+      representations wherever feasible.  Therefore, the protocols and
+      mechanisms should be consolidated into a minimum yet comprehensive
+      set.  A telemetry framework can help to normalize the development
+      of such techniques.
+
+   *  Network visibility presents multiple viewpoints.  For example, the
+      device viewpoint takes the network infrastructure as the
+      monitoring object from which the network topology and device
+      status can be acquired, and the traffic viewpoint takes the flows
+      or packets as the monitoring object from which the traffic quality
+      and path can be acquired.  An application may need to switch its
+      viewpoint during operation.  It may also need to correlate a
+      service and its impact on user experience (UE) to acquire the
+      comprehensive information.
+
+   *  Applications require network telemetry to be elastic in order to
+      make efficient use of network resources and reduce the impact of
+      processing related to network telemetry on network performance.
+      For example, routine network monitoring should cover the entire
+      network with a low data sampling rate.  Only when issues arise or
+      critical trends emerge should telemetry data sources be modified
+      and telemetry data rates be boosted as needed.
+
+   *  Efficient data aggregation is critical for applications to reduce
+      the overall quantity of data and improve the accuracy of analysis.
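+
+   The elasticity requirement above can be illustrated with a small
+   sketch.  It assumes a hypothetical policy in which the sampling
+   interval is divided by four while an anomaly is flagged and doubled
+   back toward a baseline otherwise; the function name, the factors,
+   and the bounds are illustrative assumptions only.
+
```python
def next_sampling_interval(current_s, anomaly, base_s=60.0, min_s=1.0):
    """Return the next sampling interval in seconds (hypothetical policy)."""
    if anomaly:
        # Issue detected: boost the data rate by sampling more often.
        return max(min_s, current_s / 4)
    # No issue: relax gradually back to the low routine rate.
    return min(base_s, current_s * 2)


interval = 60.0
history = []
for anomaly in [False, True, True, True, False, False]:
    interval = next_sampling_interval(interval, anomaly)
    history.append(interval)
```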
+
+   A telemetry framework collects all the telemetry-related work from
+   different sources and working groups within the IETF.  This makes it
+   possible to assemble a comprehensive network telemetry system and to
+   avoid repetitious or redundant work.  The framework should cover the
+   concepts and components from the standardization perspective.  This
+   document describes the modules that make up a network telemetry
+   framework and decomposes the telemetry system into a set of distinct
+   components that existing and future work can easily map to.
+
+3.  Network Telemetry Framework
+
+   The top-level network telemetry framework partitions the network
+   telemetry into four modules based on the telemetry data object source
+   and represents their relationship.  Once the network operation
+   applications acquire the data from these modules, they can apply data
+   analytics and take actions.  At the next level, the framework
+   decomposes each module into separate components.  Each of these
+   modules follows the same underlying structure, with one component
+   dedicated to the configuration of data subscriptions and data
+   sources, a second component dedicated to encoding and exporting data,
+   and a third component instrumenting the generation of telemetry
+   related to the underlying resources.  Throughout the framework, the
+   same set of abstract data-acquiring mechanisms and data types
+   (Section 3.3) are applied.  The two-level architecture with the
+   uniform data abstraction helps accurately pinpoint a protocol or
+   technique to its position in a network telemetry system or
+   disaggregates a network telemetry system into manageable parts.
+
+3.1.  Top-Level Modules
+
+   Telemetry can be applied on the forwarding plane, control plane, and
+   management plane in a network, as well as on other sources out of the
+   network, as shown in Figure 1.  Therefore, we categorize the network
+   telemetry into four distinct modules (management plane, control
+   plane, forwarding plane, and external data and event telemetry) with
+   each having its own interface to network operation applications.
+
+                   +------------------------------+
+                   |                              |
+                   |       Network Operation      |<-------+
+                   |          Applications        |        |
+                   |                              |        |
+                   +------------------------------+        |
+                           ^          ^       ^            |
+                           |          |       |            |
+                           V          V       |            V
+                   +--------------+-----------|---+  +-----------+
+                   |              | Control   |   |  |           |
+                   |              | Plane     |   |  | External  |
+                   |            <--->         |   |  | Data and  |
+                   |              | Telemetry |   |  | Event     |
+                   |  Management  |       ^   V   |  | Telemetry |
+                   |  Plane       +-------|-------+  |           |
+                   |  Telemetry   |       V       |  +-----------+
+                   |              | Forwarding    |
+                   |              | Plane         |
+                   |            <--->             |
+                   |              | Telemetry     |
+                   |              |               |
+                   +--------------+---------------+
+
+        Figure 1: Modules in Layer Category of the Network Telemetry
+                                 Framework
+
+   The rationale of this partition lies in the different telemetry data
+   objects that result in different data sources and export locations.
+   Such differences have profound implications on in-network data
+   programming and processing capability, data encoding and the
+   transport protocol, and required data bandwidth and latency.  Data
+   can be sent directly or proxied via the control and management
+   planes.  There are advantages/disadvantages to both approaches.
+
+   Note that in some cases, the network controller itself may be the
+   source of telemetry data that is unique to it or derived from the
+   telemetry data collected from the network elements.  Some of the
+   principles and taxonomy specific to the control plane and management
+   plane telemetry could also be applied to the controller when it is
+   required to provide the telemetry data to network operation
+   applications hosted outside.  The scope of this document is focused
+   on the network elements telemetry, and further details related to
+   controllers are thus out of scope.
+
+   We summarize the major differences of the four modules in Table 1.
+   They are compared from six angles:
+
+   *  Data Object
+
+   *  Data Export Location
+
+   *  Data Model
+
+   *  Data Encoding
+
+   *  Telemetry Application Protocol
+
+   *  Data Transport Method
+
+   Data Object is the target and source of each module.  Because the
+   data source varies, the location where data is most conveniently
+   exported also varies.  For example, forwarding plane data mainly
+   originates as data exported from the forwarding Application-Specific
+   Integrated Circuits (ASICs), while control plane data mainly
+   originates from the protocol daemons running on the control CPU(s).
+   For convenience and efficiency, it is preferred to export the data
+   off the device from locations near the source.  Because the locations
+   that can export data have different capabilities, different choices
+   of data models, encoding, and transport methods are made to balance
+   the performance and cost.  For example, the forwarding chip has high
+   throughput but limited capacity for processing complex data and
+   maintaining state, while the main control CPU is capable of complex
+   data and state processing but has limited bandwidth for high
+   throughput data.  As a result, the suitable telemetry protocol for
+   each module can be different.  Some representative techniques are
+   shown in the corresponding table blocks to highlight the technical
+   diversity of these modules.  Note that the selected techniques just
+   reflect the de facto state of the art and are by no means exhaustive
+   (e.g., IPFIX can also be implemented over TCP and SCTP, but that is
+   not recommended for the forwarding plane).  The key point is that one
+   cannot expect to use a universal protocol to cover all the network
+   telemetry requirements.
+
+   +=============+===============+==========+==========+===============+
+   |Module       |Management     |Control   |Forwarding|External Data  |
+   |             |Plane          |Plane     |Plane     |               |
+   +=============+===============+==========+==========+===============+
+   |Object       |configuration  |control   |flow and  |terminal,      |
+   |             |and operation  |protocol  |packet    |social, and    |
+   |             |state          |and       |QoS,      |environmental  |
+   |             |               |signaling,|traffic   |               |
+   |             |               |RIB       |stat.,    |               |
+   |             |               |          |buffer and|               |
+   |             |               |          |queue     |               |
+   |             |               |          |stat.,    |               |
+   |             |               |          |FIB,      |               |
+   |             |               |          |Access    |               |
+   |             |               |          |Control   |               |
+   |             |               |          |List (ACL)|               |
+   +-------------+---------------+----------+----------+---------------+
+   |Export       |main control   |main      |forwarding|various        |
+   |Location     |CPU            |control   |chip or   |               |
+   |             |               |CPU,      |linecard  |               |
+   |             |               |linecard  |CPU; main |               |
+   |             |               |CPU, or   |control   |               |
+   |             |               |forwarding|CPU       |               |
+   |             |               |chip      |unlikely  |               |
+   +-------------+---------------+----------+----------+---------------+
+   |Data Model   |YANG, MIB,     |YANG,     |YANG,     |YANG, custom   |
+   |             |syslog         |custom    |custom    |               |
+   +-------------+---------------+----------+----------+---------------+
+   |Data Encoding|GPB, JSON, XML |GPB, JSON,|plain text|GPB, JSON, XML,|
+   |             |               |XML, plain|          |plain text     |
+   |             |               |text      |          |               |
+   +-------------+---------------+----------+----------+---------------+
+   |Application  |gRPC, NETCONF, |gRPC,     |IPFIX,    |gRPC           |
+   |Protocol     |RESTCONF       |NETCONF,  |traffic   |               |
+   |             |               |IPFIX,    |mirroring,|               |
+   |             |               |traffic   |gRPC,     |               |
+   |             |               |mirroring |NETFLOW   |               |
+   +-------------+---------------+----------+----------+---------------+
+   |Data         |HTTP(S), TCP   |HTTP(S),  |UDP       |HTTP(S), TCP,  |
+   |Transport    |               |TCP, UDP  |          |UDP            |
+   +-------------+---------------+----------+----------+---------------+
+
+                 Table 1: Comparison of Data Object Modules
+
+   Note that the interaction with the applications that consume network
+   telemetry data can be indirect.  Some in-device data transfer is
+   possible.  For example, in the management plane telemetry, the
+   management plane will need to acquire data from the data plane.  Some
+   operational states can only be derived from data plane data sources
+   such as the interface status and statistics.  As another example,
+   obtaining control plane telemetry data may require the ability to
+   access the Forwarding Information Base (FIB) of the data plane.
+
+   On the other hand, an application may involve more than one plane and
+   interact with multiple planes simultaneously.  For example, an SLA
+   compliance application may require both the data plane telemetry and
+   the control plane telemetry.
+
+   The requirements and challenges for each module are summarized as
+   follows (note that the requirements may pertain across all telemetry
+   modules; however, we emphasize those that are most pronounced for a
+   particular plane).
+
+3.1.1.  Management Plane Telemetry
+
+   The management plane of network elements interacts with the Network
+   Management System (NMS) and provides information such as performance
+   data, network logging data, network warning and defects data, and
+   network statistics and state data.  The management plane includes
+   many protocols, including the classical SNMP and syslog.  Regardless
+   of the protocol, management plane telemetry must address the
+   following requirements:
+
+   *  Convenient Data Subscription: An application should have the
+      freedom to choose which data is exported (see Section 3.3) and the
+      means and frequency of how that data is exported (e.g., on-change
+      or periodic subscription).
+
+   *  Structured Data: For automatic network operation, machines will
+      replace humans for network data comprehension.  Data modeling
+      languages, such as YANG, can efficiently describe structured data
+      and normalize data encoding and transformation.
+
+   *  High-Speed Data Transport: In order to keep up with the velocity
+      of information, a data source needs to be able to send large
+      amounts of data at high frequency.  Compact encoding formats or
+      data compression schemes are needed to reduce the quantity of data
+      and improve the data transport efficiency.  The subscription mode,
+      by replacing the query mode, reduces the interactions between
+      clients and servers and helps to improve the data source's
+      efficiency.
+
+   *  Network Congestion Avoidance: The application must protect the
+      network from congestion with congestion control mechanisms or, at
+      minimum, with circuit breakers.  [RFC8084] and [RFC8085] provide
+      some solutions in this space.
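+
+   To make the on-change and periodic subscription modes mentioned
+   above concrete, the following sketch decides whether a data source
+   should export a value.  The class, its fields, and the data paths
+   are hypothetical; this is not the API of any particular protocol.
+
```python
class Subscription:
    """Export decision for one subscribed data path (illustrative only)."""

    def __init__(self, path, mode, period_s=None):
        if mode not in ("on-change", "periodic"):
            raise ValueError(f"unknown mode: {mode}")
        if mode == "periodic" and not period_s:
            raise ValueError("periodic mode needs a positive period_s")
        self.path = path
        self.mode = mode
        self.period_s = period_s
        self._last_value = None  # last value seen (on-change)
        self._last_sent = None   # time of last export (periodic)

    def should_export(self, value, now_s):
        if self.mode == "on-change":
            changed = value != self._last_value
            self._last_value = value
            return changed
        due = self._last_sent is None or now_s - self._last_sent >= self.period_s
        if due:
            self._last_sent = now_s
        return due


# Hypothetical paths; one observation per second.
status = Subscription("/interfaces/eth0/oper-status", "on-change")
on_change = [status.should_export(v, t)
             for t, v in enumerate(["up", "up", "down", "down", "up"])]

octets = Subscription("/interfaces/eth0/in-octets", "periodic", period_s=2)
periodic = [octets.should_export(v, t) for t, v in enumerate([1, 2, 3, 4, 5])]
```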
+
+3.1.2.  Control Plane Telemetry
+
+   The control plane telemetry refers to the health condition monitoring
+   of different network control protocols at all layers of the protocol
+   stack.  Keeping track of the operational status of these protocols is
+   beneficial for detecting, localizing, and even predicting various
+   network issues, as well as for network optimization, in real time and
+   with fine granularity.  Some particular challenges and issues faced
+   by the control plane telemetry are as follows:
+
+   *  How to correlate the End-to-End (E2E) Key Performance Indicators
+      (KPIs) to a specific layer's KPIs.  For example, IPTV users may
+      describe their UE by the video smoothness and definition.  Then in
+      case of an unusually poor UE KPI or a service disconnection, it is
+      non-trivial to delimit and pinpoint the issue in the responsible
+      protocol layer (e.g., the transport layer or the network layer),
+      the responsible protocol (e.g., IS-IS or BGP at the network
+      layer), and finally the responsible device(s) with specific
+      reasons.
+
+   *  Conventional OAM-based approaches for control plane KPI
+      measurement, which include Ping (L3), Traceroute (L3), Y.1731
+      [y1731] (L2), and so on.  One common issue behind these methods is
+      that they only measure the KPIs instead of reflecting the actual
+      running status of these protocols, making them less effective or
+      efficient for control plane troubleshooting and network
+      optimization.
+
+   *  More research is needed beyond the BGP Monitoring Protocol (BMP).
+      BMP is an example of the control plane telemetry; it is currently
+      used for monitoring BGP routes and enables rich applications, such
+      as BGP peer analysis, Autonomous System (AS) analysis, prefix
+      analysis, and security analysis.  However, the monitoring of other
+      layers, protocols, and the cross-layer, cross-protocol KPI
+      correlations are still in their infancy (e.g., IGP monitoring is
+      not as extensive as BMP), which requires further research.
+
+   Note that the requirement and solutions for network congestion
+   avoidance are also applicable to the control plane telemetry.
+
+3.1.3.  Forwarding Plane Telemetry
+
+   An effective forwarding plane telemetry system relies on the data
+   that the network device can expose.  The quality, quantity, and
+   timeliness of data must meet some stringent requirements.  This
+   raises some challenges for the network data plane devices where the
+   first-hand data originates.
+
+   *  A data plane device's main function is user traffic processing and
+      forwarding.  While supporting network visibility is important, the
+      telemetry is just an auxiliary function, and it should strive to
+      not impede normal traffic processing and forwarding (i.e., the
+      forwarding behavior should not be altered, and the trade-off
+      between forwarding performance and telemetry should be well-
+      balanced).
+
+   *  Network operation applications require end-to-end visibility
+      across various sources, which can result in a huge volume of data.
+      However, the sheer quantity of data must not exhaust the network
+      bandwidth, regardless of the data delivery approach (i.e., whether
+      through in-band or out-of-band channels).
+
+   *  The data plane devices must provide timely data with the minimum
+      possible delay.  Long processing, transport, storage, and analysis
+      delay can impact the effectiveness of the control loop and even
+      render the data useless.
+
+   *  The data should be structured, labeled, and easy for applications
+      to parse and consume.  At the same time, the data types needed by
+      applications can vary significantly.  The data plane devices need
+      to provide enough flexibility and programmability to support the
+      precise data provision for applications.
+
+   *  The data plane telemetry should support incremental deployment and
+      work even if some devices are unaware of the system.
+
+   *  The requirement and solutions for network congestion avoidance are
+      also applicable to the forwarding plane telemetry.
+
+   Although not specific to the forwarding plane, these challenges are
+   more difficult for the forwarding plane because of the limited
+   resources and flexibility.  Data plane programmability is essential
+   to support network telemetry.  Newer data plane forwarding chips are
+   equipped with advanced telemetry features and provide flexibility to
+   support customized telemetry functions.
+
+   Technique Taxonomy: This pertains to how one instruments the
+   telemetry; there can be multiple possible dimensions to classify the
+   forwarding plane telemetry techniques.
+
+   *  Active, Passive, and Hybrid: This dimension pertains to the end-
+      to-end measurement.  Active and passive methods (as well as the
+      hybrid types) are well documented in [RFC7799].  Passive methods
+      include TCPDUMP, IPFIX [RFC7011], sFlow, and traffic mirroring.
+      These methods usually have low data coverage, and improving that
+      coverage comes at a very high bandwidth cost.  On the other
+      hand, active methods include Ping, the One-Way Active Measurement
+      Protocol (OWAMP) [RFC4656], the Two-Way Active Measurement
+      Protocol (TWAMP) [RFC5357], the Simple Two-way Active Measurement
+      Protocol (STAMP) [RFC8762], and Cisco's SLA Protocol [RFC6812].
+      These methods are intrusive and only provide indirect network
+      measurements.  Hybrid methods, including IOAM [RFC9197], Alternate
+      Marking (AM) [RFC8321], and Multipoint Alternate Marking
+      [RFC8889], provide a well-balanced and more flexible approach.
+      However, these methods are also more complex to implement.
+
+   *  In-Band and Out-of-Band: Telemetry data carried in user packets
+      before being exported to a data collector is considered in-band
+      (e.g., IOAM [RFC9197]).  Telemetry data that is directly exported
+      to a data collector without modifying user packets is considered
+      out-of-band (e.g., the postcard-based approach described in
+      Appendix A.3.5).  It is also possible to have hybrid methods,
+      where only the telemetry instruction or partial data is carried by
+      user packets (e.g., AM [RFC8321]).
+
+   *  End-to-End and In-Network: End-to-end methods start from, and end
+      at, the network end hosts (e.g., Ping).  In-network methods work
+      in networks and are transparent to end hosts.  However, if needed,
+      in-network methods can be easily extended into end hosts.
+
+   *  Data Subject: Depending on the telemetry objective, the methods
+      can be flow based (e.g., IOAM [RFC9197]), path based (e.g.,
+      Traceroute), or node based (e.g., IPFIX [RFC7011]).  The data
+      objects can be packets, flow records, measurements, states, or
+      signals.
+
+3.1.4.  External Data Telemetry
+
+   Events that occur outside the boundaries of the network system are
+   another important source of network telemetry.  Correlating both
+   internal telemetry data and external events with the requirements of
+   network systems, as presented in [NMRG-ANTICIPATED-ADAPTATION],
+   provides a strategic and functional advantage to management
+   operations.
+
+   As with other sources of telemetry information, the data and events
+   must meet strict requirements, especially in terms of timeliness,
+   which is essential to properly incorporate external event information
+   into network management applications.  The specific challenges are
+   described as follows:
+
+   *  The role of the external event detector can be played by multiple
+      elements, including hardware (e.g., physical sensors, such as
+      seismometers) and software (e.g., big data sources that can
+      analyze streams of information, such as Twitter messages).  Thus,
+      the transmitted data must support different shapes but, at the
+      same time, follow a common but extensible schema.
+
+   *  Since the main function of the external event detectors is to
+      perform the notifications, their timeliness is assumed.  However,
+      once messages have been dispatched, they must be quickly collected
+      and inserted into the control plane with variable priority, which
+      is higher for important sources and events and lower for secondary
+      ones.
+
+   *  The schema used by external detectors must be easily adopted by
+      current and future devices and applications.  Therefore, it must
+      map easily to current data models, such as those defined in YANG.
+
+   *  As the communication with external entities outside the boundary
+      of a provider network may be realized over the Internet, the risk
+      of congestion is even more relevant in this context and proper
+      countermeasures must be taken.  Solutions such as network
+      transport circuit breakers are needed as well.
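+
+   The variable-priority collection of external events described above
+   can be sketched with a priority queue.  The class name, the numeric
+   priorities (lower is more urgent), and the event strings below are
+   assumptions made for illustration.
+
```python
import heapq
import itertools


class EventQueue:
    """Priority queue for external events; lower number = more urgent."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps arrival order

    def push(self, priority, event):
        heapq.heappush(self._heap, (priority, next(self._seq), event))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


q = EventQueue()
q.push(5, "social-media trend")   # secondary source
q.push(0, "seismic alert")        # important source
q.push(5, "weather advisory")     # secondary source
order = [q.pop() for _ in range(len(q))]
```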
+
+   Organizing both internal and external telemetry information together
+   will be key for the general exploitation of the management
+   possibilities of current and future network systems, as reflected in
+   the incorporation of cognitive capabilities to new hardware and
+   software (virtual) elements.
+
+3.2.  Second-Level Function Components
+
+   The telemetry module at each plane can be further partitioned into
+   five distinct conceptual components:
+
+   *  Data Query, Analysis, and Storage: This component works at the
+      network operation application block in Figure 1.  It is normally a
+      part of the network management system at the receiver side.  On
+      one hand, it is responsible for issuing data requirements.  The
+      data of interest can be modeled data through configuration or
+      custom data through programming.  The data requirements can be
+      queries for one-shot data or subscriptions for events or streaming
+      data.  On the other hand, it receives, stores, and processes the
+      returned data from network devices.  Data analysis can be
+      interactive to initiate further data queries.  This component can
+      reside in either network devices or remote controllers.  It can be
+      centralized or distributed and involve one or more instances.
+
+   *  Data Configuration and Subscription: This component manages data
+      queries on devices.  It determines the protocol and channel for
+      applications to acquire desired data.  This component is also
+      responsible for configuring the desired data that might not be
+      directly available from data sources.  The subscription data can
+      be described by models, templates, or programs.
+
+   *  Data Encoding and Export: This component determines how telemetry
+      data is delivered to the data analysis and storage component with
+      access control.  The data encoding and the transport protocol may
+      vary depending on the data export location.
+
+   *  Data Generation and Processing: The requested data needs to be
+      captured, filtered, processed, and formatted in network devices
+      from raw data sources.  This may involve in-network computing and
+      processing on either the fast path or the slow path in network
+      devices.
+
+   *  Data Object and Source: This component determines the monitoring
+      objects and original data sources provisioned in the device.  A
+      data source usually just provides raw data that needs further
+      processing.  Each data source can be considered a probe.  Some
+      data sources can be dynamically installed, while others will be
+      more static.
+
+                     +----------------------------------------+
+                   +----------------------------------------+ |
+                   |                                        | |
+                   |    Data Query, Analysis, & Storage     | |
+                   |                                        | +
+                   +-------+++------------------------------+
+                           |||                   ^^^
+                           |||                   |||
+                           ||V                   |||
+                        +--+V--------------------+++------------+
+                     +-----V---------------------+------------+ |
+                   +---------------------+-------+----------+ | |
+                   | Data Configuration  |                  | | |
+                   | & Subscription      | Data Encoding    | | |
+                   | (model, template,   | & Export         | | |
+                   |  & program)         |                  | | |
+                   +---------------------+------------------| | |
+                   |                                        | | |
+                   |           Data Generation              | | |
+                   |           & Processing                 | | |
+                   |                                        | | |
+                   +----------------------------------------| | |
+                   |                                        | | |
+                   |       Data Object and Source           | |-+
+                   |                                        |-+
+                   +----------------------------------------+
+
+          Figure 2: Components in the Network Telemetry Framework
+
+3.3.  Data Acquisition Mechanism and Type Abstraction
+
+   Broadly speaking, network data can be acquired through subscription
+   (push) and query (poll).  A subscription is a contract between
+   publisher and subscriber.  After initial setup, the subscribed data
+   is automatically delivered to registered subscribers until the
+   subscription expires.  There are two variations of subscription.  The
+   subscriptions can be predefined, or the subscribers are allowed to
+   configure and tailor the published data to their specific needs.
+
+   In contrast, queries are used when a client expects immediate and
+   one-off feedback from network devices.  The queried data may be
+   directly extracted from some specific data source or synthesized and
+   processed from raw data.  Queries work well for interactive network
+   telemetry applications.
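+
+   The difference between the two acquisition mechanisms can be
+   sketched as follows.  The DataSource class and its method names are
+   hypothetical: query() models a one-off poll, while subscribe()
+   registers a callback that receives pushed updates until the
+   subscription is cancelled.
+
```python
class DataSource:
    """Toy data source offering both query (poll) and subscription (push)."""

    def __init__(self):
        self._value = 0
        self._subscribers = []

    def query(self):
        """Poll: return the current value once, on demand."""
        return self._value

    def subscribe(self, callback):
        """Push: register a callback; return a function that cancels it."""
        self._subscribers.append(callback)
        return lambda: self._subscribers.remove(callback)

    def update(self, value):
        """New data arrives; deliver it to every registered subscriber."""
        self._value = value
        for callback in list(self._subscribers):
            callback(value)


source = DataSource()
received = []
cancel = source.subscribe(received.append)
source.update(1)
source.update(2)
cancel()                 # subscription ends; later updates are not pushed
source.update(3)
polled = source.query()  # a query still sees the latest value
```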
+
+   In general, data can be pulled (i.e., queried) whenever needed, but
+   in many cases, pushing the data (i.e., subscription) is more
+   efficient, and it can reduce the latency of a client detecting a
+   change.  From the data consumer point of view, there are four types
+   of data from network devices that a telemetry data consumer can
+   subscribe to or query:
+
+   *  Simple Data: Data that are steadily available from some datastore
+      or static probes in network devices.
+
+   *  Derived Data: Data that need to be synthesized or processed in the
+      network from raw data from one or more network devices.  The data
+      processing function can be statically or dynamically loaded into
+      network devices.
+
+   *  Event-triggered Data: Data that are conditionally acquired based
+      on the occurrence of some events.  An example of event-triggered
+      data could be an interface changing operational state between up
+      and down.  Such data can be actively pushed through subscription
+      or passively polled through query.  There are many ways to model
+      events, including using Finite State Machine (FSM) or Event
+      Condition Action (ECA) [NETMOD-ECA-POLICY].
+
+   *  Streaming Data: Data that are continuously generated.  It can be a
+      time series or the dump of databases.  For example, an interface
+      packet counter is exported every second.  The streaming data
+      reflect real-time network states and metrics and require large
+      bandwidth and processing power.  The streaming data are always
+      actively pushed to the subscribers.
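+
+   As an illustration of event-triggered data, the following sketch
+   models an event with a simple Event Condition Action rule.  It is a
+   generic toy and does not follow the data model of
+   [NETMOD-ECA-POLICY]; the rule fields, the event name, and the record
+   layout are assumptions.
+
```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EcaRule:
    event: str                          # event name the rule listens for
    condition: Callable[[dict], bool]   # predicate over the event data
    action: Callable[[dict], None]      # what to do when it holds


def dispatch(rules, event, data):
    """Run every rule matching the event; return how many actions fired."""
    fired = 0
    for rule in rules:
        if rule.event == event and rule.condition(data):
            rule.action(data)
            fired += 1
    return fired


exported = []
rules = [
    # Export a record only when an interface goes down.
    EcaRule("oper-state-change", lambda d: d["new"] == "down", exported.append),
]
down = dispatch(rules, "oper-state-change",
                {"if": "eth0", "old": "up", "new": "down"})
up = dispatch(rules, "oper-state-change",
              {"if": "eth0", "old": "down", "new": "up"})
```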
+
+   The above telemetry data types are not mutually exclusive.  Rather,
+   they are often composite.  Derived data is composed of simple data;
+   event-triggered data can be simple or derived; and streaming data can
+   be based on some recurring event.  The relationships of these data
+   types are illustrated in Figure 3.
+
+      +----------------------+     +-----------------+
+      | Event-Triggered Data |<----+ Streaming Data  |
+      +-------+---+----------+     +-----+---+-------+
+              |   |                      |   |
+              |   |                      |   |
+              |   |   +--------------+   |   |
+              |   +-->| Derived Data |<--+   |
+              |       +------+-------+       |
+              |              |               |
+              |              V               |
+              |       +--------------+       |
+              +------>| Simple Data  |<------+
+                      +--------------+
+
+                      Figure 3: Data Type Relationship
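+
+   As a reading aid, the arrows of Figure 3 can be transcribed into a
+   small dependency map.  The sketch below is illustrative only; the
+   names DEPENDS_ON and reachable are hypothetical and are not defined
+   by this framework.
+

```python
# Illustrative only: the edges transcribe the arrows drawn in Figure 3.
# DEPENDS_ON and reachable() are hypothetical names, not part of any spec.
DEPENDS_ON = {
    "event-triggered": {"derived", "simple"},
    "streaming": {"event-triggered", "derived", "simple"},
    "derived": {"simple"},
    "simple": set(),
}


def reachable(data_type):
    """Return every data type that data_type may ultimately be built from."""
    seen = set()
    stack = [data_type]
    while stack:
        for dep in DEPENDS_ON[stack.pop()]:
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen
```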
+
+   Subscription usually deals with event-triggered data and streaming
+   data, and query usually deals with simple data and derived data,
+   although the other combinations are also possible.  Advanced network
+   telemetry techniques are designed mainly for event-triggered or
+   streaming data subscription and derived data query.
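+
+   The difference between pulling and pushing the same data can be
+   sketched with a toy, in-memory data source.  This is a minimal
+   illustration under stated assumptions: the class InterfaceSource and
+   its methods are hypothetical and do not correspond to NETCONF,
+   gNMI, or any other protocol API.
+

```python
class InterfaceSource:
    """Toy data source exposing one interface's operational state.

    Hypothetical example class; not an API of any real protocol.
    """

    def __init__(self, name, state="up"):
        self.name = name
        self.state = state
        self._subscribers = []

    def query(self):
        # Pull: the client asks and receives one-off feedback.
        return {"interface": self.name, "oper-state": self.state}

    def subscribe(self, callback):
        # Push: the client registers once and is notified on each event.
        self._subscribers.append(callback)

    def set_state(self, new_state):
        # Event-triggered data: only a change of state is pushed.
        if new_state == self.state:
            return
        self.state = new_state
        for callback in self._subscribers:
            callback(self.query())


source = InterfaceSource("eth0")
received = []
source.subscribe(received.append)
source.set_state("down")   # pushed to the subscriber
source.set_state("down")   # no change, so nothing is pushed
polled = source.query()    # the same data remains available by query
```
+
+   In this sketch, the subscriber learns of the change as soon as it
+   happens, whereas a querying client learns of it only at its next
+   poll.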
+
+3.4.  Mapping Existing Mechanisms into the Framework
+
+   The following table shows how the existing mechanisms (mainly
+   published by the IETF, with an emphasis on the latest technologies)
+   are positioned in the framework.  Given the vast body of existing
+   work, we cannot provide an exhaustive list, so the mechanisms in the
+   table should be considered as just examples.
+   Also, some comprehensive protocols and techniques may cover multiple
+   aspects or modules of the framework, so a name in a block only
+   emphasizes one particular characteristic of it.  More details about
+   some listed mechanisms can be found in Appendix A.
+
+     +===============+=================+================+============+
+     |               | Management      | Control Plane  | Forwarding |
+     |               | Plane           |                | Plane      |
+     +===============+=================+================+============+
+     | data          | gNMI, NETCONF,  | gNMI, NETCONF, | NETCONF,   |
+     | configuration | RESTCONF, SNMP, | RESTCONF,      | RESTCONF,  |
+     | and subscribe | YANG-Push       | YANG-Push      | YANG-Push  |
+     +---------------+-----------------+----------------+------------+
+     | data          | MIB, YANG       | YANG           | IOAM,      |
+     | generation    |                 |                | PSAMP,     |
+     | and process   |                 |                | PBT, AM    |
+     +---------------+-----------------+----------------+------------+
+     | data encoding | gRPC, HTTP, TCP | BMP, TCP       | IPFIX, UDP |
+     | and export    |                 |                |            |
+     +---------------+-----------------+----------------+------------+
+
+                       Table 2: Existing Work Mapping
+
+   Although the framework is generally suitable for any network
+   environment, multi-domain telemetry has some unique challenges
+   that deserve further architectural consideration, which is out of the
+   scope of this document.
+
+4.  Evolution of Network Telemetry Applications
+
+   Network telemetry is an evolving technical area.  As networks move
+   towards automated operation, network telemetry applications undergo
+   several stages of evolution, each of which adds a new layer of
+   requirements to the underlying network telemetry techniques.  Each
+   stage is built upon the techniques adopted by the previous stages
+   plus some new requirements.
+
+   Stage 0 - Static Telemetry:  The telemetry data source and type are
+      determined at design time.  The network operator can only
+      configure how to use it with limited flexibility.
+
+   Stage 1 - Dynamic Telemetry:  The custom telemetry data can be
+      dynamically programmed or configured at runtime without
+      interrupting the network operation, allowing a trade-off among
+      resource, performance, flexibility, and coverage.
+
+   Stage 2 - Interactive Telemetry:  The network operator can
+      continuously customize and fine-tune the telemetry data in real
+      time to reflect the network operation's visibility requirements.
+      Compared with Stage 1, the changes are frequent and based on
+      real-time feedback.  At this stage, some tasks can be automated,
+      but human operators still need to sit in the middle to make
+      decisions.
+
+   Stage 3 - Closed-Loop Telemetry:  The telemetry is free from the
+      interference of human operators, except for generating the
+      reports.  The intelligent network operation engine automatically
+      issues the telemetry data requests, analyzes the data, and updates
+      the network operations in closed control loops.
+
+   Existing technologies are ready for Stages 0 and 1.  Individual
+   applications for Stages 2 and 3 are also possible now.  However, the
+   future autonomic networks may need a comprehensive operation
+   management system that works at Stages 2 and 3 to cover all the
+   network operation tasks.  A well-defined network telemetry framework
+   is the first step towards this direction.
+
+5.  Security Considerations
+
+   The complexity of network telemetry raises significant security
+   implications.  For example, telemetry data can be manipulated to
+   exhaust various network resources at each plane as well as the data
+   consumer; falsified or tampered data can mislead the decision-making
+   process and paralyze networks; and wrong configuration and
+   programming for telemetry are equally harmful.  Telemetry data is
+   highly sensitive because it exposes a lot of information about the
+   network and its configuration.  Some of that information (e.g., exact
+   details of what software and patches have been installed) can make
+   designing attacks against the network much easier and allow an
+   attacker to determine whether a device may be subject to unprotected
+   security vulnerabilities.
+
+   Given that this document has proposed a framework for network
+   telemetry and the telemetry mechanisms discussed are more extensive
+   (in both message frequency and traffic amount) than the conventional
+   network OAM concepts, we must also anticipate that new security
+   considerations may arise.  A number of techniques already
+   exist for securing the forwarding plane, control plane, and
+   management plane in a network, but it is important to consider if any
+   new threat vectors are now being enabled via the use of network
+   telemetry procedures and mechanisms.
+
+   This document proposes a conceptual architecture for collecting,
+   transporting, and analyzing a wide variety of data sources in support
+   of network applications.  The protocols, data formats, and
+   configurations chosen to implement this framework will dictate the
+   specific security considerations.  These considerations may include:
+
+   *  Telemetry framework trust and policy models;
+
+   *  Role management and access control for enabling and disabling
+      telemetry capabilities;
+
+   *  Protocol transport used for telemetry data and its inherent
+      security capabilities;
+
+   *  Telemetry data stores, storage encryption, methods of access, and
+      retention practices;
+
+   *  Tracking telemetry events and any abnormalities that might
+      identify malicious attacks using telemetry interfaces;
+
+   *  Authentication and integrity protection of telemetry data to make
+      data more trustworthy; and
+
+   *  Segregating the telemetry data traffic from the data traffic
+      carried over the network (e.g., historically management access and
+      management data may be carried via an independent management
+      network).
+
+   Some security considerations highlighted above may be minimized or
+   negated with policy management of network telemetry.  In a network
+   telemetry deployment, it would be advantageous to separate telemetry
+   capabilities into different classes of policies, i.e., Role-Based
+   Access Control and Event-Condition-Action policies.  Also, potential
+   conflicts between network telemetry mechanisms must be detected
+   accurately and resolved quickly to avoid unnecessary network
+   telemetry traffic propagation escalating into an unintended or
+   intended denial-of-service attack.
+
+   Further study of the security issues will be required, and it is
+   expected that the security mechanisms and protocols will be developed
+   and deployed along with a network telemetry system.
+
+6.  IANA Considerations
+
+   This document has no IANA actions.
+
+7.  Informative References
+
+   [gnmi]     Shakir, R., Shaikh, A., Borman, P., Hines, M., Lebsack,
+              C., and C. Marrow, "gRPC Network Management Interface",
+              IETF 98, March 2017,
+              <https://datatracker.ietf.org/meeting/98/materials/slides-
+              98-rtgwg-gnmi-intro-draft-openconfig-rtgwg-gnmi-spec-00>.
+
+   [gpb]      Google Developers, "Protocol Buffers",
+              <https://developers.google.com/protocol-buffers>.
+
+   [grpc]     gRPC, "gRPC: A high performance, open source universal RPC
+              framework", <https://grpc.io>.
+
+   [IPPM-IOAM-DIRECT-EXPORT]
+              Song, H., Gafni, B., Zhou, T., Li, Z., Brockners, F.,
+              Bhandari, S., Ed., Sivakolundu, R., and T. Mizrahi, Ed.,
+              "In-situ OAM Direct Exporting", Work in Progress,
+              Internet-Draft, draft-ietf-ippm-ioam-direct-export-07, 13
+              October 2021, <https://datatracker.ietf.org/doc/html/
+              draft-ietf-ippm-ioam-direct-export-07>.
+
+   [IPPM-POSTCARD-BASED-TELEMETRY]
+              Song, H., Mirsky, G., Filsfils, C., Abdelsalam, A., Zhou,
+              T., Li, Z., Mishra, G., Shin, J., and K. Lee, "In-Situ OAM
+              Marking-based Direct Export", Work in Progress, Internet-
+              Draft, draft-song-ippm-postcard-based-telemetry-12, 12 May
+              2022, <https://datatracker.ietf.org/doc/html/draft-song-
+              ippm-postcard-based-telemetry-12>.
+
+   [NETCONF-DISTRIB-NOTIF]
+              Zhou, T., Zheng, G., Voit, E., Graf, T., and P. Francois,
+              "Subscription to Distributed Notifications", Work in
+              Progress, Internet-Draft, draft-ietf-netconf-distributed-
+              notif-03, 10 January 2022,
+              <https://datatracker.ietf.org/doc/html/draft-ietf-netconf-
+              distributed-notif-03>.
+
+   [NETCONF-UDP-NOTIF]
+              Zheng, G., Zhou, T., Graf, T., Francois, P., Feng, A. H.,
+              and P. Lucente, "UDP-based Transport for Configured
+              Subscriptions", Work in Progress, Internet-Draft, draft-
+              ietf-netconf-udp-notif-05, 4 March 2022,
+              <https://datatracker.ietf.org/doc/html/draft-ietf-netconf-
+              udp-notif-05>.
+
+   [NETMOD-ECA-POLICY]
+              Wu, Q., Bryskin, I., Birkholz, H., Liu, X., and B. Claise,
+              "A YANG Data model for ECA Policy Management", Work in
+              Progress, Internet-Draft, draft-ietf-netmod-eca-policy-01,
+              19 February 2021, <https://datatracker.ietf.org/doc/html/
+              draft-ietf-netmod-eca-policy-01>.
+
+   [NMRG-ANTICIPATED-ADAPTATION]
+              Martinez-Julia, P., Ed., "Exploiting External Event
+              Detectors to Anticipate Resource Requirements for the
+              Elastic Adaptation of SDN/NFV Systems", Work in Progress,
+              Internet-Draft, draft-pedro-nmrg-anticipated-adaptation-
+              02, 29 June 2018, <https://datatracker.ietf.org/doc/html/
+              draft-pedro-nmrg-anticipated-adaptation-02>.
+
+   [NMRG-IBN-CONCEPTS-DEFINITIONS]
+              Clemm, A., Ciavaglia, L., Granville, L. Z., and J.
+              Tantsura, "Intent-Based Networking - Concepts and
+              Definitions", Work in Progress, Internet-Draft, draft-
+              irtf-nmrg-ibn-concepts-definitions-09, 24 March 2022,
+              <https://datatracker.ietf.org/doc/html/draft-irtf-nmrg-
+              ibn-concepts-definitions-09>.
+
+   [OPSAWG-DNP4IQ]
+              Song, H., Ed. and J. Gong, "Requirements for Interactive
+              Query with Dynamic Network Probes", Work in Progress,
+              Internet-Draft, draft-song-opsawg-dnp4iq-01, 19 June 2017,
+              <https://datatracker.ietf.org/doc/html/draft-song-opsawg-
+              dnp4iq-01>.
+
+   [OPSAWG-IFIT-FRAMEWORK]
+              Song, H., Qin, F., Chen, H., Jin, J., and J. Shin, "A
+              Framework for In-situ Flow Information Telemetry", Work in
+              Progress, Internet-Draft, draft-song-opsawg-ifit-
+              framework-17, 22 February 2022,
+              <https://datatracker.ietf.org/doc/html/draft-song-opsawg-
+              ifit-framework-17>.
+
+   [RFC1157]  Case, J., Fedor, M., Schoffstall, M., and J. Davin,
+              "Simple Network Management Protocol (SNMP)", RFC 1157,
+              DOI 10.17487/RFC1157, May 1990,
+              <https://www.rfc-editor.org/info/rfc1157>.
+
+   [RFC2578]  McCloghrie, K., Ed., Perkins, D., Ed., and J.
+              Schoenwaelder, Ed., "Structure of Management Information
+              Version 2 (SMIv2)", STD 58, RFC 2578,
+              DOI 10.17487/RFC2578, April 1999,
+              <https://www.rfc-editor.org/info/rfc2578>.
+
+   [RFC2981]  Kavasseri, R., Ed., "Event MIB", RFC 2981,
+              DOI 10.17487/RFC2981, October 2000,
+              <https://www.rfc-editor.org/info/rfc2981>.
+
+   [RFC3176]  Phaal, P., Panchen, S., and N. McKee, "InMon Corporation's
+              sFlow: A Method for Monitoring Traffic in Switched and
+              Routed Networks", RFC 3176, DOI 10.17487/RFC3176,
+              September 2001, <https://www.rfc-editor.org/info/rfc3176>.
+
+   [RFC3411]  Harrington, D., Presuhn, R., and B. Wijnen, "An
+              Architecture for Describing Simple Network Management
+              Protocol (SNMP) Management Frameworks", STD 62, RFC 3411,
+              DOI 10.17487/RFC3411, December 2002,
+              <https://www.rfc-editor.org/info/rfc3411>.
+
+   [RFC3416]  Presuhn, R., Ed., "Version 2 of the Protocol Operations
+              for the Simple Network Management Protocol (SNMP)",
+              STD 62, RFC 3416, DOI 10.17487/RFC3416, December 2002,
+              <https://www.rfc-editor.org/info/rfc3416>.
+
+   [RFC3877]  Chisholm, S. and D. Romascanu, "Alarm Management
+              Information Base (MIB)", RFC 3877, DOI 10.17487/RFC3877,
+              September 2004, <https://www.rfc-editor.org/info/rfc3877>.
+
+   [RFC3954]  Claise, B., Ed., "Cisco Systems NetFlow Services Export
+              Version 9", RFC 3954, DOI 10.17487/RFC3954, October 2004,
+              <https://www.rfc-editor.org/info/rfc3954>.
+
+   [RFC4656]  Shalunov, S., Teitelbaum, B., Karp, A., Boote, J., and M.
+              Zekauskas, "A One-way Active Measurement Protocol
+              (OWAMP)", RFC 4656, DOI 10.17487/RFC4656, September 2006,
+              <https://www.rfc-editor.org/info/rfc4656>.
+
+   [RFC5085]  Nadeau, T., Ed. and C. Pignataro, Ed., "Pseudowire Virtual
+              Circuit Connectivity Verification (VCCV): A Control
+              Channel for Pseudowires", RFC 5085, DOI 10.17487/RFC5085,
+              December 2007, <https://www.rfc-editor.org/info/rfc5085>.
+
+   [RFC5357]  Hedayat, K., Krzanowski, R., Morton, A., Yum, K., and J.
+              Babiarz, "A Two-Way Active Measurement Protocol (TWAMP)",
+              RFC 5357, DOI 10.17487/RFC5357, October 2008,
+              <https://www.rfc-editor.org/info/rfc5357>.
+
+   [RFC5424]  Gerhards, R., "The Syslog Protocol", RFC 5424,
+              DOI 10.17487/RFC5424, March 2009,
+              <https://www.rfc-editor.org/info/rfc5424>.
+
+   [RFC6020]  Bjorklund, M., Ed., "YANG - A Data Modeling Language for
+              the Network Configuration Protocol (NETCONF)", RFC 6020,
+              DOI 10.17487/RFC6020, October 2010,
+              <https://www.rfc-editor.org/info/rfc6020>.
+
+   [RFC6241]  Enns, R., Ed., Bjorklund, M., Ed., Schoenwaelder, J., Ed.,
+              and A. Bierman, Ed., "Network Configuration Protocol
+              (NETCONF)", RFC 6241, DOI 10.17487/RFC6241, June 2011,
+              <https://www.rfc-editor.org/info/rfc6241>.
+
+   [RFC6812]  Chiba, M., Clemm, A., Medley, S., Salowey, J., Thombare,
+              S., and E. Yedavalli, "Cisco Service-Level Assurance
+              Protocol", RFC 6812, DOI 10.17487/RFC6812, January 2013,
+              <https://www.rfc-editor.org/info/rfc6812>.
+
+   [RFC7011]  Claise, B., Ed., Trammell, B., Ed., and P. Aitken,
+              "Specification of the IP Flow Information Export (IPFIX)
+              Protocol for the Exchange of Flow Information", STD 77,
+              RFC 7011, DOI 10.17487/RFC7011, September 2013,
+              <https://www.rfc-editor.org/info/rfc7011>.
+
+   [RFC7258]  Farrell, S. and H. Tschofenig, "Pervasive Monitoring Is an
+              Attack", BCP 188, RFC 7258, DOI 10.17487/RFC7258, May
+              2014, <https://www.rfc-editor.org/info/rfc7258>.
+
+   [RFC7276]  Mizrahi, T., Sprecher, N., Bellagamba, E., and Y.
+              Weingarten, "An Overview of Operations, Administration,
+              and Maintenance (OAM) Tools", RFC 7276,
+              DOI 10.17487/RFC7276, June 2014,
+              <https://www.rfc-editor.org/info/rfc7276>.
+
+   [RFC7540]  Belshe, M., Peon, R., and M. Thomson, Ed., "Hypertext
+              Transfer Protocol Version 2 (HTTP/2)", RFC 7540,
+              DOI 10.17487/RFC7540, May 2015,
+              <https://www.rfc-editor.org/info/rfc7540>.
+
+   [RFC7575]  Behringer, M., Pritikin, M., Bjarnason, S., Clemm, A.,
+              Carpenter, B., Jiang, S., and L. Ciavaglia, "Autonomic
+              Networking: Definitions and Design Goals", RFC 7575,
+              DOI 10.17487/RFC7575, June 2015,
+              <https://www.rfc-editor.org/info/rfc7575>.
+
+   [RFC7799]  Morton, A., "Active and Passive Metrics and Methods (with
+              Hybrid Types In-Between)", RFC 7799, DOI 10.17487/RFC7799,
+              May 2016, <https://www.rfc-editor.org/info/rfc7799>.
+
+   [RFC7854]  Scudder, J., Ed., Fernando, R., and S. Stuart, "BGP
+              Monitoring Protocol (BMP)", RFC 7854,
+              DOI 10.17487/RFC7854, June 2016,
+              <https://www.rfc-editor.org/info/rfc7854>.
+
+   [RFC7950]  Bjorklund, M., Ed., "The YANG 1.1 Data Modeling Language",
+              RFC 7950, DOI 10.17487/RFC7950, August 2016,
+              <https://www.rfc-editor.org/info/rfc7950>.
+
+   [RFC8040]  Bierman, A., Bjorklund, M., and K. Watsen, "RESTCONF
+              Protocol", RFC 8040, DOI 10.17487/RFC8040, January 2017,
+              <https://www.rfc-editor.org/info/rfc8040>.
+
+   [RFC8084]  Fairhurst, G., "Network Transport Circuit Breakers",
+              BCP 208, RFC 8084, DOI 10.17487/RFC8084, March 2017,
+              <https://www.rfc-editor.org/info/rfc8084>.
+
+   [RFC8085]  Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage
+              Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085,
+              March 2017, <https://www.rfc-editor.org/info/rfc8085>.
+
+   [RFC8259]  Bray, T., Ed., "The JavaScript Object Notation (JSON) Data
+              Interchange Format", STD 90, RFC 8259,
+              DOI 10.17487/RFC8259, December 2017,
+              <https://www.rfc-editor.org/info/rfc8259>.
+
+   [RFC8321]  Fioccola, G., Ed., Capello, A., Cociglio, M., Castaldelli,
+              L., Chen, M., Zheng, L., Mirsky, G., and T. Mizrahi,
+              "Alternate-Marking Method for Passive and Hybrid
+              Performance Monitoring", RFC 8321, DOI 10.17487/RFC8321,
+              January 2018, <https://www.rfc-editor.org/info/rfc8321>.
+
+   [RFC8639]  Voit, E., Clemm, A., Gonzalez Prieto, A., Nilsen-Nygaard,
+              E., and A. Tripathy, "Subscription to YANG Notifications",
+              RFC 8639, DOI 10.17487/RFC8639, September 2019,
+              <https://www.rfc-editor.org/info/rfc8639>.
+
+   [RFC8641]  Clemm, A. and E. Voit, "Subscription to YANG Notifications
+              for Datastore Updates", RFC 8641, DOI 10.17487/RFC8641,
+              September 2019, <https://www.rfc-editor.org/info/rfc8641>.
+
+   [RFC8671]  Evens, T., Bayraktar, S., Lucente, P., Mi, P., and S.
+              Zhuang, "Support for Adj-RIB-Out in the BGP Monitoring
+              Protocol (BMP)", RFC 8671, DOI 10.17487/RFC8671, November
+              2019, <https://www.rfc-editor.org/info/rfc8671>.
+
+   [RFC8762]  Mirsky, G., Jun, G., Nydell, H., and R. Foote, "Simple
+              Two-Way Active Measurement Protocol", RFC 8762,
+              DOI 10.17487/RFC8762, March 2020,
+              <https://www.rfc-editor.org/info/rfc8762>.
+
+   [RFC8889]  Fioccola, G., Ed., Cociglio, M., Sapio, A., and R. Sisto,
+              "Multipoint Alternate-Marking Method for Passive and
+              Hybrid Performance Monitoring", RFC 8889,
+              DOI 10.17487/RFC8889, August 2020,
+              <https://www.rfc-editor.org/info/rfc8889>.
+
+   [RFC8924]  Aldrin, S., Pignataro, C., Ed., Kumar, N., Ed., Krishnan,
+              R., and A. Ghanwani, "Service Function Chaining (SFC)
+              Operations, Administration, and Maintenance (OAM)
+              Framework", RFC 8924, DOI 10.17487/RFC8924, October 2020,
+              <https://www.rfc-editor.org/info/rfc8924>.
+
+   [RFC9069]  Evens, T., Bayraktar, S., Bhardwaj, M., and P. Lucente,
+              "Support for Local RIB in the BGP Monitoring Protocol
+              (BMP)", RFC 9069, DOI 10.17487/RFC9069, February 2022,
+              <https://www.rfc-editor.org/info/rfc9069>.
+
+   [RFC9197]  Brockners, F., Ed., Bhandari, S., Ed., and T. Mizrahi,
+              Ed., "Data Fields for In Situ Operations, Administration,
+              and Maintenance (IOAM)", RFC 9197, DOI 10.17487/RFC9197,
+              May 2022, <https://www.rfc-editor.org/info/rfc9197>.
+
+   [W3C.REC-xml-20081126]
+              Bray, T., Paoli, J., Sperberg-McQueen, M., Maler, E., and
+              F. Yergeau, "Extensible Markup Language (XML) 1.0 (Fifth
+              Edition)", World Wide Web Consortium Recommendation REC-
+              xml-20081126, November 2008,
+              <https://www.w3.org/TR/2008/REC-xml-20081126>.
+
+   [y1731]    ITU-T, "Operations, administration and maintenance (OAM)
+              functions and mechanisms for Ethernet-based networks",
+              ITU-T Recommendation G.8013/Y.1731, August 2015,
+              <https://www.itu.int/rec/T-REC-Y.1731/en>.
+
+Appendix A.  A Survey on Existing Network Telemetry Techniques
+
+   In this non-normative appendix, we provide an overview of some
+   existing techniques and standard proposals for each network telemetry
+   module.
+
+A.1.  Management Plane Telemetry
+
+A.1.1.  Push Extensions for NETCONF
+
+   NETCONF [RFC6241] is a popular network management protocol
+   recommended by the IETF.  Its core strength is managing
+   configuration, but it can also be used for data collection.
+   YANG-Push [RFC8639] [RFC8641] extends NETCONF and enables subscriber
+   applications to request a continuous, customized stream of updates
+   from a YANG datastore.  Providing such visibility into changes made
+   to YANG configuration and operational objects enables new
+   capabilities based on the remote mirroring of configuration and
+   operational state.  Moreover, a distributed data collection mechanism
+   [NETCONF-DISTRIB-NOTIF] via a UDP-based publication channel
+   [NETCONF-UDP-NOTIF] provides enhanced efficiency for the NETCONF-
+   based telemetry.
+
+A.1.2.  gRPC Network Management Interface
+
+   gRPC Network Management Interface (gNMI) [gnmi] is a network
+   management protocol based on the gRPC [grpc] Remote Procedure Call
+   (RPC) framework.  With a single gRPC service definition, both
+   configuration and telemetry can be covered.  gRPC is an open-source
+   micro-service communication framework based on HTTP/2 [RFC7540].  It
+   provides a number of capabilities that are well-suited for network
+   telemetry, including:
+
+   *  A full-duplex streaming transport model; when combined with a
+      binary encoding mechanism, it provides good telemetry efficiency.
+
+   *  A higher-level feature consistency across platforms that common
+      HTTP/2 libraries typically do not provide.  This characteristic is
+      especially valuable because telemetry data collectors normally
+      reside on a large variety of platforms.
+
+   *  A built-in load-balancing and failover mechanism.
+
+A.2.  Control Plane Telemetry
+
+A.2.1.  BGP Monitoring Protocol
+
+   BMP [RFC7854] is used to monitor BGP sessions and is intended to
+   provide a convenient interface for obtaining route views.
+
+   BGP routing information is sent from the monitored device(s) to the
+   BMP monitoring station over a BMP TCP session.  The BGP peers are
+   monitored through the BMP Peer Up and Peer Down notifications.  The
+   BGP routes (including Adj-RIB-In [RFC7854], Adj-RIB-Out [RFC8671],
+   and Local RIB [RFC9069]) are encapsulated in the BMP Route Monitoring
+   Message and the BMP Route Mirroring Message, providing both an
+   initial table dump and real-time route updates.  In addition, BGP
+   statistics are reported through the BMP Stats Report Message, which
+   can be either timer-triggered or event-driven.
+   Future BMP extensions could further enrich BGP monitoring
+   applications.
+
+A.3.  Data Plane Telemetry
+
+A.3.1.  Alternate-Marking (AM) Technology
+
+   The Alternate-Marking method enables efficient measurements of packet
+   loss, delay, and jitter in both IP and overlay networks, as presented
+   in [RFC8321] and [RFC8889].
+
+   This technique can be applied to point-to-point and multipoint-to-
+   multipoint flows.  Alternate Marking creates batches of packets by
+   alternating the value of 1 bit (or a label) of the packet header.
+   These batches of packets are unambiguously recognized over the
+   network, and the comparison of packet counters for each batch allows
+   the packet loss calculation.  The same idea can be applied to delay
+   measurement by selecting ad hoc packets with a marking bit dedicated
+   for delay measurements.
+
+   The Alternate-Marking method needs two counters per marking period
+   for each monitored flow.  For instance, with n measurement points
+   and m monitored flows, the number of packet counters needed for each
+   time interval is on the order of n*m*2 (1 per color).
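+
+   The counter arithmetic can be illustrated with a short sketch.  It
+   assumes that each measurement point exposes one packet count per
+   closed batch; the function names and the sample counts are
+   hypothetical and are not taken from [RFC8321].
+

```python
def batch_loss(upstream_count, downstream_count):
    """Packets lost between two measurement points for one closed batch."""
    return upstream_count - downstream_count


def counters_needed(points, flows, colors=2):
    """Counters per time interval: one per color, per flow, per point."""
    return points * flows * colors


# Hypothetical per-batch packet counts, read after each marking period
# has ended.  Consecutive batches alternate color: 0, 1, 0, ...
upstream = [1000, 980, 1010]
downstream = [1000, 975, 1010]

# Comparing the counters of the same batch at both points gives the loss.
losses = [batch_loss(u, d) for u, d in zip(upstream, downstream)]
```
+
+   With these sample values, only the second batch shows a loss (5
+   packets), and counters_needed(4, 10) gives 80 counters for 4
+   measurement points and 10 flows.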
+
+   Since networks offer rich sets of network performance measurement
+   data (e.g., packet counters), conventional approaches run into
+   limitations.  The bottleneck is the generation and export of the data
+   and the amount of data that can be reasonably collected from the
+   network.  In addition, management tasks related to determining and
+   configuring which data to generate lead to significant deployment
+   challenges.
+
+   The Multipoint Alternate-Marking approach, described in [RFC8889],
+   aims to resolve this issue and make the performance monitoring more
+   flexible in case a detailed analysis is not needed.
+
+   An application orchestrates network performance measurement tasks
+   across the network to allow for optimized monitoring.  The
+   application can choose how roughly or precisely to configure
+   measurement points depending on the application's requirements.
+
+   Using Alternate Marking, it is possible to monitor a Multipoint
+   Network without in-depth examination by using Network Clustering
+   (subnetworks that are portions of the entire network that preserve
+   the same property of the entire network, called clusters).  So in the
+   case where there is packet loss or the delay is too high, the
+   specific filtering criteria could be applied to gather a more
+   detailed analysis by using a different combination of clusters up to
+   a per-flow measurement as described in the Alternate-Marking document
+   [RFC8321].
+
+   In summary, an application can configure end-to-end network
+   monitoring.  If the network does not experience issues, this
+   approximate monitoring is good enough and is very cheap in terms of
+   network resources.  However, in case of problems, the application
+   becomes aware of the issues from this approximate monitoring and, in
+   order to localize the portion of the network that has issues,
+   configures the measurement points more extensively, allowing more
+   detailed monitoring to be performed.  After the detection and
+   resolution of the problem, the initial approximate monitoring can be
+   used again.
+
+A.3.2.  Dynamic Network Probe
+
+   A hardware-based Dynamic Network Probe (DNP) [OPSAWG-DNP4IQ] provides
+   a programmable means to customize the data that an application
+   collects from the data plane.  A direct benefit of DNP is the
+   reduction of the exported data.  A full DNP solution covers several
+   components including data source, data subscription, and data
+   generation.  The data subscription needs to define the derived data
+   that can be composed and derived from raw data sources.  The data
+   generation takes advantage of the moderate in-network computing to
+   produce the desired data.
+
+   While DNP can introduce unprecedented flexibility to the data plane
+   telemetry, it also faces some challenges.  It requires a flexible
+   data plane that can be dynamically reprogrammed at runtime.  The
+   programming Application Programming Interface (API) is yet to be
+   defined.
+
+A.3.3.  IP Flow Information Export (IPFIX) Protocol
+
+   Traffic on a network can be seen as a set of flows passing through
+   network elements.  IPFIX [RFC7011] provides a means of transmitting
+   traffic flow information for administrative or other purposes.  A
+   typical IPFIX-enabled system includes a pool of Metering Processes
+   that collect data packets at one or more Observation Points,
+   optionally filter them, and aggregate information about these
+   packets.  An Exporter then gathers each of the Observation Points
+   together into an Observation Domain and sends this information via
+   the IPFIX protocol to a Collector.
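+
+   The metering and export steps can be sketched as follows.  This is a
+   simplified illustration, not the IPFIX wire format of [RFC7011]: the
+   functions meter and export, the three-field flow key, and the use of
+   a plain list as the Collector are all assumptions made for the
+   example.
+

```python
from collections import defaultdict


def meter(packets, keep=lambda pkt: True):
    """Toy Metering Process: optionally filter, then aggregate per flow."""
    flows = defaultdict(lambda: {"packets": 0, "octets": 0})
    for pkt in packets:
        if not keep(pkt):
            continue
        key = (pkt["src"], pkt["dst"], pkt["proto"])
        flows[key]["packets"] += 1
        flows[key]["octets"] += pkt["length"]
    return dict(flows)


def export(observation_domain_id, flows, collector):
    """Toy Exporter: hand each flow record to a Collector (here, a list)."""
    for key, counters in sorted(flows.items()):
        collector.append(
            {"domain": observation_domain_id, "flow": key, **counters}
        )


# Hypothetical packets seen at an Observation Point.
packets = [
    {"src": "192.0.2.1", "dst": "198.51.100.7", "proto": 6, "length": 1500},
    {"src": "192.0.2.1", "dst": "198.51.100.7", "proto": 6, "length": 400},
    {"src": "192.0.2.9", "dst": "198.51.100.7", "proto": 17, "length": 120},
]
collector = []
export(1, meter(packets), collector)
```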
+
+A.3.4.  In Situ OAM
+
+   Classical passive and active monitoring and measurement techniques
+   are either inaccurate or resource consuming.  It is preferable to
+   directly acquire data associated with a flow's packets when the
+   packets pass through a network.  IOAM [RFC9197], a data generation
+   technique, embeds a new instruction header in user packets, and the
+   instruction directs the network nodes to add the requested data to
+   the packets.  Thus, at the path's end, the packet's experience gained
+   on the entire forwarding path can be collected.  Such firsthand data
+   is invaluable to many network OAM applications.
+
+   However, IOAM also faces some challenges.  The issues on performance
+   impact, security, scalability and overhead limits, encapsulation
+   difficulties in some protocols, and cross-domain deployment need to
+   be addressed.
+
+A.3.5.  Postcard-Based Telemetry
+
+   Postcard-Based Telemetry (PBT), as embodied in IOAM Direct Export
+   (DEX) [IPPM-IOAM-DIRECT-EXPORT] and IOAM Marking
+   [IPPM-POSTCARD-BASED-TELEMETRY], is a complementary technique to the
+   passport-based IOAM [RFC9197].  PBT directly exports data at each
+   node through an independent packet.  At the cost of higher bandwidth
+   overhead and the need for data correlation, PBT shows several unique
+   advantages.  It can also help to identify packet drop location in
+   case a packet is dropped on its forwarding path.
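+
+   The correlation step and the drop-location use case can be sketched
+   in Python.  This is an illustration only: PostcardCollector,
+   forward, and locate_drop are hypothetical names, and the sketch
+   assumes that the expected path is known, that every postcard
+   carries a packet identifier and hop index, and that no postcard is
+   lost in transit.

```python
from collections import defaultdict


class PostcardCollector:
    """Correlates postcards from different nodes by packet identifier."""

    def __init__(self):
        self._cards = defaultdict(list)

    def receive(self, packet_id, node, hop_index):
        self._cards[packet_id].append((hop_index, node))

    def path_seen(self, packet_id):
        """Nodes that reported this packet, ordered by hop index."""
        return [node for _, node in sorted(self._cards.get(packet_id, []))]


def forward(packet_id, path, collector, drop_at=None):
    """Simulate forwarding: each node that handles the packet exports a postcard."""
    for hop_index, node in enumerate(path):
        if node == drop_at:
            return False  # dropped before this node could export anything
        collector.receive(packet_id, node, hop_index)
    return True


def locate_drop(expected_path, seen):
    """Return (last reporting node, first silent node), or None if all reported."""
    if len(seen) == len(expected_path):
        return None
    last = seen[-1] if seen else None
    return last, expected_path[len(seen)]
```

+   Under these assumptions, the drop is narrowed down to the segment
+   between the last node that reported and the first node that did
+   not.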
+
+A.3.6.  Existing OAM for Specific Data Planes
+
+   Various data planes raise unique OAM requirements.  The IETF has
+   published OAM technique and framework documents (e.g., [RFC8924] and
+   [RFC5085]) targeting different data planes such as Multiprotocol
+   Label Switching (MPLS), L2 Virtual Private Network (VPN), Network
+   Virtualization over Layer 3 (NVO3), Virtual Extensible LAN (VXLAN),
+   Bit Index Explicit Replication (BIER), Service Function Chaining
+   (SFC), Segment Routing (SR), and Deterministic Networking (DETNET).
+   The aforementioned data plane telemetry techniques can be used to
+   enhance the OAM capability on such data planes.
+
+A.4.  External Data and Event Telemetry
+
+A.4.1.  Sources of External Events
+
+   To ensure that the information provided by external event detectors
+   and used by the network management solutions is meaningful for
+   management purposes, the network telemetry framework must ensure that
+   such detectors (sources) are easily connected to the management
+   solutions (sinks).  This requires the specification of a list of
+   potential external data sources that could be of interest in network
+   management and matching it to the connectors and/or interfaces
+   required to connect them.
+
+   Categories of external event sources that may be of interest to
+   network management include:
+
+   *  Smart objects and sensors.  With the consolidation of the Internet
+      of Things (IoT), any network system will have many smart objects
+      attached to its physical surroundings and logical operation
+      environments.  Most of these objects will be essentially based on
+      sensors of many kinds (e.g., temperature, humidity, and presence),
+      and the information they provide can be very useful for the
+      management of the network, even when they are not specifically
+      deployed for such a purpose.  Elements of this source type will
+      usually provide a specific protocol for interaction, especially
+      one of the protocols related to IoT, such as the Constrained
+      Application Protocol (CoAP).
+
+   *  Online news reporters.  Several online news services have the
+      ability to provide an enormous quantity of information about
+      different events occurring in the world.  Some of those events can
+      have an impact on the network system managed by a specific
+      framework; therefore, such information may be of interest to the
+      management solution.  For instance, diverse security reports, such
+      as Common Vulnerabilities and Exposures (CVEs), can be issued by
+      the corresponding authority and used by the management solution to
+      update the managed system, if needed.  Instead of a specific
+      protocol and data format, the sources of this kind of information
+      usually follow a relaxed but structured format.  This format will
+      be part of both the ontology and information model of the
+      telemetry framework.
+
+   *  Global event analyzers.  The advance of big data analyzers
+      provides a huge amount of information and, more interestingly, the
+      identification of events detected by analyzing many data streams
+      from different origins.  In contrast with the other types of
+      sources, which are focused on specific events, the detectors of
+      this source type will detect generic events.  For example, an
+      unexpected development during a sports event can draw wide
+      interest, and many people then connect to sites that are
+      reporting on the event.  The underlying networks supporting the
+      services that cover the event can be affected by such a
+      situation, so their management solutions should be aware of it.
+      In contrast with the other source types, a new information model,
+      format, and reporting protocol are required to integrate the
+      detectors of this type with the management solution.
+
+   Additional detector types can be added to the system, but generally
+   they will be the result of composing the properties offered by these
+   main classes.
+
+A.4.2.  Connectors and Interfaces
+
+   To allow external event detectors to be properly integrated with
+   other management solutions, both elements must expose interfaces and
+   protocols that are subject to their particular objective.  Since
+   external event detectors will be focused on providing their
+   information to their main consumers, which generally will not be
+   limited to the network management solutions, the framework must
+   include the definition of the required connectors to ensure that the
+   interconnection between detectors (sources) and their consumers
+   within the management systems (sinks) is effective.
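+
+   One way to picture such a connector is as a set of per-source
+   adapters that normalize detector output into a common event record
+   and deliver it to subscribed sinks.  The Python sketch below is a
+   hypothetical design, not something this framework specifies:
+   ExternalEvent, Connector, the adapter functions, and the input
+   dictionaries are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalEvent:
    """Assumed common record handed to management-side sinks."""
    source_type: str
    kind: str
    detail: dict


def from_sensor(reading):
    """Adapt a hypothetical sensor reading, e.g. {"sensor": "t1", "celsius": 41.5}."""
    return ExternalEvent("sensor", "temperature", dict(reading))


def from_security_report(report):
    """Adapt a hypothetical advisory, e.g. {"id": "EXAMPLE-1", "product": "router-os"}."""
    return ExternalEvent("news", "vulnerability", dict(report))


class Connector:
    """Joins detectors (sources) to management consumers (sinks)."""

    def __init__(self):
        self._adapters = {}
        self._sinks = []

    def register_source(self, name, adapter):
        self._adapters[name] = adapter

    def subscribe(self, sink):
        self._sinks.append(sink)

    def publish(self, name, raw):
        """Normalize raw detector output and deliver it to every sink."""
        event = self._adapters[name](raw)
        for sink in self._sinks:
            sink(event)
        return event
```

+   Keeping the adapters separate from the sinks means a detector that
+   also serves other consumers needs no change; only a new adapter is
+   registered on the management side.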
+
+   In some situations, the interconnection between external event
+   detectors and the management system is via the management plane.  For
+   those situations, there will be a special connector that provides the
+   typical interfaces found in most other elements connected to the
+   management plane.  For instance, the interfaces could accomplish this
+   with a specific data model (YANG) and a specific telemetry protocol,
+   such as NETCONF, YANG-Push, or gRPC.
+
+Acknowledgments
+
+   We would like to thank Rob Wilton, Greg Mirsky, Randy Presuhn, Joe
+   Clarke, Victor Liu, James Guichard, Uri Blumenthal, Giuseppe
+   Fioccola, Yunan Gu, Parviz Yegani, Young Lee, Qin Wu, Gyan Mishra,
+   Ben Schwartz, Alexey Melnikov, Michael Scharf, Dhruv Dhody, Martin
+   Duke, Roman Danyliw, Warren Kumari, Sheng Jiang, Lars Eggert, Éric
+   Vyncke, Jean-Michel Combes, Erik Kline, Benjamin Kaduk, and many
+   others who have provided helpful comments and suggestions to improve
+   this document.
+
+Contributors
+
+   The other contributors to this document are Tianran Zhou, Zhenbin Li,
+   Zhenqiang Li, Daniel King, Adrian Farrel, and Alexander Clemm.
+
+Authors' Addresses
+
+   Haoyu Song
+   Futurewei
+   United States of America
+   Email: haoyu.song@futurewei.com
+
+
+   Fengwei Qin
+   China Mobile
+   China
+   Email: qinfengwei@chinamobile.com
+
+
+   Pedro Martinez-Julia
+   NICT
+   Japan
+   Email: pedro@nict.go.jp
+
+
+   Laurent Ciavaglia
+   Rakuten Mobile
+   France
+   Email: laurent.ciavaglia@rakuten.com
+
+
+   Aijun Wang
+   China Telecom
+   China
+   Email: wangaj3@chinatelecom.cn