diff options
Diffstat (limited to 'doc/rfc/rfc3549.txt')
-rw-r--r-- | doc/rfc/rfc3549.txt | 1851 |
1 files changed, 1851 insertions, 0 deletions
diff --git a/doc/rfc/rfc3549.txt b/doc/rfc/rfc3549.txt new file mode 100644 index 0000000..166211c --- /dev/null +++ b/doc/rfc/rfc3549.txt @@ -0,0 +1,1851 @@ + + + + + + +Network Working Group J. Salim +Request for Comments: 3549 Znyx Networks +Category: Informational H. Khosravi + Intel + A. Kleen + Suse + A. Kuznetsov + INR/Swsoft + July 2003 + + + Linux Netlink as an IP Services Protocol + +Status of this Memo + + This memo provides information for the Internet community. It does + not specify an Internet standard of any kind. Distribution of this + memo is unlimited. + +Copyright Notice + + Copyright (C) The Internet Society (2003). All Rights Reserved. + +Abstract + + This document describes Linux Netlink, which is used in Linux both as + an intra-kernel messaging system as well as between kernel and user + space. The focus of this document is to describe Netlink's + functionality as a protocol between a Forwarding Engine Component + (FEC) and a Control Plane Component (CPC), the two components that + define an IP service. As a result of this focus, this document + ignores other uses of Netlink, including its use as a intra-kernel + messaging system, as an inter-process communication scheme (IPC), or + as a configuration tool for other non-networking or non-IP network + services (such as decnet, etc.). + + This document is intended as informational in the context of prior + art for the ForCES IETF working group. + + + + + + + + + + + + + +Salim, et. al. Informational [Page 1] + +RFC 3549 Linux Netlink as an IP Services Protocol July 2003 + + +Table of Contents + + 1. Introduction ............................................... 2 + 1.1. Definitions ........................................... 3 + 1.1.1. Control Plane Components (CPCs)................ 3 + 1.1.2. Forwarding Engine Components (FECs)............ 3 + 1.1.3. IP Services ................................... 5 + 2. Netlink Architecture ....................................... 7 + 2.1. Netlink Logical Model ................................. 8 + 2.2. Message Format......................................... 9 + 2.3. Protocol Model......................................... 9 + 2.3.1. Service Addressing............................. 10 + 2.3.2. Netlink Message Header......................... 10 + 2.3.3. FE System Services' Templates.................. 13 + 3. Currently Defined Netlink IP Services....................... 16 + 3.1. IP Service NETLINK_ROUTE............................... 16 + 3.1.1. Network Route Service Module................... 16 + 3.1.2. Neighbor Setup Service Module.................. 20 + 3.1.3. Traffic Control Service........................ 21 + 3.2. IP Service NETLINK_FIREWALL............................ 23 + 3.3. IP Service NETLINK_ARPD................................ 27 + 4. References.................................................. 27 + 4.1. Normative References................................... 27 + 4.2. Informative References................................. 28 + 5. Security Considerations..................................... 28 + 6. Acknowledgements............................................ 28 + Appendix 1: Sample Service Hierarchy .......................... 29 + Appendix 2: Sample Protocol for the Foo IP Service............. 30 + Appendix 2a: Interacting with Other IP services................. 30 + Appendix 3: Examples........................................... 31 + Authors' Addresses.............................................. 32 + Full Copyright Statement........................................ 33 + +1. Introduction + + The concept of IP Service control-forwarding separation was first + introduced in the early 1990s by the BSD 4.4 routing sockets [9]. + The focus at that time was a simple IP(v4) forwarding service and how + the CPC, either via a command line configuration tool or a dynamic + route daemon, could control forwarding tables for that IPv4 + forwarding service. + + The IP world has evolved considerably since those days. Linux + Netlink, when observed from a service provisioning and management + point of view, takes routing sockets one step further by breaking the + barrier of focus around IPv4 forwarding. Since the Linux 2.1 kernel, + Netlink has been providing the IP service abstraction to a few + services other than the classical RFC 1812 IPv4 forwarding. + + + +Salim, et. al. Informational [Page 2] + +RFC 3549 Linux Netlink as an IP Services Protocol July 2003 + + + The motivation for this document is not to list every possible + service for which Netlink is applied. In fact, we leave out a lot of + services (multicast routing, tunneling, policy routing, etc). Neither + is this document intended to be a tutorial on Netlink. The idea is + to explain the overall Netlink view with a special focus on the + mandatory building blocks within the ForCES charter (i.e., IPv4 and + QoS). This document also serves to capture prior art to many + mechanisms that are useful within the context of ForCES. The text is + limited to a subset of what is available in kernel 2.4.6, the newest + kernel when this document was first written. It is also limited to + IPv4 functionality. + + We first give some concept definitions and then describe how Netlink + fits in. + +1.1. Definitions + + A Control Plane (CP) is an execution environment that may have + several sub-components, which we refer to as CPCs. Each CPC provides + control for a different IP service being executed by a Forwarding + Engine (FE) component. This relationship means that there might be + several CPCs on a physical CP, if it is controlling several IP + services. In essence, the cohesion between a CP component and an FE + component is the service abstraction. + +1.1.1. Control Plane Components (CPCs) + + Control Plane Components encompass signalling protocols, with + diversity ranging from dynamic routing protocols, such as OSPF [5], + to tag distribution protocols, such as CR-LDP [7]. Classical + management protocols and activities also fall under this category. + These include SNMP [6], COPS [4], and proprietary CLI/GUI + configuration mechanisms. The purpose of the control plane is to + provide an execution environment for the above-mentioned activities + with the ultimate goal being to configure and manage the second + Network Element (NE) component: the FE. The result of the + configuration defines the way that packets traversing the FE are + treated. + +1.1.2. Forwarding Engine Components (FECs) + + The FE is the entity of the NE that incoming packets (from the + network into the NE) first encounter. + + The FE's service-specific component massages the packet to provide it + with a treatment to achieve an IP service, as defined by the Control + Plane Components for that IP service. Different services will + utilize different FECs. Service modules may be chained to achieve a + + + +Salim, et. al. Informational [Page 3] + +RFC 3549 Linux Netlink as an IP Services Protocol July 2003 + + + more complex service (refer to the Linux FE model, described later). + When built for providing a specific service, the FE service component + will adhere to a forwarding model. + +1.1.2.1. Linux IP Forwarding Engine Model + + ____ +---------------+ + +->-| FW |---> | TCP, UDP, ... | + | +----+ +---------------+ + | | + ^ v + | _|_ + +----<----+ | FW | + | +----+ + ^ | + | Y + To host From host + stack stack + ^ | + |_____ | +Ingress ^ Y +device ____ +-------+ +|---|--+ ____ +--------+ Egress +->----->| FW |-->|Ingress|-->---->| Forw- |->| FW |->| Egress | device + +----+ | TC | | ard | +----+ | TC |--> + +-------+ +-------+ +--------+ + + The figure above shows the Linux FE model per device. The only + mandatory part of the datapath is the Forwarding module, which is RFC + 1812 conformant. The different Firewall (FW), Ingress Traffic + Control, and Egress Traffic Control building blocks are not mandatory + in the datapath and may even be used to bypass the RFC 1812 module. + These modules are shown as simple blocks in the datapath but, in + fact, could be multiple cascaded, independent submodules within the + indicated blocks. More information can be found at [10] and [11]. + + Packets arriving at the ingress device first pass through a firewall + module. Packets may be dropped, munged, etc., by the firewall + module. The incoming packet, depending on set policy, may then be + passed via an Ingress Traffic Control module. Metering and policing + activities are contained within the Ingress TC module. Packets may + be dropped, depending on metering results and policing policies, at + this module. Next, the packet is subjected to the only non-optional + module, the RFC 1812-conformant Forwarding module. The packet may be + dropped if it is nonconformant (to the many RFCs complementing 1812 + and 1122). This module is a juncture point at which packets destined + to the forwarding NE may be sent up to the host stack. + + + + + +Salim, et. al. Informational [Page 4] + +RFC 3549 Linux Netlink as an IP Services Protocol July 2003 + + + Packets that are not for the NE may further traverse a policy routing + submodule (within the forwarding module), if so provisioned. Another + firewall module is walked next. The firewall module can drop or + munge/transform packets, depending on the configured sub-modules + encountered and their policies. If all goes well, the Egress TC + module is accessed next. + + The Egress TC may drop packets for policing, scheduling, congestion + control, or rate control reasons. Egress queues exist at this point + and any of the drops or delays may happen before or after the packet + is queued. All is dependent on configured module algorithms and + policies. + +1.1.3. IP Services + + An IP service is the treatment of an IP packet within the NE. This + treatment is provided by a combination of both the CPC and the FEC. + + The time span of the service is from the moment when the packet + arrives at the NE to the moment that it departs. In essence, an IP + service in this context is a Per-Hop Behavior. CP components running + on NEs define the end-to-end path control for a service by running + control/signaling protocol/management-applications. These + distributed CPCs unify the end-to-end view of the IP service. As + noted above, these CP components then define the behavior of the FE + (and therefore the NE) for a described packet. + + A simple example of an IP service is the classical IPv4 Forwarding. + In this case, control components, such as routing protocols (OSPF, + RIP, etc.) and proprietary CLI/GUI configurations, modify the FE's + forwarding tables in order to offer the simple service of forwarding + packets to the next hop. Traditionally, NEs offering this simple + service are known as routers. + + + + + + + + + + + + + + + + + + +Salim, et. al. Informational [Page 5] + +RFC 3549 Linux Netlink as an IP Services Protocol July 2003 + + + In the diagram below, we show a simple FE<->CP setup to provide an + example of the classical IPv4 service with an extension to do some + basic QoS egress scheduling and illustrate how the setup fits in this + described model. + + Control Plane (CP) + .------------------------------------ + | /^^^^^^\ /^^^^^^\ | + | | | | COPS |-\ | + | | ospfd | | PEP | \ | + | \ / \_____/ | | + /------\_____/ | / | + | | | | / | + | |_________\__________|____|_________| + | | | | + ****************************************** + Forwarding ************* Netlink layer ************ + Engine (FE) ***************************************** + .-------------|-----------|----------|---|------------- + | IPv4 forwarding | | | + | FE Service / / | + | Component / / | + | ---------------/---------------/--------- | + | | | / | | + packet | | --------|-- ----|----- | packet + in | | | IPv4 | | Egress | | out + -->--->|------>|---->|Forwarding|----->| QoS |--->| ---->|-> + | | | | | Scheduler| | | + | | ----------- ---------- | | + | | | | + | --------------------------------------- | + | | + ------------------------------------------------------- + + The above diagram illustrates ospfd, an OSPF protocol control daemon, + and a COPS Policy Enforcement Point (PEP) as distinct CPCs. The IPv4 + FE component includes the IPv4 Forwarding service module as well as + the Egress Scheduling service module. Another service might add a + policy forwarder between the IPv4 forwarder and the QoS egress + scheduler. A simpler classical service would have constituted only + the IPv4 forwarder. + + Over the years, it has become important to add additional services to + routers to meet emerging requirements. More complex services + extending classical forwarding have been added and standardized. + These newer services might go beyond the layer 3 contents of the + packet header. However, the name "router", although a misnomer, is + still used to describe these NEs. Services (which may look beyond + + + +Salim, et. al. Informational [Page 6] + +RFC 3549 Linux Netlink as an IP Services Protocol July 2003 + + + the classical L3 service headers) include firewalling, QoS in + Diffserv and RSVP, NAT, policy based routing, etc. Newer control + protocols or management activities are introduced with these new + services. + + One extreme definition of a IP service is something for which a + service provider would be able to charge. + +2. Netlink Architecture + + Control of IP service components is defined by using templates. + + The FEC and CPC participate to deliver the IP service by + communicating using these templates. The FEC might continuously get + updates from the Control Plane Component on how to operate the + service (e.g., for v4 forwarding or for route additions or + deletions). + + The interaction between the FEC and the CPC, in the Netlink context, + defines a protocol. Netlink provides mechanisms for the CPC + (residing in user space) and the FEC (residing in kernel space) to + have their own protocol definition -- kernel space and user space + just mean different protection domains. Therefore, a wire protocol + is needed to communicate. The wire protocol is normally provided by + some privileged service that is able to copy between multiple + protection domains. We will refer to this service as the Netlink + service. The Netlink service can also be encapsulated in a different + transport layer, if the CPC executes on a different node than the + FEC. The FEC and CPC, using Netlink mechanisms, may choose to define + a reliable protocol between each other. By default, however, Netlink + provides an unreliable communication. + + Note that the FEC and CPC can both live in the same memory protection + domain and use the connect() system call to create a path to the peer + and talk to each other. We will not discuss this mechanism further + other than to say that it is available. Throughout this document, we + will refer interchangeably to the FEC to mean kernel space and the + CPC to mean user space. This denomination is not meant, however, to + restrict the two components to these protection domains or to the + same compute node. + + Note: Netlink allows participation in IP services by both service + components. + + + + + + + + +Salim, et. al. Informational [Page 7] + +RFC 3549 Linux Netlink as an IP Services Protocol July 2003 + + +2.1. Netlink Logical Model + + In the diagram below we show a simple FEC<->CPC logical relationship. + We use the IPv4 forwarding FEC (NETLINK_ROUTE, which is discussed + further below) as an example. + + Control Plane (CP) + .------------------------------------ + | /^^^^^\ /^^^^^\ | + | | | / CPC-2 \ | + | | CPC-1 | | COPS | | + | | ospfd | | PEP | | + | | / \____ _/ | + | \____/ | | + | | | | + ****************************************| + ************* BROADCAST WIRE ************ + FE---------- *****************************************. + | IPv4 forwarding | | | | + | FEC | | | | + | --------------/ ----|-----------|-------- | + | | / | | | | + | | .-------. .-------. .------. | | + | | |Ingress| | IPv4 | |Egress| | | + | | |police | |Forward| | QoS | | | + | | |_______| |_______| |Sched | | | + | | ------ | | + | --------------------------------------- | + | | + ----------------------------------------------------- + + Netlink logically models FECs and CPCs in the form of nodes + interconnected to each other via a broadcast wire. + + The wire is specific to a service. The example above shows the + broadcast wire belonging to the extended IPv4 forwarding service. + + Nodes (CPCs or FECs as illustrated above) connect to the wire and + register to receive specific messages. CPCs may connect to multiple + wires if it helps them to control the service better. All nodes + (CPCs and FECs) dump packets on the broadcast wire. Packets can be + discarded by the wire if they are malformed or not specifically + formatted for the wire. Dropped packets are not seen by any of the + nodes. The Netlink service may signal an error to the sender if it + detects a malformatted Netlink packet. + + + + + + +Salim, et. al. Informational [Page 8] + +RFC 3549 Linux Netlink as an IP Services Protocol July 2003 + + + Packets sent on the wire can be broadcast, multicast, or unicast. + FECs or CPCs register for specific messages of interest for + processing or just monitoring purposes. + + Appendices 1 and 2 have a high level overview of this interaction. + +2.2. Message Format + + There are three levels to a Netlink message: The general Netlink + message header, the IP service specific template, and the IP service + specific data. + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | | + | Netlink message header | + | | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | | + | IP Service Template | + | | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | | + | IP Service specific data in TLVs | + | | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + The Netlink message is used to communicate between the FEC and CPC + for parameterization of the FECs, asynchronous event notification of + FEC events to the CPCs, and statistics querying/gathering (typically + by a CPC). + + The Netlink message header is generic for all services, whereas the + IP Service Template header is specific to a service. Each IP Service + then carries parameterization data (CPC->FEC direction) or response + (FEC->CPC direction). These parameterizations are in TLV (Type- + Length-Value) format and are unique to the service. + + The different parts of the netlink message are discussed in the + following sections. + +2.3. Protocol Model + + This section expands on how Netlink provides the mechanism for + service-oriented FEC and CPC interaction. + + + + + +Salim, et. al. Informational [Page 9] + +RFC 3549 Linux Netlink as an IP Services Protocol July 2003 + + +2.3.1. Service Addressing + + Access is provided by first connecting to the service on the FE. The + connection is achieved by making a socket() system call to the + PF_NETLINK domain. Each FEC is identified by a protocol number. One + may open either SOCK_RAW or SOCK_DGRAM type sockets, although Netlink + does not distinguish between the two. The socket connection provides + the basis for the FE<->CP addressing. + + Connecting to a service is followed (at any point during the life of + the connection) by either issuing a service-specific command (from + the CPC to the FEC, mostly for configuration purposes), issuing a + statistics-collection command, or subscribing/unsubscribing to + service events. Closing the socket terminates the transaction. + Refer to Appendices 1 and 2 for examples. + +2.3.2. Netlink Message Header + + Netlink messages consist of a byte stream with one or multiple + Netlink headers and an associated payload. If the payload is too big + to fit into a single message it, can be split over multiple Netlink + messages, collectively called a multipart message. For multipart + messages, the first and all following headers have the NLM_F_MULTI + Netlink header flag set, except for the last header which has the + Netlink header type NLMSG_DONE. + + The Netlink message header is shown below. + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Length | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Type | Flags | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Sequence Number | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Process ID (PID) | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + + + + + + + + + + + +Salim, et. al. Informational [Page 10] + +RFC 3549 Linux Netlink as an IP Services Protocol July 2003 + + + The fields in the header are: + + Length: 32 bits + The length of the message in bytes, including the header. + + Type: 16 bits + This field describes the message content. + It can be one of the standard message types: + NLMSG_NOOP Message is ignored. + NLMSG_ERROR The message signals an error and the payload + contains a nlmsgerr structure. This can be looked + at as a NACK and typically it is from FEC to CPC. + NLMSG_DONE Message terminates a multipart message. + + Individual IP services specify more message types, e.g., + NETLINK_ROUTE service specifies several types, such as RTM_NEWLINK, + RTM_DELLINK, RTM_GETLINK, RTM_NEWADDR, RTM_DELADDR, RTM_NEWROUTE, + RTM_DELROUTE, etc. + + Flags: 16 bits + The standard flag bits used in Netlink are + NLM_F_REQUEST Must be set on all request messages (typically + from user space to kernel space) + NLM_F_MULTI Indicates the message is part of a multipart + message terminated by NLMSG_DONE + NLM_F_ACK Request for an acknowledgment on success. + Typical direction of request is from user + space (CPC) to kernel space (FEC). + NLM_F_ECHO Echo this request. Typical direction of + request is from user space (CPC) to kernel + space (FEC). + + Additional flag bits for GET requests on config information in + the FEC. + NLM_F_ROOT Return the complete table instead of a + single entry. + NLM_F_MATCH Return all entries matching criteria passed in + message content. + NLM_F_ATOMIC Return an atomic snapshot of the table being + referenced. This may require special + privileges because it has the potential to + interrupt service in the FE for a longer time. + + Convenience macros for flag bits: + NLM_F_DUMP This is NLM_F_ROOT or'ed with NLM_F_MATCH + + + + + + +Salim, et. al. Informational [Page 11] + +RFC 3549 Linux Netlink as an IP Services Protocol July 2003 + + + Additional flag bits for NEW requests + NLM_F_REPLACE Replace existing matching config object with + this request. + NLM_F_EXCL Don't replace the config object if it already + exists. + NLM_F_CREATE Create config object if it doesn't already + exist. + NLM_F_APPEND Add to the end of the object list. + + For those familiar with BSDish use of such operations in route + sockets, the equivalent translations are: + + - BSD ADD operation equates to NLM_F_CREATE or-ed + with NLM_F_EXCL + - BSD CHANGE operation equates to NLM_F_REPLACE + - BSD Check operation equates to NLM_F_EXCL + - BSD APPEND equivalent is actually mapped to + NLM_F_CREATE + + Sequence Number: 32 bits + The sequence number of the message. + + Process ID (PID): 32 bits + The PID of the process sending the message. The PID is used by the + kernel to multiplex to the correct sockets. A PID of zero is used + when sending messages to user space from the kernel. + +2.3.2.1. Mechanisms for Creating Protocols + + One could create a reliable protocol between an FEC and a CPC by + using the combination of sequence numbers, ACKs, and retransmit + timers. Both sequence numbers and ACKs are provided by Netlink; + timers are provided by Linux. + + One could create a heartbeat protocol between the FEC and CPC by + using the ECHO flags and the NLMSG_NOOP message. + + + + + + + + + + + + + + + +Salim, et. al. Informational [Page 12] + +RFC 3549 Linux Netlink as an IP Services Protocol July 2003 + + +2.3.2.2. The ACK Netlink Message + + This message is actually used to denote both an ACK and a NACK. + Typically, the direction is from FEC to CPC (in response to an ACK + request message). However, the CPC should be able to send ACKs back + to FEC when requested. The semantics for this are IP service + specific. + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Netlink message header | + | type = NLMSG_ERROR | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Error code | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | OLD Netlink message header | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + Error code: integer (typically 32 bits) + + An error code of zero indicates that the message is an ACK response. + An ACK response message contains the original Netlink message header, + which can be used to compare against (sent sequence numbers, etc). + + A non-zero error code message is equivalent to a Negative ACK (NACK). + In such a situation, the Netlink data that was sent down to the + kernel is returned appended to the original Netlink message header. + An error code printable via the perror() is also set (not in the + message header, rather in the executing environment state variable). + +2.3.3. FE System Services' Templates + + These are services that are offered by the system for general use by + other services. They include the ability to configure, gather + statistics and listen to changes in shared resources. IP address + management, link events, etc. fit here. We create this section for + these services for logical separation, despite the fact that they are + accessed via the NETLINK_ROUTE FEC. The reason that they exist + within NETLINK_ROUTE is due to historical cruft: the BSD 4.4 Route + Sockets implemented them as part of the IPv4 forwarding sockets. + + + + + + + + + + +Salim, et. al. Informational [Page 13] + +RFC 3549 Linux Netlink as an IP Services Protocol July 2003 + + +2.3.3.1. Network Interface Service Module + + This service provides the ability to create, remove, or get + information about a specific network interface. The network + interface can be either physical or virtual and is network protocol + independent (e.g., an x.25 interface can be defined via this + message). The Interface service message template is shown below. + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Family | Reserved | Device Type | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Interface Index | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Device Flags | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Change Mask | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + Family: 8 bits + This is always set to AF_UNSPEC. + + Device Type: 16 bits + This defines the type of the link. The link could be Ethernet, a + tunnel, etc. We are interested only in IPv4, although the link type + is L3 protocol-independent. + + Interface Index: 32 bits + Uniquely identifies interface. + + Device Flags: 32 bits + + IFF_UP Interface is administratively up. + IFF_BROADCAST Valid broadcast address set. + IFF_DEBUG Internal debugging flag. + IFF_LOOPBACK Interface is a loopback interface. + IFF_POINTOPOINT Interface is a point-to-point link. + IFF_RUNNING Interface is operationally up. + IFF_NOARP No ARP protocol needed for this interface. + IFF_PROMISC Interface is in promiscuous mode. + IFF_NOTRAILERS Avoid use of trailers. + IFF_ALLMULTI Receive all multicast packets. + IFF_MASTER Master of a load balancing bundle. + IFF_SLAVE Slave of a load balancing bundle. + IFF_MULTICAST Supports multicast. + + + + + +Salim, et. al. Informational [Page 14] + +RFC 3549 Linux Netlink as an IP Services Protocol July 2003 + + + IFF_PORTSEL Is able to select media type via ifmap. + IFF_AUTOMEDIA Auto media selection active. + IFF_DYNAMIC Interface was dynamically created. + + Change Mask: 32 bits + Reserved for future use. Must be set to 0xFFFFFFFF. + + Applicable attributes: + Attribute Description + .......................................................... + IFLA_UNSPEC Unspecified. + IFLA_ADDRESS Hardware address interface L2 address. + IFLA_BROADCAST Hardware address L2 broadcast + address. + IFLA_IFNAME ASCII string device name. + IFLA_MTU MTU of the device. + IFLA_LINK ifindex of link to which this device + is bound. + IFLA_QDISC ASCII string defining egress root + queuing discipline. + IFLA_STATS Interface statistics. + + Netlink message types specific to this service: + RTM_NEWLINK, RTM_DELLINK, and RTM_GETLINK + +2.3.3.2. IP Address Service Module + + This service provides the ability to add, remove, or receive + information about an IP address associated with an interface. The + address provisioning service message template is shown below. + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Family | Length | Flags | Scope | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Interface Index | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + Family: 8 bits + Address Family: AF_INET for IPv4; and AF_INET6 for IPV6. + + Length: 8 bits + The length of the address mask. + + Flags: 8 bits + IFA_F_SECONDARY For secondary address (alias interface). + + + + +Salim, et. al. Informational [Page 15] + +RFC 3549 Linux Netlink as an IP Services Protocol July 2003 + + + IFA_F_PERMANENT For a permanent address set by the user. + When this is not set, it means the address + was dynamically created (e.g., by stateless + autoconfiguration). + IFA_F_DEPRECATED Defines deprecated (IPV4) address. + IFA_F_TENTATIVE Defines tentative (IPV4) address (duplicate + address detection is still in progress). + Scope: 8 bits + The address scope in which the address stays valid. + SCOPE_UNIVERSE: Global scope. + SCOPE_SITE (IPv6 only): Only valid within this site. + SCOPE_LINK: Valid only on this device. + SCOPE_HOST: Valid only on this host. + + le attributes: + + Attribute Description + IFA_UNSPEC Unspecified. + IFA_ADDRESS Raw protocol address of interface. + IFA_LOCAL Raw protocol local address. + IFA_LABEL ASCII string name of the interface. + IFA_BROADCAST Raw protocol broadcast address. + IFA_ANYCAST Raw protocol anycast address. + IFA_CACHEINFO Cache address information. + + Netlink messages specific to this service: RTM_NEWADDR, + RTM_DELADDR, and RTM_GETADDR. + +3. Currently Defined Netlink IP Services + + Although there are many other IP services defined that are using + Netlink, as mentioned earlier, we will talk only about a handful of + those integrated into kernel version 2.4.6. These are: + + NETLINK_ROUTE, NETLINK_FIREWALL, and NETLINK_ARPD. + +3.1. IP Service NETLINK_ROUTE + + This service allows CPCs to modify the IPv4 routing table in the + Forwarding Engine. It can also be used by CPCs to receive routing + updates, as well as to collect statistics. + +3.1.1. Network Route Service Module + + This service provides the ability to create, remove or receive + information about a network route. The service message template is + shown below. + + + + +Salim, et. al. Informational [Page 16] + +RFC 3549 Linux Netlink as an IP Services Protocol July 2003 + + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Family | Src length | Dest length | TOS | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Table ID | Protocol | Scope | Type | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Flags | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + + Family: 8 bits + Address Family: AF_INET for IPv4; and AF_INET6 for IPV6. + + Src length: 8 bits + Prefix length of source IP address. + + Dest length: 8 bits + Prefix length of destination IP address. + + TOS: 8 bits + The 8-bit TOS (should be deprecated to make room for DSCP). + Table ID: 8 bits + Table identifier. Up to 255 route tables are supported. + RT_TABLE_UNSPEC An unspecified routing table. + RT_TABLE_DEFAULT The default table. + RT_TABLE_MAIN The main table. + RT_TABLE_LOCAL The local table. + + The user may assign arbitrary values between + RT_TABLE_UNSPEC(0) and RT_TABLE_DEFAULT(253). + + Protocol: 8 bits + Identifies what/who added the route. + Protocol Route origin. + .............................................. + RTPROT_UNSPEC Unknown. + RTPROT_REDIRECT By an ICMP redirect. + RTPROT_KERNEL By the kernel. + RTPROT_BOOT During bootup. + RTPROT_STATIC By the administrator. + + Values larger than RTPROT_STATIC(4) are not interpreted by the + kernel, they are just for user information. They may be used to + tag the source of a routing information or to distinguish between + multiple routing daemons. See <linux/rtnetlink.h> for the + routing daemon identifiers that are already assigned. + + + + +Salim, et. al. Informational [Page 17] + +RFC 3549 Linux Netlink as an IP Services Protocol July 2003 + + + Scope: 8 bits + Route scope (valid distance to destination). + RT_SCOPE_UNIVERSE Global route. + RT_SCOPE_SITE Interior route in the + local autonomous system. + RT_SCOPE_LINK Route on this link. + RT_SCOPE_HOST Route on the local host. + RT_SCOPE_NOWHERE Destination does not exist. + + + The values between RT_SCOPE_UNIVERSE(0) and RT_SCOPE_SITE(200) + are available to the user. + + Type: 8 bits + The type of route. + + Route type Description + ---------------------------------------------------- + RTN_UNSPEC Unknown route. + RTN_UNICAST A gateway or direct route. + RTN_LOCAL A local interface route. + RTN_BROADCAST A local broadcast route + (sent as a broadcast). + RTN_ANYCAST An anycast route. + RTN_MULTICAST A multicast route. + RTN_BLACKHOLE A silent packet dropping route. + RTN_UNREACHABLE An unreachable destination. + Packets dropped and host + unreachable ICMPs are sent to the + originator. + RTN_PROHIBIT A packet rejection route. Packets + are dropped and communication + prohibited ICMPs are sent to the + originator. + RTN_THROW When used with policy routing, + continue routing lookup in another + table. Under normal routing, + packets are dropped and net + unreachable ICMPs are sent to the + originator. + RTN_NAT A network address translation + rule. + RTN_XRESOLVE Refer to an external resolver (not + implemented). + + + + + + + +Salim, et. al. Informational [Page 18] + +RFC 3549 Linux Netlink as an IP Services Protocol July 2003 + + + Flags: 32 bits + Further qualify the route. + RTM_F_NOTIFY If the route changes, notify the + user. + RTM_F_CLONED Route is cloned from another route. + RTM_F_EQUALIZE Allow randomization of next hop + path in multi-path routing + (currently not implemented). + + Attributes applicable to this service: + Attribute Description + --------------------------------------------------- + RTA_UNSPEC Ignored. + RTA_DST Protocol address for route + destination address. + RTA_SRC Protocol address for route source + address. + RTA_IIF Input interface index. + RTA_OIF Output interface index. + RTA_GATEWAY Protocol address for the gateway of + the route + RTA_PRIORITY Priority of route. + RTA_PREFSRC Preferred source address in cases + where more than one source address + could be used. + RTA_METRICS Route metrics attributed to route + and associated protocols (e.g., + RTT, initial TCP window, etc.). + RTA_MULTIPATH Multipath route next hop's + attributes. + RTA_PROTOINFO Firewall based policy routing + attribute. + RTA_FLOW Route realm. + RTA_CACHEINFO Cached route information. + + + Additional Netlink message types applicable to this service: + RTM_NEWROUTE, RTM_DELROUTE, and RTM_GETROUTE + + + + + + + + + + + + + +Salim, et. al. Informational [Page 19] + +RFC 3549 Linux Netlink as an IP Services Protocol July 2003 + + +3.1.2. Neighbor Setup Service Module + + This service provides the ability to add, remove, or receive + information about a neighbor table entry (e.g., an ARP entry or an + IPv4 neighbor solicitation, etc.). The service message template is + shown below. + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Family | Reserved1 | Reserved2 | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Interface Index | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | State | Flags | Type | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + Family: 8 bits + Address Family: AF_INET for IPv4; and AF_INET6 for IPV6. + + Interface Index: 32 bits + The unique interface index. + + State: 16 bits + A bitmask of the following states: + NUD_INCOMPLETE Still attempting to resolve. + NUD_REACHABLE A confirmed working cache entry + NUD_STALE an expired cache entry. + NUD_DELAY Neighbor no longer reachable. + Traffic sent, waiting for + confirmation. + NUD_PROBE A cache entry that is currently + being re-solicited. + NUD_FAILED An invalid cache entry. + NUD_NOARP A device which does not do neighbor + discovery (ARP). + NUD_PERMANENT A static entry. + Flags: 8 bits + NTF_PROXY A proxy ARP entry. + NTF_ROUTER An IPv6 router. + + + + + + + + + + + +Salim, et. al. Informational [Page 20] + +RFC 3549 Linux Netlink as an IP Services Protocol July 2003 + + + Attributes applicable to this service: + Attributes Description + ------------------------------------ + NDA_UNSPEC Unknown type. + NDA_DST A neighbour cache network. + layer destination address + NDA_LLADDR A neighbor cache link layer + address. + NDA_CACHEINFO Cache statistics. + + Additional Netlink message types applicable to this service: + RTM_NEWNEIGH, RTM_DELNEIGH, and RTM_GETNEIGH. + +3.1.3. Traffic Control Service + + This service provides the ability to provision, query or listen to + events under the auspices of traffic control. These include queuing + disciplines, (schedulers and queue treatment algorithms -- e.g., + priority-based scheduler or the RED algorithm) and classifiers. + Linux Traffic Control Service is very flexible and allows for + hierarchical cascading of the different blocks for traffic resource + sharing. + + ++ ++ +-----+ +-------+ ++ ++ .++ + || . || +------+ | |-->| Qdisc |-->|| || || + || ||---->|Filter|--->|Class| +-------+ ||-+ || || + || || | +------+ | +---------------+| | || || + || . || | +----------------------+ | || .|| + || . || | +------+ | || || + || || +->|Filter|-_ +-----+ +-------+ ++ | || .|| + || -->|| | +------+ ->| |-->| Qdisc |-->|| | ||->|| + || . || | |Class| +-------+ ||-+-->|| .|| + ->dev->|| || | +------+ _->| +---------------+| || || + || || +->|Filter|- +----------------------+ || .|| + || || +------+ || .|| + || . |+----------------------------------------------+| || + || | Parent Queuing discipline | .|| + || . +------------------------------------------------+ .|| + || . . .. . . .. . . . .. .. .. . .. || + |+--------------------------------------------------------+| + | Parent Queuing discipline | + | (attached to egress device) | + +----------------------------------------------------------+ + + The above diagram shows an example of the Egress TC block. We try to + be very brief here. For more information, please refer to [11]. A + packet first goes through a filter that is used to identify a class + to which the packet may belong. A class is essentially a terminal + + + +Salim, et. al. Informational [Page 21] + +RFC 3549 Linux Netlink as an IP Services Protocol July 2003 + + + queuing discipline and has a queue associated with it. The queue may + be subject to a simple algorithm, like FIFO, or a more complex one, + like RED or a token bucket. The outermost queuing discipline, which + is referred to as the parent is typically associated with a + scheduler. Within this scheduler hierarchy, however, may be other + scheduling algorithms, making the Linux Egress TC very flexible. + + The service message template that makes this possible is shown below. + This template is used in both the ingress and the egress queuing + disciplines (refer to the egress traffic control model in the FE + model section). Each of the specific components of the model has + unique attributes that describe it best. The common attributes are + described below. + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Family | Reserved1 | Reserved2 | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Interface Index | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Qdisc handle | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Parent Qdisc | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | TCM Info | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + Family: 8 bits + Address Family: AF_INET for IPv4; and AF_INET6 for IPV6. + + Interface Index: 32 bits + The unique interface index. + + Qdisc handle: 32 bits + Unique identifier for instance of queuing discipline. Typically, + this is split into major:minor of 16 bits each. The major number + would also be the major number of the parent of this instance. + + Parent Qdisc: 32 bits + Used in hierarchical layering of queuing disciplines. If this value + and the Qdisc handle are the same and equal to TC_H_ROOT, then the + defined qdisc is the top most layer known as the root qdisc. + + TCM Info: 32 bits + Set by the FE to 1 typically, except when the Qdisc instance is in + use, in which case it is set to imply a reference count. From the + CPC towards the direction of the FEC, this is typically set to 0 + + + +Salim, et. al. Informational [Page 22] + +RFC 3549 Linux Netlink as an IP Services Protocol July 2003 + + + except when used in the context of filters. In that case, this 32- + bit field is split into a 16-bit priority field and 16-bit protocol + field. The protocol is defined in kernel source + <include/linux/if_ether.h>, however, the most commonly used one is + ETH_P_IP (the IP protocol). + + The priority is used for conflict resolution when filters intersect + in their expressions. + + Generic attributes applicable to this service: + Attribute Description + ------------------------------------ + TCA_KIND Canonical name of FE component. + TCA_STATS Generic usage statistics of FEC + TCA_RATE rate estimator being attached to + FEC. Takes snapshots of stats to + compute rate. + TCA_XSTATS Specific statistics of FEC. + TCA_OPTIONS Nested FEC-specific attributes. + + Appendix 3 has an example of configuring an FE component for a FIFO + Qdisc. + + Additional Netlink message types applicable to this service: + RTM_NEWQDISC, RTM_DELQDISC, RTM_GETQDISC, RTM_NEWTCLASS, + RTM_DELTCLASS, RTM_GETTCLASS, RTM_NEWTFILTER, RTM_DELTFILTER, and + RTM_GETTFILTER. + +3.2. IP Service NETLINK_FIREWALL + + This service allows CPCs to receive, manipulate, and re-inject + packets via the IPv4 firewall service modules in the FE. A firewall + rule is first inserted to activate packet redirection. The CPC + informs the FEC whether it would like to receive just the metadata on + the packet or the actual data and, if the metadata is desired, what + is the maximum data length to be redirected. The redirected packets + are still stored in the FEC, waiting a verdict from the CPC. The + verdict could constitute a simple accept or drop decision of the + packet, in which case the verdict is imposed on the packet still + sitting on the FEC. The verdict may also include a modified packet + to be sent on as a replacement. + + + + + + + + + + +Salim, et. al. Informational [Page 23] + +RFC 3549 Linux Netlink as an IP Services Protocol July 2003 + + + Two types of messages exist that can be sent from CPC to FEC. These + are: Mode messages and Verdict messages. Mode messages are sent + immediately to the FEC to describe what the CPC would like to + receive. Verdict messages are sent to the FEC after a decision has + been made on the fate of a received packet. The formats are + described below. + + The mode message is described first. + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Mode | Reserved1 | Reserved2 | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Range | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + Mode: 8 bits + Control information on the packet to be sent to the CPC. The + different types are: + + IPQ_COPY_META Copy only packet metadata to CPC. + IPQ_COPY_PACKET Copy packet metadata and packet payloads + to CPC. + + Range: 32 bits + If IPQ_COPY_PACKET, this defines the maximum length to copy. + + + + + + + + + + + + + + + + + + + + + + + + +Salim, et. al. Informational [Page 24] + +RFC 3549 Linux Netlink as an IP Services Protocol July 2003 + + + A packet and associated metadata received from user space looks + as follows. + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Packet ID | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Mark | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | timestamp_m | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | timestamp_u | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | hook | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | indev_name | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | outdev_name | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | hw_protocol | hw_type | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | hw_addrlen | Reserved | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | hw_addr | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | data_len | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Payload . . . | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + Packet ID: 32 bits + The unique packet identifier as passed to the CPC by the FEC. + + Mark: 32 bits + The internal metadata value set to describe the rule in which + the packet was picked. + + timestamp_m: 32 bits + Packet arrival time (seconds) + + timestamp_u: 32 bits + Packet arrival time (useconds in addition to the seconds in + timestamp_m) + + hook: 32 bits + The firewall module from which the packet was picked. + + + + +Salim, et. al. Informational [Page 25] + +RFC 3549 Linux Netlink as an IP Services Protocol July 2003 + + + indev_name: 128 bits + ASCII name of incoming interface. + + outdev_name: 128 bits + ASCII name of outgoing interface. + + hw_protocol: 16 bits + Hardware protocol, in network order. + + hw_type: 16 bits + Hardware type. + + hw_addrlen: 8 bits + Hardware address length. + + hw_addr: 64 bits + Hardware address. + + data_len: 32 bits + Length of packet data. + + Payload: size defined by data_len + The payload of the packet received. + + The Verdict message format is as follows + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Value | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Packet ID | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Data Length | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Payload . . . | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + Value: 32 bits + + This is the verdict to be imposed on the packet still sitting + in the FEC. Verdicts could be: + + NF_ACCEPT Accept the packet and let it continue its + traversal. + NF_DROP Drop the packet. + + + + + +Salim, et. al. Informational [Page 26] + +RFC 3549 Linux Netlink as an IP Services Protocol July 2003 + + + Packet ID: 32 bits + The packet identifier as passed to the CPC by the FEC. + + Data Length: 32 bits + The data length of the modified packet (in bytes). If you don't + modify the packet just set it to 0. + + Payload: + Size as defined by the Data Length field. + +3.3. IP Service NETLINK_ARPD + + This service is used by CPCs for managing the neighbor table in the + FE. The message format used between the FEC and CPC is described in + the section on the Neighbor Setup Service Module. + + The CPC service is expected to participate in neighbor solicitation + protocol(s). + + A neighbor message of type RTM_NEWNEIGH is sent towards the CPC by + the FE to inform the CPC of changes that might have happened on that + neighbor's entry (e.g., a neighbor being perceived as unreachable). + + RTM_GETNEIGH is used to solicit the CPC for information on a specific + neighbor. + +4. References + +4.1. Normative References + + [1] Braden, R., Clark, D. and S. Shenker, "Integrated Services in + the Internet Architecture: an Overview", RFC 1633, June 1994. + + [2] Baker, F., "Requirements for IP Version 4 Routers", RFC 1812, + June 1995. + + [3] Blake, S., Black, D., Carlson, M., Davies, E, Wang, Z. and W. + Weiss, "An Architecture for Differentiated Services", RFC 2475, + December 1998. + + [4] Durham, D., Boyle, J., Cohen, R., Herzog, S., Rajan, R. and A. + Sastry, "The COPS (Common Open Policy Service) Protocol", RFC + 2748, January 2000. + + [5] Moy, J., "OSPF Version 2", STD 54, RFC 2328, April 1998. + + [6] Case, J., Fedor, M., Schoffstall, M. and C. Davin, "Simple + Network Management Protocol (SNMP)", STD 15, RFC 1157, May 1990. + + + +Salim, et. al. Informational [Page 27] + +RFC 3549 Linux Netlink as an IP Services Protocol July 2003 + + + [7] Andersson, L., Doolan, P., Feldman, N., Fredette, A. and B. + Thomas, "LDP Specification", RFC 3036, January 2001. + + [8] Bernet, Y., Blake, S., Grossman, D. and A. Smith, "An Informal + Management Model for DiffServ Routers", RFC 3290, May 2002. + +4.2. Informative References + + [9] G. R. Wright, W. Richard Stevens. "TCP/IP Illustrated Volume 2, + Chapter 20", June 1995. + + [10] http://www.netfilter.org + + [11] http://diffserv.sourceforge.net + +5. Security Considerations + + Netlink lives in a trusted environment of a single host separated by + kernel and user space. Linux capabilities ensure that only someone + with CAP_NET_ADMIN capability (typically, the root user) is allowed + to open sockets. + +6. Acknowledgements + + 1) Andi Kleen, for man pages on netlink and rtnetlink. + + 2) Alexey Kuznetsov is credited for extending Netlink to the IP + service delivery model. The original Netlink character device was + written by Alan Cox. + + 3) Jeremy Ethridge for taking the role of someone who did not + understand Netlink and reviewing the document to make sure that it + made sense. + + + + + + + + + + + + + + + + + + +Salim, et. al. Informational [Page 28] + +RFC 3549 Linux Netlink as an IP Services Protocol July 2003 + + +Appendix 1: Sample Service Hierarchy + + In the diagram below we show a simple IP service, foo, and the + interaction it has between CP and FE components for the service + (labels 1-3). + + The diagram is also used to demonstrate CP<->FE addressing. In this + section, we illustrate only the addressing semantics. In Appendix 2, + the diagram is referenced again to define the protocol interaction + between service foo's CPC and FEC (labels 4-10). + + CP + [--------------------------------------------------------. + | .-----. | + | | . -------. | + | | CLI | / \ | + | | | | CP protocol | | + | /->> -. | component | <-. | + | __ _/ | | For | | | + | | | IP service | ^ | + | Y | foo | | | + | | ___________/ ^ | + | Y 1,4,6,8,9 / ^ 2,5,10 | 3,7 | + --------------- Y------------/---|----------|----------- + | ^ | ^ + **|***********|****|**********|********** + ************* Netlink layer ************ + **|***********|****|**********|********** + FE | | ^ ^ + .-------- Y-----------Y----|--------- |----. + | | / | + | Y / | + | . --------^-------. / | + | |FE component/module|/ | + | | for IP Service | | + --->---|------>---| foo |----->-----|------>-- + | ------------------- | + | | + | | + ------------------------------------------ + + The control plane protocol for IP service foo does the following to + connect to its FE counterpart. The steps below are also numbered + above in the diagram. + + 1) Connect to the IP service foo through a socket connect. A typical + connection would be via a call to: socket(AF_NETLINK, SOCK_RAW, + NETLINK_FOO). + + + +Salim, et. al. Informational [Page 29] + +RFC 3549 Linux Netlink as an IP Services Protocol July 2003 + + + 2) Bind to listen to specific asynchronous events for service foo. + + 3) Bind to listen to specific asynchronous FE events. + +Appendix 2: Sample Protocol for the Foo IP Service + + Our example IP service foo is used again to demonstrate how one can + deploy a simple IP service control using Netlink. + + These steps are continued from Appendix 1 (hence the numbering). + + 4) Query for current config of FE component. + + 5) Receive response to (4) via channel on (3). + + 6) Query for current state of IP service foo. + + 7) Receive response to (6) via channel on (2). + + 8) Register the protocol-specific packets you would like the FE to + forward to you. + + 9) Send service-specific foo commands and receive responses for them, + if needed. + +Appendix 2a: Interacting with Other IP services + + The diagram in Appendix 1 shows another control component configuring + the same service. In this case, it is a proprietary Command Line + Interface. The CLI may or may not be using the Netlink protocol to + communicate to the foo component. If the CLI issues commands that + will affect the policy of the FEC for service foo then, then the foo + CPC is notified. It could then make algorithmic decisions based on + this input. For example, if an FE allowed another service to delete + policies installed by a different service and a policy that foo + installed was deleted by service bar, there might be a need to + propagate this to all the peers of service foo. + + + + + + + + + + + + + + +Salim, et. al. Informational [Page 30] + +RFC 3549 Linux Netlink as an IP Services Protocol July 2003 + + +Appendix 3: Examples + + In this example, we show a simple configuration Netlink message sent + from a TC CPC to an egress TC FIFO queue. This queue algorithm is + based on packet counting and drops packets when the limit exceeds 100 + packets. We assume that the queue is in a hierarchical setup with a + parent 100:0 and a classid of 100:1 and that it is to be installed on + a device with an ifindex of 4. + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Length (52) | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Type (RTM_NEWQDISC) | Flags (NLM_F_EXCL | | + | |NLM_F_CREATE | NLM_F_REQUEST)| + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Sequence Number(arbitrary number) | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Process ID (0) | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + |Family(AF_INET)| Reserved1 | Reserved1 | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Interface Index (4) | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Qdisc handle (0x1000001) | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Parent Qdisc (0x1000000) | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | TCM Info (0) | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Type (TCA_KIND) | Length(4) | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Value ("pfifo") | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Type (TCA_OPTIONS) | Length(4) | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Value (limit=100) | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + + + + + + + + + + + +Salim, et. al. Informational [Page 31] + +RFC 3549 Linux Netlink as an IP Services Protocol July 2003 + + +Authors' Addresses + + Jamal Hadi Salim + Znyx Networks + Ottawa, Ontario + Canada + + EMail: hadi@znyx.com + + + Hormuzd M Khosravi + Intel + 2111 N.E. 25th Avenue JF3-206 + Hillsboro OR 97124-5961 + USA + + Phone: +1 503 264 0334 + EMail: hormuzd.m.khosravi@intel.com + + + Andi Kleen + SuSE + Stahlgruberring 28 + 81829 Muenchen + Germany + + EMail: ak@suse.de + + + Alexey Kuznetsov + INR/Swsoft + Moscow + Russia + + EMail: kuznet@ms2.inr.ac.ru + + + + + + + + + + + + + + + + +Salim, et. al. Informational [Page 32] + +RFC 3549 Linux Netlink as an IP Services Protocol July 2003 + + +Full Copyright Statement + + Copyright (C) The Internet Society (2003). All Rights Reserved. + + This document and translations of it may be copied and furnished to + others, and derivative works that comment on or otherwise explain it + or assist in its implementation may be prepared, copied, published + and distributed, in whole or in part, without restriction of any + kind, provided that the above copyright notice and this paragraph are + included on all such copies and derivative works. However, this + document itself may not be modified in any way, such as by removing + the copyright notice or references to the Internet Society or other + Internet organizations, except as needed for the purpose of + developing Internet standards in which case the procedures for + copyrights defined in the Internet Standards process must be + followed, or as required to translate it into languages other than + English. + + The limited permissions granted above are perpetual and will not be + revoked by the Internet Society or its successors or assignees. + + This document and the information contained herein is provided on an + "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING + TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING + BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION + HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF + MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. + +Acknowledgement + + Funding for the RFC Editor function is currently provided by the + Internet Society. + + + + + + + + + + + + + + + + + + + +Salim, et. al. Informational [Page 33] + |