Network Working Group                                       L. Steinberg
Request for Comments: 1224                               IBM Corporation
                                                                May 1991

        Techniques for Managing Asynchronously Generated Alerts

Status of this Memo

   This memo defines common mechanisms for managing asynchronously
   produced alerts in a manner consistent with current network
   management protocols.

   This memo specifies an Experimental Protocol for the Internet
   community.  Discussion and suggestions for improvement are
   requested.  Please refer to the current edition of the "IAB Official
   Protocol Standards" for the standardization state and status of this
   protocol.  Distribution of this memo is unlimited.

Abstract

   This RFC explores mechanisms to prevent a remotely managed entity
   from burdening a manager or network with an unexpected amount of
   network management information, and to ensure delivery of
   "important" information.  The focus is on controlling the flow of
   asynchronously generated information, and not how the information is
   generated.

Table of Contents

   1. Introduction...................................................  2
   2. Problem Definition.............................................  3
   2.1 Polling Advantages............................................  3
       (a) Reliable detection of failures............................  3
       (b) Reduced protocol complexity on managed entity.............  3
       (c) Reduced performance impact on managed entity..............  3
       (d) Reduced configuration requirements to manage remote entity  4
   2.2 Polling Disadvantages.........................................  4
       (a) Response time for problem detection.......................  4
       (b) Volume of network management traffic generated............  4
   2.3 Alert Advantages..............................................  5
       (a) Real-time knowledge of problems...........................  5
       (b) Minimal amount of network management traffic..............  5
   2.4 Alert Disadvantages...........................................  5
       (a) Potential loss of critical information....................  5
       (b) Potential to over-inform a manager........................  5
   3. Specific Goals of this Memo....................................  6
   4. Compatibility with Existing Network Management Protocols.......  6

Steinberg                                                       [Page 1]

RFC 1224        Managing Asynchronously Generated Alerts        May 1991

   5. Closed Loop "Feedback" Alert Reporting with a "Pin" Sliding
      Window Limit...................................................  6
   5.1 Use of Feedback...............................................  7
   5.1.1 Example.....................................................  8
   5.2 Notes on Feedback/Pin usage...................................  8
   6. Polled, Logged Alerts..........................................  9
   6.1 Use of Polled, Logged Alerts.................................. 10
   6.1.1 Example..................................................... 12
   6.2 Notes on Polled, Logged Alerts................................ 12
   7. Compatibility with SNMP and CMOT............................... 14
   7.1 Closed Loop Feedback Alert Reporting.......................... 14
   7.1.1 Use of Feedback with SNMP................................... 14
   7.1.2 Use of Feedback with CMOT................................... 14
   7.2 Polled, Logged Alerts......................................... 14
   7.2.1 Use of Polled, Logged Alerts with SNMP...................... 14
   7.2.2 Use of Polled, Logged Alerts with CMOT...................... 15
   8. Notes on Multiple Manager Environments......................... 15
   9. Summary........................................................ 16
   10. References.................................................... 16
   11. Acknowledgements.............................................. 17
   Appendix A. Example of polling costs.............................. 17
   Appendix B. MIB object definitions................................ 19
   Security Considerations........................................... 22
   Author's Address.................................................. 22

1. Introduction

   This memo defines mechanisms to prevent a remotely managed entity
   from burdening a manager or network with an unexpected amount of
   network management information, and to ensure delivery of
   "important" information.  The focus is on controlling the flow of
   asynchronously generated information, and not how the information is
   generated.  Mechanisms for generating and controlling the generation
   of asynchronous information may involve protocol specific issues.

   There are two understood mechanisms for transferring network
   management information from a managed entity to a manager: request-
   response driven polling, and the unsolicited sending of "alerts".
   Alerts are defined as any management information delivered to a
   manager that is not the result of a specific query.  Advantages and
   disadvantages exist within each method.  They are detailed in
   section 2 below.

   Alerts in a failing system can be generated so rapidly that they
   adversely impact functioning resources.  They may also fail to be
   delivered, and critical information may be lost.  Methods are needed
   both to limit the volume of alert transmission and to assist in
   delivering a minimum amount of information to a manager.

   It is our belief that managed agents capable of asynchronously
   generating alerts should attempt to adopt mechanisms that fill both
   of these needs.  For reasons shown in section 2.4, it is necessary
   to fulfill both alert-management requirements.
   A complete alert-driven system must ensure that alerts are delivered
   or their loss detected, with a means to recreate the lost
   information, AND it must not allow itself to overburden its manager
   with an unreasonable amount of information.

2. Problem Definition

   The following discusses the relative advantages and disadvantages of
   polled vs. alert driven management.

2.1 Polling Advantages

   (a) Reliable detection of failures

       A manager that polls for all of its information can more readily
       determine machine and network failures; a lack of a response to
       a query indicates problems with the machine or network.  A
       manager relying on notification of problems might assume that a
       faulty system is good, should the alert be unable to reach its
       destination, or the managed system be unable to correctly
       generate the alert.  Examples of this include network failures
       (in which an isolated network cannot deliver the alert), and
       power failures (in which a failing machine cannot generate an
       alert).  More subtle forms of failure in the managed entity
       might produce an incorrectly generated alert, or no alert at
       all.

   (b) Reduced protocol complexity on managed entity

       A request-response based system makes only conservative
       assumptions about the underlying transport protocol.  Timeouts
       and retransmits (re-requests) can be built into the manager.  In
       addition, this allows the manager to directly control the amount
       of network management information flowing across the network.

   (c) Reduced performance impact on managed entity

       In a purely polled system, there is no danger of having to test
       frequently for an alert condition.  This testing takes CPU
       cycles away from the real mission of the managed entity.
       Clearly, testing a threshold on each packet received could have
       unwanted performance effects on machines such as gateways.
       Those who wish to use thresholds and alerts must choose the
       parameters to be tested with great care, and should be strongly
       discouraged from updating statistics and checking values
       frequently.

   (d) Reduced configuration requirements to manage remote entity

       Remote, managed entities need not be configured with one or more
       destinations for reporting information.  Instead, the entity
       merely responds to whoever makes a specific request.  When
       changing the network configuration, there is never a need to
       reconfigure all remote manageable systems.  In addition, any
       number of "authorized" managers (i.e., those passing any
       authentication tests imposed by the network management protocol)
       may obtain information from any managed entity.  This occurs
       without reconfiguring the entity and without reaching an
       entity-imposed limit on the maximum number of potential
       managers.

2.2 Polling Disadvantages

   (a) Response time for problem detection

       Having to poll many MIB [2] variables per machine on a large
       number of machines is itself a real problem.  The ability of a
       manager to monitor such a system is limited; should a system
       fail shortly after being polled, there may be a significant
       delay before it is polled again.  During this time, the manager
       must assume that a failing system is acceptable.  See Appendix A
       for a hypothetical example of such a system.

       It is worthwhile to note that while improving the mean time to
       detect failures might not greatly improve the time to correct
       the failure, the problem will generally not be repaired until it
       is detected.  In addition, most network managers would prefer to
       at least detect faults before network users start phoning in.
   (b) Volume of network management traffic

       Polling many objects (MIB variables) on many machines greatly
       increases the amount of network management traffic flowing
       across the network (see Appendix A).  While it is possible to
       minimize this through the use of hierarchies (polling a machine
       for a general status of all the machines it polls), this
       aggravates the response time problem previously discussed.

2.3 Alert Advantages

   (a) Real-time knowledge of problems

       Allowing the manager to be notified of problems eliminates the
       delay imposed by polling many objects/systems in a loop.

   (b) Minimal amount of network management traffic

       Alerts are transmitted only due to detected errors.  By removing
       the need to transfer large amounts of status information that
       merely demonstrates a healthy system, network and system
       (machine processor) resources may be freed to accomplish their
       primary mission.

2.4 Alert Disadvantages

   (a) Potential loss of critical information

       Alerts are most likely not to be delivered precisely when the
       managed entity fails (a power supply dies) or the network
       experiences problems (saturated or isolated).  It is important
       to remember that failing machines and networks cannot be trusted
       to inform a manager that they are failing.

   (b) Potential to over-inform the manager

       An "open loop" system in which the flow of alerts to a manager
       is fully asynchronous can result in an excess of alerts being
       delivered (e.g., link up/down messages when lines vacillate).
       This information places an extra burden on a strained network,
       and could prevent the manager from disabling the mechanism
       generating the alerts; all available network bandwidth into the
       manager could be saturated with incoming alerts.

   Most major network management systems strive to use an optimal
   combination of alerts and polling.  Doing so preserves the
   advantages of each while eliminating the disadvantages of pure
   polling.

3. Specific Goals of this Memo

   This memo suggests mechanisms to minimize the disadvantages of alert
   usage.  An optimal system recognizes the potential problems both of
   sending too many alerts, in which case a manager becomes ineffective
   at managing, and of not adequately using alerts, especially given
   the volume of data that must otherwise be actively monitored, which
   scales poorly.  It is the author's belief that this is best done by
   allowing alert mechanisms that "close down" automatically when over-
   delivering asynchronous (unexpected) alerts, and that also allow a
   flow of synchronous alert information through a polled log.  The use
   of "feedback" (with a sliding window "pin") discussed in section 5
   addresses the former need, while the "polled, logged alerts" of
   section 6 address the latter.

   This memo does not attempt to define mechanisms for controlling the
   asynchronous generation of alerts, as such matters deal with
   specifics of the management protocol.  In addition, no attempt is
   made to define what the content of an alert should be.  The feedback
   mechanism does require the addition of a single alert type, but this
   is not meant to impact or influence the techniques for generating
   any other alert (and can itself be generated from a MIB object or
   the management protocol).  To make any effective use of the alert
   mechanisms described in this memo, implementation of several MIB
   objects is required in the relevant managed systems.  The location
   of these objects in the MIB is under an experimental subtree
   delegated to the Alert-Man working group of the Internet Engineering
   Task Force (IETF) and published in the "Assigned Numbers" RFC [5].
   Currently, this subtree is defined as

      alertMan ::= { experimental 24 }.

4. Compatibility with Existing Network Management Protocols

   It is the intent of this document to suggest mechanisms that violate
   neither the letter nor the spirit of the protocols expressed in CMOT
   [3] and SNMP [4].  To achieve this goal, each mechanism described
   will give an example of its conformant use with both SNMP and CMOT.

5. Closed Loop "Feedback" Alert Reporting with a "Pin" Sliding Window
   Limit

   One technique for preventing an excess of alerts from being
   delivered involves required feedback to the managed agent.  The name
   "feedback" describes a required positive response from a potentially
   "over-reported" manager before a remote agent may continue
   transmitting alerts at a high rate.  A sliding window "pin"
   threshold (so named for the metal pin on the end of a meter) is
   established as a part of a user-defined SNMP trap, or as a managed
   CMOT event.  This threshold defines the maximum allowable number of
   alerts ("maxAlertsPerTime") that may be transmitted by the agent,
   and the "windowTime" in seconds that alerts are tested against.
   Note that "maxAlertsPerTime" represents the sum total of all alerts
   generated by the agent, and is not duplicated for each type of alert
   that an agent might generate.  Both "maxAlertsPerTime" and
   "windowTime" are required MIB objects of SMI [1] type INTEGER; they
   must be readable, and may be writable should the implementation
   permit it.

   Two other items are required for the feedback technique.  The first
   is a Boolean MIB object (of SMI type INTEGER, treated as a Boolean
   in which the value zero represents "FALSE") named "alertsEnabled",
   which must have read and write access.  The second is a user defined
   alert named "alertsDisabled".  Please see Appendix B for their
   complete definitions.
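   In Python terms, the per-agent state these objects amount to can be
   pictured as follows.  This is only an illustrative sketch (the
   dictionary, its names, and its default values are invented here);
   the authoritative definitions are in Appendix B.

```python
# Hypothetical in-memory view of the MIB objects required by feedback.
# All are SMI type INTEGER; "alertsEnabled" is treated as a Boolean.
feedback_state = {
    "maxAlertsPerTime": 10,  # readable; writable if the implementation permits
    "windowTime": 3,         # window length in seconds; same access rules
    "alertsEnabled": 1,      # read-write gate: zero is "FALSE"
}

# One user-defined alert type accompanies these objects:
# "alertsDisabled", sent once when the agent turns alert reporting off.
```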
5.1 Use of Feedback

   When an excess of alerts is being generated, as determined by the
   total number of alerts exceeding "maxAlertsPerTime" within
   "windowTime" seconds, the agent sets the Boolean value of
   "alertsEnabled" to "FALSE" and sends a single alert of type
   "alertsDisabled".

   Again, the pin mechanism operates on the sum total of all alerts
   generated by the remote system.  Feedback is implemented once per
   agent and not separately for each type of alert in each agent.
   While it is also possible to implement the Feedback/Pin technique on
   a per alert-type basis, such a discussion belongs in a document
   dealing with controlling the generation of individual alerts.

   The typical use of feedback is detailed in the following steps:

   (a) Upon initialization of the agent, the value of "alertsEnabled"
       is set to "TRUE".

   (b) Each time an alert is generated, the value of "alertsEnabled" is
       tested.  Should the value be "FALSE", no alert is sent.  If the
       value is "TRUE", the alert is sent and the current time is
       stored locally.

   (c) If at least "maxAlertsPerTime" alerts have been generated, the
       agent calculates the difference between the time stored for the
       new alert and the time of the alert generated "maxAlertsPerTime"
       alerts previously.  Should this amount be less than
       "windowTime", a single alert of the type "alertsDisabled" is
       sent to the manager, and the value of "alertsEnabled" is then
       set to "FALSE".

   (d) When a manager receives an alert of the type "alertsDisabled",
       it is expected to set "alertsEnabled" back to "TRUE" to continue
       to receive alert reports.

5.1.1 Example

   In a sample system, the maximum number of alerts any single managed
   entity may send the manager is 10 in any 3 second interval.  A
   circular buffer with a maximum depth of 10 time-of-day elements is
   defined to accommodate statistics keeping.
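   The steps above, together with the circular buffer just described,
   can be sketched as follows.  This is an illustrative sketch only:
   the class name and the "send_alert" callback are invented for the
   example, and the defaults mirror this sample system (10 alerts per 3
   second window).

```python
from collections import deque
import time


class FeedbackPin:
    """Sliding-window "pin": disable alert sending once maxAlertsPerTime
    alerts have been sent within windowTime seconds (section 5.1)."""

    def __init__(self, max_alerts_per_time=10, window_time=3):
        self.max_alerts = max_alerts_per_time
        self.window = window_time
        self.alerts_enabled = True                      # step (a)
        # Circular buffer holding the times of the last max_alerts alerts.
        self.times = deque(maxlen=max_alerts_per_time)

    def report(self, send_alert, alert):
        """Returns True if the alert was sent, False if suppressed."""
        if not self.alerts_enabled:                     # step (b): suppress
            return False
        send_alert(alert)                               # step (b): send
        self.times.append(time.time())                  # and record the time
        # Step (c): once the buffer is full, compare the newest time
        # against that of the alert sent max_alerts alerts earlier.
        if (len(self.times) == self.max_alerts
                and self.times[-1] - self.times[0] < self.window):
            send_alert("alertsDisabled")                # one final alert
            self.alerts_enabled = False                 # manager must reset (d)
        return True
```

   Note that two back-to-back alerts never trip the pin: the test fires
   only when the buffer already holds a full window's worth of alerts.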
   After the first 10 alerts have been sent, the managed entity tests
   the time difference between its oldest and newest alerts.  By
   testing the time for a fixed number of alerts, the system will never
   disable itself merely because a few alerts were transmitted back to
   back.

   The mechanism will disable reporting only after at least 10 alerts
   have been sent, and then only if the last 10 all occurred within a 3
   second interval.  As alerts are sent over time, the list maintains
   data on the last 10 alerts only.

5.2 Notes on Feedback/Pin Usage

   A manager may periodically poll "alertsEnabled" in case an
   "alertsDisabled" alert is not delivered by the network.  Some
   implementers may also choose to add COUNTER MIB objects to show the
   total number of alerts transmitted and the number dropped while
   "alertsEnabled" was FALSE.  While these may yield some indication of
   the number of lost alerts, the use of "Polled, Logged Alerts" offers
   a superset of this function.

   Testing the alert frequency need not begin until a minimum number of
   alerts has been sent (the circular buffer is full).  Even then, the
   actual test is the elapsed time to get a fixed number of alerts and
   not the number of alerts in a given time period.  This eliminates
   the need for complex averaging schemes (keeping current alerts per
   second as a frequency and redetermining the current value based on
   the previous value and the time of a new alert).  Also eliminated is
   the problem of two back to back alerts; they may indeed appear to be
   a large number of alerts per second, but the fact remains that there
   are only two alerts.  This situation is unlikely to cause a problem
   for any manager, and should not trigger the mechanism.

   Since alerts are supposed to be generated infrequently, maintaining
   the pin and testing the threshold should not impact normal
   performance of the agent (managed entity).
   While repeated testing may affect performance when an excess of
   alerts is being transmitted, this effect would be minor compared to
   the cost of generating and sending so many alerts.  Long before the
   cost of testing (in CPU cycles) becomes relatively high, the
   feedback mechanism should disable alert sending and effect savings
   both in alert sending and in its own testing (note that the list
   maintenance and testing mechanisms disable themselves when they
   disable alert reporting).  In addition, testing the value of
   "alertsEnabled" can limit the CPU burden of building alerts that do
   not need to be sent.

   It is advised that the implementer consider allowing write access to
   both the window size and the number of alerts allowed in a window's
   time.  In doing so, a management station has the option of varying
   these parameters remotely before setting "alertsEnabled" to "TRUE".
   Should either of these objects be set to 0, a conformant system will
   disable the pin and feedback mechanisms and allow the agent to send
   all of the alerts it generates.

   While the feedback mechanism is not high in CPU utilization costs,
   those implementing alerts of any kind are again cautioned to
   exercise care that the alerts tested do not occur so frequently as
   to impact the performance of the agent's primary function.

   The user may prefer to send alerts via TCP, if available, to help
   ensure delivery of the "alertsDisabled" message.

   The feedback technique is effective for preventing the over-
   reporting of alerts to a manager.  It does not assist with the
   problem of "under-reporting" (see "polled, logged alerts" for this).

   It is possible to lose alerts while "alertsEnabled" is "FALSE".
   Ideally, the threshold of "maxAlertsPerTime" should be set
   sufficiently high that "alertsEnabled" is only set to "FALSE" during
   "over-reporting" situations.  To help prevent alerts from being lost
   when the threshold is exceeded, this method can be combined with
   "polled, logged alerts" (see below).

6. Polled, Logged Alerts

   A simple system that combines the request-response advantages of
   polling while minimizing its disadvantages is "Polled, Logged
   Alerts".  Through the addition of several MIB objects, one gains a
   system that minimizes network management traffic, lends itself to
   scaling, eliminates the reliance on alert delivery, and imposes none
   of the over-reporting problems inherent in pure alert driven
   architectures.  Minimizing network management traffic is effected by
   reducing multiple requests to a single request.  This technique does
   not eliminate the need for polling, but it reduces the amount of
   data transferred and ensures the manager either alert delivery or
   notification of an unreachable node.  Note again, the goal is to
   address the needs of information (alert) flow and not to control the
   local generation of alerts.

6.1 Use of Polled, Logged Alerts

   As alerts are generated by a remote managed entity, they are logged
   locally in a table.  The manager may then poll a single MIB object
   to determine if any number of alerts have been generated.  Each poll
   request returns a copy of an "unacknowledged" alert from the alert
   log, or an indication that the table is empty.  Upon receipt, the
   manager might "acknowledge" any alert to remove it from the log.
   Entries in the table must be readable, and can optionally allow the
   manager to remove them by writing to or deleting them.

   This technique requires several additional MIB objects.  The
   alert_log is a SEQUENCE OF logTable entries that must be readable,
   and can optionally have a mechanism to remove entries (e.g., SNMP
   set or CMOT delete).
   An optional read-only MIB object of type INTEGER,
   "maxLogTableEntries", gives the maximum number of log entries the
   system will support.  Please see Appendix B for their complete
   definitions.

   The typical use of Polled, Logged Alerts is detailed below.

   (a) Upon initialization, the agent builds a pointer to a log table.
       The table is empty (a sequence of zero entries).

   (b) Each time a local alert is generated, a logTable entry is built
       with the following information:

          SEQUENCE {
              alertId    INTEGER,
              alertData  OPAQUE
          }

       (1) an alertId number of type INTEGER, set to 1 greater than the
           previously generated alertId.  If this is the first alert
           generated, the value is initialized to 1.  This value should
           wrap (reset) to 1 when it reaches 2**32.  Note that the
           maximum log depth cannot exceed (2**32)-1 entries.

       (2) a copy of the alert encapsulated in an OPAQUE.

   (c) The new log element is added to the table.  Should addition of
       the element exceed the defined maximum log table size, the
       oldest element in the table (having the lowest alertId) is
       replaced by the new element.

   (d) A manager may poll the managed agent for either the next alert
       in the alert_table or for a copy of the alert associated with a
       specific alertId.  A poll request must indicate a specific
       alertId.  The mechanism for obtaining this information from a
       table is protocol specific, and might use an SNMP GET or GET
       NEXT (with a GET NEXT following an instance of zero returning
       the first table entry's alert), or CMOT's GET with scoping and
       filtering to get alertData entries associated with alertIds
       greater or less than a given instance.

   (e) An alertData GET request from a manager must always be responded
       to with a reply of the entire OPAQUE alert (SNMP TRAP, CMOT
       EVENT, etc.) or a protocol specific reply indicating that the
       GET request failed.

       Note that the actual contents of the alert string, and the
       format of those contents, are protocol specific.

   (f) Once an alert is logged in the local log, it is up to the
       individual architecture and implementation whether or not to
       also send a copy asynchronously to the manager.  Doing so could
       be used to redirect the focus of the polling (rather than
       waiting an average of 1/2 the poll cycle to learn of a problem),
       but does not result in significant problems should the alert
       fail to be delivered.

   (g) Should a manager request an alert with an alertId of 0, the
       reply shall be the appropriate protocol specific error response.

   (h) If a manager requests the alert immediately following the alert
       with alertId equal to 0, the reply will be the first alert (or
       alerts, depending on the protocol used) in the alert log.

   (i) A manager may remove a specific alert from the alert log by
       naming the alertId of that alert and issuing a protocol specific
       command (SET or DELETE).  If no such alert exists, the operation
       is said to have failed, and such failure is reported to the
       manager in a protocol specific manner.

6.1.1 Example

   In a sample system (based on the example in Appendix A), a manager
   must monitor 40 remote agents, each having between 2 and 15
   parameters that indicate the relative health of the agent and the
   network.  During normal monitoring, the manager is concerned only
   with fault detection.  With an average poll request-response time of
   5 seconds, the manager polls one MIB variable on each node.  This
   involves one request and one reply packet of the format specified in
   the XYZ network management protocol.  Each packet requires 120 bytes
   "on the wire" (requesting a single object, ASN.1 encoded, IP and UDP
   enveloped, and placed in an ethernet packet).
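   The per-cycle arithmetic that follows from these hypothetical
   figures can be checked with a short sketch (the variable names are
   invented for the example):

```python
# Serial-polling cost model for the hypothetical 40-agent system above.
NODES = 40          # remote agents polled in a loop
POLL_TIME = 5       # seconds per request-response pair
PACKET_BYTES = 120  # one request or one reply "on the wire"

cycle = NODES * POLL_TIME           # full serial poll cycle, in seconds
mean_detect = cycle / 2             # a fault occurs, on average, mid-cycle
traffic = NODES * 2 * PACKET_BYTES  # bytes per cycle: request + reply per node

print(cycle, mean_detect, traffic)  # prints: 200 100.0 9600
```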
   This results in a serial poll cycle time of 3.3 minutes (40 nodes at
   5 seconds each is 200 seconds), and a mean time to detect an alert
   of slightly over 1.5 minutes.  The total amount of data transferred
   during a 3.3 minute poll cycle is 9600 bytes (a 120 byte request and
   a 120 byte reply for each of 40 nodes).  With such a small amount of
   network management traffic per minute, the poll rate might
   reasonably be doubled (assuming the network performance permits it).
   The result is 19200 bytes transferred per cycle, and a mean time to
   detect failure of under 1 minute.  Parallel polling obviously yields
   similar improvements.

   Should an alert be returned by a remote agent's log, the manager
   notifies the operator and removes the element from the alert log by
   setting it with SNMP or deleting it with CMOT.  Normal alert
   detection procedures are then followed.  Those SNMP implementers who
   prefer not to use SNMP SET for table entry deletes may always define
   their log as "read only".  The fact that the manager made a single
   query (to the log) and was able to determine which, if any, objects
   merited special attention essentially means that the status of all
   alert capable objects was monitored with a single request.

   Continuing the above example, should a remote entity fail to respond
   to two successive poll attempts, the operator is notified that the
   agent is not reachable.  The operator may then choose (if so
   equipped) to contact the agent through an alternate path (such as
   serial line IP over a dial up modem).  Upon establishing such a
   connection, the manager may then retrieve the contents of the alert
   log for a chronological map of the failure's alerts.  Alerts
   undelivered because of conditions that may no longer be present are
   still available for analysis.

6.2 Notes on Polled, Logged Alerts

   Polled, logged alert techniques allow the tracking of many alerts
   while actually monitoring only a single MIB object.  This
   dramatically decreases the amount of network management data that
   must flow across the network to determine the status.  By reducing
   the number of requests needed to track multiple objects (to one),
   the poll cycle time is greatly improved.  This allows a faster poll
   cycle (mean time to detect an alert) with less overhead than would
   be caused by pure polling.

   In addition, this technique scales well to large networks, as the
   concept of polling a single object to learn the status of many lends
   itself well to hierarchies.  A proxy manager may be polled to learn
   if it has found any alerts in the logs of the agents it polls.  Of
   course, this scaling does not save on the mean time to learn of an
   alert (the cycle times of the manager and the proxy manager must be
   considered), but the amount of network management polling traffic is
   concentrated at lower levels.  Only a small amount of such traffic
   need be passed over the network's "backbone"; that is, the traffic
   generated by the request-response between the manager and the proxy
   managers.

   Note that it is best to return the oldest logged alert as the first
   table entry.  This is the object most likely to be overwritten, and
   every attempt should be made to ensure that the manager has seen it.
   In a system where log entries may be removed by the manager, the
   manager will probably wish to attempt to keep all remote alert logs
   empty to reduce the number of alerts dropped or overwritten.  In any
   case, the order in which table entries are returned is a function of
   the table mechanism, and is implementation and/or protocol specific.
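   The agent-side log behavior of section 6.1 can be sketched as
   follows, returning the oldest entry first as recommended above.  The
   class and method names are invented for illustration; a real agent
   exposes this behavior through SNMP or CMOT table operations rather
   than direct calls.

```python
class AlertLog:
    """Bounded alert log (section 6.1): entries carry a wrapping
    alertId, and the oldest entry is overwritten when the log is full."""

    WRAP = 2 ** 32  # alertId resets to 1 on reaching 2**32 (step b.1)

    def __init__(self, max_log_table_entries=100):
        self.max_entries = max_log_table_entries
        self.next_id = 1
        self.table = []  # (alertId, alertData) pairs, oldest first

    def log(self, alert_data):
        """Steps (b) and (c): append an entry, overwriting the oldest."""
        self.table.append((self.next_id, alert_data))
        self.next_id = self.next_id % (self.WRAP - 1) + 1  # 1..2**32-1
        if len(self.table) > self.max_entries:
            self.table.pop(0)  # drop the oldest (lowest alertId) entry

    def get_next(self, after_id):
        """Steps (d) and (h): first entry following after_id; after_id=0
        yields the oldest entry.  (A real agent would also handle the
        alertId wrap here.)"""
        for alert_id, data in self.table:
            if alert_id > after_id:
                return alert_id, data
        return None  # table empty or exhausted

    def acknowledge(self, alert_id):
        """Step (i): manager-initiated removal; False if no such entry."""
        before = len(self.table)
        self.table = [e for e in self.table if e[0] != alert_id]
        return len(self.table) < before
```

   A manager polling this log sees the oldest unacknowledged alert
   first, and can drain the log by alternating get_next and acknowledge.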
+ + "Polled, logged alerts" offers all of the advantages inherent in + polling (reliable detection of failures, reduced agent complexity + with UDP, etc.), while minimizing the typical polling problems + (potentially shorter poll cycle time and reduced network management + traffic). + + Finally, alerts are not lost when an agent is isolated from its + manager. When a connection is reestablished, a history of conditions + that may no longer be in effect is available to the manager. While + not a part of this document, it is worthwhile to note that this same + log architecture can be employed to archive alert and other + information on remote hosts. However, such non-local storage is not + sufficient to meet the reliability requirements of "polled, logged + alerts". + + + + + + + + + + + +Steinberg [Page 13] + +RFC 1224 Managing Asynchronously Generated Alerts May 1991 + + +7. Compatibility with SNMP [4] and CMOT [3] + +7.1 Closed Loop (Feedback) Alert Reporting + +7.1.1 Use of Feedback with SNMP + + At configuration time, an SNMP agent supporting Feedback/Pin is + loaded with default values of "windowTime" and "maxAlerts-PerTime", + and "alertsEnabled" is set to TRUE. The manager issues an SNMP GET + to determine "maxAlertsPerTime" and "windowTime", and to verify the + state of "alertsEnabled". Should the agent support setting Pin + objects, the manager may choose to alter these values (via an SNMP + SET). The new values are calculated based upon known network + resource limitations (e.g., the amount of packets the manager's + gateway can support) and the number of agents potentially reporting + to this manager. + + Upon receipt of an "alertsDisabled" trap, a manager whose state and + network are not overutilized immediately issues an SNMP SET to make + "alertsEnabled" TRUE. Should an excessive number of "alertsDisabled" + traps regularly occur, the manager might revisit the values chosen + for implementing the Pin mechanism. 
Note that an overutilized system
   expects its manager to delay the resetting of "alertsEnabled".

   As a part of each regular polling cycle, the manager includes a GET
   REQUEST for the value of "alertsEnabled".  If this value is FALSE,
   it is SET to TRUE, and the potential loss of traps (while it was
   FALSE) is noted.

7.1.2 Use of Feedback with CMOT

   The use of CMOT in implementing Feedback/Pin is essentially
   identical to the use of SNMP.  CMOT GET, SET, and EVENT replace
   their SNMP counterparts.

7.2 Polled, Logged Alerts

7.2.1 Use of Polled, Logged Alerts with SNMP

   As a part of regular polling, an SNMP manager using polled, logged
   alerts may issue a GET NEXT request naming
   { alertLog logTableEntry(1) alertId(1) 0 }.  Returned is either the
   alertId of the first table entry or, if the table is empty, an SNMP
   reply whose object is the "lexicographical successor" to the alert
   log.

   Should an "alertId" be returned, the manager issues an SNMP GET
   naming { alertLog logTableEntry(1) alertData(2) value } where "value"



Steinberg                                                     [Page 14]

RFC 1224       Managing Asynchronously Generated Alerts        May 1991


   is the alertId integer obtained from the previously described GET
   NEXT.  This returns the SNMP TRAP encapsulated within an OPAQUE.

   If the agent supports the deletion of table entries through SNMP
   SETs, the manager may then issue a SET of { alertLog logTableEntry(1)
   alertId(1) value } to remove the entry from the log.  Otherwise, the
   next GET NEXT poll of this agent should request the first "alertId"
   following the instance of "value" rather than an instance of "0".

7.2.2 Use of Polled, Logged Alerts with CMOT

   Using polled, logged alerts with CMOT is similar to using them with
   SNMP.  In order to test for table entries, one uses a CMOT GET and
   specifies scoping to the alertLog.
The request is for all table + entries that have an alertId value greater than the last known + alertId, or greater than zero if the table is normally kept empty by + the manager. Should the agent support it, entries are removed with a + CMOT DELETE, an object of alertLog.entry, and a distinguishing + attribute of the alertId to remove. + +8. Multiple Manager Environments + + The conflicts between multiple managers with overlapping + administrative domains (generally found in larger networks) tend to + be resolved in protocol specific manners. This document has not + addressed them. However, real world demands require alert management + techniques to function in such environments. + + Complex agents can clearly respond to different managers (or managers + in different "communities") with different reply values. This allows + feedback and polled, logged alerts to appear completely independent + to differing autonomous regions (each region sees its own value). + Differing feedback thresholds might exist, and feedback can be + actively blocking alerts to one manager even after another manager + has reenabled its own alert reporting. All of this is transparent to + an SNMP user if based on communities, or each manager can work with a + different copy of the relevant MIB objects. Those implementing CMOT + might view these as multiple instances of the same feedback objects + (and allow one manager to query the state of another's feedback + mechanism). + + The same holds true for polled, logged alerts. One manager (or + manager in a single community/region) can delete an alert from its + view without affecting the view of another region's managers. + + Those preferring less complex agents will recognize the opportunity + to instrument proxy management. 
Alerts might be distributed from a
   manager-based alert exploder which effectively implements feedback



Steinberg                                                     [Page 15]

RFC 1224       Managing Asynchronously Generated Alerts        May 1991


   and polled, logged alerts for its subscribers.  Feedback parameters
   are set on each agent to the highest rate of any subscriber, and
   limited by the distributor.  Logged alerts are deleted from the view
   at the proxy manager; they are truly deleted at the agent only when
   all subscribers have so requested, or else they are immediately
   deleted at the agent upon the first proxy request and maintained as
   virtual entries by the proxy manager for the benefit of the other
   subscribers.

9. Summary

   While "polled, logged alerts" may be useful, they still have a
   limitation: the mean time to detect failures and alerts increases
   linearly as networks grow in size (hierarchies shorten individual
   poll cycle times, but the mean detection time is the sum of 1/2 of
   each cycle time).  For this reason, it may be necessary to
   supplement asynchronous generation of alerts (and "polled, logged
   alerts") with unrequested transmission of the alerts on very large
   networks.

   Whenever systems generate and asynchronously transmit alerts, the
   potential to overburden (over-inform) a management station exists.
   Mechanisms to protect a manager, such as the "Feedback/Pin"
   technique, risk losing potentially important information.  Failure
   to implement asynchronous alerts increases the time for the manager
   to detect and react to a problem.  Over-reporting may appear to be a
   less critical (and less likely) problem than under-informing, but
   the potential for harm exists with unbounded alert generation.

   An ideal management system will generate alerts to notify its
   management station (or stations) of error conditions.  However,
   these alerts must be self-limiting with required positive feedback.
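   As a purely illustrative sketch of such self-limiting alert
   generation (the class and method names here are hypothetical, and
   only the quoted object names mirror this memo), an agent might
   transmit at most "maxAlertsPerTime" alerts per "windowTime" seconds,
   send one final "alertsDisabled" trap when the limit is reached, and
   then stay silent until the manager sets "alertsEnabled" back to
   TRUE:

```python
# Hypothetical sketch of a "Feedback/Pin" rate limit.  The quoted MIB
# object names follow this memo; everything else (class name, the
# reset-on-reenable behavior) is an assumption of this example.
import time

class PinnedAlerter:
    def __init__(self, max_alerts_per_time=5, window_time=60.0):
        self.max_alerts_per_time = max_alerts_per_time  # "maxAlertsPerTime"
        self.window_time = window_time                  # "windowTime", seconds
        self.alerts_enabled = True                      # "alertsEnabled"
        self._window_start = time.monotonic()
        self._sent_in_window = 0

    def send(self, alert, transmit):
        """Pass alert to transmit() unless the pin has tripped."""
        now = time.monotonic()
        if now - self._window_start >= self.window_time:
            self._window_start = now    # a new window begins, but a
            self._sent_in_window = 0    # tripped pin stays tripped
        if not self.alerts_enabled:
            return False                # pinned: the alert is dropped
        if self._sent_in_window >= self.max_alerts_per_time:
            self.alerts_enabled = False     # trip the pin and emit
            transmit("alertsDisabled")      # one final trap
            return False
        self._sent_in_window += 1
        transmit(alert)
        return True

    def reenable(self):
        """The manager's positive feedback: SET alertsEnabled to TRUE.
        Resetting the window counter here is an assumption of this
        sketch, not a requirement of the memo."""
        self.alerts_enabled = True
        self._sent_in_window = 0
        self._window_start = time.monotonic()
```

   With a limit of two alerts per window, a burst of five produces two
   alerts followed by one "alertsDisabled" trap; the remaining alerts
   are dropped until the manager re-enables reporting.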
In + addition, the manager should periodically poll to ensure connectivity + to remote stations, and to retrieve copies of any alerts that were + not delivered by the network. + +10. References + + [1] Rose, M., and K. McCloghrie, "Structure and Identification of + Management Information for TCP/IP-based Internets", RFC 1155, + Performance Systems International and Hughes LAN Systems, May + 1990. + + [2] McCloghrie, K., and M. Rose, "Management Information Base for + Network Management of TCP/IP-based internets", RFC 1213, Hughes + LAN Systems, Inc., Performance Systems International, March 1991. + + [3] Warrier, U., Besaw, L., LaBarre, L., and B. Handspicker, "Common + Management Information Services and Protocols for the Internet + + + +Steinberg [Page 16] + +RFC 1224 Managing Asynchronously Generated Alerts May 1991 + + + (CMOT) and (CMIP)", RFC 1189, Netlabs, Hewlett-Packard, The Mitre + Corporation, Digital Equipment Corporation, October 1990. + + [4] Case, J., Fedor, M., Schoffstall, M., and C. Davin, "Simple + Network Management Protocol" RFC 1157, SNMP Research, Performance + Systems International, Performance Systems International, MIT + Laboratory for Computer Science, May 1990. + + [5] Reynolds, J., and J. Postel, "Assigned Numbers", RFC 1060, + USC/Information Sciences Institute, March 1990. + +11. Acknowledgements + + This memo is the product of work by the members of the IETF Alert-Man + Working Group and other interested parties, whose efforts are + gratefully acknowledged here: + + Amatzia Ben-Artzi Synoptics Communications + Neal Bierbaum Vitalink Corp. + Jeff Case University of Tennessee at Knoxville + John Cook Chipcom Corp. + James Davin MIT + Mark Fedor Performance Systems International, Inc. + Steven Hunter Lawrence Livermore National Labs + Frank Kastenholz Clearpoint Research + Lee LaBarre Mitre Corp. + Bruce Laird BBN, Inc + Gary Malkin FTP Software, Inc. 
+ Keith McCloghrie Hughes Lan Systems + David Niemi Contel Federal Systems + Lee Oattes University of Toronto + Joel Replogle NCSA + Jim Sheridan IBM Corp. + Steve Waldbusser Carnegie-Mellon University + Dan Wintringham Ohio Supercomputer Center + Rich Woundy IBM Corp. + +Appendix A + + Example of polling costs + + The following example is completely hypothetical, and arbitrary. + It assumes that a network manager has made decisions as to which + systems, and which objects on each system, must be continuously + monitored to determine the operational state of a network. It + does not attempt to discuss how such decisions are made, and + assumes that they were arrived at with the full understanding that + the costs of polling many objects must be weighed against the + + + +Steinberg [Page 17] + +RFC 1224 Managing Asynchronously Generated Alerts May 1991 + + + level of information required. + + Consider a manager that must monitor 40 gateways and hosts on a + single network. Further assume that the average managed entity + has 10 MIB objects that must be watched to determine the device's + and network's overall "health". Under the XYZ network management + protocol, the manager may get the values of up to 4 MIB objects + with a single request (so that 3 requests must be made to + determine the status of a single entity). An average response + time of 5 seconds is assumed, and a lack of response within 30 + seconds is considered no reply. Two such "no replies" are needed + to declare the managed entity "unreachable", as a single packet + may occasionally be dropped in a UDP system (those preferring to + use TCP for automated retransmits should assume a longer timeout + value before declaring the entity "unreachable" which we will + define as 60 seconds). + + We begin with the case of "sequential polling". This is defined + as awaiting a response to an outstanding request before issuing + any further requests. 
In this example, the average XYZ network + management protocol packet size is 300 bytes "on the wire" + (requesting multiple objects, ASN.1 encoded, IP and UDP enveloped, + and placed in an ethernet packet). 120 request packets are sent + each cycle (3 for each of 40 nodes), and 120 response packets are + expected. 72000 bytes (240 packets at 300 bytes each) must be + transferred during each poll cycle, merely to determine that the + network is fine. + + At five seconds per transaction, it could take up to 10 minutes to + determine the state of a failing machine (40 systems x 3 requests + each x 5 seconds per request). The mean time to detect a system + with errors is 1/2 of the poll cycle time, or 5 minutes. In a + failing network, dropped packets (that must be timed out and + resent) greatly increase the mean and worst case times to detect + problems. + + Note that the traffic costs could be substantially reduced by + combining each set of three request/response packets in a single + request/response transaction (see section 6.1.1 "Example"). + + While the bandwidth use is spread over 10 minutes (giving a usage + of 120 bytes/second), this rapidly deteriorates should the manager + decrease his poll cycle time to accommodate more machines or + improve his mean time to fault detection. Conversely, increasing + his delay between polls reduces traffic flow, but does so at the + expense of time to detect problems. + + Many network managers allow multiple poll requests to be "pending" + + + +Steinberg [Page 18] + +RFC 1224 Managing Asynchronously Generated Alerts May 1991 + + + at any given time. It is assumed that such managers would not + normally poll every machine without any delays. 
Allowing
   "parallel polling" and initiating a new request immediately
   following any response would tend to generate larger amounts of
   traffic; "parallel polling" here produces 40 times the amount of
   network traffic generated in the simplistic case of "sequential
   polling" (40 packets are sent and 40 replies received every 5
   seconds, giving 80 packets x 300 bytes each per 5 seconds, or 4800
   bytes/second).  Mean time to detect errors drops, but at the cost
   of increased bandwidth.  This does not improve on the timeout of
   over 2 minutes needed to detect that a node is not responding.

   Even with parallel polling, increasing the device count (systems
   to manage) not only results in more traffic, but can degrade
   performance.  On large networks the manager becomes bounded by the
   number of queries it can build and track, and the number of
   responses it can parse and react to, per second.  The continuous
   volume requires the timeout value to be increased to accommodate
   responses that are still in transit or have been received and are
   queued awaiting processing.  The only alternative is to lengthen
   the poll cycle.  Either of these actions increases both the mean
   time to detect failure and the worst case time to detect problems.

   If alerts are sent in place of polling, mean time to fault
   detection drops from over a minute to as little as 2.5 seconds
   (1/2 the time for a single request-response transaction).  This
   time may be increased slightly, depending on the nature of the
   problem.  Typical network utilization is zero (assuming a
   "typical" case of a non-failing system).

Appendix B

   All defined MIB objects used in this document reside
   under the mib subtree:

       alertMan ::= { iso(1) org(3) dod(6) internet(1)
                      experimental(3) alertMan(24) ver1(1) }

   as defined in the Internet SMI [1] and the latest "Assigned
   Numbers" RFC [5].
Objects under this branch are assigned + as follows: + + RFC 1224-MIB DEFINITIONS ::= BEGIN + + alertMan OBJECT IDENTIFIER ::= { experimental 24 } + + ver1 OBJECT IDENTIFIER ::= { alertMan 1 } + + + + +Steinberg [Page 19] + +RFC 1224 Managing Asynchronously Generated Alerts May 1991 + + + feedback OBJECT IDENTIFIER ::= { ver1 1 } + polledLogged OBJECT IDENTIFIER ::= { ver1 2 } + + END + + + 1) Feedback Objects + + OBJECT: + ------ + + maxAlertsPerTime { feedback 1 } + + Syntax: + Integer + + Access: + read-write + + Status: + mandatory + + OBJECT: + ------ + + windowTime { feedback 2 } + + Syntax: + Integer + + Access: + read-write + + Status: + mandatory + + OBJECT: + ------ + + alertsEnabled { feedback 3 } + + Syntax: + Integer + + Access: + read-write + + Status: + + + +Steinberg [Page 20] + +RFC 1224 Managing Asynchronously Generated Alerts May 1991 + + + mandatory + + + 2) Polled, Logged Objects + + OBJECT: + ------ + + alertLog { polledLogged 1 } + + Syntax: + SEQUENCE OF logTableEntry + + Access: + read-write + + Status: + mandatory + + OBJECT: + ------ + + logTableEntry { alertLog 1 } + + Syntax: + + logTableEntry ::= SEQUENCE { + + alertId + INTEGER, + alertData + OPAQUE + } + + Access: + read-write + + Status: + mandatory + + OBJECT: + ------ + + alertId { logTableEntry 1 } + + Syntax: + Integer + + + + +Steinberg [Page 21] + +RFC 1224 Managing Asynchronously Generated Alerts May 1991 + + + Access: + read-write + + Status: + mandatory + + OBJECT: + ------ + + alertData { logTableEntry 2 } + + Syntax: + Opaque + + Access: + read-only + + Status: + mandatory + + OBJECT: + ------ + + maxLogTableEntries { polledLogged 2 } + + Syntax: + Integer + + Access: + read-only + + Status: + optional + +Security Considerations + + Security issues are not discussed in this memo. + +Author's Address + + Lou Steinberg + IBM NSFNET Software Development + 472 Wheelers Farms Rd, m/s 91 + Milford, Ct. 
06460 + + Phone: 203-783-7175 + EMail: LOUISS@IBM.COM + + + + +Steinberg [Page 22] +