Evaluation of Popular Methods for Monitoring IP Processes on z/OS

A TCP/IP monitoring system enables a z/OS network administrator to find network issues such as poor response times, limited accessibility, and dropped connections, and to resolve them before they affect the productivity of the organization. Besides reducing downtime costs, monitoring lets the administrator assess the load on resources and predict when demand will exceed capacity.

But mainframe TCP/IP networking is complex. A monitoring system needs to work with multiple LPARs and hosts, with virtual IP addresses, Sysplex Distributor, and more. Moreover, data about network operations needs to be collected as close to real time as possible, without excessive overhead and without flooding the administrator with reports of minor issues. Hence there are many methods of monitoring IP, each suited to a particular set of resources or types of traffic. Effective monitoring requires a system that can manage all of that complexity.

Real-time monitoring with event-driven architecture
In an event-driven architecture, IP events are captured in real time. The monitoring tool extracts data from the system automatically, at the moment events occur. As soon as it receives the data, the monitoring system takes appropriate action, so problems are identified and reported quickly.
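The flow described above (an event arrives, and the monitor reacts to it immediately) can be sketched as a small publish/subscribe dispatcher. This is only an illustration of the pattern, not the interface of any z/OS monitoring product: the `EventDispatcher` class, the `connection_dropped` event type, and the event fields are all hypothetical names chosen for the example.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical event record: a dict with a "type" key plus event details.
Event = Dict[str, object]
Handler = Callable[[Event], None]


class EventDispatcher:
    """Routes each event to its registered handlers as soon as it arrives."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        """Register a handler to be called for every event of this type."""
        self._handlers[event_type].append(handler)

    def publish(self, event: Event) -> int:
        """Deliver the event immediately; return how many handlers ran."""
        handlers = self._handlers.get(str(event["type"]), [])
        for handler in handlers:
            handler(event)
        return len(handlers)


alerts: List[str] = []


def on_connection_dropped(event: Event) -> None:
    # A real monitor might raise an alert or open a ticket here.
    alerts.append(f"dropped: {event['remote']}")


dispatcher = EventDispatcher()
dispatcher.subscribe("connection_dropped", on_connection_dropped)

# The (hypothetical) event source calls publish() at the moment the event occurs.
delivered = dispatcher.publish(
    {"type": "connection_dropped", "remote": "192.0.2.10"}
)
```

No work is done until an event is published, which is why this style avoids the cost of repeatedly asking the system whether anything has changed.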

Poll-based monitoring
In poll-based monitoring, the monitoring tool asks the system for data at specified time intervals. Responding to those requests can consume a significant share of system resources, including CPU capacity and memory. To approach real-time monitoring, data must be polled very frequently, which adds further to the system overhead. If the polling interval is shorter than the time a poll takes to complete, one poll cannot finish before the next one is scheduled to begin.
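The trade-off between polling frequency and overhead comes down to simple arithmetic. The sketch below assumes each poll takes a fixed, known time to service; the function names and the example durations are illustrative only, not measurements from a real system.

```python
def polling_load(poll_seconds: float, interval_seconds: float) -> float:
    """Return the fraction of each polling interval spent servicing a poll."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    if poll_seconds < 0:
        raise ValueError("poll_seconds must not be negative")
    return poll_seconds / interval_seconds


def polls_overlap(poll_seconds: float, interval_seconds: float) -> bool:
    """True when a poll cannot finish before the next one is due to start."""
    return polling_load(poll_seconds, interval_seconds) > 1.0


# Illustrative numbers: a poll that takes 3 seconds to complete.
relaxed = polling_load(3.0, 60.0)     # polled once a minute
frequent = polling_load(3.0, 5.0)     # polled every 5 seconds
too_often = polls_overlap(3.0, 2.0)   # interval shorter than the poll itself
```

With these numbers, moving from a 60-second to a 5-second interval raises the share of time spent polling from 5% to 60%, and a 2-second interval can never keep up.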

Data extraction with NETSTAT commands
Using NETSTAT (network statistics) commands, you can verify TCP/IP configuration and monitor network resources that affect connectivity. NETSTAT is available as a z/OS console command, in TSO, and in the UNIX shell. But NETSTAT does not have an application programming interface (API), so a monitoring tool has to issue the command and interpret its text output. It will not identify all connectivity-related problems, and it cannot monitor connectionless protocols such as UDP, ICMP, and OSPF.

System Management Facilities (SMF) exit-based monitoring
IP-related data can be collected with z/OS System Management Facilities (SMF), then accessed by a monitoring system using SMF exit programs. The drawbacks of SMF exit-based monitoring are that it needs multiple SMF exits and that it cannot deliver results in real time. Like polling, it also adds considerable overhead.

Simple Network Management Protocol (SNMP)-based data collection
Most devices on an IP network, including TCP/IP stacks on z/OS, maintain a management information base (MIB). The MIB is a database of configuration and performance data. SNMP is the UDP-based protocol for acquiring that data over the network. z/OS does not make that data available by default: the SNMP daemons must be configured and activated, and a monitoring system must then poll the daemons repeatedly for MIB data. This may result in significant overhead, especially in large networks with busy systems. Hence, SNMP polling is poorly suited to real-time network monitoring.
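Because MIB performance data is largely made up of running counters, a poller has to compare two successive samples to derive a rate. The sketch below shows that calculation for a 32-bit counter (the SMIv2 Counter32 type wraps to zero after 2^32 - 1). It assumes at most one wrap between polls and no counter reset; the function names and sample values are hypothetical, and no real SNMP library is used.

```python
COUNTER32_MODULUS = 2 ** 32  # a Counter32 wraps to 0 after 2**32 - 1


def counter_delta(previous: int, current: int,
                  modulus: int = COUNTER32_MODULUS) -> int:
    """Increase between two samples, allowing for a single counter wrap."""
    if not (0 <= previous < modulus and 0 <= current < modulus):
        raise ValueError("samples must lie within the counter range")
    return (current - previous) % modulus


def rate_per_second(previous: int, current: int,
                    elapsed_seconds: float) -> float:
    """Average rate of change over the polling interval."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return counter_delta(previous, current) / elapsed_seconds


# Two hypothetical samples of a packet counter taken 10 seconds apart.
normal_rate = rate_per_second(1_000, 6_000, 10.0)

# The counter wrapped between polls: 296 counts up to the wrap, then 704 more.
wrapped_delta = counter_delta(4_294_967_000, 704)
wrapped_rate = rate_per_second(4_294_967_000, 704, 10.0)
```

The result is only an average over the polling interval, so short bursts between polls are invisible, which is one more reason polled MIB data lags behind real-time events.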

Each of the above methods has its own advantages and limitations. To monitor TCP/IP activity easily and effectively, you need a tool that can combine them all, understand where and when each individual method is appropriate, and present the data in consolidated reports.