Impact of WANX on APM Measurements

WAN Optimization devices or WAN Accelerators (WANX) are truly magical devices. They cut the bandwidth needs of an enterprise significantly and improve end user response times for many applications. But having WANX in your environment does not absolve you from implementing APM tools for ongoing monitoring and troubleshooting. Remember that if you cannot measure something you cannot control it.

An APM tool for monitoring application response times and network/TCP parameters is typically installed in the data center, where it can get visibility into all the remote sites connected to that data center. In the absence of WANX devices, the TCP flow between the remote users and the servers in the data center is uninterrupted and unmodified. The response time measurements are therefore close to the end user response times, and the throughput measured from the data center to the remote site is close to the remote site bandwidth.

WANX throws a monkey wrench into all these measurements. WANX devices, for the most part, terminate TCP sessions at the remote site and re-establish them at the data center. The WANX pair between the remote site and the data center consolidates traffic into its own internal TCP sessions and does the magic of compression and optimization at the TCP/application level. As far as the servers are concerned, the clients are right in the data center on the LAN, and hence the measurements on this traffic flow have a LAN flavor to them.

What this means is that if some WAN effects (e.g., packet loss) are impacting performance between the two WANX devices, they won't be fully reflected in the response time measurements at the APM device in the data center. Also, the throughput measured between the data center and the remote site will be much higher than the bandwidth commissioned at the remote site, and that is OK, as the WANX devices were after all installed to amplify the bandwidth.
But the lack of visibility into true end user response time at the data center is a handicap, as we cannot be as proactive in troubleshooting as we would like to be. There are some techniques to overcome this handicap, though they come with some costs, or maybe lots of them.
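
To make the blind spot concrete, here is a rough sketch in Python. It is a toy additive model, not how any APM or WANX product actually computes its metrics; the class, the function names and all the numbers are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class SplitPath:
    """Hypothetical model of a path split by a WANX pair (illustrative only)."""
    server_time_ms: float   # time the server spends producing the response
    lan_rtt_ms: float       # round trip between the data center WANX and the server
    wan_rtt_ms: float       # round trip between the two WANX devices
    wan_round_trips: int    # WAN round trips per transaction (grows with loss)


def response_seen_at_data_center(p: SplitPath) -> float:
    # The APM tool only sees the LAN-side TCP session.
    return p.server_time_ms + p.lan_rtt_ms


def response_felt_by_user(p: SplitPath) -> float:
    # Assumption: WAN delay simply adds to the LAN-side figure.
    return response_seen_at_data_center(p) + p.wan_round_trips * p.wan_rtt_ms


def apparent_throughput_mbps(link_mbps: float, compression_ratio: float) -> float:
    # LAN-side throughput can exceed the commissioned WAN link rate
    # because the WANX pair compresses traffic on the WAN leg.
    return link_mbps * compression_ratio


healthy = SplitPath(server_time_ms=50, lan_rtt_ms=1, wan_rtt_ms=40, wan_round_trips=2)
lossy = SplitPath(server_time_ms=50, lan_rtt_ms=1, wan_rtt_ms=40, wan_round_trips=5)

for name, path in (("healthy", healthy), ("lossy", lossy)):
    print(name, response_seen_at_data_center(path), response_felt_by_user(path))
```

In this sketch the data center figure stays at 51 ms in both cases, while the figure felt by the user moves from 131 ms to 251 ms once retransmissions add WAN round trips. That gap is exactly what the APM tool in the data center cannot see.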

One way to get end user response time visibility in a WANX environment is to deploy smaller APM tools at the remote sites. In many tool vendor architectures, the performance metrics can be pulled from the remote site into the central APM tool in the data center, and this restores full visibility into end user response time at the remote site. Sometimes this APM functionality can be incorporated into the WANX device itself as a software module (e.g., in the case of Riverbed/Opnet). If a company has hundreds or thousands of remote sites, this method obviously becomes cost prohibitive, and the remote deployment of the APM tool may have to be limited to the most important remote sites.
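
As a rough sketch of what the central tool ends up doing in such a partial deployment, the Python below merges per-site metrics from two sources. The function, the site names and the data layout are hypothetical and do not reflect any vendor's API. The idea is simply to prefer the remote probe where one exists and to label everything else as a LAN-side number.

```python
from typing import Dict, List, NamedTuple


class SiteView(NamedTuple):
    site: str
    response_ms: float
    source: str  # "remote-probe" or "data-center"


def build_central_view(dc_metrics: Dict[str, float],
                       remote_metrics: Dict[str, float]) -> List[SiteView]:
    """Prefer remote probe data; fall back to the data center measurement."""
    views = []
    for site in sorted(set(dc_metrics) | set(remote_metrics)):
        if site in remote_metrics:
            # True end user view, measured at the remote site.
            views.append(SiteView(site, remote_metrics[site], "remote-probe"))
        else:
            # LAN-side view only; WAN effects are not reflected here.
            views.append(SiteView(site, dc_metrics[site], "data-center"))
    return views


# Hypothetical sites: only "nyc" is important enough to get a remote probe.
dc = {"nyc": 52.0, "austin": 49.0}
remote = {"nyc": 140.0}

for view in build_central_view(dc, remote):
    print(view)
```

Keeping the source label next to each number matters: a 49 ms reading tagged "data-center" should not be compared directly with a 140 ms reading tagged "remote-probe".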

This discussion reminds us of a classic conundrum. True end user response time can only be measured accurately close to the end user device (or felt by the end user). Deploying instrumentation near the end user (or agents in the user devices themselves) and sending this information to a central aggregator for analysis is always challenging. The proliferation of smartphones and tablets with a variety of apps makes this situation much worse. There is certainly an opportunity for innovation here.