Problem description

How does the DNA's Active Latency Detection (ALD) work?

Explanation

The Agent sends pings in groups of three. These pings are not ICMP ping requests but TCP connection setup requests sent to a port that is expected to be closed, port 2408 by default. Because no listener is expected on this port at the remote server, each ping forces a connection refusal (an immediate TCP RST), which gives the DNA console an accurate round-trip measurement to each machine. Connection setup requests are used because routers give them higher service priority, which reduces congestion effects and yields the fastest possible round-trip time. This translates into the most accurate latency measurement during capture. The term "ping" is used for convenience in the rest of this document.
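
The refusal-based measurement described above can be illustrated with a short Python sketch. It is not the Agent's implementation: the function name tcp_ping, the timeout value, and the use of an ordinary connect call are assumptions made for illustration. Only the default port 2408 comes from this article.

```python
import socket
import time

ALD_PORT = 2408  # default ALD port named in this article


def tcp_ping(host, port=ALD_PORT, timeout=1.0):
    """Return the seconds taken for a connection attempt to be refused.

    Returns None if the attempt times out, fails in another way, or
    unexpectedly succeeds (meaning a listener is present on the port).
    """
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return None  # a listener answered, so this is not a refusal
    except ConnectionRefusedError:
        # The SYN went out and an RST came back: one full round trip.
        return time.perf_counter() - start
    except OSError:
        return None  # timeout, unreachable host, filtered port, ...
```

Passing an IP address rather than a host name keeps name resolution time out of the measurement. A return value of None corresponds to a ping that did not produce a connection refusal.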

A major request is sent every second to every detected machine in the capture. If a filter is applied during capture, requests are sent only to nodes that are being captured, not to excluded nodes that may be sending traffic to the capture point NIC. A major request means three pings per second, which is the standard Microsoft connection request retry rate. (Each SYN/RST pair is about 120 bytes; repeated 3 times per second, ALD can generate up to 360 bytes per second per server.) The three pings to a given node are not sent in parallel but are spaced apart according to the Microsoft implementation of TCP/IP. When no traffic has been seen from a node, the major ping rate to that node decays by a factor of 1.3 until pinging stops completely. Pinging resumes at the normal rate once traffic is seen again from that node.
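
The traffic figures and the decay factor above can be checked with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and because this article does not say how often a decay step is applied or at what rate pinging stops, the code only lists successive decayed rates.

```python
PAIR_BYTES = 120        # approximate size of one SYN/RST pair (from the text)
PINGS_PER_REQUEST = 3   # one major request = three pings
DECAY_FACTOR = 1.3      # rate decay for nodes with no observed traffic


def ald_bytes_per_second(nodes, requests_per_second=1.0):
    """Upper bound on ALD traffic for the given number of pinged nodes."""
    return nodes * requests_per_second * PINGS_PER_REQUEST * PAIR_BYTES


def decayed_rates(steps, initial_rate=1.0):
    """Major-request rates (per second) after each successive decay step."""
    return [initial_rate / DECAY_FACTOR ** k for k in range(steps)]


print(ald_bytes_per_second(1))    # 360.0 bytes per second for one server
print(ald_bytes_per_second(50))   # 18000.0 bytes per second for 50 servers
print(decayed_rates(4))           # 1.0, then about 0.77, 0.59, 0.46
```
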

For a request to be valid, replies to all three pings must be received. If one or more of the three connection refusals is not returned because of network configuration or errors, the entire request is discarded. To get reasonable results, all ALD traffic (port 2408) must be allowed through in pass-through mode by intermediate devices such as WAN accelerators, proxy servers and the like. Otherwise, these devices will usually have a negative impact on ALD.
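
The all-or-nothing rule for a major request can be expressed in a few lines. This is a minimal sketch, assuming each ping yields either a round-trip time in seconds or None when no refusal came back; the function name is hypothetical, and how the three valid samples are later combined is not stated in this article, so they are returned unchanged.

```python
from typing import List, Optional, Sequence


def validate_major_request(rtts: Sequence[Optional[float]]) -> Optional[List[float]]:
    """Return the three samples if every ping was answered, else None."""
    if len(rtts) != 3 or any(r is None for r in rtts):
        return None  # one missing refusal discards the whole request
    return list(rtts)
```
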

After the capture ends, AV automatically adjusts the trace if valid ALD information is available. For each machine, bandwidth estimation is performed to determine the time attributed to insertion delay, using a model based on throughput and round trips. The latency figures obtained during capture, together with the calculated insertion delay, give the total contribution of the network to the transaction. A test is also performed to ensure that each node in the trace is not local to the capture point, by examining the TTL fields in the packets. If the TTL has not been decremented, or the detected latency from node to capture point is 2 ms or less, the ALD information obtained for that node is discarded and not used for adjustment.
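
The locality test can be sketched as follows. The 2 ms threshold comes from the text. How the product decides that a TTL "has not been decremented" is not described here, so this sketch assumes that an observed TTL equal to a common operating system default (64, 128 or 255) means the packet crossed no router. The names below are hypothetical.

```python
COMMON_INITIAL_TTLS = (64, 128, 255)  # assumption: typical OS default TTLs
LOCAL_LATENCY_MS = 2.0                # threshold stated in the text


def ald_usable(observed_ttl: int, latency_ms: float) -> bool:
    """Return True if ALD data for a node should be kept for adjustment."""
    ttl_undecremented = observed_ttl in COMMON_INITIAL_TTLS
    is_local = ttl_undecremented or latency_ms <= LOCAL_LATENCY_MS
    return not is_local


print(ald_usable(128, 40.0))  # False: TTL still at a default value
print(ald_usable(117, 40.0))  # True: TTL decremented, latency above 2 ms
print(ald_usable(117, 2.0))   # False: latency is 2 ms or less
```
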

Without ALD, the network portion of transaction delay is included in the server time of the CNS Breakdown, because the characteristics of the network path cannot be determined. For local environments, where the robots are on the same campus network as the server(s), ALD may not be important. For remote (WAN) connections, however, ALD can be very valuable.

What to do next

If you have any questions, ask the DC RUM experts in the DC RUM Open Q&A forum. If the issue is more complex, please contact our Support team.