Active, invisible

Posted by GenD on Wednesday, November 25, 2009
Enterprises and mobile operators are gaining Ethernet experience quickly as they speed deployment of the latest generation of business services and wireless backhaul networks. Nothing like jumping right in to learn the oddities and pain points of a technology hitting prime time. We’ve seen operators shift their focus from the nuts and bolts to QoS management and monitoring, with an emerging need to start reporting performance online to their customers.

It’s at this point that so many are asking the tough question: “With hundreds or thousands of Ethernet circuits and services, how can I find the needle in the haystack when things go wrong?” Monitoring and testing aren’t new to Ethernet, but large-scale service footprints arguably are. The truck-roll and bootstrap methods that keep things running in small metro deployments start to fray at the edges in markets growing 20-40% a year in the mass adoption phase.
These operators are starting to discover the Ethernet Operations, Administration and Maintenance (OAM) standards for Connectivity Fault Management (CFM) and Performance Monitoring (PM), which have been quietly integrated into network elements over the last few years under IEEE 802.1ag and ITU-T Y.1731. They are looking for ways to keep tabs on latency, jitter, packet loss and availability to support SLA reporting and Quality of Service (QoS) management. These standards, along with other active testing techniques like the Performance Assurance Agent (PAA), bring full visibility into the performance of layer 2 and layer 3 services.
Active techniques conduct their measurements by transmitting a sparse but regular stream of precisely time-stamped “tracer” packets within the service under test (in-band). The test packets’ headers mimic those of the application or SLA of interest (e.g. by specifying VLAN, CoS / DSCP, protocol, drop eligibility, etc.) to ensure they follow the same path, experience the same delay, and are given the same priority as the monitored service. The test packets’ role is to provide a recurring, known reference from which SLA / QoS metrics can be measured, without having a noticeable effect on the service itself (i.e. non-service affecting, non-intrusive).
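To make the idea of a "known reference" concrete, here is a minimal sketch of how delay, jitter and loss could be derived from a stream of time-stamped probes. The `Probe` record and `summarize` function are hypothetical names made up for illustration, not part of Y.1731 or any vendor's API. The sketch assumes synchronized clocks at both ends, and it takes jitter to be the mean absolute difference between consecutive delays, which is only one of several common definitions.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class Probe:
    """One tracer packet (hypothetical record, for illustration only)."""
    seq: int
    tx_time: float             # sender timestamp, in seconds
    rx_time: Optional[float]   # receiver timestamp, or None if the probe was lost


def summarize(probes: List[Probe]) -> dict:
    """Derive simple SLA metrics from a list of probes."""
    # One-way delay of every probe that arrived (assumes synchronized clocks).
    delays = [p.rx_time - p.tx_time for p in probes if p.rx_time is not None]
    lost = sum(1 for p in probes if p.rx_time is None)
    # Jitter taken here as the mean absolute change between consecutive delays.
    if len(delays) > 1:
        jitter = mean(abs(b - a) for a, b in zip(delays, delays[1:]))
    else:
        jitter = 0.0
    return {
        "mean_delay_ms": mean(delays) * 1000 if delays else None,
        "jitter_ms": jitter * 1000,
        "loss_ratio": lost / len(probes) if probes else 0.0,
    }


if __name__ == "__main__":
    sample = [
        Probe(1, 0.0, 0.010),
        Probe(2, 1.0, 1.012),
        Probe(3, 2.0, None),   # lost probe
        Probe(4, 3.0, 3.011),
    ]
    print(summarize(sample))
```

Because the probes are sent at a known rate with known sequence numbers, a missing reply is enough to count a loss, and the timestamps alone give delay and its variation.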
Rest assured, active testing is safe, friendly and virtually invisible to the end-user, as bandwidth consumed is fractional compared to the bandwidth of the service itself. But how negligible is it? Just noting it’s very small doesn’t satisfy anyone in engineering, so to quantify the impact I decided to run through the calculations myself.
Taking Y.1731 delay measurement and continuity tests as a standard case (from which latency, jitter, frame loss ratio and availability can all be calculated), the math works something like this:
  • 3 frames are required for each measurement instance: one Continuity Check Message (CCM) frame, a Delay Measurement Message (DMM) and its reply (DMR);
  • Each frame is roughly 200 bytes in size;
  • Worst case you’d run these tests with a per-second frequency, for a number of unique flows or SLAs concurrently – let’s take 3 as a typical number for a cell site or Enterprise running 3 unique service classes (real-time, “important”, and best-effort traffic categories).
This would lead to 3 frames x 200 bytes x 3 instances every second, or 1,800 bytes per second, which works out to 14.4 kbps. Assuming a GbE link, this amounts to less than 2/1,000ths of a percent of the link capacity (!) Even in a case where you’re running 100 concurrent OAM sessions from, say, a multicast host or mobile switching center (MSC), you’re talking about 480 kbps, or less than 0.05% of total bandwidth. Pretty small, small enough to consider it negligible, especially in light of all the information it’s giving you in return. If you asked a customer if you could use 0.002% of their link to ensure they get the best possible performance, I don’t think they’d think about it too long! We’re all used to overhead, whether it’s in the form of taxes or highway tolls, but in this case you can pretty much assume it just isn’t there, which is a relief to operators with Ethernet to deliver. Learn more about active testing’s impact with complete calculations for both OAM and PAA here.
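For anyone who wants to rerun the numbers with their own frame sizes or session counts, the arithmetic above can be sketched in a few lines. The function name `oam_overhead` and its parameters are my own, chosen for illustration. The sketch takes care to convert bytes to bits, and it ignores the Ethernet preamble and inter-frame gap, so treat the result as a close approximation rather than an exact wire rate.

```python
def oam_overhead(frames_per_instance: int = 3,
                 frame_bytes: int = 200,
                 instances: int = 3,
                 rate_hz: float = 1.0,
                 link_bps: float = 1_000_000_000):
    """Return (test traffic in bit/s, share of link capacity in percent).

    Defaults mirror the example in the post: CCM + DMM + DMR per instance,
    ~200-byte frames, 3 service classes, one measurement per second, GbE link.
    """
    bits_per_second = frames_per_instance * frame_bytes * 8 * instances * rate_hz
    percent_of_link = 100.0 * bits_per_second / link_bps
    return bits_per_second, percent_of_link


if __name__ == "__main__":
    for n in (3, 100):
        bps, pct = oam_overhead(instances=n)
        print(f"{n} instances: {bps / 1000:.1f} kbps, {pct:.4f}% of GbE")
    # 3 instances: 14.4 kbps, 0.0014% of GbE
    # 100 instances: 480.0 kbps, 0.0480% of GbE
```

Changing `link_bps` to 100_000_000 shows the same three-class test load on a Fast Ethernet access link, where it is ten times larger as a share of capacity but still a tiny fraction of a percent.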