6 Truths About QoE Monitoring for Virtualized Networks

Posted by Scott Sumner on Tuesday, October 24, 2017 with No comments

A third of mobile networks are not fully monitored end-to-end. Let that sink in for a minute.

It’s a troubling figure, but perhaps not a surprising one: most operators use three or more vendors for each major function in their radio, backhaul, and core networks. Piecing together a consistent end-to-end monitoring layer across these discontinuous domains is necessary to manage quality of experience (QoE).

A lack of end-to-end visibility isn’t sustainable, especially as network performance rapidly becomes the key differentiator. As operators seek out and implement ways to close their visibility gap, they are discovering the following six truths about managing QoE over next-generation networks.

1. Best-effort QoE assurance is no longer good enough

While best-effort QoE assurance has been the accepted standard for typical internet applications and services, that’s rapidly changing with the on-demand nature of usage today. Customers are no longer tolerant of services being “okay” rather than “excellent.” Even a bad experience with Netflix or YouTube can impact customers’ perception of network quality and lead to churn.

This truth translates into huge changes in terms of what’s required for real-time service instantiation, as well as service monitoring and assurance. Operators must now be able to perform per-application level quality assurance for a vast number of services and QoE requirements, as well as meet the need for automated processes in virtualized networks.

2. QoE management for virtualized networks is a whole new ballgame

The shift from traditional to virtualized networking significantly impacts how operators manage QoE. Traditional networking was, for the most part, static; thus, understanding the impact of Layer 2 and 3 issues on QoE was relatively straightforward. Virtualized networks, on the other hand, are very dynamic, so QoE optimization must be driven in real time by emerging technologies like machine learning. Also, in such networks, the data plane is often split across various network slices (even for the same service), making it even more complex to understand service delivery.

Specifications for backhaul network metrics—such as throughput, latency, and availability—are becoming more stringent. As a result, traditional monitoring techniques may not be sufficient to see the types of performance issues that result in QoE problems.

In Accedian’s experience, we have seen cases like the one below where an operator was experiencing more than 100,000 call drops, with an obvious negative impact on QoE, even though classic quality of service (QoS) metrics (e.g. packet loss, delay) were well within spec. Only by analyzing millions of Layer 2 and 3 KPIs was a periodic relationship observed that led to discovery of the root cause. Without the ability to manage QoE in this way, the impact of such a situation would have been catastrophic to the operator. 


Source: Accedian slide from a Heavy Reading webinar.
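To make the kind of analysis described above concrete, here is a minimal, purely illustrative sketch of scanning two KPI time series for a lagged, periodic correlation. The data is synthetic and the scale is a toy (96 samples versus the millions of KPIs in the real case); the KPI names and the lag of 3 samples are invented for the example:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def best_lag(x, y, max_lag):
    """Lag (in samples) at which y most strongly tracks an earlier x."""
    scores = {lag: pearson(x[: len(x) - lag], y[lag:])
              for lag in range(max_lag + 1)}
    lag = max(scores, key=scores.get)
    return lag, scores[lag]

# Synthetic KPIs: call drops echo a periodic micro-loss pattern 3 samples later
loss = [0.01 + 0.2 * math.sin(2 * math.pi * t / 12) for t in range(96)]
drops = [50 + 400 * loss[t - 3] for t in range(96)]
lag, r = best_lag(loss, drops, max_lag=8)
print(f"strongest correlation r={r:.2f} at lag={lag} samples")
```

The same lag-scan idea, applied across millions of KPI pairs in a big data platform rather than two lists in memory, is how a periodic relationship like the one in this case can surface automatically.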

Given the dynamic nature of virtualized networks, and the inability of traditional tools to assure virtualized services, it’s hardly surprising that operators are less confident in their ability to monitor QoE or the service experience.

3. Understanding the relationship between QoS and QoE is critical

Operators are still more comfortable monitoring QoS than QoE, a holdover from traditional telephony performance monitoring. The problem is, customer satisfaction is largely driven by QoE and not QoS. Being unable to monitor QoE puts operators’ business models at risk. And, as customers come to rely even more heavily on mobile connectivity as a way of life, QoE will become the competitive differentiator.

This is not to say that QoS and QoE are unrelated. Our work with South Korean operator SK Telecom, helping them develop a performance assurance strategy for their LTE-A network, highlighted some interesting relationships between QoS and QoE metrics. For example, it was found that even a small packet loss event—say, 0.1%—could lead to a 5% decrease in service throughput and a 2% loss could lead to as much as an 80% decrease in throughput. 


This type of surprising relationship is not limited to obvious issues like packet loss, either. In another example, it was shown that a 15ms latency increase on a critical path in the network could cause a throughput decrease of as much as 50%.

In both cases, the biggest impact is seen in the application-layer protocols managing service delivery, not in the QoS metrics themselves. Monitoring only the resource and network layers is not enough.
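One well-known way to see why small QoS degradations produce outsized throughput drops is the Mathis et al. TCP throughput model, which bounds steady-state throughput by (MSS/RTT) * C/sqrt(p). This is a generic model, not the methodology used in the SK Telecom work, and the exact percentages depend on the baseline RTT and loss assumed here, but it reproduces the same order of effect:

```python
import math

def mathis_throughput(mss_bytes: int, rtt_s: float, loss: float) -> float:
    """Mathis et al. bound on steady-state TCP throughput in bytes/s:
    rate <= (MSS / RTT) * C / sqrt(p), with C = sqrt(3/2)."""
    c = math.sqrt(3 / 2)
    return (mss_bytes / rtt_s) * c / math.sqrt(loss)

# Assumed baseline: 1460-byte MSS, 30 ms RTT, 0.1% loss
base = mathis_throughput(1460, 0.030, 0.001)

# Loss rises from 0.1% to 2%: throughput falls by ~78%
print(f"2% loss: {1 - mathis_throughput(1460, 0.030, 0.02) / base:.0%} drop")

# RTT rises by 15 ms at the same loss rate: throughput falls by ~33%
print(f"+15 ms RTT: {1 - mathis_throughput(1460, 0.045, 0.001) / base:.0%} drop")
```

The model makes the mechanism explicit: throughput scales with 1/sqrt(loss) and 1/RTT, so a "small" change in either QoS metric is amplified at the application layer.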

Which leads to another inconvenient truth: with next-generation networks, there are simply too many services streaming through tens of thousands of network components to manage QoE manually; automation is required.

4. Next-gen performance assurance is beyond human control

Generating QoS metrics in a traditional network involves large, centralized test suites and network interface devices (NIDs) at the edge. It also requires a lot of engineering, planning, and provisioning work. Imagine that level of activity for a new network with 100,000+ eNodeB sites and 100,000+ aggregation and cell site routers. Clearly, the traditional networking approach won’t scale.

While operators are rolling out software-controlled, programmable networks—using intelligence to govern resource allocation as applications come online—they are still largely managing those networks in a way that’s unsustainably manual. The next step is to do all this in an automated way, using intelligent algorithms for a real-time view of network topology paths.

Automation enables operators to:
  • Relieve hotspots and avoid unnecessary upgrades, by balancing link utilization 
  • Ensure bandwidth is available when it’s needed, for each application
  • Satisfy customer bandwidth requests without negatively impacting other services
  • Determine the best links to place each new workload
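As a toy illustration of the last point, a placement decision can be as simple as projecting each candidate link's utilization with the new workload added and rejecting links without headroom. The link names, capacities, and 80% utilization ceiling below are hypothetical, not drawn from any real deployment:

```python
def place_workload(links: dict, demand_mbps: float, ceiling: float = 0.80):
    """links maps name -> (capacity_mbps, used_mbps). Returns the link whose
    projected utilization after adding the workload is lowest, or None if no
    link has enough headroom under the utilization ceiling."""
    best, best_util = None, None
    for name, (capacity, used) in links.items():
        util = (used + demand_mbps) / capacity
        if util <= ceiling and (best_util is None or util < best_util):
            best, best_util = name, util
    return best

# Hypothetical aggregation links: (capacity in Mbps, current load in Mbps)
links = {
    "agg-1": (10_000, 7_500),   # 10 Gbps link, 7.5 Gbps in use
    "agg-2": (10_000, 4_000),
    "agg-3": (1_000, 200),
}
print(place_workload(links, 500))  # picks the least-loaded viable link
```

A real SDN controller would of course weigh latency, path diversity, and policy as well, but the utilization-balancing logic in the bullets above reduces to this kind of projection.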

5. Active, virtualized probes are the future of QoE assurance

Implementing software-defined networking (SDN) is a priority for mobile operators, and just about every operator now has a plan to virtualize at least part of their network—starting with the mobile core.

The reason for this trend is that existing network architecture is failing to enable the type of innovation operators need as the foundation of their business strategies, especially as they roll out LTE-A networks and plan for 5G. They find it hard to create new revenue streams, because with proprietary hardware, management, and IT systems it’s difficult—and takes too long—to deploy new services.

But, this shift does require a new approach to QoE management.

In an effectively managed virtualized network, the test suite becomes a virtual network function (VNF) that can be instantiated as needed, either to address a large geographic area or to scale compute power required for a large number of endpoints.

This setup does still require an endpoint to test toward, but that’s relatively straightforward given that most routers and base stations are now compliant with standardized test protocols such as Two-Way Active Measurement Protocol (TWAMP, RFC 5357). Sometimes there will be reasons to deploy purpose-built endpoint solutions (e.g. smart SFPs) when common test standards are not supported, but in most networks this represents less than 10% of sites. The increasing popularity of x86 compute toward the mobile edge means that standards-based VNFs can be used as well.
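For illustration, the idea behind TWAMP-style active measurement (a sender timestamps probes, a reflector echoes them back, and round-trip delay is computed from the difference) can be sketched in a few lines. This uses a simplified packet layout over loopback, not the actual RFC 5357 wire format, which defines NTP-format timestamps, error estimates, and a separate control protocol:

```python
import socket, struct, threading, time

def reflector(sock):
    """Echo each probe back with a reflector timestamp appended."""
    while True:
        data, addr = sock.recvfrom(64)
        if data == b"stop":
            sock.close()
            return
        sock.sendto(data + struct.pack("!d", time.time()), addr)

def measure_rtts(count=3):
    """Send `count` timestamped probes to a local reflector; return RTTs in s."""
    refl = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    refl.bind(("127.0.0.1", 0))
    threading.Thread(target=reflector, args=(refl,), daemon=True).start()
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rtts = []
    for seq in range(count):
        t1 = time.time()
        # Probe carries a sequence number and a send timestamp
        sender.sendto(struct.pack("!Id", seq, t1), refl.getsockname())
        sender.recvfrom(64)
        rtts.append(time.time() - t1)
    sender.sendto(b"stop", refl.getsockname())
    sender.close()
    return rtts

for seq, rtt in enumerate(measure_rtts()):
    print(f"seq={seq} rtt={rtt * 1e3:.3f} ms")
```

In a virtualized deployment, the sender side of this exchange is what gets packaged as a VNF, while the reflector role is played by the TWAMP support already built into routers and base stations.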

What makes VNF-based performance monitoring even more compelling is that every component needed is a software solution that can be easily orchestrated in a virtualized network; a QoE solution for a network of 100,000 eNodeBs can be deployed, up and generating KPIs, in weeks rather than several months, with very little capital required.

For all the reasons discussed so far, and given the economics of virtualized networks, mobile operators are moving away from centralized probe appliances and toward active virtualized probes. 


Source: Accedian slide from a Heavy Reading webinar.

6. Effective QoE management is possible, with analytics and automation

What is the best use for the potentially billions of KPIs generated daily by end-to-end network monitoring using active, virtualized probes? Big data analytics makes light work of this problem, allowing correlations between layers and events—through predictive trending and root cause determination—to be automated, and displayed in real-time.

Since the next generation of mobile networks requires automation to run optimally, having centralized network performance and QoE metrics in the operator’s big data infrastructure provides the necessary feedback for SDN control to make network configuration decisions that deliver the best experience to each customer, at any time.

That’s the ultimate goal for LTE-A and 5G: breathtaking performance over a highly dynamic, virtualized network, without humans getting in the way.