Sophisticated Skype: Good MOS Does Not Make a Good Call

As I discussed in my previous "Sophisticated Skype" article, mean opinion score (MOS) is a commonly used measure of call quality. I want to expand on the topic here, but first, a quick recap: MOS is measured on a five-point scale, from 1 (bad) to 5 (excellent). Originally, MOS was calculated from subjective ratings gathered directly from users. Today's systems calculate projected MOS values algorithmically, with no user input.

Many systems calculate multiple MOS ratings. For example, Microsoft Skype for Business records Caller MOS, Minimum Caller MOS, Callee MOS, Minimum Callee MOS, Network MOS, Minimum Network MOS, and Conversational MOS.

While each score focuses on one aspect of the audio transport, Conversational MOS purports to consider all the factors that could impact audio quality: echo, network delay, jitter buffering, and device-imposed delays. And yet, neither Conversational MOS nor any of the other MOS ratings accurately reflects how a user rates call quality.

In a simpler time, when desk phones were always at a fixed location and hardwired into the network, legacy voice reporting systems (PBXs) focused on Network MOS. That's because in an environment of purpose-built desk phones, the network was the only organization-specific factor that could degrade voice quality.

In the era of the softphone, users rely on Skype for Business or other voice-enabled UC clients and applications, such as Slack, Salesforce, Google G Suite with embedded voice, or Google Voice. The network, while still important, is no longer the predominant factor affecting call quality. As a result, Network MOS, which many early-to-market reporting tools still use to judge call quality, does not give a good view of call quality from a user's perspective.

Data from more than 100,000 users at many Skype for Business sites shows that Network MOS is only weakly related to a user's perception of call quality.

As indicated by the above graph, even calls that users rated (via Microsoft's Rate My Call feature) as "very poor" exhibited, on average, "good" Network MOS scores. Calls rated as "excellent" did have a marginally higher Network MOS score (4%), but the difference was not statistically significant.

In fact, none of the MOS scores, including the all-inclusive Conversational MOS, strongly predict user call satisfaction:

Caller MOS even correlates negatively with user ratings: calls that received "very poor" ratings have the highest system-calculated scores.
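One way to run this kind of comparison on your own data is to average a system-calculated MOS for each star rating users gave. The sketch below is illustrative only: the record layout (a `rating` field plus a `network_mos` field) and the sample values are made up for the example and are not taken from the dataset discussed here.

```python
from collections import defaultdict
from statistics import mean


def mean_mos_by_rating(calls, mos_field="network_mos"):
    """Average one MOS field for each user rating (1-5 stars)."""
    buckets = defaultdict(list)
    for call in calls:
        mos = call.get(mos_field)
        if mos is not None:  # skip calls with no recorded MOS
            buckets[call["rating"]].append(mos)
    # Sort by rating so the result reads from worst to best.
    return {rating: mean(values) for rating, values in sorted(buckets.items())}


# Made-up sample records; field names are assumptions for this sketch.
SAMPLE_CALLS = [
    {"rating": 1, "network_mos": 4.0},
    {"rating": 1, "network_mos": 4.2},
    {"rating": 3, "network_mos": 4.1},
    {"rating": 5, "network_mos": 4.3},
    {"rating": 5, "network_mos": None},
]

if __name__ == "__main__":
    for rating, avg in mean_mos_by_rating(SAMPLE_CALLS).items():
        print(f"{rating} stars: average MOS {avg:.2f}")
```

If the averages barely move between the lowest and highest ratings, as in the data described above, that MOS value is telling you little about how users experienced the call.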

You should investigate low MOS scores. Network issues almost certainly will degrade call quality. However, a high MOS isn't sufficient to ensure a good call experience. Good MOS does not guarantee a good call.

Our data analysis indicates that calls flagged as "poor" using the Microsoft Call Quality Methodology (CQM) generally correlate to calls users rate poorly.

As illustrated in the graph above, calls flagged as poor by the CQM tool received a "really bad" or "bad" rating 47% of the time. However, 28% of the time, calls flagged as poor were rated as "excellent." As with MOS, you should investigate high percentages of calls flagged as poor. However, a poor flag doesn't necessarily indicate a bad user experience, and conversely, the absence of a flag doesn't guarantee a good one.

In the classic Seinfeld Moviefone episode shown below, Kramer, unable to decode DTMF key presses, eventually says, "Why don't you just tell me the name of the movie you selected."

Similarly, direct user feedback, the original basis of MOS, ends up being the only way to know if you're truly providing a good-quality communication service. Your customer service is good only if customers say it's good. This is why the Skype for Business Rate My Call service is critically important. "Why don't you just tell me how the call was?"

But enabling Rate My Call data is just the start. To convert the data into insights and, more importantly, actionable insights, I would suggest you target low-satisfaction Skype for Business users -- those users who repeatedly have indicated their call experience is poor. Here's an SQL query that identifies users who have provided feedback at least five times and whose average rating is less than three stars:
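A minimal sketch of such a query follows, run against an in-memory SQLite table so that it is self-contained. The table name `call_feedback` and its columns `user_uri` and `rating` are hypothetical stand-ins, not the actual Skype for Business QoE schema; adapt them to wherever your Rate My Call data lives.

```python
import sqlite3

# Hypothetical schema: one row per Rate My Call response.
# Table and column names are illustrative, not the real QoE database schema.
LOW_SATISFACTION_SQL = """
SELECT user_uri,
       COUNT(*)    AS ratings,
       AVG(rating) AS avg_rating
FROM call_feedback
GROUP BY user_uri
HAVING COUNT(*) >= 5      -- provided feedback at least five times
   AND AVG(rating) < 3    -- average rating below three stars
ORDER BY avg_rating ASC, ratings DESC
"""


def low_satisfaction_users(conn):
    """Return (user_uri, ratings, avg_rating) rows for low-satisfaction users."""
    return conn.execute(LOW_SATISFACTION_SQL).fetchall()


def demo_connection():
    """Build an in-memory database with made-up feedback rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE call_feedback (user_uri TEXT, rating INTEGER)")
    sample = (
        [("alice@example.com", r) for r in (1, 2, 2, 1, 3, 2)]  # six low ratings
        + [("bob@example.com", r) for r in (5, 4, 5, 4, 5)]     # satisfied user
        + [("carol@example.com", r) for r in (1, 1)]            # too few ratings
    )
    conn.executemany("INSERT INTO call_feedback VALUES (?, ?)", sample)
    return conn


if __name__ == "__main__":
    for user, count, avg in low_satisfaction_users(demo_connection()):
        print(f"{user}: {count} ratings, average {avg:.2f}")
```

The minimum of five responses keeps a single bad call from putting someone on the list; both thresholds are easy to tune in the `HAVING` clause.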

This user list is certainly actionable. By talking to these users, you may uncover audio device driver issues, Bluetooth headsets not correctly being used with the provided USB dongle, out-of-date software, and perhaps some "protest votes" from users who resent the removal of their legacy desk phones. With active follow-up, you should be able to reduce the number of low-satisfaction users over time.

We often combine this satisfaction data with department data from Active Directory or location data to provide comparative metrics across organizational groups.

In the above dataset representing slightly more than 120,000 users, 1.15% or fewer of active users across all groups but one are regularly dissatisfied with Skype for Business. Group 10, the exception, for some reason has almost 4% low-satisfaction users.
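The per-group percentage can be computed by dividing the number of low-satisfaction users in each group by that group's active users. The sketch below assumes two hypothetical inputs: a mapping from user to group (for example, a department exported from Active Directory) and a set of users already identified as low-satisfaction. The names and sample values are invented for illustration.

```python
from collections import Counter


def low_satisfaction_rate_by_group(user_groups, low_satisfaction_users):
    """Percent of each group's active users who are on the low-satisfaction list."""
    active = Counter(user_groups.values())
    # Ignore flagged users that have no known group.
    low = Counter(
        user_groups[user] for user in low_satisfaction_users if user in user_groups
    )
    return {group: 100.0 * low[group] / total for group, total in sorted(active.items())}


# Made-up sample data; group names and user IDs are assumptions for this sketch.
USER_GROUPS = {
    "u1": "Sales", "u2": "Sales", "u3": "Sales", "u4": "Sales",
    "u5": "Finance", "u6": "Finance",
}
LOW_SATISFACTION = {"u2", "u5", "unknown-user"}

if __name__ == "__main__":
    rates = low_satisfaction_rate_by_group(USER_GROUPS, LOW_SATISFACTION)
    for group, pct in rates.items():
        print(f"{group}: {pct:.2f}% low-satisfaction users")
```

Comparing groups by percentage rather than raw counts keeps large departments from dominating the list, and it makes an outlier group stand out.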

While a good MOS score doesn't guarantee good call quality, thinking about quality has led us to explore user satisfaction, which is the ultimate measure of what's good.

The EnableUC Skype Insights service provides customized analysis and recommendations based on your specific data. Quantified metrics help ensure you are delivering a robust, high-quality, and reliable service that is being well adopted and can drive continuous improvement. If you have specific reporting questions, please comment below, send a Tweet to @kkieller, or message me on LinkedIn.
