Sophisticated Skype: What Makes a Good Call?

Everyone who has deployed Skype for Business hopes their users have good calls. But what's good?

Many rely on audio quality to determine whether a call was good. Most often this is based on the mean opinion score (MOS). MOS is usually measured on a five-point scale, from 1 (Bad) to 5 (Excellent). Originally, MOS was calculated from subjective ratings gathered directly from users. Because conducting subjective tests of voice quality on a production communication system is impractical, MOS is now more typically estimated algorithmically.

Microsoft's Skype for Business tracks MOS for several measures, broadly categorized as either "Listening Quality" (MOS-LQ) or "Conversation Quality" (MOS-CQ). Most VoIP vendors report MOS-LQ, which measures only the quality of the audio stream and doesn't take factors such as delay or echo into account. MOS-CQ also considers bidirectional effects, accounting for listening quality in both directions.

Specifically, Skype for Business reports on:

  • Listening MOS (LQ) -- the predicted wideband listening quality MOS for the audio played to the user
  • Sending MOS (LQ) -- the predicted wideband MOS for the audio sent from the user
  • Network MOS (LQ) -- the predicted wideband MOS for audio played to the user, based only on network factors such as packet loss, jitter, packet errors, packet reordering, and the codec in use. Because the Network MOS (NMOS) is constrained by the codec used, achieving a 5.0 NMOS is impossible.
  • Conversational MOS (CQ) -- the predicted narrowband conversational quality of the audio stream played to the user. Conversational MOS takes all the factors of Listening MOS into account and additionally considers echo, network delay, jitter buffering, and device-imposed delays.

For many of the MOS scores, a best practice is to look at trends (is the value moving up or down?) as opposed to focusing on achieving a specific target value.
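One simple way to put a number on a trend is to fit a straight line through a series of MOS averages and look at the sign of its slope. The snippet below is a minimal sketch of that idea, not how Skype for Business reports trends; the function name `mos_trend` and the sample values are made up for illustration.

```python
from statistics import mean


def mos_trend(daily_mos):
    """Return the least-squares slope of MOS per day (positive = improving)."""
    n = len(daily_mos)
    if n < 2:
        raise ValueError("need at least two data points")
    xs = range(n)  # day index: 0, 1, 2, ...
    x_bar = mean(xs)
    y_bar = mean(daily_mos)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, daily_mos))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


# Hypothetical daily average Network MOS values, for illustration only
week = [3.9, 3.8, 3.8, 3.6, 3.5, 3.5, 3.3]
slope = mos_trend(week)
print(f"MOS change per day: {slope:+.3f}")
```

A negative slope here would flag a week in which quality is drifting down, even though every individual value might still look acceptable on its own.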

In contrast to algorithmically predicted MOS scores, the Skype for Business Rate My Call feature captures direct user feedback (see my related No Jitter post).

However, as our research has shown, predicted MOS-LQ scores don't correlate strongly with users' quality ratings for calls. For instance, below is an analysis of MOS versus Rate My Call feedback across 10 different organizations representing a population of more than 100,000 users. Clearly, a higher MOS doesn't reliably correspond to a higher user rating for the call.

(Note, we're currently researching correlations between the Conversation MOS and user ratings.)
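For readers who want to run this kind of comparison on their own data, here is a minimal sketch that computes a Pearson correlation coefficient between predicted MOS and user ratings. It isn't the method behind our analysis; the `pearson` helper, the pairing of one MOS value with one rating per call, the 1-to-5 rating scale, and the sample values are all assumptions for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)


# Hypothetical per-call samples: (predicted MOS-LQ, user rating on an
# assumed 1-5 scale). These numbers are invented, not real survey data.
calls = [(4.1, 2), (3.2, 5), (3.9, 4), (2.8, 3), (4.3, 3), (3.5, 1)]
mos_values = [mos for mos, _ in calls]
ratings = [rating for _, rating in calls]
r = pearson(mos_values, ratings)
print(f"Correlation between MOS and user rating: {r:+.2f}")
```

A value near +1 would mean ratings rise in step with MOS; a value near 0 would mean MOS tells you little about how users rated the call. Because ratings are ordinal, a rank-based measure such as Spearman's may be a better fit in practice.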
