
Andrew Prokop | June 29, 2015


The Sound Science of Audio Codecs


What is sound and how do we take the noise that comes from our mouths and turn it into something that can be transported across an IP network?

I have never been happy with the answer "because." No matter what the subject or question, I am not satisfied until I am told the whys, wherefores, and possible exceptions. While I can't claim to fully understand every explanation I'm provided (I still don't completely fathom relativity), I want the opportunity to try. I won't know my limits until they've been stretched.

This year for the International Avaya Users Group's annual conference, Converge2015, one of the organizers asked me to speak about audio codecs. My first reaction was, "Is there anything I can say about codecs that hasn't already been said?" After all, G.711 has been around since 1972. How can anyone with a few years of communications under his or her belt not know about a codec that was invented before cell phones, the World Wide Web, and PCs?

After mulling it over for a few days, it suddenly hit me. Instead of simply running through the different codecs, I should explain why they exist in the first place. In other words, if G.711 has been around since 1972 and it has been doing a pretty good job all these years, why do we also have G.726, G.729, G.722, etc.?

This led me to the root question that all audio codecs share: What is sound and how do we take the noise that comes from our mouths and turn it into something that can be transported across an IP network?

Let's find out.

I'm Picking Up Good Vibrations

Simply put, sound is vibration on our eardrums. These vibrations can be as simple as the hum of a fan or as complicated as a symphony orchestra.

Vibrations have amplitude and frequency. Amplitude is a gauge of pressure change and is measured in decibels (dB). Frequency (oscillations per second) is denoted in Hertz (Hz).


Ultimately, these vibrations produce pressure changes in air molecules, and we measure these pressure changes in decibel sound pressure level (dB SPL). The lowest level of pressure change is known as the threshold of hearing (0 dB SPL). The highest level is called the threshold of pain (120 dB SPL). (Of course, Spinal Tap was able to push that to 121 dB SPL.)

At best, we humans can hear frequencies between 20 and 20,000 Hz. However, our ears are most sensitive to sounds that fall below 4000 Hz. This fact will become very important as I take you from analog sound to its digital representation.

From Analog to Digital

We live in an analog world of frequency and amplitude, but digital technology uses electrical energy to represent sound. To move from one to the other, we require devices known as transducers. The transducer used to convert mechanical pressure to electrical energy is known as a microphone, and the transducer that is used to convert electrical energy to mechanical pressure is called a speaker.

In the midst of all that, we have codecs. A codec (coder-decoder) defines both how analog sound is represented digitally and how that digital data is encapsulated for transport. Think of it as the digital language that enables us to transmit sound from one device to another (e.g. from an IP phone to a conference server).

All codecs share these same characteristics:

  • DACs/ADCs (digital-to-analog and analog-to-digital converters) are used to turn analog waveforms into ones and zeros. These devices sample the waveform at regular time intervals, and the rate at which samples are taken is known as the sampling frequency. The more samples that are taken, the better the representation of the waveform.
  • Quantization is the number of bits used to represent each sample. An 8-bit sample gives us 256 different values and a 16-bit sample gives us 65,536 values. More values equate to a better representation of sound. Only whole numbers can be used, so a sample value of 6.3455 must be rounded to 6. This rounding distorts the signal, producing quantization noise.
  • Since human ears are most sensitive to quiet sounds, greater emphasis is placed on encoding the "quiet zone" and less on the "loud zone."
  • Frame size describes the length of the audio segment that a codec processes. Typical frame sizes are 10, 20, and 30 milliseconds.
  • The Real-time Transport Protocol (RTP), User Datagram Protocol (UDP), and IP headers together add 40 bytes of overhead (12 + 8 + 20) to each packet of digitally encoded audio.
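To see how these characteristics translate into real network load, here is a quick sketch (plain Python, my own illustration rather than anything from a standard) that turns a codec's nominal bit rate and frame size into on-the-wire bandwidth, using the 40 bytes of per-packet header overhead from the list above:

```python
def rtp_bandwidth_bps(codec_bps: int, frame_ms: int) -> int:
    """On-the-wire bandwidth for one audio stream, counting the
    40 bytes of IP (20) + UDP (8) + RTP (12) header on every packet."""
    payload_bytes = codec_bps * frame_ms // 1000 // 8  # audio bytes per packet
    packets_per_second = 1000 // frame_ms
    return (payload_bytes + 40) * 8 * packets_per_second

# G.711 (64 Kbps) in 20 ms frames -> 80,000 bps on the wire
print(rtp_bandwidth_bps(64000, 20))
# G.729 (8 Kbps) in 20 ms frames  -> 24,000 bps on the wire
print(rtp_bandwidth_bps(8000, 20))
```

Notice how the fixed 40-byte header hits the low bit-rate codec much harder: overhead triples G.729's nominal rate, while it adds only 25% to G.711's.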
Do the Math

Years ago, two gentlemen by the names of Harry Nyquist and Claude Shannon determined that perfect waveform reconstruction (i.e. no signal loss) is possible when a signal is sampled at a rate at least twice its highest frequency component. This is known as the Nyquist Theorem (Shannon got the short end of the naming stick). For the human ear, a sampling rate of 40 kHz perfectly captures our hearing range of 0 to 20 kHz.

The Nyquist Theorem tells us that 8 kHz is an adequate sample frequency to capture the 0 to 4 kHz range that most speech falls into. Anything less than that creates a poor audio experience, and anything beyond 8 kHz is gravy (or as we will see in a moment, wideband audio).
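You can watch the Nyquist Theorem fail in miniature. The short Python sketch below (my own illustration) samples two cosine tones at telephony's 8 kHz rate: a 3 kHz tone inside the 4 kHz Nyquist limit and a 5 kHz tone outside it. The out-of-band tone "aliases" and produces exactly the same samples as the in-band one, which is why frequencies above half the sampling rate are simply lost:

```python
import math

SAMPLE_RATE = 8000  # Hz, the common narrowband telephony rate

def sample(freq_hz: float, n: int) -> list:
    """Take n samples of a cosine tone at freq_hz using SAMPLE_RATE."""
    return [math.cos(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]

# 3 kHz is below the 4 kHz Nyquist limit; 5 kHz is above it.
# Sampled at 8 kHz, the two tones are indistinguishable:
in_band = sample(3000, 16)
aliased = sample(5000, 16)
assert all(abs(a - b) < 1e-9 for a, b in zip(in_band, aliased))
```

Once the samples are identical, no decoder can tell which tone was played. That is the whole argument for matching the sampling rate to the frequency range you care about.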

Encoding / Decoding

There are essentially two ways of converting an analog waveform to its digital equivalent. One of the earliest methods was called waveform encoding. This technique attempts to efficiently encode a waveform for transmission and decode it for playback. The goal is that the decoded waveform looks as close as possible to the original. As I said earlier, quantization and its inherent rounding will add noise and distortion, but a properly encoded waveform will create an acceptable user experience.

Pulse Code Modulation (PCM) is a well-known form of waveform encoding. In the world of communications, there are two common forms of PCM: Mu-Law is used in North America and Japan, and A-Law is used just about everywhere else. The difference between them is the logarithmic scale used for sizing the distance between sampling steps.

G.711 is how we typically refer to PCMU (Pulse Code Modulation Mu-Law) and PCMA (Pulse Code Modulation A-Law). Both forms use 8 bits to represent each sample -- 8 bits/sample * 8000 samples/second = 64K bits/second.
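For the curious, the Mu-Law curve can be sketched in a few lines of Python. Note that this is the continuous companding formula only; real G.711 implementations use an 8-bit segmented approximation of this curve, so treat it as an illustration rather than a working encoder:

```python
import math

MU = 255  # the Mu-Law parameter used in North America and Japan

def mu_law_compress(x: float) -> float:
    """Continuous Mu-Law companding of a sample in [-1.0, 1.0]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
```

The compression step is what gives quiet sounds their extra emphasis: an input of 0.01 compresses to roughly 0.23 while 0.5 compresses to roughly 0.88, so the quiet end of the range gets a disproportionate share of the 256 available code values.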

The next broad technique for encoding a waveform is known as differential coding. While PCM does a good job of representing an analog waveform, the size of the packets it produces makes inefficient use of network bandwidth.

Instead of trying to make a digital copy of the waveform, differential coding predicts the next sample based on the previous sample. To compress the size of the IP packet, differential coding only stores the differences between the predicted sample and the actual sample. Differential coding is also referred to as predictive coding and is the basis of popular codecs such as G.729 and G.726.

This concept of predicting the waveform allows us to significantly reduce the number of bits transmitted between IP endpoints. For instance, G.729 requires only one-eighth the bits of G.711 (8 Kbps versus 64 Kbps).
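A toy differential coder makes the idea concrete. This Python sketch is mine, not G.726 or G.729 (the real codecs use adaptive predictors and far more sophisticated quantizers); it simply predicts each sample as the previous decoded sample and transmits the quantized difference:

```python
def dpcm_encode(samples: list, step: int = 16) -> list:
    """Toy DPCM: predict each sample as the previous decoded sample
    and transmit only the quantized difference (in units of step)."""
    codes, prediction = [], 0
    for s in samples:
        diff = round((s - prediction) / step)  # quantized residual
        codes.append(diff)
        prediction += diff * step              # mirror the decoder's state
    return codes

def dpcm_decode(codes: list, step: int = 16) -> list:
    out, prediction = [], 0
    for c in codes:
        prediction += c * step
        out.append(prediction)
    return out

samples = [0, 100, 250, 300, 280, 150]
decoded = dpcm_decode(dpcm_encode(samples))
# Reconstruction stays within half a quantization step of the original:
assert all(abs(a - b) <= 8 for a, b in zip(samples, decoded))
```

Because the encoder tracks the decoder's reconstruction rather than the raw input, the error never accumulates, and the transmitted differences are much smaller numbers than the samples themselves, which is exactly where the bandwidth savings come from.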


Vocoding (voice encoding) is used to more efficiently encode human speech by understanding exactly how speech is created. Vocoders act upon things such as vocal cord vibration, unvoiced sounds, plosive sounds (air pressure behind a closure in the vocal tract and then suddenly released), and many other physiological aspects of human speech. Vocoders create a mathematical model of the human larynx and reproduce sounds by emulating how the human body works.

Vocoders are great when the goal is to encode, transmit, and decode spoken conversations. However, they are nearly worthless when it comes to non-speech sounds. For instance, you can't effectively use a vocoder to transmit music since musical instruments don't produce sound in the same way as the human body.

The same can be said for Dual Tone Multi-Frequency, or DTMF. Touch tones do not sound like people no matter how high pitched and squeaky a voice might be.

A common example of a vocoder in communications is G.729. It has been optimized to encode and decode spoken conversation at the expense of sounds that fall outside the realm of words and sentences. That is why you need a protocol such as the one described by RFC 4733 (which obsoletes RFC 2833) to transmit DTMF and other communications-related tones.
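As a sketch of what RFC 4733 actually carries, here is the 4-byte payload body for a named telephone event in Python. This is just the payload; a real implementation wraps it in an RTP header using the negotiated telephone-event payload type, retransmits end packets, and handles timing, so treat it as an illustration of the format:

```python
import struct

# DTMF digits map to telephone-event codes: '0'-'9' are events 0-9,
# '*' is 10, and '#' is 11.
DTMF_EVENTS = {**{str(d): d for d in range(10)}, '*': 10, '#': 11}

def telephone_event_payload(digit: str, volume: int, duration: int,
                            end: bool = False) -> bytes:
    """Build the 4-byte payload body for a named telephone event.
    Layout: event (8 bits), E bit + reserved bit + volume (8 bits),
    duration in RTP timestamp units (16 bits, network byte order)."""
    second_byte = (0x80 if end else 0x00) | (volume & 0x3F)
    return struct.pack('!BBH', DTMF_EVENTS[digit], second_byte, duration)

# The final packet for digit '5' at volume 10, 800 timestamp units long:
payload = telephone_event_payload('5', volume=10, duration=800, end=True)
assert payload == b'\x05\x8a\x03\x20'
```

Since the digit travels as a named event rather than as audio, it arrives intact no matter how aggressively the voice codec mangles the actual tones.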

Since G.711 does not utilize vocoder technology, it can be used for DTMF transmission. Additionally, G.711 is an acceptable candidate for fax pass-through.

Caution, Wide Load

While 0 to 4000 Hz is fine for what we have come to call toll quality audio, our ears are capable of hearing more than that. These extra sounds that lie outside the "sweet spot" frequency range can make a good IP telephone call sound amazing.

This is where wideband audio codecs come into play. Traditional narrowband codecs (e.g. G.711 and G.729) focus on the frequency range of 300 Hz to 3.4 kHz, while wideband codecs (e.g. G.722) encode and decode frequencies from as low as 50 Hz up to about 7 kHz (or sometimes even higher). These additional frequencies fill out the sound of a conversation for a more satisfying user experience.

Since the compression techniques used by the newer wideband codecs produce a bit rate very similar to G.711, enterprises have started to use them whenever possible.

Mischief Managed

I hope you stuck with me because this is important stuff to know. If you are like me, you want to know why an 8 kHz sampling rate is so commonly used and why G.711 can be deployed for fax pass-through and G.729 cannot. While this knowledge may not come up at cocktail parties, it may be useful as you decide which codecs will be applied to particular devices, network regions, and use cases.

Now, if only someone can explain Einstein's Theory of Relativity in a way that even my feeble brain can comprehend...

Andrew Prokop writes about all things unified communications on his popular blog, SIP Adventures.


