Stass Soldatov, TrueConf CTO | June 02, 2014


Implementing SVC in WebRTC

A new technical approach can provide large-scale group WebRTC video conferencing despite WebRTC's lack of native support for Scalable Video Coding.

One of the main issues with WebRTC is the lack of Scalable Video Coding (SVC) support for group video conferences. The WebRTC standard does not include SVC, and without it, a session with multiple participants--especially one that includes mobile clients--requires the server to re-encode the same conference in several different formats. This massively reduces the capacity of a WebRTC server.

Let's look at the problems group video conferencing faces in real Internet situations, along with the possible solutions for these problems.

One-on-One Conferencing Structure and the Challenges of Group Video Conferences

The optimal way to achieve one-on-one video communication is to directly connect two clients. The clients just need to agree on the highest bit rate that both of their channels can sustain. For example, if one user has a 300 kbps channel and another has a 200 kbps channel, they need to choose the lower of the two speeds--200 kbps. This way, each user will send and receive 200 kbps for the duration of the call and they will have no problems.

The situation changes when more than two people want to connect. WebRTC allows them to connect directly with each other, but here we have a couple of issues.

Let's look at what happens with the channels in a group video conference. Each client will send the best resolution at the best possible speed. As a result, it is possible that one client will not be able to send 200 kbps to two users, because its upstream channel is only 200 kbps total. It can only send 100 kbps to each user, or 200 kbps to one and 0 kbps to the other. Both options are unacceptable for a full-scale group video conference.
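The arithmetic behind these two cases can be written down as a minimal Python sketch. The function names are hypothetical, and the model assumes that a client in a direct (mesh) conference splits its upstream evenly between the other participants:

```python
def one_on_one_rate_kbps(channel_a_kbps: float, channel_b_kbps: float) -> float:
    """Two directly connected clients settle on the slower of the two channels."""
    return min(channel_a_kbps, channel_b_kbps)


def mesh_per_peer_kbps(upstream_kbps: float, participants: int) -> float:
    """Upstream left for each remote peer when a client sends a separate
    copy of its video to every other participant (even split assumed)."""
    if participants < 2:
        raise ValueError("a call needs at least two participants")
    return upstream_kbps / (participants - 1)


print(one_on_one_rate_kbps(300, 200))  # two-party call from the text
print(mesh_per_peer_kbps(200, 3))      # three-party mesh from the text
```

With the numbers used above, the two-party call runs at 200 kbps, while the client with a 200 kbps upstream in a three-party mesh has only 100 kbps left for each peer.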

The optimal solution is to install an additional entity: a video conferencing server or an MCU (Multipoint Control Unit) in a centralized location with good communication channels. However, there is no concept of this kind of server in basic WebRTC. There is also no traditional model for peer-to-peer group videoconferencing support. Therefore, to sustain full-fledged group video conferencing in WebRTC, we need to create a WebRTC server that can receive multiple video streams from the clients and distribute them to other participants.

A server can receive 200 kbps and distribute 200 kbps to each client. It seems the problem is solved, but, in fact, it is not.

In the following example, we can see where the problem is: Client #1 has an outgoing channel of 200 kbps and could send 200 kbps. Client #2, with a channel of 500 kbps, may be more than capable of receiving 200 kbps. But what is left for client #3? If client #1 only sends 100 kbps because client #3 can only accommodate a maximum of 100 kbps, then client #2 will wind up only receiving 100 kbps, too!

How can we give each client the best possible quality? The answer is: The server must be able to control each data stream.
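The difference between a server that simply relays one rate to everyone and a server that controls each stream can be sketched in a few lines of Python. The function names and the dictionary of downstream capacities are assumptions made for illustration; the numbers are the ones from the example above:

```python
def uniform_forwarding(sender_up_kbps, receiver_down_kbps):
    """Server relays a single rate: the sender must drop to what the
    weakest receiver can take, and every receiver gets that rate."""
    rate = min(sender_up_kbps, min(receiver_down_kbps.values()))
    return {name: rate for name in receiver_down_kbps}


def per_receiver_forwarding(sender_up_kbps, receiver_down_kbps):
    """Server controls each stream: the sender transmits at full rate and
    every receiver gets as much of it as its own channel allows."""
    return {name: min(sender_up_kbps, down)
            for name, down in receiver_down_kbps.items()}


receivers = {"client2": 500, "client3": 100}  # downstream capacity, kbps
print(uniform_forwarding(200, receivers))       # {'client2': 100, 'client3': 100}
print(per_receiver_forwarding(200, receivers))  # {'client2': 200, 'client3': 100}
```

In the second case client #2 is no longer penalized for client #3's slow channel, which is exactly the behavior the server needs to provide.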

MCU Transcoding vs SVC Technology
The classical approach to mitigating the problem of unequal communication channels is to transcode video streams on the server.

The most common example of this approach is using an MCU, which requires video stream transcoding for each layout and bit rate. This means that in real network conditions, each endpoint will require a separate transcoder. This, in turn, requires a lot of processing power. In the end, this makes the MCU-server option quite costly, and in the era of cloud services, this approach is unreasonably expensive.

The modern approach is to use Scalable Video Coding. SVC is a technique which allows a client to flexibly adjust an encoded video stream without re-coding. In other words, it cuts parts of the encoded stream, lowering bandwidth consumption while preserving the highest available quality for each user.

SVC consists of three forms of scalability: spatial, temporal and quality scalability. The first form--spatial--allows a client to select different video resolutions; temporal adjusts frame rate; and quality scalability allows the client to adjust image quality. Like a classic MCU, SVC changes video characteristics such as the frame rate (frames per second, or fps) and the resolution. Unlike an MCU, however, it does so without any video transcoding on the server.

This feature of SVC makes it possible to conduct a large number of group conferences on a regular server, whereas video encoding on an MCU requires considerable computing power that directly affects its cost. Because of this flexibility, SVC has changed the world of video conferencing over the past couple of years.

With SVC on a server, you can receive data at the best possible rate and then adjust bandwidth individually for each conference participant, eliminating the problem in which one channel affects the video quality for other users.

For SVC to work properly, the party that sends the video stream must use a video codec that supports this technology, and the video conferencing server must be able to work with such video streams. The problem with WebRTC is that it has neither: There is no concept of a server, as we have already mentioned above; and the VP8 video codec used in WebRTC does not have a full SVC extension.

The task is complicated by the fact that we cannot fully control the way in which clients (i.e. browsers) encode the video stream; therefore, for now, SVC for WebRTC must be implemented entirely on the server.

Despite the complexity of this situation, there is a solution that many people are not aware of. It turns out that the VP8 video codec supports the temporal scalability part of SVC: In other words, it can change the fps rate without re-encoding the stream.

How can this be used? If a VP8 stream is encoded with temporal scalability, the stream is still normal VP8 both before and after "thinning," and can be decoded by clients that are not aware of SVC, like WebRTC browsers. But how can we use the temporal scalability feature of VP8 when web browsers are not able to produce such a stream themselves? To do this, we have to re-encode the stream on the video conferencing server after receiving it from the client browser.
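The thinning step can be illustrated with a minimal Python sketch. It does not parse a real VP8 bitstream: `Frame`, `tag_frames`, `thin` and the repeating layer pattern 0, 2, 1, 2 are assumptions made for illustration. The idea it shows is that every frame carries a temporal layer ID, frames depend only on frames in the same or lower layers, and the server can therefore drop the upper layers without re-encoding anything:

```python
from dataclasses import dataclass

# Assumed three-layer assignment that repeats every four frames.
PATTERN = (0, 2, 1, 2)


@dataclass(frozen=True)
class Frame:
    index: int            # position in the original stream
    temporal_layer: int   # 0 = base layer, higher = enhancement


def tag_frames(count: int) -> list:
    """Label `count` consecutive frames with their temporal layer."""
    return [Frame(i, PATTERN[i % len(PATTERN)]) for i in range(count)]


def thin(frames: list, max_layer: int) -> list:
    """Keep only frames whose layer is <= max_layer (no re-encoding)."""
    return [f for f in frames if f.temporal_layer <= max_layer]


frames = tag_frames(32)
for max_layer in (2, 1, 0):
    print(max_layer, len(thin(frames, max_layer)))
```

Under this assumed pattern, 32 input frames become 32, 16 or 8 output frames, so each dropped layer halves the frame rate.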

Another very important feature, which is present in the SVC specification for H.264 but is absent in the VP8 codec, is called spatial scalability. It allows a client to change video resolution without transcoding, in addition to changing the fps rate. This capability is valuable for the effective use of SVC: Temporal scalability alone, without the accompanying spatial scalability, offers a much narrower range of bandwidth options that can be derived from a single stream.

Despite the fact that this option is not available in the VP8 video codec standard, it can be developed and implemented independently thanks to the codec's open architecture.
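A small Python sketch shows why combining the two forms widens the choice. The resolutions and frame rates below are made-up example values, and `operating_points` is a hypothetical helper; the point is only that the number of versions obtainable from one stream is the product of the spatial and temporal layer counts:

```python
from itertools import product


def operating_points(resolutions, frame_rates):
    """Every (width, height, fps) combination obtainable from one stream."""
    return [(w, h, fps) for (w, h), fps in product(resolutions, frame_rates)]


frame_rates = [7.5, 15, 30]  # three temporal layers (assumed values)

temporal_only = operating_points([(640, 360)], frame_rates)
spatial_and_temporal = operating_points(
    [(320, 180), (640, 360), (1280, 720)], frame_rates)

print(len(temporal_only))         # 3
print(len(spatial_and_temporal))  # 9
```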

The Solution
To sum up all of the above, we can formulate an approach for carriers and cloud service providers who want to offer full-scale group WebRTC video conferencing to their customers without going bust buying expensive infrastructure and MCUs.

It requires a video conferencing server, which performs the following tasks:

● Recodes regular incoming VP8 video streams from the browser into VP8 SVC
● Reduces the bandwidth of the video streams using SVC

Also, real browsers currently have problems with receiving multiple video streams (more than 4-6) and correctly estimating their downstream channels, so the video server should be able to:

● Create a group video conferencing canvas by mixing and regulating video streams.
● Monitor each client's parameters, its channel and screen resolution (desktop or mobile), and automatically "thin out" the mixed group video conference before sending it on to the WebRTC browser, which results in a normal VP8 stream with the desired characteristics. This can be done efficiently by encoding the mixed server stream as VP8 SVC and then applying SVC thinning to it for each client.
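The per-client decision in the last step can be sketched in Python. This is an illustration under stated assumptions, not a description of any particular server: `Layer`, `LADDER`, `pick_layer` and all of the bit rates are hypothetical. The server keeps a list of the versions it can cut from its single SVC encoding and picks the richest one that fits both the client's downstream channel and its screen:

```python
from typing import NamedTuple, Optional


class Layer(NamedTuple):
    height: int   # vertical resolution in pixels
    fps: float    # frame rate after temporal thinning
    kbps: int     # bandwidth needed for this version


# Made-up example ladder; real values depend on the encoder and content.
LADDER = [
    Layer(180, 7.5, 80),
    Layer(180, 15, 120),
    Layer(360, 15, 250),
    Layer(360, 30, 400),
    Layer(720, 30, 1200),
]


def pick_layer(ladder, downlink_kbps: float,
               screen_height: int) -> Optional[Layer]:
    """Richest version that fits the client's channel and screen, if any."""
    fitting = [layer for layer in ladder
               if layer.kbps <= downlink_kbps
               and layer.height <= screen_height]
    return max(fitting, key=lambda layer: layer.kbps) if fitting else None


print(pick_layer(LADDER, 500, 1080))   # desktop on a 500 kbps channel
print(pick_layer(LADDER, 2000, 360))   # small screen on a fast channel
```

Because the choice is made per client, a slow mobile participant and a well-connected desktop participant can be served from the same single encoding.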


As a result, each participant of a group video conference in WebRTC sends the maximum possible stream, and, in turn, receives the maximum that they are able to receive. At the same time, the video conferencing server does exactly one encoding, which is not dependent on the number of clients connected to the conference. This approach allows us to use SVC and WebRTC together and to increase the capacity of the video conferencing server.

In the long term, we believe that Google's cooperation with Vidyo to develop SVC for VP9 is a positive development for the industry. This agreement could significantly change the future of WebRTC technology and cut down SVC overhead costs.

Stass Soldatov is CTO of TrueConf.

