
Testing Audio Quality in VoIP Apps: Our Approach and Best Practices


Once a novelty, Voice over IP (VoIP) is now a basic requirement for any social application. If it is not included from the start, it will be added later, once the application gains popularity.

Our need for VoIP testing came early, with one of our first clients, whose application had VoIP calls integrated. We started with network traffic analysis, capturing the UDP/RTP data stream with Wireshark. While the concept is very simple, quite a lot can be learned from looking at the data going to and from the computer. Not only were we able to see the bandwidth used by the calls, we could also see the average packet size and the number of packets sent per second. We also noticed that data consumption correlated with the volume of the audio being sent (which, of course, we have to keep in mind when designing any test). There was one exception, however.

The exception was an application that must have used a custom solution for voice calling (this was 5 years ago, so the codec is probably no longer in use). When checking other applications that were popular at the time, we saw a pattern: the packet count was between 20 and 40 per second, data consumption didn’t vary by more than 50% across applications, and, as mentioned before, packets varied in size and correlated with the amount of sound sent. In this one application, however, the data stream consisted of same-sized packets sent every 0.1 seconds, a rate two times lower than the second lowest. In our subjective evaluation, its sound quality was worse than that of the other apps we tried, and it used narrowband audio, so the codec might not be from this millennium.
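The traffic metrics mentioned above can be derived from a list of captured packets. The following is a minimal Python sketch, assuming the capture has already been exported as timestamps and packet sizes; the `Packet` type and both function names are hypothetical and not part of Wireshark or of our tooling.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    timestamp: float  # capture time in seconds
    size: int         # packet size in bytes


def stream_stats(packets, duration):
    """Packet rate, average packet size and bandwidth for one capture window.

    `duration` is the length of the capture window in seconds.
    """
    if not packets or duration <= 0:
        raise ValueError("need at least one packet and a positive duration")
    total_bytes = sum(p.size for p in packets)
    return {
        "packets_per_second": len(packets) / duration,
        "avg_packet_size": total_bytes / len(packets),
        "kbps": total_bytes * 8 / duration / 1000,
    }


def has_constant_size(packets):
    """True if every packet has the same size, as in the outlier application."""
    return len({p.size for p in packets}) == 1


# Made-up example: one second of 100-byte packets sent every 0.1 seconds.
capture = [Packet(timestamp=i * 0.1, size=100) for i in range(10)]
stats = stream_stats(capture, duration=1.0)
# stats["packets_per_second"] is 10.0 and stats["kbps"] is 8.0
```

A stream that reports `has_constant_size(...)` as true while other applications vary with the audio volume would stand out in the same way the exception described above did.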

Trace analysis also helped us test one feature of the VoIP service we were testing: RTP packet duplication when abnormally high packet loss is detected. In short, when packet loss was detected, the app switched from sending plain UDP packets to RTP (which typically runs on top of UDP) and sent each packet twice. As RTP packets are numbered, the app (and we) could recognize duplicate packets and drop them, which increased the percentage of good packets and reduced the effective packet loss.
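The effect of duplication can be illustrated with a short Python sketch. It assumes the RTP sequence numbers have already been extracted from a trace in arrival order; the function names are hypothetical, and the wraparound of the 16-bit RTP sequence number is ignored for brevity.

```python
def drop_duplicates(sequence_numbers):
    """Keep the first copy of each sequence number, in arrival order."""
    seen = set()
    unique = []
    for seq in sequence_numbers:
        if seq not in seen:
            seen.add(seq)
            unique.append(seq)
    return unique


def loss_percent(received, expected_count):
    """Percentage of expected packets for which no copy arrived."""
    if expected_count <= 0:
        raise ValueError("expected_count must be positive")
    missing = expected_count - len(set(received))
    return 100.0 * missing / expected_count


# Made-up example: packets 0..9 were each sent twice and some copies were lost.
arrived = [0, 0, 1, 2, 2, 4, 5, 5, 6, 7, 9, 9]
unique = drop_duplicates(arrived)            # [0, 1, 2, 4, 5, 6, 7, 9]
effective_loss = loss_percent(arrived, 10)   # 20.0, only packets 3 and 8 are gone
```

A packet counts as lost only when both copies are dropped, which is why the effective loss after deduplication is lower than the raw loss on the network.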

What are the components of call quality?

Although checking network traffic was interesting, it sadly had little to offer, beyond bandwidth usage statistics, in terms of improving the service. We had to look deeper into call quality metrics, and what we found was really basic: there are essentially two separate components of call quality, call delay and audio quality. We started with the metric that is easiest to quantify: delay.

The call delay, or call audio delay, is the time it takes for audio to travel from one device to another. Many things affect the delay, such as the audio codec, the network, the hardware and the infrastructure of the voice service. In the literature, the requirement for high-quality VoIP calls is a delay below 200 ms. In reality, for consumer applications on mobile devices, delay rarely drops as low as 300 ms. 500 ms can be considered the limit at which developers should optimize the service.

The first tests were very simple: we put two devices (device A and device B) in a call, while a third device physically recorded the call. We put device A on speaker and muted its microphone. Then we made short, sharp sounds into the microphone of device B and recorded both the original signal and the signal received on the callee side. After that, we used audio processing software (Audacity in our case) to measure the time between the two signals, which gave us an end-to-end delay measurement. By repeating the measurement multiple times in a single call, we can observe how the delay varies over time.
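We did this measurement by hand in Audacity, but the same idea can be sketched in Python. The example below assumes the original and received signals are available as lists of samples normalized to the range -1.0 to 1.0 and that the sharp sound is the first thing to cross a chosen threshold; the function names and the threshold value are illustrative assumptions.

```python
def first_onset(samples, threshold):
    """Index of the first sample whose absolute value reaches the threshold."""
    for index, value in enumerate(samples):
        if abs(value) >= threshold:
            return index
    return None


def delay_ms(original, received, sample_rate, threshold=0.5):
    """Delay between the onsets of the two signals, in milliseconds.

    Returns None if the sound cannot be found in one of the signals.
    """
    start = first_onset(original, threshold)
    arrival = first_onset(received, threshold)
    if start is None or arrival is None:
        return None
    return (arrival - start) * 1000.0 / sample_rate


# Made-up example at 8000 Hz: a click at sample 800 arrives at sample 3200.
rate = 8000
original = [0.0] * rate
received = [0.0] * rate
original[800] = 1.0
received[3200] = 0.9
measured = delay_ms(original, received, rate)  # 300.0
```

A simple threshold works for short, sharp sounds on a quiet background; noisier recordings would likely need a more robust method such as cross-correlation.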

Results from audio delay test shown in graph

Measuring audio quality is a lot more difficult. Without going into too much detail about annoying our colleagues with blind audio comparison tests, we concluded that we needed a voice quality assessment algorithm. There is not a lot of competition among such tools, so we chose the PEXQ tool by Opticom, which compares audio samples using the PESQ and POLQA algorithms. On top of these algorithms we built our audio quality testing solution.

We built an automated tool for audio quality testing

The principle of this tool is simple: you feed the original and the degraded audio sample into the tool and set the audio sample range to narrowband (sample rate up to 8 kHz, used by PSTN calls and older services) or wideband (sample rate above 8 kHz, used by modern HD audio codecs). The tool outputs a MOS (mean opinion score) for the audio, between 1 and 5 (from 1 for unacceptable to 5 for excellent). The tool also returns the delay if it can be calculated, that is, when the samples are not synced and the delay between them is preserved.

The tool has some limitations: the analyzed sample is quite short, so the test has to be run multiple times. For that we use our own solution, which can:

  • play back and record audio,
  • split the recorded file into channels,
  • feed the sample files into the evaluation tool,
  • display or gather results from PEXQ,
  • change network conditions if necessary.

Our solution runs the test multiple times within one call. We usually do 5 tests, but if we detect that the results are not stable, we repeat the test enough times to see the deviation in the data and report the findings.
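The retry logic can be sketched roughly as follows. This is an illustration rather than our actual implementation: `run_test` stands for any callable that runs one measurement and returns a MOS score, and the `max_runs` and `max_stdev` values are assumptions chosen for the example.

```python
import statistics


def run_until_stable(run_test, min_runs=5, max_runs=15, max_stdev=0.2):
    """Run a test at least `min_runs` times, adding runs while results are unstable.

    Results count as stable when the sample standard deviation of the
    scores is at most `max_stdev`.
    """
    if min_runs < 2:
        raise ValueError("at least two runs are needed to measure deviation")
    scores = [run_test() for _ in range(min_runs)]
    while statistics.stdev(scores) > max_stdev and len(scores) < max_runs:
        scores.append(run_test())
    deviation = statistics.stdev(scores)
    return {
        "scores": scores,
        "mean": statistics.mean(scores),
        "stdev": deviation,
        "stable": deviation <= max_stdev,
    }
```

When the result is still unstable after `max_runs`, the deviation itself is worth reporting alongside the mean.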

The hardware setup

We have our own setup for connecting devices. Its purpose is to ensure that the original and degraded samples have the correct offset relative to each other. That way, we don’t have to care about any delay introduced by the recording or playback hardware.

We split the playback audio into two channels: one channel is fed into the microphone of the audio sending device, while the other is looped back directly into the microphone input of the recording device (a single channel only). Audio from the first channel travels through the voice service to the receiving device, and from its speaker output straight into the second channel of the recording device. As a result, we get a single audio file that we split into two channels for original and degraded audio analysis.
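Splitting the recording into its two channels can be sketched with Python's standard `wave` module. The example assumes an uncompressed two-channel PCM WAV file in which the first channel holds the looped-back original and the second holds the degraded audio; the function name and that channel order are assumptions for illustration.

```python
import io
import wave


def split_stereo(wav_bytes):
    """Split a two-channel PCM WAV into separate raw sample streams.

    Returns (original, degraded, sample_rate, sample_width), where the
    first two items are the raw bytes of channel 0 and channel 1.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getnchannels() != 2:
            raise ValueError("expected a two-channel recording")
        width = wav.getsampwidth()
        rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())

    # Each frame holds one sample per channel, stored back to back.
    frame_size = 2 * width
    original = bytearray()
    degraded = bytearray()
    for offset in range(0, len(frames), frame_size):
        original += frames[offset:offset + width]
        degraded += frames[offset + width:offset + frame_size]
    return bytes(original), bytes(degraded), rate, width
```

Because both channels come from the same recording, they share one clock, so the offset between them reflects only the delay of the call path.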

Our audio quality testing hardware setup

Another very important thing is network conditioning. Besides a standard network, we use the following network conditions for tests (customizable depending on needs):

  • LTE network – as mobile network is guaranteed, use P2P connections for calls against Wi-Fi network.
  • 50 ms jitter and 100 ms delay – while excessive for normal real-life conditions, jitter handling is required of any voice application regardless of the amount of jitter. The network delay is added in order to be able to imitate the jitter.
  • 5% and 10% packet loss – the ability to recover from lost data is an important part of any modern codec. 5% packet loss should result in negligible quality loss for a call, while 10% loss is normally much more noticeable and is a good benchmark for checking a codec’s resistance to a bad network. The values can be increased for deeper investigation.
  • 50 kbps bandwidth limitation – from our measurements, a wideband VoIP call consumes around 40 kbps of bandwidth. A 50 kbps limit is only barely enough for unaffected voice quality and is a challenge for any voice application that sends unnecessary background data. The limit can be reduced further to find the threshold at which the service still functions.

Results of our audio quality testing solution

The hardware setup and the evaluation tool enable us to evaluate and compare the performance of any VoIP service, or any service that sends and receives audio, as long as the device has a 3.5 mm audio jack. If we add network conditioning and different devices and operating systems, we can check all the different scenarios in which a service may have problems. Here are some example results of checking audio quality and delay under different network conditions.

Results of audio testing (MOS score)

From these results we can instantly see that Application2 has problems with jitter handling: its call quality drops significantly, while its delay increases only by the network delay introduced to generate the jitter. Additionally, Application1 starts struggling because of a delay increase under the jitter and 10% loss conditions.

Results of audio testing (delay)

We can do these tests for any voice application. These tools become even more powerful when we include the process in a continuous integration system. If we add an automation framework, we can launch automated audio tests every time the application is built and automatically check its voice quality to catch any regressions early.
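A regression check in such a pipeline could look something like the Python sketch below. It assumes MOS scores per network condition are available for a baseline build and for the current build; the function name, the condition labels, the scores and the allowed drop of 0.3 are all made up for illustration.

```python
def find_regressions(baseline, current, max_mos_drop=0.3):
    """Return the network conditions where the MOS score regressed.

    A condition is flagged when its score is missing from `current` or
    when it dropped by more than `max_mos_drop` compared to `baseline`.
    """
    regressions = []
    for condition, baseline_mos in baseline.items():
        current_mos = current.get(condition)
        if current_mos is None or baseline_mos - current_mos > max_mos_drop:
            regressions.append(condition)
    return regressions


# Made-up scores for two builds of the same application.
baseline = {"no limit": 4.2, "10% loss": 3.5}
current = {"no limit": 4.1, "10% loss": 2.9}
failed = find_regressions(baseline, current)  # ["10% loss"]
```

A build step can then fail whenever the returned list is not empty, so a quality drop is noticed with the build that introduced it.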

What are the benefits of audio quality testing?

Nowadays voice call functionality is very common, and thus the competition is fierce. This means that you should think about the quality of your application early in the development process.

For example, audio quality testing can help you:

  • Decide on the codecs and technologies used for your VoIP solution,
  • Fine-tune the codec settings to serve your application’s needs,
  • Make sure that the voice service performs well, even under the worst conditions (bad network, old devices),
  • Continuously check for any regressions that may appear,
  • See how you compare with competitors when it comes to voice call quality metrics.

Please let us know if you have questions regarding testing the quality of your application. Contact us if you need help!
