[Freeswitch-users] Proper prompt gain/level

Mon Jun 27 00:31:46 MSD 2011

As part of creating prompts for my IVRs, I've tried to match the audio level of my prompts to that of the stock English prompts. During this process, I noticed that the English Callie prompts are recorded extremely low (peak level around -16 dB). I have the Cepstral Callie voice, and I must set the Cepstral volume to about 50% in order to match the level of the English prompts. In conferences and other situations where prompts are played over conversations, the prompts are noticeably quiet.

I can, of course, renormalize the prompts up to a peak of -10 dB or more with a sound editor. However, I wonder if there is a better way to change the level of the prompts, or if there is a good reason for the prompts to be encoded at such a low level.
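For reference, the renormalization itself is simple enough to script instead of doing it by hand in a sound editor. The following is only a rough Python sketch using the standard library: the function names (`peak_dbfs`, `normalize_peak`) and the sample values are made up for illustration, and it operates on a plain list of signed 16-bit samples rather than on any particular file format.

```python
import math

FULL_SCALE = 32768  # magnitude of the most negative signed 16-bit sample


def peak_dbfs(samples):
    """Return the peak level of signed 16-bit samples in dB relative to full scale."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return float("-inf")  # digital silence has no finite level
    return 20 * math.log10(peak / FULL_SCALE)


def normalize_peak(samples, target_db=-10.0):
    """Scale samples so the loudest one sits at target_db, clamped to 16-bit range."""
    current = peak_dbfs(samples)
    if current == float("-inf"):
        return list(samples)  # nothing to scale
    gain = 10 ** ((target_db - current) / 20)  # dB difference -> linear factor
    return [max(-32768, min(32767, round(s * gain))) for s in samples]


# Hypothetical samples: a peak of 5193 is about -16 dB relative to full scale.
quiet = [0, 2600, -5193, 1300]
louder = normalize_peak(quiet, target_db=-10.0)
print(round(peak_dbfs(quiet), 1), round(peak_dbfs(louder), 1))
```

Note that this only rescales the samples that are already in the file. It makes the prompt louder, but it cannot restore resolution that was never recorded.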

I haven't considered all of the implications yet, but I'm fairly sure that encoding the prompts this quietly is not the best approach, even if it is desirable for the prompts to play quietly on a call. Each 6 dB reduction in gain halves the signal amplitude, so one fewer significant bit is used for storing the audio. In a 16-bit file, a peak level of -16 dB means that only about the 14 least significant bits are actually used for encoding the audio. This reduces the dynamic range, but the difference isn't really noticeable as long as the data remains 16-bit.

The problem comes when the audio is converted to a different bit depth. For example, most quick-and-dirty routines for converting 16-bit audio to 8-bit audio simply chop off the 8 least significant bits. Therefore, when the prompts are converted to 8-bit audio for use by most of the narrowband codecs, only about 6 bits carry audio. If the volume of the channel is then increased, those 6 bits are shifted up, and the quantization errors at the bottom become louder along with them. In the worst case, boosting the gain of the prompts on an 8-bit channel to full loudness would leave the noise floor (the level of the quantization crackle) only about 36 dB below the prompts' peak, by the same 6 dB-per-bit rule of thumb. That is roughly 12 to 16 dB noisier than the prompts would be had they been normalized to full scale before conversion.
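To make the bit arithmetic concrete, here is a small Python sketch of it. It assumes plain linear samples and the naive truncating conversion described above. The helper names and the example peak value of 5193 (roughly -16 dB in a 16-bit file) are my own illustration, not anything taken from FreeSWITCH.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB per bit of amplitude


def bits_unused(peak_db):
    """Bits of headroom left empty when the peak sits peak_db below full scale."""
    return -peak_db / DB_PER_BIT


def truncate_to_8_bit(sample_16):
    """Naive 16-bit to 8-bit conversion: drop the 8 least significant bits."""
    return sample_16 >> 8


def significant_bits(peak_sample):
    """Magnitude bits plus one sign bit needed to hold +/- peak_sample."""
    return abs(peak_sample).bit_length() + 1


peak_16 = 5193                       # hypothetical peak, roughly -16 dB full scale
peak_8 = truncate_to_8_bit(peak_16)  # the same peak after naive truncation
print(round(bits_unused(-16.0), 2))  # fractional bits lost to the low level
print(significant_bits(peak_16), significant_bits(peak_8))
```

With this example peak, the sketch reports 14 bits in use before truncation and 6 after, which lines up with the estimate above. The fractional figure from `bits_unused` shows that -16 dB actually costs a little more than two and a half bits, so 14 is a slightly generous rounding.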

Anyway, regardless of these observations, is there a reason why the prompts must be recorded so quietly? If I'd like them louder (without increasing the gain of the entire channel), is there a way to do that other than running them through a tool to renormalize their peak levels?

Bryan