[Freeswitch-users] Proper prompt gain/level
bryansmart at bryansmart.com
Tue Jun 28 07:04:07 MSD 2011
I think that dBm0 only applies if we are measuring the power on an analog circuit, or at the d/a point of a digital circuit.
I was analysing a digitally encoded audio file, so the measurement is in dBFS, and 0 dBFS is the point where clipping takes place. Anything below 0 dBFS does not clip, even though it might seem loud to someone.
I stated that the peak power of the stock prompts is typically -15 to -16 dBFS, and the short-term RMS power is about -32 dBFS.
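For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of how peak and RMS levels in dBFS can be computed from raw samples. It assumes 16-bit signed PCM and references both figures to the full-scale sample value; some tools reference RMS to a full-scale sine instead, which reads about 3 dB higher. The function names and the synthetic test tone are illustrative only, not taken from any particular analysis tool.

```python
import math

FULL_SCALE = 32768.0  # assumption: 16-bit signed PCM

def peak_dbfs(samples):
    """Highest absolute sample value, in dB relative to full scale."""
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(peak / FULL_SCALE) if peak else float("-inf")

def rms_dbfs(samples):
    """Root-mean-square level, in dB relative to full scale."""
    mean_square = sum(s * s for s in samples) / len(samples)
    rms = math.sqrt(mean_square)
    return 20 * math.log10(rms / FULL_SCALE) if rms else float("-inf")

# Synthetic stand-in for a prompt: one second of a 1004 Hz sine at 8 kHz,
# with its peak at about -16 dBFS.
amp = FULL_SCALE * 10 ** (-16 / 20)
tone = [int(amp * math.sin(2 * math.pi * 1004 * i / 8000)) for i in range(8000)]

print(round(peak_dbfs(tone), 1), round(rms_dbfs(tone), 1))
```

A pure sine has only about 3 dB between peak and RMS under this convention. Speech has a much larger gap, which is why the prompts show roughly 16 dB between the two figures.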
I don't know how to properly evaluate how dBFS will convert to dBm0. I gather that the codec and the D/A converter attenuate the level to some extent, but I don't know how much. If I have a digital file with a 1 kHz test tone at -10 dBFS, and play that over the PSTN, what do I get out in dBm0?
Speaking just in terms of dBFS, -16 dB is only about 16% of the full-scale amplitude available before the audio reaches 0 dB and clips. Perhaps when -16 dBFS is put out over the PSTN, it is much closer to clipping than it would be in an entirely digital domain. I don't know enough to make that determination. Do you know?
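The percentage figures in this thread can be checked by converting decibels to linear ratios. A minimal sketch follows (the helper names are illustrative). Note that amplitude ratios use a factor of 20 and power ratios a factor of 10, so -16 dB is about 16% in amplitude terms but only about 2.5% in power terms, and -6 dB is about 50% in amplitude terms.

```python
def db_to_amplitude_ratio(db):
    """Linear amplitude (voltage or sample value) ratio for a level in dB."""
    return 10 ** (db / 20)

def db_to_power_ratio(db):
    """Linear power ratio for a level in dB."""
    return 10 ** (db / 10)

for level in (-6, -16):
    print(level,
          round(db_to_amplitude_ratio(level), 3),
          round(db_to_power_ratio(level), 3))
```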
I'm not for clipping and distortion. However, too little power can cause another problem: limited dynamic range and increased dithering artifacts. If audio is quiet on a PSTN phone, then the person with the phone might be able to increase the level by turning up the phone's volume, if it has one. However, that raises the noise floor. Companding might make 8-bit channels sound a bit like 14-bit or 15-bit channels in terms of a low noise floor and decreased dithering artifacts, but it is still just an 8-bit channel. Companding hides most of the dithering artifacts in strong signals, but it magnifies them in weak signals. That's why G.711 is fairly clear, but will sound scratchy if you put faint signals into it and try to amplify them back up to normal levels. Over G.711 or any PSTN call, as you decrease the level of the audio going across, the scratchy dithering static obscures an increasing amount of the audio.
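How quickly an 8-bit companded channel degrades as the level drops can be estimated numerically. The sketch below is only an approximation: it uses the continuous mu-law curve with 8-bit quantization rather than the segmented tables that real G.711 uses, and the 1004 Hz tone, sample count, and function names are my own assumptions. It measures the signal-to-quantization-noise ratio of a sine at several peak levels.

```python
import math

MU = 255.0
LOG1P_MU = math.log1p(MU)

def mulaw_roundtrip(x):
    """Compress x (in -1..1) with the continuous mu-law curve, quantize the
    magnitude to 7 bits (8 bits with sign), then expand it again."""
    sign = -1.0 if x < 0 else 1.0
    compressed = math.log1p(MU * abs(x)) / LOG1P_MU   # 0..1
    code = round(compressed * 127)                     # 7-bit magnitude
    return sign * math.expm1(code / 127 * LOG1P_MU) / MU

def snr_db(peak_dbfs, freq=1004.0, rate=8000, n=8000):
    """Signal-to-quantization-noise ratio for a sine with the given peak."""
    amp = 10 ** (peak_dbfs / 20)
    signal_energy = noise_energy = 0.0
    for i in range(n):
        x = amp * math.sin(2 * math.pi * freq * i / rate)
        err = mulaw_roundtrip(x) - x
        signal_energy += x * x
        noise_energy += err * err
    return 10 * math.log10(signal_energy / noise_energy)

for level in (-10, -30, -50, -60):
    print(level, round(snr_db(level), 1))
```

Under these assumptions the ratio stays roughly constant over the upper part of the range and only falls away once the signal gets down toward the smallest quantization steps, so how much a given prompt level suffers depends on where it sits on that curve.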
This gets worse if you compound it by stacking codecs. Consider if someone calls your IVR from a cell phone, and you play quiet prompts to them. The audio is first passed through G.711, where the low gain means that scratchy dithering artifacts are added. Then, it is encoded to GSM for the cell phone. GSM uses linear predictive coding, and, being tuned for voice, it is not optimized for having smooth waveforms interrupted periodically with random excursions. That's part of why cell phone calls sound so bad if there is lots of background noise.
Anyway, clipping is to be avoided, but simply reducing levels dramatically creates other quality problems on a channel that uses companding. You trade distorted audio for scratchy audio. Same thing happened with cassette tapes that used Dolby noise reduction.
Maybe you can help clear up my understanding of how dBFS in a digital file will work out in dBm0 on a PSTN line. As things stand, though, I think we have lots of room to increase the level of the prompts before we reach a point where clipping is an issue.
On Jun 27, 2011, at 9:31 PM, Steve Underwood wrote:
> On 06/28/2011 04:00 AM, Bryan Smart wrote:
>> I don't want to maximize peaks all the way to 0DB. I just wondered why a low level like -16DB was used. When I'd previously created prompts on Asterisk systems, I used -6DB as a peak (50% of max gain for the channel).
> The peaks of voice are generally something like 12 to 13dB above the short
> term RMS value. If the -16dB value you quoted is -16dBOv, it's 13dB below
> clipping. Your peaks should be close to clipping at -16dB.
>> I wasn't aware of any regulations about prompt level. If there is a standardized level, that is what I'd like to use. Perhaps the louder systems are disregarding standards? Does anyone have links to such info? I've been unable to find anything definitive, only opinions.
> In many places the regulation says the power on a PSTN line should not
> exceed -13dBm0. That's why -13dBm0 is the target power level for most
> PSTN modems.
>> Listen to TellMe (+1-800-555-8355). It's at least 3X the gain of the default FS prompts. Is TellMe in error? I've called them through both FS and Asterisk, using Vitelity and Callcentric, so I'm fairly sure that I'm not being misled by a switch or ITSP boosting the gain. I always felt their level was strong and intelligible, without sounding overwhelming.
> Not everyone bothers to obey regulations these days, and many people do
> love to blast sound into an overloaded highly distorted mess, because
> volume is king. It's not good for clarity, though. What the loudest
> people do is no measure of good engineering. Sadly, this behaviour might
> make people set their levels around the excessively loud signals, so a
> properly adjusted signal sounds too quiet. In the early days of my FAX
> modem work I received many recordings from people who could not get
> reliable results, where the audio was perpetually in clipping, and any
> speech through the channel would have sounded awful. They would
> generally insist that voice was "perfect" on their system. There is a
> serious lack of engineering in most VoIP work.
>> One point that caught my attention, though, is that you said -16DB for both average and peak power. Average and peak power don't come out the same, though.
>> So that we can talk about something concrete, consider conference/32000/conf-enter_conf_pin.wav. Its peak power is -15.7DB. However, its average power (RMS) is -31.3DB! -31DB is profoundly quiet. If its average power is boosted to -16DB, then the peak power is now around -2DB. As long as peak power is less than 0DB, then the audio won't clip, but it might be too loud for comfort. I previously used -6DB for a peak, as I couldn't find any real guidelines regarding levels, and -6 sounded good to me.
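The boost described above is plain decibel arithmetic: a fixed gain raises peak and RMS by the same amount, so the gap between them does not change. A small sketch using the two figures quoted for conf-enter_conf_pin.wav (the variable names are illustrative):

```python
peak_db = -15.7        # measured peak quoted above
rms_db = -31.3         # measured average (RMS) quoted above
target_rms_db = -16.0  # desired average level

gain_db = target_rms_db - rms_db    # gain needed to reach the target RMS
new_peak_db = peak_db + gain_db     # peaks rise by the same amount
crest_factor_db = peak_db - rms_db  # peak-to-RMS gap, unchanged by gain

print(round(gain_db, 1), round(new_peak_db, 1), round(crest_factor_db, 1))
```

With the figures as quoted, this arithmetic puts the boosted peak nearer -0.4 dB than -2 dB, which is still just below full scale.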
> You still haven't said whether you are talking dBm0 or dBOv. It makes a
> 6dB difference. Also, what do you mean by peak power? If the peaks of
> the short term RMS power are hitting 0dB, the peaks of the waveform will
> be far into clipping. If you are talking about dBOv, then -6dB is only
> 3dB from the onset of clipping, and voice will clip a lot. If you are
> talking dBm0, -6dB is 9dB from clipping, and the voice will only clip a
> bit, and maybe not sound too bad. However, clipped voice tends to pass
> through low bit rate codecs worse than clean voice, so you might want to
> keep the clipping down to a really occasional event. Voice codecs have
> at least 12 bits of dynamic range. They are designed to allow a voice to
> bubble along at -30dB with good quality, and burst up to a much higher
> level in the loud bits.
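The 6 dB difference and the 3 dB and 9 dB headroom figures above can be reproduced with a short sketch. It rests on two assumptions that are not stated in the thread: that a full-scale sine sits at about -3.01 dBov, and that the same sine corresponds to about +3.17 dBm0 for mu-law (+3.14 dBm0 for A-law). It also assumes no gain or loss anywhere else in the path. The function names are illustrative.

```python
# Assumed reference points (see the note above): the level of a sine wave
# that just reaches digital full scale, expressed in each unit.
SINE_CLIP_DBOV = -3.01
SINE_CLIP_DBM0 = {"ulaw": 3.17, "alaw": 3.14}

def dbov_to_dbm0(dbov, law="ulaw"):
    """Convert a level in dBov to dBm0 under the assumptions above."""
    return dbov + (SINE_CLIP_DBM0[law] - SINE_CLIP_DBOV)

def headroom_from_dbov(dbov):
    """dB remaining before a sine at this dBov level would clip."""
    return SINE_CLIP_DBOV - dbov

def headroom_from_dbm0(dbm0, law="ulaw"):
    """dB remaining before a sine at this dBm0 level would clip."""
    return SINE_CLIP_DBM0[law] - dbm0

print(round(dbov_to_dbm0(-16.0), 2))       # -16 dBov expressed in dBm0
print(round(headroom_from_dbov(-6.0), 2))  # headroom if -6 dB means dBov
print(round(headroom_from_dbm0(-6.0), 2))  # headroom if -6 dB means dBm0
```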
>> Maybe FS is lower than it should be. Maybe other services are louder than they should be. If FS should be louder, though, I'd like to help to change the levels up-stream, rather than locally reprocessing the prompts.
>> So, is this a personal judgement case, or are there standards available that can be consulted?
>> On Jun 27, 2011, at 9:48 AM, Steve Underwood wrote:
>>> On 06/27/2011 07:11 AM, Bryan Smart wrote:
>>>> I have tools to batch-process audio files. I just was not sure that regaining all of the prompt files was the best approach. I figured that the gain must have been reduced so dramatically for some sort of reason (to avoid clipping in some situation, to work better with the internal resampling, etc).
>>>> What AGC do you mean? I know that AGC has recently been added to conferencing, but the level of the prompts is a system-wide situation. As far as I know, there isn't AGC that can be applied on every channel, and, even if there was, there would surely be a processing hit, so the goal would be to avoid needing it, right?
>>>> The root problem, at least for me, is this. I need to add voice prompts and other audio for an IVR. I can't simply normalize all of my prompts to 0DB, as, even though they don't distort, they're so loud when compared to the stock prompts, they'll blow the phone out of my hand. To match them to the stock prompts, I must normalize them to around -16DB. I can do that, but it seems very wrong. At -16DB, nearly 85% of the potential gain of the channel is lost.
>>> -16dBm0 or -16dBOv, and average or peak burst power? -16dBOv for the
>>> average power is about where you want a voice prompt to be. In some
>>> jurisdictions you could be in breach of a regulation or two if you set
>>> the level higher than that on the PSTN. Why would you set a voice prompt
>>> to 0dB? It will be clipping like crazy.
>>>> Try this... With the demo IVR (5000), add this before the sleep command in the dialplan.
>>>> <action application="set_audio_level" data="write 4"/>
>>>> That is the max gain boost available for a channel. The prompts should be really clipping with that much amplification, but they don't clip at all. At -16DB, you could literally amplify them to 6 times their native level without distorting. Native level is too low. Once I realized this, it became clear to me why Freeswitch sounded more quiet than Asterisk, at least when working with recorded prompts.
>>>> I suppose I could use set_audio_level on every last call, but I'm sure that real-time amplification, like AGC, is another processor drain that builds up with lots of calls. Besides, it seems weird to dramatically reduce the level of audio, and then waste cycles amplifying it back up in real-time.
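The alternative to amplifying in real time is to apply the gain once, offline, to the prompt files themselves. The following is a rough sketch of that step for 16-bit samples, counting any samples that would have exceeded full scale so that an over-aggressive gain is easy to spot. The function name is hypothetical, and reading and writing the WAV files is left out.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def apply_gain(samples, gain_db):
    """Scale 16-bit samples by gain_db; return (new_samples, clipped_count)."""
    factor = 10 ** (gain_db / 20)
    out, clipped = [], 0
    for s in samples:
        v = round(s * factor)
        if v > INT16_MAX or v < INT16_MIN:
            clipped += 1                            # would overflow
            v = max(INT16_MIN, min(INT16_MAX, v))   # saturate instead
        out.append(v)
    return out, clipped

louder, clipped = apply_gain([0, 1000, -1000, 5200], 6.0)
print(louder, clipped)
```

If the clipped count is not zero, the gain is too high for that file and should be reduced rather than relying on saturation.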
>>> Join us at ClueCon 2011, Aug 9-11, Chicago
>>> http://www.cluecon.com 877-7-4ACLUE
>>> FreeSWITCH-users mailing list
>>> FreeSWITCH-users at lists.freeswitch.org