[Freeswitch-users] Proper prompt gain/level

Tue Jun 28 19:58:11 MSD 2011

On 06/28/2011 11:04 AM, Bryan Smart wrote:
> I think that dBm0 only applies if we are measuring the power on an analog circuit, or at the d/a point of a digital circuit.
Nope.
> I was performing analysis of a digitally encoded audio file, therefore the measurement is dBFS, and 0DBFS is the point where clipping takes place. Anything below 0DBFS does not clip, even though it might seem loud to someone.
What is your notion of dBFS? There are two: one where 0dBFS is a sine 
wave touching the limits of the number range, and the other where 0dBFS 
is a square wave touching those limits. They are only 3dB apart, but 
dBov and dBm0 are better defined scales.

Your notion of 0dBFS as the point where clipping starts is based on a 
sine wave. For speech you need something lower because of the statistics 
of speech: its peaks ride well above its average power.
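
The 3dB gap between the two dBFS conventions can be checked numerically. This is a minimal Python sketch for illustration only; the 1kHz tone and the 8000 samples/second rate are assumed test values, and samples are normalised so that full scale is 1.0.

```python
import math

SAMPLE_RATE = 8000   # assumed telephony sample rate
TONE_HZ = 1000       # assumed test tone frequency
N = 8000             # one second of samples


def rms(samples):
    """Root-mean-square value of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


# Full-scale sine: peaks just touch +/-1.0.
sine = [math.sin(2 * math.pi * TONE_HZ * n / SAMPLE_RATE) for n in range(N)]

# Full-scale square wave with the same polarity as the sine.
square = [1.0 if s >= 0 else -1.0 for s in sine]

# Level of the sine relative to the square wave, in dB.
sine_vs_square_db = 20.0 * math.log10(rms(sine) / rms(square))

print("full-scale sine relative to full-scale square: %.2f dB" % sine_vs_square_db)
```

A meter that calls the full-scale square wave 0dBFS will therefore read the same file about 3dB lower than one that calls the full-scale sine 0dBFS.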
> I stated that the peak power of the stock prompts are typically -15 to -16 (DBFS), and the short term RMS power is about -32DBFS.
That statement doesn't seem to make much sense.
> I don't know how to properly evaluate how DBFS will convert to DBm0. I gather that the codec and the d/a converter attenuate the level to some extent, but I don't know how much. If I have a digital file with a 1Khz test tone at -10DBFS, and play that over the pstn, what do I get out in DBm0?
A sine wave touching the limits of the digital range is about +3.14dBm0.
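
Given that figure, converting a tone level in a file to dBm0 is just an offset. A hedged Python sketch follows; the function names are made up for illustration, and which of the two dBFS conventions a particular audio tool uses has to be checked, not assumed.

```python
import math

# Level of a sine wave touching the digital limits, as quoted above.
FULL_SCALE_SINE_DBM0 = 3.14

# A full-scale square wave has 3.01 dB more power than a full-scale sine.
SQUARE_OVER_SINE_DB = 20.0 * math.log10(math.sqrt(2.0))


def sine_ref_dbfs_to_dbm0(level_dbfs):
    """Convert a level where 0 dBFS means a full-scale sine wave."""
    return level_dbfs + FULL_SCALE_SINE_DBM0


def square_ref_dbfs_to_dbm0(level_dbfs):
    """Convert a level where 0 dBFS means a full-scale square wave."""
    return level_dbfs + FULL_SCALE_SINE_DBM0 + SQUARE_OVER_SINE_DB


# The 1kHz tone at -10 dBFS from the question, under each convention.
print("sine-referenced:   %.2f dBm0" % sine_ref_dbfs_to_dbm0(-10.0))
print("square-referenced: %.2f dBm0" % square_ref_dbfs_to_dbm0(-10.0))
```

Under the sine-referenced convention the -10dBFS tone comes out at about -6.9dBm0.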
> Speaking just in terms of DBFS, -16DB is only about 15% of the potential power available before the audio reaches 0DB, and clips. Perhaps when -16DBFS is put out over the pstn, it is much closer to clipping than it would be in an entirely digital domain. I don't know enough to make that determination. Do you know?
Yes, and you should know, since I told you last time.
> I'm not for clipping and distortion. However, too little power can cause another problem: limited dynamic range and increased dithering artifacts. If audio is quiet on a pstn phone, then the person with the phone might be able to increase the level by turning up the phone's volume, if it has one. However, that raises the noise floor. Companding might make 8-bit channels sound a bit like 14-bit or 15-bit channels in terms of a low noise floor and decreased dithering artifacts, but it is still just an 8-bit channel. Companding hides most of the dithering artifacts in strong signals, but it magnifies them in weak signals. That's why G711 is fairly clear, but will sound scratchy if you put faint signals in to it and try to amplify them back up to normal levels. Over G711 or any pstn call, as you decrease the level of the audio going across, the scratchy dithering static obscures an increasing amount of the audio.
That paragraph is utter drivel. The companding of G.711 ensures that the 
audio quality is roughly maintained from clipping down to -45dBm0 or so. 
Below that the quality falls, just as with any other digital coding 
scheme running out of bits. The FS prompts aren't nearly that quiet.
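
The way companding holds quality over a wide range of levels can be illustrated with a rough Python sketch. It is an assumption-laden model, not the real codec: it uses the continuous mu-law curve (mu = 255) with a uniform 8-bit quantiser, where G.711 itself uses a segmented approximation, so the exact figures will differ from a bit-exact implementation. Levels are relative to a full-scale sine, and the 1004Hz tone is an assumed test frequency.

```python
import math

MU = 255.0      # mu-law constant
LEVELS = 256    # 8-bit code space


def compress(x):
    """Continuous mu-law compression of x in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def expand(y):
    """Inverse of compress()."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def quantize(y):
    """Uniform mid-rise quantiser over [-1, 1] with LEVELS steps."""
    index = min(LEVELS - 1, int((y + 1.0) / 2.0 * LEVELS))
    return (index + 0.5) / LEVELS * 2.0 - 1.0


def snr_db(level_db, n=8000, freq=1004.0, rate=8000.0):
    """SNR of a sine at level_db (re full-scale sine) after companding."""
    amp = 10.0 ** (level_db / 20.0)
    signal_power = 0.0
    noise_power = 0.0
    for i in range(n):
        x = amp * math.sin(2.0 * math.pi * freq * i / rate)
        err = expand(quantize(compress(x))) - x
        signal_power += x * x
        noise_power += err * err
    return 10.0 * math.log10(signal_power / noise_power)


for level in (0, -10, -20, -30, -40, -50, -60):
    print("%4d dB re full-scale sine: SNR %.1f dB" % (level, snr_db(level)))
```

In this model the SNR should stay roughly flat over the upper few tens of dB and only fall away towards the bottom of the range, which is the behaviour described above.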
> This gets worse if you compound it by stacking codecs. Consider if someone calls your IVR from a cell phone, and you play quiet prompts to them. The audio is first passed through G711, where the low gain means that scratchy dithering artifacts are added. Then, it is encoded to GSM for the cell phone. GSM uses linear predictive coding, and, being tuned for voice, it is not optimized for having smooth waveforms interrupted periodically with random excursions. That's part of why cell phone calls sound so bad if there is lots of background noise.
Again, this is drivel.
> Anyway, clipping is to be avoided, but simply reducing levels dramatically creates other quality problems on a channel that uses companding. You trade distorted audio for scratchy audio. Same thing happened with cassette tapes that used Dolby noise reduction.
You really are just making this up as you go along, aren't you?
> Maybe you can help clear up my understanding of how DBFS in a digital file will work out in DBm0 on a pstn line. As things stand, though, I think we have lots of room to increase the level of the prompts before we reach a point where clipping is an issue.
I tried playing with a few of the prompts. The peaks of the words are 
about -20dBm0, or -26dBov. That's probably -23 or -26 on your scale. 
Every word peaks to a similar level, as prompts are spoken in a pretty 
flat voice. They seem to be about 15dB away from clipping, which would 
mean the crest factor is about 11dB for the Callie voice, which seems a 
little low for speech. You usually need to ride a little more than 13dB 
above the short term power of most voices to bring the crossing rate 
close to zero.
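
For anyone wanting to repeat this kind of measurement on their own files, crest factor and headroom are simple to compute. This is a minimal Python sketch, assuming the samples have already been loaded and normalised so that full scale is 1.0; the function names and the test tone are illustrative only, and no real prompt data is used.

```python
import math


def peak(samples):
    """Largest absolute sample value."""
    return max(abs(s) for s in samples)


def rms(samples):
    """Root-mean-square value of the samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def crest_factor_db(samples):
    """Ratio of peak to RMS level, in dB."""
    return 20.0 * math.log10(peak(samples) / rms(samples))


def headroom_db(samples, full_scale=1.0):
    """Gain in dB that could be applied before the peaks clip."""
    return 20.0 * math.log10(full_scale / peak(samples))


# Sanity check with a 1kHz sine at 8000 samples/second, set 15 dB
# below full scale (assumed test values, not a real prompt).
gain = 10.0 ** (-15.0 / 20.0)
tone = [gain * math.sin(2.0 * math.pi * 1000 * n / 8000) for n in range(8000)]

print("crest factor: %.2f dB" % crest_factor_db(tone))
print("headroom:     %.2f dB" % headroom_db(tone))
```

A pure sine has a crest factor of about 3dB; speech is considerably peakier, which is why it needs more headroom than a test tone at the same power.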

So, there is about 15dB of headroom which could be used to increase the 
speech level without clipping. Whether doing that is a good idea is 
questionable. The prompts are at about normal speech level. In normal 
speech the codecs have plenty of headroom to cope with people getting 
agitated and shouting, producing only modest amounts of clipping. Is 
cranking all the prompts up to shouting level appropriate?

Steve