[Freeswitch-users] Proper prompt gain/level

Tue Jun 28 23:26:12 MSD 2011

I must be wrong about dBm0, then. I wasn't familiar with it, so I checked Wikipedia. Wikipedia isn't always right, of course.

Wikipedia says...
----------
dBm0 is an abbreviation for the power in dBm measured at a zero transmission level point.

dBm0 is a concept used (amongst other areas) in audio/telephony processing since it allows a smooth integration of analog and digital chains. Notably, for A-law and μ-law codecs the standards define a sequence which has a 0 dBm0 output.

......

Note 2: 0 dBm0 is often replaced by or used instead of digital milliwatt or zero transmission level point.
----------

Where is my zero transmission level point? Data from a codec is not power being transmitted, only data for reproducing power at some later time. Isn't it up to the D/A converter at the far end to determine what reference the power levels produced by decoding the data are relative to? If it is a hardware SIP phone, maybe the person has the volume cranked up, or turned down. Since the analog representation of the signal starts in the D/A that feeds the handset, what is considered 0? I have no idea. It sounds like this is a scale used for calibrating a D/A that feeds a PSTN circuit.

I admit feeling frustrated by this discussion. I've mixed music and mastered CDs for nearly 15 years, and I've always felt that 0 dB in an entirely digital domain is a near-universally understood concept. If I have a sample of 16-bit signed LPCM audio, then -32768 and 32767 represent the maximum amplitude that can be stored, and that is what everyone I know in pro audio calls 0 dB when speaking strictly about a file, rather than a PA or the in/out levels of an analog device like a tape machine. If the gain of the encoded signal is boosted only to a point where none of the samples are pushed beyond this range, then none of them clip. I thought that you'd think of dB in the same way, given we were talking about levels in files, but it feels like you're nit-picking or antagonizing me. It sounds like you're telling me that samples can clip even if the gain was never increased to a point where this range would be overflowed, but I have no idea how that is possible. It does not happen when I play such audio through the D/A on a computer's sound card, nor when I store it on a CD and play it back through a stereo.
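To make the file-level notion concrete, here is a minimal Python sketch of peak-based dBFS for 16-bit signed PCM, the measurement described above. It assumes 32768 as the full-scale reference (some tools use 32767; the difference is under 0.001 dB), and `peak_dbfs` is a hypothetical helper name, not part of FreeSWITCH or sox.

```python
import math

FULL_SCALE = 32768  # magnitude of the most negative int16 value (assumed reference)

def peak_dbfs(samples):
    """Peak level of signed 16-bit PCM samples, in dB relative to full scale.

    Peak-based (mastering-style) dBFS: 0 dBFS means at least one sample
    reaches the edge of the int16 range. Returns -inf for digital silence.
    """
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return float("-inf")
    return 20 * math.log10(peak / FULL_SCALE)

print(round(peak_dbfs([0, 16384, -8192]), 2))  # -6.02
print(peak_dbfs([-32768, 1000]))               # 0.0
```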

I take your point about headroom. Still, I don't feel that the current level matches caller speech. If I felt that the prompts blended well, I wouldn't have even put myself through this thread. I've connected a mix of hardware SIP phones, desktop SIP clients, and iOS SIP clients to a conference, with no modifications to the audio level of the channel, and the level of people speaking to each other in the conference is significantly louder than any prompts that are played through it. Maybe *all* of the clients are pushing audio too strongly. I first thought it was something to do with the conference, but I soon realized that the prompts were quiet everywhere, not just when played in a conference.

I don't expect anything to change due to my personal preference, and I realize that a background in digital audio as applied to music and live recording doesn't mean that I'm not ignorant about many things that involve digital audio as it applies to telephony. I raised the issue here to see if I might be doing something wrong. If not, I wondered whether the prompt levels were set according to some standard by people wiser about the details than I am. It seems, though, that there really isn't a standard, and the decision is someone else's personal preference. I'd rather that personal preferences never be a default. Since there isn't an official standard or guideline, I suppose that someone has to make a decision, and that decision will be influenced by their own preferences.

I have a few suggestions to help improve Michael's sox script, but that's where I'll leave this issue. It isn't worth a big argument when I can fix it myself.

I apologize to the list for any smoke, hints of flames, or other frustration that might have leaked through into my posts. The sound level thing matters to me, but it is really a small thing. I enjoy Freeswitch immensely, and really appreciate everyone's efforts in producing and updating it. I've always been fascinated with phones and any type of interactive phone app, and so Asterisk, and now Freeswitch, really spark my imagination. I'm 34, and the study of phones in my early teens was my first conceptual exposure to a large network. I phreaked a bit at the time. I was fortunate (at least in one regard) to live in the US Deep South, where digital switching equipment wasn't common, so all of the old 1970s techniques were still available. That didn't last long, but it piqued my curiosity. I used VXML for a project for an employer in 2002 or so, but didn't really feel any excitement about phones, in the way that I used to, until I ran across Asterisk. Asterisk was great for the time, but I wanted to use it more for apps than as a PBX, and so was quite excited to discover the different design of Freeswitch. I rarely become excited about new environments and frameworks anymore, but Freeswitch has put me back into a fun mood of exploration and experimentation.

Bryan

On Jun 28, 2011, at 11:58 AM, Steve Underwood wrote:

> On 06/28/2011 11:04 AM, Bryan Smart wrote:
>> I think that dBm0 only applies if we are measuring the power on an analog circuit, or at the D/A point of a digital circuit.
> Nope.
>> I was performing analysis of a digitally encoded audio file, therefore the measurement is dBFS, and 0 dBFS is the point where clipping takes place. Anything below 0 dBFS does not clip, even though it might seem loud to someone.
> What is your notion of dBFS? There are two: one where 0dBFS is a sine 
> wave touching the limits of the number range, and the other where 0dBFS 
> is a square wave touching those limits. They are only 3dB apart, but 
> dBov and dBm0 are better defined scales.
> 
> Your notion of 0dBFS not clipping is based on a sine wave. For speech 
> you need something lower because of the statistics of speech.
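The two dBFS conventions Steve mentions differ only in the reference signal. The Python sketch below (the helper name is hypothetical) measures the RMS level of a full-scale sine and a full-scale square wave against an amplitude of 1.0. The sine comes out 3.01 dB lower, which is the gap between the two conventions.

```python
import math

def rms_db(xs):
    """RMS level in dB relative to an amplitude of 1.0 (the full-scale edge)."""
    mean_square = sum(x * x for x in xs) / len(xs)
    return 10 * math.log10(mean_square)

RATE, FREQ, N = 8000, 1000, 8000  # one second of a 1 kHz tone at 8 kHz

# Both waveforms just touch the full-scale limits of +/-1.0.
sine = [math.sin(2 * math.pi * FREQ * n / RATE) for n in range(N)]
square = [1.0 if s >= 0 else -1.0 for s in sine]

print(round(rms_db(sine), 2))    # -3.01
print(round(rms_db(square), 2))  # 0.0
```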
>> I stated that the peak power of the stock prompts is typically -15 to -16 dBFS, and the short-term RMS power is about -32 dBFS.
> That statement doesn't seem to make much sense.
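For reference, one common way to get a short-term RMS figure like the -32 dBFS quoted above is to take the RMS of short windows and report the loudest one. This is a sketch under assumptions: the 20 ms non-overlapping window, the 8 kHz rate, and the helper name are illustrative choices, and level meters differ in windowing and averaging, so figures from different tools may not match.

```python
import math

def short_term_rms_dbfs(samples, rate=8000, window_ms=20, full_scale=32768):
    """Loudest windowed RMS level of int16 samples, in dB relative to full scale.

    Splits the signal into non-overlapping windows of window_ms and returns
    the highest window RMS. 0 dB here is the RMS of a full-scale square wave.
    Returns -inf if the input is shorter than one window or is silent.
    """
    win = max(1, rate * window_ms // 1000)
    best = 0.0
    for start in range(0, len(samples) - win + 1, win):
        chunk = samples[start:start + win]
        mean_square = sum(s * s for s in chunk) / win
        best = max(best, mean_square)
    if best == 0.0:
        return float("-inf")
    return 10 * math.log10(best) - 20 * math.log10(full_scale)

# 20 ms of silence followed by 20 ms of a half-scale square wave.
test = [0] * 160 + [16384, -16384] * 80
print(round(short_term_rms_dbfs(test), 2))  # -6.02
```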
>> I don't know how to properly evaluate how dBFS will convert to dBm0. I gather that the codec and the D/A converter attenuate the level to some extent, but I don't know how much. If I have a digital file with a 1 kHz test tone at -10 dBFS, and play that over the PSTN, what do I get out in dBm0?
> A sine wave touching the limits of the digital range is about +3.14dBm0.
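Steve's +3.14 dBm0 figure is the G.711 A-law value for a full-scale sine; G.711 gives +3.17 dBm0 for μ-law. Because the reference is fixed at the digital interface, converting is a constant offset. The sketch below assumes sine-referenced dBFS and takes 0 dBov as the RMS of a full-scale square wave (3.01 dB above a full-scale sine), which matches Steve's -20 dBm0 ≈ -26 dBov pairing later in the thread. The function names are hypothetical.

```python
A_LAW_TMAX_DBM0 = 3.14    # full-scale sine, A-law (G.711)
MU_LAW_TMAX_DBM0 = 3.17   # full-scale sine, mu-law (G.711)
SINE_VS_SQUARE_DB = 3.01  # 20*log10(sqrt(2)): full-scale square RMS vs full-scale sine RMS

def dbfs_sine_to_dbm0(dbfs, tmax=A_LAW_TMAX_DBM0):
    """Level relative to a full-scale sine (0 dBFS) -> dBm0."""
    return dbfs + tmax

def dbov_to_dbm0(dbov, tmax=A_LAW_TMAX_DBM0):
    """dBov (assumed: 0 dBov = RMS of a full-scale square wave) -> dBm0."""
    return dbov + SINE_VS_SQUARE_DB + tmax

print(round(dbfs_sine_to_dbm0(-10), 2))                    # -6.86
print(round(dbfs_sine_to_dbm0(-10, MU_LAW_TMAX_DBM0), 2))  # -6.83
print(round(dbov_to_dbm0(-26), 2))                         # -19.85
```

So the -10 dBFS test tone in the question would leave an A-law codec at roughly -6.9 dBm0, whatever the far-end volume control later does with it.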
>> Speaking just in terms of dBFS, -16 dB is only about 16% of the full-scale amplitude (about 2.5% of the power) available before the audio reaches 0 dB and clips. Perhaps when -16 dBFS is put out over the PSTN, it is much closer to clipping than it would be in an entirely digital domain. I don't know enough to make that determination. Do you know?
> Yes, and you should know, since I told you last time.
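The percentages for a -16 dB level come from the standard definitions: an amplitude ratio is 10^(dB/20) and a power ratio is 10^(dB/10). A short Python check (helper names are illustrative):

```python
def db_to_amplitude_ratio(db):
    """Amplitude (voltage, sample value) ratio for a level difference in dB."""
    return 10 ** (db / 20)

def db_to_power_ratio(db):
    """Power ratio for a level difference in dB."""
    return 10 ** (db / 10)

print(f"{db_to_amplitude_ratio(-16):.3f}")  # 0.158
print(f"{db_to_power_ratio(-16):.4f}")      # 0.0251
```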
>> I'm not for clipping and distortion. However, too little power can cause another problem: limited dynamic range and increased dithering artifacts. If audio is quiet on a PSTN phone, then the person with the phone might be able to increase the level by turning up the phone's volume, if it has one. However, that raises the noise floor. Companding might make 8-bit channels sound a bit like 14-bit or 15-bit channels in terms of a low noise floor and decreased dithering artifacts, but it is still just an 8-bit channel. Companding hides most of the dithering artifacts in strong signals, but it magnifies them in weak signals. That's why G711 is fairly clear, but will sound scratchy if you put faint signals into it and try to amplify them back up to normal levels. Over G711 or any PSTN call, as you decrease the level of the audio going across, the scratchy dithering static obscures an increasing amount of the audio.
> That paragraph is utter drivel. The companding of G.711 ensures that the 
> audio quality is roughly maintained from clipping down to -45dBm0 or so. 
> Below that the quality falls, just like most other digital coding schemes 
> running out of bits. The FS prompts aren't nearly that quiet.
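Steve's point about G.711 companding can be checked numerically. The sketch below is an approximation, not the real codec: it uses the continuous μ-law curve (μ = 255) with roughly 8-bit quantization in the companded domain, whereas G.711 itself uses a segmented piecewise-linear version, so exact numbers will differ somewhat. It measures the signal-to-quantization-noise ratio of a 1011 Hz sine at several levels (peak in dB relative to full scale; add about 3 dB for dBm0) and compares it with a plain 8-bit uniform quantizer. All names are hypothetical.

```python
import math

MU = 255.0
STEPS = 127  # quantization steps per polarity (roughly an 8-bit code)

def mulaw_roundtrip(x):
    """Compress with the continuous mu-law curve, quantize, then expand.

    Approximation of G.711 mu-law; the real codec uses segments.
    """
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    yq = round(y * STEPS) / STEPS
    return math.copysign(math.expm1(abs(yq) * math.log1p(MU)) / MU, yq)

def linear8_roundtrip(x):
    """Uniform quantizer with the same number of levels, for comparison."""
    return round(x * STEPS) / STEPS

def snr_db(quantize, level_db, rate=8000, freq=1011.0, n=8000):
    """SNR of a sine whose peak sits level_db below full scale (1.0)."""
    amp = 10 ** (level_db / 20)
    sig = noise = 0.0
    for i in range(n):
        x = amp * math.sin(2 * math.pi * freq * i / rate)
        err = quantize(x) - x
        sig += x * x
        noise += err * err
    return 10 * math.log10(sig / noise)

for level in (-10, -30, -45, -60):
    print(level,
          round(snr_db(mulaw_roundtrip, level), 1),
          round(snr_db(linear8_roundtrip, level), 1))
```

With this model the μ-law SNR stays roughly constant over a wide range of levels and only falls away for very faint signals, while the uniform quantizer loses about 1 dB of SNR for every dB of level.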
>> This gets worse if you compound it by stacking codecs. Consider if someone calls your IVR from a cell phone, and you play quiet prompts to them. The audio is first passed through G711, where the low gain means that scratchy dithering artifacts are added. Then, it is encoded to GSM for the cell phone. GSM uses linear predictive coding and, being tuned for voice, it is not optimized for smooth waveforms interrupted periodically by random excursions. That's part of why cell phone calls sound so bad if there is lots of background noise.
> Again, this is drivel.
>> Anyway, clipping is to be avoided, but simply reducing levels dramatically creates other quality problems on a channel that uses companding. You trade distorted audio for scratchy audio. The same thing happened with cassette tapes that used Dolby noise reduction.
> You really are just making this up as you go along, aren't you.
>> Maybe you can help clear up my understanding of how dBFS in a digital file works out in dBm0 on a PSTN line. As things stand, though, I think we have lots of room to increase the level of the prompts before we reach a point where clipping is an issue.
> I tried playing with a few of the prompts. The peaks of the words are 
> about -20dBm0, or -26dBov. That's probably -23 or -26 on your scale. 
> Every word peaks to a similar level, as prompts are spoken in a pretty 
> flat voice. They seem to be about 15dB away from clipping, which would mean 
> the crest factor is about 11dB for the Callie voice, which seems a 
> little low for speech. You usually need to ride a little more than 13dB 
> above the short term power of most voices to bring the crossing rate 
> close to zero.
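The crest factor is just the peak level minus the RMS level, in dB. A minimal sketch, assuming 0 dBov is the RMS of a full-scale square wave (whose RMS equals its peak), so a signal's peaks reach full scale when its RMS level is one crest factor below 0 dBov. The helper names are hypothetical.

```python
import math

def crest_factor_db(xs):
    """Peak-to-RMS ratio of a signal, in dB."""
    peak = max(abs(x) for x in xs)
    rms = math.sqrt(sum(x * x for x in xs) / len(xs))
    return 20 * math.log10(peak / rms)

def max_rms_dbov(crest_db):
    """Highest RMS level (dBov) at which peaks just touch full scale.

    Assumes 0 dBov = RMS of a full-scale square wave, i.e. RMS equal to
    the full-scale peak, so the limit is simply crest_db below 0 dBov.
    """
    return -crest_db

sine = [math.sin(2 * math.pi * 1000 * n / 8000) for n in range(8000)]
print(round(crest_factor_db(sine), 2))  # 3.01
print(max_rms_dbov(13))                 # -13
```

Under that convention, a voice that needs about 13 dB of peak room above its short-term power would top out around -13 dBov before its peaks reach full scale.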
> 
> So, there is about 15dB of headroom which could be used to increase the 
> speech level without clipping. Whether doing that is a good idea is 
> questionable. The prompts are at about normal speech level. In normal 
> speech the codecs have plenty of headroom to cope with people getting 
> agitated and shouting, producing only modest amounts of clipping. Is 
> cranking all the prompts up to shouting level appropriate?
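Whether to use that headroom is the real question, but measuring and applying it is simple. A sketch with hypothetical helpers (this is not Michael's sox script): `headroom_db` reports how much gain would bring the largest sample up to 32767, and `apply_gain_int16` applies a gain with saturation at the int16 limits and counts how many samples clip.

```python
import math

INT16_MAX = 32767
INT16_MIN = -32768

def headroom_db(samples):
    """Gain in dB that would bring the largest |sample| up to 32767."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return math.inf
    return 20 * math.log10(INT16_MAX / peak)

def apply_gain_int16(samples, gain_db):
    """Scale int16 samples by gain_db, saturating at the int16 limits.

    Returns (new_samples, clipped_count).
    """
    factor = 10 ** (gain_db / 20)
    out, clipped = [], 0
    for s in samples:
        v = round(s * factor)
        if v > INT16_MAX or v < INT16_MIN:
            clipped += 1
            v = max(INT16_MIN, min(INT16_MAX, v))
        out.append(v)
    return out, clipped

prompt = [1000, -2000, 3000]  # stand-in for decoded prompt samples
room = headroom_db(prompt)
print(round(room, 2))                         # 20.77
print(apply_gain_int16(prompt, room)[1])      # 0
print(apply_gain_int16(prompt, room + 1)[1])  # 1
```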
> 
> Steve