[Freeswitch-users] Re- FreeSWITCH call billing timers vs. BT SIP billing

Thu Nov 26 01:09:45 MSK 2015

Michael,

I very much appreciate your feedback on my question about billing timer positions.

I take on board everything you have stated; however, I wondered if you could clarify one thing for me.

It is possible (perhaps due to bad programming design) to load a service on FreeSWITCH that catches the HUNGUP state and runs code that delays ending the service. This causes an unknown delay between CHANNEL_HANGUP being detected ("BYE" message) and CHANNEL_HANGUP_COMPLETE ("200 OK" message). Since the Telco assumes the call is complete when they send the "BYE" message, why doesn't FreeSWITCH simply stop the billing timer when receiving the "BYE" (i.e. the CHANNEL_HANGUP event), instead of setting up all the billing timer variables at CHANNEL_HANGUP_COMPLETE as it currently does?

This would avoid a bad service from skewing the call durations and we would be left with only RTT (which I can live with).

Since there is already a variable called variable_answerusec (which seems to contain the time from CHANNEL_ANSWER to CHANNEL_HANGUP), why is that variable not used to derive all the associated billing variables?

variable_billsec, variable_billmsec, variable_billusec

Let me know your thoughts.

Andrew

From: freeswitch-users-bounces at lists.freeswitch.org [mailto:freeswitch-users-bounces at lists.freeswitch.org] On Behalf Of Michael Giagnocavo
Sent: Wednesday, 25 November 2015 7:44 PM
To: FreeSWITCH Users Help <freeswitch-users at lists.freeswitch.org>
Subject: Re: [Freeswitch-users] Re- FreeSWITCH call billing timers vs. BT SIP billing

The answer is 5: It shouldn't really matter.

If FS didn't start the timer from the 200, then a call that was never ACK'd (which can last like 30 or 60 seconds with the terrible, default, SIP timer values) would have no duration. That's not desirable, so starting from the 200 is the only thing that really makes much sense. Likewise for the hangup part. After sending a BYE (signifying you're done), why would you continue to keep a call "up" until receiving a reply? Remember there's another leg connected, and you wanna start shutting that leg down or otherwise moving on with life (dialplan). At best you'd want both timestamps (or an indication the other side timed out), so you might do some processing after BYE, then finalize it on 200 of the BYE.

But, this is really quite a distraction. Regardless which messages you choose to use, your times will never coincide perfectly with the others due to network latency. So if you accept you'll always be off by that much, the question now becomes: how much difference is there between the various forms of measuring? Under normal circumstances, you're looking at one network roundtrip + processing. Processing times should be negligible, so the RTT is the only thing that matters. However, the other side doesn't get much of a choice. BT cannot time off of when you _sent_ the 200, only when they received it. Nor can they time off of when you _received_ their ACK, only when they sent it. Therefore, regardless of which method you use, you'll always be off by one-way latency[1].

As an example, with 40ms one-way latency:

T+0.000: 200 OK SENT (FS)
T+0.040: 200 OK RECEIVED (BT)
T+0.040: ACK SENT (BT)
T+0.080: ACK RECEIVED (FS)

You hangup:
T+1.000: BYE SENT (FS)
T+1.040: BYE RECEIVED (BT)
T+1.040: 200 OK SENT (BT)
T+1.080: 200 OK RECEIVED (FS)

They hangup:
T+1.000: BYE SENT (BT)
T+1.040: BYE RECEIVED (FS)
T+1.040: 200 OK SENT (FS)
T+1.080: 200 OK RECEIVED (BT)

I'm assuming processing times are under 1ms, which seems quite reasonable. Notice how in the first part, BT gets no choice; both timestamps (OK rec'd, ACK sent) have the same ms. It's the same for the hangup case, except you're the one that doesn't get a choice of when to measure if they hangup. Though note something interesting: sometimes it balances out. You timestamp off the 200 OK, so you're "ahead" by 40ms (your call starts 40ms before theirs). Then you hangup, timestamping from when you send the BYE. Again you're ahead, but this time your call ends 40ms before theirs, cancelling out your head start. If they hangup, though, then they get to "start late" and "end early". I'll show the number on this, but it rounds out just fine. It'd only really be a problem if FS were inconsistent and measured off sending your own packet in one case, but receiving their reply in another. That'd be a bad design I suppose.
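The cancelling and stacking described above can be checked with a small Python sketch. It models only the inbound-call timeline from the example (FS stamps the start when it sends the 200 OK, BT stamps it when it sends the ACK) and assumes zero processing time and symmetric latency. The function name and parameters are illustrative and are not FreeSWITCH or BT internals.

```python
ONE_WAY_MS = 40  # one-way latency used in the example above


def measured_durations(talk_ms, fs_hangs_up, one_way_ms=ONE_WAY_MS):
    """Return (fs_duration_ms, bt_duration_ms) for an inbound call from BT.

    Assumptions (not confirmed FS behaviour): each side stamps the start
    and end at the first moment it can, and processing time is zero.
    talk_ms is the time from FS sending the 200 OK to the BYE being sent.
    """
    fs_start = 0            # FS sends 200 OK
    bt_start = one_way_ms   # BT receives 200 OK and sends ACK

    if fs_hangs_up:
        fs_end = talk_ms                # FS sends BYE
        bt_end = talk_ms + one_way_ms   # BT receives BYE
    else:
        bt_end = talk_ms                # BT sends BYE
        fs_end = talk_ms + one_way_ms   # FS receives BYE

    return fs_end - fs_start, bt_end - bt_start


if __name__ == "__main__":
    for fs_hangs_up in (True, False):
        fs_ms, bt_ms = measured_durations(1000, fs_hangs_up)
        who = "FS" if fs_hangs_up else "BT"
        print(f"{who} hangs up: FS={fs_ms}ms BT={bt_ms}ms diff={fs_ms - bt_ms}ms")
```

When FS hangs up, the 40ms head start is cancelled by the 40ms early finish; when BT hangs up, the two offsets add up to one full round trip.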

Otherwise the only time when this really matters is when there's something else going really wrong. A dropped packet can make one side retransmit or time out, so waiting for a response might take significantly longer (many seconds if you follow the SIP spec). But if this is happening enough to matter, you've got a bigger issue to address. (Though since some systems didn't address this, you could potentially find a fraud exploit by e.g. delaying or not OK'ing BYEs.)

As for processing times: Any part of the network, from the app-level to the OS, to physical phenomena, can delay things arbitrarily. The internal design of switches can delay things. For instance, you could do "start = now(); sendPacket();" only to have the OS preempt your execution after now() but before sendPacket(). Or perhaps the code is something like "runUserScripts(); doHangupWork();" - that'd explain the behaviour you described by delaying hangup, right? But generally, if processing times are making a real dent in things, you most likely have a more serious problem. I don't have an answer here on FS internals, and I'm not sure you want it anyways. Do you really want to take a dependency on implementation details (if they aren't documented and guaranteed)?

But ok, let's look at the bad case, where we measure from when we send the OK (our call starts 40ms "early"), but they hang up (our call ends 40ms "late"), and all our traffic happens this way. In this case the offsets stack instead of cancelling out, so our calls are 80ms longer than theirs. If the ACD is, say, 3 minutes (180,000ms), that 80ms comes out to about a 0.04% difference.[2] At least that's what I'm getting at 2am; please doublecheck all this and decide if I'm accurate.
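To double-check the arithmetic, here is a minimal Python sketch of the worst-case percentage for a few average call durations. The 80ms offset and the 3 minute ACD come from the paragraph above; the shorter durations are hypothetical values added only to show how the ratio grows for short calls.

```python
def discrepancy_percent(offset_ms, call_ms):
    """Offset between the two sides' durations as a percentage of call length."""
    return 100.0 * offset_ms / call_ms


if __name__ == "__main__":
    offset_ms = 80  # two stacked 40ms one-way latencies
    # 180s is the ACD from the text; 10s and 30s are illustrative only.
    for acd_s in (10, 30, 180):
        pct = discrepancy_percent(offset_ms, acd_s * 1000)
        print(f"ACD {acd_s:>3}s: {pct:.3f}% difference")
```

Note that per-second billing rounds durations, so a sub-100ms offset only changes a billed figure when a call happens to land right on a second boundary.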

From experience: Having used FS for years, I've never found timestamps to be a significant source of discrepancies. You're far more likely to get missing CDRs due to someone's code/screwup somewhere - it's surprising how bad this can be sometimes. One big carrier, for instance, wouldn't have all their CDRs ready when they billed, so next week's bill might actually include calls that happened a month ago. Neato! Most people consider under 1% (or even 3%) of difference to be fine. Don't worry about it, just check after some tests or some live calls and make sure you're not seeing anything nutty. And know that by far the most common billing issue is rate disagreement, not duration.

Anyways, time in distributed systems is a complicated topic AFAIK. There's probably some sort of special name and way for how this sort of thing is handled in telecom as it applies to any system, not just SIP.

-Michael

1: Usually half of RTT but it's theoretically possible for asymmetric routing to make this not so.
2: You'd need to be doing some real short-call traffic for this to be a noticeable source of billing problems. And most short-call traffic is hung up by the sender, so the offsets cancel out.

From: freeswitch-users-bounces at lists.freeswitch.org [mailto:freeswitch-users-bounces at lists.freeswitch.org] On Behalf Of Andrew Keil
Sent: Tuesday, 24 November, 2015 16:16
To: FreeSWITCH Users Help <freeswitch-users at lists.freeswitch.org>
Subject: [Freeswitch-users] Re- FreeSWITCH call billing timers vs. BT SIP billing

To FreeSWITCH users,

I have a question about correct billing of a SIP call within FreeSWITCH (to align with Telco billing).
I asked BT the question below (directly to their SIP compliance testing team) and received their response today.

Based on the diagram below of a simple SIP call:

BT              FreeSWITCH
| --- INVITE ------> |
| <-- 100 Trying --- |
| <-- 200 OK ------- |
| --- ACK ---------> |
| <-- RTP ---------> |
| --- BYE ---------> |
| <-- 200 OK ------- |

In order to best match the BT billing records for call duration (that will be sent to my client for their SIP service) I want to make sure that our timers are correct.

The above diagram shows an inbound call from BT to FreeSWITCH.

Which of the below statements are correct (2 out of the 4 should be correct):

1)   The answer (or connect) timer should start from when FreeSWITCH sends the "200 OK" to indicate that the call is connected?
2)   The answer (or connect) timer should start from when FreeSWITCH receives the ACK message back from BT (after sending the "200 OK" to indicate that the call was connected)?
3)   The timer for call duration should end when FreeSWITCH receives the "BYE" message from BT?
4)   The timer for call duration should end when FreeSWITCH sends the "200 OK" to BT (after it received the "BYE" message from BT)?

If the call was reversed, i.e. an outbound call made from FreeSWITCH to BT, simply swap BT and FreeSWITCH in the above diagram.

Which of the below statements are correct (2 out of the 4 should be correct):

1)   The answer (or connect) timer should start from when FreeSWITCH receives the "200 OK" to indicate that the call is connected?
2)   The answer (or connect) timer should start from when FreeSWITCH sends the ACK message back to BT (after receiving the "200 OK" to indicate that the call was connected)?
3)   The timer for call duration should end when FreeSWITCH sends the "BYE" message to BT?
4)   The timer for call duration should end when FreeSWITCH receives the "200 OK" from BT (after it sent the "BYE" message to BT)?

BT's response:

...
Out of the first 4 statements, numbers 2 and 3 are closest. After BT sends the ACK the call timer will start, it will then stop again when BT sends the BYE.
Out of the second 4 statements, numbers 2 and 3 are closest. After BT receives the ACK the call timer will start, it will then stop again after BT receives the BYE.
...

Therefore, when looking at FreeSWITCH this becomes more interesting, if I take BT's response to be true and correct.

Currently the events generated at the end of a call are CHANNEL_HANGUP and CHANNEL_HANGUP_COMPLETE, which align as follows:

CHANNEL_HANGUP when BYE received or sent
CHANNEL_HANGUP_COMPLETE when 200 OK sent or received

Which brings me to my actual question:

Billing timing is done from the point of call connect (when the ACK is received or sent after the 200 OK) to the point of receiving the BYE message (or sending it, depending on the direction of the call).
Why, then, does FreeSWITCH calculate all the billing variables returned in CHANNEL_HANGUP_COMPLETE as if the call timer ended when CHANNEL_HANGUP_COMPLETE is generated, and not when CHANNEL_HANGUP is generated?

Here is a snippet of the CHANNEL_HANGUP_COMPLETE variables:

variable_duration:14
variable_billsec:14
...
variable_billmsec:14041
...
variable_billusec:14041103
...
variable_answerusec:58543

In order to test this, I delayed a Lua script after HUNGUP (or CHANNEL_HANGUP event) was detected (for about 10 seconds) prior to ending the Lua script.  I know this is not recommended, however it is just to explain this case.

As you can see the call duration and all the billing variables show the time up to CHANNEL_HANGUP_COMPLETE not CHANNEL_HANGUP.  However, the interesting variable is: variable_answerusec, which seems correct up to CHANNEL_HANGUP.
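As a rough illustration, the variables above can be parsed and cross-checked with a few lines of Python. This sketch only confirms that the three bill* values in the snippet are the same interval expressed in seconds, milliseconds and microseconds; it makes no claim about which event FreeSWITCH uses to stop the timer. The helper names are hypothetical, and the "name:value" layout is assumed from the snippet as pasted.

```python
SNIPPET = """\
variable_duration:14
variable_billsec:14
...
variable_billmsec:14041
...
variable_billusec:14041103
...
variable_answerusec:58543
"""


def parse_vars(text):
    """Parse 'name:value' lines into a dict of ints, skipping anything else."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            values[name.strip()] = int(value)
    return values


def bill_units_consistent(values):
    """True if billsec and billmsec are truncations of billusec."""
    usec = values["variable_billusec"]
    return (usec // 1000 == values["variable_billmsec"]
            and usec // 1_000_000 == values["variable_billsec"])


if __name__ == "__main__":
    parsed = parse_vars(SNIPPET)
    print(parsed)
    print("bill* units consistent:", bill_units_consistent(parsed))
```

Running the same check on events captured with and without the delayed Lua script would show directly how much of billusec comes from the delay.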

I would appreciate any comments regarding this since obviously I would like the call durations on FreeSWITCH to match BT.
If BT are not correct with their timer points, then I am happy to go back to them with some evidence to dispute their claims.

Regards,

Andrew Keil