[Freeswitch-users] FreeSWITCH 1.6 - Where are the bottlenecks?
Shaun Stokes
shaun.stokes at itec-support.co.uk
Thu Jul 26 08:44:47 UTC 2018
Hi All,
We've experienced a number of segmentation faults over time, with various errors in the core dumps and no clear indication of the exact cause. We suspect the problem is load related.
FreeSWITCH is typically handling more than 100 concurrent sessions but rarely more than 10 sessions per second. These segmentation faults have even occurred in the early hours of the morning, with fewer than 10 sessions active at the time. This leads us to believe the bottleneck is the number of simultaneous registrations and subscriptions, which is the only variable that stays roughly the same at all hours, decreasing and increasing only slightly in the evenings and mornings.
We've created more SIP profiles to distribute registrations, to no avail; we're only able to prevent segmentation faults entirely by distributing registrations across more servers. The problems seem to occur when we have a combination of:
- 10,000 subscriptions and 400 registrations; reducing subscriptions to 5,000 resolves the issue.
- 1200 subscriptions and 950 registrations; reducing registrations to 700 resolves the issue.
We have a few FreeSWITCH servers dedicated to call routing only; these process in excess of 400 concurrent sessions and never have any issues.
Our server specs:
Debian 8.9
PostgreSQL 9.4 - Optimised for best performance
FreeSWITCH 1.6.20 - Using PostgreSQL backend
32GB RAM
Dual socket Intel Xeon X5670 2.93GHz - With Hyper-Threading enabled
Our plan is to implement SBCs in the future to handle subscriptions and registrations, but in the short term we need to understand the limitations of FreeSWITCH: where the bottlenecks are, and whether there are any other solutions.
Looking at SIP traces at the time of these segmentation faults, there are more than 30 SIP messages per second, mostly SUBSCRIBE, 202, NOTIFY, REGISTER and 200. If we look at all packets on the FreeSWITCH SIP port, including Application Data, ACK etc., there are hundreds of packets per second.
We initially suspected the bottleneck could be the FreeSWITCH backend database, but moving to PostgreSQL has had no effect.
Has anyone come across similar issues before and/or is able to offer any advice?
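The per-second message mix described above could be tallied from a capture with something like the following. This is only a minimal Python sketch: it assumes the trace has already been reduced to (timestamp, first line) pairs, and the function names and input shape are hypothetical, not taken from any FreeSWITCH tooling. It relies only on the standard SIP request-line and status-line layout.

```python
from collections import Counter, defaultdict


def classify_sip_first_line(first_line):
    """Return the method of a request or the status code of a response."""
    parts = first_line.strip().split()
    if not parts:
        return None
    if parts[0].startswith("SIP/"):
        # Status line, e.g. "SIP/2.0 202 Accepted"
        return parts[1] if len(parts) > 1 else None
    # Request line, e.g. "SUBSCRIBE sip:user@host SIP/2.0"
    return parts[0].upper()


def per_second_counts(messages):
    """Bucket messages by whole second.

    messages: iterable of (epoch_seconds, first_line) pairs (assumed input).
    Returns {second: Counter({method_or_status: count})}.
    """
    buckets = defaultdict(Counter)
    for timestamp, first_line in messages:
        kind = classify_sip_first_line(first_line)
        if kind is not None:
            buckets[int(timestamp)][kind] += 1
    return buckets


def peak_second(buckets):
    """Return (second, total_messages) for the busiest second, or None."""
    if not buckets:
        return None
    second = max(buckets, key=lambda s: sum(buckets[s].values()))
    return second, sum(buckets[second].values())
```

Comparing the busiest seconds just before each crash with quiet periods would show whether SUBSCRIBE/NOTIFY bursts or REGISTER bursts dominate at the time of the fault.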
We're currently looking at two possibilities: a software bottleneck within FreeSWITCH, or a hardware bottleneck, possibly as a result of using dual socket CPUs. We are in the process of enabling node interleaving on the hardware so that we use SUMA (Sufficiently Uniform Memory Access) instead of NUMA (Non-Uniform Memory Access), which should theoretically reduce traffic on the QPI bridge slightly, given that FreeSWITCH isn't optimised for NUMA.
Here are some examples of the segmentation faults we've experienced:
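Whether the node interleaving change has taken effect can be checked from the operating system. The following is a minimal Python sketch, assuming a Linux host that exposes its NUMA topology under /sys/devices/system/node; the function name numa_nodes is illustrative. With interleaving enabled the kernel would typically be expected to report a single node, but that depends on the BIOS and kernel in use.

```python
import glob
import os
import re


def numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Return {node_id: cpulist} for each NUMA node found under sysfs_root."""
    nodes = {}
    for path in glob.glob(os.path.join(sysfs_root, "node[0-9]*")):
        match = re.fullmatch(r"node(\d+)", os.path.basename(path))
        if not match:
            continue
        try:
            with open(os.path.join(path, "cpulist")) as handle:
                cpulist = handle.read().strip()
        except OSError:
            # Node directory exists but its CPU list is unreadable.
            cpulist = ""
        nodes[int(match.group(1))] = cpulist
    return nodes


if __name__ == "__main__":
    found = numa_nodes()
    if not found:
        print("No NUMA information exposed by the kernel")
    for node_id in sorted(found):
        print(f"node{node_id}: CPUs {found[node_id]}")
    if len(found) > 1:
        print("More than one node visible: the OS still sees a NUMA layout")
```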
Program terminated with signal SIGSEGV, Segmentation fault.
#0 strlen () at ../sysdeps/x86_64/strlen.S:106
106 ../sysdeps/x86_64/strlen.S: No such file or directory.
(gdb) bt
#0 strlen () at ../sysdeps/x86_64/strlen.S:106
#1 0x00007f92302f697e in __GI___strdup (s=0xecd42c18 <error: Cannot access memory at address 0xecd42c18>) at strdup.c:41
#2 0x00007f9231849ea4 in switch_ivr_generate_xml_cdr (session=0x7f91dd8146a8, xml_cdr=0x7f900b6da748) at src/switch_ivr.c:2803
#3 0x00007f91696fc7b0 in my_on_reporting (session=0x7f91dd8146a8) at mod_xml_cdr.c:228
#4 0x00007f92317801d5 in switch_core_session_reporting_state (session=0x7f91dd8146a8) at src/switch_core_state_machine.c:938
#5 0x00007f923177bf9c in switch_core_session_run (session=0x7f91dd8146a8) at src/switch_core_state_machine.c:609
#6 0x00007f9231775737 in switch_core_session_thread (thread=0x7f8ed51f5170, obj=0x7f91dd8146a8) at src/switch_core_session.c:1648
#7 0x00007f9231775b25 in switch_core_session_thread_pool_worker (thread=0x7f8ed51f5170, obj=0x7f8ed51f5000) at src/switch_core_session.c:1711
#8 0x00007f9231a9bea5 in dummy_worker (opaque=0x7f8ed51f5170) at threadproc/unix/thread.c:151
#9 0x00007f9230c85064 in start_thread (arg=0x7f900b6db700) at pthread_create.c:309
#10 0x00007f923035d62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Program terminated with signal SIGSEGV, Segmentation fault.
#0 __GI___pthread_mutex_lock (mutex=0x0) at ../nptl/pthread_mutex_lock.c:66
66 ../nptl/pthread_mutex_lock.c: No such file or directory.
(gdb) bt
#0 __GI___pthread_mutex_lock (mutex=0x0) at ../nptl/pthread_mutex_lock.c:66
#1 0x00007fd953bf9db8 in apr_thread_mutex_lock (mutex=0x19101fd8) at locks/unix/thread_mutex.c:92
#2 0x00007fd95389e663 in switch_mutex_lock (lock=0x19101fd8) at src/switch_apr.c:293
#3 0x00007fd9538a7f0c in switch_channel_test_flag (channel=0x7fd619101910, flag=CF_THREAD_SLEEPING) at src/switch_channel.c:1568
#4 0x00007fd9538ed448 in switch_core_session_read_lock (session=0x7fd627d19eb8) at src/switch_core_rwlock.c:91
#5 0x00007fd9538d78fc in switch_core_session_hupall_matching_var_ans (var_name=0x7fd8b70229b1 "cc_member_pre_answer_uuid", var_val=0x7fd891c86680 "e38c1d71-e760-44bf-af63-4b54dcbd01fc", cause=SWITCH_CAUSE_ORIGINATOR_CANCEL, type=(SHT_UNANSWERED | SHT_ANSWERED)) at src/switch_core_session.c:231
#6 0x00007fd8b701eacd in callcenter_function (session=0x7fd90508dbc8, data=0x7fd6149e5a68 "Adjustments at customerdomain.com") at mod_callcenter.c:3018
#7 0x00007fd9538df841 in switch_core_session_exec (session=0x7fd90508dbc8, application_interface=0x2836ff0, arg=0x7fd6149e5a68 "Adjustments at customerdomain.com") at src/switch_core_session.c:2802
#8 0x00007fd9538defe8 in switch_core_session_execute_application_get_flags (session=0x7fd90508dbc8, app=0x7fd6149e5a58 "callcenter", arg=0x7fd6149e5a68 "Adjustments at customerdomain.com", flags=0x0) at src/switch_core_session.c:2672
#9 0x00007fd9538e168b in switch_core_standard_on_execute (session=0x7fd90508dbc8) at src/switch_core_state_machine.c:353
#10 0x00007fd9538e31c6 in switch_core_session_run (session=0x7fd90508dbc8) at src/switch_core_state_machine.c:650
#11 0x00007fd9538db737 in switch_core_session_thread (thread=0x7fd9048e9370, obj=0x7fd90508dbc8) at src/switch_core_session.c:1648
#12 0x00007fd9538dbb25 in switch_core_session_thread_pool_worker (thread=0x7fd9048e9370, obj=0x7fd9048e9200) at src/switch_core_session.c:1711
#13 0x00007fd953c01ea5 in dummy_worker (opaque=0x7fd9048e9370) at threadproc/unix/thread.c:151
#14 0x00007fd952deb064 in start_thread (arg=0x7fd891c87700) at pthread_create.c:309
#15 0x00007fd9524c362d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
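Because the core dumps show different failure points, it may help to reduce each backtrace to a short signature and compare them across crashes. The sketch below is a hedged Python example, not an established tool: it parses the text of a single gdb `bt`, glues wrapped frames back together, and picks the first frame whose source path starts with "src/" or "mod_". That prefix rule is only an assumed heuristic for "first FreeSWITCH frame", and both function names are hypothetical.

```python
import re

# "#2 0x00007f... in func (args) at file.c:123"; the address is absent on frame #0.
FRAME_START = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)\s*\(")
LOCATION = re.compile(r"\sat\s(\S+):(\d+)")


def parse_frames(bt_text):
    """Parse the output of one gdb `bt` into a list of frame dicts."""
    joined = []
    for raw in bt_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            joined.append(line)
        elif joined:
            # gdb wraps long frames; append the continuation to its frame.
            joined[-1] += " " + line
    frames = []
    for line in joined:
        start = FRAME_START.match(line)
        if not start:
            continue
        # Take the last "at file:line" so " at " inside arguments is ignored.
        locations = LOCATION.findall(line)
        source, lineno = locations[-1] if locations else (None, None)
        frames.append({
            "index": int(start.group(1)),
            "function": start.group(2),
            "file": source,
            "line": int(lineno) if lineno else None,
        })
    return frames


def first_frame_under(frames, prefixes=("src/", "mod_")):
    """Return the first frame whose source file starts with one of prefixes."""
    for frame in frames:
        if frame["file"] and frame["file"].startswith(prefixes):
            return frame
    return None
```

Applied to the two traces above, this heuristic would point at switch_ivr_generate_xml_cdr for the first and switch_mutex_lock for the second, which at least makes it easy to count how often each entry point recurs across dumps.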
Thanks,
Shaun
Shaun Stokes - Infrastructure Analyst
T : 01453 700713
E : shaun.stokes at itec-support.co.uk
W : www.itec-support.co.uk
Registered Address :- ITEC Support, Suite 2 Prospect House, Bath Road, Stroud, Gloucestershire GL5 3QF
Company No. 06908001