A CPaaS (Communication Platform as a Service) is a cloud-based, typically B2B, communications platform that provides real-time communication capabilities. It should be easily integrable with any given external environment or application of the customer, without the customer worrying about building backend infrastructure or interfaces.
Traditionally, with IP-protected protocols, licensed codecs, the maintenance of a signalling protocol stack, and network interfaces, building a communication platform was a costly affair. Cisco, FaceTime and Skype were among the few OTT (over-the-top) players taking away from telcos' call revenue.
However, with the advent of standardized, open-source protocols and codecs, plenty of CPaaS providers have crowded the market, creating more supply than there is demand. A customer wanting to quickly integrate real-time communications into their platform has many options to choose from. This article provides insight into how CPaaS solutions are architected and programmed. A typical CPaaS offering includes:
A call server and media server that can be interacted with via a UA
Comm clients like SIP phones, WebRTC clients, and SDKs (software development kits) or libraries for desktop, embedded and/or mobile platforms
APIs that can trigger automated calls and perform pre-programmed routing
Rich documentation and samples to build various apps such as call-centre solutions, interactive auto-attendants using IVR and DTMF, conferencing solutions, etc.
Some CPaaS providers also add features like transcription, transcoding, recording and playback to gain an edge over other CPaaS providers
Datacentre vs Cloud server
Advantages of using a CPaaS vs building your own RTC platform
Tech insights and experiences
Companies that have been catering to the telco and communications domain build robust solutions based on industry best practices, which beat a novice solution built in a fortnight any day
Keeping up with emerging trends
Market trends like new codecs, rich communication services, multi-tenancy, contextual communication, NLP and other ML-based enhancements are tracked and provided by the CPaaS company
Auto Scaling, High Availability
A firm specializing in CPaaS solutions has already thought through clustering and auto-scaling to meet peak traffic requirements, and backup/replication on standby servers to activate in case of failure
CAPEX and OPEX
Using a CPaaS saves on human resources, infrastructure, and time to market. It saves tremendously on underlying IT infrastructure and often provides flexible pricing models
In a nutshell, I have come across many small startups trying to build a CPaaS solution from scratch, only to realise after weeks of building an MVP that they are stuck with firewall, NAT, media-quality or interoperability issues. Since there are so many solutions already on the market, it is best to use one as the underlying layer and build application services on top of it, such as call-centre or CRM services.
Developing SIP-based applications – basic call routing, media management
SIP platform development – NAT and DNS, cross-platform support and integration with the external telecommunication provider landscape, databases
The contents of this article are:
CICD on Jenkins pipeline
Configuration management using Chef cookbooks
Virtualization and containerization using Docker
Infrastructure management using Terraform / Kubernetes
Logs Analysis and Alarming
Packet Capture (PCAP) is an API for capturing live network packets. Besides tracking, audit and RTC visualizers, PCAP is widely used for debugging faults, for example during a production alarm on a high rate of failures.
Example use case: a production alert on a SIP 503 response or a log entry from a gateway is not as helpful as PCAP tracking of a call's session ID across the various endpoints in and out of the network to determine the point of failure. Debugging involves:
Pre-specified capture of SIP/RTP and related protocols
DTMF – both in-band and out-of-band DTMF for every call, along with the timestamp.
Codec negotiations – extracting codecs from the PCAP lets us
validate later whether there were codec changes without a prior SIP message,
determine, if the call was hung up with a 488 (Not Acceptable Here) error code, which codec caused the rejection.
SIP errors – track deviations from standard SIP messaging.
Identify known erroneous SIP messaging scenarios, such as MITM or replay attacks
RTCP media stats – extract jitter, loss and RTT from RTCP reports for both the incoming and outgoing streams.
Identify media or ACK timeouts
Check whether a party has not sent any media packets for > 60 s (the media-timeout threshold duration)
Detect when a call is hung up due to an ACK timeout.
Audio stream – After GDPR, take explicit permission from users before storing audio streams.
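The codec-extraction step above can be sketched as a small parser. This is a minimal sketch that assumes the SDP body has already been pulled out of the PCAP; the sample offer below is illustrative, not from a real capture:

```python
import re

def extract_codecs(sdp: str):
    """Map payload types advertised in the m=audio line to codec names
    taken from the corresponding a=rtpmap attributes."""
    rtpmap = dict(re.findall(r"a=rtpmap:(\d+) ([\w.-]+/\d+)", sdp))
    m = re.search(r"m=audio \d+ [\w/]+ ([\d ]+)", sdp)
    if not m:
        return []
    # static payload types (no rtpmap line) fall back to a placeholder
    return [rtpmap.get(pt, f"static-{pt}") for pt in m.group(1).split()]

sdp_offer = (
    "v=0\r\n"
    "m=audio 49170 RTP/AVP 0 8 96\r\n"
    "a=rtpmap:0 PCMU/8000\r\n"
    "a=rtpmap:8 PCMA/8000\r\n"
    "a=rtpmap:96 opus/48000\r\n"
)
print(extract_codecs(sdp_offer))  # ['PCMU/8000', 'PCMA/8000', 'opus/48000']
```

Comparing the codec lists of the offer and the answer this way makes it easy to spot a mid-call codec change that had no preceding SIP re-negotiation.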
Continuous Integration and Delivery Automation using Jenkins
Jenkins provides a continuous-delivery hub and can distribute work across multiple machines, helping drive builds, tests and deployments across multiple platforms.
Jenkins is a self-contained, Java-based program, extensible using plugins.
Jenkins pipeline – orchestrates and automates building a project in Jenkins
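A minimal declarative Jenkinsfile sketch for such a pipeline; the stage names and shell steps are illustrative assumptions, not from the original article:

```groovy
pipeline {
    agent any
    stages {
        stage('Build') {
            steps { sh 'make build' }    // compile the project
        }
        stage('Test') {
            steps { sh 'make test' }     // run unit tests
        }
        stage('Deploy') {
            steps { sh 'make deploy' }   // push to the target environment
        }
    }
}
```

Checked into the repository root, this file lets Jenkins discover and run the pipeline on every push when a multibranch pipeline job is configured.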
Configuration management using chef cookbooks
Alternatives include Puppet and Ansible, which are also cross-platform configuration management platforms
Compute virtualization and containerization using Docker
Docker containers can be used instead of virtual machines (such as VirtualBox) to isolate applications and remain OS- and platform-independent
This makes distributed development possible and automates deployment
unpause Unpause all processes within one or more containers
update Update configuration of one or more containers
wait Block until one or more containers stop, then print their exit codes
See all images
> docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
sipcapture/homer-cron latest fb2243f90cde 3 hours ago 476MB
sipcapture/homer-kamailio latest f159d46a22f3 3 hours ago 338MB
sipcapture/heplify latest 9f5280306809 21 hours ago 9.61MB
<none> <none> edaa5c708b3a
See all stats
> docker stats
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
f42c71741107 homer-cron 0.00% 52KiB / 994.6MiB 0.01% 2.3kB / 0B 602MB / 0B 0
0111765091ae mysql 0.04% 452.2MiB / 994.6MiB 45.46% 1.35kB / 0B 2.06GB / 49.2kB 22
Run a command from within a Docker instance
docker exec -it <container-id> bash
First list all containers
Select a container and enter its bash shell
docker exec -it 0472a5127fff bash
To edit or update a file inside Docker, either install vim every time you log in to a fresh Docker container, like
apt-get install vim
or add this to the Dockerfile
RUN ["apt-get", "update"]
RUN ["apt-get", "install", "-y", "vim"]
See if ngrep is installed; if not, install it and run ngrep to get SIP logs inside that Docker container
apt install ngrep
ngrep -p "14795778704" -W byline -d any port 5060
docker volume – volumes are used for persisting data generated by and used by Docker containers. Docker volumes have advantages over bind mounts: they are easier to back up or migrate, are managed by the Docker API, can be safely shared among multiple containers, etc.
docker stack – lets you manage a cluster of Docker containers through Docker Swarm; the stack can be defined via a docker-compose.yml file
create Create a new service
inspect Display detailed information on one or more services
logs Fetch the logs of a service or task
ls List services
ps List the tasks of one or more services
rm Remove one or more services
rollback Revert changes to a service’s configuration
scale Scale one or multiple replicated services
update Update a service
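A stack as described above is defined in a docker-compose.yml file. A minimal sketch follows; the service name, image and port are illustrative assumptions (the image tag reuses the one from the run example below):

```yaml
version: "3.7"
services:
  sip-proxy:
    image: imagename:2.2          # illustrative image
    ports:
      - "5060:5060/udp"           # SIP signalling port
    deploy:
      replicas: 2                 # Swarm keeps two instances running
      restart_policy:
        condition: on-failure
```

Deploy it with `docker stack deploy -c docker-compose.yml mystack`, then scale with `docker service scale mystack_sip-proxy=4`.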
Run docker containers
Sample run command:
docker run -it -d --name opensips -e ENV=dev imagename:2.2
-it attaches to an interactive tty in the container
-e sets environment variables
-d runs it in the background and prints the container id
Remove docker entities
To remove all stopped containers, all dangling images, and all unused networks:
docker system prune -a
To remove all unused volumes
docker system prune --volumes
To remove all stopped containers
docker container prune
Sometimes Docker images keep piling up alongside stopped containers, such as
REPOSITORY   TAG      IMAGE ID       CREATED          SIZE
<none>       <none>   d1dcfe2438ae   15 minutes ago   753MB
<none>       <none>   2d353828889b   16 hours ago     910MB
...
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0dd6698a7517 2d353828889b "/entrypoint.sh" 13 minutes ago Exited (137) 13 minutes ago hardcore_wozniak
To remove such images and their containers, first stop and remove the containers
docker stop $(docker ps -a -q)
docker rm $(docker ps -a -q)
Terraform is used for building, changing and versioning infrastructure. Infrastructure as code – it can manage anything from a single application to entire datacentres via configuration files that create an execution plan. It can manage low-level components such as compute instances, storage, and networking, as well as high-level components such as DNS entries, SaaS features, etc. Resource graph – Terraform builds a graph of all your resources.
tfenv can be used to manage terraform versions
> brew unlink terraform
tfenv install 0.11.14
Terraform configuration language
This is used for declaring resources and descriptions of infrastructure; the associated files have a .tf or .tf.json file extension. A group of resources can be gathered into a module. A Terraform configuration consists of a root module, where evaluation begins, along with a tree of child modules created when one module calls another.
Example: launch a single AWS EC2 instance, file server1.tf
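A minimal server1.tf sketch for that example, in 0.12-style syntax; the region, AMI ID and instance type are illustrative assumptions:

```hcl
provider "aws" {
  region = "us-east-1"            # illustrative region
}

resource "aws_instance" "server1" {
  ami           = "ami-0c55b159cbfafe1f0"   # illustrative AMI ID
  instance_type = "t2.micro"

  tags = {
    Name = "server1"
  }
}
```

Run `terraform init`, then `terraform plan` to review the execution plan, and `terraform apply` to create the instance.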
console Interactive console for Terraform interpolations
destroy Destroy Terraform-managed infrastructure
env Workspace management
fmt Rewrites config files to canonical format
get Download and install modules for the configuration
graph Create a visual graph of Terraform resources
import Import existing infrastructure into Terraform
init Initialize a Terraform working directory
output Read an output from a state file
plan Generate and show an execution plan
providers Prints a tree of the providers used in the configuration
refresh Update local state file against real resources
show Inspect Terraform state or plan
taint Manually mark a resource for recreation
untaint Manually unmark a resource as tainted
validate Validates the Terraform files
version Prints the Terraform version
workspace Workspace management
0.12upgrade Rewrites pre-0.12 module source code for v0.12
debug Debug output management (experimental)
force-unlock Manually unlock the terraform state
push Obsolete command for Terraform Enterprise legacy (v1)
state Advanced state management
terraform init Initialize a working directory containing Terraform configuration files.
terraform validate runs checks that verify whether a configuration is internally consistent, regardless of any provided variables or existing state.
Kubernetes is a container orchestration platform, automating deployment, scaling, and management of containerized applications. It can deploy to a cluster of computers, automating distribution and scheduling as well.
Service discovery and load balancing – gives Pods their own IP addresses and a single DNS name for a set of Pods, and can load-balance across them.
Automatic bin packing – Automatically places containers based on their resource requirements and other constraints, while not sacrificing availability. Mix critical and best-effort workloads in order to drive up utilization and save even more resources.
Storage orchestration – Automatically mount the storage system of your choice, whether from local storage, a public cloud provider such as GCP or AWS, or a network storage system such as NFS, iSCSI, Gluster, Ceph, Cinder, or Flocker.
Self-healing – Restarts containers that fail, replaces and reschedules containers when nodes die, kills containers that don’t respond to your user-defined health check, and doesn’t advertise them to clients until they are ready to serve.
Automated rollouts and rollbacks – progressively rolls out changes to your application or its configuration, while monitoring application health to ensure it doesn’t kill all your instances at the same time.
Secret and configuration management – Deploy and update secrets and application configuration without rebuilding your image and without exposing secrets in your stack configuration.
Batch execution – manage batch and CI workloads, replacing containers that fail, if desired.
Horizontal scaling – Scale application up and down with a simple command, with a UI, or automatically based on CPU usage.
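Several of the features above (horizontal scaling, rollouts, self-healing) hinge on the Deployment object. A minimal sketch follows; the name, image and probe port are illustrative assumptions:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sip-app
spec:
  replicas: 3                 # horizontal scaling
  selector:
    matchLabels:
      app: sip-app
  template:
    metadata:
      labels:
        app: sip-app
    spec:
      containers:
      - name: sip-app
        image: imagename:2.2  # illustrative image
        livenessProbe:        # self-healing: restart on failed health check
          tcpSocket:
            port: 5060
```

Apply it with `kubectl apply -f deployment.yaml`, then scale with `kubectl scale deployment sip-app --replicas=5`.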
Starting Kubernetes…minikube version: v1.3.0
minikube v1.3.0 on Ubuntu 18.04
Running on localhost (CPUs=2, Memory=2461MB, Disk=47990MB) …
OS release is Ubuntu 18.04.2 LTS
Preparing Kubernetes v1.15.0 on Docker 18.09.5 …
Pulling images …
Launching Kubernetes …
Done! kubectl is now configured to use "minikube"
dashboard was successfully enabled
start Starts a local kubernetes cluster
status Gets the status of a local kubernetes cluster
stop Stops a running local kubernetes cluster
delete Deletes a local kubernetes cluster
dashboard Access the kubernetes dashboard running within the minikube cluster
docker-env Sets up docker env variables; similar to ‘$(docker-machine env)’
cache Add or delete an image from the local cache.
Configuration and Management Commands:
addons Modify minikube’s kubernetes addons
config Modify minikube config
profile Profile gets or sets the current minikube profile
update-context Verify the IP address of the running cluster in kubeconfig.
Networking and Connectivity Commands:
service Gets the kubernetes URL(s) for the specified service in your local cluster
tunnel tunnel makes services of type LoadBalancer accessible on localhost
mount Mounts the specified directory into minikube
ssh Log into or run a command on a machine with SSH; similar to ‘docker-machine ssh’
kubectl Run kubectl
ssh-key Retrieve the ssh identity key path of the specified cluster
ip Retrieves the IP address of the running cluster
logs Gets the logs of the running instance, used for debugging minikube, not user code.
update-check Print current and latest version number
kubectl controls the Kubernetes cluster manager.
Basic Commands (Beginner):
create Create a resource from a file or from stdin.
expose Take a replication controller, service, deployment or pod and expose it as a new Kubernetes Service
run Run a particular image on the cluster
set Set specific features on objects
explain Documentation of resources
get Display one or many resources
edit Edit a resource on the server
delete Delete resources by filenames, stdin, resources and names, or by resources and label selector
rollout Manage the rollout of a resource
scale Set a new size for a Deployment, ReplicaSet, Replication Controller, or Job
autoscale Auto-scale a Deployment, ReplicaSet, or ReplicationController
Cluster Management Commands:
certificate Modify certificate resources.
cluster-info Display cluster info
top Display Resource (CPU/Memory/Storage) usage.
cordon Mark node as unschedulable
uncordon Mark node as schedulable
drain Drain node in preparation for maintenance
taint Update the taints on one or more nodes
Troubleshooting and Debugging Commands:
describe Show details of a specific resource or group of resources
logs Print the logs for a container in a pod
attach Attach to a running container
exec Execute a command in a container
port-forward Forward one or more local ports to a pod
proxy Run a proxy to the Kubernetes API server
cp Copy files and directories to and from containers.
auth Inspect authorization
diff Diff live version against would-be applied version
apply Apply a configuration to a resource by filename or stdin
patch Update field(s) of a resource using strategic merge patch
replace Replace a resource by filename or stdin
wait Experimental: Wait for a specific condition on one or many resources.
convert Convert config files between different API versions
kustomize Build a kustomization target from a directory or a remote url.
label Update the labels on a resource
annotate Update the annotations on a resource
completion Output shell completion code for the specified shell (bash or zsh)
api-resources Print the supported API resources on the server
api-versions Print the supported API versions on the server, in the form of “group/version”
config Modify kubeconfig files
plugin Provides utilities for interacting with plugins.
version Print the client and server version information
DevOps monitoring tools: Nagios
Alternatives: Sensu multi-cloud monitoring or Raygun
Monitoring, debugging, logs analysis and alarms
Aggregate logs into Logstash and provide search and filtering via Elasticsearch and Kibana. This can also trigger alerts or notifications on specific keyword searches in logs, such as WARNING, ERROR or call_failed.
Some common alert scenarios include:
SBC and proxy gateway failures – check the states of the VM instances
DNS caching alerts – Domain Name System (DNS) caching, Dynamic Host Configuration Protocol (DHCP) server, router advertisement and network boot alerts from a service such as dnsmasq
Disk usage alerts – set up alerts at 80% usage and trigger an alarm to either manually prune or create automatic, timely archive backups; check the percentage of disk usage regularly.
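Such a disk-usage check can be scripted in a few lines. A minimal sketch in Python, where the 80% threshold mirrors the alert above and the path and printed alert action are illustrative:

```python
import shutil

def disk_usage_alert(path: str = "/", threshold: float = 80.0) -> bool:
    """Return True when the filesystem holding `path` is over threshold% full."""
    usage = shutil.disk_usage(path)
    percent_used = usage.used / usage.total * 100
    if percent_used > threshold:
        print(f"ALERT: {path} is {percent_used:.1f}% full - prune or archive logs/pcaps")
        return True
    return False

disk_usage_alert("/")
```

Run from cron every few minutes, this is enough to catch a filling disk before log or PCAP archives start failing.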
Mostly it is either the log files or the PCAP recordings which need to be archived to external storage.
Use logrotate – it rotates, compresses, and mails system logs
Config file for logrotate – force a verbose rotation with logrotate -vf /etc/logrotate.conf
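A representative logrotate entry for the sip.log used below could look like this; the path, retention and postrotate action are illustrative assumptions:

```
/var/log/sip.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    postrotate
        systemctl reload rsyslog > /dev/null 2>&1 || true
    endscript
}
```

Dropped into /etc/logrotate.d/, this keeps two weeks of compressed SIP logs and reloads the logging daemon after each rotation.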
Elevated call failures (SIP 503) or call timeouts (SIP 408) – a high frequency of failed calls indicates an internal issue and must be followed up by smoke-testing the entire system to identify probable causes, such as undetected frequent crashes of an individual component or blacklisting by a destination endpoint
sudo tail -f sip.log | grep 503
sudo tail -f sip.log | grep WARNING
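The grep checks above can be folded into a small threshold alarm. This sketch counts SIP 503/408 responses in a window of log lines and fires when the failure ratio is too high; the log-line format and the 10% threshold are illustrative assumptions:

```python
import re

FAILURE_CODES = {"503", "408"}

def failure_ratio(log_lines):
    """Fraction of SIP responses in `log_lines` that are 503 or 408."""
    codes = [m.group(1) for line in log_lines
             if (m := re.search(r"SIP/2\.0 (\d{3})", line))]
    if not codes:
        return 0.0
    return sum(c in FAILURE_CODES for c in codes) / len(codes)

def should_alert(log_lines, threshold=0.10):
    """True when the failure ratio in the window exceeds the threshold."""
    return failure_ratio(log_lines) > threshold

sample = [
    "recv SIP/2.0 200 OK",
    "recv SIP/2.0 503 Service Unavailable",
    "recv SIP/2.0 200 OK",
    "recv SIP/2.0 408 Request Timeout",
]
print(should_alert(sample))  # True: 2 of 4 responses failed
```

Using a ratio rather than a raw count avoids paging on a single failed call at low traffic while still catching a gateway that starts rejecting everything.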
cron service or process alerts – inspect the process tree:
PID TTY STAT TIME COMMAND
2 ? S 0:00 [kthreadd]
3 ? I< 0:00 \_ [rcu_gp]
4 ? I< 0:00 \_ [rcu_par_gp]
5 ? I 0:00 \_ [kworker/0:0-eve]
6 ? I< 0:00 \_ [kworker/0:0H-kb]
7 ? I 0:00 \_ [kworker/0:1-eve]
8 ? I 0:00 \_ [kworker/u4:0-nv]
9 ? I< 0:00 \_ [mm_percpu_wq]
10 ? S 0:00 \_ [ksoftirqd/0]
11 ? I 0:00 \_ [rcu_sched]
12 ? S 0:00 \_ [migration/0]
13 ? S 0:00 \_ [cpuhp/0]
14 ? S 0:00 \_ [cpuhp/1]
15 ? S 0:00 \_ [migration/1]
16 ? S 0:00 \_ [ksoftirqd/1]
17 ? I 0:00 \_ [kworker/1:0-eve]
18 ? I< 0:00 \_ [kworker/1:0H-kb]
Or check the cron status
service cron status
● cron.service - Regular background program processing daemon
Loaded: loaded (/lib/systemd/system/cron.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2016-06-26 03:00:37 UTC; 1min 17s ago
Main PID: 845 (cron)
Tasks: 1 (limit: 4383)
└─845 /usr/sbin/cron -f
Jun 26 03:00:37 ip-172-31-45-21 systemd: Started Regular background program processing daemon.
Jun 26 03:00:37 ip-172-31-45-21 cron: (CRON) INFO (pidfile fd = 3)
Jun 26 03:00:37 ip-172-31-45-21 cron: (CRON) INFO (Running @reboot jobs)
Restart or start the cron service if required
DB connection / connection-pool checks – keep listening for alerts on DB connection failures, or even warnings, as these can be due to too many read operations (such as during a DDoS) and can escalate very quickly
cron zombie process checks – a zombie or defunct process is a process that has completed execution (via the exit system call) but still has an entry in the process table: it is a process in the "terminated" state. List zombie processes; since a zombie cannot be killed directly, signal or restart its parent process by PID so the entry gets reaped.
kill -9 <PID1>
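The zombie check can be automated by scanning /proc directly. A Linux-specific sketch: the field after the command name in /proc/&lt;pid&gt;/stat is the process state, Z for zombie:

```python
import os

def list_zombies():
    """Return (pid, ppid, comm) for every process in state 'Z' (zombie)."""
    if not os.path.isdir("/proc"):   # Linux-only: requires procfs
        return []
    zombies = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                stat = f.read()
        except OSError:              # process exited while we were scanning
            continue
        # the comm field may contain spaces, so split around its parentheses
        comm = stat[stat.index("(") + 1:stat.rindex(")")]
        rest = stat[stat.rindex(")") + 2:].split()
        state, ppid = rest[0], int(rest[1])
        if state == "Z":
            zombies.append((int(entry), ppid, comm))
    return zombies

# a zombie cannot be killed directly - alert on its parent instead
for pid, ppid, comm in list_zombies():
    print(f"zombie pid={pid} comm={comm}: signal parent {ppid} to reap it")
```

Scheduled from cron, this reports the parent PIDs that actually need attention, rather than the unkillable zombies themselves.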
Bulk call checks – consult the ongoing-call commands of the application server; for FreeSWITCH, use the fs_cli console (see the show status output below)
In case of a DDoS or other malicious attack, identify the attacker's IP and block it
iptables -I INPUT -s y.y.y.y -j DROP
fail2ban can also be used
apt-get update && apt-get install fail2ban
Additionally, check how many dispatchers are responding on the outbound gateway
opensipsctl dispatcher dump
Process control (supervisor or pm2) checks – supervisor is a Linux process control system that allows its users to monitor and control a number of processes
ps axf | grep supervisor
> pm2 status
[PM2] Spawning PM2 daemon with pm2_home=/Users/altanai/.pm2
[PM2] PM2 Successfully daemonized
│ id │ name │ namespace │ version │ mode │ pid │ uptime │ ↺ │ status │ cpu │ mem │ user │ watching │
Use htop to check memory and CPU
Health and load on the reverse proxy / load balancer such as Nginx – perform a direct curl request to the host to check whether Nginx responds with a non-4xx/5xx response
curl -v <public-fqdn-of-server>
In case of an error response, restart Nginx
In case of config updates, reload the Nginx configuration
nginx -s reload
For an HTTP/SSL proxy daemon such as tinyproxy, used for fast responses, set MinSpareServers, MaxSpareServers, MaxClients, MaxRequestsPerChild etc. appropriately
VPN checks – restart firewalls or IPsec in case of issues
Additionally, also check the ssh service
ps axf | grep sshd
Restart sshd if required
SSL cert expiry checks – to keep operations running securely and prevent abrupt termination, it is good practice to run regular certificate-expiry checks for SSL certs, especially on secure HTTP endpoints like APIs and web servers, and also on SIP application servers for TLS. If any expiry is due in < 10 days, trigger an alert to renew the certs.
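Such an expiry check can be scripted with the Python standard library alone. A sketch where the 10-day threshold mirrors the text; check_host (which needs network access) would be run per endpoint from cron:

```python
import socket
import ssl
from datetime import datetime, timezone

ALERT_DAYS = 10  # renewal-alert threshold from the text above

def days_to_expiry(not_after: str) -> float:
    """`not_after` is the OpenSSL-style date from a peer certificate,
    e.g. 'Jun 15 12:00:00 2031 GMT'."""
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - datetime.now(timezone.utc)).total_seconds() / 86400

def check_host(host: str, port: int = 443) -> float:
    """Fetch the cert over TLS and return days until it expires."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            remaining = days_to_expiry(tls.getpeercert()["notAfter"])
    if remaining < ALERT_DAYS:
        print(f"ALERT: cert for {host}:{port} expires in {remaining:.1f} days")
    return remaining

print(round(days_to_expiry("Jun 15 12:00:00 2031 GMT")))
```

The same days_to_expiry helper can be pointed at SIP-over-TLS endpoints by changing the port, since the certificate handshake is the same.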
Health of task-scheduling services such as RabbitMQ and the Celery distributed task queue – remote debugging of these can be set up via pdb, which supports setting (conditional) breakpoints, single stepping at the source-line level, inspection of stack frames, source-code listing, and evaluation of arbitrary Python code in the context of any stack frame.
fs_cli> show status
UP 0 years, 0 days, 0 hours, 58 minutes, 33 seconds, 15 milliseconds, 58 microseconds
FreeSWITCH (Version 1.6.20 git 987c9b9 2018-01-23 21:49:09Z 64bit) is ready
3 session(s) since startup
0 session(s) - peak 1, last 5min 1
0 session(s) per Sec out of max 30, peak 1, last 5min 1
1000 session(s) max
min idle cpu 0.00/80.83
Current Stack Size/Max 240K/8192K
Programming or syntax errors in the production environment – mostly arising due to incomplete QA/testing before pushing new changes to production. These should trigger alerts for dev teams and be met with hot patches.
Many application-development frameworks have inbuilt libraries for debugging, exception handling and reporting, such as