WebRTC CPaaS ( Communication Platform as a Service )

A CPasS ( communication platform as a service ) is cloud based communication platform alo B2B cloud communications platform that provides real time communication capabilities. This should be easily integrable with any given external environment or application of the customer, without him worrying about building backend infrastructure or interfaces .

Traditionally , with IP protected protocols , licensed codecs maintaining a signalling protocol stack , network interfaces building communication platform was a costly affair. Cisco , Facetime , Skype were the only OTT ( over the top) players taking away from telco’s call revenue .

However with the advent of standardize , open source protocol and codecs plenty of CPaaS providers have crowded the market making more supply than there is demand. A customer wanting to quickly integrate real time communications on his platform has many options to choose from. This article provides an insight to how CPaaS solution are architectured and programmed

Sample CPass Architecture build on open source technologies

I have written an article before on Steps for building and deploying WebRTC solution , which includes standalone , cloud hosted and TURN based NAT handler systems .

A typical CPaaS solution provides

  • Call server + Media Server that can be interacted with via UA
  • Comm clients like sipphones , webrtc client , SDK ( software development kits ) or libraries for desktop , embedded and/or mobile platforms .
  • APIs that can trigger automated calls and perform preprogrammed routing.
  • Rich documentation and samples to build various apps such as call centre solutions , interactive auto-attendant using IVR , DTMF , conference solutions etc .
  • Some CPaaS providers also add features like transcribing ,transcoding , recording , playback etc to provide edge over other CPaaS providers

Datacentre vs Cloud server

-tbd

Advantages of using a CPaaS vs building your own RTC platform

Tech insights and experiences

companies who have been catering to telco and communication domain make robust solutions based on industry best practices which beats novice solution build in a fortnight anyday

keeping up with emerging trends

Market trends like new codecs , rich communication services , multi tenancy, contextual communication , NLP , other ML based enhancements are provided by CPaaS company

Auto Scaling , High Availability

A firm specializing in CPaaS solution has already thought of clustering and autoscaling to meet peak traffic requirements and backup/replication on standby servers to activate incase of failure

CAPEX and OPEX

using a Cpaas saves on human resources, infrastructure, and time to market. It saves tremendously on underlying IT infrastructure and many a times provides flexible pricing models

In a nutshell I have come across so many small size startups trying to build CPaaS solution from scratch but only realising it after weeks of trying to build a MVP that they are stuck with firewall, NAT, media quality or interoperability issues . Since there are so many solution already out in the market it is best to instead use them as underlying layer and build applications services using it such as callcentre or CRM services etc .

Hosted IP-PBX and its SBC

SBC ( Session Borde Controllers ) are basically gateways that provide interconnectivity between the hosted IP-PBX of the enterprise to the outside world endpoints such as telco service provider, PSTN/ TDM , SIP trunking providers or even third party OTT provider apps like skype for business etc.

If you have a hosted IPPBX or PBX in your data-centre or on premise and you need controlled but heavy outflowing traffic, it is a good idea to integrate a resilient and efficient SBC to provide seamless interconnectivity.

Hosted PBX

For an enterprises such as an Trading floor or warehouse with multiple phone types , softphones , hardphones , turrets etc distributed across various geographies and zones a device agnostic architectural setup is prime . Listing the essentials for setting up such a system. Note supplementary services are data-services , logging , licensing etc are important but kept out of scope to keep focus on functional aspects .

An enterprise application usually is structured in tiers or layers

  • Client tier – the networks clients communication to the central java programs . Runs on client machines
  • web tier – state full communication between client and business tier . Runs in server machine.
  • business tier- handles the logic of the application. The business tier uses the Enterprise Java Bean (EJB) container, which manages the execution of the beans
  • data tier – encompasses DB drivers . Runs on separate machines for database storage

Event services for Line status notifications

providers lines status notification across enterprise for inter zone and softphone to hardphone .

Routing services

routing calls within enterprise and hardphone sites read more about resource zones later in the article

Call Control Manager (CCM)

consolidated set of all service and component that make up the VOIP platform besides media handlers . It includes SIP adapters , bridge managers , call processing frameworks , API frameworks , healthchecks etc .

Call processing framework ( CPF)

signalling and call routing logic , mostly in SIP and trunks . Manages identities such as Call Line information , Called Party Information , line status etc in shared memory.

Multiple shared Lines and their statuses

Incases where there is a need to process multiple calls from a single User agent device such as a softphone or hardphone ( common scenario for a turret phone) , the design involves assigning it multiple sip uris and each sip uri will establish a line.

When caller calls callee , the line is said to be BUSY , otherwise said to be IDLE. Transition of a shared sip line from IDLE to BUSY is transmitted to others via SIP PUBLISH as other UAs holding the same sip

Similarly any other event like transfer is propagated to other via SIP UPDATE

Clustering Call control managers (CCM)

A Call Communication manager (CCM) from various zones should be able to cowork on call and session management and advanced features such as routing from home guest zone to home zone , call transfer , refer , barge etc. Designing a clustered setup will also provide elasticity , fail-over and high availability. Can use clustered , HA compliant framework such as Oracle Communication Application Server , suited for enterprise level deployments.

Call Replication and distributed memory management

A node will store two types of data: active sessions and passive sessions. The active sessions are used by the node and stored in cache. The passive sessions are the replicas from the other nodes’ active sessions. The passives sessions are stored on a persistent storage.

Controlling Line Calls using AOR and Resource Zones

When dealing with many SIP endpoints , now referred to as resource, it is best to assign the resources to their respective zones. Thus a resource’s status updates will be only updated by its active resource zone while can be read by any resource zone.

Incoming request Zone vs Active Resource Zone

For an Incoming request such a INVITE , check whether the zone sending the request is its active resource zone or not .If the Active Resource Zone is the same zone on which the INVITE came in, then the call is handled by that zone. If the Active Resource Zone is a different zone, then the call needs to be forwarded to the Active Resource Zone.

Bridges for Local Media connections

Although call signalling is handled by a resources active resource zone only, we can still create media bridges in local zone of the resource .

Local MM bridges are used to auto answer an incoming sip line call and create trunk , especially from hardphones which do not support provisional responses.

Interzone proxy Handler

proxies call control messages between active and non active resource zones. Primarily mapping the sip messages with all custom headers inbetween the communication device interfaces.

Dial Trunk using multiple dedicated sip lines and connect via Media Bridge

To save up on call routing /connection time and to support te ability to add as many users on call at runtime , a dedicated media bridge is established for every call.

  • A sip line activated is auto-answered by MM , creates a trunk and waits for other endpoint to join the bridge. The flow is as follows :
  • As INVITE arrives for an IDLE sip line , it is connected to a trunk and auto answered by a local MM bridge .
  • Since the call is already answered , when caller dials number for callee , collect the DTMF digits over RTP using RFC 2833 DTMF events.
  • Run inter-digit timer for digit collection and detect end of dialing on timeout.
  • The dialed trunk connection is made and call is added to media bridge
  • When provisional responses are received on the trunk connection, generate in-band call progress tones (ringing, proceeding etc) via the MM
  • When the line answers, the progress tones have to be stopped and the called party gets bridged to the calling party via the media bridge.

Call Diversion involves forwarding calls from zone to another zone. joinjed parties get call UPDATE status and forward response .

Call barge is the processing of joining an ongoing call . The barge event is usually propagated to joined parities via SIP INFO. Private lines do not allow barge in and are exclusively reserved for only few users.

Interconnectivity provided by an SBC ( Session Border Controller)

Hold-Resume and Music on Hold in multi-line evironment

While a regular p2p call involves simple reinvite based hold and resume with varrying SDP, the scenario is slightly more detailed for hold resume on bridged trunk connection , as explained below.

As the calls made are on bridge , a hold signal involves a RE-INIVITE with held-SDP to media manager (MM). If hold status on trunk is 200 OK the hold status will be sent to other call interfaces connected on the trunk. Else if hold is denied ,403 is sent back to hold-initiates.

Music on hold is an one way RTP mostly from media server.

For a bridged scenarios , separate Music on hold bridges are kept on Media Managers. When an UA has to hold , it is removed from original bridge and place on music on hold bridge . To be unhold/ resume it is placed back into the orignal bridge from music on hold bridge .

Conference

user initiates conference, the conference feature can execute on the zone where the user was logged on, irrespective of zones where the other conference attendees join from . The Call processing framework of originators zone completes the SDP exchange to establish two-way speech path among all the parties.

Incases there are multiple connections from a zone , a local MM conference bridge can be created for them which would connect back to originators MM conf bridge . this two part conf bridge will be transparent to the sip line sand users .

For provisioning inputs and settings setup a Diagnostics , Administration and Configuration platform which can process APIs for data services , licences , alarms or do remote device control such as using SNMP

Session Border Controllers (SBC)

At network level SBC operations include

  • bridging multiple interfaces in different networks even between the IPv4 and IPv6 networks
  • auto NAT discovery and STUN
  • protocol conversion such as TLS to UDP etc
  • Flood detection and IP filtering

For SIP specific functionalities , SBC does

  • SIP validation involving checks on syntax and message contents also consistency checks are performed.
  • stateful and call aware. tracing, monitoring and checking for validitya and health of all the SIP messages
  • Topology hiding
  • Traffic filtering
  • Codec filtering , reordering , media pinning, transcoding, or call recording
  • Data replication brings High Availability (HA) with hot backups or even Active-Active solutions.

Traffic sharing and routing roles of SBC can include

  • IP-based and Digest-based authentication
  • limiting traffic by number of concurrent calls or calling rate.
  • Dialplan and/or Custom routing
  • Dispatching/Load-balancing to a backend cluster of servers

SBC’s can be physical hardware boxes or software based applications, as the name suggests their purpose is to control the session at border between the enterprise and external service provider.

SIP to PSTN – SIP is an IP protocol whereas PSTN is a TDM one , achieving interoperability is also the KRA of an SBC

SIP trunking – SBC provide a secure sip connectivity to connect calls to sip trunks which provide bulk calls functionality at a flat pricing.

support for various fixed or mobile endpoints – SBC ensure they are RFC compliant and can extend SIP to any kind of telecom endpoint like PSTN , GSM, fax , Skype , sipphone , IP phones etc.

NAT / Network address translator – To meet the packet routing challenges across a firewall or even during private -public mapping. A combo of DHCP servers and NAT provider comes very handy to reroute or perform hole punching such that signalling and media packets are not dropped and meet the required endpoint. More about NAT here – NAT traversal using STUN and TURN.

Load balancing – Reverse proxies and Load balancers is a much adopted industry practise to mask the inner IPs of the VoIP platform and also route traffic appropriately between control and media server .

Security , QoS and Regulatory compliance – since SBCs are required to typically support a large array of clients they adhere to regulatory and industry accepted standards ,which also involves security features like AAA, TLS/SSL and other means for quality of assurance like logging and fault detection, preventing DDoS etc . In many cases SBC can also encrypt / decrypt RTP streams for probing , tapping or lawful inspection .

Terminating at carriers , PSTN and IP gateways

Additional SBC features

Inaddition to above it is good to have if an SBC provides extra features like forking , emergency number dialing ( 911 ) or active directory integration . Real Time Analysis and monitoring of call and metrics are also expected from a SBC since they reside on edge of the network and are more vulnerable to threats . For example Dialogic Mediant SBC’s and gateways , Audio Codes SBCs

With the shift from on premise PBXs to cloud based VM or microservice architecture , SBC vendors adopt a lager umbrella of services also including automation scripts for checks , reporting tools / consoles , developer friendly APIs to manage sessions via SBC and even WebRTC gateways to connect browser endpoints .

Usage Scenarios

Any VOIP dependant system which deals with bulksome voice / video traffic from external endpoints is a usages scenarios. Listing few

  • Contact Call centres
  • Remote work / offsite monitoring
  • CRM solution for sales/marketing
  • Connecting webrtc click to dial from webpage to enterprise representatives
  • connecting enterprise UCC clients to PSTN endpoints

There are many more.

VoIP system DevOps, operations and Infrastructure management, Automation

This article is focussed around various tools required to operate and mantain a growing large scale VoIP Platform .

Read about VoIP/ OTT / Telecom Solution startup’s strategy for Building a scalable flexible SIP platform which includes :

  • Scalable and Flexible SIP platform building
  • Cluster SIP telephony Server for High Availability
  • Failure Recovery
  • Multi-tier cluster architecture
  • Role Abstraction / Micro-Service based architecture
  • Distributed Event management and Event-Driven architecture
  • Containerization
  • Autoscaling Cloud Servers
  • Open standards and Data Privacy
  • Flexibility for inter-working – NextGen911 , IMS , PSTN
  • security and Operational Efficiencies

Read more about SIP VoIP system Architecture which includes

  • Infrastructure Requirements
  • Integral Components of a VOIP SIP-based architecture
  • RTP ( Real-Time Transport Protocol ) / RtCP
  • SIP gateways, registrar, proxy, redirect, application
  • Developing SIP-based applications – basic call routing, media management
  • SIP platform Development – NAt and DNS , Cross-platform and integration to External Telecommunication provider landscape , Databases

The contents of this article are

  • PCAP Collections
  • CICD on Jenkins pipeline
  • Configuration management using chef cookbooks
  • virtualization and containerization using Docker
  • Infrastructure management using terraform / Kubernetes
  • Logs Analysis and Alarming
Overview of VoIP platform DevOPS tools

PCAP Collection

Packet Capture (PCAP) is an API that captures live network packets. Besides tracking, audit and RTC visualizers, PCAP is widely used for debugging faults such as during production alarm on high failure occurrences.

Example usecase: Production alert on 503 SIP response or log entry from a gateway is not as helpful as PCAP tracking of the session ID of call across various endpoints in and out of the network to determine the point of failure.Debugging involves :

  1. Pre-specified SIP / RTP and related protocols capture 

Capture pcaps examples

tcpdump -i any -w alltraffic.pcap
rtpbreak -P2 -t100 -T100 -d logz -r alltraffic.pcap

2. Call SessionId to uniquely identify failed calls among tens of thousands of the packet 

3. Analyzer such as wireshark or tshark to track the packet

TShark inspection examples

brew cask install wireshark
tshark -r alltraffic.pcap -R "sip.CSeq.method eq INVITE"

Some of the useful call specs captured from PCAP

  • DTMF – Both in-band and out of band DTMF for every call, along with the time stamp.
  • Codec negotiations –  Extracting codecs from PCAP lets us 
    1. Validate later whether there were codec changes without prior SIP message,
    2. If the call has been hung up with 488 error code then it was due to which  codec 
  • SIP errors – track deviations from standard SIP messaging. 
    1. Identify known erroneous SIP messaging scenarios such as for MITM or replay attacks
  • RTCP Media stats – extract Jitter, Loss, RTT with RTCP reports for both the incoming and outgoing stream.
  • Identify Media or ACK Timeouts 
    1. Check whether a party has not sent any media packet for > 60 s (media time out threshold duration)
    2. When a call is hung up due to ACK time out.
  • Audio stream – After GDPR, take explicit permission from users before storing audio streams.
PCAP file analyzed in Wireshark ( PCAP source : https://wiki.wireshark.org/SampleCaptures#Sample_Captures)

Continuous Integration and Delivery Automation using Jenkins

CICD provides continous delivery hub , distribute work across multiple machines, helping drive builds, tests and deployments across multiple platforms .

Jenkins jobs is a self-contained Java-based program extensible using plugins.

Jenkins pieline– orchestrates and automates building project in Jenkins

Configuration management using chef cookbooks

Alternatives like puppet and Ansible, which are also a cross-platform configuration management platform

Compute virtualization and containerization using Docker

Docker containers can be used instead of virtual machines such as VirtualBox , to isolates applications and be OS and platform independent
Makes distributed development possible and automates the deployment possible

  • stop Stop one or more running containers
  • top Display the running processes of a container
> docker top 4417600169e8
UID PID PPID C STIME TTY TIME CMD
root 9913 9888 0 08:50 ? 00:00:00 bash /point.sh
root 10083 9913 0 08:50 ? 00:00:01 /usr/sbin/worker
root 10092 10083 0 08:50 ? 00:00:02 /micro-service
  • unpause Unpause all processes within one or more containers
  • update Update configuration of one or more containers
  • wait Block until one or more containers stop, then print their exit codes

see all iamges

> docker images
REPOSITORY                  TAG                 IMAGE ID            CREATED             SIZE
sipcapture/homer-cron       latest              fb2243f90cde        3 hours ago         476MB
sipcapture/homer-kamailio   latest              f159d46a22f3        3 hours ago         338MB
sipcapture/heplify          latest              9f5280306809        21 hours ago        9.61MB
<none>                      <none>              edaa5c708b3a        

See all stats

>  docker stats
CONTAINER ID        NAME                CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
f42c71741107        homer-cron          0.00%               52KiB / 994.6MiB      0.01%               2.3kB / 0B          602MB / 0B          0
0111765091ae        mysql               0.04%               452.2MiB / 994.6MiB   45.46%              1.35kB / 0B         2.06GB / 49.2kB     22

Run command from within a docker instnace

docker exec -it  bash

First see all processes

docker ps

select a process and enter its bash

docker exec -it 0472a5127fff bash

to edit or update a file inside docker either install vim everytime u login in resh docker conainer like

apt-get update
apt-get install vim

or add this to dockerfile

RUN [“apt-get”, “update”]
RUN [“apt-get”, “install”, “-y”, “vim”]

see if ngrep is install , if not then install and run ngrep to get sip logs isnode that docker container

apt update
apt install ngrep
ngrep -p "14795778704" -W byline -d any port 5060

docker volume – Volumes are used for persisting data generated by and used by Docker containers.
docker volumes have advantages over blind mounts such as easier to backup or migrate , managed by docker APIs, can be safely shared among multiple containers etc

docker stack – Lets to manager a cluster of docker containers thorugh docker swarm can be defined via docker-compose.yml file

docker service

  • create Create a new service
  • inspect Display detailed information on one or more services
  • logs Fetch the logs of a service or task
  • ls List services
  • ps List the tasks of one or more services
  • rm Remove one or more services
  • rollback Revert changes to a service’s configuration
  • scale Scale one or multiple replicated services
  • update Update a service

Run docker containers

sample run command

docker run -it -d --name opensips -e ENV=dev imagename:2.2

-it flags attaches to an interactive tty in the container.
-e gives envrionment variables
-d runs it in background and prints container id

Remove docker entities

To remove all stopped containers, all dangling images, and all unused networks:

docker system prune -a

To remove all unused volumes

docker system prune --volumes

To remove all stopped containers

docker container prune
sometimes docker images keep piling with stopped congainer such as 

REPOSITORY                                                             TAG                 IMAGE ID            CREATED             SIZE                                                                              d1dcfe2438ae        15 minutes ago      753MB                                                                           2d353828889b        16 hours ago        910MB                                                          ...
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS                        PORTS               NAMES

0dd6698a7517        2d353828889b        "/entrypoint.sh"         13 minutes ago      Exited (137) 13 minutes ago                       hardcore_wozniak

to remove such images and their conainer , first stop and remove confainers

docker stop $(docker ps -a -q)
docker rm $(docker ps -a -q)

then remove all dangling images

docker rmi  $(docker images -aq --filter dangling=true)

Infrastructure management using terraform

Terraform is used for building, changing and versioning infrastructure.
Infra as Code – can run single application to datacentres via configuration files which create execution plan.
It can manage low-level components such as compute instances, storage, and networking, as well as high-level components such as DNS entries, SaaS features, etc.
Resource Graph – builds a graph of all your resources

tfenv can be used to manage terraform versions

> brew unlink terraform
tfenv install 0.11.14
tfenv list 

Terraform configuration language

This is used for declaring resources and descriptions of infrastructure and associated files have a .tf or .tf.json file extension
Group of resources can be gathered into a module. Terraform configuration consists of a root module, where evaluation begins, along with a tree of child modules created when one module calls another.

Example : launch a single AWS EC2 instance , fle server1.tf

provider "aws" {
  profile    = "default"
  region     = "us-east-1"
}

resource "aws_instance" "server1" {
  ami           = "ami-2757fxxx"
  instance_type = "t2.micro"
}

note : AMI IDs are region specific.
profile attribute here refers to the AWS Config File in ~/.aws/credentials

Terraform command line interface (CLI)

engine for evaluating and applying Terraform configurations.
uses plugins called providers that each define and manage a set of resource types

Command Usage: terraform [-version] [-help] [args]

  • apply Builds or changes infrastructure
  • console Interactive console for Terraform interpolations
  • destroy Destroy Terraform-managed infrastructure
  • env Workspace management
  • fmt Rewrites config files to canonical format
  • get Download and install modules for the configuration
  • graph Create a visual graph of Terraform resources
  • import Import existing infrastructure into Terraform
  • init Initialize a Terraform working directory
  • output Read an output from a state file
  • plan Generate and show an execution plan
  • providers Prints a tree of the providers used in the configuration
  • refresh Update local state file against real resources
  • show Inspect Terraform state or plan
  • taint Manually mark a resource for recreation
  • untaint Manually unmark a resource as tainted
  • validate Validates the Terraform files
  • version Prints the Terraform version
  • workspace Workspace management
  • 0.12upgrade Rewrites pre-0.12 module source code for v0.12
  • debug Debug output management (experimental)
  • force-unlock Manually unlock the terraform state
  • push Obsolete command for Terraform Enterprise legacy (v1)
  • state Advanced state management

terraform init
Initialize a working directory containing Terraform configuration files.

terraform validate
checks that verify whether a configuration is internally-consistent, regardless of any provided variables or existing state.

Kubernetes

container orchestration platform , automating deployment, scaling, and management of containerized applications. Can deploy to cluster of computers, automating the distribution and scheduling as well

Service discovery and load balancing – gives Pods their own IP addresses and a single DNS name for a set of Pods, and can load-balance across them.

Automatic bin packing – Automatically places containers based on their resource requirements and other constraints, while not sacrificing availability. Mix critical and best-effort workloads in order to drive up utilization and save even more resources.

Storage orchestration – Automatically mount the storage system of your choice, whether from local storage, a public cloud provider such as GCP or AWS, or a network storage system such as NFS, iSCSI, Gluster, Ceph, Cinder, or Flocker.

Self-healing – Restarts containers that fail, replaces and reschedules containers when nodes die, kills containers that don’t respond to your user-defined health check, and doesn’t advertise them to clients until they are ready to serve.

Automated rollouts and rollbacks – progressively rolls out changes to your application or its configuration, while monitoring application health to ensure it doesn’t kill all your instances at the same time.

Secret and configuration management – Deploy and update secrets and application configuration without rebuilding your image and without exposing secrets in your stack configuration.

Batch execution– manage batch and CI workloads, replacing containers that fail, if desired.

Horizontal scaling – Scale application up and down with a simple command, with a UI, or automatically based on CPU usage.

create minikube cluster and deploy pods

prerequisities : docker , curl , redis , others

install minikube

curl -Lo minikube https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
chmod +x minikube
install minikube /usr/local/bin

Install kubectl

snap install kubectl --classic
ln -s /snap/bin/kubectl /usr/local/bin

Setup Minikube

minikube start --vm-driver=none
minikube addons enable registry-creds
kubectl -n kube-system create secret generic registry-creds-ecr
kubectl -n kube-system create secret generic registry-creds-gcr
kubectl -n kube-system create secret generic registry-creds-dpr
minikube addons configure registry-creds
Starting Kubernetes…minikube version: v1.3.0
 commit: 43969594266d77b555a207b0f3e9b3fa1dc92b1f
 minikube v1.3.0 on Ubuntu 18.04
 Running on localhost (CPUs=2, Memory=2461MB, Disk=47990MB) …
 OS release is Ubuntu 18.04.2 LTS
 Preparing Kubernetes v1.15.0 on Docker 18.09.5 …
 kubelet.resolv-conf=/run/systemd/resolve/resolv.conf
 Pulling images …
 Launching Kubernetes …
 Done! kubectl is now configured to use "minikube"
 dashboard was successfully enabled
 Kubernetes Started 

Basic Commands

  • start Starts a local kubernetes cluster
  • status Gets the status of a local kubernetes cluster
  • stop Stops a running local kubernetes cluster
  • delete Deletes a local kubernetes cluster
  • dashboard Access the kubernetes dashboard running within the minikube cluster

Images Commands:

  • docker-env Sets up docker env variables; similar to ‘$(docker-machine env)’
  • cache Add or delete an image from the local cache.

Configuration and Management Commands:

  • addons Modify minikube’s kubernetes addons
  • config Modify minikube config
  • profile Profile gets or sets the current minikube profile
  • update-context Verify the IP address of the running cluster in kubeconfig.

Networking and Connectivity Commands:

  • service Gets the kubernetes URL(s) for the specified service in your local cluster
  • tunnel tunnel makes services of type LoadBalancer accessible on localhost

Advanced Commands:

  • mount Mounts the specified directory into minikube
  • ssh Log into or run a command on a machine with SSH; similar to ‘docker-machine ssh’
  • kubectl Run kubectl

Troubleshooting Commands:

  • ssh-key Retrieve the ssh identity key path of the specified cluster
  • ip Retrieves the IP address of the running cluster
  • logs Gets the logs of the running instance, used for debugging minikube, not user code.
  • update-check Print current and latest version number

kubectl

controls the Kubernetes cluster manager.

Basic Commands (Beginner):

  • create Create a resource from a file or from stdin.
  • expose Take a replication controller, service, deployment or pod and expose it as a new Kubernetes Service
  • run Run a particular image on the cluster
  • set Set specific features on objects
  • explain Documentation of resources
  • get Display one or many resources
  • edit Edit a resource on the server
  • delete Delete resources by filenames, stdin, resources and names, or by resources and label selector

Deploy Commands:

  • rollout Manage the rollout of a resource
  • scale Set a new size for a Deployment, ReplicaSet, Replication Controller, or Job
  • autoscale Auto-scale a Deployment, ReplicaSet, or ReplicationController

Cluster Management Commands:

  • certificate Modify certificate resources.
  • cluster-info Display cluster info
  • top Display Resource (CPU/Memory/Storage) usage.
  • cordon Mark node as unschedulable
  • uncordon Mark node as schedulable
  • drain Drain node in preparation for maintenance
  • taint Update the taints on one or more nodes

Troubleshooting and Debugging Commands:

  • describe Show details of a specific resource or group of resources
  • logs Print the logs for a container in a pod
  • attach Attach to a running container
  • exec Execute a command in a container
  • port-forward Forward one or more local ports to a pod
  • proxy Run a proxy to the Kubernetes API server
  • cp Copy files and directories to and from containers.
  • auth Inspect authorization

Advanced Commands:

  • diff Diff live version against would-be applied version
  • apply Apply a configuration to a resource by filename or stdin
  • patch Update field(s) of a resource using strategic merge patch
  • replace Replace a resource by filename or stdin
  • wait Experimental: Wait for a specific condition on one or many resources.
  • convert Convert config files between different API versions
  • kustomize Build a kustomization target from a directory or a remote url.

Settings Commands:

  • label Update the labels on a resource
  • annotate Update the annotations on a resource
  • completion Output shell completion code for the specified shell (bash or zsh)

Other Commands:

  • api-resources Print the supported API resources on the server
  • api-versions Print the supported API versions on the server, in the form of “group/version”
  • config Modify kubeconfig files
  • plugin Provides utilities for interacting with plugins.
  • version Print the client and server version information

DevOps monitoring tools nagios

Manage Docker configs

  • create Create a config from a file or STDIN
  • inspect Display detailed information on one or more configs
  • ls List configs
  • rm Remove one or more configs

Manage containers

  • attach Attach local standard input, output, and error streams to a running container
  • commit Create a new image from a container’s changes
  • cp Copy files/folders between a container and the local filesystem
  • create Create a new container
  • diff Inspect changes to files or directories on a container’s filesystem
  • exec Run a command in a running container
  • export Export a container’s filesystem as a tar archive
  • inspect Display detailed information on one or more containers
  • kill Kill one or more running containers
  • logs Fetch the logs of a container
  • ls List containers
  • pause Pause all processes within one or more containers
  • port List port mappings or a specific mapping for the container
  • prune Remove all stopped containers
  • rename Rename a container
  • restart Restart one or more containers
  • rm Remove one or more containers
  • run Run a command in a new container
  • start Start one or more stopped containers
  • stats Display a live stream of container(s) resource usage statistics
  • stop Stop one or more running containers
  • top Display the running processes of a container
  • unpause Unpause all processes within one or more containers
  • update Update configuration of one or more containers
  • wait Block until one or more containers stop, then print their exit codes

Alternatives, Senu multi-cloud monitoring or Raygun

Monitoring, debugging, logs analysis and alarms

Aggregate logs into logstash and provide search and filtering via Elastic Search and Kibana. Can also trigger alerts or notifications on specific keyword searches in logs such as WARNING or ERRRO or call_failed.

Some common alert scenarios include :

SBC and proxy gateways failures – check states of VM instance

DNS caching alerts – Domain Name System (DNS) caching, a Dynamic Host Configuration Protocol (DHCP) server, router advertisement and network boot alerts from service such as dnsmasq

Disk usage alert – setup alerts for 80% usage and trigger an alarm to either manually prune or create automatic timely archive backups.
check the percentage of DISK USAGE

df -h

Mostly it is either the logs file or pcap recorder which need to be archieved in external storage.

Use logrotate – it can rotates, compresses, and mails system logs

config file for logrorate – logrotate -vf /etc/logrotate.conf

/var/log/messages {
    rotate 5
    weekly
    postrotate
        /usr/bin/killall -HUP syslogd
    endscript
}

Elevated Call failure SIP 503 or Call timeout SIP 408 – high frequency of failed calls indicate an internal issue and must be followed up by smoke testing the entire system to identify any probable issue such as undetected frequent crashes of any individual component or any blacklisting by a destination endpoint etc

sudo tail -f sip.log | grep 503

or

sudo tail -f sip.log | grep WARNING

cron service or processed alerts

 ps axf
  PID TTY      STAT   TIME COMMAND
    2 ?        S      0:00 [kthreadd]
    3 ?        I<     0:00  \_ [rcu_gp]
    4 ?        I<     0:00  \_ [rcu_par_gp]
    5 ?        I      0:00  \_ [kworker/0:0-eve]
    6 ?        I<     0:00  \_ [kworker/0:0H-kb]
    7 ?        I      0:00  \_ [kworker/0:1-eve]
    8 ?        I      0:00  \_ [kworker/u4:0-nv]
    9 ?        I<     0:00  \_ [mm_percpu_wq]
   10 ?        S      0:00  \_ [ksoftirqd/0]
   11 ?        I      0:00  \_ [rcu_sched]
   12 ?        S      0:00  \_ [migration/0]
   13 ?        S      0:00  \_ [cpuhp/0]
   14 ?        S      0:00  \_ [cpuhp/1]
   15 ?        S      0:00  \_ [migration/1]
   16 ?        S      0:00  \_ [ksoftirqd/1]
   17 ?        I      0:00  \_ [kworker/1:0-eve]
   18 ?        I<     0:00  \_ [kworker/1:0H-kb]

or checks cron status

service cron status
● cron.service - Regular background program processing daemon
   Loaded: loaded (/lib/systemd/system/cron.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2016-06-26 03:00:37 UTC; 1min 17s ago
     Docs: man:cron(8)
 Main PID: 845 (cron)
    Tasks: 1 (limit: 4383)
   CGroup: /system.slice/cron.service
           └─845 /usr/sbin/cron -f

Jun 26 03:00:37 ip-172-31-45-21 systemd[1]: Started Regular background program processing daemon.
Jun 26 03:00:37 ip-172-31-45-21 cron[845]: (CRON) INFO (pidfile fd = 3)
Jun 26 03:00:37 ip-172-31-45-21 cron[845]: (CRON) INFO (Running @reboot jobs)

restart or start cron service if required

DB connections / connection pool process – keep listening for any alerts on DB connections failure or even warnings as this can be due to too many read operations such as in DDOS and can escalate very quickly

netstat -nltp  | grep db 
tcp        0      0 0.0.0.0:5433            0.0.0.0:*               LISTEN      5792/db-server * 

Routine deepstatus checks is a good practice too

Port checks – Regular checks if servers are lsietning on ports such as 5060 for SIP

netstat -nltp | grep 5060
tcp        0      0 x.x.x.x:5060       0.0.0.0:*               LISTEN      8970/kamailio  

cron zombie process checks – zombie process or defunct process is a process that has completed execution (via the exit system call) but still has an entry in the process table: it is a process in the “Terminated state”. List xombie process and kill them with pid to free up .

kill -9 <PID1>

bulk calls checks – consult ongoing call cmd commands for application server such as
For Freeswitch use

fs_ctl> show channels 

For kamailio use kamcmd

kamcmd dlg.list

For asterisk watch or show cmmand

watch -n 1 "sudo asterisk -vvvvvrx 'core show channels' | grep call"

Incase of DDOS or other macious attacker IP identification block the IP

iptables -I INPUT -s y.y.y.y -j DROP   

Can also use fail2ban

>apt-get update && apt-get install fail2ban

Additionally check how many dispatchers are responding on outbound gateway

opensipsctl dispatcher dump

Process control supervisor or pm2 checks – supervisor is a Linux Process Control System that allows its users to monitor and control a number of processes

ps axf | grep supervisor

for pm2

> pm2 status
[PM2] Spawning PM2 daemon with pm2_home=/Users/altanai/.pm2
[PM2] PM2 Successfully daemonized
┌─────┬───────────┬─────────────┬─────────┬─────────┬──────────┬────────┬──────┬───────────┬──────────┬──────────┬──────────┬──────────┐
│ id │ name │ namespace │ version │ mode │ pid │ uptime │ ↺ │ status │ cpu │ mem │ user │ watching │

htop to check memeory and CPU

Health and load on the reverse proxy, load balancer as Nginx – perform a direct curl request to host to check if Nginx responds with a non 4xx / 5xx response or not

curl -v <public-fqdn-of-server> 

Incase of error response , restart

/etc/init.d/nginx start

Incase of updates restart ngnix config

nginx -s reload

For HTTP/SSL proxy daemon such as tiny proxy which are used for fast resposne , set the MinSpareServers, MaxSpareServers , MaxClients , MaxRequestsPerChild etc appropriately

VPN checks – restart fireealls or IPsec incase of ssues

/etc/init.d/ipsec restart

Additionally also check ssh service

ps axf | grep sshd

restart sshd if required

SSL cert expiry checks – to keep the operations running securely and prevent and abrupt termination it is a good practise to run regular certificate expiry checks for SSL certs especially on secure HTTP endpoint like APIs , web server and also on SIP applications servers for TLS. If any expiry is due in < 10 days to trigger an alert to renew the certs

Health of Task scheduling services such as RabbitMQ, Celery Distributed Task Queue – remote debugging of these can be set up via pdb which supports setting (conditional) breakpoints and single stepping at the source line level, inspection of stack frames, source code listing, and evaluation of arbitrary Python code in the context of any stack frame.

import pdb; pdb.set_trace()
python3 -m pdb myscript.py

It can also be set up via using the client libraries provided by these Queue services themselves

cluster status – setup an efficient health check service which monitors the cluster status for High Availability. Learn more about ensuring HA –
JSON object depicting the status of cluster shards

{
  "cluster_name" : "ABC-cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 14,
  "number_of_data_nodes" : 6,
  "active_primary_shards" : 200,
  "active_shards" : 300,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0
}

Status of Crticial Server Server

fscli > show status
UP 0 years, 0 days, 0 hours, 58 minutes, 33 seconds, 15 milliseconds, 58 microseconds
FreeSWITCH (Version 1.6.20 git 987c9b9 2018-01-23 21:49:09Z 64bit) is ready
3 session(s) since startup
0 session(s) - peak 1, last 5min 1
0 session(s) per Sec out of max 30, peak 1, last 5min 1
1000 session(s) max
min idle cpu 0.00/80.83
Current Stack Size/Max 240K/8192K

Programming or Syntax error in the production environment – mostly arising due to incomplete QA/testing before pushing new changes to production. Should trigger alerts for dev teams and meet with hot patches.

Many programing application development frameworks have inbuild libs for debugging , exceotion handling and reporting such as

  • backend service in Django
  • API service in Go

Distributed memory caching – redis , memcahe

Redis info

>redis-cli info
# Server
redis_version:6.0.4
redis_git_dirty:0
redis_mode:standalone
os:Darwin 18.7.0 x86_64
arch_bits:64
multiplexing_api:kqueue
atomicvar_api:atomic-builtin
gcc_version:4.2.1
tcp_port:6379

# Clients
connected_clients:1
client_recent_max_input_buffer:0
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0

# Memory
used_memory:1065648
used_memory_human:1.02M
number_of_cached_scripts:0
maxmemory:0
allocator_frag_bytes:1123680
allocator_rss_ratio:1.00
rss_overhead_bytes:37888
mem_fragmentation_ratio:2.16
active_defrag_running:0
lazyfree_pending_objects:0

# Persistence
loading:0
rdb_changes_since_last_save:0
module_fork_last_cow_size:0

# Stats
total_connections_received:1
total_commands_processed:0
..

# Replication
role:master
connected_slaves:0
..

# CPU
used_cpu_sys:0.011198
used_cpu_sys_children:0.000000

# Modules

# Cluster
cluster_enabled:0

SMS service using smsc on Kannel From the kannel servers, you should see the PANIC error (most of the time Assertion error crashing kannel):

grep PANIC /var/log/kannel/bearerbox.log

IF you are going to restart , Flush redis cache

sudo redis-cli FLUSHALL
sudo redis-cli SAVE

restart kannel

sudo /etc/init.d/kannel restart

If the carriers are throttling the SMS request , verify “ERROR” responses using

sudo grep -i "throttling" bearerbox.log

Alternatives include AWS logs services ,

  • Scalyr logging
  • Sensu monitoring for multi-cloud monitoring using event pipeline

References :


OTT ( Over the Top ) Communication applications

Market trends are really not in favor of Telecom Service /providers with increasing use of OTT ( Over The Top ) application like watsapp , Facebook messenger , Google hangouts , skype  , viber , etc .

OTT
OTT ( Over The Top ) Applications

What is an OTT ?

An Over The Top ( OTT ) application is one which provides communication services over Internet . Therefore these bypass the communication billing system setup by a Telecom Operator , resulting in no gain or loss of revenue to Telecom Operator who is providing the Internet service to user in first place .

Hence we see that OTT are major threat and concern for Telecom Operators whose traditional and obviously expensive ( when compared to OTTs free service ) billing models are facing disruption .


Telecom Regulatory bodies around the world

The telecom regulatory authorities in some of the countries are for example listed as :

  • Afghanistan Telecom Regulatory Authority (ATRA) – Afganistan
  • Australian Communications and Media Authority (ACMA) – Australia
  • Bangladesh Telecommunication Regulatory Commission (BTRC) – Bnagaladesh
  • Canadian Radio-television and Telecommunications Commission (CRTC) – Canada
  • Ministry of Information Industry (MII) – China
  • Autorité de Régulation des Communications Électroniques et des Postes (ARCEP) – France
  • Bundesnetzagentur (BNA) – Germany
  • Telecom Regulatory Authority of India (TRAI) – India
  • Ministry for Communications and Informatization of the Russian Federation (Minsvyaz) – Russia
  • Infocomm Development Authority of Singapore (IDA) – Singapore
  • Independent Communications Authority of South Africa (ICASA) – south Africa
  • Federal Communications Commission (FCC) , National Association of Regulatory Utility Commissioners (regulators of individual states) (NARUC) , CTIA – The Wireless Association (CTIA) – USA

Such telecom regulatory bodies get to decide whether to enforce differential price to end consumers for using OTT so that telecom service providers can benefit or keep the Internet fair and open by passing Net Neutrality Laws and Bills and amendments .

what is Net Neaurality ?

The fundamental principle of Net Neurality is that Telecom Operators should not block , slow down or charge consumers extra for using other services as their means of communication. This states that it is wrong to charge users above the regular data rates for using VOIP apps and other internet based communication services.

The following counteries have adopted principles of Net Neutrality by passing bills or making law .

  • Chile – Chile’s General Law of Telecommunications, “No [ISP] can block, interfere with, discriminate, hinder, nor restrict the right of any Internet user of using, send, receive, or offer any content, application, or legitimate service through the Internet, as well as any activity or legitimate use conducted through the Internet.”
  • Brazil – ” Internet Bill of Rights ” makes equal access to internet mandatory in Brazil .
  • Netherlands – Even European Union has adopted Netherlands’ Net Neutrality amendment which reads “traffic should be treated equally, without discrimination, restriction or interference, independent of the sender, receiver, type, content, device, service or application.”
  • USA – Citizens make ‘We the People’ platform to ‘Restore Net Neutrality By Directing the Federal Communications Commission (FCC) to Classify Internet Providers as ‘Common Carriers‘. Therefore not allowing them to either throttle speed by paid prioritization , discriminate in pricing or block any broadband access to legal content .  Above facts are from this tech.firstpost.com article.

Inspite of the fact that I Support Net Neutrality with all my heart , as a telecom engineer I understand the cost investment made by Telecom operators in providing am efficient communication network to its subscribers ( Access , Network and Application layers ). Therefor I do have my sympathies with the Telcos and to level out the wide ranging conflict between Telcos and  ISP ( Internet Service Providers ) , I pen down the following points which reflect the Telecom Operators Problems and also highlight the solutions that can be adopted to counteract the OTT threat .

Depleting revenue for Telco

  1. Messaging – OTT messaging cost operators $13.9 billion, or 9% of message revenue in 2013
  2. Voice – Voice services under threat from VOIP services like Skype, Viber
  3. OTT apps – Voice & Message apps have been the operator’s biggest headache. Its time Operator should launch its own OTT Services
  4. Data Traffic – The utilization is yet to reach its peak. Will face challenges from  WiFi access
  5. Critical Pain areas – Erosion of Operator’s revenue from voice and (especially) messaging

Telco’s OTT aPPLICATION

At this stage it is crucial for a telecom Service provider / Operator to enter the Apps market and bring forth a Messenger which is more powerful , interactive and awesome than a OTT application.  Fortunately the Operator can always couple this application with his background telecom infrastructure to provide the edge in performance and functionalists .

Road block while developing a OTT application for a Telecom Service Provider :

  • Investment in Data Network is not being utilized due to lack of service
  • Reuse of Existing business Logic and extending the service reach across devices and networks is tough
  • Operator already has full fledged network Infrastructure in Place
  • Desire for minimum CAPEX while investing in new technologies
  • compete with OTT players and open new revenue streams is a challenge

Next we find the way of solving the problems and integrating them together to form a Solution .

OTT Application for Telecom Service provider

  • Introduce new services to benefit from investment on Data Plans and Bandwidth
  • Expose REST API to enable 3trd party Integration with existing network Infrastructure
  • Partner with individual OTT players to make new services  that do not compete on core competencies like billing etc
  • Use protocols like SIP that reduce CAPEX and have goto market more quickly
  • Go for enriched service that lead to better user experience

This writeup outlines the process of creating a OTT application for a Telecom Service Provider . Components for the application include cloud Address Book , Video Chatting , Location share , Contact synchronization ,REST based thin  client , OS and device agnostic etc shown in the figure below

telco's OTT app
telco’s OTT app

The Application  is designed to close knit with Operator’s own infrastructure hence the crucial entities like Network Address Book , Location Service are synced and fetched from Backend Network .

OTT application Feature Overview

Smart Address Book

  • Automatic: Get contacts from Gmail, Facebook
  • Fast search by first, last name, frequently
  •   dialed number
  • Roadmap: View calendar events
  • Personal: Get image from Gmail and display in   contacts list

Geo Location

  • Share own location during chatting
  • Get map for calculating the distance between two chat users
  • Roadmap : Trigger device (say Switch on/off AC before reaching home) from a threshold distance away from home   location

Messaging

  • Ad-hoc Chat
  • Session Based Chat
  • Voice Input for texting
  • Presence information of contacts
  • RoadMap: Legacy message integration

Telephony

  • Voice call to mobile
  • Voice call to PSTN
  • Video call to other @imAll user
  • Share images during voice call to other

Device agnostic

  • Compatible with IOS, windows
  • Can run as native app on ipad
  • Can run as browser client on windows
  • RoadMap: native app for android, windows phone,blackberry10

Roadmap

  • To upgrade the application and provide enganced and enrich service support the I propose the following roadmap.
  • From plain vanilla voice and video calling ( supported by every other OTT application ) our application should progress towards  legacy telecom support whihc included PSTN , GSM , ISDN etc . This requires backbone of telecom network and a good setup for media codec conversion to suit various legacy media codecs .

Road Map  from Traditional to New age services 

  1. Voice and video calling
  2. Legacy services support like MMS and SMS
  3. Integration with 3rd party Vendors
  4. Give new enriched services like Multilingual support , file transfer , screen-sharing etc
  5. give facility to integrated web plugins for web calling

To keep the interest of customers it is essential that the application be supported on other popular OTT services like skype  , Gtalk . for exmaple a caller should be able to make call from Skype  / Gtalk to our application .Multilingual capabilities, support for larger protocol spectrum will just act like icing on the cake .

How does it benefit the Operator??

  1.  Saves on development cost and time
  2.  Device Agnostic OTT Applications
  3. Simplified Service deployment
  4. Saves licensing cost per client
  5. Reuses existing Messaging and   Address Book service logic.
  6. Open New Revenue Streams for operator
  7. No separate SIP stack required for the client
  8.  Faster Time to Market

Update : At the time of writing this post I did not anticipate the wave of change that bring focus on subjects like “net neutrality” , ” Save the internet” and “free internet” . However I see now that I had described this phenomenon way in advance for my time .


Business Challenges for a telecom service provider

With the fast pace of telecom evolution both towards the access network front ( ie GSM , UMTS , 3G , 4G , LTE , VOLTE ) to core network side ( ie application servers , registrar , proxies , gateway , media server etc ) a CSP ( content service provider ) is trying hard to keep up with the user expectation . The user expects a plethora of services , reduced cost and high speed bandwidth . If this was not enough a CSP also has competition  OTT (   Over The Top ) Players who provide communication and messaging for FREE .

You can read on how OTT’s players are disruption the revenue streams of traditional telecom operators and how can Telco’s develop  their own OTT app , integrated with their backend system to answer to that challenge  here – OTT ( Over the Top ) Communication applications

The following points outline the major business challenges faced by telecom operators today .

Technology Evolution Challenges

  •  The increased data speeds and further more increasing hunger for the data overwhelms the existing network infrastructures.
  • Ensure uniform service experience across the network technologies to check the customer churn.
  • Access / Radio Technology independent delivery of services.
  • Enhance Reuse for exiting investments.

Multiple Service Platform Challenges

  • Typical network constitutes of Multiple Service Platforms increasing network complexity and integration challenges many fold.
  • Heterogeneous multiple SDP Solutions typically deployed to cater to Multiple Types of Networks/ Standards/Variants
  • Service Islands makes introduction of seamless services a challenging task for the CSP

Transport Upgrade and Convergence of Wireless Wireline

  • Retain investments in copper wire systems while migrating towards next generation Fiber Optic systems.
  • Severe competition among wire-line and wireless operators to provide latest services to retain subscriber base.
  • Fixed Mobile Convergence leading to a diminishing gap among the revenue shares of various operators in the space, and leading to losses for wire-line only players.