Analysis of SIP application server (AS) text logs can help detect and, in some specific situations, predict different types of issues within a VoIP network. SIP server text logs contain information that is difficult or even impossible to obtain from other sources, such as CDRs or signaling traffic captures.
The following parameters, among others, can help in estimating VoIP signaling network status:
Depending on the signaling load, a SIP AS can generate up to several tens of gigabytes of text logs per day, which makes their analysis a time- and resource-consuming task. Pandas data frames (DFs) can help with such analysis: Pandas provides powerful tools for working with large DFs, and processing speed on a large DF can be maximized, in particular, by vectorizing all operations applied to it. All of the code for this post is available on GitHub. See also https://datascienceplus.com/sip-text-log-analysis-using-pandas/
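As a minimal illustration of what vectorizing means here, the sketch below parses a column of timestamp strings and computes the gap to the previous message in a single pass each, without a Python-level loop over rows. The sample values and the timestamp format are assumptions made for the example and are not taken from a real SIP AS log.

```python
import pandas as pd

# Hypothetical sample data: the timestamp format is an assumption.
df = pd.DataFrame({
    "Timestamp": [
        "2019-01-01 10:00:00.100",
        "2019-01-01 10:00:00.350",
        "2019-01-01 10:00:01.000",
    ],
})

# Vectorized parsing: one call converts the whole column at once,
# instead of parsing each string inside df.apply(...) or a for loop.
df["Timestamp"] = pd.to_datetime(df["Timestamp"], format="%Y-%m-%d %H:%M:%S.%f")

# Vectorized difference to the previous row, expressed in milliseconds.
# The first row has no predecessor, so its value is NaN.
df["delta_ms"] = df["Timestamp"].diff().dt.total_seconds() * 1000

print(df)
```

On a log with millions of rows, the column-wise calls above are typically much faster than an equivalent row-by-row loop.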
SIP text log file processing steps:
In this particular case, the SIP DF contains the following columns:
Fig. 1. SIP DF example
With such a SIP DF, we can extract a good deal of useful information.
2. Request-response times (in ms) for transmitted INFO or INVITE requests
Fig. 3. The Resp_Req_Time plots show approximately the same distribution of request-response times for INFO and INVITE transactions within the same groups of SIP peers. Request-response times above 500 ms point to retransmissions, since 500 ms is the default value of the SIP T1 timer.
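One way to obtain such request-response times for transmitted requests is a groupby over the transaction key, sketched below. This is an assumption about the approach rather than the exact code behind the plots: it presumes that every transaction in the frame consists of a request followed by its response, and the sample rows are invented for illustration.

```python
import pandas as pd

T1_MS = 500  # default value of the SIP T1 timer, in milliseconds

# Hypothetical sample: each transaction has one sent request and one received response.
df = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2019-01-01 10:00:00.000",  # INVITE sent
        "2019-01-01 10:00:00.030",  # response received
        "2019-01-01 10:00:02.000",  # INFO sent
        "2019-01-01 10:00:02.620",  # response received late
    ]),
    "Call ID": ["a", "a", "b", "b"],
    "CSeq_num": [1, 1, 2, 2],
    "SIP method": ["INVITE", "INVITE", "INFO", "INFO"],
})

key = ["Call ID", "CSeq_num", "SIP method"]

# First and last timestamp of every transaction (request and response, by assumption).
ts = df.sort_values("Timestamp").groupby(key)["Timestamp"].agg(["first", "last"])

# Request-response time in ms, plus a flag for values above T1.
times = (
    ((ts["last"] - ts["first"]).dt.total_seconds() * 1000)
    .rename("Resp_Req_Time")
    .reset_index()
)
times["over_T1"] = times["Resp_Req_Time"] > T1_MS

print(times)
```

Transactions flagged in `over_T1` are the candidates to check for retransmissions.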
3. The number of retransmissions of INVITE or INFO requests.
A retransmission of a SIP request can be detected as a sequence of transmitted SIP requests with the same Call-ID, SIP method, and CSeq sequence number.
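The rule above can be sketched as a groupby count. The example assumes the frame has already been filtered down to transmitted requests. The column names 'Call ID', 'CSeq_num' and 'SIP method' follow the ones used in this post, and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample of transmitted requests only.
df = pd.DataFrame({
    "Call ID": ["a", "a", "a", "b", "c", "c"],
    "CSeq_num": [1, 1, 1, 1, 2, 2],
    "SIP method": ["INVITE", "INVITE", "INVITE", "INVITE", "INFO", "INFO"],
})

key = ["Call ID", "SIP method", "CSeq_num"]

# Keep only the methods of interest.
sent = df[df["SIP method"].isin(["INVITE", "INFO"])]

# Every extra copy of the same (Call ID, method, CSeq) is a retransmission,
# so the number of retransmissions is the group size minus one.
retransmits = (
    sent.groupby(key)
    .size()
    .sub(1)
    .rename("retransmits")
    .reset_index()
)

# Keep only the requests that were actually retransmitted.
retransmits = retransmits[retransmits["retransmits"] > 0]

print(retransmits)
```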
4. Request-response times (in ms) for received INFO-requests.
We cannot use the Pandas groupby operation in this case for the following reasons:
One possible solution is to split the DF into two separate data frames, df_req and df_resp. The ‘Timestamp’, ‘Call ID’, ‘CSeq_num’, and ‘SIP method’ columns are common to both DFs, while ‘TS_req’ and ‘TS_resp’ are specific to df_req and df_resp, respectively. The ‘Call ID’ and ‘CSeq_num’ columns are needed for further analysis of particular INFO-200 OK transactions.
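A minimal sketch of this split-and-join idea follows. It is a simplification under stated assumptions, not the exact code from the repository: the boolean 'Is_request' column is hypothetical, the 'SIP method' of a response is assumed to be taken from its CSeq header, and 'Timestamp' is renamed to 'TS_req' or 'TS_resp' instead of being kept alongside them.

```python
import pandas as pd

# Hypothetical sample of received INFO requests and the 200 OK responses sent back.
df = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2019-01-01 10:00:00.000",
        "2019-01-01 10:00:00.040",
        "2019-01-01 10:00:01.000",
        "2019-01-01 10:00:01.120",
    ]),
    "Call ID": ["a", "a", "b", "b"],
    "CSeq_num": [5, 5, 7, 7],
    "SIP method": ["INFO", "INFO", "INFO", "INFO"],
    "Is_request": [True, False, True, False],  # hypothetical helper column
})

key = ["Call ID", "CSeq_num", "SIP method"]

# Split into requests and responses, each with its own timestamp column.
df_req = df[df["Is_request"]].rename(columns={"Timestamp": "TS_req"})[key + ["TS_req"]]
df_resp = df[~df["Is_request"]].rename(columns={"Timestamp": "TS_resp"})[key + ["TS_resp"]]

# Keep the first request and first response per transaction so that
# repeated messages do not multiply rows in the join.
df_req = df_req.drop_duplicates(subset=key, keep="first")
df_resp = df_resp.drop_duplicates(subset=key, keep="first")

# Join each request with its response and compute the time in milliseconds.
merged = df_req.merge(df_resp, on=key, how="inner")
merged["Resp_Req_Time"] = (merged["TS_resp"] - merged["TS_req"]).dt.total_seconds() * 1000

print(merged)
```

Because 'Call ID' and 'CSeq_num' survive the join, any unusually slow transaction in `merged` can be traced back to its individual INFO and 200 OK messages.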
Fig. 4. Request-response time count plot for received INFO messages
Pandas DFs can serve as an additional tool for obtaining useful information from SIP logs. Some of the methods described in this post can also be applied to text logs of other protocols based on the request-response model.