JDFSL V9N2
This work is licensed under a Creative Commons Attribution 4.0 International License.
ABSTRACT
This paper presents a fast multi-stage method for on-line detection of RTP streams and codec
identification of transmitted voice or video traffic. The method includes an RTP detector that filters
packets based on specific values from UDP and RTP headers. When an RTP stream is successfully
detected, codec identification is applied using codec feature sets. The paper discusses the advantages and
limitations of the method and compares it with other approaches. The method was implemented
as part of the network forensics framework NetFox, developed in the SEC6NET project. Results show
that the method can be successfully used for Lawful Interception as well as for network monitoring.
© 2014 ADFSL Page 101
JDFSL V9N2 Fast RTP Detection and Codecs Classification ...
In this work we focus on the detection of RTP without knowledge of the signalization, i.e., without knowing the L4 identification (ports) of RTP traffic. Our RTP detector identifies and filters RTP packets on the fly using selected values from IP and UDP headers. Since RTP uses dynamic ports on L4, other UDP traffic can be mistaken for RTP transmission. Our results show that with a sufficient number of RTP packets per stream we are able to detect an RTP stream with high probability in real time. If the minimum number of RTP packets per stream is set to 10, the probability of false positives approaches zero, depending on the type of traffic.

Another part of our work deals with the classification of codecs used to encode multimedia content encapsulated in an RTP stream. Common audio and video codecs can be identified using the payload type (PT) value from the RTP header. This value corresponds to the RTP Audio/Video Profiles defined in RFC 3551 (Schulzrinne & Casner, 2003). However, there are two limitations. First, the RTP Audio/Video Profiles list only well-known codecs with a static PT value. Such codecs are called static codecs. Many RTP streams transmit audio and video data encoded using dynamic codecs, where the PT value of the codec is not standardized and signalling protocols like SIP/SDP or H.323 are needed to inform the communicating sites about the codec type. The second limitation reflects the fact that open source codecs used in VoIP softphones and hard phones have different PT numbers depending on the implementation, even for the same codec; see Table 1.

This table shows the PT values of common audio and video codecs in soft phones (Ekiga, X-lite, SJ Phone), hard phones (Well T20, Linksys WRP 400-G2), and the video terminal software Polycom PVX. For static codecs like G.711 PCM A-law and µ-law, GSM, G.722, H.261, H.263, or H.264, the codec can be determined from the PT value in the RTP header. However, for dynamic codecs like iLBC, Speex, or G.726, the PT value can differ. Even the same software (Ekiga) uses different values under Windows (Ekiga W) and under Unix (Ekiga U). This means that the PT value in the RTP header cannot be used without validation as a unique codec identifier. That is the reason why our codec classifier implements advanced codec classification using a specific feature set. This method employs specific features of RTP packets related to codecs, e.g., the length of the packet, the time delay between two adjacent RTP packets, and typical patterns of the payload.

Table 1. Payload type (PT) values of common audio and video codecs by end-point application or device
Codec        Ekiga (W)  Ekiga (U)  X-lite   SJ Phone  Well T20  Linksys  PVX
G.711 µ-law  0          0          0        0         0         0        0
G.711 A-law  8          8          8        8         8         8        8
iLBC         110        116        98       97,98     -         -        -
Speex-16     125        113        100,106  -         -         -        -
Speex-8      124        112        97,105   110       -         -        -
GSM          3          3          3        -         -         -        -
MS-GSM       106        123        -        -         -         -        -
G.721        -          -          -        -         -         -        101,102,103
G.722        9          9          -        -         9         -        9
G.726-16     105        122        -        -         -         -        -
G.726-24     104        121        -        -         -         -        -
G.726-32     103        120        -        -         -         98       -
G.726-40     102        119        -        -         -         -        -
G.728        -          -          -        -         -         -        15
G.729        -          -          -        -         18        99       18
H.261        31         31         -        -         -         -        31
H.263        34         -          34       -         -         -        34
H.263-98     108        -          115      -         -         -        96
H.264        109        -          -        -         -         -        109

1.1 Contribution
The main contribution of our work is a fast method for RTP detection and codec classification that combines several approaches in order to detect an RTP stream and its codec with high probability on the fly. This is especially valuable for high-speed networks, where fast and less resource-demanding processing is strongly required. Another contribution is the creation of an annotated reference dataset that contains real RTP streams encapsulating multimedia payload encoded using different codecs. The dataset was generated in order to test our application for accuracy and to compare it with other tools. The dataset is freely available to researchers who work with RTP detection and codec classification (the Codecs Database is available for download at https://nes.fit.vutbr.cz/ansa/pmwiki.php?n=Main.Codecs). It contains 60 annotated PCAPs (Packet Capture File Format) with RTP streams encoded using different codecs and sampling frequencies.

1.2 Structure of the Paper
The remainder of the paper is structured as follows. Section 2 gives a short overview of related work in the area of RTP detection and codec classification. It discusses advantages and limitations of current methods and compares them with our approach. Section 3 describes our method of RTP detection and codec classification. This section shows how the payload type (PT) in the RTP header should be understood in terms of classification and how PT values of dynamic codecs depend on the end-point application or device. Section 4 shows how our tool RTPinfo was tested on well-known classified data. The result is compared with other tools (Wireshark, PacketScan, Cisco nBAR). The last part of the paper summarizes our results and discusses directions for future work.

2. RELATED WORK
The classification of network applications, including RTP detection, has been explored by research teams (Costeux, Guyard, & Bustos, n.d.; Zhang et al., 2008) as well as by commercial companies (Cisco Systems, 2002). These approaches mostly use pattern-based methods, i.e., packet classification that checks selected fields in IP, UDP, and RTP headers, or well-defined patterns in the RTP payload. Cisco Network Based Application Recognition (nBAR) (Cisco Systems, 2002) looks deeper into the RTP header and successfully classifies RTP packets based on multiple attributes in the RTP header rather than UDP port numbers. However, nBAR is not able to identify codecs in the RTP payload, as our tests showed. Costeux et al. (n.d.) use per-packet checking for RTP identification. Their algorithm is based on the validation check described in (Schulzrinne et al., 2003). Unfortunately, their paper does not reveal how accurate their approach is. According to our tests, per-packet RTP checking can be inaccurate for short RTP streams, where the possibility of false positives rises. Thus, in our approach we implement a two-stage detection combined with codec classification. The first stage of the detection works on a per-packet basis using an algorithm similar to the one described by Costeux et al. (n.d.); however, we use different filters to check whether a packet is RTP. The second stage of our detection validates the previous results using per-flow checking, which minimizes false positives and false negatives.

Another part of our work is codec identification in detected RTP streams. Many algorithms for audio and video codec identification have been proposed in research papers. Most of these algorithms are based on machine learning and the observation of statistical properties of codecs. These approaches usually require a set of features of the incoming stream that is compared with a set of training profiles using neural networks (Yargicocglu & Ilk, 2012), chaotic sets (Hicsonmez, Sencar, & Avcibas, 2011), etc. These methods give very precise
1. Only IPv4 or IPv6 datagrams with UDP payload are permitted.
2. Source and destination UDP ports must be higher than 1023.
3. The length of the application header must be at least the minimal RTP header length according to the CSRC Count (CC), i.e., at least 12 + 4 × CC bytes.
4. RTP version must be 2.
5. RTP payload type must be within the range defined by RFC 3550. Packets with a PT containing unassigned or reserved values are filtered out.
6. If the padding bit P is set, the last byte of the padding is checked against the total length of the packet. However, some devices (like the Tandberg Video Conferencing System) set the P bit but do not properly set the last byte of the padding. Thus, the padding bit filtering can be switched off in our tool.

If a packet passes all the filtering rules above, it is marked as an RTP packet. Then, the second stage of detection is applied: RTP packets with the same source and destination IP addresses, the same source and destination ports, and the same SSRC identifier are grouped together into a possible RTP stream using per-flow checking. If the number of packets in an RTP stream reaches the required minimum, the RTP stream is successfully detected. Otherwise, the packets are considered false positives of the first stage and labeled as non-RTP data.

There are also additional possibilities for per-flow checking, such as checking the sequence numbers of packets or checking the increments of timestamps in adjacent packets. However, these checks do not work properly with some video streams. If interleaved video is transmitted using RTP, RTP packets with the same timestamp have different sequence numbers. Also, if two audio or video sources are mixed together, a recently received packet can have an older timestamp than a previously received packet. Even though this behavior violates the basic rules of audio transmission, it is valid for video traffic. Thus, we decided to check only the number of packets in RTP streams, without losing accuracy of detection.

Table 2 shows how a correctly chosen minimum number of RTP packets per stream can decrease the number of false positives and false negatives. The table lists four tests with the minimum packets per stream set to 100, 10, or 1. If the minimum is set to 1, only per-packet checking (i.e., the first stage of detection) is applied. Test no. 4 shows that there is an enormous number of false positives in this case. For some inputs, per-packet checking can generate inaccurate results. This is the reason why the multi-stage detection was implemented. A naive approach would expect that a higher minimum number of packets per stream (e.g., 100) is more efficient, since shorter streams cannot transmit any useful video or audio records. Nevertheless, our results in Table 2 reveal that there can be real RTP streams with just a few packets, and the value 100 can be too high for them. In Tests no. 1 and 2, a considerable number of false negatives was detected because streams shorter than 100 RTP packets were not labeled as RTP streams. For some kinds of traffic, even the value 10 is too high to be the minimum number of packets per stream. Our tests showed that the optimal value for the minimum packets per stream varies between 2 and 5, depending on the input traffic.

3.2 Codecs Classification Using a Specific Feature Set
Similar to the previously mentioned approaches (Yargicocglu & Ilk, 2012; Hicsonmez et al., 2011; Jenner & Kwasinski, 2012), we are able to detect a set of known codecs that are specified in the Codec Mapper Table (CMT). If an RTP stream contains a payload with a codec whose features are not specified in the CMT, the codec is not recognized and it is classified as unknown. Currently, we are able to classify about 20 common audio codecs and their variants. Classification of video codecs will be added in a new version of the tool.

The CMT uses four distinguishing features (payload type, Δtime, payload size, and Δratio). With these features we are able to classify the codec of the RTP payload using RTP header values only. The set of specific features for the most common audio codecs is shown in Table 3. In the following text we describe how these features are determined and how the classification proceeds.

3.2.1 Codec Features
The first feature is the payload type. This feature is extracted directly from the RTP header field PT. Our tests showed that static values of the payload type as defined in RFC 3551 (Schulzrinne & Casner, 2003) are constant across end-point devices and can be used as a unique feature to identify the corresponding audio or video codec. For some codecs (e.g., G.723), the value of PT points only to the codec category and does not say anything about sub-categories, which is often important for proper decoding. In VoIP, the missing information (e.g., sampling period) is included in signalling protocols like SIP/SDP. For classification without signalization, we need to use additional features to classify the codec sub-category precisely, e.g., to determine whether it is the G.723.1-5k codec or the G.723.1-6k codec. If a PT value falls into the dynamic codecs range, this first feature cannot be used as a unique codec identifier (see Table 1). In this case, the value is internally set to 1, which means that the feature is not used, and the other features are tested.

The second feature of the codec classification is Δtime. Δtime is given as the difference between the timestamps of adjacent packets i and j, i.e., $\Delta time = t_j - t_i$, where $j = i + 1$ and $t_x$ is the timestamp of packet $x$. Δtime is computed for each pair of adjacent RTP packets of the stream. It should be the same for all RTP packets of the given RTP stream. If the value of Δtime differs between any two adjacent packets of the RTP stream, it means that the packetization time was changed during the RTP transmission. This may happen for video transmissions when adjacent samples are very similar (static scenes) and the sender decides to change the sampling period in order to save bandwidth. Among audio codecs, this was observed for Silk codecs only. If the Δtime value changes, the feature is set to zero and it is invalidated for further classification.
Payload size, or the voice payload size, is the length of the RTP payload. This value is fixed for many codecs. However, the payload size can be manually configured at end-point devices for some codecs. This is typical for G.726 (all variants) and G.729b (if VAD is switched on, see * in the CMT table). In these cases, Δtime also changes, but the ratio between Δtime and the payload size remains unchanged. Thus, Δratio is used as the fourth classification feature. Some codecs (for example Silk8, Silk16) use a variable payload size, so this feature cannot be applied in their case.

Δratio is an important feature, especially for codecs with a variable packetization period (e.g., G.726 codecs). Δratio is fixed because timestamp values in RTP packets are related to the packetization time and the payload size. So, when the packetization period changes, timestamps change too, but Δratio remains constant, as seen in Table 3 for G.726 codecs. This ratio is also used to distinguish sub-categories of G.723, as seen in that table.

Using a specific feature set, we are able to uniquely detect at least 20 different audio codecs with high probability. Additional features, like payload patterns, were also considered. Payload patterns can be successfully used for the identification of most video codecs and some audio codecs like GSM, G.723, or Speex. However, the current set of four features seems to be sufficient for the successful identification of the most common audio codecs, as our experience shows. The main advantages of this method are its simplicity, accuracy, and high performance for large data volumes.

3.2.2 Codec Classification
The process of classification is implemented in the Stream Collector during per-stream checking. The classification works as a multi-stage filter using the set of codecs from the CMT. Input RTP streams are compared with each CMT entry. When a match on a specific feature is found, the next feature is examined. With four features, four comparison steps are performed for each CMT entry during the processing. Then, the best codec is selected according to the number of features matched.

This classification algorithm works with constant time complexity. Its input values are taken only from RTP headers. The accuracy of the algorithm depends on the feature set specified in the CMT. Adding a new codec to the CMT is a simple operation that includes (i) determination of a specific feature set of the new codec, and (ii) insertion of a new entry into the CMT.

Currently, our CMT contains 20 entries of the most common audio codecs. We are now working on adding video codecs to the database. For some new codecs, it can be useful to add a new feature into the CMT, e.g., a payload pattern and the offset of the pattern in the payload. Adding a new feature is a straightforward operation in our classifier because each filtering stage is independent.

4. RESULTS
RTP detection and codec classification was tested on an RTP dataset that was created for the project. A comparison with three other tools for RTP detection and codec identification was done: we used the open source packet analyzer Wireshark, the professional analyzer PacketScan from GL Communications, Inc., and the Cisco nBAR feature on Cisco 2911 routers.

4.1 RTP Dataset
For testing purposes, an annotated dataset containing RTP streams with known codecs was created. To the best of our knowledge, there is no other available dataset with real VoIP communication encoded by different codecs. The RTP dataset was generated using the following tools: the soft phone Ekiga (Ekiga Set), a Cisco ISR router with Call Manager Express (CME Set), and an IXIA XM2 tester (IXIA Set). Each generated set contains several RTP streams with signalling protocols (mostly SIP signalization). For the codec classification, we modified these RTP datasets so that all signalling packets were removed from the PCAP files. Thus, the classification is based on the processing of RTP packets only.

Our RTP dataset contains several PCAP files with RTP streams transmitting voice encoded by the codecs listed in Table 4.

4.2 Testing and Results
The testing was performed on all PCAP files from our RTP datasets. In the Wireshark configuration, decoding of RTP packets outside conversation was allowed in Edit/Preferences/Protocols. PacketScan had a default configuration without additional changes. The results of the classification tests are in Tables 5, 6, and 7 for each tested application, i.e., Wireshark (W), PacketScan (P), and our application RTPinfo (R). For each application, two values are shown: the number of detected RTP packets (RTP) and the name of the identified codec (Codec).

Table 5 shows that not all applications were able to detect all RTP packets. Wireshark and RTPinfo detected the same number of RTP packets, while PacketScan missed some RTP packets. The number of false negatives of PacketScan was 10. Concerning codec identification, our tool RTPinfo (R) was the most accurate. Each tool had problems with the identification of the Silk codec, which uses a variable packetization time and payload size, see Table 3. RTP packets with Silk payload generated by Ekiga had the value PT=92. According to RFC 3551 (Schulzrinne & Casner, 2003), this value is restricted and should not be used. Wireshark was able to recognize all static codecs; however, it did not classify dynamic codecs correctly. A similar result was observed for PacketScan, where the classification of static codecs was successful but the tool did not match any dynamic codec properly.

Similar results were obtained for the CME set, see Table 6. Wireshark and RTPinfo were able to detect all RTP packets properly, while PacketScan missed some packets. All applications were able to determine all codecs successfully. The reason is that all codecs of this set are static with a well-defined fixed PT value. RTPinfo was even able to detect the sub-category of G.729, which is important for proper decoding.

The last tests were done using the IXIA set, see Table 7. Similarly to the previous tests, the number of RTP packets detected by Wireshark and by RTPinfo is correct. Again, PacketScan missed a few RTP packets. Wireshark was able to classify all
Table 5. Detection and classification results for the Ekiga Set
File          RTP (W)  Codec (W)    RTP (P)  Codec (P)     RTP (R)  Codec (R)
pcma.pcap     822      G.711 A-law  812      G.711 A-law   822      G.711 A-law
pcmu.pcap     791      G.711 µ-law  781      G.711 µ-law   791      G.711 µ-law
speex8.pcap   804      Unknown      794      Speex 16kHz   804      Speex 8kHz
speex16.pcap  1015     Unknown      1005     iSAC          1015     Speex 16kHz
gsm.pcap      796      GSM 06.10    786      GSM 06.10     796      GSM 06.10
g722.pcap     736      G.722        726      G.722         736      G.722
g7221.pcap    846      Unknown      838      AMR-WB        846      G.722.1
g726-16.pcap  609      Unknown      599      G.726 40kbps  609      G.726 16kbps
g726-24.pcap  583      Unknown      573      EVRCC         583      G.726 24kbps
g726-32.pcap  588      Unknown      578      G.711 µ-law   588      G.726 32kbps
g726-40.pcap  623      Unknown      615      G.711 A-law   623      G.726 40kbps
amr-wb.pcap   717      Unknown      707      G.726 32kbps  717      AMR-WB
silk8.pcap    663      Unknown      653      Unknown       663      Unknown
silk16.pcap   631      Unknown      621      Unknown       631      Unknown

Table 6. Detection and classification results for the CME Set
File           RTP (W)  Codec (W)    RTP (P)  Codec (P)    RTP (R)  Codec (R)
g711alaw.pcap  1245     G.711 A-law  1235     G.711 A-law  1245     G.711 A-law
g711ulaw.pcap  1379     G.711 µ-law  1369     G.711 µ-law  1379     G.711 µ-law
g729r8.pcap    2797     G.729        2787     G.729        2797     G.729a
g729br8.pcap   1291     G.729        1281     G.729        1291     G.729b
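
The RTP packet counts reported in the tables above depend on the two-stage detection of Section 3.1. As a rough illustration only, the per-packet rules and the per-flow grouping might be sketched in Python as follows. The names `parse_rtp`, `detect_streams`, and `ALLOWED_PT`, the tuple layout of the input packets, and the default `min_packets=5` are hypothetical. The set of accepted PT values is an assumption (assigned static values from RFC 3551 plus the dynamic range 96-127), since the exact set used by RTPinfo is not given here. Rule 1 (UDP over IPv4 or IPv6) is assumed to be applied by whatever code extracts the UDP payload.

```python
import struct
from collections import defaultdict

# Assumption: assigned static PTs from RFC 3551 plus the dynamic range 96-127.
ALLOWED_PT = frozenset(
    [0, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
     25, 26, 28, 31, 32, 33, 34]
) | frozenset(range(96, 128))


def parse_rtp(src_port, dst_port, payload, check_padding=True):
    """Stage 1: return (pt, seq, timestamp, ssrc) if the UDP payload passes
    the per-packet rules, otherwise None."""
    if src_port <= 1023 or dst_port <= 1023:       # rule 2: ports above 1023
        return None
    if len(payload) < 12:                          # fixed part of the RTP header
        return None
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", payload[:12])
    cc = b0 & 0x0F                                 # CSRC count
    header_len = 12 + 4 * cc
    if len(payload) < header_len:                  # rule 3: room for CSRC list
        return None
    if b0 >> 6 != 2:                               # rule 4: RTP version 2
        return None
    pt = b1 & 0x7F                                 # drop the marker bit
    if pt not in ALLOWED_PT:                       # rule 5: assigned PT only
        return None
    if check_padding and b0 & 0x20:                # rule 6: optional padding check
        pad = payload[-1]                          # last byte = padding length
        if pad == 0 or pad > len(payload) - header_len:
            return None
    return pt, seq, ts, ssrc


def detect_streams(packets, min_packets=5):
    """Stage 2: group stage-1 hits by (addresses, ports, SSRC) and keep only
    flows that reach the minimum packet count."""
    flows = defaultdict(list)
    for src_ip, dst_ip, src_port, dst_port, payload in packets:
        fields = parse_rtp(src_port, dst_port, payload)
        if fields is not None:
            key = (src_ip, dst_ip, src_port, dst_port, fields[3])
            flows[key].append(fields)
    return {key: pkts for key, pkts in flows.items() if len(pkts) >= min_packets}
```

In this sketch, `detect_streams` returns a dictionary keyed by flow; packets that pass the first stage but belong to flows shorter than `min_packets` are silently dropped, which corresponds to labeling them as non-RTP data. Setting `min_packets=1` reduces the procedure to per-packet checking only.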

Table 7. Detection and classification results for the IXIA Set
File           RTP (W)  Codec (W)    RTP (P)  Codec (P)     RTP (R)  Codec (R)
alaw.pcap      48016    G.711 A-law  48006    G.711 A-law   48016    G.711 A-law
ulaw.pcap      48016    G.711 µ-law  48006    G.711 µ-law   48016    G.711 µ-law
g7231-5k.pcap  32012    G.723.1      32002    G.723.1       32012    G.723.1 5.3kbps
g7231-6k.pcap  32012    G.723.1      32002    G.723.1       32012    G.723.1 6.3kbps
g726-16.pcap   96032    Unknown      96022    Unknown       96032    G.726 16kbps
g726-24.pcap   96032    Unknown      96022    Unknown       96032    G.726 24kbps
g726-32.pcap   96032    Unknown      96022    Unknown       96032    G.726 32kbps
g726-40.pcap   96032    Unknown      96022    GSM-EFR       96032    G.726 40kbps
g729.pcap      48016    G.729        48006    G.729         48016    G.729/G.729a
g729a.pcap     48016    G.729        48006    G.729         48016    G.729/G.729a
g729b.pcap     40882    G.729        40872    G.729         40882    G.729b
amr-12.pcap    48016    Unknown      48006    G.726 24kbps  48016    AMR 12.2kbps
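
The sub-category results in the RTPinfo column above (for example the G.726 variants) come from matching stream features against the CMT, as described in Section 3.2.2. A minimal Python sketch of such matching is given below. The names `CmtEntry`, `CMT`, and `classify` are hypothetical, and the four entries are illustrative values derived for 20 ms packetization at an 8 kHz RTP clock; they are not copied from Table 3. The sketch also assumes that Δratio is Δtime divided by the payload size, that PT values of 96 and above are dynamic, and that a mismatch on any usable feature disqualifies an entry.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Optional


@dataclass(frozen=True)
class CmtEntry:
    codec: str
    pt: Optional[int]            # None = dynamic PT, feature not used
    delta_time: Optional[int]    # None = not fixed for this codec
    payload_size: Optional[int]  # None = configurable or variable
    ratio: Optional[Fraction]    # assumed: delta_time / payload_size


# Illustrative entries only (assumption), not the authors' Table 3.
CMT = [
    CmtEntry("G.711 u-law", 0, 160, 160, Fraction(1)),
    CmtEntry("G.711 A-law", 8, 160, 160, Fraction(1)),
    CmtEntry("G.726 16kbps", None, None, None, Fraction(4)),
    CmtEntry("G.726 32kbps", None, None, None, Fraction(2)),
]

FEATURES = ("pt", "delta_time", "payload_size", "ratio")


def classify(pt, delta_time, payload_size):
    """Return the CMT codec with the most matched features, or 'Unknown'."""
    stream = {
        "pt": pt if pt < 96 else None,       # dynamic PT carries no codec info
        "delta_time": delta_time or None,    # 0 means the feature was invalidated
        "payload_size": payload_size or None,
    }
    usable = stream["delta_time"] and stream["payload_size"]
    stream["ratio"] = (
        Fraction(stream["delta_time"], stream["payload_size"]) if usable else None
    )
    best, best_score = "Unknown", 0
    for entry in CMT:
        score = 0
        for name in FEATURES:
            want, got = getattr(entry, name), stream[name]
            if want is None or got is None:
                continue                     # feature not usable, skip this step
            if want != got:
                score = -1                   # mismatch disqualifies the entry
                break
            score += 1
        if score > best_score:
            best, best_score = entry.codec, score
    return best
```

With these sample entries, a stream with a dynamic PT, a timestamp increment of 320, and a payload of 160 bytes is still matched to the G.726 32 kbps entry, because the ratio stays at 2 when the packetization period doubles.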
prior identification of the specific feature set characterizing the properties of different RTP codecs. Feature values for different codecs are organized in the Codec Mapper Table structure, which is optimized for quick identification of candidate codecs of an analyzed RTP stream.

The benefit of the method lies in its simplicity, which enables a cheap and efficient implementation suitable for fast on-line traffic monitoring. Our method results from the observation that RTP streams have certain characteristics distinguishing RTP traffic from other communication.

The presented method is usable in network monitoring applications and network forensics. With the increase of VoIP communication, a tool that precisely measures characteristics of VoIP traffic would greatly help administrators in network management. The method can also be useful during network forensic analysis in situations when VoIP signalization is missing or corrupted.

ACKNOWLEDGEMENTS
Research in this paper was supported by the project Modern Tools for Detection and Mitigation of Cyber Criminality on the New Generation Internet, no. VG20102015022, granted by the Ministry of the Interior of the Czech Republic, and by the internal University project Research and application of advanced methods in ICT, no. FIT-S-14-2299, granted by Brno University of Technology.

REFERENCES
Bestagini, P., Allam, A., Milani, S., Tagliasacchi, M., & Tubaro, S. (2012). Video codec identification. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012 (pp. 2257-2260).
Cisco Systems, Inc. (2002). Network Based Application Recognition RTP Payload Classification (White Paper). Cisco Systems, Inc.
Costeux, J.-L., Guyard, F., & Bustos, A.-M. (n.d.). Detection and comparison of RTP and Skype traffic and performance. In IEEE Global Telecommunications Conference, GLOBECOM '06 (pp. 1-5).
Hicsonmez, S., Sencar, H. T., & Avcibas, I. (2011, Nov). Audio codec identification through payload sampling. In IEEE International Workshop on Information Forensics and Security (WIFS), 2011 (pp. 1-6).
Jenner, F., & Kwasinski, A. (2012, March). Highly accurate non-intrusive speech forensics for codec identifications from observed decoded signals. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012 (pp. 1737-1740).
Perkins, C. (2003). RTP: Audio and Video for the Internet. Addison-Wesley.
Schulzrinne, H., & Casner, S. (2003, July). RTP Profile for Audio and Video Conferences with Minimal Control (RFC 3551).
Schulzrinne, H., Casner, S., Frederick, R., & Jacobson, V. (2003, July). RTP: A Transport Protocol for Real-Time Applications (RFC 3550).
Yargicocglu, A. U., & Ilk, H. G. (2012). Speech Coder Identification Using Chaotic Features Based On Steganalyzer Models. Communications, Zilina, Slovakia, 12, 63-69.
Zhang, G., Xie, G., Yang, J., Min, Y., Zhou, Z., & Duan, X. (2008, Jan). Accurate Online Traffic Classification with Multi-Phases Identification Methodology. In 5th IEEE Consumer Communications and Networking Conference (pp. 141-146).