
Voice over Internet Protocol (VoIP)

By: Zhaoyang Dong (2005)

Introduction
Circuit switching was traditionally used for carrying voice traffic; it was designed for voice and does its job extremely well. However, circuit switching is not particularly suitable for the new world of multimedia communications. IP has become an attractive alternative for many reasons, including its widespread availability and easy integration with advanced services. Voice over Internet Protocol (VoIP) is simply the transport of voice traffic using the Internet Protocol (IP). However, IP was not originally designed to carry voice or similar real-time interactive media; it was designed for data, which can tolerate delay. Using IP to carry voice with consistently high quality is therefore a great challenge. This report describes how the Internet is used for telephony. In particular, it attempts to answer questions such as: How are the real-time constraints handled? How can a user who could be anywhere in the world be located? In the remainder of this report, Section 2 discusses the protocols that help make real-time transport possible on top of IP. Section 3 focuses on voice coding techniques. Section 4 discusses the H.323 architecture and protocol suite. Section 5 discusses the Session Initiation Protocol (SIP), and Section 6 concludes.

Transporting Voice by Using IP


When voice is carried over IP, UDP is used rather than TCP. The reason lies in the characteristics of speech itself. In a conversation, the occasional loss of one or two voice packets is not a catastrophe, because modern voice-coding algorithms can recover from occasional losses. Voice is, however, extremely delay-sensitive. TCP's connection setup and acknowledgement procedures introduce delay, and, even worse, TCP retransmits lost packets, which adds still more delay. Tolerating some packet loss is far better than introducing delay into a voice stream.

Although UDP is better suited than TCP to voice traffic, it was not designed with voice in mind and has shortcomings for real-time applications: UDP does nothing to avoid packet loss or even to ensure ordered delivery. The Real-Time Transport Protocol (RTP) [1], which runs on top of UDP, was introduced to support real-time applications. RTP packets include a sequence number, so that the application using RTP can at least detect lost packets and can present received packets to the user in the correct order. RTP also includes a timestamp that corresponds to the time at which the packet was sampled at its source. The destination application can use this timestamp to ensure synchronized play-out to the destination user and to calculate delay and jitter. Jitter is variation in delay, and its effects can be alleviated by jitter buffers.

RTP carries the actual digitally encoded voice by taking one or more encoded voice samples and attaching an RTP header, so that each RTP packet comprises an RTP header and a payload of voice samples. Because there are many different voice- and video-coding standards, the RTP header includes a payload type identifier that tells the receiving end which coding standard is being used, so that the payload can be interpreted correctly.
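The header fields just described can be made concrete with a short sketch. It packs and parses the 12-byte fixed RTP header defined in RFC 1889 [1] (version, padding, extension, CSRC count, marker, payload type, sequence number, timestamp, SSRC), detects gaps in the 16-bit sequence number, and estimates the bit rate once UDP and IPv4 headers are added. The helper names, the SSRC value and the 20 ms packetization interval are illustrative assumptions; padding and header extensions are not handled.

```python
import struct
from dataclasses import dataclass

RTP_VERSION = 2
PT_PCMU = 0  # static payload type for G.711 mu-law in the RFC 1890 audio profile
HEADER_OVERHEAD = 12 + 8 + 20  # RTP + UDP + IPv4 header bytes per packet


@dataclass
class RtpHeader:
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int
    marker: bool = False


def pack_rtp(header: RtpHeader, payload: bytes) -> bytes:
    """Prepend a 12-byte fixed RTP header (no padding, extension or CSRCs)."""
    byte0 = RTP_VERSION << 6  # V=2, P=0, X=0, CC=0
    byte1 = (int(header.marker) << 7) | (header.payload_type & 0x7F)
    fixed = struct.pack("!BBHII", byte0, byte1, header.sequence & 0xFFFF,
                        header.timestamp & 0xFFFFFFFF, header.ssrc & 0xFFFFFFFF)
    return fixed + payload


def unpack_rtp(packet: bytes) -> tuple[RtpHeader, bytes]:
    """Parse the fixed header; skips any CSRC list but ignores padding/extensions."""
    byte0, byte1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    if byte0 >> 6 != RTP_VERSION:
        raise ValueError("not an RTP version 2 packet")
    csrc_count = byte0 & 0x0F
    header = RtpHeader(byte1 & 0x7F, seq, ts, ssrc, bool(byte1 >> 7))
    return header, packet[12 + 4 * csrc_count:]


def packets_lost_between(prev_seq: int, seq: int) -> int:
    """Gap between two in-order sequence numbers, allowing for 16-bit wrap-around."""
    return (seq - prev_seq - 1) % 65536


def ip_bitrate_kbps(payload_bytes: int, packet_interval_ms: float) -> float:
    """Bit rate including RTP/UDP/IPv4 headers (link-layer framing excluded)."""
    packets_per_second = 1000 / packet_interval_ms
    return (payload_bytes + HEADER_OVERHEAD) * 8 * packets_per_second / 1000


if __name__ == "__main__":
    frame = bytes(160)  # 20 ms of G.711: 8,000 one-byte samples per second
    pkt = pack_rtp(RtpHeader(PT_PCMU, sequence=65535, timestamp=160, ssrc=0x1234ABCD), frame)
    hdr, payload = unpack_rtp(pkt)
    print(len(pkt), hdr.payload_type, len(payload))  # 172 0 160
    print(packets_lost_between(65535, 1))            # 1 (sequence 0 is missing)
    print(ip_bitrate_kbps(160, 20), ip_bitrate_kbps(20, 20))  # 80.0 24.0
```

The last line shows why per-packet headers matter for low-rate codecs: with 20 ms packets, a 64 kb/s G.711 stream needs 80 kb/s at the IP layer, while an 8 kb/s G.729 stream (20 bytes per 20 ms) needs 24 kb/s.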
The interpretation of the payload type number is specified in RFC 1890 [6]. RTP does not reserve resources and does not guarantee quality of service. RTCP, a companion protocol of RTP, provides a number of messages that give feedback on the quality of the session, including the number of lost RTP packets, delay, and inter-arrival jitter. Real-time traffic sometimes requires priority treatment to achieve good performance; providing such treatment is known as quality of service (QoS), and it can be achieved by managing router queues and by routing traffic around congested parts of the network. Many QoS solutions have been developed, exemplified by the Resource Reservation Protocol (RSVP) [2], Differentiated Services (DiffServ) [3], and Multi-Protocol Label Switching (MPLS) [4][5].
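The inter-arrival jitter that RTCP reports can be estimated from the RTP timestamps described earlier. RFC 1889 [1] defines it as a running average of the change in relative transit time between consecutive packets, smoothed with a gain of 1/16. The sketch below follows that definition; the class name and the use of floating-point seconds for arrival times are assumptions made for illustration.

```python
from typing import Optional


class JitterEstimator:
    """Running inter-arrival jitter estimate, expressed in RTP timestamp units."""

    def __init__(self, clock_rate_hz: int) -> None:
        self.clock_rate_hz = clock_rate_hz  # e.g. 8000 for G.711
        self.jitter = 0.0
        self._prev_transit: Optional[float] = None

    def update(self, rtp_timestamp: int, arrival_time_s: float) -> float:
        # Express the arrival time in the same units as the RTP timestamp.
        arrival = arrival_time_s * self.clock_rate_hz
        # Relative transit time; any constant clock offset cancels in the difference below.
        transit = arrival - rtp_timestamp
        if self._prev_transit is not None:
            d = abs(transit - self._prev_transit)
            self.jitter += (d - self.jitter) / 16  # J = J + (|D| - J) / 16
        self._prev_transit = transit
        return self.jitter

    def jitter_ms(self) -> float:
        """Current estimate converted from timestamp units to milliseconds."""
        return 1000 * self.jitter / self.clock_rate_hz


if __name__ == "__main__":
    est = JitterEstimator(8000)
    # Packets sent every 20 ms (160 timestamp units); the third one arrives 10 ms late.
    for ts, arrival in [(0, 0.000), (160, 0.020), (320, 0.050)]:
        est.update(ts, arrival)
    print(est.jitter, est.jitter_ms())
```

A receiver could size its jitter buffer from this estimate, although how much play-out delay to add is a design choice the RTP specification leaves to the application.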

Speech Coding
The RTP header includes a payload type, which specifies the coding algorithm used. Speech coding is simply the process by which a digital stream of ones and zeros is made to represent an analog voice waveform. Many coding algorithms are available for digitizing speech, and the choice of coding scheme is a balance between quality and cost. Table 1 gives some of the characteristics of a few standard codecs, which are detailed in [8,9,10,11,12]; quantitative results are given in [7].
Table 1. Characteristics of Several Voice Codecs

Codec | Algorithm | Frame size / lookahead | Usual rate | Comments
G.711 | PCM | 0.125 ms / 0 | 64 kb/s | Universal use
G.722 | ADPCM | 0.125 ms / 1.5 ms | 48, 56 or 64 kb/s | Wideband coder
G.726 | ADPCM | 0.125 ms / 0 | 32 kb/s | High quality, low complexity
G.728 | LD-CELP | 0.625 ms / 0 | 16 kb/s | High quality in tandem; recommended for cable
G.729(A) | CS-ACELP | 10 ms / 5 ms | 8 kb/s | Widespread use
G.729e | Hybrid CELP | 10 ms / 5 ms | 11.8 kb/s | High quality/complexity; recommended for cable
G.723.1 (6.3) | MP-MLQ | 30 ms / 7.5 ms | 6.3 kb/s | Video conference origin
G.723.1 (5.3) | ACELP | 30 ms / 7.5 ms | 5.3 kb/s | Video conference origin
IS-127 | RCELP | 20 ms / 5 ms | Variable, 4.2 kb/s | Compatible with North American and Japanese digital cellular
AMR | ACELP | 20 ms | Variable, 4.75-12.2 kb/s | WCDMA (not CDMA2000); Nokia IPR

H.323
The previous sections described how voice is carried in RTP packets between session participants. We assume that session participants know of each other's existence and that media sessions are somehow created such that they can exchange voice by using RTP packets. So how are those sessions created and ended? How does one party indicate to another a desire to set up a call, and how does the second party indicate a willingness to accept it? The key is signaling. The first VoIP systems used proprietary signaling protocols, with the immediate drawback that two users could communicate only if both used systems from the same vendor. In response to this problem, the Telecommunication Standardization Sector of the International Telecommunication Union (ITU-T) developed Recommendation H.323, first issued in 1996 and revised in 1998 [13], as a standardized signaling protocol for VoIP. H.323 specifies an overall architecture and methodology and incorporates several other recommendations, of which H.225.0 [14] and H.245 [15] are the most important.

The architecture of H.323 is illustrated in Fig.1. It involves H.323 terminals, gateways, and gatekeepers. A terminal is an endpoint that offers real-time communication with other H.323 endpoints. Typically, it is an end-user communication device that supports at least one audio codec and may optionally support other audio and/or video codecs. A gateway connects the Internet to the telephone network. It is responsible for translating between audio and video codec formats, and also for mapping signaling messages between the packet side of the network and other networks. The gatekeeper is a device that controls the terminals under its jurisdiction. The collection of a gatekeeper and the terminals registered with it is called a zone.

Fig.1. H.323 architectural model for VoIP

Table 2 shows the H.323 protocol stack. The actual signaling messages exchanged between H.323 entities are specified by H.225.0 and H.245. H.225.0 is a two-part protocol. One part is effectively a variant of ITU-T Recommendation Q.931 [16]; it is used for the establishment and teardown of connections between H.323 endpoints. This type of signaling is known as call signaling or Q.931 signaling. The other part of H.225.0 is known as Registration, Admission, and Status (RAS) signaling. It is used between endpoints and gatekeepers and enables a gatekeeper to manage the endpoints within its zone. H.245 is a control protocol used between two or more endpoints. Its main purpose is to manage the media streams between H.323 session participants. To that end, H.245 includes functions such as ensuring that the media sent by one entity is limited to the set of media that can be received and understood by another. H.245 operates through the establishment of one or more logical channels between endpoints.
Table 2. H.323 protocol stack
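Media path | Control path
Audio/Video Application | Terminal/Application Control
Audio/Video Codecs over RTP, plus RTCP | H.225.0 RAS Signaling; H.225.0 Call Signaling; H.245 Control Signaling
UDP | UDP for RAS; TCP for call signaling and H.245
IP | IP
Data link protocol | Data link protocol
Physical layer protocol | Physical layer protocol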
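The capability-limiting role of H.245 described above can be pictured as an intersection of codec lists, performed separately for each direction. The sketch below is a simplification under stated assumptions: real H.245 exchanges ASN.1-encoded capability sets and opens logical channels with dedicated messages, whereas here capabilities are plain lists of codec names in preference order, and the type and function names are hypothetical.

```python
from typing import Optional, TypedDict


class Capabilities(TypedDict):
    transmit: list[str]  # codecs this endpoint can send, most preferred first
    receive: list[str]   # codecs this endpoint can decode


def choose_codec(sender_prefs: list[str], receiver_can_decode: list[str]) -> Optional[str]:
    """First codec in the sender's preference order that the receiver understands."""
    for codec in sender_prefs:
        if codec in receiver_can_decode:
            return codec
    return None  # no common codec: this logical channel cannot be opened


def select_channels(a: Capabilities, b: Capabilities) -> dict[str, Optional[str]]:
    """Pick one codec per direction, mirroring two unidirectional logical channels."""
    return {
        "A->B": choose_codec(a["transmit"], b["receive"]),
        "B->A": choose_codec(b["transmit"], a["receive"]),
    }


if __name__ == "__main__":
    pc: Capabilities = {"transmit": ["G.729", "G.711"], "receive": ["G.711", "G.723.1"]}
    gateway: Capabilities = {"transmit": ["G.723.1", "G.711"], "receive": ["G.711"]}
    print(select_channels(pc, gateway))  # {'A->B': 'G.711', 'B->A': 'G.723.1'}
```

Because each direction is chosen independently, the two channels may end up using different codecs, as in this example.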

Now, let's consider the case of a PC terminal on a LAN with a gatekeeper calling a remote telephone. The PC first has to find the gatekeeper, so it broadcasts a UDP gatekeeper discovery packet to port 1718. When the gatekeeper responds, the PC learns the gatekeeper's IP address. The PC then registers with the gatekeeper by sending it a RAS message in a UDP packet. After it has been accepted, the PC sends the gatekeeper a RAS admission message requesting bandwidth. Only after bandwidth has been granted may call setup begin. Requesting bandwidth in advance allows the gatekeeper to limit the number of calls so that the outgoing line is not oversubscribed, which helps provide the necessary quality of service.

Once bandwidth has been allocated, the PC sends a Q.931 SETUP message over a TCP connection to the gatekeeper to begin call setup. TCP is needed here because call setup reuses existing telephone network protocols, which are connection oriented. This message specifies the number of the telephone being called (or the IP address and port, if a computer is being called). The gatekeeper responds with a Q.931 CALL PROCEEDING message to acknowledge correct receipt of the request and then forwards the SETUP message to the gateway. The gateway makes an ordinary telephone call to the desired telephone. The end office to which the telephone is attached rings the called telephone and also sends back a Q.931 ALERT message to tell the calling PC that ringing has begun. When the person at the other end picks up the telephone, the end office sends back a Q.931 CONNECT message to signal the PC that it has a connection.

Once the connection has been established, the gatekeeper is no longer in the loop: subsequent packets bypass it and go directly to the gateway's IP address. The H.245 protocol is now used to negotiate the parameters of the call over the H.245 control channel, which is always open. Each side starts by announcing its capabilities. Once each side knows what the other can handle, two unidirectional data channels are set up, and a codec and other parameters are assigned to each. After all negotiations are complete, data can flow using RTP, which is managed using RTCP.
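The reason for requesting bandwidth before call setup can be illustrated with a toy gatekeeper that admits calls only while the zone's outgoing line has spare capacity. This is a hypothetical sketch, not an H.323 implementation: the class and method names and the capacity figures are assumptions, and the returned strings merely borrow the H.225.0 RAS abbreviations for the confirm and reject replies (RCF, ACF, ARJ, DCF).

```python
class Gatekeeper:
    """Toy admission control for one H.323 zone (hypothetical, for illustration)."""

    def __init__(self, line_capacity_kbps: int) -> None:
        self.line_capacity_kbps = line_capacity_kbps
        self.registered: set[str] = set()
        self.active_calls: dict[str, int] = {}  # call id -> bandwidth granted (kb/s)

    def register(self, endpoint: str) -> str:
        """Registration request: remember the endpoint as part of this zone."""
        self.registered.add(endpoint)
        return "RCF"

    def admit(self, endpoint: str, call_id: str, kbps: int) -> str:
        """Admission request: grant bandwidth only if the line would not be oversubscribed."""
        if endpoint not in self.registered:
            return "ARJ"
        in_use = sum(self.active_calls.values())
        if in_use + kbps > self.line_capacity_kbps:
            return "ARJ"
        self.active_calls[call_id] = kbps
        return "ACF"

    def disengage(self, call_id: str) -> str:
        """Call ended: release its bandwidth for later admissions."""
        self.active_calls.pop(call_id, None)
        return "DCF"


if __name__ == "__main__":
    gk = Gatekeeper(line_capacity_kbps=200)
    gk.register("pc-1")
    # With 20 ms packets, a G.711 call needs about 80 kb/s once RTP/UDP/IPv4 headers are counted.
    print([gk.admit("pc-1", f"call-{n}", 80) for n in range(3)])  # ['ACF', 'ACF', 'ARJ']
    gk.disengage("call-0")
    print(gk.admit("pc-1", "call-2", 80))  # ACF
```

Rejecting the third call, rather than letting it degrade all three, is exactly the trade-off the admission step is meant to enforce.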

SIP
Many people in the Internet community consider H.323 inherently complex and heavyweight, and thus inefficient for VoIP. Consequently, SIP (Session Initiation Protocol) [17] was designed with the Internet in mind, aiming to avoid both H.323's complexity and its extensibility pitfalls. SIP is designed to be part of the overall Internet Engineering Task Force (IETF) multimedia data and control architecture. As such, SIP is used in conjunction with several other protocols, such as the Session Description Protocol (SDP) [18] and the Real-Time Streaming Protocol (RTSP) [19]. Many believe that SIP, in conjunction with MGCP [20,21] or MEGACO [22], will be the dominant VoIP signaling architecture in the future.

SIP is a control protocol similar to HTTP that can set up and tear down any type of session. SIP uses a URI, not an IP address, to identify a logical destination. The address could be a nickname, an e-mail address (e.g., sip:zdong@epcc.ed.ac.uk), or a telephone number. In addition to setting up a phone call, SIP can notify users of events, such as "I am online", "a person entered the room", or "e-mail has arrived". SIP can also be used to send instant text messages.

Using a client-server model, SIP defines logical entities that may be implemented separately or together in the same product. Clients send SIP requests, whereas servers accept SIP requests, execute the requested methods, and respond. The SIP specification defines six request methods: REGISTER allows either the user or a third party to register contact information with a SIP server; INVITE initiates the call signaling sequence; ACK and CANCEL support session setup; BYE terminates a session; and OPTIONS queries a server about its capabilities. SIP has several important functional entities: the user agent, proxy server, registrar server, location server and redirect server. A user agent performs the functions of both a user agent client, which initiates SIP requests, and a user agent server, which contacts the user when a SIP request is received and returns a response on behalf of the user.
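Because SIP is text-based like HTTP, a request is just a method line followed by header lines. The sketch below builds a minimal INVITE and parses it back. The header set (Via, Max-Forwards, To, From, Call-ID, CSeq, Contact, Content-Length) follows the requests shown in RFC 3261 [17], but the host address (from the 192.0.2.0/24 documentation range) and the Call-ID, tag and branch values are invented, the function names are hypothetical, and the SDP body [18] that would normally describe the media is omitted.

```python
def build_invite(caller: str, callee: str, caller_ip: str,
                 call_id: str, tag: str, branch: str) -> str:
    """Minimal SIP INVITE without a message body (illustrative values only)."""
    user = caller.split("@")[0]
    lines = [
        f"INVITE sip:{callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {caller_ip}:5060;branch={branch}",
        "Max-Forwards: 70",
        f"To: <sip:{callee}>",
        f"From: <sip:{caller}>;tag={tag}",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{user}@{caller_ip}>",
        "Content-Length: 0",
    ]
    # Lines end with CRLF; an empty line separates the headers from the (empty) body.
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_request(message: str) -> tuple[str, str, dict[str, str], str]:
    """Split a SIP request into method, Request-URI, headers and body.

    Simplified: no header line folding, and a repeated header overwrites earlier ones.
    """
    head, _, body = message.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    method, request_uri, _version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return method, request_uri, headers, body


if __name__ == "__main__":
    msg = build_invite("zdong@epcc.ed.ac.uk", "bob@epcc.ed.ac.uk", "192.0.2.10",
                       call_id="a84b4c76e66710", tag="1928301774", branch="z9hG4bK776asdhds")
    print(msg)
    method, uri, headers, _ = parse_request(msg)
    print(method, uri, headers["CSeq"])  # INVITE sip:bob@epcc.ed.ac.uk 1 INVITE
```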
A SIP proxy acts as both a SIP client and a SIP server, making SIP requests on behalf of other SIP clients. A proxy server may be either stateful or stateless. A proxy server must be stateful to support TCP or to support a variety of services; however, a stateless proxy server scales better (supports higher call volumes). A registrar is a SIP server that receives, authenticates and accepts REGISTER requests from SIP clients; it may be collocated with a SIP proxy server. A location server stores user information in a database and helps determine where (to what IP address) to send a request; it too may be collocated with a SIP proxy server. A redirect server is stateless. It responds to a SIP request with an address at which the request originator can contact the desired entity directly; it does not accept calls or initiate its own requests.

How does SIP locate the callee, who may not be at his home machine, and how does it handle the mechanics of call setup and termination? First, telephone numbers in SIP are represented as URLs using the sip scheme, for example sip:bob@epcc.ed.ac.uk for a user named bob at the host specified by the DNS name epcc.ed.ac.uk. SIP URLs may also contain IPv4 addresses, IPv6 addresses, or actual telephone numbers. As mentioned previously, SIP is a text-based protocol modeled on HTTP. One party sends a message in ASCII text consisting of a method name on the first line, followed by additional lines containing headers for passing parameters.

Now assume that Zoy's SIP device knows only the e-mail address bob@epcc.ed.ac.uk, and that this same address is used for SIP-based calls. Zoy cannot know Bob's IP address, not only because IP addresses are often dynamically assigned with DHCP, but also because Bob may have multiple IP devices (for example, different devices at home, in the office, and in the car). Zoy therefore needs to obtain the IP address of the device that the user bob@epcc.ed.ac.uk is currently using. She first creates an INVITE message that begins with INVITE sip:bob@epcc.ed.ac.uk SIP/2.0 and sends this message to a SIP proxy, which hides any redirection from her. The proxy will respond with a SIP reply that might include the IP address of the device that bob@epcc.ed.ac.uk is currently using. How, then, can the proxy server determine the current IP address for bob@epcc.ed.ac.uk? The key is the registrar. Every SIP user has an associated registrar. Whenever a user launches a SIP application on a device, the application sends a SIP REGISTER message to the registrar, informing it of the device's current IP address. Bob's registrar thus keeps track of Bob's current IP address. Whenever Bob switches to a new SIP device, the new device sends a new REGISTER message indicating the new IP address. Even if Bob remains at the same device for an extended period of time, the device still sends periodic refresh REGISTER messages, indicating that the most recently sent IP address is still valid. Note that the registrar is analogous to an authoritative DNS name server: the SIP registrar translates fixed human identifiers such as bob@epcc.ed.ac.uk to dynamic IP addresses.
SIP registrars and SIP proxies are often run on the same host. Now, let's examine how Zoy's SIP proxy server obtains Bob's current IP address. From the preceding discussion, we see that the proxy server simply needs to forward Zoy's INVITE message to Bob's registrar/proxy. The registrar/proxy can then forward the message to Bob's current SIP device and act as a relay for subsequent messages.
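The registrar's role as a location service can be sketched as a table of bindings from an address of record to the IP address of the device currently in use, each with an expiry time that refresh REGISTER messages push forward. This is a hypothetical in-memory model, not a SIP server: the class, the 3600 s default lifetime and the injectable clock (used so that expiry can be demonstrated) are assumptions, and it keeps one binding per user, matching the single current device assumed in the text.

```python
import time
from typing import Callable, Optional


class Registrar:
    """In-memory map from address of record to the device currently registered."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.bindings: dict[str, tuple[str, float]] = {}  # AOR -> (IP, expiry time)

    def register(self, aor: str, ip: str, expires_s: float = 3600) -> None:
        """Handle a REGISTER: a new device replaces the old binding, a refresh extends it."""
        self.bindings[aor] = (ip, self.clock() + expires_s)

    def lookup(self, aor: str) -> Optional[str]:
        """What a co-located proxy would ask before forwarding an INVITE."""
        entry = self.bindings.get(aor)
        if entry is None:
            return None
        ip, expiry = entry
        if self.clock() >= expiry:  # no refresh arrived in time: forget the device
            del self.bindings[aor]
            return None
        return ip


if __name__ == "__main__":
    now = [0.0]  # fake clock so the example runs instantly
    reg = Registrar(clock=lambda: now[0])
    reg.register("bob@epcc.ed.ac.uk", "192.0.2.20", expires_s=60)    # office PC
    reg.register("bob@epcc.ed.ac.uk", "198.51.100.7", expires_s=60)  # Bob moves to his laptop
    print(reg.lookup("bob@epcc.ed.ac.uk"))  # 198.51.100.7
    now[0] = 50.0
    reg.register("bob@epcc.ed.ac.uk", "198.51.100.7", expires_s=60)  # periodic refresh
    now[0] = 100.0
    print(reg.lookup("bob@epcc.ed.ac.uk"))  # 198.51.100.7 (binding now expires at 110 s)
    now[0] = 111.0
    print(reg.lookup("bob@epcc.ed.ac.uk"))  # None: no further refresh, binding expired
```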

Fig.2. SIP, involving proxies and registrars

As an example, consider Fig.2, in which bob@a.edu, currently working on 217.123.56.89, wants to initiate a voice-over-IP session with kate@b.edu, currently working on 197.87.54.21. The following steps are taken:
(1) Bob sends an INVITE message to the a.edu SIP proxy.
(2) The proxy does a DNS lookup for the b.edu SIP registrar and forwards the INVITE to it.
(3) Because kate@b.edu is no longer registered at the b.edu registrar, the registrar sends a redirect response, indicating that the proxy should try kate@c.edu.
(4) The a.edu proxy sends an INVITE to the c.edu SIP registrar.
(5) The c.edu registrar knows the IP address of kate@c.edu and forwards the INVITE to the host 197.87.54.21, which is running Kate's SIP client.
(6-8) A SIP response is sent back through the registrars/proxies to Bob's SIP client on 217.123.56.89.
(9) Media is sent directly between the two clients. (There is also a SIP acknowledgment message, which is not shown.)
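The lookup in steps (2) to (5) can be mimicked with a small resolver in which each domain's registrar either returns an IP address or redirects to another address of record. This is a hypothetical model with the data of Fig.2 hard-coded: real proxies exchange SIP messages (a redirect arrives as a 3xx response carrying a Contact header) rather than consulting a dictionary, and the redirect limit is an arbitrary assumption to stop loops.

```python
from typing import Optional

# Hypothetical registrar contents for Fig.2: user -> ("ip", address) or ("redirect", new AOR)
Registrars = dict[str, dict[str, tuple[str, str]]]

FIG2_REGISTRARS: Registrars = {
    "a.edu": {"bob": ("ip", "217.123.56.89")},
    "b.edu": {"kate": ("redirect", "kate@c.edu")},  # kate is no longer registered here
    "c.edu": {"kate": ("ip", "197.87.54.21")},
}


def resolve(aor: str, registrars: Registrars,
            max_redirects: int = 5) -> tuple[Optional[str], list[str]]:
    """Follow registrar answers for an address of record; return (IP, domains consulted)."""
    consulted: list[str] = []
    for _ in range(max_redirects + 1):
        user, domain = aor.split("@")
        consulted.append(domain)
        answer = registrars.get(domain, {}).get(user)
        if answer is None:
            return None, consulted  # unknown user: the call cannot be completed
        kind, value = answer
        if kind == "ip":
            return value, consulted
        aor = value  # redirect: try the new address of record
    raise RuntimeError("too many redirects")


if __name__ == "__main__":
    print(resolve("kate@b.edu", FIG2_REGISTRARS))  # ('197.87.54.21', ['b.edu', 'c.edu'])
```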

Conclusion
In this report, we have briefly described VoIP techniques, including how to transport real-time voice using IP, various speech-coding techniques, and two signaling protocols (H.323 and SIP). Due to space limitations, however, many techniques are only named without further explanation, such as the various QoS techniques and MGCP and Megaco, which use a master-slave control-signaling paradigm. There are also many interesting and important topics that this short report does not cover, such as the VoIP issues raised by NAT and firewalls.

References
[1] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson. RTP: A transport protocol for real-time applications. IETF RFC 1889, 1996.
[2] R. Braden, L. Zhang, S. Berson, S. Herzog, and S. Jamin. Resource reservation protocol (RSVP) version 1 functional specification. IETF RFC 2205, 1997.
[3] D. Black, S. Blake, M. Carlson, E. Davies, Z. Wang, and W. Weiss. An architecture for differentiated services. IETF RFC 2475, 1998.
[4] E. Rosen, A. Viswanathan, and R. Callon. Multiprotocol Label Switching Architecture. IETF RFC 3031, 2001.
[5] F. Le Faucheur et al. MPLS support of differentiated services. IETF RFC 3270, 2002.
[6] H. Schulzrinne and S. Casner. RTP Profile for Audio and Video Conferences with Minimal Control. IETF RFC 1890, 1996.
[7] M. Perkins, K. Evans, D. Pascal, and L. Thorpe. Characterizing the subjective performance of the ITU-T 8 kb/s speech-coding algorithm ITU-T G.729. IEEE Commun. Mag., vol. 35, pp. 74-81, Sept. 1997.
[8] ITU-T. Pulse code modulation (PCM) of voice frequencies. Recommendation G.711, 1988.
[9] ITU-T. 40, 32, 24, 16 kbit/s Adaptive Differential Pulse Code Modulation (ADPCM). Recommendation G.726, Dec. 1990.
[10] ITU-T. Coding of speech at 16 kbit/s using low-delay code excited linear prediction. Recommendation G.728, Dec. 1990.
[11] ITU-T. Dual rate speech coder for multimedia communications transmitting at 5.3 and 6.3 kbit/s. Recommendation G.723.1, Mar. 1996.
[12] ITU-T. Coding of speech at 8 kbit/s using Conjugate-Structure Algebraic-Code-Excited Linear Prediction (CS-ACELP). Recommendation G.729, Mar. 1996.
[13] ITU-T. Packet-based multimedia communications systems. Recommendation H.323, Feb. 1998.
[14] ITU-T. Call signaling protocols and media stream packetization for packet-based multimedia communication systems. Recommendation H.225.0, Feb. 1998.
[15] ITU-T. Control protocol for multimedia communication. Recommendation H.245, Sep. 1998.
[16] ITU-T. ISDN user-network interface layer 3 specification for basic call control. Recommendation Q.931, 1993.
[17] J. Rosenberg, H. Schulzrinne, Camarillo, Johnston, Peterson, Sparks, Handley, and Schooler. SIP: Session initiation protocol v.2.0. IETF RFC 3261, 2002.
[18] M. Handley and V. Jacobson. SDP: Session description protocol. IETF RFC 2327, 1998.
[19] H. Schulzrinne et al. Real Time Streaming Protocol (RTSP). IETF RFC 2326, Apr. 1998.
[20] M. Arango, A. Dugan, I. Elliott, C. Huitema, and S. Pickett. Media gateway control protocol (MGCP) Version 1.0. IETF RFC 2705, 1999.
[21] N. Greene, M. Ramalho, and B. Rosen. Media gateway control protocol architecture and requirements. IETF RFC 2805, 2000.
[22] F. Cuervo, N. Greene, A. Rayhan, C. Huitema, B. Rosen, and J. Segers. Megaco Protocol Version 1.0. IETF RFC 3015, 2000.
[23] J. F. Kurose and K. W. Ross. Computer Networking. Third edition.
[24] A. S. Tanenbaum. Computer Networks. Fourth edition.
