stream. The figure shows DSPs that are used for mixed-mode conferencing. A DSP used for mixed-mode conferences allows participants to use different codecs: it mixes streams of the same codec type, mixes streams of different codec types, and provides transcoding functions. Because of this additional functionality, mixed-mode conferencing is more DSP-intensive, and a DSP supports fewer mixed-mode conferences than single-mode conferences.

A DSP used for single-mode conferences supports only one codec, which all conference participants must use, so it can mix only streams of that codec type. If devices with different codecs join the conference, separate DSPs must provide transcoding.

Transcoding services allow two devices that use different codecs to exchange voice information. As the previous example shows, transcoding is needed when conference resources support only single-mode conferences but participants use different codecs.

The figure shows a voice-mail system located at the headquarters of a company. The voice-mail system uses only G.711. The company has a branch office that connects to the headquarters over an IP WAN, and to conserve bandwidth the WAN permits only G.729. Branch users who access the voice-mail system can therefore use only G.729 toward the headquarters, while the voice-mail system requires G.711. DSPs in the headquarters router resolve this codec mismatch by transcoding: a call from the branch to the voice-mail system sets up a G.729 stream to the transcoding device (the headquarters router), which transcodes the received G.729 stream into a G.711 stream toward the voice-mail system.

2.3 Encapsulating Voice Packets for Transport

2.3.1 Voice Transport in Circuit-Switched Networks

In PSTN environments, residential telephones connect to central office (CO) switches on analog circuits. The core network is composed of switches that are interconnected by digital trunks, as illustrated in the figure.

When a caller places a call to a second telephone, the call setup stage occurs first. This stage sets up an end-to-end dedicated circuit (DS-0) for the call. The CO switch then converts the received analog signals into digital format using the G.711 codec.

During the transmission stage, G.711 bits are sent synchronously at a fixed rate with a very low, constant delay. The circuit dedicates its whole bandwidth (64 kbps) to the call, and because all bits follow the same path, all voice samples stay in order. When the call finishes, the switches release the individual DS-0 circuits, making them available for other calls.
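The 64 kbps figure for a DS-0 follows from how G.711 digitizes voice. The arithmetic can be sketched as follows, assuming the standard telephony sampling rate of 8000 samples per second and 8 bits per sample. Neither value is stated in the text above, and the constant and function names are illustrative only.

```python
# Assumed G.711 parameters (not given in the text above).
SAMPLE_RATE_HZ = 8000   # samples per second
BITS_PER_SAMPLE = 8     # each sample is encoded as one octet


def bit_rate_bps(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Return the constant bit rate of an uncompressed PCM stream."""
    return sample_rate_hz * bits_per_sample


ds0_rate = bit_rate_bps(SAMPLE_RATE_HZ, BITS_PER_SAMPLE)
print(ds0_rate)  # 64000 bits per second, i.e. 64 kbps
```

Because the circuit carries exactly this rate whether or not anyone is speaking, the bandwidth is reserved for the full duration of the call.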

2.3.2 Voice Transport in IP Networks

In VoIP networks, analog telephones connect to VoIP gateways through analog interfaces. The gateways connect through an IP network, as shown in the figure. IP phones connect to switches, and the switches in turn connect directly to routers.

When a caller places a call from one telephone to another, the call setup stage sets the call up logically, but no dedicated circuits (lines) are associated with the call. The gateway then converts the received analog signals into digital format using a codec, such as G.711 or G.729 with voice compression.

During the transmission stage, voice gateways insert voice samples into data packets and then send the data packets, one by one, out to the network. The links between the individual routers are not time-division multiplexed into separate circuits; each is a single high-bandwidth circuit that carries IP packets from several devices. As shown in the figure, data and voice packets share the same path and the same links.

Voice packets enter the network at a constant rate (which is lower than the physical line speed, leaving space for other packets). However, the packets may arrive at their destination at varying rates. Each packet encounters different delays on the route to the destination, and packets may even take different routes to the same destination. This unpredictable variation in packet arrival times is called jitter.

For voice to play back accurately, the destination router has two tasks to complete: it must reinsert the correct time intervals between packets, and it must ensure that the packets are in the correct order.

After the call is complete, the gateway on the side of the caller who hung up first logically tears down the call and stops sending voice packets onto the network.
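To make the idea of jitter concrete, the following sketch computes how far the gaps between arriving packets drift from the constant sending interval. The 20 ms packet interval and the per-packet delays are hypothetical values chosen for illustration, and the mean absolute deviation used here is only one simple way to summarize jitter.

```python
from statistics import mean

PACKET_INTERVAL_MS = 20  # hypothetical constant sending interval


def interarrival_gaps(arrival_times_ms):
    """Return the time between each pair of consecutive arrivals."""
    return [b - a for a, b in zip(arrival_times_ms, arrival_times_ms[1:])]


# Packets leave the gateway every 20 ms ...
send_times = [i * PACKET_INTERVAL_MS for i in range(5)]   # 0, 20, 40, 60, 80
# ... but each one experiences a different (hypothetical) network delay.
delays = [30, 42, 28, 51, 33]
arrivals = [s + d for s, d in zip(send_times, delays)]    # 30, 62, 68, 111, 113

gaps = interarrival_gaps(arrivals)
# Mean absolute deviation of the arrival gaps from the sending interval.
jitter_ms = mean(abs(g - PACKET_INTERVAL_MS) for g in gaps)

print(gaps)       # [32, 6, 43, 2]
print(jitter_ms)  # 16.75
```

Although every packet was sent exactly 20 ms after the previous one, the receiver sees gaps ranging from 2 ms to 43 ms, which is why it has to restore the original spacing before playback.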

2.3.3 Protocols Used in Voice Encapsulation

IP is not well suited to voice transmission. Real-time applications such as voice and video require a guaranteed connection with consistent and predictable delay characteristics. IP does not guarantee reliability, flow control, error detection, or error correction. As a result, packets (or datagrams) can arrive at the destination out of sequence or with errors, or not arrive at all.

Two transport layer protocols are available to help overcome the inherent weaknesses of IP. Both TCP and UDP enable the transmission of information between the correct processes (or applications) on host computers. These processes are associated with unique port numbers (for example, the HTTP application is usually associated with port 80). However, only UDP is suitable for VoIP applications.

TCP offers both connection-oriented and reliable transmission. TCP establishes a communications path prior to transmitting data, and it handles sequencing and error detection to ensure that the destination application receives a reliable stream of data. However, voice is a real-time application. If a voice packet is lost, a TCP retransmission, which is triggered only when a retransmission timer expires, arrives too late to be useful. In such a situation, it is better to lose a few packets (which briefly degrades quality) than to resend the packet seconds later. For VoIP, it is more important that packets arrive at the destination application in the correct sequence and with predictable delay characteristics than that every packet arrives.

UDP, like IP, is a connectionless protocol. UDP delivers data to its correct destination port but does not attempt to perform any sequencing or to ensure data reliability.

The timing, or rather the relative timing, that VoIP devices require to reassemble packets is also important. For example, jitter comes from variation in the delay that individual packets in the data stream experience. To reduce the effects of jitter, a VoIP device can buffer data at the receiving end of the link so that the data plays out at a constant rate. Two protocols, Real-time Transport Protocol (RTP) and RTP Control Protocol (RTCP), handle these tasks:

RTP transports the digitized samples of real-time information.
RTCP provides feedback on the quality of the transmission link.
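As an illustration of what RTP adds to each voice packet, the sketch below builds and parses the 12-byte fixed RTP header defined in RFC 3550, which carries a sequence number and a timestamp alongside the payload. The helper names `pack_rtp` and `unpack_rtp` are hypothetical. The sketch assumes no padding, no header extension, and no CSRC entries, and the 160-byte payload assumes 20 ms of G.711 audio per packet.

```python
import struct

RTP_VERSION = 2
# Fixed header layout: flags byte, marker/payload-type byte,
# 16-bit sequence number, 32-bit timestamp, 32-bit SSRC (network byte order).
RTP_HEADER = struct.Struct("!BBHII")


def pack_rtp(payload_type, seq, timestamp, ssrc, payload, marker=False):
    """Build an RTP packet with a 12-byte fixed header (illustrative helper)."""
    byte0 = RTP_VERSION << 6  # padding, extension, and CSRC count all zero
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = RTP_HEADER.pack(
        byte0, byte1, seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF
    )
    return header + payload


def unpack_rtp(packet):
    """Parse the fixed header; assumes the packet carries no CSRC list."""
    byte0, byte1, seq, timestamp, ssrc = RTP_HEADER.unpack_from(packet)
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[RTP_HEADER.size:],
    }


# Payload type 0 is PCMU (G.711 mu-law) in the standard audio/video profile.
packet = pack_rtp(payload_type=0, seq=1, timestamp=160,
                  ssrc=0x1234ABCD, payload=bytes(160))
fields = unpack_rtp(packet)
print(len(packet), fields["seq"], fields["timestamp"])  # 172 1 160
```

In practice such a packet would be handed to UDP for delivery; the sequence number and timestamp are what allow the receiver to compensate for the reordering and jitter that UDP and IP leave uncorrected.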

Note
RTP and RTCP do not reduce the overall delay of the real-time information, nor do they make any guarantees concerning quality of service.

RTP has another important function: it allows packets to be reordered. In an IP network, packets can arrive in a different order than they were transmitted, and real-time applications must know the relative time of packet transmission. RTP sequence-numbers and time-stamps packets to provide these benefits:

The packets can be correctly reordered.
The packets can have appropriate delays inserted between packets.
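The two benefits above can be sketched as a very simple playout scheduler: packets are sorted by RTP sequence number, and each one is assigned a playback time derived from its RTP timestamp. This is a minimal sketch under stated assumptions, not how any particular device implements its jitter buffer. The function name, the fixed 60 ms playout delay, and the 8000 Hz timestamp clock are illustrative choices, and sequence-number wraparound and lost packets are ignored.

```python
CLOCK_RATE_HZ = 8000    # assumed RTP timestamp clock for narrowband audio
PLAYOUT_DELAY_MS = 60   # hypothetical fixed buffering delay


def playout_schedule(received, first_arrival_ms,
                     playout_delay_ms=PLAYOUT_DELAY_MS):
    """Return (seq, play_time_ms) pairs in playback order.

    `received` is a list of (seq, timestamp) tuples in arrival order.
    """
    # Restore the transmission order using the sequence numbers.
    ordered = sorted(received, key=lambda pkt: pkt[0])
    base_ts = ordered[0][1]
    start_ms = first_arrival_ms + playout_delay_ms
    # Space the packets according to their timestamps, not their arrival times.
    return [
        (seq, start_ms + (ts - base_ts) * 1000 / CLOCK_RATE_HZ)
        for seq, ts in ordered
    ]


# Packets 2 and 3 arrive swapped; timestamps advance by 160 ticks (20 ms).
received = [(1, 160), (3, 480), (2, 320), (4, 640)]
schedule = playout_schedule(received, first_arrival_ms=30)
print(schedule)  # [(1, 90.0), (2, 110.0), (3, 130.0), (4, 150.0)]
```

The output shows the packets back in order and evenly spaced 20 ms apart, at the cost of the extra buffering delay added before playback begins.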

Before the VoIP device passes the packet payload to the application, the device must put the packets in the correct order. TCP also provides the functionality that is needed to ensure the correct arrival order, but its bandwidth overhead is too high for it to be an option in VoIP. With RTP, the receiving device buffers voice packets and delivers them in the correct order. The TCP overhead that is needed to provide reliable transport is considerable and must