accommodate received packets being acknowledged, sent packets being kept in a retransmission buffer until the receiver acknowledges receipt, and so on. The larger TCP header (20 bytes) consumes more bandwidth than the smaller UDP header (8 bytes). Also, voice transmission does not need the retransmission functionality of TCP once call setup is complete. The smaller UDP and RTP headers carry no acknowledgment or retransmission fields, so UDP and RTP do not provide reliable transport.

A VoIP device can have multiple active calls and must track which packets belong to each call. UDP port numbers provide this multiplexing capability: the port number identifies the call to which a packet belongs. During call setup, the VoIP device negotiates UDP port numbers for each call and ensures that the port numbers are unique among all currently active calls. The UDP port numbers that are used for RTP are in the range of 16,384 to 32,767.
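The port-based multiplexing described above can be sketched as a small lookup table. This is only an illustration: the class `CallPortTable`, its method names, and the lowest-free-port policy are hypothetical and do not come from any real VoIP stack. Only the port range is taken from the text.

```python
RTP_PORT_MIN = 16384  # lower bound of the RTP port range given in the text
RTP_PORT_MAX = 32767  # upper bound of the RTP port range given in the text


class CallPortTable:
    """Hypothetical table that maps one UDP port to one active call."""

    def __init__(self):
        self._port_to_call = {}

    def open_call(self, call_id):
        """Reserve the lowest unused port for a new call and return it."""
        for port in range(RTP_PORT_MIN, RTP_PORT_MAX + 1):
            if port not in self._port_to_call:
                self._port_to_call[port] = call_id
                return port
        raise RuntimeError("no free RTP port available")

    def close_call(self, port):
        """Release the port when the call ends."""
        self._port_to_call.pop(port, None)

    def call_for_packet(self, dst_port):
        """Return the call that owns the packet's destination port, or None."""
        return self._port_to_call.get(dst_port)
```

Because each active call holds a distinct port, a single dictionary lookup on the destination port is enough to assign an incoming packet to its call.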

2.3 Encapsulating Voice Packets for Transport
2.3.4 Voice Encapsulation Codecs

VoIP devices encapsulate voice into RTP and UDP before adding the IP header. The size of the whole VoIP packet depends on the codec that is used and the amount of voice that is packetized. The length of the samples can vary, but for voice, samples representing 20 ms are the maximum duration for the payload. The selection of this payload duration is a compromise between bandwidth requirements and quality. Smaller payloads demand proportionately higher bandwidth per channel because the header length remains at 40 octets. However, if payloads increase, the overall delay in the system increases, and the system is more susceptible to the loss of individual packets by the network.

The figure shows voice samples encapsulated in IP packets with UDP and RTP support. Each example uses a different codec and is based on a default of 20 ms of voice per packet.

When analog signals are digitized using the G.711 codec, 20 ms of voice consists of 160 samples, with each sample measuring 8 bits. The result is 160 bytes of voice information. These G.711 samples (160 bytes) are encapsulated into an RTP header (12 bytes), a UDP header (8 bytes), and an IP header (20 bytes). Therefore, the whole IP packet carrying UDP, RTP, and the voice payload using G.711 has a size of 200 bytes.

In contrast, 20 ms of voice encoded with the G.729 codec consists of 160 samples, where each group of 10 samples is represented by a 10-bit code word. The result is 160 bits (20 bytes) of voice information. These G.729 code words (20 bytes) are encapsulated into an RTP header (12 bytes), a UDP header (8 bytes), and an IP header (20 bytes). Therefore, the whole IP packet carrying UDP, RTP, and the voice payload using G.729 has a size of 60 bytes.
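The packet-size arithmetic above can be checked with a few lines of code. This is a minimal sketch; the function names are chosen for illustration, and the codec rates (64 kbps for G.711, 8 kbps for G.729) are the nominal rates used elsewhere in this section.

```python
IP_HEADER_BYTES = 20
UDP_HEADER_BYTES = 8
RTP_HEADER_BYTES = 12
HEADER_BYTES = IP_HEADER_BYTES + UDP_HEADER_BYTES + RTP_HEADER_BYTES  # 40


def payload_bytes(codec_bps, packet_ms=20):
    """Bytes of encoded voice carried in one packet."""
    bits = codec_bps * packet_ms // 1000
    return bits // 8


def packet_bytes(codec_bps, packet_ms=20):
    """Size of the whole IP packet, excluding Layer 2 overhead."""
    return payload_bytes(codec_bps, packet_ms) + HEADER_BYTES


print(packet_bytes(64_000))  # G.711: 160-byte payload + 40 bytes of headers
print(packet_bytes(8_000))   # G.729: 20-byte payload + 40 bytes of headers
```

With the default of 20 ms of voice per packet, the two calls reproduce the 200-byte and 60-byte packet sizes given in the text.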

2.3.5 Reducing Header Overhead with cRTP

The combined overhead of the IP, UDP, and RTP headers is high, especially because voice travels in relatively small packets and at high packet rates. The figure shows the voice packet structure using G.729 and G.711. When G.729 is used, the headers (40 bytes) are twice the size of the voice payload (20 bytes), so the pure voice bandwidth of the G.729 codec (8 kbps) is tripled for the whole IP packet. Without Layer 2 overhead, a G.729 call requires 24 kbps. This total is still not the final bandwidth requirement, because Layer 2 overhead must also be included.

When G.711 is used, the ratio of header to payload is smaller because of the larger voice payload. Headers of 40 bytes are added to 160 bytes of payload, so one-quarter of the G.711 codec bandwidth (64 kbps), or 16 kbps, must be added. Without Layer 2 overhead, a G.711 call requires 80 kbps.

The figure shows how RTP header compression (cRTP) reduces the bandwidth overhead that is caused by the IP, UDP, and RTP headers. The name can be misleading because cRTP compresses not only the RTP header but also the IP and UDP headers.

cRTP is configured on a link-by-link basis, so it is possible to use cRTP on just some links within your IP network. When a router receives cRTP packets on one interface and routes them out another interface that is also configured for cRTP, it has to decompress each packet at the first interface and then compress it again at the second interface.

cRTP compresses the IP, UDP, and RTP headers from 40 bytes to 2 bytes if the UDP checksum is not preserved (the default setting on Cisco devices) and to 4 bytes if the UDP checksum is also transmitted. cRTP is especially beneficial when the RTP payload size is small; for example, with compressed audio payloads between 20 and 50 bytes.
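The per-call bandwidth figures above, and the effect of shrinking the 40-byte header to 2 or 4 bytes, can be reproduced with a short calculation. This is a sketch under the same assumptions as the text: 20 ms of voice per packet and no Layer 2 overhead. The function name is illustrative.

```python
def call_bandwidth_kbps(codec_bps, header_bytes=40, packet_ms=20):
    """Bandwidth of one voice stream in kbps, excluding Layer 2 overhead."""
    payload = codec_bps * packet_ms / 1000 / 8  # bytes of voice per packet
    packets_per_second = 1000 / packet_ms       # 50 packets/s at 20 ms
    return (payload + header_bytes) * 8 * packets_per_second / 1000


# Uncompressed IP/UDP/RTP headers (40 bytes)
print(call_bandwidth_kbps(8_000))    # G.729
print(call_bandwidth_kbps(64_000))   # G.711

# cRTP headers: 2 bytes without the UDP checksum, 4 bytes with it
print(call_bandwidth_kbps(8_000, header_bytes=2))
print(call_bandwidth_kbps(64_000, header_bytes=4))
```

The first two calls give the 24 kbps and 80 kbps stated in the text. With a 2-byte compressed header, the G.729 stream drops to about 8.8 kbps, which shows why the benefit is largest for small payloads.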
cRTP works on the premise that most of the fields in the IP, UDP, and RTP headers do not change or that the change is predictable. Static fields in the headers include source and destination IP address, source and destination UDP port numbers, and many other fields in all three headers. The RTP Header Compression Process table shows the cRTP process for the fields where the change occurs.

RTP Header Compression Process

Condition | Action
The change is predictable. | The sending side tracks the predicted change.
The predicted change is tracked. | The sending side sends a hash of the header.
The receiving side predicts what the constant change is. | The receiving side substitutes the original stored header and calculates the changed fields.
There is an unexpected change. | The sending side sends the entire header without compression.

These examples illustrate the impact of header compression under various conditions.

Example: cRTP with G.729, without UDP Checksum
When cRTP is used for G.729 voice streams without preserving the UDP checksum, 20 bytes of voice are carried with a 2-byte cRTP header. In this case, the overhead is 10 percent; uncompressed encapsulation adds 200 percent of overhead.

Example: cRTP with G.711, with UDP Checksum
When cRTP is used for G.711 voice streams preserving the UDP checksum, 160 bytes of voice are carried with a 4-byte cRTP header. The overhead in this case is 2.5 percent; uncompressed encapsulation adds 25 percent of overhead.
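The compression process in the table earlier in this section can be illustrated with a toy model. The sketch below is not the cRTP wire format. It tracks only two changing RTP fields (sequence number and timestamp), assumes that both sides already agree on a fixed timestamp step (`ts_step`, 160 here), ignores sequence-number wraparound, and sends nothing at all for a predictable packet instead of a short compressed header. All class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RtpFields:
    """Only the header fields that change from packet to packet."""
    seq: int
    timestamp: int


class Compressor:
    """Sending side: tracks the predicted change."""

    def __init__(self, ts_step):
        self.ts_step = ts_step
        self.last = None

    def send(self, hdr):
        predicted = (
            None if self.last is None
            else RtpFields(self.last.seq + 1, self.last.timestamp + self.ts_step)
        )
        self.last = hdr
        if hdr == predicted:
            return ("COMPRESSED", None)  # predictable change: omit the fields
        return ("FULL", hdr)             # unexpected change: send entire header


class Decompressor:
    """Receiving side: rebuilds the header from its stored copy."""

    def __init__(self, ts_step):
        self.ts_step = ts_step
        self.last = None

    def receive(self, kind, hdr):
        if kind == "FULL":
            self.last = hdr
        else:
            self.last = RtpFields(self.last.seq + 1,
                                  self.last.timestamp + self.ts_step)
        return self.last
```

Feeding the compressor a stream whose sequence number rises by 1 and whose timestamp rises by `ts_step` yields one full header followed by compressed packets. A jump in either field forces another full header, matching the last row of the table.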

2.3.6 When to Use RTP Header Compression

cRTP reduces overhead for multimedia RTP traffic, which results in a corresponding reduction in delay. cRTP is especially beneficial when the RTP payload size is small, as with audio payloads. Use RTP header compression on any WAN interface where bandwidth is a concern and where there is a high proportion of RTP traffic. Despite the advantages, consider the following factors before enabling cRTP:

- Use cRTP when you need to conserve bandwidth on your WAN links, but enable cRTP only on slow links (less than 2 Mbps).
- Consider the disadvantages of cRTP:
  - cRTP adds to processing overhead; check the available resources on your routers before turning on cRTP.
  - cRTP introduces additional delays because of the time it takes to perform compression and decompression.
- Tune cRTP by limiting the number of sessions that are compressed on the interface. The default is 16 sessions. If your router CPU cannot manage 16 sessions, lower the number of cRTP sessions. If the router has enough CPU power and you want to compress more than 16 sessions on a link, set the parameter to a higher value.

These points are summarized in the figure.

2.4 Calculating Bandwidth Requirements for VoIP
2.4.1 Impact of Voice Samples and Packet Size on Bandwidth

When a VoIP device sends voice over packet networks, the device encapsulates digitized voice information into IP packets. This encapsulation overhead requires extra bandwidth as determined