analog voice interfaces. Figure illustrates a call from an analog telephone (Phone1) connected to a router (R1) to an analog telephone (Phone2) connected to another router (R2). The two routers connect to an IP network. When router R2 receives the IP packets carrying digitized voice, it converts the packets back to analog signals. The analog signals go to Phone2 and play through the telephone's speaker, so the user at Phone2 hears the original speech.

The digital-to-analog process is the reverse of the analog-to-digital conversion. DSPs on the voice interface cards of the voice-enabled routers convert the digital signals to analog signals. Figure summarizes the steps:

Step 1 Decompression: Any compressed voice samples are first decompressed. This step is needed only when the samples were compressed.
Step 2 Decoding: The DSPs on the voice interface card decode the digital voice samples into the amplitude values of the samples and then rebuild a PAM signal of the original amplitude.
Step 3 Reconstruction of the analog signal: The DSP passes the PAM signal through a properly designed filter that removes the discrete digital steps from the output and produces a smooth analog signal that mirrors the original analog waveform reconstructed from its digitally coded counterpart.
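Step 3 can be illustrated with a minimal Python sketch. It assumes the decoded PAM values are already available as floating-point amplitudes and uses Whittaker-Shannon (sinc) interpolation, the textbook model of an ideal low-pass reconstruction filter. Real voice interface DSPs use practical finite filters instead, and the function names here are hypothetical.

```python
import math
from typing import Sequence


def sinc(x: float) -> float:
    """Normalized sinc function: sin(pi*x) / (pi*x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def reconstruct(pam: Sequence[float], fs: float, t: float) -> float:
    """Estimate the analog signal at time t (seconds) from PAM samples taken at fs Hz.

    Each decoded sample contributes one sinc pulse centered on its sampling
    instant; summing the pulses is Whittaker-Shannon interpolation, which
    models an ideal low-pass reconstruction filter with cutoff fs / 2.
    """
    return sum(value * sinc(t * fs - n) for n, value in enumerate(pam))


# Usage: a 1000-Hz tone sampled at 8000 Hz for 0.1 s (800 PAM samples)
fs = 8000.0
pam = [math.sin(2 * math.pi * 1000.0 * n / fs) for n in range(800)]
t = 400.5 / fs  # a point halfway between two sampling instants
print(reconstruct(pam, fs, t), math.sin(2 * math.pi * 1000.0 * t))
```

Between sampling instants the filter fills in the smooth waveform rather than holding the previous step, which is what "removes the discrete digital steps" means. Because the sketch uses only a finite number of samples, values near the ends of the record are less accurate than values in the middle.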

2.2.3 Sampling

When a DSP converts an analog signal to digital form, the DSP first samples the analog signal. The sampling rate affects the quality of the digitized signal: if the sampling rate is too low, the DSP captures too little information and the resulting quality is degraded. The Nyquist theorem is the basis of analog-to-digital conversion. In simple terms, the Nyquist theorem states that a signal can be reconstructed from its samples if the sampling frequency is greater than twice the signal bandwidth. In practice, reconstruction is neither perfect nor exact, so engineers select sampling rates that meet the practical requirements of specific applications. This topic describes how to select a practical sampling rate.

Figure illustrates two situations. In the first situation, the sampling rate is too low and the reconstructed information is imprecise, so practical reconstruction is impossible. In the second situation, a higher sampling rate is used, and the resulting PAM signals represent the original waveform; this situation allows for practical reconstruction. The Nyquist theorem predicts how a DSP works: when the DSP samples a signal instantaneously at regular intervals and at a rate of at least twice the highest channel frequency, the samples contain sufficient information to allow an accurate reconstruction of the signal at the receiver.

The example in Figure illustrates how engineers arrived at a rate of 8000 samples per second for telephony applications. Although the human ear can sense sounds from 20 to 20,000 Hz, and speech encompasses sounds from about 200 to 9000 Hz, telephone channels operate at about 300 to 3400 Hz. This economical range carries enough fidelity to allow callers to identify the party at the other end of the connection and to sense the other party’s mood.
To capture the higher-frequency sounds that the telephone channel can deliver, the highest frequency for voice was set to 4000 Hz. Applying the Nyquist theorem gives a sampling rate of 2 × 4000 = 8000 samples per second, that is, one sample every 125 microseconds (µs). Sampling above the Nyquist rate is called oversampling.
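The sampling arithmetic, and what goes wrong when the rate is too low, can be checked with a short Python sketch. The helper names are hypothetical; `alias_frequency` models a pure tone and shows how a frequency above half the sampling rate folds back (aliases) into the voice band, which is why too low a sampling rate cannot be reconstructed.

```python
def nyquist_rate(highest_hz: float) -> float:
    """Minimum sampling rate (samples per second) for a band limited to highest_hz."""
    return 2.0 * highest_hz


def sampling_interval_us(rate_hz: float) -> float:
    """Time between consecutive samples, in microseconds."""
    return 1e6 / rate_hz


def alias_frequency(signal_hz: float, rate_hz: float) -> float:
    """Apparent frequency of a pure tone after sampling at rate_hz.

    Tones above rate_hz / 2 fold back into the 0..rate_hz/2 band (aliasing).
    """
    folded = signal_hz % rate_hz
    return min(folded, rate_hz - folded)


rate = nyquist_rate(4000)          # telephony: highest voice frequency set to 4000 Hz
print(rate, sampling_interval_us(rate))
print(alias_frequency(3400, rate))  # inside the band: unchanged
print(alias_frequency(5000, rate))  # above rate / 2: appears as a lower, false tone
```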

2.2.4 Quantization

Telephony applications use a sampling rate of 8000 Hz (8 kHz) to convert an analog signal to a digital format. The DSP must round the value of each sample to the nearest integer on a scale whose granularity depends on the resolution of the signal. The DSP then converts the integers to binary numbers. Quantization is the process of selecting those binary numbers to represent the voltage level of each sample (the pulse amplitude modulation [PAM] value). In a sense, DSPs use quantization to approximate analog sounds to the nearest available binary value: the DSP selects the whole number that is closest to the signal level at the instant the signal is sampled, rounding each PAM value up or down to the step that is closest to the original analog signal. The difference between the original analog signal and the assigned quantization level is called quantization error or quantization noise. This difference is a source of distortion in digital transmission systems.

Note
Noise and distortion are different phenomena. Distortion is any change in the signal that makes the output different from the original. Noise is any additional signal added to the original. Noise is a form of error that is not directly related to the input signal; in other words, noise is uncorrelated with the input signal. Unlike distortion, noise is random with respect to the signal because it comes from outside the input signal. Because distortion is correlated with the signal, it often sounds “meaningful” even though it is not, which makes it difficult to separate from noise and can make it more distracting in an audio signal than noise. In practice, the term noise is often used in place of distortion.

Telephony applications usually use 8-bit quantization. DSPs represent all possible values of the analog waveform with 256 distinct voltage values, each represented by an 8-bit binary number. These approximations are not an exact duplication of the analog waveform and contain quantization errors (noise). By comparison, compact discs use 16-bit quantization, which allows for 65,536 distinct voltage levels. Although 8-bit quantization is crude and introduces substantial quantization noise into the signal, the result is still more than adequate to represent human speech in telephony applications.

Figure depicts quantization. In this example, the x-axis of the chart is time and the y-axis is the voltage value (PAM). The example shows quantization noise at every sample that does not exactly match one of the steps.

Another important term is signal-to-noise ratio. Signal-to-noise ratio (SNR) is the ratio of a given transmitted signal to the background noise of the transmission medium. An unfortunate reality of uniform quantization is that SNR is lower at small signal amplitudes, because a small signal uses a smaller portion of the available dynamic range. The quantization error is therefore proportionally large relative to the signal.
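Uniform 8-bit quantization and its SNR problem can be demonstrated with a minimal Python sketch. It assumes samples normalized to the range -1.0 to 1.0, a round-to-nearest quantizer with 256 levels (codes -128 to 127), and a 1004-Hz test tone chosen so that the samples do not repeat every few cycles. The helper names are hypothetical and do not come from any DSP firmware.

```python
import math

BITS = 8
STEP = 2.0 / 2 ** BITS  # spacing between the 256 levels for signals in [-1.0, 1.0)


def quantize(x: float) -> int:
    """Round a normalized sample to the nearest of 256 uniform levels (codes -128..127)."""
    code = round(x / STEP)
    return max(-128, min(127, code))  # clip values outside the 8-bit range


def uniform_snr_db(amplitude: float, n: int = 8000) -> float:
    """SNR of a sine wave with the given peak amplitude after uniform 8-bit quantization."""
    signal_power = 0.0
    noise_power = 0.0
    for k in range(n):
        x = amplitude * math.sin(2 * math.pi * 1004 * k / 8000)  # 1004-Hz tone at 8000 Hz
        error = x - quantize(x) * STEP  # quantization error (noise) for this sample
        signal_power += x * x
        noise_power += error * error
    return 10 * math.log10(signal_power / noise_power)


for amplitude in (0.9, 0.1, 0.01):
    print(f"peak {amplitude:>4}: SNR {uniform_snr_db(amplitude):5.1f} dB")
```

The absolute size of the quantization error stays roughly the same for every amplitude, so as the tone gets quieter its SNR drops sharply.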
To avoid these SNR issues, engineers use a logarithmic scale to provide better granularity for smaller signals, resulting in a more uniform SNR for all signals.
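One common logarithmic scale is the continuous μ-law curve with μ = 255, which G.711 μ-law approximates with segments. The Python sketch below assumes that curve (our choice of technique, not something this section specifies): it compresses each sample logarithmically, quantizes the compressed value with the same kind of uniform 8-bit quantizer, and expands it back. Function names are hypothetical.

```python
import math

MU = 255.0         # assumption: the mu = 255 curve used by mu-law
STEP = 2.0 / 256   # 8-bit uniform quantizer applied to the compressed value


def compress(x: float) -> float:
    """Logarithmic (mu-law) compression of a sample normalized to [-1.0, 1.0]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def expand(y: float) -> float:
    """Inverse of compress(): map a compressed value back to the linear scale."""
    return math.copysign(((1.0 + MU) ** abs(y) - 1.0) / MU, y)


def mulaw_snr_db(amplitude: float, n: int = 8000) -> float:
    """SNR of a sine wave after compress -> 8-bit uniform quantize -> expand."""
    signal_power = 0.0
    noise_power = 0.0
    for k in range(n):
        x = amplitude * math.sin(2 * math.pi * 1004 * k / 8000)
        code = max(-128, min(127, round(compress(x) / STEP)))  # 8-bit code of compressed value
        error = x - expand(code * STEP)
        signal_power += x * x
        noise_power += error * error
    return 10 * math.log10(signal_power / noise_power)


for amplitude in (0.9, 0.1, 0.01):
    print(f"peak {amplitude:>4}: SNR {mulaw_snr_db(amplitude):5.1f} dB")
```

Small signals land on finely spaced levels and large signals on coarsely spaced ones, so in this sketch the SNR varies far less with amplitude than it does with uniform quantization. The trade-off is a somewhat lower SNR for the loudest signals.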

2.2.5 Digital Voice Encoding

Digital voice samples are represented by 8 bits per sample. Each sample is encoded in the following way:

One polarity bit: Indicates positive or negative signals
Three segment bits: Identify the logarithmically sized segment number (0–7)
Four step bits: Identify the linear step within a segment
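The 1 + 3 + 4 bit layout can be sketched in Python. This is a simplified, hypothetical encoder modeled on the μ-law segment scheme. It assumes a 16-bit signed linear PCM input, and the bias and clip constants follow the widely used μ-law reference approach. Unlike a standard G.711 μ-law codeword, it does not invert the bits before transmission.

```python
BIAS = 0x84   # 132: offset added so that segment boundaries fall on powers of two
CLIP = 32635  # largest magnitude that still fits in 15 bits after adding BIAS


def encode_sample(sample: int) -> int:
    """Pack a 16-bit signed linear sample into 8 bits: polarity | segment | step."""
    polarity = 0x80 if sample >= 0 else 0x00      # 1 polarity bit
    magnitude = min(abs(sample), CLIP) + BIAS      # 132..32767
    segment = magnitude.bit_length() - 8           # 3 segment bits: 0..7, widths roughly double
    step = (magnitude >> (segment + 3)) & 0x0F     # 4 step bits: linear position in the segment
    return polarity | (segment << 4) | step


def fields(code: int) -> tuple[int, int, int]:
    """Split an encoded byte back into (polarity, segment, step)."""
    return code >> 7, (code >> 4) & 0x07, code & 0x0F


for s in (0, 1000, -1000, 32767):
    print(s, format(encode_sample(s), "08b"), fields(encode_sample(s)))
```

The segment number selects a logarithmically sized range, and the step bits divide that range into 16 equal parts, so larger amplitudes are coded with coarser steps.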

Because telephony sampling takes 8000 samples per second and each sample is 8 bits, each call needs 64 kbps of bandwidth. This bandwidth need is why traditional circuit-switched telephony networks use time-division multiplexed (TDM) lines, which combine multiple 64-kbps channels (digital signal level 0 [DS-0]) on a single physical interface.
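Written out, the per-call bandwidth is simply the sampling rate multiplied by the sample size:

```latex
\underbrace{8000\ \tfrac{\text{samples}}{\text{s}}}_{\text{sampling rate}}
\times
\underbrace{8\ \tfrac{\text{bits}}{\text{sample}}}_{\text{sample size}}
= 64\,000\ \tfrac{\text{bits}}{\text{s}}
= 64\ \text{kbps} \quad (\text{one DS-0})
```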

2.2.6 Companding

Companding refers to the process of first compressing an analog signal at the source and then expanding this signal back to its original size when it reaches its destination. The term companding comes from combining the two terms, compressing and expanding, into one word. A compander compresses input analog signal samples into logarithmic segments. The compander then quantizes and codes each segment using uniform quantization. Bell Systems defined