The set of standards known as p x 64 [5] covers video, voice, and data; it includes the H.261 standard for compressed video, several standards for voice, and the H.221 framing standard for multimedia streams over serial links. Usually the data and the voice are multiplexed within an H.221 stream, which is isochronous. The native H.261 coding for the video is not strictly isochronous, though there are delay bounds. At present, most workstation-based software implementations treat the H.261 video and the audio separately. This allows us to route video and audio separately, for instance sending the audio via direct Internet multicast and the video via a CMMC, or to prioritise the video and audio differently, so that sites on low-bandwidth links receive only the audio. Moreover, because the H.221 framing standard was designed for circuit-switched environments, it assumes that the receiver can accept continuous streams of data; this framing is not well suited for input to a workstation [8].
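As a concrete illustration of this separate routing, the sketch below (group addresses, ports and TTLs are hypothetical) sends audio and video on distinct UDP multicast groups, so that a receiver on a low-bandwidth link can join the audio group alone and never see the video traffic:

    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    static int open_mcast(const char *group, unsigned short port,
                          unsigned char ttl, struct sockaddr_in *dst)
    {
        int s = socket(AF_INET, SOCK_DGRAM, 0);
        /* The TTL limits how far the traffic propagates; a smaller TTL
         * for video keeps it within better-connected parts of the net. */
        setsockopt(s, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, sizeof ttl);
        memset(dst, 0, sizeof *dst);
        dst->sin_family = AF_INET;
        dst->sin_addr.s_addr = inet_addr(group);
        dst->sin_port = htons(port);
        return s;
    }

    int main(void)
    {
        struct sockaddr_in adst, vdst;
        int audio = open_mcast("224.2.0.1", 3456, 127, &adst);
        int video = open_mcast("224.2.0.2", 3458, 15, &vdst);
        char abuf[160] = {0};   /* e.g. 20 ms of 8 kHz PCM audio       */
        char vbuf[1024] = {0};  /* e.g. one fragment of an H.261 frame */
        sendto(audio, abuf, sizeof abuf, 0, (struct sockaddr *)&adst, sizeof adst);
        sendto(video, vbuf, sizeof vbuf, 0, (struct sockaddr *)&vdst, sizeof vdst);
        close(audio);
        close(video);
        return 0;
    }

A receiver joins only the groups it wants (with IP_ADD_MEMBERSHIP), so audio-only reception on a slow link falls out naturally.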
Much more processing power is required for encoding video than for decoding it: whilst only a powerful workstation can be considered for encoding, decoding requires an order of magnitude less processing. Existing software implementations of H.261 can reduce the processing load further by omitting the search for motion vectors; this results in slightly lower-quality video, but brings the processing requirements within the bounds set by today's workstation technology.
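The saving from omitting the motion search can be made concrete. H.261 motion vectors cover a +/-15 pel window, so a full search evaluates 31 x 31 = 961 candidate blocks per macroblock, whereas always coding against the co-located block (a zero motion vector) needs just one. The following sketch (illustrative helper routines, not taken from any particular implementation) shows both alternatives:

    #include <stdlib.h>

    #define MB 16   /* a macroblock is 16 x 16 luminance samples */

    /* Sum of absolute differences between the current macroblock and a
     * candidate block displaced by (dx, dy) in the previous frame;
     * cur/prev point at the macroblock within full frames, so the
     * displaced addresses remain valid. */
    static long sad(const unsigned char *cur, const unsigned char *prev,
                    int stride, int dx, int dy)
    {
        long d = 0;
        for (int y = 0; y < MB; y++)
            for (int x = 0; x < MB; x++)
                d += abs(cur[y * stride + x] - prev[(y + dy) * stride + (x + dx)]);
        return d;
    }

    /* Full search: 961 SAD evaluations per macroblock. */
    static long full_search(const unsigned char *cur, const unsigned char *prev,
                            int stride, int *bx, int *by)
    {
        long best = sad(cur, prev, stride, 0, 0);
        *bx = *by = 0;
        for (int dy = -15; dy <= 15; dy++)
            for (int dx = -15; dx <= 15; dx++) {
                long d = sad(cur, prev, stride, dx, dy);
                if (d < best) { best = d; *bx = dx; *by = dy; }
            }
        return best;
    }

    /* Zero-vector alternative: one SAD evaluation, roughly three orders
     * of magnitude cheaper, at the cost of a worse prediction and hence
     * slightly lower picture quality at a given bit rate. */
    static long zero_mv(const unsigned char *cur, const unsigned char *prev,
                        int stride, int *bx, int *by)
    {
        *bx = *by = 0;
        return sad(cur, prev, stride, 0, 0);
    }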
For a point-to-point conference over a circuit-switched network, it makes sense to integrate the voice with the video. For multi-way conferencing over a packet-switched network, this is less suitable: voice is much more sensitive to delay and jitter than video, and it can be compressed to use much less bandwidth (between 4.8 and 16 kb/s). By exploiting silence suppression, and the fact that usually only one person speaks at a time, it is possible to serve a large audience with voice for a modest outlay of bandwidth, provided multicast is used. The five most recent IETF (Internet Engineering Task Force) meetings have demonstrated that audio can be multicast to audiences worldwide (up to 500 sites on five continents at the most recent one); these broadcasts have also demonstrated that some networks and routers can suffer from very bad delay variation.
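A minimal sketch of the silence suppression mentioned above follows. The frame size corresponds to 20 ms of 8 kHz audio; the noise floor and hangover length are assumptions that would be tuned per microphone and room:

    #include <stdlib.h>

    #define FRAME 160          /* 20 ms at 8000 samples/s */
    #define NOISE_FLOOR 200L   /* mean |sample| below this counts as silence */
    #define HANGOVER 8         /* keep sending ~160 ms after speech stops */

    static int frame_is_speech(const short *s)
    {
        long energy = 0;
        for (int i = 0; i < FRAME; i++)
            energy += labs((long)s[i]);
        return energy / FRAME > NOISE_FLOOR;
    }

    /* Decide per frame whether to transmit at all; the hangover avoids
     * clipping the tails of words when the energy momentarily drops. */
    static int should_send(const short *s)
    {
        static int hang = 0;
        if (frame_is_speech(s)) { hang = HANGOVER; return 1; }
        if (hang > 0) { hang--; return 1; }
        return 0;
    }

Since no packets at all are sent during silence, the aggregate load of a large conference stays close to that of the single active speaker.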
It is possible to engineer the packet-switched research networks at our disposal to ensure reasonable bounds on delay and jitter [9]. It is also possible to implement voice relays that mediate between sites with different coding schemes; heavier coding is often adopted where only reduced communication bandwidth is available. While the standard method for encoding voice is PCM, there are several coding standards for reducing its bandwidth; examples are CELP [10], ADPCM [11], LPC [12] and GSM [13]. We expect that many of these will be used, but we have not yet decided which we will support.
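Such a relay can be sketched as follows. The G.711 mu-law expansion is the standard algorithm; the low-rate coder here is a trivial placeholder (it simply keeps the top 8 bits of each sample) standing in for a real ADPCM, CELP, LPC or GSM encoder:

    #include <stddef.h>

    /* Expand one G.711 mu-law byte to a 16-bit linear sample. */
    static short ulaw_to_linear(unsigned char u)
    {
        int t;
        u = ~u;
        t = ((u & 0x0f) << 3) + 0x84;   /* mantissa plus bias      */
        t <<= (u & 0x70) >> 4;          /* apply the segment shift */
        return (u & 0x80) ? (0x84 - t) : (t - 0x84);
    }

    /* Placeholder low-rate coder: halves the bit rate by keeping only
     * the top 8 bits of each sample; a real relay would invoke an
     * ADPCM, CELP, LPC or GSM encoder at this point. */
    static size_t lowrate_encode(const short *pcm, size_t n, unsigned char *out)
    {
        for (size_t i = 0; i < n; i++)
            out[i] = (unsigned char)((pcm[i] >> 8) + 128);
        return n;
    }

    /* Relay one frame: 64 kb/s PCM in, a lower-rate stream out. */
    static size_t relay_frame(const unsigned char *ulaw, size_t n,
                              short *pcm, unsigned char *out)
    {
        for (size_t i = 0; i < n; i++)
            pcm[i] = ulaw_to_linear(ulaw[i]);
        return lowrate_encode(pcm, n, out);
    }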