Development of international standards for compression, decompression, processing, and coded representation of moving pictures, audio, and their combination, in order to satisfy a wide variety of applications.
Programme of work
- Serve as the responsible body within ISO/IEC for recommending a set of standards consistent with the Area of Work
- Cooperate with other standardisation bodies dealing with similar applications
- Consider requirements for interworking with other applications such as telecommunications and broadcasting, with other image coding algorithms defined by other SC29 working groups, and with other picture and audio coding algorithms defined by other standardisation bodies
- Define methods for the subjective assessment of the quality of audio, moving pictures and their combination for the purposes of the area of work
- Assess characteristics of implementation technologies realising coding algorithms for audio, moving pictures and their combination
- Assess characteristics of digital storage and other delivery media targeted by the standards developed by WG11
- Develop standards for the coding of moving pictures, audio and their combination, taking into account the quality of the coded media, effective implementation and constraints from delivery media
- Propose standards for the coded representation of moving picture information
- Propose standards for the coded representation of audio information
- Propose standards for the coded representation of information consisting of moving pictures and audio in combination
- Propose standards for protocols associated with the coded representation of moving pictures, audio and their combination
MPEG is a working group of ISO, the International Organisation for Standardisation. Its formal name is ISO/IEC JTC 1/SC 29/WG 11, and its title is "Coding of moving pictures and audio". The area of work assigned to it is the development of international standards for compression, decompression, processing, and coded representation of moving pictures, audio, and their combination, in order to satisfy a wide variety of applications.
MPEG has developed the following standards
11172 | (MPEG-1) | Coding of moving pictures and associated audio at up to about 1.5 Mbit/s |
| Part 1 | Systems |
| Part 2 | Video |
| Part 3 | Audio |
| Part 4 | Conformance testing |
| Part 5 | Software simulation |
13818 | (MPEG-2) | Generic coding of moving pictures and associated audio |
| Part 1 | Systems |
| Part 2 | Video |
| Part 3 | Audio |
| Part 4 | Conformance testing |
| Part 5 | Software simulation |
| Part 6 | System extensions - DSM-CC |
| Part 7 | Advanced Audio Coding |
| Part 8 | VOID - (withdrawn) |
| Part 9 | System extension RTI |
| Part 10 | Conformance extension - DSM-CC |
| Part 11 | IPMP on MPEG-2 Systems |
14496 | (MPEG-4) | Coding of audio-visual objects |
| Part 1 | Systems |
| Part 2 | Visual |
| Part 3 | Audio |
| Part 4 | Conformance testing |
| Part 5 | Reference Software |
| Part 6 | Delivery Multimedia Integration Framework |
| Part 7 | Optimised software for MPEG-4 tools |
| Part 8 | 4 on IP framework |
| Part 9 | Reference Hardware Description |
| Part 10 | Advanced Video Coding |
| Part 11 | Scene Description and Application Engine |
| Part 12 | ISO Base Media File Format |
| Part 13 | IPMP Extensions |
| Part 14 | MP4 File Format |
| Part 15 | AVC File Format |
| Part 16 | Animation Framework eXtension (AFX) |
| Part 17 | Streaming Text Format |
| Part 18 | Font compression and streaming |
| Part 19 | Synthesized Texture Stream |
| Part 20 | Lightweight Application Scene Representation |
| Part 21 | MPEG-J Extension for rendering |
| Part 22 | Open Font Format |
| Part 23 | Symbolic Music Representation |
| Part 24 | Audio-System interaction |
| Part 25 | 3D Graphics Compression Model |
| Part 26 | Audio Conformance |
| Part 27 | 3D Graphics Conformance |
15938 | (MPEG-7) | Multimedia Content Description Interface |
| Part 1 | Systems |
| Part 2 | Description Definition Language |
| Part 3 | Visual |
| Part 4 | Audio |
| Part 5 | Multimedia Description Schemes |
| Part 6 | Reference Software |
| Part 7 | Conformance |
| Part 8 | Extraction and Use of MPEG-7 Descriptions |
| Part 9 | Profiles |
| Part 10 | Schema definition |
| Part 11 | Profile schemas |
| Part 12 | Query Format |
21000 | (MPEG-21) | Multimedia Framework |
| Part 1 | Vision, Technologies and Strategy |
| Part 2 | Digital Item Declaration |
| Part 3 | Digital Item Identification |
| Part 4 | IPMP Components |
| Part 5 | Rights Expression Language |
| Part 6 | Rights Data Dictionary |
| Part 7 | Digital Item Adaptation |
| Part 8 | Reference Software |
| Part 9 | File Format |
| Part 10 | Digital Item Processing |
| Part 11 | Evaluation Tools for Persistent Association |
| Part 12 | Test Bed for MPEG-21 Resource Delivery |
| Part 13 | VOID - (to MPEG-4 part 10) |
| Part 14 | Conformance |
| Part 15 | Event reporting |
| Part 16 | Binary format |
| Part 17 | Fragment Identification |
| Part 18 | Digital Item Streaming |
| Part 19 | Media Value Chain Ontology |
MPEG-1 (ISO/IEC 11172)
The first MPEG work item was "Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s". In practice this meant a standard for efficient storage and retrieval of audio and video on compact disc. Part 1 (Systems), Part 2 (Video) and Part 3 (Audio) of the standard were approved at the November 1992 meeting in London. The Systems part provides multiplexing and synchronisation support for the elementary Audio and Video streams. The Video part provides efficient encoding of non-interlaced pictures with roughly VHS quality at 1.15 Mbit/s. The Audio part provides encoding of stereo audio with transparency (i.e. subjective quality similar to the original stereo) at 384, 256 and 192 kbit/s for Layers I, II and III respectively.
Part 4 of the standard, "Conformance Testing", which provides methods and reference bitstreams that can be used to assess the conformance of a bitstream or of a decoder, was approved one year later. Part 5 of the standard, "Software Simulation", was approved in 1994. The latter contains a C-code implementation of a Systems multiplexer/demultiplexer and of encoders and decoders for Audio and Video.
This is the table of MPEG-1 parts:
11172 | (MPEG-1) | Coding of moving pictures and associated audio at up to about 1.5 Mbit/s |
| Part 1 | Systems |
| Part 2 | Video |
| Part 3 | Audio |
| Part 4 | Conformance testing |
| Part 5 | Software simulation |
MPEG-1 has been and is being used by many industries in a variety of products, services and applications and has triggered the start of a number of others.
Many full MPEG-1 audiovisual players exist that can be used in a software environment. These utilise all three parts of the standard, with Audio typically in Layer II. Many software packages exist that are capable of encoding audio and video in MPEG-1 and editing the resulting files.
The Video CD is a full application of MPEG-1 that is typically used to encode movies on two CDs. Several hundred million hardware Video CD decoders have been sold worldwide and billions of Video CD discs have been printed. Software Video CD decoders are also available from multiple sources.
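The two-disc figure can be sanity-checked with back-of-the-envelope arithmetic. The numbers below are assumptions for illustration (a nominal 650 MB disc, the 1.15 Mbit/s video rate cited above, and a typical 224 kbit/s Layer II audio rate; real Video CDs use CD-ROM XA sectors and hold somewhat more):

```python
# Approximate playing time of one CD at Video CD rates.
# Assumed figures: 650 MB disc, 1.15 Mbit/s video + 224 kbit/s audio.
disc_bits = 650e6 * 8                 # disc capacity in bits
rate_bps = 1.15e6 + 224e3             # combined audio-visual bit rate
minutes = disc_bits / rate_bps / 60
print(f"about {minutes:.0f} minutes per disc")
```

At roughly an hour per disc, a feature film indeed needs two discs.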
MPEG-1 Audio Layer III, also known as MP3, has been implemented in manifold ways. Many software packages exist to rip a track from an Audio CD and compress it to MP3. This has given rise to innovative ways of consuming music, such as the ability to create compilations to one's liking that can then be downloaded to light, non-mechanical MP3 players. With the arrival of MP3, the music world has changed beyond recognition.
MPEG-2 (ISO/IEC 13818)
The Porto meeting in July 1990 was the first to address the MPEG-2 standard, "Generic Coding of Moving Pictures and Associated Audio", and the Singapore meeting in November 1994 was the one that approved the first three parts: Systems, Video and Audio. Conformance (Part 4) was approved one year later and Software Simulation (Part 5) in 1996. The Systems part, in its "Transport Stream" version, provides support for efficient transmission over error-prone delivery systems, while the "Program Stream" version, similar to MPEG-1 Systems, is more useful for digital storage media. The Video part provides support for efficient coding of interlaced pictures at different spatial resolutions. The Audio part provides support for encoding multi-channel audio in a backward-compatible way, so that an MPEG-1 Audio decoder can still decode a stereo subset of the stream.
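The Transport Stream packetisation can be made concrete. A TS multiplex is a sequence of fixed-size 188-byte packets, each starting with the sync byte 0x47 and carrying a 13-bit Packet IDentifier (PID) that labels the elementary stream the payload belongs to. A minimal sketch of parsing the 4-byte packet header (field layout per MPEG-2 Systems; the sample packet here is made up):

```python
def parse_ts_header(packet: bytes) -> dict:
    """Parse the 4-byte header of an MPEG-2 Transport Stream packet."""
    if len(packet) < 4 or packet[0] != 0x47:
        raise ValueError("not a TS packet (missing 0x47 sync byte)")
    return {
        "transport_error": (packet[1] >> 7) & 1,
        "payload_unit_start": (packet[1] >> 6) & 1,       # a PES/section starts here
        "transport_priority": (packet[1] >> 5) & 1,
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],     # 13-bit stream identifier
        "scrambling_control": (packet[3] >> 6) & 3,
        "adaptation_field_control": (packet[3] >> 4) & 3,
        "continuity_counter": packet[3] & 0x0F,
    }

# A hand-made 188-byte packet: PID 0x100, payload_unit_start set.
header = parse_ts_header(bytes([0x47, 0x41, 0x00, 0x10]) + bytes(184))
print(header["pid"], header["payload_unit_start"])   # → 256 1
```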
MPEG-2 has more parts than MPEG-1. Part 6, "Digital Storage Media Command and Control" or DSM-CC, provides protocols for session set-up across different networks and for remote control of a server containing MPEG-2 content. Part 7, "Advanced Audio Coding" or AAC, provides a new multichannel audio coding that is not backward compatible with MPEG-1 Audio. Part 8 was intended to support video coding when samples are represented with an accuracy of more than 8 bits, but its development was discontinued when the interest of the industry that had requested it did not materialise. Part 9, "Real Time Interface", provides a standard interface between an MPEG-2 Transport Stream and a decoder.
13818 | (MPEG-2) | Generic coding of moving pictures and associated audio |
| Part 1 | Systems |
| Part 2 | Video |
| Part 3 | Audio |
| Part 4 | Conformance testing |
| Part 5 | Software simulation |
| Part 6 | System extensions - DSM-CC |
| Part 7 | Advanced Audio Coding |
| Part 8 | VOID - (withdrawn) |
| Part 9 | System extension RTI |
| Part 10 | Conformance extension - DSM-CC |
| Part 11 | IPMP on MPEG-2 Systems |
Parts 1, 2 and 3 (the last sometimes replaced with a proprietary solution) are used in some 50 million digital television set-top boxes and 10 million Digital Versatile Discs (DVDs). Some MPEG-2 encoders are very costly professional equipment, while others are very inexpensive PC boards sold with video editing software.
DSM-CC is widely used in set-top boxes for satellite and cable. This part of the standard is also at the basis of the provision of other set-top box functionalities by other standards bodies and industry consortia. AAC has been adopted by Japan for a national digital television standard and by several manufacturers of secure digital music players.
Over the years several amendments, i.e. extensions, of the standard have been developed. One of the most important is the 4:2:2 profile that extends the use of MPEG-2 into the television studio.
A number of patents are thought to be relevant for implementing the MPEG-2 standard. As MPEG is prevented by ISO rules from dealing with patent issues, at least one known organisation handles the licensing of MPEG-2 Systems and Video, and another that of MPEG-2 Audio.
MPEG-4 (ISO/IEC 14496)
Work on the MPEG-4 standard "Coding of audio-visual objects" began in July 1993 in New York, NY and the first set of standards (so-called version 1) was approved at the Atlantic City, NJ meeting in October 1998. A major extension of the standard (so-called version 2) was approved at the Maui, HI meeting in December 1999.
The first six parts of the standard correspond roughly to those of MPEG-2. The first five carry essentially the same titles as their MPEG-2 counterparts; the sixth is titled Delivery Multimedia Integration Framework. There are, however, a number of significant differences in content.
MPEG-4 enables the coding of individual objects. This means that the video information need not be of rectangular shape, as MPEG-1 and MPEG-2 Video assume. The same applies to audio: the standard provides tools to encode speech and audio at different rates and with different functionalities, including an extension of AAC. The Systems part therefore contains, in addition to the traditional functions of MPEG-1 and MPEG-2 Systems, a "composition" function. Further, since a composed object can also be of a synthetic nature, MPEG-4 Systems contains standard technology to represent time-varying synthetic 3D information. A framework to deal with the management and protection of rights arising from individual objects is also provided by MPEG-4 Systems. Finally, a file format has been standardised. Part 5 is a complete software implementation of both encoders and decoders. Compared with the reference software of MPEG-1 and MPEG-2, whose value is purely informative, the MPEG-4 Reference Software has the same normative value as the textual parts of the standard. The software may also be used for commercial products, and the copyright of the software is licensed at no cost by ISO/IEC for products conforming to the standard. Part 6, "Delivery Multimedia Integration Framework" (DMIF), provides a standard interface to access various transport mechanisms and an abstraction from the underlying delivery mechanism.
Part 7, "Optimised software for MPEG-4 tools", provides examples of reference software that not only implement the standard correctly but do so in optimised form. Part 8, "4 on IP framework", complements the generic MPEG-4 RTP payload defined by IETF as RFC 3640. Part 9, "Reference Hardware Description", provides "reference software" in VHSIC Hardware Description Language (VHDL) for the synthesis of VLSI chips.
Part 10, Advanced Video Coding (AVC), was produced by the Joint Video Team (JVT). It has roughly twice the compression capability of MPEG-2 and MPEG-4 Visual, and contains Scalable Video Coding (SVC) and Multiview Video Coding (MVC).
Scene Description (part 11) provides technologies for the functionality of “composing” different information elements in a “scene”. This is called Binary Format for MPEG-4 Scenes (BIFS).
The ISO Base Media File Format (Part 12) is designed to contain timed media information for a presentation in a flexible, extensible format that facilitates the interchange, management, editing, and presentation of the media. The media may be 'local' to the system containing the presentation, or may be accessed via a network or other stream delivery mechanism. Part 14, "MP4 File Format", extends the File Format to cover the needs of MPEG-4 scenes, while Part 15, "AVC File Format", supports the storage of AVC and MVC bitstreams.
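The extensibility of the ISO Base Media File Format comes from its uniform "box" structure: a file is a sequence of boxes, each declaring its own size and four-character type, so a reader can skip anything it does not understand. A minimal sketch of walking the top-level boxes (the sample data is constructed by hand for illustration):

```python
import struct

def list_boxes(data: bytes):
    """Walk the top-level boxes of an ISO Base Media file.

    Every box starts with a 32-bit big-endian size and a 4-character type;
    size == 1 means a 64-bit size follows, size == 0 means "to end of file".
    """
    boxes, offset = [], 0
    while offset + 8 <= len(data):
        size, = struct.unpack_from(">I", data, offset)
        box_type = data[offset + 4:offset + 8].decode("ascii")
        if size == 1:                       # 64-bit largesize follows the type
            size, = struct.unpack_from(">Q", data, offset + 8)
        elif size == 0:                     # box extends to end of file
            size = len(data) - offset
        boxes.append((box_type, size))
        offset += size
    return boxes

# Hand-made sample: a 16-byte 'ftyp' box followed by an 8-byte 'free' box.
sample = b"\x00\x00\x00\x10ftypisom\x00\x00\x02\x00" + b"\x00\x00\x00\x08free"
print(list_boxes(sample))   # → [('ftyp', 16), ('free', 8)]
```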
Tools for coding synthetic visual information for 3D graphics are specified in Part 2 - Face and Body Animation and 3D Mesh Compression, Part 11 - Interpolator Compression - and Part 16 - which is a complete framework, called Animation Framework eXtension (AFX), for efficiently coding the shape, texture and animation of interactive synthetic 3D objects.
Streaming Text Format (Part 17) defines text streams that are capable of carrying Third Generation Partnership Project (3GPP) Timed Text. To transport the text streams, a flexible framing structure is specified that can be adapted to various transport layers, such as RTP/UDP/IP and MPEG-2 Transport and Program Streams, for use in media such as broadcast and optical discs. Part 18, Font Compression and Streaming, provides tools for the purpose indicated by the title. Part 19, Synthesized Texture Stream, defines the representation of synthesised textures.
Part 20, "Lightweight Application Scene Representation" (LASeR), provides composition technology with functionality similar to that provided by BIFS. Part 21, MPEG-J Extension for Rendering, provides a Java-powered version of BIFS called MPEG-J.
Part 22, Open Font Format, is the well-known OpenType specification converted to an ISO standard.
Part 23 Symbolic Music Representation provides a standard for representing music scores.
Part 24 Audio-System interaction clarifies some Audio aspects in a Systems environment.
Part 25 3D Graphics Compression Model defines an architecture for 3D Graphics related applications.
Part 26, Audio Conformance, and Part 27, 3D Graphics Conformance, collect the conformance specifications for audio and 3D graphics respectively.
The full list of MPEG-4 parts is
14496 | (MPEG-4) | Coding of audio-visual objects |
| Part 1 | Systems |
| Part 2 | Visual |
| Part 3 | Audio |
| Part 4 | Conformance testing |
| Part 5 | Reference Software |
| Part 6 | Delivery Multimedia Integration Framework |
| Part 7 | Optimised software for MPEG-4 tools |
| Part 8 | 4 on IP framework |
| Part 9 | Reference Hardware Description |
| Part 10 | Advanced Video Coding |
| Part 11 | Scene Description and Application Engine |
| Part 12 | ISO Base Media File Format |
| Part 13 | IPMP Extensions |
| Part 14 | MP4 File Format |
| Part 15 | AVC File Format |
| Part 16 | Animation Framework eXtension (AFX) |
| Part 17 | Streaming Text Format |
| Part 18 | Font compression and streaming |
| Part 19 | Synthesized Texture Stream |
| Part 20 | Lightweight Application Scene Representation |
| Part 21 | MPEG-J Extension for rendering |
| Part 22 | Open Font Format |
| Part 23 | Symbolic Music Representation |
| Part 24 | Audio-System interaction |
| Part 25 | 3D Graphics Compression Model |
| Part 26 | Audio Conformance |
| Part 27 | 3D Graphics Conformance |
MPEG-7 (ISO/IEC 15938)
Work on the MPEG-7 "Multimedia Content Description Interface" standard started at the April 1997 meeting in Bristol. MPEG-7 is an audio-visual information representation that differs from the previous MPEG standards in that what is represented is not the information itself but information about the information. MPEG-7 is a 12-part standard:
15938 | (MPEG-7) | Multimedia Content Description Interface |
| Part 1 | Systems |
| Part 2 | Description Definition Language |
| Part 3 | Visual |
| Part 4 | Audio |
| Part 5 | Multimedia Description Schemes |
| Part 6 | Reference Software |
| Part 7 | Conformance |
| Part 8 | Extraction and Use of MPEG-7 Descriptions |
| Part 9 | Profiles |
| Part 10 | Schema definition |
| Part 11 | Profile schemas |
| Part 12 | Query Format |
The technical content of the standard is as follows:
- Systems provides the architectural framework of the standard, the carriage of MPEG-7 content and the binarisation of MPEG-7 content
- Description Definition Language allows the creation of descriptors and description schemes
- Visual provides standard descriptors and description schemes that are purely visual
- Audio provides standard descriptors and description schemes that are purely audio
- Multimedia Description Schemes provides standard descriptors and description schemes that are neither visual nor audio
- Reference software has the same normative value as the MPEG-4 reference software and may be used for products under the same conditions
- Conformance is the means to test an implementation or data for conformity.
- Part 8 describes how feature extraction can be implemented
- Part 9 provides a set of profiles
- Part 10 provides the MPEG-7 schema definition
- Part 11 collects the profile schemas
- Part 12 specifies the interface between a requester and a responder in multimedia content retrieval systems
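Since MPEG-7 descriptions are XML documents (Part 10 defines the schema, Part 1 a binary encoding), "information about the information" can be sketched concretely. The element names below are simplified for illustration and do not reproduce the exact MPEG-7 schema:

```python
import xml.etree.ElementTree as ET

# A toy MPEG-7-style description: metadata *about* a video clip,
# not the clip itself. Element names are simplified, not normative.
mpeg7 = ET.Element("Mpeg7")
desc = ET.SubElement(mpeg7, "Description")
content = ET.SubElement(desc, "MultimediaContent")
creation = ET.SubElement(content, "CreationInformation")
ET.SubElement(creation, "Title").text = "Holiday footage"
ET.SubElement(creation, "Creator").text = "A. Nyone"

xml_text = ET.tostring(mpeg7, encoding="unicode")
print(xml_text)
```

A real description would validate against the Part 10 schema and could be carried or binarised using the Part 1 Systems tools.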
MPEG-21 (ISO/IEC 21000)
Work on MPEG-21 "Multimedia Framework" standard started at the May-June 2000 meeting in Geneva. MPEG-21 provides a multimedia framework and sets out a vision for the future of an environment where delivery and use of all content types by different categories of users in multiple application domains will be possible.
MPEG-21 assumes that there are Users (anybody in the value network) and Digital Items (assemblies of content) on which Users execute Actions that generate other Digital Items, which can become the object of Transactions. To make this possible, a number of technologies are needed that fall under the following categories:
- Digital Item Declaration
- Digital Item Identification
- Intellectual Property Management and Protection
- Terminals and Networks
- Digital Item Management and Usage
- Digital Item Representation
- Event Reporting
The current table of MPEG-21 standards is
21000 | (MPEG-21) | Multimedia Framework |
| Part 1 | Vision, Technologies and Strategy |
| Part 2 | Digital Item Declaration |
| Part 3 | Digital Item Identification |
| Part 4 | IPMP Components |
| Part 5 | Rights Expression Language |
| Part 6 | Rights Data Dictionary |
| Part 7 | Digital Item Adaptation |
| Part 8 | Reference Software |
| Part 9 | File Format |
| Part 10 | Digital Item Processing |
| Part 11 | Evaluation Tools for Persistent Association |
| Part 12 | Test Bed for MPEG-21 Resource Delivery |
| Part 13 | VOID - (to MPEG-4 part 10) |
| Part 14 | Conformance |
| Part 15 | Event reporting |
| Part 16 | Binary format |
| Part 17 | Fragment Identification |
| Part 18 | Digital Item Streaming |
| Part 19 | Media Value Chain Ontology |
- Part 1 Vision, Technologies and Strategy lays down the scope and development plan of the project.
- Part 2 Digital Item Declaration (DID) defines a structure that can flexibly accommodate the many components of a multimedia object (resources, identifiers, metadata, encryption keys, licenses etc.).
- Part 3 Digital Item Identification (DII), a standard to handle identifiers in Digital Items.
- Part 4 Intellectual Property Management and Protection (IPMP) Components specifies the component technologies to make elements of a Digital Item available in a form that can be processed by a machine.
- Part 5 Rights Expression Language (REL) defines a language to express machine readable rights in a rich form that is comparable to the richness of the human language.
- Part 6 Rights Data Dictionary (RDD) defines a standard semantics for verbs commonly used in the media environment, especially for use by Part 5.
- Part 7 Digital Item Adaptation (DIA) specifies the syntax and semantics of the tools that may be used to assist in the adaptation of Digital Items, metadata and resources.
- Part 8 Reference Software provides the reference software implementation of the relevant MPEG-21 standards.
- Part 9 File Format defines a standard file format for Digital Items.
- Part 10 Digital Item Processing (DIP) provides the tools to enable a Digital Item creator to suggest how a user can interact with the Digital Item.
- Part 11 Evaluation Tools for Persistent Association provides the means to evaluate the performance of a given Persistent Association Technology to see how well it fulfils the requirements of the intended application.
- Part 12 Test Bed for MPEG-21 Resource Delivery is a software test bed that has been developed to enable experimentation with different means of resource delivery.
- Part 13 - withdrawn
- Part 14 Conformance provides test methodologies and suites to assess the conformity of a bitstream (typically an XML document) and a decoder (typically a parser) to the relevant MPEG-21 standard.
- Part 15 Event Reporting (ER) provides the technology to generate an event every time an action specified in the “Event Report Request” (ERR) contained in a Digital Item is made on a resource.
- Part 16 Binary format references the technology specified in MPEG-B Part 1 “Binary MPEG format for XML” (BiM).
- Part 17 Fragment Identification (FID) specifies a normative syntax for URI Fragment Identifiers to be used for addressing parts of a resource from a number of Internet Media Types.
- Part 18 Digital Item Streaming (DIS) provides the technology to stream Digital Items incrementally when the streaming mechanism employed is MPEG-2 Transport Stream or RTP/UDP/IP.
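A Digital Item Declaration (Part 2) is itself an XML document. The sketch below assembles a minimal Digital Item: one Item carrying a human-readable Descriptor and a Component that points at the actual media Resource. The element names follow the DIDL vocabulary, but the attribute values and URL are illustrative only:

```python
import xml.etree.ElementTree as ET

# A minimal DIDL Digital Item: a Descriptor describing the item,
# plus a Component whose Resource references the media itself.
didl = ET.Element("DIDL")
item = ET.SubElement(didl, "Item")
statement = ET.SubElement(ET.SubElement(item, "Descriptor"), "Statement")
statement.set("mimeType", "text/plain")
statement.text = "My favourite song"
component = ET.SubElement(item, "Component")
ET.SubElement(component, "Resource",
              mimeType="audio/mpeg",
              ref="http://example.com/song.mp3")   # illustrative URL

did_xml = ET.tostring(didl, encoding="unicode")
print(did_xml)
```

In the MPEG-21 model, licenses (Part 5), identifiers (Part 3) and adaptation hints (Part 7) would all be attached to the same declaration structure.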
H.264, or MPEG-4 Part 10 (Advanced Video Coding), was developed as a successor to MPEG-2, providing gains in efficiency and a comprehensive toolset with high flexibility. As a result, H.264 provides equivalent video quality at substantially lower bit rates than MPEG-2. An H.264 encoder can, without compromising image quality, reduce bandwidth requirements by as much as 50 percent in comparison with MPEG-2. H.264 was also developed to use the same asymmetrical architecture as MPEG-2: computational complexity at the decoder was minimised, ensuring enough flexibility for a wide range of applications, including broadcasting, storage and wireless multimedia communications.
H.264's algorithm is similar to MPEG-2 and uses the same underlying principles, including block-based motion compensation and the discrete cosine transform. However, H.264 emphasizes efficiency and reliability. It performs spatial prediction for intra-frame coding and temporal motion estimation for inter-frame coding to improve compression efficiency. In intra-frame coding, each frame is encoded on its own, without using any information from its neighboring frames. In addition, H.264 makes use of preprocessing stages and relies on spatial prediction using neighboring pixels from previously encoded blocks to take advantage of inter-block spatial correlation.
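The temporal motion estimation described above can be sketched as full-search block matching: for each block of the current frame, find the displacement in the reference frame that minimises the sum of absolute differences (SAD). Real encoders use far more elaborate search strategies, sub-pixel refinement and rate-distortion optimisation; this toy version on plain Python lists shows only the principle:

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def get_block(frame, y, x, size):
    """Extract a size x size block whose top-left corner is (y, x)."""
    return [row[x:x + size] for row in frame[y:y + size]]

def motion_search(cur, ref, y, x, size=4, radius=2):
    """Exhaustive SAD search: best (dy, dx) displacement in `ref`
    for the block of `cur` at (y, x), within +/- radius pixels."""
    target = get_block(cur, y, x, size)
    best_cost, best_mv = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ry, rx = y + dy, x + dx
            if 0 <= ry <= len(ref) - size and 0 <= rx <= len(ref[0]) - size:
                cost = sad(target, get_block(ref, ry, rx, size))
                if best_cost is None or cost < best_cost:
                    best_cost, best_mv = cost, (dy, dx)
    return best_mv

# Toy frames: the current frame is the reference shifted down-right by one
# pixel, so the best motion vector for an interior block is (-1, -1).
ref = [[8 * y + x for x in range(8)] for y in range(8)]
cur = [[8 * (y - 1) + (x - 1) for x in range(8)] for y in range(8)]
print(motion_search(cur, ref, 3, 3))   # → (-1, -1)
```

The encoder then transmits only the motion vector and the (mostly zero) prediction residual, which is where the compression gain comes from.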
Key features of the standard include compression and transmission efficiency, and a focus on widely used applications of video compression. Its flexibility and scalability are evidenced by the 17 profiles and 16 levels supported today, each targeting specific classes of popular video communication applications.
The limitations posed by H.264 are similar to those faced by MPEG-2 in its infancy. Ultimately, it is the capability of existing technology that has slowed H.264's penetration in the professional broadcast domain. Today, the most technologically advanced and standards-compliant H.264 codecs are capable of producing compressed video streams of 80 Mb/s, limited to 8-bit resolution. Deploying an H.264 link can be costly, as much as four times more than competing standards in both cost and power consumption. The architectural asymmetry of the codec has also led to the assumption that high-quality decoders are low-cost devices; users often end up surprised by the high cost of professional video decoders.
Video compression
Broadcasters have a range of choices when it comes to signal compression solutions. For advanced, professional-grade compression, MPEG-2, H.264 and JPEG 2000 are all viable options. Ultimately, network infrastructure, bandwidth requirements and budget all help to define the “right” choice for a broadcaster. MPEG-2 and H.264 are strong options for compression for next-generation multimedia applications.
Increasingly, a strong case can be made for JPEG 2000, whose advanced intra-frame encoding provides a degree of flexibility and control not found in other compression schemes. Furthermore, the surge in the number of video transport applications requiring both very low latency and very high visual quality makes JPEG 2000 an optimum solution for a video landscape moving toward HD.
For all broadcasters — regardless of network infrastructure, compression solution and application specifics — the goal is to deliver maximum quality given bandwidth and cost limitations, with the ultimate purpose of maximising revenue. Keep in mind that video transport is a complete value chain: anything done along the chain affects overall processing, and the consequences of a misstep are dire, resulting in devaluation downstream. Of course, how the chosen compression solution is engineered and managed is also critical to achieving the best performance, regardless of the compression scheme selected.
The big picture
All codecs discussed here have a role in professional, contribution-quality video transport. H.264/MPEG-4 and MPEG-2 are still relevant in the realm of professional broadcast transport. They provide high-quality solutions for bandwidth-constrained environments, but they are not necessarily the right choice in all applications.
JPEG 2000 provides very high visual quality and low latency, even after multiple coding cycles. It has proved its worth in a complete video chain and in an environment trending toward IP transport as well as 3G, HD and 3-D technologies.
In addition to quality and network infrastructure, resources and cost must also be considered when evaluating and selecting a compression solution. Generally, MPEG-2 and H.264 compression are expensive, power-consuming, large and complex technologies.
The future of JPEG 2000 is bright, as it requires less power, consumes less space and generally delivers greater scalability, flexibility and visual quality than other codecs. An increasing number of service providers and broadcasters are using JPEG 2000 implementations for large, globally significant events — particularly over IP networks. But the landscape is ever changing: a newer format, 1080p50/60, is now causing a stir in the broadcast community. Let's all stay tuned.
Dr. Chin Koh is director of product management at Nevion.