TechInt: Digital Video Off Boresight
by Dan Makinster and George Mancuso, Agile Communication Systems, and Dr. Yendo Hu, Sculpture Networks
Video is pervasive in both consumer and military applications, and the two sets of requirements are constrained by factors that are partly shared and partly distinct. The need to deliver consumer video to a large audience is governed primarily by available bandwidth. The military and other government agencies, although serving a smaller audience, are also restricted by bandwidth; in addition, they need to deliver video in real time.
Due to bandwidth limitations, transmitting uncompressed video to even moderately sized audiences would be impractical. Encoding systems are therefore employed to compress video to manageable data rates. As illustrated in Figure 1, a typical encoding process introduces picture delay, which gives the observer a false indication of target location and can adversely affect mission objectives. Picture delay also increases the difficulty of tracking objects, especially when magnification is required and/or targets are moving at moderate velocities. New techniques are available, however, that can greatly reduce encoding system delay.
Video encoding systems normally specify picture delay as latency measured from the input of the encoder to the output of the decoder. Mission-critical video systems must have low delay to maintain situational awareness; that is, what is being viewed must not have changed since the image was captured. With current video encoding technology, latencies on the order of 100 to 500 ms are achieved at the expense of transmitting the signal at high rates (20 Mbits/s or greater).
Alternate design approaches exist to achieve low latencies at reduced transmission rates, but they are unable to handle motion without accompanying distortion in the form of macroblocking or other picture artifacts.
The technology presented herein is based on a combined High Definition (HD) encoder/decoder system delay of 30 ms at a transmission data rate of 5 Mbits/s.
Video Encoding System
A video encoding system represents the video source (picture) material with a model that closely reproduces the original. The model construct reduces redundancies in the original source, thereby allowing transmission in a compressed format. A decoder reconstructs the model into a near-original format.
Video compression is necessary to limit transmission bandwidth, since the source signals run at high data rates. For HD, the input source data rate is 1.5 Gbits/s (10⁹) whereas the encoder output may be 10 Mbits/s (10⁶), resulting in a compression factor of 150:1.
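The arithmetic behind the 150:1 figure is straightforward; a minimal sketch (the rate names are illustrative, and the 1.5 Gbits/s figure is the article's rounded value for the uncompressed HD source rate):

```python
# Bandwidth arithmetic behind the 150:1 compression factor quoted above.
source_rate_bps = 1.5e9    # uncompressed HD source, ~1.5 Gbits/s
encoded_rate_bps = 10e6    # encoder output, 10 Mbits/s

compression_factor = source_rate_bps / encoded_rate_bps
print(f"Compression factor: {compression_factor:.0f}:1")  # Compression factor: 150:1
```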
Digital Video Encoder
An aircraft moving through a clear sky helps visualize redundancies. The video source is a series of picture frames, each showing the aircraft in a unique position against an unchanged background. In this example, the majority of the video sequence is static; consequently, the background needs to be transmitted only once, overlaid with the aircraft's movement in steps correlated to each video frame. Other statistical and human-visual redundancies also exist; these are eliminated from the transmitted video sequence, further reducing bandwidth, and are re-assimilated during the decode process.
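The aircraft example can be made concrete with a toy frame-difference sketch: only the pixels that change between frames need to be carried after the first frame. All sizes and pixel values here are illustrative, not from any real codec:

```python
import numpy as np

# Toy illustration of temporal redundancy: a bright "aircraft" block moving
# across an otherwise static background. Only the changed pixels need to be
# transmitted after the first frame.
H, W = 64, 64
background = np.full((H, W), 50, dtype=np.int16)  # static sky

def frame_at(x):
    """Render the scene with a 4x4 aircraft at column x."""
    f = background.copy()
    f[30:34, x:x + 4] = 200
    return f

prev, curr = frame_at(10), frame_at(12)
residual = curr - prev                    # what an inter-coded frame carries
changed = np.count_nonzero(residual)
total = H * W
print(f"{changed}/{total} pixels changed ({100 * changed / total:.1f}%)")
# 16/4096 pixels changed (0.4%)
```

Less than half a percent of the picture actually changed, which is exactly the redundancy an encoder exploits.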
Two of the primary factors contributing to video quality are source material complexity and latency. To maintain video quality as complexity increases, a higher transmission bandwidth is required. MPEG-4 has an improved algorithm that allows a reduction in bandwidth for material of similar complexity compared to the bandwidth required when using MPEG-2. Standard techniques for reducing latency consist of limiting the bits transmitted or running the encoder at higher data rates to rapidly clear buffers. Reducing the bit rate impairs video quality and is used for motion-limited applications. Running at higher data rates mandates a greater bandwidth.
Video Compression Background
Video compression technology continues to advance as the theoretical framework improves. Video compression research started as early as the 1960s, but the first commercially viable standard did not appear until the early 1990s. The specific classes of techniques supporting these standards form the basis of the modern-day video compression evolution framework. To date, the compression standards community has introduced several significant evolutionary standards, including MPEG-2 and MPEG-4 Part 10 (H.264).
The requirements within the transmission community will continue to change, pressing the need for better compression and improved performance.
The video transmission space addresses the needs of a diverse set of markets, including the broadcast, surveillance, Internet, government and teleconferencing segments. Within each of these markets, the deployment scenarios further broaden the scope of applications video compression can address. The compression requirements vary significantly as the application and the deployment setups change. In general, though, the compression needs differ in areas such as the following:
Distortion artifact types
The reality is that, depending on the requirements, the implementation approach and the resulting performance in each category can differ drastically. Furthermore, most of the listed requirements are inversely correlated with one another, resulting in a constant battle to balance the tradeoffs among the various parameters. A prime example: bandwidth gain, stream robustness, and compression delay work in opposite directions.
General Digital Video Encoding System
The video encoder block diagram illustrates the techniques used to compress source video into a form that can be decoded with an appropriate device.
The temporal block (inter encoding) reduces time-dependent redundancies by exploiting similarities between neighboring (stored) video frames to construct a prediction of the current frame from one or more previous or future frames. The output of the temporal block is a residual frame and a set of motion vectors (used to predict position). The spatial block (intra encoding) exploits similarities within a frame to reduce spatial redundancies.
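The motion-vector search at the heart of the temporal block can be sketched as an exhaustive block-matching search: for a block in the current frame, find the offset in the reference frame with the lowest sum of absolute differences (SAD). The function name, sizes, and search range below are illustrative; real encoders use far faster search strategies:

```python
import numpy as np

def best_motion_vector(ref, block, top, left, search=4):
    """Exhaustive SAD search in a +/-search window around (top, left)."""
    bh, bw = block.shape
    best = (0, 0)
    best_sad = np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y and 0 <= x and y + bh <= ref.shape[0] and x + bw <= ref.shape[1]:
                sad = np.abs(ref[y:y + bh, x:x + bw].astype(int) - block.astype(int)).sum()
                if sad < best_sad:
                    best_sad, best = sad, (dy, dx)
    return best, best_sad

ref = np.zeros((32, 32), dtype=np.uint8)
ref[8:16, 8:16] = 255                     # bright object in the reference frame
cur_block = ref[8:16, 8:16]               # same object, now at (10, 10) in the current frame
mv, sad = best_motion_vector(ref, cur_block, top=10, left=10)
print(mv, sad)                            # (-2, -2) 0
```

The resulting motion vector (-2, -2) says "copy this block from two pixels up and left in the reference frame"; with a perfect match the residual is zero and only the vector need be sent.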
The Discrete Cosine Transform (DCT) transforms spatial and temporal information into the frequency domain. The resultant DCT coefficients are quantized and coded. Quantizing reduces the number of coefficient bits and also weights coefficients according to human perception. Coding is also performed so that redundant coefficients are processed once, reducing data.
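A minimal sketch of this step: an orthonormal 8x8 DCT-II concentrates a block's energy into a few coefficients, which quantization then coarsens. The flat quantization step used here is an illustrative choice, not taken from any standard:

```python
import numpy as np

# Build an orthonormal 8x8 DCT-II basis matrix.
N = 8
k = np.arange(N).reshape(-1, 1)
n = np.arange(N).reshape(1, -1)
C = np.sqrt(2 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
C[0, :] = np.sqrt(1 / N)

block = np.full((N, N), 128.0)            # flat (maximally redundant) block
coeffs = C @ block @ C.T                  # 2-D DCT
qstep = 16.0
quantized = np.round(coeffs / qstep)      # lossy step: small coefficients drop to zero

print(quantized[0, 0])                    # 64.0 -- the DC term carries the energy
print(np.count_nonzero(quantized))        # 1 -- every AC coefficient quantized away
```

A perfectly flat block compresses to a single number; real blocks leave a handful of significant coefficients, which is still far fewer than 64 raw pixels.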
The coefficients and motion vectors of the temporal and spatial blocks are further compressed by the entropy encoder to eliminate statistical redundancy.
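A toy run-length step conveys the spirit of this stage: long zero runs in the quantized coefficients collapse into (run, value) pairs. Real codecs use context-adaptive variable-length or arithmetic codes; this sketch only shows why the stage shrinks the stream:

```python
# Toy run-length coder: each nonzero coefficient is sent with the count of
# zeros preceding it, plus an end-of-block (EOB) marker for the trailing run.
def run_length(coeffs):
    pairs = []
    run = 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    pairs.append((run, "EOB"))
    return pairs

print(run_length([64, 0, 0, -3, 0, 0, 0, 1, 0, 0]))
# [(0, 64), (2, -3), (3, 1), (2, 'EOB')]
```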
The output of the encoder is a compressed, packetized digital bit stream containing motion vectors and residual coefficients. The compression process represents the video source as a Group of Pictures (GOP). A generic GOP consists of I-, B- and P-frames; other, more advanced frame types also exist. Intra (I-) frames represent a fixed spatial image, independent of other picture types, and each GOP begins with an I-frame. Predictive (P-) frames contain motion-compensated difference information from the preceding I- or P-frame. Bi-directional (B-) frames contain difference information from both the preceding and following I- or P-frame within a GOP. The video compression process results in a loss of information. P- and B-frames accumulate loss, and I-frames are inserted to limit it by refreshing the GOP. I-frames, however, contain the largest amount of information and require the most bits to transmit.
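The GOP layout described above can be sketched as a small generator. The GOP length and I/P spacing below (N=12, M=3 in common MPEG parlance) are illustrative choices, not values mandated by the text:

```python
# Display-order frame types for a generic GOP: an I-frame, P-frames every
# m-th position, and B-frames in between.
def gop_pattern(n=12, m=3):
    types = []
    for i in range(n):
        if i == 0:
            types.append("I")          # independent refresh frame, starts the GOP
        elif i % m == 0:
            types.append("P")          # predicted from the previous I- or P-frame
        else:
            types.append("B")          # predicted from both neighbouring references
    return "".join(types)

print(gop_pattern())  # IBBPBBPBBPBB
```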
From a compression perspective, Bi-directional frames achieve the highest degree of compression efficiency, given that they can predict from both the future and the past. From a latency perspective, however, B-frames are detrimental because they require the storage of future frames. For an implementation where only I- and P-frames exist, the theoretical minimum delay is less than 1 ms for 1080p video at 30 fps; for a configuration with B-frames, as given in the diagram, the minimum theoretical delay is two frames, or 66 ms. Thus, for low delay applications, Bi-directional encoding is typically not used.
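The two-frame figure follows from reordering: a B-frame cannot be coded until the future reference it predicts from has arrived, so at least two frame intervals must be buffered. The arithmetic, assuming 1080p at 30 fps as in the text (the article rounds 66.7 ms to 66 ms):

```python
# Minimum reordering delay introduced by B-frames at 30 fps.
fps = 30
frame_period_ms = 1000 / fps              # ~33.3 ms per frame interval
reordered_frames = 2                      # current frame plus its future reference
delay_ms = reordered_frames * frame_period_ms
print(f"minimum reordering delay: {delay_ms:.1f} ms")  # minimum reordering delay: 66.7 ms
```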
Low Latency Encoding Architecture
Low latency runs directly counter to compression efficiency for a simple reason:
Low Latency Limits Complexity Distribution over Time
Because of this fundamental limitation, designs targeting low delay differ fundamentally from traditional video encoder system designs. The video encoding architecture created by Sculpture Networks Inc. was built from the ground up specifically for low-delay, mission-critical requirements. In conjunction with this platform, Sculpture Networks has developed a tightly coupled compression pre-processing, mode decision, and rate control management package to realize the platform's full potential, achieving best-of-breed low delay encoding.
The Sculpture Networks compression engine is built with a two-pass architecture and a preprocessing engine. For low delay, the engine uses a top-down pipeline architecture for extremely low delay throughput.
The first-pass compression analysis engine carries out full-frame analysis of pixel complexity in both the temporal and the spatial domains. This complexity data is then distributed to both the Rate Control and the Mode Decision engines as inputs to the decision-making process. Look-ahead visibility gives the compression control process a significant advantage: with this knowledge, the engine can better determine the appropriate compression levels and modes for the best distribution of information. The first-pass awareness directly benefits the rate control engine, which tightly manages the decoder buffers so that minimum delay is maintained without unexpected underflows or overflows. Underflows and overflows translate to hesitation in rendering, which forces unnecessary frame delays.
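The decoder buffer the rate control must protect can be sketched as a leaky bucket: coded frames arrive at the channel rate and are drained one frame per display interval; if the buffer runs dry, the decoder stalls, which is the rendering hesitation described above. This is a generic model, not Sculpture Networks' algorithm, and all rates and frame sizes below are illustrative:

```python
# Leaky-bucket sketch of a decoder buffer. `None` in the history marks an
# underflow: the frame was not fully received by its display deadline.
def simulate_buffer(frame_bits, channel_bps, fps, start_bits=0):
    per_frame_in = channel_bps / fps      # bits delivered each frame interval
    level = start_bits
    history = []
    for bits in frame_bits:
        level += per_frame_in
        if level < bits:                  # underflow: decoder must stall
            history.append(None)
            level = 0
        else:
            level -= bits
            history.append(level)
    return history

# A large I-frame followed by small P-frames over a 5 Mbits/s channel at 30 fps:
frames = [400_000] + [120_000] * 5
print(simulate_buffer(frames, channel_bps=5_000_000, fps=30))
```

With zero pre-buffering, the oversized I-frame underflows immediately while the small P-frames keep up comfortably; avoiding that spike is exactly what the intra-distribution technique below targets, and the traditional alternative is a large (high-latency) start-up buffer.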
The compression distortion matching engine follows an advanced proprietary algorithm that reduces the distortion differences between compression methods, specifically the distortion artifacts between intra encoding and inter encoding. The low compression efficiency inherent in intra encoding is then distributed across the inter-encoded frames to balance the load.
The compression inefficiency due to intra encoding is distributed, a row at a time, across an inter-encoding period. This effectively removes the need for the large decoder buffer typically required to store large I-frames. As a result, a much tighter delay can be achieved.
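The idea can be sketched as a cyclic refresh schedule: instead of one oversized I-frame, each frame intra-codes one row of macroblocks, so after a full cycle every region of the picture has been refreshed with no single frame spiking in size. This is a generic illustration of distributed intra refresh, not Sculpture Networks' specific algorithm:

```python
# For each frame, the macroblock row that is intra-coded; rows cycle so the
# whole picture is refreshed once per `rows` frames.
def refresh_schedule(rows, num_frames):
    return [frame % rows for frame in range(num_frames)]

rows = 68                                  # 1080p coded as 1088 lines / 16 = 68 macroblock rows
schedule = refresh_schedule(rows, num_frames=rows)
assert sorted(set(schedule)) == list(range(rows))  # every row refreshed in one cycle
print(schedule[:5])  # [0, 1, 2, 3, 4]
```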
The forced mixing of intra- and inter-encoding methods within the same frame does introduce a fundamental issue: the distortion differences between intra- and inter-encoding produce a difference in noise that is perceivable by the human eye.
Subjective tests show that, even at the same PSNR (a figure of merit for video quality), the difference between intra-coded and inter-coded regions produces a visible shading effect at the intra-/inter-encoding boundaries.
Through Sculpture Networks technology, the encoder is able to correlate the distortions between the two and minimize the perceived noise differences between the two encoding methods.
The Sculpture Networks engine carries out this delicate balancing act to achieve the best compression quality without sacrificing delay or robustness. The rate control and mode decision algorithms are implemented in hardware, with real-time rate control and mode decision compensation at the macroblock level.
To meet the sub-frame delay requirement, the decoder buffer model is updated at the macroblock level. To balance compression efficiency versus recovery robustness, the encoding engine utilizes a dynamic refresh mode decision algorithm that maintains refresh capability with minimal compression inefficiency and delay impact. Through dual-pass pixel analysis, transform-domain data collection, pre-processing, and encoding distortion processing, the low delay encoding platform optimizes the tradeoffs among compression effectiveness, delay, and perceived distortion.
Real-time video encoding is often required for mission-critical applications. Current low delay video encoding systems reduce latency at the expense of either bandwidth or video quality. The encoding technology presented herein supports high-quality HD video with a delay of 66 ms or less, at one-quarter the rate achieved by a traditional compression approach.
About the authors
Mr. Dan Makinster is the President of Agile Communication Systems (ACS). As a communication systems engineer, Mr. Makinster has developed advanced communication systems for the banking, broadcast and military markets for the past 30 years.
Mr. George Mancuso serves as Manager of Strategic Accounts for Agile Communication Systems (ACS). He is building on the current success of ACS as it continues to expand the Company's presence within the video, satellite communications and military markets.
Dr. Yendo Hu is a founder of Sculpture Networks, a company that specializes in the multimedia transmission space. He has introduced breakthrough technical advances in compression, aggregation, and transmission for both wired and wireless networks.
About Agile Communication Systems
Agile Communication Systems, Inc. is a dedicated group of engineers with hands-on experience in the architecture and design of large scale networks and applications for harsh environments. The Company has expanded to include industry experts in Modeling & Simulation as well as Software Development.
Agile Communication Systems' hands-on experience in large-scale systems and system-of-systems deployment runs from design and modeling to testing and analysis in mobile, distributed, and harsh environments, and the Company applies this expertise to legacy and next-generation systems of all sizes. Our knowledge of, and experience with, all forms of terrestrial and satellite communications technologies and their users gives us a broad view and allows us to keep our customers informed and away from developmental dead ends. Agile Communication Systems maintains a robust Research & Development team uncommon for a small business. Our NetRAT suite of automated network test and analysis software and hardware is becoming a standard for evaluating modern wireless network communications.
About Sculpture Networks
Sculpture Networks Inc. (SN) is an advanced compression technology company developing a next-generation compression architecture capable of low delay, low bandwidth HD transmission at the lowest possible cost. SN is positioned to introduce next-generation compression implementations capable of reducing transmission and storage bandwidth by half. The SN leadership consists of an exceptional team of industry veterans in video compression and ASIC implementation, together with software experts, building a next-generation FPGA- and software-based engine core protected by a strong IP portfolio. SN is currently positioned to complete the integration effort, targeting the compression transmission industry with a next-generation engine that achieves low delay at 50 percent of the bandwidth. This implementation is expected to address specific needs within the transmission market, including the broadcast, teleconferencing and defense industries. SN expects to extend its algorithms and implementations to address all compression applications.