Hierarchical B-Frames or B-Pyramid


What is Hierarchical B-Frame Mode, or B-pyramid (in my opinion, B-pyramid is a poor term)?

If a run of consecutive B-frames contains B-frames that are themselves used as references by other B-frames in the run, the mode is called Hierarchical B-Frame Coding, or B-pyramid.

The following figure, taken from the paper “ANALYSIS OF HIERARCHICAL B PICTURES AND MCTF” by Heiko Schwarz, Detlev Marpe, and Thomas Wiegand, illustrates the concept of B-pyramid:

Let’s draw the first GOP from the above figure slightly differently:

So a geometric shape does emerge, but it is not a pyramid. Therefore, in my opinion, the term B-pyramid is not a good choice.

To exploit the B-pyramid feature fully, the GOP size (in frames) should be a dyadic number, i.e. a power of two (2^n), e.g. GOP size = 16 frames or 32 frames.
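To see why a dyadic GOP size fits the hierarchy so well, here is a minimal Python sketch (my own illustration, not taken from the paper). It assumes a GOP of 2^n frames numbered 1..gop_size whose last frame is the key picture. The names temporal_level and coding_order are hypothetical, and the level-by-level coding order shown is only one of several valid orders.

```python
def temporal_level(i, gop_size):
    """Temporal level of frame i (1..gop_size) in a dyadic GOP; 0 = key picture."""
    if gop_size <= 0 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    n = gop_size.bit_length() - 1            # gop_size == 2**n
    trailing_zeros = (i & -i).bit_length() - 1
    return n - trailing_zeros


def coding_order(gop_size):
    """One possible coding order: key picture first, then level by level."""
    if gop_size <= 0 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    order = [gop_size]                       # key picture (level 0)
    step = gop_size
    while step > 1:
        half = step // 2
        # Frames of the next level sit midway between already coded frames.
        order.extend(range(half, gop_size, step))
        step = half
    return order


if __name__ == "__main__":
    print([temporal_level(i, 8) for i in range(1, 9)])
    print(coding_order(8))
```

With gop_size = 8 the levels of frames 1..8 come out as 3, 2, 3, 1, 3, 2, 3, 0. Each level halves the distance between reference pictures, which divides evenly only when the GOP size is a power of two.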

According to the results of the above-mentioned paper “ANALYSIS OF HIERARCHICAL B PICTURES AND MCTF”, using Hierarchical B-Frames commonly improves coding efficiency (e.g. on Football CIF 30Hz, the improvement is about 0.5 dB Y-PSNR).

Pros and Cons of Hierarchical B-frames

Pros: better exploitation of temporal redundancy.

Cons: long coding latency (not suitable for low-latency applications)

How to Detect Hierarchical B-Frames or B-Pyramid?

For each frame we check all four of the following conditions:

  • Current frame is B

  • Previous frame (in decoding order) is also B (i.e. the number of successive B frames is greater than one)

  • Previous RefIdc (nal_ref_idc) is non-zero (i.e. the previous B-frame is used for reference)

  • POC of the current B frame is smaller than that of the previous one

If all of the above conditions are met, then B-pyramid is detected.
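The four conditions can be sketched in a few lines of Python. This is a minimal sketch under my own assumptions: the frames are already parsed and given in decoding order, and the Frame record and the has_b_pyramid name are hypothetical. The order field holds the POC (a PTS works the same way).

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Frame:
    slice_type: str    # 'I', 'P' or 'B'
    nal_ref_idc: int   # 0 means "not used for reference"
    order: int         # POC (or PTS) of the frame


def has_b_pyramid(frames: Iterable[Frame]) -> bool:
    """Return True if any pair of successive frames meets all four conditions."""
    prev = None
    for cur in frames:  # frames must be in decoding order
        if (prev is not None
                and cur.slice_type == 'B'       # current frame is B
                and prev.slice_type == 'B'      # previous frame is also B
                and prev.nal_ref_idc != 0       # previous B is a reference
                and cur.order < prev.order):    # current is displayed earlier
            return True
        prev = cur
    return False


if __name__ == "__main__":
    # Display order I0 b1 B2 b3 P4, decoding order I0 P4 B2 b1 b3.
    pyramid = [Frame('I', 3, 0), Frame('P', 2, 4), Frame('B', 1, 2),
               Frame('B', 0, 1), Frame('B', 0, 3)]
    print(has_b_pyramid(pyramid))
```

The check only needs the previous frame, so it can run in a single pass over the stream without buffering.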

If the elementary stream is encapsulated in an MPEG-TS container, we can use PTS instead of POC. PTS values are easy to pick up, whereas deriving POC is more involved: it requires parsing the SPS to get log2_max_pic_order_cnt_lsb_minus4 (for pic_order_cnt_type=0), or a dozen other parameters in the case of pic_order_cnt_type=1, where the derivation is a complicated process.

B-Pyramid versus non-reference B-frames

What is the gain of the B-pyramid GOP structure IPbBbPbBb.... over IPbbbPbbb.... (three consecutive non-reference B-frames)? Here 'B' denotes a B-frame used for reference and 'b' denotes a B-frame not used for reference. I use x264 in constant QP mode (QP=25) with a closed GOP of 30 frames.

On the test YUV sequence "container" (384x320, 300 frames), the bit-size saving is ~0.7%.

On the test YUV sequence "akiyo" (384x320, 300 frames), the bit-size saving is ~1.7%.


x264 --input-res 384x320 --fps 30 --b-adapt 0 --bframes 3 --b-pyramid none --ref 1 --no-scenecut --keyint 30 --min-keyint 30 --qp 25 --output test_ibbb.h264 container_384x320.yuv

x264 --input-res 384x320 --fps 30 --b-adapt 0 --bframes 3 --b-pyramid strict --ref 1 --no-scenecut --keyint 30 --min-keyint 30 --qp 25 --output test_ibBb.h264 container_384x320.yuv

How to Detect B-Pyramid if the Elementary Stream is Encapsulated in an MPEG-TS or MPEG4 Container?

MPEG TS Container

When the elementary stream is encapsulated in an MPEG-TS container, we look for video frame boundaries to pick up the PTS. We get the PTS from the PES header, and in a transport stream the start of each frame must be indicated by an AUD (nal_type=9) in the transport packet payload. Notice that if the PES headers carry only a PTS and no DTS, then DTS=PTS for every frame, there is no frame reordering, and no B-pyramid can exist in such a case. Picture data (or slice data in case of multiple slices per picture) is contained in NAL units with nal_type = 1 or 5 (IDR). The slice data may be absent from the current transport packet and appear only in the next video packet or the one after it (e.g. if the SPS is too long).
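Picking up the PTS from the PES header can be sketched as follows. This is a minimal sketch, assuming the caller has already reassembled a PES packet from the transport packets (starting at the 0x000001 prefix) and that the stream_id is a video stream, so the optional header fields are present. The name parse_pes_timestamps is hypothetical, and marker bits are not validated.

```python
def _read_timestamp(b):
    """Decode a 33-bit PTS/DTS value from its 5-byte PES encoding."""
    return ((((b[0] >> 1) & 0x07) << 30)
            | (b[1] << 22)
            | ((b[2] >> 1) << 15)
            | (b[3] << 7)
            | (b[4] >> 1))


def parse_pes_timestamps(pes):
    """Return (pts, dts) in 90 kHz ticks, or None if no PTS is signaled.

    When only the PTS is present, the DTS is taken to be equal to it.
    """
    if len(pes) < 9 or pes[0:3] != b"\x00\x00\x01":
        raise ValueError("not a PES packet")
    pts_dts_flags = pes[7] >> 6          # '10' = PTS only, '11' = PTS + DTS
    if pts_dts_flags & 0b10 == 0:
        return None
    pts = _read_timestamp(pes[9:14])
    dts = _read_timestamp(pes[14:19]) if pts_dts_flags == 0b11 else pts
    return pts, dts


if __name__ == "__main__":
    # Hand-built header: PTS = 90000 (one second), no DTS.
    pes = bytes.fromhex("000001e000008080052100 05bf21".replace(" ", ""))
    print(parse_pes_timestamps(pes))
```

If the function returns pts == dts for every frame of a stream, the decoding and presentation orders coincide and the B-pyramid check can be skipped.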

Once a NAL unit with nal_type 1 or 5 is found, we need to extract nal_ref_idc from the NAL header and the first two parameters from the slice header: first_mb_in_slice and slice_type.

The NAL unit of each slice consists of:

a start code (000001 or 00000001), the NAL header (1 byte), the slice header, and the slice data.

nalType = nal_header & 0x1f

nal_ref_idc =  ( nal_header & 0x60 )>>5
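As a small illustration of the two formulas, here is a Python sketch that walks a start-code delimited (Annex B) byte stream and decodes each NAL header. The function names are hypothetical. Searching for the 3-byte start code also catches the 4-byte variant, since 00000001 ends with 000001.

```python
def parse_nal_header(header_byte):
    """Split the 1-byte H.264 NAL header into (nal_type, nal_ref_idc)."""
    nal_type = header_byte & 0x1F
    nal_ref_idc = (header_byte & 0x60) >> 5
    return nal_type, nal_ref_idc


def iter_nal_headers(es):
    """Yield (nal_type, nal_ref_idc) for every NAL unit in an Annex B stream."""
    pos = es.find(b"\x00\x00\x01")
    while pos != -1 and pos + 3 < len(es):
        yield parse_nal_header(es[pos + 3])   # byte right after the start code
        pos = es.find(b"\x00\x00\x01", pos + 3)


if __name__ == "__main__":
    # AUD with a 4-byte start code, then a reference slice with a 3-byte one.
    stream = b"\x00\x00\x00\x01\x09\xf0" + b"\x00\x00\x01\x41\xa0"
    print(list(iter_nal_headers(stream)))
```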

To determine first_mb_in_slice and slice_type we need to read the first byte of the slice header, slh[0]. Both fields are Exp-Golomb coded, and the Exp-Golomb codeword for the value 0 is the single bit '1'. We execute the following operations:

  • Get the most significant bit: first_slice_bit = slh[0]>>7

  • If first_slice_bit==1 then first_mb_in_slice=0, i.e. the current slice is the first slice in a picture and it actually is the start of the picture data (in this case the next step is to determine whether the slice type is B or not).

  • If first_slice_bit==0 then first_mb_in_slice>0, i.e. the current slice is not the first one in a picture and the picture type has already been determined.

  • For the first slice, slice_type immediately follows the leading '1' bit. The slice type code corresponding to B has two values, 1 or 6. The Exp-Golomb bit representation of 1 is '010' and of 6 is '00111'.

Hence, if the current slice is the first slice in a picture (i.e. first_mb_in_slice=0, so the MSbit is '1') and the picture type is B, then one of the following two bit patterns starts the first byte slh[0] of the slice header:

1010     or      100111

Based on the above patterns, we derive the following rules to determine whether the picture type is B or not:

if ( slh[0] >> 4 ) == 0xA then the current slice is the first slice and the picture type is B

if ( slh[0] & 0xFC ) == 0x9C then the current slice is the first slice and the picture type is B
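The two rules above translate directly into code. A minimal Python sketch follows; the function name is hypothetical, and the input is assumed to be the byte that immediately follows the 1-byte NAL header of a slice NAL unit.

```python
def is_first_slice_of_b_picture(slh0):
    """Check the first slice-header byte against the two B-picture patterns.

    '1010....' : first slice of the picture, slice_type = 1 (B)
    '100111..' : first slice of the picture, slice_type = 6 (B, all slices)
    """
    return (slh0 >> 4) == 0xA or (slh0 & 0xFC) == 0x9C


if __name__ == "__main__":
    for value in (0xA0, 0x9C, 0x88, 0xC0, 0x20):
        print(hex(value), is_first_slice_of_b_picture(value))
```

Only the leading 4 or 6 bits matter, so the remaining bits of the byte (which already belong to the next slice-header fields) are masked out rather than compared.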

For each frame we check all four of the following conditions:

  • Current frame is B

  • Previous frame (in decoding order) is also B (i.e. the number of successive B frames is greater than one)

  • Previous RefIdc (nal_ref_idc) is non-zero (i.e. the previous frame is used for reference)

  • PTS of the current B frame is smaller than that of the previous one

If all of the above conditions are met, then B-pyramid is detected.

MPEG4 Container (non-fragmented)

With the 'stco' and 'stsz' tables in the metadata (together with 'stsc', which maps samples to chunks) we can visit all access units successively in decoding order.

For each access unit we skip over non-VCL NAL units (e.g. SEI) until the first slice data NAL unit is found (nal_type=1 or 5).
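Skipping the non-VCL units can be sketched as below. Note that inside an MP4 sample the NAL units are not separated by start codes; each one is preceded by a big-endian length field whose size is given by lengthSizeMinusOne + 1 in the avcC box. The sketch assumes a 4-byte length field (a common value, but it should be read from avcC), and first_vcl_nal is a hypothetical name.

```python
def first_vcl_nal(sample, length_size=4):
    """Return the first slice NAL unit (nal_type 1 or 5) of an MP4 sample.

    `sample` is one access unit as stored in 'mdat'. `length_size` is an
    assumption here; the real value comes from the avcC box.
    Returns None if the sample holds no slice NAL unit.
    """
    pos = 0
    while pos + length_size < len(sample):
        nal_len = int.from_bytes(sample[pos:pos + length_size], "big")
        start = pos + length_size
        nal_type = sample[start] & 0x1F
        if nal_type in (1, 5):                 # coded slice (non-IDR or IDR)
            return sample[start:start + nal_len]
        pos = start + nal_len                  # skip SEI, AUD, SPS, PPS, ...
    return None


if __name__ == "__main__":
    sei = b"\x00\x00\x00\x02\x06\x80"
    slice_nal = b"\x00\x00\x00\x03\x41\xa0\x00"
    nal = first_vcl_nal(sei + slice_nal)
    print(nal.hex())   # byte 0 is the NAL header, byte 1 is slh[0]
```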

Then we read the NAL header (to determine nal_ref_idc) and the following byte (the first byte of the slice header) to determine the slice type (B or not B). Slice type and nal_ref_idc are determined exactly as in the previous section. Alternatively, whether a frame is used for reference can be derived from the sdtp box, provided that this box is present in the metadata (notice that signaling the sdtp box is not mandatory).

With the ctts table in the metadata we derive the PTS of each access unit (if ctts is not present then PTS = DTS and no B-pyramid can exist in such a stream).
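Deriving the PTS from the tables can be sketched like this. It is a minimal sketch that assumes the 'stts' and 'ctts' boxes have already been parsed into lists of (sample_count, value) pairs, takes the DTS of the first sample as 0, and ignores edit lists. The function name sample_timestamps is hypothetical.

```python
def sample_timestamps(stts, ctts=None):
    """Return a list of (dts, pts) per sample, in decoding order.

    stts: [(sample_count, sample_delta), ...]
    ctts: [(sample_count, composition_offset), ...] or None if absent
    """
    # Expand the run-length coded composition offsets to one per sample.
    offsets = [off for count, off in (ctts or []) for _ in range(count)]
    result = []
    dts = 0
    index = 0
    for count, delta in stts:
        for _ in range(count):
            offset = offsets[index] if index < len(offsets) else 0
            result.append((dts, dts + offset))   # PTS = DTS + offset
            dts += delta
            index += 1
    return result


if __name__ == "__main__":
    # Decoding order I0 P4 B2 b1 b3, frame duration 1000 ticks.
    stts = [(5, 1000)]
    ctts = [(1, 2000), (1, 5000), (1, 2000), (1, 0), (1, 1000)]
    for dts, pts in sample_timestamps(stts, ctts):
        print(dts, pts)
```

In this example the third and fourth samples get PTS 4000 and 3000, so the PTS drops between two successive frames, which is exactly the pattern the last detection condition looks for.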

For each frame we check all four of the following conditions:

  • Current frame is B

  • Previous frame (in decoding order) is also B (i.e. the number of successive B frames is greater than one)

  • Previous RefIdc (nal_ref_idc) is non-zero (i.e. the previous frame is used for reference)

  • PTS of the current B frame is smaller than that of the previous one

If all of the above conditions are met, then B-pyramid is detected.

