rfc9628v1.txt | rfc9628.txt | |||
---|---|---|---|---|
Internet Engineering Task Force (IETF) J. Uberti | Internet Engineering Task Force (IETF) J. Uberti | |||
Request for Comments: 9628 S. Holmer | Request for Comments: 9628 S. Holmer | |||
Category: Standards Track M. Flodman | Category: Standards Track M. Flodman | |||
ISSN: 2070-1721 D. Hong | ISSN: 2070-1721 D. Hong | |||
J. Lennox | J. Lennox | |||
8x8 / Jitsi | 8x8 / Jitsi | |||
August 2024 | February 2025 | |||
RTP Payload Format for VP9 Video | RTP Payload Format for VP9 Video | |||
Abstract | Abstract | |||
This specification describes an RTP payload format for the VP9 video | This specification describes an RTP payload format for the VP9 video | |||
codec. The payload format has wide applicability as it supports | codec. The payload format has wide applicability as it supports | |||
applications from low bitrate peer-to-peer usage to high bitrate | applications from low bitrate peer-to-peer usage to high bitrate | |||
video conferences. It includes provisions for temporal and spatial | video conferences. It includes provisions for temporal and spatial | |||
scalability. | scalability. | |||
skipping to change at line 37 | skipping to change at line 37 | |||
received public review and has been approved for publication by the | received public review and has been approved for publication by the | |||
Internet Engineering Steering Group (IESG). Further information on | Internet Engineering Steering Group (IESG). Further information on | |||
Internet Standards is available in Section 2 of RFC 7841. | Internet Standards is available in Section 2 of RFC 7841. | |||
Information about the current status of this document, any errata, | Information about the current status of this document, any errata, | |||
and how to provide feedback on it may be obtained at | and how to provide feedback on it may be obtained at | |||
https://www.rfc-editor.org/info/rfc9628. | https://www.rfc-editor.org/info/rfc9628. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2024 IETF Trust and the persons identified as the | Copyright (c) 2025 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(https://trustee.ietf.org/license-info) in effect on the date of | (https://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
include Revised BSD License text as described in Section 4.e of the | include Revised BSD License text as described in Section 4.e of the | |||
Trust Legal Provisions and are provided without warranty as described | Trust Legal Provisions and are provided without warranty as described | |||
skipping to change at line 132 | skipping to change at line 132 | |||
allow a frame to be encoded at the same resolution but at different | allow a frame to be encoded at the same resolution but at different | |||
qualities (and, thus, with different amounts of coding error). VP9 | qualities (and, thus, with different amounts of coding error). VP9 | |||
supports quality layers as spatial layers without any resolution | supports quality layers as spatial layers without any resolution | |||
changes; hereinafter, the term "spatial layer" is used to represent | changes; hereinafter, the term "spatial layer" is used to represent | |||
both spatial and quality layers. | both spatial and quality layers. | |||
This payload format specification defines how such temporal and | This payload format specification defines how such temporal and | |||
spatial scalability layers can be described and communicated. | spatial scalability layers can be described and communicated. | |||
Temporal and spatial scalability layers are associated with non- | Temporal and spatial scalability layers are associated with non- | |||
negative integer IDs. The lowest layer of either type has an ID of 0 | negative integer IDs. The lowest layer of either type has an ID of | |||
and is sometimes referred to as the "base" temporal or spatial layer. | zero and is sometimes referred to as the "base" temporal or spatial | |||
| layer. | |||
Layers are designed, and MUST be encoded, such that if any layer, and | Layers are designed, and MUST be encoded, such that if any layer, and | |||
all higher layers, are removed from the bitstream along either the | all higher layers, are removed from the bitstream along either the | |||
spatial or temporal dimension, the remaining bitstream is still | spatial or temporal dimension, the remaining bitstream is still | |||
correctly decodable. | correctly decodable. | |||
For terminology, this document uses the term "frame" to refer to a | For terminology, this document uses the term "frame" to refer to a | |||
single encoded VP9 frame for a particular resolution/quality, and | single encoded VP9 frame for a particular resolution and/or quality, | |||
"picture" to refer to all the representations (frames) at a single | and "picture" to refer to all the representations (frames) at a | |||
instant in time. Thus, a picture consists of one or more frames, | single instant in time. Thus, a picture consists of one or more | |||
encoding different spatial layers. | frames, encoding different spatial layers. | |||
Within a picture, a frame with spatial-layer ID equal to SID, where | Within a picture, a frame with spatial-layer ID equal to S, where S > | |||
SID > 0, can depend on a frame of the same picture with a lower | 0, can depend on a frame of the same picture with a lower spatial- | |||
spatial-layer ID. This "inter-layer" dependency can result in | layer ID. This "inter-layer" dependency can result in additional | |||
additional coding gain compared to the case where only traditional | coding gain compared to the case where only "inter-picture" | |||
"inter-picture" dependency is used, where a frame depends on a | dependency is used, where a frame depends on a previously coded frame | |||
previously coded frame in time. For simplicity, this payload format | in time. For simplicity, this payload format assumes that, within a | |||
assumes that, within a picture and if inter-layer dependency is used, | picture and if inter-layer dependency is used, a spatial-layer S | |||
a spatial-layer SID frame can depend only on the immediately previous | frame can depend only on the immediately previous spatial-layer S-1 | |||
spatial-layer SID-1 frame, when S > 0. Additionally, if inter- | frame, when S > 0. Additionally, if inter-picture dependency is | |||
picture dependency is used, a spatial-layer SID frame is assumed to | used, a spatial-layer S frame is assumed to only depend on a | |||
only depend on a previously coded spatial-layer SID frame. | previously coded spatial-layer S frame. | |||
Given the above simplifications for inter-layer and inter-picture | Given the above simplifications for inter-layer and inter-picture | |||
dependencies, a flag (the D bit described below) is used to indicate | dependencies, a flag (the D bit described below) is used to indicate | |||
whether a spatial-layer SID frame depends on the spatial-layer SID-1 | whether a spatial-layer SID frame depends on the spatial-layer SID-1 | |||
frame. Given the D bit, a receiver only needs to additionally know | frame. Given the D bit, a receiver only needs to additionally know | |||
the inter-picture dependency structure for a given spatial-layer | the inter-picture dependency structure for a given spatial-layer | |||
frame in order to determine its decodability. Two modes of | frame in order to determine its decodability. Two modes of | |||
describing the inter-picture dependency structure are possible: | describing the inter-picture dependency structure are possible: | |||
"flexible mode" and "non-flexible mode". An encoder can only switch | "flexible mode" and "non-flexible mode". An encoder can only switch | |||
between the two on the first packet of a keyframe with a temporal- | between the two on the first packet of a keyframe with a temporal- | |||
layer ID equal to 0. | layer ID equal to zero. | |||
In flexible mode, each packet can contain up to three reference | In flexible mode, each packet can contain up to three reference | |||
indices, which identify all frames referenced by the frame | indices, which identify all frames referenced by the frame | |||
transmitted in the current packet for inter-picture prediction. This | transmitted in the current packet for inter-picture prediction. This | |||
(along with the D bit) enables a receiver to identify if a frame is | (along with the D bit) enables a receiver to identify if a frame is | |||
decodable or not and helps it understand the temporal-layer | decodable or not and helps it understand the temporal-layer | |||
structure. Since this is signaled in each packet, it makes it | structure. Since this is signaled in each packet, it makes it | |||
possible to have very flexible temporal-layer hierarchies and | possible to have very flexible temporal-layer hierarchies and | |||
scalability structures, which are changing dynamically. | scalability structures, which are changing dynamically. | |||
In non-flexible mode, frames are encoded using a fixed, recurring | In non-flexible mode, frames are encoded using a fixed, recurring | |||
pattern of dependencies; the set of pictures that recur in this | pattern of dependencies; the set of pictures that recur in this | |||
pattern is known as a "Picture Group" (or "PG"). In this mode, the | pattern is known as a "Picture Group" (or "PG"). In this mode, the | |||
inter-picture dependencies (the reference indices) of the PG MUST be | inter-picture dependencies (the reference indices) of the PG MUST be | |||
pre-specified as part of the Scalability Structure (SS) data. Each | pre-specified as part of the Scalability Structure (SS) data. Each | |||
packet has an index to refer to one of the described pictures in the | packet has an index to refer to one of the described pictures in the | |||
PG from which the pictures referenced by the picture transmitted in | PG from which the pictures referenced by the picture transmitted in | |||
the current packet for inter-picture prediction can be identified. | the current packet for inter-picture prediction can be identified. | |||
Note: A "Picture Group" or "PG", as used in this document, is not the | | Note: A "Picture Group" or "PG", as used in this document, is | |||
same thing as the term "Group of Pictures" as it is traditionally | | not the same thing as the term "Group of Pictures" as it is | |||
used in video coding, i.e., to mean an independently decodable run of | | commonly used in video coding, i.e., to mean an independently | |||
pictures beginning with a keyframe. | | decodable run of pictures beginning with a keyframe. | |||
The SS data can also be used to specify the resolution of each | The SS data can also be used to specify the resolution of each | |||
spatial layer present in the VP9 stream for both flexible and non- | spatial layer present in the VP9 stream for both flexible and non- | |||
flexible modes. | flexible modes. | |||
4. Payload Format | 4. Payload Format | |||
This section describes how the encoded VP9 bitstream is encapsulated | This section describes how the encoded VP9 bitstream is encapsulated | |||
in RTP. To handle network losses, usage of RTP/AVPF [RFC4585] is | in RTP. To handle network losses, usage of RTP/AVPF [RFC4585] is | |||
RECOMMENDED. All integer fields in the specifications are encoded as | RECOMMENDED. All integer fields in this specification are encoded as | |||
unsigned integers in network octet order. | unsigned integers in network octet order. | |||
4.1. RTP Header Usage | 4.1. RTP Header Usage | |||
The general RTP payload format for VP9 is depicted below. | The general RTP payload format for VP9 is depicted below. | |||
0 1 2 3 | 0 1 2 3 | |||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
|V=2|P|X| CC |M| PT | sequence number | | |V=2|P|X| CC |M| PT | sequence number | | |||
skipping to change at line 232 | skipping to change at line 233 | |||
| : | | | : | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | |||
| | | | | | |||
+ | | + | | |||
: VP9 payload : | : VP9 payload : | |||
| | | | | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| : OPTIONAL RTP padding | | | : OPTIONAL RTP padding | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
Figure 1: General RTP Payload Format for VP | Figure 1: General RTP Payload Format for VP9 | |||
See Section 4.2 for more information on the VP9 payload descriptor; | See Section 4.2 for more information on the VP9 payload descriptor; | |||
the VP9 payload is described in [VP9-BITSTREAM]. OPTIONAL RTP | the VP9 payload is described in [VP9-BITSTREAM]. OPTIONAL RTP | |||
padding MUST NOT be included unless the P bit is set. | padding MUST NOT be included unless the P bit is set. | |||
Marker bit (M): This bit MUST be set to 1 for the final packet of | Marker bit (M): This bit MUST be set to one for the final packet of | |||
the highest spatial-layer frame (the final packet of the picture), | the highest spatial-layer frame (the final packet of the picture); | |||
and 0 otherwise. Unless spatial scalability is in use for this | otherwise, it is zero. Unless spatial scalability is in use for | |||
picture, this bit will have the same value as the E bit described | this picture, this bit will have the same value as the E bit | |||
in Section 4.2. Note this bit MUST be set to 1 for the target | described in Section 4.2. Note this bit MUST be set to one for | |||
spatial-layer frame if a stream is being rewritten to remove | the target spatial-layer frame if a stream is being rewritten to | |||
higher spatial layers. | remove higher spatial layers. | |||
Payload Type (PT): In line with the policy in Section 3 of | Payload Type (PT): In line with the policy in Section 3 of | |||
[RFC3551], applications using the VP9 RTP payload profile MUST | [RFC3551], applications using the VP9 RTP payload profile MUST | |||
assign a dynamic payload type number to be used in each RTP | assign a dynamic payload type number to be used in each RTP | |||
session and provide a mechanism to indicate the mapping. See | session and provide a mechanism to indicate the mapping. See | |||
Section 6.1 for the mechanism to be used with the Session | Section 6.1 for the mechanism to be used with the Session | |||
Description Protocol (SDP) [RFC8866]. | Description Protocol (SDP) [RFC8866]. | |||
Timestamp: The RTP timestamp [RFC3550] indicates the time when the | Timestamp: The RTP timestamp [RFC3550] indicates the time when the | |||
input frame was sampled, at a clock rate of 90 kHz. If the input | input frame was sampled, at a clock rate of 90 kHz. If the input | |||
picture is encoded with multiple-layer frames, all of the frames | picture is encoded with multiple frames, all of the frames of the | |||
of the picture MUST have the same timestamp. | picture MUST have the same timestamp. | |||
If a frame has the VP9 show_frame field set to 0 (i.e., it is | If a frame has the VP9 show_frame field set to zero (i.e., it is | |||
meant only to populate a reference buffer without being output), | meant only to populate a reference buffer without being output), | |||
its timestamp MAY alternatively be set to be the same as the | its timestamp MAY alternatively be set to be the same as the | |||
subsequent frame with show_frame equal to 1. (This will be | subsequent frame with show_frame equal to one. (This will be | |||
convenient for playing out pre-encoded content packaged with VP9 | convenient for playing out pre-encoded content packaged with VP9 | |||
"superframes", which typically bundle show_frame==0 frames with a | "superframes", which typically bundle show_frame==0 frames with a | |||
subsequent show_frame==1 frame.) Every frame with show_frame==1, | subsequent show_frame==1 frame.) Every picture containing a frame | |||
however, MUST have a unique timestamp modulo the 2^32 wrap of the | with show_frame==1, however, MUST have a unique timestamp modulo | |||
field. | the 2^32 wrap of the field. | |||
The remaining RTP Fixed Header Fields (V, P, X, CC, sequence number, | The remaining RTP Fixed Header Fields (V, P, X, CC, sequence number, | |||
SSRC, and CSRC identifiers) are used as specified in Section 5.1 of | SSRC, and CSRC identifiers) are used as specified in Section 5.1 of | |||
[RFC3550]. | [RFC3550]. | |||
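The marker-bit and timestamp rules in Section 4.1 can be sketched as follows. This is a minimal illustration, not part of either document version: the function names, the `random_offset` parameter, and the `highest_sid` parameter (the highest spatial layer actually being sent, i.e., the target layer when a stream is rewritten) are assumptions introduced here.

```python
RTP_CLOCK_HZ = 90_000  # clock rate given for the Timestamp field


def rtp_timestamp(capture_seconds: float, random_offset: int = 0) -> int:
    """Map a sampling time to a 32-bit RTP timestamp at 90 kHz.

    random_offset is a hypothetical per-stream initial value; the result
    wraps modulo 2^32 like the header field does.
    """
    ticks = round(capture_seconds * RTP_CLOCK_HZ)
    return (random_offset + ticks) % (1 << 32)


def marker_bit(last_packet_of_frame: bool, sid: int, highest_sid: int) -> int:
    """Return 1 only for the final packet of the highest spatial-layer frame."""
    return int(last_packet_of_frame and sid == highest_sid)
```

All frames of one picture would be given the same `rtp_timestamp` value, while only the last packet of the top spatial-layer frame gets `marker_bit(...) == 1`.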
4.2. VP9 Payload Descriptor | 4.2. VP9 Payload Descriptor | |||
In flexible mode (with the F bit below set to 1), the first octets | In flexible mode (with the F bit below set to one), the first octets | |||
after the RTP header are the VP9 payload descriptor, with the | after the RTP header are the VP9 payload descriptor, with the | |||
following structure. | following structure. | |||
0 1 2 3 4 5 6 7 | 0 1 2 3 4 5 6 7 | |||
+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+ | |||
|I|P|L|F|B|E|V|Z| (REQUIRED) | |I|P|L|F|B|E|V|Z| (REQUIRED) | |||
+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+ | |||
I: |M| PICTURE ID | (REQUIRED) | I: |M| PICTURE ID | (REQUIRED) | |||
+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+ | |||
M: | EXTENDED PID | (RECOMMENDED) | M: | EXTENDED PID | (RECOMMENDED) | |||
skipping to change at line 296 | skipping to change at line 297 | |||
L: | TID |U| SID |D| (Conditionally RECOMMENDED) | L: | TID |U| SID |D| (Conditionally RECOMMENDED) | |||
+-+-+-+-+-+-+-+-+ -\ | +-+-+-+-+-+-+-+-+ -\ | |||
P,F: | P_DIFF |N| (Conditionally REQUIRED) - up to 3 times | P,F: | P_DIFF |N| (Conditionally REQUIRED) - up to 3 times | |||
+-+-+-+-+-+-+-+-+ -/ | +-+-+-+-+-+-+-+-+ -/ | |||
V: | SS | | V: | SS | | |||
| .. | | | .. | | |||
+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+ | |||
Figure 2: Flexible Mode Format for VP9 Payload Descriptor | Figure 2: Flexible Mode Format for VP9 Payload Descriptor | |||
In non-flexible mode (with the F bit below set to 0), the first | In non-flexible mode (with the F bit below set to zero), the first | |||
octets after the RTP header are the VP9 payload descriptor, with the | octets after the RTP header are the VP9 payload descriptor, with the | |||
following structure. | following structure. | |||
0 1 2 3 4 5 6 7 | 0 1 2 3 4 5 6 7 | |||
+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+ | |||
|I|P|L|F|B|E|V|Z| (REQUIRED) | |I|P|L|F|B|E|V|Z| (REQUIRED) | |||
+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+ | |||
I: |M| PICTURE ID | (RECOMMENDED) | I: |M| PICTURE ID | (RECOMMENDED) | |||
+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+ | |||
M: | EXTENDED PID | (RECOMMENDED) | M: | EXTENDED PID | (RECOMMENDED) | |||
+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+ | |||
L: | TID |U| SID |D| (Conditionally RECOMMENDED) | L: | TID |U| SID |D| (Conditionally RECOMMENDED) | |||
+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+ | |||
| TL0PICIDX | (Conditionally REQUIRED) | | TL0PICIDX | (Conditionally REQUIRED) | |||
+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+ | |||
V: | SS | | V: | SS | | |||
| .. | | | .. | | |||
+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+ | |||
Figure 3: Non-flexible Mode Format for VP9 Payload Descriptor | Figure 3: Non-Flexible Mode Format for VP9 Payload Descriptor | |||
| Except as noted, the following field descriptions apply to the | |||
| payload descriptor formats in both Figures 2 and 3. | |||
I: Picture ID (PID) present. When set to 1, the OPTIONAL PID MUST | I: Picture ID (PID) present. When set to one, the OPTIONAL PID MUST | |||
be present after the mandatory first octet and specified as below. | be present after the mandatory first octet and specified as below. | |||
Otherwise, PID MUST NOT be present. If the V bit was set in the | Otherwise, PID MUST NOT be present. If the V bit was set in the | |||
stream's most recent start of a keyframe (i.e., the SS field was | stream's most recent start of a keyframe (i.e., the SS field was | |||
present) and the F bit is set to 0 (i.e., non-flexible scalability | present) and the F bit is set to zero (i.e., non-flexible | |||
mode is in use), then this bit MUST be set on every packet. | scalability mode is in use), then this bit MUST be set on every | |||
| packet. | |||
P: Inter-picture predicted frame. When set to 0, the frame does not | P: Inter-picture predicted frame. When set to zero, the frame does | |||
utilize inter-picture prediction. In this case, up-switching to a | not utilize inter-picture prediction. In this case, up-switching | |||
current spatial layer's frame is possible from a directly lower | to a current spatial layer's frame is possible from a directly | |||
spatial-layer frame. P SHOULD also be set to 0 when encoding a | lower spatial-layer frame. P SHOULD also be set to zero when | |||
layer synchronization frame in response to a Layer Refresh Request | encoding a layer synchronization frame in response to a Layer | |||
(LRR) [RFC9627] message (see Section 5.3). When P is set to 0, | Refresh Request (LRR) [RFC9627] message (see Section 5.3). When P | |||
the TID field (described below) MUST also be set to 0 (if | is set to zero, the Temporal-layer ID (TID) field (described | |||
present). Note that the P bit does not forbid intra-picture, | below) MUST also be set to zero (if present). Note that the P bit | |||
inter-layer prediction from earlier frames of the same picture, if | does not forbid intra-picture, inter-layer prediction from earlier | |||
any. | frames of the same picture, if any. | |||
L: Layer indices present. When set to 1, the one or two octets | L: Layer indices present. When set to one, the one or two octets | |||
following the mandatory first octet and the PID (if present) is as | following the mandatory first octet and the PID (if present) is as | |||
described by "Layer indices" below. If the F bit (described | described by "Layer indices" below. If the F bit (described | |||
below) is set to 1 (indicating flexible mode), then only one octet | below) is set to one (indicating flexible mode), then only one | |||
is present for the layer indices. Otherwise, if the F bit is set | octet is present for the layer indices. Otherwise, if the F bit | |||
to 0 (indicating non-flexible mode), then two octets are present | is set to zero (indicating non-flexible mode), then two octets are | |||
for the layer indices. | present for the layer indices. | |||
F: Flexible mode. When set to 1, this indicates flexible mode; if | F: Flexible mode. When set to one, this indicates flexible mode; if | |||
the P bit is also set to 1, then the octets following the | the P bit is also set to one, then the octets following the | |||
mandatory first octet, the PID, and layer indices (if present) are | mandatory first octet, the PID, and layer indices (if present) are | |||
as described by "Reference indices" below. This bit MUST only be | as described by "reference indices" below. This bit MUST only be | |||
set to 1 if the I bit is also set to 1; if the I bit is set to 0, | set to one if the I bit is also set to one; if the I bit is set to | |||
then this bit MUST also be set to 0 and ignored by receivers. | zero, then this bit MUST also be set to zero and ignored by | |||
(Flexible mode's Reference indices are defined as offsets from the | receivers. (Flexible mode's reference indices are defined as | |||
Picture ID field, so they would have no meaning if I were not | offsets from the Picture ID field, so they would have no meaning | |||
set.) The value of the F bit MUST only change on the first packet | if I were not set.) The value of the F bit MUST only change on | |||
of a key picture. A "key picture" is a picture whose base | the first packet of a key picture. A "key picture" is a picture | |||
spatial-layer frame is a keyframe, and thus one which completely | whose base spatial-layer frame is a keyframe, and thus one which | |||
resets the encoder state. This packet will have its P bit equal | completely resets the encoder state. This packet will have its P | |||
to 0, SID or L bit (described below) equal to 0, and B bit | bit equal to zero, SID or L bit (described below) equal to zero, | |||
(described below) equal to 1. | and B bit (described below) equal to one. | |||
B: Start of a frame. This bit MUST be set to 1 if the first payload | B: Start of Frame. This bit MUST be set to one if the first payload | |||
octet of the RTP packet is the beginning of a new VP9 frame; | octet of the RTP packet is the beginning of a new VP9 frame; | |||
otherwise, it MUST NOT be 1. Note that this frame might not be | otherwise, it MUST NOT be one. Note that this frame might not be | |||
the first frame of a picture. | the first frame of a picture. | |||
E: End of a frame. This bit MUST be set to 1 for the final RTP | E: End of Frame. This bit MUST be set to one for the final RTP | |||
packet of a VP9 frame, and 0 otherwise. This enables a decoder to | packet of a VP9 frame; otherwise, it is zero. This enables a | |||
finish decoding the frame, where it otherwise may need to wait for | decoder to finish decoding the frame, where it otherwise may need | |||
the next packet to explicitly know that the frame is complete. | to wait for the next packet to explicitly know that the frame is | |||
Note that, if spatial scalability is in use, more frames from the | complete. Note that, if spatial scalability is in use, more | |||
same picture may follow; see the description of the B bit above. | frames from the same picture may follow; see the description of | |||
| the B bit above. | |||
V: Scalability Structure (SS) data present. When set to 1, the | V: Scalability Structure (SS) data present. When set to one, the | |||
OPTIONAL SS data MUST be present in the payload descriptor. | OPTIONAL SS data MUST be present in the payload descriptor. | |||
Otherwise, the SS data MUST NOT be present. | Otherwise, the SS data MUST NOT be present. | |||
Z: Not a reference frame for upper spatial layers. If set to 1, | Z: Not a reference frame for upper spatial layers. If set to one, | |||
indicates that frames with higher spatial layers SID+1 and greater | indicates that frames with higher spatial layers SID+1 and greater | |||
of the current and following pictures do not depend on the current | of the current and following pictures do not depend on the current | |||
spatial-layer SID frame. This enables a decoder that is targeting | spatial-layer SID frame. This enables a decoder that is targeting | |||
a higher spatial layer to know that it can safely discard this | a higher spatial layer to know that it can safely discard this | |||
packet's frame without processing it, without having to wait for | packet's frame without processing it, without having to wait for | |||
the D bit in the higher-layer frame (see below). | the D bit in the higher-layer frame (see below). | |||
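As an illustration of the eight flag bits described above, the following sketch decodes the mandatory first octet. It assumes the usual RFC diagram convention that bit 0 in Figures 2 and 3 is the most significant bit of the octet. The names `Vp9Flags`, `parse_first_octet`, and `is_flexible` are hypothetical, and treating F as not in effect when I is zero is one reading of the "ignored by receivers" rule.

```python
from typing import NamedTuple


class Vp9Flags(NamedTuple):
    i: bool  # Picture ID present
    p: bool  # inter-picture predicted frame
    l: bool  # layer indices present
    f: bool  # flexible mode
    b: bool  # start of frame
    e: bool  # end of frame
    v: bool  # Scalability Structure data present
    z: bool  # not a reference frame for upper spatial layers


def parse_first_octet(octet: int) -> Vp9Flags:
    """Split the first descriptor octet into its I, P, L, F, B, E, V, Z bits."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("expected a single octet (0..255)")
    # Bit 0 of the figure is the most significant bit, so I is 0x80 and Z is 0x01.
    return Vp9Flags(*(bool(octet & (0x80 >> n)) for n in range(8)))


def is_flexible(flags: Vp9Flags) -> bool:
    """F only takes effect when I is also set; otherwise it is ignored here."""
    return flags.i and flags.f
```

For example, `parse_first_octet(0x98)` yields I, F, and B set with all other bits clear, which is the shape of a first packet of a flexible-mode frame that carries a Picture ID and uses no inter-picture prediction.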
The mandatory first octet is followed by the extension data fields | The mandatory first octet is followed by the extension data fields | |||
that are enabled: | that are enabled: | |||
M: The most significant bit of the first octet is an extension flag. | M: The most significant bit of the first octet is an extension flag. | |||
The field MUST be present if the I bit is equal to one. If M is | The field MUST be present if the I bit is equal to one. If M is | |||
set, the PID field MUST contain 15 bits; otherwise, it MUST | set, the PID field MUST contain 15 bits; otherwise, it MUST | |||
contain 7 bits. See PID below. | contain 7 bits. See PID below. | |||
Picture ID (PID): Picture ID represented in 7 or 15 bits, depending | Picture ID (PID): Picture ID represented in 7 or 15 bits, depending | |||
on the M bit. This is a running index of the pictures, where the | on the M bit. This is a running index of the pictures, where the | |||
sender increments the value by 1 for each picture it sends. | sender increments the value by one for each picture it sends. | |||
(Note, however, that because a middlebox can discard pictures | (Note, however, that because a middlebox can discard pictures | |||
where permitted by the SS, Picture IDs as received by a receiver | where permitted by the SS, Picture IDs as received by a receiver | |||
might not be contiguous.) This field MUST be present if the I bit | might not be contiguous.) This field MUST be present if the I bit | |||
is equal to one. If M is set to 0, 7 bits carry the PID; else, if | is equal to one. If M is set to zero, 7 bits carry the PID; else, | |||
M is set to 1, 15 bits carry the PID in network byte order. The | if M is set to one, 15 bits carry the PID in network byte order. | |||
sender may choose between a 7- or 15-bit index. The PID SHOULD | The sender may choose between a 7- or 15-bit index. The PID | |||
start on a random number and MUST wrap after reaching the maximum | SHOULD start on a random number and MUST wrap after reaching the | |||
ID (0x7f or 0x7fff depending on the index size chosen). The | maximum ID (0x7f or 0x7fff depending on the index size chosen). | |||
receiver MUST NOT assume that the number of bits in the PID stays | The receiver MUST NOT assume that the number of bits in the PID | |||
the same through the session. If this field transitions from 7 | stays the same through the session. If this field transitions | |||
bits to 15 bits, the value is zero-extended (i.e., the value after | from 7 bits to 15 bits, the value is zero-extended (i.e., the | |||
0x6e is 0x006f); if the field transitions from 15 bits to 7 bits, | value after 0x6e is 0x006f); if the field transitions from 15 bits | |||
it is truncated (i.e., the value after 0x1bbe is 0xbf). | to 7 bits, it is truncated (i.e., the value after 0x1bbe is 0xbf). | |||
In the non-flexible mode (when the F bit is set to 0), this PID is | In the non-flexible mode (when the F bit is set to zero), this PID | |||
used as an index to the PG specified in the SS data below. In | is used as an index to the PG specified in the SS data below. In | |||
this mode, the PID of the keyframe corresponds to the first | this mode, the PID of the keyframe corresponds to the first | |||
specified frame in the PG. Then subsequent PIDs are mapped to | specified frame in the PG. Then subsequent PIDs are mapped to | |||
subsequently specified frames in the PG (modulo N_G, specified in | subsequently specified frames in the PG (modulo N_G, specified in | |||
the SS data below), respectively. | the SS data below), respectively. | |||
All frames of the same picture MUST have the same PID value. | All frames of the same picture MUST have the same PID value. | |||
Frames (and their corresponding pictures) with the VP9 show_frame | Frames (and their corresponding pictures) with the VP9 show_frame | |||
field equal to 0 MUST have distinct PID values from subsequent | field equal to zero MUST have distinct PID values from subsequent | |||
pictures with show_frame equal to 1. Thus, a picture (as defined | pictures with show_frame equal to one. Thus, a picture (as | |||
in this specification) is different than a VP9 superframe. | defined in this specification) is different than a VP9 superframe. | |||
All frames of the same picture MUST have the same value for | All frames of the same picture MUST have the same value for | |||
show_frame. | show_frame. | |||
Layer indices: This information is optional but RECOMMENDED whenever | Layer indices: This field is optional but RECOMMENDED whenever | |||
encoding with layers. For both flexible and non-flexible modes, | encoding with layers. For both flexible and non-flexible modes, | |||
one octet is used to specify a layer frame's temporal-layer ID | one octet is used to specify a layer frame's Temporal-layer ID | |||
(TID) and spatial-layer ID (SID) as shown both in Figure 2 and | (TID) and Spatial-layer ID (SID) as shown both in Figures 2 and 3. | |||
Figure 3. Additionally, a bit (U) is used to indicate that the | Additionally, a bit (U) is used to indicate that the current frame | |||
current frame is a "switching up point" frame. Another bit (D) is | is a "switching up point" frame. Another bit (D) is used to | |||
used to indicate whether inter-layer prediction is used for the | indicate whether inter-layer prediction is used for the current | |||
current frame. | frame. | |||
In the non-flexible mode (when the F bit is set to 0), another | In the non-flexible mode (when the F bit is set to zero), another | |||
octet is used to represent temporal-layer 0 index (TL0PICIDX), as | octet is used to represent the Temporal Layer 0 Picture Index (8 | |||
depicted in Figure 3. The TL0PICIDX is present so that all | bits) (TL0PICIDX), as depicted in Figure 3. The TL0PICIDX is | |||
minimally required frames (the base temporal-layer frames) can be | present so that all minimally required frames (the base temporal- | |||
tracked. | layer frames) can be tracked. | |||
The TID and SID fields indicate the temporal and spatial layers | The TID and SID fields indicate the temporal and spatial layers | |||
and can help middleboxes and endpoints quickly identify which | and can help middleboxes and endpoints quickly identify which | |||
layer a packet belongs to. | layer a packet belongs to. | |||
TID: The temporal-layer ID of the current frame. In the case of | TID: The temporal-layer ID of the current frame. In the case of | |||
non-flexible mode, if a PID is mapped to a picture in a | non-flexible mode, if a PID is mapped to a picture in a | |||
specified PG, then the value of the TID MUST match the | specified PG, then the value of the TID MUST match the | |||
corresponding TID value of the mapped picture in the PG. | corresponding TID value of the mapped picture in the PG. | |||
U: Switching up point. If this bit is set to 1 for the current | U: Switching up point. When this bit is set to one, if the | |||
picture with a temporal-layer ID equal to TID, then "switch up" | current picture has a temporal-layer ID equal to value T, then | |||
to a higher frame rate is possible as subsequent higher | subsequent pictures with temporal-layer ID values higher than T | |||
temporal-layer pictures will not depend on any picture before | will not depend on any picture before the current picture (in | |||
the current picture (in coding order) with temporal-layer ID | coding order) with a temporal-layer ID value greater than T. | |||
greater than TID. | ||||
SID: The spatial-layer ID of the current frame. Note that frames | SID: The spatial-layer ID of the current frame. Note that frames | |||
with spatial-layer SID > 0 may be dependent on decoded spatial- | with spatial-layer SID > 0 may be dependent on decoded spatial- | |||
layer SID-1 frame within the same picture. Different frames of | layer SID-1 frame within the same picture. Different frames of | |||
the same picture MUST have distinct spatial-layer IDs, and | the same picture MUST have distinct spatial-layer IDs, and | |||
frames' spatial layers MUST appear in increasing order within | frames' spatial layers MUST appear in increasing order within | |||
the frame. | the frame. | |||
D: Inter-layer dependency is used. D MUST be set to 1 if and | D: Inter-layer dependency is used. D MUST be set to one if and | |||
only if the current spatial-layer SID frame depends on spatial- | only if the current spatial-layer SID frame depends on spatial- | |||
layer SID-1 frame of the same picture; otherwise, it MUST be | layer SID-1 frame of the same picture; otherwise, it MUST be | |||
set to 0. For the base-layer frame (with SID equal to 0), the | set to zero. For the base-layer frame (with SID equal to | |||
D bit MUST be set to 0. | zero), the D bit MUST be set to zero. | |||
TL0PICIDX: 8 bits temporal-layer zero index. TL0PICIDX is only | TL0PICIDX: Temporal Layer 0 Picture Index (8 bits). TL0PICIDX is | |||
present in the non-flexible mode (F = 0). This is a running | only present in the non-flexible mode (F = 0). This is a | |||
index for the temporal base-layer pictures, i.e., the pictures | running index for the temporal base-layer pictures, i.e., the | |||
with a TID set to 0. If the TID is larger than 0, TL0PICIDX | pictures with a TID set to zero. If the TID is larger than | |||
indicates which temporal base-layer picture the current picture | zero, TL0PICIDX indicates which temporal base-layer picture the | |||
depends on. TL0PICIDX MUST be incremented by 1 when the TID is | current picture depends on. TL0PICIDX MUST be incremented by | |||
equal to 0. The index SHOULD start on a random number and MUST | one when the TID is equal to zero. The index SHOULD start on a | |||
restart at 0 after reaching the maximum number 255. | random number and MUST restart at zero after reaching the | |||
maximum number 255. | ||||
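The layer indices octet described above can be unpacked as in the following minimal sketch. The bit positions (TID in the three most significant bits, then U, then a three-bit SID, then D) are assumed from Figures 2 and 3, which are not reproduced in this part of the diff. The names LayerIndices, parse_layer_indices, and tl0picidx_for_picture are hypothetical and are not defined by the specification.

```python
from dataclasses import dataclass


@dataclass
class LayerIndices:
    tid: int   # temporal-layer ID
    u: bool    # switching up point
    sid: int   # spatial-layer ID
    d: bool    # inter-layer dependency used


def parse_layer_indices(octet: int) -> LayerIndices:
    """Split one layer indices octet into TID, U, SID, and D.

    Assumed layout (most significant bit first): TID(3) U(1) SID(3) D(1).
    """
    if not 0 <= octet <= 0xFF:
        raise ValueError("layer indices must fit in one octet")
    indices = LayerIndices(
        tid=(octet >> 5) & 0x7,
        u=bool((octet >> 4) & 0x1),
        sid=(octet >> 1) & 0x7,
        d=bool(octet & 0x1),
    )
    # The base spatial layer (SID equal to 0) must not signal a dependency.
    if indices.sid == 0 and indices.d:
        raise ValueError("D bit set on a frame with SID equal to 0")
    return indices


def tl0picidx_for_picture(previous: int, tid: int) -> int:
    """Return the TL0PICIDX for a new picture in non-flexible mode.

    The index advances only on temporal base-layer pictures (TID 0)
    and restarts at 0 after 255.
    """
    return (previous + 1) & 0xFF if tid == 0 else previous
```

A receiver or middlebox could call parse_layer_indices on the octet that follows the Picture ID to decide quickly which layer a packet belongs to, without touching the VP9 payload.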
Reference indices: When P and F are both set to 1, indicating a non- | Reference indices: When P and F are both set to one, indicating a | |||
keyframe in flexible mode, then at least one reference index MUST | non-keyframe in flexible mode, then at least one reference index | |||
be specified as below. Additional reference indices (a total of | MUST be specified as below. Additional reference indices (a total | |||
up to three reference indices are allowed) may be specified using | of up to three reference indices are allowed) may be specified | |||
the N bit below. When either P or F is set to 0, then no | using the N bit below. When either P or F is set to zero, then no | |||
reference index is specified. | reference index is specified. | |||
P_DIFF: The reference index (in 7 bits) specified as the relative | P_DIFF: The reference index (in 7 bits) specified as the relative | |||
PID from the current picture. For example, P_DIFF=3 on a | PID from the current picture. For example, P_DIFF=3 on a | |||
packet containing the picture with PID 112 means that the | packet containing the picture with PID 112 means that the | |||
picture refers back to the picture with PID 109. This | picture refers back to the picture with PID 109. This | |||
calculation is done modulo the size of the PID field, i.e., | calculation is done modulo the size of the PID field, i.e., | |||
either 7 or 15 bits. A P_DIFF value of 0 is invalid. | either 7 or 15 bits. A P_DIFF value of zero is invalid. | |||
N: Set to 1 if an additional P_DIFF follows the current P_DIFF. | N: Set to 1 if an additional P_DIFF follows the current P_DIFF. | |||
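The modular P_DIFF calculation can be illustrated with a short sketch. The function name referenced_pid and its parameters are hypothetical. The modulus is taken to be 2^7 or 2^15, matching the current width of the PID field.

```python
def referenced_pid(pid: int, p_diff: int, pid_bits: int) -> int:
    """Return the PID of the picture referenced by a P_DIFF value.

    pid_bits is the current width of the PID field (7 or 15); the
    subtraction wraps modulo 2**pid_bits.
    """
    if pid_bits not in (7, 15):
        raise ValueError("PID field is either 7 or 15 bits wide")
    if not 0 <= pid < (1 << pid_bits):
        raise ValueError("PID does not fit in the PID field")
    if not 1 <= p_diff <= 0x7F:
        # P_DIFF is a 7-bit field, and a value of zero is invalid.
        raise ValueError("P_DIFF must be in the range 1..127")
    return (pid - p_diff) % (1 << pid_bits)
```

With a 7-bit PID, referenced_pid(112, 3, 7) gives 109 as in the example above, while a picture with PID 1 and P_DIFF=3 refers back across the wrap to PID 126.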
4.2.1. Scalability Structure (SS) | 4.2.1. Scalability Structure (SS) | |||
The SS data describes the resolution of each frame within a picture | The SS data describes the resolution of each frame within a picture | |||
as well as the inter-picture dependencies for a PG. If the VP9 | as well as the inter-picture dependencies for a PG. If the VP9 | |||
payload descriptor's V bit is set, the SS data is present in the | payload descriptor's V bit is set, the SS data is present in the | |||
position indicated in Figures 2 and 3. | position indicated in Figures 2 and 3. | |||
skipping to change at line 521 ¶ | skipping to change at line 527 ¶ | |||
+-+-+-+-+-+-+-+-+ -/ | +-+-+-+-+-+-+-+-+ -/ | |||
G: | N_G | (OPTIONAL) | G: | N_G | (OPTIONAL) | |||
+-+-+-+-+-+-+-+-+ -\ | +-+-+-+-+-+-+-+-+ -\ | |||
N_G: | TID |U| R |-|-| (OPTIONAL) . | N_G: | TID |U| R |-|-| (OPTIONAL) . | |||
+-+-+-+-+-+-+-+-+ -\ . - N_G times | +-+-+-+-+-+-+-+-+ -\ . - N_G times | |||
| P_DIFF | (OPTIONAL) . - R times . | | P_DIFF | (OPTIONAL) . - R times . | |||
+-+-+-+-+-+-+-+-+ -/ -/ | +-+-+-+-+-+-+-+-+ -/ -/ | |||
Figure 4: VP9 Scalability Structure | Figure 4: VP9 Scalability Structure | |||
N_S: N_S + 1 indicates the number of spatial layers present in the | N_S: Number of Spatial Layers Minus 1. N_S + 1 indicates the number | |||
VP9 stream. | of spatial layers present in the VP9 stream. | |||
Y: Each spatial layer's frame resolution is present. When set to 1, | Y: Each spatial layer's frame resolution is present. When set to | |||
the OPTIONAL WIDTH (2 octets) and HEIGHT (2 octets) MUST be | one, the OPTIONAL WIDTH (2 octets) and HEIGHT (2 octets) MUST be | |||
present for each layer frame. Otherwise, the resolution MUST NOT | present for each layer frame. Otherwise, the resolution MUST NOT | |||
be present. | be present. | |||
G: The PG description present flag. | G: The PG description present flag. | |||
-: A bit reserved for future use. It MUST be set to 0 and MUST be | -: A bit reserved for future use. It MUST be set to zero and MUST | |||
ignored by the receiver. | be ignored by the receiver. | |||
N_G: N_G indicates the number of pictures in a PG. If N_G is | N_G: N_G indicates the number of pictures in a PG. If N_G is | |||
greater than 0, then the SS data allows the inter-picture | greater than zero, then the SS data allows the inter-picture | |||
dependency structure of the VP9 stream to be pre-declared, rather | dependency structure of the VP9 stream to be pre-declared, rather | |||
than indicating it on the fly with every packet. If N_G is | than indicating it on the fly with every packet. If N_G is | |||
greater than 0, then for N_G pictures in the PG, each picture's | greater than zero, then for N_G pictures in the PG, each picture's | |||
temporal-layer ID (TID), switch up point (U), and Reference | Temporal-layer ID (TID), switch up point (U), and reference | |||
indices (P_DIFFs) are specified. | indices (P_DIFFs) are specified. | |||
The first picture specified in the PG MUST have a TID set to 0. | The first picture specified in the PG MUST have a TID set to zero. | |||
G set to 0 or N_G set to 0 indicates that either there is only one | G set to zero or N_G set to zero indicates that either there is | |||
temporal layer (for non-flexible mode) or no fixed inter-picture | only one temporal layer (for non-flexible mode) or no fixed inter- | |||
dependency information is present (for flexible mode) going | picture dependency information is present (for flexible mode) | |||
forward in the bitstream. | going forward in the bitstream. | |||
Note that for a given picture, all frames follow the same inter- | Note that for a given picture, all frames follow the same inter- | |||
picture dependency structure. However, the frame rate of each | picture dependency structure. However, the frame rate of each | |||
spatial layer can be different from each other; this can be | spatial layer can be different from each other; this can be | |||
described with the use of the D bit described above. The | described with the use of the D bit described above. The | |||
specified dependency structure in the SS data MUST be for the | specified dependency structure in the SS data MUST be for the | |||
highest frame rate layer. | highest frame rate layer. | |||
R: The number of P_DIFF fields that are present. | ||||
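As an illustration of how the fields above fit together, the following sketch parses SS data from a byte buffer. It assumes the layout of Figure 4: a first octet holding N_S in its three most significant bits followed by Y, G, and three reserved bits; WIDTH and HEIGHT as 16-bit values in network byte order; and, for each PG picture, one octet holding TID (3 bits), U (1 bit), R (2 bits), and two reserved bits, followed by R one-octet P_DIFF values. The first-octet layout is not visible in this part of the diff and is an assumption here. All class and function names are hypothetical.

```python
import struct
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PgPicture:
    tid: int            # temporal-layer ID of this PG picture
    u: bool             # switching up point
    p_diffs: List[int]  # R reference indices


@dataclass
class ScalabilityStructure:
    spatial_layers: int                 # N_S + 1
    resolutions: List[Tuple[int, int]]  # (width, height) per layer, if Y
    pictures: List[PgPicture]           # N_G entries, if G


def parse_ss(buf: bytes) -> Tuple[ScalabilityStructure, int]:
    """Parse SS data; return the structure and the octets consumed."""
    pos = 0

    def take(count: int) -> bytes:
        nonlocal pos
        if pos + count > len(buf):
            raise ValueError("truncated SS data")
        chunk = buf[pos:pos + count]
        pos += count
        return chunk

    first = take(1)[0]
    n_s = first >> 5            # number of spatial layers minus 1
    y = bool(first & 0x10)      # resolutions present
    g = bool(first & 0x08)      # PG description present
    # The three low bits are reserved and ignored on reception.

    resolutions = []
    if y:
        for _ in range(n_s + 1):
            width, height = struct.unpack("!HH", take(4))
            resolutions.append((width, height))

    pictures = []
    if g:
        n_g = take(1)[0]
        for _ in range(n_g):
            octet = take(1)[0]
            r = (octet >> 2) & 0x3
            pictures.append(PgPicture(
                tid=octet >> 5,
                u=bool(octet & 0x10),
                p_diffs=list(take(r)),
            ))
        if pictures and pictures[0].tid != 0:
            raise ValueError("first PG picture must have TID 0")

    return ScalabilityStructure(n_s + 1, resolutions, pictures), pos
```

Every read goes through take(), so a malformed or truncated descriptor raises an error instead of reading past the end of the buffer.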
In a scalable stream sent with a fixed pattern, the SS data SHOULD be | In a scalable stream sent with a fixed pattern, the SS data SHOULD be | |||
included in the first packet of every key frame. This is a packet | included in the first packet of every key frame. This is a packet | |||
with the P bit equal to 0, SID or L bit equal to 0, and B bit equal | with the P bit equal to zero, SID or L bit equal to zero, and B bit | |||
to 1. The SS data MUST only be changed on the picture that | equal to one. The SS data MUST only be changed on the picture that | |||
corresponds to the first picture specified in the previous SS data's | corresponds to the first picture specified in the previous SS data's | |||
PG (if the previous SS data's N_G was greater than 0). | PG (if the previous SS data's N_G was greater than zero). | |||
4.3. Frame Fragmentation | 4.3. Frame Fragmentation | |||
VP9 frames are fragmented into packets in RTP sequence number order: | VP9 frames are fragmented into packets in RTP sequence number order: | |||
beginning with a packet with the B bit set and ending with a packet | beginning with a packet with the B bit set and ending with a packet | |||
with the E bit set. There is no mechanism for finer-grained access | with the E bit set. There is no mechanism for finer-grained access | |||
to parts of a VP9 frame. | to parts of a VP9 frame. | |||
4.4. Scalable Encoding Considerations | 4.4. Scalable Encoding Considerations | |||
skipping to change at line 641 ¶ | skipping to change at line 649 ¶ | |||
+----------+---------+------------+---------+ | +----------+---------+------------+---------+ | |||
Table 1: Example Scalability Structure | Table 1: Example Scalability Structure | |||
This structure is constructed such that the U bit can always be set. | This structure is constructed such that the U bit can always be set. | |||
5. Feedback Messages and Header Extensions | 5. Feedback Messages and Header Extensions | |||
5.1. Reference Picture Selection Indication (RPSI) | 5.1. Reference Picture Selection Indication (RPSI) | |||
The reference picture selection index is a payload-specific feedback | The RPSI is a payload-specific feedback message defined within the | |||
message defined within the RTCP-based feedback format. The RPSI | RTCP-based feedback format. The RPSI message is generated by a | |||
message is generated by a receiver and can be used in two ways: | receiver and can be used in two ways: either it can signal a | |||
either it can signal a preferred reference picture when a loss has | preferred reference picture when a loss has been detected by the | |||
been detected by the decoder (preferably a reference that the decoder | decoder (preferably a reference that the decoder knows is perfect) or | |||
knows is perfect) or it can be used as positive feedback information | it can be used as positive feedback information to acknowledge | |||
to acknowledge correct decoding of certain reference pictures. The | correct decoding of certain reference pictures. The positive | |||
positive feedback method is useful for VP9 used for point-to-point | feedback method is useful for VP9 used for point-to-point (unicast) | |||
(unicast) communication. The use of RPSI for VP9 is preferably | communication. The use of RPSI for VP9 is preferably combined with a | |||
combined with a special update pattern of the codec's two special | special update pattern of the codec's two special reference frames -- | |||
reference frames -- the golden frame and the altref frame -- in which | the golden frame and the altref frame -- in which they are updated in | |||
they are updated in an alternating leapfrog fashion. When a receiver | an alternating leapfrog fashion. When a receiver has received and | |||
has received and correctly decoded a golden or altref frame, and that | correctly decoded a golden or altref frame, and that frame had a | |||
frame had a Picture ID in the payload descriptor, the receiver can | Picture ID in the payload descriptor, the receiver can acknowledge | |||
acknowledge this simply by sending an RPSI message back to the | this simply by sending an RPSI message back to the sender. The | |||
sender. The message body (i.e., the "native RPSI bit string" in | message body (i.e., the "native RPSI bit string" in [RFC4585]) is | |||
[RFC4585]) is simply the (7- or 15-bit) Picture ID of the received | simply the (7- or 15-bit) Picture ID of the received frame. | |||
frame. | ||||
Note: because all frames of the same picture must have the same | | Note: because all frames of the same picture must have the same | |||
inter-picture reference structure, there is no need for a message to | | inter-picture reference structure, there is no need for a | |||
specify which frame is being selected. | | message to specify which frame is being selected. | |||
5.2. Full Intra Request (FIR) | 5.2. Full Intra Request (FIR) | |||
The Full Intra Request (FIR) [RFC5104] RTCP feedback message allows a | The Full Intra Request (FIR) [RFC5104] RTCP feedback message allows a | |||
receiver to request a full state refresh of an encoded stream. | receiver to request a full state refresh of an encoded stream. | |||
Upon receipt of a FIR request, a VP9 sender MUST send a picture with | Upon receipt of a FIR request, a VP9 sender MUST send a picture with | |||
a keyframe for its spatial-layer 0 layer frame and then send frames | a keyframe for its spatial-layer 0 layer frame and then send frames | |||
without inter-picture prediction (P=0) for any higher-layer frames. | without inter-picture prediction (P=0) for any higher-layer frames. | |||
skipping to change at line 688 ¶ | skipping to change at line 695 ¶ | |||
+---------------+---------------+ | +---------------+---------------+ | |||
|0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7| | |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7| | |||
+---------------+---------+-----+ | +---------------+---------+-----+ | |||
| RES | TID | RES | SID | | | RES | TID | RES | SID | | |||
+---------------+---------+-----+ | +---------------+---------+-----+ | |||
Figure 5: LRR Index Format | Figure 5: LRR Index Format | |||
Figure 5 shows the format of an LRR's layer index fields for VP9 | Figure 5 shows the format of an LRR's layer index fields for VP9 | |||
streams. The two "RES" fields MUST be set to 0 on transmission and | streams. The two "RES" fields MUST be set to zero on transmission | |||
ignored on reception. See Section 4.2 for details on the TID and SID | and ignored on reception. See Section 4.2 for details on the TID and | |||
fields. | SID fields. | |||
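A minimal sketch of packing and unpacking one such layer index follows. It assumes that, per Figure 5, each octet carries a 5-bit RES field followed by a 3-bit value (TID in the first octet, SID in the second). The function names are hypothetical.

```python
from typing import Tuple


def pack_lrr_layer_index(tid: int, sid: int) -> bytes:
    """Encode a VP9 LRR layer index as two octets with RES bits zero."""
    if not 0 <= tid <= 7 or not 0 <= sid <= 7:
        raise ValueError("TID and SID are 3-bit values")
    return bytes([tid, sid])


def unpack_lrr_layer_index(data: bytes) -> Tuple[int, int]:
    """Decode (TID, SID), ignoring the RES bits as required on reception."""
    if len(data) != 2:
        raise ValueError("a layer index is exactly two octets")
    return data[0] & 0x07, data[1] & 0x07
```

Masking on reception rather than rejecting nonzero RES bits keeps the decoder tolerant of senders that may use those bits in a future extension.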
Identification of a layer refresh frame can be derived from the | Identification of a layer refresh frame can be derived from the | |||
reference IDs of each frame by backtracking the dependency chain | reference IDs of each frame by backtracking the dependency chain | |||
until reaching a point where only decodable frames are being | until reaching a point where only decodable frames are being | |||
referenced. Therefore, it's recommended for both the flexible and | referenced. Therefore, it's recommended for both the flexible and | |||
the non-flexible mode that, when switching up points are being | the non-flexible mode that, when switching up points are being | |||
encoded in response to an LRR, those packets contain layer indices | encoded in response to an LRR, those packets contain layer indices | |||
and the reference field or fields so that the decoder or selective | and the reference field or fields so that the decoder or selective | |||
forwarding middleboxes [RFC7667] can make this derivation. | forwarding middleboxes [RFC7667] can make this derivation. | |||
Example: | Example: | |||
LRR {1,0}, {2,1} is sent by a Multipoint Control Unit (MCU) when it | LRR {1,0}, {2,1} is sent by a Multipoint Control Unit (MCU) when it | |||
is currently relaying {1,0} to a receiver and which wants to upgrade | is currently relaying {1,0} to a receiver that wants to upgrade to | |||
to {2,1}. In response, the encoder should encode the next frames in | {2,1}. In response, the encoder should encode the next frames in | |||
layers {1,1} and {2,1} by only referring to frames in {1,0}, or | layers {1,1} and {2,1} by only referring to frames in {1,0} or {0,0}. | |||
{0,0}. | ||||
In the non-flexible mode, periodic upgrade frames can be defined by | In the non-flexible mode, periodic upgrade frames can be defined by | |||
the layer structure of the SS; thus, periodic upgrade frames can be | the layer structure of the SS; thus, periodic upgrade frames can be | |||
automatically identified by the Picture ID. | automatically identified by the Picture ID. | |||
6. Payload Format Parameters | 6. Payload Format Parameters | |||
This payload format has three optional parameters: max-fr, max-fs, | This payload format has three optional parameters: max-fr, max-fs, | |||
and profile-id. | and profile-id. | |||
The max-fr and max-fs parameters are used to signal the capabilities | The max-fr and max-fs parameters are used to signal the capabilities | |||
of a receiver implementation. If the implementation is willing to | of a receiver implementation. If the implementation is willing to | |||
receive media, both parameters MUST be provided. These parameters | receive media, both parameters MUST be provided. These parameters | |||
MUST NOT be used for any other purpose. A media sender SHOULD NOT | MUST NOT be used for any other purpose. A media sender SHOULD NOT | |||
send media with a frame rate or frame size exceeding the max-fr and | send media with a frame rate or frame size exceeding the max-fr and | |||
max-fs values signaled. (There may be scenarios, such as pre-encoded | max-fs values signaled. (There may be scenarios, such as pre-encoded | |||
media or selective forwarding middleboxes [RFC7667], where a media | media or selective forwarding middleboxes [RFC7667], where a media | |||
sender does not have media available that fits within a receiver's | sender does not have media available that fits within a receiver's | |||
max-fs and max-fr value; in such scenarios, a sender MAY exceed the | max-fs and max-fr values; in such scenarios, a sender MAY exceed the | |||
signaled values.) | signaled values.) | |||
max-fr: The value of max-fr is an integer indicating the maximum | max-fr: The value of max-fr is an integer indicating the maximum | |||
frame rate in units of frames per second that the decoder is | frame rate in units of frames per second that the decoder is | |||
capable of decoding. | capable of decoding. | |||
max-fs: The value of max-fs is an integer indicating the maximum | max-fs: The value of max-fs is an integer indicating the maximum | |||
frame size in units of macroblocks that the decoder is capable of | frame size in units of macroblocks that the decoder is capable of | |||
decoding. | decoding. | |||
The decoder is capable of decoding this frame size as long as the | The decoder is capable of decoding this frame size as long as the | |||
width and height of the frame in macroblocks are less than | width and height of the frame in macroblocks are each less than | |||
int(sqrt(max-fs * 8)); for instance, a max-fs of 1200 (capable of | int(sqrt(max-fs * 8)); for instance, a max-fs of 1200 (capable of | |||
supporting 640x480 resolution) will support widths and heights up | supporting 640x480 resolution) will support widths and heights up | |||
to 1552 pixels (97 macroblocks). | to 1552 pixels (97 macroblocks). | |||
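The arithmetic in the max-fs example can be checked with a short sketch. It assumes macroblocks of 16x16 pixels, which is consistent with the figures quoted above (1552 / 97 = 16). The function names are hypothetical.

```python
import math

MACROBLOCK_PIXELS = 16  # assumed macroblock edge length in pixels


def dimension_limit_mb(max_fs: int) -> int:
    """Return int(sqrt(max-fs * 8)), the per-dimension bound in macroblocks."""
    return math.isqrt(max_fs * 8)


def frame_size_mb(width_px: int, height_px: int) -> int:
    """Return the frame size in macroblocks, rounding each dimension up."""
    cols = -(-width_px // MACROBLOCK_PIXELS)
    rows = -(-height_px // MACROBLOCK_PIXELS)
    return cols * rows
```

For max-fs=1200, dimension_limit_mb returns 97, which corresponds to 97 * 16 = 1552 pixels, and frame_size_mb(640, 480) returns 1200, matching the 640x480 resolution cited in the text.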
profile-id: The value of profile-id is an integer indicating the | profile-id: The value of profile-id is an integer indicating the | |||
default coding profile (the subset of coding tools that may have | default coding profile (the subset of coding tools that may have | |||
been used to generate the stream or that the receiver supports). | been used to generate the stream or that the receiver supports). | |||
Table 2 lists all of the profiles defined in Section 7.2 of | Table 2 lists all of the profiles defined in Section 7.2 of | |||
[VP9-BITSTREAM] and the corresponding integer values to be used. | [VP9-BITSTREAM] and the corresponding integer values to be used. | |||
skipping to change at line 772 ¶ | skipping to change at line 778 ¶ | |||
+=========+============+ | +=========+============+ | |||
| 0 | 0 | | | 0 | 0 | | |||
+---------+------------+ | +---------+------------+ | |||
| 1 | 1 | | | 1 | 1 | | |||
+---------+------------+ | +---------+------------+ | |||
| 2 | 2 | | | 2 | 2 | | |||
+---------+------------+ | +---------+------------+ | |||
| 3 | 3 | | | 3 | 3 | | |||
+---------+------------+ | +---------+------------+ | |||
Table 2: Comparison of | Table 2: | |||
Correspondence between | ||||
profile-id to VP9 | profile-id and VP9 | |||
Profile Integer | Profile Integer | |||
+=========+===========+=================+==========================+ | +=========+===========+=================+==========================+ | |||
| Profile | Bit Depth | SRGB Colorspace | Chroma Subsampling | | | Profile | Bit Depth | SRGB Colorspace | Chroma Subsampling | | |||
+=========+===========+=================+==========================+ | +=========+===========+=================+==========================+ | |||
| 0 | 8 | No | YUV 4:2:0 | | | 0 | 8 | No | YUV 4:2:0 | | |||
+---------+-----------+-----------------+--------------------------+ | +---------+-----------+-----------------+--------------------------+ | |||
| 1 | 8 | Yes | YUV 4:2:2,4:4:0 or 4:4:4 | | | 1 | 8 | Yes | YUV 4:2:2,4:4:0 or 4:4:4 | | |||
+---------+-----------+-----------------+--------------------------+ | +---------+-----------+-----------------+--------------------------+ | |||
| 2 | 10 or 12 | No | YUV 4:2:0 | | | 2 | 10 or 12 | No | YUV 4:2:0 | | |||
+---------+-----------+-----------------+--------------------------+ | +---------+-----------+-----------------+--------------------------+ | |||
| 3 | 10 or 12 | Yes | YUV 4:2:2,4:4:0 or 4:4:4 | | | 3 | 10 or 12 | Yes | YUV 4:2:2,4:4:0 or 4:4:4 | | |||
+---------+-----------+-----------------+--------------------------+ | +---------+-----------+-----------------+--------------------------+ | |||
Table 3: Profile Capabilities | Table 3: Profile Capabilities | |||
| Note: SRGB (often sRGB) = Standard Red-Green-Blue | ||||
6.1. SDP Parameters | 6.1. SDP Parameters | |||
6.1.1. Mapping of Media Subtype Parameters to SDP | 6.1.1. Mapping of Media Subtype Parameters to SDP | |||
The media type video/vp9 string is mapped to fields in the Session | The media type video/vp9 string is mapped to fields in the Session | |||
Description Protocol (SDP) [RFC8866] as follows: | Description Protocol (SDP) [RFC8866] as follows: | |||
* The media name in the "m=" line of SDP MUST be video. | * The media name in the "m=" line of SDP MUST be video. | |||
* The encoding name in the "a=rtpmap" line of SDP MUST be VP9 (the | * The encoding name in the "a=rtpmap" line of SDP MUST be VP9 (the | |||
skipping to change at line 894 ¶ | skipping to change at line 903 ¶ | |||
Lennox <jonathan.lennox@8x8.com> | Lennox <jonathan.lennox@8x8.com> | |||
Intended usage: COMMON | Intended usage: COMMON | |||
Restrictions on usage: This media type depends on RTP framing; | Restrictions on usage: This media type depends on RTP framing; | |||
hence, it is only defined for transfer via RTP [RFC3550]. | hence, it is only defined for transfer via RTP [RFC3550]. | |||
Author: Jonathan Lennox <jonathan.lennox@8x8.com> | Author: Jonathan Lennox <jonathan.lennox@8x8.com> | |||
Change controller: IETF AVTCore Working Group delegated from the | Change controller: IETF AVTCore Working Group delegated from the | |||
IESG. | IETF. | |||
8. Security Considerations | 8. Security Considerations | |||
RTP packets using the payload format defined in this specification | RTP packets using the payload format defined in this specification | |||
are subject to the security considerations discussed in the RTP | are subject to the security considerations discussed in the RTP | |||
specification [RFC3550], and in any applicable RTP profile such as | specification [RFC3550], and in any applicable RTP profile such as | |||
RTP/AVP [RFC3551], RTP/AVPF [RFC4585], RTP/SAVP [RFC3711], or RTP/ | RTP/AVP [RFC3551], RTP/AVPF [RFC4585], RTP/SAVP [RFC3711], or RTP/ | |||
SAVPF [RFC5124]. However, as "Securing the RTP Framework: Why RTP | SAVPF [RFC5124]. However, as "Securing the RTP Framework: Why RTP | |||
Does Not Mandate a Single Media Security Solution" [RFC7202] | Does Not Mandate a Single Media Security Solution" [RFC7202] | |||
discusses, it is not an RTP payload format's responsibility to | discusses, it is not an RTP payload format's responsibility to | |||
discuss or mandate what solutions are used to meet the basic security | discuss or mandate what solutions are used to meet the basic security | |||
goals like confidentiality, integrity, and source authenticity for | goals like confidentiality, integrity, and source authenticity for | |||
RTP in general. This responsibility lies with anyone using RTP in an | RTP in general. This responsibility lies with anyone using RTP in an | |||
application. They can find guidance on available security mechanisms | application. They can find guidance on available security mechanisms | |||
in "Options for Securing RTP Sessions [RFC7201]. Applications SHOULD | in "Options for Securing RTP Sessions" [RFC7201]. Applications | |||
use one or more appropriate strong security mechanisms. | SHOULD use one or more appropriate strong security mechanisms. | |||
Implementations of this RTP payload format need to take appropriate | Implementations of this RTP payload format need to take appropriate | |||
security considerations into account. It is extremely important for | security considerations into account. It is extremely important for | |||
the decoder to be robust against malicious or malformed payloads and | the decoder to be robust against malicious or malformed payloads and | |||
ensure that they do not cause the decoder to overrun its allocated | ensure that they do not cause the decoder to overrun its allocated | |||
memory or otherwise misbehave. An overrun in allocated memory could | memory or otherwise misbehave. An overrun in allocated memory could | |||
lead to arbitrary code execution by an attacker. The same applies to | lead to arbitrary code execution by an attacker. The same applies to | |||
the encoder, even though problems in encoders are (typically) rarer. | the encoder, even though problems in encoders are (typically) rarer. | |||
This RTP payload format and its media decoder do not exhibit any | This RTP payload format and its media decoder do not exhibit any | |||
skipping to change at line 944 ¶ | skipping to change at line 953 ¶ | |||
non-reference frames and discard them in order to reduce network | non-reference frames and discard them in order to reduce network | |||
congestion. Note that discarding of non-reference frames cannot be | congestion. Note that discarding of non-reference frames cannot be | |||
done if the stream is encrypted (because the non-reference marker is | done if the stream is encrypted (because the non-reference marker is | |||
encrypted). | encrypted). | |||
10. IANA Considerations | 10. IANA Considerations | |||
IANA has registered the media type registration "video/vp9" as | IANA has registered the media type registration "video/vp9" as | |||
specified in Section 7. The media type has also been added to the | specified in Section 7. The media type has also been added to the | |||
"RTP Payload Format Media Types" <https://www.iana.org/assignments/ | "RTP Payload Format Media Types" <https://www.iana.org/assignments/ | |||
rtp-parameters> subregistry of the "Real-Time Transport Protocol | rtp-parameters> registry of the "Real-Time Transport Protocol (RTP) | |||
(RTP) Parameters" registry. | Parameters" registry group as follows. | |||
Media Type: video | ||||
Subtype: VP9 | ||||
Clock Rate (Hz): 90000 | ||||
Reference: RFC 9628 | ||||
11. References | 11. References | |||
11.1. Normative References | 11.1. Normative References | |||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
<https://www.rfc-editor.org/info/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
skipping to change at line 997 ¶ | skipping to change at line 1011 ¶ | |||
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | |||
May 2017, <https://www.rfc-editor.org/info/rfc8174>. | May 2017, <https://www.rfc-editor.org/info/rfc8174>. | |||
[RFC8866] Begen, A., Kyzivat, P., Perkins, C., and M. Handley, "SDP: | [RFC8866] Begen, A., Kyzivat, P., Perkins, C., and M. Handley, "SDP: | |||
Session Description Protocol", RFC 8866, | Session Description Protocol", RFC 8866, | |||
DOI 10.17487/RFC8866, January 2021, | DOI 10.17487/RFC8866, January 2021, | |||
<https://www.rfc-editor.org/info/rfc8866>. | <https://www.rfc-editor.org/info/rfc8866>. | |||
[RFC9627] Lennox, J., Hong, D., Uberti, J., Holmer, S., and M. | [RFC9627] Lennox, J., Hong, D., Uberti, J., Holmer, S., and M. | |||
Flodman, "The Layer Refresh Request (LRR) RTCP Feedback | Flodman, "The Layer Refresh Request (LRR) RTCP Feedback | |||
Message", RFC 9627, DOI 10.17487/RFC9627, August 2024, | Message", RFC 9627, DOI 10.17487/RFC9627, February 2025, | |||
<https://www.rfc-editor.org/info/rfc9627>. | <https://www.rfc-editor.org/info/rfc9627>. | |||
[VP9-BITSTREAM] | [VP9-BITSTREAM] | |||
Grange, A., de Rivaz, P., and J. Hunt, "VP9 Bitstream & | Grange, A., de Rivaz, P., and J. Hunt, "VP9 Bitstream & | |||
Decoding Process Specification", Version 0.6, 31 March | Decoding Process Specification", Version 0.6, 31 March | |||
2016, | 2016, | |||
<https://storage.googleapis.com/downloads.webmproject.org/ | <https://storage.googleapis.com/downloads.webmproject.org/ | |||
docs/vp9/vp9-bitstream-specification- | docs/vp9/vp9-bitstream-specification- | |||
v0.6-20160331-draft.pdf>. | v0.6-20160331-draft.pdf>. | |||
End of changes. 64 change blocks. | ||||
189 lines changed or deleted | 203 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. |