Diff: rfc9628v1.txt - rfc9628.txt

	rfc9628v1.txt		rfc9628.txt

	Internet Engineering Task Force (IETF) J. Uberti		Internet Engineering Task Force (IETF) J. Uberti
	Request for Comments: 9628 S. Holmer		Request for Comments: 9628 S. Holmer
	Category: Standards Track M. Flodman		Category: Standards Track M. Flodman
	ISSN: 2070-1721 D. Hong		ISSN: 2070-1721 D. Hong
	Google		Google
	J. Lennox		J. Lennox
	8x8 / Jitsi		8x8 / Jitsi

	August 2024		February 2025

	RTP Payload Format for VP9 Video		RTP Payload Format for VP9 Video

	Abstract		Abstract

	This specification describes an RTP payload format for the VP9 video		This specification describes an RTP payload format for the VP9 video
	codec. The payload format has wide applicability as it supports		codec. The payload format has wide applicability as it supports
	applications from low bitrate peer-to-peer usage to high bitrate		applications from low bitrate peer-to-peer usage to high bitrate
	video conferences. It includes provisions for temporal and spatial		video conferences. It includes provisions for temporal and spatial
	scalability.		scalability.

	skipping to change at line 37 ¶		skipping to change at line 37 ¶
	received public review and has been approved for publication by the		received public review and has been approved for publication by the
	Internet Engineering Steering Group (IESG). Further information on		Internet Engineering Steering Group (IESG). Further information on
	Internet Standards is available in Section 2 of RFC 7841.		Internet Standards is available in Section 2 of RFC 7841.

	Information about the current status of this document, any errata,		Information about the current status of this document, any errata,
	and how to provide feedback on it may be obtained at		and how to provide feedback on it may be obtained at
	https://www.rfc-editor.org/info/rfc9628.		https://www.rfc-editor.org/info/rfc9628.

	Copyright Notice		Copyright Notice


	Copyright (c) 2024 IETF Trust and the persons identified as the		Copyright (c) 2025 IETF Trust and the persons identified as the
	document authors. All rights reserved.		document authors. All rights reserved.

	This document is subject to BCP 78 and the IETF Trust's Legal		This document is subject to BCP 78 and the IETF Trust's Legal
	Provisions Relating to IETF Documents		Provisions Relating to IETF Documents
	(https://trustee.ietf.org/license-info) in effect on the date of		(https://trustee.ietf.org/license-info) in effect on the date of
	publication of this document. Please review these documents		publication of this document. Please review these documents
	carefully, as they describe your rights and restrictions with respect		carefully, as they describe your rights and restrictions with respect
	to this document. Code Components extracted from this document must		to this document. Code Components extracted from this document must
	include Revised BSD License text as described in Section 4.e of the		include Revised BSD License text as described in Section 4.e of the
	Trust Legal Provisions and are provided without warranty as described		Trust Legal Provisions and are provided without warranty as described

	skipping to change at line 132 ¶		skipping to change at line 132 ¶
	allow a frame to be encoded at the same resolution but at different		allow a frame to be encoded at the same resolution but at different
	qualities (and, thus, with different amounts of coding error). VP9		qualities (and, thus, with different amounts of coding error). VP9
	supports quality layers as spatial layers without any resolution		supports quality layers as spatial layers without any resolution
	changes; hereinafter, the term "spatial layer" is used to represent		changes; hereinafter, the term "spatial layer" is used to represent
	both spatial and quality layers.		both spatial and quality layers.

	This payload format specification defines how such temporal and		This payload format specification defines how such temporal and
	spatial scalability layers can be described and communicated.		spatial scalability layers can be described and communicated.

	Temporal and spatial scalability layers are associated with non-		Temporal and spatial scalability layers are associated with non-

	negative integer IDs. The lowest layer of either type has an ID of 0		negative integer IDs. The lowest layer of either type has an ID of
	and is sometimes referred to as the "base" temporal or spatial layer.		zero and is sometimes referred to as the "base" temporal or spatial
			layer.

	Layers are designed, and MUST be encoded, such that if any layer, and		Layers are designed, and MUST be encoded, such that if any layer, and
	all higher layers, are removed from the bitstream along either the		all higher layers, are removed from the bitstream along either the
	spatial or temporal dimension, the remaining bitstream is still		spatial or temporal dimension, the remaining bitstream is still
	correctly decodable.		correctly decodable.
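As an illustration of the layer-removal property stated above, the following minimal Python sketch filters a list of frames down to a target spatial and temporal layer. The `Frame` record and the `drop_higher_layers` helper are hypothetical names introduced only for this example; they are not defined by the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """Hypothetical record for one encoded VP9 frame (illustration only)."""
    picture_id: int
    sid: int  # spatial-layer ID, 0 is the base layer
    tid: int  # temporal-layer ID, 0 is the base layer


def drop_higher_layers(frames, max_sid, max_tid):
    """Keep only frames at or below the target spatial and temporal layers.

    Because a layer is always removed together with all higher layers,
    the remaining frames are expected to stay decodable.
    """
    return [f for f in frames if f.sid <= max_sid and f.tid <= max_tid]


frames = [Frame(0, 0, 0), Frame(0, 1, 0), Frame(1, 0, 1), Frame(1, 1, 1)]
base_only = drop_higher_layers(frames, max_sid=0, max_tid=0)
```

A selective forwarding middlebox could apply the same predicate per packet, using the SID and TID values carried in the payload descriptor.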

	For terminology, this document uses the term "frame" to refer to a		For terminology, this document uses the term "frame" to refer to a

	single encoded VP9 frame for a particular resolution/quality, and		single encoded VP9 frame for a particular resolution and/or quality,
	"picture" to refer to all the representations (frames) at a single		and "picture" to refer to all the representations (frames) at a
	instant in time. Thus, a picture consists of one or more frames,		single instant in time. Thus, a picture consists of one or more
	encoding different spatial layers.		frames, encoding different spatial layers.


	Within a picture, a frame with spatial-layer ID equal to SID, where		Within a picture, a frame with spatial-layer ID equal to S, where S >
	SID > 0, can depend on a frame of the same picture with a lower		0, can depend on a frame of the same picture with a lower spatial-
	spatial-layer ID. This "inter-layer" dependency can result in		layer ID. This "inter-layer" dependency can result in additional
	additional coding gain compared to the case where only traditional		coding gain compared to the case where only "inter-picture"
	"inter-picture" dependency is used, where a frame depends on a		dependency is used, where a frame depends on a previously coded frame
	previously coded frame in time. For simplicity, this payload format		in time. For simplicity, this payload format assumes that, within a
	assumes that, within a picture and if inter-layer dependency is used,		picture and if inter-layer dependency is used, a spatial-layer S
	a spatial-layer SID frame can depend only on the immediately previous		frame can depend only on the immediately previous spatial-layer S-1
	spatial-layer SID-1 frame, when S > 0. Additionally, if inter-		frame, when S > 0. Additionally, if inter-picture dependency is
	picture dependency is used, a spatial-layer SID frame is assumed to		used, a spatial-layer S frame is assumed to only depend on a
	only depend on a previously coded spatial-layer SID frame.		previously coded spatial-layer S frame.

	Given the above simplifications for inter-layer and inter-picture		Given the above simplifications for inter-layer and inter-picture
	dependencies, a flag (the D bit described below) is used to indicate		dependencies, a flag (the D bit described below) is used to indicate
	whether a spatial-layer SID frame depends on the spatial-layer SID-1		whether a spatial-layer SID frame depends on the spatial-layer SID-1
	frame. Given the D bit, a receiver only needs to additionally know		frame. Given the D bit, a receiver only needs to additionally know
	the inter-picture dependency structure for a given spatial-layer		the inter-picture dependency structure for a given spatial-layer
	frame in order to determine its decodability. Two modes of		frame in order to determine its decodability. Two modes of
	describing the inter-picture dependency structure are possible:		describing the inter-picture dependency structure are possible:
	"flexible mode" and "non-flexible mode". An encoder can only switch		"flexible mode" and "non-flexible mode". An encoder can only switch
	between the two on the first packet of a keyframe with a temporal-		between the two on the first packet of a keyframe with a temporal-

	layer ID equal to 0.		layer ID equal to zero.

	In flexible mode, each packet can contain up to three reference		In flexible mode, each packet can contain up to three reference
	indices, which identify all frames referenced by the frame		indices, which identify all frames referenced by the frame
	transmitted in the current packet for inter-picture prediction. This		transmitted in the current packet for inter-picture prediction. This
	(along with the D bit) enables a receiver to identify if a frame is		(along with the D bit) enables a receiver to identify if a frame is
	decodable or not and helps it understand the temporal-layer		decodable or not and helps it understand the temporal-layer
	structure. Since this is signaled in each packet, it makes it		structure. Since this is signaled in each packet, it makes it
	possible to have very flexible temporal-layer hierarchies and		possible to have very flexible temporal-layer hierarchies and
	scalability structures, which are changing dynamically.		scalability structures, which are changing dynamically.
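The flexible-mode check described above might be sketched as follows. This is a simplified illustration, not the normative procedure: it assumes each reference index (P_DIFF) is a backward offset from the current Picture ID, that Picture IDs wrap at 2^7 or 2^15, and it ignores the D bit and spatial layers. The function names and the `decodable_pids` set are hypothetical.

```python
def referenced_pids(pid, p_diffs, pid_bits=15):
    """Resolve up to three reference offsets into Picture IDs.

    Assumption: each offset points backward from the current picture,
    with wraparound at 2**pid_bits.
    """
    if len(p_diffs) > 3:
        raise ValueError("at most three reference indices per packet")
    modulus = 1 << pid_bits
    return [(pid - diff) % modulus for diff in p_diffs]


def is_decodable(pid, p_diffs, decodable_pids, pid_bits=15):
    """True if every referenced picture is already known to be decodable."""
    refs = referenced_pids(pid, p_diffs, pid_bits)
    return all(ref in decodable_pids for ref in refs)
```

For example, a frame in picture 112 carrying a single offset of 3 refers back to picture 109 and is treated as decodable only if picture 109 was itself decodable.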

	In non-flexible mode, frames are encoded using a fixed, recurring		In non-flexible mode, frames are encoded using a fixed, recurring
	pattern of dependencies; the set of pictures that recur in this		pattern of dependencies; the set of pictures that recur in this
	pattern is known as a "Picture Group" (or "PG"). In this mode, the		pattern is known as a "Picture Group" (or "PG"). In this mode, the
	inter-picture dependencies (the reference indices) of the PG MUST be		inter-picture dependencies (the reference indices) of the PG MUST be
	pre-specified as part of the Scalability Structure (SS) data. Each		pre-specified as part of the Scalability Structure (SS) data. Each
	packet has an index to refer to one of the described pictures in the		packet has an index to refer to one of the described pictures in the
	PG from which the pictures referenced by the picture transmitted in		PG from which the pictures referenced by the picture transmitted in
	the current packet for inter-picture prediction can be identified.		the current packet for inter-picture prediction can be identified.


	Note: A "Picture Group" or "PG", as used in this document, is not the		| Note: A "Picture Group" or "PG", as used in this document, is
	same thing as the term "Group of Pictures" as it is traditionally		| not the same thing as the term "Group of Pictures" as it is
	used in video coding, i.e., to mean an independently decodable run of		| commonly used in video coding, i.e., to mean an independently
	pictures beginning with a keyframe.		| decodable run of pictures beginning with a keyframe.

	The SS data can also be used to specify the resolution of each		The SS data can also be used to specify the resolution of each
	spatial layer present in the VP9 stream for both flexible and non-		spatial layer present in the VP9 stream for both flexible and non-
	flexible modes.		flexible modes.

	4. Payload Format		4. Payload Format

	This section describes how the encoded VP9 bitstream is encapsulated		This section describes how the encoded VP9 bitstream is encapsulated
	in RTP. To handle network losses, usage of RTP/AVPF [RFC4585] is		in RTP. To handle network losses, usage of RTP/AVPF [RFC4585] is

	RECOMMENDED. All integer fields in the specifications are encoded as		RECOMMENDED. All integer fields in this specification are encoded as
	unsigned integers in network octet order.		unsigned integers in network octet order.

	4.1. RTP Header Usage		4.1. RTP Header Usage

	The general RTP payload format for VP9 is depicted below.		The general RTP payload format for VP9 is depicted below.

	0 1 2 3		0 1 2 3
	0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1		0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
	+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+		+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	|V=2|P|X| CC |M| PT | sequence number |		|V=2|P|X| CC |M| PT | sequence number |

	skipping to change at line 232 ¶		skipping to change at line 233 ¶
	| : |		| : |
	+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |		+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
	| |		| |
	+ |		+ |
	: VP9 payload :		: VP9 payload :
	| |		| |
	| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+		| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	| : OPTIONAL RTP padding |		| : OPTIONAL RTP padding |
	+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+		+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


	Figure 1: General RTP Payload Format for VP		Figure 1: General RTP Payload Format for VP9

	See Section 4.2 for more information on the VP9 payload descriptor;		See Section 4.2 for more information on the VP9 payload descriptor;
	the VP9 payload is described in [VP9-BITSTREAM]. OPTIONAL RTP		the VP9 payload is described in [VP9-BITSTREAM]. OPTIONAL RTP
	padding MUST NOT be included unless the P bit is set.		padding MUST NOT be included unless the P bit is set.


	Marker bit (M): This bit MUST be set to 1 for the final packet of		Marker bit (M): This bit MUST be set to one for the final packet of
	the highest spatial-layer frame (the final packet of the picture),		the highest spatial-layer frame (the final packet of the picture);
	and 0 otherwise. Unless spatial scalability is in use for this		otherwise, it is zero. Unless spatial scalability is in use for
	picture, this bit will have the same value as the E bit described		this picture, this bit will have the same value as the E bit
	in Section 4.2. Note this bit MUST be set to 1 for the target		described in Section 4.2. Note this bit MUST be set to one for
	spatial-layer frame if a stream is being rewritten to remove		the target spatial-layer frame if a stream is being rewritten to
	higher spatial layers.		remove higher spatial layers.

	Payload Type (PT): In line with the policy in Section 3 of		Payload Type (PT): In line with the policy in Section 3 of
	[RFC3551], applications using the VP9 RTP payload profile MUST		[RFC3551], applications using the VP9 RTP payload profile MUST
	assign a dynamic payload type number to be used in each RTP		assign a dynamic payload type number to be used in each RTP
	session and provide a mechanism to indicate the mapping. See		session and provide a mechanism to indicate the mapping. See
	Section 6.1 for the mechanism to be used with the Session		Section 6.1 for the mechanism to be used with the Session
	Description Protocol (SDP) [RFC8866].		Description Protocol (SDP) [RFC8866].

	Timestamp: The RTP timestamp [RFC3550] indicates the time when the		Timestamp: The RTP timestamp [RFC3550] indicates the time when the
	input frame was sampled, at a clock rate of 90 kHz. If the input		input frame was sampled, at a clock rate of 90 kHz. If the input

	picture is encoded with multiple-layer frames, all of the frames		picture is encoded with multiple frames, all of the frames of the
	of the picture MUST have the same timestamp.		picture MUST have the same timestamp.


	If a frame has the VP9 show_frame field set to 0 (i.e., it is		If a frame has the VP9 show_frame field set to zero (i.e., it is
	meant only to populate a reference buffer without being output),		meant only to populate a reference buffer without being output),
	its timestamp MAY alternatively be set to be the same as the		its timestamp MAY alternatively be set to be the same as the

	subsequent frame with show_frame equal to 1. (This will be		subsequent frame with show_frame equal to one. (This will be
	convenient for playing out pre-encoded content packaged with VP9		convenient for playing out pre-encoded content packaged with VP9
	"superframes", which typically bundle show_frame==0 frames with a		"superframes", which typically bundle show_frame==0 frames with a

	subsequent show_frame==1 frame.) Every frame with show_frame==1,		subsequent show_frame==1 frame.) Every picture containing a frame
	however, MUST have a unique timestamp modulo the 2^32 wrap of the		with show_frame==1, however, MUST have a unique timestamp modulo
	field.		the 2^32 wrap of the field.

	The remaining RTP Fixed Header Fields (V, P, X, CC, sequence number,		The remaining RTP Fixed Header Fields (V, P, X, CC, sequence number,
	SSRC, and CSRC identifiers) are used as specified in Section 5.1 of		SSRC, and CSRC identifiers) are used as specified in Section 5.1 of
	[RFC3550].		[RFC3550].
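The timestamp and marker-bit rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `capture_time_s` is the sampling time of the input picture in seconds, `offset` stands in for the initial timestamp offset chosen by the sender, and all function and parameter names are hypothetical.

```python
RTP_CLOCK_HZ = 90_000  # VP9 RTP clock rate (90 kHz)
TIMESTAMP_MODULUS = 1 << 32  # the RTP timestamp field is 32 bits wide


def rtp_timestamp(capture_time_s, offset=0):
    """Map a capture time in seconds to a 32-bit RTP timestamp.

    All frames of one picture share this value.
    """
    ticks = round(capture_time_s * RTP_CLOCK_HZ)
    return (offset + ticks) % TIMESTAMP_MODULUS


def marker_bit(is_last_packet_of_frame, sid, highest_sid_in_picture):
    """Return 1 only for the final packet of the highest spatial-layer frame."""
    if is_last_packet_of_frame and sid == highest_sid_in_picture:
        return 1
    return 0
```

When a middlebox rewrites a stream to remove higher spatial layers, `highest_sid_in_picture` would be the target spatial layer rather than the layer count produced by the encoder.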

	4.2. VP9 Payload Descriptor		4.2. VP9 Payload Descriptor


	In flexible mode (with the F bit below set to 1), the first octets		In flexible mode (with the F bit below set to one), the first octets
	after the RTP header are the VP9 payload descriptor, with the		after the RTP header are the VP9 payload descriptor, with the
	following structure.		following structure.

	0 1 2 3 4 5 6 7		0 1 2 3 4 5 6 7
	+-+-+-+-+-+-+-+-+		+-+-+-+-+-+-+-+-+
	|I|P|L|F|B|E|V|Z| (REQUIRED)		|I|P|L|F|B|E|V|Z| (REQUIRED)
	+-+-+-+-+-+-+-+-+		+-+-+-+-+-+-+-+-+
	I: |M| PICTURE ID | (REQUIRED)		I: |M| PICTURE ID | (REQUIRED)
	+-+-+-+-+-+-+-+-+		+-+-+-+-+-+-+-+-+
	M: | EXTENDED PID | (RECOMMENDED)		M: | EXTENDED PID | (RECOMMENDED)

	skipping to change at line 296 ¶		skipping to change at line 297 ¶
	L: | TID |U| SID |D| (Conditionally RECOMMENDED)		L: | TID |U| SID |D| (Conditionally RECOMMENDED)
	+-+-+-+-+-+-+-+-+ -\		+-+-+-+-+-+-+-+-+ -\
	P,F: | P_DIFF |N| (Conditionally REQUIRED) - up to 3 times		P,F: | P_DIFF |N| (Conditionally REQUIRED) - up to 3 times
	+-+-+-+-+-+-+-+-+ -/		+-+-+-+-+-+-+-+-+ -/
	V: | SS |		V: | SS |
	| .. |		| .. |
	+-+-+-+-+-+-+-+-+		+-+-+-+-+-+-+-+-+

	Figure 2: Flexible Mode Format for VP9 Payload Descriptor		Figure 2: Flexible Mode Format for VP9 Payload Descriptor


	In non-flexible mode (with the F bit below set to 0), the first		In non-flexible mode (with the F bit below set to zero), the first
	octets after the RTP header are the VP9 payload descriptor, with the		octets after the RTP header are the VP9 payload descriptor, with the
	following structure.		following structure.

	0 1 2 3 4 5 6 7		0 1 2 3 4 5 6 7
	+-+-+-+-+-+-+-+-+		+-+-+-+-+-+-+-+-+
	|I|P|L|F|B|E|V|Z| (REQUIRED)		|I|P|L|F|B|E|V|Z| (REQUIRED)
	+-+-+-+-+-+-+-+-+		+-+-+-+-+-+-+-+-+
	I: |M| PICTURE ID | (RECOMMENDED)		I: |M| PICTURE ID | (RECOMMENDED)
	+-+-+-+-+-+-+-+-+		+-+-+-+-+-+-+-+-+
	M: | EXTENDED PID | (RECOMMENDED)		M: | EXTENDED PID | (RECOMMENDED)
	+-+-+-+-+-+-+-+-+		+-+-+-+-+-+-+-+-+
	L: | TID |U| SID |D| (Conditionally RECOMMENDED)		L: | TID |U| SID |D| (Conditionally RECOMMENDED)
	+-+-+-+-+-+-+-+-+		+-+-+-+-+-+-+-+-+
	| TL0PICIDX | (Conditionally REQUIRED)		| TL0PICIDX | (Conditionally REQUIRED)
	+-+-+-+-+-+-+-+-+		+-+-+-+-+-+-+-+-+
	V: | SS |		V: | SS |
	| .. |		| .. |
	+-+-+-+-+-+-+-+-+		+-+-+-+-+-+-+-+-+


	Figure 3: Non-flexible Mode Format for VP9 Payload Descriptor		Figure 3: Non-Flexible Mode Format for VP9 Payload Descriptor


	I: Picture ID (PID) present. When set to 1, the OPTIONAL PID MUST		Except as noted, the following field descriptions apply to the
			payload descriptor formats in both Figures 2 and 3.

			I: Picture ID (PID) present. When set to one, the OPTIONAL PID MUST
	be present after the mandatory first octet and specified as below.		be present after the mandatory first octet and specified as below.
	Otherwise, PID MUST NOT be present. If the V bit was set in the		Otherwise, PID MUST NOT be present. If the V bit was set in the
	stream's most recent start of a keyframe (i.e., the SS field was		stream's most recent start of a keyframe (i.e., the SS field was

	present) and the F bit is set to 0 (i.e., non-flexible scalability		present) and the F bit is set to zero (i.e., non-flexible
	mode is in use), then this bit MUST be set on every packet.		scalability mode is in use), then this bit MUST be set on every
			packet.


	P: Inter-picture predicted frame. When set to 0, the frame does not		P: Inter-picture predicted frame. When set to zero, the frame does
	utilize inter-picture prediction. In this case, up-switching to a		not utilize inter-picture prediction. In this case, up-switching
	current spatial layer's frame is possible from a directly lower		to a current spatial layer's frame is possible from a directly
	spatial-layer frame. P SHOULD also be set to 0 when encoding a		lower spatial-layer frame. P SHOULD also be set to zero when
	layer synchronization frame in response to a Layer Refresh Request		encoding a layer synchronization frame in response to a Layer
	(LRR) [RFC9627] message (see Section 5.3). When P is set to 0,		Refresh Request (LRR) [RFC9627] message (see Section 5.3). When P
	the TID field (described below) MUST also be set to 0 (if		is set to zero, the Temporal-layer ID (TID) field (described
	present). Note that the P bit does not forbid intra-picture,		below) MUST also be set to zero (if present). Note that the P bit
	inter-layer prediction from earlier frames of the same picture, if		does not forbid intra-picture, inter-layer prediction from earlier
	any.		frames of the same picture, if any.


	L: Layer indices present. When set to 1, the one or two octets		L: Layer indices present. When set to one, the one or two octets
	following the mandatory first octet and the PID (if present) is as		following the mandatory first octet and the PID (if present) is as
	described by "Layer indices" below. If the F bit (described		described by "Layer indices" below. If the F bit (described

	below) is set to 1 (indicating flexible mode), then only one octet		below) is set to one (indicating flexible mode), then only one
	is present for the layer indices. Otherwise, if the F bit is set		octet is present for the layer indices. Otherwise, if the F bit
	to 0 (indicating non-flexible mode), then two octets are present		is set to zero (indicating non-flexible mode), then two octets are
	for the layer indices.		present for the layer indices.


	F: Flexible mode. When set to 1, this indicates flexible mode; if		F: Flexible mode. When set to one, this indicates flexible mode; if
	the P bit is also set to 1, then the octets following the		the P bit is also set to one, then the octets following the
	mandatory first octet, the PID, and layer indices (if present) are		mandatory first octet, the PID, and layer indices (if present) are

	as described by "Reference indices" below. This bit MUST only be		as described by "reference indices" below. This bit MUST only be
	set to 1 if the I bit is also set to 1; if the I bit is set to 0,		set to one if the I bit is also set to one; if the I bit is set to
	then this bit MUST also be set to 0 and ignored by receivers.		zero, then this bit MUST also be set to zero and ignored by
	(Flexible mode's Reference indices are defined as offsets from the		receivers. (Flexible mode's reference indices are defined as
	Picture ID field, so they would have no meaning if I were not		offsets from the Picture ID field, so they would have no meaning
	set.) The value of the F bit MUST only change on the first packet		if I were not set.) The value of the F bit MUST only change on
	of a key picture. A "key picture" is a picture whose base		the first packet of a key picture. A "key picture" is a picture
	spatial-layer frame is a keyframe, and thus one which completely		whose base spatial-layer frame is a keyframe, and thus one which
	resets the encoder state. This packet will have its P bit equal		completely resets the encoder state. This packet will have its P
	to 0, SID or L bit (described below) equal to 0, and B bit		bit equal to zero, SID or L bit (described below) equal to zero,
	(described below) equal to 1.		and B bit (described below) equal to one.


	B: Start of a frame. This bit MUST be set to 1 if the first payload		B: Start of Frame. This bit MUST be set to one if the first payload
	octet of the RTP packet is the beginning of a new VP9 frame;		octet of the RTP packet is the beginning of a new VP9 frame;

	otherwise, it MUST NOT be 1. Note that this frame might not be		otherwise, it MUST NOT be one. Note that this frame might not be
	the first frame of a picture.		the first frame of a picture.


	E: End of a frame. This bit MUST be set to 1 for the final RTP		E: End of Frame. This bit MUST be set to one for the final RTP
	packet of a VP9 frame, and 0 otherwise. This enables a decoder to		packet of a VP9 frame; otherwise, it is zero. This enables a
	finish decoding the frame, where it otherwise may need to wait for		decoder to finish decoding the frame, where it otherwise may need
	the next packet to explicitly know that the frame is complete.		to wait for the next packet to explicitly know that the frame is
	Note that, if spatial scalability is in use, more frames from the		complete. Note that, if spatial scalability is in use, more
	same picture may follow; see the description of the B bit above.		frames from the same picture may follow; see the description of
			the B bit above.


	V: Scalability Structure (SS) data present. When set to 1, the		V: Scalability Structure (SS) data present. When set to one, the
	OPTIONAL SS data MUST be present in the payload descriptor.		OPTIONAL SS data MUST be present in the payload descriptor.
	Otherwise, the SS data MUST NOT be present.		Otherwise, the SS data MUST NOT be present.


	Z: Not a reference frame for upper spatial layers. If set to 1,		Z: Not a reference frame for upper spatial layers. If set to one,
	indicates that frames with higher spatial layers SID+1 and greater		indicates that frames with higher spatial layers SID+1 and greater
	of the current and following pictures do not depend on the current		of the current and following pictures do not depend on the current
	spatial-layer SID frame. This enables a decoder that is targeting		spatial-layer SID frame. This enables a decoder that is targeting
	a higher spatial layer to know that it can safely discard this		a higher spatial layer to know that it can safely discard this
	packet's frame without processing it, without having to wait for		packet's frame without processing it, without having to wait for
	the D bit in the higher-layer frame (see below).		the D bit in the higher-layer frame (see below).
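The bit layout of the mandatory first octet (I, P, L, F, B, E, V, Z, from most to least significant bit) could be decoded along the following lines. This is a sketch only; `DescriptorFlags` and `parse_first_octet` are hypothetical names. Because receivers ignore F when I is not set, the sketch reports F as unset in that case.

```python
from typing import NamedTuple


class DescriptorFlags(NamedTuple):
    """Flags of the mandatory first octet (illustrative container)."""
    i: bool  # Picture ID present
    p: bool  # inter-picture predicted frame
    l: bool  # layer indices present
    f: bool  # flexible mode
    b: bool  # start of frame
    e: bool  # end of frame
    v: bool  # scalability structure (SS) data present
    z: bool  # not a reference frame for upper spatial layers


def parse_first_octet(octet: int) -> DescriptorFlags:
    """Split the first payload descriptor octet into its eight flags."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("octet must be in the range 0..255")
    i = bool(octet & 0x80)
    return DescriptorFlags(
        i=i,
        p=bool(octet & 0x40),
        l=bool(octet & 0x20),
        f=bool(octet & 0x10) and i,  # F is ignored unless I is set
        b=bool(octet & 0x08),
        e=bool(octet & 0x04),
        v=bool(octet & 0x02),
        z=bool(octet & 0x01),
    )


flags = parse_first_octet(0b10011000)  # I, F, and B set
```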

	The mandatory first octet is followed by the extension data fields		The mandatory first octet is followed by the extension data fields
	that are enabled:		that are enabled:

	M: The most significant bit of the first octet is an extension flag.		M: The most significant bit of the first octet is an extension flag.
	The field MUST be present if the I bit is equal to one. If M is		The field MUST be present if the I bit is equal to one. If M is
	set, the PID field MUST contain 15 bits; otherwise, it MUST		set, the PID field MUST contain 15 bits; otherwise, it MUST
	contain 7 bits. See PID below.		contain 7 bits. See PID below.

	Picture ID (PID): Picture ID represented in 7 or 15 bits, depending		Picture ID (PID): Picture ID represented in 7 or 15 bits, depending
	on the M bit. This is a running index of the pictures, where the		on the M bit. This is a running index of the pictures, where the

	sender increments the value by 1 for each picture it sends.		sender increments the value by one for each picture it sends.
	(Note, however, that because a middlebox can discard pictures		(Note, however, that because a middlebox can discard pictures
	where permitted by the SS, Picture IDs as received by a receiver		where permitted by the SS, Picture IDs as received by a receiver
	might not be contiguous.) This field MUST be present if the I bit		might not be contiguous.) This field MUST be present if the I bit

	is equal to one. If M is set to 0, 7 bits carry the PID; else, if		is equal to one. If M is set to zero, 7 bits carry the PID; else,
	M is set to 1, 15 bits carry the PID in network byte order. The		if M is set to one, 15 bits carry the PID in network byte order.
	sender may choose between a 7- or 15-bit index. The PID SHOULD		The sender may choose between a 7- or 15-bit index. The PID
	start on a random number and MUST wrap after reaching the maximum		SHOULD start on a random number and MUST wrap after reaching the
	ID (0x7f or 0x7fff depending on the index size chosen). The		maximum ID (0x7f or 0x7fff depending on the index size chosen).
	receiver MUST NOT assume that the number of bits in the PID stays		The receiver MUST NOT assume that the number of bits in the PID
	the same through the session. If this field transitions from 7		stays the same through the session. If this field transitions
	bits to 15 bits, the value is zero-extended (i.e., the value after		from 7 bits to 15 bits, the value is zero-extended (i.e., the
	0x6e is 0x006f); if the field transitions from 15 bits to 7 bits,		value after 0x6e is 0x006f); if the field transitions from 15 bits
	it is truncated (i.e., the value after 0x1bbe is 0xbf).		to 7 bits, it is truncated (i.e., the value after 0x1bbe is 0xbf).


	In the non-flexible mode (when the F bit is set to 0), this PID is		In the non-flexible mode (when the F bit is set to zero), this PID
	used as an index to the PG specified in the SS data below. In		is used as an index to the PG specified in the SS data below. In
	this mode, the PID of the keyframe corresponds to the first		this mode, the PID of the keyframe corresponds to the first
	specified frame in the PG. Then subsequent PIDs are mapped to		specified frame in the PG. Then subsequent PIDs are mapped to
	subsequently specified frames in the PG (modulo N_G, specified in		subsequently specified frames in the PG (modulo N_G, specified in
	the SS data below), respectively.		the SS data below), respectively.

	All frames of the same picture MUST have the same PID value.		All frames of the same picture MUST have the same PID value.

	Frames (and their corresponding pictures) with the VP9 show_frame		Frames (and their corresponding pictures) with the VP9 show_frame

	field equal to 0 MUST have distinct PID values from subsequent		field equal to zero MUST have distinct PID values from subsequent
	pictures with show_frame equal to 1. Thus, a picture (as defined		pictures with show_frame equal to one. Thus, a picture (as
	in this specification) is different than a VP9 superframe.		defined in this specification) is different than a VP9 superframe.

	All frames of the same picture MUST have the same value for		All frames of the same picture MUST have the same value for
	show_frame.		show_frame.
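A reader of the PID field, together with the wraparound and the non-flexible-mode mapping into the PG, might look like the sketch below. All function names are hypothetical. The `pg_index` helper additionally assumes that fewer than 2^pid_bits pictures have been sent since the keyframe and that the PID width has not changed in between.

```python
def parse_picture_id(data: bytes, offset: int = 1):
    """Read the PID that follows the mandatory first octet.

    Returns (pid, pid_bits, next_offset). The most significant bit of
    the first PID octet is the M flag: set means a 15-bit PID in
    network byte order, clear means a 7-bit PID.
    """
    first = data[offset]
    if first & 0x80:
        pid = ((first & 0x7F) << 8) | data[offset + 1]
        return pid, 15, offset + 2
    return first & 0x7F, 7, offset + 1


def next_picture_id(pid: int, pid_bits: int) -> int:
    """Increment the PID by one, wrapping after 0x7f or 0x7fff."""
    return (pid + 1) % (1 << pid_bits)


def pg_index(pid: int, keyframe_pid: int, n_g: int, pid_bits: int) -> int:
    """Map a PID to its position in the PG in non-flexible mode.

    The keyframe's PID corresponds to entry 0; later PIDs cycle
    through the N_G entries (see the assumptions in the lead-in).
    """
    distance = (pid - keyframe_pid) % (1 << pid_bits)
    return distance % n_g
```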


	Layer indices: This information is optional but RECOMMENDED whenever		Layer indices: This field is optional but RECOMMENDED whenever
	encoding with layers. For both flexible and non-flexible modes,		encoding with layers. For both flexible and non-flexible modes,

	one octet is used to specify a layer frame's temporal-layer ID		one octet is used to specify a layer frame's Temporal-layer ID
	(TID) and spatial-layer ID (SID) as shown both in Figure 2 and		(TID) and Spatial-layer ID (SID) as shown both in Figures 2 and 3.
	Figure 3. Additionally, a bit (U) is used to indicate that the		Additionally, a bit (U) is used to indicate that the current frame
	current frame is a "switching up point" frame. Another bit (D) is		is a "switching up point" frame. Another bit (D) is used to
	used to indicate whether inter-layer prediction is used for the		indicate whether inter-layer prediction is used for the current
	current frame.		frame.


	In the non-flexible mode (when the F bit is set to 0), another		In the non-flexible mode (when the F bit is set to zero), another
	octet is used to represent temporal-layer 0 index (TL0PICIDX), as		octet is used to represent the Temporal Layer 0 Picture Index (8
	depicted in Figure 3. The TL0PICIDX is present so that all		bits) (TL0PICIDX), as depicted in Figure 3. The TL0PICIDX is
	minimally required frames (the base temporal-layer frames) can be		present so that all minimally required frames (the base temporal-
	tracked.		layer frames) can be tracked.

	The TID and SID fields indicate the temporal and spatial layers		The TID and SID fields indicate the temporal and spatial layers
	and can help middleboxes and endpoints quickly identify which		and can help middleboxes and endpoints quickly identify which
	layer a packet belongs to.		layer a packet belongs to.

	TID: The temporal-layer ID of the current frame. In the case of		TID: The temporal-layer ID of the current frame. In the case of
	non-flexible mode, if a PID is mapped to a picture in a		non-flexible mode, if a PID is mapped to a picture in a
	specified PG, then the value of the TID MUST match the		specified PG, then the value of the TID MUST match the
	corresponding TID value of the mapped picture in the PG.		corresponding TID value of the mapped picture in the PG.


	U: Switching up point. If this bit is set to 1 for the current		U: Switching up point. When this bit is set to one, if the
	picture with a temporal-layer ID equal to TID, then "switch up"		current picture has a temporal-layer ID equal to value T, then
	to a higher frame rate is possible as subsequent higher		subsequent pictures with temporal-layer ID values higher than T
	temporal-layer pictures will not depend on any picture before		will not depend on any picture before the current picture (in
	the current picture (in coding order) with temporal-layer ID		coding order) with a temporal-layer ID value greater than T.
	greater than TID.

	SID: The spatial-layer ID of the current frame. Note that frames		SID: The spatial-layer ID of the current frame. Note that frames
	with spatial-layer SID > 0 may be dependent on decoded spatial-		with spatial-layer SID > 0 may be dependent on decoded spatial-
	layer SID-1 frame within the same picture. Different frames of		layer SID-1 frame within the same picture. Different frames of
	the same picture MUST have distinct spatial-layer IDs, and		the same picture MUST have distinct spatial-layer IDs, and
	frames' spatial layers MUST appear in increasing order within		frames' spatial layers MUST appear in increasing order within
	the frame.		the frame.


	D: Inter-layer dependency is used. D MUST be set to 1 if and		D: Inter-layer dependency is used. D MUST be set to one if and
	only if the current spatial-layer SID frame depends on spatial-		only if the current spatial-layer SID frame depends on spatial-
	layer SID-1 frame of the same picture; otherwise, it MUST be		layer SID-1 frame of the same picture; otherwise, it MUST be

	set to 0. For the base-layer frame (with SID equal to 0), the		set to zero. For the base-layer frame (with SID equal to
	D bit MUST be set to 0.		zero), the D bit MUST be set to zero.


	TL0PICIDX: 8 bits temporal-layer zero index. TL0PICIDX is only		TL0PICIDX: Temporal Layer 0 Picture Index (8 bits). TL0PICIDX is
	present in the non-flexible mode (F = 0). This is a running		only present in the non-flexible mode (F = 0). This is a
	index for the temporal base-layer pictures, i.e., the pictures		running index for the temporal base-layer pictures, i.e., the
	with a TID set to 0. If the TID is larger than 0, TL0PICIDX		pictures with a TID set to zero. If the TID is larger than
	indicates which temporal base-layer picture the current picture		zero, TL0PICIDX indicates which temporal base-layer picture the
	depends on. TL0PICIDX MUST be incremented by 1 when the TID is		current picture depends on. TL0PICIDX MUST be incremented by
	equal to 0. The index SHOULD start on a random number and MUST		one when the TID is equal to zero. The index SHOULD start on a
	restart at 0 after reaching the maximum number 255.		random number and MUST restart at zero after reaching the
			maximum number 255.
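
The layer-index fields above can be illustrated with a short Python sketch. It is not part of the specification. The bit positions (TID in the top three bits, then U, then three bits of SID, then D) are an assumption taken from Figures 2 and 3 of the RFC, which are not reproduced in this excerpt. The names LayerIndices, parse_layer_indices, and next_tl0picidx are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LayerIndices:
    tid: int  # temporal-layer ID (3 bits)
    u: bool   # switching up point
    sid: int  # spatial-layer ID (3 bits)
    d: bool   # inter-layer dependency used


def parse_layer_indices(octet: int) -> LayerIndices:
    """Split the layer-indices octet.

    Assumed layout, most significant bit first: TID(3) U(1) SID(3) D(1).
    """
    if not 0 <= octet <= 0xFF:
        raise ValueError("layer indices must fit in one octet")
    return LayerIndices(
        tid=(octet >> 5) & 0x07,
        u=bool((octet >> 4) & 0x01),
        sid=(octet >> 1) & 0x07,
        d=bool(octet & 0x01),
    )


def next_tl0picidx(previous: int, tid: int) -> int:
    """Value of TL0PICIDX to send on the next picture (non-flexible mode).

    The index advances by one only on base temporal-layer pictures
    (TID == 0) and restarts at zero after 255. Pictures with TID > 0
    carry the index of the base-layer picture they depend on.
    """
    if tid == 0:
        return (previous + 1) % 256
    return previous
```

For instance, the octet 0x55 would decode under this assumed layout as TID 2, U set, SID 2, D set.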


	Reference indices: When P and F are both set to 1, indicating a non-		Reference indices: When P and F are both set to one, indicating a
	keyframe in flexible mode, then at least one reference index MUST		non-keyframe in flexible mode, then at least one reference index
	be specified as below. Additional reference indices (a total of		MUST be specified as below. Additional reference indices (a total
	up to three reference indices are allowed) may be specified using		of up to three reference indices are allowed) may be specified
	the N bit below. When either P or F is set to 0, then no		using the N bit below. When either P or F is set to zero, then no
	reference index is specified.		reference index is specified.

	P_DIFF: The reference index (in 7 bits) specified as the relative		P_DIFF: The reference index (in 7 bits) specified as the relative
	PID from the current picture. For example, P_DIFF=3 on a		PID from the current picture. For example, P_DIFF=3 on a
	packet containing the picture with PID 112 means that the		packet containing the picture with PID 112 means that the
	picture refers back to the picture with PID 109. This		picture refers back to the picture with PID 109. This
	calculation is done modulo the size of the PID field, i.e.,		calculation is done modulo the size of the PID field, i.e.,

	either 7 or 15 bits. A P_DIFF value of 0 is invalid.		either 7 or 15 bits. A P_DIFF value of zero is invalid.

	N: 1 if there is additional P_DIFF following the current P_DIFF.		N: 1 if there is additional P_DIFF following the current P_DIFF.
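
The modular arithmetic described for P_DIFF can be sketched in Python as follows. This is an illustration only. It assumes each reference-index octet carries P_DIFF in its upper seven bits and N in its lowest bit, as drawn in Figure 2 of the RFC (not shown in this excerpt). The function names are hypothetical.

```python
def parse_reference_indices(data: bytes) -> tuple[list[int], int]:
    """Read the P_DIFF list of a flexible-mode non-keyframe packet.

    Assumed octet layout: P_DIFF in the upper 7 bits, N in the lowest bit.
    Returns the list of P_DIFF values and the number of octets consumed.
    """
    p_diffs = []
    for offset in range(3):  # at most three reference indices are allowed
        if offset >= len(data):
            raise ValueError("truncated reference index list")
        octet = data[offset]
        p_diff, n = octet >> 1, octet & 0x01
        if p_diff == 0:
            raise ValueError("a P_DIFF value of zero is invalid")
        p_diffs.append(p_diff)
        if not n:
            return p_diffs, offset + 1
    raise ValueError("more than three reference indices signalled")


def referenced_pid(pid: int, p_diff: int, pid_bits: int) -> int:
    """PID of the referenced picture, computed modulo the PID field size."""
    if pid_bits not in (7, 15):
        raise ValueError("the PID field is either 7 or 15 bits wide")
    if not 1 <= p_diff <= 0x7F:
        raise ValueError("P_DIFF must be in the range 1..127")
    return (pid - p_diff) % (1 << pid_bits)
```

With the example from the text, referenced_pid(112, 3, 7) gives 109. Near a wraparound, a 7-bit PID of 1 with P_DIFF=3 refers back to PID 126.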

	4.2.1. Scalability Structure (SS)		4.2.1. Scalability Structure (SS)

	The SS data describes the resolution of each frame within a picture		The SS data describes the resolution of each frame within a picture
	as well as the inter-picture dependencies for a PG. If the VP9		as well as the inter-picture dependencies for a PG. If the VP9
	payload descriptor's V bit is set, the SS data is present in the		payload descriptor's V bit is set, the SS data is present in the
	position indicated in Figures 2 and 3.		position indicated in Figures 2 and 3.


	skipping to change at line 521 ¶		skipping to change at line 527 ¶
	     +-+-+-+-+-+-+-+-+              -/		     +-+-+-+-+-+-+-+-+              -/
	G:   |      N_G      | (OPTIONAL)		G:   |      N_G      | (OPTIONAL)
	     +-+-+-+-+-+-+-+-+                           -\		     +-+-+-+-+-+-+-+-+                           -\
	N_G: | TID |U| R |-|-| (OPTIONAL)                .		N_G: | TID |U| R |-|-| (OPTIONAL)                .
	     +-+-+-+-+-+-+-+-+              -\           . - N_G times		     +-+-+-+-+-+-+-+-+              -\           . - N_G times
	     |    P_DIFF     | (OPTIONAL)    . - R times .		     |    P_DIFF     | (OPTIONAL)    . - R times .
	     +-+-+-+-+-+-+-+-+              -/           -/		     +-+-+-+-+-+-+-+-+              -/           -/

	Figure 4: VP9 Scalability Structure		Figure 4: VP9 Scalability Structure


	N_S: N_S + 1 indicates the number of spatial layers present in the		N_S: Number of Spatial Layers Minus 1. N_S + 1 indicates the number
	VP9 stream.		of spatial layers present in the VP9 stream.


	Y: Each spatial layer's frame resolution is present. When set to 1,		Y: Each spatial layer's frame resolution is present. When set to
	the OPTIONAL WIDTH (2 octets) and HEIGHT (2 octets) MUST be		one, the OPTIONAL WIDTH (2 octets) and HEIGHT (2 octets) MUST be
	present for each layer frame. Otherwise, the resolution MUST NOT		present for each layer frame. Otherwise, the resolution MUST NOT
	be present.		be present.

	G: The PG description present flag.		G: The PG description present flag.


	-: A bit reserved for future use. It MUST be set to 0 and MUST be		-: A bit reserved for future use. It MUST be set to zero and MUST
	ignored by the receiver.		be ignored by the receiver.

	N_G: N_G indicates the number of pictures in a PG. If N_G is		N_G: N_G indicates the number of pictures in a PG. If N_G is

	greater than 0, then the SS data allows the inter-picture		greater than zero, then the SS data allows the inter-picture
	dependency structure of the VP9 stream to be pre-declared, rather		dependency structure of the VP9 stream to be pre-declared, rather
	than indicating it on the fly with every packet. If N_G is		than indicating it on the fly with every packet. If N_G is

	greater than 0, then for N_G pictures in the PG, each picture's		greater than zero, then for N_G pictures in the PG, each picture's
	temporal-layer ID (TID), switch up point (U), and Reference		Temporal-layer ID (TID), switch up point (U), and reference
	indices (P_DIFFs) are specified.		indices (P_DIFFs) are specified.


	The first picture specified in the PG MUST have a TID set to 0.		The first picture specified in the PG MUST have a TID set to zero.


	G set to 0 or N_G set to 0 indicates that either there is only one		G set to zero or N_G set to zero indicates that either there is
	temporal layer (for non-flexible mode) or no fixed inter-picture		only one temporal layer (for non-flexible mode) or no fixed inter-
	dependency information is present (for flexible mode) going		picture dependency information is present (for flexible mode)
	forward in the bitstream.		going forward in the bitstream.

	Note that for a given picture, all frames follow the same inter-		Note that for a given picture, all frames follow the same inter-
	picture dependency structure. However, the frame rate of each		picture dependency structure. However, the frame rate of each
	spatial layer can be different from each other; this can be		spatial layer can be different from each other; this can be
	described with the use of the D bit described above. The		described with the use of the D bit described above. The
	specified dependency structure in the SS data MUST be for the		specified dependency structure in the SS data MUST be for the
	highest frame rate layer.		highest frame rate layer.


			R: The number of P_DIFF fields that are present.

	In a scalable stream sent with a fixed pattern, the SS data SHOULD be		In a scalable stream sent with a fixed pattern, the SS data SHOULD be
	included in the first packet of every key frame. This is a packet		included in the first packet of every key frame. This is a packet

	with the P bit equal to 0, SID or L bit equal to 0, and B bit equal		with the P bit equal to zero, SID or L bit equal to zero, and B bit
	to 1. The SS data MUST only be changed on the picture that		equal to one. The SS data MUST only be changed on the picture that
	corresponds to the first picture specified in the previous SS data's		corresponds to the first picture specified in the previous SS data's

	PG (if the previous SS data's N_G was greater than 0).		PG (if the previous SS data's N_G was greater than zero).
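
As an illustration of how the SS fields fit together, here is a minimal Python sketch of a parser. It rests on several assumptions that this excerpt does not confirm. The first octet is taken to be N_S (3 bits), Y, G, and three reserved bits, as in the part of Figure 4 that is skipped here. WIDTH and HEIGHT are read in network byte order. Each P_DIFF inside the SS is read as a full octet. The function name parse_ss and the returned dictionary keys are hypothetical.

```python
import struct


def parse_ss(data: bytes) -> dict:
    """Parse VP9 scalability structure (SS) data under the stated assumptions."""
    pos = 0

    def take(count: int) -> bytes:
        # Consume `count` octets, failing cleanly on truncated input.
        nonlocal pos
        if pos + count > len(data):
            raise ValueError("truncated SS data")
        chunk = data[pos:pos + count]
        pos += count
        return chunk

    first = take(1)[0]
    n_s, y, g = first >> 5, (first >> 4) & 0x01, (first >> 3) & 0x01

    resolutions = []
    if y:
        # One (WIDTH, HEIGHT) pair per spatial layer, N_S + 1 in total.
        for _ in range(n_s + 1):
            resolutions.append(struct.unpack("!HH", take(4)))

    pictures = []
    if g:
        n_g = take(1)[0]
        for _ in range(n_g):
            octet = take(1)[0]
            tid, u, r = octet >> 5, (octet >> 4) & 0x01, (octet >> 2) & 0x03
            pictures.append({"tid": tid, "u": bool(u), "p_diffs": list(take(r))})
        if pictures and pictures[0]["tid"] != 0:
            raise ValueError("the first picture in the PG must have TID 0")

    return {
        "spatial_layers": n_s + 1,
        "resolutions": resolutions,
        "pictures": pictures,
        "length": pos,
    }
```

The reserved bits are ignored on reception, which matches the rule given for the "-" bits above.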

	4.3. Frame Fragmentation		4.3. Frame Fragmentation

	VP9 frames are fragmented into packets in RTP sequence number order:		VP9 frames are fragmented into packets in RTP sequence number order:
	beginning with a packet with the B bit set and ending with a packet		beginning with a packet with the B bit set and ending with a packet
	with the E bit set. There is no mechanism for finer-grained access		with the E bit set. There is no mechanism for finer-grained access
	to parts of a VP9 frame.		to parts of a VP9 frame.

	4.4. Scalable Encoding Considerations		4.4. Scalable Encoding Considerations


	skipping to change at line 641 ¶		skipping to change at line 649 ¶
	+----------+---------+------------+---------+		+----------+---------+------------+---------+

	Table 1: Example Scalability Structure		Table 1: Example Scalability Structure

	This structure is constructed such that the U bit can always be set.		This structure is constructed such that the U bit can always be set.

	5. Feedback Messages and Header Extensions		5. Feedback Messages and Header Extensions

	5.1. Reference Picture Selection Indication (RPSI)		5.1. Reference Picture Selection Indication (RPSI)


	The reference picture selection index is a payload-specific feedback		The RPSI is a payload-specific feedback message defined within the
	message defined within the RTCP-based feedback format. The RPSI		RTCP-based feedback format. The RPSI message is generated by a
	message is generated by a receiver and can be used in two ways:		receiver and can be used in two ways: either it can signal a
	either it can signal a preferred reference picture when a loss has		preferred reference picture when a loss has been detected by the
	been detected by the decoder (preferably a reference that the decoder		decoder (preferably a reference that the decoder knows is perfect) or
	knows is perfect) or it can be used as positive feedback information		it can be used as positive feedback information to acknowledge
	to acknowledge correct decoding of certain reference pictures. The		correct decoding of certain reference pictures. The positive
	positive feedback method is useful for VP9 used for point-to-point		feedback method is useful for VP9 used for point-to-point (unicast)
	(unicast) communication. The use of RPSI for VP9 is preferably		communication. The use of RPSI for VP9 is preferably combined with a
	combined with a special update pattern of the codec's two special		special update pattern of the codec's two special reference frames --
	reference frames -- the golden frame and the altref frame -- in which		the golden frame and the altref frame -- in which they are updated in
	they are updated in an alternating leapfrog fashion. When a receiver		an alternating leapfrog fashion. When a receiver has received and
	has received and correctly decoded a golden or altref frame, and that		correctly decoded a golden or altref frame, and that frame had a
	frame had a Picture ID in the payload descriptor, the receiver can		Picture ID in the payload descriptor, the receiver can acknowledge
	acknowledge this simply by sending an RPSI message back to the		this simply by sending an RPSI message back to the sender. The
	sender. The message body (i.e., the "native RPSI bit string" in		message body (i.e., the "native RPSI bit string" in [RFC4585]) is
	[RFC4585]) is simply the (7- or 15-bit) Picture ID of the received		simply the (7- or 15-bit) Picture ID of the received frame.
	frame.


	Note: because all frames of the same picture must have the same		| Note: because all frames of the same picture must have the same
	inter-picture reference structure, there is no need for a message to		| inter-picture reference structure, there is no need for a
	specify which frame is being selected.		| message to specify which frame is being selected.

	5.2. Full Intra Request (FIR)		5.2. Full Intra Request (FIR)

	The Full Intra Request (FIR) [RFC5104] RTCP feedback message allows a		The Full Intra Request (FIR) [RFC5104] RTCP feedback message allows a
	receiver to request a full state refresh of an encoded stream.		receiver to request a full state refresh of an encoded stream.

	Upon receipt of a FIR request, a VP9 sender MUST send a picture with		Upon receipt of a FIR request, a VP9 sender MUST send a picture with
	a keyframe for its spatial-layer 0 layer frame and then send frames		a keyframe for its spatial-layer 0 layer frame and then send frames
	without inter-picture prediction (P=0) for any higher-layer frames.		without inter-picture prediction (P=0) for any higher-layer frames.


	skipping to change at line 688 ¶		skipping to change at line 695 ¶

	+---------------+---------------+		+---------------+---------------+
	|0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|		|0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|
	+---------------+---------+-----+		+---------------+---------+-----+
	|   RES   | TID |   RES   | SID |		|   RES   | TID |   RES   | SID |
	+---------------+---------+-----+		+---------------+---------+-----+

	Figure 5: LRR Index Format		Figure 5: LRR Index Format

	Figure 5 shows the format of an LRR's layer index fields for VP9		Figure 5 shows the format of an LRR's layer index fields for VP9

	streams. The two "RES" fields MUST be set to 0 on transmission and		streams. The two "RES" fields MUST be set to zero on transmission
	ignored on reception. See Section 4.2 for details on the TID and SID		and ignored on reception. See Section 4.2 for details on the TID and
	fields.		SID fields.
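
The LRR layer index described by Figure 5 could be packed and unpacked as in the Python sketch below. It assumes each octet holds a 5-bit RES field followed by a 3-bit TID or SID. Those widths are inferred from the 16-bit ruler in the figure and from the 3-bit TID and SID fields of Section 4.2, so treat them as an assumption. The function names are hypothetical.

```python
def pack_lrr_index(tid: int, sid: int) -> bytes:
    """Build the two-octet layer index, with both RES fields set to zero."""
    if not 0 <= tid <= 7 or not 0 <= sid <= 7:
        raise ValueError("TID and SID must each fit in 3 bits")
    return bytes([tid, sid])


def unpack_lrr_index(data: bytes) -> tuple[int, int]:
    """Return (TID, SID), ignoring the RES bits on reception."""
    if len(data) != 2:
        raise ValueError("the layer index is exactly two octets")
    return data[0] & 0x07, data[1] & 0x07
```

Masking rather than rejecting non-zero RES bits follows the rule that the RES fields are ignored on reception.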

	Identification of a layer refresh frame can be derived from the		Identification of a layer refresh frame can be derived from the
	reference IDs of each frame by backtracking the dependency chain		reference IDs of each frame by backtracking the dependency chain
	until reaching a point where only decodable frames are being		until reaching a point where only decodable frames are being
	referenced. Therefore, it's recommended for both the flexible and		referenced. Therefore, it's recommended for both the flexible and
	the non-flexible mode that, when switching up points are being		the non-flexible mode that, when switching up points are being
	encoded in response to an LRR, those packets contain layer indices		encoded in response to an LRR, those packets contain layer indices
	and the reference field or fields so that the decoder or selective		and the reference field or fields so that the decoder or selective
	forwarding middleboxes [RFC7667] can make this derivation.		forwarding middleboxes [RFC7667] can make this derivation.

	Example:		Example:

	LRR {1,0}, {2,1} is sent by a Multipoint Control Unit (MCU) when it		LRR {1,0}, {2,1} is sent by a Multipoint Control Unit (MCU) when it

	is currently relaying {1,0} to a receiver and which wants to upgrade		is currently relaying {1,0} to a receiver that wants to upgrade to
	to {2,1}. In response, the encoder should encode the next frames in		{2,1}. In response, the encoder should encode the next frames in
	layers {1,1} and {2,1} by only referring to frames in {1,0}, or		layers {1,1} and {2,1} by only referring to frames in {1,0} or {0,0}.
	{0,0}.

	In the non-flexible mode, periodic upgrade frames can be defined by		In the non-flexible mode, periodic upgrade frames can be defined by
	the layer structure of the SS; thus, periodic upgrade frames can be		the layer structure of the SS; thus, periodic upgrade frames can be
	automatically identified by the Picture ID.		automatically identified by the Picture ID.

	6. Payload Format Parameters		6. Payload Format Parameters

	This payload format has three optional parameters: max-fr, max-fs,		This payload format has three optional parameters: max-fr, max-fs,
	and profile-id.		and profile-id.

	The max-fr and max-fs parameters are used to signal the capabilities		The max-fr and max-fs parameters are used to signal the capabilities
	of a receiver implementation. If the implementation is willing to		of a receiver implementation. If the implementation is willing to
	receive media, both parameters MUST be provided. These parameters		receive media, both parameters MUST be provided. These parameters
	MUST NOT be used for any other purpose. A media sender SHOULD NOT		MUST NOT be used for any other purpose. A media sender SHOULD NOT
	send media with a frame rate or frame size exceeding the max-fr and		send media with a frame rate or frame size exceeding the max-fr and
	max-fs values signaled. (There may be scenarios, such as pre-encoded		max-fs values signaled. (There may be scenarios, such as pre-encoded
	media or selective forwarding middleboxes [RFC7667], where a media		media or selective forwarding middleboxes [RFC7667], where a media
	sender does not have media available that fits within a receiver's		sender does not have media available that fits within a receiver's

	max-fs and max-fr value; in such scenarios, a sender MAY exceed the		max-fs and max-fr values; in such scenarios, a sender MAY exceed the
	signaled values.)		signaled values.)

	max-fr: The value of max-fr is an integer indicating the maximum		max-fr: The value of max-fr is an integer indicating the maximum
	frame rate in units of frames per second that the decoder is		frame rate in units of frames per second that the decoder is
	capable of decoding.		capable of decoding.

	max-fs: The value of max-fs is an integer indicating the maximum		max-fs: The value of max-fs is an integer indicating the maximum
	frame size in units of macroblocks that the decoder is capable of		frame size in units of macroblocks that the decoder is capable of
	decoding.		decoding.

	The decoder is capable of decoding this frame size as long as the		The decoder is capable of decoding this frame size as long as the

	width and height of the frame in macroblocks are less than		width and height of the frame in macroblocks are each less than
	int(sqrt(max-fs * 8)); for instance, a max-fs of 1200 (capable of		int(sqrt(max-fs * 8)); for instance, a max-fs of 1200 (capable of
	supporting 640x480 resolution) will support widths and heights up		supporting 640x480 resolution) will support widths and heights up
	to 1552 pixels (97 macroblocks).		to 1552 pixels (97 macroblocks).
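
The max-fs arithmetic can be checked with a few lines of Python. This sketch makes three assumptions. A macroblock is 16x16 pixels, which is consistent with 1552 pixels equalling 97 macroblocks. The per-side limit is treated as inclusive, since the worked example says widths and heights "up to" 97 macroblocks are supported even though the preceding sentence says "less than". The total macroblock count must not exceed max-fs. The function name fits_max_fs is hypothetical.

```python
import math

MACROBLOCK_SIDE = 16  # pixels; assumed from 1552 px == 97 macroblocks


def fits_max_fs(width_px: int, height_px: int, max_fs: int) -> bool:
    """Check a frame size against a receiver's signalled max-fs."""
    # Round partial macroblocks up to a whole macroblock.
    width_mb = -(-width_px // MACROBLOCK_SIDE)
    height_mb = -(-height_px // MACROBLOCK_SIDE)
    side_limit = math.isqrt(max_fs * 8)  # int(sqrt(max-fs * 8))
    return (
        width_mb * height_mb <= max_fs
        and width_mb <= side_limit
        and height_mb <= side_limit
    )
```

For max-fs = 1200, math.isqrt(9600) is 97, so a 640x480 frame (40 x 30 = 1200 macroblocks) fits, while a frame 98 macroblocks wide does not, however short it is.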

	profile-id: The value of profile-id is an integer indicating the		profile-id: The value of profile-id is an integer indicating the
	default coding profile (the subset of coding tools that may have		default coding profile (the subset of coding tools that may have
	been used to generate the stream or that the receiver supports).		been used to generate the stream or that the receiver supports).
	Table 2 lists all of the profiles defined in Section 7.2 of		Table 2 lists all of the profiles defined in Section 7.2 of
	[VP9-BITSTREAM] and the corresponding integer values to be used.		[VP9-BITSTREAM] and the corresponding integer values to be used.


	skipping to change at line 772 ¶		skipping to change at line 778 ¶
	+=========+============+		+=========+============+
	| 0       | 0          |		| 0       | 0          |
	+---------+------------+		+---------+------------+
	| 1       | 1          |		| 1       | 1          |
	+---------+------------+		+---------+------------+
	| 2       | 2          |		| 2       | 2          |
	+---------+------------+		+---------+------------+
	| 3       | 3          |		| 3       | 3          |
	+---------+------------+		+---------+------------+


	Table 2: Comparison of profile-id to VP9 Profile Integer		Table 2: Correspondence between profile-id to VP9 Profile Integer

	+=========+===========+=================+==========================+		+=========+===========+=================+==========================+
	| Profile | Bit Depth | SRGB Colorspace | Chroma Subsampling       |		| Profile | Bit Depth | SRGB Colorspace | Chroma Subsampling       |
	+=========+===========+=================+==========================+		+=========+===========+=================+==========================+
	| 0       | 8         | No              | YUV 4:2:0                |		| 0       | 8         | No              | YUV 4:2:0                |
	+---------+-----------+-----------------+--------------------------+		+---------+-----------+-----------------+--------------------------+
	| 1       | 8         | Yes             | YUV 4:2:2,4:4:0 or 4:4:4 |		| 1       | 8         | Yes             | YUV 4:2:2,4:4:0 or 4:4:4 |
	+---------+-----------+-----------------+--------------------------+		+---------+-----------+-----------------+--------------------------+
	| 2       | 10 or 12  | No              | YUV 4:2:0                |		| 2       | 10 or 12  | No              | YUV 4:2:0                |
	+---------+-----------+-----------------+--------------------------+		+---------+-----------+-----------------+--------------------------+
	| 3       | 10 or 12  | Yes             | YUV 4:2:2,4:4:0 or 4:4:4 |		| 3       | 10 or 12  | Yes             | YUV 4:2:2,4:4:0 or 4:4:4 |
	+---------+-----------+-----------------+--------------------------+		+---------+-----------+-----------------+--------------------------+

	Table 3: Profile Capabilities		Table 3: Profile Capabilities


			| Note: SRGB (often sRGB) = Standard Red-Green-Blue

	6.1. SDP Parameters		6.1. SDP Parameters

	6.1.1. Mapping of Media Subtype Parameters to SDP		6.1.1. Mapping of Media Subtype Parameters to SDP

	The media type video/vp9 string is mapped to fields in the Session		The media type video/vp9 string is mapped to fields in the Session
	Description Protocol (SDP) [RFC8866] as follows:		Description Protocol (SDP) [RFC8866] as follows:

	* The media name in the "m=" line of SDP MUST be video.		* The media name in the "m=" line of SDP MUST be video.

	* The encoding name in the "a=rtpmap" line of SDP MUST be VP9 (the		* The encoding name in the "a=rtpmap" line of SDP MUST be VP9 (the

	skipping to change at line 894 ¶		skipping to change at line 903 ¶
	Lennox <jonathan.lennox@8x8.com>		Lennox <jonathan.lennox@8x8.com>

	Intended usage: COMMON		Intended usage: COMMON

	Restrictions on usage: This media type depends on RTP framing;		Restrictions on usage: This media type depends on RTP framing;
	hence, it is only defined for transfer via RTP [RFC3550].		hence, it is only defined for transfer via RTP [RFC3550].

	Author: Jonathan Lennox <jonathan.lennox@8x8.com>		Author: Jonathan Lennox <jonathan.lennox@8x8.com>

	Change controller: IETF AVTCore Working Group delegated from the		Change controller: IETF AVTCore Working Group delegated from the

	IESG.		IETF.

	8. Security Considerations		8. Security Considerations

	RTP packets using the payload format defined in this specification		RTP packets using the payload format defined in this specification
	are subject to the security considerations discussed in the RTP		are subject to the security considerations discussed in the RTP
	specification [RFC3550], and in any applicable RTP profile such as		specification [RFC3550], and in any applicable RTP profile such as
	RTP/AVP [RFC3551], RTP/AVPF [RFC4585], RTP/SAVP [RFC3711], or RTP/		RTP/AVP [RFC3551], RTP/AVPF [RFC4585], RTP/SAVP [RFC3711], or RTP/
	SAVPF [RFC5124]. However, as "Securing the RTP Framework: Why RTP		SAVPF [RFC5124]. However, as "Securing the RTP Framework: Why RTP
	Does Not Mandate a Single Media Security Solution" [RFC7202]		Does Not Mandate a Single Media Security Solution" [RFC7202]
	discusses, it is not an RTP payload format's responsibility to		discusses, it is not an RTP payload format's responsibility to
	discuss or mandate what solutions are used to meet the basic security		discuss or mandate what solutions are used to meet the basic security
	goals like confidentiality, integrity, and source authenticity for		goals like confidentiality, integrity, and source authenticity for
	RTP in general. This responsibility lies with anyone using RTP in an		RTP in general. This responsibility lies with anyone using RTP in an
	application. They can find guidance on available security mechanisms		application. They can find guidance on available security mechanisms

	in "Options for Securing RTP Sessions [RFC7201]. Applications SHOULD		in "Options for Securing RTP Sessions" [RFC7201]. Applications
	use one or more appropriate strong security mechanisms.		SHOULD use one or more appropriate strong security mechanisms.

	Implementations of this RTP payload format need to take appropriate		Implementations of this RTP payload format need to take appropriate
	security considerations into account. It is extremely important for		security considerations into account. It is extremely important for
	the decoder to be robust against malicious or malformed payloads and		the decoder to be robust against malicious or malformed payloads and
	ensure that they do not cause the decoder to overrun its allocated		ensure that they do not cause the decoder to overrun its allocated
	memory or otherwise misbehave. An overrun in allocated memory could		memory or otherwise misbehave. An overrun in allocated memory could
	lead to arbitrary code execution by an attacker. The same applies to		lead to arbitrary code execution by an attacker. The same applies to
	the encoder, even though problems in encoders are (typically) rarer.		the encoder, even though problems in encoders are (typically) rarer.

	This RTP payload format and its media decoder do not exhibit any		This RTP payload format and its media decoder do not exhibit any

	skipping to change at line 944 ¶		skipping to change at line 953 ¶
	non-reference frames and discard them in order to reduce network		non-reference frames and discard them in order to reduce network
	congestion. Note that discarding of non-reference frames cannot be		congestion. Note that discarding of non-reference frames cannot be
	done if the stream is encrypted (because the non-reference marker is		done if the stream is encrypted (because the non-reference marker is
	encrypted).		encrypted).

	10. IANA Considerations		10. IANA Considerations

	IANA has registered the media type registration "video/vp9" as		IANA has registered the media type registration "video/vp9" as
	specified in Section 7. The media type has also been added to the		specified in Section 7. The media type has also been added to the
	"RTP Payload Format Media Types" <https://www.iana.org/assignments/		"RTP Payload Format Media Types" <https://www.iana.org/assignments/

	rtp-parameters> subregistry of the "Real-Time Transport Protocol		rtp-parameters> registry of the "Real-Time Transport Protocol (RTP)
	(RTP) Parameters" registry.		Parameters" registry group as follows.

			Media Type: video
			Subtype: VP9
			Clock Rate (Hz): 90000
			Reference: RFC 9628

	11. References		11. References

	11.1. Normative References		11.1. Normative References

	[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate		[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
	Requirement Levels", BCP 14, RFC 2119,		Requirement Levels", BCP 14, RFC 2119,
	DOI 10.17487/RFC2119, March 1997,		DOI 10.17487/RFC2119, March 1997,
	<https://www.rfc-editor.org/info/rfc2119>.		<https://www.rfc-editor.org/info/rfc2119>.


	skipping to change at line 997 ¶		skipping to change at line 1011 ¶
	2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,		2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
	May 2017, <https://www.rfc-editor.org/info/rfc8174>.		May 2017, <https://www.rfc-editor.org/info/rfc8174>.

	[RFC8866] Begen, A., Kyzivat, P., Perkins, C., and M. Handley, "SDP:		[RFC8866] Begen, A., Kyzivat, P., Perkins, C., and M. Handley, "SDP:
	Session Description Protocol", RFC 8866,		Session Description Protocol", RFC 8866,
	DOI 10.17487/RFC8866, January 2021,		DOI 10.17487/RFC8866, January 2021,
	<https://www.rfc-editor.org/info/rfc8866>.		<https://www.rfc-editor.org/info/rfc8866>.

	[RFC9627] Lennox, J., Hong, D., Uberti, J., Holmer, S., and M.		[RFC9627] Lennox, J., Hong, D., Uberti, J., Holmer, S., and M.
	Flodman, "The Layer Refresh Request (LRR) RTCP Feedback		Flodman, "The Layer Refresh Request (LRR) RTCP Feedback

	Message", RFC 9627, DOI 10.17487/RFC9627, August 2024,		Message", RFC 9627, DOI 10.17487/RFC9627, February 2025,
	<https://www.rfc-editor.org/info/rfc9627>.		<https://www.rfc-editor.org/info/rfc9627>.

	[VP9-BITSTREAM]		[VP9-BITSTREAM]
	Grange, A., de Rivaz, P., and J. Hunt, "VP9 Bitstream &		Grange, A., de Rivaz, P., and J. Hunt, "VP9 Bitstream &
	Decoding Process Specification", Version 0.6, 31 March		Decoding Process Specification", Version 0.6, 31 March
	2016,		2016,
	<https://storage.googleapis.com/downloads.webmproject.org/		<https://storage.googleapis.com/downloads.webmproject.org/
	docs/vp9/vp9-bitstream-specification-		docs/vp9/vp9-bitstream-specification-
	v0.6-20160331-draft.pdf>.		v0.6-20160331-draft.pdf>.


End of changes. 64 change blocks.
	189 lines changed or deleted		203 lines changed or added