360 lines
13 KiB
XML
360 lines
13 KiB
XML
<?xml version="1.0" encoding="US-ASCII"?>
|
|
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
|
|
<?rfc toc="yes"?>
|
|
<?rfc tocompact="yes"?>
|
|
<?rfc tocdepth="3"?>
|
|
<?rfc tocindent="yes"?>
|
|
<?rfc symrefs="yes"?>
|
|
<?rfc sortrefs="yes"?>
|
|
<?rfc comments="yes"?>
|
|
<?rfc inline="yes"?>
|
|
<?rfc compact="yes"?>
|
|
<?rfc subcompact="no"?>
|
|
<rfc category="std" docName="draft-ietf-codec-opus-update-02"
|
|
ipr="trust200902">
|
|
<front>
|
|
<title abbrev="Opus Update">Updates to the Opus Audio Codec</title>
|
|
|
|
<author initials="JM" surname="Valin" fullname="Jean-Marc Valin">
|
|
<organization>Mozilla Corporation</organization>
|
|
<address>
|
|
<postal>
|
|
<street>331 E. Evelyn Avenue</street>
|
|
<city>Mountain View</city>
|
|
<region>CA</region>
|
|
<code>94041</code>
|
|
<country>USA</country>
|
|
</postal>
|
|
<phone>+1 650 903-0800</phone>
|
|
<email>jmvalin@jmvalin.ca</email>
|
|
</address>
|
|
</author>
|
|
|
|
<author initials="K." surname="Vos" fullname="Koen Vos">
|
|
<organization>vocTone</organization>
|
|
<address>
|
|
<postal>
|
|
<street></street>
|
|
<city></city>
|
|
<region></region>
|
|
<code></code>
|
|
<country></country>
|
|
</postal>
|
|
<phone></phone>
|
|
<email>koenvos74@gmail.com</email>
|
|
</address>
|
|
</author>
|
|
|
|
|
|
|
|
<date day="1" month="July" year="2016" />
|
|
|
|
<abstract>
|
|
<t>This document addresses minor issues that were found in the specification
|
|
of the Opus audio codec in <xref target="RFC6716">RFC 6716</xref>.</t>
|
|
</abstract>
|
|
</front>
|
|
|
|
<middle>
|
|
<section title="Introduction">
|
|
<t>This document addresses minor issues that were discovered in the reference
|
|
implementation of the Opus codec that serves as the specification in
|
|
<xref target="RFC6716">RFC 6716</xref>. Only issues affecting the decoder are
|
|
listed here. An up-to-date implementation of the Opus encoder can be found at
|
|
http://opus-codec.org/. The updated specification remains fully compatible with
|
|
the original specification and only one of the changes results in any difference
|
|
in the audio output.
|
|
</t>
|
|
</section>
|
|
|
|
<section title="Terminology">
|
|
<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
|
|
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
|
|
document are to be interpreted as described in <xref
|
|
target="RFC2119">RFC 2119</xref>.</t>
|
|
</section>
|
|
|
|
<section title="Stereo State Reset in SILK">
|
|
<t>The reference implementation does not reinitialize the stereo state
|
|
during a mode switch. The old stereo memory can produce a brief impulse
|
|
(i.e. single sample) in the decoded audio. This can be fixed by changing
|
|
silk/dec_API.c at line 72:
|
|
</t>
|
|
<figure>
|
|
<artwork><![CDATA[
|
|
for( n = 0; n < DECODER_NUM_CHANNELS; n++ ) {
|
|
ret = silk_init_decoder( &channel_state[ n ] );
|
|
}
|
|
+ silk_memset(&((silk_decoder *)decState)->sStereo, 0,
|
|
+ sizeof(((silk_decoder *)decState)->sStereo));
|
|
+ /* Not strictly needed, but it's cleaner that way */
|
|
+ ((silk_decoder *)decState)->prev_decode_only_middle = 0;
|
|
|
|
return ret;
|
|
}
|
|
]]></artwork>
|
|
</figure>
|
|
<t>
|
|
This change affects the normative part of the decoder, although the
|
|
amount of change is too small to make a significant impact on testvectors.
|
|
</t>
|
|
</section>
|
|
|
|
<section anchor="padding" title="Parsing of the Opus Packet Padding">
|
|
<t>It was discovered that some invalid packets of very large size could trigger
|
|
an out-of-bounds read in the Opus packet parsing code responsible for padding.
|
|
This is due to an integer overflow if the signaled padding exceeds 2^31-1 bytes
|
|
(the actual packet may be smaller). The code can be fixed by applying the following
|
|
changes at line 596 of src/opus_decoder.c:
|
|
</t>
|
|
<figure>
|
|
<artwork><![CDATA[
|
|
/* Padding flag is bit 6 */
|
|
if (ch&0x40)
|
|
{
|
|
- int padding=0;
|
|
int p;
|
|
do {
|
|
if (len<=0)
|
|
return OPUS_INVALID_PACKET;
|
|
p = *data++;
|
|
len--;
|
|
- padding += p==255 ? 254: p;
|
|
+ len -= p==255 ? 254: p;
|
|
} while (p==255);
|
|
- len -= padding;
|
|
}
|
|
]]></artwork>
|
|
</figure>
|
|
<t>This packet parsing issue is limited to reading memory up
|
|
to about 60 kB beyond the compressed buffer. This can only be triggered
|
|
by a compressed packet more than about 16 MB long, so it's not a problem
|
|
for RTP. In theory, it <spanx style="emph">could</spanx> crash a file
|
|
decoder (e.g. Opus in Ogg) if the memory just after the incoming packet
|
|
is out-of-range, but our attempts to trigger such a crash in a production
|
|
application built using an affected version of the Opus decoder failed.</t>
|
|
</section>
|
|
|
|
<section anchor="resampler" title="Resampler buffer">
|
|
<t>The SILK resampler had the following issues:
|
|
<list style="numbers">
|
|
<t>The calls to memcpy() were using sizeof(opus_int32), but the type of the
|
|
local buffer was opus_int16.</t>
|
|
<t>Because the size was wrong, this potentially allowed the source
|
|
and destination regions of the memcpy() to overlap.
|
|
We <spanx style="emph">believe</spanx> that nSamplesIn is at least fs_in_khZ,
|
|
which is at least 8.
|
|
Since RESAMPLER_ORDER_FIR_12 is only 8, that should not be a problem once
|
|
the type size is fixed.</t>
|
|
<t>The size of the buffer used RESAMPLER_MAX_BATCH_SIZE_IN, but the
|
|
data stored in it was actually _twice_ the input batch size
|
|
(nSamplesIn<<1).</t>
|
|
</list></t>
|
|
<t>
|
|
The fact that the code never produced any error in testing (including when run under the
|
|
Valgrind memory debugger), suggests that in practice
|
|
the batch sizes are reasonable enough that none of the issues above
|
|
was ever a problem. However, proving that is non-obvious.
|
|
</t>
|
|
<t>The code can be fixed by applying the following changes to line 70 of silk/resampler_private_IIR_FIR.c:
|
|
</t>
|
|
<figure>
|
|
<artwork><![CDATA[
|
|
)
|
|
{
|
|
silk_resampler_state_struct *S = \
|
|
(silk_resampler_state_struct *)SS;
|
|
opus_int32 nSamplesIn;
|
|
opus_int32 max_index_Q16, index_increment_Q16;
|
|
- opus_int16 buf[ RESAMPLER_MAX_BATCH_SIZE_IN + \
|
|
RESAMPLER_ORDER_FIR_12 ];
|
|
+ opus_int16 buf[ 2*RESAMPLER_MAX_BATCH_SIZE_IN + \
|
|
RESAMPLER_ORDER_FIR_12 ];
|
|
|
|
/* Copy buffered samples to start of buffer */
|
|
- silk_memcpy( buf, S->sFIR, RESAMPLER_ORDER_FIR_12 \
|
|
* sizeof( opus_int32 ) );
|
|
+ silk_memcpy( buf, S->sFIR, RESAMPLER_ORDER_FIR_12 \
|
|
* sizeof( opus_int16 ) );
|
|
|
|
/* Iterate over blocks of frameSizeIn input samples */
|
|
index_increment_Q16 = S->invRatio_Q16;
|
|
while( 1 ) {
|
|
nSamplesIn = silk_min( inLen, S->batchSize );
|
|
|
|
/* Upsample 2x */
|
|
silk_resampler_private_up2_HQ( S->sIIR, &buf[ \
|
|
RESAMPLER_ORDER_FIR_12 ], in, nSamplesIn );
|
|
|
|
max_index_Q16 = silk_LSHIFT32( nSamplesIn, 16 + 1 \
|
|
); /* + 1 because 2x upsampling */
|
|
out = silk_resampler_private_IIR_FIR_INTERPOL( out, \
|
|
buf, max_index_Q16, index_increment_Q16 );
|
|
in += nSamplesIn;
|
|
inLen -= nSamplesIn;
|
|
|
|
if( inLen > 0 ) {
|
|
/* More iterations to do; copy last part of \
|
|
filtered signal to beginning of buffer */
|
|
- silk_memcpy( buf, &buf[ nSamplesIn << 1 ], \
|
|
RESAMPLER_ORDER_FIR_12 * sizeof( opus_int32 ) );
|
|
+ silk_memmove( buf, &buf[ nSamplesIn << 1 ], \
|
|
RESAMPLER_ORDER_FIR_12 * sizeof( opus_int16 ) );
|
|
} else {
|
|
break;
|
|
}
|
|
}
|
|
|
|
/* Copy last part of filtered signal to the state for \
|
|
the next call */
|
|
- silk_memcpy( S->sFIR, &buf[ nSamplesIn << 1 ], \
|
|
RESAMPLER_ORDER_FIR_12 * sizeof( opus_int32 ) );
|
|
+ silk_memcpy( S->sFIR, &buf[ nSamplesIn << 1 ], \
|
|
RESAMPLER_ORDER_FIR_12 * sizeof( opus_int16 ) );
|
|
}
|
|
]]></artwork>
|
|
</figure>
|
|
<t>
|
|
Note: due to RFC formatting conventions, lines exceeding the column width
|
|
in the patch above are split using a backslash character. The backslashes
|
|
at the end of a line and the white space at the beginning
|
|
of the following line are not part of the patch. A properly formatted patch
|
|
including the three changes above is available at
|
|
<eref target="https://jmvalin.ca/misc_stuff/opus_update.patch"/>. (EDITOR:
|
|
change to an ietf.org link when ready)
|
|
</t>
|
|
</section>
|
|
|
|
<section title="Downmix to Mono" anchor="stereo">
|
|
<t>The last issue is not strictly a bug, but it is an issue that has been reported
|
|
when downmixing an Opus decoded stream to mono, whether this is done inside the decoder
|
|
or as a post-processing step on the stereo decoder output. Opus intensity stereo allows
|
|
optionally coding the two channels 180-degrees out of phase on a per-band basis.
|
|
This provides better stereo quality than forcing the two channels to be in phase,
|
|
but when the output is downmixed to mono, the energy in the affected bands is cancelled
|
|
sometimes resulting in audible artefacts.
|
|
</t>
|
|
<t>As a work-around for this issue, the decoder MAY choose not to apply the 180-degree
|
|
phase shift when the output is meant to be downmixed (inside or
|
|
outside of the decoder).
|
|
</t>
|
|
</section>
|
|
|
|
<section title="Hybrid Folding" anchor="folding">
|
|
<t>When encoding in hybrid mode at low bitrate, we sometimes only have
|
|
enough bits to code a single CELT band (8 - 9.6 kHz). When that happens,
|
|
the second band (CELT band 18, from 9.6 to 12 kHz) cannot use folding
|
|
because it is wider than the amount already coded, and falls back to
|
|
LCG noise. Because it can also happen on transients (e.g. stops), it
|
|
can cause audible pre-echo.
|
|
</t>
|
|
<t>
|
|
To address the issue, we change the folding behaviour so that it is
|
|
never forced to fall back to LCG due to the first band not containing
|
|
enough coefficients to fold onto the second band. This
|
|
is achieved by simply repeating part of the first band in the folding
|
|
of the second band. This changes the code in celt/bands.c around line 1237:
|
|
</t>
|
|
<figure>
|
|
<artwork><![CDATA[
|
|
b = 0;
|
|
}
|
|
|
|
- if (resynth && M*eBands[i]-N >= M*eBands[start] && \
|
|
(update_lowband || lowband_offset==0))
|
|
+ if (resynth && (M*eBands[i]-N >= M*eBands[start] || \
|
|
i==start+1) && (update_lowband || lowband_offset==0))
|
|
lowband_offset = i;
|
|
|
|
+ if (i == start+1)
|
|
+ {
|
|
+ int n1, n2;
|
|
+ int offset;
|
|
+ n1 = M*(eBands[start+1]-eBands[start]);
|
|
+ n2 = M*(eBands[start+2]-eBands[start+1]);
|
|
+ offset = M*eBands[start];
|
|
+ /* Duplicate enough of the first band folding data to \
|
|
be able to fold the second band.
|
|
+ Copies no data for CELT-only mode. */
|
|
+ OPUS_COPY(&norm[offset+n1], &norm[offset+2*n1 - n2], n2-n1);
|
|
+ if (C==2)
|
|
+ OPUS_COPY(&norm2[offset+n1], &norm2[offset+2*n1 - n2], \
|
|
n2-n1);
|
|
+ }
|
|
+
|
|
tf_change = tf_res[i];
|
|
if (i>=m->effEBands)
|
|
{
|
|
]]></artwork>
|
|
</figure>
|
|
|
|
<t>
|
|
as well as line 1260:
|
|
</t>
|
|
|
|
<figure>
|
|
<artwork><![CDATA[
|
|
fold_start = lowband_offset;
|
|
while(M*eBands[--fold_start] > effective_lowband);
|
|
fold_end = lowband_offset-1;
|
|
- while(M*eBands[++fold_end] < effective_lowband+N);
|
|
+ while(++fold_end < i && M*eBands[fold_end] < \
|
|
effective_lowband+N);
|
|
x_cm = y_cm = 0;
|
|
fold_i = fold_start; do {
|
|
x_cm |= collapse_masks[fold_i*C+0];
|
|
|
|
]]></artwork>
|
|
</figure>
|
|
<t>
|
|
The fix does not impact compatibility, because the improvement does
|
|
not depend on the encoder doing anything special. There is also no
|
|
reasonable way for an encoder to use the original behaviour to
|
|
improve quality over the proposed change.
|
|
</t>
|
|
</section>
|
|
|
|
<section title="New Test Vectors">
|
|
<t>Changes in <xref target="stereo"/> and <xref target="folding"/> have
|
|
sufficient impact on the testvectors to make them fail. For this reason,
|
|
this document also updates the Opus test vectors. The new test vectors now
|
|
include two decoded outputs for the same bitstream. The outputs with
|
|
suffix 'm' do not apply the CELT 180-degree phase shift as allowed in
|
|
<xref target="stereo"/>, while the outputs with suffix 's' do. An
|
|
implementation is compliant as long as it passes either the 'm' or the
|
|
's' set of vectors.
|
|
</t>
|
|
<t>
|
|
In addition, any Opus implementation
|
|
that passes the original test vectors from <xref target="RFC6716">RFC 6716</xref>
|
|
is still compliant with the Opus specification. However, newer implementations
|
|
SHOULD be based on the new test vectors rather than the old ones.
|
|
</t>
|
|
<t>The new test vectors are located at
|
|
<eref target="https://jmvalin.ca/misc_stuff/opus_newvectors.tar.gz"/>. (EDITOR:
|
|
change to an ietf.org link when ready)
|
|
</t>
|
|
</section>
|
|
|
|
<section anchor="IANA" title="IANA Considerations">
|
|
<t>This document makes no request of IANA.</t>
|
|
|
|
<t>Note to RFC Editor: this section may be removed on publication as an
|
|
RFC.</t>
|
|
</section>
|
|
|
|
<section anchor="Acknowledgements" title="Acknowledgements">
|
|
<t>We would like to thank Juri Aedla for reporting the issue with the parsing of
|
|
the Opus padding.</t>
|
|
</section>
|
|
</middle>
|
|
|
|
<back>
|
|
<references title="References">
|
|
<?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml"?>
|
|
<?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.6716.xml"?>
|
|
|
|
|
|
</references>
|
|
</back>
|
|
</rfc>
|