- Downscale to a fourth (2x2 output):

NO multiplications required other than the usual dequantizing.

Need only the 2x2 top left (low-freq) input DCT coefficients.

**=> A LOT FASTER!** - Downscale to a half (4x4 output):

NO multiplications required in the even part calculation.

Only 3 multiplications (1 rotation) in the odd part.

(The old implementation uses 2 multiplications in the

even part, and 8 multiplications in the odd part, so total 10.)

Need only the 4x4 top left (low-freq) input DCT coefficients.

**=> A LOT FASTER!** - Compact, easy to comprehend source code: 9 KBytes new versus
14 KBytes old.
- No other changes to the library or header files required, just
replace and recompile.

No specific adaptions required.

Since the method is equivalent to the method of Dugad and Ahuja, the visual and measured quality of the downsampled output is at least as good as common bilinear (averaging) downsampling.

/ cos(pi/4) cos(pi/4) \ / 1/sqrt(2) 1/sqrt(2) \ / 1 1 \ | | = | | = 1/sqrt(2) | | \ cos(pi/4) -cos(pi/4) / \ 1/sqrt(2) -1/sqrt(2) / \ 1 -1 /This leads to a simple add/sub (sum/diff) calculation scheme. The scalar multiplication is deferred to the descaling operation at the end of the calculation. After column and row application the factor is 1/sqrt(2) * 1/sqrt(2) = 1/2. We also have to account for the different size DCT with a factor of 1/4, so the final descaling factor is 1/2 * 1/4 = 1/8 (right shift 3).

/ C4 C4 C4 C4 \ / 1 1 1 1 \ | | | | | C2 C6 -C6 -C2 | | c2 c6 -c6 -c2 | 1/sqrt(2) | | = 1/2 | | | C4 -C4 -C4 C4 | | 1 -1 -1 1 | | | | | \ C6 -C2 C2 -C6 / \ c6 -c2 c2 -c6 / where Ck = cos(k*pi/16), ck = sqrt(2) * Ck (note C4 = 1/sqrt(2))We use the indexing from the 8x8 point DCT for comparison. The IDCT is the transpose of the DCT, hence

/ 1 c2 1 c6 \ | | | 1 c6 -1 -c2 | 1/2 | | | 1 -c6 -1 c2 | | | \ 1 -c2 1 -c6 /The scalar multiplication is deferred to the descaling operation at the end of the calculation. After column and row application the factor is 1/2 * 1/2 = 1/4. We also have to account for the different size DCT with a factor of 1/2, so the final descaling factor is 1/4 * 1/2 = 1/8 (right shift 3).

Now let (x0 x1 x2 x3) be our input vector, (y0 y1 y2 y3) our output. Then we have

/ y0 \ / 1 c2 1 c6 \ / x0 \ / x0 + c2 * x1 + x2 + c6 * x3 \ | | | | | | | | | y1 | | 1 c6 -1 -c2 | | x1 | | x0 + c6 * x1 - x2 - c2 * x3 | | | = | | * | | = | | | y2 | | 1 -c6 -1 c2 | | x2 | | x0 - c6 * x1 - x2 + c2 * x3 | | | | | | | | | \ y3 / \ 1 -c2 1 -c6 / \ x3 / \ x0 - c2 * x1 + x2 - c6 * x3 /You can see that the even part (x0 x2) does not contain any multiplication.

Now let

t10 = x0 + x2 t12 = x0 - x2 (even part) t0 = c2 * x1 + c6 * x3 t2 = c6 * x1 - c2 * x3 (odd part)Then we have

y0 = t10 + t0 y3 = t10 - t0 y1 = t12 + t2 y2 = t12 - t2This would require 4 multiplications. We can save 1 multiplication by simple algebraic transformation:

t0 = (c2 - c6) * x1 + c6 * (x1 + x3) t2 = c6 * (x1 + x3) - (c2 + c6) * x3This requires 3 multiplications with

c6 = sqrt(2) * cos(6*pi/16) = 0.541196100 [c2 = sqrt(2) * cos(2*pi/16) not needed] c2 + c6 = 1.847759065 c2 - c6 = 0.765366865This is the same 'rotation' as in the even part of the 8x8 point LL&M IDCT algorithm.