/ cos(pi/4) cos(pi/4) \ / 1/sqrt(2) 1/sqrt(2) \ / 1 1 \ | | = | | = 1/sqrt(2) | | \ cos(pi/4) -cos(pi/4) / \ 1/sqrt(2) -1/sqrt(2) / \ 1 -1 /This leads to a simple add/sub (sum/diff) calculation scheme. The scalar multiplication is deferred to the descaling operation at the end of the calculation. After column and row application the factor is 1/sqrt(2) * 1/sqrt(2) = 1/2. We also have to account for the different size DCT with a factor of 1/4, so the final descaling factor is 1/2 * 1/4 = 1/8 (right shift 3).
             /  C4   C4   C4   C4  \          /   1    1    1    1  \
             |                     |          |                     |
             |  C2   C6  -C6  -C2  |          |  c2   c6  -c6  -c2  |
  1/sqrt(2)  |                     |  =  1/2  |                     |
             |  C4  -C4  -C4   C4  |          |   1   -1   -1    1  |
             |                     |          |                     |
             \  C6  -C2   C2  -C6  /          \  c6  -c2   c2  -c6  /
  where  Ck = cos(k*pi/16),    ck = sqrt(2) * Ck    (note C4 = 1/sqrt(2))
We use the indexing from the 8x8 point DCT for comparison.
The IDCT is the transpose of the DCT, hence
       /  1   c2   1   c6  \
       |                   |
       |  1   c6  -1  -c2  |
  1/2  |                   |
       |  1  -c6  -1   c2  |
       |                   |
       \  1  -c2   1  -c6  /
The scalar multiplication is deferred to the descaling operation at the end of the
calculation. After column and row application the factor is 1/2 * 1/2 =
1/4.
We also have to account for the different size DCT with a factor of 1/2,
so the final descaling factor is 1/4 * 1/2 = 1/8 (right shift 3).
Now let (x0 x1 x2 x3) be our input vector, (y0 y1 y2 y3) our output. Then we have
/ y0 \ / 1 c2 1 c6 \ / x0 \ / x0 + c2 * x1 + x2 + c6 * x3 \ | | | | | | | | | y1 | | 1 c6 -1 -c2 | | x1 | | x0 + c6 * x1 - x2 - c2 * x3 | | | = | | * | | = | | | y2 | | 1 -c6 -1 c2 | | x2 | | x0 - c6 * x1 - x2 + c2 * x3 | | | | | | | | | \ y3 / \ 1 -c2 1 -c6 / \ x3 / \ x0 - c2 * x1 + x2 - c6 * x3 /You can see that the even part (x0 x2) does not contain any multiplication.
t10 = x0 + x2 t12 = x0 - x2 (even part) t0 = c2 * x1 + c6 * x3 t2 = c6 * x1 - c2 * x3 (odd part)Then we have
y0 = t10 + t0 y3 = t10 - t0 y1 = t12 + t2 y2 = t12 - t2This would require 4 multiplications. We can save 1 multiplication by simple algebraic transformation:
t0 = (c2 - c6) * x1 + c6 * (x1 + x3) t2 = c6 * (x1 + x3) - (c2 + c6) * x3This requires 3 multiplications with
c6 = sqrt(2) * cos(6*pi/16) = 0.541196100 [c2 = sqrt(2) * cos(2*pi/16) not needed] c2 + c6 = 1.847759065 c2 - c6 = 0.765366865This is the same 'rotation' as in the even part of the 8x8 point LL&M IDCT algorithm.