This feature can also be used for direct internal color upsampling during normal (1:1) decoding of 2hx2v color-subsampled JPEG images (and also of 2hx1v and 1hx2v images with the enhanced library).

A fast 12x12 IDCT has been developed to provide a 'djpeg -scale 3/2' option for
2:3 (8:12) image upscaling.

This is interesting because it is the first non-integer upsampling feature in the
IJG context.

- Normal decoding (1:1, 8:8, djpeg -scale 1/1):

The LL&M 8-point IDCT kernel in use requires 12 multiplications (3 even + 9 odd).

This gives 12x16/64 (16 = 8 column loops + 8 row loops) = **3** mults per output pixel.

- Downscaling to a half (2:1, 8:4, djpeg -scale 1/2):

The developed 4-point IDCT kernel requires 3 multiplications (0 even + 3 odd).

This gives 3x8/16 (8 = 4 column loops + 4 row loops) = **1.5** mults per output pixel.

This is twice the speed of normal 1:1 decoding!

- Upscaling x2 (1:2, 8:16, djpeg -scale 2/1):

The developed 16-point IDCT kernel requires 28 multiplications (8 even + 20 odd).

This gives 28x24/256 (24 = 8 column loops + 16 row loops) = **2.625** mults per output pixel.

This is faster than normal 1:1 decoding relative to the output size!

- Upscaling x1.5 (2:3, 8:12, djpeg -scale 3/2):

The developed 12-point IDCT kernel requires 15 multiplications (2 even + 13 odd).

This gives 15x20/144 (20 = 8 column loops + 12 row loops) = **2.08** mults per output pixel.

This makes the 2:3 upscaling faster than normal 1:1 decoding relative to the output size!
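The per-pixel figures above can be reproduced with a few lines of arithmetic. The following is only a sketch: the function name `mults_per_pixel` and its parameters are hypothetical, and it assumes, as in the list above, that the first pass runs over min(8, N) columns and the second pass over N rows for an N-point kernel.

```python
def mults_per_pixel(kernel_mults: int, n_out: int, n_in: int = 8) -> float:
    """Multiplications per output pixel for an n_in -> n_out scaled IDCT.

    kernel_mults: multiplications of one 1-D n_out-point kernel call.
    The column pass needs min(n_in, n_out) kernel calls, the row pass n_out.
    """
    loops = min(n_in, n_out) + n_out
    return kernel_mults * loops / (n_out * n_out)


# (kernel multiplications, output size) as quoted in the list above
cases = {
    "1/1": (12, 8),
    "1/2": (3, 4),
    "2/1": (28, 16),
    "3/2": (15, 12),
}

for scale, (mults, size) in cases.items():
    print(f"djpeg -scale {scale}: {mults_per_pixel(mults, size):.3f} mults per output pixel")
```

Normalizing by the output block size is what makes the upscaling cases comparable with normal decoding: the kernels are more expensive, but they produce more pixels per block.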

Note that we only need to calculate the upper 8 of 12 full matrix rows, since we have to apply them only to 8x8 input values (see later in transposed IDCT scheme).

The dotted vertical line in the middle marks the central (anti)symmetry axis, so we can further simplify the computation by calculating only the left half and then mirroring it to the right half, unchanged for the even rows and with inverted sign for the odd rows.

Furthermore we omit the scalar scaling factor because it turns out that the final scaling is always by a factor of 1/8 for any output (I)DCT size.

     col     0    1    2    3    4    5   :    6    7    8    9   10   11
    index                                 :
        /   C6   C6   C6   C6   C6   C6   :   C6   C6   C6   C6   C6   C6  \
        |   C1   C3   C5   C7   C9  C11   : -C11  -C9  -C7  -C5  -C3  -C1  |
        |   C2   C6  C10 -C10  -C6  -C2   :  -C2  -C6 -C10  C10   C6   C2  |
        |   C3   C9  -C9  -C3  -C3  -C9   :   C9   C3   C3   C9  -C9  -C3  |
        |   C4    0  -C4  -C4    0   C4   :   C4    0  -C4  -C4    0   C4  |
        |   C5  -C9  -C1 -C11   C3   C7   :  -C7  -C3  C11   C1   C9  -C5  |
        |   C6  -C6  -C6   C6   C6  -C6   :  -C6   C6   C6  -C6  -C6   C6  |
        |   C7  -C3 -C11   C1  -C9  -C5   :   C5   C9  -C1  C11   C3  -C7  |
        |.................................:................................|
        |                                 :                                |

where Ck = cos(k*pi/24)

Now the IDCT is the transpose of the DCT, hence

     col     0    1    2    3    4    5    6    7
    index
        /   C6   C1   C2   C3   C4   C5   C6   C7  \
        |   C6   C3   C6   C9    0  -C9  -C6  -C3  |
        |   C6   C5  C10  -C9  -C4  -C1  -C6 -C11  |
        |   C6   C7 -C10  -C3  -C4 -C11   C6   C1  |
        |   C6   C9  -C6  -C3    0   C3   C6  -C9  |
        |   C6  C11  -C2  -C9   C4   C7  -C6  -C5  |
        |------------------------------------------|
        |   C6 -C11  -C2   C9   C4  -C7  -C6   C5  |
        |   C6  -C9  -C6   C3    0  -C3   C6   C9  |
        |   C6  -C7 -C10   C3  -C4  C11   C6  -C1  |
        |   C6  -C5  C10   C9  -C4   C1  -C6  C11  |
        |   C6  -C3   C6  -C9    0   C9  -C6   C3  |
        \   C6  -C1   C2  -C3   C4  -C5   C6  -C7  /

With ck = sqrt(2) * Ck and C6 = 1/sqrt(2) we get

     col     0    1    2    3    4    5    6    7
    index
        /    1   c1   c2   c3   c4   c5    1   c7  \
        |    1   c3    1   c9    0  -c9   -1  -c3  |
        |    1   c5  c10  -c9  -c4  -c1   -1 -c11  |
        |    1   c7 -c10  -c3  -c4 -c11    1   c1  |
        |    1   c9   -1  -c3    0   c3    1  -c9  |
        |    1  c11  -c2  -c9   c4   c7   -1  -c5  |
        |------------------------------------------|
        |    1 -c11  -c2   c9   c4  -c7   -1   c5  |
        |    1  -c9   -1   c3    0  -c3    1   c9  |
        |    1  -c7 -c10   c3  -c4  c11    1  -c1  |
        |    1  -c5  c10   c9  -c4   c1   -1  c11  |
        |    1  -c3    1  -c9    0   c9   -1   c3  |
        \    1  -c1   c2  -c3   c4  -c5    1  -c7  /

where

    c1  = 1.402114769
    c2  = 1.366025404
    c3  = 1.306562965
    c4  = 1.224744871
    c5  = 1.121971054
    [c6 = 1 not needed]
    c7  = 0.860918669
    [c8 not needed]
    c9  = 0.541196100
    c10 = 0.366025404
    c11 = 0.184591911
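The constants and the mirror structure of the scaled matrix can be checked numerically. This is a minimal sketch in floating point; the helper names `c` and `entry` are made up for illustration and are not part of any library. It assumes the definitions given above: ck = sqrt(2) * cos(k*pi/24), with column 0 carrying the factor C6 so that its scaled entries are all 1.

```python
import math

N_OUT = 12  # output samples per row/column
N_IN = 8    # input coefficients per row/column


def c(k: int) -> float:
    """ck = sqrt(2) * Ck = sqrt(2) * cos(k*pi/24)."""
    return math.sqrt(2.0) * math.cos(k * math.pi / 24.0)


def entry(n: int, k: int) -> float:
    """Element (row n, column k) of the scaled 12x8 IDCT matrix."""
    if k == 0:
        return 1.0  # sqrt(2) * C6
    return math.sqrt(2.0) * math.cos((2 * n + 1) * k * math.pi / 24.0)


matrix = [[entry(n, k) for k in range(N_IN)] for n in range(N_OUT)]

# Print the constants that the kernel needs.
for k in (1, 2, 3, 4, 5, 7, 9, 10, 11):
    print(f"c{k:<2} = {c(k):.9f}")

# The lower half mirrors the upper half:
# even columns are copied unchanged, odd columns change sign.
for n in range(N_OUT // 2):
    for k in range(N_IN):
        mirrored = (-1) ** k * matrix[n][k]
        assert math.isclose(matrix[N_OUT - 1 - n][k], mirrored, abs_tol=1e-12)
```

Because of this mirror property, only rows 0 to 5 need to be computed separately for the even and the odd columns; the other six rows follow from sums and differences of the two parts.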

In the even part (columns 0, 2, 4, 6) we have only 3 multipliers: c4, c2, and c10. Furthermore, the values above show that c10 = c2 - 1, so the multiplication by c10 can be replaced by a subtraction. This leaves us with just **2** mults in the even part calculation.
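To illustrate the even-column simplification, the sketch below evaluates rows 0 to 5 of the even columns twice: once directly from the matrix entries, and once with only the c4 and c2 products, using c10 = c2 - 1. The function names and the floating-point arithmetic are assumptions made for this example; a production kernel would typically use fixed-point integers.

```python
import math


def c(k: int) -> float:
    """ck = sqrt(2) * cos(k*pi/24)."""
    return math.sqrt(2.0) * math.cos(k * math.pi / 24.0)


C2, C4, C10 = c(2), c(4), c(10)


def even_part_direct(x0, x2, x4, x6):
    """Rows 0..5, even columns only, read straight off the matrix."""
    return [
        x0 + C2 * x2 + C4 * x4 + x6,
        x0 + x2 - x6,
        x0 + C10 * x2 - C4 * x4 - x6,
        x0 - C10 * x2 - C4 * x4 + x6,
        x0 - x2 + x6,
        x0 - C2 * x2 + C4 * x4 - x6,
    ]


def even_part_fast(x0, x2, x4, x6):
    """Same result with two multiplications."""
    t4 = C4 * x4       # multiplication 1
    t2 = C2 * x2       # multiplication 2
    a = x0 + t4
    b = x0 - t4
    d = x2 - x6
    p = t2 + x6
    q = t2 - x2 - x6   # equals c10*x2 - x6 because c10 = c2 - 1
    return [a + p, x0 + d, b + q, b - q, x0 - d, a - p]


sample = (3.0, -1.5, 2.25, 0.5)
for direct, fast in zip(even_part_direct(*sample), even_part_fast(*sample)):
    assert math.isclose(direct, fast, abs_tol=1e-12)
```

Rows 6 to 11 need no extra work for the even columns, since they repeat rows 5 to 0.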

**Odd part optimization (columns 1, 3, 5, 7):**

Rows 1 and 4 form a 'rotation' expression (see fast 4x4 IDCT derivation): over the
full odd columns block (1, 3, 5, 7) they depend only on the differences x1 - x7 and
x3 - x5, combined with the factors c3 and c9.

Column 3 has just 2 multipliers (c3 and c9).
The remaining elements can be reduced to 8 multiplications.

This gives us 3 (rotation) + 2 (column 3) + 8 = **13** mults in the odd part
calculation.
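The rotation step for rows 1 and 4 can be sketched as follows. Reading the odd columns of the scaled matrix, row 1 is c3*(x1 - x7) + c9*(x3 - x5) and row 4 is c9*(x1 - x7) - c3*(x3 - x5); the usual three-multiplication rotation trick then applies. The function names are hypothetical, and the combined constants c3 - c9 and c3 + c9 are assumed to be precomputed, so they do not count as run-time multiplications.

```python
import math


def c(k: int) -> float:
    """ck = sqrt(2) * cos(k*pi/24)."""
    return math.sqrt(2.0) * math.cos(k * math.pi / 24.0)


C3, C9 = c(3), c(9)
C3_MINUS_C9 = C3 - C9   # precomputed constant
C3_PLUS_C9 = C3 + C9    # precomputed constant


def rows_1_4_direct(x1, x3, x5, x7):
    """Odd-column contribution to rows 1 and 4, read off the matrix."""
    row1 = C3 * x1 + C9 * x3 - C9 * x5 - C3 * x7
    row4 = C9 * x1 - C3 * x3 + C3 * x5 - C9 * x7
    return row1, row4


def rows_1_4_rotation(x1, x3, x5, x7):
    """Same result with three multiplications."""
    u = x1 - x7
    v = x3 - x5
    z = C9 * (u + v)              # multiplication 1
    row1 = z + C3_MINUS_C9 * u    # multiplication 2
    row4 = z - C3_PLUS_C9 * v     # multiplication 3
    return row1, row4


sample = (1.0, -2.0, 0.5, 4.0)
for direct, fast in zip(rows_1_4_direct(*sample), rows_1_4_rotation(*sample)):
    assert math.isclose(direct, fast, abs_tol=1e-12)
```

The odd-column contributions to rows 10 and 7 are the negatives of those to rows 1 and 4, so the rotation result is reused there with a sign change.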

Note that the rotation with c3 and c9 is again the same as in the even part of the 8-point LL&M IDCT algorithm.