Block Diagram of MPEG AAC

Design of MPEG-4 AAC Encoder

Authors: Chi-Min Liu, Wen-Chieh Lee, Chung-Han Yang, KangYan Peng, Ting Chiou, Tzu-Wen Chang, Yu-Hua Hsiao, Hen-Wen Hue and Chu-Ting Chien

Outline
- Introduction
- Psychoacoustic Model
- M/S Coding
- Window Switch
- Temporal Noise Shaping
- Experiments & Demonstration
- Conclusion

1. Introduction – NCTU-AAC Encoder
[Block diagram: Audio in, W-Switch, Psychoacoustic Model, Filterbank, TNS, M/S, Bit Allocation, Quantization, VLC, Bit Reservoir, Bit-Stream Packing]

1. Introduction
- Modules: Psychoacoustic Model, M/S Coding, Window Switch, Temporal Noise Shaping.
- Objectives: theoretical frameworks, quality, complexity.

2. Psychoacoustic Model
- Approach: MDCT-based instead of FFT-based.
- New masking models:
  - detection of tonal attack bands;
  - detection of tone-rich signals.

2. Psychoacoustic Model (c.1) – MDCT and FFT
- The MDCT and FFT spectra are similar.
- The MDCT spectrum is chaotic because of the aliasing.
- Using the MDCT gives a consistent spectrum for both the analysis and the encoding process.

2. Psychoacoustic Model (c.2) – DCT Spectrum
- Q-bands (quantization bands) instead of lines or P-bands (partition bands).
- Tone/noise information is based on band flatness instead of frame predictivity:

$$\mathrm{flatness}_b = \frac{GM_b}{AM_b},\qquad GM_b = \Big(\prod_{i=0}^{N-1} x_i\Big)^{1/N},\qquad AM_b = \frac{1}{N}\sum_{i=0}^{N-1} x_i$$

  where x_0, ..., x_{N-1} are the N spectral values in band b.
- For a tone-rich band, flatness_b approaches 0; for a noise-rich band, flatness_b approaches 1.

2. Psychoacoustic Model – Adaptive TMN and NMT Offset
- The tone-masking-noise (TMN) and noise-masking-tone (NMT) offsets exploit human perception: the ear is less sensitive at high frequencies.
- The masking effect at high frequencies is stronger than at low frequencies.
[Figure: offset (0 to 4) versus band index (1 to 49)]

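As a minimal sketch of the band flatness measure (not the authors' implementation), the Python below computes flatness_b = GM_b / AM_b for each band. It assumes the x_i are non-negative spectral values such as squared MDCT coefficients; the small constant `eps` and the band table `band_edges` are hypothetical additions, and the geometric mean is taken in the log domain so the product cannot overflow.

```python
import math
from typing import List, Sequence


def band_flatness(x: Sequence[float], eps: float = 1e-12) -> float:
    """Return GM_b / AM_b for the spectral values x_0..x_{N-1} of one band.

    Assumes non-negative inputs (e.g. squared MDCT coefficients). `eps` is a
    hypothetical guard added to every value so log(0) never occurs; since it
    shifts both means equally, AM-GM still keeps the ratio <= 1 (up to rounding).
    """
    n = len(x)
    if n == 0:
        raise ValueError("a band needs at least one spectral value")
    # Geometric mean via the mean of logarithms (avoids overflow of the product).
    gm = math.exp(sum(math.log(v + eps) for v in x) / n)
    am = sum(x) / n + eps
    return gm / am


def flatness_per_band(spectrum: Sequence[float], band_edges: Sequence[int]) -> List[float]:
    """Flatness of every band; band b covers spectrum[band_edges[b]:band_edges[b+1]].

    `band_edges` is a hypothetical band table, not the one used by NCTU-AAC.
    """
    return [band_flatness(spectrum[band_edges[b]:band_edges[b + 1]])
            for b in range(len(band_edges) - 1)]


# One band with a single strong line (tone-like), one even band (noise-like).
print(flatness_per_band([100.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0], [0, 4, 8]))
```

The first band comes out close to 0 and the second close to 1, matching the tone-rich and noise-rich cases on the slide; the threshold used to label a band as tonal is not given in the deck.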
2. Psychoacoustic Model – Tone/Harmonic
- Problems: tonal attacks and tone-rich signals.
- Solution: masking adjustment; disabling the window switch.
[Figure: original spectrum versus reconstructed spectrum]

2. Psychoacoustic Model – Concluding Remarks
- New models:
  - filterbank instead of FFT;
  - SFM (spectral flatness measure) instead of unpredictability;
  - detection of tonal attack bands;
  - detection of tone-rich signals;
  - noise masking effect alone.
- Results:
  - speed-up of 70% for AAC and 65% for MP3;
  - quality improvement of 0.2 for AAC and 0.1 for MP3.

3. M/S Coding

3. M/S Coding – Issues & Approach
- Band-level switching decision: Viterbi algorithm, reducing the search from O(2^49) to O(49).
- M/S psychoacoustic model: conservative masking threshold.
- Bits allocated to the M/S channels: allocation entropy.
- Joint design with the window switch: coupling.

3. M/S Coding – Viterbi Algorithm
- Finds the optimal solution over the 49 scale factor bands.
- S_LR(i) and S_MS(i) are the optimal accumulated costs found at the i-th band.
- α_{LR,LR}, α_{LR,MS}, α_{MS,LR} and α_{MS,MS} are the transition costs.
[Figure: trellis over scale factor bands 0 to 48 with states LR and MS, node costs n_LR(i) and n_MS(i), accumulated costs S_LR(i) and S_MS(i), and the four transition costs]

3. M/S Coding – Frame-Level Switching
- Compare the allocation entropy (AE) of M/S and L/R coding; C1 is a constant factor.
- If AE_MS < C1 × AE_LR, use an M/S frame; otherwise use an L/R frame.

3. M/S Coding – M/S Psychoacoustic Model
Noise of the reconstructed signal (band i, spectral line k):

$$L'_i[k] = M'_i[k] + S'_i[k],\qquad R'_i[k] = M'_i[k] - S'_i[k]$$

$$L'_i[k] = L_i[k] + N_{L_i}[k] = M_i[k] + S_i[k] + N_{M_i}[k] + N_{S_i}[k]$$

$$R'_i[k] = R_i[k] + N_{R_i}[k] = M_i[k] - S_i[k] + N_{M_i}[k] - N_{S_i}[k]$$

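The band-level decision on the Viterbi slide can be illustrated with a generic two-state Viterbi search, using the quantities the slide names: node costs n_LR(i) and n_MS(i), accumulated costs S_LR(i) and S_MS(i), and transition costs α. This is a sketch under assumptions: the slides do not say what the node costs measure (bits, allocation entropy or something else), so they are plain inputs here, and the update S_X(i) = n_X(i) + min over Y of (S_Y(i-1) + α_{Y,X}) is the standard Viterbi recurrence rather than a formula taken from the deck. The frame-level rule is added as a one-line helper, with C1 left as a parameter because its value is not given.

```python
from typing import List, Sequence, Tuple

LR, MS = 0, 1  # trellis states: band coded as L/R or as M/S


def viterbi_band_decision(
    n_lr: Sequence[float],
    n_ms: Sequence[float],
    alpha: Sequence[Sequence[float]],
) -> Tuple[List[int], float]:
    """Return (state per band, minimum total cost).

    n_lr[i], n_ms[i]: node cost of coding band i as L/R or M/S.
    alpha[p][q]: cost of moving from state p at band i-1 to state q at band i.
    """
    if not n_lr or len(n_lr) != len(n_ms):
        raise ValueError("need equal, non-empty node-cost lists")
    node = (n_lr, n_ms)
    acc = [n_lr[0], n_ms[0]]       # S_LR(0), S_MS(0)
    back: List[List[int]] = []     # back[i-1][q]: best predecessor of state q at band i
    for i in range(1, len(n_lr)):
        new_acc = [0.0, 0.0]
        pred = [LR, LR]
        for q in (LR, MS):
            from_lr = acc[LR] + alpha[LR][q]
            from_ms = acc[MS] + alpha[MS][q]
            pred[q] = LR if from_lr <= from_ms else MS
            new_acc[q] = min(from_lr, from_ms) + node[q][i]
        back.append(pred)
        acc = new_acc
    # Trace back from the cheaper end state.
    state = LR if acc[LR] <= acc[MS] else MS
    total = acc[state]
    path = [state]
    for pred in reversed(back):
        state = pred[state]
        path.append(state)
    path.reverse()
    return path, total


def use_ms_frame(ae_ms: float, ae_lr: float, c1: float) -> bool:
    """Frame-level rule from the slides: use an M/S frame if AE_MS < C1 * AE_LR."""
    return ae_ms < c1 * ae_lr


no_switch_penalty = [[0.0, 0.0], [0.0, 0.0]]
print(viterbi_band_decision([1, 5, 1], [5, 1, 5], no_switch_penalty))
```

Each band touches only two states and four transitions, so the work grows linearly with the number of bands instead of the 2^49 combinations of an exhaustive search; raising the switching costs α_{LR,MS} and α_{MS,LR} makes the path favor fewer L/R to M/S changes.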
3. M/S Coding – M/S Psychoacoustic Model (cont.)
Variance of the noise. Assuming the M and S quantization noises are uncorrelated, their variances add in both output channels, and each sum must stay below that channel's masking threshold:

$$\sigma^2_{N_{L_i}} = \sigma^2_{N_{M_i}} + \sigma^2_{N_{S_i}} \le T_{L_i},\qquad \sigma^2_{N_{R_i}} = \sigma^2_{N_{M_i}} + \sigma^2_{N_{S_i}} \le T_{R_i}$$

Both conditions hold if

$$\sigma^2_{N_{M_i}} \le 0.5\,\min(T_{L_i}, T_{R_i}),\qquad \sigma^2_{N_{S_i}} \le 0.5\,\min(T_{L_i}, T_{R_i})$$

where T_X is the masking threshold of channel X and σ²_X is the variance of channel X. The thresholds of the M/S channels are therefore

$$T_{M_i} = 0.5\,\min(T_{L_i}, T_{R_i}),\qquad T_{S_i} = 0.5\,\min(T_{L_i}, T_{R_i})$$

3. M/S Coding – Allocation Entropy

$$SMR^{\mathrm{Channel}}_i = \begin{cases} \dfrac{E_i}{T_i B_i}, & E_i > T_i B_i \\ 0, & E_i \le T_i B_i \end{cases}\qquad AE^{\mathrm{Channel}}_i = W_i \log\big(SMR^{\mathrm{Channel}}_i + 1\big)$$

- E_i is the energy of the i-th quantization band.
- T_i is the masking threshold of the i-th quantization band.
- B_i is the effective bandwidth of the i-th quantization band.
- W_i is the bandwidth of the i-th quantization band.

3. M/S Coding – Available Bits in the M/S Channels
For each band i = 0, ..., 48:
- if band i is an L/R band: AE_M = AE_M + L_AE[i] and AE_S = AE_S + R_AE[i];
- otherwise: AE_M = AE_M + M_AE[i] and AE_S = AE_S + S_AE[i].

Then the frame's bits are split in proportion to the accumulated allocation entropies:

$$Bit_M = B\,\frac{AE_M}{AE_M + AE_S},\qquad Bit_S = B\,\frac{AE_S}{AE_M + AE_S}$$

where B is the number of bits allocated to the current frame.

4. Window Switch

4. Window Switch – Design Issues
- Window decision
- Psychoacoustic model
- Window grouping
- Joint design with other AAC modules

4. Window Switch – Window Decision
- Global energy ratio
- Zero-crossing ratio
- Tonal attack

4. Window Switch – Psychoacoustic Model
- The models are based on the long window.
- The SMRs for the short windows are calculated from the SMRs for the long window.
[Figure: band SMRs for the long window mapped to band SMRs for the short windows]

4. Window Switch – Window Grouping
- Scale factors: the bit allocation module calculates the scale factor of each band.
- Error of the scale factors for group g:

$$E_g = \sum_{w \in g} \sum_{b} \big|sf_{b,w} - sharedsf_{g,b}\big| \cdot bandwidth_b$$

- Criterion: minimize the number of groups, while E_g in each group stays smaller than a threshold M.

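The grouping criterion can be sketched as a greedy pass over consecutive short windows (AAC only groups adjacent short windows). Several details are assumptions rather than content of the slides: the error uses absolute differences, the shared scale factor of each band in a group is the lower median of that band's scale factors (which minimizes the absolute-error sum), and `sf[w][b]`, `bandwidth` and `max_error` (the threshold M) are hypothetical inputs. Under these assumptions E_g can only grow when a window is added to a group, so extending each group as far as the threshold allows yields the minimum number of groups.

```python
from typing import List, Sequence


def group_error(sf: Sequence[Sequence[int]], start: int, end: int,
                bandwidth: Sequence[int]) -> int:
    """E_g for the group made of short windows start..end-1.

    sf[w][b] is the scale factor of band b in short window w. The shared
    scale factor of band b is assumed to be the lower median over the group,
    which minimises the sum of absolute differences.
    """
    error = 0
    for b, width in enumerate(bandwidth):
        values = sorted(sf[w][b] for w in range(start, end))
        shared = values[(len(values) - 1) // 2]
        error += sum(abs(v - shared) for v in values) * width
    return error


def group_windows(sf: Sequence[Sequence[int]], bandwidth: Sequence[int],
                  max_error: float) -> List[int]:
    """Greedily split consecutive short windows into groups with E_g < max_error.

    Returns the length of each group. A single window has E_g = 0 and is
    always accepted as a group.
    """
    lengths: List[int] = []
    start = 0
    while start < len(sf):
        end = start + 1
        # Grow the group while adding the next window keeps E_g below M.
        while end < len(sf) and group_error(sf, start, end + 1, bandwidth) < max_error:
            end += 1
        lengths.append(end - start)
        start = end
    return lengths


# Four short windows, two bands of widths 4 and 8 (hypothetical values).
scale_factors = [[10, 20], [10, 20], [30, 20], [30, 21]]
print(group_windows(scale_factors, [4, 8], max_error=10))  # [2, 2]
```

Lowering `max_error` to 5 in the same example splits the second pair, giving groups of lengths [2, 1, 1]; the actual threshold and shared-scale-factor rule in NCTU-AAC are not given in the slides.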
5. Temporal Noise Shaping

5. TNS – Design Issues
- Three artifacts:
  - error amplification at attack periods;
  - time aliasing;
  - TNS order versus error.
- Design: detection mechanism; TNS design.

5. TNS – Remarks
- Pre-aliasing leads to a trade-off with pre-echo.
- Post-aliasing may be masked by post-masking.

5. TNS – Easing Aliasing Artifacts
- Combine TNS with the window switch, using the Long Start and Long Stop windows.

6. Experiments
- Psychoacoustic model
- M/S coding
- Window switch
- TNS
- Overall

6. Experiments – Test Samples

Track  Item  Time (s)  Signal description          Category
1      es01  10        Vocal (Suzanne Vega)        Speech signals
2      es02  8         German speech               Speech signals
3      es03  7         English speech              Speech signals
4      sc01  10        Trumpet solo and orchestra  Complex sound mixtures
5      sc02  12        Orchestral piece            Complex sound mixtures
6      sc03  11        Contemporary pop music      Complex sound mixtures
7      si01  7         Harpsichord                 Single instruments
8      si02  7         Castanets                   Single instruments
9      si03  27        Pitch pipe                  Single instruments
10     sm01  11        Bagpipes                    Simple sound mixtures
11     sm02  10        Glockenspiel                Simple sound mixtures
12     sm03  13        Plucked strings             Simple sound mixtures

6. Experiments – Psychoacoustic Model (Intel VTune 7.0)
- Psychoacoustic models: P1 = Psychoacoustic Model II; P4 = MDCT psychoacoustic model.
- P4 is 72.58% faster than P1, computed as (P1 - P4) / P1 on the averages.

Case     P1     P4
1        30.24  8.57
2        29.66  8.94
3        29.75  8.00
4        29.96  7.31
5        27.75  7.59
Average  29.47  8.08

6. Experiments – Psychoacoustic Model
- P4 is 14.59% faster than P1, computed from the column totals.

Track    Length  P1    P4    Speed-up (%)
es01     02:51   26    19    26.92
es02     02:17   19    14    26.32
es03     04:03   36    27    25.00
sc01     02:55   22    18    18.18
sc02     03:23   28    23    17.86
sc03     03:04   27    23    14.81
si01     04:47   39    36    7.69
si02     03:05   30    26    13.33
si03     05:34   49    45    8.16
sm01     04:27   38    35    7.89
sm02     02:01   18    16    11.11
sm03     04:11   38    34    10.53
Average          30.8  26.3  14.59

6. Experiments – Psychoacoustic Model: Category Result
- P4 gives better quality than P1 for speech signals, single instruments and simple sound mixtures.
- For complex sound mixtures, only sc02 is worse than with P1.
[Figure: ODG of P1 and P4 for tracks es01 to sm03, vertical axis 0 to -4]

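The speed-up percentages in the psychoacoustic-model tables are consistent with (P1 - P4) / P1 × 100. The short check below, using the per-track values from the second table, reproduces the per-track column and shows that the 14.59% figure corresponds to the ratio of the column totals; the mean of the per-track percentages would be about 15.65%. The formula is inferred from the numbers, not stated on the slides.

```python
# Per-track values from the second psychoacoustic-model table
# (P1 = Psychoacoustic Model II, P4 = MDCT psychoacoustic model).
P1 = {"es01": 26, "es02": 19, "es03": 36, "sc01": 22, "sc02": 28, "sc03": 27,
      "si01": 39, "si02": 30, "si03": 49, "sm01": 38, "sm02": 18, "sm03": 38}
P4 = {"es01": 19, "es02": 14, "es03": 27, "sc01": 18, "sc02": 23, "sc03": 23,
      "si01": 36, "si02": 26, "si03": 45, "sm01": 35, "sm02": 16, "sm03": 34}


def speedup_percent(before: float, after: float) -> float:
    """Relative reduction (before - after) / before, in percent."""
    return 100.0 * (before - after) / before


per_track = {t: round(speedup_percent(P1[t], P4[t]), 2) for t in P1}
overall = round(speedup_percent(sum(P1.values()), sum(P4.values())), 2)
vtune = round(speedup_percent(29.47, 8.08), 2)  # averages from the VTune table

print(per_track["es01"], overall, vtune)  # 26.92 14.59 72.58
```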
6. Experiments – M/S Coding
- Environment: bit reservoir, window switch and TNS disabled; psychoacoustic model P4.
- The new M/S coding improves the average ODG (Objective Difference Grade) by 0.39.

Track    L/R      New M/S
es01     -1.57    -0.82
es02     -2.03    -0.55
es03     -2.21    -0.84
sc01     -0.74    -0.54
sc02     -1.11    -0.83
sc03     -0.70    -0.52
si01     -1.16    -1.05
si02     -3.24    -3.01
si03     -1.29    -1.21
sm01     -0.90    -0.93
sm02     -1.54    -1.40
sm03     -1.37    -1.50
Average  -1.4883  -1.10

6. Experiments – Window Switch
- Coupling method: the average ODGs with and without the coupling method are -0.7025 and -0.8483, respectively.
- Conditions: bit rate 128 kbps, sample rate 44.1 kHz, with short windows and M/S.
[Figure: ODG per track (es01 to sm03) and average for NCTU_AAC with and without the coupling method]

6. Experiments – TNS
- Easing-aliasing method: improves quality on every track except sm01, especially on si02.

6. Experiments – Overall (Commercial Encoders)

Track    Nero 6.3  QuickTime 6.3  NCTU-AAC
es01     -0.60     -0.32          -0.27
es02     -0.45     -0.11          -0.15
es03     -0.51      0.02          -0.23
sc01     -0.88     -0.22          -0.45
sc02     -1.38     -0.84          -0.66
sc03     -0.84     -0.64          -0.40
si01     -1.32     -0.71          -0.62
si02     -0.82     -0.72          -0.54
si03     -1.59     -0.78          -0.98
sm01     -1.36     -0.75          -0.61
sm02     -0.72     -0.37          -0.53
sm03     -1.29     -0.73          -0.62
Average  -0.98     -0.51417       -0.505

Result
- NCTU-AAC has better quality than Nero 6.3 on all tracks.
- NCTU-AAC has better quality than QuickTime 6.3 on 7 tracks.
- On average, NCTU-AAC performs better than both encoders.

6. Experiments – Encoders with Audio Patch Method (APM)

Track    Nero 6.3  Nero 6.3+APM  QuickTime 6.3  QT 6.3+APM  NCTU-AAC  NCTU-AAC+APM
es01     -0.60     -0.38         -0.32          -0.26       -0.27     -0.28
es02     -0.45     -0.44         -0.11          -0.18       -0.15     -0.14
es03     -0.51     -0.43          0.02          -0.02       -0.23     -0.24
sc01     -0.88     -0.73         -0.22          -0.21       -0.45     -0.43
sc02     -1.38     -0.70         -0.84          -0.43       -0.66     -0.51
sc03     -0.84     -0.40         -0.64          -0.32       -0.40     -0.37
si01     -1.32     -0.52         -0.71          -0.47       -0.62     -0.43
si02     -0.82     -0.63         -0.72          -0.55       -0.54     -0.53
si03     -1.59     -0.64         -0.78          -0.43       -0.98     -0.51
sm01     -1.36     -0.83         -0.75          -0.53       -0.61     -0.46
sm02     -0.72     -0.73         -0.37          -0.38       -0.53     -0.54
sm03     -1.29     -0.55         -0.73          -0.35       -0.62     -0.42
Average  -0.98     -0.5817       -0.51417       -0.34417    -0.505    -0.4050

- QuickTime 6.3 with APM gets the best average quality.

Conclusion – Quality and Efficiency
- Efficient psychoacoustic model: DCT-based approach; tonal attack bands and tone-rich signals.
- M/S coding: efficient decision method; psychoacoustic model for the M/S channels; Viterbi algorithm.
- Window switch: switch detection; new grouping method; psychoacoustic model for short windows.

Conclusion (cont.)
- TNS: window detection; new window switch policy.
- Bit allocation: single-loop approach; two-step approach.
- Bit reservoir.
- Filter bank: fast DCT method; zero band and high-frequency extension.
- Audio Patch Method.

5. NCTU-AAC CODEC (Patents)
[Block diagram: Audio in, W-Switch, Psychoacoustic Model, Filterbank, TNS, M/S, Bit Allocation, Quantization, VLC, Bit Reservoir, Bit-Stream Packing, plus PatchEnable, Decoder and Effect blocks]

Demonstration – SC03
Listening samples: Original, QT 6.3, Nero 6.3, Lame 3.88, NCTU-AAC, NCTU-MP3, and each encoder with APM.

Encoder          ODG
QT 6.3           -0.64
Nero 6.3         -0.84
Lame 3.88        -1.16
NCTU-AAC         -0.40
NCTU-MP3         -0.91
QT 6.3+APM       -0.32
Nero 6.3+APM     -0.40
NCTU-AAC+APM     -0.37
NCTU-MP3+APM     -0.38
Lame 3.88+APM    -0.41

Questions
