MP3 Compression Techniques

The Moving Picture Experts Group (MPEG) developed the compression techniques and the acoustic model used to compress audio data. The core of the technique applies psychoacoustic masking principles to remove sound that the human ear cannot normally hear, thereby reducing the size of the data file.

The optimum acoustic curve, profiled by testing the hearing of a diverse group of people, provided the basis for deciding which parts of the audio spectrum to remove. The acoustic model has three parts, described below.

Threshold and Sensitivity

Human hearing is most sensitive between 2 kHz and 4 kHz. The image above shows the threshold level of sensitivity.
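
The shape of the threshold curve can be sketched numerically. The snippet below uses Terhardt's widely quoted approximation of the threshold of hearing in quiet. This formula is an assumption brought in for illustration; the text does not say which curve the MPEG model uses. The function name `threshold_in_quiet_db` is hypothetical.

```python
import math

def threshold_in_quiet_db(freq_hz):
    """Approximate threshold of hearing in dB SPL.

    Uses Terhardt's approximation (an assumption for illustration,
    not necessarily the curve used by any particular MP3 encoder).
    """
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)

# Print the threshold at a few frequencies; lower values mean
# the ear is more sensitive at that frequency.
for f in (100, 500, 1000, 3000, 4000, 10000):
    print(f"{f:>6} Hz: {threshold_in_quiet_db(f):6.1f} dB SPL")
```

With this approximation the curve dips to its lowest point a little above 3 kHz, inside the 2 kHz to 4 kHz region, and rises steeply toward both ends of the spectrum.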

Masking

The second part deals with the masking effect. When a loud sound is present, the weaker sounds are masked.

A loud 1 kHz sound, represented by the black curve, masks the less sensitive parts of the hearing curve, shown in red.

Hearing is less sensitive from 4.1 kHz onwards, and therefore that area is masked.
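
The masking effect of a loud tone can be illustrated with a rough sketch. The code below assumes a simple triangular masking curve on the Bark (critical-band) scale, using Zwicker's approximation for the frequency conversion. The 10 dB offset and the slopes of 27 dB and 10 dB per Bark are illustrative values chosen for this example, not figures from the MPEG model, and all function names are hypothetical.

```python
import math

def hz_to_bark(freq_hz):
    """Zwicker's approximation of the Bark critical-band scale."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))

def masking_threshold_db(freq_hz, masker_hz, masker_db,
                         offset_db=10.0, lower_slope=27.0, upper_slope=10.0):
    """Level below which a tone at freq_hz is assumed to be hidden.

    The threshold peaks just under the masker level and falls off
    linearly (in dB per Bark) on either side. The offset and slopes
    are illustrative assumptions.
    """
    dz = hz_to_bark(freq_hz) - hz_to_bark(masker_hz)
    slope = upper_slope if dz >= 0 else lower_slope
    return masker_db - offset_db - slope * abs(dz)

def is_masked(freq_hz, level_db, masker_hz=1000.0, masker_db=80.0):
    """True if a tone is quieter than the masking threshold at its frequency."""
    return level_db < masking_threshold_db(freq_hz, masker_hz, masker_db)

# A quiet tone close to the loud 1 kHz masker is hidden; a loud one is not.
print(is_masked(1100, 40.0))
print(is_masked(1100, 75.0))
```

In this sketch the masking threshold falls off more slowly above the masker than below it, which reflects the usual observation that a loud tone masks higher frequencies more readily than lower ones.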

Temporal Masking

The third part deals with temporal masking. When a loud tone of a certain frequency occurs, sounds at adjacent frequencies are masked, and they remain masked for a short time even after the loud tone has stopped.
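
As a rough illustration of masking that persists after the loud tone ends, the sketch below lets the masked threshold decay linearly in dB over a fixed window. The linear shape, the 100 ms window, and the function name `post_masking_threshold_db` are all assumptions made for this example; the text does not give the decay used by the MPEG model.

```python
def post_masking_threshold_db(masker_db, ms_since_masker_ended,
                              decay_ms=100.0, floor_db=0.0):
    """Masked threshold some time after a loud tone has stopped.

    Assumes a linear decay (in dB) from the masker level down to
    floor_db over decay_ms milliseconds. Both the shape and the
    duration are illustrative assumptions.
    """
    if ms_since_masker_ended < 0:
        raise ValueError("time since the masker ended must be non-negative")
    if ms_since_masker_ended >= decay_ms:
        return floor_db
    remaining = 1.0 - ms_since_masker_ended / decay_ms
    return floor_db + (masker_db - floor_db) * remaining

# The threshold shrinks as time passes after an 80 dB tone stops.
for ms in (0, 25, 50, 100, 150):
    print(ms, "ms:", post_masking_threshold_db(80.0, ms), "dB")
```

A sound quieter than the returned threshold at that moment would be treated as inaudible in this simplified picture.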

Principle of MP3

MPEG divided the audio spectrum into 32 frequency bands and assigned each band a threshold level determined by the acoustic curve. Any sound falling below the threshold level of its band can be removed, which reduces the size of the data file.

The encoder analyses the incoming sound and creates a filter bank. The acoustic model is typically stored in the perceptual block. Comparing the data in the filter bank with that in the perceptual block enables the algorithm to remove sound based on the model described above.
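
The comparison step can be sketched in a few lines. The code below assumes 32 equal-width bands between 0 Hz and half the sample rate, and takes the per-band signal levels and thresholds as plain lists of dB values. The names `band_edges` and `keep_audible_bands`, the 44.1 kHz sample rate, and the use of `None` to mark a removed band are assumptions for illustration, not the encoder's actual data structures.

```python
NUM_BANDS = 32

def band_edges(sample_rate_hz=44100, num_bands=NUM_BANDS):
    """Return (low, high) frequency edges for equal-width bands.

    Equal-width bands up to half the sample rate are an assumption
    made for this sketch.
    """
    width = (sample_rate_hz / 2) / num_bands
    return [(i * width, (i + 1) * width) for i in range(num_bands)]

def keep_audible_bands(band_levels_db, band_thresholds_db):
    """Compare each band's level with its threshold.

    Bands whose level is below the threshold are replaced by None,
    meaning no data needs to be stored for them.
    """
    if len(band_levels_db) != len(band_thresholds_db):
        raise ValueError("levels and thresholds must have the same length")
    return [level if level >= threshold else None
            for level, threshold in zip(band_levels_db, band_thresholds_db)]

# Hypothetical input: a flat 20 dB threshold and a signal that is
# loud in the first four bands only.
levels = [60.0] * 4 + [5.0] * (NUM_BANDS - 4)
thresholds = [20.0] * NUM_BANDS
kept = keep_audible_bands(levels, thresholds)
print(sum(1 for k in kept if k is not None), "of", NUM_BANDS, "bands kept")
```

In a real encoder the thresholds would come from the acoustic model rather than a flat list, and would change from one moment to the next as the masking conditions change.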