Perceptual coding

From Wikipedia, the free encyclopedia
imported>Rcooley~enwiki
Entirely worthless article. Redirect to Psychoacoustics as agreed on talk:Perceptual_coding.
 
imported>Gabriel Bouvigne
links in "see also" section
 
{{Multiple issues|
{{refimprove|date=August 2025}}
{{Sources exist|date=August 2025}}
}}
'''Perceptual coding''' is a method of [[Lossy compression|lossy]] [[data compression]] that exploits the limitations of the [[Sensory system|human sensory system]] to reduce data size. It is widely applied in audio, image, and video compression standards, where details are removed or simplified because they are unlikely to be noticed under normal listening or viewing conditions.
 
Perceptual coding is based on models of human hearing and vision, such as those studied in [[psychoacoustics]] and [[psychovisual]] research. By discarding signal components that fall below perceptual thresholds, or representing them with less precision, it achieves substantial reductions in [[bit rate]] while keeping subjective quality acceptable.
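The following Python sketch illustrates how components below a perceptual threshold can be discarded. It uses Terhardt's widely cited approximation of the absolute threshold of hearing and drops any spectral component whose level falls below it. This is a simplified illustration under stated assumptions, not the procedure of any particular codec. The function names are hypothetical, and the input is assumed to be a list of (frequency, level) pairs already calibrated to dB SPL. Real encoders such as MP3 or AAC also compute signal-dependent masking thresholds, and they usually quantize components coarsely rather than simply deleting them.

```python
import math


def absolute_threshold_db(freq_hz: float) -> float:
    """Terhardt's approximation of the absolute threshold of hearing (dB SPL).

    Only defined for positive frequencies; the formula works in kHz.
    """
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    f = freq_hz / 1000.0
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


def discard_inaudible(components):
    """Keep only (frequency_hz, level_db_spl) pairs above the hearing threshold.

    Assumption: levels are already calibrated to dB SPL. A real encoder does
    not know the playback volume and has to assume one.
    """
    return [(freq, level) for freq, level in components
            if level > absolute_threshold_db(freq)]


# Hypothetical spectral components: (frequency in Hz, level in dB SPL)
components = [(50.0, 30.0), (1000.0, 10.0), (4000.0, -2.0), (16000.0, 40.0)]
print(discard_inaudible(components))  # [(1000.0, 10.0), (4000.0, -2.0)]
```

With these inputs, the 50 Hz and 16 kHz components are dropped. The threshold at those frequencies is roughly 40 dB and 66 dB SPL, respectively. In contrast, the threshold near 4 kHz is slightly below 0 dB SPL, which is why even the −2 dB component there is kept.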
 
== Applications ==
Perceptual coding is central to many everyday technologies, including:
 
* Audio compression: Formats such as [[MP3]], [[Advanced Audio Coding|AAC]], and [[Opus (audio format)|Opus]] apply psychoacoustic models to discard, or encode with less precision, frequency components that are inaudible or masked by louder sounds.
* Image compression: Standards such as [[JPEG]] rely on psychovisual principles, including chroma subsampling (storing color at lower resolution than brightness) and coarser quantization of fine, high-frequency detail.
* Video compression: Standards such as [[MPEG-2]], [[H.264/AVC]], and [[HEVC]] reuse image-compression techniques, and their encoders can exploit additional effects such as temporal masking (reducing detail during rapid motion).
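Chroma subsampling, listed among the image-compression principles, can be sketched in Python. This example assumes that the image has already been split into a luminance plane and two chrominance planes (as in Y′CbCr). It shows only 4:2:0-style subsampling of one chrominance plane, done by averaging each 2×2 block, while the luminance plane stays at full resolution. The function names are hypothetical. The sketch also ignores details that real codecs specify, such as odd image dimensions and the exact positioning of chroma samples.

```python
def subsample_420(chroma):
    """Halve a chroma plane in both directions by averaging 2x2 blocks.

    `chroma` is a list of rows; width and height are assumed to be even.
    """
    height, width = len(chroma), len(chroma[0])
    out = []
    for y in range(0, height, 2):
        row = []
        for x in range(0, width, 2):
            block_sum = (chroma[y][x] + chroma[y][x + 1]
                         + chroma[y + 1][x] + chroma[y + 1][x + 1])
            row.append(block_sum / 4)
        out.append(row)
    return out


def upsample_nearest(plane):
    """Decoder side: restore full size by repeating each sample in a 2x2 block."""
    out = []
    for row in plane:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


# Hypothetical 2x4 Cb plane; the luma plane would stay at full resolution.
cb = [[100, 102, 50, 52],
      [101, 103, 51, 53]]
small = subsample_420(cb)
print(small)                    # [[101.5, 51.5]]
print(upsample_nearest(small))  # [[101.5, 101.5, 51.5, 51.5], [101.5, 101.5, 51.5, 51.5]]
```

Each chrominance plane shrinks to a quarter of its original sample count. An image stored this way therefore needs 1 + 2 × 1/4 = 1.5 samples per pixel instead of 3, which halves the raw data before any further compression.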
 
== History ==
 
=== Early analog applications ===
The principles of perceptual coding were applied in analog communication systems long before the advent of digital media.
 
* [[Telephony]]: Early telephone networks restricted audio transmission to a narrow band of roughly 300 Hz–3.4 kHz. Although much of the audible spectrum was discarded, this range was sufficient for intelligible speech, exploiting the fact that human listeners rely primarily on mid-range frequencies for communication.<ref>{{Cite journal |last=Mathialagan |first=A. |date=1984-10-01 |title=Automatic Telephony: Components |url=https://www.tandfonline.com/doi/abs/10.1080/09747338.1984.11436022 |journal=IETE Journal of Education |volume=25 |issue=4 |pages=119–122 |language=EN |doi=10.1080/09747338.1984.11436022 |issn=0974-7338|url-access=subscription }}</ref>
* [[FM stereo|FM stereo broadcasting]]: Introduced in the 1960s, FM stereo used a [[Joint encoding|sum-and-difference]] transmission system. The (L+R) sum signal carried the main content at full fidelity, while the (L−R) difference signal was modulated onto a subcarrier. Mono receivers could simply use the sum signal, and the scheme exploited the fact that much of the information in the two stereo channels is shared, while still providing an adequate sense of spatial separation.<ref>{{Cite journal |last=Sterling |first=Christopher H. |date=1971-06-01 |title=Decade of Development: FM Radio in the 1960s |url=https://doi.org/10.1177/107769907104800204 |journal=Journalism Quarterly |language=EN |volume=48 |issue=2 |pages=222–230 |doi=10.1177/107769907104800204 |issn=0022-5533|url-access=subscription }}</ref>
* [[Color television]]: Beginning in the 1950s, color TV systems such as [[NTSC]], [[PAL]], and [[SECAM]] took advantage of the human eye’s greater sensitivity to brightness than to color. They encoded a high-resolution [[Luminance (video)|luminance]] channel alongside lower-resolution [[Chroma subsampling|chrominance]] channels, allowing backward compatibility with black-and-white sets and conserving broadcast bandwidth.<ref>{{Cite journal |last1=Sterne |first1=Jonathan |last2=Mulvin |first2=Dylan |date=2014-08-01 |title=The Low Acuity for Blue: Perceptual Technics and American Color Television |url=https://doi.org/10.1177/1470412914529110 |journal=Journal of Visual Culture |language=EN |volume=13 |issue=2 |pages=118–138 |doi=10.1177/1470412914529110 |issn=1470-4129}}</ref>
* [[Fax]] transmission: Facsimile (fax) machines, particularly those following the CCITT (now ITU-T) Group 3 standard (1980), employed compression methods such as [[run-length encoding]] to reduce data. The standard Group 3 resolution (about 204 × 98 dpi, often rounded to 200 × 100 dpi) was chosen to preserve the legibility of text while omitting fine details such as paper texture or ink gradients. This relied on the psychovisual observation that readers perceive documents as intact even when such subtleties are lost.<ref>{{Cite book |last=McCullough |first=T. L. |chapter=CCITT standardization for digital facsimile |date=1980-05-19 |title=Proceedings of the May 19-22, 1980, national computer conference on - AFIPS '80 |location=New York, NY, USA |publisher=Association for Computing Machinery |pages=409–413 |doi=10.1145/1500518.1500582 |isbn=978-1-4503-7923-6|doi-access=free }}</ref><ref>{{Cite book |last=Mitchell |first=Joan L. |chapter=Facsimile image coding |date=1980-05-19 |title=Proceedings of the May 19-22, 1980, national computer conference on - AFIPS '80 |chapter-url=https://dl.acm.org/doi/10.1145/1500518.1500584 |location=New York, NY, USA |publisher=Association for Computing Machinery |pages=423–426 |doi=10.1145/1500518.1500584 |isbn=978-1-4503-7923-6|chapter-url-access=subscription }}</ref>
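The sum-and-difference matrixing used in FM stereo can be sketched at the baseband level in Python. The sketch covers only the arithmetic. It omits the 19 kHz pilot tone and the 38 kHz subcarrier modulation that a real FM stereo transmitter applies to the difference signal, and the function names are hypothetical. A mono receiver would use only the `mono` (L+R) list, while a stereo receiver recovers the two channels as (M+S)/2 and (M−S)/2.

```python
def matrix_encode(left, right):
    """Form the sum (L+R) and difference (L-R) signals sample by sample."""
    mono = [lft + rgt for lft, rgt in zip(left, right)]
    side = [lft - rgt for lft, rgt in zip(left, right)]
    return mono, side


def matrix_decode(mono, side):
    """Recover left and right channels as (M+S)/2 and (M-S)/2."""
    left = [(m + s) / 2 for m, s in zip(mono, side)]
    right = [(m - s) / 2 for m, s in zip(mono, side)]
    return left, right


# Hypothetical, strongly correlated stereo samples
left = [0.5, 0.25, -0.5]
right = [0.5, 0.0, -0.25]
mono, side = matrix_encode(left, right)
print(mono)  # [1.0, 0.25, -0.75]
print(side)  # [0.0, 0.25, -0.25]
print(matrix_decode(mono, side) == (left, right))  # True
```

When the two channels are nearly identical, the difference signal stays close to zero. The same mid/side idea is what joint-stereo modes in digital audio codecs take advantage of.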
 
These analog systems demonstrated the effectiveness of tailoring transmission to the characteristics of human perception, laying the groundwork for digital perceptual coding methods.
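The run-length idea behind Group 3 fax coding can be sketched in Python for a single scan line of black-and-white pixels. Following the Group 3 convention, runs alternate between white and black, and each line starts with a white run. That run has length zero if the line begins with a black pixel. The function names are hypothetical, and the sketch stops at the run lengths. Group 3 then maps each run length to a variable-length code word from fixed tables (modified Huffman coding), which is not reproduced here. Run-length coding itself is lossless. The perceptual step in fax is the coarse, two-level (black/white) scan, which discards texture and gray shades before coding.

```python
def run_lengths(scan_line):
    """Encode one scan line (0 = white, 1 = black) as alternating run lengths.

    Runs alternate white, black, white, ... and always start with white, so a
    line that begins with a black pixel gets a leading white run of length 0.
    """
    runs = []
    colour = 0  # colour of the run currently being counted (white first)
    count = 0
    for pixel in scan_line:
        if pixel == colour:
            count += 1
        else:
            runs.append(count)
            colour = pixel
            count = 1
    runs.append(count)
    return runs


def expand(runs):
    """Decode alternating white/black run lengths back into pixels."""
    line = []
    colour = 0
    for length in runs:
        line.extend([colour] * length)
        colour = 1 - colour
    return line


# Hypothetical 20-pixel scan line: white margin, a short black stroke, white
line = [0] * 10 + [1] * 3 + [0] * 7
print(run_lengths(line))                  # [10, 3, 7]
print(run_lengths([1, 1, 0]))             # [0, 2, 1]
print(expand(run_lengths(line)) == line)  # True
```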
 
=== Digital development ===
Research in the 1970s and 1980s on [[psychoacoustics]] and [[psychovisual]] modeling provided the basis for digital perceptual coding. In audio, this led to the development, in the late 1980s and early 1990s, of the [[Moving Picture Experts Group|MPEG]] audio formats (such as [[MP3]]). These achieved major reductions in [[bit rate]] by discarding inaudible sound components. In parallel, the MPEG standards applied similar principles to video, using techniques such as [[chroma subsampling]] and motion-adaptive coding.
 
During the 1990s and 2000s, perceptual coding was built into widely used formats including [[Advanced Audio Coding|AAC]], [[MPEG-2 video]], and [[H.264/AVC]]. These formats supported the rise of digital media distribution on DVDs, digital television, and early internet platforms. More recent codecs, such as [[High Efficiency Video Coding|HEVC]], [[AV1]], and [[Opus (audio format)|Opus]], continue to refine perceptual models to balance compression efficiency against quality on modern networks and devices.
 
== Relation to other fields ==
Perceptual coding draws directly on [[psychoacoustics]] (the study of auditory perception) and [[psychovisual]] research (the study of visual perception). These disciplines provide the models that determine which parts of a signal can be safely removed without affecting perceived quality.
 
== See also ==
* [[Psychoacoustics]]
* [[Psychovisual]]
* [[Lossy compression]]
* [[Audio coding format]]
* [[Video coding format]]
 
== References ==
{{reflist}}
 
[[Category:Perception]]
[[Category:Coding theory]]

Latest revision as of 13:02, 7 October 2025
