Digital signal processor: Difference between revisions

Latest revision as of 16:27, 28 November 2025

An L7A1045 DSP chip, as used in several Akai samplers and the Hyper Neo Geo 64 arcade board

The NeXTcube from 1990 had a Motorola 68040 (25 MHz) and a 25 MHz Motorola 56001 digital signal processor, which was directly accessible via an interface.

A digital signal processor (DSP) is a specialized microprocessor chip, with its architecture optimized for the operational needs of digital signal processing.^[1]:104–107^[2] DSPs are fabricated on metal–oxide–semiconductor (MOS) integrated circuit chips.^[3]^[4] They are widely used in audio signal processing, telecommunications, digital image processing, radar, sonar and speech recognition systems, and in common consumer electronic devices such as mobile phones, disk drives and high-definition television (HDTV) products.^[3]

The goal of a DSP is usually to measure, filter or compress continuous real-world analog signals. Most general-purpose microprocessors can also execute digital signal processing algorithms successfully, but may not be able to keep up with such processing continuously in real time. Dedicated DSPs also usually have better power efficiency, which makes them more suitable for portable devices such as mobile phones, where power consumption is constrained.^[5] DSPs often use special memory architectures that are able to fetch multiple data or instructions at the same time.

Overview

A typical digital processing system

Digital signal processing (DSP) algorithms typically require a large number of mathematical operations to be performed quickly and repeatedly on a series of data samples. Signals (perhaps from audio or video sensors) are constantly converted from analog to digital, manipulated digitally, and then converted back to analog form. Many DSP applications have constraints on latency; that is, for the system to work, the DSP operation must be completed within some fixed time, and deferred (or batch) processing is not viable.
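The latency constraint can be made concrete with a small calculation. The following is a minimal sketch, not taken from any particular device: the function names and the 100 MHz / 48 kHz figures are hypothetical and chosen only for illustration. It computes how many clock cycles are available between two consecutive samples and checks whether a given per-sample workload fits in that budget.

```python
def cycles_per_sample(clock_hz: float, sample_rate_hz: float) -> float:
    """Clock cycles available between two consecutive input samples."""
    return clock_hz / sample_rate_hz


def meets_deadline(cycles_needed: float, clock_hz: float, sample_rate_hz: float) -> bool:
    """True if the per-sample work fits inside one sample period."""
    return cycles_needed <= cycles_per_sample(clock_hz, sample_rate_hz)


# Hypothetical figures: a 100 MHz processor handling 48 kHz audio.
budget = cycles_per_sample(100e6, 48e3)
print(f"cycle budget per sample: {budget:.1f}")
print("2000-cycle algorithm fits:", meets_deadline(2000, 100e6, 48e3))
print("2100-cycle algorithm fits:", meets_deadline(2100, 100e6, 48e3))
```

If the work per sample exceeds the budget, samples arrive faster than they can be processed, which is why deferred processing is not an option in such systems.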

Most general-purpose microprocessors and operating systems can execute DSP algorithms successfully, but are not suitable for use in portable devices such as mobile phones and PDAs because of power efficiency constraints.^[5] A specialized DSP, however, will tend to provide a lower-cost solution, with better performance, lower latency, and no requirements for specialized cooling or large batteries.

Such performance improvements have led to the introduction of digital signal processing in commercial communications satellites, where hundreds or even thousands of analog filters, switches, frequency converters and so on are required to receive and process the uplinked signals and ready them for downlinking. These components can be replaced with specialized DSPs, with significant benefits to the satellites' weight, power consumption, complexity and cost of construction, reliability and flexibility of operation. For example, the SES-12 and SES-14 satellites from operator SES, launched in 2018, were both built by Airbus Defence and Space with 25% of capacity using DSP.^[6]

The architecture of a DSP is optimized specifically for digital signal processing. Most also support some of the features of an applications processor or microcontroller, since signal processing is rarely the only task of a system. Some useful features for optimizing DSP algorithms are outlined below.

Architecture

Software architecture

By the standards of general-purpose processors, DSP instruction sets are often highly irregular. Traditional instruction sets are made up of more general instructions that allow them to perform a wider variety of operations, whereas instruction sets optimized for digital signal processing contain instructions for mathematical operations that occur frequently in DSP calculations. Both traditional and DSP-optimized instruction sets are able to compute any arbitrary operation, but an operation that requires multiple ARM or x86 instructions might require only one instruction in a DSP-optimized instruction set.

One implication for software architecture is that hand-optimized assembly-code routines (assembly programs) are commonly packaged into libraries for re-use, instead of relying on advanced compiler technologies to handle essential algorithms. Even with modern compiler optimizations, hand-optimized assembly code is more efficient,^[citation needed] and many common algorithms involved in DSP calculations are hand-written in order to take full advantage of the architectural optimizations.

Instruction sets

- Multiply–accumulate (MAC) operations, including fused multiply–add (FMA)
  - used extensively in all kinds of matrix operations
    - convolution for filtering
    - dot product
    - polynomial evaluation
  - Fundamental DSP algorithms depend heavily on multiply–accumulate performance
    - FIR filters
    - Fast Fourier transform (FFT)
- Related instructions:
  - SIMD
  - VLIW
- Specialized instructions for modulo addressing in ring buffers and bit-reversed addressing mode for FFT cross-referencing
- DSPs sometimes use time-stationary encoding to simplify hardware and increase coding efficiency.^[citation needed]
- Multiple arithmetic units may require memory architectures to support several accesses per instruction cycle – typically supporting reading 2 data values from 2 separate data buses and the next instruction (from the instruction cache, or a 3rd program memory) simultaneously.^[7]^[8]^[9]^[10]
- Special loop controls, such as architectural support for executing a few instruction words in a very tight loop without overhead for instruction fetches or exit testing, for example zero-overhead looping^[11]^[12] and hardware loop buffers.^[13]^[14]
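The role of the multiply–accumulate operation in filtering can be illustrated with a direct-form FIR filter. The sketch below is written in plain Python for readability and is not how a DSP would be programmed in practice; the function name `fir_filter` and the example coefficients are assumptions made for illustration. Each pass through the inner loop corresponds to one MAC, which a DSP would typically issue as a single instruction inside a hardware-controlled loop.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR filter: y[n] = sum over k of coeffs[k] * x[n - k].

    Samples before the start of the input are treated as zero.
    """
    out = []
    for n in range(len(samples)):
        acc = 0  # plays the role of the accumulator register
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]  # one multiply-accumulate (MAC)
        out.append(acc)
    return out


# Illustrative input: a two-tap filter that sums each sample with its predecessor.
moving_sum = fir_filter([1, 2, 3, 4], [1, 1])
print(moving_sum)

# Feeding a unit impulse returns the coefficients themselves.
impulse_response = fir_filter([1, 0, 0], [3, 2, 1])
print(impulse_response)
```

A filter with N taps needs N MACs per output sample, so the MAC rate of the processor directly bounds the filter length that can be sustained at a given sample rate.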

Data instructions

- Saturation arithmetic, in which operations that would overflow are clamped to the maximum (or minimum) value that the register can hold, rather than wrapping around (maximum + 1 does not overflow to minimum, as in many general-purpose CPUs; instead it stays at maximum). Various sticky-bit operation modes are sometimes available.
- Fixed-point arithmetic is often used to speed up arithmetic processing.
- Single-cycle operations to increase the benefits of pipelining.
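The difference between saturating and wrapping arithmetic, and the way fixed-point multiplication works, can be sketched as follows. This is an illustrative model using 16-bit signed values and the common Q15 convention (an integer `v` represents `v / 2**15`); the helper names are hypothetical, and the rounding behavior of real DSP instructions varies between devices.

```python
INT16_MAX = 32767
INT16_MIN = -32768


def saturate16(value: int) -> int:
    """Clamp a result to the 16-bit signed range instead of wrapping."""
    return max(INT16_MIN, min(INT16_MAX, value))


def wrap16(value: int) -> int:
    """Two's-complement wraparound, as in ordinary 16-bit integer arithmetic."""
    return ((value + 32768) & 0xFFFF) - 32768


def sat_add16(a: int, b: int) -> int:
    """Saturating 16-bit addition."""
    return saturate16(a + b)


def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 values, round to nearest, and saturate the result."""
    product = a * b  # the full product is in Q30 format
    return saturate16((product + (1 << 14)) >> 15)


print(wrap16(INT16_MAX + 1))       # wraps around to the minimum
print(sat_add16(INT16_MAX, 1))     # stays at the maximum
print(q15_mul(16384, 16384))       # 0.5 * 0.5 in Q15
print(q15_mul(-32768, -32768))     # (-1) * (-1) cannot be represented, so it saturates
```

For signals, saturation behaves like clipping, which usually produces far less objectionable errors than the sign flip caused by wraparound.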

Program flow

- Floating-point unit integrated directly into the datapath
- Pipelined architecture
- Highly parallel multiplier–accumulators (MAC units)
- Hardware-controlled looping, to reduce or eliminate the overhead required for looping operations

Hardware architecture

Memory architecture

DSPs are usually optimized for streaming data and use special memory architectures that are able to fetch multiple data or instructions at the same time, such as the Harvard architecture or modified von Neumann architecture, which use separate program and data memories (sometimes even with concurrent access on multiple data buses).

DSPs can sometimes rely on supporting code to know about cache hierarchies and the associated delays. This is a tradeoff that allows for better performance. In addition, DMA is used extensively.

Addressing and virtual memory

DSPs frequently use multi-tasking operating systems, but have no support for virtual memory or memory protection. Operating systems that use virtual memory require more time for context switching among processes, which increases latency.

- Hardware modulo addressing
  - Allows circular buffers to be implemented without having to test for wrapping
- Bit-reversed addressing, a special addressing mode
  - useful for calculating FFTs
- Exclusion of a memory management unit
- Address generation unit
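The two addressing modes listed above can be modeled in software. The sketch below is an illustration only: the class and function names are hypothetical, and on a DSP the address generation unit performs these index calculations in hardware at no extra instruction cost. The circular buffer uses a modulo operation in place of an explicit wrap test, and `bit_reverse` produces the index order in which a radix-2 FFT reads or writes its data.

```python
class CircularBuffer:
    """Fixed-size delay line addressed with modulo arithmetic."""

    def __init__(self, size: int):
        self.data = [0] * size
        self.index = 0  # position of the next write

    def push(self, sample) -> None:
        self.data[self.index] = sample
        # The modulo replaces an explicit "if index == size: index = 0" test.
        self.index = (self.index + 1) % len(self.data)

    def recent(self, age: int):
        """Return the sample written `age` pushes ago (0 is the most recent)."""
        return self.data[(self.index - 1 - age) % len(self.data)]


def bit_reverse(index: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


buf = CircularBuffer(4)
for sample in [1, 2, 3, 4, 5, 6]:
    buf.push(sample)
print(buf.recent(0), buf.recent(3))  # newest sample and the oldest one still held

# Access order for an 8-point radix-2 FFT (3 address bits).
order = [bit_reverse(i, 3) for i in range(8)]
print(order)
```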

History

TRW TDC1010 multiplier-accumulator

Development

In 1976, Richard Wiggins proposed the Speak & Spell concept to Paul Breedlove, Larry Brantingham, and Gene Frantz at Texas Instruments' Dallas research facility. Two years later in 1978, they produced the first Speak & Spell, with the technological centerpiece being the TMS5100,^[15] the industry's first digital signal processor. It also set other milestones, being the first chip to use linear predictive coding to perform speech synthesis.^[16] The chip was made possible with a 7 μm PMOS fabrication process.^[17]

In 1978, American Microsystems (AMI) released the S2811.^[3]^[4] The AMI S2811 "signal processing peripheral", like many later DSPs, has a hardware multiplier that enables it to perform a multiply–accumulate operation in a single instruction.^[18] The S2811 was the first integrated circuit chip specifically designed as a DSP, and was fabricated using vertical metal oxide semiconductor (VMOS, V-groove MOS), a technology that had previously not been mass-produced.^[4] It was designed as a microprocessor peripheral for the Motorola 6800,^[3] and it had to be initialized by the host. The S2811 was not successful in the market.

In 1979, Intel released the 2920 as an "analog signal processor".^[19] It had an on-chip ADC/DAC with an internal signal processor, but it did not have a hardware multiplier and was not successful in the market.

In 1980, the first stand-alone, complete DSPs – Nippon Electric Corporation's NEC μPD7720 based on the modified Harvard architecture^[20] and AT&T's DSP1 – were presented at the International Solid-State Circuits Conference '80. Both processors were inspired by the research in public switched telephone network (PSTN) telecommunications. The μPD7720, introduced for voiceband applications, was one of the most commercially successful early DSPs.^[3]

The Altamira DX-1 was another early DSP, utilizing quad integer pipelines with delayed branches and branch prediction.

Another DSP produced by Texas Instruments (TI), the TMS32010 presented in 1983, proved to be an even bigger success. It was based on the Harvard architecture, and so had separate instruction and data memory. It already had a special instruction set, with instructions like load-and-accumulate or multiply-and-accumulate. It could work on 16-bit numbers and needed 390 ns for a multiply–add operation. TI is now the market leader in general-purpose DSPs.

About five years later, the second generation of DSPs began to spread. They had 3 memories for storing two operands simultaneously and included hardware to accelerate tight loops; they also had an addressing unit capable of loop-addressing. Some of them operated on 24-bit variables and a typical model only required about 21 ns for a MAC. Members of this generation were, for example, the AT&T DSP16A or the Motorola 56000.

The main improvement in the third generation was the appearance of application-specific units and instructions in the data path, or sometimes as coprocessors. These units allowed direct hardware acceleration of very specific but complex mathematical problems, like the Fourier transform or matrix operations. Some chips, like the Motorola MC68356, even included more than one processor core to work in parallel. Other DSPs from 1995 are the TI TMS320C541 and the TMS320C80.

The fourth generation is best characterized by the changes in the instruction set and the instruction encoding/decoding. SIMD extensions were added, and VLIW and the superscalar architecture appeared. Clock speeds also increased, making a 3 ns MAC possible.

Modern DSPs

Modern signal processors yield greater performance; this is due in part to both technological and architectural advancements like lower design rules, fast-access two-level cache, (E)DMA circuitry, and a wider bus system. Not all DSPs provide the same speed and many kinds of signal processors exist, each one of them being better suited for a specific task, ranging in price from about US$1.50 to US$300.

Texas Instruments produces the C6000 series DSPs, which have clock speeds of 1.2 GHz and implement separate instruction and data caches. They also have an 8 MiB 2nd level cache and 64 EDMA channels. The top models are capable of as many as 8000 MIPS (millions of instructions per second), use VLIW (very long instruction word), perform eight operations per clock-cycle and are compatible with a broad range of external peripherals and various buses (PCI/serial/etc). TMS320C6474 chips each have three such DSPs, and the newest generation C6000 chips support floating point as well as fixed point processing.

Freescale produces a multi-core DSP family, the MSC81xx. The MSC81xx is based on StarCore Architecture processors and the latest MSC8144 DSP combines four programmable SC3400 StarCore DSP cores. Each SC3400 StarCore DSP core has a clock speed of 1 GHz.

XMOS produces a multi-core multi-threaded line of processors well suited to DSP operations. They come in various speeds ranging from 400 to 1600 MIPS. The processors have a multi-threaded architecture that allows up to 8 real-time threads per core, meaning that a 4-core device would support up to 32 real-time threads. Threads communicate with each other through buffered channels that are capable of up to 80 Mbit/s. The devices are easily programmable in C and aim at bridging the gap between conventional microcontrollers and FPGAs.

CEVA, Inc. produces and licenses three distinct families of DSPs. Perhaps the best known and most widely deployed is the CEVA-TeakLite DSP family, a classic memory-based architecture, with 16-bit or 32-bit word-widths and single or dual MACs. The CEVA-X DSP family offers a combination of VLIW and SIMD architectures, with different members of the family offering dual or quad 16-bit MACs. The CEVA-XC DSP family targets Software-defined Radio (SDR) modem designs and leverages a unique combination of VLIW and Vector architectures with 32 16-bit MACs.

Analog Devices produces SHARC-based DSPs that range in performance from 66 MHz/198 MFLOPS (million floating-point operations per second) to 400 MHz/2400 MFLOPS. Some models support multiple multipliers and ALUs, SIMD instructions and audio processing-specific components and peripherals. The Blackfin family of embedded digital signal processors combines the features of a DSP with those of a general-purpose processor. As a result, these processors can run simple operating systems like μCLinux, Velocity and Nucleus RTOS while operating on real-time data. The SHARC-based ADSP-210xx provides both delayed branches and non-delayed branches.^[21]

NXP Semiconductors produces DSPs based on TriMedia VLIW technology, optimized for audio and video processing. In some products, the DSP core is embedded as a fixed-function block in an SoC, but NXP also provides a range of flexible single-core media processors. The TriMedia media processors support both fixed-point and floating-point arithmetic, and have specific instructions to deal with complex filters and entropy coding.

CSR produces the Quatro family of SoCs that contain one or more custom Imaging DSPs optimized for processing document image data for scanner and copier applications.

Microchip Technology produces the PIC24-based dsPIC line of DSPs. Introduced in 2004, the dsPIC is designed for applications needing a true DSP as well as a true microcontroller, such as motor control and power supplies. The dsPIC runs at up to 40 MIPS, and has support for 16-bit fixed-point MAC, bit-reversed and modulo addressing, as well as DMA.

Most DSPs use fixed-point arithmetic, because in real-world signal processing, the additional range provided by floating point is not needed, and there is a large speed and cost benefit due to reduced hardware complexity. Floating-point DSPs may be invaluable in applications where a wide dynamic range is required. Product developers might also use floating-point DSPs to reduce the cost and complexity of software development in exchange for more expensive hardware, since it is generally easier to implement algorithms in floating point.
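The gap in dynamic range between the two number formats can be estimated with a short calculation. The sketch below uses a common rule of thumb, about 6.02 dB per bit for fixed point, and compares it with the ratio between the largest finite and smallest normal values of an IEEE 754 single-precision float. The function names are assumptions made for illustration, and other definitions of dynamic range give somewhat different figures.

```python
import math


def fixed_point_dynamic_range_db(bits: int) -> float:
    """Rule-of-thumb dynamic range of a `bits`-wide fixed-point word, in dB."""
    return 20 * math.log10(2 ** bits)


def float32_dynamic_range_db() -> float:
    """Ratio of the largest finite to the smallest normal float32 value, in dB."""
    largest = (2 - 2 ** -23) * 2.0 ** 127
    smallest_normal = 2.0 ** -126
    return 20 * math.log10(largest / smallest_normal)


fixed16_db = fixed_point_dynamic_range_db(16)
float32_db = float32_dynamic_range_db()
print(f"16-bit fixed point: {fixed16_db:.1f} dB")
print(f"32-bit floating point: {float32_db:.1f} dB")
```

The fixed-point figure is comparable to the range of many real-world signal sources, which is consistent with fixed-point hardware being sufficient for most signal processing tasks.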

Generally, DSPs are dedicated integrated circuits; however, DSP functionality can also be produced by using field-programmable gate array chips (FPGAs).

Embedded general-purpose RISC processors are becoming increasingly DSP-like in functionality. For example, the OMAP3 processors include an ARM Cortex-A8 and a C6000 DSP.

In communications, a new breed of DSPs offering the fusion of both DSP functions and hardware acceleration functions is making its way into the mainstream. Such modem processors include ASOCS ModemX and CEVA's XC4000.

In May 2018, Huarui-2, designed by the Nanjing Research Institute of Electronics Technology of China Electronics Technology Group, passed acceptance. With a processing speed of 0.4 TFLOPS, the chip can achieve better performance than current mainstream DSP chips.^[22] The design team has begun to create Huarui-3, which has a processing speed at the TFLOPS level and support for artificial intelligence.^[23]

DSP-based tuners for analog radio

Panasonic RF-2400D AM/FM radio. Despite a modern DSP-based internal design^[24] this retains a traditional layout and mechanical tuning, and the same external appearance as the older analog RF-2400.

Since the 2010s, an increasing proportion of radios designed for reception of traditional analog FM and AM short and medium wave broadcasts have replaced much of the analog tuning circuitry in older designs with DSP-based digital ICs which perform the bulk of the processing and decoding in the digital domain. An example of such an IC is the Silicon Labs/Skyworks Si4831/35 series, which supports both FM and AM decoding within a single chip.^[25]^[26]

Many such ICs (including the Si4831/35 above) are suitable for use with, and designed for, externally traditional, mechanically tuned designs.^[26]^[24] Compared to traditional "true" analog circuitry, these may exhibit noticeable tuning and audio idiosyncrasies (e.g. tuning jumping in discrete "steps" rather than continuously),^[27] particularly with older DSP-based designs.

References

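↑ Dyer, Stephen A.; Harms, Brian K. (13 August 1993). "Digital Signal Processing". In Yovits, Marshall C. (ed.). Advances in Computers. Vol. 37. Academic Press. pp. 59–118. doi:10.1016/S0065-2458(08)60403-9. ISBN 978-0120121373.
↑ Liptak, B. G. (2006). Process Control and Optimization. Instrument Engineers' Handbook. Vol. 2 (4th ed.). CRC Press. pp. 11–12. ISBN 978-0849310812.
↑ ^a ^b ^c ^d ^e "1979: Single Chip Digital Signal Processor Introduced". The Silicon Engine. Computer History Museum. Retrieved 14 October 2019.
↑ ^a ^b ^c Taranovich, Steve (August 27, 2012). "30 years of DSP: From a child's toy to 4G and beyond". EDN. Retrieved 14 October 2019.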
↑ ^a ^b Script error: No such module "citation/CS1".
↑ Beyond Frontiers Broadgate Publications (September 2016) pp22
↑ "Memory and DSP Processors".
↑ Script error: No such module "citation/CS1".
↑ "Architecture of the Digital Signal Processor"
↑ "ARC XY Memory DSP Option".
↑ "Zero Overhead Loops".
↑ "ADSP-BF533 Blackfin Processor Hardware Reference". p. 4-15.
↑ "Understanding Advanced Processor Features Promotes Efficient Coding".
↑ Script error: No such module "citation/CS1".
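↑ "Speak & Spell, the First Use of a Digital Signal Processing IC for Speech Generation, 1978". IEEE Milestones. IEEE. Retrieved 2012-03-02.
↑ Bogdanowicz, A. (2009-10-06). "IEEE Milestones Honor Three". The Institute. IEEE. Archived from the original on 2016-03-04. Retrieved 2012-03-02.
↑ Khan, Gul N.; Iniewski, Krzysztof (2017). Embedded and Networking Systems: Design, Software, and Implementation. CRC Press. p. 2. ISBN 9781351831567.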
↑ Alberto Luis Andres. "Digital Graphic Audio Equalizer". p. 48.
↑ Script error: No such module "citation/CS1".
↑ Script error: No such module "citation/CS1".
↑ Script error: No such module "citation/CS1".
↑ Script error: No such module "citation/CS1".
↑ Script error: No such module "citation/CS1".
↑ ^a ^b Script error: No such module "citation/CS1".
↑ Script error: No such module "citation/CS1".
↑ ^a ^b Script error: No such module "citation/CS1".
↑ Script error: No such module "citation/CS1".





@@ Line 3: / Line 3: @@
 [[File:NeXTcube motherboard.jpg|thumb|The [[NeXTcube]] from 1990 had a [[Motorola 68040]] (25&nbsp;MHz) and a digital signal processor [[Motorola 56001]] with 25&nbsp;MHz which was directly accessible via an interface.]]
-A '''digital signal processor''' ('''DSP''') is a specialized [[microprocessor]] chip, with its architecture optimized for the operational needs of [[digital signal processing]].<ref>{{cite book |editor-last1=Yovits |editor-first1=Marshall C. |last1=Dyer |first1=Stephen A. |last2=Harms |first2=Brian K. |chapter=Digital Signal Processing |title=Advances in Computers |date=1993-08-13 |volume=37 |pages=59{{hyphen}}118 |publisher=[[Academic Press]] |doi=10.1016/S0065-2458(08)60403-9 |isbn=978-0120121373 |issn=0065-2458 |lccn=59015761 |chapter-url=https://books.google.com/books?id=vL-bB7GALAwC&pg=PA104 |ol=OL10070096M |oclc=858439915 |df=dmy-all}}</ref>{{rp|pages=104{{hyphen}}107}}<ref name="Liptak">{{cite book |last=Liptak |first=B. G. |title=Process Control and Optimization |series=Instrument Engineers' Handbook |edition=4th |year=2006 |volume=2 |pages=11–12 |publisher=CRC Press |isbn=978-0849310812 |url=https://books.google.com/books?id=TxKynbyaIAMC&pg=PA11 |via=[[Google Books]]}}</ref> DSPs are [[semiconductor device fabrication|fabricated]] on [[MOSFET|metal–oxide–semiconductor]] (MOS) [[integrated circuit]] chips.<ref name="computerhistory1979">{{cite web |title=1979: Single Chip Digital Signal Processor Introduced |url=https://www.computerhistory.org/siliconengine/single-chip-digital-signal-processor-introduced/ |access-date=14 October 2019 |website=The Silicon Engine |publisher=[[Computer History Museum]]}}</ref><ref name="edn">{{cite web |last1=Taranovich |first1=Steve |date=August 27, 2012 |title=30 years of DSP: From a child's toy to 4G and beyond |url=https://www.edn.com/design/systems-design/4394792/30-years-of-DSP--From-a-child-s-toy-to-4G-and-beyond |access-date=14 October 2019 |website=[[EDN (magazine)|EDN]]}}</ref> They are widely used in [[audio signal processing]], [[telecommunications]], [[digital image processing]], [[radar]], [[sonar]] and [[speech recognition]] systems, and in common [[consumer electronic]] devices such as [[mobile phones]], 
[[disk drives]] and [[high-definition television]] (HDTV) products.<ref name="computerhistory1979"/>
+A '''digital signal processor''' ('''DSP''') is a specialized [[microprocessor]] chip, with its architecture optimized for the operational needs of [[digital signal processing]].<ref>{{cite book |editor-last1=Yovits |editor-first1=Marshall C. |last1=Dyer |first1=Stephen A. |last2=Harms |first2=Brian K. |chapter=Digital Signal Processing |title=Advances in Computers |date=1993-08-13 |volume=37 |pages=59{{hyphen}}118 |publisher=[[Academic Press]] |doi=10.1016/S0065-2458(08)60403-9 |isbn=978-0120121373 |issn=0065-2458 |lccn=59015761 |chapter-url=https://books.google.com/books?id=vL-bB7GALAwC&pg=PA104 |ol=OL10070096M |oclc=858439915 |df=dmy-all}}</ref>{{rp|pages=104{{hyphen}}107}}<ref name="Liptak">{{cite book |last=Liptak |first=B. G. |title=Process Control and Optimization |series=Instrument Engineers' Handbook |edition=4th |year=2006 |volume=2 |pages=11–12 |publisher=CRC Press |isbn=978-0849310812 |url=https://books.google.com/books?id=TxKynbyaIAMC&pg=PA11 |via=[[Google Books]]}}</ref> DSPs are [[semiconductor device fabrication|fabricated]] on [[metal–oxide–semiconductor]] (MOS) [[integrated circuit]] chips.<ref name="computerhistory1979">{{cite web |title=1979: Single Chip Digital Signal Processor Introduced |url=https://www.computerhistory.org/siliconengine/single-chip-digital-signal-processor-introduced/ |access-date=14 October 2019 |website=The Silicon Engine |publisher=[[Computer History Museum]]}}</ref><ref name="edn">{{cite web |last1=Taranovich |first1=Steve |date=August 27, 2012 |title=30 years of DSP: From a child's toy to 4G and beyond |url=https://www.edn.com/design/systems-design/4394792/30-years-of-DSP--From-a-child-s-toy-to-4G-and-beyond |access-date=14 October 2019 |website=[[EDN (magazine)|EDN]]}}</ref> They are widely used in [[audio signal processing]], [[telecommunications]], [[digital image processing]], [[radar]], [[sonar]] and [[speech recognition]] systems, and in common [[consumer electronic]] devices such as [[mobile phones]], [[disk 
drives]] and [[high-definition television]] (HDTV) products.<ref name="computerhistory1979"/>
 The goal of a DSP is usually to measure, filter or compress continuous real-world [[analog signals]]. Most general-purpose microprocessors can also execute digital signal processing algorithms successfully, but may not be able to keep up with such processing continuously in real-time. Also, dedicated DSPs usually have better power efficiency, thus they are more suitable in portable devices such as [[mobile phone]]s because of power consumption constraints.<ref name="schaum-2004">{{cite web
@@ Line 28: / Line 28: @@
 ===Software architecture===
-By the standards of general-purpose processors, DSP instruction sets are often highly irregular; while traditional instruction sets are made up of more general instructions that allow them to perform a wider variety of operations, instruction sets optimized for digital signal processing contain instructions for common mathematical operations that occur frequently in DSP calculations. Both traditional and DSP-optimized instruction sets are able to compute any arbitrary operation but an operation that might require multiple [[ARM architecture family|ARM]] or [[x86]] instructions to compute might require only one instruction in a DSP optimized instruction set.
+By the standards of general-purpose processors, DSP instruction sets are often highly irregular; while traditional instruction sets are made up of more general instructions that allow them to perform a wider variety of operations, instruction sets optimized for digital signal processing contain instructions for common mathematical operations that occur frequently in DSP calculations. Both traditional and DSP-optimized instruction sets are able to compute any arbitrary operation but an operation that might require multiple [[ARM architecture family|ARM]] or [[x86]] instructions to compute might require only one instruction in a DSP-optimized instruction set.
-One implication for software architecture is that hand-optimized [[assembly language|assembly-code]] [[Subroutine|routines]] (assembly programs) are commonly packaged into libraries for re-use, instead of relying on advanced compiler technologies to handle essential algorithms. Even with modern compiler optimizations hand-optimized assembly code is more efficient and many common algorithms involved in DSP calculations are hand-written in order to take full advantage of the architectural optimizations.
+One implication for software architecture is that hand-optimized [[assembly language|assembly-code]] [[Subroutine|routines]] (assembly programs) are commonly packaged into libraries for re-use, instead of relying on advanced compiler technologies to handle essential algorithms. Even with modern compiler optimizations, hand-optimized assembly code is more efficient,{{cn|date=November 2025}} and many common algorithms involved in DSP calculations are hand-written in order to take full advantage of the architectural optimizations.
 ====Instruction sets====
@@ Line 42: / Line 42: @@
 ***[[Fast Fourier transform]] (FFT)
 *related instructions:
-**[[Single instruction, multiple data|SIMD]]
+**[[SIMD]]
 **[[VLIW]]
-*Specialized instructions for [[modular arithmetic|modulo]] addressing in [[circular buffer|ring buffers]] and bit-reversed addressing mode for [[Fast Fourier transform|FFT]] cross-referencing
+*Specialized instructions for [[modular arithmetic|modulo]] addressing in [[circular buffer|ring buffers]] and bit-reversed addressing mode for [[FFT]] cross-referencing
 *DSPs sometimes use time-stationary encoding to simplify hardware and increase coding efficiency.{{Citation needed|date=March 2020|reason=Please link to something that defines 'time-stationary encoding'}}
 *Multiple arithmetic units may require [[memory architecture]]s to support several accesses per instruction cycle – typically supporting reading 2 data values from 2 separate data buses and the next instruction (from the instruction cache, or a 3rd program memory) simultaneously.<ref>
@@ Line 63: / Line 63: @@
 ====Data instructions====
-*[[Saturation arithmetic]], in which operations that produce overflows will accumulate at the maximum (or minimum) values that the register can hold rather than wrapping around (maximum+1 doesn't overflow to minimum as in many general-purpose CPUs, instead it stays at maximum). Sometimes various sticky bits operation modes are available.
+*[[Saturation arithmetic]], in which operations that produce overflows will accumulate at the maximum (or minimum) values that the register can hold, rather than wrapping around (maximum&nbsp;+ 1 doesn't overflow to minimum, as in many general-purpose CPUs, instead it stays at maximum). Sometimes various sticky bits operation modes are available.
 *[[Fixed-point arithmetic]] is often used to speed up arithmetic processing.
 *Single-cycle operations to increase the benefits of [[Pipeline (computing)|pipelining]].
@@ Line 95: / Line 95: @@
 [[File:TRW 1010J 1.jpg|thumb|TRW TDC1010 multiplier-accumulator]]
 ===Development===
-In 1976, Richard Wiggins proposed the [[Speak & Spell (toy)|Speak & Spell]] concept to Paul Breedlove, Larry Brantingham, and Gene Frantz at [[Texas Instruments]]' Dallas research facility. Two years later in 1978, they produced the first Speak & Spell, with the technological centerpiece being the [[TMS5100]],<ref>{{cite web |publisher=IEEE |work=IEEE Milestones |title=Speak & Spell, the First Use of a Digital Signal Processing IC for Speech Generation, 1978 |url=http://www.ieeeghn.org/wiki/index.php/Milestones:Speak_%26_Spell,_the_First_Use_of_a_Digital_Signal_Processing_IC_for_Speech_Generation,_1978 |access-date=2012-03-02}}</ref> the industry's first digital signal processor. It also set other milestones, being the first chip to use linear predictive coding to perform [[speech synthesis]].<ref>{{cite web |author=Bogdanowicz, A. |title=IEEE Milestones Honor Three |url=http://theinstitute.ieee.org/technology-focus/technology-history/ieee-milestones-honor-four476 |work=The Institute |publisher=IEEE |date=2009-10-06 |access-date=2012-03-02 |archive-url=https://web.archive.org/web/20160304200210/http://theinstitute.ieee.org/technology-focus/technology-history/ieee-milestones-honor-four476 |archive-date=2016-03-04 |url-status=dead}}</ref> The chip was made possible with a [[10 μm process|7 μm]] [[PMOS logic|PMOS]] [[semiconductor device fabrication|fabrication process]].<ref>{{cite book |last1=Khan |first1=Gul N. |last2=Iniewski |first2=Krzysztof |title=Embedded and Networking Systems: Design, Software, and Implementation |date=2017 |publisher=[[CRC Press]] |isbn=9781351831567 |page=2 |url=https://books.google.com/books?id=vx8uDwAAQBAJ&pg=PR14}}</ref>
+In 1976, Richard Wiggins proposed the [[Speak & Spell (toy)|Speak & Spell]] concept to Paul Breedlove, Larry Brantingham, and Gene Frantz at [[Texas Instruments]]' Dallas research facility. Two years later in 1978, they produced the first Speak & Spell, with the technological centerpiece being the [[TMS5100]],<ref>{{cite web |publisher=IEEE |work=IEEE Milestones |title=Speak & Spell, the First Use of a Digital Signal Processing IC for Speech Generation, 1978 |url=http://www.ieeeghn.org/wiki/index.php/Milestones:Speak_%26_Spell,_the_First_Use_of_a_Digital_Signal_Processing_IC_for_Speech_Generation,_1978 |access-date=2012-03-02}}</ref> the industry's first digital signal processor. It also set other milestones, being the first chip to use linear predictive coding to perform [[speech synthesis]].<ref>{{cite web |author=Bogdanowicz, A. |title=IEEE Milestones Honor Three |url=http://theinstitute.ieee.org/technology-focus/technology-history/ieee-milestones-honor-four476 |work=The Institute |publisher=IEEE |date=2009-10-06 |access-date=2012-03-02 |archive-url=https://web.archive.org/web/20160304200210/http://theinstitute.ieee.org/technology-focus/technology-history/ieee-milestones-honor-four476 |archive-date=2016-03-04 |url-status=dead}}</ref> The chip was made possible with a [[10 μm process|7 μm]] [[PMOS logic|PMOS]] [[fabrication process]].<ref>{{cite book |last1=Khan |first1=Gul N. |last2=Iniewski |first2=Krzysztof |title=Embedded and Networking Systems: Design, Software, and Implementation |date=2017 |publisher=[[CRC Press]] |isbn=9781351831567 |page=2 |url=https://books.google.com/books?id=vx8uDwAAQBAJ&pg=PR14}}</ref>
 In 1978, [[American Microsystems]] (AMI) released the S2811.<ref name="computerhistory1979"/><ref name="edn"/> The AMI S2811 "signal processing peripheral", like many later DSPs, has a hardware multiplier that enables it to perform a [[multiply–accumulate operation]] in a single instruction.<ref>Alberto Luis Andres. [http://scholarworks.csun.edu/bitstream/handle/10211.3/126902/AndresAlberto1983.pdf "Digital Graphic Audio Equalizer"]. p. 48.</ref> The S2811 was the first [[integrated circuit]] chip specifically designed as a DSP, and was fabricated using vertical metal oxide semiconductor ([[VMOS]], V-groove MOS), a technology that had previously not been mass-produced.<ref name="edn"/> It was designed as a microprocessor peripheral, for the [[Motorola 6800]],<ref name="computerhistory1979"/> and it had to be initialized by the host. The S2811 was not successful in the market.
@@ Line 105: / Line 105: @@
 The Altamira DX-1 was another early DSP, utilizing quad integer pipelines with delayed branches and branch prediction.{{citation needed|reason=no mention on the web, except of WP text copies and translations|date=December 2014}}
-Another DSP produced by Texas Instruments (TI), the [[Texas Instruments TMS320|TMS32010]] presented in 1983, proved to be an even bigger success. It was based on the Harvard architecture, and so had separate instruction and data memory. It already had a special instruction set, with instructions like load-and-accumulate or multiply-and-accumulate. It could work on 16-bit numbers and needed 390&nbsp;ns for a multiply–add operation. TI is now the market leader in general-purpose DSPs.
+Another DSP produced by Texas Instruments (TI), the [[TMS32010]] presented in 1983, proved to be an even bigger success. It was based on the Harvard architecture, and so had separate instruction and data memory. It already had a special instruction set, with instructions like load-and-accumulate or multiply-and-accumulate. It could work on 16-bit numbers and needed 390&nbsp;ns for a multiply–add operation. TI is now the market leader in general-purpose DSPs.
-About five years later, the second generation of DSPs began to spread. They had 3 memories for storing two operands simultaneously and included hardware to accelerate [[tight loop]]s; they also had an addressing unit capable of loop-addressing. Some of them operated on 24-bit variables and a typical model only required about 21&nbsp;ns for a MAC. Members of this generation were for example the AT&T DSP16A or the [[Motorola 56000]].
+About five years later, the second generation of DSPs began to spread. They had three memories for storing two operands simultaneously and included hardware to accelerate [[tight loop]]s; they also had an addressing unit capable of loop-addressing. Some of them operated on 24-bit variables and a typical model only required about 21&nbsp;ns for a MAC. Members of this generation include the AT&T DSP16A and the [[Motorola 56000]].
-The main improvement in the third generation was the appearance of application-specific units and instructions in the data path, or sometimes as coprocessors. These units allowed direct hardware acceleration of very specific but complex mathematical problems, like the Fourier-transform or matrix operations. Some chips, like the Motorola MC68356, even included more than one processor core to work in parallel. Other DSPs from 1995 are the TI TMS320C541 or the TMS 320C80.
+The main improvement in the third generation was the appearance of application-specific units and instructions in the data path, or sometimes as coprocessors. These units allowed direct hardware acceleration of very specific but complex mathematical problems, like the Fourier transform or matrix operations. Some chips, like the Motorola MC68356, even included more than one processor core to work in parallel. Other DSPs from 1995 include the TI TMS320C541 and the TMS320C80.
-The fourth generation is best characterized by the changes in the instruction set and the instruction encoding/decoding. SIMD extensions were added, and VLIW and the superscalar architecture appeared. As always, the clock-speeds have increased; a 3&nbsp;ns MAC now became possible.
+The fourth generation is best characterized by the changes in the instruction set and the instruction encoding/decoding. SIMD extensions were added, and VLIW and the superscalar architecture appeared. As always, clock speeds increased; a 3&nbsp;ns MAC became possible.
 ==Modern DSPs==
@@ Line 121: / Line 121: @@
 [[Freescale]] produces a multi-core DSP family, the MSC81xx. The MSC81xx is based on StarCore Architecture processors and the latest MSC8144 DSP combines four programmable SC3400 StarCore DSP cores. Each SC3400 StarCore DSP core has a clock speed of 1&nbsp;GHz.
-[[XMOS]] produces a multi-core multi-threaded line of processor well suited to DSP operations, They come in various speeds ranging from 400 to 1600 MIPS. The processors have a multi-threaded architecture that allows up to 8 real-time threads per core, meaning that a 4 core device would support up to 32 real time threads. Threads communicate between each other with buffered channels that are capable of up to 80&nbsp;Mbit/s. The devices are easily programmable in C and aim at bridging the gap between conventional micro-controllers and FPGAs
+[[XMOS]] produces a multi-core multi-threaded line of processors well suited to DSP operations. They come in various speeds ranging from 400 to 1600 MIPS. The processors have a multi-threaded architecture that allows up to 8 real-time threads per core, meaning that a 4-core device would support up to 32 real-time threads. Threads communicate with each other through buffered channels that are capable of up to {{nowrap|80 Mbit/s}}. The devices are easily programmable in C and aim at bridging the gap between conventional microcontrollers and FPGAs.
 [[CEVA, Inc.]] produces and licenses three distinct families of DSPs. Perhaps the best known and most widely deployed is the CEVA-TeakLite DSP family, a classic memory-based architecture, with 16-bit or 32-bit word-widths and single or dual [[Multiply–accumulate operation|MACs]]. The CEVA-X DSP family offers a combination of VLIW and SIMD architectures, with different members of the family offering dual or quad 16-bit MACs. The CEVA-XC DSP family targets [[Software-defined radio|Software-defined Radio (SDR)]] modem designs and leverages a unique combination of VLIW and Vector architectures with 32 16-bit MACs.
-[[Analog Devices]] produce the [[Super Harvard Architecture Single-Chip Computer|SHARC]]-based DSP and range in performance from 66&nbsp;MHz/198 [[MFLOPS]] (million floating-point operations per second) to 400&nbsp;MHz/2400 MFLOPS. Some models support multiple [[binary multiplier|multiplier]]s and [[Arithmetic logic unit|ALU]]s, [[Single instruction, multiple data|SIMD]] instructions and audio processing-specific components and peripherals. The [[Blackfin]] family of embedded digital signal processors combine the features of a DSP with those of a general use processor. As a result, these processors can run simple [[operating system]]s like [[μCLinux]], velocity and [[Nucleus RTOS]] while operating on real-time data. The SHARC-based ADSP-210xx provides both [[delay slot|delayed branches]] and non-delayed branches.<ref>{{cite web |url=https://www.brown.edu/Departments/Engineering/Courses/En164/files/lab3_files/sharc_application_manual/chap1.pdf |title=Introduction of ADSP-21000 Family digital signal processors. |page=6 |accessdate=2023-12-01}}</ref>
+[[Analog Devices]] produces [[Super Harvard Architecture Single-Chip Computer|SHARC]]-based DSPs that range in performance from 66&nbsp;MHz/198 [[MFLOPS]] (million floating-point operations per second) to 400&nbsp;MHz/2400 MFLOPS. Some models support multiple [[binary multiplier|multiplier]]s and [[Arithmetic logic unit|ALU]]s, [[SIMD]] instructions and audio processing-specific components and peripherals. The [[Blackfin]] family of embedded digital signal processors combines the features of a DSP with those of a general-purpose processor. As a result, these processors can run simple [[operating system]]s like [[μCLinux]], Velocity and [[Nucleus RTOS]] while operating on real-time data. The SHARC-based ADSP-210xx provides both [[delay slot|delayed branches]] and non-delayed branches.<ref>{{cite web |url=https://www.brown.edu/Departments/Engineering/Courses/En164/files/lab3_files/sharc_application_manual/chap1.pdf |title=Introduction of ADSP-21000 Family digital signal processors. |page=6 |accessdate=2023-12-01}}</ref>
-[[NXP Semiconductors]] produce DSPs based on [[TriMedia (mediaprocessor)|TriMedia]] [[VLIW]] technology, optimized for audio and video processing. In some products the DSP core is hidden as a fixed-function block into a [[System-on-a-chip|SoC]], but NXP also provides a range of flexible single core media processors. The TriMedia media processors support both [[fixed-point arithmetic]] as well as [[floating-point arithmetic]], and have specific instructions to deal with complex filters and entropy coding.
+[[NXP Semiconductors]] produces DSPs based on [[TriMedia (mediaprocessor)|TriMedia]] [[VLIW]] technology, optimized for audio and video processing. In some products, the DSP core is embedded as a fixed-function block in a [[System-on-a-chip|SoC]], but NXP also provides a range of flexible single-core media processors. The TriMedia media processors support both [[fixed-point arithmetic]] and [[floating-point arithmetic]], and have specific instructions to deal with complex filters and entropy coding.
 [[CSR plc|CSR]] produces the Quatro family of SoCs that contain one or more custom Imaging DSPs optimized for processing document image data for scanner and copier applications.
-[[Microchip Technology]] produces the PIC24 based dsPIC line of DSPs. Introduced in 2004, the dsPIC is designed for applications needing a true DSP as well as a true [[microcontroller]], such as motor control and in power supplies. The dsPIC runs at up to 40MIPS, and has support for 16 bit fixed point MAC, bit reverse and modulo addressing, as well as DMA.
+[[Microchip Technology]] produces the PIC24-based dsPIC line of DSPs. Introduced in 2004, the dsPIC is designed for applications needing a true DSP as well as a true [[microcontroller]], such as motor control and power supplies. The dsPIC runs at up to 40&nbsp;MIPS, and has support for 16-bit fixed-point MAC, bit-reversed and modulo addressing, as well as DMA.
-Most DSPs use fixed-point arithmetic, because in real world signal processing the additional range provided by floating point is not needed, and there is a large speed benefit and cost benefit due to reduced hardware complexity. Floating point DSPs may be invaluable in applications where a wide dynamic range is required. Product developers might also use floating point DSPs to reduce the cost and complexity of software development in exchange for more expensive hardware, since it is generally easier to implement algorithms in floating point.
+Most DSPs use fixed-point arithmetic, because in real-world signal processing, the additional range provided by floating point is not needed, and there is a large speed and cost benefit due to reduced hardware complexity. Floating-point DSPs may be invaluable in applications where a wide dynamic range is required. Product developers might also use floating-point DSPs to reduce the cost and complexity of software development in exchange for more expensive hardware, since it is generally easier to implement algorithms in floating point.
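As a rough illustration of the fixed-point approach discussed above, the following sketch models a multiply–accumulate loop in the common Q15 format, where a 16-bit integer `v` stands for the fraction `v / 32768`. The choice of Q15 and the helper names `to_q15`, `from_q15` and `q15_dot` are assumptions made for this example, not the behaviour of any specific chip.

```python
# Illustrative only: Q15 fixed-point dot product using integer arithmetic.
Q = 15                      # number of fractional bits (assumed format)
Q15_MIN, Q15_MAX = -32768, 32767


def to_q15(x: float) -> int:
    """Convert a float in roughly [-1, 1) to Q15, clamping at the limits."""
    return max(Q15_MIN, min(Q15_MAX, int(round(x * (1 << Q)))))


def from_q15(v: int) -> float:
    """Convert a Q15 integer back to a float."""
    return v / (1 << Q)


def q15_dot(a, b) -> int:
    """Multiply-accumulate two Q15 sequences and return a saturated Q15 result."""
    acc = 0                          # wide accumulator holding Q30 products
    for x, y in zip(a, b):
        acc += x * y                 # one multiply-accumulate step
    result = acc >> Q                # scale from Q30 back to Q15
    return max(Q15_MIN, min(Q15_MAX, result))


if __name__ == "__main__":
    half = to_q15(0.5)
    print(from_q15(q15_dot([half], [half])))  # 0.5 * 0.5
```

The programmer has to track scaling and possible saturation by hand, which is the extra software effort that floating-point hardware removes.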
-Generally, DSPs are dedicated integrated circuits; however DSP functionality can also be produced by using [[field-programmable gate array]] chips (FPGAs).
+Generally, DSPs are dedicated integrated circuits; however, DSP functionality can also be produced by using [[field-programmable gate array]] chips (FPGAs).
 Embedded general-purpose RISC processors are becoming increasingly DSP-like in functionality. For example, the [[Texas Instruments OMAP|OMAP3]] processors include an [[ARM Cortex-A8]] and a C6000 DSP.
-In Communications a new breed of DSPs offering the fusion of both DSP functions and H/W acceleration function is making its way into the mainstream. Such Modem processors include [[ASOCS]] ModemX and CEVA's XC4000.
+In communications, a new breed of DSPs offering the fusion of both DSP functions and hardware acceleration functions is making its way into the mainstream. Such modem processors include [[ASOCS]] ModemX and CEVA's XC4000.
-In May 2018, Huarui-2 designed by Nanjing Research Institute of Electronics Technology of [[China Electronics Technology Group]] passed acceptance. With a processing speed of 0.4 TFLOPS, the chip can achieve better performance than current mainstream DSP chips.<ref>{{cite web|url=http://www.stdaily.com/index/kejixinwen/2018-06/15/content_681419.shtml|title=国产新型雷达芯片华睿2号与组网中心同时亮相-科技新闻-中国科技网首页|work=[[科技日报]]|access-date=2 July 2018}}</ref> The design team has begun to create Huarui-3, which has a processing speed in TFLOPS level and a support for [[artificial intelligence]].<ref name="xinhua">{{cite web|url=http://www.xinhuanet.com/fortune/2018-05/24/c_1122884014.htm|archive-url=https://web.archive.org/web/20180526123855/http://www.xinhuanet.com/fortune/2018-05/24/c_1122884014.htm|url-status=dead|archive-date=May 26, 2018|title=全国产芯片华睿２号通过"核高基"验收-新华网|author=王珏玢|work=[[Xinhua News Agency]]|access-date=2 July 2018|location=南京}}</ref>
+In May 2018, Huarui-2, designed by the Nanjing Research Institute of Electronics Technology of [[China Electronics Technology Group]], passed acceptance. With a processing speed of 0.4 TFLOPS, the chip can achieve better performance than current mainstream DSP chips.<ref>{{cite web|url=http://www.stdaily.com/index/kejixinwen/2018-06/15/content_681419.shtml|title=国产新型雷达芯片华睿2号与组网中心同时亮相-科技新闻-中国科技网首页|work=[[科技日报]]|access-date=2 July 2018}}</ref> The design team has begun to create Huarui-3, which has a processing speed at the TFLOPS level and support for [[artificial intelligence]].<ref name="xinhua">{{cite web|url=http://www.xinhuanet.com/fortune/2018-05/24/c_1122884014.htm|archive-url=https://web.archive.org/web/20180526123855/http://www.xinhuanet.com/fortune/2018-05/24/c_1122884014.htm|url-status=dead|archive-date=May 26, 2018|title=全国产芯片华睿２号通过"核高基"验收-新华网|author=王珏玢|work=[[Xinhua News Agency]]|access-date=2 July 2018|location=南京}}</ref>
 ===DSP-based tuners for analog radio {{anchor|analog_radio}} ===
@@ Line 147: / Line 147: @@
 [[File:Panasonic RF-2400D FM AM portable radio - Front.jpg|thumb|right|Panasonic RF-2400D AM/FM radio. Despite a modern DSP-based internal design<ref name='panasonic_shop' /> this retains a traditional layout and mechanical tuning, and the same external appearance as the older analog RF-2400.]]
-Since the 2010s, an increasing proportion of radios designed for reception of traditional analog [[FM broadcasting|FM]] and [[AM_broadcasting|AM short and medium wave broadcasts]] have replaced much of the analog tuning circuitry in older designs with DSP-based digital [[Integrated circuit|ICs]] which perform the bulk of the processing and decoding in the digital domain. An example of such an IC is the [[Silicon Labs]]/[[Skyworks Solutions|Skyworks]] Si4831/35 series which supports both FM and AM decoding within a single chip.<ref name='digikey' /><ref name='si4831_35_b30' />
+Since the 2010s, an increasing proportion of radios designed for reception of traditional analog [[FM broadcasting|FM]] and [[AM_broadcasting|AM short and medium wave broadcasts]] have replaced much of the analog tuning circuitry in older designs with DSP-based digital [[Integrated circuit|ICs]] which perform the bulk of the processing and decoding in the digital domain. An example of such an IC is the [[Silicon Labs]]/[[Skyworks Solutions|Skyworks]] Si4831/35 series, which supports both FM and AM decoding within a single chip.<ref name='digikey' /><ref name='si4831_35_b30' />
 Many such ICs (including the Si4831/35 above) are suitable for use with, and designed for, externally traditional, mechanically tuned designs.<ref name='si4831_35_b30' /><ref name='panasonic_shop' /> Compared to traditional "true" analog circuitry, these may exhibit noticeable tuning and audio idiosyncrasies (e.g. tuning jumping in discrete "steps" rather than continuously),<ref name='swling' /> particularly with older DSP-based designs.
@@ Line 154: / Line 154: @@
 * [[Digital signal controller]]
 * [[Graphics processing unit]]
-*[[System on a chip]]
+* [[Hardware acceleration]]
-*[[Hardware acceleration]]
-* [[Vision processing unit]]
 * [[MDSP]] – a multiprocessor DSP
 * [[OpenCL]]
 * [[Sound card]]
+* [[System on a chip]]
+* [[Vision processing unit]]
 ==References==
-{{Reflist|30em|refs=
+{{Reflist|refs=
-<!--<ref name="Stankovic_2012">{{cite journal |last1=Stanković |first1=Radomir S. |last2=Astola |first2=Jaakko T. |title=Reminiscences of the Early Work in DCT: Interview with K.R. Rao |journal=Reprints from the Early Days of Information Sciences |publisher=Tampere International Center for Signal Processing |date=2012 |volume=60 |url=https://ethw.org/w/images/1/19/Report-60.pdf |access-date=2021-12-30 |archive-url=https://web.archive.org/web/20211230214050/https://ethw.org/w/images/1/19/Report-60.pdf |archive-date=2021-12-30 |url-status=live |issn=1456-2774 |isbn=978-9521528187 |via=[[Engineering and Technology History Wiki|ETHW]] |df=dmy-all}}</ref>-->
+<!--<ref name="Stankovic_2012">{{cite journal |last1=Stanković |first1=Radomir S. |last2=Astola |first2=Jaakko T. |title=Reminiscences of the Early Work in DCT: Interview with K.R. Rao |journal=Reprints from the Early Days of Information Sciences |publisher=Tampere International Center for Signal Processing |date=2012 |volume=60 |url=https://ethw.org/w/images/1/19/Report-60.pdf |access-date=2021-12-30 |archive-url=https://web.archive.org/web/20211230214050/https://ethw.org/w/images/1/19/Report-60.pdf |archive-date=2021-12-30 |url-status=live |issn=1456-2774 |isbn=978-9521528187 |via=[[ETHW]] |df=dmy-all}}</ref>-->
 <ref name='digikey'>{{cite web|url=https://www.digikey.at/htmldatasheets/production/1022018/0/0/1/si4831-35.html|publisher=Silicon Labs|title=Si4831/35 AM/FM/SW Tuner Frequently Asked Questions|quote=Si4831/35 is digital in nature—an MCU and a DSP are integrated inside the chip.|access-date=2025-10-27}}</ref>
