<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>http://debianws.lexgopc.com/wiki143/index.php?action=history&amp;feed=atom&amp;title=Massively_parallel_processor_array</id>
	<title>Massively parallel processor array - Revision history</title>
	<link rel="self" type="application/atom+xml" href="http://debianws.lexgopc.com/wiki143/index.php?action=history&amp;feed=atom&amp;title=Massively_parallel_processor_array"/>
	<link rel="alternate" type="text/html" href="http://debianws.lexgopc.com/wiki143/index.php?title=Massively_parallel_processor_array&amp;action=history"/>
	<updated>2026-04-22T19:02:34Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.43.1</generator>
	<entry>
		<id>http://debianws.lexgopc.com/wiki143/index.php?title=Massively_parallel_processor_array&amp;diff=6810664&amp;oldid=prev</id>
		<title>imported&gt;Rofraja: Replaced 1 bare URLs by {{Cite web}}; Replaced &quot;Archived copy&quot; by actual titles</title>
		<link rel="alternate" type="text/html" href="http://debianws.lexgopc.com/wiki143/index.php?title=Massively_parallel_processor_array&amp;diff=6810664&amp;oldid=prev"/>
		<updated>2025-06-29T18:50:14Z</updated>

		<summary type="html">&lt;p&gt;Replaced 1 bare URLs by {{Cite web}}; Replaced &amp;quot;Archived copy&amp;quot; by actual titles&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;{{Short description|Type of integrated circuit}}&lt;br /&gt;
A &amp;#039;&amp;#039;&amp;#039;massively parallel processor array&amp;#039;&amp;#039;&amp;#039;, also known as a &amp;#039;&amp;#039;&amp;#039;multi-purpose processor array&amp;#039;&amp;#039;&amp;#039; (&amp;#039;&amp;#039;&amp;#039;MPPA&amp;#039;&amp;#039;&amp;#039;), is a type of [[integrated circuit]] with a [[massively parallel]] array of hundreds or thousands of [[Central processing unit|CPU]]s and [[Random-access memory|RAM]] banks. These processors pass work to one another through a [[Reconfigurability|reconfigurable]] interconnect of [[Channel (communications)|channels]]. By harnessing a large number of processors working in parallel, an MPPA chip can accomplish more demanding tasks than conventional chips. MPPAs are based on a parallel software [[programming model]] for developing high-performance [[embedded system]] applications.&lt;br /&gt;
&lt;br /&gt;
==Architecture==&lt;br /&gt;
&lt;br /&gt;
An MPPA is a [[Multiple instruction, multiple data|MIMD]] (multiple instruction streams, multiple data streams) architecture, with [[distributed memory]] accessed locally, not shared globally. Each processor is strictly encapsulated, accessing only its own code and memory. Point-to-point communication between processors is realized directly in the configurable interconnect.&amp;lt;ref&amp;gt;Mike Butts, &amp;quot;Synchronization through Communication in a Massively Parallel Processor Array&amp;quot;, IEEE Micro, vol. 27, no. 5, September/October 2007, [[IEEE Computer Society]]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The MPPA&amp;#039;s massive parallelism and its distributed-memory MIMD architecture distinguish it from [[Multi-core (computing)|multicore]] and [[Manycore processor|manycore]] architectures, which have fewer processors and an [[symmetric multiprocessing|SMP]] or other [[Shared memory architecture|shared memory]] architecture, and are mainly intended for general-purpose computing. It is also distinguished from [[GPGPU]]s with [[Single instruction, multiple data|SIMD]] architectures, used for [[High-performance computing|HPC]] applications.&amp;lt;ref&amp;gt;Mike Butts, &amp;quot;Multicore and Massively Parallel Platforms and Moore&amp;#039;s Law Scalability&amp;quot;, Proceedings of the Embedded Systems Conference - Silicon Valley, April 2008&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Programming==&lt;br /&gt;
&lt;br /&gt;
An MPPA application is developed by expressing it as a hierarchical [[block diagram]] or [[workflow]], whose basic objects run in parallel, each on its own processor. Likewise, large data objects may be broken up and distributed into local memories with parallel access. Objects communicate over a parallel structure of dedicated channels. The objective is to maximize aggregate throughput while minimizing local latency, optimizing performance and efficiency. An MPPA&amp;#039;s [[model of computation]] is similar to a [[Kahn process network]] or [[communicating sequential processes]] (CSP).&amp;lt;ref&amp;gt;Mike Butts, Brad Budlong, Paul Wasson, Ed White, &amp;quot;Reconfigurable Work Farms on a Massively Parallel Processor Array&amp;quot;, Proceedings of [[FCCM]], April 2008, [[IEEE Computer Society]]&amp;lt;/ref&amp;gt;&lt;br /&gt;
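The channel-based model described above can be sketched in ordinary software. The following minimal Python example is illustrative only (it is not any MPPA vendor's actual API): threads stand in for the encapsulated processors, and bounded FIFO queues stand in for the dedicated point-to-point channels, in the style of a Kahn process network.

```python
# Sketch of an MPPA-style process network: each "processor" is an isolated
# worker with private state, communicating only over dedicated point-to-point
# channels (bounded FIFO queues). Illustrative only, not a vendor API.
import threading
import queue

def source(out_ch, items):
    """Produce a stream of work items, then a sentinel to signal completion."""
    for item in items:
        out_ch.put(item)
    out_ch.put(None)

def stage(in_ch, out_ch, fn):
    """Encapsulated worker: reads only its input channel, writes only its output."""
    while True:
        item = in_ch.get()
        if item is None:
            out_ch.put(None)  # forward the sentinel downstream
            return
        out_ch.put(fn(item))

def sink(in_ch, results):
    """Collect the final output stream."""
    while True:
        item = in_ch.get()
        if item is None:
            return
        results.append(item)

# Wire a three-stage pipeline: square each value, then add one.
a, b, c = queue.Queue(4), queue.Queue(4), queue.Queue(4)
results = []
workers = [
    threading.Thread(target=source, args=(a, range(5))),
    threading.Thread(target=stage, args=(a, b, lambda x: x * x)),
    threading.Thread(target=stage, args=(b, c, lambda x: x + 1)),
    threading.Thread(target=sink, args=(c, results)),
]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(results)  # [1, 2, 5, 10, 17]
```

Because each stage owns its state and blocks only on its own channels, throughput comes from all stages running concurrently, which mirrors how an MPPA maximizes aggregate throughput over the interconnect.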
&lt;br /&gt;
==Applications==&lt;br /&gt;
&lt;br /&gt;
MPPAs are used in high-performance [[embedded system]]s and [[hardware acceleration]] of [[desktop computer]] and [[Server (computing)|server]] applications, such as [[video compression]],&amp;lt;ref&amp;gt;Laurent Bonetto, &amp;quot;Massively parallel processing arrays (MPPAs) for embedded HD video and imaging (Part 1)&amp;quot;, Video/Imaging DesignLine, May 16, 2008 http://www.eetimes.com/document.asp?doc_id=1273823&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;Laurent Bonetto, &amp;quot;Massively parallel processing arrays (MPPAs) for embedded HD video and imaging (Part 2)&amp;quot;, Video/Imaging DesignLine, July 18, 2008 http://www.eetimes.com/document.asp?doc_id=1273830&amp;lt;/ref&amp;gt; [[image processing]],&amp;lt;ref&amp;gt;Paul Chen, &amp;quot;Multimode sensor processing using Massively Parallel Processor Arrays (MPPAs)&amp;quot;, Programmable Logic DesignLine, March 18, 2008 http://www.pldesignline.com/howto/206904379&amp;lt;/ref&amp;gt; [[medical imaging]], [[network processing]], [[software-defined radio]] and other compute-intensive streaming media applications, which otherwise would use [[FPGA]], [[digital signal processor|DSP]] and/or [[Application-specific integrated circuit|ASIC]] chips.&lt;br /&gt;
&lt;br /&gt;
==Examples==&lt;br /&gt;
&lt;br /&gt;
Commercially developed MPPAs include designs from [[Ambric]], [[PicoChip]], [[Intel]],&amp;lt;ref&amp;gt;Vangal, Sriram R., Jason Howard, Gregory Ruhl, Saurabh Dighe, Howard Wilson, James Tschanz, David Finan et al. &amp;quot;An 80-tile sub-100-w teraflops processor in 65-nm cmos.&amp;quot; Solid-State Circuits, IEEE Journal of 43, no. 1 (2008): 29-41.&amp;lt;/ref&amp;gt; [[IntellaSys]], [[GreenArrays]], [[ASOCS]], [[Tilera]], [[Kalray]], [[Coherent Logix]], [[Tabula (company)|Tabula]], and [[Adapteva]]. The [[Aspex (Ericsson)|Aspex]] Linedancer differs in that it was a massively wide &amp;#039;&amp;#039;SIMD&amp;#039;&amp;#039; array rather than an MPPA. Strictly speaking, it could qualify as [[Single Instruction Multiple Threads|SIMT]], since each of its 4096 3,000-gate cores has its own Content-Addressable Memory.&amp;lt;ref&amp;gt;{{Cite book|chapter-url=https://link.springer.com/chapter/10.1007/978-94-009-0643-3_39|doi = 10.1007/978-94-009-0643-3_39|chapter = Artificial Neural Network on a Massively Parallel Associative Architecture|title = International Neural Network Conference|year = 1990|last1 = Krikelis|first1 = A.|page = 673|isbn = 978-0-7923-0831-7}}&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;{{Cite web| title=Effective Monte Carlo simulation on System-V massively parallel associative string processing architecture | url=https://core.ac.uk/download/pdf/25268094.pdf | archive-url=https://web.archive.org/web/20210606003056/https://core.ac.uk/download/pdf/25268094.pdf | archive-date=2021-06-06}}&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Fabricated MPPAs developed in universities include: 36-core&amp;lt;ref&amp;gt;Yu, Zhiyi, Michael Meeuwsen, Ryan Apperson, Omar Sattari, Michael Lai, Jeremy Webb, Eric Work, Tinoosh Mohsenin, Mandeep Singh, and Bevan Baas. &amp;quot;An asynchronous array of simple processors for DSP applications.&amp;quot; In IEEE International Solid-State Circuits Conference,(ISSCC’06), vol. 49, pp. 428-429. 2006&amp;lt;/ref&amp;gt; and 167-core&amp;lt;ref&amp;gt;Truong, Dean, Wayne Cheng, Tinoosh Mohsenin, Zhiyi Yu, Toney Jacobson, Gouri Landge, Michael Meeuwsen et al. &amp;quot;A 167-processor 65 nm computational platform with per-processor dynamic supply voltage and dynamic clock frequency scaling.&amp;quot; In Symposium on VLSI Circuits, pp. 22-23. 2008&amp;lt;/ref&amp;gt; [[Asynchronous Array of Simple Processors|Asynchronous Array of Simple Processors (AsAP)]] arrays from the [[University of California, Davis]], 16-core RAW&amp;lt;ref&amp;gt;Michael Bedford Taylor, Jason Kim, Jason Miller, David Wentzlaff, Fae Ghodrat, Ben Greenwald, Henry Hoffmann, Paul Johnson, Walter Lee, Arvind Saraf, Nathan Shnidman, Volker Strumpen, Saman Amarasinghe, and Anant Agarwal, &amp;quot;A 16-issue multiple-program-counter microprocessor with point-to-point scalar operand network,&amp;quot; Proceedings of the IEEE International Solid-State Circuits Conference, February 2003&amp;lt;/ref&amp;gt; from [[MIT]], and 16-core&amp;lt;ref&amp;gt;Yu, Zhiyi, Kaidi You, Ruijin Xiao, Heng Quan, Peng Ou, Yan Ying, Haofan Yang, and Xiaoyang Zeng. &amp;quot;An 800MHz 320mW 16-core processor with message-passing and shared-memory inter-core communication mechanisms.&amp;quot; In Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2012 IEEE International, pp. 64-66. IEEE, 2012.&amp;lt;/ref&amp;gt; and 24-core&amp;lt;ref&amp;gt;Ou, Peng, Jiajie Zhang, Heng Quan, Yi Li, Maofei He, Zheng Yu, Xueqiu Yu et al. 
&amp;quot;A 65nm 39GOPS/W 24-core processor with 11&amp;amp;nbsp;Tb/s/W packet-controlled circuit-switched double-layer network-on-chip and heterogeneous execution array.&amp;quot; In Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2013 IEEE International, pp. 56-57. IEEE, 2013.&amp;lt;/ref&amp;gt; arrays from [[Fudan University]].&lt;br /&gt;
&lt;br /&gt;
The Chinese [[Sunway (processor)|Sunway]] project developed its own 260-core [[SW26010]] manycore chip for the [[TaihuLight]] supercomputer, which as of 2016 was the world&amp;#039;s fastest supercomputer.&amp;lt;ref name=dongarra2016&amp;gt;{{Cite web|url=http://www.netlib.org/utk/people/JackDongarra/PAPERS/sunway-report-2016.pdf|title=Report on the Sunway TaihuLight System|last=Dongarra|first=Jack|date=June 20, 2016|website=www.netlib.org|access-date=June 20, 2016}}&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;{{Cite journal| last1 = Fu| first1 = Haohuan| last2 = Liao| first2 = Junfeng| last3 = Yang| first3 = Jinzhe| last4 = Wang| first4 = Lanning| last5 = Song| first5 = Zhenya| last6 = Huang| first6 = Xiaomeng| last7 = Yang| first7 = Chao| last8 = Xue| first8 = Wei| last9 = Liu| first9 = Fangfang| last10 = Qiao| first10 = Fangli| last11 = Zhao| first11 = Wei| last12 = Yin| first12 = Xunqiang| last13 = Hou| first13 = Chaofeng| last14 = Zhang| first14 = Chenglong| last15 = Ge| first15 = Wei| last16 = Zhang| first16 = Jian| last17 = Wang| first17 = Yangang| last18 = Zhou| first18 = Chunbo| last19 = Yang| first19 = Guangwen|display-authors=3|date=2016|title=The Sunway TaihuLight Supercomputer: System and Applications|journal=Sci. China Inf. Sci.| volume = 59| issue = 7|doi=10.1007/s11432-016-5588-7| doi-access = free}}&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Anton 3 processors, designed by [[D. E. Shaw Research]] for [[molecular dynamics]] simulations, contain an array of 576 processors arranged as a 12×24 grid of tiles, each tile holding a pair of cores; a routed network links these tiles together and extends off-chip to other nodes in a full system.&amp;lt;ref&amp;gt;{{Cite book |last1=Shaw |first1=David E. |last2=Adams |first2=Peter J. |last3=Azaria |first3=Asaph |last4=Bank |first4=Joseph A. |last5=Batson |first5=Brannon |last6=Bell |first6=Alistair |last7=Bergdorf |first7=Michael |last8=Bhatt |first8=Jhanvi |last9=Butts |first9=J. Adam |last10=Correia |first10=Timothy |last11=Dirks |first11=Robert M. |last12=Dror |first12=Ron O. |last13=Eastwood |first13=Michael P. |last14=Edwards |first14=Bruce |last15=Even |first15=Amos |title=Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis |chapter=Anton 3 |date=2021-11-14 |language=en |location=St. Louis Missouri |publisher=ACM |pages=1–11 |doi=10.1145/3458817.3487397 |isbn=978-1-4503-8442-1|s2cid=239036976 |doi-access=free }}&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;{{Cite book |last1=Adams |first1=Peter J. |last2=Batson |first2=Brannon |last3=Bell |first3=Alistair |last4=Bhatt |first4=Jhanvi |last5=Butts |first5=J. Adam |last6=Correia |first6=Timothy |last7=Edwards |first7=Bruce |last8=Feldmann |first8=Peter |last9=Fenton |first9=Christopher H. |last10=Forte |first10=Anthony |last11=Gagliardo |first11=Joseph |last12=Gill |first12=Gennette |last13=Gorlatova |first13=Maria |last14=Greskamp |first14=Brian |last15=Grossman |first15=J.P. |title=2021 IEEE Hot Chips 33 Symposium (HCS) |chapter=The ΛNTON 3 ASIC: A Fire-Breathing Monster for Molecular Dynamics Simulations |date=2021-08-22 |chapter-url=https://ieeexplore.ieee.org/document/9567084 |location=Palo Alto, CA, USA |publisher=IEEE |pages=1–22 |doi=10.1109/HCS52781.2021.9567084 |isbn=978-1-6654-1397-8|s2cid=239039245 }}&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==See also==&lt;br /&gt;
* [[Manycore processor]]&lt;br /&gt;
* [[AI accelerator]]&lt;br /&gt;
* [[Asynchronous array of simple processors]]&lt;br /&gt;
* [[SW26010]]&lt;br /&gt;
* [[Array processor]]&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
{{Reflist}}&lt;br /&gt;
&lt;br /&gt;
[[Category:Manycore processors]]&lt;br /&gt;
[[Category:Parallel computing]]&lt;/div&gt;</summary>
		<author><name>imported&gt;Rofraja</name></author>
	</entry>
</feed>