The asynchronous array of simple processors (AsAP) architecture comprises a 2-D array of reduced-complexity programmable processors with small scratchpad memories, interconnected by a reconfigurable mesh network. AsAP was developed by researchers in the VLSI Computation Laboratory (VCL) at the University of California, Davis, and achieves high performance and energy efficiency while using a relatively small circuit area. It was made in 2006.[1]

AsAP processors are clocked in a globally asynchronous locally synchronous (GALS) fashion and are well suited to implementation in future fabrication technologies. Individual oscillators fully halt (leakage only) in 9 cycles when there is no work to do, and restart at full speed in less than one cycle once work is available. The chip requires no crystal oscillators, phase-locked loops, delay-locked loops, global clock signal, or any other global frequency or phase-related signals.
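The halt-and-restart behavior described above can be illustrated with a small cycle-level model. This is only a sketch under stated assumptions: the class name `LocalOscillator`, the `tick` interface, and the choice to count the ninth idle cycle as still clocked are hypothetical modeling decisions, not details of the actual AsAP circuit.

```python
HALT_DELAY_CYCLES = 9  # from the text: oscillators halt in 9 cycles when idle


class LocalOscillator:
    """Toy model of one processor's local clock (hypothetical, not the real circuit)."""

    def __init__(self):
        self.running = True
        self.idle_cycles = 0
        self.clocked_cycles = 0  # cycles in which the clock actually toggled

    def tick(self, has_work):
        """Advance one nominal cycle; return True if the clock toggled."""
        if has_work:
            # Restart takes less than one cycle, so the clock is live immediately.
            self.running = True
            self.idle_cycles = 0
        if not self.running:
            return False  # halted: leakage only, no switching activity
        self.clocked_cycles += 1
        if not has_work:
            self.idle_cycles += 1
            if self.idle_cycles >= HALT_DELAY_CYCLES:
                self.running = False  # halt after 9 consecutive idle cycles
        return True


def simulate(workload):
    """Return how many cycles were clocked for a sequence of has-work flags."""
    osc = LocalOscillator()
    for has_work in workload:
        osc.tick(has_work)
    return osc.clocked_cycles


# Hypothetical workload: 3 busy cycles, 20 idle cycles, then 2 busy cycles.
workload = [True] * 3 + [False] * 20 + [True] * 2
clocked = simulate(workload)
always_on = len(workload)
```

In this model the clock toggles only during busy cycles and the first nine cycles of each idle stretch, so `clocked` is smaller than `always_on` whenever an idle stretch exceeds nine cycles.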

The multi-processor architecture exploits the task-level parallelism found in many complex DSP applications, and also computes many large tasks efficiently using fine-grained parallelism.

Key features

Block diagrams of a single AsAP processor and the 6x6 AsAP 1.0 chip

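AsAP's key features include its array of simple programmable processors, the small scratchpad memory in each processor, globally asynchronous locally synchronous (GALS) clocking, and a reconfigurable mesh network for inter-processor communication.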

AsAP 1 chip: 36 processors

Die photograph of the first generation 36-processor AsAP chip

A chip containing 36 (6x6) programmable processors was taped out in May 2005 in 0.18 μm CMOS using a synthesized standard cell technology and is fully functional. Processors on the chip operate at clock rates from 520 MHz to 540 MHz at 1.8 V, and each processor dissipates 32 mW on average while executing applications at 475 MHz.

Most processors run at clock rates over 600 MHz at 2.0 V, which places AsAP among the highest known clock rates of any fabricated processor (programmable or non-programmable) designed in a university; it is the second highest known in published research papers.

At 0.9 V, the average application power per processor is 2.4 mW at 116 MHz. Each processor occupies only 0.66 mm².
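The power and clock figures quoted above can be turned into an energy-per-cycle comparison with simple arithmetic. The following is a minimal sketch: the function and variable names are illustrative, and the area total covers only the 36 processors at 0.66 mm² each, not the whole die.

```python
def energy_per_cycle_pj(power_w, clock_hz):
    """Average energy per clock cycle in picojoules (power divided by frequency)."""
    return power_w / clock_hz * 1e12


PROCESSORS = 36                 # 6x6 array
AREA_PER_PROCESSOR_MM2 = 0.66   # from the text

# 32 mW average application power at 475 MHz
nominal_pj = energy_per_cycle_pj(32e-3, 475e6)

# 2.4 mW average application power at 116 MHz (0.9 V operating point)
low_voltage_pj = energy_per_cycle_pj(2.4e-3, 116e6)

# Combined area of the processors alone (excludes pads and other die area)
array_area_mm2 = PROCESSORS * AREA_PER_PROCESSOR_MM2

print(f"475 MHz point: {nominal_pj:.1f} pJ/cycle")
print(f"116 MHz point: {low_voltage_pj:.1f} pJ/cycle")
print(f"processor area: {array_area_mm2:.2f} mm^2")
```

Under these figures the 0.9 V operating point spends roughly a third of the energy per cycle of the 475 MHz point, at about a quarter of the clock rate.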

AsAP 2 chip: 167 processors

Die photograph of the second generation 167-processor AsAP 2 chip

A second-generation 65 nm CMOS design contains 167 processors with dedicated fast Fourier transform (FFT), Viterbi decoder, and video motion estimation processors; 16 KB shared memories; and long-distance inter-processor interconnect. The programmable processors can individually and dynamically change their supply voltage and clock frequency. The chip is fully functional.

Processors operate at up to 1.2 GHz at 1.3 V, which is believed to be the highest clock rate of any fabricated processor designed in a university. At 1.2 V, they operate at 1.07 GHz and dissipate 47 mW when 100% active. At 0.675 V, they operate at 66 MHz and dissipate 608 μW when 100% active. At the energy efficiency of this last operating point, 1 trillion MAC or arithmetic logic unit (ALU) operations per second would dissipate only 9.2 watts. Because of the MIMD architecture and fine-grain clock oscillator stalling, the energy efficiency per operation is almost perfectly constant across widely varying workloads, which is not the case for many architectures.
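The 9.2 watt figure can be reproduced from the 0.675 V operating point by computing the energy per operation and scaling it to 1 trillion operations per second. This sketch assumes one MAC or ALU operation per processor per clock cycle when 100% active; that assumption, and the helper names, are not stated in the text above.

```python
def energy_per_op_j(power_w, clock_hz, ops_per_cycle=1.0):
    """Energy per operation in joules; ops_per_cycle=1 is an assumption."""
    return power_w / (clock_hz * ops_per_cycle)


def power_for_throughput_w(energy_per_op, ops_per_second):
    """Power needed to sustain a throughput at a fixed energy per operation."""
    return energy_per_op * ops_per_second


TERA_OPS = 1e12  # 1 trillion operations per second

# 0.675 V operating point: 608 uW at 66 MHz
low_v_energy = energy_per_op_j(608e-6, 66e6)    # roughly 9.2 pJ per operation
low_v_power = power_for_throughput_w(low_v_energy, TERA_OPS)

# 1.2 V operating point: 47 mW at 1.07 GHz
high_v_energy = energy_per_op_j(47e-3, 1.07e9)  # roughly 44 pJ per operation
high_v_power = power_for_throughput_w(high_v_energy, TERA_OPS)

print(f"0.675 V: {low_v_energy * 1e12:.1f} pJ/op, {low_v_power:.1f} W per tera-op/s")
print(f"1.2 V:   {high_v_energy * 1e12:.1f} pJ/op, {high_v_power:.1f} W per tera-op/s")
```

Read this way, the 1 trillion operations per second figure is an energy-efficiency extrapolation from the per-processor numbers rather than a statement about the throughput of a single processor at 66 MHz.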

Applications

Many DSP and general tasks have been coded for AsAP. Mapped tasks include filters, convolutional coders, interleavers, sorting, square root, CORDIC sin/cos/arcsin/arccos, matrix multiplication, pseudo-random number generators, fast Fourier transforms (FFTs) of lengths 32–1024, a complete k=7 Viterbi decoder, a JPEG encoder, a complete fully compliant baseband processor for an IEEE 802.11a/g wireless LAN transmitter and receiver, and a complete CAVLC compression block for an H.264 encoder. Blocks plug directly together with no required modifications. Power, throughput, and area results are typically many times better than those of existing programmable DSP processors.

The architecture enables a clean separation between programming and inter-processor timing, which is handled entirely by hardware. A C compiler and automatic mapping tool further simplify programming.
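The idea that task code carries no timing logic, and that blocks can be chained without modification, can be illustrated in software. The following is a loose analogy only, not AsAP's programming model or toolchain: threads stand in for processors, bounded `queue.Queue` objects stand in for hardware links, and the names `run_stage`, `run_pipeline`, and `fifo_depth` are hypothetical.

```python
import queue
import threading

_END = object()  # sentinel marking the end of the data stream


def run_stage(task, inbox, outbox):
    """One 'processor': apply task to each item; blocking queues handle all timing."""
    while True:
        item = inbox.get()       # blocks until upstream data is available
        if item is _END:
            outbox.put(_END)
            return
        outbox.put(task(item))   # blocks while the downstream link is full


def run_pipeline(tasks, inputs, fifo_depth=2):
    """Chain tasks through bounded FIFOs and return the outputs in order."""
    links = [queue.Queue(maxsize=fifo_depth) for _ in range(len(tasks) + 1)]
    workers = [
        threading.Thread(target=run_stage, args=(task, links[i], links[i + 1]))
        for i, task in enumerate(tasks)
    ]

    def feed():
        for value in inputs:
            links[0].put(value)
        links[0].put(_END)

    feeder = threading.Thread(target=feed)
    for thread in workers + [feeder]:
        thread.start()

    results = []
    while True:
        item = links[-1].get()
        if item is _END:
            break
        results.append(item)

    for thread in workers + [feeder]:
        thread.join()
    return results


# Two independent blocks plugged together without changing either one.
doubled_plus_one = run_pipeline([lambda x: x * 2, lambda x: x + 1], range(5))
```

Each task function here contains only its computation; when it runs relative to its neighbors is decided entirely by the blocking queues, which play the role the text assigns to hardware.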

References

  1. Yu, Zhiyi; Meeuwsen, Michael J.; Apperson, Ryan W.; Sattari, Omar; Lai, Michael; Webb, Jeremy W.; Work, Eric W.; Truong, Dean; Mohsenin, Tinoosh; Baas, Bevan M. (March 2008). "AsAP: An Asynchronous Array of Simple Processors". IEEE Journal of Solid-State Circuits. 43 (3): 695–705. doi:10.1109/JSSC.2007.916616. ISSN 0018-9200.