SPAR-2: A SIMD Processor Array for Machine Learning in IoT Devices

Jan 1, 2020
Suhail Basalama, Atiyehsadat Panahi, Ange-Thierry Ishimwe, David Andrews
Abstract
This paper presents SPAR-2, a SIMD processor array for migrating machine learning applications to FPGA-based IoT edge devices. SPAR-2 is a second-generation processor-in/near-memory architecture developed as a programmable overlay for modern FPGAs. In contrast to point designs, SPAR-2 can be programmed to implement different classes of machine learning algorithms. The overlay architecture is based on Processor-In-Memory (PIM) tiles, which integrate bit-serial ALUs with distributed Block RAMs (BRAMs). Forming PIM tiles increases both the size of the multiply-accumulate array and the on-chip storage capacity. User-visible inference latencies are reduced by exploiting concurrent accesses to weights and partial results stored throughout the distributed BRAMs. A sizing and performance analysis of SPAR-2 running a standard Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN) model is provided. Results show that the approach enables packing up to 16,384 processing elements into a Virtex-7 FPGA. Runtime performance comparisons show that SPAR-2 achieves up to a 24.51x speedup over an equivalent High-Level Synthesis (HLS) design and a 1.75x speedup over a previous custom-tuned design.
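To make the idea of bit-serial ALUs operating in SIMD lockstep more concrete, the following is a minimal software sketch and not the SPAR-2 implementation. It assumes unsigned fixed-width operands and models each list index as one processing element (PE). The function names `bit_serial_add` and `bit_serial_mac` are hypothetical and do not come from the paper.

```python
def bit_serial_add(a_vals, b_vals, width=8):
    """Element-wise add, one bit position per step (illustrative model only).

    Each list index stands in for one processing element (PE); all PEs
    handle the same bit position in the same step, as in a SIMD array.
    """
    n = len(a_vals)
    carry = [0] * n    # one carry bit held per PE
    result = [0] * n
    for bit in range(width):        # one step per bit position
        for pe in range(n):         # conceptually parallel across PEs
            a = (a_vals[pe] >> bit) & 1
            b = (b_vals[pe] >> bit) & 1
            s = a ^ b ^ carry[pe]                        # full-adder sum
            carry[pe] = (a & b) | (carry[pe] & (a ^ b))  # full-adder carry
            result[pe] |= s << bit
    return result  # wraps modulo 2**width, like fixed-width hardware


def bit_serial_mac(weights, inputs, acc, width=8):
    """Shift-and-add multiply-accumulate: acc + weights * inputs per PE.

    Assumption: unsigned operands of `width` bits, accumulator of 2*width bits.
    """
    for bit in range(width):
        # Each PE adds its shifted weight only if its own input bit is set.
        partial = [
            (w << bit) if (x >> bit) & 1 else 0
            for w, x in zip(weights, inputs)
        ]
        acc = bit_serial_add(acc, partial, width=2 * width)
    return acc


if __name__ == "__main__":
    print(bit_serial_add([1, 2, 255], [1, 3, 1]))
    print(bit_serial_mac([3, 4], [5, 6], [0, 1]))
```

In this model a single addition takes `width` steps per PE, so throughput comes from running many PEs at once rather than from a fast individual ALU, which is the trade-off a bit-serial array makes.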
Publication
In IEEE International Conference on Data Intelligence and Security