A Customizable Domain-Specific Memory-Centric FPGA Overlay for Machine Learning Applications

Jan 1, 2021
Atiyehsadat Panahi, Suhail Basalama, Ange-Thierry Ishimwe, David Andrews
Abstract
This paper presents an overview and performance analysis of a software-programmable domain-customizable System-on-Chip (SoC) overlay for low-latency inferencing of variable and low-precision Machine Learning (ML) networks targeting Internet-of-Things (IoT) edge devices. The SoC includes a 2-D processor array that can be customized at design time for FPGA logic families. The overlay resolves historic issues of poor designer productivity associated with traditional Field Programmable Gate Array (FPGA) design flows without the performance losses normally incurred by overlays. A standard Instruction Set Architecture (ISA) allows different ML networks to be quickly compiled and run on the overlay without the need to resynthesize. Performance results are presented that show the overlay achieves $1.3\times$ to $8.0\times$ speedup over custom designs while still allowing rapid changes to ML algorithms on the FPGA through standard compilation.
Type
Publication
In IEEE International Conference on Field-Programmable Logic and Applications