Computer Science > Machine Learning

Title: Gradient-based Automatic Per-Weight Mixed Precision Quantization for Neural Networks On-Chip

Abstract: Model size and inference speed at deployment time are major challenges in many deep learning applications. A promising strategy to overcome these challenges is quantization. However, straightforward uniform quantization to very low precision can result in significant accuracy loss. Mixed-precision quantization, based on the idea that some parts of the network can tolerate lower precision than others without compromising performance, offers a potential solution. In this work, we present High Granularity Quantization (HGQ), an innovative quantization-aware training method designed to automatically fine-tune per-weight and per-activation precision for ultra-low-latency, low-power neural networks deployed on FPGAs. We demonstrate that HGQ can outperform existing methods by a substantial margin, reducing resource usage by up to a factor of 20 and improving latency by a factor of 5 while preserving accuracy.
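
The abstract describes tuning per-weight and per-activation precision by gradient descent during quantization-aware training. As a rough illustration of that general idea (not the authors' HGQ implementation or API), the following PyTorch sketch attaches a learnable bitwidth to each weight and uses straight-through estimators so that both the weights and the bitwidths receive gradients during training; the class name, default values, and the simple bit-count penalty below are assumptions for illustration only.

import torch
import torch.nn as nn


class PerWeightFakeQuant(nn.Module):
    # Illustrative per-weight fake quantizer with one learnable bitwidth per weight.
    # A generic sketch of gradient-based mixed-precision quantization, not HGQ itself;
    # names and defaults here are assumptions.
    def __init__(self, weight_shape, init_bits=4.0, max_bits=8.0):
        super().__init__()
        # Continuous bitwidth per weight, rounded during the forward pass.
        self.bits = nn.Parameter(torch.full(weight_shape, float(init_bits)))
        self.max_bits = float(max_bits)

    def forward(self, w):
        # Round the learnable bitwidths with a straight-through estimator (STE):
        # the forward pass sees integer bitwidths, while the backward pass treats
        # rounding as the identity so gradients still reach self.bits.
        b = self.bits.clamp(1.0, self.max_bits)
        b_int = b + (b.round() - b).detach()
        scale = 2.0 ** (b_int - 1)           # fixed-point scale for weights in [-1, 1)
        w_scaled = w * scale
        # STE on the weight rounding as well, so the task loss backpropagates
        # through both the weights and (via the scale) the bitwidths.
        w_rounded = w_scaled + (w_scaled.round() - w_scaled).detach()
        return w_rounded / scale


# Hypothetical usage: quantize a linear layer's weights and add a simple
# bit-count penalty so training can trade accuracy against resource usage.
layer = nn.Linear(16, 8)
quant = PerWeightFakeQuant(layer.weight.shape)
x = torch.randn(4, 16)
y = torch.nn.functional.linear(x, quant(layer.weight), layer.bias)
task_loss = y.pow(2).mean()                  # placeholder task loss
resource_penalty = quant.bits.clamp(1.0, quant.max_bits).sum()
loss = task_loss + 1e-4 * resource_penalty   # weighting factor is illustrative
loss.backward()

In this sketch the bit-count penalty stands in for a hardware resource estimate; training then settles each weight's precision wherever the accuracy and resource terms balance, which is the trade-off the abstract attributes to HGQ.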
Subjects: Machine Learning (cs.LG); Instrumentation and Detectors (physics.ins-det)
Cite as: arXiv:2405.00645 [cs.LG]
  (or arXiv:2405.00645v1 [cs.LG] for this version)

Submission history

From: Chang Sun [view email]
[v1] Wed, 1 May 2024 17:18:46 GMT (223kb,D)
