Computer Science > Machine Learning

Title: Genetic Quantization-Aware Approximation for Non-Linear Operations in Transformers

Abstract: Non-linear functions are prevalent in Transformers and their lightweight variants, incurring substantial and frequently underestimated hardware costs. Previous state-of-the-art works optimize these operations by piece-wise linear approximation and store the parameters in look-up tables (LUTs), but most of them require hardware-unfriendly high-precision arithmetic such as FP/INT32 and lack consideration of integer-only INT quantization. This paper proposes a genetic LUT-approximation algorithm, namely GQA-LUT, that can automatically determine the parameters with quantization awareness. The results demonstrate that GQA-LUT achieves negligible degradation on the challenging semantic segmentation task for both vanilla and linear Transformer models. Moreover, the proposed GQA-LUT enables INT8-based LUT approximation, which achieves area savings of 81.3~81.7% and a power reduction of 79.3~80.2% compared to the high-precision FP/INT32 alternatives. Code is available at https://github.com/PingchengDong/GQA-LUT.
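The abstract above describes approximating a non-linear operation with piece-wise linear segments whose slope/intercept parameters are stored in a look-up table and evaluated under integer quantization. The sketch below is a minimal, illustrative Python example of that general idea only: it assumes a GELU target, uniformly placed breakpoints, and a fixed INT8 input scale, and it does not implement the genetic, quantization-aware parameter search that GQA-LUT performs; all names here are hypothetical.

    # Sketch: piece-wise linear LUT approximation of a non-linear op (GELU),
    # evaluated on INT8-quantized inputs. Breakpoints are uniform here, not
    # searched genetically as in GQA-LUT; everything below is illustrative.
    import numpy as np

    def gelu(x):
        # tanh-based GELU approximation
        return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

    def build_lut(fn, lo=-4.0, hi=4.0, segments=8):
        """Fit one slope/intercept per uniform segment and store them as a small LUT."""
        edges = np.linspace(lo, hi, segments + 1)
        slopes, intercepts = [], []
        for a, b in zip(edges[:-1], edges[1:]):
            k = (fn(b) - fn(a)) / (b - a)   # line through the segment endpoints
            c = fn(a) - k * a
            slopes.append(k)
            intercepts.append(c)
        return edges, np.array(slopes), np.array(intercepts)

    def lut_approx_int8(x, edges, slopes, intercepts, scale=4.0 / 127):
        """Quantize the input to INT8, pick the segment from the LUT, apply y = k*x + c."""
        q = np.clip(np.round(x / scale), -128, 127)   # INT8 activation code
        xq = q * scale                                # dequantized value seen by the LUT
        idx = np.clip(np.searchsorted(edges, xq) - 1, 0, len(slopes) - 1)
        return slopes[idx] * xq + intercepts[idx]

    edges, k, c = build_lut(gelu)
    x = np.linspace(-4, 4, 9)
    print(np.max(np.abs(lut_approx_int8(x, edges, k, c) - gelu(x))))  # max approximation error

In GQA-LUT, the breakpoint and parameter choices would instead be optimized by a genetic search that is aware of the target integer precision; the sketch only shows the LUT evaluation path such a result would feed.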
Comments: 61st ACM/IEEE Design Automation Conference (DAC) 2024
Subjects: Machine Learning (cs.LG); Hardware Architecture (cs.AR); Neural and Evolutionary Computing (cs.NE)
Cite as: arXiv:2403.19591 [cs.LG]
  (or arXiv:2403.19591v2 [cs.LG] for this version)

Submission history

From: Pingcheng Dong
[v1] Thu, 28 Mar 2024 17:13:47 GMT (858kb,D)
[v2] Fri, 29 Mar 2024 14:13:11 GMT (858kb,D)
