Title: Compressing Large Language Models by Streamlining the Unimportant Layer

Abstract: Large language models (LLMs) have been extensively applied across natural language tasks and domains, but their applicability is constrained by their large number of parameters. Consequently, there is growing emphasis on compact models that retain high performance. In this study, we observe that different layers in an LLM perturb the hidden states to varying degrees, which allows us to identify the less important layers. Based on this observation, we propose LLM-Streamline, which consists of two parts: layer pruning, which removes the set of consecutive layers with the lowest importance in the model according to the target sparsity, and layer replacement, which trains a lightweight model to substitute for the pruned layers, thereby mitigating the performance degradation caused by pruning. In our experiments, we evaluate lightweight models such as a multi-layer perceptron (MLP) and a transformer layer, and ultimately demonstrate that a single MLP can effectively fit the pruned layers. Comprehensive experiments show that LLM-Streamline outperforms previous state-of-the-art (SOTA) model pruning methods.
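
The two parts described in the abstract can be summarized in a short sketch. The snippet below is illustrative only, not the paper's implementation: it assumes a toy model object exposing an embed(...) method and a list of residual transformer blocks as model.layers, and the names score_layers, find_prune_span, and ReplacementMLP are our own.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def score_layers(model, calib_batches):
        """Score each layer by how much it perturbs the hidden states:
        importance = 1 - cosine similarity between the layer's input and
        output hidden states, averaged over a small calibration set."""
        n = len(model.layers)
        sims = torch.zeros(n)
        for batch in calib_batches:
            h = model.embed(batch)  # assumed hook; shape (B, T, D)
            for i, layer in enumerate(model.layers):
                h_out = layer(h)    # assumed to return the new hidden states
                sims[i] += F.cosine_similarity(
                    h.flatten(1), h_out.flatten(1), dim=-1).mean()
                h = h_out
        return 1.0 - sims / len(calib_batches)  # low score = unimportant

    def find_prune_span(scores, n_prune):
        """Start index of the n_prune consecutive layers whose summed
        importance is lowest; this is the span to remove."""
        windows = scores.unfold(0, n_prune, 1).sum(dim=1)
        return int(windows.argmin())

    class ReplacementMLP(nn.Module):
        """Lightweight substitute for the pruned span of layers."""
        def __init__(self, d_model, d_hidden):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.GELU(),
                nn.Linear(d_hidden, d_model),
            )
        def forward(self, x):
            # Residual form, mirroring the transformer blocks it replaces.
            return x + self.net(x)

In one plausible training setup, the ReplacementMLP is inserted where the span was removed and fit with a regression loss (e.g. MSE) that maps the pruned span's input hidden states to its output hidden states, with the rest of the network frozen.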
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
ACM classes: I.2.7
Cite as: arXiv:2403.19135 [cs.CL] (or arXiv:2403.19135v2 [cs.CL] for this version)

Submission history

From: Xiaodong Chen
[v1] Thu, 28 Mar 2024 04:12:13 GMT (217kb,D)
[v2] Sun, 31 Mar 2024 08:16:58 GMT (217kb,D)
