Differentially Private Zeroth-Order Methods for Scalable Large Language Model Finetuning

Liu, Z; Lou, J; Bao, W; Hu, Y; Li, B; Qin, Z; Ren, K

(Submitted on 12 Feb 2024 (v1), last revised 9 May 2024 (this version, v4))

Abstract: Fine-tuning on task-specific datasets is a widely embraced paradigm for harnessing the capabilities of pretrained LLMs for various downstream tasks. Due to the popularity of LLM fine-tuning and its accompanying privacy concerns, differentially private (DP) fine-tuning of pretrained LLMs has been widely used to safeguard the privacy of task-specific datasets. At the core of designing DP LLM fine-tuning methods is a satisfactory tradeoff among privacy, utility, and scalability. Most existing methods build upon the seminal work of DP-SGD. Despite pushing the scalability of DP-SGD to its limit, DP-SGD-based fine-tuning methods are unfortunately limited by the inherent inefficiency of SGD.
In this paper, we investigate the potential of DP zeroth-order methods for LLM fine-tuning, which avoid the scalability bottleneck of SGD by approximating the gradient with the more efficient zeroth-order gradient. Rather than treating the zeroth-order method as a drop-in replacement for SGD, this paper presents a comprehensive study, both theoretical and empirical. First, we propose the stagewise DP zeroth-order method (DP-ZOSO), which dynamically schedules key hyperparameters. This design is grounded in the synergy between DP random perturbation and the gradient approximation error of the zeroth-order method, and its effect on the fine-tuning trajectory.
We provide theoretical analysis for both proposed methods. We conduct extensive empirical analysis on both an encoder-only masked language model and a decoder-only autoregressive language model, achieving impressive results in terms of scalability and utility (compared with DPZero, DP-ZOPO improves by 4.5% on SST-5 and 5.5% on MNLI with RoBERTa-Large, and by 9.2% on CB and 3.9% on BoolQ with OPT-2.7B, when $\epsilon=4$).
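
To make the idea of replacing the backpropagated gradient with a zeroth-order estimate more concrete, the following is a minimal sketch of one generic DP zeroth-order update, not the DP-ZOSO or DP-ZOPO procedures themselves, whose details are not given in this abstract. It assumes a two-point finite-difference estimate along a single shared random direction, per-example clipping of the resulting scalar, and Gaussian noise scaled to the clipping bound. The function name `dp_zo_step` and all of its parameters (`lr`, `mu`, `clip`, `sigma`) are hypothetical, and no privacy accounting is performed.

```python
import numpy as np


def dp_zo_step(loss_fn, theta, batch, lr=0.05, mu=1e-3, clip=1.0, sigma=1.0, rng=None):
    """One illustrative DP zeroth-order step (hypothetical helper, not the paper's method).

    loss_fn(theta, example) must return a scalar loss for a single example.
    """
    rng = np.random.default_rng() if rng is None else rng
    # One random direction shared by every example in the batch.
    z = rng.standard_normal(theta.shape)
    # Two forward evaluations per example give a scalar estimate of the
    # directional derivative along z; no backpropagation is needed.
    diffs = np.array([
        (loss_fn(theta + mu * z, ex) - loss_fn(theta - mu * z, ex)) / (2.0 * mu)
        for ex in batch
    ])
    # Clip each per-example scalar so one example changes the sum by at most `clip`.
    clipped = np.clip(diffs, -clip, clip)
    # Add Gaussian noise to the summed scalar (assumed noise scale: sigma * clip).
    noisy_mean = (clipped.sum() + sigma * clip * rng.standard_normal()) / len(batch)
    # Move along z by the privatized directional-derivative estimate.
    return theta - lr * noisy_mean * z
```

In this sketch the privatized quantity is a single scalar per step rather than a full gradient vector, which is one reason zeroth-order approaches are attractive for memory-constrained private fine-tuning. The choice of noise scale needed for a given $\epsilon$ depends on the accounting method and is left out here.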

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2402.07818 [cs.LG]
	(or arXiv:2402.07818v4 [cs.LG] for this version)

Submission history

From: Zhihao Liu
[v1] Mon, 12 Feb 2024 17:24:15 GMT (292kb,D)
[v2] Wed, 21 Feb 2024 06:11:02 GMT (292kb,D)
[v3] Wed, 8 May 2024 07:14:42 GMT (3894kb,D)
[v4] Thu, 9 May 2024 09:41:23 GMT (3894kb,D)
