Computer Science > Machine Learning
Title: Batched Low-Rank Adaptation of Foundation Models
(Submitted on 9 Dec 2023 (v1), last revised 25 Apr 2024 (this version, v3))
Abstract: Low-Rank Adaptation (LoRA) has recently gained attention for fine-tuning foundation models by incorporating trainable low-rank matrices, thereby reducing the number of trainable parameters. While LoRA offers numerous advantages, its applicability for real-time serving to a diverse and global user base is constrained by its inability to handle multiple task-specific adapters efficiently. This imposes a performance bottleneck in scenarios requiring personalized, task-specific adaptations for each incoming request. To mitigate this constraint, we introduce Fast LoRA (FLoRA), a framework in which each input example in a minibatch can be associated with its unique low-rank adaptation weights, allowing for efficient batching of heterogeneous requests. We empirically demonstrate that FLoRA retains the performance merits of LoRA, showcasing competitive results on the MultiPL-E code generation benchmark spanning 8 languages and on a multilingual speech recognition task across 6 languages.
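The batching idea in the abstract — each example in a minibatch carrying its own low-rank adapter while sharing the base weights — can be illustrated with a small numpy sketch. This is a hedged illustration of per-example adapters fused into one batched pass, not the paper's exact FLoRA formulation; all shapes and names (`d_in`, `d_out`, `rank`, `A`, `B`) are assumptions chosen for clarity.

```python
import numpy as np

# Illustrative sizes, not from the paper.
batch, d_in, d_out, rank = 4, 8, 8, 2
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))        # shared base weight
A = rng.standard_normal((batch, rank, d_in))  # per-example adapter A_i
B = rng.standard_normal((batch, d_out, rank)) # per-example adapter B_i
x = rng.standard_normal((batch, d_in))        # one input per request

# Naive serving loop: request i uses its own adapted weight W + B_i A_i,
# which forces one matmul per request and prevents batching.
y_loop = np.stack([(W + B[i] @ A[i]) @ x[i] for i in range(batch)])

# Batched form: one shared matmul for the base weight, plus a
# per-example low-rank correction B_i (A_i x_i) expressed via einsum,
# so heterogeneous adapters share a single batched pass.
base = x @ W.T                                 # (batch, d_out)
low = np.einsum('bor,bri,bi->bo', B, A, x)     # B_i A_i x_i per example
y_batched = base + low

assert np.allclose(y_loop, y_batched)
```

The low-rank correction costs only O(rank · (d_in + d_out)) extra work per example, which is why keeping the adapters factored (rather than materializing each W + B_i A_i) makes heterogeneous batching cheap.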
Submission history
From: Yeming Wen
[v1] Sat, 9 Dec 2023 20:51:48 GMT (304kb,D)
[v2] Tue, 26 Mar 2024 22:53:56 GMT (319kb,D)
[v3] Thu, 25 Apr 2024 21:45:35 GMT (319kb,D)