SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity
Abstract
Fine-tuning LLMs is both computationally and memory-intensive. While parameter-efficient fine-tuning methods, such as DoRA, reduce the number of trainable parameters and lower memory usage, they do not decrease computational cost; in some cases, they can even slow down fine-tuning. In this paper, we introduce SparseLoRA, a method that accelerates LLM fine-tuning through contextual sparsity. We propose a lightweight, training-free SVD sparsity estimator that dynamically selects a sparse subset of weights for loss and gradient computation. In addition, we systematically analyze and address sensitivity across layers, tokens, and training steps. Our experimental results show that SparseLoRA reduces computational cost by up to 1.7× and achieves a measured speedup of up to 1.4× while maintaining accuracy across various downstream tasks, including commonsense and arithmetic reasoning. We hope this work supports the development of fine-tuning methods that are both parameter- and computation-efficient.
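To make the estimator's role concrete, the PyTorch sketch below precomputes a rank-r SVD of a weight matrix once (training-free), then uses the low-rank factors as a cheap proxy for per-output-channel magnitudes on the current input, keeping only the highest-scoring channels for the dense computation. This is a minimal sketch under assumptions: the function names (build_svd_estimator, select_active_channels), the rank, the top-k magnitude scoring, and the 75% sparsity level are illustrative placeholders, not the paper's exact implementation.

import torch

def build_svd_estimator(weight: torch.Tensor, rank: int = 8):
    # One-time, training-free setup: factor W (d_out x d_in) into
    # low-rank factors U_r @ V_r so channel activity can be estimated
    # in O(r * (d_in + d_out)) per token instead of a full matmul.
    U, S, Vh = torch.linalg.svd(weight.float(), full_matrices=False)
    U_r = U[:, :rank] * S[:rank]  # (d_out, r), singular values folded in
    V_r = Vh[:rank, :]            # (r, d_in)
    return U_r, V_r

def select_active_channels(x: torch.Tensor, U_r, V_r, sparsity: float = 0.75):
    # Cheap proxy for |W @ x|: pass the input through the low-rank
    # factors, then average magnitudes over all tokens in the batch.
    proxy = (x @ V_r.T) @ U_r.T                       # (..., d_out)
    scores = proxy.abs().reshape(-1, proxy.shape[-1]).mean(dim=0)
    k = max(1, int(scores.numel() * (1.0 - sparsity)))
    return scores.topk(k).indices  # channels to keep in the dense pass

# Hypothetical usage (module path is illustrative):
# W = model.layers[0].mlp.up_proj.weight   # (d_out, d_in)
# U_r, V_r = build_svd_estimator(W, rank=8)
# idx = select_active_channels(hidden_states, U_r, V_r)

Under this scheme, only the weight rows indexed by the returned channels would participate in the forward and backward pass for that step, which is where the compute savings would come from; the estimator itself adds only a small low-rank matmul per layer.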
Citation
@inproceedings{khaki2025sparselora,
  title     = {SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity},
  author    = {Khaki, Samir and Li, Xiuyu and Guo, Junxian and Zhu, Ligeng and Plataniotis, Konstantinos N and Yazdanbakhsh, Amir and Keutzer, Kurt and Han, Song and Liu, Zhijian},
  booktitle = {International Conference on Machine Learning (ICML)},
  year      = {2025}
}