LServe: Efficient Long-Sequence LLM Serving with Unified Sparse Attention

Shang Yang*, Junxian Guo*, Haotian Tang, Qinghao Hu, Guangxuan Xiao, Jiaming Tang, Yujun Lin, Zhijian Liu, Yao Lu, Song Han

MLSys 2025