We focus on making AI smaller, faster, and more efficient through full-stack innovations:

  • 🧠 Algorithm: Designing efficient model architectures and approximations (e.g., sparsity, compression).
  • ⚙️ System: Building hardware-aware system support to accelerate emerging AI workloads.
  • 🚀 Application: Applying efficient AI to real-world use cases in generative AI, robotics, and scientific discovery.

We are part of the UCSD ML Systems Group and the UCSD Center for Visual Computing.

News

  • Jan 2026 ParoQuant has been accepted to ICLR 2026! ParoQuant enables efficient reasoning LLM inference through pairwise rotation quantization.
  • Jan 2026 DFlash is released! DFlash uses block diffusion for speculative decoding, enabling efficient and high-quality parallel drafting.
  • Jun 2025 SparseVILA has been accepted to ICCV 2025! SparseVILA decouples visual token sparsity for efficient vision-language model inference.
  • Jun 2025 SparseLoRA has been accepted to ICML 2025! SparseLoRA applies contextual sparsity to skip unnecessary computations during fine-tuning, achieving up to a 2.2x reduction in compute.
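The SparseLoRA entry above refers to contextual sparsity: for a given input, only the most important channels are computed and the rest are skipped. A minimal NumPy sketch of the general idea is below; the function name, the `keep_ratio` parameter, and the top-k magnitude-based channel selection are illustrative assumptions, not SparseLoRA's actual predictor or method.

```python
import numpy as np

def contextual_sparse_linear(x, W, keep_ratio=0.5):
    """Illustrative contextual sparsity: keep only the input channels with
    the largest |activation| for this particular input x, and skip the
    matmul work for the remaining rows of W. (Hypothetical sketch, not
    the SparseLoRA implementation.)"""
    k = max(1, int(keep_ratio * x.shape[-1]))
    # Select the k input channels with the largest activation magnitude.
    idx = np.argsort(np.abs(x))[-k:]
    # Multiply using only the selected activations and matching rows of W.
    return x[idx] @ W[idx, :]

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
W = rng.standard_normal((8, 4))

dense = x @ W                                        # full computation
sparse = contextual_sparse_linear(x, W, keep_ratio=0.75)  # approximation
```

With `keep_ratio=1.0` the sketch reduces exactly to the dense matmul; lowering the ratio trades a small approximation error for proportionally less compute, which is the trade-off the contextual-sparsity line of work exploits.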

Highlights