
We focus on making AI smaller, faster, and more efficient through full-stack innovations:
- 🧠 Algorithm: Designing efficient model architectures and approximations (e.g., sparsity, compression).
- ⚙️ System: Building hardware-aware system support to accelerate emerging AI workloads.
- 🚀 Application: Working with real-world use cases in generative AI, robotics, and scientific discovery.
We are part of the UCSD ML Systems Group and the UCSD Center for Visual Computing.
News
- Jan 2026: ParoQuant is accepted to ICLR 2026! ParoQuant enables efficient inference for reasoning LLMs through pairwise rotation quantization.
- Jan 2026: DFlash is released! DFlash uses block diffusion for speculative decoding, enabling efficient and high-quality parallel drafting.
- Jun 2025: SparseVILA is accepted to ICCV 2025! SparseVILA decouples visual token sparsity for efficient vision-language model inference.
- Jun 2025: SparseLoRA is accepted to ICML 2025! SparseLoRA applies contextual sparsity to skip unnecessary computations during fine-tuning, achieving up to a 2.2x reduction in compute.
Z Lab