We focus on making AI smaller, faster, and more efficient through full-stack innovations:

  • 🧠 Algorithm: Designing efficient model architectures and approximations (e.g., sparsity, compression).
  • ⚙️ System: Building hardware-aware system support to accelerate emerging AI workloads.
  • 🚀 Application: Applying efficient AI to real-world use cases in generative AI, robotics, and scientific discovery.

We are part of the UCSD ML Systems Group and the UCSD Center for Visual Computing.

News

  • Jan 2026 ParoQuant has been accepted to ICLR 2026! ParoQuant enables efficient reasoning LLM inference through pairwise rotation quantization.
  • Jan 2026 DFlash is released! DFlash uses block diffusion for speculative decoding, enabling efficient and high-quality parallel drafting.
  • Jun 2025 SparseVILA has been accepted to ICCV 2025! SparseVILA decouples visual token sparsity for efficient vision-language model inference.
  • Jun 2025 SparseLoRA has been accepted to ICML 2025! SparseLoRA applies contextual sparsity to skip unnecessary computations during fine-tuning, achieving up to a 2.2x reduction in compute.
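The SparseLoRA entry above refers to contextual sparsity: for a given input, only the most important channels are computed and the rest are skipped. A minimal NumPy sketch of the general idea is below; the function name, the `keep_ratio` parameter, and the top-k magnitude-based channel selection are illustrative assumptions, not SparseLoRA's actual predictor or method.

```python
import numpy as np

def contextual_sparse_linear(x, W, keep_ratio=0.5):
    """Illustrative contextual sparsity: keep only the input channels with
    the largest |activation| for this particular input x, and skip the
    matmul work for the remaining rows of W. (Hypothetical sketch, not
    the SparseLoRA implementation.)"""
    k = max(1, int(keep_ratio * x.shape[-1]))
    # Select the k input channels with the largest activation magnitude.
    idx = np.argsort(np.abs(x))[-k:]
    # Multiply using only the selected activations and matching rows of W.
    return x[idx] @ W[idx, :]

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
W = rng.standard_normal((8, 4))

dense = x @ W                                        # full computation
sparse = contextual_sparse_linear(x, W, keep_ratio=0.75)  # approximation
```

With `keep_ratio=1.0` the sketch reduces exactly to the dense matmul; lowering the ratio trades a small approximation error for proportionally less compute, which is the trade-off the contextual-sparsity line of work exploits.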

Highlights