About
I am a PhD candidate at the State Key Laboratory of Processors, Institute of Computing Technology, CAS, and University of Chinese Academy of Sciences, advised by Prof. Huimin Cui and Assoc. Prof. Jiacheng Zhao.
My research bridges compiler technology and large language models to build reliable AI-driven systems. I am broadly interested in end-to-end system support for LLM workloads, from compiler toolchains to runtime orchestration. I received my B.Eng. in Computer Science from UCAS in 2021.
News
- May 2026 — Two papers accepted at ICML 2026, including one spotlight (CONTINUUM).
- Apr 2026 — When Grammar Guides the Attack accepted at CCS 2026.
- Mar 2026 — LEGO-Compiler accepted at CCF THPC.
- Feb 2026 — T2T received the Distinguished Paper Award at CGO 2026.
- Sep 2025 — Two papers accepted at NeurIPS 2025 (posters).
Selected Publications
- [ICML 2026 spotlight] CONTINUUM: Restoring the Contiguous Tensor Abstraction Efficiently for Dynamic AI Workloads via Hardware Virtualization — Yangyu Zhang†, Shuoming Zhang†, et al.
- [ICML 2026] LEGO: An LLM-Enabled Hierarchical Optimizer for Tensor Computation Graphs with Structure-Aware Search and Compositional Synthesis — Ruiyuan Xu†, Shuoming Zhang†, et al.
- [CCS 2026] When Grammar Guides the Attack: Uncovering Control-Plane Vulnerabilities in LLMs with Structured Output — Shuoming Zhang, et al.
- [CGO 2026 Distinguished Paper Award] From Threads to Tiles: T2T, a Compiler for CUDA-to-NPU Translation via 2D Vectorization — Shuaijiang Li, Jiacheng Zhao, Ying Liu, Shuoming Zhang, et al.
See the full publication list.
Research Interests
- LLM infrastructure and serving systems
- Compiler optimization for AI accelerators
- LLM-based program synthesis and debugging
- Reliability and safety for LLM-driven systems
Research Projects
Current
- LLM-guided compilation workflows — Designing model-in-the-loop compilation pipelines that adapt source-to-assembly translation and error recovery with LLM feedback.
- LLM-aware compiler construction — Building reusable compiler components that leverage LLM reasoning for IR transformation, code generation, and verification.
- Secure and robust LLM decoding — Studying constrained decoding strategies to mitigate LLM safety vulnerabilities while preserving task performance.
Previous
- Heterogeneous model offloading with TVM (Intel collaboration) — Explored NPU/CPU co-execution and scheduling strategies within the TVM stack, and prototyped a new TVM backend for a simulator-based NPU.
- VLIW instruction scheduling (Huawei collaboration) — Developed instruction scheduling heuristics targeting domain-specific VLIW architectures.
Education
- Ph.D. in Computer Architecture, ICT CAS & UCAS, 2021 – 2026 (expected)
- B.Eng. in Computer Science, UCAS, 2017 – 2021