Zilong Zheng's Homepage

I received my Ph.D. degree (’21) from the Department of Computer Science at the University of California, Los Angeles (UCLA). My research interests lie at the intersection of statistical machine learning, natural language processing, and cognition. Current research themes include:

Trustworthy AI: Crafting faithful, interpretable and trustworthy AI frameworks.
Human-like Conversational Agents: Building interactive models that align with human values and social norms.
Efficient Language Models: Efficient training and inference of long-context language models.
Generative Modeling: Statistical generative modeling (e.g. EBMs, diffusions) on high-dimensional data.

I am always looking for self-motivated students and long-term collaborators. Please contact me if you have an excellent background or share similar research interests.

NEWS

Jan, 2026	Four papers are accepted to ICLR 2026! Congratulations to Yang, Zhaowei and teams!
Jan, 2026	Our Absolute Zero was featured by WIRED as a Business headline on 1/7.
Nov, 2025	I will be serving as Area Chair for ICML 2026 and Senior Area Chair for ACL 2026.
Sep, 2025	Two papers are accepted to NeurIPS 2025! Absolute Zero is selected as a Spotlight (Top 3.2%)!
Aug, 2025	I will be serving as Area Chair for ICLR 2026.
Aug, 2025	Three papers on MoE routers (RouterLens), reinforced query reasoners for deep retrieval (TongSearch), and a new preference optimization formula with utility anchors (UAPO) are accepted to EMNLP 2025!
Jun, 2025	VideoLLaMB is accepted to ICCV 2025. Congratulations to Yuxuan and Yiqi!
Jun, 2025	In-context Value Alignment and Navi2Gaze are accepted to IROS’25 for Oral Presentations!
May, 2025	I will be serving as Senior Area Chair for EMNLP 2025.
May, 2025	Three papers on bidirectional LLM Encoder, ReflectEvo (Meta Reflection Learning) and Causal Value Steering are accepted to ACL’25! One paper on combinational creativity in VLMs is accepted to CogSci’25 for Oral presentation! Congratulations to Ziyong, Jiaqi, Yipeng and Yongqian!
May, 2025	Three papers on TokenSwift (long sequence acceleration), ToEdit (LLM model collapse) and MCU (open-ended agent evaluation) are accepted to ICML’25! MCU is selected as a Spotlight Poster! Congratulations to Tong, Xuekai and Xinyue!
Mar, 2025	OmniMMI is accepted to CVPR’25. We devised the first-ever benchmark for streaming interactive Omni understanding. Please try your models on the OmniMMI Leaderboard.
Jan, 2025	Three papers on in-context knowledge editing, multimodal knowledge editing and in-context alignment are accepted to ICLR’25!
Dec, 2024	I will co-host the 1st Workshop on Large Language Models and Structure Modeling. Stay tuned.
Dec, 2024	Diver-CT is accepted to AAAI’25. Congratulations to Andrew!

SELECTED PUBLICATIONS

Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning

Tong Wu^*, Yang Liu^*, Jun Bai^*, Zixia Jia, Shuyi Zhang, Ziyong Lin, Yanting Wang, Song-Chun Zhu, and Zilong Zheng^#, Preprint, 2026.

Abs arXiv Bib Code Website X YouTube

#1 Paper of the Day

We introduce Native Parallel Reasoner (NPR), a teacher-free framework that enables Large Language Models (LLMs) to self-evolve genuine parallel reasoning capabilities. NPR transforms the model from sequential emulation to native parallel cognition through three key innovations: 1) a self-distilled progressive training paradigm that transitions from "cold-start" format discovery to strict topological constraints without external supervision; 2) a novel Parallel-Aware Policy Optimization (PAPO) algorithm that optimizes branching policies directly within the execution graph, allowing the model to learn adaptive decomposition via trial and error; and 3) a robust NPR Engine that refactors memory management and flow control of SGLang to enable stable, large-scale parallel RL training. Across eight reasoning benchmarks, NPR trained on Qwen3-4B achieves performance gains of up to 24.5% and inference speedups up to 4.6x. Unlike prior baselines that often fall back to autoregressive decoding, NPR demonstrates 100% genuine parallel execution, establishing a new standard for self-evolving, efficient, and scalable agentic reasoning.
@misc{wu2025npr, title={Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning}, author={Tong Wu and Yang Liu and Jun Bai and Zixia Jia and Shuyi Zhang and Ziyong Lin and Yanting Wang and Song-Chun Zhu and Zilong Zheng}, year={2025}, eprint={2512.07461}, archivePrefix={arXiv}, primaryClass={cs.CL}, url={https://arxiv.org/abs/2512.07461}, }
Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space

Hengli Li^*, Chenxi Li^*, Tong Wu, Xuekai Zhu, Yuxuan Wang, Zhaoxin Yu, Eric Hanchen Jiang, Song-Chun Zhu, Zixia Jia, Ying Nian Wu^#, and Zilong Zheng^#, Preprint, 2026.

Abs arXiv Bib Code Website

Reasoning ability, a core component of human intelligence, continues to pose a significant challenge for Large Language Models (LLMs) in the pursuit of AGI. Although model performance has improved under the training scaling law, significant challenges remain, particularly with respect to training algorithms, such as catastrophic forgetting, and the limited availability of novel training data. As an alternative, test-time scaling enhances reasoning performance by increasing test-time computation without parameter updating. Unlike prior methods in this paradigm focused on token space, we propose leveraging latent space for more effective reasoning and better adherence to the test-time scaling law. We introduce LatentSeek, a novel framework that enhances LLM reasoning through Test-Time Instance-level Adaptation (TTIA) within the model's latent space. Specifically, LatentSeek leverages policy gradient to iteratively update latent representations, guided by self-generated reward signals. LatentSeek is evaluated on a range of reasoning benchmarks, including GSM8K, MATH-500, and AIME2024, across multiple LLM architectures. Results show that LatentSeek consistently outperforms strong baselines, such as Chain-of-Thought prompting and fine-tuning-based methods. Furthermore, our analysis demonstrates that LatentSeek is highly efficient, typically converging within a few iterations for problems of average complexity, while also benefiting from additional iterations, thereby highlighting the potential of test-time scaling in the latent space. These findings position LatentSeek as a lightweight, scalable, and effective solution for enhancing the reasoning capabilities of LLMs.
@misc{li2025seekdarkreasoningtesttime, title={Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space}, author={Hengli Li and Chenxi Li and Tong Wu and Xuekai Zhu and Yuxuan Wang and Zhaoxin Yu and Eric Hanchen Jiang and Song-Chun Zhu and Zixia Jia and Ying Nian Wu and Zilong Zheng}, year={2025}, eprint={2505.13308}, archivePrefix={arXiv}, primaryClass={cs.LG}, url={https://arxiv.org/abs/2505.13308}, }
Absolute Zero: Reinforced Self-play Reasoning with Zero Data NeurIPS'25 Spotlight

Andrew Zhao, Yiran Wu, Yang Yue, Tong Wu, Quentin Xu, Yang Yue, Matthieu Lin, Shenzhi Wang, Qingyun Wu, Zilong Zheng^#, and Gao Huang^#, in NeurIPS, 2025.

Abs arXiv Bib Code Website Model Wired X

#1 Paper of the Day

Reinforcement learning with verifiable rewards (RLVR) has shown promise in enhancing the reasoning capabilities of large language models by learning directly from outcome-based rewards. Recent RLVR works that operate under the zero setting avoid supervision in labeling the reasoning process, but still depend on manually curated collections of questions and answers for training. The scarcity of high-quality, human-produced examples raises concerns about the long-term scalability of relying on human supervision, a challenge already evident in the domain of language model pretraining. Furthermore, in a hypothetical future where AI surpasses human intelligence, tasks provided by humans may offer limited learning potential for a superintelligent system. To address these concerns, we propose a new RLVR paradigm called Absolute Zero, in which a single model learns to propose tasks that maximize its own learning progress and improves reasoning by solving them, without relying on any external data. Under this paradigm, we introduce the Absolute Zero Reasoner (AZR), a system that self-evolves its training curriculum and reasoning ability by using a code executor to both validate proposed code reasoning tasks and verify answers, serving as a unified source of verifiable reward to guide open-ended yet grounded learning. Despite being trained entirely without external data, AZR achieves overall SOTA performance on coding and mathematical reasoning tasks, outperforming existing zero-setting models that rely on tens of thousands of in-domain human-curated examples. Furthermore, we demonstrate that AZR can be effectively applied across different model scales and is compatible with various model classes.
@inproceedings{zhao2025absolutezero, title={Absolute Zero: Reinforced Self-play Reasoning with Zero Data}, author={Andrew Zhao and Yiran Wu and Yang Yue and Tong Wu and Quentin Xu and Yang Yue and Matthieu Lin and Shenzhi Wang and Qingyun Wu and Zilong Zheng and Gao Huang}, year={2025}, booktitle = {Advances in Neural Information Processing Systems (NeurIPS)}, url={https://arxiv.org/abs/2505.03335}, }
MCU: An Evaluation Framework for Open-Ended Game Agents ICML'25 Spotlight

Xinyue Zheng^*, Haowei Lin^*, Kaichen He, Zihao Wang, Zilong Zheng^#, and Yitao Liang^#, in ICML, 2025.

Abs arXiv Bib Code Website

Developing AI agents capable of interacting with open-world environments to solve diverse tasks is a compelling challenge. However, evaluating such open-ended agents remains difficult, with current benchmarks facing scalability limitations. To address this, we introduce Minecraft Universe (MCU), a comprehensive evaluation framework set within the open-world video game Minecraft. MCU incorporates three key components: (1) an expanding collection of 3,452 composable atomic tasks that encompasses 11 major categories and 41 subcategories of challenges; (2) a task composition mechanism capable of generating infinite diverse tasks with varying difficulty; and (3) a general evaluation framework that achieves 91.5% alignment with human ratings for open-ended task assessment. Empirical results reveal that even state-of-the-art foundation agents struggle with the increasing diversity and complexity of tasks. These findings highlight the necessity of MCU as a robust benchmark to drive progress in AI agent development within open-ended environments.
@inproceedings{zheng2025mcu, title={MCU: An Evaluation Framework for Open-Ended Game Agents}, author={Zheng, Xinyue and Lin, Haowei and He, Kaichen and Wang, Zihao and Zheng, Zilong and Liang, Yitao}, booktitle = {Proceedings of the 42nd International Conference on Machine Learning}, year={2025} }
TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation ICML'25

Tong Wu, Junzhe Shen, Zixia Jia, Yuxuan Wang, and Zilong Zheng^#, in ICML, 2025.

Abs arXiv Bib Code Website X

Generating ultra-long sequences with large language models (LLMs) has become increasingly crucial but remains a highly time-intensive task, particularly for sequences up to 100K tokens. While traditional speculative decoding methods exist, simply extending their generation limits fails to accelerate the process and can be detrimental. Through an in-depth analysis, we identify three major challenges hindering efficient generation: frequent model reloading, dynamic key-value (KV) management and repetitive generation. To address these issues, we introduce TOKENSWIFT, a novel framework designed to substantially accelerate the generation process of ultra-long sequences while maintaining the target model's inherent quality. Experimental results demonstrate that TOKENSWIFT achieves over 3 times speedup across models of varying scales (1.5B, 7B, 8B, 14B) and architectures (MHA, GQA). This acceleration translates to hours of time savings for ultra-long sequence generation, establishing TOKENSWIFT as a scalable and effective solution at unprecedented lengths. Code can be found at this URL.
@inproceedings{wu2025tokenswift, title={TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation}, author={Wu, Tong and Shen, Junzhe and Jia, Zixia and Wang, Yuxuan and Zheng, Zilong}, booktitle = {Proceedings of the 42nd International Conference on Machine Learning}, year={2025} }
OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts CVPR'25

Yuxuan Wang, Yueqian Wang, Bo Chen, Tong Wu, Dongyan Zhao, and Zilong Zheng^#, in CVPR, 2025.

Abs arXiv Bib Code Website

The rapid advancement of multi-modal language models (MLLMs) like GPT-4o has propelled the development of Omni language models, designed to process and proactively respond to continuous streams of multi-modal data. Despite their potential, evaluating their real-world interactive capabilities in streaming video contexts remains a formidable challenge. In this work, we introduce OmniMMI, a comprehensive multi-modal interaction benchmark tailored for OmniLLMs in streaming video contexts. OmniMMI encompasses over 1,121 real-world interactive videos and 2,290 questions, addressing two critical yet underexplored challenges in existing video benchmarks: streaming video understanding and proactive reasoning, across six distinct subtasks. Moreover, we propose a novel framework, Multi-modal Multiplexing Modeling (M4), designed to enhance real-time interactive reasoning with minimum finetuning on pre-trained MLLMs. Extensive experimental results reveal that the existing MLLMs fall short in interactive streaming understanding, particularly struggling with proactive tasks and multi-turn queries. Our proposed M4, though lightweight, demonstrates a significant improvement in handling proactive tasks and real-time interactions.
@inproceedings{cvpr25omnimmi, title={OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts}, author={Wang, Yuxuan and Wang, Yueqian and Chen, Bo and Wu, Tong and Zhao, Dongyan and Zheng, Zilong}, booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)}, year={2025} }
In situ bidirectional human-robot value alignment ScienceRobotics

Luyao Yuan^*#, Xiaofeng Gao^*, Zilong Zheng^*, Mark Edmonds^#, Ying Nian Wu, Federico Rossano, Hongjing Lu^#, Yixin Zhu^#, and Song-Chun Zhu^#, in Science Robotics, 2022.

Abs DOI Bib Supp Video Code Website TechXplore 科技日报/新华网

A prerequisite for social coordination is bidirectional communication between teammates, each playing two roles simultaneously: as receptive listeners and expressive speakers. For robots working with humans in complex situations with multiple goals that differ in importance, failure to fulfill the expectation of either role could undermine group performance due to misalignment of values between humans and robots. Specifically, a robot needs to serve as an effective listener to infer human users’ intents from instructions and feedback and as an expressive speaker to explain its decision processes to users. Here, we investigate how to foster effective bidirectional human-robot communications in the context of value alignment—collaborative robots and users form an aligned understanding of the importance of possible task goals. We propose an explainable artificial intelligence (XAI) system in which a group of robots predicts users’ values by taking in situ feedback into consideration while communicating their decision processes to users through explanations. To learn from human feedback, our XAI system integrates a cooperative communication model for inferring human values associated with multiple desirable goals. To be interpretable to humans, the system simulates human mental dynamics and predicts optimal explanations using graphical models. We conducted psychological experiments to examine the core components of the proposed computational framework. Our results show that real-time human-robot mutual understanding in complex cooperative tasks is achievable with a learning model based on bidirectional communication. We believe that this interaction framework can shed light on bidirectional value alignment in communicative XAI systems and, more broadly, in future human-machine teaming systems. An explainable artificial intelligence collaboration framework enables in situ bidirectional human-robot value alignment.
@article{ doi:10.1126/scirobotics.abm4183, author = {Luyao Yuan and Xiaofeng Gao and Zilong Zheng and Mark Edmonds and Ying Nian Wu and Federico Rossano and Hongjing Lu and Yixin Zhu and Song-Chun Zhu }, title = {In situ bidirectional human-robot value alignment}, journal = {Science Robotics}, volume = {7}, number = {68}, pages = {eabm4183}, year = {2022}, doi = {10.1126/scirobotics.abm4183}, URL = {https://www.science.org/doi/abs/10.1126/scirobotics.abm4183}, eprint = {https://www.science.org/doi/pdf/10.1126/scirobotics.abm4183} }
Patchwise Generative ConvNet: Training Energy-Based Models from a Single Natural Image for Internal Learning CVPR'21 Oral

Zilong Zheng, Jianwen Xie, and Ping Li, in CVPR, 2021.

Abs Bib PDF Supp Code Website

Exploiting internal statistics of a single natural image has long been recognized as a significant research paradigm where the goal is to learn the internal distribution of patches within the image without relying on external training data. Different from prior works that model such a distribution implicitly with a top-down latent variable model (e.g., generator), this paper proposes to explicitly represent the statistical distribution within a single natural image by using an energy-based generative framework, where a pyramid of energy functions, each parameterized by a bottom-up deep neural network, are used to capture the distributions of patches at different resolutions. Meanwhile, a coarse-to-fine sequential training and sampling strategy is presented to train the model efficiently. Besides learning to generate random samples from white noise, the model can learn in parallel with a self-supervised task (e.g., recover the input image from its corrupted version), which can further improve the descriptive power of the learned model. The proposed model is simple and natural in that it does not require an auxiliary model (e.g., discriminator) to assist the training. Besides, it also unifies internal statistics learning and image generation in a single framework. Experimental results presented on various image generation and manipulation tasks, including super-resolution, image editing, harmonization, style transfer, etc., have demonstrated the effectiveness of our model for internal learning.
@inproceedings{zheng2021patchgencn, title={Patchwise Generative ConvNet: Training Energy-Based Models from a Single Natural Image for Internal Learning}, author={Zheng, Zilong and Xie, Jianwen and Li, Ping}, booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)}, year={2021} }
Reasoning Visual Dialogs with Structural and Partial Observations CVPR'19 Oral

Zilong Zheng^*, Wenguan Wang^*, Siyuan Qi^*, and Song-Chun Zhu, in CVPR, 2019.

Abs arXiv Bib Code

We propose a novel model to address the task of Visual Dialog which exhibits complex dialog structures. To obtain a reasonable answer based on the current question and the dialog history, the underlying semantic dependencies between dialog entities are essential. In this paper, we explicitly formalize this task as inference in a graphical model with partially observed nodes and unknown graph structures (relations in dialog). The given dialog entities are viewed as the observed nodes. The answer to a given question is represented by a node with missing value. We first introduce an Expectation Maximization algorithm to infer both the underlying dialog structures and the missing node values (desired answers). Based on this, we proceed to propose a differentiable graph neural network (GNN) solution that approximates this process. Experiment results on the VisDial and VisDial-Q datasets show that our model outperforms comparative methods. It is also observed that our method can infer the underlying dialog structure for better dialog reasoning.
@inproceedings{zheng2019reasoning, title={Reasoning Visual Dialogs with Structural and Partial Observations}, author={Zheng, Zilong and Wang, Wenguan and Qi, Siyuan and Zhu, Song-Chun}, booktitle={Computer Vision and Pattern Recognition (CVPR), 2019 IEEE Conference on}, year={2019} }
Learning Descriptor Networks for 3D Shape Synthesis and Analysis CVPR'18 Oral

Jianwen Xie^*, Zilong Zheng^*, Ruiqi Gao, Wenguan Wang, Song-Chun Zhu, and Ying Nian Wu, in CVPR, 2018.

Abs arXiv Bib Code Website

This paper proposes a 3D shape descriptor network, which is a deep convolutional energy-based model, for modeling volumetric shape patterns. The maximum likelihood training of the model follows an “analysis by synthesis” scheme and can be interpreted as a mode seeking and mode shifting process. The model can synthesize 3D shape patterns by sampling from the probability distribution via MCMC such as Langevin dynamics. The model can be used to train a 3D generator network via MCMC teaching. The conditional version of the 3D shape descriptor net can be used for 3D object recovery and 3D object super-resolution. Experiments demonstrate that the proposed model can generate realistic 3D shape patterns and can be useful for 3D shape analysis.
@inproceedings{xie2018learning, title={Learning Descriptor Networks for 3D Shape Synthesis and Analysis}, author={Xie, Jianwen and Zheng, Zilong and Gao, Ruiqi and Wang, Wenguan and Zhu, Song-Chun and Wu, Ying Nian}, booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)}, pages={8629--8638}, year={2018} }