Publications

*: Equal contribution

Preprint

  1. JARVIS-1: Open-world Multi-task Agents with Memory-Augmented Multimodal Language Models
    Zihao Wang, Shaofei Cai, Anji Liu, Yonggang Jin, Jinbing Hou, Bowei Zhang, Haowei Lin, Zhaofeng He, Zilong Zheng, Yaodong Yang, Xiaojian Ma, and Yitao Liang.
  2. VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges
    Yuxuan Wang, Cihang Xie, Yang Liu, and Zilong Zheng.
  3. Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers
    Chao Lou, Zixia Jia, Zilong Zheng, and Kewei Tu.
  4. Large language models are in-context semantic reasoners rather than symbolic reasoners
    Xiaojuan Tang*, Zilong Zheng*, Jiaqi Li, Fanxu Meng, Song-Chun Zhu, Yitao Liang, and Muhan Zhang.
  5. In-Context Editing: Learning Knowledge from Self-Induced Distributions
    Siyuan Qi, Bangcheng Yang, Kailin Jiang, Xiaobo Wang, Jiaqi Li, Yifan Zhong, Yaodong Yang, and Zilong Zheng.
  6. DiveR-CT: Diversity-enhanced Red Teaming with Relaxing Constraints
    Andrew Zhao, Quentin Xu, Matthieu Liu, Shenzhi Wang, Yong-jin Liu, Zilong Zheng, and Gao Huang.

2024

  1. MindDial: Belief Dynamics Tracking with Theory-of-Mind Modeling for Situated Neural Dialogue Generation [SIGDIAL'24, Oral]
    Shuwen Qiu, Mingdian Liu, Hengli Li, Song-Chun Zhu, and Zilong Zheng, in SIGDIAL, 2024. (also in Workshop on Theory-of-Mind at ICML 2023)
  2. Mars: Situated Inductive Reasoning in an Open-World Environment [NeurIPS'24]
    Xiaojuan Tang, Jiaqi Li, Yitao Liang, Muhan Zhang, and Zilong Zheng, in NeurIPS D&B Track, 2024.
  3. Never Miss A Beat: An Efficient Recipe for Context Window Extension of Large Language Models with Consistent “Middle” Enhancement [NeurIPS'24]
    Tong Wu, Yanpeng Zhao, and Zilong Zheng, in NeurIPS, 2024.
  4. Efficient Temporal Extrapolation of Multimodal Large Language Models with Temporal Grounding Bridge for Long Video Understanding [EMNLP'24]
    Yuxuan Wang, Yueqian Wang, Pengfei Wu, Jianxin Liang, Dongyan Zhao, Yang Liu, and Zilong Zheng, in EMNLP, 2024.
  5. Varying Sentence Representations via Condition-Specified Routers [EMNLP'24]
    Ziyong Lin, Quansen Wang, Zixia Jia, and Zilong Zheng, in EMNLP, 2024.
  6. ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning [CoLM'24]
    Yuxuan Wang, Alan Yuille, Zhuowan Li, and Zilong Zheng, in CoLM, 2024.
  7. Boosting LLM Agents with Recursive Contemplation for Effective Deception Handling [ACL'24]
    Shenzhi Wang, Chang Liu, Zilong Zheng, Siyuan Qi, Shuo Chen, Qisen Yang, Andrew Zhao, Shaofei Wang, Shiji Song, and Gao Huang, in ACL Findings, 2024.
  8. LooGLE: Can Long-Context Language Models Understand Long Contexts? [ACL'24]
    Jiaqi Li, Mengmeng Wang, Zilong Zheng, and Muhan Zhang, in ACL, 2024.
  9. LangSuit⋅E: Controlling, Planning, and Interacting with Large Language Models in Embodied Text Environments [ACL'24]
    Zixia Jia, Mengmeng Wang, Baichen Tong, Song-Chun Zhu, and Zilong Zheng, in ACL Findings, 2024. (also in SpLU-RoboNLP Workshop at ACL 2024)
  10. Combining Supervised Learning and Reinforcement Learning for Multi-Label Classification Tasks with Partial Labels [ACL'24]
    Zixia Jia, Junpeng Li, Shichuan Zhang, and Zilong Zheng, in ACL, 2024.
  11. MindAgent: Emergent Gaming Interaction
    Ran Gong, Qiuyuan Huang, Xiaojian Ma, Hoi Vo, Zane Durante, Yusuke Noda, Zilong Zheng, Demetri Terzopoulos, Fei-Fei Li, and Jianfeng Gao, in NAACL Findings, 2024.

2023

  1. ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab [NeurIPS'23]
    Jieming Cui*, Ziren Gong*, Baoxiong Jia*, Siyuan Huang, Zilong Zheng, Jianzhu Ma, and Yixin Zhu, in NeurIPS D&B Track, 2023.
  2. DiPlomat: A Dialogue Dataset for Situated Pragmatic Reasoning [NeurIPS'23]
    Hengli Li, Song-Chun Zhu, and Zilong Zheng, in NeurIPS D&B Track, 2023.
  3. SQA3D: Situated Question Answering in 3D Scenes [ICLR'23]
    Xiaojian Ma*, Silong Yong*, Zilong Zheng, Qing Li, Yitao Liang, Song-Chun Zhu, and Siyuan Huang, in ICLR, 2023.
  4. Semi-automatic Data Enhancement for Document-Level Relation Extraction with Distant Supervision from Large Language Models [EMNLP'23]
    Junpeng Li*, Zixia Jia*, and Zilong Zheng, in EMNLP, 2023.
  5. VSTAR: A Video-grounded Dialogue Dataset for Situated Semantic Understanding with Scene and Topic Transitions [ACL'23]
    Yuxuan Wang, Zilong Zheng, Xueliang Zhao, Jinpeng Li, Yueqian Wang, and Dongyan Zhao, in ACL, 2023.
  6. Modeling Instance Interactions for Joint Information Extraction with Neural High-Order Conditional Random Field [ACL'23]
    Zixia Jia, Zhaohui Yan, Wenjuan Han, Zilong Zheng, and Kewei Tu, in ACL, 2023.
  7. Shuō Wén Jiě Zì: Rethinking Dictionaries and Glyphs for Chinese Language Pre-training [ACL'23]
    Yuxuan Wang, Jianghui Wang, Dongyan Zhao, and Zilong Zheng, in ACL Findings, 2023.

2022

  1. In situ bidirectional human-robot value alignment [Science Robotics]
    Luyao Yuan*, Xiaofeng Gao*, Zilong Zheng*, Mark Edmonds, Ying Nian Wu, Federico Rossano, Hongjing Lu, Yixin Zhu, and Song-Chun Zhu, Science Robotics, 2022.
  2. SHARP: Search-Based Adversarial Attack for Structured Prediction [NAACL'22]
    Liwen Zhang, Zixia Jia, Wenjuan Han, Zilong Zheng, and Kewei Tu, in NAACL Findings, 2022.
  3. VGStore: A Multimodal Extension to SPARQL for Querying RDF Scene Graph [ISWC'22]
    Yanzeng Li, Zilong Zheng, Wenjuan Han, and Lei Zou, in ISWC Poster & Demo Track, 2022.
  4. Unsupervised Vision-Language Grammar Induction with Shared Structure Modeling [ICLR'22, Oral]
    Bo Wan, Wenjuan Han, Zilong Zheng, and Tinne Tuytelaars, in ICLR, 2022.
  5. Unsupervised Vision-Language Parsing: Seamlessly Bridging Visual Scene Graphs with Language Structures via Dependency Relationships [CVPR'22]
    Chao Lou*, Wenjuan Han, Yuhuan Lin, and Zilong Zheng*, in CVPR, 2022.
  6. Energy-Based Generative Cooperative Saliency Prediction [AAAI'22, Oral]
    Jing Zhang, Jianwen Xie, Zilong Zheng, and Nick Barnes, in AAAI, 2022.

2021

  1. Cooperative Training of Fast Thinking Initializer and Slow Thinking Solver for Multi-Modal Conditional Learning [TPAMI]
    Jianwen Xie*, Zilong Zheng*, Xiaolin Fang, Song-Chun Zhu, and Ying Nian Wu, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021.
  2. Learning Triadic Belief Dynamics in Nonverbal Communication from Videos [CVPR'21, Oral]
    Lifeng Fan, Shuwen Qiu, Zilong Zheng, Tao Gao, Song-Chun Zhu, and Yixin Zhu, in CVPR, 2021.
  3. Patchwise Generative ConvNet: Training Energy-Based Models from a Single Natural Image for Internal Learning [CVPR'21, Oral]
    Zilong Zheng, Jianwen Xie, and Ping Li, in CVPR, 2021.
  4. Generative PointNet: Deep Energy-Based Learning on Unordered Point Sets for 3D Generation, Reconstruction and Classification [CVPR'21]
    Jianwen Xie, Yifei Xu, Zilong Zheng, Song-Chun Zhu, and Ying Nian Wu, in CVPR, 2021.
  5. GRICE: A Grammar-based Dataset for Recovering Implicature and Conversational rEasoning [ACL'21]
    Zilong Zheng, Shuwen Qiu, Lifeng Fan, Yixin Zhu, and Song-Chun Zhu, in ACL Findings, 2021.
  6. Learning Energy-Based Model with Variational Auto-Encoder as Amortized Sampler [AAAI'21]
    Jianwen Xie, Zilong Zheng, and Ping Li, in AAAI, 2021.
  7. Learning Cycle-Consistent Cooperative Networks via Alternating MCMC Teaching for Unsupervised Cross-Domain Translation [AAAI'21]
    Jianwen Xie*, Zilong Zheng*, Xiaolin Fang, Song-Chun Zhu, and Ying Nian Wu, in AAAI, 2021.

2020

  1. Generative VoxelNet: Learning Energy-Based Models for 3D Shape Synthesis and Analysis [TPAMI]
    Jianwen Xie*, Zilong Zheng*, Ruiqi Gao, Wenguan Wang, Song-Chun Zhu, and Ying Nian Wu, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020.
  2. Joint Inference of States, Robot Knowledge, and Human (False-)Beliefs [ICRA'20]
    Tao Yuan, Hangxin Liu, Lifeng Fan, Zilong Zheng, Tao Gao, Yixin Zhu, and Song-Chun Zhu, in ICRA, 2020.
  3. Motion-Based Generator Model: Unsupervised Disentanglement of Appearance, Trackable and Intrackable Motions in Dynamic Patterns [AAAI'20, Oral]
    Jianwen Xie*, Ruiqi Gao*, Zilong Zheng, Song-Chun Zhu, and Ying Nian Wu, in AAAI, 2020.

2019

  1. Reasoning Visual Dialogs with Structural and Partial Observations [CVPR'19, Oral]
    Zilong Zheng*, Wenguan Wang*, Siyuan Qi*, and Song-Chun Zhu, in CVPR, 2019.
  2. Learning Dynamic Generator Model by Alternating Back-Propagation Through Time [AAAI'19, Spotlight]
    Jianwen Xie*, Ruiqi Gao*, Zilong Zheng, Song-Chun Zhu, and Ying Nian Wu, in AAAI, 2019.

2018

  1. Learning Descriptor Networks for 3D Shape Synthesis and Analysis [CVPR'18, Oral]
    Jianwen Xie*, Zilong Zheng*, Ruiqi Gao, Wenguan Wang, Song-Chun Zhu, and Ying Nian Wu, in CVPR, 2018.