Publications

*: indicates equal contribution or alphabetical ordering.

Google Scholar.

2024

On the Algorithmic Bias of Aligning Large Language Models with RLHF: Preference Collapse and Matching Regularization
Jiancong Xiao, Ziniu Li, Xingyu Xie, Emily Getzen, Cong Fang, Qi Long, Weijie J. Su
arXiv:2405.16455

ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models
Ziniu Li, Tian Xu, Yushun Zhang, Zhihang Lin, Yang Yu, Ruoyu Sun, Zhi-Quan Luo
The 41st International Conference on Machine Learning (ICML), 2024
(An early version of this work is available at arXiv:2310.10505)

When is RL better than DPO in RLHF? A Representation and Optimization Perspective
Ziniu Li*, Tian Xu*, Yang Yu
The 12th International Conference on Learning Representations (ICLR) (Tiny Paper Track), 2024
(This paper was selected for an oral presentation; an early version is available at arXiv:2312.10584)

Why Transformers Need Adam: A Hessian Perspective
Yushun Zhang, Congliang Chen, Tian Ding, Ziniu Li, Ruoyu Sun, Zhi-Quan Luo
arXiv:2402.16788

2023

Imitation Learning from Imperfection: Theoretical Justifications and Algorithms
Ziniu Li*, Tian Xu*, Zeyu Qin, Yang Yu, Zhi-Quan Luo
Advances in Neural Information Processing Systems (NeurIPS) 37, 2023
(This paper was selected for a spotlight presentation; an early version is available at arXiv:2301.11687)

Provably Efficient Adversarial Imitation Learning with Unknown Transitions
Tian Xu*, Ziniu Li*, Yang Yu, Zhi-Quan Luo
The 39th Conference on Uncertainty in Artificial Intelligence (UAI), 2023
(This paper was selected for an oral presentation; an early version is available at arXiv:2106.10424v2)

Sensing Jamming Strategy from Limited Observations: An Imitation Learning Perspective
Youlin Fan, Bo Jiu, Wenqiang Pu, Ziniu Li, Kang Li, Hongwei Liu
Submitted to IEEE Transactions on Signal Processing

Deploying Offline Reinforcement Learning with Human Feedback
Ziniu Li, Ke Xu, Liu Liu, Lanqing Li, Deheng Ye, Peilin Zhao
arXiv:2303.07046

2022

Understanding Adversarial Imitation Learning in Small Sample Regime: A Stage-coupled Analysis
Tian Xu*, Ziniu Li*, Yang Yu, Zhi-Quan Luo
arXiv:2208.01899
(An early version of this work is available at arXiv:2106.10424v3)

Rethinking ValueDice: Does It Really Improve Performance?
Ziniu Li*, Tian Xu*, Yang Yu, Zhi-Quan Luo
The 10th International Conference on Learning Representations (ICLR) (Blog Track), 2022

HyperDQN: A Randomized Exploration Method for Deep Reinforcement Learning
Ziniu Li, Yingru Li, Yushun Zhang, Tong Zhang, Zhi-Quan Luo
The 10th International Conference on Learning Representations (ICLR), 2022
(This work was selected for an oral presentation at the Workshop on Ecological Theory of Reinforcement Learning at NeurIPS 2021)

2021

A Concise Introduction to Imitation Learning (In Chinese)
Tian Xu, Ziniu Li, Yang Yu
Available online

Error Bounds of Imitating Policies and Environments for Reinforcement Learning
Tian Xu, Ziniu Li, Yang Yu
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021

2020

Error Bounds of Imitating Policies and Environments
Tian Xu, Ziniu Li, Yang Yu
Advances in Neural Information Processing Systems 34 (NeurIPS), 2020

Efficient Exploration by Novelty-pursuit
Ziniu Li*, Xiong-Hui Chen*
The 2nd International Conference on Distributed Artificial Intelligence (DAI), 2020

Self-Guided Evolution Strategies with Historical Estimated Gradients
Fei-yu Liu, Ziniu Li, Chao Qian
The 29th International Joint Conference on Artificial Intelligence (IJCAI), 2020

Solving The Inverse Design Problem of Electrical Fuse with Machine Learning
Xinjian Huang, Ziniu Li, Zhiyuan Liu, Bin Xiang, Yingsan Geng, Jianhua Wang
IEEE Access, 8, 74137-74144, 2020

2019

On Value Discrepancy of Imitation Learning
Tian Xu, Ziniu Li, Yang Yu
arXiv:1911.07027