Ziniu Li


Ph.D. student,
School of Data Science,
The Chinese University of Hong Kong, Shenzhen

Email: ziniuli@link.cuhk.edu.cn

[Twitter] [Zhihu]

About me

I am a Ph.D. student at The Chinese University of Hong Kong, Shenzhen (CUHKSZ), advised by Prof. Zhi-Quan (Tom) Luo.

I am interested in artificial intelligence, especially reinforcement learning and large language models.

I have worked/interned at Tencent, Nanjing University, Cardinal Operations, etc.

My curriculum vitae can be downloaded from here.

Feel free to contact me if you want to discuss some ideas.

Recent Highlights

*: indicates equal contribution or alphabetical ordering.

Policy Optimization in RLHF: The Impact of Out-of-preference Data
Ziniu Li*, Tian Xu*, Yang Yu
The 12th International Conference on Learning Representations (ICLR) (Tiny Paper Track), 2024

TL;DR: This work analyzes policy optimization errors in RLHF and shows that out-of-preference data is important for RL algorithms such as PPO and ReMax

ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models
Ziniu Li, Tian Xu, Yushun Zhang, Zhihang Lin, Yang Yu, Ruoyu Sun, Zhi-Quan Luo

TL;DR: This work develops an RL method called ReMax, which is simpler (6 lines of code) and more efficient (less memory and faster training) than PPO when used in RLHF

Imitation Learning from Imperfection: Theoretical Justifications and Algorithms
Ziniu Li*, Tian Xu*, Zeyu Qin, Yang Yu, Zhi-Quan Luo
Spotlight Presentation (acceptance rate < 5%), Neural Information Processing Systems (NeurIPS) 37, 2023

TL;DR: This work validates that importance sampling is effective in data selection when leveraging multiple imperfect (out-of-distribution and low-quality) data sources

Provably Efficient Adversarial Imitation Learning with Unknown Transitions
Tian Xu*, Ziniu Li*, Yang Yu, Zhi-Quan Luo
Oral Presentation (acceptance rate < 3%), The 39th Conference on Uncertainty in Artificial Intelligence (UAI), 2023

TL;DR: This work addresses the online sample efficiency issue of adversarial imitation learning by bridging it with reward-free exploration

Understanding Adversarial Imitation Learning in Small Sample Regime: A Stage-coupled Analysis
Tian Xu*, Ziniu Li*, Yang Yu, Zhi-Quan Luo

TL;DR: This work presents the first horizon-free sample complexity bound for adversarial imitation learning (AIL), providing an answer to the open question of why AIL outperforms behavioral cloning (BC) by a wide margin, particularly in the low-data regime (raised in the CoRL 2019 best paper by Ghasemipour et al.)



NeurIPS (Top Reviewer), ICML (Outstanding Reviewer), ICLR (Highlighted Reviewer).

Teaching Assistant

  • DDA6111: Discrete Optimization. 2022 Spring @ CUHKSZ

  • DDA6060: Machine Learning. 2023 Spring @ CUHKSZ

  • FTE4560: Basic Machine Learning. 2021 Spring @ CUHKSZ

  • CSC4120: Design and Analysis of Algorithms. 2022 Fall, 2021 Fall @ CUHKSZ

  • MAT3007: Introduction to Optimization. 2020 Fall @ CUHKSZ


  • Machine Learning (Summer Course for Senior High School Students) @ X ACADEMY 2022 TechX


  • [2024-01] Runner-up for the poster presentation award at the Third Doctoral and Postdoctoral Forum of Shenzhen Research Institute of Big Data. 5,000 RMB

  • [2023-12] Guotaijunan Scholarship. 20,000 RMB

  • [2021-04] Best oral presentation award at the First Doctoral and Postdoctoral Forum of Shenzhen Research Institute of Big Data. 5,000 RMB