Ziniu Li
About me

I hold a Ph.D. from The Chinese University of Hong Kong, Shenzhen, where my research focused on large-scale reinforcement learning training and its applications in large language models. I was advised by Prof. Tom Luo, a prominent applied mathematician in optimization and signal processing. My academic lineage extends to Prof. John Tsitsiklis of MIT, my advisor's own advisor, who pioneered foundational reinforcement learning theory and co-introduced the actor-critic algorithm in 1999.

Experience
Recent Highlights

*: indicating equal contribution or alphabetical ordering.

Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation
TL;DR: This work introduces a knapsack-based exploration framework for RL training of LLMs, unlocking their capability to solve hard tasks and expanding their performance frontier.

Preserving Diversity in Supervised Fine-tuning of Large Language Models
TL;DR: This work introduces a game-theoretic distribution-matching method to address the diversity-reducing and knowledge-forgetting issues in SFT.

ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models
TL;DR: This work provides the foundation of REINFORCE-style methods in LLM training and introduces ReMax, a method that is more computationally efficient than PPO.
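To give a flavor of the REINFORCE-style idea behind ReMax, here is a minimal sketch: the sampled response's reward is baselined by the reward of the greedy (argmax) response, removing the value network that PPO requires. The function names, toy numbers, and scalar rewards below are illustrative assumptions, not the paper's actual implementation.

```python
def remax_advantage(sampled_reward: float, greedy_reward: float) -> float:
    # ReMax-style advantage: the reward of the greedy-decoded response
    # serves as a variance-reducing baseline, replacing PPO's learned critic.
    return sampled_reward - greedy_reward


def reinforce_loss(log_prob_sum: float, advantage: float) -> float:
    # REINFORCE surrogate loss for one sampled response (to be minimized):
    # negative advantage times the summed log-probability of its tokens.
    return -advantage * log_prob_sum


# Toy illustration with hypothetical scalar rewards and log-probabilities.
adv = remax_advantage(sampled_reward=0.9, greedy_reward=0.7)
loss = reinforce_loss(log_prob_sum=-3.2, advantage=adv)
```

Because the baseline comes from a single extra greedy rollout rather than a trained value model, this sketch illustrates why such a method can use less memory and compute per update than PPO.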