Publication

*: indicating equal contribution or alphabetic ordering.

2024
Adam-mini: Use Fewer Learning Rates To Gain More
On the Algorithmic Bias of Aligning Large Language Models with RLHF: Preference Collapse and Matching Regularization
ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models
When is RL better than DPO in RLHF? A Representation and Optimization Perspective
Why Transformers Need Adam: A Hessian Perspective

2023
Imitation Learning from Imperfection: Theoretical Justifications and Algorithms
Provably Efficient Adversarial Imitation Learning with Unknown Transitions
Sensing Jamming Strategy from Limited Observations: An Imitation Learning Perspective
Deploying Offline Reinforcement Learning with Human Feedback

2022
Understanding Adversarial Imitation Learning in Small Sample Regime: A Stage-coupled Analysis
Rethinking ValueDice: Does It Really Improve Performance?
A Note on Target Q-learning for Solving Finite MDPs with A Generative Oracle
HyperDQN: A Randomized Exploration Method for Deep Reinforcement Learning

2021
A Concise Introduction to Imitation Learning (In Chinese)
Error Bounds of Imitating Policies and Environments for Reinforcement Learning

2020
Error Bounds of Imitating Policies and Environments
Efficient Exploration by Novelty-pursuit
Self-Guided Evolution Strategies with Historical Estimated Gradients
Solving The Inverse Design Problem of Electrical Fuse with Machine Learning

2019
On Value Discrepancy of Imitation Learning