Publication

*: indicating equal contribution or alphabetic ordering.

2024
Why Transformers Need Adam: A Hessian Perspective
Policy Optimization in RLHF: The Impact of Out-of-preference Data

2023
ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models
Imitation Learning from Imperfection: Theoretical Justifications and Algorithms
Provably Efficient Adversarial Imitation Learning with Unknown Transitions
Sensing Jamming Strategy from Limited Observations: An Imitation Learning Perspective
Deploying Offline Reinforcement Learning with Human Feedback

2022
Understanding Adversarial Imitation Learning in Small Sample Regime: A Stage-coupled Analysis
Rethinking ValueDice: Does It Really Improve Performance?
A Note on Target Q-learning for Solving Finite MDPs with A Generative Oracle
HyperDQN: A Randomized Exploration Method for Deep Reinforcement Learning

2021
A Concise Introduction to Imitation Learning (In Chinese)
Error Bounds of Imitating Policies and Environments for Reinforcement Learning

2020
Error Bounds of Imitating Policies and Environments
Efficient Exploration by Novelty-pursuit
Self-Guided Evolution Strategies with Historical Estimated Gradients
Solving The Inverse Design Problem of Electrical Fuse with Machine Learning

2019
On Value Discrepancy of Imitation Learning