## Research“To complicate is easy. To simplify is difficult.” – Bruno Munari My research focuses on machine learning applications and aims to design effective and efficient algorithms and provide the associated theoretical analysis. Existing research results (finished with many collaborators) are summarized below. ## Imitation LearningBackground: Imitation learning trains good policies from expert/human demonstrations, with applications in autonomous driving, robotics, etc. Contribution: A general framework to study the error bounds of imitating policies and environments.[1][2]
A reduction of off-line AIL methods to BC.[3]
A horizon-free sample complexity bound to explain and support the superior performance of online AIL methods.[4]
An efficient algorithm to address the interaction efficiency of online AIL methods.[5]
Theoretical justifications and algorithms in selecting effective data for imitation learning.[6]
## Reinforcement LearningGeneral Background: Reinforcement learning (RL) refers to a class of algorithms that solve long-term decision-making problems. ## ExplorationBackground: Typically, RL applications have huge state-action space (i.e., enormous decision choices) and noisy feedback (due to transition and reward noise). Contribution: A intrinsically motivated exploration method that performs well especially on SuperMarioBros games.[7]
A randomized exploration method that solves many challenging tasks with cheap computation.[8]
## TrainingBackground: In addition to the exploration issue, RL methods need to solve non-linear Bellman equations when training. Contribution: A theoretical verification of target Q-learning, an algorithm that is widely used in practice, in the tabular setting.[9]
A heuristic-guided black-box optimization algorithm for deep RL (and other derivative-free problems).[10]
## Reference[1] Xu, T., Li, Z., and Yu, Y. Error Bounds of Imitating Policies and Environments. NeurIPS 2020. [2] Xu, T., Li, Z., and Yu, Y. Error Bounds of Imitating Policies and Environments for Reinforcement Learning. TPAMI 2021. [3] Li, Z., Xu, T., Yu, Y., and Luo, Z.-Q. Rethinking ValueDice: Does It Really Improve Performance? ICLR 2021. [4] Xu, T., Li, Z., Yu, Y., and Luo, Z.-Q. Understanding Adversarial Imitation Learning in Small Sample Regime: A Stage-coupled Analysis. arXiv:2208.01899. [5] Xu, T., Li, Z., Yu, Y., and Luo, Z.-Q. Provably Efficient Adversarial Imitation Learning with Unknown Transitions. UAI 2023. [6] Li, Z., Xu, T., Qin, Z., Yu, Y., and Luo, Z.-Q. Data Selection in Imitation Learning: Theoretical Justifications and Algorithms. NeurIPS 2023. [7] Li, Z., and Chen, X.-H. Efficient Exploration by Novelty-Pursuit. DAI 2020. [8] Li, Z., Li, Y., Zhang, Y., Zhang, T., and Luo, Z.-Q. HyperDQN: A Randomized Exploration Method for Deep Reinforcement Learning. ICLR 2022. [9] Li, Z., Xu, T., and Yu, Y. A Note on Target Q-learning For Solving Finite MDPs with A Generative Oracle. arXiv:2203.11489. [10] Liu, F.-Y., Li, Z., and Qian, C. Self-Guided Evolution Strategies with Historical Estimated Gradients. IJCAI 2020 |