Research

“To complicate is easy. To simplify is difficult.” – Bruno Munari

My research focuses on machine learning applications and aims to design effective and efficient algorithms and to provide the associated theoretical analysis. Existing research results (obtained with many collaborators) are summarized below.

Imitation Learning

Background: Imitation learning trains good policies from expert/human demonstrations, with applications in autonomous driving, robotics, etc.

Contribution:
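To make the setting concrete, the sketch below implements behavioral cloning, the simplest form of imitation learning (one of the algorithms analyzed in [1, 2]): policy learning is reduced to supervised learning on demonstrated (state, action) pairs. The synthetic "expert", the linear softmax policy, and all constants are illustrative assumptions, not the setup of any specific paper.

    import numpy as np

    # Behavioral cloning: fit a linear softmax policy to demonstrations
    # by minimizing cross-entropy. All data here is made up for illustration.
    rng = np.random.default_rng(0)
    n, d, k = 500, 4, 3                       # demos, state dim, num actions
    states = rng.normal(size=(n, d))
    expert_actions = np.argmax(states @ rng.normal(size=(d, k)), axis=1)  # stand-in expert

    W = np.zeros((d, k))                      # policy parameters
    for _ in range(200):
        logits = states @ W
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = states.T @ (probs - np.eye(k)[expert_actions]) / n  # cross-entropy gradient
        W -= 0.5 * grad

    agreement = np.mean((states @ W).argmax(axis=1) == expert_actions)
    print(f"agreement with expert actions: {agreement:.2f}")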
Reinforcement Learning

General Background: Reinforcement learning (RL) refers to a class of algorithms that solve long-term decision-making problems.

Exploration

Background: Typically, RL applications have a huge state-action space (i.e., an enormous number of decision choices) and noisy feedback (due to transition and reward noise).

Contribution:
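To see why dithering strategies such as epsilon-greedy struggle here, the sketch below runs tabular Q-learning with a count-based optimism bonus on a toy chain MDP whose only reward sits at the far end. The chain environment, the 1/sqrt(n) bonus, and all constants are illustrative assumptions; this is only loosely in the spirit of deliberate exploration methods such as HyperDQN [8], not their implementation.

    import numpy as np

    # Chain MDP: states 0..N-1; action 1 moves right, action 0 resets to
    # state 0; only the right end is rewarding. Random dithering rarely
    # strings together enough right moves to find the reward, while a bonus
    # favoring rarely tried actions steadily pushes coverage down the chain.
    N, gamma, lr = 10, 0.95, 0.1
    Q = np.zeros((N, 2))
    counts = np.ones((N, 2))          # visit counts (init at 1 to avoid /0)

    def step(s, a):
        s2 = min(s + 1, N - 1) if a == 1 else 0
        return s2, float(s2 == N - 1)  # reward only at the far-right state

    for _ in range(500):               # episodes
        s = 0
        for _ in range(20):            # horizon
            a = int(np.argmax(Q[s] + 1.0 / np.sqrt(counts[s])))  # optimistic choice
            s2, r = step(s, a)
            counts[s, a] += 1
            Q[s, a] += lr * (r + gamma * Q[s2].max() - Q[s, a])  # Q-learning update
            s = s2

    print("visits per state:", counts.sum(axis=1).astype(int))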
Training

Background: In addition to the exploration issue, RL methods must solve non-linear Bellman equations during training.

Contribution:
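As a worked example of this non-linearity, the Bellman optimality equation Q(s, a) = R(s, a) + gamma * E[max_{a'} Q(s', a')] has a max inside the expectation, so it cannot be solved as a linear system. The sketch below solves it on a made-up random finite MDP by fixed-point (value) iteration, the idealized scheme that target Q-learning [9] approximates with samples; all sizes and constants are assumptions for illustration.

    import numpy as np

    # A random finite MDP, made up for illustration.
    rng = np.random.default_rng(0)
    S, A, gamma = 5, 3, 0.9
    P = rng.dirichlet(np.ones(S), size=(S, A))  # P[s, a] is a distribution over next states
    R = rng.uniform(size=(S, A))                # reward table

    # The Bellman optimality operator is a gamma-contraction in the sup norm,
    # so repeatedly applying it converges to the unique fixed point Q*.
    # Target Q-learning mimics this with samples: the target network is frozen
    # for a block of updates, so each block approximates one operator application.
    Q = np.zeros((S, A))
    for _ in range(200):
        Q_new = R + gamma * P @ Q.max(axis=1)   # one Bellman backup
        if np.abs(Q_new - Q).max() < 1e-10:
            break
        Q = Q_new

    print("optimal greedy policy:", Q.argmax(axis=1))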
References

[1] Xu, T., Li, Z., and Yu, Y. Error Bounds of Imitating Policies and Environments. NeurIPS 2020.
[2] Xu, T., Li, Z., and Yu, Y. Error Bounds of Imitating Policies and Environments for Reinforcement Learning. TPAMI 2021.
[3] Li, Z., Xu, T., Yu, Y., and Luo, Z.-Q. Rethinking ValueDice: Does It Really Improve Performance? ICLR 2021.
[4] Xu, T., Li, Z., Yu, Y., and Luo, Z.-Q. Understanding Adversarial Imitation Learning in Small Sample Regime: A Stage-coupled Analysis. arXiv:2208.01899.
[5] Xu, T., Li, Z., and Yu, Y. More Efficient Adversarial Imitation Learning Algorithms with Known and Unknown Transitions. arXiv:2106.10424.
[6] Li, Z., Xu, T., Yu, Y., and Luo, Z.-Q. Theoretical Analysis of Offline Imitation With Supplementary Dataset. arXiv:2301.11687.
[7] Li, Z., and Chen, X.-H. Efficient Exploration by Novelty-Pursuit. DAI 2020.
[8] Li, Z., Li, Y., Zhang, Y., Zhang, T., and Luo, Z.-Q. HyperDQN: A Randomized Exploration Method for Deep Reinforcement Learning. ICLR 2022.
[9] Li, Z., Xu, T., and Yu, Y. A Note on Target Q-learning For Solving Finite MDPs with A Generative Oracle. arXiv:2203.11489.
[10] Liu, F.-Y., Li, Z., and Qian, C. Self-Guided Evolution Strategies with Historical Estimated Gradients. IJCAI 2020.