Publications

TimeChamber: A Massively Parallel Large Scale Self-Play Framework

Published on GitHub, 2022

TimeChamber is a large-scale self-play framework built on parallel simulation. Self-play algorithms typically need substantial hardware resources, especially in 3D physically simulated environments. We provide a self-play framework that achieves fast training and evaluation with ONLY ONE GPU.

Recommended citation: Huang Ziming, Ziyi Liu, Wu Yutong, Flood Sung. TimeChamber: A Massively Parallel Large Scale Self-Play Framework. https://github.com/inspirai/TimeChamber

Celebrating Robustness in Efficient Off-Policy Meta-Reinforcement Learning

Published in 2022 IEEE International Conference on Real-time Computing and Robotics (RCAR), 2022

Deep reinforcement learning algorithms can enable agents to learn policies for complex tasks without expert knowledge. However, the learned policies are typically specialized to one specific task and cannot generalize to new tasks. Meta-reinforcement learning (meta-RL) algorithms can enable agents to solve new tasks based on prior experience, but most of them build on on-policy reinforcement learning algorithms, which require large amounts of samples during meta-training, and they do not consider task-specific features across different tasks. Both limitations make it very difficult to train a high-performing agent. To address these challenges, we propose an off-policy meta-RL algorithm, abbreviated as CRL (Celebrating Robustness Learning), that disentangles task-specific policy parameters from shared low-level parameters through an adapter network, learns a probabilistic latent space to extract universal information across different tasks, and performs temporally extended exploration. Our approach outperforms baseline methods in both sample efficiency and asymptotic performance on several meta-RL benchmarks.

Recommended citation: Liu Z, Li Z, Cao Q, et al. Celebrating Robustness in Efficient Off-Policy Meta-Reinforcement Learning[C]//2022 IEEE International Conference on Real-time Computing and Robotics (RCAR). IEEE, 2022: 499-504.

Multi-robot Cooperation Learning Based on Powell Deep Deterministic Policy Gradient

Published in 2022 International Conference on Intelligent Robotics and Applications, 2022

Model-free deep reinforcement learning algorithms have been successfully applied to a range of challenging sequential decision-making and control tasks. However, these methods do not perform well in multi-agent environments due to the instability of teammates’ strategies. In this paper, a novel reinforcement learning method called Powell Deep Deterministic Policy Gradient (PDDPG) is proposed, which integrates Powell’s unconstrained optimization method with deep deterministic policy gradient. Specifically, each agent is regarded as a one-dimensional variable, and the process of multi-robot cooperation learning corresponds to searching for an optimal vector. A conjugate direction from Powell’s method is constructed and used to update the agents’ policies. Finally, the proposed method is validated in a dogfight-like multi-agent environment. The results suggest that the proposed method performs much better than independent Deep Deterministic Policy Gradient (IDDPG), revealing a promising way to realize high-quality independent learning.

Recommended citation: Li Z, Xiao C, Liu Z, et al. Multi-robot Cooperation Learning Based on Powell Deep Deterministic Policy Gradient[C]//International Conference on Intelligent Robotics and Applications. Springer, Cham, 2022: 77-87.

Discriminative deep asymmetric supervised hashing for cross-modal retrieval

Published in Knowledge-Based Systems, 2020

Due to its advantages of low storage cost and high retrieval efficiency, cross-modal hashing has received considerable attention. Most existing deep cross-modal hashing methods adopt a symmetric strategy to learn the same deep hash functions for both query instances and database instances. However, the training of these symmetric deep cross-modal hashing methods is time-consuming, which makes it hard for them to effectively utilize the supervised information in large-scale datasets. Inspired by the latest advances in asymmetric hashing, in this paper we propose discriminative deep asymmetric supervised hashing (DDASH) for cross-modal retrieval. Specifically, asymmetric hashing learns the hash codes of query instances by deep hash functions, while learning the hash codes of the database instances by hand-crafted matrices. It can not only make full use of the information in large-scale datasets, but also reduce the training time. In addition, we introduce discrete optimization to reduce the binary quantization error. Furthermore, a mapping matrix that maps the generated hash codes to their corresponding labels is introduced to ensure that the hash codes are discriminative. We also calculate the level of similarity between instances as supervised information. Experiments on three common datasets for cross-modal retrieval show that DDASH outperforms state-of-the-art cross-modal hashing methods.

Recommended citation: Qiang H, Wan Y, Liu Z, et al. Discriminative deep asymmetric supervised hashing for cross-modal retrieval[J]. Knowledge-Based Systems, 2020, 204: 106188.