Dynamic Target Surveillance under Ballistic Threat
Many of the challenges the Army faces on the battlefield can be attributed to planning and strategic decision making. Optimal decision making on the battlefield is a complex problem: it requires considering other actors, reasoning about uncertainty in how events may play out, and executing strategies from noisy observations of the world. Of particular interest is the problem of target surveillance under ballistic threat (see Figure 1), in which a team of surveillance resources attempts to monitor a group of targets of strategic interest in the presence of enemy forces with ballistic capabilities.
AHPCRC researchers are developing algorithms and computational methodologies for solving the problem of target surveillance under ballistic threat, as well as other complex problems with multiple interacting actors. They approach the problem using deep reinforcement learning (DRL), in which a neural network learns intelligent strategies through simulated interactions with the environment and other actors. Their focus is on problems in which multiple cooperating actors attempt to complete a specific task.
A number of fundamental limitations in DRL algorithms prevent them from being useful on multi-agent tasks. To address these limitations and make multi-agent deep reinforcement learning (MADRL) a viable technique for solving multi-agent decision-making problems, the researchers focus on three goals: reducing sample complexity during learning, scaling to problems with thousands of interacting agents, and solving problems in which the agents are heterogeneous.
Recent work in DRL demonstrates that it can be used as a tool for end-to-end learning, i.e., mapping raw observations directly to actions or commands. The researchers have demonstrated end-to-end learning in multi-agent settings, such as target surveillance under ballistic threat (see Figure 2). A convolutional neural network learns an intelligent strategy directly from an image-like representation of the agent’s surroundings, with each channel in the image representing a different entity. The neural network in the figure learns an action-value function, which guides the decision-making process of the agents.
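The mapping from an image-like observation to action values can be illustrated with a minimal Python sketch. It is not the researchers' implementation: the channel names, the action set, the grid radius, and the single convolutional layer with a linear output are all assumptions made for illustration, and the parameters are random rather than learned.

```python
import random

# Hypothetical entity types, one image channel per type (assumed, not from the source).
CHANNELS = ("surveillance", "target", "threat")
# Hypothetical discrete action set for one surveillance agent.
ACTIONS = ("stay", "north", "south", "east", "west")


def encode_observation(entities, center, radius):
    """Build a [channel][row][col] grid of the cells around `center`.

    `entities` maps a channel name to a list of (row, col) world positions.
    Entities outside the agent's field of view are ignored.
    """
    size = 2 * radius + 1
    obs = [[[0.0] * size for _ in range(size)] for _ in CHANNELS]
    for c, name in enumerate(CHANNELS):
        for (row, col) in entities.get(name, []):
            dr, dc = row - center[0] + radius, col - center[1] + radius
            if 0 <= dr < size and 0 <= dc < size:
                obs[c][dr][dc] += 1.0
    return obs


def conv2d_relu(obs, kernels, biases):
    """Valid (no padding) 2-D convolution over all input channels, then ReLU."""
    k = len(kernels[0][0])
    out_size = len(obs[0]) - k + 1
    out = []
    for kern, bias in zip(kernels, biases):
        fmap = []
        for i in range(out_size):
            row = []
            for j in range(out_size):
                s = bias
                for c in range(len(obs)):
                    for di in range(k):
                        for dj in range(k):
                            s += kern[c][di][dj] * obs[c][i + di][j + dj]
                row.append(max(0.0, s))
            fmap.append(row)
        out.append(fmap)
    return out


def init_params(radius, n_filters=4, k=3, seed=0):
    """Random, untrained parameters; a real system would learn these."""
    rng = random.Random(seed)
    out_size = 2 * radius + 1 - k + 1
    n_features = n_filters * out_size * out_size
    return {
        "kernels": [[[[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(k)]
                     for _ in CHANNELS] for _ in range(n_filters)],
        "conv_bias": [0.0] * n_filters,
        "out_weights": [[rng.uniform(-0.1, 0.1) for _ in range(n_features)]
                        for _ in ACTIONS],
        "out_bias": [0.0] * len(ACTIONS),
    }


def q_values(obs, params):
    """Map an observation to one estimated action value per action."""
    fmaps = conv2d_relu(obs, params["kernels"], params["conv_bias"])
    features = [v for fmap in fmaps for row in fmap for v in row]
    return [sum(w * f for w, f in zip(weights, features)) + bias
            for weights, bias in zip(params["out_weights"], params["out_bias"])]


def greedy_action(obs, params):
    """Pick the action with the highest estimated value."""
    q = q_values(obs, params)
    return ACTIONS[max(range(len(q)), key=q.__getitem__)]
```

For example, `encode_observation({"target": [(6, 5)]}, center=(5, 5), radius=3)` yields a 3 x 7 x 7 grid with a single nonzero cell in the target channel, and `greedy_action` then selects the action whose estimated value is largest. Keeping each entity type in its own channel lets the convolution treat spatial layout uniformly regardless of how many entities are in view.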
Kochenderfer’s group has successfully extended three types of DRL algorithms to multi-agent settings: temporal difference, actor-critic, and policy gradient. The algorithms have been validated, benchmarked, and tested on a variety of multi-agent problems. The group has shown that the MADRL approaches work in environments with hundreds of interacting agents, opening the door to a new way of planning and decision making in multi-agent scenarios. There is an ongoing project with ARL staff to validate the MADRL approaches in hardware using neuromorphic chip architectures.
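As a rough illustration of the temporal-difference family mentioned above, the Python sketch below applies a standard one-step Q-learning update to a value table shared by all agents. Sharing one table across agents is an assumption made here for simplicity, as are the table representation, the function names, and the default step size and discount; the text does not describe how the group's algorithms are structured.

```python
from collections import defaultdict


def td_update(q_table, obs, action, reward, next_obs, done,
              n_actions, alpha=0.1, gamma=0.95):
    """One-step Q-learning update on a single transition.

    Returns the temporal-difference error used for the update.
    """
    # No bootstrapping from the next state once the episode has ended.
    best_next = 0.0 if done else max(q_table[(next_obs, a)] for a in range(n_actions))
    td_error = reward + gamma * best_next - q_table[(obs, action)]
    q_table[(obs, action)] += alpha * td_error
    return td_error


def update_from_team(q_table, transitions, n_actions, **kwargs):
    """Apply the update to every agent's transition using one shared table.

    `transitions` holds one (obs, action, reward, next_obs, done) tuple per agent.
    Sharing the table (an assumption) lets each agent learn from the others' experience.
    """
    return [td_update(q_table, *t, n_actions=n_actions, **kwargs) for t in transitions]
```

A call such as `update_from_team(defaultdict(float), [("a", 0, 1.0, "b", False)], n_actions=2)` updates the shared estimate for taking action 0 in observation "a". With a neural network in place of the table, the same error term would instead drive a gradient step on the network's parameters.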
General-purpose algorithms that can learn intelligent strategies can aid Army personnel in decision making, or can be used as controllers in robotic systems such as autonomous vehicles. Deep reinforcement learning provides an approach to learning intelligent strategies directly from high-dimensional observation data, much as a human would. Leveraging this approach can enable a new way to plan and make decisions on the battlefield.
A DRL framework that can run on a neuromorphic device is extremely relevant to Army mission and has a wide range of applications. Prominently, this includes the coordination of autonomous vehicles, e.g., gathering intelligence, search and rescue missions, and time-critical scene assessment. Near term relevance includes the development of toolchains for deployment and interfacing with neuromorphic architectures, which would benefit other neuromorphic research efforts at ARL. – Dr. Barry Seacrest, ARL