Transactions on Machine Learning and Artificial Intelligence - Vol. 9, No. 3
Publication Date: June 25, 2021
DOI: 10.14738/tmlai.93.10149
Menon, U. R., & Menon, A. R. (2021). An Efficient Application of Neuroevolution for Competitive Multiagent Learning. Transactions
on Machine Learning and Artificial Intelligence, 9(3), 1-13.
An Efficient Application of Neuroevolution for Competitive
Multiagent Learning
Unnikrishnan Rajendran Menon
School of Electrical Engineering, Vellore Institute of Technology,
Vellore, Tamil Nadu, India;
Anirudh Rajiv Menon
School of Electronics Engineering, Vellore Institute of Technology,
Vellore, Tamil Nadu, India;
ABSTRACT
Multiagent systems provide an ideal environment for the evaluation and analysis of
real-world problems using reinforcement learning algorithms. Most traditional
approaches to multiagent learning are affected by long training periods as well as
high computational complexity. NEAT (NeuroEvolution of Augmenting Topologies)
is a popular evolutionary strategy for discovering the best-performing neural
network architecture and is often used to tackle optimization problems in the field of
artificial intelligence. This paper utilizes the NEAT algorithm to efficiently achieve
competitive multiagent learning on a modified pong game environment.
The competing agents abide by different rules while having similar observation
space parameters. The proposed algorithm utilizes this property of the
environment to define a singular neuroevolutionary procedure that obtains the
optimal policy for all the agents. The compiled results indicate that the proposed
implementation achieves ideal behaviour in a significantly shorter training period
than existing multiagent reinforcement learning models.
Keywords: Genetic Algorithm; NeuroEvolution; Neural Networks; Reinforcement
Learning; Multiagent Environment
INTRODUCTION
The rapid developments in the field of Artificial Intelligence demand that algorithms be analyzed
and evaluated constantly. Game environments consisting of many variables with unpredictable
behavior serve as ideal platforms for this purpose [1]. The traditional algorithms used to solve
these environments rely on multiple search paradigms and mechanisms that take a very long
time to find optima [2].
Reinforcement learning (RL) is a field of machine learning that trains agents to learn the
optimal policy for successfully navigating an environment. The agent learns to do so over its
training experiences by trying to maximize the cumulative reward it receives, based on the
actions it takes in different states of the environment. Over the past decade, research in RL
has gained huge popularity due to its extensive applications in control systems, robotics, and
other optimization techniques for solving real-world problems.
For instance, the use of reinforcement learning based AI agents by DeepMind to cool Google
Data Centers led to a 40% reduction in the energy used for cooling. The centers are now fully
controlled by these agents under the supervision of data center experts [3].
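Concretely, the quantity such an agent maximizes is usually expressed as the expected discounted return; the formulation below is standard reinforcement learning notation rather than one stated explicitly in this paper:

    G_t = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1}, \quad 0 \le \gamma \le 1

where r_{t+k+1} is the reward received k+1 steps after time t and \gamma is the discount factor that weights immediate rewards more heavily than distant ones.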
Game environments provide quantified state parameters that accurately represent the agents
and other contributing factors for any given time step. These parameters can be normalized
into a range compatible with various self-learning algorithms including those that rely on
neural networks as their backbone structures. These algorithms return numerically encoded
actions to the environment, which are translated into the appropriate state transitions and the
rewards associated with the new state.
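As an illustration of this normalization step, the following Python sketch scales a hypothetical set of pong-like state variables into ranges suitable for a neural network; the variable names and limits are assumptions, not the state encoding used in this paper.

import numpy as np

def normalize_state(ball_x, ball_y, ball_vx, ball_vy, paddle_pos,
                    width=800.0, height=800.0, max_speed=10.0):
    """Scale raw game-state values into [0, 1] / [-1, 1] ranges for a network."""
    return np.array([
        ball_x / width,        # ball x-position, mapped to [0, 1]
        ball_y / height,       # ball y-position, mapped to [0, 1]
        ball_vx / max_speed,   # ball x-velocity, mapped to roughly [-1, 1]
        ball_vy / max_speed,   # ball y-velocity, mapped to roughly [-1, 1]
        paddle_pos / width,    # paddle position along its axis, mapped to [0, 1]
    ], dtype=np.float32)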
Reinforcement learning algorithms such as Deep Q-Learning have been used extensively to study
the behaviour of agents by creating different scenarios in game environments. For instance,
Deep Q-networks that use convolutional neural networks (CNNs) to process the raw pixel data of
a pong environment, modified to pit two agents against a hard-coded paddle, have been used to
explore the development of cooperation between agents with a shared goal [4]. Hence, game
environments provide a medium for simulating many practically infeasible scenarios for the
evaluation and testing of various algorithms, saving both time and resources.
These DQN-based techniques have been extended to form generic agents that can beat most
major Atari games. However, such algorithms can take hours to train properly, since the
network needs considerable time to extract the most relevant features from the raw pixel data
returned by the environment [5].
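For reference, the kind of convolutional Q-network used in such work can be sketched as below; the layer sizes follow the standard Atari DQN architecture and are an assumption, not a description of this paper's models.

import torch
import torch.nn as nn

class DQN(nn.Module):
    """Convolutional Q-network mapping stacked 84x84 frames to Q-values."""
    def __init__(self, n_actions: int, in_frames: int = 4):
        super().__init__()
        self.features = nn.Sequential(               # extracts features from raw pixels
            nn.Conv2d(in_frames, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        self.head = nn.Sequential(                   # maps features to one Q-value per action
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x / 255.0))   # scale pixel intensities to [0, 1]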
Genetic algorithms are randomized heuristic search methods that optimize by exploring the local
neighborhood of the parameters to be improved and promote more effective parameters by
imitating the mechanics of natural selection and genetics. In comparison to traditional
algorithms, genetic algorithms are fast, robust, and, with modifications, can even escape
convergence to local optima. They can solve not only general optimization problems but also the
unconventional ones often encountered in artificial intelligence [6,7,8].
Genetic algorithms have been used to model and study cooperative behavior among microorganisms
such as bacteria and viruses [9,10]. Such evolution-based computations have also played a
key role in studying ant and bee colonies [11,12].
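A bare-bones genetic algorithm of this kind can be sketched in a few lines of Python; the fitness function, encoding, and hyperparameters below are placeholders chosen for illustration, not values used in this paper.

import random

def evolve(fitness, dim=8, pop_size=50, generations=100,
           mutation_rate=0.1, mutation_scale=0.5):
    """Evolve a real-valued parameter vector that maximizes `fitness`."""
    population = [[random.uniform(-1.0, 1.0) for _ in range(dim)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:pop_size // 2]                    # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, dim)                  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [g + random.gauss(0.0, mutation_scale)  # Gaussian mutation
                     if random.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

# Example: maximize the negative squared distance from the origin (optimum at 0).
best = evolve(lambda v: -sum(g * g for g in v))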
The NeuroEvolution of Augmenting Topologies (NEAT) algorithm utilizes concepts stemming from
genetic evolution to search the space of neural network policies for those that maximize a
fitness metric [13]. The fitness value of an agent represents how well it performed in each
training episode. While value-based reinforcement learning algorithms deploy single agents
that learn progressively, the NEAT algorithm uses a population of agents to find the best policies.
Offspring topologies are obtained via mutation and crossover procedures on the population's
fittest individuals. The population eventually performs well enough to cross the desired fitness
threshold. It has been observed that for many reinforcement learning applications, the NEAT
algorithm outperforms other conventional methods [15]. Neuroevolution-based algorithms
have thus been instrumental in modelling intelligent agents that can efficiently adapt to
strategy-based games, the design of mobile robots, autonomous vehicles, and even control systems in
aerospace [16,17,18,19,20].
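Using the neat-python package, the overall evolutionary loop takes roughly the following form; the configuration file name and the episode-evaluation stub are placeholders rather than the actual setup used in this paper.

import neat  # the neat-python package

def episode_fitness(net):
    # Placeholder: run one episode of the task with `net` as the controller
    # and return the resulting score; the details depend on the environment.
    return 0.0

def eval_genomes(genomes, config):
    # Assign each genome a fitness by building its network and evaluating it.
    for genome_id, genome in genomes:
        net = neat.nn.FeedForwardNetwork.create(genome, config)
        genome.fitness = episode_fitness(net)

config = neat.Config(neat.DefaultGenome, neat.DefaultReproduction,
                     neat.DefaultSpeciesSet, neat.DefaultStagnation,
                     "neat-config.txt")        # hypothetical configuration file
population = neat.Population(config)
winner = population.run(eval_genomes, 300)     # evolve until the fitness threshold or 300 generations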
Most multiagent environments, such as the one used in this paper, involve all the agents
interacting with the environment under different rules while having similar observation and
action spaces. Consequently, NEAT has proved to be an effective algorithm for multiagent
learning [21].
An efficient implementation of neuroevolution on a predator-prey environment has been used
to evaluate this approach, both in terms of developing policies that give rise to collaborative
behaviour among the predators and in terms of how the reward structure and coordination
mechanism affect multiagent evolution [22]. Additionally, neuroevolutionary techniques such as
Cooperative Synapse Neuroevolution (CoSyNE) have been developed to solve multiagent
versions of the pole balancing problem that involve continuous state spaces [14].
This paper implements a single neuroevolutionary strategy to optimize all the agents involved
in an indigenously designed multi-paddle pong environment, propagating only the best-performing
architectures to promote a competitive learning paradigm. The initialization of a
single population for agents with different action spaces contributes to the better performance
of the proposed algorithm on multiagent systems compared to traditional applications of NEAT
as well as other reinforcement learning based techniques.
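The core idea can be sketched as follows: genomes drawn from a single NEAT population supply the controllers for paddles of all four classes, which is feasible because the classes share the same observation encoding. The PongEnv stub, its methods, and the round-robin class assignment below are illustrative stand-ins, not the actual interface of the environment described in the next section.

import neat

PADDLE_CLASSES = ("top", "bottom", "left", "right")

class PongEnv:
    """Placeholder stub for the multi-paddle environment described in this paper."""
    done = True
    def observe(self, side): return []          # class-independent input metrics
    def step(self, side, action): pass          # apply the paddle's chosen action
    def reward(self, side): return 0.0          # reward earned by that paddle this step

def eval_genomes(genomes, config):
    env = PongEnv()
    paddles = []                                # (genome, network, paddle class) triples
    for i, (_, genome) in enumerate(genomes):
        genome.fitness = 0.0
        net = neat.nn.FeedForwardNetwork.create(genome, config)
        paddles.append((genome, net, PADDLE_CLASSES[i % 4]))
    while not env.done:
        for genome, net, side in paddles:
            obs = env.observe(side)             # same encoding for every class
            env.step(side, net.activate(obs))
            genome.fitness += env.reward(side)  # each genome earns its own reward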
PROPOSED MULTIAGENT ENVIRONMENT
The environment comprises a continuous state space within a window of dimensions
800 × 800 pixels. The aim is to use the proposed algorithm to obtain paddles of all 4 classes,
one for each side of the window, with optimized neural networks backing their actions so that
none of them misses the ball. For any paddle initialized during the training stage, the action
and observation spaces are governed by the side of the window it is initialized on. The
paddles initialized on the top and bottom sides of the window can move left or right, while those
on the left and right sides of the window can move upward or downward.
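A minimal sketch of this class-dependent action space is given below; interpreting a single network output as a signed step of fixed size is an assumption made for illustration.

def apply_action(side, x, y, output, step=10):
    """Move a paddle one step along the axis allowed for its class."""
    direction = 1 if output > 0.5 else -1    # assumed: sigmoid output thresholded at 0.5
    if side in ("top", "bottom"):
        x += direction * step                # top/bottom paddles slide left or right
    else:
        y += direction * step                # left/right paddles slide up or down
    return x, y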
The ball for every training episode is initialized in the middle of the environment and given a
random velocity (whose magnitude and direction are arbitrarily chosen within a continuous
range) to study how reward structure and individual genome fitness result in competitive
multiagent evolution.
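One plausible way to implement this initialization is sketched below; the speed range is an assumed placeholder, since the paper does not specify it here.

import math
import random

def init_ball(width=800, height=800, speed_range=(4.0, 7.0)):
    """Place the ball at the centre with a random velocity (assumed speed range)."""
    angle = random.uniform(0.0, 2.0 * math.pi)      # arbitrary direction
    speed = random.uniform(*speed_range)            # arbitrary magnitude
    position = [width / 2, height / 2]              # middle of the environment
    velocity = [speed * math.cos(angle), speed * math.sin(angle)]
    return position, velocity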
The proposed algorithm involves initializing a single population for every training episode
which will include multiple paddles belonging to the 4 classes as shown in Figure 1. As a result,
the inputs to the neural network for each paddle must comprise components of the
environment that can be universally calculated and represented for all 4 paddle classes. The
dynamic input metrics used are delineated in section 4.1.