Transactions on Machine Learning and Artificial Intelligence - Vol. 9, No. 3
Publication Date: June 25, 2021
DOI: 10.14738/tmlai.93.10149
Menon, U. R., & Menon, A. R. (2021). An Efficient Application of Neuroevolution for Competitive Multiagent Learning. Transactions
on Machine Learning and Artificial Intelligence, 9(3), 1-13.
An Efficient Application of Neuroevolution for Competitive
Multiagent Learning
Unnikrishnan Rajendran Menon
School of Electrical Engineering, Vellore Institute of Technology,
Vellore, Tamil Nadu, India;
Anirudh Rajiv Menon
School of Electronics Engineering, Vellore Institute of Technology,
Vellore, Tamil Nadu, India;
ABSTRACT
Multiagent systems provide an ideal environment for the evaluation and analysis of
real-world problems using reinforcement learning algorithms. Most traditional
approaches to multiagent learning are affected by long training periods as well as
high computational complexity. NEAT (NeuroEvolution of Augmenting Topologies)
is a popular evolutionary strategy for discovering the best-performing neural
network architecture and is often used to tackle optimization problems in the field of
artificial intelligence. This paper utilizes the NEAT algorithm to efficiently achieve
competitive multiagent learning on a modified pong game environment.
The competing agents abide by different rules while having similar observation
space parameters. The proposed algorithm utilizes this property of the
environment to define a singular neuroevolutionary procedure that obtains the
optimal policy for all the agents. The compiled results indicate that the proposed
implementation achieves ideal behaviour in a significantly shorter training period
than existing multiagent reinforcement learning models.
Keywords: Genetic Algorithm; NeuroEvolution; Neural Networks; Reinforcement
Learning; Multiagent Environment
INTRODUCTION
The rapid developments in the field of Artificial Intelligence demand that algorithms be analyzed
and evaluated constantly. Game environments consisting of many variables with unpredictable
behavior serve as ideal platforms for this purpose [1]. The traditional algorithms used to solve
these environments rely on multiple search paradigms and mechanisms that take a very long
time to find optima [2].
Reinforcement learning (RL) is a field of machine learning that trains agents to learn the
optimal policy for successfully navigating an environment. The agent learns to do so over its
training experiences by trying to maximize the cumulative reward it receives, based on the
actions it takes in different states of the environment. Over the past decade, research in RL
has gained huge popularity due to its extensive applications in control systems, robotics, and
other optimization techniques for solving real-world problems.
For instance, the use of reinforcement learning based AI agents by DeepMind to cool Google
Data Centers led to a 40% reduction in the energy used for cooling. The centers are now fully
controlled by these agents under the supervision of data center experts [3].
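Concretely, the quantity such an agent maximizes is usually expressed as the expected discounted return; the formulation below is standard reinforcement learning notation rather than one stated explicitly in this paper:

    G_t = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1}, \quad 0 \le \gamma \le 1

where r_{t+k+1} is the reward received k+1 steps after time t and \gamma is the discount factor that weights immediate rewards more heavily than distant ones.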
Game environments provide quantified state parameters that accurately represent the agents
and other contributing factors for any given time step. These parameters can be normalized
into a range compatible with various self-learning algorithms including those that rely on
neural networks as their backbone structures. These algorithms return numerically encoded
actions to the environment, which are translated into the appropriate state transitions and the
rewards associated with the new state.
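As an illustration of this normalization step, the following Python sketch scales a hypothetical set of pong-like state variables into ranges suitable for a neural network; the variable names and limits are assumptions, not the state encoding used in this paper.

import numpy as np

def normalize_state(ball_x, ball_y, ball_vx, ball_vy, paddle_pos,
                    width=800.0, height=800.0, max_speed=10.0):
    """Scale raw game-state values into [0, 1] / [-1, 1] ranges for a network."""
    return np.array([
        ball_x / width,        # ball x-position, mapped to [0, 1]
        ball_y / height,       # ball y-position, mapped to [0, 1]
        ball_vx / max_speed,   # ball x-velocity, mapped to roughly [-1, 1]
        ball_vy / max_speed,   # ball y-velocity, mapped to roughly [-1, 1]
        paddle_pos / width,    # paddle position along its axis, mapped to [0, 1]
    ], dtype=np.float32)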
Reinforcement learning algorithms such as Deep Q-Learning have been used extensively to study
the behaviour of agents by creating different scenarios in game environments. For instance,
Deep Q-networks that use convolutional neural networks (CNNs) to process the raw pixel data of
a pong environment, modified to pit two agents against a hard-coded paddle, have been used to
explore the development of cooperation between agents with a shared goal [4]. Hence, game
environments provide a medium for simulating many practically infeasible scenarios for the
evaluation and testing of various algorithms, saving both time and resources.
These DQN-based techniques have been extended to form generic agents that can beat most
major Atari games. However, such algorithms can take hours to train properly, since the
network needs considerable time to extract the most relevant features from the raw pixel data
returned by the environment [5].
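For reference, the kind of convolutional Q-network used in such work can be sketched as below; the layer sizes follow the standard Atari DQN architecture and are an assumption, not a description of this paper's models.

import torch
import torch.nn as nn

class DQN(nn.Module):
    """Convolutional Q-network mapping stacked 84x84 frames to Q-values."""
    def __init__(self, n_actions: int, in_frames: int = 4):
        super().__init__()
        self.features = nn.Sequential(               # extracts features from raw pixels
            nn.Conv2d(in_frames, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        self.head = nn.Sequential(                   # maps features to one Q-value per action
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x / 255.0))   # scale pixel intensities to [0, 1]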
Genetic algorithms are randomized heuristic search methods that optimize by exploring the local
neighborhood of the parameters to be improved and promote more effective parameters by
imitating the mechanics of natural selection and genetics. In comparison to traditional
algorithms, genetic algorithms are fast, robust, and, with modifications, can even escape
convergence to local optima. They can solve not only general optimization problems but also the
unconventional ones often encountered in artificial intelligence [6,7,8].
Genetic algorithms have been used to model and study cooperative behavior among microorganisms
such as bacteria and viruses [9,10]. Such evolution-based computations have also played a
key role in studying ant and bee colonies [11,12].
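A bare-bones genetic algorithm of this kind can be sketched in a few lines of Python; the fitness function, encoding, and hyperparameters below are placeholders chosen for illustration, not values used in this paper.

import random

def evolve(fitness, dim=8, pop_size=50, generations=100,
           mutation_rate=0.1, mutation_scale=0.5):
    """Evolve a real-valued parameter vector that maximizes `fitness`."""
    population = [[random.uniform(-1.0, 1.0) for _ in range(dim)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:pop_size // 2]                    # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, dim)                  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [g + random.gauss(0.0, mutation_scale)  # Gaussian mutation
                     if random.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

# Example: maximize the negative squared distance from the origin (optimum at 0).
best = evolve(lambda v: -sum(g * g for g in v))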
The NeuroEvolution of Augmenting Topologies (NEAT) algorithm utilizes concepts stemming from
genetic evolution to search the space of neural network policies for those that maximize a
fitness metric [13]. The fitness value of an agent represents how well it performed in each
training episode. While value-based reinforcement learning algorithms deploy single agents
that learn progressively, the NEAT algorithm uses a population of agents to find the best policies.
Offspring topologies are obtained via mutation and crossover procedures on the population's
fittest individuals. The population eventually performs well enough to cross the desired fitness
threshold. It has been observed that for many reinforcement learning applications, the NEAT
algorithm outperforms other conventional methods [15]. Neuroevolution-based algorithms
have thus been instrumental in modelling intelligent agents that can efficiently adapt to
strategy-based games, the design of mobile robots, autonomous vehicles, and even control systems in
aerospace [16,17,18,19,20].
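Using the neat-python package, the overall evolutionary loop takes roughly the following form; the configuration file name and the episode-evaluation stub are placeholders rather than the actual setup used in this paper.

import neat  # the neat-python package

def episode_fitness(net):
    # Placeholder: run one episode of the task with `net` as the controller
    # and return the resulting score; the details depend on the environment.
    return 0.0

def eval_genomes(genomes, config):
    # Assign each genome a fitness by building its network and evaluating it.
    for genome_id, genome in genomes:
        net = neat.nn.FeedForwardNetwork.create(genome, config)
        genome.fitness = episode_fitness(net)

config = neat.Config(neat.DefaultGenome, neat.DefaultReproduction,
                     neat.DefaultSpeciesSet, neat.DefaultStagnation,
                     "neat-config.txt")        # hypothetical configuration file
population = neat.Population(config)
winner = population.run(eval_genomes, 300)     # evolve until the fitness threshold or 300 generations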
Most multiagent environments, such as the one used in this paper, involve all the agents
interacting with the environment under different rules while having similar observation and
action spaces. Consequently, NEAT has proved to be an effective algorithm for multiagent
learning [21].
An efficient implementation of neuroevolution on a predator-prey environment has been used
to evaluate this approach, both in terms of developing policies that give rise to collaborative
behaviour among the predators and in terms of how the reward structure and coordination
mechanism affect multiagent evolution [22]. Additionally, neuroevolutionary techniques such as
Cooperative Synapse Neuroevolution (CoSyNE) have been developed to solve multiagent
versions of the pole balancing problem that involve continuous state spaces [14].
This paper implements a single neuroevolutionary strategy to optimize all the agents involved
in an indigenously designed multi-paddle pong environment, propagating only the best-performing
architectures to promote a competitive learning paradigm. The initialization of a
single population for agents with different action spaces contributes to the better performance
of the proposed algorithm on multiagent systems compared to traditional applications of NEAT
as well as other reinforcement learning based techniques.
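The core idea can be sketched as follows: genomes drawn from a single NEAT population supply the controllers for paddles of all four classes, which is feasible because the classes share the same observation encoding. The PongEnv stub, its methods, and the round-robin class assignment below are illustrative stand-ins, not the actual interface of the environment described in the next section.

import neat

PADDLE_CLASSES = ("top", "bottom", "left", "right")

class PongEnv:
    """Placeholder stub for the multi-paddle environment described in this paper."""
    done = True
    def observe(self, side): return []          # class-independent input metrics
    def step(self, side, action): pass          # apply the paddle's chosen action
    def reward(self, side): return 0.0          # reward earned by that paddle this step

def eval_genomes(genomes, config):
    env = PongEnv()
    paddles = []                                # (genome, network, paddle class) triples
    for i, (_, genome) in enumerate(genomes):
        genome.fitness = 0.0
        net = neat.nn.FeedForwardNetwork.create(genome, config)
        paddles.append((genome, net, PADDLE_CLASSES[i % 4]))
    while not env.done:
        for genome, net, side in paddles:
            obs = env.observe(side)             # same encoding for every class
            env.step(side, net.activate(obs))
            genome.fitness += env.reward(side)  # each genome earns its own reward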
PROPOSED MULTIAGENT ENVIRONMENT
The environment comprises a continuous state space within a window of dimensions
800 × 800 pixels. The aim is to use the proposed algorithm to obtain paddles of all 4 classes,
one for each side of the window, with optimized neural networks backing their actions so that
none of them misses the ball. For any paddle initialized during the training stage, the action
and observation spaces are governed by the side of the window it is initialized on. The
paddles initialized on the top and bottom sides of the window can move left or right, while those
on the left and right sides of the window can move upward or downward.
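A minimal sketch of this class-dependent action space is given below; interpreting a single network output as a signed step of fixed size is an assumption made for illustration.

def apply_action(side, x, y, output, step=10):
    """Move a paddle one step along the axis allowed for its class."""
    direction = 1 if output > 0.5 else -1    # assumed: sigmoid output thresholded at 0.5
    if side in ("top", "bottom"):
        x += direction * step                # top/bottom paddles slide left or right
    else:
        y += direction * step                # left/right paddles slide up or down
    return x, y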
The ball for every training episode is initialized in the middle of the environment and given a
random velocity (whose magnitude and direction are arbitrarily chosen within a continuous
range) to study how reward structure and individual genome fitness result in competitive
multiagent evolution.
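One plausible way to implement this initialization is sketched below; the speed range is an assumed placeholder, since the paper does not specify it here.

import math
import random

def init_ball(width=800, height=800, speed_range=(4.0, 7.0)):
    """Place the ball at the centre with a random velocity (assumed speed range)."""
    angle = random.uniform(0.0, 2.0 * math.pi)      # arbitrary direction
    speed = random.uniform(*speed_range)            # arbitrary magnitude
    position = [width / 2, height / 2]              # middle of the environment
    velocity = [speed * math.cos(angle), speed * math.sin(angle)]
    return position, velocity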
The proposed algorithm involves initializing a single population for every training episode
which will include multiple paddles belonging to the 4 classes as shown in Figure 1. As a result,
the inputs to the neural network for each paddle must comprise components of the
environment that can be universally calculated and represented for all 4 paddle classes. The
dynamic input metrics used are delineated in section 4.1.