European Journal of Applied Sciences – Vol. 10, No. 1
Publication Date: February 25, 2022
DOI: 10.14738/aivp.101.11578
Citation: Tambouratzis, T., & Souliou, D. (2022). Intertwining Binary Decision Trees and Probabilistic Neural Networks for Maximising Accuracy and Efficiency in Classification Tasks – a Pilot/Proof–of–Concept Demonstration on the Iris Benchmark Classification Dataset. European Journal of Applied Sciences, 10(1), 135–145.
Services for Science and Education – United Kingdom
Intertwining Binary Decision Trees and Probabilistic Neural
Networks for Maximising Accuracy and Efficiency in Classification
Tasks – a Pilot/Proof–of–Concept Demonstration on the Iris
Benchmark Classification Dataset
Tatiana Tambouratzis
Department of Industrial Management & Technology
University of Piraeus, 107 Deligiorgi St., Piraeus 185 34, Greece
Dora Souliou
School of Electrical and Computer Engineering
National Technical University of Athens, 9 Iroon Polytechniou St.
Zografou, Athens 157 80, Greece
ABSTRACT
Intertwining binary decision trees (BDTs) and probabilistic neural networks
(PNNs) is put forward as a powerful custom–made methodology for simultaneously
maximising accuracy and efficiency in classification tasks. The proposed
methodology brings together the complementary strengths of (I) the parametric,
global, recursive, efficient as well as maximal dichotomisation of the BDT, and (II)
the non–parametric, accurate–to–local–detail, multi–class identification of the
PNN. The BDT/PNN combination evolves level–by–level, with each level comprising
two steps: (step 1) the optimal BDT (hyper–)linear bi–partitioning of the entire
dataset for the first level, and of every non–terminal partition thereof for the next
levels, is determined, with each resulting partition being assigned to a new node of
the next BDT level; (step 2) as many PNNs are created as there are multi–class
partitions resulting from (step 1), for implementing the non–parametric creation of
the hyperplane that maximises class separability over each corresponding partition
of the problem–space. BDT/PNN expansion is applied recursively – and
independently – to each non–terminal partition of the dataset, concluding once the
cumulative classification accuracy (CCA) of the terminal – BDT and/or PNN –
nodes of the BDT/PNN is maximised. A proof–of–concept demonstration on the iris
benchmark classification dataset (IBCD) illustrates the details of the BDT/PNN
combination, while attesting to its simplicity, effectiveness, and efficiency of
operation. Follow–up research shall focus upon the classification of additional
benchmark datasets, as well as of “hard” and “Big” real–world problems, for further
evaluating and validating the proposed methodology.
Keywords: parametric binary decision tree (BDT), non–parametric probabilistic neural
network (PNN), data partitioning, class separability, cumulative classification accuracy
(CCA), computational efficiency, iris benchmark classification dataset (IBCD), five–fold
cross–validation (FFCV)
INTRODUCTION
Pattern classification can be defined as the automated identification of characteristics
(patterns) in data, which have the potential to be used in a consistent manner for allocating
each datum to its most representative class, thereby ensuring that the members of the same
class share identical/“sufficiently” similar characteristics, while the members of different
classes display distinct/divergent characteristics. More formally [1] (Duda et al., 2000), pattern
classification constitutes the assignment of a label to each member of a group of interest,
according to a set of salient characteristics/features which are shared by the members of the
same group, but which differ across the members of different groups.
Among a multitude of classification methodologies, (I) decision trees (DTs) – especially binary
decision trees (BDTs) [2–3] (Loh & Vanichsetakul, 1988; Mitchell, 1997) – and (II) probabilistic
artificial neural networks – (PNNs) [4–5] (Specht, 1990a; Specht, 1990b) – constitute
successful, as well as transparent state–of–the–art pattern classification methodologies. Owing
to their differing characteristics, the two methodologies operate according to distinct –yet
complementary – problem–space partitioning rules and criteria, principles, as well as aims. The
proposed combination demonstrates that it is advantageous to intertwine BDTs and PNNs in a
systematic manner for simultaneously maximising prediction accuracy and efficiency in
classification tasks, thus reaching a level that is far beyond that accomplished independently by
either constituent classification methodology.
In the following Sections, a critical exposition of the BDT/PNN construction, operation and
potential is provided, followed by a demonstration on the well–known iris benchmark
classification dataset (IBCD) [6] (Anderson, 1935), where an improvement in both classification
accuracy and operation efficiency is observed over the entire and pruned BDTs, as well as over
the entire and partial PNNs. Testing the methodology on a benchmark classification dataset
aims at verifying the classification prowess of the BDT/PNN combination, thus promoting its
application to a variety of hard problems, while also defining criteria concerning the extent of
BDT and PNN creation/expansion that is appropriate for attaining maximally accurate,
consistent and – at the same time – efficient classification.
The remainder of this contribution is organised as follows: Section 2 presents the two
constituent methodologies (BDTs and PNNs), followed by a description of the IBCD, which has
been used for demonstrating the proof–of–concept of the proposed BDT/PNN combination;
Section 3 introduces the means of combining/intertwining the two methodologies,
subsequently presenting the BDT/PNN performance on the IBCD, as this is expressed in terms
of construction characteristics, as well as of prediction prowess in the guise of cumulative
classification accuracy (CCA) and computational complexity; Section 4 critically highlights the
characteristics and evaluates – as well as justifies – the superior performance that is
accomplished by the systematic BDT/PNN combination in classification tasks over both of the
constituent methodologies; finally, Section 5 summarises the main features of and the
advancement/improvement afforded by the proposed combination of the BDT and PNN
methodologies, thus concluding this contribution.
CONSTITUENT METHODOLOGIES & DEMONSTRATION DATA
Binary Decision Trees (BDTs)
The parametric BDT is created recursively, on a level–by–level basis. Initially, the entire dataset
is assigned to the first (single top–level) node of the BDT. If the top–level node contains patterns
of a single class, it becomes a terminal node and the BDT expansion procedure is terminated.
Otherwise, the hyper–surface that maximally bi–partitions the patterns of the top-level node is
created, with the resulting pair of partitions being assigned to a pair of nodes of the second
BDT level. The process of bi–partitioning each non–terminal node is repeated recursively for
each next level of the BDT, terminating as soon as either every node of a given level contains
patterns of a single class, or no further improvement in pattern classification is possible
according to problem–derived and/or implementation/application–specific efficiency–versus–
accuracy priorities and constraints (to be detailed in Section 3.1).
It is important to note that the exclusive – as well as recursive, in the form of data(set)
dichotomisation at successive levels – application of binary splits keeps BDT construction
efficient in practice, even though constructing the globally optimal BDT is NP–complete [7]
(Hyafil & Rivest, 1976); the greedy, level–by–level bi–partitioning nevertheless supports highly
effective operation/dataset classification.
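For concreteness, a minimal sketch of the greedy, recursive bi–partitioning described above is given below (in Python). The axis–aligned feature/threshold splits and the Gini impurity criterion are illustrative assumptions standing in for the class–separability criterion used in the paper, and the function names are hypothetical.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a set of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Exhaustive search for the (feature, threshold) pair whose binary split
    minimises the weighted impurity of the two resulting partitions."""
    best_f, best_t, best_score = None, None, np.inf
    n, d = X.shape
    for f in range(d):
        for t in np.unique(X[:, f]):
            left, right = y[X[:, f] <= t], y[X[:, f] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best_f, best_t, best_score = f, t, score
    return best_f, best_t, best_score

def grow_bdt(X, y, depth=0, max_depth=10):
    """Level-by-level BDT expansion: a node containing patterns of a single
    class (or an unsplittable / maximally deep partition) becomes terminal."""
    if len(np.unique(y)) == 1 or depth == max_depth:
        return {"class": int(np.bincount(y).argmax())}        # terminal node
    f, t, _ = best_split(X, y)
    if f is None:
        return {"class": int(np.bincount(y).argmax())}
    mask = X[:, f] <= t
    return {"feature": f, "threshold": float(t),               # non-terminal node
            "left": grow_bdt(X[mask], y[mask], depth + 1, max_depth),
            "right": grow_bdt(X[~mask], y[~mask], depth + 1, max_depth)}
```

On the IBCD, such a tree reduces to a handful of feature/threshold tests per pattern, of the kind shown later in Figure 1.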
Probabilistic Neural Networks (PNNs)
Based on Bayesian classification, the non–parametric four–layered PNN comprises:
(1) the input layer, containing as many nodes as there are input dimensions of the data
patterns to be classified, and performing independent normalisation of each
characteristic/dimension of the input patterns;
(2) the pattern layer, comprising as many nodes as there are training patterns, and
implementing full connectivity with the input layer;
(3) the summation layer, encoding one (distinct) class of the dataset per summation node,
with each node being connected exclusively to the nodes of the pattern layer that belong
to the class pertaining to the summation node;
(4) the output layer, containing as many nodes as there are classes, and comparing the
(distance–based weighted) activation values of the summation nodes in order to assign
each test pattern to the “winner” class, i.e. to the class represented by the summation
node that returns the highest activation/output (implemented as a max–function).
Both PNN training and testing are swift, with training being completed following a single
presentation of the training patterns, and testing (classification) being applied via a forward
sweep across the PNN layers for each test pattern, in turn. Thus, provided that (i) the training
set is adequately representative of the problem–space and (ii) the value of the sole tunable PNN
parameter (the smoothing factor, also termed spread, σ ∈ [0, 1]¹) is selected appropriately, a
purely data–driven, non–parametric, maximally simple and general – yet discriminative (i.e.
accurately representing the local class distribution of the problem–space) – hyper–surface is
created, which implements optimal class assignment of the training patterns, while further
preserving the local morphology of the separating hypersurface. Consequently, the PNN is
rendered not only capable of correctly identifying the class of each training pattern, but also of
accurately deducing the class of each test/novel pattern, provided that the novel pattern shares
the characteristics of (i.e. is derived from the same distribution as) the training–pattern
space/exemplars.

¹ σ determines the (constant over the entire training set) strength and extent of influence that each training pattern exerts on the class–separating hypersurface; the influence of each training pattern is expressed as a bell–shaped decreasing function of its distance from the test pattern, with the extent of influence and the rate of decay being proportional and inversely proportional, respectively, to the value of σ
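For reference, the class–conditional density estimate underlying the PNN can be written in the standard Parzen–window form below; the notation is the conventional one from the PNN literature rather than taken from the paper:

$$
f_c(\mathbf{x}) \;=\; \frac{1}{n_c\,(2\pi)^{d/2}\,\sigma^{d}} \sum_{i=1}^{n_c} \exp\!\left(-\,\frac{\lVert \mathbf{x}-\mathbf{x}_{c,i}\rVert^{2}}{2\sigma^{2}}\right),
\qquad
\hat{y}(\mathbf{x}) \;=\; \arg\max_{c}\, f_c(\mathbf{x}),
$$

where $\mathbf{x}_{c,i}$ denotes the $i$-th training pattern of class $c$, $n_c$ the number of such patterns, $d$ the input dimensionality and $\sigma$ the smoothing factor discussed above; the summation layer computes $f_c(\mathbf{x})$ and the output layer implements the $\arg\max$.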
PNNs demonstrate accuracy, efficiency, as well as robustness to noise and to missing
information, further proving resilient to corrupt inputs and outliers. However – and owing to
the explicit representation of the data in the nodes of the pattern layer –, it may be expedient to
perform training pattern selection/extraction² for reducing the cardinality of the pattern–layer
nodes and, thus also, the time– and space–complexity of PNN operation/classification.

² ensuring that the distribution of the training patterns in the pattern space is not altered in any way
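A minimal Python sketch of the four–layer PNN forward sweep (dimension–wise input normalisation, Gaussian pattern layer, per–class summation, winner–take–all output) might look as follows; the class name, the min–max normalisation and the σ = 0.1 default are illustrative assumptions, with only the layer structure following the description above.

```python
import numpy as np

class PNN:
    """Minimal probabilistic neural network: 'training' is a single
    presentation (storage) of the training patterns."""

    def __init__(self, sigma=0.1):
        self.sigma = sigma                      # smoothing factor / spread

    def fit(self, X, y):
        # input layer: independent (per-dimension) min-max normalisation
        self.mins, self.maxs = X.min(axis=0), X.max(axis=0)
        self.span = np.where(self.maxs > self.mins, self.maxs - self.mins, 1.0)
        self.X = (X - self.mins) / self.span    # pattern layer: one node per training pattern
        self.y = y
        self.classes = np.unique(y)
        return self

    def predict(self, X):
        Xn = (X - self.mins) / self.span
        preds = []
        for x in Xn:
            # pattern layer: Gaussian (bell-shaped) response of every stored pattern
            k = np.exp(-np.sum((self.X - x) ** 2, axis=1) / (2 * self.sigma ** 2))
            # summation layer: one node per class, pooling that class's patterns
            scores = [k[self.y == c].mean() for c in self.classes]
            # output layer: winner-take-all (max) over the class scores
            preds.append(self.classes[int(np.argmax(scores))])
        return np.array(preds)
```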
The Iris Benchmark Classification Dataset (IBCD)
The “classic” IBCD comprises four features relating to petal length and width, as well as to sepal
length and width, of 150 iris flowers pertaining to three species of iris (namely Iris setosa, Iris
virginica and Iris versicolor), with 50 specimens having been collected from each species.
Based on combinations of the four features, Fisher [8] (1936) developed a linear
discriminant model for distinguishing between the three species. In a nutshell, the three iris
classes are separable in the projection of the four parameters on the principal component, with
(I) only the species of Iris setosa being linearly separable from both of the other two species,
and (II) the separability between the Iris virginica and Iris versicolor species being derived from
the petal length and petal width IBCD features (as can be inferred directly from the ranges of
each feature shown in Table 1 per iris class).
The IBCD is used for demonstrating the proposed methodology in the following Sections for two
reasons: on the one hand, both the dataset and its partitioning are tractable, whereby the
incremental nature of the BDT/PNN combination methodology can be detailed in such a manner as to be
easy to follow and duplicate; on the other hand, there is a considerable body of classification
methodologies that have been implemented for identifying the three IBCD classes (e.g. [9]
Gorban et al., 2007), whereby comparisons with the proposed methodology (in terms of
classification accuracy, as well as efficiency) can be made directly by the interested reader.
Table 1. Ranges of the four characteristics/features (measured in cm) for each of the three iris classes

Iris Class     Sepal length   Sepal width   Petal length   Petal width
Setosa         [4.3 5.8]      [2.3 4.4]     [1.0 1.9]      [0.1 0.6]
Versicolor     [4.9 7.0]      [2.0 3.4]     [3.0 5.1]      [1.0 1.8]
Virginica      [4.9 7.9]      [2.2 3.8]     [4.5 6.9]      [1.4 2.5]
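The per–class ranges of Table 1 can be reproduced directly from the publicly available dataset; the short check below uses scikit-learn's copy of the IBCD (an assumption about tooling, not something used in the paper).

```python
from sklearn.datasets import load_iris

iris = load_iris()                                   # 150 patterns, 4 features, 3 classes
for c, name in enumerate(iris.target_names):         # setosa, versicolor, virginica
    Xc = iris.data[iris.target == c]                 # the 50 specimens of one species
    ranges = {feat: f"[{lo:.1f} {hi:.1f}]"
              for feat, lo, hi in zip(iris.feature_names, Xc.min(axis=0), Xc.max(axis=0))}
    print(name, ranges)                              # matches the rows of Table 1
```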
IMPLEMENTATION OF THE PROPOSED BDT/PNN COMBINATION
Given their different characteristics (as expounded in Sections 2.1–2.2), the two constituent
methodologies (BDTs and PNNs) operate according to divergent problem–space partitioning
rules/criteria, which result – in turn – in distinct (yet complementary) means of partitioning
the problem space, so that the iris classes can be distinguished from each other in an optimal
manner.
It is demonstrated in the following that it is advantageous to combine/intertwine BDTs and
PNNs for simultaneously maximising prediction accuracy and efficiency in classification tasks,
thus reaching a level that is well beyond that accomplished independently by either constituent
methodology. Such a combination is also of interest as most of the relevant literature critically
compares the operation of the two methodologies (e.g. [10–12] Curram & Mingers, 1994;
Bouzida & Cuppens, 2006; Tso & Yau, 2007), with a significantly smaller body of research
combining the two methodologies to advantage (e.g. [13-14] Jerez–Aragonés et al., 2003; Goel
et al., 2003).
IBCD Classification via the Two-Step Core and the Level-by-Level Expansion of the
BDT/PNN Combination
The proposed methodology is quite straightforward. For the following demonstration and the
ensuing tests/results, as well as in order to preserve generality and ease of operation, the value
of the σ parameter has been kept constant at 0.1³ over all PNNs (i.e. over all levels of the BDT/PNN
and all folds of the FFCV procedure [15] (Devijver & Kittler, 1982)).

³ the default value (σ = 0.1) of PNN operation has been used in order to demonstrate generality and robustness of operation, i.e. without necessitating fine–tuning of the involved parameters
The core of the BDT/PNN combination is expressed via the pair of steps described below:
(step 1) the optimal BDT (hyper–)linear bi–partitioning of the entire dataset for the first level,
and of every part thereof for the next levels, is determined, with each partition being assigned
to a new node of the next BDT level;
(step 2) while each single–class partition derived from (step 1) automatically renders the
corresponding node a terminal node, a PNN is assigned to each multi/mixed–class partition for
implementing the non–parametric creation of the hypersurface that maximises class
separability over the corresponding part of the problem–space.
BDT/PNN level–by–level expansion is applied via steps 1 & 2, recursively, to each non–
terminal partition of the dataset, ending once the cumulative classification accuracy (CCA) of
the terminal (either BDT or PNN) node(s) is maximised. By defining the termination criterion
as the maximisation of CCA, both (I) the thoroughly successful partitioning/ classification of as
many patterns as possible, and (II) the concurrent identification of the partitions that are
deemed too small to allow reliable classification, are implemented.
In a nutshell, the BDT/PNN is implemented as follows:
• The PNN classifier is constructed on the entire training set of the dataset; the training
and test errors TR and TS, respectively, are evaluated. If both TR and TS equal 0, the
process is terminated and the classes are returned. Otherwise
• The first/top node of the BDT is created, with all the training patterns being assigned to
this node. A pair of nodes of the next BDT level is created such that class separability of
the patterns assigned to each partition (“branch”) is maximised. While each “next BDT
level” node containing members of a single class becomes a terminal node returning the
class of the corresponding part of the problem–space, each remaining (non–terminal
node) is assigned to a PNN.
• Each of the created PNNs is tackled independently, as follows: The training (TR) and test
(TS) errors for the left branch (namely TR1l and TS1l) and for the right branch (namely
TR1r and TS1r) of the BDT are calculated.
• If TR or TS assume non–zero values, a check is made as to whether both relations {TR1l<TR and
TS1l<TS} as well as {TR1r<TR and TS1r<TS} hold for the left and the right branches,
respectively.
• If both relations hold, the BDT is further expanded along each branch and the procedure
is repeated.
• Otherwise (namely if either (1) a terminal BDT node is reached, or (2) at least one of the
aforementioned relations fails to hold), the expansion procedure of the BDT is
discontinued along the specific branch. It has been observed that, in these cases, the
corresponding part of the dataset is deemed too small for the separating surface that has
been created during PNN training to accurately/consistently represent the pattern
space in terms of both pattern-class distribution and class separation/boundaries.
Consequently, (I) each new level and each new branch of the BDT/PNN is handled
independently, which signifies that it is not necessary for all PNNs to correspond to
branches of the same BDT/PNN level; (II) whenever a node of a branch becomes
terminal, the other node of the branch is tackled by a BDT, for efficiency as well as for
uniformity between BDT-paired branches.
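Putting the two steps and the bullet points above together, a compact sketch of the level–by–level BDT/PNN expansion could look as follows; it reuses the hypothetical best_split() and PNN sketches from Section 2, and the exact bookkeeping (per–branch error estimation, handling of empty test partitions) is an illustrative assumption rather than the authors' implementation.

```python
import numpy as np
# assumes best_split() and the PNN class from the sketches in Section 2

def pnn_errors(X_tr, y_tr, X_ts, y_ts, sigma=0.1):
    """Fit a PNN on one partition and return it with its TR and TS error rates."""
    pnn = PNN(sigma).fit(X_tr, y_tr)
    tr = float(np.mean(pnn.predict(X_tr) != y_tr))
    ts = float(np.mean(pnn.predict(X_ts) != y_ts)) if len(y_ts) else 0.0
    return pnn, tr, ts

def grow_bdt_pnn(X_tr, y_tr, X_ts, y_ts):
    """One BDT/PNN node: (step 1) optimal BDT bi-partition of the partition's
    patterns; (step 2) a PNN per multi-class branch, a branch being expanded
    further only while BOTH its TR and TS errors improve on the parent's."""
    pnn, tr, ts = pnn_errors(X_tr, y_tr, X_ts, y_ts)
    if tr == 0 and ts == 0:                      # PNN already perfect on this partition
        return {"pnn": pnn}
    f, t, _ = best_split(X_tr, y_tr)             # step 1: optimal bi-partition
    if f is None:                                # no valid split: fall back to the PNN
        return {"pnn": pnn}
    node = {"feature": f, "threshold": float(t)}
    for key, m_tr, m_ts in (("left",  X_tr[:, f] <= t, X_ts[:, f] <= t),
                            ("right", X_tr[:, f] >  t, X_ts[:, f] >  t)):
        Xb, yb, Xv, yv = X_tr[m_tr], y_tr[m_tr], X_ts[m_ts], y_ts[m_ts]
        if len(np.unique(yb)) == 1:              # single-class partition: terminal BDT node
            node[key] = {"class": int(yb[0])}
            continue
        _, tr_b, ts_b = pnn_errors(Xb, yb, Xv, yv)
        if tr_b < tr and ts_b < ts:              # both errors improve: expand this branch
            node[key] = grow_bdt_pnn(Xb, yb, Xv, yv)
        else:                                    # step 2: close the branch with a PNN
            node[key] = {"pnn": PNN().fit(Xb, yb)}
    return node
```

Called on each FFCV fold of the IBCD, such a routine yields a small tree whose leaves are either pure–class BDT nodes or local PNNs, in the spirit of Figure 2.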
Following construction, the BDT/PNN combination is capable of accurately – as well as most
efficiently – (I) identifying the class of each training pattern, while also (II) correctly
predicting the class of each test/novel pattern, provided that the test patterns
have been derived from the same distribution as (i.e. share the characteristics of) the training
patterns.
Classification Accuracy
FFCV has been performed for evaluating the prediction potential of the PNN, of the BDT and of
their combination on the IBCD, as this form of CV has been used by the majority of classification
methodologies in the relevant literature. Another – equally important – reason for selecting
FFCV is that the CV scheme of training on 4/5ths of the dataset – i.e. 120 patterns – and testing
on the remaining 1/5th of the dataset – i.e. 30 patterns – affords the most stringent conditions
(from among the most commonly used CV schemes) for validating class prediction.
The FFCV procedure is implemented as follows: the IBCD is partitioned randomly into five
equally sized sets, with each set comprising 10 patterns from each iris class. For the ith (i=1,
2,...,5) execution of FFCV, the ith fold is reserved for testing each of the three methodologies, as
these have been set–up/trained using the union of the remaining four folds (i.e. 1st to (i–1)th
and (i+1)th to 5th). Such a validation set–up ensures that, at the completion of the FFCV
procedure, each pattern of the IBCD shall have been used (a) four times for training/setting up,
and (b) exactly once for testing, the BDT and PNN methodologies, with the test sets of the five
folds fully covering the problem space with no duplication. By being more stringent than both
of the most popular (namely 10–fold and leave–one–out) versions of CV, the implemented
scheme is also more adept at highlighting the potential of the proposed methodology; in the
following, the cumulative classification accuracy (CCA) is used for presenting the results of
applying FFCV to the IBCD.
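The stratified five–fold set–up described above (each fold containing 10 patterns per class, each pattern tested exactly once) and the per–class/overall CCA reporting can be sketched as follows; StratifiedKFold is used purely for convenience and is an assumption about tooling rather than the authors' own code.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold

iris = load_iris()
X, y = iris.data, iris.target

def cca(y_true, y_pred):
    """Cumulative classification accuracy, per class and overall (%)."""
    out = {name: 100 * float(np.mean(y_pred[y_true == c] == c))
           for c, name in enumerate(iris.target_names)}
    out["All"] = 100 * float(np.mean(y_pred == y_true))
    return out

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for i, (tr_idx, ts_idx) in enumerate(skf.split(X, y), start=1):
    # 120 training patterns (40 per class) and 30 test patterns (10 per class) per fold
    X_tr, y_tr, X_ts, y_ts = X[tr_idx], y[tr_idx], X[ts_idx], y[ts_idx]
    # ... fit the BDT, the PNN, or the BDT/PNN combination here and report cca(y_ts, predictions)
```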
Fig. 1. BDT created for the IBCD for the first fold of FFCV; “PNN” and grey bars denote terminal
PNN and BDT nodes, respectively
Table 2. BDT CCA for training and test sets per class and overall

Classification Accuracy (%)   Setosa   Versicolor   Virginica   All
Training set                  100      93           88          93.67
Test set                      100      95           90          94.9
Table 3. PNN CCA for training and test sets per class and overall

Classification Accuracy (%)   Setosa   Versicolor   Virginica   All
Training set                  100      96.5         97          97.83
Test set                      100      94.1         94          96.6
Table 4. BDT/PNN CCA for training and test sets per class and overall

Classification Accuracy (%)   Setosa   Versicolor   Virginica   All
Training set                  100      100          100         100
Test set                      100      96.2         95.1        97.9
It is important that stable results have been collected over the FFCV tests for each methodology
(BDT, PNN and their combination), with very similar results obtained (I) for all five folds, as
well as (II) under conditions of LOO and 10–fold CV.
Tables 2–4 present CCA overall, as well as computed independently per IBCD class by the BDT,
the PNN, and the BDT/PNN combination methodologies, respectively. PNN–derived
classification has been found superior to that of the BDT by 4% overall, with the BDT/PNN
combined classification (i) demonstrating a significantly superior performance over both
constituent methodologies, and, in all three cases, (ii) achieving the error–free identification of
the Iris setosa. The distinct (yet complementary) strengths and weaknesses of the two
methodologies are highlighted by comparing the CCAs of the BDT, the PNN and their
combination. Overall, while the BDT is especially adept at generalising to/predicting novel iris
flowers, the PNN is most accurate in the classification of known iris flowers. It is also interesting
– especially when efficiency is of the essence – that (as can be discerned from the same Tables)
the CCA performance of both the PNN and the BDT/PNN combination for the IBCD is
comparable to that of the PNN following the very first level of BDT/PNN construction.
Table 5. BDT/PNN CCA after the first level for training and test sets per class and overall

Classification Accuracy (%)   Setosa   Versicolor   Virginica   All
Training set                  100      96.5         96          97.5
Test set                      100      94           95          97
While the size of the PNN of all the folds of FFCV for the IBCD is fixed (equalling 4–120–3–1), the
BDT structures of the five folds have been found slightly different to each other, mainly as far
as the exact choice of parameter values is concerned (rather than in terms of the sequence –
per se – of the choice of parameters for dichotomisation), thus demonstrating satisfactory
stability. The BDT corresponding to the first fold of FFCV for the IBCD (comprising five levels,
five non–terminal nodes and six terminal nodes) is illustrated in Figure 1; the parameter/value
bi–partitioning pair corresponding to each node is also shown, with the terminal nodes being
represented by filled rectangles. A comparison between BDT (Fig. 1) and BDT/PNN (Fig. 2)
construction further highlights the efficiency endowed to classification by applying the
proposed combination.
Fig. 2. BDT/PNN combination created for the IBCD; first fold of FFCV; aiming at efficiency, the
combination is initiated invariably with a BDT; “PNN” and grey bars denote terminal PNN and
BDT nodes, respectively
Computational Complexity
In terms of computational complexity, the two constituent methodologies are clearly
complementary to each other: on the one hand, even though the BDT is roughly three times more
complex than the PNN during creation/training, it is rendered roughly 54 times more efficient than
the PNN during testing (2.21 versus 120 comparisons per test pattern; Tables 6–7); on the other
hand, while the PNN implements simultaneous classification over all classes, the BDT follows a
“dual”, more focused, sequential divide–and–conquer handling of the dataset.
Table 6. BDT computational complexity (comparisons); averaged results of applying FFCV

during creation   356.4
during testing    2.21

Table 7. PNN computational complexity (comparisons); averaged results of applying FFCV

during training   120
during testing    120

Table 8. BDT/PNN computational complexity (comparisons) during the first step; averaged results of applying FFCV

during creation   153.2 (113.2 for BDT & 40 for PNN)
during testing    41 (1 for BDT & 40 for PNN)

Table 9. BDT/PNN computational complexity (comparisons); averaged results of applying FFCV

during creation   238.60 (198.60 for BDT & 40 for PNN)
during testing    41.67 (1.67 for BDT & 40 for PNN)
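For orientation, the ratios discussed in this subsection follow directly from the averaged comparison counts of Tables 6, 7 and 9:

```python
# averaged comparison counts taken from Tables 6, 7 and 9
bdt     = {"creation": 356.4,  "testing": 2.21}
pnn     = {"creation": 120.0,  "testing": 120.0}
bdt_pnn = {"creation": 238.60, "testing": 41.67}

print(bdt["creation"] / pnn["creation"])      # ~2.97: BDT roughly 3x the PNN's creation cost
print(pnn["testing"]  / bdt["testing"])       # ~54.3: the BDT is roughly 54x cheaper at test time
print(pnn["testing"]  / bdt_pnn["testing"])   # ~2.88: the combination stays ~3x cheaper than the PNN at test time
print(bdt["creation"] / bdt_pnn["creation"])  # ~1.49: and ~1.5x cheaper than the full BDT to create
```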
A comparison between Tables 8 and 9 confirms the significant decrease in computational
complexity observed at each successive application of the BDT/PNN combination. A detailed
demonstration of the BDT/PNN combination on the IBCD, under conditions of five–fold cross–
validation (FFCV)⁴ [15], is presented in the following Section, for illustrating the operational
details of the BDT/PNN combination, while also attesting to its competence in terms of
simplicity, consistency, efficiency and effectiveness of operation. A novel termination criterion
concerning the stability of BDT expansion is also used – and proposed – in the next Section.

⁴ complying with the customary/prevalent means of setting up and testing this dataset in the existing literature
AN EVALUATION OF THE BDT-, THE PNN- AND THE BDT/PNN-COMBINATION
CLASSIFICATION OF THE IBCD
It is important that stable results have been collected by all the tests, namely BDT–, PNN–, as
well as BDT/PNN–based, under conditions of LOO, 10FCV, as well as FFCV.
While the size of the PNN of all the folds of FFCV for the IBCD is fixed, equalling 4–120–3–1, the
BDTs of the five folds have been found slightly different to each other, mainly as far as the choice
of parameter values (rather than the choice of parameter per se) for each node is concerned.
The BDT corresponding to the first fold of FFCV for the IBCD is illustrated in Figure 1, comprising
five levels, five non–terminal nodes and six terminal nodes; the parameter–value pair
corresponding to each node is shown in the figure, with the terminal nodes being represented
by filled rectangles.
Comparing the two methodologies, it can be observed that both are 100% correct in identifying
the Setosa irises (the class that is linearly separable from the other two classes), with the
Versicolor and Virginica iris classes not being linearly separable. On the one hand, the PNN is
more accurate than the BDT, both overall (by about 4%), and at classifying the Versicolor and
Virginica irises of the training sets. In a complementary fashion, the BDT appears to be slightly
more accurate than the PNN at classifying the Versicolor irises of the test sets, as well as at
generalising to novel iris patterns; the BDT demonstrates, in fact, higher classification accuracy
in classifying (and generalising to) unknown – rather than known – irises. Tables 6 and 7
describe the computational complexity of the two methodologies during creation and operation
(testing). Clearly, the PNN is moderately complex, both during training and testing. Conversely,
the BDT is quite computationally complex during creation, namely in determining the optimal
parameter–value pair that maximally separates training patterns from different classes, yet
becomes extremely efficient during testing, with an average of just 2.21 comparisons required
for classifying a novel iris pattern.
Table 4 tabulates the classification accuracy accomplished by the BDT/PNN combination
methodology, demonstrating the superior performance accomplished by combining the
advantages of the two constituent methodologies, in terms of both accuracy and computational
complexity. Even though the PNN has already been found superior to the BDT for the IBCD
data (Tables 2–3), classification accuracy is further improved by the combined use of the PNN
and the BDT; in fact, the training set is always 100% accurately/correctly classified. Figure 2
depicts the optimal BDT/PNN combination for the first fold of FFCV for the IBCD, with the created
BDT comprising four levels, four non–terminal nodes and five terminal nodes⁵, and with the
average size of the two PNNs corresponding to the two non–terminal nodes of the third level
equalling 4–40–3–1. During training/creation, the computational complexity of the BDT/PNN
combination is clearly lower than that of the entire BDT, while during testing it remains far lower
than that of the entire PNN, although higher than that of the entire BDT (Tables 6, 7 and 9).

⁵ two per PNN, for the classes of Iris setosa and of Iris versicolor & virginica
When considering both accuracy and computational complexity during training/creation, as
well as during testing, the proposed BDT/PNN methodology is clearly superior to both
constituent methodologies. By demonstrating the BDT/PNN combination on a benchmark
classification dataset, its suitability to a variety of hard “interesting” problems is put forward,
with the designation of criteria concerning the optimal extent of BDT creation that is necessary
for attaining maximally accurate and efficient classification being provided as well as discussed.
CONCLUSIONS
A systematic combination of BDTs and PNNs has been put forward for simultaneously
maximising prediction accuracy and efficiency beyond that accomplished independently by
each constituent methodology. The global and linear pattern separation of the BDTs has been
combined systematically with the localised, versatile and non–linear categorisation of the PNN:
BDT/PNN creation and expansion alternate, with BDT creation and expansion being followed –
at each level – by the assignment of the resulting non–terminal dataset partitions to as many
PNNs. The procedure is iterative/recursive, concluding as soon as the CCA of the terminal BDT
nodes, as well as of the PNNs created from the non–terminal BDT nodes, is maximised.
In all, the proposed methodology unites the strengths of the two constituent classification
methods, with BDT and PNN creation proceeding concurrently in a level–by–level fashion. Once
the nodes of each next BDT level have been created, the dataset partitions resulting from each
branch of the newly created nodes are determined, with a PNN being constructed for classifying
the dataset partitions corresponding to non–terminal BDT nodes. BDT expansion ceases as
soon as the CCA of the terminal BDT and PNN nodes is maximised.
By demonstrating the BDT/PNN combination on a benchmark classification dataset, its
suitability for a variety of hard “interesting” problems is put forward, while the designation of
criteria concerning the optimal extent of BDT creation that is necessary for attaining maximally
accurate and efficient classification has been provided as well as discussed. Future research
shall be focused upon further validation of the proposed combination methodology on more
benchmark datasets, as well as on Big datasets, for further confirming the advantages – as well
as the limitations – of the BDT/PNN combination methodology.
References
[1]. Duda, R.O., Hart, P.E., Stork, D.G. (2000). Pattern Classification, 2nd Edition, Wiley–Interscience (ISBN: 978–0–471–05669–0)
[2]. Loh W.Y., Vanichsetakul N. (1988). Tree–structured classification via generalized discriminant analysis, Journal of
the American Statistical Association, vol. 83, pp. 715–728
[3]. Mitchell, T. (1997). Decision tree learning, in Machine Learning, The McGraw–Hill Companies, Inc., pp. 52–78
[4]. Specht, D. (1990). Probabilistic neural networks, Neural Networks, vol. 3, pp. 109–118
[5]. Specht, D.F. (1990). Probabilistic neural networks for classification, mapping, or associative memory, IEEE 1988
International Conference on Neural Networks, vol. 1, pp. 111–121
[6]. Anderson, E. (1935). The irises of the Gaspé Peninsula, Bulletin of the American Iris Society, vol. 59, pp. 2–5
[7]. Hyafil, L., Rivest, R.L. (1976). Constructing optimal binary decision trees is NP–complete, Information Processing
Letters, vol. 5, pp. 15–17
[8]. Fisher, R.A. (1936). The Use of Multiple Measurements in Taxonomic Problems, Annals of Eugenics, vol. 7, pp. 179–
188
[9]. Gorban, A.N., Sumner, N.R., Zinovyev, A.Y. (2007). Topological grammars for data approximation, Applied
Mathematics Letters, vol. 20, pp. 382–386, DOI:10.1016/j.aml.2006.04.022
[10]. Curram, S.P., Mingers, J. (1994). Neural networks, decision tree induction and discriminant analysis: an empirical
comparison, Journal of the Operational Research Society, vol. 45, pp. 440–450
[11]. Bouzida, Y., Cuppens, F. (2006). Neural networks vs. decision trees for intrusion detection, Proceedings of the
IEEE/IST Workshop on Monitoring, Attack Detection and Mitigation, Tuebingen, Germany, September 28th–29th,
2006
[12]. Tso, G.K.F., Yau, K.W. (2007). Predicting electricity energy consumption: A comparison of regression analysis,
decision tree and neural networks, Energy, vol. 32, pp. 1761–1768
[13]. Jerez–Aragonés, J.M., Gómez–Ruiz, J.A., Ramos–Jiménez, G., Muñoz–Pérez, J., Alba–Conejo, E. (2003). A combined
neural network and decision trees model for prognosis of breast cancer relapse, Artificial Intelligence in Medicine,
vol. 27, pp. 45–63
[14]. Goel, P.K., Prasher, S.O., Patel, R.M., Landry, A.J., Bonnell, R.B., Viau, A.A. (2003). Classification of hyperspectral
data by decision trees and artificial neural networks to identify weed stress and nitrogen status of corn, Computers
and Electronics in Agriculture, vol. 39, pp. 67–93
[15]. Devijver P.A., Kittler J. (1982). Pattern Recognition: A Statistical Approach, Prentice–Hall, London, U.K.