European Journal of Applied Sciences – Vol. 10, No. 1
Publication Date: February 25, 2022
DOI: 10.14738/aivp.101.11578
Citation: Tambouratzis, T., & Souliou, D. (2022). Intertwining Binary Decision Trees and Probabilistic Neural Networks for Maximising Accuracy and Efficiency in Classification Tasks – a Pilot/Proof–of–Concept Demonstration on the Iris Benchmark Classification Dataset. European Journal of Applied Sciences, 10(1), 135–145.
Services for Science and Education – United Kingdom
Intertwining Binary Decision Trees and Probabilistic Neural
Networks for Maximising Accuracy and Efficiency in Classification
Tasks – a Pilot/Proof–of–Concept Demonstration on the Iris
Benchmark Classification Dataset
Tatiana Tambouratzis
Department of Industrial Management & Technology
University of Piraeus, 107 Deligiorgi St., Piraeus 185 34, Greece
Dora Souliou
School of Electrical and Computer Engineering
National Technical University of Athens, 9 Iroon Polytechniou St.
Zografou, Athens 157 80, Greece
ABSTRACT
Intertwining binary decision trees (BDTs) and probabilistic neural networks
(PNNs) is put forward as a powerful custom–made methodology for simultaneously
maximising accuracy and efficiency in classification tasks. The proposed
methodology brings together the complementary strengths of (I) the parametric,
global, recursive, efficient as well as maximal dichotomisation of the BDT, and (II)
the non–parametric, accurate–to–local–detail, multi–class identification of the
PNN. The BDT/PNN combination evolves level–by–level, with each level comprising
two steps: (step 1) the optimal BDT (hyper–)linear bi–partitioning of the entire
dataset for the first level, and of every non–terminal partition thereof for the next
levels, is determined, with each resulting partition being assigned to a new node of
the next BDT level; (step 2) as many PNNs are created as there are multi–class
partitions resulting from (step 1), for implementing the non–parametric creation of
the hyperplane that maximises class separability over each corresponding partition
of the problem–space. BDT/PNN expansion is applied recursively – and
independently – to each non–terminal partition of the dataset, concluding once the
cumulative classification accuracy (CCA) of the terminal – BDT and/or PNN –
nodes of the BDT/PNN is maximised. A proof–of–concept demonstration on the iris
benchmark classification dataset (IBCD) illustrates the details of the BDT/PNN
combination, while attesting to its simplicity, effectiveness, and efficiency of
operation. Follow–up research shall focus upon the classification of additional
benchmark datasets, as well as of “hard” and “Big” real–world problems, for further
evaluating and validating the proposed methodology.
Keywords: parametric binary decision tree (BDT), non–parametric probabilistic neural
network (PNN), data partitioning, class separability, cumulative classification accuracy
(CCA), computational efficiency, iris benchmark classification dataset (IBCD), five–fold
cross–validation (FFCV)
INTRODUCTION
Pattern classification can be defined as the automated identification of characteristics
(patterns) in data, which have the potential to be used in a consistent manner for allocating
each datum to its most representative class, thereby ensuring that the members of the same
class share identical/“sufficiently” similar characteristics, while the members of different
classes display distinct/divergent characteristics. More formally [1] (Duda et al., 2000), pattern
classification constitutes the assignment of a label to each member of a group of interest,
according to a set of salient characteristics/features which are shared by the members of the
same group, but which differ across the members of different groups.
Among a multitude of classification methodologies, (I) decision trees (DTs) – especially binary
decision trees (BDTs) [2–3] (Loh & Vanichsetakul, 1988; Mitchell, 1997) – and (II) probabilistic
artificial neural networks – (PNNs) [4–5] (Specht, 1990a; Specht, 1990b) – constitute
successful, as well as transparent state–of–the–art pattern classification methodologies. Owing
to their differing characteristics, the two methodologies operate according to distinct –yet
complementary – problem–space partitioning rules and criteria, principles, as well as aims. The
proposed combination demonstrates that it is advantageous to intertwine BDTs and PNNs in a
systematic manner for simultaneously maximising prediction accuracy and efficiency in
classification tasks, thus reaching a level that is far beyond that accomplished independently by
either constituent classification methodology.
In the following Sections, a critical exposition of the BDT/PNN construction, operation and
potential is provided, followed by a demonstration on the well–known iris benchmark
classification dataset (IBCD) [6] (Anderson, 1935), where an improvement in both classification
accuracy and operation efficiency is observed over the entire and pruned BDTs, as well as over
the entire and partial PNNs. Testing the methodology on a benchmark classification dataset
aims at verifying the classification prowess of the BDT/PNN combination, thus promoting its
application to a variety of hard problems, while also defining criteria concerning the extent of
BDT and PNN creation/expansion that is appropriate for attaining maximally accurate,
consistent and – at the same time – efficient classification.
The remainder of this contribution is organised as follows: Section 2 presents the two
constituent methodologies (BDTs and PNNs), followed by a description of the IBCD, which has
been used for demonstrating the proof–of–concept of the proposed BDT/PNN combination;
Section 3 introduces the means of combining/intertwining the two methodologies,
subsequently presenting the BDT/PNN performance on the IBCD, as this is expressed in terms
of construction characteristics, as well as of prediction prowess in the guise of cumulative
classification accuracy (CCA) and computational complexity; Section 4 critically highlights the
characteristics and evaluates – as well as justifies – the superior performance that is
accomplished by the systematic BDT/PNN combination in classification tasks over both of the
constituent methodologies; finally, Section 5 summarises the main features of and the
advancement/improvement afforded by the proposed combination of the BDT and PNN
methodologies, thus concluding this contribution.
CONSTITUENT METHODOLOGIES & DEMONSTRATION DATA
Binary Decision Trees (BDTs)
The parametric BDT is created recursively, on a level–by–level basis. Initially, the entire dataset
is assigned to the first (single top–level) node of the BDT. If the top–level node contains patterns
of a single class, it becomes a terminal node and the BDT expansion procedure is terminated.
Otherwise, the hyper–surface that maximally bi–partitions the patterns of the top-level node is
created, with the resulting pair of partitions being assigned to a pair of nodes of the second
BDT level. The process of bi–partitioning each non–terminal node is repeated recursively for
each next level of the BDT, terminating as soon as either every node of a given level contains
patterns of a single class, or no further improvement in pattern classification is possible
according to problem–derived and/or implementation/application–specific efficiency–versus–
accuracy priorities and constraints (to be detailed in Section 3.1).
It is important to note that the exclusive – as well as recursive, in the form of data(set)
dichotomisation at successive levels – application of binary splits keeps BDT construction
efficient in practice, even though constructing the globally optimal BDT is NP–complete [7]
(Hyafil & Rivest, 1976); the greedy, level–by–level bi–partitioning nevertheless supports highly
effective operation/dataset classification.
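For concreteness, a minimal sketch of the greedy, recursive bi–partitioning described above is given below (in Python). The axis–aligned feature/threshold splits and the Gini impurity criterion are illustrative assumptions standing in for the class–separability criterion used in the paper, and the function names are hypothetical.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a set of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Exhaustive search for the (feature, threshold) pair whose binary split
    minimises the weighted impurity of the two resulting partitions."""
    best_f, best_t, best_score = None, None, np.inf
    n, d = X.shape
    for f in range(d):
        for t in np.unique(X[:, f]):
            left, right = y[X[:, f] <= t], y[X[:, f] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best_f, best_t, best_score = f, t, score
    return best_f, best_t, best_score

def grow_bdt(X, y, depth=0, max_depth=10):
    """Level-by-level BDT expansion: a node containing patterns of a single
    class (or an unsplittable / maximally deep partition) becomes terminal."""
    if len(np.unique(y)) == 1 or depth == max_depth:
        return {"class": int(np.bincount(y).argmax())}        # terminal node
    f, t, _ = best_split(X, y)
    if f is None:
        return {"class": int(np.bincount(y).argmax())}
    mask = X[:, f] <= t
    return {"feature": f, "threshold": float(t),               # non-terminal node
            "left": grow_bdt(X[mask], y[mask], depth + 1, max_depth),
            "right": grow_bdt(X[~mask], y[~mask], depth + 1, max_depth)}
```

On the IBCD, such a tree reduces to a handful of feature/threshold tests per pattern, of the kind shown later in Figure 1.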
Probabilistic Neural Networks (PNNs)
Based on Bayesian classification, the non–parametric four–layered PNN comprises:
(1) the input layer, containing as many nodes as there are input dimensions of the data
patterns to be classified, and performing independent normalisation of each
characteristic/dimension of the input patterns;
(2) the pattern layer, comprising as many nodes as there are training patterns, and
implementing full connectivity with the input layer;
(3) the summation layer, encoding one (distinct) class of the dataset per summation node,
with each node being connected exclusively to the nodes of the pattern layer that belong
to the class pertaining to the summation node;
(4) the output layer, containing as many nodes as there are classes, and comparing the
(distance–based weighted) activation values of the summation nodes in order to assign
each test pattern to the “winner” class, i.e. to the class represented by the summation
node that returns the highest activation/output (implemented as a max–function).
Both PNN training and testing are swift, with training being completed following a single
presentation of the training patterns, and testing (classification) being applied via a forward
sweep across the PNN layers for each test pattern, in turn. Thus, provided that (i) the training
set is adequately representative of the problem–space and (ii) the value of the sole tunable PNN
parameter (the smoothing factor, also termed spread, σ ∈ [0, 1]¹) is selected appropriately, a
purely data–driven, non–parametric, maximally simple and general – yet discriminative (i.e.
accurately representing the local class distribution of the problem–space) – hyper–surface is
created, which implements optimal class assignment of the training patterns, while further
preserving the local morphology of the separating hypersurface. Consequently, the PNN is
rendered not only capable of correctly identifying the class of each training pattern, but also of
accurately deducing the class of each test/novel pattern, provided that the novel pattern shares
the characteristics of (i.e. is derived from the same distribution as) the training–pattern
space/exemplars.

¹ σ determines the (constant over the entire training set) strength and extent of influence that each training pattern exerts on the class–separating hypersurface; the influence of each training pattern is expressed as a bell–shaped decreasing function of its distance from the test pattern, with the extent of influence and the rate of decay being proportional and inversely proportional, respectively, to the value of σ
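For reference, the class–conditional density estimate underlying the PNN can be written in the standard Parzen–window form below; the notation is the conventional one from the PNN literature rather than taken from the paper:

$$
f_c(\mathbf{x}) \;=\; \frac{1}{n_c\,(2\pi)^{d/2}\,\sigma^{d}} \sum_{i=1}^{n_c} \exp\!\left(-\,\frac{\lVert \mathbf{x}-\mathbf{x}_{c,i}\rVert^{2}}{2\sigma^{2}}\right),
\qquad
\hat{y}(\mathbf{x}) \;=\; \arg\max_{c}\, f_c(\mathbf{x}),
$$

where $\mathbf{x}_{c,i}$ denotes the $i$-th training pattern of class $c$, $n_c$ the number of such patterns, $d$ the input dimensionality and $\sigma$ the smoothing factor discussed above; the summation layer computes $f_c(\mathbf{x})$ and the output layer implements the $\arg\max$.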
PNNs demonstrate accuracy, efficiency, as well as robustness to noise and to missing
information, further proving resilient to corrupt inputs and outliers. However – and owing to
the explicit representation of the data in the nodes of the pattern layer –, it may be expedient to
perform training pattern selection/extraction² for reducing the cardinality of the pattern–layer
nodes and, thus also, the time– and space–complexity of PNN operation/classification.

² ensuring that the distribution of the training patterns in the pattern space is not altered in any way
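A minimal Python sketch of the four–layer PNN forward sweep (dimension–wise input normalisation, Gaussian pattern layer, per–class summation, winner–take–all output) might look as follows; the class name, the min–max normalisation and the σ = 0.1 default are illustrative assumptions, with only the layer structure following the description above.

```python
import numpy as np

class PNN:
    """Minimal probabilistic neural network: 'training' is a single
    presentation (storage) of the training patterns."""

    def __init__(self, sigma=0.1):
        self.sigma = sigma                      # smoothing factor / spread

    def fit(self, X, y):
        # input layer: independent (per-dimension) min-max normalisation
        self.mins, self.maxs = X.min(axis=0), X.max(axis=0)
        self.span = np.where(self.maxs > self.mins, self.maxs - self.mins, 1.0)
        self.X = (X - self.mins) / self.span    # pattern layer: one node per training pattern
        self.y = y
        self.classes = np.unique(y)
        return self

    def predict(self, X):
        Xn = (X - self.mins) / self.span
        preds = []
        for x in Xn:
            # pattern layer: Gaussian (bell-shaped) response of every stored pattern
            k = np.exp(-np.sum((self.X - x) ** 2, axis=1) / (2 * self.sigma ** 2))
            # summation layer: one node per class, pooling that class's patterns
            scores = [k[self.y == c].mean() for c in self.classes]
            # output layer: winner-take-all (max) over the class scores
            preds.append(self.classes[int(np.argmax(scores))])
        return np.array(preds)
```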
The Iris Benchmark Classification Dataset (IBCD)
The “classic” IBCD comprises four features relating to petal length and width, as well as to sepal
length and width, of 150 iris flowers pertaining to three species of iris (namely Iris setosa, Iris
virginica and Iris versicolor), with 50 specimens having been collected from each species.
Based on combinations of the four features, Fisher [8] (1936) developed a linear
discriminant model for distinguishing between the three species. In a nutshell, the three iris
classes are separable in the projection of the four parameters on the principal component, with
(I) only the species of Iris setosa being linearly separable from both of the other two species,
and (II) the separability between the Iris virginica and Iris versicolor species being derived from
the petal length and petal width IBCD features (as can be inferred directly from the ranges of
each feature shown in Table 1 per iris class).
The IBCD is used for demonstrating the proposed methodology in the following Sections for two
reasons: on the one hand, both the dataset and its partitioning are tractable, whereby the
incremental nature of the BDT/PNN combination methodology can be detailed in such a manner as to be
easy to follow and duplicate; on the other hand, there is a considerable body of classification
methodologies that have been implemented for identifying the three IBCD classes (e.g. [9]
Gorban et al., 2007), whereby comparisons with the proposed methodology (in terms of
classification accuracy, as well as efficiency) can be made directly by the interested reader.
Table 1. Ranges of the four characteristics/features (measured in cm) for each of the three iris classes

Iris Class     Sepal length   Sepal width   Petal length   Petal width
Setosa         [4.3 5.8]      [2.3 4.4]     [1.0 1.9]      [0.1 0.6]
Versicolor     [4.9 7.0]      [2.0 3.4]     [3.0 5.1]      [1.0 1.8]
Virginica      [4.9 7.9]      [2.2 3.8]     [4.5 6.9]      [1.4 2.5]
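The per–class ranges of Table 1 can be reproduced directly from the publicly available dataset; the short check below uses scikit-learn's copy of the IBCD (an assumption about tooling, not something used in the paper).

```python
from sklearn.datasets import load_iris

iris = load_iris()                                   # 150 patterns, 4 features, 3 classes
for c, name in enumerate(iris.target_names):         # setosa, versicolor, virginica
    Xc = iris.data[iris.target == c]                 # the 50 specimens of one species
    ranges = {feat: f"[{lo:.1f} {hi:.1f}]"
              for feat, lo, hi in zip(iris.feature_names, Xc.min(axis=0), Xc.max(axis=0))}
    print(name, ranges)                              # matches the rows of Table 1
```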
IMPLEMENTATION OF THE PROPOSED BDT/PNN COMBINATION
Given their different characteristics (as expounded in Sections 2.1–2.2), the two constituent
methodologies (BDTs and PNNs) operate according to divergent problem–space partitioning
rules/criteria, which result – in turn – in distinct (yet complementary) means of partitioning
the problem space, so that the iris classes can be distinguished from each other in an optimal
manner.
It is demonstrated in the following that it is advantageous to combine/intertwine BDTs and
PNNs for simultaneously maximising prediction accuracy and efficiency in classification tasks,
thus reaching a level that is well beyond that accomplished independently by either constituent
methodology. Such a combination is also of interest as most of the relevant literature critically
compares the operation of the two methodologies (e.g. [10–12] Curram & Mingers, 1994;
Bouzida & Cuppens, 2006; Tso & Yau, 2007), with a significantly smaller body of research
combining the two methodologies to advantage (e.g. [13-14] Jerez–Aragonés et al., 2003; Goel
et al., 2003).
IBCD Classification via the Two-Step Core and the Level-by-Level Expansion of the
BDT/PNN Combination
The proposed methodology is quite straightforward. For the following demonstration and the
ensuing tests/results, as well as in order to preserve generality and ease of operation, the value
of the σ parameter has been kept constant at 0.1³ over all PNNs (i.e. over all levels of the BDT/PNN
and all folds of the FFCV procedure [15] (Devijver & Kittler, 1982)).

³ the default value (σ = 0.1) of PNN operation has been used in order to demonstrate generality and robustness of operation, i.e. without necessitating fine–tuning of the involved parameters
The core of the BDT/PNN combination is expressed via the pair of steps described below:
(step 1) the optimal BDT (hyper–)linear bi–partitioning of the entire dataset for the first level,
and of every part thereof for the next levels, is determined, with each partition being assigned
to a new node of the next BDT level;
(step 2) while each single–class partition derived from (step 1) automatically renders the
corresponding node a terminal node, a PNN is assigned to each multi/mixed–class partition for
implementing the non–parametric creation of the hypersurface that maximises class
separability over the corresponding part of the problem–space.
BDT/PNN level–by–level expansion is applied via steps 1 & 2, recursively, to each non–
terminal partition of the dataset, ending once the cumulative classification accuracy (CCA) of
the terminal (either BDT or PNN) node(s) is maximised. By defining the termination criterion
as the maximisation of CCA, both (I) the thoroughly successful partitioning/ classification of as
many patterns as possible, and (II) the concurrent identification of the partitions that are
deemed too small to allow reliable classification, are implemented.
In a nutshell, the BDT/PNN is implemented as follows:
• The PNN classifier is constructed on the entire training set of the dataset; the training
and test errors TR and TS, respectively, are evaluated. If both TR and TS equal 0, the
process is terminated and the classes are returned. Otherwise
• The first/top node of the BDT is created, with all the training patterns being assigned to
this node. A pair of nodes of the next BDT level is created such that class separability of
the patterns assigned to each partition (“branch”) is maximised. While each “next BDT
level” node containing members of a single class becomes a terminal node returning the
class of the corresponding part of the problem–space, each remaining (non–terminal
node) is assigned to a PNN.
• Each of the created PNNs is tackled independently, as follows: The training (TR) and test
(TS) errors for the left branch (namely TR1l and TS1l) and for the right branch (namely
TR1r and TS1r) of the BDT are calculated.
• If TR or TS assume non–zero values, a check is made as to whether both relations {TR1l<TR and
TS1l<TS} as well as {TR1r<TR and TS1r<TS} hold for the left and the right branches,
respectively.
• If both relations hold, the BDT is further expanded along each branch and the procedure
is repeated.
• Otherwise (namely if either (1) a terminal BDT node is reached, or (2) at least one of the
aforementioned relations fails to hold), the expansion procedure of the BDT is
discontinued along the specific branch. It has been observed that, in these cases, the
corresponding part of the dataset is deemed too small for the separating surface that has
been created during PNN training to accurately/consistently represent the pattern
space in terms of both pattern-class distribution and class separation/boundaries.
Consequently, (I) each new level and each new branch of the BDT/PNN is handled
independently, which signifies that it is not necessary for all PNNs to correspond to
branches of the same BDT/PNN level; (II) whenever a node of a branch becomes
terminal, the other node of the branch is tackled by a BDT, for efficiency as well as for
uniformity between BDT-paired branches.
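Putting the two steps and the bullet points above together, a compact sketch of the level–by–level BDT/PNN expansion could look as follows; it reuses the hypothetical best_split() and PNN sketches from Section 2, and the exact bookkeeping (per–branch error estimation, handling of empty test partitions) is an illustrative assumption rather than the authors' implementation.

```python
import numpy as np
# assumes best_split() and the PNN class from the sketches in Section 2

def pnn_errors(X_tr, y_tr, X_ts, y_ts, sigma=0.1):
    """Fit a PNN on one partition and return it with its TR and TS error rates."""
    pnn = PNN(sigma).fit(X_tr, y_tr)
    tr = float(np.mean(pnn.predict(X_tr) != y_tr))
    ts = float(np.mean(pnn.predict(X_ts) != y_ts)) if len(y_ts) else 0.0
    return pnn, tr, ts

def grow_bdt_pnn(X_tr, y_tr, X_ts, y_ts):
    """One BDT/PNN node: (step 1) optimal BDT bi-partition of the partition's
    patterns; (step 2) a PNN per multi-class branch, a branch being expanded
    further only while BOTH its TR and TS errors improve on the parent's."""
    pnn, tr, ts = pnn_errors(X_tr, y_tr, X_ts, y_ts)
    if tr == 0 and ts == 0:                      # PNN already perfect on this partition
        return {"pnn": pnn}
    f, t, _ = best_split(X_tr, y_tr)             # step 1: optimal bi-partition
    if f is None:                                # no valid split: fall back to the PNN
        return {"pnn": pnn}
    node = {"feature": f, "threshold": float(t)}
    for key, m_tr, m_ts in (("left",  X_tr[:, f] <= t, X_ts[:, f] <= t),
                            ("right", X_tr[:, f] >  t, X_ts[:, f] >  t)):
        Xb, yb, Xv, yv = X_tr[m_tr], y_tr[m_tr], X_ts[m_ts], y_ts[m_ts]
        if len(np.unique(yb)) == 1:              # single-class partition: terminal BDT node
            node[key] = {"class": int(yb[0])}
            continue
        _, tr_b, ts_b = pnn_errors(Xb, yb, Xv, yv)
        if tr_b < tr and ts_b < ts:              # both errors improve: expand this branch
            node[key] = grow_bdt_pnn(Xb, yb, Xv, yv)
        else:                                    # step 2: close the branch with a PNN
            node[key] = {"pnn": PNN().fit(Xb, yb)}
    return node
```

Called on each FFCV fold of the IBCD, such a routine yields a small tree whose leaves are either pure–class BDT nodes or local PNNs, in the spirit of Figure 2.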
Following construction, the BDT/PNN combination is capable of accurately – as well as most
efficiently – (I) identifying the class of each training pattern, while also (II) correctly
predicting the class of each test/novel pattern, provided that the test patterns
have been derived from the same distribution as (i.e. share the characteristics of) the training
patterns.
Classification Accuracy
FFCV has been performed for evaluating the prediction potential of the PNN, of the BDT and of
their combination on the IBCD, as this form of CV has been used by the majority of classification
methodologies in the relevant literature. Another – equally important – reason for selecting
FFCV is that the CV scheme of training on 4/5ths of the dataset – i.e. 120 patterns – and testing
on the remaining 1/5th of the dataset – i.e. 30 patterns – affords the most stringent conditions
(from among the most commonly used CV schemes) for validating class prediction.
The FFCV procedure is implemented as follows: the IBCD is partitioned randomly into five
equally sized sets, with each set comprising 10 patterns from each iris class. For the ith (i=1,
2,...,5) execution of FFCV, the ith fold is reserved for testing each of the three methodologies, as
these have been set–up/trained using the union of the remaining four folds (i.e. 1st to (i–1)th
and (i+1)th to 5th). Such a validation set–up ensures that, at the completion of the FFCV
procedure, each pattern of the IBCD shall have been used (a) four times for training/setting up,
and (b) exactly once for testing, the BDT and PNN methodologies, with the test sets of the five
folds fully covering the problem space with no duplication. By being more stringent than both
of the most popular (namely 10–fold and leave–one–out) versions of CV, the implemented
scheme is also more adept at highlighting the potential of the proposed methodology; in the
following, the cumulative classification accuracy (CCA) is used for presenting the results of
applying FFCV to the IBCD.
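The stratified five–fold set–up described above (each fold containing 10 patterns per class, each pattern tested exactly once) and the per–class/overall CCA reporting can be sketched as follows; StratifiedKFold is used purely for convenience and is an assumption about tooling rather than the authors' own code.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold

iris = load_iris()
X, y = iris.data, iris.target

def cca(y_true, y_pred):
    """Cumulative classification accuracy, per class and overall (%)."""
    out = {name: 100 * float(np.mean(y_pred[y_true == c] == c))
           for c, name in enumerate(iris.target_names)}
    out["All"] = 100 * float(np.mean(y_pred == y_true))
    return out

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for i, (tr_idx, ts_idx) in enumerate(skf.split(X, y), start=1):
    # 120 training patterns (40 per class) and 30 test patterns (10 per class) per fold
    X_tr, y_tr, X_ts, y_ts = X[tr_idx], y[tr_idx], X[ts_idx], y[ts_idx]
    # ... fit the BDT, the PNN, or the BDT/PNN combination here and report cca(y_ts, predictions)
```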
Fig. 1. BDT created for the IBCD for the first fold of FFCV; “PNN” and grey bars denote terminal
PNN and BDT nodes, respectively
Table 2. BDT CCA for training and test sets per class and overall

Classification Accuracy (%)   Setosa   Versicolor   Virginica   All
Training set                  100      93           88          93.67
Test set                      100      95           90          94.9
Table 3. PNN CCA for training and test sets per class and overall

Classification Accuracy (%)   Setosa   Versicolor   Virginica   All
Training set                  100      96.5         97          97.83
Test set                      100      94.1         94          96.6
Table 4. BDT/PNN CCA for training and test sets per class and overall

Classification Accuracy (%)   Setosa   Versicolor   Virginica   All
Training set                  100      100          100         100
Test set                      100      96.2         95.1        97.9
It is important that stable results have been collected over the FFCV tests for each methodology
(BDT, PNN and their combination), with very similar results obtained (I) for all five folds, as
well as (II) under conditions of LOO and 10–fold CV.
Tables 2–4 present CCA overall, as well as computed independently per IBCD class by the BDT,
the PNN, and the BDT/PNN combination methodologies, respectively. PNN–derived
classification has been found superior to that of the BDT by 4% overall, with the BDT/PNN
combined classification (i) demonstrating a significantly superior performance over both
constituent methodologies, and, in all three cases, (ii) achieving the error–free identification of
the Iris setosa. The distinct (yet complementary) strengths and weaknesses of the two
methodologies are highlighted by comparing the CCAs of the BDT, the PNN and their
combination. Overall, while the BDT is especially adept at generalising to/predicting novel iris
flowers, the PNN is most accurate in the classification of known iris flowers. It is also interesting
– especially when efficiency is of the essence – that (as can be discerned from the same Tables)
the CCA performance of both the PNN and the BDT/PNN combination for the IBCD is
comparable to that of the PNN following the very first level of BDT/PNN construction.
Table 5. BDT/PNN CCA after the first level for training and test sets per class and overall

Classification Accuracy (%)   Setosa   Versicolor   Virginica   All
Training set                  100      96.5         96          97.5
Test set                      100      94           95          97
While the size of the PNN of all the folds of FFCV for the IBCD is fixed (equalling 4–120–3–1), the
BDT structures of the five folds have been found slightly different to each other, mainly as far
as the exact choice of parameter values is concerned (rather than in terms of the sequence –
per se – of the choice of parameters for dichotomisation), thus demonstrating satisfactory
stability. The BDT corresponding to the first fold of FFCV for the IBCD (comprising five levels,
five non–terminal nodes and six terminal nodes) is illustrated in Figure 1; the parameter/value
bi–partitioning pair corresponding to each node is also shown, with the terminal nodes being
represented by filled rectangles. A comparison between BDT (Fig. 1) and BDT/PNN (Fig. 2)
construction further highlights the efficiency endowed to classification by applying the
proposed combination.
Fig. 2. BDT/PNN combination created for the IBCD; first fold of FFCV; aiming at efficiency, the
combination is initiated invariably with a BDT; “PNN” and grey bars denote terminal PNN and
BDT nodes, respectively
Computational Complexity
In terms of computational complexity, the two constituent methodologies are clearly
complementary to each other: on the one hand, even though the BDT is roughly three times more
complex than the PNN during creation/training, it is rendered roughly 54 times more efficient than
the PNN during testing (2.21 versus 120 comparisons per test pattern; Tables 6–7); on the other
hand, while the PNN implements simultaneous classification over all classes, the BDT follows a
“dual”, more focused, sequential divide–and–conquer handling of the dataset.
Table 6. BDT computational complexity (comparisons); averaged results of applying FFCV

during creation   356.4
during testing    2.21

Table 7. PNN computational complexity (comparisons); averaged results of applying FFCV

during training   120
during testing    120

Table 8. BDT/PNN computational complexity (comparisons) during the first step; averaged results of applying FFCV

during creation   153.2 (113.2 for BDT & 40 for PNN)
during testing    41 (1 for BDT & 40 for PNN)

Table 9. BDT/PNN computational complexity (comparisons); averaged results of applying FFCV

during creation   238.60 (198.60 for BDT & 40 for PNN)
during testing    41.67 (1.67 for BDT & 40 for PNN)
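For orientation, the ratios discussed in this subsection follow directly from the averaged comparison counts of Tables 6, 7 and 9:

```python
# averaged comparison counts taken from Tables 6, 7 and 9
bdt     = {"creation": 356.4,  "testing": 2.21}
pnn     = {"creation": 120.0,  "testing": 120.0}
bdt_pnn = {"creation": 238.60, "testing": 41.67}

print(bdt["creation"] / pnn["creation"])      # ~2.97: BDT roughly 3x the PNN's creation cost
print(pnn["testing"]  / bdt["testing"])       # ~54.3: the BDT is roughly 54x cheaper at test time
print(pnn["testing"]  / bdt_pnn["testing"])   # ~2.88: the combination stays ~3x cheaper than the PNN at test time
print(bdt["creation"] / bdt_pnn["creation"])  # ~1.49: and ~1.5x cheaper than the full BDT to create
```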
A comparison between Tables 8 and 9 confirms the significant decrease in computational
complexity observed at each successive application of the BDT/PNN combination. A detailed
demonstration of the BDT/PNN combination on the IBCD, under conditions of five–fold cross–
validation (FFCV)⁴ [15], is presented in the following Section, for illustrating the operational
details of the BDT/PNN combination, while also attesting to its competence in terms of
simplicity, consistency, efficiency and effectiveness of operation. A novel termination criterion
concerning the stability of BDT expansion is also used – and proposed – in the next Section.

⁴ complying with the customary/prevalent means of setting up and testing this dataset in the existing literature
AN EVALUATION OF THE BDT-, THE PNN- AND THE BDT/PNN-COMBINATION
CLASSIFICATION OF THE IBCD
It is important that stable results have been collected by all the tests, namely BDT–, PNN–, as
well as BDT/PNN–based, under conditions of LOO, 10FCV, as well as FFCV.
While the size of the PNN of all the folds of FFCV for the IBCD is fixed, equalling 4–120–3–1, the
BDTs of the five folds have been found slightly different to each other, mainly as far as the choice
of parameter values (rather than the choice of parameter per se) for each node is concerned.
The BDT corresponding to the first fold of FFCV for the IBCD is illustrated in Figure 1, comprising
five levels, five non–terminal nodes and six terminal nodes; the parameter–value pair
corresponding to each node is shown in the figure, with the terminal nodes being represented
by filled rectangles.
Comparing the two methodologies, it can be observed that both are 100% correct in identifying
the Setosa irises (the class that is linearly separable from the other two classes), with the
Versicolor and Virginica iris classes not being linearly separable. On the one hand, the PNN is
more accurate than the BDT, both overall (by about 4%), and at classifying the Versicolor and
Virginica irises of the training sets. In a complementary fashion, the BDT appears to be slightly
more accurate than the PNN at classifying the Versicolor irises of the test sets, as well as at
generalising to novel iris patterns; the BDT demonstrates, in fact, higher classification accuracy
in classifying (and generalising to) unknown – rather than known – irises. Tables 6 and 7
describe the computational complexity of the two methodologies during creation and operation
(testing). Clearly, the PNN is moderately complex, both during training and testing. Conversely,
the BDT is quite computationally complex during creation, namely in determining the optimal
parameter–value pair that maximally separates training patterns from different classes, yet
becomes extremely efficient during testing, with an average of just 2.21 comparisons required
for classifying a novel iris pattern.
Table 4 tabulates the classification accuracy accomplished by the BDT/PNN combination
methodology, demonstrating the superior performance accomplished by combining the
advantages of the two constituent methodologies, in terms of both accuracy and computational
complexity. Even though the PNN has already been found superior to the BDT for the IBCD
data (Tables 2–3), classification accuracy is further improved by the combined use of the PNN
and the BDT; in fact, the training set is always 100% accurately/correctly classified. Figure 2
depicts the optimal BDT/PNN combination for the first fold of FFCV for the IBCD, with the created
BDT comprising four levels, four non–terminal nodes and five terminal nodes⁵, and with the
average size of the two PNNs corresponding to the two non–terminal nodes of the third level
equalling 4–40–3–1. During training/creation, the computational complexity of the BDT/PNN
combination is clearly lower than that of the entire BDT, while during testing it remains far lower
than that of the entire PNN, although higher than that of the entire BDT (Tables 6, 7 and 9).

⁵ two per PNN, for the classes of Iris setosa and of Iris versicolor & virginica
When considering both accuracy and computational complexity during training/creation, as
well as during testing, the proposed BDT/PNN methodology is clearly superior to both
constituent methodologies. By demonstrating the BDT/PNN combination on a benchmark
classification dataset, its suitability to a variety of hard “interesting” problems is put forward,
with the designation of criteria concerning the optimal extent of BDT creation that is necessary
for attaining maximally accurate and efficient classification being provided as well as discussed.
CONCLUSIONS
A systematic combination of BDTs and PNNs has been put forward for simultaneously
maximising prediction accuracy and efficiency beyond that accomplished independently by
each constituent methodology. The global and linear pattern separation of the BDTs has been
combined systematically with the localised, versatile and non–linear categorisation of the PNN:
BDT/PNN creation and expansion alternate, with BDT creation and expansion being followed –
at each level – by the assignment of the resulting non–terminal dataset partitions to as many
PNNs. The procedure is iterative/recursive, concluding as soon as the CCA of the terminal BDT
nodes, as well as of the PNNs created from the non–terminal BDT nodes, is maximised.
In all, the proposed methodology unites the strengths of the two constituent classification
methods, with BDT and PNN creation proceeding concurrently in a level–by–level fashion. Once
the nodes of each next BDT level have been created, the dataset partitions resulting from each
branch of the newly created nodes are determined, with a PNN being constructed for classifying
the dataset partitions corresponding to non–terminal BDT nodes. BDT expansion ceases as
soon as the CCA of the terminal BDT and PNN nodes is maximised.
By demonstrating the BDT/PNN combination on a benchmark classification dataset, its
suitability for a variety of hard “interesting” problems is put forward, while the designation of
criteria concerning the optimal extent of BDT creation that is necessary for attaining maximally
accurate and efficient classification has been provided as well as discussed. Future research
shall be focused upon further validation of the proposed combination methodology on more
benchmark datasets, as well as on Big datasets, for further confirming the advantages – as well
as the limitations – of the BDT/PNN combination methodology.
References
[1]. Duda, R.O., Hart, P.E., Stork, D.G. (2000). Pattern Classification, 2nd Edition, Wiley–Interscience (ISBN: 978–0–471–05669–0)
[2]. Loh W.Y., Vanichsetakul N. (1988). Tree–structured classification via generalized discriminant analysis, Journal of
the American Statistical Association, vol. 83, pp. 715–728
[3]. Mitchell, T. (1997). Decision tree learning, in Machine Learning, The McGraw–Hill Companies, Inc., pp. 52–78
[4]. Specht, D. (1990). Probabilistic neural networks, Neural Networks, vol. 3, pp. 109–118
[5]. Specht, D.F. (1990). Probabilistic neural networks for classification, mapping, or associative memory, IEEE 1988
International Conference on Neural Networks, vol. 1, pp. 111–121
[6]. Anderson, E. (1935). The irises of the Gaspé Peninsula, Bulletin of the American Iris Society, vol. 59, pp. 2–5
[7]. Hyafil, L., Rivest, R.L. (1976). Constructing optimal binary decision trees is NP–complete, Information Processing
Letters, vol. 5, pp. 15–17
[8]. Fisher, R.A. (1936). The Use of Multiple Measurements in Taxonomic Problems, Annals of Eugenics, vol. 7, pp. 179–
188
[9]. Gorban, A.N., Sumner, N.R., Zinovyev, A.Y. (2007). Topological grammars for data approximation, Applied
Mathematics Letters, vol. 20, pp. 382–386, DOI:10.1016/j.aml.2006.04.022
[10]. Curram, S.P., Mingers, J. (1994). Neural networks, decision tree induction and discriminant analysis: an empirical
comparison, Journal of the Operational Research Society, vol. 45, pp. 440–450
[11]. Bouzida, Y., Cuppens, F. (2006). Neural networks vs. decision trees for intrusion detection, Proceedings of the
IEEE/IST Workshop on Monitoring, Attack Detection and Mitigation, Tuebingen, Germany, September 28th–29th,
2006
[12]. Tso, G.K.F., Yau, K.W. (2007). Predicting electricity energy consumption: A comparison of regression analysis,
decision tree and neural networks, Energy, vol. 32, pp. 1761–1768
[13]. Jerez–Aragonés, J.M., Gómez–Ruiz, J.A., Ramos–Jiménez, G., Muñoz–Pérez, J., Alba–Conejo, E. (2003). A combined
neural network and decision trees model for prognosis of breast cancer relapse, Artificial Intelligence in Medicine,
vol. 27, pp. 45–63
[14]. Goel, P.K., Prasher, S.O., Patel, R.M., Landry, A.J., Bonnell, R.B., Viau, A.A. (2003). Classification of hyperspectral
data by decision trees and artificial neural networks to identify weed stress and nitrogen status of corn, Computers
and Electronics in Agriculture, vol. 39, pp. 67–93
[15]. Devijver P.A., Kittler J. (1982). Pattern Recognition: A Statistical Approach, Prentice–Hall, London, U.K.