European Journal of Applied Sciences – Vol. 9, No. 5
Publication Date: October 25, 2021
DOI: 10.14738/aivp.95.10932. Abdelfattah, E. H. (2021). A Modern Way to Teach Statistics, With an Application. European Journal of Applied Sciences, 9(5), 210-216.
Services for Science and Education – United Kingdom
A Modern Way to Teach Statistics, With an Application
Ezz H. Abdelfattah
Department of Statistics, Faculty of Science, King Abdulaziz University
ABSTRACT
Statistics is traditionally taught as having two main branches, namely Descriptive
and Inferential statistics, with probability as a “link” between them. Descriptive
statistics are brief descriptive coefficients that summarize a given data set, which
can represent either an entire population or a sample of it, while Inferential
statistics are based on a random sample of data taken from a population and are used
to describe and make inferences about that population. The modern way we suggest for
teaching statistics is to divide Statistics into three branches, namely Descriptive,
Diagnostic and Predictive Statistics. In this paper, we re-classify the Inferential
Statistics tests into Diagnostic Statistics tests and Predictive Statistics tests and
give an applied example.
Keywords: Inferential Statistics, Diagnostic Statistics, Predictive Statistics
INTRODUCTION
Statisticians usually classify Statistics into two main parts, namely Descriptive and Inferential
Statistics. Here, we suggest reclassifying Inferential Statistics into two parts, namely Diagnostic
Statistics and Predictive Statistics. Diagnostic statistics depends on Tests of Differences and
Associations, while Predictive statistics depends on Estimation, Prediction and Forecasting,
with probability as a “link” between descriptive and inferential statistics, as shown in
Figure 1.
Figure 1 “Reclassifying Inferential statistics into Diagnostic and Predictive Statistics”
We consider three levels of statistical analysis, namely Descriptive, Diagnostic and
Predictive statistics, and summarize the statistical tools that should be used at each level.
In terms of the complexity of the algorithms and techniques involved, descriptive analytics is
the simplest. Both Descriptive and Diagnostic statistics are related to data already collected,
and hence are considered to be related to the “past”, while Predictive statistics is related to
what is expected to happen, and hence is considered to be related to the “future”. Figure 2
shows a relative comparison of the three different types of analytics in relation to time.
Figure 2 “The relative comparison of the four different types of analytics”
DESCRIPTIVE ANALYTICS
Descriptive analytics comprises the statistical tools that answer the question “What
happened?”. This form of analytics mainly deals with understanding the data already gathered.
It is mainly related to Graphs, Frequency tables, Measures of Central Tendency, Measures of
Variation and Measures of Shape. It involves the use of tools and algorithms to understand the
internal structure of the data and to find categorical or temporal patterns or trends in it. The
statistical tools may be summarized as in Table 1, based on the type of variable and its
measurement level:
Table 1. Basic Statistical tools for descriptive Statistics

| Tool | Qualitative: Nominal | Qualitative: Ordinal | Quantitative: Interval or Ratio |
|---|---|---|---|
| Basic Graphs | Bar, Pie | Bar, Pie | Bar (for discrete), Histogram, Polygon, Curve, Ogive (for continuous), Line (for time), Scatter diagram (for bivariate data) |
| Measures of Central Tendency | Mode | Mode, Median | Mode, Median, Mean, Geometric mean, Harmonic mean, Trimmed mean |
| Measures of Variation | - | Quartile range | Range, Variance, Standard deviation, Coefficient of variation |
| Measures of Position | - | Quartiles | Standard Scores, Percentiles, Quartiles and Deciles, Skewness, Kurtosis |
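As an illustration of the quantitative column of Table 1, the following minimal Python sketch computes several of the listed measures with the standard library. The function name `describe` and the sample ages are hypothetical and are not taken from the hospital data. The skewness shown is the simple moment-based version; statistical packages often apply a small-sample adjustment, so their values may differ slightly.

```python
import statistics


def describe(values):
    """Summarize a quantitative (interval or ratio) variable."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)        # sample standard deviation (divides by n - 1)
    pop_sd = statistics.pstdev(values)   # population standard deviation (divides by n)
    # Moment-based skewness: mean cubed deviation divided by pop_sd cubed.
    # Note: this fails for constant data, where pop_sd is zero.
    skewness = sum((x - mean) ** 3 for x in values) / n / pop_sd ** 3
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "modes": statistics.multimode(values),
        "range": max(values) - min(values),
        "variance": statistics.variance(values),
        "sd": sd,
        "cv_percent": 100 * sd / mean,   # coefficient of variation, in percent
        "quartiles": (q1, q2, q3),
        "skewness": skewness,
    }


# Hypothetical ages, for illustration only (not the hospital data).
ages = [34, 41, 47, 52, 52, 58, 63, 70]
summary = describe(ages)
for name, value in summary.items():
    print(name, value)
```

For a nominal variable only the mode would be meaningful, and for an ordinal variable the mode, median and quartiles, as the table indicates.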
DIAGNOSTIC ANALYTICS
Once the data is described, the next step is to seek the independent variables affecting the Target
(Dependent) variable, by answering the question “Why did it happen?”. Diagnostic
analytics focuses on the reasons behind the observed patterns that are derived from descriptive
analytics. The principal point here is the Target variable’s measurement level and its relation
with the Independent variables (inputs). This may be checked mainly through tests of
differences (for example, in the mean values of the Target across groups) and tests of
association between the Target and the inputs. Based on the Target’s measurement level, the
statistical tools related to differences (where the inputs are categorical) are summarized in
Table 2, while the statistical tools related to association (for any measurement level of the
input) are summarized in Table 3.
Table 2. Basic Diagnostic Statistics tools for checking the differences in the Target

Columns give the Target (Dependent) measurement level; rows give the groups of the categorical Independent variable.

| Groups of the categorical Independent variable | Qualitative: Nominal | Qualitative: Ordinal (Rank) | Quantitative (Interval or Ratio): Scale (from Non-Normal Population) | Quantitative (Interval or Ratio): Scale (from Normal Population) |
|---|---|---|---|---|
| 2 independent groups | Chi-square test | Mann-Whitney | Mann-Whitney | Independent sample t test |
| 3+ independent groups | Chi-square test | Kruskal-Wallis | Kruskal-Wallis | One-way ANOVA |
| 2 matched groups | McNemar | Wilcoxon test | Wilcoxon test | Paired sample t test |
| 3+ matched groups | Chi-square test | Friedman test | Friedman test | Repeated Measurements |
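The selection logic of Table 2 can be sketched as a simple lookup. This is only an illustration of how the table is read; the function name `choose_difference_test` and the level labels (`"nominal"`, `"ordinal"`, `"scale_non_normal"`, `"scale_normal"`) are hypothetical names introduced here, not part of any library.

```python
# Each entry maps (design, number of groups) to the tests listed in Table 2,
# in the column order: nominal, ordinal, scale (non-normal), scale (normal).
LEVELS = ("nominal", "ordinal", "scale_non_normal", "scale_normal")

TABLE_2 = {
    ("independent", "2"): ("Chi-square test", "Mann-Whitney",
                           "Mann-Whitney", "Independent sample t test"),
    ("independent", "3+"): ("Chi-square test", "Kruskal-Wallis",
                            "Kruskal-Wallis", "One-way ANOVA"),
    ("matched", "2"): ("McNemar", "Wilcoxon test",
                       "Wilcoxon test", "Paired sample t test"),
    ("matched", "3+"): ("Chi-square test", "Friedman test",
                        "Friedman test", "Repeated Measurements"),
}


def choose_difference_test(target_level, design, n_groups):
    """Return the Table 2 test for a target level, design and group count."""
    if target_level not in LEVELS:
        raise ValueError(f"unknown target level: {target_level!r}")
    if n_groups < 2:
        raise ValueError("at least two groups are needed")
    groups = "2" if n_groups == 2 else "3+"
    return TABLE_2[(design, groups)][LEVELS.index(target_level)]


print(choose_difference_test("scale_normal", "independent", 2))
print(choose_difference_test("ordinal", "matched", 4))
```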
Table 3. Basic Diagnostic Statistics tools for checking the association with the Target

Columns give the Target (Dependent) measurement level; rows give the Input (Independent) measurement level.

| Input (Independent) measurement level | Qualitative: Nominal | Qualitative: Ordinal (Rank) | Quantitative: Interval or Ratio |
|---|---|---|---|
| Nominal | Phi, Contingency Coefficient, Lambda | Bi-serial | Point Bi-serial, Eta |
| Ordinal (Rank) | Bi-serial | Kendall tau, Spearman Rho, Gamma | Ordinal Bi-serial |
| Interval or Ratio | Point Bi-serial, Eta | Ordinal Bi-serial | Pearson |
PREDICTIVE ANALYTICS
Given the current trends in the data identified by both descriptive and diagnostic analytics,
what might happen in the future is a crucial question. Predictive analytics tools provide insights
into possible future scenarios. Predictive analytics uses the outcomes of descriptive and
diagnostic analytics to create a model for the future. In other words, analyzing what happened
gives insights for preparing a model of what is possible in the future, by answering the
question “What is likely to happen?”. The principal point here, as in Diagnostic analytics, is
the Target variable’s measurement level and its relation with the Independent variables
(inputs). This may be addressed mainly through Estimation (in the case of an unknown population
parameter), Prediction (mainly based on regression techniques) or Forecasting of future values
(based on time series techniques), as indicated in Figure 1 and summarized in Table 4.
Table 4. Basic Predictive Statistics tools for checking the association with the Target

| Task | Tools |
|---|---|
| Estimation | Confidence interval for Mean |
| Prediction | Linear, Non linear, General linear model, Generalized Linear Model, Generalized Linear Mixed Model |
| Forecasting | Exponential Smoothing, ARMA, ARIMA, SARIMA |
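The Estimation row of Table 4 can be illustrated with a minimal sketch of a confidence interval for a mean. It uses the large-sample normal approximation rather than the t distribution, which is an assumption made here to keep the example within the Python standard library. The inputs are the AGE summary values reported in Table 5 of the application, and taking n = 513 assumes that age is available for every patient, which the paper does not state.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(mean, sd, n, confidence=0.95):
    """Normal-approximation confidence interval for a population mean."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin


# AGE summary from Table 5; n = 513 is an assumption (no missing ages).
low, high = mean_confidence_interval(mean=51.9, sd=11.693, n=513)
print(f"95% CI for mean age: ({low:.2f}, {high:.2f})")
```

With a sample this large, the t-based interval would be almost identical; for small samples the t distribution should be preferred.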
An application
Gynecologic cancer (malignant tumor) is any cancer that starts in a woman’s reproductive
organs. The types of Gynecologic cancer are: Cervical cancer, Ovarian cancer, Uterine cancer
(which can be one of two types: endometrial cancer (common) and uterine sarcoma
(rare)), Vaginal cancer and Vulvar cancer. Each Gynecologic cancer is unique, with different
signs and symptoms, different risk factors (things that may increase the chance of getting a
disease), and different prevention strategies. All women are at risk for Gynecologic cancers, and
risk increases with age. When Gynecologic cancers are found early, treatment is most effective.
The following analyses are based on data collected from King Abdulaziz University Hospital,
Saudi Arabia, during the period from the beginning of the year 2000 to the end of 2016.
Descriptive analytics
We have initial routine tests of 513 patients (228 Benign, 285 Malignant), with 118 fields: [Age,
Nationality, Body Mass Index (BMI), Parity, Miscarriage, Date of admission, Marital status
(Married before or not), Medical illness (contains 60 types of illness), Previous surgery
(contains 47 types of surgery) and Heart block (HB)]. Dates of admission range from 2000-01-16
to 2016-11-09. Table 5 summarizes the descriptive statistics for the continuous variables, while
Table 6 summarizes the categorical variables, including the most frequent medical illnesses and
previous surgeries.
Table 5. Descriptive Statistics for the continuous variables

| Variable | Min | Max | Mean | SD | Skewness |
|---|---|---|---|---|---|
| AGE | 13 | 95 | 51.9 | 11.693 | 0.468 |
| Parity | 0 | 13 | 4.4 | 3.356 | 0.418 |
| Miscarriage | 0 | 8 | 0.5 | 1.099 | 2.761 |
| BMI | 14.6 | 168.4 | 31.3 | 9.808 | 6.107 |
| HB | 6.8 | 18.7 | 11.4 | 1.741 | -0.006 |
Table 6. Descriptive Statistics for the categorical variables

| Variable | Category | n | % |
|---|---|---|---|
| Tumor Type | Benign | 228 | 44.4 |
| | Malignant | 285 | 55.6 |
| Nationalities | Asian | 427 | 83.2 |
| | African | 84 | 16.4 |
| | Missing | 2 | 0.4 |
| Marital status | Unmarried before | 29 | 5.7 |
| | Married before | 481 | 93.8 |
| | Missing | 3 | 0.6 |
| Most repeated Medical illness | I.HTN | 175 | 34.1 |
| | I.DM | 121 | 23.6 |
| | I.BA | 30 | 5.8 |
| Most repeated Previous surgery | D&C | 53 | 10.3 |
| | C/S | 37 | 7.2 |
| | Laparoscopic cholecystectomy | 18 | 3.5 |
| | Myomectomy | 12 | 2.3 |
| | Hernia rep | 9 | 1.8 |
| | Appendectomy | 8 | 1.6 |
| | Vaginal repair | 8 | 1.6 |
Diagnostic analytics
Among all the variables measured, Age, Parity and Miscarriage are the only significant
continuous variables affecting the dependent variable (Tumor type). This was established using
the independent-samples t-test, with p-values less than 0.05, as shown in Table 7. Likewise,
Previous marriage, DM, hernia rep and myomectomy are the only significant discrete variables
affecting the dependent variable (Tumor type). This was established using Chi-square tests,
with p-values less than 0.05, as shown in Table 8.
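Table 7. t-tests for the continuous variables

| Variable | Tumor type | N | Mean | SD | t | P-value |
|---|---|---|---|---|---|---|
| Age | Benign | 327 | 49.58 | 10.558 | -2.722 | 0.007 |
| | Malignant | 350 | 52.06 | 12.923 | | |
| Parity | Benign | 315 | 4.81 | 3.224 | 2.974 | 0.003 |
| | Malignant | 332 | 4.04 | 3.357 | | |
| Miscarriage | Benign | 314 | 0.69 | 1.171 | 3.874 | 0.000 |
| | Malignant | 332 | 0.36 | 0.987 | | |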
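The t statistics in Table 7 can be approximately recomputed from the reported group sizes, means and standard deviations. The sketch below assumes the equal-variance (pooled) form of the independent-samples t test; the paper does not state which variant was used, and small discrepancies are expected because the inputs are rounded. The function name `pooled_t` is introduced here for illustration.

```python
import math


def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Equal-variance two-sample t statistic from summary statistics."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


# Summary values from Table 7: (Benign n, mean, SD, Malignant n, mean, SD).
table7 = {
    "Age": (327, 49.58, 10.558, 350, 52.06, 12.923),
    "Parity": (315, 4.81, 3.224, 332, 4.04, 3.357),
    "Miscarriage": (314, 0.69, 1.171, 332, 0.36, 0.987),
}

results = {name: pooled_t(*row) for name, row in table7.items()}
for name, (t, df) in results.items():
    print(f"{name}: t = {t:.3f}, df = {df}")
```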
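Table 8. χ²-tests for the categorical variables

| Variable | Level | Malignant n | Malignant % | Benign n | Benign % | Total n | Total % | χ² | P-value | Odds Ratio |
|---|---|---|---|---|---|---|---|---|---|---|
| Previous marriage | Yes | 320 | 49.90% | 321 | 50.10% | 641 | 100% | 15.238 | 0.000 | 0.20 |
| | No | 30 | 83.30% | 6 | 16.70% | 36 | 100% | | | |
| DM | Yes | 95 | 62.50% | 57 | 37.50% | 152 | 100% | 9.533 | 0.002 | 1.78 |
| | No | 256 | 48.30% | 274 | 51.70% | 530 | 100% | | | |
| hernia rep | Yes | 9 | 90.00% | 1 | 10.00% | 10 | 100% | 6.033 | 0.014 | 8.68 |
| | No | 342 | 50.90% | 330 | 49.10% | 672 | 100% | | | |
| myomectomy | Yes | 4 | 25.00% | 12 | 75.00% | 16 | 100% | 4.595 | 0.032 | 0.31 |
| | No | 347 | 52.10% | 319 | 47.90% | 666 | 100% | | | |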
The odds ratio shown in Table 8 is computed as (odds in exposed / odds in unexposed). For
instance, for previous marriage, the value 0.20 is calculated as ((320/321)/(30/6)) and can
be interpreted as an estimate of the ratio, in the population, of the odds of a previously
married patient developing a Malignant tumor to the odds of a not previously married patient
developing this type of tumor. In other words, a previously married patient has 0.20 times the
odds of a not previously married patient of developing a Malignant tumor. That means previous
marriage has a good (negative) effect.
Also, the value 1.78 can be interpreted as an estimate of the ratio, in the population, of the
odds of a diabetic patient developing a Malignant tumor to the odds of a non-diabetic patient
developing this type of tumor. In other words, a diabetic patient has 1.78 times the odds of a
non-diabetic patient of developing a Malignant tumor. That means diabetes has a bad (positive)
effect.
A similar interpretation can be given for hernia rep, with a positive effect, and myomectomy,
with a negative effect, on developing a Malignant tumor.
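The odds ratios and χ² values of Table 8 can be recomputed directly from the 2×2 counts. The following sketch uses the Pearson chi-square statistic without continuity correction, which is an assumption about how the table was produced. The function names are introduced here for illustration.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table [[a, b], [c, d]]: (a/b) / (c/d)."""
    return (a / b) / (c / d)


def pearson_chi_square(a, b, c, d):
    """Pearson chi-square (no continuity correction) for a 2x2 table."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Counts from Table 8: (exposed Malignant, exposed Benign,
#                       unexposed Malignant, unexposed Benign).
table8 = {
    "Previous marriage": (320, 321, 30, 6),
    "DM": (95, 57, 256, 274),
    "hernia rep": (9, 1, 342, 330),
    "myomectomy": (4, 12, 347, 319),
}

for name, counts in table8.items():
    print(f"{name}: OR = {odds_ratio(*counts):.2f}, "
          f"chi-square = {pearson_chi_square(*counts):.3f}")
```

Note that the hernia rep row has a cell count of 1, so the chi-square approximation is weak there and an exact test may be more appropriate.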
Predictive analytics
Predictive analytics is mainly interested in developing a predictive model that can be used to
“predict” the Target (Dependent) variable. We consider the Tumor type (i.e.,
Benign/Malignant) as the Dependent variable. Since it is nominal, we use Logistic
regression, a Generalized Linear Model (Table 4). Stepwise logistic regression is used to
include only the significant variables affecting the patient’s tumor type, in descending order
of importance. Table 9 summarizes the parameter estimates (B), their significance and the odds
ratios (Exp(B)) for the significant variables.
Table 9. The significance parameter estimates for the Logistic regression

| Variable | B | S.E. | Sig. | Exp(B) |
|---|---|---|---|---|
| Miscarriage | -.260 | .083 | .002 | .771 |
| DM | .593 | .212 | .006 | 1.809 |
| Previous Marriage | -1.145 | .489 | .020 | .318 |
| hernia rep | 2.062 | 1.086 | .058 | 7.864 |
| Age | .019 | .008 | .014 | 1.019 |
| parity | -.070 | .027 | .011 | .932 |
| myomectomy | -1.214 | .613 | .046 | .297 |
| Constant | .529 | 1.329 | .539 | 1.697 |
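To show how the entries of Table 9 relate to each other and how such a model produces a prediction, the sketch below recomputes Exp(B) from B and evaluates the fitted logistic equation for one patient. Several points are assumptions made for illustration: that the yes/no predictors are coded 1 = yes and 0 = no, that the modeled outcome is Malignant, and the patient profile itself, which is hypothetical.

```python
import math

# Coefficients (B) from Table 9.
COEFFICIENTS = {
    "Miscarriage": -0.260,
    "DM": 0.593,
    "Previous Marriage": -1.145,
    "hernia rep": 2.062,
    "Age": 0.019,
    "parity": -0.070,
    "myomectomy": -1.214,
}
CONSTANT = 0.529

# Exp(B): the odds ratio for a one-unit increase in each predictor.
exp_b = {name: math.exp(b) for name, b in COEFFICIENTS.items()}


def predicted_probability(patient):
    """Probability of the modeled outcome from the logistic equation."""
    # Predictors missing from the patient record are treated as 0.
    logit = CONSTANT + sum(b * patient.get(name, 0)
                           for name, b in COEFFICIENTS.items())
    return 1 / (1 + math.exp(-logit))


# Hypothetical patient; binary predictors assumed coded 1 = yes, 0 = no.
hypothetical_patient = {
    "Age": 50, "parity": 4, "Miscarriage": 0,
    "DM": 1, "Previous Marriage": 1,
    "hernia rep": 0, "myomectomy": 0,
}

for name, value in exp_b.items():
    print(f"{name}: Exp(B) = {value:.3f}")
probability = predicted_probability(hypothetical_patient)
print(f"Predicted probability: {probability:.3f}")
```

The recomputed Exp(B) values agree with Table 9 up to the rounding of B to three decimals.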
It is clear from the previous table that Miscarriage is the first factor affecting the type of
the tumor. It is also clear that the coefficients of Miscarriage, Previous Marriage, parity and
myomectomy are negative, and consequently the corresponding odds ratios are less than 1, which
means that these factors decrease the chance of having a Malignant tumor. In contrast, the
coefficients of DM, hernia rep and Age are positive, and consequently the corresponding odds
ratios are greater than 1, which means that these factors increase the chance of having a
Malignant tumor.
For example, for Age, the odds ratio (Exp(B)) of 1.019 means that with an increase of one year
in age, the odds of a Malignant tumor are multiplied by 1.019, provided all other factors are
kept constant. Since a one-year increase produces only a small change, the effect is easier to
see over 10 years. This is calculated as:
$e^{\text{years} \times B} = e^{10 \times 0.019} \approx 1.21$
This indicates that with an increase of 10 years in age, the odds of a Malignant tumor increase
1.21 times.
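The 10-year calculation can be checked with a few lines of Python. The helper name `odds_multiplier` is introduced here for illustration; it simply evaluates e raised to (units × B).

```python
import math


def odds_multiplier(b, units):
    """Factor by which the odds change when a predictor rises by `units`."""
    return math.exp(units * b)


age_b = 0.019  # coefficient of Age from Table 9
one_year = odds_multiplier(age_b, 1)
ten_years = odds_multiplier(age_b, 10)
print(f"1 year: {one_year:.3f}, 10 years: {ten_years:.2f}")

# Equivalent view: compounding the one-year odds ratio ten times.
print(f"Compounded: {one_year ** 10:.2f}")
```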
Prescriptive analytics
Prescriptive analytics is a result of all the previous analytics and gives, in advance, the
guidelines that can be used to “avoid” the problem. Here, the problem is having a Malignant
tumor. Based on the results obtained, we may suggest that Marriage, parity and myomectomy help
decrease the chance of having a Malignant tumor, while DM and hernia rep, which patients should
avoid where possible, increase the chance of having a Malignant tumor.