Advances in Social Sciences Research Journal – Vol. 8, No. 3
Publication Date: March 25, 2021
DOI:10.14738/assrj.83.9949. DeVaney, T. A. (2021). Mokken Scale Analysis: Discussion and Application. Advances in Social Sciences Research Journal, 8(3), 672-695.
Mokken Scale Analysis: Discussion and Application
Thomas A. DeVaney
College of Education, Southeastern Louisiana University, United States
ABSTRACT
This article presents a discussion and illustration of Mokken scale
analysis (MSA), a nonparametric form of item response theory (IRT),
in relation to common IRT models such as Rasch and Guttman scaling.
The procedure can be used for dichotomous and ordinal polytomous
data commonly used with questionnaires. The assumptions of MSA
are discussed as well as characteristics that differentiate a Mokken
scale from a Guttman scale. MSA is illustrated using the mokken
package with RStudio and a data set that included over 3,340
responses to a modified version of the Statistical Anxiety Rating
Scale. Issues addressed in the illustration include monotonicity,
scalability, and invariant ordering. The R script for the illustration is
included.
Keywords: Mokken scale analysis, item response theory, statistics anxiety,
RStudio.
INTRODUCTION
Embretson and Reise (2000) noted that classical test theory (CTT) was used in test development
for most of the 20th century. The central tenet of CTT is that a person’s observed score is defined
as a combination of the person’s true score and random error [2-4]. This is often expressed as X
= T + E where X is the observed score for the individual, T is the true score for the individual, and
E is random error. Despite the dominance of CTT in test development, several limitations have
been identified: (a) statistics based on CTT are sample dependent [2, 5], (b) measurement error
is assumed constant for all scores [6], and (c) the focus of CTT is at the scale rather than item
level [7].
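The decomposition X = T + E can be made concrete with a small simulation. The following is only an illustrative sketch in Python, not part of the article's R script; the sample size, means, and standard deviations are hypothetical values chosen for the example. It shows that when true scores and errors are generated independently, the variance of the observed scores is approximately the sum of the true-score and error variances.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

N_PERSONS = 10_000  # hypothetical sample size

# Hypothetical distributions: true scores ~ N(50, 10), random error ~ N(0, 5)
true_scores = [random.gauss(50, 10) for _ in range(N_PERSONS)]
errors = [random.gauss(0, 5) for _ in range(N_PERSONS)]

# Observed score for each person: X = T + E
observed = [t + e for t, e in zip(true_scores, errors)]

var_t = statistics.pvariance(true_scores)
var_e = statistics.pvariance(errors)
var_x = statistics.pvariance(observed)

# Because T and E are generated independently, var(X) is close to var(T) + var(E)
print("var(X):", round(var_x, 2))
print("var(T) + var(E):", round(var_t + var_e, 2))
```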
Lord (1980) noted that CTT does not allow for the prediction of performance on an item unless
the item has been previously administered to similar individuals. Item Response Theory (IRT)
allows for the predicted probability of response to an item, even if the item has not been
administered to similar individuals. IRT is widely used in psychology and education [5, 9];
however, Penfield (2014) noted that many early IRT models were designed for dichotomous data (e.g.,
items scored correct or incorrect). These include the 1-, 2-, and 3-parameter logistic models that
Finch et al. (2014) suggested are three of the most commonly used IRT models.
Ostini and Nering (2010) noted that polytomous response items, including Likert-type items,
have become widely used in educational and psychological testing. Consequently, the need to
expand IRT to polytomous items resulted in models such as the graded response and rating scale
models [1]. Embretson and Reise noted that the graded response model is a generalization of the
2-parameter logistic model and is used with ordered categorical responses. Further, the rating
scale model can be derived from the partial credit model, which is an extension of the 1-parameter
logistic model.
In addition to the response format (dichotomous or polytomous), researchers must also consider
the use of a parametric or nonparametric item response model. Sijtsma and Molenaar (2002)
noted that parametric models specify a predetermined function, such as logistic, to describe the
item response and may be too restrictive. This restriction may cause items to be rejected because
the relationship between the item response and the value of the latent trait does not conform to
the predefined function. Avsar and Tavsancil (2017) noted that nonparametric IRT models require
fewer assumptions than parametric IRT models.
Nonparametric IRT models include the Mokken and nonparametric regression estimation
models. Mokken scale analysis (MSA) is used with ordinal data [13] and is useful when examining
affective variables [14]. Roskam, van den Wollenberg, and Janson (1986) noted that MSA had
become more widely used; however, based on a Google Scholar search, van der Ark (2012)
concluded that MSA was less popular than Rasch analysis and factor analysis. The number of
Google Scholar hits for Rasch analysis was 3.6 times higher than the number for MSA, and the
number of hits for factor analysis was 362.2 times higher than the number for MSA and 99.2
times higher than the number for Rasch analysis.
Purpose
This article presents a discussion and illustration of Mokken scale analysis (MSA), a
nonparametric form of IRT, in relation to common IRT models such as Rasch and Guttman
scaling. The article begins with a non-mathematical discussion that illustrates the differences
among Guttman scaling, probabilistic parametric IRT, and nonparametric IRT through an
examination of item characteristic curves. This is followed by a discussion of MSA and an
application of MSA to a data set that includes over 3,000 responses to a modified version of the
Statistical Anxiety Rating Scale. The application of MSA is illustrated using the mokken package
with RStudio.
From Guttman to Mokken (Through Rasch)
Regardless of the IRT model used, the item characteristic curve (ICC) is a fundamental concept.
The ICC describes the relationship between the value of the latent trait (θ) and the probability of
a given response. For dichotomously scored items, the probability is associated with providing a
correct response to the item. For polytomous items, the probability is associated with providing
the corresponding response or higher on an item (e.g., responding with a 3 or higher on a 5-point
scale). The ICC associated with a Guttman scale is described as a step function due to the
deterministic nature of a Guttman scale [16]. As can be seen in Figure 1, the probability of a
correct or affirmative response remains zero (0) until a specified level of θ is obtained. When the
specified level of θ is reached, the probability of a correct or affirmative response becomes 1.
Therefore, only two probabilities are associated with items on a Guttman scale: 0 and 1.
Advances in Social Sciences Research Journal (ASSRJ) Vol. 8, Issue 3, March-2021
Services for Science and Education – United Kingdom
Figure 1. Item characteristic curve for a Guttman scale item. When the ability level θ reaches 0,
the probability of a positive response changes from 0 to 1.
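The step function that defines a Guttman item can be written in a few lines. The sketch below is illustrative Python rather than code from the article; the function name guttman_prob and the default threshold of 0 (matching the level of θ at which the curve in Figure 1 jumps) are assumptions made for the example.

```python
def guttman_prob(theta, threshold=0.0):
    """Deterministic Guttman ICC: 0 below the threshold, 1 at or above it."""
    return 1.0 if theta >= threshold else 0.0


# Evaluate the curve at a few hypothetical trait levels
theta_values = [-2.0, -0.5, 0.0, 0.5, 2.0]
probabilities = [guttman_prob(theta) for theta in theta_values]

for theta, prob in zip(theta_values, probabilities):
    print(f"theta={theta:+.1f}  P(positive response)={prob}")
```

Whatever threshold is chosen, the function returns only the two values 0 and 1, which is the deterministic property that separates a Guttman item from a probabilistic one.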
Van Schuur (2011) noted that the Guttman model is very restrictive and the required response
pattern is highly unlikely even with high quality questions. Probabilistic models allow for
violations of the Guttman response pattern and simply require the probability of a correct
response to increase as the value of θ increases.
For parametric IRT models, the shape of the ICC is predetermined and often corresponds to a
normal ogive shape that is defined by a logistic item response function. When the logistic model
is used, three parameters define the shape of the ICC [8]. Although the notation associated with
the parameters may vary, they correspond to item difficulty, discriminating ability, and guessing
[8]. Figure 2 illustrates two items for which the discriminating ability and guessing parameters
are held constant (i.e., the 1-parameter logistic or Rasch model). Both ICCs satisfy the expectation
that the probability of a correct response increases as the value of θ increases. The shift to the
right associated with Item 2 is due to a higher item difficulty parameter.
Figure 2. Item characteristic curves for two items in a 1-parameter logistic (Rasch) model. These
curves illustrate the S-shaped (normal ogive) curves associated with parametric, logistic item
response functions.
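The rightward shift produced by a higher difficulty parameter can be checked numerically. The following Python sketch assumes the usual 1-parameter logistic form, P(θ) = 1 / (1 + exp(-(θ - b))); the function name rasch_prob and the two difficulty values are hypothetical and are not taken from the article's figures.

```python
import math


def rasch_prob(theta, difficulty):
    """1-parameter logistic (Rasch) ICC: P = 1 / (1 + exp(-(theta - b)))."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


# Hypothetical difficulties: Item 2 is harder, so its curve sits further right
ITEM_1_DIFFICULTY = -0.5
ITEM_2_DIFFICULTY = 1.0

grid = [x / 2.0 for x in range(-8, 9)]  # theta from -4.0 to 4.0 in steps of 0.5
item_1 = [rasch_prob(theta, ITEM_1_DIFFICULTY) for theta in grid]
item_2 = [rasch_prob(theta, ITEM_2_DIFFICULTY) for theta in grid]

for theta, p1, p2 in zip(grid, item_1, item_2):
    print(f"theta={theta:+.1f}  item1={p1:.3f}  item2={p2:.3f}")
```

Each curve passes through a probability of .5 where θ equals the item's difficulty, and at every value of θ the harder item has the lower probability of a correct response.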
The effect of the discriminating and guessing parameters can be seen in Figure 3. For both items,
the difficulty parameter was held constant. Item 1 displays a higher discrimination parameter as
illustrated by the steeper slope of the middle portion of the curve. The impact of the guessing
parameter is illustrated in Item 2 as the upward shift in the lower bound of the curve.
Figure 3. Item characteristic curves for 2- and 3-parameter logistic models. These curves
illustrate the influence of the discrimination and guessing parameters on the shape of the item
characteristic curve.
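The roles of the discrimination and guessing parameters can also be examined numerically. The sketch below is illustrative Python that assumes the common 3-parameter logistic form, P(θ) = c + (1 - c) / (1 + exp(-a(θ - b))); the function names and all parameter values are hypothetical and are not the values used to draw Figure 3.

```python
import math


def three_pl(theta, a, b, c=0.0):
    """3-parameter logistic ICC with discrimination a, difficulty b, guessing c."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def slope_at(theta, a, b, c=0.0, h=1e-5):
    """Central-difference approximation to the slope of the ICC at theta."""
    return (three_pl(theta + h, a, b, c) - three_pl(theta - h, a, b, c)) / (2 * h)


# Hypothetical items with the same difficulty (b = 0)
ITEM_1 = {"a": 2.0, "b": 0.0, "c": 0.0}  # higher discrimination, no guessing
ITEM_2 = {"a": 1.0, "b": 0.0, "c": 0.2}  # lower discrimination, guessing of .2

slope_1 = slope_at(0.0, **ITEM_1)
slope_2 = slope_at(0.0, **ITEM_2)
low_1 = three_pl(-6.0, **ITEM_1)  # probability at a very low trait level
low_2 = three_pl(-6.0, **ITEM_2)

print("slope at theta = 0:", round(slope_1, 3), round(slope_2, 3))
print("probability at theta = -6:", round(low_1, 3), round(low_2, 3))
```

Under these assumed values the first item is steeper at the point of equal difficulty, while the second item never drops below its guessing parameter, even at very low levels of θ.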
The previous examples illustrate ICCs associated with models for which the item response was
dichotomous with each curve associated with a different item. For polytomous response items
(e.g., graded response model, rating scale model) each question will have k – 1 operating
characteristic curves (OCC), where k is the number of response choices [1]. An OCC represents
the probability that an individual will respond at or above the corresponding category in relation
to θ [1, 17]. Penfield (2014) described these curves as results of step functions with each OCC
associated with a step from one response category to the next higher category. For example, the
first step function would be associated with the probability of transitioning from a response of 1
(strongly disagree) to 2 (disagree) based on the level of θ. Because each step can be considered a
dichotomous event, functions can be expressed using models such as the 1- and 2-parameter
logistic models. This results in OCCs with the familiar S shape. Figure 4 illustrates the OCCs for a
polytomous item with five response options.
Figure 4. Operating characteristic curves associated with parametric, logistic item step response
functions for a polytomously scored item.
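The k – 1 curves for a polytomous item can be sketched by treating each step as a dichotomous logistic function. The Python example below assumes a cumulative (graded-response-style) formulation with a common discrimination and ordered thresholds; the function names and the four threshold values for a five-option item are hypothetical.

```python
import math


def cumulative_prob(theta, a, threshold):
    """Probability of responding at or above the category tied to this threshold."""
    return 1.0 / (1.0 + math.exp(-a * (theta - threshold)))


def occ_values(theta, a, thresholds):
    """The k - 1 operating characteristic values for an item with k options."""
    return [cumulative_prob(theta, a, b) for b in thresholds]


def category_probs(theta, a, thresholds):
    """Probability of each of the k categories, as differences of adjacent OCCs."""
    cumulative = [1.0] + occ_values(theta, a, thresholds) + [0.0]
    return [cumulative[i] - cumulative[i + 1] for i in range(len(cumulative) - 1)]


# Hypothetical ordered thresholds for a 5-option item (k - 1 = 4 curves)
THRESHOLDS = [-1.5, -0.5, 0.5, 1.5]
DISCRIMINATION = 1.2

occs = occ_values(0.0, DISCRIMINATION, THRESHOLDS)
cats = category_probs(0.0, DISCRIMINATION, THRESHOLDS)

print("P(response >= 2..5) at theta = 0:", [round(p, 3) for p in occs])
print("P(response = 1..5) at theta = 0:", [round(p, 3) for p in cats])
```

At any fixed θ, the probability of responding at or above a category decreases as the category increases, and the five category probabilities sum to 1.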
The previous discussion highlighted the transition from the deterministic Guttman scale to the
probabilistic models represented by the 1-, 2-, and 3-parameter logistic models and the
parametric polytomous models represented by the graded response and rating scale models.
For these models, there is an assumption that the curves or models take a specific form, such as
logistic. Consequently, the shapes differ only by the difficulty, discrimination, and guessing
parameters [11]. Because the only restriction on the relationship between the probability of a
response and θ in nonparametric IRT models is that the probability of a response increases as
the value of θ increases, item response functions are not limited to logistic functions. Instead, the
functions may include linear or exponential components; be step functions, irregular, or discrete;
or have different lower and upper asymptotes [11]. Figure 5 presents ICCs that satisfy the
nonparametric restriction of nondecreasing probability with increasing θ.
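The nondecreasing restriction can be illustrated by evaluating candidate response functions on a grid of θ values. The Python sketch below is a toy check on known functions, not the data-based monotonicity procedure of the mokken package; the helper is_nondecreasing and the three example curves are hypothetical.

```python
import math


def is_nondecreasing(icc, grid, tol=1e-12):
    """Return True if icc never decreases as theta increases over the grid."""
    values = [icc(theta) for theta in grid]
    return all(later >= earlier - tol for earlier, later in zip(values, values[1:]))


def linear_icc(theta):
    """Linear in the middle, clipped so the result stays between 0 and 1."""
    return min(1.0, max(0.0, 0.5 + 0.125 * theta))


def multi_step_icc(theta):
    """Irregular step function with lower and upper bounds other than 0 and 1."""
    if theta < -1.0:
        return 0.2
    if theta < 1.0:
        return 0.6
    return 0.9


def wavy_icc(theta):
    """Rises and falls with theta, so it violates the restriction."""
    return 0.5 + 0.3 * math.sin(theta)


grid = [x / 10.0 for x in range(-40, 41)]  # theta from -4.0 to 4.0

results = {
    "linear": is_nondecreasing(linear_icc, grid),
    "multi_step": is_nondecreasing(multi_step_icc, grid),
    "wavy": is_nondecreasing(wavy_icc, grid),
}
print(results)
```

The linear and step curves are acceptable under the nonparametric restriction even though neither is logistic, whereas the curve that rises and then falls is not.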