Advances in Social Sciences Research Journal – Vol. 8, No. 3

Publication Date: March 25, 2021

DOI:10.14738/assrj.83.9949. DeVaney, T. A. (2021). Mokken Scale Analysis: Discussion and Application. Advances in Social Sciences Research Journal, 8(3) 672-695.

Mokken Scale Analysis: Discussion and Application

Thomas A. DeVaney

College of Education, Southeastern Louisiana University, United States

ABSTRACT

This article presents a discussion and illustration of Mokken scale

analysis (MSA), a nonparametric form of item response theory (IRT),

in relation to common IRT models such as Rasch and Guttman scaling.

The procedure can be used for dichotomous and ordinal polytomous

data commonly used with questionnaires. The assumptions of MSA

are discussed as well as characteristics that differentiate a Mokken

scale from a Guttman scale. MSA is illustrated using the mokken

package with R Studio and a data set that included over 3,340

responses to a modified version of the Statistical Anxiety Rating

Scale. Issues addressed in the illustration include monotonicity,

scalability, and invariant ordering. The R script for the illustration is

included.

Keywords: Mokken scale analysis, item response theory, statistics anxiety,

R studio.

INTRODUCTION

Embretson and Reise (2000) noted that classical test theory (CTT) was used in test development

for most of the 20th century. The central tenet of CTT is that a person’s observed score is defined

as a combination of the person’s true score and random error [2-4]. This is often expressed as

X = T + E, where X is the observed score for the individual, T is the true score for the individual, and

E is random error. Despite the dominance of CTT in test development, several limitations have

been identified: (a) statistics based on CTT are sample dependent [2, 5], (b) measurement error

is assumed constant for all scores [6], and (c) the focus of CTT is at the scale rather than item

level [7].

Lord (1980) noted that CTT does not allow for the prediction of performance on an item unless

the item has been previously administered to similar individuals. Item Response Theory (IRT)

allows for the predicted probability of response to an item, even if the item has not been

administered to similar individuals. IRT is widely used in psychology and education [5, 9];

however, Penfield noted that many early IRT models were designed for dichotomous data (e.g.,

items scored correct or incorrect). These include the 1-, 2-, and 3-parameter logistic models that

Finch et al. (2014) suggested are three of the most commonly used IRT models.

Ostini and Nering (2010) noted that polytomous response items, including Likert-type items,

have become widely used in educational and psychological testing. Consequently, the need to

expand IRT to polytomous items resulted in models such as the graded response and rating scale

models [1]. Embretson and Reise noted that the graded response model is a generalization of the

2-parameter logistic model and is used with ordered categorical responses. Further, the rating

scale model can be derived from the partial credit model, which is an extension of the 1-parameter

logistic model.

In addition to the response format (dichotomous or polytomous), researchers must also consider

the use of a parametric or nonparametric item response model. Sijtsma and Molenaar (2002)

noted that parametric models specify a predetermined function, such as logistic, to describe the

item response and may be too restrictive. This restriction may cause items to be rejected because

the relationship between the item response and the value of the latent trait does not conform to

the predefined function. Avsar and Tavsancil (2017) noted nonparametric IRT models require

fewer assumptions than parametric IRT models.

Nonparametric IRT models include the Mokken and nonparametric regression estimation

models. Mokken scale analysis (MSA) is used with ordinal data [13] and is useful when examining

affective variables [14]. Roskam, van den Wollenberg, and Janson (1986) noted that MSA had

become more widely used; however, based on a Google Scholar search, van der Ark (2012)

concluded that MSA was less popular than Rasch analysis and factor analysis. The number of

Google Scholar hits for Rasch analysis was 3.6 times higher than the number for MSA, and the

number of hits for factor analysis was 362.2 times higher than the number for MSA and 99.2

times higher than the number for Rasch analysis.

Purpose

This article presents a discussion and illustration of Mokken scale analysis (MSA), a

nonparametric form of IRT, in relation to common IRT models such as Rasch and Guttman

scaling. The article begins with a non-mathematical discussion that illustrates the differences

among Guttman scaling, probabilistic parametric IRT, and nonparametric IRT through an

examination of item characteristic curves. This is followed by a discussion of MSA and an

application of MSA to a data set that includes over 3,000 responses to a modified version of the

Statistical Anxiety Rating Scale. The application of MSA is illustrated using the mokken package

with R Studio.

From Guttman to Mokken (Through Rasch)

Regardless of the IRT model used, the item characteristic curve (ICC) is a fundamental concept.

The ICC describes the relationship between the value of the latent trait (θ) and the probability of

a given response. For dichotomously scored items, the probability is associated with providing a

correct response to the item. For polytomous items, the probability is associated with providing

the corresponding response or higher on an item (e.g., responding with a 3 or higher on a 5-point

scale). The ICC associated with a Guttman scale is described as a step function due to the

deterministic nature of a Guttman scale [16]. As can be seen in Figure 1, the probability of a

correct or affirmative response remains zero (0) until a specified level of θ is obtained. When the

specified level of θ is reached, the probability of a correct or affirmative response becomes 1.

Therefore, only two probabilities are associated with items on a Guttman scale: 0 and 1.

Services for Science and Education – United Kingdom

Figure 1. Item characteristic curve for a Guttman scale item. When the ability level θ reaches 0,

the probability of a positive response changes from 0 to 1.
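
The step-function curve in Figure 1 can also be written compactly. In the sketch below, \theta denotes the latent trait, and \delta_i is a symbol introduced here for illustration only (it does not appear in the article) for the point at which item i switches from a probability of 0 to a probability of 1. For the item in Figure 1, \delta_i = 0.

```latex
P(X_i = 1 \mid \theta) =
\begin{cases}
0, & \theta < \delta_i \\
1, & \theta \ge \delta_i
\end{cases}
```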

Van Schuur (2011) noted that the Guttman model is very restrictive and the required response

pattern is highly unlikely even with high quality questions. Probabilistic models allow for

violations of the Guttman response pattern and simply require the probability of a correct

response to increase as the value of θ increases.

For parametric IRT models, the shape of the ICC is predetermined and often corresponds to a

normal ogive shape that is defined by a logistic item response function. When the logistic model

is used, three parameters define the shape of the ICC [8]. Although the notation associated with

the parameters may vary, they correspond to item difficulty, discriminating ability, and guessing

[8]. Figure 2 illustrates two items for which the discriminating ability and guessing parameters

are held constant (i.e., the 1-parameter logistic or Rasch model). Both ICCs satisfy the expectation

that the probability of a correct response increases as the value of θ increases. The shift to the

right associated with Item 2 is due to a higher item difficulty parameter.

Figure 2. Item characteristic curves for two items in a 1-parameter logistic (Rasch) model. These

curves illustrate the normal ogive (S) shape curves associated with parametric, logistic item

response functions.
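
One common way to write the logistic item response function described above is shown below. The symbols a_i (discrimination), b_i (difficulty), and c_i (guessing) are assumed notation, since the notation varies across sources, and the optional scaling constant that appears in some presentations is omitted.

```latex
P(X_i = 1 \mid \theta) = c_i + (1 - c_i)\,
\frac{\exp\bigl(a_i(\theta - b_i)\bigr)}{1 + \exp\bigl(a_i(\theta - b_i)\bigr)}
```

Setting c_i = 0 gives the 2-parameter logistic model. Additionally holding a_i constant across items gives the 1-parameter logistic (Rasch) form illustrated in Figure 2, in which a larger b_i shifts the curve to the right.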

The effect of the discriminating and guessing parameters can be seen in Figure 3. For both items,

the difficulty parameter was held constant. Item 1 displays a higher discrimination parameter as

illustrated by the steeper slope of the middle portion of the curve. The impact of the guessing

parameter is illustrated in Item 2 as the upward shift in the lower bound of the curve.

Figure 3. Item characteristic curves for 2- and 3-parameter logistic models. These curves

illustrate the influence of the discrimination and guessing parameters on the shape of the item

characteristic curve.
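
The influence of the discrimination and guessing parameters can also be checked numerically. The following Python sketch is illustrative only (the article's own analysis uses R and the mokken package). The function name icc_3pl, the helper slope_at, and the parameter values for the two items are hypothetical choices made for this example; they are not the values used to draw Figure 3.

```python
import math


def icc_3pl(theta, a=1.0, b=0.0, c=0.0):
    """Probability of a correct response under a 3-parameter logistic model.

    a = discrimination, b = difficulty, c = guessing (lower asymptote).
    With c = 0 this reduces to the 2-parameter logistic model.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def slope_at(theta, params, h=1e-5):
    """Central-difference estimate of the ICC slope at theta."""
    upper = icc_3pl(theta + h, **params)
    lower = icc_3pl(theta - h, **params)
    return (upper - lower) / (2 * h)


# Two hypothetical items that share the same difficulty (b = 0).
item1 = dict(a=2.0, b=0.0, c=0.0)  # higher discrimination, no guessing
item2 = dict(a=1.0, b=0.0, c=0.2)  # lower discrimination, guessing floor of 0.2

if __name__ == "__main__":
    for theta in (-3, -1, 0, 1, 3):
        p1 = icc_3pl(theta, **item1)
        p2 = icc_3pl(theta, **item2)
        print(f"theta={theta:+d}  item1={p1:.3f}  item2={p2:.3f}")
    print("slope at b:", round(slope_at(0.0, item1), 3), round(slope_at(0.0, item2), 3))
```

In this sketch the first item has the steeper slope at the shared difficulty, and the second item's probability levels off near its guessing value rather than near zero as theta decreases, which mirrors the pattern described for Figure 3.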

The previous examples illustrate ICCs associated with models for which the item response was

dichotomous with each curve associated with a different item. For polytomous response items

(e.g., graded response model, rating scale model) each question will have k – 1 operating

characteristic curves (OCC), where k is the number of response choices [1]. An OCC represents

the probability that an individual will respond at or above the corresponding category in relation

to θ [1, 17]. Penfield (2014) described these curves as results of step functions with each OCC

associated with a step from one response category to the next higher category. For example, the

first step function would be associated with the probability of transitioning from a response of 1

(strongly disagree) to 2 (disagree) based on the level of θ. Because each step can be considered a

dichotomous event, the step functions can be expressed using models such as the 1- and 2-parameter

logistic models. This results in OCCs with the familiar S shape. Figure 4 illustrates the OCCs for a

polytomous item with five response options.

Figure 4. Operating characteristic curves associated with parametric, logistic item step response

functions for a polytomously scored item.
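
The k – 1 operating characteristic curves for a single item can be sketched as a set of logistic step functions. The Python example below is a minimal illustration, assuming a graded-response-style formulation in which each OCC is a 2-parameter logistic curve with a common discrimination and its own threshold. The function names, the discrimination value, and the four threshold values are hypothetical and are not taken from the article or from Figure 4.

```python
import math


def occ(theta, a, b):
    """P(responding at or above a category), modeled as a logistic step."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def operating_curves(theta, a, thresholds):
    """Return the k - 1 cumulative probabilities for one item at theta."""
    return [occ(theta, a, b) for b in thresholds]


def category_probs(theta, a, thresholds):
    """Probability of each of the k categories, as differences of adjacent OCCs."""
    cumulative = [1.0] + operating_curves(theta, a, thresholds) + [0.0]
    return [cumulative[j] - cumulative[j + 1] for j in range(len(cumulative) - 1)]


# Hypothetical 5-category item: four ordered thresholds, common discrimination.
a = 1.5
thresholds = [-2.0, -0.5, 0.5, 2.0]

if __name__ == "__main__":
    for theta in (-3.0, 0.0, 3.0):
        curves = [round(p, 3) for p in operating_curves(theta, a, thresholds)]
        probs = [round(p, 3) for p in category_probs(theta, a, thresholds)]
        print(f"theta={theta:+.1f}  OCCs={curves}  categories={probs}")
```

With five response options the sketch produces four curves, and at any fixed value of theta the probability of responding at or above a category is smaller for higher categories.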

The previous discussion highlighted the transition from the deterministic Guttman scale to the

probabilistic models represented by the 1-, 2-, and 3-parameter logistic models and the

parametric polytomous models represented by the graded response and rating scale models.

For these models, there is an assumption that the curves or models take a specific form, such as

logistic. Consequently, the shapes differ only by the difficulty, discrimination, and guessing

parameters [11]. Because the only restriction on the relationship between the probability of a

response and θ in nonparametric IRT models is that the probability of a response increases as

the value of θ increases, item response functions are not limited to logistic functions. Instead, the

functions may include linear or exponential components; be step functions, irregular, or discrete;

or have different lower and upper asymptotes [11]. Figure 5 presents ICCs that satisfy the

nonparametric restriction of nondecreasing probability with increasing θ.
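
The nonparametric restriction can be illustrated with a small check that accepts any nondecreasing curve, whatever its functional form. The Python sketch below is an assumption-laden illustration: the helper is_nondecreasing and the four example functions are hypothetical, and evaluating known functions on a grid is not the empirical monotonicity procedure applied to observed data in the article.

```python
import math


def is_nondecreasing(irf, grid, tol=1e-12):
    """True if irf never decreases as theta moves up through the grid."""
    values = [irf(theta) for theta in grid]
    return all(later >= earlier - tol for earlier, later in zip(values, values[1:]))


# Grid of theta values from -4.0 to 4.0 in steps of 0.1.
grid = [i / 10 for i in range(-40, 41)]


def step(theta):
    """Guttman-style step function."""
    return 0.0 if theta < 0 else 1.0


def piecewise_linear(theta):
    """Linear in the middle, flat at 0 and 1 in the tails."""
    return min(1.0, max(0.0, 0.5 + theta / 6))


def logistic_with_asymptotes(theta):
    """Logistic curve with lower asymptote 0.1 and upper asymptote 0.9."""
    return 0.1 + 0.8 / (1.0 + math.exp(-theta))


def bump(theta):
    """Rises and falls over the grid, so it violates the restriction."""
    return 0.5 + 0.4 * math.sin(theta)


if __name__ == "__main__":
    for irf in (step, piecewise_linear, logistic_with_asymptotes, bump):
        print(irf.__name__, is_nondecreasing(irf, grid))
```

The first three functions differ in form but all satisfy the restriction, whereas the last one does not because its probability falls over part of the range.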