Advances in Social Sciences Research Journal – Vol.7, No.7

Publication Date: July 25, 2020

DOI:10.14738/assrj.77.8562.

Albaqshi, J. H. (2020). Amount of Vocabulary Required to Grasp Comprehension in a Text. Advances in Social Sciences Research Journal,

7(7) 33-42.

Amount of Vocabulary Required to Grasp Comprehension in a Text

Jalal H. Albaqshi

General Studies, Alahsa College of Technology,

Alahsa, Saudi Arabia

ABSTRACT

This study addresses the relationship between vocabulary and reading
comprehension and identifies other factors, besides vocabulary, that
affect comprehension. A descriptive analysis and a correlation test were
applied to show why some L2 learners cannot grasp the gist of a text
even though they are familiar with much of its vocabulary. The sample
group in this study consisted of 64 male Saudi learners. They were
between 18 and 22 years old and had been studying English in formal
settings for six years, receiving moderate rather than intensive input.
This study highlights the percentage of vocabulary covered in a text,
rather than the learner’s whole reservoir of vocabulary, as the basis
for understanding the general ideas of a text. Results show that L2
learners need to be familiar with at least 62% to 71% of the vocabulary
in a text to grasp the gist of the whole passage. This result yields
several implications for researchers and teachers of English as a second
and foreign language.

Keywords: streaming vocabulary, keyword vocabulary, reading

comprehension.

INTRODUCTION

Reading comprehension is mutually interwoven with vocabulary knowledge. Studying the

relationship between understanding a text and the amount of known vocabulary is fundamental to

realizing how one affects the other, if at all. In fact, vocabulary is a major constituent of reading

comprehension, while reading is considered a primary source of vocabulary acquisition. This

relationship leads to a common, logical belief which implies the more vocabulary a learner knows,

the better he/she is able to understand a text. However, the process of linguistic development is

more complicated than word recognition and understanding. Several factors can also play roles in

communicating with a written text. This fascinating relationship adds viability for both linguistic

components to contribute to linguistic competence. Also, other influential aspects in the

relationship between comprehension skills and vocabulary knowledge may be considered to

overcome some of the challenges of reading comprehension. When Scrivener (2009) discussed

reasons for difficulties with reading in a foreign language, he mentioned lack of vocabulary, slow

reading, and inability to understand all vocabulary. Essentially, according to Scrivener, the

vocabulary component is crucial for the entire reading process. It is important to consider the

possibility that the words of a text may be familiar for a learner, but he/she may still be unable to

understand the whole message. This is related to structural and pragmatic aspects which can be

mastered gradually at higher levels of acquisition. Some other factors can be added as influential:

critical thinking development, cognitive aspect, and schemata. All mentioned factors play

considerable roles in the effectiveness of processing texts to produce logical conclusions which

match the author’s intended message.

A focal issue that should be considered when examining the relationship between reading and

vocabulary is the following question: Should researchers highlight the amount of vocabulary

recognized (known) in a text by an L2 learner or the total amount of vocabulary an L2 learner knows?

Richards (2002) discussed dilemmas for second language reading instruction. One dilemma directly

related to our highlighted issues in this study is the many different contexts and genre structures

that L2 learners encounter. This dilemma drives us to emphasize vocabulary recognition percentage

in a text rather than the size of a learner’s whole reservoir of vocabulary. Let us suppose a learner

who has sufficient vocabulary to be fluent in English reading is exposed to a text such as an article

in an economics magazine. Would this learner understand the text as well as other general

English texts? Most likely not, because it would contain more complicated specialized

terms and vocabulary which belong to a certain genre.

LITERATURE REVIEW

The relationship between vocabulary size and reading comprehension is a common topic which has

been attracting researchers for more than two decades. For example, the focus of a study conducted

in Malaysia by Kameli and Baki (2013) was the relationship between vocabulary breadth/size and

reading comprehension. The methodology used involved administering a vocabulary test designed

by Nation and an IELTS reading test. The data were analyzed in terms of correlation coefficient (r),

mean, and p-value, and the results showed a highly statistically significant relationship between

vocabulary familiarity and reading comprehension. Another study conducted by Farvardin and

Koosha (2011) involved the examination of the relationship between depth and breadth of

vocabulary and reading comprehension. They administered a reading comprehension test and

vocabulary tests at the 1000, 3000, and 10000 levels. The results showed a positive relationship

between the two aspects of breadth and depth and reading comprehension.

Another study was conducted by Gue and Roehrig (2011), who examined the roles of metacognitive

awareness of reading strategies, syntactic awareness in English, and English vocabulary knowledge

in the English reading comprehension of Chinese-speaking university students (n = 278). The

results suggested that a two-factor model of a General Reading Knowledge factor (metacognitive

awareness employed during the English reading process) and a Second Language (L2) Specific

Knowledge factor (comprising vocabulary knowledge and syntactic awareness) offered the best fit

to the data; 87% of the variance in reading comprehension was explained by the two factors

together. L2 Specific Knowledge was a stronger predictor of reading comprehension than

metacognitive awareness. A multigroup analysis was conducted using structural equation modeling

to compare poor-reader and good-reader groups. The correlation between L2-specific knowledge

and metacognitive awareness and their relation to reading comprehension was the same across

groups. It should be noted that most studies involved examination of each learner’s vocabulary

reservoir rather than the recognized words of a text. This draws attention toward a learner’s

reading levels rather than his/her focus on a text and its content in relation to his/her cognitive

communication via written codes. A group of studies, such as those conducted by Pang (2008),

Kameli and Baki (2013), and Laufer (2010), highlighted vocabulary size rather than text coverage

to gauge each reader’s understanding of a text they had read. Nation (2006), in one of his articles,

focused on vocabulary size but also mentioned text coverage for comprehension.

RESEARCH METHODOLOGY

A descriptive analysis showing the data distribution in a four-way classification was used for this
study. After data collection, students’ responses were classified into four areas: area a, area b,
area c, and area d. These areas are displayed in a chart below to show the effect of vocabulary
size on reading comprehension. We hypothesized that our results would indicate a strong
relationship between vocabulary and reading comprehension; therefore, a correlation test is
included in the results. In addition, the linear relationship between the two variables is
displayed.

Research Questions

1. What is the minimum percentage of familiar streaming words and keywords needed to comprehend a

text?

2. Why do some learners have low comprehension levels though they have high vocabulary

recognition?

DATA COLLECTION AND SAMPLING

The sample group in this study consisted of 64 male Saudi learners. They were between the ages of

18 and 22 and had been studying English in formal settings for six years in the form of

moderate doses of input but not intensive English. The sampling method used for the study was

systematic sampling — every individual was a potential participant. Data was collected with the aid

of a test which consisted of a short passage, and students were asked to underline each word that

was unknown to them. Then they read the whole passage and did their best to understand as much

as they could. During the final stage, they were asked to write in their own language what they

understood from the text.
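The scoring step implied by this procedure can be sketched as follows. This is a minimal illustration, not the instrument used in the study: the function name `coverage_percent` and the word counts in the usage line are hypothetical, and the sketch assumes that vocabulary recognition is scored as the share of words in the passage that a learner did not underline.

```python
def coverage_percent(total_words: int, unknown_words: int) -> float:
    """Share of the passage's words that the learner recognized, in percent.

    Assumption: every word the learner did not underline counts as known.
    """
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    if not 0 <= unknown_words <= total_words:
        raise ValueError("unknown_words must be between 0 and total_words")
    return 100.0 * (total_words - unknown_words) / total_words


# Hypothetical example: a 120-word passage with 42 underlined (unknown) words.
print(coverage_percent(120, 42))
```

Under these assumed counts the learner recognizes 78 of 120 words, which can then be compared with the coverage band the study reports.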

RESULTS

Figure 1 below shows the results of the participants’ responses. The graph is divided into four areas:

area a, area b, area c, and area d. As explained in Table 1, area a includes the students who had low

understanding and recognized only a few words from the text. In contrast, area b

includes respondents with both high vocabulary recognition and high understanding. Areas a and b

represent the logical concept that a larger vocabulary leads to greater comprehension. Although

areas c and d constitute only 27% of the participants, they represent problematic and

questionable results (high vocabulary recognition with low understanding, and vice versa).

Figure 1

Table 1

Area | Learners’ results | Percentage | Factors affecting
A | Low vocabulary + Low understanding | 60% | Lexical factor; linguistic intelligence
B | High vocabulary + High understanding | 11% | Cognitive factor
C | High vocabulary + Low understanding | 15.8% | Structural aspect; pragmatic aspect
D | Low vocabulary + High understanding | 12.6% | Inference; guessing; reflective learners

Table 1 shows the classification of respondents and their percentages as follows: area a (60%), area
b (11%), area c (15.8%), and area d (12.6%). Most of the respondents are concentrated in area a, and
the lowest percentage of respondents falls into area b. The observations analyzed here are those
that show an inverse relationship, as illustrated in Figure 2 below. In Figure 2, three highlighted
observations (cases of learners) are worth discussing: observations a, b, and c. There are other
observations in similar situations, but none is as noticeable as these three. All three show an
inverse relationship between vocabulary size and comprehension. In observation a, a learner is
familiar with 80% of the vocabulary of the text but understood only around 32% of the text. This
leads us to consider effects on understanding beyond vocabulary. Observation b is in a situation
similar to a. Observation c, however, shows high understanding (78%) but low vocabulary
recognition (25%).
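The four-area classification can be expressed as a small decision rule. The sketch below is illustrative only: the paper does not state the cut-off that separates "low" from "high", so the 50% threshold, the function name `classify_area`, and the parameter names are assumptions made for this example.

```python
def classify_area(vocabulary: float, understanding: float,
                  cutoff: float = 50.0) -> str:
    """Map a learner's two percentages to area A, B, C, or D (Table 1).

    Assumption: scores at or above `cutoff` count as "high"; the 50% default
    is hypothetical and is not taken from the study.
    """
    high_vocab = vocabulary >= cutoff
    high_understanding = understanding >= cutoff
    if high_vocab and high_understanding:
        return "B"  # high vocabulary + high understanding
    if high_vocab:
        return "C"  # high vocabulary + low understanding
    if high_understanding:
        return "D"  # low vocabulary + high understanding
    return "A"      # low vocabulary + low understanding


# The two cases quantified in the text: observation a (80% vocabulary,
# 32% understanding) and observation c (25% vocabulary, 78% understanding).
print(classify_area(80, 32))  # C
print(classify_area(25, 78))  # D
```

With any cut-off between 33% and 78%, these two cases land in areas C and D respectively, so the result for them does not depend on the exact assumed threshold.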

[Figure 1 plot: axes Vocabulary and Understanding; regions Area A, Area B, Area C, and Area D]

Figure 2: Relationship between vocabulary and understanding (vocabulary and understanding plotted per participant, with observations A, B, and C marked)

Table 2: Correlation results among comprehension score, stream words and keywords

Dependent variable (DV) | Independent variable (IV) | Pearson correlation | P-value
Comprehension score | Stream vocabulary recognized | 0.632 | 0.000
Comprehension score | Keyword recognized | 0.664 | 0.000

When the Pearson correlation was computed, as the results in Table 2 indicate, a relatively
considerable correlation was identified: 0.664 between keyword recognition and comprehension, and
0.632 between streaming word recognition and comprehension.
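For readers who want to reproduce this kind of analysis, the Pearson coefficient can be computed as in the sketch below. It is a plain implementation of the standard formula, not the software used in the study; the function name `pearson_r` is an assumption, and the two short lists are invented for illustration and are not the study's data.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical percentages for five learners (NOT the study's data).
keyword_recognition = [20.0, 35.0, 50.0, 65.0, 80.0]
comprehension_score = [15.0, 40.0, 45.0, 60.0, 85.0]
print(round(pearson_r(keyword_recognition, comprehension_score), 3))
```

A coefficient near 1 indicates a strong positive linear relationship; the values in Table 2 (0.632 and 0.664) would be obtained by applying the same formula to the participants' actual scores.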

DISCUSSION

The results of this study indicate that there are four cases regarding the relationship between the

amount of vocabulary a learner recognizes in a text and his/her level of understanding of the text.

Some learners show high understanding as a result of high recognition of the vocabulary in the text.

Such recognition enables learners to decode the message of a text; however, this group of learners

constituted only 11% of the sample (as seen in Table 1). In contrast, 60% of participants could not

decode the message effectively due to the lack of vocabulary mastery. Up until this point, we are at

the logical baseline because more vocabulary facilitates greater comprehension as words are a basic

part of the linguistic components in a text. However, the learners with high vocabulary recognition

and low understanding present a more speculative and questionable case. This can be caused by

cognitive factors, automaticity, and a failure to employ reading strategies. Stephen D. Krashen

(1987) suggested that, in order to help a learner to progress in language learning, one should

provide comprehensible input. This can be achieved in different ways:

1. A: slower rate and clearer articulation

2. B: more use of high frequency vocabulary

3. C: syntactic simplification (shorter sentences)

Upon examination of the text used as the test in this study, it can be said that it conforms to
Krashen’s suggestions: it was presented in written form, and learners were given ample time and
were never asked to hand in their papers. Also, the text was carefully selected to contain general
and high frequency vocabulary (colors). After the test was over, the researcher made sure that the
topic was familiar to the learners and that they already knew, in their native language, the
information mentioned in the text. Therefore, students in areas a and b (as shown in Table 3) met
Krashen’s conditions.

Table 3

Area | Learners’ results | Percentage
A | Low vocabulary + Low understanding | 60%
B | High vocabulary + High understanding | 11%
C | High vocabulary + Low understanding | 15.8%
D | Low vocabulary + High understanding | 12.6%

Based on the input hypothesis, Krashen (2003) asserted that L2 learners need comprehensible input to
progress with second language acquisition, and the input should be not only comprehensible but
also sufficient, as in the formula “i + 1 is present”. This hypothesis motivates this study and gives
it its significance: researchers and English teachers need to know how much familiar vocabulary a
text should contain in order to help a learner develop his/her reading skills. Full familiarity
with the words in a target text would contradict the i + 1 hypothesis, while a text with too many
unknown words would lead to deficient comprehension, frustration, and eventually failure. This
gives the study its value in one big question: How much vocabulary recognition is needed in order
to comprehend a text? Based on the results of this study, a learner needs to be familiar with at
least 62% to 71% of the vocabulary in a text to obtain a fair comprehension of its main ideas and
general concepts along with a few details. This result invites further speculation related to
Krashen's input hypothesis. Can we suppose that i + 1 means that a text is introduced to a learner
who is familiar with between 55% and 60% of its vocabulary and grammatical rules? It is true that
comprehensible input facilitates learning, but other factors must be present in order to put
together the pieces of the puzzle. One of these factors is learning style, such as field-dependent
and field-independent processing. Synthesizing parts into a whole picture and understanding the
relationships among scattered pieces in a randomly formed pattern can be a trait of the
field-independent (FI) learner described by Brown (2002), while a field-dependent (FD) learner sees
the whole picture and barely analyzes smaller pieces within the entire frame. FI learners are more
likely to use guessing and inference in bottom-up processing while reading, whereas FD learners
may struggle to decode messages without enough vocabulary.

Therefore, participants who had a higher understanding of the text with less vocabulary recognition
might be FI learners. They may process a text inductively, moving from specific to general: from
word to phrase, and ending with sentences and texts. This is consistent with the sequence of
language learning development suggested by Pienemann and Håkansson (1999), who explained the
order in which processing skills develop in language learning. This sequence includes five stages:

1. Lemma: Words are processed but do not yet carry any grammatical information. Our

participants from areas a and c can be categorized in this stage and may barely pass it.

2. Category procedures: Lexical items are categorized, and grammatical representation may

be added. Our participants from area b can be categorized in this stage.

3. Phrasal procedures: Operations at phrase level occur.

4. S-procedure: Grammatical information may be exchanged across phrase boundaries. Our

participants from area d can be categorized in this stage due to their advanced linguistic

development.

5. Clause boundary: Main and subordinate clause structures may be handled differently (as

cited in Troike, 2006, p. 77).

Background knowledge and understanding the world (schemata) can be another factor which

enhances greater comprehension. Horwitz (2008) explains that approaches to reading can be

classified into two types: phonics-based and whole language approaches. A phonics-based

approach depends on bottom-up processing while whole language approaches emphasize top-down

processing by using background knowledge to understand the smaller pieces of a text and details.

It can be logically stated that, with a foreign language, both strategies are useful according to the

level of the learners. Beginners utilize a phonics-based approach more than any other strategies.

But a schema-based (whole language) approach is often better suited to adults since they have more

life experience and knowledge, as with our study participants. Brown (1994) clarified the

distinction between bottom-up and top-down processing of data: Bottom-up processing that starts

from smaller linguistic units toward larger sentences and paragraphs needs sophisticated

knowledge of the language itself to perceive the data given. This can answer the question of why

some learners have low comprehension though they have good vocabulary recognition. They need

to be competent in perceiving data and form a global concept based on the small segments of a text.

As Christine Nuttall (1996, pp. 16-17) described, the distinction between them is as follows: bottom-up is comparable to a microscope view while top-down can be like an eagle’s view of a landscape.

Since our participants were in the range of beginners, they used the bottom-up method when they

read so they could try to understand the text from small words and pieces of linguistic

representations. For some of them, this may have impeded the flow of the holistic picture of the

message in a text.

In Pienemann and Håkansson’s sequence, words are an initial stage which, by itself, cannot be

considered as a generator of comprehensive understanding of a message within a text. This

sequence explains our results, especially in areas a and b in which more vocabulary acquisition led

to greater understanding. However, the process should move into the second stage, in which the

grammatical aspect is involved, to reach stage three for a phrasal representation. After that comes the

sentence level, which builds on words connected with the aid of certain semantic and pragmatic

awareness. This linkage was affirmed by Jeffrey Elman and his colleagues (as cited in Lightbown &

Spada, 2006, p. 23). They explained language acquisition in terms of links and connections between

words and phrases and where they occur. These links and connections are significant initial steps

to comprehension. Cognitive ability is required to triangulate the process from one word to another

word to an actual life situation. Exercising these abilities of cognition and of linking linguistic segments to

formulate a meaning was an obstacle for those with low understanding and high vocabulary

recognition. Lightbown and Spada (2006) elucidated the connectionist model in which language

acquisition is not just a process of associating words and phrases with reality but also a process of

how words and phrases are associated and related to each other (p. 24). We can add to this

triangulation the syntactic and pragmatic aspects, which offer more opportunities for guessing

strategies and for drawing conclusions. Therefore, a learner needs to employ these aspects while reading a text.

Moreover, the level and type of a text play considerable roles in learning development. Indeed, there

should be a concordance between a learner’s cognitive abilities and the level of the input provided

with which to communicate with a text. In addition to age and level of a text, automaticity is

significant in learning and development processes. This is related to paying attention and time spent

in learning a language. According to the cognitive psychologist Norman Segalowitz (as cited in

Lightbown & Spada, 2003, p. 39), an information processing model of linguistic

development is based on noticing and “paying attention” to aspects of linguistic representation. However,

there is a limit to how much information a learner can pay attention to. Linking

information into a single idea therefore cannot be achieved at a given level of development until

learners reach a higher level of attention and processing as they proceed toward automaticity.

Automaticity involves, according to Brown (1994), “a timely movement of the control of a few

language forms into the automatic, fluent processing of a relatively unlimited number of language

forms” (p. 64). This explains those participants who achieved low scores in comprehension but

recognized a considerable amount of vocabulary (area c). It is worth mentioning one more factor in

which, as Brown described, learners are either reflective or impulsive in their learning styles. The

former depends on systematic steps in reading and performing mental tasks, while the latter is a

risk-taker and depends on guessing. Reflectivity results in accuracy but slower performance, so risk

takers may achieve more even if they are not very accurate in their performance.

Reading comprehension requires not only vocabulary recognition but also reading strategies and

skills to be implemented while communicating with a text. Lindsay and Knight (2006) listed reading

skills learners need to develop. Here, we will examine some focal ones which affected our

participants’ results. They are as follows:

1. Understanding the relationship between sentences. This skill was focal and essential in

our study because it acted as a rationale for the area c participants depicted in Table 3 (High

vocabulary + Low understanding 15.8%). These students had good vocabulary recognition

but low comprehension. Conceiving a relationship between sentences and knowing how a

sentence could affect the meaning of a subsequent or preceding one is an integral part of

grasping the frame of a text’s message.

2. Guessing meaning. In fact, two areas in Table 3 were affected by this skill. Learners from

areas a and d demonstrated low vocabulary recognition but varied in terms of understanding

level. Only 12% were able to guess the meaning of words, while 60% were unable to skillfully

guess words from context.

3. Background knowledge of the text. The researcher made sure that all students were

provided with sufficient background on the text. He discussed the topic with participants

before the text and discovered that they had mastered the concept of the text.

A question that is always asked is: why are some learners more successful than others? As stated by

Troike (2006), part of the answer may be embedded in learners’ psychological differences. Other

differences include age, sex, aptitude, cognitive style, and motivation. Highlighting age and aptitude

would be related to our research. Although the participants in this study are adults aged 18 to 22

with the benefit of analytical abilities, pragmatic skills, and real-world knowledge, our results show