Is It a Professional Career or Job? Using Sketch Engine to Investigate Synonyms

As an important yet intricate linguistic feature of the English language, synonymy poses a great challenge for second language learners. Using the 100-million-word British National Corpus (BNC) as data and the software Sketch Engine (SkE) as an analytical tool, this article compares the usage of career and job through analyses of concordances, collocations, word sketches and sketch differences. The results show that the different functions of SkE each contribute in a different way to discriminating between career and job. The pedagogical implications of the findings are also discussed.


INTRODUCTION
English is particularly rich in synonyms for historical reasons, which enables English speakers "to convey meanings more precisely and effectively for the right audience and context" (Liu & Espino 2012: 198). But synonyms also constitute a thorny area for EFL (English as a Foreign Language) learners because of their subtle nuances and variations in meaning and usage. Synonyms are not completely interchangeable: they differ in shades of meaning and vary in their collocations. Collocations are inaccessible to a speaker's conscious introspection (Hunston 2002: 142; Louw 1993: 173; Partington 1998: 68). However, with the development of corpora and corpus analysis tools, collocations have been addressed much more easily and frequently by linguists (Hunston 2002; Louw 1993, 2000; Partington 1998; Schmitt & Carter 2004; Sinclair 1991; Stubbs 1995, 1996, 2001; Xiao & McEnery 2006).
The paper is structured as follows. Section 2 gives an overview of related work by introducing corpus studies of collocation and their relevance to the study of synonyms. Section 3 introduces the corpus data and tools used in this study. The results are presented and analyzed in Section 4, where we show how Sketch Engine can be used to research synonyms. The final section summarizes the major findings and pedagogical implications of this study.

RELATED WORK

Corpus studies of lexical semantics
The approach of using corpus evidence to study the meaning of words or phrases is often labeled corpus semantics or empirical semantics, and its most active and influential scholars are called neo-Firthian corpus linguists. The leading figure is John Sinclair, who may well have been one of the first people to bring Firth's ideas together with a corpus linguistic methodology. Reading concordances and calculating collocates from a corpus are two important ways to study a lexical item in its context (Sinclair 1991). The concordance is the basic tool for anyone working with a corpus. Long before the emergence of corpus linguistics, concordances to major works such as the Bible and Shakespeare were already available. With the help of computers, concordances are much easier to compile. For Sinclair (1991: 32), "A concordance is a collection of the occurrences of a word-form, each in its own textual environment. In its simplest form, it is an index. Each word-form is indexed, and a reference is given to the place of each occurrence in a text." In corpus linguistics, a simple and effective convention called KWIC (Key Word In Context) has been widely used.
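The KWIC convention can be illustrated with a minimal sketch. It is not how SkE builds concordances; the function name kwic, the whitespace tokenisation and the sample text (adapted from two corpus examples quoted later in this paper) are all assumptions made for illustration.

```python
def kwic(text, node, width=4):
    """Return (left, keyword, right) triples for each occurrence of `node`.

    Tokenisation is a plain whitespace split, which is an assumption made
    for brevity; a real concordancer would use a proper tokeniser.
    """
    tokens = text.split()
    hits = []
    for i, token in enumerate(tokens):
        # Match case-insensitively, ignoring punctuation attached to the token.
        if token.lower().strip(".,;:!?'\"") == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


# Hypothetical two-sentence sample, adapted from examples (9) and (10).
sample = (
    "She was in a position to resume her golfing career. "
    "The teacher's career was ruined."
)
for left, keyword, right in kwic(sample, "career", width=3):
    # Right-align the left context so the keyword lines up in a column.
    print(f"{left:>25}  {keyword}  {right}")
```

Aligning the keyword in a central column is what makes recurrent patterns to its left and right visible at a glance.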
Closely related to concordance is the notion of collocation. Collocation has been studied for at least five decades. Collocation was first used as a technical term by Firth (1957) when he said 'I propose to bring forward as a technical term, meaning by collocation, and apply the test of collocability' (Firth 1957: 194). According to Firth (1968: 181), 'collocations of a given word are statements of the habitual or customary places of that word.' Firth's research on collocation, however, is largely intuition-based. It is in sharp contrast with most corpus linguists' belief that the only way to reliably identify the collocates of a given word is to study patterns of co-occurrence in a corpus. For example, Hunston (2002, p. 68) argues, 'Collocation may be observed informally in any instance of language, but it is more reliable to measure it statistically, and for this a corpus is essential.' Sinclair operationalized the idea of Firth, proposing that a collocation is a co-occurrence pattern that exists between two items that frequently occur in proximity to one another, but not necessarily adjacently or, indeed, in any fixed order. Node and collocates are two notions closely related to collocation. A node is an item whose total pattern of co-occurrence with other words is under examination; and a collocate is any one of the items which appears with the node within a specified span (Sinclair et al., 2004, p. 10).
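The notions of node, collocate and span can be made concrete with a short sketch. It only counts raw co-occurrences inside a fixed window and applies no statistical association measure; the function name collocates and the toy sentence (adapted from example (36) below) are assumptions for illustration.

```python
from collections import Counter


def collocates(tokens, node, span=4):
    """Count every word that appears within `span` tokens of `node`.

    Collocates are taken from both sides of the node, so they need not be
    adjacent to it or occur in a fixed order.
    """
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        # Window = `span` tokens to the left plus `span` tokens to the right.
        window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
        counts.update(window)
    return counts


# Hypothetical toy input; a real study would run over the whole corpus.
tokens = "he began his career in journalism before landing a job with the standard".split()
print(collocates(tokens, "career", span=2))
print(collocates(tokens, "job", span=2))
```

In practice the raw counts would be passed to an association measure, since frequent function words otherwise dominate any list of collocates.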

Corpus-based studies of synonyms
In this paper, synonyms refer to lexical pairs that have very similar cognitive or denotational meanings, but which may differ in collocations. Synonymous words, therefore, are not collocationally interchangeable (Tognini-Bonelli 2001: 34). For example, Halliday (1976: 73) observed that although strong and powerful share similar denotational meanings, tea is typically described as strong rather than powerful, whereas a car is more likely to be described as powerful than strong. Gilquin (2003) investigates the difference between the English causative verbs get and have. Glynn (2007) compares intra- and extralinguistic factors in the contexts of hassle, bother and annoy. Gries & Otani (2010) study the synonyms big, great and large and their antonyms little, small and tiny. Other sets of synonyms that have attracted attention include strong and powerful (Church et al. 1991), absolutely, completely and entirely (Partington 1998), big, large and great (Biber et al. 1998), quake and quiver (Atkins & Levin 1995), principal, primary, chief, main and major (Liu 2010), and actually, genuinely, really, and truly (Liu & Espino 2012).

METHOD

Corpus Data: BNC
The British National Corpus (BNC) is a 100 million word collection of samples of written and spoken language from a wide range of sources, designed to represent a wide cross-section of British English from the later part of the 20th century, both spoken and written (Aston & Burnard 1998). The written part of the BNC (90%) includes, for example, extracts from regional and national newspapers, specialist periodicals and journals for all ages and interests, academic books and popular fiction, published and unpublished letters and memoranda, school and university essays, among many other kinds of text. The spoken part (10%) consists of orthographic transcriptions of unscripted informal conversations and spoken language collected in different contexts, ranging from formal business or government meetings to radio shows and phone-ins.
The BNC is monolingual, synchronic, general and sample-based: it deals with modern British English; it covers British English of the late twentieth century; it includes many different styles and varieties rather than being limited to any particular subject field, genre or register; and it contains many samples, which allows for wider coverage of texts within the 100-million-word limit. The corpus is encoded according to the Guidelines of the Text Encoding Initiative (TEI) to represent both the output from CLAWS (an automatic part-of-speech tagger) and a variety of other structural properties of texts (e.g. headings, paragraphs, lists etc.). Full classification, contextual and bibliographic information is also included with each text in the form of a TEI-conformant header.

Corpus Tool and Analysis Procedure
The Sketch Engine (SkE) is a leading corpus tool, widely used in lexicography, language teaching, translation and the like (Kilgarriff et al. 2004). It actually refers to two different things: the software, and the web service. The web service includes, as well as the core software, a large number of corpora pre-loaded and 'ready for use', and tools for creating, installing and managing users' own corpora. Corpora in SkE are often annotated with additional linguistic information, the most common being part of speech information (for example, whether something is a noun or a verb), which allows large-scale grammatical analyses to be carried out. SkE has a number of core functions: Thesaurus, Wordlist, Concordance, Collocation, word sketches, and Sketch Diff.

RESULTS AND ANALYSIS

Synonyms of Career
In SkE the automatic identification of synonyms is carried out by the Thesaurus tool. SkE prepares a 'distributional thesaurus' for a corpus, that is, a thesaurus created on the basis of shared collocates. Two words will appear in each other's thesaurus entry if they have many collocates in common. For instance, if we find examples of both professional career and professional job, that is some evidence that the two nouns career and job are similar: the two words share the collocate professional in the modifier relation. In a very large computation, SkE computes for all pairs of words how many collocates they share. The words which share the most collocates with a given word are the ones that appear in its thesaurus entry. The thesaurus entry for the noun career is shown in Figure 1. From Figure 1, we can see that the word that shares the most collocates with career is job. In the following sections, we compare the usage of career and job.
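The idea of ranking words by the collocates they share can be sketched as follows. This is a simplified illustration rather than SkE's actual computation: it counts plain overlap of (relation, collocate) pairs, and the three toy profiles and the function name thesaurus_entry are assumptions, not BNC data.

```python
# Toy collocational profiles: each word maps to a set of
# (grammatical relation, collocate) pairs. These are illustrative only.
profiles = {
    "career": {
        ("modifier", "professional"),
        ("modifier", "academic"),
        ("object_of", "begin"),
        ("object_of", "pursue"),
    },
    "job": {
        ("modifier", "professional"),
        ("modifier", "academic"),
        ("object_of", "begin"),
        ("object_of", "lose"),
    },
    "tea": {
        ("modifier", "strong"),
        ("object_of", "drink"),
    },
}


def thesaurus_entry(word, profiles):
    """Rank all other words by the number of collocates shared with `word`."""
    scores = [
        (other, len(profiles[word] & profiles[other]))
        for other in profiles
        if other != word
    ]
    # Words sharing the most collocates come first.
    return sorted(scores, key=lambda pair: -pair[1])


print(thesaurus_entry("career", profiles))
```

A collocate only counts as shared when it occurs in the same grammatical relation, which is why the pairs carry the relation name as well as the word.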

The Frequencies of Career and Job
The concordance function of SkE enables researchers to compare the frequencies of synonymous words. As shown in Table 1, job is more than three times as frequent as career.

The Collocates of Career and Job
Of the 50 dominant collocates of career, 23 (46%) are nouns: prospect, opportunity, officer, stage, career, progression, beginning, end, advancement, development, structure, path, guidance, graduate, choice, playing, training, acting, racing, artist, education, promotion and football. Some of these nouns are positive in meaning, such as prospect, opportunity, progression, advancement, development, guidance and promotion. When these positive collocates occur with career, the meanings of the occurrences are positive, as in (1) to (3). The remaining nouns are neutral, which renders the meanings of the occurrences neutral too, as in (4) to (6).
(1) I commend her to you without reservation - she is an outstanding professional with excellent career prospects, and will be an asset to any library authority.
(2) In Britain also, several types of paraprofessional training programmes have been developed that provide useful avenues for career advancement.
(3) It is the policy of the Group to afford disabled persons full and fair consideration for employment and subsequent training, career development and promotion on the basis of their aptitudes and abilities.
(4) 'I used to know Wapping well at one stage of my career,' Devlin said.
(5) A typical career path might be a young European starting in the hotel industry as a management trainee, gaining experience in a variety of establishments in the far East and Europe, then progressing from, say, a major hotel chain through the food and beverage side to a management position at the top end of the leisure market.
(6) When a football career ends and reality tackles back, many footballers are attracted like a magnet to the world they know best.
10 out of 50 (20%) collocates of career are verbs: pursue, begin, embark, span, start, launch, ruin, resume, choose and spend. When pursue, begin, embark, span, start, launch and resume collocate with career, the extended contexts render the meanings of the occurrences favorable, as in (7) to (9). When ruin collocates with career, however, the meanings of the occurrences are negative, as in (10). Choose and spend are neutral, and when they collocate with career the meanings of the occurrences are neutral, as in (11).
(7) As a center of philosophical activity, Edinburgh remains at the forefront in the British Isles, and many of its postgraduates have gone on to pursue academic careers around the world.
(8) Tony Rudd retires this month after a distinguished career spanning 53 years - 13 at Rolls-Royce, 18 at BRM and 22 at Lotus.
(9) The two had a son, Neville, and it was not until after the war years that she was in a position to resume her golfing career.
(10) He tells of one teacher who was interviewed by police for 12 hours, and then suspended from work for six months after one of his pupils accused him of touching her up. Eventually the charges were dropped, but the teacher's career was ruined.
(11) The answers to these questions will have important consequences for anyone who is about to choose a career or a potential employer.
11 out of 50 (22%) collocates of career are adjectives: distinguished, successful, academic, professional, moral, early, political, entire, promising, future and subsequent. Some adjectives are positive, such as distinguished, successful, professional and promising. When these positive words collocate with career, the meanings of the occurrences are pleasant, as in (12) and (13). The remaining adjectives are neutral, which renders the meanings of the occurrences neutral, as in (14) and (15).
(12) JOAN CROSS can be said to have had two distinguished performing careers: pre-war and post-war.
(13) Building on the TV success that rocketed her to overnight fame, she is set to make TWO movies and is launching a promising pop career.
(14) I used to think about an academic career when I was a student. That was before I got married, of course.
(15) Cubby (seated centre) spent his entire career in Dundee, starting as an apprentice at Westport Branch.
The remaining 6 (12%) collocates of career are function words: throughout, during, his, whose, my and their. When these words are used with career, the meanings of the occurrences are neutral, as in (16).
(16) Crawford explained, 'Throughout my acting career, I have always taken notes of myself from directors and from actors.
From the above analysis, we can see that of the 50 collocates of career, 36% are positive, 62% are neutral and 2% are negative.
As is shown in Table 3, the dominant collocates of job can be grouped into four grammatical categories: nouns, verbs, adjectives and function words. 16 out of 50 (32%) collocates of job are nouns: loss, job, description, satisfaction, creation, worker, training, people, opportunity, thousand, industry, security, work, manager, hundred and part. Of these nouns, several are positive, such as satisfaction, creation, opportunity and security. These positive collocates render the meanings of the occurrences pleasant, as in (17) and (18). Some are neutral, such as job, description, worker, training, people, thousand, industry, work, manager, hundred and part. When these neutral collocates occur with job, the meanings of the occurrences are also neutral, as in (19) and (20). The unpleasant meaning of loss, however, makes the meanings of the occurrences negative, as in (21).
Copyright © Society for Science and Education, United Kingdom
(17) The final system design is evaluated on the basis of job satisfaction of those working on it as well as its efficiency.
(18) Graduates enjoy greater job opportunities than those entering employment direct from school.
(19) Erm as to the assertion that Harrogate wants office jobs and not industrial jobs, and I think the main point there is that we we're simply trying to achieve jobs to meet the needs of our resident workforce.
(20) We need more money to improve transport in London, and provide jobs where people need them.
(21) However White and Mackay has ruled out any major job losses from the takeover.
15 out of 50 (30%) collocates of job are verbs: lose, create, get, do, offer, cut, find, apply, take, give, pay, interview, keep, finish and want. Of these verbs, some are neutral in meaning (create, get, offer, find, take, give and keep), but the extended contexts render the meanings of the occurrences positive, as in (22) and (23). The meanings of lose and cut, on the other hand, are negative, and when these two words occur with job, the meanings of the occurrences are unfavorable, as in (24).
(22) Speaking at the launch, party leader John Smith said that adapting to environmental changes could create 700,000 jobs and generate business worth £140,000 million by the end of the century.
(23) She found her father a job - as the internal postman.
(24) The point that this motion makes is to try to make action, the facilitating attitudes over there is gonna lead to five hundred people losing their jobs in April.
11 out of 50 (22%) collocates of job are function words: their, my, your, for, a, whose, because, would, have, his and will. These function words are all neutral. When they collocate with job, the meanings of the occurrences are neutral, as in (27).
(27) I think most of the issues that have been raised tonight already, are ones for the police authority and I hope they will get on with their job.
From the above analysis, we can see that of the 50 collocates of job, 26% are positive, 68% are neutral and 6% are negative.
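As a check on the arithmetic, the proportions reported earlier for career (36% positive, 62% neutral, 2% negative) can be re-derived from the collocate lists given in the text. The sketch below is only a tally: the classification into positive and negative follows the paper's own judgements, and the function name prosody_shares is an assumption. The same function could be applied to the collocates of job.

```python
# Collocates of "career" as listed in the text, by grammatical category.
nouns = (
    "prospect opportunity officer stage career progression beginning end "
    "advancement development structure path guidance graduate choice playing "
    "training acting racing artist education promotion football"
).split()
verbs = "pursue begin embark span start launch ruin resume choose spend".split()
adjectives = (
    "distinguished successful academic professional moral early political "
    "entire promising future subsequent"
).split()
function_words = "throughout during his whose my their".split()
all_collocates = nouns + verbs + adjectives + function_words

# Collocates whose occurrences the paper describes as positive or negative.
positive = set(
    "prospect opportunity progression advancement development guidance "
    "promotion pursue begin embark span start launch resume distinguished "
    "successful professional promising".split()
)
negative = {"ruin"}


def prosody_shares(collocates, positive, negative):
    """Return the percentage of positive, neutral and negative collocates."""
    total = len(collocates)
    pos = sum(word in positive for word in collocates)
    neg = sum(word in negative for word in collocates)
    neu = total - pos - neg  # everything unclassified counts as neutral
    return {
        "positive": round(100 * pos / total),
        "neutral": round(100 * neu / total),
        "negative": round(100 * neg / total),
    }


print(len(all_collocates), prosody_shares(all_collocates, positive, negative))
```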

The Syntactic Patterns of Career and Job
The function that gives the Sketch Engine its name is the word sketch: a one-page summary of a word's grammatical and collocational behavior. Figure 2 shows part of the word sketch for career. Its collocates are grouped according to the grammatical relations in which they occur. For example, in the first column, a number of words such as distinguished, successful, academic, playing and moral are grouped under the category of modifiers of "career". Figure 3 shows part of the word sketch for job. In order to present a fine-grained comparison, we summarize the 10 patterns of career and the 15 patterns of job in Table 4 and Table 5.

Table 4. Patterns of career
…
possessors of "career" | 7.56 | A major retrospective of the artist's career is scheduled to
verbs with particle "up" and "career" as object | 0.91 | Cindy Gallop has given up her career in theatre marketing
verbs with particle "out" and "career" as object | 0.22 | Dennis carved out a career in the building industry

Table 5. Patterns of job
… | … | The President has done a good job of putting himself
nouns and verbs modified by "job" | 13.09 | there will be further job losses - around 250
verbs with "job" as object | 40.93 | how the clerks did their job
verbs with "job" as subject | 12.04 | My job entailed being on call for shipping
"job" and/or… | 7.42 | French people would take priority for jobs and housing
prepositional phrases | * | She does a variety of jobs for learned bodies in law
adjective predicates of "job" | 1.62 | the new system will make their job less secure
"job" is a… | 0.45 | ending of temporary jobs was the reason for unemployment
possessors of "job" | 2.30 | he was offered the manager's job at Birmingham City
usage patterns | 0.27 | it's your job to take immediate steps
verbs with particle "up" and "job" as object | 0.70 | What in fact he did was give up his regular job
…is a "job" | 0.57 | The librarian's job is a job of management of information
verbs with particle "out" and "job" as object | 0.18 | they still are unable to carry out the job
verbs with particle "down" and "job" as object | 0.15 | She managed to hold down a job as a journalist
verbs with particle "over" and "job" as object | 0.03 | waiting for a robot to take over your job

It has to be noted that although career and job share many similarities in their syntactic patterns, there are also apparent differences, as is shown by the Sketch Diff function of SkE.

Comparison of Lexical and Grammatical Collocates of Career and Job
The Sketch Diff function of SkE allows users to visually compare synonymous words based on their grammatical relations and collocational behaviors. Figure 4 shows part of the differences between career and job automatically generated by SkE. In the figure, the greener a word is, the more closely it relates to career. The redder a word is, the more closely it relates to job. For example, in the "career/job" and/or pattern, reputation, background, relation and interest frequently collocate with career, but are never used with job. On the other hand, income, status, growth and profession always collocate with job, but are never used with career. There are some other words that can occur both with career and job in the "career/job" and/or relation, such as life, marriage, relationship, employment, family, career, education, money, training, job and home.
In order to make the comparison results clearer, we present part of the differences in the following tables. As is shown in Table 6, only career can be the subject of some verbs, such as span, counsel, last, flourish, develop, demonstrate, move and reach, as in (28). Only job can be the subject of other verbs, such as keep, demand, boost, create, axe, lose, disappear, require, involve and entail, as in (29). Both career and job can be the subject of some verbs, such as begin, end, start, take, include, be and go. Take go as an example: it collocates with career 20 times and with job 118 times, as in (30) and (31).
(28) Alan Crosskill, general service manager and Press spokesman for Cleveland ambulance, is leaving the service after ten years to become self employed in public relations and career counselling.
(29) The only possibilities for such areas might lie in attracting to them (through government policy) low-paid jobs demanding minimal skills.
(30) Justin hasn't had this much ink for a decade, not since his million-pound transfer from Norwich to Nottingham Forest, where, from being the highest scorer in the First Division, his career went into a premature tailspin.
(31) Over the years 's job has gone from unloading individual bags to today's bulk deliveries of around 40,000 tonnes each year.
As is shown in Table 7, some adjectives modify only career, such as distinguished, successful, playing, acting, moral, promising, racing, entire, subsequent and test, as in (32). Some adjectives modify only job, such as excellent, proper, odd, manufacturing, difficult, permanent, top, temporary and part-time, as in (33). Some adjectives modify both career and job, such as academic, brilliant, professional, full-time, new and good. Take professional as an example: it collocates with career 67 times and with job 52 times, as in (34) and (35).
… (37). Some words collocate with both career and job, such as service and industry. Take service as an example: it collocates with career 20 times and with job 20 times, as in (38) and (39).
(36) Once out of the Army, John Moynihan, after a couple of false starts, began his career in journalism, before long landing a job with the Evening Standard on their 'In London Last Night' column.
(37) In 1940 I was fortunate to find a summer job in the local tomato factory, one of the innumerable similar factories scattered in the Parma province.
(38) Sheila Mossman, who as part of the lifetime career in the service of music was an examiner with the board, died in 1971.
(39) Again it has been shown that jobs in services, clerical work, agriculture and construction are attractive, but factory work is only the fifth choice, preferred by only 11 per cent (see figure 3.7).
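The three-way division used in this section (collocates of career only, collocates of job only, and collocates of both) can be sketched with set operations over the adjective lists given above. This is an illustration rather than SkE's Sketch Diff algorithm: the lean score is a hypothetical stand-in for the colour scale, and the function names sketch_diff and lean are assumptions. The frequencies are the ones reported in the text and come from different grammatical relations.

```python
# Adjective modifiers reported in the text for each noun.
career_modifiers = {
    "distinguished", "successful", "playing", "acting", "moral", "promising",
    "racing", "entire", "subsequent", "test",
    "academic", "brilliant", "professional", "full-time", "new", "good",
}
job_modifiers = {
    "excellent", "proper", "odd", "manufacturing", "difficult", "permanent",
    "top", "temporary", "part-time",
    "academic", "brilliant", "professional", "full-time", "new", "good",
}


def sketch_diff(first, second):
    """Split two collocate sets into first-only, shared and second-only."""
    return {
        "first_only": first - second,
        "shared": first & second,
        "second_only": second - first,
    }


def lean(freq_first, freq_second):
    """Hypothetical score in [-1, 1]: above 0 leans to the first word, below 0 to the second."""
    return (freq_first - freq_second) / (freq_first + freq_second)


# (frequency with career, frequency with job), as reported in the text.
frequencies = {"professional": (67, 52), "go": (20, 118), "service": (20, 20)}

diff = sketch_diff(career_modifiers, job_modifiers)
print(sorted(diff["shared"]))
for word, (with_career, with_job) in frequencies.items():
    print(word, round(lean(with_career, with_job), 2))
```

A shared collocate is not necessarily evenly shared: go occurs with both nouns but leans heavily towards job, whereas service is balanced.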

Limitations of SkE
So far we have demonstrated how to use some core functions of SkE to research the synonymous nouns career and job. However, it has to be pointed out that SkE is not without its limitations. One apparent limitation lies in its automatic extraction of similar words. In Figure 1, some of the synonyms provided by the Thesaurus tool seem to have little similarity with career, such as education, success, history, season and game. A recent study by Perisman et al. (2015) on automatically identifying and extracting synonyms might help SkE to improve its accuracy. In addition, SkE cannot semantically annotate a corpus as another web-based corpus tool, Wmatrix, does. The SkE team may wish to solve this problem in the future.

CONCLUSION
Researching synonymy is a crucial task in the field of lexical semantics because of its importance and intricacy. In this paper, we have introduced the leading corpus tool SkE and its advantages in investigating synonymous nouns. The results show that the different functions of SkE each contribute in a different way to discriminating between career and job.
This study has a number of pedagogical implications. First, the above analysis shows that synonyms often differ considerably in their collocations, so the traditional practice of explaining meanings to learners by offering synonyms should be used very carefully. Teaching synonyms in this way can be a trap for learners, because it emphasizes the denotational meaning of words rather than their usage (Tognini-Bonelli 2001: 34). Second, second language acquisition studies show that native speakers memorize many chunks of words, and these ready-made or prefabricated units contribute to the naturalness and fluency of their utterances. Therefore, if EFL learners want to achieve native-like selection and native-like fluency, they also need to store collocational patterns such as those in Tables 2 and 3. Third, given the huge number of synonyms in English, teachers cannot teach the collocational behaviour and semantic prosody of all synonyms to students. Teaching students how to use SkE to conduct their own research is a better solution.