LELA 30922 English Corpus Linguistics
The aim of this course is to introduce the field of corpus linguistics in general and to teach students how to use a particular resource, the British National Corpus (BNC). General issues in the design, collection and analysis of corpus material will be examined. In practical sessions, students will learn how to search the BNC using the software tools provided. Students will undertake a corpus-based investigation of some aspect of English language usage, for example the distribution of near synonyms, the use of particular syntactic constructions or idioms, or a comparison of language usage among different subgroups (age, social background, region, etc.). The course will also briefly explore related issues in corpus-based stylistics and computational stylometrics, including forensic applications such as authorship attribution.
|Week||Tues 3pm SamAlex A102||Thurs 10am Roscoe 2.3 (until week 5)|
|1. Jan 29/31||Introduction to syllabus; history of CL||Uses of corpora in general|
|2. Feb 5/7||Corpus design I||Corpus design II|
|3. Feb 12/14||Corpus annotation and SGML||Lemmatization and tagging|
|4. Feb 19/21||Parsing||Word lists, concordances, collocation|
|5. Feb 26/28||Statistics in corpus analysis||Case study I: verbs of sound|
|6. Mar 4/6||Case study II||Lab in Hum. Bridgeford 2.1|
|7. Mar 11/13||Case study III|
|8. Apr 8/10||Computational stylometrics||Lab in Hum. Bridgeford 2.2|
|9. Apr 15/17||Authorship attribution|
|10. Apr 22/24||Corpora and translation|
|Assignment part I due in Friday Apr 25|
|11. Apr 29/May 1||Corpora and language teaching||No lab|
|Assignment part II due in Friday May 9|
Main recommended texts
Kennedy, G.D. (1998) An introduction to corpus linguistics. London: Longman.
McEnery, T. & A. Wilson (2001, 2nd ed) Corpus linguistics. Edinburgh: Edinburgh University Press.
Meyer, C. (2002) English corpus linguistics: An introduction. Cambridge: Cambridge University Press.
Assessment is in the form of a single piece of coursework split into two parts. Students will undertake a practical project making use of the BNC (or other approved corpus material) to investigate some question of English language usage. The project write-up will include relevant background material, the results, and a discussion of the corpus-based analysis.
More specifically, students are encouraged to find a previous study and to replicate it, changing some relevant detail. For example, if it was based on another corpus, they could see whether the findings hold true when applied to (part of) the BNC. Another example might be a study of partial synonymy based on one set of words: students could use the same techniques to study another set of near synonyms. A third case might be to take a slightly flawed study and to try to improve on its methodology.
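As a rough illustration of what a replication of this kind might involve, the sketch below computes a chi-squared statistic for a 2x2 table comparing how often two near synonyms occur in two subcorpora. The function name and the counts are invented for illustration only; they are not BNC figures, and a real project would substitute frequencies obtained from the corpus query tools.

```python
def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]].

    Rows might be two near synonyms, columns two subcorpora
    (e.g. spoken vs written). All names here are hypothetical.
    """
    n = a + b + c + d
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if word choice were independent of subcorpus
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


# Invented counts, for illustration only (not real corpus data)
stat = chi_squared_2x2(10, 20, 30, 40)

# With 1 degree of freedom, 3.841 is the critical value at p = 0.05
print(f"chi-squared = {stat:.3f}; significant at 0.05: {stat > 3.841}")
```

No continuity correction is applied here, and the test assumes reasonably large expected counts in every cell; both points would need checking before relying on the result.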
Assignment due dates: The assignment is split into two parts. The first part, a description of the problem and a review of the relevant literature, should be between 1000 and 1500 words and is due in on Friday 25 April.
The second part (1500-2000 words) is due in on Friday 9 May. Although the work is handed in in two parts, it should be thought of as a unified piece of work, so it is quite legitimate, for example, to cross-refer from one part to the other.
|Link to LLE page on resources|
|Direct link to BNC query page (needs UoM login)|
|BNC tagset definitions and examples ...|
|grouped by part of speech|
| alphabetical listing|
|BNCweb (CQP) simple query syntax|
|Some corpus-based studies of language and gender|
Slides and lecture notes
|3|| Corpus design I|
|4|| Corpus design II|
|5|| Annotation and SGML|
|6|| Lemmatization and tagging|
|8|| Concordances, collocation and connotation|
|10|| Case study based on Levin et al. (1997)'s study of verbs of sound (Int. J. Corpus Ling. 2:23-64)|
|11|| Case studies based on verb forms and tenses|
|12|| Church et al. using statistics|
|13|| Stylistics and stylometry|
|14|| Authorship attribution|
|15|| Corpora and translation|
|16|| Corpora and language teaching|
|Lab 1|| Teacher's script for Lab class 1|
|Lab 2|| Teacher's script for Lab class 2|
|Lab 3|| Slides for Lab class 3|
| || Spreadsheet for t-test|
| || Spreadsheet for chi-squared test|