LELA 30922 English Corpus Linguistics


The aim of this course is to introduce the field of corpus linguistics in general, and to teach students how to use a particular resource, the British National Corpus (BNC). General issues in the design, collection and analysis of corpus material will be examined. The BNC will be introduced in detail and, in practical sessions, students will learn how to search the corpus using the software tools provided. Students will undertake a corpus-based investigation of some aspect of English language usage, for example the distribution of near synonyms, the use of particular syntactic constructions or idioms, or a comparison of language usage among different subgroups (age, social background, region, etc.). The course will also briefly explore the related areas of corpus-based stylistics and computational stylometrics, including forensic applications such as authorship attribution.

Course outline

Week | Tues 3pm SamAlex A102 | Thurs 10am Roscoe 2.3 (until week 5)
1. Jan 29/31 | Introduction to syllabus; history of CL | Uses of corpora in general
2. Feb 5/7 | Corpus design I | Corpus design II
3. Feb 12/14 | Corpus annotation and SGML | Lemmatization and tagging
4. Feb 19/21 | Parsing | Word lists, concordances, collocation
5. Feb 26/28 | Statistics in corpus analysis | Case study I: verbs of sound
6. Mar 4/6 | Case study II | Lab in Hum. Bridgeford 2.1
7. Mar 11/13 | Case study III |
EASTER BREAK
8. Apr 8/10 | Computational stylometrics | Lab in Hum. Bridgeford 2.2
9. Apr 15/17 | Authorship attribution |
10. Apr 22/24 | Corpora and translation |
Assignment part I due in Friday Apr 25
11. Apr 29/May 1 | Corpora and language teaching | No lab
Assignment part II due in Friday May 9

Main recommended texts

Kennedy, G.D. (1998) An introduction to corpus linguistics. London: Longman.
McEnery, T. & A. Wilson (2001, 2nd ed) Corpus linguistics. Edinburgh: Edinburgh University Press.
Meyer, C. (2002) English corpus linguistics: An introduction. Cambridge: Cambridge University Press.

Assessment

Assessment is in the form of a single piece of coursework split into two parts. Students will undertake a practical project making use of the BNC (or other approved corpus material) to investigate some question of English language usage. The project write-up will include relevant background material, the results of a corpus-based analysis, and a discussion of those results.
More specifically, students are encouraged to find a previous study and to replicate it, changing some relevant detail. For example, if it was based on another corpus, they could see whether the findings hold true when applied to (part of) the BNC. Another example might be a study of partial synonymy based on one set of words: students could use the same techniques to study another set of near synonyms. A third case might be to take a slightly flawed study and to try to improve on its methodology.

Assignment due dates: The assignment is split into two parts. The first part, a description of the problem and review of the relevant literature, should be between 1000 and 1500 words and is due in on Friday 25 April. The second part (1500-2000 words) is due in on Friday 9 May. Although the work is handed in in two parts, it should be thought of as a unified piece of work, and so it is quite legitimate, for example, to cross-refer from one part to the other.

Resources

Syllabus
URL: Link to LLE page on resources
URL: Direct link to BNC query page (needs UoM login)
BNC tagset definitions and examples ...
BNC tagset: grouped by part of speech
URL: alphabetical listing
PDF: BNCweb (CQP) simple query syntax
doc: Some corpus-based studies of language and gender

Slides and lecture notes

1. Powerpoint: Introduction
2. Powerpoint: Applications
3. Powerpoint: Corpus design I
4. Powerpoint: Corpus design II
5. Powerpoint: Annotation and SGML
6. Powerpoint: Lemmatization and tagging
7. Powerpoint: Parsing
8. Powerpoint: Concordances, collocation and connotation
9. Powerpoint: Statistics
10. Powerpoint: Case study based on Levin et al. (1997)'s study of verbs of sound (Int. J. Corpus Ling. 2:23-64)
11. Powerpoint: Case studies based on verb forms and tenses
12. Powerpoint: Church et al. using statistics
13. Powerpoint: Stylistics and stylometry
14. Powerpoint: Authorship attribution
15. Powerpoint: Corpora and translation
16. Powerpoint: Corpora and Language teaching
Lab 1. Word document: Teacher's script for Lab class 1
Lab 2. Word document: Teacher's script for Lab class 2
Lab 3. Slides for Lab class 3
  Spreadsheet for t-test
  Spreadsheet for chi-squared test
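
For students who prefer a script to a spreadsheet, the kind of chi-squared comparison used in corpus work can be sketched in a few lines of Python. This is only an illustration and does not reproduce the course spreadsheet: the function name `chi_squared_2x2` is hypothetical, and the counts are invented for the example, not taken from the BNC. It compares how often a word occurs in two subcorpora of equal size, using the Pearson statistic without a continuity correction.

```python
def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]].

    a, b: occurrences of the word / all other tokens in subcorpus A
    c, d: occurrences of the word / all other tokens in subcorpus B
    No continuity correction is applied.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    # Expected count for each cell = (row total * column total) / grand total
    expected = [
        row1 * col1 / n,
        row1 * col2 / n,
        row2 * col1 / n,
        row2 * col2 / n,
    ]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Invented counts (NOT real BNC figures): the word occurs 120 times in
# 100,000 tokens of subcorpus A and 80 times in 100,000 tokens of subcorpus B.
chi2 = chi_squared_2x2(120, 99880, 80, 99920)

# With 1 degree of freedom, the critical value at p = 0.05 is 3.841.
significant = chi2 > 3.841
print(f"chi-squared = {chi2:.3f}, significant at p < 0.05: {significant}")
```

A statistic above the critical value suggests that the difference in frequency between the two subcorpora is unlikely to be due to chance alone. It says nothing about why the difference exists, which is a matter for the discussion section of the project.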