Module: Natural Language Processing (2001/2002)

Module's general information:

Module code: SYT03090
Level: 3
Credit value: 10
Semester: 2
Total hours: 32
Lecturer: Dr Sophia Ananiadou
Lectures: Thursday, 16-18
Assistants: Irena Spasic, Goran Nenadic
Tutorials: Monday, 14-15, Lab A
Assessment: Examination (70%, 2 hours), Coursework (30%)

Module's aims and outcomes:

To introduce students to the goals, methods and applications of natural language processing.

Intended learning outcomes:
1. To have an understanding of the methods used in natural language processing (NLP) and their relation to computer science
2. To have an understanding of standard methods of morphological and syntactic analysis of natural language
3. To examine the difficulties involved in processing language, such as ambiguity
4. To examine important applications which benefit from the use of natural language processing techniques

Syllabus:

1. Goals of NLP (state of the art and state of the market)
2. Introduction to linguistics, i.e. the scientific study of language
3. Computational morphology (lemmatisation, two-level model)
4. Resources for natural language processing (large corpora/documents, dictionaries, knowledge bases)
5. Corpus based analysis (use of relevant statistical techniques, representation, annotation and analysis of corpora)
6. Tools and techniques for corpus processing (sampler of corpus projects, EAGLES, BNC, etc.)
7. Information retrieval and NLP (key concepts; evaluation (TREC conferences); models (probabilistic, Boolean logic, vector processing); systems)
8. Applications: machine translation, basic design principles (MT and the user, market), sample of selected systems (SYSTRAN), information extraction.

Lab work covers selected topics of NLP, such as tokenisation, text preprocessing and term extraction.
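
As a rough illustration of the lab topics named above, the following Python sketch chains tokenisation, simple preprocessing and a crude frequency-based term candidate count. It is not taken from the module's lab material: the token pattern, the function names and the stopword handling are all assumptions made for the example, and counting adjacent word pairs is only a naive stand-in for real term extraction.

```python
import re
from collections import Counter

# Assumed token pattern (hypothetical): words with internal hyphens or
# apostrophes, plain or decimal numbers, or a single punctuation mark.
TOKEN_RE = re.compile(r"[A-Za-z]+(?:[-'][A-Za-z]+)*|\d+(?:\.\d+)?|[^\w\s]")


def tokenise(text):
    """Split raw text into a list of word, number and punctuation tokens."""
    return TOKEN_RE.findall(text)


def preprocess(tokens, stopwords):
    """Lowercase alphabetic tokens and drop stopwords, numbers and punctuation."""
    return [t.lower() for t in tokens
            if t[0].isalpha() and t.lower() not in stopwords]


def term_candidates(tokens, n=2):
    """Count n-grams of adjacent tokens as naive term candidates.

    Note: n-grams are built after stopword removal, so they may span
    positions where a stopword used to be.
    """
    return Counter(" ".join(tokens[i:i + n])
                   for i in range(len(tokens) - n + 1))


if __name__ == "__main__":
    text = "Natural language processing is the processing of natural language."
    words = preprocess(tokenise(text), stopwords={"is", "the", "of"})
    print(term_candidates(words).most_common(3))
```

A realistic term extractor would normally filter candidates by part-of-speech pattern and apply a statistical measure rather than raw frequency; the sketch only shows how the three steps fit together.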

Learning materials/resources:

1. Allen, J.: Natural Language Understanding (2nd ed.), Menlo Park, CA: Benjamin/Cummings, 1994. ISBN 0-8053-0330-8
2. Charniak, E.: Statistical Language Learning, Cambridge, MA: The MIT Press, 1993. ISBN 0262032163
3. Zernik, U.: Lexical acquisition: using on-line resources to build a lexicon, Hillsdale, NJ: Lawrence Erlbaum, 1990. ISBN 0805811273
4. Sparck Jones, K. and Willett, P. (eds): Readings in Information Retrieval, Morgan Kaufmann, 1997. ISBN 1-55860-454-5
5. Manning, C.D. & Schütze, H.: Foundations of Statistical Natural Language Processing, Cambridge, MA: The MIT Press, 1999
6. Jurafsky, D. & Martin, J.H.: Speech and Language Processing, Prentice Hall, 2000
7. Carpenter, B.: The Logic of Typed Feature Structures, Cambridge: Cambridge University Press, 1992

Notes and further reading material provided during the lectures:

Lectures:

  • Module overview
  • What is NLP?
  • Linguistic background
  • Formal Language Theory
  • Finite State Machines
  • Parsing
  • Parsing Context-free Grammars
  • Feature Structures and Unification
  • Tagging
  • Computational Morphology
  • An Introduction to Information Extraction, slides for a lecture given by Prof. Jun-ichi Tsujii (University of Tokyo, Japan)
  • Annotated Bibliography: Information Extraction and Natural Language Processing (Jun-ichi Tsujii)
  • Term Recognition
  • Annotated Bibliography: Automatic Term Extraction

Assignment:

  • NLP assignment - please return to the office by 18/04/02

Tutorials:

  • Tutorial 1: Introduction & Tagging (04/02/02)
    • Brill's tagger demo and the UPenn tag set
    • LT-POS tagger
    • A sample tagger for Windows
    • Brill's tagger for Linux
    • WinBrill tagger
    • Constraint Grammar tagger
    • Samples ...
  • Tutorial 2: Finite State Machines (18/02/02)
  • Tutorial 3: Syntactic Parsing (04/03/02)
  • Tutorial 4: ADAPT-tutorial (18/03/02)
    • Download ADAPT
    • Test sentences ...
  • Tutorial 5: Using Syntactic Parser ADAPT (15/04/02)
    • Download ADAPT Lexicon
    • Download ADAPT Grammar
  • Tutorial 6: Tokenization and pattern matching (29/04/02)
    • Sample matching programs and corpora

Resources:

  • Brill's tagger demo and the UPenn tag set
  • LT-POS tagger
  • A sample tagger for Windows
  • Brill's tagger for Linux
  • WinBrill tagger
  • Constraint Grammar tagger
  • Sample corpus
  • ADAPT
  • Sample tokenizer
  • Sample NP-pattern matching program
  • Sample matching programs and corpora
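
To give a flavour of what an NP-pattern matching program over tagged text can look like, here is a minimal Python sketch. It is not the module's sample program: it assumes the input is already tagged with UPenn (Penn Treebank) tags as a list of (word, tag) pairs, and it uses one hypothetical pattern, an optional determiner followed by any number of adjectives and at least one noun.

```python
import re


def tag_class(tag):
    """Map a Penn Treebank tag to a one-letter class used by the pattern."""
    if tag == "DT":
        return "D"          # determiner
    if tag.startswith("JJ"):
        return "J"          # adjective (JJ, JJR, JJS)
    if tag.startswith("NN"):
        return "N"          # noun (NN, NNS, NNP, NNPS)
    return "O"              # anything else


# Assumed noun-phrase pattern: optional determiner, adjectives, one or more nouns.
NP_RE = re.compile(r"D?J*N+")


def find_noun_phrases(tagged):
    """Return the word sequences whose tag classes match NP_RE.

    One class letter is produced per token, so match offsets in the
    class string are also token indices in the tagged list.
    """
    classes = "".join(tag_class(tag) for _, tag in tagged)
    return [" ".join(word for word, _ in tagged[m.start():m.end()])
            for m in NP_RE.finditer(classes)]


if __name__ == "__main__":
    sentence = [("The", "DT"), ("statistical", "JJ"), ("tagger", "NN"),
                ("assigns", "VBZ"), ("a", "DT"), ("tag", "NN"),
                ("to", "TO"), ("every", "DT"), ("word", "NN"), (".", ".")]
    print(find_noun_phrases(sentence))
```

Encoding each tag as a single letter lets an ordinary regular expression do the matching, which keeps the link to finite state machines visible; richer patterns (prepositional attachments, coordination) would need a larger tag alphabet or a proper chunker.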