Module's general information:

Module code: SYT03090
Level: 3
Credit value: 10
Semester: 2
Total hours: 32
Lecturer: Dr Sophia Ananiadou
Lectures: Thursday, 16-18
Assistants: Irena Spasic, Goran Nenadic
Tutorials: Monday, 14-15, Lab A
Assessment: examination (70%, 2 hours); coursework (30%)

Module's aims and outcomes:

To introduce students to the goals, methods and applications of natural language processing.

Intended learning outcomes:
1. To understand the methods used in natural language processing (NLP) and their relation to computer science
2. To understand standard methods of morphological and syntactic analysis of natural language
3. To examine the difficulties involved in processing language, in particular ambiguity
4. To examine important applications that benefit from natural language processing techniques

Syllabus:

1. Goals of NLP (state of the art and state of the market)
2. Introduction to linguistics, i.e. the scientific study of language
3. Computational morphology (lemmatisation, two-level model)
4. Resources for natural language processing (large corpora/documents, dictionaries, knowledge bases)
5. Corpus-based analysis (use of relevant statistical techniques; representation, annotation and analysis of corpora)
6. Tools and techniques for corpus processing (a sampler of corpus projects, e.g. EAGLES and the BNC)
7. Information retrieval and NLP (key concepts; evaluation, e.g. the TREC conferences; models: probabilistic, Boolean logic, vector processing; systems)
8. Applications: machine translation, basic design principles (MT and the user, the market), a sample of selected systems (SYSTRAN), information extraction

Lab work on selected topics of NLP, such as tokenisation, text preprocessing and term extraction.
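
Syllabus item 7 mentions the vector processing model of information retrieval. As an illustration only, here is a minimal sketch of that model, assuming raw term-frequency vectors over lower-cased, whitespace-split tokens and cosine similarity as the ranking score; real systems typically add tf-idf weighting, stemming and stop-word removal. The function names and the three sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text: str) -> Counter:
    """Raw term-frequency vector (assumption: lower-case + whitespace split)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as matching nothing
    return dot / (norm_a * norm_b)


def rank(query: str, docs: list[str]) -> list[tuple[float, int]]:
    """Return (score, doc_index) pairs, best match first, ties by index."""
    q = tf_vector(query)
    scores = [(cosine(q, tf_vector(d)), i) for i, d in enumerate(docs)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))


if __name__ == "__main__":
    docs = [
        "machine translation of technical documents",
        "statistical methods for term extraction",
        "information retrieval with the vector space model",
    ]
    for score, i in rank("vector space retrieval", docs):
        print(f"{score:.3f}  {docs[i]}")
```

Here the third document shares three terms with the query and scores 3/√21 ≈ 0.655, while the other two share none and score 0. Replacing raw counts with tf-idf weights changes only `tf_vector`; the cosine ranking stays the same.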

Learning materials/resources:

1. Allen, J.: Natural Language Understanding (2nd ed.). Menlo Park, CA: Benjamin/Cummings, 1994. ISBN 0805303308
2. Charniak, E.: Statistical Language Learning. Cambridge, MA: MIT Press, 1993. ISBN 0262032163
3. Zernik, U.: Lexical Acquisition: Using On-line Resources to Build a Lexicon. Hillsdale, NJ: Lawrence Erlbaum, 1990. ISBN 0805811273
4. Sparck Jones, K. & Willett, P. (eds): Readings in Information Retrieval. Morgan Kaufmann, 1997. ISBN 1558604545
5. Manning, C.D. & Schütze, H.: Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press, 1999
6. Jurafsky, D. & Martin, J.H.: Speech and Language Processing. Prentice Hall, 2000
7. Carpenter, B.: The Logic of Typed Feature Structures. Cambridge: Cambridge University Press, 1992

Notes and further reading material provided during the lectures:

Lectures:
- Module overview
- What is NLP?
- Linguistic background
- Formal Language Theory
- Finite State Machines
- Parsing
- Parsing Context-free Grammars
- Feature Structures and Unification
- Tagging
- Computational Morphology
- An Introduction to Information Extraction: slides for a lecture given by Prof. Jun-ichi Tsujii (University of Tokyo, Japan)
- Annotated Bibliography: Information Extraction and Natural Language Processing (Jun-ichi Tsujii)
- Term Recognition
- Annotated Bibliography: Automatic Term Extraction
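
The lecture on feature structures and unification can be illustrated with a small sketch. It assumes untyped feature structures written as nested Python dicts with string atoms, and it deliberately leaves out re-entrancy (structure sharing) and the type hierarchy that typed feature structures (Carpenter, 1992) add. The names `unify` and `UnificationFailure` are hypothetical.

```python
from typing import Union

# Simplified, untyped feature structure: either an atom (a string)
# or a mapping from feature names to feature structures.
# No re-entrancy (shared substructures) is modelled here.
FS = Union[str, dict]


class UnificationFailure(Exception):
    """Raised when two feature structures carry conflicting information."""


def unify(a: FS, b: FS) -> FS:
    """Combine the information in a and b, or raise UnificationFailure on a clash.

    Builds new dicts along merged paths; the inputs are never mutated.
    """
    # Two atoms unify only if they are identical.
    if isinstance(a, str) and isinstance(b, str):
        if a == b:
            return a
        raise UnificationFailure(f"{a!r} vs {b!r}")
    # In this simplified model an atom never unifies with a complex structure.
    if isinstance(a, str) or isinstance(b, str):
        raise UnificationFailure(f"atom vs structure: {a!r} / {b!r}")
    # Both are complex: start from a's features and merge in b's, recursively.
    result = dict(a)
    for feature, value in b.items():
        if feature in result:
            result[feature] = unify(result[feature], value)
        else:
            result[feature] = value
    return result


if __name__ == "__main__":
    noun = {"cat": "N", "agr": {"num": "sg"}}
    third_person = {"agr": {"per": "3"}}
    print(unify(noun, third_person))
    # {'cat': 'N', 'agr': {'num': 'sg', 'per': '3'}}
    try:
        unify({"agr": {"num": "sg"}}, {"agr": {"num": "pl"}})
    except UnificationFailure as err:
        print("fails:", err)
```

Unification succeeds when the two structures are compatible, pooling their features (here number and person agreement), and fails on a clash such as singular vs plural, which is how agreement constraints are enforced in unification-based grammars.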

Assignment:
- NLP assignment: please return it to the office by 18/04/02

Tutorials:
- Tutorial 1: Introduction & Tagging (04/02/02)
  - Brill's tagger demo and the UPenn tag set
  - LT-POS tagger
  - A sample tagger for Windows
  - Brill's tagger for Linux
  - WinBrill tagger
  - Constraint Grammar tagger
  - Samples ...
- Tutorial 2: Finite State Machines (18/02/02)
- Tutorial 3: Syntactic Parsing (04/03/02)
- Tutorial 4: ADAPT tutorial (18/03/02)
  - Download ADAPT
  - Test sentences ...
- Tutorial 5: Using the Syntactic Parser ADAPT (15/04/02)
  - Download ADAPT Lexicon
  - Download ADAPT Grammar
- Tutorial 6: Tokenisation and pattern matching (29/04/02)
  - Sample matching programs and corpora
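
Tutorial 6 pairs tokenisation with pattern matching. The sketch below is not one of the sample programs distributed with the module: it assumes a simple regular-expression tokeniser and input whose UPenn (Penn Treebank) tags have been assigned by hand, and it matches noun phrases of the form DT? JJ* (NN|NNS)+, a pattern commonly used as a starting point for term extraction. `TOKEN_RE`, `TAG_CODE`, `tokenise` and `np_chunks` are hypothetical names.

```python
import re

# Tokeniser (a simplifying assumption): words with optional internal
# hyphens/apostrophes, numbers with an optional decimal part, or any
# single non-space, non-alphanumeric character (punctuation).
TOKEN_RE = re.compile(r"[A-Za-z]+(?:[-'][A-Za-z]+)*|\d+(?:\.\d+)?|[^\sA-Za-z\d]")

# Map Penn Treebank tags to one-letter codes so the NP pattern can be
# written as an ordinary regular expression over the tag string.
TAG_CODE = {"DT": "D", "JJ": "J", "NN": "N", "NNS": "N"}
NP_RE = re.compile(r"D?J*N+")  # DT? JJ* (NN|NNS)+


def tokenise(text: str) -> list[str]:
    return TOKEN_RE.findall(text)


def np_chunks(tagged: list[tuple[str, str]]) -> list[str]:
    """Return leftmost-longest, non-overlapping word sequences matching DT? JJ* (NN|NNS)+."""
    codes = "".join(TAG_CODE.get(tag, "O") for _, tag in tagged)  # "O" = other tag
    return [
        " ".join(word for word, _ in tagged[m.start():m.end()])
        for m in NP_RE.finditer(codes)
    ]


if __name__ == "__main__":
    tokens = tokenise("The statistical tagger assigns tags to unknown words.")
    # Tags assigned by hand for this example; a real pipeline would run a tagger.
    tags = ["DT", "JJ", "NN", "VBZ", "NNS", "TO", "JJ", "NNS", "."]
    print(np_chunks(list(zip(tokens, tags))))
    # ['The statistical tagger', 'tags', 'unknown words']
```

Because each tag becomes one character, a match's start and end offsets in the tag string are exactly the token indices of the phrase, so no extra bookkeeping is needed to recover the words.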

Resources:
- Brill's tagger demo and the UPenn tag set
- LT-POS tagger
- A sample tagger for Windows
- Brill's tagger for Linux
- WinBrill tagger
- Constraint Grammar tagger
- Sample corpus
- ADAPT
- Sample tokenizer
- Sample NP-pattern matching program
- Sample matching programs and corpora