This is a list of research students supervised by Professor Harold Somers, with information about their research.
Salleh H. Abdul Rashid sabdura@pkrisc.cc.ukm.my
General research topic: Word-sense disambiguation in Malay: simulating WordNet
Date started: January 2002
Title: Bootstrapping Language Resources for Less Studied Languages: A Case Study in Word Sense
Disambiguation to Simulate WordNet for Malay
Description: The emergence of the Princeton WordNet as a de facto standard for many NLP
applications has prompted a desire to develop new monolingual and multilingual WordNets. This is both time-consuming
and expensive, whichever of several proposed approaches is adopted, and in the case of less-studied languages
there is the additional problem of a lack of available resources. In this research we attempt to overcome this
problem by using simulation. Using limited but readily available language resources, we simulate a Malay WordNet
by looking up Malay words in a Malay-English dictionary and then using the English WordNet.
We test this simulation in word-sense disambiguation tasks. In a first experiment, we wanted to see if the
simulation was able to choose the correct interpretation of Malay homographs: on the assumption that neighbouring
words will be semantically closer to the correct meaning, we translate them into English using an on-line dictionary
and then compare them using a number of WordNet-based distance measures. Preliminary results suggest that this method
works at least as well as 'normal' WSD. A second experiment will test the hypothesis further in a task
related to lexical choice in Machine Translation.
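The first experiment described above can be sketched roughly as follows. This is a minimal illustration only, not the project's actual implementation: the tiny hypernym table stands in for the English WordNet, the dictionary entries are toy examples, and the function names (`path_distance`, `disambiguate`) and the path-based similarity score are assumptions made for the sketch.

```python
# Hypothetical stand-in for the English WordNet hypernym hierarchy
# (child -> parent). A real experiment would query WordNet itself.
HYPERNYM = {
    "entity": None,
    "natural_object": "entity",
    "plant": "natural_object",
    "flower": "plant",
    "tree": "plant",
    "abstraction": "entity",
    "finance": "abstraction",
    "interest": "finance",
    "money": "finance",
    "bank": "finance",
}

# Toy Malay-English dictionary (illustrative entries, assumed for this sketch).
MALAY_ENGLISH = {
    "bunga": ["flower", "interest"],  # the homograph to disambiguate
    "wang": ["money"],
    "bank": ["bank"],
    "pokok": ["tree"],
}


def ancestors(concept):
    """Return the concept followed by its hypernyms, nearest first."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = HYPERNYM[concept]
    return chain


def path_distance(a, b):
    """Count the edges between two concepts via their lowest common ancestor."""
    chain_a, chain_b = ancestors(a), ancestors(b)
    for steps_a, node in enumerate(chain_a):
        if node in chain_b:
            return steps_a + chain_b.index(node)
    raise ValueError(f"no common ancestor for {a!r} and {b!r}")


def disambiguate(target, context):
    """Pick the English sense of `target` closest to the translated context."""
    best_sense, best_score = None, float("-inf")
    for sense in MALAY_ENGLISH[target]:
        score = 0.0
        for neighbour in context:
            translations = MALAY_ENGLISH.get(neighbour, [])
            if translations:
                # Similarity shrinks as the taxonomy path grows longer.
                score += max(1.0 / (1.0 + path_distance(sense, t))
                             for t in translations)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


print(disambiguate("bunga", ["wang", "bank"]))
print(disambiguate("bunga", ["pokok"]))
```

Swapping the toy table for real WordNet lookups, and the path score for other WordNet-based distance measures, would leave the overall structure unchanged.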
Publications:
Abdul Rashid S (2004) 'Simulating Malay WordNet: a case study in word-sense disambiguation'. CamLing 2004
Conference, Cambridge.
Abdul Rashid S (2005) 'Simulating a Malay WordNet for word sense disambiguation'. 8th Annual CLUK
Research Colloquium, Manchester.
Dimitra Kalantzi D.Kalantzi@postgrad.umist.ac.uk
Date started: April 2003
Title: Text simplification and automated captioning
Description: Although the dream goal of a system which would take a TV signal and automatically
generate intra- and interlingual subtitles directly from the audio input is a long way off, this research
addresses the more tractable aspects of that dream. Taking the text of a transcript or screenplay as given,
we want to specify software that would help the subtitler by automatically suggesting ways in which
the text can be shortened and simplified so as to fit the many constraints of subtitles, notably number of
characters and readability within a given time frame. We are looking at techniques of text simplification and
summarization to see if they offer any ideas for how to do this. We also want, if possible, to collect a parallel
corpus of transcripts and subtitles to study the recurring linguistic features of this particular case of
text simplification.
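The two constraints named above, number of characters and readability within a given time frame, could be checked along these lines. This is a sketch under stated assumptions: the function name `fits_subtitle` and all three default limits are placeholders chosen for illustration, not values taken from this project or from any subtitling standard.

```python
import textwrap


def fits_subtitle(text, duration_s, max_chars_per_line=37, max_lines=2,
                  max_chars_per_second=15.0):
    """Return True if `text` fits the assumed layout and reading-speed limits."""
    # Layout constraint: the text must wrap into at most `max_lines` lines.
    lines = textwrap.wrap(text, width=max_chars_per_line)
    if len(lines) > max_lines:
        return False
    # Readability constraint: the viewer needs enough time to read the text.
    return len(text) <= max_chars_per_second * duration_s


original = ("I just wanted to let you know that I will most certainly "
            "be calling you at some point tomorrow.")
shortened = "I will call you tomorrow."

print(fits_subtitle(original, 2.0))
print(fits_subtitle(shortened, 2.0))
```

A tool of the kind described would propose shortened candidates and keep only those for which a check like this succeeds.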
Tuomas Korhonen tbc
Date started:
Previous supervisor: Gareth Evans
Title: Spelling correction for dyslexics
Description: