Research Students

This is a list of research students supervised by Professor Harold Somers, with information about their research.

Salleh H. Abdul Rashid sabdura@pkrisc.cc.ukm.my
General research topic: Word-sense disambiguation in Malay: simulating WordNet
Date started: January 2002
Title: Bootstrapping Language Resources for Less Studied Languages: A Case Study in Word Sense Disambiguation to Simulate WordNet for Malay
Description: The emergence of the Princeton WordNet as a de facto standard for many NLP applications has prompted a desire to develop new monolingual and multilingual WordNets. This is both time-consuming and expensive, whichever of several proposed approaches is adopted, and in the case of less-studied languages there is the additional problem of a lack of available resources. In this research we attempt to overcome this problem by using simulation. Using limited but readily available language resources, we simulate a Malay WordNet by looking up Malay words in a Malay-English dictionary and then using the English WordNet. We test this simulation in word-sense disambiguation tasks. In a first experiment, we wanted to see if the simulation was able to choose the correct interpretation of Malay homographs: on the assumption that neighbouring words will be semantically closer to the correct meaning, we translate them into English using an on-line dictionary and then compare them using a number of WordNet-based distance measures. Preliminary results suggest that this method works at least as well as 'normal' WSD. A second experiment will test the hypothesis further in a task related to lexical choice in Machine Translation.
Publications:
Abdul Rashid S (2004) 'Simulating Malay WordNet: a case study in word-sense disambiguation'. CamLing 2004 Conference, Cambridge.
Abdul Rashid S (2005) 'Simulating a Malay WordNet for word sense disambiguation'. 8th Annual CLUK Research Colloquium, Manchester.
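
Illustration: The dictionary-lookup simulation described in this entry can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the project's implementation: the tiny Malay-English dictionary, the toy hypernym tree standing in for the English WordNet, and the function names are all hypothetical, and path similarity is only one of several possible WordNet-based distance measures.

```python
# Toy Malay-English dictionary (illustrative entries only, an assumption).
# "bunga" is treated as a homograph with two English translations.
MALAY_ENGLISH = {
    "bunga": ["flower", "interest"],
    "pinjaman": ["loan"],
    "wang": ["money"],
    "daun": ["leaf"],
}

# Toy hypernym tree (child -> parent), a hypothetical stand-in for
# the English WordNet noun hierarchy.
HYPERNYM = {
    "interest": "charge",
    "charge": "financial_concept",
    "loan": "debt",
    "debt": "financial_concept",
    "money": "financial_concept",
    "financial_concept": "entity",
    "flower": "plant",
    "leaf": "plant",
    "plant": "organism",
    "organism": "entity",
}


def ancestors(concept):
    """Return the chain from a concept up to the root, concept first."""
    chain = [concept]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def path_similarity(a, b):
    """1 / (1 + number of edges on the shortest path between a and b)."""
    chain_a, chain_b = ancestors(a), ancestors(b)
    depth_in_b = {c: i for i, c in enumerate(chain_b)}
    for i, c in enumerate(chain_a):
        if c in depth_in_b:  # lowest common ancestor
            return 1.0 / (1 + i + depth_in_b[c])
    return 0.0  # no common ancestor in the toy tree


def disambiguate(word, context):
    """Pick the English sense of `word` closest to the translated context."""
    def score(sense):
        total = 0.0
        for neighbour in context:
            translations = MALAY_ENGLISH.get(neighbour, [])
            if translations:
                # Use the neighbour's best-matching translation.
                total += max(path_similarity(sense, t) for t in translations)
        return total

    return max(MALAY_ENGLISH[word], key=score)


print(disambiguate("bunga", ["pinjaman", "wang"]))  # interest
print(disambiguate("bunga", ["daun"]))              # flower
```

In this sketch each candidate sense is scored by summing its similarity to the translations of the neighbouring words, which mirrors the assumption that neighbouring words are semantically closer to the correct meaning.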

Dimitra Kalantzi D.Kalantzi@postgrad.umist.ac.uk
Date started: April 2003
Title: Text simplification and automated captioning
Description: Although the dream goal of a system which would take a TV signal and automatically generate intra- and interlingual subtitles directly from the audio input is a long way off, this research addresses the more tractable aspects of that dream. Taking the text of a transcript or screenplay as given, we want to specify software that would help the subtitler by automatically suggesting ways in which the text can be shortened and simplified so as to fit the many constraints of subtitles, notably the number of characters and readability within a given time frame. We are looking at techniques of text simplification and summarization to see whether they offer any ideas for how to do this. We also want, if possible, to collect a parallel corpus of transcripts and subtitles to study the recurring linguistic features of this particular case of text simplification.

Tuomas Korhonen tbc
Date started:
Previous supervisor: Gareth Evans
Title: Spelling correction for dyslexics
Description: