Paul Thompson

Research Associate



Contact details:
National Centre for Text Mining (NaCTeM),
School of Computer Science,
Manchester Interdisciplinary Biocentre,
University of Manchester,
131 Princess Street,
M1 7DN
Tel: +44 (0)161 306 3091
Email: Paul.Thompson[at]manchester.ac.uk

Education

MPhil, UMIST (2004)
BSc (Hons) Computational Linguistics (1st Class), UMIST (1999)

Employment History

03/2007 - present Research Associate, National Centre for Text Mining, The University of Manchester
11/2006 - 02/2007 Research Assistant, The University of Manchester
06/2005 - 10/2006 Knowledge Transfer Partnership Associate, Lorien Plc and The University of Manchester
01/2000 - 05/2005 Research Assistant, UMIST/The University of Manchester

Research

My main research interests lie in Natural Language Processing, in particular information extraction and corpus annotation. I have been involved in the development of resources and user interfaces for a number of NLP systems, as detailed in the Projects section below.

I am currently working on corpus annotation in the biomedical field. I was involved in the development of GREC, a corpus of MEDLINE abstracts that have been semantically annotated with gene regulation events. My most recent work involves the annotation of meta-knowledge about bio-events to aid in their correct interpretation.

Projects

02/2011 - 02/2013 METANET4U
METANET4U is a European project that aims to support language technology for European languages and multilingualism. It is a project in the META-NET Network of Excellence, a cluster of projects aiming at fostering the mission of META. META is the Multilingual Europe Technology Alliance, dedicated to building the technological foundations of a multilingual European information society. Our work at the University is concerned primarily with demonstrating how, by ensuring that individual language processing tools and resources are made interoperable, new applications can be built rapidly by combining these interoperable components. We are using the UIMA framework and the U-Compare system to facilitate interoperability of tools and resources.

03/2007 - 03/2009 BOOTStrep
The main purpose of this project is to build two major reusable, wide-coverage lexical and conceptual repositories for the biology domain, i.e. a bio-lexicon and a bio-ontology, using text mining techniques. My work has been focussed on annotation of gene regulation events in MEDLINE abstracts, with the purpose of acquiring semantic frames for verbs and nominalised verbs. The frames will be included within the bio-lexicon, to aid in the extraction of biological facts.

11/2006 - 02/2007 Arabic WordNet
The Arabic WordNet is based on WordNet for English, developed at Princeton University, in which words are grouped into sets of synonyms and structured according to basic semantic relations between them. I worked on the development of a Java user interface that allows searching of the Arabic WordNet and browsing of relationships between words.

06/2005 - 10/2006 Knowledge Transfer Partnership
This project was a collaboration between the University of Manchester and Lorien Plc, a recruitment company based in Leeds. Its purpose was to assist Lorien in increasing their use of technology within the recruitment and selection process. My work focussed on the design and development of a web-based system for creating and administering online pre-selection interviews and technical tests for job candidates.

09/2004 - 05/2005 Parmenides
The Parmenides project was concerned with knowledge and information management. I worked on the development of pattern-matching rules and associated ontologies used to extract entities and events in three separate domains, i.e. biotechnology, weight management and terrorist attacks. I also developed a user interface to display the extraction results.

10/2001 - 08/2004 DUMAS
This project involved the development of a framework for building agent-based multilingual speech-based applications. I worked on the development of several components and resources of a spoken email system based on this framework.

01/2000 - 04/2001 CONCERTO
CONCERTO involved the conceptual annotation and retrieval of digital documents. My work was centred on the information extraction module and included writing pattern-matching rules to discover named entities and relationships between them, in addition to the development of user interfaces to facilitate collaborative development of rules and semi-automatic conceptual annotation.

Publications

Sophia Ananiadou, Paul Thompson and Raheel Nawaz. (2013). Enhancing Search: Events and their Discourse Context. Computational Linguistics and Intelligent Text Processing, Lecture Notes in Computer Science, Volume 7817, pages 318-334, Springer

Riza Theresa B. Batista-Navarro, Georgios Kontonatsios, Claudiu Mihaila, Paul Thompson, Rafal Rak, Raheel Nawaz, Ioannis Korkontzelos and Sophia Ananiadou (2013). Facilitating the Analysis of Discourse Phenomena in an Interoperable NLP Platform. Computational Linguistics and Intelligent Text Processing, Lecture Notes in Computer Science, Volume 7816, pp. 559-571, Springer

Georgios Kontonatsios, Ioannis Korkontzelos, Balakrishna Kolluru, Paul Thompson and Sophia Ananiadou (2013). Deploying and Sharing U-Compare Workflows as Web Services. Journal of Biomedical Semantics, 4:7

Raheel Nawaz, Paul Thompson and Sophia Ananiadou (2013). Negated BioEvents: Analysis and Identification. BMC Bioinformatics, 14:14 (Highly Accessed)

Sophia Ananiadou, John McNaught and Paul Thompson (2012). The English Language in the Digital Age. In Georg Rehm and Hans Uszkoreit (Eds.) White Paper Series, Springer

Maria Liakata, Paul Thompson, Anita de Waard, Raheel Nawaz, Hank Pander Maat and Sophia Ananiadou (2012). A three-way perspective on scientific discourse annotation for knowledge extraction. In Proceedings of the ACL Workshop on Detecting Structure in Scholarly Discourse (DSSD), pp. 37-46

Makoto Miwa, Paul Thompson and Sophia Ananiadou (2012). Boosting automatic event extraction from the literature using domain adaptation and coreference resolution. Bioinformatics, 28(13), 1759-1765

Makoto Miwa, Paul Thompson, John McNaught, Douglas B. Kell and Sophia Ananiadou (2012). Extracting semantically enriched events from biomedical literature. BMC Bioinformatics 13:108 (Highly Accessed)

Raheel Nawaz, Paul Thompson and Sophia Ananiadou (2012). Identification of Manner in Bio-Events. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012), pp. 3505-3510.

Raheel Nawaz, Paul Thompson and Sophia Ananiadou (2012). Meta-Knowledge Annotation at the Event Level: Comparison between Abstracts and Full Papers. In Proceedings of the Third Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM 2012), pp. 24-31

Xinkai Wang, Paul Thompson and Sophia Ananiadou (2012). Biomedical Chinese-English CLIR Using an Extended CMeSH Resource to Expand Queries. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012) pp. 1148-1155

Paul Thompson, Yoshinobu Kano, John McNaught, Steve Pettifer, Teresa Attwood, John Keane and Sophia Ananiadou (2011). Promoting Interoperability of Resources in META-SHARE. In Proceedings of the IJCNLP Workshop on Language Resources, Technology and Services in the Sharing Paradigm (LRTS), Chiang Mai, Thailand, November, pp. 50-58.

Paul Thompson, Raheel Nawaz, John McNaught and Sophia Ananiadou (2011). Enriching a biomedical event corpus with meta-knowledge annotation. BMC Bioinformatics, 12:393 (Highly Accessed)

Paul Thompson, John McNaught, Simonetta Montemagni, Nicoletta Calzolari, Riccardo del Gratta, Vivian Lee, Simone Marchi, Monica Monachini, Piotr Pezik, Valeria Quochi, C.J. Rupp, Yutaka Sasaki, Giulia Venturi, Dietrich Rebholz-Schuhmann and Sophia Ananiadou. (2011). The BioLexicon: a large-scale terminological resource for biomedical text mining. BMC Bioinformatics 12:397 (Highly Accessed)

Sophia Ananiadou, Paul Thompson, Yoshinobu Kano, John McNaught, Teresa K. Attwood, Philip J. R. Day, John Keane, Dean A. Jackson and Steve Pettifer (2011). Towards Interoperability of European Language Resources. Ariadne, 67.

C.J. Rupp, Paul Thompson, William J. Black, John McNaught and Sophia Ananiadou (2010). A Specialised Verb Lexicon as the Basis of Fact Extraction in the Biomedical Domain. In Proceedings of Interdisciplinary Workshop on Verbs: The Identification and Representation of Verb Features (Verb 2010), Pisa, Italy.

Raheel Nawaz, Paul Thompson and Sophia Ananiadou (2010). Event Interpretation: A Step towards Event-Centred Text Mining. In Proceedings of the First International Workshop on Automated Motif Discovery in Cultural Heritage and Scientific Communication Texts (AMICUS 2010), CLARIN/DARIAH 2010, Vienna, Austria.

Sophia Ananiadou, Paul Thompson and Raheel Nawaz (2010). Improving Search Through Event-based Biomedical Text Mining. In Proceedings of the First International Workshop on Automated Motif Discovery in Cultural Heritage and Scientific Communication Texts (AMICUS 2010), CLARIN/DARIAH 2010, Vienna, Austria.

Raheel Nawaz, Paul Thompson and Sophia Ananiadou (2010). Evaluating a Meta-Knowledge Annotation Scheme for Bio-Events. In Proceedings of the Workshop on Negation and Speculation in Natural Language Processing (NeSp-NLP 2010), ACL 2010, Uppsala, Sweden, pp. 69-77.

Sophia Ananiadou, Paul Thompson, James Thomas, Tingting Mu, Sandy Oliver, Mark Rickinson, Yutaka Sasaki, Davy Weissenbacher and John McNaught (2010). Supporting the Education Evidence Portal via Text Mining. Philosophical Transactions of the Royal Society A, 368(1925), 3829-3844.

Raheel Nawaz, Paul Thompson, John McNaught and Sophia Ananiadou (2010). Meta-Knowledge Annotation of Bio-Events. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010), Malta, May.

Paul Thompson, Syed A. Iqbal, John McNaught and Sophia Ananiadou (2009). Construction of an annotated corpus to support biomedical information extraction. BMC Bioinformatics 10:349

Yutaka Sasaki, Paul Thompson, John McNaught and Sophia Ananiadou (2009). Biological Event Recognition with Textual Induction. In Proceedings of 3rd International Symposium on Languages in Biology and Medicine (LBM-2009).

Yutaka Sasaki, Paul Thompson, John McNaught and Sophia Ananiadou (2009). Three BioNLP Tools Powered by the BioLexicon. In Proceedings of EACL 2009 Demonstration Session, pp. 61-64.

Giulia Venturi, Simonetta Montemagni, Simone Marchi, Yutaka Sasaki, Paul Thompson, John McNaught and Sophia Ananiadou (2009). Bootstrapping a Verb Lexicon for Biomedical Information Extraction. In Proceedings of the 10th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2009), pp. 137-148, Springer

Yutaka Sasaki, Paul Thompson, Philip Cotter, John McNaught and Sophia Ananiadou (2008). Event Frame Extraction Based on a Gene Regulation Corpus. In Proceedings of the 22nd International Conference on Computational Linguistics (Coling-2008), pp. 761-768, Manchester, August

Paul Thompson, Giulia Venturi, John McNaught, Simonetta Montemagni and Sophia Ananiadou (2008). Categorising Modality in Biomedical Texts. In Proceedings of the LREC 2008 workshop "Building and Evaluating resources for biomedical text mining", Marrakech, Morocco, May.

Paul Thompson, Philip Cotter, Sophia Ananiadou, John McNaught, Simonetta Montemagni, Andrea Trabucco and Giulia Venturi (2008). Building a Bio-Event Annotated Corpus for the Acquisition of Semantic Frames from Biomedical Corpora. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008), Marrakech, Morocco, May.

William Black, Andrew Conroy, Adam Funk, Allan Ramsay, Mark Stairmand and Paul Thompson. (2004). Multilingual Discourse Processing. In B. Gambäck and K. Jokinen, editors, Proceedings of the 20th International Conference on Computational Linguistics, pp. 15-21, Geneva, Switzerland, August. ACL. ‘Robust and Adaptive Information Processing for Mobile Speech Interfaces: DUMAS Final Workshop’

Markku Turunen, Esa-Pekka Salonen, Mikko Hartikainen, Jaakko Hakulinen, William Black, Allan Ramsay, Adam Funk, Andrew Conroy, Paul Thompson, Mark Stairmand, Kristiina Jokinen, Jyrki Rissanen, Kari Kanto, Antti Kerminen, Bjorn Gamback, Magnus Sahlgren, Fredrik Olsson, Maria Cheadle, Preben Hansen and Stina Nylander. (2004). AthosMail: A Multilingual Adaptive Spoken Dialogue System for the E-Mail Domain. In B. Gambäck and K. Jokinen, editors, Proceedings of the 20th International Conference on Computational Linguistics, pp. 77-86, Geneva, Switzerland, August. ACL. ‘Robust and Adaptive Information Processing for Mobile Speech Interfaces: DUMAS Final Workshop’.

Paul Thompson, Mark Stairmand and William Black. (2004). Utterance Planning in an Agent-based Dialogue System. In Proceedings of the 3rd International Conference on Natural Language Generation, University of Brighton, Brockenhurst, England, July

William Black, Paul Thompson, Adam Funk and Andrew Conroy. (2003). Learning to Classify Utterances in a Task-Oriented Dialogue. In Kristiina Jokinen, Yorick Wilks, Bjorn Gamback, William Black and Roberta Catizone, editors, Proceedings of the EACL Workshop on Dialogue Systems: Interaction, Adaptation and Styles of Management, pp 9-16, Budapest, Hungary, April.

Paul Thompson (2003). “Decision Trees for Dialogue Act Classification”. In 6th Annual Computational Linguistics in the UK Research Colloquium, Edinburgh, Scotland, January.