Paul Thompson

Research Associate



Contact details:
National Centre for Text Mining (NaCTeM),
School of Computer Science,
Manchester Interdisciplinary Biocentre,
University of Manchester,
131 Princess Street,
M1 7DN
Tel: +44 (0)161 306 3091
Email: Paul.Thompson[at]manchester.ac.uk

Education

MPhil, UMIST (2004)
BSc (Hons) Computational Linguistics (1st Class), UMIST (1999)

Employment History

03/2007 - present Research Associate, National Centre for Text Mining, The University of Manchester
11/2006 - 02/2007 Research Assistant, The University of Manchester
06/2005 - 10/2006 Knowledge Transfer Partnership Associate, Lorien Plc and The University of Manchester
01/2000 - 05/2005 Research Assistant, UMIST/The University of Manchester

Research

My main research interests lie in Natural Language Processing, in particular information extraction and corpus annotation. I have been involved in the development of resources and user interfaces for a number of NLP systems, as detailed in the Projects section below.

Projects

01/2014 - 06/2015 Mining the History of Medicine
This project, a collaboration between NaCTeM and the Centre for the History of Science, Technology and Medicine (CHSTM), aims to demonstrate the potential of text mining technology to assist medical historians in searching and exploring long-spanning archives of historical medical documents, and to help them to reveal, explore and discuss long-term, large-scale historical transformations related to medicine and public health. The project has focussed on two specific archives, i.e., the British Medical Journal (BMJ) (1840 - present day) and the London-area Medical Officer of Health (MOH) reports (1848-1972). To facilitate the automatic extraction of semantic information from these and other historical medical archives, we have developed a corpus that includes documents from different periods and covers different writing styles, and which has been manually annotated by medical historians with medically and historically relevant entity types, and with events that involve these entities. Using this corpus, we have trained models for entity and event recognition that are robust to temporal and stylistic variations in the archives. We have additionally developed a time-sensitive inventory of medical terminology by applying further text mining methods to the archives. The inventory lists medical terms, along with their (possibly time-sensitive) synonyms, variants and other semantically related terms. The trained models for entity and event recognition have been incorporated into an interoperable text mining pipeline for medical history. The culmination of the project has been the development of the History of Medicine (HOM) semantic search system. Using the results of applying the text mining pipeline to the entire contents of the archives, HOM allows users to rapidly refine search results, based upon the presence of specific types of semantic information within documents.
Furthermore, the system provides graphical tracking of terminology usage over time, and suggests terms that are related to initial query terms, to help users to widen their searches.

02/2011 - 02/2013 METANET4U
METANET4U is a European project aimed at supporting language technology for European languages and multilingualism. It is a project in the META-NET Network of Excellence, a cluster of projects that aim to foster the mission of META, the Multilingual Europe Technology Alliance, which is dedicated to building the technological foundations of a multilingual European information society. Our work at the University is concerned primarily with demonstrating how, by ensuring that individual language processing tools and resources are made interoperable, new applications can be built rapidly by combining these interoperable components. We are using the UIMA framework and the U-Compare system to facilitate interoperability of tools and resources.

03/2007 - 03/2009 BOOTStrep
The main purpose of this project was to build two major reusable, wide-coverage lexical and conceptual repositories for the biology domain, i.e. a bio-lexicon and a bio-ontology, using text mining techniques. My work focussed on the annotation of gene regulation events in MEDLINE abstracts, with the purpose of acquiring semantic frames for verbs and nominalised verbs. The frames were intended for inclusion within the bio-lexicon, to aid in the extraction of biological facts.

11/2006 - 02/2007 Arabic WordNet
The Arabic WordNet is based on the English WordNet developed at Princeton University, a lexical database in which words are grouped into sets of synonyms and structured according to basic semantic relations between them. I worked on the development of a Java user interface to allow searching of the Arabic WordNet and browsing of relationships between words.

06/2005 - 10/2006 Knowledge Transfer Partnership
This project was a collaboration between the University of Manchester and Lorien Plc, a recruitment company based in Leeds. Its purpose was to help Lorien increase its use of technology within the recruitment and selection process. My work focussed on the design and development of web-based systems for creating and administering online pre-selection interviews and technical tests for job candidates.

09/2004 - 05/2005 Parmenides
The Parmenides project was concerned with knowledge and information management. I worked on the development of pattern-matching rules and associated ontologies used to extract entities and events in three separate domains, i.e. biotechnology, weight management and terrorist attacks. I also developed a user interface to display the extraction results.

10/2001 - 08/2004 DUMAS
This project involved the development of a framework for building agent-based multilingual speech-based applications. I worked on the development of several components and resources of a spoken email system based on this framework.

01/2000 - 04/2001 CONCERTO
CONCERTO involved the conceptual annotation and retrieval of digital documents. My work was centred on the information extraction module and included writing pattern-matching rules to discover named entities and relationships between them, in addition to the development of user interfaces to facilitate collaborative development of rules and semi-automatic conceptual annotation.

Publications

Thompson, P., Carter, J., McNaught, J. and Ananiadou, S. (In Press). Semantically Enhanced Search System for Historical Medical Archives. Proceedings of DigitalHeritage 2015

Thompson, P., McNaught, J. and Ananiadou, S. (In Press). Customised OCR Correction for Historical Medical Text. Proceedings of DigitalHeritage 2015

Thompson, P., Ananiadou, S. and Tsujii, J. (In Press). The GENIA Corpus: Annotation Levels and Applications. In: Ide, N. and Pustejovsky, J. (Eds.) Handbook of Linguistic Annotation, Springer

Alnazzawi, N., Thompson, P., Batista-Navarro, R. T. B. and Ananiadou, S. (2015). Using text mining techniques to extract phenotypic information from the PhenoCHF corpus. BMC Medical Informatics and Decision Making, 15(Suppl. 2):S3

Mihaila, C., Batista-Navarro, R. T. B., Alnazzawi, N., Kontonatsios, G., Korkontzelos, I., Rak, R., Thompson, P. and Ananiadou, S. (In Press). Mining the biomedical literature. Health Care Analytics, CRC Press, pages 251-308.

Alnazzawi, N., Thompson, P. and Ananiadou, S. (2014). Building a semantically annotated corpus for congestive heart and renal failure from clinical records and the literature. Proceedings of the 5th International Workshop on Health Text Mining and Information Analysis (Louhi), Gothenburg, Sweden, pp. 69-74, Association for Computational Linguistics

Ananiadou, S., Thompson, P., Nawaz, R., McNaught, J. and Kell, D. B. (2014). Event Based Text Mining for Biology and Functional Genomics. Briefings in Functional Genomics, 14(3), 213-230

Kontonatsios, G., Mihaila, C., Korkontzelos, I., Thompson, P. and Ananiadou, S. (2014). A hybrid approach to compiling bilingual dictionaries of medical terms from parallel corpora. Statistical Language and Speech Processing, Second International Conference, SLSP 2014 pages 57-69, Springer

Miwa, M., Thompson, P., Korkontzelos, I. and Ananiadou, S. (2014). Comparable Study of Event Extraction in Newswire and Biomedical Domains. Proceedings of Coling 2014, pp. 2270-2279

Rehm, G., Uszkoreit, H., Ananiadou, S., Bel, N., Bieleviciene, A., Borin, L., Branco, A., Budin, G., Calzolari, N., Daelemans, W., Garabik, R., Grobelnik, M., Garcia-Mateo, C., Van Genabith, J., Hajic, J., Hernaez, I., Judge, J., Koeva, S., Krek, S., Krstev, C., Linden, K., Magnini, B., Mariani, J., McNaught, J., Melero, M., Monachini, M., Moreno, A., Odijk, J., Ogrodniczuk, M., Pezik, P., Piperidis, S., Przepiorkowski, A., Rognvaldsson, E., Rosner, M., Pedersen, B., Skadina, I., De Smedt, K., Tadic, M., Thompson, P., Tufis, D., Varadi, T., Vasiljevs, A., Vider, K. and Zabarskaite, J. (2014). The Strategic Impact of META-NET on the Regional, National and International Level. Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pp. 1517-1524, European Language Resources Association

Rosner, M., Attard, A., Thompson, P., Gatt, A. and Ananiadou, S. (2014). Extending a Tool Resource Framework with U-Compare. Human Language Technology Challenges for Computer Science and Linguistics, Lecture Notes in Computer Science, vol 8387, pp. 315-326

Ananiadou, S., Thompson, P. and Nawaz, R. (2013). Enhancing Search: Events and their Discourse Context. Computational Linguistics and Intelligent Text Processing, Lecture Notes in Computer Science, vol 7817, pages 318-334, Springer

Ananiadou, S., Thompson, P. and Nawaz, R. (2013). Mining events from the literature for bioinformatics applications. ACM SIGWEB Newsletter, Autumn 2013

Batista-Navarro, R. T. B., Kontonatsios, G., Mihaila, C., Thompson, P., Rak, R., Nawaz, R., Korkontzelos, I. and Ananiadou, S. (2013). Facilitating the Analysis of Discourse Phenomena in an Interoperable NLP Platform. Computational Linguistics and Intelligent Text Processing, Lecture Notes in Computer Science, vol 7816, pages 559-571, Springer, Berlin Heidelberg

Kontonatsios, G., Korkontzelos, I., Kolluru, B., Thompson, P. and Ananiadou, S. (2013). Deploying and Sharing U-Compare Workflows as Web Services. Journal of Biomedical Semantics, 4:7

Kontonatsios, G., Thompson, P., Batista-Navarro, R. T. B., Mihaila, C., Korkontzelos, I. and Ananiadou, S. (2013). Extending an interoperable platform to facilitate the creation of multilingual and multimodal NLP applications. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Association for Computational Linguistics, Sofia, Bulgaria, pp. 43-48

Mihaila, C., Kontonatsios, G., Batista-Navarro, R. T. B., Thompson, P., Korkontzelos, I. and Ananiadou, S. (2013). Towards a Better Understanding of Discourse: Integrating Multiple Discourse Annotation Perspectives Using UIMA. Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, Association for Computational Linguistics, Sofia, Bulgaria, pp. 79-88 (LAW Challenge Award)

Nawaz, R., Thompson, P. and Ananiadou, S. (2013). Negated BioEvents: Analysis and Identification. BMC Bioinformatics, 14:14 (Highly Accessed)

Nawaz, R., Thompson, P. and Ananiadou, S. (2013). Towards Event-based Discourse Analysis of Biomedical Text. International Journal of Computational Linguistics and Applications, 4(2), 101-120

Nawaz, R., Thompson, P. and Ananiadou, S. (2013). Something old, something new: identifying knowledge source in bio-events. International Journal of Computational Linguistics and Applications, 4(1), 129-144

Thompson, P., Nawaz, R., Korkontzelos, I., Black, W.J., McNaught, J. and Ananiadou, S. (2013). News Search Using Discourse Analytics. Proceedings of the 2013 Digital Heritage International Congress, Marseille, France, pp. 597-604, IEEE

Sophia Ananiadou, John McNaught and Paul Thompson (2012). The English Language in the Digital Age. In Georg Rehm and Hans Uszkoreit (Eds.) White Paper Series, Springer

Maria Liakata, Paul Thompson, Anita de Waard, Raheel Nawaz, Henk Pander Maat and Sophia Ananiadou (2012). A three-way perspective on scientific discourse annotation for knowledge extraction. Proceedings of the ACL Workshop on Detecting Structure in Scholarly Discourse (DSSD), pp. 37-46

Makoto Miwa, Paul Thompson and Sophia Ananiadou (2012). Boosting automatic event extraction from the literature using domain adaptation and coreference resolution. Bioinformatics, 28(13), 1759-1765

Makoto Miwa, Paul Thompson, John McNaught, Douglas B. Kell and Sophia Ananiadou (2012). Extracting semantically enriched events from biomedical literature. BMC Bioinformatics 13:108 (Highly Accessed)

Raheel Nawaz, Paul Thompson and Sophia Ananiadou (2012). Identification of Manner in Bio-Events. Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012), pp. 3505-3510.

Raheel Nawaz, Paul Thompson and Sophia Ananiadou (2012). Meta-Knowledge Annotation at the Event Level: Comparison between Abstracts and Full Papers. Proceedings of the Third Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM 2012), pp. 24-31

Xinkai Wang, Paul Thompson and Sophia Ananiadou (2012). Biomedical Chinese-English CLIR Using an Extended CMeSH Resource to Expand Queries. Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012) pp. 1148-1155

Rosner, M., Attard, A., Thompson, P., Gatt, A. and Ananiadou, S. (2011). Extending a Tool Resource Framework with U-Compare. Proceedings of the 5th Language & Technology Conference (LTC'2011)

Paul Thompson, Yoshinobu Kano, John McNaught, Steve Pettifer, Teresa Attwood, John Keane and Sophia Ananiadou (2011). "Promoting Interoperability of Resources in META-SHARE". Proceedings of the IJCNLP Workshop on Language Resources, Technology and Services in the Sharing Paradigm (LRTS), Chiang Mai, Thailand, November, pp. 50-58.

Paul Thompson, Raheel Nawaz, John McNaught and Sophia Ananiadou (2011). "Enriching a biomedical event corpus with meta-knowledge annotation". BMC Bioinformatics, 12:393 (Highly Accessed)

Paul Thompson, John McNaught, Simonetta Montemagni, Nicoletta Calzolari, Riccardo del Gratta, Vivian Lee, Simone Marchi, Monica Monachini, Piotr Pezik, Valeria Quochi, C.J. Rupp, Yutaka Sasaki, Giulia Venturi, Dietrich Rebholz-Schuhmann and Sophia Ananiadou. (2011). "The BioLexicon: a large-scale terminological resource for biomedical text mining." BMC Bioinformatics 12:397 (Highly Accessed)

Sophia Ananiadou, Paul Thompson, Yoshinobu Kano, John McNaught, Teresa K. Attwood, Philip J. R. Day, John Keane, Dean A. Jackson and Steve Pettifer (2011). "Towards Interoperability of European Language Resources". Ariadne, 67

C.J. Rupp, Paul Thompson, William J. Black, John McNaught and Sophia Ananiadou (2010). "A Specialised Verb Lexicon as the Basis of Fact Extraction in the Biomedical Domain". Proceedings of Interdisciplinary Workshop on Verbs: The Identification and Representation of Verb Features (Verb 2010), Pisa, Italy.

Raheel Nawaz, Paul Thompson and Sophia Ananiadou (2010). "Event Interpretation: A Step towards Event-Centred Text Mining". Proceedings of the First International Workshop on Automated Motif Discovery in Cultural Heritage and Scientific Communication Texts (AMICUS 2010), CLARIN/DARIAH 2010, Vienna, Austria.

Sophia Ananiadou, Paul Thompson and Raheel Nawaz (2010). "Improving Search Through Event-based Biomedical Text Mining". Proceedings of the First International Workshop on Automated Motif Discovery in Cultural Heritage and Scientific Communication Texts (AMICUS 2010), CLARIN/DARIAH 2010, Vienna, Austria.

Raheel Nawaz, Paul Thompson and Sophia Ananiadou (2010). "Evaluating a Meta-Knowledge Annotation Scheme for Bio-Events". Proceedings of the Workshop on Negation and Speculation in Natural Language Processing (NeSp-NLP 2010), ACL 2010, Uppsala, Sweden, pp. 69-77.

Sophia Ananiadou, Paul Thompson, James Thomas, Tingting Mu, Sandy Oliver, Mark Rickinson, Yutaka Sasaki, Davy Weissenbacher and John McNaught (2010). "Supporting the Education Evidence Portal via Text Mining". Philosophical Transactions of the Royal Society A, 368(1925), 3829-3844.

Raheel Nawaz, Paul Thompson, John McNaught and Sophia Ananiadou (2010). "Meta-Knowledge Annotation of Bio-Events". Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010), Malta, May.

Paul Thompson, Syed A. Iqbal, John McNaught and Sophia Ananiadou (2009). "Construction of an annotated corpus to support biomedical information extraction". BMC Bioinformatics 10:349

Yutaka Sasaki, Paul Thompson, John McNaught and Sophia Ananiadou (2009). "Biological Event Recognition with Textual Induction". Proceedings of the 3rd International Symposium on Languages in Biology and Medicine (LBM-2009).

Yutaka Sasaki, Paul Thompson, John McNaught and Sophia Ananiadou (2009). "Three BioNLP Tools Powered by the BioLexicon." Proceedings of EACL 2009 Demonstration Session, pp. 61-64.

Giulia Venturi, Simonetta Montemagni, Simone Marchi, Yutaka Sasaki, Paul Thompson, John McNaught, Sophia Ananiadou (2009). "Bootstrapping a Verb Lexicon for Biomedical Information Extraction". Proceedings of the 10th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2009), pp. 137-148, Springer

Yutaka Sasaki, Paul Thompson, Philip Cotter, John McNaught and Sophia Ananiadou (2008) Event Frame Extraction Based on a Gene Regulation Corpus, Proceedings of the 22nd International Conference on Computational Linguistics (Coling-2008), pp. 761-768, Manchester, August

Paul Thompson, Giulia Venturi, John McNaught, Simonetta Montemagni and Sophia Ananiadou (2008). "Categorising Modality in Biomedical Texts". LREC 2008 workshop "Building and Evaluating resources for biomedical text mining" Marrakech, Morocco, May.

Paul Thompson, Philip Cotter, Sophia Ananiadou, John McNaught, Simonetta Montemagni, Andrea Trabucco and Giulia Venturi (2008). "Building a Bio-Event Annotated Corpus for the Acquisition of Semantic Frames from Biomedical Corpora". Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008), Marrakech, Morocco, May.

William Black, Andrew Conroy, Adam Funk, Allan Ramsay, Mark Stairmand, and Paul Thompson. (2004) “Multilingual Discourse Processing”. In B. Gambäck and K. Jokinen, editors, Proceedings of the 20th International Conference on Computational Linguistics, pp. 15-21, Geneva, Switzerland, August. ACL. ‘Robust and Adaptive Information Processing for Mobile Speech Interfaces: DUMAS Final Workshop’

Markku Turunen, Esa-Pekka Salonen, Mikko Hartikainen, Jaakko Hakulinen, William Black, Allan Ramsay, Adam Funk, Andrew Conroy, Paul Thompson, Mark Stairmand, Kristiina Jokinen, Jyrki Rissanen, Kari Kanto, Antti Kerminen, Björn Gambäck, Magnus Sahlgren, Fredrik Olsson, Maria Cheadle, Preben Hansen, and Stina Nylander. (2004). “AthosMail: A Multilingual Adaptive Spoken Dialogue System for the E-Mail Domain”. In B. Gambäck and K. Jokinen, editors, Proceedings of the 20th International Conference on Computational Linguistics, pp. 77-86, Geneva, Switzerland, August. ACL. ‘Robust and Adaptive Information Processing for Mobile Speech Interfaces: DUMAS Final Workshop’.

Paul Thompson, Mark Stairmand, and William Black. (2004) “Utterance Planning in an Agent-based Dialogue System”. In Proceedings of the 3rd International Conference on Natural Language Generation, University of Brighton, Brockenhurst, England, July.

William Black, Paul Thompson, Adam Funk, and Andrew Conroy. (2003) “Learning to Classify Utterances in a Task-Oriented Dialogue”. In Kristiina Jokinen, Yorick Wilks, Björn Gambäck, William Black, and Roberta Catizone, editors, Proceedings of the EACL Workshop on Dialogue Systems: Interaction, Adaptation and Styles of Management, pp. 9-16, Budapest, Hungary, April.

Paul Thompson (2003) “Decision Trees for Dialogue Act Classification”. In 6th Annual Computational Linguistics in the UK Research Colloquium, Edinburgh, Scotland, January.