National Centre for Text Mining
School of Computer Science,
Manchester Interdisciplinary Biocentre,
University of Manchester,
131 Princess Street,
Tel: +44 (0)161 306 3091
MPhil, UMIST (2004)
BSc (Hons) Computational Linguistics (1st Class), UMIST (1999)
|03/2007 - present||Research Associate, National Centre for Text Mining, The University of Manchester|
|11/2006 - 02/2007||Research Assistant, The University of Manchester|
|06/2005 - 10/2006||Knowledge Transfer Partnership Associate, Lorien Plc and The University of Manchester|
|01/2000 - 05/2005||Research Assistant, UMIST/The University of Manchester|
My main research interests lie in Natural Language Processing, in
particular information extraction and corpus annotation. I have been
involved in the development of resources and user interfaces for a
number of NLP systems, as detailed in the Projects section below.
This project, a collaboration between NaCTEM and the Centre for the History of Science, Technology and Medicine (CHSTM), aims to demonstrate the potential of text mining technology to help medical historians search and explore
long-spanning archives of historical medical documents, and to help them to reveal, explore and discuss long-term, large-scale historical transformations related to medicine and public health. The project has focussed on two specific archives, i.e.,
the British Medical Journal (BMJ) (1840 - present day) and the London-area Medical Officer of Health (MOH) reports (1848-1972). To facilitate the automatic extraction of semantic information from these and other
historical medical archives, we have developed a corpus, including documents from different periods and covering different writing styles, which has been manually annotated by medical historians with medically and historically relevant entity types, and events that involve these entities. Using this corpus, we
have trained models for entity and event recognition that are robust to temporal and stylistic variations in the archives. We have additionally developed a time-sensitive inventory of medical terminology, by applying further text mining methods to the archives. The inventory lists medical terms, along with their
(possibly time-sensitive) synonyms, variants and other semantically related terms. The trained models for entity and event recognition have been incorporated into an interoperable text mining pipeline for medical history. The culmination of the project has been the development of the History of Medicine (HOM) semantic search system. Using the results of applying the text mining pipeline to the entire contents of the archives, HOM allows users to rapidly refine search results, based upon the presence of specific types of semantic information within documents. Furthermore, the system provides graphical tracking of terminology usage over time, and suggests terms that are related to initial query terms, to help users to widen their searches.
METANET4U is a European project that aims to support language technology for European languages and multilingualism. It is a project in the META-NET Network of Excellence, a cluster of projects that aims to foster the mission of META.
META is the Multilingual Europe Technology Alliance, dedicated to building the technological foundations of a multilingual European information society. Our work at the
University is concerned primarily with demonstrating how, by ensuring that individual language processing tools and resources are made interoperable, new applications can be built rapidly
by combining these interoperable components. We are using the UIMA framework and the U-Compare system to facilitate interoperability of tools and resources.
The main purpose of this project is to build two major reusable,
wide-coverage lexical and conceptual repositories for the biology
domain, i.e. a bio-lexicon and a bio-ontology, using text mining
techniques. My work has been focussed on annotation of gene regulation
events in MEDLINE abstracts, with the purpose of acquiring semantic
frames for verbs and nominalised verbs. The frames will be included
within the bio-lexicon, to aid in the extraction of biological facts.
The Arabic WordNet is based on WordNet
for English, developed at Princeton University, in which words are
grouped into sets of synonyms and structured according to basic semantic
relations between them. I worked on the development of a Java user
interface to allow searching of the Arabic WordNet and browsing of
relationships between words.
This project was a collaboration between the University of Manchester
and Lorien Plc, a recruitment company based in Leeds. It had the purpose
of assisting Lorien to increase their use of technology within the
recruitment and selection process. My work focussed on the design and
development of a web-based system for creating and administering online
pre-selection interviews and technical tests for job candidates.
|06/2005 - 10/2006||Knowledge Transfer
The Parmenides project was concerned with knowledge and information
management. I worked on the development of pattern-matching rules and
associated ontologies used to extract entities and events in 3 separate
domains, i.e. biotechnology, weight management and terrorist attacks. I
also developed a user interface to display the extraction results.
|09/2004 - 05/2005||Parmenides|
This project involved the development of a framework for building
agent-based multilingual speech-based applications. I worked on the
development of several components and resources of a spoken email system
based on this framework.
CONCERTO involved the conceptual annotation and retrieval of digital
documents. My work was centred on the information extraction module and
included writing pattern matching rules to discover named entities and
relationships between them, in addition to the development of user
interfaces to facilitate collaborative development of rules and
semi-automatic conceptual annotation.
|01/2000 - 04/2001||CONCERTO|
Mihaila, C., Batista-Navarro, R. T. B., Alnazzawi, N., Kontonatsios, G., Korkontzelos, I., Rak, R., Thompson, P. and Ananiadou, S. (In Press). Mining the biomedical literature. Health Care Analytics, CRC Press
Thompson, P., Ananiadou, S. and Tsujii, J. (In Press). The GENIA Corpus: Annotation Levels and Applications. In: Ide, N. and Pustejovsky, J. (Eds.) Handbook of Linguistic Annotation, Springer
Thompson, P., McNaught, J. and Ananiadou, S. (In Press). Customised OCR Correction for Historical Medical Text. Proceedings of DigitalHeritage 2015
Alnazzawi, N., Thompson, P., Batista-Navarro, R. T. B. and Ananiadou, S. (2015). Using text mining techniques to extract phenotypic information from the PhenoCHF corpus. BMC Medical Informatics and Decision Making, 15(Suppl. 2), S3
Alnazzawi, N., Thompson, P. and Ananiadou, S. (2014). Building a semantically annotated corpus for congestive heart and renal failure from clinical records and the literature. Proceedings of the 5th International Workshop on Health Text Mining and Information Analysis (Louhi), Gothenburg, Sweden, pp. 69-74, Association for Computational Linguistics
Ananiadou, S., Thompson, P., Nawaz, R., McNaught, J. and Kell, D. B. (2014). Event Based Text Mining for Biology and Functional Genomics. Briefings in Functional Genomics, 14(3), 213-230
Kontonatsios, G., Mihaila, C., Korkontzelos, I., Thompson, P. and Ananiadou, S. (2014). A hybrid approach to compiling bilingual dictionaries of medical terms from parallel corpora. Statistical Language and Speech Processing, Second International Conference, SLSP 2014 pages 57-69, Springer
Miwa, M., Thompson, P., Korkontzelos, I. and Ananiadou, S. (2014). Comparable Study of Event Extraction in Newswire and Biomedical Domains. Proceedings of Coling 2014, pp. 2270-2279
Rehm, G., Uszkoreit, H., Ananiadou, S., Bel, N., Bieleviciene, A., Borin, L., Branco, A., Budin, G., Calzolari, N., Daelemans, W., Garabik, R., Grobelnik, M., Garcia-Mateo, C., Van Genabith, J., Hajic, J., Hernaez, I., Judge, J., Koeva, S., Krek, S., Krstev, C., Linden, K., Magnini, B., Mariani, J., McNaught, J., Melero, M., Monachini, M., Moreno, A., Odijk, J., Ogrodniczuk, M., Pezik, P., Piperidis, S., Przepiorkowski, A., Rognvaldsson, E., Rosner, M., Pedersen, B., Skadina, I., De Smedt, K., Tadic, M., Thompson, P., Tufis, D., Varadi, T., Vasiljevs, A., Vider, K. and Zabarskaite, J. (2014). The Strategic Impact of META-NET on the Regional, National and International Level. Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pp. 1517-1524, European Language Resources Association
Rosner, M., Attard, A., Thompson, P., Gatt, A. and Ananiadou, S. (2014). Extending a Tool Resource Framework with U-Compare. Human Language Technology Challenges for Computer Science and Linguistics, Lecture Notes in Computer Science, vol 8387, pp. 315-326
Ananiadou, S., Thompson, P. and Nawaz, R. (2013). Enhancing Search: Events and their Discourse Context. Computational Linguistics and Intelligent Text Processing, Lecture Notes in Computer Science, vol 7817, pages 318-334, Springer
Ananiadou, S., Thompson, P. and Nawaz, R. (2013). Mining events from the literature for bioinformatics applications. ACM SIGWEB Newsletter, Autumn 2013
Batista-Navarro, R. T. B., Kontonatsios, G., Mihaila, C., Thompson, P., Rak, R., Nawaz, R., Korkontzelos, I. and Ananiadou, S. (2013). Facilitating the Analysis of Discourse Phenomena in an Interoperable NLP Platform. Computational Linguistics and Intelligent Text Processing, Lecture Notes in Computer Science, vol 7816, pages 559-571, Springer, Berlin Heidelberg
Kontonatsios, G., Korkontzelos, I., Kolluru, B., Thompson, P. and Ananiadou, S. (2013). Deploying and Sharing U-Compare Workflows as Web Services. Journal of Biomedical Semantics, 4:7
Kontonatsios, G., Thompson, P., Batista-Navarro, R. T. B., Mihaila, C., Korkontzelos, I. and Ananiadou, S. (2013). Extending an interoperable platform to facilitate the creation of multilingual and multimodal NLP applications. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Association for Computational Linguistics, Sofia, Bulgaria, pp. 43-48
Mihaila, C., Kontonatsios, G., Batista-Navarro, R. T. B., Thompson, P., Korkontzelos, I. and Ananiadou, S. (2013). Towards a Better Understanding of Discourse: Integrating Multiple Discourse Annotation Perspectives Using UIMA. Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, Association for Computational Linguistics, Sofia, Bulgaria, pp. 79-88 (LAW Challenge Award)
Nawaz, R., Thompson, P. and Ananiadou, S. (2013). Negated BioEvents: Analysis and Identification. BMC Bioinformatics, 14:14 (Highly Accessed)
Nawaz, R., Thompson, P. and Ananiadou, S. (2013). Towards Event-based Discourse Analysis of Biomedical Text. International Journal of Computational Linguistics and Applications, 4(2), 101-120
Nawaz, R., Thompson, P. and Ananiadou, S. (2013). Something old, something new: identifying knowledge source in bio-events. International Journal of Computational Linguistics and Applications, 4(1), 129-144
Thompson, P., Nawaz, R., Korkontzelos, I., Black, W.J., McNaught, J. and Ananiadou, S. (2013). News Search Using Discourse Analytics. Proceedings of the 2013 Digital Heritage International Congress, Marseille, France, pp. 597-604, IEEE
Sophia Ananiadou, John McNaught and Paul Thompson (2012). The English Language in the Digital Age. In Georg Rehm and Hans Uszkoreit (Eds.) White Paper Series, Springer
Maria Liakata, Paul Thompson, Anita de Waard, Raheel Nawaz, Henk Pander Maat and Sophia Ananiadou (2012). A three-way perspective on scientific discourse annotation for knowledge extraction. Proceedings of the ACL Workshop on Detecting Structure in Scholarly Discourse (DSSD), pp. 37-46
Makoto Miwa, Paul Thompson and Sophia Ananiadou (2012). Boosting automatic event extraction from the literature using domain adaptation and coreference resolution. Bioinformatics, 28(13), 1759-1765
Makoto Miwa, Paul Thompson, John McNaught, Douglas B. Kell and Sophia Ananiadou (2012). Extracting semantically enriched events from biomedical literature. BMC Bioinformatics 13:108 (Highly Accessed)
Raheel Nawaz, Paul Thompson and Sophia Ananiadou (2012). Identification of Manner in Bio-Events. Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012), pp. 3505-3510.
Raheel Nawaz, Paul Thompson and Sophia Ananiadou (2012). Meta-Knowledge Annotation at the Event Level: Comparison between Abstracts and Full Papers. Proceedings of the Third Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM 2012), pp. 24-31
Xinkai Wang, Paul Thompson and Sophia Ananiadou (2012). Biomedical Chinese-English CLIR Using an Extended CMeSH Resource to Expand Queries. Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012) pp. 1148-1155
Rosner, M., Attard, A., Thompson, P., Gatt, A. and Ananiadou, S. (2011). Extending a Tool Resource Framework with U-Compare. Proceedings of the 5th Language & Technology Conference (LTC'2011)
Paul Thompson, Yoshinobu Kano, John McNaught, Steve Pettifer, Teresa Attwood, John Keane and Sophia Ananiadou (2011). "Promoting Interoperability of Resources in META-SHARE". Proceedings of the IJCNLP Workshop on Language Resources, Technology and Services in the Sharing Paradigm (LRTS), Chiang Mai, Thailand, November, pp. 50-58 (pdf).
Paul Thompson, Raheel Nawaz, John McNaught and Sophia Ananiadou (2011). "Enriching a biomedical event corpus with meta-knowledge annotation". BMC Bioinformatics, 12:393 (link) (Highly Accessed)
Paul Thompson, John McNaught, Simonetta Montemagni, Nicoletta Calzolari, Riccardo del Gratta, Vivian Lee, Simone Marchi, Monica Monachini, Piotr Pezik, Valeria Quochi, C.J. Rupp, Yutaka Sasaki, Giulia Venturi, Dietrich Rebholz-Schuhmann and Sophia Ananiadou. (2011). "The BioLexicon: a large-scale terminological resource for biomedical text mining." BMC Bioinformatics 12:397 (link) (Highly Accessed)
Sophia Ananiadou, Paul Thompson, Yoshinobu Kano, John McNaught, Teresa K. Attwood, Philip J. R. Day, John Keane, Dean A. Jackson and Steve Pettifer (2011). "Towards Interoperability of European Language Resources". Ariadne, 67 (link)
C.J. Rupp, Paul Thompson, William J. Black, John McNaught and Sophia Ananiadou (2010). "A Specialised Verb Lexicon as the Basis of Fact Extraction in the Biomedical Domain". Proceedings of Interdisciplinary Workshop on Verbs: The Identification and Representation of Verb Features (Verb 2010), Pisa, Italy. (pdf)
Raheel Nawaz, Paul Thompson and Sophia Ananiadou (2010). "Event Interpretation: A Step towards Event-Centred Text Mining". Proceedings of the First International Workshop on Automated Motif Discovery in Cultural Heritage and Scientific Communication Texts (AMICUS 2010), CLARIN/DARIAH 2010, Vienna, Austria.
Sophia Ananiadou, Paul Thompson and Raheel Nawaz (2010). "Improving Search Through Event-based Biomedical Text Mining". Proceedings of the First International Workshop on Automated Motif Discovery in Cultural Heritage and Scientific Communication Texts (AMICUS 2010), CLARIN/DARIAH 2010, Vienna, Austria.
Raheel Nawaz, Paul Thompson and Sophia Ananiadou (2010). "Evaluating a Meta-Knowledge Annotation Scheme for Bio-Events". Proceedings of the Workshop on Negation and Speculation in Natural Language Processing (NeSp-NLP 2010), ACL 2010, Uppsala, Sweden, pp. 69-77 (pdf)
Sophia Ananiadou, Paul Thompson, James Thomas, Tingting Mu, Sandy Oliver, Mark Rickinson, Yutaka Sasaki, Davy Weissenbacher and John McNaught (2010). "Supporting the Education Evidence Portal via Text Mining". Philosophical Transactions of the Royal Society A, 368(1925), 3829-3844. (link)
Raheel Nawaz, Paul Thompson, John McNaught and Sophia Ananiadou (2010).
"Meta-Knowledge Annotation of Bio-Events". Proceedings of the Seventh
International Conference on Language Resources and Evaluation (LREC
2010), Malta, May.
Paul Thompson, Syed A. Iqbal, John McNaught and Sophia Ananiadou (2009).
"Construction of an annotated corpus to support biomedical information
extraction". BMC Bioinformatics 10:349 (link)
Yutaka Sasaki, Paul Thompson, John McNaught and Sophia Ananiadou (2009).
"Biological Event Recognition with Textual Induction" Proceedings of
3rd International Symposium on Languages in Biology and Medicine
Yutaka Sasaki, Paul Thompson, John McNaught and Sophia Ananiadou (2009).
"Three BioNLP Tools Powered by the BioLexicon." Proceedings of EACL
2009 Demonstration Session, pp. 61-64. (pdf)
Giulia Venturi, Simonetta Montemagni, Simone Marchi, Yutaka Sasaki, Paul
Thompson, John McNaught, Sophia Ananiadou (2009). "Bootstrapping a Verb
Lexicon for Biomedical Information Extraction". Proceedings of the
10th International Conference on Intelligent Text Processing and
Computational Linguistics (CICLing 2009), pp. 137-148, Springer (pdf)
Yutaka Sasaki, Paul Thompson, Philip Cotter, John McNaught and Sophia
Ananiadou (2008) Event Frame Extraction Based on a Gene Regulation
Corpus, Proceedings of the 22nd International Conference on
Computational Linguistics (Coling-2008), pp. 761-768, Manchester,
Paul Thompson, Giulia Venturi, John McNaught, Simonetta Montemagni and
Sophia Ananiadou (2008). "Categorising Modality in Biomedical Texts". LREC
2008 workshop "Building and Evaluating resources for biomedical text
mining" Marrakech, Morocco, May. (pdf)
Paul Thompson, Philip Cotter, Sophia Ananiadou, John McNaught,
Simonetta Montemagni, Andrea Trabucco and Giulia Venturi (2008).
"Building a Bio-Event Annotated Corpus for the Acquisition of Semantic
Frames from Biomedical Corpora". Proceedings of the Sixth
International Conference on Language Resources and Evaluation (LREC
2008), Marrakech, Morocco, May.
William Black, Andrew Conroy, Adam Funk, Allan Ramsay, Mark Stairmand,
and Paul Thompson. (2004) “Multilingual Discourse Processing”. In B.
Gambäck and K. Jokinen, editors, Proceedings of the 20th
International Conference on Computational Linguistics, pp. 15-21,
Geneva, Switzerland, August. ACL. ‘Robust and Adaptive Information
Processing for Mobile Speech Interfaces: DUMAS Final Workshop’
Markku Turunen, Esa-Pekka Salonen, Mikko Hartikainen, Jaakko Hakulinen,
William Black, Allan Ramsay, Adam Funk, Andrew Conroy, Paul Thompson,
Mark Stairmand, Kristiina Jokinen, Jyrki Rissanen, Kari Kanto, Antti
Kerminen, Björn Gambäck, Magnus Sahlgren, Fredrik Olsson, Maria Cheadle,
Preben Hansen, and Stina Nylander. (2004). “AthosMail: A Multilingual
Adaptive Spoken Dialogue System for the E-Mail Domain”. In B. Gambäck
and K. Jokinen, editors, Proceedings of the 20th International
Conference on Computational Linguistics, pp. 77-86, Geneva,
Switzerland, August. ACL. ‘Robust and Adaptive Information Processing
for Mobile Speech Interfaces: DUMAS Final Workshop’.
Paul Thompson, Mark Stairmand, and William Black. (2004) “Utterance
Planning in an Agent-based Dialogue System”. In Proceedings of the
3rd International Conference on Natural Language Generation,
University of Brighton, Brockenhurst, England, July
William Black, Paul Thompson, Adam Funk, and Andrew Conroy. (2003)
“Learning to Classify Utterances in a Task-Oriented Dialogue”. In
Kristiina Jokinen, Yorick Wilks, Björn Gambäck, William Black, and
Roberta Catizone, editors, Proceedings of the EACL Workshop on
Dialogue Systems: Interaction, Adaptation and Styles of Management,
pp 9-16, Budapest, Hungary, April. (pdf)
Paul Thompson (2003) “Decision Trees for Dialogue Act Classification”.
In 6th Annual Computational Linguistics in the UK Research Colloquium,
Edinburgh, Scotland, January.