Dr Sophia Ananiadou

Links - Software

Text mining tools used in NaCTeM

TerMine automatically extracts technical terms. It is based on C-value, a hybrid, domain-independent method for automatic term recognition.

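C-value combines linguistic and statistical information, with the emphasis on the statistical part. The linguistic analysis enumerates all candidate terms in a given text using linguistic filters. It takes as input text annotated with part-of-speech tags; for biomedical text processing we use the Genia part-of-speech tagger. The statistical analysis assigns a termhood score to each candidate term using the following four characteristics: the frequency of occurrence of the candidate term; the frequency of the candidate term as part of other, longer candidate terms; the number of these longer candidate terms; and the length of the candidate term (in number of words).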
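The statistical step can be illustrated with a minimal Python sketch of the published C-value formula. This is not TerMine's implementation. It assumes the candidate terms have already been extracted by the linguistic filters and counted, and it simplifies the handling of nested terms by treating every longer candidate that contains a term as its context. The function names and the sample frequencies are hypothetical.

```python
import math


def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous word sequence inside the
    strictly longer candidate `longer` (both are tuples of words)."""
    n, m = len(longer), len(shorter)
    return n > m and any(longer[i:i + m] == shorter for i in range(n - m + 1))


def c_value(freq):
    """Score each candidate term (a tuple of words) from its frequency.

    freq maps candidate -> frequency of occurrence in the corpus.
    Simplifying assumption: nested frequency is summed over all longer
    candidates, without the incremental corrections of the full algorithm.
    """
    scores = {}
    for a, f_a in freq.items():
        longer = [b for b in freq if contains(b, a)]
        if longer:
            # Discount occurrences that are explained by longer terms.
            adjusted = f_a - sum(freq[b] for b in longer) / len(longer)
        else:
            adjusted = f_a
        # log2 of the length in words; note this is 0 for one-word candidates.
        scores[a] = math.log2(len(a)) * adjusted
    return scores


# Hypothetical frequencies, for illustration only.
freq = {
    ("soft", "contact", "lens"): 5,
    ("contact", "lens"): 8,
    ("basal", "cell", "carcinoma"): 4,
}
scores = c_value(freq)
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(" ".join(term), round(score, 2))
```

In this sketch "contact lens" is penalised because five of its eight occurrences fall inside the longer candidate "soft contact lens", while the two three-word candidates keep their full frequency and gain from the length factor.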

The current implementation is optimized for scalability and processing speed: given a set of 1.3 million MEDLINE abstracts (2 GB of text), the standalone version of TerMine extracts 9.8 million term candidates and their termhood scores in about ten minutes.

Demo of TerMine

Medie is an intelligent search engine that retrieves biomedical events from Medline. It is built on the analysis produced by Enju, which performs deep parsing of biomedical text.

Demo of Medie

InfoPubMed is based on Medie. It helps users find relevant information about genes, proteins, and their interactions.

Demo of InfoPubMed

©2006 University of Manchester School of Informatics