Dr Sophia Ananiadou

Links - Software

Genia Part-of-Speech Tagger for biomedical text mining
Enju, a probabilistic HPSG parser
TIMS, Tag Information Management System [pdf]

Text mining tools used in NaCTeM

TerMine: a terminology management system.

TerMine automatically extracts technical terms. It is based on C-value, a hybrid, domain-independent method for automatic term recognition.

C-value combines linguistic and statistical information, with the emphasis on the statistical part. The linguistic analysis enumerates all candidate terms in a given text using linguistic filters. C-value takes as input text annotated with part-of-speech tags; for biomedical text processing we use the Genia part-of-speech tagger. The statistical analysis assigns a termhood score to each candidate term using the following four characteristics:

the occurrence frequency of the candidate term
the frequency of the candidate term as part of other longer candidate terms
the number of these longer candidate terms
the length of the candidate term
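
The four characteristics above can be combined as in the published C-value formula: a candidate that never appears inside a longer candidate scores log2(length) × frequency, while a nested candidate has the average frequency of the longer candidates containing it subtracted first. The following Python sketch illustrates this under simplifying assumptions. It is not the TerMine implementation: the function names and the example counts are hypothetical, candidates are assumed to be already extracted as token tuples with their frequencies, and the frequency of longer terms is taken as given rather than adjusted for further nesting.

```python
import math


def contains(longer, shorter):
    """Return True if `shorter` occurs as a contiguous token run in `longer`."""
    n, m = len(longer), len(shorter)
    return any(longer[i:i + m] == shorter for i in range(n - m + 1))


def c_value(freqs):
    """Score candidate terms (token tuples -> frequency) with a simplified C-value.

    Hypothetical helper for illustration only; single-word candidates
    score 0 here because log2(1) = 0.
    """
    scores = {}
    for term, freq in freqs.items():
        # Frequencies of the longer candidates that contain this term.
        longer = [f_b for b, f_b in freqs.items()
                  if len(b) > len(term) and contains(b, term)]
        weight = math.log2(len(term))  # length of the candidate term
        if not longer:
            # Not nested: length weight times occurrence frequency.
            scores[term] = weight * freq
        else:
            # Nested: subtract the mean frequency of the containing terms.
            scores[term] = weight * (freq - sum(longer) / len(longer))
    return scores


# Made-up counts, for illustration only.
example = {
    ("adenoid", "cystic", "basal", "cell", "carcinoma"): 2,
    ("basal", "cell", "carcinoma"): 5,
    ("cell", "carcinoma"): 7,
}
scores = c_value(example)
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(" ".join(term), round(score, 3))
```

In this example "cell carcinoma" occurs 7 times but is nested in two longer candidates with frequencies 5 and 2, so its score is log2(2) × (7 − 7/2) = 3.5.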

The current implementation is optimized for scalability and processing speed: given a set of 1.3 million MEDLINE abstracts (2 GB of text), the standalone version of TerMine extracts 9.8 million term candidates and their termhood scores in about ten minutes.

Demo of TerMine

Medie: an intelligent search engine

Medie is an intelligent search engine that retrieves biomedical events from MEDLINE. It builds on the analysis produced by Enju, which performs deep parsing of biomedical text.

Demo of Medie

InfoPubMed: extracts genes, proteins and their interactions

InfoPubMed is based on Medie. It helps users find relevant information about genes, proteins and their interactions.

Demo of InfoPubMed

New Book out now: Text Mining for Biology and Biomedicine