Interoperability of text mining tools (2006)
The topic of this project is the integration of LiLFeS, a typed feature structure (TFS)-based logic programming system, with UIMA.
LiLFeS has been developed and maintained by the NLP laboratory, Department of Computer Science, University of Tokyo, under the direction of Prof. J. Tsujii. This project will enhance NaCTeM's current work in collecting NLP and text mining software modules developed by the community and distributing them in interoperable forms. The proposed enabler layers will bring the benefits of TFS-based logic programming to our collection of UIMA-compliant software.
The aim of the project is to develop UIMA enabler layers for LiLFeS, which will provide the UIMA user community with the functionality of a TFS-based logic programming system, and to transfer NLP modules based on TFS formalisms that were developed jointly by the NLP groups at the Universities of Manchester and Tokyo. NaCTeM will be responsible for distributing the resulting UIMA-compliant unification engine and LiLFeS to the user community of NLP and text mining software.
Duration: 2006-2007, Funding: $22,000
Comparator of Natural Language Processing Tools using UIMA (2007)
The aim of this proposal is to produce an integration environment that allows tools developed independently by other groups to be combined as freely and flexibly as possible. Our integration environment is not a mere repository of UIMA-compliant tools. To achieve this level of interoperability, we have designed common type systems that can be shared among different groups. These include POS tags, named entities for biomedicine, phrase markers for shallow parsers, and dependency links for shallow dependency parsers. Besides our work within NaCTeM, we have also leveraged type systems designed in two of our related research projects, BOOTStrep and GENIA. Our ongoing work includes the definition of sharable type systems for more complex tools, such as deep parsers, and for relational annotations in biology and in other domains related to NaCTeM's service provision, namely health and social sciences. We have proposed an annotation type system, designed and implemented for a UIMA tool suite, that is the backbone of our text mining applications.
Although sharing common type systems is the first step towards interoperability, we go a step further and guarantee the valid combination of NLP tools. For example, two taggers using the same type system for POS often assign different tags to specific words. The proposed integration environment will help users to locate such minute differences and thus develop, with minimum cost, necessary translation modules to emulate tools to be replaced. Because the performance of NLP tools is highly dependent on text types and subject domains as well as specific applications, the users of the integration environment would require guidance for selecting the best combination of tools for their specific tasks.
In this proposal, comparators play the central role in the integration environment in two ways: they will help users to replace original Text Analysis Engines (TAEs) in generic aggregate TAEs, and to choose the best combination of TAEs for their tasks. Based on the common type systems that we have defined for bio-text mining (and, currently, for other domains), we will construct a set of task-specific comparators and develop an integration environment that deploys them.
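As a rough illustration of what a POS comparator might compute, the following Python sketch aligns the output of two taggers on identical character spans and tallies the tag pairs on which they disagree. It is not taken from the project's software: the `Token` record, the `compare_pos` function, and the sample tags are hypothetical names and data chosen only for this example, and a real comparator would read annotations from UIMA rather than from plain Python lists.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Tuple


class Token(NamedTuple):
    """Hypothetical stand-in for a POS annotation: character span plus tag."""
    begin: int
    end: int
    pos: str


def compare_pos(a: Iterable[Token], b: Iterable[Token]) -> Tuple[Counter, float]:
    """Compare two taggers' output over tokens that share the same span.

    Returns a Counter of (tag_from_a, tag_from_b) disagreements and the
    fraction of shared tokens on which the two taggers agree.
    """
    tags_b = {(t.begin, t.end): t.pos for t in b}
    confusions: Counter = Counter()
    shared = 0
    agreed = 0
    for tok in a:
        other = tags_b.get((tok.begin, tok.end))
        if other is None:
            # Tokenisation differs here; this sketch simply skips the token.
            continue
        shared += 1
        if other == tok.pos:
            agreed += 1
        else:
            confusions[(tok.pos, other)] += 1
    agreement = agreed / shared if shared else 0.0
    return confusions, agreement


# Made-up output of two taggers for the text "The run ended".
tagger_a = [Token(0, 3, "DT"), Token(4, 7, "NN"), Token(8, 13, "VBD")]
tagger_b = [Token(0, 3, "DT"), Token(4, 7, "VB"), Token(8, 13, "VBD")]

confusions, agreement = compare_pos(tagger_a, tagger_b)
for (tag_a, tag_b), count in confusions.most_common():
    print(f"A={tag_a} B={tag_b}: {count}")
print(f"agreement: {agreement:.2f}")
```

A confusion table of this kind is the sort of evidence a user could draw on when writing a translation module that maps one tagger's output onto the conventions of the tool it replaces.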
Duration: 2007-2008
Collaborators: Y. Kano (University of Tokyo); Jun'ichi Tsujii (University of Tokyo); Eric Nyberg (CMU)
Funding: $22,000
This proposal is based on a jointly developed software platform (U-Compare) which allows users to configure their own workflow, compare possible combinations of UIMA components and visualize the results based on our shared type system.
The aim of the current proposal is to enhance U-Compare by: expanding its type system; transferring more NLP/TM components and linguistic resources into its environment to meet the broader requirements of biomedical NLP; and providing an interface for NLP components enhanced with machine learning software.
The enhanced U-Compare will allow users to choose the best-of-breed tools and resources for different tasks from a large repository of UIMA-compliant NLP tools.