Information for Prospective Postgraduate Students
- academic year 2010/11-

  I will be glad to supervise postgraduate students in various areas of text mining and natural language processing. Here are some general themes (please see below for details):
  • Integrated text and data mining
  • Text mining for biology, biomedicine, medicine and health-care
  • Text mining for e-science, e-commerce and e-government
  • Text processing for the Semantic Web
  • Text analytics and sentiment analysis (e.g. blog mining)
  • Multi-lingual text mining
  • NLP for Serbian

Scholarships available

  • Mining Protein Interaction Data and its Context from the Scientific Literature (CASE Studentship with Pfizer)
    A full four-year BBSRC-funded CASE studentship is available to study the way findings, experiments and knowledge about protein interactions are presented in the literature, and in particular how contextual information that details protein interactions is encoded and presented. The focus will be on pharmaceutically relevant protein interaction data sets (e.g. pathogens such as HIV, hepatitis viruses and malaria). More details on the research topic are available here.

    The studentship is open to UK/EU applicants and will pay tuition fees in addition to an enhanced stipend. It will also involve a research placement with the industrial CASE partner, Pfizer Global R&D in Sandwich. Applicants should ideally have experience in computational biology, bioinformatics, computer science or a related subject area. Knowledge of a programming language and text and/or data mining would be a distinct advantage. Please e-mail {David.Robertson, G.Nenadic}_AT_manchester.ac.uk if you require further details. The deadline for applications is March 31, 2010.

  • Extraction, representation and exploration of analytical methods in computational biology
    A full four-year BBSRC-funded studentship is available to model and create, by automated extraction from the scientific literature, a knowledge base of data analysis techniques from across computational biology. Supplemented with profiles that summarise the semantics and usage of the techniques, this resource will help biologists choose the data analysis technique best suited to their questions and data, in particular when the technique lies outside their immediate expertise. More details on the research topic are available here.

    The studentship is open to UK/EU applicants and will pay tuition fees in addition to a generous stipend. Applicants should have a good first degree in computer science, bioinformatics or computational biology, or in a related subject area. An MSc in a relevant subject area would be a distinct advantage. Programming skills will be essential, as is an interest in multi-disciplinary research. The work will involve development of a broad skill base, including knowledge representation and modelling techniques, text mining and bioinformatics. Please e-mail {Robert.Stevens, David.Robertson, or G.Nenadic}_AT_manchester.ac.uk if you require further details. The deadline for applications is 16th April, 2010.

  • Google European Doctoral Fellowship Programme
    The School of Computer Science is seeking to recruit up to two PhD students who are eligible to be nominated for the Google European Doctoral Fellowship Programme for 2010. The programme, running for the first time this year, has been established by Google to support the best and brightest PhD students in the field of Computer Science. Europe's leading Computer Science departments have been asked to nominate candidates for these fully funded fellowships to study various topics, including Natural Language Processing, Machine Translation, Search and Information Retrieval and Social Computing.

    The Google European Doctoral Fellowship Programme was created to support outstanding PhD students doing exceptional research in the field of computer science. Google will award approximately 10 named fellowships, and these are open to prospective home/EU and international PhD students. The application form must be received by the University of Manchester (if you want to study here) by March 19th 2010.

  • School of Computer Science's PhD Scholarships for UK/EU applicants in 2010/11
    The School of Computer Science of the University of Manchester has a number of three-year PhD studentships to offer to highly motivated UK/EU students. The scholarships cover a wide range of topics. In addition, there is one scholarship available that is specifically related to research in areas of interest to the Medical Research Council (http://www.mrc.ac.uk/), so if you are interested in text mining in the area of health-care and medicine, please contact me.

    Applicants should have an excellent first degree in computer science, informatics, bioinformatics, mathematics or a related discipline, and preferably an MSc and/or some publications. The studentships pay tuition fees and a stipend to cover living expenses for 3 years. Because of conditions associated with this funding, these studentships are open to students eligible for home fees only; this includes UK and EU nationals (non-EU students should check this page). Decisions on successful applications will be made on 26 March, 28 May and 30 July 2010. More details on how to apply are available here.

Environment

The School of Computer Science is one of the leading Schools in the UK, renowned for the excellence of its research. Its international research reputation is reflected in its high ranking in the last national Research Assessment Exercise (RAE), which places the School among the best five Computer Science departments in the UK (ranked second for research power). The School has a vibrant research environment with more than 150 PhD students, 90 research staff and 70 academic staff.

Our research team is part of the Text Mining/NLP research group within the School of Computer Science. We are also affiliated to the Manchester Interdisciplinary BioCentre. We investigate methodologies for the extraction of both explicit and implicit knowledge from large collections of textual documents. This field is known as text mining (TM), natural language processing (NLP) and/or text analytics. More precisely, we are interested in:

  • Terminology mining (term/entity identification, controlled vocabularies)
  • Relationship extraction from text (linking entities)
  • Service architectures for text mining (interoperable services, middleware, resources, frameworks).
Our research combines methods from computational linguistics (e.g. shallow parsing, local grammar modelling), knowledge representation (ontologies) and intensive data mining (feature selection, classification and clustering).

Our main focus is in the domains of biology (biomedical literature), healthcare and medicine (patient/hospital records), but we also investigate other domains/genres (e.g. blogs).

If you are interested in any of the scholarships above (or topics below), please contact me directly.

Research topics

Text mining for biology, biomedicine and health-care

In general, the main objective of these topics is to develop solutions to locate, extract and present useful information and knowledge buried in various biomedical textual resources. Research undertaken here will be closely related to the activities of the Manchester Interdisciplinary BioCentre (MIB), in particular to the research project "Mining term associations from literature to support knowledge discovery in biology".
  • Meta-data annotation of biomedical documents. The goal of this project would be to develop a system that will generate meta-data annotation (i.e. assign domain-specific categories to documents) automatically. The main idea is to make use of both domain terms recognised in documents and existing databases. The project may also include extraction of lexical, syntactic and contextual associations from documents, and their further incorporation into meta-annotation.

  • Integrated and contrastive text and data mining for biological research. The aim of this research is to use various results of text and data mining in order to integrate or contrast findings drawn from heterogeneous sources.

  • Understanding terminological coordinations. We have shown that morpho-syntactic information is not sufficient for the recognition and identification of terminological coordinations. The goal of this project is to investigate alternative methods (such as background knowledge and statistics) to improve both precision and recall of coordination extraction.

  • Design and evaluation metrics for bio-text mining. Text mining scenarios are small-scale, but real-world problems that are defined in close cooperation with domain specialists in order to support solving a specific set of problems by text mining. This project will design and evaluate a framework for text mining scenarios in various subdomains.

Text mining for e-science, e-commerce, e-health and e-government

Apart from biomedicine, other domains also generate huge document repositories. The main objective of these topics is to address specific application areas of e-science, e-commerce and e-government.
  • Automatic terminology processing for e-science, e-commerce and e-government. The aim is to investigate methods for automatic identification of terms in these domains. This will include term recognition, term classification and mapping of terms into existing term-databases, or population of knowledge bases and ontologies.

  • Terminology and ontology driven text mining. The goal of this project would be to investigate possibilities for text mining in the domains of e-commerce, engineering and legislation (e-government) using existing, manually produced terminologies and ontologies, as well as resources that have been automatically mined from documents.

  • Integration of text mining into business intelligence applications for e-commerce. The goal of this project would be to investigate possibilities for integrating text and Web mining into systems that provide business intelligence for various sectors, including e-commerce.

  • Design and evaluation metrics for text mining. Text mining scenarios are small-scale, but real-world problems that are defined in close cooperation with domain specialists in order to support solving a specific set of problems by text mining. This project will design and evaluate a framework for text mining scenarios in various domains (e.g. biomedical, engineering, legislation).

Text processing for the Semantic Web

The Semantic Web is one of the main research directions and concepts for improving (automated) accessibility of the Web. The main objective of these topics is to develop text processing methods that will automatically generate knowledge that can be used as a basis for Semantic Web applications.
  • Mining bioinformatics services, resources and workflows from documents. There are a number of services and resources available to the bioinformatics community, but the meta-data that describes them is typically scarce. This project aims to develop text mining techniques to automatically describe, locate, retrieve and reason about bioinformatics services and resources. We investigate methods that extract descriptions from various document types (articles, reviews, application notes, email archives, discussion forums, etc.), and map them to service descriptions using both general service ontologies and domain-specific ontologies. As a working and target environment, the project uses the myGRID/Taverna infrastructure.

  • Trustworthiness of information presented on the Web. The main issue is to investigate to what extent the information that is presented on the Web can be trusted.

Text analytics and sentiment analysis

Sentiment analysis is the extraction of attitudes and opinions from human-authored documents. The capture and analysis of such attitudes and opinions in an automated and structured fashion might offer a powerful technology to a number of problem domains, including business intelligence, marketing, national security, and crime prevention. This project would aim to develop technologies for extraction and analysis of sentiment from free text using a combination of natural language processing (NLP), text mining and machine learning techniques. An interesting experimental area would be blog mining. The work will involve building models of sentiment from which suitable templates for extraction will be designed. Apart from the domains mentioned above, the approach will be tested in the scientific domain (testing the hypothesis that scientific articles involve less sentiment than other genres).
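
To give a flavour of the simplest possible starting point, here is a minimal lexicon-based sketch in Python. It is only an illustration, not the approach the project would take: the word list, the weights and the one-word negation rule are all made-up assumptions, and a real system would rely on learned models and richer linguistic analysis.

```python
import re

# Hypothetical toy lexicon: words and weights are illustrative assumptions,
# not a resource used by the project.
LEXICON = {"good": 1, "great": 2, "excellent": 2,
           "bad": -1, "poor": -1, "terrible": -2}
NEGATORS = {"not", "no", "never"}


def sentiment_score(text):
    """Sum lexicon weights over the tokens of `text`.

    A word that directly follows a negator has its weight flipped.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        weight = LEXICON.get(token, 0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            weight = -weight
        score += weight
    return score
```

A positive score suggests positive sentiment, a negative score negative sentiment, and zero means no lexicon word was found. Comparing such scores across genres (e.g. blogs versus scientific articles) would be one crude way to start probing the hypothesis above.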

Multi-lingual text mining

  • Terminology driven multi-lingual information retrieval in digital libraries. The goal of this project is to investigate possibilities for cross-lingual text mining that is driven by domain terminologies that are acquired in parallel.

  • Topic-focused crawler in a multi-lingual environment. (see above)

  • Text categorisation. The scope of this project is to develop techniques for multi-lingual categorisation of documents, in particular in a dynamic Web environment using a set of ontological and terminological resources.

NLP for Serbian

The main idea is to provide standards-based solutions to basic NLP problems for a morphologically rich language like Serbian.
  • POS-tagging for Serbian. The idea is to investigate various POS-tagging methods (including rule-based, probabilistic and machine-learning) for different text types. Also, a challenging problem could be to design a POS-tagger based on a voting system.

  • Named-entity recognition. The scope of this project would be to develop methods (rule-based, probabilistic or machine-learning) to recognise various classes of named entities in Serbian.

  • Shallow parsing for Serbian. The aim is to develop a set of local grammars to support identification of basic chunks in text.

  • Information retrieval for Serbian. The idea is to investigate various indexing methods for information retrieval in Serbian, and to produce a simple search engine. The project will also include a language identification module.

  • Development of domain-specific WordNets. The aim is to develop basic ontologies for specific domains and to integrate them into the Serbian WordNet. The project would also include validation of the developed WordNets.
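
As an illustration of the voting idea mentioned under POS-tagging, here is a minimal Python sketch of majority voting over the outputs of several taggers. It assumes that each tagger has already produced one tag per token for the same tokenisation; the function name, the tie-breaking rule (the earliest listed tagger wins) and the example tags are assumptions made for illustration only.

```python
from collections import Counter


def vote(tag_sequences):
    """Combine per-token tags from several taggers by majority vote.

    `tag_sequences` holds one list of tags per tagger, in tagger order.
    Ties are broken in favour of the tagger listed first.
    """
    if not tag_sequences:
        return []
    length = len(tag_sequences[0])
    if any(len(seq) != length for seq in tag_sequences):
        raise ValueError("all taggers must tag the same number of tokens")
    combined = []
    for candidates in zip(*tag_sequences):
        counts = Counter(candidates)
        best = max(counts.values())
        # Pick the first tag, in tagger order, that reaches the top count.
        combined.append(next(t for t in candidates if counts[t] == best))
    return combined
```

For example, with three taggers proposing `["N", "V", "A"]`, `["N", "N", "A"]` and `["V", "V", "A"]`, the combined sequence is `["N", "V", "A"]`. A more realistic design might weight each tagger by its accuracy on held-out data.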

   For more details, please contact me directly.


Back to Goran Nenadic's home page