PhD thesis

Three new PhD positions are currently available within the DKM research unit. These positions are among the grants of the ICT International Doctoral School of the University of Trento. The call will open in spring. In the meantime, for more information you can contact the reference person indicated for each position. PhD theses are developed within the Data and Knowledge Management research unit of FBK (http://dkm.fbk.eu), an international and interdisciplinary group of about 15 people doing both research and projects in the areas of data analysis and of knowledge acquisition, representation, integration and reasoning.

Information extraction for ontology engineering

(for more details and statements of interest contact Marco Rospocher or Chiara Ghidini <ghidini@fbk.eu>)

Background

Despite the growing maturity of ontological engineering tools, ontology knowledge acquisition remains a highly manual, time-consuming, and complex task that can easily hinder the ontology building process. Automatic ontology learning is a well-established research field whose goal is to support the semi-automatic construction of ontologies starting from available digital resources (e.g., a corpus, web pages, dictionaries, semi-structured and structured sources) in order to reduce the time and effort required by the ontology development process.

The ontology learning field has an ambitious research plan: to extract increasingly complex information, ranging from terms, to relations, to hierarchies, and finally to axioms. In spite of the efforts and progress made, state-of-the-art methods and tools still mainly focus on the extraction of terms, with few exceptions addressing more complex tasks such as the extraction of (possibly hierarchical) relations and of axioms. Moreover, the performance of current algorithms appears better suited to supporting the construction of light-weight, medium-quality ontologies than of good-quality conceptualizations of a domain that follow good practices in ontology modeling. To give two very simple examples, current algorithms for term extraction provide reasonable precision and recall, but they do not provide a precise, shared, and well-founded distinction between classifying a term as an individual and classifying it as a concept. Similarly, most algorithms for relation extraction are able to identify relations at the instance level, but are not able to abstract them to the concept level or to identify further characteristics of these relations (e.g., their cardinality, functionality, symmetry, and so on).
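To make the last point concrete, the following minimal Python sketch shows what identifying characteristics of a relation from instance-level data could look like. It is only an illustration: the function name `relation_characteristics`, the triple format, and the example data are assumptions introduced here, not part of any existing ontology learning tool. Note also that a property observed on extracted instances is only evidence for an axiom, since the data may be incomplete.

```python
from collections import defaultdict


def relation_characteristics(triples):
    """Summarise instance-level (subject, relation, object) triples per relation.

    Hypothetical helper: the reported properties hold for the observed
    instances only and are candidates for axioms, not proofs of them.
    """
    pairs = defaultdict(set)
    for subj, rel, obj in triples:
        pairs[rel].add((subj, obj))

    summary = {}
    for rel, so_pairs in pairs.items():
        objects_per_subject = defaultdict(set)
        for subj, obj in so_pairs:
            objects_per_subject[subj].add(obj)
        counts = [len(objs) for objs in objects_per_subject.values()]
        summary[rel] = {
            # Functional: every observed subject has exactly one object.
            "functional": all(count == 1 for count in counts),
            # Symmetric: every observed pair also appears reversed.
            "symmetric": all((obj, subj) in so_pairs for subj, obj in so_pairs),
            # Largest number of distinct objects seen for a single subject.
            "max_cardinality": max(counts),
        }
    return summary


# Invented instance-level triples, as a relation extractor might return them.
triples = [
    ("alice", "marriedTo", "bob"),
    ("bob", "marriedTo", "alice"),
    ("alice", "bornIn", "trento"),
    ("bob", "bornIn", "rome"),
    ("alice", "authorOf", "paper1"),
    ("alice", "authorOf", "paper2"),
]

summary = relation_characteristics(triples)
for rel, props in sorted(summary.items()):
    print(rel, props)
```

On this toy data, `marriedTo` is observed to be functional and symmetric, `bornIn` functional but not symmetric, and `authorOf` neither. Lifting such observations to the concept level, and deciding when they are reliable enough to be stated as axioms, is exactly the kind of step that current tools leave to the ontology engineer.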

Objective

The aim of this thesis is to investigate how to combine two lines of work: automatic ontology learning, which is mainly based on natural language processing, information extraction, statistics, and machine learning techniques, and methodologies and tools for manual knowledge engineering. The goal is to produce (semi-)automatic services for ontology learning that better support the construction of rich, good-quality ontologies. The work will start from an investigation of the information extraction techniques currently available in the field of natural language processing, and from their comparison with the requirements coming from ontology design methodologies in the ontology engineering field. It will then research how to tailor those techniques to fulfill these requirements and to produce tools (or services) able not only to extract individuals, concepts, relations, hierarchies, and axioms, but also to ground them in good ontology practices.

The work will address key research challenges in both natural language processing and ontology engineering. It will have strong algorithmic and methodological aspects, together with implementation-oriented tasks.


Integrating logical and statistical reasoning

(for more details and statements of interest contact Luciano Serafini <serafini@fbk.eu>)

Background

In the last decade, automated reasoning techniques have reached a high level of sophistication and are able to support reasoning on large knowledge repositories expressed in different logical languages. Examples are SAT-based reasoners for propositional logic, SMT (satisfiability modulo theories) solvers, reasoners for Description Logics and other semantic web languages, and resolution-based theorem provers. Meanwhile, complex statistical methods such as support vector machines, kernel methods, and graphical models have been studied and developed. These systems are capable of learning regularities in large data sets and of synthesizing the result in a model that supports stochastic inference. The two methodologies have reached such a level of maturity that one can envisage profitably combining them in a single uniform system that allows learning and reasoning at the same time.

During the last three years the FBK joint research project Copilosk has investigated the advantages of combining these two methods for solving problems in natural language processing. The results are extremely interesting and encouraging: they show that using background knowledge (available in the semantic web) in combination with machine learning methods improves performance in many important NLP tasks [1,2,3]. Continuing in this direction, we would like to design a general methodology and a formal reference model. In the literature there have already been some attempts in this direction, such as Markov Logic Networks [4], fuzzy logics, and work that bridges logic with kernel machines [5]. These approaches, however, extend machine learning techniques to include some logical knowledge, and they are limited in how far they exploit logical reasoning in combination with learning.

Objective

With this thesis we would like to define a formal framework that integrates reasoning and learning in a uniform model. In this new framework it should be possible to define the following two general tasks:

  • Learning from data in the presence of background knowledge. This task is important because it implements what can be seen as incremental learning, where learning is performed in successive steps and at each step the system can reuse the knowledge acquired in the previous steps.
  • Logical reasoning in the presence of real observed data. In this task logical reasoning also takes into account the statistical regularities observable in the data. This makes it possible to implement "plausible reasoning", i.e., inferences that are not fully correct logically but are in fact acceptable because some extreme cases never happen (according to the data) and are therefore irrelevant from the statistical point of view.

This new framework should combine one of the standard statistical models, such as graphical models or regularization methods, with automated reasoning techniques such as SAT-based, tableaux-based, or resolution-based reasoning.
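As a very rough illustration of the "plausible reasoning" task, the Python sketch below accepts a rule when the exceptions to it observed in the data are rare enough. It is a toy under stated assumptions and not the framework this thesis aims to define: the function names `rule_support` and `plausibly_holds`, the `tolerance` threshold, and the bird example are all hypothetical and introduced here only for illustration.

```python
def rule_support(observations, premise, conclusion):
    """Count observations covered by a rule 'premise -> conclusion'.

    Returns (covered, exceptions): how many observations satisfy the
    premise, and how many of those violate the conclusion.
    """
    covered = [obs for obs in observations if premise(obs)]
    exceptions = [obs for obs in covered if not conclusion(obs)]
    return len(covered), len(exceptions)


def plausibly_holds(observations, premise, conclusion, tolerance=0.05):
    """Accept a rule if the observed exception rate is within `tolerance`.

    `tolerance` is an assumed, hand-picked threshold; with tolerance=0.0
    the check accepts only rules with no observed counterexample.
    """
    covered, exceptions = rule_support(observations, premise, conclusion)
    if covered == 0:
        return False  # No evidence either way: do not accept the rule.
    return exceptions / covered <= tolerance


# Invented observations: 99 flying birds, 1 non-flying bird, 20 non-birds.
observations = (
    [{"bird": True, "flies": True}] * 99
    + [{"bird": True, "flies": False}]
    + [{"bird": False, "flies": False}] * 20
)


def is_bird(obs):
    return obs["bird"]


def flies(obs):
    return obs["flies"]


print(rule_support(observations, is_bird, flies))
print(plausibly_holds(observations, is_bird, flies))
print(plausibly_holds(observations, is_bird, flies, tolerance=0.0))
```

With the default tolerance the rule "birds fly" is accepted despite the single counterexample, whereas a strictly logical reading (`tolerance=0.0`) rejects it. A real integration would replace the frequency count with a proper statistical model and the rule check with an automated reasoner.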

References

[1] Volha Bryl, Claudio Giuliano, Luciano Serafini, Kateryna Tymoshenko. Using Background Knowledge to Support Coreference Resolution. In Proceedings of the 19th European Conference on Artificial Intelligence (ECAI 2010), Lisbon, Portugal, August 16-20, 2010, pp. 759-764.

[2] Volha Bryl, Claudio Giuliano, Luciano Serafini, Kateryna Tymoshenko. Supporting natural language processing with background knowledge: coreference resolution case. In Proceedings of the 9th International Semantic Web Conference (ISWC 2010), Shanghai, China, November 7-11, 2010 (Springer), pp. 80-95.

[3] Volha Bryl, Sara Tonelli, Claudio Giuliano, Luciano Serafini. A Novel FrameNet-based Resource for the Semantic Web. To appear in Proceedings of the ACM Symposium on Applied Computing (SAC 2012), Technical Track on The Semantic Web and Applications (SWA), Riva del Garda (Trento), Italy, March 25-29, 2012.

[4] Matthew Richardson, Pedro Domingos. Markov Logic Networks. Machine Learning, 62 (2006), pp. 107-136.

[5] Michelangelo Diligenti, Marco Gori, Marco Maggini, Leonardo Rigutini. Bridging logic and kernel machines. Machine Learning, 86(1) (2012), pp. 57-88.


Behavior recognition and induction via semantic reasoning over human activity processes

Thesis in collaboration with the Skil lab in Trento of Telecom Italia

(for more details and statements of interest contact Luciano Serafini <serafini@fbk.eu>)

Background

Modern (smart) mobile devices allow for a very wide variety of actions (communication, browsing, application execution) and, in addition to the standard data related to phone calls, include many different sources of information coming from sensors (e.g., GPS position, accelerometer data, etc.). This scenario has led to the birth of novel research areas such as context awareness, situation detection, activity recognition, behavior understanding and many others, which aim at exploiting all this information in order to support the user in multiple daily tasks. In parallel, but on a completely different stage, the semantic web and linked open data have made available a huge quantity of semantic data and knowledge: semantic tagging of geographical data (e.g., OpenStreetMap), general knowledge about persons, locations, organizations and events (e.g., available in DBpedia, Freebase, etc.), and general terminological and ontological knowledge (e.g., schema.org, the SUMO and DOLCE upper-level ontologies, YAGO2, WordNet and FrameNet). This scenario opens up new research challenges in combining raw sensor data with semantic information and ontological knowledge for the analysis of human behavior. The implementation of this vision requires an effective and deep integration of techniques from different disciplines in computer science, such as data mining, machine learning, semantic web, and knowledge representation and reasoning. The aim of this PhD proposal is to address key research challenges in these fields and, in particular, to investigate the benefits of applying semantic-based technologies for modeling and reasoning over human activity processes.

Objective

The student will develop a research plan which will cover the following three important and complementary aspects:

(i) investigation of models for combining, modifying and extending the standard techniques of data and knowledge processing, in order to provide a framework that supports reasoning and learning with raw data, information and knowledge;

(ii) definition of reasoning services on top of the applied techniques and formalisms;

(iii) modeling, development and experimentation on practical real-world problems in different fields (e-health, smart-city, ...).

Academic advisor: Prof. Luciano Serafini

Industrial advisor: Michele Vescovi (Michele.vescovi@guest.telecomitalia.it)