Building Quality Event-centric Knowledge Graphs from Text

Three-year PhD Grant on "Building Quality Event-centric Knowledge Graphs from Text" offered as part of the Joint PhD Programme in Smart Computing (Universities of Florence, Pisa, and Siena).
The grant is financed by the Data and Knowledge Management (DKM) research unit at Fondazione Bruno Kessler (FBK), Trento, Italy, where most of the research activities will be conducted under the supervision of Dr. Marco Rospocher.

PhD Call: http://www.unifi.it/cmpro-v-p-11202.html
Deadline: 10th August 2017, 12:00 p.m. (midday), Italian time

Prospective students are strongly encouraged to get in touch with Dr. Marco Rospocher before applying.


Description

The aim of this PhD studentship is to undertake research in the area of knowledge extraction from text. This is a challenging interdisciplinary research area at the crossroads of Natural Language Processing (NLP), Knowledge Representation and Reasoning (KRR), and Semantic Web (SW).

Recent approaches for knowledge extraction from text (e.g., [1], [2]) have focused on the extraction of event knowledge from resources such as news articles, Wikipedia pages, blog posts, etc. These approaches build on NLP pipelines consisting of tools that perform several tasks (e.g., Named Entity Recognition and Classification, Entity Linking, Semantic Role Labeling). The output of these NLP tools is then processed to distill the event knowledge, which is represented in a graph where nodes uniquely identify entities, events, and situations of the world, and arcs represent semantic relations between them (e.g., the participation of an entity in an event with its role). Events play a central role in the resulting knowledge graph: besides making it possible to relate entities, they capture changes in the world as reported in news articles, blog posts, etc., which complements the static encyclopedic content typically covered by traditional knowledge bases.
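
As a rough illustration of the graph structure described above, the following Python sketch stores typed nodes and labelled arcs. The class name `EventGraph`, the identifiers, and the role labels are all hypothetical and chosen only for this example; they are not taken from the frameworks in [1] or [2].

```python
class EventGraph:
    """Toy event-centric graph: typed nodes connected by labelled arcs."""

    def __init__(self):
        self.node_type = {}   # node identifier -> "event" or "entity"
        self.arcs = {}        # source node -> set of (relation, target node)

    def add_node(self, node_id, node_type):
        self.node_type[node_id] = node_type
        self.arcs.setdefault(node_id, set())

    def add_arc(self, source, relation, target):
        self.arcs.setdefault(source, set()).add((relation, target))

    def participants(self, event_id):
        """Return a {role: entity} mapping for one event."""
        return {
            relation: target
            for relation, target in self.arcs.get(event_id, set())
            if self.node_type.get(target) == "entity"
        }

    def events_involving(self, entity_id):
        """Return the sorted identifiers of events in which the entity takes part."""
        return sorted(
            source
            for source, outgoing in self.arcs.items()
            if self.node_type.get(source) == "event"
            and any(target == entity_id for _, target in outgoing)
        )


# Hypothetical example: one acquisition event relating two entities.
graph = EventGraph()
graph.add_node("ev:acquisition_1", "event")
graph.add_node("ent:CompanyA", "entity")
graph.add_node("ent:CompanyB", "entity")
graph.add_arc("ev:acquisition_1", "buyer", "ent:CompanyA")
graph.add_arc("ev:acquisition_1", "acquired", "ent:CompanyB")

print(graph.participants("ev:acquisition_1"))
print(graph.events_involving("ent:CompanyB"))
```

In this toy model the event node is what relates the two entities: neither entity points at the other directly.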

While achieving increasingly good performance, state-of-the-art approaches suffer from some limitations.

  • First, as the various modules composing the NLP pipelines work independently and each returns (only) the best solution for its own task, combining their outputs may produce contradictory information for the same piece of text: for instance, for the same textual span, one tool may extract a reference to an entity of the 21st century, while another may ground the content in the first century B.C.
  • Second, these approaches usually translate natural language sentences into an event-centric representation (explicit knowledge), but they usually fall short in distilling the (implicit) knowledge that follows from what is written in those sentences, a cognitive process that humans typically perform when interpreting a text: for instance, if the text says that someone was released from jail, we can infer that the person was sentenced to imprisonment beforehand.
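
The first limitation can be made concrete with a small Python sketch that flags annotations covering overlapping text spans but grounded in disjoint time periods. Everything here is an assumption made for illustration: the `Annotation` fields, the tool names, and the convention of encoding B.C. years as negative numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    tool: str      # hypothetical pipeline module that produced the annotation
    span: tuple    # (start, end) character offsets, end exclusive
    label: str     # human-readable description of what was extracted
    years: tuple   # (first, last) year the content is grounded in; B.C. is negative


def spans_overlap(a, b):
    """True if two half-open character spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def periods_overlap(a, b):
    """True if two inclusive year ranges share at least one year."""
    return a[0] <= b[1] and b[0] <= a[1]


def find_conflicts(annotations):
    """Return pairs that annotate overlapping text with incompatible periods."""
    conflicts = []
    for i, first in enumerate(annotations):
        for second in annotations[i + 1:]:
            if spans_overlap(first.span, second.span) and not periods_overlap(
                first.years, second.years
            ):
                conflicts.append((first, second))
    return conflicts


# Hypothetical outputs of three independent modules on the same document.
annotations = [
    Annotation("entity_linker", (10, 16), "entity of the 21st century", (2001, 2100)),
    Annotation("temporal_tagger", (0, 40), "first century B.C.", (-100, -1)),
    Annotation("role_labeler", (50, 60), "unrelated span", (2001, 2100)),
]

conflicts = find_conflicts(annotations)
for first, second in conflicts:
    print(f"{first.tool} and {second.tool} disagree on the period")
```

A check of this kind only detects the contradiction; deciding which of the two annotations to keep is a separate problem.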

The PhD project will focus on developing and implementing techniques to distill quality, i.e. complete and coherent, event-centric knowledge graphs from large collections of texts, and to infer the consequences of what is explicitly mentioned in them.
More precisely, the work will address two complementary aspects:

  1. on the one hand, to develop advanced techniques to distill knowledge from the output of the NLP tools used, considering globally the results of all the processing performed on a single sentence or document, so as to improve the quality and coherence of the resulting event-centric knowledge graph;
  2. on the other hand, to develop novel techniques that use the extracted knowledge to derive additional facts and consequences not explicitly mentioned in the source text (e.g., events that should have occurred because of other events, but are not mentioned in the source text).
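
The second aspect can be sketched as simple rule chaining over event types. The Python example below is a minimal illustration, not a proposed method: the `PREREQUISITES` table is a hypothetical, hand-written piece of background knowledge built around the jail-release example given earlier.

```python
# Hypothetical background knowledge: an event type mapped to the event types
# that must have occurred beforehand for the same participant.
PREREQUISITES = {
    "release_from_jail": ["imprisonment"],
    "imprisonment": ["sentencing"],
}


def infer_prerequisites(events):
    """Return the implied (event_type, participant) pairs missing from `events`.

    Rules are applied repeatedly, so prerequisites of inferred events are
    inferred as well.
    """
    known = set(events)
    inferred = set()
    frontier = list(events)
    while frontier:
        event_type, participant = frontier.pop()
        for required in PREREQUISITES.get(event_type, []):
            candidate = (required, participant)
            if candidate not in known:
                known.add(candidate)
                inferred.add(candidate)
                frontier.append(candidate)
    return inferred


# Only the release is explicitly mentioned in the (hypothetical) source text.
explicit = [("release_from_jail", "person_1")]
implicit = infer_prerequisites(explicit)
print(sorted(implicit))
```

Events that are already explicit are never reported as inferred, so the result contains only knowledge that the text left unstated.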

The developed techniques will be based on background knowledge models, built either as the result of the ontological analysis of the content produced by knowledge extraction frameworks (e.g., compatibility between complementary annotations such as entity linking and semantic role labelling, consequences / prerequisites / correlations between event types) or learned from available annotated resources. Alternative approaches (e.g., combinatorial optimization techniques, logical reasoning, machine learning) will be investigated.
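
To give a flavour of how a compatibility model could be combined with combinatorial search, the Python sketch below picks the highest-scoring combination of candidate outputs that the background model accepts, instead of taking each module's top candidate independently. The candidate labels, the scores, and the `COMPATIBLE` set are invented for this example, and exhaustive enumeration is used only because the toy search space is tiny.

```python
from itertools import product

# Hypothetical ranked candidates (label, confidence) from two modules
# analysing the same mention.
CANDIDATES = {
    "entity_linking": [("person", 0.55), ("location", 0.45)],
    "semantic_role": [("place_of_event", 0.70), ("agent", 0.30)],
}

# Hypothetical background model: (entity type, role) pairs that make sense together.
COMPATIBLE = {("person", "agent"), ("location", "place_of_event")}


def best_coherent_assignment(candidates, compatible):
    """Return (assignment, score) for the best compatible label combination.

    The score of a combination is the product of its confidences. Returns
    (None, 0.0) if no combination is compatible.
    """
    tasks = list(candidates)
    best, best_score = None, 0.0
    for combo in product(*(candidates[task] for task in tasks)):
        labels = tuple(label for label, _ in combo)
        if labels not in compatible:
            continue  # rejected by the background model
        score = 1.0
        for _, confidence in combo:
            score *= confidence
        if score > best_score:
            best, best_score = dict(zip(tasks, labels)), score
    return best, best_score


assignment, score = best_coherent_assignment(CANDIDATES, COMPATIBLE)
print(assignment, round(score, 3))
```

Taking each module's top candidate here would pair "person" with "place_of_event", which the compatibility model rejects; the joint search settles on the lower-ranked but coherent "location" reading.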

Candidates should have a high-score MSc Degree in Computer Science, ICT or Mathematics. Previous (basic) knowledge of Semantic Web, Knowledge Representation and Natural Language Processing is required. Candidates should have solid programming skills, in particular in Java. Candidates should be willing to study new, challenging research topics and technologies, be committed to working in a research-driven environment, and have a problem-solving attitude.

The position will be supervised in Trento (Italy) by Dr. Marco Rospocher.


References

[1] Building Event-Centric Knowledge Graphs from News (Marco Rospocher, Marieke van Erp, Piek Vossen, Antske Fokkens, Itziar Aldabe, German Rigau, Aitor Soroa, Thomas Ploeger, Tessel Bogaard), In Web Semantics: Science, Services and Agents on the World Wide Web, volume 37-38, 2016.
[2] Frame-Based Ontology Population with PIKES (Francesco Corcoglioniti, Marco Rospocher, Alessio Palmero Aprosio), In IEEE Transactions on Knowledge and Data Engineering, volume 28, 2016.