
Knowledge Extraction for Entity Retrieval and Ranking

Scientific Area:

  • Knowledge Extraction (from textual resources)

  • Information Retrieval

  • Semantic Web

Description of Activities:

The aim of this research activity is to apply knowledge extraction techniques to improve performance on the entity retrieval and ranking task. The task consists of retrieving and ranking entities that are relevant to understanding free-text, web-style queries. Among other possible uses, these entities and their structured descriptions (e.g., name, image, biography) can be shown alongside the textual results of the query, as mainstream search engines do (see, e.g., Google Knowledge Graph). One state-of-the-art approach for retrieving these entities is to analyze the web documents found to be query-relevant, extract the entities mentioned in them (via knowledge extraction techniques), and then rank those entities.
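As a rough illustration of the document-based approach described above, the following minimal Python sketch ranks entities by summing the relevance scores of the documents that mention them. It is not the framework's actual method: the function name `rank_entities`, the scoring rule, the document identifiers, the scores, and the `dbpedia:`-style entity identifiers are all assumptions made for this example, and the entity mentions are assumed to have already been produced by a knowledge extraction step.

```python
from collections import defaultdict


def rank_entities(doc_scores, doc_entities, top_k=5):
    """Rank entities by the summed relevance of the documents mentioning them.

    doc_scores:   dict mapping a document id to its query-relevance score
    doc_entities: dict mapping a document id to the entities it mentions
    Both inputs are hypothetical; a real system would obtain them from a
    document retrieval step and a knowledge extraction step.
    """
    totals = defaultdict(float)
    for doc_id, score in doc_scores.items():
        # Use a set so that an entity counts at most once per document.
        for entity in set(doc_entities.get(doc_id, [])):
            totals[entity] += score
    # Sort by score (descending), then by name so that ties are deterministic.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Hypothetical retrieval output: three query-relevant documents with scores.
doc_scores = {"d1": 0.5, "d2": 0.25, "d3": 0.125}
# Hypothetical extraction output: entities mentioned in each document.
doc_entities = {
    "d1": ["dbpedia:Trento", "dbpedia:Italy"],
    "d2": ["dbpedia:Trento"],
    "d3": ["dbpedia:Italy", "dbpedia:Alps"],
}

ranking = rank_entities(doc_scores, doc_entities)
for entity, score in ranking:
    print(entity, score)
```

Summing document scores is only one possible aggregation; other choices (for example, taking the maximum score, or weighting by how often an entity is mentioned) would fit the same structure.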

The research and development activities will be carried out within the context of a state-of-the-art, knowledge extraction-powered information retrieval framework. The work will require adapting the framework, which currently retrieves and ranks documents, so that it retrieves and ranks the entities mentioned in those documents.

Required Skills and knowledge:

  • Solid Java programming skills;

  • Basic knowledge of Semantic Web formats, languages, and technologies: RDF, SPARQL, OWL, LOD;

  • Basic knowledge of Natural Language Processing (e.g., what is NLP, what are the typical NLP tasks for knowledge extraction);

  • Basic knowledge of Information Retrieval main concepts (e.g., TF, IDF) and models (e.g., Vector Space Model);

  • Willingness to study new, challenging research topics and technologies;

  • Commitment to work in a research-driven environment;

  • Problem-solving attitude.

Competencies to be Acquired:

  • Participation in the R&D activities of a leading EU research institute;

  • Acquisition of advanced knowledge and skills in Semantic Web and knowledge extraction techniques and technologies;

  • Acquisition of advanced knowledge and skills in Information Retrieval and Natural Language Processing;

  • Contribution to the development of a state-of-the-art research-driven tool.

Duration: 3 to 6 months approximately, based on the planned activities.

Preferential Background: Computer Science, ICT

Selection Procedure: A short task-based assessment will be conducted at the beginning of the internship/thesis to assess the skills and capabilities of the student in accomplishing the planned activity.

Contact Person: Dr. Marco Rospocher