Scalable management and querying of textual resources and the structured knowledge extracted from them

Scientific Area:

  • Knowledge Extraction (from textual resources)

  • Semantic Web

  • Scalable Databases

  • Multimedia

Description of Activities:

The aim of this research activity is to investigate techniques for the scalable storage and querying of textual resources and of the structured knowledge extracted from or related to them.

The research and development activities will be carried out within the context of a state-of-the-art framework for storing, managing, retrieving, and querying both unstructured (e.g., text) and structured (e.g., RDF triples) content in an integrated and interlinked way: the KnowledgeStore. Different internship/thesis opportunities are available, to be discussed and agreed with the candidate:

  • KS1: Evaluation and integration of alternative triple-store backends into the KnowledgeStore. The KnowledgeStore is built on complementary frameworks for storing and managing unstructured and structured content. The structured content is stored in a triple-store component; triple stores are databases specifically developed to store Semantic Web content. Currently, the KnowledgeStore relies on Virtuoso as its triple store. The goal of this activity is to investigate alternative Big Data solutions (e.g., Blazegraph, GraphDB) to serve as the triple-store component of the KnowledgeStore. This will require: (i) evaluating properties of the new solution such as robustness, scalability, and data ingestion and retrieval speed, comparing it with the technology currently in use; and (ii) integrating the chosen solution into the KnowledgeStore framework.

  • KS2: Joint expressive querying over structured and unstructured data. Currently, the KnowledgeStore provides separate access, retrieval, and querying mechanisms for unstructured and structured content. The goal of this activity is to investigate and develop techniques for running expressive queries that involve both structured and unstructured content against the KnowledgeStore.

  • KS3: Data analytics over the KnowledgeStore. Performing data analytics over Big Data frameworks is a timely and challenging topic. Data analytics refers to techniques for “inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision-making” (cf. Wikipedia). The goal of this activity is to investigate and develop techniques for data analytics over the KnowledgeStore.

  • KS4: The KnowledgeStore goes multimedia. The current version of the KnowledgeStore supports only textual resources as unstructured content. The goal of this activity is to extend the framework to handle multimedia unstructured content, such as images and videos. Besides investigating how to extend the framework to store and play back (e.g., stream) multimedia content, this requires revising the KnowledgeStore data model to accommodate additional metadata (e.g., image size) and structured content related to multimedia resources.
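
To give a feel for the evaluation step in KS1, the following is a minimal sketch in Python of a backend-agnostic timing harness that measures load time and median query time. It is an illustration only: all names (TripleStoreBackend, InMemoryBackend, benchmark, make_triples) and the ex: identifiers are hypothetical and are not part of the KnowledgeStore or of any triple-store API. A real comparison would wrap Virtuoso and each candidate backend behind the same two methods and use realistic datasets and SPARQL queries.

```python
import statistics
import time
from dataclasses import dataclass
from typing import Iterable, List, Optional, Protocol, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


class TripleStoreBackend(Protocol):
    """Hypothetical minimal interface an adapter for each backend would expose."""

    def load(self, triples: Iterable[Triple]) -> None: ...

    def match(self, s=None, p=None, o=None) -> List[Triple]: ...


class InMemoryBackend:
    """Toy stand-in (assumption) so the harness runs without any server."""

    def __init__(self) -> None:
        self._triples = set()

    def load(self, triples: Iterable[Triple]) -> None:
        self._triples.update(triples)

    def match(self, s=None, p=None, o=None) -> List[Triple]:
        # None acts as a wildcard for that position of the triple pattern.
        return [
            t for t in self._triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        ]


@dataclass
class BenchmarkResult:
    name: str
    load_seconds: float
    median_query_seconds: float
    result_count: int


def benchmark(name: str, backend: TripleStoreBackend, triples: List[Triple],
              pattern: Pattern, repeats: int = 5) -> BenchmarkResult:
    """Time one bulk load, then the same pattern query `repeats` times."""
    start = time.perf_counter()
    backend.load(triples)
    load_seconds = time.perf_counter() - start

    timings = []
    results: List[Triple] = []
    for _ in range(repeats):
        start = time.perf_counter()
        results = backend.match(*pattern)
        timings.append(time.perf_counter() - start)

    return BenchmarkResult(name, load_seconds,
                           statistics.median(timings), len(results))


def make_triples(n: int) -> List[Triple]:
    """Synthetic data: document i mentions one of ten entities."""
    return [(f"ex:doc{i}", "ex:mentions", f"ex:entity{i % 10}") for i in range(n)]


if __name__ == "__main__":
    result = benchmark("in-memory", InMemoryBackend(), make_triples(1000),
                       (None, "ex:mentions", "ex:entity3"))
    print(result)
```

Running every backend through the same benchmark function keeps the measurements comparable, and checking result_count across backends is a cheap sanity test that each one returns the same answers before their speeds are compared.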

Required Skills and knowledge:

  • Solid Java programming skills;

  • Basic knowledge of Semantic Web formats, languages, and technologies: RDF, SPARQL, OWL, LOD;

  • Basic knowledge of Big Data technologies and frameworks;

  • Willingness to study new, challenging research topics and technologies;

  • Commitment to work in a research-driven environment;

  • Problem-solving attitude.

Competencies to be Acquired:

  • Participation in the R&D activities of a leading EU research institute;

  • Acquisition of advanced knowledge and skills in Semantic Web formats and technologies;

  • Acquisition of advanced knowledge of Big Data frameworks;

  • Contribution to the development of a state-of-the-art research-driven tool.

Duration: approximately 3 to 6 months, depending on the planned activities.

Preferential Background: Computer Science, ICT

Selection Procedure: A short task-based assessment will be conducted at the beginning of the internship/thesis to evaluate the student's skills and ability to carry out the planned activity.

Contact Person: Dr. Marco Rospocher