EPISA project: a FAIR path to semantic archives at the Portuguese National Archives

29 Jan 2020

The EPISA project (Entity and Property Inference for Semantic Archives) is part of the ongoing renewal of DGLAB’s existing data infrastructure and aims, amongst other goals, to develop a prototype for an open-source knowledge graph platform adopting a new data model for archival description. Content integration is among its key features, as we intend to create a flexible data model that can both interoperate with other information systems and accommodate information regarding cultural resources other than archival documents.

The Portuguese National Archives are managed by DGLAB (Direção-Geral do Livro, dos Arquivos e das Bibliotecas), a public administration body responsible for several information systems that support its mission to safeguard, enhance and promote governmental and public records, as well as other historical documents in its custody. It holds the most relevant cultural heritage collection, which is largely digitized and accessed by both history researchers and laypeople from all Portuguese-speaking countries and beyond.

About 3.5 million metadata records are currently held in a system designed 20 years ago according to the standards of the International Council on Archives (ICA). These records consist mainly of textual descriptions of the context and contents of the documents, but the amount of born-digital information in the system is growing.

Representing archival information in a linked data model involves a complex paradigm shift, so this project is also devoted to finding ways to guarantee the effective migration of content stored according to ICA standards to an ontology-based model. This requires both the use of existing crosswalks and the inference of new relations with semi-automated methods. The need for a new generation of description tools that covers libraries, archives and museums, and that is finer-grained, more flexible and especially more machine-actionable, led to the choice of CIDOC-CRM (ISO 21127:2014) as the root ontology. DGLAB’s role as a large archival institution (it comprises the headquarters in Lisbon and the majority of the district archives) and as a regulator for state, municipal and private archives helps ensure the impact of the project results. We anticipate that the proposed change in cultural heritage metadata will give users a better knowledge of the repository and improved tools for more flexible and richer retrieval, as well as a stronger presence in aggregators, both in cultural heritage and elsewhere, as we ensure compliance with the FAIR data principles.
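To make the crosswalk idea concrete, here is a minimal sketch in Python of how fields from a flat descriptive record could be turned into CIDOC-CRM-style triples. It is an illustration under stated assumptions, not the EPISA data model: the field names loosely follow ISAD(G) elements, the `CROSSWALK` table, the `migrate` function, the `http://example.org/` base URI and the sample record are all invented for this example, and only the CIDOC-CRM class and property labels (E31 Document, E42 Identifier, E35 Title, P1, P102, P3) come from the ontology itself.

```python
# Illustrative crosswalk from flat record fields to CIDOC-CRM-style triples.
# Everything except the CIDOC-CRM labels is a hypothetical assumption.

CRM = "crm:"  # prefix standing in for the CIDOC-CRM namespace

# field name -> (CIDOC-CRM property, class of the value node, or None for a literal)
CROSSWALK = {
    "reference_code": ("P1_is_identified_by", "E42_Identifier"),
    "title": ("P102_has_title", "E35_Title"),
    "scope_and_content": ("P3_has_note", None),
}


def migrate(record, base="http://example.org/doc/"):
    """Return (triples, needs_review) for one flat descriptive record."""
    subject = base + record["reference_code"].replace("/", "_")
    triples = [(subject, "rdf:type", CRM + "E31_Document")]
    needs_review = []  # fields the crosswalk cannot handle on its own
    for field, value in record.items():
        if field not in CROSSWALK:
            needs_review.append(field)
            continue
        prop, cls = CROSSWALK[field]
        if cls is None:
            # The value stays a plain literal attached to the document.
            triples.append((subject, CRM + prop, value))
        else:
            # The value becomes its own typed node with a label.
            node = f"{subject}/{field}"
            triples.append((subject, CRM + prop, node))
            triples.append((node, "rdf:type", CRM + cls))
            triples.append((node, "rdfs:label", value))
    return triples, needs_review


# Invented sample record, for demonstration only.
record = {
    "reference_code": "PT/XX/0001",
    "title": "Sample correspondence",
    "scope_and_content": "Letters on estate management.",
    "creator": "Unknown",
}
triples, needs_review = migrate(record)
for triple in triples:
    print(triple)
print("needs review:", needs_review)
```

Fields without a direct mapping (here `creator`) are collected rather than dropped, which marks where inference of new relations or review by an archivist would have to take over from the crosswalk.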

EPISA, financed by the Portuguese Foundation for Science and Technology (FCT), is a collaboration between INESC TEC - Institute for Systems and Computer Engineering, Technology and Science, as principal contractor, and DGLAB and the University of Évora, as participating institutions.

Maria José de Almeida
DGLAB m-jose.almeida@dglab.gov.pt

Inês Koch
INESC TEC, FEUP ines.koch@inesctec.pt

Cláudia Guedes
FEUP up201505409@fe.up.pt

Name & surname: 
Maria José de Almeida; Inês Koch; Cláudia Guedes
Scientific Discipline / Research Area: 
Engineering and Technology/Other engineering and technologies
Affiliation: 
Direção-Geral do Livro, dos Arquivos e das Bibliotecas (DGLAB); Instituto de Engenharia de Sistemas e Computadores, Tecnologia e Ciência (INESC TEC)
  • Author: Patricia Herterich

    Date: 03 Apr, 2020

    This looks really interesting. Archival data is often neglected, so thank you for highlighting the progress in that community!

    Do you already have an idea of how to attempt the atomization in practice? Do you envisage doing this with text mining and other automated workflows, or will it require a lot of manual effort?

    Many thanks,

    Patricia

  • Author: Maria Jose de A...

    Date: 03 Apr, 2020

    Thanks for your interest!

    The proposed approach relies heavily on AI and machine learning for the semantic migration of existing data. For new archival description records we are working on automated workflows, as semi-automated description of the objects is highly desirable. We intend to develop tools that automatically propose instance values for the entities and properties used to describe specific objects; these proposals will always be reviewed by archival professionals through a graphical user interface.
