Short Papers (2A): National and International Cooperation and Infrastructure

Zoe Borovsky, Chair

Finding new environments for cooperation with digital research and new values for heritage collections: experiences at the National Library of Spain

Elena Sánchez Nogales (National Library of Spain)

Short abstract:
Reusable open data, crowdsourcing… and more. Sharing the strategies developed at the National Library of Spain to strengthen, support and multiply connections with digital research.

Long abstract:
The National Library of Spain (BNE, Biblioteca Nacional de España), preserving the most important bibliographic and documentary collection in the country, has historically had a close and deep relationship with the research community.

Digitisation of collections, initiated in a massive way in 2008, changed the nature of this relationship, and while expanding and improving access to resources, it also implied the new risk of losing a direct (‘physical’) connection with researchers and their needs, and losing visibility of how the Library resources were used.

In recent years, the BNE has worked to strengthen, support and multiply connections with digital research. Calling for proposals for new digitisations and actively cooperating in digital projects was a first step, but the Library went further by setting a specific strategy to promote the use and reuse of data and digital resources within the framework of its BNElab (https://bnelab.bne.es/en/). The strategy comprises different lines of action for 2018-2022, carried out in cooperation with a community of experts in digital transformation and, of course, with researchers.

One of these lines looks at collections as data: catalogues (bibliographic and authorities) and other data produced by the Library have been converted into open, reusable formats and published in a dedicated space (https://bnelab.bne.es/en/data/) as well as in official open data catalogues.

Helped by experts in data and digital humanities, the Library is promoting the use of these new resources. The process also has an interesting internal dimension: ‘questioning’ these data files from a researcher’s point of view has enabled cataloguing teams and digital collections managers to learn about new uses for our catalogues, and to become aware of digital researchers’ needs, the problems posed by our data and the possibilities for enrichment. By using tools such as OpenRefine, considering new external sources for integration and learning about visualization tools, this work has initiated a change in the way we look at our data and cataloguing practices, and at our catalogue as an information tool in itself (beyond description and discovery).

The Library is also offering new tools and digital platforms, such as the ‘Interactive books’ project and ComunidadBNE (https://comunidad.bne.es/), a recently launched crowdsourcing initiative that has been very well received by the general public and by academic and research groups. ComunidadBNE is conceived as a digital environment for collaboration between the general public, specialists and librarians, for cooperative transcription, geo-referencing, identification or tagging of images from the Library’s digital resources, and also for the enrichment of the bibliographic catalogue and authorities. The platform was built on an open source code base, Pybossa, which was adapted, developed, and also made open and licensed for free reuse on GitHub. Since the initiative was presented, different research groups have confirmed their interest in participating and in using the platform for their academic projects.

This paper proposal aims at sharing the strategy, experience and initial results, as well as future projects within these new lines of action at the BNE.

Growing an international Cultural Heritage Labs community

Sally Chambers (Ghent Centre for Digital Humanities, Belgium)
Mahendra Mahey (British Library Labs, British Library, London, United Kingdom)
Katrine Gasser (Royal Danish Library, Denmark)
Milena Dobreva-McPherson (UCL Qatar, Doha, Qatar)
Kristy Kokegei (History Trust of South Australia, Adelaide, Australia)
Abigail Potter (Library of Congress, Washington D.C., United States)
Meghan Ferriter (Library of Congress, Washington D.C., United States)
Rania Osman (Bibliotheca Alexandrina, Alexandria, Egypt)

Short abstract:
Cultural Heritage Labs help people to innovate and experiment with digital collections. We will present our growing ‘Labs’ international community and outline our future activities.

Long abstract:
‘Cultural Heritage Labs’ in galleries, libraries, archives and museums around the world help researchers, artists, entrepreneurs, educators and innovators to experiment with, incubate and develop their ideas for working with digital content through competitions, awards, projects, exhibitions and other engagement activities. They do this by providing services and infrastructure that give access to their data, both openly online and onsite, for research, inspiration and enjoyment.

In September 2018, the British Library Labs team organised a ‘Building Library Labs’ international workshop. The event provided the opportunity for colleagues who are planning or already have digital experimental ‘Labs’ to share knowledge, experiences and lessons learned. The workshop, which attracted over 40 institutions from North America, Europe, the Middle East, Asia and Africa, demonstrated a clear need and enthusiasm for establishing an international support network. Within six months, a second international workshop was organised at the Royal Danish Library in Copenhagen in March 2019. In total we have brought together some 120 participants and an even wider community of around 250 people online. Some have been sharing their experiences of setting up, using and running innovation labs, but there was also a sizeable group of attendees who are planning to set up such labs and need advice and support in how to do this.

The aim of this short paper is to present the journey and development of the International Labs community and outline our future activities.

The principle of the network is that, by sharing fairly and ‘paying forward’ our expertise, knowledge and experiences, organisations don’t have to ‘re-invent the wheel’. Organisations can learn from each other and enable collaboration across borders through their digital collections, data, services, infrastructure and practice. We hope this will result in better digital ‘Labs’ for their organisations and their users, and help to further open up data and services for everyone.

People are the essence of the international Labs network. An initial global Building Library Labs survey, which drew 40 responses from 23 countries, showed significant interest from the wider cultural heritage sector, beyond libraries. With 250 people from over 60 institutions in over 30 countries currently affiliated with the network, a solid set of communication tools was needed. The network has a shared Google Drive, a mailing list and a wiki, as well as an active WhatsApp group and a Slack channel, and meets regularly via Zoom.

With two successful events behind us, and plenty of enthusiasm and willingness to continue our activities, we are now looking to the future. Planned activities include a book sprint to capture significant knowledge and expertise within the Labs network, serving as a reference guide for people wanting to build their own lab; populating our wiki; and creating a global directory of Cultural Heritage Labs. Further regional and international events have been and are being organised. In less than a year, the Labs network has come a long way, and this is only the beginning!

Keyword searchable digital library of Serbian historic newspapers: Building the national infrastructure

Adam Sofronijevic (University of Belgrade library)
Tamara Šević (Ministry of Culture and Media of Republic of Serbia, University of Belgrade)

Short abstract:
The paper presents the process of building the national infrastructure that enables a keyword-searchable digital library of Serbian historic newspapers.

Long abstract:
The paper presents the process of building the national infrastructure that enables a keyword-searchable digital library of Serbian historic newspapers. Building the infrastructure encompasses implementing the technical infrastructure, defining and setting up the working processes, and drafting legal documents and making them official. The process started in 2013, when the University Library Belgrade and the National Library of Serbia took part in the Europeana Newspapers project, which allowed the first METS-ALTO files of Serbian historic newspapers to be created at a project partner institution, the University of Innsbruck. The first step in building the technical infrastructure was the creation of a keyword-searchable digital library at the University Library Belgrade that provides search and display of digitized historical newspapers and other documents. The website http://www.unilib.rs/istorijske-novine/browse offers simple and advanced search interfaces, and an interface for browsing through the digitized works. The website relies on the BnLViewer software by the National Library of Luxembourg to display digitized METS-ALTO documents, while the search backend uses the Lucene/Solr search engine with enhancements made by the University Library Belgrade. The website infrastructure is presented in the paper in some detail. Collection descriptions and other texts of the website itself are published under the Creative Commons Attribution (CC-BY) license, to facilitate their reuse in other projects. In 2015, proprietary software for creating METS-ALTO files was acquired: docWorks by CCS Hamburg. The University Library Belgrade and the National Library of Serbia obtained one Basic license each, and a few years later the second biggest library in Serbia, the Matica srpska Library, acquired one license as well. The paper presents in some detail how compatible work processes were created in these three libraries through project and other special activities.
Activities that trained librarians and volunteers to work with this software are also described. Collaboration with various smaller libraries from Serbia and the region, which provided JPG images to be refined with docWorks, is described, along with the definition of the role of libraries as content providers and promoters of the keyword-searchable digital library infrastructure. The most recent steps in building the technical infrastructure are explained, such as additional design and branding of the interface so that it is more functional and is presented as the central point of access to historic newspapers in Serbia, supported by six big libraries and hosted in a government data center. The paper presents in particular the process of drafting legal documents regarding the keyword-searchable digital library of Serbian historic newspapers and making them official. In November 2018, a document entitled National Framework for Digitization of Periodicals was signed by four big libraries and the Ministry of Culture and Information of the Republic of Serbia. This document establishes METS-ALTO as the recommended format for digitizing newspapers and other periodicals in the Republic of Serbia. The legal documents drafted in 2019 that allow for implementation of the National Framework are also presented. The paper also presents aspects of the process of building this national infrastructure that involve collaboration with private companies, media, publishers and other stakeholders.
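
As a rough illustration of why METS-ALTO suits keyword search: an ALTO file records each recognised word as a String element whose CONTENT attribute holds the text and whose HPOS and VPOS attributes hold its position on the page, so a search hit can be traced back to a place on the scanned image. The sketch below is not the library's implementation (the abstract states that the search backend is Lucene/Solr). The sample fragment, its text and the function names are hypothetical, and a real ALTO file carries many more elements and attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified ALTO fragment for illustration only.
# Real files use a version-specific namespace and far richer markup.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page>
      <PrintSpace>
        <TextBlock>
          <TextLine>
            <String CONTENT="Beogradske" HPOS="100" VPOS="200"/>
            <SP/>
            <String CONTENT="novine" HPOS="260" VPOS="200"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag):
    """Drop the '{namespace}' prefix so the ALTO version does not matter."""
    return tag.rsplit("}", 1)[-1]


def extract_words(alto_xml):
    """Return (text, hpos, vpos) for every String element, in document order."""
    root = ET.fromstring(alto_xml)
    words = []
    for element in root.iter():
        if local_name(element.tag) == "String":
            words.append(
                (element.get("CONTENT"), element.get("HPOS"), element.get("VPOS"))
            )
    return words


def find_keyword(alto_xml, keyword):
    """Case-insensitive exact match against the recognised words."""
    wanted = keyword.lower()
    return [w for w in extract_words(alto_xml) if w[0] and w[0].lower() == wanted]
```

Calling find_keyword(SAMPLE_ALTO, "novine") returns the matching word together with its page coordinates, which is the information a viewer needs in order to highlight the hit on the page image.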

The role of CLARIN and linguistic annotation in the digital transformation in research

Martin Wynne (Bodleian Libraries, University of Oxford, United Kingdom)

Short abstract:
The CLARIN research infrastructure supports the linguistic annotation of digital texts, which allows smarter searching, and can make new forms of research possible.

Long abstract:
Large and growing quantities of textual materials are being made available by libraries in digital form. Many library users will find that this is a convenient way to discover and read texts, but digital texts can offer much more than a shortcut to the delivery of texts to readers. Digital texts can be searched, clusters and patterns of words can be identified, and changes in language usage over time can be traced. Corpus linguistics has developed advanced techniques for storing, annotating, searching and exploring texts, and for visualizing the results. Resources and techniques which have been developed to address linguistic research questions can also be used by researchers in disciplines across the human and social sciences to explore and analyse the content of texts. While these opportunities are currently being exploited in many projects, there is not yet a concerted effort to make all digital texts available in suitable formats to facilitate this sort of digital research. What more do digital libraries need to do in order to make texts available in ways that can really support advanced digital scholarship?

This question has been addressed through surveys of existing practice conducted via the networks of the CLARIN research infrastructure, and through experimental services offered by projects within the Bodleian Libraries at the University of Oxford. The conclusion drawn is that minimum levels of annotation are necessary for text collections in digital libraries.
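
One reason annotation matters for searching can be sketched briefly: a plain string search finds only the exact form typed, whereas a search over lemma annotations finds every inflected form of a word. The example below is a minimal sketch under stated assumptions. The token list, its tags and the function names are hypothetical, and in practice the annotations would be produced by NLP tools rather than written by hand.

```python
from collections import namedtuple

# Hypothetical hand-annotated tokens; a real collection would be annotated
# automatically by a tokeniser, lemmatiser and part-of-speech tagger.
Token = namedtuple("Token", ["form", "lemma", "pos"])

ANNOTATED_TEXT = [
    Token("She", "she", "PRON"),
    Token("went", "go", "VERB"),
    Token("home", "home", "ADV"),
    Token("and", "and", "CCONJ"),
    Token("he", "he", "PRON"),
    Token("goes", "go", "VERB"),
    Token("tomorrow", "tomorrow", "ADV"),
]


def search_form(tokens, query):
    """Plain string search: matches only the exact surface form."""
    wanted = query.lower()
    return [t.form for t in tokens if t.form.lower() == wanted]


def search_lemma(tokens, lemma, pos=None):
    """Annotation-aware search: matches every form of a lemma,
    optionally restricted to one part-of-speech tag."""
    return [
        t.form
        for t in tokens
        if t.lemma == lemma and (pos is None or t.pos == pos)
    ]
```

Here search_form(ANNOTATED_TEXT, "go") finds nothing, because the word never appears in that exact form, while search_lemma(ANNOTATED_TEXT, "go") finds both "went" and "goes".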

Annotation of texts should include structural markup, metadata, and linguistic annotation, including:

When these annotations are made searchable, it can enable the following:

How can this be achieved? The preliminary results of a CLARIN initiative to gather information on NLP for the annotation of historical documents will be presented, along with efforts to make it easier for resource curators to use the annotation tools.

Some challenges remain. Digital libraries are not necessarily centres of expertise in NLP, and new initiatives and collaborations are often necessary. The repository will need to develop and maintain interfaces, and to offer user support. The suggested annotations are likely to be considered provisional, and subject to ongoing updates, improvement and enhancement.