Sir Hans Sloane’s Information Architecture: From TEI to CSV for Data Analysis

Keywords: Collections as Data, Text encoding, Early Modern

Abstract:

Digitisation and text encoding have produced large and complex humanities datasets over the past few decades. With growing interest among arts and humanities researchers in working with big data, data-driven scholarship presents unprecedented opportunities to shed new light on old questions and to ask new ones. It also opens up ways of working that were not previously possible.

Sir Hans Sloane (1660-1753) bequeathed his collections to the British nation. After his death they became the foundation of three national institutions in the UK: the British Museum, the Natural History Museum, and the British Library.

Five of the manuscript catalogues of Sloane’s collections have been encoded in line with the Guidelines of the Text Encoding Initiative (TEI) by a collaborative project between the British Museum and UCL. Building upon this project, part of my PhD (Sir Hans Sloane: A Data Driven Research) is to leverage the mark-up of these catalogues to computationally produce different outputs for data analysis.

This paper presents a case study of one manuscript catalogue volume, titled ‘Miscellanea’, which is formed of seven different catalogues. It examines how amenable the TEI mark-up of the Early Modern catalogues is to computational methods, and analyses the use of the Python programming language to extract targeted data and convert it from XML to CSV.
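
As a rough illustration of the kind of XML-to-CSV conversion described here, the following minimal sketch uses only the Python standard library. It is not the project's actual pipeline: the sample document, the choice of the TEI elements item and placeName, the n attribute, and the column names are all assumptions made for the example, and the real catalogue mark-up may be structured differently.

```python
import csv
import io
import xml.etree.ElementTree as ET

# The TEI namespace is standard; everything else below is illustrative.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical sample data, NOT taken from the Sloane catalogues.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <list>
        <item n="1">A brass ring found in <placeName>Kent</placeName>.</item>
        <item n="2">An agate seal from <placeName>Persia</placeName>.</item>
      </list>
    </body>
  </text>
</TEI>"""


def tei_items_to_rows(xml_text):
    """Return one dict per <item>: its number, tagged places, and plain text."""
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.iterfind(".//tei:item", TEI_NS):
        # Collect the text of every tagged place name inside the entry.
        places = [
            place.text.strip()
            for place in item.iterfind(".//tei:placeName", TEI_NS)
            if place.text
        ]
        # Flatten the mixed content and normalise whitespace.
        text = " ".join("".join(item.itertext()).split())
        rows.append(
            {"entry": item.get("n", ""), "places": "; ".join(places), "text": text}
        )
    return rows


def rows_to_csv(rows):
    """Serialise the extracted rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["entry", "places", "text"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(tei_items_to_rows(SAMPLE_TEI)))
```

Keeping extraction (tei_items_to_rows) separate from serialisation (rows_to_csv) means the same rows could be targeted at other output formats, which fits the aim of producing different outputs from a single encoded source.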

This work has the potential to contribute to the Collections as Data movement and adds value to data-driven humanities research by showcasing how new knowledge and insights have arisen from the use of digital methods in the context of Early Modern documents.


1. Sloane amassed a diverse collection; upon his death, his prints, drawings, books, manuscripts, herbarium and antiquities, along with other treasures, were offered to the British nation.

2. Digital editions available at https://reconstructingsloane.org/enlightenmentarchitectures/

3. Always Already Computational at https://collectionsasdata.github.io/