Session 1

Thursday 14:00 - 15:30

High Tor 2

Chair: Michael Pidd

Who wrote the Jack the Ripper letters? A stylometric analysis

  • Andrea Nini

University of Manchester

In 1888, a number of prostitutes were murdered in Whitechapel, London, and the perpetrator(s) were never caught. However, in the 209 letters that were received before and after the events, the murderer allegedly identified himself as ‘Jack the Ripper’, a name that is to this day associated with the case. These letters were key in providing an evocative name to the press and in creating the persona of Jack the Ripper, still alive today in the form of books, movies, plays, and tours. Interestingly, however, historical evidence suggests that the two most important of these letters, those responsible for the creation of ‘Jack the Ripper’, were fabricated by journalists with the aim of selling more newspapers, and that the later ones were probably written by hoaxers after the police decided to make these two public. The present paper reports on a stylometric analysis aimed at identifying which of the 209 letters attributed to Jack the Ripper were written by the same person. This task has recently been referred to as ‘authorship clustering’, a special case of ‘authorship verification’. Due to the brevity of the Jack the Ripper letters, frequency-based methods could not be applied, and a novel clustering method based on the presence/absence of word 2-grams using the Jaccard distance was applied instead. The results support the hypothesis that the two earliest and most important letters were written by the same person and that a third letter, the ‘Moab and Midian’ letter, can be connected to these two. This letter is controversial, as some historical evidence suggests it was fabricated by journalists at the Central News Agency. The implications of these results for the Jack the Ripper case, its socio-cultural dimension, and modern authorship analysis will be discussed.
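The distance measure named in the abstract can be illustrated with a minimal Python sketch. It assumes simple whitespace tokenisation and lower-casing, neither of which is specified in the abstract, and it covers only the pairwise distance step, not the clustering procedure itself. The function names and the three sample sentences are hypothetical placeholders, not text from the letters.

```python
from itertools import combinations


def word_bigrams(text):
    """Return the set of word 2-grams in a text (presence/absence only)."""
    # Assumption: lower-casing and whitespace tokenisation.
    tokens = text.lower().split()
    return set(zip(tokens, tokens[1:]))


def jaccard_distance(a, b):
    """Jaccard distance: 1 - |A & B| / |A | B| (0.0 if both sets are empty)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def pairwise_distances(texts):
    """Map each unordered pair of text names to its Jaccard distance."""
    profiles = {name: word_bigrams(body) for name, body in texts.items()}
    return {
        (x, y): jaccard_distance(profiles[x], profiles[y])
        for x, y in combinations(sorted(profiles), 2)
    }


# Invented placeholder sentences, NOT taken from the actual letters.
toy_texts = {
    "A": "the cat sat on the mat",
    "B": "the cat sat on the sofa",
    "C": "a dog ran in the park",
}
distances = pairwise_distances(toy_texts)
```

Because only the presence or absence of each 2-gram is recorded, a phrase repeated many times in a letter counts the same as one used once, which is one reason a set-based measure may suit very short texts.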


Exploring Contagion and Migration in European Cultural Memory via Text Mining

  • Susan Leavy
  • Derek Greene
  • Karen Wade
  • Maria Mulvany
  • Gerardine Meaney

University College Dublin

This paper presents an overview of the Contagion, Biopolitics and Migration in European Cultural Memory project, which aims to combine data analytics and cultural analytics to investigate themes of disease, health, and migration in the British Library Digital Labs corpus. The project plans to study the long-term effects and influence of cultural representations on public understanding of infectious diseases and their prevention. Specifically, in this work we describe the use of methods from text mining and machine learning to study a corpus of over 47,000 texts, covering fiction and non-fiction, ranging from the 18th century to the early 20th century. The natural language processing techniques involved include word embedding and topic modelling. For instance, language pertaining to disease, health, and migration is explored through lexicons generated using word embedding methods trained on the library corpus. We show how the outputs of these methods can be used to characterise and better understand the discourse around disease and migration during this time period.
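One common way to generate a lexicon from word embeddings is to expand a small seed list with its nearest neighbours by cosine similarity. The sketch below shows that general idea only. The abstract does not say how the project builds its lexicons, so the function names, the seed word, and the three-dimensional toy vectors are all hypothetical stand-ins for embeddings that would in practice be trained on the corpus.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_lexicon(seeds, vectors, top_n=2):
    """Return the top_n non-seed words most similar to any seed word."""
    seed_vectors = [vectors[s] for s in seeds if s in vectors]
    if not seed_vectors:
        return []
    scores = {
        word: max(cosine(vec, seed_vec) for seed_vec in seed_vectors)
        for word, vec in vectors.items()
        if word not in seeds
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Made-up toy vectors for illustration; real embeddings would be
# learned from the corpus and have far more dimensions.
toy_vectors = {
    "fever": [0.9, 0.1, 0.0],
    "cholera": [0.8, 0.2, 0.1],
    "plague": [0.85, 0.15, 0.05],
    "voyage": [0.1, 0.9, 0.2],
    "garden": [0.0, 0.2, 0.9],
}
lexicon = expand_lexicon({"fever"}, toy_vectors)
```

In practice, candidate terms produced this way would normally be reviewed by hand before being added to a lexicon.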

The history of a database and the digital afterlife of books

  • Stephen H. Gregg

Bath Spa University

When we look at a book on a database, we may partially forget the human digits making lists of collections, loading the books into the scanners, and pounding keyboards for data entry; we may also forget that databases have an origin and that they evolve. So my own approach also insists on the temporal and material dimensions of the database. As D. F. McKenzie reminds us, a proper study of any text – including digital ones – ‘directs us to consider the human motives and interactions which texts involve at every stage of their production, transmission, and consumption.’ Drawing on recent work bridging book history and digital humanities (for example, Sarah Werner, Matthew Kirschenbaum), this paper will begin to bring to life the history of Gale-Cengage’s Eighteenth-Century Collections Online (ECCO). It will examine aspects of the messy and sometimes ad hoc building of ECCO, the scholarly re-use of its data, and its expansion through parallel publishing projects.