Session 1

Thursday 14:00 - 15:30

High Tor 2

Who wrote the Jack the Ripper letters? A stylometric analysis

  • Andrea Nini

University of Manchester

In 1888, a number of prostitutes were murdered in Whitechapel, London and the perpetrator(s) were never caught. However, in the 209 letters that were received before and after the events, the murderer allegedly identified himself as ‘Jack the Ripper’, a name that is to this day associated with the case. These letters were key in providing an evocative name to the press and in creating the persona of Jack the Ripper, still alive today in the form of books, movies, plays, and tours. Interestingly, however, historical evidence suggests that the two most important of these letters, those responsible for the creation of ‘Jack the Ripper’, were fabricated by journalists with the aim of selling more newspapers, with the later ones probably written by hoaxers after the police decided to make these two public. The present paper reports on a stylometric analysis aimed at identifying which of the 209 letters attributed to Jack the Ripper were written by the same person. This task has recently been referred to as ‘authorship clustering’, a special case of ‘authorship verification’. Due to the brevity of the Jack the Ripper letters, frequency methods could not be applied and a novel clustering method based on the presence/absence of word 2-grams using the Jaccard distance was applied instead. The results support the hypothesis that the two earliest and most important letters were written by the same person and that a third letter, the ‘Moab and Midian’ letter, can be connected to these two. This letter is controversial as some historical evidence suggests it was fabricated by journalists at the Central News Agency. The implications of these results for the Jack the Ripper case, its socio-cultural dimension, and modern authorship analysis will be discussed.
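The distance measure named in the abstract can be illustrated with a minimal sketch. It assumes a simple lowercase tokenizer and uses invented placeholder texts rather than quotations from the letters; the function names are hypothetical, and the clustering step itself is not shown because the abstract does not specify it.

```python
import re
from itertools import combinations


def word_bigrams(text):
    """Return the set of word 2-grams in a text (presence/absence only)."""
    # Assumption: a crude tokenizer that keeps letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return set(zip(tokens, tokens[1:]))


def jaccard_distance(a, b):
    """Return 1 - |A & B| / |A | B|, or 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Invented placeholder texts, not quotations from the actual letters.
letters = {
    "A": "i am down on them and i shall not stop",
    "B": "i shall not stop until i am caught",
    "C": "the weather in london was fine today",
}

features = {name: word_bigrams(text) for name, text in letters.items()}

# Pairwise distances; a clustering algorithm would take these as input.
distances = {
    (x, y): jaccard_distance(features[x], features[y])
    for x, y in combinations(sorted(features), 2)
}

for pair, dist in distances.items():
    print(pair, round(dist, 3))
```

Because only the presence or absence of each 2-gram is recorded, the measure does not depend on frequency counts, which is what makes it usable on very short texts.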


Exploring Contagion and Migration in European Cultural Memory via Text Mining

  • Susan Leavy
  • Derek Greene
  • Karen Wade
  • Maria Mulvany
  • Gerardine Meaney

University College Dublin

This paper presents an overview of the Contagion, Biopolitics and Migration in European Cultural Memory project, which aims to combine data analytics and cultural analytics to investigate themes of disease, health, and migration in the British Library Digital Labs corpus. The project plans to study the long-term effects and influence of cultural representations on public understanding of infectious diseases and their prevention. Specifically, in this work we describe the use of methods from text mining and machine learning to study a corpus of over 47,000 texts, covering fiction and non-fiction, ranging from the 18th century to the early 20th century. The natural language processing techniques involved include word embedding and topic modelling. For instance, language pertaining to disease, health, and migration is explored through lexicons generated using word embedding methods trained on the library corpus. We show how the outputs of these methods can be used to characterise and better understand the discourse around disease and migration during this time period.
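One common way to generate a lexicon from word embeddings is to rank vocabulary items by cosine similarity to a set of seed words. The sketch below assumes that approach; the abstract does not say which method the project uses. The three-dimensional vectors are made-up toy values, not embeddings trained on the library corpus, and `expand_lexicon` is a hypothetical helper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_lexicon(seeds, vectors, top_n=2):
    """Rank non-seed words by their highest similarity to any seed word."""
    known_seeds = [s for s in seeds if s in vectors]
    if not known_seeds:
        return []
    scores = {
        word: max(cosine(vec, vectors[s]) for s in known_seeds)
        for word, vec in vectors.items()
        if word not in seeds
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Toy vectors invented for illustration only.
toy_vectors = {
    "fever": (0.9, 0.1, 0.0),
    "cholera": (0.8, 0.2, 0.1),
    "contagion": (0.7, 0.3, 0.0),
    "harbour": (0.1, 0.9, 0.2),
    "garden": (0.0, 0.1, 0.9),
}

lexicon = expand_lexicon({"fever"}, toy_vectors)
print(lexicon)
```

In practice the vectors would come from a model trained on the corpus itself, so that the neighbours reflect period usage rather than modern usage.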

At the origins of the Political Discourse of the 5-Star Movement (M5S): Internet, direct democracy, and the “future of the past.”

  • Marta Musso
  • Marzia Maccaferri

The 5-Star Movement is a political party in Italy operating almost exclusively online. It was officially established as a political movement in 2009 and quickly became the second most important political force in Italy. The party (or non-party, as defined by its members) grew from a series of advocacy campaigns launched in the early Noughties by Beppe Grillo, a popular comedian, and Gianroberto Casaleggio, a web entrepreneur, using the then-new communication forms of blogging, online forums, and Meetup platforms. The web is not just a tool for the movement, but an integral part of the party’s ideology, which advocates the advent of direct democracy thanks to web technologies.

Unlike traditional political parties, the Movement operates almost exclusively online, with no headquarters and no non-digital forms of communication (until recently, candidates were forbidden to give interviews on TV or to the press). Beppe Grillo’s blog, used as an aggregator by early activists, is now the main “spokesperson” of the party; it is not just a tool of communication that replaces traditional party newspapers, but an integral part of the party’s life and of its history.

At the same time, the party thrived as a protest movement because it carefully crafted a “bipartisan populism” that prevents the Movement or its members from being labelled as either left-wing or right-wing, and that draws largely from both spheres of the protest vote. Consequently, the position of the 5-Star Movement on key (and divisive) issues such as immigration and the Eurozone oscillates strongly, and the party’s communication on “ideologically charged” issues is characterised by sharp U-turns.

This research will first give an overview of the relation between the 5-Star Movement and the World Wide Web, particularly the blog. It will analyse how this relation evolved over the years and why the analysis of the blog is fundamental to understanding the ideology and practices of the party.

Second, it will compare the live and archived versions of the blog (the latter through the Internet Archive) to test whether it would be possible to reconstruct the evolution of the political discourse of the 5-Star Movement without web archives. The blog makes its posts available online from 2005 onwards; this paper will test whether archived blog posts have been altered to reflect the newest line on these divisive issues, ultimately to argue for the importance of web archives as the only tool to reconstruct the evolving ideology of a party that officially communicates only through the World Wide Web.
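A comparison of this kind could be sketched as a text diff between the two versions of a post. The example below is one possible approach, not the procedure used in the paper; the post texts are invented placeholders, the helper names are hypothetical, and retrieving the pages from the live site or the archive is not shown.

```python
import difflib


def normalise(text):
    """Collapse runs of whitespace so layout changes are not counted."""
    return " ".join(text.split())


def similarity(live_text, archived_text):
    """Return a ratio in [0, 1]; 1.0 means the normalised texts are identical."""
    matcher = difflib.SequenceMatcher(
        None, normalise(live_text), normalise(archived_text)
    )
    return matcher.ratio()


def changed_lines(live_text, archived_text):
    """List lines removed from (-) or added to (+) the archived version."""
    diff = difflib.unified_diff(
        archived_text.splitlines(),
        live_text.splitlines(),
        fromfile="archived",
        tofile="live",
        lineterm="",
    )
    return [
        line
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


# Invented placeholder posts, not text from the actual blog.
archived_post = "Post title\nWe oppose the measure.\n"
live_post = "Post title\nWe support the measure.\n"

changes = changed_lines(live_post, archived_post)
score = similarity(live_post, archived_post)
print(changes, round(score, 3))
```

A similarity below 1.0 flags a post for closer reading; whether a given change reflects a shift in political line or a routine correction still requires human judgement.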