Session 10 — Discursive ways with historical language data

Friday 11:30 - 13:00

High Tor 2

Chair: George Ionita

Finding meaning through linguistic probability in 60,000 early modern English texts: Innovations from the Linguistic DNA project

  • Seth Mehl

University of Sheffield

This paper describes methodological innovations and presents new findings from the Linguistic DNA project. The Linguistic DNA project maps semantic and conceptual change in Early Modern English, using a data-driven approach based on computational analysis of lexical co-occurrence in approximately 60,000 texts found in Early English Books Online (specifically EEBO-TCP). To do this, we first define semantics and concepts in relation to a long history of linguistic theory, from Paul’s (1897) philological semantics to Evans’s (2009) cognitive approach. We then propose a new notion of discursive concepts, which encompass a wide range of discursive meanings, from traditional semantic relations to real-world social and cultural relationships. Drawing on Fano’s (1960) original descriptions of mutual information, we identify co-occurring lexical trios and the sections of text in which they occur. We analyse these sections manually using tools from semantics, pragmatics, discourse analysis, and sociolinguistics. In this paper, I present examples of such trios with the goal of interrogating meaning in Early Modern texts, focusing particularly on the semantic, pragmatic, discursive, and social meanings that such trios can convey. I also present plans for the project moving forward, including the challenging task of working with such large data, and prospects for searching that very large data in a bottom-up way.
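As a rough illustration of the mutual-information measure mentioned above, the Python sketch below computes pointwise mutual information (PMI), log2 P(x,y) / (P(x)P(y)), for word pairs that co-occur within a fixed window. It is a minimal sketch under stated assumptions, not the Linguistic DNA processor: the function name `pmi_scores`, the window size, the `min_count` threshold and the simple relative-frequency estimates are all hypothetical, and the project's procedure for moving from pairs to lexical trios is not shown.

```python
import math
from collections import Counter


def pmi_scores(tokens, window=5, min_count=1):
    """Hypothetical sketch: PMI for word pairs co-occurring within a window.

    `window` is the span of tokens considered, including the focal word,
    so each token is paired with the next (window - 1) tokens.
    Assumes `tokens` is already tokenised and spelling-normalised.
    """
    word_counts = Counter(tokens)
    pair_counts = Counter()
    for i, w in enumerate(tokens):
        for v in tokens[i + 1 : i + window]:
            if v != w:
                # Store pairs in alphabetical order so (a, b) == (b, a).
                pair_counts[tuple(sorted((w, v)))] += 1

    n_words = len(tokens)
    n_pairs = sum(pair_counts.values())
    scores = {}
    for (a, b), count in pair_counts.items():
        if count < min_count:
            continue  # skip rare pairs, whose PMI estimates are unstable
        p_ab = count / n_pairs
        p_a = word_counts[a] / n_words
        p_b = word_counts[b] / n_words
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


tokens = "the court of the king and the court of the queen".split()
for pair, score in sorted(pmi_scores(tokens, window=3).items(), key=lambda x: -x[1]):
    print(pair, round(score, 2))
```

On real corpora, rare pairs tend to receive inflated PMI scores, which is why a minimum-frequency threshold such as the hypothetical `min_count` is commonly applied.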

A distant history of Libraries: “Is this the librarye that thou haddest chosen”?

  • Iona Hine

University of Sheffield

Between 2015 and 2018, researchers at the Universities of Sheffield, Sussex and Glasgow collaborated to model conceptual change in early modern texts, using the manually transcribed portions of Early English Books Online and Eighteenth Century Collections Online. Collectively, and with extensive input from Digital Humanities developers, the Linguistic DNA team have developed a novel technique for exploring association between words by modelling so-called ‘discursive concepts’.

This paper explores what we can learn about the ‘library’ and associated discursive concepts using the approaches and data from that research collaboration. What might the distinctive distant reading techniques developed by the Linguistic DNA project add to our understanding of the bibliographic past and the scholarly future?

Fuzzy Dating and Ambiguous Courting: Accounting for varying metadata precision in historical semantic development

  • Fraser Dallachy

University of Glasgow

Accurate dating of the emergence and development of concepts is hampered by real-world concerns about how rapidly a newly coined word makes its way into transmitted material, and by the patchy survival of the material that would enable the most accurate dating. As part of the AHRC-funded Linguistic DNA project, the ‘Lexicalization Pressure’ research stream faces multiple sources of potential dating fuzziness. This paper discusses methods for taking account of this fuzziness when investigating the data, using the noun court as an example.

When identifying the date at which a new word, or a new sense of a word, emerges, there is a necessary degree of uncertainty in the results. For lexicographical resources, dating accuracy is unavoidably affected by the fact that words are rarely coined in print, so the first usage is lost to posterity. In addition, the earliest known citation may occur in texts which are themselves of uncertain date, or in dictionary entries which rarely give a sense of how recent a word’s coinage is. In text corpora, dating fuzziness is inherent in the use of historical collections such as Early English Books Online (EEBO-TCP), firstly because the date of composition and/or printing may be uncertain, especially for earlier texts. Secondly, even when these dates are known, an extensive lag between a text’s composition and its printing raises important questions about whether the language of the text can genuinely be said to be ‘current’ at the time of printing.

Such problems have long been considered by lexicographers and others working with historical language. Whereas much painstaking work has been done by individual researchers, the Linguistic DNA project must work with the metadata available to it, with minimal manual intervention. This paper discusses approaches developed by the ‘Lexicalization Pressure’ subtheme of the LDNA project to account for these uncertainties. These include identifying semantic fields in which lexicographical data is restricted, and adjusting the parameters used when employing dating metadata in LDNA processor outputs to study semantic field development.