Finding meaning through linguistic probability in 60,000 early modern English texts: Innovations from the Linguistic DNA project

This paper describes methodological innovations and presents new findings from the Linguistic DNA project. The project maps semantic and conceptual change in Early Modern English, using a data-driven approach based on computational analysis of lexical co-occurrence in approximately 60,000 texts from Early English Books Online (specifically EEBO-TCP). To do this, we first define semantics and concepts in relation to a long history of linguistic theory, from Paul’s (1897) philological semantics to Evans’s (2009) cognitive approach, and propose a new notion of discursive concepts, which include a wide range of discursive meanings – from traditional semantic relations to real-world social and cultural relationships. Based on Fano’s (1960) original descriptions of mutual information, we identify co-occurring lexical trios and the sections of text in which they occur. We analyse these sections manually using tools from semantics, pragmatics, discourse analysis, and sociolinguistics. In this paper, I present examples of such trios with the goal of interrogating meaning in Early Modern texts, with a particular focus on the semantic, pragmatic, discursive, and social meanings that such trios can convey. I also present plans for the project moving forward, including the challenging task of working with data at this scale, and prospects for searching it in a bottom-up way.
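
To make the idea of scoring co-occurring trios concrete, the following is a minimal sketch and not the project's actual pipeline. It assumes one simple three-way generalisation of pointwise mutual information, log2(P(x,y,z) / (P(x)P(y)P(z))), with probabilities estimated as the share of text windows containing the words. The function name `trio_pmi`, the windowing, and the toy data are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def trio_pmi(windows):
    """Score each word trio by log2(P(x,y,z) / (P(x) * P(y) * P(z))).

    Assumption: probabilities are the fraction of windows that contain
    the word (or all three words). This is an illustrative choice, not
    the measure used by the Linguistic DNA project.
    """
    n = len(windows)
    word_counts = Counter()
    trio_counts = Counter()
    for window in windows:
        # Count each word type once per window; sort so trios have a stable order.
        types = sorted(set(window))
        word_counts.update(types)
        trio_counts.update(combinations(types, 3))

    scores = {}
    for trio, count in trio_counts.items():
        p_joint = count / n
        p_independent = math.prod(word_counts[w] / n for w in trio)
        scores[trio] = math.log2(p_joint / p_independent)
    return scores


# Toy windows invented for illustration; not drawn from EEBO-TCP.
windows = [
    ["king", "crown", "realm"],
    ["king", "crown", "realm"],
    ["bread", "wine", "body"],
    ["king", "bread", "wine"],
]
scores = trio_pmi(windows)
for trio, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(trio, round(score, 3))
```

Trios with high scores co-occur more often than their individual frequencies would predict. In a workflow like the one described above, the windows behind the highest-scoring trios would be the passages passed on for manual analysis.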