Friday 14:00 - 15:30
Chair: Katherine Rogers
Keywords: literary studies, computational linguistics, OCR correction
This poster (size A0) presents our project, A Question of Style: individual voices and corporate identity in the Edinburgh Review, 1814-1820, which is funded by a Research Society for Victorian Periodicals Field Development Grant running until October 2017.
We want to assess the assumption that early nineteenth-century periodicals succeeded in creating, through a “transauthorial discourse”, a unified corporate voice that hid individual authors behind an impersonal public text (Klancher 1987).
We are building a sample corpus of approximately 500,000 words in about 80 articles: 325,000 words from the Edinburgh Review and 175,000 from its competitor, the Quarterly Review. To assist with OCR correction, metadata creation and textual markup, we are developing a suite of Python scripts based on our previous work on post-OCR correction (King 2013) and semi-automated TEI markup (Willis et al. 2010).
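The general shape of a post-OCR correction pass can be sketched in a few lines. This is a minimal illustration, not the project's script suite: the tiny lexicon, the list of character confusions (long s, "rn" for "m", "vv" for "w") and the function names are hypothetical assumptions. Known confusions are tried first, and fuzzy matching with Python's standard `difflib.get_close_matches` is the fallback.

```python
import difflib
import re

# Hypothetical mini-lexicon; a real pass would need a large,
# period-appropriate word list.
LEXICON = {"the", "edinburgh", "quarterly", "review", "style", "upon", "which"}

# Hypothetical character confusions common in OCR of early printed text.
CONFUSIONS = [("ſ", "s"), ("rn", "m"), ("vv", "w")]


def restore_case(original, corrected):
    """Copy the capitalisation pattern of the OCR token onto the correction."""
    if original.isupper() and len(original) > 1:
        return corrected.upper()
    if original[0].isupper():
        return corrected.capitalize()
    return corrected


def correct_token(token, cutoff=0.8):
    """Return a corrected form of one word token, or the token unchanged."""
    lower = token.lower()
    if lower in LEXICON:
        return token  # already a known word
    # 1. Try known character confusions first (cheap and precise).
    for wrong, right in CONFUSIONS:
        fixed = lower.replace(wrong, right)
        if fixed != lower and fixed in LEXICON:
            return restore_case(token, fixed)
    # 2. Fall back to fuzzy matching against the lexicon.
    matches = difflib.get_close_matches(lower, LEXICON, n=1, cutoff=cutoff)
    return restore_case(token, matches[0]) if matches else token


def correct_text(text):
    """Correct every alphabetic run, leaving punctuation and spacing intact."""
    return re.sub(r"[^\W\d_]+", lambda m: correct_token(m.group()), text)


print(correct_text("The Edinburgb Review and the Quarterlv on ſtyle."))
# prints: The Edinburgh Review and the Quarterly on style.
```

Unknown words that have no close lexicon match (for example proper names) are left untouched, which is why a semi-automated workflow would still route uncertain tokens to a human corrector.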
We draw on methods from periodical studies, book history, computational linguistics and computational stylistics to “operationalise” our definition of style, that is, to transform concepts into a set of operations (Moretti 2013) so that we can select features that are empirically measurable. We will focus on features at the level of words and sentences, such as vocabulary richness, article length, sentence length, the length of quotations from the text under review, the distribution of parts of speech, and the distinctive vocabulary of each journal, of each author and of each type of review (literature, travel, politics, etc.). To measure these we will use methods such as term frequency-inverse document frequency (tf-idf), Burrows’ Delta and Zeta, Moretti’s Most Distinctive Words method, and Principal Component Analysis.
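Of the methods listed, Burrows’ Delta is perhaps the easiest to make concrete. The sketch below follows its usual formulation: take the most frequent words of the whole corpus, convert their relative frequencies in each text into z-scores, and define the Delta between two texts as the mean absolute difference of those z-scores. It is a plain-Python illustration rather than the project's code; the simple tokenizer, the `n_mfw` parameter, the use of the population standard deviation and the function names are assumptions made for this example.

```python
import re
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev


def tokenize(text):
    """Lower-case alphabetic tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def burrows_delta(texts, n_mfw=100):
    """Pairwise Burrows' Delta for a dict {name: raw text}.

    Delta(a, b) is the mean absolute difference between the z-scores of
    the n_mfw most frequent corpus words in texts a and b.
    """
    tokens = {name: tokenize(t) for name, t in texts.items()}
    corpus = Counter(tok for toks in tokens.values() for tok in toks)
    mfw = [w for w, _ in corpus.most_common(n_mfw)]
    if not mfw:
        raise ValueError("corpus contains no word tokens")

    # Relative frequency of each most-frequent word in each text.
    rel = {}
    for name, toks in tokens.items():
        counts, total = Counter(toks), len(toks) or 1
        rel[name] = [counts[w] / total for w in mfw]

    # Standardise each word's frequencies across all texts (z-scores).
    z = {name: [] for name in texts}
    for i in range(len(mfw)):
        column = [rel[name][i] for name in texts]
        mu, sigma = mean(column), pstdev(column)
        for name in texts:
            z[name].append((rel[name][i] - mu) / sigma if sigma else 0.0)

    # Mean absolute z-score difference for every pair of texts.
    return {
        (a, b): mean(abs(x - y) for x, y in zip(z[a], z[b]))
        for a, b in combinations(texts, 2)
    }
```

In an attribution setting, lower Delta means stylistically closer, so an unsigned article would be compared against profiles of candidate authors and the smallest Delta taken as the most likely match. Zeta, tf-idf and PCA would need separate treatment and are not shown here.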
Finally, we will qualitatively describe the results of this stylistic analysis and evaluate them within the context of both literary scholarship on nineteenth-century periodicals and computational linguistics scholarship, using our literary and historical interpretation to generate critical knowledge out of our measurements.
King, David. “Digging in the library.” Invited lecture, Biodiversity Informatics Horizons 2013, Rome, September 2013.
Moretti, Franco. “‘Operationalizing’: or, the function of measurement in modern literary theory.” Stanford Literary Lab, Pamphlet 6, December 2013.
Willis, Alistair, David King, David Morse, Anton Dil, Chris Lyal, and Dave Roberts. “From XML to XML: The why and how of making the biodiversity literature accessible to researchers.” Language Resources and Evaluation Conference (LREC), Valletta, May 2010.
Distorted Projections: Spatial Imaginaries and Desired Trajectories in Christina Stead’s For Love Alone
University of Edinburgh
“In the part of the world Teresa came from, winter is in July, spring brides marry in September, and Christmas is consummated with roast beef, suckling pig, and brandy-laced plum pudding at 100 degrees in the shade, near the tall pine-tree loaded with gifts and tinsel as in the old country, and old carols have rung out all through the night.”
From its opening lines, Christina Stead’s 1945 novel For Love Alone establishes a sense of being “out of place”, signalling to readers that the geography into which they are about to be immersed is distorted and unstable. As the narrative unfolds, the protagonist Teresa’s coming of age is marked by her longing to escape from parochial, provincial Sydney to London, the great metropolis of culture. But the fulfilment of this trajectory proves very different from its imagined anticipation. The trajectory thus serves as a fictional rendition of the spatial and cultural displacement felt by many twentieth-century Australian writers, caught between the cultural authority of English publishers and literary standards on the one hand, and on the other the imperative to contribute to a national literature that could emerge from the shadow of its European and English progenitors.
Since its creation in 2005 as an online search tool for a handful of classical Chinese texts, the Chinese Text Project (http://ctext.org) has gradually grown to become the largest and most widely used digital library of pre-modern Chinese texts, as well as a platform for exploring the application of new digital methods to the study of pre-modern Chinese literature. This paper discusses how several unique aspects of the project have contributed to its success. First, it demonstrates how simplifying assumptions that hold for domain-specific OCR (Optical Character Recognition) of historical works have reduced the complexity of the task and thereby increased recognition accuracy. Second, it shows how crowd-sourced proofreading and editing through a publicly accessible, version-controlled wiki system has made it possible to draw on a large and distributed audience and user base, including many volunteers outside traditional academia, to improve the quality of digital content and to create accurate transcriptions of previously untranscribed texts and editions. Finally, it explores how the implementation of open APIs (Application Programming Interfaces) has greatly expanded the utility of the library as a whole, facilitating open and decentralized integration with other projects and leading to entirely new applications in digital humanities teaching and research.
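The principle behind version-controlled crowd-sourced editing can be pictured with a toy model: every edit is stored as a new numbered revision, so any change can be inspected as a diff and, if it is mistaken, reverted. This is a hypothetical sketch built on Python's standard `difflib`, not the Chinese Text Project's actual implementation; the class and method names are invented for illustration.

```python
import difflib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class Revision:
    number: int
    editor: str
    text: str
    comment: str
    timestamp: datetime


@dataclass
class VersionedText:
    """Hypothetical wiki page that keeps every revision of a transcription."""
    title: str
    revisions: List[Revision] = field(default_factory=list)

    def edit(self, editor, new_text, comment=""):
        """Record a new revision; earlier revisions are never overwritten."""
        rev = Revision(len(self.revisions) + 1, editor, new_text, comment,
                       datetime.now(timezone.utc))
        self.revisions.append(rev)
        return rev

    @property
    def current(self):
        """Text of the latest revision (empty if the page has none yet)."""
        return self.revisions[-1].text if self.revisions else ""

    def diff(self, old, new):
        """Line-based unified diff between two revision numbers (1-based)."""
        a = self.revisions[old - 1].text.splitlines()
        b = self.revisions[new - 1].text.splitlines()
        return list(difflib.unified_diff(a, b, f"r{old}", f"r{new}", lineterm=""))

    def revert(self, number, editor):
        """Undo a mistaken edit by re-saving an earlier revision as a new one."""
        return self.edit(editor, self.revisions[number - 1].text,
                         f"revert to r{number}")


# Usage: an OCR import is corrected by a volunteer, and the change is reviewable.
page = VersionedText("論語·學而")
page.edit("ocr", "學而時習之不亦說平", "OCR import")
page.edit("volunteer", "學而時習之，不亦說乎", "add punctuation, fix 乎")
print("\n".join(page.diff(1, 2)))
```

Because reverting appends a revision instead of deleting history, every contribution, including a bad one, remains auditable, which is what makes opening editing to an unrestricted public workable.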