Developing the Oxford English Dictionary as a toolkit for DH research

We present some approaches to developing the OED as a resource to support the large-scale study of historical documents in English. The OED has had a fitful relationship with Digital Humanities research: although many recognize that in principle the OED contains information useful to the analysis of historical text, in practice this information has been difficult to parse and extract as complete and consistent data.

Nevertheless, in recent years the OED has worked with a number of DH projects to build bespoke data sets to meet particular needs. By reviewing these projects, we can develop a more general model of OED services for DH. These services include variant-spelling data for normalizing and lemmatizing historical and non-standard text, sense inventories for disambiguation, and query-term expansion for concept-based corpus search. We look at the practicalities of turning dictionary content into simple API functions that can plug into the wider infrastructure of textual research.
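
As a rough illustration of what such functions might look like, the sketch below shows variant-spelling normalization and query-term expansion over a small lookup table. It is a minimal sketch under stated assumptions, not the OED's actual API: the function names (`build_indexes`, `normalize`, `expand_query`), the record layout, and the four toy records are all hypothetical and are not drawn from OED data.

```python
from collections import defaultdict

# Toy (headword, variant spelling) records. Illustrative only: a
# hypothetical layout and hand-picked examples, not OED data.
VARIANTS = [
    ("love", "loue"),
    ("love", "luve"),
    ("virtue", "vertue"),
    ("virtue", "vertu"),
]


def build_indexes(records):
    """Build a variant -> headword map and a headword -> variants map."""
    to_headword = {}
    to_variants = defaultdict(set)
    for headword, variant in records:
        to_headword[variant.lower()] = headword
        to_variants[headword].add(variant.lower())
    return to_headword, to_variants


TO_HEADWORD, TO_VARIANTS = build_indexes(VARIANTS)


def normalize(token):
    """Map a historical spelling to its modern headword.

    Tokens with no recorded variant are returned unchanged.
    """
    return TO_HEADWORD.get(token.lower(), token)


def expand_query(term):
    """Return the search term plus its recorded variant spellings, sorted."""
    key = term.lower()
    return sorted({key} | TO_VARIANTS.get(key, set()))


if __name__ == "__main__":
    print([normalize(t) for t in "Loue is a vertue".split()])
    print(expand_query("virtue"))
```

A real service would also need to handle variants shared by more than one headword and the date ranges in which each spelling is attested; this sketch ignores both.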

We also identify the limitations of this approach: areas where historical lexicography, as traditionally conceived, is not well aligned with DH needs, and where we therefore need to build out data in new directions. Partnerships within the DH community are helping us to tackle these challenges. As just one resource among many for DH scholars, the OED can make itself useful as a suite of services focussed on ease of use, problem-solving, reliability, and interoperability.