Crowdsourcing Full-text Manuscript Transcription: Beyond MediaWiki

This paper will trace the development of text-based humanities projects built and hosted by the academic crowdsourcing organization Zooniverse.org. Zooniverse began with ‘citizen science’ projects in astrophysics in 2007 and has since developed thirty projects in the sciences and the humanities. The first Zooniverse humanities project, ‘Ancient Lives’ (http://ancientlives.org/), was launched in July 2011. It is a character-by-character transcription project that has recorded over 1.5 million transcriptions of ancient Greek papyri, the work of over 250,000 unique online volunteers. In January 2014 Zooniverse, in partnership with the Imperial War Museum and the National Archives (Kew), launched ‘Operation War Diary’ (OWD, http://www.operationwardiary.org/), a partial-text transcription and tagging project devoted to uncovering the detail of what life was like on the Front in WWI. To date (August 2014), nearly 10,000 unique OWD volunteers have contributed the equivalent of 32 months of full-time (FTE) work to the project, amounting to nearly 500,000 classifications (tags and transcriptions) that enable new understandings of battles, the spread of illness, and soldiers’ daily lives during the war. The OWD development team has also built a ‘data digger’ that aggregates the responses of N crowd users (N = 7 for OWD) and reveals where their tags and transcriptions vary and where they agree.
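The kind of aggregation described for the ‘data digger’ can be illustrated with a minimal sketch. This is not the OWD implementation: the function name, the normalization rule (lowercasing and collapsing whitespace), and the sample responses are all assumptions made for illustration. It takes the N responses submitted for one tagged item, picks the most common value as the consensus, and reports the share of volunteers who agreed with it.

```python
from collections import Counter


def aggregate_responses(responses):
    """Summarize N volunteer responses for a single tagged item.

    Hypothetical helper, not part of any Zooniverse codebase.
    Returns (consensus, agreement, variants), where agreement is the
    fraction of non-empty responses matching the consensus value.
    """
    # Assumed normalization: trim, collapse whitespace, lowercase.
    normalized = [" ".join(r.split()).lower() for r in responses if r and r.strip()]
    if not normalized:
        return None, 0.0, {}
    counts = Counter(normalized)
    consensus, votes = counts.most_common(1)[0]
    return consensus, votes / len(normalized), dict(counts)


# Invented example: seven volunteers transcribe the same place name.
place_names = ["Ypres", "ypres", "Ypres ", "Ypers", "Ypres", "Ieper", "Ypres"]
consensus, agreement, variants = aggregate_responses(place_names)
print(consensus, f"{agreement:.2f}")  # prints "ypres 0.71"
```

A low agreement score, or a long list of variants, would flag an item for closer review; exact string matching is the simplest possible comparison, and a real tool might well use fuzzier matching for free-text transcriptions.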

Zooniverse will soon embark on full-text transcription projects with leading institutions in the USA and UK, including Tate Britain. The Tate project will enable full-text transcription and rich indexing of artists’ archival materials, producing data that will be integrated with the museum’s art catalogues.

This talk will explore the possibilities and potential pitfalls of full-text transcription and present the early stages of our development work. Building platforms around granular tasks is key to the Zooniverse approach and marks a significant departure from the MediaWiki platform.