1. Introduction

If we reflect on issues and emerging trends in digital humanities resources, and on their impact on humanities research, the field of classics can be regarded as being at the forefront of digital humanities, and among the disciplines that have benefited most from the introduction of new methods. Within classics, papyrology (the study of Greek and Latin books and documents, dating from the fourth century BC to the eighth century AD, unearthed in archaeological excavations rather than handed down to us through the medieval transmission) has always been particularly concerned with the adoption of digital technologies: the invariably fragmentary state of its texts calls for specific lexica to help papyrologists read, restore and identify extremely damaged documents – tools that are best suited to the digital format.1

Figure 1: One of the most important papyrological findings, the Constitution of the Athenians by Aristotle (British Library, Pap. 131)

The first comprehensive examination of digital papyrological resources has recently been carried out by Reggiani in his 2017 monograph ‘Digital Papyrology’, which greatly helped bridge a gap in the literature on this topic. Nonetheless, much remains to be done towards a complete picture of digital papyrology, especially as to how issues in digital papyrology are grounded in the broader context of digital humanities, and what the evaluation of these resources is from the user’s point of view. This paper will offer a brief reflection on the former aspect, focusing on resources of a flourishing recent trend in digital papyrology and digital humanities: those based on an open participation model, that is, on crowd- and community-sourcing. Reference will be made to the typology elaborated by Dunn and Hedges in their 2012 ‘Crowdsourcing Scoping Study’, further developed in their 2018 monograph, ‘Academic Crowdsourcing in the Humanities’. This typology aims to identify some fundamental facets common to a wide variety of projects: the asset, or primary source transformed by the crowd over the course of the project; the process, or method, employed by the crowd to transform the asset; the task, the nature of the crowd's activity; and the output, or result of the transformation (Dunn-Hedges 2018, 27-49; Dunn-Hedges 2012, 20-40).

2. Crowdsourcing ancient fragmentary texts

Ancient Lives, which was relaunched in 2018 after running from 2011 to 2014, is a project by the University of Oxford in collaboration with the Zooniverse platform for academic crowdsourcing. It is the only papyrological project designed to involve non-experts, and is therefore classifiable as a crowdsourcing project proper, while others more specifically address the community of classicists. It allows any user, even one with no knowledge of papyrology or Greek, to transcribe unpublished papyri from the Oxyrhynchus collection in the Sackler Library in Oxford. In order to involve any interested individual in the transcription of papyrus fragments, its interface includes a virtual keyboard, positioned next to the image of the chosen papyrus; this lets users recognise patterns in the handwriting and match them with Greek letters solely on the basis of letter forms, even without knowing the Greek alphabet.

Figure 2: Ancient Lives, transcription interface with virtual keyboard

It is worth examining the transcription process involved in Ancient Lives in light of Dunn and Hedges' typology. Transcription can be either a mechanical or an editorial task type, depending on the asset involved.2 At its simplest level, transcription is a mechanical task when little or no expertise is necessary; on the other hand, when transcription demands more initiative, experience and interaction with the project team, it is classified as an editorial task. The transcription of Greek papyri, which by nature requires a certain degree of skill and experience, and would normally be classifiable as an editorial type of task, is in Ancient Lives facilitated by a bespoke interface. The transcription process was thus transformed into a mechanical task which can be undertaken by anyone who is capable of recognising letter shapes. Because the task involves pattern recognition, the transcription process of Ancient Lives can also be classified under one further task type, configurational, though at a simple level, that is, the identification of simple geometrical shapes. Hence, Ancient Lives is an interesting case in which two different task types, mechanical and configurational, are co-present in the same process. The presence of the two task types does not complicate the transcription process; indeed, it is what allows for its simplification. This method (transforming transcription into a configurational task) could be adopted by other projects that involve the transcription of languages in non-Latin alphabets, to make the resource accessible to more participants.
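The classification just described can be written down as a small record, which may help when comparing projects facet by facet. The following is only a minimal sketch: the class name `CrowdsourcingProfile`, the helper `combines_task_types` and the "conventional" entry are hypothetical names introduced here for illustration, not part of Dunn and Hedges' typology or of any project's software. The output facet is left empty because it is not specified for Ancient Lives in this discussion.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CrowdsourcingProfile:
    """One project described by the four facets: asset, process, task, output."""
    project: str
    asset: str
    process: str
    tasks: Tuple[str, ...]          # a process may combine several task types
    output: Optional[str] = None    # left unset when not specified


def combines_task_types(profile: CrowdsourcingProfile) -> bool:
    """True when more than one task type is co-present in the same process."""
    return len(profile.tasks) > 1


# Ancient Lives as characterised in the text: transcription of text,
# made mechanical and configurational by the virtual keyboard.
ancient_lives = CrowdsourcingProfile(
    project="Ancient Lives",
    asset="text",
    process="transcription",
    tasks=("mechanical", "configurational"),
)

# Hypothetical comparison entry: papyrus transcription without a bespoke
# interface, which would normally count as an editorial task.
conventional = CrowdsourcingProfile(
    project="conventional papyrus transcription (hypothetical entry)",
    asset="text",
    process="transcription",
    tasks=("editorial",),
)

print(combines_task_types(ancient_lives))
print(combines_task_types(conventional))
```

Storing the task facet as a tuple rather than a single value is what lets the record express the co-presence of two task types in one process.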

So far, Ancient Lives seems to be a unique case of engagement of non-expert users in a task as complex as the transcription of texts in a language and alphabet they may not know. The only project that apparently offers a parallel is the Scribes of the Cairo Geniza, a project by the University of Pennsylvania, also hosted on the Zooniverse platform. This resource invites volunteers to work on fragmentary texts written in Hebrew or Arabic found in the Cairo Geniza (a repository for disused books with sacred texts of the Jewish tradition) and mostly dating from the tenth to the thirteenth centuries AD. As in Ancient Lives, users are not required to have a knowledge of the languages used in the manuscripts; however, the process type involved in this project is not transcription, but categorisation; that is, a type that does not involve direct work on the text. Participants are required to indicate whether the script of a fragment is in the Hebrew or Arabic alphabet, whether its appearance is formal or informal, and to identify marks, text in the margins and other visual features of the manuscripts.

Figure 3: A task within the Scribes of the Cairo Geniza project: identifying the script (Hebrew or Arabic) of a fragment

Future developments in this project, however, envisage the launch of a transcription interface to work on the sorted fragments, as announced by a banner on the project website. It will be interesting to see which type of user this implementation will address: those familiar with the languages involved in the project, or a wider public.

The kind of work carried out by Ancient Lives users has also been envisaged for a corpus of cuneiform texts, planned at the universities of Southampton and Oxford.3 Ancient Lives constitutes a model for this planned corpus in that volunteers who do not know cuneiform may nonetheless join the platform and work on the identification of signs or sign clusters, even without knowing the meaning of the words. Unlike Ancient Lives, the cuneiform project envisages a series of tasks scaled according to users' competencies. Further tasks will therefore be available for users with a knowledge of the script and languages attested: enriching texts by linking words to dictionaries, translations and publications.4 Although this project is yet to be realised, the possibility of carrying out different tasks according to participants' skills is an interesting idea that could be taken into consideration in Ancient Lives. There, activities for more expert users could include carrying out an interpretive transcription, proposing the identification of a text, and acting as a moderator in the dedicated forum to guide less expert contributors.

The importance of Ancient Lives lies in taking forward the usability of crowdsourcing projects. The idea of simplifying the transcription by allowing users to click on a virtual keyboard rather than typing has in fact been applied to another Zooniverse project, Shakespeare's World. Through an implementation inspired by the Ancient Lives interface, this project enables users to capture frequent features of handwriting, such as abbreviations, by simply clicking on buttons, rather than actually transcribing them.

Figure 4: Shakespeare's World’s interface, which allows marking up abbreviations, deletions and other scribal editorial interventions

Ancient Lives has demonstrated that a task which is normally the purview of specialists can become accessible by providing a suitable, high-quality interface.5

3. Collecting numerical and statistical information in large humanities collections

More insights for digital humanities projects can come from the analysis of another facet identified in the reference typology: the "asset" type, that is, the source material enhanced during the project, such as text in transcription-based initiatives.6

All papyrological open collaborative projects have "text" as their asset type, since the transcription process is at the core of the work of the papyrologist. Text is the predominant asset type in humanities crowdsourcing in general as well, a characteristic that mirrors the key role of written sources in humanities scholarship.7 Within the "text" asset type, we find a difference between Ancient Lives and other digital humanities projects, as already pointed out: the primary sources for transcription are written in a historical language and in an alphabet that may be unfamiliar to participants.

In addition to the text asset type, we can notice that the first version of Ancient Lives asked volunteers, besides transcribing the text of the papyri, to measure their margins by means of a digital ruler, in order to gather data that could provide a basis for statistical analyses on the format of bookrolls,8 an activity unfortunately not present in the current platform. This feature can fall within the "numerical or statistical information" asset type.9 This is a category that is seldom attested in humanities crowdsourcing; as far as I am aware, the only other example is the collection of meteorological data from historical ship logs in Old Weather.
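To illustrate how crowd-gathered margin measurements might feed a statistical analysis, the sketch below reduces several volunteers' readings of the same fragment to a single consensus value. It is an illustration under stated assumptions only: the function name `aggregate_margins`, the fragment identifiers and the millimetre values are all invented here, and the use of the median is my own choice, not a description of how Ancient Lives actually processed its ruler data.

```python
from collections import defaultdict
from statistics import median


def aggregate_margins(measurements):
    """Reduce repeated (fragment_id, margin_mm) readings to one value per fragment.

    The median is used so that a single careless reading does not
    distort the consensus (an assumption of this sketch).
    """
    by_fragment = defaultdict(list)
    for fragment_id, margin_mm in measurements:
        by_fragment[fragment_id].append(margin_mm)
    return {fid: median(values) for fid, values in by_fragment.items()}


# Invented sample data: three volunteers measured frag-A, two measured frag-B.
sample = [
    ("frag-A", 31.0),
    ("frag-A", 29.5),
    ("frag-A", 48.0),   # an outlying reading
    ("frag-B", 22.0),
    ("frag-B", 23.0),
]

consensus = aggregate_margins(sample)
print(consensus)  # {'frag-A': 31.0, 'frag-B': 22.5}
```

Per-fragment values of this kind could then be compared across a collection, which is the sort of analysis of bookroll format that the measurements were meant to support.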

Figure 5: Old Weather, 2010-present, Met Office & National Maritime Museum

In fact, this is a project that concerns both science and humanities, as it aims to gather scientific data about past weather and sea conditions and information about historical ships and routes. The presence in papyrological resources of this rare asset type can be explained with the importance of material aspects in this discipline; that is, with the close relationship between the text and the object that bears the writing, with its complex physical and editorial features.

However, the same asset type – numerical information – deals with a different object in the two projects: in Ancient Lives, the extraction of numerical data concerns the appearance of the text-bearing material, rather than the content. This feature could be introduced in more projects involving the transcription of manuscripts. Although transcription is one of the most common processes in humanities crowdsourcing,10 no project as yet involves gathering numerical data on the layout of manuscripts. This would be an activity for which no specialist knowledge is required. The data gathered in the aforementioned Cairo Geniza project (degree of formality of the scripts, presence of justified text, marks and other visual characteristics) concerns the appearance of documents, but is not numerical data; moreover, Cairo Geniza data is not meant for statistical analysis, but rather for preparing scholars' transcription work by providing them with an idea of what the content of the fragments could be. The same applies to Shakespeare's World, which asks users, as well as transcribing the text of manuscripts, to flag graphics such as illustrations, maps and symbols, but not numerical information.11 To my knowledge, Ancient Lives represents a unique case of a project that indicates the possibility of collecting data through crowdsourcing for statistical purposes, to be analysed through data mining techniques, whether about the format of bookrolls or about the text (for example, about scribal practices and mistakes, features of the language of the papyri, and features of bilingualism).

4. Conclusion

In conclusion, analysing crowdsourcing from the point of view of fundamental common facets highlights methodologies for collaboratively processing or creating knowledge which can be replicated in other projects for humanities scholarship. I hope the lessons and good practices identifiable in crowdsourcing for papyrology will contribute to the development of open, collaborative humanities resources. I believe that framing more projects within this and other classifications will help to bring to light a richer variety of forms of crowdsourcing, and to show that this is not merely a cheap way of digitising content, but a research method for producing valid academic knowledge.

5. References

Brusuelas, J. 2016, ‘Engaging Greek: Ancient Lives’, in G. Bodard & M. Romanello (eds), Digital Classics Outside the Echo-Chamber, Ubiquity Press, London, pp. 187-204, viewed 20 February 2019, <https://doi.org/10.5334/bat.k>.

Crane, G. 2004, ‘Classics and the Computer: An End of the History’, in S. Schreibman, R. Siemens, & J. Unsworth (eds), A Companion to Digital Humanities, Blackwell, Oxford (reprint: Wiley-Blackwell, Oxford, 2008), pp. 46-55.

Dunn, S., & M. Hedges 2012, Crowd-Sourcing Scoping Study: Engaging the Crowd with Humanities Research, Arts and Humanities Research Council, Swindon, viewed 20 February 2019, <https://kclpure.kcl.ac.uk/portal/files/5786937/Crowdsourcing_connected_communities.pdf>.

Dunn, S., & M. Hedges 2018, Academic Crowdsourcing in the Humanities: Crowds, Communities and Co-production, Chandos, Cambridge, MA-Kidlington.

‘Identifying Graphics’ n.d., Shakespeare’s World, viewed 20 February 2019, <https://www.shakespearesworld.org/#/guide/graphics>.

Nurmikko, T., Dahl, J., Gibbins, N., & Earl, G. 2012, ‘Citizen Science for Cuneiform Studies’, in Proceedings of the 4th annual ACM Web Science Conference, ACM, New York, viewed 20 February 2019, <https://eprints.soton.ac.uk/341015/1/NURMIKKO%252CTERHI%252C_ACM_Web_Science_2012_Extended_Abstract.pdf>.

Nurmikko, T., Dahl, J., Martinez, K., & Earl, G. 2013, ‘Web Science for Ancient History: Deciphering Proto-Elamite Online’, in Proceedings of the 5th annual ACM Web Science Conference, ACM, New York, viewed 20 February 2019, <https://www.academia.edu/3524800/Web_Science_for_Ancient_History_Deciphering_Proto-Elamite_Online>.

Reggiani, N. 2017, Digital Papyrology, vol. 1, Methods, Tools and Trends, De Gruyter, Berlin-Boston, viewed 20 February 2019, <https://doi.org/10.1515/9783110547474>.

Reggiani, N. 2018, ‘The Corpus of the Greek Medical Papyri and a New Concept of Digital Critical Edition’, in N. Reggiani (ed), Digital Papyrology, vol. 2, Case studies on the digital edition of ancient Greek papyri, De Gruyter, Berlin-Boston, pp. 3-62, viewed 20 February 2019, <https://doi.org/10.1515/9783110547450-002>.

Terras, M. 2010, ‘The digital classicist: Disciplinary focus and interdisciplinary vision’, in G. Bodard & S. Mahony (eds), Digital Research in the Study of Classical Antiquity, Routledge, London-New York, pp. 171-189, viewed 20 February 2019, <http://www.ucl.ac.uk/infostudies/melissa-terras/research/Chapter_10_Terras.pdf>.

Terras, M. 2016, ‘Crowdsourcing in the Digital Humanities’, in S. Schreibman, R. Siemens & J. Unsworth (eds), A New Companion to Digital Humanities, Wiley, Chichester, pp. 420-436, viewed 20 February 2019, <http://www.arise.mae.usp.br/wp-content/uploads/2018/03/A-New-Companion-to-Digital-Humanities.pdf>.

Van Hyning, V. 2016, ‘How to use crowdsourcing to enrich your collections and grow relations with the public’, CILIP, The library and information association blog, blog post, 25 November, viewed 20 February 2019, <https://archive.cilip.org.uk/blog/how-use-crowdsourcing-enrich-your-collections-grow-relations-public>.