Creating Processing Models for Scholarly Digital Editions

The Guidelines of the Text Encoding Initiative (TEI) are the de facto standard for the creation of high-quality scholarly digital editions. One of the barriers to TEI adoption is its generalised nature and the sheer variety of textual phenomena it enables users to document. At DHC 2012 James Cummings discussed how customising the TEI for an individual project offers a way to overcome this barrier by documenting that project’s specific needs.

This paper builds on the DHC 2012 paper to tackle a different problem: the difficulty developers face when implementing processing workflows for such a generalised scheme. Developers can, to be sure, look at a project’s schema and decide how to process individual elements for whatever outputs are needed, but there is no method of documenting a processing model in TEI ODD customisations. This gap exists because the intended uses are so wide and varied: there can be many possible outputs from any one TEI document. You might generate a PDF, DOCX, RDF, EPUB, or HTML website from either a single document or a collection of them, and each of those might in turn produce many different views on the document and/or extracted lists of linked metadata. It is therefore important that these different processing models be documented consistently, through planned revisions to the TEI ODD customisation language, so that One Document really can Do it all.

As part of the Marie Curie ITN ‘DiXiT’ project on scholarly digital editions, the University of Oxford is investigating processing models for such editions. This work is based on the TEI Simple initiative, which the TEI Consortium is currently developing and for which Oxford will lead the definition of processing information. This paper will discuss how to document and generate processing workflows based on a project’s TEI customisation; two brief sketches below illustrate what the customisation and processing sides of an ODD might look like.
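By way of orientation, the first sketch shows the customisation side: a minimal TEI ODD schema specification that selects the modules a project needs and restricts one of them to a handful of elements. The schema identifier and the particular choice of modules and elements here are illustrative assumptions, not drawn from any specific project.

    <schemaSpec xmlns="http://www.tei-c.org/ns/1.0"
                ident="myProject" start="TEI">
      <!-- Infrastructure modules every TEI document needs -->
      <moduleRef key="tei"/>
      <moduleRef key="header"/>
      <moduleRef key="textstructure"/>
      <!-- Take only the core elements this project actually uses -->
      <moduleRef key="core" include="p hi note list item"/>
    </schemaSpec>

A schema generated from this ODD will disallow any element the project has not declared, which is precisely the kind of project-specific documentation the DHC 2012 paper described.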
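What such a customisation cannot yet express is how those elements should be processed. The second sketch shows the kind of declaration envisaged by the TEI Simple work: model elements attached to an element specification, each pairing an optional predicate with a named behaviour. Because the processing model is still being defined, the behaviours and attribute names shown here are provisional assumptions rather than settled syntax.

    <!-- Provisional sketch of processing declarations in an ODD -->
    <elementSpec xmlns="http://www.tei-c.org/ns/1.0"
                 ident="p" mode="change">
      <!-- Render every <p> as an output paragraph -->
      <model behaviour="paragraph"/>
    </elementSpec>
    <elementSpec xmlns="http://www.tei-c.org/ns/1.0"
                 ident="hi" mode="change">
      <!-- <hi rend="italic"> becomes italic inline text;
           any other <hi> falls back to plain inline rendering -->
      <model predicate="@rend='italic'" behaviour="inline">
        <outputRendition>font-style: italic;</outputRendition>
      </model>
      <model behaviour="inline"/>
    </elementSpec>

From declarations like these a processor could derive, for instance, an XSLT stylesheet for HTML output or an equivalent configuration for PDF, which is how a single documented model could serve the many output formats mentioned above.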