The Problem of Citation in the Digital Humanities

by Jonathan Blaney

1. Introduction

In 1989 the Queen’s Christmas message to the nation was recorded at the Albert Hall, with a live audience of children from the Commonwealth; the message is now available on YouTube.1 One of the themes was environmentalism and the Queen urged her audience to look after the environment for the sake of all humanity:

if we also learn to live by the golden rule, which Jesus Christ taught us: Love thy neighbour as thyself

The Queen is citing the injunction, love your neighbour as yourself, and attributing it to Jesus. She gives no source but relies upon the tacit knowledge that she is referring to the gospels and, because she said thy neighbour and thyself, the linguistic knowledge that she is probably quoting the King James translation.

You can indeed find this passage in the gospels, on multiple occasions (at Matthew 19.19, Matthew 22.39, Mark 12.31-33 and Luke 10.27), but always in the context of discussions of Jewish law and commandments, because the real source of the quotation is in the Hebrew Bible, at Leviticus 19.18. In fact Jesus is merely quoting: he is citing scripture himself.

Is this pedantry or does it matter? Well, there is a long-established narrative in Western culture that describes Judaism as legalistic, ossified and rightly superseded by new ideas like, say, love your neighbour as yourself. I came by the Queen’s 1989 address from a lecture by the New Testament scholar EP Sanders, called The Question of Uniqueness in the Teaching of Jesus,2 where Sanders gives the Queen’s citation as an example of the narrative described above (in fairness, Sanders’ target is not so much the Queen as other biblical scholars who, he thinks, should know better).

The question arises: how can this chain of connections be cited in a published work? As well as the YouTube film already mentioned, there is also a transcription of the address on the official website of the Royal Family,3 but no video or audio. Is the YouTube film better evidence than the website’s more official transcription? I have been unable to find any transcript of the message published in print.

The Sanders lecture was a one-off occasion and was published as a little booklet by the University of London in 1990. I first read it in the Bodleian and if a reader in Sheffield wanted a print copy the nearest convenient one is probably still the Bodleian copy, 110 miles away, because, according to COPAC, there are only five copies in university libraries in England. Fortunately the booklet is now freely available as a pdf4 and anyone with an internet connection can read it without travelling.5

But there remains a strong presumption in parts of the humanities in favour of print wherever possible. Some teachers, and many journals, will ask for a print version of Sanders’ lecture to be cited. Some journals will simply change a digital citation to a print citation, and the reader in Plymouth (nearest copy London, 190 miles) might not know that there is a version literally at their fingertips.

This is a practice in the humanities that creates a problem in the digital humanities. It is a problem for the humanities in general, because it cannot be a good thing for research practice to be distorted, for researchers to report that their methodology was one thing when in fact it was another, but this paper will focus on the problem it is creating in the digital humanities.

2. TIDSR

A couple of years ago British History Online,6 the Institute of Historical Research’s digital library, was awarded funding from JISC to carry out an analysis of the use of the site and then to make some improvements, based on the findings. To carry out the study we followed the very useful Toolkit for the Impact of Digitised Scholarly Resources, developed by the Oxford Internet Institute, which recommends methods of analysing the uses of a digital resource.7 An important additional benefit of the funding award was that it allowed members of the team to take something of a reflective pause, away from the usual process of trying to add to the site or improve the user interface. This may not be possible for everyone who maintains a digital resource, but any opportunity to do so should be welcomed.

Part of the toolkit involves qualitative methods such as focus groups and user interviews. Here we were able to build on longitudinal work that we have been doing at the Institute for the last 10 years, asking historians, researchers and students about their attitude to digital resources in research, writing, and teaching. Of the quantitative measures, one element in the toolkit is bibliometrics. We used Scopus and Google Scholar to search for journal citations of British History Online. The aggregate results for 2008-9 were:

Scopus: 37

Google Scholar: 43

Common to both: 9

Page views in the period: 31,367,021
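As a rough illustration of the scale of the gap, the figures above can be combined in a few lines of arithmetic. This is only a sketch: it assumes that "common to both" means the same citing items were found by both services, so that they should be counted once, and the ratio of citations to page views is an illustrative measure chosen here, not one recommended by the toolkit.

```python
# Figures reported above for 2008-9.
scopus = 37
google_scholar = 43
common_to_both = 9
page_views = 31_367_021

# Inclusion-exclusion: citations found by either service, each counted once.
# Assumes the 9 shared results are the same citing items in both indexes.
distinct_citations = scopus + google_scholar - common_to_both

# Illustrative ratio only: distinct citations per million page views.
citations_per_million_views = distinct_citations / (page_views / 1_000_000)

print(distinct_citations)                      # 71
print(round(citations_per_million_views, 2))   # 2.26
```

On these assumptions, roughly two journal citations were found for every million page views.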

This immediately raises the question of digital citation and potential problems with it. British History Online is a well-used and well-established resource: it gets about 15,000,000 page impressions a year. It has been active for 10 years. Run by the Institute of Historical Research and the History of Parliament Trust, the site can surely have no question mark over its academic reliability. The “contact us” page receives messages from historians all over the world, expressing gratitude for making their research easier, for example from American scholars who often cannot get the print versions of the books that have now been made available to them. So is British History Online being under-cited? One clue is in feedback questions, received via “contact us”, like this one:

Is there anyway of obtaining accurate page numbers for parts of the sources? Although the whole site is fantastic for research, it makes citation very difficult.

In fact we have made citation very easy. We have a citation box at the top of every page, with a drop-down menu offering different citation formats, such as MLA and Chicago.8 What our correspondents mean, of course, is “although the whole site is fantastic for research, it makes citation of print copies which I haven’t used very difficult”. What they want us to provide is the means for them to ignore our work and deliberately not acknowledge it in any way. The fact that this does not strike many people as a strange request sheds interesting light on the culture of humanities citation at the moment.

The basic proposal of this paper is: cite the resource you used, not the ones you did not use. This is not revolutionary. For example, The Chicago Manual of Style says:9

If a book is available in more than one format, cite the version you consulted. For books consulted online, list a URL…

3. Weak Objections

In the interviews mentioned above, and elsewhere, a number of academics were asked about their views on digital citation (some of them strongly disagree with the position taken here) and two of the objections which have arisen frequently are:

  1. Citations are too long.
  2. What’s the point? Isn’t this just like citing which library I got a book from?

It is true that some citations are very long and I will make some suggestions for mitigating that problem at the end of the paper. Nevertheless, it is useful with any of these arguments to imagine how they would sound if digital citation were not the focus of the discussion. Does any scholar ever say that they would not cite a book because its title was too long, so they will just cite a different book instead?

Perhaps argument 2 is essentially based on a naïve view of the work that goes into creating digital versions. On British History Online every text has been retyped by somebody; I think by any standards that makes it a new edition. On Google Books everything has been, not necessarily very accurately, captured via optical character recognition; that makes it a new edition. Again we could ask our non-digital-context question: could anyone in good conscience claim they had consulted a medieval manuscript when they had actually looked at a photograph of that manuscript in a book?

But in this context another question to ask is, how would academics feel if their students were doing this? Let me give just one example among many possible ones of what that might mean.

On British History Online we are adding Rymer’s Foedera, a collection made in the eighteenth century by Thomas Rymer, a poet who appears in Pope’s Dunciad, of transcriptions of treaties entered into by English monarchs since William I. We are transcribing the first edition (which is extremely rare, even in academic libraries) and putting it on BHO, alongside summaries that were made in the late nineteenth century of what is in each document, its date and its page number in various editions. (See an example from a volume already published.10) In the right-hand column is the transcription of the first edition, and in the left-hand column is text from what is called the Syllabus to Rymer’s Foedera. In fact the numbers and dates given in the Syllabus are very often inaccurate. This is not being corrected in the transcription, but the correct summary is being put next to the correct document for the first time.

How would academics feel if their students cited this rare first edition of this Latin text, when they had in fact used a freely available online edition with an English summary – a summary that may not be correct? If they were consistent they should condone or even applaud the practice.

4. Strong Objections

There are two strong objections to digital citation that I would like to engage with now, and then, again, try to offer a couple of mitigating solutions at the end.

  1. Opacity.
  2. And in 100 years’ time?

Opacity came up in the interviews with historians that we carried out for the toolkit. One commented that he always made his postgraduate students change references to some web resources (not all) to, for example, National Archives document references. He said that when a website like Gale’s is being cited you cannot tell from the extremely long URL what the source is – it could be anything from Gale’s extensive collections.

A second objection is the 100 years’ time test: references to books will still be valid and usable in 100 years’ time but URLs might not be: they will break; they will become old technology, and so on.

As it happens, British History Online digitises lots of books that were published about a hundred years ago. They have citations like this:

PRO, E135/2/57, ff. 47v-8

PRO stands for Public Record Office, which no longer exists, and classmarks like E135 may also be obsolete. PRO records are now part of the National Archives collections and the classmarks in some cases have been changed to what they call ‘modern document references’. So this citation could be, in a sense, a broken link.

However, you can go to the National Archives and consult large sheets of paper which give a mapping from the old PRO reference to the TNA’s modern document reference.

In other words, this is what libraries and archives do. In 2013 legislation will extend the remit of legal deposit to cover the whole of the UK web domain, and the British Library is actively planning to extend its current web archiving project to take in far more data.

We can probably trust our major libraries to keep the digital equivalent of the big bits of paper. But by hiding away digital citations and digital research practice we are, just as with project impact metrics, diluting the case that these institutions can make about the importance of their web archiving work.
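A digital equivalent of those sheets of paper can be as simple as a concordance table from obsolete references to current ones. The sketch below is an illustration only: the function name `resolve` and the table are hypothetical, and the reference pairs are placeholders, not real Public Record Office or National Archives references.

```python
from typing import Optional

# Hypothetical concordance. These pairs are placeholders for illustration,
# not real PRO or National Archives document references.
CONCORDANCE = {
    "OLD 1/2/3": "NEW 10/20/30",
    "OLD 4/5": "NEW 40/50",
}


def resolve(old_reference: str) -> Optional[str]:
    """Return the modern reference for an obsolete one, or None if unmapped."""
    # Collapse runs of whitespace so minor spacing differences still match.
    key = " ".join(old_reference.split())
    return CONCORDANCE.get(key)


print(resolve("OLD  1/2/3"))  # NEW 10/20/30
print(resolve("OLD 9/9"))     # None
```

The same structure would serve for retired URLs: as long as someone maintains the table, an old reference remains usable.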

Further back in history, the Cotton Library used to catalogue its manuscripts by the busts of Roman Emperors that were on the shelf-ends. For example:

MS Vitellius A.XV, f.132

British History Online users regularly ask things like, “what’s this MS Nero reference?” The Cotton Library has not existed for 300 years or so, but these manuscript references are not broken links because, of course, the British Library still catalogues them under the Roman Emperors and you can call up MS Vitellius A.XV, f.132 (the only surviving manuscript of Beowulf). Links do not have to break if we put some effort into stopping them breaking.

5. The Golden Age of Print Citation?

Print citation itself should not be regarded as a scholarly promised land. Here is another example of feedback received on BHO in 2012:

I am trying to find the source of ‘Trans. Hist. Soc. (NewSer.), xiv, 241’

Nowhere in the book with this footnote does it say what this refers to. There are many historical societies that publish transactions. With some detective work we can surmise that this was probably a citation to Transactions of the Historic Society of Lancashire and Cheshire, Volume 50.

That is a somewhat dated example (the relevant volume of the Transactions was published over a century ago) so I asked my colleagues who edit the IHR’s journal Historical Research for their views on the quality of citation to print or manuscript by historians. In general they are not impressed. Here is an example from one author’s manuscript in which they refer to the same archival holding in three different formats:

2001.1048

2003/2426

2006.690.4.

Furthermore, as Anthony Grafton’s entertaining history of the footnote makes clear, print citation has always been more cultural than many like to think:

Even a brief exercise in comparison reveals a staggering range of divergent practices under history’s apparently stable surface.11

Grafton goes on to say that actually a more important function of footnotes is to provide credentials:

Like the diplomas on the dentist’s wall, footnotes prove that historians are ‘good enough’ practitioners to be consulted and recommended.12

I think this might be an unspoken reason why some historians resist digital citation. Footnotes can show that someone is a member of the guild: they have visited the archives, they have access to the rare book room, the ear of the special collections librarian. However, digital resources do not require a hierarchy of readers’ tickets; they expand the guild; they do not even require the reader to know that the source exists before it is found.

This is exactly why I think that academics who want to assert the value of expertise and experience should embrace the principle cite what you used. If everyone did this we would know who spent months in the archive or the library, consulting the primary source, and whose methodology shows that they did not. Transparency should be welcome to scholarship.

6. Recommendations

6.1. Recommendations for Writers

Writers should talk about their methodology. This is simply good practice. Reading a physical book is qualitatively different from reading an image or a transcription online. It may appear to be a conservative position to change citations to print citations, but actually it is undervaluing the book and manuscript as physical objects, and reading as a physical activity.

When citing a web resource with absurdly long links there can be a need to keep those out of print publications: they are basically useless because no one is likely to be willing or able to type them out accurately. But writers can still say that they used the digital resource; they can look for ways of conveying the information succinctly. As is happening increasingly in publishing, writers might think in terms of the online version being different, perhaps with an online appendix – even on the author’s webspace, if the publisher will not provide for it.

Recall that some resources are not cited bibliographically by convention and this is universally accepted. This paper begins with a Bible quotation; nobody would ever give page numbers for one of these. The same is true of scholarly dictionaries like the Oxford English Dictionary and encyclopedic formats like the Oxford Dictionary of National Biography – both of which, incidentally, have some very long entries (look up make in the OED or Shakespeare in the ODNB), but still page numbers and volume numbers are omitted.

If a researcher has benefited from a resource in their work then they should acknowledge it. If they describe that resource and its benefits and its limitations, then others can use it more intelligently.

6.2. Recommendations for Publishers

Publishers should say what their policy is and why. People certainly disagree with the position taken in this paper, but at least this debate should be in the open, so that others realise it is a live issue and can make up their own minds.

I looked at the websites of 12 leading history journals: Past and Present; History Workshop Journal; Gender and History; Contemporary British History; English Historical Review; Historical Research; Economic History Review; History; Social History; Twentieth Century British History; Cold War History and Northern History.

I found that none of them had a statement on their web pages – at least where I could find it – about their citation policy on digital citation versus print citation. Some of them, in their author guidelines, give guidance on how to cite books, articles, manuscripts, but not digital resources. History Workshop Journal, for example, describes how to cite a tape recording but not a website.

In fairness, I should say that the IHR’s own journal, Historical Research, does not say what its policy is either, although it does give guidelines on how to cite websites, as do some of the others on the above list.

If journals are going to change authors’ texts from digital citation to print citation, which some do as a matter of policy, then at least they should say what they are doing and explain their rationale. If we want to have a stab at real metrics for the usage of digital resources in academia then this is a piece of absolutely minimal information that we need to be able to collect before we can even start.

Of course publishers are thinking about this, and they have an economic motive for doing so that academics perhaps do not. Here is an example I came across recently:13

Pour citer cet article

Référence papier

Jean-Michel Steiner, «Constuire un bâtiment pour la Bourse du travail de Saint-Étienne: un enjeu politique et idéologique dans une grande ville ouvrière (1888-1907)», Cahiers d’histoire, Revue d’histoire critique, 116-117 | 2011, 87-100.

Référence electronique

Jean-Michel Steiner, «Constuire un bâtiment pour la Bourse du travail de Saint-Étienne: un enjeu politique et idéologique dans une grande ville ouvrière (1888-1907)», Cahiers d’histoire, Revue d’histoire critique [En ligne], 116-117 | 2011, mis en ligne le 01 juillet 2014, consulté le 27 août 2012. URL: http://chrhc.revues.org/2369

The citation makes it pretty clear that there is a moving wall with online publication, the full text being available online from July 2014. If you do cite the electronic version it has all of the information from the paper copy, and more; if the link breaks, if the website folds, you can still track down this article as easily as you can from the print citation; if these things do not happen then you can, from 2014, check the citation much more easily.

6.3. Recommendations for Resource Providers

Resource providers should use short URLs that can be printed. They should advocate digital citation. They should argue with people who tell them how great their resource is for their research but who maintain they will never cite it.

On BHO we took out the inline page numbers, because we consider this an artefact of the original book. It certainly stimulates debate with people who write to us and ask why. We do not always convince them, but at least the discussion is being held.

Resource providers should try to give transparent URLs, where the bibliographic information is humanly understandable. They should, where possible, draw on the bibliographic conventions of their subject area. This means that if, in the worst-case scenario, a project folds, the university takes down the website, nobody archives it and the internet breaks, the link is still informative enough for the post-apocalyptic researcher to find the place cited. For example (I should emphasise that this is not what is currently done on our website, but is something we would like to move over to):

www.british-history.ac.uk/VCH/Oxon15/Clanfield-religious

This is no harder to understand than a traditional citation that might look like this:

VCH Oxon, volume 15, p.139
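To make the comparison concrete, here is a minimal sketch of how the bibliographic parts could be read back out of a transparent URL of this kind. It assumes the proposed (not currently implemented) path pattern of series, then volume, then place and topic joined by a hyphen; the function name `describe_transparent_url` and the dictionary keys are hypothetical.

```python
from urllib.parse import urlsplit


def describe_transparent_url(url: str) -> dict:
    """Split a URL of the assumed form host/SERIES/VOLUME/PLACE-TOPIC into parts."""
    # urlsplit only recognises the host if the URL has a scheme or a leading "//".
    if "//" not in url:
        url = "//" + url
    parts = urlsplit(url)
    # Raises ValueError if the path does not have exactly three segments.
    series, volume, section = parts.path.strip("/").split("/")
    # The section slug is assumed to be "place-topic".
    place, _, topic = section.partition("-")
    return {
        "host": parts.netloc,
        "series": series,
        "volume": volume,
        "place": place,
        "topic": topic,
    }


info = describe_transparent_url(
    "www.british-history.ac.uk/VCH/Oxon15/Clanfield-religious"
)
print(info["series"], info["volume"], info["place"], info["topic"])
# VCH Oxon15 Clanfield religious
```

Every element a reader needs is recoverable from the string itself, without the website having to be reachable.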

In fact, of course, the traditional citation is less transparent – we cannot know which parish or topic is being discussed – and more error prone: if I make a mistake and type (assuming I am not pasting from the URL bar anyway):

www.british-history.ac.uk/VCH/Oxon15/Clanfeild-religious

Then the diligent reader can probably correct my error quite easily, but a one-character error in the traditional citation above might give:

VCH Oxon, volume 15, p.139

VCH Oxon, volume 5, p.139

VCH Oxon, volume 15, p.319

Everyone has seen these kinds of mistakes in print citations, but unless the reader is extremely eager to find what is being cited they probably abandon the search when the location specified does not contain the required information.
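The difference can be sketched in code. A descriptive URL carries enough redundancy for a misspelling to be matched against the sections that actually exist, whereas a mistyped page number is simply another valid page number. The example below uses fuzzy string matching from Python's standard library; the list of sections is hypothetical (only the first entry comes from the example above) and the function name `suggest_section` is an assumption for illustration.

```python
import difflib

# Hypothetical section slugs for one volume. Only the first comes from the
# example above; the others are invented for illustration.
KNOWN_SECTIONS = [
    "Clanfield-religious",
    "Clanfield-manors",
    "Bampton-religious",
]


def suggest_section(mistyped: str) -> list:
    """Return the closest known section slug, or an empty list if none is close."""
    # cutoff=0.8 accepts only near matches, such as two transposed letters.
    return difflib.get_close_matches(mistyped, KNOWN_SECTIONS, n=1, cutoff=0.8)


print(suggest_section("Clanfeild-religious"))  # ['Clanfield-religious']
```

No comparable check is possible for "p.319": nothing in the number itself signals that "p.139" was intended.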

To return to the Queen’s 1989 Christmas message, I mentioned that, as well as being on YouTube, the transcription of the message is available on the Royal Family’s website. Here is the full URL:

http://www.royal.gov.uk/ImagesandBroadcasts/TheQueensChristmasBroadcasts/ChristmasBroadcasts/ChristmasBroadcast1989.aspx

Even staunch republicans will have to admire the Queen’s URL structure. If the monarchy’s website disappeared, anyone reading this broken link would know exactly what is being referred to: in fact they would be fully equipped for finding the information elsewhere, probably online.

  1. http://www.youtube.com/watch?v=cWhh9Xubwe4 [4’25”-4’34”]
  2. EP Sanders, The Question of Uniqueness in the Teaching of Jesus, University of London, 1990
  3. http://www.royal.gov.uk
  4. http://www.biblicalstudies.org
  5. http://www.biblicalstudies.org.uk/pdf/uniqueness_sanders.pdf
  6. http://www.british-history.ac.uk
  7. http://digitisation.jiscinvolve.org/wp/2009/05/18/you-can-now-measure-the-impact-of-your-online-resource
  8. http://british-history.ac.uk/report.aspx?compid=22172
  9. http://www.chicagomanualofstyle.org/tools_citationguide.html
  10. http://www.british-history.ac.uk/report.aspx?compid=115132
  11. Anthony Grafton, The Footnote: A Curious History, Faber and Faber, 1997, 7.
  12. Ibid., 22.
  13. http://chrhc.revues.org/2366