Before embarking on fieldwork, it was important to gain an understanding of current research methodologies and practices in the disciplines identified by the project. This would also provide a response to our first research question:

What is the role of search within humanities research methodology, how is it used across the different subject domains, what are its current deficiencies and what impact does it have on research results?

Improved access to the test dataset of Thomason’s collection of seventeenth-century newsbooks was deemed to have a direct impact for researchers and students of early modern English politics and culture and the history of the popular press, in the disciplines of History, English Literature, English Language and Linguistics, Politics and Journalism Studies. To begin, existing literature on historical approaches for each discipline and overarching humanities research practice was reviewed.

1. Historical Approaches by Discipline

1.1. History

The subject matter of history “consists of the attempts of human beings in the past to organise life materially and conceptually, individually and collectively”.1 Historians use a narrative to examine and analyse a sequence of past events and determine patterns of cause and effect objectively. Many historians, particularly economic and social historians, use social science concepts, theories and methodologies and, more recently, there has been interest in applying cultural theory and approaches derived from literary studies to historical research, as well as digital humanities methodologies.2 As such, some see it as a bridge between the humanities and social sciences.

Methodology matters because the way we study something shapes (or may even determine) the knowledge we derive from it: from the original framing of the research hypothesis to designing a study, selecting sources and deciding how they are analysed. It is not just technique but the way in which a whole problem is seen, including the theories and concepts that are applied, what sources are used and how they are organised. Generally, a historical research methodology can be simplified into a set of steps with some overlap and back-and-forth movement between them: identifying a research topic and formulating a research problem or question, data collection or literature review, evaluating materials, data synthesis and report writing or narrative exposition.

An experimental study of historical research practice by Tourou et al. (2009)3 revealed that researchers approach the process of search differently in printed and electronic document collections. The majority preferred to search in print collections rather than digital ones. This is perhaps because previous experience with digital collections had produced poor results: irrelevant documents retrieved (low precision) and important relevant documents missed (low recall). Researchers used comparatively fewer keywords and fewer combinations of keywords for digital searches. They also often neglected to use synonyms and related concepts, limiting themselves to specific keywords. This is perhaps because they expected the system to be advanced enough to retrieve documents matching keyword combinations, synonyms and other terms linked to the ones given. In addition, even when advanced search was an option, they still confined themselves to simple search. They also often missed important documents due to incorrect or incomplete metadata. Together, these findings suggest a lack of trust in digital historical material.
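The two measures mentioned above can be made concrete with a short calculation. The following is a minimal sketch in Python, not taken from the study itself; the function name and the document identifiers are hypothetical, and it assumes that a researcher has already judged which documents are relevant.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single search.

    retrieved: document ids returned by the system
    relevant:  document ids judged relevant by a researcher (assumed known)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: share of the returned documents that are relevant.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: share of the relevant documents that were returned.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: four documents returned, five relevant, two in common.
p, r = precision_recall(["d1", "d2", "d3", "d4"],
                        ["d2", "d4", "d7", "d8", "d9"])
print(f"precision={p:.2f} recall={r:.2f}")
```

Low precision means the researcher must wade through irrelevant results; low recall means that important documents never surface at all, which is arguably the more damaging failure for archival work.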

The authors suggest steps of a mental technique, approaching a methodology for historical research, which could potentially be used to shape user requirements in the development of tools for supporting historical research. These steps include: identifying and isolating keywords in the research topic (people, places, dates etc.); focusing on each keyword looking for primary and secondary sources; separating compound terms to draw out new related concepts; searching different combinations of keywords; searching synonyms or derivatives of the keywords; enriching initial terms with related ones incrementally in order of relevance; mentally organising terms in a hierarchical taxonomy; and making connections between related terms.

In particular, the technique points to requirements such as: structuring metadata adequately to give the researcher more control over the search process and retrieve more relevant results; providing a simple search function which then offers a menu to disambiguate intentions, rather than requiring users to specify metadata fields to match keywords themselves; and automatically formulating query terms into all possible combinations. These observations are likely to be useful to other research disciplines too, particularly when undertaking a historical approach.
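To illustrate what automatically formulating query terms into combinations might look like, the sketch below builds every combination of up to two keywords and widens each keyword with its synonyms. It is only an illustration under stated assumptions: the function name, the hand-made synonym table and the generic AND/OR query notation are all hypothetical and do not describe any particular search system or the tool proposed by Tourou et al.

```python
from itertools import combinations


def expand_queries(keywords, synonyms=None, max_terms=2):
    """Build keyword combinations, widening each keyword with its synonyms.

    keywords: terms isolated from the research topic
    synonyms: hypothetical hand-made mapping of keyword -> alternative terms
    """
    synonyms = synonyms or {}
    queries = []
    for size in range(1, max_terms + 1):
        for combo in combinations(keywords, size):
            groups = []
            for kw in combo:
                # A keyword with synonyms becomes an OR-group.
                alts = [kw] + list(synonyms.get(kw, []))
                if len(alts) > 1:
                    groups.append("(" + " OR ".join(alts) + ")")
                else:
                    groups.append(kw)
            queries.append(" AND ".join(groups))
    return queries


# Invented example topic; the synonym list is illustrative only.
for q in expand_queries(["Cromwell", "army", "1649"],
                        {"army": ["forces", "soldiers"]}):
    print(q)
```

Generating the combinations on the researcher's behalf addresses the observation that users of digital collections tried fewer keyword combinations and rarely added synonyms themselves.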

1.2. English Literature

The study of English literature is concerned with the production, reception and interpretation of written texts, both literary and non-literary, and involves engaging in dialogue with past and present cultures and values. The subject encourages inter-, multidisciplinary and mixed methods approaches.4 For the purposes of this project though, and considering the test dataset, we will address its research practices and methodologies from the perspective of historicist approaches to literature. These approaches see a work of literature as both a reflection and a product of the times and circumstances in which it was written, working on the basic premise that the history of a nation has significant effects on its literature and that a piece can be better understood and appreciated by understanding the social, cultural and intellectual context surrounding its creation. As a result, approaches may be deemed very similar to those of history itself.

At the same time, no ‘history’ can be truly objective or comprehensive because it is constantly written and rewritten. However, studying literature within the context of both the history of the author and the history of the critic can illuminate the biases present in a critic’s response to a work, enabling a better understanding of the culture, context and themselves. Newer varieties of such historicism relate the idea of literature as a cultural text to other key concepts of text, reader and history. Historically-informed, ideologically sensitive literary critics are keen to investigate the dynamics of power in literary texts. Studying newsbooks, traditionally fenced off as a historical primary source, as historicist literary critics enables us to attend to their role in the development of seventeenth-century prose style (see Joad Raymond, The Invention of the Newspaper (1996) and Lennard Davis, Factual Fictions (1997) on the connections between journalistic reportage in the 17th Century and the rise of the novel in the 18th Century).

Researchers might approach digital historical resources with some particular research aims such as: how to do quantitative analysis of style when there is no standard system of spelling/orthography? How do different editorial styles develop over time (within a single newsbook and across multiple newsbooks)? How did the business of seriality impact on journalistic prose style? And how did the language of documentary reportage differ from editorial prose style(s)? Aspects of functionality that might support these types of questions could be ways of automatically searching for variant spellings and the ability to compare and link sources.

In terms of research methods and methodologies, English Literature is a discipline with prevailing notions of “radical individualism in research rather than research collaboration”.5 It has historically placed a great deal of emphasis on learning by doing rather than providing and discussing research training. Until surprisingly recently, PhD students were not required to include a methodology section in their theses, something which is commonplace in most other disciplines. However, research council reviews have led to the provision of research methods training and a greater awareness of these issues.6

1.3. English Language and Linguistics

As the term indicates, ‘English language and linguistics’ actually refers to two separate ways of studying language. English language scholars study current and past English varieties from across the globe and can be said to be descriptive linguists. Linguistics, on the other hand, in the traditional sense of the word is a theoretical discipline concerned with the scientific study of language. The research practices of theoretical linguists differ greatly from those of descriptive linguists because their aim is to model the structures of sounds, words, sentences and meaning (phonology, morphology, syntax and semantics) on an abstract level, rather than describe languages as they are spoken in real life. With that in mind, and because the purpose of this section is to gain an understanding of the methodologies of scholars likely to be impacted by the improved access to Thomason’s collection, the following will focus on the research practices of linguists who rely on empirical data in their work.

A review of the various subfields of linguistics, including sociolinguistics, pragmatics, discourse analysis and historical linguistics, reveals that even within those more specialised areas, research practices vary significantly. If we turn our attention to those studying the language of the past, who are most likely to benefit from digital historical resources like Thomason’s newsbooks, their research aims will be either to test existing hypotheses or to use representative corpora to generate new insights into language.7 If testing a hypothesis, they will be searching for elements of words (e.g. suffixes indicating past tense of a verb), whole words and their associated spelling variants, or phrases. If trying to generate new insights, they will be more inclined to browse to gain an overview of new phenomena and, importantly, the context in which they occur. This means that a digital resource suitable for linguistic research must be able to accommodate these approaches. Traditionally, linguists have relied on corpora of spoken or written language, compiled especially for linguistic research. Whilst simply meaning ‘body’, the term ‘corpus’ in corpus linguistics implies certain notions of sampling and representativeness, finite sizes, machine-readable forms and standard references of the language variety they represent.8 The texts are also marked-up by structure, part-of-speech, grammar and/or semantics to make them more suitable for linguistic research9 and users can search for sequences, frequencies and concordances – functions not typically seen in search interfaces in other disciplines.
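For readers unfamiliar with corpus tools, the sketch below shows in Python what frequency counts and a concordance (keyword in context) amount to. It is a deliberately naive illustration: the tokeniser ignores early modern characters such as the long s, the function names are our own, and the sample sentence is invented rather than quoted from the newsbooks.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a deliberately naive tokeniser."""
    return re.findall(r"[a-z]+", text.lower())


def concordance(tokens, target, width=3):
    """Return keyword-in-context lines: `width` tokens either side of each hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


# Invented sample text, not a quotation from the newsbooks.
sample = "The army marched north. The king heard that the army was near."
tokens = tokenize(sample)

print(Counter(tokens).most_common(2))   # most frequent word forms
for line in concordance(tokens, "army"):
    print(line)
```

Dedicated corpus software adds mark-up awareness and sorting of the left and right context, but the underlying operations are no more than this.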

Based on the above, it is clear that linguists’ research practices vary depending on their field of interest, their research question and the collection they are sampling. Those studying historical language change often turn to non-linguistic corpora like Thomason’s newsbooks because examples of language use in the past are few and far between, so they have to take what they can find. According to Allan and Robinson (2012), for example, EEBO and ECCO are excellent sources of primary data for research into semantic change,10 while Fitzmaurice and Taavitsainen (2008) acknowledge that using historical texts will naturally have its difficulties and that it is “natural to encounter obscurity, vagueness, and ambiguity of language use [because we have] no direct access to the speakers and original contexts of production”.11

Compiling linguistic corpora is expensive and labour intensive, so linguists are increasingly turning to resources outside their discipline. Often these will be compilations of historical texts; as a result linguists are adapting their methods to suit those collections. It is doubtful that a uniform research practice will emerge in the future, simply because the research questions and required data are so wide-ranging. For now, it appears that most linguists are quite happy to seek out new resources, even if they lack some of the features useful to them, and instead adapt their methods to suit the resource. However, this does not mean that developers of resources should not try to at least include some functionality helpful to linguists, such as accommodating variant spellings, enabling wildcard searching and displaying search results in context. These are all features that suit a multiplicity of research practices in linguistics and language study.
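A rough sense of what wildcard searching and variant-spelling matching involve can be given in a few lines of Python. This is a sketch under loud assumptions: the normalisation rules (u/v and i/j interchange, doubled letters, optional final -e) are illustrative only and are not a validated scheme for early modern English, and the function names and word list are hypothetical.

```python
import fnmatch
import re


def wildcard_match(pattern, vocabulary):
    """Return vocabulary words matching a shell-style pattern such as 'parl*ment'."""
    return [w for w in vocabulary
            if fnmatch.fnmatchcase(w.lower(), pattern.lower())]


def normalise(word):
    """Collapse a few spelling alternations (illustrative rules only)."""
    w = word.lower()
    w = w.replace("v", "u").replace("j", "i")  # treat u/v and i/j as the same letter
    w = re.sub(r"(.)\1", r"\1", w)             # doubled letters: 'warre' -> 'ware'
    w = re.sub(r"e$", "", w)                   # optional final -e: 'ware' -> 'war'
    return w


def variant_match(query, vocabulary):
    """Return vocabulary words whose normalised form equals that of the query."""
    key = normalise(query)
    return [w for w in vocabulary if normalise(w) == key]


# Hypothetical word list standing in for a resource's index of word forms.
vocab = ["warre", "warr", "war", "parliament", "parlament", "loue", "love"]
print(wildcard_match("parl*ment", vocab))
print(variant_match("war", vocab))
print(variant_match("love", vocab))
```

Rules this crude will also conflate unrelated words (for instance, any word differing only by a final -e), so a production resource would need curated variant lists or a tested normalisation tool rather than this shortcut.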

1.4. Politics and Political History

Politics as a discipline aims to develop a knowledge and understanding of government and society. The interaction of people, ideas and institutions provides a focus for understanding how values are distributed and resources allocated at many levels, from local to sectoral, national, regional and global. Such analyses relate in turn to questions of power, justice, order, conflict, legitimacy, accountability, obligation, sovereignty, governance and decision-making. As a research area, it encompasses philosophical, theoretical, institutional and issue-based concerns relating to governance.12

The study of politics involves describing political phenomena, explained using general theories, patterns or generalisations. Normative political theory or political philosophy relates to the normative study of the political values of society and the international order, examined historically and analytically. Positive political theory or explanatory political theory relates to the study of the general mechanisms and forces steering behavioural interactions of individuals and institutions at domestic, regional and global levels to assign values and resources. Political science or political analysis takes these theoretical perspectives to inform and evaluate historical events, political behaviour, the mechanisms of political institutions and actors, political processes and the policy outputs of governance and regulatory structures.13

Researchers in the field of politics use a range of strategies and methods to suit a variety of purposes, including textual analysis, historical research, use of contemporary media, discourse analysis, structured, semi-structured or unstructured interviews, focus groups, surveys, statistical and deductive modelling, and computer simulation. Politics, like many other fields, draws upon the knowledge bases of related disciplines. Its study is implicitly comparative yet characterised by explicitly comparative investigations across time and space. Explaining what caused an event or how an institution works involves questioning what might have happened under different circumstances. Research methods and methodologies in politics include using information retrieval techniques, quantitative and qualitative methods, research design, and information technology.14

In terms of the test dataset, however, we are perhaps talking specifically about political history, in which case the methods and methodologies described in the ‘History’ section above apply. Political history studies the organisation and operation of power in society through the narrative and analysis of political events, ideas, movements, organs of government, voters, parties and leaders. It also often involves the deconstruction of myths and received wisdom. An important aspect of political history is the study of ideological differences and their implications for historical change. Studies of political history typically centre around a single nation and its political change and development. Political historians could find a lot of valuable information that is far more accessible to them in a digitised and rekeyed newsbook resource, particularly for this period of great political unrest. The ability to search for specific individuals and actors in the political arena and for significant dates and events could be used to gather a knowledge base from which to ask new questions about the portrayal and representation of such figures in early forms of journalism and the influences exerted over and by newsbook editors and publishers in the political arena.

1.5. Journalism Studies

As a discrete academic area, journalism studies is a relatively young field compared to History or English, but it has quickly become an academic field with its own distinct identity and research foci. Located between the humanities and social sciences, Journalism Studies provides an intellectual framework to the study of journalism and focuses, broadly, on the profession, practices, products, political economy, and history of journalism. It is inherently multidisciplinary and as a discrete field, traces its history to cultural studies, political studies, applied linguistics, sociological studies, and more. It has developed in part by drawing on research approaches from such related areas. While it can be difficult to pin down, due to its many strands and interests, the validity of these different approaches is well-suited for journalism studies, “given that the very nature of journalism means that it is doing a complex variety of things, sometimes simultaneously and often changing in particular geographical and historical contexts”.15 Within Journalism Studies, research in the field of journalism history has its own manifestos and motivations and, once again, it is these we will focus on here since they are most pertinent to this project and the test dataset. In research terms, journalism history was for too long simply naïve history, “restricted to narrow vehicles of political communication ignoring broader cultural engagement”.16 Journalism history has developed, however, to challenge the field of journalism studies to think more deeply, offering “an antidote to the ‘presentism’” of journalism17 and reinforcing academic discussions of the cultural, political, and linguistic development of news and journalism throughout history.

Journalism Studies researchers may be interested in any number of aspects of the newsbooks as they represent the birth of regular or serialised English journalism during the Civil War and are therefore the starting point for the evolution of journalism and its influence on political, economic and social life into modern times. A detailed analysis of the language and structure of newsbooks by Frank (1961), with its precise and culturally attuned agenda, led the way for historians to take an interest in journalism as something more than background for their work.18 The newsbooks in digital form may be studied in a similar manner, as they will help to locate the early stages of journalism’s contribution to a long process of political unrest and enlightenment, and to analyse the language and structure of reporting in the Early Modern period, alongside issues of censorship, authorship and publication. Tracking versions of reported events and comparing these across titles and issues offers an opportunity to explore the role of early journalism in providing information on politics and news; this could be aided by providing effective ways of browsing in and around newsbook titles and issues to compare how events were covered. The ability to search for specific terms and word choices will further allow scholars to revisit journalism’s history to explore this period more fully, an aspect of research that can contextualise more modern discussions of journalism and its societal role. In their digital form, the newsbooks also allow academics to reassess the role of the proprietor, to juxtapose content between and across titles in a more accessible manner than traditional archives have allowed, and to address early shifts in the way news was being delivered.19

2. Research Culture and Methodology

Search must be situated in the overarching culture and research methodology of each discipline. Exploring the research cultures in question has revealed some useful insights that have undoubtedly shaped the design process. In order to improve search, it is important to understand how an interface can respond to research practices and values.

We can already make general observations about each research area. For example, it is probably fair to say that historical research culture is still very archive-based because of the sources in question, although, with the increase in digitisation projects over the years, that is starting to change. English literature, language and linguistics have a varied research culture because of the variety of different areas covered, ranging from social history and use of language to deep reading for stylistics. Political research culture incorporates historical and contemporary political and philosophical theory, as well as social science and economic aspects, such as document analysis, census polling, interviews and surveys. Politics departments and research centres are often organised into research groups, for example: comparative politics, governance and public policy, international relations, political economy and political theory. Journalism is similar but for different reasons: because of its relative youth as an academic subject, it takes a collaborative and multi-disciplinary approach. It is informed by a variety of academic disciplines including politics, sociology, cultural studies and history.

It is perhaps also the case that patterns of research are directed in some sense by funding bodies and the stipulations and trends they set. By making these ‘taken for granted’ assumptions visible we have been able to assess whether they align in practice.

2.1. Refinding and Change Blindness

A technical study by Teevan (2008) looks at how people recall, recognise and reuse search results.20 When someone issues a query, they expect certain things from the search results returned. These might be based on the current information needed, but could also be influenced by how a search engine is perceived to function, the expected ranking of results and any previous searches conducted on the subject area. Information previously accessed can move, change or disappear completely without the user’s knowledge and be difficult to ‘refind’. Previous interactions with search result lists are important for an understanding of future result lists and as such consistency is key. Dynamic menus have been shown to hinder users because items no longer appear where anticipated. Selberg and Etzioni (in Teevan, 2008) noted that:

“Unstable search engine results are counter-intuitive for the average user, leading to potential confusion and frustration when trying to reproduce the results of previous searches.”

Teevan (2008) presents a method for creating consistent result lists for previously issued queries, so that they appear the same despite potentially including new and perhaps better results. It is modelled on the concept of ‘change blindness’ (the limits of human memory capacity and attention allow obvious changes to pass unnoticed). How likely a search result is to be remembered depends on its ranking in a result list and on whether it had been clicked on, as clicked-on results were much more likely to be subsequently recalled. The recency effect may make the last result clicked particularly memorable because it was the last one seen or because it was what the searcher was looking for. A recognition study was conducted into different types of merge: clicked merge (results first clicked ranked highest in subsequent searches), intelligent merge (old and new results merged), original (no merging) and new (entirely new results).

Static result lists work well in terms of refinding; however, they do not assist new information discovery. On the other hand, while a completely new result list supports finding new information, it hinders refinding. Returning results that appear static but contain new information appears to perform almost as well in both cases. Therefore, intelligent merge was found to be the best method to facilitate both finding and refinding due to the preservation of memorable aspects of an original result list (i.e. clicked on results), whilst at the same time including new results in the place of previous results that have been forgotten. Meeting the expectations of users effectively, based, for example, on their previous interactions with a search engine, is fundamental to supporting finding and refinding behaviour successfully.
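The general idea of merging old and new result lists can be sketched as follows. This is not Teevan's published algorithm, which weighs how memorable each result is likely to be; it is a much-simplified illustration that assumes a click is the only sign of memorability, pins clicked results at the rank where they were last seen, and fills every other slot from the new ranking. All names and result identifiers are hypothetical.

```python
def merge_results(old_list, clicked, new_list, size=None):
    """Keep previously clicked results in their old positions and fill the
    remaining slots with the best results from the new ranking.

    old_list: result ids shown for the earlier query, in rank order
    clicked:  ids the user clicked (treated here as the memorable ones)
    new_list: result ids from the current ranking, in rank order
    """
    size = size or len(old_list)
    merged = [None] * size

    # 1. Pin memorable results where the user last saw them.
    for pos, doc in enumerate(old_list[:size]):
        if doc in clicked:
            merged[pos] = doc

    # 2. Fill the forgotten slots with new results, best first,
    #    skipping anything that is already pinned.
    pinned = {doc for doc in merged if doc is not None}
    fresh = (doc for doc in new_list if doc not in pinned)
    for pos in range(size):
        if merged[pos] is None:
            merged[pos] = next(fresh, None)

    # Drop empty slots if the new ranking ran out of results.
    return [doc for doc in merged if doc is not None]


# Hypothetical example: the user clicked "b" and "d" last time.
print(merge_results(["a", "b", "c", "d", "e"], {"b", "d"},
                    ["x", "b", "y", "a", "z", "d"]))
```

The clicked results stay where the user expects to find them, while the unclicked positions quietly take on newer material, which is the behaviour that the change blindness argument relies on.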

2.2. Search Strategies

The immediacy of access provided by the web has provoked a change in search strategies. The traditional ‘just-in-case’ method would be to accumulate stores of information from visits to libraries and archives, kept on hand in case required. Online access means that information need not be stored locally and can be retrieved as and when to meet the needs of a specific enquiry. However, skill and technique are important for this.21 Some of the initial questions regarding search strategies we set out to address in the landscape survey, alongside the first project research question (see ‘About the Project’), were as follows:

  • What do academics do in terms of preparation for conducting search? How do they come up with their search terms? Is it a structured/intuitive process or ad hoc? Do they find their way through organically, working on triggers, or is it more mechanistic, having a structure in mind before they start, or just entirely random?
  • What level of trust do researchers have in search results and how do they validate findings?
  • Is there a shift or negotiation between on and offline practice?
  • Is it important for those conducting search to have an awareness of how search technically works?

Before the survey was distributed we made some general speculations about how PhD students might approach search as opposed to established academics. For example, we expected to find that PhD students take a more mechanistic or methodical approach, as they are generally at the start of their research careers, taking courses about how to conduct research, developing their own practices and finding out what works best for their own research. They are also starting to discover resources and places to find information and are perhaps more used to online resources, having been brought up with technology as a natural medium.22

On the other hand, we speculated that Professors might prove to have deeper background knowledge, and intuition for resources and where to find what is needed, as well as a well-developed research methodology established over many years of research. Depending on what academic activities they are involved in they may be less computer literate and uncomfortable using the web for research purposes for reasons of trust and validity.

2.3. Humanities Research Practice in Current Literature

There have been a number of recent studies conducted around research practices in the humanities. The two of most significance to this project are a study conducted by the Research Information Network (RIN)23 and the two-part Log Analysis of Internet Resources in the Arts and Humanities (LAIRAH) study.24

The RIN study looked at information practices in the humanities to try to understand how researchers from a range of disciplines find and use information, and how processes and methods have adapted to the introduction of new technologies. Their most significant findings were threefold: firstly, in terms of the research process, scholars seem to work in bursts because of time demands and therefore are beginning to favour digital resources because they are easier to access. Secondly, searching for new information was often done by using Google as a starting point before moving on to other resources online or in print. Digital resources are used mainly for speed and convenience but also for evaluating the relevance of a text before finding it in print. Thirdly, researchers generally use what works and mix resources that complement their research.

Some barriers to the use of digital resources included a lack of awareness of tools, a lack of standardisation of online databases and archives, a lack of institutional training and support, irregular use of resources, and difficulties in data linking between archives. The report transforms these findings into policy recommendations for funders, libraries and publishers. The specific aim of the RIN study was to identify ways of providing the right kind of support for researchers, whereas this research focused on developing user involvement in resource production, which addresses some of these issues from a different perspective.

The LAIRAH project was funded by the AHRC’s ICT strategy projects scheme. It used quantitative and qualitative methods to establish an understanding of how users approach digital resources in the arts and humanities sector and directly identify factors “that may predispose a digital resource to become used or neglected in the long-term”.

This research identified four main issues. The first is the naming and description of a resource. The title should clearly indicate what the content is so that, amongst other things, it can be easily discovered through keyword searching. There should also be information about the extent of the resource, where the original data comes from and the methodology for selection and digitisation because the clues we are used to in the physical world are lacking. Secondly, different ways of engagement were identified as being of importance to researchers. The study found that users were very much aware that one of the advantages of digital resources is their ability to enable users to manipulate data in a variety of ways. Thirdly, ease of access is vital because the more hindrances a user comes across the more likely they are to give up on a resource. However, the study also found, and our findings went on to support this, that if a certain resource is vital to a scholar’s work they will persist and struggle with difficult interfaces and barriers to access to get at what they need, developing “work-arounds”. Fourthly, in terms of interface, researchers expect a professional and user-friendly front end, but interface design for digital humanities material seems to be managed in a random fashion, with some projects benefitting from a good designer while others prioritise the back-end materials.

The second part of the LAIRAH study25 gives details of surveys of digital content providers and provides policy recommendations for the projects themselves and for funding bodies to impose as compulsory project deliverables, such as public access documentation. The report recommends making researchers or “users” the focus of digital resource development in the future by consulting them as soon as possible and maintaining contact throughout a project, as well as carrying out user surveys and interface tests and incorporating this feedback into the design. One policy recommendation states that “information should be disseminated widely about the reasons for user testing and its benefits”, since the authors discovered that very few of the projects surveyed maintained contact with users or carried out any organised testing, meaning that they had little idea how popular the resource was or how it was being used. They concluded that, had users been consulted, the design process would have been much simpler.

These studies identified a significant problem with current resource development processes, showing a real need to define much more specifically how digital humanities resource development projects might go about maximising user involvement in the design process for the benefit of their future sustainability. The next section explores our methodology for addressing these issues.