Following the survey and focus groups, a series of interviews was conducted with eight content providers, including three national libraries/archives, a digital historical library, an online genealogy resource, a major search engine, a large-scale content provider and two strands of a national funding programme which champions the use of digital technologies in UK education and research. In this section, we investigate the different processes and practices of these content providers: in particular, how they are governed by funding structures and the information market, and how these concerns impact on selection and digitisation processes and on the design and integration of search. This also goes some way towards answering our second research question: how is the design of search conditioned by technical considerations or the viewpoint of content creators? Most significant for this project, however, is what is being done in terms of user needs, research, testing and feedback.
We acknowledge that certain methods may have evolved since the interviews were conducted in late 2012, in response to new demands and changing practices in the digital humanities forum.
1. Competition
The libraries and archives did not express concern about competitors, since much of their digitised material is drawn from unique holdings; in fact, they identified instances of collaboration and sharing of best practice between similar institutions. The search engine set out its policy of dealing with competition as ‘launch and iterate’: build the best product they can and constantly update it. As a commercial enterprise, the online genealogy resource explained its preference for targeting material to which it could claim exclusivity, to maintain a competitive advantage in the family history marketplace. The team also keeps a close eye on technical and product developments, particularly those oriented towards social media, and makes a concerted effort to provide much more background information on their datasets than any competitor, most of which is primarily of academic interest.
2. Business Models, Funding Structures and Selection Processes
Business models varied considerably depending on the thrust of the enterprise, some being commercially driven and others publicly funded. These models influence every aspect of their work, including how funding is deployed and how material is selected for digitisation. For the libraries and archives, the selection of material to be digitised is led by where the funding originates. In some cases it could be considered user driven, in the sense that potential users approach them asking for materials they wish to use to be digitised. For example, the digital historical library was funded by The History of Parliament to publish the House of Lords Journals. On the other hand, there is generally a fund-raising team in place to actively search out money for digitisation, but the source of this funding will more than likely exert an influence on the selection process. More often than not, it is essential to demonstrate a need for, and a value in, digitising the proposed material. In addition, strategic priorities are often highlighted by surveys of higher education institutions, one recent example being a desire for more investment in digitising newspapers. There tends to be involvement of a panel or advisory board in determining what material is selected.
The online genealogy site does not seek any external funding and explains that the selection of material is based on their own understanding of what is of value to the target users as well as their expectation of how they might value it. They try to ensure that their collections are as complete as possible and exclusive but would not refuse non-exclusive material if it is deemed to be of real value and relatively easily accessible.
The large-scale content provider now has a platform to which new digitised material is simply added. However, there are also many individual content packages that were created ten to fifteen years ago, in what was, relatively speaking, the digital “stone age”. The provider is aware that researchers are not happy with some of these services but, while they may have every intention of updating them one day, as a multi-national company the UK branch is not necessarily in control of what subscription fees and other funding sources are put towards. Generally, funding goes towards new digitisation projects that follow trends in popular demand, or towards specific material requests confirmed through market research.
From the other side, the funder explained that while its main aim is to specify standards and advocate best practice, it is very much immersed in this process where other funders are not: it creates and designs programmes which build on previous initiatives, capturing learning and passing it on so that it can be taken forward beyond the funder's involvement. This level of involvement could make it more attractive as a funding partner in comparison to other similar funders.
3. Digitisation Processes, Search Design and Integration
Digitisation methods, and the design and integration of search within online resources are also heavily influenced by funding sources, selection decisions, consultation with users and the individual approaches institutions have chosen to take. Developmental flexibility is largely dependent on whether digitisation and/or web development are taken care of in house or outsourced.
The digital historical library took the decision to present the digital version as an edition in its own right, a representation of the material in digital form, rather than a finding aid used to then access a book. Their search engine is powered by Google, so they have little control or influence over how it actually works; however, all material is completely rekeyed in XML and produced with high-quality metadata attached, so it should in theory be easy to locate.
The national libraries have varying digitisation capabilities and in some cases will outsource mass digitisation projects. OCR is applied to printed text but quality varies, as this too is often outsourced, and with a finite amount of money a balance must be struck between digitising as much content as possible and cleaning up the resulting data. External funding often dictates that content goes to a third-party site, so the library has little control over what it will look like. They will stress attribution, but unless their involvement is requested they have to be flexible depending on the nature of the project, the funders and other partners who might have primary responsibility for the design of the interface. The number of different search facilities and interfaces accessible through the main library site can create issues for users, which is why attempts are currently being made to standardise and consolidate existing interfaces; however, it will be a long and challenging process. At the moment, users need a good understanding of how the library is organised and must spend time on the website to recognise what is located within different catalogues. Many institutions have a web services team, or something similar, with a sense of what constitutes good design and how to accentuate key features, and often an external design agency will be involved to mock up wireframes to be tested with users.
The search engine product teams are composed of a group of software engineers working with a product manager who coordinates the work of the engineers and pulls their code together into a working product. Search design ideas can come from within the team, from employees outside the team or from regular users. They also draw a great deal on analysis of aggregated log data; for example, if the top result for a given type of query is clicked on less than the second result, this indicates that the results are not being presented in the right order.
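The kind of log analysis described above can be sketched as follows. This is a minimal illustration, not the search engine's actual pipeline; the log format, field names and query types are hypothetical.

```python
from collections import defaultdict

def click_rates_by_position(click_log):
    """Count clicks per result position for each query type.

    `click_log` is a hypothetical list of (query_type, clicked_position)
    tuples drawn from anonymised, aggregated search logs.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for query_type, position in click_log:
        counts[query_type][position] += 1
    return counts

def misordered_queries(click_log):
    """Flag query types where the second result attracts more clicks than
    the first, suggesting the ranking may be wrong for that query type."""
    counts = click_rates_by_position(click_log)
    return [q for q, by_pos in counts.items() if by_pos[2] > by_pos[1]]

# Illustrative data: for "journals" queries the second result is preferred.
log = [
    ("maps", 1), ("maps", 1), ("maps", 2),
    ("journals", 2), ("journals", 2), ("journals", 1),
]
print(misordered_queries(log))  # → ['journals']
```

Real systems aggregate over far larger samples and control for position bias, but the underlying signal, comparing click rates across result positions, is the same.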
The online genealogy site manages all of its web development and design, data manipulation and software development in house. The nature of the data they deal with means they have had to develop complex, dynamic search facilities and interfaces that are common to multiple datasets and can deal with both structured and unstructured records. The underlying system software must therefore be flexible and powerful enough to incorporate different types and formats of data relatively easily.
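One common way to let a single search facility span structured and unstructured records is to normalise everything onto a shared schema at index time. The sketch below illustrates that idea only; the field names and record shapes are invented for the example and are not the genealogy site's actual data model.

```python
def normalise(record):
    """Map heterogeneous source records onto a common search schema.

    Hypothetical fields: a structured record (e.g. a census entry) carries
    `forename`/`surname`; an unstructured one (e.g. a newspaper notice)
    carries free `text`.
    """
    if "surname" in record:  # structured record
        return {"name": f"{record['forename']} {record['surname']}",
                "year": record.get("year"),
                "text": ""}
    # unstructured record: keep the full text searchable
    return {"name": record.get("name", ""),
            "year": record.get("year"),
            "text": record.get("text", "")}

def search(term, records):
    """Match a query against the common schema, regardless of source format."""
    term = term.lower()
    return [r for r in map(normalise, records)
            if term in r["name"].lower() or term in r["text"].lower()]

datasets = [
    {"forename": "Ada", "surname": "Lovelace", "year": 1851},              # structured
    {"text": "Marriage notice: Mr Byron to Miss Milbanke", "year": 1815},  # unstructured
]
print([r["year"] for r in search("lovelace", datasets)])  # → [1851]
```

The design pay-off is the one the interviewee describes: adding a new dataset only requires a new mapping into the shared schema, not a new search interface.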
4. Accessibility, Discoverability and Impact
The content providers were aware of different user groups accessing their resources and of the need to provide different methods of achieving this. The digital historical library stated that the austerity of the resource limits how user-friendly it can really be for non-expert users; it is too advanced even for undergraduates. To tackle this, they are gradually adding helpful contextual information without diluting the academic content of the site. They have also updated the front page with browsable category lists, which are more appealing to the casual user and make it easier to locate the most popular types of material, such as maps.
The libraries and archives have tended to split their users into three categories (for example, explorer, rambler and tracker) in order to address their differing access requirements. Most have attempted to give direct access to expert users who know what they want, while for less expert users the front page offers menus to enable browsing of collection lists categorised by subject or historical period, and highlights particularly well-known collections or topical items reflecting current exhibitions and displays or things which have recently appeared in the media.
Discoverability is important for the uptake and usefulness of digitised resources. Cleaning up catalogue records and applying OCR are considered particularly key, as they enhance searchability and, with it, the value digitisation offers over simply looking through mounds of analogue material. Transcriptions or translations would also be desirable to many academics and casual users alike; however, the associated costs make providing these unfeasible, particularly for large-scale projects. The online genealogy site acknowledges that some source documents are relatively inaccessible to users because of the difficulty of reading old handwriting or texts in other languages, but it has not yet undertaken any transcription or abstraction work itself, currently relying on those created previously. However, with the increasing availability of digitised images they are beginning to consider whether and how they might address this. The funder is concerned with how to make material discoverable, but also with how to promote it and make it sustainable, and makes suggestions and recommendations to achieve all of these things based on learning from previous programmes.
Impact has become one of the key buzzwords in Digital Humanities and one of the main measures of the success of digital resources. Many libraries and archives have taken to using TIDSR (Toolkit for the Impact of Digitised Scholarly Resources) to assess newly launched resources, as well as running website analytics and studying user statistics. However, some content providers are coming to recognise that assessing impact post-launch limits what can be done about the results, with some looking to turn the process around by trying to anticipate impact pre-launch, in order to address any issues that might compromise it.
5. User Research, Testing and Feedback
The content providers interviewed had generally adopted quite similar approaches to user testing, and some quite frankly admitted that they did not really ask anybody before launching new digital resources. Websites are built by a development team led by a small group of people, some of whom may have a vested interest in how the resource is built and what apparatus it offers for their own research purposes or otherwise.
The digital historical library’s resource has been developed and evolved during several consultation meetings with academics and students since its inception over a decade ago. It has proven to be too advanced for undergraduates, and since their premium content section is sold to universities, they have had to try to make it more accessible. They were funded to use TIDSR to assess their outputs, for which surveys, focus groups, interviews and testing were conducted. Feedback was gathered from each of these strands and changes were made and retested. One of the issues is that the resource is fairly fixed at this point with new material simply fed into the existing structure. Some work has been done in terms of improving accessibility but a complete redesign with greater user involvement would be costly and most of the work they have done has been more about improving how the collections are organised. They have also implemented a simple usability survey to gather monthly feedback from a broader user community.
One of the national libraries talked about user research and testing in terms of a newspaper digitisation project it had recently undertaken. The library formed a panel of academics to advise on the project, since the material is being delivered for them; this panel was based on existing contacts identified by curatorial staff. Initial designs were taken to an agency, which would bring in users to work with the designs, observed by the library development team. Feedback received from the wider user community through an open form on the website is sorted into issues with the interface and problems with core features of the software. In the latter case, they might then return to the external software developer with changes or enhancements, because the developer is also interested in providing a more bespoke service to its clients.
The other national library frankly admitted that a great deal of money was being spent on digitisation and on developing extensive metadata, which was then inserted into websites with flawed interfaces. Impact studies show that users avoid such resources simply because they do not like using them, but once things are launched it is unlikely that any fundamental changes can be implemented. The library recognises the need to develop a better understanding of its users and has been addressing this in more recent projects by trying to factor in usability from the beginning, with the input of an academic steering group and user/developer workshops to engage in some iterative design work.
The national archive has a user experience team who attempt to communicate with users as early as possible and to define how things will be presented to the end user. They have thought a great deal about accommodating their three identified user personas, explorer, rambler and tracker, from historians and archivists to amateur family historians and a more general public. A working prototype is built and volunteers are found to play around with it and undertake some set tasks. Feedback is then gathered about what they liked and did not like and what they would like to see in terms of the interface. Feedback is also collected through the main site and through a blog, then collated and discussed in weekly planning meetings.
The search engine’s user base is the population at large so user research must necessarily draw on the knowledge and experience of a wide variety of communities. To this end, they have a range of user experience research teams based worldwide with particular areas of focus and expertise, most of whom are specialists in human-computer interaction. They use direct observation and questioning and a number of offsite study methods to learn about how services are being used and what could be improved. Feedback can inspire new features or will be implemented into longer-term decisions about product direction.
The online genealogy site admits that while they have done user research in the past via questionnaires sent to registered users, for some time they have relied on the sense that they know what is needed and what the competition is doing. However, the relative popularity of different content types is monitored by examining dataset usage statistics.
6. Constraints, New Demands and Changing Practices
A number of constraints were identified, mainly relating to issues of control over selection, digitisation and interface design decisions. For example, if the software driving a resource is not developed in house (say, a library management system), it will usually have its own defined development cycle, which can be difficult to work around. The cost of digitisation can be an inhibitor, and trying to offer as much content as possible can mean that the provision of scholarly apparatus suffers as a result. Additionally, a project may have little say in such matters if a partner which is the vehicle for dissemination has a vested interest in ensuring that its interface and so on is fit for purpose. The provider may be able to suggest or develop some aspects of functionality, but these could well be at the mercy of the partner’s development cycle. One major constraint can be the provision of quality OCR. Poor OCR is a problem that experienced researchers are well aware of; they know it can prevent them from finding results that they know are there, so for resources to be taken up in the long term it is important that the OCR meets a high standard.
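The way poor OCR hides known material can be illustrated concretely. In the sketch below the garbled page text and the search functions are invented for the example; the fuzzy matcher is a crude stand-in for the error-tolerant matching some search systems offer, not any provider's actual implementation.

```python
import difflib

# Hypothetical OCR output: the long s and broken letters corrupt the text,
# so an exact substring search for "Parliament" finds nothing.
ocr_page = "the Houfe of Parliarnent affembled on Tuefday"

def exact_search(term, text):
    """Plain substring search: fails whenever OCR has garbled the word."""
    return term.lower() in text.lower()

def fuzzy_search(term, text, threshold=0.8):
    """Compare the query against each word using a similarity ratio,
    so near-misses produced by OCR errors can still be found."""
    term = term.lower()
    return any(
        difflib.SequenceMatcher(None, term, word).ratio() >= threshold
        for word in text.lower().split()
    )

print(exact_search("Parliament", ocr_page))  # False: OCR errors hide the match
print(fuzzy_search("Parliament", ocr_page))  # True: "parliarnent" is close enough
```

This is why researchers report "missing" results they know exist: the material is present, but exact-match search over low-quality OCR cannot reach it, and fuzzy matching only partially compensates.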
The demands emerging in the digital humanities forum are mainly focused on providing better accessibility for wider audiences: this means different ways of accessing data for different levels of expertise, particularly for national and commercial concerns; evolving and innovating functionality and access to content in line with researcher requirements and expectations; showcasing iconic holdings for a non-professional academic context; and adding aspects that give more helpful context without diluting academic integrity. There is a move towards standardisation by consolidating project websites into an overarching site; a manuscripts website, for example, might contain content from a number of different digitisation projects. Standardisation of search facilities for multiple datasets is also being undertaken. Funders are requiring improvements in the impact and sustainability of projects, which can only come from taking user requirements into consideration, among other concerns. As mentioned above, there is increasing demand for more accurate searching achieved through high standards of OCR and rekeying.
Researchers who were perhaps once in denial about using Google and Wikipedia to conduct scoping searches now find it increasingly acceptable to use and cite search as a finding aid. Scholars are becoming more open to using and citing digital resources, and even resource discovery through social media is becoming more common.
7. Conclusions
The interviews with the eight content providers generated some understanding of how search design is approached by a range of organisations concerned with research infrastructure or research products. The content providers are governed by funding structures, partner institutions and the information market, which impact on how they select what to digitise and what digitisation processes they adopt, including how they design and integrate search into their resources. Control is sometimes also relinquished by outsourcing background software and design work rather than keeping these in house. This understanding goes some way towards addressing the second research question: how is the design of search conditioned by technical considerations or the viewpoint of content creators?
Levels of user involvement in the design of the content providers’ products varied considerably, from limited consultation that only takes place when a resource is launched, if at all, to fairly well developed user testing processes and implementation of user community feedback. What was clear was that the content providers involve users to a greater or lesser extent in the design and development of digital resources, but they are not driving design decisions from the beginning through to the end of a project. Users are generally brought in to evaluate work that has already been done and it seems that, while their feedback and any feedback gathered from the wider user community might well be collected, it is not always acted upon because of time and funding restrictions. Sometimes the resources that researchers have the most problems with are old in digital terms, and have not been updated because funding from subscriptions or otherwise is being directed towards new digitisation projects directed by market demand. All of this means that researchers may be forced to persevere with resources they do not find intuitive and user-friendly because of the desire to tap into the material they contain, developing ‘work-arounds’ to address hindrances caused by issues with search and interface design.
The content providers are each at different stages of implementing more user-centred approaches as they begin to realise the positive impact these can have on the uptake and sustainability of digitisation projects. The importance of engaging users in the design process is gaining recognition amongst the funders of academic research as a consideration for helping with the ongoing impact of funded resources. An infoKit called ‘Planning a Participatory Workshop’1 is available to download from another strand of the national funding programme, which gives ideas for a variety of techniques that could be used in isolation or combination to elicit ideas from a group. However, while participatory workshops are, of course, a way to involve users to a greater extent in design and development, they are still not being placed in the driving seat.
Footnotes
- Jisc infoNet (2012) Planning a participatory workshop, [Online] Available: http://www.jiscinfonet.ac.uk/infokits/participatory