by Seul Lee
1. Introduction
In 2020, the Public Library Association (PLA) found that more than 93% of public libraries in the United States offered digital collections, and that these collections were in high demand. According to the same report, more than 90% of public libraries provided access to e-books. In addition, OverDrive, an e-book provider for libraries, reported a significant 33% increase in e-book checkouts during the same year (Burleigh, 2021). A survey conducted by Statista in 2022 (Watson, 2023) underscored this trend, revealing that digital books had been borrowed from libraries and schools 522 million times. The surge in digital library services also extended to academic circles: JSTOR, a widely recognized nonprofit resource for academic journals in the humanities, social sciences, and other disciplines, announced in 2020 that its users had conducted 262 million searches and viewed or downloaded over 230 million journal articles in 2019 alone. Online library services such as JSTOR and IEEE Xplore have grown rapidly due to the financial and spatial constraints encountered by traditional libraries, coupled with increasing user demand and a rapid expansion of publications (Brown, 2013, p. 37).
As more and more people have shifted from traditional media sources to online information platforms, such as digital libraries and academic search engines, these tools have become essential for both research and everyday information seeking. The emergence of these platforms has not only transformed how people search for and access information but has also influenced how scholars share and publicize their research. This has fueled the rapid growth of large scholarly search platforms such as Google Scholar, which has become one of the most widely used academic search engines thanks to its convenience, efficiency, timeliness, user-friendly interface, search capabilities, and extensive coverage. According to SimilarWeb data from May 2023, scholar.google.com received approximately 134 million visits.
This shift, however, has amplified concerns about the transparency and potential biases of content curation processes. It has also raised questions about the reliability and credibility of quality control for resources, as well as the accuracy of their representations, including the accuracy and completeness of descriptive tools and other metadata, and the unrecognized biases, subjectivities, errors, or omissions that may arise when materials are curated, digitized, and presented. These complexities may not be readily apparent to users of libraries and academic search platforms, especially on digital library, archive, or search platforms where no information professional is available to guide the user through the system or its descriptive tools. Moreover, through the ways in which they process and present information, leading online academic search engines can make certain information more or less visible, or even invisible, thereby influencing end users.
To gain a better understanding of how online resources are retrieved and presented on these academic search engines, this study investigates the factors that can shape search outcomes on these platforms. Specifically, it examines the biases, errors, and omissions that may arise in the processes of digitization, information retrieval, and representation. It considers these factors from both technical and functional dimensions, including corporate crawling, indexing, ranking, and presentation criteria, all of which can influence the visibility of articles. The overarching goal of this study is to elucidate how these variables can shape the search results for online academic resources and, consequently, the accessibility and diversity of information. Through a detailed analysis of these dimensions, this research seeks to advance our understanding of the dynamics at work in academic search engines.
2. Analysis
2.1 Archival Information Retrieval Systems
A productive place to begin this examination is archival information retrieval, since the process of shaping the presentation of primary sources, from accession to retrieval, has been of notable scholarly concern for many years. According to the archival studies scholar Elizabeth Yakel (2003, p. 2), the arrangement and description of archived materials, which determine their searchability and representation, are value-added processes. Archival materials, including digitized resources in digital libraries, are affected by the nature and degree of the “archivalization” processes (Ketelaar, 2001, p. 133) they have undergone, beginning with the appraisal or selection decisions that resulted in their inclusion in the archive. These processes potentially incorporate a range of conscious and unconscious biases, errors, omissions, and commissions inherent in the historical layering of descriptive interpretations presented in archival finding aids, as primary materials may have been described and re-described over many years. Additionally, these materials are frequently shaped by archival decision-making that addresses the specific needs, resources, requirements, and technical standards of an institutional mission or disciplinary context. These are well-known issues associated with searching for archival materials. Despite the sophistication and numerous advantages of the digital platforms being used, archival search results often fail to provide the exact or complete materials or information we are searching for.
This archival discourse has been significantly influenced by the French philosopher Jacques Derrida (1995), who reasoned that technological advancements not only alter the process of archiving but also affect the substance of the material to be archived. As Ketelaar (2001) argues, the technologies employed to create, maintain, and utilize records shape the composition, format, and organization of those records. File specifications can also differ across online library platforms to meet each platform’s requirements for storing and accessing data. For instance, the file specifications used by JSTOR were initially based on those developed by Elsevier Science for its TULIP project, but they have since been modified and expanded to align with JSTOR’s own requirements for data storage and access (Garlock et al., 1997). These complexities may not be readily apparent to users of any archive, particularly a digital archive, where no information professional is available to explain the system, its descriptive tools, or the ways its search engines work; as a result, these processes can seem intangible. Furthermore, the authority of the archive as a trusted repository may discourage researchers from questioning the presentation and representation of archival materials, or even their existence or absence. For these reasons, digital and archival literacy efforts urge end users who access online information through digital libraries or academic search platforms to carefully assess which sources to use or trust.
In addition to the challenges involved in converting resources into digital formats, another confounding factor in locating relevant archival materials and content can, counter-intuitively, be the finding aids and retrieval systems themselves. The effectiveness of archival retrieval depends on various factors, including the current practices, perspectives, and workflows of the institution and the archivist. For instance, because the vocabularies used by librarians, programmers, and institutions are highly distinctive, the terms they have chosen for their systems and online platforms may differ from those users are familiar with or those other platforms use in their index and search systems. Effectiveness also depends on the assumptions and “efficiencies” embedded in the finding aids and retrieval system. The constraints imposed by publishers and by the database and retrieval systems that disseminate their publications are also clearly evident in digital library practices. For example, due to its embargo policy, JSTOR may not provide access to articles published within the last few years. According to its policy (JSTOR, 2023), JSTOR collects content from selected journals that require an embargo period, a set duration following publication during which JSTOR restricts access to the articles. It can therefore be challenging for users to locate the most recent articles on a particular subject on the JSTOR website, which in turn can affect the scope and currency of the existing research drawn on in new scholarship.
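To make the effect of such an embargo concrete, the following minimal sketch in Python checks whether an article’s publication year falls inside a journal’s moving wall. The wall length and dates are invented for illustration; JSTOR’s actual walls vary by journal:

```python
from datetime import date

def is_accessible(publication_year, moving_wall_years, today=None):
    """Return True if an article lies outside the journal's moving wall.

    A moving wall of N years means articles published within the last N
    full years are withheld from the platform.
    """
    today = today or date.today()
    return publication_year <= today.year - moving_wall_years

# Hypothetical journal with a 5-year moving wall, checked in mid-2023.
print(is_accessible(2017, 5, today=date(2023, 6, 1)))  # True: outside the wall
print(is_accessible(2020, 5, today=date(2023, 6, 1)))  # False: still embargoed
```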
Additionally, depending on users’ settings and institutional affiliations, the full text of certain articles indexed by academic search engines and digital libraries may not be freely accessible due to subscription or other access constraints. Furthermore, some academic search platforms may incorporate users’ personal settings, such as their preferences, search history, or institutional affiliations, to personalize search results. While such personalization can enhance user experiences, it can also produce filter bubbles, in which users are only exposed to content that aligns with their pre-existing interests and search history. Depending on their disciplinary focus, digital archives and libraries may also not be as comprehensive as Google Scholar or physical libraries. Conversely, although Google Scholar has broader coverage of scholarly articles, users may need to consult additional discipline-specific databases or digital libraries to ensure comprehensive access to relevant publications.
2.2 Google Search
Google Scholar is another widely used academic search platform, but using it effectively may require a deeper understanding of its search algorithms and capabilities. Although Google does not disclose the specific details of how Google Scholar’s algorithm operates, it is heavily influenced by Google’s ranking algorithms. At a high level, Google’s search engine builds a correlation map between keywords and content, and then ranks the content for each keyword offline according to its relevance. When a user enters a sentence or word, the search engine queries the correlation map, retrieves the corresponding ranked results, and displays them. These search engines use similar workflows but store different types of data depending on their specialization. Google Scholar, for instance, indexes scholarly data such as publications and grants, aiming to rank documents the way researchers would: it considers factors such as the full text of each document, the publication source, the author, and the frequency and recency of citations in other scholarly literature. Google, on the other hand, covers a far wider range of data, including finance, news, videos, and more.
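As a rough illustration of what such a keyword-to-content correlation map might look like, the sketch below builds a minimal inverted index over a toy corpus. The documents and the matched-term scoring are invented for illustration; production systems use far richer relevance signals:

```python
from collections import defaultdict

# Hypothetical mini-corpus standing in for crawled, stored content.
DOCUMENTS = {
    "doc1": "digital libraries expand access to scholarly articles",
    "doc2": "search engines rank scholarly articles by relevance",
    "doc3": "digital archives preserve primary sources",
}

def build_index(documents):
    """Build an inverted index: each keyword maps to the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Score documents by how many query terms they match, best first."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    return sorted(scores, key=scores.get, reverse=True)

index = build_index(DOCUMENTS)
print(search(index, "scholarly articles"))  # doc1 and doc2 match both terms
```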
In addition, these search platforms utilize various ranking algorithms to determine the relevance and importance of web pages or resources. For instance, Google employs PageRank, while other search engines and digital libraries may use collaborative filtering or other methods. Although Google does not explicitly disclose the factors behind its relevance algorithm, scholars and researchers have identified approximately 200 factors believed to contribute to it. These encompass elements such as the number of received links, the presence of relevant keywords and related terms in key areas of the document, the speed of the server hosting the page, the length of the text, the user experience, the implementation of mobile-first design, the use of semantic tagging, and the age of the domain (Rovira et al., 2021).
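Purely to illustrate how many heterogeneous factors can be folded into a single relevance score, the sketch below combines a handful of hypothetical, normalized signals with invented weights. It is not Google’s formula, which remains undisclosed:

```python
# Illustrative only: invented signals and weights, not Google's actual
# ranking factors. Each signal is assumed to be normalized into [0, 1].
FACTOR_WEIGHTS = {
    "inbound_links": 0.30,
    "keyword_in_title": 0.25,
    "page_speed": 0.15,
    "text_length": 0.10,
    "mobile_friendly": 0.10,
    "domain_age": 0.10,
}

def relevance_score(signals):
    """Combine the available signals into one weighted relevance score."""
    return sum(FACTOR_WEIGHTS[name] * signals.get(name, 0.0)
               for name in FACTOR_WEIGHTS)

page = {"inbound_links": 0.8, "keyword_in_title": 1.0, "page_speed": 0.6}
print(round(relevance_score(page), 3))  # 0.58
```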
Furthermore, a search engine comprises both technical components and human actors, incorporating the interests and directives of corporations, administrations, and regulatory authorities. Human actors constantly interact with the technical elements, manually adjusting search results to align with corporate guidelines and user preferences. More importantly, throughout the process of shaping search results, various actors make editorial interventions that are far from neutral, including judgments about which content to include, how to categorize it, and how to present it through search engine interfaces. Nevertheless, only a limited amount of information about these elements and processes is documented or readily available to the general public.
In comparison with research on Google, academic research on search engine optimization (SEO) for Google Scholar is still in its early stages, and comprehensive findings are lacking. Moreover, the guidance offered for optimizing Google Scholar results is often extrapolated from research on Google’s search algorithm. Although the two algorithms operate on distinct document types and in vastly different environments, the subsequent section of this study first examines factors that can influence search results on Google in general and then turns to factors specifically relevant to Google Scholar search optimization. It first examines search-result processing criteria, namely how crawling, indexing, ranking, and presentation criteria affect search results. By analyzing the algorithms and practices employed by these platforms, the factors that affect the visibility and ranking of content can be identified.
The first half of the subsequent section examines the impact of various elements on Google’s search results by tracing the process through which those results are created: crawling, indexing, ranking, and presenting. It outlines the key factors that influence search results. Understanding the components involved in producing search results can help end users recognize the potential biases or artful manipulations present in the results served to them. To this end, we first identify the elements and processes involved in generating search results and consider how these components and actors operate within the system. There are three fundamental stages in search processing: crawling and indexing, ranking, and presentation.
Step 1: Crawling and Indexing. The first step for a crawler-based search engine such as Google is crawling: discovering new or updated web resources, gathering information, and extracting links using web crawler bots, or spiders. The search engine then stores the information in its index, which describes each piece of content together with its URL. For instance, Google’s web crawlers, often referred to as Googlebots or spiders, continuously gather information from billions of web pages and index the downloaded pages on Google’s servers so that users can search more efficiently. These crawlers look for new websites, visit them, and discover additional pages by following existing site links; they also check for changes to existing sites, broken links, and new websites. Google then renders these pages, analyzes their content, and decides what to include in its search results and where to place it.
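The sketch below is a minimal, illustrative breadth-first crawler written with Python’s standard library only; the seed URL and page limit are placeholders. Real crawlers such as Googlebot add politeness delays, robots.txt handling, page rendering, deduplication, and much more:

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    """Collect href targets from anchor tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed_url, max_pages=10):
    """Breadth-first crawl: fetch a page, extract links, queue the new ones."""
    seen, queue, index = {seed_url}, deque([seed_url]), {}
    while queue and len(index) < max_pages:
        url = queue.popleft()
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "replace")
        except OSError:
            continue                      # unreachable pages are simply skipped
        index[url] = html                 # "indexing": store content by URL
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            absolute = urljoin(url, link)
            if absolute.startswith("http") and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return index

pages = crawl("https://example.com")
print(len(pages), "pages fetched")
```

Note that pages which fail to load are silently dropped from the index, a small-scale analogue of the coverage gaps discussed next.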
Various biases can arise during the crawling and indexing processes. Even at the initial crawling stage, search engines cannot crawl and index every website on the web, so some websites are inevitably missed. In other words, certain content may not appear in, or be searchable through, a given search engine if its crawlers never discovered it, or visited it but did not index it. Search engines are thus unable to cover the entire web and may not maintain up-to-date indices. Secondly, when search engines gather information about a website, the frequency of crawling and the criteria for selecting which pages to index are determined by each platform’s protocols. As Goldman (2006) points out, this means search engines may intentionally or unintentionally exclude certain webpages, or include only a portion of a webpage, even when the content is available and indexable.
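One concrete, publicly visible piece of this protocol layer is the robots exclusion standard: a site’s robots.txt file tells crawlers which paths to skip, so excluded pages never enter the index. A minimal check with Python’s standard library (the domain and path are placeholders):

```python
from urllib.robotparser import RobotFileParser

# Fetch and parse a site's robots.txt, then ask whether a given crawler
# is allowed to fetch a given page. Disallowed pages are never indexed.
robots = RobotFileParser("https://example.com/robots.txt")
robots.read()
allowed = robots.can_fetch("Googlebot", "https://example.com/private/report.html")
print("Googlebot may crawl this page:", allowed)
```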
Search engines also use anchor text, the visible text of a hyperlink that is supposed to tell users what the linked page is about. Goldman (2006) also notes issues with the use of third-party descriptions of a website: when indexing a webpage, search engines can draw on metadata created by third parties, including the anchor text those parties use when hyperlinking to the site. It is therefore possible for a search engine to represent a website using a term that the original site never used, based purely on third-party anchor text. Furthermore, search engines consistently update and modify their indexes and content, and regularly refine their tools to provide better search results according to the standards set by their own platforms. Finally, they may intervene to modify content for various reasons, yet the detailed criteria or protocols for such exceptional modifications are not publicly documented or easily accessible to users.
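The following sketch shows how third-party anchor text can be harvested and attached to a target page as an index term. The HTML snippet is invented; the point is that the phrase inside the <a> tag could become searchable metadata for example.com even if that phrase never appears on example.com itself:

```python
from html.parser import HTMLParser

# Hypothetical third-party page linking to example.com.
THIRD_PARTY_HTML = (
    '<p>See <a href="https://example.com">budget airline reviews</a>.</p>'
)

class AnchorTextCollector(HTMLParser):
    """Pair each outgoing link with the visible text inside its <a> tag."""
    def __init__(self):
        super().__init__()
        self.pairs, self._href, self._text = [], None, []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []
    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)
    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.pairs.append((self._href, "".join(self._text).strip()))
            self._href = None

collector = AnchorTextCollector()
collector.feed(THIRD_PARTY_HTML)
print(collector.pairs)  # [('https://example.com', 'budget airline reviews')]
```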
Aside from these working practices, other fundamental issues relate to the content being searched. Search engine algorithms and system designs are built around the major languages and locations of the platform developers; any given search engine may therefore have uneven coverage across languages. According to a survey by W3Techs (2021), English is the most widely used language for online content, accounting for 63 percent of content on the World Wide Web, even though a significant amount of non-English content, in languages such as Russian, Spanish, and French, is available. Although Google uses a different default language for each domain (e.g., Google.co.kr defaults to Korean and Google.fr to French), its primary website, Google.com, defaults to English, making English the most used language on Google.com. Content written in other languages is likely to receive less exposure there. Moreover, the accuracy and quality of search results for non-English content may be lower because of Google’s more limited ability to interpret search queries written in other languages. As a result, search results may fail to properly reflect less dominant geographic, demographic, cultural, and linguistic contexts.
Step 2: Ranking. Many researchers (Goldman, 2006; Lorigo et al., 2006; Rodden et al., 2008) emphasize the significance of search result ordering, noting that users tend to browse only the top few results, particularly the first two. A 2020 study by SISTRIX, which analyzed over 80 million keywords and billions of search results, found that the average click-through rate (CTR) for the first position in Google was 28.5%, followed by 15.7% for the second position and 11% for the third. As Goldman (2006) states, one of the most important principles of automated search engine operation is that if a website satisfies the interests of the majority of users, it should be ranked highly. However, this can lead to algorithmic decisions that misguide search results for users outside the mainstream: by their nature, these algorithms may categorize the preferences and opinions of minority groups as errors, without offering a detailed explanation of specific data points or a transparent policy for retrieving the best possible results (Hardt, 2014). For instance, Google’s well-known ranking algorithm, PageRank, assesses the value of a website by looking at how many important websites link to it, weighting the elements of hyperlinked documents to measure the relevance and quality of web pages. In effect, if a website is popular and highly visible, Google’s platform treats it as important.
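A minimal power-iteration sketch of PageRank on a toy link graph conveys the core idea: a page’s rank is the damped sum of the rank shares flowing in from the pages that link to it. The graph, damping factor, and iteration count here are illustrative; Google’s production system is vastly larger and combines PageRank with many other signals:

```python
# Toy PageRank: graph[page] lists the pages that `page` links to.
DAMPING, ITERATIONS = 0.85, 50

graph = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],      # D links to C, but nothing links to D
}

def pagerank(graph, damping=DAMPING, iterations=ITERATIONS):
    n = len(graph)
    rank = {page: 1.0 / n for page in graph}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in graph}
        for page, outlinks in graph.items():
            share = rank[page] / len(outlinks)   # spread rank over outlinks
            for target in outlinks:
                new_rank[target] += damping * share
        rank = new_rank
    return rank

for page, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")   # C ranks highest: most pages link to it
```

The output makes the popularity feedback loop visible: the heavily linked page C accumulates the most rank, while the unlinked page D stays near the floor regardless of its content.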
Aside from these systemic flaws in how data are selected and presented, companies may also intentionally introduce biases related to race, gender, and other marginalized groups into their search results. In her book “Algorithms of Oppression” (2018), Safiya Noble examines how Google’s search results reflect users’ underlying desires, needs, interests, and biases, highlighting from a Black feminist perspective the racial and gender biases embedded in those results. Hargittai (2018) likewise highlights the media’s tendency to prioritize the opinions of individuals with high social status, representing and drawing on their viewpoints more frequently.
Another factor shaping search results is the interests of search companies’ stakeholders, including sponsors and advertisers. Search engines can direct users to specific advertising content paid for by associated advertisers; indeed, advertising is one of the most lucrative revenue streams for search engine companies. Yet paid advertising content is often presented in ways that make it difficult for users to distinguish from organic search results. More broadly, search results are assembled from heterogeneous elements and cannot be explained by merely listing the properties of those elements: each element continuously forms new relationships with the others, producing emergent properties and capabilities that contribute to the generation of search results.
Mackenzie (2006) warns against viewing algorithms as “the formal static essence of software.” Introducing a body of literature that treats algorithms as powerful and consequential actors, Ziewitz (2016) likewise points out that algorithms can be imbued with agency and impact even as they operate with biases, make mistakes, or exercise power and influence. Ziewitz examines algorithms not only as computational artifacts but also as tools that can help us reevaluate our assumptions about agency, transparency, and normativity. Diakopoulos (2013) suggests that algorithmic biases can be exposed by reverse engineering: articulating the specifications of a system through rigorous examination based on domain knowledge, observation, and deduction in order to unearth a model of how it works.
Even when users are aware of the potential biases in these systems, a few giant search engines, including Google and Baidu, dominate the market and are often treated as the default option. Moreover, when a leading search engine like Google takes action, competitors tend to follow its lead. Although some search engines, such as DuckDuckGo, do not retain users’ IP addresses or other personal information, they still operate much like the major engines. Such similarities suggest that it may be difficult for users to access information that differs from what the big search engines choose to provide.
Step 3: Presentation. Search engine interfaces consist of many elements, including page layouts, keywords, plaintext, languages, icons, fonts, sizes, colors, scroll bars, preview sections, manuals, function keys, digital documents, images, videos, and other visual components, all of which can affect users’ perception of search results. As Drucker (2013) notes, Krug’s concept of a successful interface is one that becomes invisible and disappears, allowing users to focus solely on the content presented on the screen. In Drucker’s account, the abstract and metaphorical representations of a graphical user interface (GUI) can obscure the material elements and functions of the underlying system, leading users to construct their own meanings and thereby shaping the rhetorical semantics of what they see. With this review, we can understand how these different components fit together and interact during search processes, as well as their potential impact on search results.
2.3 Google Scholar Search
The factors discussed above can affect search results on the Google platform in general. Although Google Scholar has its own algorithmic approach, it still relies on Google’s overall search infrastructure, encompassing crawling, indexing, and ranking components. While the factors specifically relevant to optimizing Google Scholar search results have not been clearly identified, and the available guidance is often extrapolated from research on Google’s search algorithm, some studies have identified distinctive characteristics of the Google Scholar system. As highlighted by Rovira’s research team (2021), four key characteristics differentiate academic documents on Google Scholar from regular web pages. Firstly, they are mostly in PDF rather than HTML format. Secondly, they contain bibliographic citations that link to other academic documents, rather than hyperlinks. Thirdly, once published, academic works are typically not modified. Lastly, the author, metadata, and date of publication are usually clearly identified.
The algorithms used in Google Scholar’s information retrieval system also differ from those used in Google’s general web search. As Rovira’s research team (2021) points out, the visibility of academic papers and conference articles depends on how well they are optimized for Google Scholar’s algorithms, whereas an individual’s academic brand and the visibility of their web pages, blogs, and videos rely heavily on Google’s general ranking algorithms. Because Google Scholar is designed to meet the specific information needs of researchers and scholars, who are typically looking for scholarly resources to support academic work, its algorithms may assign greater importance to factors such as academic relevance, accessibility of full text, and citation metrics. Google Scholar focuses on indexing scholarly sources, including academic publishers, institutional repositories, and university websites, and gives priority to sources considered authoritative within the academic community. This approach, however, may introduce publication biases by favoring content from journals with higher citation counts and impact factors while underrepresenting content from lesser-known, non-English, or less popular disciplinary journals. Since citations are frequently used as an indicator of a publication’s impact and influence within the academic community, citation analysis and co-citation analysis are commonly used to determine the importance and relevance of scholarly articles in Google Scholar. Additionally, although Google Scholar is user-friendly and inclusive of books and articles in various languages, it harvests citations from a wide range of sources, including Word documents and PowerPoint presentations, which leads to inflated, indiscriminate, and fluctuating scores (Bjork et al., 2014).
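To see how citation-weighted ranking can systematically favor heavily cited work, consider the following hedged sketch. The weights, the logarithmic citation shaping, and the recency term are all invented for illustration; Google Scholar’s actual formula is undisclosed:

```python
import math

def scholar_score(text_relevance, citations, years_since_publication):
    """Combine text match, log-scaled citations, and recency into one score.

    Hypothetical weights only: citation counts dominate, with diminishing
    returns, which is enough to reproduce the bias discussed above.
    """
    citation_signal = math.log1p(citations)          # diminishing returns
    recency_signal = 1.0 / (1.0 + years_since_publication)
    return 0.5 * text_relevance + 0.4 * citation_signal + 0.1 * recency_signal

papers = [
    ("classic, heavily cited", scholar_score(0.7, 5000, 20)),
    ("recent, lightly cited", scholar_score(0.9, 12, 1)),
]
for title, score in sorted(papers, key=lambda kv: -kv[1]):
    print(f"{score:.2f}  {title}")   # the 20-year-old classic outranks the newer paper
```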
3. Conclusion
This study has examined the various factors that may influence search results on online scholarly platforms in general and on Google and Google Scholar in particular. I hope that these findings contribute to a more nuanced understanding of these platforms and their resources, which have become essential to our research activities, and that they help users make informed decisions about the sources these platforms provide and the information they ultimately use.
Acknowledgement
I would like to express my sincere gratitude to my advisor, Professor Anne Gilliland, for providing invaluable feedback and suggestions on this research paper. I am responsible for any substantive deficiencies.
References
Public Library Association 2020, Public library technology survey: summary report, viewed 21 May 2023. <https://www.ala.org/pla/sites/ala.org.pla/files/content/data/PLA-2020-Technology-Survey-Summary-Report.pdf>
Burleigh, D 2021, 33% growth for digital books from public libraries and schools in 2020 sets records, viewed 21 May 2023. <https://company.overdrive.com/2021/01/07/33-growth-for-digital-books-from-public-libraries-and-schools-in-2020-sets-records/>
Watson, A 2023, Digital books borrowed from libraries and schools worldwide from 2017 to 2022, by format, viewed 21 May 2023. <https://www.statista.com/statistics/250007/downloading-or-borrowing-e-books-on-a-public-library-website-in-the-us/>
JSTOR n.d. Journals, viewed 21 May 2023. <https://about.jstor.org/librarians/journals/>
Brown, L 2013, ‘Case Study 1 – The JSTOR platform’, in Baker, D & Evans, W (eds), A Handbook of Digital Library Economics: Operations, Collections and Services, Chandos Publishing: Oxford, pp. 37-45.
SimilarWeb 2023, https://scholar.google.com/, viewed 1 June 2023. <https://www.similarweb.com/website/scholar.google.com/#overview>
Yakel, E 2003, ‘Archival representation’, Archival Science, vol. 3, pp. 1-25. viewed 15 May 2023. <https://deepblue.lib.umich.edu/bitstream/handle/2027.42/41831/10502_2004_Article_5139967.pdf?sequence=1>
Ketelaar, E 1999, ‘Archivalisation and archiving’, Archives & Manuscripts, vol. 27, no. 1, pp. 54-61. viewed 15 May 2023. <https://publications.archivists.org.au/index.php/asa/issue/view/183>
Ketelaar, E 2001, ‘Tacit Narratives: The Meanings of Archives’, Archival Science, vol. 1, pp. 131-141. viewed 15 May 2023. <https://fketelaa.home.xs4all.nl/TacitNarratives.pdf>
Derrida, J 1995, ‘Archive Fever: A Freudian Impression’, Diacritics, vol. 25, no. 2, pp. 9-63. viewed 15 May 2023. <https://doi.org/10.2307/465144>
Garlock, K, Landis, W & Piontek, S 1997, ‘Redefining access to scholarly journals: A progress report on JSTOR’, Serials Review, vol. 23, no. 1, pp. 1-8. viewed 18 May 2023. <https://www.tandfonline.com/doi/abs/10.1080/00987913.1997.10764359>
JSTOR 2023, About the Moving Wall, viewed 13 June 2023. <https://support.jstor.org/hc/en-us/articles/115004879547-About-the-Moving-Wall>
Beel, J & Gipp, B 2009, ‘Google Scholar’s Ranking Algorithm: An Introductory Overview’, Proceedings of the 12th International Conference on Scientometrics and Informetrics (ISSI’09), vol. 1, pp. 230-241. viewed 23 May 2023.
<https://docear.org/papers/Google%20Scholar’s%20Ranking%20Algorithm%20–%20An%20Introductory%20Overview%20–%20preprint.pdf>
Google n.d., A guide to Google Search ranking systems, viewed 1 June 2023.
<https://developers.google.com/search/docs/appearance/ranking-systems-guide>
Google n.d., How results are automatically generated, viewed 1 June 2023.
<https://www.google.com/search/howsearchworks/how-search-works/ranking-results/>
StatCounter 2023, Desktop search engine market share worldwide. viewed 1 June 2023.
<https://gs.statcounter.com/search-engine-market-share/desktop/worldwide/>
DataReportal 2021, Digital 2021 April global Statshot report. viewed 1 June 2023.
<https://datareportal.com/reports/digital-2021-april-global-statshot/>
Goldman, E 2006, ‘Search Engine Bias and the Demise of Search Engine Utopianism’, Yale Journal of Law and Technology, vol. 8, pp. 188-200. viewed 13 May 2023.
<https://digitalcommons.law.scu.edu/cgi/viewcontent.cgi?article=1112&context=facpubs>
Bakkalbasi, N, Bauer, K, Glover, J & Wang, L 2006, ‘Three options for citation tracking: Google Scholar, Scopus and Web of Science’, Biomedical Digital Libraries, vol. 3, no. 7, viewed 12 May 2023.
<https://link.springer.com/article/10.1186/1742-5581-3-7#citeas>
Jacsó, P 2005, ‘Google Scholar: the pros and the cons’, Online Information Review, vol. 29, no. 2, pp. 208-214, viewed 12 May 2023.
<https://www.emerald.com/insight/content/doi/10.1108/14684520510598066/full/html>
Lorigo, L, Pan, B, Hembrooke, H, Joachims, H, Granka, L & Gay, G 2006, ‘The influence of task and gender on search and evaluation behavior using Google’. Information Processing & Management, vol. 42, no. 4, pp. 1123-1131, viewed 12 May 2023.
<https://www.sciencedirect.com/science/article/pii/S0306457305001366>
Rodden, K, Fu, X, Aula, A & Spiro, I 2008, ‘Eye-mouse coordination patterns on web search results pages’. CHI EA ’08: CHI ’08 Extended Abstracts on Human Factors in Computing Systems. vol. 8, pp. 2997-3002, viewed 12 May 2023.
<https://dl.acm.org/doi/10.1145/1358628.1358797>
SISTRIX 2020, Why (almost) everything you knew about Google CTR is no longer valid. viewed 1 May 2023.
<https://www.sistrix.com/blog/why-almost-everything-you-knew-about-google-ctr-is-no-longer-valid/>
W3Techs 2021, Usage statistics of content languages for websites. Q-Success, viewed 12 May 2023.
<https://w3techs.com/technologies/overview/content_language>
Hardt, M 2014, How Big Data is Unfair, Medium. viewed 12 May 2023.
<https://medium.com/@mrtz/how-big-data-is-unfair-9aa544d739de>
Noble, S U 2018, Algorithms of Oppression: How Search Engines Reinforce Racism. NYU Press: New York.
Hargittai, E 2018, ‘Potential Biases in Big Data: Omitted Voices on Social Media’, Social Science Computer Review, vol. 1, no. 15. viewed 1 June 2023.
<http://www.mkoganresearch.com/assets/hargittai.pdf>
Mackenzie, A 2006, Cutting Code: Software and Sociality. Peter Lang: New York, NY.
Ziewitz, M 2016, ‘Governing Algorithms: Myth, Mess, and Methods’, Science, Technology, & Human Values, vol. 41, no. 1. viewed 12 March 2023.
<https://journals.sagepub.com/doi/abs/10.1177/0162243915608948>
Diakopoulos, N 2013, ‘Rage Against the Algorithms’, The Atlantic. viewed 12 March 2023.
<https://www.theatlantic.com/technology/archive/2013/10/rage-against-the-algorithms/280255/>
Drucker, J 2013, ‘Reading Interface’, Publications of the Modern Language Association, vol. 128, no. 1, viewed 1 May 2023. <https://doi.org/10.1632/pmla.2013.128.1.213>
Kaufman, G & Flanagan, M 2016, ‘High-Low Split: Divergent Cognitive Construal Levels Triggered by Digital and Non-digital Platforms’, CHI ’16: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, vol. 16, viewed 13 March 2023.
<https://dl.acm.org/doi/10.1145/2858036.2858550>
Rovira, C, Codina, L & Lopezosa, C 2021, ‘Language Bias in the Google Scholar Ranking Algorithm’, Future Internet, vol. 13, no. 2, p. 31. viewed 3 May 2023.
<https://www.mdpi.com/1999-5903/13/2/31>
Bjork, S, Offer, A & Söderberg, G 2014, ‘Time series citation data: the Nobel Prize in economics’, Scientometrics, vol. 98, pp. 185-196. viewed 3 May 2023.
<https://link.springer.com/article/10.1007/s11192-013-0989-5#citeas>
Harter, S.P 1992, ‘Psychological relevance and information science’. Journal of the American Society for Information Science, vol. 43, pp. 602–615. viewed 3 May 2023.
<https://asistdl.onlinelibrary.wiley.com/doi/10.1002/(SICI)1097-4571(199210)43:9%3C602::AID-ASI3%3E3.0.CO;2-Q>
Lee, S 2021, ‘Digital literacy: search results as assemblages’, PhD Qualifying Exam paper, University of California, Los Angeles, Los Angeles.
About the Author
Seul Lee is a doctoral candidate in Information Studies at UCLA, holding a B.A. in MIS, an M.A. in Data Science, and a graduate certificate in Digital Humanities. Her research focuses on how people search for, use, and evaluate information in various contexts, with a particular interest in user-generated content (UGC). Her dissertation examines the positionality of UGC, the components contributing to biases, mis/disinformation, and distinct narratives in online resources, and the factors shaping information behaviors. She is currently engaged in projects examining online hate speech, creating data science tutorials, and digitizing oral histories of early Korean American immigrants.