SCWAReD: Scholar-Curated Worksets from the HathiTrust Research Center

Keywords:

digital libraries, datasets, text analysis

Abstract:

The Scholar-Curated Worksets for Analysis, Reuse & Dissemination (SCWAReD) project, generously supported by the Mellon Foundation, is producing a suite of curated worksets of materials from the HathiTrust Digital Library. SCWAReD aims to address inequities in both library collections and digital humanities research in two ways: by identifying and remediating gaps within HathiTrust, and by using computationally assisted methods to recover content that is already part of the HathiTrust Digital Library but may be difficult to discover within such a massive digital collection through traditional metadata and catalog searches.

SCWAReD’s flagship collaboration is with the Black Books Interactive Project, part of the History of Black Writing, which SCWAReD Co-PI Maryemma Graham founded in 1983 at the University of Mississippi and which has been hosted since 1998 at the University of Kansas. Four additional projects were selected to create curated worksets and are being developed concurrently: “Mining the Native American Authored Works in HathiTrust for Insights,” directed by Kun Lu, Raina Heaton, and Raymond Orr (University of Oklahoma); “The Black Fantastic: Curated Vocabularies, Artifact Analysis and Identification,” directed by Clarissa West-White (Bethune-Cookman University) and Seretha Williams (Augusta University); “Creating Period-Specific Worksets for Latin American Fiction,” directed by José Eduardo González (University of Nebraska–Lincoln); and “The National Negro Health Digital Project: Recovering and Restoring a Black Public Health Corpus,” directed by Kim Gallon (Purdue University). In each partnership, the project team brings content expertise, research questions, and curation experience, while the HathiTrust Research Center (HTRC) facilitates access to the HathiTrust collection and provides research tools and environments as well as methodological and technical expertise in text and data mining. Each workset will be accompanied by a scholarly introduction, documented derived datasets, and project reports. HTRC will host these comprehensive research packages and disseminate them for reuse in research and teaching.

Our presentation will provide an overview of SCWAReD and preliminary analysis results from its collaborative projects.