Sociolinguists know that competition between linguistic variables is neither unorganized nor random, but guided by both language-internal and language-external factors. Variation also reveals the language ideologies that can lie behind any linguistic choice. By connecting linguistic data with language-external factors such as place, time, social status, level of education, or ideological and political environment, we can observe how social meanings arise in context. Interfacing structured and unstructured data also enables new kinds of questions about linguistic variation and change, and offers opportunities to experiment with novel methods for revealing the logic behind ostensibly random variation.
Combining these two kinds of data poses a challenge from a computer science perspective, as current tools cannot fluidly cross-query linguistic phenomena and contextual information. The new open-source tools to be discussed allow the user to define data subsets based on both linguistic features and the extralinguistic criteria included in the corpus metadata. A subset constrained in this way can be subjected to further linguistic analysis and visualization, and projected back through the structured metadata into interactive visualizations that combine the two, such as plots of the spread of linguistic phenomena through time and space. Exploring the visualization parameters allows us to detect interesting variation in the material and to extract the relevant subset of the data for linguistic analysis.
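The idea of constraining a subset by linguistic and extralinguistic criteria and then projecting it back onto the metadata can be illustrated with a minimal sketch. It is not the tools' actual interface: the `Token` record, the `select` and `counts_by_period_and_region` functions, and the toy data are all hypothetical, and a plain suffix match stands in for a real linguistic query.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical record: one word token plus metadata of its source text."""
    form: str    # word form as attested
    year: int    # year of writing
    region: str  # region of the writer


def select(tokens, suffix, regions=None, years=None):
    """Keep tokens matching a linguistic criterion (here simply a suffix)
    and the optional metadata constraints (set of regions, inclusive year span)."""
    subset = []
    for t in tokens:
        if not t.form.lower().endswith(suffix):
            continue
        if regions is not None and t.region not in regions:
            continue
        if years is not None and not (years[0] <= t.year <= years[1]):
            continue
        subset.append(t)
    return subset


def counts_by_period_and_region(tokens, period=50):
    """Project a subset back onto metadata: token counts per
    (start year of period, region), ready to be plotted over time and space."""
    return Counter(((t.year // period) * period, t.region) for t in tokens)


# Invented toy data, for illustration only.
TOKENS = [
    Token("writer", 1623, "London"),
    Token("letter", 1623, "London"),  # matches the string, but is not a derivative
    Token("farmer", 1651, "North"),
    Token("house", 1651, "North"),
    Token("Keeper", 1688, "London"),
    Token("maker", 1702, "North"),
]

subset = select(TOKENS, "er", years=(1600, 1699))
table = counts_by_period_and_region(subset)
```

Keeping the selection and the projection as separate steps means that the same subset can feed both a visualization and further linguistic analysis.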
As a case study of defining data subsets based on structured data, we explore the computer-assisted filtering of -er derivatives in a historical corpus by cross-referencing the attested types against gold-standard present-day data as well as the Oxford English Dictionary. If successful, the same procedure will later be applied to the study of the inflectional comparative -er. Our aim is to analyse sociolinguistic variation in the productivity of both derivational and inflectional -er, operating on the hypothesis that similar variation and change may be observed in both derivational and inflectional processes.
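One way such computer-assisted filtering could proceed is sketched below, under stated assumptions. The function `classify_er_types`, the split of the gold standard into derivatives and non-derivatives, and the local `dictionary` mapping (a stand-in for information looked up in a dictionary, not an actual OED interface) are all hypothetical, as are the toy types.

```python
def classify_er_types(candidates, gold_derivatives, gold_non_derivatives, dictionary):
    """Sort candidate types ending in -er into accepted, rejected and
    manual-check sets.

    gold_derivatives / gold_non_derivatives: sets of present-day types with a
        known analysis (assumed format).
    dictionary: hypothetical mapping from headword to True if the entry is
        analysed as an -er derivative, False otherwise.
    """
    accepted, rejected, manual = set(), set(), set()
    for t in {c.lower() for c in candidates}:
        if not t.endswith("er"):
            continue
        if t in gold_derivatives:
            accepted.add(t)
        elif t in gold_non_derivatives:
            rejected.add(t)
        elif t in dictionary:
            (accepted if dictionary[t] else rejected).add(t)
        else:
            # Unknown to both resources (e.g. a variant spelling):
            # leave the decision to a human annotator.
            manual.add(t)
    return accepted, rejected, manual


# Invented toy data, for illustration only.
candidates = ["writer", "Writer", "letter", "keeper", "river", "prayser"]
gold_derivatives = {"writer"}
gold_non_derivatives = {"letter"}
dictionary = {"keeper": True, "river": False}

accepted, rejected, manual = classify_er_types(
    candidates, gold_derivatives, gold_non_derivatives, dictionary
)
```

In this sketch only the types that neither resource can resolve are passed on for manual checking, which is the sense in which the filtering is computer-assisted rather than fully automatic.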