This paper tracks literary influence and shared literary styles using data visualizations. The case study here is a comparison of the surprising literary similarities between two seventeenth-century texts, John Milton's Paradise Lost (1667) and John Bunyan's The Pilgrim's Progress (1678), texts composed by two authors at opposite ends of the educational and literary spectrum. I conclude that it is possible to represent aspects of literary style visually, and that the resulting graphs can point us to new understandings of both style and literary influence.
Both texts in my case study are religious: while Milton's details the Old Testament story of Genesis, Bunyan's is a New Testament allegory of Christian faith. Despite their topical similarity, there is no substantive evidence that Bunyan read Milton's poem (indeed, Bunyan was in prison when Milton's poem was published). My hypothesis is that both writers are responding independently to available English editions of the Bible. But to what degree? And how might we detect stylistic signals that lie underneath the obvious similarities of theme and diction?
My method here is to understand literary influence not through the word choices made by these writers but instead through the rhythms of grammatical sentence structures. Part-of-speech tagging allows us subsequently to compare grammatical structures via sequence alignment techniques drawn from biological genome studies. As with DNA, we can find common sequences shared by different sentences and we can map those relative similarities on a scatter plot where each dot represents a sentence and where greater grammatical similarity is signified by dots that are closer together. A second technique detects the placement of verbs within a sentence (a signature feature of what has been called "periodic" or "grand" style), and we can easily demonstrate that, while both Bunyan and Milton have borrowed from the Bible, Bunyan's style is more faithful to the original whereas Milton's style is more periodic.
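The two techniques can be illustrated with a minimal sketch in Python. It is not the implementation used in this study: the choice of Smith-Waterman local alignment, the scoring values, the normalization, the function names, and the use of the first verb's relative position as a proxy for periodic style are all assumptions made for illustration. The tag sequences are invented toy examples using Universal POS labels, not sentences drawn from either text.

```python
from typing import List, Optional

# Tags counted as verbs (an assumption; a real tagset may need more labels).
VERB_TAGS = {"VERB", "AUX"}


def local_alignment_score(a: List[str], b: List[str],
                          match: int = 2, mismatch: int = -1,
                          gap: int = -1) -> int:
    """Smith-Waterman local alignment score of two POS-tag sequences."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def similarity(a: List[str], b: List[str], match: int = 2) -> float:
    """Score scaled to [0, 1] by the best score the shorter sequence allows."""
    if not a or not b:
        return 0.0
    return local_alignment_score(a, b, match=match) / (match * min(len(a), len(b)))


def first_verb_position(tags: List[str]) -> Optional[float]:
    """Relative position of the first verb: 0.0 = sentence start, 1.0 = end.

    Returns None when the sentence contains no verb tag.
    """
    for index, tag in enumerate(tags):
        if tag in VERB_TAGS:
            return index / (len(tags) - 1) if len(tags) > 1 else 0.0
    return None


if __name__ == "__main__":
    # Toy sequences (hypothetical): verb delayed to the end vs. verb near the start.
    delayed = ["ADP", "DET", "NOUN", "PRON", "VERB"]
    early = ["PRON", "VERB", "ADP", "DET", "NOUN"]
    print(similarity(delayed, early))
    print(first_verb_position(delayed), first_verb_position(early))
```

Pairwise distances of the form `1 - similarity(a, b)` could then be passed to a dimensionality-reduction step to place each sentence as a dot on a scatter plot, so that grammatically similar sentences fall closer together.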