Voyant Tools Experiment with Shakespeare’s Corpus
The image above is entirely taken up by a random selection of contextual uses of the word love. Ten words on either side of the keyword give an impression of how the word is being used and its place in the narrative. This provides context and, through repetition, a hint at frequency of use. In this case, the narrative is Twelfth Night, a play Shakespeare wrote in 1599, at approximately the midpoint of his writing career. However, with tabular data only, and without the aid of visuals, it is difficult to see patterns. The contention that certain concepts and themes in Shakespeare’s works are ephemeral, making his masterful examination of them all the more valuable, rings true. Data visualization helps to bring this contention into focus.
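For readers who want to see what a keyword-in-context listing involves, here is a minimal Python sketch. It is not how Voyant Tools implements its Contexts view; the function name `kwic`, the simple regular-expression tokenizer, and the sample sentence are all assumptions made for illustration.

```python
import re

def kwic(text, keyword, window=10):
    """Return (left, keyword, right) tuples with up to `window` words on each side.

    Hypothetical helper for illustration; tokenization is a plain
    letters-and-apostrophes regex, which is an assumption, not Voyant's rule.
    """
    tokens = re.findall(r"[A-Za-z']+", text)
    rows = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            rows.append((left, tok, right))
    return rows

sample = "If music be the food of love, play on; give me excess of it"
for left, kw, right in kwic(sample, "love", window=3):
    print(f"{left} [{kw}] {right}")
```

With `window=10`, the same function would reproduce the ten-words-either-side layout described above.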
Adding a visual element immediately helps nail down the frequency and distribution of the word love in relation to tangentially related terms. In the bubble chart above, 4 plays written during the beginning, middle, and end of Shakespeare’s career are each represented by a horizontal line divided into segments of equal length. To give context, this example uses the 5 key terms that appear most frequently in the corpus. Each keyword is represented by a bubble color, with the relative frequency indicated by the bubble’s size in each segment. The larger the bubble, the more frequently a term is used in that particular segment.
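The numbers behind a chart like this can be approximated in a few lines of Python. The sketch below splits a text into equal-length segments and computes each term's relative frequency per segment, which is the quantity a bubble's size would encode. The function name `segment_frequencies`, the default of ten segments, and the tokenizer are assumptions; Voyant's own segmentation may differ.

```python
import re

def segment_frequencies(text, terms, segments=10):
    """Relative frequency of each term in each equal-length segment of `text`.

    Hypothetical helper: returns {term: [freq_in_segment_0, freq_in_segment_1, ...]}.
    """
    tokens = [t.lower() for t in re.findall(r"[A-Za-z']+", text)]
    size = max(1, -(-len(tokens) // segments))  # ceiling division
    result = {term: [] for term in terms}
    for s in range(segments):
        chunk = tokens[s * size:(s + 1) * size]
        for term in terms:
            # Share of the segment's words that are this term (0.0 for an empty segment).
            freq = chunk.count(term.lower()) / len(chunk) if chunk else 0.0
            result[term].append(freq)
    return result

toy = "love good love war good good love war"
print(segment_frequencies(toy, ["love", "good"], segments=2))
```

Each list in the result corresponds to one colored row of bubbles, one value per segment.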
Love grouped with a random selection of tangentially related keywords shows a clear thematic pattern in Shakespeare’s corpus. All 8 key terms appear in each of the representative plays. However, in Love’s Labour’s Lost, the first of Shakespeare’s plays, written in 1590, love and war are the dominant key terms. In The Tempest, Shakespeare’s last play (1611), by contrast, nearly every key term, with the exception of good, is represented by an equally sized and relatively small bubble. This in itself is interesting, as the word good, along with the words shall, lord, king, and sir, is one of the most frequently used words in the entire corpus. That its frequency is so ordinary and tempered in Shakespeare’s last play almost demands further study.
There is also the question of context. Word choice in relation to originality is an obvious place to start. As noted in the table below, Shakespeare repeated himself very rarely. However, he did repeat, word for word, a six-word phrase that included the key term love: “love says like an honest gentleman”. This is actually not that outlandish, as there is an instance in his corpus where he repeats a 25-word phrase.
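Repeated phrases of this kind can be found by counting word n-grams. The following is a rough Python sketch, assuming a simple regex tokenizer and a hypothetical helper named `repeated_phrases`; it is not the method Voyant Tools uses for its Phrases table, and it reports every repeated window rather than merging overlapping ones into the longest phrase.

```python
import re
from collections import Counter

def repeated_phrases(text, n=6):
    """Return {phrase: count} for every n-word phrase that occurs more than once.

    Hypothetical helper for illustration; case is ignored and punctuation dropped.
    """
    tokens = [t.lower() for t in re.findall(r"[A-Za-z']+", text)]
    grams = Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )
    return {gram: count for gram, count in grams.items() if count > 1}

toy = "a b c d a b c e"
print(repeated_phrases(toy, n=3))
```

Run over a full corpus with `n=6`, a function like this would surface six-word repetitions such as the one quoted above, assuming both occurrences tokenize identically.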
The collocates graphs below represent keywords and the terms that occur in close proximity to them as a force-directed network graph. Love, a primary search term, is, perhaps unsurprisingly, linked to the word hate. What may come as a surprise is how much more frequently it is used, indicated by its relative size. The collocates are in maroon and largely correspond with what one would expect, although the close linkage between hate and fear, instead of love and fear, might be something to investigate in more detail.
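The edges in a collocates graph come from counting which words fall within a few positions of a keyword. Here is a minimal Python sketch of that counting step only, not the force-directed layout. The function name `collocates`, the window of five words, and the optional stopword set are assumptions rather than Voyant's actual settings.

```python
import re
from collections import Counter

def collocates(text, keyword, window=5, stopwords=frozenset()):
    """Count words appearing within `window` tokens of each occurrence of `keyword`.

    Hypothetical helper for illustration; returns a Counter of neighbouring words.
    """
    tokens = [t.lower() for t in re.findall(r"[A-Za-z']+", text)]
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword.lower():
            neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            counts.update(w for w in neighbours if w not in stopwords)
    return counts

toy = "love and hate and fear; hate breeds fear"
print(collocates(toy, "hate", window=1, stopwords={"and"}))
```

Comparing `collocates(text, "hate")["fear"]` with `collocates(text, "love")["fear"]` on the full corpus would be one way to check the hate and fear linkage numerically.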
Incidentally, the word “reality” is not found even once in all of Shakespeare’s plays. This is symbolic, if not indicative, of the limitation of thematic studies through keyword analysis. Ultimately, the most telling information visualization for the ephemeral might be that which mimics the concepts being measured in both form and content.
The illustration below visualizes the frequency and distribution of the word love in all 37 of Shakespeare’s plays. Each play is represented by a vertical column. The height of the column illustrates the relative length of the document as compared to the entirety of the corpus. The red dots indicate the location of the word love with the brightness, subtle but present, further noting the relative frequency of the word within the corpus.
- The visualizations in this blog post were created using Voyant Tools. Voyant Tools is an open-source project and the code is available through GitHub. The code is under a GPL3 license and the content of the web application (including the documentation) is under a Creative Commons By Attribution license.
- Shakespeare corpus from Project Gutenberg