Voyant with Werewolves

Posted by Zohra Saed on

Marie de France Lai of the Werewolf

12th C. female poet

Here is how you make medieval texts interesting — select ones about magical monsters (ok maybe all have magical monsters) and then add these fancy things. I tried it with my students and they loved this. This was absolutely the most fun I’ve had with this text. Everything else required downloads or vids that took too long, or were blocked by DOE! (Yes I am guilty of doing classwork at work).



Voyant Trials with Latinx

Posted by Nancy Bocanegra on

This was my first time using Voyant but it felt good to actually use a database especially after reading much about the learning outcomes of using such systems. I’m definitely a more experiential type learner.

In my trial, I used a piece from a Latino Studies class, “Mapping and recontextualizing the evolution of the term Latinx: An environmental scanning in higher education” by Cristobal Salinas Jr. & Adele Lozano. It reflects on the term Latinx and its evolution over the years in higher education. I wanted to see what this tool would tell me compared to what I had read.

Cirrus View: The five main words were latinx (236), term (196), gender (97), google (89), and scholar (83).

It allows us to see the main terms of the article, which highlight key subjects in the essay. The number of times these words are used corresponds to the effect they hope to make on the reader. Another interesting aspect was the correlation and wordtree tool, which in itself revealed the answers lying in the essay. The wordtree also serves as a visual to trace the root of the main term. Overall the visuals help with identifying certain trends in the writing and help to analyze key questions one may have.
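A count like the one Cirrus displays can be roughly approximated in a few lines of Python. This is only a sketch: the `top_terms` helper, the tiny stopword list, and the sample sentence are made up for illustration and are not Voyant's own code or stopword list.

```python
import re
from collections import Counter

# Hypothetical, very small stopword list; Voyant ships a much longer one.
STOPWORDS = {"the", "of", "and", "a", "in", "to", "is", "it", "as"}

def top_terms(text, n=5):
    """Return the n most frequent non-stopword terms with their counts."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)

# Made-up sample sentence, not taken from the article.
sample = "The term Latinx is a term of gender inclusion, and the term is recent."
print(top_terms(sample, 2))
```

Changing the stopword list changes which words rise to the top, which is worth keeping in mind when reading any word cloud.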

Chatting with Veliza was also interesting. But I didn’t quite understand how it all worked or the main purpose of this tool. I was just “click happy” and clicked away on things until something happened.


As I clicked around trying the different settings, I realized that if you set up a certain setting a certain way, once you leave that setting it reverts back to the default. In Bubblelines, for example, I could add the syntax keywords and click on the option to “separate lines with terms,” and then, when I changed the setting, it went back to zero.

Next Steps

For the next tests, I will want to use more content to compare and find possible correlations to see how helpful it will be to dump a lot of data onto a system and be left with more data to analyze and reflect back on.



Using Voyant Affectively with Joni Mitchell

Posted by Natasha Ochshorn on

I was originally curious to use Voyant on songs by Joni Mitchell because they are texts that I have a strong emotional response to. Since I am much more emotionally than clinically motivated with academic work, I was curious to use a tool that felt so cold on a text I feel so feverish about, and to see if it would produce any feeling or excitement in me stronger than, huh, that’s interesting.

For my sample I used the lyrics from Blue (1971), Court and Spark (1974), and Hejira (1976). They’re among her most critically and commercially acclaimed, as well as being ones that I happen to have an intimate familiarity with. I also liked, for no real reason other than shape, that there was another album released in between all of them.

The most commonly used words are not very interesting, although they are perhaps a good lesson in the geography of Mitchell’s music which is personal (I’m 55 times), descriptive (like 54 times), qualifying (just 43 times), and romantic (love 39 times). There are also words whose high counts are initially striking, but which mostly appear due to the repetitious nature of songwriting. I was initially struck by the frequent occurrence of green, before I thought to check and see how many of those instances came from the song Little Green (9 out of 11). If Mitchell were more prone to choruses instead of the single-line refrains she gravitates towards, I imagine this would be even more noticeable.

Most immediately striking to me was the high count for oh (37 times), an utterance that is potentially not a part of the author’s written lyrics, and is instead a translation on the part of the transcriber. These ohs illuminate the potential problems of text mining, specifically the lack of easily represented content, but they are also illustrative of the ways this tool – perhaps accidentally – has the potential to shine a spotlight on something that may have otherwise gone unnoticed. If I had been asked to pick a word to represent these albums I may have chosen sad (5 times), ambivalent (0 times), or aching (0 times), but oh embraces all of these terms without excluding joy (“oh love can be so sweet”), humor (“oh you’re a mean old Daddy”), or the refrain of wistfulness that runs through River (“oh I wish,” 5 times). It’s not much more than a placeholder between the words, an exhale of yearning, but it was a sweet jolt of recognition to see it writ so large in Voyant’s word cloud.

Interestingly, what I found more meaning in than the frequently repeating words was the long list of words that were only used once or twice. These are mostly nouns and adjectives, and taken together it is their immense variation, their lack of frequency, that best illustrates Mitchell’s versatility and specificity as a songwriter. There is a pleasure in scrolling through the list and coming across the word pachyderm, for instance, and remembering that she used it as a bizarre euphemism in Blue Hotel Room. The context is still missing. Vain (1 use) is not particularly exciting to me, but used in combination with darling (4 uses) it transforms both words into something more intimate and specific. Perhaps the ability to identify word-types or parts of speech would help with this. If you could search for all nouns with their modifiers, for instance. But even knowing what was missing, I was surprised at the strong emotional response I had just to looking at a list of words in alphabetical order, and at the strong sense of person I got from beautiful stretches like shanty, shadows, shades, sex, settled or pushed, punishing, punched (all 1 use each).
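For anyone who wants to pull out that list of rarely used words outside of Voyant, here is a minimal sketch in Python. The `rare_words` helper is hypothetical, and the sample uses only the two short lyric fragments quoted above rather than the full albums.

```python
import re
from collections import Counter

def rare_words(text, max_count=1):
    """Return words used at most max_count times, in alphabetical order."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return sorted(w for w, c in counts.items() if c <= max_count)

# Two fragments quoted earlier in this post, standing in for a real corpus.
sample = "oh love can be so sweet oh you're a mean old Daddy"
print(rare_words(sample))
```

Raising `max_count` to 2 would also catch the words used twice. The context is still missing here too: the list says nothing about which words sit next to each other.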

I think the next step that I would take with this exploration would be to widen the scope to other artists, and see if the same patterns emerged, the same things felt interesting to me. How would the early work of John Lennon look, with its much more formal genre constraints? What about a writer like Taylor Swift, who blends the personal and observational qualities that Mitchell uses within a pop-music structure?




Mining Indignados’ Blogs

Posted by Pedro Cabello del Moral on


Text mining of the most relevant blogs and websites about the Spanish “Indignados” movement using Voyant-Tools.

Selection criterion: Official websites of the movement compiled in

Official means, in this case, the websites that came directly from the organization (squares and other spontaneous places for meetings)

Defining a corpus is not easy; that’s why it is useful to build upon a previously made selection.

Purpose of the experiment: Search for the most common buzzwords or theoretical concepts related to the movement and gathered in its blogs and websites. They will be extracted from the first 250 terms in the list. Words that are not relevant for the experiment (such as verbs, prepositions, or names of people, places, months or days) will not be taken into account.


First of all, it should be pointed out that some of the websites and blogs are not active anymore (perhaps they can be retrieved using the Internet Archive). Moreover, some of the domains now belong to other businesses or even to other countries. Last but not least, other sites are blocked because of their political content (especially in the U.S.).

Voyant-Tools proved unable to process the whole corpus (perhaps due to the abovementioned limitations). I then tried to conduct the experiment by splitting the corpus into chunks. That option did not work either, so I opted for removing the malfunctioning links from the corpus. It rendered the same error. Finally, facing the impossibility of analyzing a comprehensive corpus, I decided to include only 20 websites, which would represent some of the biggest cities of Spain. After doing that, I was able to visualize the data.

Interpreting the results

Some of the most abundant concepts were Assembly, Consensus, Commission, Share, Proposal(s), Camp, Twitter, Facebook, Minutes, and Plaza. Other buzzwords in the list of the 100 most used were Evictions, 15m, Compañeros, Demonstration, and Repression.

When the link map is displayed the results that are shown are by no means contradictory. It seems obvious that Assembly is connected to General, Commissions and Inter. Likewise, it comes as no surprise that Share is associated with Facebook, Twitter, Blog, and Pinterest.

Looking at the bubble lines, one can unfortunately see that the outputs are mainly concentrated in a few websites (Bilbao, Soria, Salamanca, Guadalajara, and Vigo), most probably because those sites are more text-mining friendly. That could simply be because they present more text in their home tab.
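One way to check this kind of concentration outside of Voyant is to compute what share of a term's occurrences comes from each document. The sketch below is only an illustration: `term_share_by_document` is a hypothetical helper, and the three tiny "sites" are invented stand-ins, not text from the actual corpus.

```python
import re

def term_share_by_document(docs, term):
    """Map each document name to its share of all occurrences of term."""
    pattern = rf"\b{re.escape(term.lower())}\b"
    counts = {name: len(re.findall(pattern, text.lower()))
              for name, text in docs.items()}
    total = sum(counts.values())
    # Avoid dividing by zero when the term never appears.
    return {name: (c / total if total else 0.0) for name, c in counts.items()}

# Invented placeholder documents, not real website content.
docs = {
    "site_a": "share share share assembly",
    "site_b": "assembly proposal share",
    "site_c": "plaza camp",
}
shares = term_share_by_document(docs, "share")
print(shares)
```

If one or two documents account for most of a term's share, the corpus-wide count says more about those sites than about the movement as a whole.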


One of the unusual findings is that Share is the most used concept (98 times), followed by Assembly (56 times). Thus, one can conclude that text mining tools applied to social movements’ archives help discover widely used words that account for concepts that shape collective political subjectivities and that probably go unnoticed. For instance, quantitative analysis shows that Share not only has a practical use, connected to the idea of spreading the word through social media; it also portrays the idea of a sharing community.

It is clear that for a proper analysis, the selection and preparation of the corpus is fundamental. The rendering of the data is affected by minimal changes to the corpus. Perhaps my modest experiment with Voyant cannot serve the purpose of generalizing assumptions. In this respect, the relationship between signals and concepts was not fully developed and appeared weak in my case study. However, signals such as Consensus, Assembly, Share, Proposal or Plaza configured a linguistic map around certain concepts that are definitely related to the new political subjectivity of the “Indignados” movement. Some of them were evident, others surprising, and many remain undiscovered, waiting for future research.


Voyant Tools Experiment with Shakespeare’s Corpus

Posted by Taylor Dietrich on

The image above is entirely taken up by a random selection of contextual uses of the word love. Ten words on either side of the keyword give an impression of how the word is being used and its place in the narrative. It provides a context for us, and a hint at frequency of use by repetition. In this case, the narrative is Twelfth Night, a play Shakespeare wrote in 1599 at approximately the midpoint of his writing career. However, with tabular data only, and without the aid of visuals, it is difficult to see patterns. The contention that certain concepts and themes in Shakespeare’s works are ephemeral, making his masterful examination of them all the more valuable, rings true. Data visualization helps to bring this contention into focus.
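The keyword-in-context view described here can be sketched in Python. This is an approximation under stated assumptions, not Voyant's implementation: `keyword_in_context` is a hypothetical helper, words are split on whitespace only, and the sample is just the opening line of Twelfth Night with a window of three words rather than ten.

```python
def keyword_in_context(text, keyword, window=10):
    """Return (left, match, right) for each occurrence of keyword."""
    words = text.split()
    hits = []
    for i, word in enumerate(words):
        # Strip surrounding punctuation before comparing, case-insensitively.
        if word.strip(".,;:!?\"'").lower() == keyword.lower():
            left = " ".join(words[max(0, i - window):i])
            right = " ".join(words[i + 1:i + 1 + window])
            hits.append((left, word, right))
    return hits

line = "If music be the food of love, play on"
for left, match, right in keyword_in_context(line, "love", window=3):
    print(f"{left} [{match}] {right}")
```

Run over a whole play, the same function returns one row per occurrence, which is the tabular data the paragraph above finds hard to read without a visual.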

Adding a visual element immediately helps nail down the frequency and distribution of the word love in relation to tangentially related terms. In the bubble chart above, 4 plays written during the beginning, middle, and end of Shakespeare’s career are each represented by a horizontal line divided into segments of equal length. To give context, this example uses the 5 key terms that appear most frequently in the corpus. Each keyword is represented by a bubble color, with the relative frequency indicated by the bubble’s size in each segment. The larger the bubble, the more frequently a term is used in that particular segment.
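The numbers behind a chart like this can be approximated by cutting a text into roughly equal runs of words and counting a term in each run. The sketch below assumes a simple word-based split; `segment_frequencies` is a hypothetical helper, the sample string is invented, and Voyant may divide documents differently.

```python
import re

def segment_frequencies(text, term, segments=10):
    """Split text into roughly equal runs of words and count term in each."""
    words = re.findall(r"[a-z']+", text.lower())
    # Ceiling division, so no more than `segments` chunks are produced.
    size = max(1, -(-len(words) // segments))
    counts = []
    for start in range(0, len(words), size):
        chunk = words[start:start + size]
        counts.append(chunk.count(term.lower()))
    return counts

# Invented sample; a real run would use the full text of a play.
sample = "love war love good love war good good"
print(segment_frequencies(sample, "love", segments=4))
```

Each number in the returned list would set the size of one bubble along the line for that play.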

Love grouped with a random selection of tangentially related keywords shows a clear thematic pattern in Shakespeare’s corpus. All 8 key terms appear in each of the representative plays. However, in Love’s Labour’s Lost, the first of Shakespeare’s plays, written in 1590, love and war are dominant key terms. In The Tempest, Shakespeare’s last play (1611), by contrast, nearly every key term, with the exception of good, is represented by an equally sized and relatively small bubble. This in itself is interesting, as the word good, along with the words shall, lord, king, and sir, is one of the most frequently used words in the entire corpus. That its frequency is so ordinary and tempered in Shakespeare’s last play almost demands further study.

There is also the question of context. Word choice in relation to originality is an obvious place to start. As noted in the table below, Shakespeare repeated himself very rarely. However, he did repeat, word for word, a six-word phrase that included the key term love: “love says like an honest gentleman”. This is actually not that outlandish, as there is an instance in his corpus where he repeats a 25-word phrase.
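Finding repeated phrases of a fixed length comes down to counting word n-grams. Here is a minimal Python sketch; `repeated_phrases` is a hypothetical helper, and the sample text simply repeats the six-word phrase quoted above around invented filler words.

```python
import re
from collections import Counter

def repeated_phrases(text, n=6):
    """Return every n-word phrase that occurs more than once, with its count."""
    words = re.findall(r"[a-z']+", text.lower())
    grams = Counter(" ".join(words[i:i + n])
                    for i in range(len(words) - n + 1))
    return {gram: count for gram, count in grams.items() if count > 1}

# The quoted phrase twice, joined by invented filler ("and then").
sample = ("love says like an honest gentleman and then "
          "love says like an honest gentleman")
print(repeated_phrases(sample, n=6))
```

Setting `n=25` would look for the longer repetition mentioned above, at the cost of more memory on a corpus the size of the complete plays.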

The collocates graphs below represent keywords and terms that occur in close proximity as a force-directed network graph. Love, a primary search term, is, possibly unsurprisingly, linked to the word hate. What may come as a surprise is how much more frequently it is used, indicated by its relative size. The collocates are in maroon and largely correspond with what one would expect, although the close linkage between hate and fear instead of love and fear might be something to investigate in more detail.
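The counts behind a collocates graph can be sketched by tallying the words that fall within a fixed window around each occurrence of a keyword. This is an illustration only: `collocates` is a hypothetical helper, the window size and sample sentence are invented, and it does not reproduce Voyant's own settings or layout.

```python
import re
from collections import Counter

def collocates(text, keyword, window=5):
    """Count the words that appear within `window` words of keyword."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, word in enumerate(words):
        if word == keyword.lower():
            neighbours = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts

# Invented sample sentence, not a line from the plays.
sample = "I love you and I hate fear"
print(collocates(sample, "love", window=2))
```

Each keyword and collocate pair with a nonzero count would become an edge in the network graph, with the count as its weight.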

Incidentally, the word “reality” is not found even once in all of Shakespeare’s plays. This is symbolic, if not indicative, of the limitation of thematic studies through keyword analysis. Ultimately, the most telling information visualization for the ephemeral might be that which mimics the concepts being measured in both form and content.

The illustration below visualizes the frequency and distribution of the word love in all 37 of Shakespeare’s plays. Each play is represented by a vertical column. The height of the column illustrates the relative length of the document as compared to the entirety of the corpus. The red dots indicate the location of the word love with the brightness, subtle but present, further noting the relative frequency of the word within the corpus.

  • The visualizations in this blog post were created using Voyant Tools. Voyant Tools is an open-source project and the code is available through GitHub. The code is under a GPLv3 license and the content of the web application (including the documentation) is under a Creative Commons By Attribution license.
  • Shakespeare corpus from Project Gutenberg



Livejournal, Weird Twitter, and Digital Composing

Posted by Jesse Rice-Evans (she/they) on

I’ve been excited about multimodal composing since stumbling on Jackie Rhodes and Jonathan Alexander’s work on queer web texts in On Multimodality (CCCC/NCTE) back in 2016. That semester, in Tom Peele’s Rhetorical Theories and Composition Theories course at CCNY, I dove into the wild world of theorizing digital writing as situationist, activist, and queer AF.


Rhodes and Alexander discuss their own work towards practicing the queering of texts that they theorize in the text, but the chapter that really thrilled me focused on Guy Debord’s Society of the Spectacle and theories of dérive and détournement as they relate to queer webtexts and digital composing. Essentially, Debord viewed dismantling and re-purposing, ignoring the original intention of a space or text, as crucial to creating a situationist cultural ethos. Rhodes and Alexander respond by dis-composing, the rhet/comp incarnation of détournement, by taking texts and intentionally stripping order and even meaning from the writing.


On the train home that night, I frantically googled free coding classes, “détournement for Dummies,” anything to help me do this. I ended up stumbling into it on Twitter: Weird Twitter, a now-mostly-defunct movement of surrealism and un-meaning through genre manipulation, had been doing this work for years. Thus, was born.


I’ve always been into computer stuff. I coded my MySpace AND LiveJournal(s) with a custom background, color scheme, font, proto-emojis, an avatar, stuff to get my crush’s digital attention, etc.

JRE Livejournal layout

(uh oh, my background image!)

JRE Livejournal post

Once I realized that this impulse had always come from a place of loving to experiment, that I could play with rhetorical strategies in these multimodal formats that I’d always loved, I caught the bug again. Hard. Since then, I’ve developed my own slew of playful online texts: an interactive memoir, the aforementioned messy theory-in-progress, a cleaner but fictionalized chap of poems.


Rhodes and Alexander swiftly followed On Multimodality with a fully-online multimedia piece titled TECHNE: Queer Meditations on Writing the Self (Computers and Composition Digital Press), which took their theorizing about the need for instructors to actually do some multimodal writing into practice. TECHNE is a wild mash-up of practice and abandon: a navigable webtext with integrated images, video, and audio. Rhodes and Alexander manage to maintain citation practices while incorporating their own original written & multimedia scholarship, including some weird arty video work run through iMovie’s slow-motion tool.


The thing that excites me most is how Rhodes and Alexander are breaking apart the standard essay/article form used across humanities scholarship, even in decidedly critical intellectual spaces: crit theory, cultural studies, critical university studies, comp/rhet. Because while we all talk a big game about experimenting with form and pedagogy, we still write in the same tedious, exclusionary forms we always have.

Break *clap emoji* this *clap emoji* shit *clap emoji* up.


Possible Impact of DH’s Different Ways of Knowing?

Posted by Heather Zuber on

It was great meeting you all on Tuesday!  As we introduced ourselves, I noticed that many students had (often extensive) digital experience/knowledge already.  Also, I noticed that nearly everyone expressed a need/desire to somehow enhance/ground/expand their existing DH knowledge by taking this class–a more formal/(traditional?) kind of process/structure/way of producing knowledge in the humanities.  I was thinking that these observations might speak to what Matt was saying about how DH “activities” represent different ways of knowing/forms of knowledge.  I’m wondering whether my own feeling that the digital/DH knowledge I have is somehow not legitimate is due to the fact that it was not gained/produced in the same way/through the same process as humanities knowledge and does not look or feel like that kind of knowledge either.  Also, does this lack of confidence in existing digital knowledge in part maybe come from pressure from outside (and even inside) DH to always somehow “prove” what we know?
