Daily Archives


Voyant with Werewolves

Posted by Zohra Saed on

Marie de France Lai of the Werewolf

12th C. female poet

Here is how you make medieval texts interesting — select ones about magical monsters (ok maybe all have magical monsters) and then add these fancy things. I tried it with my students and they loved this. This was absolutely the most fun I’ve had with this text. Everything else required downloads or vids that took too long, or were blocked by DOE! (Yes I am guilty of doing classwork at work).



Voyant Trials with Latinx

Posted by Nancy Bocanegra on

This was my first time using Voyant but it felt good to actually use a database especially after reading much about the learning outcomes of using such systems. I’m definitely a more experiential type learner.

In my trial, I used a piece from a Latino Studies class, “Mapping and recontextualizing the evolution of the term Latinx: An environmental scanning in higher education” by Cristobal Salinas Jr. & Adele Lozano. It reflects on the term Latinx and its evolution over the years in higher education. I wanted to see what this tool would tell me compared to what I had read.

Cirrus View: The five most frequent words were latinx (236), term (196), gender (97), google (89), and scholar (83).

It allows us to see the main terms of the article, which highlight key subjects in the essay. The number of times these words are used corresponds to the effect the authors hope to have on the reader. Another interesting aspect was the correlation and word tree tools, which revealed answers lying in the essay itself. The word tree also serves as a visual to trace the root of the main term. Overall, the visuals help with identifying certain trends in the writing and with analyzing key questions one may have.
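The kind of count behind a word cloud can be approximated in a few lines of Python. This is only a rough sketch, not Voyant's actual implementation: the stopword list, the function name top_terms, and the sample sentence are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword list; Voyant ships its own, much longer lists.
STOPWORDS = {"the", "of", "and", "a", "in", "to", "is", "it", "as"}

def top_terms(text, n=5):
    """Return the n most frequent non-stopword terms as (term, count) pairs."""
    # Lowercase the text and keep runs of letters only.
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)

# Invented sample sentence, not a quotation from the article.
sample = "The term Latinx is a gender-inclusive term. Scholars use the term Latinx."
print(top_terms(sample, 2))
```

Which words count as stopwords changes the ranking considerably, so results from a sketch like this will not necessarily match the numbers Cirrus reports.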

Chatting with Veliza was also interesting. But I didn’t quite understand how it all worked or what the main purpose of this tool was. I was just “click happy” and clicked away on things until something happened.


As I clicked around trying the different settings, I realized that if you configure a setting a certain way, it reverts to the default once you leave it. In Bubblelines, for example, I could add the syntax keywords and click on the option to “separate lines with terms,” and then, when I changed the setting, it went back to zero.

Next Steps

For the next tests, I want to use more content to compare and find possible correlations, and to see how helpful it is to dump a lot of data onto a system and be left with more data to analyze and reflect on.



Using Voyant Affectively with Joni Mitchell

Posted by Natasha Ochshorn on

I was originally curious to use Voyant on songs by Joni Mitchell because they are texts that I have a strong emotional response to. Since I am much more emotionally than clinically motivated with academic work, I was curious to use a tool that felt so cold on a text I feel so feverish about, and to see if it would produce any feeling or excitement in me stronger than, huh, that’s interesting.

For my sample I used the lyrics from Blue (1971), Court and Spark (1974), and Hejira (1976). They’re among her most critically and commercially acclaimed, as well as being ones that I happen to have an intimate familiarity with. I also liked, for no real reason other than shape, that there was another album released in between all of them.

The most commonly used words are not very interesting, although they are perhaps a good lesson in the geography of Mitchell’s music which is personal (I’m 55 times), descriptive (like 54 times), qualifying (just 43 times), and romantic (love 39 times). There are also words whose high counts are initially striking, but which mostly appear due to the repetitious nature of songwriting. I was initially struck by the frequent occurrence of green, before I thought to check and see how many of those instances came from the song Little Green (9 out of 11). If Mitchell were more prone to choruses instead of the single-line refrains she gravitates towards, I imagine this would be even more noticeable.
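The check on green, asking how many of a word's occurrences come from a single song, can be sketched in Python. The function name count_by_document and the placeholder texts are assumptions for illustration; the texts are not real lyrics, and the sketch presumes each song has been saved as its own string.

```python
import re

def count_by_document(docs, word):
    """Map each document title to how often `word` occurs in it."""
    word = word.lower()
    result = {}
    for title, text in docs.items():
        # Split into lowercase word tokens, keeping apostrophes (e.g. "i'm").
        tokens = re.findall(r"[a-z']+", text.lower())
        result[title] = tokens.count(word)
    return result

# Placeholder texts, not real lyrics.
docs = {
    "Song A": "green green and green again",
    "Song B": "blue then green",
    "Song C": "no colour here",
}
counts = count_by_document(docs, "green")
total = sum(counts.values())
top_title = max(counts, key=counts.get)
print(f"{counts[top_title]} of {total} occurrences come from {top_title}")
```

A word whose count is concentrated in one document is usually a sign of repetition within that song rather than a pattern across the albums.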

Most immediately striking to me was the high count for oh (37 times), an utterance that is potentially not a part of the author’s written lyrics, and is instead a translation on the part of the transcriber. These ohs illuminate the potential problems of text mining, specifically the lack of easily represented content, but they are also illustrative of the ways this tool – perhaps accidentally – has the potential to shine a spotlight on something that may have otherwise gone unnoticed. If I had been asked to pick a word to represent these albums I might have chosen sad (5 times), ambivalent (0 times), or aching (0 times), but oh embraces all of these terms without excluding joy (“oh love can be so sweet”), humor (“oh you’re a mean old Daddy”), or the refrain of wistfulness that runs through River (“oh I wish,” 5 times). It’s not much more than a placeholder between the words, an exhale of yearning, but it was a sweet jolt of recognition to see it writ so large in Voyant’s word cloud.

Interestingly, what I found more meaning in than the frequently repeating words was the long list of words that were only used once or twice. These are mostly nouns and adjectives, and taken together it is their immense variation, their lack of frequency, that best illustrates Mitchell’s versatility and specificity as a songwriter. There is a pleasure in scrolling through the list and coming across the word pachyderm, for instance, and remembering that she used it as a bizarre euphemism in Blue Motel Room. The context is still missing. Vain (1 use) is not particularly exciting to me, but used in combination with darling (4 uses) it transforms both words into something more intimate and specific. Perhaps the ability to identify word-types or parts of speech would help with this, if you could search for all nouns with their modifiers, for instance. But even knowing what was missing, I was surprised at the strong emotional response I had just to looking at a list of words in alphabetical order, and at the strong sense of person I got from beautiful stretches like shanty, shadows, shades, sex, settled or pushed, punishing, punched (all 1 use each).
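A list of rarely used words like the one described above can also be produced outside Voyant. The following Python sketch assumes a plain-text transcription; the function name rare_words, the max_count threshold, and the sample string are illustrative choices, not anything Voyant exposes.

```python
import re
from collections import Counter

def rare_words(text, max_count=2):
    """Return, in alphabetical order, the words used at most max_count times."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    return sorted(w for w, c in counts.items() if c <= max_count)

# Invented sample string built from single words, not a lyric.
sample = "oh love oh love oh shadows shanty shades"
print(rare_words(sample))      # words used once or twice
print(rare_words(sample, 1))   # words used exactly once
```

Like the alphabetical list itself, this drops all context, so a pairing such as vain and darling would still have to be found by going back to the text.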

I think the next step that I would take with this exploration would be to widen the scope to other artists, and see if the same patterns emerged, the same things felt interesting to me. How would the early work of John Lennon look, with its much more formal genre constraints? What about a writer like Taylor Swift, who blends the personal and observational qualities that Mitchell uses within a pop-music structure?




Mining Indignados’ Blogs

Posted by Pedro Cabello del Moral on


Text mining of the most relevant blogs and websites about the Spanish “Indignados” movement using Voyant-Tools.

Selection criterion: Official websites of the movement compiled in https://movimientoindignadosspanishrevolution.wordpress.com/paginas-web-oficiales-del-15m-en-espana/

“Official” means, in this case, the websites that came directly from the organization (squares and other spontaneous meeting places).

Defining a corpus is not easy; that’s why it is useful to build upon a previously made selection.

Purpose of the experiment: Search for the most common buzzwords or theoretical concepts related to the movement and gathered in its blogs and websites. They will be extracted from the 250 most frequent terms. Words that are not relevant for the experiment (such as verbs, prepositions, or names of people, places, months, or days) will not be taken into account.


First of all, it should be pointed out that some of the websites and blogs are no longer active (perhaps they can be retrieved using the Internet Archive). Moreover, some of the domains now belong to other businesses or even to other countries. Last but not least, other sites are blocked because of their political content (especially in the U.S.).

Voyant-Tools proved unable to process the whole corpus (perhaps due to the above-mentioned limitations). I then decided to conduct the experiment by splitting the corpus into chunks. That option did not work either, so I opted for removing the malfunctioning links from the corpus. It rendered the same error. Finally, facing the impossibility of analyzing a comprehensive corpus, I decided to include only 20 websites, which would represent some of the biggest cities of Spain. After doing that, I was able to visualize the data.

Interpreting the results

Some of the most abundant concepts were Assembly, Consensus, Commission, Share, Proposal(s), Camp, Twitter, Facebook, Minutes, and Plaza. Other buzzwords in the list of the 100 most used were Evictions, 15m, Compañeros, Demonstration, and Repression.

When the link map is displayed the results that are shown are by no means contradictory. It seems obvious that Assembly is connected to General, Commissions and Inter. Likewise, it comes as no surprise that Share is associated with Facebook, Twitter, Blog, and Pinterest.
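The associations that a link map displays rest on words occurring near one another. A minimal Python sketch of that idea follows; it is not Voyant's own algorithm, and the function name collocates, the window size, and the sample text are assumptions made for illustration.

```python
import re
from collections import Counter

def collocates(text, keyword, window=2):
    """Count the words found within `window` tokens of each keyword occurrence."""
    # \w+ matches Unicode letters too, so accented Spanish words stay whole.
    # Sentence boundaries are ignored in this simplified version.
    tokens = re.findall(r"\w+", text.lower())
    keyword = keyword.lower()
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            start = max(0, i - window)
            neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts

# Invented sample text, not taken from the websites in the corpus.
sample = ("general assembly today. share on twitter. "
          "the general assembly voted. share on facebook")
print(collocates(sample, "assembly", window=1).most_common(1))
print(collocates(sample, "share", window=1).most_common(1))
```

A larger window catches looser associations but also more noise, which is one reason the pairs at the top of such a list tend to be the unsurprising ones.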

Looking at the bubble lines, one unfortunately realizes that the outputs are mainly concentrated in a few websites (Bilbao, Soria, Salamanca, Guadalajara, and Vigo), most probably because those sites are more text-mining friendly. That could simply have occurred because they present more text in their home tab.


One of the unusual findings is that Share is the most used concept (98 times), followed by Assembly (56 times). Thus, one can conclude that text mining tools applied to social movements’ archives help discover widely used words that express concepts shaping collective political subjectivities and that would probably otherwise go unnoticed. For instance, quantitative analysis shows that Share not only has a practical use, connected to the idea of spreading the word through social media, but also portrays the idea of a sharing community.

It is clear that, for a proper analysis, the selection and preparation of the corpus are fundamental. The rendering of the data is affected by minimal changes to the corpus. Perhaps my modest experiment with Voyant cannot support generalizations. In this respect, the relationship between signals and concepts was not fully developed and appeared weak in my case study. However, signals such as Consensus, Assembly, Share, Proposal, or Plaza configured a linguistic map around certain concepts that are definitely related to the new political subjectivity of the “Indignados” movement. Some of them were evident, others surprising, and many remain undiscovered, awaiting future research.
