The Visualization, Digital Studies, and Literary History

Posted by Param Ajmera on


What new forms of knowing and expression are made possible by the digitization of culture? What do visualization and design mean in a digital humanities context? These questions open a reflection on the methods and outcomes of DH. I am eager to see how the humanities have reconceived the creation and curation of data, and I am searching for new forms of interface with, and experience of, cultural objects that digital technology makes possible. From the sociology to the materiality of texts, and from cultural criticism to data modeling, the collection of texts below shows how a heightened sensitivity to the affordances of digital media provides new approaches to creating and sharing knowledge about culture and society. Despite the wide variety of texts and the different disciplines from which this material has been drawn, a few common threads of analysis emerge. Firstly, how we think and how we express our thought have become deeply circumscribed by the possibilities of digital media. That said, those possibilities seem to change and expand with every passing software update, so much remains to be determined in the future of DH. Secondly, how we choose to organize and represent information has a direct bearing on how it will be read, and so aesthetics and design become central to digital humanist inquiry. Thirdly, the power of images, graphics, and other visual or pictorial representations to suggest new relations or patterns, what Drucker calls their “representational force,” is very alluring, which means that visuals and graphics must be made and read with heightened attention to their underlying assumptions and rhetoric.
Ultimately, the works collected below serve as useful points of departure for thinking about practices that bridge the subjective, affective, and experiential ways of knowing championed by the humanities with the networked and quantitative practices enabled by digital media, with particular attention to the role of visualization.

Brown, Vincent. Slave Revolt in Jamaica, 1760-1761: A Cartographic Narrative. http://revolt.axismaps.com/. Accessed 13 Mar. 2018.

Cordell, Ryan. “Reprinting, Circulation, and the Network Author in Antebellum Newspapers.” American Literary History, vol. 27, no. 3, Aug. 2015, pp. 417–45.

Dillon, Elizabeth Maddock. “By Design: Remapping the Colonial Archive.” Social Text, vol. 33, no. 4 (125), 2015, pp. 142–147.

In this article, Dillon discusses Vincent Brown’s digital archive and mapping of events related to Tacky’s Revolt in Jamaica. Characterizing Brown’s project as a structural reshaping of the colonial foundations of the archive, Dillon presents it as an ongoing contribution to the ways in which archival silences and occlusions can be challenged. Cartography, with its long history in the Western imperial imagination, is inextricably linked to the desire for colonial empire, so advancing it as a technique for rethinking the colonial ideologies on which the archive is based proves to be a particularly fraught enterprise. Dillon suggests that this fundamental challenge can be addressed when the possibilities of digital mapping are engaged with a heightened sensitivity to the impact of design and aesthetics, positing Brown’s project as an exemplary case. Dillon argues that through a well-theorized visual rhetoric, conscious attention to the arrangement of images and text on the display, and a well-thought-out curation of textual moments from existing sources, new forms of knowledge can emerge that run counter to the colonial and white-supremacist episteme of the archive and of cartography. I find this discussion generative in its suggestions of how digital publications can use multilayered strategies of representation to avoid functioning as transparent representations of the past and to underscore the ambiguities that comprise literary-historical studies. Redesigning the archive thus becomes simultaneously a way to leverage the digital to enact decolonial possibilities and a way to show how design and form can be made central to the work of digital humanities.

Drucker, Johanna. “Humanities Approaches to Graphical Display.” Digital Humanities Quarterly, vol. 5, no. 1, 2011, http://digitalhumanities.org/dhq/vol/5/1/000091/000091.html.

In this article Drucker outlines a new theory of data, possible strategies for thinking about visualization within the humanities, and an overview of the fundamental differences between the humanities and the sciences. She is critical of humanities scholars who have borrowed methods and tools for digital and quantitative analysis from the natural sciences without sustained attention to the assumptions and frameworks on which those methods and tools are based, assumptions that often run counter to the intellectual ends sought within the humanities. This difference is captured in Drucker’s concept of “capta”: observation that is “taken,” actively constructed through hermeneutic practices, as opposed to the rational notion of “data” as something assumed to be “given.” Whereas capta champions the “situated, partial, and constitutive character of knowledge production,” data refers to what is already there, available for observation and recording. This fundamental difference between data and capta necessitates a new approach to representing the knowledge each of them produces. In other words, visualizations based on a theory of capta need to be different from those based on data. To these ends, Drucker offers a humanistic approach to rethinking visualization design. Her models for visualizing notions such as ambiguity, hybridity, and uncertainty are illuminating examples of how quantitative approaches can be reconceived to suit a humanities context.

Hayles, N. Katherine. How We Think: Digital Media and Contemporary Technogenesis. University of Chicago Press, 2012.

In this book, Hayles asks fundamental questions about the implications of media upheavals for the humanities. She frames the turn toward the digital as a paradigm shift that cannot be ignored because it affects the work humanists do at every level: from email and bureaucracy to databases, journals, and professional research, and it has also become a core concern in the practices of effective teaching and learning. The movement between print and digital is taken up in detail, and within this context matters such as materiality, embodiment, collaboration, and disciplinarity are given due attention. Hayles is optimistic about the new possibilities for creating knowledge opened up by the digital; however, she is also concerned with preserving the rich resources of print traditions and the centuries of thought, expression, and practice they have supported. To bridge these two seemingly incommensurable worlds, Hayles proposes an approach she terms “Comparative Media Studies,” which attends to the nuances of difference and relation among various forms of media, while also suggesting new modes of curricular design better suited to the post-print world. Fundamentally revolutionary, this book proposes a wholesale rethinking of disciplinary formation within the humanities, of the theories and practices through which knowledge is created and shared, and of the pedagogical imperative to move from a content-oriented curriculum to a problem-centered course structure. Related to this agenda is the concept of technogenesis: the notion that humanity and technology have developed together. In a world of instant communication and ever more information, with pervasive, embedded technologies in nearly all facets of life, technogenesis captures the changes in human relations and human understanding brought about by digital technology.
This book is very generative in thinking deeply about the implications of the digital and the ways in which change must be effected, at all levels of operation, for the humanities to carve out a compelling role in the digital age.

Klein, Lauren F. “The Image of Absence: Archival Silence, Data Visualization, and James Hemings.” American Literature, vol. 85, no. 4, Jan. 2013, pp. 661–88. CrossRef, doi:10.1215/00029831-2367310.

Klein’s article makes timely interventions in American literary studies, in debates about how we read texts, and in the possibilities of digital archival studies. She draws a parallel between network visualization and “surface reading” to center attention on the presence of slavery in the Papers of Thomas Jefferson. Using Protovis, a toolkit developed by the Stanford Visualization Group, Klein finds evidence of the central role occupied by a household slave, James Hemings, in shaping Jefferson’s daily life. In stark contrast with the wealth of detail about Jefferson’s life and the vast volume of documents that surround him, Hemings’s life story has been forever lost to an archive that was unconcerned with preserving it. Klein’s success in finding Hemings’s ghostly presence in the Jefferson archive underscores how the kinds of archival deformation enabled by digital tools can be used in American studies. Her ability to produce critique using two widely contested methods, quantitative analysis and Best and Marcus’s “surface reading,” shows how we can and should blend digital tools with critical conceptual frameworks. I particularly admire her deeply thoughtful approach to using tools borrowed from the sciences and her consistent insistence on the limits of digital tools and infrastructure.

Manovich, Lev. The Language of New Media. MIT Press, 2001.

Published in 2001, Manovich’s book occupies a contradictory space: it rings deeply true and, at the same time, feels tragically dated. It is particularly useful for its definitions of new media and its comparative overview of two key moments in media history: the advent of cinema and the advent of the computer. However, because of its gestation in a pre-Facebook, pre-Amazon, and early-Google context, Manovich is unable to consider the structural role occupied by powerful monopolistic corporations in determining how the internet and digital media are used today. Criticisms aside, Manovich takes a highly interdisciplinary approach in his analysis, which he calls “digital materialism,” drawing on diverse approaches to cultural analysis such as literary studies, film theory, media studies, and Marxism, among others. All things considered, I find this book to be a very good overview of media history with lucid descriptions of key terms, and a wonderful theoretical foundation for considering the role of media technology in shaping society and culture.

Rusert, Britt. “New World: The Impact of Digitization on the Study of Slavery.” American Literary History, vol. 29, no. 2, May 2017, pp. 267–86.

Tufte, Edward R. Envisioning Information. Graphics Press, 1990.

Wyman, Bruce, et al. “Digital Storytelling in Museums: Observations and Best Practices: Digital Storytelling in Museums.” Curator: The Museum Journal, vol. 54, no. 4, Oct. 2011, pp. 461–68. CrossRef, doi:10.1111/j.2151-6952.2011.00110.x.


Drawing Maps

Posted by Natasha Ochshorn on

The readings on modeling this week reminded me of a book I was given a few years ago, Andrew DeGraff’s Plotted: A Literary Atlas. In it, DeGraff maps the journeys of characters from the literary canon. What interested me, looking at it again after doing these readings, was that he mapped not only more traditional journeys (Huck Finn down the river, Frederick Douglass to freedom) but also the emotional journeys of the characters in Pride and Prejudice, the spatial ones of A Wrinkle in Time, and the temporal ones of A Christmas Carol. This was more an artistic project than an academic one, but it was fun to see how the author/artist used maps and graphs to enhance his personal relationship with these books.

Also interesting was that even in this obviously subjective/artistic work, DeGraff discusses the limitations of his project.

Much as the skyline of New York creates a rough map of the bedrock that it rests upon, or in the way that a map of the London Tube can tell you where the population centers are, these maps provide a sense of contour – sometimes literal and sometimes metaphorical – for their literary inspirations. 


Bookworm Assignment – New Words in the World

Posted by clararamazzotti on

Hi everyone!

I tried both Bookworm and Voyant, just to be sure I was using these kinds of tools properly.

My first idea was to look at several kinds of words related to gender (female/woman/girl/wife) in correlation with sexuality (sex/love/candor/prude/licentious/libertine/womanizer) and deviance (the idea came to me while thinking about Freud’s works and about Bleuler, who coined the word “schizophrenia”; for a long time these words and the female gender were linked to each other), but I was not sure how I could do that.

So I chose a topic: I was curious about how often female characters in literature were linked to words belonging to psychiatric or sexual deviance, and my first thought was of Gothic works published between 1890 and 1920. I used Dracula by Bram Stoker, where the main female character has a very strong fascination with a mysterious sexuality (in a great struggle because of the prudery of the time).

I know, it’s not a very happy topic, but I thought it would be cool to find out, through computer analysis, which kinds of words were used most!

I tried Voyant, but I was not very satisfied. I don’t know if I did something wrong; I wanted the system to let us choose the words to analyze or to put some words in correlation, but the results were not very interesting (they just confirmed something I already knew about the book).

The system analyzed Dracula and found, of course, a lot of verbs, but in particular a heavy use of “SHALL”. Lucy was mentioned 225 times, while the name Dracula appeared 188 times (probably because he’s like Voldemort: “you don’t have to say his name, he’s a monster!”). I chose several words for an analysis.

After many experiments I decided not to pursue my first idea (I have to learn how to use these tools better) and tried with Dracula alone. So, we have eyes + Dracula + face. In this segment you can find Lucy and Mina going up and down, in a sort of dance.
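Voyant’s Trends view plots how often chosen terms appear across successive segments of a document, which is what produces the up-and-down “dance” of Lucy and Mina. The sketch below is a simplified stand-in, not Voyant’s code: it assumes plain-text input, splits the token stream into a fixed number of roughly equal slices, and reports raw counts rather than relative frequencies. The `segment_trends` function and the toy sentence are hypothetical.

```python
import re


def segment_trends(text, terms, n_segments=10):
    """Split the token stream into n_segments slices and count each term in every slice."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Integer boundaries give exactly n_segments slices (some may be empty for very short texts).
    bounds = [len(tokens) * i // n_segments for i in range(n_segments + 1)]
    segments = [tokens[bounds[i]:bounds[i + 1]] for i in range(n_segments)]
    # Raw counts per slice; dividing by len(segment) would give relative frequencies instead.
    return {term: [seg.count(term.lower()) for seg in segments] for term in terms}


# Toy text standing in for the novel; a real run would read Dracula from a file.
toy = "Lucy Lucy Mina eyes Mina Mina face Lucy"
trends = segment_trends(toy, ["Lucy", "Mina"], n_segments=4)
for term, row in trends.items():
    print(term, row)
```

Plotting each row against its segment index would give a line chart like the one Voyant draws.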

So I thought, “OK, and Bookworm?” I thought this second tool could be more interesting, but maybe the topic could have been better 🙂 In this case I just entered “sex” and “schizophrenia” for a limited time span (1890-1910), just to look at the number of books that used these words in the same period in which Stoker wrote his novel.

Guys, they wrote a lot about sex:


Unfortunately, I have no idea which kinds of books Bookworm analyzes, or by what criteria, so they could be anything: medical books (which would make sense), novels (which would be less common), and so on.
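Since Bookworm’s corpus and criteria are opaque here, the following is only a hedged sketch of the kind of question being asked: for each year in a range, how many books contain a given word at least once. The record format, the `books_per_year` function, and the sample texts are all invented for illustration and say nothing about Bookworm’s actual data.

```python
import re
from collections import Counter


def books_per_year(books, term, start, end):
    """For each year in [start, end], count the books that contain `term` at least once."""
    # Word boundaries stop "sex" from matching inside longer words such as "essex".
    pattern = re.compile(r"\b" + re.escape(term.lower()) + r"\b")
    counts = Counter()
    for year, text in books:
        if start <= year <= end and pattern.search(text.lower()):
            counts[year] += 1
    return dict(sorted(counts.items()))


# Invented records standing in for a real catalogue: (publication year, full text).
books = [
    (1897, "a novel that mentions sex once"),
    (1897, "a treatise on nerves"),
    (1905, "an essay on sex and society"),
    (1915, "a later book that mentions sex"),
]
print(books_per_year(books, "sex", 1890, 1910))
```

Raw counts like these mostly track how many books were published each year, so a fuller version would divide by the total number of books per year before comparing terms.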



Annotated Bibliography – Big Data Impact on Literature

Posted by clararamazzotti on

Clara Ramazzotti

Prof. Matthew K. Gold

Approaching to Digital Humanities, 4 credits

March 6th, 2018

Topic: Big Data: their first impact on literary studies and how they could be used in the Digital Humanities.

Thesis statement: Digital Humanities is a broad category that includes several things not yet clearly defined. In this context, the questions and experiments about what Big Data could do in the Humanities seem very stimulating, particularly in the period between 2005 (the launch of Google Books) and 2017. I focused my research on sources from 2015 to the end of 2017. What I have understood so far is that the great quantity of information and digitized material seems to be at the center of an intellectual struggle in which scholars are asking one another whether the sheer “quantity” of information, as opposed to its “quality,” is what we need to do better work on literary sources. I found the articles listed below interesting because some of them provide possible answers or explanations regarding the Big Data/Literature duo (Mazzola, Lei Zeng), while others try to understand how to use data at its best, as business, industry, and scientific researchers have done.

Annotated Bibliography

Crane, Gregory. “What Do You Do with a Million Books?” D-Lib Magazine, 2006. http://www.dlib.org/dlib/march06/crane/03crane.html. Last access: 03/05/2018

The ability to extract from the stored record of humanity useful information in an actionable format for any given human being of any culture at any time and in any place will not emerge quickly, but the fundamental tools on which such a system would be built are moving forward.

Multiple options for the same text: this is what a digital library could offer. In 2006 Crane discussed how a digital library could work, offering a still-valid critique of the way books and sources are reproduced (rooted in the tradition of print, as a mere re-creation of a printed book as a PDF or HTML text online). Traditional research was obviously much more limited in its ability to meet particular needs, and of course computers need much less time to meet them.

Ganascia, Jean-Gabriel. “The Logic of the Big Data Turn in Digital Literary Studies.” Frontiers in Digital Humanities, 2015. https://www.frontiersin.org/articles/10.3389/fdigh.2015.00007/full. Last access: 03/05/2018

But, what does “big” mean for the Digital Humanities? A million, a billion, and a trillion bytes are small compared to the Terabytes and Petabytes that are usually considered as the standard for “big data.” In the case of Digital Literary Studies, the total number of texts that can be characterized as literary works, including novels, poetry, and theater, does not exceed a few million books, which has been seen characterized as a delimiting horizon.

Thanks to this article, it’s possible to clarify where the Digital Humanities started: proposing technologically equipped methodologies for activities where, for centuries, intuition and intelligent handling had played a predominant role. Big Data, in this context, reveal how these new approaches can be applied to traditional scholarly disciplines such as literature, and what digitization allows; they also raise questions about the nature of the Humanities in general. In fact, the main value of Big Data is that they can renew the Humanities through the use of computers.

Kaplan, Frédéric. “A Map for Big Data Research in Digital Humanities.” Frontiers in Digital Humanities, 2015. https://www.frontiersin.org/articles/10.3389/fdigh.2015.00001/full. Last access: 03/05/2018

Will we learn more by analyzing 10 millions books that we cannot read individually or by reading five carefully (Moretti 2005)?

The author chose to analyze how humanists processed and interpreted data, explaining the differences between Big Data and Small Data: research in Big Data usually means a focus on large or dense cultural datasets, which call for new processing and interpretation methods. Small Data, on the other hand, groups together more focused works that do not use massive data-processing methods and that also explore interdisciplinary dimensions linking computer science and humanities research.

Lei Zeng, Marcia. “Smart Data for Digital Humanities.” Journal of Data and Information Science, vol. 2, no. 1, 2017, pp. 1–12. https://doi.org/10.1515/jdis-2017-0001. Last access: 03/05/2018

“Data is the new oil” (Humby, 2006) has become a defining phrase used by many in recent years as the evidence became more and more convincing. “However, in its raw form, data is just like crude oil; it needs to be refined and processed in order to generate real value. Data has to be cleaned, transformed, and analyzed to unlock its hidden potential” (TiECON East, 2014). […] advanced technologies, under the umbrella of Big Data and Smart Data, allow researchers of the humanities to join the mainstream of the digital age with new abilities as never before.

I chose this article because it clarifies some issues in the use of Big Data in the Humanities, and it explains in a very simple way the differences between Smart and Big (quality and quantity) and what we are probably still missing to achieve our goals (Lei Zeng discusses technological issues, for example).
The author anticipates a very positive evolution in DH’s use of “Smart Data,” the ability to achieve big insights from trusted, contextualized, relevant, cognitive, predictive, and consumable data at any scale, as the only kind of data that could give value in this field. In short, the Smart Data approach is useful for transforming unstructured data into structured and semi-structured data.

Mazzola, Roberto. “Google Books e le scienze (post)umane” [Google Books and the (post)human sciences]. Laboratorio dell’ISPF, XII, 2015. https://doaj.org/article/5c7d4a8edb5941c890c3d30f72a4f568. Last access: 03/05/2018

The extension of this new approach to reality and knowledge to humanistic studies has aroused the resistance of those who find it difficult to accept the idea of reducing a book, a painting, a piece of music, etc., to a mere flow of encoded information.

This article is an interesting analysis of the kind of work done in Europe (Italy, France, Germany) when Google Books came out. The essential questions “how could we use Big Data?” and “how could we save our heritage?” find some clarification in this article. Particularly relevant is how the author explains the passage from the scientific method (hypothesis, then verification) to a new era of data and algorithms, with the possible consequence that the “traditional humanist” comes to be seen as an outdated figure without purpose.
The example of Google Books, and the company’s work to provide free, open access to all knowledge, is really fascinating, but, to use Nicholas Carr’s words, it is more like a library of snippets: pieces of knowledge useful as second-hand quotes.

Moretti, Franco. Canon/Archive. n+1 Foundation, 2017, New York City.

On the basis of programming, much more becomes possible: from the refinement of the corpus to the analysis of initial results; from the review of the critical literature to the design of follow-up experiments. This functional division of labor, whose results no individual scholar could ever achieve in isolation, is clearly indispensable to modern research.

This book could demonstrate that everything is measurable and, thanks to technology, can be studied as in a laboratory. It’s no accident that the first really noteworthy thing in this series of pamphlets is the range of ways a book can be investigated: the way a writer uses paragraphs, the number of a character’s appearances in a story, etc. The Literary Lab is fundamental to understanding how far DH could go: a series of tests and experiments to obtain a result, not just an electronic transfer from paper to USB.
It’s a fascinating source for this topic because a scholar can find in it a method, an attempt to do something relevant for DH, but at the same time the honest perception that DH have presented themselves as a radical break with the past, with the paradox that, in a new approach, not everything has to be new.

Ramsay, Stephen. “The Hermeneutics of Screwing Around; or What You Do with a Million Books.” Digital Culture Books, 2010. http://dx.doi.org/10.3998/dh.12544152.0001.001. Last access: 03/05/2018

There was no way to ask, “Which of these books contains the phrase ‘Frank Zappa?’ ” The fact that we can now do that changes everything, but it doesn’t change the nature of the thing. When we ask that question—or any question, for that matter—we are still searching. We are still asking a question and availing ourselves of various technologies.

Are we always reading the same books? Are we too canonical, and are Big Data perhaps the way to compare and read more than a small percentage? These are the questions this essay raised for me, along with the perception that more digitized books (or cultural products) do not change the critic’s work itself (in-depth research on a topic, or the comparison of sources) but represent a significant evolution in how scholars can do that work, using their time and knowledge across more than one field, topic, or location.

Rojas Castro, Antonio. “Big Data in the Digital Humanities: New Conversations in the Global Academic Context.” AC/E Digital Culture Annual Report, 2017. https://hcommons.org/deposits/item/hc:11759/. Last access: 03/05/2018

We should begin by dismissing certain clichés about the humanities and ask ourselves about their classic objects of study, bearing in mind the methods that are currently available. This requirement is not unrelated to the work of humanists, who have always been in contact with other fringe disciplines such as anthropology, Marxism and gender studies.

Humanists have established a dialogue with computer science and are working on several methods; in this article, the author hopes that literary studies and computer analysis can eventually be reconciled.
In particular, I found it helpful because, as Rojas Castro explains, the classic definition of Big Data is a formula: Volume (Terabytes, Petabytes, Exabytes), Velocity (data that is constantly generated), and Variety (texts, images, sounds). If we take the three Vs as a basis, Big Data do not fit the humanities. But in the literary academic context, the expression Big Data, as we know, is associated more with “distant reading” (Moretti, 2007) or “macroanalysis” (Jockers, 2013), and we could start from this new way of “reading” cultural products.

Schuessler, Jennifer. “Reading by the Numbers: When Big Data Meets Literature.” The New York Times, 10/30/2017. https://www.nytimes.com/2017/10/30/arts/franco-moretti-stanford-literary-lab-big-data.html. Last access: 03/05/2018

[…] scholars need to consider the tens of thousands of books that have been forgotten, a task that computer algorithms and enormous digitized databases have now made possible.

In this 2017 interview with Franco Moretti, there is an interesting series of considerations useful for this study. First of all, literary criticism tends to emphasize the singularity of exceptional works that have stood the test of time (hence the creation and use of a canon), and it also treats literature as a drastic evolution from one period (or author) to another.
What data and computer analysis can do is give us laboratory results that could unsettle established ideas of literary history.

Svensson, Patrik. Big Digital Humanities: Imagining a Meeting Place for the Humanities and the Digital. University of Michigan Press, Ann Arbor, Michigan, 2016. http://dx.doi.org/10.3998/dh.13607060.0001.001. Last access: 03/05/2018

The bigness of big data in the humanities may refer to the number of perspectives inherent in the material and the richness of critical inflection rather than the sheer quantity of data. In addition, the digital humanities has also come to be seen as a site for challenging and renewing the humanities and academy.

Every definition and example, from Google Books to the Stanford Literary Lab, gave me the opportunity to think about DH from different points of view. This book adds another word to the “DH vocabulary” debate: engagement, something that a professional finds in social media insights and in strategic work.
So, DH and Big Data could also be seen as strategic work in which “the digital humanities engages with the digital as a tool, as an object of inquiry, and as an expressive medium.”


Voyant Assignment

Posted by Tristen Goodwin on

Hey everyone,

So my data set to test out Voyant was 215 instruction booklets from a collection of Nintendo 64 games; I thought it would be a fun and interesting exploration to see what would emerge as important. I didn’t know what to expect when trying this, but after seeing the results, I came to understand that these instruction booklets held two different “narratives”.

The thought process that sparked this idea was: how would video games reflect our society in very practical ways? I looked at instruction booklets because they can be seen as the bridge that connects the player to the game itself (if they decide to read the booklet first, of course). The most frequent words in these texts all relate to using a controller, which I was prepared for, so I wanted to look deeper at words that are less frequent but still have some importance. In the visualization above, the words closer to the center are words that matter to the actual game, but as you look farther out, the words that appear belong to a more legalistic form of writing. For example, the words player/s and consumer both show up in the corpus, and both refer to the same entity. But player/s is used more frequently than consumer, which raises the question: why? The use of language is important in all texts, and Voyant makes clear that the text surrounding the center holds just as much importance as the words in the center. There’s a narrative for the players on how to play a game, but there is also a narrative for the company to handle all legal matters concerning its property (the games themselves).
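The player/consumer comparison can be sketched as a grouped word count: count every token across the booklets, then add up the variant forms of each word. This illustrates the idea under stated assumptions and is not Voyant’s method. The two booklet snippets are invented, and the `grouped_counts` helper and the chosen variant forms are hypothetical.

```python
import re
from collections import Counter


def grouped_counts(texts, groups):
    """Count tokens across all texts, then sum the counts of each group's word forms."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return {label: sum(counts[form] for form in forms) for label, forms in groups.items()}


# Invented booklet snippets standing in for the 215 real booklets.
booklets = [
    "Press START to begin. Each player chooses a character.",
    "Players take turns. Warning to the consumer: handle the cartridge with care.",
]
groups = {"player": ["player", "players"], "consumer": ["consumer", "consumers"]}
totals = grouped_counts(booklets, groups)
print(totals)
```

Listing word forms by hand is a crude substitute for lemmatization, but it keeps the comparison transparent.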

If I were to proceed with this experiment, I would perhaps take a corpus of instruction booklets from other companies for comparison, and even expand my view toward the modern day. While this was a simple observation, it does open up new discoveries about how text is formed and what that can mean to its readers.


Voyant with Werewolves

Posted by Zohra Saed on

Marie de France, Lai of the Werewolf

12th-century female poet

Here is how you make medieval texts interesting: select ones about magical monsters (OK, maybe all of them have magical monsters) and then add these fancy things. I tried it with my students and they loved it. This was absolutely the most fun I’ve had with this text. Everything else required downloads or videos that took too long, or was blocked by the DOE! (Yes, I am guilty of doing classwork at work.)



Voyant Trials with Latinx

Posted by Nancy Bocanegra on

This was my first time using Voyant, but it felt good to actually use a database, especially after reading so much about the learning outcomes of using such systems. I’m definitely a more experiential learner.

In my trial, I used a piece from a Latino Studies class, “Mapping and recontextualizing the evolution of the term Latinx: An environmental scanning in higher education” by Cristobal Salinas Jr. & Adele Lozano. It reflects on the term Latinx and its evolution over the years in higher education. I wanted to see what this tool would tell me compared to what I had read.

Cirrus View: The five main words were latinx (236), term (196), gender (97), google (89), and scholar (83).

It allows us to see the main terms of the article, which highlight its key subjects. The number of times these words are used corresponds to the effect the authors hope to have on the reader. Other interesting features were the Correlations and WordTree tools, which in themselves revealed the answers lying in the essay. The word tree also serves as a visual way to trace the root of the main term. Overall, the visuals help identify certain trends in the writing and help analyze key questions one may have.
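A word tree builds on a simple idea, the keyword in context (KWIC): list each occurrence of a term together with a few words on either side. The minimal sketch below assumes plain-text input. The `kwic` function, its `window` parameter, and the sample sentence are hypothetical and are not taken from Voyant or from the article.

```python
import re


def kwic(text, keyword, window=3):
    """Return (left context, keyword, right context) for each occurrence of keyword."""
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits


sample = "The term Latinx appears here, and the term Latinx appears again."
for left, word, right in kwic(sample, "Latinx", window=2):
    print(f"{left:>12} [{word}] {right}")
```

Grouping the right-hand contexts by their shared first words is roughly what turns a list like this into the branches of a tree.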

Chatting with Veliza was also interesting, but I didn’t quite understand how it all worked or the main purpose of this tool. I was just “click happy” and clicked away until something happened.


As I clicked around trying the different settings, I realized that if you configure a setting a certain way, it reverts to the default once you leave it. In Bubblelines, for example, I could add the syntax keywords and select the option to “separate lines with terms,” but when I changed the setting it went back to zero.

Next Steps

For the next tests, I want to use more content to compare texts and find possible correlations, to see how helpful it is to dump a lot of data into a system and be left with more data to analyze and reflect on.



Using Voyant Affectively with Joni Mitchell

Posted by Natasha Ochshorn on

I was originally curious to use Voyant on songs by Joni Mitchell because they are texts that I have a strong emotional response to. Since I am much more emotionally than clinically motivated in academic work, I was curious to use a tool that felt so cold on a text I feel so feverish about, and to see whether it would produce any feeling or excitement in me stronger than “huh, that’s interesting.”

For my sample I used the lyrics from Blue (1971), Court and Spark (1974), and Hejira (1976). They’re among her most critically and commercially acclaimed albums, as well as ones that I happen to know intimately. I also liked, for no real reason other than shape, that another album was released between each of them.

The most commonly used words are not very interesting, although they are perhaps a good lesson in the geography of Mitchell’s music, which is personal (I’m, 55 times), descriptive (like, 54 times), qualifying (just, 43 times), and romantic (love, 39 times). There are also words whose high counts are initially striking but which appear mostly because of the repetitive nature of songwriting. I was initially struck by the frequent occurrence of green, until I thought to check how many of those instances came from the song Little Green (9 out of 11). If Mitchell were more prone to choruses than to the single-line refrains she gravitates towards, I imagine this would be even more noticeable.
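The check on green described above (how many of a word’s occurrences come from a single song) amounts to counting a term per document rather than across the whole corpus. A minimal Python sketch of that idea follows. The toy corpus is invented for illustration (it is not Mitchell’s lyrics), and the simple regular-expression tokenizer is an assumption, not a description of how Voyant tokenizes text.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase, keep runs of letters and apostrophes.
    # This is a rough stand-in, not Voyant's actual tokenizer.
    return re.findall(r"[a-z']+", text.lower())


def term_distribution(docs: dict[str, str], term: str) -> dict[str, int]:
    """Count how often `term` occurs in each document of a corpus."""
    term = term.lower()
    return {title: Counter(tokenize(text))[term] for title, text in docs.items()}


# Hypothetical toy corpus (invented lines, not real lyrics).
docs = {
    "Song A": "green green grass and a green door",
    "Song B": "the sky is blue and the field is green",
    "Song C": "no colour words here at all",
}

dist = term_distribution(docs, "green")
total = sum(dist.values())
top_title, top_count = max(dist.items(), key=lambda kv: kv[1])

print(dist)  # {'Song A': 3, 'Song B': 1, 'Song C': 0}
print(f"{top_count} of {total} occurrences come from {top_title}")
```

If one document accounts for most of the total, as Little Green does for green, the corpus-wide count mostly reflects that one text’s refrain.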

Most immediately striking to me was the high count for oh (37 times), an utterance that is potentially not part of the author’s written lyrics but is instead a rendering on the part of the transcriber. These ohs illuminate a potential problem of text mining, namely that not all content is easily represented as text, but they also illustrate how this tool, perhaps accidentally, can shine a spotlight on something that might otherwise have gone unnoticed. If I had been asked to pick a word to represent these albums, I might have chosen sad (5 times), ambivalent (0 times), or aching (0 times), but oh embraces all of these terms without excluding joy (“oh love can be so sweet”), humor (“oh you’re a mean old Daddy”), or the refrain of wistfulness that runs through River (“oh I wish”, 5 times). It’s not much more than a placeholder between the words, an exhale of yearning, but it was a sweet jolt of recognition to see it writ so large in Voyant’s word cloud.

Interestingly, I found more meaning than in the frequently repeated words in the long list of words that were used only once or twice. These are mostly nouns and adjectives, and taken together it is their immense variation, their lack of frequency, that best illustrates Mitchell’s versatility and specificity as a songwriter. There is a pleasure in scrolling through the list, coming across the word pachyderm, for instance, and remembering that she used it as a bizarre euphemism in Blue Motel Room. The context is still missing. Vain (1 use) is not particularly exciting to me, but used in combination with darling (4 uses) it transforms both words into something more intimate and specific. Perhaps the ability to identify word types or parts of speech would help with this: if you could search for all nouns with their modifiers, for instance. But even knowing what was missing, I was surprised at the strong emotional response I had just to looking at a list of words in alphabetical order, and at the strong sense of the person I got from beautiful stretches like shanty, shadows, shades, sex, settled or pushed, punishing, punched (all 1 use each).

I think the next step I would take with this exploration would be to widen the scope to other artists and see whether the same patterns emerge and the same things feel interesting to me. How would the early work of John Lennon look, with its much more formal genre constraints? What about a writer like Taylor Swift, who brings the personal and observational qualities that Mitchell uses into a pop-music structure?




Mining Indignados’ Blogs

Posted by Pedro Cabello del Moral on


Text mining of the most relevant blogs and websites about the Spanish “Indignados” movement using Voyant Tools.

Selection criterion: Official websites of the movement compiled in https://movimientoindignadosspanishrevolution.wordpress.com/paginas-web-oficiales-del-15m-en-espana/

Official means, in this case, the websites that came directly from the organization (squares and other spontaneous meeting places).

Defining a corpus is not easy; that’s why it is useful to build upon a previously made selection.

Purpose of the experiment: to search for the most common buzzwords or theoretical concepts related to the movement and gathered in its blogs and websites. They will be extracted from the first 250 terms in the frequency list. Words that are not relevant for the experiment (such as verbs, prepositions, or names of people, places, months, or days) will not be taken into account.
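The selection procedure just described (rank the terms, keep the first 250, then set aside irrelevant words by hand) can be sketched in a few lines of Python. This is a hypothetical illustration, not Voyant’s implementation: the `top_terms` function, the tokenizer, the sample sentence, and the exclusion set are all invented here, and a small `n` stands in for 250.

```python
import re
from collections import Counter


def top_terms(text: str, n: int, excluded: set[str]) -> list[tuple[str, int]]:
    """Rank terms by frequency, keep the first n, then drop excluded words.

    The cut-off is applied before the exclusion, as in the procedure
    described above, so the result can hold fewer than n terms.
    """
    tokens = re.findall(r"\w+", text.lower())
    ranked = Counter(tokens).most_common(n)  # ties keep first-seen order
    return [(term, count) for term, count in ranked if term not in excluded]


# Hypothetical sample text and hand-made exclusion list (articles,
# prepositions, pronouns, verbs), standing in for the real corpus.
sample = (
    "The assembly met in the plaza. "
    "The assembly agreed to share the proposal. "
    "Share it, share it widely."
)
excluded = {"the", "in", "to", "it", "met", "agreed"}

print(top_terms(sample, 8, excluded))
# [('share', 3), ('assembly', 2), ('plaza', 1)]
```

Because the exclusion happens after the cut-off, a term such as proposal, which falls just below the top 8 here, never gets a chance to appear; excluding words before ranking would let more candidate concepts into the list. Either order is defensible, but it changes what counts as a “top” term.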


First of all, it should be pointed out that some of the websites and blogs are no longer active (perhaps they can be retrieved using the Internet Archive). Moreover, some of the domains now belong to other businesses or even to other countries. Last but not least, other sites are blocked because of their political content (especially in the U.S.).

Voyant Tools proved unable to process the whole corpus (perhaps due to the above-mentioned limitations). I then decided to conduct the experiment by splitting the corpus into chunks. That option did not work either, so I opted to remove the malfunctioning links from the corpus. That produced the same error. Finally, facing the impossibility of analyzing a comprehensive corpus, I decided to include only 20 websites, representing some of the biggest cities in Spain. After doing that, I was able to visualize the data.

Interpreting the results

Some of the most frequent concepts were Assembly, Consensus, Commission, Share, Proposal(s), Camp, Twitter, Facebook, Minutes, and Plaza. Other buzzwords among the 100 most used terms were Evictions, 15m, Compañeros, Demonstration, and Repression.

When the link map is displayed, the results shown are by no means contradictory. It seems obvious that Assembly is connected to General, Commissions, and Inter. Likewise, it comes as no surprise that Share is associated with Facebook, Twitter, Blog, and Pinterest.
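Voyant’s link map draws connections between words that tend to appear near one another. A rough approximation of that idea is counting collocates, the words found within a fixed window of tokens around each occurrence of a keyword. The Python sketch below assumes a window of two tokens and a toy text; Voyant’s own context size and weighting may differ, so treat this as an illustration of the concept rather than a reproduction of the tool. The `collocates` function is hypothetical.

```python
import re
from collections import Counter


def collocates(text: str, keyword: str, window: int = 2) -> Counter:
    """Count tokens found within `window` tokens on either side of `keyword`.

    Every occurrence of the keyword contributes its neighbours, so a word
    near two occurrences is counted twice.
    """
    tokens = re.findall(r"\w+", text.lower())
    keyword = keyword.lower()
    counts: Counter = Counter()
    for i, token in enumerate(tokens):
        if token != keyword:
            continue
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        counts.update(left + right)
    return counts


# Hypothetical toy text, not taken from the Indignados corpus.
sample = "Share on Facebook. Share on Twitter. The general assembly meets today."

print(collocates(sample, "share"))     # Counter({'on': 3, 'facebook': 2, 'twitter': 1})
print(collocates(sample, "assembly"))
```

The window size is a judgment call: a narrow window favours fixed pairings (such as a platform name right after Share), while a wider one picks up looser thematic associations.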

Looking at the bubble lines, one unfortunately realizes that the outputs are mainly concentrated in a few websites (Bilbao, Soria, Salamanca, Guadalajara, and Vigo), most probably because those sites are more text-mining friendly. That could simply be because they present more text on their home page.


One of the unusual findings is that Share is the most used concept (98 times), followed by Assembly (56 times). Thus, one can conclude that text mining tools applied to social movements’ archives help discover widely used words that give an account of the concepts shaping collective political subjectivities, concepts that would probably otherwise go unnoticed. For instance, quantitative analysis shows that Share not only has a practical use, connected to the idea of spreading the word through social media; it also conveys the idea of a sharing community.

It is clear that for a proper analysis, the selection and preparation of the corpus is fundamental. The rendering of the data is affected by minimal changes to the corpus. Perhaps my modest experiment with Voyant cannot serve the purpose of generalizing assumptions. In this respect, the relationship between signals and concepts was not fully developed and appeared weak in my case study. However, signals such as Consensus, Assembly, Share, Proposal, or Plaza configured a linguistic map around certain concepts that are definitely related to the new political subjectivity of the “Indignados” movement. Some of them were evident, others surprising, and many remain undiscovered, waiting for future research.
