Annotated Bibliography – Big Data Impact on Literature

Clara Ramazzotti

Prof. Matthew K. Gold

Approaching to Digital Humanities, 4 credits

March 6, 2018

Topic: Big Data, their first impact on literary studies, and how they could be used in the Digital Humanities.

Thesis statement: Digital Humanities is a broad category that includes several practices not yet clearly defined. In this context, the doubts and experiments about what Big Data could do in the Humanities seem very stimulating, in particular in the period between 2005 (the launch of Google Books) and 2017. I focused my research on sources from 2015 to the end of 2017. What I have understood so far is that the great quantity of information and digitized materials seems to be at the center of an intellectual struggle, in which scholars are asking each other whether the great "quantity" of information, understood in contrast with its "quality", is what we need to work better on literary sources. I found the articles listed here interesting because some of them provide possible answers or explanations about the Big Data/Literature duo (Mazzola, Lei Zeng), while others try to understand how to use data at its best, as business, industry, and scientific researchers have done.

Annotated Bibliography

Crane, Gregory. What Do You Do with a Million Books? D-Lib Magazine, 2006. http://www.dlib.org/dlib/march06/crane/03crane.html. Last access: 03/05/2018

The ability to extract from the stored record of humanity useful information in an actionable format for any given human being of any culture at any time and in any place will not emerge quickly, but the fundamental tools on which such a system would be built are moving forward.

Multiple options on the same text: this is what a digital library could offer. Writing in 2006, Crane discussed how a digital library could work, offering a still valid critique of the way books and sources are reproduced (rooted in the tradition of print, as the mere recreation of a printed book as a PDF or HTML text online). Traditional research was obviously much more limited in its ability to meet a reader's particular needs, and computers need much less time to do so.

Ganascia, Jean-Gabriel. The Logic of the Big Data Turn in Digital Literary Studies. Front. Digit. Humanit., 2015. https://www.frontiersin.org/articles/10.3389/fdigh.2015.00007/full. Last access: 03/05/2018

But, what does “big” mean for the Digital Humanities? A million, a billion, and a trillion bytes are small compared to the Terabytes and Petabytes that are usually considered as the standard for “big data.” In the case of Digital Literary Studies, the total number of texts that can be characterized as literary works, including novels, poetry, and theater, does not exceed a few million books, which has been seen characterized as a delimiting horizon.

This article helps to clarify where the Digital Humanities started: by proposing technologically equipped methodologies in activities where, for centuries, intuition and intelligent handling had played a predominant role. In this context, Big Data reveal how these new approaches can be applied to traditional scholarly disciplines such as Literature, and what digitization allows; they also raise questions about the nature of the Humanities in general. In fact, the main value of Big Data is that, through the use of computers, they can renew the Humanities.

Kaplan, Frédéric. A map for big data research in digital humanities. Front. Digit. Humanit., 2015. https://www.frontiersin.org/articles/10.3389/fdigh.2015.00001/full. Last access: 03/05/2018

Will we learn more by analyzing 10 millions books that we cannot read individually or by reading five carefully (Moretti 2005)?

The author analyzes how humanists process and interpret data, explaining the differences between Big Data and Small Data: research on Big Data usually focuses on large or dense cultural datasets, which call for new processing and interpretation methods. Small Data, on the other hand, groups together more focused works that do not use massive data processing methods and that also explore interdisciplinary dimensions linking computer science and humanities research.

Lei Zeng, Marcia. Smart Data for Digital Humanities. Journal of Data and Information Science, Volume 2, Issue 1, Pages 1–12, 2017. https://doi.org/10.1515/jdis-2017-0001. Last access: 03/05/2018

“Data is the new oil” (Humby, 2006) has become a defining phrase used by many in recent years as the evidence became more and more convincing. “However, in its raw form, data is just like crude oil; it needs to be refined and processed in order to generate real value. Data has to be cleaned, transformed, and analyzed to unlock its hidden potential” (TiECON East, 2014). […] advanced technologies, under the umbrella of Big Data and Smart Data, allow researchers of the humanities to join the mainstream of the digital age with new abilities as never before.

I chose this article because it clarifies some issues in the use of Big Data in the Humanities, and it explains in a very simple way the differences between Smart and Big (quality and quantity) and what we are probably missing to achieve our goals (Lei Zeng discusses technological issues, for example).
The author hopes for a very positive evolution in DH's use of "Smart Data" (the ability to achieve big insights from trusted, contextualized, relevant, cognitive, predictive, and consumable data at any scale), as the only kind of data that could give value in this field. In short, the Smart Data approach is useful for transforming unstructured data into structured and semi-structured data.

Mazzola, Roberto. Google Books e le scienze (post)umane [Google Books and the (post)human sciences]. Laboratorio dell'ISPF, XII, 2015. https://doaj.org/article/5c7d4a8edb5941c890c3d30f72a4f568. Last access: 03/05/2018

The extension of this new approach to reality and knowledge to humanistic studies has provoked resistance from those who find it difficult to accept the idea of reducing a book, a painting, a piece of music, etc., to a mere flow of encoded information.

This article offers an interesting analysis of the kind of work done in Europe (Italy, France, Germany) when Google Books came out. The essential questions "how could we use Big Data?" and "how could we save our heritage?" find some clarification in this article. Particularly relevant is how the author explains the passage from the scientific method (hypothesis and verification) to the new era of data and algorithms, with the possible consequence of seeing the "traditional humanist" as an outdated figure without purpose.
The example of Google Books, and of the company's effort to provide free, open access to all knowledge, is really fascinating, but the result is more like a library of snippets, to use Nicholas Carr's words: pieces of knowledge useful as second-hand quotes.

Moretti, Franco. Canon/Archive. n+1 Foundation, 2017, New York City.

On the basis of programming, much more becomes possible: from the refinement of the corpus to the analysis of initial results; from the review of the critical literature to the design of follow-up experiments. This functional division of labor, whose results no individual scholar could ever achieve in isolation, is clearly indispensable to modern research.

This book seems to demonstrate that everything is measurable and, thanks to technology, can be studied as in a laboratory. It is no accident that the first noteworthy thing in this series of pamphlets is the range of ways in which a book can be investigated: the way a writer uses paragraphs, the number of times a character appears in a story, and so on. The Stanford Literary Lab is fundamental to understanding how far DH could go: a series of tests and experiments aimed at obtaining a result, not just an electronic transfer from paper to USB drive.
It is a fascinating source for this topic because a scholar can find in it both a method, an attempt to do something relevant for DH, and an honest awareness that DH have presented themselves as a radical break with the past, with the paradox that, in a new approach, not everything has to be new.

Ramsay, Stephen. The Hermeneutics of Screwing Around; or What You Do with a Million Books. Digital Culture Books, 2010. http://dx.doi.org/10.3998/dh.12544152.0001.001. Last access: 03/05/2018

There was no way to ask, “Which of these books contains the phrase ‘Frank Zappa?’ ” The fact that we can now do that changes everything, but it doesn’t change the nature of the thing. When we ask that question—or any question, for that matter—we are still searching. We are still asking a question and availing ourselves of various technologies.

Are we always reading the same books? Are we too canonical, and are Big Data perhaps the way to compare and read more than a small fraction of literature? These are the questions raised by this essay, together with the perception that having more digitized books (or cultural products) does not change the critic's scholarly work itself (in-depth research on a topic, or the comparison of sources), but does significantly change the way scholars can do it, using their time and knowledge across more than one field, topic, or location.

Rojas Castro, Antonio. Big Data in the Digital Humanities. New conversations in the global academic context. AC/E Digital Culture Annual Report, 2017. https://hcommons.org/deposits/item/hc:11759/. Last access: 03/05/2018

We should begin by dismissing certain clichés about the humanities and ask ourselves about their classic objects of study, bearing in mind the methods that are currently available. This requirement is not unrelated to the work of humanists, who have always been in contact with other fringe disciplines such as anthropology, Marxism and gender studies.

Humanists have established a dialogue with computer science and are working with several methods; in this article, the author hopes that literary studies and computational analysis can eventually be reconciled.
I found it particularly helpful because, as Rojas Castro explains, the classic definition of Big Data is a formula of three Vs: Volume (terabytes, petabytes, exabytes), Velocity (data that are constantly generated), and Variety (texts, images, sounds). If we take the three Vs as a basis, Big Data do not fit the humanities. In the literary academic context, however, the expression "Big Data" is associated more with "distant reading" (Moretti, 2007) or "macroanalysis" (Jockers, 2013), and we could start from this new way of "reading" cultural products.

Schuessler, Jennifer. Reading by the Numbers: When Big Data Meets Literature. The New York Times, 10/30/2017. https://www.nytimes.com/2017/10/30/arts/franco-moretti-stanford-literary-lab-big-data.html. Last access: 03/05/2018

[…] scholars need to consider the tens of thousands of books that have been forgotten, a task that computer algorithms and enormous digitized databases have now made possible.

This 2017 interview with Franco Moretti offers a series of considerations useful for this study. First of all, literary criticism tends to emphasize the singularity of exceptional works that have stood the test of time, as in the creation and use of a canon, and it also treats literature as a series of drastic shifts from one period (or author) to another.
What data and computer analysis could do is give us laboratory results that could unsettle established ideas of literary history.

Svensson, Patrick. Big Digital Humanities: Imagining a Meeting Place for the Humanities and the Digital. University of Michigan Press, Ann Arbor, Michigan, 2016. http://dx.doi.org/10.3998/dh.13607060.0001.001. Last access: 03/05/2018

The bigness of big data in the humanities may refer to the number of perspectives inherent in the material and the richness of critical inflection rather than the sheer quantity of data. In addition, the digital humanities has also come to be seen as a site for challenging and renewing the humanities and academy.

Every definition and example, from Google Books to the Stanford Literary Lab, gave me the opportunity to think about DH from different points of view. This article adds another word to the "DH vocabulary" debate: engagement, something a professional finds in social media insights and in strategic work.
So DH and Big Data could also be seen as strategic work in which the digital humanities engages with the digital as a tool, as an object of inquiry, and as an expressive medium.
