Gender and Big Data: Finding or Making Stereotypes?

In collaboration with Visualizing English Print and the Middle Modernity Group, the Digital Humanities Research Network is pleased to present the Center for the History of Print and Digital Culture Annual Lecture:

“Gender and Big Data: Finding or Making Stereotypes?”

Laura Mandell (Professor of English and Director of the Initiative for Digital Humanities, Media, and Culture, Texas A&M University)

On Friday, April 15, NovelTM member Laura Mandell presented at UW-Madison on Gender and Big Data. On Tuesday, April 26, there will be a response to her lecture:

Gender, Big Data, and Digital Humanities:
Responding to Laura Mandell
Tuesday, April 26, 2016
4:00-6:00 p.m.
Memorial Library Commons, Room 460


In his book Macroanalysis, Matthew Jockers argues that we have reached a “tipping point.” Now that we have so much digital data, we can use the techniques and methodologies developed for exploring big data: text mining, topic modeling, machine learning, named entity recognition, etc. Two problems confront digital literary historians of women writers who wish to apply these methodologies. First, the number of women writers who published works before 1800 in Britain and America, as well as the number of their publications that have been preserved, is small compared to that of men, a problem compounded by how few works by early modern women writers are currently being digitized: roughly 4% of the 307,000 volumes in Early English Books Online and Eighteenth Century Collections Online were written by women. Second, many of the data analysts currently comparing what they call “female writing” to “male writing” propagate rather than interrogate stereotypes about women and women writers.

Sociologists have worked on such problems, and in this talk, I will outline some of their strategies and discuss how literary critics who wish to perform macroanalysis might make use of them. Data scientists in the commercial world have also worked on the problem of representing minorities “fairly” even when they are represented by a small sample. First, thanks to the robust history of feminist theory and criticism, we have the means for generating vocabularies, taxonomies, and ontologies for semantic searching and supervised topic modeling that differ from those generated through big-data techniques that naïvely privilege historically oppressive discourses. Second, the need to shift from quantitative to qualitative analysis (and back again) is augmented when analyzing textual data produced by minorities. I argue that, once again, the concern for social justice enhances intellectual work by effectively demonstrating the inadequacies of claiming “new” discoveries based upon “statistical significance” alone.



Laura Mandell is a Professor of English at Texas A&M University, where she also directs the Initiative for Digital Humanities, Media, and Culture. Her 2015 monograph, Breaking the Book: Print Humanities in the Digital Age, explores the cognitive consequences and emotional effects of human interactions with physical books and reveals why the traditional humanities disciplines are resistant to digital humanities. She is also the author of Misogynous Economies: The Business of Literature in Eighteenth-Century Britain (1999) as well as the editor of a Longman Cultural Edition of The Castle of Otranto and The Man of Feeling. Her article in New Literary History, “What Is the Matter? What Literary History Neither Hears Nor Sees,” describes how digital work can be used to conduct research into conceptions informing the writing and printing of eighteenth-century poetry. She is Project Director of the Poetess Archive, an online scholarly edition and database of women poets, 1750-1900, Director of 18thConnect, and Director of ARC, the Advanced Research Consortium overseeing NINES, 18thConnect, and MESA. Her current research involves developing new methods for visualizing poetry, developing software that will allow all scholars to deep-code documents for data-mining, and, as part of the eMOP project, improving OCR software for early modern and 18th-century texts via high performance and cluster computing.