By Ted Underwood and David Bamman
1500-word abstract of a paper delivered Sat, Jan 9th, at MLA 2016, in a panel with Deidre Lynch and Andrew Piper.
By visualizing course evaluations, Ben Schmidt has reminded us how subtly (and irrationally) descriptions of real people are shaped by gendered expectations. Men are praised for being funny, and condemned for being boring. Women are praised for being helpful, and condemned for being strict.
Fictional characters are never simply imagined people; they’re also aspects of novelistic form (Lynch 1998). But gendered patterns of description do appear in fiction, and it might be interesting to know how those patterns have changed. This also happens to be a problem where natural language processing can help us, since English pronouns have grammatical gender. (The gender of “me” is a trickier problem; for the purposes of this paper, we have regretfully set first-person narrators aside.)
We used BookNLP (a pipeline developed in Bamman et al. 2014a) to identify characters and the words connected to them. We applied it to 45,000 works of fiction distributed (unevenly) over the period 1780-1989. (The works themselves were partly drawn from HathiTrust and partly located at the Chicago Text Lab.) BookNLP does make errors (Vala et al. 2015), and any analysis on this scale will miss a great deal that is implied rather than said. But readers are so interested in character that it may be worth putting up with some gaps and uncertainties in order to glimpse broad historical patterns.
We asked, first, how strongly characterization is shaped by gender, and how that pressure waxed or waned across time. For instance, if you didn’t have names or pronouns, or tautological clues like “her Ladyship” and “her girlhood,” how easy would it be to infer a character’s (grammatical) gender from the apparently-genderless verbs, nouns, and adjectives associated with her?
One way to find out is to train a model to predict gender just from those implicit clues, testing it against the ground truth established by pronouns. When we do this, a long-term trend is perceptible: the linguistic differences between male and female characters get clearer up to the middle of the nineteenth century, and then slowly get blurrier, through at least the 1980s.
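To make the shape of that test concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the model behind the results above: the abstract does not name a classifier, so a small naive Bayes model stands in for it, and the word list `GENDERED_CLUES`, the toy characters, and all function names are hypothetical. The idea is only to show the three steps: strip the explicit clues, train on what remains, and score predictions against the gender the pronouns supply.

```python
import math
from collections import Counter

# Hypothetical list of explicit clues to hide from the model (an assumption;
# a real list would be much longer).
GENDERED_CLUES = {"he", "she", "him", "her", "his", "hers",
                  "ladyship", "girlhood", "mr", "mrs"}

def strip_clues(words):
    """Drop pronouns and other tautological markers of gender."""
    return [w for w in words if w.lower() not in GENDERED_CLUES]

def train(characters):
    """characters: list of (words, label) pairs; labels come from pronouns."""
    word_counts = {}
    label_counts = Counter()
    for words, label in characters:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(strip_clues(words))
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return word_counts, label_counts, vocab

def predict(model, words):
    """Return the label with the highest add-one-smoothed log probability."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in strip_clues(words):
            if w in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, characters):
    """Share of characters whose predicted label matches the pronoun label."""
    hits = sum(predict(model, words) == label for words, label in characters)
    return hits / len(characters)

# Toy data, invented purely for illustration.
TRAIN = [
    (["she", "smiled", "wept", "felt"], "f"),
    (["her", "smiled", "felt", "sighed"], "f"),
    (["he", "rode", "shouted", "grinned"], "m"),
    (["his", "rode", "grinned", "marched"], "m"),
]
HELD_OUT = [
    (["she", "wept", "sighed"], "f"),
    (["he", "marched", "rode"], "m"),
]

if __name__ == "__main__":
    model = train(TRAIN)
    print(accuracy(model, HELD_OUT))
```

Running the same procedure separately on the characters from each decade, and comparing the held-out accuracy across decades, is one way a trend like the one described above could be traced: higher accuracy would suggest that gender is more legible in the implicit vocabulary of that period.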