DHC Weekly 1/18/19: text analysis and JSTOR

January 18, 2019
 
Hello Digital Humanities fans, and welcome to part two of Things Sylvia Learned at the MLA Conference! This week I want to talk about a new text analysis tool that JSTOR is developing, and the ways it might broaden the idea of a “text” to include the critical thought that surrounds it, while also destabilizing that critical thought’s location within a static discipline or field.
 
Perhaps you, like me, use the JSTOR website regularly without having noticed the “Tools” tab, hiding out next to “Advanced Search” and “Browse.” Last week at the MLA, the kind people staffing the JSTOR booth were good enough to give me an introduction to the tools in beta that are already accessible and usable on the site. One of these tools, the Text Analyzer, lets a user search JSTOR by uploading a document, which an algorithm digests and turns into search keywords. It has been giving me a little trouble (these tools are still in beta!), but while I navigate the troubleshooting process, I want to talk about the other new tool, the JSTOR Understanding Series.
 
I also won their T-shirt raffle, but I swear I was going to write about these tools anyway. This is NOT T-shirt-sponsored content.
 
The JSTOR Understanding Series takes (public domain) texts in their entirety, and displays them with a little number next to each line - this number is the number of articles on JSTOR that cite or quote that line. So a pretty famous section of Romeo and Juliet, for example, might look like this:
 
Text from the balcony scene from Romeo and Juliet is on the left, on the right a column of blue hyperlinked numbers corresponding to each line
 
Romeo’s line “she speaks,” perhaps unsurprisingly, is quoted only once, whereas Juliet’s much more famous “What’s in a name? That which we call a rose / By any other word would smell as sweet,” has a significantly higher count, with 78 articles quoting the full line and 51 pulling just the pithier “a rose / by any other word…” Clicking on the numbers lets you see the articles that quote the line or passage:
 
On the left, the same text from Romeo and Juliet as the last image, on the right a dialogue box with the information of the one article quoting the line "she speaks"
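For the programmatically curious, the per-line counts described above can be roughly sketched in a few lines of Python. This is only an illustration of the idea, not JSTOR's actual method: I am assuming a naive exact-phrase match after stripping punctuation and case, and the function names and the two sample "articles" are hypothetical, invented for the example.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and replace every run of non-alphanumerics with one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def quoting_articles(lines, articles):
    """Map each 1-based line number to the titles of articles quoting that line.

    Assumption: an article "quotes" a line if the normalized line appears
    as a whole-word phrase in the normalized article text.
    """
    # Pad with spaces so a phrase can only match on word boundaries.
    normalized = {title: f" {normalize(body)} " for title, body in articles.items()}
    result = {}
    for number, line in enumerate(lines, start=1):
        needle = normalize(line)
        result[number] = [
            title
            for title, body in normalized.items()
            if needle and f" {needle} " in body
        ]
    return result


# Hypothetical sample data, made up for this sketch.
play_lines = [
    "She speaks.",
    "What's in a name? That which we call a rose",
    "By any other word would smell as sweet.",
]
sample_articles = {
    "Hypothetical botany note": (
        "As Juliet asks, what's in a name? That which we call a rose "
        "may be reclassified."
    ),
    "Hypothetical drama essay": (
        "Romeo whispers 'She speaks' before Juliet wonders: What's in a name? "
        "that which we call a rose / By any other word would smell as sweet."
    ),
}

result = quoting_articles(play_lines, sample_articles)
for number, titles in result.items():
    # The count is the little number beside each line; the titles are
    # what a reader would see after clicking it.
    print(number, len(titles), titles)
```

A real system would presumably need fuzzier matching to catch partial quotations, variant spellings, and different editions, which this sketch makes no attempt at.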
 
The value of this tool is multifaceted. I, for example, would have loved to have known about it when I was writing my senior seminar paper as a Barnard English major. The assignment was not only to present a holistic reading of a play by Shakespeare, but also to situate that reading within the existing field of criticism, a difficult task considering that the field of Shakespeare scholarship has had 400 years to become a dense and sprawling forest in which it is remarkably difficult for an undergraduate to find their way. To search, say, “Antony and Cleopatra” on a library database or JSTOR is to uncover far more reading material than it is possible to comb through, even for a thesis paper. With a tool like this, however, a student can allow their own reading of and interests within a text to guide them towards the relevant critical material.
 
The JSTOR Understanding Series tool effectively broadens the idea of “the text”, at least in the context of critical writing, in which the text could somewhat loosely be defined as the material under the lens of thought, to include the existing scholarship around it. One could, of course, argue that this broadening is already implicit in an assignment that asks a student to engage with existing literary criticism, as that student is hardly bringing different skills or attitudes to their close readings of the source text than they are to the critics - fundamentally, the work of interpretation needed to advance a reading of a play is pretty similar to the work of interpretation needed to agree or disagree with an argument.
 
But the JSTOR Understanding Series goes beyond the idea of transferable skills and actually creates, digitally, a composite text in which it is possible to read the source and the critics more or less simultaneously. This is true even if you never open up and browse the list of citing articles - suddenly, a line’s relative weight in the world of scholarly writing is immediately evident, replacing the line number as the most visible and immediate organizing principle scaffolding the way the text is read. Romeo and Juliet’s reception is as present on the page as its poetry is. And once you start clicking those hyperlinked citation numbers, the relationship between each line of the source text and the article that cites it is so immediate as to all but erode the barrier between primary text and secondary critic altogether.
 
The barrier between source and criticism is not, however, the only barrier being eroded. The JSTOR Understanding Series compiles every article that quotes a given line, and its algorithm is heedless of disciplinary siloing or relevance. The result is that the critical thought being welcomed into the realm of “source text” is dizzyingly interdisciplinary. For example, let’s take a look at those 51 citations of “a rose / by any other word would smell as sweet.”
 
Screencap of a dialogue box saying "51 articles quoting the selected passage" and two of those articles -- from a biology journal and a law journal
 
A glance at just the first page of results for this line reveals, among other things, two papers from science journals about literal plants, a paper from a law journal, an article called “What’s in a Name.com?: the Effects of Name Changes on Stock Prices and Trading Activity”, an article on a Washington Irving-penned satirical periodical that once quoted the line, as well as several of the expected papers on the play itself published in journals of English literature or drama.
 
Obviously, there are immediately apparent downsides to the catholic nature of the algorithm. If I were writing my seminar paper on Romeo and Juliet and wanted to quickly locate my own reading of this line vis-a-vis a body of scholarship, I would be pretty frustrated at having to scan and discard articles that are not really about Shakespeare. I imagine this problem becomes far less irksome with lines that are much written-about but less culturally iconic. For example, the 20 entries for Romeo’s claim that Juliet “hangs upon the cheek of night / As a rich jewel in an Ethiop’s ear” turn up more of the expected articles in Shakespeare Quarterly and The English Journal, as well as books tagged with the keywords “Black feminist theory”.
 
However, even when the algorithm is casting its most frustratingly wide net, it still reveals something productive about the nature of Romeo and Juliet, namely, that this is a text that can expand to encompass not only a critical context of scholarly thought but a cultural context of interdisciplinary allusion. Shakespeare is not just Shakespeare with the JSTOR Understanding Series. Rather, the text of Romeo and Juliet must in some way include the botanist making a point about the way plant species are named and the lawyer writing probably extremely intelligently about some stuff that I tried to skim and summarize and could not understand one single word of.
 
I can’t even tell which of the words in this article’s title are proper nouns.
 
This blurring of critical and cultural contexts is actually made more visible by the JSTOR Understanding Series’ greatest limitation in these early stages of its operation - the extremely white, Western, and male nature of its corpus, which comprises pretty much solely works of classic American and British literature and thought like the complete works of Shakespeare, the King James Bible, the United States Constitution, and so on. As of right now, the only works you can read on JSTOR in this digitally-dual form are the works of the Western canon: that is, the works with the most culturally ubiquitous language, which are the most likely to be used as epigraphs or brief allusions by writers in any and all fields.
 
I imagine the Understanding Series is mostly being thought of the way I pictured my senior-thesis-writing self using it - as a way to conduct very specific research tailored to the places in a text that the researcher is already invested in. Or, perhaps, as a way of telling at a glance which parts of a text have historically been the most contentious or placed under the most scrutiny. I, however, find the Understanding Series most interesting and compelling for the ways it destabilizes the idea of a text altogether. Changing technology changes the way we read (think, for example, of the idea of a stable and immutable text of a poem only taking root as privately-circulated handwritten copies stopped being the primary form of publication). Tools such as the JSTOR Understanding Series seem to me to be technologies with the potential to change how we think about, relate to, and handle texts, as readers and as scholars. To online editions of highly allusive works like Ulysses or Moby Dick that allow for new levels of simultaneously presented text and contextual information, the Understanding Series promises to add a whole new layer of text-as-text-plus. And I, for one, am excited to see what kinds of readings will follow!