Using digital tools for text analysis, mapping, and networks

As part of my studies in digital public humanities, I have spent the past three weeks examining the hows and whys of using various tools that help researchers make sense of data. The specific applications we have worked with are Voyant Tools, CartoDB, and Palladio. Each has its strengths and offers ways to understand data that can foster new questions for the researcher.

Our introduction to these tools came through analysis of the Library of Congress collection “Born in Slavery: Slave Narratives from the Federal Writers’ Project, 1936 to 1938.” Using Voyant Tools, we performed textual analysis. CartoDB allowed us to map the data. Palladio was used to create network visualizations.

For me, the biggest lesson was how the different tools allowed exploration of different facets of the same data. The “Born in Slavery” collection includes some 2,300 interviews with former slaves. Decades ago, a scholar examining the interviews might have performed a “close reading” of the text to derive meaning from the stories. While deeper analysis certainly was not impossible, it would have required a great investment of time and, consequently, expense. With the development of various digital tools, it is now much easier and more cost-effective to look at these interviews differently. Through textual analysis, for example, a researcher can get a sense of how frequently certain words or phrases appear in a corpus such as the interviews. This might lead to questions about what topics were important to the former slaves. Mapping applications make it possible to link the interviews to specific places, which might provide a sense of the importance and understanding of geography among the subjects. Network visualizations reveal connections that might otherwise be obscured in the text. I was especially interested to see how the frequency of certain words in the corpus differed between male and female former slaves, which raised a question about gendered meaning.
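To make the word-frequency idea concrete, here is a minimal Python sketch of counting words in two groups of texts and comparing how often a given word appears in each. It is only an illustration of the general technique, not how Voyant Tools works internally. The function names, the tiny stopword list, and the two sample strings are all assumptions made up for this example; none of the text comes from the “Born in Slavery” collection.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; real tools use far longer ones.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "was", "i", "we", "my", "on"}

def word_frequencies(text):
    """Count lowercase word tokens in a text, skipping stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)

def relative_frequency(counts, word):
    """Return the share of all counted tokens that are the given word."""
    total = sum(counts.values())
    return counts[word] / total if total else 0.0

# Placeholder strings standing in for two groups of interviews
# (invented for illustration, not quoted from any source).
group_a = "We worked the field and the field was wide"
group_b = "I cooked in the kitchen and worked in the house"

counts_a = word_frequencies(group_a)
counts_b = word_frequencies(group_b)

# Compare how prominent a single word is in each group.
for word in ("field", "worked"):
    print(word, relative_frequency(counts_a, word), relative_frequency(counts_b, word))
```

Relative frequencies matter more than raw counts here, because the two groups of texts will rarely be the same length. A difference in the numbers is a prompt for a closer look at the interviews, not a finding in itself.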

Voyant Tools, CartoDB, and Palladio are not perfect tools, nor are they meant to provide all the answers, if any at all. As with any tool, the researcher must consider which one is right for a particular problem or challenge. More importantly, these tools are a means of generating new questions, of seeing through the data, to paraphrase the Voyant Tools motto. Stephen Robertson, in a detailed explanation of the work behind the Digital Harlem project, observes how mapping data about daily life in a specific New York district “prompted questions I might otherwise have ignored and facilitated comparisons that I would not have considered.”1

Another exciting aspect of the three digital tools is that they make data accessible to the broader public, allowing humanities scholarship to reach an audience beyond the academy. This is a critical component of digital public humanities. In an era when the role of and investment in the humanities are being questioned across universities, exposing projects to public scrutiny has taken on urgency. As the creators of the ORBIS project at Stanford University have noted, they did not expect that digitally modeling land, river, and sea travel during the Roman Empire would engage an audience beyond a few dozen academics. They were wrong.

Learning about these tools has also informed my teaching in a journalism program. I have already shown students some of what the tools can do, such as creating trend lines of coverage in The New York Times, which resulted in an interesting discussion about why the frequency of certain words or phrases changes over time. Tools like this may help a journalist develop new story ideas that might not have been obvious.

1. Stephen Robertson, “Putting Harlem on the Map,” Writing History in the Digital Age, accessed October 30, 2016,