Exploring crowdsourcing in the digital humanities

Crowdsourcing — in the original sense of outsourcing work to the crowd — has great potential in digital humanities work. Recent exploration of a number of crowdsourced projects has given me some ideas for my own work, as well as warned me about pitfalls.

The biggest pitfall is believing that asking the “crowd” to help with the work will mean less to do for those directly entrusted with a project. Anyone who has worked with social media management or community engagement knows that the care and feeding of the crowd demands great investment. For a project that asks the crowd to contribute time, energy, and intellectual capital, management of contributors is doubly important. It is not enough to enlist the public’s assistance for a project and then leave the public to do the work. Depending on the scope and scale of the project, managers must be prepared to train contributors to do the work correctly. Once trained, contributors must be monitored to ensure that the work is being done correctly and efficiently. Contributors should also be rewarded so that they are encouraged to continue the work. Although studies show that contributors are not necessarily motivated by recognition, recognition is nonetheless part of maintaining good relations with them.

Another pitfall, depending on the project, is believing that crowdsourcing will yield an actual crowd, rather than a handful of devoted contributors. For example, a study of the Transcribe Bentham project found that of all persons who registered, just 0.6 percent represented the most active core of contributors, a total of seven individuals. Remarkably, they contributed 70 percent of all transcriptions.[1] A core group of contributors may mean less work for project managers, but if the scale of the project is large, it also may mean that not enough work is being completed.

Verifying the work of the crowd is crucial. Not everyone who volunteers their time to a project will be willing to acquire requisite skills, which means managers have to spend resources to make sure the work is of an acceptable quality. Some projects, such as the New York Public Library’s Building Inspector, use the crowd as part of the verification process by having contributors check the work of other contributors. This, in a sense, democratizes the verification process.

The benefits of crowdsourcing digital humanities work also vary according to the project and its goals. At its simplest, the Building Inspector project “gamified” verifying and correcting footprints of structures identified in digitized insurance maps of New York City. (The subtitle of the project website declares, “Kill Time. Make History.”) Almost no training is required of the crowd, which can jump right in and take part in the project. On the other hand, Transcribe Bentham asks contributors to learn some best practices in deciphering handwritten manuscripts and transcribing them into text marked up with XML. Each project therefore demands a different set of skills from its crowd. Which project’s crowd produces “better” information is not the question. Both garnered public support; both accomplished a goal of the project.

Perhaps the most important benefit of crowdsourcing a digital humanities project is the connection that is made with the public. By incorporating the audience in the production of knowledge, it is possible to convey the value of a project in a way that is far more meaningful than a press release that declares, “Look at what some researchers have done.”

I am exploring ways that crowdsourcing might help with a project that examines cultural production of the Latvian Baptist immigrant community of West Philadelphia. For example, fleshing out individual community members’ life stories beyond what can be discovered in the U.S. census and other public records could be accomplished by providing a crowdsourcing mechanism, although the “crowd” here would likely be limited to descendants. Another aspect of the project involves using and transcribing diaries written in English, Latvian, and German. I could handle English and Latvian, but I would first have to learn German. Would it be better to ask the crowd to help?

1. Tim Causer, Justin Tonra, and Valerie Wallace, “Transcription maximized; expense minimized? Crowdsourcing and editing The Collected Works of Jeremy Bentham,” Literary and Linguistic Computing, 27:2 (2012), 119-137.