Last week, one of our favorite things (the library) teamed up with one of our favorite radio shows (The Moth) using one of our favorite technologies (speech-to-text) to grow a taxonomy of storytelling. We’re elated!
If you’re a fan of The Moth, you know it’s “storytelling, live, without notes” — a live event, radio show, podcast, and more that features real people telling stories on a stage. Stories range from raw and tragic to deeply meaningful and wildly funny.
Since The Moth launched in 1997, more than 20,000 stories have been told from Moth stages across the U.S. and, now, throughout the world. In an effort to better curate its collection of stories and make them available to people with disabilities such as hearing impairment, The Moth set out to find a partner who could help create transcripts.
This led The Moth to The New York Public Library and a speech-to-text company called PopUp Archive. With a grant from the Knight Foundation, the three organizations worked out a strategy for creating and archiving transcripts of Moth stories.
Turning Speech into Natural-Language Text
The problem with speech-to-text programs is, well, sometimes they don’t hear very well. In addition to basic language barriers such as accents, dialects, and fluctuations in speech, which occur all the time, Moth recordings typically take place in a bar where there is a lot of ambient noise, varying levels of audio quality, and other factors that can make a speaker’s words difficult for a machine to accurately recognize. This means that transcripts produced by PopUp Archive would need a good edit, and we’re guessing the Knight Foundation’s grant wasn’t going to cover editing expenses for 20,000 files.
The New York Public Library (NYPL) had a solution: crowdsourcing. It turns out NYPL Labs had developed an interactive transcript editor called Together We Listen, which allows anyone to log on, listen to Moth stories, and help edit the transcripts as they go. Volunteers smooth out the transcripts by correcting misspelled words and words the software misheard, restoring words and phrases to the context the speaker intended. Volunteers are also asked to add tags to each file to make the collection easier to catalogue and categorize.
A Taxonomist’s Dream Come True
We probably don’t have to tell you how much we love this idea. As people who sit in an office all day categorizing words and phrases, we’re thrilled to learn that people are out there volunteering their time to curate stories. The categorization element exemplifies the quiet work that taxonomists do every day in order to curate and maintain information.
The project also illustrates the value of classification on a very practical level. By transcribing Moth stories into text, the organization is making the stories more accessible to all of us, particularly those who may not have been able to enjoy Moth stories before they were available in print. And, because these stories represent such a wide array of topics and audiences, we’ll now be able to reach into the archive and pull out the categories or topics that we want to learn about. Additionally, The Moth will be able to use the catalogue in a number of ways to enrich its offering. For example, it can dig deep and find collections of stories to publish, such as stories from cancer survivors or humorous stories. It could pull together various collections to package and sell as educational tools, or deliver favorite celebrity stories online as sponsored content.
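To make the idea of pulling stories out by category concrete, here is a minimal Python sketch of a tag index. It is only an illustration of the general technique, not a description of how The Moth or NYPL actually store their catalogue: the story titles, the tags, and the function names `build_tag_index` and `stories_with_all` are all made up for this example.

```python
from collections import defaultdict

# Hypothetical records: titles and tags are invented for illustration only.
stories = [
    {"title": "Story A", "tags": {"humor", "family"}},
    {"title": "Story B", "tags": {"cancer survivor", "family"}},
    {"title": "Story C", "tags": {"humor"}},
]


def build_tag_index(records):
    """Map each tag to the list of story titles that carry it."""
    index = defaultdict(list)
    for record in records:
        for tag in record["tags"]:
            index[tag].append(record["title"])
    return index


def stories_with_all(index, *tags):
    """Return the sorted titles tagged with every one of the given tags."""
    if not tags:
        return []
    matches = set(index.get(tags[0], []))
    for tag in tags[1:]:
        matches &= set(index.get(tag, []))
    return sorted(matches)


tag_index = build_tag_index(stories)
print(stories_with_all(tag_index, "humor"))            # ['Story A', 'Story C']
print(stories_with_all(tag_index, "humor", "family"))  # ['Story A']
```

The more consistently volunteers tag each file, the more useful a lookup like this becomes, which is why a shared vocabulary of tags matters as much as the transcripts themselves.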
We’ve curated more than 450,000 terms, and we’re here to tell you, it can sometimes be a lonely business. Knowing that there’s a growing community out there that loves to categorize, curate, and organize information into a hierarchy makes us smile.
In June, the NYPL is hosting a hackathon and storytelling celebration. We’ll provide more details when we hear about it. In the meantime, be part of the solution by helping to curate these stories!