Comprehensive topic classification, the kind that can distinguish between homonyms and understand slang, is an intricate business. Topic classification systems usually use some form of machine learning to improve fill rate and accuracy, but the best solutions will always rely on human curators who can inject their own common sense into an automated, scalable process.
At eContext, our curators write unique vocabulary rules for each topic in our subject matter hierarchy. These rules govern the majority of eContext’s topic identification procedures; they’re what make our system so competent in classifying social text, video transcripts, and other kinds of unstructured content.
To provide a bit more information about the methods and value of our unique classification, I spoke with Brad Watson, eContext’s Director of Product, who was able to shed some light on the day-to-day processes (and challenges) associated with the curation team’s work.
Why use vocabulary rules anyway? If I want to see what websites or social posts mention shoes, can’t I just search for “shoes”?
Vocabularies are important to help distinguish categories that may seem similar but can be very different depending on the context. For example, if you are interested in Beyoncé’s “Lemonade” album, you are not necessarily interested in Country Time Lemonade. So we try to sort out the differences to make it easier for our clients.
Also, people can use a lot of different words to mean the same thing. For instance, if you’re trying to study social media behavior across the US and UK, it’s helpful to know when “trainers” means the same thing as “sneakers”. Vocabulary rules help you understand what people are talking about, even when different wording makes things challenging to understand or group together.
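To make the idea concrete, here is a minimal Python sketch of how a vocabulary rule could work. It is an illustration only and not eContext's actual rule format: the `VocabularyRule` class, its field names, the topic labels, and the context words are all hypothetical. Each rule lists the terms that can signal a topic and, optionally, context words that must also appear before the topic is assigned.

```python
import re
from dataclasses import dataclass


def contains(text: str, phrase: str) -> bool:
    """Return True if `phrase` appears in `text` as a whole word or phrase."""
    return re.search(r"\b" + re.escape(phrase) + r"\b", text) is not None


@dataclass
class VocabularyRule:
    """Hypothetical rule: the topic matches when any term appears and,
    if context words are listed, at least one of them appears too."""
    topic: str
    terms: tuple
    require_any: tuple = ()

    def matches(self, text: str) -> bool:
        lowered = text.lower()
        if not any(contains(lowered, term) for term in self.terms):
            return False
        if self.require_any:
            return any(contains(lowered, word) for word in self.require_any)
        return True


# Illustrative rules only; the context words are assumptions for this sketch.
RULES = [
    # Synonyms: US "sneakers" and UK "trainers" map to the same topic.
    VocabularyRule("Athletic Shoes", ("sneakers", "trainers")),
    # Homonyms: the same term is split into topics by its context words.
    VocabularyRule("Lemonade (Album)", ("lemonade",),
                   require_any=("beyoncé", "beyonce", "album")),
    VocabularyRule("Lemonade (Beverage)", ("lemonade",),
                   require_any=("country time", "drink", "pitcher")),
]


def classify(text: str) -> list:
    """Return the topics whose rules match `text`, in rule order."""
    return [rule.topic for rule in RULES if rule.matches(text)]
```

In this sketch, "New trainers arrived today" and "Just bought new sneakers" both land in the same shoe topic, while a post that mentions lemonade with no supporting context is left unclassified rather than guessed at. A production system would need far richer rules than this, which is where the curators come in.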
Speaking of “Lemonade”, how do you stay up to date with all of the new products and concepts that have emerged over the past few years?
We keep up to date by monitoring trending topics as much as we can. Over the years we have amassed a wide array of industry news sources. We also have internal tools that help us monitor popular hashtags and other social data. The thing I’ve learned from working in this space is that it is constantly changing and evolving. So we have to be ready to do the same, but it definitely keeps things interesting.
Do you have a particular area of focus? How do you determine which curator works on which subjects?
I spend a good portion of my time keeping up to date with Arts & Entertainment topics like Musicians, TV Shows, Movies & Celebrities. This is something I enjoy in my free time too, so that is a big reason why I love the work that I do. We try to assign projects based on our curators’ interests and knowledge base whenever possible. It doesn’t always work out that way, but I’ve learned to keep an open mind about each project. Oftentimes, breaking out of your comfort zone and learning something new can be fascinating.
What’s the hardest topic you’ve ever written vocabularies for?
Lately, the most difficult topics are generically named TV shows and movies. A recent example is the Netflix comedy series “Love”. As you can probably imagine, it is difficult to write accurate vocabularies for a show with a name that generic. It can be difficult to write vocabularies for music artists as well. Two recent examples are the new music artists Tiffany (K-Pop Singer) and Belly (Rapper), not to be confused with Tiffany (’80s Pop Singer) and Belly (’90s Alternative Band). Sometimes I wish music artists would at the very least alter the spelling of their names so they’re somewhat easier to differentiate (thank you, CHVRCHES). It seems like it would be in their best interest to do so anyway to avoid confusing their fans.
If you’re interested in seeing more examples of our curators’ work, feel free to check us out at econtext.ai/try to sign up for demo credentials and test out some text. Thanks, Brad!