Recently, we talked a bit about the concept of topic intelligence: the ability to identify which specific subjects are mentioned within your text data. Aside from improved tagging and retrieval, topic intelligence can teach you about the behavior of consumers who interact with your content.
Usually in our examples, the user behaviors we discuss are positive ones: engagements, conversations, the actions that publishers and retailers seek to encourage. If you’re a purveyor of consumer content, a keen awareness of that content’s subject matter can help you figure out which topics resonate with your audience.
But what about those behaviors that organizations (and many users) find undesirable? Can you arm yourself against trolling, abuse — any of those actions which you may hope to discourage — by understanding, on a topic-by-topic basis, where those behaviors tend to occur?
This week, a handful of UK journalists grappled with just that question. Unusually for a news outlet of its kind, the Guardian turned the flashlight on itself — specifically, on the user comment threads that appear beneath most of its articles — as part of a series entitled “The Web We Want”. Comment sections in general are notorious for negative user interaction; the goal of the Guardian’s article was to spot significant trends that could identify when and where users might post abusive comments.
We decided to treat the 70m comments that have been left on the Guardian — and in particular the comments that have been blocked by our moderators — as a huge data set to be explored rather than a problem to be brushed under the carpet.
The Research
The Guardian looked at comments that were labeled abusive and took note of the articles those comments appeared beneath. Researchers classified each article according to two different features:
- The gender (and other demographic characteristics) of the article’s author
- The article’s subject matter
The gender finding, it should be noted, was particularly striking. According to the investigation, “articles written by women got more blocked (ie abusive or disruptive) comments across almost all sections.” Unsurprisingly, articles written by straight, white males received the lowest percentage of blocked comments.
The examination of subject matter yielded significant results as well — and it is in this line of inquiry that we see the benefits of topic intelligence:
We also found that some subjects attracted more abusive or disruptive comments than others. Conversations about crosswords, cricket, horse racing and jazz were respectful; discussions about the Israel/Palestine conflict were not. Articles about feminism attracted very high levels of blocked comments. And so did rape.
The message is clear, if already intuitive: articles about divisive or turbulent subjects are more likely to draw abusive user comments.
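To make that kind of analysis concrete, here is a minimal Python sketch of how blocked-comment rates per topic might be computed. The data structure, field names, and values are our own illustration, not the Guardian’s actual schema or figures.

```python
from collections import defaultdict

def blocked_rate_by_topic(comments):
    """Tally total and blocked comments per article topic, then return
    each topic's blocked-comment share, sorted from most to least abuse-prone."""
    totals = defaultdict(int)
    blocked = defaultdict(int)
    for c in comments:
        for topic in c["article_topics"]:
            totals[topic] += 1
            if c["blocked"]:
                blocked[topic] += 1
    return sorted(
        ((t, blocked[t] / totals[t]) for t in totals),
        key=lambda pair: pair[1],
        reverse=True,
    )

# Toy data -- entirely hypothetical, for illustration only.
sample = [
    {"article_topics": ["crosswords"], "blocked": False},
    {"article_topics": ["crosswords"], "blocked": False},
    {"article_topics": ["feminism"], "blocked": True},
    {"article_topics": ["feminism"], "blocked": False},
]
print(blocked_rate_by_topic(sample))
# [('feminism', 0.5), ('crosswords', 0.0)]
```

With granular tags in place, the same aggregation works at any level of topical detail the tagging system supports.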
It’s worth noting that the Guardian tags its articles with a fairly high degree of specificity — instead of simply relying on broad, traditional categories like sports and fashion (though those distinctions exist as well), the articles are tagged more granularly, with niche topics such as crosswords and horse racing. Even a sensitive subject like rape is tagged, because developing an awareness of subject matters means recognizing any topic, regardless of whether or not it’s pleasant to talk about. Because the Guardian organizes its content both comprehensively and with specificity, its researchers were able to develop a clearer understanding of the topics that are typically met with abusive comments.
The Response
So what do you do with this kind of intel? For one thing, publishers who can predict abuse on a topic-by-topic level are better equipped to moderate user behavior more efficiently, or even learn when to turn off commenting altogether:
The Guardian has already taken the decision to cut down the number of places where comments are open on stories relating to a few particularly contentious subjects, such as migration and race. This allows moderators to keep a closer watch on conversations that we know are more likely to attract abuse.
Beyond moderators, the Guardian’s journalists can also benefit from an awareness of troll-magnet topics. Style, tone, rhetoric: authors make innumerable decisions about how best to communicate with their audiences. Knowing which subjects tend to draw abusive comments can inform that decision-making with an eye towards clear and uncompromised communication, even if it doesn’t change the overall approach to telling a story. Even marketing and PR teams, working to put this content in front of new readers, can do their jobs more effectively with these kinds of findings. After all, the last thing you want is for potential readers to associate your content with comments that are “crude, bigoted, or just vile.”
Going Deeper with Topic Intelligence
Crucially, the report included links to a sibling article that provided some details on the researchers’ methodology. The comment section of this article (fittingly) contained constructive advice from readers on how this research could be strengthened.
For what it’s worth, we at eContext hold the strong belief that increased granularity leads to increased clarity. Perhaps in subsequent investigations, the Guardian can go a little deeper into its topics and uncover even more helpful correlations between subject matter and negative user interactions. In addition, organizing tags into a hierarchical structure could foster awareness at different tiers of specificity: Which topics within “Technology” are most likely to draw abusive comments? In articles dealing with the Israel/Palestine conflict, are there certain co-occurring subjects that reliably inhibit unwanted behaviors? Questions like these require an even more exhaustive tagging system, one in which specific tags are nested under broader categories for greater flexibility and coherence.
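As a rough illustration of what a two-tier hierarchy buys you, the Python sketch below rolls niche-tag abuse rates up to broader parent categories, so the same data can be read at either tier of specificity. The taxonomy and rates are invented for the example.

```python
# Hypothetical two-tier taxonomy: each niche tag maps to a broad parent category.
TAXONOMY = {
    "crosswords": "Lifestyle",
    "horse racing": "Sport",
    "cricket": "Sport",
    "gadgets": "Technology",
    "social media": "Technology",
}

def rollup(tag_rates, taxonomy):
    """Average per-tag abuse rates up to their parent categories."""
    sums, counts = {}, {}
    for tag, rate in tag_rates.items():
        parent = taxonomy.get(tag, "Other")
        sums[parent] = sums.get(parent, 0.0) + rate
        counts[parent] = counts.get(parent, 0) + 1
    return {p: sums[p] / counts[p] for p in sums}

# Invented per-tag blocked-comment rates.
rates = {"crosswords": 0.01, "horse racing": 0.02, "cricket": 0.01,
         "gadgets": 0.05, "social media": 0.09}
print(rollup(rates, TAXONOMY))
```

A real taxonomy would weight by comment volume rather than averaging rates, but the structural point stands: nested tags let you ask the broad question and the niche question from the same data.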
Finally, it would certainly be useful to classify the abusive interactions themselves. Which topics tend to appear in blocked or deleted comments? Do those topics align with the referring articles’ content, or do they differ substantially from the topics found in comments deemed appropriate? As long as you have the means to structure and learn from this kind of information, we firmly believe that the more behavioral data you can attain, the better equipped you’ll be to make constructive decisions.
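One simple way to quantify that alignment is a set-overlap measure between the topics detected in a comment and the topics of its referring article. The sketch below uses Jaccard similarity; the topic labels are purely illustrative.

```python
def topic_overlap(comment_topics, article_topics):
    """Jaccard similarity between a comment's detected topics and the
    topics of the article it appeared under (0 = disjoint, 1 = identical)."""
    c, a = set(comment_topics), set(article_topics)
    if not c and not a:
        return 0.0
    return len(c & a) / len(c | a)

# Illustrative example: a comment that partially drifts off-topic.
print(topic_overlap(["feminism", "politics"], ["feminism", "film"]))
# 1 shared topic out of 3 distinct topics, roughly 0.33
```

Aggregating a score like this over blocked versus approved comments would show whether abuse tends to stay on-topic or to hijack the thread entirely.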
On the whole, we found the Guardian’s efforts to be a terrific example of an organization using topic intelligence to foster a more positive experience for its audience. We hope other outlets will follow this example and classify their own inventory to gain a more comprehensive understanding of their content.