More social. More surveys. More browsing histories. More call center transcripts and customer service emails. More internal documents. A lot more internal documents.
The data gold rush that’s characterized the 2010s is built on the promise of better understanding, of discovering key intelligence that you would have missed without a sufficiently large pool of information. But that promise can’t be realized until all of that data can be cleansed and organized for sufficient interpretation. Until then, it’s just cluttering up your hard drive or cloud storage, not to mention your to-do list.
The interpretation hurdle is an especially trying one for organizations that deal in a lot of text data–web content, social conversations, you name it. Text data is inherently difficult to interpret on a large scale because language is idiosyncratic and relies heavily on humans’ ability to infer context.
Cue the rise of the text analytics tools.
Now, “text analytics” is a huge umbrella, encompassing such diverse services as sentiment inference, entity extraction, document classification, and more. Any of these solutions can be extremely helpful to an organization with too much text to read and interpret manually. But here’s the issue: while automated text analysis has been in the works since the 1950s, its widespread use for business intelligence is still relatively new.
Simply put, organizations that can benefit from text interpretation services don’t always know what to look for. And if different text processing tools are conflated, you may end up paying for something that doesn’t suit your needs. How can you shop around effectively in such a young, niche market?
For starters, let’s look at a few sub-fields of text analysis that potential users should be able to clearly differentiate:
- Entity Extraction
- What it does: Entity extraction tools identify named entities such as people, places, organizations, etc. Sophisticated entity extraction tools can append key details based on information contained in the analyzed text. For example, a capable entity extraction service can ingest a sentence like, “We’re proud to announce that Fleming Scrobble will be taking on the role of Chief Science Officer,” and correctly understand that ‘Fleming Scrobble’ is a person with the job title ‘Chief Science Officer’.
- How it’s used: Entity extraction can help companies build, correct, and maintain an organized knowledge base. By learning language patterns to identify what kind of information is being communicated, entity extraction can be used to fill out your CRM, establish connections between separate organizations, or automatically hyperlink that date/time in your iPhone message to queue up a quick calendar invite.
- Sentiment Analysis
- What it does: Sentiment analysis tools try to identify the emotions conveyed via text. Basic sentiment analysis tools classify individual words or phrases as positive, negative, or neutral. More robust options may seek to identify more specific emotions (such as excitement or anger) or consider sentence/document structure to improve overall accuracy.
- How it’s used: Obviously, identifying sentiment can be vital from a market research or customer service perspective. Especially when combined with other text processing tools, sentiment analysis can help businesses keep their customers happy and capitalize on competitors’ pain points.
- Language Recognition
- What it does: This one’s pretty self-explanatory: language recognition tools identify which language a given piece of text is written in.
- How it’s used: For one thing, understanding the languages that a given group of users is using can be helpful in its own right. More importantly, none of your other text analytics is going to function properly if your software is built to recognize English language patterns and you’re feeding it data in Korean.
- Topic Classification
- What it does: Unlike the other text products mentioned here, topic classification helps you understand what high-volume content is actually about. Classification tools that recognize different subject matters can utilize machine learning, pre-defined vocabulary rules, or a combination of the two. The more robust your topic classification service is, the finer the distinctions it will be able to make between topics. (Full disclosure: this one is our specialty.)
- How it’s used: Broadly speaking, there are two main benefits to employing topic classification. The first is organization: the more accurately and specifically you’re able to identify content, the better you’ll be able to group like with like, or to retrieve what you need from a massive document inventory. The second benefit is that, if you’re able to accurately measure how often different topics are mentioned in your data, you can draw important conclusions about what users are sharing, reading, clicking on, etc. This has massive implications for marketers, UX designers, content creators, and plenty of other professions.
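
To make the topic classification idea a little more concrete, here is a minimal sketch of the "pre-defined vocabulary rules" approach, followed by a count of how often each topic shows up across a handful of documents. Everything in it is an assumption made for illustration: the `TOPIC_VOCAB` word lists, the `classify` and `topic_frequencies` helpers, and the sample sentences are all hypothetical, and a real service would rely on far larger rule sets, machine learning, or both.

```python
import re
from collections import Counter

# Hypothetical topic vocabularies, invented for this example only.
TOPIC_VOCAB = {
    "sports": {"game", "team", "score", "league"},
    "finance": {"stock", "market", "earnings", "investor"},
    "travel": {"flight", "hotel", "itinerary", "passport"},
}


def classify(text):
    """Return every topic whose vocabulary shares at least one word with the text."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return {topic for topic, vocab in TOPIC_VOCAB.items() if words & vocab}


def topic_frequencies(documents):
    """Count how many documents mention each topic."""
    counts = Counter()
    for doc in documents:
        counts.update(classify(doc))
    return counts


# Made-up sample documents.
docs = [
    "The team clinched the league title with a late score.",
    "Investors cheered as the stock beat earnings estimates.",
    "Book your flight and hotel before the game.",
]

freqs = topic_frequencies(docs)
for topic, count in freqs.most_common():
    print(f"{topic}: {count} document(s)")
```

Note that a document can land in more than one topic (the third sentence matches both travel and sports), and that exact word matching misses simple variants like "investors" versus "investor". Gaps like that are part of why more robust services go beyond plain vocabulary lists.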
Again, the problem with grouping all these services under one broad banner–“text analytics”–is that they can sometimes become conflated. If a digital publisher needs a topic classification system to automatically tag its articles, it’s no good to utilize an entity extractor and call it a day. Just because you can call something “text analytics” doesn’t necessarily mean it’s the solution that’s right for you.
On the other hand, text software is sometimes broken out into different tools that actually perform very similar functions. Read Part II of this post series to find out more about text classification.